Identiﬁcation and Apportionment of Potential Pollution Sources Using Multivariate Statistical Techniques and APCS-MLR Model to Assess Surface Water Quality in Imjin River Watershed, South Korea

: Reliable water quality monitoring data, identifying potential pollution sources, and quan-tifying the corresponding potential pollution source apportionment are essential for future water resource management and pollution control. Here, we collected water quality data from seven monitoring sites to identify spatiotemporal changes in surface water in the Imjin River Watershed (IRW), South Korea, distinguish potential pollution sources, and quantify the source apportionment from 2018–2020. An analysis was performed based on multivariate statistical techniques (MST) and the absolute principal component score-multiple linear regression (APCS-MLR) model. Statistically signiﬁcant groups were created based on spatiotemporally similar physicochemical water quality characteristics and anthropogenic activities: low-pollution (LP) and high-pollution (HP) regions, and dry season (DS) and wet season (WS). There were statistically signiﬁcant mean differences in water quality parameters between spatial clusters, rather than between temporal clusters. We identiﬁed four and three potential factors that could explain 80.75% and 71.99% in the LP and HP regions, respectively. Identiﬁcation and quantitative evaluation of potential pollution sources using MST and the APCS-MLR model for the IRW may be useful for policymakers to improve the water quality of target watersheds and establish future management policies.


Introduction
Water is essential for human life and agricultural and industrial activities, and is a crucial resource contributing to socio-cultural and economic development [1]. Water quality is a major global issue, as surface water is degraded due to the effects of land-cover changes, higher population density, livestock manure discharge, and point and nonpoint source pollution [2]. Rivers are created by natural water sources that are susceptible to anthropological, biological, and chemical impacts, and are important indicators for sustainable development in terms of human well-being and ecological and economic development [3]. In general, the water quality of rivers is greatly affected by natural process factors, such as soil erosion and weathering [4], oxidation of rock minerals [5], and seawater intrusion [6], and by currently unsustainable anthropogenic factors such as domestic and municipal sewage [7], livestock fertilizers [8], and agricultural and industrial wastewater [9]. In particular, anthropogenic factors influencing surface water can cause surface water quality changes because pollutants are discharged through particular locations (point sources) such as wastewater treatment plants or through surface flows (nonpoint sources) [10]. Agricultural, industrial, and urban activities have altered the quality and quantity of urban sewage [11] and caused a decrease in dissolved oxygen of surface water and runoff and water temperature changes [12], with considerable impacts on aquatic ecosystems (e.g., higher eutrophication and algal production) [13].
Continuous reinforcement of water pollution management and control has improved the ability to monitor surface water. Environmental research institutes such as the United States Environment Protection Agency (US EPA) and United States Geological Survey (USGS) have regularly monitored environmental data and provide the data in a file format that is required to run an environmental model [14,15]. A regular water quality monitoring program is important not only for protecting water resources and controlling water pollution, but also for modeling the pollutant distribution, source location, and health hazards [16]. The water quality monitoring program enables regular sampling, determination of physicochemical parameters, and provision of representative surface water conditions [17]. Water quality monitoring data are obtained through sampling and analysis of a specific number of physical, chemical, and biological parameters at various sampling points in the river network; the number, type, sampling frequency, and method of monitored parameters are defined by the government systems [18]. Water quality monitoring datasets have numerous limitations in interpreting and drawing meaningful conclusions, because they generally contain large amounts of information and complex relationships between variables [19]. An evaluation of general water quality monitoring data aims to compare the measured physicochemical parameters with the threshold values recommended by a national or international organization; however, it has limitations in suggesting alternatives because the interpretation of the cause is not clear [20]. Correlation analysis and multiple linear regression (MLR) can be used to identify the correlation between environmental variables and river water quality, and to provide quantitative information and spatial variability of water quality variables; however, they are insufficient in providing general information on pollutant sources [21]. Therefore, to sustainably prevent and control surface water pollution, regular water quality monitoring should be conducted at various points in river networks and reliable water quality assessment methods should be used that interpret complex structured multivariate data [22,23].
Multivariate statistical techniques (MST) are powerful tools that provide environmental information regarding the main factors and potential pollution sources affecting water environment systems by identifying hidden relationships between variables and reducing the dimensionality of complex datasets [24]. MST has been utilized in all aspects of the social and natural environments, including in forecasts, mathematical modeling, data analysis, and statistics [17]. Recently, many scientists around the world have used MST, such as cluster analysis (CA) and principal component analysis and factor analysis (PCA/FA), to simplify large and complex water quality datasets, and to identify spatiotemporal changes, potential factors, and pollution sources [25] of surface water [26,27]. Furthermore, MST has been applied to determine spatiotemporal changes and diverse sources of groundwater [28] and drinking water [29]. The absolute principal component score-multiple linear regression (APCS-MLR) model has been used to quantitatively analyze contributions of potential pollution sources [30][31][32]. By combining MST and the APCS-MLR model, it is possible to comprehensively manage the spatiotemporal variability of surface water and the identification of pollution sources [33]. Therefore, as the data mining of water quality monitoring results involves several statistical techniques, and environmental systems are considered multivariate systems, MST and the APCS-MLR model are considered the most reliable method [34].
The river environment in Korea is mainly controlled by spatial and seasonal fluctuations, and its economic growth and development have been heavily influenced by the regional and seasonal availability of surface water. The spatial and seasonal availability of surface water is highly sensitive to the topography and monsoon climate of a country [35]. On the Korean Peninsula, the Imjin River is a transboundary river jointly occupied by South and North Korea; due to the specificity of the Military Demarcation Line (MDL), environmental conservation and water resource development and utilization are expected [36]. The Ministry of Environment of the Republic of Korea has set target water quality (biological oxygen demand (BOD) and total phosphorus (TP)) values for the Imjin River Watershed (IRW), South Korea, since May 2007. The Total Pollution Load Management System (TPLMS) is being implemented to manage the number of total pollutants flowing into the river that are below the quality standard [37]. However, it was recently estimated that the IRW is vulnerable to domestic wastewater due to rapid urbanization and industrialization, industrial wastewater pollution from textile factories or large-scale industrial facilities, and runoff from agricultural land impacted by African swine fever (ASF) virus [38]. In particular, as 62.8% (1926 facilities) of industrial wastewater discharge facilities (3065 facilities) are located in areas where the installation of discharge facilities is restricted, the recent water quality of the IRW has been insignificantly improved, or has deteriorated [38]. Considering that, rivers entering the IRW inevitably contain pollutants, posing potential health and environmental risks to those living in the vicinity of the IRW. Therefore, it is crucial to regularly monitor and establish pollution control policies to identify the spatiotemporal variability of IRW surface water and the various potential pollution sources.
This study combined MST (CA and PCA/FA) and the APCS-MLR model based on the water quality data via regular monitoring, identified the spatiotemporal variability of the IRW surface water, and conducted a quantitative analysis of apportionment of the sources. This involved the following: (1) for CA, the spatiotemporal characteristics of IRW water quality were analyzed; (2) for PCA/FA, spatial group factors and potential pollution sources influencing the water quality of the IRW were identified; (3) for the APCS-MLR model, contributions of potential factors and pollution sources identified by PCA/FA were quantitatively evaluated. The results provide scientific data for efficient water quality management and policy establishment for the IRW.

Study Area
South and North Korea on the Korean Peninsula have been divided since the Korean War (25 June 1950-27 July 1953; due to the special situation of the MDL, two countries have jointly occupied the Imjin River [39]. The Imjin River is the seventh largest river on the peninsula, originating from Duryu Mountain, Beopdong-gun, Gangwon Province, North Korea, passing through MDL, joining the Hantan River and Moonsan Stream, joining the Han River in Gimpo-si, Gyeonggi Province, and then flowing into the West Sea [38]. The Imjin River is located between 37 • 44 23 N 126 • 31 19 E and 37 • 11 12 N 126 • 36 21 E; the entire watershed area is 8139 km 2 , with a total length of 273.5 km [40]. This study targeted the IRW, South Korea, which accounts for 37.1% of the total watershed area, based on the MDL (Figure 1). The shape factor of the IRW is 0.126, indicating a smaller watershed width compared to the extension of the river channel [41]. The mean altitude is 680.5 m, and the elevation is between 155 and 1570 m, indicating a complex topography [40]. Mean annual precipitation of the IRW is approximately 1100 mm, of which approximately 74% occurs from June to September; it has a climate similar to a monsoon climate, with wet summers and dry winters [42]. The Ministry of Environment is currently implementing TPLMS for seven IRW watersheds [43]. Figure 2 shows the land use ratio and pollutant density ratio for each unit watershed, and that the urban and agricultural areas of IRW accounted for 24.6%. Hantan-A unit watershed had the highest percentage (17.9%) in the use of area, whereas Yeongpyeong-A unit watershed had the highest percentage (29.3%) in the agricultural area [44]. The IRW cannot be used as a drinking water source or tap water because it is impacted by tides twice a day from the estuary of the Han River to the Gorangpo point (approximately 40 km) in Yeoncheon-gun, Gyeonggi Province; hence, water is supplied from Paldang Dam of the Han River [41].

Water Sampling and Analysis
We conducted water sampling at the following water quality monitoring network points of the Ministry of Environment: Imjin River (IJ-A and IJ-B) of the main streams of the IRW; Hantan River (HT-A and HT-B), Yeongpyeong Stream (YP-A), and Shincheon Stream (SC-A); Moonsan Stream (MS-A) of the main tributaries of the IRW (Figure 1). These 7 sites were representative sites for river water quality monitoring and pollution sources management in this study [43]. Water quality samples were collected at mean intervals of 8 d from January 2019 to December 2020. The samples were analyzed for 12 water quality parameters: water temperature (WT), the potential of hydrogen (pH), electrical conductivity (EC), dissolved oxygen (DO), 5 d BOD, chemical oxygen demand (COD), suspended solids (SS), total nitrogen (TN), ammonium nitrogen (NH 3 -N), nitrate-nitrogen (NO 3 -N), TP, and phosphates phosphorous (PO 4 -P). WT, pH, EC, and DO were measured on site using a multiparameter sonde (EXO1, YSI Crop., Yellow Springs, OH, USA). The sample was placed in a polyethylene bottle (2 L, 3 EA) washed with 0.1 N HNO 3 solution, stored in an icebox at <4 • C, and was then transported to the laboratory. BOD, COD, SS, TN, NH 3 -N, NO 3 -N, TP, and PO 4 -P were analyzed in the laboratory in compliance with safety regulations. Water quality sampling, preservation, transportation, and analysis were performed in line with the Water Pollution Standard Method of the Ministry of Environment [45]. The analysis methods and tools for the 12 water quality parameters are summarized in Table 1. These water variables are important indicators of water pollution showing organic, nutrient, physical, and biological properties of the river [26]. Quality assurance and quality control (QA/QC) procedures of tools and data to be analyzed were evaluated for all test items, with reference to 5% repeats, 5% spikes, standard calibration curves, and quality control reference standards [45]. WT-water temperature, EC-electrical conductivity, DO-dissolved oxygen, BOD-5 d biological oxygen demand, COD-chemical oxygen demand, SS-suspended solids, TN-total nitrogen, NH 3 -N-ammonium nitrogen, NO 3 -N-nitrate nitrogen, TP-total phosphorus. PO 4 -P-phosphates phosphorous.

Data Treatments and Multivariate Statistical Techniques
Kolmogorov-Smirnov statistics were used to test the normality of water quality data [46]. Most variables of MST should show a normal distribution; however, most variables in the raw data were far from a normal distribution with 95% confidence through statistical tests of skewness and kurtosis [26]. The water resources data collected did not meet the assumption of normality due to the survey time and location [47]. Therefore, the data were standardized (mean value: 0, standard deviation value: 1; Z-Score) to improve the explanatory power of the statistical analysis results and reduce the related errors. Standardization reduces errors due to large fluctuations in water quality data (i.e., lower effects of a unit of measure and parameter variance), and satisfies the assumption of normality for statistical analysis [3].
MST has recently been widely used as a powerful tool for research involving environmental data [24]. MST enables analysis of the relationship between different variables, and the reduction to a small number of factors from large and complex physiochemical data without the loss of information, eventually providing more accurate information regarding water quality and potential pollution sources influencing research systems [48]. In this study, we utilized MST such as hierarchical cluster analysis (HCA) and principal component analysis/factor analysis (PCA/FA), and the APCS-MLR model to identify spatiotemporal variability of IRW surface water and potential pollution sources, and to quantitatively analyze the contributions of pollution sources. An analysis was conducted by using a Z-score standardized to MST. To perform MST and the APCS-MLR model, we used the Statistical Package in the Social Science software (SPSS, v22.0; IBM Corp., Armonk, NY, USA). Data processing and basic statistical analyses (e.g., minimum, maximum, mean, standard deviation, and standard uncertainty) were performed using Excel 2019 (Microsoft Corp., Redmond, WA, USA). OriginPro 2021b (OriginLab Corp., Northampton, MA, USA) was utilized to visualize the analysis data.

Cluster Analysis
Cluster analysis (CA) is an unsupervised pattern detection method that identifies the characteristics of each group after clustering large-scale data of each entity into several groups [26]. In this study, we used HCA in which clustering is sequentially performed based on distances between objects. Ward's method and squared Euclidean distance were used for a standardized dataset to provide spatiotemporal similarity information regarding the IRW surface water [29]. Ward's method suggests that pairs with the smallest variance are merged in a connected manner based on the variance of the entity constituting each cluster [49]. Euclidean distance indicates similarity between two samples in general; "distance" refers to the "difference" between the analyzed values of two samples [33]. HCA generated spatiotemporal dendrograms by grouping seasonal (temporal) similarities between sampling sites (spatial variability) and variables (samples). Spatiotemporal clusters were clearly identified by dendrograms with connection distances expressed as Dlink/Dmax × 100. Dlink/Dmax is a method of standardizing the connection distance expressed on the y-axis; it is reported as the value obtained by dividing the connection distance in a specific case by the maximum connection distance, multiplied by 100 [50]. Dendrograms are utilized to provide a visual summary of the clustering processes, explaining groups and proximities, and greatly reducing the dimensionality of the raw dataset [51]. As for the HCA, a one-way analysis of variance (ANOVA) was performed (p < 0.05) to analyze significant differences in spatiotemporal water quality parameters, and post hoc analysis was conducted to identify significant differences in clusters.

Pearson's Correlation Analysis
Correlation analysis is a statistical technique used for testing the significance of a linear relationship between two or more variables [52]. This study obtained correlation information between water quality parameters using the Pearson's correlation coefficient, based on a standardized Z-scale. Correlation coefficients closer to +1 or −1 indicate respectively strong positive or negative linear relationships between two variables [53]. When the correlation coefficient between two variables is zero, there is no linear relationship at the p < 0.05 level; however, a low correlation in large datasets can be strongly statistically significant at the p < 0.01 level [54].

Principal Component Analysis/Factor Analysis
We conducted principal component analysis (PCA) and factor analysis (FA), analyzed main factors in polluted areas grouped via HCA, and identified potential water pollution sources. PCA and FA were used as the tools for exploratory data analysis and model prediction to identify hypothetical and unobservable potential pollution sources that can trigger pollution of surface water, and to quantify pollutant source apportionment [55]. PCA is a powerful statistical method to dimensionally reduce complex multivariate datasets with linear structures and interpret them as principal components (PCs) without information loss [25]. In this study, we selected factors by considering only the PC axis for which the eigenvalue (i.e., the size of variance explaining the factor based on the Scree plot and Kaiser's rule) was >1.0. PCA enables a rational identification of main pollution factors; however, in terms of actual controlling sources and processes, the interpretation of such factors is highly subjective and has a limitation of generalization [31]. Prior to PCA, we confirmed the basic statistics of the data via descriptive statistical analysis. In addition, the strong correlation between data was identified through Pearson's correlation analysis; the applicability of PCA was determined through Keiser-Meyer-Olkin (KMO) and Bartlett's test of sphericity [56]. The KMO test indicates the degree of variance caused by the PC; in general, the KMO test coefficient is between 0.5 and 0.7, which is satisfactory [57]. Bartlett's test determines the possibility of whether a correlation matrix is a unit matrix; if the significance probability of Bartlett's test is <0.05, it is applicable to PCA [58]. After extracting factors via PCA, FA was performed to create a new group of variables referred to as variance factors (VFs) through varimax rotation of the PCs, to clarify the factor structure in line with the correlation coefficient between factors and variables [33]. VFs were classified as "strong", "moderate", or "weak" if absolute loadings were >0.75, between 0.75 and 0.50, or between 0.50 and 0.30, respectively [59].

APCS-MLR Model
The application of the modeling, based on APCS-MLR, is known as the combination of PCA and MLR; it is generally applied to statistical techniques for source apportionment of environmental pollutants [32]. Wang et al. [60] indicated that the datasets analyzed with PCA and MLR enabled an understanding of precision and quantification of source apportion. In this study, we used PCA to determine the number and properties of potential pollution sources, and then developed the APCS-MLR model to determine the contribution of potential pollution sources. To achieve this, we first established an MLR between APCS (independent variables) and the pollutant concentration (dependent variables). We then calculated regression coefficients and quantified the contribution ratio of each PCs to the water quality variable. The source contributions to the pollutant concentration (C j ) are listed in Table S1 [30,32].
However, in the APCS-MLR model, a negative contribution ratio may appear in the calculations, misleading the accuracy of the source apportionment [60]. Negative values can be considered to be inversely related to the factor. To overcome this problem, we converted negative values to positive contributions, to indicate the contribution ratios to the water quality parameters of the corresponding water source [30,61]. Table 2 shows the descriptive statistics of 12 water quality parameters collected between 2018 and 2020 from seven water quality monitoring sites in the IRW. In terms of the spatial conditions, the IRW showed relatively high levels of BOD, COD, TN, and TP compared to the water quality standards set by the Ministry of Environment in South Korea. The spatial variation in the IRW because of anthropogenic activities and various land use types was evident [44]. The standard deviation was very large for the main streams (IJ-A and IJ-B sites), because they were more affected by summer floods than tributaries were. For the BOD, COD, TN, and TP parameters, as opposed to WT and SS, the fluctuation range was smaller for the tributary (SC-A, YP-A, and MS-A sites) than for the main stream, which may be attributed to the buffering effect of the main stream mixed with water. According to the water quality environment standards, a BOD between 1 and 10 mg/L was considered a fair grade. At sites IJ-A (0.3 mg/L) and HT-A (0.3 mg/L), the BOD was at a very good level (Ia; ≤1 mg/L). At site SC-A, on-site WT (19.5 • C) and EC (1479 µS/cm) were high compared to those elsewhere. Furthermore, the concentrations of COD (11.9 mg/L), TN (8.098 mg/L), NH 3 -N (1.316 mg/L), and TP (0.186 mg/L) were highest because SC-A site was significantly affected by the wastewater from textile and leather factories around the watershed [62]. At site MS-A, the mean SS concentration (55.7 mg/L) exceeded the slightly poor (IV; ≤100 mg/L) water quality standards. The high SS concentration was impacted by the low tide due to the tidal effect of the Imjin River and domestic sewage [38]. Table 2. Descriptive statistics of water quality parameters in the Imjin River Watershed, South Korea during 2018-2020.   Table S2 presents the mean and standard deviation of monthly water quality parameters in the Imjin River Watershed in South Korea. Monthly BOD concentrations were maintained at slightly average (II; ≤3 mg/L) levels during September-December; however, these levels were close to the slightly bad (IV; ≤8 mg/L) levels measured in April and May. The BOD and COD concentrations were highest in spring, assumingly because of the relatively large effect of point pollution sources brought in by the low flow rate of rivers and the impact of river runoff on irrigation water carrying organic matter such as compost or fertilizer accumulated in paddy fields [63]. As shown in Table S2, TN and TP concentrations were the most polluted variables each month. The mean TN concentration was between 3.22 and 7.01 mg/L, exceeding the very poor (VI; >1.5 mg/L) level of water quality standard. The highest mean TN concentration had a large mean and standard deviation between January (6.67 mg/L) and February (7.01 mg/L). The TP concentration gradually increased from March (0.15 mg/L), reached a peak by May when there were active agricultural activities, and showed a decreasing trend between September and December, the months with the high precipitation. This reflects the complex diluting effect of the increase in river flows and other factors [63]. The TP concentrations were highest during summer due to surface runoff, and very close to the slightly poor (IV; ≤0.3 mg/L) levels of the water quality standard. Hence, water quality management is required.

Spatial and Temporal Hierarchical Agglomerative Clustering
The seven sampling sites were classified into two statistically significant clusters at (Dlink/Dmax) × 100 < 40 via spatial HCA (Figure 3a). There was a significant difference in water pollution at the sampling sites between both clusters. The cluster classifications showed spatial distribution because the sites had water quality features and natural backgrounds that were affected by similar pollution sources [47,49]. One-way ANOVA and post hoc analysis were conducted to increase the reliability and validity of HCA. The pvalues of EC, BOD, COD, and NH 3 -N were 0.010, 0.013, 0.011, and 0.050, respectively, indicating statistically significant mean differences between clusters for each water quality parameter. By utilizing box plots [49] featuring two or more static variables, we analyzed spatial changes in EC, BOD, COD, and NH 3 -N water quality parameters between clusters ( Figure 4). As shown in Figure 4, EC, BOD, COD, and NH 3 -N presented higher mean values with greater variations in cluster 2 than in cluster 1. Therefore, cluster 1 corresponded to a lower pollution (LP) region than did cluster 2. Cluster 1 encompassed sites YP-A, HT-B, and HT-A, which are located in Imjin River (IJ-A and IJ-B sites), the main stream, and Yeongpyeong Stream and Hatan River, the tributary. Sites IJ-A and HT-A are located in the upper reaches of the Imjin River and Hantan River, respectively, where there are dense forests and the only basalt in Korea. IJ-B, HT-B, and YP-A sites reflected dilution effects, mainly due to agricultural activities and anthropogenic factors [44]. In cluster 2, sites SC-A and MS-A corresponded to the high-pollution (HP) region; these two sites are located in the Shincheon Stream and Moonsan Stream, respectively. As shown in Figure 4, EC, BOD, COD, and NH 3 -N showed greater variability and higher mean values in the HP region than in the LP region. According to Cho et al. [38], Shincheon Stream is highly polluted by textile and leather industrial complexes, and Moonsan Stream is an area with a high rate of industrial wastewater discharge.   The temporal variability of IRW water quality was further analyzed through HCA. As for temporal HCA, we created a dendrogram by classifying 12-month mean data over 3 years (from 2018-2020) into two clusters, at (Dlink/Dmax) × 100 < 60 (Figure 3b). The two clusters corresponded to the dry and wet seasons (DS and WS, respectively) of the IRW; the DS and WS variations were statistically significant because IRW pollution was mainly affected by human activities and rainfall. Cluster 1 encompassed September-May; the cluster corresponded to DS accounting for approximately 41% of the annual mean precipitation. Cluster 2 included June-August, which corresponded to the WS and accounted for approximately 59% of the annual mean precipitation [43]. According to the measured parameter values in Table 1, the IRW had TN and TP, the most serious and concerning pollutants. In addition, EC and COD are generally considered important indicators for monitoring organic pollution in rivers, industrial water, and factory wastewater [32]. Therefore, the temporal variability of the parameters selected for DS and WS (EC, BOD, COD, and NH 3 -N) is shown in Figure 4b. The mean COD and NH 3 -N concentrations were higher in the WS than in the DS, because of the combined effect of increased surface runoff in spring and major stream inflows carrying organic matters and nitrogen from land to the aquatic environment [32,64].   As for the spatiotemporal HCA of the IRW in this study, the regional difference between LP and HP was more evident than the difference between DS and WS. Therefore, we focused on exploration and apportionment of pollution sources by performing correlation analysis of spatial groups in the IRW (LP and HP regions) and utilizing the PCA/FA or APCS-MLR model. HCA was useful in providing reliable information on spatiotemporal variability of surface water [26]. Furthermore, it was found that each cluster classified based on the characteristics of the surrounding pollution sources and monthly water quality could represent the water quality of the corresponding watershed [65]. Therefore, HCA enables the design of future optimal spatiotemporal sampling strategies by reducing the frequencies in the monitoring network and sampling, thereby reducing costs without losing the significance of the results [66].

Correlation Analysis of Water Quality Parameters According to Polluted Region
To provide correlation information between physicochemical water quality parameters of spatially clustered LP and HP regions through HCA, the Pearson's correlation coefficient (r) with statistical significance was used (p < 0.05). As indicated in Figure 5, the correlation analysis for the IRW showed the following: EC had a positive (+) correlation with the concentrations of organic matter and nitrogen-based substances, and SS and PO 4 -P had a negative (−) correlation. Such correlations were more obvious in the HP region than in the LP region. Furthermore, SS and PO 4 -P showed negative (−) correlations with pH, EC, and DO. The negative (−) correlation between WT and DO was statistically stronger in the LP region than in the HP region (r = −0.71, p < 0.01). The statistically strong positive (+) correlations between EC, and COD, TN, and NO 3 -N were predominant in the HP region (r > 0.60, p < 0.01). The reason is that the Shincheon and Moonsan Streams in the HP region showed high densities of nondegradable materials and industrial wastewater discharge due to textile factories and large-scale industrial facilities around the streams [67]. In the LP region, EC had a statistically strong positive (+) correlation with BOD and NH 3 -N (r > 0.60, p < 0.01). There was a strong positive (+) correlation between SS and COD concentrations (r = 0.59, p < 0.01) in the LP region; however, there was no statistically significant correlation in the HP region. The statistically strong positive (+) correlations between BOD, and TN and TP were more prominent in the HP region than in the LP region (r > 0.47, p < 0.01). On the basis of these correlations, organic matters in the HP region were confirmed to flow into the river along with nutrients [68].

Pollution Source Identification Using PCA/FA
PCA/FA was performed on the standardized data to identify the factors affecting water quality in the LP and HP regions clustered by HCA. The KMO test values for LP and HP regions were 0.592 and 0.621, respectively, and Bartlett's sphericity test value was 0.00 (p < 0.01), indicating that there was a statistical significance for the correlation between water quality parameters, and the conformance of PCA [28]. PCA/FA extracted four and three PCs with an eigenvalue of >1 for the LP and HP regions, respectively. This explained approximately 80.8% and 72.0% of the total variance for the water quality datasets for the LP and HP regions, respectively. Table 3 summarizes the PCA/FA results, including loading, eigenvalues, variance, and cumulative variance for each VF. VF1 in the LP region accounted for 29.37% of the total variance with strong positive loadings (0.90, 0.83, and 0.78, respectively) for EC, TN, and NH3-N, and with medium positive loadings (0.74 and 0.65, respectively) for BOD and NO3-N. EC and TN of surface water can originate from pollution sources including fertilizer use, animal waste, and domestic and industrial sewage [69]. Five sites in the LP region are dominated by mountains, grasslands, and agricultural lands. Since the LP region has low population density and there has been minimal exploitation, it appeared to be mostly related to agriculture and livestock, and to be more impacted by manure and chemical fertilizers than by industrial

Pollution Source Identification Using PCA/FA
PCA/FA was performed on the standardized data to identify the factors affecting water quality in the LP and HP regions clustered by HCA. The KMO test values for LP and HP regions were 0.592 and 0.621, respectively, and Bartlett's sphericity test value was 0.00 (p < 0.01), indicating that there was a statistical significance for the correlation between water quality parameters, and the conformance of PCA [28]. PCA/FA extracted four and three PCs with an eigenvalue of >1 for the LP and HP regions, respectively. This explained approximately 80.8% and 72.0% of the total variance for the water quality datasets for the LP and HP regions, respectively. Table 3 summarizes the PCA/FA results, including loading, eigenvalues, variance, and cumulative variance for each VF. VF1 in the LP region accounted for 29.37% of the total variance with strong positive loadings (0.90, 0.83, and 0.78, respectively) for EC, TN, and NH 3 -N, and with medium positive loadings (0.74 and 0.65, respectively) for BOD and NO 3 -N. EC and TN of surface water can originate from pollution sources including fertilizer use, animal waste, and domestic and industrial sewage [69]. Five sites in the LP region are dominated by mountains, grasslands, and agricultural lands. Since the LP region has low population density and there has been minimal exploitation, it appeared to be mostly related to agriculture and livestock, and to be more impacted by manure and chemical fertilizers than by industrial sewage [32]. Nitrogen from agricultural land and pasture in the LP region flows into rivers in the vicinity due to surface runoff and flooding irrigation, as primary fertilizer for crops and general livestock waste. Studies on the Nakdong River in South Korea, Grand River in the USA, and Min River in China also revealed that EC and TN were the most influential parameters in VF1 in terms of agricultural areas through PCA/FA [70][71][72]. Therefore, VF1 in the LP region can be considered a nonpoint source due to agricultural activities. VF2 in the LP region accounted for 21.49% of the total variance with strong positive loadings (0.90 and 0.87, respectively) for TP and SS, and with medium positive loading (0.71) for COD. This factor can be regarded as the accumulated organic pollutants from the discharge of untreated wastewater, including domestic and industrial wastewater [61]. VF3 had a strong negative loading (−0.92) for water temperature, whereas it accounted for 15.70% of the total variance with strong positive loading (0.88) for DO. WT and DO are commonly considered as factors of seasonal variation [65]. VF4 accounted for 14.19% of the total variance with a strong negative loading (−0.82) for pH, and with a medium positive loading (0.62) for PO 4 -P. This can be regarded as the physicochemical variability of the river [72].
VF1 accounted for 29.40% of the total variance in the HP region and provided strong positive loading (i.e., 0.95, 0.83, and 0.80, respectively) for COD, EC, and BOD and a medium positive loading (0.54) for pH. VF1 of HP region was associated with various organic matter sources of industrial wastewater, and municipal sewage [66]. According to Choi et al. [67], Shincheon Stream and Moonsan Stream in the HP region showed the high industrial wastewater discharge density and the discharge of a large amount of industrial wastewater into rivers and ditches in the vicinity without proper treatment, which indicates that the streams have been greatly influenced by urbanization and industrialization. The SC-A site, where there were impacts of effluent from textile factories and basic treatment facilities, indicated the highest mean concentrations of variables such as EC, BOD, COD, TN, TP, NH 3 -N, and NO 3 -N. The mean NH 3 -N concentration, a toxicity indicator of river odor and degraded health of aquatic organisms [67], was 1.32 mg/L at site SC-A, which was less than the US EPA standard of 1.9 mg/L; however, it requires strict management. Therefore, VF1 in the HP region can be interpreted as being affected by point pollution sources from municipal and industrial sewage discharges. VF2 accounted for 24.13% of the total variance, and showed strong positive loadings (0.86 and 0.75, respectively) for TP and PO 4 -P, a medium loading (0.63) for SS, and negative loadings (−0.64 and −0.49, respectively) for DO and NO 3 -N. Since SS is partially related to surface runoff and phosphorus flow into rivers with surface runoff [65], VF2 in the HP region can be attributed to municipal sewage triggered by surface runoff. VF3 accounted for 24.3% of the total variance and indicated a strong negative loading (−0.82) for WT, a positive loading (0.78) for NH 3 -N, and a medium positive loading (0.65) for TN. WT is related to seasonal effects, and nitrogen fertilizers may be a major source from agricultural activities [73]. As a result, it can be considered that VF2 in the HP region was affected by nonpoint source pollution, as nitrogen fertilizers have been shown to contribute to river loadings via agricultural activities [32].
After identifying potential pollution sources using PCA/FA, MLR analysis was performed on VFs in LP and HP regions. The R 2 values in Table 4 range between 0.883 and 0.971, which is closer to 1; the prediction of the corresponding regression equation can be considered valid. Of the information of the dependent variable VFs, 88.3-97.1% can be considered as the variation in the independent variable. Furthermore, as the standard error of the estimate was <0.5, it can be determined as a valid regression equation. As a result of one-way ANOVA, as the probability value (p-value) of the F statistics was less than the significance level of 0.001, the corresponding regression equation can be considered statistically significant.

Pollution Source Apportionment Using APCS-MLR Model
We analyzed the APCS-MLR model to quantify the contribution of each pollution source to 12 water quality parameters in the LP and HP regions ( Figure 6). MLR analysis for the LP and HP regions showed a good model with consistency between the measured and predicted values >0.6, except for the relatively lower determination coefficient (R 2 = 0.396) for SS in the HP region. According to Gholizadeh et al. [30], in general, when the coefficient of determination (R 2 ) is >0.5, there is a good consistency between observed and predicted values, and a reliable assessment of pollution source apportionment can be made. Figure 7 indicates the mean contributions of pollution sources using the APCS-MLR model in the LP and HP regions.
Source 1 (S1, 26.5%) in the LP region was greatly affected by nonpoint source pollution, mainly from agricultural activities. Nonpoint source pollution accounted for 26.5% of the total pollution sources and showed high contribution ratios in the nitrogen series (NH 3 -N, 34.1%; TN, 30.3%; NO 3 -N, 27.6%), EC (73.0%), and BOD (30.8%) water quality parameters. Furthermore, organic pollution (S2) from domestic sewage and industrial wastewater accounted for 24.9% of the entire pollution sources and showed high contribution ratios in SS (46.2%), TP (39.3%), PO 4 -P (35.2%), and COD (27.0%) water quality parameters. Physicochemical effects (S3) accounted for 21.8% of the total pollution sources, and mainly showed the contribution ratio of 23.5%, mainly in pH. Seasonal effects on diverse water quality parameters (S4, 20.4%) showed contribution ratios in WT and DO of 26.2% and 24.6%, respectively. Through the APCS-MLR model, unidentified source (UIS) estimated water quality parameters contributing to river water pollution ranging from 0.1% (TP) to 17.1% (WT). According to Zhang et al. [74], since the estimated contributions of UIS consist of a mixture of pollutants, and originate from complex sources, the identification of sources via the APCS-MLR model could be problematic. Therefore, the contributions of pollution sources in the LP area were determined as being in the following descending order: Nonpoint source pollution from agricultural activities (S1, 26.5%), point pollution sources from domestic sewage and industrial wastewater (S2, 24.9%), physicochemical effects (S4, 21.8%), seasonal effects (S3, 20.4%), and UIS (6.4%). Most water quality parameters in the HP region were significantly affected by point pollution sources from domestic sewage and industrial wastewater (S1, 31.4%). This was shown by the high contribution ratios of organic pollution sources (BOD, 39.1%; COD, 37.1%) and physicochemical effects (EC, 45.4%; pH, 30.9%) to river water quality. Point source pollution from municipal sewage (S2) accounted for 29.1% of the total pollution sources, and phosphorus-based substances were present (TP, 40.9%; PO4-P, 43.9%). The impact of nonpoint source pollution from agricultural activities (S3) accounted for 27.9% of the total pollution sources in TN (32.8%) and NH3-N (45.7%); seasonal effects had the greatest impact on WT (31.1%). As shown in Figure 4, UIS in the HP region estimated water quality parameters contributing to river water pollution ranging between 0.04% (PO4-P) and 39.6% (EC). Therefore, the pollution sources in the HP region descended in the following order: Point source from domestic sewage and industrial wastewater (S1, 31.4%), point source from urban sewage (S2, 29.1%), nonpoint source pollution from agricultural activities or seasonal effects (S3, 27.9%), and UIS (11.5%) (Figure 7). Therefore, it was demonstrated that the APCS-MLR model was able to quantify the source contribution to physicochemical water quality parameters in the LP and HP regions.  Most water quality parameters in the HP region were significantly affected by point pollution sources from domestic sewage and industrial wastewater (S1, 31.4%). This was shown by the high contribution ratios of organic pollution sources (BOD, 39.1%; COD, 37.1%) and physicochemical effects (EC, 45.4%; pH, 30.9%) to river water quality. Point source pollution from municipal sewage (S2) accounted for 29.1% of the total pollution sources, and phosphorus-based substances were present (TP, 40.9%; PO4-P, 43.9%). The impact of nonpoint source pollution from agricultural activities (S3) accounted for 27.9% of the total pollution sources in TN (32.8%) and NH3-N (45.7%); seasonal effects had the greatest impact on WT (31.1%). As shown in Figure 4, UIS in the HP region estimated water quality parameters contributing to river water pollution ranging between 0.04% (PO4-P) and 39.6% (EC). Therefore, the pollution sources in the HP region descended in the following order: Point source from domestic sewage and industrial wastewater (S1, 31.4%), point source from urban sewage (S2, 29.1%), nonpoint source pollution from agricultural activities or seasonal effects (S3, 27.9%), and UIS (11.5%) (Figure 7). Therefore, it was demonstrated that the APCS-MLR model was able to quantify the source contribution to physicochemical water quality parameters in the LP and HP regions.  Most water quality parameters in the HP region were significantly affected by point pollution sources from domestic sewage and industrial wastewater (S1, 31.4%). This was shown by the high contribution ratios of organic pollution sources (BOD, 39.1%; COD, 37.1%) and physicochemical effects (EC, 45.4%; pH, 30.9%) to river water quality. Point source pollution from municipal sewage (S2) accounted for 29.1% of the total pollution sources, and phosphorus-based substances were present (TP, 40.9%; PO 4 -P, 43.9%). The impact of nonpoint source pollution from agricultural activities (S3) accounted for 27.9% of the total pollution sources in TN (32.8%) and NH 3 -N (45.7%); seasonal effects had the greatest impact on WT (31.1%). As shown in Figure 4, UIS in the HP region estimated water quality parameters contributing to river water pollution ranging between 0.04% (PO 4 -P) and 39.6% (EC). Therefore, the pollution sources in the HP region descended in the following order: Point source from domestic sewage and industrial wastewater (S1, 31.4%), point source from urban sewage (S2, 29.1%), nonpoint source pollution from agricultural activities or seasonal effects (S3, 27.9%), and UIS (11.5%) (Figure 7). Therefore, it was demonstrated that the APCS-MLR model was able to quantify the source contribution to physicochemical water quality parameters in the LP and HP regions.

Strengths and Limitations
Several advantages and limitations were identified in this study. First, MST and the APCS-MLR model used in this study were useful tools for identifying hidden relationships and factors regarding large and complex water quality datasets, and for providing environmental information influencing environmental systems. Second, in the field of water environment research, MST and the APCS-MLR model have shown reliability and validity in water quality assessments of the target watershed, and the results are useful for water environment protection and water resource management policies in the future. Third, customized water quality assessment and management of the target watershed required the identification of potential pollution sources, by determining the main factors based on spatial conditions and quantitatively analyzing the contributions of pollution sources. However, we must consider the following limitations in interpreting the study results. First, the Imjin River on the Korean Peninsula is a transboundary river between South and North Korea, and it was hard to access to obtain necessary information for water quality data analysis of the IRW in North Korea due to the MDL and political conditions, given that the entire watershed of the river could not be evaluated. Second, we were not able to evaluate the variability of physicochemical water quality parameters (discharge, fecal coliform, and chlorophyll-a) that were not selected in this study. Third, we could not investigate unidentified pollution sources in the target watershed. Therefore, in the future, a highly reliable water quality assessment of the target watershed should consider the limitations of this study and will require additional research. Finally, there are many acronyms in this study. Therefore, all acronyms and their meanings were summarized in Table S3.

Conclusions
In this study, we combined MSA (such as HCA and PCA/FA) and the APCS-MLR model to identify spatiotemporal variability of water quality in the IRW, distinguish potential pollution sources, and apportion the pollution sources. To assess the water quality of the IRW, we selected seven monitoring sites (IJ-A, HT-A, YP-A, SC-A, HT-B, MS-A, and IJ-B) and analyzed 12 water quality parameters (WT, pH, EC, DO, BOD, COD, SS, TN, NH 3 -N, NO 3 -N, TP, and PO 4 -P). At the seven water quality monitoring sites, while reflecting the impacts of anthropogenic activities and different land use types, significant changes in physicochemical water quality parameters were observed. In particular, at the SC-A site, parameters such as WT, EC, BOD, COD, TN, NH 3 -N, and TP indicated high concentrations; this can be interpreted as meaning that the site was considerably impacted by textile and leather factories around the watershed. Spatial HCA classified the seven monitoring sites into two clusters, corresponding to LP and HP regions, in which there was a statistically significant correlation (p < 0.01) between nitrogenous (TN and NO 3 -N, r = 0.93) and organic matter (BOD and COD, r = 0.79) variables, respectively. The PCA/FA results showed that LP and HP regions were significantly affected by different pollution factors. The LP region was mainly affected by nonpoint source pollution, mainly from agricultural activities; whereas, in the HP region, a large amount of industrial wastewater and domestic sewage flowed into the river, and organic matters and nutrient salts accumulated in the surface water. The APCS-MLR model revealed that nonpoint source pollution from agricultural activities (26.5%) was the crucial pollution source in the LP region, whereas point sources from domestic sewage and industrial wastewater (31.4%) were the main contributors in the HP region.
This study showed that MST and the APCS-MLR model are useful tools for understanding spatiotemporal water quality changes and identifying and apportioning potential pollution sources, for efficient water quality management in the IRW. Currently, TPLMS has been implemented in seven watersheds in the IRW to maintain target water quality (BOD and TP); however, if appropriate water quality management policies are not enacted for the watershed, water quality will inevitably deteriorate in the future. Therefore, the priorities in the LP region should be as follows: (1) reduction of fertilizer in rural areas, control of pesticide uses, and optimization for livestock breeding management; (2) reduction of nonpoint source pollution, i.e., pilot projects should be implemented for rainwater runoff treatment facilities and rainwater pumping station reservoirs. In the HP region, there should be (1) reinforcement of effluent operation standards of wastewater treatment facilities, and application of advanced wastewater reduction treatment technology; (2) TPLMS implementation reviews for tributaries, in order to control pollutant discharges to tributaries; (3) improvement of individual treatment capacities of domestic sewage. This study can assist water quality managers and policymakers to comprehensively understand the main pollution sources in terms of spatial conditions, and to determine the priorities for water quality improvement regarding water pollution control and sustainable development in Imjin River, South Korea.

Supplementary Materials:
The following supporting information can be downloaded at https: //www.mdpi.com/article/10.3390/w14050793/s1: Table S1: Factors and calculations considered for the APCS-MLR model, Table S2: Mean and standard deviation (SD) of water quality parameters during different months in the Imjin River Watershed, South Korea, Table S3: Summary of all acronyms in this study.