Spatio-Temporal Characterization Analysis and Water Quality Assessment of the South-to-North Water Diversion Project of China

In this article, a data matrix of 20 indicators (6960 observations) was obtained from 29 water quality monitoring stations of the Middle Route (MR) of the South-to-North Water Diversion Project of China (SNWDPC). Multivariate statistical techniques, including analysis of variance (ANOVA), correlation analysis (CA), and principal component analysis (PCA), were applied to identify the interrelationships between the indicators and the main anthropogenic and natural sources affecting water quality. The water quality index (WQI) was used to assess the classification and variation of water quality. The distributions of the indicators revealed that six heavy-metal indicators, arsenic (As), mercury (Hg), cadmium (Cd), chromium (Cr), selenium (Se), and lead (Pb), were within the Class I standard, while As, Pb, and Cd displayed spatial variation. Moreover, some physicochemical indicators, such as dissolved oxygen, 5-day biochemical oxygen demand (BOD5), and total phosphorus (TP), showed spatio-temporal variability. The correlation analysis demonstrated that As, Hg, Cd, Cr, Se, Pb, copper (Cu), and zinc (Zn) had high correlation coefficients. The PCA extracted three principal components (PCs) accounting for 82.67% of the total variance: the first PC was indicative of mixed anthropogenic and natural sources, while the second and third PCs were mainly controlled by human activities and natural sources, respectively. The WQI results showed excellent water quality in the MR of the SNWDPC, with station values ranging from 10.49 to 17.93. Hg was the key indicator behind the WQI > 20 recorded at six stations, which indicates that Hg may be the main potential threat to water quality and human health in this project.
The results suggest that special attention should be paid to the monitoring of Hg, and that investigation and supervision should be strengthened in areas of high-density human activity to control the impacts of urban and industrial production and risk sources on water quality.


Introduction
With rapid economic development and population growth, water quality deterioration has become a crucial global issue that can lead to serious public health hazards, biodiversity destruction, and aquatic environmental problems [1]. Much long-term research on different water quality problems, such as industrial wastewater discharge [2], heavy metal pollution [3], and land management influences [4], has been conducted from regional to global scales [5]. The spatial-temporal variations and trends of water quality can reflect the geographical differences in

Data Collection and Preparation
Because the safety of the water quality of the MR of the SNWDPC is closely tied to important economic and political issues in China, the government has never allowed a third-party research team to carry out water quality sampling and testing, and the water sampling campaigns have only ever been conducted by the South-to-North Water Diversion Project Construction Committee Office of the State Council of China. In compliance with confidentiality agreements and with the government's permission, the data were obtained by the research team of the State Key Laboratory of Water Resources and Hydropower Engineering Science, Wuhan University. The investigation was supported by the Middle Route Construction Management Bureau of the South-to-North Water Diversion Project. Twenty indicators that are set as the basic items in the Environmental Quality Standards for Surface Water, China (No. GB3838-2002) were obtained monthly from 29 water quality monitoring stations from December 2015 to November 2016, giving 6,960 observations in total. All of the water samples were collected under sunny or cloudy weather conditions to minimize the effects of rainfall or other extreme weather. These indicators included mercury (Hg), lead (Pb), arsenic (As), cadmium (Cd), selenium (Se), chromium (Cr), 5-day biochemical oxygen demand (BOD5), pH, dissolved oxygen (DO), permanganate index (PI), ammonia nitrogen (NH3-N), fecal coliform (FC), total phosphorus (TP), total nitrogen (TN), copper (Cu), zinc (Zn), petroleum, sulfate (SO₄²⁻), fluoride (F⁻), and water temperature (WT). Some indicators, such as water temperature, pH, and dissolved oxygen, were measured in situ by using a multi-parameter probe with a Hydrolab Datasonde 5 sensor. All of the water samples were kept in cold storage and directly transferred to the laboratory at each station using a cooler filled with ice.
The concentrations of Hg, Pb, As, Cd, Se, Cr, Cu, and Zn were measured using the inductively coupled plasma atomic emission spectrometry method (ICP-AES) with an Agilent 7900 inductively coupled plasma-mass spectrometry (ICP-MS; Agilent Technologies Inc.; Santa Clara, Calif., USA). BOD5 was measured as the change in oxygen concentration between the start and the fifth day of incubation in bottles at 20 ± 1 °C, assayed by the Winkler method. The permanganate index was determined by titrimetric analysis (potassium permanganate oxidation) using a titration assembly. The contents of TP and TN were measured using a combined potassium persulfate oxidation method. The concentrations of NH3-N were analyzed spectrophotometrically using the nesslerization method with a UV spectrophotometer (UV 2450). The petroleum concentrations were determined by infrared spectrophotometry. The concentrations of SO₄²⁻ and F⁻ were determined using an ion chromatography system.

The contents of fecal coliform were determined by the filter membrane method.
The four seasons in this study were defined as spring (March to May), summer (June to August), autumn (September to November), and winter (December, January, and February). Some indicator values were not recorded as specific concentrations but as "<a" or "≤a". To distinguish these two record types and to facilitate calculation and analysis, a value recorded as "<a" or as not detected was treated as a concentration of 0 for that station and sampling time, and a value recorded as "≤a" was treated as a concentration of a.
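The season definition and the substitution rule for records without a specific concentration can be sketched as follows. This is a minimal illustration, not the authors' processing script: the function names, the string encodings of the raw records (including "ND" for "not detected"), and the sample values are assumptions made for the example.

```python
from typing import Optional

# Month-to-season mapping as defined in the text.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def season_of(month: int) -> str:
    """Return the season name for a calendar month (1-12)."""
    return SEASON_BY_MONTH[month]


def parse_record(raw: Optional[str]) -> float:
    """Convert one recorded value to a number under the rule in the text.

    "<a" or not detected -> 0; "≤a" -> a; otherwise the recorded value.
    The accepted spellings ("ND", "<=") are hypothetical encodings.
    """
    if raw is None:
        return 0.0
    text = raw.strip()
    if text == "" or text.upper() == "ND":
        return 0.0
    # Check the "less than or equal" forms before the plain "<" form.
    if text.startswith("<=") or text.startswith("≤"):
        return float(text.lstrip("<=≤").strip())
    if text.startswith("<"):
        return 0.0
    return float(text)


if __name__ == "__main__":
    for raw in ["0.0156", "<0.004", "≤0.004", "ND"]:
        print(raw, "->", parse_record(raw))
    print(season_of(12))
```

Note that replacing "<a" with 0 lowers the computed means relative to other common conventions (such as substituting a/2), so results derived from heavily censored indicators should be read with that in mind.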

Multivariate Statistical Methods
One-way analysis of variance (one-way ANOVA) was used to test the significance of seasonal differences in the water quality indicators across stations (P < 0.05, least significant difference) [34]. The Pearson correlation coefficient was applied to study and interpret the relationships and interactions among the different indicators. A coefficient close to 1 or −1 indicates a strong positive or negative linear correlation between two variables, while a value close to 0 indicates no linear relationship; significance was assessed at the P < 0.05 and P < 0.01 levels. Because the dataset contains many indicators and a large amount of raw data, and the indicators are often related to one another to different degrees, the information they carry tends to overlap. Principal component analysis (PCA) is a multivariate statistical technique that reduces the dimension of the raw data by eliminating overlapping information through projection while minimizing the loss of original information [3,35,36]; it is also used to study the underlying structure of the dataset and to further understand the relationships among the indicators and the sources that influence them [37]. In this study, principal components with an eigenvalue > 1 were retained [38]. The raw data were mainly processed using SPSS 23.0 for Windows. The KMO (Kaiser-Meyer-Olkin) test and Bartlett's test of sphericity were used to check the suitability of the data for PCA. The KMO index compares the magnitudes of the correlations between variables with those of the partial correlations; the closer the KMO index is to 1, the more suitable the variables are for principal component analysis. Bartlett's test of sphericity tests the null hypothesis that the variables are unrelated, i.e., that the correlation matrix is an identity matrix.
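The retention rule (eigenvalue > 1) and the KMO index described above can be illustrated with a short NumPy sketch. The study itself used SPSS 23.0; the code below is only an assumed equivalent that works on the correlation matrix, and the function names and synthetic data are hypothetical.

```python
import numpy as np


def pca_kaiser(data: np.ndarray):
    """PCA on the correlation matrix, keeping components with eigenvalue > 1.

    data has shape (n_samples, n_indicators). Returns the retained
    eigenvalues, their share of total variance, and the loadings.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]            # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                         # retention rule from the text
    explained = eigvals / eigvals.sum()
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals[keep], explained[keep], loadings


def kmo_index(data: np.ndarray) -> float:
    """Overall KMO: squared correlations vs. squared partial correlations."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                       # partial correlation matrix
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = (corr[off_diag] ** 2).sum()
    p2 = (partial[off_diag] ** 2).sum()
    return float(r2 / (r2 + p2))


if __name__ == "__main__":
    # Synthetic example: four indicators driven by two independent factors.
    rng = np.random.default_rng(0)
    f1, f2 = rng.normal(size=200), rng.normal(size=200)
    noise = 0.1 * rng.normal(size=(200, 4))
    demo = np.column_stack([f1, f1, f2, f2]) + noise
    vals, share, load = pca_kaiser(demo)
    print("retained eigenvalues:", vals)
    print("cumulative variance explained:", share.sum())
    print("KMO:", kmo_index(demo))
```

Working on the correlation matrix rather than the covariance matrix puts indicators with different units (µg/L, mg/L, °C) on a common scale, which is why an eigenvalue of 1 is a meaningful cutoff: it equals the variance contributed by a single standardized indicator.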

Water Quality Index
The WQI method has been widely used to evaluate and monitor water quality in environmental studies. The calculation of the WQI has been improved and modified by numerous scholars using different mathematical methods for different situations and variables. The WQI in this study was based on the recommendation of the Water Environment Monitoring Research Center of the Ministry of Water Resources, China. The classification and the threshold of each grade of the water quality indicators were evaluated in line with the Environmental Quality Standards for Surface Water, China (No. GB3838-2002). The standards divide each water quality item into Class I to V, representing excellent (I), good (II), medium (III), poor (IV), or very poor (V) water quality (Table 2) [39,40,41]. The 20 water quality indicators were classified into three types: the Toxic Metals Group, the Easily Treated Indicators Group, and the Other Indicators Group. The Toxic Metals Group in this article included As, Hg, Pb, Cd, Cr, and Se, which had low concentrations in the water of this project. However, toxic metals are characterized by toxicity, persistence, and bioaccumulation; once they exceed a certain standard, they seriously threaten human life and health, cause water pollution, and are difficult to treat and purify.
(1) The WQI values of each water quality indicator (I_i) were calculated based on Equation (1). In Equation (1), C_i is the measured concentration of the i-th water quality indicator, C_{i,k} and C_{i,k+n} are the standard concentrations of grade k and grade k + n of the i-th water quality indicator, respectively, I_{i,k} is the index value of grade k of the assessed item, and n is the number of grades sharing the same standard threshold (n = 1 when no standard values are equal).
For pH, when pH ∈ [6,9], I_i = 0; otherwise I_i = 100. For undetected indicators, I_i = 0. For the indicator SO₄²⁻, which has only one standard value, the WQI was calculated based on Equation (2).
(2) The grouped WQI values (CI) were calculated for the three groups, where CI(1) for the Toxic Metals Group was based on Equation (3), while CI(2) and CI(3) for the Easily Treated Indicators Group and the Other Indicators Group, respectively, were based on Equation (4).
In Equation (4), n is the total number of indicators and P_i is the weight of the i-th water quality indicator, which represents the importance of the i-th indicator for the water use of aquatic life and humans; the relative weights were selected according to previous studies [42,43].
(3) The final WQI for a station at a specific time was calculated based on Equation (5).
The water quality classifications were made based on the WQI values. Wu evaluated the water quality in Lake Poyang, China, and the water quality was classified from Class I to V, which corresponded to excellent, good, moderate, poor, and bad water states, respectively [40]. Wu discussed the key water quality indicators in Lake Taihu Basin, China, by using the minimum WQI method, which was developed based on a stepwise linear regression analysis, and the water quality was classified into five grades based on the WQI scores [41]. Many international water bodies have also been assessed for water quality by comprehensive multivariate statistical methods [44]. Considering the classification methods of various water quality grades, the actual situation of the Chinese water quality assessment standards, and the water quality management of this project, the classification of WQI values in this study was divided into five grades from 0 to 100, corresponding to water quality levels from excellent to very poor, as seen in Tables 2 and 3.
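The per-indicator step can be illustrated with a small sketch. The body of Equation (1) is not reproduced in this text, so the code assumes a linear interpolation between the thresholds of grade k and grade k + n, with each grade spanning 20 index points (five grades over 0 to 100). That form, the function names, and the Class II threshold used in the example are assumptions; only the pH rule and the undetected rule are taken directly from the text.

```python
from typing import Optional

# Assumption: five grades share the 0-100 scale equally, 20 points per grade.
POINTS_PER_GRADE = 20.0


def sub_index(c_i: Optional[float], c_k: float, c_kn: float,
              i_k: float, n: int = 1) -> float:
    """Assumed linear-interpolation form of the indicator index I_i.

    c_i  : measured concentration (None or 0 means undetected)
    c_k  : standard concentration of grade k (lower bound of the interval)
    c_kn : standard concentration of grade k + n (upper bound)
    i_k  : index value at grade k
    n    : number of grades sharing the same threshold (1 if none are equal)
    """
    if c_i is None or c_i <= 0:
        return 0.0  # rule from the text: undetected indicators score 0
    return i_k + (c_i - c_k) / (c_kn - c_k) * POINTS_PER_GRADE * n


def ph_index(ph: float) -> float:
    """pH rule from the text: 0 inside [6, 9], otherwise 100."""
    return 0.0 if 6.0 <= ph <= 9.0 else 100.0


if __name__ == "__main__":
    # TP example: 0.02 mg/L is the Class I threshold quoted in the text;
    # 0.1 mg/L as the Class II threshold is an illustrative value.
    print(sub_index(0.01, 0.0, 0.02, 0.0))    # halfway through Class I
    print(sub_index(0.03, 0.02, 0.1, 20.0))   # just inside Class II
    print(ph_index(8.32), ph_index(9.5))
```

Under this assumed form, an indicator exactly at its Class I threshold scores 20, which is consistent with the abstract treating WQI > 20 as the point where a station leaves the excellent range.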

Toxic Metals
The concentrations of the six toxic metal indicators at the 29 stations are shown in Figure 2 and Table 4. All toxic metal indicators met Class I of the standard (No. GB3838-2002) across the four seasons. Because most of the actual concentrations of the six toxic metals in the MR of the SNWDPC were very low, some values were undetected. Hg and Pb at the 29 stations exhibited significant differences among the four seasons (one-way ANOVA, P < 0.05), while As, Cd, Cr, and Se did not (one-way ANOVA, P > 0.05).
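The seasonal comparison reported above rests on the one-way ANOVA F statistic, which can be sketched in plain Python. This is an illustration only: the function name and the sample values are hypothetical, the least significant difference post hoc step is not covered, and turning F into a P value requires the F distribution (for example via `scipy.stats.f_oneway`, which is not used here so that the sketch needs only the standard library).

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Each element of groups holds the observations of one season.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    # Variation of the seasonal means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own seasonal mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical concentrations of one indicator, grouped by season.
    seasons: List[List[float]] = [
        [0.018, 0.020, 0.017],  # spring
        [0.012, 0.011, 0.013],  # summer
        [0.015, 0.016, 0.014],  # autumn
        [0.017, 0.018, 0.016],  # winter
    ]
    print(one_way_anova_f(seasons))
```

A large F means the seasonal means differ by more than the scatter within each season would explain, which is the sense in which Hg and Pb are described as seasonally variable.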
The annual mean concentration of Hg was 0.0156 µg/L, and the seasonal concentrations ranged from 0.0120 µg/L (summer) to 0.0182 µg/L (spring). The maximum detected Hg appeared at two adjacent stations, XNE and HW (spring, 0.033 µg/L), which suggests external Hg inputs near these stations during spring. The Hg concentrations fluctuated from the southern to northern stations in all four seasons, with no obvious spatial pattern (Figure 2a). The maximum annual mean concentration of Hg was at the XNE station (0.0225 µg/L), and the Hg contents at all 29 stations were lower than the Class I threshold (0.05 µg/L). Compared with the concentrations of Hg in the Danjiangkou Reservoir, the Hg contents at some stations were distinctly different from that at the headwater TC station (Table 4), which reflects changes in composition due to pronounced anthropogenic inputs. The results show that Hg in the MR of the SNWDPC did not reach polluting levels; however, the monitoring and control of risk sources around stations with abnormally high measured concentrations should be enhanced.
The seasonal concentrations of Pb ranged from 0.2469 µg/L (winter) to 0.6338 µg/L (summer), while the annual mean concentration was 0.4304 µg/L. Because Pb sources are easily affected by human activities, and industrial production and recreational activities were more frequent in summer than in winter, higher Pb contents can be expected in the MR of the SNWDPC in summer. The sixteen stations from the starting point TC to ZN showed some fluctuation in Pb and had higher concentrations than the other 13 northern stations, from NC to HN (Figure 2b). The maximum detected Pb was at the LN and CQ stations (summer, 2.66 µg/L), while the highest annual mean concentration was 1.128 µg/L at the starting point TC station. Some studies have found that the Pb content in the Danjiangkou Reservoir was higher than that of other heavy metal ions, which may explain the higher annual mean concentration of Pb at the TC station than at the other stations in this study [45]. Considering that the LN and CQ stations are not geographically adjacent and are 177.8 kilometers apart (Figure 1), and that the Pb concentrations at these two stations in other seasons were quite low, the summer maximum of Pb likely came from external inputs. Because the CQ station is downstream of the ZW station, which has a population of 9.88 million, the maximum detected Pb concentration at the CQ station plausibly reflected anthropogenic inputs between these two stations.
The annual mean concentration of As was 0.86 µg/L, and the seasonal concentrations ranged from 0.744 µg/L (autumn) to 1.10 µg/L (spring). The maximum detected concentration (spring, 3.033 µg/L) and the maximum annual mean concentration (1.567 µg/L) of As were both at the XF station. The distribution of As displayed a spatial trend that decreased from the southern to northern stations across the four seasons. The 16 stations from the starting point TC to ZN showed some fluctuation in As, while the other 13 stations, from NC to HN, showed significantly lower As contents than the southern stations (Figure 2c).
Seasonal concentrations of Cd and Se were in the range of 0.0283 µg/L (winter) to 0.0303 µg/L (summer), and 0.299 µg/L (winter) to 0.310 µg/L (spring), respectively. The annual mean concentrations of Cd and Se were 0.0297 µg/L and 0.3025 µg/L, respectively. The maximum measured concentration of Cd was at the HW station (spring, 0.0833 µg/L), while the maximum concentration of Se was at the HN station (summer, 0.57 µg/L). The spatial variation of Cd was similar to that of As and Pb, with the 16 stations from the starting point TC to ZN having higher concentrations than the 13 northern stations from NC to HN (Figure 2d). Se showed no obvious spatial pattern in autumn and winter; in spring and summer, however, the Se at the 13 stations from NC to HN displayed a fluctuating upward trend (Figure 2e).

The concentrations of Cr at all stations were very stable, with no spatio-temporal variation across the four seasons (Figure 2f). Many Cr values were undetected because of the low concentrations. According to the above-mentioned data processing rule, the annual mean concentration of Cr was 4.01 µg/L, which was lower than its Class I threshold (10 µg/L). Although these toxic metals are harmful to human health once they exceed certain standards [24], each metal was well below its Class I threshold in our analysis, indicating relatively high water quality compared with some international rivers. The results showed that there was no toxic-metal pollution in this project, but administrative departments should strengthen the control of risk sources at stations with abnormally high concentrations of some indicators.

Easily Treated Indicators
The concentrations of six easily treated indicators are shown in Figure 3. In this group, five indicators, namely pH, dissolved oxygen, permanganate index, NH3-N, and fecal coliform, exhibited significant differences across the four seasons (one-way ANOVA, P < 0.05), while BOD5 did not (one-way ANOVA, P > 0.05). The annual mean concentration of BOD5 was 1.56 mg/L, and the seasonal concentrations ranged from 1.47 mg/L (autumn) to 1.72 mg/L (winter). The maximum detected concentration (summer, 2.57 mg/L) and the maximum annual mean concentration (2.375 mg/L) were both at the TC station. The BOD5 concentrations at the 16 stations from the starting point TC to ZN were higher than those at the 13 northern stations (from NC to HN) across the four seasons (Figure 3a). Since BOD5 can be used as an effective indicator of the organic matter content in water [52], its spatial distribution showed that the organic materials in this project could be reduced by self-purification during high-flow water delivery through a long-distance open channel.
The seasonal mean pH ranged from 8.12 (winter) to 8.32 (summer), indicating that the water in the MR of the SNWDPC was weakly alkaline and within the guideline of 6.0-9.0 recommended by the standard (No. GB3838-2002). The maximum detected pH was at the SZ station (summer, 8.6). The pH did not display obvious spatial variation from the southern to northern stations across the four seasons (Figure 3b). However, four consecutive stations (from TC to FC) had a pH lower than 8 in autumn, as did five consecutive stations (from ZFN to ZN) in winter.

The annual average concentration of dissolved oxygen was 8.8 mg/L, which was in line with Class I of the standard. The lowest and highest seasonal concentrations of DO were in summer and winter, at 8.19 mg/L and 9.88 mg/L, respectively. The maximum annual average concentration (10.92 mg/L) and the maximum detected concentration (winter, 13.17 mg/L) of DO were both at the BZ station. DO presented an obvious fluctuating upward trend from the southern to northern stations across the four seasons (Figure 3c). The dissolved oxygen at all stations met the Class I standard in spring; however, DO concentrations at Class II were detected at 11, one, and three stations in autumn, winter, and summer, respectively. Three consecutive stations (from TC to CG) had an annual average DO content at Class II, with concentrations <7.5 mg/L. The high concentrations of DO and the spatial variation of this indicator in the canal can be explained by two reasons: first, the water quality of the water source was excellent and the content of oxygen-consuming organic matter in the water body was quite low; second, the high-flow water delivered through a long-distance open channel was in full contact with the air, so reoxygenation can be faster than in other natural water bodies, such as lakes and rivers [53].
Dissolved oxygen was generally negatively correlated with water temperature, i.e., high temperature and abundant sunshine are likely to cause lower concentrations of dissolved oxygen in summer than in winter [54]. Since dissolved oxygen is an important indicator for maintaining biological survival in water and for classifying water quality [55], the interactions between dissolved oxygen and other water quality factors in this project need further study. In addition, dissolved oxygen concentrations increased abnormally at some stations in some seasons, so algae density monitoring deserves attention.
The permanganate index (PI) is a comprehensive indicator of the degree of organic pollution of surface water [56]. There was no obvious spatial variation of PI from the southern to northern stations (Figure 3d). The annual mean concentration of PI was 1.95 mg/L, and the seasonal concentrations ranged from 1.76 mg/L (winter) to 2.05 mg/L (spring). The seasonal PI concentrations in spring and summer were higher than the Class I threshold (2.00 mg/L), and 19, 19, 15, and two stations had PI concentrations >2.00 mg/L in spring, summer, autumn, and winter, respectively, indicating that the MR of the SNWDPC has a potential risk of organic pollution.
The annual mean concentration of NH3-N was 0.0493 mg/L, ranging from 0.0251 mg/L (autumn) to 0.0636 mg/L (spring). The highest annual average concentration (0.071 mg/L) and the highest detected concentration (summer, 0.133 mg/L) were both at the HW station, suggesting exogenous nitrogen inputs at this station. The NH3-N content in autumn was significantly lower than in other seasons, and NH3-N had no obvious spatial distribution from the southern to northern stations in spring, summer, and winter (Figure 3e).
The annual mean content of fecal coliform was 198 colonies/L, which was very close to the Class I threshold (200 colonies/L), indicating that FC has a high risk of falling into Class II. The maximum annual average (833 colonies/L) and the maximum detected content (3117 colonies/L, summer) were both at the BZ station, while the highest seasonal content was in summer (630 colonies/L) (Figure 3f). Fecal coliform is a water quality indicator that is closely related to human activities. Since the 13 stations from NC to HN pass through the densely populated areas of Hebei Province, Tianjin Municipality, and Beijing, these northern stations are easily affected by intensive human activities, which led to significantly higher contents than those of the southern stations [57]. In addition, water temperature was a key factor in the increase in FC quantity. A low-temperature environment is not conducive to the reproduction of FC, and high temperature in summer was a major cause of the amount of FC being significantly higher than in other seasons.
The annual mean concentration of TP was 0.0195 mg/L, which was very close to the Class I threshold (0.02 mg/L), and the seasonal concentrations ranged from 0.0178 mg/L (winter) to 0.0220 mg/L (summer). The seasonal concentrations in summer and autumn were >0.02 mg/L, and 14, 14, 16, and 15 stations had TP concentrations >0.02 mg/L in spring, summer, autumn, and winter, respectively. The TP concentrations showed a spatial trend that decreased from the southern to northern stations across the four seasons (Figure 4a). Starting from the TC station, 11 consecutive stations had annual mean concentrations >0.02 mg/L, with the maximum at the ZW station (0.034 mg/L). The maximum detected TP was at the CQ station (summer and autumn, 0.04 mg/L). According to previous studies, the TP in the Danjiangkou Reservoir remained higher than 0.02 mg/L during the early operation period of the MR of the SNWDPC [58], which would cause the higher TP concentrations at the beginning of the canal. After long-distance delivery, the TP can be reduced by the self-purification of the project through physical, chemical, and biological processes in the water [59]. As about 80% of the eutrophication in water bodies is restricted by phosphorus [36], and nearly half of the stations in each season had TP concentrations exceeding the Class I threshold, our results imply that, besides the effect of the original water source, exogenous inputs also affected the TP concentrations, which may be closely related to strong human activities and the growth of algae [60]. Hence, the government should pay more attention to monitoring phosphorus sources and algae reproduction, especially in summer and autumn.

The annual mean concentration of TN was 0.86 mg/L, and the seasonal concentrations ranged from 0.804 mg/L (autumn) to 0.898 mg/L (winter). The maximum annual average (1.02 mg/L) and the maximum detected concentration (1.107 mg/L, spring) were both found at the TC station. The seasonal variations of TN and NH3-N were similar: the concentrations gradually decreased from spring to autumn, reached their lowest seasonal values in autumn, and increased significantly in winter (Figure 4b). This result can be attributed to the fact that algae proliferate more easily in summer and autumn under sufficient sunshine and high temperature, and absorb nitrogen in the water as a nutrient source, thus reducing the contents of TN and NH3-N [61]. In winter, when the algae died and underwent microbial decomposition, large amounts of nitrogen were released, which led to the increase of TN and NH3-N. Related studies have shown that nitrogen and phosphorus are the key limiting factors for algal population density and quantity, but the dominant species of phytoplankton differ between water environments, which would cause differences in nitrogen and phosphorus consumption [62]. The concentration ratio of TN to TP was 44.13:1, which indicates that phosphorus was the limiting element for the density and quantity of algae in the MR of the SNWDPC, and the Danjiangkou Reservoir has a high risk of eutrophication due to the rich nitrogen and phosphorus [63].
The annual mean concentrations of Cu and Zn were 1.537 and 1.692 µg/L, respectively, while the seasonal concentrations ranged from 1.185 µg/L (summer) to 2.053 µg/L (spring), and from 1.539 µg/L (spring) to 1.786 µg/L (autumn), respectively. The maximum detected concentrations of Cu and Zn were both at the XS station in autumn, at 3.667 and 3.010 µg/L, respectively. Cu exhibited spatial variation: the concentrations at the 13 northern stations from NC to HN showed a general upward trend and were higher than those at the 16 southern stations from TC to ZN (Figure 4c). The Zn content at the 13 northern stations from NC to HN was relatively stable with no significant seasonal fluctuations, while the 16 southern stations from TC to ZN showed obvious fluctuation across the four seasons (Figure 4d). As the contents of Cu and Zn were far lower than their respective Class I thresholds, and there was no abnormally high concentration point, the MR of the SNWDPC shows no risk of Cu and Zn pollution.
The concentration of petroleum at all stations was very stable; most petroleum concentrations were undetected and showed no spatio-temporal variation across the four seasons (Figure 4e). SO₄²⁻ and F⁻ fluctuated from the southern to northern stations in the four seasons, and their spatial distributions had no obvious regularity (Figures 4f, g). Since SO₄²⁻ and F⁻ can reflect the load of inorganic nutrient pollutants in water to some extent [64], monitoring and research on the sources of inorganic nutrients should focus on stations with abnormally high concentrations, such as ZSE and ZN. The water temperature gradually decreased from the southern to northern stations, which is consistent with the geographic variation of air temperature in China (Figure 4h). The annual average water temperature was 17.55 °C, and the seasonal means ranged from 6.12 °C (winter) to 27.78 °C (summer). Water temperature can affect the contents of DO and BOD5 and the vertical distribution of many inorganic salts [65], and can directly change the nitrogen and phosphorus cycle processes [6,62,63]. The relationships and interactions of water temperature and other water quality indicators need further study.

Correlation Analysis
The Pearson correlation was used to study and understand the relationships and interactions between the water quality indicators in the MR of the SNWDPC. Since the concentrations of Cr and Petroleum were stable and undetected at most stations, their correlations were not calculated and analyzed in this study. The impacts of different hydrological, geochemical, and human activities on the water quality at the different stations could be responsible for strong or weak correlation coefficients. The results are shown in Table 5 and Figure 5.
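As a minimal sketch of the statistic behind Table 5, the Pearson coefficient and its t statistic can be computed as below. The function names and the paired sample values are hypothetical; a P value would additionally require the Student t distribution with n − 2 degrees of freedom, which is left out to keep the example within the standard library.

```python
from math import sqrt
from statistics import fmean
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing r against zero with n paired observations."""
    return r * sqrt((n - 2) / (1 - r * r))


if __name__ == "__main__":
    # Hypothetical paired values: water temperature (°C) and DO (mg/L).
    wt = [6.1, 12.4, 18.0, 23.5, 27.8]
    do = [9.9, 9.2, 8.8, 8.4, 8.2]
    r = pearson_r(wt, do)
    print("r =", round(r, 3), "t =", round(t_statistic(r, len(wt)), 3))
```

Because the t statistic grows with the number of pairs, the same coefficient can be significant in a large sample and not in a small one, which is why Table 5 reports significance levels alongside the coefficients.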

In this study, the PCA method was applied to identify the most contributive natural and anthropogenic sources of the indicators, and to further understand their distribution characteristics. Data reliability for the PCA was calculated by the KMO (Kaiser-Meyer-Olkin) test and Bartlett's test of sphericity. The results are presented in Table 6 and Figure 6.
The results showed that the KMO   inorganic substances [53]. Fecal coliform has been widely used to indicate the extent of fecal   Bold values represent correlation with significance. a Significance at the 0.01 probability level. b Significance at the 0.05 probability level. Table 5 and Figure 5 display the correlation coefficients between each pair of indicators at the 0.01 or 0.05 probability level. There were significant relationships between the five metal indicators, i.e., the As, Pb, Hg, and Cd had positive correlation coefficients of 0.465 to 0.977 at the 0.01 level, and Se had negative correlation coefficients with those four indicators, with values ranging from −0.598 to −0.934 at the 0.01 level, indicating that the contents of As, Pb, Hg, and Cd in the MR of the SNWDPC are mainly from sources affected by human activities, which is basically consistent with previous studies [5,13,24], while the inverse relationship of Se with As, Pb, Hg, and Cd was indicative of the exogenous inputs of natural sources [12]. As there were no statistically significant correlations between pH and other indicators except PI and F − , which could be due to the fact that the pH was maintained in a range of 8.12 to 8.32 throughout the monitoring year, this indicates that the physical, chemical, and biological reactions and the growth process of aquatic organisms in this project had a stable alkaline environment [66], thus the pH did not become a limiting indicator in this study [67]. Since the effects of pH on other indicators cannot be determined by the correlation coefficients, the interactions between pH and planktonic algae or other water quality indicators in the project need further study. The BOD 5 indicates the degree of organic pollution in water body, i.e., the higher the BOD 5 concentrations, the more dissolved oxygen is consumed by microbial metabolism, hence the BOD 5 generally has a strong a negative correlation with dissolved oxygen (−0.829, P < 0.01) [52]. 
High temperature and long sunshine duration in summer readily lead to the escape of dissolved oxygen from water [65], so the DO generally has a negative correlation with water temperature (−0.798, P < 0.01). Moreover, when the dissolved oxygen content was high, the sediment in the water was in an oxidation state suitable for the growth of aerobic bacteria such as nitrobacteria and nitrifying bacteria, which can accelerate the nitrogen and phosphorus cycle processes [62], resulting in the DO having a negative correlation with TN (−0.688, P < 0.01) and TP (−0.870, P < 0.01). In addition, a higher dissolved oxygen content also reflects a stronger self-purification ability of the water, which would accelerate the oxidation of metal ions and make them precipitate with sediment [17]; accordingly, As, Pb, Hg, and Cd generally have negative correlation coefficients with dissolved oxygen and positive ones with BOD5, and a sufficient dissolved oxygen environment will also lead to an increase in fecal coliform [57]. These relationships, as reflected in the correlation coefficients, were basically consistent with previous research and natural phenomena.
The correlation coefficients of TP and TN with other indicators presented good similarity, but the TP had a larger number of statistically significant relationships with other indicators. This result also indicates that the TP was the key limiting indicator characterizing the nutritional status of water quality in the MR of the SNWDPC, which is consistent with our previous analysis [63,68]. The correlation coefficients of Cu and Zn with other indicators were basically consistent with those of Se, which suggests that these indicators predominantly originate from similar sources. Although SO₄²⁻ and F⁻ are important soluble inorganic salts that can reflect the load of inorganic nutrient pollutants in water, they both had weak correlation coefficients with most of the indicators in this case, which is indicative of various factors and multiple sources [64].
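The way a correlation coefficient and its significance flag (a for P < 0.01, b for P < 0.05) can be derived for one pair of indicators is sketched below. This is a minimal illustration only: the monthly values are invented, the function names are hypothetical, and the p-value uses the Fisher z-transform with a normal approximation, which is not necessarily the test applied in this study.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need two sequences of equal length with n >= 4")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value via the Fisher z-transform (normal approximation).

    ASSUMPTION: chosen here because it needs only the standard library;
    the study itself may have used an exact t-based test.
    """
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def significance_flag(p):
    """Mimic the Table 5 footnotes: 'a' for P < 0.01, 'b' for P < 0.05."""
    if p < 0.01:
        return "a"
    if p < 0.05:
        return "b"
    return ""


# Hypothetical monthly means in mg/L, invented for illustration only;
# they are NOT the monitoring data of this study.
do_mg_l = [9.8, 9.5, 9.1, 8.6, 8.0, 7.5, 7.2, 7.4, 7.9, 8.5, 9.0, 9.6]
bod5_mg_l = [1.0, 1.1, 1.3, 1.5, 1.8, 2.0, 2.2, 2.1, 1.9, 1.6, 1.3, 1.1]

r = pearson_r(do_mg_l, bod5_mg_l)
p = approx_p_value(r, len(do_mg_l))
flag = significance_flag(p)
print(f"r = {r:.3f}, p = {p:.2g}, flag = {flag!r}")
```

Applying the same three steps to every pair of indicators would fill a matrix of the kind shown in Table 5.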

Principal Component Analysis
In this study, the PCA method was applied to identify the most contributive natural and anthropogenic sources of the indicators, and to further understand their distribution characteristics. Data reliability for the PCA was assessed by the KMO (Kaiser-Meyer-Olkin) test and Bartlett's test of sphericity. The results are presented in Table 6 and Figure 6.
The results showed that the KMO test value was 0.846 and the χ² of Bartlett's test was 676.95 at the significance level of P < 0.01, which confirmed the suitability of the data for PCA. Three components with eigenvalues > 1 explained about 82.67% of the total variance (Table 6), and eighteen indicators were assembled into three groups. The first principal component (PC), accounting for the most variance (48.66%), had high loadings with absolute values from 0.674 to 0.855 on 13 indicators, including As, Cd, Pb, Se, DO, PI, BOD5, FC, TP, TN, Cu, Zn, and WT, whose loading balls lie very close together in Figure 6. The PI, DO, BOD5, FC, TP, and TN mainly reflected the effects of the organic pollution, nutrients, and bacteria caused by human activities: PI, DO, and BOD5 are indicators that measure the degree of pollution of surface water by organic and reductive inorganic substances [52]; fecal coliform has been widely used to indicate the extent of fecal contamination and has been reported to be strongly positively correlated with pathogenic intestinal bacteria [39]; and TN and TP are the most important indicators of human activities elevating nutrient levels in water bodies [60,61]. As, Cd, Pb, Cu, and Zn are primarily attributable to human activities, such as vehicle exhaust emissions [14], mining engineering [9], and metal production [13], while the content of Se can be influenced by mineral or crustal weathering [16] and pedogenesis [17]; hence the first PC was indicative of the mixed sources of anthropogenic and natural contributions. The second and third PCs explained 22.26% and 11.75% of the total variance, respectively. The second PC was mainly contributed by Hg, NH3-N, and SO₄²⁻, and the third by pH and F⁻, which corresponded to their relatively high loadings on Axes 2 and 3, respectively (Figure 6).
The Hg predominantly came from agrochemicals and industrial waste [10,11,69], while NH3-N and SO₄²⁻ are primarily derived from some soluble inorganic nitrogen and inorganic salts affected by some agricultural practices and mining engineering [6,12]. Therefore, the second PC was assumed to come from mixed exogenous sources of human activities. However, the inverse loading relationships of Hg and SO₄²⁻ with NH3-N were suggestive of the external inputs of Hg and SO₄²⁻. The third PC had high loadings on pH (0.939) and F⁻ (−0.608), which reflects the physical and soluble inorganic salt characteristics of the water quality. Since the F⁻ was principally from some natural processes, such as mineral weathering and karstification, this component was ascribed to natural sources [64]. The PCA results suggest that the impacts of natural and human activities on the water quality in the MR of the SNWDPC are relatively complex.
Although the project has adopted various strict protections against external sources of pollution, the impact of anthropogenic activities on water quality, especially industrial and agricultural activities, cannot be ignored. Therefore, the government should conduct further investigation of, and take control measures against, risk sources in areas of intense anthropogenic activity.
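The retention rule and variance shares reported for the PCA can be cross-checked with a few lines of arithmetic. The sketch below assumes, hypothetically, that the PCA was run on the correlation matrix of 18 standardized indicators, so that the eigenvalues sum to 18; under that assumption each reported percentage implies an eigenvalue. The variable and function names are illustrative only.

```python
# Percent of total variance reported for the three retained components.
reported_pct = {"PC1": 48.66, "PC2": 22.26, "PC3": 11.75}

# ASSUMPTION: PCA on the correlation matrix of 18 standardized indicators,
# so the total variance (sum of all eigenvalues) equals 18.
N_INDICATORS = 18

# Cumulative share explained by the retained components.
cumulative_pct = round(sum(reported_pct.values()), 2)

# Eigenvalue implied by each share: lambda = share * total variance.
implied_eigenvalues = {
    pc: pct / 100.0 * N_INDICATORS for pc, pct in reported_pct.items()
}


def kaiser_retained(eigenvalues, threshold=1.0):
    """Return the components whose eigenvalue exceeds the threshold."""
    return [name for name, value in eigenvalues.items() if value > threshold]


retained = kaiser_retained(implied_eigenvalues)

print(f"cumulative variance explained: {cumulative_pct}%")
for pc, value in implied_eigenvalues.items():
    print(f"{pc}: implied eigenvalue = {value:.3f}")
print("retained under eigenvalue > 1:", retained)
```

Under this assumption the three shares sum to the reported 82.67%, and all three implied eigenvalues exceed 1, in agreement with the retention criterion stated above.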

Water Quality Assessment
The WQI calculation results are presented in Figure 7. The annual mean value of WQI was 13.24, and the values of the 29 stations ranged from 10.49 to 17.93, while the seasonal values from spring to winter were 14.80, 14.68, 12.36, and 13.54, respectively (Figure 7a). This revealed that the water quality in the MR of the SNWDPC was in line with the "Excellent" level (Table 3) and was quite safe and suitable for use as source water and for national protection areas.
However, the maximum calculated WQIs across the four seasons were 26.40, 21.34, 16.13, and 21.33, respectively, and six stations had water quality at the "Good" level, with WQI values > 20: four (SS, SZ, XNE, and HW stations) in spring, one (FC station) in summer, and one (ZSE station) in winter (Figure 7a,b). Spring also had the largest WQI interval, from 8.79 to 26.40. These results indicate that the water quality is at higher risk in spring than in other seasons. The calculation results also showed that the WQIs of these stations were all determined by the Hg indicator, because it had the highest I_i value in the Toxic Metals Group. Although the maximum detected Hg value in 2016 was 0.033 µg/L, which was still lower than the Class I threshold, the water would directly become Class III once the Hg content exceeded 0.05 µg/L (Table 2). More importantly, almost all forms of Hg in water can be converted to methylmercury under appropriate conditions of temperature, pH, and dissolved oxygen; methylmercury is easily transferred and accumulated in the food chain of aquatic systems and can threaten human health, for example by damaging the nervous system and harming the fetus. Therefore, monitoring should pay special attention to the Hg [39]. Our results also suggest that the management departments should strengthen investigation and supervision to control urban and industrial production and risk sources within the areas of high-density human activities along the MR of the SNWDPC, and carry out corresponding treatments to reduce the impacts of potential pollutants carried by rainfall runoff and dust fall on water quality.
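The classification of the seasonal WQI values and the margin of the Hg concentration can be reproduced with the short sketch below. It uses only the figures quoted in this section; the single boundary at 20 between "Excellent" and "Good" is taken from the text, whereas the remaining thresholds of Table 3 are not reproduced here, and the function and variable names are hypothetical.

```python
# Boundary between "Excellent" and "Good" as implied by the text (WQI > 20
# is rated "Good"). ASSUMPTION: a value of exactly 20 counts as "Excellent";
# the upper bound of "Good" in Table 3 is not reproduced in this sketch.
EXCELLENT_MAX = 20.0


def wqi_level(wqi):
    """Rate a WQI value using only the Excellent/Good boundary."""
    return "Excellent" if wqi <= EXCELLENT_MAX else "Good"


# Seasonal mean and maximum WQI values quoted in this section.
seasonal_mean = {"spring": 14.80, "summer": 14.68, "autumn": 12.36, "winter": 13.54}
seasonal_max = {"spring": 26.40, "summer": 21.34, "autumn": 16.13, "winter": 21.33}

mean_levels = {season: wqi_level(v) for season, v in seasonal_mean.items()}
max_levels = {season: wqi_level(v) for season, v in seasonal_max.items()}

# Hg margin: maximum detected value versus the 0.05 ug/L value above which
# the text states the water would become Class III.
HG_MAX_DETECTED_UG_L = 0.033
HG_LIMIT_UG_L = 0.05
hg_fraction_of_limit = HG_MAX_DETECTED_UG_L / HG_LIMIT_UG_L

print("levels of seasonal means:", mean_levels)
print("levels of seasonal maxima:", max_levels)
print(f"max Hg is {hg_fraction_of_limit:.0%} of the 0.05 ug/L limit")
```

The seasonal means all fall in the "Excellent" range, while the maxima of spring, summer, and winter exceed 20, matching the seasons in which the six "Good" stations were observed.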

Conclusions
This article presented 6960 observations of 20 water quality indicators from 29 water quality monitoring stations of the South-to-North Water Diversion Project of China in 2016 to analyze the spatio-temporal characterization of the water quality indicators, identify the main anthropogenic and natural sources affecting the indicators, and make full use of the data to evaluate and understand the water quality classifications and their variation. Our conclusions can be presented as follows:
1. Six toxic metals, including Hg, Pb, As, Cd, Se, and Cr, were shown to have quite low concentrations at the 29 water quality monitoring stations, all in line with the Class I standard. The As, Pb, and Cd presented spatial variation: the 16 southern stations (from TC to ZN) had higher concentrations than the remaining northern stations, indicating that monitoring of these southern stations should be strengthened. Biochemical indicators including the dissolved oxygen, BOD5, permanganate index, ammonia nitrogen, and fecal coliform met the Class I or Class II levels, while the dissolved oxygen and BOD5 showed opposite spatial variability in the Easily Treated Indicators group. The concentration ratio of the TN to TP was 44.13:1 and these two indicators had relatively higher concentrations than the other nutrient factors, which indicated that the project has a potential risk of eutrophication and corresponding treatments should be considered.
2. The one-way analysis of variance results demonstrated that 14 indicators, including the Hg, Pb, pH, dissolved oxygen, permanganate index, NH3-N, fecal coliform, TN, TP, SO₄²⁻, F⁻, Cu, petroleum, and water temperature, exhibited significant differences across the four seasons. Four toxic metal indicators, As, Pb, Hg, and Cd, had pairwise positive correlation coefficients ranging from 0.465 to 0.977 at the P < 0.01 level, while the TP and TN showed similar correlation patterns with the other indicators according to the correlation analysis results.
3. The PCA showed that the first principal component, with 48.66% of the total variance, was controlled by mixed sources of anthropogenic and natural contributions, while the second principal component had high loadings on Hg, NH3-N, and SO₄²⁻, showing that 22.26% of the total variance mainly came from anthropogenic inputs, and the F⁻ was mainly impacted by natural sources.
4. The WQI result revealed that the average WQI value of this project was 13.24, which corresponded to an "Excellent" level of water quality. There were six stations in total with WQI values exceeding 20 across different seasons, corresponding to a rating of "Good". Since the WQI values that exceeded 20 were all determined by the contents of Hg in this study, the result also indicates that the potential risk of industrial and agricultural activities near the stations with abnormal WQI values should be carefully investigated, especially sources of Hg, and the monitoring of water quality should be strengthened.
5. The dominant species and density of algae and their relationship with the water quality indicators of the MR of the SNWDPC require further research, and the interactions of water velocity and flow with the water quality indicators also need more quantitative study in the future to support more efficient water quality management.