Combined Exceedance Probability Assessment of Water Quality Indicators Based on Multivariate Joint Probability Distribution in Urban Rivers

: Discharge and water quality are two important attributes of rivers, although the joint response relationship between discharge and multiple water quality indicators is not clear. In this paper, the joint probability distributions are established by copula functions to reveal the statistical characteristics and occurrence probability of different combinations of discharge and multiple water quality indicators. Based on the data of discharge, ammonia nitrogen content index (NH 4+ ) and permanganate index (COD Mn ) in the Xiaoqing River in Jinan, we ﬁrst tested the joint change-point with the data from 1980–2016, before we focused on analyzing the data after the change-point and established the multivariate joint probability distributions. The results show that the Gaussian copula is more suitable for describing the joint distribution of discharge and water quality, while the year of 2005 is a joint change-point of water quantity and quality. Furthermore, it is more reasonable to use the trivariate joint probability distribution as compared to the bivariate distributions to reﬂect the exceedance probability of water quality combination events under different discharge conditions. The research results can provide technical support for the water quality management of urban rivers.


Introduction
Water quantity and quality are two important attributes of rivers. In recent years, the water quality of rivers is deteriorating with the development of society and economy, with water dispatching having become one of the main technologies of water quality improvement [1][2][3][4]. However, due to the disturbance of urban rivers from human activities, the relationship between water quantity and water quality has become more complicated. The relationship between a single water quality indicator and the water quantity can hardly reflect the real situation of river pollution [5]. Thus, the methods for improvement of the river water quality should be based on a correct understanding of the joint response relationship between water quantity and multiple water quality indicators.
Many scholars have simulated and predicted the river water quality from the aspects of mechanism models and non-mechanism models. The mechanism models are based on the hydrodynamic model and water quality equations in order to simulate the diffusion and attenuation process of pollutants [6][7][8]. However, due to the wide variety of influencing factors and the cognitive limitations of the real water quality, this makes the model construction and parameter determination difficult [9][10][11]. Therefore, this indicates that the simulation effect of the nature river is better than that of the urban river, which is affected by human activities. The non-mechanism model can avoid this type of problems to some extent. Sun et al. [12] used the gray model group method and the exponent smoothing method to construct a probabilistic combination prediction framework of water quality to forecast the individual indicator of chemical oxygen demand in a complex environment. After this, some new models for water quality prediction have been proposed, which integrates different non-mechanism models, such as the neural networks model, fuzzy model, and wavelet model [13][14][15]. In addition, there are also some joint models that combine hydrodynamics and water quality simulation tests [16,17]. However, most non-mechanism models require that the variables obey the same marginal distribution or that these variables need to be transformed to the same distribution, which not only leads to the loss of the original data information, but it also does not conform to the fact that variables, such as discharge and water quality, obey different marginal distributions. Meanwhile, because less emphasis is placed on the joint distribution between the variables, non-mechanism models are more suitable to use a simulation of the discharge and individual water quality indicator when compared to a simulation of the discharge and multiple water quality indicators. Copula can construct the joint probability distribution of multiple variables [18] and it does not require each variable to follow the same marginal distribution. Thus, it can effectively preserve the nature of the original data. Copula has the advantages of great adaptability and flexibility, so it has been widely used in the field of hydrology, especially in determining flood risks [19][20][21][22] and drought characteristics [23][24][25][26][27][28]. However, copula has been used less frequently in the joint analysis of discharge and water quality, with most of the uses focusing on bivariate joint distribution. As such, there is a distinct lack of studies that use copula for a multivariable joint distribution.
Therefore, the purpose of this paper is to use copulas to establish the multivariate joint probability distribution of the discharge and water quality indicators, before analyzing the exceedance probability of multiple water quality indicators under different conditions of discharge.

Establishment of Copula Function
Copula is a connection function, which is defined in R with marginal distribution functions F 1 , F 2 , · · · , F n for n-dimensional random variables X 1 , X 2 , · · · , X n [29]. If H is the n-dimensional distribution function of n-dimensional random variables, for any x i ∈ R(i = 1, 2, · · · , n), there exists the following copula function: where n is the number of variables; C is the associated dependence copula function; and, F i (x i ) is the marginal distribution functions.
Archimedean copulas [30] and Elliptical copulas [31] are commonly used in hydrology. The typical Archimedean copulas include Clayton, Gumbel-Hougaard (G-H), and Frank copula, which require the variables to fall within a symmetric structure. Elliptical copulas are neither limited by the correlation nor restricted by the symmetric structure between variables. However, the multiple hydrological variables may be asymmetric, with independent, positive, or negative relationships occurring among the variables. Thus, it is more reasonable to use the Elliptical copulas to describe the joint distribution of variables when compared to the Archimedean copulas. The Gaussian copula in Elliptical copula is expressed as: where Φ −1 (.) is the inverse function of the standard normal distribution; d is the dimensions of the random variable; and,

Test of Joint Change-Point
The purpose of testing the change-point of the multivariable dependence is to make the data more representative in different periods. At the same time, we can utilize the collected data as much as possible to improve the accuracy of the simulation.
Assuming that there is one change-point in the copula function constructed based on the observed data (x 1 , y 1 , z 1 ), · · · (x n , y n , z n ), the original hypothesis and the alternative hypothesis can be, respectively, assumed as: If the original hypothesis is rejected, then k * is the moment of change. If k * = k, the log likelihood ratio statistic of copula can be established based on the maximum likelihood estimation method, which is shown in Equation (4): where θ k , θ k * , θ n are the maximum likelihood estimations of the corresponding data; and, C uvw (U(x), V(y), W(z)) = ∂ ∂u ∂ ∂v ∂ ∂w C(u, v, w). The statistic Z n = max 1≤k<n (−2 ln(Λ k )) is constructed. If k * is unknown, the original hypothesis will be rejected when Z n is greater than the critical value, which essentially demonstrates that the copula function has change-points [32].

Exceedance Probability of Water Quality
In order to quantitatively analyze the response relationship between the discharge and water quality, according to the full probability formula and the copula functions [33,34], we can calculate the probability that the water quality indicators exceed specific values. However, discharge needs to be less than a specific value, which addresses the exceedance probability of water quality. Taking the joint probability distribution of two and three dimensions as an example, the expressions are as follows: where P U 1 U 2 is the value of the bivariate joint probability distribution; P U 1 U 2 U 3 is the value of the trivariate joint probability distribution; U 1 is the discharge; U 2 and U 3 are the water quality indicators; F U i (u i ) is the marginal distribution function of U 1 ; and, C(u 1 , · · · , u i ) is the copula function of pair (U 1 , · · · , U i ).

Study Area and Data
Xiaoqing River, which is located in Jinan, is a manually excavated city river. Furthermore, it is the main flood discharge channel in Jinan. The Huangtaiqiao hydrologic station is located in the main stream of the river, which has monitored the discharge from the river since 1953 with complete water quality monitoring having begun in 1970. The distance between the Muli Gate to Huangtaiqiao hydrologic station is 21.0 km, with a drainage area of 321.0 km 2 and an average annual discharge of 10.4 m 3 /s. The river basin is located in the main urban area of Jinan. Due to the rapid development of urbanization, the water pollution in this area is serious. In order to build the ecological civilization, there has been water diversion from the Eastern Route of South-to-North Water Transfer Project to improve the water quality. However, the combined effect of urbanization and water diversion result in the response relationship between the water quantity and quality in Xiaoqing River has been severely disturbed, which has seriously affected its water quality simulation. Therefore, the statistical analysis of the occurrence probability of the combined events of water quantity and water quality can provide a basis for the scientific dispatch. The drainage map of Xiaoqing River in Jinan can be seen in Figure 1.
where 1 2 U U P is the value of the bivariate joint probability distribution;

Study Area and Data
Xiaoqing River, which is located in Jinan, is a manually excavated city river. Furthermore, it is the main flood discharge channel in Jinan. The Huangtaiqiao hydrologic station is located in the main stream of the river, which has monitored the discharge from the river since 1953 with complete water quality monitoring having begun in 1970. The distance between the Muli Gate to Huangtaiqiao hydrologic station is 21.0 km, with a drainage area of 321.0 km 2 and an average annual discharge of 10.4 m 3 /s. The river basin is located in the main urban area of Jinan. Due to the rapid development of urbanization, the water pollution in this area is serious. In order to build the ecological civilization, there has been water diversion from the Eastern Route of South-to-North Water Transfer Project to improve the water quality. However, the combined effect of urbanization and water diversion result in the response relationship between the water quantity and quality in Xiaoqing River has been severely disturbed, which has seriously affected its water quality simulation. Therefore, the statistical analysis of the occurrence probability of the combined events of water quantity and water quality can provide a basis for the scientific dispatch. The drainage map of Xiaoqing River in Jinan can be seen in Figure 1. This study selected the data of discharge and water quality of Huangtaiqiao station during 1980-2016, with the data of water quality having been recorded every other month and the discharge having been recorded at the same time when the water quality samples were taken (using the main pollutants, nitrogen content index (NH4 + ), which includes the ionised ammonia nitrogen and the free ammonia nitrogen, and permanganate index (CODMn) as the calculation indicators). To remove some unreasonable data, 161 sets of water quantity and water quality data were applied in This study selected the data of discharge and water quality of Huangtaiqiao station during 1980-2016, with the data of water quality having been recorded every other month and the discharge having been recorded at the same time when the water quality samples were taken (using the main pollutants, nitrogen content index (NH 4 + ), which includes the ionised ammonia nitrogen and the free ammonia nitrogen, and permanganate index (COD Mn ) as the calculation indicators). To remove some unreasonable data, 161 sets of water quantity and water quality data were applied in this paper.
The sample data series of NH 4 + , COD Mn , and discharge of the Huangtai Station from 1980 to 2016 are shown in Figure 2, and the characteristics and trend results of the generalized basic data about the water quality are given in Table 1 has a significant decreasing trend, and the discharge has a significant increasing trend, while the trend of NH 4 + descending was not obvious.
Water 2018, 10, x FOR PEER REVIEW 5 of 14 this paper. The sample data series of NH4 + , CODMn, and discharge of the Huangtai Station from 1980 to 2016 are shown in Figure 2, and the characteristics and trend results of the generalized basic data about the water quality are given in Table 1. The trends of the generalized basic data are tested by Mann-Kendall (M-K). It can be seen that the concentrations of NH4 + and CODMn are basically larger than the grade III value (Environmental Quality Standards for Surface Water (GB3838-2002)). Moreover, the CODMn has a significant decreasing trend, and the discharge has a significant increasing trend, while the trend of NH4 + descending was not obvious.   In order to better reflect the relationship between the discharge and the multiple water quality indicators in the Xiaoqing River Basin, which is a rapidly developing area, we carried out a test for the change-point of the long sequence data , before focusing on the analysis of the data after the change-point.

Marginal Distributions and Correlation Coefficient of Q, NH 4 + and COD Mn
The marginal distributions of the Q, NH 4 + , and COD Mn of Xiaoqing River are established by the lognormal, gamma, and Pearson-III distributions to search for the proper cumulative distribution function (CDF). The proper CDF of the parameters was chosen using two goodness fit tests: the Kolmogorov-Smirov (K-S) test and root mean squared error (RMSE) test. The results of the proper marginal distribution of each variable are shown in Table 2. From Table 2, we can see that the marginal distribution of Q and NH 4 + sequence is more consistent with a Gamma distribution, while COD Mn is more consistent with a lognormal distribution. Note: The number in bold means that RSME is the minimum and the K-S is 0.10172 according to the 5% significance level when n = 161.
The Kendall correlation coefficient τ is used to analyze the correlation of Q, NH 4 + , and COD Mn , respectively, which is shown in Table 3. We can see that the pair of (Q, NH 4 + ) and the pair of (Q, COD Mn ) are negatively correlated, while the pair (NH 4 + , COD Mn ) has a positive correlation.
This provides the conditions for the application of Gaussian Copula. The Frank, GH, and Gaussian copulas are used to construct the bivariate and trivariate joint probability distributions with the data of water quantity and water quality of Xiaoqing River in Jinan that were obtained during 1980-2016. The proper distribution was chosen using K-S and the RMSE tests, with the parameters of copulas and the results of the goodness fit being shown in Table 4. Figure 3 compares the sample empirical distribution with the joint distribution of the pairs of (Q, NH 4 + ), (Q, COD Mn ), and (Q, NH 4 + , COD Mn ). From these figures, we can see that the correlation coefficients R 2 of Gaussian copula are higher. This illustrates that the Gaussian copula is more suitable in establishing both the bivariate joint probability distributions and the trivariate joint probability distribution. Note: ρ 12 is the correlation coefficient of (Q, NH 4 + ); ρ 13 is the correlation coefficient of (Q, COD Mn ); and ρ 23 is the correlation coefficient of (NH 4 + , COD Mn ). The K-S is 0.1626 according to the 5% significance level when n = 70.
Water 2018, 10, x FOR PEER REVIEW 7 of 14 RMSE tests, with the parameters of copulas and the results of the goodness fit being shown in Table 4. Figure 3 compares the sample empirical distribution with the joint distribution of the pairs of (Q, NH4 + ), (Q, CODMn), and (Q, NH4 + , CODMn). From these figures, we can see that the correlation coefficients R 2 of Gaussian copula are higher. This illustrates that the Gaussian copula is more suitable in establishing both the bivariate joint probability distributions and the trivariate joint probability distribution.  is the correlation coefficient of (Q, NH4 + ); 13  is the correlation coefficient of (Q, CODMn); and 23  is the correlation coefficient of (NH4 + , CODMn). The K-S is 0.1626 according to the 5% significance level when n = 70.

Test of Joint Change-Point of Q, NH4 + , and CODMn
According to the optimal fitting copula functions of Q, NH4 + , and CODMn using the data from 1980-2016, the joint change-points can be identified. When the number of data is 161, the critical value of n Z is 13.35 [35]. The results are shown in Figure 4, which show that n Z is greater than its critical value in No. 94 of the data, which is July 2005. Therefore, this is the change-point of the combination of (Q, NH4 + , CODMn). Furthermore, the result also matches the actual situation: the continuous drought for several years before 2005 in Jinan resulted in the management department transferring about 1 × 10 8 m 3 of water into the Xiaoqing River, which might cause a change in the relationship between the water quantity and quality.
This indicates that the dependency relationship of the data was inconsistent between 1980 and 2016, so a change-point test was needed to identify the data sequences that are consistent with the current situation.

Establishment of multivariate joint probability distribution after the joint change-point
The data of Q, NH4 + , and CODMn after the joint change-point is more representative of the relationship between water quantity and quality in recent years. Therefore, we re-established the multivariate joint probability distribution using the data after the change-point. The parameters of the copulas and the results of the goodness fit are shown in Table 5. From Table 5, we can see that the joint probability distributions are still consistent with the Gaussian copula function. Meanwhile, their corresponding contour plots and contour surface are presented in Figure 5.

Test of Joint Change-Point of Q, NH 4 + , and COD Mn
According to the optimal fitting copula functions of Q, NH 4 + , and COD Mn using the data from 1980-2016, the joint change-points can be identified. When the number of data is 161, the critical value of Z n is 13.35 [35]. The results are shown in Figure 4, which show that Z n is greater than its critical value in No. 94 of the data, which is July 2005. Therefore, this is the change-point of the combination of (Q, NH 4 + , COD Mn ). Furthermore, the result also matches the actual situation: the continuous drought for several years before 2005 in Jinan resulted in the management department transferring about 1 × 10 8 m 3 of water into the Xiaoqing River, which might cause a change in the relationship between the water quantity and quality.

Test of Joint Change-Point of Q, NH4 + , and CODMn
According to the optimal fitting copula functions of Q, NH4 + , and CODMn using the data from 1980-2016, the joint change-points can be identified. When the number of data is 161, the critical value of n Z is 13.35 [35]. The results are shown in Figure 4, which show that n Z is greater than its critical value in No. 94 of the data, which is July 2005. Therefore, this is the change-point of the combination of (Q, NH4 + , CODMn). Furthermore, the result also matches the actual situation: the continuous drought for several years before 2005 in Jinan resulted in the management department transferring about 1 × 10 8 m 3 of water into the Xiaoqing River, which might cause a change in the relationship between the water quantity and quality.
This indicates that the dependency relationship of the data was inconsistent between 1980 and 2016, so a change-point test was needed to identify the data sequences that are consistent with the current situation.

Establishment of multivariate joint probability distribution after the joint change-point
The data of Q, NH4 + , and CODMn after the joint change-point is more representative of the relationship between water quantity and quality in recent years. Therefore, we re-established the multivariate joint probability distribution using the data after the change-point. The parameters of the copulas and the results of the goodness fit are shown in Table 5. From Table 5, we can see that the joint probability distributions are still consistent with the Gaussian copula function. Meanwhile, their corresponding contour plots and contour surface are presented in Figure 5. This indicates that the dependency relationship of the data was inconsistent between 1980 and 2016, so a change-point test was needed to identify the data sequences that are consistent with the current situation.

Establishment of Multivariate Joint Probability Distribution after the Joint Change-Point
The data of Q, NH 4 + , and COD Mn after the joint change-point is more representative of the relationship between water quantity and quality in recent years. Therefore, we re-established the multivariate joint probability distribution using the data after the change-point. The parameters of the copulas and the results of the goodness fit are shown in Table 5. From Table 5, we can see that the joint probability distributions are still consistent with the Gaussian copula function. Meanwhile, their corresponding contour plots and contour surface are presented in Figure 5.    (b) bivariate joint probability of (Q, COD Mn ); and, (c) trivariate joint probability of the combination of (Q, NH 4 + , COD Mn ).

Analysis of Multivariate Joint Probability Distribution of Q, NH 4 + and COD Mn after the Joint Change-Point
According to the real situation of the river water quality and the existing water quality standards "Environmental Quality Standards for Surface Water" (GB3838-2002), we conducted a cluster analysis of the water quality to obtain the new classes of water quality, which are shown in Table 6. Using the constructed joint probability distribution of Q, NH 4 + , and COD Mn , the probability of the water quality exceeding a specific value under the different conditions of discharge can be analyzed by Equations (3) and (4). P U 1 U 2 and P U 1 U 3 are the bivariate joint probability distributions of the pair of (Q, NH 4 + ) and the pair of (Q, COD Mn ), respectively, which represent the exceedance probability of NH 4 + or COD Mn in different discharge situations. P U 1 U 2 U 3 is the trivariate joint probability distribution of the pair of (Q, NH 4 + , COD Mn ), which represents the combined exceedance probability of (NH 4 + , COD Mn ) in different discharge situations. The contour plots of P U 1 U 2 and P U 1 U 3 are given in Figure 6a,b, respectively, while the contour surface of P U 1 U 2 U 3 is shown in Figure 6c. From Figure 6a,b, we can see that when P U 1 U 2 or P U 1 U 3 is in the range of 0.1-0.4, the contour plot is relatively sparse, especially in the upper right panel of the figures. This indicates that when the concentration of NH 4 + or COD Mn is high, the joint probability is insensitive to a change in discharge.
For the combination of NH 4 + and COD Mn , Figure 6c shows that, when P U 1 U 2 U 3 is between 0.1-0.3, the contour surface is relatively sparse. This is mainly due to the large amount of pollutants in the water body of Xiaoqing River at this moment. At the same time, the tributaries also carry pollutants so the joint probability is insensitive to a change in the discharge. According to the real situation of the river water quality and the existing water quality standards "Environmental Quality Standards for Surface Water" (GB3838-2002), we conducted a cluster analysis of the water quality to obtain the new classes of water quality, which are shown in Table 6. Using the constructed joint probability distribution of Q, NH4 + , and CODMn, the probability of the water quality exceeding a specific value under the different conditions of discharge can be analyzed by Equations (3) and (4). of the pair of (Q, NH4 + ) and the pair of (Q, CODMn), respectively, which represent the exceedance probability of NH4 + or CODMn in different discharge situations. This indicates that when the concentration of NH4 + or CODMn is high, the joint probability is insensitive to a change in discharge. For the combination of NH4 + and CODMn, Figure 6c shows that, when 1 2 3 U U U P is between 0.1-0.3, the contour surface is relatively sparse. This is mainly due to the large amount of pollutants in the water body of Xiaoqing River at this moment. At the same time, the tributaries also carry pollutants so the joint probability is insensitive to a change in the discharge. The typical values of the joint probability of Q, NH4 + , and CODMn are given in Tables 7-9, which describe the probability of the water quality indicators exceeding the specific values with different combinations of discharge and water quality indicators. For example, under the condition that only NH4 + is considered, The results show that the trivariate joint probability of the combination of (Q, NH4 + , CODMn) is larger than the bivariate joint probability of the pairs of (Q, NH4 + ) and (Q, CODMn). This indicates that the probability of exceeding the standards of multiple indicators is greater than that of exceeding individual indicators, which is more consistent with the actual situation. Therefore, only considering the discharge and individual water quality indicator cannot reasonably reflect the relationship between the discharge and water quality. The trivariate joint probability distribution considering Q, NH4 + , and CODMn can more comprehensively express the characteristics of discharge and multiple water quality indicators. Note: "---" denotes impossible events. Note: "---" denotes impossible events. The typical values of the joint probability of Q, NH 4 + , and COD Mn are given in Tables 7-9, which describe the probability of the water quality indicators exceeding the specific values with different combinations of discharge and water quality indicators. For example, under the condition that only NH 4 + is considered, Under the condition that only COD Mn is considered, P U 1 U 3 = P(U 1 ≤ 12, U 3 ≥ 12.0) = 0.2135. Finally, under the condition that NH 4 + and COD Mn are synergistically considered, P U 1 U 2 U 3 = P(U 1 ≤ 12, U 2 ≥ 6.3, U 3 ≥ 12.0) = 0.2493. The results show that the trivariate joint probability of the combination of (Q, NH 4 + , COD Mn ) is larger than the bivariate joint probability of the pairs of (Q, NH 4 + ) and (Q, COD Mn ). This indicates that the probability of exceeding the standards of multiple indicators is greater than that of exceeding individual indicators, which is more consistent with the actual situation. Therefore, only considering the discharge and individual water quality indicator cannot reasonably reflect the relationship between the discharge and water quality. The trivariate joint probability distribution considering Q, NH 4 + , and COD Mn can more comprehensively express the characteristics of discharge and multiple water quality indicators.

Discussion
There is not always a positive correlation between the discharge and water quality indicators, which means that the correlation between variables is asymmetric. In this paper, although Frank and G-H Copulas, which are symmetric copula functions, can fit the probability distribution between the discharge and water quality, the fitting accuracy, and the simulation of higher dimensions are not sufficient when compared to the asymmetric Gaussian copulas. Based on the statistical analysis of the historical data of discharge and water quality, this paper provides a method to quantitatively evaluate the exceedance probability of water quality by considering the joint probability of Q, NH 4 + , and COD Mn . When compared with the previous non-mechanical methods, this method adopts a multi-dimensional joint distribution that is based on a copula to more accurately describe the response relationship between water quality and discharge under the condition of the coexistence of multiple pollutants in rivers. The trivariate joint probability distribution of the combination of (Q, NH 4 + , COD Mn ) can reflect the joint probability for the different combinations of Q, NH 4 + , and COD Mn . Thus, when the combination events (NH 4 + , COD Mn ) exceed their given design values in different discharge situations, the exceedance probability can be calculated. Generally, regardless of whether the pollutant is NH 4 + , COD Mn , or a combination of NH 4 + and COD Mn , the joint probability tends to increase with an increase in the discharge and a decrease in the concentration of pollutants discharged into rivers. Thus, the exceedance probability of the water quality can be reduced by an increase in discharge (Q) and a decrease in concentration of pollutants (NH 4 + and COD Mn ). However, with an increase in discharge, the counter lines or the counter surface tend to level, which means that there is a limit imposed on the improvement of water quality. Therefore, further research is needed to determine the limit value. In addition, we suggest that the administration should strengthen the monitoring of water quality at each entry section of Xiaoqing River so that we can obtain a more comprehensive relationship between the discharge and water quality.

Conclusions
In this paper, the joint probability distributions of discharge and the main pollutants of NH 4 + and COD Mn of Xiaoqing River in Jinan were constructed by copulas, before the exceedance probability of water quality combination events under different discharge conditions was analyzed. The specific conclusions are as follows: (1) The Gaussian copula is more suitable for describing the multivariate joint probability distribution of discharge and water quality. As the relationship between the discharge and water quality indicators is not always positive, the Gaussian copula is more suitable than the Archimedean copulas in the simulation of trivariate or the above joint distribution. (2) Based on the copula, the joint change-point can be identified. For urban rivers, the dependence of water quantity and quality is often affected by human activities, which leads to the emergence of change-point. Therefore, it is necessary to identify the mutation points.