A Vine Copula-Based Modeling for Identification of Multivariate Water Pollution Risk in an Interconnected River System Network

: The Interconnected River System Network (IRSN) has become a popular and useful measure to realize the long-term health and stability of water bodies. However, there are lots of uncertain consequences derived from natural and anthropogenic pressures on the IRSN, especially the water pollution risk. In our study, a Vine Copula-based model was developed to assess the water pollution risk in the IRSN. Taking the ponds around Nanyang station as research objects, we selected five proxy indicators from water quality indexes and eutrophication indexes, which included dissolved oxygen ( DO ), total nitrogen ( TN ), total phosphorus ( TP ), chlorophyll-a ( Chla ), and ammonia nitrogen ( NH3-N ). Models based on three classes of vine copulas (C-, D-, and R-vine) were utilized respectively to identify the water quality indicators before and after the operation of the connection project. Our results showed that TN , Chla , and NH3-N should be considered as key risk factors. Moreover, we compared the advantages and prediction accuracy of C-, D-, and R-vine to discuss their applications. The results reveal that the Vine Copula-based modeling could provide eutrophication management reference and technical assistance in IRSN projects.


Introduction
River and lake function are mutually affected by river-lake connectivity, which relates to the water supply, flood control, and even economic development. However, water issues are constantly changing along with the development of society and the economy. Human activities result in changes of river-lake connectivity and acceleration of eutrophication [1]. Water transition from rivers to lakes has become a popular and useful measure to improve water quality and prevent or mitigate the ecological deterioration of water systems [2], such as Lake Monses in USA [3], Banas catchment in India [4], and Lake Dianchi in China [5].
The Interconnected River System Network (IRSN) is a new water transition method with a certain hydraulic connection, which was established among different water systems [6]. By increasing the hydraulic liquidity and continuity in water bodies, this new method can improve self-recovery ability and realize the long-term health and stability of water bodies to solve the environmental issues associated with water [7]. For water resource conservation and sustainability, IRSN improves the heterogeneous distribution of water resources and freshwater ecological restoration and has become an effective measure to increase efficiency and allocation of water resources [8].
However, there are subsequently a number of uncertain consequences derived from natural and anthropogenic pressures on the IRSN, especially ecological risk to the water environment, which has caused widespread interest and attention. Smaller area and lower depth promote higher exposure to resuspension and higher nutrient concentrations; therefore, small water bodies need more concern [9]. The Water Quality Index (WQI) is one of the most frequently used tools for assessing water quality [10]. The Nemerow pollution index [11], Organic Pollution Index, and Eutrophication Index [12] have also been used to quantify the level of water pollution and water quality condition. In addition, cross-coupling among different methods and models has emerged in water quality assessment to reduce the shortages of each model. The SWAT-ANN model combined the small watershed model SWAT and the Artificial Neural Network (ANN) for improving the accuracy of water quality prediction [13,14]. Wu et al. [15] coupled Grey Target Theory (GTT) with the traditional Analytic Hierarchy Process (AHP) to eliminate the subjective errors of the AHP. The abovementioned models are commonly applied to single water quality index analysis, and then the identified critical indicators are formed into a data set to conduct an overall assessment of the water environment without taking the correlation between different indicators into consideration [16].
As a powerful and flexible toolbox to characterize dependence profiles for multi-variables [17], Copula has been the subject of widespread interest in hydrology. Compared with other models, Copula could directly establish the joint distribution function of critical indicators and analyze the multifactorial combined risk caused by key indicators which exceed standard limits [18]. Recently, Copula has been widely used in studies of multivariate hydrological indicators analysis, particularly in flood frequency calculation [19,20], rainfall frequency analysis [21,22], and drought characteristics analysis [23][24][25]. In the field of water pollution risk assessment, Copula has mainly been used for two-dimensional and three-dimensional research, eutrophication of water quality analysis [26,27], and water quantity and quality risk assessment [28,29]. When facing high-dimensional problems, the above method is not powerful enough to model possible mutual dependencies among all variables [30].
The work of Joe [31], Bedford, and Cooke [32,33] proposed a radically reliable way to construct complex multivariate highly dependent models called Vine Copula [30,34]. Vine Copula decomposes the high-dimensional copula into several bivariate copulas, which greatly improves the flexibility of the model. Different decomposition structures result in different Vine Copula models. The regular vine (R-vine) first appeared in public as a graphical tool by Bedford and Cooke [32,33]; they also introduced the so-called canonical (C-vine) and drawable vine (D-vine). On these grounds, Aas et al. [34] extended these models and indicated that C-vine exhibited star shape structures having a tree sequence, whereas, D-vine possessed a path structure [35].
In this paper, a Vine Copula-based model was developed to identify the water pollution risk factors in the IRSN. As an efficient multiple-variables analysis tool in water ecological environment, the model based on C-vine and D-vine [36,37] has found an increasingly wide utilization, and there have been relatively few cases of application of R-vine [38,39]. However, the difference of identification results for three classes of vine copulas has not been fully addressed, that is, the uncertainties of the model structure need to be further investigated. Accordingly, we propose to use Vine Copula to identify the key factors from multiple water quality indicators, which could provide early warning and forecasting of any impending water pollution risks. Therefore, the objective of this study is to develop a Vine Copula-based analysis system, considering multiple uncertainties in a water environment, and apply it concretely to identifying the key water pollution indicators. Furthermore, we analyze the difference of identification results of C-, D-, and R-vine to discuss the distinctions of different vine structures for the purpose of choosing the most suitable structure for a case study in the future work.

Study Area
Nanyang Lake (35°4′-35°20′ N; 116°34′-116°42′ E) is located in Jining city, Shandong province, central western China, which is generally known as a shallow and eutrophic lake. Nanyang Lake has an average water depth of 1.5 m and a total surface area of 215 km 2 , with an annual average precipitation of 717 mm and an annual average evaporation of 1074 mm. The safety flow rates of the Si River, Zhuzhao New River, Wanfu River, and Baima River, which are connected to Nanyang Lake are 5140, 1700, 830 and 558 m 3 /s, respectively. Over the past two decades, with the development of industry and urbanization, great quantities of industrial wastewater and sanitary sewage without effective treatment have been discharged into Nanyang Lake. Point pollution sources were the major pollution, among which the effluent of the coal, medical, and bio-chemical industries are of importance. The level of pollution was greatly beyond the self-purification capacity of the water system and caused the water quality to deteriorate gradually with increased concentration of nitrogen, phosphorus, suspended matters, and other organisms.
As the water quality monitoring site of the Lake, Nanyang station is situated on an island in the center of Nanyang Lake ( Figure 1). Ponds around Nanyang station form a reedy, marsh-covered wetland area. In this paper, we selected five proxy indicators from water quality indexes and eutrophication indexes and collected the indicator data monthly from 2008 to 2014, which included dissolved oxygen (DO), total nitrogen (TN), total phosphorus (TP), chlorophyll-a (Chla), and ammonia nitrogen (NH3-N). To relieve water shortage in North China, the central government launched the strategically significant South-to-North Water Transfer Project (SNWTP). As the water transfer channel and impounded lake of the East Route of SNWTP, Nanyang Lake changed its hydraulic regulation after the operation of the project began, which gave rise to the changes in the water environment factors. The IRSN study of Nanyang Lake formally transferred water in November 2013, which was taken as a demarcation line; the model was divided into two periodsbefore the operation of the IRSN (B-IRSN), the period from January 2008 to October 2013, and after the operation of the IRSN (A-IRSN), for the period from November 2013 to December 2014.

Water Pollution Risk Definition
Risk refers to the uncertainty of a certain outcome in a given situation, commonly using probability to make a prediction of potential accidents as well as trends. Tung and Mays [40] defined the risk of the system as the probability when the load l on the system exceeds the resistance r, which can be expressed as where R represents the risk of the system, Pr( ) represents probability.
To the water environment system of the IRSN discussed in this paper, "load" could be defined as the pollutant concentrations, while "resistance" is the limited values of pollutants within the specified range. Thus, water pollution risk (Rw) can be computed as: where pc refers to the pollutant concentrations, pv refers to the maximum allowed value of pollutant concentration, m is the number of times that the measured contraindication was above the limit, N is all the water pollution events. Accordingly, the water pollution risk of the IRSN can be described as the probability of water quality exceeding the standard values under the influence of various uncertain pollutants before and after the operation of the connection project.

Multivariate Dependence Modeling Based on Vine Copula
In a more formal definition, Copula is a multivariate joint probability distribution function on the unit square [0, 1] with uniform marginal distributions [30]. According to Sklar's theorem, the corresponding joint distribution function F(X1, X2, …, Xn) can be expressed as follows: where Cθ ( ) is a copula distribution function, θ represents an explicit parameter to the function, F1(x), F2(x),…, F3(x) to denote the marginal distribution function of random variables.
As the system of IRSN is so large and complicated and involves lots of risk factors, which mostly have dependency relationship, it is necessary to build the joint probability distribution for a multivariable. Bedford and Cooke introduced a more advanced and flexible alternative method of constructing the dependence structure called Vine Copula [41]. At least n (n -1)/ 2 bivariate copulas with a free specification can be established between n given variables under this flexible structure [30]. That means Vine Copula addresses the flexibility limitation of classical copulas to provide different tail dependencies for different variable pairs. C-vine and D-vine are the most common copula structures in practice. C-vine has a star structure with a root node that connects all other nodes for each tree, and therefore, it is applicable to fit a multi-variable with a key variable that controls interactions in the data. The general expression of the n-dimensional joint probability density of Cvine is given as D-vine has a path structure, and each node has at most two edges. D-vines might actually be more beneficial than C-vines when we do not want to assume the existence of a key note that governs the dependencies [42]. The n-dimensional joint probability density of D-vine is expressed as where C( ) refers to the bivariate copula with index i running over the edges for each tree and index j identifying the trees, f( ) denotes the marginal density, and F(x|v) is the conditional distribution function. Figure 2a,b shows the four-dimensional structure of the C-vine and D-vine, respectively. However, the structures of C-vine and D-vine are relatively fixed and cannot be used in fitting more sophisticated dependence structures. The class of R-vine distributions is much larger than the class of C-and D-vine distributions. Unfortunately, due to the huge number of possible R-vine tree sequences, it has not been widely applied in practice. Dissmann et al. [39] developed an automated strategy that combined Maximum Spanning Tree (MST) algorithm with an R-vine copula, which simplified the original selection process and provided concise and efficient multi-dimensional data modeling. The n-dimensional density function of an R-vine is defined as Since the decomposition structure of an R-vine is too complex and does not allow for an easy way to express inference algorithms [39], this article uses the R-vine Matrix (RVM) to represent an Rvine. The following five-dimensional matrix, taken as an example, provides a more simple and efficient method to express a five-dimensional R-vine.
All the nodes of Tree 1 are arranged in an orderly way on the master diagonal and combined with the nodes in the last row of M*, respectively as edges of Tree 1. In this example, (5,1)(4,2)(3,2)(2,1) are the edges of Tree 1. The above combinations with conditioned nodes in the penultimate row of M* are the edges of Tree2, which is (5,2|1)(4,1|2)(3,1|2), and so on, for each of the nodes in M*. This completes the construction of R-vine structure.
The construction of fitting a multivariate dependence model using a C-, D-or R-vine includes the following three steps: 1. Selecting the connection order of the variables. For a C-vine, Czado et al. [43] introduced a method that calculates the absolute values of Kendall's tau coefficients for pairwise variables and selects the variable with the maximum sum of absolute values as the root node. For a Dvine, we order the variables that define a tree that maximizes a given dependence measure used as edge weights according to Kendall's tau; review Brechmann [44] for details. For the R-vine, this paper uses a maximum spanning tree algorithm such the algorithms of Prim (MST-PRIM), which maximizes the sum of absolute values of Kendall's tau in every tree to select the suitable RVM. Calculate the sum of the absolute values of Kendall's tau coefficients by Equation (8).
where Si refers to the sum of the absolute values of Kendall's tau coefficients, refers to the Kendall's tau coefficients. Thus, the Kendall tau coefficient matrix among above five water quality variables is given in Equation (9). The data before "/" represent results in the period of B-IRSN, and the data after "/' represent results in the period of A-IRSN.
2. Choosing the suitable binary copula function for each pair-copula. Taking the Akaike Information Criterion (AIC) as selection criteria [40], this paper selects the most suitable binary copula function from ten alternative copulas (Gaussian copula, Student't copula, Clayton copula, Gumbel copula, Frank copula, Joe copula, and BB copula). 3. Estimating parameters of all vine copulas. Here, we estimate the parameters of vine copulas by the maximum likelihood estimation (MLE). The specific steps are as follows. Firstly, the binary copula parameters of Tree 1 can be estimated using the MLE method. Secondly, the conditional function F(x|v) can be calculated as the observations of Tree 2 using h-functions as follows: where vj refers to an arbitrary element of v, v-j is the vector v excluding vj, and C( ) represents the bivariate copula distribution function to F(x|v). The observations obtained in the previous step are then taken to estimate the parameters of Tree 2. The parameter estimation of all trees is completed by repeating the above steps [45]. This completes multivariate dependence modeling based on the C-, D-, and R-vine copula.

Water Pollution Risk Identification Model
In this paper, we developed a Vine Copula-based modeling to identify a multivariate water pollution risk, which includes primarily three parts: 1. Processing of basic data. For the initial observational series, we calculated their eigenvalues and fitted them with six common hydrological distributions (Normal distribution, Log-normal distribution, exponential distribution, logistic distribution, Gamma distribution, and Weibull distribution). According to the goodness-of-fit test criteria, we selected the best-fit distribution which has the lowest AIC value. In this study, we collected 70 sets of data before the operation of the IRSN, for the period from January 2008 to October 2013, and 14 sets of data after the operation of the IRSN, for the period from November 2013 to December 2014 [46]. Summary statics for the water quality variables were reported in the period of B-IRSN and A-IRSN, respectively (Table 1). To compare the fit of the models, the Akaike Information Criterion (AIC), the Bayesian Information Criteria (BIC), and the Log-likelihood method were used to test the prediction accuracy of the C-, D-, and R-vine, respectively. In addition, to verify the models and further compare the C-, D-, and R-vine copula, the following work was accomplished. We brought the copula structure and parameter estimation results into the simulation function to generate 500 groups of simulated data and calculated Kendall's tau coefficients for pairwise variables. After 200 cycles, box plots were created according to the results of Kendall's tau coefficients in the previous step. 3. Identification of water pollution risk. In order to analyze the sensitivity of the water pollution risk and identify the key risk indicators, the influence extent of each water quality factors on the water pollution risk joint probability was analyzed as follows. According to the National Environmental quality standards of the surface water of China and water quality standard proposed by the Organization for Economic Co-operation and Development, the mean values of the initial series of risk factors were taken as the base-value, and the risk probability density calculated when the concentration of one of those variables varied in accordance with the water I-V quality standard described in Table 2. The higher class means the worse water quality. By comparing the evolution of the risk probability density before and after the operation of the connection project, it was possible to ascertain the effect of the IRSN operation on the environment and fully complete the whole water pollution risk identification. All statistical computations were performed using the Vines, Vine copula, and CDVine packages available in R software.

Descriptive Statistics and Marginal Distributions
Studied water quality variables (DO, TN, TP, Chla, and NH3-N) were the important pollution indicators in the ponds around Nanyang station. For the descriptive statistics of the above water quality variables, the standard deviation is quite small relative to their average, which indicated that the data of water quality were stable ( Table 1).
The selected marginal distribution and their estimated parameters for each variable can be found in Table 3 along with the AIC values. From the results, the best fitted distributions of variables were log-norm distribution in most cases. Table 3. The results of best fitted distribution to each variable along with their estimated parameters. The data before "/" represent results in the period of B-IRSN, and the data after "/' represent results in the period of A-IRSN. Notes: Gamma is the gamma distribution, lnorm is the log-norm distribution, and weibull is the weibull distribution.
The first tree of the Vine Copula often has the greatest influence on the model fit [32]. Therefore, we may more directly observe the dependencies among uncertainty variables by analyzing the structures of the first tree of vine copulas. The results show the C-vine and D-vine structures (Tree 1) with families and their estimated parameters, respectively (Figures 3 and 4).  The construction results of RVM and best-fit copula family matrix as shown below ( Figure 5). The RVMbefore and RVMafter show that the R-vines have four trees in total. Tree 1 in RVMbefore has four edges, (1,3), (4,2), (1,4), and (5,1). Tree 2 in RVMbefore has three edges, (4,3|1), (1,2|4), and (5,4|1). Tree 3 in RVMbefore has two edges, (5,3|4,1) and (5,2|1,4). Tree 4 in RVMbefore has only one edge (2,3|5,4,1). The edges of trees in RVMafter can be defined in this similar manner. Both, before and after the operation of the IRSN, DO is generally connected with other variables as the key node (Figures 3, 4, and 5). Tail dependence provides measures for the dependence between extreme variables and measures the tendency of one variable to occur with a minimal or maximal outcome simultaneously with another variable [47,48]. The selection of the Gaussian copula suggests the dependence between variables is tail symmetric and exhibits no tail dependence, the selection of the Clayton copula indicates that lower tail dependence without upper tail dependence between variables, and the Frank copula shows no tail dependence or the tail dependence is very weak. This means the five water quality variables are lowly correlated in general.
The observed and simulated Kendall's tau coefficients among the C-, D-, and R-vine were compared as shown in Figure 6. The observed statistics fall between the upper and lower quartiles of the boxplots, and most of them are near the median. It is indicated that simulated statistics fit the observed statistics well and the C-, D-, and R-vine achieve great performance in characterizing the dependence structure of multiple variables.  Furthermore, to evaluate which vine copula could fit the data better, AIC, BIC, and Loglikelihood were used to compare the C-, D-, and R-vine copula in Table 6. Generally, the lower the AIC and BIC values, the better the model; the higher the Log-likelihood value the better the model. C-vine has lower AIC and BIC values and a higher log-likelihood value than the other vines before the operation of the IRSN, while the D-vine was found to achieve greater performance after the operation of the IRSN (Table 6). Table 6. AIC, BIC, and Log-likelihood for C-, D-and R-vine. The data before "/" represent the AIC, BIC, and Log-likelihood values in the period of B-IRSN, and the data after "/' represent the AIC, BIC, and Log-likelihood values in the period of A-IRSN. The likelihood-ratio based test proposed by Vuong [49] among the C-, D-, and R-vine copula for pair comparisons listed in Table 7. The Vuong test can be used to compare the predicted probabilities of two copula models by measuring the closeness of a model to the truth. If the Vuong test statistic is positive, we tend to select the structure, families, and parameters of the first model. If the Vuong test statistic is negative, we then prefer the second model. Moreover, the difference between the two models is not statistically significant when the Vuong p-value is greater than 5%. The result shows that the Vuong test selects the C-vine structure, families, and parameters before the operation of the IRSN and the D-vine structure, families, and parameters after the operation of the IRSN (Table 7). In the 5% significance level, the Vuong p-value shows that the general difference for pairwise models is not obvious. This is probably because different observation angles are given according to the special structures of each two models.

Sensitivity Analysis of Water Pollution Risk Indicators
The sensitivity analysis of water pollution risk indicators can identify the key factors of joint water pollution risk from multiple uncertainty water quality characteristics. Using the sensitivity analysis method described in Section 2.4, a series of risk scenarios is set and applied to the established vine copula models to calculate the risk probability density and identify the key indicators. The risk scenarios involve 25 different conditions in each period (Figure 7). A-IRSN (b) by using C-, D-, and R-vine, respectively. When the indicators are in the same standard limits, the higher the probability density of a certain variable, the greater the impact on joint water pollution risk. The roman numbers represent the risk probability density.
The quality of ponds around Nanyang station needs to reach Class III; therefore, we define the water pollution risk when the quality of ponds around Nanyang station is worse than Class III water, namely Class IV, and Class V water. In the same standard limits, the higher the probability density of a certain variable, the greater the influence on the change rate of joint distribution, that is, the joint water pollution risk is more sensitive to it. Therefore, we consider the sensitive factors with less decreasing of the probability density after the operation of the IRSN as the key risk factors. From Figure 7, the probability densities of each variable except DO are higher in class IV and class V than other conditions. This means that with the possible exception of DO, other water quality variables are more sensitive to joint risk of water pollution. Comparing the probability densities before and after the operation of the IRSN, we find that the probability densities of TN, TP, Chla, and NH3-N decrease to some extent in class IV and class V in the period of A-IRSN, which indicates that the joint risk sensitivity of TN, TP, Chla, and NH3-N decreased after the operation of the IRSN. Through the comparison of the probability density of TN, TP, Chla, and NH3-N after the operation of the IRSN, it is noticeable that the probability density of TP drops markedly and it fell to below 1 in Class V water, which was the smallest figure compared with that in other Class water. It could be argued that the joint risk sensitivity of TP has an obvious change and it becomes the most insensitive factor among TN, TP, Chla, and NH3-N after the operation of the IRSN. Accordingly, TP was excluded and TN, Chla, and NH3-N were considered as the key risk factors.
In addition, we used the same method to calculate the risk probability densities for the full sample period and plotted the time-series graph of risk probability density changes by taking C-vine as an example (Figure 8). The figure facilitates to judge observations intuitively and clearly according to the color changes. By observing the color changes, it is clear that the probability densities of each indicator decrease apparently and change smoothly after the operation of the connection project. Furthermore, through the analysis of the whole time series, one can readily observe how the risk probability density of a certain water quality variable is changing in time so that we can summarize the regularity of the variation trend for water quality variables and determine the impact on the risk probability density of water quality indicators caused by a certain water transfer project, which could provide useful reference for water quality managers. For example, via Figure 8, it is concluded that the operation of the IRSN improves the water quality and the achievements are generally obvious.

Discussion
In previous research, cross-coupling water quality models, which generally require an enormous amount of detailed data, are the main method to assess water pollution. However, when we are confronted with a limited dataset, the relatively poor prediction accuracy of these models limits their application. Whereas the statistical modeling for multiple variables analysis as the other method to identify the key indicators of water quality requires relatively less data [50].
Vine Copula is a commonly multiple variables analysis model, which can be used in modeling the water quality data with more potential flexibility and adaptability over alternative multivariate copulas, particularly when the data is limited [30]. From a statistical point of view, the monthly data from 2008 to 2014 in this article are too few to fit the multivariate model. However, from the analysis result, Vine Copula gave a great performance in characterizing the dependence structure of multiple variables. In fact, Vine Copula is an essential and commonly used method for the analysis of extreme data or even missing data in hydrology [51][52][53]. It should be noted that we just take the water quality data from the one measurement location due to the insufficient data available. That is to say, this is rather a methodological paper. Actually, one of the goals of our study is proving how the Vine Copula-based modeling proposed in this paper can be used to identify the key water risk factors, particularly in a small water body. We just take the ponds around Nanyang station as an example to demonstrate the flexibility and efficiency of Vine Copula in modeling the water quality data and identifying the key water quality indicators. However, this could equally be used in several measurement locations. According to the identification results, the Vine Copula-based modeling can perform well in characterizing the dependence structure of multiple water quality variables, and successfully identified the key indicators in the ponds around Nanyang station using the limited water quality data.
Apparently, the previous evaluation results show that AIC, BIC, Log-likelihood, and the Vuong test lead to a consistent selection (Tables 6 and 7). It seems like C-vine could fit the data better before the operation of the IRSN, while D-vine has a greater performance after the operation of the IRSN. Next, we further discuss the cause for this result. It can be found that the impact of each water quality indicator on joint water pollution risk is quite different before the operation of the IRSN, and DO is the one that has the greatest impact on joint risk in most cases (Figure 8). But the impact of water quality factors on joint risk tend to nearly stable after the operation of the IRSN, that is, the leading role of DO is no longer significant. This further confirms that the C-vine is applicable to fit a multivariable with a key variable that controls interactions in the data, and the D-vine is suitable especially when variables are relatively independent. The advantage of the D-vine copula to fit the variables with unclear dependence, which is a feature that the C-vine copula models do not have. As for the Rvine, it has a greater advantage for dealing with the massive volume and high-dimension of data. This is why the previous evaluation results occurred.
As two popular classes of vine copula structures, C-and D-vine copula have been widely used in various fields. Actually, when facing the high-dimension data, the pivotal elements and independent variables may exist at the same time. R-vine copulas weaken the properties of conditional independence and enhance the flexibility and universality of the model, which make Rvines have the ability to fit a more sophisticated series of multiple dependencies even in larger dimensions. However, R-vines are seldom applied by the enormous number of possible R-vine tree sequences. The MST-PRIM algorithm, which is used to construct R-vine copula in this paper, can successfully remove this limitation. We recognize this, and in our future work, R-vine Copula-based modeling will be given priority to identify the high-dimension water quality variables in IRSN projects.

Conclusions
In this paper, we proposed a novel method based on vine copula to identify the key indicators of water pollution risk before and after the operation of the IRSN. Through this study, the major conclusions are follows: 1. A feasible assessment system for water pollution risk was established to identify the key indicators of water pollution risk in IRSN. The system achieved great performance in characterizing the dependence structure of multiple water quality variables. Our assessment system has comparative advantages over cross-coupling water quality models that require an enormous amount of detailed data. 2. Taking the IRSN of ponds around Nanyang station as an example, models based on three classes of vine copulas (C-, D-, and R-vine) were respectively utilized to identify the risk of water quality indicators before and after the operation of the connection project. The sensitivity of five risk indicators including DO, TN, TP, Chla, and NH3-N was analyzed, and the results showed that TN, Chla, and NH3-N should be considered as primary risk factors after the operation of the connection project. 3. By comparing the advantages and prediction accuracy of the C-, D-, and R-vine, the different adaptive circs among them were deduced. The C-vine is applicable to fit a multi-variable with a pivotal element. D-vines might actually be more beneficial when we do not want to assume the existence of a key variable that governs the dependencies. The R-vine has a greater advantage when it comes to dealing with the massive volume and high-dimension data. In addition, using the R-vine Matrix (RVM) to express the vine copula structures is without doubt a concise and effective method. We will consider R-vine Copula-based method when there are high-dimension water quality variables in future work.

Conflicts of Interest:
The authors declare no conflicts of interest.