Probabilistic Analysis of Extreme Discharges and Precipitations with a Nonparametric Copula Model

Urumqi River is an important river in the Xinjiang autonomous region, China, where floods or droughts are the major concerns of the local communities. This river’s discharge is mainly influenced by the natural factors such as precipitation and climates, rather than human activities. This paper quantifies the interdependent structure between Urumqi River’s discharge and precipitation using a nonparametric Copula method. It then analyzes the relationship between the extreme discharges of this river and extreme precipitation of the region. Comparison between simulation result and real data is conducted to verify the rationality of the model. Furthermore, the conditional probabilities of maximum and minimum discharge at different precipitation levels are also investigated using the Copula distribution functions. The results show a strong relationship between large discharge and heavy precipitation in this region. The upper dependence coefficient is nearly 0.6 and the probability of large discharge reaches 0.64 when the rainfall is greater than 159.56 mm. The relationship between small discharge and rainfall is insignificant. The lower dependence coefficient is zero, suggesting that the base flow and snowmelt from Tianshan likely contribute to Urumqi River’s discharge during the dry season.


Introduction
Water resources are among the most important natural resources for the sustainable development of society [1], especially in arid and semi-arid areas [2,3]. The Xinjiang autonomous region is one of the world's extremely arid inland areas with the highest water scarcity index, and its water resources have been a great concern to China; many studies have focused on this issue [4][5][6][7][8][9][10]. The Urumqi River, located on the northern slope of the Tianshan Mountains, is one of the most important inland rivers in the region. It is the major water resource for Urumqi City, which has a population of more than 3.5 million [6]. Fluctuations in the river flow significantly affect the social and economic development of the city. The hydrological processes in this area, and their response to climate change, have drawn great attention in recent decades, especially the dependence between climate variability and discharge in the upstream reach of the Urumqi River. While it is well known that rainfall and discharge are closely related, it is difficult to quantify their dependence under extreme events, that is, the dependence between heavy rainfall and large discharge, or between low rainfall and small discharge. Such extreme events are of great concern to water resources managers. Recently developed Copula theory and methods have shed light on such issues; we therefore chose the Copula method to investigate rainfall and discharge under extreme conditions in this region.
The Copula function (a multivariate probability distribution function), introduced by Sklar [11], can quantify the dependence among different variables regardless of the specific marginal distribution of each component; as such, it has often been used to analyze the dependence between the extreme values of two variables. Copula functions were first introduced to hydrology by De Michele and Salvadori [12], and several further papers followed [13][14][15][16]. Recently, the method has been applied to investigate the relationship between extreme hydrological and climate events, for example in drought analyses [17][18][19][20][21][22], rainfall analyses [23][24][25][26][27][28][29][30], flood analyses [30][31][32][33][34][35][36][37], and analyses of dam overtopping risk [38,39]. These studies used parametric rather than nonparametric Copula methods. Parametric Copula approaches use commonly adopted Copula functions (such as the Gaussian, t, or Archimedean Copulas) or their linear combinations to estimate the various relationships between hydrological and climate events. Such a parametric approach is not sufficient for analyzing hydrometeorological processes because of their complexity and diversity (that is, they are controlled by many factors). For this reason, we introduce a nonparametric approach to estimating the Copula function in hydrometeorological studies. Since this approach is not constrained by the commonly employed Copula functions, it can yield Copula functions for site-specific datasets; in other words, it is a data-adaptive method.
Nonparametric Copula estimation originated with Deheuvels (1979) [40], who proposed an estimator of a multivariate empirical distribution based on the marginal empirical distributions. Lejeune and Sarda (1992) [41] then improved the method to reduce the boundary bias. Later, Fermanian and Scaillet (2003) [42] improved the method further, and Chen and Huang (2007) [43] proposed an effective and advanced estimation method that is approximately unbiased at each interior, boundary, and corner point of the support set and yields small variances. They showed that the estimator is consistent at each point of the support set.
Considering the above advantages, this study used Chen and Huang's (2007) [43] nonparametric Copula estimation method for the first time to quantify how precipitation impacts streamflow in mountain drainage basins. Specifically, we built the nonparametric Copula function between discharge and precipitation in this region, developed their joint probability distribution and density functions, and analyzed the dependence structure of the extreme values, including the calculation of the upper and lower dependence coefficients. In addition, the performances of the non-parametric and parametric Copulas are compared. Simulations were also carried out to verify the rationality of the estimated model. We then explained the mechanism behind the phenomenon based on the estimated Copula joint density and the dependence coefficients, and investigated the factors controlling the Urumqi River's discharge at different periods of the year. Subsequently, we used the results to derive the conditional probabilities of the maximum and minimum discharges at various precipitation levels.

Field Site
The Urumqi River upstream basin is situated on the southern fringe of the Junggar Basin, on the northern slope of the Tianshan Mountains in Xinjiang, northwest China [7,9]. The basin extends from 43°00′ N to 43°28′ N and from 86°45′ E to 87°18′ E (Figure 1). The river originates from Glacier No. 1 at an elevation of 3900 m above mean sea level (AMSL), on the northern flank of Tianger Peak II (4479 m AMSL) in the middle Tianshan Mountains [9]. It is a typical inland intermountain river fed by a mixture of glacier meltwater and precipitation [6,8,44]. Floods along the river occur in July and August and are mainly caused by rainfall, although snowmelt on the top of the mountain can sometimes aggravate them [7,8].
Water 2018, 10, 823
The region has a complex topography, which includes grassland, marsh, and desert in addition to the surrounding mountainous alpine areas [45]. Overall, this basin is poorly gauged. In 1958, gauges were installed at the Yingxiongqiao Hydrological Gauging Station (YHGS) (Figure 1) to monitor the upstream reach of the Urumqi River. Located at the mountain pass, it is the only hydrological gauging station with a long time series of available data. As a typical alpine river, the Urumqi River above the YHGS is about 62.6 km long with a drainage area of 924 km², while the altitude range is nearly 2000 m and the average gradient is 0.032 [46]. Because of the steep gradient, the river flows fast, and the discharge produced by one month's precipitation drains away quickly and does not affect the next month's discharge. Thus, the monthly averaged discharge data can be viewed as independent events or data points (Figure 1).

Data
The monthly averaged discharge and precipitation datasets of the Urumqi River from January 1958 to December 2006 are used in this study, as shown in Figure 2; the relationship between extreme monthly averaged discharge and precipitation is the main focus of this article. The discharge data were collected at YHGS, and the precipitation data were obtained from the Daxigou Meteorological Station (DMS). Both precipitation and discharge exhibit cyclic behavior with a period of one year and have no long-term trend. The average discharge and precipitation are 7.63 m³/s and 37.80 mm, and their standard deviations are 8.74 m³/s and 44.7 mm, respectively. The boxplots of precipitation and discharge for each month and the autocorrelation functions of the two series at lags 1 to 20 are shown in Figure 3. In the upstream reach of the Urumqi River, precipitation occurs most frequently from June to August, and the annual maximum discharge usually occurs during this period. The observed annual maximum precipitation in the upstream area is 632 mm (in July 1996), and the annual maximum discharge is 55.2 m³/s, which occurred at the same time as the maximum precipitation. The rainless season usually lasts from November to the following April, and the minimum discharge usually occurs in December, January, February, or March. In the observed data, the minimum discharge is 0.06 m³/s (in December 2001), while the minimum precipitation is 0 mm, recorded in January 1965, December 1967, and December 1968. The correlation coefficient between the precipitation and discharge series is as high as 0.919, which indicates the strong correlation between the two that we expected. However, the correlation coefficient measures only the linear correlation between two variables; it cannot provide a detailed joint distribution surface or describe the relationship between extreme precipitation and discharge, which is therefore studied further with the copula method. It should be noted that the Daxigou Reservoir, located 5 km upstream of YHGS, was constructed in 2007. Its construction disturbed the natural hydrological conditions of the upstream Urumqi River, and the YHGS data after 2006 are thus excluded from this study.

The Nonparametric Copula Estimator
Let (X, Y) denote a two-dimensional random vector with joint distribution function F(x, y), and let F_1(x) and F_2(y) be the marginal distributions of X and Y, respectively. Sklar (1959) [11] proved that there exists a unique function C on [0, 1]² such that F(x, y) = C(F_1(x), F_2(y)) for all (x, y) in the real number field. Both F_1(X) and F_2(Y) have uniform distributions with support set [0, 1], and the function C is the two-dimensional joint distribution function of F_1(X) and F_2(Y), which is called the Copula distribution function of X and Y. The Copula joint density function c(u, v) is then
$$c(u, v) = \frac{\partial^2 C(u, v)}{\partial u \, \partial v}.$$
Note that the surfaces of C(u, v) and c(u, v) fully describe the dependence between X and Y after filtering out each one's marginal distribution. That is, the Copula functions remove the individual characteristics of X and Y and highlight their relationship. For hydrometeorological investigations, let X and Y be a hydrological and a meteorological time series, respectively. The probability of the concurrence of extreme hydrological and meteorological events, such as heavy precipitation and large discharge, or little precipitation and small discharge, which is often of interest, can then be read from the shape of the Copula density function c near the corner points (0,0) and (1,1). For instance, if the surface of c rises upward in the region near (0,0) or (1,1), the probability that precipitation and river discharge both take a minimum or a maximum value at the same time is high. On the other hand, if the surface of c is flat and close to zero in the region near (0,0) or (1,1), the probability that both events simultaneously take extreme values is small.
The functions C and c are unknown but can be estimated from the observed data. Chen and Huang (2007) [43] developed a nonparametric procedure to estimate the Copula function C, which consists of two stages. The first stage is to estimate the marginal distribution functions of X and Y, that is, F_1 and F_2, using either common parametric distributions (such as the normal or exponential distribution) or nonparametric methods such as kernel density estimation or the empirical distribution. The data used in this paper do not fit the well-known parametric distributions well (the p values are almost zero), and kernel density estimation does not extrapolate well [41]. We therefore take the conservative approach of using empirical distribution functions, which handle interpolation problems only, as approximations of the marginal distributions:
$$\hat{F}_l(s) = \frac{1}{n} \sum_{k=1}^{n} I\big(s_l(k) \le s\big), \qquad (1)$$
where l = 1 or 2, F̂_1(s) and F̂_2(s) are the estimates of the marginal distribution functions F_1 and F_2, respectively, n is the number of observations, I(·) is the indicator function, s_1(k) denotes the kth order statistic of the sample from X, and, similarly, s_2(k) denotes the kth order statistic of the sample from Y, so that s_l(1) ≤ s_l(2) ≤ ... ≤ s_l(n). The second stage is to estimate the function C based on the estimated F̂_l, the details of which are given below.
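The first stage can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard empirical distribution function (the fraction of sample points not exceeding s); the helper name `make_edf` and the sample values are hypothetical and are not the Urumqi River data.

```python
from bisect import bisect_right


def make_edf(sample):
    """Return the empirical distribution function of `sample`.

    Assumed form (the standard definition): F_hat(s) = #{x_i <= s} / n.
    """
    ordered = sorted(sample)  # order statistics s(1) <= ... <= s(n)
    n = len(ordered)

    def edf(s):
        # bisect_right counts how many order statistics are <= s
        return bisect_right(ordered, s) / n

    return edf


# Hypothetical monthly values, for illustration only
discharge = [0.4, 1.2, 3.5, 9.8, 21.0, 30.2, 25.1, 8.7]
precipitation = [2.0, 5.5, 18.0, 60.3, 110.4, 150.9, 95.2, 20.1]

F1_hat = make_edf(discharge)
F2_hat = make_edf(precipitation)

# Standardized pairs (u_i, v_i) on the unit square, the input to stage two
pseudo_obs = [(F1_hat(x), F2_hat(y)) for x, y in zip(discharge, precipitation)]
```

The standardized pairs depend only on the ranks of the observations, which is why the Copula estimate is insensitive to the shape of the marginal distributions.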
Let the local linear kernel K u,h (x) defined as where K(x) is a symmetric probability density function, satisfying that ( 1) , for any x ∈ R. h denotes the bandwidth and h > 0. I(•) is an indicator function, which equals to 1 when the logical expression in the brackets is true and otherwise equals to zero.
where Ĉ(u, v) is the estimator of C(u, v) at the point (u, v). Equation (2) is used in this paper to estimate the Copula function between discharge and precipitation. Chen and Huang (2007) [43] proved that Equation (2) is approximately unbiased and has a small variance. Subsequently, the Copula density function c can be estimated from Equation (2) by a finite-difference approximation of the mixed derivative ∂²Ĉ/∂u∂v (Equation (3)), where ∆u and ∆v are very small increments of u and v, respectively.
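The step from a Copula distribution function to its density can be sketched as a mixed finite difference. The sketch below assumes a forward difference with increments ∆u and ∆v, which may differ from the exact discretization used in Equation (3); the function names are hypothetical, and the independence Copula C(u, v) = uv stands in for the estimated Ĉ because its density is known to equal 1.

```python
def copula_density_fd(C, u, v, du=0.01, dv=0.01):
    """Approximate c(u, v) = d^2 C / (du dv) by a forward mixed difference.

    Assumption: forward differences; a central difference would also work.
    Near u = 1 or v = 1 the points u + du, v + dv leave the unit square,
    so a backward difference would be needed there.
    """
    return (C(u + du, v + dv) - C(u + du, v) - C(u, v + dv) + C(u, v)) / (du * dv)


def independence_copula(u, v):
    # C(u, v) = u * v has density c(u, v) = 1 everywhere on the unit square
    return u * v


density = copula_density_fd(independence_copula, 0.3, 0.6)
```

Any callable that returns a Copula distribution value, including a kernel-based estimate, can be passed in place of `independence_copula`.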

The Explanation of the Reasonability of the Data and Methods Application
Figure 3c,d show that the autocorrelation functions of precipitation and discharge both have long trailing tails, indicating that each series depends on time. This behavior is understandable given the monsoon climate characteristics of Urumqi, which is rainy in summer and less rainy in winter, so that both precipitation and discharge are periodic and correlated with time. This is not consistent with classical statistics, which requires the sample data to be independent and identically distributed. However, all the precipitation-discharge pairs used in this paper can also be regarded as coming from one bivariate population, for the following reasons. It is generally accepted that the precipitation and discharge data from the same calendar month are independent and identically distributed. Suppose that the joint distribution function of discharge and precipitation in month i is F_i(x, y), i = 1, ..., 12, where x is discharge and y is precipitation. Then the arithmetic mean of the F_i(x, y),
$$F(x, y) = \frac{1}{12} \sum_{i=1}^{12} F_i(x, y),$$
is also a bivariate joint distribution function, since it meets all the theoretical requirements of a joint distribution function. When all the discharge-precipitation pairs from the different months are viewed as drawn from a single population, the joint distribution function of that population must be F(x, y), because
$$P(X \le x, Y \le y) = \sum_{i=1}^{12} P\big((X, Y) \text{ comes from the } i\text{th month}\big) \, F_i(x, y) = \frac{1}{12} \sum_{i=1}^{12} F_i(x, y) = F(x, y)$$
(we have the same number of sample points from each month; thus, P((X, Y) comes from the ith month) = 1/12).
Similarly, let F_1i(x) and F_2i(y) denote the marginal distribution functions of discharge and precipitation in month i, i = 1, ..., 12, respectively. All the discharge and precipitation data can then be viewed as coming from a mixture of the twelve monthly populations, that is, from single populations with distribution functions F_1(x) and F_2(y), and we also have
$$F_1(x) = \frac{1}{12} \sum_{i=1}^{12} F_{1i}(x) = \frac{1}{12} \sum_{i=1}^{12} \lim_{y \to \infty} F_i(x, y) = \lim_{y \to \infty} F(x, y),$$
so F_1(x) is a marginal distribution function of F(x, y). By the same token, F_2(y) is the other marginal distribution function of F(x, y). Based on the joint distribution F and the marginal distributions F_1 and F_2, there exists a Copula distribution function C and a density function c satisfying F(x, y) = C(F_1(x), F_2(y)) and ∂²C/∂u∂v = c. These C and c are the objects estimated in this paper.
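The mixture argument has an exact sample-level counterpart: when every month contributes the same number of observations, the empirical distribution of the pooled data equals the average of the twelve monthly empirical distributions. The following sketch checks this numerically on synthetic data; the monthly distributions and the helper `edf_value` are hypothetical choices made only for illustration.

```python
import random
from bisect import bisect_right


def edf_value(sample, s):
    """Empirical distribution function of `sample` evaluated at s."""
    ordered = sorted(sample)
    return bisect_right(ordered, s) / len(ordered)


random.seed(0)
years = 49  # 1958 to 2006, one value per month per year

# Synthetic data: month i has its own (hypothetical) normal distribution
monthly = [
    [random.gauss(10.0 * i, 1.0 + i) for _ in range(years)]
    for i in range(1, 13)
]
pooled = [x for month in monthly for x in month]  # 12 * 49 = 588 values

s = 55.0
pooled_cdf = edf_value(pooled, s)
mean_of_monthly = sum(edf_value(month, s) for month in monthly) / 12
```

The two quantities agree up to floating-point rounding for any threshold `s`, which mirrors the equal weights 1/12 in the mixture.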

The Upper and Lower Dependence Coefficients
The upper and lower dependence coefficients, denoted by λ_U and λ_L, measure the dependence between the maximum and the minimum values of the two variables. They are defined as
$$\lambda_U = \lim_{u \to 1^-} P\big(F_1(X) > u \mid F_2(Y) > u\big) = \lim_{u \to 1^-} \frac{1 - 2u + C(u, u)}{1 - u} \qquad (4)$$
and
$$\lambda_L = \lim_{u \to 0^+} P\big(F_1(X) \le u \mid F_2(Y) \le u\big) = \lim_{u \to 0^+} \frac{C(u, u)}{u}, \qquad (5)$$
where λ_U and λ_L are both between 0 and 1. The larger λ_U is, the stronger the dependence between the maximum values of the two variables. In this paper, a large value of λ_U indicates that heavy rain and large discharge are likely to occur together, namely, that heavy rain may lead to an immediate large discharge. On the other hand, if λ_U is close to 0, heavy rain and floods are uncorrelated. In the same way, a large value of λ_L means a strong dependence between low precipitation and small discharge, and a small value of λ_L indicates that the dependence between the two is close to none.
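λ_U and λ_L can be estimated by Equations (6) and (7) when u is close to 1 or 0, that is,
$$\hat{\lambda}_U = \frac{1 - 2u + \hat{C}(u, u)}{1 - u} \qquad (6)$$
and
$$\hat{\lambda}_L = \frac{\hat{C}(u, u)}{u}, \qquad (7)$$
where λ̂_U and λ̂_L denote the estimators of λ_U and λ_L, respectively. In this paper, we take u = 0.99 for λ̂_U in Equation (6) and u = 0.01 for λ̂_L in Equation (7).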
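A minimal sketch of these fixed-threshold estimates follows, assuming the usual copula-based expressions (1 - 2u + C(u, u)) / (1 - u) and C(u, u) / u. Two analytic Copulas with known tail behavior are used in place of the estimated Ĉ: the comonotone Copula min(u, v), for which both coefficients are 1, and the independence Copula uv, for which both are 0. All function names are hypothetical.

```python
def upper_tail_dependence(C, u=0.99):
    """Fixed-threshold estimate of lambda_U from a copula function C."""
    return (1.0 - 2.0 * u + C(u, u)) / (1.0 - u)


def lower_tail_dependence(C, u=0.01):
    """Fixed-threshold estimate of lambda_L from a copula function C."""
    return C(u, u) / u


def comonotone_copula(u, v):
    # Perfect positive dependence: lambda_U = lambda_L = 1
    return min(u, v)


def independence_copula(u, v):
    # No dependence: lambda_U = lambda_L = 0 in the limit
    return u * v


lam_u_dep = upper_tail_dependence(comonotone_copula)
lam_l_dep = lower_tail_dependence(comonotone_copula)
lam_u_ind = upper_tail_dependence(independence_copula)
lam_l_ind = lower_tail_dependence(independence_copula)
```

For the independence Copula the fixed thresholds give 0.01 rather than exactly 0, which shows the small bias introduced by evaluating the limit at u = 0.99 or u = 0.01 instead of at the boundary.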

The Estimation of Conditional Probability
Based on the estimator Ĉ(u, v) in Equation (2), the conditional probabilities of maximum and minimum discharge under different precipitation levels can be estimated. Let x_f and x_d be the thresholds of maximum and minimum discharge, respectively. We are interested in the probability that the river discharge X exceeds the maximum-discharge threshold x_f when the precipitation Y is greater than a given value y, or in the probability that the discharge X is below the minimum-discharge threshold x_d when the precipitation Y is less than a given level y. These conditional probabilities can be evaluated by Equations (8) and (9). Mathematically, they are
$$P(X > x_f \mid Y > y) = \frac{1 - u_f - v + \hat{C}(u_f, v)}{1 - v} \qquad (8)$$
and
$$P(X \le x_d \mid Y \le y) = \frac{\hat{C}(u_d, v)}{v}, \qquad (9)$$
where u_f = F̂_1(x_f), u_d = F̂_1(x_d), and v = F̂_2(y).
As an initial pre-analysis, the stationarity and homogeneity of the precipitation and discharge series were tested with the Augmented Dickey-Fuller (ADF) test [47] and Levene's test [48], respectively. The p values of the ADF test for both precipitation and discharge are far less than 0.001, implying that both series are stationary. In addition, the p values of Levene's test are 0.28 and 0.14, respectively, larger than the commonly used significance level of 0.05, meaning that the samples are homogeneous. To apply the Copula formula to the precipitation and discharge data set for the river, we first use the empirical distribution function, that is, Equation (1), to estimate the marginal distribution functions of river discharge and precipitation. The histograms and empirical distribution functions of precipitation and discharge are shown in Figure 4. The empirical distribution function is a purely interpolating method: it limits the range of the random variable to the interval between the minimum and maximum of the sample points and thus avoids the risk of extrapolation, since points larger than the sample maximum or smaller than the sample minimum would be estimated very inaccurately. The copula distribution and density functions are then estimated using Equations (2) and (3). A goodness-of-fit test, the Cramer-von-Mises test [49], was carried out, and the p value is 0.901, which means that the nonparametric copula obtained through Equations (2) and (3) can effectively capture the dependence structure of the data.
From now on, we use X to represent the river discharge and Y the precipitation. Let U = F_1(X) and V = F_2(Y) be the standardized X and Y, respectively. Then F̂_1(x) and F̂_2(y) denote the empirical distributions of X and Y, and F̂_1⁻¹ and F̂_2⁻¹ represent the inverse functions of F̂_1(x) and F̂_2(y). To apply Equation (2), n is set to 588, the total number of data pairs. The value of h is set to 0.12, as suggested by Chen and Huang [43] (that is, it should be of the same order as n^(−1/3)). Both small increments ∆u and ∆v in Equation (3) are set to 0.01 to calculate the approximate value of ĉ(u, v) at each point. Figures 5 and 6 display the estimated Ĉ(u, v) and ĉ(u, v). As illustrated in Figure 5a, the surface of Ĉ(u, v) forms an upward slope with values ranging from 0 to 1 over the unit square, that is, (u, v) ∈ [0, 1]², with corner values Ĉ(0, 0) = 0, Ĉ(0, 1) = 0, Ĉ(1, 0) = 0, and Ĉ(1, 1) = 1. The cross-section of Ĉ(u, v) in Figure 5b is a monotonically increasing curve from 0 to 1 on the interval [0, 1]. All of these features are consistent with the characteristics of a Copula joint distribution function.
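Once a Copula distribution function is available, the conditional probabilities referred to as Equations (8) and (9) follow from elementary probability on the standardized scale. The sketch below assumes the forms P(U > u_f | V > v) = (1 - u_f - v + C(u_f, v)) / (1 - v) and P(U ≤ u_d | V ≤ v) = C(u_d, v) / v, where u_f, u_d, and v are the marginal distribution values of the discharge thresholds and the precipitation level. The function names are hypothetical, and two analytic Copulas are used in place of the estimated Ĉ.

```python
def prob_high_given_high(C, u_f, v):
    """P(U > u_f | V > v) = (1 - u_f - v + C(u_f, v)) / (1 - v)."""
    return (1.0 - u_f - v + C(u_f, v)) / (1.0 - v)


def prob_low_given_low(C, u_d, v):
    """P(U <= u_d | V <= v) = C(u_d, v) / v."""
    return C(u_d, v) / v


def independence_copula(u, v):
    # Under independence, conditioning on V does not change the odds for U
    return u * v


def comonotone_copula(u, v):
    # Perfect positive dependence between U and V
    return min(u, v)


# Independent case: P(U > 0.9 | V > 0.8) equals the unconditional 1 - 0.9
p_high_ind = prob_high_given_high(independence_copula, 0.9, 0.8)
# Perfectly dependent case: V > 0.95 forces U > 0.9
p_high_dep = prob_high_given_high(comonotone_copula, 0.9, 0.95)
# Independent case: P(U <= 0.1 | V <= 0.2) equals the unconditional 0.1
p_low_ind = prob_low_given_low(independence_copula, 0.1, 0.2)
```

Comparing an estimated conditional probability with the unconditional value 1 - u_f (or u_d) gives a quick sense of how much information precipitation carries about extreme discharge.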
The 3-D plot of the estimated Copula density function, ĉ(u, v), is shown in Figure 6a over the unit square [0, 1]². Evidently, the joint probability density function of the precipitation and the river discharge is neither a normal nor a lognormal distribution. The diagonal region of the function, that is, the area around u = v, bulges upward, and the function becomes flat and close to zero near the corner points (0,1) and (1,0). The raised portion of the surface is not smooth, containing some small pits and hills, and it rises significantly near the corner point (1,1).
The plan-view contour map of ĉ is plotted in Figure 6b, which helps us understand Figure 6a from another perspective. In Figure 6b the color is dark blue near the corner points (0,1) and (1,0), which shows that the density function is zero near these two corner points. The positive values of the density function (that is, the areas in light blue, green, yellow, and red) lie along the diagonal region, that is, near the line u = v. In the lower-left region, that is, near u = v with u < 0.5 and v < 0.5, the contours are relatively sparse and the colors are light blue or green (the values are lower than 2.8). Both the value and the gradient of the density function are small in this area, and the density surface is gentle and flat. By contrast, in the upper-right region, that is, near u = v with u > 0.5 and v > 0.5, the contours are dense, and the color reaches red (the value reaches 6.2) near the point (1,1). Both the gradient and the value are relatively large in this area, and the density surface is steep: the density rises sharply as u and v increase.
The bulging surface means that pairs (u, v) in these ranges are highly likely to occur, while the area where ĉ = 0 indicates that the probability of (u, v) values falling in these ranges is small. The fact that the surface ĉ(u, v) bulges upward along the diagonal and becomes zero around (1,0) and (0,1) confirms the positive linear correlation between discharge and precipitation.
On the other hand, the narrow contours and the abrupt rise of the function around the corner point (1,1) suggest a strong correlation between the upper tails of river discharge and precipitation. In other words, when the precipitation is large, the probability of large discharge increases dramatically, and the dependence pattern between precipitation and discharge in the rainy season is quite different from that in the other seasons.
In contrast, the sparse contours and flat lower tail around (0,0) suggest that the correlation between small discharge and precipitation is weak. That is, the probability of small discharge does not necessarily increase even when precipitation is limited. This result reflects the fact that precipitation is not the sole source of river discharge. Other sources, such as snowmelt and regional groundwater flow, may contribute to the river discharge.
In addition, the upper and lower tail dependence coefficients can be quantified through Equations (6) and (7), giving λu = 0.58 and λl = 0. These values suggest that river discharge is strongly related to heavy rainfall and has little relationship with light rainfall. Although 0.58 is less than 1, it is much larger than zero. This result means that there is a relatively strong dependence between maximum discharge and precipitation, whereas the dependence between minimum discharge and precipitation is very small. This result is consistent with the analysis of Figure 6a,b, and here the degree of dependence is quantified.
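As a rough illustration of what a tail dependence coefficient measures, the sketch below computes simple threshold-based estimates from rank-transformed data. It is not necessarily the estimator of Equations (6) and (7), which are not reproduced in this section; the function names, the threshold q, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def pseudo_observations(x):
    """Rank-transform a sample into (0, 1) using rank / (n + 1)."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1  # ranks 1..n; ties broken by position
    return ranks / (len(x) + 1.0)

def tail_dependence(u, v, q=0.95):
    """Crude empirical tail dependence estimates at threshold q (an assumption).

    upper: P(U > q, V > q) / (1 - q)
    lower: P(U <= 1 - q, V <= 1 - q) / (1 - q)
    """
    u, v = np.asarray(u), np.asarray(v)
    upper = np.mean((u > q) & (v > q)) / (1.0 - q)
    lower = np.mean((u <= 1.0 - q) & (v <= 1.0 - q)) / (1.0 - q)
    return upper, lower

# Synthetic demo data (NOT the Urumqi River observations).
rng = np.random.default_rng(0)
rain = rng.gamma(shape=2.0, scale=20.0, size=588)
flow = 0.1 * rain + rng.gamma(shape=2.0, scale=1.0, size=588)
u, v = pseudo_observations(flow), pseudo_observations(rain)
lam_u, lam_l = tail_dependence(u, v, q=0.9)
print(f"upper: {lam_u:.2f}, lower: {lam_l:.2f}")
```

A value near 1 means that extremes of the two variables almost always occur together, while a value near 0 means they rarely do; the estimates are sensitive to the choice of q when the sample is small.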

Comparison between the Non-Parametric and Parametric Copulas
To compare the performance of parametric and non-parametric Copulas, five common parametric Copulas (Gaussian, t, Gumbel, Frank, and Clayton) are fitted to the standardized Urumqi River data. The p values of these five parametric Copulas and of the non-parametric Copula used in this paper under the Cramér-von Mises test are shown in Table 1. Four of the five parametric Copulas (Gaussian, t, Gumbel, and Frank) pass the Cramér-von Mises test at the 0.05 significance level and can thus be viewed as fitting the data well, although their p values are smaller than that of the non-parametric Copula of this paper, which is 0.901. Generally speaking, larger p values imply better fits. Therefore, the density surface of the best-fitting parametric Copula, the Gumbel Copula with p value 0.319, is drawn in Figure 7. In Figure 7, panels (a) and (b) both show the fitted Gumbel Copula density surface but with different vertical axis ranges: 0 to 60 in (a) and 0 to 6.5 in (b). Panel (c) shows the sample histogram of the standardized discharge and precipitation. Comparing Figures 6a and 7a-c, we can see that the shapes of the non-parametric Copula density surface (Figure 6a) and the sample histogram (Figure 7c) are very close to each other, while both differ from the Gumbel density surface (Figure 7a,b), especially near the point (0,0), where the Gumbel surface turns upward while the non-parametric Copula density (Figure 6a) and the histogram (Figure 7c) do not. These observations indicate that, for the data of this paper, the non-parametric Copula method achieves a better fit than the parametric methods. Of course, in general, parametric Copula methods extrapolate better than non-parametric ones when the fit is indeed accurate [11]. This paper, however, does not involve extrapolation problems because the marginal distributions are estimated by the empirical distribution functions.
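To make the goodness-of-fit comparison more concrete, the following sketch computes a Cramér-von Mises type distance between the empirical copula and a fitted Gumbel copula. It assumes the Gumbel parameter is obtained by inverting Kendall's tau (θ = 1/(1 − τ)), which may differ from the fitting procedure used for Table 1, and it omits the p value, which usually requires a parametric bootstrap. All function names and the synthetic data are illustrative.

```python
import numpy as np

def kendall_tau(u, v):
    """Kendall's tau (tau-a) via pairwise sign comparisons, O(n^2)."""
    du = np.sign(u[:, None] - u[None, :])
    dv = np.sign(v[:, None] - v[None, :])
    n = len(u)
    return np.sum(du * dv) / (n * (n - 1))

def gumbel_cdf(u, v, theta):
    """Gumbel copula C(u, v) = exp(-[(-ln u)^theta + (-ln v)^theta]^(1/theta))."""
    s = (-np.log(u)) ** theta + (-np.log(v)) ** theta
    return np.exp(-s ** (1.0 / theta))

def empirical_copula(u, v):
    """Empirical copula C_n evaluated at each sample point (u_i, v_i)."""
    return np.mean((u[None, :] <= u[:, None]) & (v[None, :] <= v[:, None]), axis=1)

def cvm_statistic(u, v):
    """Sum of squared differences between C_n and the fitted Gumbel copula."""
    theta = max(1.0, 1.0 / (1.0 - kendall_tau(u, v)))  # assumes tau < 1
    diff = empirical_copula(u, v) - gumbel_cdf(u, v, theta)
    return theta, float(np.sum(diff ** 2))

# Synthetic, positively dependent demo data (NOT the paper's data).
rng = np.random.default_rng(1)
z = rng.normal(size=300)
x = z + 0.5 * rng.normal(size=300)
y = z + 0.5 * rng.normal(size=300)
u = (np.argsort(np.argsort(x)) + 1) / 301.0
v = (np.argsort(np.argsort(y)) + 1) / 301.0
theta, stat = cvm_statistic(u, v)
print(f"theta = {theta:.2f}, CvM statistic = {stat:.4f}")
```

A smaller statistic indicates that the parametric family tracks the empirical copula more closely; comparing statistics across families gives only a relative ranking until p values are attached.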

Simulation and Analysis
To verify the validity of the estimated surface ĉ(u, v) in Section 4.1, we conduct numerical simulations using ĉ(u, v), that is, we treat ĉ(u, v) as the joint density function and draw a series of sample points from it. We then compare these simulated sample points with the actual observation data. The comparisons are shown in Figure 8, where panel (a) is the scatter plot of the 588 standardized real data pairs of precipitation and discharge, panel (b) is the scatter plot of 588 standardized simulated points, panel (c) is the scatter plot of the 588 unstandardized real data pairs, and panel (d) is the scatter plot of the unstandardized simulated points (obtained by applying F₁⁻¹ and F₂⁻¹ to each point in (b)).
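The restoring step from panel (b) to panel (d) can be sketched as follows, assuming that F₁⁻¹ and F₂⁻¹ are taken as generalized inverses of the empirical distribution functions. The function name and the toy samples are hypothetical, and the simulated (u, v) pairs here are plain uniform draws rather than draws from ĉ(u, v).

```python
import numpy as np

def empirical_quantile(sample, p):
    """Generalized inverse of the empirical CDF: smallest x with F_n(x) >= p."""
    xs = np.sort(np.asarray(sample, dtype=float))
    n = len(xs)
    p = np.asarray(p, dtype=float)
    # Rounding guards against floating-point noise such as 0.7 * 10 = 7.000000000000001.
    idx = np.clip(np.ceil(np.round(p * n, 9)).astype(int) - 1, 0, n - 1)
    return xs[idx]

# Toy observations standing in for discharge and precipitation (illustrative only).
rng = np.random.default_rng(2)
discharge_obs = rng.gamma(shape=2.0, scale=3.0, size=588)
precip_obs = rng.gamma(shape=2.0, scale=20.0, size=588)

# Placeholder standardized points; in the paper these would be drawn from the estimated density.
u_sim = rng.uniform(size=588)
v_sim = rng.uniform(size=588)

x_sim = empirical_quantile(discharge_obs, u_sim)  # back to discharge units
y_sim = empirical_quantile(precip_obs, v_sim)     # back to precipitation units
```

Because the inverse is built from the observed sample, every restored value coincides with one of the observed values, so the restored points never leave the observed range.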
We analyze Figure 8 with regard to the following three aspects. First, the shapes of the data clusters are very similar between (a) and (b) and between (c) and (d), which indicates that the Copula functions Ĉ(u, v) and ĉ(u, v) estimated in this paper are appropriate. The model reflects the relation between precipitation and discharge very well. We further assess the validity of the model quantitatively. In Figure 8a,b, frequency statistics of the real observations and simulated points falling in different regions are presented and compared. Because our concern is the extreme values, we focus mainly on the upper-right and lower-left regions, which are marked by red lines in panels (a) and (b). The results of the frequency statistics are listed in Table 2, where each value is the number of points falling into the corresponding region divided by the total number of points, that is, 588. According to the table, the simulated frequencies are very close to the real ones in each region, illustrating that the model is reasonable.
will show that the river water will receive shallow groundwater supply. Research also shows that during the dry season, groundwater recharge accounts for about 30% of the total river volume [50]. In addition, in the dry season, the share of recharge from the ablation of ice and snow and of frozen soil is also relatively high. Therefore, when precipitation is low, the decline in river flow is not very significant.
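The frequency statistics used to compare observed and simulated points amount to counting the share of points in each corner region. A minimal sketch is given below; the region boundaries 0.1 and 0.9 are assumptions, since the red-line thresholds of Figure 8 are not stated in this section.

```python
import numpy as np

def corner_frequencies(u, v, low=0.1, high=0.9):
    """Share of points in the lower-left and upper-right corner regions.

    The thresholds `low` and `high` are illustrative assumptions.
    """
    u, v = np.asarray(u), np.asarray(v)
    lower_left = np.mean((u < low) & (v < low))
    upper_right = np.mean((u >= high) & (v >= high))
    return lower_left, upper_right

# Tiny hand-made example: one point in each corner out of four points.
u_demo = [0.05, 0.95, 0.50, 0.92]
v_demo = [0.02, 0.99, 0.50, 0.30]
print(corner_frequencies(u_demo, v_demo))
```

Applying the same function to the observed and to the simulated standardized points gives two pairs of frequencies that can be compared side by side.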

Simulation and Analysis with a Large Number of Samples
To ensure that the above simulation result is representative, we draw a further 20,000 sample points (more points improve the accuracy and reduce the standard errors) and use them to estimate the joint probabilities of precipitation and discharge when the precipitation is at high or low levels. The corresponding standard errors and 95% confidence intervals are also calculated. The results are shown in Table 3, where (X, Y) denotes the random vector of discharge and precipitation and (U, V) represents the standardized discharge and precipitation, that is, U = F₁(X) and V = F₂(Y), or X = F₁⁻¹(U) and Y = F₂⁻¹(V). For example, Table 3 gives F₁⁻¹(0.1) = 1.07 and F₂⁻¹(0.1) = 1.43; thus, the probability of (U < 0.1, V < 0.1) equals the probability of (X < 1.07, Y < 1.43) and is estimated as 0.018, with a standard error of 0.0009 and a 95% confidence interval of (0.0162, 0.0198), and so on.
Table 3. The estimated probabilities, standard errors, and 95% confidence intervals of the random vector (X, Y), or (U, V), falling into different regions (X denotes discharge and Y is precipitation; U and V are the standardized X and Y, respectively, that is, U = F₁(X) and V = F₂(Y)).

Table 3 shows that all of the standard errors are less than 0.005, implying that the estimates from the 20,000 points are quite accurate. Based on the estimated probabilities in Table 3, P(U < 0.1, V < 0.1) = 0.018 is much less than P(U ≥ 0.9, V ≥ 0.9) = 0.058, and P(U < 0.2, V < 0.2) = 0.073 is also lower than P(U ≥ 0.8, V ≥ 0.8) = 0.159. This result suggests that the random points are more likely to fall in the upper-right corner than in the lower-left corner. In other words, heavy rain and large discharge are closely related, while dry weather does not lead to an immediate small discharge. These results again agree with the conclusions obtained in Section 4.1.
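The reported standard error and confidence interval for P(U < 0.1, V < 0.1) can be checked with the usual formula for a Monte Carlo proportion. The sketch below assumes a binomial standard error and a normal-approximation interval with z = 1.96; the paper does not state its formula in this section, but these assumptions reproduce the quoted numbers.

```python
import math

def proportion_se_ci(p_hat, n, z=1.96):
    """Standard error and normal-approximation CI of an estimated probability."""
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return se, (p_hat - z * se, p_hat + z * se)

se, (ci_low, ci_high) = proportion_se_ci(0.018, 20000)
print(round(se, 4), round(ci_low, 4), round(ci_high, 4))  # prints 0.0009 0.0162 0.0198
```

The same function applied to the largest probability quoted from Table 3, 0.159, gives a standard error of about 0.0026, consistent with the statement that all standard errors are below 0.005.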

Estimation of Conditional Probabilities
Besides the above analysis, the conditional probability of X falling in any region given Y can also be calculated from Ĉ(u, v), ĉ(u, v), F₁(·), and F₂(·). Typical applications concern the probability that maximum discharge occurs when the precipitation exceeds a certain level, or the probability that minimum discharge occurs when the precipitation is below a certain level. Assessing these probabilities could help decision makers or disaster relief groups make advance preparations.
To illustrate this conditional probability analysis, we take the upper and lower 5% sample quantiles of the Urumqi River discharge as the thresholds of maximum and minimum discharge, respectively. We then calculate the corresponding conditional probabilities through Equations (8) and (9). The resulting thresholds are x_f = 26.2 m³/s for maximum discharge and x_d = 0.923 m³/s for minimum discharge. The results are listed in Tables 4 and 5 and illustrated in Figure 9. In Table 4, v = 0.8, 0.82, ..., 0.98, and the corresponding precipitation level is calculated by F₂⁻¹, the inverse function of F₂. The probabilities are then obtained from Equation (8).

Table 4 shows that the probability of large discharge is 0.23 given that the precipitation is larger than 73.16 mm. This probability becomes 0.25 when the precipitation is larger than 79.46 mm, and so on. The probability of large discharge continues to grow with increasing precipitation. When the precipitation exceeds 159.56 mm, the probability of large discharge reaches 0.64.
Figure 9a displays the monotonically increasing trend of the large-discharge probability with increasing precipitation, and the curve is concave upward, which means that the growth rate is also increasing. The rapidly growing conditional probability of large discharge once again verifies the strong upper tail correlation discussed in Sections 4.1 and 4.2. As a result, preparations for mitigating the hazard of large discharge, or even flooding, are necessary, since the probability of large discharge becomes very high under heavy precipitation.
In contrast, Table 5 and Figure 9b display the conditional probabilities of small discharge when the precipitation is less than the given levels. In Table 5, v = 0.02, 0.04, ..., 0.2, and the corresponding precipitation level is again calculated via F₂⁻¹. The probabilities are obtained from Equation (9). Table 5 reveals that the probability of small discharge is just 0.04 when the precipitation is lower than 0.4 mm, 0.08 when the precipitation is less than 0.55 mm, and so on. All the probability values in Table 5 are small, each less than 0.15. This result indicates that the probability of small discharge is not significant even when the precipitation is low. This finding corroborates the conclusion in Sections 4.1 and 4.2 that the lower tail correlation is close to zero. The small probabilities in Table 5 suggest that precipitation is not the decisive factor for minimum discharge.
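The conditional probabilities discussed above can be written in terms of a copula distribution function C using standard identities: P(U > u_f | V > v) = (1 − u_f − v + C(u_f, v)) / (1 − v) and P(U ≤ u_d | V ≤ v) = C(u_d, v) / v. The sketch below assumes that Equations (8) and (9), which are not reproduced in this section, take this form; the function names are hypothetical, and the two reference copulas stand in for the estimated Ĉ(u, v).

```python
def prob_large_given_wet(copula, u_f, v):
    """P(U > u_f | V > v) = (1 - u_f - v + C(u_f, v)) / (1 - v)."""
    return (1.0 - u_f - v + copula(u_f, v)) / (1.0 - v)

def prob_small_given_dry(copula, u_d, v):
    """P(U <= u_d | V <= v) = C(u_d, v) / v."""
    return copula(u_d, v) / v

def independence(u, v):
    """Independence copula: no relationship between the two variables."""
    return u * v

def comonotone(u, v):
    """Comonotone copula: perfect positive dependence."""
    return min(u, v)

# Thresholds at the upper and lower 5% quantiles, as in the text.
for name, cop in [("independence", independence), ("comonotone", comonotone)]:
    p_up = prob_large_given_wet(cop, 0.95, 0.98)
    p_low = prob_small_given_dry(cop, 0.05, 0.02)
    print(f"{name}: large|wet = {p_up:.2f}, small|dry = {p_low:.2f}")
```

Under independence both conditional probabilities stay at the unconditional 5%, while under perfect dependence both rise to 1. The reported values of up to 0.64 for large discharge and below 0.15 for small discharge therefore sit between these two reference cases, much closer to dependence in the upper tail than in the lower tail.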

Summary and Conclusions
This paper quantifies the relationship between the Urumqi River's discharge and precipitation by a nonparametric Copula method. The estimated Copula distribution and density functions, Ĉ(u, v) and ĉ(u, v), especially the joint density function, clearly reveal the joint relationship between river discharge and precipitation regardless of their respective marginal distributions. The bulging part of the joint density function around the diagonal region, that is, the region near u = v, again verifies the strong linear correlation between rainfall and river discharge, which was already shown by the correlation coefficient of 0.919 in Section 2.2. On the other hand, the highly bulged upper corner and the narrow contour lines near the point (1,1) imply a strong upper tail correlation. That is, the probability that large river discharge and heavy precipitation occur together is very high. Additionally, ĉ(u, v) is flat at the lower tail, that is, in the region near the point (0,0). This behavior demonstrates that the probability that minimum river discharge and low precipitation occur together is low. In other words, light precipitation does not directly correlate with low river discharge. Physically, light precipitation likely either infiltrates immediately into the subsurface, evaporates back to the atmosphere, or produces limited surface runoff, which is retained by surface depressions and ultimately infiltrates into the subsurface or evaporates. The coefficients of tail dependence quantify the dependence of the upper and lower tails (that is, of extreme events).
Simulations from the estimated surface ĉ(u, v) are conducted, and the observed data and simulation results are compared to verify the rationality of the model ĉ(u, v). The comparisons support the model. Furthermore, we use the model ĉ(u, v) to simulate 20,000 points to estimate the joint probabilities of discharge and precipitation. The analysis of the results further demonstrates that the joint probability of maximum discharge and heavy precipitation is large, while the joint probability of minimum discharge and low precipitation is small. This result further verifies the conclusion that the upper tail dependence between discharge and precipitation is strong while the lower tail dependence is weak.
The conditional probabilities of maximum and minimum discharge for the study area under different precipitation levels are calculated from the estimated joint probability distribution function. The analysis shows that the probability of large discharge increases substantially with increasing precipitation. Specifically, when the precipitation exceeds 159.56 mm, the probability of large discharge reaches 0.64. At this level of precipitation, the analysis recommends an advance warning of maximum discharge. In contrast, the probability of small discharge does not increase appreciably as the precipitation decreases (the probabilities remain below 0.15). In particular, when the precipitation is less than 0.4 mm, the probability of small discharge is only 0.04. Therefore, we conclude that precipitation is not the only water source for the Urumqi River during the rainless seasons. Groundwater, snowmelt from Tianshan Mountain, and other factors likely sustain the flow of the Urumqi River. Further, for the study area, precipitation can be used to predict floods during rainy seasons, but it is not suitable for forecasting drought during rainless seasons.
Because of the importance of the Urumqi River to the local area, the results of this study have practical implications for analyzing maximum and minimum discharge in the area. In addition, this paper introduces a new nonparametric estimation approach for the Copula function from statistics to the field of hydrometeorology. This paper demonstrates the practical aspects of this method through the analysis of the relationship between the Urumqi River's discharge and precipitation. The methodology of this study may also be applicable to other areas and other problems in the hydrometeorological field.

Figure 1. The DEM map and hydrometeorological observation sites in the upstream of the Urumqi River basin.

Figure 2. The monthly average precipitation and discharge series of the Urumqi River from 1958 to 2006. Panel (a) is the precipitation sequence and panel (b) is the discharge sequence.

Figure 3. The boxplots of precipitation and discharge for each month and their autocorrelation functions with lags 1 to 20. (a) Boxplot of precipitation, (b) boxplot of discharge, (c) autocorrelation function of precipitation, (d) autocorrelation function of discharge.

4.1. Estimation and Comparison of Copula Functions
4.1.1. Estimation of Non-Parametric Copula Functions and the Upper and Lower Dependence Coefficients

Figure 4. The histograms and empirical distribution functions of the samples of precipitation and discharge. (a) Histogram of precipitation, (b) histogram of discharge, (c) empirical distribution function of precipitation, (d) empirical distribution function of discharge.

Let U = F₁(X) and V = F₂(Y) be the standardized X and Y, respectively. Then, F̂₁(x) and F̂₂(y) denote the empirical distributions of X and Y, and F̂₁⁻¹ and F̂₂⁻¹

The 3-D plot of the estimated Copula density function, ĉ(u, v), is illustrated in Figure 6a over the unit square region [0, 1]².

Figure 5. The estimations of the Copula distribution function between river discharge and precipitation. u denotes the standardized river discharge and v is the standardized precipitation. Panel (a) is the distribution function Ĉ(u, v) and panel (b) is the cross-section of Ĉ(u, v) at u = v.

Figure 6. The estimations of the Copula density function between river discharge and precipitation. u denotes the standardized river discharge and v is the standardized precipitation. Panel (a) is the density function ĉ(u, v), and panel (b) is the contour map of ĉ(u, v).

Figure 7. The fitted Gumbel density surface and the histogram of the standardized discharge-precipitation pairs. (a) The Gumbel density surface with vertical axis range 0 to 60, (b) the Gumbel density surface with vertical axis range 0 to 6.5, (c) the histogram of the sample pairs after standardization.

Figure 9. The conditional probabilities of large and small discharge at different precipitation levels. (a) Conditional probabilities of large discharge, (b) conditional probabilities of small discharge.

Table 1. The p values of the Cramér-von Mises test between the observation data and various Copula models.
Water 2018, 10, x FOR PEER REVIEW 16 of 19

Table 5. The conditional probability that