Analysis of Flood Risk of Urban Agglomeration Polders Using Multivariate Copula

Urban agglomeration polders (UAPs) are often used to control flooding in eastern China. The impacts of UAPs on individual flood events have been extensively examined, but how flood risks are influenced by UAPs is much less examined. This study aimed to explore a three-dimensional joint distribution of annual flood volume, peak flow and water level to examine UAPs’ impact on flood risks based on hydrological simulations. The dependence between pairwise hydrological characteristics are measured by rank correlation coefficients and graphs. An Archimedean Copula is applied to model the dependence structure. This approach is applied to the Qinhuai River Basin where UAPs are used proactively for flood control. The result shows that the Frank Copula can better represent the dependence structure in the Qinhuai River Basin. UAPs increase risks of individual flood characteristics and integrated risks. UAPs have a relatively greater impact on water level than the other two flood characteristics. It is noted that the impact on flood risk levels off for greater floods.


Introduction
In the plain region of eastern China, urbanization has been developing rapidly for decades [1,2].In order to protect life and property [3], polders have been built in cities in the regions which include Suzhou, Wuxi, Changzhou and Jiaxing in the Yangtze River Delta plain, and Nanjing, Jurong and Lishui in the Qinhuai River Basin, a tributary of the lower Yangtze River.With embankment, sluice controls and drainage stations, polders can control external floods and drain excessive water that is within its range [4].After the catastrophic floods in 1954, 1991 and 1998, dikes continue to be strengthened and raised, and flood control standards have been gradually increased.In some cities, such as Nanjing, dikes even can control the 100-year flood.These dikes and embankments are "heroes" for controlling external floods, but they are also "time bombs" placed next to cities.For example, during the catastrophic flood in the Yangtze River in 1998 [5], the Yangtze River embankment in Jiujiang City breached due to high water levels, outflows of the Poyang Lake, and most primarily infiltration damages to the embankment basement.As a result, the entire city was faced with the risk of being engulfed by floods.
With the increasing flood control capacity within polders, excess water can be drained to the rivers out of polders in a relatively short period of time [6], resulting in soaring river water levels and in return increasing the flood risks of the polder area and even the entire basin [7,8].Considering flood changes with urban polders, Xing et al. [9] calculated the risk factor of hydrological failure for levees of the external Qinhuai River.Xu et al. [10] proposed a method to determine waterlogging risks for river basins with polders based on analysis of flood characteristics.Zhao et al. [11] studied the accident risk analysis method for dikes.Zhang [12] analyzed the influence of polders, which were completed as joint river embankments for the period of the 1950s to the 1970s, on storage capacity and ecological environment in the Taihu Lake Basin.Xu et al. [13] analyzed the change of river network structure caused by polders using GIS and RS.Yuan et al. [14] simulated flood events in the Qinhuai River Basin with an HEC-HMS hydrological model and analyzed the influence of polder features on flood control.Chen et al. [15] established a comprehensive evaluation model to analyze the waterlogging risks of farm lands under the current situation and various scenarios with polder construction in Yundong District, Gaoyou, China.However, there is little research on flood risks caused by polders, and most of the research on flood risks is limited to a single factor.Therefore, based on the previous work of the HEC-HMS hydrological model and an HEC-RAS hydraulic model developed by Yuan et al. [14], the impact on flood risks by developing polders are examined using the hydrological and hydraulic simulation results.

Study Area and Data
The Qinhuai River Basin (Figure 1) is located in the lower Yangtze River with a total drainage area of 2631 square kilometers, between the latitude of 28-31 • N and the longitude of 118-121 • E. It lies in the subtropical humid and semihumid monsoon climate region, with an average annual precipitation of 1047 mm and an average annual temperature of 15.4 • C. The precipitation gradually increases from north to south and the intra-annual temperatures vary greatly.The land-use types are primarily paddy field, and secondly, dry land and urban land, accounting for 62.4%, 23.3% and 5.6% of the total drainage area, respectively.

Archimedean Copula
The Copula function can be used to describe dependencies of two or more variables, and there is no assumption on underlying marginal distribution of variables [16,17].Assuming that variables X 1 , X 2 , . . ., X m have marginal distributions as F X 1 , F X 2 , . . ., F X m and their joint distribution is F(X 1 , X 2 , . . . ,X m ), then the Copula function can be defined as The Archimedean Copula is one of the most widely used Copula functions, mainly because of its small number of parameters, simple structure and clear expression.Generally, a one-parameter Archimedean Copula is symmetric, and a d-dimensional Copula C d : (0,1) d →(0,1) can be defined as: where ϕ(•) is the generating function of the Archimedean Copula, and ϕ −1 is the inverse function of ϕ.
Common Archimedean Copulas include Gumbel-Hougaard, Clayton, and Frank Copula, and their functions are shown in Table 1.

Dependence and Ranks
To measure the statistical dependence between random variables, two well-known nonparametric dependence measures, that is, the Spearman rank correlation coefficient (ρ) and Kendall rank correlation coefficient (τ), are used.The Spearman's ρ and Kendall's τ are calculated using ranking of variable values rather than actual values, so they are invariant under monotonic nonlinear transformations [21].They are calculated as follows: where R i and S i denote the ranking of variable X and Y, respectively.Sign(x) is the indicator function, sign(x) = 1 when x is positive, sign(x) = −1 when x is negative, sign(x) = 0. when x equals zero.
Visual tools offer a qualitative and direct way to examine dependence.In addition, two rank-based graphical tools for detecting dependence are also employed in this paper, namely, chi-plots and K-plots [21][22][23].

•
Chi-plot The chi-plot, originally proposed by Fisher and Switzer (1985) [24], is a scatter plot of the data pair (λ i , χ i ).λ i and χ i are calculated as where where I is an indicator function and n denotes the sample size.i = 1, 2, . . ., n.The value of λ i measures the distance of the observed pair value (x i , y i ) from the center of the scatter plot, and is positively associated with Y, λ i will tend to be positive.If X and Y are independent, it is expected that H i = F i G i , then χ i = 0. Therefore, values of χ i that fall far from zero indicate a certain degree of association between two variables.To help identify such departures, the chi-plot also includes control limits drawn at χ i = ±c p / √ n where c p values are suggested by Fisher and Switzer [24].The c p values of 1.54, 1.78 and 2.18 correspond to p-values of 0.9, 0.95 and 0.99, respectively.When the scatter points fall largely on the upper side of the control limits, it indicates positive dependence; whereas in the case of negative dependence, the scatter points are largely in the lower side of the control limits.

• K-plot
The K-plot, firstly proposed by Genest and Boies (2003) [25], is analogous to a QQ plot.The K-plot consists in plotting the data pairs (H (i) , W i:n ) for i ∈ {1, 2, . . . ,n}.H (i) in abscissa is the order statistics of the random variable H i introduced in Section 2.2.2.As for W i:n in ordinate, it is the expected value of the i-th statistic from a random sample of size n from a random variable W = F(X, Y) = C(U, V).It is given by where A straight line across the main diagonal is superimposed on the graph to the case of independence, and a smooth curve K(w) located above the straight is associated with perfect positive dependence.
In case of perfect negative dependence, it is shown as a string of data points aligned on the x-axis.

Goodness of Fit
The goodness of fit is examined with root-mean-square error (RMSE) and Akaike information criteria (AIC), two commonly used performance metrics [26].
RMSE is very sensitive to extremely large or small deviations between the theoretical value and the actual value, and it can represent the goodness of fit of a model.RMSE is calculated as follows: where F(x) is the theoretical distribution function; and f and f 0 are the probability density functions of F and F 0 , respectively.AIC was proposed by a Japanese statistician Akira Hiroshi.It combines the maximum likelihood method with the maximum entropy principle to derive the selection criterion of the best model [17].It is defined as: where m is the number of the Copula parameters; and f and f 0 are the probability density functions of F and F 0 , respectively.The distribution function with better goodness of fit is selected by the minimum values of RMSE and AIC.

Dependence of Flood Characteristics
Pair-wise dependence among the three flood characteristics, that is, flood volume (V), peak flow (P) and water level (Z), is shown in Appendix A Table A1, along with their corresponding p-values.Figure 3 presents chi-plots and K-plots of pair-wise flood characteristics for scenarios with and without polders.The dependence between pair-wise flood characteristics in the Qinhuai River Basin is strong, and the dependence between peak flow and water level (P&Z) is the strongest, which makes it reasonable to illustrate their joint probabilistic characterizations using a multivariate Copula function.For Figure 3a, data points are largely in the upper side of the control limits in the chi-plot or deviated from the main diagonal in the K-plot, confirming strong dependence among variables.The increased density of points close to the horizontal line of 1 in the upper-right corner in the chi-plot indicates strongest dependence between peak flow and water level (P&Z).
Comparing Figure 3a with Figure 3b, it can be easily seen that dependence among flood characteristics with UAPs is weaker than that without UAPs.It appears that the original river network is dissected by UAPs, and the hydraulic connectivity of the whole basin is decreased, which results in weaker dependence between flood characteristics.

Marginal Distribution
The descriptive statistics for the flood characteristics of both scenarios with and without UAPs are shown in Appendix A Table A2.The skewness and kurtosis of flood volume is relatively obvious.To fit marginal functions to flood characteristics, three parametric distribution functions, namely the Pearson type three distribution (P-Π Π ), the generalized extreme value distribution (GEV) and the log-normal distribution (LN), are tested.The IFM method [27] is applied to estimate the parameters, and the KS test is conducted at a 5% significance level to determine the best-fitted function.The smaller the KS value, the better the fitting.The larger the p-value, the higher the chance that the distribution would be accepted.The results are shown in Table 2. Based on the results, the best-fitted marginal distributions for flood volume, peak flow and water level, marked as bold, are chosen as GEV, LN and P-Π Π , respectively.The marginal fitting results are the same for the scenarios with UAPs.The correlation of empirical and theoretical frequency of marginal distribution is plotted in Figure 4.The points are positioned along the main diagonal 45 • line, showing good fitting.

Joint Distribution
The marginal distributions for all three flood characteristics have been determined, and IFM is applied to estimate parameters of the Copula function.RMSE and AIC are used to test the goodness of fit.The results are shown in Table 3.Based on the results, the Frank Copula is chosen as the joint probability distribution to analyze integrated flood risk for scenarios with and without UAPs.

Impacts on Flood Risks of Polders
According to the identified marginal distributions and joint distribution, the design values for flood characteristics at different joint return periods (JRPs) are compared.The results are shown in Table 4.By the definition of the Copula function, the design value based on the joint distribution is larger than the design values based on marginal distributions for the same return period, resulting in more conservative design values.At the same JRP, the design value of flood characteristics with UAPs is larger than those without UAPs.The flood control standards need to be improved when UAPs are used.
When runoff in a polder increases quickly, excessive water will be drained to external rivers in a short time, resulting in increasing flood out of the polder.Draining excessive water over multiple polders and peaking in external rivers will increase peak flow and water level.Two types of exceedance probability, P (P > p OR Z > z | V < v) and P (P > p AND Z > z | V < v), under the condition of flood volume less than 300 m 3 , are calculated and the conditional probability diagrams are shown in Figure 5. Figure 6 presents the corresponding contour based on the diagram of Figure 5.For both probabilities, the scenarios with UAPs showed higher risks.The contour of 'OR' conditional probability is denser than that of 'AND' conditional probability.The UAP will intensify the flood risk of the whole basin, with 'OR' exceedance scenario particularly easy to be influenced.According to the Nanjing Urban Flood Control Planning Report (2011-2020) [28], the design flood control level of Dongshan Station is 10.8 m and the design peak flow at the outlet is 1463 m 3 /s for a return period of 20 years.Conditional probabilities of peak flow 'AND/OR' water level for these design values for different return periods are shown in Table 5.With UAPs, the risk of either peak flow or water level exceeding the flood warning is higher than that of the scenarios without UAPs.As the return period increases, the relative difference between scenarios with UAPs and without UAPs reduces, showing that the impact of UAPs on integrated risk attenuates with increasing flood magnitudes.This may be due to the fact that draining water from the polder area to external rivers is prohibitive when water levels are extremely high in the entire watershed.When marginal distributions for flood volume, peak flow and water level are determined, the corresponding exceedance probability for three flood characteristics (Figure 7) and the integrated flood risk of the Qinhuai River Basin (Figure 8) can be easily calculated.The exceedance probability of flood volume, peak flow and flood level for scenarios with UAPs is larger than that for scenarios without UAPs, but the difference dwindles as flood magnitude increases.Therefore, UAPs increase flood risks in the basin while controlling floods within UAPs, and the impact on water level is greater than flood volume and peak flow.The impact of UAP on flood risks also dwindles as flood magnitude increases.

Impacts on Flood Risks of Polder Area
In order to study the impact of polder geographic distribution, four scenarios are examined, which are From scenarios (a) to (d), the number and total area of polders increase (Figure 9).6.The results show that flood risks increase as polder area and number increase.Integrated flood risk is smaller than marginal flood risks.For the minor flood in 1989, with the increasing polder area, the risk of flood volume and peak flow increase first, followed by increasing water level.For the medium flood in 1987, the risk increase of the water level was greater than that of flood peak.For the major flood in 1991, increasing polder areas had primarily influenced the water level and flood peak.In general, polders in river basins increase various risk factors for basin-wide floods while protecting cities within the polders from flood.As polder area increases, the impact on flood risks escalates.For minor floods, flood risks are not high per se, and the impact due to polders does not appear to be a threat to the entire basin.A paradox is that increasing polder area will provide flood control to a higher area within the polder, but also cause increasing flood risks for the entire basin.Therefore, a tradeoff is warranted between protection for cities within polders and risk for the entire

Figure 1 .
Figure 1.The location of the study area-the Qinhuai River Basin.Cities in the Qinhuai River Basin construct their own polders on the basis of expanding and adjusting protection areas in the basin.A total of six flood control polders had been built in the basin

Figure 2 .
Figure 2. The urban aggregation polders in the Qinhuai River Basin.

Figure 3 .
Figure 3. Chi-plots and K-plots of pair-wise flood characteristics for scenarios with and without polders: (a) Graphical representation of strength of dependence using chi-plot (upper row) and K-plot (lower row), the pair-wise variables are V&P, V&Z and P&Z from left to right in order without UAPs; (b) graphical representation of strength of dependence using chi-plot (upper row) and K-plot (lower row), the pair-wise variables are V&P, V&Z and P&Z from left to right in order with UAPs.

Figure 4 .
Figure 4.The correlation of empirical and theoretical frequency of marginal distribution: (a) The marginal fitting plots of the flood variables without UAPs; (b) the marginal fitting plots of the flood variables with UAPs.

Figure 7 .
Figure 7.The exceedance probability curve of individual flood variables.

Figure 8 .
Figure 8. Integrated flood risk comparison of Qinhuai River Basin with and without polders.

Figure 9 .
Figure 9.The map of polder distribution: (a) Jurong City Circle only; (b) Jurong City Circle in the upper reach and Qianhancun City Circle in the middle reach; (c) Jurong City Circle in the upper reach, Qianhancun City Circle in the middle reach, and Dongshan City Circle in the lower reach; (d) Jurong and Lishui City Circle in the upper reach, Qianhancun City Circle in the middle reach, and Dongshan City Circle in the lower reach.Three rainfalls with different intensities were selected to analyze the flood risk for the four polder scenarios.The three rainfalls occurred on 3 August 1989 (light rainfall), 1 July 1987 (medium rainfall), and 30 June 1991 (heavy rainfall).The results are shown in Table6.The results show that flood risks increase as polder area and number increase.Integrated flood risk is smaller than marginal flood risks.For the minor flood in 1989, with the increasing polder area, the risk of flood volume and peak flow increase first, followed by increasing water level.For the medium flood in 1987, the risk increase of the water level was greater than that of flood peak.For the major flood in 1991, increasing polder areas had primarily influenced the water level and flood peak.

Table 2 .
Parameter estimation and fitting performance of marginal distributions.

Table 3 .
The goodness-of-fit test of joint distribution.

Table 4 .
Design value for flood characteristics.

Table 5 .
The conditional probability of peak flow and water level at different return periods of flood volume.

Table 6 .
The risk results of different polder distribution scenarios.

Table A2 .
Statistics of flood variables.