The Copula Function-based Probability Characteristics Analysis on Seasonal Drought & Flood Combination Events on the North China Plain

Drought & flood events, especially the drought & flood combination events (DFCEs) on the North China Plain (NCP), known as an important grain production region in China, constitute a serious threat to China's food security. Studies on DFCEs in this region are of great significance for the rational allocation of water resources and the formulation of integrated response strategy for droughts and floods. In this study, L-moments theory and bivariate copula method were used to evaluate the probability characteristics of seasonal DFCEs (continuous drought, continuous flood, and alternation between drought and flood) on the NCP, based on the daily precipitation data (1960–2012) at 19 meteorological stations. Results indicate the following: (1) On the NCP, the precipitation in summer accounts for 56.45%–72.02% of mean annual precipitation, and the precipitation in autumn and spring come second. The winter precipitation is the smallest (less than 4%); (2) The best-fit 848 distribution for precipitation anomaly percentages in spring, summer and autumn are Generalized Normal (GNO), Generalized Logistic (GLO) and Pearson III (P-III) in sub-region I, respectively. While in sub-region II, they are respectively the P-III, P-III and Generalized Extreme-Value (GEV); (3) Compared with the Gumbel copula and Clayton copula, Frank copula is more suitable for spring-summer and summer-autumn precipitation anomaly percentage sequences on the NCP; (4) On the time scale, continuous drought respectively dominate in spring-summer DFCEs and in summer-autumn DFCEs on the NCP. Summer-autumn DFCEs prevail in sub-region I with the average probability value 0.34, while spring-summer DFCEs dominate in sub-region II, of which average probability value is 0.42; (5) On the spatial scale, most areas where the probability of continuous drought in spring-summer and spring drought & summer flood is relatively high are located in the northwest, northeast, and coastal parts of sub-region II; all the events with high probability of continuous drought in summer-autumn and summer flood & autumn drought occurred at the central part in the northwest of sub-region II.


Introduction
Due to the global warming, climate and environment have experienced significant changes, as a result of which the global average surface temperature has risen by 0.56-0.92°C [1,2].The increase of air temperature accelerated the rate of water cycle, thus leading to the increase in the frequency of occurrence of extreme hydrological events (i.e., droughts and floods), having brought about significant direct losses of social economy [3,4].However, continuous droughts, continuous floods and the alternation between drought and flood between seasons become more frequent than ever along with the exacerbation of climate changes impacts [5,6], and their impacts exhibit a multi-faceted and multi-level feature.Hence, it's essential to reinforce the studies on drought & flood combination events (DFCEs) so as to achieve the rational distribution of industry structure, the disaster risk management, and the optimized allocation of water resources.
At present, there are many studies performing characteristic analysis on drought or flood events based on copula functions.Specific to droughts, Shiau and Modarres [7] probed into the joint distribution of drought intensity, duration and frequency in Iran using the bivariate copula function.Results showed that the drought severity in humid region might be more severe under high rainfall fluctuations.Ibrahim et al. [8] have evaluated the dry conditions in Peninsular Malaysia using bivariate copula, and showed that the Galambos distribution provides the best fit for the majority of stations.Mirabbasi et al. [9] have tested several copulas to analyze the meteorological drought characteristics of the Sharafkhaneh gauge station, and come to the same conclusion as Ibrahim.According to the flood events, Zhang [10] derived bivariate distributions of flood peak and volume, and flood volume and duration through the bivariate Archimedes copula distribution, and results showed that bivariate copula distribution model fits better than others.Sraj et al. [11] have applied and compared different bivariate copulas to analyze the corresponding hydrograph volumes and durations of 58 flood events at the Litija gauging station on the Sava River in Slovenia.Results showed that the differences among most of the applied copulas were not significant.Li et al. [12] have analyzed flood peaks and flood volumes considering the historical information based on the copula.
Recently, frequency analyses of droughts or floods have been performed using multivariate copulas, which are more complex mathematically than the bivariate case [13].Some research focused more on the drought duration, severity, and intensity to analyze the drought events by applying trivariate copulas [14,15].And some researches devoted to the flood peak, volume, duration and suspended sediment concentration to probe into the flood events using the same kind of method as droughts [16][17][18][19].These studies aimed at individual drought or flood events, and did not take into account the DFCEs.Meanwhile, few studies on drought, flood or DFCEs analysis using copulas have been conducted on the NCP.
This paper focuses on the probability characteristics of seasonal DFCEs on the NCP based on the L-moments theory and bivariate copula distribution, so as to provide necessary basis for the response to droughts and floods on the NCP.This paper is arranged as follows: (1) Section 2 describes the study area and data used in the study; (2) Section 3 introduces analysis methods based on the L-moments, bivariate copula distribution and goodness of fit test; (3) Section 4 provides results and discussions; (4) Finally, Section 5 draws the conclusions.

Overview of the Region of Interest
The NCP (E: 112°30′-119°30′, N: 34°46′-40°25′) is a low-lying depositional plain of the Yellow River and the Haihe River and covers an area of 136,000 km 2 in the east of China (Figure 1).This plain has a continental monsoon climate, where it's dry and cold in winter, hot and rainy in summer, and dry in spring.The annual precipitation ranges from about 800 mm in the south to 500 mm in the north [20], most of which occurs from June to September.The mean annual land surface evaporation and water surface evaporation are 470 mm and 1100 mm, respectively [21].Droughts and floods happen frequently in this region.Meanwhile, there are four clear seasons on the NCP.Spring includes March, April and May.June, July and August belong to the summer.Autumn is composed of September, October and November.And the rest months pertain to the winter.The NCP is part of the basins of the Haihe River, the Yellow River and the Luanhe River.It comprises nearly 60 rivers in different sizes, including the Tuhai River, the Majia River, and the coastal rivers in Hebei Province etc.
The NCP is substantially located in the Haihe River Basin.According to the Historical Data of Floods and Droughts at the Haihe River Basin, floods and droughts have respectively occurred 194 and 192 times at the Haihe River Basin from 1469 to 1948.During the period of 1949-1990, both floods and droughts have occurred 18 times [22].The Haihe River Basin is also subjected to continuous occurrence of droughts and floods for years [23].From 1470 to 1911, droughts which lasted 2 years have occurred 11 times; three consecutive years of drought has occurred 8 times.Over three consecutive years of drought has occurred 7 times, in which the longest one lasted 7 years.Moreover, there were two dry years at the Haihe River Basin from 1956 to 1998.Specific to flood, it occurred continually at the Haihe River Basin during the period of 1652-1654 and 1822-1823.Besides, the wet years lasted 12 years during the period from 1883 to 1894, including three devastating floods.

Data Sources
The daily precipitation observations (1960-2012) are collected from 19 meteorological stations (Figure 1) on the NCP.These data are derived from the National Meteorological Information Center (NMIC) [24], and has been subjected to stringent quality inspection by NMIC [25,26].Besides, this paper makes the following provisions on data missing: Where the data for 5 days within one month is missing, the data for that month should be deemed missing.If the period of data missing reaches 1 month within a year, the data for that year should be considered missing.

Theory of L-Moments
L-moments are proposed by Hosking (1990) [27] based on probability weighted moments (PWM) and represent the PWM linear combination as [28].The most important feature of L-moments is that it is less sensitive than ordinary moments to the data sequence extremes and can obtain more robust parameter estimates.L-moments are defined as follows [29]: Assume F(x) is the distribution function of random variable X, let X1n ≤ X2n ≤ … ≤ Xnn be the random sampling order statistics of X which sample size is n.The r-order linear moment of variable X should be: where, EXr-i:r means the mathematical expectation of (r-i)-order statistics when sample size is r.

Identification of Homogenous Regions
The homogeneous region means the monitoring data of meteorological stations within the region could be expressed with the same probability distribution [26,29].The identification could be performed in many ways, principally including geographical proximity principle, subjective division method, objective division method, and clustering methodology [30].Clustering method is a multivariate statistical analysis-based standard classification method that is frequently used for hydrological partition [26].Clustering methods mainly include partitioning methods, hierarchical methods, density-based methods, grid-based methods, model-based methods, etc.This paper employs K-means under partitioning method.However, it should be noted that, since the result of cluster analysis is not necessarily the final result, it's necessary to make reasonable adjustments.For example, moving a meteorological station from one sub-region to another, subdividing a sub-region, merging a number of sub-regions, removing some meteorological stations, or adding new meteorological stations and performing repartition [30].

Screening Data Using the Discordancy Measure Test
Discordancy test is performed to check the existence of singular meteorological stations in each sub-region in the following principle: Assume there are N meteorological stations within a certain area, and it's possible to work out the linear moment coefficient of sample ( ( ) , ( ) , ( ) ) for each meteorological station, i = 1, 2, …, N; let: The discordancy measure D could be expressed as [29]: where, ( ) means the L-Cv of sample linear moment; ( ) represents the L-γ of sample linear moment; ( ) stands for the L-k of sample linear moment.When discordancy measure Di is not greater than critical value [26] (Table 1), no singularity is considered existing within the area.

Regional Heterogeneity Test
Regional heterogeneity test is performed to estimate the rationality of all meteorological stations being regarded as an area through the calculation of sub-regional homogeneity.Regional heterogeneity test could be performed in many ways.This paper employs H value test method [29].
The Monte Carlo simulation is performed to randomly generating sequences of Nsim (Nsim = 1000 in this study) equivalent region data via numerical simulation of kappa distribution with four parameters.And then the variability of the L-moments ratios, which is calculated from the simulated sequences is compared with those of the observations [31].
The heterogeneity measures Hj (j = 1, 2, 3) are calculated as follows: where, H1 is the standard deviation, weighted according to record length, of the at-site L-Cvs; H2 indicates the average distance from the site coordinates to the regional average on a plot of L-Cv versus L-γ; H3 means the average distance from the site coordinates to the regional average on a plot of L-γ versus L-k [32].Vj (j = 1, 2, 3) are the dispersion degree of sequences composed of variation coefficients of sample linear moments; μvj and σvj are respectively mean value and standard deviation of the Nsim values of Vj; N is the total number of meteorological stations in the region of interest; ni is the record length at site i; ( ) , ( ) , ( ) are, respectively, the sample L-Cv, L-γ and L-k at site i; , , are the weighted regional average L-Cv, L-γ and L-k, respectively.If H < 1, the sub-region could be considered homogeneous; if 1 ≤ H ≤ 2, such sub-region may be heterogeneous; if H > 2, such sub-region must be heterogeneous [29].

Selection of Best-Fit Distribution Function
The method of the best-fit distribution selection described by Hosking and Wallis [29] is based on a comparison between sample L-kurtosis and population L-kurtosis for different distributions.The test method is termed Z DIST and given as follows: where, DIST means a specified distribution function; is the regional average L-kurtosis value which is calculated from simulation for the distribution function; B4 and σ4 are the bias and standard deviation of the regional average L-kurtosis ( ) of Monte Carlo simulation samples by Kappa distribution [26]; Ns stands for the total number of samples in the region generated.If |Z DIST | ≤ 1.64, the distribution DIST designated could be accepted as uniform distribution of the region at 90% confidence level, and smaller |Z DIST | value indicates better fit.

Precipitation Anomaly Percentage
where, Pa means the precipitation anomaly percentage; p indicates the precipitation within a certain period, mm; ̅ means the long-time average annual value of the precipitation of such period, mm.

Bivariate Copula Joint Distribution Function
Copula function refers to the multivariate joint distribution function evenly distributed within [0, 1], and can combine with marginal distribution of a number of random variables and work out their joint distribution [33].Common copula joint distribution functions exhibit 3 types of structures, i.e., elliptic type, quadratic type, and Archimedean type.Bivariate Gumbel copula function, Clayton copula function and Frank Copula function under Archimedean copula function are more commonly used (Table 2).As a parameter of copula function, θ is calculated by Copula function non-parametric estimation method as proposed by Genest and Rivest [34] (Table 2); τ is the Kendall rank correlation coefficient of random sample, and could be calculated as follows [35]: where, sgn(•) is a sign function; xi and yi are random samples (i = 1, 2, …, n); n represents the random number of samples.

Kolmogorov-Smirnov (K-S) Test
K-S test method [36] is used to verify the distance DI between empirical distribution function Fn(x) and hypothetic overall distribution function F0(x).DI is calculated as follows:

Graphic Test
This method gives intuitive description of the goodness of fit through graphics.Theoretical joint probability and empirical joint probability are indicated in a scatter diagram.If the pitches are evenly distributed in the vicinity of 45° line, it means the joint probability distribution model built is reasonable.Bivariate empirical joint probability distribution is calculated as follows [34]: (14) where, Femp means empirical joint probability distribution; ng,k represents the number of joint observations that satisfy X ≤ xi1, X ≤ xi2; n stands for sequences length.

Seasonal Distribution of Regional Precipitation
This paper makes statistics of the mean annual precipitation and its seasonal distribution (Table 3) observed at each meteorological station on the NCP from 1960 to 2012.According to the results, summer contributed the major part of precipitation on the NCP, which accounted for 56.45%-72.02% of mean annual precipitation.The precipitation in autumn and spring came the second.Winter precipitation was the smallest (less than 4%).Meanwhile, it is hardly that the drought or flood events happen in winter according to the statistical results of historical drought and flood events in the Haihe River Basin [22].Therefore, this paper principally addresses the characteristics of DFCEs in spring, summer and autumn.

Results of Homogenous Regional Identification Based on L-moments
Four indicators of 19 meteorological stations on the NCP, including longitude, latitude, elevation and mean annual precipitation, are selected to identify the potential homogenous sub-region through K-Means cluster analysis.The regionalization results are shown in Figure 2. Discordancy test is performed respectively on the chosen seasonal precipitation (seasonal precipitation refers to the sum of precipitation of the three months in each season) of spring, summer and autumn at meteorological stations of each sub-region.This paper works out regional heterogeneity measure using Equations ( 7) and ( 8) through 1000 cycles of Monte Carlo simulation to test the regional heterogeneity.Finally, best-fit distribution functions are screened, including Generalized Extreme-Value (GEV), Generalized Logistic (GLO), Generalized Normal (GNO), Generalized Pareto (GPA), and Pearson III (P-III) [29,37].Results of regional discordancy test, heterogeneity test and best-fit distribution function screening are shown in Tables 4-6, respectively.
As shown in Figure 2, the NCP is divided into two sub-regions.Sub-region I and sub-region II have 6 and 13 meteorological stations, respectively.The critical values of discordancy measure D are respectively 1.648 and 2.869 in sub-region I and II.As can be seen from Table 4, the discordancy measure D of meteorological stations in both sub-region I and sub-region II is smaller than respective critical values, which indicates that there is no singular station.According to the result of regional heterogeneity test (Table 5), the values of heterogeneity measure H for sub-region I and sub-region II in spring, summer and autumn are all smaller than 1, which meets the consistency test requirement.Therefore, it is reasonable that the meteorological stations of sub-region I and sub-region II are regarded as being in one region, respectively.The statistics values Z DIST are calculated for distribution GLO, GEV, GNO, P-III and GPA in spring, summer and autumn, respectively.The results (Table 6) are shown that the best distribution fitting functions for spring, summer and autumn in sub-region I are GNO, GLO and P-III distributions, respectively.In sub-region II, the best distribution fitting functions for spring, summer and autumn are respectively P-III, P-III and GEV.
Besides, this paper compares the empirical frequency with the best-fit distributions' theoretical frequency of seasonal precipitation in spring, summer and autumn at each meteorological station through graphic test.Figure 3 further demonstrates that the selected best-fit distributions are reasonable.

Parameter Estimation and Goodness of Fit Test of the Best-Fit Distribution
According to Equation (11), the precipitation anomaly percentage and precipitation of a certain period are subject to the same distribution, so the fitting is possible with the same distribution fitting function.Therefore, the precipitation anomaly percentages in spring, summer and autumn are fitted with GNO, GLO and P-III distributions in sub-region I, respectively.In sub-region II, the precipitation anomaly percentages in spring, summer and autumn are, respectively, fitted with P-III, P-III and GEV.Moreover, the parameters of each distribution are estimated by L-moments method.Goodness of fit test is performed on each distribution by K-S method with statistically significance (α = 0.05, n = 53, D < 1.87).The results of parameter estimation and K-S test are shown in Table 7. Furthermore, this paper uses the graphic test again to compare the empirical frequency with the best-fit distributions' theoretical frequency of precipitation anomaly percentage in spring, summer and autumn at each meteorological station.Figure 4 further demonstrates that the selected best-fit distributions are reasonable.

Bivariate Copula Distribution Parameter Estimation and Fitting Test
For the precipitation anomaly percentage sequence in spring, summer and autumn at meteorological stations in each sub-region, this paper works out Kendall rank correlation coefficient τ with Equation ( 12), and estimates the parameter θ in bivariate Gumbel copula, Clayton copula, and Frank copula distribution functions based on the relation between θ and τ as shown in Table 2.By comparing the calculated value and its value range of parameter θ, Frank copula is suitable for using by the positive and negative correlation.Hence, this paper seeks the probability of seasonal DFCEs only through Frank copula distribution function.Refer to Table 8 for the Kendall rank correlation coefficient τ and the parameter θ of spring-summer and summer-autumn precipitation anomaly percentage sequence, as well as the results of K-S test.As shown in Table 8, K-S test has passed α=0.05 significance test (n = 53, D < 1.87).Furthermore, this paper compares the spring-summer and summer-autumn empirical frequency at each meteorological station with the theoretical frequency of Frank copula distribution through graphic test.Figure 5 further demonstrates that the selected Frank copula distribution function is reasonable.

Analysis of the Occurrence Frequency of DFCEs
According to the two inquations Pa < −25% and Pa > 25%, the precipitations of spring, summer and autumn at each station are divided into "drought", "normal" and "flood" states based on single-station drought & flood indicators division criteria (Table 9) [38].As shown in Table 9, Pa < −25% means "drought"; −25% ≤ Pa < 25% stands for "normal"; Pa ≥ 25% represents "flood".DFCEs between seasons could be divided into the following 4 categories (X and Y respectively refer to the spring-summer or summer-autumn precipitation anomaly percentages at a certain meteorological station).Pflo-flo=P(X ≥ 25%, Y ≥ 25%).
The frequency of above-noted 4 DFCEs is figured out based on the anomaly percentage marginal distribution of seasonal precipitation and the Frank copula joint distribution between seasons (see Table 10).As shown in Table 10:  This paper probes into the probability characteristics of DFCEs on the NCP from the perspective of meteorological drought & flood.It's possible to use this method in subsequent researches to make further analysis of DFCEs from hydrological, agricultural and social perspectives, etc., so as to provide convincing reference frame for the comprehensive response to flood and drought disasters and water resource management.

Figure 1 .
Figure 1.North China Plain, meteorological stations and rivers.

(is
Note: u and v are marginal distributions; Debye function, where m=1).

Figure 2 .
Figure 2. Identification of homogenous sub-regions in the NCP.

Figure 3 .
Figure 3.The comparison between empirical frequency and the best-fit distributions' theoretical frequency of seasonal precipitation at meteorological stations 53798 and 54511 (station 53798: (a) for spring; (b) for summer and (c) for autumn; station 54511: (d) for spring; (e) for summer; (f) for autumn.).

Figure 4 .
Figure 4.The comparison between empirical frequency and the best-fit distributions' theoretical frequency of precipitation anomaly percentage at meteorological stations 53798 and 54511 (station 53798: (a) for spring; (b) for summer and (c) for autumn; station 54511: (d) for spring; (e) for summer and (f) for autumn).

Figure 5 .
Figure 5.The comparison between empirical frequency and the theoretical frequency of Frank copula distribution at meteorological station 53798 and 54511 (station 53798: (a) for spring-summer; (b) for summer-autumn; Station 54511: (c) for spring-summer; (d) for summer-autumn).

Figure 6 .
Figure 6.Spatial distribution of the probability of seasonal DFCEs on the NCP.(a,b) for continuous drought in spring-summer and spring drought & summer flood, respectively; (c,d) for continuous drought in summer-autumn and summer flood & autumn drought, respectively).

Table 1 .
Critical values of discordancy measure coefficient D.

Table 2 .
Common types of Copula function and parameter estimation formula.

Table 3 .
Seasonal distribution of mean annual precipitation at meteorological stations in NCP.

Table 4 .
Results of region-specific seasonal precipitation discordancy measure D.

Table 5 .
Results of regional heterogeneity measure H.

Table 6 .
Calculation of statistics Z DIST for distribution fitting function of seasonal precipitation within sub-region.

Table 7 .
Parameter estimates and K-S test values of the best-fit distribution for precipitation anomaly percentage in spring, summer and autumn in NCP.

Table 8 .
Computation of Kendall rank correlation coefficient τ, parameter θ and K-S test in spring-summer and summer-autumn.

Table 9 .
Drought & flood grade standard of precipitation anomaly percentage (M).

Table 10 .
Computed result of joint distribution of DFCEs on the NCP.