Copula-Based Stochastic Simulation for Regional Drought Risk Assessment in South Korea

In South Korea, meteorological droughts are becoming frequently-occurring phenomena in different parts of the country, because precipitation varies significantly in both space and time. In this study, the quantiles of four identified homogeneous regions were estimated by incorporating major drought variables (e.g., duration and severity) based on the Standardized Precipitation Index (SPI). The regional frequency analysis of drought was performed by evaluating a variety of probability distributions and copulas, using graphical comparisons and goodness-of-fit test statistics. Results indicate that the Pearson type III (PE3) and Kappa marginal distributions, as well as Gaussian and Frank copulas, are better able to simulate the drought variables across the region. Bivariate stochastic simulation of selected copulas showed that the behavior of simulated data may change when the degree of association (e.g., Kendall’s τ) between the drought variables was considered. Results showed that the south-west coast and east coastal areas are under high drought risk, and inland mid-latitude areas (surrounding areas of Yeongju station) and northwest parts are under low drought risk. The joint distributions were used to compute conditional probabilities, as well as primary, secondary, and conditional return periods, which can be useful for designing and managing water demand and the supply system on a regional scale.


Introduction
The United Nations classified South Korea as water-deficit country, and recently, South Korea has faced serious droughts and water deficit problems [1]. Historical climate records confirm the presence of severe droughts across South Korea [2,3]. In addition, it is reported that the temperature of South Korea increased throughout the 20th century [4]. Since anthropogenic climate change is expected to increase in future, South Korea will be vulnerable to extreme drought events. It is reported that changes in climate extremes have an adverse effect on water resources of South Korea [5].
The stochastic nature of drought should be expressed by different correlated random variables, such as drought duration and severity. A bivariate distribution is needed to express those correlated drought variables. In the case of flood studies, there are a variety of bivariate distributions applied to obtained joint distribution, such as bivariate normal distribution [6], bivariate exponential distributions [7], and bivariate gamma distributions [8]. However, these bivariate distributions are unable to characterize the individual behavior of random variables using the same parametric family of univariate distribution [9]. In the case of drought studies, drought characteristics are assumed to be

Drought Identification
The Standardized Precipitation Index (SPI) proposed by Mckee [27] was used in this study for identifying the duration of drought events and to evaluate their severity. The SPI was computed based on fitting long-term rainfall data to the probability distribution on any desired timescale, such as 1, 3, 6, 12, and 24 months. It was found that the gamma distribution fit more closely to the precipitation data of candidate stations (55 total) across South Korea [28]. Since there are a number of zero-bounded continuous variables in climatology, it is important to give a distribution which may be used for such variables. The gamma distribution, which has a zero lower bound, has been found to fit several such variables well [29]. Therefore, gamma distribution has successfully been applied for drought studies by many researchers in South Korea [1,30].
In this study, the SPI timescale was estimated using a timescale of 6 months, because it matches well with the dry and wet alternations in South Korea, and has successfully been applied for drought monitoring in South Korea [1]. The duration of any drought event is defined as the time when the SPI values I continuously remain below the predefined truncation level [31]. The severity of any drought event indicates the cumulative deficiency below the truncation level, defined as S =

Drought Identification
The Standardized Precipitation Index (SPI) proposed by Mckee [27] was used in this study for identifying the duration of drought events and to evaluate their severity. The SPI was computed based on fitting long-term rainfall data to the probability distribution on any desired timescale, such as 1, 3, 6, 12, and 24 months. It was found that the gamma distribution fit more closely to the precipitation data of candidate stations (55 total) across South Korea [28]. Since there are a number of zero-bounded continuous variables in climatology, it is important to give a distribution which may be used for such variables. The gamma distribution, which has a zero lower bound, has been found to fit several such variables well [29]. Therefore, gamma distribution has successfully been applied for drought studies by many researchers in South Korea [1,30].
In this study, the SPI timescale was estimated using a timescale of 6 months, because it matches well with the dry and wet alternations in South Korea, and has successfully been applied for drought monitoring in South Korea [1]. The duration of any drought event is defined as the time when the SPI values I continuously remain below the predefined truncation level [31]. The severity of any drought event indicates the cumulative deficiency below the truncation level, defined as S = ∑ D i=1 |−I i | (Figure 2), as described in [31]. In this study, the theory of run analysis was performed using the predefined truncation level of −0.99.
Water 2018, 10, x FOR PEER REVIEW 4 of 30 ∑ |−I | (Figure 2), as described in [31]. In this study, the theory of run analysis was performed using the predefined truncation level of −0.99.   Figure 3a shows that droughts of highest severities were recorded at the southwest coast (areas surrounding Jangheung, Goheung, Haenam, and Mokpo stations) and the areas surrounding Sancheong and Jecheon stations (Figure 3b). It can be seen that the droughts of longest mean duration mostly correspond to the droughts of highest mean severities, and thus drought duration and severity are correlated variables.   Figure 3a shows that droughts of highest severities were recorded at the southwest coast (areas surrounding Jangheung, Goheung, Haenam, and Mokpo stations) and the areas surrounding Sancheong and Jecheon stations (Figure 3b). It can be seen that the droughts of longest mean duration mostly correspond to the droughts of highest mean severities, and thus drought duration and severity are correlated variables.  . Spatial distribution of (a) mean duration (months); (b) mean severity, using the inverse distance weighted (IDW) method. IDW assumes substantially that the rate of correlations and similarities between neighbors is proportional to the distance between them. The IDW method is recommended when there are enough sample points with a suitable dispersion at local scale. The accuracy of IDW is affected by the size of the neighborhood and the number of neighbors.

Cluster Analysis and Testing of Regional Homogeneity
A more robust and statistic based clustering algorithm, the hierarchical classification on principal components (HCPC), was applied for the delineation of homogeneous regions. The regionalization of drought variables was performed, considering the topographical variables such as latitude, longitude, and elevation above sea level (m), as well as climatological variables, such as mean annual precipitation (mm), mean daily maximum temperature (°C), mean daily minimum temperature (°C), annual evaporation (mm/year), and mean relative humidity (%) (refer to [32]). The HCPC clustering algorithm proposed in this study dealt with the above-stated eight topographical and climatological variables, which are likely to affect the drought mechanism in the region. Clusters formed by HCPC algorithm were validated using distance-based validating indices (e.g., connectivity, silhouette width, Dunne index, and Calinski and Harabasz index). A detailed explanation about the bivariate discordancy and homogeneity tests applied for regionalization of drought across South Korea is provided in [32]. . Spatial distribution of (a) mean duration (months); (b) mean severity, using the inverse distance weighted (IDW) method. IDW assumes substantially that the rate of correlations and similarities between neighbors is proportional to the distance between them. The IDW method is recommended when there are enough sample points with a suitable dispersion at local scale. The accuracy of IDW is affected by the size of the neighborhood and the number of neighbors.

Cluster Analysis and Testing of Regional Homogeneity
A more robust and statistic based clustering algorithm, the hierarchical classification on principal components (HCPC), was applied for the delineation of homogeneous regions. The regionalization of drought variables was performed, considering the topographical variables such as latitude, longitude, and elevation above sea level (m), as well as climatological variables, such as mean annual precipitation (mm), mean daily maximum temperature ( • C), mean daily minimum temperature ( • C), annual evaporation (mm/year), and mean relative humidity (%) (refer to [32]). The HCPC clustering algorithm proposed in this study dealt with the above-stated eight topographical and climatological variables, which are likely to affect the drought mechanism in the region. Clusters formed by HCPC algorithm were validated using distance-based validating indices (e.g., connectivity, silhouette width, Dunne index, and Calinski and Harabasz index). A detailed explanation about the bivariate discordancy and homogeneity tests applied for regionalization of drought across South Korea is provided in [32].

Selection of Regional Distribution Models
After making sure that identified regions are homogeneous, it is necessary to identify the probability model for each of the drought variables; in our case, the drought variables are duration and severity. To accomplish this task, five candidate probability distributions, such as generalized logistic (GLO, three parameters), generalized Pareto (GPA, three parameters), generalized extreme value (GEV, three parameters), generalized normal (GNO, three parameters), and Pearson type III (PE3, three parameters) were considered in this study. In this study, the L-moment ratio diagrams and a goodness-of-fit measure (the Z statistic) were used to identify the best-fitted marginal distribution, as recommended by many researchers [33][34][35][36]. The Z statistic for a particular distribution [37] can be expressed as follows. (1) where t R 4 indicates regional average L-kurtosis, σ t 4 indicates the standard deviation of t R 4 , and t dist 4 indicates the L-kurtosis of the fitted distribution. The best-fitted probability distribution is selected based on the smallest value of |Z|. The probability distribution function, with the |Z| ≤ 1.64 at a significant level of α = 10%, is considered acceptable for representing the sample data [35].

Dependence Measures
Dependence among drought variables was analyzed using both qualitative and quantitative approaches. Qualitative dependence was assessed using graphical diagnostic tools, such as Chi and Kendall plots, whereas quantitative dependence was assessed using Pearson's linear correlation r, Spearman's rank correlation ρ, and Kendall's τ. The detailed description of the Chi and Kendall plot are provided in [14].

Copula Function
Copula helps us to construct a joint distribution function of univariate random variates and be able to capture the dependence between two variables. Let X = (X 1 , X 2 , ..., X N ) indicate a random vector with a continuous marginal distribution function (CDFs) F 1 , F 2 , . . . , F N . Sklar [11] describes the relationship between CDF H(X) copula function C, which can be written as described in [38]: where the unique function C : [0, 1] d → [0, 1] is called a d-dimensional copula. The multivariate joint distribution model for H can be constructed in two parts: (1) computation of the marginal CDFs (F 1 , F 2 , . . . , F N ) and (2) the computation of the copula model (C). In this study, the maximum pseudo-likelihood (MPL) approach was used to compute the parameters of copula family [38]. Candidate copula families used for the analysis were the elliptical and Archimedean copula, as shown in Table 1. Elliptical copulas include normal (Gaussian) and Student's t, and Archimedean copulas include Clayton, Gumbel, Joe, and Frank. The best-fitted copula was chosen using parametric the bootstrap-based Cramér-von Mises test (Sn) [39] and Akaike information criterion (AIC) [40] goodness-of-fit. Bivariate joint distributions were estimated using the Copula package in R programming [41]. Table 1. List of copulas applied in this study.

Conditional Probability
Copula-based joint distribution is necessary to derive the conditional probability distributions of drought duration and severity. The conditional probability of drought severity with drought duration exceeding various threshold values is indicated by d , and the conditional probability of drought duration with drought severity exceeding various threshold values is indicated by s . The corresponding conditional probabilities can be expressed as follows [42].

Primary Return Periods
Estimation of the return period has special importance in the planning and management of water resources. Let us suppose that D and S denote the drought duration and severity, respectively-then, the univariate return period can be computed using the procedure described in [15,43]: T D and T S indicate the return period for drought duration and drought severity, respectively. F(d) and F(s) indicate the cumulative distribution functions of drought duration and severity, respectively. In this study, µ t can be calculated using the theory of run and the Markov theorem [42]: where P DW = p(SPI t ≤ −0.99|SPI t−1 − 0.99 and P WD = p(SPI t > −0.99|SPI t−1 ≤ −0.99) . Here, the unit of µ t is months. There are two possible cases for the derivation of return periods as described by [42]: (1) the return period for D ≥ d and S ≥ s and (2) the return period for D ≥ d or S ≥ s. The copula-based joint return period in the above stated cases can be derived as follows: where C(F (d), F (s)) indicates the copula-based joint distribution function of the drought duration and severity, respectively. In this study, sign ∧ indicates "and", and sign ∨ indicates "or". Additionally, "or" is also known as standard return periods.

Secondary Return Period
The secondary return period provided a precise indication for performing risk analysis, and may also yield useful hints for doing numerical simulations [13,44,45]. In this study, the secondary return period was adopted for the evaluation of drought risk within South Korea, and was compared with the primary return periods computed using Equations (5) and (6). The occurrence probability should be estimated for the calculation of secondary or Kendall's return period. K C , denoting Kendall's distribution, can be computed following K C (t) = P(C(F (d, s) ≤ t)). The related secondary return period can be derived as follows [32]: where t indicates the critical probability level and K C indicates the Kendall distribution function. The detailed description of the K C function for other Archimedean and elliptical copula families are available in [15,38,45].

Conditional Return Period
The concept of a conditional return period has special importance, especially for particular conditional drought events that are of interest in our applications. For instance, the evaluation of the risk for malfunction of a specific water resources system needs to consider the drought event, at the given drought duration D, when drought severity equals or exceeds a certain design value s, and vice versa.
The return period of drought duration and severity is derived conditionally, following the expression proposed by [42]. For example, the conditional return period for D given S ≥ s, and the conditional return period for S given D ≥ d: T D|S ≥ s indicates the conditional return period of drought duration when drought severity exceeds a certain level, and T S|D ≥ d indicates the conditional return period of drought severity when drought duration exceeds a certain level.

Bivariate Regionalization of Drought in South Korea
The HCPC algorithm was applied for the formation of initial regions, using topography and climate-based site characteristics of each station. However, regions obtained by the HCPC algorithm are not statistically homogeneous. Univariate and bivariate discordancy and homogeneity tests results showed that Tongyeong station in Region I, as well as the Ganghwa, Jeonju, and Boeun stations in Region III, were identified as discordant sites. Following methodology proposed by [37], the stations shifted from one region to another, in order to improve the homogeneity of the regions. The spatial distribution of the final identified homogeneous regions is presented in Figure 4. Region II, located at the mid-latitude inland of South Korea, is the smallest region compared to the others, as it contains only five stations. Moderate drought attributes observed in Region II may be because of its location away from the coastal areas, making it thus less affected by summer typhoons, which occur mostly at the coastal areas. Table 2 showed that all the regions fulfill the criteria of homogeneity. The detailed results of the bivariate homogeneity tests are provided in [32]. Since regional drought frequency analysis is based on the extreme values of drought variables (duration and severity), basic statistics such as mean, maximum, minimum, standard deviation, skewness, and kurtosis were computed for four homogeneous regions ( Table 3). The longest drought duration of 13 months was recorded at Jecheon station in Region I, and the highest severity of 22.30 was recorded at Yeongju station in Region II.
The HCPC algorithm was applied for the formation of initial regions, using topography and climate-based site characteristics of each station. However, regions obtained by the HCPC algorithm are not statistically homogeneous. Univariate and bivariate discordancy and homogeneity tests results showed that Tongyeong station in Region I, as well as the Ganghwa, Jeonju, and Boeun stations in Region III, were identified as discordant sites. Following methodology proposed by [37], the stations shifted from one region to another, in order to improve the homogeneity of the regions. The spatial distribution of the final identified homogeneous regions is presented in Figure 4. Region II, located at the mid-latitude inland of South Korea, is the smallest region compared to the others, as it contains only five stations. Moderate drought attributes observed in Region II may be because of its location away from the coastal areas, making it thus less affected by summer typhoons, which occur mostly at the coastal areas. Table 2 showed that all the regions fulfill the criteria of homogeneity. The detailed results of the bivariate homogeneity tests are provided in [32]. Since regional drought frequency analysis is based on the extreme values of drought variables (duration and severity), basic statistics such as mean, maximum, minimum, standard deviation, skewness, and kurtosis were computed for four homogeneous regions ( Table 3). The longest drought duration of 13 months was recorded at Jecheon station in Region I, and the highest severity of 22.30 was recorded at Yeongju station in Region II.

Identification of Regional Marginal Distribution
The L-moment ratio diagram, in the case of drought duration for the four identified homogeneous regions, is shown in Figure 5. Among the five candidate probability distributions, the PE3 distribution line passed through the small portion of the observed data, in case of Regions I and III, and passed away from the observed data in the case of Regions II and IV. However, it is noted that the regional average point is always located away from the PE3 distribution line. The Z statistic value showed that none of the candidate probability distributions satisfied the criteria that |Z| ≤ 1.64 (Table 4). This may be due to a large number of ties, especially with the short drought duration. Therefore, for drought duration, a more robust Kappa distribution was selected, as recommended by [37,[46][47][48]. In the case of drought severity, the L-moment ratio diagram ( Figure 6) and Z statistics (Table 4) showed that except for Region I, all other regions can be modeled with PE3 distribution. For Region I, a four-parameter Kappa distribution was selected.     The visual comparison between empirical and theoretical probability distribution functions (PDFs) and their corresponding cumulative distribution functions (CDFs) for the drought duration and severity of each region is presented in Figure 7. CDFs and PDFs were computed using Kappa distribution for Regions I, II, III, and IV, in the case of drought duration, and for Region I in the case of drought severity. The drought severity of Regions II, III, and IV were computed using PE3 distribution. Here, the empirical cumulative probability was calculated by using plotting position formula p(X ≤ x i ) = i−0.35 n , proposed by Hosking [49]. Here, i denotes the rank of the observations in ascending order, and n denotes sample size. The derived theoretical CDFs and PDFs show good agreement with the empirical ones, which indicates that these probability distributions perform well with the observed data. The CDFs of the final best-fitted probability distributions and their corresponding parameters are given in Table 5. The visual comparison between empirical and theoretical probability distribution functions (PDFs) and their corresponding cumulative distribution functions (CDFs) for the drought duration and severity of each region is presented in Figure 7. CDFs and PDFs were computed using Kappa distribution for Regions I, II, III, and IV, in the case of drought duration, and for Region I in the case of drought severity. The drought severity of Regions II, III, and IV were computed using PE3 distribution. Here, the empirical cumulative probability was calculated by using plotting position formula p(X ≤ x ) = . , proposed by Hosking [49]. Here, i denotes the rank of the observations in ascending order, and denotes sample size. The derived theoretical CDFs and PDFs show good agreement with the empirical ones, which indicates that these probability distributions perform well with the observed data. The CDFs of the final best-fitted probability distributions and their corresponding parameters are given in Table 5.
β:1.232; α:0.557; ξ:0.313 β:1.274; α:0.510; ξ:0.350 β:1.44; α:0.444; ξ:0.362 The parameters of each distribution were estimated using the L-moment method. Before fitting the copula model, site-specific scaling factors were used to standardize the observations from the pooled sites [50]. In this study µ i, S and µ i, D denote the site-specific scaling factor for drought duration and severity, respectively. This approach helped us to increase statistical reliability of the calculations, especially at the sites having small a length of records. The site-specific scaling factor for drought duration and severity of each region and the number of drought events recorded at each station are shown in Table 6. The highest number of drought events (34) were recorded at Tongyeong station. This can be correlated with unusual precipitation patterns in southern coastal areas. This is because of the major contribution of typhoons to the seasonal (particularly summer) precipitation patterns and convective systems within the air mass in southern coastal areas [51]. A study based on spatial patterns of trends in summer precipitation showed a significant increasing trend in amount and intensity of precipitation at southeast coastal areas [20]. Table 6. µ i,D and µ i,S indicate the site-specific scaling factors for drought duration and severity, respectively, and n i indicates the number of drought events recorded at each station.

Region I
Site

Application of Copulas
Prior to fitting the bivariate copulas, it is important to examine the dependence structure between the drought duration and severity. In this study, quantitative dependence was assessed using three correlation coefficient measures, such as Pearson's linear correlation r, Spearman's rank correlation ρ, and Kendall's τ. The Student's t-test at the significance level of 0.05 was used to test the statistical significance of the correlation values. Since Pearson correlation coefficient can only indicate the linear dependence between the variables, it may not be useful for heavy-tailed variables and may be affected by the outliers. Therefore, Kendall's τ was used to describe the wider class of dependencies and to cope with the outliers [52]. Moreover, rank-based evaluation of the correlation between drought duration and severity was also presented in Table 7. The three correlation coefficient test results showed that there is a statistically significant positive dependence between the drought variables for all regions. The qualitative dependence between drought variables were accomplished using graphical diagnostic tools. The pair-wise dependence pattern of drought variables, using Chi and Kendall plots for each region, is presented in Figure 8. According to [53,54], the two variables could be considered as independent if the majority of the events lay within the confidence band of the Chi plots. In the case of the Chi plot, a strong deviation from the control point was observed for all regions. All the points fell near or away from the confidence band of the Chi plots. Furthermore, most of the drought event values were higher than the median value (positive lambda values). In the case of the Kendall plots, two variables could be considered as independent if the majority of the event lay on the diagonal line. Events above the diagonal line indicate positive dependence, and event below the diagonal line indicate the negative dependence. The results of the Kendall plots showed a strong deviation from the diagonal line for all regions. This indicates the presence of strong positive association between drought variables. Since the qualitative and quantitative dependence measures indicated significant positive correlation between the drought variables, and goodness-of-fit tests indicated the best-fitted probability distributions, a copula can be employed to simulate the drought variables using joint distribution. Elliptical and Archimedean copula families, described in Section 2.6, were fitted and compared to find the best-fitted copula. The Ali-Maikhail-Haq copula is also the part of Archimedean copula family, but was not considered in this study, because the application of this copula needs Kendall's τ value to be within the range of [−0.1817, 0.333] [38]. Furthermore, the six copula models were fitted using the MPL method, where the parameters were estimated using either Kappa or PE3 distribution. The performance of the copula families was compared using the Cramér-von Mises test (Sn). The test statistic Sn and its associated p-value can be estimated using either parametric bootstrap [39,55] or by means of a multiplier approach [56,57].
The drawbacks of the parametric bootstrap are that it has very high computational cost, as each iteration requires both random number generation from the fitted copula and the estimation of copula parameters [58]. Additionally, the parametric bootstrap method becomes prohibitive with an increase in the sample size. Therefore, a fast large-sample testing procedure based on the multiplier approach was preferred in this study. The Sn and p-value were computed using the multiplier iteration of 10,000, with the sample length as the historical data ( Table 8). The Sn statistic was computed on the basis of the probability integral transformation of Rosenblatt [59]. In addition to this, the AIC criteria as described in [40] was also used to choose the appropriate copula model. Neither the Joe copula in Regions I, III, and IV nor the Gumbel copula in Region III was able to pass the Sn test at 90% significance level. Since the difference in the Sn statistic between the six copula values was very small, additional AIC statistics were also considered. Therefore, if the Sn statistics for the two copulas were the same, then the copula having a lower value of AIC was preferred. The Gaussian copula was identified as the best copula for Regions I and IV, and the Frank copula for Regions II and III are shown in Table 8 (bold). Estimated parameters using the MPL approach for each copula family are also presented in Table 8. Since the qualitative and quantitative dependence measures indicated significant positive correlation between the drought variables, and goodness-of-fit tests indicated the best-fitted probability distributions, a copula can be employed to simulate the drought variables using joint distribution. Elliptical and Archimedean copula families, described in Section 2.6, were fitted and compared to find the best-fitted copula. The Ali-Maikhail-Haq copula is also the part of Archimedean copula family, but was not considered in this study, because the application of this copula needs Kendall's τ value to be within the range of [−0.1817, 0.333] [38]. Furthermore, the six copula models were fitted using the MPL method, where the parameters were estimated using either Kappa or PE3 distribution. The performance of the copula families was compared using the Cramér-von Mises test (S n ). The test statistic S n and its associated p-value can be estimated using either parametric bootstrap [39,55] or by means of a multiplier approach [56,57].
The drawbacks of the parametric bootstrap are that it has very high computational cost, as each iteration requires both random number generation from the fitted copula and the estimation of copula parameters [58]. Additionally, the parametric bootstrap method becomes prohibitive with an increase in the sample size. Therefore, a fast large-sample testing procedure based on the multiplier approach was preferred in this study. The S n and p-value were computed using the multiplier iteration of 10,000, with the sample length as the historical data ( Table 8). The S n statistic was computed on the basis of the probability integral transformation of Rosenblatt [59]. In addition to this, the AIC criteria as described in [40] was also used to choose the appropriate copula model. Neither the Joe copula in Regions I, III, and IV nor the Gumbel copula in Region III was able to pass the S n test at 90% significance level. Since the difference in the S n statistic between the six copula values was very small, additional AIC statistics were also considered. Therefore, if the S n statistics for the two copulas were the same, then the copula having a lower value of AIC was preferred. The Gaussian copula was identified as the best copula for Regions I and IV, and the Frank copula for Regions II and III are shown in Table 8 (bold). Estimated parameters using the MPL approach for each copula family are also presented in Table 8. Table 8. Goodness-of-fit test and parameters of the bivariate copula for both duration and severity. The fitted copula is indicated in bold. Overall, the AIC and S n statistics indicate the different aspects of the model under study. The S n statistic was used as the formal hypothesis test, as a criterion to rank the copulas according to a predefined significance level, whereas AIC is a relatively simple likelihood criteria-based method, where selection is only a matter of choosing the models with the lowest likelihood value. Results did not show any significant correlation between the two statistical tests; this may be because the numerical tests tend to narrowly focus on a particular aspect of the relationship between the empirical and theoretical copulas, and often try to compress the information into a single descriptive number (see e.g., [60]). The second reason is that the test has been successfully applied to at-site frequency analysis, where the total sample size is very small compared to regional drought frequency analysis. Large sample sizes can affect the performance of an S n goodness-of-fit test statistic, as described in [39]. Comparing the method of computation, the p-value in the S n test statistic is complicated to implement and computationally intensive; however, the AIC has a relatively low computational cost.

Region
Even though the AIC and S n goodness-of-fit test statistics accepted the commonly used copulas in drought studies, the graphical comparison of the empirical and theoretical copulas may not show good agreement. Therefore, joint CDFs and PDFs for drought duration and severity were estimated using the two best-fitted copulas (Gaussian and Frank) mentioned in Table 8, and are shown in Figure 9. Joint CDFs in Figure 9 show the visual comparison between empirical and theoretical copula functions for drought duration and severity. The red dashed contour line shows the empirical copula obtained , R i and S i denote the ranks of the ordered sample. The solid contour line represents the theoretical copula. Results indicate that the Gaussian and Frank empirical copulas from the Regions I and II, respectively, matched well with the theoretical copulas.  However, the Frank and Gaussian empirical copulas from Regions III and IV, respectively, showed some deviations between the theoretical copulas, especially at low probability levels. This may be due to discrete characteristics of drought duration and the relative low accuracy of marginal distribution. Furthermore, the estimation of the probabilities of interest is affected by the majority of "ties" (i.e., identical pairs of drought duration and severity) at the lower probability level. It can be observed from Figure 9 that at high probability levels, all regions showed good agreement between empirical and theoretical copula values. The upper and lower tail dependencies in the case of both the Gaussian and Frank copulas is 0.

Conditional Probability
The copula-based joint distribution of drought duration and severity can provide very useful information for regional drought management. For illustration, the probability when drought duration and severity exceeds a threshold value, which can act as a special condition for a specific water demand and supply system, and may help to propose drought mitigation plan in the region. This probability can be easily derived with the help of copula-based joint distribution.
For example, if d = 3 months and s = 4, as the threshold condition for water demand and the supply system of Region I, then the Kappa distribution would be F(d) = 0.67 and F(s) = 0.75 and the Gaussian copula would yield F (d, s) = C(F (d), F(s) = 0.62. Therefore, the probability that drought duration and severity exceeding three months and 2, respectively, will be equal to 0.20.
Water resource managers are interested to know about the conditional probability P S ≤ s D ≥ d of drought severity, given that drought durations exceeding the threshold values of d can be computed using Equation (4), and the conditional probability P(D ≤ d|S ≥ s ) of drought duration, given drought severity exceeding threshold values of s , can be computed using Equation (5). In this study, d and s took the threshold levels of the 25th, 50th, 75th, and 95th percentile, shown in Figures 10 and 11. Both cases of conditional probability evaluated for the four regions (P S ≤ s D ≥ d and P(D ≤ d|S ≥ s )) showed an increasing trend in the skewness of the conditional probability curves. Based on the computed conditional probability curves, water resource managers can extract useful information; for example, in the case of Region IV, the probabilities for a drought severity of less than 2 or 4, given a drought duration exceeding two months (50th percentile), will be equal to 0.235 and 0.785, respectively ( Figure 10). Similarly, the probabilities for a drought duration less than two or four months, given drought severity exceeding 1.97 (50th percentile), will be equal to 0.331 and 0. 853, respectively ( Figure 11).
Conditional probability curves also help us to evaluate and compare the drought risk between the regions. For example, for a drought severity of less than 3, given a drought duration exceeding three months (75th percentile), corresponding conditional probability (P S ≤ s D ≥ d ) values estimated for Regions I, II III, and IV are 0.532, 0.641, 0.612 and 0.596, respectively. This information can serve as basis to evaluate the capability of water demand and the supply system when considering different drought events, and can be used to trigger a drought contingency plan on regional scale.

Stochastic Simulation of Copulas and Bivariate Return Periods
The simulation of the copulas was performed using 5000 random samples generated from the Gaussian copula for Regions I and IV, and from the Frank copula for Regions II and III, as presented in Figure 12. It can be observed that the simulated data (grey dots) for all regions fitted well with the observed data (red dots) of drought variables. A large number of "ties" (i.e., identical pairs of drought duration and severity) were observed at drought durations of one and two months, and the concentration of observed data decreased with an increase in drought duration. Simulated drought data also showed a similar trend, as it closely resembled that of the observed sample. Pearson's r, Kendall's τ, and Spearman's rank ρ correlation coefficients of simulated drought data for Region I were 0.913, 0.748 and 0.916, respectively; for Region II they were 0.910, 0.839, and 0.967, respectively; for Region III, they were 0.925, 0.846, and 0.970, respectively; and for Region IV they were 0.914, 0.749, and 0.916, respectively. Among the three correlation coefficients, Kendall's τ showed more variation from one region to another. Figure 12 shows that the simulated data tend to concentrate along the main diagonal with the higher Kendall's τ value (Regions II and III, with Kendall's τ values of 0.839 and 0.846, respectively), and tend to disperse along the main diagonal with the lower Kendall's τ value (Regions I and IV with Kendall's τ values of 0.749 and 0.748 and, respectively). This shows that the behavior of a bivariate drought sample may change when the degree of association (e.g., Kendall's τ) between the variables is considered. These results are in accordance with the findings of [61] by using Frank and Gumbel copulas. The contour lines of joint return (standard) periods for 5, 10, 20, 50, 100, 200, and 500 years, computed using Equation (9), are shown in Figure 12. Since the different combination of the correlated drought variables (duration and severity) can occur at the same time, the return periods are shown using contour lines, together with observed and simulated samples.
In drought frequency analysis, the droughts of longer duration and highest severities were always more important, because they can affect water resource planning and may pose a high risk to agriculture. Therefore, historical long-lasting drought events were identified for each region. In the case of Region I, the longest drought event recorded among 55 stations lasted for 13 months (Table 3), at Jecheon station. It lasted from April 2008 to April 2009, with the corresponding severity of 16.04; the joint return period of this event is more than 1000 years. The second longest drought event was recorded at Suncheon and Miryang stations, with the duration of 10 months, from April 1988 to January 1989; their corresponding severities were 16.98 and 16.46, respectively, and the joint return period was closer to 500 years. Simultaneous occurrence of drought events was observed at both stations, and surrounding stations may have similar precipitation patterns.

Stochastic Simulation of Copulas and Bivariate Return Periods
The simulation of the copulas was performed using 5000 random samples generated from the Gaussian copula for Regions I and IV, and from the Frank copula for Regions II and III, as presented in Figure 12. It can be observed that the simulated data (grey dots) for all regions fitted well with the observed data (red dots) of drought variables. A large number of "ties" (i.e., identical pairs of drought duration and severity) were observed at drought durations of one and two months, and the concentration of observed data decreased with an increase in drought duration. Simulated drought data also showed a similar trend, as it closely resembled that of the observed sample. Pearson's r, Kendall's τ, and Spearman's rank ρ correlation coefficients of simulated drought data for Region I were 0.913, 0.748 and 0.916, respectively; for Region II they were 0.910, 0.839, and 0.967, respectively; for Region III, they were 0.925, 0.846, and 0.970, respectively; and for Region IV they were 0.914, 0.749, and 0.916, respectively. Among the three correlation coefficients, Kendall's τ showed more variation from one region to another. Figure 12 shows that the simulated data tend to concentrate along the main diagonal with the higher Kendall's τ value (Regions II and III, with Kendall's τ values of 0.839 and 0.846, respectively), and tend to disperse along the main diagonal with the lower Kendall's τ value (Regions I and IV with Kendall's τ values of 0.749 and 0.748 and, respectively). This shows that the behavior of a bivariate drought sample may change when the degree of association (e.g., Kendall's τ) between the variables is considered. These results are in accordance with the findings of [61] by using Frank and Gumbel copulas. The contour lines of joint return (standard) periods for 5, 10, 20, 50, 100, 200, and 500 years, computed using Equation (9), are shown in Figure 12. Since the different combination of the correlated drought variables (duration and severity) can occur at the same time, the return periods are shown using contour lines, together with observed and simulated samples.
In drought frequency analysis, the droughts of longer duration and highest severities were always more important, because they can affect water resource planning and may pose a high risk to agriculture. Therefore, historical long-lasting drought events were identified for each region. In the case of Region I, the longest drought event recorded among 55 stations lasted for 13 months (Table  3), at Jecheon station. It lasted from April 2008 to April 2009, with the corresponding severity of 16.04; the joint return period of this event is more than 1000 years. The second longest drought event was recorded at Suncheon and Miryang stations, with the duration of 10 months, from April 1988 to January 1989; their corresponding severities were 16.98 and 16.46, respectively, and the joint return period was closer to 500 years. Simultaneous occurrence of drought events was observed at both stations, and surrounding stations may have similar precipitation patterns.
(a)  Since extreme drought events are more of a concern in drought frequency analysis, only droughts of the longest duration and highest severities were selected for spatial representation of drought risk maps. In this study, the drought events with D ≥ 6 months and S ≥ 6.5 were considered, being the droughts of longest duration and highest severities. This criterion was chosen because agriculture and surface water supplies are likely to be affected after six months. Additionally, this criteria was also recommend in previous studies [62]. On the basis of the copula-based bivariate drought return period, the associated drought risk (R) can be computed using the equation to impose the probability of extreme drought with aŤ D, S year return period for N years of life span . Figure 13 indicates spatial patterns of the regional drought risks, using theŤ D, S observed at each site for the life spans of 10 and 50 years. In Figure 13, green, yellow, and red colors indicate the areas of low, moderate, and high drought risks, respectively, and their values vary from 0 to 1. The drought risk map for N = 10 years shows low or nearly moderate drought risk, except on the southwestern coast (areas surrounding Jangheung, Goheung, Wando, and Mokpo stations) and the east coast (Uljin and Yeongdeok stations) of South Korea (Figure 13a). In case when the N = 50 years, most of the areas fall under the category of moderate and high drought risk, except for the area surrounding Yeongju, Mungyeong, Ganghwa, and Chuncheon stations (Figure 13b). Consequently, those regions which are located at the southern (Region I) and eastern coast (Region IV) of South Korea are under high drought risk, compared to other regions. This may be due to changes in precipitation patterns caused by the "Changma front"; thus, the rainy season becomes short, but the amount of the rainfall and the number of heavy rainfall days have increased at the southern and eastern coasts of South Korea [20,64]. Precipitation trend analysis showed the decline in annual mean precipitation on the southwest coast of South Korea [21], which is therefore more vulnerable to drought risk. Kim et al. [3] showed that extreme historical drought events mostly occurred in the central and southern regions of South Korea.

Comparison of the Bivariate Return Periods
Comparison of multivariate drought frequency analysis was performed using the best-fitted copulas for each region (Table 9). T , ("and"), T , ("or"), T , * (Kendall's), and T | , T | (conditional) joint return periods were computed for the duration of 2, 5, 10, 20, 50, 100, 200, 500, and 1000 years for the four regions, accounting for the bivariate drought properties (Table 9). T , , T , , T , * , T | , and T | were computed using Equations (8)-(12), respectively. Since the primary, secondary, and conditional return periods explain the different situations, the preference of the return period may change, based on what type of drought risk needs to be evaluated for the area under study. The different levels of drought risks can be used to assess the malfunction of a specific water demand and supply system. For example, a particular water demand and supply system in Region I may be unable to supply water under a condition of drought severity exceeding 5.67, given a drought duration exceeding 4.41 months. The return period in this situation will be equal to 30.16 years, computed using Equation (12). Table 9 shows the difference in the return period, according to the considered type of risk. For example, for Region I, at a univariate return period of 100 years, the values of drought duration and severities are 8.50 months and 9.65, respectively. However, joint bivariate return periods of drought variables show the return period being 9.95 years for T , (D ≥ 8.50 and S ≥ 9.65) and 8.28 years for T , (D ≥ 8.50 or S ≥ 9.65). Similarly, conditional and secondary return periods for T | , T | , and

Comparison of the Bivariate Return Periods
Comparison of multivariate drought frequency analysis was performed using the best-fitted copulas for each region (Table 9).T D, S ("and"),Ť D, S ("or"), T * D, S (Kendall's), and T D|S ≥ s , T S|D ≥ d (conditional) joint return periods were computed for the duration of 2, 5, 10, 20, 50, 100, 200, 500, and 1000 years for the four regions, accounting for the bivariate drought properties (Table 9).T D, S ,Ť D, S , T * D, S , T D|S ≥ s , and T S|D ≥ d were computed using Equations (8)-(12), respectively. Since the primary, secondary, and conditional return periods explain the different situations, the preference of the return period may change, based on what type of drought risk needs to be evaluated for the area under study. The different levels of drought risks can be used to assess the malfunction of a specific water demand and supply system. For example, a particular water demand and supply system in Region I may be unable to supply water under a condition of drought severity exceeding 5.67, given a drought duration exceeding 4.41 months. The return period in this situation will be equal to 30.16 years, computed using Equation (12).  Table 9 shows the difference in the return period, according to the considered type of risk. For example, for Region I, at a univariate return period of 100 years, the values of drought duration and severities are 8.50 months and 9.65, respectively. However, joint bivariate return periods of drought variables show the return period being 9.95 years forT D, S (D ≥ 8.50 and S ≥ 9.65) and 8.28 years foř T D, S (D ≥ 8.50 or S ≥ 9.65). Similarly, conditional and secondary return periods for T D|S ≥ s , T S|D ≥ d , and T * D, S are 96.04, 84.56, and 9.14 years, respectively. Furthermore, Table 9 shows that Kendall return periods T * D, S are always greater than theŤ D, S return periods and smaller than theT D, S return periods. T D, S is always smaller thanT D, S because the probability of occurrence of two cases simultaneously is always smaller than when only one of the two cases occur. The results match well with the finding of [61]. Moreover, it is noticed that the difference between T * D, S andŤ D, S increases with an increase in critical probability level t. Results showed that conditional return periods T D|S ≥ s and T S|D ≥ d are always greater than the primary and secondary return periodsT D, S ,Ť D, S , and T * D, S , and the conditional return period T D|S ≥ s was greater than T S|D ≥ d . However, direct comparison of different return periods is a difficult task, because each type of return period has a different physical meaning in drought risk analysis.

Conclusions
In the presence of complex topographical and climatic features of South Korea, precipitation varies both in space and time. Therefore, drought is becoming a frequently-occurring phenomena in different parts of the country. Droughts are recognized as the main obstacle to effective water resource management and planning. Regionalization of drought events (extracted using SPI-6) is performed using the HCPC algorithm (a blend of Ward's classification and PCA approach) and bivariate homogeneity tests. Results of bivariate regionalization of drought showed that the whole of South Korea can be divided into four homogeneous regions, which have been tested to be robust. In drought frequency analysis, the lack of lengthy records reduces the reliability of quantile estimation. To overcome this problem, regional frequency analysis of drought on a multivariate framework was performed for the four identified homogeneous regions. Various probability distributions were tested and evaluated, using both quantitative (Z-statistic) and qualitative (visual comparison) goodness-of-fit tests, in order to fit drought duration and severity for all regions. Dependence structures between drought variables were assessed, using various graphical diagnostic tools and correlation coefficients. For the construction of joint distributions between drought variables, best-fitted copulas were identified on the basis of goodness-of-fit test statistics (S n and AIC), and the visual comparison between empirical vs theoretical values. Potential drought risks for all regions across South Korea were evaluated and compared through primary, secondary, and conditional types of joint return periods. The primary conclusions determined from this study are as follows: 1.
The spatial distribution of mean drought duration and severity from 1980 to 2015 indicates that the droughts of longest duration and highest severities were recorded at southwestern coastal areas of South Korea, which is due to the unusual precipitation patterns along the coast.

2.
The Z statistics and L-moment ratio diagram suggested that among the five candidate distributions evaluated for both drought duration and severity, only the PE3 distribution fits better for the drought severity of Regions II, III, and IV. All regions of drought duration showed poor fit because of a large number of ties, especially at short drought durations. Therefore, a more general and robust Kappa distribution model was used instead for the regions having higher Z statistic values (>1.64). The further evaluation of the selected probability distributions, using a visual comparison between empirical and theoretical probabilities, showed that PE3 and Kappa distributions are better able to simulate the drought variables across the region.

3.
Results of Chi plot, Kendall plots, Pearson's r, Spearman's ρ and Kendall's τ correlation coefficients showed the significant positive dependence between the drought variables, and thus indicate the suitability of drought variables for joint modeling.

4.
Goodness-of-fit statistics (S n and AIC), based on Rosenblatt's transformation and visual comparison between empirical and theoretical copula functions, indicate that among six copulas, the bivariate Gaussian copula is better able to simulate the drought variables for Regions I and IV, and the bivariate Frank copula for Regions II and III.

5.
Evaluation of best-fitted copulas by comparing observed and simulated random samples (5000 samples), showed fairly good agreement. It is noticed that Kendall's τ is more sensitive to drought variables as compared to Pearson's r and Spearman's ρ correlation coefficients, as Kendall's τ varies significantly from one region to another. Furthermore, the structure of the simulated data changes, according to the degree of association (e.g., Kendall's τ) between drought variables. 6.
The drought risk maps derived for 10 and 50 year life spans, using the droughts of longest duration and highest severities, showed that the southwestern coast and eastern coastal areas are under high drought risk, and that inland mid-latitude areas (areas surrounding Yeongju station) and northwestern parts are under low drought risk. This is due to the effect of monsoons to the summer precipitation patterns and convective system within the air mass on the southern and eastern coast of South Korea.

7.
Comparison between the different types of return periods showed that secondary return periods T * D, S are always greater than theŤ D, S return periods and smaller than theT D, S return periods, andŤ D, S return periods are always smaller thanT D, S return periods. Moreover, conditional return periods T D|S ≥ s and T S|D ≥ d are always greater than the primary return periodsT D, S anď T D, S , and the secondary return period T * D, S . Since each type of return period explains a different situation, the preference of the return period may change based on type of drought risk that needs to be evaluated for the area under study.