Bayesian Inference in Extremes Using the Four-Parameter Kappa Distribution

Abstract: Maximum likelihood estimation (MLE) of the four-parameter kappa distribution (K4D) is known to be occasionally unstable for small sample sizes and to be very sensitive to outliers. To overcome this problem, this study proposes a Bayesian analysis of the K4D. Bayesian estimators are obtained from the posterior distribution using the random walk Metropolis–Hastings algorithm. Five different priors are considered. The properties of the Bayesian estimators are verified in a simulation study. The empirical Bayesian method turns out to work well. Our approach is then compared to the MLE and the method of L-moments estimator by calculating the 20-year return level, the confidence interval, and various goodness-of-fit measures. It is also compared to modeling using the generalized extreme value distribution. We illustrate the usefulness of our approach in an application to the annual maximum wind speeds in Udon Thani, Thailand, and to the annual maximum sea-levels in Fremantle, Australia. In the latter example, non-stationarity is modeled through a trend in time on the location parameter. We conclude that Bayesian inference for the K4D may be substantially useful for modeling extreme events.


Introduction
Introduced by Hosking [1], the four-parameter kappa distribution (K4D) is a generalized form of some frequently used distributions such as the generalized Pareto, logistic, Gumbel, and extreme value (GEV) distributions. The flexibility of the K4D enables its wide application in fitting data when three parameters are not sufficient. The K4D has been used in many fields, including hydrological sciences (e.g., [2][3][4][5][6][7][8][9]). Blum et al. [10] found that the K4D provides a very good representation of daily streamflow across most physiographic regions in the conterminous United States.
The method of L-moments estimation (LME) has been employed for parameter estimation in the K4D as it provides a reliable and robust parameter estimator, particularly for small sample sizes. However, the LME is sometimes non-feasible with the K4D, in that the estimator is not computable within the feasible parameter space [4,11]. Dupuis and Winchester [11] compared the LME and the maximum likelihood estimator (MLE) of the K4D by simulation. The MLE is optimal for large samples and is applicable to various situations, including non-stationary models. Nonetheless, we found in this study that the MLE of the K4D with a small sample produces relatively large variances of the estimated parameters compared to the LME.
The MLE has a larger sampling variance than the LME, whereas the LME is positively biased. The MLE for the generalized extreme value distribution (GEVD) shows a smaller variance than that for the K4D, which may indicate that the additional shape parameter h inflates the sampling variability of the MLE. The cumulative distribution function of the K4D is

F(x) = [1 − h{1 − k(x − ξ)/α}^(1/k)]^(1/h),

where k and h are shape parameters, ξ is a location parameter, and α is a scale parameter. The probability density function (pdf) is:

f(x) = α^(−1) {1 − k(x − ξ)/α}^(1/k − 1) {F(x)}^(1 − h).

The changes in extremes are often described in terms of the changes in extreme quantiles. These values are obtained by solving the equation F(x) = 1 − 1/T with respect to x for a given year T; that is, by inverting the following quantile function:

x(F) = ξ + (α/k) [1 − {(1 − F^h)/h}^k].

Here, T is called a return period. The value x(1 − 1/T) is referred to as the return level (or return value) associated with the return period T, because the value x(1 − 1/T) is expected to be exceeded on average once every T years [12]. For example, a 20-year return value is computed as the 95th percentile of the fitted K4D from x(0.95) (because 1 − 1/20 = 0.95), and a 50-year return level as the 98th percentile.
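For concreteness, the distribution function, density, and quantile function of the K4D (for k ≠ 0 and h ≠ 0) can be sketched in a few lines of Python; the function names are ours, and the paper's own computations use R:

```python
def k4d_cdf(x, xi, alpha, k, h):
    # F(x) = [1 - h {1 - k (x - xi)/alpha}^(1/k)]^(1/h), for k != 0, h != 0
    t = (1.0 - k * (x - xi) / alpha) ** (1.0 / k)
    return (1.0 - h * t) ** (1.0 / h)

def k4d_pdf(x, xi, alpha, k, h):
    # f(x) = alpha^(-1) {1 - k (x - xi)/alpha}^(1/k - 1) F(x)^(1 - h)
    u = 1.0 - k * (x - xi) / alpha
    return u ** (1.0 / k - 1.0) * k4d_cdf(x, xi, alpha, k, h) ** (1.0 - h) / alpha

def k4d_quantile(F, xi, alpha, k, h):
    # inverse of the CDF: x(F) = xi + (alpha/k) [1 - {(1 - F^h)/h}^k]
    return xi + (alpha / k) * (1.0 - ((1.0 - F ** h) / h) ** k)

# a 20-year return level is the 0.95 quantile, since 1 - 1/20 = 0.95
rl20 = k4d_quantile(0.95, 0.0, 1.0, 0.2, 0.5)
```

The quantile function is the exact algebraic inverse of the CDF, so a round trip x(F) → F(x) recovers F; the parameter values used for rl20 are illustrative only.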
As shown in Figure 1, the K4D includes many distributions as special cases: the generalized Pareto distribution for h = 1, the GEVD [12,21] for h = 0, the generalized logistic distribution (GLD) for h = −1, and the generalized Gumbel distribution [22] for k = 0. Thus, the K4D is very flexible and widely applicable to many situations including not only extreme values, but also skewed data. Figure 2 shows density plots in which the parameters of the K4D are estimated by different methods. Figure S1 in the SM presents some shapes of the pdf of K4D for various combinations of two shape parameters (k, h) while the scale and location parameters are fixed at α = 1 and ξ = 0.
Parida [3] and Park and Jung [4] employed this distribution to analyze annual maximum daily precipitation in India and in Korea, respectively. It has also been used for the analysis of extreme flood flow series [7] and maximum wind speed [9]. Hosking and Wallis [2], Wallis et al. [6], and Kjeldsen et al. [8] adopted the K4D for regional frequency analysis of daily rainfall extremes and flood flow in the U.K. Shin et al. [23] considered the r-largest K4D. Dupuis and Winchester [11] examined this distribution within the ranges of k ∈ (−0.5, 0.5) and h ∈ (−1.2, 1.2). The range of k ∈ (−0.5, 0.5) is based on the fact that, when h = 0, the distribution has finite variance for k > −0.5 and is regular (has finite Fisher information) for k < 0.5 [12]. We therefore confine this study to within these ranges of parameters.

Maximum Likelihood Estimation
The MLE is the most popular method of estimation and inference for parametric models. Suppose x = (x_1, x_2, ..., x_n) is a sample of size n obtained from the K4D with pdf f(x). Assuming k ≠ 0 and h ≠ 0, the negative log-likelihood function of the parameters θ = (ξ, α, k, h) is written as:

ℓ(θ|x) = n log α − (1/k − 1) Σ_{i=1}^{n} log u_i − (1 − h) Σ_{i=1}^{n} log F(x_i),

where u_i = 1 − k(x_i − ξ)/α and F is the cumulative distribution function of the K4D. The MLEs are obtained by minimizing ℓ(θ|x), setting its first partial derivatives with respect to θ to zero. Since no explicit minimizer is possible, a quasi-Newton algorithm is implemented using the "optim" function in the R program [24] to minimize ℓ(θ|x). Park and Park [25] calculated the MLE of the K4D parameters employing a penalty method.
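As a sketch (not the paper's R implementation), the negative log-likelihood can be coded directly from the pdf. The guards returning +inf let a numerical minimizer such as R's "optim" or scipy.optimize.minimize reject parameter values outside the support; the sample is simulated by inverse transform from the quantile function, and all names here are ours:

```python
import math, random

def k4d_quantile(F, xi, alpha, k, h):
    # inverse CDF of the K4D (k != 0, h != 0), used here to simulate data
    return xi + (alpha / k) * (1.0 - ((1.0 - F ** h) / h) ** k)

def k4d_nll(theta, xs):
    # negative log-likelihood of the K4D for k != 0 and h != 0; returns
    # +inf outside the feasible region so a minimizer can reject it
    xi, alpha, k, h = theta
    if alpha <= 0.0:
        return float("inf")
    total = len(xs) * math.log(alpha)
    for x in xs:
        u = 1.0 - k * (x - xi) / alpha        # must stay positive
        if u <= 0.0:
            return float("inf")
        w = 1.0 - h * u ** (1.0 / k)          # equals F(x)^h; must stay positive
        if w <= 0.0:
            return float("inf")
        # -log f(x) = log(alpha) - (1/k - 1) log u - (1 - h)/h * log w
        total -= (1.0 / k - 1.0) * math.log(u) + (1.0 - h) / h * math.log(w)
    return total

# simulate a small sample by inverse transform and evaluate the likelihood
rng = random.Random(42)
true_theta = (0.0, 1.0, -0.15, 0.5)
sample = [k4d_quantile(rng.random(), *true_theta) for _ in range(100)]
```

Since (1 − h) log F(x) = ((1 − h)/h) log w when F(x) = w^(1/h), the loop accumulates the same terms as the displayed negative log-likelihood.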
We find that the MLE of the K4D with a small sample performs poorly and is unstable, especially when the shape parameter k is negative. It produces relatively large variances of the estimated parameters compared to the LME (Figure 2). This poor performance of the MLE with a small sample may be due to an intrinsic property of the K4D, or to the addition of one more parameter (with more constraints) relative to the GEVD, which makes it harder for the optimization routine to reach the true solution. The K4D, however, may yield less biased estimates than the GEVD does, by the bias-variance trade-off principle.

L-Moments Estimation
The main benefits of using LME are that the parameter estimates are more credible than the method of moments estimates, particularly for small samples, and are computationally more efficient than the MLE. LME is known to work better than the MLE for small sample sizes. Moreover, it is less sensitive to outliers [2].
Hosking [26] introduced L-moments as linear combinations of expectations of order statistics. The sample L-moments, unbiased estimators of the population L-moments, are calculated from a sample. The LME is obtained by solving a system of equations that equates population L-moments to the corresponding sample L-moments. This system is sometimes solved numerically using Newton-Raphson iterations, which do not always converge to a solution. Dupuis and Winchester [11] reported that this non-feasibility can be observed in up to 40% of trials for the K4D. LME has been used widely in many fields, including environmental sciences [8,27]. The details are omitted here because it is a standard method for analyzing skewed data and extreme values. See [1,2] for the L-moments formulas and the formula for obtaining the LME of the K4D.
The LME of the K4D was used for regional flood frequency analysis in Kjeldsen et al. [8] and Hosking and Wallis [2]. Parida [3] and Moses and Parida [9] used the LME of the K4D for modeling Indian summer monsoon rainfall and Botswana's monthly maximum wind speed, respectively. Murshed et al. [7] considered the higher order L-moment estimator in the K4D with application to annual maximum flood and sea-level data. Dupuis and Winchester [11] compared the performance of the MLE and LME of the K4D. The disadvantage of LME is that it is difficult to apply it to non-stationary or multivariate data. For this study, we used the R package "lmom" developed by Hosking [26] to calculate the LME of the K4D.

Bayesian Inference
The main difference between classical frequentist theory and Bayesian estimation is that the latter treats parameters as random variables described by a prior distribution. This prior distribution is combined with the likelihood to obtain the posterior of the parameters, on which the statistical inference is based. We now set up a framework to incorporate prior information π(θ) with the data. Let us assume that x = (x_1, ..., x_n) are independent realizations of a random variable whose probability density belongs to a parametric family {f(x; θ) : θ ∈ Θ}.
The posterior distribution is proportional to the product of the likelihood function and the prior, but the computation of the normalizing constant ∫_Θ π(θ)L(θ|x) dθ can be problematic, especially when θ is a high-dimensional vector. Although this can be addressed in various ways, recent advances in Markov chain Monte Carlo (MCMC) methods have revolutionized the subject, enabling statistical inference for models that would otherwise not be tractable [28]. Our approach to obtaining the posterior is the random walk Metropolis-Hastings algorithm; the conditional posterior distributions and the updates for each parameter, starting with k, are described in the following subsection.

Computation by Markov Chain Monte Carlo
Using the MCMC technique, we pursue generating (stationary) sequences of simulated values from the posterior distribution. The Metropolis-Hastings algorithm was developed by Metropolis et al. [29] and generalized by Hastings [30], and it is now the most popular MCMC method (see [31][32][33], for example).
The random variables x are distributed as the K4D with the likelihood function L(k, h, α, ξ|x). To specify the prior, the parametrization φ = log(α) is applied because α should be positive. The independent prior density is set as:

π(θ) = π_k(k) π_h(h) π_φ(φ) π_ξ(ξ),

where π_k(k), π_h(h), π_φ(φ), and π_ξ(ξ) are the marginal priors of k, h, φ, and ξ, respectively. Our interest is the posterior density π(k, h, φ, ξ|x), which is of the form:

π(k, h, φ, ξ|x) ∝ π_k(k) π_h(h) π_φ(φ) π_ξ(ξ) L(k, h, φ, ξ|x).

Hence, the conditionals of the posterior distribution of the parameters are, for example,

π(k|h, φ, ξ, x) ∝ π_k(k) L(k, h, φ, ξ|x),

and similarly for h, φ, and ξ. To simulate from each of these conditionals, a Metropolis step with a random walk update is employed, because the posterior is a multivariate density with no closed form [34].
In the MCMC algorithm, r = 2000 is the burn-in period, and the first r values are discarded from the chain. The maximum a posteriori (MAP) estimator is the retained draw with the highest posterior density:

θ_MAP = argmax_{θ^(i)} π(θ^(i)|x), i = r + 1, ..., R,

where R is the number of iterations in the Metropolis-Hastings algorithm. We set R = 10,000. The full details of the random walk Metropolis-Hastings algorithm employed in this study are described as follows:

1. Start with initial values (k^(0), h^(0), φ^(0), ξ^(0)).
2. At iteration i, generate a so-called "proposed value" k* ∼ N(k^(i−1), s_k^2), where s_k, the proposal standard deviation for k, is specified. Set k^(i) = k* with acceptance probability min{1, π(k*|h^(i−1), φ^(i−1), ξ^(i−1), x)/π(k^(i−1)|h^(i−1), φ^(i−1), ξ^(i−1), x)}; otherwise, set k^(i) = k^(i−1).
3. Update h in the same way with proposal standard deviation s_h, conditioning on the most recent values of the other parameters.
4. Update φ in the same way with proposal standard deviation s_φ.
5. Update ξ in the same way with proposal standard deviation s_ξ.
6. Repeat steps 2-5 for i = 1, ..., R.
The final step is to transform our chain for φ using α (i) = exp(φ (i) ) to regain our original scale parameter.
Here, the proposal standard deviations (s_k, s_h, s_φ, and s_ξ) were chosen by a calibration process that keeps the acceptance rates near 30%, say between 25% and 35% [35]. At the start of the algorithm, all proposal standard deviations are initialized to 1.0. When a run of the Metropolis-Hastings algorithm is complete, the acceptance rate (AR) is computed. If the AR is greater than 35%, the standard deviations are increased; if it is smaller than 25%, they are decreased. The Metropolis-Hastings algorithm is then run again with the updated standard deviations, until the AR falls between 25% and 35%.
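The componentwise random-walk Metropolis update and the acceptance-rate calibration described above can be sketched as follows. For brevity the demo targets a toy two-parameter standard normal log-posterior rather than the K4D posterior, but the update and tuning logic are the same; all names and the 1.5 scaling factor are our illustrative choices:

```python
import math, random

def rw_metropolis(logpost, theta0, sds, R=10000, burn=2000, seed=1):
    # componentwise random-walk Metropolis: for each parameter j, propose
    # theta_j* ~ N(theta_j^(i-1), s_j^2) and accept with probability
    # min{1, pi(theta*|x) / pi(theta^(i-1)|x)}
    rng = random.Random(seed)
    theta, lp = list(theta0), logpost(theta0)
    chain, accepts = [], [0] * len(theta0)
    for i in range(R):
        for j in range(len(theta)):
            prop = list(theta)
            prop[j] = rng.gauss(theta[j], sds[j])
            lp_prop = logpost(prop)
            if lp_prop - lp >= 0 or rng.random() < math.exp(lp_prop - lp):
                theta, lp = prop, lp_prop
                accepts[j] += 1
        if i >= burn:
            chain.append(list(theta))
    return chain, [a / R for a in accepts]

def calibrate(logpost, theta0, sds, lo=0.25, hi=0.35, rounds=20):
    # acceptance-rate tuning described above: enlarge a proposal standard
    # deviation when its acceptance rate exceeds 35%, shrink it when below
    # 25%, and rerun until every rate falls in [25%, 35%]
    sds = list(sds)
    for _ in range(rounds):
        _, rates = rw_metropolis(logpost, theta0, sds, R=2000, burn=500)
        if all(lo <= r <= hi for r in rates):
            break
        sds = [s * 1.5 if r > hi else s / 1.5 if r < lo else s
               for s, r in zip(sds, rates)]
    return sds

# demo target: a toy two-parameter standard normal log-posterior
toy_logpost = lambda t: -0.5 * sum(v * v for v in t)
tuned_sds = calibrate(toy_logpost, [0.0, 0.0], [1.0, 1.0])
```

Working on the log scale avoids underflow in the acceptance ratio; enlarging a proposal standard deviation lowers the acceptance rate, which is why a rate above 35% triggers an increase.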

Prior Specification
The principal argument against a Bayesian analysis is that the results are sometimes too sensitive to the choice of the prior distribution. Previous research on similar models or relevant data has provided some details on the prior distribution. Many researchers have applied the GEVD to hydrologic data. Some studies have offered a more exact constraint on the range of the shape parameters. Martins and Stedinger [36] found that −0.3 to 0 is the most likely range for the shape parameter for GEVD. Sometimes, confusion arises regarding the sign of k between Martins and Stedinger [36] and Coles [12]. For consistency with the definition of the K4D, we follow the former.
Previous MLE research on long records of heavy rainfall in the U.K. has found the mean estimate of the shape parameter to be −0.1 with a standard deviation of 0.1 [37]. Extensive empirical evidence was provided by Koutsoyiannis [38], who examined 169 datasets of extreme rainfall records worldwide with durations of at least 100 years. The investigation concluded that the mean of the estimated shape parameter is about −0.15, with a small standard deviation. It also found that 91% of the MLEs for the shape parameter were negative, which adds further weight to the belief that the shape parameter is negative for heavy rainfall. Based on this information, we set the following priors:
P1: Non-informative priors on both shape parameters: k ∼ N(0, 1) and h ∼ N(0, 2). Given that the ranges of k and h are (−0.5, 0.5) and (−1.2, 1.2), respectively, the variances of these priors make the distributions almost non-informative.
P2: Priors based on the hydrologic experience that the shape parameter of the GEVD is between −0.3 and 0 for rainfall data [36,38]: k ∼ N(−0.15, 0.2) and h ∼ N(−0.15, 0.5).
P3: Normal priors with means of −0.15 and 0.15, respectively, but with different variances reflecting different levels of uncertainty, because there is no previous evidence on h. Note that the sign of the mean for h is positive: k ∼ N(−0.15, 0.01) and h ∼ N(0.15, 0.25^2).
P4: Tight normal distributions around the shape parameters to assume the best scenario: k ∼ N(k_opt, 0.01) and h ∼ N(h_opt, 0.25^2). In real applications, the mean values k_opt and h_opt of k and h are estimated by LME from the data.
P5: When k < 0 and h < 0, negative priors correspond to k = −|γ_1
The non-informative prior (P1) is least sensitive to the selection of priors and still covers the range of true shape parameters. We therefore expect the Bayes estimator with P1 to behave similarly to the MLE and LME, while still supplying a more comprehensive summary of the analysis, such as posterior density estimates of parameters and return values. A feature of P4 is that the mean values of k and h are estimated by LME from the data, which can be viewed as an empirical Bayes approach.

Simulation Study
We now compare the performances of the MLE, LME, and BEs for the K4D. A total of 1000 samples are generated to calculate the bias (BIAS) and the root mean squared error (RMSE) of each parameter estimator, and the bias (BIASQ) and the root mean squared error (RMSEQ) of the 95th percentile estimator. These are defined as:

BIAS = (1/1000) Σ_{i=1}^{1000} (θ̂_i − θ), RMSE = √[(1/1000) Σ_{i=1}^{1000} (θ̂_i − θ)^2],

BIASQ = (1/1000) Σ_{i=1}^{1000} (q̂_i − q_t), RMSEQ = √[(1/1000) Σ_{i=1}^{1000} (q̂_i − q_t)^2],

where θ̂_i and θ are the estimated and true parameters, respectively, and q̂_i and q_t are the estimated and true quantiles, respectively. The average CPU times are reported in Table S1 in the SM (Supplementary Material) for simulation sample sizes of 30, 50, 100, and 200. Figure S2 in the SM shows the estimation bias of the shape parameter k. We can see that the MLE and LME are positively biased, while the BEs are negatively biased. The BE with prior P4 yields consistently good estimates. All estimators have decreasing bias as k and h increase. It is interesting to note that the BE with P1 (non-informative prior) has a relatively small bias compared to the others. Figure S3 in the SM shows the estimation bias of h. The MLE is negatively biased, while the LME is still positively biased. The BEs have a similar pattern and relative performance as in Figure 4.
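The four criteria reduce to two helper functions applied either to the parameter estimates (BIAS, RMSE) or to the quantile estimates (BIASQ, RMSEQ); a minimal sketch, with function names of our choosing:

```python
import math

def bias(estimates, true_value):
    # BIAS: average deviation of the estimates from the true value
    return sum(e - true_value for e in estimates) / len(estimates)

def rmse(estimates, true_value):
    # RMSE: root mean squared error of the estimates
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))
```

In the simulation, the `estimates` list would hold the 1000 values of θ̂_i (or q̂_i) produced by a given method.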
The comparisons of the RMSE for the shape parameter k are given in Figure 3. For negative k, the MLE and LME perform relatively poorly, but they do not perform as badly for positive k compared to the BEs.
For positive k, the RMSE of the BEs increases as h increases. The BEs work well in general, except for high positive values of k and h. In Figure 4, the MLE has the highest RMSE in estimating h. The BEs work better than the LME for negative k and similarly to the LME for positive k in the RMSE sense. Based on the bias and RMSE results for the k and h estimates, we observe that the BEs work well. The comparisons of the RMSEQ are given in Figure 6. All the methods have decreasing RMSEQ as k and h increase. The MLE and the BE with P1 do not work well for negative k, while the MLE and LME work well for positive k. The BE with P3 works well for negative k but poorly for positive k. From the results for BIASQ and RMSEQ, we can infer that the MLE and LME estimate quantiles well when k is positive, even though they do not estimate the shape parameters well in some cases. The prior P4 results in low RMSEQ values over all combinations of k and h, possibly because the mean values of k and h are estimated by LME from the data. This result suggests that an empirical Bayesian method works well for the K4D. Overall, the BEs (especially with P4) work well, even though there is no universal winner. The results of the Bayesian, maximum likelihood, and L-moments methods are consistent with each other, whereas the Bayesian analysis makes an easy transformation possible across scales, independent of the feasibility of the LME and the asymptotic optimality of the MLE. Given that the MLE has a relatively large variance, and that the LME is sometimes non-feasible for the K4D and not applicable to non-stationary data, we infer that the Bayesian framework (especially an empirical Bayes method) offers substantial benefits for the analysis of extreme events [12].

Maximum Wind Speeds in Udon Thani, Thailand
Our example data consist of 26 annual maximum wind speeds (unit: km/h) in Udon Thani, Thailand, from 1990 to 2015, collected by the Meteorological Department of Thailand. The data are provided in the SM. Table 1 presents summary statistics including the parameter estimates with standard errors in parentheses, the 20-year return levels, and 95% confidence intervals. Table 2 provides goodness-of-fit measures including the Kolmogorov-Smirnov (K-S) test statistic, the Anderson-Darling (A-D) test statistic, and information criteria: the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the deviance information criterion (DIC). For all these measures, smaller is better. The MLE and LME for the GEV distribution are also included for comparison.
The DIC is a Bayesian analogue of the AIC. It has been successfully employed in model selection problems in which the posterior distributions of the models are obtained by MCMC simulation. It is defined as:

DIC = D(θ̄) + 2 p_D = D̄ + p_D,

where D(θ̄) is the deviance evaluated at the posterior mean θ̄ = E_{θ|x}[θ], D̄ = E_{θ|x}[D(θ)] is the posterior expectation of the deviance, and p_D = D̄ − D(θ̄) is the effective number of parameters. Here, the deviance is defined as D(θ) = −2 log L(θ|x) [28,39]. A credible interval for the Bayesian estimate is obtained using the highest posterior density (HPD). The 95% confidence intervals (CI) for the 20-year return level based on the MLE and LME are also obtained for comparison purposes. The profile likelihood method and the bias-corrected and accelerated (BCa) bootstrap method are also considered in calculating the CI. The details of these methods are provided in the SM.
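Given the retained posterior draws from the MCMC run, the DIC needs only the deviance of each draw and the deviance at the posterior mean; a minimal sketch under that assumption (the names are ours):

```python
def dic(deviances, dev_at_posterior_mean):
    # deviances: D(theta_i) = -2 log L(theta_i|x) for each retained draw
    # dev_at_posterior_mean: D(theta_bar), the deviance at the posterior mean
    d_bar = sum(deviances) / len(deviances)
    p_d = d_bar - dev_at_posterior_mean      # effective number of parameters
    return dev_at_posterior_mean + 2.0 * p_d  # equals d_bar + p_d
```

The identity D(θ̄) + 2p_D = D̄ + p_D follows directly from the definition of p_D.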
In Bayesian statistics, a credible (or posterior) interval is an interval that contains an unobserved parameter value with a given posterior probability [28,39]. Credible intervals are not unique for a given posterior or predictive distribution. Among the methods for defining an appropriate credible interval, the HPD interval chooses the narrowest one, which consists of the values of highest probability density and includes the maximum a posteriori estimate for a unimodal posterior distribution [39].
The HPD interval is detailed as follows. The 95% HPD interval consists of those values of θ with the highest posterior density whose total probability is 0.95. A 100(1 − β)% HPD region for θ is a subset B ⊂ Θ defined by:

B = {θ ∈ Θ : π(θ|x) ≥ λ(β)},

where λ(β) is the largest constant satisfying:

P(θ ∈ B|x) ≥ 1 − β.

The value λ(β) can be thought of as a horizontal line placed over the posterior density, whose intersections with the posterior define regions with probability 1 − β [39]. We calculated the HPD interval using the "coda" package in the R program, developed by Plummer et al. [40].
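For a unimodal posterior, the HPD interval can be approximated directly from the MCMC draws by scanning for the narrowest window that contains the desired probability mass, which is in spirit what the "coda" package computes from a chain; a sketch with names of our choosing:

```python
import math

def hpd_interval(draws, cred=0.95):
    # empirical HPD interval: the narrowest interval containing a fraction
    # `cred` of the posterior draws (appropriate for unimodal posteriors)
    s = sorted(draws)
    n = len(s)
    m = max(1, math.ceil(cred * n))          # draws the interval must cover
    best = min(range(n - m + 1), key=lambda i: s[i + m - 1] - s[i])
    return s[best], s[best + m - 1]
```

Applied to the retained draws of the 20-year return level, this yields the HPD intervals plotted in Figure 8.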
In Table 1, the 20-year return levels calculated from the BEs are greater than the corresponding values from the MLE. This may be due to the uncertainty in parameter estimation that the Bayesian approach implicitly accounts for [12]. In Table 2, the BEs with all five priors yield p-values of the K-S and A-D tests larger than 0.7, which indicates that the maximum wind speed data are well fitted by the K4D with the BEs. Based on the lengths of the confidence intervals for the 20-year return levels, the LME and the BE with P4 work well. The BEs with P1 and P4 have the smallest DIC values. In Tables 1 and 2, the values from the MLE, LME, AIC, and BIC are presented only for comparison with the BEs; in a real Bayesian data analysis, analysts are usually not required to obtain these values, even though one can calculate and use them. Figure 7 shows the probability density plots of the K4D fitted to the example data by the various methods. Except for the BEs with P3 and P5, the methods behave similarly. Figure 8 shows the posterior densities and the HPD intervals of the 20-year return level for the annual maximum wind speeds in Udon Thani, obtained from the BEs. The narrowest and second narrowest HPD intervals come from the BEs with P4 and P1, respectively; the BEs with P3 and P5 provide the widest. As observed in the simulation study, the BE with P4 (the empirical Bayes method) works well in this example as well. For comparison purposes only, Figure S4 in the SM shows the profile log-likelihood function and a 95% confidence interval for the 20-year return value of the annual maximum wind speeds in Udon Thani. The interval is [41.09, 48.13]. Figure 9 shows the densities of the five priors P1 to P5 (dashed lines) considered in Section 4.2 and of the posteriors (solid lines) for each parameter. For priors P3 and P4, the posterior densities of h and k are strongly influenced by the assumed priors.
This is not ideal, but is inevitable for prior P4 because the mean values of h and k in P4 were estimated from the data by the LME. For prior P3, the reason seems to be that the assumed variances of h and k were relatively small compared to those in priors P1, P2, and P5.

Using the method provided in Coles [12] (Section 9.1.2), plots of the predictive distribution of a future annual maximum wind speed obtained by the BEs with the five priors are drawn in Figure 10 on a log10 return-period scale. Across the five priors, the predictive return levels for the 1.6-year and 2.4-year return periods lie between 40 and 55 km/h and between 42 and 60 km/h, respectively, for the annual maximum wind speeds at Udon Thani. The convergence of the MCMC chains can be assessed by diagnostic measures including the potential scale reduction factor (PSRF) [41]. An estimated PSRF close to 1.0 indicates successful convergence of the Markov chains. Table 3 provides the point estimates of the PSRF and the upper bounds of its 95% confidence intervals by iteration in the Markov chains for each parameter, based on prior P1 (Section 4.2), for the Udon Thani data. As the iterations increase, the PSRF values decrease to 1.0. Figure 11 shows the trace plots of the estimated PSRF (solid line) and the upper bounds (dashed line) of its 95% confidence intervals for each parameter. The chains appear to become stationary after about 500 iterations, so a burn-in of 2000 iterations is sufficient.
The effective sample sizes (numbers of independent draws) for the parameters k, h, α, and ξ are 200, 154, 105, and 255, respectively. These numbers are calculated by measuring the autocorrelations within each sequence, based on the formula in [41]. If the effective number of independent draws is small, variances within the sequence have high sampling variability. Thus, among the four parameters, α has the highest sampling variability. Moreover, the values generated by 10,000 iterations of the chain for diagnostic analysis are plotted in Figure S5, which also shows that the chain converges. Figure S6 shows plots of the posterior correlations of the parameters. There appear to be weak non-linear correlations among the parameters, except between h and α. This may cause a problem resulting in long autocorrelation times. To compensate for this correlation, it is practical to run the MCMC considerably longer than the effective sample size would suggest [41].

Maximum Sea-Levels in Fremantle, Australia

The next example data are the annual maximum sea-levels (unit: meter) recorded at Fremantle, near Perth, Western Australia, over the period 1897-1989. The data were analyzed in Coles [12] and are available from the website therein. As seen in Figure 12, there is a discernible increase in the data over time, although the increase seems marginal in recent years. A linear trend model for the location parameter is therefore introduced, with the scale and shape parameters remaining constant:

ξ(t) = ξ_0 + ξ_1 t,

where t denotes the year. Inference is now sought for the five parameters through the posterior distribution. Prior distributions are specified as follows, with a reparameterization φ = log(α): ξ_0 ∼ LogN(0, 100), ξ_1 ∼ N(0, 100), φ ∼ N(0, 100), k ∼ N(0, 1), and h ∼ N(0, 2).
The distributions for φ, k, and h are the same as in the non-informative Prior 1 (P1) considered in Section 4.2. To ensure that the location parameter is positive, we set ξ_0 ∼ LogN(0, 100) instead of N(0, 100). Table 4 provides the estimates and DICs when the K4D and the GEV distribution are fitted to the data. Based on the DIC, the Bayesian approach using the non-stationary K4D seems better than the others. It is notable that the Bayes estimate of k for the non-stationary K4D is almost equal to zero, which corresponds to the generalized Gumbel distribution [22]. Figure 13 shows the posterior pdf of each parameter and of the 100-year return level. Given that the pdf of parameter ξ_1 is mostly concentrated on positive values, a positive trend may be suspected. Figure 12 shows the evolution of the 20-year and 100-year return levels over time, with the 95% HPD intervals. The 20-year return level in 1960, for instance, is the level associated with a 20-year return period as of 1960; it is not obtained by extrapolating the time trend over the period 1960-1980, but is calculated from the quantile function of the fitted non-stationary K4D. The above table and figures show that the non-stationary K4D with the Bayes estimates fits these sea-level data well.
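To illustrate how such a non-stationary return level follows from the quantile function rather than from trend extrapolation, the following sketch evaluates the T-year return level in year t under a linear trend in the location parameter only; the function name and any parameter values used with it are illustrative, not the fitted Fremantle estimates:

```python
def nonstat_return_level(T, t, xi0, xi1, alpha, k, h):
    # T-year return level in year t for a non-stationary K4D in which only
    # the location varies over time: xi(t) = xi0 + xi1 * t
    F = 1.0 - 1.0 / T
    return (xi0 + xi1 * t) + (alpha / k) * (1.0 - ((1.0 - F ** h) / h) ** k)
```

With a positive trend coefficient ξ_1, the return level for a fixed return period shifts upward by exactly ξ_1 per year, mirroring the evolution plotted in Figure 12.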
Table 4. Comparison of the estimates and the deviance information criterion (DIC) obtained from the stationary and non-stationary GEVD and K4D, using Bayesian methods with the prior P1 given in Section 4.2, for the annual maximum sea-levels (unit: meter) recorded at Fremantle, Western Australia, 1897-1989.

Figure S7 in the SM shows the MCMC realizations of the posterior density function from 10,000 iterations for estimating the 100-year return level by the Bayesian method with the prior P1 given in Section 4.2, using the non-stationary K4D.

Conclusions and Discussion
This study considers Bayesian inference with five prior distributions for the K4D. A random walk Metropolis-Hastings algorithm is employed for simulating the posterior distribution. The properties of the BEs with the five priors are verified by simulation and compared to those of the MLE and LME. The simulation results reveal that, even though there is no universal winner among the considered estimators, the BE with prior P4 works well. This is because the mean values of k and h are estimated by LME from the data, which suggests that an empirical Bayesian method works well for the K4D. The results of the Bayesian estimation, maximum likelihood analysis, and L-moments method are consistent with each other, whereas the Bayesian analysis makes an easy transformation possible across scales that does not depend on the feasibility of the LME or the asymptotic optimality of the MLE.
In real-world examples, the annual maximum wind speed at Udon Thani, Thailand, was analyzed by the K4D. The data were well fitted by the K4D with the BEs. Based on the length of the confidence intervals on the 20-year return levels, LME and the BE with P4 work well. In another example with the annual maximum sea-level data in Fremantle, Australia, non-stationarity was modeled through a linear trend in time on the location parameter. These examples illustrate the improved efficiency of the Bayesian procedure in the parameter estimation of the K4D.
In general, the expected benefit of the Bayesian analysis is an improvement in the precision of the parameter estimates over the MLE and LME. One reason the BEs work better than the MLE and LME when the shape parameter k is negative is that the mean values of the priors of k are set to negative values. These priors function well for samples derived from the K4D with negative k. If we also adopted a prior reflecting the case of positive k, the results might improve further.
The value h = 1.0 was not included in the simulation study because our pilot MCMC experiments failed with a zero acceptance rate when h = 1.0. We were not successful in finding the reason why the MCMC did not work in this case. An improved method, as in [42], may be needed. We leave this task for future study.
In this study, the prior distributions of the two shape parameters of the K4D were specified independently. Future studies should seek a bivariate prior for the K4D that incorporates joint information on k and h. The R code used to calculate the BEs of the K4D is available in the SM.
Supplementary Materials: The following supporting information is available as part of the online article: www. mdpi.com/xxx/s1. Figure S1. Shape of the probability density functions of the four-parameter kappa distribution with different combinations of k and h at α = 1 and ξ = 0. Figure S2. Average of the estimate bias (BIAS) of k under various combinations of k and h for sample size 30. The acronyms MLE, LME, and Pi represent the maximum likelihood estimator, the method of L-moments estimator, and the Bayesian estimator with prior Pi, respectively. Figure S3. Same as Figure S2, but the BIAS for h. Figure S4. Profile likelihood and 95% confidence interval for the 20-year return levels of the K4D for the annual maximum wind speeds at Udon Thani, Thailand, where the parameters are estimated using the maximum likelihood method. Figure S5. Trace plots of the K4D parameters using five priors for the annual maximum wind speeds data at Udon Thani, Thailand. Figure S6. The posterior correlations of the parameters obtained from the annual maximum wind speed at Udon Thani, Thailand. Figure S7. MCMC realizations of posterior density function and 100-year return level with the nonstationary prior P1. Table S1. Average CPU time per case (seconds/data) using the Intel(R) Core(TM) i5-6300HQ CPU @ 2.30 GHz (RAM 8 GB). R program. R code to calculate the Bayesian estimates of the four-parameter kappa distribution.