Bayesian Inference for the Parameters of Kumaraswamy Distribution via Ranked Set Sampling

Abstract: In this paper, we address the estimation of the parameters of the two-parameter Kumaraswamy distribution using maximum likelihood and Bayesian methods based on simple random sampling, ranked set sampling, and maximum ranked set sampling with unequal samples. Both symmetric and asymmetric Bayes loss functions are used. The Metropolis-Hastings-within-Gibbs algorithm is employed to calculate the Bayes point estimates and credible intervals. We carry out a simulation experiment to compare the proposed point estimators in terms of bias, estimated risk, and relative efficiency, and to evaluate the interval estimators in terms of average confidence interval length and coverage percentage. Finally, a real-life example and concluding remarks are presented.


Introduction
The Kumaraswamy distribution was proposed for double-bounded random processes by [1]. The cumulative distribution function (CDF) and the corresponding probability density function (PDF) can be expressed as
$$F(x; \theta, \beta) = 1 - (1 - x^{\theta})^{\beta}, \quad 0 < x < 1; \ \theta, \beta > 0$$
and
$$f(x; \theta, \beta) = \theta \beta x^{\theta - 1} (1 - x^{\theta})^{\beta - 1}, \quad 0 < x < 1; \ \theta, \beta > 0,$$
respectively. For simplicity, we denote the Kumaraswamy distribution with two positive parameters θ and β as K(θ, β). For varying values of θ and β, the Kumaraswamy distribution shares similar shape properties with the Beta distribution. However, the former is superior to the latter in some respects: no special functions are involved in K(θ, β) or its quantile function; the generation of random variables is simple; and the L-moments and moments of order statistics of K(θ, β) have simple formulas (see [2]). For the PDF of K(θ, β), as shown on the left of Figure 1: when θ > 1 and β > 1, it is unimodal; when θ > 1 and β ≤ 1, it is increasing; when θ ≤ 1 and β > 1, it is decreasing; when θ < 1 and β < 1, it is uniantimodal; and when θ = β = 1, it is constant. The CDF of K(θ, β) (plotted on the right of Figure 1) has an explicit expression, whereas the CDF of the Beta distribution appears in an integral form. Therefore, the Kumaraswamy distribution is considered a substitute model for the Beta distribution in practical terms. For instance, the Kumaraswamy distribution provides an outstanding fit to daily rainfall, individual heights, unemployment numbers, and ball bearing revolutions (see [3][4][5][6]).

Ranked set sampling (RSS), first introduced by [15], is considered to be a more effective method than simple random sampling (SRS) for estimating the mean of a distribution (see [16]). RSS is appropriate in fields where sampling costs much labor and money, and it is widely applied in the domains of the environment, biology, engineering, and agriculture (see [17][18][19][20]). Recently, the authors of Ref. [21] proposed a new sampling design known as the maximum ranked set sampling procedure with unequal samples (MRSSU).
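To make the distribution concrete, the following R sketch implements the PDF, CDF, quantile function, and inverse-CDF random generation for K(θ, β); the function names are illustrative, not from the paper.

```r
# Minimal sketch of the K(theta, beta) distribution functions.
dkum <- function(x, theta, beta) theta * beta * x^(theta - 1) * (1 - x^theta)^(beta - 1)
pkum <- function(x, theta, beta) 1 - (1 - x^theta)^beta
qkum <- function(u, theta, beta) (1 - (1 - u)^(1 / beta))^(1 / theta)
rkum <- function(n, theta, beta) qkum(runif(n), theta, beta)  # inverse-CDF sampling

set.seed(1)
x <- rkum(1000, theta = 2, beta = 3)
```

Because the quantile function is explicit, a uniform draw u is transformed directly into a Kumaraswamy variate, which is what makes random generation for K(θ, β) so simple.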
Two difficulties are overcome in this paper: (a) the maximum likelihood equations are difficult to solve explicitly, which also hampers their use in Big Data problems [22], so we adopt the Newton-Raphson iterative algorithm to derive approximate numerical roots; and (b) the integrals defining the Bayes estimates cannot be expressed in closed form [23], so we use the Metropolis-Hastings-within-Gibbs algorithm to obtain the estimates [24][25][26][27][28].
The organization of this paper is as follows. In Sections 2-4, we consider point and interval estimation for the Kumaraswamy distribution based on three sampling schemes, namely SRS, RSS, and MRSSU, and two estimation methods, namely maximum likelihood and Bayes estimation. Section 5 compares the performance of the proposed estimators, and Section 6 presents an illustrative example. This paper concludes with some comments in Section 7.

Maximum Likelihood Estimation Based on SRS
Consider X = (X_1, X_2, . . . , X_s), an SRS of size s from K(θ, β). The likelihood function is given by
$$L(\theta, \beta \mid x) = \theta^{s} \beta^{s} \prod_{i=1}^{s} x_i^{\theta - 1} (1 - x_i^{\theta})^{\beta - 1}, \tag{2}$$
and the corresponding log-likelihood function is
$$\ell(\theta, \beta) = s \ln \theta + s \ln \beta + (\theta - 1) \sum_{i=1}^{s} \ln x_i + (\beta - 1) \sum_{i=1}^{s} \ln(1 - x_i^{\theta}). \tag{3}$$
Calculating the first derivatives of (3) with respect to θ and β and equating them to zero, the maximum likelihood estimates (MLEs) of θ and β can be obtained by solving the following equations:
$$\frac{\partial \ell}{\partial \theta} = \frac{s}{\theta} + \sum_{i=1}^{s} \ln x_i - (\beta - 1) \sum_{i=1}^{s} \frac{x_i^{\theta} \ln x_i}{1 - x_i^{\theta}} = 0, \tag{4}$$
$$\frac{\partial \ell}{\partial \beta} = \frac{s}{\beta} + \sum_{i=1}^{s} \ln(1 - x_i^{\theta}) = 0. \tag{5}$$
From (5), we obtain the MLE of β for fixed θ and denote it as
$$\hat{\beta}_{MLE} = -\frac{s}{\sum_{i=1}^{s} \ln(1 - x_i^{\theta})}. \tag{6}$$
According to the formula in (6), $\hat{\beta}_{MLE}$ exists and is unique for fixed θ. Substituting (6) into (3), the profile log-likelihood function of θ can be expressed as
$$p(\theta) = s \ln \theta + s \ln\!\left( \frac{s}{-\sum_{i=1}^{s} \ln(1 - x_i^{\theta})} \right) + (\theta - 1) \sum_{i=1}^{s} \ln x_i - \sum_{i=1}^{s} \ln(1 - x_i^{\theta}) - s. \tag{7}$$
$\hat{\theta}_{MLE}$, the MLE of θ, is obtained by maximizing (7). Since explicit solutions for θ and β are difficult to determine, the Newton-Raphson iterative algorithm is adopted to obtain the MLEs.
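As a minimal sketch of this estimation route, the following R code profiles β out via (6) and maximizes the profile log-likelihood (7) numerically, using optimize() in place of Newton-Raphson; the function name and search interval are our own illustrative choices.

```r
# Profile-likelihood MLE for K(theta, beta) from an SRS x.
kum_mle_srs <- function(x) {
  s <- length(x)
  beta_hat <- function(theta) -s / sum(log(1 - x^theta))  # from (6)
  profile  <- function(theta) {                           # log-likelihood (3) at beta_hat
    b <- beta_hat(theta)
    s * log(theta) + s * log(b) + (theta - 1) * sum(log(x)) +
      (b - 1) * sum(log(1 - x^theta))
  }
  theta_hat <- optimize(profile, c(0.01, 50), maximum = TRUE)$maximum
  c(theta = theta_hat, beta = beta_hat(theta_hat))
}
```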

Theorem 1. $\hat{\theta}_{MLE}$ exists and is unique.
Proof of Theorem 1. The proof follows by calculating the first and second derivatives of (7) with respect to θ and showing that the profile log-likelihood has a unique maximizer.

Let $\hat{\eta}_{MLE.SRS}$ be the MLE of η = (θ, β) based on SRS. The inverse of the Fisher information matrix I(η) is the approximate variance-covariance matrix of the MLEs. The asymptotic distribution of $\hat{\eta}_{MLE.SRS}$ is $(\hat{\eta}_{MLE.SRS} - \eta) \to N(0, I^{-1}(\hat{\eta}_{MLE.SRS}))$. Thus, we can derive two-sided 100(1 − τ)% approximate confidence intervals (ACIs) for θ and β.

Theorem 2. The two-sided 100(1 − τ)% ACIs for θ and β based on SRS are
$$\left( \hat{\theta} - u_{\tau/2} \sqrt{\widehat{var}(\hat{\theta})},\ \hat{\theta} + u_{\tau/2} \sqrt{\widehat{var}(\hat{\theta})} \right) \quad \text{and} \quad \left( \hat{\beta} - u_{\tau/2} \sqrt{\widehat{var}(\hat{\beta})},\ \hat{\beta} + u_{\tau/2} \sqrt{\widehat{var}(\hat{\beta})} \right),$$
where $u_{\tau/2}$ is the (1 − τ/2)th quantile of N(0, 1), the standard normal distribution. Since the computed lower bound may be less than zero while θ and β are greater than zero, we set the lower bound to be the larger of the computed lower bound and zero.
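The ACIs of Theorem 2 can be computed from the observed information, for example as in the following R sketch, which obtains a numerical Hessian via optim() as a stand-in for the analytic information matrix; kum_mle_srs() is the illustrative function defined above.

```r
# Sketch: 100(1 - tau)% ACIs for (theta, beta) from an SRS x.
kum_aci_srs <- function(x, tau = 0.10) {
  nll <- function(p) {                      # negative log-likelihood (3)
    theta <- p[1]; beta <- p[2]
    if (theta <= 0 || beta <= 0) return(Inf)
    -(length(x) * (log(theta) + log(beta)) + (theta - 1) * sum(log(x)) +
        (beta - 1) * sum(log(1 - x^theta)))
  }
  fit <- optim(kum_mle_srs(x), nll, hessian = TRUE)
  se  <- sqrt(diag(solve(fit$hessian)))     # inverse observed information
  ci  <- cbind(lower = pmax(fit$par - qnorm(1 - tau / 2) * se, 0),  # truncate at 0
               upper = fit$par + qnorm(1 - tau / 2) * se)
  rownames(ci) <- c("theta", "beta")
  ci
}
```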

Bayes Estimation Based on SRS
In this section, we focus on Bayesian inference for θ and β based on SRS, and obtain Bayes estimates (BEs) and credible intervals (CrIs). Consider the following independent gamma priors:
$$\pi_1(\theta) \propto \theta^{a_1 - 1} e^{-b_1 \theta}, \quad \theta > 0, \ a_1, b_1 > 0, \tag{11}$$
$$\pi_2(\beta) \propto \beta^{a_2 - 1} e^{-b_2 \beta}, \quad \beta > 0, \ a_2, b_2 > 0. \tag{12}$$
The joint posterior distribution of (θ, β) can be expressed as
$$\pi(\theta, \beta \mid x) \propto \theta^{s + a_1 - 1} \beta^{s + a_2 - 1} e^{-b_1 \theta - b_2 \beta} \prod_{i=1}^{s} x_i^{\theta - 1} (1 - x_i^{\theta})^{\beta - 1}. \tag{13}$$
The conditional posterior densities of θ and β are written as
$$\pi(\theta \mid \beta, x) \propto \theta^{s + a_1 - 1} e^{-b_1 \theta} \prod_{i=1}^{s} x_i^{\theta - 1} (1 - x_i^{\theta})^{\beta - 1}$$
and
$$\pi(\beta \mid \theta, x) \propto \beta^{s + a_2 - 1} \exp\!\left\{ -\beta \left( b_2 - \sum_{i=1}^{s} \ln(1 - x_i^{\theta}) \right) \right\}.$$
Generally, Bayes point estimators are obtained by minimizing a loss function, and we consider two typical loss functions here. The first is the squared error (SE) loss function, which is symmetric. Letting $\hat{d}$ be an estimator of d, the SE loss function and its Bayes estimator are given by
$$L_{SE}(\hat{d}, d) = (\hat{d} - d)^2, \quad \hat{d}_{SE} = E(d \mid \text{data}).$$
Symmetric functions assign equal weight to overestimation and underestimation, which has motivated many asymmetric alternatives. The second is the general entropy (GE) loss function, an asymmetric function proposed by [29] and defined as
$$L_{GE}(\hat{d}, d) \propto \left( \frac{\hat{d}}{d} \right)^k - k \ln\!\left( \frac{\hat{d}}{d} \right) - 1,$$
with corresponding Bayes estimator
$$\hat{d}_{GE} = \left[ E(d^{-k} \mid \text{data}) \right]^{-1/k}.$$
Here, positive errors ($\hat{d} > d$) are penalized more heavily than negative errors when k > 0, and vice versa when k < 0.
Based on the SE and GE loss functions, the Bayes point estimators of θ are
$$\hat{\theta}_{BE.SE.SRS} = E(\theta \mid x) \quad \text{and} \quad \hat{\theta}_{BE.GE.SRS} = \left[ E(\theta^{-k} \mid x) \right]^{-1/k},$$
provided that the expectations exist; the Bayes estimator of β can be obtained in a similar way. These integrals do not admit explicit solutions, so we employ the Metropolis-Hastings-within-Gibbs (MHG) algorithm to generate samples from the posterior density (13) and compute the approximate BEs and the 100(1 − γ)% symmetric CrIs. More information about this algorithm can be found in [30]. The logarithmic calculations in the 5th and 11th steps of Algorithm 1 narrow down the values to facilitate computation and prevent overflow to infinity during program execution.
We can obtain the approximate Bayes point estimates of θ under the SE and GE loss functions as
$$\hat{\theta}^{*}_{BE.SE.SRS} = \frac{1}{M - H} \sum_{T = H + 1}^{M} \theta^{(T)} \quad \text{and} \quad \hat{\theta}^{*}_{BE.GE.SRS} = \left[ \frac{1}{M - H} \sum_{T = H + 1}^{M} \left( \theta^{(T)} \right)^{-k} \right]^{-1/k},$$
where H is the burn-in period (the number of iterations required before the stationary distribution is reached). $\hat{\beta}^{*}_{BE.SE.SRS}$ and $\hat{\beta}^{*}_{BE.GE.SRS}$ can be acquired in a similar way. Algorithm 1 proceeds as follows: initialize $\theta^{(0)} = \hat{\theta}_{MLE}$, $\beta^{(0)} = \hat{\beta}_{MLE}$, and T = 1; while T ≤ M, generate candidate values of θ and β from the normal proposal distributions $N(\theta^{(T-1)}, \sigma_1)$ and $N(\beta^{(T-1)}, \sigma_2)$, where $\sigma_1$ and $\sigma_2$ are the diagonal elements of the variance-covariance matrix, and accept or reject the candidates by the Metropolis-Hastings rule.
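A minimal R sketch of the sampler just described is given next, assuming the SRS posterior (13) with Gamma(a1, b1) and Gamma(a2, b2) priors; as in the 5th and 11th steps of Algorithm 1, the acceptance test is carried out on the log scale to avoid overflow. The function name and argument defaults are our own.

```r
# Metropolis-Hastings within Gibbs for the posterior (13) based on an SRS x.
mhg_srs <- function(x, M = 10000, a1 = 1, b1 = 1, a2 = 1, b2 = 1,
                    init, sig1, sig2) {   # sig1, sig2: proposal scales
  s <- length(x)
  log_post <- function(theta, beta) {     # log of (13), up to a constant
    if (theta <= 0 || beta <= 0) return(-Inf)
    (s + a1 - 1) * log(theta) + (s + a2 - 1) * log(beta) -
      b1 * theta - b2 * beta +
      (theta - 1) * sum(log(x)) + (beta - 1) * sum(log(1 - x^theta))
  }
  draws <- matrix(NA, M, 2, dimnames = list(NULL, c("theta", "beta")))
  theta <- init[1]; beta <- init[2]
  for (t in 1:M) {
    th_new <- rnorm(1, theta, sig1)       # update theta given beta
    if (log(runif(1)) < log_post(th_new, beta) - log_post(theta, beta))
      theta <- th_new
    be_new <- rnorm(1, beta, sig2)        # update beta given theta
    if (log(runif(1)) < log_post(theta, be_new) - log_post(theta, beta))
      beta <- be_new
    draws[t, ] <- c(theta, beta)
  }
  draws
}
```

Discarding the first H rows of the output and averaging the remainder (or averaging the −k-th powers and raising to the power −1/k) then yields the approximate BEs given above.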

Maximum Likelihood Estimation Based on RSS
The RSS scheme samples and observes data using a ranking mechanism. First, we draw m random sample sets, each of size m, from K(θ, β), and sort each set from smallest to largest:
$$X^{(i)}_{j(1:m)} \leq X^{(i)}_{j(2:m)} \leq \cdots \leq X^{(i)}_{j(m:m)}, \quad j = 1, 2, \ldots, m.$$
Here, $X^{(i)}_{j(p:m)}$ denotes the p-th ranked unit of the j-th set in the i-th cycle (i = 1, 2, . . . , n; j = 1, 2, . . . , m). Then, the j-th ordered unit from the j-th set, $X^{(i)}_{j(j:m)}$, is chosen for actual quantification. After n cycles of the above process, we end up with a complete RSS scheme of size mn:
$$\left\{ X^{(i)}_{j(j:m)} : i = 1, 2, \ldots, n; \ j = 1, 2, \ldots, m \right\},$$
where m is the set size and n is the number of cycles.
To simplify the notation, we use $X_{ij}$ to denote $X^{(i)}_{j(j:m)}$ and let $X^{*} = (X_{11}, X_{12}, \ldots, X_{nm})$. Since $X_{ij}$ is the j-th order statistic of a set of size m, the likelihood function is given by
$$L^{*}(\theta, \beta \mid x^{*}) = \prod_{i=1}^{n} \prod_{j=1}^{m} \frac{m!}{(j-1)!(m-j)!} \left[ F(x_{ij}) \right]^{j-1} \left[ 1 - F(x_{ij}) \right]^{m-j} f(x_{ij}),$$
where $x^{*} = (x_{11}, x_{12}, \ldots, x_{nm})$ is the observed sample of $X^{*}$ and F and f are the CDF and PDF of K(θ, β). The log-likelihood function is
$$\ell^{*}(\theta, \beta) = C + \sum_{i=1}^{n} \sum_{j=1}^{m} \Big[ \ln \theta + \ln \beta + (\theta - 1) \ln x_{ij} + (\beta - 1) \ln(1 - x_{ij}^{\theta}) + (j - 1) \ln\!\big( 1 - (1 - x_{ij}^{\theta})^{\beta} \big) + (m - j) \beta \ln(1 - x_{ij}^{\theta}) \Big], \tag{14}$$
where C is a constant free of θ and β. Calculating the first derivatives of (14) with respect to θ and β and equating them to zero yields the system of equations whose solution gives the MLEs of θ and β. From the second derivatives of (14), the expected Fisher information matrix $I^{*}(\eta)$ can be derived; because the integral form of $I^{*}(\eta)$ is difficult to calculate, the observed Fisher information matrix $J^{*}(\eta)$ is used in its place. Let $\hat{\eta}_{MLE.RSS}$ be the MLE of η based on RSS. The inverse of $J^{*}(\eta)$ is the approximate variance-covariance matrix of the MLEs.

Theorem 3. The two-sided 100(1 − τ)% ACIs for θ and β based on RSS take the same form as in Theorem 2, with the variances taken from the diagonal of $J^{*-1}(\hat{\eta}_{MLE.RSS})$.
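Under the assumption of perfect ranking, the RSS log-likelihood (14) can be maximized numerically as in the following R sketch; the constant term is dropped since it does not affect the maximizer, and the function name is illustrative.

```r
# MLE for K(theta, beta) from an RSS sample; xr is an n x m matrix whose
# (i, j) entry is the j-th judgment order statistic from the j-th set in cycle i.
kum_mle_rss <- function(xr, start = c(1, 1)) {
  m <- ncol(xr)
  ranks <- matrix(1:m, nrow(xr), m, byrow = TRUE)   # rank j for column j
  nll <- function(p) {
    theta <- p[1]; beta <- p[2]
    if (theta <= 0 || beta <= 0) return(Inf)
    Fx <- 1 - (1 - xr^theta)^beta                   # CDF at each measured unit
    -sum(log(theta) + log(beta) + (theta - 1) * log(xr) +
           (beta - 1) * log(1 - xr^theta) +
           (ranks - 1) * log(Fx) + (m - ranks) * log(1 - Fx))
  }
  optim(start, nll)$par
}
```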

Bayes Estimation Based on RSS
We still use the independent gamma priors (11) and (12). The joint posterior distribution of (θ, β), denoted (18), is proportional to the product of the priors and the likelihood $L^{*}(\theta, \beta \mid x^{*})$, and the conditional posterior densities of θ and β follow from (18). Algorithm 1 of Section 2 is employed to generate a large number of samples from the posterior distribution (18). The Bayes point estimators of θ under the SE and GE loss functions, and the symmetric Bayes CrI of θ, are then computed from the generated samples exactly as in Section 2. Similarly, we can derive the Bayes point estimators and the symmetric Bayes CrI for β.
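Given post-burn-in draws of θ from Algorithm 1 applied to (18), the point and interval summaries take one line each, as in this hedged R sketch (names are illustrative):

```r
# Bayes point estimates under SE and GE loss and a symmetric
# 100(1 - gamma)% CrI from post-burn-in MCMC draws `th` of theta.
bayes_summary <- function(th, k = 0.2, gamma = 0.10) {
  list(se  = mean(th),                      # posterior mean (SE loss)
       ge  = mean(th^(-k))^(-1 / k),        # GE-loss estimator
       cri = quantile(th, c(gamma / 2, 1 - gamma / 2)))
}
```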

Maximum Likelihood Estimation Based on MRSSU
First, we select m random sample sets from the population, where the j-th set contains j units (j = 1, 2, . . . , m), and sort the units in each set from smallest to largest. Here, $X^{(i)}_{j(p:m)}$ denotes the p-th ranked unit of the j-th set (which contains j units) in the i-th cycle (i = 1, 2, . . . , n; j = 1, 2, . . . , m). In each cycle, the total number of units drawn is $\sum_{j=1}^{m} j = m(m+1)/2$. Then, the maximum element of each sample set, $X^{(i)}_{j(j:m)}$, is chosen for actual quantification. After n cycles of the above process, we end up with a complete MRSSU of size mn:
$$\left\{ X^{(i)}_{j(j:m)} : i = 1, 2, \ldots, n; \ j = 1, 2, \ldots, m \right\},$$
where m is the set size and n is the number of cycles.
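A short R sketch of one way to implement this design follows; it assumes perfect ranking and reuses the illustrative rkum() generator defined in the Introduction.

```r
# One MRSSU draw of n cycles: the j-th set contains j units and only its
# maximum is measured.
rmrssu_kum <- function(n, m, theta, beta) {
  t(replicate(n, sapply(1:m, function(j) max(rkum(j, theta, beta)))))
}  # returns an n x m matrix of measured maxima
```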
To simplify the notation, we use $Y_{ij}$ to denote $X^{(i)}_{j(j:m)}$ and let $Y = (Y_{11}, Y_{12}, \ldots, Y_{nm})$. Since $Y_{ij}$ is the maximum of a set of j units, the corresponding likelihood function is
$$L^{**}(\theta, \beta \mid y) = \prod_{i=1}^{n} \prod_{j=1}^{m} j \left[ F(y_{ij}) \right]^{j-1} f(y_{ij}),$$
where $y = (y_{11}, y_{12}, \ldots, y_{nm})$ is the observation of Y. The log-likelihood function is
$$\ell^{**}(\theta, \beta) = C' + \sum_{i=1}^{n} \sum_{j=1}^{m} \Big[ \ln \theta + \ln \beta + (\theta - 1) \ln y_{ij} + (\beta - 1) \ln(1 - y_{ij}^{\theta}) + (j - 1) \ln\!\big( 1 - (1 - y_{ij}^{\theta})^{\beta} \big) \Big], \tag{19}$$
where C' is a constant free of θ and β. Calculating the first derivatives of (19) with respect to θ and β and equating them to zero, the MLEs of θ and β can be obtained by solving the resulting equations. From the second derivatives of (19), the expected Fisher information matrix $I^{**}(\eta)$ can be derived; we adopt the observed Fisher information matrix $J^{**}(\eta)$ in its place. Let $\hat{\eta}_{MLE.MRSSU}$ be the MLE of η based on MRSSU. The inverse of $J^{**}(\eta)$ is the approximate variance-covariance matrix of the MLEs.

Theorem 4. The two-sided 100(1 − τ)% ACIs for θ and β based on MRSSU take the same form as in Theorem 2, with the variances taken from the diagonal of $J^{**-1}(\hat{\eta}_{MLE.MRSSU})$.

Bayes Estimation Based on MRSSU
We use the same independent gamma priors (11) and (12). The joint posterior distribution of (θ, β), denoted (23), is proportional to the product of the priors and the likelihood $L^{**}(\theta, \beta \mid y)$, and the conditional posterior densities of θ and β follow from (23). Algorithm 1 is used to generate a large number of samples from the posterior distribution (23), from which the Bayes point estimators of θ under the SE and GE loss functions and the symmetric Bayes CrI of θ are computed as in Section 2. Similarly, we can derive the Bayes point estimators and the symmetric Bayes CrI for β.

Simulation Results
In this section, we compare the SRS, RSS, and MRSSU schemes under different hyperparameter values and sample sizes via the numerical performance of the point and interval estimates. We evaluate the bias and estimated risks (ERs) under the SE and GE loss functions by the following formulas:
$$\text{Bias}(\hat{\theta}) = \frac{1}{N} \sum_{t=1}^{N} (\hat{\theta}_t - \theta), \quad \text{ER}_{SE}(\hat{\theta}) = \frac{1}{N} \sum_{t=1}^{N} (\hat{\theta}_t - \theta)^2, \quad \text{ER}_{GE}(\hat{\theta}) = \frac{1}{N} \sum_{t=1}^{N} \left[ \left( \frac{\hat{\theta}_t}{\theta} \right)^k - k \ln\!\left( \frac{\hat{\theta}_t}{\theta} \right) - 1 \right].$$
Here, N denotes the number of simulation iterations and $\hat{\theta}_t$ is the estimate obtained in the t-th iteration. For the three sampling methods, namely SRS, RSS, and MRSSU, we draw the samples from K(θ, β) through Algorithms 2-4, respectively:

Algorithm 3
The random number generator for K(θ, β) based on RSS. Input: θ, β, n, and m.
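A hedged R sketch of Algorithm 3 is as follows; it assumes perfect ranking and reuses the illustrative rkum() generator from the Introduction.

```r
# One RSS draw of n cycles: draw m sets of size m from K(theta, beta) and
# keep the j-th smallest unit of the j-th set.
rrss_kum <- function(n, m, theta, beta) {
  t(replicate(n, sapply(1:m, function(j) sort(rkum(m, theta, beta))[j])))
}  # returns an n x m matrix: row i holds cycle i's measured units
```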
In addition, based on the different sampling schemes, we evaluate relative efficiency (RE) values under the SE and GE loss functions, defined as ratios of the corresponding ERs. Here, $\hat{\theta}_{SRS}$, $\hat{\theta}_{RSS}$, and $\hat{\theta}_{MRSSU}$ are the estimates of θ based on the different sampling methods. Obviously, if $RE_i < 1$ (i = 1, 2, . . . , 6), then the estimate of θ in the numerator outperforms the estimate of θ in the denominator. The bias, ERs, and RE values of β can be calculated by similar formulas, as shown in the sketch below.
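For concreteness, the following R sketch evaluates the three point-estimation criteria for a vector of simulated estimates and forms an RE value as a ratio of ERs; the function name and the particular numerator/denominator pairing are illustrative.

```r
# Bias and ER criteria for N simulated estimates `est` of a true value `true`.
sim_criteria <- function(est, true, k = 0.2) {
  c(bias  = mean(est - true),                                # average bias
    er_se = mean((est - true)^2),                            # ER under SE loss
    er_ge = mean((est / true)^k - k * log(est / true) - 1))  # ER under GE loss
}
# e.g., an RE of an RSS-based estimator of theta = 2 relative to an SRS-based one:
# re <- sim_criteria(est_rss, 2)["er_se"] / sim_criteria(est_srs, 2)["er_se"]
```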
In the simulation, all calculations are performed in the R language, with true values set to (θ, β) = (2, 3). We design three sampling schemes and two prior sets, namely an informative prior distribution and a non-informative prior distribution; each sampling scheme contains the three sampling methods SRS, RSS, and MRSSU. The RE values for the MLEs of θ and β are presented in Table 4, and those for the BEs of θ and β under the SE loss function and the GE loss function (k = 0.2, k = 0.8) are presented in Tables 5-7, respectively.
We also calculate the average confidence interval lengths (ACLs) and coverage percentages (CPs) to compare the performance of the interval estimators. For each case, we generate 90% CIs over 2500 iterations and record whether (θ, β) = (2, 3) falls into the corresponding CIs. The results are presented in Table 8.
From Tables 1 and 2, it can be seen that the ERs of the estimates become smaller as the number of samples increases. Furthermore, the Bayesian methods outperform the maximum likelihood methods, and the estimators of θ are better than the estimators of β. For maximum likelihood estimation, the ERs of the estimators of β are larger when the sample sets are smaller. For example, the ER of $\hat{\theta}_{MLE.SRS}$ under scheme I is 2.37421, while under schemes II and III it is 1.26192 and 0.81074, respectively. For Bayesian estimation, the informative prior is superior to the non-informative prior, and the SE loss function performs best, followed by the GE loss function with k = 0.2 and the GE loss function with k = 0.8. Table 3 shows that the Bayesian estimates of θ tend to underestimate, since their biases are negative. From Table 4, we see that, based on maximum likelihood estimation, the RSS schemes are more suitable for the estimation of θ and β than the MRSSU and SRS schemes. Furthermore, the SRS schemes exceed the MRSSU schemes in the estimation of θ, while they perform similarly in the estimation of β (except for scheme I). As can be seen in Tables 6 and 7, the same regularity of RSS schemes being superior to MRSSU and SRS schemes holds for Bayesian estimation as well.
As shown in Table 8, with the reduction in the sample size, the ACLs of the interval estimates increase and the CPs also rise. Figures 2 and 3 visualize the change in the ACLs for θ and β. The CPs of the ACIs are always less than 1, whereas the CPs of the CrIs remain constant at 1, as Figure 4 clearly illustrates. This indicates that Bayesian estimation outperforms maximum likelihood estimation in the sense of interval estimation. The optimal interval estimates are the Bayes CrIs based on the MRSSU schemes, except for (i) θ under scheme I and (ii) (θ, β) under scheme II with prior 2.

Data Analysis
Here, we analyze an open-access dataset from the California Data Exchange Center, which can be found in [31]. It records the monthly water capacity of the Shasta Reservoir in California, USA, between August and December from 1975 to 2016. The dataset, previously used by authors such as [11,32], contains a total of 42 observations and is given in Table 9. To evaluate the proposed point and interval estimators based on SRS, RSS, and MRSSU, we set s = 12, m = 3, and n = 4 and draw a sample of size 12 with each sampling method. In the case of SRS, a random sample of size 12 is drawn directly. In the cases of RSS and MRSSU, the required sample sets (of size 3 for RSS, and of sizes 1 to 3 for MRSSU) are first drawn, and the appropriate units are selected over the cycles according to the procedures described in Sections 3 and 4.