Objective Bayesian Entropy Inference for Two-Parameter Logistic Distribution Using Upper Record Values

Abstract: In this paper, we provide an entropy inference method based on an objective Bayesian approach for upper record values from a two-parameter logistic distribution. We derive the entropy based on the i-th upper record value and the joint entropy based on the upper record values, and we examine their properties. For the objective Bayesian analysis, we obtain objective priors, namely the Jeffreys and reference priors, for the unknown parameters of the logistic distribution based on upper record values. We then develop an entropy inference method based on these objective priors. In a real data analysis, we assess the quality of the proposed models under the objective priors and compare them with the model under the informative prior.


Introduction
Shannon [1] proposed information theory for quantifying information loss and introduced statistical entropy. Baratpour et al. [2] obtained the entropy of a continuous probability distribution using upper record values. Moreover, they obtained several bounds for this entropy using the hazard rate function. Abo-Eleneen [3] suggested an efficient computation method for the entropy in progressively Type-II censored samples. Kang et al. [4], using maximum likelihood estimators (MLEs) and approximate MLEs (AMLEs), derived estimators of the entropy of a double-exponential distribution based on multiply Type-II censored samples. Seo and Kang [5], using estimators of the shape parameter in the generalized half-logistic distribution, developed methods for estimating entropy based on Type-II censored samples.
In this paper, we provide an entropy inference method based on an objective Bayesian approach for upper record values from the two-parameter logistic distribution. The cumulative distribution function (cdf) and probability density function (pdf) of a random variable $X$ with this distribution are given by
$$F(x) = \frac{1}{1 + e^{-(x-\mu)/\sigma}} \quad \text{and} \quad f(x) = \frac{e^{-(x-\mu)/\sigma}}{\sigma\left(1 + e^{-(x-\mu)/\sigma}\right)^{2}}, \quad x \in \mathbb{R}, \tag{1}$$
respectively, where $\mu \in \mathbb{R}$ is the location parameter and $\sigma > 0$ is the scale parameter. The paper is organized as follows. In Section 2, we obtain the Jeffreys and reference priors and derive an entropy inference method based on these two non-informative priors. In Section 3, we analyze a real data set to demonstrate the validity of the proposed method. Section 4 concludes the paper.

Entropy
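The entropy of $f(x)$ is defined by
$$H(f) = -\int_{-\infty}^{\infty} f(x) \log f(x)\, dx.$$
Then, the entropy based on the $i$-th upper record value $X_{U(i)}$ is
$$H_{U(i)} = -\int_{-\infty}^{\infty} f_{X_{U(i)}}(x) \log f_{X_{U(i)}}(x)\, dx, \tag{2}$$
where $f_{X_{U(i)}}(x)$ is the marginal density function of $X_{U(i)}$. Assuming that $X_{U(i)}$ is the $i$-th upper record value from the logistic distribution with pdf $f(x)$ as in (1), the marginal density function in (2) is given by
$$f_{X_{U(i)}}(x) = \frac{\left[\log\left(1 + e^{(x-\mu)/\sigma}\right)\right]^{i-1}}{\Gamma(i)} \cdot \frac{e^{-(x-\mu)/\sigma}}{\sigma\left(1 + e^{-(x-\mu)/\sigma}\right)^{2}}.$$
The corresponding entropy (3) depends on the parameters only through the scale parameter $\sigma$ and is an increasing function of $\sigma$. Therefore, as $\sigma$ increases, less information is provided by the distribution.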
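The dependence of the record entropy on $\sigma$ can be checked numerically. The following is a minimal sketch, assuming the standard record-value density $[-\log(1-F(x))]^{i-1} f(x)/\Gamma(i)$ with the logistic cdf and pdf; the function names `log_record_density` and `record_entropy`, the integration grid and the trapezoidal rule are illustrative choices, not part of the paper.

```python
import math
import numpy as np


def log_record_density(x, i, mu=0.0, sigma=1.0):
    """Log-density of the i-th upper record value from logistic(mu, sigma)."""
    z = (x - mu) / sigma
    cum_hazard = np.logaddexp(0.0, z)  # -log(1 - F(x)) = log(1 + e^z)
    log_pdf = -z - 2.0 * np.logaddexp(0.0, -z) - math.log(sigma)
    return (i - 1) * np.log(cum_hazard) - math.lgamma(i) + log_pdf


def record_entropy(i, mu=0.0, sigma=1.0, n_grid=200_001):
    """Approximate -int g(x) log g(x) dx with the trapezoidal rule.

    The grid covers standardized values in [-40, 60]; outside this range
    the density is negligible for small i (an assumption of this sketch).
    """
    x = mu + sigma * np.linspace(-40.0, 60.0, n_grid)
    log_g = log_record_density(x, i, mu, sigma)
    integrand = -np.exp(log_g) * log_g
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x)))


if __name__ == "__main__":
    for sigma in (0.1, 1.0, 2.0):
        values = [round(record_entropy(i, sigma=sigma), 4) for i in (1, 2, 3)]
        print(f"sigma={sigma}: H_U(1..3) = {values}")
```

Because the record values form a location-scale family, changing $\sigma$ shifts every entropy by $\log\sigma$ and changing $\mu$ has no effect, which is what the sketch reproduces.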

Remark 1.
A relationship can be obtained between the entropies corresponding to two consecutive record times, $H_{U(i)}$ and $H_{U(i+1)}$.
Theorem 1. The joint entropy based on $X_{U(1)}, \ldots, X_{U(k)}$ from the logistic distribution with pdf as in (1) is given by (4). It is an increasing function of $\sigma$, as is $H_{U(i)}$.
Proof. The joint entropy based on the upper record values $X_{U(1)}, \ldots, X_{U(k)}$ is defined by Park [6] as
$$H_{U(1),\ldots,U(k)} = -\int \cdots \int f_{X_{U(1)},\ldots,X_{U(k)}}(x_{U(1)},\ldots,x_{U(k)}) \log f_{X_{U(1)},\ldots,X_{U(k)}}(x_{U(1)},\ldots,x_{U(k)})\, dx_{U(1)} \cdots dx_{U(k)},$$
where $f_{X_{U(1)},\ldots,X_{U(k)}}(x_{U(1)},\ldots,x_{U(k)})$ is the joint density function of $X_{U(1)}, \ldots, X_{U(k)}$. Rad et al. [7] simplified this expression to the single integral in (5). For upper record values from the logistic distribution with pdf as in (1), evaluating the integral term in (5) and applying a series expansion completes the proof.
Remark 2. The entropies (3) and (4) can take negative values because of the term $\log\sigma$. This is because the marginal density function $f_{X_{U(i)}}(x)$ and the joint density function $f_{X_{U(1)},\ldots,X_{U(k)}}(x_{U(1)},\ldots,x_{U(k)})$ can take values greater than one for very small $\sigma$.
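As a concrete check of Remark 2, consider the simplest case $i = 1$. The first upper record value is the first observation itself, so its entropy is the differential entropy of the logistic distribution. The closed form $2 + \log\sigma$ used below (natural logarithm) is the standard result for that distribution; it is stated here as a known fact and is not copied from (3).

```latex
H_{U(1)} = 2 + \log\sigma < 0
\quad\Longleftrightarrow\quad
\sigma < e^{-2} \approx 0.135
```

For example, $\sigma = 0.1$ gives $H_{U(1)} = 2 + \log 0.1 \approx -0.303$, whereas $\sigma = 1$ gives $H_{U(1)} = 2$.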
We present the values of the entropies $H_{U(i)}$ and $H_{U(1),\ldots,U(k)}$ for various values of $\sigma$, $i$ and $k$ in Tables 1 and 2 and Figure 1. Table 1 shows that $H_{U(i)}$ is an increasing function of $\sigma$ for fixed $i$. Similarly, it is an increasing function of $i$ for fixed $\sigma$ when $i \geq 3$. Likewise, Table 2 shows that $H_{U(1),\ldots,U(k)}$ increases as $\sigma$ and $k$ increase, except for $\sigma = 0.1$.
We note that $\sigma$ in (3) and (4) is an unknown parameter. Thus, it must be estimated once the upper record values are observed. The following theorem provides an estimator of the joint entropy $H_{U(1),\ldots,U(k)}$ in the Bayesian framework.

Theorem 2. The Bayes estimator of $H_{U(1),\ldots,U(k)}$ is given by (6), provided that the posterior expectation $E_{\pi|x}(\cdot)$ exists and is finite.
Proof. In the Bayesian framework, the entropy estimator based on $X_{U(1)}, \ldots, X_{U(k)}$ is defined as the posterior expectation of the joint entropy (4). Substituting (4) gives the estimator in (6). This completes the proof.
In the following subsection, we provide a method for obtaining the term $E_{\pi|x}(\log\sigma)$ in (6).
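The appearance of $E_{\pi|x}(\log\sigma)$ in the estimator can be motivated by a scaling argument. The following is a sketch under two assumptions: the record values form a location-scale family, $X_{U(i)} = \mu + \sigma Z_{U(i)}$ with $Z_{U(i)}$ the upper records of the standard logistic distribution, and the Bayes estimator is taken to be the posterior mean (the usual choice under squared error loss). The symbol $c_k$, the joint entropy at $\sigma = 1$, is a placeholder introduced here and is not notation from the paper.

```latex
\begin{align*}
f_{X_{U(1)},\ldots,X_{U(k)}}(x_1,\ldots,x_k)
  &= \sigma^{-k}\, f_{Z_{U(1)},\ldots,Z_{U(k)}}\!\left(\tfrac{x_1-\mu}{\sigma},\ldots,\tfrac{x_k-\mu}{\sigma}\right), \\
H_{U(1),\ldots,U(k)}
  &= c_k + k\log\sigma, \\
E_{\pi|x}\!\left(H_{U(1),\ldots,U(k)}\right)
  &= c_k + k\,E_{\pi|x}(\log\sigma).
\end{align*}
```

Under these assumptions the only posterior quantity needed is $E_{\pi|x}(\log\sigma)$, and the estimator is finite exactly when this expectation is.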

Posterior Analysis Based on Objective Priors
Asgharzadeh et al. [8] proposed a subjective prior distribution $\pi(\mu, \sigma)$ with hyperparameters $a$, $b$ and $\mu_0$. If sufficient prior information is available, these hyperparameters can be determined easily; otherwise, one must rely on objective (non-informative) priors. In practice, it is not easy to elicit suitable prior information. We therefore do not consider a method for eliciting the hyperparameter values and instead develop an inference method based on objective priors. We now obtain the objective priors (the Jeffreys and reference priors) from the Fisher information matrix for $(\mu, \sigma)$; see [8].
Let $X_{U(1)}, \ldots, X_{U(k)}$ be the upper record values of $X_1, \ldots, X_n$ from the logistic distribution with pdf as in (1). Then, the corresponding likelihood function is given by
$$L(\mu, \sigma) = f(x_{U(k)}) \prod_{i=1}^{k-1} \frac{f(x_{U(i)})}{1 - F(x_{U(i)})} = \frac{\sigma^{-k}\, e^{-(x_{U(k)}-\mu)/\sigma}}{\left[1 + e^{-(x_{U(k)}-\mu)/\sigma}\right] \prod_{i=1}^{k}\left[1 + e^{-(x_{U(i)}-\mu)/\sigma}\right]}.$$
By the result in [8], all elements of the Fisher information matrix for $(\mu, \sigma)$ are proportional to $1/\sigma^2$. Its determinant is therefore proportional to $1/\sigma^4$, and the Jeffreys prior, which is proportional to the square root of this determinant, is
$$\pi_J(\mu, \sigma) \propto \frac{1}{\sigma^2}. \tag{7}$$
However, the Jeffreys prior has some drawbacks in the multi-parameter case, such as the marginalization paradox and the Neyman-Scott problem. Alternatively, Bernardo [9] introduced the reference prior, and Berger et al. [10,11] provided a general algorithm for deriving it. Using this algorithm, we obtain the reference prior $\pi_R(\mu, \sigma)$ in (8), which is the same regardless of which parameter is of interest.
Under a joint prior $\pi(\mu, \sigma)$, the resulting posterior distribution is
$$\pi(\mu, \sigma \mid x) \propto L(\mu, \sigma)\, \pi(\mu, \sigma).$$
Unfortunately, the marginal posterior distributions of $\mu$ and $\sigma$ cannot be expressed in closed form under the derived priors (7) and (8). To generate Markov chain Monte Carlo (MCMC) samples from the marginal distributions, we need the full conditional posterior distribution of each parameter under the joint prior $\pi(\mu, \sigma)$:
$$\pi(\mu \mid \sigma, x) \propto \pi(\mu, \sigma)\, \frac{\exp(\mu/\sigma)}{\left[1 + e^{-(x_{U(k)}-\mu)/\sigma}\right] \prod_{i=1}^{k}\left[1 + e^{-(x_{U(i)}-\mu)/\sigma}\right]}$$
and
$$\pi(\sigma \mid \mu, x) \propto \pi(\mu, \sigma)\, \frac{\sigma^{-k} \exp\left(-(x_{U(k)}-\mu)/\sigma\right)}{\left[1 + e^{-(x_{U(k)}-\mu)/\sigma}\right] \prod_{i=1}^{k}\left[1 + e^{-(x_{U(i)}-\mu)/\sigma}\right]}.$$
Under both objective priors (7) and (8), the full conditional posterior distributions for $\mu$ are log-concave. Therefore, we can draw the MCMC samples $\mu_i$ ($i = 1, \ldots, N$) from these conditional posterior distributions using the method proposed by [12]. Moreover, we note that $\sigma \in \mathbb{R}^{+}$, whereas $\mu \in \mathbb{R}$ and $X_{U(i)} \in \mathbb{R}$. Thus, it is not easy to find a suitable proposal distribution for drawing the MCMC samples $\sigma_i$ ($i = 1, \ldots, N$) from the full conditional posterior distribution $\pi(\sigma \mid \mu, x)$. Therefore, we employ the random-walk Metropolis algorithm based on a normal proposal distribution truncated at zero. Using the MCMC samples, the term $E_{\pi|x}(\log\sigma)$ in (6) can be approximated by
$$E_{\pi|x}(\log\sigma) \approx \frac{1}{N - M} \sum_{i=M+1}^{N} \log\sigma_i,$$
where $M$ is the number of burn-in samples.
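The sampling scheme described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the update for $\mu$ uses a plain random-walk Metropolis step in place of the log-concave sampler of [12], the likelihood is the standard upper-record likelihood for the logistic distribution, and the prior is assumed to be of the form $\sigma^{-p}$ (the hypothetical argument `prior_power`; $p = 2$ corresponds to a prior proportional to $1/\sigma^2$). All function names and the proposal variance for $\mu$ are assumptions.

```python
import math
import numpy as np

RECORDS = np.array([2.70, 3.78, 4.83, 8.02, 8.37])  # record values quoted in the text


def log_posterior(mu, sigma, x, prior_power):
    """Log of L(mu, sigma) * sigma**(-prior_power), up to an additive constant."""
    if sigma <= 0.0:
        return -math.inf
    z = (x - mu) / sigma
    log_lik = (-len(x) * math.log(sigma) - z[-1]
               - np.logaddexp(0.0, -z[-1]) - np.sum(np.logaddexp(0.0, -z)))
    return float(log_lik - prior_power * math.log(sigma))


def normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def accept(rng, log_ratio):
    """Metropolis-Hastings acceptance with probability min(1, exp(log_ratio))."""
    return rng.random() < math.exp(min(0.0, log_ratio))


def run_chain(x, prior_power, n_iter=5500, var_mu=1.0, var_sigma=0.8, seed=0):
    rng = np.random.default_rng(seed)
    sd_mu, sd_sigma = math.sqrt(var_mu), math.sqrt(var_sigma)
    mu, sigma = float(np.mean(x)), float(np.std(x, ddof=1))  # crude starting values
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        # mu | sigma: symmetric random-walk step (simplification, see lead-in).
        mu_new = mu + sd_mu * rng.standard_normal()
        if accept(rng, log_posterior(mu_new, sigma, x, prior_power)
                  - log_posterior(mu, sigma, x, prior_power)):
            mu = mu_new
        # sigma | mu: normal proposal truncated at zero, drawn by rejection.
        sigma_new = sigma + sd_sigma * rng.standard_normal()
        while sigma_new <= 0.0:
            sigma_new = sigma + sd_sigma * rng.standard_normal()
        # The truncated proposal is not symmetric, so include the Hastings term.
        log_ratio = (log_posterior(mu, sigma_new, x, prior_power)
                     - log_posterior(mu, sigma, x, prior_power)
                     + math.log(normal_cdf(sigma / sd_sigma))
                     - math.log(normal_cdf(sigma_new / sd_sigma)))
        if accept(rng, log_ratio):
            sigma = sigma_new
        draws[t] = mu, sigma
    return draws


def posterior_mean_log_sigma(draws, burn_in=500):
    """Average of log(sigma_i) over the draws kept after burn-in."""
    return float(np.mean(np.log(draws[burn_in:, 1])))


if __name__ == "__main__":
    chain = run_chain(RECORDS, prior_power=2.0)
    print("estimated E(log sigma | x):", posterior_mean_log_sigma(chain))
```

The Hastings correction matters here: a normal proposal truncated at zero has a normalizing constant that depends on the current state, so omitting it would bias the chain toward small values of $\sigma$.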
The following section examines the validity of the provided objective Bayesian method by analyzing a real data set.

Application
Asgharzadeh et al. [8] analyzed the upper record values 2.70, 3.78, 4.83, 8.02, 8.37 from the total annual rainfall (in inches) during March recorded at the Los Angeles Civic Center from 1973 to 2006. To obtain Bayes estimates under the subjective prior $\pi_I(\mu, \sigma)$, we use the same hyperparameter values as [8] (i.e., $a = b = 0.00001$ and $\mu_0 = 0$). The MCMC samples are generated using the algorithm described in Section 2.1. To obtain the optimal acceptance rate under priors (7) and (8), the variances of the truncated normal proposal are set to 0.7 and 0.8, respectively [13]. Based on 5500 MCMC samples with 500 burn-in samples, the Bayes estimates under the squared error loss function and the corresponding 95% highest posterior density (HPD) credible intervals (CrIs) are computed for comparison with the MLE. The results are presented in Table 3. To verify the validity of the MCMC samples, we present their autocorrelation functions (ACF) and trace plots in Figures 2 and 3. These figures show that the MCMC samples mix well and converge to the stationary distribution.
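An HPD credible interval can be estimated from MCMC output as the shortest interval that contains the required fraction of the sorted draws. The sketch below illustrates this common empirical approach; the paper does not state how its intervals were computed, so the function `hpd_interval` and the synthetic samples in the usage block are assumptions for illustration only.

```python
import numpy as np


def hpd_interval(samples, level=0.95):
    """Shortest interval containing at least `level` of the draws."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    n_in = int(np.ceil(level * n))  # number of draws the interval must cover
    if n_in < 2:
        raise ValueError("too few samples for the requested level")
    # Width of every window of n_in consecutive order statistics.
    widths = x[n_in - 1:] - x[: n - n_in + 1]
    j = int(np.argmin(widths))
    return float(x[j]), float(x[j + n_in - 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    synthetic = rng.gamma(shape=2.0, scale=1.0, size=5000)  # placeholder draws, not real output
    print(hpd_interval(synthetic))
```

For a skewed posterior, such as that of a scale parameter, the HPD interval is shorter than the equal-tailed interval, which is why interval length is a natural basis for comparing priors.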
Table 3 shows that the HPD CrIs under the objective priors $\pi_J(\mu, \sigma)$ and $\pi_R(\mu, \sigma)$ are shorter than those under the subjective prior $\pi_I(\mu, \sigma)$ with $a = b = 0.00001$ and $\mu_0 = 0$. Furthermore, we assess the quality of the Bayesian models under priors (7) and (8) based on the replications $X^{rep}_{U(i)}$ ($i = 1, \ldots, 5$) of the observed upper record values from the posterior predictive distributions, in which $f_{X^{rep}}(x^{rep})$ denotes the marginal density function of $X^{rep}$. The replications and their mean and standard deviation (std) are given in Table 4. The mean and std of the observed upper record values are 5.54 and 2.541, respectively. The model under the Jeffreys prior (7) performs better with respect to the replications $X^{rep}_{U(i)}$ ($i = 1, 2, 3, 5$) and the mean, whereas the model under the reference prior (8) performs better with respect to the replication $X^{rep}_{U(4)}$. However, there is no significant difference between the replications under the two priors.
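One way to generate replicated upper record values for a posterior predictive check is to simulate records directly for each posterior draw of $(\mu, \sigma)$. The sketch below uses the standard representation $-\log(1 - F(X_{U(i)})) = E_1 + \cdots + E_i$ with independent unit exponential variables $E_j$, which for the logistic distribution gives $X_{U(i)} = \mu + \sigma \log(e^{S_i} - 1)$ with $S_i = E_1 + \cdots + E_i$. This is an assumed procedure, not necessarily the one used to produce Table 4, and the function names and placeholder draws are hypothetical.

```python
import numpy as np


def simulate_upper_records(mu, sigma, k, rng):
    """Simulate the first k upper record values from logistic(mu, sigma)."""
    s = np.cumsum(rng.exponential(size=k))  # S_i = E_1 + ... + E_i
    return mu + sigma * np.log(np.expm1(s))  # invert -log(1 - F(x)) = S_i


def posterior_predictive_records(draws, k, seed=0):
    """One replicated record sequence per posterior draw (mu, sigma)."""
    rng = np.random.default_rng(seed)
    return np.array([simulate_upper_records(m, s, k, rng) for m, s in draws])


if __name__ == "__main__":
    # Placeholder posterior draws for illustration only; in practice these
    # would be the retained MCMC samples of (mu, sigma).
    placeholder_draws = np.column_stack([np.full(1000, 5.0), np.full(1000, 2.0)])
    reps = posterior_predictive_records(placeholder_draws, k=5)
    print("mean replicated records:", reps.mean(axis=0))
```

Averaging the replicated sequences over the posterior draws gives one summary per record index, which can then be compared with the observed record values.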
Finally, Table 5 presents the estimation results for the joint entropy $\hat{H}^{B}_{U(1),\ldots,U(k)}$ under the subjective prior $\pi_I(\mu, \sigma)$ with $a = b = 0.00001$ and $\mu_0 = 0$ and under the objective priors $\pi_J(\mu, \sigma)$ and $\pi_R(\mu, \sigma)$. In addition, Figure 4 presents the kernel density of the joint entropy based on the MCMC samples. Table 5 shows that the joint entropy under the subjective prior $\pi_I(\mu, \sigma)$ is larger than it is under the objective priors (7) and (8). Figure 4 shows that the kernel density has a heavier tail under the subjective prior than under the objective priors. This is because $\hat{\sigma}_{IB}$ is estimated to be larger than $\hat{\sigma}_{JB}$ and $\hat{\sigma}_{RB}$ (see Table 3).

Conclusions
In this paper, we proposed an entropy inference method based on an objective Bayesian approach for upper record values from the two-parameter logistic distribution. We first obtained non-informative priors, namely the Jeffreys and reference priors, for the unknown parameters of the two-parameter logistic distribution. Subsequently, we derived the joint entropy based on the upper record values and examined its properties. We evaluated the objective Bayesian models under the two objective priors through posterior predictive checking based on the replications of the observed upper record values. The proposed objective Bayesian approach is useful when there is not enough prior information.

Figure 4. Kernel density of the joint entropy based on the MCMC samples under (a) the prior $\pi_I(\mu, \sigma)$, (b) the prior $\pi_J(\mu, \sigma)$ and (c) the prior $\pi_R(\mu, \sigma)$.

Table 1. Entropy based on the $i$-th upper record value $X_{U(i)}$.

Table 3. Estimates of $\mu$ and $\sigma$ and the corresponding 95% HPD CrIs.

Table 4. Replications and their mean and standard deviation (std).