A Two-Stage Maximum Entropy Prior of Location Parameter with a Stochastic Multivariate Interval Constraint and Its Properties

This paper proposes a two-stage maximum entropy prior to elicit uncertainty regarding a multivariate interval constraint of the location parameter of a scale mixture of normal model. Using Shannon’s entropy, this study demonstrates how the prior, obtained by using two stages of a prior hierarchy, appropriately accounts for the information regarding the stochastic constraint and suggests an objective measure of the degree of belief in the stochastic constraint. The study also verifies that the proposed prior bridges the gap between the canonical maximum entropy prior of the parameter with no interval constraint and that with a certain multivariate interval constraint. It is shown that the two-stage maximum entropy prior belongs to the family of rectangle screened normal distributions that is conjugate for samples from a normal distribution. Some properties of the prior density, useful for developing a Bayesian inference of the parameter with the stochastic constraint, are provided. We also propose a hierarchical constrained scale mixture of normal model (HCSMN), which uses the prior density to estimate the constrained location parameter of a scale mixture of normal model, and demonstrate the scope of its applicability.


Introduction
Suppose the y_i's are independent observations from a scale mixture of a p-variate normal distribution with the p × 1 location parameter θ and known scale matrix. Then, a simple location model for the p-variate observations with y_i ∈ R^p is:

y_i = θ + ε_i, i = 1, . . ., n, (1)

where the distribution of the p × 1 error vector ε_i is F ∈ F with:

F = {F : N_p(0, κ(η)Λ), η ∼ G(η) with κ(η) > 0 and η > 0}, (2)

where η is a mixing variable with the cdf G(η) and κ(η) is a suitably chosen weight function.
Bayesian analysis of the model (1) begins with the specification of a prior distribution, which represents the information about the uncertain parameter θ that is combined with the joint probability distribution of the y_i's to yield the posterior distribution. When there are no constraints on the location parameter, the usual priors (e.g., the Jeffreys invariant prior or an informative normal conjugate prior) can be used, and posterior inference can be performed without any difficulty. In some practical situations, however, we may have prior information that θ has a multivariate interval constraint, so that the value of θ needs to be located in a restricted space C ⊂ R^p, where C = (a, b) is a p-variate interval with a = (a_1, . . ., a_p)′ and b = (b_1, . . ., b_p)′. For the remainder of this paper, we use θ ∈ C to denote the multivariate interval constraint:

{θ; a_i ≤ θ_i ≤ b_i, i = 1, . . ., p}, where θ = (θ_1, . . ., θ_p)′. (3)

When we have sufficient evidence that the constraint condition on the model (1) is true, a suitable restriction on the parameter space, such as a truncated prior distribution, is expected. See, e.g., [1-4], for various applications of the truncated prior distribution in Bayesian inference. However, it is often the case that prior information about the constraint is not certain for Bayesian inference. Further, even the observations from the assumed model (1) often do not provide strong evidence that the constraint is true and, therefore, may appear to contradict the assumption of the model associated with the constraint. In this case, the uncertainty about the constraint should be taken into account in eliciting a prior distribution of θ. When the parameter constraint is not certain for Bayesian estimation in the univariate normal location model, the seminal work by [5] proposed the use of a two-stage hierarchical prior distribution by constructing a family of skew densities based on the positively-truncated normal prior distribution. Generalizing the framework of the prior hierarchy proposed by [5], various priors were considered in [6-10], among others, for the Bayesian estimation of normal and scale mixture of normal models with uncertain interval constraints. In particular, [7] obtained the prior of θ as the normal selection distribution (see, e.g., [11]) and thus exploited the class of weighted normal distributions by [12] for reflecting the uncertain prior belief on θ. On the other hand, there are situations in which a prior density of θ is set up on the basis of information regarding the moments of the density, such as the mean and covariance matrix. A useful method of dealing with this situation is through the concept of entropy by [13,14]. Other general references where moment inequality constraints have been considered include [15,16]. To the best of our knowledge, however, a formal method to set up a prior density of θ, consistent with information regarding the moments of the density as well as the uncertain prior belief on the location parameter, has not previously been investigated in the literature. Such practical considerations motivate us to develop a prior density of θ, which is tackled in this paper.
As discussed by [17-20], entropy has a direct relationship to information theory and measures the amount of uncertainty inherent in a probability distribution. Using this property of the entropy, we propose a two-stage hierarchical method for setting up the two-stage maximum entropy prior density of θ. The method enables us to elicit information regarding the moments of the prior distribution, as well as the degree of belief in the constraint θ ∈ C. Furthermore, this paper also suggests an objective method to measure the degree of belief regarding the multivariate interval constraint accounted for by using the prior. We also propose a simple way of controlling the degree of belief regarding the constraint of θ in Bayesian inference. This is done by investigating the relation between the degree of belief and the enrichment of the hyper-parameters of the prior density. In this respect, the study concerning the two-stage maximum entropy prior is interesting both from a theoretical and an applied point of view. On the theoretical side, it develops yet another conjugate prior of constrained θ based on the maximum entropy approach. The study provides several properties of the proposed prior, which advocate the idea of two stages of a prior hierarchy to elicit information regarding the moments of the prior and the stochastic constraint of θ. From the applied viewpoint, the prior is especially useful for a Bayesian subjective methodology for inequality constrained multivariate linear models.
The remainder of this paper is arranged as follows. In Section 2, we propose the two-stage maximum entropy prior of θ by applying Boltzmann's maximum entropy theorem (see, e.g., [21,22]) to the frame of the two-stage prior hierarchy by [5]. We also suggest an objective measure of uncertainty regarding the stochastic constraint of θ that is accounted for by the two-stage maximum entropy prior. In Section 3, we briefly discuss the properties of the proposed prior of θ, which will be useful for the Bayesian analysis of θ subject to uncertainty regarding the multivariate interval constraint θ ∈ C. Section 4 provides a hierarchical scale mixture of normal model of Equation (1) using the two-stage prior, referred to as the hierarchical constrained scale mixture of normal model (HCSMN); it explores the Bayesian estimation of model (1) by deriving the posterior distributions of the unknown parameters under the HCSMN and discusses the properties of the proposed measure of uncertainty in the context of the HCSMN. In Section 5, we compare the empirical performance of the proposed prior based on synthetic data and real data applications with the HCSMN models for the estimation of θ with a stochastic multivariate interval constraint. Finally, concluding remarks along with a discussion are provided in Section 6.

Maximum Entropy Prior
Sometimes, we have a situation in which partial prior information is available, outside of which it is desirable to use a prior that is as non-informative as possible. Assume that we can specify the partial information concerning θ in Equation (1), with continuous space Θ, in the form of moment conditions. That is:

E[g_k(θ)] = ∫_Θ g_k(θ)π(θ)dθ = µ_k, k = 1, . . ., m. (4)

The maximum entropy prior can be obtained by choosing the π(θ) that maximizes the entropy:

Ent(π) = −∫_Θ π(θ) log π(θ)dθ (5)

in the presence of the partial information in the form of Equation (4). A straightforward application of the calculus of variations leads us to the following theorem.
When the partial information is about the mean and covariance matrix of θ, outside of which it is desired to use a prior that is as non-informative as possible, then the theorem yields the following result.
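In the Gaussian case of the corollary, the entropy has the familiar closed form Ent = (1/2) log{(2πe)^p |Σ|}. As a quick numerical check, the following Python sketch (the paper's own code is in R; the covariance matrix below is illustrative) compares the formula with scipy's built-in Gaussian entropy:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Closed-form differential entropy of the N_p(mu, Sigma) maximum entropy prior:
# Ent(pi_max) = 0.5 * log((2 * pi * e)^p * |Sigma|)
def gaussian_entropy(Sigma):
    p = Sigma.shape[0]
    return 0.5 * (p * np.log(2 * np.pi * np.e) + np.linalg.slogdet(Sigma)[1])

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])      # illustrative covariance matrix
print(gaussian_entropy(Sigma))
print(multivariate_normal(mean=np.zeros(2), cov=Sigma).entropy())  # same value
```

Note that the entropy depends on the moment information only through |Σ|, which is why shrinking the covariance (adding information) always lowers the entropy.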
In practical situations, we sometimes have partial information about a multivariate interval constraint (i.e., θ ∈ C) in addition to the first two moments as given in Corollary 1.

Two-Stage Maximum Entropy Prior
This subsection considers the case where the maximum entropy prior of θ has a stochastic constraint in the form of a multivariate interval, i.e., Pr(θ ∈ C) = γ, where C is defined by Equation (3) and γ ∈ [γ_max, 1]. Here, γ_max is Pr(θ ∈ C) calculated by using the maximum entropy prior distribution in Equation (6). We develop a two-stage prior of θ, denoted by π_two(θ), which has a different formula according to the degree of belief, γ, regarding the constraint.
Suppose we have only partial information about the covariance matrix, Ω_2, of the parameter θ in the first stage of a prior elicitation. Then, for a given mean vector µ_0, we may construct the maximum entropy prior, Equation (6), so that the first-stage maximum entropy prior will be π_max(θ|µ_0), which is the density of the N_p(µ_0, Ω_2) distribution. In addition to this information, suppose we have collected prior information about the unknown µ_0, which gives a value of the mean vector θ_0 and covariance matrix Ω_1, as well as a stochastic (or certain) constraint, indicating Pr(µ_0 ∈ C) = 1. Then, in the second stage of the prior elicitation, one can elicit the additional prior partial information by using the constrained maximum entropy prior in Equation (7).
Analogous to the work of [5], we can specify all of the partial information about θ by the following two stages of the maximum entropy prior hierarchy over θ ∈ R^p:

θ | µ_0 ∼ N_p(µ_0, Ω_2), (8)

µ_0 ∼ N_p(θ_0, Ω_1)I(µ_0 ∈ C), (9)

where φ_p(µ_0; θ_0, Ω_1)I(µ_0 ∈ C) is a truncated normal density, i.e., the density of the N_p(θ_0, Ω_1)I(µ_0 ∈ C) variate, and Ω_1 + Ω_2 = Σ. Thus, the two stages of the prior hierarchy are as follows. In the first stage, given µ_0, θ has a maximum entropy prior that is the N_p(µ_0, Ω_2) distribution, as in Equation (6). In the second stage, µ_0 has a distribution obtained by truncating the maximum entropy prior distribution to elicit uncertainty about the prior information that θ ∈ C. It may be sensible to assume that the value of θ_0 is located in the multivariate interval C or at the centroid of the interval.
Definition 1. The marginal prior density of θ, obtained from the two stages of the maximum entropy prior hierarchy in Equations (8) and (9), is called a two-stage maximum entropy prior of θ.
Since Ω_1 + Ω_2 = Σ, if the constraint is completely certain (i.e., γ = 1), we may set Ω_2 → O to obtain π_const(θ) from the two stages of the maximum entropy prior, while the two-stage prior yields π_max(θ) with γ = γ_max for the case where Ω_1 = O. Thus, the hyper-parameters Ω_1 and Ω_2 may need to be assessed to achieve the degree of belief γ about the stochastic constraint. When Ω_1 ≠ O and Ω_2 ≠ O, the above hierarchy of priors yields the following marginal prior of θ.

Lemma 2. The two stages of the prior hierarchy of Equations (8) and (9) yield the two-stage maximum entropy prior distribution of θ given by:

π_two(θ) = φ_p(θ; θ_0, Σ) Φ_p(C; µ, Q) / Φ_p(C; θ_0, Ω_1), θ ∈ R^p, (10)

where µ = θ_0 + Ω_1 Σ^{-1}(θ − θ_0), Q = Ω_1 − Ω_1 Σ^{-1} Ω_1, φ_p(x; c, A) denotes the pdf of X ∼ N_p(c, A), and Φ_p(C; c, A) denotes a p-dimensional rectangle probability of the distribution of X, i.e., P(X ∈ C). Proof.
In fact, the density π_two(θ) belongs to the family of rectangle screened multivariate normal (RSN) distributions studied by [23].
Corollary 3. The distribution law of θ with the density in Equation (10) is:

θ =_d [X_2 | X_1 ∈ C] ∼ RSN_p(C; τ, Ψ), (11)

which is a p-dimensional RSN distribution with respective location and scale parameters τ and Ψ and the rectangle screening space C. Here, the joint distribution of X_1 and X_2 is N_2p(τ, Ψ), where τ = (θ_0′, θ_0′)′ and

Ψ = [ Ω_1 Ω_1
      Ω_1 Σ ].

By use of the binomial inverse theorem (see, e.g., [24] p. 23), one can easily see that µ_{x1|x2} and Ω_{x1|x2} are respectively equivalent to µ and Q in Equation (10), provided that x_2 is changed to θ.
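The rectangle probabilities Φ_p(C; c, A) appearing in Equation (10) have no closed form, but they can be computed numerically. The Python sketch below (standing in for the paper's R packages; all parameter values are illustrative) evaluates them by inclusion-exclusion over the 2^p corners of C, and then computes Pr(θ ∈ C) = Φ_2p(C × C; τ, Ψ)/Φ_p(C; θ_0, Ω_1), which follows from the representation θ =_d X_2 | X_1 ∈ C:

```python
import itertools
import numpy as np
from scipy.stats import multivariate_normal

def rect_prob(a, b, mean, cov):
    """P(a < X < b) for X ~ N_p(mean, cov), by inclusion-exclusion over corners."""
    p = len(a)
    total = 0.0
    for signs in itertools.product([0, 1], repeat=p):
        corner = np.where(np.array(signs) == 1, b, a)
        total += (-1) ** (p - sum(signs)) * multivariate_normal.cdf(corner, mean, cov)
    return total

# Illustrative p = 2 setting (all numbers hypothetical).
theta0 = np.array([0.0, 0.0])
Omega1 = 0.5 * np.eye(2)
Omega2 = 0.5 * np.eye(2)
Sigma = Omega1 + Omega2
a, b = np.array([-1.0, -1.0]), np.array([1.0, 1.0])

# Joint (X1, X2) ~ N_{2p}(tau, Psi); theta is X2 screened by X1 in C.
tau = np.concatenate([theta0, theta0])
Psi = np.block([[Omega1, Omega1], [Omega1, Sigma]])

gamma_max = rect_prob(a, b, theta0, Sigma)
gamma_two = (rect_prob(np.concatenate([a, a]), np.concatenate([b, b]), tau, Psi)
             / rect_prob(a, b, theta0, Omega1))
print(gamma_max, gamma_two)   # gamma_max <= gamma_two <= 1
```

The inclusion-exclusion route is only practical for small p; for larger p the quasi-Monte Carlo integrators behind tmvtnorm/mvtnorm (or scipy's own Genz-style cdf) are the appropriate tools.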
According to [23], the stochastic representation for the RSN vector θ ∼ RSN_p(C; τ, Ψ) is:

θ =_d θ_0 + Y_1^{(α,β)} + Y_2, (12)

where Y_1^{(α,β)} denotes a doubly-truncated multivariate normal random vector whose distribution is defined by N_p(0, Ω_1)I(y_1 ∈ (α, β)), with α = a − θ_0 and β = b − θ_0, and Y_2 ∼ N_p(0, Ω_2) is independent of Y_1^{(α,β)}. This representation enables us to implement a one-for-one method for generating a random vector with the RSN_p(C; τ, Ψ) distribution.
For generating the doubly-truncated multivariate normal vector Y_1^{(α,β)}, the R package tmvtnorm by [25] can be used, where R is a computer language and an environment for statistical computing and graphics.
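The one-for-one method can also be exercised directly from the stochastic representation in Equation (12). The following Python sketch (all parameter settings are illustrative, and the naive rejection sampler is only adequate when the truncation rectangle carries non-negligible probability) draws the doubly-truncated component by rejection and adds the independent Gaussian component:

```python
import numpy as np

rng = np.random.default_rng(0)

def rtruncmvn(n, cov, lower, upper):
    """Naive rejection sampler for N_p(0, cov) restricted to (lower, upper)."""
    p = cov.shape[0]
    out = np.empty((0, p))
    while out.shape[0] < n:
        cand = rng.multivariate_normal(np.zeros(p), cov, size=4 * n)
        keep = np.all((cand > lower) & (cand < upper), axis=1)
        out = np.vstack([out, cand[keep]])
    return out[:n]

def rrsn(n, theta0, Omega1, Omega2, a, b):
    """One-for-one draws: theta = theta0 + Y1^(alpha,beta) + Y2."""
    alpha, beta = a - theta0, b - theta0
    y1 = rtruncmvn(n, Omega1, alpha, beta)          # doubly-truncated component
    y2 = rng.multivariate_normal(np.zeros(len(theta0)), Omega2, size=n)
    return theta0 + y1 + y2

# Illustrative settings: theta0 is the centroid of C = (a, b).
theta0 = np.zeros(2)
Omega1, Omega2 = 0.25 * np.eye(2), 0.25 * np.eye(2)
a, b = -np.ones(2), np.ones(2)
samples = rrsn(5000, theta0, Omega1, Omega2, a, b)
print(samples.mean(axis=0))   # near theta0, since the truncation is symmetric here
```

For rectangles with small probability content, the Gibbs-type sampler in tmvtnorm is far more efficient than rejection.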

Entropy of a Maximum Entropy Prior
Suppose we have partial a priori information with which we can specify values for the covariance matrices Ω_1 and Ω_2, where Σ = Ω_1 + Ω_2.

Case 1: Two-stage Maximum Entropy Prior
When the two-stage maximum entropy prior π_two(θ) is assumed for the prior distribution of θ, its entropy is given by:

Ent(π_two) = −E_two[log π_two(θ)] = (p/2) log(2π) + (1/2) log |Σ| + (1/2) E_two[(θ − θ_0)′ Σ^{-1} (θ − θ_0)] − E_two[log h(θ)], (13)

where h(θ) = Φ_p(C; µ, Q)/Φ_p(C; θ_0, Ω_1) and E_two denotes the expectation with respect to the RSN distribution with the density π_two(θ). Equation (12) shows that E[θ] = θ_0 + ξ and Cov(θ) = Ω_2 + H. Here, ξ = (ξ_1, . . ., ξ_p)′ and H = {h_ij}, i, j = 1, . . ., p, are the mean vector and covariance matrix of the doubly-truncated multivariate normal random vector Y_1 ∼ N_p(0, Ω_1)I(y_1 ∈ (α, β)). Readers are referred to [25] with the R package tmvtnorm and [26] with the R package mvtnorm for implementing the respective calculations of doubly-truncated moments and integrations. As seen in Equation (13), an analytic calculation of E_two[log h(θ)] involves a complicated integration. Instead, by using a Monte Carlo integration, we may calculate it approximately.
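The Monte Carlo device can be checked in a univariate special case where a reliable reference value for the entropy is available. The sketch below (in Python; scipy's truncnorm plays the role of the doubly-truncated normal, and the interval is illustrative) approximates Ent(π) = −E[log π(θ)] by the sample average of −log π over draws from π:

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(1)

# Standard normal doubly truncated to an illustrative interval (a, b).
a, b = -1.0, 2.0
dist = truncnorm(a, b)

# Monte Carlo approximation of Ent(pi) = -E[log pi(theta)]:
draws = dist.rvs(size=200_000, random_state=rng)
ent_mc = -np.mean(dist.logpdf(draws))

print(ent_mc)           # Monte Carlo estimate
print(dist.entropy())   # scipy's reference value for the same distribution
```

Exactly the same average of −log π_two over RSN draws (generated via the stochastic representation) approximates Ent(π_two), with only the evaluation of log π_two being more expensive because of the rectangle probabilities.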
According to Equation (12), the stochastic representation of the prior distribution θ ∼ RSN_p(C; τ, Ψ) with density π_two(θ) is useful for generating θ's from the prior distribution of θ by using the R packages mvtnorm and tmvtnorm and, hence, for implementing the Monte Carlo integration.

Case 2: Constrained Maximum Entropy Prior
When the constrained maximum entropy prior π_const(θ) in Equation (7) is assumed for the prior distribution of θ, its entropy is given by:

Ent(π_const) = (p/2) log(2π) + (1/2) log |Σ| + (1/2) E_const[(θ − θ_0)′ Σ^{-1} (θ − θ_0)] + log Φ_p(C; θ_0, Σ). (14)

E_const denotes the expectation with respect to the doubly-truncated multivariate normal distribution with the density π_const(θ), and its analytic calculation is not possible. Instead, the R packages tmvtnorm and mvtnorm are available for calculating the respective moment and integration in the expression of Ent(π_const).

Case 3: Maximum Entropy Prior
On the other hand, if the maximum entropy prior π_max(θ) is assumed for the prior distribution of the location parameter θ, its entropy is given by:

Ent(π_max) = (1/2) log{(2πe)^p |Σ|}.

The following theorem asserts the relationship among the degrees of belief, accounted for by the three priors, about the a priori uncertain constraint {θ; θ ∈ C}.

Theorem 1. The degrees of belief γ_max, γ_two, and γ_const about the a priori constraint {θ; θ ∈ C}, accounted for by π_max(θ), π_two(θ) and π_const(θ), have the following relation:

γ_max ≤ γ_two ≤ γ_const = 1,

provided that the parameters of π_two(θ) in Equation (10) satisfy:

Φ_2p(C*; τ, Ψ) ≥ Φ_p(C; θ_0, Ω_1) Φ_p(C; θ_0, Σ),

where C* = C × C.

Proof. The conditions for equalities are straightforward from the stochastic representation in Equation (12). Under π_max(θ) in Equation (6), γ_max = Φ_p(C; θ_0, Σ); because π_two(θ) is the density of θ ∼ RSN_p(C; τ, Ψ), γ_two = Φ_2p(C*; τ, Ψ)/Φ_p(C; θ_0, Ω_1); and γ_const = ∫_{θ∈C} π_const(θ)dθ = 1. Therefore, the condition Φ_2p(C*; τ, Ψ) ≥ Φ_p(C; θ_0, Ω_1) Φ_p(C; θ_0, Σ) gives the inequality relation.

Objective Measure of Uncertainty
In constructing the two stages of the prior hierarchy over θ ∈ R^p, the usual practice is to set the value of θ_0 as the centroid of the uncertain constrained multivariate interval C = (a, b). In this case, we have the following result.

Corollary 4. In the case where the value of θ_0 in π_two(θ) is the centroid of the multivariate interval C,

γ_max ≤ γ_two ≤ γ_const = 1. (15)

Proof. Equation (12) indicates that:

γ_two = Pr(θ ∈ C) = Pr(Y_1^{(α,β)} + Y_2 ∈ (α, β)),

where Y_1 ∼ N_p(0, Ω_1) and Y_2 ∼ N_p(0, Ω_2) are independent random vectors, α = a − θ_0, and β = b − θ_0. When θ_0 is the centroid of C, α = −β, and hence:

Pr(Y_1^{(α,β)} + Y_2 ∈ (α, β)) ≥ Pr(Y_1 + Y_2 ∈ (α, β)) = γ_max

by the theorem of [27]. This leads to the first inequality, γ_max ≤ γ_two. Since γ_const = 1, we see that the second inequality in Equation (15) holds.
The following are immediate from Theorem 1 and Corollary 4: (i) The two-stage maximum entropy prior achieves γ_two for the degree of belief about the uncertain multivariate interval constraint {θ; θ ∈ C}, and its value satisfies γ_two ∈ [γ_max, 1] if the condition in the theorem is satisfied. Note that the equality γ_two = 1 holds for Ω_2 = O; (ii) The degree of belief about the multivariate interval constraint is a function of the covariance matrices Ω_1 and Ω_2. Thus, if we have the partial a priori information that specifies values of the covariance matrices Ω_1 and Ω_2, the degree of belief γ_two, associated with π_two(θ), can be assessed.
Figure 1 compares the degrees of belief as functions of δ ∈ (0, 1), where Ω_1 = δΣ, Ω_2 = (1 − δ)Σ, Σ = σ²{(1 − ρ)I_p + ρ1_p1_p′} is an intra-class covariance matrix, and 1_p denotes a p × 1 summing vector whose every element is unity. When the constraint is changed to C = (−(21_p + a), −a) in this comparison, one can easily check that the degrees of belief do not change and give the same results seen in Figure 1. The figure depicts exactly the same inequality relationship given in Theorem 1. In comparison with γ_two and γ_const = 1, we see that the degree of belief in the uncertain constraint, accounted for by using π_two(θ), becomes large as Ω_2 → O (or, equivalently, Ω_1 → Σ). In particular, this tendency is more evident for small σ² and large ρ values. Third, the difference between γ_two and γ_max in the right panel suggests that the difference becomes large as Ω_2 tends to O. In particular, for fixed values of δ and ρ, the figure shows that the difference increases as the value of σ² decreases, while it decreases as the value of ρ increases for fixed values of δ and σ². Therefore, the figure confirms that the two-stage maximum entropy prior π_two(θ) accounts for the a priori uncertain constraint {θ; θ ∈ C} with the degree of belief γ_two ∈ [γ_max, 1]. The figure also shows that the magnitude of γ_two depends on both the first-stage covariance Ω_2 and the second-stage covariance Ω_1 in the two stages of the prior hierarchy in Equations (8) and (9). All other choices of the values of p, ρ, and C satisfying the condition in Theorem 1 produced graphics similar to those depicted in Figure 1, with the exception of the magnitude of the differences among the degrees of belief.

Properties of the Entropy
The expected uncertainty in the multivariate interval constraint of the location parameter θ, {θ; θ ∈ C}, accounted for by the two-stage prior π_two(θ), is measured by its entropy Ent(π_two(θ)), and information about the constraint is defined by −Ent(π_two(θ)). Thus, as considered by [20,28], the difference between the Shannon measures of information, before and after applying the uncertain constraint {θ; θ ∈ C}, can be explained by the following property.
Proof. It is straightforward to check the equalities by using the stochastic representation in Equation (12). Since π_max(θ) is the maximum entropy prior, Ent(π_max(θ)) ≥ Ent(π_two(θ)), and it is sufficient to show that Ent(π_two(θ)) ≥ Ent(π_const(θ)). By the lemma of [27], the difference between the covariance matrices appearing in Equations (13) and (14) is positive semi-definite, and hence the corresponding trace terms satisfy the required inequality. This and the fact that E_two[log h(θ)] ≤ 0 give the inequality Ent(π_two(θ)) ≥ Ent(π_const(θ)).
Figure 2 depicts the differences between Ent(π_max(θ)), Ent(π_two(θ)) and Ent(π_const(θ)) using the same parameter values used in constructing Figure 1. Figure 2 coincides with the inequality relation given in Corollary 5 and indicates the following consequences: (i) Even though θ_0 is not the centroid of the multivariate interval C, we see that Ent(π_max(θ)) > Ent(π_two(θ)) > Ent(π_const(θ)) for δ ∈ (0, 1). (ii) The difference Ent(π_two(θ)) − Ent(π_const(θ)) is a monotone decreasing function of δ, while Ent(π_max(θ)) − Ent(π_two(θ)) is a monotone increasing function. (iii) The differences get bigger the larger σ² becomes for δ ∈ (0, 1). This indicates that the entropy of π_two(θ) is associated not only with the covariance of the first-stage prior Ω_2, but also with that of the second-stage prior Ω_1 in Equations (8) and (9), respectively. (iv) Upon comparing Figures 1 and 2, the entropy Ent(π_two(θ)) is closely related to the degree of belief γ_two through a factor c_two > 0 obtained by using Equations (13) and (16), where 1 − γ_two denotes the degree of uncertainty in a priori information regarding the multivariate interval constraint {θ; θ ∈ C} elicited by π_two(θ). These consequences and Corollary 5 indicate that 1 − γ_two stands between 1 − γ_const and 1 − γ_max. Thus, the two-stage prior π_two(θ) is useful for eliciting uncertain information about the multivariate interval constraint. Theorem 1 and the above statements produce an objective method for eliciting the stochastic constraint {θ; θ ∈ C} via π_two(θ).

Corollary 6.
Suppose the degree (1 − γ_two) of uncertainty associated with the stochastic constraint {θ; θ ∈ C} is given. An objective way of eliciting the prior information by using π_two(θ) is to choose the covariance matrices Ω_1 and Ω_2 in π_two(θ) such that γ_two = Φ_2p(C*; τ, Ψ)/Φ_p(C; θ_0, Ω_1), where Σ = Ω_1 + Ω_2 is known and C* = C × C. Since γ_const = 1, the degree of uncertainty (1 − γ_two) is equal to γ_const − γ_two. The left panel of Figure 1 plots a graph of 1 − γ_two against δ. The graph indicates that a δ value for π_two(θ) can be easily determined for given Σ, and the value is in inverse proportion to the degree of uncertainty regardless of Σ.

Posterior Distribution
Suppose the distribution of the error vector in the model (1) belongs to the family of scale mixture of normal distributions defined in Equation (2); then, the conditional distribution of the data information from a single observation (n = 1) is [y|η] ∼ N_p(θ, κ(η)Λ). It is well known that the priors π_max(θ) and π_const(θ) are conjugate priors for the location vector θ, provided that η and Λ are known. That is, conditional on η, each prior satisfies the conjugate property that the prior and the posterior distributions of θ belong to the same family of distributions. The following corollary shows that the conditional conjugate property also applies to π_two(θ).
Corollary 7. Let y|η ∼ N_p(θ, κ(η)Λ) with known Λ. Then, the two-stage maximum entropy prior π_two(θ) in Equation (10) yields the conditional posterior distribution of θ given by:

θ | y, η ∼ RSN_p(C; τ*_η, Ψ*_η), (17)

where τ*_η and Ψ*_η are the location and scale parameters of the joint normal distribution in Corollary 3, updated by the observation y.

Proof. When the two-stage prior π_two(θ) in Equation (10) is used, the conditional posterior density of θ given η is:

π(θ|y, η) ∝ φ_p(y; θ, κ(η)Λ) π_two(θ) ∝ φ_p(y; θ, κ(η)Λ) φ_p(θ; θ_0, Σ) Φ_p(C; µ, Q).

The last term of the proportional relations is a kernel of the RSN_p(C; τ*_η, Ψ*_η) density defined by Corollary 3.
Corollaries 3 and 7 establish the conditional conjugate property of π_two(θ): if the location parameter θ is the normal mean vector, then the RSN prior distribution, i.e., π_two(θ), yields a conditional posterior distribution that belongs to the class of RSN distributions, as given in Corollary 7. In the particular case where the distribution of η degenerates at κ(η) = 1, i.e., the model (1) is a normal model, the conditional conjugate property of π_two(θ) reduces to the unconditional conjugate property.
Using the relation between the distribution of Equation (11) and that of Equation (12), we can obtain the stochastic representation for the conditional posterior RSN distribution in Equation (17) as follows.

Corollary 8. Conditional on the mixing variable η, the stochastic representation of θ|y is given in Equation (18), where W_1 ∼ N_p(0, Σ*_{1η}) and W_2 ∼ N_p(0, I_p) are independent, and W_1 is subject to double truncation as in Equation (12).

Proof. Suppose the distributions of X_1 and X_2 in Equation (11) are changed to those with the updated posterior parameters τ*_η and Ψ*_η. Then, the stochastic representation in Equation (12) associated with the distribution X_2|X_1 ∈ C in Equation (11) gives the result.

Hierarchical Constrained Scale Mixture of Normal Model
For the model (1), if we are completely sure about a multivariate interval constraint on θ, a suitable restriction on the parameter space θ ∈ R^p, such as a truncated normal prior distribution, is expected for eliciting the information. However, there are certain cases where we have a priori information that the location parameter θ is highly likely to have a multivariate interval constraint, and thus, the value of θ needs to be located, with uncertainty, in a restricted space {θ ∈ C} with C = (a, b). Then we cannot be sure about the constraint, and the constraint becomes stochastic (or uncertain), as in our problem of interest. In this case, the uncertainty about the constraint must be taken into account in the estimation procedure of the model (1). This section considers a hierarchical Bayesian estimation of the scale mixture of normal models reflecting the uncertain prior belief on θ.

The Hierarchical Model
Let us consider a hierarchical constrained scale mixture of normal model (HCSMN) that uses the hierarchy of the scale mixture of normal model (1) and includes the two stages of a prior hierarchy in the following way:

y_i | θ, Λ, η_i ∼ N_p(θ, κ(η_i)Λ), η_i ∼ G(η_i), i = 1, . . ., n,
θ | µ_0 ∼ N_p(µ_0, Ω_2),
µ_0 ∼ N_p(θ_0, Ω_1)I(µ_0 ∈ C),
Λ ∼ W_p^{-1}(D, d), (19)

where W_p^{-1}(D, d) denotes the inverted Wishart distribution with positive definite scale matrix D and d degrees of freedom, whose pdf W_p^{-1}(Λ; D, d) is proportional to |Λ|^{-(d+p+1)/2} exp{−tr(DΛ^{-1})/2}.

The Gibbs Sampler
Based on the HCSMN model structure with the likelihood and the prior distributions in Equation (19), the joint posterior distribution of θ, Λ and η = (η_1, . . ., η_n)′ given the data {y_1, . . ., y_n} is:

π(θ, Λ, η | y_1, . . ., y_n) ∝ ∏_{i=1}^{n} φ_p(y_i; θ, κ(η_i)Λ) g(η_i) × π_two(θ) W_p^{-1}(Λ; D, d), (20)

where the g(η_i)'s denote the densities of the mixing variables η_i. Note that the joint posterior in Equation (20) does not simplify to an analytic form of a known density and is thus intractable for the posterior inference. Instead, we use the Gibbs sampler for the posterior inference. See [30] for a reference. To run the Gibbs sampler, we need the following full conditional posterior distributions: (i) The full conditional posterior densities of the η_i's are given by:

π(η_i | y_i, θ, Λ) ∝ κ(η_i)^{-p/2} exp{−(y_i − θ)′Λ^{-1}(y_i − θ)/(2κ(η_i))} g(η_i), i = 1, . . ., n; (21)

(ii) The full conditional distribution of θ is obtained in a way analogous to the proof of Corollary 7. It is:

θ | y, η, Λ ∼ RSN_p(C; τ*, Ψ*), (22)

where τ* and Ψ* are built from the mean vector τ_1 and covariance matrix Ω* of the unconstrained normal full conditional of θ; and (iii) The full conditional distribution of Λ is the inverted Wishart distribution:

Λ | y, θ, η ∼ W_p^{-1}(D + Σ_{i=1}^{n} κ(η_i)^{-1}(y_i − θ)(y_i − θ)′, d + n). (23)
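The structure of the sampler can be illustrated in a stripped-down univariate special case (p = 1, κ(η) = 1), with the truncated-normal full conditional of π_const standing in for the RSN conditional; all data and hyper-parameter values below are hypothetical, and the sketch is written in Python rather than the paper's R:

```python
import numpy as np
from scipy.stats import truncnorm, invgamma

rng = np.random.default_rng(2)

# Hypothetical univariate data and hyper-parameters (p = 1, kappa(eta) = 1).
y = rng.normal(0.8, 1.0, size=50)
a_c, b_c = 0.0, 1.0            # interval constraint C = (a, b)
theta0, omega = 0.5, 1.0       # prior mean and variance of theta
D, d = 1.0, 3.0                # scale and degrees of freedom of the variance prior

n = len(y)
theta, lam = 0.5, 1.0          # initial values
draws = []
for it in range(3000):
    # Full conditional of theta: conjugate normal posterior truncated to C.
    var = 1.0 / (n / lam + 1.0 / omega)
    mean = var * (y.sum() / lam + theta0 / omega)
    lo, hi = (a_c - mean) / np.sqrt(var), (b_c - mean) / np.sqrt(var)
    theta = truncnorm.rvs(lo, hi, loc=mean, scale=np.sqrt(var), random_state=rng)
    # Full conditional of the variance: inverse gamma (p = 1 inverted Wishart).
    lam = invgamma.rvs((d + n) / 2, scale=(D + ((y - theta) ** 2).sum()) / 2,
                       random_state=rng)
    if it >= 500:
        draws.append(theta)

print(np.mean(draws))   # posterior mean of theta, necessarily inside (0, 1)
```

The full HCSMN sampler replaces the truncated-normal step with an RSN draw (via the stochastic representation) and adds the η_i updates of Equation (21).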

Markov Chain Monte Carlo Sampling Scheme
When conducting a posterior inference of the HCSMN model, using the Gibbs sampling algorithm with the full conditional posterior distributions of the η_i's, θ and Λ, the following points should be noted.

note 2: According to the choice of the distribution of η_i and the mixing function κ(η_i), the HCSMN model may produce a model other than the HCN model, such as the hierarchical constrained multivariate t_ν (HCt_ν), hierarchical constrained multivariate logit, hierarchical constrained multivariate stable and hierarchical constrained multivariate exponential power models. See, e.g., [31,32], for various distributions of η_i and corresponding functions κ(η_i), which can be used to construct the HCSMN model.

note 3: When the hierarchical constrained multivariate t_ν (HCt_ν) model is considered, the hierarchy of the model in Equation (19) is modified so that η_i ∼ Gamma(ν/2, ν/2) with κ(η_i) = 1/η_i, i = 1, . . ., n. Thus, the Gibbs sampler comprises the conditional posterior Equations (21)-(23). Under the HCt_ν model, the distribution of Equation (21) reduces to:

η_i | y_i, θ, Λ ∼ Gamma(ν*/2, h/2),

where ν* = p + ν and h = ν + (y_i − θ)′Λ^{-1}(y_i − θ). To limit model complexity, we consider only fixed ν, so that we can investigate different HCt_ν models. As suggested by [32], a uniform prior on 1/ν (0 < 1/ν < 1) can be considered. However, this would bring an additional computational burden.

note 4: Except for the HCN and HCt_ν models, the Metropolis-Hastings algorithm within the Gibbs sampler is used for estimating the HCSMN models, because the conditional posterior densities in Equation (20) do not have explicit forms of known distributions as in Equations (21) and (22). See, e.g., [22], for the algorithm for sampling η_i from various mixing distributions, g_i(η_i).
A general procedure for the algorithm is as follows: Given the current values Θ = {η, θ, Λ}, we independently generate a candidate η*_i from a proposal density q(η*_i|η_i) = g_i(η*_i), as suggested by [33], which is used for a Metropolis-Hastings algorithm. Then, we accept the candidate value with the acceptance rate:

α(η_i, η*_i) = min{1, φ_p(y_i; θ, κ(η*_i)Λ)/φ_p(y_i; θ, κ(η_i)Λ)}.

note 5: As noted from Equations (8) and (9), the second and third stage priors of the HCSMN model in Equation (19) reduce to the two-stage prior π_two(θ), eliciting the stochastic multivariate interval constraint with degree of uncertainty 1 − γ_two. Instead, if the maximum entropy prior π_max(θ) or the constrained maximum entropy prior π_const(θ) is used for the HCSMN, then the respective full conditional distributions of θ of the Gibbs sampler change from Equation (22) to:

θ | y, η, Λ ∼ N_p(τ_1, Ω*) and θ | y, η, Λ ∼ N_p(τ_1, Ω*)I(θ ∈ C),

where τ_1 and Ω* are the same as given in Equation (22).
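A minimal univariate illustration of this independence Metropolis-Hastings step is sketched below (in Python rather than R; the exponential mixing density g and the choice κ(η) = η are illustrative stand-ins, with θ and Λ held fixed). Because the proposal equals the prior g, the prior terms cancel and the acceptance ratio reduces to the likelihood ratio:

```python
import numpy as np

rng = np.random.default_rng(3)

def norm_logpdf(y, mean, var):
    return -0.5 * (np.log(2.0 * np.pi * var) + (y - mean) ** 2 / var)

def mh_step(eta, y_i, theta, lam):
    """Independence M-H update of one mixing variable, proposal q = g (the prior)."""
    eta_star = rng.exponential(1.0)        # candidate from the illustrative prior g
    log_acc = (norm_logpdf(y_i, theta, eta_star * lam)
               - norm_logpdf(y_i, theta, eta * lam))
    if np.log(rng.uniform()) < min(0.0, log_acc):
        return eta_star                    # accept
    return eta                             # reject: keep the current value

eta, trace = 1.0, []
for _ in range(5000):
    eta = mh_step(eta, y_i=0.4, theta=0.0, lam=1.0)
    trace.append(eta)
print(np.mean(trace))
```

In the full sampler, one such step is performed for each η_i within every Gibbs scan, with θ and Λ refreshed from their own full conditionals.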

Bayes Estimation
For a simple example, let us consider the HCN model with known Λ. When we assume a stochastic constraint {θ; θ ∈ C} obtained from a priori information, we may use the two-stage maximum entropy prior π_two(θ) defined by the second and third stages of the HCSMN model (19) with δ ∈ (0, 1), where the value of δ is determined by using Corollary 6. This yields a Bayes estimate based on the two-stage maximum entropy prior. Corollary 8 yields the Bayes estimate, Equation (24), and the corresponding posterior covariance matrix. Here, τ_1 and τ_0 are the same as those in Equation (22), and φ(·) denotes the univariate standard normal density function. See [25,34] for the first moment of the truncated multivariate normal distribution and for a numerical calculation of the posterior covariance matrix Cov(θ*_tn), respectively. On the other hand, when we have certainty about the constraint {θ; θ ∈ C}, we may use the HCSMN model with δ = 1, which uses the constrained maximum entropy prior π_const(θ) instead of π_two(θ) in its hierarchy. This case gives the Bayes estimate, Equation (25), and its posterior covariance matrix, where θ_tn ∼ N_p(τ_1, Ω*)I(θ_tn ∈ C) and w_i denotes the i-th diagonal element of Ω*.
On the contrary, when we have no a priori information at all about the constraint in the space of θ, the HCSMN model with the maximum entropy prior π_max(θ) (equivalently, the HCSMN model with δ = 0) may be used for the posterior inference. In this model, the Bayes estimate of the location parameter is given by:

θ̂_max = τ_1. (26)

Comparing Equations (24) and (25) to Equation (26), we see that Equations (24) and (25) are the same for δ = 1, and the last term in Equation (24) vanishes when we assume that there is no a priori information about the stochastic constraint, {θ; θ ∈ C}. In this sense, the last term in Equation (24) can be interpreted as a shrinkage effect relative to the HCSMN model with δ = 0. This effect makes the Bayes estimator of θ shrink toward the stochastic constraint. In addition, we can calculate the difference, Diff, between the estimates in Equations (24) and (25). This difference vector is a function of the degree of belief γ_two (or δ ∈ (0, 1)), for Equation (25) is based on γ_const = 1 and δ = 1, and Diff = 0 for δ = 1. Thus, the difference represents a stochastic effect of the multivariate interval constraint.
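The shrinkage effect can be made concrete in a univariate special case where both extreme estimates have closed forms: under π_max the Bayes estimate is the unconstrained conjugate posterior mean τ_1, while under π_const it is the mean of the posterior truncated to C. A Python sketch with hypothetical numbers (the two-stage estimate for δ ∈ (0, 1) would lie between these two extremes):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical univariate summary data and prior settings (known variance lam).
ybar, n, lam = 1.4, 20, 1.0
theta0, omega = 0.5, 1.0
a_c, b_c = 0.0, 1.0            # constraint C; note that ybar lies outside C

# Bayes estimate under pi_max: the unconstrained conjugate posterior mean tau_1.
v = 1.0 / (n / lam + 1.0 / omega)
tau1 = v * (n * ybar / lam + theta0 / omega)

# Bayes estimate under pi_const: mean of the posterior truncated to (a_c, b_c).
alpha, beta = (a_c - tau1) / np.sqrt(v), (b_c - tau1) / np.sqrt(v)
Z = norm.cdf(beta) - norm.cdf(alpha)
theta_const = tau1 + np.sqrt(v) * (norm.pdf(alpha) - norm.pdf(beta)) / Z

print(tau1, theta_const)   # theta_const is pulled from tau1 back into (0, 1)
```

The gap tau1 − theta_const is exactly the kind of "stochastic effect" difference vector discussed above, here in one dimension.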

Numerical Illustrations
This section presents an empirical analysis of the proposed approach (using the HCSMN model) to the stochastic multivariate interval constraint on the location model. We provide numerical simulation results and a real data application comparing the proposed approach to hierarchical Bayesian approaches that use the usual priors, π_max(θ) and π_const(θ). For the numerical implementations, we developed a program written in R, which is available from the author upon request.
To fit each of the 200 synthetic datasets (Dataset I) generated from the N_4(θ, Λ) distribution, we implemented the Markov chain Monte Carlo (MCMC) posterior simulation with three different HCN models with the multivariate interval constraint C = (a, b): the HCN models that use π_two(θ), π_max(θ), and π_const(θ). We denote these models by HCN(π_two), HCN(π_max) and HCN(π_const). For each dataset, MCMC posterior sampling was based on the first 10,000 posterior samples as the burn-in, followed by a further 100,000 posterior samples with a thinning size of 10. Thus, final MCMC posterior samples of size 10,000 were obtained for each of the three HCN models. Exactly the same MCMC posterior sampling scheme was applied to each of the 200 synthetic datasets (Dataset II) from the t_4(θ, Λ, ν) distribution based on the three HCt_ν models, HCt_ν(π_two), HCt_ν(π_max) and HCt_ν(π_const).
Summary statistics of the posterior samples of the location parameters (the mean and standard deviation of the 200 posterior means of each parameter), along with the degrees of belief about the constraint C (γ_max, γ_two, and γ_const), are listed in Table 1. To save space, we omit the summary statistics regarding Λ from the table. The table indicates the following: (i) The MCMC method performs well in estimating the location parameters of all of the models considered, as justified by the estimation results of the HCN(π_max) and HCt_ν(π_max) models. Specifically, in the posterior estimation of θ, the data information tends to dominate the prior information about θ in the large-sample case (n = 200), while the latter tends to dominate the former in the small-sample case (n = 20). Furthermore, the convergence of the MCMC sampling algorithm was evident; a discussion of convergence is given in Subsection 5.2. (ii) The estimates of θ obtained from the HCN(π_two) and HCt_ν(π_two) models are uniformly closer to the stochastic constraint θ ∈ C than those from the HCN(π_max) and HCt_ν(π_max) models. This confirms that π_two(θ) induces an obvious shrinkage effect in the Bayesian estimation of a location parameter with a stochastic multivariate interval constraint. (iii) Comparing the estimates of θ obtained from the HCN(π_two) (or HCt_ν(π_two)) model to those from the HCN(π_const) (or HCt_ν(π_const)) model, we see that the difference between their vector values is significant. Thus, we can expect an apparent stochastic effect when π_two(θ) is used in the Bayesian estimation of a location parameter with a stochastic multivariate interval constraint.

Car Body Assembly Data Example
Johnson and Wichern consider car body assembly data (accessible through www.prenhall.com/statistics, [35]) obtained from a study of a sheet metal assembly process. A major automobile manufacturer uses sensors that record the deviation from the nominal thickness (millimeters × 10^−1) at a specific location on a car, measured at two stages: the deviation of the car body at the final stage of assembly (Y_1) and that at an early stage of assembly (Y_2). The data consist of 50 pairs of observations of (Y_1, Y_2), with summary statistics as listed in Table 2. The tests given by ([36], p. 148), using measures of multivariate skewness and kurtosis, accept the bivariate normality of the joint distribution of Y = (Y_1, Y_2)'. The respective skewness and kurtosis are b_1p = 0.074 and b_2p = 7.337, which give respective p-values of 0.954 (chi-square test for the skewness) and 0.721 (normal test for the kurtosis), indicating that the observation model for the dataset is bivariate normal. In practical situations, we may have information about the mean vector of the observation model (i.e., the mean deviation from the nominal thickness) from a past study of the sheet metal assembly process or a quality control report of the automobile manufacturer. Suppose that the information about the centroid of the mean deviation vector, θ = (θ_1, θ_2)', is (−1, 4)' with Cov(θ) = diag{1, 4}. Furthermore, there is uncertain information that θ ∈ (a, b), where a = (−1.5, 3)' and b = (−0.5, 5)'. This paper has proposed the two-stage maximum entropy prior π_two(θ) to represent all of this information, which is not possible with the other priors, π_max(θ) and π_const(θ).
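The multivariate skewness and kurtosis tests can be sketched as follows (a standard form of Mardia's statistics applied to simulated data; the version cited from [36] may use different small-sample corrections, so this sketch is not guaranteed to reproduce the reported values b_1p = 0.074 and b_2p = 7.337):

```python
import numpy as np
from scipy import stats

def mardia(Y):
    """Mardia-type multivariate skewness (b_1p) and kurtosis (b_2p)
    statistics with their asymptotic chi-square and normal tests."""
    n, p = Y.shape
    Z = Y - Y.mean(axis=0)
    S = Z.T @ Z / n                    # MLE covariance estimate
    G = Z @ np.linalg.inv(S) @ Z.T     # matrix of Mahalanobis cross-products
    b1p = (G ** 3).sum() / n ** 2
    b2p = (np.diag(G) ** 2).sum() / n
    # Skewness: n * b1p / 6 ~ chi^2 with p(p+1)(p+2)/6 df under normality.
    df = p * (p + 1) * (p + 2) // 6
    p_skew = stats.chi2.sf(n * b1p / 6.0, df)
    # Kurtosis: asymptotically normal with mean p(p+2), var 8p(p+2)/n.
    z = (b2p - p * (p + 2)) / np.sqrt(8.0 * p * (p + 2) / n)
    p_kurt = 2.0 * stats.norm.sf(abs(z))
    return b1p, b2p, p_skew, p_kurt

# Illustrative bivariate normal sample of the same size (n = 50, p = 2).
rng = np.random.default_rng(1)
Y = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=50)
print(mardia(Y))
```

Large p-values, as in the paper's data, are consistent with bivariate normality.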
Using the three hierarchical models (the HCN(π_max), HCN(π_two), and HCN(π_const) models), we obtained 10,000 posterior samples from the MCMC sampling scheme based on each model, with a thinning interval of 10 after a burn-in period of 10,000 samples. In estimating the Monte Carlo (MC) error, we used the batch means method with 50 batches; see, e.g., [37] (pp. 39-40). As a formal test for the convergence of the MCMC algorithm, we applied the Heidelberger-Welch diagnostic test of [38] to single-chain MCMC runs and calculated the p-values of the test. For the posterior simulation, we used the following choice of hyper-parameter values: θ_0 = (−1, 4)', Σ = Ω_1 + Ω_2 = 10I_2, Ω_1 = δΣ, Ω_2 = (1 − δ)Σ, δ ∈ (0, 1), D = 10^−2 I_2, and d = 10^2 + 2p + 1. The posterior estimation and convergence test results are shown in Table 3. Note that Columns 7-9 of the table list the values obtained from implementing the MCMC sampling for the posterior estimation of HCN(π_two). The small MC error values listed in Table 3 support the convergence of the MCMC algorithm. Furthermore, the p-values of the Heidelberger-Welch test for the stationarity of the single MCMC run are larger than 0.1. Thus, both diagnostic checking methods support the convergence of the proposed MCMC sampling scheme. Similar to Table 1, this table also shows that π_two(θ) induces shrinkage and stochastic effects in the Bayesian estimation of θ with the uncertain multivariate interval constraint: (i) comparing the posterior estimates obtained from HCN(π_two) with those from HCN(π_max), we see that the estimates of θ_1 and θ_2 obtained from HCN(π_two) shrink toward the stochastic interval C.
The magnitude of the shrinkage effect induced by the proposed prior π_two(θ) becomes more evident as the degree of belief in the interval constraint, γ_two (or δ), gets larger. (ii) On the other hand, we can see the stochastic effect of the prior π_two(θ) by comparing the posterior estimate of θ obtained from HCN(π_two) with that from HCN(π_const). The stochastic effect can be measured by the difference between the estimates, and we see that the difference becomes smaller as γ_two (or δ) gets larger.
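The batch means estimate of the MC error used for Table 3 can be sketched as follows (a minimal Python sketch assuming equal-size batches, with 50 batches as in the paper):

```python
import numpy as np

def batch_means_mcse(draws, n_batches=50):
    """Monte Carlo standard error of the posterior mean via the
    batch means method: split the chain into `n_batches` consecutive
    batches and use the standard error of the batch means."""
    draws = np.asarray(draws)
    n = len(draws) // n_batches * n_batches   # drop any remainder
    batch_means = draws[:n].reshape(n_batches, -1).mean(axis=1)
    return batch_means.std(ddof=1) / np.sqrt(n_batches)

# Stand-in for a (well-mixing) MCMC chain of length 10,000.
rng = np.random.default_rng(2)
chain = rng.normal(size=10_000)
print(batch_means_mcse(chain))  # small MC error, near 1/sqrt(10000)
```

For correlated MCMC output, the batch means inflate appropriately relative to the naive iid standard error, which is why a small value supports convergence.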

Conclusions
In this paper, we have proposed a two-stage maximum entropy prior π_two(θ) for the location parameter of a scale mixture of normal model. The prior is derived by using the two stages of a prior hierarchy advocated by [5] to elicit a stochastic multivariate interval constraint, {θ; θ ∈ C}. With regard to eliciting the stochastic constraint, the two-stage maximum entropy prior has the following properties: (i) Theorem 1 and Corollary 4 indicate that the two-stage prior is flexible enough to elicit all degrees of belief in the stochastic constraint; (ii) Corollary 4 confirms that the entropy of the two-stage prior is commensurate with the uncertainty about the constraint {θ; θ ∈ C}; (iii) as given in Corollary 6, the preceding two properties enable us to propose an objective way of eliciting the uncertain prior information by using π_two(θ). From the inferential viewpoint: (i) the two-stage prior for the normal mean vector has the conjugate property that the prior and posterior distributions belong to the same family of RSN distributions of [23]; (ii) the conjugate property enables us to construct an analytically simple Gibbs sampler for the posterior inference of the model (1) with unknown covariance matrix Λ; (iii) this paper also provides the HCSMN model, which is flexible enough to elicit all types of stochastic constraints and scale mixtures for Bayesian inference of the model (1). Based on the HCSMN model, the full conditional posterior distributions of the unknown parameters were derived, and the calculation of posterior summaries was discussed by using the Gibbs sampler and two numerical applications.
The methodological results of the Bayesian estimation procedure proposed in this paper can be extended to other multivariate models that incorporate functional means, such as linear and nonlinear regression models. For example, the seemingly unrelated regression (SUR) model and the factor analysis model (see, e.g., [24]) can be accommodated within the same framework as the proposed HCSMN in Equation (1). We hope to address these issues in the near future.

Figure 1. Graphs of the differences between p_1 = γ_max, p_2 = γ_two, and p_3 = γ_const: (a), (c), and (e) show the difference between p_3 and p_2; (b), (d), and (f) show the difference between p_2 and p_1.

Figure 1 compares the degrees of belief about the uncertain multivariate interval constraint {θ; θ ∈ C} accounted for by the three priors of θ. The figure is obtained in terms of δ ∈ [0, 1], with Ω_1 = δΣ and Ω_2 = (1 − δ)Σ, p = 3, C = (a, 2·1_p + a), and θ_0 = 0, where a = (−0.1 × p)1_p, Σ = σ²(1 − ρ)I_p + σ²ρ1_p1_p' is an intra-class covariance matrix, and 1_p denotes a p × 1 summing vector whose every element is unity. When the constraint is changed to C = (−(2·1_p + a), −a) in this comparison, one can easily check that the degrees of belief do not change, giving the same results seen in Figure 1. The figure depicts exactly the inequality relationship given in Theorem 1. Comparing γ_two with γ_const = 1, we see that the degree of belief in the uncertain constraint accounted for by π_two(θ) becomes large as Ω_2 → O (or, equivalently, Ω_1 → Σ); this tendency is more evident for small σ² and large ρ values. In addition, the difference between γ_two and γ_max in the right panels suggests that the difference becomes large as Ω_2 tends to O: for fixed values of δ and ρ, the difference increases as σ² decreases, while, for fixed values of δ and σ², it decreases as ρ increases. Therefore, the figure confirms that the two-stage maximum entropy prior π_two(θ) accounts for the a priori uncertain constraint {θ; θ ∈ C} with a degree of belief γ_two ∈ [γ_max, 1]. The figure also shows that the magnitude of γ_two depends on both the first-stage covariance Ω_2 and the second-stage covariance Ω_1 in the two stages of the prior hierarchy in Equations (8) and (9). All other choices of the values of p, ρ, and C satisfying the condition in Theorem 1 produced graphics similar to those depicted in Figure 1, except for the magnitude of the differences among the degrees of belief.
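The degrees of belief compared in Figure 1 are rectangle probabilities of the form Pr(θ ∈ C) under a multivariate normal prior, so the monotonicity in Ω_2 can be checked by simple Monte Carlo (a Python sketch using the Figure 1 settings with illustrative σ² and ρ values; shrinking the covariance toward O plays the role of Ω_2 → O):

```python
import numpy as np

rng = np.random.default_rng(3)

def gamma_belief(theta0, Sigma, a, b, n_mc=200_000):
    """Monte Carlo estimate of gamma = Pr(theta in C) for
    theta ~ N_p(theta0, Sigma) and the rectangle C = (a, b)."""
    draws = rng.multivariate_normal(theta0, Sigma, size=n_mc)
    return np.mean(np.all((draws > a) & (draws < b), axis=1))

# Figure 1 settings: p = 3, intra-class covariance, C = (a, 2*1_p + a).
p, sigma2, rho = 3, 1.0, 0.3
Sigma = sigma2 * ((1.0 - rho) * np.eye(p) + rho * np.ones((p, p)))
a = -0.1 * p * np.ones(p)
b = a + 2.0
theta0 = np.zeros(p)

g_full = gamma_belief(theta0, Sigma, a, b)          # belief under N(0, Sigma)
g_small = gamma_belief(theta0, 0.25 * Sigma, a, b)  # tighter covariance
print(g_full, g_small)  # belief increases as the covariance shrinks
```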

Note 1:
Setting the mixing variable η_i so that κ(η_i) = 1, i.e., the HCN (hierarchical constrained normal) model with ε_i iid ∼ N_p(0, Λ), i = 1, . . ., n, the Gibbs sampler consists of two conditional distributions, [θ | Λ, Data] and [Λ | θ, Data]. To sample from the first full conditional posterior distribution, we can utilize the stochastic representations of the RSN distribution in Corollary 8. The R packages tmvtnorm and mvtnorm can be used to sample from the RSN distribution in Equation (22).
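Sampling from a rectangle-truncated multivariate normal, as done by tmvtnorm::rtmvnorm in R, can be sketched in Python by naive rejection (an analogue for illustration only; it is efficient only when the rectangle carries non-negligible probability, and Gibbs-type samplers are preferred otherwise):

```python
import numpy as np

rng = np.random.default_rng(4)

def rtmvnorm_reject(mean, cov, lower, upper, n):
    """Rejection sampler for N_p(mean, cov) truncated to the
    rectangle (lower, upper): draw candidates in blocks and keep
    those falling inside the rectangle until n draws are collected."""
    out = []
    while len(out) < n:
        cand = rng.multivariate_normal(mean, cov, size=4 * n)
        keep = np.all((cand > lower) & (cand < upper), axis=1)
        out.extend(cand[keep])
    return np.array(out[:n])

# Illustrative bivariate case with a moderate-probability rectangle.
mean = np.zeros(2)
cov = np.array([[1.0, 0.5], [0.5, 1.0]])
lower, upper = np.array([-1.0, -1.0]), np.array([1.0, 1.0])
draws = rtmvnorm_reject(mean, cov, lower, upper, 1000)
print(draws.shape)  # (1000, 2), all draws inside the rectangle
```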

Table 2.
The Shapiro-Wilk (S-W) test is also implemented to check the marginal normality of each Y_i, i = 1, 2. The test statistic values and corresponding p-values of the S-W test are listed in the table.

Table 2. Summary statistics for the car body assembly data. S-W, Shapiro-Wilk.

Table 3. The posterior estimates and the convergence test results.