A Two-Stage Maximum Entropy Prior of Location Parameter with a Stochastic Multivariate Interval Constraint and Its Properties

Hea-Jung Kim

doi:10.3390/e18050188

Department of Statistics, Dongguk University-Seoul, Seoul 100-715, Korea

Entropy2016, 18(5), 188;https://doi.org/10.3390/e18050188

This article belongs to the Section Information Theory, Probability and Statistics

Version Notes

Order Reprints

Abstract

This paper proposes a two-stage maximum entropy prior to elicit uncertainty regarding a multivariate interval constraint of the location parameter of a scale mixture of normal model. Using Shannon’s entropy, this study demonstrates how the prior, obtained by using two stages of a prior hierarchy, appropriately accounts for the information regarding the stochastic constraint and suggests an objective measure of the degree of belief in the stochastic constraint. The study also verifies that the proposed prior plays the role of bridging the gap between the canonical maximum entropy prior of the parameter with no interval constraint and that with a certain multivariate interval constraint. It is shown that the two-stage maximum entropy prior belongs to the family of rectangle screened normal distributions that is conjugate for samples from a normal distribution. Some properties of the prior density, useful for developing a Bayesian inference of the parameter with the stochastic constraint, are provided. We also propose a hierarchical constrained scale mixture of normal model (HCSMN), which uses the prior density to estimate the constrained location parameter of a scale mixture of normal model and demonstrates the scope of its applicability.

Keywords:

hierarchical constrained scale mixture of normal model; rectangle-screened normal distribution; two-stage maximum entropy prior; uncertain constraint; 62H30; 62F15

MSC:

62H30; 62F15

1. Introduction

Suppose

y_{i}

’s are independent observations from a scale mixture of a p-variate normal distribution with the

p \times 1

location parameter

θ

and known scale matrix. Then, a simple location model for the p-variate observations with

y_{i} \in R^{p}

is:

y_{i} = θ + ϵ_{i}, i = 1, \dots, n,

(1)

where the distribution of the

p \times 1

vector variable

ϵ_{i}

is

F \in F

with:

\begin{matrix} F = \{F : N_{p} (0, κ (η) Λ), η \sim G (η) with κ (η) > 0, and η > 0\}, \end{matrix}

(2)

where η is a mixing variable with the cdf

G (η)

and

κ (η)

is a suitably-chosen weight function.

Bayesian analysis of the model (1) begins with the specification of a prior distribution, which represents the information about the uncertain parameter

θ

that is combined with the joint probability distribution of

y_{i}

’s to yield the posterior distribution. When there are no constraints on the location parameter, then usual priors (e.g., Jeffreys invariant prior or an informative normal conjugate prior) can be used, and posterior inference can be performed without any difficulty. In some practical situations, however, we may have prior information that

θ

have a multivariate interval constraint, and thus, the value of

θ

needs to be located in a restricted space

C \subset R^{p},

where

C = (a, b)

is a p-variate interval with

a = {(a_{1}, \dots, a_{p})}^{⊤}

and

b = {(b_{1}, \dots, b_{p})}^{⊤} .

For the remainder of this paper, we use

θ \in C

to denote the multivariate interval constraint:

\{θ; a_{i} \leq θ_{i} \leq b_{i}, i = 1, \dots, p\}, where θ = {(θ_{1}, \dots, θ_{p})}^{⊤} .

(3)

When we have sufficient evidence that the constraint condition on the model (1) is true, then a suitable restriction on the parameter space, such as using a truncated prior distribution, is expected. See, e.g., [1,2,3,4], for various applications of the truncated prior distribution in Bayesian inference. However, it is often the case that prior information about the constraint is not certain for Bayesian inference. Further, even the observations from the assumed model (1) often do not provide strong evidence that the constraint is true and, therefore, may appear to contradict the assumption of the model associated with the constraint. In this case, it is expected that the uncertainty about the constraint is taken into account in eliciting a prior distribution of

θ .

When the parameter constraint is not certain for Bayesian estimation in the univariate normal location model, the seminal work by [5] proposed the use of a two-stage hierarchical prior distribution by constructing a family of skew densities based on the positively-truncated normal prior distribution. Generalizing the framework of the prior hierarchy proposed by [5,6,7,8,9,10], among others, various priors were considered for the Bayesian estimation of normal and scale mixture of normal models with uncertain interval constraints. In particular, [7] obtained the prior of

θ

as the normal selection distribution (see, e.g., [11]) and, thus, exploited the class of weighted normal distribution by [12] for reflecting the uncertain prior belief on

θ .

On the other hand, there are situations to set up a prior density of

θ

on the basis of information regarding the moments of the density, such as the mean and covariance matrix. A useful method of dealing with this situation is through the concept of entropy by [13,14]. Other general references where moment inequality constraints have been considered include [15,16]. To the best of our knowledge, however, a formal method to set up a prior density of

θ,

consistent with information regarding the moments of the density, as well as the uncertain prior belief on the location parameter, has not previously been investigated in the literature. Thus, such practical considerations motivate us to develop a prior density of

θ,

which is tackled in this paper.

As discussed by [17,18,19,20], the entropy has a direct relationship to information theory and measures the amount of uncertainty inherent in the probability distribution. Using this property of the entropy, we propose a two-stage hierarchical method for setting up the two-stage maximum entropy prior density of

θ .

The method will enable us to elicit information regarding the moments of the prior distribution, as well as the degree of belief in the constraint

θ \in C .

Furthermore, this paper also suggests an objective method to measure the degree of belief regarding the multivariate interval constraint accounted for by using the prior. We also propose a simple way of controlling the degree of belief regarding the constraint of

θ

in Bayesian inference. This is done by investigating the relation between the degree of belief and the enrichment of the hyper-parameters of the prior density. In this respect, the study concerning the two-stage maximum entropy prior is interesting both from a theoretical and an applied point of view. On the theoretical side, it develops yet another conjugate prior of constrained

θ

based on the maximum entropy approach. The study provides several properties of the proposed prior, which advocate the idea of two stages of a prior hierarchy to elicit information regarding the moments of the prior and the stochastic constraint of

θ .

From the applied view point, the prior is especially useful for a Bayesian subjective methodology for inequality constrained multivariate linear models.

The remainder of this paper is arranged as follows. In Section 2, we propose the two-stage maximum entropy prior of

θ

by applying Boltzmann’s maximum entropy theorem (see, e.g., [21,22]) to the frame of the two-stage prior hierarchy by [5]. We also suggest an objective measure of uncertainty regarding the stochastic constraint of

θ

that is accounted for by the two-stage maximum entropy prior. In Section 3, we briefly discuss the properties of the proposed prior of

θ,

which will be useful for the Bayesian analysis of

θ

subject to uncertainty regarding the multivariate interval constraint

θ \in C

. Section 4 provides a hierarchical scale mixture of normal model of Equation (1) using the two-stage prior, referred to as the hierarchical constrained scale mixture of normal model (HCSMN). Section 4 explores the Bayesian estimation model (1) by deriving the posterior distributions of the unknown parameters under the HCSMN and discusses the properties of the proposed measure of uncertainty that can be explained in the context of the HCSMN. In Section 5, we compare the empirical performance of the proposed prior based on synthetic data and real data applications with the HCSMN models for the estimation of

θ

with a stochastic multivariate interval constraint. Finally, the concluding remarks along with a discussion are provided in Section 6.

2. Two-Stage Maximum Entropy Prior

2.1. Maximum Entropy Prior

Sometimes, we have a situation in which partial prior information is available, outside of which it is desirable to use a prior that is as non-informative as possible. Assume that we can specify the partial information concerning

θ

in Equation (1) with continuous space

Θ .

That is:

E [t_{j} (θ)] = \int_{Θ}^{} t_{j} (θ) π (θ) d θ = t_{j}, j = 1, \dots, k .

(4)

The maximum entropy prior can be obtained by choosing

π (θ)

that maximizes the entropy:

ξ (π) = - \int_{Θ}^{} π (θ) log π (θ) d θ,

in the presence of the partial information in the form of Equation (4). A straightforward application of the calculus of variation leads us to the following theorem.

Lemma 1.

(Boltzmann’s maximum entropy theorem): The density

π (θ)

that maximizes

ξ (π)

, subject to the constraints

E [t_{j} (θ)] = t_{j},

j = 1, \dots, k,

takes the k-parameter exponential family form:

π_{m a x} (θ) \propto exp \{λ_{1} t_{1} (θ) + λ_{2} t_{2} (θ) \dots + λ_{k} t_{k} (θ)\}, θ \in Θ,

(5)

where

λ_{1}, λ_{2}, \dots, λ_{k}

can be determined, via the k-constraints, in terms of

t_{1},

…,

t_{k} .

Proof.

See, [22] for the proof. ☐

When the partial information is about the mean and covariance matrix of

θ,

outside of which it is desired to use a prior that is as non-informative as possible, then the theorem yields the following result.

Corollary 1.

As partial prior information, let the parameter

θ = {(θ_{1}, \dots, θ_{p})}^{⊤}

have a probability distribution on

R^{p}

with mean vector

θ_{0} = {(θ_{01}, \dots, θ_{0 p})}^{⊤}

and covariance matrix

Σ,

then the maximum entropy prior of

θ

is:

π_{m a x} (θ) = {(2 π)}^{- p / 2} {| Σ |}^{- 1 / 2} exp \{- \frac{1}{2} {(θ - θ_{0})}^{⊤} Σ^{- 1} (θ - θ_{0})\}, θ \in R^{p},

(6)

a density of the

N_{p} (θ_{0}, Σ)

distribution.

Proof.

According to Lemma 1, the partial information gives

t_{j} (θ) = θ_{j}

and

t_{j} = θ_{0 j}

for

j = 1, \dots, p,

t_{p + 1} (θ) = tr [Σ^{- 1} (θ - θ_{0}) {(θ - θ_{0})}^{⊤}]

and

t_{p + 1} = p .

\int_{R^{p}} π_{m a x} (θ) d θ = 1

requires

λ_{1} = \dots = λ_{p} = 0

and

λ_{p + 1} < 0 .

Thus, the density

π_{m a x} (θ)

is proportional to

exp \{- λ_{p + 1} tr [Σ^{- 1} (θ - θ_{0}) {(θ - θ_{0})}^{⊤}]\} .

Setting

λ_{p + 1} = - t_{p + 1} / 2 p

and obtaining the normalizing constant, then we see that the maximum entropy prior of the parameter in Equation (1) is Equation (6). ☐

In practical situations, we sometimes have partial information about a multivariate interval constraint (i.e.,

θ \in C

) in addition to the first two moments as given in Corollary 1.

Corollary 2.

Assume that the prior distribution of

θ = {(θ_{1}, \dots, θ_{p})}^{⊤}

has the mean vector

θ_{0} = {(θ_{01}, \dots, θ_{0 p})}^{⊤}

and covariance matrix

Σ .

Further assume, a priori, that the space of

θ

is constrained to a multivariate interval,

{θ; θ \in C}

given in Equation (3). Then, a constrained maximum entropy prior of

θ

is given by:

π_{c o n s t} (θ) = \frac{{(2 π)}^{- p / 2} {| Σ |}^{- 1 / 2} exp \{- \frac{1}{2} {(θ - θ_{0})}^{⊤} Σ^{- 1} (θ - θ_{0})\}}{P r (θ \in C)}, θ \in C,

(7)

a density of the

N_{p} (θ_{0}, Σ) I (θ \in C)

distribution, which is a p-dimensional truncated

N_{p} (θ_{0}, Σ)

distribution with the space

C .

Proof.

The certain multivariate interval constraint,

P r (θ \in C) = 1,

can be expressed in terms of moment,

E [I (θ \in C)] = 1 .

Upon applying Lemma 1 with

t_{j} (θ) = θ_{j}

and

t_{j} = θ_{0 j}

for

j = 1, \dots, p,

t_{p + 1} (θ) = tr [Σ^{- 1} (θ - θ_{0}) {(θ - θ_{0})}^{⊤}]

and

t_{p + 1} = p,

t_{p + 2} (θ) = I (θ \in C)

and

t_{p + 2} = 1,

and

\int_{C} π_{c o n s t} (θ) d θ = 1,

we see that

λ_{1} = \dots = λ_{p} = 0,

λ_{p + 1} < 0,

λ_{p + 2} = 1,

and

π_{c o n s t} (θ) \propto exp \{- λ_{p + 1} tr [Σ^{- 1} (θ - θ_{0}) {(θ - θ_{0})}^{⊤}]\} .

Setting

λ_{p + 1} = - 1 / 2

, and obtaining the normalizing constant, we obtain Equation (7). ☐

2.2. Two-Stage Maximum Entropy Prior

This subsection considers the case where the maximum entropy prior of

θ

has stochastic constraint in the form of a multivariate interval, i.e., Pr

(θ \in C) = γ,

where

C

is defined by Equation (3) and

γ \in [γ_{m a x}, 1] .

Here,

γ_{m a x}

is Pr

(θ \in C)

calculated by using the maximum entropy prior distribution in Equation (6). We develop a two-stage prior of

θ

, denoted by

π_{t w o} (θ)

, which has a different formula according to the degree of belief,

γ,

regarding the constraint.

Suppose we have only partial information about the covariance matrix,

Ω_{2},

of the parameter

θ,

in the first stage of a prior elicitation. Then, for a given mean vector

μ_{0}

, we may construct the maximum entropy prior, Equation (6), so that the first stage maximum entropy prior will be

π_{m a x} (θ | μ_{0})

, which is the density of the

N_{p} (μ_{0}, Ω_{2})

distribution. In addition to the information, suppose we have collected prior information about the unknown

μ_{0}

, which gives a value of the mean vector

θ_{0}

and covariance matrix

Ω_{1}

, as well as a stochastic (or certain) constraint, indicating Pr

(μ_{0} \in C) = 1 .

Then, in the second stage of the prior elicitation, one can elicit the additional prior partial information by using the constrained maximum entropy prior in Equation (7).

Analogous to the work of [5], we can specify all of the partial information about

θ

by following two stages of the maximum entropy prior hierarchy over

θ \in R^{p} :

\begin{matrix} π_{m a x} (θ | μ_{0}) & = & ϕ_{p} (μ_{0}, Ω_{2}), \end{matrix}

(8)

\begin{matrix} π_{c o n s t} (μ_{0}) & = & ϕ_{p} (θ_{0}, Ω_{1}) I (μ_{0} \in C), \end{matrix}

(9)

where

ϕ_{p} (θ_{0}, Ω_{1}) I (μ_{0} \in C)

is a truncated normal density, i.e., the density of the

N_{p} (θ_{0}, Ω_{1}) I (μ_{0} \in C)

variate, and

Ω_{1} + Ω_{2} = Σ .

Thus, the two stages of prior hierarchy are as follows. In the first stage, given

μ_{0},

θ

has a maximum entropy prior that is the

N_{p} (μ_{0}, Ω_{2})

distribution as in Equation (6). In the second stage,

μ_{0}

has a distribution obtained by truncating the maximum entropy prior distribution to elicit uncertainty about the prior information that

θ \in C .

It may be sensible to assume that the value of

θ_{0}

is located in the multivariate interval

C

or in the centroid of the interval.

Definition 1.

The marginal prior density of

θ

, obtained from the two stages of the maximum entropy prior hierarchy Equations (8) and (9), is called as a two-stage maximum entropy prior of

θ

.

Since

Ω_{1} + Ω_{2} = Σ,

if the constraint is completely certain (i.e.,

γ = 1

), we may set

Ω_{2} \to O

to get the

π_{c o n s t} (θ)

from the two stages of maximum entropy prior, while the two-stage prior yields

π_{m a x} (θ)

with

γ = γ_{m a x}

for the case where

Ω_{1} = O .

Thus, the hyper-parameters

Ω_{1}

and

Ω_{2}

may need to be assessed to achieve the degree of belief γ about the stochastic constraint. When

Ω_{1} \neq O

and

Ω_{2} \neq O,

the above hierarchy of priors yields the following marginal prior of

θ .

Lemma 2.

The two stages of the prior hierarchy of Equations (8) and (9) yield the two-stage maximum entropy prior distribution of

θ

given by:

\begin{matrix} π_{t w o} (θ) & = & \frac{ϕ_{p} (θ; θ_{0}, Σ) {\bar{Φ}}_{p} (C; μ, Q)}{{\bar{Φ}}_{p} (C; θ_{0}, Ω_{1})}, θ \in R^{p} \end{matrix}

(10)

where

ϕ_{p} (x; c, A)

denotes the pdf of

X \sim N_{p} (c, A)

and

{\bar{Φ}}_{p} (C; c, A)

denotes a p-dimensional rectangle probability of the distribution of

X

, i.e.,

P (X \in C),

μ = θ_{0} + Ω_{1} Σ^{- 1} (θ - θ_{0}),

Σ = Ω_{1} + Ω_{2}

and

Q = {(Ω_{1}^{- 1} + Ω_{2}^{- 1})}^{- 1} .

Proof.

\begin{matrix} π_{t w o} (θ) & = & \frac{\int_{μ_{0} \in C} ϕ_{p} (θ; μ_{0}, Ω_{2}) ϕ_{p} (μ_{0}; θ_{0}, Ω_{1}) d μ_{0}}{P r (μ_{0} \in C)}, \\ = & \frac{ϕ_{p} (θ; θ_{0}, Σ) \int_{μ_{0} \in C} ϕ_{p} (μ_{0}; μ, Q) d μ_{0}}{{\bar{Φ}}_{p} (C; θ_{0}, Ω_{1})}, \end{matrix}

because

μ = θ_{0} + Ω_{1} Σ^{- 1} (θ - θ_{0}) = θ + Ω_{2} Σ^{- 1} (θ_{0} - θ)

and

θ^{⊤} Ω_{2}^{- 1} θ + θ_{0}^{⊤} Ω_{1}^{- 1} θ_{0} - μ^{⊤} (Ω_{2}^{- 1} θ + Ω_{1}^{- 1} θ_{0}) = {(θ - θ_{0})}^{⊤} Σ^{- 1} (θ - θ_{0}) .

☐

In fact, the density

π_{t w o} (θ)

belongs to the family of rectangle screened multivariate normal (

R S N

) distributions studied by [23].

Corollary 3.

The distribution law of

θ

with the density in Equation (10) is:

\begin{matrix} θ & \overset{d}{=} & [X_{2} | X_{1} \in C] \sim R S N_{p} (C; τ, Ψ), \end{matrix}

(11)

which is a p-dimensional

R S N

distribution with respective location and scale parameters

τ

and Ψ and the rectangle screening space

C .

Here, the joint distribution of

X_{1}

and

X_{2}

is

N_{2 p} (τ, Ψ),

where

τ = {(θ_{0}^{⊤}, θ_{0}^{⊤})}^{⊤}

and

Ψ = (\begin{matrix} Ω_{1} & Ω_{1} \\ Ω_{1} & Σ \end{matrix})

.

Proof.

The density of

[X_{2} | X_{1} \in C]

is:

\begin{matrix} π_{t w o} (x_{2}) & = & \frac{ϕ_{p} (x_{2}; θ_{0}, Σ) \int_{C}^{} ϕ_{p} (x_{1}; μ_{x_{1} | x_{2}}, Ω_{x_{1} | x_{2}}) d x_{1}}{\int_{C}^{} ϕ_{p} (x_{1}; θ_{0}, Ω_{1}) d x_{1}} \\ = & \frac{ϕ_{p} (x_{2}; θ_{0}, Σ) {\bar{Φ}}_{p} (C; μ_{x_{1} | x_{2}}, Ω_{x_{1} | x_{2}})}{{\bar{Φ}}_{p} (C; θ_{0}, Ω_{1})} \end{matrix}

where

μ_{x_{1} | x_{2}} = θ_{0} + Ω_{1} Σ^{- 1} (x_{2} - θ_{0})

and

Ω_{x_{1} | x_{2}} = Ω_{1} - Ω_{1} Σ^{- 1} Ω_{1} .

By use of the binomial inverse theorem (see, e.g., [24] p. 23), one can easily see that

μ_{x_{1} | x_{2}}

and

Ω_{x_{1} | x_{2}}

are respectively equivalent to

μ

and

Q,

in Equation (10), provided that

x_{2}

is changed to

θ .

☐

According to [23], we see that the stochastic representation for the

R S N

vector

θ \sim R S N_{p} (C; τ, Ψ)

is:

\begin{matrix} θ & \overset{d}{=} & θ_{0} + Y_{1}^{(α, β)} + Y_{2}, \end{matrix}

(12)

where

Y_{1} \sim N_{p} (0, Ω_{1})

and

Y_{2} \sim N_{p} (0, Ω_{2})

are independent random vectors. Here,

Y_{1}^{(α, β)}

denotes a doubly-truncated multivariate normal random vector whose distribution is defined by

Y_{1}^{(α, β)} \overset{d}{=} [Y_{1} | Y_{1} \in (α, β)]

with

α = a - θ_{0}

and

β = b - θ_{0} .

This representation enables us to implement a one-for-one method for generating a random vector with the

R S N_{p} (C, τ, Ψ)

distribution. For generating the doubly-truncated multivariate normal vector

Y_{1}^{(α, β)},

the R package tmvtnorm by [25] can be used, where R is a computer language and an environment for statistical computing and graphics.

2.3. Entropy of a Maximum Entropy Prior

Suppose we have partial a priori information that we can specify values for the covariance matrices

Ω_{1}

and

Ω_{2},

where

Σ = Ω_{1} + Ω_{2} .

2.3.1. Case 1: Two-stage Maximum Entropy Prior

When the two-stage maximum entropy prior

π_{t w o} (θ)

is assumed for the prior distribution of

θ,

its entropy is given by:

\begin{matrix} E n t (π_{t w o} (θ)) & = & - \int_{R^{p}} π_{t w o} (θ) log π_{t w o} (θ) d θ \\ = & \frac{p}{2} log (2 π) + log {\bar{Φ}}_{p} (C; θ_{0}, Ω_{1}) + \frac{1}{2} tr [Σ^{- 1} E_{t w o} [(θ - θ_{0}) {(θ - θ_{0})}^{⊤}]] \\ + & \frac{1}{2} log | Σ | - E_{t w o} [log h (θ)], \end{matrix}

(13)

where

Σ = Ω_{1} + Ω_{2},

h (θ) = {\bar{Φ}}_{p} (C; θ_{0} + Ω_{1}^{⊤} Σ^{- 1} (θ - θ_{0}), Ω_{1} - Ω_{1}^{⊤} Σ^{- 1} Ω_{1}),

and the

E_{t w o}

denotes the expectation with respect to the

R S N

distribution with the density

π_{t w o} (θ) .

Equation (12) shows that

E [θ] = θ_{0} + ξ,

and

C o v (θ) = Ω_{2} + H .

Here,

ξ = {(ξ_{1}, \dots, ξ_{p})}^{⊤}

and

H = {h_{i j}},

i, j = 1, \dots, p,

are the mean vector and covariance matrix of the doubly-truncated multivariate normal random vector,

Y_{1} \sim N_{p} (0, Ω_{1}) I (y_{1} \in (α, β))

. Readers are referred to [25] with the R package tmvtnorm and [26] with the R package mvtnorm for implementing the respective calculations of doubly-truncated moments and integrations. As seen in Equation (13), an analytic calculation of

E [log h (θ)]

involves a complicated integration. Instead, by using a Monte Carlo integration, we may calculate it approximately. According to Equation (12), it follows that the stochastic representation of the prior distribution

θ \sim R S N_{p} (C; τ, Ψ)

with density

π_{t w o} (θ)

is useful for generating

θ

’s from the prior distribution

θ

by using the R packages mvtnorm and tmvtnorm and, hence, implementing the Monte Carlo integration.

2.3.2. Case 2: Constrained Maximum Entropy Prior

When the constrained maximum entropy prior

π_{c o n s t} (θ)

in Equation (7) is assumed for the prior distribution of

θ,

its entropy is given by:

\begin{matrix} E n t (π_{c o n s t} (θ)) & = & - \int_{C} π_{c o n s t} (θ) log π_{c o n s t} (θ) d θ \\ = & \frac{p}{2} log (2 π) + log {\bar{Φ}}_{p} (C; θ_{0}, Σ) + \frac{1}{2} log | Σ | \\ + & \frac{1}{2} tr (Σ^{- 1} E_{c o n s t} [(θ - θ_{0}) {(θ - θ_{0})}^{⊤}]) . \end{matrix}

The

E_{c o n s t}

denotes the expectation with respect to the doubly-truncated multivariate normal distribution with the density

π_{c o n s t} (θ)

, and its analytic calculation is not possible. Instead, the R packages tmvtnorm and mvtnorm are available for calculating the respective moment and integration in the expression of

E n t (π_{c o n s t}) .

2.3.3. Case 3: Maximum Entropy Prior

On the other hand, if the maximum entropy prior

π_{m a x} (θ)

is assumed for the prior distribution of the location parameter

θ,

its entropy is given by:

\begin{matrix} E n t (π_{m a x} (θ)) & = & - \int_{R^{p}} π_{m a x} (θ) log π_{m a x} (θ) d θ \\ = & \frac{p}{2} + \frac{p}{2} log (2 π) + \frac{1}{2} log | Σ | . \end{matrix}

The following theorem asserts the relationship among the degrees of belief, accounted for by the three priors, about the a priori uncertain constraint

\{θ; θ \in C\} .

Theorem 1.

The degrees of belief

γ_{m a x},

γ_{t w o},

and

γ_{c o n s t}

about the a priori constraint

\{θ; θ \in C\},

accounted for by

π_{m a x} (θ),

π_{t w o} (θ)

and

π_{c o n s t} (θ)

, have the following relation:

\begin{matrix} γ_{m a x} & \leq & γ_{t w o} \leq γ_{c o n s t}, \end{matrix}

(14)

provided that the parameters of

π_{t w o} (θ)

in Equation (10) satisfy:

{\bar{Φ}}_{2 p} (C^{*}; τ, Ψ) \geq {\bar{Φ}}_{p} (C; θ_{0}, Ω_{1}) {\bar{Φ}}_{p} (C; θ_{0}, Σ),

where

C^{*} = {x; x_{1} \in C, x_{2} \in C}

denotes the

2 p

-variate interval of random vector

X = {(X_{1}^{⊤}, X_{2}^{⊤})}^{⊤},

the equality

γ_{m a x} = γ_{t w o}

holds for

Ω_{1} = O,

γ_{t w o} = γ_{c o n s t}

holds for

Ω_{2} = O

and

γ_{m a x} = γ_{t w o} = γ_{c o n s t}

holds for

C = R^{p} .

Proof.

The conditions for equalities are straightforward from the stochastic representation in Equation (12). Under the

π_{m a x} (θ)

in Equation (6),

γ_{m a x} = P r (θ \in C) = {\bar{Φ}}_{p} (C; θ_{0}, Σ), γ_{t w o} = \int_{θ \in C} π_{t w o} (θ) d θ = \frac{{\bar{Φ}}_{2 p} (C^{*}; τ, Ψ)}{{\bar{Φ}}_{p} (C; θ_{0}, Ω_{1})},

because

π_{t w o} (θ)

is the density of

θ \sim R S N_{p} (C; τ, Ψ),

and

γ_{c o n s t} = \int_{θ \in C}^{} π_{c o n s t} (θ) d θ = 1 .

Therefore, the condition

{\bar{Φ}}_{2 p} (C^{*}; τ, Ψ) \geq {\bar{Φ}}_{p} (C; θ_{0}, Ω_{1}) {\bar{Φ}}_{p} (C; θ_{0}, Σ)

gives the inequality relation. ☐

3. Properties

3.1. Objective Measure of Uncertainty

In constructing the two stages of prior hierarchy over

θ \in R^{p},

the usual practice is to set the value of

θ_{0}

as the centroid of the uncertain constrained multivariate interval

C = (a, b) .

In this case, we have the following result.

Corollary 4.

In the case where the value of

θ_{0}

in

π_{t w o} (θ)

is the centroid of the multivariate interval

C,

\begin{matrix} γ_{m a x} & \leq & γ_{t w o} \leq γ_{c o n s t} . \end{matrix}

(15)

Proof.

Equation (12) indicates that:

\begin{matrix} γ_{t w o} & = & P r (Y_{1} + Y_{2} \in (α, β) | Y_{1} \in (α, β)) and γ_{m a x} = P r (Y_{1} + Y_{2} \in (α, β)), \end{matrix}

where

Y_{1} \sim N_{p} (0, Ω_{1})

and

Y_{2} \sim N_{p} (0, Ω_{2})

are independent random vectors,

α = a - θ_{0},

and

β = b - θ_{0} .

When

θ_{0}

is the centroid of

C,

α = - β

, and hence:

\begin{matrix} P r (Y_{1} + Y_{2} \in (α, β), Y_{1} \in (α, β)) \geq P r (Y_{1} + Y_{2} \in (α, β)) P r (Y_{1} \in (α, β)) \end{matrix}

by the theorem of [27]. This leads to the first inequality,

γ_{m a x} \leq γ_{t w o} .

Since

γ_{c o n s t} = 1

, we see that the second inequality in Equation (15) holds.

The following are immediate from Theorem 1 and Corollary 4: (i) The two-stage maximum entropy prior achieves

γ_{t w o}

for the degree of belief about the uncertain multivariate interval constraint

\{θ; θ \in C\}

, and its value satisfies

γ_{t w o} \in [γ_{m a x}, 1]

if the condition in the theorem is satisfied. Note that the equality

γ_{t w o} = 1

holds for

Ω_{2} = O

; (ii) The degree of belief about the multivariate interval constraint is a function of the covariance matrices

Ω_{1}

and

Ω_{2} .

Thus, if we have the partial a priori information that specifies values of the covariance matrices

Ω_{1}

and

Ω_{2},

the degree of belief

γ_{t w o},

associated with

π_{t w o} (θ),

can be assessed.

Figure 1 compares the degrees of belief about the uncertain multivariate interval constraint

\{θ; θ \in C\},

accounted for by the three priors of

θ .

The figure is obtained in terms of

δ \in [0, 1]

with

Ω_{1} = δ Σ

and

Ω_{2} = (1 - δ) Σ,

p = 3,

C = (a, 2 1_{p} + a)

and

θ_{0} = 0,

where

a = (- 0.1 \times p) 1_{p},

Σ = σ^{2} (1 - ρ) I_{p} + σ^{2} ρ 1_{p} 1_{p}^{⊤}

is an intra-class covariance matrix, and

1_{p}

denotes a

p \times 1

summing vector whose every element is unity. When the constraint is changed to

C = (- (2 1_{p} + a), - a)

in this comparison, one can easily check that the degrees of belief do not change and give the same results seen in Figure 1. The figure depicts exactly the same inequality relationship given in Theorem 1. In comparison with

γ_{t w o}

and

γ_{c o n s t} = 1,

we see that the degree of belief in the uncertain constraint, accounted for by using

π_{t w o} (θ),

becomes large as

Ω_{2} \to O

(or equivalently

Ω_{1} \to Σ) .

In particular, this tendency is more evident for small

σ^{2}

and large ρ values. Third, the difference in

γ_{t w o}

and

γ_{m a x}

in the right panel suggests that the difference becomes large as

Ω_{2}

tends to

O .

In particular, for fixed values of δ and

ρ,

the figure shows that the difference increases as the value of

σ^{2}

decreases, while it decreases as the value of ρ increases for fixed values of δ and

σ^{2} .

Therefore, the figure confirms that the two-stage maximum entropy prior

π_{t w o} (θ)

accounts for the a priori uncertain constraint

\{θ; θ \in C\}

with the degree of belief

γ_{t w o} \in [γ_{m a x}, 1] .

The figure also notes that the magnitude of

γ_{t w o}

depends on both the first stage covariance

Ω_{2}

and the second stage covariance

Ω_{1}

in the two stages of prior hierarchy in Equations (8) and (9). All other choices of the values of

p,

ρ,

and

C,

satisfying the condition in Theorem 1, produced similar graphics depicted in Figure 1, with the exception of the magnitude of the differences among the degrees of belief.

Figure 1. Graphs of the difference between

p_{1} = γ_{m a x},

p_{2} = γ_{t w o}

and

p_{3} = γ_{c o n s t}

. (a), (c), and (e) for the difference between

p_{3}

and

p_{2}

; (b), (d), and (f) for the difference between

p_{2}

and

p_{1}

.

3.2. Properties of the Entropy

The expected uncertainty in the multivariate interval constraint of the location parameter

θ

,

\{θ; θ \in C\},

accounted for by the two-stage prior

π_{t w o} (θ)

, is measured by its entropy

E n t (π_{t w o} (θ))

, and information about the constraint is defined by

- E n t (π_{t w o} (θ)) .

Thus, as considered by [20,28], the difference between the Shannon measures of information, before and after applying the uncertain constraint

{θ; θ \in C},

can be explained by the following property.

Corollary 5.

When

θ_{0}

is the centroid of the multivariate interval

C,

\begin{matrix} E n t (π_{m a x} (θ)) \geq E n t (π_{t w o} (θ)) \geq E n t (π_{c o n s t} (θ)), \end{matrix}

(16)

where

E n t (π_{t w o} (θ))

reduces to

E n t (π_{m a x} (θ))

for

Ω_{1} = O,

while

E n t (π_{t w o} (θ))

is equal to

E n t (π_{c o n s t} (θ))

for

Ω_{2} = O .

All of the equalities hold for

C = R^{p} .

Proof. It is straightforward to check the equalities by using the stochastic representation in Equation (12). Since

π_{m a x} (θ)

is the maximum entropy prior, it is sufficient to show that

E n t (π_{t w o} (θ)) \geq E n t (π_{c o n s t} (θ)) .

First,

Σ - Ω_{1} = Ω_{2} > 0

implies that

{\bar{Φ}}_{p} (C; θ_{0}, Ω_{1}) \geq {\bar{Φ}}_{p} (C; θ_{0}, Σ)

by the lemma of [27]. Second,

γ_{c o n s t} = P r (Y_{1} + Y_{2} \in (α, β) | Y_{1} + Y_{2} \in (α, β)) \geq P r (Y_{1} + Y_{2} \in (α, β) | Y_{1} \in (α, β)) = γ_{t w o}

by Corollary 4. This and the lemma of [27] indicate that

C o v (Y_{1} + Y_{2} | Y_{1} \in (α, β)) - C o v (Y_{1} + Y_{2} | Y_{1} + Y_{2} \in (α, β))

is a positive-semi-definite, and hence,

tr (Σ^{- 1} E_{t w o} [(θ - θ_{0}) {(θ - θ_{0})}^{⊤}]) \geq tr (Σ^{- 1} E_{c o n s t} [(θ - θ_{0}) {(θ - θ_{0})}^{⊤}])

by ([29], p. 54), where

E_{t w o} [(θ - θ_{0}) {(θ - θ_{0})}^{⊤}] = C o v (Y_{1} + Y_{2} | Y_{1} \in (α, β))

and

E_{c o n s t} [(θ - θ_{0}) {(θ - θ_{0})}^{⊤}] = C o v (Y_{1} + Y_{2} | Y_{1} + Y_{2} \in (α, β))

for

α = - β .

These two results give the inequality

E n t (π_{t w o} (θ)) \geq E n t (π_{c o n s t} (θ)),

because

E_{t w o} [log h (θ)] \leq 0 .

Figure 2 depicts the difference between

E n t (π_{m a x} (θ)),

E n t (π_{t w o} (θ))

and

E n t (π_{c o n s t} (θ))

using the same parameter values used in constructing Figure 1. Figure 2 coincides with the inequality relation given in Corollary 5 and indicates the following consequences: (i) Even though

θ_{0}

is not the centroid of the multivariate interval

C,

we see that

E n t (π_{m a x} (θ)) > E n t (π_{t w o} (θ)) > E n t (π_{c o n s t} (θ))

for

δ \in (0, 1) .

(ii) The difference

E n t (π_{t w o} (θ)) - E n t (π_{c o n s t} (θ))

is a monotone decreasing function of

δ,

while

E n t (π_{m a x} (θ)) - E n t (π_{t w o} (θ))

is a monotone increasing function. (iii) The differences get bigger the larger

σ^{2}

becomes for

δ \in (0, 1) .

This indicates that the entropy of

π_{t w o} (θ)

is associated not only with the covariance of the first stage prior

Ω_{2}

, but that of the second stage prior

Ω_{1}

in Equations (8) and (9), respectively. (iv) Upon comparing Figure 1 and Figure 2, the entropy

E n t (π_{t w o} (θ))

is closely related to the degree of belief

γ_{t w o}

, such that:

E n t (π_{t w o} (θ)) = c_{t w o} (1 - γ_{t w o}),

where

c_{t w o} > 0

is obtained by using Equations (13) and (16) and

1 - γ_{t w o}

denotes the degree of uncertainty in a priori information regarding the multivariate interval constraint

\{θ; θ \in C\}

elicited by

π_{t w o} (θ) .

These consequences and Corollary 5 indicate that

1 - γ_{t w o}

stands between

1 - γ_{c o n s t}

and

1 - γ_{m a x} .

Thus, the two-stage prior

π_{t w o} (θ)

is useful for eliciting uncertain information about the multivariate interval constraint. Theorem 1 and the above statements produce an objective method for eliciting the stochastic constraint

{θ; θ \in C}

via

π_{t w o} (θ) .

Figure 2. Graphs of the entropy difference between

E_{1} = E n t (π_{m a x} (θ)),

E_{2} = E n t (π_{t w o} (θ))

and

E_{3} = E n t (π_{c o n s t} (θ))

for different values of

δ \in [0, 1] .

(a), (c), and (e) for the difference between

E_{2}

and

E_{3}

; (b), (d), and (f) for the difference between

E_{1}

and

E_{2}

.

Corollary 6.

Suppose the degree (

1 - γ_{t w o}

) of uncertainty associated with the stochastic constraint

{θ; θ \in C}

is given. An objective way of eliciting the prior information by using

π_{t w o} (θ)

is to choose the covariance matrices

Ω_{1}

and

Ω_{2}

in

π_{t w o} (θ)

, such that

γ_{t w o} = {\bar{Φ}}_{2 p} (C^{*}; τ, Ψ) / {\bar{Φ}}_{p} (C; θ_{0}, Ω_{1}),

where

Σ = Ω_{1} + Ω_{2}

is known and

Ω_{1} = δ Σ

with

δ \in [0, 1] .

Since

γ_{c o n s t} = 1

, the degree of uncertainty (

1 - γ_{t w o}

) is equal to

γ_{c o n s t} - γ_{t w o} .

The left panel of Figure 1 plots a graph of

1 - γ_{t w o}

against

δ .

The graph indicates that a δ value for

π_{t w o} (θ)

can be easily determined for given

Σ,

and the value is in inverse proportion to the degree of uncertainty regardless of

Σ .

3.3. Posterior Distribution

Suppose the distribution of the error vector in the model (1) belongs to the family of scale mixture of normal distributions defined in Equation (2); then, the conditional distribution of the data information from

n = 1

is

[y | η] \sim N_{p} (θ, κ (η) Λ) .

It is well known that the priors

π_{m a x} (θ)

and

π_{c o n s t} (θ)

are conjugate priors for the location vector

θ,

provided that η and Λ are known. That is, conditional on

η,

each prior satisfies the conjugate property that the prior and the posterior distributions of

θ

belong to the same family of distributions. The following corollary provides that the conditional conjugate property also applies to

π_{t w o} (θ) .

Corollary 7.

Let

[y | η] \sim N_{p} (θ, κ (η) Λ)

with known

Λ .

Then, the two-stage maximum entropy prior

π_{t w o} (θ)

in Equation (10) yields the conditional posterior distribution of

θ

given by:

\begin{matrix} [θ | y, η] & \sim & R S N_{p} (C; τ_{η}^{*}, Ψ_{η}^{*}), \end{matrix}

(17)

where

Ω_{1} = δ Σ,

δ \in (0, 1),

Σ = Ω_{1} + Ω_{2},

Σ_{1 η}^{*} = δ (1 - δ) Σ + δ^{2} Σ_{η}^{*},

Σ_{η}^{*} = {(κ {(η)}^{- 1} Λ^{- 1} + Σ^{- 1})}^{- 1},

τ_{η}^{*} = (\begin{matrix} θ_{0 η}^{*} \\ θ_{η}^{*} \end{matrix}), Ψ_{η}^{*} = (\begin{matrix} Σ_{1 η}^{*} & δ Σ_{η}^{*} \\ δ Σ_{η}^{*} & Σ_{η}^{*} \end{matrix}), θ_{0 η}^{*} = (1 - δ) θ_{0} + δ θ_{η}^{*}, a n d θ_{η}^{*} = Σ_{η}^{*} (κ {(η)}^{- 1} Λ^{- 1} y + Σ^{- 1} θ_{0}) .

Proof.

When the two-stage prior

π_{t w o} (θ)

in Equation (10) is used, the conditional posterior density of

θ

given η is:

\begin{matrix} p (θ | y, η) & \propto & ϕ_{p} (y; θ, κ (η) Λ) ϕ_{p} (θ; θ_{0}, Σ) {\bar{Φ}}_{p} (C; μ, Q) / {\bar{Φ}}_{p} (C; θ_{0}, Ω_{1}), \\ \propto & ϕ_{p} (θ; θ_{η}^{*}, Σ_{η}^{*}) {\bar{Φ}}_{p} (C; μ_{η}^{*}, Q_{η}^{*}), \end{matrix}

in that

{\bar{Φ}}_{p} (C; μ, Q) = {\bar{Φ}}_{p} (C; μ_{η}^{*}, Q_{η}^{*}),

where

μ = θ_{0} + Ω_{1} Σ^{- 1} (θ - θ_{0}),

μ_{η}^{*} = θ_{0 η}^{*} + δ (θ - θ_{η}^{*})

and

Q_{η}^{*} = Σ_{1 η}^{*} - δ^{2} Σ_{η}^{*} .

The last term of the proportional relations is a kernel of the

R S N_{p} (C; τ_{η}^{*}, ψ_{η}^{*})

density defined by Corollary 3. ☐

Corollaries 3 and 7 establish the conditional conjugate property of

π_{t w o} (θ) :

Suppose the location parameter

θ

is the normal mean vector, then the

R S N

prior distribution, i.e.,

π_{t w o} (θ),

yields the conditional posterior distribution, which belongs to the class of

R S N

distributions as given in Corollary 7. In the particular case where the distribution of η degenerates at

κ (η) = 1,

i.e., the model (1) is a normal model, then the conditional conjugate property of

π_{t w o} (θ)

reduces to the unconditional conjugate property.

Using the relation between the distribution of Equation (11) and that of Equation (12), we can obtain the stochastic representation for the conditional posterior

R S N

distribution in Equation (17) as follows.

Corollary 8.

Conditional on the mixing variable η, the stochastic representation of

[θ | y, η] \sim R S N_{p} (C; τ_{η}^{*}, Ψ_{η}^{*})

is:

\begin{matrix} [θ | y, η] & \overset{d}{=} & θ_{η}^{*} + δ Σ_{η}^{*} Σ_{1 η}^{* - 1} W_{1}^{(α_{η}^{*}, β_{η}^{*})} + {(Σ_{η}^{*} - δ^{2} Σ_{η}^{*} Σ_{1 η}^{* - 1} Σ_{η}^{*})}^{1 / 2} W_{2}, \end{matrix}

(18)

where

W_{1} \sim N_{p} (0, Σ_{1 η}^{*})

and

W_{2} \sim N_{p} (0, I_{p})

are independent and

W_{1}^{(α_{η}^{*}, β_{η}^{*})} \overset{d}{=} [W_{1} | W_{1} \in (α_{η}^{*}, β_{η}^{*})],

where

α^{*} = a - θ_{0 η}^{*}

and

β^{*} = b - θ_{0 η}^{*} .

Proof.

Suppose the distributions of

X_{1}

and

X_{2}

in Equation (11) changed to

X_{1} \overset{d}{=} θ_{0 η}^{*} + W_{1}

and

X_{2} \overset{d}{=} θ_{η}^{*} + δ Σ_{η}^{*} Σ_{1 η}^{* - 1} W_{1} + {(Σ_{η}^{*} - δ^{2} Σ_{η}^{*} Σ_{1 η}^{* - 1} Σ_{η}^{*})}^{1 / 2} W_{2} .

Then, the stochastic representation in Equation (12) associated with the distribution

[X_{2} | X_{1} \in C]

in Equation (11) gives the result. ☐

4. Hierarchical Constrained Scale Mixture of Normal Model

For the model (1), if we are completely sure about a multivariate interval constraint on

θ,

a suitable restriction on the parameter space

θ \in R^{p},

such as using a truncated normal prior distribution, is expected for eliciting the information. However, there are certain cases where we have a priori information that the location parameter

θ

is highly likely to have a multivariate interval constraint, and thus, the value of

θ

needs to be located with uncertainty in a restricted space

{θ \in C}

with

C = (a, b)

. Then, we cannot be sure about the constraint, and then, the constraint becomes stochastic (or uncertain), as in our problem of interest. In this case, the uncertainty about the constraint must be taken into account in the estimation procedure of the model (1). This section considers a hierarchical Bayesian estimation of the scale mixture of normal models reflecting the uncertain prior belief on

θ .

4.1. The Hierarchical Model

Let us consider a hierarchical constrained scale mixture of normal model (HCSMN) that uses the hierarchy of the scale mixture of normal model (1) and includes the two stages of a prior hierarchy in the following way:

\begin{matrix} [y_{i} | η_{i}, Λ] & = & θ + ϵ_{i}, ϵ_{i} \sim N_{p} (0, κ (η_{i}) Λ), i = 1, \dots, n, \\ θ | μ_{0} & \sim & N_{p} (μ_{0}, Ω_{2}), independent of {(ϵ_{1}, \dots, ϵ_{n})}^{⊤}, \\ μ_{0} & \sim & N_{p} (θ_{0}, Ω_{1}) I (μ_{0} \in C), \\ Λ & \sim & W_{p}^{- 1} (D, d), d > 2 p, \\ η_{i} & \overset{i i d}{\sim} & g (η), i = 1, \dots, n, \end{matrix}

(19)

where

W_{p}^{- 1} (D, d)

denotes the inverted Wishart distribution with positive definite scale matrix D and d degrees of freedom whose pdf

W_{p}^{- 1} (Λ; D, d)

is:

W_{p}^{- 1} (Λ; D, d) \propto {| Λ |}^{- d / 2} exp \{- \frac{1}{2} tr (Λ^{- 1} D)\},

Ω_{1} + Ω_{2} = Σ

with

Ω_{1} = δ Σ,

and

δ \in [0, 1] .

4.2. The Gibbs Sampler

Based on the HCSMN model structure with the likelihood and the prior distributions in Equation (19), the joint posterior distribution of

θ,

Λ and

η = {(η_{1}, \dots, η_{n})}^{⊤}

given the data

{y_{1}, \dots, y_{n}}

is:

\begin{matrix} p (θ, Λ, η | D a t a) & \propto & \prod_{i = 1}^{n} {| κ (η_{i}) Λ |}^{- 1 / 2} exp \{- \frac{1}{2} tr [Λ^{- 1} κ {(η_{i})}^{- 1} (y_{i} - θ) {(y_{i} - θ)}^{⊤}]\} \\ \times & ϕ_{p} (θ; μ_{0}, Ω_{2}) ϕ_{p} (μ_{0}; θ_{0}, Ω_{1}) I (μ_{0} \in C) \\ \times & {| Λ |}^{- d / 2} exp \{- \frac{1}{2} tr (Λ^{- 1} D)\} \prod_{i = 1}^{n} g_{i} (η_{i}), \end{matrix}

(20)

where

g (η_{i})

’s denote the densities of the mixing variables

η_{i}

’s. Note that the joint posterior of Equation (20) is not simplified in an analytic form of the known density and, thus, intractable for the posterior inference. Instead, we use the Gibbs sampler for the posterior inference. See [30] for a reference. To run the Gibbs sampler, we need the following full conditional posterior distributions:

(i): The full conditional posterior densities of $η_{i}$ ’s are given by:

$\begin{matrix} p (η_{i} | θ, Λ, y_{i}) & \propto & κ {(η_{i})}^{- \frac{p}{2}} exp \{- \frac{{(y_{i} - θ)}^{⊤} Λ^{- 1} (y_{i} - θ)}{2 κ (η_{i})}\} g (η_{i}), i = 1, \dots, n, \end{matrix}$

(21)
(ii): The full conditional distribution of $θ$ is obtained by using the way analogous to the proof of Corollary 7. It is:

$[θ | Λ, η, D a t a] \sim R S N_{p} (C; τ_{p o s}, Ψ_{p o s}),$

(22)

where:

$τ_{p o s} = (\begin{matrix} τ_{0} \\ τ_{1} \end{matrix}), Ψ_{p o s} (\begin{matrix} Ω_{0}^{*} & δ Ω^{*} \\ δ Ω^{*} & Ω^{*} \end{matrix}), τ_{0} = (1 - δ) θ_{0} + δ τ_{1},$

$τ_{1} = Ω^{*} (Σ^{- 1} θ_{0} + \sum_{i = 1}^{n} {(κ (η_{i}) Λ)}^{- 1} y_{i}),$ $Ω^{*} = {(Σ^{- 1} + \sum_{i = 1}^{n} {(κ (η_{i}) Λ)}^{- 1})}^{- 1},$ and $Ω_{0}^{*} = δ (1 - δ) Σ + δ^{2} Ω^{*} .$
(iii): The full conditional posterior distribution of Λ is an inverse-Wishart distribution:

$[Λ | θ, η, D a t a] \sim W_{p}^{- 1} (V, m), m > 2 p,$

(23)

where $V = D + \sum_{i = 1}^{n} κ {(η_{i})}^{- 1} (y_{i} - θ) {(y_{i} - θ)}^{⊤}$ and $m = n + d .$

4.3. Markov Chain Monte Carlo Sampling Scheme

When conducting a posterior inference of the HCSMN model, using the Gibbs sampling algorithm with the full conditional posterior distributions of

η_{i}

’s,

θ

and Λ, the following points should be noted.

note 1: Variable $η_{i}$ at $κ (η_{i}) = 1,$ i.e., the HCN (hierarchical constrained normal) model with $ϵ_{i} \overset{i i d}{\sim} N_{p} (0, Λ),$ $i = 1, \dots, n,$ the Gibbs sampler consists of two conditional distributions $[θ | Λ, D a t a]$ and $[Λ | θ, D a t a] .$ To sample from the first full conditional posterior distribution, we can utilize the stochastic representations of the $R S N$ distribution in Corollary 8. The R package tmvtnorm and the R package mvtnorm can be used to sample from the $R S N$ distribution in Equation (22).
note 2: According to choice of the distribution $η_{i}$ and the mixing function $κ (η_{i}),$ the HCSMN model may produce a different model other than the HCN model, such as hierarchical constrained multivariate $t_{ν}$ (HC $t_{ν}$ ), hierarchical constrained multivariate $l o g i t$ , hierarchical constrained multivariate $s t a b l e$ and hierarchical constrained multivariate $e x p o n e n t i a l$ $p o w e r$ models. See, e.g., [31,32], for various distributions of $η_{i}$ and corresponding function $κ (η_{i})$ , which can be used to construct the HCSMN model.
note 3: When the hierarchical constrained multivariate $t_{ν}$ (HC $t_{ν}$ ) model is considered, the hierarchy of the model in Equation (19) consists of $ϵ_{i} \overset{i i d}{\sim} N_{p} (0, κ (η_{i}) Λ)$ with $κ (η_{i}) = η_{i}^{- 1}$ and $η_{i} \sim G a m m a (ν / 2, ν / 2),$ $i = 1, \dots, n .$ Thus, the Gibbs sampler comprises the conditional posterior Equations (21)–(23). Under the HC $t_{ν}$ model, the distribution of Equation (21) reduces to:

$[η_{i} | θ, Λ, y_{i}] \sim G a m m a (ν^{*} / 2, h / 2),$

where $ν^{*} = p + ν$ and $h = ν + {(y_{i} - θ)}^{⊤} Λ^{- 1} (y_{i} - θ) .$ To limit model complexity, we consider only fixed ν, so that we can investigate different HC $t_{ν}$ models. As suggested by [32], a uniform prior on $1 / ν$ ( $0 < 1 / ν < 1$ ) can be considered. However, this will bring additional computational burden.
note 4: Except for the HCN and HC $t_{ν}$ models, the Metropolis–Hastings algorithm within the Gibbs sampler is used for estimating the HCSMN models, because the conditional posterior densities Equation (20) do not have explicit forms of known distributions as in Equations (21) and (22). See, e.g., [22], for the algorithm for sampling $η_{i}$ from various mixing distributions, $g_{i} (η_{i}) .$ A general procedure for the algorithm is as follows: Given the current values $Θ = {η, θ, Λ}$ , we independently generate a candidate $η_{i}$ from a proposal density $q (η_{i}^{*} | η_{i}) = g_{i} (η_{i}^{*}),$ as suggested by [33], which is used for a Metropolis–Hastings algorithm. Then, accept the candidate value with the acceptance rate:

$α (η_{i}, η_{i}^{*}) = min \{\frac{p (Θ | η_{i}^{*})}{p (Θ | η_{i})}, 1\}$

$i = 1, \dots, n .$ Because the target density is proportional to $p (Θ | η_{i}) g_{i} (η_{i})$ and $p (Θ | η_{i}) = ϕ_{p} (y_{i}; θ, κ (η_{i}) Λ)$ is uniformly bounded for $η_{i} > 0 .$
note 5: As noted from Equations (8) and (9), the second and third stage priors of the HCSMN model in Equation (19) reduce to the two-stage prior $π_{t w o} (θ),$ eliciting the stochastic multivariate interval constraint with degree of uncertainty $1 - γ_{t w o} .$ Instead, if the maximum entropy prior $π_{m a x} (θ)$ and the constrained maximum entropy prior $π_{c o n s t} (θ)$ are used for the HCSMN, then the respective full conditional distributions of $θ$ of the Gibbs sampler change from Equation (22) to:

$[θ | Λ, η, D a t a] \sim N_{p} (τ_{1}, Ω^{*}) for θ \in R^{p} and [θ | Λ, η, D a t a] \sim N_{p} (τ_{1}, Ω^{*}) I (θ \in C),$

where $τ_{1}$ and $Ω^{*}$ are the same as given in Equation (22).

4.4. Bayes Estimation

For a simple example, let us consider the HCN model with known

Λ .

When we assume a stochastic constraint

{θ; θ \in C}

obtained from a priori information, we may use the two-stage maximum entropy prior

π_{t w o} (θ)

defined by the second and third stages of the HCSMN model (19) with

δ \in (0, 1),

where the value of δ is determined by using Corollary 6. This yields a Bayes estimate based on the two-stage maximum entropy prior. Corollary 8 yields:

\begin{matrix} {\hat{θ}}_{t w o} & = & τ_{1} + δ Ω^{*} Ω_{0}^{* - 1} E [θ_{t n}^{*}] = τ_{1} + δ Ω^{*} Ω_{0}^{* - 1} ζ \end{matrix}

(24)

and:

ζ = {(ζ_{1}, \dots, ζ_{p})}^{⊤} with ζ_{i} = w_{0 i} \frac{ϕ (u_{i} / w_{0 i}) - ϕ (v_{i} / w_{0 i})}{Φ (v_{i} / w_{0 i}) - Φ (u_{i} / w_{0 i})}, i = 1, \dots, p,

where

θ_{t n}^{*} \sim N_{p} (0, Ω_{0}^{*})

I

(θ_{t n}^{*} \in (u, v)),

a truncated normal distribution with

u = a - τ_{0} = {(u_{1}, \dots, u_{p})}^{⊤}

and

v = b - τ_{0} = {(v_{1}, \dots, v_{p})}^{⊤}

and

w_{0 i}

denotes i-th diagonal element of

Ω_{0}^{*} .

Here,

τ_{1}

and

τ_{0}

are the same as those in Equation (22), and

ϕ (\cdot)

denotes the univariate standard normal density function. See [25,34] for the first moment of the truncated multivariate normal distribution and for a numerical calculation of the posterior covariance matrix

C o v (θ_{t n}^{*}),

respectively.

On the other hand, when we have certainty about the constraint

{θ; θ \in C},

we may use the HCSMN model with

δ = 1,

which uses the constrained maximum entropy prior

π_{c o n s t} (θ)

instead of

π_{t w o} (θ)

in its hierarchy. This case gives the Bayes estimate:

\begin{matrix} {\hat{θ}}_{c o n s t} & = & E [θ_{t n}] = τ_{1} + ζ^{*} \end{matrix}

(25)

and:

ζ^{*} = {(ζ_{1}^{*}, \dots, ζ_{p}^{*})}^{⊤} with ζ_{i}^{*} = w_{i} \frac{ϕ (a_{i} / w_{i}) - ϕ (b_{i} / w_{i})}{Φ (b_{i} / w_{i}) - Φ (a_{i} / w_{i})}, i = 1, \dots, p,

where

θ_{t n} \sim N_{p} (τ_{1}, Ω^{*})

I

(θ_{t n} \in C)

and

w_{i}

denotes i-th diagonal element of

Ω^{*} .

On the contrary, when we have completely no a priori information about the constraint in the space of

θ,

the HCSMN model with the maximum entropy prior

π_{m a x} (θ)

(equivalently, the HCSMN model with

δ = 0

) may be used for the posterior inference. In this model, the Bayes estimate of the location parameter is given by:

\begin{matrix} {\hat{θ}}_{m a x} & = & τ_{1} . \end{matrix}

(26)

Comparing Equations (24) and (25) to Equation (26), we see that Equations (24) and (25) are the same for

δ = 1

, and the last term in Equation (24) vanishes when we assume that there is no a priori information about the stochastic constraint,

{θ; θ \in C} .

In this sense, the last term in Equation (24) can be interpreted as a shrinkage effect of the HCSMN model with

δ \neq 0 .

This effect makes the Bayes estimator of

θ

shrink toward the stochastic constraint. In addition, we can calculate the difference between the estimates in Equations (24) and (25):

D i f f = {\hat{θ}}_{c o n s t} - {\hat{θ}}_{t w o} = ζ^{*} - δ Ω^{*} Ω_{0}^{* - 1} ζ .

This difference vector is a function of the degree of belief

γ_{t w o}

or

δ \in (0, 1)

for Equation (25) is based on

γ_{c o n s t} = 1

and

δ = 1

and

D i f f = 0

for

δ = 1 .

Thus, the difference represents a stochastic effect of the multivariate interval constraint.

5. Numerical Illustrations

This section presents an empirical analysis of the proposed approach (using the HCSMN model) to the stochastic multivariate interval constraint on the location model. We provide numerical simulation results and a real data application comparing the proposed approach to the hierarchical Bayesian approaches, which use usual priors,

π_{m a x} (θ)

and

π_{c o n s t} (θ) .

For numerical implementations, we develop our program written in R, which is available from the author upon request.

5.1. Simulation Study

To examine the performance of the HSCMN model for estimating the location parameter with a stochastic multivariate interval constraint, we conduct a simulation study. The study is based up 200 synthetic datasets for different sample sizes

n = 20, 200

generated form each distribution of

N_{4} (θ, Λ)

and

t_{4} (θ, Λ, ν),

a four-dimensional t with the location parameter

θ,

scale matrix Λ and degrees of freedom

ν = 5 .

For the simulation, we used the following choice of parameter values:

θ = {(θ, θ, m θ, m θ)}^{⊤}

and

Λ = (1 - ρ) I_{4} + ρ 1_{4} 1_{4}^{⊤},

where

m = {(- 1)}^{θ + 1},

ρ = 0.5,

and

θ = 1, 2 .

To fit each of the 200 synthetic datasets (Dataset I) generated from the

N_{4} (θ, Λ)

distribution, we implemented the Markov chain Monte Carlo (MCMC) posterior simulation with the three different HCN models with the multivariate interval constraint

C = {(a, b)}^{⊤}

: the HCN models that use

π_{t w o} (θ),

π_{m a x} (θ),

and

π_{c o n s t} (θ) .

We denote these models by HCN

(π_{t w o})

, HCN

(π_{m a x})

and HCN

(π_{c o n s t}) .

For each dataset, MCMC posterior sampling was based on the first 10,000 posterior samples as the burn-in, followed by a further 100,000 posterior samples with a thinning size of 10. Thus, the final MCMC posterior samples with a size of 10,000 were obtained for each of the three HCN models. Exactly the same MCMC posterior sampling scheme is applied to each of the 200 synthetic datasets (Dataset II) from the

t_{4} (θ, Λ, ν)

distribution based on the three HC

t_{ν}

models, HC

t_{ν} (π_{t w o})

, HC

t_{ν} (π_{m a x})

and HC

t_{ν} (π_{c o n s t}) .

To satisfy a subjective perspective of the hierarchical models, we set

θ_{0} = 0,

Σ = Ω_{1} + Ω_{2} = 0.5 θ I_{4},

and

Ω_{1} = 0.85 Σ

to specify our information about the parameter

θ,

while we set

D = 10^{- 2} I_{4}

and

d = 2 p + 5

to elicit no information about Λ (see, e.g., [32]). For the stochastic multivariate interval constraint, we set

a = - 0.5 θ 1_{4}

and

b = 0.5 θ 1_{4},

and this constraint gives the degree of belief

γ_{m a x} = 0.073

(or

0.217

) and

γ_{t w o} = 0.394

(or

0.571

) for

θ = 1

(or 2). Note that the degree of belief in the constraint, accounted for by

π_{c o n s t} (θ),

is

γ_{c o n s t} = 1

for all of the values of

θ .

Summary statistics of the posterior samples of the location parameters (the mean and the standard deviation of 200 posterior means of each parameter) along with the degrees of belief about the constraint

C

(

γ_{m a x},

γ_{t w o}

and

γ_{c o n s t}

) are listed in Table 1. For the sake of saving a space, we omit the summary statistics regarding Λ from the table. The table indicates the followings: (i) The MCMC method performs well in estimating the location parameters of all of the models considered. This can be justified by the estimation results of the HCN

(π_{m a x})

and HC

t_{ν} (π_{m a x})

models. Specifically, in the posterior estimation of

θ,

the data information tends to dominate the prior information about

θ

for the large sample case (i.e.,

n = 200

), while the latter tends to dominate the former for the small sample case of

n = 20 .

Furthermore, the convergence of the MCMC sampling algorithm was evident, and a discussion about the convergence will be given in Subsection 5.2; (ii) The estimates of

θ

obtained from the HCN

(π_{t w o})

and HC

t_{ν} (π_{t w o})

models are uniformly closer to the stochastic constraint

θ \in C

than those from the HCN

(π_{m a x})

and HC

t_{ν} (π_{m a x})

models. This confirms that

π_{t w o} (θ)

induces an obvious shrinkage effect in Bayesian estimation of the location parameter with a stochastic multivariate interval constraint; (iii) Comparing the estimates of

θ

obtained from the HCN

(π_{t w o})

(or HC

t_{ν} (π_{t w o}))

model to those from the HCN

(π_{c o n s t})

(or HC

t_{ν} (π_{c o n s t}))

model, we see that the difference between their vector values is significant. Thus, we can expect an apparent stochastic effect if we use

π_{t w o} (θ)

in Bayesian estimation of the location parameter with a stochastic multivariate interval constraint.

Table 1. Summaries of posterior samples of

θ = {(θ_{1}, θ_{2}, θ_{3}, θ_{4})}^{⊤}

obtained by using three different priors;

π_{t w o} (θ),

π_{m a x} (θ)

and

π_{c o n s t} (θ) .

HCN, hierarchical constrained normal.

5.2. Car Body Assembly Data Example

John and Wichern consider car body assembly data (accessible through www.prenhall.com/statistics, [35]) obtained from a study of its sheet metal assembly process. A major automobile manufacturer uses sensors that record the deviation from the nominal thickness (millimeters

\times 10^{- 1}

) at a specific location on a car, which has the following levels: the deviation of the car body at the final stage of assembly (

Y_{1}

) and that at an early stage of assembly (

Y_{2}

). The data consist of 50 pairs of observations of (

Y_{1}

,

Y_{2}

), and they provide summary statistics as listed in Table 2. The tests given by ([36], p. 148), using the measures of multivariate skewness and kurtosis, accept the bivariate normality of the joint distribution of

Y = {(Y_{1}, Y_{2})}^{⊤} .

The respective skewness and kurtosis are

b_{1 p} = 0.074

and

b_{2 p} = 7.337,

which give respective p-values of 0.954 (chi-square test for the skewness) and 0.721 (normal test for the kurtosis), indicating the observation model for the dataset is:

y_{i} = θ + ϵ_{i}, i = 1, \dots, 50,

where

ϵ_{i} \overset{i i d}{\sim} N_{2} (0, Λ),

Λ = {λ_{i j}} .

The Shapiro–Wilk (S-W) test is also implemented to see the marginal normality of each

Y_{i},

i = 1, 2 .

The test statistic values and corresponding p-values of the S-W test are listed in Table 2.

Table 2. Summary statistics for the car body assembly data. S-W, Shapiro–Wilk.

In practical situations, we may have information about the mean vector of the observation model (i.e., mean deviation from the nominal thickness) from a past study of the sheet metal assembly process or a quality control report of the automobile manufacturer. Suppose that the information about the centroid of the mean deviation vector,

θ = {(θ_{1}, θ_{2})}^{⊤},

is

{(- 1, 4)}^{⊤}

with

C o v (θ) = d i a g {1, 4} .

Furthermore, there is uncertain information that

θ \in (a, b),

where

a = {(- 1.5, 3)}^{⊤}

and

b = {(- 0.5, 5)}^{⊤} .

This paper has proposed the two-stage maximum entropy prior

π_{t w o} (θ)

to represent all of the information, which is not available with the other priors, such as

π_{m a x} (θ)

and

π_{c o n s t} (θ)

.

Using the three hierarchical models (i.e., the HCN

(π_{m a x}),

HCN

(π_{t w o})

and HCN

(π_{c o n s t})

models), we obtain 10,000 posterior samples from the MCMC sampling scheme based on each of the three models with a 10 thinning period after a burn-in period of 10,000 samples. In estimating the Mote Carlo (MC) error, we used the batch mean method method with 50 batches; see, e.g., [37] (pp. 39–40). For a formal test for the convergence of the MCMC algorithm, we applied the Heidelberger–Welch diagnostic test of [38] to single-chain MCMC runs and calculated the p-values of the test. For the posterior simulation, we used the following choice of hyper-parameter values:

θ_{0} = {(- 1, 4)}^{⊤},

Σ = Ω_{1} + Ω_{2} = 10 I_{2},

Ω_{1} = δ Σ,

Ω_{2} = (1 - δ) Σ,

δ \in (0, 1),

D = 10^{- 2} I_{2}

and

d = 10^{2} + 2 p + 1 .

The posterior estimation and the convergence test results are shown in Table 3. Note that Columns 7–9 of the table list the values obtained from implementing the MCMC sampling for the posterior estimation of HCN(π_two).

Table 3. The posterior estimates and the convergence test results.

The small MC error values listed in Table 3 convince us of the convergence of the MCMC algorithm. Furthermore, the p-values of the Heidelberger–Welch test for the stationarity of the single MCMC run are larger than 0.1. Thus, both of the diagnostic checking methods advocate the convergence of the proposed MCMC sampling scheme. Similar to Table 1, this table also shows that

π_{t w o} (θ)

induces the shrinkage and stochastic effects in the Bayesian estimation of

θ

with the uncertain multivariate interval constraint: (i) From the comparison of the posterior estimates obtained from HCN

(π_{t w o})

with those from HCN

(π_{m a x})

, we see that the estimates of

θ_{1}

and

θ_{2},

obtained from HCN

(π_{t w o}),

shrink toward the stochastic interval

C .

The magnitude of shrinkage effect induced by using the proposed prior

π_{t w o} (θ)

becomes more evident as the degree of belief in the interval constraint

γ_{t w o}

(or δ) gets larger; (ii) On the other hand, we can see the stochastic effect of the prior

π_{t w o} (θ)

by comparing the posterior estimate of

θ

obtained from HCN

(π_{t w o})

with that from HCN

(π_{c o n s t}) .

The stochastic effect can be measured by the difference between the estimates, and we see that the difference becomes smaller as

γ_{t w o}

(or δ) gets larger.

6. Conclusions

In this paper, we have proposed a two-stage maximum entropy prior

π_{t w o} (θ)

of the location parameter of a scale mixture of normal model. The prior is derived by using the two stages of a prior hierarchy advocated by [5] to elicit a stochastic multivariate interval constraint,

{θ; θ \in C} .

With regard to eliciting the stochastic constraint, the two-stage maximum entropy prior has the following properties. (i) Theorem 1 and Corollary 4 indicate that the two-stage prior is flexible enough to elicit all of the degrees of belief in the stochastic constraint; (ii) Corollary 4 confirms that the entropy of the two-stage prior is commensurate with the uncertainty about the constraint

{θ; θ \in C}

; (iii) As given in Corollary 6, the preceding two properties enable us to propose an objective way of eliciting the uncertain prior information by using

π_{t w o} (θ) .

From the inferential view point: (i) the two-stage prior for the normal mean vector has the conjugate property that the prior and posterior distributions belong to the same family of the

R S N

distributions by [23]; (ii) the conjugate property enables us to construct an analytically simple Gibbs sampler for the posterior inference of the model (1) with unknown covariance matrix Λ; (iii) this paper also provides the HCSMN model, which is flexible enough to elicit all of the types of stochastic constraints and the scale mixture for Bayesian inference of the model (1). Based on the HCSMN model, the full conditional posterior distributions of unknown parameters were derived, and the calculation of posterior summary was discussed by using the Gibbs sampler and two numerical applications.

The methodological results of the Bayesian estimation procedure proposed in the paper can be extended to other multivariate models that incorporate functional means, such as linear and nonlinear regression models. For example, the seemingly unrelated regression (SUR) model and the factor analysis model (see, e.g., [24]) can be explained in the same framework of the proposed HCSMN in Equation (1). We hope to address these issues in the near future.

Acknowledgments

The research of Hea-Jung Kim was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2015R1D1A1A01057106).

Conflicts of Interest

The author declares no conflict of interest.

References

O’Hagan, A. Bayes estimation of a convex quadratic. Biometrika 1973, 60, 565–572. [Google Scholar] [CrossRef]
Steiger, J. When constraints interact: A caution about reference variables, identification constraints, and scale dependencies in structural equation modeling. Psychol. Methods 2002, 7, 210–227. [Google Scholar] [CrossRef] [PubMed]
Lopes, H.F.; West, M. Bayesian model assessment in factor analysis. Stat. Sin. 2004, 14, 41–67. [Google Scholar]
Loken, E. Identification constraints and inference in factor models. Struct. Equ. Model. 2005, 12, 232–244. [Google Scholar] [CrossRef]
O’Hagan, A.; Leonard, T. Bayes estimation subject to uncertainty about parameter constraints. Biometrika 1976, 63, 201–203. [Google Scholar] [CrossRef]
Liseo, B.; Loperfido, N. A Bayesian interpretation of the multivariate skew-normal distribution. Stat. Probab. Lett. 2003, 49, 395–401. [Google Scholar] [CrossRef]
Kim, H.J. On a class of multivariate normal selection priors and its applications in Bayesian inference. J. Korean Stat. Soc. 2011, 40, 63–73. [Google Scholar] [CrossRef]
Kim, H.J. A measure of uncertainty regarding the interval constraint of normal mean elicited by two stages of a prior hierarchy. Sic. World J. 2014, 2014, 676545. [Google Scholar] [CrossRef] [PubMed]
Kim, H.J.; Choi, T. On Bayesian estimation of regression models subject to uncertainty about functional constraints. J. Korean Stat. Soc. 2015, 43, 133–147. [Google Scholar] [CrossRef]
Kim, H.J.; Choi, T.; Lee, S. A hierarchical Bayesian regression model for the uncertain functional constraint using screened scale mixture of Gaussian distributions. Statistics 2016, 50, 350–376. [Google Scholar] [CrossRef]
Arellano-Valle, R.B.; Branco, M.D.; Genton, M.G. A unified view on skewed distributions arising from selection. Can. J. Stat. 2006, 34, 581–601. [Google Scholar] [CrossRef]
Kim, H.J. A class of weighted multivariate normal distributions and its properties. J. Multivar. Anal. 2008, 99, 1758–1771. [Google Scholar] [CrossRef]
Jaynes, E.T. Prior probabilities. IEEE Trans. Syst. Sci. Cybern. 1968, 4, 227–241. [Google Scholar] [CrossRef]
Jaynes, E.T. Papers on Probability, Statistics, and Statistical Physics; Rosenkrantz, R.D., Ed.; Reidel: Boston, MA, USA, 1983. [Google Scholar]
Smith, S.R.; Grandy, W. Maximum-Entropy and Bayesian Methods in Inverse Problems; Reidel: Boston, MA, USA, 2013. [Google Scholar]
Ishwar, P.; Moulin, P. On the existence of characterization of the Maxent distribution under general moment inequality constraints. IEEE Trans. Inf. Theory 2005, 51, 3322–3333. [Google Scholar] [CrossRef]
Rosenkrantz, R.D. Inference, Method, and Decision: Towards a Bayesian Philosophy and Science; Reidel: Boston, MA, USA, 1977. [Google Scholar]
Rosenkrantz, R.D. (Ed.) E.T. Jaynes: Papers on Probability, Statistics, and Statistical Physics; Kluwer Academic: Dordrecht, The Netherlands, 1989.
Yuen, K.V. Bayesian Methods for Structural Dynamics and Civil Engineering; John Wiley & Sons: Singapore, Singapore, 2010. [Google Scholar]
Wu, N. The Maximum Entropy Method; Springer: New York, NY, USA, 2012. [Google Scholar]
Cercignani, C. The Boltzman Equation and Its Applications; Springer: Berlin/Heidelberg, Germany, 1988. [Google Scholar]
Leonard, T.; Hsu, J.S.J. Bayesian Methods: An Analysis for Statisticians and Interdisciplinary Researchers; Cambridge University Press: New York, NY, USA, 1999. [Google Scholar]
Kim, H.J.; Kim, H.-M. A Class of Rectangle-Screened Multivariate Normal Distributions and Its Applications. Statisitcs 2015, 49, 878–899. [Google Scholar] [CrossRef]
Press, S.J. Applied Multivariate Analysis, 2nd ed.; Dover Publications, Inc.: New York, NY, USA, 2005. [Google Scholar]
Wilhelm, S.; Manjunath, B.G. Tmvtnorm: Truncated Multivariate Normal Distribution and Student t Distribution. Available online: http://CRAN.R-project.org/package=tmvtnorm (accessed on 17 May 2016).
Genz, A.; Bretz, F. Computation of Multivariate Normal and t Probabilities; Springer: New York, NY, USA, 2009. [Google Scholar]
Gupta, S.D. A note on some inequalities for multivariate normal distribution. Bull. Calcutta Stat. Assoc. 1969, 18, 179–180. [Google Scholar]
Lindly, D.V. Bayesian Statistics: A Review; SIAM: Philadelphia, PA, USA, 1970. [Google Scholar]
Khuri, A.I. Advanced Calculus with Applications in Statistics; John Wiley & Son: New York, NY, USA, 2003. [Google Scholar]
Gamerman, D.; Lopes, H.F. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, 2nd ed.; Chapman and Hall: New York, NY, USA, 2006. [Google Scholar]
Branco, M.D. A general class of multivariate skew-elliptical distributions. J. Multivar. Anal. 2001, 79, 99–113. [Google Scholar] [CrossRef]
Chen, M.-H.; Dey, D.K. Bayesian modeling of correlated binary response via scale mixture of multivariate normal link functions. Sankhyã 1998, 60, 322–343. [Google Scholar]
Chib, S.; Greenberg, E. Understanding the Metropolis-Hastings algorithm. Am. Stat. 1995, 49, 327–335. [Google Scholar]
Johnson, N.L.; Kotz, S.; Balakrishnan, N. Distribution in Statistics: Continuous Univariate Distributions, 2nd ed.; John Wiley & Son: New York, NY, USA, 1994; Volume 1. [Google Scholar]
Johnson, R.A.; Wichern, D.W. Applied Multivariate Statistical Analysis, 6th ed.; Pearson Prentice Hall: London, UK, 2007. [Google Scholar]
Mardia, K.V.; Kent, J.T.; Bibby, J.M. Multivariate Analysis; Academic Press: London, UK, 1979. [Google Scholar]
Ntzoufras, I. Bayesian Modeling Using WinBUGS; John Wiley & Son: New York, NY, USA, 2009. [Google Scholar]
Heidelberger, P.; Welch, P. Simulation run length control in the presence of an initial transient. Oper. Res. 1992, 31, 1109–1144. [Google Scholar] [CrossRef]

Figure 1. Graphs of the difference between

p_{1} = γ_{m a x},

p_{2} = γ_{t w o}

and

p_{3} = γ_{c o n s t}

. (a), (c), and (e) for the difference between

p_{3}

and

p_{2}

; (b), (d), and (f) for the difference between

p_{2}

and

p_{1}

.

Figure 2. Graphs of the entropy difference between

E_{1} = E n t (π_{m a x} (θ)),

E_{2} = E n t (π_{t w o} (θ))

and

E_{3} = E n t (π_{c o n s t} (θ))

for different values of

δ \in [0, 1] .

(a), (c), and (e) for the difference between

E_{2}

and

E_{3}

; (b), (d), and (f) for the difference between

E_{1}

and

E_{2}

.

Table 1. Summaries of posterior samples of

θ = {(θ_{1}, θ_{2}, θ_{3}, θ_{4})}^{⊤}

obtained by using three different priors;

π_{t w o} (θ),

π_{m a x} (θ)

and

π_{c o n s t} (θ) .

HCN, hierarchical constrained normal.

**Table 1.** Summaries of posterior samples of $θ = {(θ_{1}, θ_{2}, θ_{3}, θ_{4})}^{⊤}$ obtained by using three different priors; $π_{t w o} (θ),$ $π_{m a x} (θ)$ and $π_{c o n s t} (θ) .$ HCN, hierarchical constrained normal.
	$\underline{HCN (π_{m a x})}$				$\underline{HCN (π_{t w o})}$				$\underline{HCN (π_{c o n s t})}$
Dataset I	$(θ_{1}$	$θ_{2}$	$θ_{3}$	$θ_{4})$	$(θ_{1}$	$θ_{2}$	$θ_{3}$	$θ_{4})$	$(θ_{1}$	$θ_{2}$	$θ_{3}$	$θ_{4})$
$\underline{n = 20}$
true	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000
mean	0.853	0.862	0.869	0.857	0.525	0.530	0.542	0.549	0.141	0.112	0.139	0.121
s.d.	0.203	0.212	0.196	0.194	0.189	0.187	0.176	0.167	0.099	0.111	0.096	0.055
$\underline{n = 200}$
true	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000
mean	0.978	0.979	0.981	0.977	0.912	0.914	0.916	0.912	0.342	0.291	0.294	0.299
s.d.	0.073	0.072	0.074	0.068	0.060	0.067	0.069	0.064	0.037	0.058	0.077	0.064
$\underline{n = 20}$
true	2.000	2.000	−2.000	−2.000	2.000	2.000	−2.000	−2.000	2.000	2.000	−2.000	−2.000
mean	1.837	1.846	−1.853	−1.842	1.306	1.314	−1.331	−1.349	0.599	0.501	−0.483	−0.493
s.d.	0.215	0.223	0.208	0.205	0.295	0.271	0.262	0.238	0.232	0.263	0.291	0.141
$\underline{n = 200}$
true	2.000	2.000	−2.000	−2.000	2.000	2.000	−2.000	−2.000	2.000	2.000	−2.000	−2.000
mean	1.977	1.979	−1.980	−1.977	1.782	1.784	−1.785	−1.782	0.884	0.856	−0.795	−0.733
s.d.	0.074	0.072	0.074	0.068	0.068	0.065	0.068	0.062	0.029	0.051	0.068	0.062
	$\underline{HC t_{5} (π_{t w o})}$				$\underline{HC t_{5} (π_{m a x})}$				$\underline{HC t_{5} (π_{c o n s t})}$
Dataset II	$(θ_{1}$	$θ_{2}$	$θ_{3}$	$θ_{4})$	$(θ_{1}$	$θ_{2}$	$θ_{3}$	$θ_{4})$	$(θ_{1}$	$θ_{2}$	$θ_{3}$	$θ_{4})$
$\underline{n = 20}$
true	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000
mean	0.681	0.691	0.688	0.774	0.332	0.368	0.403	0.361	0.155	0.157	0.158	0.161
s.d.	0.172	0.182	0.201	0.205	0.198	0.190	0.218	0.186	0.108	0.112	0.120	0.099
$\underline{n = 200}$
true	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000
mean	1.003	0.974	1.033	1.006	0.802	0.818	0.814	0.806	0.362	0.411	0.341	0.351
s.d.	0.175	0.178	0.172	0.199	0.061	0.065	0.070	0.081	0.048	0.052	0.039	0.047
$\underline{n = 20}$
true	2.000	2.000	−2.000	−2.000	2.000	2.000	−2.000	−2.000	2.000	2.000	−2.000	−2.000
mean	1.724	1.763	−1.764	−1.675	0.886	0.874	−0.857	−0.924	0.415	0.428	−0.496	−0.489
s.d.	0.239	0.230	0.245	0.231	0.324	0.319	0.343	0.313	0.201	0.196	0.193	0.204
$\underline{n = 200}$
true	2.000	2.000	−2.000	−2.000	2.000	2.000	−2.000	−2.000	2.000	2.000	−2.000	−2.000
mean	2.018	1.943	−1.991	−2.103	1.702	1.699	−1.715	−1.689	0.967	0.986	−0.965	−0.959
s.d.	0.079	0.082	0.081	0.073	0.096	0.099	0.091	0.089	0.053	0.049	0.047	0.045

Table 2. Summary statistics for the car body assembly data. S-W, Shapiro–Wilk.

**Table 2.** Summary statistics for the car body assembly data. S-W, Shapiro–Wilk.
Variable	Mean	s.d.	S-W	p-Value
$Y_{1}$	−1.996	2.781	0.959	0.083
$Y_{2}$	7.426	5.347	0.989	0.926

Table 3. The posterior estimates and the convergence test results.

**Table 3.** The posterior estimates and the convergence test results.
δ	$γ_{t w o}$	Parameter	HCN $(π_{m a x})$	HCN $(π_{c o n s t})$	HCN $(π_{t w o})$	s.d.	MC Error	p-Value
		$θ_{1}$	−1.874	−1.321	−1.665	0.581	0.005	0.483
		$θ_{2}$	7.045	4.814	6.332	1.089	0.008	0.354
0.8	0.423	$λ_{11}$	7.549	7.863	7.682	1.231	0.007	0.551
		$λ_{12}$	−4.632	−4.905	−4.819	1.631	0.005	0.671
		$λ_{22}$	26.872	25.021	27.351	4.926	0.013	0.352
		$θ_{1}$	−1.874	−1.321	−1.557	0.496	0.004	0.434
		$θ_{2}$	7.045	4.814	5.905	1.112	0.006	0.298
0.9	0.567	$λ_{11}$	7.526	7.781	7.959	1.317	0.008	0.635
		$λ_{12}$	−4.726	−5.347	−4.989	1.546	0.006	0.712
		$λ_{22}$	27.587	25.347	28.012	4.836	0.012	0.384

© 2016 by the author; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).

A Two-Stage Maximum Entropy Prior of Location Parameter with a Stochastic Multivariate Interval Constraint and Its Properties

Abstract

1. Introduction

2. Two-Stage Maximum Entropy Prior

2.1. Maximum Entropy Prior

2.2. Two-Stage Maximum Entropy Prior

2.3. Entropy of a Maximum Entropy Prior

2.3.1. Case 1: Two-stage Maximum Entropy Prior

2.3.2. Case 2: Constrained Maximum Entropy Prior

2.3.3. Case 3: Maximum Entropy Prior

3. Properties

3.1. Objective Measure of Uncertainty

3.2. Properties of the Entropy

3.3. Posterior Distribution

4. Hierarchical Constrained Scale Mixture of Normal Model

4.1. The Hierarchical Model

4.2. The Gibbs Sampler

4.3. Markov Chain Monte Carlo Sampling Scheme

4.4. Bayes Estimation

5. Numerical Illustrations

5.1. Simulation Study

5.2. Car Body Assembly Data Example

6. Conclusions

Acknowledgments

Conflicts of Interest

References

Article Metrics

Citations

Article Access Statistics