Article

Informative g-Priors for Mixed Models

1 Department of Statistics and Actuarial Science, Northern Illinois University, DeKalb, IL 60115, USA
2 Structural Heart & Aortic, Medtronic, Minneapolis, MN 55432, USA
3 BridgeBio, Palo Alto, CA 94304, USA
* Author to whom correspondence should be addressed.
† Current Affiliation: Daiichi Sankyo, Inc., Basking Ridge, NJ 07920, USA.
Stats 2023, 6(1), 169-191; https://doi.org/10.3390/stats6010011
Submission received: 11 November 2022 / Revised: 6 January 2023 / Accepted: 10 January 2023 / Published: 16 January 2023
(This article belongs to the Special Issue Bayes and Empirical Bayes Inference)

Abstract

Zellner’s objective g-prior has been widely used in linear regression models due to its simple interpretation and computational tractability in evaluating marginal likelihoods. Moreover, the g-prior allows portioning the prior variability explained by the linear predictor versus that of pure noise. In this paper, we propose a novel yet remarkably simple g-prior specification for the case where a subject matter expert has information on the marginal distribution of the response $y_i$. The approach is extended for use in mixed models with some surprising but intuitive results. Simulation studies are conducted to compare the model fitting under the proposed g-prior with that under other existing priors.

1. Introduction

Incorporation of expert opinion has been an integral component of informative priors for Bayesian models in a wide variety of settings, many of them clinical [1,2,3]. Even in a highly regulated industry such as the medical devices field, guidance has existed for some time on how expert opinion might be incorporated into models [4]. However, the willingness of regulators to accept expert opinion does not necessarily mean that the process of obtaining and utilizing such information is straightforward. Existing approaches for leveraging prior opinions tend to be cumbersome and labor intensive [5,6,7,8]. This paper provides a simple and easy-to-use method for experts to specify g-priors for a wide class of mixed models, focusing only on the marginal distribution of the population responses $y_1, \ldots, y_n$.
A linear model is initially considered, $y_i = x_i^{\top}\beta + \epsilon_i$ with $\epsilon_i \overset{iid}{\sim} (0, \sigma^2)$, and prior information $(m, v)$ is included such that, marginally, $E(y_i) = m$ and $\operatorname{var}(y_i) = v$. Here, the notation $x \sim (\mu, \tau^2)$ denotes that a random variable $x$ has mean $\mu$ and variance $\tau^2$, $y_i$ is the $i$th response, $x_i$ is a $p$-vector of covariates which usually includes an intercept, and $\beta = (\beta_1, \ldots, \beta_p)^{\top}$ is the $p$-vector of regression coefficients. The errors $\epsilon_i$ are assumed Gaussian for the bulk of the paper, but this assumption can often be relaxed. Zellner’s g-prior [9,10] posits
$$\beta \sim N_p\left(\beta_0,\; g\sigma^2 (X^{\top}X)^{-1}\right),$$
where $X$ is the usual $n \times p$ design matrix, yielding a posterior mean that is a weighted average of the usual ordinary least squares (OLS) estimator $\hat{\beta} = (X^{\top}X)^{-1}X^{\top}y$ and the prior value $\beta_0$, i.e., $\tilde{\beta} = \frac{g}{1+g}\hat{\beta} + \frac{1}{1+g}\beta_0$. Note that $g = 0$ gives no weight to the outcome data $y = (y_1, \ldots, y_n)^{\top}$ and $g \to \infty$ gives complete weight to the data. The choice of $g$ has received considerable interest in the literature, and the g-prior has been widely adopted for use in variable selection, e.g., [11,12]. It is not our intent to add to the burgeoning literature on variable selection here, but rather to provide a useful prior for model parameters when some information about the data generating mechanism is known; in such cases, the “informative g-prior” developed here is competitive with existing approaches for variable selection (Section 4.1.2). In this paper, we propose an informative g-prior that can be used by default when prior information is lacking, or that can reflect available prior information on the marginal distribution of population responses. For example, if the outcome is cholesterol level in a certain population and the interest is to investigate how cholesterol level changes with covariates such as age, gender, ethnicity and body mass index, the expert might find that, marginally, $y_i \sim N(m = 190, v = 5^2)$ from various studies. This marginal prior specification does not rely on any covariates, which makes the prior elicitation relatively easy. The theoretical marginal distribution of the $y_i$’s can be obtained from the population distribution of the covariates $x_i$, the distribution on $\beta$, and the value of $\sigma^2$ through the linear regression model under a specific form of the g-prior. The g-prior can then be derived by matching moments between this theoretical marginal distribution and the prior distribution $N(190, 5^2)$ to ensure, e.g., $E(y_i) = 190$ and $\operatorname{var}(y_i) = 5^2$. The method is further extended to provide default priors for mixed models, allowing for random-effects ANOVA, random coefficient models, etc.
The sampling distribution of the OLS estimator $\hat{\beta}$ has covariance $\sigma^2(X^{\top}X)^{-1}$. In a Bayesian analysis assuming normal errors, the flat prior $p(\beta) \propto 1$ yields the conditional posterior $\beta \mid y, X, \sigma^2 \sim N_p(\hat{\beta}, \sigma^2(X^{\top}X)^{-1})$. In either case, the covariance $\sigma^2(X^{\top}X)^{-1}$ estimates $\frac{\sigma^2}{n}[\mu\mu^{\top} + \Sigma]^{-1}$, where $E(x_i) = \mu$ and $\operatorname{cov}(x_i) = \Sigma$. That is, greater variability in $x_i$ implies greater precision in estimating $\beta$. Thus, ref. [9] specifies a vague conditional prior for $\beta$ that takes advantage of information on distributional shape based solely on $X$, along with a flat prior on $\sigma^2$. The g-prior developed here further separates how much marginal variability in $y_i$ is explained a priori by the model from that of pure noise $\sigma^2$; a default specification assumes a flat uniform prior on this quantity.
Two popular classes of priors for regression models are conditional means priors [13,14] and power priors [15]. Conditional means priors require a subject matter expert to provide information on the mean response for several candidate vectors of covariates (which do not have to be among those actually observed); the usual specification requires the expert to be able to think about the mean responses independently, but this is not strictly required. Let the candidate vectors be $\tilde{x}_1, \ldots, \tilde{x}_N$, where $N \le p$. The subject matter expert is asked to provide, say, a 95% interval that contains the mean response $\tilde{x}_i^{\top}\beta$ at covariates $\tilde{x}_i$, e.g., $P(a_i \le \tilde{x}_i^{\top}\beta \le b_i) = 0.95$. This information on the conditional means $\tilde{m}_i = \tilde{x}_i^{\top}\beta$ is summarized as $\tilde{m}_i \overset{ind.}{\sim} N(m_i, v_i)$, yielding $\tilde{m} = \tilde{X}\beta \sim N_N(m, V)$, where $m = (m_1, \ldots, m_N)^{\top}$ and $V = \operatorname{diag}(v_1, \ldots, v_N)$. If $\tilde{X}$ is invertible, requiring $N = p$, then the induced prior is simply $\beta \sim N_p(\tilde{X}^{-1}m, \tilde{X}^{-1}V(\tilde{X}^{-1})^{\top})$. Ref. [13] propose methods for handling partial prior information on a subset $N < p$, i.e., the subject matter expert need only specify a handful of priors for conditional means. In contrast, the g-prior developed here only requires information on the marginal distribution of the $y_i$’s, namely $(m, v)$.
Power priors are built from historical regression data having the same covariates as the current data. Say that the historical data are $\{(\tilde{x}_i, \tilde{y}_i)\}_{i=1}^{M}$ and the current data are $\{(x_i, y_i)\}_{i=1}^{n}$. The power prior is simply the posterior of $\beta$ based on a reference prior, raised to the power $\alpha \in [0, 1]$: $p(\beta, \sigma^2) \propto \left[\prod_{i=1}^{M}\phi(\tilde{y}_i \mid \tilde{x}_i^{\top}\beta, \sigma^2)\right]^{\alpha}\sigma^{-2}$, where $\phi(y \mid m, v)$ is the density of a normal random variable with mean $m$ and variance $v$. The parameter $\alpha$ provides the “degree of borrowing” from the historical data, with $\alpha = 0$ giving none and $\alpha = 1$ treating the historical data the same as the current study data. The choice of $\alpha$ has also received considerable research attention [16,17,18]. In addition to the power and conditional means priors, ref. [19] proposed a natural conjugate reference informative prior that takes into account various degrees of certainty in covariates, and [20] proposed a default prior for the $\beta_j$’s using a normal distribution with mean zero and standard deviation equal to the standard error of the M-estimator of each $\beta_j$.
There are several notable limitations of conditional means and power priors. Conditional means priors involve the analyst thinking about various covariate combinations and providing information on the mean response for each covariate setting. As the number of predictors increases, this becomes increasingly difficult; it can be conceptually easier to think about marginal quantities such as the overall mean $m$ and variance $v$ in the population. Such marginal information may be available via census data or through published summary data. The power prior requires a historical data set having a superset of the variables under consideration in the current study, which is often unavailable for new treatments.
One consequence of the priors developed here is that proper, data-driven priors are given in closed form with default settings. Thus, standard model comparison via Bayes factors is possible, as no improper priors are used. Difficult-to-elicit prior information, such as the range of a variance component, is replaced with the question “How much variability in the data do you think the model explains?” If the answer is “I have no idea”, then a uniform distribution on $\sigma^2$ is suggested. The proposed priors do not have closed-form full conditional distributions for all parameters, but they are easily specified and fit in R using the Just Another Gibbs Sampler (JAGS) software [21] via packages such as R2jags [22].
Bayesians have long known that injecting a small amount of prior information can often “fix” pathological MCMC schemes. The g-prior developed here can be viewed as a ridge prior that takes multicollinearity into account, with the added benefit that the ridge parameter is automatically chosen by the data. Section 2 introduces the informative g-prior for linear regression models. Section 3 extends the g-prior for use in mixed models. Section 4 presents a detailed set of simulation studies exploring the use of the g-prior and comparing it to other priors in common use. Section 5 concludes the paper with a discussion and an eye toward future research.

2. Prior for Linear Regression Models

2.1. The Prior in [23]

The g-prior in [23] was developed for logistic regression; this section carefully extends their approach to normal-errors regression, and Section 3 generalizes further to mixed models. Their g-prior is specified as
$$\beta \mid \beta_0, g, X \sim N_p\left(\beta_0,\; g\,n(X^{\top}X)^{-1}\right), \tag{1}$$
where $g > 0$ and $X$ is the usual $n \times p$ design matrix. Assume $x_i \overset{iid}{\sim} H$ for some distribution $H$ with $x_i \sim (\mu, \Sigma)$. Since $x_i$ includes the intercept in its first element, the first element of $\mu$ is one and the first row and first column entries of $\Sigma$ are all zeros. Given the data $X$, for any new subject with response $y$ and covariates $x \sim H$, assuming $x$ and $\beta$ are mutually independent, one has $E(x^{\top}\beta) = E_x\{E_{\beta}(x^{\top}\beta \mid x)\} = \mu^{\top}\beta_0$ by the law of iterated expectations. In addition, by the law of total variance, one has
$$\operatorname{Var}(x^{\top}\beta) = E_x\{\operatorname{Var}_{\beta}(x^{\top}\beta \mid x)\} + \operatorname{Var}_x\{E_{\beta}(x^{\top}\beta \mid x)\} = E_x\{g\,x^{\top}n(X^{\top}X)^{-1}x\} + \operatorname{Var}_x(\mu^{\top}\beta_0) = g \cdot \operatorname{trace}\left\{n(X^{\top}X)^{-1}(\Sigma + \mu\mu^{\top})\right\} \xrightarrow{p} g \cdot \operatorname{trace}\left\{(\Sigma + \mu\mu^{\top})^{-1}(\Sigma + \mu\mu^{\top})\right\} = gp,$$
where $\xrightarrow{p}$ denotes convergence in probability, and the limiting statement originates from the fact that $n(X^{\top}X)^{-1} \xrightarrow{p} (\mu\mu^{\top} + \Sigma)^{-1}$. Hence, given $X$, the g-prior in (1) implies that $x^{\top}\beta$ has variance approximately equal to $gp$ for any covariate vector $x$ randomly drawn from its population $H$. Ref. [23] found that $x^{\top}\beta$ also often approximately follows a normal distribution, and this approximation is good for a variety of $H$ considered in their simulations, even when some covariates are categorical. Therefore, it is reasonable to assume that $x^{\top}\beta$ approximately follows $N(\mu^{\top}\beta_0, gp)$.
For the linear normal regression model $y_i \mid x_i \overset{ind.}{\sim} N(x_i^{\top}\beta, \sigma^2)$, the g-prior in [23] can be applied as follows. Assume a subject matter expert has in hand information on the distribution of the marginal mean response (i.e., $E(y_i) \overset{set}{=} m$) in a population, rather than the distribution of $y_i$, say $m \sim N(\mu_m, \sigma_m^2)$, with $(\mu_m, \sigma_m^2)$ chosen to reflect the prior knowledge about the distribution of $m$. Then, using the prior matching idea in [23], one can immediately solve for $\beta_0$ and $g$ in (1) as $\beta_0 = \mu_m e_1$ and $g = \sigma_m^2/p$, where $e_1 = (1, 0, \ldots, 0)^{\top}$. Although [24] finds the default prior given by [23] for logistic regression to provide the best predictive performance among several contenders, the performance for the linear regression model has not been well tested. In addition, it is not straightforward to set default values for $(\mu_m, \sigma_m^2)$, and the extension to linear mixed models is not readily available.
In this paper, we propose a new g-prior for the linear regression model for the case when a subject matter expert has information on the marginal distribution of the response $y_i$ rather than on $E(y_i)$, with reasonable default settings, and then extend it for use in linear mixed models.

2.2. New Prior Development

An easily implemented g-prior is first proposed for use in the linear regression model:
$$y_i \mid x_i, \beta, \sigma^2 \overset{ind.}{\sim} (x_i^{\top}\beta, \sigma^2), \quad i = 1, \ldots, n. \tag{2}$$
Consider the situation where a subject matter expert has information on the marginal distribution of the observations $y_i$ that can be synthesized as
$$y_i \sim (m, v), \quad m \sim N(m_0, v/k_m), \quad v^{-1} \sim \Gamma(k_v, v_0 k_v), \tag{3}$$
where $m_0$, $v_0$, $k_m$ and $k_v$ can be obtained from previous studies or published summary data; details are given in Section 2.3. Here, $\Gamma(a, b)$ denotes the gamma distribution with mean equal to $a/b$. The goal is to develop a particular version of the g-prior on $(\beta, \sigma^2)$ in (2) that achieves the marginal distribution $y_i \sim (m, v)$.
Consider the g-prior in (1). Given $\sigma^2$, the total expectation formula gives
$$E(y) = E_{x^{\top}\beta}\left\{E_{y \mid x^{\top}\beta}(y)\right\} = E_{x^{\top}\beta}(x^{\top}\beta) = \mu^{\top}\beta_0,$$
and the total variance formula gives
$$\operatorname{var}(y) = E_{x^{\top}\beta}\left\{\operatorname{var}_{y \mid x^{\top}\beta}(y)\right\} + \operatorname{var}_{x^{\top}\beta}\left\{E_{y \mid x^{\top}\beta}(y)\right\} = E_{x^{\top}\beta}(\sigma^2) + \operatorname{var}_{x^{\top}\beta}(x^{\top}\beta) \approx \sigma^2 + gp.$$
For models with an intercept, setting $\beta_0 = m e_1$ satisfies the first moment condition $E(y_i) = m$. The larger $\sigma^2$ is, the more the prior shrinks $\beta$ toward the intercept-only model (with an intercept centered at $m$), and so the prior is conservative in favoring the null hypothesis of the overall F-test that no covariates are important.
To match the second moment condition $\operatorname{var}(y_i) = v$, set $gp + \sigma^2 = v$ and solve for $g = (v - \sigma^2)/p$ in (1) when $\sigma^2 \le v$. Since $E(y_i \mid \sigma^2, m, v) = m$ and $\operatorname{var}(y_i \mid \sigma^2, m, v) = v$ for all $\sigma^2 \ge 0$, the marginal constraint $y_i \sim (m, v)$ approximately holds for any prior $\sigma^2 \sim p(\cdot)$ with support $\sigma^2 \in [0, v]$. In particular, a special case of the generalized beta distribution,
$$p_{a,b,v}(\sigma^2) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)\,v}\left(\frac{\sigma^2}{v}\right)^{a-1}\left(1 - \frac{\sigma^2}{v}\right)^{b-1} I_{[0,v]}(\sigma^2), \tag{4}$$
denoted $\operatorname{gb}(a, b, v)$, allows flexibility in specifying how much variability the regression model explains relative to the total variability $v$; note that $E\left[\frac{v - \sigma^2}{v} \mid v\right] = \frac{b}{a+b}$. If one had prior information that, say, the amount of variation explained by the regression is $r_0$ (similar to $R^2$ in OLS regression, although $R^2$ conditions on $X$ and fixes $\beta = \hat{\beta}$), then the parameters in (4) could be chosen such that $\frac{b}{a+b} = r_0$, with the total “sample size” going into the prior as $n_0 = a + b$; solving yields $a = (1 - r_0)n_0$ and $b = r_0 n_0$. No prior preference gives $a = b = 1$, i.e., $\sigma^2 \sim \operatorname{uniform}(0, v)$, a sensible default choice.
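To make the $\operatorname{gb}(a, b, v)$ specification concrete, here is a small R sketch (the helper names are ours, not from the paper’s supplement): the density (4) is simply a $\operatorname{beta}(a, b)$ distribution scaled to $[0, v]$, and $(a, b)$ follow from a prior guess $r_0$ with prior sample size $n_0$.

```r
# gb(a, b, v) density of (4): a beta(a, b) distribution scaled to [0, v].
dgb <- function(sigma2, a, b, v) dbeta(sigma2 / v, a, b) / v

# Convert a prior guess r0 of the variation explained, with prior "sample
# size" n0 = a + b, into (a, b); r0 = 1/2, n0 = 2 gives the uniform default.
gb_hyper <- function(r0, n0) list(a = (1 - r0) * n0, b = r0 * n0)

gb_hyper(r0 = 0.3, n0 = 10)                      # a = 7, b = 3
integrate(dgb, 0, 2, a = 7, b = 3, v = 2)$value  # integrates to 1
```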
Encapsulating the above, a hierarchical prior that maintains $y_i \sim (m, v)$ is
$$\beta \mid \sigma^2, m, v \sim N_p\left(e_1 m,\; \frac{n}{p}(v - \sigma^2)(X^{\top}X)^{-1}\right), \quad \sigma^2 \mid v \sim \operatorname{gb}(a, b, v), \quad m \sim N(m_0, v/k_m), \quad v^{-1} \sim \Gamma(k_v, v_0 k_v). \tag{5}$$
This prior provides an intuitive interpretation given $v$: when $\sigma^2 = 0$ ($b \to \infty$), the model explains all variability in $y_i$; when $\sigma^2 = v$ ($a \to \infty$), the model explains nothing. Values $a, b \in (0, \infty)$ indicate that the truth is somewhere between these two extremes, with $a = b = 1$ reflecting no preference on how much variability the model explains. This formulation of the g-prior can be viewed as a type of ridge regression that further addresses multicollinearity among predictors, but where the ridge parameter is chosen automatically. The special form of the g-prior enables easy computation of the amount of variability the model explains relative to the total $v$.
Once a distribution (e.g., Gaussian) is assumed for the linear regression model given in (2), estimates of $\beta$ and $\sigma^2$ can be obtained using statistical software such as JAGS [21] via the R package R2jags [22]; see the Supplementary Materials for the R code.
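For illustration, the following is a minimal JAGS sketch of (2) under the prior (5) with the default $a = b = 1$ (so $\sigma^2 \sim \operatorname{uniform}(0, v)$); the model string, function, and data-list names are ours and are not the paper’s supplementary code.

```r
library(R2jags)

# Minimal JAGS sketch of model (2) under prior (5) with a = b = 1.
# XtX = t(X) %*% X is passed in as data; JAGS dnorm/dmnorm use precisions.
gprior_model <- "
model {
  for (i in 1:n) {
    y[i] ~ dnorm(inprod(X[i, ], beta[]), 1 / sigma2)
  }
  for (j in 1:p) {
    mu_beta[j] <- m * equals(j, 1)                       # prior mean m * e1
    for (l in 1:p) {
      Omega[j, l] <- XtX[j, l] * p / (n * (v - sigma2))  # prior precision
    }
  }
  beta[1:p] ~ dmnorm(mu_beta[], Omega[, ])
  sigma2 ~ dunif(0, v)                                   # gb(1, 1, v)
  m ~ dnorm(m0, km / v)                                  # m ~ N(m0, v / km)
  inv_v ~ dgamma(kv, kv * v0)                            # 1/v ~ Gamma(kv, v0*kv)
  v <- 1 / inv_v
}"

fit_gprior <- function(y, X, m0, v0, km = 2, kv = 1) {
  mf <- tempfile(fileext = ".txt"); writeLines(gprior_model, mf)
  dat <- list(y = y, X = X, XtX = crossprod(X), n = length(y), p = ncol(X),
              m0 = m0, v0 = v0, km = km, kv = kv)
  jags(data = dat, parameters.to.save = c("beta", "sigma2", "m", "v"),
       model.file = mf, n.chains = 2, n.iter = 22000, n.burnin = 2000,
       n.thin = 4)
}
```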

2.3. Hyper-Prior Elicitation for $(m, v)$

Our prior in (5) requires a specification for the hyperparameters $m_0$, $v_0$, $k_m$ and $k_v$. Suppose we have historical data $y_o = (y_{o1}, \ldots, y_{oM})^{\top}$ from a similar study population. If we assume $y_{oi} \overset{iid}{\sim} N(m, v)$, then using a noninformative prior such as $p(m, v) \propto 1/v$ gives
$$m \mid y_o, v \sim N(\bar{y}_o, v/M), \quad v^{-1} \mid y_o \sim \Gamma\left(M/2,\; s_{y_o}^2 \cdot M/2\right), \tag{6}$$
where $s_{y_o}^2 = \sum_{i=1}^{M}(y_{oi} - \bar{y}_o)^2/M$. If one believes that the historical data $y_o$ come from the same population as the current observed response data $y = (y_1, \ldots, y_n)^{\top}$, it is reasonable to set $m_0 = \bar{y}_o$, $v_0 = s_{y_o}^2$, $k_m = M$ and $k_v = M/2$ in (5). If the historical data come from a population quite different from the current study, or the population distribution is not plausibly normal, one may set lower values for $k_m$ and $k_v$ to put less weight on the historical data relative to the current data. If historical data are not available, we recommend setting $m_0 = \bar{y} = \sum_{i=1}^{n} y_i/n$, $v_0 = s_y^2 = \sum_{i=1}^{n}(y_i - \bar{y})^2/n$, $k_m = 2$ and $k_v = 1$ instead of setting $k_m = k_v = 0$; this assumes that the unavailable historical data have sample mean equal to $\bar{y}$ and sample variance equal to $s_y^2$ and are given the weight of two observations. In real applications, a sensitivity analysis can be performed by setting $k_m$ to several different values between 2 and $M$.
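The following hedged R helper (the function name is ours) packages these recommendations: it maps historical data $y_o$, or the current response $y$ when no history exists, to $(m_0, v_0, k_m, k_v)$.

```r
# Map historical data y_o (or, absent that, the current y) to the
# hyperparameters (m0, v0, km, kv) recommended in Section 2.3;
# 'weight' < 1 down-weights the historical data through km and kv.
elicit_mv <- function(y_o = NULL, y = NULL, weight = 1) {
  if (!is.null(y_o)) {
    M <- length(y_o)
    list(m0 = mean(y_o), v0 = mean((y_o - mean(y_o))^2),
         km = weight * M, kv = weight * M / 2)
  } else {
    list(m0 = mean(y), v0 = mean((y - mean(y))^2), km = 2, kv = 1)
  }
}
```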
The idea behind our hyperprior elicitation for $(m, v)$ is similar to the power prior [15], which is defined as the posterior of the model parameters given the historical data, raised to a power $\alpha \in [0, 1]$, where $\alpha$ provides the “degree of borrowing” from the historical data. Consider an intercept-only model $y_i \overset{iid}{\sim} N(\beta_1, \sigma^2)$. Note that our prior (5) then simply reduces to $\beta_1 = m$, $\sigma^2 = v$, $m \sim N(m_0, v/k_m)$, $v^{-1} \sim \Gamma(k_v, v_0 k_v)$. Given the historical data $y_o$, setting $m_0 = \bar{y}_o$, $v_0 = s_{y_o}^2$, $k_m = M$ and $k_v = M/2$ yields exactly the power prior with $\alpha = 1$. Similarly, the values of $k_m$ and $k_v$ control the influence of the historical data. For the general linear model in (2), the important difference is that our prior on $(\beta, \sigma^2)$ does not require any covariates in the historical data, since it depends on the historical data only through $(m, v)$.

2.4. Comparing to Mixtures of g-Priors

For the linear model in (2) with Gaussian errors, the hyper-g prior in [12] can be expressed as
$$\beta \mid g, \sigma^2 \sim N_p\left(m e_1,\; g\sigma^2(X^{\top}X)^{-1}\right), \quad p(m, \sigma^2) \propto 1/\sigma^2, \quad p(g) = \frac{h-2}{2}(1 + g)^{-h/2}\, I(g > 0), \tag{7}$$
where $h > 2$ is set to ensure a proper distribution. Ref. [12] show that the hyper-g prior is not consistent for model selection when the true model is the null model, and they then propose the hyper-$g/n$ prior
$$p(g) = \frac{h-2}{2n}\left(1 + g/n\right)^{-h/2} I(g > 0). \tag{8}$$
Setting $g\sigma^2 = \frac{n}{p}(v - \sigma^2)$, i.e., $\sigma^2/v = \frac{1}{gp/n + 1}$, in our prior (5) gives
$$\beta \mid g, \sigma^2 \sim N_p\left(e_1 m,\; g\sigma^2(X^{\top}X)^{-1}\right), \quad m \mid g, \sigma^2 \sim N\left(m_0,\; \sigma^2(gp/n + 1)/k_m\right), \quad \sigma^{-2} \mid g \sim \Gamma\left(k_v,\; \frac{k_v v_0}{gp/n + 1}\right), \quad \frac{1}{gp/n + 1} \sim \operatorname{beta}(a, b). \tag{9}$$
If we set $a = \frac{h-2}{2}$, $b = 1$, $k_m = 0$ and $k_v = 0$, it is easy to show that our prior further becomes
$$\beta \mid g, \sigma^2 \sim N_p\left(e_1 m,\; g\sigma^2(X^{\top}X)^{-1}\right), \quad p(m, \sigma^2) \propto 1/\sigma^2, \quad p(g) = \frac{h-2}{2\,n/p}\left(1 + \frac{gp}{n}\right)^{-h/2} I(g > 0), \tag{10}$$
which is similar to the hyper-$g/n$ prior in (8), the only difference being that our $g$ is scaled by $n/p$ instead of $n$. Therefore, the proposed prior naturally leads to a modified version of the hyper-$g/n$ prior considered in [12] when there is no historical information on $(m, v)$.

2.5. Simple Example

Ref. [25] analyze data on the lengths $y_i$ (in meters) of $n = 27$ dugongs (sea cows) having ages $a_i$ (in years). They fit a nonlinear exponential model for length based on $a_i$; we consider a linear model by transforming age, i.e., $x_i = (1, \log(a_i))^{\top}$. An example of a commonly used vague, proper prior is $\beta_0, \beta_1 \overset{iid}{\sim} N(0, 10^2)$ and $\sigma^2 \sim \Gamma^{-1}(0.01, 0.01)$. The prior marginal mean and variance for the response $y$ under this prior can be estimated via Monte Carlo (MC) by simulating $\sigma^2 \sim \Gamma^{-1}(0.01, 0.01)$, $\beta \sim N_2(0, 10^2 I_2)$, and $y_i^{(l)} \overset{iid}{\sim} N(x_i^{\top}\beta, \sigma^2)$, $i = 1, \ldots, 27$, $l = 1, \ldots, 1000$, yielding 1000 datasets $\{y_i^{(l)} : i = 1, \ldots, n\}$. The simulation of the precision $\sigma^{-2} \sim \Gamma(0.01, 0.01)$ is completed using the method of [26], designed for gamma distributions with small shape parameters. The average prior sample mean (across the 1000 datasets) and prior sample variance are around $2 \times 10^{120}$ and $2 \times 10^{249}$, respectively. These are nowhere near the observed sample mean and variance of $\bar{y} = 2.334$ and $s_y^2 = 0.073$. In contrast, a similar simulation under our proposed new g-prior in (5) with $a = b = 1$, $m_0 = \bar{y}$, $v_0 = s_y^2$, $k_m = 2$ and $k_v = 1$ yields an average sample mean of 2.305 with MC standard deviation 0.559 and an average sample variance of 0.442 with MC standard deviation 3.243. That is, the inference under our prior focuses on a much smaller set of potential models that could have conceivably generated the observed data. The posterior estimates for $\beta_0$, $\beta_1$, and $\sigma^2$ under our proposed new g-prior are 1.770 (0.047), 0.273 (0.021), and 0.0094 (0.0029), respectively, where the values in parentheses are posterior standard deviations. The commonly used vague priors specified above yield similar estimates but with slightly higher posterior standard deviations: 1.763 (0.047), 0.277 (0.021), and 0.0097 (0.0031).
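A short R sketch of the vague-prior predictive simulation just described follows; the age values are hypothetical stand-ins for the dugong data, not the actual data set. Following the small-shape device attributed above to [26], the precision is drawn in log space using the identity that $G \cdot U^{1/a} \sim \Gamma(a, b)$ when $G \sim \Gamma(a + 1, b)$ and $U \sim \operatorname{uniform}(0, 1)$, which avoids draws underflowing to exactly zero.

```r
# Prior predictive check under the vague prior (hypothetical log_age values).
set.seed(1)
log_age <- log(seq(1, 31.5, length.out = 27))   # stand-in for log(a_i)
L <- 1000
prior_mean <- prior_var <- numeric(L)
for (l in 1:L) {
  # sigma^{-2} ~ Gamma(0.01, 0.01), drawn in log space to avoid underflow:
  log_prec <- log(rgamma(1, 1.01, rate = 0.01)) + log(runif(1)) / 0.01
  sigma2 <- min(exp(-log_prec), 1e300)          # cap to keep sigma2 finite
  beta <- rnorm(2, 0, 10)
  y <- rnorm(27, beta[1] + beta[2] * log_age, sqrt(sigma2))
  prior_mean[l] <- mean(y)
  prior_var[l] <- var(y)
}
mean(prior_mean); mean(prior_var)  # astronomically far from (2.334, 0.073)
```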
The use of such prior predictive checks has recently been advocated by [27,28,29]; in particular, ref. [27] suggests that analysts “visualize simulations from the prior marginal distribution of the data to assess the consistency of the chosen priors with domain knowledge.” They further suggest the use of “weakly informative” priors to gently urge the prior in the direction of providing plausible marginal values. This requires some thought and visual exploration on the part of the user; the prior developed here provides a safe, default method for nudging the prior toward domain knowledge in the form of either historical marginal values or the sample moments seen in the data. The prior mean and variance exist whether the analyst wants to think about them or not; this example illustrates that “vague” priors are not necessarily noninformative.

2.6. Variable Selection

Consider the Gaussian linear regression model $y \sim N_n(X\beta, \sigma^2 I_n)$. Using the proposed g-prior in (5) for Bayesian variable selection requires the calculation of the marginal likelihood for each of the $2^{p-1}$ submodels, denoted $M_{\xi}$, where $\xi = (\xi_1, \ldots, \xi_p)^{\top} \in \{0, 1\}^p$ is a $p$-dimensional vector of indicators with $\xi_j = 1$ implying that the $j$th covariate $x_{ij}$ is included in the model. Here, we always set $\xi_1 = 1$ so that an intercept is included. Under model $M_{\xi}$, we have $y \sim N_n(X_{\xi}\beta_{\xi}, \sigma^2 I_n)$, where $X_{\xi}$ is the $n \times p_{\xi}$ design matrix under model $M_{\xi}$, and $\beta_{\xi}$ is the corresponding $p_{\xi}$-vector of regression coefficients. For model $M_{\xi}$, a default prior specification for $\beta_{\xi}$ and $\sigma^2$ is given by
$$\beta_{\xi} \mid \sigma^2 \sim N\left(m e_{1\xi},\; \frac{n}{p_{\xi}}(v - \sigma^2)(X_{\xi}^{\top}X_{\xi})^{-1}\right), \quad \sigma^2 \sim \operatorname{gb}(a, b, v), \quad m \sim N(m_0, v/k_m), \quad v^{-1} \sim \Gamma(k_v, v_0 k_v), \tag{11}$$
where $e_{1\xi} = (1, 0, \ldots, 0)^{\top}$ is $p_{\xi}$-dimensional.
To perform variable selection, we need to calculate the Bayes factor comparing each model $M_{\xi}$ with the null model $M_N = M_{e_1}$. Note that under a model $M_{\xi}$ ($\neq M_N$) with prior (11), the marginal likelihood given $\sigma^2$ and $(m, v)$ is
$$p(y \mid M_{\xi}, \sigma^2, m, v) = (2\pi)^{-\frac{n}{2}}\left[\frac{p_{\xi}}{nv - (n - p_{\xi})\sigma^2}\right]^{p_{\xi}/2}(\sigma^2)^{-\frac{n - p_{\xi}}{2}} \times \exp\left\{-\frac{SST \cdot p_{\xi}\left[1 + \frac{n(v - \sigma^2)(1 - R_{\xi}^2)}{p_{\xi}\sigma^2} + \frac{n(\bar{y} - m)^2}{SST}\right]}{2\left[nv - (n - p_{\xi})\sigma^2\right]}\right\}, \tag{12}$$
where $SST = \sum_{i=1}^{n}(y_i - \bar{y})^2$ and $R_{\xi}^2$ is the usual R-squared under model $M_{\xi}$. Under the null model $M_N: y_i \sim N(\beta_1, \sigma^2)$, the prior (11) simply reduces to $\beta_1 = m$, $\sigma^2 = v$, $m \sim N(m_0, v/k_m)$, $v^{-1} \sim \Gamma(k_v, v_0 k_v)$. Therefore, the marginal likelihood under the null model given $(m, v)$ is
$$p(y \mid M_N, m, v) = (2\pi)^{-\frac{n}{2}}\, v^{-\frac{n}{2}} \exp\left\{-\frac{1}{2v}\sum_{i=1}^{n}(y_i - m)^2\right\}. \tag{13}$$
Note that this is a special case of (12) with $\sigma^2 = v$ and $p_{\xi} = 1$.
When $(m, v)$ is fixed and known, the Bayes factor for comparing any model $M_{\xi}$ ($\neq M_N$) to the null model $M_N$ is
$$\operatorname{BF}[M_{\xi} : M_N \mid m, v] = \frac{p(y \mid M_{\xi}, m, v)}{p(y \mid M_N, m, v)}, \tag{14}$$
where $p(y \mid M_{\xi}, m, v) = \int_0^v p_{a,b,v}(\sigma^2)\, p(y \mid M_{\xi}, \sigma^2, m, v)\, d\sigma^2$. It is easy to show that the Bayes factor in (14) is finite for all $p_{\xi} \le p < n$. The integral in $p(y \mid M_{\xi}, m, v)$ can be numerically computed using the R function integrate [30].
When the hyperprior on $(m, v)$ in (5) is used, the Bayes factor for comparing $M_{\xi}$ to $M_N$ becomes
$$\operatorname{BF}[M_{\xi} : M_N] = \frac{E_{m,v}\left[p(y \mid M_{\xi}, m, v)\right]}{E_{m,v}\left[p(y \mid M_N, m, v)\right]}, \tag{15}$$
where the expectation $E_{m,v}[\cdot]$ is taken under the prior for $(m, v)$ in (5). However, the calculation of the expectations $E_{m,v}[\cdot]$ in (15) is considerably more computationally demanding. Based on the competitive performance of our prior compared to other methods in simulation studies, we recommend using the Bayes factor in (14) with $(m, v)$ fixed at $(\hat{m}, \hat{v})$, where $\hat{m}$ and $\hat{v}$ are determined as follows. If there is no historical information available for $(m, v)$, we simply use $\hat{m} = \bar{y}$ and $\hat{v} = s_y^2$ based on the current marginal data $\{y_i\}$. If there is some historical information for $(m, v)$ that can be summarized as $m \sim N(m_0, v/k_m)$, $v^{-1} \sim \Gamma(k_v, v_0 k_v)$, we set $(\hat{m}, \hat{v}) = (\tilde{m}, \tilde{v})$, where $(\tilde{m}, \tilde{v})$ is the posterior mean estimate for $(m, v)$ based on only the marginal data $(y_1, \ldots, y_n)$; see Section 2.3 for the specification of $m_0$, $v_0$, $k_m$ and $k_v$ with historical data $y_o$. Note that closed-form formulas for $\tilde{m}$ and $\tilde{v}$ can be derived; see [31] for the derivations. Once a model is selected, we can apply the prior (5) to fit the selected model.
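A hedged R sketch of the fixed-$(m, v)$ Bayes factor (14) follows (function names are ours). It implements the log of (12), expresses the integrand relative to the log of (13) for numerical stability, and integrates over $\sigma^2 \in (0, v)$ with integrate; $X_{\xi}$ must contain the intercept column plus at least one covariate.

```r
# Bayes factor BF[M_xi : M_N | m, v] of (14), with sigma2 ~ gb(a, b, v).
bf_vs_null <- function(y, X_xi, m, v, a = 1, b = 1) {
  n <- length(y); p_xi <- ncol(X_xi)
  SST <- sum((y - mean(y))^2)
  R2 <- summary(lm(y ~ X_xi[, -1]))$r.squared        # R^2 under M_xi
  log_m12 <- function(s2) {                          # log of (12)
    -n / 2 * log(2 * pi) +
      p_xi / 2 * log(p_xi / (n * v - (n - p_xi) * s2)) -
      (n - p_xi) / 2 * log(s2) -
      SST * p_xi * (1 + n * (v - s2) * (1 - R2) / (p_xi * s2) +
                      n * (mean(y) - m)^2 / SST) /
        (2 * (n * v - (n - p_xi) * s2))
  }
  log_m13 <- -n / 2 * log(2 * pi * v) - sum((y - m)^2) / (2 * v)  # log of (13)
  integrand <- function(s2) dbeta(s2 / v, a, b) / v * exp(log_m12(s2) - log_m13)
  integrate(integrand, lower = 0, upper = v)$value
}
```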

Information Paradox

The information paradox [32] refers to situations where we have very strong information supporting a non-null model $M_{\xi}$, but the Bayes factor $\operatorname{BF}[M_{\xi} : M_N \mid m, v]$ does not go to $\infty$ as the information in favor of $M_{\xi}$ accumulates (i.e., $R_{\xi}^2 \to 1$). The proposed informative g-prior resolves the information paradox in the sense that $\operatorname{BF}[M_{\xi} : M_N \mid m, v] \to \infty$ with fixed $n$, $p_{\xi} \le p \le (n - 2)$ and $R_{\xi}^2 \to 1$. Note that the denominator in (14) is finite, and by the mean value theorem for definite integrals, there exists $c \in (0, v)$ such that
$$\int_0^v p_{a,b,v}(\sigma^2)\, p(y \mid M_{\xi}, \sigma^2, m, v)\, d\sigma^2 = p_{a,b,v}(c)\int_0^v p(y \mid M_{\xi}, \sigma^2, m, v)\, d\sigma^2.$$
Therefore, it suffices to show that
$$\lim_{R_{\xi}^2 \to 1}\int_0^v p(y \mid M_{\xi}, \sigma^2, m, v)\, d\sigma^2 = \infty \quad \text{for all } p_{\xi} \le p \le (n - 2).$$
Noting that $\int_0^v p(y \mid M_{\xi}, \sigma^2, m, v)\, d\sigma^2$ is an increasing function of $R_{\xi}^2$, we have
$$\begin{aligned}\lim_{R_{\xi}^2 \to 1}\int_0^v p(y \mid M_{\xi}, \sigma^2, m, v)\, d\sigma^2 &= \int_0^v (2\pi)^{-\frac{n}{2}}\left[\frac{p_{\xi}}{nv - (n - p_{\xi})\sigma^2}\right]^{p_{\xi}/2}(\sigma^2)^{-\frac{n - p_{\xi}}{2}} \times \exp\left\{-\frac{SST \cdot p_{\xi}\left[1 + \frac{n(\bar{y} - m)^2}{SST}\right]}{2\left[nv - (n - p_{\xi})\sigma^2\right]}\right\} d\sigma^2\\ &\ge \int_0^v (2\pi)^{-\frac{n}{2}}\left[\frac{p_{\xi}}{nv}\right]^{p_{\xi}/2}(\sigma^2)^{-\frac{n - p_{\xi}}{2}} \times \exp\left\{-\frac{SST \cdot p_{\xi}\left[1 + \frac{n(\bar{y} - m)^2}{SST}\right]}{2\left[nv - (n - p_{\xi})v\right]}\right\} d\sigma^2\\ &= \mathrm{constant} \cdot \int_0^v (\sigma^2)^{-\frac{n - p_{\xi}}{2}}\, d\sigma^2 = \infty\end{aligned}$$
for all $p_{\xi} \le p \le (n - 2)$.

3. Mixed Models

3.1. One-Way Random Effects ANOVA

The g-prior developed in Section 2 for regression models can be immediately extended to mixed models in an analogous fashion. The shrinkage induced by the g-prior yields familiar exchangeable prior specifications already in widespread use as special cases, as well as some new default formulations. We first examine the simplest random effects model, a one-way ANOVA, typically formulated as
$$y_{ij} = \beta + \gamma_i + \epsilon_{ij}, \quad i = 1, \ldots, c, \; j = 1, \ldots, n_i, \tag{16}$$
where $\epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)$, rewritten in matrix form as
$$y_i = 1_{n_i}\beta + 1_{n_i}\gamma_i + \epsilon_i,$$
where $y_i = (y_{i1}, \ldots, y_{in_i})^{\top}$, $1_{n_i}$ is an $n_i$-vector of ones, and $\epsilon_i = (\epsilon_{i1}, \ldots, \epsilon_{in_i})^{\top}$. Note that without further constraints, (16) is overparameterized; shrinkage on both the “fixed” and “random” portions separately is required for identifiability.
Noting that $n_i(1_{n_i}^{\top}1_{n_i})^{-1} = 1$ for all $i$, a g-prior on the first portion is
$$\beta \mid g_1 \sim N(\beta_0, g_1).$$
Similarly, a g-prior on the second portion is
$$\gamma_1, \ldots, \gamma_c \mid g_2 \overset{iid}{\sim} N(0, g_2).$$
This prior is the same as assuming exchangeable random effects, e.g., $\gamma_1, \ldots, \gamma_c \mid \sigma_r^2 \overset{iid}{\sim} N(0, \sigma_r^2)$ with $\sigma_r^2 = g_2$; placing a prior on $g_2$ is the same as placing a prior on $\sigma_r^2$. The role of the g-prior as a ridge prior is evident here, with model identifiability achieved by shrinking the $\gamma_i$ towards 0. The amount of shrinkage is controlled a priori via the parameter $g_2$. There are obvious links from the g-prior to ridge regression, shrinkage priors, and penalized likelihood.
The prior on $\sigma_r$ has received considerable interest; suggestions include the half-Cauchy prior and uniform priors [33], as well as approximations to Jeffreys’ prior, e.g., $\sigma_r^2 \sim \Gamma^{-1}(0.001, 0.001)$, which permeated the Bayesian literature in the 1990s. Ref. [34] advocate a data-driven prior that is similar in spirit to what is presented here. Ref. [35] considers a shrinkage prior for $\sigma_r^2$ induced by a uniform prior on $\sigma^2/(\sigma^2 + \sigma_r^2)$. Ref. [36] uses a g-prior for ANOVA with a diverging number of parameters. In contrast, we will build a prior that facilitates the borrowing of historical information on the overall marginal mean $m$ and variance $v$ of the data $y_{ij}$.

3.2. Linear Mixed Models

Now consider the linear mixed model
$$y_{ij} = x_{ij}^{\top}\beta + z_{ij}^{\top}\gamma_i + \epsilon_{ij}, \quad \text{or equivalently}, \quad y_i = X_i\beta + Z_i\gamma_i + \epsilon_i, \tag{18}$$
where $\epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)$, $\gamma_i$ is a $k$-vector of random effects, $X_i = [x_{i1} \cdots x_{in_i}]^{\top}$, and $Z_i = [z_{i1} \cdots z_{in_i}]^{\top}$. In this setting, $i = 1, \ldots, c$ indexes the data cluster associated with $\gamma_i$ and $j = 1, \ldots, n_i$ indexes the repeated measures within cluster $i$; the total sample size is $n = \sum_{i=1}^{c} n_i$. The variability in model (18) is portioned into $X_i\beta$, $Z_i\gamma_i$ and $\epsilon_i$. The first two components will have dependent g-priors, inducing differing amounts of shrinkage across the two regression portions; the second portion is further shrunk toward zero. Set $\gamma = (\gamma_1^{\top}, \ldots, \gamma_c^{\top})^{\top}$. Again, the goal is to develop a prior on $(\beta, \gamma, \sigma^2)$ that incorporates the marginal information $y_{ij} \sim (m, v)$, where a hyperprior on $(m, v)$ can be extracted from historical data or expert opinion. The usual g-prior on $\beta$ for cluster $i$ is
$$\beta \mid g_1 \sim N_p\left(m e_1,\; g_1 n_i (X_i^{\top}X_i)^{-1}\right).$$
Let $\mu_{ix} = E(x_{ij} \mid i)$ and $\Sigma_{ix} = \operatorname{cov}(x_{ij} \mid i)$ denote the mean and covariance of $x_{ij}$ for cluster $i$; then $X_i^{\top}X_i/n_i \xrightarrow{p} (\mu_{ix}\mu_{ix}^{\top} + \Sigma_{ix})$. Similarly, let $\mu_x = E(x_{ij})$ and $\Sigma_x = \operatorname{cov}(x_{ij})$ denote the overall mean and covariance across all clusters, and set $X = [X_1^{\top} \cdots X_c^{\top}]^{\top}$; then $X^{\top}X/n = \frac{1}{n}\sum_{i=1}^{c} X_i^{\top}X_i \xrightarrow{p} [\mu_x\mu_x^{\top} + \Sigma_x]$. Noting that the same coefficient $\beta$ is used for all clusters, the overall g-prior for $\beta$ can be set as
$$\beta \mid g_1 \sim N_p\left(m e_1,\; g_1 n (X^{\top}X)^{-1}\right). \tag{19}$$
The usual g-prior on $\gamma_i$ for cluster $i$ is
$$\gamma_i \mid g_2 \overset{ind.}{\sim} N_k\left(0,\; g_2 n_i (Z_i^{\top}Z_i)^{-1}\right) \approx N_k\left(0,\; g_2(\mu_i\mu_i^{\top} + \Sigma_i)^{-1}\right), \tag{20}$$
where $\mu_i = E(z_{ij} \mid i)$ and $\Sigma_i = \operatorname{cov}(z_{ij} \mid i)$. Denote by $\mu = E(z_{ij})$ and $\Sigma = \operatorname{cov}(z_{ij})$ the overall mean and covariance of $z_{ij}$ across all clusters. If the $z_{ij}$’s come from the same population, i.e., $\mu_1 = \cdots = \mu_c = \mu$ and $\Sigma_1 = \cdots = \Sigma_c = \Sigma$, (20) is equivalent to
$$\gamma_1, \ldots, \gamma_c \mid g_2 \overset{iid}{\sim} N_k(0, \Omega),$$
where $\Omega = g_2[\mu\mu^{\top} + \Sigma]^{-1}$. This final expression lies at the heart of hundreds of mixed model analyses; the derivation here clarifies that this is exactly what the g-prior gives us when $z_{ij} \overset{iid}{\sim} (\mu, \Sigma)$. Define $Z = [Z_1^{\top} \cdots Z_c^{\top}]^{\top}$. Noting that $Z^{\top}Z/n = \frac{1}{n}\sum_{i=1}^{c} Z_i^{\top}Z_i \xrightarrow{p} [\mu\mu^{\top} + \Sigma]$, a sensible default prior is
$$\gamma_i \mid g_2 \overset{iid}{\sim} N_k\left(0,\; g_2 n (Z^{\top}Z)^{-1}\right), \tag{21}$$
assuming $\mu_1 = \cdots = \mu_c = \mu$ and $\Sigma_1 = \cdots = \Sigma_c = \Sigma$ is approximately correct.
Let $t_k(r, \mu, \Sigma)$ be the $k$-dimensional multivariate t distribution with $r$ degrees of freedom, mean $\mu$ for $r > 1$, and covariance $\frac{r}{r-2}\Sigma$ for $r > 2$. Taking $r/g_2 \sim \chi_r^2$ under the default prior (21), the induced marginal prior on $\gamma_i$ is a multivariate t distribution (see [37]), given by
$$\gamma_i \sim t_k\left(r, 0, n(Z^{\top}Z)^{-1}\right).$$
It is tempting to seek out a more flexible model via the Wishart distribution, but note that if instead $\gamma_i \mid \Omega \sim N_k(0, \Omega)$ and $\Omega \sim W_k^{-1}(r + k - 1,\; nr(Z^{\top}Z)^{-1})$, the same marginal distribution is induced on $\gamma_i$. Here, $W_k^{-1}(r, R)$ is an inverted-Wishart distribution with the usual parameters $(r, R)$, $r > (k - 1)$. One can play around with different settings for the various hyperparameters, but the end result is typically a multivariate t distribution or something close to it. For example, ref. [38] proposed a default random effects specification for generalized linear models; under the normal errors model, their proposal is $\gamma_i \mid \Omega \overset{iid}{\sim} N_k(0, \Omega)$ with $\Omega \mid \sigma^2 \sim W^{-1}(k, kR)$ and $R = wc\sigma^2(Z^{\top}Z)^{-1}$, where $w > 0$ is an inflation factor. Their induced marginal prior is $\gamma_i \sim t_k(1, 0, kwc\sigma^2(Z^{\top}Z)^{-1})$. Note that our specification is not conditional on $\sigma^2$; otherwise, all of these priors induce a multivariate t distribution with similar covariance structures. Ref. [38] compare their approach to the approximate uniform shrinkage prior of [39]. Ref. [40] extended the half-t prior [33] to the multivariate setting so that the prior on the covariance matrix induces half-t priors on standard deviations and uniform priors on correlations.
We proceed to build a prior that reflects the prior knowledge on the overall marginal mean $m$ and variance $v$ of the data $y_{ij}$. Under prior (20) or (21), along with (19), we have $m = E(y_{ij})$ as before, and now
$$v = \operatorname{var}(y_{ij}) = \operatorname{var}(x_{ij}^{\top}\beta) + \operatorname{var}(z_{ij}^{\top}\gamma_i) + \operatorname{var}(\epsilon_{ij}) \approx g_1 p + g_2 k + \sigma^2.$$
Certainly, $\sigma^2 \le v$ and $g_2 k \le v - \sigma^2$ are reasonable bounds. The following default specification enforces the mean and variance constraints of $y_{ij} \sim (m, v)$:
$$\beta \mid g, \sigma^2 \sim N_p\left(m e_1,\; \frac{n}{p}(v - \sigma^2 - gk)(X^{\top}X)^{-1}\right), \quad \gamma_i \mid g \overset{iid}{\sim} N_k\left(0,\; gn(Z^{\top}Z)^{-1}\right), \quad g \mid \sigma^2 \sim \operatorname{gb}\left(a_1, b_1, \frac{v - \sigma^2}{k}\right), \quad \sigma^2 \sim \operatorname{gb}(a_2, b_2, v), \quad m \sim N(m_0, v/k_m), \quad v^{-1} \sim \Gamma(k_v, v_0 k_v). \tag{22}$$
A uniform prior on $\sigma^2$ is specified by $a_2 = b_2 = 1$; a uniform prior on $g$ obtains from $a_1 = b_1 = 1$. When the covariates $z_{ij}$ come from quite different subpopulations across clusters, we recommend replacing the prior on $\gamma_i$ in (22) with $\gamma_i \mid g \overset{ind.}{\sim} N_k\left(0,\; g n_i (Z_i^{\top}Z_i)^{-1}\right)$. The proposed prior (22) enables easy computation of the approximate amount of variation explained by the random effects (i.e., $gk/v$) and the fixed effects (i.e., $(v - \sigma^2 - gk)/v$) relative to the total $v$.
The priors on the fixed and random effect portions of the model are tied together and correlated; this is necessary to conserve the marginal variance a priori. Ref. [41] note that, although variance components are usually modeled independently in the prior, typically as inverse-gamma, uniform, or half-Cauchy, they are “linked as they are components of the total variation in the response…”, and suggest modeling them jointly as we do here, though via generalized multivariate gamma or multivariate log-normal distributions.

3.3. Hyper-Prior Elicitation for $(m, v)$ in Mixed Models

Our prior in (22) requires specifying the hyperparameters $m_0$, $v_0$, $k_m$ and $k_v$ in the hyperprior for $(m, v)$. Suppose the historical data are $y_o = \{y_{oij} \mid i = 1, \ldots, c_o;\; j = 1, \ldots, n_{oi}\} \sim (m, v)$, and set $M = \sum_{i=1}^{c_o} n_{oi}$. We need to extract sensible hyperparameter values $m_0$, $v_0$, $k_m$ and $k_v$ so that the hyperprior for $(m, v)$ in (22) is close to the true posterior of $(m, v)$ based on the historical data. Assume that the historical data can be approximately fit by the one-way random ANOVA: $y_{oij} = m + \gamma_i + \epsilon_{ij}$, $\gamma_i \overset{iid}{\sim} N(0, \sigma_{or}^2)$, $\epsilon_{ij} \overset{iid}{\sim} N(0, \sigma_o^2)$. Unbiased estimates for $m$, $\sigma_{or}^2$ and $\sigma_o^2$ can be obtained using restricted maximum likelihood (REML) via the R function lmer in package lme4 [42], denoted $\hat{m}_o$, $\hat{\sigma}_{or}^2$ and $\hat{\sigma}_o^2$. Then, $\hat{v}_o = \hat{\sigma}_o^2 + \hat{\sigma}_{or}^2$ is an unbiased estimate of $v$, and $\hat{\rho}_o = \hat{\sigma}_{or}^2/\hat{v}_o$ is an estimate of the intraclass correlation coefficient. Based on some simulation trials, we find that the following posterior distributions approximately hold:
$$m \mid y_o, \hat{\rho}_o, v \sim N\left(\hat{m}_o, \frac{v}{n_{om}}\right), \quad v^{-1} \mid y_o, \hat{\rho}_o \sim \Gamma\left(\frac{n_{ov}}{2}, \frac{\hat{v}_o\, n_{ov}}{2}\right), \tag{23}$$
where $n_{om} = M/\{1 + \hat{\rho}_o(n_{o\lambda} - 1)\}$ and $n_{ov} = M/\{1 + \hat{\rho}_o^2(n_{o\lambda} - 1)\}$ can be interpreted as effective sample sizes that account for the intraclass dependency, with $n_{o\lambda} = c_o/\sum_i n_{oi}^{-1}$ the harmonic mean cluster size. Simple simulations (not shown here) reveal that the posterior distributions in (23) often provide empirical coverage probabilities for $(m, v)$ around 0.95, and the interval width for $v$ is much narrower than for the methods proposed in [43]. Further investigation is needed to understand the reason behind this. Fortunately, we use this approximate posterior only to select a reasonable hyperprior for $(m, v)$, not for our actual posterior inference based on the current data.
If one believes that the historical data $y_o$ come from the same population as the current observed response data $y = \{y_{ij} \mid i = 1, \ldots, c;\; j = 1, \ldots, n_i\}$, it is reasonable to set $m_0 = \hat{m}_o$, $v_0 = \hat{v}_o$, $k_m = n_{om}$ and $k_v = n_{ov}/2$ in the hyperprior for $(m, v)$ in (22). Setting lower values for $k_m$ and $k_v$ puts less weight on the historical data relative to the current data. If historical data are not available, we recommend setting $m_0 = \hat{m}$, $v_0 = \hat{v}$, $k_m = 2$ and $k_v = 1$, where $\hat{m}$ and $\hat{v}$ are the REML estimates of $(m, v)$ based on the current response data $y$.
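A hedged R sketch of this elicitation follows (function and variable names are ours, not from the supplement): it fits the one-way random ANOVA to the historical data by REML with lmer and returns $(m_0, v_0, k_m, k_v)$ via the effective sample sizes of (23).

```r
library(lme4)

# REML-based elicitation of (m0, v0, km, kv) from historical data y_o with
# cluster labels id_o, following Section 3.3.
elicit_mv_mixed <- function(y_o, id_o) {
  fit <- lmer(y_o ~ 1 + (1 | id_o), REML = TRUE)
  m_o   <- as.numeric(fixef(fit))
  vc    <- as.data.frame(VarCorr(fit))
  s2_or <- vc$vcov[vc$grp == "id_o"]        # between-cluster variance
  s2_o  <- vc$vcov[vc$grp == "Residual"]    # within-cluster variance
  v_o   <- s2_o + s2_or
  rho_o <- s2_or / v_o                      # intraclass correlation
  M <- length(y_o)
  n_sizes <- table(id_o)
  n_lam <- length(n_sizes) / sum(1 / n_sizes)   # harmonic mean cluster size
  n_om  <- M / (1 + rho_o   * (n_lam - 1))      # effective size for m
  n_ov  <- M / (1 + rho_o^2 * (n_lam - 1))      # effective size for v
  list(m0 = m_o, v0 = v_o, km = n_om, kv = n_ov / 2)
}
```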
For the random effects one-way ANOVA model (16), the prior (22) reduces to $\beta \mid g, \sigma^2 \sim N(m, v - \sigma^2 - g)$, $\gamma_i \sim N(0, g)$, $g \mid \sigma^2 \sim \operatorname{gb}(a_1, b_1, v - \sigma^2)$ and $\sigma^2 \sim \operatorname{gb}(a_2, b_2, v)$. In addition, the prior information $y_{ij} \sim (m, v)$ indicates that $\sigma^2 + g = v$, which further leads to $\beta = m$. Therefore, it is easy to show that the prior (22) for the random effects one-way ANOVA model finally reduces to
$$\gamma_i \sim N(0, g), \quad \frac{\sigma^2}{v} \sim \operatorname{beta}(a_2, b_2), \quad \beta \sim N(m_0, v/k_m), \quad v^{-1} \sim \Gamma(k_v, v_0 k_v). \tag{24}$$
If we set $a_2 = b_2 = 1$ and $k_m = k_v = 0$, the prior in (24) is equivalent to
$$\gamma_i \sim N(0, g), \quad \frac{\sigma^2}{\sigma^2 + g} \sim \operatorname{uniform}(0, 1), \quad p(\beta, \sigma^2) \propto \frac{1}{\sigma^2},$$
which is exactly the shrinkage prior considered in [35]. That is, our prior naturally reduces to a well-known shrinkage prior for the random one-way ANOVA when there is no historical information available for $(m, v)$.
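A minimal JAGS sketch of the reduced prior (24), with $g = v - \sigma^2$, is given below (the model string and names are ours); y is the flattened response with cluster labels id in $1, \ldots, c$, and $(m_0, v_0, k_m, k_v)$ follow the recommendations above.

```r
anova_model <- "
model {
  for (t in 1:N) {
    y[t] ~ dnorm(beta + gamma[id[t]], 1 / sigma2)
  }
  for (i in 1:c) { gamma[i] ~ dnorm(0, 1 / g) }
  g <- v - sigma2                  # sigma2 + g = v
  sigma2 <- v * u
  u ~ dbeta(a2, b2)                # sigma2 / v ~ beta(a2, b2)
  beta ~ dnorm(m0, km / v)
  inv_v ~ dgamma(kv, kv * v0)
  v <- 1 / inv_v
}"
```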

3.4. Rats Data Example

In the rats data example from the WinBUGS manual [44], the weights (in kg) of 30 rats were measured weekly for five weeks. Let $y_{ij}$ be the weight of the $i$th rat measured in week $j$ and $a_{ij}$ be the corresponding age, $i = 1, \ldots, 30$, $j = 1, \ldots, 5$. Consider the mixed model (18) with $x_{ij} = (1, a_{ij})^{\top}$, $z_{ij} = (1, a_{ij})^{\top}$ and $\gamma_i = (\gamma_{i1}, \gamma_{i2})^{\top} \overset{iid}{\sim} N_k(0, \Omega)$, where $\Omega = \operatorname{diag}(\sigma_{r1}^2, \sigma_{r2}^2)$. Typically, vague priors are used, e.g., $\gamma_{i1} \sim N(0, \sigma_{r1}^2)$, $\gamma_{i2} \sim N(0, \sigma_{r2}^2)$, $\sigma_{r1}^2, \sigma_{r2}^2 \overset{iid}{\sim} \Gamma^{-1}(0.01, 0.01)$. The marginal mean and variance for the response $y_{ij}$ under this prior can be estimated via Monte Carlo (MC) by simulating $\sigma_{r1}^2$, $\sigma_{r2}^2$, $\gamma_{i1}$, $\gamma_{i2}$, $\beta = (\beta_0, \beta_1)^{\top} \sim N_2(0, 10^2 I_2)$, and $y_{ij}^{(l)} \overset{iid}{\sim} N(x_{ij}^{\top}\beta + z_{ij}^{\top}\gamma_i, \sigma^2)$, $i = 1, \ldots, 30$, $j = 1, \ldots, 5$, $l = 1, \ldots, 1000$, yielding 1000 datasets $\{y_{ij}^{(l)} : i = 1, \ldots, 30,\; j = 1, \ldots, 5\}$, where $\sigma^2 \sim \Gamma^{-1}(0.01, 0.01)$. The average prior sample mean (across the 1000 datasets) and prior sample variance are around $2.5 \times 10^{162}$ and $\infty$ (as reported in R), respectively. These substantially differ from the observed sample mean and variance of $\bar{y} = 0.243$ and $s_y^2 = 0.004$. In contrast, a similar simulation under our proposed new g-prior in (22) with $a_1 = b_1 = a_2 = b_2 = 1$, $m_0 = 0.243$, $v_0 = 0.004$, $k_m = 2$ and $k_v = 1$ yields an average sample mean of 0.249 with MC standard deviation 0.144 and an average sample variance of 0.024 with MC standard deviation 0.120. That is, the inference under our prior focuses on a much smaller set of potential models around those that could have conceivably generated the observed marginal data. The posterior estimates for $\beta_0$, $\beta_1$, and $\sigma^2$ under our proposed new g-prior are 0.1073 (0.0051), 0.0062 (0.0002), and 0.00004 (0.000006), respectively, where the values in parentheses are posterior standard deviations. The commonly used vague priors specified above yield similar estimates but with much higher posterior standard deviations: 0.1067 (0.0059), 0.0061 (0.0049), and 0.00006 (0.000009).

3.5. Model Fitting via Block MCMC

Although the previous section portions the variability due to $X\beta$ and $Z_i\gamma_i$ separately, ref. [45] note that updating $(\beta, \gamma)$ in one large block virtually eliminates problematic MCMC mixing, as $\beta$ and $\gamma$ are often highly correlated in the posterior. An optimal approach considers the full model (18) jointly as
$$y = \begin{bmatrix} X & \tilde{Z} \end{bmatrix}\begin{bmatrix} \beta \\ \gamma \end{bmatrix} + \epsilon,$$
where $\tilde{Z} = \operatorname{block\text{-}diag}(Z_1, \ldots, Z_c) \overset{def}{=} \operatorname{block\text{-}diag}\{Z_i \mid i = 1, \ldots, c\}$. Under the prior (22), the full conditional for $(\beta, \gamma)$ is
$$\begin{bmatrix} \beta \\ \gamma \end{bmatrix} \,\Big|\, y, \sigma^2, g \sim N_{p + ck}\left(\mu_n, \Sigma_n\right),$$
where
$$\mu_n = \frac{1}{\sigma^2}\Sigma_n\left(\begin{bmatrix} X^{\top} \\ \tilde{Z}^{\top} \end{bmatrix} y + \begin{bmatrix} \frac{p\sigma^2}{n(v - \sigma^2 - gk)} X^{\top}X\, m e_1 \\ 0 \end{bmatrix}\right), \qquad \Sigma_n = \sigma^2\begin{bmatrix} \left(1 + \frac{p\sigma^2}{n(v - \sigma^2 - gk)}\right)X^{\top}X & X^{\top}\tilde{Z} \\ \tilde{Z}^{\top}X & \operatorname{block\text{-}diag}\left\{Z_i^{\top}Z_i + w_i\frac{\sigma^2}{gn}Z^{\top}Z \,\Big|\, i = 1, \ldots, c\right\} \end{bmatrix}^{-1}.$$
The full conditionals for σ 2 and g do not correspond to any known distributions, so an adaptive Metropolis algorithm [46] can be used.
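For concreteness, here is a hedged R sketch (ours, not the supplementary code) of one joint draw of $(\beta, \gamma)$ from the full conditional above, under the default prior (22) with $w_i = 1$; it assumes $v - \sigma^2 - gk > 0$ and that the rows of the cluster matrices $Z_i$ in Zlist align with $y$.

```r
library(Matrix)

# One block draw of (beta, gamma) from its multivariate normal full
# conditional; the posterior precision is assembled directly.
draw_beta_gamma <- function(y, X, Zlist, sigma2, g, m, v, k) {
  n <- length(y); p <- ncol(X); c <- length(Zlist)
  Zt <- as.matrix(bdiag(Zlist))                  # Z-tilde, block-diagonal
  W  <- cbind(X, Zt)
  ZZ <- crossprod(do.call(rbind, Zlist))         # pooled Z'Z
  lam <- p * sigma2 / (n * (v - sigma2 - g * k)) # beta prior precision scale
  A <- crossprod(W)                              # W'W
  A[1:p, 1:p] <- A[1:p, 1:p] + lam * crossprod(X)
  for (i in 1:c) {                               # add gamma prior precision
    idx <- p + (i - 1) * k + 1:k
    A[idx, idx] <- A[idx, idx] + (sigma2 / (g * n)) * ZZ
  }
  b <- crossprod(W, y)
  b[1:p] <- b[1:p] + lam * crossprod(X)[, 1] * m # + lam * X'X (m e1)
  R <- chol(A / sigma2)                          # Sigma_n^{-1} = A / sigma2
  mu <- backsolve(R, forwardsolve(t(R), b / sigma2))
  as.numeric(mu + backsolve(R, rnorm(p + c * k)))
}
```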

4. Simulation Study

In all simulation studies, for each MCMC run, 5000 scans were thinned from 20,000 after a burn-in period of 2000 iterations; convergence diagnostics deemed this more than adequate. We use posterior means as the point estimates for all parameters. R functions to implement linear and linear mixed models using the proposed priors are provided in Supplementary Materials.

4.1. Simulation I: Fixed Effects Model

Simulations were carried out to evaluate the proposed methodology and compare it to the benchmark prior, the local empirical Bayes (EB) approach, and the hyper-g prior considered in [12]. Data were generated from the Gaussian regression model
$$y_i = x_i^{\top}\beta + \epsilon_i, \quad \epsilon_i \overset{iid}{\sim} N(0, \sigma^2), \quad i = 1, \ldots, n,$$
where $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^{\top}$ and $x_i = (1, x_{i2}, \ldots, x_{ip})^{\top}$. Let $X_c$ be the usual centered design matrix for $(x_{i2}, \ldots, x_{ip})^{\top}$. The benchmark and EB methods consider the following priors:
$$(\beta_2, \ldots, \beta_p)^{\top} \overset{def}{=} \beta^* \sim N_{p-1}\left(0,\; g\sigma^2(X_c^{\top}X_c)^{-1}\right), \quad \beta_1 \sim N(0, 10^{10}), \quad \sigma^2 \sim \Gamma^{-1}(0.001, 0.001),$$
where $g = \max\{n, (p-1)^2\}$ is set for the benchmark method and $g = \max\left\{0,\; R^2(n - p)/[(1 - R^2)p]\right\}$ is used for the EB approach, where $R^2$ is the R-squared value under the considered model. The hyper-g prior is given by
$$\beta^* \sim N_{p-1}\left(0,\; g\sigma^2(X_c^{\top}X_c)^{-1}\right), \quad \beta_1 \sim N(0, 10^{10}), \quad \sigma^2 \sim \Gamma^{-1}(0.001, 0.001), \quad \frac{g}{1 + g} \sim \operatorname{beta}(1, h/2 - 1),$$
where we set $h = 3$ in all simulations, the same setting as used in [12].

4.1.1. Parameter Estimation

First, we evaluate the performance of the various methods for estimating the model parameters. We generated $(x_{i2}, \ldots, x_{ip})^{\top} \overset{iid}{\sim} N_{p-1}(1, \Sigma_{\rho})$, where $\Sigma_{\rho}$ has diagonal entries equal to 1 and off-diagonal entries equal to $\rho$. We set $p = 3$, $\sigma^2 = 1$, $\rho = 0.9$, and $\beta = (0.3, 0.3, 0.3)^{\top}$, yielding R-squared values around 0.26. The true marginal mean and variance of $y_i$ are given by $m_T = E(y_i) = \beta_1 + \beta_2 + \beta_3 = 0.9$ and $v_T = \operatorname{var}(y_i) = \sigma^2 + (\beta_2, \beta_3)\Sigma_{\rho}(\beta_2, \beta_3)^{\top} = 1.342$, respectively. We implemented our proposed prior in (5) with $a = b = 1$.
To evaluate how historical data can improve the parameter estimation accuracy, we additionally generated $y_{oi}$’s of size $M = 50$ in the same way as the $y_i$’s and considered three settings of the hyperprior for $(m, v)$: (V1) new-true, when infinite historical data are available, $m_0 = m_T$, $v_0 = v_T$, $k_m = 10^{10}$ and $k_v = 10^{10}$, i.e., $(m, v)$ is fixed at the truth $(m_T, v_T)$; (V2) new-hist, when a small set of historical data is available, $m_0 = \bar{y}_o$, $v_0 = s_{y_o}^2$, $k_m = M$ and $k_v = M/2$; (V3) new-none, when no historical data are available, $m_0 = \bar{y}$, $v_0 = s_y^2$, $k_m = 2$ and $k_v = 1$.
Let $\theta$ be a generic parameter and $\hat{\theta}$ an estimate. The mean squared error (MSE) for $\hat{\theta}$ is defined as $\operatorname{MSE} = \|\hat{\theta} - \theta\|^2 = \sum_j(\hat{\theta}_j - \theta_j)^2$, and the bias for $\hat{\theta}$ is defined as $\sum_j(\hat{\theta}_j - \theta_j)$. Table 1 reports the average bias and MSE values and the coverage probabilities with interval widths across 500 Monte Carlo (MC) replicates. When $n = 100$, our method without using historical information (new-none) performs very similarly to the other three competing methods. When a little historical information is available, our prior (new-hist) has significantly lower MSE values and reduced interval widths for estimating $\sigma^2$ without compromising the coverage probabilities; the performance for estimating the $\beta_j$’s is also slightly better than for the other approaches. When the true information on $(m, v)$ is available, the estimation performance under our prior (new-true) is further improved compared to new-hist. Regarding the estimation bias, we can see that all informative priors lead to biased estimates, with the general trend that higher informativeness of the prior leads to larger biases. As the sample size increases to $n = 500$, our methods (new-hist and new-true) still outperform the other priors, although the differences become smaller.

4.1.2. Variable Selection

For a given $p \ge 2$, we generated $(x_{i2}, \ldots, x_{ip})^{\top} \overset{def}{=} x_i^*$ as follows: (i) simulate $x_i^* \overset{iid}{\sim} N_{p-1}(1, \Sigma_{\rho})$; (ii) make the even elements of $x_i^*$ binary by setting them to 0 if less than 1 and to 1 if greater than 1. We set $\rho = 0.7$, $p = 16$ and $\beta = (\beta_l^{\top}, 0^{\top})^{\top}$, where $\beta_l$ contains the first $l$ elements of $\beta$ for $l = 1, 2, 3, 4, 7, 10, 13, 16$. That is, among the $p = 16$ covariates (including the intercept), $l$ of them have non-zero coefficients. For each given $l$, we generated $\sigma^2 \sim \operatorname{beta}(400, 100)$ and $\beta_l \sim N_l\left(2e_1,\; \frac{n}{l}(X_l^{\top}X_l)^{-1}(1 - \sigma^2)\right)$, where $X_l$ is the design matrix for $(1, x_{i1}, \ldots, x_{il})$. These settings yield R-squared values ranging from 0.11 to 0.30 for $l = 1, 2, 3, 4, 7, 10, 13, 16$. For our method, we additionally generated $y_{oi}$’s of size $M = 50$ in the same way as the $y_i$’s and considered the same three versions of the hyperprior for $(m, v)$ as in Section 4.1.1: (V1) new-true; (V2) new-hist; (V3) new-none. To compare our methods to the benchmark, EB and hyper-g approaches, we considered the following three cases under each prior: (C1) implement the variable selection procedure and obtain the OLS estimate under the selected model; (C2) obtain the Bayesian estimate under the true model; (C3) obtain the Bayesian estimate under the full model. Here, (C1) is used to compare pure variable selection performance, (C2) is used to compare predictive performance under the true model, and (C3) is used to compare overall predictive performance when the model contains noisy covariates. For all Bayesian methods, the posterior means $\hat{\beta}$ were used for estimating $\beta$.
Table 2 reports the average values of $\|X\beta - X\hat{\beta}\|^2/n$ across 200 MC replicates with $n = 100$. When OLS is used for fitting the selected model, the three versions of our method perform very similarly, indicating that the historical information on $(m, v)$ has little influence on variable selection accuracy. Compared to the EB and hyper-g priors, our methods perform slightly better when the true model size is small ($l \le 2$) and very similarly when $l \ge 3$. The benchmark prior works much better when the true model size is less than or equal to 7, but performs much worse as the true model size increases. The reason is that the benchmark prior sets $g = \max\{n, (p-1)^2\} = 225$, which leads to a flatter prior on $\beta$. When Bayesian estimation is used under the true or full model and there is some historical information available on $(m, v)$, our methods (both new-hist and new-true) outperform the other methods, and the benchmark prior is the worst due to its large choice of $g$. The only case where new-hist and new-true do not perform better is when the full model is fit but the null model is the truth, i.e., $l = 1$ under (C3), for which more historical data (see new-true) will help. Even when we do not have any historical information on $(m, v)$, the results under (C2) and (C3) show that our method performs slightly better than the other methods, especially when $l \ge 4$.

4.2. Simulation II: Random One-Way ANOVA

Data are generated from the random one-way ANOVA model
$$y_{ij} = \beta_1 + \gamma_i + \epsilon_{ij}, \quad \gamma_i \overset{iid}{\sim} N(0, \sigma_r^2), \quad \epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2), \quad i = 1, \ldots, c, \; j = 1, \ldots, n_i,$$
where we set $\beta_1 = 2$, $c = 10$, and $\sigma^2 = 1$. In addition, we consider $\sigma_r^2 = 0.25, 0.5$ and $n_i \sim \operatorname{dis\text{-}unif}[10, 15]$, where $\operatorname{dis\text{-}unif}[a, b]$ denotes the discrete uniform distribution supported on all integers within $[a, b]$. The true marginal mean and variance of $y_{ij}$ are given by $m_T = E(y_{ij}) = \beta_1 = 2$ and $v_T = \operatorname{var}(y_{ij}) = \sigma^2 + \sigma_r^2$, respectively. We implement our proposed default prior in (22) with $a_1 = b_1 = a_2 = b_2 = 1$ and the hyperprior settings recommended in Section 3.3. Then, $\sigma_r^2$ can be estimated from the posterior samples of $g$.
We additionally generate $\{y_{oij} \mid i = 1, \ldots, 10;\; j = 1, \ldots, 10\}$ in the same way as the $y_{ij}$’s and consider three versions of the hyperprior for $(m, v)$: (V1) new-true, $m_0 = m_T$, $v_0 = v_T$, $k_m = 10^{10}$ and $k_v = 10^{10}$; (V2) new-hist, $m_0 = \hat{m}_o$, $v_0 = \hat{v}_o$, $k_m = n_{om}$ and $k_v = n_{ov}/2$; (V3) new-none, $m_0 = \hat{m}$, $v_0 = \hat{v}$, $k_m = 2$ and $k_v = 1$; see Section 3.3 for the definitions of these hyperparameters. We also compare our methods to the $\sigma_r \sim \operatorname{uniform}(0, 10^2)$ prior [33], the $\sigma_r^2 \sim \operatorname{uniform}(0, 10^4)$ prior [47], the $\sigma_r^2 \sim \Gamma^{-1}(0.001, 0.001)$ prior [47], and the $\sigma^2/(\sigma^2 + \sigma_r^2) \sim \operatorname{uniform}(0, 1)$ shrinkage prior [35]. For these alternative priors, the typical priors $N(0, 10^3)$ and $\Gamma^{-1}(0.001, 0.001)$ are used on $\beta_1$ and $\sigma^2$, respectively.
Table 3 reports the average bias and MSE values and the coverage probabilities with interval widths across 500 MC replicates, where the coverage probability for the $\gamma_i$’s is defined as the average coverage across all $\gamma_i$, $i = 1, \ldots, c$. In all cases, our approach with new-hist or new-true has significantly lower MSE values and narrower interval widths than the other methods for estimating all model parameters, while maintaining coverage probabilities around the nominal level of 95%. Even when historical information on $(m, v)$ is not available, our method with new-none still has much lower MSE values for estimating $\sigma_r^2$ and narrower interval widths than all other priors. Note that the induced prior under new-true essentially assumes that the prior variance of $\beta_1$ is zero, so we did not report the coverage probability for $\beta_1$ here.

4.3. Simulation III: Random Intercept Model

Data were generated from the mixed model
$$y_{ij} = x_{ij}^{\top}\beta + \gamma_i + \epsilon_{ij}, \quad \gamma_i \overset{iid}{\sim} N(0, \sigma_r^2), \quad \epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2), \quad i = 1, \ldots, c, \; j = 1, \ldots, n_i,$$
where $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^{\top}$, $x_{ij} = (1, x_{ij2}, \ldots, x_{ijp})^{\top}$ and $(x_{ij2}, \ldots, x_{ijp})^{\top} \overset{iid}{\sim} N_{p-1}(1, \Sigma_{\rho})$. We set $p = 3$, $\rho = 0.9$, $\beta = (0.5, 0.5, 0.5)^{\top}$, $c = 10$, and $\sigma^2 = 1$. In addition, we consider $\sigma_r^2 = 0.25, 0.5$ and $n_i \sim \operatorname{dis\text{-}unif}[10, 15]$. The true marginal mean and variance of $y_{ij}$ are given by $m_T = E(y_{ij}) = \beta_1 + \beta_2 + \beta_3 = 1.5$ and $v_T = \operatorname{var}(y_{ij}) = \sigma^2 + \sigma_r^2 + (\beta_2, \beta_3)\Sigma_{\rho}(\beta_2, \beta_3)^{\top} = 1.95 + \sigma_r^2$, respectively. The prior settings are the same as those used in Section 4.2.
Table 4 reports the average bias and MSE values and the coverage probabilities with interval widths across 500 MC replicates. In all cases, our approach with new-hist or new-true has significantly lower MSE values and narrower interval widths than the other methods for estimating all model parameters, while maintaining coverage probabilities around the nominal level of 95%. Even when historical information on $(m, v)$ is not available, our method with new-none still has much lower MSE values for estimating $(\beta_2, \beta_3)$ and $\sigma_r^2$, with slightly narrower interval widths than all other priors.

4.4. Simulation IV: Linear Mixed Model

Data were generated from the mixed model
$$y_{ij} = x_{ij}^{\top}\beta + \gamma_i^{\top}z_{ij} + \epsilon_{ij}, \quad \gamma_i \overset{iid}{\sim} N_k(0, \Omega_i), \quad \epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2), \quad i = 1, \ldots, c, \; j = 1, \ldots, n_i,$$
where $\beta = (\beta_1, \beta_2, \beta_3)^{\top} = (0.5, 0.5, 0.5)^{\top}$, $x_{ij} = (1, x_{ij2}, x_{ij3})^{\top} \overset{ind.}{\sim} N_3(1, \Sigma_i)$, $k = 2$, $\gamma_i = (\gamma_{i1}, \gamma_{i2})^{\top}$, $z_{ij} = (1, x_{ij2})^{\top}$, and $\Omega_i = g\left[(11^{\top} + \Sigma_i)_{[1:2,\,1:2]}\right]^{-1}$. Here, we set $\sigma^2 = 1$, $c = 10$, $\Sigma_i = \operatorname{diag}(0, 1, 1)$ for $i = 1, \ldots, 5$ and $\Sigma_i = \operatorname{diag}(0, 4, 4)$ for $i = 6, \ldots, 10$. Under this setting, the total random-effects variance equals $\operatorname{var}(z_{ij}^{\top}\gamma_i) = gk = 2g \overset{def}{=} \sigma_r^2$. We consider $\sigma_r^2 = 0.5, 1$ and $n_i \sim \operatorname{dis\text{-}unif}[10, 15]$.
We implement our proposed default prior in (22) with $a_1 = b_1 = a_2 = b_2 = 1$, as well as a more general version with $\gamma_i \mid g \overset{ind.}{\sim} N_k\left(0,\; g n_i (Z_i^{\top}Z_i)^{-1}\right)$ (denoted new-i below). Regarding the hyperprior for $(m, v)$, we only consider new-hist and new-none as defined in Section 4.2, since the true marginal mean and variance of $y_{ij}$ are not available in closed form. We then compare our methods to the prior proposed in [38]: $\gamma_i \mid \Omega \overset{iid}{\sim} N_k(0, \Omega)$, where $\Omega \mid \sigma^2 \sim W^{-1}(k, kR)$ and $R = c\sigma^2(Z^{\top}Z)^{-1}$.
Table 5 reports the average bias and MSE values and the coverage probabilities with interval widths across 500 MC replicates, where the coverage probability for the $\gamma_{ij}$’s is defined as the average coverage across all $\gamma_{ij}$ over $i = 1, \ldots, c$ for each $j = 1, 2$. Comparing our default prior in (22) with its more general version new-i, the new-i method has lower MSE values for estimating most model parameters and is markedly better for estimating $\gamma_{ij}$ and $\sigma_r^2$. Comparing our default prior (22) with the prior in [38] (both assuming a homogeneous covariance for $\gamma_i$), our prior has much lower MSE values and narrower interval widths for estimating the $\gamma_{ij}$’s while maintaining coverage probabilities around the nominal level of 95%. When the more general prior new-i is used, our method consistently performs better than [38] in estimating all model parameters.

5. Discussion

Prior elicitation plays an important role in Bayesian inference. We have proposed a novel, yet remarkably simple class of informative g-priors for linear mixed models elicited from existing information on the marginal distribution of the responses. The prior is first developed for the linear regression model (2) assuming that a subject-matter expert has information on the marginal distribution $y_i \overset{iid}{\sim} (m, v)$. A simple, intuitive interpretation of the prior is obtained: when $\sigma^2 = v$, the model explains nothing (i.e., it reduces to the null model); when $\sigma^2 = 0$, the model explains all variability in the responses; furthermore, the use of a generalized beta prior on $\sigma^2 \in [0, v]$ allows one to specify prior information on the amount of variation explained by the considered model. The proposed prior also naturally reduces to a modified version of the hyper-$g/n$ prior introduced in [12] when there is no historical information available for $(m, v)$. Under Gaussian linear regression models with the proposed g-prior, Bayes factors for comparing all possible submodels can be easily computed for the purpose of variable selection and do not suffer from the information paradox commonly seen for Zellner’s g-priors with fixed $g$. Our approach is further extended for use in linear mixed models, and interesting relationships between the proposed g-priors and some other commonly used priors in mixed models are discussed. For example, under the random-effects one-way ANOVA, the proposed prior (22) with a reference hyperprior on $(m, v)$ reduces exactly to the shrinkage prior of [35]. Posterior sampling for all considered models can be obtained using JAGS via R. Finally, extensive simulation studies reveal that the proposed g-prior outperforms almost all other approaches under consideration when some historical information on $(m, v)$ is available. Even without historical data, better performance of the proposed new g-prior over other priors is still seen in many settings. Interesting generalizations of the proposed idea include additive penalized B-spline regression, variable selection in linear mixed models, and prior elicitation for generalized linear mixed models. Recently, ref. [48] proposed two informative priors for the between-cluster slope in a multilevel latent covariate model. However, the extension of their methods to multiple covariates has not been investigated. It would be interesting to extend the proposed g-prior to general multilevel latent covariate models.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/stats6010011/s1, R functions to fit the linear and linear mixed models.

Author Contributions

Conceptualization, Y.-F.C., H.Z. and T.H.; methodology, Y.-F.C., H.Z. and T.H.; software, Y.-F.C., H.Z. and T.H.; validation, Y.-F.C., H.Z. and T.H.; formal analysis, Y.-F.C., H.Z. and T.H.; investigation, Y.-F.C., H.Z. and T.H.; resources, Y.-F.C., H.Z., T.H. and T.L.; data curation, Y.-F.C., H.Z. and T.H.; writing—original draft preparation, Y.-F.C., H.Z. and T.H.; writing—review and editing, Y.-F.C., H.Z., T.H. and T.L.; visualization, Y.-F.C., H.Z. and T.H.; supervision, H.Z., T.H. and T.L.; project administration, H.Z. and T.H.; funding acquisition, H.Z., T.H. and T.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors wish to thank the Editor and four anonymous referees for their insightful comments and suggestions that greatly improved the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Sun, C.Q.; Prajna, N.V.; Krishnan, T.; Mascarenhas, J.; Rajaraman, R.; Srinivasan, M.; Raghavan, A.; O’Brien, K.S.; Ray, K.J.; McLeod, S.D.; et al. Expert Prior Elicitation and Bayesian Analysis of the Mycotic Ulcer Treatment Trial I. Investig. Ophthalmol. Vis. Sci. 2013, 54, 4167–4173.
2. Hampson, L.V.; Whitehead, J.; Eleftheriou, D.; Brogan, P. Bayesian methods for the design and interpretation of clinical trials in very rare diseases. Stat. Med. 2014, 33, 4186–4201.
3. Zhang, G.; Thai, V.V. Expert elicitation and Bayesian Network modeling for shipping accidents: A literature review. Saf. Sci. 2016, 87, 53–62.
4. Food and Drug Administration. Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials; Guidance for Industry and FDA Staff; 2010; pp. 1–50.
5. O’Hagan, A. Eliciting expert beliefs in substantial practical applications. J. R. Stat. Soc. Ser. D 1998, 47, 21–35.
6. Kinnersley, N.; Day, S. Structured approach to the elicitation of expert beliefs for a Bayesian-designed clinical trial: A case study. Pharm. Stat. 2013, 12, 104–113.
7. Dallow, N.; Best, N.; Montague, T.H. Better decision making in drug development through adoption of formal prior elicitation. Pharm. Stat. 2018, 17, 301–316.
8. Hartmann, M.; Agiashvili, G.; Bürkner, P.; Klami, A. Flexible Prior Elicitation via the Prior Predictive Distribution. In Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), Virtual, 3–6 August 2020; Peters, J., Sontag, D., Eds.; PMLR: London, UK, 2020; Volume 124, pp. 1129–1138.
9. Zellner, A. Applications of Bayesian Analysis in Econometrics. Statistician 1983, 32, 23–34.
10. Zellner, A. On Assessing Prior Distributions and Bayesian Regression Analysis with g-Prior Distributions. In Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti; North-Holland/Elsevier: Amsterdam, The Netherlands, 1986; pp. 233–243.
11. Li, Y.; Clyde, M.A. Mixtures of g-priors in generalized linear models. J. Am. Stat. Assoc. 2018, 113, 1828–1845.
12. Liang, F.; Paulo, R.; Molina, G.; Clyde, M.A.; Berger, J.O. Mixtures of g priors for Bayesian variable selection. J. Am. Stat. Assoc. 2008, 103, 410–423.
13. Bedrick, E.J.; Christensen, R.; Johnson, W. A New Perspective on Priors for Generalized Linear Models. J. Am. Stat. Assoc. 1996, 91, 1450–1460.
14. Hosack, G.R.; Hayes, K.R.; Barry, S.C. Prior elicitation for Bayesian generalised linear models with application to risk control option assessment. Reliab. Eng. Syst. Saf. 2017, 167, 351–361.
15. Ibrahim, J.G.; Chen, M.H. Power prior distributions for regression models. Stat. Sci. 2000, 15, 46–60.
16. Ibrahim, J.G.; Chen, M.H.; Sinha, D. On optimality properties of the power prior. J. Am. Stat. Assoc. 2003, 98, 204–213.
17. Hobbs, B.P.; Carlin, B.P.; Mandrekar, S.J.; Sargent, D.J. Hierarchical commensurate and power prior models for adaptive incorporation of historical information in clinical trials. Biometrics 2011, 67, 1047–1056.
18. Ibrahim, J.G.; Chen, M.H.; Gwon, Y.; Chen, F. The power prior: Theory and applications. Stat. Med. 2015, 34, 3724–3749.
19. Agliari, A.; Parisetti, C.C. A-g Reference Informative Prior: A Note on Zellner’s g-Prior. J. R. Stat. Soc. Ser. D 1988, 37, 271–275.
20. van Zwet, E. A default prior for regression coefficients. Stat. Methods Med. Res. 2019, 28, 3799–3807.
21. Plummer, M. JAGS: A Program for Analysis of Bayesian Graphical Models Using Gibbs Sampling. In Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003), Vienna, Austria, 20–22 March 2003; Hornik, K., Leisch, F., Zeileis, A., Eds.; ISSN 1609-395X.
22. Su, Y.S.; Yajima, M. R2jags: Using R to Run ‘JAGS’; R Package Version 0.5-7; 2015.
23. Hanson, T.E.; Branscum, A.J.; Johnson, W.O. Informative g-Priors for Logistic Regression. Bayesian Anal. 2014, 9, 597–612.
24. Lally, N.R. The Informative g-Prior vs. Common Reference Priors for Binomial Regression with an Application to Hurricane Electrical Utility Asset Damage Prediction. Master’s Thesis, University of Connecticut, Mansfield, CT, USA, 31 July 2015.
25. Carlin, B.P.; Gelfand, A.E. An iterative Monte Carlo method for nonconjugate Bayesian analysis. Stat. Comput. 1991, 1, 119–128.
26. Liu, C.; Martin, R.; Syring, N. Efficient simulation from a gamma distribution with small shape parameter. Comput. Stat. 2017, 32, 1767–1775.
27. Gabry, J.; Simpson, D.; Vehtari, A.; Betancourt, M.; Gelman, A. Visualization in Bayesian workflow. J. R. Stat. Soc. Ser. A 2019, 182, 389–402.
28. Gelman, A.; Simpson, D.; Betancourt, M. The Prior Can Often Only Be Understood in the Context of the Likelihood. Entropy 2017, 19, 555.
29. Wesner, J.S.; Pomeranz, J.P.F. Choosing priors in Bayesian ecological models by simulating from the prior predictive distribution. Ecosphere 2021, 12, e03739.
30. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021.
31. Murphy, K.P. Conjugate Bayesian Analysis of the Gaussian Distribution; Technical Report; University of British Columbia: Vancouver, BC, Canada, 3 October 2007.
32. Berger, J.O.; Pericchi, L.R.; Ghosh, J.; Samanta, T.; De Santis, F. Objective Bayesian methods for model selection: Introduction and comparison. Lect. Notes Monogr. Ser. 2001, 38, 135–207.
33. Gelman, A. Prior distributions for variance parameters in hierarchical models. Bayesian Anal. 2006, 1, 515–533.
34. Box, G.E.P.; Tiao, G.C. Bayesian Inference in Statistical Analysis; Addison-Wesley: Reading, MA, USA, 1973.
35. Daniels, M.J. A prior for the variance in hierarchical models. Can. J. Stat. 1999, 27, 567–578.
36. Wang, M. Mixtures of g-priors for analysis of variance models with a diverging number of parameters. Bayesian Anal. 2017, 12, 511–532.
37. Lin, P.E. Some characterizations of the multivariate t distribution. J. Multivar. Anal. 1972, 2, 339–344.
38. Kass, R.E.; Natarajan, R. A default conjugate prior for variance components in generalized linear mixed models (Comment on article by Browne and Draper). Bayesian Anal. 2006, 1, 535–542.
39. Natarajan, R.; Kass, R.E. Reference Bayesian methods for generalized linear mixed models. J. Am. Stat. Assoc. 2000, 95, 227–237.
40. Huang, A.; Wand, M.P. Simple marginally noninformative prior distributions for covariance matrices. Bayesian Anal. 2013, 8, 439–452.
41. Demirhan, H.; Kalaylioglu, Z. Joint prior distributions for variance parameters in Bayesian analysis of normal hierarchical models. J. Multivar. Anal. 2015, 135, 163–174.
42. Bates, D.; Mächler, M.; Bolker, B.; Walker, S. Fitting Linear Mixed-Effects Models Using lme4. J. Stat. Softw. 2015, 67, 1–48.
43. Burdick, R.K.; Borror, C.M.; Montgomery, D.C. Design and Analysis of Gauge R and R Studies: Making Decisions with Confidence Intervals in Random and Mixed ANOVA Models; SIAM: Philadelphia, PA, USA, 2005.
44. Spiegelhalter, D.; Thomas, A.; Best, N.; Lunn, D. WinBUGS User Manual, Version 1.4; Medical Research Council Biostatistics Unit: Cambridge, UK, 2003.
45. Sargent, D.J.; Hodges, J.S.; Carlin, B.P. Structured Markov Chain Monte Carlo. J. Comput. Graph. Stat. 2000, 9, 217–234.
46. Haario, H.; Saksman, E.; Tamminen, J. An Adaptive Metropolis Algorithm. Bernoulli 2001, 7, 223–242.
47. Browne, W.J.; Draper, D. A comparison of Bayesian and likelihood-based methods for fitting multilevel models. Bayesian Anal. 2006, 1, 473–514.
48. Zitzmann, S.; Helm, C.; Hecht, M. Prior specification for more stable Bayesian estimation of multilevel latent variable models in small samples: A comparative investigation of two different approaches. Front. Psychol. 2021, 11, 611267.
Table 1. Simulation I: Average biases (MSEs) and coverage probabilities (interval widths) across 500 MC replicates in the simulation study for parameter estimation. Here, new-true, new-hist and new-none correspond to the three hyperprior versions (V1), (V2) and (V3), respectively.
Method | Bias (MSE): $\beta_1$, $(\beta_2, \beta_3)$, $\sigma^2$ | Coverage (Width): $\beta_1$, $\beta_2$, $\beta_3$, $\sigma^2$
n = 100
new-true | 0.041 (0.0196) | −0.050 (0.0894) | −0.012 (0.0092) | 0.94 (0.55) | 0.97 (0.86) | 0.96 (0.86) | 0.98 (0.43)
new-hist | 0.038 (0.0202) | −0.045 (0.0906) | −0.011 (0.0157) | 0.95 (0.56) | 0.98 (0.88) | 0.97 (0.87) | 0.96 (0.51)
new-none | 0.022 (0.0205) | −0.031 (0.0948) | 0.011 (0.0206) | 0.95 (0.57) | 0.98 (0.90) | 0.97 (0.90) | 0.95 (0.57)
benchmark | −0.012 (0.0194) | 0.003 (0.1049) | 0.006 (0.0205) | 0.96 (0.57) | 0.97 (0.91) | 0.96 (0.92) | 0.95 (0.57)
EB | 0.018 (0.0212) | −0.027 (0.0961) | 0.023 (0.0219) | 0.94 (0.56) | 0.97 (0.90) | 0.96 (0.90) | 0.95 (0.58)
hyper-g | 0.037 (0.0231) | −0.047 (0.0906) | 0.034 (0.0229) | 0.95 (0.59) | 0.97 (0.89) | 0.97 (0.89) | 0.95 (0.60)
n = 500
new-true | 0.010 (0.0043) | −0.012 (0.0216) | 0.004 (0.0034) | 0.96 (0.25) | 0.95 (0.40) | 0.94 (0.40) | 0.96 (0.24)
new-hist | 0.010 (0.0044) | −0.012 (0.0217) | −0.001 (0.0038) | 0.95 (0.25) | 0.94 (0.40) | 0.94 (0.40) | 0.96 (0.24)
new-none | 0.007 (0.0044) | −0.008 (0.0219) | 0.003 (0.0041) | 0.95 (0.25) | 0.95 (0.40) | 0.94 (0.40) | 0.96 (0.25)
benchmark | 0.000 (0.0043) | −0.002 (0.0223) | 0.003 (0.0041) | 0.95 (0.25) | 0.94 (0.40) | 0.94 (0.40) | 0.96 (0.25)
EB | 0.006 (0.0044) | −0.007 (0.0219) | 0.006 (0.0042) | 0.95 (0.25) | 0.94 (0.40) | 0.94 (0.40) | 0.96 (0.25)
hyper-g | 0.009 (0.0045) | −0.011 (0.0217) | 0.008 (0.0042) | 0.95 (0.26) | 0.95 (0.40) | 0.94 (0.40) | 0.95 (0.25)
Table 2. Simulation I: Average $\|X\beta - X\hat{\beta}\|^2/n$ values across 200 MC replicates in the simulation study for variable selection.
Method | Size = 1 | Size = 2 | Size = 3 | Size = 4 | Size = 7 | Size = 10 | Size = 13 | Size = 16
(C1) OLS estimation using the selected model
new-true | 0.064 | 0.076 | 0.079 | 0.095 | 0.108 | 0.116 | 0.130 | 0.138
new-hist | 0.062 | 0.074 | 0.078 | 0.095 | 0.107 | 0.116 | 0.130 | 0.137
new-none | 0.056 | 0.072 | 0.079 | 0.094 | 0.107 | 0.116 | 0.130 | 0.137
benchmark | 0.025 | 0.046 | 0.052 | 0.071 | 0.100 | 0.126 | 0.145 | 0.159
EB | 0.110 | 0.087 | 0.081 | 0.095 | 0.108 | 0.115 | 0.130 | 0.137
hyper-g | 0.094 | 0.081 | 0.079 | 0.093 | 0.107 | 0.114 | 0.131 | 0.138
(C2) Bayesian estimation using the true model
new-true | 0.000 | 0.012 | 0.020 | 0.029 | 0.042 | 0.058 | 0.069 | 0.078
new-hist | 0.007 | 0.013 | 0.021 | 0.030 | 0.044 | 0.061 | 0.072 | 0.083
new-none | 0.010 | 0.015 | 0.023 | 0.032 | 0.047 | 0.063 | 0.074 | 0.085
benchmark | 0.010 | 0.016 | 0.024 | 0.034 | 0.057 | 0.079 | 0.103 | 0.127
EB | 0.010 | 0.016 | 0.025 | 0.034 | 0.049 | 0.065 | 0.076 | 0.088
hyper-g | 0.010 | 0.016 | 0.025 | 0.034 | 0.048 | 0.064 | 0.075 | 0.086
(C3) Bayesian estimation using the full model
new-true | 0.017 | 0.050 | 0.060 | 0.069 | 0.072 | 0.078 | 0.079 | 0.078
new-hist | 0.028 | 0.056 | 0.065 | 0.073 | 0.077 | 0.082 | 0.083 | 0.083
new-none | 0.026 | 0.058 | 0.067 | 0.075 | 0.079 | 0.083 | 0.084 | 0.085
benchmark | 0.154 | 0.132 | 0.126 | 0.131 | 0.131 | 0.127 | 0.129 | 0.127
EB | 0.018 | 0.057 | 0.069 | 0.078 | 0.082 | 0.087 | 0.087 | 0.088
hyper-g | 0.023 | 0.057 | 0.067 | 0.075 | 0.080 | 0.084 | 0.085 | 0.086
Table 3. Simulation II: Average biases (MSEs) and coverage probabilities (interval widths) across 500 MC replicates in the simulation study for the random one-way ANOVA model.
Method | Bias (MSE): $\beta_1$, $\gamma_i$, $\sigma^2$, $\sigma_r^2$ | Coverage (Width): $\beta_1$, $\gamma_i$, $\sigma^2$, $\sigma_r^2$
$\sigma_r^2 = 0.25$, $n_i \sim \mathrm{sample}(10, 15)$
new-true | −0.000 (0.000) | −0.015 (0.619) | −0.010 (0.007) | 0.010 (0.007) | - | 0.94 (0.94) | 0.97 (0.34) | 0.97 (0.34)
new-hist | −0.000 (0.018) | −0.0003 (0.711) | −0.0001 (0.012) | 0.055 (0.019) | 0.96 (0.55) | 0.95 (1.05) | 0.95 (0.46) | 0.97 (0.57)
new-none | 0.018 (0.035) | −0.145 (0.796) | 0.002 (0.015) | 0.094 (0.034) | 0.96 (0.78) | 0.95 (1.15) | 0.96 (0.51) | 0.98 (0.85)
unif $\sigma_r$ | 0.017 (0.035) | −0.139 (0.814) | 0.019 (0.016) | 0.140 (0.067) | 0.97 (0.84) | 0.95 (1.20) | 0.95 (0.52) | 0.96 (1.13)
unif $\sigma_r^2$ | 0.018 (0.035) | −0.145 (0.801) | 0.015 (0.016) | 0.267 (0.140) | 0.98 (0.96) | 0.98 (1.30) | 0.95 (0.52) | 0.95 (1.64)
gamma | 0.017 (0.035) | −0.133 (0.846) | 0.026 (0.017) | 0.057 (0.040) | 0.94 (0.77) | 0.93 (1.12) | 0.96 (0.53) | 0.93 (0.88)
shrink | 0.018 (0.035) | −0.142 (0.796) | 0.007 (0.015) | 0.125 (0.046) | 0.97 (0.83) | 0.96 (1.19) | 0.96 (0.51) | 0.98 (0.99)
$\sigma_r^2 = 0.5$, $n_i \sim \mathrm{sample}(10, 15)$
new-true | −0.0000 (0.000) | −0.0006 (0.685) | 0.012 (0.011) | −0.0012 (0.011) | - | 0.94 (1.01) | 0.96 (0.43) | 0.96 (0.43)
new-hist | −0.0000 (0.029) | 0.003 (0.892) | 0.010 (0.013) | 0.056 (0.045) | 0.96 (0.71) | 0.95 (1.20) | 0.96 (0.49) | 0.95 (0.89)
new-none | 0.024 (0.059) | −0.206 (1.092) | 0.007 (0.015) | 0.109 (0.079) | 0.96 (0.98) | 0.96 (1.34) | 0.95 (0.51) | 0.98 (1.32)
unif $\sigma_r$ | 0.023 (0.059) | −0.199 (1.099) | 0.018 (0.016) | 0.264 (0.211) | 0.97 (1.14) | 0.97 (1.46) | 0.96 (0.52) | 0.97 (2.02)
unif $\sigma_r^2$ | 0.023 (0.059) | −0.195 (1.092) | 0.016 (0.016) | 0.471 (0.434) | 0.99 (1.27) | 0.98 (1.57) | 0.96 (0.52) | 0.95 (2.90)
gamma | 0.022 (0.059) | −0.185 (1.115) | 0.021 (0.016) | 0.135 (0.127) | 0.96 (1.05) | 0.96 (1.38) | 0.95 (0.53) | 0.95 (1.61)
shrink | 0.024 (0.059) | −0.205 (1.092) | 0.011 (0.015) | 0.168 (0.112) | 0.97 (1.07) | 0.96 (1.41) | 0.95 (0.52) | 0.98 (1.57)
Table 4. Simulation III: Average biases (MSEs) and coverage probabilities (interval widths) across 500 MC replicates in the simulation study for the random intercept model.
Method | Bias (MSE): $\beta_1$, $(\beta_2, \beta_3)$, $\gamma_i$, $\sigma^2$, $\sigma_r^2$ | Coverage (Width): $\beta_1$, $\beta_2$, $\beta_3$, $\gamma_i$, $\sigma^2$, $\sigma_r^2$
$\sigma_r^2 = 0.25$, $n_i \sim \mathrm{sample}(10, 15)$
new-true | 0.025 (0.034) | −0.0032 (0.085) | 0.053 (0.772) | 0.038 (0.017) | 0.084 (0.022) | 0.98 (0.81) | 0.95 (0.82) | 0.95 (0.82) | 0.96 (1.15) | 0.96 (0.52) | 0.98 (0.64)
new-hist | 0.028 (0.035) | −0.0034 (0.085) | 0.046 (0.787) | 0.029 (0.017) | 0.083 (0.026) | 0.97 (0.82) | 0.94 (0.81) | 0.94 (0.81) | 0.96 (1.14) | 0.96 (0.52) | 0.98 (0.70)
new-none | 0.025 (0.043) | −0.0034 (0.085) | 0.076 (0.828) | 0.027 (0.017) | 0.106 (0.039) | 0.94 (0.83) | 0.94 (0.81) | 0.94 (0.81) | 0.95 (1.15) | 0.97 (0.53) | 0.97 (0.88)
unif $\sigma_r$ | −0.0011 (0.042) | 0.002 (0.090) | 0.074 (0.846) | 0.027 (0.018) | 0.154 (0.075) | 0.96 (0.93) | 0.94 (0.83) | 0.94 (0.83) | 0.96 (1.22) | 0.96 (0.53) | 0.96 (1.15)
unif $\sigma_r^2$ | −0.0011 (0.041) | 0.002 (0.090) | 0.072 (0.832) | 0.022 (0.017) | 0.273 (0.146) | 0.98 (1.03) | 0.94 (0.82) | 0.95 (0.83) | 0.97 (1.31) | 0.97 (0.53) | 0.93 (1.55)
gamma | −0.0010 (0.042) | 0.001 (0.090) | 0.075 (0.941) | 0.052 (0.022) | 0.039 (0.043) | 0.94 (0.84) | 0.94 (0.84) | 0.95 (0.84) | 0.92 (1.12) | 0.95 (0.57) | 0.96 (0.94)
shrink | −0.0010 (0.041) | 0.001 (0.090) | 0.072 (0.827) | 0.015 (0.016) | 0.138 (0.052) | 0.96 (0.93) | 0.94 (0.82) | 0.94 (0.82) | 0.96 (1.21) | 0.97 (0.52) | 0.97 (1.00)
$\sigma_r^2 = 0.5$, $n_i \sim \mathrm{sample}(10, 15)$
new-true | 0.023 (0.046) | −0.0032 (0.086) | 0.071 (0.976) | 0.035 (0.017) | 0.055 (0.028) | 0.97 (0.94) | 0.94 (0.82) | 0.95 (0.82) | 0.96 (1.28) | 0.96 (0.52) | 0.99 (0.82)
new-hist | 0.025 (0.050) | −0.0032 (0.086) | 0.061 (1.018) | 0.028 (0.017) | 0.069 (0.048) | 0.97 (0.96) | 0.94 (0.81) | 0.94 (0.82) | 0.96 (1.29) | 0.96 (0.52) | 0.98 (1.00)
new-none | 0.019 (0.066) | −0.0031 (0.086) | 0.109 (1.129) | 0.029 (0.018) | 0.131 (0.090) | 0.93 (0.99) | 0.94 (0.82) | 0.95 (0.82) | 0.95 (1.31) | 0.96 (0.53) | 0.96 (1.38)
unif $\sigma_r$ | −0.0014 (0.065) | 0.002 (0.091) | 0.107 (1.134) | 0.025 (0.018) | 0.292 (0.238) | 0.97 (1.21) | 0.94 (0.83) | 0.95 (0.83) | 0.97 (1.48) | 0.97 (0.53) | 0.95 (2.07)
unif $\sigma_r^2$ | −0.0015 (0.065) | 0.002 (0.091) | 0.107 (1.127) | 0.023 (0.017) | 0.481 (0.445) | 0.98 (1.33) | 0.94 (0.83) | 0.94 (0.83) | 0.98 (1.59) | 0.97 (0.53) | 0.93 (2.73)
gamma | −0.0014 (0.066) | 0.001 (0.091) | 0.106 (1.208) | 0.042 (0.021) | 0.133 (0.145) | 0.94 (1.11) | 0.94 (0.84) | 0.95 (0.84) | 0.95 (1.40) | 0.95 (0.57) | 0.96 (1.74)
shrink | −0.0013 (0.065) | 0.002 (0.091) | 0.102 (1.127) | 0.019 (0.017) | 0.192 (0.126) | 0.96 (1.15) | 0.94 (0.83) | 0.95 (0.83) | 0.96 (1.43) | 0.97 (0.53) | 0.97 (1.60)
Table 5. Simulation IV: Average biases (MSEs) and coverage probabilities (interval widths) across 500 MC replicates in the simulation study for the general mixed model. Here, the suffix -i refers to the prior in (22) with $\gamma_i \mid g \stackrel{ind.}{\sim} N_k\left(0, \frac{g}{n_i w_i}(Z_i' Z_i)^{-1}\right)$; KN refers to the prior introduced in [38].
Method | Bias (MSE): $\beta_1$, $(\beta_2, \beta_3)$, $\gamma_i$, $\sigma^2$, $\sigma_r^2$ | Coverage (Width): $\beta_1$, $\beta_2$, $\beta_3$, $\gamma_{1i}$, $\gamma_{2i}$, $\sigma^2$, $\sigma_r^2$
$\sigma_r^2 = 0.5$, $n_i \sim \mathrm{sample}(10, 15)$
new-hist | 0.084 (0.048) | −0.0071 (0.020) | −0.160 (2.078) | 0.037 (0.019) | 0.093 (0.063) | 0.96 (0.90) | 0.94 (0.49) | 0.96 (0.23) | 0.93 (1.40) | 0.91 (0.80) | 0.95 (0.55) | 0.95 (0.98)
new-hist-i | 0.077 (0.049) | −0.0062 (0.018) | −0.184 (2.045) | 0.030 (0.018) | 0.038 (0.043) | 0.94 (0.86) | 0.92 (0.45) | 0.96 (0.23) | 0.94 (1.46) | 0.94 (0.90) | 0.96 (0.55) | 0.96 (0.87)
new-none | 0.086 (0.056) | −0.0069 (0.021) | −0.189 (2.129) | 0.035 (0.019) | 0.119 (0.091) | 0.94 (0.92) | 0.94 (0.50) | 0.95 (0.23) | 0.93 (1.41) | 0.91 (0.81) | 0.95 (0.56) | 0.94 (1.16)
new-none-i | 0.081 (0.057) | −0.0063 (0.019) | −0.211 (2.101) | 0.026 (0.019) | 0.045 (0.053) | 0.92 (0.87) | 0.91 (0.46) | 0.96 (0.23) | 0.94 (1.47) | 0.94 (0.90) | 0.96 (0.55) | 0.96 (0.97)
KN | 0.010 (0.050) | 0.005 (0.019) | −0.167 (2.209) | 0.046 (0.019) | - | 0.96 (0.95) | 0.95 (0.53) | 0.96 (0.23) | 0.93 (1.40) | 0.91 (0.81) | 0.96 (0.55) | -
$\sigma_r^2 = 1$, $n_i \sim \mathrm{sample}(10, 15)$
new-hist | 0.091 (0.074) | −0.103 (0.035) | 0.096 (2.735) | 0.019 (0.019) | 0.125 (0.165) | 0.96 (1.11) | 0.89 (0.60) | 0.95 (0.23) | 0.94 (1.62) | 0.90 (0.94) | 0.94 (0.55) | 0.93 (1.47)
new-hist-i | 0.080 (0.070) | −0.0089 (0.028) | 0.068 (2.544) | 0.013 (0.019) | 0.010 (0.114) | 0.95 (1.06) | 0.91 (0.57) | 0.95 (0.23) | 0.95 (1.66) | 0.94 (1.01) | 0.94 (0.54) | 0.95 (1.32)
new-none | 0.086 (0.091) | −0.0096 (0.036) | 0.083 (2.848) | 0.021 (0.020) | 0.240 (0.330) | 0.94 (1.16) | 0.90 (0.63) | 0.95 (0.23) | 0.94 (1.66) | 0.91 (0.97) | 0.94 (0.55) | 0.91 (1.98)
new-none-i | 0.080 (0.087) | −0.0088 (0.029) | 0.055 (2.670) | 0.013 (0.019) | 0.057 (0.172) | 0.92 (1.09) | 0.90 (0.58) | 0.95 (0.23) | 0.94 (1.67) | 0.94 (1.02) | 0.94 (0.55) | 0.92 (1.61)
KN | −0.0014 (0.088) | 0.002 (0.035) | 0.104 (3.050) | 0.053 (0.023) | - | 0.95 (1.23) | 0.93 (0.70) | 0.95 (0.23) | 0.93 (1.68) | 0.91 (1.00) | 0.94 (0.58) | -
