Approximated Information Analysis in Bayesian Inference

In models with nuisance parameters, Bayesian procedures based on Markov Chain Monte Carlo (MCMC) methods have been developed to approximate the posterior distribution of the parameter of interest. Because these procedures require burdensome computations related to the use of MCMC, approximation and convergence in these procedures are important issues. In this paper, we explore Gibbs sensitivity by using an alternative to the full conditional distribution of the nuisance parameter. The approximate sensitivity of the posterior distribution of interest is studied in terms of an information measure, including Kullback–Leibler divergence. As an illustration, we then apply these results to simple spatial model settings.


Introduction
Let d denote the data, which can be scalar- or vector-valued, and suppose that d ∼ p(d|θ, β), where θ ∈ Θ is the parameter of interest and β ∈ B is a nuisance parameter. Realizations from the joint posterior distribution π(θ, β|d) can be produced by independent sampling based on π(θ, β|d) = π(θ|β, d) × π(β|d) or, if π(β|d) is intractable, by Gibbs sampling based on the full conditional distributions π(θ|β, d) and π(β|θ, d). Since β is a nuisance parameter, our primary interest is in the marginal posterior distribution π(θ|d) = ∫ π(θ, β|d) dβ. In general, it is often feasible to integrate out some nuisance parameters either analytically or numerically. Missing data problems are brought into this framework by augmenting the observed data d with latent data β.
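The Gibbs route described above can be illustrated on a toy target. The following is a minimal sketch, assuming purely for illustration that the joint posterior of (θ, β) is a standard bivariate normal with correlation rho, so that both full conditionals are univariate normals. The function name gibbs_bivariate_normal and all parameter values are hypothetical and do not come from the paper.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter=5000, burn_in=500, seed=0):
    """Gibbs sampler for a toy joint 'posterior' of (theta, beta):
    a standard bivariate normal with correlation rho (illustrative stand-in)."""
    rng = np.random.default_rng(seed)
    cond_sd = np.sqrt(1.0 - rho**2)  # standard deviation of each full conditional
    theta, beta = 0.0, 0.0           # arbitrary starting values
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        theta = rng.normal(rho * beta, cond_sd)  # draw from pi(theta | beta, d)
        beta = rng.normal(rho * theta, cond_sd)  # draw from pi(beta | theta, d)
        draws[t] = theta, beta
    return draws[burn_in:]                       # discard the burn-in iterations

draws = gibbs_bivariate_normal(rho=0.6)
theta_draws = draws[:, 0]  # marginal draws of theta: the beta column is simply ignored
```

Keeping only the θ column of the joint draws is the sampling counterpart of integrating β out of π(θ, β|d).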
The latent variable/nuisance parameter scenario is commonly studied in the literature. One issue in nuisance parameter problems is the relationship between π(β|d) and π(θ|d). Our main interest lies in the sensitivity of inferences based on the target marginal posterior distribution π(θ|d) when π(β|d) is replaced by π*(β|d), an alternative to the posterior distribution of the nuisance parameter. The point is that π*(β|d) is a flexible and manageable approximation to an unmanageable π(β|d). For application to simple spatial model settings, we can consider the Gaussian approximation or the Laplace approximation to π(β|d). For example, π*(β|d) can be built from π_G(θ|β, d), the Gaussian approximation to π(θ|β, d). Under "standard conditions," the Laplace approximation of a marginal posterior density has the error rate O(n⁻¹) [1].
State-of-the-art Markov Chain Monte Carlo (MCMC) approaches to posterior inference typically revolve around reparameterizations. Yu and Meng introduce an alternative strategy for boosting MCMC efficiency by simply interweaving, but not alternating, two parameterizations, namely the centered parameterization and the non-centered parameterization, to ensure effective MCMC implementation [2]. Filippone and Girolami present a pseudo-marginal MCMC approach to account for uncertainty in the model parameters when making model-based predictions on out-of-sample data [3]. Attias presents the Variational Bayes framework, which provides a solution for the structure of models with latent variables [4]. Here, the Kullback-Leibler divergence between the posterior and an approximation, typically from an exponential family, is minimized [4]. Expectation propagation is similar in nature [5]. The integrated nested Laplace approximation is considered for approximate Bayesian inference in latent Gaussian models [6].
In this paper, we explore Gibbs sensitivity by using an alternative to the full conditional distribution of the nuisance parameter. The approximate sensitivity of the posterior distribution of interest is studied in terms of information measures, including the Kullback-Leibler divergence. As an illustration, we apply the proposed approach to the PRUDENCE (Prediction of Regional scenarios and Uncertainties for Defining EuropeaN Climate change risks and Effects; http://prudence.dmi.dk/) ensemble of regional climate models over Central Europe (about 8000 grid points), which involves the analysis of large quantities of data. Furrer and Sain combine two techniques, namely tapering and backfitting, to model and analyze these spatial datasets [7]. Kim and Kim propose an approximate likelihood function of the spatial correlation parameter based on PRUDENCE data [8]. This paper thus provides some information analysis for their approaches.
The rest of the paper is organized as follows. In Section 2, we describe the approximation setting in the Bayesian computation and discuss some of its sensitivity issues. Various theoretical results on the information analysis are provided in the subsequent section. As an illustration, the results from Section 3 are applied to simple spatial model settings in Section 4, and Section 5 concludes the study.

Sensitivity Issues
Let π(θ|β, d) and π(β|θ, d) be the full conditional distributions for the joint posterior distribution π(θ, β|d), which can be expressed in terms of these full conditional distributions. Consistency conditions on the full conditional distributions are required to reconstruct the joint posterior distribution (e.g., see the Hammersley-Clifford Theorem in [9]). Let π*(β|θ, d) be an approximated full conditional distribution of π(β|θ, d). Under the regularity condition with reference points β₀ and θ₀, the exact and approximated joint posterior distributions can each be written in terms of the corresponding full conditional distributions and the reference points, up to the normalizing constants C and C*, respectively.
Alternatively, the bounds for the (log) ratio of π(θ|d) and π * (θ|d) can be obtained in terms of the full conditional distribution of the nuisance parameter π(β|θ, d) (see [10]).
In practice, the marginal posterior distribution of the parameter of interest is hard to calculate analytically. Suppose π(θ|β, d) is a smooth (positive) function of β. Writing π(θ|d) = ∫ π(θ|β, d) π(β|d) dβ, Laplace's method can be used to approximate the integral. Assume β̂ maximizes log π(β|d). Then π(θ|d) can be well approximated by an expansion around β̂, with a factor that depends on m, the dimension of β. Laplace's method requires three conditions, referred to as Laplace regularity: (1) the integrals in the equation must exist and be finite; (2) the determinant of the Hessians must be bounded away from zero at the optimizers; and (3) the log-likelihood must be differentiable with respect to the parameters, with all the partial derivatives bounded in the neighborhood of the optimizers. These conditions imply, under mild assumptions, the asymptotic normality of the posterior.
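A numerical check may help fix ideas. The sketch below uses the standard first-order Laplace formula π(θ|d) ≈ π(θ|β̂, d) π(β̂|d) (2π)^(m/2) |Σ̂|^(1/2), where Σ̂ is the inverse of the negative Hessian of log π(β|d) at β̂. This textbook form is an assumption here and may differ in notation from the paper's own display. The toy model (β|d ∼ N(0, 1/n), θ|β, d ∼ N(β, 1), scalar β so m = 1) and the names laplace_marginal and toy_comparison are hypothetical.

```python
import numpy as np
from scipy import optimize, stats

def laplace_marginal(theta, log_post_beta, cond_density, eps=1e-4):
    """First-order Laplace approximation to
    pi(theta|d) = integral of pi(theta|beta, d) * pi(beta|d) d(beta), scalar beta."""
    # Posterior mode of the nuisance parameter.
    beta_hat = optimize.minimize_scalar(lambda b: -log_post_beta(b)).x
    # Curvature of log pi(beta|d) at the mode, by central differences.
    curv = (log_post_beta(beta_hat + eps) - 2.0 * log_post_beta(beta_hat)
            + log_post_beta(beta_hat - eps)) / eps**2
    sigma2 = -1.0 / curv  # inverse of the negative second derivative
    m = 1                 # dimension of beta in this toy setting
    return (cond_density(theta, beta_hat) * np.exp(log_post_beta(beta_hat))
            * (2.0 * np.pi) ** (m / 2) * np.sqrt(sigma2))

def toy_comparison(theta, n):
    """Return (Laplace approximation, exact marginal) for the toy Gaussian model."""
    sd_beta = 1.0 / np.sqrt(n)  # assumed posterior sd of beta shrinks like n^(-1/2)
    log_post = lambda b: stats.norm.logpdf(b, loc=0.0, scale=sd_beta)
    cond = lambda t, b: stats.norm.pdf(t, loc=b, scale=1.0)
    approx = laplace_marginal(theta, log_post, cond)
    exact = stats.norm.pdf(theta, loc=0.0, scale=np.sqrt(1.0 + 1.0 / n))
    return approx, exact

approx_50, exact_50 = toy_comparison(theta=0.3, n=50)
approx_200, exact_200 = toy_comparison(theta=0.3, n=200)
rel_err_50 = abs(approx_50 / exact_50 - 1.0)
rel_err_200 = abs(approx_200 / exact_200 - 1.0)
```

In this toy model the exact marginal is N(0, 1 + 1/n), so the relative error of the approximation shrinks as n grows, in line with the O(n⁻¹) rate quoted in the Introduction.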
Based on the above results, the bounds for the differences between the marginal posterior distributions can be approximated. Suppose that |π(β|d) − π*(β|d)| has a unique maximum at β̂. Then the difference between π(θ|d) and π*(θ|d) can be bounded up to an O(n⁻¹) term that does not depend on θ.
Theorem 2. Suppose that β̂ and β̂* maximize π(β|d) and π*(β|d), respectively. Then the Kullback-Leibler divergence I(π, π*) between π(θ|d) and π*(θ|d) admits a first-order approximation Î₁(π, π*). Under weak conditions allowing the exchange of integral and limit, it can be shown that Î₁(π, π*) converges to I(π, π*). Suppose that both π(β|d) and π*(β|d) are maximized at β̂, a posterior mode. Then Î₁(π, π*) can be expressed in terms of π(β|d) and π*(β|d) evaluated at β̂. Note that the first-order approximation Î₁(π, π*) depends only on the marginal posterior distributions of the nuisance parameter, π(β|d) and π*(β|d), and not on the full conditional distribution of the parameter of interest, π(θ|β, d). Based on the asymptotic properties of the posterior distributions (which can be achieved easily under fairly general conditions when the true value of the parameter is in the support of the prior), a Gaussian distribution with mean β̂, the generalized MLE of β, and the corresponding asymptotic variance can be used as π*(β|d). For a sufficiently large n, π*(β|d) is then maximized at β̂, the posterior mode of π(β|d).
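To give a feel for how such a divergence behaves when π*(β|d) is a Gaussian centered at the posterior mode, the sketch below computes the exact Kullback-Leibler divergence by numerical integration. It is not the first-order approximation of Theorem 2. As a purely hypothetical stand-in, π(β|d) is taken to be a Gamma(shape, rate) density, with the shape playing the role of the sample size, and π*(β|d) is the Gaussian matched to the mode and to the curvature of log π(β|d) at the mode. The function name kl_gamma_vs_gaussian is an assumption.

```python
import numpy as np
from scipy import integrate, stats

def kl_gamma_vs_gaussian(shape, rate=1.0):
    """KL( pi || pi* ) where pi is Gamma(shape, rate) (requires shape > 1) and
    pi* is the Gaussian matched to the mode and curvature of log pi."""
    mode = (shape - 1.0) / rate
    sd = np.sqrt(shape - 1.0) / rate  # from -(d^2/d beta^2) log pi = rate^2 / (shape - 1) at the mode
    post = stats.gamma(a=shape, scale=1.0 / rate)
    approx = stats.norm(loc=mode, scale=sd)
    integrand = lambda b: post.pdf(b) * (post.logpdf(b) - approx.logpdf(b))
    # Finite integration range; the omitted tails carry negligible mass.
    lower = max(mode - 12.0 * sd, 1e-12)
    upper = mode + 12.0 * sd
    value, _ = integrate.quad(integrand, lower, upper, points=[mode], limit=200)
    return value

kl_small_n = kl_gamma_vs_gaussian(shape=10.0)
kl_large_n = kl_gamma_vs_gaussian(shape=100.0)
```

In this toy setting the divergence decreases as the shape grows, which mirrors the asymptotic normality argument used above.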

Approximation to Other Information Measures
Instead of the Kullback-Leibler divergence, other useful information measures can be used, such as a measure based on uncertainty functions [12] or a measure based on Rényi's entropy function [13]. Theorem 3. Suppose that β̂ maximizes both π(β|d) and π*(β|d). Then, under conditions similar to those in Theorem 2, first-order approximations analogous to Î₁(π, π*) can be obtained for these measures.

Illustrative Example
We consider an approach to approximating the likelihood function of the spatial correlation parameters in the Gaussian random field. Consider the simple Gaussian random field Z ∼ MVN(0, Σ(θ)), where Σ(θ) is parameterized by a variance term and a correlation function. That is, Σ(θ) = σ² R(θ). Then, the likelihood function of (σ², θ) is L(σ², θ) ∝ (σ²)^(−n/2) |R(θ)|^(−1/2) exp{−Zᵀ R(θ)⁻¹ Z / (2σ²)}, (1) where n is the number of observed locations. Note that in problems with a large spatial domain, it is not computationally feasible to evaluate the likelihood function of the spatial correlation parameters because of the cost of computing R(θ)⁻¹.
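The computational bottleneck mentioned above is easy to see in code. The following is a minimal sketch of the exact Gaussian log-likelihood for Σ(θ) = σ² R(θ), evaluated with a Cholesky factorization whose cost grows cubically in the number of locations. The correlation form exp(−d²/θ) is assumed to match the example discussed later in this section. The function name gaussian_field_loglik and the small nugget added for numerical stability are assumptions, not part of the paper.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.spatial.distance import cdist

def gaussian_field_loglik(z, coords, sigma2, theta, nugget=1e-8):
    """Exact log-likelihood of Z ~ MVN(0, sigma2 * R(theta)) at the observed field z.

    R(theta) has entries exp(-d^2 / theta), with d the distance between locations
    (assumed correlation form); `nugget` is a small diagonal jitter (assumption).
    """
    n = len(z)
    dist = cdist(coords, coords)                       # pairwise distances, n x n
    corr = np.exp(-dist**2 / theta) + nugget * np.eye(n)
    chol, lower = cho_factor(corr, lower=True)         # O(n^3): the costly step
    logdet_corr = 2.0 * np.sum(np.log(np.diag(chol)))  # log |R(theta)|
    quad_form = z @ cho_solve((chol, lower), z)        # z' R(theta)^{-1} z
    return -0.5 * (n * np.log(2.0 * np.pi * sigma2) + logdet_corr + quad_form / sigma2)

# Small usage example on a hypothetical 5 x 5 unit grid.
grid = np.array([(i, j) for i in range(5) for j in range(5)], dtype=float)
rng = np.random.default_rng(1)
z_obs = rng.normal(size=len(grid))
loglik = gaussian_field_loglik(z_obs, grid, sigma2=2.0, theta=1.0)
```

With about 8000 grid points, each likelihood evaluation requires factorizing an 8000 × 8000 matrix, which is what motivates an approximated likelihood for θ.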
The proposed approaches are also illustrated in regional climate models, which are used to model the evolution of a climate system over a limited area. These models address smaller spatial regions than global climate models. However, the higher resolution of regional climate models better captures the impact of local features such as lakes and mountains as well as subgrid-scale atmospheric processes. The PRUDENCE project involves regional models over Europe from various climate research centers (http://prudence.dmi.dk/) and employs a major archive of data at a 25-km resolution covering the 1951-2100 transient periods. In the analysis, spatial parameters are estimated based on the approximated likelihood approach using PRUDENCE data (about 8000 grid points). Here, the mean trend in the surface temperature change at location s is modeled in terms of I_land/sea(s), an indicator function for sea and land; P(s), the amount of seasonal precipitation; lon(s), the longitude; lat(s), the latitude; and elev(s), the elevation.
For the detrended surface temperature field, we consider a stationary Gaussian spatial process with an exponential covariance function, σ² exp(−d²/(2ξ²)). For simplicity, we also assume that σ² is known and θ = 2ξ². Now we consider an approximated likelihood function for θ. Considering a log transformation of the correlation function ρ leads to the approximated likelihood function (3), which is expressed in terms of θ̂, the MLE of the spatial parameter θ. Note that the model coefficients (β₀, β₁, β₂, β₃, β₄) are the parameters of interest and that the correlation parameter θ is the nuisance parameter in this example. Here, the full conditional distribution of the parameters of interest, π(β₀, β₁, β₂, β₃, β₄|θ, d), can be obtained based on the model of the mean trend in the surface temperature change, while the true marginal density of the nuisance parameter, π(β|d), and the approximate marginal density of the nuisance parameter, π*(β|d), can be computed by using likelihood functions (1) and (3), respectively. Therefore, the full conditional distribution of the parameters of interest, (β₀, β₁, β₂, β₃, β₄), can be expressed in terms of D = {d(1), d(2), ..., d(n)}, the observation vector, and T = {T(1), T(2), ..., T(n)}, the mean trend vector in (2). The approximated full conditional distribution of the correlation parameter θ is given by (5), whereas the exact full conditional distribution is quite similar to (5). Further, θ̂ is the MLE of the spatial parameter θ in (4). The reference points for (β₀, β₁, β₂, β₃, β₄) and θ are randomly chosen in the neighborhood of the MLEs of the parameters. For more details, see [8].
In the next step, various grid points (n = 400, 900, 1600) are randomly chosen and then eliminated as in the simulation study. Table 1 provides the first-order-approximated Kullback-Leibler divergence along with the exact Kullback-Leibler divergence under various settings. The estimated information measures are quite efficient and competitive because the seasonal mean surface temperature fields from the global climate model are already smoothed. However, the approximated likelihood function of the correlation parameter is not well estimated, particularly when the number of observations is less than 100.

Table 1. First-order-approximated Kullback-Leibler divergence and the exact Kullback-Leibler divergence under the various settings of the grid points, by sampled grid size.

Summary
We introduced various ways of assessing the sensitivity of the target posterior distribution of the parameter of interest when an alternative to the full conditional distribution of the nuisance parameter is used. By using Laplace's method, the approximated Kullback-Leibler divergence between π(θ|d) and π*(θ|d) was also calculated in terms of the entropy of π(β|d) and π*(β|d) at the generalized MLE β̂. Other information measures provided similar results. However, it is still difficult to check analytically the robustness of the marginal posterior distribution of interest, π(θ|d), to the choice of the full conditional distribution of the nuisance parameter, π(β|θ, d). Nonetheless, for the general class of available marginal posterior distributions of the nuisance parameters, the sensitivity and robustness of the marginal posterior distribution of the parameters of interest can be checked approximately. In addition, we can find a reasonable and flexible substitute for the complicated full conditional distribution of the nuisance parameter under the sensitivity and robustness criteria for the target distribution and then perform inference based on this substitute. Our approach can be applied to future sensitivity analyses of the posterior predictive distribution under an approximated posterior distribution, of the Bayes factor or marginal density under the choice of prior distribution, and of the expected loss or utility under an approximated posterior distribution.