This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

We investigate the asymptotic construction of constant-risk Bayesian predictive densities under the Kullback–Leibler risk when the distributions of data and target variables are different and have a common unknown parameter. It is known that the Kullback–Leibler risk is asymptotically equal to a trace of the product of two matrices: the inverse of the Fisher information matrix for the data and the Fisher information matrix for the target variables. We assume that the trace has a unique maximum point with respect to the parameter. We construct asymptotically constant-risk Bayesian predictive densities using a prior depending on the sample size. Further, we apply the theory to the subminimax estimator problem and the prediction based on the binary regression model.

Let

We construct predictive densities for target variables based on the data. We measure the performance of the predictive density,

Then, the risk function,

For the construction of predictive densities, we consider the Bayesian predictive density defined by:

where
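As a concrete illustration of a Bayesian predictive density (a sketch of our own, not taken from the paper): for data x ~ Bin(n, θ) with a Beta(a, b) prior and a Bin(m, θ) target, the integral of the target model over the posterior has a closed beta-binomial form.

```python
from math import comb, lgamma, exp

def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def bayes_predictive(y, x, n, m, a=0.5, b=0.5):
    """Bayesian predictive mass p_pi(y | x) for a future Bin(m, theta)
    observation, given x successes in Bin(n, theta) data and a Beta(a, b)
    prior; the posterior is Beta(a + x, b + n - x), so the predictive
    density is beta-binomial."""
    return comb(m, y) * exp(
        log_beta(a + x + y, b + n - x + m - y) - log_beta(a + x, b + n - x)
    )

# Sanity check: the predictive masses sum to one over y = 0, ..., m.
total = sum(bayes_predictive(y, x=3, n=10, m=5) for y in range(6))
```

The Beta(1/2, 1/2) default and the helper names are our own choices for the sketch.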

Among various criteria, we focus on a criterion of constructing minimax predictive densities under the Kullback–Leibler risk. For simplicity, we refer to the priors generating minimax predictive densities as minimax priors. Minimax priors have been previously studied in various predictive settings; see [

Except for [

We focus on the minimax priors in predictions where the distributions,

Let g^{X,ij} and g^{Y,ij} denote the components of the inverses of the Fisher information matrices for the data and the target variables, respectively.

On the asymptotics as the sample size

is constant up to O(N^{−2}). Since the proper prior with the constant risk is a minimax prior for any finite sample size, the asymptotically constant-risk prior relates to the minimax prior; in Section 4, we verify that the asymptotically constant-risk prior agrees with the exact minimax prior in binomial examples.
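To make the trace criterion concrete, the following one-dimensional sketch (our own construction; the map q(θ) = θ^2/(θ^2 + (1 − θ)^2) mirrors the binomial example of Section 4) takes Bernoulli(θ) data and a Bernoulli(q(θ)) target, computes tr(g_X^{-1} g_Y) = g_Y(θ)/g_X(θ), and locates its maximum point numerically.

```python
def q(theta):
    """Success probability of the target variable as a function of theta."""
    return theta**2 / (theta**2 + (1 - theta) ** 2)

def dq(theta, h=1e-6):
    """Central finite difference approximation of q'(theta)."""
    return (q(theta + h) - q(theta - h)) / (2 * h)

def trace_term(theta):
    """tr(g_X^{-1} g_Y) for one observation: ratio of the Fisher
    informations of the target and data Bernoulli models."""
    g_x = 1.0 / (theta * (1 - theta))
    g_y = dq(theta) ** 2 / (q(theta) * (1 - q(theta)))
    return g_y / g_x

# Locate the maximizing parameter point on a grid over (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
theta_max = max(grid, key=trace_term)
```

In this toy model the trace is symmetric about θ = 1/2 and vanishes at the boundary, so the maximum point is interior and unique.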

When we use the prior, the N^{−1}-order term,

However, we consider the settings where there exists a unique maximum point of the trace,

Since, in our settings, the first-order term,

When there exists a unique maximum point of the trace,
up to O(N^{−2}), by making the prior dependent on the sample size,

where

The key idea is that, if the specified parameter point incurs more undue risk than the other parameter points, then more prior weight should be concentrated on that point.

Further, we clarify the subminimax estimator problem based on the mean squared error from the viewpoint of the prediction where the distributions of data and target variables are different and have a common unknown parameter. We obtain the improvement achieved by the minimax estimator over the subminimax estimators up to O(N^{−2}). The subminimax estimator problem [

In this section, we prepare the information-geometrical notations; see [. We write ∂_i for the derivative ∂/∂θ^i and use the higher-order derivatives ∂^2/∂θ^i∂θ^j, ∂^3/∂θ^i∂θ^j∂θ^k and ∂^4/∂θ^i∂θ^j∂θ^k∂θ^l; the associated geometrical quantities carry the index sets ij, ijk and ijkl. We denote the expectation with respect to the data, x^{(N)}, by E_X and the expectation with respect to the target variables by E_Y.

We define the predictive metric proposed by Komaki [

When the parameter is one-dimensional, the metric component and its inverse are written as g_{θθ} and g^{θθ},

and:

Using these quantities, the e-connection and m-connection coefficients with respect to the parameter,

and:

respectively.

The (0, 3)-tensor,

The tensor,

In the same manner, the information geometrical quantities,

Let

For a derivative,

In this section, we consider the settings where the trace,
O(N^{−2}) is constant. We find asymptotically constant-risk priors up to O(N^{−2}) in two steps: first, we expand the Kullback–Leibler risks of Bayesian predictive densities; second, we find the prior having an asymptotically constant risk using this expansion.

From now on, we assume the following two conditions for the prior,

where

Based on Conditions (C1) and (C2), we expand the Kullback–Leibler risk of a Bayesian predictive density up to O(N^{−2}).

The proof is given in the Appendix.

θ_{f}

the N^{−3/2}-order term depends on θ_{f}. Under Condition (C2), θ_{f} is equal to the unique maximum point, θ_{max},

Based on

_{max}

O(N^{−2})

From Theorem 1, the Kullback–Leibler risk,

This is constant up to o(N^{−1}).

Suppose that there exists another prior,

and the Bayesian predictive density based on the prior,

From Theorem 1, the prior

The left-hand side of the above equation is non-negative, because the matrix,
the N^{−1}-order term of the risk based on the prior, up to o(N^{−1}).

Second, we consider the prior,

The above argument ensures that the prior has a constant risk up to o(N^{−1}). Thus, we only have to check if the N^{−3/2}-order term of the risk is the smallest constant. The N^{−3/2}-order term of the risk at the point, θ_{max}, is unchanged by the choice of the scalar function, log(·), so the N^{−3/2}-order term must agree with the quantity,
the N^{−3/2}-order term of the risk is the smallest constant, and it agrees with the quantity,
up to O(N^{−2}). □

θ_{max}


In this section, we revisit the subminimax estimator problem based on the mean squared error from the viewpoint of the prediction where the distributions of data and target variables are different and have a common unknown parameter. First, we give a brief review of the subminimax estimator problem through the binomial example.



Next, we construct the asymptotically constant-risk prior in the estimation based on the mean squared error when the subminimax estimator problem occurs, from the viewpoint of the prediction. We consider the priors whose risks are constant up to O(N^{−2}), by Lemma 4 in the Appendix.

Finally, we compare the mean squared error of the asymptotically constant-risk Bayes estimator,

In contrast, the mean squared error of the maximum likelihood estimator,

See [

Thus, the maximum of the mean squared error of the asymptotically constant-risk Bayes estimator is smaller than that of the other estimators, with an improvement of order N^{−3/2} in proportion to the Hessian of the scalar function at θ_{max}. In the prediction where the trace,

g^{X,θθ}(θ_{max}) = 1/2, up to O(N^{−2}).
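In the binomial case, the classical constant-risk estimator under squared error illustrates both the constancy of the risk and the O(N^{−3/2}) size of the improvement over the maximum likelihood estimator. The following numerical check is our own; it uses the standard estimator (X + √N/2)/(N + √N), the Bayes estimator under the Beta(√N/2, √N/2) prior.

```python
from math import comb, sqrt

def mse(estimator, n, theta):
    """Exact mean squared error of an estimator of theta from X ~ Bin(n, theta)."""
    return sum(
        comb(n, x) * theta**x * (1 - theta) ** (n - x) * (estimator(x, n) - theta) ** 2
        for x in range(n + 1)
    )

def mle(x, n):
    return x / n

def minimax(x, n):
    """Bayes estimator under the Beta(sqrt(n)/2, sqrt(n)/2) prior;
    its risk n / (4 (n + sqrt(n))^2) is constant in theta."""
    return (x + sqrt(n) / 2) / (n + sqrt(n))

n = 25
const_risk = n / (4 * (n + sqrt(n)) ** 2)
grid = [i / 100 for i in range(1, 100)]
max_mle = max(mse(mle, n, t) for t in grid)      # attained near theta = 1/2
max_mm = max(mse(minimax, n, t) for t in grid)   # equals const_risk everywhere
```

Here max_mle equals 1/(4N) and const_risk equals 1/(4(√N + 1)^2), so the gap is (2√N + 1)/(4N(√N + 1)^2), which is of order N^{−3/2}, consistent with the improvement discussed above.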

In this section, we construct asymptotically constant-risk priors in the prediction based on the binary regression model under the covariate shift; see [

We consider predicting a binary response variable, y, based on the data, x^{(N)}. We assume that the target variable, y, and the data, x^{(N)}, follow the logistic regression models with the same parameter,

and:

where Π_x and Π_y

Using the parameter, the probabilities, Π_x and Π_y, are written as:

and:

respectively. We obtain the two Fisher information matrices for

and:

respectively.
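A numerical sketch (ours; the covariate values are illustrative, not taken from the paper) of the two Fisher informations under covariate shift in the scalar logistic model, and of the maximizing parameter point of their ratio:

```python
from math import exp

def sigmoid(t):
    return 1.0 / (1.0 + exp(-t))

def fisher_info(theta, xs):
    """Fisher information of the scalar logistic model at fixed
    covariate values xs: sum of x^2 * pi * (1 - pi)."""
    total = 0.0
    for x in xs:
        p = sigmoid(x * theta)
        total += x * x * p * (1 - p)
    return total

# Covariate shift: the data and target covariate values differ
# (single illustrative design points).
xs_data = [1.0]
xs_target = [2.0]

def trace_term(theta):
    """tr(g_X^{-1} g_Y) in the one-dimensional case."""
    return fisher_info(theta, xs_target) / fisher_info(theta, xs_data)

grid = [i / 100 - 5 for i in range(1001)]  # theta in [-5, 5]
theta_max = max(grid, key=trace_term)
```

With these design points the ratio equals 4 cosh^2(θ/2)/cosh^2(θ), which is even in θ and decreasing in |θ|, so the unique maximum point is θ = 0.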

For simplicity, we consider the setting where

respectively. In the same manner, the geometrical quantities for the model, {

respectively.

Using these quantities,

By noting that the maximum point of

Using this solution, we obtain the solution of

The asymptotically constant-risk priors for the different sample sizes are shown in

In this example, we obtain the Kullback–Leibler risk of the Bayesian predictive density based on the asymptotically constant-risk prior,

We compare this value with the Bayes risk calculated using the Monte Carlo simulation; see
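A Monte Carlo estimate of the Kullback–Leibler risk can be sketched as follows (our own code; the binomial setting and the Beta(1/2, 1/2) default are illustrative assumptions, not the paper's simulation design):

```python
import random
from math import comb, lgamma, exp, log

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def predictive(y, x, n, m, a, b):
    """Beta-binomial predictive mass under a Beta(a, b) prior."""
    return comb(m, y) * exp(
        log_beta(a + x + y, b + n - x + m - y) - log_beta(a + x, b + n - x)
    )

def binom_pmf(y, m, p):
    return comb(m, y) * p**y * (1 - p) ** (m - y)

def kl_risk(theta, n, m, a, b, n_sim=20000, seed=0):
    """Monte Carlo estimate of the Kullback-Leibler risk at theta:
    draw X ~ Bin(n, theta) and average the Kullback-Leibler divergence
    from the true target distribution to the predictive density."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        x = sum(rng.random() < theta for _ in range(n))
        total += sum(
            binom_pmf(y, m, theta)
            * log(binom_pmf(y, m, theta) / predictive(y, x, n, m, a, b))
            for y in range(m + 1)
        )
    return total / n_sim
```

Averaging kl_risk over a prior on θ gives the corresponding Monte Carlo Bayes risk.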

We have considered the setting where the quantity,
tr(g^{X,ij} g^{Y}_{ij}), has a unique maximum point, and we have constructed priors whose risks are constant up to O(N^{−2}).

In Section 3, we have considered the prior depending on the sample size,

We have assumed that the trace,

The authors thank the referees for their helpful comments. This research was partially supported by a Grant-in-Aid for Scientific Research (23650144, 26280005).

Both authors contributed to the research and writing of this paper. Both authors read and approved the final manuscript.

The authors declare no conflict of interest.

We prove Theorem 1. First, we introduce some lemmas for the proof. The expansion proceeds in six steps (the first five steps are arranged in the form of lemmas): the first is to expand the MAP estimator; the second is to calculate its bias and mean squared error; the third is to expand the Kullback–Leibler risk using
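As a minimal concrete instance of the first step (our own toy example, far simpler than the general expansion): for Bin(n, θ) data under a Beta(a, b) prior with a, b > 1, the MAP estimator has a closed form, which a grid search over the log posterior recovers.

```python
from math import log

def log_posterior(theta, x, n, a, b):
    """Unnormalized log posterior for Bin(n, theta) data under a
    Beta(a, b) prior."""
    return (x + a - 1) * log(theta) + (n - x + b - 1) * log(1 - theta)

def map_closed_form(x, n, a, b):
    """Closed-form MAP estimate (valid for a, b > 1)."""
    return (x + a - 1) / (n + a + b - 2)

# Grid maximization of the log posterior agrees with the closed form.
grid = [i / 10000 for i in range(1, 10000)]
theta_hat = max(grid, key=lambda t: log_posterior(t, x=3, n=10, a=2, b=2))
```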

We use some additional notations for the expansion. Let
^{(}^{N}^{)}|^{X}^{1/2}}. Let ^{(}^{N}^{)}) denote the log likelihood of the data, ^{(}^{N}^{)}. Let _{ij}^{(}^{N}^{)}), _{ijk}^{(}^{N}^{)}) and _{ijkl}^{(}^{N}^{)}) be the derivatives of order 2, 3 and 4 of the log likelihood, ^{(}^{N}^{)}). Let _{ij}^{(}^{N}^{)}) denote the quantity,
For two tensors, a_{ij} and b_{ij}, the symmetrized product is defined by a_{i(j}b_{k)l} = (a_{ij}b_{kl} + a_{ik}b_{jl})/2.

^{(}^{N}^{)}|^{X}^{1}^{/}^{2}}.

From our assumption that prior

we rewrite this equation as:

By applying Taylor expansion around

From the law of large numbers and the central limit theorem, we rewrite the above expansion as:

By substituting the deviation,

^{(}^{N}^{)}^{X}^{1}^{/}^{2}}.

Second, consider the following relationship:

By differentiating the

where

Finally, by taking the expectation of the third power of the deviation,

□

^{(}^{N}^{)}^{X}^{1}^{/}^{2}}.

where

By the definition of the predictive metric,

By substituting Expansion

Note that Expansion is valid up to O(N^{−2}) under the reparametrization, so that each term of this expansion is a scalar function of

x^{(N)}), as:

We denote the N^{−1/2}-order, N^{−1}-order and N^{−3/2}-order terms by

To make the expansion easier to see, the following notations are used. Let
η^{i}η^{j}

Using the above notations, we get the following posterior expansion:

Second, using

Here, the following two equations hold:

By combining

By substituting

Note that the integration of Expansion is of order O_P(N^{−2}). Further, Expansion

The first term of this decomposition does not depend on the prior.

Using these lemmas, we prove Theorem 1. First, we find that the Kullback–Leibler risk of the plug-in predictive density with the estimator,

Using Expansion up to O(N^{−2}), because we expand the Bayesian predictive density,

Thus, we obtain Expansion

Asymptotically constant-risk prior in the prediction where the data are distributed according to the binomial distribution, Bin(N, θ), and the target variables according to the binomial distribution with success probability θ^2/(θ^2 + (1 − θ)^2).

Bayes risk based on the asymptotically constant-risk prior in the prediction where the data are distributed according to the binomial distribution, Bin(N, θ), and the target variables according to the binomial distribution with success probability θ^2/(θ^2 + (1 − θ)^2).

Comparison of the Kullback–Leibler risk calculated using the Monte Carlo simulations and the asymptotic risk, in the prediction where the data are distributed according to the binomial distribution, Bin(N, θ), and the target variables according to the binomial distribution with success probability θ^2/(θ^2 + (1 − θ)^2).