Abstract
Under mild conditions, strong consistency of the Bayes estimator of the density is proved. Moreover, the Bayes risk (for some common loss functions) of the Bayes estimator of the density (i.e., the posterior predictive density) goes to zero as the sample size goes to ∞. In passing, a similar result is obtained for the estimation of the sampling distribution.
Keywords:
Bayesian density estimation; Bayesian estimation of the sampling distribution; posterior predictive distribution; consistency of the Bayes estimator
MSC:
primary 62G07; 62G20; secondary 62G07
1. Introduction
In a statistical context, the expression "the probability of an event A" (usually denoted $P_\theta(A)$) is really a misuse of language, since it depends on the unknown parameter $\theta$. Before performing the experiment, this expression can be assigned a natural meaning from a Bayesian perspective as the prior predictive probability of A, since this is the prior mean of the probabilities $P_\theta(A)$. However, in accordance with Bayesian philosophy, once the experiment has been carried out and the value $\omega$ has been observed, a more appropriate estimate of $P_\theta(A)$ is the posterior predictive probability of A given $\omega$. The author has recently proved ([1]) that not only is this the Bayes estimator of $P_\theta(A)$, but also that the posterior predictive distribution (resp. the posterior predictive density) is the Bayes estimator of the sampling distribution $P_\theta$ (resp. the density $p_\theta$) for the squared total variation (resp. the squared $L^1$) loss function in the Bayesian experiment corresponding to an n-sized sample of the unknown distribution. It should be noted that the loss functions considered derive in a natural way from the commonly used squared error loss function when estimating a real function of the parameter.
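For concreteness, for an estimator $M$ of the sampling distribution $P_\theta$, with density $m$ (generic symbols used only for this display), the two loss functions just mentioned take the form
$$L_{TV}(\theta, M) = \Big(\sup_{A} \big|M(A) - P_\theta(A)\big|\Big)^2, \qquad L_{1}(\theta, m) = \Big(\int \big|m - p_\theta\big|\, d\mu\Big)^2,$$
where the supremum is taken over all events, $\mu$ is a dominating measure, and the normalization convention chosen for the total variation distance (a possible factor 1/2) is immaterial for what follows.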
The posterior predictive distribution is the cornerstone of Predictive Inference, which seeks to make inferences about a new unknown observation from a preceding random sample (see [2,3]). With that idea in mind, it has also been used in other areas such as model selection, testing for discordancy, goodness of fit, perturbation analysis, and classification (see additional fields of application in [1,2,3,4,5]). Furthermore, in [1] it has been presented as a solution to the Bayesian density estimation problem, with several examples illustrating the results and, in particular, showing how to calculate a posterior predictive density. Reference [3] provides many other examples of determining the posterior predictive distribution. In practice, however, the explicit evaluation of the posterior predictive distribution may be cumbersome, and its simulation may become preferable. The aforementioned work [3] also constitutes a good reference for such simulation methods, and hence for the computation of the Bayes estimators of the density and the sampling distribution.
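As a simple illustration of such a calculation, consider the standard conjugate Bernoulli–Beta model (a textbook example, not necessarily one of those treated in [1]): for a Bernoulli($\theta$) sample $x_1, \dots, x_n$ and a Beta$(a, b)$ prior on $\theta$, the posterior is Beta$(a + s,\, b + n - s)$ with $s = \sum_{i=1}^n x_i$, and the posterior predictive density of a new observation $x \in \{0, 1\}$ (with respect to the counting measure) is
$$q(x \mid x_1, \dots, x_n) = \int_0^1 \theta^{x} (1 - \theta)^{1 - x} \, d\mathrm{Beta}(a + s,\, b + n - s)(\theta) = \Big(\frac{a + s}{a + b + n}\Big)^{x} \Big(\frac{b + n - s}{a + b + n}\Big)^{1 - x}.$$
Since $s/n \to \theta$ almost surely, this estimate converges to the true density $\theta^{x}(1 - \theta)^{1 - x}$, in line with the consistency results obtained below.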
We refer the reader to the references cited in [1] for other statistical uses of the posterior predictive distribution and for some useful ways of calculating it.
In this communication, we shall explore the asymptotic behaviour of the posterior predictive density as the Bayes estimator of the density, showing its strong consistency and that the Bayes risk goes to 0 as n goes to ∞.
2. The Framework
Let
$$(\Omega, \mathcal{A}, \{P_\theta : \theta \in (\Theta, \mathcal{T}, Q)\})$$
be a Bayesian experiment (where Q denotes the prior distribution on the parameter space $(\Theta, \mathcal{T})$), and consider the infinite product Bayesian experiment
$$(\Omega^{\mathbb{N}}, \mathcal{A}^{\mathbb{N}}, \{P_\theta^{\mathbb{N}} : \theta \in (\Theta, \mathcal{T}, Q)\})$$
corresponding to an infinite sample of the unknown distribution $P_\theta$. Let us write
$$(\Omega^{n}, \mathcal{A}^{n}, \{P_\theta^{n} : \theta \in (\Theta, \mathcal{T}, Q)\})$$
for integer n.
We suppose that $\theta \in (\Theta, \mathcal{T}) \mapsto P_\theta$ is a Markov kernel, i.e., that the map $\theta \mapsto P_\theta(A)$ is $\mathcal{T}$-measurable for every $A \in \mathcal{A}$. Let
be the joint distribution of the parameter and the observations, i.e.,
As (i.e., the probability distribution of J with respect to ), is a version of the conditional distribution (regular conditional probability) . Analogously, is a version of the conditional distribution .
Let , the prior predictive distribution in (so that is the prior mean of the probabilities ). Similarly, write for the prior predictive distribution in . So, the posterior distribution given satisfies
Denote by for the posterior distribution given .
Write for the posterior predictive distribution given defined for as
So is nothing but the posterior mean given of the probabilities .
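For concreteness, writing $Q(\cdot \mid \omega_1, \dots, \omega_n)$ for the posterior distribution given the observations $\omega_1, \dots, \omega_n$ and $P^{*}_{n}(\cdot \mid \omega_1, \dots, \omega_n)$ for the posterior predictive distribution (generic symbols introduced only for this display), this means that, for every event $A \in \mathcal{A}$,
$$P^{*}_{n}(A \mid \omega_1, \dots, \omega_n) = \int_\Theta P_\theta(A)\, dQ(\theta \mid \omega_1, \dots, \omega_n).$$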
In the dominated case, we can assume without loss of generality that the dominating measure $\mu$ is a probability measure (because of (1) below). We write $p_\theta := dP_\theta/d\mu$. The likelihood function $(\omega, \theta) \mapsto p_\theta(\omega)$ is assumed to be jointly measurable.
We have that, for all n and every event ,
which proves that
is a -density of that we recognize as the posterior predictive density on given .
In the same way,
is a -density of , the posterior predictive density on given .
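In the same generic notation, these computations amount to saying that the posterior predictive density is the posterior mean of the densities $p_\theta$: for instance, given the observations $\omega_1, \dots, \omega_n$,
$$q_n(x \mid \omega_1, \dots, \omega_n) = \int_\Theta p_\theta(x)\, dQ(\theta \mid \omega_1, \dots, \omega_n), \qquad x \in \Omega,$$
is a $\mu$-density of the posterior predictive distribution $P^{*}_{n}(\cdot \mid \omega_1, \dots, \omega_n)$; the symbol $q_n$ is again a placeholder chosen only for this sketch.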
In what follows, we will assume the following additional regularity conditions:
- (i) $(\Omega, \mathcal{A})$ is a standard Borel space;
- (ii) Θ is a Borel subset of a Polish space and $\mathcal{T}$ is its Borel σ-field;
- (iii) the family $\{P_\theta : \theta \in \Theta\}$ is identifiable.
According to [1], the posterior predictive distribution (resp. the posterior predictive density) is the Bayes estimator of the sampling distribution (resp. the density) for the squared total variation (resp. the squared $L^1$) loss function in the product experiment . Analogously, the posterior predictive distribution (resp. the posterior predictive density) is the Bayes estimator of the sampling distribution (resp. the density) for the squared total variation (resp. the squared $L^1$) loss function in the product experiment .
As a particular case of a well-known result relating the total variation distance between two probability measures and the $L^1$-distance between their densities, we have that, for probability measures $P_1$ and $P_2$ on $(\Omega, \mathcal{A})$ with respective $\mu$-densities $p_1$ and $p_2$,
$$\sup_{A \in \mathcal{A}} \big|P_1(A) - P_2(A)\big| = \frac{1}{2} \int \big|p_1 - p_2\big|\, d\mu. \qquad (1)$$
3. The Main Result
We ask whether the Bayes risk of the Bayes estimator of the sampling distribution goes to zero as $n \to \infty$, i.e., whether
In terms of densities, the question is whether the Bayes risk of the Bayes estimator of the density goes to zero as $n \to \infty$, i.e., whether
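Writing, only for concreteness, $P^{*}_{n}(\cdot \mid \omega_1, \dots, \omega_n)$ and $q_n(\cdot \mid \omega_1, \dots, \omega_n)$ for the posterior predictive distribution and density given the first n coordinates of $\omega \in \Omega^{\mathbb{N}}$ (placeholder symbols, as above), the two questions are whether
$$\int_\Theta \int_{\Omega^{\mathbb{N}}} \Big(\sup_{A \in \mathcal{A}} \big|P^{*}_{n}(A \mid \omega_1, \dots, \omega_n) - P_\theta(A)\big|\Big)^2\, dP_\theta^{\mathbb{N}}(\omega)\, dQ(\theta) \xrightarrow[n \to \infty]{} 0$$
and
$$\int_\Theta \int_{\Omega^{\mathbb{N}}} \Big(\int_\Omega \big|q_n(\cdot \mid \omega_1, \dots, \omega_n) - p_\theta\big|\, d\mu\Big)^2\, dP_\theta^{\mathbb{N}}(\omega)\, dQ(\theta) \xrightarrow[n \to \infty]{} 0;$$
the non-squared versions of these losses are treated along the way, and the squared versions are deduced from them at the end of the section.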
Let us consider the auxiliary Bayesian experiment
For , and , we will continue to write and , and now we write .
The new prior predictive distribution is since
To compute the new posterior distributions, notice that
On the other hand,
So,
It follows that if then
when , we have that is an increasing sequence of sub-σ-fields of such that . According to the martingale convergence theorem of Lévy, if Y is -measurable and -integrable, then
converges -a.e. and in $L^1$ to .
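In symbols (with $\mathcal{F}_n$, $\mathcal{F}$ generic symbols for the σ-fields involved): if $(\mathcal{F}_n)_n$ is an increasing sequence of sub-σ-fields of $\mathcal{F}$, $\mathcal{F}_\infty := \sigma\big(\bigcup_n \mathcal{F}_n\big)$, and Y is $\mathcal{F}$-measurable and integrable, then
$$E(Y \mid \mathcal{F}_n) \xrightarrow[n \to \infty]{} E(Y \mid \mathcal{F}_\infty) \qquad \text{a.e. and in } L^1;$$
in particular, the limit is Y itself when Y is $\mathcal{F}_\infty$-measurable.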
Let us consider the -integrable function
We shall see that
Indeed, given and , we have that
which proves (2).
Analogously, it can be shown that
Hence, it follows from the aforementioned theorem of Lévy that
and
i.e.,
On the other hand, as a consequence of a known theorem of Doob (see Theorem 6.9 and Proposition 6.10 of [4], pp. 129–130), we have that, for every ,
for Q-almost every . Hence
for Q-almost every , i.e., given there exists such that and, ,
So, for , there exists such that and
In particular,
From (4) and (6), it follows that ,
From this and (5), it follows that
i.e., the risk of the Bayes estimator of the density for the $L^1$ loss function goes to 0 as $n \to \infty$.
It follows from this and (1) that
i.e., the risk of the Bayes estimator of the sampling distribution for the total variation loss function goes to 0 as $n \to \infty$.
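For ease of reference, the consequence of Doob's theorem used above (Theorem 6.9 and Proposition 6.10 of [4]) can be summarized as follows, again in generic notation: under the measurability and identifiability assumptions of Section 2, for every bounded and measurable function $f$ on $\Theta$,
$$E\big(f(\theta) \mid \omega_1, \dots, \omega_n\big) \xrightarrow[n \to \infty]{} f(\theta) \qquad P_\theta^{\mathbb{N}}\text{-a.e., for } Q\text{-almost every } \theta,$$
where the left-hand side denotes the posterior expectation of $f$ given the first $n$ observations; in particular, the posterior distribution concentrates around the true value of the parameter for Q-almost every $\theta$.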
We ask whether these results remain true for the squared versions of the loss functions. The answer is affirmative because of the following general result: let $(X_n)_n$ be a sequence of real random variables on a probability space such that $\lim_n E(|X_n|) = 0$. If there exists $k > 0$ such that $|X_n| \le k$ for all n, then $\lim_n E(X_n^2) = 0$, because
$$E(X_n^2) \le k \cdot E(|X_n|) \xrightarrow[n \to \infty]{} 0.$$
In our case, $X_n$ is the $L^1$-distance between the posterior predictive density and the true density (resp. the total variation distance between the posterior predictive distribution and the sampling distribution), and these quantities are bounded by 2 (resp. by 1).
So, we have proved the following result.
Theorem 1.
Let $(\Omega, \mathcal{A}, \{P_\theta : \theta \in (\Theta, \mathcal{T}, Q)\})$ be a Bayesian experiment dominated by a σ-finite measure μ. Let us assume that $(\Omega, \mathcal{A})$ is a standard Borel space, and that Θ is a Borel subset of a Polish space and $\mathcal{T}$ is its Borel σ-field. Assume also that the likelihood function $(\omega, \theta) \mapsto p_\theta(\omega)$ is jointly measurable and that the family $\{P_\theta : \theta \in \Theta\}$ is identifiable. Then:
- (a) The posterior predictive density is the Bayes estimator of the density in the product experiment for the squared $L^1$ loss function. Moreover, the Bayes risk converges to 0 both for the $L^1$ loss function and for the squared $L^1$ loss function.
- (b) The posterior predictive distribution is the Bayes estimator of the sampling distribution in the product experiment for the squared total variation loss function. Moreover, the Bayes risk converges to 0 both for the total variation loss function and for the squared total variation loss function.
- (c) The posterior predictive density is a strongly consistent estimator of the density $p_\theta$, i.e., the $L^1$-distance between them converges to 0 almost surely, for Q-almost every θ (see the display following the theorem).
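In the generic notation used earlier (with $q_n(\cdot \mid \omega_1, \dots, \omega_n)$ a placeholder symbol for the posterior predictive density given the first n observations), part (c) reads:
$$\lim_{n \to \infty} \int_\Omega \big| q_n(x \mid \omega_1, \dots, \omega_n) - p_\theta(x) \big|\, d\mu(x) = 0 \qquad P_\theta^{\mathbb{N}}\text{-a.e., for } Q\text{-almost every } \theta.$$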
Funding
This research was funded by the Junta de Extremadura (Spain), grant number GR21044.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The author declares no conflict of interest.
References
- Nogales, A.G. On Bayesian estimation of densities and sampling distributions: The posterior predictive distribution as the Bayes estimator. Stat. Neerl. 2021, accepted.
- Geisser, S. Predictive Inference: An Introduction; Chapman & Hall: New York, NY, USA, 1993.
- Gelman, A.; Carlin, J.B.; Stern, H.S.; Dunson, D.B.; Vehtari, A.; Rubin, D.B. Bayesian Data Analysis, 3rd ed.; CRC Press (Taylor & Francis Group): Boca Raton, FL, USA, 2014.
- Ghosal, S.; van der Vaart, A. Fundamentals of Nonparametric Bayesian Inference; Cambridge University Press: Cambridge, UK, 2017.
- Rubin, D.B. Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann. Stat. 1984, 12, 1151–1172.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).