1. Introduction
In Statistics, the expression "the probability of an event 
A" (written 
) is, in general, ambiguous, as it depends on the unknown parameter 
. Before conducting the experiment, a Bayesian statistician, provided with the prior distribution, possesses a natural candidate—the prior predictive probability of 
A—since it is the prior mean of the probabilities of 
A. However, in accordance with Bayesian philosophy, after the experiment has been performed and the data 
 observed, a reasonable estimate is the posterior predictive probability of 
A given 
 because it is the posterior mean of the probabilities of 
A given 
. It can be shown that not only is this the Bayes estimator of the probability 
 of 
A for the squared error loss function but also that the posterior predictive distribution is the Bayes estimator of the sampling probability distribution 
 for the squared total variation loss function and that the posterior predictive density is the Bayes estimator of its density for the L¹-squared loss function. Note that these loss functions should be considered natural in the sense that they are derived directly from the squared error loss function commonly used in the estimation of a real function of the parameter. Ref. [
1] contains precise statements and proofs of these results, which are nothing but a functional generalization of Theorem 1.1 (more specifically of its Corollary 1.2.(a)) of [
2], p. 228, which yields the Bayes estimator of a real function of the parameter for the squared error loss function.
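In generic notation (the symbols below are assumed for illustration and need not coincide with those of [1,2]), the underlying fact is that, under the squared error loss, the Bayes estimator of a real function g(θ) with finite posterior second moment is its posterior mean:
\[
% generic notation, assumed for illustration
\operatorname*{arg\,min}_{d}\ \mathrm{E}\big[(g(\theta)-d)^{2}\,\big|\,\text{data}\big]\;=\;\mathrm{E}\big[g(\theta)\,\big|\,\text{data}\big].
\]
Taking g(θ) to be the probability of A under the parameter value θ, this posterior mean is precisely the posterior predictive probability of A.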
This communication addresses, from a Bayesian perspective, the estimation of a regression curve and some related problems, such as the estimation of a conditional density, a conditional distribution function, or even the conditional distribution itself. It should, therefore, be considered the conditional counterpart of [
1], and the results to be presented below as the functional extension of [
2], Theorem 1.1, for the conditional case. Thus, it is unsurprising that the posterior predictive distribution is the cornerstone for the estimation problems to be discussed below. Some examples illustrating the results will be presented in 
Section 7. See [
1] and the references therein for other examples of the determination of the posterior predictive distribution. In practice, however, the explicit evaluation of the posterior predictive distribution could well be cumbersome, and its simulation may become preferable. Ref. [
3] is a good reference for such simulation methods, and hence, for the computation of the Bayes estimators of the conditional density and the regression curve.
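As a purely illustrative sketch (a hypothetical conjugate normal model is assumed here for simplicity; this is not an example taken from [3]), the posterior predictive density can be approximated by averaging the sampling density over draws from the posterior:

# Illustrative sketch only: hypothetical N(theta, sigma^2) model with known sigma
# and a conjugate normal prior on theta; the posterior predictive density is
# approximated by a Monte Carlo average of sampling densities over posterior draws.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sigma = 1.0
data = rng.normal(loc=2.0, scale=sigma, size=50)          # hypothetical sample

mu0, tau0 = 0.0, 10.0                                      # prior N(mu0, tau0^2)
tau_post = 1.0 / np.sqrt(1.0 / tau0**2 + len(data) / sigma**2)
mu_post = tau_post**2 * (mu0 / tau0**2 + data.sum() / sigma**2)

theta = rng.normal(mu_post, tau_post, size=10_000)         # posterior draws
y = np.linspace(-1.0, 5.0, 200)
post_pred = stats.norm.pdf(y[:, None], loc=theta, scale=sigma).mean(axis=1)

# In this conjugate case the posterior predictive is known exactly, so the
# Monte Carlo approximation can be checked against it.
exact = stats.norm.pdf(y, loc=mu_post, scale=np.sqrt(sigma**2 + tau_post**2))
print(np.max(np.abs(post_pred - exact)))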
The posterior predictive distribution has been presented as the basis of Predictive Inference, which seeks to make inferences about a new, unknown observation from the previously observed random sample, in contrast with the greater emphasis that statistical inference, since its mathematical foundations in the early twentieth century, has placed on parameter estimation and hypothesis testing (see [
4] or [
3]). With that idea in mind, it has also been used in other areas, such as model selection, testing for discordancy, goodness of fit, perturbation analysis, or classification (see additional fields of application in [
4,
5]), but never as a possible solution for the Bayesian problems of estimating an unconditional or conditional density. The reader is referred to the references within [
1] for other uses of the posterior predictive distribution in Bayesian statistics.
To summarize the contribution of this work, I want to emphasize that the problems of estimating a density (conditional or not) or a regression curve are of central importance in Nonparametric Inference and Functional Data Analysis (for example, see [
6] or [
7], and the references they contain). Although no optimal result is to be expected for these problems in a frequentist setting, this article, together with [
1], provides optimal solutions to them in a Bayesian framework. The reader should note that these are not mere existence and uniqueness theorems; on the contrary, the results obtained give explicit formulas for the solutions in terms of the posterior predictive distribution. Note also that there is ample literature on how to compute this distribution, exactly or approximately.
Section 2 sets out the statistical framework for tackling these problems, i.e., the appropriate Bayesian experiment (conceived also as a probability space along the lines suggested by [
8], for example).
 Section 3 deals with the problem of Bayesian estimation of a conditional distribution when the squared total variation loss function is used and Theorem 1 gives the Bayes estimator in terms of the posterior predictive distribution.
 Section 4 takes advantage of Theorem 1 to solve the problem of the Bayesian estimation of a conditional density using the L¹-squared loss function, obtaining the Bayes estimator of the conditional density (see Theorem 2).
 Section 5 and 
Section 6 deal with the problems of Bayesian estimation of a conditional distribution function and a regression curve in the real case. Theorems 3 and 4 yield the solutions.
 Section 7 provides some examples to illustrate the application of all these theorems.
 For ease of reading, the proofs are postponed until 
Section 8. This is followed by an appendix (
Appendix A) explaining the notation and concepts used in the text.
From this point onwards, we shall place ourselves in a general framework for Bayesian inference, as described in [
9].
  2. The Framework
Let 
 be a Bayesian statistical experiment, and 
, 
, two statistics. Consider the Bayesian experiment image of 
:
In what follows, we shall assume that ,  is a Markov kernel and write .
The Bayesian experiment corresponding to a sample of size 
n of the joint distribution of 
 is:
We write 
 for 
 and
      
      for the joint distribution of the parameter and the sample:
The corresponding prior predictive distribution 
 is:
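In generic notation (Q the prior distribution and P_θ the sampling distribution of the n-fold sample; these symbols are assumed here for illustration), this is the Q-mixture of the sampling distributions:
\[
% generic notation, assumed for illustration
\beta^{*}(B)\;=\;\int_{\Theta}P_{\theta}(B)\,dQ(\theta)
\qquad\text{for every event }B\text{ in the sample space.}
\]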
The posterior distribution is a Markov kernel: 
      such that, for all 
 and 
,
      
Let us write .
The posterior predictive distribution on 
 is the Markov kernel: 
      defined, for 
, by:
It follows that, with obvious notation:
      for any non-negative or integrable real random variable (r.r.v. for short) 
f.
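With the same generic (assumed) notation, writing Q_x for the posterior distribution given the observed sample x, the posterior predictive distribution and the accompanying integration formula read:
\[
% generic notation, assumed for illustration
P^{*}_{x}(B)\;=\;\int_{\Theta}P_{\theta}(B)\,dQ_{x}(\theta),
\qquad
\int f\,dP^{*}_{x}\;=\;\int_{\Theta}\Big(\int f\,dP_{\theta}\Big)\,dQ_{x}(\theta).
\]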
We can also consider the posterior predictive distribution on 
 defined as the Markov kernel: 
      such that:
According to Theorem 1 of [
1], this is the Bayes estimator of the distribution 
 for the squared total variation loss function:
      for every Markov kernel 
.
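In the same generic (assumed) notation, with P_θ standing for the distribution being estimated and P*_x for its posterior predictive estimate given the sample x, the Bayes-optimality property just described reads:
\[
% generic notation, assumed for illustration; E denotes expectation under the
% joint distribution of the parameter and the sample
\mathrm{E}\Big[\big\|P_{\theta}-P^{*}_{x}\big\|_{TV}^{2}\Big]\;\le\;\mathrm{E}\Big[\big\|P_{\theta}-M_{x}\big\|_{TV}^{2}\Big]
\qquad\text{for every Markov kernel }M.
\]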
It can be readily checked that:
      where 
 for 
. Then, Theorem 2 of [
1] shows that:
      for every Markov kernel 
.
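Analogously, for densities p_θ and p*_x of these distributions with respect to a common σ-finite measure μ (again, assumed notation), the property described in the text takes the form:
\[
% generic notation, assumed for illustration
\mathrm{E}\bigg[\Big(\int\big|p_{\theta}-p^{*}_{x}\big|\,d\mu\Big)^{2}\bigg]\;\le\;
\mathrm{E}\bigg[\Big(\int\big|p_{\theta}-m_{x}\big|\,d\mu\Big)^{2}\bigg]
\qquad\text{for every competing density estimator }m,
\]
which is the L¹-squared optimality referred to in the Introduction.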
We introduce some notation for 
:
Let us consider the probability space:
      where:
      when 
, 
 and 
.
Thus, for a r.r.v. 
f on: 
,
      
      provided that the integral exists. Moreover, for a r.r.v. 
h on 
:
The following proposition is straightforward.
Proposition 1. Given ,  and , we have that:
Moreover:
where the last equality refers to the case where  is a real statistic with a finite mean.

In particular, the probability space (4) contains all the basic ingredients of the Bayesian experiment (1), i.e., the prior distribution, the sampling probabilities, the posterior distributions, and the prior predictive distribution. In addition, it becomes the natural framework in which to address the estimation problems of this communication, as we shall see in what follows.
  4. Bayes Estimator of the Conditional Density
When the joint distribution 
 has a density 
 with respect to the product of two σ-finite measures 
 and 
 on 
 and 
, resp., the conditional density is:
      for almost every 
, where 
 stands for the marginal density of 
.
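Concretely, writing f for the joint density and f_X for the marginal density of the first component (generic symbols, assumed here for illustration), the conditional density is the familiar ratio
\[
% generic notation, assumed for illustration
f(y\mid x)\;=\;\frac{f(x,y)}{f_{X}(x)},
\qquad
f_{X}(x)\;=\;\int f(x,y)\,d\mu_{2}(y),
\]
defined for those x with f_X(x) > 0, i.e., for almost every x with respect to the marginal distribution.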
An estimator of the conditional density 
 from a sample of size 
n of the joint distribution of 
 is a map:
      such that, once 
 is observed, 
 is considered an estimate of the conditional density 
 of 
 given 
.
It is well known (see, for instance, [
7], p. 126) that, given two probability measures 
 and 
 on a measurable space 
 having densities 
 and 
 with respect to a σ-finite measure 
:
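The identity in question is the standard relation between the total variation distance and the L¹ distance of the densities, stated here with the total variation distance taken as the supremum over measurable sets:
\[
% standard identity (Scheffe); a factor 1/2 may be absorbed elsewhere depending
% on the convention adopted for the total variation norm
\sup_{A}\big|P_{1}(A)-P_{2}(A)\big|\;=\;\frac{1}{2}\int|p_{1}-p_{2}|\,d\mu .
\]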
Thus, the Bayesian estimation of the conditional distribution 
 for the squared total variation loss function corresponds to the Bayesian estimation of its density 
 for the L¹-squared loss function. Hence, according to Theorem 1, the Bayes estimator of the conditional density 
 for the L¹-squared loss function is the 
-density 
 of the conditional distribution:
Note that:
      where 
 denotes the 
Q-density of the posterior distribution 
. Thus, 
 is of the form 
, where:
      is the 
-density of 
. Hence, the 
-density of the posterior predictive distribution 
 is:
      and its first marginal is:
Thus, we have proved the following result.
Theorem 2. Assume that  is separable. The Bayes estimator of the conditional density  for the L¹-squared loss function is the -density:
of the conditional distribution  of  given  with respect to the posterior predictive distribution :
for any estimator m of the conditional density.
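In generic (assumed) notation, with g_n denoting a density of the posterior predictive distribution of the pair (X, Y) given the observed n-sample and g_{n,X} its first marginal, the estimator provided by Theorem 2 is simply the conditional density computed under the posterior predictive distribution:
\[
% generic notation, assumed for illustration; the dependence on the observed
% sample is left implicit in g_n
\widehat{f}(y\mid x)\;=\;\frac{g_{n}(x,y)}{g_{n,X}(x)},
\qquad
g_{n,X}(x)\;=\;\int g_{n}(x,y)\,d\mu_{2}(y).
\]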