Article

Optimal Bayesian Estimation of a Regression Curve, a Conditional Density, and a Conditional Distribution

by
Agustín G. Nogales
Departamento de Matemáticas, IMUEx, Universidad de Extremadura, 06006 Badajoz, Spain
Mathematics 2022, 10(8), 1213; https://doi.org/10.3390/math10081213
Submission received: 20 February 2022 / Revised: 9 March 2022 / Accepted: 4 April 2022 / Published: 7 April 2022
(This article belongs to the Section D1: Probability and Statistics)

Abstract

In this paper, several related estimation problems are addressed from a Bayesian point of view, and optimal estimators are obtained for each of them when some natural loss functions are considered. The problems considered are the estimation of a regression curve, a conditional distribution function, a conditional density, and even the conditional distribution itself. These problems are posed in a sufficiently general framework to cover continuous and discrete, univariate and multivariate, and parametric and nonparametric cases, without the need to use a specific prior distribution. The loss functions considered come naturally from the quadratic error loss function commonly used in estimating a real function of the unknown parameter. The cornerstone of these Bayes estimators is the posterior predictive distribution. Some examples are provided to illustrate the results.

1. Introduction

In Statistics, the expression "the probability of an event A" (written $P_\theta(A)$) is, in general, ambiguous, as it depends on the unknown parameter θ. Before conducting the experiment, a Bayesian statistician, provided with the prior distribution, possesses a natural candidate, the prior predictive probability of A, since it is the prior mean of the probabilities of A. However, in accordance with Bayesian philosophy, after the experiment has been performed and the data ω observed, a reasonable estimate is the posterior predictive probability of A given ω, because it is the posterior mean of the probabilities of A given ω. It can be shown not only that this is the Bayes estimator of the probability $P_\theta(A)$ of A for the squared error loss function, but also that the posterior predictive distribution is the Bayes estimator of the sampling probability distribution $P_\theta$ for the squared total variation loss function, and that the posterior predictive density is the Bayes estimator of its density for the $L_1$-squared loss function. Note that these loss functions should be considered natural in the sense that they are derived directly from the quadratic error loss function commonly used in the estimation of a real function of the parameter. Ref. [1] contains precise statements and proofs of these results, which are nothing but a functional generalization of Theorem 1.1 (more specifically, of its Corollary 1.2.(a)) of [2], p. 228, which yields the Bayes estimator of a real function of the parameter for the squared error loss function.
This communication addresses the estimation of a regression curve and some related problems, such as the estimation of a conditional density or a conditional distribution function or even the conditional distribution itself from a Bayesian perspective. It should, therefore, be considered as the conditional counterpart of [1], and the results to be presented below as the functional extension of [2], Theorem 1.1, for the conditional case. Thus, it is unsurprising that the posterior predictive distribution is the cornerstone for the estimation problems to be discussed below. Some examples illustrating the results will be presented in Section 7. See [1] and the references therein for other examples of the determination of the posterior predictive distribution. In practice, however, the explicit evaluation of the posterior predictive distribution could well be cumbersome, and its simulation may become preferable. Ref. [3] is a good reference for such simulation methods, and hence, for the computation of the Bayes estimators of the conditional density and the regression curve.
The posterior predictive distribution has been presented as the basis of Predictive Inference, which seeks to make inferences about a new unknown observation from the previous random sample, in contrast with the greater emphasis that statistical inference, since its mathematical foundations in the early twentieth century, has placed on parameter estimation and hypothesis testing (see [4] or [3]). With that idea in mind, it has also been used in other areas, such as model selection, testing for discordancy, goodness of fit, perturbation analysis, or classification (see additional fields of application in [4,5]), but never as a possible solution for the Bayesian problems of estimating an unconditional or conditional density. The reader is referred to the references within [1] for other uses of the posterior predictive distribution in Bayesian statistics.
To summarize the contribution of this work, I want to emphasize that the problems of estimating a density (conditional or not) or a regression curve are of central importance in Nonparametric Inference and Functional Data Analysis (see, for example, [6] or [7], and the references they contain). Although nobody expects an optimal result for these problems in a frequentist environment, this article, together with [1], produces optimal solutions for them in a Bayesian framework. The reader should note that these are not just theorems of existence and uniqueness of solutions; on the contrary, the results provide explicit formulas for the solutions based on the posterior predictive distribution. Note also that there is an ample literature on how to compute the posterior predictive distribution, exactly or approximately.
Section 2 sets out the proper statistical framework for tackling these problems, i.e., the proper Bayesian experiment (conceived also as a probability space, along the lines suggested in [8], for example).
Section 3 deals with the problem of Bayesian estimation of a conditional distribution when the squared total variation loss function is used and Theorem 1 gives the Bayes estimator in terms of the posterior predictive distribution.
Section 4 takes advantage of Theorem 1 to solve the problem of the Bayesian estimation of a conditional density using the L 1 -squared loss function, obtaining the Bayes estimator of the conditional density (see Theorem 2).
Sections 5 and 6 deal with the problems of Bayesian estimation of a conditional distribution function and a regression curve in the real case. Theorems 3 and 4 yield the solutions.
Section 7 provides some examples illustrating the application of all these theorems.
For ease of reading, the proofs are postponed until Section 8. This is followed by an appendix (Appendix A) explaining the notation and concepts used in the text.
We shall place ourselves from this point onwards in a general framework for Bayesian inference, as described in [9].

2. The Framework

Let $(\Omega,\mathcal A,\{P_\theta:\theta\in(\Theta,\mathcal T,Q)\})$ be a Bayesian statistical experiment and $X_i:(\Omega,\mathcal A,\{P_\theta:\theta\in(\Theta,\mathcal T,Q)\})\to(\Omega_i,\mathcal A_i)$, $i=1,2$, two statistics. Consider the Bayesian experiment image of $(X_1,X_2)$:
\[
(\Omega_1\times\Omega_2,\ \mathcal A_1\times\mathcal A_2,\ \{P_\theta^{(X_1,X_2)}:\theta\in(\Theta,\mathcal T,Q)\}).
\]
In what follows, we shall assume that $P^{(X_1,X_2)}(\theta,A_{12}):=P_\theta^{(X_1,X_2)}(A_{12})$, $\theta\in\Theta$, $A_{12}\in\mathcal A_1\times\mathcal A_2$, is a Markov kernel, and we write $R_\theta=P_\theta^{(X_1,X_2)}$.
The Bayesian experiment corresponding to a sample of size n of the joint distribution of ( X 1 , X 2 ) is:
\[
\big((\Omega_1\times\Omega_2)^n,\ (\mathcal A_1\times\mathcal A_2)^n,\ \{R_\theta^n:\theta\in(\Theta,\mathcal T,Q)\}\big).
\]
We write $R^n(\theta,A_{12,n})=R_\theta^n(A_{12,n})$ for $A_{12,n}\in(\mathcal A_1\times\mathcal A_2)^n$ and
\[
\Pi_{12,n}:=Q\otimes R^n
\]
for the joint distribution of the parameter and the sample:
\[
\Pi_{12,n}(A_{12,n}\times T)=\int_T R_\theta^n(A_{12,n})\,dQ(\theta),\qquad A_{12,n}\in(\mathcal A_1\times\mathcal A_2)^n,\ T\in\mathcal T.
\]
The corresponding prior predictive distribution $\beta^*_{12,n}$ is:
\[
\beta^*_{12,n}(A_{12,n})=\int_\Theta R_\theta^n(A_{12,n})\,dQ(\theta),\qquad A_{12,n}\in(\mathcal A_1\times\mathcal A_2)^n.
\]
The posterior distribution is a Markov kernel:
\[
R_n^*:((\Omega_1\times\Omega_2)^n,(\mathcal A_1\times\mathcal A_2)^n)\to(\Theta,\mathcal T)
\]
such that, for all $A_{12,n}\in(\mathcal A_1\times\mathcal A_2)^n$ and $T\in\mathcal T$,
\[
\Pi_{12,n}(A_{12,n}\times T)=\int_T R_\theta^n(A_{12,n})\,dQ(\theta)=\int_{A_{12,n}} R_n^*(x,T)\,d\beta^*_{12,n}(x).
\]
Let us write $R^*_{n,x}(T):=R_n^*(x,T)$.
The posterior predictive distribution on A 1 × A 2 is the Markov kernel:
\[
R_n^*R:((\Omega_1\times\Omega_2)^n,(\mathcal A_1\times\mathcal A_2)^n)\to(\Omega_1\times\Omega_2,\mathcal A_1\times\mathcal A_2)
\]
defined, for $x\in(\Omega_1\times\Omega_2)^n$, by:
\[
R_n^*R(x,A_{12}):=\int_\Theta R_\theta(A_{12})\,dR^*_{n,x}(\theta).
\]
It follows that, with obvious notation:
\[
\int_{\Omega_1\times\Omega_2} f\,d(R^*_{n,x}R)=\int_\Theta\left(\int_{\Omega_1\times\Omega_2} f\,dR_\theta\right)dR^*_{n,x}(\theta)
\]
for any non-negative or integrable real random variable (r.r.v. for short) f.
We can also consider the posterior predictive distribution on ( A 1 × A 2 ) n defined as the Markov kernel:
\[
R_n^*R^n:((\Omega_1\times\Omega_2)^n,(\mathcal A_1\times\mathcal A_2)^n)\to((\Omega_1\times\Omega_2)^n,(\mathcal A_1\times\mathcal A_2)^n)
\]
such that:
\[
R_n^*R^n(x,A_{12,n}):=\int_\Theta R_\theta^n(A_{12,n})\,dR^*_{n,x}(\theta).
\]
According to Theorem 1 of [1], this is the Bayes estimator of the distribution $R_\theta^n$ for the squared total variation loss function:
\[
\int_{(\Omega_1\times\Omega_2)^n\times\Theta}\sup_{A_{12,n}\in(\mathcal A_1\times\mathcal A_2)^n}\big|R^*_{n,x}R^n(A_{12,n})-R_\theta^n(A_{12,n})\big|^2\,d\Pi_{12,n}(x,\theta)\le
\int_{(\Omega_1\times\Omega_2)^n\times\Theta}\sup_{A_{12,n}\in(\mathcal A_1\times\mathcal A_2)^n}\big|M(x,A_{12,n})-R_\theta^n(A_{12,n})\big|^2\,d\Pi_{12,n}(x,\theta),
\]
for every Markov kernel $M:(\Omega_1\times\Omega_2,\mathcal A_1\times\mathcal A_2)^n\to(\Omega_1\times\Omega_2,\mathcal A_1\times\mathcal A_2)^n$.
It can be readily checked that:
\[
(R^*_{n,x}R^n)^{\pi_1}=R^*_{n,x}R,
\]
where $\pi_1(x):=x_1:=(x_{11},x_{21})$ for $x\in(\Omega_1\times\Omega_2)^n$. Then, Theorem 2 of [1] shows that:
\[
\int_{(\Omega_1\times\Omega_2)^n\times\Theta}\sup_{A_{12}\in\mathcal A_1\times\mathcal A_2}\big|R^*_{n,x}R(A_{12})-R_\theta(A_{12})\big|^2\,d\Pi_{12,n}(x,\theta)\le
\int_{(\Omega_1\times\Omega_2)^n\times\Theta}\sup_{A_{12}\in\mathcal A_1\times\mathcal A_2}\big|M(x,A_{12})-R_\theta(A_{12})\big|^2\,d\Pi_{12,n}(x,\theta),
\]
for every Markov kernel $M:(\Omega_1\times\Omega_2,\mathcal A_1\times\mathcal A_2)^n\to(\Omega_1\times\Omega_2,\mathcal A_1\times\mathcal A_2)$.
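In practice, the mixture integral defining the posterior predictive distribution rarely needs to be evaluated in closed form: since $R_n^*R(x,A_{12})=\int_\Theta R_\theta(A_{12})\,dR^*_{n,x}(\theta)$, any posterior sampler combined with the sampling model yields draws from it. The following minimal Python sketch (the callables `sample_posterior`, `sample_R_theta`, and `indicator` are hypothetical placeholders, not part of the paper) approximates $R_n^*R(x,A_{12})$ by Monte Carlo:

```python
import numpy as np

def posterior_predictive_prob(sample_posterior, sample_R_theta, indicator,
                              n_draws=10_000, rng=None):
    """Monte Carlo approximation of R_n^* R(x, A_12) = ∫_Θ R_θ(A_12) dR*_{n,x}(θ).

    sample_posterior(rng)      -> one draw θ from the posterior R*_{n,x} (x is the observed sample).
    sample_R_theta(theta, rng) -> one draw (x1, x2) from the joint sampling distribution R_θ.
    indicator(x1, x2)          -> 1 if (x1, x2) belongs to the event A_12, else 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    hits = 0
    for _ in range(n_draws):
        theta = sample_posterior(rng)            # θ ~ R*_{n,x}
        x1, x2 = sample_R_theta(theta, rng)      # (X1, X2) ~ R_θ for that θ
        hits += indicator(x1, x2)
    return hits / n_draws
```

In the examples of Section 7 both steps are straightforward, since the posteriors there are gamma, beta, and normal distributions, respectively.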
We introduce some notation for ( x , x , θ ) ( Ω 1 × Ω 2 ) n × ( Ω 1 × Ω 2 ) × Θ :
π ( x , x , θ ) : = x , π i ( x , x , θ ) : = x i : = ( x i 1 , x i 2 ) , 1 i n , π ( x , x , θ ) : = x , π i ( x , x , θ ) : = x i , i = 1 , 2 , q ( x , x , θ ) : = θ .
Let us consider the probability space:
\[
\big((\Omega_1\times\Omega_2)^n\times(\Omega_1\times\Omega_2)\times\Theta,\ (\mathcal A_1\times\mathcal A_2)^n\times(\mathcal A_1\times\mathcal A_2)\times\mathcal T,\ \Pi_n\big),
\]
where:
\[
\Pi_n(A_{12,n}\times A_{12}\times T)=\int_T R_\theta(A_{12})\,R_\theta^n(A_{12,n})\,dQ(\theta)
\]
when $A_{12,n}\in(\mathcal A_1\times\mathcal A_2)^n$, $A_{12}\in\mathcal A_1\times\mathcal A_2$ and $T\in\mathcal T$.
Thus, for a r.r.v. f on: ( ( Ω 1 × Ω 2 ) n × ( Ω 1 × Ω 2 ) × Θ , ( A 1 × A 2 ) n × ( A 1 × A 2 ) × T ) ,
f d Π n = Θ ( Ω 1 × Ω 2 ) n ( Ω 1 × Ω 2 ) f ( x , x , θ ) d R θ ( x ) d R θ n ( x ) d Q ( θ )
provided that the integral exists. Moreover, for a r.r.v. h on ( ( Ω 1 × Ω 2 ) × Θ , ( A 1 × A 2 ) × T ) :
\[
\int h\,d\Pi_n=\int_\Theta\int_{\Omega_1\times\Omega_2} h(x,\theta)\,dR_\theta(x)\,dQ(\theta)=\int_{\Omega_1\times\Omega_2}\int_\Theta h(x,\theta)\,dR^*_{1,x}(\theta)\,d\beta^*_{12,1}(x).
\]
The following proposition is straightforward.
Proposition 1.
Given A 12 , n ( A 1 × A 2 ) n , A 12 A 1 × A 2 and T T , we have that:
Π n ( π , q ) ( A 12 . n × T ) = Π 12 , n ( A 12 , n × T ) = T R θ n ( A 12 , n ) d Q ( θ ) = A 12 , n R n , x * ( T ) d β 12 , n * ( x ) , Π n ( π , q ) ( A 12 × T ) = Π 12 , 1 ( A 12 × T ) = T R θ ( A 12 ) d Q ( θ ) = A 12 R 1 , x * ( T ) d β 12 , 1 * ( x ) .
Moreover:
Π n q = Q , Π n ( π , q ) = Π 12 , n , Π n π = β 12 , n * , Π n ( π , q ) = Π 12 , 1 , Π n π = β 12 , 1 * , Π n π | q = θ = R θ n , Π n π | q = θ = R θ , Π n q | π = x = R n , x * , Π n q | π = x = R 1 , x * , P θ X 1 = R θ π 1 , P θ X 2 | X 1 = x 1 = R θ π 2 | π 1 = x 1 , E P θ ( X 2 | X 1 = x 1 ) = E R θ ( π 2 | π 1 = x 1 ) ,
where the last equality refers to the case where X 2 is a real statistic with a finite mean.
In particular, the probability space (4) contains all the basic ingredients of the Bayesian experiment (1), i.e., the prior distribution, the sampling probabilities, the posterior distributions, and the prior predictive distribution. In addition, it becomes the natural framework in which to address the estimation problems of this communication, as we shall see in what follows.

3. Bayes Estimator of the Conditional Distribution

An estimator of the conditional distribution P θ X 2 | X 1 from an n-sized sample of the joint distribution of ( X 1 , X 2 ) is a Markov kernel:
\[
M:((\Omega_1\times\Omega_2)^n\times\Omega_1,\ (\mathcal A_1\times\mathcal A_2)^n\times\mathcal A_1)\to(\Omega_2,\mathcal A_2)
\]
such that, for an observed $x=((x_{11},x_{21}),\dots,(x_{1n},x_{2n}))\in(\Omega_1\times\Omega_2)^n$, $M(x,x_1,\cdot)$ is a probability measure on $\mathcal A_2$ that can be considered an estimate of the conditional distribution $P_\theta^{X_2|X_1=x_1}$ for a given $x_1\in\Omega_1$.
From a Bayesian point of view, the Bayes estimator of the conditional distribution P X 2 | X 1 = R π 2 | π 1 is a Markov kernel:
\[
M:((\Omega_1\times\Omega_2)^n\times\Omega_1,\ (\mathcal A_1\times\mathcal A_2)^n\times\mathcal A_1)\to(\Omega_2,\mathcal A_2)
\]
minimizing the Bayes risk:
\[
\begin{aligned}
&\int_{(\Omega_1\times\Omega_2)^n\times\Theta}\int_{\Omega_1}\sup_{A_2\in\mathcal A_2}\big|M(x,x_1,A_2)-R_\theta^{\pi_2|\pi_1=x_1}(A_2)\big|^2\,dR_\theta^{\pi_1}(x_1)\,d\Pi_{12,n}(x,\theta)\\
&\qquad=\int_\Theta\int_{(\Omega_1\times\Omega_2)^n}\int_{\Omega_1}\sup_{A_2\in\mathcal A_2}\big|M(x,x_1,A_2)-R_\theta^{\pi_2|\pi_1=x_1}(A_2)\big|^2\,dR_\theta^{\pi_1}(x_1)\,dR_\theta^n(x)\,dQ(\theta)\\
&\qquad=\int_{(\Omega_1\times\Omega_2)^n\times(\Omega_1\times\Omega_2)\times\Theta}\sup_{A_2\in\mathcal A_2}\big|M(x,x_1,A_2)-R_\theta^{\pi_2|\pi_1=x_1}(A_2)\big|^2\,d\Pi_n.
\end{aligned}
\]
The following result yields the Bayes estimator of the conditional distribution P θ X 2 | X 1 from the posterior predictive distribution.
Theorem 1.
Assume that the σ-field A 2 is separable. Then, the conditional distribution of π 2 given π 1 = x 1 with respect to the posterior predictive distribution R n , x * R :
\[
M_n^*(x,x_1,A_2):=R^*_{n,x}R^{\pi_2|\pi_1=x_1}(A_2),
\]
is the Bayes estimator of the conditional distribution $R^{\pi_2|\pi_1}$ for the squared total variation loss function:
\[
\int_{(\Omega_1\times\Omega_2)^{n+1}\times\Theta}\sup_{A_2\in\mathcal A_2}\big|M_n^*(x,x_1,A_2)-R_\theta^{\pi_2|\pi_1=x_1}(A_2)\big|^2\,d\Pi_n\le
\int_{(\Omega_1\times\Omega_2)^{n+1}\times\Theta}\sup_{A_2\in\mathcal A_2}\big|M(x,x_1,A_2)-R_\theta^{\pi_2|\pi_1=x_1}(A_2)\big|^2\,d\Pi_n
\]
for any estimator M of the conditional distribution R π 2 | π 1 .
Fix an event A 2 A 2 and define:
H A 2 ( x , x 1 , θ ) : = P θ X 2 | X 1 = x 1 ( A 2 ) = R θ π 2 | π 1 = x 1 ( A 2 ) .
Jensen’s inequality will yield a proof of the theorem once the following result is established.
Lemma 1.
Given A 2 A 2 :
E Π n ( H A 2 | π = x , π 1 = x 1 ) = M n * ( x , x 1 , A 2 ) ,
i.e., for all A 12 , n ( A 1 × A 2 ) n and all A 1 A 1 :
A 12 , n × A 1 × Ω 2 × Θ R θ π 2 | π 1 = x 1 ( A 2 ) d Π n ( x , x , θ ) = A 12 , n × A 1 R n , x * R π 2 | π 1 = x 1 ( A 2 ) d Π n ( π , π 1 ) ( x , x 1 ) .

4. Bayes Estimator of the Conditional Density

When the joint distribution R θ = P θ ( X 1 , X 2 ) has a density f θ with respect to the product of two σ -finite measures μ 1 and μ 2 on A 1 and A 2 , resp., the conditional density is:
\[
f_\theta^{X_2|X_1=x_1}(x_2):=\frac{f_\theta(x_1,x_2)}{f_{\theta,X_1}(x_1)}
\]
for almost every x 1 , where f θ , X 1 ( x 1 ) stands for the marginal density of X 1 .
An estimator of the conditional density f θ X 2 | X 1 from an n-sized sample of the joint distribution of ( X 1 , X 2 ) is a map:
m : ( ( Ω 1 × Ω 2 ) n × Ω 1 × Ω 2 , ( A 1 × A 2 ) n × A 1 × A 2 ) ( R , R )
such that, having observed $x=((x_{11},x_{21}),\dots,(x_{1n},x_{2n}))\in(\Omega_1\times\Omega_2)^n$, $m(x,x_1,\cdot)$ is considered an estimate of the conditional density $f_\theta^{X_2|X_1=x_1}$ of $X_2$ given $X_1=x_1$.
It is well known (see, for instance, [7], p. 126) that, given two probability measures $P_1$ and $P_2$ on a measurable space $(\Omega,\mathcal A)$ having densities $p_1$ and $p_2$ with respect to a σ-finite measure μ:
\[
\sup_{A\in\mathcal A}|P_1(A)-P_2(A)|=\frac12\int_\Omega|p_1-p_2|\,d\mu.
\]
Thus, the Bayesian estimation of the conditional distribution P θ X 2 | X 1 = x 1 = R θ π 2 | π 1 = x 1 for the squared total variation loss function corresponds to the Bayesian estimation of its density f θ X 2 | X 1 = x 1 for the L 1 -squared loss function. Hence, according to Theorem 1, the Bayes estimator of the conditional density f θ X 2 | X 1 = x 1 for the L 1 -squared loss function is the μ 2 -density f n , x * X 2 | X 1 = x 1 of the conditional distribution:
R n , x * R π 2 | π 1 = x 1 .
Note that:
\[
R^*_{n,x}R(A_1\times A_2)=\int_\Theta R_\theta(A_1\times A_2)\,dR^*_{n,x}(\theta)=\int_\Theta\left(\int_{A_1\times A_2} f_\theta(x_1,x_2)\,d(\mu_1\times\mu_2)(x_1,x_2)\right)r^*_{n,x}(\theta)\,dQ(\theta)=\int_{A_1\times A_2}\int_\Theta f_\theta(x_1,x_2)\,r^*_{n,x}(\theta)\,dQ(\theta)\,d(\mu_1\times\mu_2)(x_1,x_2),
\]
where $r^*_{n,x}(\theta)$ denotes the Q-density of the posterior distribution $R^*_{n,x}$. Thus, $r^*_{n,x}(\theta)$ is of the form $K(x)f_{n,\theta}(x)$, where:
\[
f_{n,\theta}(x):=\prod_{i=1}^n f_\theta(x_i)
\]
is the $(\mu_1\times\mu_2)^n$-density of $R_\theta^n$. Hence, the $\mu_1\times\mu_2$-density of the posterior predictive distribution $R^*_{n,x}R$ is:
\[
f^*_{n,x}(x_1,x_2):=\int_\Theta f_\theta(x_1,x_2)\,r^*_{n,x}(\theta)\,dQ(\theta),
\]
and its first marginal is:
\[
f^*_{n,x,1}(x_1):=\int_{\Omega_2}\int_\Theta f_\theta(x_1,t)\,r^*_{n,x}(\theta)\,dQ(\theta)\,d\mu_2(t).
\]
Thus, we have proved the following result.
Theorem 2.
Assume that $\mathcal A_2$ is separable. The Bayes estimator of the conditional density $f_\theta^{X_2|X_1}$ for the $L_1$-squared loss function is the $\mu_2$-density:
\[
f_{n,x}^{*\,X_2|X_1=x_1}(x_2):=\frac{f^*_{n,x}(x_1,x_2)}{f^*_{n,x,1}(x_1)}=\frac{\int_\Theta f_\theta(x_1,x_2)\,r^*_{n,x}(\theta)\,dQ(\theta)}{\int_{\Omega_2}\int_\Theta f_\theta(x_1,t)\,r^*_{n,x}(\theta)\,dQ(\theta)\,d\mu_2(t)}
\]
of the conditional distribution $R^*_{n,x}R^{\pi_2|\pi_1}$ of $\pi_2$ given $\pi_1$ with respect to the posterior predictive distribution $R^*_{n,x}R$:
\[
\int_{(\Omega_1\times\Omega_2)^{n+1}\times\Theta}\left(\int_{\Omega_2}\big|f_{n,x}^{*\,X_2|X_1=x_1}(t)-f_\theta^{X_2|X_1=x_1}(t)\big|\,d\mu_2(t)\right)^2 d\Pi_n\le
\int_{(\Omega_1\times\Omega_2)^{n+1}\times\Theta}\left(\int_{\Omega_2}\big|m(x,x_1,t)-f_\theta^{X_2|X_1=x_1}(t)\big|\,d\mu_2(t)\right)^2 d\Pi_n,
\]
for any estimator m of the conditional density.
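For models where the integrals above are not available in closed form, the Theorem 2 estimator can be approximated numerically. The following sketch is a minimal illustration only, assuming a real parameter, Lebesgue dominating measures, and vectorized hypothetical callables `joint_density` and `posterior_density`; it evaluates the ratio by trapezoidal quadrature on grids over Θ and Ω₂:

```python
import numpy as np

def trapezoid(y, x):
    # simple trapezoidal rule, to keep the sketch free of extra dependencies
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0

def bayes_conditional_density(joint_density, posterior_density, theta_grid, x2_grid, x1, x2):
    """Evaluate the Theorem 2 estimator
        f*_{n,x}^{X2|X1=x1}(x2) = ∫ f_θ(x1,x2) r*_{n,x}(θ) dQ(θ) / ∫∫ f_θ(x1,t) r*_{n,x}(θ) dQ(θ) dμ2(t)
    by quadrature on grids over Θ and Ω_2.

    joint_density(theta, x1, x2) -> f_θ(x1, x2), vectorized in theta.
    posterior_density(theta)     -> density of the posterior R*_{n,x} with respect to Lebesgue
                                    measure on Θ (r*_{n,x}(θ) times the prior density), vectorized.
    """
    post = posterior_density(theta_grid)
    numerator = trapezoid(joint_density(theta_grid, x1, x2) * post, theta_grid)
    # denominator: marginal posterior predictive density of X1 at x1
    predictive_joint = np.array([trapezoid(joint_density(theta_grid, x1, t) * post, theta_grid)
                                 for t in x2_grid])
    denominator = trapezoid(predictive_joint, x2_grid)
    return numerator / denominator
```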

5. Bayes Estimator of the Conditional Distribution Function

When X 2 is a r.r.v., we may be interested in the estimation of the conditional distribution function of X 2 given X 1 = x 1 :
\[
F_\theta(x_1,t):=P_\theta(X_2\le t\,|\,X_1=x_1)=R_\theta^{\pi_2|\pi_1=x_1}(\,]-\infty,t]\,).
\]
An estimator of such a conditional distribution function from an n-sized sample of $R_\theta$ is a map of the form:
\[
F:(x,x_1,t)\in(\Omega_1\times\mathbb R)^n\times\Omega_1\times\mathbb R\ \longmapsto\ F(x,x_1,t):=M(x,x_1,\,]-\infty,t]\,)\in[0,1]
\]
for a Markov kernel:
\[
M:((\Omega_1\times\mathbb R)^n\times\Omega_1,\ (\mathcal A_1\times\mathcal R)^n\times\mathcal A_1)\to(\mathbb R,\mathcal R).
\]
An optimal estimator of the conditional distribution function $F_\theta$ for the $L_\infty$-squared loss function from a Bayesian point of view (i.e., a Bayes estimator) is an estimator $F_n^*$ minimizing the Bayes risk, i.e., such that:
\[
\int_{(\Omega_1\times\mathbb R)^{n+1}\times\Theta}\sup_{t\in\mathbb R}|F_n^*(x,x_1,t)-F_\theta(x_1,t)|^2\,d\Pi_n\le
\int_{(\Omega_1\times\mathbb R)^{n+1}\times\Theta}\sup_{t\in\mathbb R}|F(x,x_1,t)-F_\theta(x_1,t)|^2\,d\Pi_n
\]
for any estimator F of the conditional distribution function F θ .
A natural candidate is the conditional distribution function for the posterior predictive distribution, as is stated in the following theorem.
Theorem 3.
The posterior predictive conditional distribution function:
\[
F_n^*(x,x_1,t):=R^*_{n,x}R^{\pi_2|\pi_1=x_1}(\,]-\infty,t]\,)
\]
is the Bayes estimator of the conditional distribution function $F_\theta$ for the $L_\infty$-squared loss function.

6. Bayes Estimator of a Regression Curve

Now assume that $X_2$ is a square-integrable r.r.v. Thus, $(\Omega_2,\mathcal A_2)=(\mathbb R,\mathcal R)$. The regression curve of $X_2$ given $X_1$ is the map $x_1\in\Omega_1\mapsto r_\theta(x_1):=E_\theta(X_2\,|\,X_1=x_1)$. An estimator of the regression curve $r_\theta$ from a sample of size n of the joint distribution of $(X_1,X_2)$ is a statistic:
\[
m:(x,x_1)\in(\Omega_1\times\mathbb R)^n\times\Omega_1\ \longmapsto\ m(x,x_1)\in\mathbb R,
\]
so that, having observed $x\in(\Omega_1\times\mathbb R)^n$, $m(x,\cdot)$ is the estimate of $r_\theta$.
From a frequentist point of view, the simplest way to evaluate the error in estimating an unknown regression curve is to use the expectation of the quadratic deviation (see [6], p. 120):
\[
E_\theta\left(\int_{\Omega_1}(m(x,x_1)-r_\theta(x_1))^2\,dP_\theta^{X_1}(x_1)\right)=\int_{(\Omega_1\times\mathbb R)^n}\int_{\Omega_1}(m(x,x_1)-r_\theta(x_1))^2\,dR_\theta^{\pi_1}(x_1)\,dR_\theta^n(x).
\]
From a Bayesian point of view, the Bayes estimator of the regression curve $r_\theta$ should minimize the Bayes risk (i.e., the prior mean of the expectation of the quadratic deviation):
\[
\int_\Theta\int_{(\Omega_1\times\mathbb R)^n}\int_{\Omega_1}(m(x,x_1)-r_\theta(x_1))^2\,dR_\theta^{\pi_1}(x_1)\,dR_\theta^n(x)\,dQ(\theta)=E_{\Pi_n}\big[(m(x,x_1)-r_\theta(x_1))^2\big].
\]
The following result solves the problem of estimating the regression curve from a Bayesian point of view.
Theorem 4.
The regression curve of π 2 on π 1 with respect to the posterior predictive distribution R n , x * R :
m n * ( x , x 1 ) : = E R n , x * R ( π 2 | π 1 = x 1 )
is the Bayes estimator of the regression curve r θ ( x 1 ) : = E θ ( X 2 | X 1 = x 1 ) for the squared error loss function:
E Π n [ ( m n * ( x , x 1 ) r θ ( x 1 ) ) 2 ] E Π n [ ( m n ( x , x 1 ) r θ ( x 1 ) ) 2 ]
for any other estimator m n of the regression curve r θ .
Remark 1.
(Estimation of the regression curve when densities are available.) According to the previous results, when R θ has density f θ with respect to the product μ 1 × μ 2 of two σ-finite measures, the μ 2 -density f n , x * X 2 | X 1 = x 1 of the conditional distribution R n , x * R π 2 | π 1 = x 1 is:
\[
f_{n,x}^{*\,X_2|X_1=x_1}(x_2):=\frac{f^*_{n,x}(x_1,x_2)}{f^*_{n,x,1}(x_1)}=\frac{\int_\Theta f_\theta(x_1,x_2)\,r^*_{n,x}(\theta)\,dQ(\theta)}{\int_{\mathbb R}\int_\Theta f_\theta(x_1,t)\,r^*_{n,x}(\theta)\,dQ(\theta)\,d\mu_2(t)},
\]
which is the Bayes estimator of the conditional density $f_\theta^{X_2|X_1}$. Hence, the Bayes estimator of the regression curve can be computed as:
\[
m_n^*(x,x_1):=E_{R^*_{n,x}R}(\pi_2\,|\,\pi_1=x_1)=\int_{\mathbb R} x_2\, f_{n,x}^{*\,X_2|X_1=x_1}(x_2)\,d\mu_2(x_2)=\frac{\int_{\mathbb R} x_2\int_\Theta f_\theta(x_1,x_2)\,r^*_{n,x}(\theta)\,dQ(\theta)\,d\mu_2(x_2)}{\int_{\mathbb R}\int_\Theta f_\theta(x_1,x_2)\,r^*_{n,x}(\theta)\,dQ(\theta)\,d\mu_2(x_2)}.
\]
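Note that, integrating $x_2$ out first in the last ratio, the numerator and denominator become the posterior means of $f_{\theta,X_1}(x_1)\,E_\theta(X_2\,|\,X_1=x_1)$ and of the marginal density $f_{\theta,X_1}(x_1)$, respectively. This suggests a simple Monte Carlo approximation of $m_n^*$ from posterior draws; the sketch below is a minimal illustration, with hypothetical callables for the posterior sampler, the marginal density, and the per-θ regression:

```python
import numpy as np

def bayes_regression_mc(sample_posterior, marginal_x1_density, regression_given_theta,
                        x1, n_draws=20_000, rng=None):
    """Monte Carlo version of the ratio above:
        m*_n(x, x1) = E_post[ f_{θ,X1}(x1) · E_θ(X2|X1=x1) ] / E_post[ f_{θ,X1}(x1) ].

    sample_posterior(rng)             -> one draw θ from the posterior R*_{n,x}.
    marginal_x1_density(theta, x1)    -> f_{θ,X1}(x1), the marginal density of X1.
    regression_given_theta(theta, x1) -> E_θ(X2 | X1 = x1), the regression curve for that θ.
    """
    rng = np.random.default_rng() if rng is None else rng
    thetas = [sample_posterior(rng) for _ in range(n_draws)]
    weights = np.array([marginal_x1_density(t, x1) for t in thetas])
    values = np.array([regression_given_theta(t, x1) for t in thetas])
    return np.sum(weights * values) / np.sum(weights)
```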

7. Examples

Example 1.
Let us assume that, for $\theta,\lambda,x_1>0$, $P_\theta^{X_1}=G(1,\theta^{-1})$, $P_\theta^{X_2|X_1=x_1}=G(1,(\theta x_1)^{-1})$, and $Q=G(1,\lambda^{-1})$, where $G(\alpha,\beta)$ denotes the gamma distribution of parameters $\alpha,\beta>0$ (where β stands for the scale parameter). Hence, the joint density of $X_1$ and $X_2$ is:
f θ ( x 1 , x 2 ) = θ 2 x 1 exp { θ x 1 ( 1 + x 2 ) } I ] 0 , [ 2 ( x 1 , x 2 ) .
Then the density of R θ n is:
f n , θ ( x ) = θ 2 n · i = 1 n x i 1 · exp θ i = 1 n x i 1 ( 1 + x i 2 ) · I ] 0 , [ 2 n ( x ) ,
and the posterior Q-density given x is:
d R n , x * ( θ ) d Q = : r n , x * ( θ ) = K ( x ) f n , θ ( x )
where K ( x ) = [ 0 f n , θ ( x ) d Q ( θ ) ] 1 .
Hence, the posterior predictive density given x ] 0 , [ 2 n is:
f n , x * ( x ) = 0 f θ ( x ) r n , x * ( θ ) d Q ( θ ) = λ K ( x ) x 1 i = 1 n x i 1 0 θ 2 n + 2 exp θ [ λ + x 1 ( 1 + x 2 ) + i = 1 n x i 1 ( 1 + x i 2 ) ] d θ · I ] 0 , [ 2 ( x ) .
Since:
\[
\int_0^\infty\theta^n\exp\{-a\theta\}\,d\theta=\frac{n!}{a^{n+1}},
\]
we have that:
f n , x * ( x ) = ( 2 n + 2 ) ! λ K ( x ) x 1 i = 1 n x i 1 [ λ + x 1 ( 1 + x 2 ) + i = 1 n x i 1 ( 1 + x i 2 ) ] ( 2 n + 3 )
and its first marginal is:
f n , x , 1 * ( x 1 ) = R f n , x * ( x 1 , x 2 ) d x 2 = 0 A ( B t + C ) m d t = A ( m 1 ) B C m 1
where:
m = 2 n + 3 , A = ( 2 n + 2 ) ! λ K ( x ) x 1 i = 1 n x i 1 , B = x 1 , a n d C = λ + x 1 + i = 1 n x i 1 ( 1 + x i 2 ) .
Thus:
f n , x , 1 * ( x 1 ) = ( 2 n + 2 ) ! λ K ( x ) x 1 i = 1 n x i 1 ( 2 n + 2 ) x 1 [ λ + x 1 + i = 1 n x i 1 ( 1 + x i 2 ) ] 2 n + 2 .
The Bayes estimator of the conditional density f θ X 2 | X 1 = x 1 ( x 2 ) = θ x 1 exp { θ x 1 x 2 } I ] 0 , [ ( x 2 ) is, for x 1 , x 2 > 0 :
f n , x * X 2 | X 1 = x 1 ( x 2 ) : = f n , x * ( x 1 , x 2 ) f n , x , 1 * ( x 1 ) = ( 2 n + 2 ) x 1 [ λ + x 1 + i = 1 n x i 1 ( 1 + x i 2 ) ] 2 n + 2 [ λ + x 1 ( 1 + x 2 ) + i = 1 n x i 1 ( 1 + x i 2 ) ] 2 n + 3 = ( 2 n + 2 ) x 1 a n ( x , x 1 ) 2 n + 2 ( x 1 x 2 + a n ( x , x 1 ) ) 2 n + 3
where:
a n ( x , x 1 ) = λ + x 1 + i = 1 n x i 1 ( 1 + x i 2 ) .
The Bayes estimator of the conditional distribution function:
F θ ( x 1 , t ) : = P θ ( X 2 t | X 1 = x 1 )
is, for t > 0 :
F n * ( x , x 1 , t ) = 0 t ( 2 n + 2 ) x 1 [ λ + x 1 + i = 1 n x i 1 ( 1 + x i 2 ) ] 2 n + 2 [ λ + x 1 ( 1 + x 2 ) + i = 1 n x i 1 ( 1 + x i 2 ) ] 2 n + 3 d x 2 = 0 t ( 2 n + 2 ) x 1 a n ( x , x 1 ) 2 n + 2 ( x 1 x 2 + a n ( x , x 1 ) ) 2 n + 3 d x 2 = a n ( x , x 1 ) 2 n + 2 1 a n ( x , x 1 ) 2 n + 2 1 ( x 1 t + a n ( x , x 1 ) ) 2 n + 2 = 1 1 + x 1 t a n ( x , x 1 ) 2 n 2 .
The Bayes estimator of the regression curve r θ ( x 1 ) : = E θ ( X 2 | X 1 = x 1 ) = 1 θ x 1 is, for x 1 > 0 :
m n * ( x , x 1 ) = 0 x 2 · f n , x * X 2 | X 1 = x 1 ( x 2 ) d x 2 = λ + x 1 + i = 1 n x i 1 ( 1 + x i 2 ) ( 2 n + 1 ) x 1 = a n ( x , x 1 ) ( 2 n + 1 ) x 1 .
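The closed forms of this example are easy to check numerically. The following sketch is illustrative only (the simulated sample, the value of λ, and the evaluation point $x_1$ are arbitrary choices, not taken from the paper); it implements $a_n$, the estimated conditional density, distribution function, and regression curve, and verifies that the estimated conditional density integrates to one and that its mean equals $m_n^*(x,x_1)$:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, theta_true, n = 1.0, 2.0, 50          # prior G(1, λ^{-1}); θ fixed only to simulate data

# simulate an n-sized sample from the Example 1 model (illustrative data, not from the paper)
x1s = rng.exponential(scale=1/theta_true, size=n)        # X1 ~ G(1, θ^{-1})
x2s = rng.exponential(scale=1/(theta_true * x1s))        # X2 | X1 = x1 ~ G(1, (θ x1)^{-1})

def a_n(x1):
    # a_n(x, x1) = λ + x1 + Σ_i x_{i1}(1 + x_{i2})
    return lam + x1 + np.sum(x1s * (1 + x2s))

def cond_density(x2, x1):
    # Bayes estimator of the conditional density f_θ^{X2|X1=x1}(x2)
    a = a_n(x1)
    return (2*n + 2) * x1 * a**(2*n + 2) / (x1*x2 + a)**(2*n + 3)

def cond_cdf(t, x1):
    # Bayes estimator of the conditional distribution function F_θ(x1, t)
    return 1 - (1 + x1*t / a_n(x1))**(-(2*n + 2))

def regression(x1):
    # Bayes estimator of the regression curve r_θ(x1) = 1/(θ x1)
    return a_n(x1) / ((2*n + 1) * x1)

def trapezoid(y, x):
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0

x1 = 0.7
grid = np.linspace(0.0, 100.0, 100_001)
dens = cond_density(grid, x1)
print(trapezoid(dens, grid))                         # ≈ 1: the estimated conditional density integrates to one
print(trapezoid(grid * dens, grid), regression(x1))  # its numerical mean agrees with the closed form m_n^*
print(cond_cdf(2.0, x1), trapezoid(dens[grid <= 2.0], grid[grid <= 2.0]))  # F_n^* matches the integrated density
print(regression(x1), 1/(theta_true * x1))           # compare with the true regression value 1/(θ x1)
```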
Example 2.
Let us assume that $X_1$ has a Bernoulli distribution $Bi(1,\theta)$ of unknown parameter $\theta\in\,]0,1[$ (i.e., $P_\theta^{X_1}=Bi(1,\theta)$) and that, given $X_1=k_1\in\{0,1\}$, $X_2$ has distribution $Bi(1,1-\theta)$ when $k_1=0$ and $Bi(1,\theta)$ when $k_1=1$, i.e., $P_\theta^{X_2|X_1=k_1}=Bi(1,k_1+(1-2k_1)(1-\theta))$. We can think of tossing a coin with probability θ of getting heads (=1) and making a second toss of this coin if it comes up heads on the first toss, or tossing a second coin with probability 1 − θ of getting heads if the first toss is tails (=0). Consider the uniform distribution on $]0,1[$ as the prior distribution Q.
Then, the joint probability function of X 1 and X 2 is:
f θ ( k 1 , k 2 ) = θ k 1 ( 1 θ ) 1 k 1 [ k 1 + ( 1 2 k 1 ) ( 1 θ ) ] k 2 [ 1 k 1 ( 1 2 k 1 ) ( 1 θ ) ] 1 k 2 = θ ( 1 θ ) i f k 2 = 0 , ( 1 θ ) 2 i f k 1 = 0 , k 2 = 1 , θ 2 i f k 1 = 1 , k 2 = 1 .
The probability function of R θ n is:
f n , θ ( k ) = i = 1 n f θ ( k i ) = θ a n ( k ) ( 1 θ ) b n ( k )
for k = ( k 1 , , k n ) = ( k 11 , k 12 , , k n 1 , k n 2 ) { 0 , 1 } 2 n , where:
a n ( k ) = n + 0 ( k ) + 2 n 11 ( k ) , b n ( k ) = n + 0 ( k ) + 2 n 01 ( k ) ,
where $n_{j_1 j_2}(k)$ denotes the number of indices $i\in\{1,\dots,n\}$ such that $(k_{i1},k_{i2})=(j_1,j_2)$, and $n_{+j}=n_{0j}+n_{1j}$ for $j=0,1$. Note that $a_n(k)+b_n(k)=2n$.
Hence, the posterior Q-density given k is:
r n , k * ( θ ) = d R n , k * d Q ( θ ) = K ( k ) f n , θ ( k ) = K ( k ) θ a n ( k ) ( 1 θ ) b n ( k )
where:
K ( k ) = 0 1 f n , θ ( k ) d Q ( θ ) 1 = 1 B ( a n ( k ) + 1 , b n ( k ) + 1 ) ,
where B ( α , β ) stands for the beta function.
Thus, the posterior predictive density given k { 0 , 1 } 2 n is:
f n , k * ( k 1 , k 2 ) = 0 1 f θ ( k 1 , k 2 ) r n , k * ( θ ) d Q ( θ ) = K ( k ) 0 1 θ a n ( k ) + k 1 ( 1 θ ) b n ( k ) + k 2 d θ = K ( k ) B ( a n ( k ) + k 1 + 1 , b n ( k ) + k 2 + 1 ) ,
and its first marginal is:
f n , k , 1 * ( k 1 ) = K ( k ) [ B ( a n ( k ) + k 1 + 1 , b n ( k ) + 1 ) + B ( a n ( k ) + k 1 + 1 , b n ( k ) + 2 ) ] .
The Bayes estimator of the conditional probability function:
f θ X 2 | X 1 = k 1 ( k 2 ) = [ k 1 + ( 1 2 k 1 ) ( 1 θ ) ] k 2 [ 1 k 1 ( 1 2 k 1 ) ( 1 θ ) ] 1 k 2
is
f n , x * X 2 | X 1 = k 1 ( k 2 ) : = f n , k * ( k 1 , k 2 ) f n , k , 1 * ( k 1 ) = 2 n + 2 2 n + n + 0 ( k ) + 2 n 01 ( k ) + 3 i f k 1 = k 2 = 0 , n + 0 ( k ) + 2 n 01 ( k ) + 1 2 n + n + 0 ( k ) + 2 n 01 ( k ) + 3 i f k 1 = 0 , k 2 = 1 , 2 n + 3 2 n + n + 0 ( k ) + 2 n 01 ( k ) + 4 i f k 1 = 1 , k 2 = 0 , n + 0 ( k ) + 2 n 01 ( k ) + 1 2 n + n + 0 ( k ) + 2 n 01 ( k ) + 4 i f k 1 = k 2 = 1 .
The Bayes estimator of the conditional mean r θ ( k 1 ) : = E θ ( X 2 | X 1 = k 1 ) = θ k 1 ( 1 θ ) 1 k 1 is, for k 1 = 0 , 1 :
m n * ( k , k 1 ) = f n , k * X 2 | X 1 = k 1 ( 1 ) = n + 0 ( k ) + 2 n 01 ( k ) + 1 2 n + n + 0 ( k ) + 2 n 01 ( k ) + 3 i f k 1 = 0 , n + 0 ( k ) + 2 n 01 ( k ) + 1 2 n + n + 0 ( k ) + 2 n 01 ( k ) + 4 i f k 1 = 1 .
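As a numerical counterpart, the Theorem 2 ratio can also be evaluated directly for this model by integrating over θ ∈ ]0,1[. The sketch below is a minimal illustration (the simulated sample and the grid size are illustrative assumptions); it computes the Bayes estimate of the conditional probability function from an observed sample of pairs:

```python
import numpy as np

def example2_bayes_conditional(k_pairs, k1, k2, n_grid=20_001):
    """Theorem 2 ratio for the model of Example 2, evaluated by quadrature over θ ∈ ]0,1[:
       numerator   = ∫ f_θ(k1,k2) · θ^{a_n}(1-θ)^{b_n} dθ   (θ^{a_n}(1-θ)^{b_n} ∝ posterior density)
       denominator = Σ_{j=0,1} ∫ f_θ(k1,j) · θ^{a_n}(1-θ)^{b_n} dθ.
    k_pairs is the observed sample, a list of pairs (k_i1, k_i2) in {0,1}².
    """
    def f_theta(theta, j1, j2):
        p2 = j1 + (1 - 2*j1) * (1 - theta)           # P_θ(X2 = 1 | X1 = j1)
        return theta**j1 * (1 - theta)**(1 - j1) * p2**j2 * (1 - p2)**(1 - j2)

    theta = np.linspace(0.0, 1.0, n_grid)
    post = np.ones_like(theta)                        # unnormalized posterior ∝ Π_i f_θ(k_i1, k_i2)
    for j1, j2 in k_pairs:
        post = post * f_theta(theta, j1, j2)

    def integral(j2):
        y = f_theta(theta, k1, j2) * post
        return np.sum((y[1:] + y[:-1]) * np.diff(theta)) / 2.0

    return integral(k2) / (integral(0) + integral(1))

# a small simulated sample with θ = 0.6 (illustrative data, not taken from the paper)
rng = np.random.default_rng(2)
theta0 = 0.6
k1s = rng.binomial(1, theta0, size=30)
k2s = np.where(k1s == 1, rng.binomial(1, theta0, size=30), rng.binomial(1, 1 - theta0, size=30))
sample = list(zip(k1s.tolist(), k2s.tolist()))
print(example2_bayes_conditional(sample, k1=1, k2=1))   # Bayes estimate of P_θ(X2 = 1 | X1 = 1) = θ
```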
Example 3.
Let ( X 1 , X 2 ) have bivariate normal distribution with density:
f θ ( x ) : = 1 2 π σ 2 1 ρ 2 exp 1 2 σ 2 ( 1 ρ 2 ) [ ( x 1 θ ) 2 2 ρ ( x 1 θ ) ( x 2 θ ) + ( x 2 θ ) 2 ] = 1 2 π σ 2 1 ρ 2 exp 1 2 σ 2 ( 1 ρ 2 ) [ x 1 2 + x 2 2 2 ρ x 1 x 2 2 ( 1 ρ ) ( x 1 + x 2 ) θ + 2 ( 1 ρ ) θ 2 ] ,
where σ > 0 and ρ ] 1 , 1 [ are assumed to be known. Thus:
R θ = N 2 θ θ , σ 2 1 ρ ρ 1 , X 1 , X 2 θ N ( θ , σ 2 ) P θ X 2 | X 1 = x 1 = N ( ( 1 ρ ) θ + ρ x 1 , σ 2 1 ρ 2 ) , E θ ( X 2 | X 1 = x 1 ) = ( 1 ρ ) θ + ρ x 1 .
Hence, for x = ( x 1 , , x n ) = ( x 11 , x 12 , , x n 1 , x n 2 ) ( R 2 ) n :
f n , θ ( x ) = i = 1 n f θ ( x i ) = 1 [ 2 π σ 2 1 ρ 2 ] n exp 1 2 σ 2 ( 1 ρ 2 ) i = 1 n ( x i 1 θ ) 2 2 ρ ( x i 1 θ ) ( x i 2 θ ) + ( x i 2 θ ) 2 .
Let us consider the prior distribution Q = N ( μ , τ 2 ) whose density is:
g ( θ ) = 1 τ 2 π exp 1 2 τ 2 ( θ μ ) 2 .
The posterior Q-density is:
r n , x * ( θ ) : = d R n , x * d Q ( θ ) = K 1 ( x ) f n , θ ( x ) = K 1 ( x ) [ 2 π σ 2 1 ρ 2 ] n exp 1 2 σ 2 ( 1 ρ 2 ) [ 2 n ( 1 ρ ) θ 2 2 ( 1 ρ ) s 1 ( x ) θ + s 2 ( x ) 2 ρ p ( x ) ]
where:
s 1 ( x ) : = i ( x i 1 + x i 2 ) , s 2 ( x ) : = i ( x i 1 2 + x i 2 2 ) , p ( x ) = i x i 1 x i 2 , K 1 ( x ) = R f n , x ( θ ) d Q ( θ ) 1 .
Note that, writing c n ( ρ , σ , τ ) = [ τ 2 π ( 2 π σ 2 1 ρ 2 ) n ] 1 :
R f n , x ( θ ) d Q ( θ ) = c n ( ρ , σ , τ ) · R 1 2 σ 2 ( 1 ρ 2 ) [ 2 n ( 1 ρ ) θ 2 2 ( 1 ρ ) s 1 ( x ) θ + s 2 ( x ) 2 ρ p ( x ) ] 1 2 τ 2 ( θ μ ) 2 ] d θ = c n ( ρ , σ , τ ) R exp { ( A 1 θ 2 B 1 ( x ) θ + C 1 ( x ) ) } d θ = exp C 1 ( x ) B 1 2 ( x ) 4 A 1 τ 2 π [ 2 π σ 2 1 ρ 2 ] n R exp A 1 θ B 1 ( x ) 2 A 1 2 d θ = exp C 1 ( x ) B 1 2 ( x ) 4 A 1 τ 2 A 1 [ 2 π σ 2 1 ρ 2 ] n
where:
A 1 = n σ 2 ( 1 + ρ ) + 1 2 τ 2 , B 1 ( x ) = s 1 ( x ) σ 2 ( 1 + ρ ) + μ τ 2 , C 1 ( x ) = s 2 ( x ) 2 ρ p ( x ) 2 σ 2 ( 1 ρ 2 ) + μ 2 2 τ 2 .
Hence:
K 1 ( x ) = τ 2 A 1 2 π σ 2 1 ρ 2 n exp C 1 ( x ) B 1 2 ( x ) 4 A 1 .
The posterior predictive density given x ( R 2 ) n is:
f n , x * ( x ) : = R f θ ( x ) r n , x * ( θ ) g ( θ ) d θ = K 1 ( x ) τ 2 π [ 2 π σ 2 1 ρ 2 ] n + 1 · R exp 2 ( n + 1 ) ( 1 ρ ) θ 2 2 ( 1 ρ ) ( x 1 + x 2 + s 1 ( x ) ) θ + ( x 1 2 + x 2 2 + s 2 ( x ) ) 2 ρ ( x 1 x 2 + p ( x ) ) 2 σ 2 ( 1 ρ 2 ) ( θ 2 2 μ θ + μ 2 ) 2 τ 2 d θ = K 2 ( x ) R exp { ( A 2 θ 2 B 2 ( x , x ) θ + C 2 ( x , x ) ) } d θ = K 2 ( x ) exp C 2 ( x , x ) B 2 2 ( x , x ) 4 A 2 R exp A 2 θ B 2 ( x , x ) 2 A 2 2 d θ = K 2 ( x ) π A 2 exp C 2 ( x , x ) B 2 2 ( x , x ) 4 A 2
where:
K 2 ( x ) = K 1 ( x ) τ 2 π [ 2 π σ 2 1 ρ 2 ] n + 1 = A 1 exp C 1 ( x ) B 1 2 ( x ) 4 A 1 2 π 3 / 2 σ 2 1 ρ 2 , A 2 = ( n + 1 ) ( 1 + ρ ) σ 2 + 1 2 τ 2 B 2 ( x . x ) = x 1 + x 2 + s 1 ( x ) σ 2 ( 1 + ρ ) + μ τ 2 , C 2 ( x , x ) = x 1 2 + x 2 2 + s 2 ( x ) 2 ρ ( x 1 x 2 + p ( x ) ) 2 σ 2 ( 1 ρ 2 ) + μ 2 τ 2
We can write:
C 2 ( x , x ) B 2 2 ( x , x ) 4 A 2 = A 3 x 1 2 + A 3 x 2 2 + B 3 x 1 x 2 + C 3 ( x ) ( x 1 + x 2 ) + D 3 ( x )
where:
A 3 = 1 2 σ 2 ( 1 ρ 2 ) τ 2 σ 2 ( 1 + ρ ) 2 [ 4 ( n + 1 ) ( 1 + ρ ) τ 2 + 2 σ 2 ] , B 3 = ρ σ 2 ( 1 ρ 2 ) τ 2 σ 2 ( 1 + ρ ) 2 [ 2 ( n + 1 ) ( 1 + ρ ) τ 2 + σ 2 ] , C 3 ( x ) = τ 2 s 1 ( x ) + μ σ 2 ( 1 + ρ ) σ 2 ( 1 + ρ ) 2 [ 2 ( n + 1 ) ( 1 + ρ ) τ 2 + σ 2 ] , D 3 ( x ) = τ 2 s 2 ( x ) 2 ρ τ 2 p ( x ) + 2 μ 2 σ 2 ( 1 ρ 2 ) 2 τ 2 σ 2 ( 1 ρ 2 ) τ 4 s 1 ( x ) 2 + μ 2 σ 4 ( 1 + ρ ) 2 + 2 μ σ 2 ( 1 + ρ ) s 1 ( x ) τ 2 σ 2 ( 1 + ρ ) 2 [ 4 ( n + 1 ) ( 1 + ρ ) τ 2 + 2 σ 2 ] .
It is readily shown that A 3 > 0 . It follows that the posterior predictive density f n , x * is the density of a normal bivariate distribution N 2 ( m ( x ) , Σ ) where:
m ( x ) = m 1 ( x ) m 2 ( x ) , Σ = σ 1 2 1 ρ 1 ρ 1 1
being:
ρ 1 = B 3 2 A 3 , σ 1 2 = 2 A 3 4 A 3 2 B 3 2 , m 1 ( x ) = m 2 ( x ) = C 3 ( x ) 2 ( 1 ρ 1 ) .
It is easy to see that, as was to be expected, | ρ 1 | < 1 and σ 1 2 > 0 . If we denote:
a n ( ρ , σ , τ ) : = 2 ( n + 1 ) ( 1 + ρ ) + σ 2 τ 2
we can write:
ρ 1 = a n ( ρ , σ , τ ) + 1 ρ 1 + ρ a n ( ρ , σ , τ ) 1 ρ 1 + ρ · ρ , σ 1 2 = a n ( ρ , σ , τ ) a n ( ρ , σ , τ ) 1 ρ 1 + ρ · σ 2 , m 1 ( x ) = m 2 ( x ) = s 1 ( x ) + ( 1 + ρ ) σ 2 τ 2 μ 2 ( 1 ρ 1 ) ( 1 + ρ ) 2 σ 2 a n ( ρ , σ , τ ) .
It follows that the conditional distribution:
R n , x * R π 2 | π 1 = x 1 : = N ( 1 ρ 1 ) m 1 ( x ) + ρ 1 x 1 , σ 1 2 ( 1 ρ 1 2 )
is the Bayes estimator of the conditional distribution:
P θ X 2 | X 1 = x 1 = N ( 1 ρ ) θ + ρ x 1 , σ 2 ( 1 ρ 2 )
for the squared total variation function, and its density f n , x * π 2 | π 1 = x 1 is the Bayes estimator of the conditional density:
f θ X 2 | X 1 = x 1 ( x 2 ) = 1 σ 2 π ( 1 ρ 2 ) exp 1 2 σ 2 ( 1 ρ 2 ) [ x 2 ( 1 ρ ) θ ρ x 1 ] 2
for the $L_1$-squared loss function. Moreover, its mean:
E R n , x * R ( π 2 | π 1 = x 1 ) = ( 1 ρ 1 ) m 1 ( x ) + ρ 1 x 1
is the Bayes estimator of the regression curve:
E θ ( X 2 | X 1 = x 1 ) = ( 1 ρ ) θ + ρ x 1
for the squared error loss function.
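Since all the ingredients of this example are Gaussian, the closed-form expressions above can be cross-checked against a direct numerical evaluation of the Remark 1 ratio. The sketch below is a minimal illustration (the simulated data, the grids, and the evaluation point $x_1$ are arbitrary assumptions); it computes the posterior of θ on a grid from the prior and the likelihood, forms the posterior predictive density $f^*_{n,x}(x_1,\cdot)$, and returns the Bayes estimate of the regression curve at $x_1$:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, rho, mu, tau = 1.0, 0.5, 0.0, 2.0     # known σ, ρ and prior Q = N(μ, τ²)
theta_true, n = 1.3, 40

# simulate an n-sized sample from R_θ = N_2((θ, θ), σ² [[1, ρ], [ρ, 1]])  (illustrative data)
cov = sigma**2 * np.array([[1.0, rho], [rho, 1.0]])
data = rng.multivariate_normal([theta_true, theta_true], cov, size=n)

def trapezoid(y, x):
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0

def log_f_theta(theta, x1, x2):
    # log of the bivariate normal density f_θ(x1, x2) of Example 3
    q = ((x1 - theta)**2 - 2*rho*(x1 - theta)*(x2 - theta) + (x2 - theta)**2) / (sigma**2 * (1 - rho**2))
    return -0.5*q - np.log(2*np.pi*sigma**2*np.sqrt(1 - rho**2))

# posterior density of θ on a grid, proportional to the prior density times Π_i f_θ(x_i1, x_i2)
thetas = np.linspace(mu - 6*tau, mu + 6*tau, 4001)
log_post = -0.5*((thetas - mu)/tau)**2
for xi1, xi2 in data:
    log_post += log_f_theta(thetas, xi1, xi2)
post = np.exp(log_post - log_post.max())
post /= trapezoid(post, thetas)

# Remark 1:  m*_n(x, x1) = ∫ x2 f*_{n,x}(x1, x2) dx2 / ∫ f*_{n,x}(x1, x2) dx2,
# with f*_{n,x}(x1, x2) = ∫ f_θ(x1, x2) · posterior(θ) dθ
x1 = 0.8
x2_grid = np.linspace(theta_true - 8*sigma, theta_true + 8*sigma, 2001)
pred = np.array([trapezoid(np.exp(log_f_theta(thetas, x1, x2)) * post, thetas) for x2 in x2_grid])
m_star = trapezoid(x2_grid * pred, x2_grid) / trapezoid(pred, x2_grid)

print(m_star)                               # Bayes estimate of the regression curve at x1
print((1 - rho)*theta_true + rho*x1)        # true regression value (1-ρ)θ + ρ x1, for comparison
```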

8. Proofs

Proof of Proposition 1.
It follows from (5) and (2) that Π n ( π , q ) = Π 12 , n because, for every A 12 , n ( A 1 × A 2 ) n and T T :
Π n ( π , q ) ( A 12 , n × T ) = Π n ( ( π , q ) 1 ( A 12 , n × T ) ) = Π n ( A 12 , n × ( Ω 1 × Ω 2 ) × T ) = T R θ n ( A 12 , n ) d Q ( θ ) = Π 12 , n ( A 12 , n × T ) .
Analogously, we can show that Π n ( π , q ) = Π 12 , 1 . Furthermore, (5) also proves that:
Π n q ( T ) = T R θ ( Ω 1 × Ω 2 ) R θ n ( ( Ω 1 × Ω 2 ) n ) d Q ( θ ) = Q ( T ) ,
and:
Π n π ( A 12 , n ) = Θ R θ ( Ω 1 × Ω 2 ) R θ n ( A 12 , n ) d Q ( θ ) = β 12 , n * ( A 12 , n ) ,
because of (2 ). Analogously, it can be proved that Π n π = β 12 , 1 * . Moreover, the identity:
Π n π | q = θ = R θ n
follows from the definition of the conditional distribution and the facts that Π n q = Q and
Π n ( π A 12 , n , q T ) = T R θ n ( A 12 , n ) d Q ( θ ) .
A similar reasoning proves that Π n π | q = θ = R θ . From the definition of the posterior distribution we also have that:
Π n ( π , q ) ( A 12 , n × T ) = Π n ( π A 12 , n , q T ) = A 12 , n R n , x * ( T ) d β 12 , n * ( x )
which proves that Π n q | π = x = R n , x * as Π n π = β 12 , n * . In the same manner, we get Π n q | π = x = R 1 , x * . Now it is clear that, given A i A i , i = 1 , 2 :
P X 1 ( A 1 ) = P θ ( X 1 , X 2 ) ( A 1 × Ω 2 ) = R θ ( A 1 × Ω 2 ) = R θ π 1 ( A 1 )
and that:
A 1 P θ X 2 | X 1 = x 1 ( A 2 ) d P θ X 1 ( x 1 ) = P θ ( X 1 , X 2 ) ( A 1 × A 2 ) = R θ ( A 1 × A 2 ) = A 1 R θ π 2 | π 1 = x 1 ( A 2 ) d R θ π 1 ( x 1 ) ,
so P θ X 2 | X 1 = x 1 = R θ π 2 | π 1 = x 1 . Finally, by definition of the conditional expectation, given A 1 A 1 , we have that, when Ω 2 = R :
A 1 × R x 2 d R θ ( x 1 , x 2 ) = A 1 E R θ ( π 2 | π 1 = x 1 ) d R θ π 1 ( x 1 ) .
However, R θ π 1 = P θ X 1 and:
A 1 × R x 2 d R θ ( x 1 , x 2 ) = X 1 1 ( A 1 ) X 2 d P θ .
This proves that E R θ ( π 2 | π 1 = x 1 ) = E P θ ( X 2 | X 1 = x 1 ) . □
Proof of Lemma 1.
We must show that, for all A 12 , n ( A 1 × A 2 ) n and all A 1 A 1 :
A 12 , n × A 1 × Ω 2 × Θ R θ π 2 | π 1 = x 1 ( A 2 ) d Π n ( x , x , θ ) = A 12 , n × A 1 R n , x * R π 2 | π 1 = x 1 ( A 2 ) d Π n ( π , π 1 ) ( x , x 1 ) .
According to (6), we have that, for all A 12 , n ( A 1 × A 2 ) n and all A 1 A 1 :
A 12 , n × A 1 × Ω 2 × Θ R θ π 2 | π 1 = x 1 ( A 2 ) d Π n ( x , x , θ ) = Θ A 12 , n A 1 R θ π 2 | π 1 = x 1 ( A 2 ) d R θ π 1 ( x 1 ) d R θ n ( x ) d Q ( θ ) = Θ R θ n ( A 12 , n ) R θ ( A 1 × A 2 ) d Q ( θ ) .
Note that, by definition of conditional distribution:
R n , x * R ( A 1 × A 2 ) = A 1 R n , x * R π 2 | π 1 = x 1 ( A 2 ) d R n , x * R π 1 ( x 1 ) ,
and, by definition of posterior predictive distribution:
R n , x * R ( A 1 × A 2 ) = Θ R θ ( A 1 × A 2 ) d R n , x * ( θ ) .
Note that, for any A 12 , n ( A 1 × A 2 ) n , being R n , x * = Π n q | π = x :
A 12 , n Θ R θ ( A 1 × A 2 ) d R n , x * ( θ ) d Π n π ( x ) = A 12 , n Θ R θ ( A 1 × A 2 ) d Π n q | π = x ( θ ) d Π n π ( x ) = A 12 , n × Θ R θ ( A 1 × A 2 ) d Π n ( π , q ) ( x , θ ) = π 1 ( A 12 , n ) R θ ( A 1 × A 2 ) d Π n ( x , x , θ ) ,
proving that:
Θ R θ ( A 1 × A 2 ) d R n , x * ( θ ) = E Π n ( r A 1 × A 2 | π = x )
where r A 1 × A 2 ( θ ) : = R θ ( A 1 × A 2 ) , and hence:
R n , x * R ( A 1 × A 2 ) = E Π n ( r A 1 × A 2 | π = x ) .
Thus, by definition of conditional expectation:
A 12 , n R n , x * R ( A 1 × A 2 ) d Π n π ( x ) = A 12 , n × Ω 1 × Ω 2 × Θ R θ ( A 1 × A 2 ) d Π n ( x , x , θ ) .
However, the second term of this equation is:
A 12 , n × Ω 1 × Ω 2 × Θ R θ ( A 1 × A 2 ) d Π n ( x , x , θ ) = Θ A 12 , n R θ ( A 1 × A 2 ) d R θ n ( x ) d Q ( θ ) = Θ R θ n ( A 12 , n ) R θ ( A 1 × A 2 ) d Q ( θ )
which coincides with (8). This proves the lemma. □
Proof of Theorem 1.
Denoting by · 2 the usual norm on the Hilbert space L 2 ( ( Ω 1 × Ω 2 ) n + 1 × Θ , ( A 1 × A 2 ) n + 1 × T , Π n ) , as a consequence of Jensen’s inequality we have that, given A 2 A 2 :
H A 2 ( x , x 1 , θ ) M n * ( x , x 1 , A 2 ) 2 H A 2 ( x , x 1 , θ ) G ( x , x 1 ) 2
for any real measurable function G on $((\Omega_1\times\Omega_2)^n\times\Omega_1,\ (\mathcal A_1\times\mathcal A_2)^n\times\mathcal A_1)$. In particular, for any Markov kernel:
\[
M:((\Omega_1\times\Omega_2)^n\times\Omega_1,\ (\mathcal A_1\times\mathcal A_2)^n\times\mathcal A_1)\to(\Omega_2,\mathcal A_2),
\]
we have:
H A 2 ( x , x 1 , θ ) M n * ( x , x 1 , A 2 ) 2 H A 2 ( x , x 1 , θ ) M ( x , x 1 , A 2 ) 2 ,
i.e.,
( Ω 1 × Ω 2 ) n + 1 × Θ | M n * ( x , x 1 , A 2 ) R θ π 2 | π 1 = x 1 ( A 2 ) | 2 d Π n ( x , x , θ ) ( Ω 1 × Ω 2 ) n + 1 × Θ | M ( x , x 1 , A 2 ) R θ π 2 | π 1 = x 1 ( A 2 ) | 2 d Π n ( x , x , θ )
When the σ -field A 2 is separable, there exists a countable field A 20 such that:
C ( x , x 1 , θ ) : = sup A 2 A 2 | M n * ( x , x 1 , A 2 ) R θ π 2 | π 1 = x 1 ( A 2 ) | 2 = sup A 2 A 20 | M n * ( x , x 1 , A 2 ) R θ π 2 | π 1 = x 1 ( A 2 ) | 2 .
Hence, C ( x , x 1 , θ ) is ( A 1 × A 2 ) n × A 1 × T -measurable. Moreover, given k N , there exists A 2 k ( x , x 1 , θ ) A 20 such that:
C ( x , x 1 , θ ) 1 k | M n * ( x , x 1 , A 2 k ( x , x 1 , θ ) ) R θ π 2 | π 1 = x 1 ( A 2 k ( x , x 1 , θ ) ) | 2 .
Now we appeal to the Ryll-Nardzewski and Kuratowski measurable selection theorem as it appears in [10], p. 36. With the notation of that book, we take ( ( Ω 1 × Ω 2 ) n × Ω 1 × Θ , ( A 1 × A 2 ) n × A 1 × T ) to be the measurable space ( Ω , B ) , and A 20 (the countable field generating A 2 ) to be the complete separable metric space X. Given k N , let us consider the map S k : ( Ω 1 × Ω 2 ) n × Ω 1 × Θ P ( X ) defined by:
S k ( x , x 1 , θ ) : = A 2 A 20 : C ( x , x 1 , θ ) 1 k | M n * ( x , x 1 , A 2 ) R θ π 2 | π 1 = x 1 ( A 2 ) | 2
as the map Ψ of that book. We have that S k ( x , x 1 , θ ) A 20 and S k ( x , x 1 , θ ) is closed for the discrete topology on A 20 . Moreover, given an open set U A 20 ,
{ ( x , x 1 , θ ) : S k ( x , x 1 , θ ) U } ( A 1 × A 2 ) n × A 1 × T ,
because, given A 2 A 20 :
{ ( x , x 1 , θ ) : S k ( x , x 1 , θ ) A 2 } = ( x , x 1 , θ ) : C ( x , x 1 , θ ) | M n * ( x , x 1 , A 2 ) R θ π 2 | π 1 = x 1 ( A 2 ) | 2 1 k .
Thus, according to the aforementioned measurable selection theorem, there exists a measurable map:
s k : ( ( Ω 1 × Ω 2 ) n × Ω 1 × Θ , ( A 1 × A 2 ) n × A 1 × T ) ( A 20 , P ( A 20 ) )
such that:
s k ( x , x 1 , θ ) S k ( x , x 1 , θ )
for every ( x , x 1 , θ ) , or, which is the same:
C ( x , x 1 , θ ) 1 k | M n * ( x , x 1 , s k ( x , x 1 , θ ) ) R θ π 2 | π 1 = x 1 ( s k ( x , x 1 , θ ) ) | 2 .
Hence:
( Ω 1 × Ω 2 ) n + 1 × Θ C ( x , x 1 , θ ) d Π n ( x , x , θ ) ( Ω 1 × Ω 2 ) n + 1 × Θ | M n * ( x , x 1 , s k ( x , x 1 , θ ) ) R θ π 2 | π 1 = x 1 ( s k ( x , x 1 , θ ) ) | 2 d Π n ( x , x , θ ) + 1 k ( Ω 1 × Ω 2 ) n + 1 × Θ sup A 2 A 2 | M ( x , x 1 , A 2 ) R θ π 2 | π 1 = x 1 ( A 2 ) | 2 d Π n ( x , x , θ ) + 1 k .
With k being arbitrary, this shows that M n * is the Bayes estimator of the conditional distribution P X 2 | X 1 ( = R π 2 | π 1 ) for the squared total variation loss function. □
Proof of Theorem 3.
According to (11), taking A 2 = ] , t ] , we have that, given t R :
F θ ( x , x 1 , t ) F n * ( x , x 1 , t ) 2 F θ ( x , x 1 , t ) F ( x , x 1 , t ) 2
for every estimator F of the conditional distribution function F θ , where:
F n * ( x , x 1 , t ) : = M n * ( x , x 1 , ] , t ] ) = R n , x * R π 2 | π 1 = x 1 ( ] , t ] ) ,
i.e., for any t and every estimator F of F θ :
( Ω 1 × R ) n + 1 × Θ | F n * ( x , x 1 , t ) F θ ( x 1 , t ) | 2 d Π n ( x , x , θ ) ( Ω 1 × R ) n + 1 × Θ | F ( x , x 1 , t ) F θ ( x 1 , t ) | 2 d Π n ( x , x , θ )
Let C ( x , x 1 , θ ) : = sup t R | F n * ( x , x 1 , t ) F θ ( x 1 , t ) | 2 . Since:
sup t R | F n * ( x , x 1 , t ) F θ ( x 1 , t ) | = sup r Q | F n * ( x , x 1 , r ) F θ ( x 1 , r ) | ,
we have that, given ( x , x 1 , θ ) and k N , there exists r k Q such that:
C ( x , x 1 , θ ) | F n * ( x , x 1 , r k ) F θ ( x 1 , r k ) | 2 + 1 k
and hence:
( Ω 1 × R ) n + 1 × Θ C ( x , x 1 , θ ) d Π n ( x , x , θ ) ( Ω 1 × R ) n + 1 × Θ | F n * ( x , x 1 , r k ) F θ ( x 1 , r k ) | 2 d Π n ( x , x , θ ) + 1 k ( Ω 1 × R ) n + 1 × Θ sup t R | F ( x , x 1 , t ) F θ ( x 1 , t ) | 2 d Π n ( x , x , θ ) + 1 k .
Note that r k = r k ( x , x 1 , θ ) , and a judicious use of the aforementioned measurable selection theorem could solve the measure-theoretical technicalities in these inequalities, as in the proof of Theorem 1. With k being arbitrary, the result is proved. □
Proof of Theorem 4.
To prove the result, Jensen’s inequality could be helpful if we were to show that:
m n * ( x , x 1 ) = E Π n ( F | π = x , π 1 = x 1 ) ,
where F ( x , x , θ ) : = r θ ( x 1 ) ( = E P θ ( X 2 | X 1 = x 1 ) = E R θ ( π 2 | π 1 = x 1 ) ) . By definition of conditional expectation, this is equivalent to proving that, for all A 12 , n ( A 1 × R ) n and A 1 A 1 :
A 12 , n × A 1 × R × Θ F ( x , x , θ ) d Π n ( x , x , θ ) = A 12 , n × A 1 m 1 * ( x , x 1 ) d Π n ( π , π 1 ) ( x , x 1 ) ,
or, which is the same:
A 12 , n × A 1 × R × Θ E R θ ( π 2 | π 1 = x 1 ) d Π n ( x , x , θ ) = A 12 , n × A 1 E R n , x * R ( π 2 | π 1 = x 1 ) d Π n ( π , π 1 ) ( x , x 1 ) .
Let us write ( α ) for the first term in this equality and ( β ) for the second.
Note that:
( α ) = A 12 , n × A 1 × R × Θ E R θ ( π 2 | π 1 = x 1 ) d Π n ( x , x , θ ) = Θ A 12 , n A 1 E R θ ( π 2 | π 1 = x 1 ) d R θ π 1 ( x 1 ) d R θ n ( x ) d Q ( θ ) = Θ R θ n ( A 12 , n ) A 1 E R θ ( π 2 | π 1 = x 1 ) d R θ π 1 ( x 1 ) d Q ( θ ) .
Moreover:
( β ) = A 12 , n × A 1 E R n , x * R ( π 2 | π 1 = x 1 ) d Π n ( π , π 1 ) ( x , x 1 ) = A 12 , n × A 1 × R × Θ E R n , x * R ( π 2 | π 1 = x 1 ) d Π n ( x , x , θ ) = Θ A 12 , n A 1 E R n , x * R ( π 2 | π 1 = x 1 ) d R θ π 1 ( x 1 ) d R θ n ( x ) d Q ( θ ) = Θ A 12 , n h A 1 ( x , θ ) d R θ n ( x ) d Q ( θ ) = Θ × A 12 , n h A 1 ( x , θ ) d ( Q R n ) ( θ , x ) ,
where:
h A 1 ( x , θ ) : = A 1 E R n , x * R ( π 2 | π 1 = x 1 ) d R θ π 1 ( x 1 ) .
Since Q R n = R n * β 12 , n * , we have:
( β ) = A 12 , n × Θ h A 1 ( x , θ ) d ( R n * β 12 , n * ) ( x , θ ) = A 12 , n Θ h A 1 ( x , θ ) d R n , x * ( θ ) d β 12 , n * ( x ) = A 12 , n Θ Ω 1 I A 1 ( x ) E R x * R ( π 2 | π 1 = x 1 ) d R θ ( x ) d R n , x * ( θ ) d β 12 , n * ( x ) = A 12 , n Ω 1 × R I A 1 ( x ) E R n , x * R ( π 2 | π 1 = x 1 ) d R n , x * R ( x ) d β 12 , n * ( x ) = A 12 , n A 1 E R n , x * R ( π 2 | π 1 = x 1 ) d R n , x * R π 1 ( x 1 ) d β 12 , n * ( x ) = A 12 , n A 1 × R π 2 ( x ) d R n , x * R ( x ) d β 12 , n , Q * ( x ) = A 12 , n Θ Ω 1 × R I A 1 ( x 1 ) x 2 d R θ ( x ) d R n , x * ( θ ) d β 12 , n * ( x ) = A 12 , n × Θ g A 1 ( θ ) d ( R n * β 12 , n * ) ( x , θ )
where:
g A 1 ( θ ) = Ω 1 × R I A 1 ( x 1 ) x 2 d R θ ( x ) .
Using again that Q R n = R n * β 12 , n * , we have:
( β ) = Θ × A 12 , n g A 1 ( θ ) d ( Q R n ) ( θ , x ) = Θ A 12 , n g A 1 ( θ ) d R θ n ( x ) d Q ( θ ) = Θ R θ n ( A 12 , n ) A 1 × R x 2 d R θ ( x ) d Q ( θ ) = Θ R θ n ( A 12 , n ) A 1 E R θ ( π 2 | π 1 = x 1 ) d R θ π 1 ( x 1 ) d Q ( θ ) ,
which proves that ( α ) = ( β ) , and hence, that:
m n * ( x , x 1 ) = E Π n ( F | π = x , π 1 = x 1 ) .
Thus,
E Π n [ ( m n * ( x , x 1 ) r θ ( x 1 ) ) 2 ] E Π n [ ( m n ( x , x 1 ) r θ ( x 1 ) ) 2 ]
for any other estimator m n of the regression curve r θ , i.e., m n * is the Bayes estimator of the regression curve r θ ( x 1 ) : = E θ ( X 2 | X 1 = x 1 ) for the squared error loss function. □

Funding

This research was funded by the Junta de Extremadura (Spain), grant number GR21044.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A

Let us briefly recall some basic concepts about Markov kernels, mainly to fix the notation. In what follows, ( Ω , A ) , ( Ω 1 , A 1 ) , and so on will denote measurable spaces.
Definition A1.
(1)
(Markov kernel.) A Markov kernel M 1 : ( Ω , A ) ( Ω 1 , A 1 ) is a map M 1 : Ω × A 1 [ 0 , 1 ] such that: (i) ω Ω , M 1 ( ω , · ) is a probability measure on A 1 , and (ii) A 1 A 1 , M 1 ( · , A 1 ) is A -measurable.
(2)
(Image of a Markov kernel.) The image (or probability distribution) of a Markov kernel M 1 : ( Ω , A , P ) ( Ω 1 , A 1 ) on a probability space is the probability measure P M 1 on A 1 defined by P M 1 ( A 1 ) : = Ω M 1 ( ω , A 1 ) d P ( ω ) .
(3)
(Composition of Markov kernels.) Given two Markov kernels M 1 : ( Ω 1 , A 1 ) ( Ω 2 , A 2 ) and M 2 : ( Ω 2 , A 2 ) ( Ω 3 , A 3 ) , their composition is defined as the Markov kernel M 2 M 1 : ( Ω 1 , A 1 ) ( Ω 3 , A 3 ) given by:
M 2 M 1 ( ω 1 , A 3 ) = Ω 2 M 2 ( ω 2 , A 3 ) M 1 ( ω 1 , d ω 2 ) .
Remark A1.
(1)
(Markov kernels as extensions of the concept of random variable.) The concept of Markov kernel extends the concept of random variable (or measurable map). A random variable T 1 : ( Ω , A , P ) ( Ω 1 , A 1 ) will be identified with the Markov kernel M T 1 : ( Ω , A ; P ) ( Ω 1 , A 1 ) defined by M T 1 ( ω , A 1 ) = δ T 1 ( ω ) ( A 1 ) = I A 1 ( T 1 ( ω ) ) , where δ T 1 ( ω ) denotes the Dirac measure—the degenerate distribution—at the point T 1 ( ω ) , and I A 1 is the indicator function of the event A 1 . In particular, the probability distribution P M T 1 of M T 1 coincides with the probability distribution P T 1 of T 1 defined as P T 1 ( A 1 ) : = P ( T 1 A 1 ) .
(2)
Given a Markov kernel M 1 : ( Ω 1 , A 1 ) ( Ω 2 , A 2 ) and a random variable X 2 : ( Ω 2 , A 2 ) ( Ω 3 , A 3 ) , we have that M X 2 M 1 ( ω 1 , A 3 ) = M 1 ( ω 1 , X 2 1 ( A 3 ) ) = M 1 ( ω 1 , · ) X 2 ( A 3 ) . We write X 2 M 1 : = M X 2 M 1 .
(3)
Given a Markov kernel M 1 : ( Ω 1 , A 1 , P 1 ) ( Ω 2 , A 2 ) we write P 1 M 1 for the only probability measure on the product σ-field A 1 × A 2 such that:
( P 1 M 1 ) ( A 1 × A 2 ) = A 1 M 1 ( ω 1 , A 2 ) d P 1 ( ω 1 ) , A i A i , i = 1 , 2 .
(4)
Given two r.v. X i : ( Ω , A , P ) ( Ω i , A i ) , i = 1 , 2 , we write P X 2 | X 1 for the conditional distribution of X 2 given X 1 , i.e., for the Markov kernel P X 2 | X 1 : ( Ω 1 , A 1 ) ( Ω 2 , A 2 ) such that:
P ( X 1 , X 2 ) ( A 1 × A 2 ) = A 1 P X 2 | X 1 = x 1 ( A 2 ) d P X 1 ( x 1 ) , A i A i , i = 1 , 2 .
Hence, P ( X 1 , X 2 ) = P X 1 P X 2 | X 1 .
Let ( Ω , A , { P θ : θ ( Θ , T , Q ) } ) be a Bayesian statistical experiment, where Q denotes the prior distribution on the parameter space ( Θ , T ) . We assume that P ( θ , A ) : = P θ ( A ) is a Markov kernel P : ( Θ , T ) ( Ω , A ) . When needed, we shall assume that P θ has a density (Radon-Nikodym derivative) p θ with respect to a σ -finite measure μ on A and that the likelihood function L ( ω , θ ) : = p θ ( ω ) is A × T -measurable (this is sufficient to prove that P is a Markov kernel).
Let Π : = Q P , i.e.,
Π ( A × T ) = T P θ ( A ) d Q ( θ ) , A A , T T .
The prior predictive distribution is β Q * : = Π I (the distribution of I with respect to Π ), where I ( ω , θ ) : = ω . Thus:
β Q * ( A ) = Θ P θ ( A ) d Q ( θ ) .
The posterior distribution is a Markov kernel P * : ( Ω , A ) ( Θ , T ) such that:
Π ( A × T ) = T P θ ( A ) d Q ( θ ) = A P ω * ( T ) d β Q * ( ω ) , A A , T T ,
i.e., such that Π = Q P = β Q * P * . In this way, the Bayesian statistical experiment can be identified with the probability space ( Ω × Θ , A × T , Π ) , as proposed, for instance, in [8].
It is well known that, for ω Ω , the posterior Q-density is proportional to the likelihood:
p ω * ( θ ) : = d P ω * d Q ( θ ) = C ( ω ) p θ ( ω )
where C ( ω ) = [ Θ p θ ( ω ) d Q ( θ ) ] 1 .
The posterior predictive distribution on A given ω is:
P ω * P ( A ) = Θ P θ ( A ) d P ω * ( θ ) , A A .
This is a Markov kernel:
P P * ( ω , A ) : = P ω * P ( A ) .
It is readily shown that the posterior predictive density is:
d P ω * P d μ ( ω ) = Θ p θ ( ω ) p ω * ( θ ) d Q ( θ ) .
We know from [1] that:
Ω × Θ sup A A | P ω * P ( A ) P θ ( A ) | 2 d Π ( ω , θ ) Ω × Θ sup A A | M ( ω , A ) P θ ( A ) | 2 d Π ( ω , θ )
for every Markov kernel M : ( Ω , A ) ( Ω , A ) , provided that A is separable (recall that a σ -field is said to be separable, or countably generated, if it contains a countable subfamily which generates it). We also have that, for a real statistic X with finite mean, the posterior predictive mean:
E ( P ω * ) P ( X ) = Θ Ω X ( ω ) d P θ ( ω ) d P ω * ( θ )
is the Bayes estimator of $f(\theta):=E_\theta(X)$, since $E_{P^*_\omega P}(X)=E_{P^*_\omega}(E_\theta(X))$.
Remark A2.
A notation more commonly used in the literature is the following: $p(\theta)$ (or $\pi(\theta)$) denotes the prior distribution, and $p(y):=\int p(y,\theta)\,d\theta$ stands for the prior predictive distribution, while, for a future observation $\tilde y$, the posterior predictive distribution given the data y is denoted by $p(\tilde y\,|\,y)=\int p(\tilde y,\theta\,|\,y)\,d\theta=\int p(\tilde y\,|\,\theta)\,p(\theta\,|\,y)\,d\theta$, where $p(\theta\,|\,y)$ refers to the posterior distribution. This is the form used in [3,11].

References

  1. Nogales, A.G. On Bayesian estimation of densities and sampling distributions: The posterior predictive distribution as the Bayes estimator. Stat. Neerl. 2022, 76, 236–250.
  2. Lehmann, E.L.; Casella, G. Theory of Point Estimation, 2nd ed.; Springer: New York, NY, USA, 1998.
  3. Gelman, A.; Carlin, J.B.; Stern, H.S.; Dunson, D.B.; Vehtari, A.; Rubin, D.B. Bayesian Data Analysis, 3rd ed.; CRC Press (Taylor & Francis Group): Boca Raton, FL, USA, 2014.
  4. Geisser, S. Predictive Inference: An Introduction; Chapman & Hall: New York, NY, USA, 1993.
  5. Rubin, D.B. Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann. Stat. 1984, 12, 1151–1172.
  6. Nadaraya, E.A. Nonparametric Estimation of Probability Densities and Regression Curves; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1989.
  7. Ghosal, S.; van der Vaart, A. Fundamentals of Nonparametric Bayesian Inference; Cambridge University Press: Cambridge, UK, 2017.
  8. Florens, J.P.; Mouchart, M.; Rolin, J.M. Elements of Bayesian Statistics; Marcel Dekker: New York, NY, USA, 1990.
  9. Barra, J.R. Notions Fondamentales de Statistique Mathématique; Dunod: Paris, France, 1971.
  10. Bogachev, V.I. Measure Theory, Vol. II; Springer: Berlin/Heidelberg, Germany, 2007.
  11. Ghosh, J.K.; Delampady, M.; Samanta, T. An Introduction to Bayesian Analysis: Theory and Methods; Springer: Berlin/Heidelberg, Germany, 2006.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
