1. Introduction
In Statistics, the expression "the probability of an event 
A" (written 
) is, in general, ambiguous, as it depends on the unknown parameter 
. Before conducting the experiment, a Bayesian statistician, provided with the prior distribution, possesses a natural candidate—the prior predictive probability of 
A—since it is the prior mean of the probabilities of 
A. However, in accordance with Bayesian philosophy, after the experiment has been performed and the data 
 observed, a reasonable estimate is the posterior predictive probability of 
A given 
 because it is the posterior mean of the probabilities of 
A given 
. It can be shown that not only is this the Bayes estimator of the probability 
 of 
A for the squared error loss function but also that the posterior predictive distribution is the Bayes estimator of the sampling probability distribution 
 for the squared total variation loss function and that the posterior predictive density is the Bayes estimator of its density for the L¹-squared loss function. Note that these loss functions should be considered natural in the sense that they are derived directly from the squared error loss function commonly used in the estimation of a real function of the parameter. Ref. [
1] contains precise statements and proofs of these results, which are nothing but a functional generalization of Theorem 1.1 (more specifically of its Corollary 1.2.(a)) of [
2], p. 228, which yields the Bayes estimator of a real function of the parameter for the squared error loss function.
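In generic notation (the symbols below are assumed for illustration and need not coincide with those of [1,2]), the underlying fact is that, under the squared error loss, the Bayes estimator of a real function g(θ) with finite posterior second moment is its posterior mean:
\[
% generic notation, assumed for illustration
\operatorname*{arg\,min}_{d}\ \mathrm{E}\big[(g(\theta)-d)^{2}\,\big|\,\text{data}\big]\;=\;\mathrm{E}\big[g(\theta)\,\big|\,\text{data}\big].
\]
Taking g(θ) to be the probability of A under the parameter value θ, this posterior mean is precisely the posterior predictive probability of A.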
This communication addresses, from a Bayesian perspective, the estimation of a regression curve and some related problems, such as the estimation of a conditional density, a conditional distribution function, or even the conditional distribution itself. It should, therefore, be considered the conditional counterpart of [
1], and the results to be presented below as the functional extension of [
2], Theorem 1.1, for the conditional case. Thus, it is unsurprising that the posterior predictive distribution is the cornerstone for the estimation problems to be discussed below. Some examples illustrating the results will be presented in 
Section 7. See [
1] and the references therein for other examples of the determination of the posterior predictive distribution. In practice, however, the explicit evaluation of the posterior predictive distribution could well be cumbersome, and its simulation may become preferable. Ref. [
3] is a good reference for such simulation methods, and hence, for the computation of the Bayes estimators of the conditional density and the regression curve.
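As a purely illustrative sketch (a hypothetical conjugate normal model is assumed here for simplicity; this is not an example taken from [3]), the posterior predictive density can be approximated by averaging the sampling density over draws from the posterior:

# Illustrative sketch only: hypothetical N(theta, sigma^2) model with known sigma
# and a conjugate normal prior on theta; the posterior predictive density is
# approximated by a Monte Carlo average of sampling densities over posterior draws.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sigma = 1.0
data = rng.normal(loc=2.0, scale=sigma, size=50)          # hypothetical sample

mu0, tau0 = 0.0, 10.0                                      # prior N(mu0, tau0^2)
tau_post = 1.0 / np.sqrt(1.0 / tau0**2 + len(data) / sigma**2)
mu_post = tau_post**2 * (mu0 / tau0**2 + data.sum() / sigma**2)

theta = rng.normal(mu_post, tau_post, size=10_000)         # posterior draws
y = np.linspace(-1.0, 5.0, 200)
post_pred = stats.norm.pdf(y[:, None], loc=theta, scale=sigma).mean(axis=1)

# In this conjugate case the posterior predictive is known exactly, so the
# Monte Carlo approximation can be checked against it.
exact = stats.norm.pdf(y, loc=mu_post, scale=np.sqrt(sigma**2 + tau_post**2))
print(np.max(np.abs(post_pred - exact)))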
The posterior predictive distribution has been presented as the basis of Predictive Inference, which seeks to make inferences about a new, unknown observation from the previously observed random sample, in contrast with the greater emphasis that statistical inference, since its mathematical foundations in the early twentieth century, has placed on parameter estimation and hypothesis testing (see [
4] or [
3]). With that idea in mind, it has also been used in other areas, such as model selection, testing for discordancy, goodness of fit, perturbation analysis, or classification (see additional fields of application in [
4,
5]), but never as a possible solution for the Bayesian problems of estimating an unconditional or conditional density. The reader is referred to the references within [
1] for other uses of the posterior predictive distribution in Bayesian statistics.
To summarize the contribution of this work, I want to emphasize that the problems of estimating a density (conditional or not) or a regression curve are of central importance in Nonparametric Inference and Functional Data Analysis (for example, see [
6] or [
7], and the references they contain). Although no optimal result is to be expected for these problems in a frequentist setting, this article, together with [
1], provides optimal solutions to them in a Bayesian framework. The reader should note that these are not mere existence and uniqueness theorems; on the contrary, the results obtained give explicit formulas for the solutions in terms of the posterior predictive distribution. Note also that there is ample literature on how to compute this distribution, exactly or approximately.
Section 2 sets out the statistical framework for tackling these problems, i.e., the appropriate Bayesian experiment (conceived also as a probability space along the lines suggested by [
8], for example).
 Section 3 deals with the problem of Bayesian estimation of a conditional distribution when the squared total variation loss function is used and Theorem 1 gives the Bayes estimator in terms of the posterior predictive distribution.
 Section 4 takes advantage of Theorem 1 to solve the problem of the Bayesian estimation of a conditional density using the L¹-squared loss function, obtaining the Bayes estimator of the conditional density (see Theorem 2).
 Section 5 and 
Section 6 deal with the problems of Bayesian estimation of a conditional distribution function and a regression curve in the real case. Theorems 3 and 4 yield the solutions.
 Section 7 provides some examples to illustrate the application of all these theorems.
 For ease of reading, the proofs are postponed until 
Section 8. This is followed by an appendix (
Appendix A) explaining the notation and concepts used in the text.
From this point onwards, we shall place ourselves in a general framework for Bayesian inference, as described in [
9].
  2. The Framework
Let 
 be a Bayesian statistical experiment, and 
, 
, two statistics. Consider the Bayesian experiment image of 
:
In what follows, we shall assume that ,  is a Markov kernel and write .
The Bayesian experiment corresponding to a sample of size 
n of the joint distribution of 
 is:
We write 
 for 
 and
      
      for the joint distribution of the parameter and the sample:
The corresponding prior predictive distribution 
 is:
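In generic notation (Q the prior distribution and P_θ the sampling distribution of the n-fold sample; these symbols are assumed here for illustration), this is the Q-mixture of the sampling distributions:
\[
% generic notation, assumed for illustration
\beta^{*}(B)\;=\;\int_{\Theta}P_{\theta}(B)\,dQ(\theta)
\qquad\text{for every event }B\text{ in the sample space.}
\]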
The posterior distribution is a Markov kernel: 
      such that, for all 
 and 
,
      
Let us write .
The posterior predictive distribution on 
 is the Markov kernel: 
      defined, for 
, by:
It follows that, with obvious notation:
      for any non-negative or integrable real random variable (r.r.v. for short) 
f.
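With the same generic (assumed) notation, writing Q_x for the posterior distribution given the observed sample x, the posterior predictive distribution and the accompanying integration formula read:
\[
% generic notation, assumed for illustration
P^{*}_{x}(B)\;=\;\int_{\Theta}P_{\theta}(B)\,dQ_{x}(\theta),
\qquad
\int f\,dP^{*}_{x}\;=\;\int_{\Theta}\Big(\int f\,dP_{\theta}\Big)\,dQ_{x}(\theta).
\]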
We can also consider the posterior predictive distribution on 
 defined as the Markov kernel: 
      such that:
According to Theorem 1 of [
1], this is the Bayes estimator of the distribution 
 for the squared total variation loss function:
      for every Markov kernel 
.
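In the same generic (assumed) notation, with P_θ standing for the distribution being estimated and P*_x for its posterior predictive estimate given the sample x, the Bayes-optimality property just described reads:
\[
% generic notation, assumed for illustration; E denotes expectation under the
% joint distribution of the parameter and the sample
\mathrm{E}\Big[\big\|P_{\theta}-P^{*}_{x}\big\|_{TV}^{2}\Big]\;\le\;\mathrm{E}\Big[\big\|P_{\theta}-M_{x}\big\|_{TV}^{2}\Big]
\qquad\text{for every Markov kernel }M.
\]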
It can be readily checked that:
      where 
 for 
. Then, Theorem 2 of [
1] shows that:
      for every Markov kernel 
.
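Analogously, for densities p_θ and p*_x of these distributions with respect to a common σ-finite measure μ (again, assumed notation), the property described in the text takes the form:
\[
% generic notation, assumed for illustration
\mathrm{E}\bigg[\Big(\int\big|p_{\theta}-p^{*}_{x}\big|\,d\mu\Big)^{2}\bigg]\;\le\;
\mathrm{E}\bigg[\Big(\int\big|p_{\theta}-m_{x}\big|\,d\mu\Big)^{2}\bigg]
\qquad\text{for every competing density estimator }m,
\]
which is the L¹-squared optimality referred to in the Introduction.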
We introduce some notation for 
:
Let us consider the probability space:
      where:
      when 
, 
 and 
.
Thus, for a r.r.v. 
f on: 
,
      
      provided that the integral exists. Moreover, for a r.r.v. 
h on 
:
The following proposition is straightforward.
Proposition 1. Given ,  and , we have that:
Moreover:
where the last equality refers to the case where  is a real statistic with a finite mean.

In particular, the probability space (4) contains all the basic ingredients of the Bayesian experiment (1), i.e., the prior distribution, the sampling probabilities, the posterior distributions, and the prior predictive distribution. In addition, it becomes the natural framework in which to address the estimation problems of this communication, as we shall see in what follows.
  4. Bayes Estimator of the Conditional Density
When the joint distribution 
 has a density 
 with respect to the product of two σ-finite measures 
 and 
 on 
 and 
, resp., the conditional density is:
      for almost every 
, where 
 stands for the marginal density of 
.
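Concretely, writing f for the joint density and f_X for the marginal density of the first component (generic symbols, assumed here for illustration), the conditional density is the familiar ratio
\[
% generic notation, assumed for illustration
f(y\mid x)\;=\;\frac{f(x,y)}{f_{X}(x)},
\qquad
f_{X}(x)\;=\;\int f(x,y)\,d\mu_{2}(y),
\]
defined for those x with f_X(x) > 0, i.e., for almost every x with respect to the marginal distribution.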
An estimator of the conditional density 
 from a sample of size 
n of the joint distribution of 
 is a map:
      such that, once 
 is observed, 
 is considered an estimate of the conditional density 
 of 
 given 
.
It is well known (see, for instance, [
7], p. 126) that, given two probability measures 
 and 
 on a measurable space 
 having densities 
 and 
 with respect to a σ-finite measure 
:
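The identity in question is the standard relation between the total variation distance and the L¹ distance of the densities, stated here with the total variation distance taken as the supremum over measurable sets:
\[
% standard identity (Scheffe); a factor 1/2 may be absorbed elsewhere depending
% on the convention adopted for the total variation norm
\sup_{A}\big|P_{1}(A)-P_{2}(A)\big|\;=\;\frac{1}{2}\int|p_{1}-p_{2}|\,d\mu .
\]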
Thus, the Bayesian estimation of the conditional distribution 
 for the squared total variation loss function corresponds to the Bayesian estimation of its density 
 for the L¹-squared loss function. Hence, according to Theorem 1, the Bayes estimator of the conditional density 
 for the L¹-squared loss function is the 
-density 
 of the conditional distribution:
Note that:
      where 
 denotes the 
Q-density of the posterior distribution 
. Thus, 
 is of the form 
, where:
      is the 
-density of 
. Hence, the 
-density of the posterior predictive distribution 
 is:
      and its first marginal is:
Thus, we have proved the following result.
Theorem 2. Assume that  is separable. The Bayes estimator of the conditional density  for the L¹-squared loss function is the -density:
of the conditional distribution  of  given  with respect to the posterior predictive distribution :
for any estimator m of the conditional density.
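In generic (assumed) notation, with g_n denoting a density of the posterior predictive distribution of the pair (X, Y) given the observed n-sample and g_{n,X} its first marginal, the estimator provided by Theorem 2 is simply the conditional density computed under the posterior predictive distribution:
\[
% generic notation, assumed for illustration; the dependence on the observed
% sample is left implicit in g_n
\widehat{f}(y\mid x)\;=\;\frac{g_{n}(x,y)}{g_{n,X}(x)},
\qquad
g_{n,X}(x)\;=\;\int g_{n}(x,y)\,d\mu_{2}(y).
\]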