1. Introduction
Given a random variable $X$—the independent variable, regressor, or predictor—and a real random variable $Y$—the dependent variable or response—the so-called regression curve of $Y$ given $X$ is the map $x\mapsto E(Y\mid X=x)$, the function of $X$ that best approximates $Y$ in the least squares sense; it is therefore an essential tool in the study of the relationship between these two variables. Many statistical problems in practice, especially those related to prediction, require the estimation of the regression function from data, i.e., from a sample $(X_i,Y_i)$, $1\le i\le n$, of the joint distribution of $X$ and $Y$. This estimation problem has been addressed in a good number of papers in both parametric and nonparametric contexts, from both the frequentist and the Bayesian points of view. In fact, regression techniques are among the most widely used methods in applied statistics.
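To recall the least squares property in symbols—a classical fact, stated here with $g$ denoting an arbitrary measurable function such that $g(X)$ is square-integrable (our notation, used only for orientation)—the regression curve satisfies
$$E\bigl[(Y-E(Y\mid X))^{2}\bigr]\ \le\ E\bigl[(Y-g(X))^{2}\bigr]\qquad\text{for every such }g,$$
i.e., among all functions of $X$, the conditional mean minimizes the mean squared deviation from $Y$.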
In a nonparametric frequentist framework, the problem of estimation of the regression curve was first considered in [1,2]. We refer to [3] for this problem in a Bayesian context; it includes some historical notes about Bayesian nonparametric regression and some results about the consistency of the estimates for some specific priors.
Talking about the probability of an event $A$ (written $P_\theta(A)$) in a statistical context is ambiguous, as it depends on the unknown parameter $\theta$. In a Bayesian context, once the data has been observed, a natural estimate of $P_\theta(A)$ is the posterior predictive probability of $A$ given the data, since it is the posterior mean of the probabilities of $A$ given $\theta$, which, as is well known, is the Bayes estimator of $P_\theta(A)$ for the squared error loss function. This simple fact already justifies the use of the posterior predictive distribution as an estimator of the sampling distribution, but, in reality, much more is true because, as shown in [4], the posterior predictive distribution is the Bayes estimator of the sampling probability distribution $P_\theta$ for the squared total variation loss function. It is similar to what happens with the strong law of large numbers and the Glivenko–Cantelli theorem: the first guarantees almost sure pointwise convergence of the empirical distribution function to the unknown population distribution function, but the second yields almost sure uniform convergence, becoming the fundamental theorem of Mathematical Statistics. The problem of estimation of the density in a Bayesian nonparametric framework is considered in a number of references, such as [5], [6], [7], or [8]. In [4], the problem of estimation of the density from a Bayesian point of view is also addressed, and, under mild conditions, it is shown that the posterior predictive density is the Bayes estimator for the $L^1$-squared loss function; the convergence to 0 of the Bayes risk (and the strong consistency) of this estimator is shown in [9].
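As a simple illustration (our own, not taken from the paper), consider a Bernoulli model with a conjugate Beta prior: for data $x_1,\dots,x_n\in\{0,1\}$, a prior $\theta\sim\mathrm{Beta}(a,b)$, and the event $A$ that the next observation is a success, so that $P_\theta(A)=\theta$, the posterior predictive probability of $A$ is exactly the posterior mean of the success probabilities:
$$P(A\mid x_1,\dots,x_n)=\int_0^1\theta\,dQ(\theta\mid x_1,\dots,x_n)=\frac{a+\sum_{i=1}^{n}x_i}{a+b+n}.$$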
As regards the estimation of the regression curve, or even the conditional density, references [10,11] contain sufficient arguments on the usefulness of these problems in practice from a frequentist point of view, problems that go back to [12], although they have not produced much literature since then either. The paper [13] deals, among other topics, with the problem of the Bayesian estimation of a regression curve and proves that the regression curve with respect to the posterior predictive distribution is the Bayes estimator (for the squared error loss function). Here, we wonder about the convergence to 0 of its Bayes risk (and its strong consistency). This is the main goal of the paper, and Theorem 1 below answers the question in the affirmative.
So, the posterior predictive distribution is the key to the estimation problems raised above. It has been presented in the literature as the basis of Predictive Inference, which seeks to make inferences about a new unknown observation from the previous random sample instead of estimating an unknown parameter. It should be noted that, in practice, the explicit evaluation of the posterior predictive distribution can be cumbersome, and its simulation may become preferable. The interested reader can find in the papers mentioned above, and the references therein, more information on the problems of estimating the density or the regression curve, from both the frequentist and Bayesian perspectives, or about the usefulness of the posterior predictive distribution in Bayesian Inference and its calculation. We place special emphasis on the monographs [3,14,15].
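The following short sketch illustrates, outside the scope of the paper, how the posterior predictive distribution can be approximated by simulation when no closed form is at hand: one draws parameters from the posterior and then new observations from the corresponding sampling distributions. The model (a normal location model with conjugate normal prior and known variance) and all numerical values are chosen only for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative model: x_i | theta ~ N(theta, sigma^2), theta ~ N(mu0, tau0^2).
sigma, mu0, tau0 = 1.0, 0.0, 2.0
data = rng.normal(1.5, sigma, size=50)          # simulated observations

# Conjugate posterior of theta given the data (known-variance normal model).
prec_post = 1.0 / tau0**2 + len(data) / sigma**2
mu_post = (mu0 / tau0**2 + data.sum() / sigma**2) / prec_post
sd_post = prec_post**-0.5

# Posterior predictive simulation: draw theta from the posterior,
# then a new observation from the sampling distribution P_theta.
thetas = rng.normal(mu_post, sd_post, size=10_000)
x_new = rng.normal(thetas, sigma)

# Monte Carlo approximation of the predictive mean and variance.
print(x_new.mean(), x_new.var())
```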
In Section 2, an important and useful achievement of the paper is obtained, as it establishes a probability space as the theoretical framework (i.e., Bayesian experiment) appropriate to address the problem (in the same way that [16] considers the Bayesian experiment as a probability space). In fact, starting from the Bayesian experiment (1) corresponding to a sample of size $m$ (possibly infinite) from the joint distribution of the two variables $X$ (predictor) and $Y$ (response) of interest, the probability space (3) is presented as the appropriate model for the estimation of the regression curve of $Y$ given $X$ from an $m$-sized sample of the joint distribution of the two variables. This has allowed us to obtain an explicit expression of the Bayes risk of an estimator of the regression curve and to take advantage of powerful probabilistic tools when solving the problem of its asymptotic behavior.
Section 3 includes the aforementioned Theorem 1, whose proof relies on Jensen's inequality, Lévy's martingale convergence theorem, and a result by Doob on the consistency of the posterior distribution. The result is general enough to cover discrete and continuous cases, parametric or nonparametric, as the examples provided show, and, unlike what we have been able to find in the literature, no specific assumption is made about the prior distribution.
Section 4 contains the proof of the main result and some auxiliary results. In particular, Lemma 1, the key to the proof of the Theorem, yields a representation of the Bayes estimator of the regression curve as its conditional mean in the Bayesian experiment (3).
Section 5 includes some examples to illustrate the main result of the paper, two of them of a nonparametric nature. The last of these two nonparametric examples shows a situation where the regression curve admits a Bayes estimator although the problem of estimating the density is meaningless; in fact, the estimation of the regression function is performed through the conditional distribution itself.
For ease of reading, we encourage the reader who is not familiar with the terminology or the notation used in the paper to start by reading the Appendix of [17].
2. The Framework
We recall from [13] the appropriate framework to address the problem and update it to incorporate the required asymptotic flavor.
Let $(\Omega,\mathcal A,\{P_\theta\colon\theta\in(\Theta,\mathcal T,Q)\})$ be a Bayesian statistical experiment, where $Q$ denotes the prior distribution on the parameter space $(\Theta,\mathcal T)$, and let $X\colon(\Omega,\mathcal A)\to(\mathcal X,\mathcal F)$ and $Y\colon(\Omega,\mathcal A)\to(\mathcal Y,\mathcal G)$ be two statistics. Consider the Bayesian experiment image of $(X,Y)$,
$$\bigl(\mathcal X\times\mathcal Y,\ \mathcal F\otimes\mathcal G,\ \{P_\theta^{(X,Y)}\colon\theta\in(\Theta,\mathcal T,Q)\}\bigr).$$
In the following, we will assume that the joint distribution of $X$ and $Y$, $R_\theta:=P_\theta^{(X,Y)}$, $\theta\in\Theta$, is a Markov kernel. Let us write $R_{\theta,1}$ and $R_\theta^{x}$ for the marginal distribution of $X$ and the conditional distribution of $Y$ given $X=x$ under $P_\theta$, respectively. Hence $R_\theta(F\times G)=\int_F R_\theta^{x}(G)\,dR_{\theta,1}(x)$ for $F\in\mathcal F$ and $G\in\mathcal G$, and, when $Y$ is a real random variable, $r_\theta(x):=\int_{\mathbb R}y\,dR_\theta^{x}(y)=E_\theta(Y\mid X=x)$ is the regression curve of $Y$ given $X$ under $P_\theta$. In order to alleviate and shorten the notation, we write $E_\theta$ and $E_Q$ for the expectations with respect to $P_\theta$ and to the prior $Q$, respectively.
Given an integer $n$, the Bayesian experiment corresponding to an $n$-sized sample (respectively, an infinite sample) of the joint distribution of $(X,Y)$ is
$$\bigl((\mathcal X\times\mathcal Y)^{n},\ (\mathcal F\otimes\mathcal G)^{n},\ \{R_\theta^{n}\colon\theta\in(\Theta,\mathcal T,Q)\}\bigr),\qquad(1)$$
where $R_\theta^{n}$ denotes the $n$-fold product of $R_\theta$ with itself (respectively, the same expression with $n$ replaced by $\mathbb N$ and $R_\theta^{\mathbb N}$ the countable product).
We define the Markov kernel $R^{n}\colon(\Theta,\mathcal T)\rightarrowtail\bigl((\mathcal X\times\mathcal Y)^{n},(\mathcal F\otimes\mathcal G)^{n}\bigr)$ by $R^{n}(\theta,B):=R_\theta^{n}(B)$, for $\theta\in\Theta$ and $B\in(\mathcal F\otimes\mathcal G)^{n}$, and write $\Pi_{n,Q}$ for the joint distribution of the parameter and the sample, i.e.,
$$\Pi_{n,Q}(T\times B)=\int_T R_\theta^{n}(B)\,dQ(\theta),\qquad T\in\mathcal T,\ B\in(\mathcal F\otimes\mathcal G)^{n}.$$
The corresponding prior predictive distribution $\beta_{n,Q}$ on $(\mathcal F\otimes\mathcal G)^{n}$ is
$$\beta_{n,Q}(B)=\int_\Theta R_\theta^{n}(B)\,dQ(\theta),\qquad B\in(\mathcal F\otimes\mathcal G)^{n}.$$
The posterior distribution is a Markov kernel $Q^{(\cdot)}\colon\bigl((\mathcal X\times\mathcal Y)^{n},(\mathcal F\otimes\mathcal G)^{n}\bigr)\rightarrowtail(\Theta,\mathcal T)$ such that, for all $T\in\mathcal T$ and $B\in(\mathcal F\otimes\mathcal G)^{n}$,
$$\Pi_{n,Q}(T\times B)=\int_B Q^{z}(T)\,d\beta_{n,Q}(z).$$
Let us write $z:=((x_1,y_1),\dots,(x_n,y_n))$ for a generic point of the sample space, so that $Q^{z}$ denotes the posterior distribution given the data $z$.
The posterior predictive distribution on $\mathcal F\otimes\mathcal G$ is the Markov kernel $R_{Q,n}\colon\bigl((\mathcal X\times\mathcal Y)^{n},(\mathcal F\otimes\mathcal G)^{n}\bigr)\rightarrowtail(\mathcal X\times\mathcal Y,\mathcal F\otimes\mathcal G)$ defined, for $B\in\mathcal F\otimes\mathcal G$, by
$$R_{Q,n}^{z}(B):=\int_\Theta R_\theta(B)\,dQ^{z}(\theta).$$
This way, given the data $z$, the posterior predictive probability of an event $B\in\mathcal F\otimes\mathcal G$ is nothing but the posterior mean of the probabilities $R_\theta(B)$. It follows that, with obvious notations,
$$\int_{\mathcal X\times\mathcal Y} f\,dR_{Q,n}^{z}=\int_\Theta\Bigl(\int_{\mathcal X\times\mathcal Y} f\,dR_\theta\Bigr)dQ^{z}(\theta)$$
for any non-negative or integrable real random variable $f$.
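In particular, when $Y$ is real and the joint distributions $R_\theta$ admit densities $f_\theta(x,y)$ with respect to the product of a σ-finite measure on $(\mathcal X,\mathcal F)$ and Lebesgue measure on $\mathbb R$—an assumption introduced here only to display a concrete formula, not one required in the sequel—the regression curve with respect to the posterior predictive distribution takes the familiar ratio form
$$E_{R_{Q,n}^{z}}(Y\mid X=x)=\frac{\displaystyle\int_\Theta\int_{\mathbb R} y\,f_\theta(x,y)\,dy\,dQ^{z}(\theta)}{\displaystyle\int_\Theta\int_{\mathbb R} f_\theta(x,y)\,dy\,dQ^{z}(\theta)}$$
for those $x$ where the denominator is positive: a posterior mixture of sampling densities in the numerator, normalized by the corresponding mixture of marginal densities of $X$.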
We can also consider the posterior predictive distribution on $(\mathcal F\otimes\mathcal G)^{n}$, defined as the Markov kernel that maps the data $z$ to the posterior mixture of the product distributions $R_\theta^{n}$, i.e., $B\mapsto\int_\Theta R_\theta^{n}(B)\,dQ^{z}(\theta)$.
Looking for the appropriate framework to address the problem of estimating the regression curve of the real random variable $Y$ given $X$, we start from the Bayesian experiment (1) corresponding to a sample $((X_i,Y_i))_i$ from the joint distribution $R_\theta$ of the predictor $X$ and the response $Y$, and we choose $x$ (in fact, we only need the first coordinate $x_1$ of $x$ as the argument of the regression curve) from the distribution of the predictor, independently of the sample, which brings us to the product Bayesian experiment (2).
The Bayesian experiment (2) can be identified in a standard way with the probability space (3), whose probability measure is the joint distribution of the parameter, the sample, and the additional observation of the predictor: the parameter is drawn from the prior $Q$ and, given the parameter, the sample and the additional observation are drawn independently from the corresponding sampling distributions.
So, for a real random variable $f$ on the space of (3), its expectation can be computed by integrating first with respect to the sampling distributions and then with respect to the prior $Q$, provided that the integral exists. Moreover, for a real random variable $h$ on the same space, by definition of the posterior distributions, its expectation can equivalently be computed by integrating first with respect to the posterior distribution and then with respect to the prior predictive distribution.
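To fix ideas, if the additional observation of the predictor is drawn from its marginal distribution $R_{\theta,1}$—a reading of the construction above adopted here only for illustration, with notation ($\Pi_{n,Q}^{*}$) that is ours and need not coincide with that of (2) and (3)—the probability measure of (3) acts on product sets as
$$\Pi_{n,Q}^{*}(T\times B\times F)=\int_T\Bigl(\int_B R_{\theta,1}(F)\,dR_\theta^{n}(z)\Bigr)dQ(\theta),\qquad T\in\mathcal T,\ B\in(\mathcal F\otimes\mathcal G)^{n},\ F\in\mathcal F,$$
so that, for an integrable real random variable $f(\theta,z,x)$,
$$\int f\,d\Pi_{n,Q}^{*}=\int_\Theta\int_{(\mathcal X\times\mathcal Y)^{n}}\int_{\mathcal X} f(\theta,z,x)\,dR_{\theta,1}(x)\,dR_\theta^{n}(z)\,dQ(\theta)
=\int_{(\mathcal X\times\mathcal Y)^{n}}\int_\Theta\int_{\mathcal X} f(\theta,z,x)\,dR_{\theta,1}(x)\,dQ^{z}(\theta)\,d\beta_{n,Q}(z),$$
the second equality being the disintegration with respect to the prior predictive and posterior distributions.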
3. The Bayes Estimator of the Regression Curve: Asymptotic Behavior
Now suppose that $(\mathcal Y,\mathcal G)=(\mathbb R,\mathcal B(\mathbb R))$. Let $Y$ be a square-integrable real random variable such that $E_\theta(Y^2)$ has a finite prior mean; in particular, $E_\theta(|Y|)$ also has a finite prior mean.
In this setting, the regression curve of $Y$ given $X$ is the map
$$x\in\mathcal X\ \longmapsto\ r_\theta(x)=E_\theta(Y\mid X=x).$$
An estimator of the regression curve from a sample of size $n$ of the joint distribution of $(X,Y)$ is a statistic $m_n\colon(\mathcal X\times\mathbb R)^{n}\times\mathcal X\to\mathbb R$ such that, once the sample $z=((x_1,y_1),\dots,(x_n,y_n))$ has been observed, the map $x\mapsto m_n(z,x)$ is the estimate of the regression curve $r_\theta$.
From a classical point of view, the simplest way to evaluate the error in estimating an unknown regression curve is to use the expectation of the quadratic deviation (see [18], p. 120):
$$\int_{(\mathcal X\times\mathbb R)^{n}}\int_{\mathcal X}\bigl(m_n(z,x)-r_\theta(x)\bigr)^{2}\,dR_{\theta,1}(x)\,dR_\theta^{n}(z).$$
From a Bayesian point of view, the Bayes estimator—the optimal estimator—of the regression curve should minimize the Bayes risk, i.e., the prior mean of the expectation of the quadratic deviation,
$$\int_\Theta\int_{(\mathcal X\times\mathbb R)^{n}}\int_{\mathcal X}\bigl(m_n(z,x)-r_\theta(x)\bigr)^{2}\,dR_{\theta,1}(x)\,dR_\theta^{n}(z)\,dQ(\theta).$$
So, the Bayesian experiment (3) is the appropriate framework to address these questions (see also Remark 1).
Recall from [13] that the regression curve of $Y$ on $X$ with respect to the posterior predictive distribution $R_{Q,n}^{z}$, given the data $z$, is the Bayes estimator of the regression curve for the squared error loss function; for the sake of completeness, this proposition is also included as part (i) of Theorem 1 below.
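For a concrete, if elementary, illustration of this estimator (our own example, not one of those considered later in the paper), suppose that, given $\theta\in\mathbb R$, the pairs $(X_i,Y_i)$ are i.i.d., $X_i$ has a fixed known distribution not depending on $\theta$, $Y_i\mid X_i=x\sim N(\theta x,\sigma^{2})$ with $\sigma^{2}$ known, and the prior is $\theta\sim N(\mu_0,\tau_0^{2})$. Then $r_\theta(x)=\theta x$, and the regression curve with respect to the posterior predictive distribution is the posterior mean of $\theta$ times $x$:
$$E_{R_{Q,n}^{z}}(Y\mid X=x)=E(\theta\mid z)\,x=\frac{\sigma^{-2}\sum_{i=1}^{n}x_iy_i+\tau_0^{-2}\mu_0}{\sigma^{-2}\sum_{i=1}^{n}x_i^{2}+\tau_0^{-2}}\;x,$$
i.e., in this particular model the Bayes estimator of the regression curve is a linear function of $x$ whose slope is the usual conjugate posterior mean.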
We wonder about the convergence to 0 of this Bayes risk as the sample size grows. Another question of interest is the consistency of this Bayes estimator.
Lemma 1 below is key to solving the problem, since it shows that the Bayes estimator of the regression curve becomes its conditional mean in the Bayesian experiment (3). What the following theorem really provides is the asymptotic behavior of this estimator: the convergence to zero of its Bayes risk and the strong consistency of the Bayes estimator of the regression curve.
Theorem 1. Let $(\Omega,\mathcal A,\{P_\theta\colon\theta\in(\Theta,\mathcal T,Q)\})$ be a Bayesian statistical experiment and $X\colon(\Omega,\mathcal A)\to(\mathcal X,\mathcal F)$ and $Y\colon(\Omega,\mathcal A)\to(\mathbb R,\mathcal B(\mathbb R))$ be two statistics, such that $E_\theta(Y^2)$ has a finite prior mean. Let us suppose that: (a) $(\mathcal X,\mathcal F)$ is a standard Borel space; (b) $\Theta$ is a Borel subset of a Polish space, and $\mathcal T$ is its Borel σ-field; and (c) the family $\{R_\theta\colon\theta\in\Theta\}$ is identifiable.
Then,
(i) The regression curve of $Y$ on $X$ with respect to the posterior predictive distribution $R_{Q,n}^{z}$,
$$\hat r_n(z,x):=E_{R_{Q,n}^{z}}(Y\mid X=x),$$
is the Bayes estimator of the regression curve for the squared error loss function, i.e.,
$$\int_\Theta\!\int_{(\mathcal X\times\mathbb R)^{n}}\!\int_{\mathcal X}\bigl(\hat r_n(z,x)-r_\theta(x)\bigr)^{2}\,dR_{\theta,1}(x)\,dR_\theta^{n}(z)\,dQ(\theta)\ \le\ \int_\Theta\!\int_{(\mathcal X\times\mathbb R)^{n}}\!\int_{\mathcal X}\bigl(m_n(z,x)-r_\theta(x)\bigr)^{2}\,dR_{\theta,1}(x)\,dR_\theta^{n}(z)\,dQ(\theta)$$
for any other estimator $m_n$ of the regression curve. (ii) Moreover, $(\hat r_n)_n$ is a strongly consistent estimator of the regression curve, in the sense that
$$\lim_{n\to\infty}\hat r_n\bigl((z_1,\dots,z_n),x\bigr)=r_\theta(x)\qquad\text{almost surely},$$
where $\hat r_n(z,x):=\hat r_n\bigl((z_1,\dots,z_n),x\bigr)$ if $z=(z_k)_{k\in\mathbb N}$. (iii) Finally, the Bayes risk of $\hat r_n$ converges to 0 for both the absolute error and the squared error loss functions, i.e.,
$$\lim_{n}\int_\Theta\!\int_{(\mathcal X\times\mathbb R)^{n}}\!\int_{\mathcal X}\bigl|\hat r_n(z,x)-r_\theta(x)\bigr|\,dR_{\theta,1}(x)\,dR_\theta^{n}(z)\,dQ(\theta)=0
\quad\text{and}\quad
\lim_{n}\int_\Theta\!\int_{(\mathcal X\times\mathbb R)^{n}}\!\int_{\mathcal X}\bigl(\hat r_n(z,x)-r_\theta(x)\bigr)^{2}\,dR_{\theta,1}(x)\,dR_\theta^{n}(z)\,dQ(\theta)=0.$$
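The following small simulation, again in the illustrative conjugate normal model used above (all names and numerical values chosen ad hoc, and not part of the paper), shows the behavior described in part (ii): the Bayes estimate of the slope, and hence of the regression curve, approaches the true one as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative model: theta ~ N(mu0, tau0^2); given theta, X_i ~ N(0, 1) and
# Y_i | X_i = x ~ N(theta * x, sigma^2).  True regression curve: r(x) = theta * x.
mu0, tau0, sigma = 0.0, 2.0, 1.0
theta = rng.normal(mu0, tau0)            # "true" parameter drawn from the prior

def bayes_regression_slope(x, y):
    """Posterior mean of theta in the conjugate normal model (known variance)."""
    precision = (x**2).sum() / sigma**2 + 1.0 / tau0**2
    return ((x * y).sum() / sigma**2 + mu0 / tau0**2) / precision

for n in [10, 100, 1000, 10000]:
    x = rng.normal(size=n)
    y = rng.normal(theta * x, sigma)
    slope = bayes_regression_slope(x, y)
    # The Bayes estimate of the regression curve at a point x0 is slope * x0.
    print(n, theta, slope)
```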