Multivariate Universal Local Linear Kernel Estimators in Nonparametric Regression: Uniform Consistency

Linke, Yuliana; Borisov, Igor; Ruzankin, Pavel; Kutsenko, Vladimir; Yarovaya, Elena; Shalnova, Svetlana

doi:10.3390/math12121890

Open AccessArticle

Multivariate Universal Local Linear Kernel Estimators in Nonparametric Regression: Uniform Consistency

by

Yuliana Linke

^1,2,*

,

Igor Borisov

^1,2

,

Pavel Ruzankin

^1,2

,

Vladimir Kutsenko

^3,4

,

Elena Yarovaya

^3,4,*

and

Svetlana Shalnova

⁴

¹

Sobolev Institute of Mathematics, 630090 Novosibirsk, Russia

²

Department of Probability and Mathematical Statistics, Novosibirsk State University, 630090 Novosibirsk, Russia

³

Department of Probability Theory, Lomonosov Moscow State University, 119234 Moscow, Russia

⁴

Department of Epidemiology of Noncommunicable Diseases, National Medical Research Center for Therapy and Preventive Medicine, 101990 Moscow, Russia

^*

Authors to whom correspondence should be addressed.

Mathematics 2024, 12(12), 1890; https://doi.org/10.3390/math12121890

Submission received: 15 May 2024 / Revised: 7 June 2024 / Accepted: 13 June 2024 / Published: 18 June 2024

(This article belongs to the Section D1: Probability and Statistics)

Download Versions Notes

Abstract

In this paper, for a wide class of nonparametric regression models, new local linear kernel estimators are proposed that are uniformly consistent under close-to-minimal and visual conditions on design points. These estimators are universal in the sense that their designs can be either fixed and not necessarily satisfying the traditional regularity conditions, or random, while not necessarily consisting of independent or weakly dependent random variables. With regard to the design elements, only dense filling of the regression function domain with the design points without any specification of their correlation is assumed. This study extends the dense data methodology and main results of the authors’ previous work for the case of regression functions of several variables.

Keywords:

nonparametric regression; local linear estimator; uniform consistency; fixed design; random design; strongly dependent design elements

MSC:

62G08

1. Introduction

We consider the following regression model:

X_{i} = f (z_{i}) + ε_{i}, i = 1, \dots, n,

(1)

where

f (t)

,

t = {(t_{1}, \dots, t_{k})}^{⊤} \in P \subset R_{+}^{k}

,

k \geq 1

, is an unknown real-valued random function (random field) that is continuous on a compact set

P

with probability 1. The design

{z_{i}; i = 1, \dots, n}

consists of a set of observable random k-dimensional vectors taking values in

P

with possibly unknown distributions, and it is not necessarily independent or identically distributed. We consider the design as an array of random vectors

{z_{i}, i = 1, \dots, n}

that may depend on n. In particular, this scheme includes models with a fixed design. It is not assumed that the random function

f (t)

does not depend on the design. Some conditions on the random error

{ε_{i}; i = 1, \dots, n}

are given below.

This paper is devoted to the construction of uniformly consistent (in the sense of convergence in probability) kernel-type estimators for the regression function

f (t)

under minimal assumptions on the dependence of the design points.

Let us review publications related to kernel estimators in the problem under consideration. Here, we do not aspire to present a comprehensive overview of this actively developing area of nonparametric estimation and we will only indicate publications representing the main trends in this direction. The most popular procedures for kernel estimation in the classical case of a nonrandom regression function are apparently related to the Nadaraya–Watson estimators, local polynomial estimators (in particular, local linear estimators), Priestley–Chao estimators, and Gasser–Müller, as well as their modifications (see, for example, books [1,2,3,4,5]).

We are primarily interested in the conditions on the design elements, and in this regard, we note that traditionally in regression problems, it is customary to consider deterministic and random designs separately. Part of this division seems to be due to different approaches to the study of estimators in one or another case. Moreover, initially, there was some specification by design type between kernel-type estimators: the Nadaraya–Watson estimators were studied, for example, only in the case of random design, while the Priestley–Chao and Gasser-Müller estimators are for a one-dimensional nonrandom case. Further in the direction mentioned, many various generalizations were obtained and the above-mentioned bounds were blurred (see, for example, [6,7,8,9]). The Nadaraya–Watson estimators in the case of a nonrandom regular multidimensional design were studied, for example, in [10].

In the case of deterministic design in the vast majority of works, one or another conditions for the design regularity are assumed (for example, see [10,11,12,13,14,15,16,17,18,19]). In papers dealing with random design, independent identically distributed design elements are often considered [1,11,20,21,22,23,24,25,26,27,28,29,30]. But over the past decades, many forms of dependent random variables have been proposed and the corresponding limit theorems for sequences with such properties (as well as probabilistic inequalities) were proved. Development in this direction of probability theory has also fully affected the problems of nonparametric regression so that as design elements, samples from a stationary sequence of random variables are often considered that satisfy one or another known form of dependence. In particular, to construct the design elements, the authors used various mixing conditions, moving average schemes, associated random variables, Markov or martingale properties, etc. In this regard, we note, for example, the papers [1,18,21,31,32,33,34,35,36,37,38,39,40,41]. In the recent papers [42,43,44,45,46,47,48], there are considered nonstationary sequences of design elements with certain special forms of dependence (Markov chains, autoregression, partial sums of moving averages, etc.). The uniform consistency of the kernel-type estimators of the regression function, both in the case of deterministic and random design, was studied by many authors (see, for example, [11,16,18,23,24,25,26,36,38,39,42,43,44,49,50] and the references therein).

It is worth noting that the nature of dependence of real sample data in statistics is difficult to determine if the dependence of observations takes place by the nature of the stochastic experiment. In this regard, the creation and development of new approaches to the statistical analysis of dependent observations of large size that do not satisfy the classical mixing conditions and other known forms of correlation, as well as the study of new forms of dependence, which would be statistically more clear and justified, is of interest not only from a theoretical point of view but is also relevant and especially important for applications.

In the present paper, we continue to develop the concept of dense data proposed in [51,52,53]. In these papers, it is established that to restore the regression function, it is enough to know the noisy values of this function on some dense (in one sense or another) set of points from the regression function domain. We succeeded in constructing new kernel-type estimators by using special sums of weighted observations with the structure of Riemann integral sums that are close to the corresponding integrals in the case of dense design. In this case, the stochastic nature of the design points does not play any role. In the papers of predecessors, in the case of random design, the fulfillment of the dense filling with design points of the regression function domain was satisfied due to various concrete forms of the weak dependence of design points, and the asymptotic properties of the estimators were studied using corresponding probabilistic limit theorems. It is important to emphasize that the new estimators are uniformly consistent not only in the cases of weak dependence mentioned above but also for significantly different correlations of observations when the conditions of ergodicity or stationarity are not satisfied, as well as the classical mixing conditions and other known conditions of dependence (see Example 2 in Section 2 below). In addition, the new estimators have the property of universality regarding the nature of the design: it can be either fixed and not necessarily regular or random, while not necessarily satisfying the traditional correlation conditions.

Note that the proposed estimators belong to the class of local linear kernel estimators, but with some weights other than those used in the classical version. These weights are given by the Lebesgue measure of the elements of some finite random partition of the design sample space, where each element of the partition corresponds to one design point. In this paper, explicit upper bounds on the rate of uniform convergence in the probability of the new estimators to the regression function are obtained simultaneously for both fixed and random designs, while, in contrast to the previously known results, the rate of convergence of our estimators are insensitive to the correlation structure of the design points. The only design characteristic explicitly included in the resulting upper bounds is the minimal radius of the epsilon net formed by the design elements in the regression function domain. This minimal radius is a qualitatively different characteristic compared with previously known ones, in terms of which it is possible to describe sufficient conditions for the uniform consistency of kernel-type estimators. Its advantage over classical weak dependence characteristics is that it is insensitive to the correlation of the design observations. The main thing is that as the size of the observations indefinitely increases, this radius tends to zero in probability. Such a requirement, as we have already noted above, is essentially necessary since only when the design is densely filled with the regression function domain, it is possible to restore it with one or another accuracy.

Previously, similar ideas were implemented in [51,52] for local constant estimators and in [53] for local linear estimators with a univariate design. The estimators from [53] are a particular case (for

k = 1

) of the estimators proposed here. Note that the construction of estimators from [53], when the weights of the weighted least squares method are the spacing statistics constructed by the variational series of design points, has no direct generalization to the case of functions of many variables. Similar conditions on design elements were also used in [54,55] for nonparametric regression, and in [56,57,58] for nonlinear regression. In particular, in [54,55], similar conditions were proposed for the Nadaraya–Watson estimators, but these conditions guarantee only pointwise consistency. In [54,55,59], conditions for the uniform consistency of Nadaraya–Watson estimators and classical local linear estimators are obtained in terms of dense data, but the conditions are not as simple as in the present paper and require a more uniform dense filling with design points of the regression function domain than is required in this paper.

Note also that the model (1) assumes that the unknown function

f (t)

is a random process with almost surely continuous trajectories (random field). This more general statement (in comparison with the classical one), is considered partly in order to use the obtained results as an application for estimating the mean and covariance functions of a continuous random field. In connection with the random regression function, we note, for example, recent works [60,61,62,63,64,65,66,67], in which the mean and covariance functions of the random regression function f are estimated in the case of N independent realizations

f_{1}, \dots, f_{N}

of the function f, there are noisy values of each of these random curves in some set of the design elements (the design can be either common to all trajectories or different from series to series). We consider one of the variants of this problem as an application of the main result. Previously, we considered some statements of this problem in the case of a univariate design (see [51,53,68,69,70]).

To conclude the Introduction, it is worth noting that all kernel methods have the so-called “curse of dimensionality” property: as the dimension of the design points increases, the convergence rates of the kernel methods decrease. Therefore, in this case, kernel methods require relatively large samples to achieve the required accuracy. In this regard, reducing the dimension of the design by selecting relevant factors play an important role. As a guide in this direction, we point out the recent paper by [71], where one can find a detailed bibliography.

This paper has the following structure. Section 2 contains the main results. In Section 3, as a corollary, the problem of estimating the mean function in the dense design case is considered. The proofs of the results are found in Section 4.

Currently, the authors are preparing a continuation of this work for publication that will provide both a computer simulation of the statistical procedures proposed in this work and examples of processing real data.

2. Main Results

2.1. Notation and Main Assumptions

We agree that vectors are column vectors. Vectors are denoted by bold letters, and matrices are in straight capital letters. Denote by

diag {d_{1}, \dots, d_{m}}

the diagonal matrix of dimension

m \times m

with relevant elements on the main diagonal. The symbol ^⊤ denotes the transposition of a vector or matrix, and the determinant of some matrix

A

is denoted by

det A

. Denote by

Λ_{k} (\cdot)

the Lebesgue measure in

R^{k}

. For any vector

x = {(x_{1}, \dots, x_{k})}^{⊤}

, the symbol

∥ x ∥

means the supnorm in

R^{k}

, i.e.,

∥ x ∥ = {max}_{j = 1, \dots k} | x_{j} |

. For an arbitrary matrix

X

, as a matrix norm, we consider the norm subordinated to the vector supnorm, i.e.,

∥ X ∥ = {max}_{l \leq k} \sum_{m \leq k} | X_{l m} |

, where the symbol

X_{l m}

here and below denotes the matrix entry at the intersection of the l-th row and m-th column. Notice that we may consider any vector column

x

as a matrix of dimension

k \times 1

and its supnorm

∥ x ∥

coincides with the matrix norm above. Therefore, we use one symbol

∥ \cdot ∥

to denote these two norms.

By

d (A)

, we denote the diameter of a set A, i.e.,

d (A) = {sup}_{x, y \in A} ∥ x - y ∥

. We also need a notation for the modulus of continuity of a function f defined on a unit k-dimensional cube

{[0, 1]}^{k}

:

ω_{f} (h) = sup_{x, y : ∥ x - y ∥ \leq h} | f (x) - f (y) | .

Next, by

O_{p} (η_{n})

, we denote a random variable

ζ_{n}

such that for all

M > 0

, one has

\underset{n \to \infty}{lim sup} P (| ζ_{n} | / η_{n} > M) \leq β (M),

where

{lim}_{M \to \infty} β (M) = 0

and

{η_{n}}

are positive (maybe random or not) variables and the function

β (M)

does not depend on the parameters of the model under consideration. We agree that throughout what follows, all limits, unless otherwise stated, are taken for

n \to \infty

. In what follows, without loss of generality, we assume that

P = {[0, 1]}^{k}

.

We now formulate the following four main assumptions.

(A_{1})

The observations

{(z_{i}, X_{i}), i = 1, \dots, n}

have the structure (1), where

f (t)

,

t \in {[0, 1]}^{k}

is an unknown real-valued random function on

{[0, 1]}^{k}

that is continuous with probability 1. The design points

{z_{i}; i = 1, \dots, n}

are a set of observable k-dimensional random variables with, generally speaking, unknown distributions and values in

{[0, 1]}^{k}

, not necessarily independent or identically distributed; in this case, the random variables

{z_{i}}

may depend on n.

(A_{2})

For all

n \geq 1

, the unobservable random errors

{ε_{i}; i = 1, \dots, n}

form a sequence of martingale differences under the condition

M_{p} = sup_{i \leq n} E {| ε_{i} |}^{p} < \infty when some p > k and p \geq 2,

where

M_{p}

does not depend on n. It is also assumed that

{ε_{i}}

is independent of

{z_{i}}

and the random function

f (\cdot)

; moreover, the random variables

{ε_{i}}

may depend on n.

(A_{3})

The kernel function

K (t)

,

t \in R^{k}

, vanishes outside the cube

{[- 1, 1]}^{k}

and can be represented as

K (t) = \prod_{j = 1}^{k} K_{o} (t_{j})

for

t = {(t_{1}, \dots, t_{k})}^{⊤}

, where

K_{o} (\cdot)

is a symmetric distribution density with the support

[- 1, 1]

. We assume that the function

K_{o} (t)

satisfies the Lipschitz condition with constant

1 \leq L < \infty

and

K_{o} (\pm 1) = 0

.

In what follows, we need the notation

K_{h} (t) = h^{- k} K (h^{- 1} t)

. It is clear that

K_{h} (t)

is a distribution density on

{[- h, h]}^{k}

.

The following condition is the only limitation on the design.

(A_{4})

For each n, there exists a random partition of the set

{[0, 1]}^{k}

into n Jordan-measurable subsets

{P_{i}; i = 1, \dots, n}

such that every element of this partition contains exactly one point from the set

{z_{i}; i = 1, \dots, n}

(the numbering of elements partition is such that

z_{i} \in P_{i}

), and

δ_{n} = {max}_{i \leq n} d (P_{i}) \overset{p}{\to} 0

. Here, it is assumed that the diameters are random variables, i.e., measurable mappings of the probability space.

Remark 1.

Traditionally, a family of nonempty sets

P_{1}, \dots, P_{n}

forms a partition of the set

P

if the elements of the family

{P_{i}}

are pairwise disjoint and

⋃_{i = 1}^{n} P_{i} = P

. Let us agree that in

(A_{4})

, it is allowed that the elements of the set

{P_{i}}

intersect along sets of zero Lebesgue measure (for example, along boundaries). Such a reservation allows us not to exclude the situation of multiple points in the design. In the case of pairwise different design points, such a reservation is not required. Note also that without the above convention, condition

(A_{4})

can be formulated as follows: for each n, there exists a random partition of the set

{[0, 1]}^{k}

into n Jordan-measurable subsets

{P_{i}; i = 1, \dots, n}

such that

δ_{n} : = {max}_{i \leq n} d (P_{i} \cup z_{i}) \overset{p}{\to} 0 .

Remark 2.

Condition

(A_{4})

means that for any n, the set of design points

{z_{i}; i = 1, \dots, n}

forms an ε-net of

{[0, 1]}^{k}

for

ε = δ_{n}

provided that

δ_{n} \overset{p}{\to} 0

.

Remark 3.

Note that condition

(A_{4})

is satisfied for any nonrandom regular design. If

{z_{i}}

are independent and identically distributed and the set

{[0, 1]}^{k}

is the support of the distribution

z_{1}

, then condition

(A_{1})

is also fulfilled. If, additionally, the distribution density

X_{1}

is separated from zero on

{[0, 1]}^{k}

, then

δ_{n} = O (\frac{log n}{n^{1 / k}})

with probability 1. If

{z_{i}; i \geq 1}

is a stationary sequence with an α-mixing condition and the marginal distribution support is

{[0, 1]}^{k}

, then condition

(A_{4})

is also satisfied. Note that all kinds of dependence of design points

{z_{i}}

known to the authors satisfy condition

(A_{4})

, but the fulfillment of this condition is also quite possible for other types of dependence not yet described in the modern literature on nonparametric regression (see Example 2 below).

Let

\begin{matrix} x = {(X_{1}, \dots, X_{n})}^{⊤}, W_{t} = diag {K_{h} (t - z_{1}) Λ_{k} (P_{1}), \dots, K_{h} (t - z_{n}) Λ_{k} (P_{n})}, \end{matrix}

(2)

\begin{matrix} Z_{t} = [\begin{matrix} 1 & {(t - z_{1})}^{⊤} \\ ⋮ & ⋮ \\ 1 & {(t - z_{n})}^{⊤} \end{matrix}] . \end{matrix}

(3)

We introduce the following class of estimators for the regression function f:

\begin{matrix} {\hat{f}}_{n, h} (t) : = e_{1}^{⊤} {(Z_{t}^{⊤} W_{t} Z_{t})}^{- 1} Z_{t}^{⊤} W_{t} x, \end{matrix}

(4)

where

e_{1} = {(1, 0, \dots, 0)}^{⊤}

is the

(k + 1)

-dimensional vector such that the first coordinate equals 1 and the other ones vanish.

Remark 4.

It is easy to check that the kernel estimator (4) is the first coordinate of the

(k + 1)

-dimensional estimator of the following variant of the weighted least squares method:

\begin{matrix} min_{a, b} \sum_{i = 1}^{n} {(X_{i} - (a + b^{⊤} (t - z_{i})))}^{2} K_{h} (t - z_{i}) Λ_{k} (P_{i}) . \end{matrix}

(5)

Thus, the proposed class of estimators in a certain sense (in fact, by construction) is close to the classical multidimensional local linear kernel estimators, but in weighted least squares (5), we use slightly different weights.

2.2. The Main Theorem, Corollaries, and Examples

The following theorem is the main result of this paper. It allows us to construct a confidence region (tube) for an unknown regression function.

Theorem 1.

Let conditions

(A_{1})

–

(A_{4})

be satisfied. Then, for any fixed

h \in (0, 1 / 2]

with probability 1, the following relation is valid:

\begin{matrix} sup_{t \in {[0, 1]}^{k}} | {\hat{f}}_{n, h} (t) - f (t) | \leq C_{1}^{*} ω_{f} (h) + ζ_{n} (h), \end{matrix}

(6)

with a nonnegative random variable

ζ_{n} (h)

such that

\begin{matrix} P (ζ_{n} (h) > y) \leq C_{2}^{*} y^{- p} h^{- k (p / 2 + 1)} E (δ_{n}^{k p / 2}) + P (δ_{n} > c^{*} h), \end{matrix}

(7)

where the constants

c^{*} < 1

,

C_{1}^{*}

, and

C_{2}^{*}

defined in (36), Lemmas 6 and 8, depend on k and the kernel K; furthermore, the constant

C_{2}^{*}

additionally depends on p and

M_{p}

.

Remark 5.

Let f be a nonrandom regression function. Substitute the following intp (7):

y = {(h^{- k (p / 2 + 1)} E (δ_{n}^{k p / 2}))}^{1 / p} .

Applying Markov’s power inequality with exponent

k p / 2

for the second term in (7), it is easy to see that under the conditions of the above theorem,

ζ_{n} (h) = O_{p} (C^{*} {(h^{- k (p / 2 + 1)} E (δ_{n}^{k p / 2}))}^{1 / p}),

(here,

C^{*}

is a positive constant that depends on k, p,

M_{p}

, and K) and there exists a solution

h \equiv h_{n}

to the equation

\begin{matrix} E (δ_{n}^{k p / 2}) = h^{k (p / 2 + 1)} ω_{f}^{p} (h) . \end{matrix}

(8)

It is clear that this solution tends to zero as n grows. In fact, the quantity

h_{n}

minimizes in h the order of smallness of the right-hand side of relation (6). Notice that in virtue of (8), the limit relations

{(h_{n})}^{- k (p / 2 + 1)} E (δ_{n}^{k p / 2}) \to 0

and

δ_{n} / h_{n} \overset{p}{\to} 0

are valid.

Theorem 1 implies the following two assertions.

Corollary 1.

Let conditions

(A_{1})

–

(A_{4})

be fulfilled and let

C

be a set of equicontinuous nonrandom functions from the space

C ({[0, 1]}^{k})

. Then,

γ_{n} (C) = sup_{f \in C} sup_{t \in {[0, 1]}^{k}} | {\hat{f}}_{n, {\tilde{h}}_{n}} (t) - f (t) | \overset{p}{\to} 0,

where

{\tilde{h}}_{n}

is a solution to Equation (8), where the modulus of continuity

ω_{f} (h)

is replaced with the universal modulus

ω_{f}^{C} (h) = sup_{f \in C} ω_{f} (h)

. Moreover, there is the valid relation

γ_{n} (C) = O_{p} (ω_{f}^{C} ({\tilde{h}}_{n})) .

Corollary 2.

If conditions

(A_{1})

–

(A_{4})

are fulfilled and the modulus of continuity of the random regression function

f (t)

,

t \in {[0, 1]}^{k}

satisfies with probability 1 the condition

ω_{f} (h) \leq ζ φ (h)

for some proper random variable

ζ > 0

and a positive nonrandom function

φ (h)

with the condition

φ (h) \to 0

as

h \to 0

, then there is the valid limit relation

sup_{t \in {[0, 1]}^{k}} | {\hat{f}}_{n, {\hat{h}}_{n}} (t) - f (t) | \overset{p}{\to} 0,

where

{\hat{h}}_{n}

is a solution to Equation (8), in which the modulus of continuity

ω_{f} (h)

is replaced with

φ (h)

.

Example 1.

δ_{n} \leq ν n^{- 1 / k}

, with

E ν^{k p / 2} < \infty

, and

ω_{f} (h) \leq ζ h^{α}

for

α \in (0, 1]

and some proper random variable ζ. Then,

h_{n} = O (n^{- \frac{1}{2 k (1 / p + 1 / 2) + α}}) a n d sup_{t \in {[0, 1]}^{k}} | {\hat{f}}_{n, h} (t) - f (t) | = O_{p} (n^{- \frac{α}{2 k (1 / p + 1 / 2) + 2 α}}) .

In particular, if

f (\cdot) = W (\cdot)

is a Wiener process on

[0, 1]

and the independent identically distributed random variables

ε_{i}

have a normal distribution with zero mean, then for all sufficiently small

ν > 0

,

sup_{t \in [0, 1]} | {\hat{f}}_{n, h} (t) - f (t) | = O_{p} (n^{- 1 / 3 + ν}) .

Here,

k = 1

,

α = 1 / 2 - ν_{1}

, and

1 / p < ν_{2}

for arbitrarily small positive

ν_{1}

and

ν_{2}

.

Example 2.

Let a sequence of bivariate random variables

{z_{i}; i \geq 1}

be defined by the relation

z_{i} = ν_{i} u_{i}^{l} + (1 - ν_{i}) u_{i}^{r},

(9)

where

{u_{i}^{l}}

and

{u_{i}^{r}}

are independent and uniformly distributed on the respective rectangles

[0, 1 / 2] \times [0, 1]

and

[1 / 2, 1] \times [0, 1]

, and the sequence

{ν_{i}}

does not depend on

{u_{i}^{l}}

and

{u_{i}^{r}}

and consists of Bernoulli random variables with the success probability

1 / 2

, i.e., the distribution of random variables

z_{i}

is an equilibrium mixture of two uniform distributions in the corresponding rectangles. The dependence between the random variables

ν_{i}

for any natural i is defined by the equalities

ν_{2 i - 1} = ν_{1}

and

ν_{2 i} = 1 - ν_{1}

. In this case, the random variables

{z_{i}; i \geq 1}

in (9) form a stationary sequence of random variables uniformly distributed on the square

[0, 1] \times [0, 1]

, but, say, all known mixing conditions are not satisfied here since for all natural m and n,

\begin{matrix} P (z_{2 m} \in [0, 1 / 2] \times [0, 1], z_{2 n - 1} \in [0, 1 / 2] \times [0, 1]) = 0 . \end{matrix}

On the other hand, it is easy to check that the stationary sequence

{z_{i}}

satisfies the Glivenko–Cantelli theorem. This means that for any fixed

h > 0

,

c_{1} h^{k} \leq lim sup_{t \in {[0, 1]}^{k}} # {i : ∥ t - z_{i} ∥ \leq h, 1 \leq i \leq n} \leq c_{2} h^{k}

is almost surely uniform in t; here, # is the counting measure and the constants

c_{1}

and

c_{2}

do not depend on t and h. In other words, the sequence

{z_{i}}

satisfies condition

(A_{4})

.

In the general case, according to the scheme of this example, one can construct various sequences of dependent random variables uniformly distributed over

[0, 1] \times [0, 1]

based on the choice of various sequences of Bernoulli switches with the conditions

ν_{j_{k}} = 1

and

ν_{l_{k}} = 0

for unlimited numbers of indices

{j_{k}}

and

{l_{k}}

, respectively. In this case, condition

(A_{4})

is also satisfied. But the corresponding sequence

{z_{i}}

(not necessarily stationary) may not satisfy the strong law of large numbers. For example, this situation occurs when

ν_{j} = 1 - ν_{1}

for

j = 2^{2 k - 1}, \dots, 2^{2 k} - 1

, and

ν_{j} = ν_{1}

for

j = 2^{2 k}, \dots, 2^{2 k + 1} - 1

, where

k = 1, 2, \dots

; i.e., we play in which of the rectangles we throw the first point at random, and then alternate the number of throws for each of the two specified rectangles as follows: 2, 4, 8, 16, and so on. Indeed, we introduce the notation

n_{k} = 2^{2 k} - 1

,

{\tilde{n}}_{k} = 2^{2 k + 1} - 1

, and

S_{m} = \sum_{i = 1}^{m} z_{1 i}

for

z_{i} = (z_{1 i}, z_{2 i})

, and we note that for all outcomes that make up the event

{ν_{1} = 1}

, the following is fulfilled:

\frac{S_{n_{k}}}{n_{k}} = \frac{1}{n_{k}} \sum_{i \in N_{1, k}} u_{1 i}^{l} + \frac{1}{n_{k}} \sum_{i \in N_{2, k}} u_{1 i}^{r},

where

u_{i}^{l} = (u_{1 i}^{l}, u_{2 i}^{l})

and

u_{i}^{r} = (u_{1 i}^{r}, u_{2 i}^{r})

, and

N_{1, k}

and

N_{2, k}

are the collections of indices such that the observations

{z_{i}, i \leq n_{k}}

lie in the rectangles

[0, 1 / 2] \times [0, 1]

or

[1 / 2, 1] \times [0, 1]

, respectively. It is easy to see that

# (N_{1, k}) = n_{k} / 3

and

# (N_{2, k}) = 2 # (N_{1, k})

. Hence,

S_{n_{k}} / n_{k} \to 7 / 12

almost surely as

k \to \infty

due to the strong law of large numbers for the sequences

{u_{1 i}^{l}}

and

{u_{1 i}^{r}}

. On the other hand, for all elementary outcomes belonging to the event

{ν_{1} = 1}

, as

k \to \infty

,

\frac{S_{{\tilde{n}}_{k}}}{{\tilde{n}}_{k}} = \frac{1}{{\tilde{n}}_{k}} \sum_{i \in {\tilde{N}}_{1, k}} u_{1 i}^{l} + \frac{1}{{\tilde{n}}_{k}} \sum_{i \in {\tilde{N}}_{2, k}} u_{1 i}^{r} \to \frac{5}{12}

(10)

with probability 1, where

{\tilde{N}}_{1, k}

and

{\tilde{N}}_{2, k}

are the collections of indices such that the observations

{z_{i}, i \leq {\tilde{n}}_{k}}

, lie in the respective rectangles

[0, 1 / 2] \times [0, 1]

or

[1 / 2, 1] \times [0, 1]

. When justifying the convergence in (10), we took into account the fact that

# ({\tilde{N}}_{1, k}) = (2^{2 k + 2} - 1) / 3

and

# ({\tilde{N}}_{2, k}) = 2 n_{k} / 3

, i.e.,

# ({\tilde{N}}_{1, k}) = 2 # ({\tilde{N}}_{2, k}) + 1

.

Similar conclusions are valid for all outcomes that make up the event

{ν_{1} = 0}

.

3. Estimating the Mean and Covariance of a Random Regression Function with a Dense Design

In this section, as an application of Theorem 1, we consider one of the variants of the problem of estimating the mean function of an almost surely continuous random process. The estimation of the mean and covariance functions plays an important role and many recent works were devoted to solving this problem (see, for example, [60,67,72,73] and the references below). As in the classical formulation of the problem of nonparametric regression of the form (1), the design in the problem of estimating the mean and covariance functions is considered to be either random or deterministic. For a random design, as a rule, it is assumed that the design points are independent identically distributed random variables (see, for example, [60,61,62,63,64,65,66,74,75]). In the case of deterministic design, one or another regularity condition is usually used. For example, the nonrandom equidistant design was discussed in [61]. In addition, in the problem of estimating the mean function, it is customary to subdivide the design into certain types depending on the number of design points for a particular trajectory.

The literature focuses on two opposing types of data: the design is in one sense or another “sparse” (for example, the number of design points for each of the realizations of the regression process is uniformly bounded; see [60,61,62,74]), or the design is somewhat “dense” (the number of design elements in each series grows simultaneously with the number of series; see [60,62,65,74]). In Theorem 2 given below, the second of the mentioned types of design is considered, provided only condition

(A_{4})

is satisfied in each of the independent series. Note that our formulation of the problem of estimating the mean function also includes the situation of a general deterministic design. Mean function estimation approaches that are used for dense or sparse data are often different (see, for example, [73,76]). In the case of a growing number of observations in each series, it is natural to first estimate the random regression function in each series, and then average their estimators over all series (see, for example, [61,65]). This is exactly what we do next (see Formula (12) below) by following this generally accepted approach. The uniform consistency of the mean function estimators was studied, for example, in [60,62,64,66,72].

Therefore, consider the following statement of the problem under consideration. We have N independent copies of model (1) from condition

(A_{1})

:

X_{i, j} = f_{j} (z_{i, j}) + ε_{i, j}, i = 1, \dots, n, j = 1, \dots, N,

(11)

where

f_{1} (t), \dots, f_{N} (t)

,

t \in {[0, 1]}^{k}

, are independent identically distributed unobservable random processes with a.s. continuous trajectories, and the collections

{ε_{i, j}; i = 1, \dots, n}

and

{z_{i, j}; i = 1, \dots, n}

for all j satisfy conditions

(A_{1}), (A_{2})

, and

(A_{4})

. Here and below in this section, the index j means the copy number of model (1). We define an estimator for the mean function by the equality

{\bar{f^{*}}}_{N, n, h} (t) = \frac{1}{N} \sum_{j = 1}^{N} {\hat{f}}_{n, h, j} (t),

(12)

where the estimators

{\hat{f}}_{n, h, j} (t)

are determined by Formula (4), where the values from (1) are replaced with the corresponding characteristics from (11).

A corollary of Theorem 1 is the following assertion.

Theorem 2.

For model (11), let conditions

(A_{1})

–

(A_{4})

be fulfilled and

E sup_{t \in {[0, 1]}^{k}} | f_{1} (t) | < \infty .

(13)

Moreover, let a sequence

h \equiv h_{n} \to 0

and a subsequence of naturals

N \equiv N_{n} \to \infty

satisfy the conditions

h^{- k (p / 2 + 1)} E (δ_{n}^{k p / 2}) \to 0 a n d N P (δ_{n} > c^{*} h}) \to 0 .

(14)

Then,

sup_{t \in {[0, 1]}^{k}} | {\bar{f^{*}}}_{N, n, h} (t) - E f_{1} (t) | \overset{p}{\to} 0 .

Remark 6.

If condition (13) is replaced with a slightly stronger constraint

E sup_{t \in {[0, 1]}^{k}} f_{1}^{2} (t) < \infty,

then by (14), one can show the uniform consistency of the estimator

M_{N, n, h}^{*} (t, s) = \frac{1}{N} \sum_{j = 1}^{N} {\hat{f}}_{n, h, j} (t) {\hat{f}}_{n, h, j} (s), t, s \in {[0, 1]}^{k},

for the mixed second moment

E f (t) f (s)

, where

h \equiv h_{n}

and

N \equiv N_{n}

are defined in (14). The proof of this fact is similar to the proof of Theorem 2, and thus, we omit the detailed arguments. In other words, under the above conditions, the estimator

{Cov}_{N, n, h}^{*} (t, s) = M_{N, n, h}^{*} (t, s) - {\bar{f^{*}}}_{N, n, h} (t) {\bar{f^{*}}}_{N, n, h} (s)

is uniformly consistent for the covariance function of the random regression field

f (t)

.

4. Proofs

Throughout this section, we assume that conditions

(A_{1})

–

(A_{3})

are satisfied. To prove Theorem 1, we need a number of auxiliary assertions. Let

\begin{matrix} κ_{j} = \int_{- 1}^{1} {| u |}^{j} K_{o} (u) d u, j = 1, 2, \\ λ_{i} (t) = K_{h} (t - z_{i}) Λ_{k} (P_{i}), w_{n 0} (t) = \sum_{i = 1}^{n} λ_{i} (t), \end{matrix}

(15)

\begin{matrix} w_{n u v} (t) = \sum_{i = 1}^{n} (t_{u} - z_{u i}) (t_{v} - z_{v i}) λ_{i} (t), w_{n u} (t) = \sum_{i = 1}^{n} (t_{u} - z_{u i}) λ_{i} (t), u, v = 1, \dots, k, \end{matrix}

(16)

\begin{matrix} w_{0} (t) = \int K_{h} (t - z) d z, w_{u} (t) = \int (t_{u} - z_{u}) K_{h} (t - z) d z, u = 1, \dots, k, \end{matrix}

(17)

\begin{matrix} w_{u v} (t) = \int (t_{u} - z_{u}) (t_{v} - z_{v}) K_{h} (t - z) d z, u, v = 1, \dots, k, \end{matrix}

(18)

where

t = {(t_{1}, \dots, t_{k})}^{⊤}

,

z_{i} = {(z_{i 1}, \dots, z_{i k})}^{⊤}

, and

z = {(z_{1}, \dots, z_{k})}^{⊤}

.

Remark 7.

We emphasize that due to the density properties of

K_{h} (\cdot)

, the summation domain in all sums in (15) and (16), as well as in all sums in the formulas given below in (41), (42), (44), and (51) coincide with the set

N_{h} (t) = {i : ∥ t - z_{i} ∥ \leq h, 1 \leq i \leq n},

(19)

and the domain of integration in (17) and (18), respectively, coincides with the set

A_{h} {(t) = {z \in [0, 1]}^{k} : ∥ t - z ∥ \leq h} .

These facts are fundamental for the further analysis.

Lemma 1.

For

h < 1 / 2

, the following relations are valid:

\begin{matrix} inf_{t \in {[0, 1]}^{k}} (w_{0} (t) w_{u u} (t) - w_{u}^{2} (t)) \geq 2^{- k - 1} (κ_{2} - κ_{1}^{2}) h^{2}, u = 1, \dots, k, \end{matrix}

(20)

\begin{matrix} \forall t \in {[0, 1]}^{k} w_{0} (t) w_{u v} (t) - w_{u} (t) w_{v} (t) = 0, u \neq v, u, v = 1, \dots, k, \end{matrix}

(21)

\begin{matrix} inf_{t \in {[0, 1]}^{k}} w_{0} (t) = 2^{- k}, sup_{t \in {[0, 1]}^{k}} w_{0} (t) = 1, \end{matrix}

(22)

\begin{matrix} sup_{t \in {[0, 1]}^{k}} w_{u} (t) = 2^{- 1} κ_{1} h, sup_{t \in {[0, 1]}^{k}} w_{u v} (t) = 2^{- 2} κ_{1}^{2} h^{2}, u, v = 1, \dots, k . \end{matrix}

(23)

Proof.

The relation (21) obviously follows from the representation

K (t) = \prod_{j = 1}^{k} K_{o} (t_{j})

introduced in condition

(A_{3})

. The statements (20), (22), and (23) follow from Lemma 1 in [53] and the abovementioned representation for the kernel function. □

Lemma 2.

For

h < 1 / 2

, on the subset of elementary events defined by the relation

δ_{n} \leq h

, the following inequalities hold:

\begin{matrix} sup_{t \in {[0, 1]}^{k}} w_{n 0} (t) \leq 2^{2 k} L^{k}, sup_{t \in {[0, 1]}^{k}} | w_{n 0} (t) - w_{0} (t) | \leq k 2^{2 k} L^{k} h^{- 1} δ_{n}, \end{matrix}

(24)

\begin{matrix} sup_{t \in {[0, 1]}^{k}} | w_{n u} (t) | \leq 2^{2 k} L^{k} h, sup_{t \in {[0, 1]}^{k}} | w_{n u} (t) - w_{u} (t) | \leq (k + 1) 2^{2 k} L^{k} δ_{n}, u = 1, \dots, k, \end{matrix}

(25)

\begin{matrix} sup_{t \in {[0, 1]}^{k}} | w_{n u v} (t) | \leq 2^{2 k} L^{k} h^{2}, sup_{t \in {[0, 1]}^{k}} | w_{n u v} (t) - w_{u v} (t) | \leq (k + 2) 2^{2 k} L^{k} h δ_{n}, u, v = 1, \dots, k; \end{matrix}

(26)

and on the subset of elementary events

δ_{n} \leq {(k 2^{3 k + 1} L^{k})}^{- 1} h

,

\begin{matrix} inf_{t \in {[0, 1]}^{k}} w_{n 0} (t) \geq 2^{- k - 1}, \end{matrix}

(27)

\begin{matrix} sup_{t \in {[0, 1]}^{k}} |w_{n u v} (t) - \frac{w_{n u} (t) w_{n v} (t)}{w_{n 0} (t)} - w_{u v} (t) + \frac{w_{u} (t) w_{v} (t)}{w_{0} (t)}| \leq (k + 2) 2^{6 k + 3} L^{k + k \land 2} h δ_{n} \end{matrix}

(28)

for any

u, v = 1, \dots, k

.

Proof.

To display the first estimates in (24)–(26), one should take into account Remark 7 and the following relations:

sup_{s \in {[- 1, 1]}^{k}} K (s) \leq L^{k}, \sum_{i = 1}^{n} K_{h} (t - z_{i}) Λ_{k} (P_{i}) \leq h^{- k} L^{k} {(2 h + 2 δ_{n})}^{k} \leq 2^{2 k} L^{k} .

(29)

The second estimates in (24)–(26) follow from the well-known error estimate for the approximation by Riemann integral sums of the corresponding integrals of functions satisfying the Lipschitz condition:

|\sum_{i \in N_{h} (t)} g_{t, α, β}^{u, v} (z_{i}) Λ_{k} (P_{i}) - \int_{z \in A_{h} (t)} g_{t, α, β}^{u, v} (z) d z| \leq δ_{n} L_{g_{t, α, β}^{u, v}} \sum_{i = 1}^{n} Λ_{k} (P_{i}),

(30)

where the functions

g_{t, α, β}^{u, v} (z) = {(t_{u} - z_{u})}^{α} {(t_{v} - z_{v})}^{β} K_{h} (t - z)

,

α, β = 0, 1

, are defined for

z_{j} \in [0 \lor t_{j} - h, 1 \land t_{j} + h]

,

j = 1, \dots, k

, and

L_{g_{t, α, β}^{u, v}}

is a Lipschitz constant of the function

g_{t, α, β}^{u, v} (z)

. It is easy to verify that under the conditions of the lemma,

sup_{t \in {[0, 1]}^{k}} L_{g_{t, 0, 0}^{u, v}} \leq k L^{k} h^{- k - 1}, sup_{t \in {[0, 1]}^{k}} L_{g_{t, 1, 0}^{u, v}} \leq (k + 1) L^{k} h^{- k}, sup_{t \in {[0, 1]}^{k}} L_{g_{t, 1, 1}^{u, v}} \leq (k + 2) L^{k} h^{- k + 1} .

To complete deriving (24)–(26), we need to take into account the definitions (15)–(18) and the estimates (29) and (30).

Finally, (27) follows from the first relation in (22) and the second relation in (24). When deriving the last assertion (28) of the lemma, we use the assertions of Lemmas 1 and 2. □

In what follows, we also need to establish the Lipschitz property of the introduced functions

w_{n 0} (t)

,

w_{n u} (t)

, and

w_{n u v} (t)

.

Lemma 3.

For

h < 1 / 2

, on the set of elementary outcomes defined by the relation

δ_{n} \leq h

, for any

t

,

t^{'} \in {[0, 1]}^{k}

, the following inequalities hold:

\begin{matrix} | w_{n 0} (t^{'}) - w_{n 0} (t) | \leq k 2^{2 k + 1} L^{k} h^{- 1} ∥ t^{'} - t ∥, \end{matrix}

(31)

\begin{matrix} | w_{n u} (t^{'}) - w_{n u} (t) | \leq (k + 1) 2^{2 k + 1} L^{k} ∥ t^{'} - t ∥, u = 1, \dots, k, \end{matrix}

(32)

\begin{matrix} | w_{n u v} (t^{'}) - w_{n u v} (t) | \leq (k + 2) 2^{2 k + 1} L^{k} h ∥ t^{'} - t ∥, u, v = 1, \dots, k . \end{matrix}

(33)

On the set of elementary outcomes with the condition

δ_{n} \leq {(k 2^{3 k + 1} L^{k})}^{- 1} h

,

|w_{n u v} (t^{'}) - \frac{w_{n u} (t^{'}) w_{n v} (t^{'})}{w_{n 0} (t^{'})} - w_{n u v} (t) + \frac{w_{n u} (t) w_{n v} (t)}{w_{n 0} (t)}| \leq 5 (k + 1) 2^{8 k + 2} L^{3 k} h ∥ t^{'} - t ∥ .

(34)

Proof.

First of all, we note that the normalized kernel

K_{h}

satisfies the Lipschitz condition with constant

k 2^{2 k + 1} L^{k} h^{- k - 1}

, and for the sets

N_{h} (t)

defined in (19), under the conditions of the lemma, we have the estimate

\sum_{i \in N_{h} (t)} Λ_{k} (P_{i}) \leq 2^{k} {(h + δ_{n})}^{k} \leq 2^{2 k} h^{k} .

From here, we easily obtain (31):

| w_{n 0} (t^{'}) - w_{n 0} (t) | \leq \sum_{i \in N_{h} (t^{'}) \cup N_{h} (t)} | λ_{i} (t^{'}) - λ_{i} (t) |

\leq k L^{k} h^{- k - 1} ∥ t^{'} - t ∥ (\sum_{i \in N_{h} (t^{'})} Λ_{k} (P_{i}) + \sum_{i \in N_{h} (t)} Λ_{k} (P_{i})) \leq k 2^{2 k + 1} L^{k} h^{- 1} ∥ t^{'} - t ∥ .

The proof of estimate (32) differs slightly from the one just shown:

| w_{n u} (t^{'}) - w_{n u} (t) | \leq \sum_{i \in N_{h} (t^{'}) \cup N_{h} (t)} λ_{i} (t) | t_{u}^{'} - t_{u} | + \sum_{i \in N_{h} (t^{'}) \cup N_{h} (t)} | λ_{i} (t^{'}) - λ_{i} (t) | | t_{u} - z_{u i} |

\leq 2^{2 k + 1} L^{k} ∥ t^{'} - t ∥ + k 2^{2 k + 1} L^{k} ∥ t^{'} - t ∥ .

Similarly, estimate (33) is proved after dividing the original sum into three subsums. Inequality (33) follows directly from the upper bounds (24) and (25) for the values under consideration, as well as from (31) and (32). We only deduce the main estimate:

| w_{n u} (t^{'}) w_{n v} (t^{'}) w_{n 0} (t) - w_{n u} (t) w_{n v} (t) w_{n 0} (t^{'}) | \leq | w_{n u} (t^{'}) - w_{n u} (t) | | w_{n v} (t^{'}) | w_{n 0} (t)

+ | w_{n v} (t^{'}) - w_{n v} (t) | | w_{n u} (t) | w_{n 0} (t) + | w_{n 0} (t^{'}) - w_{n 0} (t) | | w_{n u} (t) | | w_{n v} (t) |

\leq (5 k + 4) 2^{6 k} L^{3 k} h ∥ t^{'} - t ∥ .

To prove (34), we only need to use estimate (33), as well as the uniform lower bound

inf_{t \in {[0, 1]}^{k}} w_{n 0} (t) \geq 2^{- k - 1}

of the set of outcomes defined by the inequality

δ_{n} \leq {(k 2^{3 k + 1} L^{k})}^{- 1} h

(see (27)). After this, you just need to add up the obtained estimates and carry out elementary calculations, which we omit. □

We define the entries of a square matrix

H

of size k as follows:

H_{u v} = w_{n u v} (t) - w_{n 0}^{- 1} (t) w_{n u} (t) w_{n v} (t), u, v = 1, \dots, k .

(35)

Note that by virtue of the Cauchy–Bunyakowsky inequality, one has

H_{u v} \geq 0

for all

u, v

. We also need the following notation:

c^{*} = \frac{{(κ_{2} - κ_{1}^{2})}^{k}}{(k + 2)! 2^{6 k^{2} + 4 k + 2} L^{2 k^{2} - k + k \land 2}} .

(36)

It is easy to see that the difference

κ_{2} - κ_{1}^{2}

is the variance of a non-degenerate distribution, and thus, it is strictly positive. Also,

c^{*} < 1 .

The following assertion holds.

Lemma 4.

For

h < 1 / 2

, on the set of outcomes defined by the relation

δ_{n} \leq c^{*} h

,

sup_{t \in {[0, 1]}^{k}} ∥ H^{- 1} ∥ \leq k! 2^{6 k^{2} - 2 k - 1} L^{2 k (k - 1)} {(κ_{2} - κ_{1}^{2})}^{- k} h^{- 2} .

Proof.

Denote by

H^{0}

the square matrix of size k with entries

H_{u v}^{0} = w_{u v} (t) - w_{0}^{- 1} (t) w_{u} (t) w_{v} (t), u, v = 1, \dots, k .

By Lemma 1,

H^{0} = diag \{w_{11} (t) - w_{0} (t) w_{1}^{2} (t), \dots, w_{k k} (t) - w_{0} (t) w_{k}^{2} (t)\}

and the following estimate holds for the determinant of this diagonal matrix with positive entries:

inf_{t \in {[0, 1]}^{k}} det (H^{0}) \geq 2^{- k (k + 1)} {(κ_{2} - κ_{1}^{2})}^{k} h^{2 k} .

(37)

Next,

\begin{matrix} sup_{t \in {[0, 1]}^{k}} |det H - det H^{0}| \leq k! k (k + 2) 2^{6 k + 3} L^{k + k \land 2} {(L^{2 k} 2^{5 k + 2})}^{k - 1} δ_{n} h h^{2 (k - 1)} \leq \\ (k + 2)! 2^{5 k^{2} + 3 k + 1} L^{2 k^{2} - k + k \land 2} δ_{n} h^{2 k - 1} . \end{matrix}

(38)

When deriving (4), we use the definition of the matrix determinant and estimate (28), and took into account the fact that due to Lemmas 1 and 2, for all

u, v

, one has

sup_{t \in {[0, 1]}^{k}} H_{u v} \leq 2^{5 k + 2} L^{2 k} h^{2}, sup_{t \in {[0, 1]}^{k}} H_{u v}^{0} \leq 2^{k - 1} L^{2} h^{2} .

Hence, on the set of elementary outcomes defined by the relation

δ_{n} \leq c^{*} h

,

inf_{t \in {[0, 1]}^{k}} det H \geq 2^{- k (k + 1) - 1} {(κ_{2} - κ_{1}^{2})}^{k} h^{2 k} .

(39)

The assertion of the lemma can now be easily derived directly from the definition of the inverse matrix and relation (39). □

We need the following notation of n-dimensional vectors:

\begin{matrix} 1 = {(1, \dots, 1)}^{⊤}, ϵ = {(ε_{1}, \dots, ε_{n})}^{⊤}, \\ δ_{f} = {((f (z_{1}) - f (t)), \dots, (f (z_{n}) - f (t)))}^{⊤} \end{matrix}

In this case,

x = f (t) 1 + δ_{f} + ϵ

, and for the estimate

{\hat{f}}_{n, h} (t)

in (4), there is the valid representation

\begin{matrix} {\hat{f}}_{n, h} (t) = f (t) e_{1}^{⊤} {(Z_{t}^{⊤} W_{t} Z_{t})}^{- 1} Z_{t}^{⊤} W_{t} 1 + e_{1}^{⊤} {(Z_{t}^{⊤} W_{t} Z_{t})}^{- 1} Z_{t}^{⊤} W_{t} δ_{f} \\ + e_{1}^{⊤} {(Z_{t}^{⊤} W_{t} Z_{t})}^{- 1} Z_{t}^{⊤} W_{t} ϵ . \end{matrix}

(40)

Using definitions (2) and (3) and the first notation in (15), one can rewrite the matrix

Z_{t}^{⊤} W_{t} Z_{t}

of dimension

(k + 1) \times (k + 1)

and

(k + 1)

-dimensional vectors

Z_{t}^{⊤} W_{t} 1

,

Z_{t}^{⊤} W_{t} δ_{f}

, and

Z_{t}^{⊤} W_{t} ϵ

as follows:

\begin{matrix} Z_{t}^{⊤} W_{t} Z_{t} = [\begin{matrix} \sum_{i = 1}^{n} λ_{i} (t) & \sum_{i = 1}^{n} {(t - z_{i})}^{⊤} λ_{i} (t) \\ \sum_{i = 1}^{n} (t - z_{i}) λ_{i} (t) & \sum_{i = 1}^{n} (t - z_{i}) {(t - z_{i})}^{⊤} λ_{i} (t) \end{matrix}], Z_{t}^{⊤} W_{t} 1 = [\begin{matrix} \sum_{i = 1}^{n} λ_{i} (t) \\ \sum_{i = 1}^{n} (t - z_{i}) λ_{i} (t) \end{matrix}], \end{matrix}

(41)

\begin{matrix} Z_{t}^{⊤} W_{t} δ_{f} = [\begin{matrix} \sum_{i = 1}^{n} λ_{i} (t) (f (z_{i}) - f (t)) \\ \sum_{i = 1}^{n} (t - z_{i}) λ_{i} (t) (f (z_{i}) - f (t)) \end{matrix}], Z_{t}^{⊤} W_{t} ϵ = [\begin{matrix} \sum_{i = 1}^{n} λ_{i} (t) ε_{i} \\ \sum_{i = 1}^{n} (t - z_{i}) λ_{i} (t) ε_{i} \end{matrix}] . \end{matrix}

(42)

The following assertion holds.

Lemma 5.

The following identity is valid:

e_{1}^{⊤} {(Z_{t}^{⊤} W_{t} Z_{t})}^{- 1} Z_{t}^{⊤} W_{t} 1 = 1 .

Proof.

We use the Frobenius formula to invert a nondegenerate block matrix:

\begin{matrix} {[\begin{matrix} A & B \\ C & D \end{matrix}]}^{- 1} = [\begin{matrix} A^{- 1} + A^{- 1} B H^{- 1} C A^{- 1} & - A^{- 1} B H^{- 1} \\ - H^{- 1} C A^{- 1} & H^{- 1} \end{matrix}], f o r H = D - C A^{- 1} B, \end{matrix}

where

A

is a square nondegenerate matrix of size

m_{1}

and

D

is a square matrix of size

m_{2}

. In this formula, we let

m_{1} = 1

,

m_{2} = k

,

\begin{matrix} \begin{matrix} A = \sum_{i = 1}^{n} λ_{i} (t) \equiv w_{n 0} (t), B = \sum_{i = 1}^{n} {(t - z_{i})}^{⊤} λ_{i} (t), \\ C = \sum_{i = 1}^{n} (t - z_{i}) λ_{i} (t), D = \sum_{i = 1}^{n} (t - z_{i}) {(t - z_{i})}^{⊤} λ_{i} (t) . \end{matrix} \end{matrix}

(43)

Then, in view of (41) and the Frobenius formula, the first component of the vector of interest to us is

(A^{- 1} + A^{- 1} B H^{- 1} C A^{- 1}) A - A^{- 1} B H^{- 1} C \equiv 1 .

□

Lemma 6.

For

h < 1 / 2

, on the set of elementary outcomes defined by the relation

δ_{n} \leq c^{*} h

,

sup_{t \in {[0, 1]}^{k}} | e_{1}^{⊤} {(Z_{t}^{⊤} W_{t} Z_{t})}^{- 1} Z_{t}^{⊤} W_{t} δ_{f} | \leq C_{1}^{*} ω_{f} (h), w i t h C_{1}^{*} = 1 + k \cdot k! 2^{6 k^{2}} L^{2 k^{2} - k} {(κ_{2} - κ_{1}^{2})}^{- k} .

Proof.

We use definition (43). Next, using the Frobenius formula (see the proof of Lemma 5), we obtain the identity

\begin{matrix} e_{1}^{⊤} {(Z_{t}^{⊤} W_{t} Z_{t})}^{- 1} Z_{t}^{⊤} W_{t} δ_{f} = A^{- 1} A_{f} + A^{- 1} B H^{- 1} C A^{- 1} A_{f} - A^{- 1} B H^{- 1} C_{f}, \end{matrix}

(44)

where

A_{f} = \sum_{i = 1}^{n} λ_{i} (t) (f (z_{i}) - f (t)), C_{f} = \sum_{i = 1}^{n} (t - z_{i}) λ_{i} (t) (f (z_{i}) - f (t)) .

Consider each of the terms on the right side of this relation. Notice that for the standard inner product

(x, y)

in

R^{k}

and the supnorm

∥ \cdot ∥

under consideration, we have the inequality

| (x, y) | \leq k ∥ x ∥ ∥ y ∥

. Therefore,

\begin{matrix} | A^{- 1} B H^{- 1} C A^{- 1} A_{f} | \leq k A^{- 1} | A^{- 1} A_{f} | ∥ B^{⊤} ∥ ∥ H^{- 1} ∥ ∥ C ∥, \\ | A^{- 1} B H^{- 1} C_{f} | \leq k A^{- 1} ∥ B^{⊤} ∥ ∥ H^{- 1} ∥ ∥ C_{f} ∥ . \end{matrix}

Further, taking into account Remark 7, we obtain

| A^{- 1} A_{f} | \leq ω_{f} (h), ∥ C_{f} ∥ \leq ω_{f} (h) A h, ∥ C ∥ \leq A h, sup_{t \in {[0, 1]}^{k}} ∥ B^{⊤} ∥ \leq 2^{2 k} L^{k} h .

(45)

The above relations, together with the identity (44) and the estimates (45), complete the proof of the lemma. □

Lemma 7.

For any

h < 1 / 2

,

t \in {[0, 1]}^{k}

, and

m \in N_{h} (t)

, on the set of outcomes defined by the relation

δ_{n} \leq c^{*} h

, the following estimates are valid:

\begin{matrix} | α_{n, m} (t) | \leq 1 + (k + 1)! 2^{6 k^{2} - 1} L^{2 k^{2} - k} {(κ_{2} - κ_{1}^{2})}^{- k} \equiv α, \\ | α_{n, m} (t + 2^{- l} e_{r}) - α_{n, m} (t) | \leq \bar{α} 2^{- l} h^{- 1} i f 2^{- l} \leq h, \\ w i t h \bar{α} = 30 {(k!)}^{2} k^{3 - 2 k} L^{10 k - 2} 2^{8 k^{2} + 21 k + 5} {(κ_{2} - κ_{1}^{2})}^{- 2 k}, \end{matrix}

where

α_{n, m} (t) = 1 + B H^{- 1} C A^{- 1} - B H^{- 1} (t - z_{m})

, and

e_{r}

is the k-dimensional vector with unit r-th component and zeros as the other ones.

Proof.

Since

∥ t - z_{m} ∥ \leq h

for

m \in N_{h} (t)

, then the first assertion of the lemma follows from Lemma 4, relations (45), and the following estimate:

\begin{matrix} | α_{n, m} (t) | \leq 1 + k A^{- 1} ∥ B^{⊤} ∥ ∥ H^{- 1} ∥ ∥ C ∥ + ∥ B^{⊤} ∥ ∥ H^{- 1} ∥ ∥ t - z_{m} ∥ . \end{matrix}

To prove the second assertion, first of all, we note

H^{- 1} = \frac{adj H}{det H},

where the elements of the adjoint matrix

adj H

are, up to the signs ±, the complementary minors of the original matrix

H

. As is known, the determinant of any square matrix

M

of size k with entries

M_{i j}

,

i, j = 1, \dots, k

, can be represented as the multilinear form

det M = \sum_{α_{1}, \dots, α_{k}} {(- 1)}^{I (\bar{α})} M_{1 α_{1}} \dots M_{k α_{k}},

(46)

where summation is over all permutations

\bar{α} \equiv (α_{1} \dots α_{k})

of natural numbers

1, \dots, k

, and

I (\bar{α})

is the number of inversions in a particular permutation

\bar{α}

. We need Formula (46) to represent the determinant

det H

and the complementary minors of the matrix

H

in the inverse matrix

H^{- 1}

, respectively, as k- and

(k - 1)

-linear forms of type (46) constructed by entries (35).

To prove this, in the definition of the function

α_{n, m} (t)

, the term

B H^{- 1} C A^{- 1}

is Lipschitz and the summand

B H^{- 1} (t - z_{m})

is local Lipschitz, and the method is carried out the same as that in Lemma 3 (see the proof of (34)). First, we note that the following representation holds:

B H^{- 1} C A^{- 1} = \frac{\sum_{i, j = 1}^{k} w_{n i} (t) w_{n j} (t) {(- 1)}^{i + j} det H^{(i j)}}{A det H},

(47)

where

det H^{(i j)}

is the complementary minor to the entry

H_{i j} = w_{n i j} (t) + \frac{w_{n i} (t) w_{n j} (t)}{w_{n 0} (t)}

of the matrix

H

. Note that in virtue of notation (43), relation (27) can be rewritten as follows:

inf_{t \in {[0, 1]}^{k}} A \geq 2^{- k - 1} .

(48)

Further, to calculate the Lipschitz constant for the ratio in (47), we need the lower bounds (39) and (48) for the denominator of the fractions on the right-hand side of (47) that are valid on the set of elementary outcomes

δ_{n} \leq c_{*} h

, taking into account the fact that the lower bound in (39) has the order of smallness

h^{2 k}

. Moreover, Lemmas 2 and 3, together with the representation (46) (with the elements

H_{i j}

instead of

M_{i j}

), allow us to assert that the upper bound for

det H

has the same order of smallness in h, and the Lipschitz constant for

det H

is estimated from above as

C_{3}^{*} h^{2 k - 1}

. At the same time, the complementary minors

det H^{(i j)}

are estimated from above as

C_{4}^{*} h^{2 k - 2}

, with a Lipschitz constant of order

h^{2 k - 3}

uniformly over all

i, j = 1, \dots, k

. In fact, we only need to repeat the calculations when deriving estimate (34) to calculate the Lipschitz constant of the fraction on the right side of (47), from which it follows that the mentioned constant has the form

C_{5}^{*} h^{- 1}

, where as a constant

C_{5}^{*}

, we can take the value

C_{5}^{*} = 15 {(k!)}^{2} k^{3 - 2 k} L^{10 k - 2} 2^{8 k^{2} + 21 k + 5} {(κ_{2} - κ_{1}^{2})}^{- 2 k} .

(49)

For the term

B H^{- 1} (t - z_{m}))

, we establish that in the h-neighborhood of the point

t

, it has the property of being locally Lipschitz with a constant of the form

C_{6}^{*} h^{- 1}

. By analogy with (47), we represent the specified summand as

B H^{- 1} (t - z_{m}) = \frac{\sum_{i, j = 1}^{k} w_{n i} (t) (t_{j} - z_{m j}) {(- 1)}^{i + j} det H^{(i j)}}{det H} .

Repeating almost verbatim the previous arguments in proving this lemma, we come to the conclusion that in the h-neighbourhood of any point

t \in {[0, 1]}^{k}

(under the condition

∥ t - z_{m} ∥ \leq h

), the function

B H^{- 1} (t - z_{m})

satisfies the Lipschitz condition with constant

C_{6}^{*}

, which is significantly less than that in (49). Therefore, the final constant in the lemma can be set equal to

2 C_{5}^{*}

. □

Lemma 8.

For any

y > 0

and

h < 1 / 2

, on the subset of elementary events defined by the relation

δ_{n} \leq c^{*} h

, the following estimate holds:

\begin{matrix} P_{F_{n}} (sup_{t \in {[0, 1]}^{k}} |e_{1}^{⊤} {(Z_{t}^{⊤} W_{t} Z_{t})}^{- 1} Z_{t}^{⊤} W_{t} ϵ| > y) \leq C_{2}^{*} y^{- p} δ_{n}^{k p / 2} h^{- k (1 + (p / 2))}, \end{matrix}

where the symbol

P_{F_{n}}

denotes the conditional probability given the σ-algebra

F_{n}

generated by the design

{z_{i}; i \leq n}

, and the constant

C_{2}^{*}

is defined in (61).

Proof.

We use the notation (43). Then, from (41), (42), and the Frobenius formula, we obtain the identity

ν_{n, h} (t) : = e_{1}^{⊤} {(Z_{t}^{⊤} W_{t} Z_{t})}^{- 1} Z_{t}^{⊤} W_{t} ϵ = (A^{- 1} + A^{- 1} B H^{- 1} C A^{- 1}) A_{ε} - A^{- 1} B H^{- 1} C_{ε},

(50)

where

\begin{matrix} A_{ε} = \sum_{i = 1}^{n} λ_{i} (t) ε_{i}, C_{ε} = \sum_{i = 1}^{n} (t - z_{i}) λ_{i} (t) ε_{i} . \end{matrix}

(51)

Let

μ_{n, h} (t) = \sum_{i : ∥ t - z_{i} ∥ \leq h} α_{n, i} (t) λ_{i} (t) ε_{i} .

In virtue of estimate (27), Lemma 2, and Remark 7, one has

| ν_{n, h} (t) | \leq 2^{k + 1} | μ_{n, h} (t) | .

(52)

The distribution tail

sup_{t \in {[0, 1]}^{k}} | μ_{n, h} (t) |

is estimated by Kolmogorov’s dyadic chaining (see, for example, [77]). First of all, we note that the set

{[0, 1]}^{k}

under the supremum sign can be replaced with the set of dyadic rational points

R = \cup_{l \geq 1} R_{l}

, where

R_{l} = {(j_{1} / 2^{l}, \dots, j_{k} / 2^{l}) : j_{1} = 1, \dots, 2^{l} - 1; \dots; j_{k} = 1, \dots, 2^{l} - 1} .

Therefore,

\begin{matrix} sup_{t \in {[0, 1]}^{k}} | μ_{n, h} (t) | = sup_{t \in R} | μ_{n, h} (t) | \leq \\ \leq max_{t \in R_{m}} | μ_{n, h} (t) | + \sum_{l = m + 1}^{\infty} \sum_{r = 1}^{k} max_{t \in R_{l}} | μ_{n, h} (t + 2^{- l} e_{r}) - μ_{n, h} (t) |, \end{matrix}

where

m = ⌈ - {log}_{2} h ⌉

(here,

⌈ a ⌉

is the smallest integer greater than or equal to a). Hence,

\begin{matrix} P_{F_{n}} (sup_{t \in {[0, 1]}^{k}} | μ_{n, h} (t) | > y) \leq P_{F_{n}} (max_{t \in R_{m}} | μ_{n, h} (t) | > a_{m} y) + \\ + \sum_{l = m + 1}^{\infty} \sum_{r = 1}^{k} P_{F_{n}} (max_{t \in R_{l}} | μ_{n, h} (t + 2^{- l} e_{r}) - μ_{n, h} (t) | > a_{l} y / k) \leq \\ \leq \sum_{t \in R_{m}} P_{F_{n}} (| μ_{n, h} (t) | > a_{m} y) \\ + \sum_{l = m + 1}^{\infty} \sum_{r = 1}^{k} \sum_{t \in R_{l}} P_{F_{n}} (| μ_{n, h} (t + 2^{- l} e_{r}) - μ_{n, h} (t) | > a_{l} y / k), \end{matrix}

(53)

where

a_{m}, a_{m + 1}, \dots

is a sequence of positive numbers, with

a_{m} + a_{m + 1} + \dots = 1

.

To evaluate the probabilities on the right side of (53), we need the following martingale inequality (see [78], Theorem 2.1):

\begin{matrix} E | \sum_{i = 1}^{n} η_{i} |^{p} \leq {((p - 1) \sum_{i = 1}^{n} {(E | η_{i} |^{p})}^{2 / p})}^{p / 2}, \end{matrix}

(54)

where

{η_{i}}

is a martingale difference with finite moments of order

p \geq 2

.

To obtain an upper bound for

P_{F_{n}} (| μ_{n, h} (t) | > a_{m} y)

, we use (54) for

η_{i} = α_{n, i} (t) K_{h} (t - z_{i}) Λ_{k} (P_{i}) ε_{i} .

From (29), Lemma 7, and the elementary estimate

K_{h} (t - z_{i}) Λ_{k} (P_{i}) \leq L^{k} {(δ_{n} / h)}^{k}

we obtain that with probability 1,

\sum_{i = 1}^{n} {(E_{F_{n}} {| η_{i} |}^{p})}^{2 / p} \leq M_{p}^{2 / p} α^{2} 2^{2 k} L^{2 k} {(δ_{n} / h)}^{k} .

Next, taking into account the estimate

δ_{n} \leq h \leq 1

, the above inequality and (54) imply the relation

\begin{matrix} P_{F_{n}} (| μ_{n, h} (t) | > a_{m} y) \leq \frac{E_{F_{n}} (| μ_{n, h} (t) |^{p})}{{(a_{m} y)}^{p}} \leq G_{1} \frac{{(δ_{n} / h)}^{k p / 2}}{{(a_{m} y)}^{p}} a . s ., \end{matrix}

(55)

where

G_{1} = {(p - 1)}^{p / 2} α^{p} 2^{k p} L^{k p} M_{p} .

(56)

To estimate the probability

P_{F_{n}} (| μ_{n, h} (t + 2^{- l} e_{r}) - μ_{n, h} (t) | > a_{l} y / k)

, we use inequality (54), with

η_{i} = (α_{n, i} (t + 2^{- l} e_{r}) K_{h} (t - z_{i} + 2^{- l} e_{r}) - α_{n, i} (t) K_{h} (t - z_{i})) Λ_{k} (P_{i}) ε_{i} .

We need the following relations:

\begin{matrix} | α_{n, i} (t + 2^{- l} e_{r}) K_{h} (t - z_{i} + 2^{- l} e_{r}) - α_{n, i} (t) K_{h} (t - z_{i}) | Λ_{k} (P_{i}) \leq \\ \leq | α_{n, i} (t + 2^{- l} e_{r}) | | K_{h} (t - z_{i} + 2^{- l} e_{r}) - K_{h} (t - z_{i}) | Λ_{k} (P_{i}) + \\ + K_{h} (t - z_{i}) | α_{n, i} (t + 2^{- l} e_{r}) - α_{n, i} (t) | Λ_{k} (P_{i}) \leq \\ \leq α L^{k} h^{- k - 1} 2^{- l} δ_{n}^{k} + \bar{α} h^{- 1} 2^{- l} L^{k} h^{- k} δ_{n}^{k} = (α + \bar{α}) L^{k} h^{- k - 1} 2^{- l} δ_{n}^{k}, \end{matrix}

(57)

\begin{matrix} \sum_{i = 1}^{n} (α_{n, i} (t + 2^{- l} e_{r}) K_{h} (t - z_{i} + 2^{- l} e_{r}) - α_{n, i} (t) K_{h} (t - z_{i})) Λ_{k} (P_{i}) \leq \\ \leq (α + \bar{α}) L^{k} h^{- k - 1} 2^{- l + 1} {(2 h + 2 δ_{n})}^{k} \leq (α + \bar{α}) L^{k} h^{- 1} 2^{- l + 1} 2^{2 k} . \end{matrix}

(58)

Note that

| t - z_{i} + 2^{- l} e_{r} - t + z_{i} | = 2^{- l} \leq h

for

l > m

. Moreover, the summation domain in (4) coincides with the set

N_{h} (t + 2^{- l} e_{r}) \cup N_{h} (t) f o r t \in R_{l} .

These facts, Lemma 7, and the relation

δ_{n} \leq h \leq 1

were taken into account when deriving (4). To prove (57), we used the second assertion of Lemma 7.

Thus,

\sum_{i = 1}^{n} {(E_{F_{n}} {| η_{i} |}^{p})}^{2 / p} \leq M_{p}^{2 / p} {(α + \bar{α})}^{2} 2^{2 k} L^{2 k} 2^{- 2 l + 1} h^{- 2} {(δ_{n} / h)}^{k},

and in virtue of inequality (54), we obtain

\begin{matrix} P_{F_{n}} (| μ_{n, h} (t + 2^{- l} e_{r}) - μ_{n, h} (t) | > a_{l} y / k) \leq \frac{G_{2}}{k} \frac{{(δ_{n} / h)}^{k p / 2} h^{- p} 2^{- l p}}{{(a_{l} y)}^{p}}, \end{matrix}

(59)

where

G_{2} = 2^{p / 2} {(α + \bar{α})}^{p} k^{p + 1} {(p - 1)}^{p / 2} 2^{k p} L^{k p} M_{p} = 2^{p / 2} k^{p + 1} α^{- p} {(α + \bar{α})}^{p} G_{1} .

(60)

Now, using relations (53), (55), and (59), we conclude that

P_{F_{n}} (sup_{t \in P} | μ_{n, h} (t) | > y) < y^{- p} {(δ_{n} / h)}^{k p / 2} (G_{1} 2^{k m} a_{m}^{- p} + G_{2} h^{- p} \sum_{l = m + 1}^{\infty} 2^{- (p - k) l} a_{l}^{- p}) .

The optimal sequence

a_{l}

that minimizes the right-hand side of this inequality is

a_{m} = c {(G_{1} 2^{k m})}^{1 / (p + 1)}, a_{l} = c {(G_{2} h^{- p} 2^{- (p - k) l})}^{1 / (p + 1)}

for

l = m + 1, m + 2, \dots

, where the coefficient c is given by the equality

a_{m} + a_{m + 1} + \dots = 1

. For the specified sequence, we obtain

\begin{matrix} P_{F_{n}} (sup_{t \in P} | μ_{n, h} (t) | > y) \leq \\ \leq y^{- p} {(δ_{n} / h)}^{k p / 2} {({(G_{1} 2^{k m})}^{1 / (p + 1)} + \sum_{l = m + 1}^{\infty} {(G_{2} h^{- p} 2^{- (p - k) l})}^{1 / (p + 1)})}^{p + 1} \leq \\ \leq y^{- p} δ_{n}^{k p / 2} h^{- k (1 + (p / 2))} {({(2 G_{1})}^{1 / (p + 1)} + {(G_{2} 2^{- (p - k)})}^{1 / (p + 1)} \sum_{l = 0}^{\infty} 2^{- (p - k) l / (p + 1)})}^{p + 1} \leq \\ \leq y^{- p} δ_{n}^{k p / 2} h^{- k (1 + (p / 2))} {({(2 G_{1})}^{1 / (p + 1)} + {(G_{2} 2^{- (p - k)})}^{1 / (p + 1)} {(1 - 2^{- (p - k) / (p + 1)})}^{- 1})}^{p + 1} . \end{matrix}

This relation, together with the estimate (52), completes the derivation of Lemma 8 if we let

C_{2}^{*} = 2^{p (k + 1)} {({(2 G_{1})}^{1 / (p + 1)} + {(G_{2} 2^{- (p - k)})}^{1 / (p + 1)} {(1 - 2^{- (p - k) / (p + 1)})}^{- 1})}^{p + 1},

(61)

where the constants

G_{1}

and

G_{2}

are defined in (56) and (60), respectively. □

We complete the proof of Theorem 1. We set

ζ_{n} (h) = {sup}_{t \in {[0, 1]}^{k}} | ν_{n, h} (t) |

, where

ν_{n, h} (t)

is defined in (50), and together with Lemma 8, we take into account the relation

P (ζ_{n} (h) > y, δ_{n} \leq c^{*} h) = E I (δ_{n} \leq c^{*} h) P_{F_{n}} (ζ_{n} (h) > y) .

It remains for us to use the identity (40) and the assertions of the Lemmas 5 and 6. Theorem 1 is proved.

To prove Theorem 2, we need the following auxiliary assertion.

Lemma 9.

If condition (13) is satisfied, then

{lim}_{ε \to 0} E ω_{f_{1}} (ε) = 0

, and for the independent copies

f_{1} (t), \dots, f_{n} (t)

of an almost sure continuous random process, the following uniform law of large numbers holds:

sup_{t \in {[0, 1]}^{k}} | {\bar{f}}_{N} (t) - E f_{1} (t) | \overset{p}{\to} 0, w h e r e {\bar{f}}_{N} (t) = N^{- 1} \sum_{i = 1}^{N} f_{i} (t) .

(62)

Proof.

The first assertion of the lemma follows from (13) and Lebesgue’s dominated convergence theorem. Let

ω_{{\bar{f}}_{N}} (ε) = sup_{t, s : ∥ t - s ∥ \leq ε} | {\bar{f}}_{N} (t) - {\bar{f}}_{N} (s) |, ω_{E f_{1}} (ε) = sup_{t, s : ∥ t - s ∥ \leq ε} | E f_{1} (t) - E f_{1} (s) | .

For simplicity, let

k = 2

. For an arbitrary fixed

r > 0

and

u, v = 0, \dots, r

, the following relations hold:

\begin{matrix} sup_{t \in {[0, 1]}^{2}} | {\bar{f}}_{N} (t) - E f_{1} (t) | \leq max_{0 \leq u, v \leq r} |{\bar{f}}_{N} (u / r, v / r) - E f_{1} (u / r, v / r)| + \\ + max_{1 \leq u, v \leq r} sup_{\frac{u - 1}{r} \leq t_{1} \leq \frac{u}{r}, \frac{v - 1}{r} \leq t_{2} \leq \frac{v}{r}} |{\bar{f}}_{N} (t) - {\bar{f}}_{N} (u / r, v / r)| + \\ max_{1 \leq u, v \leq r} sup_{\frac{u - 1}{r} \leq t_{1} \leq \frac{u}{r}, \frac{v - 1}{r} \leq t_{2} \leq \frac{v}{r}} |E f_{1} (t) - E f_{1} (u / r, v / r)| \leq \\ \leq max_{0 \leq u, v \leq r} |{\bar{f}}_{N} (u / r, v / r) - E f_{1} (u / r, v / r)| + ω_{{\bar{f}}_{N}} (1 / r) + ω_{E f_{1}} (1 / r) . \end{matrix}

(63)

Now, notice that

ω_{E f_{1}} (ε) \leq E ω_{f_{1}} (ε)

and

{\bar{f}}_{N} (u / r, v / r) \overset{p}{\to} E f_{1} (u / r, v / r), ω_{{\bar{f}}_{N}} (ε) \leq \frac{1}{N} \sum_{i = 1}^{N} ω_{f_{i}} (ε) \overset{p}{\to} E ω_{f_{1}} (ε) .

Therefore, the right side in (4) does not exceed

2 E ω_{f_{1}} (1 / r) + o_{p} (1)

, and due to the arbitrariness of r and the first assertion of the lemma, the relation (62) for

k = 2

is proved. For arbitrary k, the derivation of the statement of the lemma is similar. □

Proof of Theorem 2.

We let

Δ_{n, h, j} = sup_{t \in {[0, 1]}^{k}} | {\hat{f}}_{n, h, j} (t) - f_{j} (t) |

and prove the following version of the law of large numbers for the values

{Δ_{n, h, j}}

:

\frac{1}{N} \sum_{j = 1}^{N} Δ_{n, h, j} \overset{p}{\to} 0 a s N \to \infty,

(64)

where the sequences

h = h_{n}

and

N = N_{n}

are defined in (14). We introduce the events

A_{n, h, j} = {δ_{n, j} \leq c^{*} h}, j = 1, \dots, N

and notice that for any positive

ν

, one has

P \{\frac{1}{N} \sum_{j = 1}^{N} Δ_{n, h, j} > ν\} \leq P \{\frac{1}{N} \sum_{j = 1}^{N} Δ_{n, h, j} I (A_{n, h, j}) > ν\} + N P (\bar{A_{n, h, 1}}) .

(65)

Next, from Theorem 1, it follows that

\begin{matrix} E Δ_{n, h, j} I (A_{n, h, j}) \leq C_{1}^{*} E ω_{f} (h) + \int_{0}^{\infty} P (ζ_{n} (h) > y, δ_{n} \leq c^{*} h) d y \leq \\ \leq C_{1}^{*} E ω_{f} (h) + γ_{n} + \int_{γ_{n}}^{\infty} P (ζ_{n} (h) > y, δ_{n} \leq c^{*} h) d y \leq \\ \leq C_{1}^{*} E ω_{f} (h) + (1 + C_{2}^{*}) γ_{n}, \end{matrix}

where

γ_{n} = {(h^{- k (p / 2 + 1)} E δ_{n}^{k p / 2})}^{1 / p}

. To complete the proof of (64), it remains to estimate the first term on the right-hand side of (65) using Markov’s inequality, taking into account the limit relations from (14) and the last estimate.

The assertion of Theorem 2 follows from Lemma 9, limit relation (64), and the simple estimate

sup_{t \in {[0, 1]}^{k}} |N^{- 1} \sum_{j = 1}^{N} {\hat{f}}_{n, h, j} (t) - E f_{1} (t)| \leq sup_{t \in {[0, 1]}^{k}} | {\bar{f}}_{N} (t) - E f_{1} (t) | + N^{- 1} \sum_{j = 1}^{N} Δ_{n, h, j} .

Thus, Theorem 2 is proved. □

5. Conclusions

This paper considers the classic problem of nonparametric regression: estimating a regression function from observations of its noisy values in some known set of points from the domain of its definition (design points). In order to estimate the regression function, you need to impose certain restrictions on the design. In numerous works by predecessors, it is usually assumed that the design points are either fixed and, in a certain sense, regularly fill the domain of definition of a regression function or random process, or are random and consist of independent or weakly dependent random variables. Our idea is as follows: in order to restore the function with respect to the design, it is enough to only require the condition for the design points to fill asymptotically densely the domain of the regression function as the sample size increases. In contrast to previously known results, the new condition is insensitive to the nature of the dependence of the design points and is essentially necessary to restore the function with varying the accuracy, and at the same time, includes both the situation of fixed design without requirements of regularity and random design, which may not satisfy the conditions of weak dependence.

We propose to implement this very clear idea by applying the construction of Riemann integral sums to the structure of new kernel estimators, which open up the opportunity to study asymptotic properties of the estimators due to the proximity of integral sums and corresponding integrals without using probabilistic limit theorems. We emphasize that this approach allows us to reconstruct a regression function without any specification regarding the structure of dependence of the design points and allows us to significantly weaken the previously known conditions for these characteristics.

In [51,52], we use local constant estimators, and in [53], we use local linear ones, but in the case of a one-dimensional design. In the present paper, new universal local linear kernel-type estimators of an unknown regression function are constructed in the case of a multivariate design.

Author Contributions

Conceptualization, I.B.; Methodology, Y.L., I.B. and E.Y.; Software, V.K.; Formal analysis, Y.L.; Investigation, P.R.; Data curation, S.S.; Writing—original draft, Y.L.; Writing—review & editing, E.Y.; Visualization, P.R. and V.K.; Supervision, E.Y.; Project administration, I.B. and S.S. All authors read and agreed to the published version of this manuscript.

Funding

The study of Y. Linke and I. Borisov is supported by the Mathematical Center in Akademgorodok under agreement no. 075-15-2022-281 with the Ministry of Science and Higher Education of the Russian Federation.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable for studies that did not create any new data.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Fan, J.; Gijbels, I. Local Polynomial Modelling and Its Applications; Chapman and Hall: London, UK, 1996. [Google Scholar]
Fan, J.; Yao, Q. Nonlinear Time Series Nonparametric and Parametric Methods; Springer: Berlin/Heidelberg, Germany, 2003. [Google Scholar]
Györfi, L.; Kohler, M.; Krzyzak, A.; Walk, H. A Distribution-Free Theory of Nonparametric Regression; Springer: Berlin/Heidelberg, Germany, 2002. [Google Scholar]
Härdle, W. Applied Nonparametric Regression; Cambridge University Press: Cambridge, UK, 1990. [Google Scholar]
Müller, H.-G. Nonparametric Regression Analysis of Longitudinal Data; Springer: New York, NY, USA, 1988. [Google Scholar]
Ahmad, I.A.; Lin, P.-E. Fitting a multiple regression function. J. Stat. Plan. Inference 1984, 9, 163–176. [Google Scholar] [CrossRef]
Georgiev, A.A. Nonparametric multiple function fitting. Stat. Probab. Lett. 1990, 10, 203–211. [Google Scholar] [CrossRef]
Chu, C.-K.; Marron, J.S. Choosing a Kernel Regression Estimator. Stat. Sci. 1991, 6, 404–419. [Google Scholar] [CrossRef]
Jones, M.C.; Davies, S.J.; Park, B.U. Versions of kernel-type regression estimators. J. Am. Stat. Assoc. 1994, 89, 825–832. [Google Scholar] [CrossRef]
Georgiev, A.A. Asymptotic properties of the multivariate Nadaraya-Watson regression function estimate: The fixed design case. Stat. Probab. Lett. 1989, 7, 35–40. [Google Scholar] [CrossRef]
Härdle, W.; Luckhaus, S. Uniform consistency of a class of regression function estimators. Ann. Stat. 1984, 12, 612–623. [Google Scholar] [CrossRef]
Beran, J.; Feng, Y. Local polynomial estimation with a FARIMA-GARCH error process. Bernoulli 2001, 7, 733–750. [Google Scholar] [CrossRef][Green Version]
Benelmadani, D.; Benhenni, K.; Louhichi, S. Trapezoidal rule and sampling designs for the nonparametric estimation of the regression function in models with correlated errors. Statistics 2020, 54, 59–96. [Google Scholar] [CrossRef]
Tang, X.; Xi, M.; Wu, Y.; Wang, X. Asymptotic normality of a wavelet estimator for asymptotically negatively associated errors. Stat. Probab. Lett. 2018, 140, 191–201. [Google Scholar] [CrossRef]
Benhenni, K.; Hedli-Griche, S.; Rachdi, M. Estimation of the regression operator from functional fixed-design with correlated errors. J. Multivar. Anal. 2010, 101, 476–490. [Google Scholar] [CrossRef]
Gu, W.; Roussas, G.G.; Tran, L.T. On the convergence rate of fixed design regression estimators for negatively associated random variables. Stat. Probab. Lett. 2007, 77, 1214–1224. [Google Scholar] [CrossRef]
Wu, J.S.; Chu, C.K. Nonparametric estimation of a regression function with dependent observations. Stoch. Proc. Their Appl. 1994, 50, 149–160. [Google Scholar] [CrossRef]
Hansen, B.E. Uniform convergence rates for kernel estimation with dependent data. Econom. Theory 2008, 24, 726–748. [Google Scholar] [CrossRef]
Zhou, X.; Zhu, F. Asymptotics for L1-wavelet method for nonparametric regression. J. Inequal. Appl. 2020, 2020, 216. [Google Scholar] [CrossRef]
Gasser, T.; Engel, J. The choice of weghts in kernel regression estimation. Biometrica 1990, 77, 277–381. [Google Scholar] [CrossRef]
Linton, O.B.; Jacho-Chavez, D.T. On internally corrected and symmetrized kernel estimators for nonparametric regression. Test 2010, 19, 166–186. [Google Scholar] [CrossRef]
Chu, C.K.; Deng, W.-S. An interpolation method for adapting to sparse design in multivariate nonparametric regression. J. Stat. Plan. Inference 2003, 116, 91–111. [Google Scholar] [CrossRef]
Nadaraya, E.A. Remarks on non-parametric estimates for density functions and regression curves. Theory Prob. Appl. 1970, 15, 134–137. [Google Scholar] [CrossRef]
Liero, H. Strong uniform consistency of nonparametric regression function estimates. Probab. Theory Relat. Fields 1989, 82, 587–614. [Google Scholar] [CrossRef]
Mack, Y.P.; Silvermann, B.W. Weak and strong uniform consistency of kernel regression estimates. Z. Wahrscheinlichkeitstheor. Verw. Geb. 1982, 61, 405–415. [Google Scholar] [CrossRef]
Devroye, L.P. The uniform convergence of the Nadaraya–Watson regression function estimate. Can. J. Stat. 1979, 6, 179–191. [Google Scholar] [CrossRef]
Müller, H.-G. Density adjusted kernel smoothers for random design nonparametric regression. Stat. Probab. Lett. 1997, 36, 161–172. [Google Scholar] [CrossRef]
Gu, J.; Li, Q.; Yang, J.-C. Multivariate local polynomial kernel estimators: Leading bias and asymptotic distribution. Econom. Rev. 2015, 34, 979–1010. [Google Scholar] [CrossRef]
Ruppert, D.; Wand, M.P. Multivariate locally weighted least squares regression. Ann. Stat. 1994, 22, 1346–1370. [Google Scholar] [CrossRef]
Li, Q.; Lu, X.; Ullah, A. Multivariate local polynomial regression for estimating average derivatives. Nonparametr. Stat. 2003, 15, 607–624. [Google Scholar] [CrossRef]
Roussas, G.G. Nonparametric regression estimation under mixing conditions. Stoch. Proc. Appl. 1990, 36, 107–116. [Google Scholar] [CrossRef]
Masry, E. Nonparametric regression estimation for dependent functional data. Stoch. Proc. Their Appl. 2005, 115, 155–177. [Google Scholar] [CrossRef]
Kulik, R.; Lorek, P. Some results on random design regression with long memory errors and predictors. J. Stat. Plan. Inference 2011, 141, 508–523. [Google Scholar] [CrossRef]
Kulik, R.; Wichelhaus, C. Nonparametric conditional variance and error density estimation in regression models with dependent errors and predictors. Electr. J. Stat. 2011, 5, 856–898. [Google Scholar] [CrossRef]
Jiang, J.; Mack, Y.P. Robust local polynomial regression for dependent data. Stat. Sin. 2001, 11, 705–722. [Google Scholar]
Li, X.; Yang, W.; Hu, S. Uniform convergence of estimator for nonparametric regression with dependent data. J. Inequal. Appl. 2016, 2016, 142. [Google Scholar] [CrossRef]
Hong, S.Y.; Linton, O.B. Asymptotic properties of a Nadaraya-Watson type estimator for regression functions of infinite order. arXiv 2016, arXiv:1604.06380. [Google Scholar] [CrossRef][Green Version]
Shen, J.; Xie, Y. Strong consistency of the internal estimator of nonparametric regression with dependent data. Stat. Probab. Lett. 2013, 83, 1915–1925. [Google Scholar] [CrossRef]
Masry, E. Multivariate local polynomial regression for time series: Uniform strong consistency and rates. J. Time Ser. Anal. 1996, 17, 571–599. [Google Scholar] [CrossRef]
Masry, E. Long-range dependence: Strong consistency and rates. IEEE Trans. Inf. Theory 2001, 47, 2863–2875. [Google Scholar] [CrossRef]
Masry, E.; Fan, J. Local polynomial estimation of regression functions for mixing processes. Scand. Stat. Theory Appl. 1997, 24, 165–179. [Google Scholar] [CrossRef]
Gao, J.; Kanaya, S.; Li, D.; Tjostheim, D. Uniform consistency for nonparametric estimators in null recurrent time series. Econom. Theory 2015, 31, 911–952. [Google Scholar] [CrossRef]
Wang, Q.; Chan, N. Uniform convergence rates for a class of martingales with application in non-linear cointegrating regression. Bernoulli 2014, 20, 207–230. [Google Scholar] [CrossRef]
Chan, N.; Wang, Q. Uniform convergence for Nadaraya-Watson estimators with nonstationary data. Econom. Theory 2014, 30, 1110–1133. [Google Scholar] [CrossRef]
Linton, O.; Wang, Q. Nonparametric transformation regression with nonstationary data. Econom. Theory 2016, 32, 1–29. [Google Scholar] [CrossRef]
Karlsen, H.A.; Myklebust, T.; Tjøstheim, D. Nonparametric estimation in a nonlinear cointegration type model. Ann. Stat. 2007, 35, 252–299. [Google Scholar] [CrossRef]
Wang, Q.; Phillips, P.C.B. Structural nonparametric cointegrating regression. Econometrica 2009, 77, 1901–1948. [Google Scholar]
Wang, Q.Y.; Phillips, P.C.B. Asymptotic theory for local time density estimation and nonparametric cointegrating regression. Econom. Theory 2009, 25, 710–738. [Google Scholar] [CrossRef]
Einmahl, U.; Mason, D.M. Uniform in bandwidth consistency of kernel-type function estimators. Ann. Stat. 2005, 33, 1380–1403. [Google Scholar] [CrossRef]
Liang, H.-Y.; Jing, B.-Y. Asymptotic properties for estimates of nonparametric regression models based on negatively associated sequences. J. Multivar. Anal. 2005, 95, 227–245. [Google Scholar] [CrossRef]
Borisov, I.S.; Linke, Y.Y.; Ruzankin, P.S. Universal weighted kernel-type estimators for some class of regression models. Metrika 2021, 84, 141–166. [Google Scholar] [CrossRef]
Linke, Y.; Borisov, I.; Ruzankin, P. Universal kernel-type estimation of random fields. Statistics 2023, 57, 785–810. [Google Scholar] [CrossRef]
Linke, Y.; Borisov, I.; Ruzankin, P.; Kutsenko, V.; Yarovaya, E.; Shalnova, S. Universal local linear kernel estimators in nonparametric regression. Mathematics 2022, 10, 2693. [Google Scholar] [CrossRef]
Linke, Y.Y.; Borisov, I.S. Insensitivity of Nadaraya–Watson estimators to design correlation. Commun. Stat. Theory Methods 2022, 51, 6909–6918. [Google Scholar] [CrossRef]
Linke, Y.Y. Towards insensitivity of Nadaraya–Watson estimators to design correlation. Theory Probab. Appl. 2023, 68, 198–210. [Google Scholar] [CrossRef]
Linke, Y.Y. Asymptotic properties of one-step M-estimators. Commun. Stat. Theory Methods 2019, 48, 4096–4118. [Google Scholar] [CrossRef]
Linke, Y.Y.; Borisov, I.S. Constructing explicit estimators in nonlinear regression problems. Theory Probab. Appl. 2018, 63, 22–44. [Google Scholar] [CrossRef]
Linke, Y.Y.; Borisov, I.S. Constructing initial estimators in one-step estimation procedures of nonlinear regression. Stat. Probab. Lett. 2017, 120, 87–94. [Google Scholar] [CrossRef]
Linke, Y.Y. On sufficient conditions for the consistency of local linear kernel estimators. Math. Notes. 2023, 114, 283–296. [Google Scholar] [CrossRef]
Li, Y.; Hsing, T. Uniform convergence rates for nonparametric regression and principal component analysis in functional/longitudinal data. Ann. Stat. 2010, 38, 3321–3351. [Google Scholar] [CrossRef]
Hall, P.; Müller, H.-G.; Wang, J.-L. Properties of principal component methods for functional and longitudinal data analysis. Ann. Stat. 2006, 34, 1493–1517. [Google Scholar] [CrossRef]
Zhou, L.; Lin, H.; Liang, H. Efficient estimation of the nonparametric mean and covariance functions for longitudinal and sparse functional data. J. Am. Stat. Assoc. 2018, 113, 1550–1564. [Google Scholar] [CrossRef]
Yao, F. Asymptotic distributions of nonparametric regression estimators for longitudinal or functional data. J. Multivar. Anal. 2007, 98, 40–56. [Google Scholar] [CrossRef]
Yao, F.; Müller, H.-G.; Wang, J.-L. Functional data analysis for sparse longitudinal data. J. Am. Stat. Assoc. 2005, 100, 577–590. [Google Scholar] [CrossRef]
Zhang, J.-T.; Chen, J. Statistical inferences for functional data. Ann. Stat. 2007, 35, 1052–1079. [Google Scholar] [CrossRef]
Zhang, X.; Wang, J.-L. From sparse to dense functional data and beyond. Ann. Stat. 2016, 44, 2281–2321. [Google Scholar] [CrossRef]
Kokoszka, P.; Reimherr, M. Introduction to Functional Data Analysis; Chapman and Hall/CRC: London, UK, 2017. [Google Scholar]
Linke, Y.Y.; Borisov, I.S. Universal nonparametric kernel-type estimators for the mean and covariance functions of a stochastic process. Theory Probab. Appl. 2024, 69, 35–58. [Google Scholar] [CrossRef]
Linke, Y. Kernel estimators for the mean function of a stochastic process under sparse design conditions. Siberian Adv. Math. 2022, 32, 269–276. [Google Scholar] [CrossRef]
Linke, Y. Mean function estimation for a noisy random process under a sparse data condition. Chebyshevskii Sb. 2023, 24, 112–125. (In Russian) [Google Scholar] [CrossRef]
Bulinski, A. Forward Selection of Relevant Factors by Means of MDR-EFE Method. Mathematics 2024, 12, 831. [Google Scholar] [CrossRef]
Hsing, T.; Eubank, R. Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators; Wiley: Hoboken, NJ, USA, 2015. [Google Scholar]
Müller, H.-G. Functional modelling and classification of longitudinal data. Scand. J. Stat. 2005, 32, 223–246. [Google Scholar] [CrossRef]
Wu, H.; Zhang, J.-T. Nonparametric Regression Methods for Longitudinal Data Analysis: Mixed-Effects Modeling Approaches; John Wiley and Sons: Hoboken, NJ, USA, 2006. [Google Scholar]
Zheng, S.; Yang, L.; Hardle, W. A smooth simultaneous confidence corridor for the mean of sparse functional data. J. Am. Stat. Assoc. 2014, 109, 661–673. [Google Scholar] [CrossRef]
Wang, J.-L.; Chiou, J.-M.; Müller, H.-G. Functional Data Analysis. Annu. Rev. Stat. 2016, 3, 257–295. [Google Scholar] [CrossRef]
Chentsov, N.N. Weak convergence of stochastic processes whose trajectories have no discontinuities of the second kind and the heuristic approach to the Kolmogorov-Smirnov tests. Theory Probab. Appl. 1956, 1, 140–144. [Google Scholar] [CrossRef]
Rio, E. Moment Inequalities for Sums of Dependent Random Variables under Projective Conditions. J. Theor. Probab. 2009, 22, 146–163. [Google Scholar] [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Linke, Y.; Borisov, I.; Ruzankin, P.; Kutsenko, V.; Yarovaya, E.; Shalnova, S. Multivariate Universal Local Linear Kernel Estimators in Nonparametric Regression: Uniform Consistency. Mathematics 2024, 12, 1890. https://doi.org/10.3390/math12121890

AMA Style

Linke Y, Borisov I, Ruzankin P, Kutsenko V, Yarovaya E, Shalnova S. Multivariate Universal Local Linear Kernel Estimators in Nonparametric Regression: Uniform Consistency. Mathematics. 2024; 12(12):1890. https://doi.org/10.3390/math12121890

Chicago/Turabian Style

Linke, Yuliana, Igor Borisov, Pavel Ruzankin, Vladimir Kutsenko, Elena Yarovaya, and Svetlana Shalnova. 2024. "Multivariate Universal Local Linear Kernel Estimators in Nonparametric Regression: Uniform Consistency" Mathematics 12, no. 12: 1890. https://doi.org/10.3390/math12121890

APA Style

Linke, Y., Borisov, I., Ruzankin, P., Kutsenko, V., Yarovaya, E., & Shalnova, S. (2024). Multivariate Universal Local Linear Kernel Estimators in Nonparametric Regression: Uniform Consistency. Mathematics, 12(12), 1890. https://doi.org/10.3390/math12121890

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Multivariate Universal Local Linear Kernel Estimators in Nonparametric Regression: Uniform Consistency

Abstract

1. Introduction

2. Main Results

2.1. Notation and Main Assumptions

2.2. The Main Theorem, Corollaries, and Examples

3. Estimating the Mean and Covariance of a Random Regression Function with a Dense Design

4. Proofs

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI