
Berry–Esseen Bounds of Residual Density Estimators in the First-Order Autoregressive Model with the α-Mixing Errors

School of Mathematics and Statistics, Beihua University, Jilin 132013, China
* Author to whom correspondence should be addressed.
Mathematics 2026, 14(1), 73; https://doi.org/10.3390/math14010073
Submission received: 20 November 2025 / Revised: 18 December 2025 / Accepted: 24 December 2025 / Published: 25 December 2025
(This article belongs to the Special Issue Mathematical Statistics and Nonparametric Inference)

Abstract

This study establishes explicit Berry–Esseen bounds for residual kernel density estimators in AR(1) models with $\alpha$-mixing errors. Since the true innovations are unobservable, we introduce a residual-based estimator $\hat f_n(x)$ and establish its normal approximation under stationarity. By imposing conditions on the bandwidth, mixing coefficients, and moments, we obtain Kolmogorov-distance bounds between the standardized estimator and its Gaussian limit. These bounds depend explicitly on the bandwidth, block parameters, and mixing coefficients. A key corollary quantifies the convergence rate as $O(n^{(2c-2b+a)/4})$. Our results generalize prior work, advancing the theoretical foundations for nonparametric inference in high-dimensional time series.

1. Introduction

Consider a sequence of random variables $\{X_s\}$ following a first-order autoregressive (AR(1)) model
$$X_s = \rho X_{s-1} + \varepsilon_s, \quad 1 \le s \le n, \tag{1}$$
where $\{\varepsilon_s\}$ is a sequence of errors (see Brockwell and Davis [1]). Under stationarity, the process can be represented as
$$X_s = \sum_{j=0}^{\infty} \rho^j \varepsilon_{s-j}, \quad s \ge 1.$$
In the model (1), if the innovations $\{\varepsilon_s\}$ were directly observable, the kernel density estimator
$$f_n(x) := \frac{1}{n h_n} \sum_{s=1}^{n} K\!\left(\frac{x-\varepsilon_s}{h_n}\right), \quad x \in \mathbb{R}, \tag{2}$$
could be employed to estimate the true density $f(x)$ of $\varepsilon_s$. Here, $K(\cdot)$ is a kernel probability density function (p.d.f.), and $h_n$ is a bandwidth satisfying $h_n \to 0$ as $n \to \infty$. Since $f_n(x)$ belongs to the class of kernel density estimators, its theoretical properties have been widely explored. Early studies focused on i.i.d. samples: Parzen [2] developed the asymptotic theory of kernel density estimation and the role of the bandwidth, building the framework for independent-sample density estimation, and Rosenblatt [3,4] laid the groundwork for classical kernel methodology. As attention shifted to dependent data, research expanded to mixing sequences: Wu et al. [5] established the complete consistency rate of recursive density estimators for strong mixing samples; Honda [6] developed nonparametric conditional quantile theory for $\alpha$-mixing processes, complementing the density estimation focus of this study; and Laïb and Louani [7] analyzed the asymptotic behavior of kernel regression estimators for functional, stationary, ergodic data. These works provide the analytical frameworks underpinning the present study.
In the autoregressive model (1), the observable data are restricted to the sequence $\{X_1, X_2, \ldots, X_n\}$. To construct an estimator for $f(x)$, we therefore adapt the formulation of $f_n(x)$ by substituting residuals of the form
$$\hat\varepsilon_s = X_s - \hat\rho_n X_{s-1}, \quad 1 \le s \le n,$$
for the unobservable $\{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n\}$. Here, $\hat\rho_n = \hat\rho_n(X_1, \ldots, X_n)$ denotes an estimator of $\rho$ derived from the observed sequence $\{X_s\}$. As a concrete illustration, the least squares estimator of $\rho$ takes the form
$$\hat\rho_n = \frac{\sum_{s=1}^{n} X_s X_{s-1}}{\sum_{s=1}^{n} X_{s-1}^2}.$$
Accordingly, we define a residual-based kernel density estimator of $f(x)$ as
$$\hat f_n(x) := \frac{1}{n h_n} \sum_{s=1}^{n} K\!\left(\frac{x-\hat\varepsilon_s}{h_n}\right), \quad x \in \mathbb{R}.$$
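As a concrete numerical illustration of this construction, the following Python sketch (not the authors' code; the function name `residual_kde` and the Gaussian-kernel choice are ours) computes the least squares estimate $\hat\rho_n$, the residuals $\hat\varepsilon_s$, and the residual-based estimator $\hat f_n$ on a grid:

```python
import numpy as np

def residual_kde(X, x_grid, h):
    """Residual-based kernel density estimator for the AR(1) innovation
    density: least squares rho_hat, residuals eps_hat, then a
    Gaussian-kernel density estimate evaluated on x_grid."""
    X = np.asarray(X, dtype=float)
    Xs, Xlag = X[1:], X[:-1]
    rho_hat = np.sum(Xs * Xlag) / np.sum(Xlag ** 2)   # least squares estimator
    eps_hat = Xs - rho_hat * Xlag                     # residuals
    u = (np.asarray(x_grid)[:, None] - eps_hat[None, :]) / h
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)    # Gaussian kernel
    return rho_hat, K.mean(axis=1) / h                # f_hat_n on the grid

# toy usage: stationary AR(1) with i.i.d. N(0,1) innovations
rng = np.random.default_rng(0)
n, rho = 2000, 0.6
X = np.empty(n)
X[0] = rng.normal(0, 1 / np.sqrt(1 - rho ** 2))       # stationary initial value
for s in range(1, n):
    X[s] = rho * X[s - 1] + rng.normal()
h = n ** (-0.4)                                       # bandwidth h_n = n^{-a}
grid = np.linspace(-4, 4, 201)
rho_hat, f_hat = residual_kde(X, grid, h)
```

The estimate $\hat f_n$ is nonnegative and integrates to approximately one over a grid wide enough to cover the bulk of the innovation distribution.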
For AR models with i.i.d. errors, Lee and Na [8] studied $L_p$-norm properties of residual density estimators, and Horváth and Zitikis [9] extended the Bickel–Rosenblatt framework to $L_p$ settings. When the errors are $\alpha$-mixing, the asymptotic behavior becomes more intricate, requiring careful handling of the interplay between mixing decay and bandwidth selection. Gao et al. [10] established asymptotic normality of residual density estimators in stationary and explosive AR(1) models, highlighting the influence of mixing and bandwidth on convergence rates, yet without providing explicit Berry–Esseen bounds.
In the Berry–Esseen literature, Petrov [11] obtained the classical $O(n^{-1/2})$ rate for independent samples. For dependent data, Liu and Niu [12] derived Berry–Esseen bounds for recursive kernel estimators under strong mixing using block decomposition, while Wu et al. [13] achieved rates close to $O(n^{-1/4})$ in semiparametric models with linear-process errors via Bernstein-block techniques. Wu et al. [14] later improved these bounds for strong-mixing settings. Neufeld [15] studied weighted sums in free probability, and Chen and Qu [16] derived expansions and large deviations for particle systems. However, a clear gap remains in establishing Berry–Esseen bounds specifically for the residual kernel density estimator in the basic AR(1) model under $\alpha$-mixing errors. The present paper aims to fill this gap.
Compared with the existing literature, this paper makes substantial progress in model specification, theoretical conditions, and result accuracy. Specifically, Gao et al. [10] established the asymptotic normality of residual density estimation under the same AR(1) model but did not quantify the rate of distributional convergence; Liu and Niu [12] studied Berry–Esseen bounds for density estimation under $\alpha$-mixing sequences, but their object is a pure mixing sequence without a regression structure, and they require a fast-decaying mixing coefficient ($\alpha(n) = O(n^{-\tau})$, $\tau > 6$). In contrast, this paper is the first to establish explicit Berry–Esseen bounds for residual kernel density estimation under the AR(1) regression framework. We not only relax the mixing condition to $\alpha(n) = O(n^{-\iota})$, $\iota > 3$, but also obtain a convergence bound depending on the bandwidth, block parameters, and mixing coefficient through refined block partitioning and bias analysis. Under reasonable parameter selection, the rate can reach $O(n^{(2c-2b+a)/4})$, which improves on the $O(n^{-1/10})$ rate in Liu and Niu [12] and approaches the near-optimal order $O(n^{-1/4})$ of nonparametric estimation when the mixing coefficient decays sufficiently fast. The present work thus provides a more rigorous theoretical foundation for subsequent nonparametric inference in high-dimensional time series.
A sequence of random variables $\{Z_i : i \ge 1\}$ is called $\alpha$-mixing if its dependence coefficient
$$\alpha(n) = \sup_{k \ge 1}\, \sup\bigl\{\, |P(AB) - P(A)P(B)| : A \in \mathcal{G}_1^k,\; B \in \mathcal{G}_{k+n}^{\infty} \,\bigr\} \to 0$$
as $n \to \infty$, where $\mathcal{G}_a^b = \sigma(Z_a, Z_{a+1}, \ldots, Z_b)$ denotes the $\sigma$-field generated by the observations from index $a$ to $b$ ($a \le b$). The $\alpha$-mixing condition is relatively mild and covers many stochastic processes, including a wide range of time-series models. For further properties and applications of $\alpha$-mixing, we refer to Wu et al. [5], Honda [6] and the references therein.
The paper is organized as follows. Section 2 states the core assumptions underpinning the study and presents the principal results, including the Berry–Esseen bound and a corollary featuring an explicit convergence rate. Section 3 provides auxiliary lemmas. Numerical simulations are presented in Section 4, and the proofs are collected in Section 5. Unless otherwise specified, all limits are taken as the sample size $n \to \infty$. We use the symbol $C$ to denote a finite positive constant whose value may change from line to line and is irrelevant to the underlying mathematical reasoning.

2. Some Basic Assumptions and Main Results

We now list the assumptions underpinning our analysis.
Assumption 1.
(a) The error sequence $\{\varepsilon_s\}$ forms a strictly stationary $\alpha$-mixing stochastic process with a bounded, unknown probability density function $f : \mathbb{R} \to \mathbb{R}^+$.
(b) The density $f$ has bounded first-order ($f'$) and second-order ($f''$) derivatives on $\mathbb{R}$.
Assumption 2.
(a) For some $r > 2$ and $\tau > 0$, $E\varepsilon_1 = 0$ and $E|\varepsilon_1|^{r+\tau} < \infty$. The $\alpha$-mixing coefficient satisfies $\alpha(n) = O(n^{-\iota})$ with $\iota > 3$.
(b) For fixed $x \in \mathbb{R}$, the asymptotic variance of $f_n(x)$ (defined in (2)) is positive:
$$\liminf_{n \to \infty} \{ n h_n \operatorname{Var}(f_n(x)) \} = \sigma_1^2(x) > 0.$$
Assumption 3.
(a) The kernel $K : \mathbb{R} \to \mathbb{R}^+$ is a bounded probability density function.
(b) The derivative $K'(x)$ of $K(x)$ is bounded for all $x \in \mathbb{R}$.
(c) Its moments satisfy
$$\int_{\mathbb{R}} x^2 K(x)\,dx = D > 0 \quad\text{and}\quad \int_{\mathbb{R}} x K(x)\,dx = 0,$$
where $D$ is a finite positive constant.
Assumption 4.
Let $\mu_n$ and $\nu_n$ be positive integers satisfying, as $n \to \infty$,
$$\mu_n \to \infty, \quad \nu_n \to \infty, \quad \mu_n \le \nu_n^2, \quad \mu_n/n \to 0, \quad\text{and}\quad \nu_n/\mu_n \to 0.$$
Assumption 5.
Let $h_n$ be the bandwidth sequence satisfying $h_n \to 0$, $n h_n \to \infty$ and $n h_n^3 \to 0$.
Assumption 6.
Let $\hat\rho_n$ be an estimator of $\rho$ satisfying $n^{1/2}(\hat\rho_n - \rho) = O_P(1)$.
The core theoretical result, the Berry–Esseen bound for $\alpha$-mixing random sequences, is established below.
Remark 1.
We here elaborate on the justifications for Assumptions 2(a) and 4.
For Assumption 2(a), note that when $r > 2$ and $\tau > 0$, the ratio $r/(r+\tau)$ is always less than 1. For instance, if $r = 4$ and $\tau = 2$, this ratio equals $2/3$; if $r = 3$ and $\tau = 1$, it equals $3/4$. Choosing $\iota > 3$ ensures $\iota > r/(r+\tau)$, since $3 > 1 > r/(r+\tau)$. The assumption balances generality and tractability, covering a broad class of $\alpha$-mixing processes with sufficiently fast dependence decay.
For Assumption 4, the constraints on $\mu_n$ and $\nu_n$ originate from the big-block–small-block technique, which decomposes a dependent $\alpha$-mixing sequence into approximately independent components. The condition $\mu_n \to \infty$ ensures each big block contains enough observations for the limit theorems to apply, while $\mu_n/n \to 0$ guarantees a large number of blocks for aggregation. The condition $\nu_n \to \infty$ strengthens the independence approximation between consecutive big blocks, and $\nu_n/\mu_n \to 0$ ensures the buffer blocks are negligible, avoiding excessive data loss. The constraint $\mu_n \le \nu_n^2$ balances the decay of the mixing coefficient against the growth of the block sizes, ensuring weak dependence between non-consecutive blocks.
Theorem 1.
Suppose Assumptions 1–6 hold. Let $x \in \mathbb{R}$ be a fixed point at which the density $f$ satisfies a first-order Lipschitz condition, i.e.,
$$|f(x) - f(x-z)| \le C|z|, \quad z \in \mathbb{R},$$
where $C$ is a positive constant. Let $\Phi(\cdot)$ denote the cumulative distribution function of the standard normal distribution. Take moment parameters $r > 2$ and $\tau > 0$, along with an auxiliary parameter $m > 0$, and let $p, q > 1$ satisfy $p^{-1} + q^{-1} < 1$. Under these settings, there exists a positive constant $C$ such that
$$\begin{aligned} \sup_{z \in \mathbb{R}} \left| P\!\left( \frac{\hat f_n(x) - E(f_n(x))}{\sqrt{\operatorname{Var}(f_n(x))}} \le z \right) - \Phi(z) \right| = O\Biggl\{ & \Bigl( \frac{\nu_n}{\mu_n h_n^{1/2}} \Bigr)^{1/2} + \Bigl( \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \Bigr)^{1/2} + \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2} + \frac{\nu_n \alpha^{1-p^{-1}-q^{-1}}(\nu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \\ & + \frac{\mu_n^{\delta}}{n^{(r-2)/2} h_n^{(r-2)/2}} + \frac{\mu_n^{(r-2)/2}}{n^{(r-2)/2} h_n^{(r\tau - 2r + r^2)/(2r+2\tau)}} + \bigl( \alpha^{(m-1)/m}(\nu_n)\, \mu_n^{(1-m)/m}\, n^{1/2} h_n^{(1-m)/2m} \bigr)^{1/2} \Biggr\}, \end{aligned}$$
where $\delta > 0$ is a small constant.
Corollary 1.
Assume Assumptions 1–6 hold with $h_n = n^{-a}$, $\mu_n = n^b$, and $\nu_n = n^c$, where $1/3 < a < 1/2$, $0 < c < b \le 2c < 1$, and $\max(0, 2b-1) < c < b - a/2$. Let $E|\varepsilon_1|^{r+\tau} < \infty$ ($r > 2$, $\tau > 0$) and $\alpha(n) = O(n^{-\iota})$ for some $\iota > 3$. Then
$$\sup_{z \in \mathbb{R}} \left| P\!\left( \frac{\hat f_n(x) - E(f_n(x))}{\sqrt{\operatorname{Var}(f_n(x))}} \le z \right) - \Phi(z) \right| = O\bigl( n^{(2c-2b+a)/4} \bigr),$$
with $\Phi(\cdot)$ denoting the standard normal distribution function.
Remark 2.
Corollary 1 follows from Theorem 1 by substituting the specific parameterizations $h_n = n^{-a}$ and block parameters $\mu_n = n^b$, $\nu_n = n^c$, and identifying the dominant error term through asymptotic comparison. Wu et al. [13] studied convergence rates of standardized partial sums, which depend solely on the decay of the mixing coefficients. In contrast, the present work deals with residual kernel density estimation, where the rate must simultaneously capture two effects: the bias induced by the bandwidth $h_n$ and the dependence control via the block parameter $\mu_n$. Consequently, the rate expression is a joint function of $a$, $b$, and $\iota$. As $\iota \to \infty$, the rate obtained here approaches $O(n^{-1/4})$, which aligns with the near-optimal rates typical in nonparametric estimation, as discussed in Wu et al. [13].

3. Auxiliary Lemma

Lemma 1
([17]). For an $\alpha$-mixing sequence $\{\varepsilon_s, s \ge 1\}$ satisfying $E\varepsilon_s = 0$ and $E|\varepsilon_s|^{r+\tau} < \infty$ with $r > 2$ and $\tau > 0$, assume the mixing coefficient decays as $\alpha(n) = O(n^{-\iota})$, where $\iota > r/(r+\tau)$. Then for any $\delta > 0$ there exists a constant $0 < c = c(r, \tau, \iota, \delta) < \infty$ such that
$$E \max_{1 \le k \le n} \left| \sum_{s=1}^{k} \varepsilon_s \right|^r \le c\, n^{\delta} \left\{ \sum_{s=1}^{n} E|\varepsilon_s|^r + \left( \sum_{s=1}^{n} \bigl( E|\varepsilon_s|^{r+\tau} \bigr)^{2/(r+\tau)} \right)^{r/2} \right\}, \quad n \ge 1.$$
Lemma 2
([18]). Consider an $\alpha$-mixing sequence $\{\varepsilon_s, s \ge 1\}$ with $E\varepsilon_s = 0$ and finite $(2+\tau)$-th moment $E|\varepsilon_s|^{2+\tau} < \infty$ for some $\tau > 0$. If the mixing coefficients satisfy $\sum_{s=1}^{\infty} \alpha^{\tau/(2+\tau)}(s) < \infty$, then
$$E\left( \sum_{s=1}^{n} \varepsilon_s \right)^2 \le \left( 1 + 16 \sum_{s=1}^{\infty} \alpha^{\tau/(2+\tau)}(s) \right) \sum_{s=1}^{n} \bigl( E|\varepsilon_s|^{2+\tau} \bigr)^{2/(2+\tau)}, \quad n \ge 1.$$
Lemma 3
([19]). Let $\xi$ and $\eta$ be $\mathcal{G}$- and $\mathcal{H}$-measurable random variables with $E|\xi|^p < \infty$, $E|\eta|^q < \infty$ ($p, q > 1$ and $p^{-1} + q^{-1} < 1$). Then
$$|E\xi\eta - E\xi\, E\eta| \le 8 \bigl( E|\xi|^p \bigr)^{1/p} \bigl( E|\eta|^q \bigr)^{1/q} \bigl( \alpha(\mathcal{G}, \mathcal{H}) \bigr)^{1-p^{-1}-q^{-1}}.$$
Lemma 4
([20]). Let $\mu_n, \nu_n$ be positive integers and let $\{\varepsilon_s, s \ge 1\}$ be an $\alpha$-mixing sequence of random variables. Define $\varphi_j = \sum_{s=j(\mu_n+\nu_n)+1}^{j(\mu_n+\nu_n)+\mu_n} \tilde Q_{n,s}(x)$ for $0 \le j \le r_n - 1$, and let $s, m > 1$ satisfy $1/s + 1/m = 1$. Then there exists $C > 0$ such that for any $t \in \mathbb{R}$,
$$\left| E \exp\Bigl\{ it \sum_{j=0}^{r_n-1} \varphi_j \Bigr\} - \prod_{j=0}^{r_n-1} E \exp\{ it \varphi_j \} \right| \le C |t|\, \alpha^{1/s}(\nu_n) \sum_{j=0}^{r_n-1} \|\varphi_j\|_m.$$
Lemma 5
([12]). Let $X$ and $Y_1, \ldots, Y_m$ be random variables and $w_1, \ldots, w_m$ positive thresholds. Then
$$\sup_u \left| P\Bigl( X + \sum_{i=1}^{m} Y_i \le u \Bigr) - \Phi(u) \right| \le \sup_u \left| P(X \le u) - \Phi(u) \right| + \sum_{i=1}^{m} \frac{w_i}{\sqrt{2\pi}} + \sum_{i=1}^{m} P(|Y_i| > w_i).$$
Lemma 6.
Under Assumptions 1–3 we have, with $S_n''$ and $S_n'''$ as defined in Section 5,
$$E(S_n'')^2 \le C \left[ \frac{\nu_n}{\mu_n h_n^{1/2}} + \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \right], \qquad E(S_n''')^2 \le C\, \frac{\mu_n}{n h_n^{1/2}}.$$
Moreover,
$$P\left( |S_n''| > \left[ \frac{\nu_n}{\mu_n h_n^{1/2}} + \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \right]^{1/2} \right) \le C \left[ \frac{\nu_n}{\mu_n h_n^{1/2}} + \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \right]^{1/2},$$
$$P\left( |S_n'''| > \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2} \right) \le C \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2}.$$
Lemma 7.
Under Assumption 1(b), the variance discrepancy satisfies
$$|s_n^2 - 1| \le C \left[ \Bigl( \frac{\nu_n}{\mu_n h_n^{1/2}} \Bigr)^{1/2} + \Bigl( \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \Bigr)^{1/2} + \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2} + \frac{\nu_n \alpha^{1-p^{-1}-q^{-1}}(\nu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \right].$$
Lemma 8.
Under the conditions of Theorem 1,
$$\begin{aligned} \sup_{z \in \mathbb{R}} \left| P(S_n' \le z) - \Phi(z) \right| \le C \Biggl[ & \Bigl( \frac{\nu_n}{\mu_n h_n^{1/2}} \Bigr)^{1/2} + \Bigl( \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \Bigr)^{1/2} + \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2} + \frac{\nu_n \alpha^{1-p^{-1}-q^{-1}}(\nu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \\ & + \frac{\mu_n^{\delta}}{n^{(r-2)/2} h_n^{(r-2)/2}} + \frac{\mu_n^{(r-2)/2}}{n^{(r-2)/2} h_n^{(r\tau - 2r + r^2)/(2r+2\tau)}} + \bigl( \alpha^{(m-1)/m}(\nu_n)\, \mu_n^{(1-m)/m}\, n^{1/2} h_n^{(1-m)/2m} \bigr)^{1/2} \Biggr]. \end{aligned}$$

4. Numerical Simulation

In this section, all numerical simulations were conducted using custom R scripts. The code encompasses five core functional modules: (1) verification of parameter constraints aligned with Corollary 1, (2) generation of α -mixing error sequences and AR(1) process data, (3) least squares estimation of the autoregressive coefficient ρ , (4) calculation of the theoretical expectation and variance for residual kernel density estimators, and (5) visualization of standardized statistic distributions. Detailed English comments are included to ensure full reproducibility.
We verify the asymptotic normality of the residual kernel density estimator and the convergence rate of the Berry–Esseen bound for Theorem 1 and Corollary 1 under finite samples via Monte Carlo simulation. The observations are generated from
$$X_s = \rho_{\text{true}} X_{s-1} + \varepsilon_s, \quad 1 \le s \le n,$$
where the sample size n is set to 50, 100, and 200, respectively.
  • The parameters and errors are generated as follows: the autoregressive coefficient is $\rho_{\text{true}} = 0.6$, and the initial value $X_1$ follows $N\bigl( 0, \sigma_\varepsilon^2 / (1 - \rho_{\text{true}}^2) \bigr)$;
  • The error sequence $\{\varepsilon_s\}$ is an $\alpha$-mixing process generated by the recursion $\varepsilon_s = 0.2\,\varepsilon_{s-1} + \eta_s$, where the $\eta_s$ are independent and identically distributed (i.i.d.) Gaussian innovations with $\eta_s \sim N(0,1)$; the bandwidth is $h_n = n^{-0.4}$, which satisfies the constraint $1/3 < 0.4 < 1/2$; the block parameters are the big-block length $\mu_n = n^{0.45}$ and the small-block length $\nu_n = n^{0.23}$, which satisfy the constraints of Corollary 1;
  • Compute the standardized statistic $P_n = \dfrac{\hat f_n(x_0) - E f_n(x_0)}{\sqrt{\operatorname{Var} f_n(x_0)}}$;
  • The kernel is the Gaussian kernel $K(u) = \phi(u)$.
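A minimal Python sketch of this Monte Carlo design follows (not the authors' R implementation; replacing the theoretical $E f_n(x_0)$ and $\operatorname{Var} f_n(x_0)$ by empirical counterparts over replications, and the function name `simulate_Pn`, are our simplifications):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_Pn(n, M=500, x0=0.0, rho=0.6):
    """Monte Carlo sketch of the standardized statistic P_n at x0.
    E f_n(x0) and Var f_n(x0) are approximated by the empirical mean and
    variance of the infeasible estimator f_n over M replications."""
    h = n ** (-0.4)                              # bandwidth h_n = n^{-0.4}
    f_true, f_res = np.empty(M), np.empty(M)
    for m in range(M):
        # alpha-mixing errors: eps_s = 0.2 eps_{s-1} + eta_s, eta ~ N(0,1)
        eta = rng.normal(size=n)
        eps = np.empty(n)
        eps[0] = eta[0] / np.sqrt(1 - 0.2 ** 2)  # stationary start
        for s in range(1, n):
            eps[s] = 0.2 * eps[s - 1] + eta[s]
        sig2 = 1 / (1 - 0.2 ** 2)                # Var(eps_s)
        X = np.empty(n)
        X[0] = rng.normal(0, np.sqrt(sig2 / (1 - rho ** 2)))
        for s in range(1, n):
            X[s] = rho * X[s - 1] + eps[s]
        rho_hat = np.sum(X[1:] * X[:-1]) / np.sum(X[:-1] ** 2)
        eps_hat = X[1:] - rho_hat * X[:-1]       # residuals
        kde = lambda e: np.mean(np.exp(-0.5 * ((x0 - e) / h) ** 2)
                                / np.sqrt(2 * np.pi)) / h
        f_res[m] = kde(eps_hat)                  # residual-based estimate
        f_true[m] = kde(eps[1:])                 # infeasible estimate f_n
    return (f_res - f_true.mean()) / f_true.std()

Pn = simulate_Pn(200)
```

A histogram or Q–Q plot of `Pn` against the standard normal then reproduces the kind of comparison shown in Figures 1 and 2.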
Figure 1 shows the kernel density histogram of the standardized statistic $P_n$ corresponding to the residual density estimator in model (1), overlaid with the standard normal density curve to visually verify asymptotic normality. When the sample size is $n = 50$, there is a visible deviation between the shape of the histogram and the normal curve, characterized by slightly higher kurtosis and noticeable differences in tail thickness, reflecting that the distributional properties of dependent data do not fully manifest in small samples. As the sample size increases to $n = 100$, the symmetry of the histogram becomes more pronounced, and the fit between the frequency distribution and the normal curve improves markedly. When $n = 200$, the histogram exhibits a fully bell-shaped symmetric structure, and the frequency distribution in each interval almost coincides with the standard normal curve, with only slight deviations in the extreme-value region. This behavior is consistent with conclusions in the literature.
Figure 2 shows Quantile–Quantile plots of $P_n$ for $n = 50, 100, 200$. In each plot, the horizontal axis gives the quantiles of the standard normal distribution and the vertical axis the quantiles of $P_n$; the red straight line is the $45^\circ$ reference line.
Figure 1, Figure 2 and Table 1 show that the standardized residual kernel density estimator in model (1) has good asymptotic normality, consistent with the core conclusion of Theorem 1. The Berry–Esseen bound decreases monotonically with increasing sample size, and its convergence rate follows $O(n^{(2c-2b+a)/4})$. When the mixing coefficient decays sufficiently fast, this rate approaches the near-optimal $O(n^{-1/4})$ of nonparametric estimation, improving on the $O(n^{-1/10})$ rate in [12].
Thus, Monte Carlo simulations confirm the theoretical findings of Theorem 1 and Corollary 1. Specifically, the residual kernel density estimator in model (1) exhibits valid asymptotic normality, and its Berry–Esseen bound converges at the theoretical rate. These results also confirm the rationality of the parameter constraints and theoretical derivations in this paper.

5. Proofs

To present the main proofs, we adopt the following notation (see Gao et al. [10]). The standardized density estimator decomposes as
$$\frac{f_n(x) - E f_n(x)}{\sqrt{\operatorname{Var}(f_n(x))}} = \frac{\sum_{s=1}^{n} Q_{n,s}(x)}{\sqrt{\operatorname{Var}\bigl( \sum_{s=1}^{n} Q_{n,s}(x) \bigr)}} := \sum_{s=1}^{n} \tilde Q_{n,s}(x), \quad n \ge 1,$$
where
$$\tilde Q_{n,s}(x) = \frac{Q_{n,s}(x)}{\sqrt{\operatorname{Var}\bigl( \sum_{s=1}^{n} Q_{n,s}(x) \bigr)}},$$
and
$$Q_{n,s}(x) = h_n^{-1/2} \left[ K\!\left( \frac{x - \varepsilon_s}{h_n} \right) - E K\!\left( \frac{x - \varepsilon_s}{h_n} \right) \right], \quad 1 \le s \le n.$$
To handle the dependence, we employ a block-splitting technique (see Masry [5]). Choose a large-block length $\mu_n$ and a small-block length $\nu_n$, and define
$$r_n := \left\lfloor \frac{n}{\mu_n + \nu_n} \right\rfloor.$$
The sum $S_n$ is partitioned into three components,
$$S_n = \sum_{s=1}^{n} \tilde Q_{n,s}(x) = \sum_{j=0}^{r_n-1} \varphi_j + \sum_{j=0}^{r_n-1} \eta_j + \xi_{r_n} := S_n' + S_n'' + S_n''',$$
where $\varphi_j$, $\eta_j$, $\xi_{r_n}$ are defined by
$$\varphi_j = \sum_{s=j(\mu_n+\nu_n)+1}^{j(\mu_n+\nu_n)+\mu_n} \tilde Q_{n,s}(x), \quad 0 \le j \le r_n - 1,$$
$$\eta_j = \sum_{s=j(\mu_n+\nu_n)+\mu_n+1}^{(j+1)(\mu_n+\nu_n)} \tilde Q_{n,s}(x), \quad 0 \le j \le r_n - 1,$$
$$\xi_{r_n} = \sum_{s=r_n(\mu_n+\nu_n)+1}^{n} \tilde Q_{n,s}(x).$$
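The index bookkeeping of this decomposition can be sketched as follows (a Python illustration under our naming; the paper itself does not prescribe code):

```python
def block_indices(n, mu, nu):
    """Big-block/small-block partition of {1, ..., n}:
    r = floor(n / (mu + nu)) big blocks of length mu (the phi_j sums),
    r small blocks of length nu (the eta_j sums), plus a remainder
    block (the xi term). Indices are 1-based to match the paper."""
    r = n // (mu + nu)
    big = [list(range(j * (mu + nu) + 1, j * (mu + nu) + mu + 1))
           for j in range(r)]
    small = [list(range(j * (mu + nu) + mu + 1, (j + 1) * (mu + nu) + 1))
             for j in range(r)]
    rest = list(range(r * (mu + nu) + 1, n + 1))
    return big, small, rest

big, small, rest = block_indices(100, 10, 5)
```

For $n = 100$, $\mu_n = 10$, $\nu_n = 5$ this gives $r_n = 6$ big blocks, 6 small buffer blocks, and a remainder of 10 indices; together the three groups partition $\{1, \ldots, n\}$ exactly.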
Proof of Lemma 6. 
Under Assumptions 1(a) and 3(a) (see [10]), for any $p \ge 1$,
$$E|Q_{n,s}(x)|^p = E|Q_{n,1}(x)|^p \le C h_n^{-p/2}\, E\left| K\!\left( \frac{x - \varepsilon_1}{h_n} \right) \right|^p = C h_n^{-p/2} \int_{\mathbb{R}} \left| K\!\left( \frac{x - z}{h_n} \right) \right|^p f(z)\,dz \le C h_n^{1-p/2}, \quad 1 \le s \le n.$$
Moreover, by Assumption 2(b) and (2), we have
$$\operatorname{Var}(f_n(x)) = n^{-2} h_n^{-1} \operatorname{Var}\Bigl( \sum_{s=1}^{n} Q_{n,s}(x) \Bigr),$$
$$\liminf_{n \to \infty} \{ n h_n \operatorname{Var}(f_n(x)) \} = \liminf_{n \to \infty} \Bigl\{ n^{-1} \operatorname{Var}\Bigl( \sum_{s=1}^{n} Q_{n,s}(x) \Bigr) \Bigr\} = \sigma_1^2(x) > 0.$$
Thus
$$E|\tilde Q_{n,s}(x)|^p = E\left| \frac{Q_{n,s}(x)}{\sqrt{\operatorname{Var}\bigl( \sum_{s=1}^{n} Q_{n,s}(x) \bigr)}} \right|^p \le C n^{-p/2}\, E|Q_{n,s}(x)|^p \le C n^{-p/2} h_n^{1-p/2}, \quad 1 \le s \le n.$$
Using (6) and (7), we decompose the variance
$$E(S_n'')^2 = \operatorname{Var}\Bigl( \sum_{j=0}^{r_n-1} \eta_j \Bigr) = \sum_{j=0}^{r_n-1} \operatorname{Var}(\eta_j) + 2 \sum_{0 \le i < j \le r_n-1} \operatorname{Cov}(\eta_i, \eta_j) := R_1 + R_2.$$
By (10) together with Assumption 2(b) and Lemma 2,
$$\operatorname{Var}(\eta_j) = E\Bigl( \sum_{s=j(\mu_n+\nu_n)+\mu_n+1}^{(j+1)(\mu_n+\nu_n)} \tilde Q_{n,s}(x) \Bigr)^2 \le C \sum_{s=j(\mu_n+\nu_n)+\mu_n+1}^{(j+1)(\mu_n+\nu_n)} \bigl( E|\tilde Q_{n,s}(x)|^4 \bigr)^{1/2} \le C\, \frac{\nu_n}{n h_n^{1/2}}.$$
Thus
$$R_1 = \sum_{j=0}^{r_n-1} \operatorname{Var}(\eta_j) \le C\, r_n\, \frac{\nu_n}{n h_n^{1/2}} \le C\, \frac{\nu_n}{\mu_n h_n^{1/2}}.$$
For $R_2$, set $\chi_j = j(\mu_n+\nu_n) + \mu_n$. Then
$$R_2 = 2 \sum_{0 \le i < j \le r_n-1} \operatorname{Cov}(\eta_i, \eta_j) = 2 \sum_{0 \le i < j \le r_n-1} \sum_{y_1=1}^{\nu_n} \sum_{y_2=1}^{\nu_n} \operatorname{Cov}\bigl[ \tilde Q_{n,\chi_i+y_1}(x), \tilde Q_{n,\chi_j+y_2}(x) \bigr],$$
and when $i \ne j$ we have $|\chi_i - \chi_j + y_1 - y_2| \ge \mu_n$. By Assumption 2(b), (4), (10) and Lemma 3,
$$|R_2| \le 2 \sum_{\substack{1 \le i < j \le n \\ j-i \ge \mu_n}} \bigl| \operatorname{Cov}[\tilde Q_{n,i}(x), \tilde Q_{n,j}(x)] \bigr| \le C \sum_{\substack{1 \le i < j \le n \\ j-i \ge \mu_n}} \alpha^{1-p^{-1}-q^{-1}}(\mu_n) \bigl( E|\tilde Q_{n,i}(x)|^p \bigr)^{1/p} \bigl( E|\tilde Q_{n,j}(x)|^q \bigr)^{1/q} \le C\, \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}}.$$
Combining (13) and (14) gives
$$E(S_n'')^2 \le C \left[ \frac{\nu_n}{\mu_n h_n^{1/2}} + \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \right].$$
Similarly, using Lemma 2 together with Hölder's inequality yields
$$E(S_n''')^2 = E\Bigl( \sum_{s=r_n(\mu_n+\nu_n)+1}^{n} \tilde Q_{n,s}(x) \Bigr)^2 \le C \sum_{s=r_n(\mu_n+\nu_n)+1}^{n} \bigl( E|\tilde Q_{n,s}(x)|^4 \bigr)^{1/2} \le C\, \frac{n - r_n(\mu_n+\nu_n)}{n h_n^{1/2}} \le C\, \frac{\mu_n}{n h_n^{1/2}}.$$
Finally, Markov's inequality provides the probability bounds
$$P\left( |S_n''| > \left[ \frac{\nu_n}{\mu_n h_n^{1/2}} + \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \right]^{1/2} \right) \le C \left[ \frac{\nu_n}{\mu_n h_n^{1/2}} + \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \right]^{1/2},$$
$$P\left( |S_n'''| > \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2} \right) \le C \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2}.$$
This completes the proof of Lemma 6. □
Proof of Lemma 7. 
Starting from $E S_n^2 = 1$, write $S_n' = S_n - (S_n'' + S_n''')$ and compute
$$E(S_n')^2 = E\bigl[ S_n - (S_n'' + S_n''') \bigr]^2 = 1 + E(S_n'' + S_n''')^2 - 2 E\bigl[ S_n (S_n'' + S_n''') \bigr].$$
By the triangle inequality and properties of expectations,
$$|E(S_n')^2 - 1| \le E(S_n'' + S_n''')^2 + 2 E\bigl| S_n (S_n'' + S_n''') \bigr|.$$
Applying Hölder's inequality to the second term,
$$E\bigl| S_n (S_n'' + S_n''') \bigr| \le \bigl( E S_n^2 \bigr)^{1/2} \bigl( E(S_n'' + S_n''')^2 \bigr)^{1/2} \le C \Bigl[ E^{1/2}(S_n'')^2 + E^{1/2}(S_n''')^2 \Bigr],$$
and by the $C_r$-inequality,
$$E(S_n'' + S_n''')^2 \le 2 \bigl[ E(S_n'')^2 + E(S_n''')^2 \bigr].$$
Combining these bounds with Lemma 6 to control $E(S_n'')^2$ and $E(S_n''')^2$, we obtain
$$|E(S_n')^2 - 1| \le C \left[ \Bigl( \frac{\nu_n}{\mu_n h_n^{1/2}} \Bigr)^{1/2} + \Bigl( \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \Bigr)^{1/2} + \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2} \right],$$
where $\varphi_j$ is defined in (6). Define
$$s_n^2 = \sum_{j=0}^{r_n-1} \operatorname{Var}(\varphi_j), \qquad \Gamma_n = \sum_{0 \le i < j \le r_n-1} \operatorname{Cov}(\varphi_i, \varphi_j), \qquad\text{so that}\quad s_n^2 = E(S_n')^2 - 2\Gamma_n.$$
Given $\chi_j = j(\mu_n+\nu_n)$, for $i \ne j$ we have $|\chi_i - \chi_j + y_1 - y_2| \ge \nu_n$, and
$$2\Gamma_n = 2 \sum_{0 \le i < j \le r_n-1} \operatorname{Cov}(\varphi_i, \varphi_j) = 2 \sum_{0 \le i < j \le r_n-1} \sum_{y_1=1}^{\mu_n} \sum_{y_2=1}^{\mu_n} \operatorname{Cov}\bigl[ \tilde Q_{n,\chi_i+y_1}(x), \tilde Q_{n,\chi_j+y_2}(x) \bigr].$$
Analogously to (15), Lemma 3 gives
$$|\Gamma_n| \le \sum_{\substack{1 \le i < j \le n \\ j-i \ge \nu_n}} \bigl| \operatorname{Cov}\bigl[ \tilde Q_{n,i}(x), \tilde Q_{n,j}(x) \bigr] \bigr| \le C \sum_{\substack{1 \le i < j \le n \\ j-i \ge \nu_n}} \alpha^{1-p^{-1}-q^{-1}}(\nu_n) \bigl( E|\tilde Q_{n,i}(x)|^p \bigr)^{1/p} \bigl( E|\tilde Q_{n,j}(x)|^q \bigr)^{1/q} \le C\, \frac{\nu_n \alpha^{1-p^{-1}-q^{-1}}(\nu_n)}{n h_n^{1-p^{-1}-q^{-1}}}.$$
Therefore,
$$|s_n^2 - 1| \le C \left[ \Bigl( \frac{\nu_n}{\mu_n h_n^{1/2}} \Bigr)^{1/2} + \Bigl( \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \Bigr)^{1/2} + \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2} + \frac{\nu_n \alpha^{1-p^{-1}-q^{-1}}(\nu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \right].$$
This completes the proof of Lemma 7. □
Proof of Lemma 8. 
Let $\{\hat\varphi_j, 0 \le j \le r_n-1\}$ be independent random variables such that $\hat\varphi_j$ has the same distribution as $\varphi_j$, and set $T_n = \sum_{j=0}^{r_n-1} \hat\varphi_j$. Applying Lemma 1, Assumption 4 and the Berry–Esseen inequality for sums of independent variables (Petrov [11]), we obtain
$$\begin{aligned} \sup_{z \in \mathbb{R}} \left| P(T_n/s_n \le z) - \Phi(z) \right| &\le C s_n^{-r} \sum_{j=0}^{r_n-1} E|\varphi_j|^r \le C \sum_{j=0}^{r_n-1} E\Biggl| \sum_{s=j(\mu_n+\nu_n)+1}^{j(\mu_n+\nu_n)+\mu_n} \tilde Q_{n,s}(x) \Biggr|^r \\ &\le C \sum_{j=0}^{r_n-1} \mu_n^{\delta} \Biggl\{ \sum_{s=j(\mu_n+\nu_n)+1}^{j(\mu_n+\nu_n)+\mu_n} E|\tilde Q_{n,s}(x)|^r + \Biggl( \sum_{s=j(\mu_n+\nu_n)+1}^{j(\mu_n+\nu_n)+\mu_n} \bigl( E|\tilde Q_{n,s}(x)|^{r+\tau} \bigr)^{2/(r+\tau)} \Biggr)^{r/2} \Biggr\} \\ &\le C \Bigl[ r_n \mu_n^{\delta+1} n^{-r/2} h_n^{(2-r)/2} + r_n \mu_n^{r/2} n^{-r/2} h_n^{(2r - r^2 - r\tau)/(2r+2\tau)} \Bigr] \\ &\le C \Bigl[ \frac{\mu_n^{\delta}}{n^{(r-2)/2} h_n^{(r-2)/2}} + \frac{\mu_n^{(r-2)/2}}{n^{(r-2)/2} h_n^{(r\tau - 2r + r^2)/(2r+2\tau)}} \Bigr]. \end{aligned}$$
By (23), the probability difference can be bounded through the normal approximation: for any $u$,
$$\begin{aligned} \sup_{z \in \mathbb{R}} \left| P(T_n \le z+u) - P(T_n \le z) \right| &\le \sup_{z \in \mathbb{R}} \Bigl| P\Bigl( \frac{T_n}{s_n} \le \frac{z}{s_n} \Bigr) - \Phi\Bigl( \frac{z}{s_n} \Bigr) \Bigr| + \sup_{z \in \mathbb{R}} \Bigl| P\Bigl( \frac{T_n}{s_n} \le \frac{z+u}{s_n} \Bigr) - \Phi\Bigl( \frac{z+u}{s_n} \Bigr) \Bigr| + \sup_{z \in \mathbb{R}} \Bigl| \Phi\Bigl( \frac{z+u}{s_n} \Bigr) - \Phi\Bigl( \frac{z}{s_n} \Bigr) \Bigr| \\ &\le C \Bigl[ \frac{\mu_n^{\delta}}{n^{(r-2)/2} h_n^{(r-2)/2}} + \frac{\mu_n^{(r-2)/2}}{n^{(r-2)/2} h_n^{(r\tau - 2r + r^2)/(2r+2\tau)}} + |u|/s_n \Bigr] \\ &\le C \Bigl[ \frac{\mu_n^{\delta}}{n^{(r-2)/2} h_n^{(r-2)/2}} + \frac{\mu_n^{(r-2)/2}}{n^{(r-2)/2} h_n^{(r\tau - 2r + r^2)/(2r+2\tau)}} + |u| \Bigr]. \end{aligned}$$
Let $\zeta(t)$ be the characteristic function of $S_n'$ and $\psi(t)$ that of $T_n$, so that $\psi(t) = E\exp\{itT_n\} = \prod_{j=0}^{r_n-1} E\exp\{it\varphi_j\}$. Applying Lemma 4 with $1/s + 1/m = 1$ gives
$$\begin{aligned} |\zeta(t) - \psi(t)| &= \Biggl| E\exp\Bigl\{ it \sum_{j=0}^{r_n-1} \varphi_j \Bigr\} - \prod_{j=0}^{r_n-1} E\exp\{it\varphi_j\} \Biggr| \le C|t|\, \alpha^{1/s}(\nu_n) \sum_{j=0}^{r_n-1} \|\varphi_j\|_m \\ &= C|t|\, \alpha^{1/s}(\nu_n) \sum_{j=0}^{r_n-1} \Biggl( E\Biggl| \sum_{s=j(\mu_n+\nu_n)+1}^{j(\mu_n+\nu_n)+\mu_n} \tilde Q_{n,s}(x) \Biggr|^m \Biggr)^{1/m} \\ &\le C|t|\, \alpha^{1/s}(\nu_n)\, r_n\, \mu_n^{1/m} n^{-1/2} h_n^{(1-m)/2m} \le C|t|\, \alpha^{(m-1)/m}(\nu_n)\, \mu_n^{(1-m)/m} n^{1/2} h_n^{(1-m)/2m}. \end{aligned}$$
Now, by Esseen's inequality (Petrov [11]), for any $T > 0$ there exists a constant $C > 0$ such that
$$\begin{aligned} \sup_{z \in \mathbb{R}} \left| P(S_n' \le z) - P(T_n \le z) \right| &\le \int_{-T}^{T} \left| \frac{\zeta(t) - \psi(t)}{t} \right| dt + T \sup_{z \in \mathbb{R}} \int_{|u| \le C/T} \left| P(T_n \le z+u) - P(T_n \le z) \right| du \\ &\le C\, T\, \alpha^{(m-1)/m}(\nu_n)\, \mu_n^{(1-m)/m} n^{1/2} h_n^{(1-m)/2m} + C \Bigl[ \frac{\mu_n^{\delta}}{n^{(r-2)/2} h_n^{(r-2)/2}} + \frac{\mu_n^{(r-2)/2}}{n^{(r-2)/2} h_n^{(r\tau - 2r + r^2)/(2r+2\tau)}} + \frac{1}{T} \Bigr] \\ &\le C \Bigl[ \bigl( \alpha^{(m-1)/m}(\nu_n)\, \mu_n^{(1-m)/m} n^{1/2} h_n^{(1-m)/2m} \bigr)^{1/2} + \frac{\mu_n^{\delta}}{n^{(r-2)/2} h_n^{(r-2)/2}} + \frac{\mu_n^{(r-2)/2}}{n^{(r-2)/2} h_n^{(r\tau - 2r + r^2)/(2r+2\tau)}} \Bigr], \end{aligned}$$
where we have chosen $T = \bigl( \alpha^{(m-1)/m}(\nu_n)\, \mu_n^{(1-m)/m} n^{1/2} h_n^{(1-m)/2m} \bigr)^{-1/2}$.
Finally,
$$\sup_{z \in \mathbb{R}} \left| P(S_n' \le z) - \Phi(z) \right| \le \sup_{z \in \mathbb{R}} \left| P(S_n' \le z) - P(T_n \le z) \right| + \sup_{z \in \mathbb{R}} \left| P(T_n \le z) - \Phi(z/s_n) \right| + \sup_{z \in \mathbb{R}} \left| \Phi(z/s_n) - \Phi(z) \right|.$$
Using Lemmas 6 and 7 together with (20)–(26), we complete the proof of Lemma 8. □
Proof of Theorem 1. 
Combining Lemmas 5–8 with (11) and (12) yields
$$\begin{aligned} \sup_{z \in \mathbb{R}} \left| P(S_n \le z) - \Phi(z) \right| &\le \sup_{z \in \mathbb{R}} \left| P(S_n' \le z) - \Phi(z) \right| + P\Biggl( |S_n''| > \Bigl[ \frac{\nu_n}{\mu_n h_n^{1/2}} + \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \Bigr]^{1/2} \Biggr) \\ &\quad + \frac{1}{\sqrt{2\pi}} \Bigl[ \frac{\nu_n}{\mu_n h_n^{1/2}} + \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \Bigr]^{1/2} + P\Biggl( |S_n'''| > \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2} \Biggr) + \frac{1}{\sqrt{2\pi}} \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2} \\ &\le C \Biggl[ \Bigl( \frac{\nu_n}{\mu_n h_n^{1/2}} \Bigr)^{1/2} + \Bigl( \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \Bigr)^{1/2} + \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2} + \frac{\nu_n \alpha^{1-p^{-1}-q^{-1}}(\nu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \\ &\qquad + \frac{\mu_n^{\delta}}{n^{(r-2)/2} h_n^{(r-2)/2}} + \frac{\mu_n^{(r-2)/2}}{n^{(r-2)/2} h_n^{(r\tau - 2r + r^2)/(2r+2\tau)}} + \bigl( \alpha^{(m-1)/m}(\nu_n)\, \mu_n^{(1-m)/m} n^{1/2} h_n^{(1-m)/2m} \bigr)^{1/2} \Biggr]. \end{aligned}$$
According to Gao et al. [10], we have
$$\frac{\hat f_n(x) - f_n(x)}{\sqrt{\operatorname{Var}(f_n(x))}} = O_P\Bigl( \frac{1}{n^{1/2} h_n^{3/2}} \Bigr).$$
Under Assumptions 1–3, the kernel $K$ is continuously differentiable with bounded derivative, so there exists a constant $M > 0$ such that $|K'(u)| \le M$ for all $u \in \mathbb{R}$; the density $f$ is smooth, and the errors satisfy $E|\varepsilon_1|^{r+\tau} < \infty$ for $r > 2$ and $\tau > 0$. For the residual $\hat\varepsilon_s = X_s - \hat\rho_n X_{s-1}$ and the true innovation $\varepsilon_s = X_s - \rho X_{s-1}$, a first-order Taylor expansion of $K\bigl( \frac{x - \hat\varepsilon_s}{h_n} \bigr)$ around $\frac{x - \varepsilon_s}{h_n}$ gives
$$K\!\left( \frac{x - \hat\varepsilon_s}{h_n} \right) - K\!\left( \frac{x - \varepsilon_s}{h_n} \right) = K'(\xi_s^x)\, \frac{\varepsilon_s - \hat\varepsilon_s}{h_n},$$
where $\xi_s^x$ lies between $\frac{x - \hat\varepsilon_s}{h_n}$ and $\frac{x - \varepsilon_s}{h_n}$. Since $\hat\varepsilon_s - \varepsilon_s = (\rho - \hat\rho_n) X_{s-1}$, we obtain
$$\hat f_n(x) - f_n(x) = \frac{1}{n h_n} \sum_{s=1}^{n} \left[ K\!\left( \frac{x - \hat\varepsilon_s}{h_n} \right) - K\!\left( \frac{x - \varepsilon_s}{h_n} \right) \right] = \frac{\hat\rho_n - \rho}{n h_n^2} \sum_{s=1}^{n} K'(\xi_s^x) X_{s-1}.$$
Squaring and applying Assumption 3(b) and Lemma A.2 of [10] yields
$$|\hat f_n(x) - f_n(x)|^2 \le |\hat\rho_n - \rho|^2\, \frac{1}{n^2 h_n^4} \Bigl( \sum_{s=1}^{n} K'(\xi_s^x) X_{s-1} \Bigr)^2,$$
$$E|\hat f_n(x) - f_n(x)|^2 \le \frac{1}{n^2 h_n^4}\, E\Biggl[ (\hat\rho_n - \rho)^2 \Bigl( \sum_{s=1}^{n} K'(\xi_s^x) X_{s-1} \Bigr)^2 \Biggr],$$
and by the Cauchy–Schwarz inequality
$$E\Biggl[ (\hat\rho_n - \rho)^2 \Bigl( \sum_{s=1}^{n} K'(\xi_s^x) X_{s-1} \Bigr)^2 \Biggr] \le \bigl( E(\hat\rho_n - \rho)^4 \bigr)^{1/2} \Biggl( E\Bigl( \sum_{s=1}^{n} K'(\xi_s^x) X_{s-1} \Bigr)^4 \Biggr)^{1/2}.$$
From the condition $n^{1/2}(\hat\rho_n - \rho) = O_P(1)$ in Assumption 6, the result $|\hat f_n(x) - f_n(x)| = O_P\bigl( \frac{1}{n h_n^2} \bigr)$ obtained in [10], and the asymptotic normality $\sqrt{n}(\hat\rho_n - \rho) \to_d N(0, 1-\rho^2)$ given in [21], we can obtain
$$E\bigl| n^{1/2}(\hat\rho_n - \rho) \bigr|^2 \to (1-\rho^2)\sigma^2 < \infty, \qquad E|\hat\rho_n - \rho|^2 = O\Bigl( \frac{1}{n} \Bigr),$$
and by Assumption 2(a) the fourth-order moment is bounded:
$$E(\hat\rho_n - \rho)^4 = O\Bigl( \frac{1}{n^2} \Bigr).$$
By Lemma A.2 of [10], we have
$$E\Bigl( \sum_{s=1}^{n} X_{s-1} K'(\xi_s^x) \Bigr)^4 = O(n^2).$$
Substituting (30) and (31) into (29) gives
$$E\bigl| \hat f_n(x) - f_n(x) \bigr|^2 = O\Bigl( \frac{1}{n^2 h_n^4} \Bigr).$$
Further, from Assumption 2(b), $\operatorname{Var}(f_n(x)) \ge \frac{c}{n h_n}$ for some constant $c > 0$. Applying the Cauchy–Schwarz inequality to $\frac{E|\hat f_n(x) - f_n(x)|}{\sqrt{\operatorname{Var}(f_n(x))}}$, we obtain
$$\frac{E|\hat f_n(x) - f_n(x)|}{\sqrt{\operatorname{Var}(f_n(x))}} \le \frac{\sqrt{E|\hat f_n(x) - f_n(x)|^2}}{\sqrt{\operatorname{Var}(f_n(x))}} = O\Bigl( \frac{1}{n^{1/2} h_n^{3/2}} \Bigr),$$
which establishes the expectation bound.
By Markov's inequality,
$$P\Biggl( \frac{|\hat f_n(x) - f_n(x)|}{\sqrt{\operatorname{Var}(f_n(x))}} > a \Biggr) \le \frac{1}{a}\, E\Biggl[ \frac{|\hat f_n(x) - f_n(x)|}{\sqrt{\operatorname{Var}(f_n(x))}} \Biggr] \le \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2},$$
where the last step follows by choosing $a = \mu_n^{-1/2} h_n^{-5/4}$.
Finally, Lemma 5, (32) and Lemma 8 together complete the proof of Theorem 1. □
Proof of Corollary 1. 
Substitute $\alpha(n) = O(n^{-\iota})$, $h_n = n^{-a}$, $\mu_n = n^b$, $\nu_n = n^c$ with $1/3 < a < 1/2$, $0 < c < b \le 2c < 1$, $\max(0, 2b-1) < c < b - a/2$, and take $p = q = 3$, $m = 3$, $\delta = 1$, $r = 4$, $\tau = 2$ in Theorem 1:
$$\sup_{z \in \mathbb{R}} \left| P\!\left( \frac{\hat f_n(x) - E(f_n(x))}{\sqrt{\operatorname{Var}(f_n(x))}} \le z \right) - \Phi(z) \right| = O\Bigl( n^{(2c-2b+a)/4} + n^{(3b-b\iota+a-3)/6} + n^{(2b+a-2)/4} + n^{(3c-c\iota+a-3)/3} + n^{b+a-1} + n^{(3b+4a-3)/3} + n^{(3+2a-4c\iota-4b)/12} \Bigr).$$
The terms involving the mixing coefficient $\alpha(n) = O(n^{-\iota})$ have exponents that become more negative as $\iota$ grows; when $\iota > 3$, all exponents are negative, and the first exponent, $(2c-2b+a)/4$, is the one closest to zero under the stated parameter constraints. The remaining terms are of lower order. Hence the first term dominates, which gives the stated rate.
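The exponent comparison can be checked numerically. The following sketch evaluates the seven exponents at the Section 4 parameter choices $a = 0.4$, $b = 0.45$, $c = 0.23$ and an illustrative mixing rate $\iota = 4$ (the sign conventions are our reading of the display above, and the variable names are ours):

```python
# Exponents of the seven bound terms in the proof of Corollary 1,
# evaluated at a = 0.4, b = 0.45, c = 0.23 and iota = 4.
a, b, c, iota = 0.4, 0.45, 0.23, 4.0
exps = [
    (2*c - 2*b + a) / 4,             # (nu_n / (mu_n h_n^{1/2}))^{1/2} term
    (3*b - b*iota + a - 3) / 6,      # big-block mixing term
    (2*b + a - 2) / 4,               # remainder-block term
    (3*c - c*iota + a - 3) / 3,      # small-block mixing term
    b + a - 1,                       # moment term (delta = 1, r = 4)
    (3*b + 4*a - 3) / 3,             # moment term (r = 4, tau = 2)
    (3 + 2*a - 4*c*iota - 4*b) / 12, # characteristic-function term
]
dominant = max(exps)                 # slowest-decaying (closest to zero)
```

With these values all seven exponents are negative, and the maximum coincides with $(2c-2b+a)/4$, matching the rate claimed in Corollary 1.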
This completes the proof of Corollary 1. □

6. Discussion

The findings of this paper not only advance the theoretical foundation of nonparametric inference for high-dimensional time series but also provide key support for model diagnostics. For instance, in model diagnostics the bounds can be used to verify the distributional assumptions of loan time-series models against reference benchmarks, to assess the robustness of key parameters, and to offer a quantitative basis for model iteration and updating, thereby improving the accuracy of loan risk control and the reliability of model applications.

7. Conclusions

This paper derives explicit Berry–Esseen bounds for residual kernel density estimators in AR(1) models with $\alpha$-mixing errors. Under our assumptions, we obtain Kolmogorov-distance bounds between the standardized estimator and its Gaussian limit; these bounds incorporate the bandwidth, block parameters, and mixing intensity. A key corollary quantifies the convergence rate as $O(n^{(2c-2b+a)/4})$, approaching the near-optimal $O(n^{-1/4})$ when the mixing coefficient decays sufficiently fast, advancing the theoretical foundation for nonparametric inference in high-dimensional time series. Monte Carlo simulations with sample sizes $n = 50, 100, 200$ verify the estimator's asymptotic normality and the monotonic decrease of the Berry–Esseen bound, consistent with the theoretical rate and confirming the rationality of the parameter constraints. Future work may extend the results to higher-order AR models, relax the $\alpha$-mixing assumption to long-memory processes, or incorporate data-driven bandwidth selection to broaden applicability.

Author Contributions

Methodology, J.W.; Writing—original draft, J.W.; Writing—review and editing, T.L.; Funding acquisition, T.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Jilin Provincial Natural Science Foundation Youth Development Program (Grant No. YDZJ202301ZYTS373).

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

We thank the editorial team for their careful and efficient handling of the submission, and we are grateful to colleagues who provided methodological insights and data-management support during the preparation of this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Histograms of $P_n$ corresponding to sample sizes of 50, 100, and 200.
Figure 2. Quantile–Quantile plots of $P_n$ corresponding to sample sizes of 50, 100, and 200.
Table 1. Berry–Esseen bounds under different sample sizes.

n                              50        100       200
|P(P_n ≤ z) − Φ(z)|            0.0849    0.0664    0.0422
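The monotone decrease reported in Table 1 can be checked mechanically. The snippet below also forms the successive ratios of the bounds, which may be compared informally against the $2^{-1/4}\approx 0.84$ shrinkage that a pure $n^{-1/4}$ rate would predict each time $n$ doubles; this comparison is illustrative only.

```python
# Table 1 values: empirical Kolmogorov distances for the
# standardized estimator at increasing sample sizes.
bounds = {50: 0.0849, 100: 0.0664, 200: 0.0422}

ns = sorted(bounds)
# The bounds shrink monotonically as n grows, consistent with convergence.
assert all(bounds[ns[i]] > bounds[ns[i + 1]] for i in range(len(ns) - 1))

# Successive ratios as n doubles (a pure n^{-1/4} rate would give ~0.84).
ratios = [round(bounds[ns[i + 1]] / bounds[ns[i]], 3)
          for i in range(len(ns) - 1)]
print(ratios)
```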

Share and Cite

MDPI and ACS Style

Wang, J.; Liu, T. Berry–Esseen Bounds of Residual Density Estimators in the First-Order Autoregressive Model with the α-Mixing Errors. Mathematics 2026, 14, 73. https://doi.org/10.3390/math14010073

