
Berry–Esseen Bounds of Residual Density Estimators in the First-Order Autoregressive Model with the α-Mixing Errors

School of Mathematics and Statistics, Beihua University, Jilin 132013, China
* Author to whom correspondence should be addressed.
Mathematics 2026, 14(1), 73; https://doi.org/10.3390/math14010073
Submission received: 20 November 2025 / Revised: 18 December 2025 / Accepted: 24 December 2025 / Published: 25 December 2025
(This article belongs to the Special Issue Mathematical Statistics and Nonparametric Inference)

Abstract

This study establishes explicit Berry–Esseen bounds for residual kernel density estimators in AR(1) models with $\alpha$-mixing errors. Since the true innovations are unobservable, we introduce a residual-based estimator $\hat f_n(x)$ and establish its normal approximation under stationarity. By imposing conditions on the bandwidth, mixing coefficients, and moments, we obtain Kolmogorov-distance bounds between the standardized estimator and its Gaussian limit. These bounds depend explicitly on the bandwidth, block parameters, and mixing coefficients. A key corollary quantifies the convergence rate as $O(n^{(2c-2b+a)/4})$. Our results generalize prior work, advancing the theoretical foundations for nonparametric inference in high-dimensional time series.

1. Introduction

Consider a sequence of random variables $\{X_s\}$ following a first-order autoregressive (AR(1)) model
$$X_s = \rho X_{s-1} + \varepsilon_s, \quad 1 \le s \le n, \tag{1}$$
where $\{\varepsilon_s\}$ is a sequence of errors (see Brockwell and Davis [1]). Under stationarity, the process can be represented as
$$X_s = \sum_{j=0}^{\infty} \rho^j \varepsilon_{s-j}, \quad s \ge 1.$$
In the model (1), if the innovations $\{\varepsilon_s\}$ were directly observable, the kernel density estimator
$$f_n(x) := \frac{1}{n h_n} \sum_{s=1}^{n} K\!\left(\frac{x-\varepsilon_s}{h_n}\right), \quad x \in \mathbb{R}, \tag{2}$$
could be employed to estimate the true density $f(x)$ of $\varepsilon_s$. Here, $K(\cdot)$ is a kernel probability density function (p.d.f.), and $h_n$ is a bandwidth satisfying $h_n \to 0$ as $n \to \infty$. Since $f_n(x)$ belongs to the class of kernel density estimators, its theoretical properties have been widely explored. Early studies focused on i.i.d. samples: Parzen [2] developed the asymptotic theory of kernel density estimation and the role of the bandwidth, building the framework for independent-sample density estimation, and Rosenblatt [3,4] laid the groundwork for classical kernel methodology. As attention shifted to dependent data, research expanded to mixing sequences: Wu et al. [5] established the complete consistency rate of recursive density estimators for strong mixing samples; Honda [6] developed nonparametric conditional quantile theory for $\alpha$-mixing processes, complementing the density estimation focus of this study; and Laïb and Louani [7] analyzed the asymptotic behavior of kernel regression estimators for functional, stationary, ergodic data. These works provide the analytical frameworks underpinning the present study.
In the autoregressive model (1), the observable data are restricted to the sequence $\{X_1, X_2, \ldots, X_n\}$. To construct an estimator for $f(x)$, we therefore adapt the formulation of $f_n(x)$ by substituting residuals of the form
$$\hat\varepsilon_s = X_s - \hat\rho_n X_{s-1}, \quad 1 \le s \le n,$$
for the unobservable $\{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n\}$. Here, $\hat\rho_n = \hat\rho_n(X_1, \ldots, X_n)$ denotes an estimator of $\rho$ derived from the observed sequence $\{X_s\}$. As a concrete illustration, the least squares estimator of $\rho$ takes the form
$$\hat\rho_n = \frac{\sum_{s=1}^{n} X_s X_{s-1}}{\sum_{s=1}^{n} X_{s-1}^2}.$$
Accordingly, we define a residual-based kernel density estimator of $f(x)$ as
$$\hat f_n(x) := \frac{1}{n h_n} \sum_{s=1}^{n} K\!\left(\frac{x-\hat\varepsilon_s}{h_n}\right), \quad x \in \mathbb{R}.$$
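As a concrete numerical illustration of this construction, the following Python sketch (not the authors' code; the function name `residual_kde` and the Gaussian-kernel choice are ours) computes the least squares estimate $\hat\rho_n$, the residuals $\hat\varepsilon_s$, and the residual-based estimator $\hat f_n$ on a grid:

```python
import numpy as np

def residual_kde(X, x_grid, h):
    """Residual-based kernel density estimator for the AR(1) innovation
    density: least squares rho_hat, residuals eps_hat, then a
    Gaussian-kernel density estimate evaluated on x_grid."""
    X = np.asarray(X, dtype=float)
    Xs, Xlag = X[1:], X[:-1]
    rho_hat = np.sum(Xs * Xlag) / np.sum(Xlag ** 2)   # least squares estimator
    eps_hat = Xs - rho_hat * Xlag                     # residuals
    u = (np.asarray(x_grid)[:, None] - eps_hat[None, :]) / h
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)    # Gaussian kernel
    return rho_hat, K.mean(axis=1) / h                # f_hat_n on the grid

# toy usage: stationary AR(1) with i.i.d. N(0,1) innovations
rng = np.random.default_rng(0)
n, rho = 2000, 0.6
X = np.empty(n)
X[0] = rng.normal(0, 1 / np.sqrt(1 - rho ** 2))       # stationary initial value
for s in range(1, n):
    X[s] = rho * X[s - 1] + rng.normal()
h = n ** (-0.4)                                       # bandwidth h_n = n^{-a}
grid = np.linspace(-4, 4, 201)
rho_hat, f_hat = residual_kde(X, grid, h)
```

The estimate $\hat f_n$ is nonnegative and integrates to approximately one over a grid wide enough to cover the bulk of the innovation distribution.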
For AR models with i.i.d. errors, Lee and Na [8] studied $L_p$-norm properties of residual density estimators, and Horváth and Zitikis [9] extended the Bickel–Rosenblatt framework to $L_p$ settings. When the errors are $\alpha$-mixing, the asymptotic behavior becomes more intricate, requiring careful handling of the interplay between mixing decay and bandwidth selection. Gao et al. [10] established asymptotic normality of residual density estimators in stationary and explosive AR(1) models, highlighting the influence of mixing and bandwidth on convergence rates, yet without providing explicit Berry–Esseen bounds.
In the Berry–Esseen literature, Petrov [11] obtained the classical $O(n^{-1/2})$ rate for independent samples. For dependent data, Liu and Niu [12] derived Berry–Esseen bounds for recursive kernel estimators under strong mixing using block decomposition, while Wu et al. [13] achieved rates close to $O(n^{-1/4})$ in semiparametric models with linear-process errors via Bernstein-block techniques. Wu et al. [14] later improved these bounds for strong-mixing settings. Neufeld [15] studied weighted sums in free probability, and Chen and Qu [16] derived expansions and large deviations for particle systems. However, a clear gap remains in establishing Berry–Esseen bounds specifically for the residual kernel density estimator in the basic AR(1) model under $\alpha$-mixing errors. The present paper aims to fill this gap.
Compared with the existing literature, this paper makes substantial progress in model specification, theoretical conditions, and result accuracy. Specifically, Gao et al. [10] established the asymptotic normality of residual density estimation under the same AR(1) model but did not quantify the rate of distributional convergence; Liu and Niu [12] studied Berry–Esseen bounds for density estimation under $\alpha$-mixing sequences, but their object is a pure mixing sequence without a regression structure, and they require a fast-decaying mixing coefficient ($\alpha(n) = O(n^{-\tau})$, $\tau > 6$). In contrast, this paper is the first to establish explicit Berry–Esseen bounds for residual kernel density estimation under the AR(1) regression framework. We not only relax the mixing condition to $\alpha(n) = O(n^{-\iota})$, $\iota > 3$, but also obtain a convergence bound depending on the bandwidth, block parameters, and mixing coefficient through refined block partitioning and bias analysis. Under reasonable parameter selection, the rate can reach $O(n^{(2c-2b+a)/4})$, which improves on the $O(n^{-1/10})$ rate in Liu and Niu [12] and approaches the near-optimal order $O(n^{-1/4})$ of nonparametric estimation when the mixing coefficient decays sufficiently fast. The present work thus provides a more rigorous theoretical foundation for subsequent nonparametric inference in high-dimensional time series.
A sequence of random variables $\{Z_i : i \ge 1\}$ is called $\alpha$-mixing if its dependence coefficient
$$\alpha(n) = \sup_{k \ge 1}\, \sup\bigl\{\, |P(AB) - P(A)P(B)| : A \in \mathcal{G}_1^k,\; B \in \mathcal{G}_{k+n}^{\infty} \,\bigr\} \to 0$$
as $n \to \infty$, where $\mathcal{G}_a^b = \sigma(Z_a, Z_{a+1}, \ldots, Z_b)$ denotes the $\sigma$-field generated by the observations from index $a$ to $b$ ($a \le b$). The $\alpha$-mixing condition is relatively mild and covers many stochastic processes, including a wide range of time-series models. For further properties and applications of $\alpha$-mixing, we refer to Wu et al. [5], Honda [6] and the references therein.
The paper is organized as follows. Section 2 states the core assumptions underpinning the study and presents the principal results, including the Berry–Esseen bound and a corollary featuring an explicit convergence rate. Section 3 provides auxiliary lemmas. Numerical simulations are presented in Section 4, and the proofs are collected in Section 5. Unless otherwise specified, all limits are taken as the sample size $n \to \infty$. We use the symbol $C$ to denote a finite positive constant whose value may change from line to line and is irrelevant to the underlying mathematical reasoning.

2. Some Basic Assumptions and Main Results

We now list the assumptions underpinning our analysis.
Assumption 1.
(a) The error sequence $\{\varepsilon_s\}$ forms a strictly stationary $\alpha$-mixing stochastic process with a bounded, unknown probability density function $f : \mathbb{R} \to \mathbb{R}^+$.
(b) The density $f$ has bounded first-order ($f'$) and second-order ($f''$) derivatives on $\mathbb{R}$.
Assumption 2.
(a) For some $r > 2$ and $\tau > 0$, $E\varepsilon_1 = 0$ and $E|\varepsilon_1|^{r+\tau} < \infty$. The $\alpha$-mixing coefficient satisfies $\alpha(n) = O(n^{-\iota})$ with $\iota > 3$.
(b) For fixed $x \in \mathbb{R}$, the asymptotic variance of $f_n(x)$ (defined in (2)) is positive:
$$\liminf_{n \to \infty} \{ n h_n \operatorname{Var}(f_n(x)) \} = \sigma_1^2(x) > 0.$$
Assumption 3.
(a) The kernel $K : \mathbb{R} \to \mathbb{R}^+$ is a bounded probability density function.
(b) The derivative $K'(x)$ of $K(x)$ is bounded for all $x \in \mathbb{R}$.
(c) Its moments satisfy
$$\int_{\mathbb{R}} x^2 K(x)\,dx = D > 0 \quad\text{and}\quad \int_{\mathbb{R}} x K(x)\,dx = 0,$$
where $D$ is a finite positive constant.
Assumption 4.
Let $\mu_n$ and $\nu_n$ be positive integers satisfying, as $n \to \infty$,
$$\mu_n \to \infty, \quad \nu_n \to \infty, \quad \mu_n \le \nu_n^2, \quad \mu_n/n \to 0, \quad\text{and}\quad \nu_n/\mu_n \to 0.$$
Assumption 5.
Let $h_n$ be the bandwidth sequence satisfying $h_n \to 0$, $n h_n \to \infty$ and $n h_n^3 \to 0$.
Assumption 6.
Let $\hat\rho_n$ be an estimator of $\rho$ satisfying $n^{1/2}(\hat\rho_n - \rho) = O_P(1)$.
The core theoretical result, the Berry–Esseen bound for $\alpha$-mixing random sequences, is established below.
Remark 1.
We here elaborate on the justifications for Assumptions 2(a) and 4.
For Assumption 2(a), note that when $r > 2$ and $\tau > 0$, the ratio $r/(r+\tau)$ is always less than 1. For instance, if $r = 4$ and $\tau = 2$, this ratio equals $2/3$; if $r = 3$ and $\tau = 1$, it equals $3/4$. Choosing $\iota > 3$ ensures $\iota > r/(r+\tau)$, since $3 > 1 > r/(r+\tau)$. The assumption balances generality and tractability, covering a broad class of $\alpha$-mixing processes with sufficiently fast dependence decay.
For Assumption 4, the constraints on $\mu_n$ and $\nu_n$ originate from the big-block–small-block technique, which decomposes a dependent $\alpha$-mixing sequence into approximately independent components. The condition $\mu_n \to \infty$ ensures each big block contains enough observations for the limit theorems to apply, while $\mu_n/n \to 0$ guarantees a large number of blocks for aggregation. The condition $\nu_n \to \infty$ strengthens the independence approximation between consecutive big blocks, and $\nu_n/\mu_n \to 0$ ensures the buffer blocks are negligible, avoiding excessive data loss. The constraint $\mu_n \le \nu_n^2$ balances the decay of the mixing coefficient against the growth of the block sizes, ensuring weak dependence between non-consecutive blocks.
Theorem 1.
Suppose Assumptions 1–6 hold. Let $x \in \mathbb{R}$ be a fixed point at which the density $f$ satisfies a first-order Lipschitz condition, i.e.,
$$|f(x) - f(x-z)| \le C|z|, \quad z \in \mathbb{R},$$
where $C$ is a positive constant. Let $\Phi(\cdot)$ denote the cumulative distribution function of the standard normal distribution. Take moment parameters $r > 2$ and $\tau > 0$, along with an auxiliary parameter $m > 0$, and let $p, q > 1$ satisfy $p^{-1} + q^{-1} < 1$. Under these settings, there exists a positive constant $C$ such that
$$\begin{aligned} \sup_{z \in \mathbb{R}} \left| P\!\left( \frac{\hat f_n(x) - E(f_n(x))}{\sqrt{\operatorname{Var}(f_n(x))}} \le z \right) - \Phi(z) \right| = O\Biggl\{ & \Bigl( \frac{\nu_n}{\mu_n h_n^{1/2}} \Bigr)^{1/2} + \Bigl( \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \Bigr)^{1/2} + \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2} + \frac{\nu_n \alpha^{1-p^{-1}-q^{-1}}(\nu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \\ & + \frac{\mu_n^{\delta}}{n^{(r-2)/2} h_n^{(r-2)/2}} + \frac{\mu_n^{(r-2)/2}}{n^{(r-2)/2} h_n^{(r\tau - 2r + r^2)/(2r+2\tau)}} + \bigl( \alpha^{(m-1)/m}(\nu_n)\, \mu_n^{(1-m)/m}\, n^{1/2} h_n^{(1-m)/2m} \bigr)^{1/2} \Biggr\}, \end{aligned}$$
where $\delta > 0$ is a small constant.
Corollary 1.
Assume Assumptions 1–6 hold with $h_n = n^{-a}$, $\mu_n = n^b$, and $\nu_n = n^c$, where $1/3 < a < 1/2$, $0 < c < b \le 2c < 1$, and $\max(0, 2b-1) < c < b - a/2$. Let $E|\varepsilon_1|^{r+\tau} < \infty$ ($r > 2$, $\tau > 0$) and $\alpha(n) = O(n^{-\iota})$ for some $\iota > 3$. Then
$$\sup_{z \in \mathbb{R}} \left| P\!\left( \frac{\hat f_n(x) - E(f_n(x))}{\sqrt{\operatorname{Var}(f_n(x))}} \le z \right) - \Phi(z) \right| = O\bigl( n^{(2c-2b+a)/4} \bigr),$$
with $\Phi(\cdot)$ denoting the standard normal distribution function.
Remark 2.
Corollary 1 follows from Theorem 1 by substituting the specific parameterizations $h_n = n^{-a}$ and block parameters $\mu_n = n^b$, $\nu_n = n^c$, and identifying the dominant error term through asymptotic comparison. Wu et al. [13] studied convergence rates of standardized partial sums, which depend solely on the decay of the mixing coefficients. In contrast, the present work deals with residual kernel density estimation, where the rate must simultaneously capture two effects: the bias induced by the bandwidth $h_n$ and the dependence control via the block parameter $\mu_n$. Consequently, the rate expression is a joint function of $a$, $b$, and $\iota$. As $\iota \to \infty$, the rate obtained here approaches $O(n^{-1/4})$, which aligns with the near-optimal rates typical in nonparametric estimation, as discussed in Wu et al. [13].

3. Auxiliary Lemma

Lemma 1
([17]). For an $\alpha$-mixing sequence $\{\varepsilon_s, s \ge 1\}$ satisfying $E\varepsilon_s = 0$ and $E|\varepsilon_s|^{r+\tau} < \infty$ with $r > 2$ and $\tau > 0$, assume the mixing coefficient decays as $\alpha(n) = O(n^{-\iota})$, where $\iota > r/(r+\tau)$. Then for any $\delta > 0$ there exists a constant $0 < c = c(r, \tau, \iota, \delta) < \infty$ such that
$$E \max_{1 \le k \le n} \left| \sum_{s=1}^{k} \varepsilon_s \right|^r \le c\, n^{\delta} \left\{ \sum_{s=1}^{n} E|\varepsilon_s|^r + \left( \sum_{s=1}^{n} \bigl( E|\varepsilon_s|^{r+\tau} \bigr)^{2/(r+\tau)} \right)^{r/2} \right\}, \quad n \ge 1.$$
Lemma 2
([18]). Consider an $\alpha$-mixing sequence $\{\varepsilon_s, s \ge 1\}$ with $E\varepsilon_s = 0$ and finite $(2+\tau)$-th moment $E|\varepsilon_s|^{2+\tau} < \infty$ for some $\tau > 0$. If the mixing coefficients satisfy $\sum_{s=1}^{\infty} \alpha^{\tau/(2+\tau)}(s) < \infty$, then
$$E\left( \sum_{s=1}^{n} \varepsilon_s \right)^2 \le \left( 1 + 16 \sum_{s=1}^{\infty} \alpha^{\tau/(2+\tau)}(s) \right) \sum_{s=1}^{n} \bigl( E|\varepsilon_s|^{2+\tau} \bigr)^{2/(2+\tau)}, \quad n \ge 1.$$
Lemma 3
([19]). Let $\xi$ and $\eta$ be $\mathcal{G}$- and $\mathcal{H}$-measurable random variables with $E|\xi|^p < \infty$, $E|\eta|^q < \infty$ ($p, q > 1$ and $p^{-1} + q^{-1} < 1$). Then
$$|E\xi\eta - E\xi\, E\eta| \le 8 \bigl( E|\xi|^p \bigr)^{1/p} \bigl( E|\eta|^q \bigr)^{1/q} \bigl( \alpha(\mathcal{G}, \mathcal{H}) \bigr)^{1-p^{-1}-q^{-1}}.$$
Lemma 4
([20]). Let $\mu_n, \nu_n$ be positive integers and let $\{\varepsilon_s, s \ge 1\}$ be an $\alpha$-mixing sequence of random variables. Define $\varphi_j = \sum_{s=j(\mu_n+\nu_n)+1}^{j(\mu_n+\nu_n)+\mu_n} \tilde Q_{n,s}(x)$ for $0 \le j \le r_n - 1$, and let $s, m > 1$ satisfy $1/s + 1/m = 1$. Then there exists $C > 0$ such that for any $t \in \mathbb{R}$,
$$\left| E \exp\Bigl\{ it \sum_{j=0}^{r_n-1} \varphi_j \Bigr\} - \prod_{j=0}^{r_n-1} E \exp\{ it \varphi_j \} \right| \le C |t|\, \alpha^{1/s}(\nu_n) \sum_{j=0}^{r_n-1} \|\varphi_j\|_m.$$
Lemma 5
([12]). Let $X$ and $Y_1, \ldots, Y_m$ be random variables and $w_1, \ldots, w_m$ positive thresholds. Then
$$\sup_u \left| P\Bigl( X + \sum_{i=1}^{m} Y_i \le u \Bigr) - \Phi(u) \right| \le \sup_u \left| P(X \le u) - \Phi(u) \right| + \sum_{i=1}^{m} \frac{w_i}{\sqrt{2\pi}} + \sum_{i=1}^{m} P(|Y_i| > w_i).$$
Lemma 6.
Under Assumptions 1–3 we have, with $S_n''$ and $S_n'''$ as defined in Section 5,
$$E(S_n'')^2 \le C \left[ \frac{\nu_n}{\mu_n h_n^{1/2}} + \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \right], \qquad E(S_n''')^2 \le C\, \frac{\mu_n}{n h_n^{1/2}}.$$
Moreover,
$$P\left( |S_n''| > \left[ \frac{\nu_n}{\mu_n h_n^{1/2}} + \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \right]^{1/2} \right) \le C \left[ \frac{\nu_n}{\mu_n h_n^{1/2}} + \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \right]^{1/2},$$
$$P\left( |S_n'''| > \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2} \right) \le C \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2}.$$
Lemma 7.
Under Assumption 1(b), the variance discrepancy satisfies
$$|s_n^2 - 1| \le C \left[ \Bigl( \frac{\nu_n}{\mu_n h_n^{1/2}} \Bigr)^{1/2} + \Bigl( \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \Bigr)^{1/2} + \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2} + \frac{\nu_n \alpha^{1-p^{-1}-q^{-1}}(\nu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \right].$$
Lemma 8.
Under the conditions of Theorem 1,
$$\begin{aligned} \sup_{z \in \mathbb{R}} \left| P(S_n' \le z) - \Phi(z) \right| \le C \Biggl[ & \Bigl( \frac{\nu_n}{\mu_n h_n^{1/2}} \Bigr)^{1/2} + \Bigl( \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \Bigr)^{1/2} + \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2} + \frac{\nu_n \alpha^{1-p^{-1}-q^{-1}}(\nu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \\ & + \frac{\mu_n^{\delta}}{n^{(r-2)/2} h_n^{(r-2)/2}} + \frac{\mu_n^{(r-2)/2}}{n^{(r-2)/2} h_n^{(r\tau - 2r + r^2)/(2r+2\tau)}} + \bigl( \alpha^{(m-1)/m}(\nu_n)\, \mu_n^{(1-m)/m}\, n^{1/2} h_n^{(1-m)/2m} \bigr)^{1/2} \Biggr]. \end{aligned}$$

4. Numerical Simulation

In this section, all numerical simulations were conducted using custom R scripts. The code encompasses five core functional modules: (1) verification of parameter constraints aligned with Corollary 1, (2) generation of α -mixing error sequences and AR(1) process data, (3) least squares estimation of the autoregressive coefficient ρ , (4) calculation of the theoretical expectation and variance for residual kernel density estimators, and (5) visualization of standardized statistic distributions. Detailed English comments are included to ensure full reproducibility.
We verify the asymptotic normality of the residual kernel density estimator and the convergence rate of the Berry–Esseen bound for Theorem 1 and Corollary 1 under finite samples via Monte Carlo simulation. The observations are generated from
$$X_s = \rho_{\text{true}} X_{s-1} + \varepsilon_s, \quad 1 \le s \le n,$$
where the sample size n is set to 50, 100, and 200, respectively.
  • The parameters and errors are generated as follows: the autoregressive coefficient is $\rho_{\text{true}} = 0.6$, and the initial value $X_1$ follows $N\bigl( 0, \sigma_\varepsilon^2 / (1 - \rho_{\text{true}}^2) \bigr)$;
  • The error sequence $\{\varepsilon_s\}$ is an $\alpha$-mixing process generated by the recursion $\varepsilon_s = 0.2\,\varepsilon_{s-1} + \eta_s$, where the $\eta_s$ are independent and identically distributed (i.i.d.) Gaussian innovations with $\eta_s \sim N(0,1)$; the bandwidth is $h_n = n^{-0.4}$, which satisfies the constraint $1/3 < 0.4 < 1/2$; the block parameters are the big-block length $\mu_n = n^{0.45}$ and the small-block length $\nu_n = n^{0.23}$, which satisfy the constraints of Corollary 1;
  • Compute the standardized statistic $P_n = \dfrac{\hat f_n(x_0) - E f_n(x_0)}{\sqrt{\operatorname{Var} f_n(x_0)}}$;
  • The kernel is the Gaussian kernel $K(u) = \phi(u)$.
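A minimal Python sketch of this Monte Carlo design follows (not the authors' R implementation; replacing the theoretical $E f_n(x_0)$ and $\operatorname{Var} f_n(x_0)$ by empirical counterparts over replications, and the function name `simulate_Pn`, are our simplifications):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_Pn(n, M=500, x0=0.0, rho=0.6):
    """Monte Carlo sketch of the standardized statistic P_n at x0.
    E f_n(x0) and Var f_n(x0) are approximated by the empirical mean and
    variance of the infeasible estimator f_n over M replications."""
    h = n ** (-0.4)                              # bandwidth h_n = n^{-0.4}
    f_true, f_res = np.empty(M), np.empty(M)
    for m in range(M):
        # alpha-mixing errors: eps_s = 0.2 eps_{s-1} + eta_s, eta ~ N(0,1)
        eta = rng.normal(size=n)
        eps = np.empty(n)
        eps[0] = eta[0] / np.sqrt(1 - 0.2 ** 2)  # stationary start
        for s in range(1, n):
            eps[s] = 0.2 * eps[s - 1] + eta[s]
        sig2 = 1 / (1 - 0.2 ** 2)                # Var(eps_s)
        X = np.empty(n)
        X[0] = rng.normal(0, np.sqrt(sig2 / (1 - rho ** 2)))
        for s in range(1, n):
            X[s] = rho * X[s - 1] + eps[s]
        rho_hat = np.sum(X[1:] * X[:-1]) / np.sum(X[:-1] ** 2)
        eps_hat = X[1:] - rho_hat * X[:-1]       # residuals
        kde = lambda e: np.mean(np.exp(-0.5 * ((x0 - e) / h) ** 2)
                                / np.sqrt(2 * np.pi)) / h
        f_res[m] = kde(eps_hat)                  # residual-based estimate
        f_true[m] = kde(eps[1:])                 # infeasible estimate f_n
    return (f_res - f_true.mean()) / f_true.std()

Pn = simulate_Pn(200)
```

A histogram or Q–Q plot of `Pn` against the standard normal then reproduces the kind of comparison shown in Figures 1 and 2.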
Figure 1 shows the kernel density histogram of the standardized statistic $P_n$ corresponding to the residual density estimator in model (1), overlaid with the standard normal density curve to visually verify asymptotic normality. When the sample size is $n = 50$, there is a visible deviation between the shape of the histogram and the normal curve, characterized by slightly higher kurtosis and noticeable differences in tail thickness, reflecting that the distributional properties of dependent data do not fully manifest in small samples. As the sample size increases to $n = 100$, the symmetry of the histogram becomes more pronounced, and the fit between the frequency distribution and the normal curve improves markedly. When $n = 200$, the histogram exhibits a fully bell-shaped symmetric structure, and the frequency distribution in each interval almost coincides with the standard normal curve, with only slight deviations in the extreme-value region. This behavior is consistent with conclusions in the literature.
Figure 2 shows Quantile–Quantile plots of $P_n$ for $n = 50, 100, 200$. In each plot, the horizontal axis gives the quantiles of the standard normal distribution and the vertical axis the quantiles of $P_n$; the red straight line is the $45^\circ$ reference line.
Figure 1, Figure 2 and Table 1 show that the standardized residual kernel density estimator in model (1) has good asymptotic normality, consistent with the core conclusion of Theorem 1. The Berry–Esseen bound decreases monotonically with increasing sample size, and its convergence rate follows $O(n^{(2c-2b+a)/4})$. When the mixing coefficient decays sufficiently fast, this rate approaches the near-optimal $O(n^{-1/4})$ of nonparametric estimation, improving on the $O(n^{-1/10})$ rate in [12].
Thus, Monte Carlo simulations confirm the theoretical findings of Theorem 1 and Corollary 1. Specifically, the residual kernel density estimator in model (1) exhibits valid asymptotic normality, and its Berry–Esseen bound converges at the theoretical rate. These results also confirm the rationality of the parameter constraints and theoretical derivations in this paper.

5. Proofs

To present the main proofs, we adopt the following notation (see Gao et al. [10]). The standardized density estimator decomposes as
$$\frac{f_n(x) - E f_n(x)}{\sqrt{\operatorname{Var}(f_n(x))}} = \frac{\sum_{s=1}^{n} Q_{n,s}(x)}{\sqrt{\operatorname{Var}\bigl( \sum_{s=1}^{n} Q_{n,s}(x) \bigr)}} := \sum_{s=1}^{n} \tilde Q_{n,s}(x), \quad n \ge 1,$$
where
$$\tilde Q_{n,s}(x) = \frac{Q_{n,s}(x)}{\sqrt{\operatorname{Var}\bigl( \sum_{s=1}^{n} Q_{n,s}(x) \bigr)}},$$
and
$$Q_{n,s}(x) = h_n^{-1/2} \left[ K\!\left( \frac{x - \varepsilon_s}{h_n} \right) - E K\!\left( \frac{x - \varepsilon_s}{h_n} \right) \right], \quad 1 \le s \le n.$$
To handle the dependence, we employ a block-splitting technique (see Masry [5]). Choose a large-block length $\mu_n$ and a small-block length $\nu_n$, and define
$$r_n := \left\lfloor \frac{n}{\mu_n + \nu_n} \right\rfloor.$$
The sum $S_n$ is partitioned into three components,
$$S_n = \sum_{s=1}^{n} \tilde Q_{n,s}(x) = \sum_{j=0}^{r_n-1} \varphi_j + \sum_{j=0}^{r_n-1} \eta_j + \xi_{r_n} := S_n' + S_n'' + S_n''',$$
where $\varphi_j$, $\eta_j$, $\xi_{r_n}$ are defined by
$$\varphi_j = \sum_{s=j(\mu_n+\nu_n)+1}^{j(\mu_n+\nu_n)+\mu_n} \tilde Q_{n,s}(x), \quad 0 \le j \le r_n - 1,$$
$$\eta_j = \sum_{s=j(\mu_n+\nu_n)+\mu_n+1}^{(j+1)(\mu_n+\nu_n)} \tilde Q_{n,s}(x), \quad 0 \le j \le r_n - 1,$$
$$\xi_{r_n} = \sum_{s=r_n(\mu_n+\nu_n)+1}^{n} \tilde Q_{n,s}(x).$$
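The index bookkeeping of this decomposition can be sketched as follows (a Python illustration under our naming; the paper itself does not prescribe code):

```python
def block_indices(n, mu, nu):
    """Big-block/small-block partition of {1, ..., n}:
    r = floor(n / (mu + nu)) big blocks of length mu (the phi_j sums),
    r small blocks of length nu (the eta_j sums), plus a remainder
    block (the xi term). Indices are 1-based to match the paper."""
    r = n // (mu + nu)
    big = [list(range(j * (mu + nu) + 1, j * (mu + nu) + mu + 1))
           for j in range(r)]
    small = [list(range(j * (mu + nu) + mu + 1, (j + 1) * (mu + nu) + 1))
             for j in range(r)]
    rest = list(range(r * (mu + nu) + 1, n + 1))
    return big, small, rest

big, small, rest = block_indices(100, 10, 5)
```

For $n = 100$, $\mu_n = 10$, $\nu_n = 5$ this gives $r_n = 6$ big blocks, 6 small buffer blocks, and a remainder of 10 indices; together the three groups partition $\{1, \ldots, n\}$ exactly.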
Proof of Lemma 6. 
Under Assumptions 1(a) and 3(a) (see [10]), for any $p \ge 1$,
$$E|Q_{n,s}(x)|^p = E|Q_{n,1}(x)|^p \le C h_n^{-p/2}\, E\left| K\!\left( \frac{x - \varepsilon_1}{h_n} \right) \right|^p = C h_n^{-p/2} \int_{\mathbb{R}} \left| K\!\left( \frac{x - z}{h_n} \right) \right|^p f(z)\,dz \le C h_n^{1-p/2}, \quad 1 \le s \le n.$$
Moreover, by Assumption 2(b) and (2), we have
$$\operatorname{Var}(f_n(x)) = n^{-2} h_n^{-1} \operatorname{Var}\Bigl( \sum_{s=1}^{n} Q_{n,s}(x) \Bigr),$$
$$\liminf_{n \to \infty} \{ n h_n \operatorname{Var}(f_n(x)) \} = \liminf_{n \to \infty} \Bigl\{ n^{-1} \operatorname{Var}\Bigl( \sum_{s=1}^{n} Q_{n,s}(x) \Bigr) \Bigr\} = \sigma_1^2(x) > 0.$$
Thus
$$E|\tilde Q_{n,s}(x)|^p = E\left| \frac{Q_{n,s}(x)}{\sqrt{\operatorname{Var}\bigl( \sum_{s=1}^{n} Q_{n,s}(x) \bigr)}} \right|^p \le C n^{-p/2}\, E|Q_{n,s}(x)|^p \le C n^{-p/2} h_n^{1-p/2}, \quad 1 \le s \le n.$$
Using (6) and (7), we decompose the variance
$$E(S_n'')^2 = \operatorname{Var}\Bigl( \sum_{j=0}^{r_n-1} \eta_j \Bigr) = \sum_{j=0}^{r_n-1} \operatorname{Var}(\eta_j) + 2 \sum_{0 \le i < j \le r_n-1} \operatorname{Cov}(\eta_i, \eta_j) := R_1 + R_2.$$
By (10) together with Assumption 2(b) and Lemma 2,
$$\operatorname{Var}(\eta_j) = E\Bigl( \sum_{s=j(\mu_n+\nu_n)+\mu_n+1}^{(j+1)(\mu_n+\nu_n)} \tilde Q_{n,s}(x) \Bigr)^2 \le C \sum_{s=j(\mu_n+\nu_n)+\mu_n+1}^{(j+1)(\mu_n+\nu_n)} \bigl( E|\tilde Q_{n,s}(x)|^4 \bigr)^{1/2} \le C\, \frac{\nu_n}{n h_n^{1/2}}.$$
Thus
$$R_1 = \sum_{j=0}^{r_n-1} \operatorname{Var}(\eta_j) \le C\, r_n\, \frac{\nu_n}{n h_n^{1/2}} \le C\, \frac{\nu_n}{\mu_n h_n^{1/2}}.$$
For $R_2$, set $\chi_j = j(\mu_n+\nu_n) + \mu_n$. Then
$$R_2 = 2 \sum_{0 \le i < j \le r_n-1} \operatorname{Cov}(\eta_i, \eta_j) = 2 \sum_{0 \le i < j \le r_n-1} \sum_{y_1=1}^{\nu_n} \sum_{y_2=1}^{\nu_n} \operatorname{Cov}\bigl[ \tilde Q_{n,\chi_i+y_1}(x), \tilde Q_{n,\chi_j+y_2}(x) \bigr],$$
and when $i \ne j$ we have $|\chi_i - \chi_j + y_1 - y_2| \ge \mu_n$. By Assumption 2(b), (4), (10) and Lemma 3,
$$|R_2| \le 2 \sum_{\substack{1 \le i < j \le n \\ j-i \ge \mu_n}} \bigl| \operatorname{Cov}[\tilde Q_{n,i}(x), \tilde Q_{n,j}(x)] \bigr| \le C \sum_{\substack{1 \le i < j \le n \\ j-i \ge \mu_n}} \alpha^{1-p^{-1}-q^{-1}}(\mu_n) \bigl( E|\tilde Q_{n,i}(x)|^p \bigr)^{1/p} \bigl( E|\tilde Q_{n,j}(x)|^q \bigr)^{1/q} \le C\, \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}}.$$
Combining (13) and (14) gives
$$E(S_n'')^2 \le C \left[ \frac{\nu_n}{\mu_n h_n^{1/2}} + \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \right].$$
Similarly, using Lemma 2 together with Hölder's inequality yields
$$E(S_n''')^2 = E\Bigl( \sum_{s=r_n(\mu_n+\nu_n)+1}^{n} \tilde Q_{n,s}(x) \Bigr)^2 \le C \sum_{s=r_n(\mu_n+\nu_n)+1}^{n} \bigl( E|\tilde Q_{n,s}(x)|^4 \bigr)^{1/2} \le C\, \frac{n - r_n(\mu_n+\nu_n)}{n h_n^{1/2}} \le C\, \frac{\mu_n}{n h_n^{1/2}}.$$
Finally, Markov's inequality provides the probability bounds
$$P\left( |S_n''| > \left[ \frac{\nu_n}{\mu_n h_n^{1/2}} + \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \right]^{1/2} \right) \le C \left[ \frac{\nu_n}{\mu_n h_n^{1/2}} + \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \right]^{1/2},$$
$$P\left( |S_n'''| > \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2} \right) \le C \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2}.$$
This completes the proof of Lemma 6. □
Proof of Lemma 7. 
Starting from $E S_n^2 = 1$, write $S_n' = S_n - (S_n'' + S_n''')$ and compute
$$E(S_n')^2 = E\bigl[ S_n - (S_n'' + S_n''') \bigr]^2 = 1 + E(S_n'' + S_n''')^2 - 2 E\bigl[ S_n (S_n'' + S_n''') \bigr].$$
By the triangle inequality and properties of expectations,
$$|E(S_n')^2 - 1| \le E(S_n'' + S_n''')^2 + 2 E\bigl| S_n (S_n'' + S_n''') \bigr|.$$
Applying Hölder's inequality to the second term,
$$E\bigl| S_n (S_n'' + S_n''') \bigr| \le \bigl( E S_n^2 \bigr)^{1/2} \bigl( E(S_n'' + S_n''')^2 \bigr)^{1/2} \le C \Bigl[ E^{1/2}(S_n'')^2 + E^{1/2}(S_n''')^2 \Bigr],$$
and by the $C_r$-inequality,
$$E(S_n'' + S_n''')^2 \le 2 \bigl[ E(S_n'')^2 + E(S_n''')^2 \bigr].$$
Combining these bounds with Lemma 6 to control $E(S_n'')^2$ and $E(S_n''')^2$, we obtain
$$|E(S_n')^2 - 1| \le C \left[ \Bigl( \frac{\nu_n}{\mu_n h_n^{1/2}} \Bigr)^{1/2} + \Bigl( \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \Bigr)^{1/2} + \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2} \right],$$
where $\varphi_j$ is defined in (6). Define
$$s_n^2 = \sum_{j=0}^{r_n-1} \operatorname{Var}(\varphi_j), \qquad \Gamma_n = \sum_{0 \le i < j \le r_n-1} \operatorname{Cov}(\varphi_i, \varphi_j), \qquad\text{so that}\quad s_n^2 = E(S_n')^2 - 2\Gamma_n.$$
Given $\chi_j = j(\mu_n+\nu_n)$, for $i \ne j$ we have $|\chi_i - \chi_j + y_1 - y_2| \ge \nu_n$, and
$$2\Gamma_n = 2 \sum_{0 \le i < j \le r_n-1} \operatorname{Cov}(\varphi_i, \varphi_j) = 2 \sum_{0 \le i < j \le r_n-1} \sum_{y_1=1}^{\mu_n} \sum_{y_2=1}^{\mu_n} \operatorname{Cov}\bigl[ \tilde Q_{n,\chi_i+y_1}(x), \tilde Q_{n,\chi_j+y_2}(x) \bigr].$$
Analogously to (15), Lemma 3 gives
$$|\Gamma_n| \le \sum_{\substack{1 \le i < j \le n \\ j-i \ge \nu_n}} \bigl| \operatorname{Cov}\bigl[ \tilde Q_{n,i}(x), \tilde Q_{n,j}(x) \bigr] \bigr| \le C \sum_{\substack{1 \le i < j \le n \\ j-i \ge \nu_n}} \alpha^{1-p^{-1}-q^{-1}}(\nu_n) \bigl( E|\tilde Q_{n,i}(x)|^p \bigr)^{1/p} \bigl( E|\tilde Q_{n,j}(x)|^q \bigr)^{1/q} \le C\, \frac{\nu_n \alpha^{1-p^{-1}-q^{-1}}(\nu_n)}{n h_n^{1-p^{-1}-q^{-1}}}.$$
Therefore,
$$|s_n^2 - 1| \le C \left[ \Bigl( \frac{\nu_n}{\mu_n h_n^{1/2}} \Bigr)^{1/2} + \Bigl( \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \Bigr)^{1/2} + \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2} + \frac{\nu_n \alpha^{1-p^{-1}-q^{-1}}(\nu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \right].$$
This completes the proof of Lemma 7. □
Proof of Lemma 8. 
Let $\{\hat\varphi_j, 0 \le j \le r_n-1\}$ be independent random variables such that $\hat\varphi_j$ has the same distribution as $\varphi_j$, and set $T_n = \sum_{j=0}^{r_n-1} \hat\varphi_j$. Applying Lemma 1, Assumption 4 and the Berry–Esseen inequality for sums of independent variables (Petrov [11]), we obtain
$$\begin{aligned} \sup_{z \in \mathbb{R}} \left| P(T_n/s_n \le z) - \Phi(z) \right| &\le C s_n^{-r} \sum_{j=0}^{r_n-1} E|\varphi_j|^r \le C \sum_{j=0}^{r_n-1} E\Biggl| \sum_{s=j(\mu_n+\nu_n)+1}^{j(\mu_n+\nu_n)+\mu_n} \tilde Q_{n,s}(x) \Biggr|^r \\ &\le C \sum_{j=0}^{r_n-1} \mu_n^{\delta} \Biggl\{ \sum_{s=j(\mu_n+\nu_n)+1}^{j(\mu_n+\nu_n)+\mu_n} E|\tilde Q_{n,s}(x)|^r + \Biggl( \sum_{s=j(\mu_n+\nu_n)+1}^{j(\mu_n+\nu_n)+\mu_n} \bigl( E|\tilde Q_{n,s}(x)|^{r+\tau} \bigr)^{2/(r+\tau)} \Biggr)^{r/2} \Biggr\} \\ &\le C \Bigl[ r_n \mu_n^{\delta+1} n^{-r/2} h_n^{(2-r)/2} + r_n \mu_n^{r/2} n^{-r/2} h_n^{(2r - r^2 - r\tau)/(2r+2\tau)} \Bigr] \\ &\le C \Bigl[ \frac{\mu_n^{\delta}}{n^{(r-2)/2} h_n^{(r-2)/2}} + \frac{\mu_n^{(r-2)/2}}{n^{(r-2)/2} h_n^{(r\tau - 2r + r^2)/(2r+2\tau)}} \Bigr]. \end{aligned}$$
By (23), the probability difference can be bounded through the normal approximation: for any $u$,
$$\begin{aligned} \sup_{z \in \mathbb{R}} \left| P(T_n \le z+u) - P(T_n \le z) \right| &\le \sup_{z \in \mathbb{R}} \Bigl| P\Bigl( \frac{T_n}{s_n} \le \frac{z}{s_n} \Bigr) - \Phi\Bigl( \frac{z}{s_n} \Bigr) \Bigr| + \sup_{z \in \mathbb{R}} \Bigl| P\Bigl( \frac{T_n}{s_n} \le \frac{z+u}{s_n} \Bigr) - \Phi\Bigl( \frac{z+u}{s_n} \Bigr) \Bigr| + \sup_{z \in \mathbb{R}} \Bigl| \Phi\Bigl( \frac{z+u}{s_n} \Bigr) - \Phi\Bigl( \frac{z}{s_n} \Bigr) \Bigr| \\ &\le C \Bigl[ \frac{\mu_n^{\delta}}{n^{(r-2)/2} h_n^{(r-2)/2}} + \frac{\mu_n^{(r-2)/2}}{n^{(r-2)/2} h_n^{(r\tau - 2r + r^2)/(2r+2\tau)}} + |u|/s_n \Bigr] \\ &\le C \Bigl[ \frac{\mu_n^{\delta}}{n^{(r-2)/2} h_n^{(r-2)/2}} + \frac{\mu_n^{(r-2)/2}}{n^{(r-2)/2} h_n^{(r\tau - 2r + r^2)/(2r+2\tau)}} + |u| \Bigr]. \end{aligned}$$
Let $\zeta(t)$ be the characteristic function of $S_n'$ and $\psi(t)$ that of $T_n$, so that $\psi(t) = E\exp\{itT_n\} = \prod_{j=0}^{r_n-1} E\exp\{it\varphi_j\}$. Applying Lemma 4 with $1/s + 1/m = 1$ gives
$$\begin{aligned} |\zeta(t) - \psi(t)| &= \Biggl| E\exp\Bigl\{ it \sum_{j=0}^{r_n-1} \varphi_j \Bigr\} - \prod_{j=0}^{r_n-1} E\exp\{it\varphi_j\} \Biggr| \le C|t|\, \alpha^{1/s}(\nu_n) \sum_{j=0}^{r_n-1} \|\varphi_j\|_m \\ &= C|t|\, \alpha^{1/s}(\nu_n) \sum_{j=0}^{r_n-1} \Biggl( E\Biggl| \sum_{s=j(\mu_n+\nu_n)+1}^{j(\mu_n+\nu_n)+\mu_n} \tilde Q_{n,s}(x) \Biggr|^m \Biggr)^{1/m} \\ &\le C|t|\, \alpha^{1/s}(\nu_n)\, r_n\, \mu_n^{1/m} n^{-1/2} h_n^{(1-m)/2m} \le C|t|\, \alpha^{(m-1)/m}(\nu_n)\, \mu_n^{(1-m)/m} n^{1/2} h_n^{(1-m)/2m}. \end{aligned}$$
Now, by Esseen's inequality (Petrov [11]), for any $T > 0$ there exists a constant $C > 0$ such that
$$\begin{aligned} \sup_{z \in \mathbb{R}} \left| P(S_n' \le z) - P(T_n \le z) \right| &\le \int_{-T}^{T} \left| \frac{\zeta(t) - \psi(t)}{t} \right| dt + T \sup_{z \in \mathbb{R}} \int_{|u| \le C/T} \left| P(T_n \le z+u) - P(T_n \le z) \right| du \\ &\le C\, T\, \alpha^{(m-1)/m}(\nu_n)\, \mu_n^{(1-m)/m} n^{1/2} h_n^{(1-m)/2m} + C \Bigl[ \frac{\mu_n^{\delta}}{n^{(r-2)/2} h_n^{(r-2)/2}} + \frac{\mu_n^{(r-2)/2}}{n^{(r-2)/2} h_n^{(r\tau - 2r + r^2)/(2r+2\tau)}} + \frac{1}{T} \Bigr] \\ &\le C \Bigl[ \bigl( \alpha^{(m-1)/m}(\nu_n)\, \mu_n^{(1-m)/m} n^{1/2} h_n^{(1-m)/2m} \bigr)^{1/2} + \frac{\mu_n^{\delta}}{n^{(r-2)/2} h_n^{(r-2)/2}} + \frac{\mu_n^{(r-2)/2}}{n^{(r-2)/2} h_n^{(r\tau - 2r + r^2)/(2r+2\tau)}} \Bigr], \end{aligned}$$
where we have chosen $T = \bigl( \alpha^{(m-1)/m}(\nu_n)\, \mu_n^{(1-m)/m} n^{1/2} h_n^{(1-m)/2m} \bigr)^{-1/2}$.
Finally,
$$\sup_{z \in \mathbb{R}} \left| P(S_n' \le z) - \Phi(z) \right| \le \sup_{z \in \mathbb{R}} \left| P(S_n' \le z) - P(T_n \le z) \right| + \sup_{z \in \mathbb{R}} \left| P(T_n \le z) - \Phi(z/s_n) \right| + \sup_{z \in \mathbb{R}} \left| \Phi(z/s_n) - \Phi(z) \right|.$$
Using Lemmas 6 and 7 together with (20)–(26), we complete the proof of Lemma 8. □
Proof of Theorem 1. 
Combining Lemmas 5–8 with (11) and (12) yields
$$\begin{aligned} \sup_{z \in \mathbb{R}} \left| P(S_n \le z) - \Phi(z) \right| &\le \sup_{z \in \mathbb{R}} \left| P(S_n' \le z) - \Phi(z) \right| + P\Biggl( |S_n''| > \Bigl[ \frac{\nu_n}{\mu_n h_n^{1/2}} + \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \Bigr]^{1/2} \Biggr) \\ &\quad + \frac{1}{\sqrt{2\pi}} \Bigl[ \frac{\nu_n}{\mu_n h_n^{1/2}} + \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \Bigr]^{1/2} + P\Biggl( |S_n'''| > \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2} \Biggr) + \frac{1}{\sqrt{2\pi}} \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2} \\ &\le C \Biggl[ \Bigl( \frac{\nu_n}{\mu_n h_n^{1/2}} \Bigr)^{1/2} + \Bigl( \frac{\mu_n \alpha^{1-p^{-1}-q^{-1}}(\mu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \Bigr)^{1/2} + \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2} + \frac{\nu_n \alpha^{1-p^{-1}-q^{-1}}(\nu_n)}{n h_n^{1-p^{-1}-q^{-1}}} \\ &\qquad + \frac{\mu_n^{\delta}}{n^{(r-2)/2} h_n^{(r-2)/2}} + \frac{\mu_n^{(r-2)/2}}{n^{(r-2)/2} h_n^{(r\tau - 2r + r^2)/(2r+2\tau)}} + \bigl( \alpha^{(m-1)/m}(\nu_n)\, \mu_n^{(1-m)/m} n^{1/2} h_n^{(1-m)/2m} \bigr)^{1/2} \Biggr]. \end{aligned}$$
According to Gao et al. [10], we have
$$\frac{\hat f_n(x) - f_n(x)}{\sqrt{\operatorname{Var}(f_n(x))}} = O_P\Bigl( \frac{1}{n^{1/2} h_n^{3/2}} \Bigr).$$
Under Assumptions 1–3, the kernel $K$ is continuously differentiable with bounded derivative, so there exists a constant $M > 0$ such that $|K'(u)| \le M$ for all $u \in \mathbb{R}$; the density $f$ is smooth, and the errors satisfy $E|\varepsilon_1|^{r+\tau} < \infty$ for $r > 2$ and $\tau > 0$. For the residual $\hat\varepsilon_s = X_s - \hat\rho_n X_{s-1}$ and the true innovation $\varepsilon_s = X_s - \rho X_{s-1}$, a first-order Taylor expansion of $K\bigl( \frac{x - \hat\varepsilon_s}{h_n} \bigr)$ around $\frac{x - \varepsilon_s}{h_n}$ gives
$$K\!\left( \frac{x - \hat\varepsilon_s}{h_n} \right) - K\!\left( \frac{x - \varepsilon_s}{h_n} \right) = K'(\xi_s^x)\, \frac{\varepsilon_s - \hat\varepsilon_s}{h_n},$$
where $\xi_s^x$ lies between $\frac{x - \hat\varepsilon_s}{h_n}$ and $\frac{x - \varepsilon_s}{h_n}$. Since $\hat\varepsilon_s - \varepsilon_s = (\rho - \hat\rho_n) X_{s-1}$, we obtain
$$\hat f_n(x) - f_n(x) = \frac{1}{n h_n} \sum_{s=1}^{n} \left[ K\!\left( \frac{x - \hat\varepsilon_s}{h_n} \right) - K\!\left( \frac{x - \varepsilon_s}{h_n} \right) \right] = \frac{\hat\rho_n - \rho}{n h_n^2} \sum_{s=1}^{n} K'(\xi_s^x) X_{s-1}.$$
Squaring and applying Assumption 3(b) and Lemma A.2 of [10] yields
$$|\hat f_n(x) - f_n(x)|^2 \le |\hat\rho_n - \rho|^2\, \frac{1}{n^2 h_n^4} \Bigl( \sum_{s=1}^{n} K'(\xi_s^x) X_{s-1} \Bigr)^2,$$
$$E|\hat f_n(x) - f_n(x)|^2 \le \frac{1}{n^2 h_n^4}\, E\Biggl[ (\hat\rho_n - \rho)^2 \Bigl( \sum_{s=1}^{n} K'(\xi_s^x) X_{s-1} \Bigr)^2 \Biggr],$$
and by the Cauchy–Schwarz inequality
$$E\Biggl[ (\hat\rho_n - \rho)^2 \Bigl( \sum_{s=1}^{n} K'(\xi_s^x) X_{s-1} \Bigr)^2 \Biggr] \le \bigl( E(\hat\rho_n - \rho)^4 \bigr)^{1/2} \Biggl( E\Bigl( \sum_{s=1}^{n} K'(\xi_s^x) X_{s-1} \Bigr)^4 \Biggr)^{1/2}.$$
From the condition $n^{1/2}(\hat\rho_n - \rho) = O_P(1)$ in Assumption 6, the result $|\hat f_n(x) - f_n(x)| = O_P\bigl( \frac{1}{n h_n^2} \bigr)$ obtained in [10], and the asymptotic normality $\sqrt{n}(\hat\rho_n - \rho) \to_d N(0, 1-\rho^2)$ given in [21], we can obtain
$$E\bigl| n^{1/2}(\hat\rho_n - \rho) \bigr|^2 \to (1-\rho^2)\sigma^2 < \infty, \qquad E|\hat\rho_n - \rho|^2 = O\Bigl( \frac{1}{n} \Bigr),$$
and by Assumption 2(a) the fourth-order moment is bounded:
$$E(\hat\rho_n - \rho)^4 = O\Bigl( \frac{1}{n^2} \Bigr).$$
By Lemma A.2 of [10], we have
$$E\Bigl( \sum_{s=1}^{n} X_{s-1} K'(\xi_s^x) \Bigr)^4 = O(n^2).$$
Substituting (30) and (31) into (29) gives
$$E\bigl| \hat f_n(x) - f_n(x) \bigr|^2 = O\Bigl( \frac{1}{n^2 h_n^4} \Bigr).$$
Further, from Assumption 2(b), $\operatorname{Var}(f_n(x)) \ge \frac{c}{n h_n}$ for some constant $c > 0$. Applying the Cauchy–Schwarz inequality to $\frac{E|\hat f_n(x) - f_n(x)|}{\sqrt{\operatorname{Var}(f_n(x))}}$, we obtain
$$\frac{E|\hat f_n(x) - f_n(x)|}{\sqrt{\operatorname{Var}(f_n(x))}} \le \frac{\sqrt{E|\hat f_n(x) - f_n(x)|^2}}{\sqrt{\operatorname{Var}(f_n(x))}} = O\Bigl( \frac{1}{n^{1/2} h_n^{3/2}} \Bigr),$$
which establishes the expectation bound.
By Markov's inequality,
$$P\Biggl( \frac{|\hat f_n(x) - f_n(x)|}{\sqrt{\operatorname{Var}(f_n(x))}} > a \Biggr) \le \frac{1}{a}\, E\Biggl[ \frac{|\hat f_n(x) - f_n(x)|}{\sqrt{\operatorname{Var}(f_n(x))}} \Biggr] \le \Bigl( \frac{\mu_n}{n h_n^{1/2}} \Bigr)^{1/2},$$
where the last step follows by choosing $a = \mu_n^{-1/2} h_n^{-5/4}$.
Finally, Lemma 5, (32) and Lemma 8 together complete the proof of Theorem 1. □
Proof of Corollary 1. 
Substitute $\alpha(n) = O(n^{-\iota})$, $h_n = n^{-a}$, $\mu_n = n^b$, $\nu_n = n^c$ with $1/3 < a < 1/2$, $0 < c < b \le 2c < 1$, $\max(0, 2b-1) < c < b - a/2$, and take $p = q = 3$, $m = 3$, $\delta = 1$, $r = 4$, $\tau = 2$ in Theorem 1:
$$\sup_{z \in \mathbb{R}} \left| P\!\left( \frac{\hat f_n(x) - E(f_n(x))}{\sqrt{\operatorname{Var}(f_n(x))}} \le z \right) - \Phi(z) \right| = O\Bigl( n^{(2c-2b+a)/4} + n^{(3b-b\iota+a-3)/6} + n^{(2b+a-2)/4} + n^{(3c-c\iota+a-3)/3} + n^{b+a-1} + n^{(3b+4a-3)/3} + n^{(3+2a-4c\iota-4b)/12} \Bigr).$$
The terms involving the mixing coefficient $\alpha(n) = O(n^{-\iota})$ have exponents that become more negative as $\iota$ grows; when $\iota > 3$, all exponents are negative, and the first exponent, $(2c-2b+a)/4$, is the one closest to zero under the stated parameter constraints. The remaining terms are of lower order. Hence the first term dominates, which gives the stated rate.
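The exponent comparison can be checked numerically. The following sketch evaluates the seven exponents at the Section 4 parameter choices $a = 0.4$, $b = 0.45$, $c = 0.23$ and an illustrative mixing rate $\iota = 4$ (the sign conventions are our reading of the display above, and the variable names are ours):

```python
# Exponents of the seven bound terms in the proof of Corollary 1,
# evaluated at a = 0.4, b = 0.45, c = 0.23 and iota = 4.
a, b, c, iota = 0.4, 0.45, 0.23, 4.0
exps = [
    (2*c - 2*b + a) / 4,             # (nu_n / (mu_n h_n^{1/2}))^{1/2} term
    (3*b - b*iota + a - 3) / 6,      # big-block mixing term
    (2*b + a - 2) / 4,               # remainder-block term
    (3*c - c*iota + a - 3) / 3,      # small-block mixing term
    b + a - 1,                       # moment term (delta = 1, r = 4)
    (3*b + 4*a - 3) / 3,             # moment term (r = 4, tau = 2)
    (3 + 2*a - 4*c*iota - 4*b) / 12, # characteristic-function term
]
dominant = max(exps)                 # slowest-decaying (closest to zero)
```

With these values all seven exponents are negative, and the maximum coincides with $(2c-2b+a)/4$, matching the rate claimed in Corollary 1.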
This completes the proof of Corollary 1. □

6. Discussion

The findings of this paper not only advance the theoretical foundation of nonparametric inference for high-dimensional time series but also provide key support for model diagnostics. For instance, in model diagnostics the bounds can be used to verify the distributional assumptions of loan time-series models against reference benchmarks, to assess the robustness of key parameters, and to offer a quantitative basis for model iteration and updating, thereby improving the accuracy of loan risk control and the reliability of model applications.

7. Conclusions

This paper derives explicit Berry–Esseen bounds for residual kernel density estimators in AR(1) models with $\alpha$-mixing errors. Under our assumptions, we obtain Kolmogorov-distance bounds between the standardized estimator and its Gaussian limit; these bounds incorporate the bandwidth, block parameters, and mixing intensity. A key corollary quantifies the convergence rate as $O(n^{(2c-2b+a)/4})$, approaching the near-optimal $O(n^{-1/4})$ when the mixing coefficient decays sufficiently fast, advancing the theoretical foundation for nonparametric inference in high-dimensional time series. Monte Carlo simulations with sample sizes $n = 50, 100, 200$ verify the estimator's asymptotic normality and the monotonic decrease of the Berry–Esseen bound, consistent with the theoretical rate and confirming the rationality of the parameter constraints. Future work may extend the results to higher-order AR models, relax the $\alpha$-mixing assumption to long-memory processes, or incorporate data-driven bandwidth selection to broaden applicability.

Author Contributions

Methodology, J.W.; Writing—original draft, J.W.; Writing—review and editing, T.L.; Funding acquisition, T.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Jilin Provincial Natural Science Foundation Youth Development Program (Grant No. YDZJ202301ZYTS373).

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

We thank the editorial team for their careful and efficient handling of the submission, and we are grateful to colleagues who provided methodological insights and data-management support during the preparation of this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Histograms of $P_n$ corresponding to sample sizes of 50, 100, and 200.
Figure 2. Quantile–Quantile plots of $P_n$ corresponding to sample sizes of 50, 100, and 200.
Table 1. Berry–Esseen bounds under different sample sizes.

n                              50        100       200
|P(P_n ≤ z) − Φ(z)|            0.0849    0.0664    0.0422
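The monotone decrease reported in Table 1 can be checked mechanically. The snippet below also forms the successive ratios of the bounds, which may be compared informally against the $2^{-1/4}\approx 0.84$ shrinkage that a pure $n^{-1/4}$ rate would predict each time $n$ doubles; this comparison is illustrative only.

```python
# Table 1 values: empirical Kolmogorov distances for the
# standardized estimator at increasing sample sizes.
bounds = {50: 0.0849, 100: 0.0664, 200: 0.0422}

ns = sorted(bounds)
# The bounds shrink monotonically as n grows, consistent with convergence.
assert all(bounds[ns[i]] > bounds[ns[i + 1]] for i in range(len(ns) - 1))

# Successive ratios as n doubles (a pure n^{-1/4} rate would give ~0.84).
ratios = [round(bounds[ns[i + 1]] / bounds[ns[i]], 3)
          for i in range(len(ns) - 1)]
print(ratios)
```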

Share and Cite

MDPI and ACS Style

Wang, J.; Liu, T. Berry–Esseen Bounds of Residual Density Estimators in the First-Order Autoregressive Model with the α-Mixing Errors. Mathematics 2026, 14, 73. https://doi.org/10.3390/math14010073

