Article

The Law of the Iterated Logarithm for the Error Distribution Estimator in First-Order Autoregressive Models

Bing Wang, Yi Jin, Lina Wang, Xiaoping Shi and Wenzhi Yang
1 School of Big Data and Statistics, Anhui University, Hefei 230601, China
2 Irving K. Barber Faculty of Science, University of British Columbia, Kelowna, BC V1V 1V7, Canada
* Author to whom correspondence should be addressed.
Axioms 2025, 14(11), 784; https://doi.org/10.3390/axioms14110784
Submission received: 29 August 2025 / Revised: 22 October 2025 / Accepted: 23 October 2025 / Published: 26 October 2025

Abstract

This paper investigates the asymptotic behavior of kernel-based estimators for the error distribution in a first-order autoregressive model with dependent errors. The model assumes that the error terms form an α-mixing sequence with an unknown cumulative distribution function (CDF) and finite second moment. Due to the unobservability of the true errors, we construct kernel-smoothed estimators based on residuals obtained via least squares. Under mild assumptions on the kernel function, bandwidth selection, and mixing coefficients, we establish a law of the iterated logarithm (LIL) for the supremum-norm difference between the residual-based kernel estimator and the true distribution function. The limiting bound is shown to be 1/2, matching the classical LIL for independent samples. To support the theoretical results, simulation studies are conducted to compare the empirical and kernel distribution estimators under various sample sizes and error-term distributions. The kernel estimators demonstrate smoother convergence behavior and improved finite-sample performance. These results contribute to the theoretical foundation for nonparametric inference in autoregressive models with dependent errors and highlight the advantages of kernel smoothing in distribution function estimation under dependence.

1. Introduction

Suppose that the sequence $\{X_i\}$ satisfies the first-order autoregressive process
$$X_i = \rho X_{i-1} + \epsilon_i,$$
where $\epsilon_i$, $i = 0, \pm 1, \pm 2, \dots$, are random errors with mean 0 and variance $\sigma_0^2 > 0$, and they form a weakly stationary $\alpha$-mixing sequence with an unknown cumulative distribution function (CDF) $F$. Henceforth, we use the term stationary to mean weakly stationary. The definition of $\alpha$-mixing will be provided later. It is well known that the parameter $\rho$ characterizes the properties of the process $X_i$. In this paper, we suppose $|\rho| < 1$ and $\sup_{i\ge 1}\mathrm{Var}(X_i) < \infty$. Hence, $X_i$ is a so-called stationary process (see [1]). Throughout this paper, we consider the stationary solution of the first-order autoregressive process, which can be represented as
$$X_i = \sum_{j=0}^{\infty}\rho^{j}\epsilon_{i-j}.$$
If $\epsilon_1, \epsilon_2, \dots, \epsilon_n, \dots$ were observed, then the estimation of their CDF, $F$, can be obtained by the empirical CDF as follows:
$$F_n(t) = \frac{1}{n}\sum_{i=1}^{n} I(\epsilon_i \le t), \quad t \in \mathbb{R}.$$
For any $t$, $F_n(t)$ is the uniformly minimum variance unbiased estimator of $F(t)$ for all continuous distribution functions $F$. It is also well known, by the Glivenko–Cantelli theorem, that
$$\sup_{t\in\mathbb{R}}|F_n(t) - F(t)| \to 0, \quad a.s.,$$
where “a.s.” stands for “almost surely” (see Gut [2]). The law of the iterated logarithm (LIL) for $F_n(t)$, i.e.,
$$\limsup_{n\to\infty}\sqrt{\frac{n}{2\log(\log n)}}\sup_{t\in\mathbb{R}}|F_n(t) - F(t)| = \frac{1}{2}, \quad a.s.,$$
was obtained by Smirnov [3] and, independently, Chung [4]. Cai and Roussas [5] extended the LIL of $F_n(t)$ to the case where $\epsilon_1, \epsilon_2, \dots, \epsilon_n$ are $\alpha$-mixing. Since the empirical distribution function is a step function and thus discontinuous, smooth estimators are often preferred, as they provide continuous and differentiable approximations that are more suitable for statistical inference and further analysis. Therefore, Yamato [6] proposed the following kernel estimator:
$$F_{nh}(t) = \frac{1}{n}\sum_{i=1}^{n}\int_{-\infty}^{t} K_h(u - \epsilon_i)\,du, \quad t \in \mathbb{R},$$
where $K_h(u) = K(u/h)/h$, $h$ is the bandwidth, and $K$ is a kernel probability density function (PDF). Cai and Roussas [5] also obtained the uniform strong rate of convergence $\sup_{t\in\mathbb{R}}|F_{nh}(t) - F(t)| = O\big((\log(\log n)/n)^{1/2}\big)$ a.s., and Cheng [7] obtained the uniform strong rate of convergence $\sup_{t\in\mathbb{R}} n^{\alpha}|F_{nh}(t) - F(t)| \to 0$ a.s. for any $0 \le \alpha < 1/2$.
But in the autoregressive model (1), only the random variables $X_1, X_2, \dots, X_n$ are observed. Therefore, in order to obtain an estimator of $F(t)$, we modify the definition of $F_{nh}(t)$ in (4) by plugging in the residuals
$$\hat\epsilon_i = X_i - \hat\rho_n X_{i-1}, \quad 1 \le i \le n,$$
where $\hat\rho_n = \hat\rho_n(X_1, \dots, X_n)$ is an estimator of $\rho$. For example, the least squares estimator of $\rho$ is
$$\hat\rho_n = \frac{\sum_{i=1}^{n} X_i X_{i-1}}{\sum_{i=1}^{n} X_{i-1}^2}.$$
Therefore, the residual empirical CDF is given by
$$\tilde F_n(t) = \frac{1}{n}\sum_{i=1}^{n} I(\hat\epsilon_i \le t), \quad t \in \mathbb{R},$$
and the residual kernel estimator of $F$ is
$$\hat F_{nh}(t) = \frac{1}{n}\sum_{i=1}^{n}\int_{-\infty}^{t} K_h(u - \hat\epsilon_i)\,du, \quad t \in \mathbb{R}.$$
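To make the construction above concrete, the following minimal sketch (not the authors' code) computes the least squares estimator $\hat\rho_n$, the residuals $\hat\epsilon_i$, and the residual kernel estimator $\hat F_{nh}$ on a grid of $t$ values. It assumes a Gaussian kernel $K$, for which $\int_{-\infty}^{t} K_h(u-\hat\epsilon_i)\,du = \Phi((t-\hat\epsilon_i)/h)$ with $\Phi$ the standard normal CDF; the function name and the default bandwidth (the $n^{-1/4}$ choice used later in the paper) are illustrative assumptions, not prescriptions from the text.

```python
import numpy as np
from scipy.stats import norm

def ar1_residual_kernel_cdf(x, t_grid, h=None):
    """Sketch: least squares AR(1) fit and residual kernel CDF estimator.

    x      : observed series (X_0, X_1, ..., X_n)
    t_grid : points t at which the estimator F_hat_nh(t) is evaluated
    h      : bandwidth; defaults to n^(-1/4), an assumed illustrative choice
    """
    x = np.asarray(x, dtype=float)
    x_lag, x_cur = x[:-1], x[1:]
    n = len(x_cur)
    rho_hat = np.sum(x_cur * x_lag) / np.sum(x_lag ** 2)   # least squares estimator
    resid = x_cur - rho_hat * x_lag                        # residuals
    if h is None:
        h = n ** (-0.25)
    t_grid = np.asarray(t_grid, dtype=float)
    # Gaussian kernel: the smoothed indicator of {eps_i <= t} is Phi((t - eps_i)/h)
    F_hat = norm.cdf((t_grid[:, None] - resid[None, :]) / h).mean(axis=1)
    return rho_hat, resid, F_hat
```

Replacing the residuals by the true errors in the last step would give the infeasible estimator $F_{nh}$, and replacing the smoothed indicator by the plain indicator would give the residual empirical CDF $\tilde F_n$.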
Under i.i.d. errors, Cheng [8] considered the first-order autoregressive model (1) and verified that the Glivenko–Cantelli theorem still holds for the estimators $\tilde F_n(t)$ and $\hat F_{nh}(t)$, i.e.,
$$\sup_{t\in\mathbb{R}}|\tilde F_n(t) - F(t)| \to 0, \quad \sup_{t\in\mathbb{R}}|\hat F_{nh}(t) - F(t)| \to 0, \quad a.s.$$
Beyond the classical LIL under the supremum norm, recent research has extended the LIL framework to other normed settings. In particular, Cheng [9] considered the first-order autoregressive model (1) and obtained the LIL for the integrated absolute error (i.e., the $L_1$-norm) of $\hat F_{nh}$,
$$\limsup_{n\to\infty}\sqrt{\frac{n}{2\log(\log n)}}\int_{-\infty}^{\infty}|\hat F_{nh}(t) - F(t)|\,dF(t) = \frac{\sqrt{3}}{6}, \quad a.s.$$
Motivated by these findings, our work focuses on the LIL for the residual kernel estimator $\hat F_{nh}(t)$ in the first-order autoregressive model (1) with $\alpha$-mixing errors, i.e.,
$$\limsup_{n\to\infty}\sqrt{\frac{n}{2\log(\log n)}}\sup_{t\in\mathbb{R}}|\hat F_{nh}(t) - F(t)| = \frac{1}{2}, \quad a.s.$$
Let us recall the definition of an $\alpha$-mixing sequence.
Definition 1.
Denote $\mathbb{N} = \{1, 2, \dots, n, \dots\}$. Let $\mathcal{F}_m^n = \sigma(\epsilon_i, m \le i \le n, i \in \mathbb{N})$ be the σ-field generated by the random variables $\{\epsilon_m, \epsilon_{m+1}, \dots, \epsilon_n\}$, $1 \le m \le n$. For $n \ge 1$, we define
$$\alpha(n) = \sup_{m\in\mathbb{N}}\ \sup_{A\in\mathcal{F}_1^m,\, B\in\mathcal{F}_{m+n}^{\infty}}|P(AB) - P(A)P(B)|.$$
If $\alpha(n) \to 0$ as $n \to \infty$, then $\{\epsilon_n, n \ge 1\}$ is called a strong mixing or α-mixing sequence.
The $\alpha$-mixing condition is a type of weak dependence commonly used in time series analysis. It measures how quickly the dependence between past and future observations decays as the time gap increases. For more properties of $\alpha$-mixing sequences, one can refer to Györfi et al. [10], Roussas [11], Fan and Yao [12], Wang et al. [13], etc.
For more research on the LIL, one can refer to Gajek et al. [14] and Cheng [15] for $L_p$-norms of the empirical distribution function $F_n(t)$ and the kernel distribution function $F_{nh}(t)$; Li and Wang [16] and Petrov [17] investigated the LIL for sequences of dependent random variables; Liu and Zhang [18] studied the LIL for error density estimators in nonlinear autoregressive models, etc.
In the context of estimating the error distribution function, this paper makes three main contributions. First, we consider a linear first-order autoregressive model in which the autoregressive coefficient $\rho$ must be estimated rather than assumed known. Second, we study kernel-based nonparametric estimators for the error distribution, providing smoother approximations than the empirical distribution function and enabling refined probabilistic analysis. Third, we explicitly allow for dependent errors modeled by a stationary $\alpha$-mixing sequence, extending classical results beyond the independent error setting.
It is worth noting that for the i.i.d. case, the LIL for the empirical distribution function has been well established by Smirnov [3] and Chung [4]. For kernel estimators of the error distribution, related results were obtained by Niu [19] and Cheng [20]. Our work extends these classical results to the dependent error setting, motivated in part by earlier studies on autoregressive processes such as that by Wang et al. [21], bridging the gap between the i.i.d. theory and time series models with weakly dependent innovations.
We also remark that alternative weak dependence structures could be considered. For example, if the errors are independent, the convergence rates simplify and some technical conditions can be relaxed. If $\alpha$-mixing is replaced by other forms of weak dependence, such as $\beta$-mixing, $\rho$-mixing, martingale difference sequences, or the weak dependence framework of Doukhan and Louhichi [22], we expect similar LIL results to hold under appropriate moment and dependence assumptions, though the technical details and proof techniques would require adaptation. A full exploration of these generalizations is beyond the scope of the present paper, but they constitute interesting directions for future research.
The rest of this paper is organized as follows: in Section 2, we list some basic assumptions for the first-order autoregressive model (1). The main results, a uniform strong rate for $\sup_{t\in\mathbb{R}}|\hat F_{nh}(t) - F_{nh}(t)|$ and the LIL of (9), are presented in Section 3. In order to check our results, some simulations are presented in Section 4. Some technical lemmas and proofs are presented in Section 5. Finally, we provide a conclusion and discussion in Section 6. Throughout the paper, we assume that limits are taken as $n \to \infty$ unless otherwise specified. The symbols $C$, $C_1$, and $C_2$ denote positive constants not depending on $n$, which may be different in various places.

2. Assumptions

This section presents the basic assumptions required for model (1) as well as the main theorems.
(A1)
In model (1), let the sequence $\{\epsilon_i, -\infty < i < \infty\}$ of errors form a stationary $\alpha$-mixing sequence with unknown CDF $F$ having a bounded second-order derivative, i.e., there exists a positive constant $C$ ($0 < C < \infty$) such that $|F''(t)| \le C$ for all $t \in \mathbb{R}$. Assume $E\epsilon_1 = 0$ and $E|\epsilon_1|^p < \infty$ for some $p > 2$. The mixing coefficient satisfies $\alpha(n) = O(n^{-r})$, where $r > \frac{5p+2}{p-2}$.
(A2)
The kernel function $K(t)$, as well as $tK(t)$ and $t^2K(t)$, is integrable over the real line and satisfies
$$0 \le \inf_{t\in\mathbb{R}} K(t) \le \sup_{t\in\mathbb{R}} K(t) < \infty, \quad \int_{-\infty}^{+\infty} K(t)\,dt = 1, \quad \int_{-\infty}^{+\infty} tK(t)\,dt = 0, \quad \int_{-\infty}^{+\infty} t^2 K(t)\,dt < \infty.$$
(A3)
The bandwidth $h$ satisfies the following conditions:
$$h \to 0, \quad nh^4/\log(\log n) \to 0, \quad nh^2/(\log n)^2 \to \infty.$$
(A4)
In model (1) with $|\rho| < 1$, let $\hat\rho_n$ be an estimator of $\rho$ with the following almost-sure (a.s.) property: there exists a positive constant $C$ such that
$$|\hat\rho_n - \rho| \le C n^{-1/2}(\log(\log n))^{1/2}, \quad a.s.$$
Remark 1.
We discuss these assumptions as follows:
(i) 
Assumption (A1) concerns the smoothness of the error CDF $F$ and the boundedness of its second derivative $F''$. It also requires that the mixing coefficients satisfy $\alpha(n) = O(n^{-r})$, where $r > \frac{5p+2}{p-2}$ and $p > 2$ ($p$-th moment of the errors). If $p \to 2$, then $r \to \infty$; this is a strong condition on the mixing coefficients, requiring the errors to be nearly asymptotically independent. If $p \to \infty$, then $r \to 5$; this requires a strong $p$-th moment of the errors. In future research, we will try to relax the mixing coefficient condition. For more properties of $\alpha$-mixing sequences, one can refer to Györfi et al. [10], Roussas [11], Fan and Yao [12], Wang et al. [13], etc.
(ii) 
Assumption (A2) is the common condition on the kernel $K(\cdot)$ regarding non-negativity, symmetry, and integrability. Obviously, the Gaussian kernel $K(t) = (2\pi)^{-1/2}\exp(-t^2/2)$ and the Epanechnikov kernel $K(t) = \frac{3}{20\sqrt{5}}(5 - t^2)I(|t| \le \sqrt{5})$ are probability density functions conforming to Assumption (A2). See, for example, Györfi et al. [10], Roussas [11], Fan and Yao [12], Li and Racine [23], etc.
(iii) 
Assumption (A3) is a standard condition for bandwidth selection in kernel distribution estimation. For example, similar conditions are used by Cheng [7,9,15], particularly in the proof of Lemma 5. A commonly used bandwidth is $h = n^{-1/4}$; indeed, with this choice, $nh^4/\log(\log n) = 1/\log(\log n) \to 0$ and $nh^2/(\log n)^2 = n^{1/2}/(\log n)^2 \to \infty$, so (A3) is satisfied.
(iv) 
Assumption (A4) requires that the estimator $\hat\rho_n$ of the autoregressive parameter $\rho$ converges almost surely at the rate $O(n^{-1/2}(\log(\log n))^{1/2})$. This condition plays a key role in our theoretical results, as it ensures that the effect of estimating $\rho$ is asymptotically negligible. Without such a rate, the additional error introduced by the residuals could dominate the behavior of the kernel estimator and invalidate the uniform LIL established in this paper. When $\epsilon_1, \epsilon_2, \dots, \epsilon_n$ are i.i.d. errors, Koul and Zhu [24] considered a generalized M-estimator for $p$-th order autoregression models and obtained a strong rate of convergence for M-estimators $\hat\rho_n$, i.e., $\|\hat\rho_n - \rho\| \le C n^{-1/2}(\log(\log n))^{1/2}$ a.s., including the least squares estimator; Cheng [9] used Assumption (A4) to study the $L_1$-norm of $\hat F_{nh}$ in (8). When $\epsilon_1, \epsilon_2, \dots, \epsilon_n$ in the first-order autoregression model (1) are α-mixing random variables with common density function $f$ and $|\rho| < 1$, Gao et al. [25] used the condition $|\hat\rho_n - \rho| = O_p(n^{-1/2})$ to study the asymptotic normality of the kernel density estimator $\hat f_n(t) = \frac{1}{nh}\sum_{i=1}^{n} K\big(\frac{t - \hat\epsilon_i}{h}\big)$ for the error density function $f$. Wu et al. [26] extended this work to nonlinear autoregressive models. In the proof by Gao et al. [25], moment inequalities were used to establish convergence in probability. However, to extend these results to almost-sure convergence, it would be necessary to employ exponential inequalities. This in turn requires additional conditions on the α-mixing coefficients, ensuring that the covariance structure of the α-mixing sequence satisfies certain summability and decay properties. The proof of such a result is highly technical and involves delicate handling of the dependence structure. Because this is beyond the scope of the present paper, we treat Assumption (A4) as a standing assumption rather than providing a full proof here, and we leave this as an important topic for our future research.

3. Main Results

We now present the first main result of this section, which describes the uniform strong rate of convergence between $\hat F_{nh}(t)$ and $F_{nh}(t)$ in the stationary autoregressive model (1).
Theorem 1.
In the model (1), let Assumptions (A1)–(A4) hold. Then
$$\sqrt{\frac{n}{\log(\log n)}}\sup_{t\in\mathbb{R}}|\hat F_{nh}(t) - F_{nh}(t)| = o(1), \quad a.s.,$$
where $\hat F_{nh}(t)$ is defined in (7) and $F_{nh}(t)$ is defined by (6).
By Theorem 1, one can obtain the uniform strong rate
$$\sup_{t\in\mathbb{R}}|\hat F_{nh}(t) - F_{nh}(t)| = o\big((\log(\log n)/n)^{1/2}\big), \quad a.s.$$
Combining this with the LIL of $F_{nh}(t)$ in Lemma 6, i.e.,
$$\limsup_{n\to\infty}\sqrt{\frac{n}{2\log(\log n)}}\sup_{t\in\mathbb{R}}|F_{nh}(t) - F(t)| = \frac{1}{2}, \quad a.s.,$$
we present the LIL for the residual kernel estimator $\hat F_{nh}(t)$ as follows:
Theorem 2.
Under the same conditions as in Theorem 1, we have
$$\limsup_{n\to\infty}\sqrt{\frac{n}{2\log(\log n)}}\sup_{t\in\mathbb{R}}|\hat F_{nh}(t) - F(t)| = \frac{1}{2}, \quad a.s.$$
Remark 2.
In Theorem 3.2 of their study, Cai and Roussas [5] obtained the LIL for α-mixing samples in the form
$$\limsup_{n\to\infty}\sqrt{\frac{n}{2\log(\log n)}}\sup_{t\in\mathbb{R}}|F_n(t) - F(t)| = 1, \quad a.s.$$
The value 1 is a typo: their estimate of the quantity in (17) was taken to be equal to 1 a.s. In fact, they mistakenly cited [27] (Corollary 1.15.1) and claimed that
$$\limsup_{n\to\infty}\Big[\frac{1}{n}\sqrt{\frac{n}{2\log(\log n)}}\sup_{t\in\mathbb{R}}|K(t, n)|\Big] = 1, \quad a.s.,$$
but the correct statement is
$$\limsup_{n\to\infty}\Big[\frac{1}{n}\sqrt{\frac{n}{2\log(\log n)}}\sup_{t\in\mathbb{R}}|K(t, n)|\Big] = \frac{1}{2}, \quad a.s.$$
(see the proof of Lemma 4). Thus, Theorem 2 extends the LIL to the kernel estimator $\hat F_{nh}(t)$ of the error CDF $F(t)$ in the autoregressive model (1) based on α-mixing errors.

4. Simulations

To verify the theoretical results, we conduct simulation studies to assess the uniform convergence behavior of four distribution estimators under different error distribution settings, namely Gaussian and Gamma. The goal is to compare the accuracy of these estimators in approximating the true error distribution function, particularly in terms of the iterated logarithm convergence rate.
We consider the first-order autoregressive model:
$$X_i = 0.5\,X_{i-1} + \epsilon_i, \quad 1 \le i \le n,$$
where $\{\epsilon_i\}$ is a dependent error sequence following either a Gaussian distribution or a Gamma distribution.
  • Gaussian errors: $\epsilon \sim N(0, \Sigma_n)$, where $\Sigma_n$ is a Toeplitz covariance matrix whose entries decay like $|i-j|^{-3}$, ensuring that the errors form an $\alpha$-mixing sequence. A Toeplitz matrix is a matrix in which each descending diagonal from left to right is constant. For more details about Toeplitz matrices, one can refer to Trench [28]. As discussed in [29], it is easy to check that $\epsilon_1, \epsilon_2, \dots$ are $\alpha$-mixing with $\alpha(n) = O(n^{-3})$.
  • Gamma errors: we adopt a Gaussian copula-based approach to generate random vectors with exact Gamma marginals and an approximately specified covariance matrix. Specifically, we first generate a multivariate normal vector $Z \sim N(0, R)$, where $R$ is the correlation matrix corresponding to the target covariance matrix $\Sigma_n$. Each component of $Z$ is then transformed to a uniform random variable via the standard normal CDF and is subsequently mapped to a Gamma(4,1) random variable using the Gamma inverse CDF. This method guarantees the correct marginal Gamma distribution, while the resulting sample covariance matrix is very close to $\Sigma_n$, with only small differences. An exact covariance match could be achieved using the NORTA (Normal To Anything) method, which iteratively adjusts the Gaussian correlation matrix to reproduce the target covariance exactly after the nonlinear transformation. However, NORTA is computationally intensive, so we do not adopt it here, given that the Gaussian copula-based approach already provides sufficient accuracy and efficiency. (A minimal generation sketch for both error schemes is given after this list.)
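The sketch below illustrates, under stated assumptions, how the two error schemes above can be generated; it is not the authors' code. Because the text leaves the diagonal and lag-one convention of $\Sigma_n$ unspecified, the sketch uses entries $(1+|i-j|)^{-3}$, which keeps the matrix positive definite; the helper names, this diagonal convention, and the centering of the Gamma errors to mean zero are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import norm, gamma

def toeplitz_sigma(n, decay=3.0):
    """Assumed covariance: entries (1 + |i-j|)^(-decay); diagonal equals 1."""
    d = np.abs(np.subtract.outer(np.arange(n), np.arange(n))).astype(float)
    return (1.0 + d) ** (-decay)

def gaussian_errors(n, rng):
    """Dependent Gaussian errors eps ~ N(0, Sigma_n)."""
    L = np.linalg.cholesky(toeplitz_sigma(n))   # Sigma_n = L L^T
    return L @ rng.standard_normal(n)

def gamma_copula_errors(n, rng, shape=4.0, scale=1.0):
    """Gaussian copula: Z ~ N(0, R), U = Phi(Z), eps = F_Gamma^{-1}(U)."""
    sigma = toeplitz_sigma(n)
    d = np.sqrt(np.diag(sigma))
    R = sigma / np.outer(d, d)                  # correlation matrix of Sigma_n
    z = np.linalg.cholesky(R) @ rng.standard_normal(n)
    u = norm.cdf(z)
    eps = gamma.ppf(u, a=shape, scale=scale)    # exact Gamma(4,1) marginals
    return eps - shape * scale                  # centered to mean zero (assumption)
```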
For each sample size $n \in \{300, 600, 2000, 3000, 4000, 5000\}$, we perform 100 Monte Carlo simulations. In each replication, the following four estimators of the error distribution function are constructed:
  • $F_n(t)$: empirical distribution function based on the true errors;
  • $F_{nh}(t)$: kernel-smoothed distribution estimator based on the true errors;
  • $\tilde F_n(t)$: empirical distribution function based on residuals (from least squares estimation);
  • $\hat F_{nh}(t)$: kernel-smoothed distribution estimator based on residuals (from least squares estimation).
To measure the estimation accuracy, we compute the Kolmogorov–Smirnov (KS) distance between each estimator and the true distribution function and normalize it using the theoretical upper bound under the law of the iterated logarithm:
$$NKS = \sqrt{\frac{n}{2\log(\log n)}}\times(\text{KS distance}).$$
Define the NKS for the four estimators as
$$NKS_1 = \sqrt{\frac{n}{2\log(\log n)}}\sup_{t\in\mathbb{R}}|F_n(t) - F(t)|, \quad NKS_2 = \sqrt{\frac{n}{2\log(\log n)}}\sup_{t\in\mathbb{R}}|F_{nh}(t) - F(t)|,$$
$$NKS_3 = \sqrt{\frac{n}{2\log(\log n)}}\sup_{t\in\mathbb{R}}|\tilde F_n(t) - F(t)|, \quad NKS_4 = \sqrt{\frac{n}{2\log(\log n)}}\sup_{t\in\mathbb{R}}|\hat F_{nh}(t) - F(t)|.$$
A Gaussian kernel is used for smoothing, with bandwidth $h = n^{-1/3}$. The estimators are evaluated over the range $t \in [-3, 3]$.
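For concreteness, a minimal sketch of one Monte Carlo replication is given below (again, not the authors' code). It reuses the hypothetical helpers `gaussian_errors` and `ar1_residual_kernel_cdf` from the earlier sketches, approximates the supremum by a maximum over a grid on $[-3, 3]$, and returns the normalized statistic $NKS_4$; the grid size and the initial value $X_0 = 0$ are assumptions.

```python
import numpy as np
from scipy.stats import norm

def nks4_one_replication(n, rng, h=None):
    """One replication: simulate X_i = 0.5 X_{i-1} + eps_i and compute NKS_4."""
    eps = gaussian_errors(n, rng)            # alpha-mixing Gaussian errors (sketched above)
    x = np.zeros(n + 1)                      # X_0 = 0 (assumed initial value)
    for i in range(1, n + 1):
        x[i] = 0.5 * x[i - 1] + eps[i - 1]
    t_grid = np.linspace(-3.0, 3.0, 601)
    _, _, F_hat = ar1_residual_kernel_cdf(x, t_grid, h=h)
    # With unit variances on the diagonal of Sigma_n, the true error CDF is standard normal.
    ks = np.max(np.abs(F_hat - norm.cdf(t_grid)))
    return np.sqrt(n / (2.0 * np.log(np.log(n)))) * ks

# Usage sketch: medians of 100 replications for each sample size.
# rng = np.random.default_rng(0)
# for n in (300, 600, 2000, 3000, 4000, 5000):
#     vals = [nks4_one_replication(n, rng) for _ in range(100)]
#     print(n, np.median(vals))
```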
To investigate the effect of the bandwidth $h$ on the performance of the proposed kernel estimator, we conducted a sensitivity analysis for two representative sample sizes, $n = 600$ and $n = 3000$. For $n = 600$, the bandwidth $h$ was taken from the set $\{0.01, 0.05, 0.10, 0.15, 0.20, 0.202, 0.25, 0.30, 0.35, 0.40\}$, where $h = 0.202$ corresponds exactly to the suggested choice $n^{-1/4}$ with $n = 600$. For $n = 3000$, the bandwidth $h$ was chosen from the set $\{0.01, 0.05, 0.10, 0.135, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40\}$, where $h = 0.135$ corresponds to the suggested choice $n^{-1/4}$ with $n = 3000$.
Figure 1 and Figure 2 present the corresponding boxplots of the statistic $NKS_4$ for these different bandwidths under the Gaussian error setting, with Figure 1 corresponding to the sample size $n = 600$ and Figure 2 corresponding to the sample size $n = 3000$. For $n = 600$, although there is mild fluctuation across different bandwidths, the boxplot corresponding to $h = 0.202 \approx n^{-1/4}$ shows that its median is closer to zero and its interquartile range is under 0.5, indicating that the estimator performs more stably and with less bias at this recommended bandwidth. For $n = 3000$, the variability of $NKS_4$ across bandwidths is smaller; again, the boxplot for $h = 0.135 \approx n^{-1/4}$ is closer to zero, further supporting the theoretical bandwidth selection. These findings confirm that $NKS_4$ is not sensitive to bandwidth variation, and they provide strong empirical evidence that the bandwidth $h = n^{-1/4}$ yields a good kernel estimator of the CDF $F$. Therefore, all subsequent simulation results reported in this paper are based on the bandwidth $h = n^{-1/4}$ to ensure consistency and comparability across experiments. Figure 3, Figure 4, Figure 5 and Figure 6 display boxplots of the NKS across different sample sizes and estimation methods. Each sample size includes 100 simulation replicates per method. The vertical axis represents the NKS, and a horizontal dashed line at level 0.5 is included to indicate the asymptotic upper bound suggested by the law of the iterated logarithm.
Figure 3 presents the NKS distances for the empirical estimator $F_n(t)$ and the kernel estimator $F_{nh}(t)$, both based on the true Gaussian errors. Each boxplot summarizes the distribution of the NKS across 100 replications. We observe that the upper edge of the boxes (the 75th percentile) for both estimators stays well below the 0.5 theoretical boundary, even at the smallest sample size $n = 300$. As the sample size increases, the entire box shrinks downward, indicating uniform improvement in estimator accuracy. The kernel estimator $F_{nh}(t)$ consistently shows a tighter spread and a lower upper quartile than $F_n(t)$, highlighting the advantage of kernel smoothing in reducing estimation variance under Gaussian error structures.
Figure 4 shows the NKS distances for the residual-based estimators $\tilde F_n(t)$ and $\hat F_{nh}(t)$, derived from least squares residuals under the same Gaussian error process. The results are similar to those in Figure 3. The upper quartile of the NKS remains consistently below 0.5 for both methods across all sample sizes. Again, the kernel estimator $\hat F_{nh}(t)$ shows superior concentration, with its upper edge often well below the 0.5 threshold, even at $n = 300$. This suggests that residual-based estimators, particularly when smoothed, maintain excellent uniform convergence properties under Gaussian dependence, nearly matching the performance of error-based methods.
Figure 5 and Figure 6 report the results under Gamma-distributed and dependent errors. Although the Gamma distribution is skewed and heavy-tailed, the overall performance of the estimators remains similar to that observed under Gaussian errors. All estimators exhibit convergence as the sample size increases, and kernel-based methods continue to outperform empirical ones in terms of reduced dispersion and tighter concentration around the theoretical upper bound.
The simulation results confirm the validity of the iterated logarithm law for both error-based and residual-based estimators under dependent data. Among all the methods, the kernel-based estimators consistently demonstrate superior stability and accuracy across different error distributions and sample sizes.

5. Proofs of the Main Results

Lemma 1
(Hall and Heyde [30], Corollary A.2). Suppose that $\xi$ and $\eta$ are random variables that are $\mathcal{F}_1^t$- and $\mathcal{F}_{t+\tau}^{\infty}$-measurable, respectively, $t, \tau \in \mathbb{N}$, and that $E|\xi|^p < \infty$, $E|\eta|^q < \infty$, where $p, q > 1$ and $\frac{1}{p} + \frac{1}{q} < 1$. Then,
$$|E\xi\eta - E\xi E\eta| \le 8\,(E|\xi|^p)^{1/p}(E|\eta|^q)^{1/q}\,\big(\alpha(\tau)\big)^{1 - \frac{1}{p} - \frac{1}{q}}.$$
Lemma 2
(Liebscher [31], Proposition 5.1). Let $\{Z_n\}_{n\ge 1}$ be a stationary α-mixing sequence with mixing coefficient $\alpha(n)$. Assume that $EZ_i = 0$ and $|Z_i| \le S < \infty$ a.s., $i = 1, 2, \dots, n$. Then, for $n, h \in \mathbb{N}$, $0 < h \le n/2$, and every $\varepsilon > 0$,
$$P\Big(\Big|\sum_{i=1}^{n} Z_i\Big| > \varepsilon\Big) \le 4\exp\Big(-\frac{\varepsilon^2}{16\big(nh^{-1}D_h + \tfrac{1}{3}\varepsilon S h\big)}\Big) + 32\,\frac{S}{\varepsilon}\, n\,\alpha(h),$$
where $D_h = \max_{1\le j\le 2h}\mathrm{Var}\big(\sum_{i=1}^{j} Z_i\big)$.
Lemma 3.
In the first-order autoregression (1), let Assumption (A1) hold. Then, for any permutation $\{j_1, j_2, \dots, j_n\}$ of $\{1, 2, \dots, n\}$, we have
$$\max_{1\le m\le n}\Big|\sum_{k=1}^{m} X_{j_k}\Big| = O(n^{1/2}\log n), \quad a.s.$$
Proof of Lemma 3.
Since $|\rho| < 1$, by (2), the stationary process $\{X_i\}_{i=1}^{n}$ in the first-order autoregression (1) has the representation $X_i = \sum_{j=0}^{\infty}\rho^{j}\epsilon_{i-j}$. Let $I_i(m) = 1$ if $i \in \{j_1, \dots, j_m\}$ and $I_i(m) = 0$ otherwise. Then,
$$\max_{1\le m\le n}\Big|\sum_{k=1}^{m} X_{j_k}\Big| \le \max_{1\le m\le n}\Big|\sum_{i=1}^{n} I_i(m)\sum_{j=0}^{n}\rho^{j}\epsilon_{i-j}\Big| + \max_{1\le m\le n}\Big|\sum_{i=1}^{n} I_i(m)\sum_{j>n}\rho^{j}\epsilon_{i-j}\Big|.$$
Since $p > 2$, $E\epsilon_1 = 0$, and $E|\epsilon_1|^p < \infty$, we have $\mathrm{Var}(\epsilon_1) = \sigma_\epsilon^2 < \infty$. According to the Chebyshev inequality and the Cauchy inequality, we have
$$\begin{aligned}
&\sum_{n=1}^{\infty} P\Big(\max_{1\le m\le n}\Big|\sum_{i=1}^{n} I_i(m)\sum_{j>n}\rho^{j}\epsilon_{i-j}\Big| \ge n^{1/2}\log n\Big)
\le \sum_{n=1}^{\infty}\frac{1}{n(\log n)^2}\sum_{m=1}^{n} E\Big(\sum_{i=1}^{n} I_i(m)\sum_{j>n}\rho^{j}\epsilon_{i-j}\Big)^2\\
&\quad\le \sum_{n=1}^{\infty}\frac{n^2\sigma_\epsilon^2}{n(\log n)^2}\Big(\sum_{j>n}\rho^{2j} + 2\sum_{k=1}^{n-1}\sum_{j>n,\, j+k>n}|\rho|^{2j+k}\Big)
\le \sum_{n=1}^{\infty}\frac{\sigma_\epsilon^2}{n(\log n)^2}\Big(n\sum_{j>n}|\rho|^{j}\Big)^2 < \infty,
\end{aligned}$$
by using the fact that $|\rho| < 1$. So, it follows from the Borel–Cantelli Lemma that
$$\max_{1\le m\le n}\Big|\sum_{i=1}^{n} I_i(m)\sum_{j>n}\rho^{j}\epsilon_{i-j}\Big| = O(n^{1/2}\log n), \quad a.s.$$
For $p > 2$, set
$$p_{nl} = \epsilon_l I\big(|\epsilon_l| \le (2n)^{1/p}\big), \qquad q_{nl} = \epsilon_l - p_{nl} = \epsilon_l I\big(|\epsilon_l| > (2n)^{1/p}\big).$$
Then,
$$\begin{aligned}
\max_{1\le m\le n}\Big|\sum_{i=1}^{n} I_i(m)\sum_{j=0}^{n}\rho^{j}\epsilon_{i-j}\Big|
&= \max_{1\le m\le n}\Big|\sum_{l=1-n}^{n}\sum_{i=\max(1,l)}^{\min(n,n+l)} I_i(m)\rho^{i-l}\epsilon_l\Big|\\
&\le \max_{1\le m\le n}\Big|\sum_{l=1-n}^{n}\sum_{i=\max(1,l)}^{\min(n,n+l)} I_i(m)\rho^{i-l}p_{nl}\Big| + \max_{1\le m\le n}\Big|\sum_{l=1-n}^{n}\sum_{i=\max(1,l)}^{\min(n,n+l)} I_i(m)\rho^{i-l}q_{nl}\Big|.
\end{aligned}$$
Observe that
$$\max_{1\le m\le n}\Big|\sum_{l=1-n}^{n}\sum_{i=\max(1,l)}^{\min(n,n+l)} I_i(m)\rho^{i-l}q_{nl}\Big| \le \sum_{j=0}^{\infty}|\rho|^{j}\sum_{l=1-n}^{n}|\epsilon_l| I\big(|\epsilon_l| > (2n)^{1/p}\big) \le C\sum_{i=1}^{2n}|\epsilon_{i-n}| I\big(|\epsilon_{i-n}| > (2n)^{1/p}\big).$$
In the following, we show that $\sum_{i=1}^{2n}|\epsilon_{i-n}| I\big(|\epsilon_{i-n}| > (2n)^{1/p}\big)$ converges to a positive and finite random variable as $n \to \infty$. Let $\{a_n, n \ge 1\}$ be a positive constant sequence with $\sum_{n=1}^{\infty} a_n < \infty$, and let $\{\xi_n, n \ge 1\}$ be a sequence of random variables satisfying
$$\sum_{n=1}^{\infty} P\big(|\xi_{n+1} - \xi_n| > a_n\big) < \infty.$$
Then $\xi_n \to \xi$ a.s. as $n \to \infty$, where $\xi$ is a positive and finite random variable. For more details, see Wang [32] (Proposition 2.4.2). Note that $E|\epsilon_1|^p < \infty$ if and only if $\sum_{i=1}^{\infty} P\big(|\epsilon_1| > i^{1/p}\big) < \infty$. Let $\xi_n = \sum_{i=1}^{2n}|\epsilon_{i-n}| I\big(|\epsilon_{i-n}| > (2n)^{1/p}\big)$ and $a_n = 1/n^2$, which satisfies $\sum_{n=1}^{\infty} a_n < \infty$. Then we have
$$\begin{aligned}
\sum_{n=1}^{\infty} P\big(|\xi_{n+1} - \xi_n| > a_n\big)
&= \sum_{n=1}^{\infty} P\Big(\Big|\sum_{i=1}^{2n+2}|\epsilon_{i-n-1}| I\big(|\epsilon_{i-n-1}| > (2n+2)^{1/p}\big) - \sum_{i=1}^{2n}|\epsilon_{i-n}| I\big(|\epsilon_{i-n}| > (2n)^{1/p}\big)\Big| > a_n\Big)\\
&\le \sum_{n=1}^{\infty} P\Big(\Big|\sum_{i=1}^{2n+2}|\epsilon_{i-n-1}| I\big(|\epsilon_{i-n-1}| > (2n)^{1/p}\big) - \sum_{i=1}^{2n}|\epsilon_{i-n}| I\big(|\epsilon_{i-n}| > (2n)^{1/p}\big)\Big| > a_n\Big)\\
&= \sum_{n=1}^{\infty} P\Big(|\epsilon_{-n}| I\big(|\epsilon_{-n}| > (2n)^{1/p}\big) + |\epsilon_{n+1}| I\big(|\epsilon_{n+1}| > (2n)^{1/p}\big) > a_n\Big)\\
&\le \sum_{n=1}^{\infty}\Big[P\Big(|\epsilon_{-n}| I\big(|\epsilon_{-n}| > (2n)^{1/p}\big) > \frac{a_n}{2}\Big) + P\Big(|\epsilon_{n+1}| I\big(|\epsilon_{n+1}| > (2n)^{1/p}\big) > \frac{a_n}{2}\Big)\Big]\\
&= \sum_{n=1}^{\infty}\Big[P\Big(|\epsilon_{-n}| > \frac{a_n}{2},\ |\epsilon_{-n}| > (2n)^{1/p}\Big) + P\Big(|\epsilon_{n+1}| > \frac{a_n}{2},\ |\epsilon_{n+1}| > (2n)^{1/p}\Big)\Big]\\
&\le \sum_{n=1}^{\infty}\Big[P\big(|\epsilon_{-n}| > (2n)^{1/p}\big) + P\big(|\epsilon_{n+1}| > (2n)^{1/p}\big)\Big]
\le 2\sum_{n=1}^{\infty} P\big(|\epsilon_1| > n^{1/p}\big) < \infty.
\end{aligned}$$
Then there exists a positive and finite random variable $\xi$ such that $\xi_n = \sum_{i=1}^{2n}|\epsilon_{i-n}| I\big(|\epsilon_{i-n}| > (2n)^{1/p}\big) \to \xi$, a.s., as $n \to \infty$. Consequently, we can obtain
$$\max_{1\le m\le n}\Big|\sum_{l=1-n}^{n}\sum_{i=\max(1,l)}^{\min(n,n+l)} I_i(m)\rho^{i-l}q_{nl}\Big| = O(n^{1/2}\log n), \quad a.s.$$
In addition, since $E\epsilon_1 = 0$ and $E|\epsilon_1|^p < \infty$, we have
$$\max_{1\le m\le n}\Big|\sum_{l=1-n}^{n}\sum_{i=\max(1,l)}^{\min(n,n+l)} I_i(m)\rho^{i-l} E p_{nl}\Big| \le C\sum_{j=0}^{\infty}|\rho|^{j}\sum_{l=1-n}^{n} E|\epsilon_l| I\big(|\epsilon_l| > (2n)^{1/p}\big) = O(n^{1/2}\log n).$$
To prove (10), we will show that
$$\max_{1\le m\le n}\Big|\sum_{l=1-n}^{n}\sum_{i=\max(1,l)}^{\min(n,n+l)} I_i(m)\rho^{i-l}\big(p_{nl} - E p_{nl}\big)\Big| = O(n^{1/2}\log n), \quad a.s.$$
Note that
$$\max_{1-n\le l\le n}\Big|\sum_{i=\max(1,l)}^{\min(n,n+l)} I_i(m)\rho^{i-l}\big(p_{nl} - E p_{nl}\big)\Big| \le \sum_{j=0}^{\infty}|\rho|^{j}\cdot 2(2n)^{1/p} \le C_1 n^{1/p}, \quad a.s.,$$
and by Assumption (A1) and Lemma 1, we have
$$\mathrm{Var}\Big(\sum_{l=1-k}^{k}\sum_{i=\max(1,l)}^{\min(n,n+l)} I_i(m)\rho^{i-l}\big(p_{nl} - E p_{nl}\big)\Big) \le \Big(1 + \sum_{i=1}^{\infty}\big(\alpha(i)\big)^{\frac{p-2}{p}}\Big)\big(E|\epsilon_1|^{p}\big)^{2/p}\Big(\sum_{j=0}^{\infty}|\rho|^{j}\Big)^{2}\, 2k \le C_2 k, \quad \text{for } k \le n.$$
It implies that
$$D_h = \max_{1\le k\le h}\mathrm{Var}\Big(\sum_{l=1-k}^{k}\sum_{i=\max(1,l)}^{\min(n,n+l)} I_i(m)\rho^{i-l}\big(p_{nl} - E p_{nl}\big)\Big) \le C_3 h,$$
where $0 < h \le n$.
where 0 < h n . Since p > 2 and r > 5 p + 2 p 2 according to Assumption (A1), it has 3 / 2 + 1 / p ( 1 / 2 1 / p ) r < 1 . Then, by α ( n ) = O ( n r ) , we take h = n 1 / 2 1 / p in Lemma 2 and have that
n = 1 P max 1 m n l = 1 n n i = max ( 1 , l ) min ( n , n + l ) I i ( m ) ρ i l ( p n l E p n l ) C n 1 / 2 log n n = 1 m = 1 n P l = 1 n n i = max ( 1 , l ) min ( n , n + l ) I i ( m ) ρ i l ( p n l E p n l ) C n 1 / 2 log n n = 1 4 n exp C 2 n ( log n ) 2 16 n h 1 D h + 1 3 C n 1 / 2 log n C 1 n 1 / p h 1 + 32 n C 1 n 1 / p C n 1 / 2 log n n α ( h ) n = 1 4 n exp C 2 n ( log n ) 2 16 C 4 n + C 5 n log n 1 + C 6 n 2 + 1 / p n 1 / 2 log n n ( 1 / 2 1 / p ) r n = 1 4 n exp C 7 log n + C 6 n 3 / 2 + 1 / p ( 1 / 2 1 / p ) r log 1 n < ,
for some positive C 7 > 2 . So, by the Borel–Cantelli Lemma, the proof of (11) is completed.
Remark 3.
Gao [33] and Sun et al. [34] considered a moving-average process, denoted by MA($\infty$), i.e.,
$$X_n = \sum_{l=0}^{\infty}\phi_l\varepsilon_{n-l},$$
with $\sum_{l=0}^{\infty}|\phi_l| < \infty$ and i.i.d. errors $\{\varepsilon_n\}$; they obtained that for any permutation $(l_1, \dots, l_n)$ of $(1, \dots, n)$,
$$\max_{1\le k\le n}\Big|\sum_{i=1}^{k} X_{l_i}\Big| = O(n^{1/2}\log n), \quad a.s.$$
Lemma 4.6 of Liang et al. [35] extended the result (13) for the linear process (12) from the i.i.d. case to the negatively associated case. In Lemma 3, we extend the result (13) to the stationary process of the first-order autoregression model (1) with α-mixing errors.
Before proceeding, we recall the definition of the Kiefer process [36].
Definition 2.
A separable Gaussian process $K(s, t)$ on $[0, 1]\times[0, \infty)$ is called a Kiefer process if it satisfies
$$K(0, t) = K(1, t) = K(s, 0) = 0, \quad 0 \le s \le 1,\ t \ge 0,$$
with mean zero,
$$E[K(s, t)] = 0,$$
and covariance function
$$E\{K(s, t) K(s', t')\} = \min(t, t')\,\Gamma(s, s'),$$
where $\Gamma(s, s')$ is the covariance function of a separable Gaussian process $G(s)$ on $[0, 1]$ with $G(0) = G(1) = 0$.
The Kiefer process generalizes the Brownian bridge to two parameters and naturally arises in the asymptotic study of empirical and residual processes, including LIL.
Lemma 4.
Let $\epsilon_1, \epsilon_2, \dots, \epsilon_n$ be an α-mixing sample with CDF $F$, and let Assumption (A1) hold. Then the LIL for the empirical distribution function $F_n(t)$ is
$$\limsup_{n\to\infty}\sqrt{\frac{n}{2\log(\log n)}}\sup_{t\in\mathbb{R}}|F_n(t) - F(t)| = \frac{1}{2}, \quad a.s.,$$
where $F_n(t)$ is defined in (3).
Proof of Lemma 4.
Similarly to Cai and Roussas [5], define the empirical process
$$R(t, s) = \sum_{k\le s} g_k(t), \quad t \in \mathbb{R},\ s \ge 0,$$
where $g_k(t) = I(\epsilon_k \le t) - F(t)$. Then the covariance function is
$$\Gamma(t, t') = E[g_1(t)g_1(t')] + \sum_{k=2}^{\infty} E[g_1(t)g_k(t')] + \sum_{k=2}^{\infty} E[g_k(t)g_1(t')], \quad t, t' \in \mathbb{R}.$$
On an enriched probability space, by Theorem 3 of Dhompongsa [36], we can redefine the process so that it admits a Kiefer process $\{K(t, s), t \in \mathbb{R}, s \ge 0\}$ with covariance
$$E[K(t, s)K(t', s')] = \Gamma(t, t')\min(s, s')$$
such that, for some $\lambda > 0$ depending only on $r$,
$$\sup_{0\le s\le T}\sup_{t\in\mathbb{R}}|R(t, s) - K(t, s)| = O\big(T^{1/2}(\log T)^{-\lambda}\big), \quad a.s., \quad T > 0.$$
On the one hand, it follows from (14) and (15) that
$$\limsup_{n\to\infty}\sqrt{\frac{n}{2\log(\log n)}}\sup_{t\in\mathbb{R}}|F_n(t) - F(t)| = \limsup_{n\to\infty}\frac{1}{n}\sqrt{\frac{n}{2\log(\log n)}}\sup_{t\in\mathbb{R}}|K(t, n)|,$$
(see Cai and Roussas [5]). On the other hand, using Corollary 1.15.1 in [27], one obtains
$$\limsup_{n\to\infty}\frac{1}{n}\sqrt{\frac{n}{2\log(\log n)}}\sup_{t\in\mathbb{R}}|K(t, n)| = \frac{1}{2}, \quad a.s.$$
Combining (16) with (17), we complete the proof of Lemma 4. □
Lemma 5.
In model (1), let Assumptions (A1)–(A3) be satisfied. Then
$$\sup_{t\in\mathbb{R}}\sqrt{\frac{n}{\log(\log n)}}\,|F_{nh}(t) - F_n(t)| \to 0, \quad a.s.$$
Proof of Lemma 5.
Cheng [15] used the Kiefer process to obtain (18) under the i.i.d. sample assumption. By the Kiefer process, Cai and Roussas [5] obtained the uniform approximation of the empirical process $F_n(t)$ to $F(t)$ under α-mixing samples (see Lemma 4). It can be seen that the proof of (18) in Cheng [15] also holds for the α-mixing case. □
Lemma 6.
In model (1), let Assumptions (A1)–(A3) be satisfied. Then
$$\limsup_{n\to\infty}\sqrt{\frac{n}{2\log(\log n)}}\sup_{t\in\mathbb{R}}|F_{nh}(t) - F(t)| = \frac{1}{2}, \quad a.s.$$
Proof of Lemma 6.
By the triangle inequality,
$$|F_{nh}(t) - F(t)| \le |F_{nh}(t) - F_n(t)| + |F_n(t) - F(t)|,$$
$$|F_n(t) - F(t)| \le |F_{nh}(t) - F_n(t)| + |F_{nh}(t) - F(t)|.$$
Thus,
$$\sup_{t\in\mathbb{R}}|F_{nh}(t) - F(t)| \le \sup_{t\in\mathbb{R}}|F_{nh}(t) - F_n(t)| + \sup_{t\in\mathbb{R}}|F_n(t) - F(t)|,$$
$$\sup_{t\in\mathbb{R}}|F_n(t) - F(t)| \le \sup_{t\in\mathbb{R}}|F_{nh}(t) - F_n(t)| + \sup_{t\in\mathbb{R}}|F_{nh}(t) - F(t)|.$$
Combining (20) and (21) with Lemmas 4 and 5, we obtain the result of (19). □
Proof of Theorem 1.
From (1) and (5), it follows that $\hat\epsilon_i - \epsilon_i = -(\hat\rho_n - \rho)X_{i-1}$, $1 \le i \le n$, with $X_0 = 0$. Substituting this into the definitions of $F_{nh}(t)$ and $\hat F_{nh}(t)$ and using the mean-value theorem, we obtain
$$|\hat F_{nh}(t) - F_{nh}(t)| = \Big|\frac{1}{n}\sum_{i=1}^{n}\Big[\int_{-\infty}^{t} K_h(u - \hat\epsilon_i)\,du - \int_{-\infty}^{t} K_h(u - \epsilon_i)\,du\Big]\Big| = \Big|\frac{1}{n}\sum_{i=1}^{n}\frac{(\hat\rho_n - \rho)X_{i-1}}{h}K(\eta_{it})\Big| \le \frac{|\hat\rho_n - \rho|}{nh}\Big|\sum_{i=1}^{n} X_{i-1}K(\eta_{it})\Big|,$$
where $K(\cdot)$ is defined in Assumption (A2) and $\eta_{it}$ is a point between $\frac{t - \hat\epsilon_i}{h}$ and $\frac{t - \epsilon_i}{h}$. We now consider the term $\big|\sum_{i=1}^{n} X_{i-1}K(\eta_{it})\big|$ in (22). Let $Y_{it} = K(\eta_{it})$, $1 \le i \le n$. By Assumption (A2), $Y_{1t}, \dots, Y_{nt}$ are non-negative and bounded. So we sort $Y_{1t}, \dots, Y_{nt}$ in decreasing order $Y_{(1t)} \ge \dots \ge Y_{(nt)}$ and denote the correspondingly rearranged $X_0, \dots, X_{n-1}$ by $X_{0,*}, \dots, X_{n-1,*}$. One can then use Abel's inequality (see [37]) to establish that
$$\Big|\sum_{i=1}^{n} X_{i-1}K(\eta_{it})\Big| = \Big|\sum_{i=1}^{n} Y_{it}X_{i-1}\Big| = \Big|\sum_{i=1}^{n} Y_{(it)}X_{i-1,*}\Big| \le \max_{1\le k\le n}|Y_{(kt)}|\,\max_{1\le k\le n}\Big|\sum_{i=1}^{k} X_{i-1,*}\Big| \le \sup_{t\in\mathbb{R}}K(t)\,\max_{1\le k\le n}\Big|\sum_{i=1}^{k} X_{i-1,*}\Big|.$$
Thus, it follows from Assumption (A4), Lemma 3, and (23) that
$$\sup_{t\in\mathbb{R}}|\hat F_{nh}(t) - F_{nh}(t)| \le \frac{|\hat\rho_n - \rho|}{nh}\sup_{t\in\mathbb{R}}\Big|\sum_{i=1}^{n} X_{i-1}K(\eta_{it})\Big| = \frac{O\big(n^{-1/2}(\log(\log n))^{1/2}\big)}{nh}\, O\big(n^{1/2}\log n\big) = O\Big(\sqrt{\frac{(\log n)^2}{nh^2}}\Big)\sqrt{\frac{\log(\log n)}{n}} = o\Big(\sqrt{\frac{\log(\log n)}{n}}\Big), \quad a.s.,$$
by using the condition $nh^2/(\log n)^2 \to \infty$ in Assumption (A3). Consequently,
$$\sqrt{\frac{n}{\log(\log n)}}\sup_{t\in\mathbb{R}}|\hat F_{nh}(t) - F_{nh}(t)| = o(1), \quad a.s.,$$
i.e., the conclusion of Theorem 1 holds. □
Proof of Theorem 2.
Similarly to the proof of Lemma 6, by the triangle inequality,
$$|\hat F_{nh}(t) - F(t)| \le |\hat F_{nh}(t) - F_{nh}(t)| + |F_{nh}(t) - F(t)|,$$
$$|F_{nh}(t) - F(t)| \le |\hat F_{nh}(t) - F_{nh}(t)| + |\hat F_{nh}(t) - F(t)|,$$
so that
$$\sup_{t\in\mathbb{R}}|\hat F_{nh}(t) - F(t)| \le \sup_{t\in\mathbb{R}}|\hat F_{nh}(t) - F_{nh}(t)| + \sup_{t\in\mathbb{R}}|F_{nh}(t) - F(t)|,$$
$$\sup_{t\in\mathbb{R}}|F_{nh}(t) - F(t)| \le \sup_{t\in\mathbb{R}}|\hat F_{nh}(t) - F_{nh}(t)| + \sup_{t\in\mathbb{R}}|\hat F_{nh}(t) - F(t)|.$$
Combining Lemma 6 with Theorem 1, we complete the proof of Theorem 2. □

6. Conclusions

In this paper, we investigated the asymptotic behavior of kernel estimators for the error distribution in the first-order autoregressive model (1) when the error sequence is a stationary $\alpha$-mixing process with an unknown distribution. Due to the unobservable nature of the true errors, we proposed a residual kernel estimator constructed from residuals and examined its convergence properties under a set of mild regularity conditions.
Our main theoretical contribution lies in establishing the LIL for the residual kernel distribution estimator. Specifically, we proved that under appropriate mixing, moment, and bandwidth conditions, the supremum norm of the deviation between the residual kernel estimator and the true error distribution function satisfies
$$\limsup_{n\to\infty}\sqrt{\frac{n}{2\log(\log n)}}\sup_{t\in\mathbb{R}}|\hat F_{nh}(t) - F(t)| = \frac{1}{2}, \quad a.s.,$$
which parallels the classical LIL for i.i.d. sequences.
Moreover, we derived an intermediate result demonstrating that the difference between the residual kernel estimator and its infeasible counterpart based on the true errors converges to zero faster than the iterated-logarithm rate $(\log(\log n)/n)^{1/2}$. This ensures that the influence of estimation error in the residuals is asymptotically negligible under our conditions. These theoretical findings were further corroborated by extensive simulation studies under both Gaussian and Gamma distributed α-mixing errors, which consistently showed that kernel estimators exhibit smoother behavior and smaller fluctuations compared to their empirical counterparts, especially in moderate-to-large sample sizes.
In addition to the simulation study, our proposed method has strong potential for practical applications in real data analysis. For instance, in economics and finance, AR(1) models are widely used to model interest rates, exchange rates, and stock returns, where the error terms often exhibit temporal dependence and heavy-tailed behavior. The residual-based kernel estimator provides a flexible tool for accurately estimating the error distribution in such contexts, which can improve model diagnostics, forecasting, and risk management. Future work will include applying this methodology to real-world datasets, such as macroeconomic indicators or climate time series, to demonstrate its practical utility.
The proposed approach has several strengths. It extends classical nonparametric distribution estimation to dependent data settings while maintaining desirable asymptotic properties. The kernel smoothing step improves finite-sample stability and produces smoother estimates that are more suitable for inference and visualization. However, the method also has limitations. For example, the theoretical results rely on the assumption of α-mixing errors with specific decay rates, which may not hold for certain strongly dependent processes. Additionally, while the study focuses on AR(1) models, many practical systems are better described by higher-order AR(p), MA(q), and ARMA(p, q) processes.
An important open question is how to extend the present results to general AR(p), MA(q), and ARMA(p, q) processes. The main difficulty lies in handling the increased complexity of parameter estimation and residual dependence when multiple lagged terms are present. Developing a unified theoretical framework for these models would significantly broaden the applicability of the method and provide deeper insights into the asymptotic behavior of residual-based estimators. Other promising directions include exploring data-driven bandwidth selection procedures, incorporating robust kernel functions to handle outliers, and applying the methodology to multivariate or nonlinear time series.
In summary, the residual kernel estimator studied in this paper provides a theoretically sound and practically useful tool for error distribution estimation in AR(1) models with dependent errors. With further research and development, it has the potential to be extended to more complex autoregressive structures and to offer valuable insights for both theoretical and applied settings.

Author Contributions

Supervision W.Y.; software B.W. and Y.J.; writing—original draft preparation, L.W., X.S. and W.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China (12301327) and the Anhui Province University Research Project (2023AH050096).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Brockwell, P.J.; Davis, R.A. Time Series: Theory and Methods, 2nd ed.; Springer: New York, NY, USA, 1991.
  2. Gut, A. Probability: A Graduate Course, 2nd ed.; Springer: New York, NY, USA, 2013.
  3. Smirnov, N.V. An approximation to distribution laws of random quantities determined by empirical data. Uspekhi Mat. Nauk. 1944, 10, 179–206.
  4. Chung, K.L. An estimate concerning the Kolmogorov limit distribution. Am. Math. Soc. 1949, 67, 36–50.
  5. Cai, Z.W.; Roussas, G.G. Uniform strong estimation under α-mixing, with rates. Stat. Probab. Lett. 1992, 15, 47–55.
  6. Yamato, H. Uniform convergence of an estimator of a distribution function. Bull. Math. Statist. 1973, 15, 69–78.
  7. Cheng, F. Strong uniform consistency rates of kernel estimators of cumulative distribution functions. Commun. Stat. Theory Methods 2017, 46, 6803–6807.
  8. Cheng, F. Glivenko-Cantelli Theorem for the kernel error distribution estimator in the first-order autoregressive model. Stat. Probab. Lett. 2018, 139, 95–102.
  9. Cheng, F. The integrated absolute error of the kernel error distribution estimator in the first-order autoregression model. Stat. Probab. Lett. 2024, 214, 110215.
  10. Györfi, L.; Härdle, W.; Vieu, P. Nonparametric Curve Estimation from Time Series; Springer: New York, NY, USA, 1989.
  11. Roussas, G.G. Nonparametric regression estimation under mixing conditions. Stoch. Process. Their Appl. 1990, 36, 107–116.
  12. Fan, J.Q.; Yao, Q.W. Nonlinear Time Series: Nonparametric and Parametric Methods; Springer: New York, NY, USA, 2003.
  13. Wang, J.Y.; Liu, R.; Cheng, F.X.; Yang, L.J. Oracally efficient estimation of autoregressive error distribution with simultaneous confidence band. Ann. Statist. 2014, 42, 654–668.
  14. Gajek, L.; Kahszka, M.; Lenic, A. The law of the iterated logarithm for Lp-norms of empirical processes. Stat. Probab. Lett. 1996, 28, 107–110.
  15. Cheng, F. The law of the iterated logarithm for Lp-norms of kernel estimators of cumulative distribution functions. Mathematics 2024, 12, 1063.
  16. Li, Y.X.; Wang, J.F. The law of the iterated logarithm for positively dependent random variables. J. Math. Anal. Appl. 2008, 339, 259–265.
  17. Petrov, V.V. On the law of the iterated logarithm for sequences of dependent random variables. Vestnik St. Petersb. Univ. Math. 2017, 50, 32–34.
  18. Liu, T.Z.; Zhang, Y. Law of the iterated logarithm for error density estimators in nonlinear autoregressive models. Commun. Stat. Theory Methods 2019, 49, 1082–1098.
  19. Niu, S.L. LIL for kernel estimator of error distribution in regression model. J. Korean Math. Soc. 2007, 44, 1082–1098.
  20. Cheng, F. A law of the iterated logarithm for error density estimator in censored linear regression. J. Nonparametr. Stat. 2022, 34, 283–298.
  21. Wang, Y.; Mao, M.; Hu, X.; He, T. The Law of Iterated Logarithm for Autoregressive Processes. Math. Probl. Eng. 2014, 2014, 972712.
  22. Doukhan, P.; Louhichi, S. A new weak dependence condition and applications to moment inequalities. Stochastic Processes Appl. 1999, 84, 312–342.
  23. Li, Q.; Racine, J.S. Nonparametric Econometrics: Theory and Practice; Princeton University Press: Princeton, NJ, USA, 2007.
  24. Koul, H.L.; Zhu, Z.W. Bahadur-Kiefer representations for M-estimators in autoregression models. Stochastic Process. Appl. 1995, 57, 167–189.
  25. Gao, M.; Yang, W.Z.; Wu, S.P.; Yu, W. Asymptotic normality of residual density estimator in stationary and explosive autoregressive models. Comput. Stat. Data An. 2022, 175, 107549.
  26. Wu, S.P.; Yang, W.Z.; Gao, M.; Fang, H.Y. Asymptotic results of error density estimator in nonlinear autoregressive models. J. Korean Stat. Soc. 2024, 53, 563–582.
  27. Csörgo, M.; Révész, P. Strong Approximation in Probability and Statistics; Academic Press: New York, NY, USA, 1981.
  28. Gray, R.M. On the asymptotic eigenvalue distribution of Toeplitz matrices. IEEE Trans. Inf. Theory 1972, 18, 725–730.
  29. Withers, C.S. Conditions for linear processes to be strong-mixing. Z. Wahrscheinlichkeitstheorie Verw. Gebiete 1981, 57, 477–480.
  30. Hall, P.; Heyde, C.C. Martingale Limit Theory and Its Application; Academic Press: New York, NY, USA, 1980.
  31. Liebscher, E. Estimation of the density and the regression function under mixing conditions. Stat. Decis. 2001, 19, 9–126.
  32. Wang, J.G. Fundamentals of Modern Probability Theory; Fudan University Press: Shanghai, China, 2005.
  33. Gao, J.T. Asymptotic theory for partly linear models. Commun. Stat. Theory Methods 1995, 24, 1985–2009.
  34. Sun, X.Q.; You, J.H.; Chen, G.M.; Zhou, X. Convergence rates of estimators in partial linear regression models with MA(∞) error process. Commun. Stat. Theory Methods 2002, 31, 2251–2273.
  35. Liang, H.Y.; Mammitzsch, V.; Steinebach, J. On a semiparametric regression model whose errors form a linear process with negatively associated innovations. Statistics 2006, 40, 207–226.
  36. Dhompongsa, S. A note on the almost sure approximation of the empirical process of weakly dependent random vectors. Yokohama Math. J. 1984, 32, 113–121.
  37. Mitrinovic, D.S. Analytic Inequalities; Springer: New York, NY, USA, 1970.
Figure 1. Boxplot of $NKS_4$ under different bandwidth choices for sample size n = 600 in the Gaussian error setting.
Figure 2. Boxplot of $NKS_4$ under different bandwidth choices for sample size n = 3000 in the Gaussian error setting.
Figure 3. Boxplot of NKS distance for error-based estimators under Gaussian errors. $NKS_1$ and $NKS_2$ are the NKS distances for $F_n(t)$ and $F_{nh}(t)$, respectively.
Figure 4. Boxplot of NKS distance for residual-based estimators under Gaussian errors. $NKS_3$ and $NKS_4$ are the NKS distances for $\tilde F_n(t)$ and $\hat F_{nh}(t)$, respectively.
Figure 5. Boxplot of NKS distance for error-based estimators under Gamma errors. $NKS_1$ and $NKS_2$ are the NKS distances for $F_n(t)$ and $F_{nh}(t)$, respectively.
Figure 6. Boxplot of NKS distance for residual-based estimators under Gamma errors. $NKS_3$ and $NKS_4$ are the NKS distances for $\tilde F_n(t)$ and $\hat F_{nh}(t)$, respectively.

