A Note on the Asymptotic Normality of the Kernel Deconvolution Density Estimator with Logarithmic Chi-Square Noise

This paper studies the asymptotic normality of the kernel deconvolution estimator when the noise distribution is logarithmic chi-square; both independent and identically distributed (i.i.d.) observations and strong mixing observations are considered. The result for the dependent case is applied to obtain the pointwise asymptotic distribution of the deconvolution volatility density estimator in discrete-time stochastic volatility models.


Introduction
Consider the measurement error model Y = X + ε, where X is the signal and ε is the noise. Assume X is independent of ε; X has density f_X and ε has density k, so the density of Y, denoted f_Y, is the convolution of f_X and k: f_Y = f_X * k, where the * denotes convolution.
Assume we observe the realizations Y_1, …, Y_n of Y and that the function k is fully known. One possible estimator of f_X from the noisy observations Y_1, …, Y_n is the kernel deconvolution estimator:

f̂_X(x) = (1/(2π)) ∫ e^{−itx} φ_K(ht) (φ̂_Y(t)/φ_k(t)) dt,

where:

φ̂_Y(t) = (1/n) Σ_{j=1}^n e^{itY_j}

is the empirical characteristic function of the density f_Y, K(x) is a kernel function, h is a bandwidth, and φ_K and φ_k are the Fourier transforms of K and k, respectively. The kernel deconvolution estimator was first proposed for the measurement error model by Carroll and Hall [1] and Stefanski and Carroll [2]. Define the kernel deconvolution function as:

v_h(u) = (1/(2π)) ∫ e^{−itu} (φ_K(t)/φ_k(t/h)) dt;

the kernel deconvolution estimator can then be written compactly as:

f̂_X(x) = (1/(nh)) Σ_{j=1}^n v_h((x − Y_j)/h).

In this paper, I show the asymptotic normality of the estimator f̂_X(x) when the distribution of ε is logarithmic chi-square. The asymptotic distribution of the kernel deconvolution estimator has been considered in Fan [3], Fan and Liu [4], Van Es and Uh [5] and Van Es and Uh [6] for independent and identically distributed (i.i.d.) observations. Masry [7] and Kulik [8] consider various cases of weakly-dependent observations. However, none of the above research allows the error distribution to be the logarithmic chi-square distribution. I consider both i.i.d. observations and strong mixing observations in this paper, which complements the above-mentioned literature.
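As an illustration, the estimator above can be computed numerically. The following snippet is a minimal sketch, not part of the paper: the sample size, bandwidth, integration grid and seed are all illustrative choices, and the noise characteristic function used is that of the logarithmic chi-square distribution, 2^{it} Γ(1/2 + it)/Γ(1/2), discussed in the next section.

```python
import numpy as np
from scipy.special import gamma

def phi_logchi2(t):
    # characteristic function of log(chi^2_1): 2^{it} Gamma(1/2 + it) / Gamma(1/2)
    t = np.asarray(t, dtype=complex)
    return 2**(1j * t) * gamma(0.5 + 1j * t) / np.sqrt(np.pi)

def deconv_density(x, Y, h, n_grid=801):
    """Sinc-kernel deconvolution estimate of f_X at the point x."""
    t = np.linspace(-1 / h, 1 / h, n_grid)          # phi_K(ht) = 1 on |t| <= 1/h
    ecf = np.exp(1j * np.outer(t, Y)).mean(axis=1)  # empirical cf of Y
    integrand = np.exp(-1j * t * x) * ecf / phi_logchi2(t)
    return float(np.real(integrand.sum()) * (t[1] - t[0]) / (2 * np.pi))

# toy check: standard normal signal contaminated with log chi^2_1 noise
rng = np.random.default_rng(0)
X = rng.normal(size=2000)
Y = X + np.log(rng.chisquare(1, size=2000))
est = deconv_density(0.0, Y, h=0.5)
print(est)   # should be near f_X(0) = 1/sqrt(2*pi)
```

Note the trade-off visible in the code: shrinking h widens the integration range, where 1/φ_k(t) grows exponentially, which is the source of the slow convergence rates discussed below.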
The results obtained in this paper can be applied to obtain the asymptotic distribution of the deconvolution volatility density estimator. The problem of estimating the volatility density has been gaining increasing interest in econometrics in recent years; see, e.g., Van Es, Spreij, and Van Zanten [9] and Van Es, Spreij, and Van Zanten [10] for the kernel deconvolution estimator, Comte and Genon-Catalot [11] for the penalized projection estimator and Todorov and Tauchen [12] for a study in the context of high-frequency data. Kernel deconvolution with logarithmic chi-square noise arises naturally when estimating the volatility density in stochastic volatility (SV) models. Existing research (e.g., Van Es, Spreij, and Van Zanten [9] and Van Es, Spreij, and Van Zanten [10]) focuses on the convergence rates of the estimators, and the asymptotic distribution of the estimators is not available. Throughout, the characteristic function of a random variable with density f is defined as φ_f(t) = ∫ e^{itx} f(x) dx. In Section 2, I review the probabilistic properties of the logarithmic chi-square distribution; Section 3 presents the asymptotic normality of the estimator, for both i.i.d. observations and dependent observations; Section 4 discusses the application of the results to volatility density estimation in SV models; Section 5 concludes the paper.

Logarithmic Chi-Square Distribution
The logarithmic chi-square distribution is obtained by taking the logarithm of a chi-square random variable with one degree of freedom. The density function of the logarithmic chi-square distribution is:

k(x) = (1/√(2π)) e^{x/2} e^{−e^x/2}, x ∈ ℝ.

The density function of the logarithmic chi-square distribution is asymmetric and is plotted in Figure 1. The characteristic function of the logarithmic chi-square distribution is:

φ_k(t) = 2^{it} Γ(1/2 + it)/Γ(1/2),

where Γ(·) is the gamma function.
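Both formulas can be verified directly. The following check (illustrative, not part of the paper) confirms that the stated density integrates to one and that the closed-form characteristic function agrees with a Monte Carlo estimate of E[e^{it log χ²₁}]; the integration limits, evaluation point t = 1.3 and sample size are arbitrary choices.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

# density of log(chi^2_1): k(x) = exp(x/2) * exp(-exp(x)/2) / sqrt(2*pi)
def k(x):
    return np.exp(x / 2 - np.exp(x) / 2) / np.sqrt(2 * np.pi)

total, _ = quad(k, -40, 6)   # the mass outside [-40, 6] is negligible
print(total)                 # close to 1

# characteristic function: phi_k(t) = 2^{it} * Gamma(1/2 + it) / Gamma(1/2)
phi = lambda t: 2**(1j * t) * gamma(0.5 + 1j * t) / np.sqrt(np.pi)

# cross-check against a Monte Carlo estimate of E[exp(it * log chi^2_1)]
rng = np.random.default_rng(1)
logchi2 = np.log(rng.chisquare(1, size=200_000))
mc = np.mean(np.exp(1j * 1.3 * logchi2))
print(abs(phi(1.3) - mc))    # small Monte Carlo discrepancy
```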
Fan [3] studies the quadratic mean convergence rate of the kernel deconvolution estimator; it turns out that the convergence rate of the estimator depends heavily on the type of error distribution. In particular, it is determined by the tail behaviour of the modulus of the characteristic function of the error distribution: the faster the modulus goes to zero in the tail, the slower the convergence rate. The following lemma, which is from Van Es, Spreij, and Van Zanten [10], gives the tail behaviour of |φ_k(t)|.

Lemma 1. (Lemma 5.1 of Van Es, Spreij, and Van Zanten [10]) For |t| → ∞, we have:

|φ_k(t)| = √2 e^{−π|t|/2} (1 + o(1)), (3)

together with corresponding oscillating expansions for the real and imaginary parts of φ_k(t) ((4) and (5)).

From (3), it is known that the modulus of φ_k(t) decays exponentially fast as |t| → ∞. The distribution thus belongs to the super-smooth class according to the classification in Fan [13], and according to Fan [13], the optimal convergence rate of the estimator is then only of logarithmic order in n. From (4) and (5), it is known that in both tails, neither the real part nor the imaginary part of the characteristic function can dominate the other; this violates the assumptions in previous works on asymptotic normality by, e.g., Fan [3] and Masry [7]: for super-smooth error distributions, these papers assume either the real part or the imaginary part to be dominant.
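The envelope in Lemma 1 can be checked exactly: since |Γ(1/2 + it)|² = π/cosh(πt), the modulus has the closed form |φ_k(t)| = (cosh(πt))^{−1/2}, and the ratio of the approximation √2 e^{−π|t|/2} to the exact modulus is √(1 + e^{−2π|t|}) → 1. A short numerical check (illustrative):

```python
import numpy as np

# |phi_k(t)| = cosh(pi t)^{-1/2}, since |Gamma(1/2 + it)|^2 = pi / cosh(pi t)
for t in [1.0, 3.0, 6.0]:
    exact = 1 / np.sqrt(np.cosh(np.pi * t))
    approx = np.sqrt(2) * np.exp(-np.pi * t / 2)
    print(t, approx / exact)   # ratio tends to 1 as t grows
```

This is the agreement visible in Figure 2, where the two curves almost coincide in the tails.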

Asymptotic Normality
In this paper, I consider one particular kernel function, namely the sinc kernel function:

(C1) The sinc kernel function is defined as:

K(x) = sin(x)/(πx),

with Fourier transform: φ_K(t) = I{|t| ≤ 1}.

In this paper, I follow the convention of defining the Fourier transform of a function f as φ_f(t) = ∫ e^{itx} f(x) dx. The sinc kernel function is favoured in the theoretical literature because of the simplicity of its Fourier transform and is therefore used here.
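As a quick sanity check (illustrative, not part of the paper), the sinc kernel can be recovered by numerically inverting its Fourier transform, K(x) = (1/2π) ∫_{−1}^{1} e^{−itx} dt; the evaluation point x = 2.7 is arbitrary.

```python
import numpy as np

# K(x) = sin(x)/(pi x) is the inverse Fourier transform of 1{|t| <= 1}
x = 2.7
t = np.linspace(-1.0, 1.0, 200001)
num = np.real(np.exp(-1j * t * x).sum() * (t[1] - t[0])) / (2 * np.pi)
print(num, np.sin(x) / (np.pi * x))   # the two values agree
```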

i.i.d. Observations
In this section, I prove the asymptotic normality of the estimator when the observations are i.i.d.
Theorem 1. Suppose the observations are i.i.d. and ε is distributed as logarithmic chi-square. If Assumption (C1) holds, exp(1/h)/n → 0 as n → ∞ and h → 0, then, in distribution:

√n e^{−π/(2h)} (f̂_X(x) − E f̂_X(x)) → N(0, f_Y(x)/(2π²)).

(For practical implementations, kernels other than the sinc kernel are usually used because they have better numerical properties; see Delaigle and Gijbels [14] for the discussion.)

Proof. Denote Z_j = v_h((x − Y_j)/h), so that f̂_X(x) = (1/(nh)) Σ_j Z_j. First, the expectation E Z_1 is evaluated.
Second, I evaluate Var Z_1, where the last equality follows from the tail behaviour of |φ_k| in Lemma 1. The latter result is shown as follows, where M is a large, fixed number. The first term in the brackets is a constant depending on M; the order of the second term can be evaluated as follows, using the fact that, for M large, |φ_k(u)| can be replaced by its asymptotic approximation.
The second term clearly dominates the first term, which is a constant. Here, I use the argument of Butucea [15] to split the integral and show that the tail part of the integral dominates.
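The exponential order produced by this splitting argument can be checked in closed form: since |φ_k(t)|² = 1/cosh(πt) exactly, the key integral ∫_{−1/h}^{1/h} |φ_k(t)|^{−2} dt equals (2/π) sinh(π/h), which matches the leading term e^{π/h}/π up to a factor 1 − e^{−2π/h}; the tail part thus dominates any fixed constant. A short check (illustrative):

```python
import numpy as np

# int_{-1/h}^{1/h} |phi_k(t)|^{-2} dt = int cosh(pi t) dt = (2/pi) sinh(pi/h),
# which grows like exp(pi/h)/pi as h -> 0
for h in [0.5, 0.3]:
    exact = 2 * np.sinh(np.pi / h) / np.pi
    leading = np.exp(np.pi / h) / np.pi
    print(h, exact / leading)   # ratio approaches 1 as h shrinks
```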
A sufficient condition for asymptotic normality is the Lyapunov condition, which, for i.i.d. data, reduces to:
For an upper bound on the numerator, notice the result from Van Es, Spreij, and Van Zanten [10] and Masry [16] that, for p > 2: An upper bound for ‖v_h‖_∞ is easy to obtain, and ‖v_h‖₂ is known from (7), such that the bound holds for p > 2. Therefore, take p = 2 + δ and use the result in (9); it then holds that: this, together with (6), implies that Lyapunov's condition (8) holds, which completes the proof.


Strong Mixing Observations
In this section, I consider the model Y_i = X_i + ε_i, i = 1, …, n (11), where the signal realizations X_1, …, X_n are strictly stationary and strong mixing, while the noise realizations ε_1, …, ε_n are i.i.d. logarithmic chi-square variables, independent of the signal, such that the observations Y_1, …, Y_n are also strictly stationary and strong mixing.
There are various concepts of dependence; here, I consider α-mixing, also called strong mixing, which is among the weakest of the commonly used dependence concepts.
Let {X_t} be an infinite sequence of strictly stationary random variables and let F_i^j be the σ-algebra generated by {X_t, i ≤ t ≤ j}; then, the α-mixing coefficient is defined as:

α(j) = sup_{A ∈ F_1^k, B ∈ F_{k+j}^∞} |P(A ∩ B) − P(A)P(B)|.

For the dependent case, a boundedness assumption on the joint density of the observations is also needed.
(C2) The probability density function of any joint distribution (Y_i, Y_j), 1 ≤ i < j ≤ n, exists and is bounded by a constant.

Now, I give the asymptotic normality theorem. Notice that the mixing assumption here is a little weaker than that in Masry [7].
Theorem 2. In model (11), let X_1, X_2, …, X_n be strictly stationary and α-mixing, with the mixing coefficients satisfying, for some δ > 2, condition (12); let the noises ε_1, …, ε_n be i.i.d. logarithmic chi-square variables, independent of X. If (C1) and (C2) hold, exp(1/h)/n → 0 as n → ∞ and h → 0, then, in distribution:

√n e^{−π/(2h)} (f̂_X(x) − E f̂_X(x)) → N(0, f_Y(x)/(2π²)).

Proof. First, by strict stationarity and the ergodic theorem for strong mixing sequences, the expectation is evaluated as in the proof of Theorem 1. Next, the variance of the estimator is evaluated. It is known from Theorem 1 that the first term is of the order given in (13). For the covariance term, first notice that, as h → 0: Now, because: where C is a constant; continuing from (14), I get: On the other hand, using the assumption on the α-mixing coefficients and the covariance inequality for strong mixing sequences in Proposition 2.5 of Fan and Yao [17], for δ > 2: Therefore, using (15) and (16), if one chooses m_n = ⌊1/(h|log h|)⌋, then m_n → ∞ and m_n h → 0, so the first term is o(exp(π/h)); the second term is also o(exp(π/h)) by the mixing assumption in (12). It is thus shown that: From (13) and (17), it then follows that:

Now, I prove the central limit theorem, using the classical large block-small block argument for dependent sequences. First, I make some normalizations: define σ_0 = ((1/(2π²)) exp(π/h) f_Y(x))^{1/2}, and normalize the summands so that each Z_j has mean zero and unit variance; it will be shown that:

which is the result that needs to be shown. First, the set {1, …, n} is partitioned into 2k_n + 1 subsets, with large blocks of size l_n and small blocks of size s_n, such that k_n = ⌊n/(l_n + s_n)⌋, so the last remaining block has size n − k_n(l_n + s_n).
The sizes are such that l_n → ∞, s_n → ∞ and l_n/s_n → ∞. Then, we can write: where: are the sums over the large blocks, the small blocks and the last block, respectively. Then, as is standard for the small block-large block argument, I show the following: for all ε > 0. (18) and (19) say that the small blocks and the last block are of smaller order. (20) says that the large blocks behave as if they were independent, in the sense of the characteristic function. Then, (21) and (22) are the Lindeberg-Feller conditions for the asymptotic normality of Σ_{j=1}^{k_n} ξ_j under independence. For (18) and (19), using the moment inequality for α-mixing sequences in Proposition 2.7 (i) of Fan and Yao [17], it can be shown that: notice that the conditions for Proposition 2.7 (i) are satisfied, because, by (10), E|Z_j|^δ < ∞ for δ > 2, and the mixing assumption (12) implies the required decay of α(j); take δ = ab and q = 2b, so the mixing condition is also satisfied.
For (20), using the covariance inequality in Proposition 2.6 of Fan and Yao [17], we have: this is o(1) by choosing, for example, block sizes such that, for some q > 1: obviously, the above expression is o(1) by the assumption that exp(1/h)/n → 0, so (20) is proven. For Feller's condition (21), first use the same strategy as in calculating the variance of the estimator; it holds that: for any j, because ξ_j is also an infinite sum of the observations. Therefore, finally, for Lindeberg's condition (22), first observe that: where I first use Hölder's inequality and then Markov's inequality. Using again the moment inequality for strong mixing sequences in Proposition 2.7 of Fan and Yao [17], Lindeberg's condition follows, which completes the proof.

Application to Volatility Density Estimation in Stochastic Volatility Models

In a discrete-time SV model, the observed returns satisfy:

y_{t_i} = σ_{t_i} ε_{t_i}, (23)

with ε_{t_i} i.i.d. N(0, 1), so that log y²_{t_i} = log σ²_{t_i} + log ε²_{t_i}, where log ε²_{t_i} has the logarithmic chi-square distribution. If we want to recover the density f_σ of log σ²_{t_i} from the observations {log y²_{t_i}}, this is a problem of deconvolution with logarithmic chi-square error, and the kernel deconvolution estimator can be used. Van Es, Spreij, and Van Zanten [9] and Van Es, Spreij, and Van Zanten [10] first noticed this connection. Define Z_j := log y²_j; they use the following estimator to recover f_σ(x):

f̂_σ(x) = (1/(2π)) ∫ e^{−itx} φ_K(ht) (φ̂_Z(t)/φ_k(t)) dt,

where φ_K is the Fourier transform of a kernel function K, φ̂_Z is the empirical characteristic function of the Z_j, and φ_k(t) is the characteristic function of the log χ²₁ variable. Van Es, Spreij, and Van Zanten [9] and Van Es, Spreij, and Van Zanten [10] derive the convergence rate of the estimator, but a central limit theorem is missing.
If we assume the observed return sequence {Z_j}, j = 1, …, n, is generated by the SV model (23) with a strictly stationary, α-mixing volatility process satisfying (12) and i.i.d. errors, a simple application of Theorem 2 leads to the following corollary.
Corollary 1. In the stochastic volatility model (23), suppose the volatility process {σ_j}, j = 1, …, n, is α-mixing with (12) satisfied, and the ε_{t_i} are i.i.d. N(0, 1), independent of the volatility process. When exp(1/h)/n → 0 as n → ∞ and h → 0, it holds that, in distribution:

√n e^{−π/(2h)} (f̂_σ(x) − E f̂_σ(x)) → N(0, f_Z(x)/(2π²)),

where f_Z is the density of Z_j = log y²_j. Since the density f_Z(x) can be estimated consistently from the observed return sequence {log y²_{t_i}} using the classical kernel density estimator for any x (see, e.g., Fan and Yao [17]), the above result can be used to construct pointwise confidence intervals for the kernel deconvolution density estimator.
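To illustrate the corollary, the following sketch simulates a simple SV model and forms the plug-in pointwise interval suggested above. Everything here is an illustrative assumption, not from the paper: the AR(1) log-volatility dynamics and its parameters, the sample size, the bandwidth, the evaluation point, and the use of a standard Gaussian kernel density estimator for f_Z.

```python
import numpy as np
from scipy.special import gamma
from scipy.stats import gaussian_kde

rng = np.random.default_rng(7)
n, h = 4000, 0.45

# AR(1) log-volatility (strictly stationary and mixing), Gaussian returns:
# log y_j^2 = log sigma_j^2 + log eps_j^2, with eps_j ~ N(0, 1)
logvol = np.zeros(n)
for j in range(1, n):
    logvol[j] = 0.9 * logvol[j - 1] + 0.3 * rng.normal()
Z = logvol + np.log(rng.chisquare(1, size=n))   # observed log y^2

phi_k = lambda t: 2**(1j * t) * gamma(0.5 + 1j * t) / np.sqrt(np.pi)
t = np.linspace(-1 / h, 1 / h, 801)
x = 0.0
ecf = np.exp(1j * np.outer(t, Z)).mean(axis=1)  # empirical cf of log y^2
fhat = float(np.real((np.exp(-1j * t * x) * ecf / phi_k(t)).sum())
             * (t[1] - t[0]) / (2 * np.pi))

# plug-in 95% interval based on the corollary: Var ~ exp(pi/h) f_Z(x) / (2 pi^2 n)
fZ_hat = float(gaussian_kde(Z)(x)[0])
se = np.sqrt(np.exp(np.pi / h) * fZ_hat / (2 * np.pi**2 * n))
print(fhat, fhat - 1.96 * se, fhat + 1.96 * se)
```

The variance plug-in only needs f_Z, the density of the observed log y², which is why the interval is feasible in practice.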

Conclusions
In this paper, I have proven the asymptotic normality of the kernel deconvolution estimator with logarithmic chi-square noise, considering both independent and identically distributed observations and strong mixing observations. The results are applied to prove the asymptotic normality of the kernel deconvolution estimator of the volatility density in stochastic volatility models.

Figure 1. Density function of the logarithmic chi-square distribution.
Figure 2 plots the modulus function |φ_k| and its approximation √2 e^{−π|t|/2}; we notice that the two functions almost coincide in both tails.

Figure 2. Modulus of the characteristic function of the logarithmic chi-square distribution and its approximation: the curve with the higher peak is the approximating function √2 e^{−π|t|/2}.