Article

Shannon Entropy Estimation for Linear Processes

Timothy Fortune and Hailin Sang
1 Department of Statistics, University of Connecticut, Storrs, CT 06269, USA
2 Department of Mathematics, University of Mississippi, University, MS 38677, USA
* Author to whom correspondence should be addressed.
J. Risk Financial Manag. 2020, 13(9), 205; https://doi.org/10.3390/jrfm13090205
Submission received: 17 August 2020 / Revised: 4 September 2020 / Accepted: 7 September 2020 / Published: 9 September 2020
(This article belongs to the Special Issue Nonparametric Econometric Methods and Application II)

Abstract

In this paper, we estimate the Shannon entropy $S(f) = -E[\log f(X)]$ of a one-sided linear process with probability density function $f(x)$. We employ the integral estimator $S_n(f)$, which utilizes the standard kernel density estimator $f_n(x)$ of $f(x)$. We show that $S_n(f)$ converges to $S(f)$ almost surely and in $\mathcal{L}^2$ under reasonable conditions.

1. Introduction

Let $f(x)$ be the common probability density function of a sequence $\{X_n\}_{n=1}^{\infty}$ of identically distributed observations. The associated Shannon entropy
$$S(f) = -E[\log f(X)] = -\int f(x)\log f(x)\,dx \qquad\qquad (1)$$
of such an observation was first introduced by Shannon (1948), who utilized this tool in his mathematical investigation of the theory of communication. Today, entropy is widely applied in information theory, statistical classification, pattern recognition, and related fields, since it measures the amount of uncertainty present in a probability distribution.
Several estimators of the Shannon entropy have been introduced in the literature; see Beirlant et al. (1997) for an overview. Many of these estimators have been studied in the case of independent data. Ahmad and Lin (1976) obtained results for the resubstitution estimator $H_n = -\frac{1}{n}\sum_{i=1}^{n}\ln f_n(X_i)$ for independent data $\{X_i\}_{i=1}^{n}$, where $f_n(x)$ is the kernel density estimator; in particular, they showed consistency in the first and second mean under certain regularity conditions. Dmitriev and Tarasenko (1973) reported results for estimating functionals of the type $\int H\bigl(f(x), f'(x), \ldots, f^{(k)}(x)\bigr)\,dx$, where the common density $f(x)$ of the independent $X_i$ is assumed to have at least $k$ derivatives. Plugging kernel density estimators (see their paper and the references therein) into the arguments of $H$ and integrating only over the symmetric interval $[-k_n, k_n]$, determined by a sequence $\{k_n\}_{n=1}^{\infty}$ of a certain order, they provided a result for the estimation of the Shannon entropy using what Beirlant et al. (1997) refer to as the integral estimator. Their results give conditions for almost sure convergence.
Interestingly, because their work is a more general investigation of functionals, Dmitriev and Tarasenko (1973) also provided a result for the estimation of the quadratic Rényi entropy $Q(f) = \int f^2(x)\,dx$; specifically, they give conditions for the almost sure convergence of their estimator to the true value $Q(f)$. The estimation of Rényi entropy in the dependent case is challenging. One dependent case is treated by Sang et al. (2018), who studied the estimation of the quadratic entropy for the one-sided linear process. Utilizing the Fourier transform along with the projection method, they demonstrated that the kernel entropy estimator satisfies a central limit theorem for short memory linear processes.
Studying the Shannon entropy for dependent data is also a challenging problem, and to the best of our knowledge, general results for the Shannon entropy estimation of regular time series data are still unknown. In this paper, we study the Shannon entropy $S(f)$ for the one-sided linear process
$$X_n = \sum_{i=0}^{\infty} a_i\,\varepsilon_{n-i}, \qquad\qquad (2)$$
where the innovations $\varepsilon_i$ are independent and identically distributed real-valued random variables on some probability space $(\Omega, \mathcal{F}, P)$ with mean zero and finite variance $\sigma_{\varepsilon}^2$, and where the collection $\{a_i : i \ge 0\}$ of real coefficients satisfies $\sum_{i=0}^{\infty} a_i^2 < \infty$. Additionally, we require that the common density $f_{\varepsilon}(x)$ of the innovations be bounded. The estimator we utilize employs the kernel method, which was first introduced by Rosenblatt (1956) and Parzen (1962). The kernel estimator will be denoted by
$$f_n(x) = \frac{1}{n h_n}\sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h_n}\right), \qquad\qquad (3)$$
where the sequence $\{h_n\}_{n=1}^{\infty}$ provides the bandwidths, and $K : \mathbb{R}\to\mathbb{R}$ is the kernel function, which satisfies $\int_{\mathbb{R}} K(x)\,dx = 1$. Typically, the kernel function is itself a probability density function.
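As a rough numerical illustration (not part of the original paper), the following Python sketch simulates a truncated version of the linear process in (2) with geometrically decaying coefficients and evaluates the kernel density estimator in (3), using the Epanechnikov kernel and a bandwidth of order $(n^{-1}\log n)^{1/5}$ (condition B.1 below). The truncation level, coefficient choice, and sample size are illustrative assumptions.

import numpy as np

def simulate_linear_process(n, rho=0.5, trunc=500, seed=0):
    """Approximate X_t = sum_{i>=0} a_i * eps_{t-i} with a_i = rho**i,
    truncated after `trunc` terms; innovations are i.i.d. N(0, 1)."""
    rng = np.random.default_rng(seed)
    a = rho ** np.arange(trunc)                      # square-summable coefficients
    eps = rng.standard_normal(n + trunc - 1)         # i.i.d. innovations, mean zero
    return np.convolve(eps, a, mode="valid")         # length-n sample path

def epanechnikov(u):
    """Epanechnikov kernel: bounded, compactly supported, integrates to one,
    and symmetric, so its first moment vanishes."""
    return 0.75 * np.clip(1.0 - u ** 2, 0.0, None)

def kde(grid, X, h):
    """Kernel density estimator f_n(x) = (n h)^{-1} sum_i K((x - X_i) / h)."""
    u = (grid[:, None] - X[None, :]) / h
    return epanechnikov(u).mean(axis=1) / h

n = 5000
X = simulate_linear_process(n)
h_n = (np.log(n) / n) ** 0.2                         # bandwidth of order (n^{-1} log n)^{1/5}
grid = np.linspace(X.min() - 1.0, X.max() + 1.0, 400)
f_n = kde(grid, X, h_n)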
This method has proven to be successful in estimating probability density functions and their derivatives, regression functions, etc., in both the independent and dependent setting. For the independent setting, see the books (Devroye and Györfi (1985); Silverman (1986); Nadaraya (1989); Wand and Jones (1995); Schimek (2000); Scott (2015)) and the references therein. For the dependent setting, we refer the reader to (Tran (1992); Honda (2000); Wu and Mielniczuk (2002); Wu et al. (2010)). Bandwidth selection is an important issue in kernel density estimation, and there is a lot of research in this direction. See, e.g., Duin (1976); Rudemo (1982); Slaoui (2014, 2018).
A few remarks about notation and terms used in the paper follow. Let $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ be real-valued sequences. By $a_n = o(b_n)$ we understand that $a_n/b_n \to 0$, and $a_n = O(b_n)$ means that $\limsup |a_n/b_n| < C$ for some positive number $C$; essentially, this is the standard Landau little-oh and big-oh notation. When we write $a_n \ll b_n$, we mean $a_n = o(b_n)$, and, as one might guess, $b_n \gg a_n$ means $a_n \ll b_n$. We also employ the notation $a_n \asymp b_n$ to indicate that $0 < \liminf_{n\to\infty} \frac{a_n}{b_n} \le \limsup_{n\to\infty} \frac{a_n}{b_n} < \infty$. A function $l : [0,\infty) \to \mathbb{R}$ is referred to as slowly varying (at $\infty$) if it is positive and measurable on $[A, \infty)$ for some $A \in \mathbb{R}^+$ and $\lim_{x\to\infty} l(\lambda x)/l(x) = 1$ holds for each $\lambda \in \mathbb{R}^+$. The set of all functions $g : \mathbb{R} \to \mathbb{R}$ which are Hölder continuous of some order $r$ will be denoted by $C^r(\mathbb{R})$; that is, for each $g \in C^r(\mathbb{R})$ there exists $C_g \in \mathbb{R}^+$ such that $|g(x) - g(x')| \le C_g |x - x'|^r$ for all $x, x' \in \mathbb{R}$, and when $r = 1$ we recognize this as the well-known Lipschitz condition. The notation $\mathcal{L}^p(E)$ with $0 < p < \infty$ represents the set of all real-valued functions $f$ defined on some measure space $(E, \mathcal{A}, \mu)$ having the property that $\int_E |f(x)|^p\,d\mu < \infty$. In the case that $E = \mathbb{R}$, and unless otherwise specified, the measure $\mu$ is tacitly understood to be Lebesgue measure and $\mathcal{A}$ is assumed to contain the Borel sets. $\mathcal{L}^{\infty}(E)$ refers to the set of real-valued functions defined on $E$ which are bounded almost everywhere. Whenever the domain space of the function is understood, we may simply write $\mathcal{L}^p$.
The following are bandwidth, kernel, and density conditions that we shall refer to throughout this paper:
B.1 $h_n \asymp (n^{-1}\log n)^{1/5}$;
K.1 $K \in C^{\iota}(\mathbb{R})$ for some $\iota \in (0,1]$, and $K$ is bounded with bounded support;
K.2 $\int u\,K(u)\,du = 0$;
D.1 $f_{\varepsilon},\, f_{\varepsilon}',\, f_{\varepsilon}'' \in \mathcal{L}^{\infty}(\mathbb{R})$;
D.2 $f_{\varepsilon},\, f_{\varepsilon}',\, f_{\varepsilon}'' \in \mathcal{L}^{2}(\mathbb{R})$;
D.3 $f'' \in \mathcal{L}^{\infty}(\mathbb{R})$.
Notice that the bandwidth, kernel, and density conditions are prefixed with B, K, and D, respectively.
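As a standard example (not from the paper), the Epanechnikov kernel used in the sketch above,
$$K(u) = \tfrac{3}{4}\,(1 - u^2)\,\mathbf{1}\{|u| \le 1\},$$
is bounded with bounded support and Lipschitz on $\mathbb{R}$, so K.1 holds with $\iota = 1$; moreover, $\int u\,K(u)\,du = 0$ by symmetry, so K.2 holds as well.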
In this first section, we have provided an introduction to the problem, a survey of past research in this area, and the notation used throughout. The main results are reported in Section 2. In Section 3, we present the proofs of the main results. Finally, Appendix A introduces the reader to the foundational results required in the proofs of our main results.

2. Main Results

If $\{\varepsilon_i : i \in \mathbb{Z}\}$ is a sequence of independent and identically distributed random variables on a common probability space $(\Omega, \mathcal{F}, P)$, belonging to $\mathcal{L}^q(\Omega)$ for some $q > 0$, with $E\varepsilon_i = 0$ when $q \ge 1$, and $\{a_i\}_{i=0}^{\infty}$ is a sequence of real coefficients such that $\sum_{i=0}^{\infty} |a_i|^{2\wedge q} < \infty$, then the linear process $X_n$ given in (2) exists and is well defined. For the case $q \ge 2$, where the innovations have finite variance, we say that the process has short memory (short-range dependence) if $\sum_{i=0}^{\infty} |a_i| < \infty$ and $\sum_{i=0}^{\infty} a_i \ne 0$, and long memory (long-range dependence) otherwise. Throughout, we assume that each $\varepsilon_i \in \mathcal{L}^q$ with $q \ge 2$.
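For a standard illustration (not taken from the paper): with $|\rho| < 1$, the stationary AR(1) solution
$$X_n = \sum_{i=0}^{\infty}\rho^i\,\varepsilon_{n-i} \qquad\text{satisfies}\qquad \sum_{i=0}^{\infty}|a_i| = \frac{1}{1-|\rho|} < \infty, \qquad \sum_{i=0}^{\infty} a_i = \frac{1}{1-\rho} \ne 0,$$
so it has short memory; by contrast, the coefficients $a_i = (i+1)^{-\beta}$ with $\beta \in (\tfrac12, 1]$ satisfy $\sum_i a_i^2 < \infty$ but $\sum_i |a_i| = \infty$, giving long memory.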
Let $f(x)$ be the probability density function of the linear process $X_n = \sum_{i=0}^{\infty} a_i \varepsilon_{n-i}$, $n \in \mathbb{N}$, defined in (2). In this paper, we estimate the Shannon entropy $-\int f(x)\log f(x)\,dx$ of the linear process. To do this, we employ the integral estimator
$$S_n(f) = -\int_{A_n} f_n(x)\log f_n(x)\,dx, \qquad\qquad (4)$$
where $f_n(x)$ is the standard kernel density estimator defined in (3). The (random) sets $A_n$ are given by
$$A_n = \{x \in \mathbb{R} : 0 < \gamma_n \le f_n(x)\},$$
where $\{\gamma_n\}_{n=1}^{\infty}$ is an appropriately chosen sequence in $\mathbb{R}^+$ that converges to zero.
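To make the definition concrete, here is a minimal numerical sketch of the integral estimator: it evaluates $f_n$ on a fine grid, keeps the region where $f_n(x) \ge \gamma_n$, and approximates $-\int_{A_n} f_n\log f_n\,dx$ by a Riemann sum. It reuses `simulate_linear_process` and `kde` from the sketch in Section 1; the grid, the quadrature, and the choice $\gamma_n = h_n^{1/2}$ (which satisfies $\gamma_n \to 0$ and $\gamma_n \gg h_n$) are illustrative assumptions, not prescriptions from the paper.

import numpy as np

def integral_entropy_estimator(X, h, gamma, grid_size=2000, pad=1.0):
    """Integral estimator S_n(f) = -int_{A_n} f_n(x) log f_n(x) dx, with
    A_n = {x : f_n(x) >= gamma}, approximated by a Riemann sum on a grid."""
    grid = np.linspace(X.min() - pad, X.max() + pad, grid_size)
    dx = grid[1] - grid[0]
    f_n = kde(grid, X, h)                    # kernel density estimate (earlier sketch)
    on_A_n = f_n >= gamma                    # grid points falling in A_n
    return -np.sum(f_n[on_A_n] * np.log(f_n[on_A_n])) * dx

n = 5000
X = simulate_linear_process(n, rho=0.5)
h_n = (np.log(n) / n) ** 0.2
gamma_n = h_n ** 0.5                         # gamma_n -> 0 and gamma_n >> h_n
S_n = integral_entropy_estimator(X, h_n, gamma_n)

With the geometric coefficients and standard normal innovations assumed in the sketch, the marginal law of $X_n$ is $N\bigl(0, 1/(1-\rho^2)\bigr)$, whose entropy $\tfrac12\log\bigl(2\pi e/(1-\rho^2)\bigr)$ provides a closed-form value against which the estimate can be checked.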
Our estimator utilizes the kernel method of density estimation, and we will accordingly require adherence of the kernel to certain conditions. In addition, we impose some conditions on the bandwidths and on some of the densities of the problem. These conditions were listed in the previous section. Based on these conditions, let us consider the properties of the estimator (4). We proceed in a manner similar to the analysis done by Bouzebda and Elhattab (2011) for the independent case.
Theorem 1.
Let $\{X_n : n \in \mathbb{N}\}$ be the linear process given in (2), and assume that it has short memory. Furthermore, assume that $S(f)$ is finite. If the bandwidth, kernel, and density conditions listed earlier are satisfied, then
$$\limsup_{n\to\infty}\left(\frac{n\gamma_n^5}{\log n}\right)^{2/5}\left|S_n(f) + \int_{A_n}\log f(x)\,f(x)\,dx\right|$$
is bounded almost surely whenever the condition $\gamma_n \gg h_n$ is imposed on the sequence $\{\gamma_n\}_{n=1}^{\infty}$.
Corollary 1.
If the conditions of Theorem 1 hold, then we have
$$\lim_{n\to\infty} |S_n(f) - S(f)| = 0$$
almost surely.
Theorem 2.
Let $\{X_n : n \in \mathbb{N}\}$ be the linear process given in (2), and assume that it has short memory. Furthermore, assume that $S(f)$ is finite. If the bandwidth, kernel, and density conditions listed earlier are satisfied, then
$$\limsup_{n\to\infty}\left(\frac{n\gamma_n^5}{\log n}\right)^{2/5}\left\|S_n(f) + \int_{A_n}\log f(x)\,f(x)\,dx\right\|_2$$
is bounded whenever the condition $\gamma_n \gg h_n$ is imposed on the sequence $\{\gamma_n\}_{n=1}^{\infty}$.
Corollary 2.
If the conditions of Theorem 2 hold, then the mean squared error (MSE) satisfies
$$\lim_{n\to\infty} \mathrm{MSE}(S_n(f)) = 0.$$
Remark 1.
In this paper, we study entropy estimation for short memory linear processes via the integral method. It would be interesting to know whether similar results hold for long memory linear processes, and whether the resubstitution method works for dependent data such as linear processes. However, research in these directions is beyond the scope of this paper, and we leave it for future work.
Remark 2.
In a wide range of disciplines, including finance, geology, and engineering, many time series may be modeled as linear processes. In such instances, our result provides a method for estimating the associated Shannon entropy. One example is the discriminatory data on the arrival phases of earthquakes and explosions captured at a seismic recording station; another is data on returns from the New York Stock Exchange. See these and many other time series data sets in the book by Shumway and Stoffer (2011) and other books on time series.

3. Proofs

Lemma 1.
If the conditions of Theorem 1 (or Theorem 2) hold, then
$$\sup_{x\in\mathbb{R}} |f_n(x) - f(x)| = O\!\left(\left(\frac{\log n}{n}\right)^{2/5}\right)$$
almost surely.
Proof. 
This lemma follows from Theorem 2 of Wu et al. (2010) (see their discussion immediately after the statement of Theorem 2 and the penultimate paragraph of their Section 4.1). See also the discussion of foundational results in Appendix A. □
Lemma 2.
If the conditions of Theorem 1 (or Theorem 2) hold, then
$$\gamma_n^5 \gg \frac{\log n}{n}.$$
Proof. 
Because $h_n \asymp (n^{-1}\log n)^{1/5}$, there exists $C \in \mathbb{R}^+$ such that
$$\frac{h_n^5}{n^{-1}\log n} > C$$
for sufficiently large $n$. Therefore,
$$\frac{\gamma_n^5}{n^{-1}\log n} = \frac{\gamma_n^5}{h_n^5}\cdot\frac{h_n^5}{n^{-1}\log n} \ge C\left(\frac{\gamma_n}{h_n}\right)^5 \to \infty$$
as $n\to\infty$, from which (8) follows. □
Note. Our use of Lemma 2 in the proofs of Theorems 1 and 2 will be tacit.
Lemma 3.
If ν is a finite signed measure that is absolutely continuous with respect to a measure μ, then corresponding to every positive number ε there is a positive number δ such that | ν | ( E ) < ε whenever E is a measurable set for which μ ( E ) < δ .
Proof. 
This is a basic result from measure theory. See, for example, Theorem B of Halmos (1974) in section 30. □
Proof of Theorem 1.
We begin with the decomposition
$$S_n(f) + \int_{A_n}\log f(x)\,f(x)\,dx = -\int_{A_n} f_n(x)\log f_n(x)\,dx + \int_{A_n} f(x)\log f(x)\,dx$$
$$= -\int_{A_n} f_n(x)\log f_n(x)\,dx + \int_{A_n} f(x)\log f_n(x)\,dx - \int_{A_n} f(x)\log f_n(x)\,dx + \int_{A_n} f(x)\log f(x)\,dx = I_{n,1} + I_{n,2},$$
where
$$I_{n,1} := -\int_{A_n}\bigl(f_n(x) - f(x)\bigr)\log f_n(x)\,dx,$$
and
$$I_{n,2} := -\int_{A_n} f(x)\bigl(\log f_n(x) - \log f(x)\bigr)\,dx.$$
First, we consider $I_{n,1}$. Using the inequality
$$|\log z| \le z + \frac{1}{z}$$
for $z \in \mathbb{R}^+$, we notice that for all $x \in A_n$ we have
$$|\log f_n(x)| \le f_n(x) + \frac{1}{f_n(x)} = \left(1 + \frac{1}{(f_n(x))^2}\right) f_n(x) \le \left(1 + \frac{1}{\gamma_n^2}\right) f_n(x).$$
It follows that
$$|I_{n,1}| \le \sup_{x\in\mathbb{R}} |f_n(x) - f(x)| \int_{A_n} |\log f_n(x)|\,dx \le \left(1 + \frac{1}{\gamma_n^2}\right)\sup_{x\in\mathbb{R}} |f_n(x) - f(x)|,$$
since $f_n(x)$ integrates to unity over the real line.
Next, we consider $I_{n,2}$. Since the set over which we are integrating may be changed to $A_n \cap \{x : f(x) > 0\}$ without affecting the value of $I_{n,2}$, we may assume that $f$ is positive on $A_n$. Using the inequality
$$|\log z| \le |z - 1| + |z^{-1} - 1|$$
for $z \in \mathbb{R}^+$, we notice that for all $x \in A_n$ we have
$$|\log f_n(x) - \log f(x)| = \left|\ln\frac{f_n(x)}{f(x)}\right| \le \left|\frac{f_n(x)}{f(x)} - 1\right| + \left|\frac{f(x)}{f_n(x)} - 1\right| = \left|\frac{f_n(x) - f(x)}{f(x)}\right| + \left|\frac{f(x) - f_n(x)}{f_n(x)}\right| = \left(1 + \frac{f_n(x)}{f(x)}\right)\left|\frac{f_n(x) - f(x)}{f_n(x)}\right| \le \frac{C}{\gamma_n}\,|f_n(x) - f(x)|,$$
provided we can justify the existence of such a $C \in \mathbb{R}^+$. To that end, define
$$\epsilon_n = \sup_{x\in A_n} |f_n(x) - f(x)|,$$
and note that for all $x \in A_n$ we have
$$\left|1 - \frac{f(x)}{f_n(x)}\right| \le \frac{\epsilon_n}{f_n(x)} \le \frac{\epsilon_n}{\gamma_n}.$$
Taking the supremum over $A_n$ yields
$$\sup_{x\in A_n}\left|1 - \frac{f(x)}{f_n(x)}\right| \le \frac{\epsilon_n}{\gamma_n} = \gamma_n^{-1}\epsilon_n \le C\,\gamma_n^{-1}\left(\frac{\log n}{n}\right)^{2/5},$$
by Lemma 1.
by Lemma 1. Note that
γ n 1 = o n log n 2 5 ,
since
lim n γ n 1 n log n 2 5 = lim n log n n 2 5 γ n = lim n log n n γ n 5 log n n 1 1 5 = 0 .
This guarantees the existence we sought to establish. We continue with
$$|I_{n,2}| \le \frac{C}{\gamma_n}\sup_{x\in\mathbb{R}} |f_n(x) - f(x)| \int_{A_n} f(x)\,dx \le \frac{C}{\gamma_n}\sup_{x\in\mathbb{R}} |f_n(x) - f(x)|,$$
since $f(x)$ integrates to unity over the real line.
In view of (9), (10) and (12), we have shown that
$$\left|S_n(f) + \int_{A_n}\log f(x)\,f(x)\,dx\right| \le \left(\frac{1}{\gamma_n^2} + \frac{C}{\gamma_n} + 1\right)\sup_{x\in\mathbb{R}} |f_n(x) - f(x)|.$$
Therefore,
$$\limsup_{n\to\infty}\left(\frac{n\gamma_n^5}{\log n}\right)^{2/5}\left|S_n(f) + \int_{A_n}\log f(x)\,f(x)\,dx\right| \le \limsup_{n\to\infty}\left(\frac{n}{\log n}\right)^{2/5}\gamma_n^2\left(\frac{1}{\gamma_n^2} + \frac{C}{\gamma_n} + 1\right)\sup_{x\in\mathbb{R}} |f_n(x) - f(x)| = \limsup_{n\to\infty}\left(\gamma_n^2 + C\gamma_n + 1\right)\left(\frac{n}{\log n}\right)^{2/5}\sup_{x\in\mathbb{R}} |f_n(x) - f(x)|,$$
where the last expression is bounded almost surely by Lemma 1 and since $\gamma_n \to 0$. □
Proof of Corollary 1.
By the triangle inequality,
$$|S_n(f) - S(f)| \le J_{n,1} + J_{n,2},$$
where
$$J_{n,1} = \left|S_n(f) + \int_{A_n}\log f(x)\,f(x)\,dx\right|$$
and
$$J_{n,2} = \left|-\int_{A_n}\log f(x)\,f(x)\,dx - S(f)\right|.$$
Since J n , 1 0 almost surely by Theorem 1, we only need to contend with J n , 2 . That is, we need to show that
$$\int_{A_n^c} f(x)\log f(x)\,dx \to 0$$
almost surely as n .
For any Borel measurable set $E$, consider
$$P(E) = \int_E f(x)\,dx,$$
and define the signed measure
$$\nu(E) = \int_E \log f(x)\,dP.$$
Since $|S(f)| < \infty$, both $\nu^+$ and $\nu^-$ are finite measures, and thus $\nu$ is a finite signed measure that is absolutely continuous with respect to $P$. Because of Lemma 3, it suffices for us to demonstrate that
$$P(A_n^c) \to 0$$
almost surely. For any $x \in A_n^c$, we have $f_n(x) < \gamma_n$. By Lemma 1, there exists $C \in \mathbb{R}^+$ such that $f(x) \le f_n(x) + |f_n(x) - f(x)| < \gamma_n + C\left(\frac{\log n}{n}\right)^{2/5}$ almost surely, and hence we have shown that $A_n^c \subset B_n$ almost surely, where
$$B_n := \left\{x : f(x) < \gamma_n + C\left(\frac{\log n}{n}\right)^{2/5}\right\}.$$
It is easy to see that
$$0 \le P(A_n^c) \le P(B_n) \to 0$$
almost surely, since $\gamma_n + C\left(\frac{\log n}{n}\right)^{2/5} \to 0$ as $n\to\infty$. □
Proof of Theorem 2.
We start with
$$\left\|S_n(f) + \int_{A_n}\log f(x)\,f(x)\,dx\right\|_2 \le \left\|S_n(f) + \int_{A_n} f_n(x)\log f(x)\,dx\right\|_2 + \left\|\int_{A_n} f_n(x)\log f(x)\,dx - \int_{A_n} f(x)\log f(x)\,dx\right\|_2 =: K_{n,1} + K_{n,2}.$$
Recall inequality (11) from the proof of Theorem 1. Arguing in a similar manner as before, we can demonstrate the existence of $C_1 \in \mathbb{R}^+$ such that
$$K_{n,1} = \left\|\int_{A_n} f_n(x)\bigl(\log f(x) - \log f_n(x)\bigr)\,dx\right\|_2 = \left\|\int_{A_n} f_n(x)\log\frac{f(x)}{f_n(x)}\,dx\right\|_2 \le \left\|\int_{A_n} f_n(x)\left|\log\frac{f(x)}{f_n(x)}\right|dx\right\|_2 \le \left\|\int_{A_n}\frac{C_1}{\gamma_n}\,|f_n(x) - f(x)|\,f_n(x)\,dx\right\|_2 \le \frac{C_1}{\gamma_n}\left(\frac{\log n}{n}\right)^{2/5}\left\|\int_{x\in A_n} f_n(x)\,dx\right\|_2 \le \frac{C_1}{\gamma_n}\left(\frac{\log n}{n}\right)^{2/5}.$$
Notice also that
$$K_{n,2} = \left\|\int_{A_n}\bigl[f_n(x) - f(x)\bigr]\log f(x)\,dx\right\|_2 \le \left\|\int_{A_n}|f_n(x) - f(x)|\,\bigl|\log f(x)\bigr|\,dx\right\|_2 \le C_2\left(\frac{\log n}{n}\right)^{2/5}\left\|\int_{A_n}\left(f(x) + \frac{1}{f(x)}\right)dx\right\|_2 \le C_2\left(\frac{\log n}{n}\right)^{2/5}\left(1 + \left\|\int_{A_n}\frac{f_n(x)}{f(x)}\cdot\frac{1}{f_n^2(x)}\,f_n(x)\,dx\right\|_2\right) \le C_2\left(\frac{\log n}{n}\right)^{2/5}\left(1 + \frac{C_3}{\gamma_n^2}\right).$$
Therefore,
$$\left(\frac{n\gamma_n^5}{\log n}\right)^{2/5}\left\|S_n(f) + \int_{A_n}\log f(x)\,f(x)\,dx\right\|_2 \le \left(\frac{n}{\log n}\right)^{2/5}\gamma_n^2\left[\frac{C_1}{\gamma_n}\left(\frac{\log n}{n}\right)^{2/5} + C_2\left(\frac{\log n}{n}\right)^{2/5}\left(1 + \frac{C_3}{\gamma_n^2}\right)\right] = C_1\gamma_n + C_2\gamma_n^2 + C_2 C_3,$$
from which the result follows. □
Proof of Corollary 2.
Note the decomposition
$$\sqrt{\mathrm{MSE}(S_n(f))} = \bigl\|S_n(f) - S(f)\bigr\|_2 \le \left\|S_n(f) + \int_{A_n}\log f(x)\,f(x)\,dx\right\|_2 + \left\|-\int_{A_n}\log f(x)\,f(x)\,dx - S(f)\right\|_2 =: M_{n,1} + M_{n,2}.$$
By Theorem 2, $M_{n,1} \to 0$. Now, let
$$W_n = \int_{x\in\mathbb{R}}\int_{y\in\mathbb{R}} f(x) f(y)\log f(x)\log f(y)\,I\bigl((x,y)\in A_n^c\times A_n^c\bigr)\,dx\,dy$$
and
$$W = \int_{x\in\mathbb{R}}\int_{y\in\mathbb{R}} f(x) f(y)\,\bigl|\log f(x)\log f(y)\bigr|\,dx\,dy.$$
Recall from (15) in the proof of Corollary 1 that $W_n \to 0$ almost surely. Because $|S(f)| < \infty$, it follows that $W < \infty$, and moreover, $|W_n| \le W$. Hence,
$$M_{n,2}^2 = \left\|\int_{A_n^c} f(x)\log f(x)\,dx\right\|_2^2 = \left\|\int_{\mathbb{R}} f(x)\log f(x)\,I(x\in A_n^c)\,dx\right\|_2^2 = E[W_n],$$
and the Lebesgue Dominated Convergence Theorem guarantees that
$$\lim_{n\to\infty} M_{n,2}^2 = \lim_{n\to\infty} E[W_n] = E\left[\lim_{n\to\infty} W_n\right] = E[0] = 0,$$
thereby proving the corollary. □

Author Contributions

Conceptualization, T.F. and H.S.; Methodology, T.F. and H.S.; Formal Analysis, T.F. and H.S.; Investigation, T.F. and H.S.; Writing—Original Draft Preparation, T.F.; Writing—Review & Editing, H.S.; Supervision, H.S.; Funding Acquisition, H.S. Both authors have read and agreed to the published version of the manuscript.

Funding

This research is supported in part by the Simons Foundation Grant 586789.

Acknowledgments

The authors are grateful to the referees and Daniel J. Henderson for carefully reading the paper and for insightful suggestions that significantly improved the presentation of the paper. The research is supported in part by the Simons Foundation Grant 586789 and the College of Liberal Arts Faculty Grants for Research and Creative Achievement at the University of Mississippi.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Wu et al. (2010) establish results that are very useful in our proofs. Here, we briefly survey their definitions and results, which show that the kernel density estimator for one-sided linear processes enjoys properties similar to the independent case; see Stute (1982). Their work identifies conditions under which the kernel density estimator enjoys strong uniform consistency for a wide class of time series, including the linear process in (2).
As is common in the analysis of time series, we consider an independent and identically distributed collection $\{\varepsilon_i : i \in \mathbb{Z}\}$ of random variables, typically referred to as the innovations. Many time series models fit the form
$$X_n = J(\ldots, \varepsilon_{n-1}, \varepsilon_n), \qquad\qquad \text{(A1)}$$
which regards $X_n$ as a system dependent on the innovations. Here $J$ is a measurable function referred to as the filter. In this context, we also need to define the sigma algebras
$$\mathcal{F}_n = \sigma\{\varepsilon_n, \varepsilon_{n-1}, \ldots\},$$
where $n \in \mathbb{Z}$. In addition, let $\varepsilon_0'$ be an independent and identically distributed copy of $\varepsilon_0$ which is, of course, independent of all the $\varepsilon_i$. For $n \ge 0$, define
$$\mathcal{F}_n^* = \sigma\{\varepsilon_n, \varepsilon_{n-1}, \ldots, \varepsilon_1, \varepsilon_0', \varepsilon_{-1}, \ldots\},$$
and for $n < 0$, put $\mathcal{F}_n^* = \mathcal{F}_n$.
Define the $l$-step ahead conditional distribution by
$$F_l(x \mid \mathcal{F}_k) = P(X_{l+k} \le x \mid \mathcal{F}_k),$$
where $l \in \mathbb{N}$ and $k \in \mathbb{Z}$. When it exists, the $l$-step ahead conditional density is
$$f_l(x \mid \mathcal{F}_k) = \frac{d}{dx} F_l(x \mid \mathcal{F}_k).$$
As Wu et al. (2010) note, a sufficient condition for the existence of a marginal density of (A1) is that $f_1(x \mid \mathcal{F}_0)$ exists and is uniformly bounded almost surely by some $M \in \mathbb{R}^+$; we shall refer to this as the marginal condition. Similarly, $F_l(x \mid \mathcal{F}_k^*) = P(X_{l+k}^* \le x \mid \mathcal{F}_k^*)$, where $X_{l+k}^* = X_{l+k} - a_{l+k}\varepsilon_0 + a_{l+k}\varepsilon_0'$ if $l+k \ge 0$ and $X_{l+k}^* = X_{l+k}$ if $l+k < 0$. Also, $f_l(x \mid \mathcal{F}_k^*) = \frac{d}{dx} F_l(x \mid \mathcal{F}_k^*)$.
With this setup, the authors introduce the following measures of the dependence present in the system (A1). For $k \ge 0$, define a pointwise measure of difference by
$$\theta_k(x) = \bigl\| f_{1+k}(x \mid \mathcal{F}_0) - f_{1+k}(x \mid \mathcal{F}_0^*) \bigr\|_2$$
and an $\mathcal{L}^2$-integral measure of difference over $\mathbb{R}$ by
$$\theta(k) = \left(\int_{\mathbb{R}} \theta_k^2(x)\,dx\right)^{1/2}.$$
Finally, define an overall measure of difference by
$$\Theta(n) = \sum_{j\in\mathbb{Z}}\left(\sum_{k=1\vee j}^{n\vee j} |\theta(k)|\right)^2.$$
The distances on the derivatives are defined similarly, as given below.
$$\psi_k(x) = \bigl\| f_{1+k}'(x \mid \mathcal{F}_0) - f_{1+k}'(x \mid \mathcal{F}_0^*) \bigr\|_2, \qquad \psi(k) = \left(\int_{\mathbb{R}} \psi_k^2(x)\,dx\right)^{1/2}, \qquad \Psi(n) = \sum_{j\in\mathbb{Z}}\left(\sum_{k=1\vee j}^{n\vee j} |\psi(k)|\right)^2.$$
With this setup, we can now report the following result of (Wu et al. 2010, Theorem 2).
Theorem A1.
Assume that, for some positive $r$ and $s$, $K \in C^r$ is a bounded function with bounded support and $X_n \in \mathcal{L}^s$. Further, assume the marginal condition, and assume that $\Theta(n) + \Psi(n) = O(n^{\alpha}\,\tilde{l}(n))$, where $\alpha \ge 1$ and $\tilde{l}$ is a slowly varying function. If $\log n = o(n h_n)$, then
$$\sup_{x\in\mathbb{R}} |f_n(x) - E f_n(x)| = O\!\left(\sqrt{\frac{\log n}{n h_n}} + n^{-1/2}\, l(n)\right),$$
where $l(n)$ is another slowly varying function.
Now consider our particular case, in which the filter is the linear process of (2). In view of our assumption that the innovations have finite variance and because we assume the coefficients are square-summable, $X_n \in \mathcal{L}^2$. Moreover, we assume all of the bandwidth, kernel, and density conditions listed earlier, from which it easily follows that the marginal condition is satisfied. For the short memory linear process (under the aforementioned assumptions), Wu et al. (2010) demonstrated that $\Theta(n) + \Psi(n) = O(n)$. Also, notice that condition B.1 implies that $\log n = o(n h_n)$. Therefore, the theorem of Wu et al. (2010) applies to (2).
In addition, the well-known Taylor series argument under the kernel conditions K.1 and K.2, as well as D.3, yields
$$\sup_{x\in\mathbb{R}} |E[f_n(x)] - f(x)| = O(h_n^2),$$
so, collectively, we see that
$$\sup_{x\in\mathbb{R}} |f_n(x) - f(x)| = O\!\left(\sqrt{\frac{\log n}{n h_n}} + n^{-1/2}\, l(n) + h_n^2\right).$$
Basic methods of differential calculus show that $\sqrt{\frac{\log n}{n h_n}} + h_n^2$ is minimized when $h_n$ satisfies B.1; indeed, the optimal $h_n$ has the exact order $\left(\frac{\log n}{n}\right)^{1/5}$.
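For completeness, the calculus step can be spelled out as follows (a routine verification, not reproduced from the paper). Writing $c = \log n / n$ and $g(h) = \sqrt{c/h} + h^2$ for $h > 0$,
$$g'(h) = -\tfrac{1}{2}\,c^{1/2}\,h^{-3/2} + 2h = 0 \iff h^{5/2} = \tfrac{1}{4}\,c^{1/2} \iff h = 16^{-1/5}\,c^{1/5} \asymp \left(\frac{\log n}{n}\right)^{1/5},$$
and since $g''(h) = \tfrac{3}{4}\,c^{1/2}\,h^{-5/2} + 2 > 0$, this critical point is the minimizer, which is exactly the order required by condition B.1.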

References

  1. Ahmad, Ibrahim, and Pi-Erh Lin. 1976. A nonparametric estimation of the entropy for absolutely continuous distributions. IEEE Transactions on Information Theory 22: 372–75. [Google Scholar] [CrossRef]
  2. Beirlant, Jan, Edward J. Dudewicz, László Györfi, and Edward C. van der Meulen. 1997. Nonparametric entropy estimation: An overview. International Journal of Mathematical and Statistical Sciences 6: 17–39. [Google Scholar]
  3. Bouzebda, Salim, and Issam Elhattab. 2011. Uniform-in-bandwidth consistency for kernel-type estimators of Shannon’s entropy. Electronic Journal of Statistics 5: 440–59. [Google Scholar] [CrossRef]
  4. Devroye, Luc, and László Györfi. 1985. Nonparametric Density Estimation: The L1 View. New York: Wiley. [Google Scholar]
  5. Dmitriev, Yu G., and Felix P. Tarasenko. 1973. On the estimation of functionals of the probability density and its derivatives. Theory of Probability and Its Applications 18: 628–33. [Google Scholar] [CrossRef]
  6. Duin, Robert P. W. 1976. On the choice of smoothing parameters of Parzen estimators of probability density function. IEEE Transactions on Computers 11: 1175–79. [Google Scholar] [CrossRef]
  7. Halmos, Paul R. 1974. Measure Theory. New York: Springer. [Google Scholar]
  8. Honda, Toshio. 2000. Nonparametric density estimation for a long-range dependent linear process. Annals of the Institute of Statistical Mathematics 52: 599–611. [Google Scholar] [CrossRef]
  9. Nadaraya, Elizbar Akakevič. 1989. Nonparametric Estimation of Probability Densities and Regression Curves. Dordrecht: Kluwer Academic Pub. [Google Scholar]
  10. Parzen, Emanuel. 1962. On estimation of a probability density function and mode. Annals of Mathematical Statistics 33: 1065–76. [Google Scholar] [CrossRef]
  11. Rosenblatt, Murray. 1956. Remarks on some nonparametric estimates of a density function. Annals of Mathematical Statistics 27: 832–37. [Google Scholar] [CrossRef]
  12. Rudemo, Mats. 1982. Empirical choice of histograms and kernel density estimators. Scandinavian Journal of Statistics 9: 65–78. [Google Scholar]
  13. Sang, Hailin, Yongli Sang, and Fangjun Xu. 2018. Kernel entropy estimation for linear processes. Journal of Time Series Analysis 39: 563–91. [Google Scholar] [CrossRef] [Green Version]
  14. Schimek, Michael G. 2000. Smoothing and Regression: Approaches, Computation, and Application. Hoboken: John Wiley & Sons. [Google Scholar]
  15. Scott, David W. 2015. Multivariate Density Estimation: Theory, Practice, and Visualization, 2nd ed. Hoboken: John Wiley & Sons. [Google Scholar]
  16. Shannon, Claude E. 1948. A mathematical theory of communication. Bell System Technical Journal 27: 379–423. [Google Scholar] [CrossRef] [Green Version]
  17. Shumway, Robert H., and David S. Stoffer. 2011. Time Series Analysis and Its Applications, 3rd ed. New York: Springer. [Google Scholar]
  18. Silverman, Bernard W. 1986. Density Estimation for Statistics and Data Analysis. London: Chapman and Hall. [Google Scholar]
  19. Slaoui, Yousri. 2014. Bandwidth selection for recursive kernel density estimators defined by stochastic approximation method. Journal of Probability and Statistics 2014: 739640. [Google Scholar] [CrossRef]
  20. Slaoui, Yousri. 2018. Bias reduction in kernel density estimation. Journal of Nonparametric Statistics 30: 505–22. [Google Scholar] [CrossRef]
  21. Stute, Winfried. 1982. A law of the logarithm for kernel density estimators. Annals of Probability 10: 414–22. [Google Scholar] [CrossRef]
  22. Tran, Lanh Tat. 1992. Kernel density estimation for linear processes. Stochastic Processes and their Applications 41: 281–96. [Google Scholar] [CrossRef] [Green Version]
  23. Wand, Matt P., and M. Chris Jones. 1995. Kernel Smoothing. London: Chapman and Hall. [Google Scholar]
  24. Wu, Wei Biao, Yinxiao Huang, and Yibi Huang. 2010. Kernel estimation for time series: An asymptotic theory. Stochastic Processes and their Applications 120: 2412–31. [Google Scholar] [CrossRef] [Green Version]
  25. Wu, Wei Biao, and Jan Mielniczuk. 2002. Kernel density estimation for linear processes. Annals of Statistics 30: 1441–59. [Google Scholar] [CrossRef]
