Abstract
We consider a stationary process with values in a finite set. In this paper, we present a moving average version of the Shannon–McMillan–Breiman theorem; this generalizes the corresponding classical results. A sandwich argument reduces the proof to direct applications of the moving strong law of large numbers. The result generalizes the work of Algoet and Cover [2], while relying on a similar sandwich method. It is worth noting that, in a certain sense, the two index sequences of the moving average are symmetric, i.e., if the growth rate of the index sequence with respect to the integer n is slow enough, all conclusions in this article still hold true.
MSC:
94A17
1. Introduction
Information theory is mainly concerned with stationary random processes taking values in a finite set. The strong convergence of the entropy at time n of a random process, divided by n, to a constant limit called the entropy rate of the process is known as the ergodic theorem of information theory or the asymptotic equipartition property (AEP) [1].
Its original version, proven in the 1950s for ergodic stationary processes, is known as the Shannon–McMillan theorem for convergence in mean and as the Shannon–McMillan–Breiman theorem [2,3,4] for almost everywhere convergence. Since then, generalized versions of the Shannon–McMillan–Breiman limit theorem have been developed by many authors [1,2,4,5]. Extensions have been made in the direction of weakening the assumptions on the reference measure, the state space, the index set, and the required properties of the process. For the general development, see Girardin [6] and the references therein.
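In the notation of [1,2], where $p(X_0,\dots,X_{n-1})$ denotes the probability of the observed sample path and $H$ is the entropy rate, the classical almost-everywhere statement reads:

```latex
-\frac{1}{n} \log p\!\left(X_0, X_1, \ldots, X_{n-1}\right)
\longrightarrow H \quad \text{a.e. as } n \to \infty .
```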
In statistics, smoothing data means creating an approximating function that attempts to capture important patterns in the data while leaving out noise. One of the most widely used smoothing methods is the moving average (MA). A number of authors have studied the question of almost everywhere convergence of moving averages for a measure-preserving invertible transformation of X, e.g., Akcoglu and del Junco [7]; Bellow, Jones, and Rosenblatt [8]; del Junco and Steele [9]; Schwartz [10]; and Haili and Nair [11]. Recently, Wang and Yang [12,13] proposed a new concept of the generalized entropy density and established generalized entropy ergodic theorems for time-nonhomogeneous Markov chains and for non-null stationary processes. Shi, Wang et al. [14] studied the generalized entropy ergodic theorem for nonhomogeneous Markov chains indexed by a binary tree.
Motivated by the work above, in this paper we give a moving average version of the Shannon–McMillan–Breiman theorem. The results in this paper generalize those in [2]. It is worth noting that, in some sense, the two index sequences are symmetric. In this paper, we discuss the so-called forward moving average; if the growth rate of the index sequence with respect to the integer n is slow enough, all conclusions in this article still hold true, i.e., the corresponding results for the backward moving average are also established.
The method used to prove the main results is the “sandwich” approximation approach of Algoet and Cover [2], which depends strongly on the moving strong law of large numbers: the sample entropy is asymptotically sandwiched between two functions whose limits can be determined from the moving SLLN.
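Schematically, writing $a_n$ for the window's starting index and $\phi(n)$ for its width (notation assumed here for illustration, in the spirit of [12,13]), the sandwich reads:

```latex
% Sandwich (sketch): conditioning on the infinite past bounds from below,
% the order-m Markov approximation bounds from above
-\frac{1}{\phi(n)} \log p\!\left(X_{a_n}^{a_n+\phi(n)-1} \,\middle|\, X_{-\infty}^{a_n-1}\right)
\;\lesssim\;
-\frac{1}{\phi(n)} \log p\!\left(X_{a_n}^{a_n+\phi(n)-1}\right)
\;\lesssim\;
-\frac{1}{\phi(n)} \log p_m\!\left(X_{a_n}^{a_n+\phi(n)-1}\right),
```

where the outer terms converge, by the moving SLLN, to quantities that squeeze down to the entropy rate $H$ as $m \to \infty$.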
This paper is organized as follows. In Section 2, we introduce the necessary preparatory knowledge: some required preliminaries and three lemmas. In Section 3, we give the main results, study some of their properties, and give examples of applications.
2. Preliminaries
Throughout this section, let denote a fixed probability space and let be a stationary sequence taking values from a finite set . For the sequence , denote the partial sequence by and by for . Likewise, we write , for the sequence of and , respectively. Let
and
wherever the conditioning event has positive probability. Define random variables
by setting in the corresponding definitions. Since
the conditional probability makes sense (i.e., it holds almost everywhere under the measure).
Definition 1
(see e.g., [2]). The canonical Markov approximation of order m to the probability is defined for large as
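Following [2], with $x_i^j$ denoting the block $(x_i, \ldots, x_j)$, the order-$m$ approximation keeps only $m$ symbols of conditioning:

```latex
p_m\!\left(x_0^{n-1}\right)
= p\!\left(x_0^{m-1}\right) \prod_{i=m}^{n-1} p\!\left(x_i \,\middle|\, x_{i-m}^{i-1}\right),
\qquad n > m .
```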
We will prove a new version of AEP for a stationary process . Before developing the main theme of the paper, we shall need to derive some basic lemmas. Let be a pair of positive integers such that as , and for every , .
Lemma 1.
Let be a stationary process with values from a finite set ; then, we have
and
where the base of the logarithm is taken to be 2.
Proof.
Let A be the support set of ; then,
where indicates taking expectation under measure .
Similarly, let denote the support set of . Then, we have
By Markov’s inequality and Equation (4), we have, for any ,
Noting that , we see by the Borel–Cantelli lemma that the event occurs only finitely often, almost everywhere.
By the arbitrariness of , we have
Applying the same arguments using Markov’s inequality to Equation (3), we obtain
This proves the lemma. □
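For concreteness, the Markov-inequality step can be sketched as follows (again writing $a_n$ and $\phi(n)$ for the moving-average indices, an assumed notation): since the likelihood ratio has expectation at most 1, for any $\epsilon > 0$,

```latex
P\!\left\{ \frac{1}{\phi(n)} \log
\frac{p_m\!\left(X_{a_n}^{a_n+\phi(n)-1}\right)}{p\!\left(X_{a_n}^{a_n+\phi(n)-1}\right)}
\ge \epsilon \right\}
\le 2^{-\phi(n)\epsilon}\,
\mathbb{E}\!\left[
\frac{p_m\!\left(X_{a_n}^{a_n+\phi(n)-1}\right)}{p\!\left(X_{a_n}^{a_n+\phi(n)-1}\right)}
\right]
\le 2^{-\phi(n)\epsilon},
```

and the Borel–Cantelli lemma applies whenever $\sum_n 2^{-\phi(n)\epsilon} < \infty$, which the growth condition imposed on $\phi(n)$ guarantees.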
Lemma 2.
(SLLN for MA): For a stationary stochastic process ,
and
where , .
Proof.
By the fact that , we have
Note that
It is not difficult to verify that . An argument similar to the one used in Lemma 1 shows that
Let , and define
Since
It is straightforward to show that
Note that
By Equations (8) and (9) and the property of superior limits, we have
Setting , dividing both sides of Equation (10) by s, we obtain
Using the two elementary inequalities,
It follows from Equation (11) that
From Equations (11)–(13), we have
Putting in Equation (14), we obtain
Replacing by in the above argument, we can obtain
These imply that
Note that ; therefore, we have, by Equation (15),
Since , Equation (5) follows immediately from Equations (7) and (16).
Similarly, let s be a nonzero real number, and define
The remainder of the argument is analogous to that in proving Equation (5) and is left to the reader. □
Lemma 3.
(No gap): and .
Proof.
We know that for stationary processes , so it remains to show that .
Let . Note that is integrable. Since all the random variables are discrete, we may write
Therefore, and is measurable relative to the field ; is a non-negative supermartingale, and hence it converges to an integrable limit function for all .
Note that, for any m,
where the last equation follows from stationarity.
Since is finite and is bounded and continuous in p for all , the bounded convergence theorem allows interchange of expectation and limit, yielding
Thus, . □
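The monotone limits behind Lemma 3 can be summarized as follows (by stationarity, $H(X_m \mid X_0^{m-1}) = H(X_0 \mid X_{-m}^{-1})$):

```latex
H\!\left(X_m \,\middle|\, X_0^{m-1}\right) \searrow H,
\qquad
H\!\left(X_0 \,\middle|\, X_{-m}^{-1}\right) \searrow
\mathbb{E}\!\left[-\log p\!\left(X_0 \,\middle|\, X_{-\infty}^{-1}\right)\right],
```

and the lemma asserts that the two limits coincide, i.e., there is no gap between them.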
3. Main Results
With the preliminaries accounted for, we wish to use Lemma 1 to conclude that
It is not easy to prove Equation (17). However, the closely related quantities and are easily identified as entropy rates.
Recall that the entropy rate is given by
Of course, by stationarity and the fact that conditioning does not increase entropy. It will be crucial that .
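For a finite-valued stationary process, the two standard expressions for the entropy rate agree [1]:

```latex
H = \lim_{n\to\infty} \frac{1}{n} H\!\left(X_0, X_1, \ldots, X_{n-1}\right)
  = \lim_{n\to\infty} H\!\left(X_n \,\middle|\, X_0^{n-1}\right).
```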
With the help of the preceding lemmas, we can now prove the following theorem:
Theorem 1.
(AEP) If H is the entropy rate of a finite-valued stationary process , then it holds that
Remark 1.
In the case , Theorem 1 reduces to the famous Shannon–McMillan–Breiman theorem, which is the fundamental theorem of information theory. Let ; this gives a delayed-average version of the AEP.
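As a quick numerical illustration of Theorem 1 (a sketch, not part of the proof): for an i.i.d. Bernoulli(q) process the entropy rate is $H = -q\log_2 q-(1-q)\log_2(1-q)$, and the per-symbol log-likelihood of a moving window should approach H. The index choices below (window of width phi(n) = n starting at a_n = n) are illustrative assumptions, not the paper's notation.

```python
import math
import random

def entropy_rate(q: float) -> float:
    """Entropy (bits/symbol) of an i.i.d. Bernoulli(q) source."""
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def window_sample_entropy(xs, q: float) -> float:
    """-(1/|xs|) log2 p(xs) for an i.i.d. Bernoulli(q) model."""
    log_p = sum(math.log2(q) if x == 1 else math.log2(1 - q) for x in xs)
    return -log_p / len(xs)

random.seed(0)
q = 0.3
H = entropy_rate(q)

# Forward moving average: window of width phi(n) = n starting at a_n = n.
n = 20000
sample = [1 if random.random() < q else 0 for _ in range(2 * n)]
window = sample[n:2 * n]  # X_{a_n}, ..., X_{a_n + phi(n) - 1}
print(abs(window_sample_entropy(window, q) - H))  # small for large n
```

The discrepancy shrinks as the window width grows, in line with the a.e. convergence asserted by the theorem.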
Proof.
We argue that the sequence of random variables is asymptotically sandwiched between the upper bound and the lower bound for all . The AEP will follow since and .
From Lemma 1, we have
which we rewrite, taking the existence of the limit into account, as
for all m. Also, from Lemma 1, we have
which we rewrite as
From the definition of in Lemma 2, we have, by putting together Equations (6) and (7),
for all m.
But, by Lemma 3, . Consequently, the theorem follows. □
Now, we give some interesting applications of our main results in the next examples.
Example 1.
Let be independent, identically distributed random variables drawn from the probability mass function ; then,
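In the i.i.d. case the claim follows directly, since the summands $-\log p(X_i)$ are themselves i.i.d. with mean $H(p)$; in the assumed $a_n$, $\phi(n)$ window notation,

```latex
-\frac{1}{\phi(n)} \log p\!\left(X_{a_n}, \ldots, X_{a_n+\phi(n)-1}\right)
= \frac{1}{\phi(n)} \sum_{i=a_n}^{a_n+\phi(n)-1} \bigl(-\log p(X_i)\bigr)
\longrightarrow \mathbb{E}\bigl[-\log p(X_0)\bigr] = H(p) \quad \text{a.e.}
```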
Example 2.
Let
Let be drawn i.i.d. according to this distribution; then,
Example 3.
Let be independent identically distributed random variables drawn according to the probability mass function . Thus, . Let , where q is another probability mass function on ; then,
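The limit in Example 3 is the i.i.d. computation under the wrong model $q$: since the summands $-\log q(X_i)$ are i.i.d. under $p$, the moving SLLN gives (in the assumed $a_n$, $\phi(n)$ notation)

```latex
-\frac{1}{\phi(n)} \log q\!\left(X_{a_n}, \ldots, X_{a_n+\phi(n)-1}\right)
\longrightarrow \mathbb{E}_p\!\left[-\log q(X_0)\right]
= \sum_{x} p(x) \log \frac{1}{q(x)}
= H(p) + D(p \,\|\, q) \quad \text{a.e.}
```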
Since convergence almost everywhere implies convergence in probability, Theorem 1 has the following implication:
Definition 2.
The typical set with respect to is the set of sequences with the following properties:
As a consequence of Theorem 1, we can show that the set has the following properties:
Proposition 1.
Let be independent, identically distributed random variables drawn from the probability mass function ; then,
(1) If , then
(2) for sufficiently large n.
(3) , where denotes the number of elements in set A.
(4) for sufficiently large n.
Proof.
Property (1) is immediate from the definition of .
Property (2) follows directly from Theorem 1, since the probability of the event tends to 1 as .
Thus, for any , there exists an such that for all , we have
Setting , we have the following:
Finally, for sufficiently large n, ,
where the second inequality follows from Definition 2. Therefore,
This completes the proof of the proposition. □
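The bounds in Proposition 1 can be checked by brute force for a small i.i.d. example (an illustrative sketch; the plain window 0..n-1, i.e., a_n = 0, and the helper name typical_set are assumptions of this demo, not the paper's notation):

```python
import itertools
import math

def typical_set(n: int, q: float, eps: float):
    """Enumerate the eps-typical set for n i.i.d. Bernoulli(q) symbols."""
    H = -q * math.log2(q) - (1 - q) * math.log2(1 - q)
    typical, prob = [], 0.0
    for xs in itertools.product([0, 1], repeat=n):
        # Probability of the sequence under the i.i.d. Bernoulli(q) model.
        p = q ** sum(xs) * (1 - q) ** (n - sum(xs))
        # Definition 2: 2^{-n(H+eps)} <= p(xs) <= 2^{-n(H-eps)}.
        if 2 ** (-n * (H + eps)) <= p <= 2 ** (-n * (H - eps)):
            typical.append(xs)
            prob += p
    return typical, prob, H

n, q, eps = 12, 0.3, 0.2
T, prob, H = typical_set(n, q, eps)
# Property (3): the typical set has at most 2^{n(H+eps)} elements.
print(len(T) <= 2 ** (n * (H + eps)))  # True
# Probability of the typical set (tends to 1 as n grows).
print(round(prob, 3))
```

The size bound in property (3) holds for every n, since each typical sequence carries probability at least $2^{-n(H+\epsilon)}$ and the total probability is at most 1.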
Example 4.
Let be i.i.d. . Let . Let and . Then we have the following:
(1) ;
(2) ;
(3) , for all n;
(4) , for sufficiently large n.
Proof.
(1) By Theorem 1, the probability that the sample sequence is typical goes to 1.
(2) By the strong law of large numbers for moving averages, we have . So there exists and such that for all , and there exists such that for all . So for all ,
So for any there exists such that for all ; therefore, .
(3) By the law of total probability . Also, for , from Theorem 1 in the text, . Combining these two equations gives
Multiplying through by gives the result .
(4) Since from (2) , there exists N such that for all . From Theorem 1 in the text, for , . So, combining these two gives
Multiplying through by gives the result for sufficiently large n. □
Author Contributions
Writing—original draft, Y.R. and Z.W. All authors have read and agreed to the published version of the manuscript.
Funding
This research was supported by NSF of Anhui University China (No. KJ2021A0386).
Data Availability Statement
No new data were created or analyzed in this study.
Acknowledgments
It is a pleasure to acknowledge our debt to Weicai Peng who suggested to us the problem addressed herein. We are grateful to the three anonymous referees and the editor for the useful comments and suggestions.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley-Interscience: Hoboken, NJ, USA, 2005.
- Algoet, P.H.; Cover, T.M. A sandwich proof of the Shannon–McMillan–Breiman theorem. Ann. Probab. 1988, 16, 899–909.
- Breiman, L. The individual ergodic theorem of information theory. Ann. Math. Stat. 1957, 28, 809–811.
- McMillan, B. The basic theorems of information theory. Ann. Math. Stat. 1953, 24, 196–219.
- Neshveyev, S.; Størmer, E. The McMillan theorem for a class of asymptotically abelian C*-algebras. Ergod. Theory Dyn. Syst. 2002, 22, 889–897.
- Girardin, V. On the different extensions of the ergodic theorem of information theory. In Recent Advances in Applied Probability; Springer Science+Business Media: Berlin/Heidelberg, Germany, 2005; pp. 163–179.
- Akcoglu, M.A.; del Junco, A. Convergence of averages of point transformations. Proc. Am. Math. Soc. 1975, 49, 265–266.
- Bellow, A.; Jones, R.; Rosenblatt, J.M. Convergence for moving averages. Ergod. Theory Dyn. Syst. 1990, 10, 43–62.
- del Junco, A.; Steele, J.M. Moving averages of ergodic processes. Metrika 1977, 24, 35–43.
- Schwartz, M. Polynomially moving ergodic averages. Proc. Am. Math. Soc. 1988, 103, 252–254.
- Haili, H.K.; Nair, R. Optimal continued fractions and the moving average ergodic theorem. Period. Math. Hung. 2013, 66, 95–103.
- Wang, Z.Z.; Yang, W.G. The generalized entropy ergodicity theorem for nonhomogeneous Markov chains. J. Theor. Probab. 2016, 29, 761–775.
- Wang, Z.Z.; Yang, W.G. Markov approximation and the generalized entropy ergodic theorem for non-null stationary process. Proc. Indian Acad. Sci. (Math. Sci.) 2020, 130, 13.
- Shi, Z.Y.; Wang, Z.Z.; Zhong, P.P. The generalized entropy ergodicity theorem for nonhomogeneous bifurcating Markov chains indexed by a binary tree. J. Theor. Probab. 2022, 35, 1367–1390.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).