# Is Natural Language a Perigraphic Process? The Theorem about Facts and Words Revisited

## Abstract


## 1. Introduction

What general mathematical properties can natural language exhibit when treated as a stochastic process, in view of the empirical data?

## 2. Ergodic and Nonergodic Processes

**Theorem 1.** For any discrete stationary process ${\left({X}_{i}\right)}_{i=1}^{\infty}$, there exist limits

- Process ${\left({X}_{i}\right)}_{i=1}^{\infty}$ is called IID (independent identically distributed) if $$P({X}_{1}^{n}={x}_{1}^{n})=\pi({x}_{1})\dots \pi({x}_{n}).$$ All IID processes are ergodic.
- Process ${\left({X}_{i}\right)}_{i=1}^{\infty}$ is called Markov (of order 1) if $$P({X}_{1}^{n}={x}_{1}^{n})=\pi({x}_{1})p({x}_{2}|{x}_{1})\dots p({x}_{n}|{x}_{n-1}).$$ A Markov process is ergodic in particular if $$p({x}_{i}|{x}_{i-1})>c>0.$$ For a necessary and sufficient condition, see ([32], Theorem 7.16).
- Process ${\left({X}_{i}\right)}_{i=1}^{\infty}$ is called hidden Markov if ${X}_{i}=g\left({S}_{i}\right)$ for a certain Markov process ${\left({S}_{i}\right)}_{i=1}^{\infty}$ and a function g. A hidden Markov process is ergodic in particular if the underlying Markov process is ergodic.
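As a toy illustration of ergodicity (not part of the original text), the following sketch simulates a two-state Markov chain whose transition probabilities are all strictly positive, so the chain is ergodic and the empirical frequency of a state approaches its stationary probability. The transition matrix and sample size are illustrative assumptions.

```python
import random

def simulate_markov(pi, p, n, seed=0):
    """Simulate n steps of a two-state Markov chain over states {0, 1}.

    pi: initial distribution, p[i][j]: transition probability i -> j.
    """
    rng = random.Random(seed)
    x = 0 if rng.random() < pi[0] else 1
    path = [x]
    for _ in range(n - 1):
        x = 0 if rng.random() < p[x][0] else 1
        path.append(x)
    return path

# Strictly positive transition probabilities make the chain ergodic,
# so the time average of the indicator of state 1 converges to the
# stationary probability pi(1) = p01 / (p01 + p10) = 0.1 / 0.3 = 1/3.
p = [[0.9, 0.1], [0.2, 0.8]]
path = simulate_markov([0.5, 0.5], p, 200_000)
freq1 = sum(path) / len(path)
print(round(freq1, 3))  # close to the stationary value 1/3
```

Replacing the transition matrix with one containing a zero row or an absorbing state breaks the sufficient condition quoted above, and the time average may then depend on the starting state.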

**Theorem 2.** Any process ${\left({X}_{i}\right)}_{i=1}^{\infty}$ with a stationary measure P is almost surely ergodic with respect to the random measure F given by

**Theorem 3.** Any stationary probability measure P can be represented as

## 3. Strongly Nonergodic Processes

**Theorem 4.** A stationary discrete process ${\left({X}_{i}\right)}_{i=1}^{\infty}$ is nonergodic if and only if there exists a function $f:{\mathbb{X}}^{*}\to \{0,1,2\}$ and a binary random variable Z such that $0<P(Z=0)<1$ and

**Proof.**

**Definition 1.** A stationary discrete process ${\left({X}_{i}\right)}_{i=1}^{\infty}$ is called strongly nonergodic if there exist a function $g:\mathbb{N}\times {\mathbb{X}}^{*}\to \{0,1,2\}$ and a binary IID process ${\left({Z}_{k}\right)}_{k=1}^{\infty}$ such that $P({Z}_{k}=0)=P({Z}_{k}=1)=1/2$ and

## 4. Perigraphic Processes

**Definition 2.**

**Theorem 5.**

**Proof.**

## 5. Theorem about Facts and Words

**Definition 3.** For ${x}_{1}^{n}\in {\mathbb{X}}^{n}$ and $k\in \{-1,0,1,\dots \}$, we put

- ${\mathrm{PPM}}_{k}({x}_{i}|{x}_{1}^{i-1})>0$ and ${\sum}_{{x}_{i}\in \mathbb{X}}{\mathrm{PPM}}_{k}({x}_{i}|{x}_{1}^{i-1})=1$,
- ${\mathrm{PPM}}_{k}({x}_{1}^{n})>0$ and ${\sum}_{{x}_{1}^{n}\in {\mathbb{X}}^{n}}{\mathrm{PPM}}_{k}({x}_{1}^{n})=1$,
- $\mathrm{PPM}({x}_{1}^{n})>0$ and ${\sum}_{{x}_{1}^{n}\in {\mathbb{X}}^{n}}\mathrm{PPM}({x}_{1}^{n})=1$.
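The two listed properties, positivity and normalization over strings of a fixed length, can be checked concretely for a sequential probability assignment. The sketch below is a simplified stand-in for ${\mathrm{PPM}}_{k}$: it conditions on the last k symbols and Laplace-smooths the counts, but omits PPM's escape-and-backoff mechanism, so it is an assumption-laden illustration rather than the actual PPM measure.

```python
from collections import defaultdict
from itertools import product

def adaptive_prob(x, k, alphabet_size):
    """Order-k sequential probability assignment with Laplace smoothing.

    Simplified stand-in for PPM_k (no escape mechanism): each symbol is
    predicted from counts of symbols seen after the same length-k context.
    Returns the probability assigned to the whole sequence x.
    """
    counts = defaultdict(lambda: defaultdict(int))
    prob = 1.0
    for i, sym in enumerate(x):
        ctx = tuple(x[max(0, i - k):i])
        seen = sum(counts[ctx].values())
        # Smoothed conditional probability: always positive.
        prob *= (counts[ctx][sym] + 1) / (seen + alphabet_size)
        counts[ctx][sym] += 1
    return prob

# By the chain rule, the smoothed conditionals sum to 1 at every step,
# so the probabilities of all binary strings of length 3 sum to 1.
total = sum(adaptive_prob(list(w), 1, 2) for w in product([0, 1], repeat=3))
print(abs(total - 1.0) < 1e-9)
```

The positivity and normalization hold for the same reason as in Definition 3: each conditional distribution is a strictly positive probability vector, and the product measure inherits both properties.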

**Theorem 6.** The PPM probability is universal in expectation, i.e., we have

**Theorem 7.**

**Proof.**

**Definition 4.**

**Theorem 8.**

**Proof.**

**Theorem 9.** Let ${\left({X}_{i}\right)}_{i=1}^{\infty}$ be a stationary strongly nonergodic process over a finite alphabet. We have inequalities

**Proof.**

**Theorem 10.** Let ${\left({X}_{i}\right)}_{i=1}^{\infty}$ be a stationary process over a finite alphabet. We have inequalities

**Proof.**

## 6. Hilberg Exponents and Empirical Data

## 7. Conclusions

## Acknowledgments

## Conflicts of Interest

## Abbreviations

| Abbreviation | Expansion |
| --- | --- |
| IID | independent identically distributed |
| PPM | prediction by partial matching |

## Appendix A. Facts and Mutual Information

- First, there are four pointwise Shannon information measures:
- entropy $\mathbb{H}(X):=-\log P(X)$,
- conditional entropy $\mathbb{H}(X|Z):=-\log P(X|Z)$,
- mutual information $\mathbb{I}(X;Y):=\mathbb{H}(X)+\mathbb{H}(Y)-\mathbb{H}(X,Y)$,
- conditional mutual information $\mathbb{I}(X;Y|Z):=\mathbb{H}(X|Z)+\mathbb{H}(Y|Z)-\mathbb{H}(X,Y|Z)$,

where $P(X)$ is the probability of a random variable X and $P(X|Z)$ is the conditional probability of a random variable X given a random variable Z. The above definitions make sense for discrete-valued random variables X and Y and an arbitrary random variable Z. If Z is a discrete-valued random variable, then also $\mathbb{H}(X,Z)-\mathbb{H}(Z)=\mathbb{H}(X|Z)$ and $\mathbb{I}(X;Z)=\mathbb{H}(X)-\mathbb{H}(X|Z)$.

- Moreover, we will use four algorithmic information measures:
- entropy ${\mathbb{H}}_{a}(x):=K(x)\log 2$,
- conditional entropy ${\mathbb{H}}_{a}(x|z):=K(x|z)\log 2$,
- mutual information ${\mathbb{I}}_{a}(x;y):={\mathbb{H}}_{a}(x)+{\mathbb{H}}_{a}(y)-{\mathbb{H}}_{a}(x,y)$,
- conditional mutual information ${\mathbb{I}}_{a}(x;y|z):={\mathbb{H}}_{a}(x|z)+{\mathbb{H}}_{a}(y|z)-{\mathbb{H}}_{a}(x,y|z)$,

where $K(x)$ is the prefix-free Kolmogorov complexity of an object x and $K(x|z)$ is the prefix-free Kolmogorov complexity of an object x given an object z. In the above definitions, x and y must be finite objects (finite texts), whereas z can also be an infinite object (an infinite sequence). If z is a finite object, then ${\mathbb{H}}_{a}(x,z)-{\mathbb{H}}_{a}(z)\stackrel{+}{=}{\mathbb{H}}_{a}(x|z,K(z))$ rather than being equal to ${\mathbb{H}}_{a}(x|z)$, where $\stackrel{+}{=}$, $\stackrel{+}{<}$, and $\stackrel{+}{>}$ denote equality and inequalities up to an additive constant ([37], Theorem 3.9.1). Hence,
$${\mathbb{H}}_{a}(x)-{\mathbb{H}}_{a}(x|z)+{\mathbb{H}}_{a}(K(z)) \stackrel{+}{>} {\mathbb{I}}_{a}(x;z) \stackrel{+}{=} {\mathbb{H}}_{a}(x)-{\mathbb{H}}_{a}(x|z,K(z)) \stackrel{+}{>} {\mathbb{H}}_{a}(x)-{\mathbb{H}}_{a}(x|z).$$
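The Shannon identities above can be checked numerically on a small example. The joint distribution below is an illustrative assumption, chosen only to show that $\mathbb{I}(X;Y)=\mathbb{H}(X)+\mathbb{H}(Y)-\mathbb{H}(X,Y)$ yields a nonnegative value (here the expected, not pointwise, quantities are computed).

```python
from math import log2

# Toy joint distribution of (X, Y) on {0,1}^2 (illustrative values).
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
# Marginal distributions obtained by summing out the other coordinate.
p_x = {x: sum(v for (a, _), v in p_xy.items() if a == x) for x in (0, 1)}
p_y = {y: sum(v for (_, b), v in p_xy.items() if b == y) for y in (0, 1)}

def H(dist):
    """Shannon entropy in bits of a distribution given as a dict of probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Expected mutual information via I(X;Y) = H(X) + H(Y) - H(X,Y).
mi = H(p_x) + H(p_y) - H(p_xy)
print(round(mi, 6))  # positive, since X and Y are correlated
```

For independent X and Y the same computation returns 0, in line with the interpretation of mutual information as shared information.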

**Theorem A1.**

**Proof.**

**Theorem A2.** Let ${\left({X}_{i}\right)}_{i=1}^{\infty}$ be a stationary strongly nonergodic process over a finite alphabet. We have inequality

**Proof.**

**Theorem A3.** Let ${\left({X}_{i}\right)}_{i=1}^{\infty}$ be a stationary process over a finite alphabet. We have inequality

**Proof.**

## Appendix B. Mutual Information and PPM Words

**Theorem A4.**

**Proof.**

**Definition A1.**

**Theorem A5.**

**Proof.**

**Theorem A6.**

**Proof.**

**Theorem A7.**

**Proof.**

**Theorem A8.** Let ${\left({X}_{i}\right)}_{i=1}^{\infty}$ be a stationary process over a finite alphabet. We have inequalities

**Proof.**

## Appendix C. Hilberg Exponents for Santa Fe Processes

**Theorem A9.**

**Proof.**

**Theorem A10.**

**Proof.**

## References

1. Shannon, C. A mathematical theory of communication. *Bell Syst. Tech. J.* **1948**, *27*, 379–423, 623–656.
2. Shannon, C. Prediction and entropy of printed English. *Bell Syst. Tech. J.* **1951**, *30*, 50–64.
3. Cover, T.M.; Thomas, J.A. *Elements of Information Theory*, 2nd ed.; Wiley: Hoboken, NJ, USA, 2006.
4. Jelinek, F. *Statistical Methods for Speech Recognition*; The MIT Press: Cambridge, MA, USA, 1997.
5. Manning, C.D.; Schütze, H. *Foundations of Statistical Natural Language Processing*; The MIT Press: Cambridge, MA, USA, 1999.
6. Ryabko, B. Twice-universal coding. *Probl. Inf. Transm.* **1984**, *20*, 173–177.
7. Cleary, J.G.; Witten, I.H. Data compression using adaptive coding and partial string matching. *IEEE Trans. Commun.* **1984**, *32*, 396–402.
8. Zipf, G.K. *The Psycho-Biology of Language: An Introduction to Dynamic Philology*, 2nd ed.; The MIT Press: Cambridge, MA, USA, 1965.
9. Mandelbrot, B. Structure formelle des textes et communication. *Word* **1954**, *10*, 1–27.
10. Kuraszkiewicz, W.; Łukaszewicz, J. The number of different words as a function of text length. *Pamięt. Lit.* **1951**, *42*, 168–182. (In Polish)
11. Guiraud, P. *Les Caractères Statistiques du Vocabulaire*; Presses Universitaires de France: Paris, France, 1954.
12. Herdan, G. *Quantitative Linguistics*; Butterworths: London, UK, 1964.
13. Heaps, H.S. *Information Retrieval—Computational and Theoretical Aspects*; Academic Press: Cambridge, MA, USA, 1978.
14. Hilberg, W. Der bekannte Grenzwert der redundanzfreien Information in Texten—Eine Fehlinterpretation der Shannonschen Experimente? *Frequenz* **1990**, *44*, 243–248.
15. Ebeling, W.; Nicolis, G. Entropy of symbolic sequences: The role of correlations. *Europhys. Lett.* **1991**, *14*, 191–196.
16. Ebeling, W.; Pöschel, T. Entropy and long-range correlations in literary English. *Europhys. Lett.* **1994**, *26*, 241–246.
17. Bialek, W.; Nemenman, I.; Tishby, N. Complexity through nonextensivity. *Physica A* **2001**, *302*, 89–99.
18. Crutchfield, J.P.; Feldman, D.P. Regularities unseen, randomness observed: The entropy convergence hierarchy. *Chaos* **2003**, *15*, 25–54.
19. Dębowski, Ł. On Hilberg's law and its links with Guiraud's law. *J. Quant. Linguist.* **2006**, *13*, 81–109.
20. Wolff, J.G. Language acquisition and the discovery of phrase structure. *Lang. Speech* **1980**, *23*, 255–269.
21. De Marcken, C.G. Unsupervised Language Acquisition. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 1996.
22. Kit, C.; Wilks, Y. Unsupervised learning of word boundary with description length gain. In *Proceedings of the Computational Natural Language Learning ACL Workshop, Bergen*; Osborne, M., Sang, E.T.K., Eds.; The Association for Computational Linguistics: Stroudsburg, PA, USA, 1999; pp. 1–6.
23. Kieffer, J.C.; Yang, E. Grammar-based codes: A new class of universal lossless source codes. *IEEE Trans. Inf. Theory* **2000**, *46*, 737–754.
24. Dębowski, Ł. A general definition of conditional information and its application to ergodic decomposition. *Stat. Probab. Lett.* **2009**, *79*, 1260–1268.
25. Dębowski, Ł. On the vocabulary of grammar-based codes and the logical consistency of texts. *IEEE Trans. Inf. Theory* **2011**, *57*, 4589–4599.
26. Charikar, M.; Lehman, E.; Lehman, A.; Liu, D.; Panigrahy, R.; Prabhakaran, M.; Sahai, A.; Shelat, A. The smallest grammar problem. *IEEE Trans. Inf. Theory* **2005**, *51*, 2554–2576.
27. Dębowski, Ł. Excess entropy in natural language: Present state and perspectives. *Chaos* **2011**, *21*, 037105.
28. Dębowski, Ł. The relaxed Hilberg conjecture: A review and new experimental support. *J. Quant. Linguist.* **2015**, *22*, 311–337.
29. Dębowski, Ł. Mixing, ergodic, and nonergodic processes with rapidly growing information between blocks. *IEEE Trans. Inf. Theory* **2012**, *58*, 3392–3401.
30. Billingsley, P. *Probability and Measure*; Wiley: Hoboken, NJ, USA, 1979.
31. Gray, R.M. *Probability, Random Processes, and Ergodic Properties*; Springer: Berlin/Heidelberg, Germany, 2009.
32. Breiman, L. *Probability*; SIAM: Philadelphia, PA, USA, 1992.
33. Kallenberg, O. *Foundations of Modern Probability*; Springer: Berlin/Heidelberg, Germany, 1997.
34. Yaglom, A.M.; Yaglom, I.M. *Probability and Information*; Theory and Decision Library; Springer: Berlin/Heidelberg, Germany, 1983.
35. Gray, R.M.; Davisson, L.D. The ergodic decomposition of stationary discrete random processes. *IEEE Trans. Inf. Theory* **1974**, *20*, 625–636.
36. Dębowski, Ł. Hilberg exponents: New measures of long memory in the process. *IEEE Trans. Inf. Theory* **2015**, *61*, 5716–5726.
37. Li, M.; Vitányi, P.M.B. *An Introduction to Kolmogorov Complexity and Its Applications*, 3rd ed.; Springer: Berlin/Heidelberg, Germany, 2008.
38. Barron, A.R. Logically Smooth Density Estimation. Ph.D. Thesis, Stanford University, Stanford, CA, USA, 1985.
39. Ryabko, B. Applications of universal source coding to statistical analysis of time series. In *Selected Topics in Information and Coding Theory*; Series on Coding and Cryptology; Woungang, I., Misra, S., Misra, S.C., Eds.; World Scientific Publishing: Singapore, 2010.
40. De Luca, A. On the combinatorics of finite words. *Theor. Comput. Sci.* **1999**, *218*, 13–39.
41. Dębowski, Ł. Maximal repetitions in written texts: Finite energy hypothesis vs. strong Hilberg conjecture. *Entropy* **2015**, *17*, 5903–5919.

**Figure 1.** The PPM order ${G}_{\mathrm{PPM}}\left({x}_{1}^{n}\right)$ and the cardinality of the PPM vocabulary $\mathrm{card}\,{V}_{\mathrm{PPM}}\left({x}_{1}^{n}\right)$ versus the input length n for William Shakespeare's First Folio/35 Plays and a random permutation of the text's characters.

© 2018 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Dębowski, Ł.
Is Natural Language a Perigraphic Process? The Theorem about Facts and Words Revisited. *Entropy* **2018**, *20*, 85.
https://doi.org/10.3390/e20020085
