# Is Natural Language a Perigraphic Process? The Theorem about Facts and Words Revisited

## 1. Introduction

What general mathematical properties can natural language, treated as a stochastic process, be expected to have in view of empirical data?

## 2. Ergodic and Nonergodic Processes

For any discrete stationary process ${\left({X}_{i}\right)}_{i=1}^{\infty}$, there exist limits

- Process $(X_i)_{i=1}^{\infty}$ is called IID (independent identically distributed) if
  $$P(X_1^n=x_1^n)=\pi(x_1)\cdots\pi(x_n).$$
  All IID processes are ergodic.
- Process $(X_i)_{i=1}^{\infty}$ is called Markov (of order 1) if
  $$P(X_1^n=x_1^n)=\pi(x_1)\,p(x_2|x_1)\cdots p(x_n|x_{n-1}).$$
  A Markov process is ergodic, in particular, if there is a constant $c$ such that
  $$p(x_i|x_{i-1})>c>0$$
  for all $x_{i-1}$ and $x_i$. For a necessary and sufficient condition, see ([32], Theorem 7.16).
- Process $(X_i)_{i=1}^{\infty}$ is called hidden Markov if $X_i=g(S_i)$ for some Markov process $(S_i)_{i=1}^{\infty}$ and some function $g$. A hidden Markov process is ergodic, in particular, if the underlying Markov process is ergodic.
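Informally, ergodicity means that relative frequencies computed from a single realization converge to the corresponding probabilities. As a minimal numerical illustration of the Markov example, the sketch below assumes a two-state chain whose transition probabilities are chosen here purely for illustration (all of them exceed $c=0.1$). It computes the stationary distribution $\pi$ by power iteration, starts the chain from $\pi$ so that the process is stationary, and compares $\pi$ with the relative frequencies along one long trajectory.

```python
import random

def stationary_distribution(p, iters=1000):
    """Stationary distribution of a finite Markov chain by power iteration.
    p[a][b] is the transition probability p(b | a)."""
    d = len(p)
    pi = [1.0 / d] * d
    for _ in range(iters):
        pi = [sum(pi[a] * p[a][b] for a in range(d)) for b in range(d)]
    return pi

def simulate_markov(p, pi0, n, seed=0):
    """Sample X_1, ..., X_n with X_1 ~ pi0 and X_i ~ p(. | X_{i-1})."""
    rng = random.Random(seed)
    states = range(len(p))
    x = [rng.choices(states, weights=pi0)[0]]
    for _ in range(n - 1):
        x.append(rng.choices(states, weights=p[x[-1]])[0])
    return x

def empirical_frequencies(x, d):
    """Relative frequency of each state 0, ..., d-1 in the sequence x."""
    counts = [0] * d
    for s in x:
        counts[s] += 1
    return [c / len(x) for c in counts]

# Hypothetical transition matrix: every entry exceeds c = 0.1, so the chain is ergodic.
P = [[0.8, 0.2],
     [0.3, 0.7]]
PI = stationary_distribution(P)      # solves pi = pi P, here (0.6, 0.4)
X = simulate_markov(P, PI, n=100_000)
FREQ = empirical_frequencies(X, 2)   # time averages along one realization
print(PI, FREQ)
```

The frequencies printed for a single run should lie close to $\pi$; for a nonergodic process, different runs could settle on different limits.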

Any process ${\left({X}_{i}\right)}_{i=1}^{\infty}$ with a stationary measure P is almost surely ergodic with respect to the random measure F given by

Any stationary probability measure P can be represented as

## 3. Strongly Nonergodic Processes

A stationary discrete process ${\left({X}_{i}\right)}_{i=1}^{\infty}$ is nonergodic if and only if there exists a function $f:{\mathbb{X}}^{*}\to \{0,1,2\}$ and a binary random variable Z such that $0<P(Z=0)<1$ and
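A simple nonergodic process is a mixture of two IID processes: a fair coin $Z$ is tossed once, and then all $X_i$ are IID Bernoulli with a bias that depends on $Z$. The sketch below assumes that the condition omitted above is that $f(X_1^n)$ converges to $Z$ almost surely; the biases 0.2 and 0.8, the margin, and the decision function `f` are hypothetical choices for illustration. Here `f` returns 2 ("undecided") until the relative frequency of ones settles near one of the two biases.

```python
import random

def sample_mixture(n, seed=None):
    """Draw Z uniformly from {0, 1} once, then X_1..X_n IID Bernoulli(theta_Z).
    The process is stationary but not ergodic, since Z is fixed for the whole run."""
    rng = random.Random(seed)
    z = rng.randint(0, 1)
    theta = 0.2 if z == 0 else 0.8   # hypothetical biases of the two components
    x = [1 if rng.random() < theta else 0 for _ in range(n)]
    return z, x

def f(x, margin=0.1):
    """Hypothetical decision function f: X* -> {0, 1, 2}.
    Returns 0 or 1 when the relative frequency of ones is near 0.2 or 0.8,
    and 2 ("undecided") otherwise."""
    if not x:
        return 2
    freq = sum(x) / len(x)
    if abs(freq - 0.2) < margin:
        return 0
    if abs(freq - 0.8) < margin:
        return 1
    return 2

for seed in range(5):
    z, x = sample_mixture(10_000, seed=seed)
    # A short prefix may still be undecided; the full sample recovers Z.
    print(seed, z, f(x[:10]), f(x))
```

Since $P(Z=0)=1/2$, the requirement $0<P(Z=0)<1$ holds, and the value of $Z$ is a single "fact" that can be read off from a long enough text.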

A stationary discrete process ${\left({X}_{i}\right)}_{i=1}^{\infty}$ is called strongly nonergodic if there exist a function $g:\mathbb{N}\times {\mathbb{X}}^{*}\to \{0,1,2\}$ and a binary IID process ${\left({Z}_{k}\right)}_{k=1}^{\infty}$ such that $P({Z}_{k}=0)=P({Z}_{k}=1)=1/2$ and
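The Santa Fe processes named in Appendix C are a standard example of strong nonergodicity. The sketch below assumes their usual definition, $X_i=(K_i,Z_{K_i})$, where $(Z_k)$ are IID fair bits (the "facts") and $(K_i)$ are IID with $P(K_i=k)\propto k^{-1/\beta}$ for some $\beta\in(0,1)$. Two further assumptions are made for the sake of runnability: $K_i$ is truncated to $k\le k_{\max}$, so the sketch only approximates the infinite process, and the decoding function `g` is a hypothetical choice. `g` returns $Z_k$ once a symbol $(k,z)$ has appeared in the text, and 2 otherwise.

```python
import random

def santa_fe(n, beta=0.5, kmax=10_000, seed=None):
    """Sample a truncated Santa Fe process X_i = (K_i, Z_{K_i}).

    Z_1, ..., Z_kmax are IID fair bits; K_1, ..., K_n are IID with
    P(K_i = k) proportional to k**(-1/beta) for 1 <= k <= kmax
    (truncation is an approximation of the untruncated process)."""
    rng = random.Random(seed)
    z = [rng.randint(0, 1) for _ in range(kmax + 1)]   # z[0] is unused
    ks = range(1, kmax + 1)
    weights = [k ** (-1.0 / beta) for k in ks]
    k_seq = rng.choices(ks, weights=weights, k=n)
    return z, [(k, z[k]) for k in k_seq]

def g(k, x):
    """Hypothetical decoder: return the value of fact Z_k if some symbol
    (k, z) occurs in the text x, and 2 ("not yet known") otherwise."""
    for kk, zz in x:
        if kk == k:
            return zz
    return 2

Z, X = santa_fe(1000, seed=1)
known = [k for k in range(1, 51) if g(k, X) != 2]
print(len(known), all(g(k, X) == Z[k] for k in known))
```

Every fact that `g` reports agrees with the true $Z_k$, and longer texts reveal more facts, which is the behaviour the definition asks for as $n\to\infty$.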

## 4. Perigraphic Processes

## 5. Theorem about Facts and Words

For ${x}_{1}^{n}\in {\mathbb{X}}^{n}$ and $k\in \{-1,0,1,\dots\}$, we put

- ${\mathrm{PPM}}_{k}\left({x}_{i}\right|{x}_{1}^{i-1})>0$ and ${\sum}_{{x}_{i}\in \mathbb{X}}{\mathrm{PPM}}_{k}\left({x}_{i}\right|{x}_{1}^{i-1})=1$,
- ${\mathrm{PPM}}_{k}\left({x}_{1}^{n}\right)>0$ and ${\sum}_{{x}_{1}^{n}\in {\mathbb{X}}^{n}}{\mathrm{PPM}}_{k}\left({x}_{1}^{n}\right)=1$,
- $\mathrm{PPM}\left({x}_{1}^{n}\right)>0$ and ${\sum}_{{x}_{1}^{n}\in {\mathbb{X}}^{n}}\mathrm{PPM}\left({x}_{1}^{n}\right)=1$.
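The defining formulas of ${\mathrm{PPM}}_{k}$ are missing from this section, so the following is only a sketch under an assumed form: an add-one (Laplace) estimator over contexts of length $k$, uniform for $k=-1$ and for the first $k$ positions. It is not presented as the paper's exact definition. It does, however, satisfy the first two properties listed above (positivity and normalization), which the test checks by exact summation over all strings of a small length. The mixture $\mathrm{PPM}$ over all $k$ is not built here because its weights are not given in this section.

```python
from fractions import Fraction
from itertools import product

def ppm_k_conditional(x, i, k, alphabet):
    """Assumed add-one estimate of PPM_k(x[i] | x[:i]), with 0-indexed position i.

    For k == -1, or when fewer than k symbols precede position i, the estimate
    is uniform. Otherwise the context is the k symbols before position i, and we
    count which symbols followed earlier occurrences of that context in x[:i]."""
    d = len(alphabet)
    if k == -1 or i < k:
        return Fraction(1, d)
    context = tuple(x[i - k:i])
    counts = {a: 0 for a in alphabet}
    for j in range(k, i):                 # earlier positions with a full context
        if tuple(x[j - k:j]) == context:
            counts[x[j]] += 1
    total = sum(counts.values())
    return Fraction(counts[x[i]] + 1, total + d)

def ppm_k(x, k, alphabet):
    """Chain rule: PPM_k(x_1^n) is the product of the conditional estimates."""
    p = Fraction(1)
    for i in range(len(x)):
        p *= ppm_k_conditional(x, i, k, alphabet)
    return p

# Usage: a periodic text becomes more probable under a context of length 1.
print(float(ppm_k("abababab", -1, "ab")), float(ppm_k("abababab", 1, "ab")))
```

Exact fractions are used so that normalization can be verified without rounding error; a practical implementation would accumulate $-\log$ probabilities instead.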

The PPM probability is universal in expectation, i.e., we have

Let ${\left({X}_{i}\right)}_{i=1}^{\infty}$ be a stationary strongly nonergodic process over a finite alphabet. We have inequalities

Let ${\left({X}_{i}\right)}_{i=1}^{\infty}$ be a stationary process over a finite alphabet. We have inequalities

## 6. Hilberg Exponents and Empirical Data

## 7. Conclusions

## Acknowledgments

## Conflicts of Interest

## Abbreviations

| Abbreviation | Meaning |
|---|---|
| IID | independent identically distributed |
| PPM | prediction by partial matching |

## Appendix A. Facts and Mutual Information

First, there are four pointwise Shannon information measures:

- entropy $\mathbb{H}(X):=-\log P(X)$,
- conditional entropy $\mathbb{H}(X|Z):=-\log P(X|Z)$,
- mutual information $\mathbb{I}(X;Y):=\mathbb{H}(X)+\mathbb{H}(Y)-\mathbb{H}(X,Y)$,
- conditional mutual information $\mathbb{I}(X;Y|Z):=\mathbb{H}(X|Z)+\mathbb{H}(Y|Z)-\mathbb{H}(X,Y|Z)$,

where $P(X)$ is the probability of a random variable X and $P(X|Z)$ is the conditional probability of a random variable X given a random variable Z. The above definitions make sense for discrete-valued random variables X and Y and an arbitrary random variable Z. If Z is a discrete-valued random variable, then also $\mathbb{H}(X,Z)-\mathbb{H}(Z)=\mathbb{H}(X|Z)$ and $\mathbb{I}(X;Z)=\mathbb{H}(X)-\mathbb{H}(X|Z)$.

Moreover, we will use four algorithmic information measures:

- entropy $\mathbb{H}_a(x):=K(x)\log 2$,
- conditional entropy $\mathbb{H}_a(x|z):=K(x|z)\log 2$,
- mutual information $\mathbb{I}_a(x;y):=\mathbb{H}_a(x)+\mathbb{H}_a(y)-\mathbb{H}_a(x,y)$,
- conditional mutual information $\mathbb{I}_a(x;y|z):=\mathbb{H}_a(x|z)+\mathbb{H}_a(y|z)-\mathbb{H}_a(x,y|z)$,

where $K(x)$ is the prefix-free Kolmogorov complexity of an object x and $K(x|z)$ is the prefix-free Kolmogorov complexity of an object x given an object z. In the above definitions, x and y must be finite objects (finite texts), whereas z can also be an infinite object (an infinite sequence). If z is a finite object, then $\mathbb{H}_a(x,z)-\mathbb{H}_a(z)\stackrel{+}{=}\mathbb{H}_a(x|z,K(z))$ rather than $\mathbb{H}_a(x|z)$, where $\stackrel{+}{=}$, $\stackrel{+}{<}$, and $\stackrel{+}{>}$ denote equality and inequalities up to an additive constant ([37], Theorem 3.9.1). Since $\mathbb{H}_a(x|z,K(z))\stackrel{+}{<}\mathbb{H}_a(x|z)\stackrel{+}{<}\mathbb{H}_a(x|z,K(z))+\mathbb{H}_a(K(z))$, it follows that
$$\begin{aligned}\mathbb{H}_a(x)-\mathbb{H}_a(x|z)+\mathbb{H}_a(K(z)) &\stackrel{+}{>}\mathbb{I}_a(x;z)\stackrel{+}{=}\mathbb{H}_a(x)-\mathbb{H}_a(x|z,K(z))\\ &\stackrel{+}{>}\mathbb{H}_a(x)-\mathbb{H}_a(x|z).\end{aligned}$$
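The pointwise Shannon measures defined at the start of this appendix can be checked on a toy example. The joint distribution below is hypothetical, and the logarithms are natural (nats), which is an assumption about the base of $\log$. The sketch evaluates the measures at each outcome and verifies the two identities stated for a discrete-valued Z. Unlike its expectation, pointwise mutual information can be negative, as it is here at $(x,z)=(0,1)$.

```python
import math

# Hypothetical joint distribution P(X = x, Z = z) of two binary random variables.
JOINT = {(0, 0): 0.4, (0, 1): 0.1,
         (1, 0): 0.1, (1, 1): 0.4}

def p_x(x):
    return sum(p for (a, _), p in JOINT.items() if a == x)

def p_z(z):
    return sum(p for (_, b), p in JOINT.items() if b == z)

def H(x):
    """Pointwise entropy -log P(X = x), in nats."""
    return -math.log(p_x(x))

def H_z(z):
    """Pointwise entropy -log P(Z = z)."""
    return -math.log(p_z(z))

def H_joint(x, z):
    """Pointwise joint entropy -log P(X = x, Z = z)."""
    return -math.log(JOINT[(x, z)])

def H_cond(x, z):
    """Pointwise conditional entropy -log P(X = x | Z = z)."""
    return -math.log(JOINT[(x, z)] / p_z(z))

def I(x, z):
    """Pointwise mutual information H(X) + H(Z) - H(X, Z)."""
    return H(x) + H_z(z) - H_joint(x, z)

for x, z in JOINT:
    # The two identities stated for discrete Z hold outcome by outcome.
    assert math.isclose(H_joint(x, z) - H_z(z), H_cond(x, z))
    assert math.isclose(I(x, z), H(x) - H_cond(x, z))
    print((x, z), round(I(x, z), 4))
```

The algorithmic counterparts cannot be computed this way, since $K(x)$ is not computable; there the identities hold only up to the additive constants discussed above.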

Let ${\left({X}_{i}\right)}_{i=1}^{\infty}$ be a stationary strongly nonergodic process over a finite alphabet. We have the inequality

Let ${\left({X}_{i}\right)}_{i=1}^{\infty}$ be a stationary process over a finite alphabet. We have the inequality

## Appendix B. Mutual Information and PPM Words

Let ${\left({X}_{i}\right)}_{i=1}^{\infty}$ be a stationary process over a finite alphabet. We have inequalities

## Appendix C. Hilberg Exponents for Santa Fe Processes

**Figure 1.** The PPM order ${G}_{\mathrm{PPM}}\left({x}_{1}^{n}\right)$ and the cardinality of the PPM vocabulary $\mathrm{card}\,{V}_{\mathrm{PPM}}\left({x}_{1}^{n}\right)$ versus the input length n for William Shakespeare’s First Folio/35 Plays and a random permutation of the text’s characters.

