# Effective Complexity of Stationary Process Realizations


Max Planck Institute for Mathematics in the Sciences, Inselstr. 22, Leipzig 04103, Germany

Institute of Mathematics 7-2, Technical University Berlin, Straße des 17. Juni 136, Berlin 10623, Germany

Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA

Author to whom correspondence should be addressed.

Received: 8 May 2011 / Revised: 15 June 2011 / Accepted: 17 June 2011 / Published: 22 June 2011

The concept of effective complexity of an object as the minimal description length of its regularities was introduced by Gell-Mann and Lloyd. The regularities are modeled by means of ensembles, that is, probability distributions on finite binary strings. In our previous paper [1] we proposed a definition of effective complexity in precise terms of algorithmic information theory. Here we investigate the effective complexity of binary strings generated by stationary, in general non-computable, processes. We show that under not too strong conditions long typical process realizations are effectively simple. Our results become most transparent in the context of coarse effective complexity, which is a modification of the original notion of effective complexity that needs fewer parameters in its definition. A similar modification of the related concept of sophistication has been suggested by Antunes and Fortnow.

The concept of effective complexity was introduced by Gell-Mann and Lloyd in [2]; see also [3]. The main motivation was to define a complexity measure that distinguishes between regular and random aspects of a given object, typically encoded as a binary string. This is in contrast to Kolmogorov complexity, which is not sensitive to the source of incompressibility and in this sense fails to capture what is meant by complexity in everyday language.

The main idea underlying the concept has been considered at different places in the literature, see [4,5,6,7,8,9,10]. It may be summarized as follows. One considers programs computing a given binary string as consisting of two parts: the implementation of an algorithm and a valid input for that algorithm. Then the corresponding measures of complexity refer to the algorithm part.

In [2] the algorithm part has been motivated as a description of a physical theory, represented by a probability distribution on finite binary strings, while the second part has been used to distinguish one among all possible objects contained in the (typical) support of the distribution. Effective complexity is equal to the length of the algorithm/theory part, minimized over the set of programs that compute the string and that are almost minimal, i.e., whose length is close to the Kolmogorov complexity of the string.

In [1] we have proposed a definition of effective complexity in precise terms of algorithmic information theory. Our formalization allows the concept to be placed in the context of algorithmic statistics, which also deals with two-part codings of binary strings [5]. Instances of corresponding measures of complexity are Kolmogorov minimal sufficient statistics and sophistication [5,10]. Roughly speaking, while the Kolmogorov minimal sufficient statistic of a binary string x is the minimal algorithmic statistic of x from the model class of finite sets, and sophistication refers to the model class of total programs, effective complexity mainly coincides with the length of algorithmic statistics of x minimized over computable probability distributions.

More precisely, the minimization domain of effective complexity consists of computable probability distributions with total information which is approximately equal to the Kolmogorov complexity of the string: the tolerance level being specified by a parameter Δ. Total information has been defined by Gell-Mann and Lloyd in [2,3] as the sum of Kolmogorov complexity and Shannon entropy of a given computable ensemble. It is worth mentioning that it is equivalent to the concept of physical entropy introduced by Zurek for large physical systems such as thermodynamic engines [8].

Restricting the minimization domain of effective complexity by intersecting with subsets corresponding to pre-knowledge about the object, which is subjective to the observer, one ends up with a version of effective complexity with constraints. As far as we know, there is no literature other than the papers by Gell-Mann and Lloyd [2], where the idea to incorporate subjective pre-knowledge into the measure of complexity has been considered explicitly.

Compared to the effective complexity without constraints, which we will refer to as plain effective complexity or simply effective complexity, this gives a larger value. This is the reason why Gell-Mann and Lloyd suggest using the constrained version instead of the plain one: “If we impose no other conditions, every entity would come out simple!”, see [2], (p. 392). This statement has to be contrasted with the fact that there exist strings with large plain effective complexity, cf. Theorem 13 of our previous work [1]. See also corresponding results in the context of algorithmic statistics and sophistication, Theorem 2.2 in [5] and Theorem 6.5 in [6], respectively. Hence, the above conviction can be substantiated only in a weaker version referring to some typical behaviour. In the present contribution, we find a framework to this end in the form of almost sure statements in terms of probability theory. The focus is on mathematical foundations for the concept of effective complexity. In particular, we extend the analysis of [1] to the context of asymptotic behaviour.

In more detail, we investigate discrete-time stochastic processes with binary state space in the context of effective complexity as it has been defined in [1]. In addition to proving that typical strings are simple with respect to the plain effective complexity, our results also allow a deeper understanding of the dependence of effective complexity on the parameter Δ. Recall that this parameter determines the minimization domain consisting of computable ensembles with a total information that is Δ-close to the Kolmogorov complexity of the string. A corresponding parameter also appears in the context of sophistication and, more generally, algorithmic sufficient statistics [5,10]. Conceptually, it also corresponds to the significance level of Bennett’s logical depth defined in [11]. The relation between effective complexity and logical depth has been elaborated in [1].

In [10] Antunes and Fortnow suggested a modification of sophistication called coarse sophistication. In an analogous way, we introduce coarse effective complexity. It modifies the original concept of plain effective complexity by, roughly speaking, incorporating Δ into the definition as a further minimization argument. As a consequence, the definition becomes independent of the choice of this parameter. Our main results on effective complexity have direct implications for the asymptotic behaviour of coarse effective complexity. In particular, for an arbitrary stationary process the value of coarse effective complexity of a typical finite string is asymptotically upper bounded by any linear function of the string’s length.

After fixing notation and the mathematical framework in Section 2, we formulate and prove our main result, Theorem 1, in Section 3. It states that sufficiently long typical strings generated by a stationary process are effectively simple. The proof relies on the observation that the total information of uniform distributions on universally typical subsets is upper bounded by a value that exceeds the Kolmogorov complexity of a typical string by any linearly growing amount in the string’s length. In Section 4 we introduce the concept of coarse effective complexity. We show that strings of moderate value of coarse effective complexity exist, see Theorem 3, and derive from our main theorem an upper bound on coarse effective complexity of long typical realizations of a stationary process, see Theorem 4. Finally, Section 5 contains some conclusions and an outlook for further analysis of effective complexity in its constrained version.

We denote by ${\{0,1\}}^{*}$ the set of finite binary strings, i.e., ${\{0,1\}}^{*}=\left\{\lambda \right\}\cup {\bigcup}_{n\in \mathbb{N}}{\{0,1\}}^{n}$, where λ is the empty string, while the set of doubly infinite sequences $(\dots ,{x}_{-1},{x}_{0},{x}_{1},\dots )$ with ${x}_{i}\in \{0,1\}$, $i\in \mathbb{Z}$, is denoted by ${\{0,1\}}^{\infty}$. We write $\ell \left(x\right)$ for the length of $x\in {\{0,1\}}^{*}$. Finite blocks ${x}_{m}^{n}=({x}_{m},{x}_{m+1},\dots ,{x}_{n})$, $m\le n$, of $x\in {\{0,1\}}^{\infty}$ are elements of ${\{0,1\}}^{*}$ of length $\ell \left({x}_{m}^{n}\right)=n-m+1$. We may identify them with cylinder sets $\left[{x}_{m}^{n}\right]:\phantom{\rule{0.277778em}{0ex}}=\{y\in {\{0,1\}}^{\infty}:\phantom{\rule{4pt}{0ex}}{y}_{i}={x}_{i},m\le i\le n\}$. In a similar fashion, strings $x\in {\{0,1\}}^{*}$ are associated to cylinder sets of the form $\left[x\right]:\phantom{\rule{0.277778em}{0ex}}=\{y\in {\{0,1\}}^{\infty}:\phantom{\rule{4pt}{0ex}}{y}_{i}={x}_{i},1\le i\le \ell \left(x\right)\}$. The σ-algebra on ${\{0,1\}}^{\infty}$ generated by the cylinder sets $\left[{x}_{m}^{n}\right]$, $m,n\in \mathbb{Z}$, $m\le n$, is denoted by Σ. We write $\mathcal{T}\left({\{0,1\}}^{\infty}\right)$ for the convex set of probability measures on $({\{0,1\}}^{\infty},\Sigma )$, which are invariant with respect to the left-shift T on ${\{0,1\}}^{\infty}$. The subset of ergodic T-invariant probability measures, i.e., the extremal points in $\mathcal{T}\left({\{0,1\}}^{\infty}\right)$, is denoted by $\mathcal{E}\left({\{0,1\}}^{\infty}\right)$.

Let $P\in \mathcal{T}\left({\{0,1\}}^{\infty}\right)$. The random variables ${X}_{i}$, $i\in \mathbb{Z}$, given by the coordinate projections ${X}_{i}\left(x\right):\phantom{\rule{0.277778em}{0ex}}={x}_{i}$, $x\in {\{0,1\}}^{\infty}$, respectively, represent a stationary process with values in $\{0,1\}$. Typical outcomes of such stochastic processes are the main focus in the present paper. The goal is to estimate their effective complexity. We will refer to elements of $\mathcal{T}\left({\{0,1\}}^{\infty}\right)$ as stationary probability measures and stationary stochastic processes interchangeably.

Adopting the setup of [1] as far as possible, we refer to probability distributions on ${\{0,1\}}^{*}$ as ensembles.

For each $n\in \mathbb{N}$ we identify the joint distribution (alternatively called n-block distribution) ${P}^{\left(n\right)}$ of n successive outcomes $({X}_{1},{X}_{2},\dots ,{X}_{n})$ of a stationary process with an ensemble ${\mathbb{E}}_{n}$ on ${\{0,1\}}^{*}$ through the relation: ${\mathbb{E}}_{n}\left(x\right)={P}^{\left(n\right)}\left(x\right)$ if $\ell \left(x\right)=n$ and ${\mathbb{E}}_{n}\left(x\right)=0$ otherwise.

Recall the definition of prefix Kolmogorov complexity $K(x)$ of a binary string $x\in \{0,1\}^{*}$:

$$K(x) := \min\{\ell(p) :\ U(p) = x\},$$

where U is an arbitrary but fixed universal prefix computer. For details concerning the basics as well as deeper results on Kolmogorov complexity theory we refer to the book by Li and Vitányi [12].
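Since $K$ is not computable, any numerical experiment has to rely on a proxy. The following is a minimal sketch, assuming that the output length of a general-purpose compressor (here Python's `zlib`, which plays no role in the paper) may stand in for an upper bound on $K(x)$ up to an additive constant. The helper name `compressed_bits` and the two test strings are illustrative choices only.

```python
import random
import zlib

def compressed_bits(data: bytes) -> int:
    """Length in bits of the zlib-compressed data: a crude upper-bound proxy for K, not K itself."""
    return 8 * len(zlib.compress(data, 9))

# A highly regular string versus a pseudo-random one of the same length.
regular = b"01" * 5000
rng = random.Random(0)  # fixed seed, used for illustration only
noisy = bytes(rng.getrandbits(8) for _ in range(10000))

bits_regular = compressed_bits(regular)
bits_noisy = compressed_bits(noisy)
print(bits_regular, bits_noisy)
```

The regular string compresses to a small fraction of its length while the pseudo-random one does not, which mirrors the qualitative behaviour of $K$ without saying anything about its exact value.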

We call an ensemble computable if there exists a program for the universal computer U that, given $x\in {\{0,1\}}^{*}$ and $m\in \mathbb{N}$ as inputs, computes an approximation of the probability $\mathbb{E}\left(x\right)$ with accuracy of at least ${2}^{-m}$.

In [1] we have introduced an extension of the notion of Kolmogorov complexity to the case of computable ensembles $\mathbb{E}$ with computable and finite entropies $H(\mathbb{E})$. Here we mean by entropy $H(\mathbb{E})$ the Shannon entropy of the probability distribution $\mathbb{E}$ defined by $-{\sum}_{x\in {\{0,1\}}^{*}}\mathbb{E}\left(x\right)\log \mathbb{E}\left(x\right)$. Note that a computable ensemble does not necessarily have a computable entropy, so that the corresponding requirement is a genuine restriction, see [1] for details. In what follows we only distinguish between general ensembles and computable ones with computable and finite entropies. We will refer to the latter as computable for short.

The Kolmogorov complexity $K(\mathbb{E})$ of a computable ensemble $\mathbb{E}$ is defined as the length of the shortest computer program that, given $x\in {\{0,1\}}^{*}$ and $m\in \mathbb{N}$ as inputs, outputs both $\mathbb{E}\left(x\right)$ and $H(\mathbb{E})$ with an accuracy of at least ${2}^{-m}$.

Additionally, we need to define computability of stochastic processes. The following definition is a reformulation of the notion of a “computable measure” in [12].

A stationary process P is called computable if there exists a program $p\in {\{0,1\}}^{*}$ for a universal computer U that, given $x\in {\{0,1\}}^{*}$ and $m\in \mathbb{N}$ as inputs, computes the probability $P([x])$ up to accuracy ${2}^{-m}$.

The goal is to show that under not too strong conditions long typical samples of stationary processes are effectively simple. Before we make rigorous statements we need a number of definitions. The first ones we adopt from our previous paper [1].

Let $\delta \ge 0$. We say that an ensemble $\mathbb{E}$ is δ-typical for a string $x\in {\{0,1\}}^{*}$, or alternatively, we call x δ-typical for $\mathbb{E}$, if the Shannon entropy $H(\mathbb{E})$ of $\mathbb{E}$ is finite and

$$\begin{array}{c}\hfill -log\mathbb{E}\left(x\right)\le H(\mathbb{E})(1+\delta )\end{array}$$

In particular, the special case of an equidistributed ensemble is δ-typical for all strings in the support and any $\delta \ge 0$.

The total information $\Sigma (\mathbb{E})$ of a computable ensemble $\mathbb{E}$ is defined by

$$\Sigma (\mathbb{E}):\phantom{\rule{0.277778em}{0ex}}=H(\mathbb{E})+K(\mathbb{E})$$

For a motivation of the two definitions above, see [1].
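The typicality condition can be checked directly for a finitely supported ensemble. The sketch below is only an illustration: the function names `entropy` and `is_delta_typical` and the two example ensembles are assumptions of this sketch, and only the entropy part of the total information is computed, since $K(\mathbb{E})$ is not computable.

```python
from math import log2

def entropy(ensemble: dict[str, float]) -> float:
    """Shannon entropy H(E) in bits of a finitely supported ensemble."""
    return -sum(p * log2(p) for p in ensemble.values() if p > 0)

def is_delta_typical(ensemble: dict[str, float], x: str, delta: float) -> bool:
    """Check -log E(x) <= H(E) * (1 + delta); strings outside the support are never typical."""
    p = ensemble.get(x, 0.0)
    if p <= 0:
        return False
    # small tolerance guards against floating-point rounding
    return -log2(p) <= entropy(ensemble) * (1 + delta) + 1e-12

# Hypothetical example ensembles on strings of length 2.
uniform = {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}
skewed = {"00": 0.7, "01": 0.1, "10": 0.1, "11": 0.1}

print(all(is_delta_typical(uniform, x, 0.0) for x in uniform))
print(is_delta_typical(skewed, "00", 0.0), is_delta_typical(skewed, "01", 0.0))
```

For the equidistributed ensemble every string in the support passes the test already with $\delta = 0$, whereas for the skewed ensemble the rare strings only become typical once $\delta$ is large enough.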

Let $\delta \ge 0$ and $\Delta >0$. Effective complexity $\mathcal{E}_{\delta ,\Delta}(x)$ of a finite string $x\in \{0,1\}^{*}$ is defined by

$$\mathcal{E}_{\delta,\Delta}(x) := \min\{K(\mathbb{E}) :\ \mathbb{E}\in \mathcal{P}_{\delta,\Delta}(x)\},$$

where $\mathcal{P}_{\delta ,\Delta}(x)$ denotes the minimization domain associated to x:

$$\mathcal{P}_{\delta,\Delta}(x) := \{\mathbb{E} :\ \mathbb{E}\ \text{computable ensemble},\ \mathbb{E}\ \delta\text{-typical for}\ x,\ \Sigma(\mathbb{E}) \le K(x) + \Delta\}.$$

We refer to elements of ${\mathcal{P}}_{\delta ,\Delta}\left(x\right)$ as effective ensembles for x.

Taking the point of view of [2], which was reviewed in [1], effective ensembles represent theories that are judged to be good explanations for the appearance of x.

The more general notion of effective complexity with constraints has been suggested in [2] mainly to circumvent problems of plain effective complexity. We discussed these briefly in the Introduction. The main idea is that the constraints reflect some pre-knowledge about the possible theory for x. In [1] we have proposed a formalization of the constrained version in the following manner:

$$\mathcal{E}_{\delta,\Delta}(x \mid \mathcal{C}) := \min\{K(\mathbb{E}) :\ \mathbb{E}\in \mathcal{P}_{\delta,\Delta}(x),\ \mathbb{E}\in \mathcal{C}\},$$

where $\mathcal{C}$ is a subset of $\mathcal{P}(\{0,1\}^{*})$. Note that with $\mathcal{C}=\mathcal{P}(\{0,1\}^{*})$ we have $\mathcal{E}_{\delta ,\Delta}(x)=\mathcal{E}_{\delta ,\Delta}(x \mid \mathcal{C})$, for all $x\in \{0,1\}^{*}$.

Other essential concepts in what follows are those of typical and universally typical subsets.

Let P be a T-invariant probability measure on $(\{0,1\}^{\infty},\Sigma )$. We call a sequence of subsets $M_{n}\in \Sigma $, $n\in \mathbb{N}$, P-typical if

$$\lim_{n\to \infty} P(M_{n}) = 1.$$

We call $(M_{n})$ strongly P-typical if for P-almost all x there exists an $N_{x}\in \mathbb{N}$ such that

$$x\in M_{n} \qquad \text{for every}\ n\ge N_{x}.$$

The above notions of typicality apply naturally to sequences ${M}_{n}\subseteq {\{0,1\}}^{n}$, $n\in \mathbb{N}$, if we identify subsets ${M}_{n}$, $n\in \mathbb{N}$, with cylinder sets $\left[{M}_{n}\right]\subseteq {\{0,1\}}^{\infty}$, respectively.

Let Λ be a set of stationary processes with values in $\{0,1\}$. We call ${M}_{n}\subseteq {\{0,1\}}^{n}$, $n\in \mathbb{N}$, universally typical for Λ if the sequence is P-typical for every $P\in \Lambda $, i.e., ${lim}_{n\to \infty}{P}^{\left(n\right)}\left({M}_{n}\right)=1$. We call the sequence universally strongly typical for Λ if it is strongly P-typical for every $P\in \Lambda $.

For sets $\Lambda_{r}\subseteq \mathcal{E}(\{0,1\}^{\infty})$ consisting of ergodic processes with entropy rate upper bounded by $r>0$ there exist universally typical subsets $T_{r,n}\subseteq \{0,1\}^{n}$ with

$$|T_{r,n}| \le 2^{rn} \tag{2}$$

for all $n\in \mathbb{N}$. Moreover, there are methods to construct such sequences of universally typical subsets for $\Lambda_{r}$. We will apply the Lempel-Ziv algorithm in the construction procedure below. This famous algorithm represents a universal sequential data compression scheme, see [13,14,15]. The main point is that all we need to know about an ergodic process P is its entropy rate $h_{P}$. This allows us to prove the following theorem for stationary, in general non-computable, processes.

Theorem 1. Let P be a stationary process, $\delta \ge 0$, $\epsilon >0$ and $\Delta_{n} := \epsilon n$. Then for P-almost every $x\in \{0,1\}^{\infty}$

$$\mathcal{E}_{\delta,\Delta_{n}}(x_{1}^{n}) \stackrel{+}{<} \log n + \mathcal{O}(\log\log n)$$

holds for all sufficiently large $n$.

First, let $r>0$ be arbitrary and define for each $n\in \mathbb{N}$ the subset ${T}_{r,n}\subseteq {\{0,1\}}^{n}$ as the set consisting of all binary strings ${x}_{1}^{n}$ which are mapped by the Lempel-Ziv (LZ) algorithm to a code word of length ${\ell}_{LZ}\left({x}_{1}^{n}\right)$ lower than $nr$. Then the sequence ${T}_{r,n}$, $n\in \mathbb{N}$, is universally typical for the set ${\Lambda}_{r}$ of ergodic processes with entropy rates lower than r.

Recall the following remarkable property of the LZ algorithm: For every ergodic process Q with entropy rate ${h}_{Q}$ it holds ${lim}_{n\to \infty}\frac{1}{n}{\ell}_{LZ}\left({x}_{1}^{n}\right)={h}_{Q}$ for Q-almost all $x\in {\{0,1\}}^{\infty}$. This implies that indeed the subsets ${T}_{r,n}$ as constructed above are typical for any ergodic Q with ${h}_{Q}<r$.

The upper bound (2) on the size $|{T}_{r,n}|$ follows from the fact that the LZ algorithm is a faithful coder. Hence the codeword lengths satisfy the Kraft inequality:

$$\begin{array}{ccc}\hfill 1& \ge & \sum _{{x}_{1}^{n}\in {\{0,1\}}^{n}}{2}^{-{\ell}_{LZ}\left({x}_{1}^{n}\right)}\ge \sum _{{x}_{1}^{n}\in {T}_{r,n}}{2}^{-{\ell}_{LZ}\left({x}_{1}^{n}\right)}\hfill \\ & \ge & \sum _{{x}_{1}^{n}\in {T}_{r,n}}{2}^{-nr}=\left|{T}_{r,n}\right|{2}^{-nr}\hfill \end{array}$$
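The construction of $T_{r,n}$ and the size bound can be checked by brute force for small $n$. The sketch below assumes a simple LZ78-style parser whose code-word lengths are fixed here purely for illustration (index of the longest previously seen phrase plus one new bit). The paper does not commit to this particular variant, and the names `lz78_code_length` and `typical_set` are hypothetical. For fixed $n$ a decoder can read these code words sequentially, so their lengths satisfy the Kraft inequality.

```python
from itertools import product

def lz78_code_length(x: str) -> int:
    """Code-word length in bits of an LZ78-style parse of the binary string x."""
    phrases = {""}  # dictionary of phrases seen so far, including the empty phrase
    bits = 0
    w = ""
    for ch in x:
        if w + ch in phrases:
            w += ch  # keep extending the current phrase
        else:
            # emit (index of w, new bit): index width for len(phrases) entries, plus 1 bit
            bits += (len(phrases) - 1).bit_length() + 1
            phrases.add(w + ch)
            w = ""
    if w:  # trailing phrase that already occurs in the dictionary
        bits += (len(phrases) - 1).bit_length() + 1
    return bits

def typical_set(n: int, r: float) -> list[str]:
    """All x in {0,1}^n whose code length is below n*r (the set T_{r,n} for this coder)."""
    strings = ("".join(b) for b in product("01", repeat=n))
    return [x for x in strings if lz78_code_length(x) < n * r]

n = 10
lengths = [lz78_code_length("".join(b)) for b in product("01", repeat=n)]
kraft_sum = sum(2.0 ** -length for length in lengths)
print(kraft_sum)
for r in (0.9, 1.0, 1.2, 1.5):
    print(r, len(typical_set(n, r)), 2 ** (r * n))
```

At such small $n$ the overhead of the parser dominates, so the sets are far from capturing typical behaviour. The point of the sketch is only that the counting bound holds for every $r$ and $n$, independently of any process.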

Next, we show that if r is chosen to be a positive rational number satisfying $0<r-h_{P}<\epsilon /4$ then for P-almost every $x\in \{0,1\}^{\infty}$ there exists an $N_{x}\in \mathbb{N}$ such that

$$\Sigma(\mathbb{E}_{r,n}) \le K(x_{1}^{n}) + \Delta_{n} \qquad \text{for every}\ n\ge N_{x}, \tag{4}$$

where $\mathbb{E}_{r,n}$ denotes the uniform distribution on the universally typical subset $T_{r,n}$. First note that for all $n\in \mathbb{N}$

$$H(\mathbb{E}_{r,n}) = \log|T_{r,n}| \le rn.$$

Secondly, we prove that there is a constant $c\in \mathbb{N}$ such that for all $n\in \mathbb{N}$

$$K(\mathbb{E}_{r,n}) \le K(n) + K(r) + c. \tag{5}$$

This is derived from the existence of a program p of length c which expects as inputs $n\in \mathbb{N}$, $r\in \mathbb{Q}$ and $x\in {\{0,1\}}^{*}$ and outputs the value $\frac{1}{|{T}_{r,n}|}$ if $x\in {T}_{r,n}\subseteq {\{0,1\}}^{n}$ and 0 otherwise. Thus for fixed inputs n and r it gives a description of the uniform distribution ${\mathbb{E}}_{r,n}$ on ${T}_{r,n}$.

Indeed, p may be constructed on the basis of a program ${p}_{LZ}$ implementing the Lempel-Ziv (LZ) algorithm on the given reference universal computer U. For given inputs n and r let p apply ${p}_{LZ}$ as a subroutine in order to determine the elements of ${T}_{r,n}$. Then for fixed $n\in \mathbb{N}$ the number $|{T}_{r,n}|$ and hence the probability value $1/|{T}_{r,n}|$ of each $x\in {T}_{r,n}$ may be calculated easily.

To specify $r\in \mathbb{Q}$ and $n\in \mathbb{N}$ a number of $K\left(r\right)+K\left(n\right)$ bits is sufficient. With $c=\ell \left(p\right)$ the estimate (5) follows.

Next, fix an $N\in \mathbb{N}$ such that $K(n)+K(r)+c\le \frac{\epsilon}{4}n$ for all $n\ge N$. Then for all $n\ge N$

$$\Sigma(\mathbb{E}_{r,n}) \le nr + K(n) + K(r) + c \le n\left(r+\frac{\epsilon}{4}\right) \le n\left(h_{P}+\frac{\epsilon}{2}\right), \tag{6}$$

where the last inequality holds by the assumption $r-h_{P}<\frac{\epsilon}{4}$. According to the theorem by Brudno, see [16], for P-almost all x there exists an $N_{x,\epsilon}\in \mathbb{N}$ such that $K(x_{1}^{n})\ge n(h_{P}-\frac{\epsilon}{2})$ for all $n\ge N_{x,\epsilon}$. It follows for $\Delta_{n}=\epsilon n$

$$K(x_{1}^{n}) + \Delta_{n} \ge n\left(h_{P}+\frac{\epsilon}{2}\right), \qquad n\ge N_{x,\epsilon}. \tag{7}$$

Relations (6) and (7) together imply (4) for P-almost all x and $n\ge N_{x} := \max\{N_{x,\epsilon},N\}$. It follows that P-almost surely the effective complexity $\mathcal{E}_{\delta ,\Delta_{n}}(x_{1}^{n})$ is upper bounded by the Kolmogorov complexity of $\mathbb{E}_{r,n}$ for $n\ge N_{x}$:

$$\begin{array}{rcl} \mathcal{E}_{\delta,\Delta_{n}}(x_{1}^{n}) & \le & K(\mathbb{E}_{r,n}) \le K(n) + K(r) + c \\ & \stackrel{+}{<} & \log n + \mathcal{O}(\log\log n). \end{array}$$

Now let P be an arbitrary stationary process. Recall that there is a unique ergodic decomposition of P

$$P = \int_{\mathcal{E}(\{0,1\}^{\infty})} Q \, d\mu(Q).$$

Moreover, to P-almost every $x\in \{0,1\}^{\infty}$ we may associate an ergodic component $Q_{x}$ of P such that x is a typical element of $Q_{x}$. Then there exists an $N_{x,\epsilon}$ such that

$$K(x_{1}^{n}) + \Delta_{n} \ge n\left(h_{x}+\frac{\epsilon}{2}\right), \qquad n\ge N_{x,\epsilon},$$

where $h_{x}$ denotes the entropy rate of $Q_{x}$. Hence the proof for the stationary case reduces to the ergodic situation considered in the first part above.
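The role of the ergodic component $Q_{x}$ can be illustrated with a toy example. The sketch below assumes a hypothetical stationary but non-ergodic process that first picks a bias $p\in \{0.2, 0.5\}$ with equal probability and then emits independent bits with that bias. The process, the function names and the sample size are all choices made for this illustration and do not come from the paper.

```python
import random
from math import log2

def binary_entropy(p: float) -> float:
    """Entropy rate in bits per symbol of an i.i.d. Bernoulli(p) process, 0 < p < 1."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

def sample_mixture(n: int, biases: tuple[float, ...], rng: random.Random) -> tuple[float, list[int]]:
    """Draw one ergodic component uniformly at random, then n i.i.d. bits from it."""
    p = rng.choice(biases)
    return p, [1 if rng.random() < p else 0 for _ in range(n)]

rng = random.Random(1)  # fixed seed, used for illustration only
biases = (0.2, 0.5)
p_x, x = sample_mixture(50_000, biases, rng)

# The realization reveals its own component: the empirical frequency of ones
# settles near the bias of Q_x, not near the average bias of the mixture.
freq = sum(x) / len(x)
p_hat = min(biases, key=lambda p: abs(p - freq))
h_x = binary_entropy(p_hat)
print(p_x, round(freq, 3), round(h_x, 3))
```

In this toy case the relevant entropy rate for a given realization is $h_{x}$, the entropy rate of the component that generated it, which is why the stationary case can be handled component by component.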

Finally, we remark that Theorem 1 applies to the large class of stationary processes. This covers, in particular, the case of independent identically distributed processes, which represent the simplest and at the same time one of the best studied classes of stochastic processes. Further, our theorem is valid for stationary Markov chains, another important and comparatively tractable class of stationary processes.

Our main result becomes most transparent if presented in the context of coarse effective complexity. This is a modification of plain effective complexity which incorporates the parameter Δ as a penalty into the original formula. It is inspired by a corresponding modification of sophistication, called coarse sophistication, which has been introduced by Antunes and Fortnow in [10].

Let $\delta \ge 0$. The coarse effective complexity ${\mathcal{E}}_{\delta}\left(x\right)$ of a finite string $x\in {\{0,1\}}^{*}$ is defined by

$$\mathcal{E}_{\delta}(x) := \min\{K(\mathbb{E}) + \Sigma(\mathbb{E}) - K(x) :\ \mathbb{E}\ \text{is a computable ensemble},\ \mathbb{E}\ \text{is}\ \delta\text{-typical for}\ x\}$$

The term $\Sigma (\mathbb{E})-K\left(x\right)$ accounts for the exact value by which the total information of an ensemble $\mathbb{E}$ exceeds the Kolmogorov complexity of x. By definition of total information $\Sigma (\mathbb{E})$ an equivalent expression for ${\mathcal{E}}_{\delta}\left(x\right)$ reads:

$$\mathcal{E}_{\delta}(x) = \min\{2K(\mathbb{E}) + H(\mathbb{E}) - K(x) :\ \mathbb{E}\ \text{is a computable ensemble},\ \mathbb{E}\ \text{is}\ \delta\text{-typical for}\ x\}$$

We derive the basic properties of coarse effective complexity in the same way as was done in [10] in the context of coarse sophistication. That is, firstly, in the proposition below, we prove an upper bound on coarse effective complexity. Secondly, we show the existence of strings which come close to saturating this bound.

Proposition 2. Let $\delta \ge 0$. There is a constant $c$ such that for every $n\in \mathbb{N}$ and every $x\in \{0,1\}^{n}$

$$\mathcal{E}_{\delta}(x) \le \frac{n}{2} + \log n + c.$$

Proof. Suppose that $K(x)\le \frac{n}{2}+\log n$. Let $\mathbb{E}_{x}$ denote the ensemble with $\mathbb{E}_{x}(x)=1$ and $\mathbb{E}_{x}(y)=0$ for $y\ne x$. Note that $\mathbb{E}_{x}$ is trivially δ-typical for x for any $\delta \ge 0$ and obviously $H(\mathbb{E}_{x})=0$. Moreover, there is a constant $c_{1}$ such that $K(\mathbb{E}_{x})\le K(x)+c_{1}$. This implies the upper bound

$$\begin{array}{rcl} \mathcal{E}_{\delta}(x) & \le & 2K(\mathbb{E}_{x}) + 0 - K(x) \\ & \le & K(x) + 2c_{1} \\ & \le & \frac{n}{2} + \log n + 2c_{1}, \end{array}$$

where the last line holds by assumption.

Now, suppose that $K(x)>\frac{n}{2}+\log n$. Let $\mathbb{E}_{n}$ be the ensemble on $\{0,1\}^{*}$ given by $\mathbb{E}_{n}(y)=\frac{1}{2^{n}}$ for all y with $\ell(y)=n$ and vanishing elsewhere. Then $H(\mathbb{E}_{n})=n$ and there exists a constant $c_{2}$, independent of n, such that $K(\mathbb{E}_{n})\le \log n+c_{2}$. It follows

$$\begin{array}{rcl} \mathcal{E}_{\delta}(x) & \le & 2\log n + 2c_{2} + n - K(x) \\ & \le & \frac{n}{2} + \log n + 2c_{2}, \end{array}$$

where, again, the second line holds by assumption on $K(x)$. Setting $c := \max\{2c_{1},2c_{2}\}$ completes the proof.
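The case distinction in the proof amounts to taking the better of two candidate ensembles. The sketch below treats $K(x)$ as a free parameter `k` and sets the constants $c_{1}$, $c_{2}$ to zero by default. Both are simplifying assumptions of this illustration, and the function name `two_candidate_bound` is hypothetical.

```python
from math import log2

def two_candidate_bound(n: int, k: float, c1: float = 0.0, c2: float = 0.0) -> float:
    """Smaller of the two upper bounds from the proof for a string of length n with K(x) = k."""
    point_mass = k + 2 * c1               # 2K(E_x) + 0 - K(x), using K(E_x) <= K(x) + c1
    uniform = 2 * (log2(n) + c2) + n - k  # 2K(E_n) + n - K(x), using K(E_n) <= log n + c2
    return min(point_mass, uniform)

n = 64
# Scan hypothetical values of K(x); the worst case sits where the two bounds cross.
worst = max(two_candidate_bound(n, k) for k in range(0, n + 2 * int(log2(n)) + 1))
print(worst, n / 2 + log2(n))
```

The two bounds cross at $k = \frac{n}{2} + \log n$, which is exactly the threshold used in the case distinction and explains the form of the upper bound.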

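Theorem 3. Let $\delta \ge 0$. For every sufficiently large $n\in \mathbb{N}$ there is a string $x\in \{0,1\}^{n}$ such that

$$\mathcal{E}_{\delta}(x) \ge (1-3\delta)\frac{n}{2} - (2+3\delta)\log n - 2\log\log n + C,$$

where $C$ is a constant that does not depend on $n$.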

Proof. For $x\in \{0,1\}^{*}$ and $\Delta \ge 0$ denote by $\mathbb{E}_{x}^{\Delta}$ the minimal ensemble associated to $\mathcal{E}_{\delta ,\Delta}(x)$. Due to Lemma 22 in [1] for every $\epsilon >0$ there exists a subset $S_{x}^{\Delta}$ of $\{0,1\}^{*}$ such that

$$\log|S_{x}^{\Delta}| \le H(\mathbb{E}_{x}^{\Delta})(1+\delta) + \epsilon, \tag{10}$$

$$K(S_{x}^{\Delta}) \le K(\mathbb{E}_{x}^{\Delta}) + c_{1}, \tag{11}$$

where $c_{1}$ is a global constant. In [1] we have proven the relation

$$K(x \mid S_{x}^{\Delta}, K(S_{x}^{\Delta})) \ge \frac{\log|S_{x}^{\Delta}|}{1+\delta} - \log n - 2\log\log n - \Lambda_{\Delta}, \tag{12}$$

which holds for arbitrary $x\in \{0,1\}^{n}$, $n\in \mathbb{N}$. The term $\Lambda_{\Delta}$ is constant in $x\in \{0,1\}^{*}$ and monotonically increasing in Δ, cf. $(32)$ in [1]. Now, let $K_{n} := \max\{K(t) \mid t\in \{0,1\}^{n}\}$ and define

$$k := n - \delta(K_{n} + \Delta_{n} + \epsilon) - \log n - 2\log\log n - \Lambda_{\Delta_{n}} - c_{2},$$

where $\Delta_{n} := \frac{n}{2}+\log n+c$ is the upper bound on $\mathcal{E}_{\delta}(x)$ obtained in the previous proposition, and $c_{2}$ is a global constant from Theorem IV.2 in [5], see also Lemma 12 in [1]. If n is large enough then $0<k<n$ holds, and Theorem IV.2 in [5] applies: There is a string $x_{k}\in \{0,1\}^{n}$ such that

$$K(x_{k} \mid S, K(S)) < \log|S| - n + k + c_{2}$$

for every set $S\ni x_{k}$ with $K(S)<k-c_{3}$, where $c_{3}$ is another global constant. Let $\mathbb{E}_{x}$ denote the minimizing ensemble associated to coarse effective complexity $\mathcal{E}_{\delta}(x)$ and $\Delta_{x} := K(\mathbb{E}_{x})+H(\mathbb{E}_{x})-K(x)$, such that $\mathcal{E}_{\delta}(x)=K(\mathbb{E}_{x})+\Delta_{x}$. Further, define $S_{x} := S_{x}^{\Delta_{x}}$. We have the inequality

$$\begin{array}{rcl} -\delta(K_{n} + \Delta_{n} + \epsilon) & \le & -\delta(K(x_{k}) + \Delta_{x_{k}} + \epsilon) \\ & \le & -\delta\left(H(\mathbb{E}_{x_{k}}) + \epsilon\right) \\ & \le & -\delta\left(H(\mathbb{E}_{x_{k}}) + \frac{\epsilon}{1+\delta}\right) \\ & = & \frac{-\delta}{1+\delta}\left(H(\mathbb{E}_{x_{k}})(1+\delta) + \epsilon\right) \\ & \le & \left(\frac{1}{1+\delta} - 1\right)\log|S_{x_{k}}|, \end{array}$$

where the last upper bound holds by (10). Now suppose that $K(S_{x_{k}})<k-c_{3}$. Then

$$\begin{array}{rcl} K(x_{k} \mid S_{x_{k}}, K(S_{x_{k}})) & < & \log|S_{x_{k}}| - n + k + c_{2} \\ & \le & \log|S_{x_{k}}| - \log n - 2\log\log n - \Lambda_{\Delta_{n}} - \delta(K_{n} + \Delta_{n} + \epsilon) \\ & \le & \frac{\log|S_{x_{k}}|}{1+\delta} - \log n - 2\log\log n - \Lambda_{\Delta_{n}} \\ & \le & \frac{\log|S_{x_{k}}|}{1+\delta} - \log n - 2\log\log n - \Lambda_{\Delta_{x_{k}}}. \end{array}$$

But the strict inequality is a contradiction to (12). Hence our assumption must be false and we instead have $K(S_{x_{k}})\ge k-c_{3}$. By $\mathcal{E}_{\delta}(x)=K(\mathbb{E}_{x})+\Delta_{x}$ and using both (11) and the bound $K_{n}\le n+2\log n+\gamma $, where γ is a global constant, we finally obtain

$$\begin{array}{rcl} \mathcal{E}_{\delta}(x_{k}) & = & K(\mathbb{E}_{x_{k}}) + \Delta_{x_{k}} \\ & \ge & K(S_{x_{k}}) - c_{1} + \Delta_{x_{k}} \\ & \ge & k - c_{3} - c_{1} + \Delta_{x_{k}} \\ & \ge & n - \delta\left(\frac{3}{2}n + 3\log n + \gamma + c + \epsilon\right) - \log n - 2\log\log n \\ & & {} - \frac{n}{2} - \log n - c - c_{2} - 1 - c_{3} - c_{1} \\ & = & (1-3\delta)\frac{n}{2} - (2+3\delta)\log n - 2\log\log n + C, \end{array}$$

where $C := -\delta(\gamma +\epsilon)-1-(1+\delta)c-c_{1}-c_{2}-c_{3}$.

Although, according to the above theorem, the existence of strings of length n with moderate coarse effective complexity is ensured for arbitrarily large n, the coarse effective complexity of sufficiently long prefixes of a typical stationary process realization becomes small. This is a direct implication of Theorem 1.

$$\begin{array}{c}\hfill {\mathcal{E}}_{\delta}\left({x}_{1}^{n}\right)\le \epsilon n+\log n+\mathcal{O}(\log\log n)\end{array}$$
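
To make the contrast between the two bounds concrete, the following Python sketch evaluates only their leading terms: the lower bound $(1-3\delta )\frac{n}{2}-(2+3\delta )\log n-2\log\log n$ for the specially constructed strings, and the upper bound $\epsilon n+\log n$ for typical prefixes. It is an illustration under stated assumptions, not part of the proofs: the constant $C$ and the $\mathcal{O}(\log\log n)$ term are dropped, logarithms are taken to base 2, and the function names and parameter values are hypothetical.

```python
import math


def special_string_lower_bound(n: int, delta: float) -> float:
    """Leading terms (1-3*delta)*n/2 - (2+3*delta)*log n - 2*log log n.

    Assumption: the additive constant C is dropped and logs are base 2.
    Requires n >= 2 so that log log n is defined.
    """
    log_n = math.log2(n)
    return (1 - 3 * delta) * n / 2 - (2 + 3 * delta) * log_n - 2 * math.log2(log_n)


def typical_prefix_upper_bound(n: int, eps: float) -> float:
    """Leading terms eps*n + log n; the O(log log n) term is dropped."""
    return eps * n + math.log2(n)


# Hypothetical parameter values, chosen only for illustration.
delta, eps = 0.1, 0.01
for exponent in (10, 15, 20):
    n = 2 ** exponent
    lower = special_string_lower_bound(n, delta)
    upper = typical_prefix_upper_bound(n, eps)
    # Per-symbol rates: the first tends to (1-3*delta)/2, the second to eps.
    print(f"n=2^{exponent}: lower/n={lower / n:.4f}, upper/n={upper / n:.4f}")
```

With $\delta <1/3$ the per-symbol rate of the lower bound approaches $(1-3\delta )/2$, while that of the upper bound approaches $\epsilon$, which can be chosen arbitrarily small.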

In this contribution, we studied the notion of plain effective complexity, which is assigned to a given string, within the context of an underlying stochastic process as a model of the string-generating mechanism. In [1] we have shown that strings which are called “non-stochastic” in the context of Kolmogorov minimal sufficient statistics have a large value of plain effective complexity. The existence of such strings has been proven by Gács, Tromp and Vitányi in [5]. Here, our aim was to understand how properties of the stochastic process such as ergodicity and stationarity influence the effective complexity of the corresponding typical realizations. Is it possible that the prefixes of a typical process realization form a sequence of finite strings of increasing length n that eventually have a high or moderate value of effective complexity? Our main theorem refers to stationary and in general non-computable processes. It proves that modelling the regularities of strings by computable ensembles, whose total information is allowed to exceed the string’s Kolmogorov complexity by a linearly growing amount $\epsilon n$ with an arbitrarily small $\epsilon>0$, is sufficient for typical realizations to be non-complex.

The value $\epsilon n$ plays the role of a parameter in the concept of effective complexity. In order to have a notion that is independent of this parameter we introduced coarse effective complexity. It corresponds to coarse sophistication introduced by Antunes and Fortnow in [10] and modifies effective complexity by incorporating the parameter as a further minimization argument. Our result on effective complexity has a direct implication for the asymptotic behaviour of the coarse effective complexity of typical realizations of a stationary process. The main statement in this context demonstrates the utility of the linear parameter scaling which we have considered. Moreover, it allows one to analyze the interplay between the complexity of a stochastic process and the complexity of its typical realizations. In particular, it demonstrates that, in order to have a notion of effective complexity that also reflects the complexity of a stochastic process, further modifications of plain effective complexity are necessary, for instance the introduction of appropriate constraints. This possibility is in line with Gell-Mann and Lloyd’s suggestion in [2] which we discussed in the Introduction.

Finally, we point out that, continuing our previous work [1], we have formulated our results for the concept of effective complexity only. However, in line with the general equivalence statements obtained in the literature, cf. Section V in [6] or Lemma 20 in [1], it should be possible to reformulate our main theorem in the more general context of algorithmic statistics. Indeed, our upper bound on the effective complexity of typical process realizations is derived in terms of computable ensembles that are uniform distributions on finite sets (universally typical subsets). This demonstrates the close relation, in particular, to the concept of Kolmogorov minimal sufficient statistics, which refers to the model class of finite sets.

The authors would like to thank their colleagues at the MPI MiS, in particular Eckehard Olbrich, Wolfgang Löhr and Nils Bertschinger for their interest and helpful discussions. This work has been supported by the Santa Fe Institute.

- Ay, N.; Müller, M.; Szkoła, A. Effective complexity and its relation to logical depth. IEEE Trans. Inform. Theory **2010**, 56, 4593–4607.
- Gell-Mann, M.; Lloyd, S. Effective complexity. In Nonextensive Entropy-Interdisciplinary Applications; Gell-Mann, M., Tsallis, C., Eds.; Oxford University Press: New York, NY, USA, 2004; pp. 387–398.
- Gell-Mann, M.; Lloyd, S. Information measures, effective complexity, and total information. Complexity **1996**, 2, 44–52.
- Rissanen, J. Stochastic complexity in statistical inquiry. In Series in Computer Science 15; World Scientific Publishing Co.: Singapore, 1988.
- Gács, P.; Tromp, J.T.; Vitányi, P.M. Algorithmic statistics. IEEE Trans. Inform. Theory **2001**, 47, 2443–2463.
- Vitányi, P.M. Meaningful information. IEEE Trans. Inform. Theory **2006**, 52, 4617–4626.
- Koppel, M. Structure. In The Universal Turing Machine: A Half-Century Survey; Herken, R., Ed.; Oxford University Press: Oxford, UK, 1988; pp. 235–252.
- Zurek, W.H. Algorithmic randomness and physical entropy. Phys. Rev. A **1989**, 40, 4731–4751.
- Vereshchagin, N.K.; Vitányi, P.M.B. Kolmogorov’s structure function and model selection. IEEE Trans. Inform. Theory **2004**, 50, 3265–3290.
- Antunes, L.; Fortnow, L. Sophistication revisited. Theory Comput. Syst. **2009**, 45, 150–161.
- Bennett, C. Logical depth and physical complexity. In The Universal Turing Machine: A Half-Century Survey; Herken, R., Ed.; Oxford University Press: Oxford, UK, 1988.
- Li, M.; Vitányi, P. An Introduction to Kolmogorov Complexity and Its Applications, 2nd ed.; Springer-Verlag: New York, NY, USA, 1997.
- Kieffer, J. A unified approach to weak universal source coding. IEEE Trans. Inform. Theory **1978**, 24, 674–682.
- Ziv, J. Coding of sources with unknown statistics-I: Probability of encoding error. IEEE Trans. Inform. Theory **1972**, 18, 384–389.
- Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: New York, NY, USA, 1991.
- Brudno, A.A. Entropy and the complexity of the trajectories of a dynamical system. Trans. Moscow Math. Soc. **1983**, 2, 127–151.

© 2011 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).