# Information Anatomy of Stochastic Equilibria


## Abstract


## 1. Introduction

- How random is it? The entropy rate h_μ, which is the entropy in the present observation conditioned on all past observations [2].
- How much memory is required to store the causal states? The statistical complexity C_μ, or the entropy of the causal states [3].
- How much of the future is predictable from the past? The excess entropy **E**, which is the mutual information between the past and the future [5].
- How much of the generated information (h_μ) is relevant to predicting the future? The bound information b_μ, which is the mutual information between the present and future observations conditioned on all past observations [6].
- How much of the generated information is useless (neither affects future behavior nor contains information about the past)? The ephemeral information r_μ, which is the entropy in the present observation conditioned on all past and future observations [6].
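These quantities can be computed exactly for simple finite-state processes. As a minimal numerical illustration—using a hypothetical two-state Markov chain, not an example from this paper—the following sketch computes h_μ, r_μ and b_μ directly from the chain's transition matrix:

```python
import numpy as np

def H(p):
    """Shannon entropy (bits) of a flattened probability array."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical two-state Markov chain; rows give Pr(next symbol | current symbol).
T = np.array([[0.9, 0.1],
              [0.4, 0.6]])
pi = np.array([0.8, 0.2])  # stationary distribution: pi @ T == pi

# Joint distribution over the triple (X_{-1}, X_0, X_1).
joint = np.einsum('a,ab,bc->abc', pi, T, T)

# Entropy rate: for an order-1 Markov chain, h_mu = H[X_0 | X_{-1}].
h_mu = H(joint.sum(axis=2)) - H(pi)
# Ephemeral information: r_mu = H[X_0 | X_{-1}, X_1].
r_mu = H(joint) - H(joint.sum(axis=1))
# Bound information: b_mu = h_mu - r_mu.
b_mu = h_mu - r_mu
print(h_mu, r_mu, b_mu)
```

Here the generated information h_μ splits exactly into a discarded part r_μ and a stored part b_μ, the decomposition described in the list above.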

## 2. Background

A process is the set of behaviors {…, x_{−2}, x_{−1}, x_0, x_1, …} and their probabilities, specified by the joint distribution Pr(…, X_{−2}, X_{−1}, X_0, X_1, …). We denote a contiguous chain of random variables as X_{0:L} = X_0 X_1 ⋯ X_{L−1}. We assume the process is ergodic and stationary (Pr(X_{0:L}) = Pr(X_{t:L+t}) for all t ∈ ℤ) and that the measurement symbols range over a finite alphabet: x ∈ 𝒜. In this setting, the present X_0 is the random variable measured at t = 0, the past is the chain X_{:0} = …X_{−2}X_{−1} leading up to the present, and the future is the chain X_{1:} = X_1 X_2 ⋯ following the present. (We suppress the infinite index in these.)

Shannon's information measures are defined for any finite block of random variables via the joint distribution Pr(X_{0:L}). Importantly, they define an algebra of information measures for a given set of random variables [27]. James et al. [6] used this to show that the past and future partition the single-measurement entropy H(X_0) into several measure-theoretic atoms. These include the ephemeral information:

$$r_\mu = H[X_0 \mid X_{:0}, X_{1:}],$$

the entropy in the present observation conditioned on all past and future observations. The entropy rate h_μ can also be written as a sum of atoms:

$$h_\mu = r_\mu + b_\mu.$$

That is, a portion of the information (h_μ) a process spontaneously generates is thrown away (r_μ) and a portion is actively stored (b_μ). Putting these observations together gives the information anatomy of a single measurement:

$$H[X_0] = q_\mu + 2b_\mu + r_\mu. \qquad (1)$$

For continuous-time processes observed at time resolution τ, the observation at time t_n = nτ is now labeled X_{nτ}, and the past X_{:0} is now denoted …X_{−2τ}X_{−τ} instead of …X_{−2}X_{−1}. Rather than entropy or mutual information per observed symbol, as in the discrete-time setting, we define an entropy or mutual information per elapsed time unit; that is, informational rates. A step in this direction is to normalize the information measures defined above by the observation interval:

$$H_0(\tau) = \frac{H[X_0]}{\tau}, \qquad h_\mu(\tau) = \frac{H[X_0 \mid X_{:0}]}{\tau}, \qquad b_\mu(\tau) = \frac{I[X_0; X_{\tau:} \mid X_{:0}]}{\tau}.$$

In particular, we divide the entropy H[X_0] of a single symbol by the time resolution τ to preserve the form of the information-theoretic relationship given in Equation (1). In doing so, we no longer interpret H_0(τ) as the entropy of a single measurement symbol, but rather as the number of bits per unit time required to encode the time series in a model-free manner. Similarly, we interpret h_μ(τ) as the minimal achievable coding rate, were we to build a maximally predictive model. In this time normalization, terms of order τ or higher are ignored. These definitions then lead to the τ-entropy rate familiar in the discrete-time, continuous-value setting [2,25,28].

If we calculate H_0(τ), h_μ(τ) and b_μ(τ), then we can find r_μ(τ), q_μ(τ) and **E**/τ via:

$$r_\mu(\tau) = h_\mu(\tau) - b_\mu(\tau), \qquad q_\mu(\tau) = H_0(\tau) - h_\mu(\tau) - b_\mu(\tau), \qquad \mathbf{E}/\tau = q_\mu(\tau) + \sigma_\mu(\tau),$$

where the elusive information rate σ_μ(τ) vanishes for the Markov processes considered below.

## 3. Information Anatomy of Stochastic Dynamical Systems

As Appendix B shows, for the stochastic dynamics considered here the single-measurement entropy H[X_0] is the process's statistical complexity C_μ [3,4]. The result is that the information anatomy analysis decomposes this causal-state information into:

- that useful for prediction or retrodiction beyond the information provided by the causal states at the previous time step—the bound information b_μ;
- that useful for both prediction and retrodiction—the co-information q_μ; and
- that useless for both prediction and retrodiction—the ephemeral information rate r_μ.

This is a different C_μ decomposition than considered in [30]. There, and more generally, C_μ = **E** + χ. That is, the state information consists of that shared with the future (**E**) and information not shared with the future, but that must be stored to implement optimal prediction—the crypticity χ [31]. Together with these observations, Equation (4) reminds us that χ = h_μ for Markov processes, as originally noted for finite-range one-dimensional spin systems [32].
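The identity χ = h_μ is easy to check numerically. A sketch for a hypothetical two-state Markov chain (an illustration, not an example from the paper), for which C_μ = H[X_0], **E** = I[X_0; X_1], and hence χ = C_μ − **E** = H[X_0|X_1] = h_μ by stationarity:

```python
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical two-state Markov chain with distinct rows, so causal states are the
# observed symbols and C_mu = H[X_0].
T = np.array([[0.9, 0.1],
              [0.4, 0.6]])
pi = np.array([0.8, 0.2])     # stationary: pi @ T == pi
joint = pi[:, None] * T       # Pr(X_0, X_1)

C_mu = H(pi)                                 # statistical complexity
E = H(pi) + H(joint.sum(axis=0)) - H(joint)  # excess entropy = I[X_0; X_1] (Markov)
h_mu = H(joint) - H(pi)                      # entropy rate H[X_1 | X_0]
chi = C_mu - E                               # crypticity
print(chi, h_mu)                             # equal for Markov processes
```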

#### 3.1. Nonlinear Langevin Dynamics

Consider the nonlinear Langevin dynamics:

$$\dot{x} = -D\nabla U(x) + \eta(t),$$

where x ∈ ℝⁿ, U(x) is an analytic potential function and η(t) is zero-mean white noise with diffusion matrix D: 〈η_i(t)〉 = 0 and 〈η_i(t)η_j(t′)〉 = D_{ij} δ(t − t′). The diffusion coefficients D_{ij} = D_{ji} are assumed to be independent of x and such that det D ≠ 0. The following (well known) stationary distribution is derived by converting the stochastic differential equation into its Fokker–Planck equation form:

$$\rho_{eq}(x) = \frac{1}{Z}\, e^{-2U(x)}, \qquad Z = \int e^{-2U(x)}\, dx.$$

We assume that this is the stationary probability distribution experienced by the particle and that it is normalizable: Z < ∞. (See Figure 2 for simulation results in one dimension.)
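The setup of Figure 2 can be reproduced with a few lines of simulation. A minimal Euler–Maruyama sketch (assuming, for illustration, U(x) = x²/2 and D = 1, so ρ_eq(x) ∝ e^{−x²} is a Gaussian with variance 1/2):

```python
import numpy as np

rng = np.random.default_rng(0)
D, dt, n_steps, burn = 1.0, 0.01, 200_000, 1_000

# Euler–Maruyama for xdot = -D U'(x) + eta(t), with U(x) = x^2/2 and
# <eta(t) eta(t')> = D delta(t - t').
x = 0.0
samples = np.empty(n_steps)
noise = np.sqrt(D * dt) * rng.standard_normal(n_steps)
for i in range(n_steps):
    x += -D * x * dt + noise[i]
    samples[i] = x

# Stationary density rho_eq(x) ∝ exp(-2 U(x)) = exp(-x^2): variance 1/2.
print(samples[burn:].var())  # ≈ 0.5
```

A histogram of `samples` reproduces Figure 2b's normalized position density.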

The conditional entropies H[X_τ|X_0] and H[X_τ|X_{−τ}] in Equations (5) and (6) can be calculated, simplifying if the conditional probabilities Pr(X_τ|X_0) and Pr(X_τ|X_{−τ}) are Gaussians, using the differential entropy of an n-dimensional Gaussian with covariance Σ:

$$H = \tfrac{1}{2} \log\left((2\pi e)^n \,\vert \det \Sigma \vert\right).$$

For the Markov processes here, these two entropies determine the anatomy: h_μ(τ) = H[X_τ|X_0]/τ and b_μ(τ) = (H[X_τ|X_{−τ}] − H[X_τ|X_0])/τ.
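For intuition about the small-τ limit, consider the one-dimensional Ornstein–Uhlenbeck case ẋ = −x + η(t) (an illustration with an assumed unit relaxation rate), where Pr(X_τ|X_0) is exactly Gaussian with variance (D/2)(1 − e^{−2τ}). Its entropy approaches the model-free value ½ log(2πeDτ) with an O(τ) correction, which the following sketch verifies:

```python
import numpy as np

D = 1.0
for tau in [1e-2, 1e-3, 1e-4]:
    var = 0.5 * D * (1.0 - np.exp(-2.0 * tau))      # exact OU conditional variance
    H_exact = 0.5 * np.log(2 * np.pi * np.e * var)  # Gaussian differential entropy (nats)
    H_small_tau = 0.5 * np.log(2 * np.pi * np.e * D * tau) - 0.5 * tau
    print(tau, H_exact - H_small_tau)               # residual shrinks like tau**2
```

The divergent ½ log τ piece is the "ultraviolet catastrophe" embraced in the time normalization above.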

Appendix C shows that Pr(X_τ|X_0) and Pr(X_τ|X_{−τ}) are Gaussian to o(τ) over a region of ℝⁿ with measure arbitrarily close to one. The entropies of these Gaussians are calculable to leading and subleading order in τ using a linearized version of the nonlinear Langevin equation about the initial position x′:

$$\dot{\delta x} = -A^\top \delta x + \eta(t), \qquad \delta x = x - x',$$

where A_{ij} = ∂(D∇U)_j/∂x_i. (This is similar, but not identical, to the approximation used in [12]. Appendix E comments on the differences.) From Appendix C, we then obtain the information rates collected in Table 1.

Consider how the anatomy changes as the stochasticity of the system increases: the stationary distribution ρ_eq(x) flattens out, leading to an unbounded increase in H_0. This is counteracted by an unbounded increase in the entropy rate.

Suppose that e^{−2U(x)} decays sufficiently quickly with ‖x‖. Then, integration by parts applied to Equation (16) gives the system-dependent part of the bound information rate:

$$b_\mu(\tau) - \tau^{-1}\, n \log \sqrt{2} = -\int (\nabla U(x))^\top D\, \nabla U(x)\, \rho_{eq}(x)\, dx + O(\tau).$$

If U(x) = x^⊤ v for some vector v (so that ∇U = v), this term reduces to −v^⊤ D v. Hence, b_μ(τ) is maximized when the potential well is as flat as possible, while maintaining Z < ∞.

#### 3.2. Linear Langevin Equation with Noninvertible Diffusion

Now suppose the state space splits into a stochastic subsystem x_n and a deterministic subsystem x_d, with n = dim(x_n) and m = dim(x_d), where x_d evolves deterministically and x_n stochastically under coupled linear dynamics. The noise obeys 〈η(t)η(t′)^⊤〉 = Dδ(t − t′), where D is invertible. Taken together, though, this is a linear Langevin equation for x = (x_n, x_d) with a noninvertible diffusion matrix. Naively assuming that the deterministic subsystem evolves with a small amount of noise, Equation (16) would apply and give, to O(τ), a bound information whose log 2/τ divergence carries a pre-factor of (n + m)/2.

Appendix D calculates the anatomy directly, without assuming that x_d evolves with an infinitesimal amount of noise. The bound information in Equation (20) differs from that found from naive application of Equation (16), because the pre-factor for the log 2/τ divergence is (n + m)/2 + m rather than (n + m)/2. That is, the difference counts the dimension m of the deterministically evolving state space x_d. Thus, the deterministic subsystem allows for the active storage of more of the spontaneously generated stochasticity.

## 4. Examples

#### 4.1. Stochastic Gradient Descent in One Dimension

Consider stochastic gradient descent in the one-dimensional cusp-catastrophe potential U(x) = x⁴/4 − r x²/2 − h x. The bound information b_μ is sensitive to the average curvature of the potential or, equivalently, to the average squared drift normalized by the diffusion constant.

The global minimum x^*(r, h) is not everywhere differentiable in r and h, and this appears also in b_μ(τ, r, h). See Figure 3. The contour of nondifferentiability is h = 0 for r > 0. Along the contour, the potential is symmetric, there are suddenly two global minima of U(x) with ${x}_{1}^{*}=-{x}_{2}^{*}$, and so, the sign of x^* changes discontinuously across h = 0.

Surprisingly, b_μ(τ) is maximized at a nonzero noise level D > 0. This seems counterintuitive: adding noise only serves to decrease the process's predictability. However, adding noise in the present affects the future in a way that cannot be predicted from the past. Since b_μ(τ) measures the amount of information shared between the present and future not shared with the past, there is a level of stochasticity that maximizes b_μ(τ) for some values of r and h. This is shown in Figure 3d.

#### 4.2. Particles Diffusing in a Heat Bath

Suppose N particles with positions x_1, …, x_N and masses m_1, …, m_N diffuse according to the potential function U(x_1, …, x_N) in a heat bath of temperature T. Let **x** denote the vector of concatenated particle positions. When the inertial terms m_i d²x_i/dt² are negligible, an overdamped Langevin equation can be used to approximate the particles' trajectories. Call k_i the effective "spring constant" and ω_i the effective "oscillation frequency" for the i-th particle.

Then, H_0 is proportional to the Boltzmann entropy by a factor of k_B/τ. The entropy rate h_μ(τ) and ephemeral information r_μ(τ) increase logarithmically with the mean squared velocity ⟨v²⟩ = k_B T/m. The bound information b_μ(τ) increases when there is a larger γ; that is, it increases when there is stronger coupling between the particles and the heat bath or when there is a smaller average oscillation frequency ${\sum}_{i=1}^{N}{\omega}_{i}^{2}$. Since γ ≥ 0 and ${\omega}_{i}^{2}\ge 0$, the bound information is bounded above by ${b}_{\mu}(\tau )\le {\tau}^{-1}N\,\text{log}\sqrt{2}+O(\tau )$. To achieve this upper bound, the potential U(**x**) must be "flattened out" to decrease the k_i, as described in Section 3.

## 5. Conclusions

The bound information b_μ is nondifferentiable on the line h = 0 for r ≥ 0, because the location of the global minimum of the potential function changes discontinuously across that contour. Moreover, this is not related to the bifurcation contour $h=\pm 2{r}^{3/2}/3\sqrt{3}$ [33], where the number of equilibria changes from two to one or vice versa, which has no apparent signature in the bound information. However, in these calculations, we did not avoid the "ultraviolet catastrophe". We embraced it, since we could then evaluate the information anatomy for general nonlinear Langevin equations by linearizing. If one evaluates the information anatomies of these types of stochastic dynamics when the time discretization is not infinitesimal, however, then signatures of bifurcations should show up in the bound information, as they do for the finite-time predictable information or excess entropy [16,34].

In thermodynamics, attention usually focuses on H_0, the entropy of a single measurement symbol, since its changes are proportional to heat loss [35]. However, the point of the heat-bath example is that alternative information-theoretic quantities capture other behavioral properties of particles diffusing in a heat bath. As an application of this analysis, it will be worth exploring how the information anatomy measures reflect the trade-off between stable information storage and heat loss in the context of Maxwell-like demons [36].

## Acknowledgments

## Appendix

## A. Information Anatomy of a Markov Process

Here, X_{t:t′} is the random variable chain of measurements X_t, X_{t+1}, …, X_{t′−1}. For a Markov process, the immediately preceding observation "shields" the future from the past:

$$\Pr(X_{1:} \mid X_{:1}) = \Pr(X_{1:} \mid X_0).$$

Equivalently, the elusive information vanishes: σ_μ = I[X_{:0}; X_{1:} | X_0] = 0. The remaining anatomy atoms then follow from b_μ and h_μ via identities given in Section 2:

$$r_\mu = h_\mu - b_\mu, \qquad q_\mu = H[X_0] - h_\mu - b_\mu, \qquad \mathbf{E} = q_\mu + \sigma_\mu = q_\mu.$$

## B. Statistical Complexity is the Entropy of a Measurement

The statistical complexity C_μ is the entropy of the probability distribution over causal states. Causal states themselves are groupings of pasts that are partitioned according to the predictive equivalence relation ~_ε [4]:

$$x_{:0} \sim_\varepsilon x'_{:0} \iff \Pr(X_{0:} \mid X_{:0} = x_{:0}) = \Pr(X_{0:} \mid X_{:0} = x'_{:0}).$$

Suppose a process is order-1 Markov and no two symbols lead to the same conditional distribution over futures, so that the causal-state map is ε(x_{:0}) = x_{−1}. In that case, the causal state space is isomorphic to the alphabet of the process and the statistical complexity is the entropy of a single measurement: C_μ = H[X_0].

The stochastic differential equations studied in the main text are of this type: the causal state is the most recent observation X_0, and the statistical complexity is C_μ = H[X_0]. Implicit in these calculations is an assumption that the transition probabilities Pr(X_0|X_{−τ}) for a given stochastic differential equation exist and are unique, which is satisfied, since the drift term is analytic [38].

For an Ornstein–Uhlenbeck process ẋ = Bx + η(t), the transition probability Pr(X_t|X_0 = x) is a Gaussian:

$$\Pr(X_t \mid X_0 = x) = \mathcal{N}\!\left(e^{Bt}x,\ \Sigma_t\right), \qquad \Sigma_t = \int_0^t e^{Bs} D\, e^{B^\top s}\, ds.$$

If Pr(X_t|X_0 = x) = Pr(X_t|X_0 = x′), the means and variances of the above probability distribution must match, meaning that e^{Bt}x = e^{Bt}x′ ⇒ x = x′. Therefore, for an Ornstein–Uhlenbeck process, the causal states are indeed isomorphic to the present measurement and the statistical complexity is H[X_0]. The key here is that although Pr(X_t|X_0 = x) may quickly forget its initial condition x, for any finite-time discretization, the transition probability Pr(X_t|X_0 = x) still depends on x.

Now consider a general nonlinear drift, assuming that ρ_eq exists and is normalizable. Our goal is to show that if Pr(X_t|X_0 = x) = Pr(X_t|X_0 = x′), then x = x′. The transition probability Pr(X_t = x|X_0 = x′) is a solution to the corresponding Fokker–Planck equation:

$$\partial_t p = \nabla \cdot \left( D \nabla U(x)\, p \right) + \tfrac{1}{2}\, \nabla \cdot \left( D \nabla p \right),$$

with initial condition p(x, 0) = δ(x − x′). By uniqueness of solutions to this equation, Pr(X_t|X_0 = x′) = Pr(X_t|X_0 = x″) ⇒ x′ = x″. This implies that the causal states are again isomorphic to the present measurement and the statistical complexity is C_μ = H[X_0].

In short, for the processes considered here, the entropy of a single measurement H[X_0] equals the statistical complexity C_μ. Recall that the latter is the entropy of the probability distribution over the causal states which, in turn, are groupings of pasts that lead to equivalent predictions of future behavior. Therefore, for the stochastic differential equations considered here, causal states simply track the last measured position.

## C. Approximating the Short-Time Propagator Entropy

Consider the nonlinear Langevin equation:

$$\dot{x} = -D\nabla U(x) + \eta(t), \qquad \langle \eta(t)\eta^\top(t') \rangle = D\delta(t - t'), \qquad (A1)$$

where det D ≠ 0. Let p(x|x′) be the transition probability Pr(X_t = x|X_0 = x′) for the system in Equation (A1). From arguments in [38], it exists and is uniquely defined. Let q(x|x′) be a Gaussian with the same mean and variance as p(x|x′).

We first want to show that D_KL[p||q] is at least of o(τ). Then, we also want to show that H[q] can be determined to o(τ) from the linearized Langevin equation; that is, H[q] = H[q^{linearized}] to o(τ), where q^{linearized} is the transition probability that results when we locally linearize the drift.

The strategy is a moment expansion. Suppose we show that the moments of p agree with those of q to O(τ^{3/2}), which implies that p(z|x′) = q(z|x′) + τ^{3/2} δq, where δq is at most of O(1) in τ. From that, it would follow that D_KL[q + τ^{3/2}δq||q] = O((τ^{3/2})²) · ℐ[q], where ℐ[q] is the Fisher information of a Gaussian (and hence bounded), and that H[p] = H[q] to O(τ³). That same moment expansion will show that the covariance and mean of p differ from the covariance and mean of q^{linearized} by a correction term of at most O(τ²). From this, it follows that H[q] is H[q^{linearized}] to o(τ). The bottleneck in this approximation scheme is not approximating the transition probability as a Gaussian, but rather approximating the covariance of that Gaussian by the covariance of the locally linearized stochastic differential equation.

We start in one dimension, with drift μ(x) and constant diffusion coefficient D, and track the central moments 〈(x − 〈x〉)^n〉 for n ≥ 2. In the pure-diffusion base case (μ = 0), 〈(x − 〈x〉)^n〉 ∝ (Dt)^{n/2}. Inspired by this base case, we consider the moments of the rescaled variable $z=(x-\langle x\rangle )/\sqrt{Dt}$ and expand 〈z^n〉 in terms of t, since we are interested in the small-t limit:

$$\langle z^n \rangle = C_n + \alpha_n \sqrt{t} + \beta_n t + O(t^{3/2}).$$

None of C_n, α_n, or β_n have information about δ, which encapsulates higher-order drift nonlinearities; the O(t^{3/2}) term is the first to carry information about δ. Hence, the moments of p deviate from Gaussian moments at O(t^{3/2}), at most. Equations (A8)–(A10) can be solved with initial conditions matching the delta-function initial distribution; some algebra shows that C_n = α_n = β_n = 0 for n odd, and α_n = 0 for n even, as well.

For the matched Gaussian q, likewise C_n = α_n = β_n = 0 for n odd, α_n = 0 for n even, and ${\langle {z}^{n}\rangle}_{q}={C}_{n}{(1+{\beta}_{2}t)}^{n/2}={C}_{n}+{\scriptstyle \frac{n}{2}}{C}_{n}{\mu}^{\prime}({x}^{\prime})t+O({t}^{2})$. Thus, the moments z^n of p(z|x′) are consistent with the moments of q(z|x′) to O(t^{3/2}). Additionally, as described earlier, those moments are consistent with the moments of the linearized Langevin equation to o(t). From prior logic, H[p] can be approximated to o(t) by ${\scriptstyle \frac{1}{2}}\text{log}(2\pi e\mid Dt+{\mu}^{\prime}({x}^{\prime})D{t}^{2}\mid )$.

The multidimensional case x ∈ ℝⁿ proceeds similarly. Taylor expanding the drift μ(x) about x′, with A_{ij}(x′) = ∂μ_j/∂x_i, the quadratic-remainder coefficients (indexed by ijk) are at most of O(1) in ‖x − x′‖. The evolution equation for the means follows, and, similarly, we can compute the evolution of the product moments 〈x_{σ(1)} ⋯ x_{σ(m)}〉, where Cov(x_{σ(k)}: k ≠ i, j) denotes the covariance of the variables x_{σ(k)} for all k in the integer list 1, …, m with the restriction that we ignore k = i and k = j. We have a base case: when f = 0, A = 0 and D_{ij} = Dδ_{i,j}, the Green's function is a Gaussian with standard deviation $\propto \sqrt{t}$. Therefore, again, we switch to the variable $z=(x-\langle x\rangle )/\sqrt{t}$ and calculate its covariance evolution, similarly to Equation (A7), where we employ Equation (A11) to find the appropriate t scaling of the nonlinear f term.

The initial condition gives 〈z⁰〉 = 1. This implies that β_{σ(1),…,σ(m)} = 0 for all lists {σ(i): i = 1, …, m}. Since all moments are determined to at least O(t) by just the linearized version of the nonlinear Langevin equation and since linear Langevin equations have Gaussian Green's functions, it follows that the Green's function for the nonlinear Langevin equation is Gaussian to O(t). Some algebra shows that the variance of the linearized Langevin equation's Green's function is:

$$\Sigma(t) = Dt + \tfrac{1}{2}\left(A^\top D + D A\right)t^2 + O(t^3),$$

which reduces to Dt + μ′(x′)Dt² in one dimension.

Throughout, we have assumed that transition probabilities are uniquely determined by their moments and that Z = ∫ e^{−U(x)} dx is normalizable. This suggests that we can approximate the potential function arbitrarily well by a potential function whose support is bounded, such that transition probabilities are uniquely determined by moments. Consider, for instance, a sequence of potentials U_L(x) that agree with U(x) for ‖x‖ ≤ L and confine the particle to that ball. If lim_{L→∞} H^{(L)}[X_{t+τ}|X_t] = H[X_{t+τ}|X_t] to o(τ), then we can claim that the formulae in the main text apply, even when the support of the transition probability distribution function is unbounded.

We stipulate that ∫_{ℝⁿ} e^{−U(x)} dx < ∞, so that there is a normalizable equilibrium probability distribution. We also stipulate that ${\scriptstyle \frac{1}{Z}}{\int}_{{\mathbb{R}}^{n}}\text{tr}(A(x)){e}^{-U(x)}dx<\infty $, so that the bound information rate would be finite. Hence, both lim_{L→∞} ∫_{‖x‖≤L} e^{−U(x)} dx = Z < ∞ and lim_{L→∞} ∫_{‖x‖≤L} e^{−U(x)} tr(A(x)) dx = ∫_{ℝⁿ} e^{−U(x)} tr(A(x)) dx < ∞. Since both of these converge to finite, nonzero values, the ratio of the limit is the limit of the ratios, and we have:

$$\lim_{L\to\infty} \frac{\int_{\Vert x\Vert \le L} \mathrm{tr}(A(x))\, e^{-U(x)}\, dx}{\int_{\Vert x\Vert \le L} e^{-U(x)}\, dx} = \frac{1}{Z} \int_{\mathbb{R}^n} \mathrm{tr}(A(x))\, e^{-U(x)}\, dx.$$

## D. Linear Langevin Dynamics with Noninvertible Diffusion Matrix

If ẋ = Ax + η(t) with 〈η(t)η(t′)^⊤〉 = Dδ(t − t′), then we can solve it in terms of η(t) as follows:

$$x(t) = e^{At} x(0) + \int_0^t e^{A(t-s)}\, \eta(s)\, ds.$$

This allows computing H[X_t|X_0] from the covariance of the Gaussian transition probability,

$$\Sigma(t) = \int_0^t e^{As} D\, e^{A^\top s}\, ds,$$

expanded order by order: its O(τ²), O(τ³) and O(τ⁴) terms each have explicit block-matrix forms. For this calculation, we care only about the upper left-hand entry, and so, every other matrix entry can be ignored. Substituting Equations (A24)–(A27) into Equation (A23) yields the conditional entropy expansion. When det D_{nn} ≠ 0, D_{nn} is invertible; since D_{nn} is also symmetric, the expression can be rewritten, and applying the trace identities—tr(X) = tr(X^⊤) and tr(X + Y) = tr(X) + tr(Y)—reveals the result quoted in Section 3.2.
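The short-time behavior of this covariance can be checked directly. A numerical sketch (with a hypothetical 2×2 drift matrix A and a noninvertible D driving only the first coordinate) compares Cov(t) = ∫₀ᵗ e^{As} D e^{A⊤s} ds against its short-time series; the deterministic coordinate acquires variance only at higher order in t:

```python
import numpy as np

def expm(M, terms=30):
    """Matrix exponential via truncated Taylor series (fine for small ||M||)."""
    out, term = np.eye(len(M)), np.eye(len(M))
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

A = np.array([[0.0, 1.0],
              [-1.0, -0.5]])        # hypothetical drift matrix
D = np.array([[1.0, 0.0],
              [0.0, 0.0]])          # noninvertible diffusion: only x_n is driven

t, n = 0.01, 1000
s = (np.arange(n) + 0.5) * (t / n)  # midpoint quadrature nodes
cov = sum(expm(A * si) @ D @ expm(A.T * si) for si in s) * (t / n)

series = D * t + (A @ D + D @ A.T) * t**2 / 2
print(np.abs(cov - series).max())   # residual is O(t^3)
```

Note that `cov[1, 1]`, the variance of the deterministically driven coordinate, is O(t³): this is the qualitative change responsible for the modified divergence pre-factor in Section 3.2.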

## E. Time-Local Predictive Information

An adaptive agent might extract the excess entropy **E** from the past, but choose to ignore the present H[X_0]. Along these lines, the time-local predictive information (TiPi) was recently proposed as a quantity that agents maximize in order to access different behavioral modes when adapting to their environment [12].

(A note on terminology: TiPi is closer in spirit to the bound information b_μ. Recall that both b_μ and the excess entropy **E** capture the amount of information in the future that is predictable [5,6] and not that which is predictive. The latter is the amount of information that must be stored to optimally predict, and this is given by the statistical complexity C_μ. Therefore, when we use the abbreviation TiPi, we mean the time-local predictable information: information the agent immediately sees as advantageous.)

In the setting of [12], the agent knows its state x_{t−T} then. However, from that time forward, the agent, making no further observations, is ignorant. The stochastic dynamics then models the evolution of that ignorance from the given state to a distribution of states at t − 1 and then at t, taking into account only the model φ the agent has learned or is given. They report that TiPi is the difference between state information and noise entropy, an expression that depends on x_{t−T} and x_{t−1}.

To compare, consider a family of models φ_θ parametrized by θ, driven by white noise with 〈η(t)η(t′)^⊤〉 = Dδ(t − t′). Following the argument used in Section 3: maximizing the TiPi I^N[X_t; X_{t−τ}] leads the agent to alter the landscape, so that it is driven into unstable regions. Maximizing the averaged TiPi ${I}_{1}^{N}[{X}_{t};{X}_{t-\tau}]$ leads to a flattening of the potential landscape. Additionally, the effect of maximizing ${I}_{2}^{N}[{X}_{t};{X}_{t-\tau}]$ is not yet clear.

Maximizing I^N[X_t; X_{t−τ}] has the same effect on the potential landscape as maximizing the TiPi in [12] when T = 2. Though the model there is set up for a discrete-time analysis, it is natural to suppose that adaptive agents in an environment move according to a continuous-time dynamic, but receive sensory signals in a discrete-time manner.

Why not simply express x_{t+Δt} in terms of x_{t−Δt} and noise terms and use that expression to evaluate b_μ? This is related to the approach taken in [12]. However, the answer obtained using the moment series expansions is a factor of two different than what would have been obtained with such a discretization scheme. Additionally, by keeping track of the order of the approximation errors in Appendix C, we found that these formulae for both bound information and TiPi would only hold for invertible diffusion matrices. As suggested by Appendix D, our estimates for such conditional mutual information change qualitatively when the diffusion matrix is not invertible. That, in turn, may be relevant to environments that are hidden Markov, settings for which the agent's sensorium does not directly report the environmental states.

## Author Contributions

## Conflicts of Interest

## References

- Walters, P. An Introduction to Ergodic Theory; Graduate Texts in Mathematics, Volume 79; Springer-Verlag: New York, NY, USA, 1982. [Google Scholar]
- Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed; Wiley-Interscience: New York, NY, USA, 2006. [Google Scholar]
- Crutchfield, J.P.; Young, K. Inferring Statistical Complexity. Phys. Rev. Lett
**1989**, 63, 105–108. [Google Scholar] - Shalizi, C.R.; Crutchfield, J.P. Computational Mechanics: Pattern and Prediction, Structure and Simplicity. J. Stat. Phys
**2001**, 104, 817–879. [Google Scholar] - Crutchfield, J.P.; Feldman, D.P. Regularities Unseen, Randomness Observed: Levels of Entropy Convergence. Chaos
**2003**, 13, 25–54. [Google Scholar] - James, R.G.; Ellison, C.J.; Crutchfield, J.P. Anatomy of a Bit: Information in a Time Series Observation. Chaos
**2011**, 21, 037109. [Google Scholar] - Palmer, S.E.; Marre, O.; Berry, M.J., II; Bialek, W. Predictive Information in a Sensory Population
**2013**. arXiv:1307.0225. - Beer, R.D.; Williams, P.L. Information Processing and Dynamics in Minimally Cognitive Agents. Cogn. Sci
**2014**, in press. [Google Scholar] - Tononi, G.; Edelman, G.M.; Sporns, O. Complexity and Coherency: Integrating Information in the Brain. Trends Cogn. Sci
**1998**, 2, 474–484. [Google Scholar] - Strelioff, C.C.; Crutchfield, J.P. Bayesian Structural Inference for Hidden Processes. Phys. Rev. E
**2014**, 89, 042119. [Google Scholar] - Sato, Y.; Akiyama, E.; Crutchfield, J.P. Stability and Diversity in Collective Adaptation. Physica D
**2005**, 210, 21–57. [Google Scholar] - Martius, G.; Der, R.; Ay, N. Information driven self-organization of complex robotics behaviors. PLoS One
**2013**, 8, e63400. [Google Scholar] - Varn, D.P.; Canright, G.S.; Crutchfield, J.P. Discovering Planar Disorder in Close-Packed Structures from X-Ray Diffraction: Beyond the Fault Model. Phys. Rev. B
**2002**, 66, 174110–174113. [Google Scholar] - Varn, D.P.; Canright, G.S.; Crutchfield, J.P. ε-Machine spectral reconstruction theory: A direct method for inferring planar disorder and structure from X-ray diffraction studies. Acta. Cryst. Sec. A
**2013**, 69, 197–206. [Google Scholar] - Crutchfield, J.P.; Young, K. Computation at the Onset of Chaos. In Entropy, Complexity, and the Physics of Information; Zurek, W., Ed.; Volume VIII, SFI Studies in the Sciences of Complexity; Addison-Wesley: Reading, MA, USA, 1990; pp. 223–269. [Google Scholar]
- Tchernookov, M.; Nemenman, I. Predictive Information in a Nonequilibrium Critical Model. J. Stat. Phys
**2013**, 153, 442–459. [Google Scholar] - Atmanspracher, H.A.; Scheingraber, H. Information Dynamics; Plenum: New York, NY, USA, 1991; pp. 45–60. [Google Scholar]
- James, R.G.; Burke, K.; Crutchfield, J.P. Chaos Forgets and Remembers: Measuring Information Creation and Storage. Phys. Lett. A
**2014**. [Google Scholar] - Lizier, J.; Prokopenko, M.; Zomaya, A. Information modification and particle collisions in distributed computation. Chaos
**2010**, 20, 037109. [Google Scholar] - Flecker, B.; Alford, W.; Beggs, J.M.; Williams, P.L.; Beer, R.D. Partial Information Decomposition as a Spatiotemporal Filter. Chaos
**2011**, 21, 037104. [Google Scholar] - Moss, F.; McClintock, P.V.E. Noise in Nonlinear Dynamical Systems; Cambridge University Press: Cambridge, UK, 1989; Volume 1. [Google Scholar]
- Shraiman, B.; Wayne, C.E.; Martin, P.C. Scaling Theory for Noisy Period-Doubling Transitions to Chaos. Phys. Rev. Lett
**1981**, 46, 935. [Google Scholar] - Crutchfield, J.P.; Nauenberg, M.; Rudnick, J. Scaling for External Noise at the Onset of Chaos. Phys. Rev. Lett
**1981**, 46, 933. [Google Scholar] - Girardin, V. On the Different Extensions of the Ergodic Theorem of Information Theory. In Recent Advances in Applied Probability Theory; Baeza-Yates, R., Glaz, J., Gzyl, H., Husler, J., Palacios, J.L., Eds.; Springer: New York, NY, USA, 2005; pp. 163–179. [Google Scholar]
- Gaspard, P.; Wang, X.J. Noise Chaos (ε, τ)-Entropy Per Unit Time. Phys. Rep
**1993**, 235, 291–343. [Google Scholar] - Oksendal, B. Stochastic Differential Equations: An Introduction with Applications, 6th ed; Springer: New York, NY, USA, 2013. [Google Scholar]
- Yeung, R.W. Information Theory and Network Coding; Springer: New York, NY, USA, 2008. [Google Scholar]
- Gaspard, P. Brownian Motion, Dynamical Randomness, and Irreversibility. New J. Phys
**2005**, 7, 77–90. [Google Scholar] - Lecomte, V.; Appert-Rolland, C.; van Wijland, F. Thermodynamic Formalism for Systems with Markov Dynamics. J. Stat. Phys
**2007**, 127, 51–106. [Google Scholar] - Ellison, C.J.; Mahoney, J.R.; Crutchfield, J.P. Prediction, Retrodiction, and the Amount of Information Stored in the Present. J. Stat. Phys
**2009**, 136, 1005–1034. [Google Scholar] - Crutchfield, J.P.; Ellison, C.J.; Mahoney, J.R. Time’s Barbed Arrow: Irreversibility, Crypticity, and Stored Information. Phys. Rev. Lett
**2009**, 103, 094101. [Google Scholar] - Crutchfield, J.P.; Feldman, D.P. Statistical Complexity of Simple One-Dimensional Spin Systems. Phys. Rev. E
**1997**, 55, R1239–R1243. [Google Scholar] - Poston, T.; Stewart, I. Catastrophe Theory and Its Applications; Pitman: London, UK, 1978. [Google Scholar]
- Feldman, D.P.; Crutchfield, J.P. Structural Information in Two-Dimensional Patterns: Entropy Convergence and Excess Entropy. Phys. Rev. E
**2003**, 67, 051103. [Google Scholar] - Kittel, C.; Kroemer, H. Thermal Physics, 2nd ed; W. H. Freeman: New York, NY, USA, 1980. [Google Scholar]
- Landauer, R. Dissipation and Noise Immunity in Computation, Measurement, and Communication. J. Stat. Phys
**1989**, 54, 1509–1517. [Google Scholar] - Lohr, W. Properties of the Statistical Complexity Functional and Partially Deterministic HMMs. Entropy
**2009**, 11, 385–401. [Google Scholar] - Risken, H. The Fokker-Planck Equation: Methods of Solution and Applications, 2nd ed; Springer: Berlin, Germany, 1996. [Google Scholar]
- Drozdov, A.N.; Morillo, M. Expansion for the Moments of a Nonlinear Stochastic Model. Phys. Rev. Lett
**1996**, 77, 3280. [Google Scholar] - Crutchfield, J.P.; Ellison, C.J.; Mahoney, J.R.; James, R.G. Synchronization and Control in Intrinsic and Designed Computation: An Information-Theoretic Analysis of Competing Models of Stochastic Computation. Chaos
**2010**, 20, 037105. [Google Scholar]

**Figure 1.** Information anatomy of a stationary continuous-time process graphically depicted using information diagrams. Although the past entropy H[X_{:0}] and the future entropy H[X_{τ:}] typically are infinite, space limitations constrain us to draw them with finite areas. (**a**) Information diagram for the anatomy of a process's single observation X_0 in the context of its past X_{:0} and its future X_{τ:} (after [6], with permission). (**b**) Information diagram for the anatomy of a Markov process, in which the present X_0 causally shields the past from the future. The elusive information σ_μ(τ) vanishes.

**Figure 2.** (**a**) Particle diffusing according to ẋ = −x + η(t) with diffusion coefficient D = 1: a finite-time trajectory x(t) followed by the diffusing particle. (**b**) Over infinite time, the particle experiences positions distributed according to the probability density function ρ_{eq}(x) in Equation (7), calculated as a normalized histogram of particle positions. (**c**) If the previous particle position is known, a future position can be determined with less uncertainty than if no previous particle position is known. The probability Pr(x, t|0, 0) of being in position x at a time t differs from the equilibrium probability distribution ρ_{eq}(x) if we know the position of the particle at a previous time; e.g., x(0) = 0.

**Figure 3.** Information anatomy of the stochastic cusp catastrophe. (**a**) Shifting from double-well to single-well potentials as r and h are varied. Example potentials U(x) for various r and h: blue/dark line, r = 2 and h = −1; purple/medium line, r = 2 and h = 0; and yellow/light line, r = 2 and h = 1. (**b**) Contour plot of the system-dependent part of the bound information rate, ${\text{lim}}_{D\to 0}\,{b}_{\mu}(\tau )-{\tau}^{-1}\,\text{log}\sqrt{2}$, as a function of r and h, highlighting the global minimum x^* changing discontinuously as h moves through zero: b_μ(τ) is nondifferentiable with respect to h along h = 0 when r ≥ 0. (**c**) Bound information b_μ(τ) as it varies over the cusp catastrophe equilibria surface: height gives the fixed points as a function of r and h; color hue is proportional to the deterministic limit ${\text{lim}}_{D\to 0}\,{b}_{\mu}(\tau )-{\tau}^{-1}\,\text{log}\sqrt{2}$ at each r and h. (**d**) The bound information rate is maximized at nonzero stochasticity D for double-well potentials and asymmetric single-well potentials: the D maximizing ${b}_{\mu}(\tau )-{\tau}^{-1}\,\text{log}\sqrt{2}$ as a function of r and h; the surface is colored by ${b}_{\mu}(\tau )-{\tau}^{-1}\,\text{log}\sqrt{2}$ at that value of D.

**Table 1.** Information anatomy of first-order, n-dimensional nonlinear Langevin dynamics ẋ = −D∇U(x) + η(t), where U(x) is analytic in x and η(t) is zero-mean white noise with invertible diffusion matrix D, 〈η(t)η(t′)^⊤〉 = Dδ(t − t′). The stationary distribution ρ_{eq}(x) ∝ exp(−2U(x)) is assumed normalizable.

| Information Rate | Definition | O(τ⁻¹ log τ) term | O(τ⁻¹) term | O(1) term |
|---|---|---|---|---|
| Stored H₀ = C_μ(τ) | ${\scriptstyle \frac{H[{X}_{0}]}{\tau}}$ | 0 | $-\int {\rho}_{eq}(x)\,\text{log}\,{\rho}_{eq}(x)\,dx$ | 0 |
| τ-Entropy h_μ(τ) | ${\scriptstyle \frac{H[{X}_{0}\mid {X}_{:0}]}{\tau}}$ | ${\scriptstyle \frac{n}{2}}$ | $\text{log}\sqrt{2\pi e\mid \text{det}\,D\mid}+n\,\text{log}\sqrt{2}$ | $-{\scriptstyle \frac{1}{2}}\int \nabla \cdot (D\nabla U(x))\,{\rho}_{eq}(x)\,dx$ |
| Bound b_μ(τ) | ${\scriptstyle \frac{I[{X}_{0};{X}_{\tau :}\mid {X}_{:0}]}{\tau}}$ | 0 | $n\,\text{log}\sqrt{2}$ | $-{\scriptstyle \frac{1}{2}}\int \nabla \cdot (D\nabla U(x))\,{\rho}_{eq}(x)\,dx$ |
| Ephemeral r_μ(τ) | ${\scriptstyle \frac{H[{X}_{0}\mid {X}_{:0},{X}_{\tau :}]}{\tau}}$ | ${\scriptstyle \frac{n}{2}}$ | $\text{log}\sqrt{2\pi e\mid \text{det}\,D\mid}$ | 0 |
| Enigmatic q_μ(τ) | ${\scriptstyle \frac{I[{X}_{:0};{X}_{0};{X}_{\tau :}]}{\tau}}$ | $-{\scriptstyle \frac{n}{2}}$ | $-\int {\rho}_{eq}(x)\,\text{log}\,{\rho}_{eq}(x)\,dx-n\,\text{log}\,2-\text{log}\sqrt{2\pi e\mid \text{det}\,D\mid}$ | $\int \nabla \cdot (D\nabla U(x))\,{\rho}_{eq}(x)\,dx$ |
| Elusive σ_μ(τ) | ${\scriptstyle \frac{I[{X}_{:0};{X}_{\tau :}\mid {X}_{0}]}{\tau}}$ | 0 | 0 | 0 |

© 2014 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

**MDPI and ACS Style**

Marzen, S.; Crutchfield, J.P.
Information Anatomy of Stochastic Equilibria. *Entropy* **2014**, *16*, 4713-4748.
https://doi.org/10.3390/e16094713
