1. Introduction
Multicomponent systems are typically much more structurally complex than the collection of their parts, even infinitely more so. This should be contrasted with statistical mixtures—such as arise in the Gibbs Paradox of thermodynamics (Sections 2–3 in [1])—where gases of distinct molecular species exhibit only a modest entropy increase upon mixing, due to the uncertainty about which species one has in hand. This contrast demonstrates how the ansatz of statistical mixtures misses key aspects of hierarchical organization. The result, as we show, is an awareness of a new kind of structural complexity of composite systems.
The development here focuses on the theoretical core of this basic phenomenon, arguing that it is, in fact, quite commonplace. To appreciate this, it will be helpful to address the motivating issues upfront.
The multicomponent systems of interest are found in several different domains, including the entropy of mixing in thermodynamics [2,3], the change point problem in statistics [4], the attractor-basin portrait of a dynamical system [5], Smale’s basic sets [6,7], spatially extended systems with multiple local attractors [8], chaotic crystallography [9,10], evolutionary dynamics [11], and adaptive and learning systems with memory. More recently, nonergodicity has been broadly implicated in, for example, computation theory [12], learning theory [13], and the complex behaviors of social and economic systems [14,15].
We introduce the concept of hidden multistationary processes to capture what is common across these domains—a system comprising multiple locally competing behaviors and structures. The basic idea can be appreciated within an experimental paradigm: multistationarity models repeated experimental trials in which different initial conditions lead to statistically distinct behaviors. When we wish to emphasize their structure, we refer to a multicomponent process; when emphasizing the statistical consequences, we refer to a multistationary process.
In short, one goal is to provide a tractable model that quantitatively captures what is common among these domains while providing an architectural, high-level view of the state-space organization of behaviors. In particular, we would like to analyze how unpredictable and how structurally complex hidden multistationary processes are, given their components, whose unpredictability and complexity we know. Another goal is that the approach be constructive, allowing one to quantitatively determine essential properties and to pinpoint precisely what gives rise to the emergent global complexity.
The development proceeds as follows: It first reviews statistical mixtures, briefly recalling stochastic processes, information theory, structural complexity, and mixed-state processes. It then introduces the theory and construction of hidden multistationary processes. This includes a canonical minimal representation of hidden multistationary processes and a method to analyze their ergodic decompositions that determines how the latter affect information measures.
The sections following this explore a number of examples, going from the simplest cases and familiar structured stationary component processes to the Mother of All Processes that subsumes them all. Taken together, these illustrate a new kind of structural hierarchy and make plain how infinite complexity naturally emerges. The development concludes by drawing out parallels with related results and consequences in nonequilibrium thermodynamics and machine learning.
2. Background
To get started, we give a minimal summary of the required background—a summary that assumes familiarity with computational mechanics [16,17] and with information theory for complex systems [18,19].
2.1. Processes
A process, denoted $\mathcal{P}$, is specified by the joint distribution over its chain of random variables $\ldots X_{-2} X_{-1} X_0 X_1 X_2 \ldots$. We view $\mathcal{P}$ as a communication channel with a fixed input distribution $\Pr(\overleftarrow{X})$: It transmits information from the past $\overleftarrow{X} = \ldots X_{-2} X_{-1}$ to the future $\overrightarrow{X} = X_0 X_1 \ldots$ by storing it in the present. $X_t$ denotes the discrete random variable at time $t$ taking on values $x$ from a discrete alphabet $\mathcal{A}$. And $X_{t:t+\ell} = X_t X_{t+1} \cdots X_{t+\ell-1}$ is the block of $\ell$ random variables starting at time $t$. A particular realization is denoted using lowercase: $x_{t:t+\ell}$. Often, we simply refer to a particular sequence $w = x_{0:\ell}$, $\ell > 0$, as a word. If we have a symbol $x$ and a word $w$, we form a new word by concatenation, e.g., $wx$ or $xw$.
2.2. Information
Given a process, we form the block distributions $\Pr(X_{t:t+\ell})$ by marginalizing the given joint distribution. (We ignore here the measure-theoretic construction of cylinder sets and their measures; for background, see Ref. [20] and references therein.) A stationary process is one for which $\Pr(X_{t:t+\ell}) = \Pr(X_{0:\ell})$ for all $t$ and $\ell$. For a stationary process, we drop the time index and thereby have the family of word distributions $\{\Pr(X_{0:\ell}) : \ell > 0\}$ that completely characterizes the process.
The amount of Shannon information in words is measured by the block entropy, $H(\ell) = H[X_{0:\ell}]$, where $H[Y] = -\sum_{y} \Pr(y) \log_2 \Pr(y)$ is the Shannon entropy of the random variable $Y$. A process’ information production is given by its entropy rate, $h_\mu = \lim_{\ell \to \infty} H(\ell)/\ell$, where $\mu$ refers to the measure over infinite sequences and so, too, to the word probabilities $\Pr(w)$. It is often used to measure a process’ degree of unpredictability.
At a minimum, a good predictor—denote this model’s state random variables $\mathcal{R}$—must capture all of a process’ excess entropy [19]—all of the information shared between past and future: $\mathbf{E} = I[\overleftarrow{X}; \overrightarrow{X}]$. Here, $I[Y; Z]$ is the mutual information between variables $Y$ and $Z$. That is, for a good predictor $\mathcal{R}$: $I[\mathcal{R}; \overrightarrow{X}] \geq \mathbf{E}$.
These quantities are closely related. In particular, for finitary processes, those with $\mathbf{E} < \infty$, the block entropy has the linear asymptotic behavior, $H(\ell) \sim \mathbf{E} + h_\mu \ell$ as $\ell \to \infty$. More precisely, $\mathbf{E} = \lim_{\ell \to \infty} \left[ H(\ell) - h_\mu \ell \right]$. This shows that $\mathbf{E}$ controls the convergence of the entropy rate estimates $h_\mu(\ell) = H(\ell) - H(\ell - 1)$. In fact, for time-series processes, $\mathbf{E}$ can also be defined in terms of entropy convergence, $\mathbf{E} = \sum_{\ell=1}^{\infty} \left[ h_\mu(\ell) - h_\mu \right]$. An analogous quantity that controls the block entropy convergence to the linear asymptote is the transient information, $\mathbf{T} = \sum_{\ell=0}^{\infty} \left[ \mathbf{E} + h_\mu \ell - H(\ell) \right]$. $\mathbf{T}$ measures the average amount of information an observer must extract in order to know a process’ internal state (for a review of these and related informations, see Ref. [19]).
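These convergence quantities can be estimated numerically. The following is a minimal sketch that computes the block entropy $H(\ell)$, the entropy-rate estimate $h_\mu(\ell) = H(\ell) - H(\ell-1)$, and the resulting excess-entropy estimate for a simple two-state process (the Golden Mean Process analyzed in Section 7.1.3); the state labels and transition-table layout are our own choices, not from the text.

```python
from itertools import product
from math import log2

# Golden Mean Process as a 2-state unifilar HMM (labels A, B are ours):
T = {0: {('A', 'B'): 0.5},                      # emit 0: A -> B w.p. 1/2
     1: {('A', 'A'): 0.5, ('B', 'A'): 1.0}}    # emit 1: A -> A, B -> A
pi = {'A': 2/3, 'B': 1/3}                       # stationary state distribution

def word_prob(w):
    """Pr(w): propagate state mass through the labeled transitions."""
    mass = dict(pi)
    for x in w:
        nxt = {}
        for (s, t), p in T[x].items():
            if s in mass:
                nxt[t] = nxt.get(t, 0.0) + mass[s] * p
        mass = nxt
    return sum(mass.values())

def block_entropy(ell):
    """H(ell) = -sum_w Pr(w) log2 Pr(w) over length-ell words."""
    return -sum(p * log2(p)
                for w in product([0, 1], repeat=ell)
                if (p := word_prob(w)) > 0)

H = [block_entropy(ell) for ell in range(1, 11)]
h_est = H[-1] - H[-2]        # h_mu(ell) = H(ell) - H(ell-1)
E_est = H[-1] - 10 * h_est   # E = lim [H(ell) - h_mu * ell]
```

Since this process is order-1 Markov, the estimates converge after one step: $h_\mu = 2/3$ bits per symbol and $\mathbf{E} = H(1) - h_\mu \approx 0.2516$ bits.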
2.3. Structure
We refer to a model of a process—a particular choice of state random variables $\mathcal{R}$—as a presentation. Note that building a model of a process is more demanding than developing a prediction scheme, since one wishes to go beyond sequence statistics to express a process’ mechanisms and internal organization.
To do this, we first recall that a process’ communication channel is determined by the conditional distributions $\Pr(\overrightarrow{X} \mid \overleftarrow{x})$. Based on this, computational mechanics introduced an equivalence relation $\sim_\epsilon$ that groups all of a process’ histories that give rise to the same prediction. This results in constructing a map $\epsilon(\cdot)$ from all pasts (finite and infinite) to causal states defined by $\epsilon(\overleftarrow{x}) = \{ \overleftarrow{x}\,' : \Pr(\overrightarrow{X} \mid \overleftarrow{x}\,') = \Pr(\overrightarrow{X} \mid \overleftarrow{x}) \}$. In other words, a process’ finite-history causal states are equivalence classes—$\mathcal{S}_\ell = \overleftarrow{X}_\ell / \sim_\epsilon$—that partition the space of pasts into sets which are predictively equivalent. The causal states, then, are the collection across all past lengths: $\mathcal{S} = \bigcup_\ell \mathcal{S}_\ell$. These consist of recurrent and transient causal states that are visited with positive or vanishing probability, respectively.
With the causal states in hand, one determines the causal-state to causal-state transitions, $T^{(x)}_{\sigma \sigma'} = \Pr(X_t = x, \mathcal{S}_{t+1} = \sigma' \mid \mathcal{S}_t = \sigma)$. The resulting model $M$, consisting of the causal states and transitions, is called the process’ ϵ-machine [21]. Informally, a process is ergodic if its statistics can be estimated from a single realization that is sufficiently long. If $\mathcal{P}$ is ergodic, then $M$’s recurrent causal states are strongly connected and their asymptotic invariant distribution $\pi$ is unique and given by $\pi = \pi T$, where $T = \sum_{x \in \mathcal{A}} T^{(x)}$.
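As a concrete sketch, the invariant distribution satisfying $\pi = \pi T$ can be found by iterating the summed transition matrix of a small ϵ-machine; here we use the Golden Mean Process, with a state ordering [A, B] that is our own convention.

```python
import numpy as np

# Symbol-labeled transition matrices for the Golden Mean epsilon-machine:
T0 = np.array([[0.0, 0.5],     # emit 0: A -> B
               [0.0, 0.0]])
T1 = np.array([[0.5, 0.0],     # emit 1: A -> A
               [1.0, 0.0]])    #         B -> A
T = T0 + T1                    # row-stochastic state-to-state matrix

pi = np.ones(2) / 2            # any initial guess on the simplex
for _ in range(200):           # power iteration: pi <- pi T
    pi = pi @ T
pi /= pi.sum()                 # guard against round-off drift
```

For this machine the iteration converges to $\pi = (2/3, 1/3)$.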
As described, an ϵ-machine is obtained from a process, but one can also simply define an ϵ-machine and consider its generated process. We will use both notions in the following, as they are equivalent [22]. But why should one use the ϵ-machine presentation of a process in the first place?
The main takeaway from computational mechanics is that out of all optimally predictive models $\mathcal{R}$ resulting from a partition of the past—those such that $I[\mathcal{R}; \overrightarrow{X}] = \mathbf{E}$—the ϵ-machine captures the amount of information that a process stores—the statistical complexity $C_\mu = H[\mathcal{S}]$. The excess entropy $\mathbf{E}$—the information explicitly observed in sequences—is only a lower bound on the information that a process stores [21]: $\mathbf{E} \leq C_\mu$. The difference $\chi = C_\mu - \mathbf{E}$, called the crypticity, measures how the process hides its internal state information from an observer [23].
A process’ ϵ-machine is its minimal unifilar presentation. It is unique for the process. Moreover, it allows a number of the process’ complexity measures to be directly and efficiently calculated [24]. The latter include the process’ entropy rate, excess entropy, statistical complexity, and crypticity. In short, a process’ ϵ-machine captures all of its informational and structural properties.
3. Mixed State Operator
Given an ϵ-machine $M$, its recurrent causal states can be treated as a standard basis $\{\delta_\sigma : \sigma \in \mathcal{S}\}$ in a vector space. Then, any distribution $\mu$ over the states is a linear combination: $\mu = \sum_{\sigma \in \mathcal{S}} \Pr(\sigma)\, \delta_\sigma$. Following Ref. [17], these distributions are called mixed states. For a $k$-state ϵ-machine, the mixed-state space is a $(k-1)$-dimensional simplex $\Delta^{k-1}$, as the distributions $\mu$ are normalized.
Consider a special subset of mixed states. Define $\mu(w)$ as the distribution over $M$’s states induced after observing sequence $w$, $M$ having started with state distribution $\mu_0$:
$\mu(w) = \dfrac{\mu_0 T^{(w)}}{\mu_0 T^{(w)} \mathbf{1}}$,
where $\mathbf{1}$ is a column vector of 1s and $T^{(w)} = T^{(x_0)} T^{(x_1)} \cdots T^{(x_{\ell-1})}$ for $w = x_0 x_1 \cdots x_{\ell-1}$. Here, the notation $X \sim P$ serves to indicate that random variable $X$ is governed by distribution $P$.
The last line gives the mixed state directly in terms of the initial state distribution and $M$’s transition matrices. One interpretation is that $\mu(w)$ represents an observer’s best guess as to the process’ causal-state distribution given that it saw word $w$ and knows both the process’ ϵ-machine and the initial distribution $\mu_0$. Occasionally in the following, it will be noted that $\mu_0$ refers not to the initial distribution but to another.
To determine the set of mixed states allowed by a process, we simply calculate the set of distinct $\mu(w)$ for all words $w \in \mathcal{A}^*$. This is most directly achieved by enumerating $w$ in lexicographic order, e.g., for a binary alphabet, successively choosing $w = \lambda, 0, 1, 00, 01, \ldots$. Here, $\lambda$ is the null word. As we will see, the mixed-state set can be finite or infinite.
If we consider the entire set of mixed states, then we construct a presentation of the process by specifying the transition matrices between them: on symbol $x$, mixed state $\mu(w)$ leads to $\mu(wx)$ with probability $\Pr(x \mid w) = \mu(w)\, T^{(x)} \mathbf{1}$. Note that many words can induce the same mixed state.
It is useful to define a corresponding mixed-state operator that acts on a machine $M$, returning its mixed-state presentation under initial distribution $\mu_0$. The examples to follow shortly illustrate how mixed states and mixed-state presentations are calculated.
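As an illustrative sketch, the update $\mu(w) = \mu_0 T^{(w)} / (\mu_0 T^{(w)} \mathbf{1})$ can be implemented directly; we use the Golden Mean Process ϵ-machine, whose matrix entries and state order [A, B] are our assumptions.

```python
import numpy as np

# Mixed-state update for the Golden Mean epsilon-machine.
T = {'0': np.array([[0.0, 0.5],
                    [0.0, 0.0]]),
     '1': np.array([[0.5, 0.0],
                    [1.0, 0.0]])}

def mixed_state(word, mu0):
    """mu(w) = mu0 T^(w) / (mu0 T^(w) 1); None for disallowed words."""
    v = np.asarray(mu0, dtype=float)
    for x in word:
        v = v @ T[x]
    norm = v.sum()             # this is Pr(w) under start distribution mu0
    return v / norm if norm > 0 else None

mu0 = np.array([2/3, 1/3])     # stationary start distribution
```

For this machine a single symbol synchronizes the observer: seeing 0 yields the point distribution on B, seeing 1 the point distribution on A, and the word 00 has zero probability.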
4. Constructing Hidden Multistationary Processes
Recall that a hidden multistationary nonergodic process is one that evolves, across successive realizations, to statistically distinct long-term behaviors. We now introduce our model of this by giving a construction procedure. This, in effect, defines what we mean by multistationary. We then develop several basic properties and analyze in detail a series of example constructions to illustrate them and their ergodic decompositions.
The main tool used to construct a hidden multistationary process is the mixed-state operator. We show that this results in a canonical presentation of a given set of stationary components. This is the multistationary process’ ϵ-machine.
Definition 1. A hidden multistationary process (HMSP) is defined by the presentation determined via the following procedure:
Identify an indexed family of distinct component stationary ergodic processes $\{\mathcal{P}_i : i \in I\}$, where $I$ is a finite or countable index set. Each $\mathcal{P}_i$ is specified by its ϵ-machine presentation $M_i$. The ϵ-machines consist only of their recurrent states that, due to ergodicity, form a single, strongly connected set.
Specify the component mixture distribution π—the probability $\pi_i$ with which each $\mathcal{P}_i$ will be visited (sampled).
Finally, calculate the mixed-state presentation $M$ of the multistationary process, where we take the nonoverlapping set of the measure semi-groups [25] specified by the component ϵ-machines. In this way, $M$’s states and transitions are determined from the component ϵ-machines and the mixture distribution π.
The HMSP $M$, the result of the construction, determines the transient portion of a nonergodic ϵ-machine. $M$’s recurrent components are essentially the same as those (the $M_i$’s) of the original component stationary processes $\mathcal{P}_i$. That is to say, what is new in $M$ is the set of transient causal states.
Note that this construction is a stochastic analog of building recognizers for multiregular formal languages [26].
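The construction can be sketched programmatically: stack the component ϵ-machines' symbol-labeled matrices block-diagonally, start from the mixture of their stationary distributions, and enumerate the distinct reachable mixed states. The function names, tolerances, and the uniform-vector power iteration are our own choices, not from the text.

```python
import numpy as np
from collections import deque

def stationary(component):
    """Invariant distribution of one component's state-to-state matrix.

    Power iteration from the uniform vector; adequate for these sketches."""
    T = sum(component.values())
    pi = np.ones(T.shape[0]) / T.shape[0]
    for _ in range(500):
        pi = pi @ T
    return pi / pi.sum()

def build_hmsp(components, mixture, max_states=1000, tol=1e-9):
    """components: list of {symbol: matrix}; mixture: component weights."""
    symbols = sorted(set().union(*(c.keys() for c in components)))
    dims = [next(iter(c.values())).shape[0] for c in components]
    n = sum(dims)
    T = {x: np.zeros((n, n)) for x in symbols}
    off = 0
    for c, d in zip(components, dims):          # block-diagonal stacking
        for x, m in c.items():
            T[x][off:off + d, off:off + d] = m
        off += d
    mu0 = np.concatenate(
        [w * stationary(c) for w, c in zip(mixture, components)])
    states, queue = [mu0], deque([mu0])
    while queue and len(states) < max_states:   # breadth-first enumeration
        mu = queue.popleft()
        for x in symbols:
            v = mu @ T[x]
            if v.sum() > tol:                   # word has positive probability
                v = v / v.sum()
                if not any(np.allclose(v, s) for s in states):
                    states.append(v)
                    queue.append(v)
    return T, states
```

The `max_states` cap matters because, as later examples show, the mixed-state set can be countably infinite; in that case the enumeration must be truncated.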
5. The Multistationary ϵ-Machine
With the background and definitions set, we are ready to explore the properties of multistationary nonergodic processes. We first establish the structural properties of their ϵ-machine presentations and then their informational properties via ergodic decompositions of various complexity measures.
Each component $M_i$, considered as generating its own process $\mathcal{P}_i$, has a stationary distribution $\pi^{(i)}$ over its states, $\pi^{(i)} = \pi^{(i)} T_i$. We will also write this as a vector over the multistationary process’ recurrent states, when we have a finite number of components, $\widetilde{\pi}^{(i)} = (0, \ldots, 0, \pi^{(i)}, 0, \ldots, 0)$, where component $i$’s block is the only nonzero one and the blocks are ordered by $i \in I$. The stationary state distribution for the multistationary process generated by $M$ is, then, $\pi_M = \sum_{i \in I} \pi_i\, \widetilde{\pi}^{(i)}$.
Consider the following properties of a multistationary process as just defined. The proofs of these results closely follow those of Theorem 1, Lemma 7, and Theorem 2 of Section 4 in Ref. [27]. One simply groups together the states and transitions of the nonergodic mixture component machines and applies Ref. [27]’s methods to these new (larger) sets of the composite machine. Reference [27]’s proof steps then apply directly to obtain the results claimed. Indeed, many of the results for ϵ-machines in Ref. [27] carry over to the ϵ-machines for multistationary processes.
To help motivate the construction and rationale of multistationary ϵ-machines, the following describes their properties with short outlines of the arguments. The claims themselves, though, are stated here only as conjectures, leaving to a sequel the formal development and proofs. With this noted, for each of the example cases in Section 7 below, the properties can be verified by hand.
Lemma 1 (Stationarity).
The state distribution is stationary.
Proof. This follows from realizing that the recurrent portion of M’s transition matrix is block diagonal. That is, asymptotically the components are independent, and, by assumption, the component distributions are invariant. □
Hypothesis 1 (Unifilarity).
The hidden multistationary process machine M is unifilar.
Remark 1. This would follow by adapting Lemma 5 (ϵ-Machines Are Deterministic) of Ref. [27] to the composite machine’s mixed-state presentation.
Hypothesis 2 (Minimality).
The hidden multistationary process machine M is minimal.
Remark 2. This would follow by adapting Theorem 2 (Causal States Are Minimal) of Ref. [27] to the composite machine’s mixed-state presentation. Recall that the latter is determined from each component’s ϵ-machine, which is minimal.
Hypothesis 3 (Uniqueness).
The hidden multistationary process machine M is unique.
Remark 3. This would follow by adapting Theorem 3 (Causal States Are Unique) of Ref. [27] to the composite machine’s mixed-state presentation.
As noted, the relevant definitions and proofs of these closely follow those given for ϵ-machines generally, as in Ref. [27], and will be the subject of a sequel.
Hypothesis 4. The mixed state operator applied to a mixture of (finite, ergodic) ϵ-machines produces an ϵ-machine. That is, the machine $M$ for the hidden multistationary process so generated is an ϵ-machine.
Remark 4. This would follow from the preceding claims.
Remark: Constructing HMSPs in this way, one could start with other classes of presentation for the ergodic component processes, such as nonunifilar presentations—i.e., generic HMMs. However, the resulting $M$ need not be an ϵ-machine. And, as a consequence, one could not directly calculate from such an $M$ the various complexity measures nor, lacking minimality, draw structural conclusions about its architecture. This is one reason why we choose to specify the component processes using ϵ-machine presentations. Limiting the current construction to ergodic components specified by finite-state ϵ-machine presentations serves to simplify the discussion and highlight our main results.
However, lifting these various restrictions or generalizing the previous properties to address them would be a fruitful effort, giving a much broader characterization of the complexity of multistationary processes.
So, from here on out we assume the ergodic components are given by ϵ-machines and ask what properties hold for the multistationary processes so constructed. We build processes consisting of either a finite or a countably infinite number of components.
6. Ergodic Decompositions
Since we are given the component processes $\mathcal{P}_i$, what can we say about the resulting multistationary process generated by $M$? A first step develops various kinds of ergodic decomposition that attempt to predict $M$’s properties in terms of its ergodic components’ properties. The basic question has a very long history in ergodic and information theories. The reader is referred to the review given in Ref. [28]. Our approach here is, on the one hand, to briefly give a flavor of several ergodic decompositions and, on the other, to compensate for that lack of rigor by analyzing in detail a number of concrete examples.
The word distribution for $\mathcal{P}$ is given by the mixture of the component word distributions. That is, for word $w$, $\Pr(w) = \sum_{i \in I} \pi_i \Pr_i(w)$, where $\Pr_i(w)$ denotes the probability that $\mathcal{P}_i$ generates $w$.
Quantitatively, the HMSP’s block entropy is upper bounded in terms of the component block entropies,
$H(\ell) \leq \sum_{i \in I} \pi_i H_i(\ell) + H[\pi]$,
a bound that follows from Jensen’s inequality [18]; here $H_i(\ell)$ is component $\mathcal{P}_i$’s block entropy.
A more insightful route to this upper bound is developed by first imagining that the sequences generated by the ergodic components do not overlap—for example, the $\mathcal{P}_i$s have disjoint alphabets $\mathcal{A}_i$. Then we define an indicator function $f$ of the process and an associated random variable $Y$: $Y = f(w) = i$, if $w \in \mathcal{A}_i^\ell$. We have
$H(\ell) = H[X_{0:\ell}, Y] = H[X_{0:\ell} \mid Y] + H[Y] = \sum_{i \in I} \pi_i H_i(\ell) + H[\pi]$,
since $Y$ is determined by the observed word. In the general setting, however, the sequences generated by distinct components can overlap. This reduces the number of distinct positive-probability words and so, too, the block entropy. In this way, we see that the above equality is only an upper bound on the HMSP’s block entropy:
$H(\ell) \leq \sum_{i \in I} \pi_i H_i(\ell) + H[\pi]$.
This bound highlights the contribution of the mixture entropy $H[\pi]$. We return to critique this notion of ergodic decomposition later on. For now, we draw out several useful consequences of this line of reasoning, relying on the bound Equation (18). Elsewhere we explore tighter informational bounds on decomposition.
From this, we see that an HMSP’s entropy rate is simply determined by those of its ergodic components. Assuming the mixture entropy $H[\pi]$ is finite, we have
$h_\mu = \lim_{\ell \to \infty} H(\ell)/\ell = \sum_{i \in I} \pi_i h_\mu^{(i)}$,
where $h_\mu^{(i)}$ is the component entropy rate. Reference [28] originally established this decomposition.
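The rate decomposition can be checked numerically. The sketch below (our own example, anticipating the two-biased-coins mixture of Section 7.2.2) mixes $B(p)$ and $B(1-p)$ with fair weights; both components have rate $H(p)$, so the block-entropy differences should approach $H(p)$ from above.

```python
from itertools import product
from math import log2

p = 0.7    # coin biases p and 1-p; the value is our choice

def word_prob(w):
    """Pr(w) for the fair mixture of B(p) and B(1-p)."""
    ones = sum(w)
    zeros = len(w) - ones
    return 0.5 * p**ones * (1 - p)**zeros + 0.5 * (1 - p)**ones * p**zeros

def block_entropy(ell):
    return -sum(q * log2(q)
                for w in product([0, 1], repeat=ell)
                if (q := word_prob(w)) > 0)

Hp = -(p * log2(p) + (1 - p) * log2(1 - p))    # component rate H(p)
h_est = block_entropy(12) - block_entropy(11)  # h_mu(12), converging to Hp
```

The excess of $h_\mu(\ell)$ over $H(p)$ is exactly the observer's per-symbol information gain about which coin is running; it decays to zero as the components are distinguished.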
What is less intuitive, though, are various complexity measures as they apply to HMSPs. As we will see, unlike the entropy rate, which component processes are selected and how they relate to one another play key roles. We first consider the ergodic decomposition for excess entropy, then for the transient information, and finally that for the statistical complexity.
The excess entropy also has an ergodic decomposition. In this case, we have
$\mathbf{E} = \sum_{i \in I} \pi_i \mathbf{E}_i + H[\pi]$,
where $\mathbf{E}_i$ is the excess entropy for ergodic component $i$. The excess entropy decomposition was explored in Refs. [29,30].
Combining the entropy rate and excess entropy ergodic decompositions, we see that the block-entropy linear asymptotes—$\mathbf{E} + h_\mu \ell$—have their own decomposition,
$\mathbf{E} + h_\mu \ell = \sum_{i \in I} \pi_i \left[ \mathbf{E}_i + h_\mu^{(i)} \ell \right] + H[\pi]$.
It is a simple additional step to develop the ergodic decomposition for the transient information,
$\mathbf{T} = \sum_{i \in I} \pi_i \mathbf{T}_i$.
Curiously, like the entropy rate decomposition, the mixture entropy $H[\pi]$ does not play a role.
The statistical complexity also has an ergodic decomposition,
$C_\mu = \sum_{i \in I} \pi_i C_\mu^{(i)} + H[\pi]$,
where $C_\mu^{(i)}$ are the statistical complexities of the ergodic components. The decomposition for statistical complexity was first noted in Ref. [31]. Note that this decomposition does not rely on assuming an equality as in Equation (18).
Finally, the multistationary crypticity $\chi$, which measures how a process hides state information from an observer, is also unaffected by the mixture distribution,
$\chi = \sum_{i \in I} \pi_i \chi_i$,
where $\chi_i$ is the crypticity of component $\mathcal{P}_i$. In this, it is similar to the entropy rate and transient information decompositions.
7. Structural Decompositions—Beyond Statistical
To emphasize, what’s notable in these kinds of informational decomposition is that, for nonergodic ϵ-machines, we have, for example,
$C_\mu > \sum_{i \in I} \pi_i C_\mu^{(i)}$.
That is, the global structural complexity $C_\mu$ of a multistationary process is strictly greater than that contained in its components $C_\mu^{(i)}$. In short, a multistationary process is at least the sum of its parts. Indeed, the above inequality leaves out the entropy of mixing. But this is too facile. As we will see, multistationary processes are much, much more.
We will see below, taking a more structural perspective going beyond the ergodic decompositions, that the transient causal state structure is key to a process’ global organization and what sequences of observations reveal. This leads us to call into question the interpretation and use of the preceding kinds of ergodic decomposition.
We now show that the construction procedure can be used to answer a number of different questions about multistationary processes. Several questions are illustrated via particular examples; others via general constructions. The series of examples is developed incrementally to highlight the methods and particular results, as much in isolation as possible.
We first start with processes built from finite-state ergodic components that lead to a multistationary process that is itself finite-state. Then we analyze the case in which finite components lead to a multistationary process with an infinite number of states. We end with examples built from an infinite number of finitary ergodic processes. In each case, we explore the structure of the resulting multistationary process, its complexity measures, and its ergodic decomposition.
7.1. Finite Hidden Multistationary Processes
7.1.1. A Base Case
A simple but illustrative case is that of two period-1 component processes: all Heads and all Tails, selected with fair probability: $\pi = (1/2, 1/2)$.
The components observed separately have $h_\mu = \mathbf{E} = C_\mu = 0$. But together, $C_\mu = H[\pi] = 1$ bit, since that is the informational uncertainty we have about which component the process is in. Naturally, once in a component there is no uncertainty about the symbols emitted. In this way, we see that the HMSP information of the mixture is all mixing entropy $H[\pi]$.
The composite ϵ-machine consists of three causal states: a single transient start state that immediately transitions (with fair probability) to either the All-Heads recurrent component (single recurrent state) or to the All-Tails recurrent component (also a single recurrent state).
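A quick sketch of this base case (the Heads/Tails string labels are ours): the mixture supports only two words at each length, so every block entropy equals the 1 bit of mixing entropy, giving $h_\mu = 0$ and $\mathbf{E} = C_\mu = H[\pi] = 1$ bit.

```python
from math import log2

# All-Heads and All-Tails mixed with pi = (1/2, 1/2): at each length,
# exactly two words occur, each with probability 1/2.
def block_entropy(ell):
    word_probs = {'H' * ell: 0.5, 'T' * ell: 0.5}
    return -sum(q * log2(q) for q in word_probs.values())

H_vals = [block_entropy(ell) for ell in range(1, 9)]
```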
7.1.2. Period-1 and Period-2 Processes
Define the Period-$n$ Process as the periodic process that endlessly repeats a word of length $n$. Let us construct the simplest multistationary process consisting of the following two components:
Period-1 process $P_1$, which has complexity measures $h_\mu = 0$ bits per symbol, $\mathbf{E} = 0$ bits, $C_\mu = 0$ bits, $\mathbf{T} = 0$ bit-symbols, and $\chi = 0$ bits.
Period-2 process $P_2$, which has complexity measures $h_\mu = 0$ bits per symbol, $\mathbf{E} = 1$ bit, $C_\mu = 1$ bit, $\mathbf{T} = 1$ bit-symbol, and $\chi = 0$ bits.
The Period-1 component has a single recurrent state $A$ and the Period-2 has two recurrent states; label them $B$ and $C$. The second step is to specify the mixture distribution, and we take this to be uniform: $\pi = (1/2, 1/2)$. That is, $\pi_1 = 1/2$ and $\pi_2 = 1/2$. And the final step is to use the mixed-state operator to construct $M$. The resulting multistationary ϵ-machine is shown in Figure 1c.
The recurrent states of the component ϵ-machines show up as $M$’s recurrent states, as claimed. The two recurrent state sets are not connected. What is new are the two transient states (solid circles). As a generator of the multistationary process, $M$ begins in its start state (solid circle with circumscribing circle) and then follows transitions according to the edge probabilities, emitting the corresponding symbols, eventually reaching one or the other of the two recurrent-state sets—$\{A\}$ or $\{B, C\}$.
We can understand $M$’s structure by calculating its mixed states $\mu(w)$, $w \in \{0, 1\}^*$, using Equation (9). In this, on the one hand, $\mu(\lambda) = (1/2, 1/4, 1/4)$ is the start state of the mixed state presentation and its distribution gives the asymptotic invariant distribution over the component recurrent states $A$, $B$, and $C$—the state probabilities before any symbols have been generated.
On the other hand, if a 0 is generated, then we immediately know the process is in component $P_2$, since $P_1$ cannot produce a 0, and, in particular, it is in a specific state, $B$. This is reflected in the transient mixed state $\mu(0) = (0, 1, 0)$. In fact, any time a valid 0 is generated, we know $M$ is in state $B$. This is also seen in the mixed state $\mu(10)$, in which the last symbol generated is a 0 and we again obtain a $\delta$-function distribution concentrated on state $B$.
Now, there are also disallowed transitions and so disallowed words. This is shown in the mixed state for the word $00$, which has zero probability.
More interesting, though, is the transient mixed state $\mu(1) = (2/3, 0, 1/3)$, which indicates that, having seen a 1, we know that $M$ cannot be in state $B$. However, the best we can say is that it is either in state $A$ (the Period-1 component) or in state $C$ (the Period-2 component), with probabilities 2/3 and 1/3. It is not until we see another symbol that we are guaranteed to know with certainty in which component $M$ is. If the next symbol is a 1, then $M$ is in $A$; if it is a 0, then $M$ is in $B$. Since we now know the state with certainty, we say that $11$ and $10$ are synchronizing words. In this case, they are the minimal synchronizing words.
The ergodic decompositions tell us the following:
$h_\mu = 0$ bits per symbol;
$\mathbf{E} = 3/2$ bits;
$C_\mu = 3/2$ bits;
$\mathbf{T} = 1/2$ bit-symbols;
$\chi = 0$ bits.
Let us check these by directly calculating the entropy growth $H(\ell)$ and convergence $h_\mu(\ell)$ for $M$. These are shown in Figure 2.
The entropy growth plot (top) leads to an estimate of $\mathbf{E} = 3/2$ bits, which is predicted by the ergodic decomposition. Both entropy growth and entropy convergence (bottom) show that $h_\mu(\ell) = 0$ after $\ell = 2$. And this too is correctly predicted by the corresponding entropy rate decomposition.
In fact, for lengths longer than the longest period, there are always three distinct sequences—$11\cdots1$, $0101\cdots$, and $1010\cdots$. And so, $H(\ell) = 3/2$ bits. This is roughly consistent with the block entropy plots.
Let us analyze this exactly. One of those sequences is $11\cdots1$ and it occurs with probability 1/2. The two other sequences are $0101\cdots$ and $1010\cdots$ and they are generated equally often by their component. But since that component appears only half the time, they occur in the output sequences with probability 1/4 each. Thus, $H(\ell) = \tfrac{1}{2} \log_2 2 + 2 \cdot \tfrac{1}{4} \log_2 4 = 3/2$ bits. And this is what is seen in the plots.
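This exact count can be reproduced in a few lines (a sketch; we take $P_2$ to repeat the word 10, which only fixes the phase labels).

```python
from math import log2

def p1_words(ell):
    """Word distribution of the Period-1 (all-1s) component."""
    return {'1' * ell: 1.0}

def p2_words(ell):
    """Word distribution of the Period-2 component: two phases, fair."""
    return {('10' * ell)[:ell]: 0.5, ('01' * ell)[:ell]: 0.5}

def mixture_H(ell):
    """Block entropy of the fair mixture of P1 and P2."""
    probs = {}
    for comp, weight in ((p1_words(ell), 0.5), (p2_words(ell), 0.5)):
        for w, q in comp.items():
            probs[w] = probs.get(w, 0.0) + weight * q
    return -sum(q * log2(q) for q in probs.values())
```

For every $\ell \geq 2$ this returns $3/2$ bits, from the three words of probabilities 1/2, 1/4, and 1/4.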
The HMSP’s statistical complexity is
$C_\mu = H[(1/2, 1/4, 1/4)] = 3/2$ bits,
which agrees with the ergodic decomposition.
The ergodic decomposition, however, predicts $\mathbf{T} = 1/2$ bit-symbols, while the entropy growth plot shows that, in fact, $\mathbf{T}$ is larger. So, the ergodic decomposition for $\mathbf{T}$ is incorrect. In short, we see that the ergodic decomposition does not properly account for the state distribution’s relaxation through the transient mixed states (solid circles) in $M$; Figure 1c. That relaxation takes longer than a single step (as the decomposition incorrectly assumes) and that increased relaxation time increases $\mathbf{T}$.
Note that this is one of the simpler examples of the class of processes that have finite transients. Let us consider one that is more complex.
7.1.3. Isomorphic Golden Mean Processes
The No-Repeated-0s Golden Mean Process (GMP) generates all binary sequences except those with consecutive 0s. After a 1 is generated, the next symbol is a 0 or a 1 with fair probability; a 0 is always followed by a 1. The GMP is an order-1 Markov process.
Let $\mathcal{P}_1$ be the No-Repeated-0s GMP, and let $\mathcal{P}_2$ be the No-Repeated-1s GMP. See Figure 3a,b. We define a nonergodic mixture $\mathcal{P} = \{\mathcal{P}_1, \mathcal{P}_2\}$ with mixture distribution $\pi = (p, 1-p)$. The probability of any word $w$ is, then,
$\Pr(w) = p \Pr_1(w) + (1 - p) \Pr_2(w)$.
Using the mixed-state operator, we construct $\mathcal{P}$’s transient and recurrent states using this mixture distribution, finding a finite collection of mixed states. Longer words can only lead to one of these mixed states and so the ϵ-machine is finite. The full multistationary ϵ-machine is shown in Figure 3c, as a function of the mixture parameter $p$. We see that the number of states, including the transients, is finite for all mixture probabilities.
The transition matrices for $\mathcal{P}$’s recurrent causal states are block diagonal in the two components’ matrices. The stationary distribution is defined by the mixture of the two processes,
$\pi_M = p\, \widetilde{\pi}^{(1)} + (1 - p)\, \widetilde{\pi}^{(2)}$,
recalling that each component’s stationary distribution satisfies $\pi^{(i)} = \pi^{(i)} T_i$.
Using methods from Refs. [16,17], the excess entropy for each recurrent component is seen to be
$\mathbf{E}_i = H(1) - h_\mu = H[(2/3, 1/3)] - 2/3 \approx 0.2516$ bits.
By the ergodic decomposition theorem, the excess entropy for the mixture, as a function of $p$, is
$\mathbf{E}(p) = \mathbf{E}_i + H(p)$,
since the two components are isomorphic. For $p = 1/2$, we expect $\mathbf{E} \approx 1.2516$ bits.
Again, the component transient information equals the excess entropy, since the GMP is order-1 Markov. So, the associated ergodic decomposition gives
$\mathbf{T}(p) = \mathbf{T}_i = \mathbf{E}_i$,
since the two components are isomorphic. For $p = 1/2$, we expect $\mathbf{T} \approx 0.2516$ bit-symbols.
Similarly, the statistical complexity of each recurrent component is
$C_\mu^{(i)} = H[(2/3, 1/3)] \approx 0.9183$ bits.
So, from Equation (23) the statistical complexity of the mixture as a function of $p$ is
$C_\mu(p) = C_\mu^{(i)} + H(p)$.
For $p = 1/2$, we expect $C_\mu \approx 1.9183$ bits.
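The component quantities here are a short calculation, sketched below under the fair-transition GMP just defined: stationary state distribution $(2/3, 1/3)$, with branching only at the state reached after a 1.

```python
from math import log2

piA, piB = 2/3, 1/3
h_mu = piA * 1.0                 # 1 bit of choice, made 2/3 of the time
C_mu = -(piA * log2(piA) + piB * log2(piB))   # H[(2/3, 1/3)]
H1 = C_mu                        # the symbol distribution is also (2/3, 1/3)
E = H1 - h_mu                    # order-1 Markov: E = H(1) - h_mu
H_pi = 1.0                       # mixture entropy H(1/2) for p = 1/2

E_mix = E + H_pi                 # predicted mixture excess entropy
C_mix = C_mu + H_pi              # predicted statistical complexity
```

This yields $\mathbf{E}_i \approx 0.2516$ bits, $C_\mu^{(i)} \approx 0.9183$ bits, and the mixture predictions $\mathbf{E} \approx 1.2516$ bits and $C_\mu \approx 1.9183$ bits quoted above.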
Let us check the decompositions by calculating the associated complexity measures from $M$’s entropy growth and convergence. The latter are shown in Figure 4.
The entropy growth plot estimates $\mathbf{E}$ at a value that is low by 2%. And the entropy convergence plot shows that the estimate of $\mathbf{E}$, calculated there using Equation (5) as the area shown, is a bit lower still. These errors are expected, though, due to the slow convergence and the finite number of terms taken in the approximation. Similarly, the ergodic decomposition of the entropy rate $h_\mu = 2/3$ bits per symbol shows up correctly when estimated from $\mathcal{P}$’s entropy growth and convergence. And so, the predictions from the related ergodic decompositions are consistent.
The entropy growth, however, shows the transient information is substantially larger than that predicted from its ergodic decomposition ($\mathbf{T} \approx 0.2516$ bit-symbols). This discrepancy is clearly not due to estimation errors. Rather, as noted above for the P1-P2 mixture, it arises from the decomposition not accounting for the five transient causal states of $M$; see Figure 3c.
Individually, GMPs are subshifts of finite type and finite Markov order. From the cycles in the transient states, we see that as components they make the multistationary process sofic—infinite Markov order. There are subsets of sequences—specifically the alternating sequences $0101\cdots$ and $1010\cdots$, allowed by both components—for which one never synchronizes.
This means that mixtures of finite-order Markov chains, even “linear” mixtures that come from independently running them, are processes that are not finite Markovian. They require hidden Markov representations.
7.2. Infinite State
The preceding examples, chosen to explicitly illustrate methods and as harbingers of coming results, are rather special in that they led to finite-state multistationary processes. We now turn to more typical cases, still constructed from finite-state ergodic components, that lead to a multistationary process with an infinite number of states.
7.2.1. Period-1 and Fair Coin Process
The next example of a multistationary process mixes stochastic and periodic behaviors: We build it out of a period-1 process and a fair coin. In effect, we ask how difficult it is to distinguish these two simple, but extreme processes—one completely predictable, the other completely unpredictable.
For here and a bit later, define the Bernoulli Process $B(p)$, which is a model of a coin flip with bias probability $p$.
The first step, then, is to select the following two stationary components:
Period-1 Process $P_1$: $h_\mu = 0$ bits per symbol, $\mathbf{E} = 0$ bits, $C_\mu = 0$ bits, $\mathbf{T} = 0$ bit-symbols, and $\chi = 0$ bits. See Figure 5a.
Fair Coin Process $B(1/2)$: $h_\mu = 1$ bit per symbol, $\mathbf{E} = 0$ bits, $C_\mu = 0$ bits, $\mathbf{T} = 0$ bit-symbols, and $\chi = 0$ bits. See Figure 5b.
Though at the two extremes of predictability, these are structurally trivial processes—$\mathbf{E} = C_\mu = 0$.
The second step is to select the mixture distribution, which we take to be uniform: $\pi = (1/2, 1/2)$. And the third step is to use the mixed-state operator to construct $M$. Several of the mixed states are
$\mu(1^k) = \left( \dfrac{2^k}{2^k + 1}, \dfrac{1}{2^k + 1} \right)$,
where the first coordinate is the probability of the Period-1 state and the second that of the Fair Coin state. The resulting ϵ-machine is shown in Figure 5c.
In Figure 5c, and so too in the mixed states, we see our first surprising result for multistationary processes. Starting from two structurally trivial processes, the multistationary ϵ-machine has a countable infinity of transient causal states. Why? If, at any point, one sees a 0, then we know the process is in the Fair Coin component, since the other component cannot generate a 0. However, it is only after “seeing” an infinite sequence of 1s that one could determine that the process is in the All-1s component. In short, the effort required to distinguish between these two trivial processes is infinite, and this is directly reflected in the infinite set of transient states.
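The infinite transient structure can be seen in a small calculation. The sketch below, under the section's setup and with a function name of our choosing, tracks an observer's posterior that the hidden component is the All-1s process.

```python
def posterior_all1s(seq):
    """Posterior probability that the running component is the Period-1
    (All-1s) process rather than the Fair Coin, given observations seq
    and a uniform prior over the two components."""
    # All-1s assigns probability 1 to all-1 words and 0 to any other;
    # the Fair Coin assigns 2^-n to every length-n word.
    p_all1s = 1.0 if all(s == 1 for s in seq) else 0.0
    p_fair = 0.5 ** len(seq)
    return p_all1s / (p_all1s + p_fair)

# A single 0 synchronizes immediately to the Fair Coin component:
print(posterior_all1s([1, 1, 0]))  # 0.0
# But after n 1s the posterior is 1 / (1 + 2^-n): it approaches 1 yet
# never reaches it, one distinct mixed state for every n.
for n in (1, 4, 16):
    print(n, posterior_all1s([1] * n))
```

Each run length n yields a different posterior, which is exactly the countable chain of transient causal states described above.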
The ergodic decompositions tell us the following: $h_\mu = \frac{1}{2}(0 + 1) = \frac{1}{2}$ bit per symbol; $\mathbf{E} = H[\pi] = 1$ bit; $C_\mu = H[\pi] = 1$ bit; the predicted transient information (in bit-symbols) likewise carries only the mixture contribution; and $\chi = C_\mu - \mathbf{E} = 0$ bits.
Note that the ergodic decompositions predict that the structural complexity measures are driven solely by the mixture entropy $H[\pi]$. Both components contribute nothing. Let us check these predictions by estimating the quantities from M's entropy growth and convergence, shown in Figure 6.
The entropy growth plot shows that $\mathbf{E} = 1$ bit, as predicted by M's ergodic decomposition. And the entropy convergence plot shows that the excess entropy, calculated as the area shown, is also the same. Similarly, the ergodic decomposition of the entropy rate, $h_\mu = 1/2$ bit per symbol, shows up correctly on the entropy plots.
The entropy growth plot, however, shows the transient information is quite a bit larger than that predicted from its ergodic decomposition.
Also, the informational ergodic decompositions, while indicating a role for the mixture entropy, miss entirely the existence of an infinite number of transient states and the attendant difficulty that confronts an observer trying to detect in which component the process is.
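As a check one can carry out by hand, this mixture's block entropy has a closed form: the word $1^L$ has probability $1/2 + 2^{-(L+1)}$ and each of the other $2^L - 1$ words has probability $2^{-(L+1)}$. The short script below (our construction, following the section's setup) confirms the linear asymptote with slope $1/2$ and intercept 1 bit.

```python
import math

def H_L(L):
    """Exact block entropy H(L), in bits, for the uniform mixture of
    the All-1s process and the Fair Coin process."""
    p_ones = 0.5 + 0.5 * 2.0 ** -L   # probability of the word 1^L
    p_other = 0.5 * 2.0 ** -L        # each of the 2^L - 1 other words
    return -(p_ones * math.log2(p_ones)
             + (2 ** L - 1) * p_other * math.log2(p_other))

# H(L) - L/2 converges to E = 1 bit: slope h_mu = 1/2 bit per symbol
# and y-intercept 1 bit, matching the ergodic decomposition.
for L in (1, 5, 20):
    print(L, round(H_L(L) - 0.5 * L, 4))
```

The slow approach of $H(L) - L/2$ toward 1 bit mirrors the entropy-convergence plot's long tail.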
7.2.2. Two Biased Coins
Slightly increasing the level of sophistication, we now construct a multistationary process out of fully stochastic components: two biased coins of unequal (but symmetric) biases $p$ and $1 - p$. See Figure 7a,b.
We again take a uniform mixture distribution: $\pi = (1/2, 1/2)$. The result of constructing the mixed-state presentation is shown in Figure 8.
The mixed-state presentation reveals two countably infinite chains of transient causal states, one leading to each biased coin's ergodic component. In a simple sense these long transient chains show the mechanism by which one determines the coin biases. Interestingly, though, at any point statistical fluctuations can change the apparent bias and drive the state back up the long chains, heading for the complementary biased coin.
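The back-and-forth motion along the transient chains corresponds to a random walk of the observer's log-odds between the two biases. A minimal simulation follows; the bias 0.7 and the function name are illustrative assumptions, since the section leaves the biases unspecified.

```python
import math
import random

def posterior_trace(seq, p=0.7):
    """Track P(component has bias p | observations so far) for a
    uniform mixture of two coins with symmetric biases p and 1 - p."""
    log_odds = 0.0  # log-likelihood ratio; 0 corresponds to the prior
    trace = []
    for s in seq:
        # Each symbol nudges the log-odds by a bounded step, so a run
        # of fluctuations can always drive the state back up the chain.
        log_odds += math.log(p / (1 - p)) if s == 1 else math.log((1 - p) / p)
        trace.append(1.0 / (1.0 + math.exp(-log_odds)))
    return trace

rng = random.Random(1)
seq = [1 if rng.random() < 0.7 else 0 for _ in range(200)]
trace = posterior_trace(seq)
print(round(trace[-1], 4))  # close to 1: nearly synchronized to bias 0.7
```

The trace's excursions toward and away from 1 are the quantitative counterpart of the mixed states moving down and back up the transient chains.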
Consider, as above, the entropy-rate, excess-entropy, statistical-complexity, transient-information, and crypticity ergodic decompositions. The ergodic decompositions for excess entropy and statistical complexity give similar results; namely, both reduce to the mixture entropy $H[\pi]$, since each biased coin alone has zero excess entropy and zero statistical complexity.
That is, the complexities of the multistationary process are all in the mixture distribution. Even then, the mixture entropy, in this case upper bounded by 1 bit, belies the infinite number of transients and the difficulty of determining in which ergodic component the process is. Quantitatively, it seems another measure of global process complexity and a new decomposition are in order. We return to this shortly, after examining several more kinds of multistationary processes.
Let us validate the ergodic decompositions' predictions against the process estimates of their various measures from M's entropy growth and convergence, shown in Figure 9.
Entropy growth, using the y-intercept method, shows that $\mathbf{E} = 1$ bit, as predicted by M's ergodic decomposition. And the entropy convergence plot shows that the excess entropy, as the area shown, is also the same, though it takes many terms and so exhibits slow convergence. The ergodic decomposition of the entropy rate shows up correctly on the entropy plots.
The entropy growth, however, shows the transient information is substantially larger than that predicted from its ergodic decomposition. Again, the mixing entropy fails to account for the dominating transient causal state structure.
7.2.3. Pair of Isomorphic Even Processes
The Even Process (EP) generates all binary sequences in which 1s occur in blocks of even length bounded by 0s. Once a 0 is seen, a 0 or a 1 is generated with fair probability. The EP is closely related to the Golden Mean Process: they have the same entropy rates and statistical complexities. The important difference, despite the close similarity and a simple relabeling of transitions, is that the EP is described by no finite-order Markov chain. It is of infinite Markov order, though finite state.
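The EP's infinite Markov order can be checked with the standard forward (state-filtering) computation over its two-state presentation. The sketch below is ours: state A (reached after a 0) emits 0 or 1 with fair probability, and state B completes each pair of 1s.

```python
def even_process_next(history):
    """Forward-algorithm prediction P(next = 1 | history) for the Even
    Process: recurrent state A emits 0 or 1 with probability 1/2 (a 1
    moves to B); state B emits 1 with certainty, moving back to A."""
    a, b = 2 / 3, 1 / 3              # stationary distribution over (A, B)
    for s in history:
        if s == 0:
            a, b = a * 0.5, 0.0      # only A can emit a 0
        else:
            a, b = b * 1.0, a * 0.5  # B --1--> A and A --1--> B
        z = a + b
        a, b = a / z, b / z          # renormalize the state estimate
    return a * 0.5 + b * 1.0

# After a 0, the parity of the subsequent run of 1s fixes the prediction,
# no matter how long ago that 0 occurred: infinite Markov order.
print(even_process_next([0, 1, 1]))     # 0.5: even run, back in state A
print(even_process_next([0, 1, 1, 1]))  # 1.0: odd run, the pair must close
```

Because the prediction depends on a parity carried arbitrarily far from the last 0, no finite Markov order can reproduce it.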
To construct a multistationary process, the first step is to select two stationary components. One component will be an EP with an even number of 0s (Figure 10b) and the other an EP with an even number of 1s (Figure 10a). The second step is to choose the mixture distribution.
Finally, Figure 11 shows the ϵ-machine for the HMSP. The ϵ-machine displayed is estimated only up to words of length 8 and the transitions are set to give a well-formed ϵ-machine at this approximation.
There are several observations. First, the HMSP ϵ-machine is symmetric under 1-0 exchange, as it should be given this symmetry in the ergodic components. Second, and less obviously, there is an infinite number of transient causal states. This is due to the outside paths along the all-1s and all-0s sequences. These two sequences arise from the 2-cycles in the respective ergodic components' recurrent states: pairs of 1s in one component never synchronize; ditto for pairs of 0s in the other. And so, in M there are infinitely long sequences that never reach the recurrent states.
Third, the HMSP is of infinite Markov order. To see this, note that there are six cycles in the transient states; these cycles are the signature of infinite Markov order or, what is called, “soficity”. The HMSP is a shift of infinite type [32]. In particular, there is a two-cycle between states 32 and 38 and one between states 37 and 41. There are two four-cycles, between states 10, 18, 24, and 26 and between states 24, 29, 34, and 36; and two more, between states 11, 19, 25, and 23 and between states 25, 30, 35, and 33.
The ergodic decompositions again give predictions for the entropy rate ($h_\mu = 2/3$ bit per symbol, shared by both EP components), the excess entropy (bits), the statistical complexity (bits), the transient information (bit-symbols), and the crypticity (bits).
Let us check the decompositions by comparing their predictions to estimates from M's entropy growth and convergence. These functions are shown in Figure 12.
The entropy growth plot gives an estimate of $\mathbf{E}$ that disagrees by about 4% with the prediction from the ergodic decomposition. And the entropy convergence plot shows the excess entropy, as the area shown, is also the same. Given the slow convergence, the finite number of terms taken in the numerical approximation, and the finite number of transient states taken in the approximation of M, this error is not surprising. Similarly, the ergodic decomposition of the entropy rate shows up correctly on the entropy plots.
Entropy growth, however, shows the transient information is three times larger than that predicted from its ergodic decomposition. Again, this discrepancy follows from the mixture entropy missing the contributions from the (infinite) number of transient causal states.
7.3. Infinite Components
We end our selection of example multistationary processes by constructing several from an infinite number of finitary ergodic components.
7.3.1. Handbag of Necklaces
Fourier analysis of a signal assumes the generating process consists of (at most) periodic sequences. As an analog of this assumption in the present setting, consider the Handbag of Necklaces HMSP consisting of ergodic-component stationary processes, one for each period $p = 1, 2, 3, \ldots$. That is, if we assume a binary process, the emitted sequences consist of the period-1, period-2, period-3, period-4, and longer periodic words. The HMSP ϵ-machine is shown in Figure 13.
Note that there is an infinite number of transient causal states. Overall the HMSP is a highly symmetric structure dominated by the transient states. From this one can readily read off how to synchronize, that is, how to know in which ergodic component the process is. For example, to synchronize to component $i$, there are exactly $i$ paths.
Now, consider the mixture measure for the components. The state probabilities then follow, up to a normalization constant. Note this is the presentation's stationary invariant distribution.
There is some flexibility in setting the mixture distribution $\pi$; there are several criteria for choosing it over a countable number of states.
Consider the structure of the transitions in the HMSP's first row of states. The first transition probability for seeing an a follows from the probability of seeing an a in each component, and the probability for the succeeding transition emitting an a follows similarly. These transitions determine those leaving the top row of states on a b.
Now, consider the second-to-top row of transitions. These follow in the same way, and there is a second path controlled by its own transition probability. That is, the appearance of a and b in each component occurs due to the same conditions.
The ergodic decompositions tell us the following: $h_\mu = 0$ bit per symbol, as each component is periodic. The excess-entropy ergodic decomposition for a process with $p$ periodic component processes with periods $p_1, \ldots, p_p$ is $\mathbf{E} = H[\pi] + \sum_{i=1}^{p} \pi_i \log_2 p_i = \log_2 p + \frac{1}{p} \sum_{i=1}^{p} \log_2 p_i$, where the second step follows assuming $\pi$ is uniform. $C_\mu = \mathbf{E}$. The transient information (in bit-symbols) follows from the same mixture contributions. And $\chi = C_\mu - \mathbf{E} = 0$: This is a bit surprising: no crypticity, no hidden information.
These are consistent with directly calculating the entropy growth for M, as shown in Figure 14.
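The excess-entropy decomposition above reduces to a one-line computation. This sketch assumes, as stated, a uniform mixture and $\mathbf{E} = \log_2 p_i$ for a period-$p_i$ component; the function name is ours.

```python
import math

def excess_entropy_necklaces(periods):
    """E = H[pi] + sum_i pi_i * log2(period_i) for a uniform mixture of
    periodic processes, using E = log2(p) for a period-p component."""
    p = len(periods)
    H_pi = math.log2(p)  # entropy of the uniform mixture distribution
    return H_pi + sum(math.log2(pi) for pi in periods) / p

# Periods 1 through 4, uniformly mixed:
print(round(excess_entropy_necklaces([1, 2, 3, 4]), 4))
```

Note how the mixture-entropy term $\log_2 p$ dominates as more necklaces are added, consistent with the complexity residing in the mixture rather than in any single component.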
7.3.2. The Purse Process
The example of two biased coins suggests extending to an infinite number of biased coins in a purse—a bag of coins with different biases. As hinted at in the two-coin case, all of the (infinite) complexity is in the mixture and none comes from the components.
Moreover, we can choose the mixture distribution $\pi$ such that $H[\pi]$ is finite or infinite. Thus, the Purse Process is an extreme example in which infinite complexity comes from zero-complexity components. There is probably no simpler way to say that a multistationary process is way more than the sum of its (zero-complexity) parts.
To obtain a brief sense of the Purse Process, consider an HMSP consisting of three coins of unequal bias and compare this to the case of two coins of Section 7.2.2. Figure 15 shows the HMSP for two completely biased coins and one fair coin. Its basic features were already encountered above. And it suggests a notable generalization to which we now turn.
7.3.3. Mother of All Processes
Finally, consider upping the complexity ante substantially. This generalization is to an HMSP consisting of a mixture of all processes. Let us step through its construction.
First, recall that every stationary process has a unique ϵ-machine presentation. That is, ϵ-machines and stationary processes are in a 1-to-1 correspondence. Second, an efficient algorithm exists to list all ϵ-machines by the number of recurrent causal states. Reference [34] shows how to systematically enumerate the ϵ-machine process library for k-state ϵ-machines. See Table I there for the list of binary-alphabet topological ϵ-machines. There are 1,117,768,214 such 8-state ϵ-machines.
In the current construction, consider only topological ϵ-machines for which any branching transitions are taken with fair probability. We refer to each process by its ϵ-machine's enumeration number; we call this the process' Gödel number.
Second, define the Process Urn (PU) as containing the entire library of ϵ-machines. That is, we imagine an HMSP that is the result of reaching into the urn, selecting one ϵ-machine, and having it generate a full realization. The repeatedly sampled PU is an HMSP: The Mother of All Processes. It is certainly one of the most nonergodic processes one could work with.
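To make the urn concrete, here is a toy sketch. The three-entry stand-in library is our illustrative choice, not the enumerated library of [34]; each trial draws one component, which then generates the entire realization.

```python
import random

# Toy stand-ins for library entries: (name, step function), where a
# step maps (state, rng) to (emitted symbol, next state).
LIBRARY = [
    ("all-ones",   lambda s, rng: (1, s)),
    ("fair-coin",  lambda s, rng: (rng.randint(0, 1), s)),
    ("alternator", lambda s, rng: (s, 1 - s)),
]

def sample_process_urn(n, rng):
    """One Process Urn trial: draw a component uniformly at random,
    then let it alone generate a length-n realization."""
    name, step = rng.choice(LIBRARY)
    state, out = 0, []
    for _ in range(n):
        symbol, state = step(state, rng)
        out.append(symbol)
    return name, out

rng = random.Random(7)
for _ in range(3):
    print(sample_process_urn(8, rng))
```

The crucial nonergodic feature is that the component draw happens once per realization, not once per symbol; time averages within a trial therefore need not match averages over the urn.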
Definition 2. The Mother of All Processes is the HMSP built over the entire ϵ-machine process library, with π being a chosen mixture distribution.
To simplify, let us examine the HMSP whose components are all one-state and all two-state ϵ-machines. There are now 10 components: three 1-state components and seven 2-state components.
There are 17 recurrent causal states altogether across the ergodic components. However, the many hundreds of mixed states are no longer usefully presented in a state-transition diagram, as illustrated up to this point. Instead, we plot the mixed states themselves as dots in the state simplex. This is shown in Figure 16. This is a 2D projection in which the recurrent states are the vertices of the simplex and so appear on its periphery. The start state, with uniform probability across the components and not across the recurrent states, is not in the simplex center.
One notes the concentration of mixed states near the simplex's vertices, indicating close approaches to synchronization.
There are a number of notable properties, including the following:
The simplex vertices correspond to recurrent causal states.
There is an uncountably infinite number of transient states. These fill out a complicated fractal measure within the simplex.
All mixed states that are not on vertices are transient states.
There are a number of open questions, including the following: While it is clear that M is not exactly synchronizable [35], as it contains infinite Markov-order components, is it asymptotically synchronizable [36]? What about the synchronizability of approximations to it?
8. Discussion
In addition to their particular application, the ergodic decompositions give important insight into basic questions about what structural complexity is and how to measure it. A number of previous efforts that address these definitional issues consider it a key property that complexity be additive over the components of a system [38]. This is often motivated by a parallel with Boltzmann entropy in thermodynamics. And, for that matter, additivity was also posited as an axiom by Shannon for his measure of surprise [39].
However, the ergodic decompositions here show that the manner in which a system’s components relate to one another—specifically, the mixture distribution—plays a central role in the process’ organization and contributes to quantitative measures of global complexity. The foregoing offered a different, more structural view that goes beyond the ergodic decompositions and statistical mixtures. Constructively, the transient state structure is key to a multistationary process’ global organization and what observations can or cannot reveal.
The lessons here also suggest skepticism in applying the ergodic decompositions of
Section 6. One reason is that underlying them is the assumption of an IID sampling of components, which is not generally valid. Another is that they completely ignore how the internal structures of the components interrelate with each other. And, as shown, this brings out wholly new properties that are not part of any given component nor their sum nor their IID mixture. Indeed, the mixture entropy does not capture this, except in the most limited of cases.
Constructive responses to this will address the new kind of hierarchical structure explicitly represented by the multistationary process' ϵ-machine transient causal states and their complicated measure in the mixed-state simplex. Quantitatively, in contrast to the block entropy, entropy rate, and excess entropy, we demonstrated that the transient information is sensitive to this new kind of complexity in structural mixtures. It is this additional structure that makes the organization of multistationary processes way more than the sum of their parts. As a complementary metric, adapting the statistical complexity dimension suggests itself [37].
9. Conclusions
Let us close by exploring several wider implications for thermodynamics, on the one hand, and various attempts to introduce “universal modeling” schemes on the other.
First, we started out highlighting the colloquialism, made familiar by the social movements of the 1960s, that a system is more than the sum of its parts. Presumably, the social reaction then reflected an increasing awareness of the impact of technical systems humans were creating. The preceding development explored in which senses this could be true for truly complex systems—ones consisting of many structured components—more akin to the social subsystems than mere atoms. And the various informational ergodic decompositions bolstered the popular understanding.
However, in emphasizing structure and analyzing the concrete process class of hidden multistationary processes, it became abundantly clear—through all of the examples presented—that composite or heterogeneous (to use Gibbs' word [40]) systems are far more than the sum of their components. Specifically, beyond a mere entropic, missing contribution from increased disorder that arises from the random selection of components, composite systems are markedly more complex. And they are more structured according to the relative interplay of the components' internal organization. It is that interplay that drives the explosive complexity of multicomponent systems.
On this score, the history of composite systems is perhaps a bit confusing, especially as they arose in the early foundations of thermodynamics. There is, for example, Gibbs' seemingly contradictory statement, as quoted by Jaynes (p. 13 in [1]), that “The whole is simpler than the sum of its parts”. The ergodic decompositions seemed to say the opposite. However, there is not really a confusion here. First, Gibbs was thinking of the correlations that would emerge between system components when coupled together. Here, we intentionally did not couple the components. Sequels address this. Second, at root, the issue turns on an ambiguous vocabulary for describing randomness and structure. Here, at least, by distinguishing between “randomness” in terms of Shannon's notion of the flatness of a probability distribution and “structure” in terms of statistical complexity, we shed some light on these important and still evolving issues.
Second, the HMSP construction procedure here gives a rather direct picture of one kind of hierarchical organization in how a stochastic process can be built from other processes. The constructive procedure uses the mixed-state presentation. And this generates a new kind of hierarchy that emerges due to the diverse combinatorial relationships between the components' internal organizations. Other related hierarchies can be similarly constructed, such as when using generalized hidden Markov models [41] as ergodic components.
Third and finally, modern statistical inference has been treated to a number of formalizations of general learning that make minimal assumptions. Consider, for example, the following:
Universal Priors [42,43,44]: In the computation-theoretic approach to modeling and statistical inference, there are attempts to define a most-general prior over model space. However, these raise very natural questions: What kind of process would generate such a prior? Moreover, what kinds of difficulties are there in detecting processes drawn according to such a prior?
No Free Lunch Theorem [45]: This framing makes a number of implicit assumptions about the measure on the Process Urn simplex. Does the theorem hold? Not when you consider structure.
Probably Approximately Correct Learning [46]: This “distribution-free” approach is a bold attempt within machine learning to identify the computational nature of evolution and learning. However, is this not the same thing as assuming any process is possible? If so, then it is analogous to assuming the Mother of All Processes. That is, rather than being “distribution-free”, the assumption underlying PAC learning is “distribution-full”.
In light of these, The Mother of All Processes suggests a construction for such assumption-free or minimal-assumption modeling. One samples from the space of all processes and exploits the ϵ-machine representation to be specific about probability, on the one hand, and structure, on the other. The preceding development demonstrated that the transient-state structure makes explicit the challenges in detecting component processes and that this is captured informationally via the transient information.