# Specific and Complete Local Integration of Patterns in Bayesian Networks


## Abstract


## 1. Introduction

#### 1.1. Illustration

#### 1.2. Contributions

#### 1.3. Related Work

The observer will conclude that the system is an organisation to the extent that there is a compressed description of its objects and of their relations.

## 2. Notation and Background

**Definition 1.**

1. ${X}_{A}:={\left({X}_{i}\right)}_{i\in A}$ as the joint random variable composed of the random variables indexed by A, where A is ordered according to the total order of V,
2. ${\mathcal{X}}_{A}:={\prod}_{i\in A}{\mathcal{X}}_{i}$ as the state space of ${X}_{A}$,
3. ${x}_{A}:={\left({x}_{i}\right)}_{i\in A}\in {\mathcal{X}}_{A}$ as a value of ${X}_{A}$,
4. ${p}_{A}:{\mathcal{X}}_{A}\to [0,1]$ as the probability distribution (or more precisely probability mass function) of ${X}_{A}$, which is the joint probability distribution over the random variables indexed by A. If $A=\left\{i\right\}$, i.e., a singleton set, we drop the braces and just write ${p}_{A}={p}_{i}$,
5. ${p}_{A,B}:{\mathcal{X}}_{A}\times {\mathcal{X}}_{B}\to [0,1]$ as the probability distribution over ${\mathcal{X}}_{A}\times {\mathcal{X}}_{B}$. Note that in general for arbitrary $A,B\subseteq V$, ${x}_{A}\in {\mathcal{X}}_{A}$, and ${y}_{B}\in {\mathcal{X}}_{B}$ this can be rewritten as a distribution over the intersection of A and B and the respective complements. The variables in the intersection have to coincide:$$\begin{array}{cc}\hfill {p}_{A,B}({x}_{A},{y}_{B}):& ={p}_{A\backslash B,A\cap B,B\backslash A,A\cap B}({x}_{A\backslash B},{x}_{A\cap B},{y}_{B\backslash A},{y}_{A\cap B})\hfill \end{array}$$$$\begin{array}{cc}& ={\delta}_{{x}_{A\cap B}}\left({y}_{A\cap B}\right)\phantom{\rule{0.277778em}{0ex}}{p}_{A\backslash B,A\cap B,B\backslash A}({x}_{A\backslash B},{x}_{A\cap B},{y}_{B\backslash A}).\hfill \end{array}$$
6. ${p}_{B|A}:{\mathcal{X}}_{A}\times {\mathcal{X}}_{B}\to [0,1]$ with $({x}_{A},{x}_{B})\mapsto {p}_{B|A}\left({x}_{B}\right|{x}_{A})$ as the conditional probability distribution over ${X}_{B}$ given ${X}_{A}$:$$\begin{array}{c}\hfill {p}_{B|A}\left({y}_{B}\right|{x}_{A}):=\frac{{p}_{A,B}({x}_{A},{y}_{B})}{{p}_{A}\left({x}_{A}\right)}.\end{array}$$
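The marginal and conditional distributions defined above can be sketched for a small discrete example. The dictionary-based representation and all probability values below are illustrative assumptions, not part of the paper's formalism:

```python
# Joint pmf p_V over V = (0, 1): two binary variables, stored as
# {(x_0, x_1): probability}.  The numbers are an arbitrary example.
p_V = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def marginal(p, A):
    """p_A(x_A): sum the joint pmf over all variables not indexed by A."""
    out = {}
    for x, pr in p.items():
        key = tuple(x[i] for i in A)
        out[key] = out.get(key, 0.0) + pr
    return out

def conditional(p, B, A):
    """p_{B|A}(x_B | x_A) = p_{A,B}(x_A, x_B) / p_A(x_A)."""
    p_A = marginal(p, A)
    out = {}
    for x, pr in p.items():
        x_A = tuple(x[i] for i in A)
        x_B = tuple(x[i] for i in B)
        out[(x_B, x_A)] = out.get((x_B, x_A), 0.0) + pr / p_A[x_A]
    return out

p_0 = marginal(p_V, (0,))                 # p_0(0) = 0.5, p_0(1) = 0.5
p_1_given_0 = conditional(p_V, (1,), (0,))  # e.g. p(x_1=0 | x_0=0) = 0.8
```

The same two helpers suffice for every marginal and conditional used later in the paper, since any $p_{A}$ or $p_{B|A}$ is obtained from the joint $p_{V}$ this way.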

**Definition 2** (Partition lattice of a set of random variables).

1. Then its partition lattice $\mathfrak{L}\left(V\right)$ is the set of partitions of V partially ordered by refinement (see also Appendix B).
2. For two partitions $\pi ,\rho \in \mathfrak{L}\left(V\right)$ we write $\pi \lhd \rho $ if π refines ρ and $\pi \lhd: \rho $ if π covers ρ. The latter means that $\pi \ne \rho $, $\pi \lhd \rho $, and there is no $\xi \in \mathfrak{L}\left(V\right)$ with $\pi \ne \xi \ne \rho $ such that $\pi \lhd \xi \lhd \rho $.
3. We write $\mathit{0}$ for the zero element of a partially ordered set (including lattices) and $\mathit{1}$ for the unit element.
4. Given a partition $\pi \in \mathfrak{L}\left(V\right)$ and a subset $A\subseteq V$ we define the restricted partition ${\pi |}_{A}$ of π to A via:$${\pi |}_{A}:=\{b\cap A:b\in \pi \}.$$
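A minimal sketch of this partition machinery for small finite index sets; the function names are our own, and the restriction below drops empty intersections as an implementation choice:

```python
def partitions(s):
    """Enumerate all partitions of the finite set s as lists of frozensets."""
    s = list(s)
    if not s:
        yield []
        return
    head, rest = s[0], s[1:]
    for part in partitions(rest):
        for i in range(len(part)):          # put head into an existing block
            yield part[:i] + [part[i] | {head}] + part[i + 1:]
        yield part + [frozenset({head})]    # or open a new singleton block

def refines(pi, rho):
    """pi refines rho iff every block of pi lies inside some block of rho."""
    return all(any(b <= c for c in rho) for b in pi)

def restrict(pi, A):
    """pi|_A = {b ∩ A : b ∈ pi}; empty intersections are dropped here."""
    return [b & A for b in pi if b & A]

lattice = list(partitions({0, 1, 2}))       # the partition lattice of {0,1,2}
singletons = [frozenset({i}) for i in (0, 1, 2)]
unit = [frozenset({0, 1, 2})]
```

For a three-element set the lattice has five elements (the Bell number $B_{3}=5$), with the singleton partition as zero element and $\{V\}$ as unit element.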

## 3. Patterns, Entities, Specific, and Complete Local Integration

#### 3.1. Patterns

**Definition 3** (Patterns and trajectories).

1. A pattern at $A\subseteq V$ is an assignment$$\begin{array}{c}\hfill {X}_{A}={x}_{A}\end{array}$$
2. The elements ${x}_{V}$ of the joint state space ${\mathcal{X}}_{V}$ are isomorphic to the patterns ${X}_{V}={x}_{V}$ at V which fix the complete set ${\left\{{X}_{i}\right\}}_{i\in V}$ of random variables. Since they will be used repeatedly we refer to them as the trajectories of ${\left\{{X}_{i}\right\}}_{i\in V}$.
3. A pattern ${x}_{A}$ is said to occur in trajectory ${\overline{x}}_{V}\in {\mathcal{X}}_{V}$ if ${\overline{x}}_{A}={x}_{A}$.
4. Each pattern ${x}_{A}$ uniquely defines (or captures) a set of trajectories $\mathcal{T}\left({x}_{A}\right)$ via$$\begin{array}{c}\hfill \mathcal{T}\left({x}_{A}\right)=\{{\overline{x}}_{V}\in {\mathcal{X}}_{V}:{\overline{x}}_{A}={x}_{A}\},\end{array}$$
5. It is convenient to allow the empty pattern ${x}_{\varnothing}$ for which we define $\mathcal{T}\left({x}_{\varnothing}\right)={\mathcal{X}}_{V}$.

- Note that for every ${x}_{A}\in {\mathcal{X}}_{A}$ we can form a pattern ${X}_{A}={x}_{A}$ so the set of all patterns is ${\bigcup}_{A\subseteq V}{\mathcal{X}}_{A}$.
- Our notion of patterns is similar to “patterns” as defined in [29] and to “cylinders” as defined in [30]. More precisely, these other definitions concern (probabilistic) cellular automata where all random variables have identical state spaces ${\mathcal{X}}_{i}={\mathcal{X}}_{j}$ for all $i,j\in V$. They also restrict the extent of the patterns or cylinders to a single time-step. Under these conditions our patterns are isomorphic to these other definitions. However, we drop both the identical state space assumption and the restriction to single time-steps. Our definition is inspired by the usage of the term “spatiotemporal pattern” in [14,31,32]. There is no formal definition of this notion given in these publications but we believe that our definition is a straightforward formalisation. Note that these publications only treat the Game of Life cellular automaton. The assumption of identical state space is therefore implicitly made. At the same time the restriction to single time-steps is explicitly dropped.
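Patterns and their captured trajectory sets can be sketched directly; the partial-assignment representation below is an illustrative choice, not the paper's notation:

```python
from itertools import product

# Three binary random variables, V = (0, 1, 2); trajectories are the
# elements of the joint state space X_V.
V = (0, 1, 2)
trajectories = list(product((0, 1), repeat=len(V)))

def occurs_in(pattern, trajectory):
    """x_A occurs in a trajectory iff the trajectory agrees with it on A."""
    return all(trajectory[i] == v for i, v in pattern.items())

def captured(pattern):
    """T(x_A): the set of trajectories captured by the pattern.  The empty
    pattern {} captures all of X_V, matching the convention above."""
    return [t for t in trajectories if occurs_in(pattern, t)]

# The pattern X_{{0,2}} = (1, 0), i.e., fixing X_0 = 1 and X_2 = 0:
x_A = {0: 1, 2: 0}
```

Here `captured(x_A)` contains exactly the two trajectories that agree with the pattern on indices 0 and 2, while the free variable $X_1$ ranges over its state space.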

**Definition 4.**

**Theorem 1.**

**Proof.**

**Corollary 1.**

**Proof.**

#### 3.2. Motivation of Complete Local Integration as an Entity Criterion

- spatial identity and
- temporal identity.

**Definition 5** (Specific local integration (SLI)).

**Definition 6** ((Complete) local integration).

- The reason for excluding the unit partition ${\mathbf{1}}_{O}$ of $\mathfrak{L}\left(O\right)$ (where ${\mathbf{1}}_{O}=\left\{O\right\}$, see Definition 2) is that every pattern has ${mi}_{{\mathbf{1}}_{O}}\left({x}_{O}\right)=0$ with respect to it.
- Looking for a partition that minimises a measure of integration is known as the weakest link approach [35] to dealing with multiple partitions. We note here that this is not the only approach that is being discussed. Another approach is to look at weighted averages of all integrations. For a further discussion of this point in the case of the expected value of SLI see Ay [35] and references therein. For our interpretation taking the average seems less well suited since requiring a positive average will allow SLI to be negative with respect to some partitions.

**Definition 7** (ι-entity).

**Definition 8** (ci-entity-set).

- A first consequence of introducing the logarithm is that we can now formulate the condition of Equation (24) analogously to an old phrase attributed to Aristotle that “the whole is more than the sum of its parts”. In our case this would need to be changed to “the log-probability of the (spatiotemporal) whole is greater than the sum of the log-probabilities of its (spatiotemporal) parts”. This can easily be seen by rewriting Equation (22) as:$${mi}_{\pi}\left({x}_{O}\right)=log{p}_{O}\left({x}_{O}\right)-\sum _{b\in \pi}log{p}_{b}\left({x}_{b}\right).$$
- Another side effect of using the logarithm is that we can interpret Equation (24) in terms of the surprise value (also called information content) $-log{p}_{O}\left({x}_{O}\right)$ [36] of the pattern ${x}_{O}$ and the surprise value of its parts with respect to any partition $\pi $. Rewriting Equation (22) using properties of the logarithm we get:$${mi}_{\pi}\left({x}_{O}\right)=\sum _{b\in \pi}(-log{p}_{b}\left({x}_{b}\right))-(-log{p}_{O}\left({x}_{O}\right)).$$
- In coding theory, the Kraft-McMillan theorem [37] tells us that the optimal length (in a uniquely decodable binary code) of a codeword for an event x is $l\left(x\right)=-logp\left(x\right)$ if $p\left(x\right)$ is the true probability of x. If the encoding is not based on the true probability of x but instead on a different probability $q\left(x\right)$ then the difference between the optimal codeword length and the chosen codeword length is$$-logq\left(x\right)-(-logp\left(x\right))=log\frac{p\left(x\right)}{q\left(x\right)}.$$Complete local integration then requires that the joint code codeword is shorter than all possible product code codewords. This means there is no partition with respect to which the product code for the pattern ${x}_{O}$ has a shorter codeword than the joint code. So $\iota $-entities are patterns that are shorter to encode with the joint code than a product code. Patterns that have a shorter codeword in a product code associated to a partition $\pi $ have negative SLI with respect to this $\pi $ and are therefore not $\iota $-entities.
- We can relate our measure of identity to other measures in information theory. For this we note that the expectation value of specific local integration with respect to a partition $\pi $ is the multi-information ${MI}_{\pi}\left({X}_{O}\right)$ [9,10] with respect to $\pi $, i.e.,$$\begin{array}{cc}\hfill {MI}_{\pi}\left({X}_{O}\right):& =\sum _{{x}_{O}\in {\mathcal{X}}_{O}}{p}_{O}\left({x}_{O}\right)log\frac{{p}_{O}\left({x}_{O}\right)}{{\prod}_{b\in \pi}{p}_{b}\left({x}_{b}\right)}\hfill \end{array}$$$$\begin{array}{cc}& =\sum _{{x}_{O}\in {\mathcal{X}}_{O}}{p}_{O}\left({x}_{O}\right){mi}_{\pi}\left({x}_{O}\right).\hfill \end{array}$$
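The relation between SLI and multi-information can be checked numerically on a toy joint distribution; all numbers below are arbitrary assumptions:

```python
import math

# Toy joint distribution p_O over O = (0, 1); the values are an arbitrary
# correlated example, not taken from the paper.
p_O = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def marginal(p, b):
    out = {}
    for x, pr in p.items():
        key = tuple(x[i] for i in b)
        out[key] = out.get(key, 0.0) + pr
    return out

def sli(p, pi, x):
    """mi_pi(x_O) = log p_O(x_O) - sum_{b in pi} log p_b(x_b)."""
    return math.log(p[x]) - sum(
        math.log(marginal(p, b)[tuple(x[i] for i in b)]) for b in pi)

pi = [(0,), (1,)]                         # the only non-unit partition of O
mi = {x: sli(p_O, pi, x) for x in p_O}    # SLI of each pattern at O

# The expectation of SLI recovers the multi-information MI_pi(X_O).
MI = sum(p_O[x] * mi[x] for x in p_O)
```

Consistent with the surprise-value reading above, the aligned patterns (0, 0) and (1, 1) have positive SLI while the misaligned ones have negative SLI, and the average over all patterns is the (non-negative) multi-information.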

#### 3.3. Properties of Specific Local Integration

#### 3.3.1. Deterministic Case

**Theorem 2** (Deterministic specific local integration).

**Proof.**

#### 3.3.2. Upper Bounds

**Definition 9** (Anti-pattern).

- It is important to note that for an element of $\neg \left({x}_{O}\right)$ to occur it is not sufficient that ${x}_{O}$ does not occur. Only if every random variable ${X}_{i}$ with $i\in O$ differs from the value ${x}_{i}$ specified by ${x}_{O}$ does an element of $\neg \left({x}_{O}\right)$ necessarily occur. This is why we call $\neg \left({x}_{O}\right)$ the anti-pattern of ${x}_{O}$.

**Theorem 3** (Construction of a pattern with maximum SLI).

**Proof.**

- for all $i\in O\subset V$ let $pa\left(i\right)\cap (V\backslash O)=\varnothing $, i.e., nodes in O have no parents in the complement of O,
- for a specific $j\in O$ and all other $i\in O\backslash \left\{j\right\}$ let $pa\left(i\right)=\left\{j\right\}$, i.e., all nodes in O apart from j have $j\in O$ as a parent,
- for all $i\in O\backslash \left\{j\right\}$ let ${p}_{i}\left({\overline{x}}_{i}\right|{\overline{x}}_{j})={\delta}_{{\overline{x}}_{j}}\left({\overline{x}}_{i}\right)$, i.e., the state of all nodes in O is always the same as the state of node j,
- also choose ${p}_{j}\left({x}_{j}\right)=q$ and ${\sum}_{{\overline{x}}_{j}\ne {x}_{j}}{p}_{j}\left({\overline{x}}_{j}\right)=1-q$.

- ${p}_{O}\left({x}_{O}\right)=q$,
- ${\sum}_{{\overline{x}}_{O}\in \neg \left({x}_{O}\right)}{p}_{O}\left({\overline{x}}_{O}\right)=1-q$.
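The construction above can be checked numerically. We assume binary state spaces, so that all probability mass concentrates on the two constant trajectories; this simplified representation is our own:

```python
import math

# Sketch of the construction: node j = 0 takes its value with probability q
# and every other node in O deterministically copies node 0, so only the two
# constant trajectories have positive probability.
q, n = 0.3, 4
p_O = {(0,) * n: q, (1,) * n: 1.0 - q}

def marginal(p, b):
    out = {}
    for x, pr in p.items():
        key = tuple(x[i] for i in b)
        out[key] = out.get(key, 0.0) + pr
    return out

x_O = (0,) * n                      # the pattern with p_O(x_O) = q
pi = [(i,) for i in range(n)]       # the singleton partition, |pi| = n
mi = math.log(p_O[x_O]) - sum(
    math.log(marginal(p_O, b)[(0,)]) for b in pi)

bound = -(len(pi) - 1) * math.log(q)   # the upper bound of Theorem 4
```

Every block marginal equals $q$, so the SLI evaluates to $log(q/q^{n})=-(n-1)log\,q$, which is exactly the bound the next theorem shows to be tight.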

**Theorem 4** (Upper bound of SLI).

1. The tight upper bound of the SLI with respect to any partition π with $\left|\pi \right|=n$ fixed is$$\underset{\{{\left\{{X}_{i}\right\}}_{i\in V}:\exists {x}_{O},{p}_{O}\left({x}_{O}\right)=q\}}{max}\underset{\{\pi :|\pi |=n\}}{max}{mi}_{\pi}\left({x}_{O}\right)\le -(n-1)logq.$$
2. The upper bound is achieved if and only if for all $b\in \pi $ we have$${p}_{b}\left({x}_{b}\right)={p}_{O}\left({x}_{O}\right)=q.$$
3. The upper bound is achieved if and only if for all $b\in \pi $ we have that ${x}_{b}$ occurs if and only if ${x}_{O}$ occurs.

**Proof.**

**ad 1**- By Definition 5 we have$${mi}_{\pi}\left({x}_{O}\right)=log\frac{{p}_{O}\left({x}_{O}\right)}{{\prod}_{b\in \pi}{p}_{b}\left({x}_{b}\right)}.$$Now note that for any ${x}_{O}$ and $b\subseteq O$$$\begin{array}{cc}\hfill {p}_{b}\left({x}_{b}\right)& =\sum _{{\overline{x}}_{O\backslash b}}{p}_{O}({x}_{b},{\overline{x}}_{O\backslash b})\hfill \end{array}$$$$\begin{array}{cc}& ={p}_{O}\left({x}_{O}\right)+\sum _{{\overline{x}}_{O\backslash b}\ne {x}_{O\backslash b}}{p}_{O}({x}_{b},{\overline{x}}_{O\backslash b})\hfill \end{array}$$$$\begin{array}{cc}& \ge {p}_{O}\left({x}_{O}\right).\hfill \end{array}$$Plugging this into Equation (41) for every ${p}_{b}\left({x}_{b}\right)$ we get$$\begin{array}{cc}\hfill {mi}_{\pi}\left({x}_{O}\right)& =log\frac{{p}_{O}\left({x}_{O}\right)}{{\prod}_{b\in \pi}{p}_{b}\left({x}_{b}\right)}\hfill \end{array}$$$$\begin{array}{cc}& \le log\frac{{p}_{O}\left({x}_{O}\right)}{{p}_{O}{\left({x}_{O}\right)}^{\left|\pi \right|}}\hfill \end{array}$$$$\begin{array}{cc}& =-(\left|\pi \right|-1)log{p}_{O}\left({x}_{O}\right).\hfill \end{array}$$This shows that $-(\left|\pi \right|-1)log{p}_{O}\left({x}_{O}\right)$ is indeed an upper bound. To show that it is tight we have to show that for a given ${p}_{O}\left({x}_{O}\right)$ and $\left|\pi \right|$ there are Bayesian networks with patterns ${x}_{O}$ such that this upper bound is achieved. The construction of such a Bayesian network and a pattern ${x}_{O}$ was presented in Theorem 3.
**ad 2**- If for all $b\in \pi $ we have ${p}_{b}\left({x}_{b}\right)={p}_{O}\left({x}_{O}\right)$ then clearly ${mi}_{\pi}\left({x}_{O}\right)=-(\left|\pi \right|-1)log{p}_{O}\left({x}_{O}\right)$ and the least upper bound is achieved. If on the other hand ${mi}_{\pi}\left({x}_{O}\right)=-(\left|\pi \right|-1)log{p}_{O}\left({x}_{O}\right)$ then$$\begin{array}{cc}\hfill log\frac{{p}_{O}\left({x}_{O}\right)}{{\prod}_{b\in \pi}{p}_{b}\left({x}_{b}\right)}& =-(\left|\pi \right|-1)log{p}_{O}\left({x}_{O}\right)\hfill \end{array}$$$$\begin{array}{ccc}\hfill \iff \quad & \hfill log\frac{{p}_{O}\left({x}_{O}\right)}{{\prod}_{b\in \pi}{p}_{b}\left({x}_{b}\right)}& =log\frac{{p}_{O}\left({x}_{O}\right)}{{p}_{O}{\left({x}_{O}\right)}^{\left|\pi \right|}}\hfill \end{array}$$$$\begin{array}{ccc}\hfill \iff \quad & \hfill \prod _{b\in \pi}{p}_{b}\left({x}_{b}\right)& ={p}_{O}{\left({x}_{O}\right)}^{\left|\pi \right|},\hfill \end{array}$$which, since every factor satisfies ${p}_{b}\left({x}_{b}\right)\ge {p}_{O}\left({x}_{O}\right)$ by Equation (43), holds if and only if ${p}_{b}\left({x}_{b}\right)={p}_{O}\left({x}_{O}\right)$ for all $b\in \pi $.
**ad 3**- By definition for any $b\in \pi $ we have $b\subseteq O$ such that ${x}_{b}$ always occurs if ${x}_{O}$ occurs. Now assume ${x}_{b}$ occurs and ${x}_{O}$ does not occur. In that case there is a positive probability for a pattern $({x}_{b},{\overline{x}}_{O\backslash b})$ with ${\overline{x}}_{O\backslash b}\ne {x}_{O\backslash b}$ i.e., ${p}_{O}({x}_{b},{\overline{x}}_{O\backslash b})>0$. Recalling Equation (43) we then see that$$\begin{array}{cc}\hfill {p}_{b}\left({x}_{b}\right)& ={p}_{O}\left({x}_{O}\right)+\sum _{{\overline{x}}_{O\backslash b}\ne {x}_{O\backslash b}}{p}_{O}({x}_{b},{\overline{x}}_{O\backslash b})\hfill \end{array}$$$$\begin{array}{cc}& >{p}_{O}\left({x}_{O}\right).\hfill \end{array}$$This contradicts point 2, so the upper bound is not achieved. Hence the bound is achieved if and only if, for every $b\in \pi $, the pattern ${x}_{b}$ occurs if and only if ${x}_{O}$ occurs.

- Note that this is the least upper bound for Bayesian networks in general. For a specific Bayesian network there might be no pattern that achieves this bound.
- The least upper bound of SLI increases with the improbability of the pattern and the number of parts that it is split into. If ${p}_{O}\left({x}_{O}\right)\to 0$ then we can have ${mi}_{\pi}\left({x}_{O}\right)\to \infty $.
- Using this least upper bound it is easy to see the least upper bound for the SLI of a pattern ${x}_{O}$ across all partitions $\pi $. We just have to note that $\left|\pi \right|\le \left|O\right|$.
- Since CLI is the minimum value of SLI over arbitrary partitions, the least upper bound of SLI is also an upper bound for CLI. It may not be the least upper bound, however.

#### 3.3.3. Negative SLI

**Theorem 5.**

**Proof.**

- for all $i\in O$ let $|{\mathcal{X}}_{i}|=n$
- for every block $b\in \pi $ let $\left|b\right|=\frac{\left|O\right|}{\left|\pi \right|}$,
- for ${\overline{x}}_{O}\in {\mathcal{X}}_{O}$ let:$${p}_{O}\left({\overline{x}}_{O}\right):=\begin{cases}q & \text{if }{\overline{x}}_{O}={x}_{O},\\ \frac{1-q-d}{{\sum}_{b\in \pi}|\neg \left({x}_{b}\right)|} & \text{if }\exists c\in \pi \text{ s.t. }{\overline{x}}_{O\backslash c}={x}_{O\backslash c}\wedge {\overline{x}}_{c}\ne {x}_{c},\\ \frac{d}{|\neg \left({x}_{O}\right)|} & \text{if }{\overline{x}}_{O}\in \neg \left({x}_{O}\right),\\ 0 & \text{else}.\end{cases}$$

- The achieved value in Equation (53) is also our best candidate for a greatest lower bound of SLI for given ${p}_{O}\left({x}_{O}\right)$ and $\left|\pi \right|$. However, we have not been able to prove this yet.
- The construction equidistributes the probability $1-q$ (left to be distributed after the probability q of the whole pattern occurring is chosen) to the patterns ${\overline{x}}_{O}$ that are almost the same as the pattern ${x}_{O}$. These are almost the same in a precise sense: They differ in exactly one of the blocks of $\pi $, i.e., they differ by as little as can possibly be resolved/revealed by the partition $\pi $.
- For a pattern and partition such that $\left|O\right|/\left|\pi \right|$ is not a natural number, the same bound might still be achieved; however, a little extra effort has to go into step 3 of the construction in the proof such that Equation (59) still holds. This is not necessary for our purpose here, as we only want to show the existence of patterns achieving the negative value.
- Since CLI is the minimum value of SLI over arbitrary partitions, the candidate for the greatest lower bound of SLI is also a candidate for the greatest lower bound of CLI.

#### 3.4. Disintegration

**Definition 10** (Disintegration hierarchy).

1. $${\mathfrak{D}}_{1}\left({x}_{V}\right):=\underset{\pi \in \mathfrak{L}\left(V\right)}{arg\; min}\;{mi}_{\pi}\left({x}_{V}\right)$$
2. and for $i>1$:$${\mathfrak{D}}_{i}\left({x}_{V}\right):=\underset{\pi \in \mathfrak{L}\left(V\right)\backslash {\mathfrak{D}}_{\prec i}\left({x}_{V}\right)}{arg\; min}\;{mi}_{\pi}\left({x}_{V}\right).$$

- Note that arg min returns all partitions that achieve the minimum SLI if there is more than one.
- Since the Bayesian networks we use are finite, the partition lattice $\mathfrak{L}\left(V\right)$ is finite, the set of attained SLI values is finite, and the number $\left|\mathfrak{D}\right|$ of disintegration levels is finite.
- In most cases the Bayesian network contains some symmetries among its mechanisms which cause multiple partitions to attain the same SLI value.
- For each trajectory ${x}_{V}$ the disintegration hierarchy $\mathfrak{D}$ then partitions the elements of $\mathfrak{L}\left(V\right)$ into subsets ${\mathfrak{D}}_{i}\left({x}_{V}\right)$ of equal SLI. The levels of the hierarchy have increasing SLI.
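A brute-force sketch of the disintegration hierarchy for a toy system of three binary variables; the joint distribution is an arbitrary assumption and the helper names are our own:

```python
import math

# Arbitrary correlated joint distribution over three binary variables.
p_V = {(0, 0, 0): 0.3, (1, 1, 1): 0.3}
for s in [(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0)]:
    p_V[s] = 0.4 / 6

def partitions(s):
    s = list(s)
    if not s:
        yield []
        return
    head, rest = s[0], s[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [part[i] | {head}] + part[i + 1:]
        yield part + [frozenset({head})]

def marginal(p, b):
    out = {}
    for x, pr in p.items():
        key = tuple(x[i] for i in sorted(b))
        out[key] = out.get(key, 0.0) + pr
    return out

def sli(p, pi, x):
    return math.log(p[x]) - sum(
        math.log(marginal(p, b)[tuple(x[i] for i in sorted(b))]) for b in pi)

# Group all partitions of V by the SLI value they attain for one trajectory;
# sorting the values yields the levels D_1, D_2, ... with increasing SLI.
x_V = (0, 0, 0)
by_value = {}
for pi in partitions({0, 1, 2}):
    by_value.setdefault(round(sli(p_V, pi, x_V), 9), []).append(pi)
hierarchy = [by_value[v] for v in sorted(by_value)]
```

Rounding the SLI values groups partitions that agree up to floating-point noise, which is how the symmetry-induced ties mentioned above show up numerically.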

**Definition 11.**

**Definition 12** (Refinement-free disintegration hierarchy).

1. $${\mathfrak{D}}_{1}^{\blacktriangleleft}\left({x}_{V}\right):=\{\pi \in {\mathfrak{D}}_{1}\left({x}_{V}\right):{\mathfrak{D}}_{1}{\left({x}_{V}\right)}_{\lhd \pi}=\varnothing \},$$
2. and for $i>1$:$${\mathfrak{D}}_{i}^{\blacktriangleleft}\left({x}_{V}\right):=\{\pi \in {\mathfrak{D}}_{i}\left({x}_{V}\right):{\mathfrak{D}}_{\prec i}{\left({x}_{V}\right)}_{\lhd \pi}=\varnothing \}$$

- Each level ${\mathfrak{D}}_{i}^{\blacktriangleleft}\left({x}_{V}\right)$ in the refinement-free disintegration hierarchy ${\mathfrak{D}}^{\blacktriangleleft}\left({x}_{V}\right)$ consists only of those partitions that have no refinements at their own level or at any of the preceding levels. So each partition that occurs in the refinement-free disintegration hierarchy at the i-th level is a finest partition achieving such a low level of SLI, i.e., such a high level of disintegration.
- As we will see below, the special role played by the blocks of its partitions is the main reason for defining the refinement-free disintegration hierarchy.

**Theorem 6** (Disintegration theorem).

1. Then for every ${\mathfrak{D}}_{i}^{\blacktriangleleft}\left({x}_{V}\right)\in {\mathfrak{D}}^{\blacktriangleleft}\left({x}_{V}\right)$ we find for every $b\in \pi $ with $\pi \in {\mathfrak{D}}_{i}^{\blacktriangleleft}\left({x}_{V}\right)$ that there are only the following possibilities:
    - (a) b is a singleton, i.e., $b=\left\{i\right\}$ for some $i\in V$, or
    - (b) ${x}_{b}$ is completely locally integrated, i.e., $\iota \left({x}_{b}\right)>0$.
2. Conversely, for any completely locally integrated pattern ${x}_{A}$, there is a partition ${\pi}^{A}\in \mathfrak{L}\left(V\right)$ and a level ${\mathfrak{D}}_{{i}^{A}}^{\blacktriangleleft}\left({x}_{V}\right)\in {\mathfrak{D}}^{\blacktriangleleft}\left({x}_{V}\right)$ such that $A\in {\pi}^{A}$ and ${\pi}^{A}\in {\mathfrak{D}}_{{i}^{A}}^{\blacktriangleleft}\left({x}_{V}\right)$.

**Proof.**

**ad 1**- We prove the theorem by contradiction. For this assume that there is a block b in a partition $\pi \in {\mathfrak{D}}_{i}^{\blacktriangleleft}\left({x}_{V}\right)$ which is neither a singleton nor completely integrated. Let $\pi \in {\mathfrak{D}}_{i}^{\blacktriangleleft}\left({x}_{V}\right)$ and $b\in \pi $. Assume b is not a singleton, i.e., there exist $i\ne j\in V$ such that $i\in b$ and $j\in b$. Also assume that b is not completely integrated, i.e., there exists a partition $\xi $ of b with $\xi \ne {\mathbf{1}}_{b}$ such that ${mi}_{\xi}\left({x}_{b}\right)\le 0$. Note that a singleton cannot be completely locally integrated as it does not allow for a non-unit partition. So together the two assumptions imply ${p}_{b}\left({x}_{b}\right)\le {\prod}_{d\in \xi}{p}_{d}\left({x}_{d}\right)$ with $\left|\xi \right|>1$. However, then$$\begin{array}{cc}\hfill {mi}_{\pi}\left({x}_{V}\right)& =log\frac{{p}_{V}\left({x}_{V}\right)}{{p}_{b}\left({x}_{b}\right){\prod}_{c\in \pi \backslash b}{p}_{c}\left({x}_{c}\right)}\hfill \end{array}$$$$\begin{array}{cc}& \ge log\frac{{p}_{V}\left({x}_{V}\right)}{{\prod}_{d\in \xi}{p}_{d}\left({x}_{d}\right){\prod}_{c\in \pi \backslash b}{p}_{c}\left({x}_{c}\right)}.\hfill \end{array}$$Define $\rho :=\xi \cup (\pi \backslash \left\{b\right\})$, a proper refinement of $\pi $, so that$${mi}_{\rho}\left({x}_{V}\right)=log\frac{{p}_{V}\left({x}_{V}\right)}{{\prod}_{d\in \xi}{p}_{d}\left({x}_{d}\right){\prod}_{c\in \pi \backslash b}{p}_{c}\left({x}_{c}\right)}.$$Two cases arise. First, assume equality holds above. Then:
- ${mi}_{\rho}\left({x}_{V}\right)={mi}_{\pi}\left({x}_{V}\right)$ which implies that $\rho \in {\mathfrak{D}}_{i}\left({x}_{V}\right)$ because $\pi \in {\mathfrak{D}}_{i}\left({x}_{V}\right)$, and
- $\rho \lhd \pi $ which contradicts $\pi \in {\mathfrak{D}}_{i}^{\blacktriangleleft}\left({x}_{V}\right)$.

Second, let$${mi}_{\pi}\left({x}_{V}\right)>log\frac{{p}_{V}\left({x}_{V}\right)}{{\prod}_{d\in \xi}{p}_{d}\left({x}_{d}\right){\prod}_{c\in \pi \backslash b}{p}_{c}\left({x}_{c}\right)}.$$Then$${mi}_{\rho}\left({x}_{V}\right)<{mi}_{\pi}\left({x}_{V}\right),$$so that $\rho $ belongs to some earlier disintegration level ${\mathfrak{D}}_{j}\left({x}_{V}\right)$ with $j<i$, and $\rho \lhd \pi $ again contradicts $\pi \in {\mathfrak{D}}_{i}^{\blacktriangleleft}\left({x}_{V}\right)$. **ad 2**- By assumption ${x}_{A}$ is completely locally integrated. Then let ${\pi}^{A}:=\left\{A\right\}\cup {\left\{\left\{j\right\}\right\}}_{j\in V\backslash A}$. Since ${\pi}^{A}$ is a partition of V it is an element of some disintegration level ${\mathfrak{D}}_{{i}^{A}}$. Then partition ${\pi}^{A}$ is also an element of the refinement-free disintegration level ${\mathfrak{D}}_{{i}^{A}}^{\blacktriangleleft}\left({x}_{V}\right)$ as we will see in the following. This is because any refinement must (by construction of ${\pi}^{A}$) break up A into further blocks, which means that the specific local integration of all such partitions is higher. Then they must be at a later disintegration level ${\mathfrak{D}}_{k}\left({x}_{V}\right)$ with $k>{i}^{A}$. Therefore, ${\pi}^{A}$ has no refinement at its own or a preceding disintegration level. More formally, let $\xi \in \mathfrak{L}\left(V\right)$ with $\xi \ne {\pi}^{A}$ and $\xi \lhd {\pi}^{A}$. Since ${\pi}^{A}$ only contains singletons apart from A, the partition $\xi $ must split the block A into multiple blocks $c\in {\xi |}_{A}$. Since $\iota \left({x}_{A}\right)>0$ we know that$${mi}_{{\xi |}_{A}}\left({x}_{A}\right)=log\frac{{p}_{A}\left({x}_{A}\right)}{{\prod}_{c\in {\xi |}_{A}}{p}_{c}\left({x}_{c}\right)}>0$$and therefore$$\begin{array}{cc}\hfill {mi}_{\xi}\left({x}_{V}\right)& =log\frac{{p}_{V}\left({x}_{V}\right)}{{\prod}_{c\in {\xi |}_{A}}{p}_{c}\left({x}_{c}\right){\prod}_{i\in V\backslash A}{p}_{i}\left({x}_{i}\right)}\hfill \end{array}$$$$\begin{array}{cc}& >log\frac{{p}_{V}\left({x}_{V}\right)}{{p}_{A}\left({x}_{A}\right){\prod}_{i\in V\backslash A}{p}_{i}\left({x}_{i}\right)}\hfill \end{array}$$$$\begin{array}{cc}& ={mi}_{{\pi}^{A}}\left({x}_{V}\right).\hfill \end{array}$$

**Corollary 2.**

1. b is a singleton, i.e., $b=\left\{i\right\}$ for some $i\in V$, or
2. ${X}_{b}$ is completely (not only locally) integrated, i.e., $I\left({X}_{b}\right)>0$.

**Proof.**

#### 3.5. Disintegration Interpretation

#### 3.6. Related Approaches

## 4. Examples

#### 4.1. Set of Independent Random Variables

#### 4.2. Two Constant and Independent Binary Random Variables: $M{C}^{=}$

#### 4.2.1. Definition

#### 4.2.2. Trajectories

#### 4.2.3. Partitions of Trajectories

#### 4.2.4. SLI Values of the Partitions

#### 4.2.5. Disintegration Hierarchy

#### 4.2.6. Completely Integrated Patterns

#### 4.3. Two Random Variables with Small Interactions

#### 4.3.1. Definition

#### 4.3.2. Trajectories

#### 4.3.3. SLI Values of the Partitions

#### 4.3.4. Completely Integrated Patterns

## 5. Discussion

- correspond to fixed single random variables for a set of independent random variables,
- can vary from one trajectory to another,
- and can change the degrees of freedom that they occupy over time,
- can be ambiguous at a fixed level of disintegration due to symmetries of the system,
- can overlap at the same level of disintegration due to this ambiguity,
- can overlap across multiple levels of disintegration i.e., parts of $\iota $-entities can be $\iota $-entities again.

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## Abbreviations

| Abbreviation | Meaning |
| --- | --- |
| SLI | Specific local integration |
| CLI | Complete local integration |

## Appendix A. Kronecker Delta

**Definition A1** (Delta).

- Let $X,Y$ be two random variables with state spaces $\mathcal{X},\mathcal{Y}$ and $f:\mathcal{X}\to \mathcal{Y}$ a function such that$$\begin{array}{c}\hfill p\left(y\right|x)={\delta}_{f\left(x\right)}\left(y\right),\end{array}$$$$\begin{array}{cc}\hfill p\left(y\right)& =\sum _{x}{p}_{Y}\left(y\right|x){p}_{X}\left(x\right)\hfill \end{array}$$$$\begin{array}{cc}& =\sum _{x}{\delta}_{f\left(x\right)}\left(y\right){p}_{X}\left(x\right)\hfill \end{array}$$$$\begin{array}{cc}& =\sum _{x}{\delta}_{x}\left({f}^{-1}\left(y\right)\right){p}_{X}\left(x\right)\hfill \end{array}$$$$\begin{array}{cc}& =\sum _{x\in {f}^{-1}\left(y\right)}{p}_{X}\left(x\right)\hfill \end{array}$$$$\begin{array}{cc}& ={p}_{X}\left({f}^{-1}\left(y\right)\right).\hfill \end{array}$$
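The derivation above amounts to summing the input pmf over each fibre ${f}^{-1}\left(y\right)$. A minimal sketch, with an arbitrary pmf and function of our own choosing:

```python
# Pushing a pmf through a deterministic function, mirroring the derivation
# above: p(y) = p_X(f^{-1}(y)).
p_X = {0: 0.2, 1: 0.3, 2: 0.5}
f = {0: 'a', 1: 'a', 2: 'b'}          # a deterministic mechanism X -> Y

p_Y = {}
for x, pr in p_X.items():
    # delta_{f(x)}(y) selects exactly the x lying in the fibre f^{-1}(y),
    # so accumulating by f(x) performs the sum over f^{-1}(y).
    p_Y[f[x]] = p_Y.get(f[x], 0.0) + pr
```

Here $f^{-1}(a)=\{0,1\}$, so $p(a)=0.2+0.3$, and $f^{-1}(b)=\{2\}$, so $p(b)=0.5$.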

## Appendix B. Refinement and Partition Lattice Examples

**Definition A2.**

1. for all ${x}_{1},{x}_{2}\in \pi $, if ${x}_{1}\ne {x}_{2}$, then ${x}_{1}\cap {x}_{2}=\varnothing $,
2. ${\bigcup}_{x\in \pi}x=\mathcal{X}$.

- In words, a partition of a set is a set of disjoint non-empty subsets whose union is the whole set.

**Definition A3.**

**Definition A4** (Refinement and coarsening).

- More intuitively, $\pi $ is a refinement of $\rho $ if all blocks of $\pi $ can be obtained by further partitioning the blocks of $\rho $. Conversely, $\rho $ is a coarsening of $\pi $ if all blocks in $\rho $ are unions of blocks in $\pi $.
- Examples are contained in the Hasse diagrams (defined below) shown in Figure A1.

**Definition A5** (Hasse diagram).

- No edge is drawn between two elements $a,b\in A$ if $a\lhd b$ but not $a\lhd: b$.
- Only drawing edges for the covering relation does not imply a loss of information about the poset since the covering relation determines the partial order completely.
- For examples of Hasse diagrams see Figure A1.

## Appendix C. Bayesian Networks

**Definition A6.**

- In general there are multiple directed acyclic graphs that are factorisation compatible with the same probability distribution. For example, if we choose any total order for the nodes in V and define a graph by $pa\left(i\right)=\{j\in V:j<i\}$ then Equation (A12) becomes Equation (A11) which always holds.

**Definition A7** (Bayesian network).

- On top of constituting the vertices of the graph G the set V is also assumed to be totally ordered in an (arbitrarily) fixed way. Whenever we use a subset $A\subset V$ to index a sequence of variables in the Bayesian network (e.g., in ${p}_{A}\left({x}_{A}\right)$) we order A according to this total order as well.
- Since ${\left\{{X}_{i}\right\}}_{i\in V}$ is finite and G is acyclic there is a set ${V}_{0}$ of nodes without parents.

**Definition A8** (Mechanism).

- We could define the set of all mechanisms to formally also include the mechanisms of the nodes without parents ${V}_{0}$. However, in practice it makes sense to separate the nodes without parents as those that we choose an initial probability distribution over (similar to a boundary condition) which is then turned into a probability distribution ${p}_{V}$ over the entire Bayesian network ${\left\{{X}_{i}\right\}}_{i\in V}$ via Equation (A12). Note that in Equation (A12) the nodes in ${V}_{0}$ are not explicit as they are just factors ${p}_{i}\left({x}_{i}\right|{x}_{pa\left(i\right)})$ with $pa\left(i\right)=\varnothing $.
- To construct a Bayesian network, take graph $G=(V,E)$ and equip each node $i\in (V\backslash {V}_{0})$ with a mechanism ${p}_{i}:{\mathcal{X}}_{pa\left(i\right)}\times {\mathcal{X}}_{i}\to [0,1]$ and for each node $i\in {V}_{0}$ choose a probability distribution ${p}_{i}:{\mathcal{X}}_{i}\to [0,1]$. The joint probability distribution is then calculated by the according version of Equation (A12):$$\begin{array}{c}\hfill {p}_{V}\left({x}_{V}\right)=\prod _{i\in V\backslash {V}_{0}}{p}_{i}\left({x}_{i}\right|{x}_{pa\left(i\right)})\prod _{j\in {V}_{0}}{p}_{j}\left({x}_{j}\right).\end{array}$$
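The construction described above can be sketched for a small chain network; the graph, mechanisms, and initial distribution below are arbitrary illustrative assumptions:

```python
from itertools import product

# A three-node chain 0 -> 1 -> 2 with binary states.  Node 0 is the only
# parentless node (V_0 = {0}); its entry in `mech` plays the role of the
# initial distribution, the others are conditional mechanisms p_i(x_i | x_pa(i)).
pa = {0: (), 1: (0,), 2: (1,)}
mech = {
    0: {(): {0: 0.6, 1: 0.4}},
    1: {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.2, 1: 0.8}},
    2: {(0,): {0: 0.7, 1: 0.3}, (1,): {0: 0.5, 1: 0.5}},
}

def joint(x):
    """p_V(x_V) as the product of mechanisms and initial distribution,
    following the factorisation displayed above."""
    pr = 1.0
    for i, parents in pa.items():
        pr *= mech[i][tuple(x[j] for j in parents)][x[i]]
    return pr

p_V = {x: joint(x) for x in product((0, 1), repeat=3)}
```

Because every mechanism is a normalised conditional distribution, the resulting $p_{V}$ automatically sums to one over ${\mathcal{X}}_{V}$.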

#### Appendix C.1. Deterministic Bayesian Networks

**Definition A9** (Deterministic mechanism).

**Definition A10** (Deterministic Bayesian network).

**Theorem A1.**

**Proof.**

**Theorem A2** (Pattern probability in a deterministic Bayesian network).

**Proof.**

- Due to the finiteness of the network, deterministic mechanisms, and chosen uniform initial distribution the minimum possible non-zero probability for a pattern ${x}_{A}$ is $1/|{\mathcal{X}}_{{V}_{0}}|$. This happens for any pattern that only occurs in a single trajectory. Furthermore, the probability of any pattern is a multiple of $1/|{\mathcal{X}}_{{V}_{0}}|$.

#### Appendix C.2. Proof of Theorem 2

**Proof.**

## Appendix D. Proof of Theorem 1

**Proof.**

## References

- Gallois, A. Identity over Time. In The Stanford Encyclopedia of Philosophy; Zalta, E.N., Ed.; Metaphysics Research Laboratory, Stanford University: Stanford, CA, USA, 2012.
- Grand, S. Creation: Life and How to Make It; Harvard University Press: Cambridge, MA, USA, 2003.
- Pascal, R.; Pross, A. Stability and its manifestation in the chemical and biological worlds. Chem. Commun. **2015**, 51, 16160–16165.
- Orseau, L.; Ring, M. Space-Time Embedded Intelligence. In Artificial General Intelligence; Number 7716 in Lecture Notes in Computer Science; Bach, J., Goertzel, B., Iklé, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 209–218.
- Barandiaran, X.E.; Paolo, E.D.; Rohde, M. Defining Agency: Individuality, Normativity, Asymmetry, and Spatio-temporality in Action. Adapt. Behav. **2009**, 17, 367–386.
- Legg, S.; Hutter, M. Universal Intelligence: A Definition of Machine Intelligence. arXiv **2007**, arXiv:0712.3329.
- Boccara, N.; Nasser, J.; Roger, M. Particlelike structures and their interactions in spatiotemporal patterns generated by one-dimensional deterministic cellular-automaton rules. Phys. Rev. A **1991**, 44, 866–875.
- Biehl, M.; Ikegami, T.; Polani, D. Towards information based spatiotemporal patterns as a foundation for agent representation in dynamical systems. In Proceedings of the Artificial Life Conference, Cancun, Mexico, 2016; The MIT Press: Cambridge, MA, USA, 2016; pp. 722–729.
- McGill, W.J. Multivariate information transmission. Psychometrika **1954**, 19, 97–116.
- Amari, S.I. Information geometry on hierarchy of probability distributions. IEEE Trans. Inf. Theory **2001**, 47, 1701–1711.
- Lizier, J.T. The Local Information Dynamics of Distributed Computation in Complex Systems; Springer: Berlin/Heidelberg, Germany, 2012.
- Tononi, G.; Sporns, O. Measuring information integration. BMC Neurosci. **2003**, 4, 31.
- Balduzzi, D.; Tononi, G. Integrated Information in Discrete Dynamical Systems: Motivation and Theoretical Framework. PLoS Comput. Biol. **2008**, 4, e1000091.
- Beer, R.D. Characterizing autopoiesis in the game of life. Artif. Life **2014**, 21, 1–19.
- Fontana, W.; Buss, L.W. "The arrival of the fittest": Toward a theory of biological organization. Bull. Math. Biol. **1994**, 56, 1–64.
- Krakauer, D.; Bertschinger, N.; Olbrich, E.; Ay, N.; Flack, J.C. The Information Theory of Individuality. arXiv **2014**, arXiv:1412.2447.
- Bertschinger, N.; Olbrich, E.; Ay, N.; Jost, J. Autonomy: An information theoretic perspective. Biosystems **2008**, 91, 331–345.
- Shalizi, C.R.; Haslinger, R.; Rouquier, J.B.; Klinkner, K.L.; Moore, C. Automatic filters for the detection of coherent structure in spatiotemporal systems. Phys. Rev. E **2006**, 73, 036104.
- Wolfram, S. Computation theory of cellular automata. Commun. Math. Phys. **1984**, 96, 15–57.
- Grassberger, P. Chaos and diffusion in deterministic cellular automata. Phys. D Nonlinear Phenom. **1984**, 10, 52–58.
- Hanson, J.E.; Crutchfield, J.P. The attractor–basin portrait of a cellular automaton. J. Stat. Phys. **1992**, 66, 1415–1462.
- Pivato, M. Defect particle kinematics in one-dimensional cellular automata. Theor. Comput. Sci. **2007**, 377, 205–228.
- Lizier, J.T.; Prokopenko, M.; Zomaya, A.Y. Local information transfer as a spatiotemporal filter for complex systems. Phys. Rev. E **2008**, 77, 026110.
- Flecker, B.; Alford, W.; Beggs, J.M.; Williams, P.L.; Beer, R.D. Partial information decomposition as a spatiotemporal filter. Chaos Interdiscip. J. Nonlinear Sci. **2011**, 21, 037104.
- Friston, K. Life as we know it. J. R. Soc. Interface **2013**, 10, 20130475.
- Balduzzi, D. Detecting emergent processes in cellular automata with excess information. arXiv **2011**, arXiv:1105.0158.
- Hoel, E.P.; Albantakis, L.; Marshall, W.; Tononi, G. Can the macro beat the micro? Integrated information across spatiotemporal scales. Neurosci. Conscious. **2016**, 2016, niw012.
- Grätzer, G. Lattice Theory: Foundation; Springer: New York, NY, USA, 2011.
- Ceccherini-Silberstein, T.; Coornaert, M. Cellular Automata and Groups. In Encyclopedia of Complexity and Systems Science; Meyers, R.A., Ed.; Springer: New York, NY, USA, 2009; pp. 778–791.
- Busic, A.; Mairesse, J.; Marcovici, I. Probabilistic cellular automata, invariant measures, and perfect sampling. arXiv **2010**, arXiv:1010.3133.
- Beer, R.D. The cognitive domain of a glider in the game of life. Artif. Life **2014**, 20, 183–206.
- Beer, R.D. Autopoiesis and Enaction in the Game of Life; The MIT Press: Cambridge, MA, USA, 2016; p. 13.
- Noonan, H.; Curtis, B. Identity. In The Stanford Encyclopedia of Philosophy; Zalta, E.N., Ed.; Metaphysics Research Laboratory, Stanford University: Stanford, CA, USA, 2014.
- Hawley, K. Temporal Parts. In The Stanford Encyclopedia of Philosophy; Zalta, E.N., Ed.; Metaphysics Research Laboratory, Stanford University: Stanford, CA, USA, 2015.
- Ay, N. Information Geometry on Complexity and Stochastic Interaction. Entropy **2015**, 17, 2432–2458.
- MacKay, D.J. Information Theory, Inference and Learning Algorithms; Cambridge University Press: Cambridge, UK, 2003.
- Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley: Hoboken, NJ, USA, 2006.
- Tononi, G. An information integration theory of consciousness. BMC Neurosci. **2004**, 5, 42.
- Von Eitzen, H. Prove (1 − (1 − q)/n)^{n} ≥ q for 0 < q < 1 and n ≥ 2 a Natural Number. Mathematics Stack Exchange, 2016. Available online: http://math.stackexchange.com/q/1974262 (accessed on 18 October 2016).
- Bullen, P.S. Handbook of Means and Their Inequalities; Springer Science+Business Media: Dordrecht, The Netherlands, 2003.
- Kolchinsky, A.; Rocha, L.M. Prediction and modularity in dynamical systems. In Advances in Artificial Life, ECAL; The MIT Press: Cambridge, MA, USA, 2011; pp. 423–430.
- Pemmaraju, S.; Skiena, S. Computational Discrete Mathematics: Combinatorics and Graph Theory with Mathematica^{®}; Cambridge University Press: Cambridge, UK, 2009.
- De Bruijn, N.G. Asymptotic Methods in Analysis; Dover Publications: New York, NY, USA, 2010.

**Figure 1.**Illustration of concepts from this paper on the time-evolution (trajectory) of a one-dimensional elementary cellular automaton. Time-steps increase from left to right. None of the shown structures are derived from principles; they are manually constructed for illustrative purposes. In (**a**) we show the complete (finite) trajectory. Naively, two gliders can be seen to collide and give rise to a third glider. In (**b**–**d**) we show (spatiotemporal) patterns fixing the variables (allegedly) pertaining to a first, second, and third glider. In (**e**) we show a pattern fixing the variables of what could be a glider that absorbs the first glider from before and maintains its identity. In (**f**) we show a partition into the time-slices of the pattern of the first glider. In (**g**) we show a partition of the trajectory with three parts coinciding with the gliders and one part encompassing the rest. In (**h**) we show again a partition with three parts coinciding with the gliders, but now all other variables are considered as individual parts.

**Figure 2.**First time steps of a Bayesian network representing a multivariate dynamical system (or multivariate Markov chain) ${\left\{{X}_{i}\right\}}_{i\in V}$. Here we used $V=J\times T$ with J indicating spatial degrees of freedom and T the temporal extension. Each node is then indexed by a tuple $(j,t)$ as shown. The shown edges are only an example; edges may point from any node to any other node within the same or the subsequent column.

**Figure 3.**In (**a**) we show a trajectory of the same cellular automaton as in Figure 1 with a randomly chosen initial condition. The set of gliders and their paths occurring in this trajectory is clearly different from those in Figure 1a. In (**b**) we show an example of a random pattern that occurs in the trajectory of (**a**) and is probably not an entity in any sense.

**Figure 5.**Visualisation of the four possible trajectories of $M{C}^{=}$. In each trajectory the time index increases from left to right. There are two rows corresponding to the two random variables at each time step and three columns corresponding to the three time-steps we are considering here.

**Figure 6.**Specific local integrations ${mi}_{\pi}\left({x}_{V}\right)$ of any of the four trajectories ${x}_{V}$ seen in Figure 5 with respect to all $\pi \in \mathfrak{L}\left(V\right)$. The partitions are ordered according to an enumeration with increasing cardinality $\left|\pi \right|$ (see Pemmaraju and Skiena [42], Chapter 4.3.3, for the method). Vertical lines indicate the partitions at which the cardinality $\left|\pi \right|$ increases by one.
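The enumeration of set partitions by increasing cardinality can be sketched with a standard recursive generator; this is a generic construction for illustration, not necessarily the algorithm of Pemmaraju and Skiena used in the paper:

```python
def partitions(s):
    """Recursively generate all partitions of the list s (each a list of blocks)."""
    if len(s) == 1:
        yield [s]
        return
    first, rest = s[0], s[1:]
    for smaller in partitions(rest):
        # insert `first` into each existing block in turn ...
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        # ... or give it a block of its own
        yield [[first]] + smaller

# Sort partitions by increasing cardinality |pi| (number of blocks),
# mirroring the ordering used on the x-axis of the figure.
ordered = sorted(partitions([1, 2, 3, 4]), key=len)
# The partition lattice of a 4-element set has Bell(4) = 15 elements.
```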

**Figure 7.**Same as Figure 6 but with the partitions sorted according to increasing SLI.

**Figure 8.**Hasse diagrams of the five disintegration levels of the trajectories of $M{C}^{=}$. Every vertex corresponds to a partition and edges indicate that the lower partition refines the higher one.

**Figure 9.**Hasse diagram of ${\mathfrak{D}}_{2}$ of $M{C}^{=}$ trajectories. Here we visualise the partitions at each vertex. The blocks of a partition are the cells of equal colour. Note that we can obtain all six disconnected components from one by symmetry operations that are respected by the joint probability distribution ${p}_{V}$. For example, we can shift each row individually to the left or right since every value is constant in each row. We can also swap the top and bottom rows since they have the same probability distributions with 0 and 1 exchanged.

**Figure 10.**For each disintegration level of the trajectories of $M{C}^{=}$ we here show example connected components of Hasse diagrams with the partitions at each vertex visualised. The disintegration level increases clockwise from the top left. The blocks of a partition are the cells of equal colour.

**Figure 11.**Hasse diagrams of the refinement-free disintegration hierarchy ${\mathfrak{D}}^{\u25c2}$ of $M{C}^{=}$ trajectories. Here we visualise the partitions at each vertex. The blocks of a partition are the cells of equal colour. It turns out that partitions that are on the same horizontal level in this diagram correspond exactly to a level in the refinement-free disintegration hierarchy ${\mathfrak{D}}^{\u25c2}$. The i-th horizontal level starting from the top corresponds to ${\mathfrak{D}}_{i}^{\u25c2}$. Take for example the second horizontal level from the top. The partitions on this level are just the minimal elements of the poset ${\mathfrak{D}}_{2}$ which was visualised in Figure 9. To connect this to Figure 8 note that for each disintegration level ${\mathcal{D}}_{i}$ shown there as a Hasse diagram, the partitions on the i-th horizontal level (counting from the top) in the present figure are the minimal elements of that disintegration level.

**Figure 12.**All distinct completely integrated composite patterns (singletons are not shown) on the first possible trajectory of $M{C}^{=}$. The value of complete local integration is indicated above each pattern. We display patterns by colouring the cells corresponding to random variables that are not fixed to any value by the pattern in grey. Cells corresponding to random variables that are fixed by the pattern are coloured according to the value i.e., white for 0 and black for 1.

**Figure 13.**All distinct completely integrated composite patterns on the second possible trajectory of $M{C}^{=}$. The value of complete local integration is indicated above each pattern.

**Figure 14.**All distinct completely integrated composite patterns on all four possible trajectories of $M{C}^{=}$. The value of complete local integration is indicated above each pattern.

**Figure 16.**Visualisation of three trajectories of $M{C}^{\u03f5}$. In each trajectory the time index increases from left to right. There are two rows corresponding to the two random variables at each time step and three columns corresponding to the three time-steps we are considering here. We can see that the first trajectory (in (**a**)) makes no ϵ-transitions, the second (in (**b**)) makes one from t = 2 to t = 3, and the third (in (**c**)) makes two.

**Figure 17.**Specific local integrations ${mi}_{\pi}\left({x}_{V}\right)$ of one of the four trajectories of $M{C}^{=}$ (measured w.r.t. the probability distribution of $M{C}^{=}$), here denoted ${x}_{V}^{M{C}^{=}}$ (this is the same data as in Figure 6), and the three representative trajectories ${x}_{V}^{k},k\in \{1,2,3\}$ of $M{C}^{\u03f5}$ (measured w.r.t. the probability distribution of $M{C}^{\u03f5}$) seen in Figure 16, with respect to all $\pi \in \mathfrak{L}\left(V\right)$. The partitions are ordered as in Figure 6 with increasing cardinality $\left|\pi \right|$. Vertical lines indicate partitions where the cardinality $\left|\pi \right|$ increases by one. Note that the values of ${x}_{V}^{M{C}^{=}}$ are almost completely hidden from view by those of ${x}_{V}^{1}$.
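As a sketch, specific local integration can be computed directly from a joint distribution, assuming the definition ${mi}_{\pi}\left({x}_{V}\right)={log}_{2}\phantom{\rule{0.166667em}{0ex}}{p}_{V}\left({x}_{V}\right)-{\sum}_{b\in \pi}{log}_{2}\phantom{\rule{0.166667em}{0ex}}{p}_{b}\left({x}_{b}\right)$ from the main text; the toy distribution below is invented for illustration:

```python
import itertools
import math

# Toy joint distribution over three binary variables (invented numbers):
# the uniform i.i.d. case, for which every SLI vanishes.
pV = {x: 1 / 8 for x in itertools.product((0, 1), repeat=3)}

def marginal(block, xV):
    """Marginal probability p_b(x_b) of the block's values in xV."""
    return sum(p for y, p in pV.items() if all(y[i] == xV[i] for i in block))

def sli(partition, xV):
    """mi_pi(x_V) = log2 p_V(x_V) - sum over blocks b of log2 p_b(x_b)."""
    return math.log2(pV[xV]) - sum(math.log2(marginal(b, xV)) for b in partition)

value = sli([(0,), (1, 2)], (0, 1, 1))  # 0 for the uniform i.i.d. distribution
```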

**Figure 18.**All distinct completely integrated composite patterns on the first trajectory ${x}_{V}^{1}$ of $M{C}^{\u03f5}$. The value of complete local integration is indicated above each pattern. See Figure 12 for colouring conventions.

**Figure 19.**All distinct completely integrated composite patterns on the second trajectory ${x}_{V}^{2}$ of $M{C}^{\u03f5}$. The value of complete local integration is indicated above each pattern.

**Figure 20.**All distinct completely integrated composite patterns on the third trajectory ${x}_{V}^{3}$ of $M{C}^{\u03f5}$. The value of complete local integration is indicated above each pattern.

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Biehl, M.; Ikegami, T.; Polani, D.
Specific and Complete Local Integration of Patterns in Bayesian Networks. *Entropy* **2017**, *19*, 230.
https://doi.org/10.3390/e19050230

**AMA Style**

Biehl M, Ikegami T, Polani D.
Specific and Complete Local Integration of Patterns in Bayesian Networks. *Entropy*. 2017; 19(5):230.
https://doi.org/10.3390/e19050230

**Chicago/Turabian Style**

Biehl, Martin, Takashi Ikegami, and Daniel Polani.
2017. "Specific and Complete Local Integration of Patterns in Bayesian Networks" *Entropy* 19, no. 5: 230.
https://doi.org/10.3390/e19050230