# Causal Composition: Structural Differences among Dynamically Equivalent Systems


## Abstract


## 1. Introduction

## 2. Theory

#### 2.1. The Compositional Intrinsic Information of an Example System

**Intrinsicality:** From an extrinsic perspective, the entropy H of a system is also a lower bound on the expected number of “yes/no” questions needed to determine the system’s state [43]. This implies that once the state of every single unit is known, so is the state of all the units together and of all their subsets. Conversely, once the state of all the units is known, so is the state of every single unit and all their combinations (Figure 2). Providing this information in addition would seem redundant, as it can easily be inferred. However, information that has to be inferred remains implicit. To make it explicit, a function (mechanism) has to be applied. From the intrinsic perspective of the system, information about its causes and effects is thus only available if it is made explicit by some mechanism within the system. In other words, the system itself takes a compositional perspective (Figure 2).
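As an informal illustration of the extrinsic entropy bound (a sketch, not part of the original analysis): for a uniform distribution over the $2^3 = 8$ states of three binary units, $H = 3$ bits, i.e., one “yes/no” question per unit suffices to determine the full system state.

```python
import math

def entropy_bits(p):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# A 3-unit binary system has 2**3 = 8 possible states. If all states
# are equally likely, H = 3 bits: one yes/no question per unit.
uniform = [1 / 8] * 8
print(entropy_bits(uniform))  # 3.0
```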

**Composition:** While the reductionist and holistic perspectives focus on causal interactions at one particular order (single elements vs. the system as a whole), any set of elements within the system that receives inputs from and provides outputs to elements within the system may, in principle, form a separate mechanism within the system (Figure 2). Any set of elements within the system may thus specify its own intrinsic information about the prior (and next) state of a particular system subset—its cause (or effect) “purview”. The constraints that a set of system elements in a state specifies about the prior state of a system subset are captured by its cause repertoire (Equation (9), Section 5.3).
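The compositional bookkeeping can be sketched in a few lines (an illustration only; the element labels M, C, X follow the example system): every non-empty subset of the system is a candidate mechanism, and each mechanism may constrain any of the non-empty subsets as its purview.

```python
from itertools import chain, combinations

def nonempty_subsets(elements):
    """All non-empty subsets; each is a candidate mechanism (or purview)."""
    return list(chain.from_iterable(
        combinations(elements, k) for k in range(1, len(elements) + 1)))

mechanisms = nonempty_subsets(("M", "C", "X"))
print(len(mechanisms))  # 7 = 2**3 - 1 candidate mechanisms
# Each mechanism can constrain any of the 7 purviews:
print(len(mechanisms) ** 2)  # 49 candidate mechanism-purview pairs
```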

**Integration:** Next, we must assess whether and to what extent a set of elements specifies irreducible information about other system subsets. This is because a set of elements contributes to the intrinsic information of the system as a whole only to the extent that it is irreducible under any partition (see Section 5.4, Equation (13)). This is quantified by its irreducible information ${\phi}_{C/E}$, which measures the minimal difference (here using ${D}_{KL}$) between the cause/effect repertoire before and after a partition, evaluated across all possible partitions (Equation (15)). In principle, each of the ${2}^{3}-1=7$ subsets of the system could specify irreducible information about the prior and next state of different subsets within $MCX$, and thus contribute to the system’s intrinsic information in a compositional manner. In our example system, the information specified by the “third-order” set $MC{X}_{t}=(0,1,1)$, however, is identical to the information specified by its subset $M{C}_{t}=(0,1)$. The information that $MC{X}_{t}=(0,1,1)$ specifies about $MC{X}_{t-1}$ is only due to $M{C}_{t}=(0,1)$. Including ${X}_{t}=1$ does not contribute anything on top; it can be partitioned away without a loss of information. Similarly, $M{X}_{t}=(0,1)$ does not specify irreducible information, since the information that ${C}_{t+1}=0$ is due to ${M}_{t}=0$ alone. The irreducible information specified by the subsets of our example system $MC{X}_{t}$ in state $(0,1,1)$ is listed in Table 1. In the following, we quantify the total amount of intrinsic information specified by a particular system as $\sum {\phi}_{C}+\sum {\phi}_{E}$, which is $8.81$ bits for $MC{X}_{t}=(0,1,1)$.
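The total quoted in the last sentence can be checked directly against Table 1 (a bookkeeping sketch only; the values below are copied from the table):

```python
# phi values per subset (M, C, X, MC, MX, CX, MCX), taken from Table 1.
phi_C = {"M": 1.0, "C": 1.0, "X": 1.0,
         "MC": 1.0, "MX": 0.415, "CX": 1.0, "MCX": 0.0}
phi_E = {"M": 1.189, "C": 0.189, "X": 0.189,
         "MC": 1.0, "MX": 0.0, "CX": 0.415, "MCX": 0.415}

# Total intrinsic information: sum of phi_C plus sum of phi_E.
total = sum(phi_C.values()) + sum(phi_E.values())
print(round(total, 2))  # 8.81 bits
```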

#### 2.2. Causal Composition and System-Level Integration

## 3. Results

#### 3.1. Same Global Dynamics, Different Composition and Integration

#### 3.2. Global vs. Physical Reversibility

## 4. Discussion

#### 4.1. Composition vs. Decomposition of Information

#### 4.2. Agency and Autonomy

#### 4.3. The Role of Composition in IIT as a Theory of Phenomenal Consciousness

## 5. Methods

#### 5.1. Dynamical Causal Networks and State Transition Probabilities

#### 5.2. Predictive and Effective Information

#### 5.3. Cause and Effect Repertoires

#### 5.4. Subset Integration

#### 5.5. System Integration

- We use the KLD to quantify differences between probability distributions in order to facilitate the comparison to standard information-theoretical approaches.
- For simplicity and in line with information-theoretical considerations, $\sum {\phi}_{C}$ and $\sum {\phi}_{E}$ are considered independently instead of only counting $\phi =\min({\phi}_{C},{\phi}_{E})$ for each subset.
- ${\Phi}_{\subseteq}$ simply evaluates the minimal difference in $\sum {\phi}_{C}$ or $\sum {\phi}_{E}$ under all possible system partitions instead of a more complex difference measure between the intact and partitioned system, such as the extended earth-mover’s distance used in [27].
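For reference, the KLD used above can be sketched as follows (a minimal illustration of the distance measure only, not the full repertoire-level computation of Equation (15)):

```python
import math

def kld(p, q):
    """Kullback-Leibler divergence D_KL(p || q) in bits.
    Assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Difference between an intact repertoire and a partitioned
# (here: fully factorized, uniform) one.
intact      = [0.5, 0.5, 0.0, 0.0]
partitioned = [0.25, 0.25, 0.25, 0.25]
print(kld(intact, partitioned))  # 1.0 bit
```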

#### 5.6. Data Sets

**Definition 1.**

1. $p({v}_{t-1}=z|{v}_{t}=s)=1$, and
2. $p({v}_{t}=s|{v}_{t-1}=z)=1$.
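Under these two conditions, a system is reversible exactly when its state-by-state TPM is a permutation matrix: every state has probability 1 of reaching a unique successor, and every state has a unique predecessor. A minimal sketch of that check (the function name and the 2-state examples are illustrative, not from the paper):

```python
def is_reversible(tpm):
    """Check reversibility on a state-by-state TPM: every transition
    z -> s must have p = 1 in both directions, i.e. the TPM must be
    a permutation matrix."""
    n = len(tpm)
    # Forward determinism: each row is one 1.0 and the rest 0.0.
    rows_ok = all(sorted(row) == [0.0] * (n - 1) + [1.0] for row in tpm)
    # Unique predecessor: each column also sums to 1.
    cols_ok = all(sum(tpm[i][j] for i in range(n)) == 1.0 for j in range(n))
    return rows_ok and cols_ok

# A 2-state example: a swap (reversible) vs. a collapse (not reversible).
swap     = [[0.0, 1.0], [1.0, 0.0]]
collapse = [[1.0, 0.0], [1.0, 0.0]]
print(is_reversible(swap), is_reversible(collapse))  # True False
```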

**Definition**

**2.**

#### 5.7. Software and Data Analysis

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Appendix A. Correlation between EI(S), 〈H(V_{i,t+1})〉, and ∑φ_C + ∑φ_E

**Figure A1.** Intrinsic cause and effect information in a random sample of 10,000 binary 3-node systems. (**a**,**b**) Deterministic systems. (**c**,**d**) Probabilistic systems. (**a**,**c**) Correlation between $EI(S)$ and the total amount of $\phi$: $\sum {\phi}_{C}+\sum {\phi}_{E}$, again averaged across all possible system states. (**b**,**d**) Correlation between the average entropy $\langle H({V}_{i,t+1})\rangle$ of the individual system elements at $t+1$ and the total amount of $\phi$. ${\rho}_{SP}$ is the Spearman rank correlation coefficient. Note that ${\rho}_{SP}(EI)$ is high for both deterministic and probabilistic systems, as $EI(S)$ is a causal measure. By contrast, ${\rho}_{SP}(\langle H({V}_{i,t+1})\rangle )$ is high only for deterministic, not for probabilistic systems. This is because in probabilistic systems a large part of $\langle H({V}_{i,t+1})\rangle$ is explained by noise, while in deterministic systems $\langle H({V}_{i,t+1})\rangle$ is due to the system’s mechanisms only.

## Appendix B. Practical Measures of Integrated Information and Composition

**Figure A2.** Non-compositional integrated information. (**a**) ${\Phi}_{\subseteq}$ is plotted against ${\varphi}_{H}$. The measures are weakly correlated, with ${\rho}_{SP}({\Phi}_{\subseteq},{\varphi}_{H})=0.24$ for reversible systems and ${\rho}_{SP}({\Phi}_{\subseteq},{\varphi}_{H})=0.47$ for the random deterministic sample, and more strongly correlated in the random probabilistic sample, with ${\rho}_{SP}({\Phi}_{\subseteq},{\varphi}_{H})=0.58$. (**b**) ${\varphi}_{AR}$ is plotted against ${\Phi}_{\subseteq}$. The correlation between ${\Phi}_{\subseteq}$ and ${\varphi}_{AR}$ is stronger than for ${\varphi}_{H}$, with ${\rho}_{SP}({\Phi}_{\subseteq},{\varphi}_{AR})=0.48$ for reversible systems, ${\rho}_{SP}({\Phi}_{\subseteq},{\varphi}_{AR})=0.75$ for the random deterministic sample, and ${\rho}_{SP}({\Phi}_{\subseteq},{\varphi}_{AR})=0.69$ for the random probabilistic sample. Note that ${\varphi}_{AR}$ only takes on a few discrete values in the evaluated deterministic systems. Moreover, ${\varphi}_{AR}=0$ whenever ${\Phi}_{\subseteq}=0$ and not otherwise.

## References

1. Kubilius, J. Predict, then simplify. NeuroImage **2018**, 180, 110–111.
2. Hirsch, M.W. The dynamical systems approach to differential equations. Bull. Am. Math. Soc. **1984**, 11, 1–65.
3. Carlson, T.; Goddard, E.; Kaplan, D.M.; Klein, C.; Ritchie, J.B. Ghosts in machine learning for cognitive neuroscience: Moving from data to theory. NeuroImage **2018**, 180, 88–100.
4. Kay, K.N. Principles for models of neural information processing. NeuroImage **2018**, 180, 101–109.
5. Tononi, G.; Sporns, O.; Edelman, G.M. A measure for brain complexity: Relating functional segregation and integration in the nervous system. Proc. Natl. Acad. Sci. USA **1994**, 91, 5033–5037.
6. Ay, N.; Olbrich, E.; Bertschinger, N.; Jost, J. A geometric approach to complexity. Chaos **2011**, 21, 037103.
7. Poldrack, R.A.; Farah, M.J. Progress and challenges in probing the human brain. Nature **2015**, 526, 371–379.
8. Borst, A.; Theunissen, F.E. Information theory and neural coding. Nat. Neurosci. **1999**, 2, 947.
9. Dayan, P.; Abbott, L.F. Theoretical Neuroscience—Computational and Mathematical Modeling of Neural Systems; MIT Press: Cambridge, MA, USA, 2000; Volume 1, pp. 1689–1699.
10. Victor, J.D. Approaches to Information-Theoretic Analysis of Neural Activity. Biol. Theory **2006**, 1, 302–316.
11. Quian Quiroga, R.; Panzeri, S. Extracting information from neuronal populations: Information theory and decoding approaches. Nat. Rev. Neurosci. **2009**, 10, 173–185.
12. Timme, N.M.; Lapish, C. A Tutorial for Information Theory in Neuroscience. eNeuro **2018**, 5.
13. Piasini, E.; Panzeri, S. Information Theory in Neuroscience. Entropy **2019**, 21, 62.
14. Rumelhart, D.; Hinton, G.; Williams, R. Learning Internal Representations by Error Propagation, Parallel Distributed Processing; MIT Press: Cambridge, MA, USA, 1986.
15. Marstaller, L.; Hintze, A.; Adami, C. The evolution of representation in simple cognitive networks. Neural Comput. **2013**, 25, 2079–2107.
16. Kriegeskorte, N.; Kievit, R.A. Representational geometry: Integrating cognition, computation, and the brain. Trends Cogn. Sci. **2013**, 17, 401–412.
17. King, J.R.; Dehaene, S. Characterizing the dynamics of mental representations: The temporal generalization method. Trends Cogn. Sci. **2014**, 18, 203–210.
18. Ritchie, J.B.; Kaplan, D.M.; Klein, C. Decoding the Brain: Neural Representation and the Limits of Multivariate Pattern Analysis in Cognitive Neuroscience. Br. J. Philos. Sci. **2019**, 70, 581–607.
19. Mitchell, T.M.; Hutchinson, R.; Niculescu, R.S.; Pereira, F.; Wang, X.; Just, M.; Newman, S. Learning to Decode Cognitive States from Brain Images. Mach. Learn. **2004**, 57, 145–175.
20. Haynes, J.D. Decoding visual consciousness from human brain signals. Trends Cogn. Sci. **2009**, 13, 194–202.
21. Salti, M.; Monto, S.; Charles, L.; King, J.R.; Parkkonen, L.; Dehaene, S. Distinct cortical codes and temporal dynamics for conscious and unconscious percepts. eLife **2015**, 4, e05652.
22. Weichwald, S.; Meyer, T.; Özdenizci, O.; Schölkopf, B.; Ball, T.; Grosse-Wentrup, M. Causal interpretation rules for encoding and decoding models in neuroimaging. NeuroImage **2015**, 110, 48–59.
23. Albantakis, L. A Tale of Two Animats: What Does It Take to Have Goals?; Springer: Cham, Switzerland, 2018; pp. 5–15.
24. Tononi, G. An information integration theory of consciousness. BMC Neurosci. **2004**, 5, 42.
25. Tononi, G. Integrated information theory. Scholarpedia **2015**, 10, 4164.
26. Tononi, G.; Boly, M.; Massimini, M.; Koch, C. Integrated information theory: From consciousness to its physical substrate. Nat. Rev. Neurosci. **2016**, 17, 450–461.
27. Oizumi, M.; Albantakis, L.; Tononi, G. From the Phenomenology to the Mechanisms of Consciousness: Integrated Information Theory 3.0. PLoS Comput. Biol. **2014**, 10, e1003588.
28. Lombardi, O.; López, C. What Does ‘Information’ Mean in Integrated Information Theory? Entropy **2018**, 20, 894.
29. Hall, N. Two concepts of causation. In Causation and Counterfactuals; MIT Press: Cambridge, MA, USA, 2004; pp. 225–276.
30. Halpern, J.Y. Actual Causality; MIT Press: Cambridge, MA, USA, 2016.
31. Albantakis, L.; Marshall, W.; Hoel, E.; Tononi, G. What caused what? A quantitative account of actual causation using dynamical causal networks. Entropy **2019**, 21, 459.
32. Krakauer, D.; Bertschinger, N.; Olbrich, E.; Ay, N.; Flack, J.C. The Information Theory of Individuality. arXiv **2014**, arXiv:1412.2447.
33. Marshall, W.; Kim, H.; Walker, S.I.; Tononi, G.; Albantakis, L. How causal analysis can reveal autonomy in models of biological systems. Philos. Trans. Ser. A Math. Phys. Eng. Sci. **2017**, 375, 20160358.
34. Kolchinsky, A.; Wolpert, D.H. Semantic information, autonomous agency and non-equilibrium statistical physics. Interface Focus **2018**, 8, 20180041.
35. Farnsworth, K.D. How Organisms Gained Causal Independence and How It Might Be Quantified. Biology **2018**, 7, 38.
36. Tononi, G.; Sporns, O. Measuring information integration. BMC Neurosci. **2003**, 4, 1–20.
37. Hoel, E.P.; Albantakis, L.; Tononi, G. Quantifying causal emergence shows that macro can beat micro. Proc. Natl. Acad. Sci. USA **2013**, 110, 19790–19795.
38. Bialek, W.; Nemenman, I.; Tishby, N. Predictability, complexity, and learning. Neural Comput. **2001**, 13, 2409–2463.
39. Williams, P.L.; Beer, R.D. Nonnegative Decomposition of Multivariate Information. arXiv **2010**, arXiv:1004.2515.
40. Harder, M.; Salge, C.; Polani, D. Bivariate measure of redundant information. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. **2013**, 87, 012130.
41. Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J.; Ay, N. Quantifying Unique Information. Entropy **2014**, 16, 2161–2183.
42. Chicharro, D. Quantifying multivariate redundancy with maximum entropy decompositions of mutual information. arXiv **2017**, arXiv:1708.03845.
43. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley-Interscience: Hoboken, NJ, USA, 2006.
44. Ay, N.; Polani, D. Information Flows in Causal Networks. Adv. Complex Syst. **2008**, 11, 17–41.
45. Kari, J. Reversible Cellular Automata: From Fundamental Classical Results to Recent Developments. New Gener. Comput. **2018**, 36, 145–172.
46. Esteban, F.J.; Galadí, J.A.; Langa, J.A.; Portillo, J.R.; Soler-Toscano, F. Informational structures: A dynamical system approach for integrated information. PLoS Comput. Biol. **2018**, 14, e1006154.
47. Kalita, P.; Langa, J.A.; Soler-Toscano, F. Informational Structures and Informational Fields as a Prototype for the Description of Postulates of the Integrated Information Theory. Entropy **2019**, 21, 493.
48. Hubbard, J.; West, B. Differential Equations: A Dynamical Systems Approach. Part II: Higher Dimensional Systems; Applications of Mathematics; Springer: New York, NY, USA, 1991.
49. Griffith, V.; Chong, E.; James, R.; Ellison, C.; Crutchfield, J. Intersection Information Based on Common Randomness. Entropy **2014**, 16, 1985–2000.
50. Ince, R. Measuring Multivariate Redundant Information with Pointwise Common Change in Surprisal. Entropy **2017**, 19, 318.
51. Finn, C.; Lizier, J.T. Pointwise Partial Information Decomposition Using the Specificity and Ambiguity Lattices. Entropy **2018**, 20, 297.
52. Williams, P.L.; Beer, R.D. Generalized Measures of Information Transfer. arXiv **2011**, arXiv:1102.1507.
53. Pearl, J. Causality: Models, Reasoning and Inference; Cambridge University Press: Cambridge, UK, 2000; Volume 29.
54. Janzing, D.; Balduzzi, D.; Grosse-Wentrup, M.; Schölkopf, B. Quantifying causal influences. Ann. Stat. **2013**, 41, 2324–2358.
55. Korb, K.B.; Nyberg, E.P.; Hope, L. A new causal power theory. In Causality in the Sciences; Oxford University Press: Oxford, UK, 2011.
56. Oizumi, M.; Tsuchiya, N.; Amari, S.I. A unified framework for information integration based on information geometry. Proc. Natl. Acad. Sci. USA **2015**, 113, 14817–14822.
57. Balduzzi, D.; Tononi, G. Qualia: The geometry of integrated information. PLoS Comput. Biol. **2009**, 5, e1000462.
58. Balduzzi, D.; Tononi, G. Integrated information in discrete dynamical systems: Motivation and theoretical framework. PLoS Comput. Biol. **2008**, 4, e1000091.
59. Beer, R.D. A dynamical systems perspective on agent-environment interaction. Artif. Intell. **1995**, 72, 173–215.
60. Maturana, H.R.; Varela, F.J. Autopoiesis and Cognition: The Realization of the Living; Boston Studies in the Philosophy and History of Science; Springer: Dordrecht, The Netherlands, 1980.
61. Tononi, G. On the Irreducibility of Consciousness and Its Relevance to Free Will; Springer: New York, NY, USA, 2013; pp. 147–176.
62. Favela, L.H. Consciousness is (probably) still only in the brain, even though cognition is not. Mind Matter **2017**, 15, 49–69.
63. Aguilera, M.; Di Paolo, E. Integrated Information and Autonomy in the Thermodynamic Limit. arXiv **2018**, arXiv:1805.00393.
64. Favela, L. Integrated information theory as a complexity science approach to consciousness. J. Conscious. Stud. **2019**, 26, 21–47.
65. Fekete, T.; van Leeuwen, C.; Edelman, S. System, Subsystem, Hive: Boundary Problems in Computational Theories of Consciousness. Front. Psychol. **2016**, 7, 1041.
66. Metz, C. How Google’s AI Viewed the Move No Human Could Understand. Available online: https://www.wired.com/2016/03/googles-ai-viewed-move-no-human-understand/ (accessed on 30 May 2018).
67. Pearl, J.; Mackenzie, D. The Book of Why: The New Science of Cause and Effect; Basic Books: New York, NY, USA, 2018; p. 418.
68. Albantakis, L.; Hintze, A.; Koch, C.; Adami, C.; Tononi, G. Evolution of Integrated Causal Structures in Animats Exposed to Environments of Increasing Complexity. PLoS Comput. Biol. **2014**, 10, e1003966.
69. Beer, R.D.; Williams, P.L. Information processing and dynamics in minimally cognitive agents. Cogn. Sci. **2015**, 39, 1–38.
70. Juel, B.E.; Comolatti, R.; Tononi, G.; Albantakis, L. When is an action caused from within? Quantifying the causal chain leading to actions in simulated agents. arXiv **2019**, arXiv:1904.02995.
71. Haun, A.M.; Tononi, G.; Koch, C.; Tsuchiya, N. Are we underestimating the richness of visual experience? Neurosci. Conscious. **2017**, 2017.
72. Mayner, W.G.; Marshall, W.; Albantakis, L.; Findlay, G.; Marchman, R.; Tononi, G. PyPhi: A toolbox for integrated information theory. PLoS Comput. Biol. **2018**, 14, e1006343.
73. Marshall, W.; Gomez-Ramirez, J.; Tononi, G. Integrated Information and State Differentiation. Front. Psychol. **2016**, 7, 926.
74. Barrett, A.B.; Seth, A.K. Practical measures of integrated information for time-series data. PLoS Comput. Biol. **2011**, 7, e1001052.
75. Oizumi, M.; Amari, S.I.; Yanagawa, T.; Fujii, N.; Tsuchiya, N. Measuring Integrated Information from the Decoding Perspective. PLoS Comput. Biol. **2016**, 12, e1004654.
76. Ay, N. Information Geometry on Complexity and Stochastic Interaction. Entropy **2015**, 17, 2432–2458.
77. Mediano, P.A.M.; Seth, A.K.; Barrett, A.B. Measuring Integrated Information: Comparison of Candidate Measures in Theory and Simulation. Entropy **2018**, 21, 17.
78. Tegmark, M. Improved Measures of Integrated Information. PLoS Comput. Biol. **2016**, 12, e1005123.
79. Albantakis, L.; Tononi, G. The Intrinsic Cause-Effect Power of Discrete Dynamical Systems—From Elementary Cellular Automata to Adapting Animats. Entropy **2015**, 17, 5472–5502.

**Figure 1.** An example neural network of three binary interacting elements. The system evolves in discrete time steps and fulfills the Markov property, which means that the conditional probability distribution of the system at time t depends only on its prior state at $t-1$. Shown are two equivalent descriptions of the system, which allow us to model and predict its dynamical state evolution: (**a**) The system represented as a dynamical causal network. This type of description corresponds to a reductionist view of the system, highlighting the interactions between individual elements. Edges indicate causal connections between elements, which are equipped with update functions, or structural equations, that specify each element’s output given a particular input. While the neural network (left) is recurrent, it can be represented by a directed acyclic graph (DAG) when unfolded in time (right). Throughout, we assume stationarity, which means that the system’s dynamics do not change over time. (**b**) The system represented by its state transition probabilities under all possible initial conditions, illustrated in the form of a state transition diagram (left) and a transition probability matrix (middle). This type of description corresponds to a holistic perspective on the system, taking the system states and their evolution in state space as primary. As the system elements are binary (and comply with Equation (2), Section 5.1), the transition probability matrix can also be represented in state-by-node format, which indicates the probability of each node being in state ’1’ at t given the respective input state at $t-1$ (right). As the system is deterministic, all probabilities are either 0.0 or 1.0. To distinguish binary state labels from real-valued probabilities, the latter include decimal points.
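The state-by-state to state-by-node conversion described in (b) can be sketched for a hypothetical 2-node deterministic system (A copies B; B negates A); this toy network is an assumption for illustration, not the $MCX$ system of the paper:

```python
from itertools import product

def state_by_node(tpm_sbs, n):
    """Convert a state-by-state TPM over n binary nodes into state-by-node
    form: entry [s][i] = p(node i is '1' at t+1 | system state s at t).
    States are ordered as the tuples produced by product((0, 1), repeat=n)."""
    states = list(product((0, 1), repeat=n))
    return [[sum(p for p, z in zip(row, states) if z[i] == 1)
             for i in range(n)]
            for row in tpm_sbs]

# Hypothetical deterministic system: A_{t+1} = B_t (COPY), B_{t+1} = NOT A_t.
# Rows/columns ordered (0,0), (0,1), (1,0), (1,1).
sbs = [[0.0, 1.0, 0.0, 0.0],   # (0,0) -> (0,1)
       [0.0, 0.0, 0.0, 1.0],   # (0,1) -> (1,1)
       [1.0, 0.0, 0.0, 0.0],   # (1,0) -> (0,0)
       [0.0, 0.0, 1.0, 0.0]]   # (1,1) -> (1,0)
print(state_by_node(sbs, 2))   # [[0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [1.0, 0.0]]
```

Because the toy system is deterministic, every state-by-node entry is 0.0 or 1.0, mirroring the figure.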

**Figure 2.** Reductionist, holistic, and compositional perspectives. (**a**) From a reductionist perspective, causal interactions are evaluated at the level of individual elements (first order). Once the state of the individual elements is observed, the state of the system and all its subsets have to be inferred. (**b**) Taking a holistic perspective, causal interactions are evaluated at the global level of the entire system (${n}^{th}$ order). Once the global state is observed, the states of all system subsets have to be inferred. (**c**) From a compositional perspective, causal interactions are evaluated at all orders. Information about the state of each subset is available in explicit form if it is specified (irreducibly) by another subset within the system.

**Figure 3.** Cause and effect repertoires of example system $MCX$ in state (0,1,1). The cause (effect) repertoires of individual system elements and their combinations specify how each set of elements in its current state constrains its possible causes (effects) within $MCX$. ${C}_{t}=1$, for example, specifies that ${M}_{t-1}=1$, and predicts that ${M}_{t+1}=1$ is likely with $p=0.75$. Labels above the repertoires indicate what each set of elements specifies about its “purviews” (see Section 5.4), the system subsets that are being constrained, which also determine the size (state space) of the repertoire in the figure. ${C}_{t}=1$, for example, does not constrain ${C}_{t+1}$ or ${X}_{t+1}$ in any way. Given ${C}_{t}=1$, the state of ${C}_{t+1}$ and ${X}_{t+1}$ remains maximally uncertain.
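As a simplified illustration of a cause repertoire (a whole-system sketch only; it omits the factorization over mechanisms and purviews of Equation (9)), Bayes’ rule with a uniform maximum-entropy prior over past states gives:

```python
def cause_repertoire(tpm, current_state_idx):
    """p(s_{t-1} | s_t) over whole-system states, assuming a uniform
    (maximum-entropy) prior over past states. The uniform prior cancels,
    so the posterior is just the normalized likelihood column."""
    likelihood = [row[current_state_idx] for row in tpm]
    z = sum(likelihood)
    return [l / z for l in likelihood] if z else likelihood

# Hypothetical 2-state swap system: state 0's only possible cause is state 1.
tpm = [[0.0, 1.0],
       [1.0, 0.0]]
print(cause_repertoire(tpm, 0))  # [0.0, 1.0]
```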

**Figure 4.** Informational and dynamical properties of reversible and ergodic-reversible (ER) discrete dynamical systems. (**a**) An example of a reversible three-element system $S=\{A,B,C\}$. $EI(S)=n$ bit for all reversible systems. Dynamically, these systems can still specify between 1 and ${2}^{n}$ attractors that lead to different stationary distributions $p(S)$ depending on the initial state. (**b**) An example of an ergodic-reversible (ER) system. In these systems, $I({V}_{t-1};{V}_{t})\simeq EI(S)=n$ bit, as the system cycles through all of its possible states, and the observed, stationary distribution $p(S)$ converges to a uniform distribution for an infinite number of observations and every full cycle through the system’s state space.
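The ER property (a single cycle visiting all $2^n$ states) can be checked on a deterministic update map; the permutation below is a hypothetical example, not one of the systems in the figure:

```python
def cycle_length(next_state, start):
    """Length of the orbit of `start` under a deterministic update map,
    given as a list mapping each state index to its successor."""
    s, steps = next_state[start], 1
    while s != start:
        s = next_state[s]
        steps += 1
    return steps

# Hypothetical 3-bit reversible map: a single cycle through all 2**3 states,
# so the system is ergodic-reversible (ER).
perm = [1, 2, 3, 4, 5, 6, 7, 0]
print(all(cycle_length(perm, s) == 8 for s in range(8)))  # True
```

A reversible map with several shorter cycles would still be reversible, but not ER.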

**Figure 5.** Distribution of intrinsic information and system-level integrated information. ${\Phi}_{\subseteq}$ is plotted against $\sum {\phi}_{C}+\sum {\phi}_{E}$ for all evaluated data sets: a random sample of 10,000 probabilistic (“Prob”) and deterministic (“Random”) TPMs, as well as the set of all 40,320 reversible systems (“REV”), and the subset of 5040 ergodic reversible (“ER”) systems (see Section 5.6 for details). ${\Phi}_{\subseteq}$ and $\sum {\phi}_{C}+\sum {\phi}_{E}$ are averages across all possible system states. Histograms show the distribution of ${\Phi}_{\subseteq}$ values (left) and $\sum {\phi}_{C}+\sum {\phi}_{E}$ values (bottom).

**Figure 6.** Illustrative ER example systems from low to high $\sum {\phi}_{C}+\sum {\phi}_{E}$. (**a**) An ER system with the lowest $\sum {\phi}_{C}+\sum {\phi}_{E}$. Nodes A and C are both simple NOT/COPY logic gates. A is only connected to B in a feedforward manner, thus ${\Phi}_{\subseteq}=0$. (**b**) An ER system with slightly higher $\sum {\phi}_{C/E}$ than (a). B is a simple COPY logic gate, A is an XOR. This system is integrated with ${\Phi}_{\subseteq}=0.84$. (**c**) An ER system with higher $\sum {\phi}_{C}+\sum {\phi}_{E}$, but ${\Phi}_{\subseteq}=0$. A is a simple NOT logic gate (same as in (a)) that connects to B and C in a feedforward manner. (**d**) An ER system with high $\sum {\phi}_{C}+\sum {\phi}_{E}$. All nodes specify nonlinear input-output functions over all system elements and the system is strongly integrated with ${\Phi}_{\subseteq}=1.50$.

**Figure 7.** Intrinsic information and system irreducibility under time-reversed dynamics. (**a**,**b**) The total amount of intrinsic information $\sum {\phi}_{C}+\sum {\phi}_{E}$ (a) and ${\Phi}_{\subseteq}$ (b) of each system is plotted against its time-reversed dynamical equivalent, which can exhibit different values. (**c**) The difference in ${\Phi}_{\subseteq}$ between a system and its reverse, plotted against their difference in $\sum {\phi}_{C}+\sum {\phi}_{E}$. (**d**) Example of a system with different causal composition and ${\Phi}_{\subseteq}$ compared to its time-reversed dynamical equivalent shown in (**e**). Note also the differences in their elementary mechanisms and connectivity. Compared to (e), in (d) node B lacks the self-connection and A does not receive an input from C. While node A in (d) implements biconditional logic and node B an XOR function, all nodes in (e) implement logic functions that depend on A, B, and C as inputs.

**Figure 8.** Dynamics of a joint agent–environment system. (**a**) The system $ABC$ forms a hypothetical agent that interacts dynamically with its environment. $ABCE$ forms a (4-node) ER system, as does $ABC$ if E is taken as a fixed background condition. Element E changes its state whenever $ABC=111$. $ABC$ is the subset with $max({\Phi}_{\subseteq})$ in all 16 states. We consider two cases of dynamical equivalence: (**b**) Permuting the states of $ABCE$ in the global state-transition diagram will typically change the local dynamics of the agent subsystem $ABC$, and the prior agent–environment division is lost. Note that B is connected to the rest of the system in a purely feedforward manner. Instead of $ABC$, now $ACE$ forms the set of elements with $max({\Phi}_{\subseteq})$ in most states (11/16, discounting single elements). (**c**) A local remapping of the state-transition diagram of $ABC$ will typically change the global dynamics if the input-output function of the environment E remains unchanged. This changes the agent’s behavior with respect to its environment. In order to recover the global dynamics, E’s mechanism needs to be adapted. Even in this case, however, the agent–environment division may not be maintained, and $BC$ is now the set of elements with $max({\Phi}_{\subseteq})$ in most (14/16) states.

**Figure 9.** Permissible partitions. (**a**) To assess the integrated intrinsic information ${\phi}_{C/E}({x}_{t})$ specified by a subset of system elements $X\subseteq S$ at t about the prior or next states of the system, ${x}_{t}$ has to be partitioned into at least two parts, here, e.g., $\{({(MCX)}_{t-1}|{(MC)}_{t})\times (\varnothing|{X}_{t})\}$ and $\{({M}_{t+1}|{M}_{t})\times ({(CX)}_{t+1}|{(CX)}_{t})\}$. (**b**) Unidirectional system partitions as defined in [27]. The connections from one part of the system to another (but not vice versa) are partitioned.

**Table 1.** Irreducible information (in bits) specified by the subsets of the example system $MCX$ in state $(0,1,1)$.

| Subset | ${M}_{t}=0$ | ${C}_{t}=1$ | ${X}_{t}=1$ | ${MC}_{t}=(0,1)$ | ${MX}_{t}=(0,1)$ | ${CX}_{t}=(1,1)$ | ${MCX}_{t}=(0,1,1)$ | $\sum {\phi}_{C/E}$ |
|---|---|---|---|---|---|---|---|---|
| ${\phi}_{C}$ | 1.0 | 1.0 | 1.0 | 1.0 | 0.415 | 1.0 | 0.0 | 5.41 |
| ${\phi}_{E}$ | 1.189 | 0.189 | 0.189 | 1.0 | 0.0 | 0.415 | 0.415 | 3.40 |

**Table 2.**Irreducible information (in bits) specified by the subsets of the example systems in Figure 6 in state $(0,1,1)$. Which sets specify irreducible information and how much they specify is state-dependent. Values of $\phi =0.0$ bits are omitted for ease of comparison.

| Subset ${x}_{t}$ | ${\phi}_{C}$ (a) | (b) | (c) | (d) | ${\phi}_{E}$ (a) | (b) | (c) | (d) |
|---|---|---|---|---|---|---|---|---|
| ${A}_{t}=0$ | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.189 | 1.0 | 0.566 |
| ${B}_{t}=1$ | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.189 | 0.378 | 0.566 |
| ${C}_{t}=1$ | 1.0 | 1.0 | 1.0 | 1.0 | 0.189 | 1.0 | 0.378 | 0.566 |
| $A{B}_{t}=(0,1)$ | 0.415 | | 0.415 | 1.0 | | 1.415 | 0.415 | 0.415 |
| $A{C}_{t}=(0,1)$ | | 1.0 | 0.415 | 0.83 | | | 0.415 | 0.415 |
| $B{C}_{t}=(1,1)$ | 0.5 | 0.415 | 0.915 | 0.83 | | 0.415 | 0.415 | 0.415 |
| $AB{C}_{t}=(0,1,1)$ | | | 0.415 | 1.0 | 1.0 | | 0.415 | 0.83 |
| $\sum {\phi}_{C/E}$ | 3.92 | 4.42 | 5.16 | 6.66 | 3.19 | 3.21 | 3.42 | 3.77 |

| Subset ${x}_{t}$ | (a) ${z}_{t+1}$ | (a) $p({z}_{t+1} \vert {x}_{t})$ | (d) ${z}_{t+1}$ | (d) $p({z}_{t+1} \vert {x}_{t})$ |
|---|---|---|---|---|
| ${A}_{t}=0$ | ${A}_{t+1}=1$ | $p=1$ | $AB{C}_{t+1}=(1,0,0)$ | $p=0.42$ |
| ${B}_{t}=1$ | ${C}_{t+1}=1$ | $p=1$ | $AB{C}_{t+1}=(1,1,1)$ | $p=0.42$ |
| ${C}_{t}=1$ | ${B}_{t+1}=0$ | $p=0.75$ | $AB{C}_{t+1}=(0,0,1)$ | $p=0.42$ |
| $A{B}_{t}=(0,1)$ | ${A}_{t+1}=1$ | $p=1$ | | |
| $A{C}_{t}=(0,1)$ | ${B}_{t+1}=0$ | $p=1$ | | |
| $B{C}_{t}=(1,1)$ | ${C}_{t+1}=1$ | $p=1$ | | |
| $AB{C}_{t}=(0,1,1)$ | $AB{C}_{t+1}=(1,1,1)$ | $p=1$ | $AB{C}_{t+1}=(1,0,1)$ | $p=1$ |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

Albantakis, L.; Tononi, G. Causal Composition: Structural Differences among Dynamically Equivalent Systems. *Entropy* **2019**, *21*, 989. https://doi.org/10.3390/e21100989