Article

Scaling Behaviour and Critical Phase Transitions in Integrated Information Theory

1 IAS-Research Center for Life, Mind and Society, University of the Basque Country, 20018 Donostia, Spain
2 ISAAC Lab, Aragón Institute of Engineering Research, University of Zaragoza, 50018 Zaragoza, Spain
Entropy 2019, 21(12), 1198; https://doi.org/10.3390/e21121198
Received: 14 March 2019 / Revised: 28 November 2019 / Accepted: 30 November 2019 / Published: 5 December 2019
(This article belongs to the Special Issue Integrated Information Theory)

Abstract

Integrated Information Theory proposes a measure of conscious activity (Φ), characterised as the irreducibility of a dynamical system to the sum of its components. Due to its computational cost, current versions of the theory (IIT 3.0) are difficult to apply to systems larger than a dozen units, and, in general, it is not well known how integrated information scales as systems grow larger in size. In this article, we propose to study the scaling behaviour of integrated information in a simple model of a critical phase transition: an infinite-range kinetic Ising model. In this model, we assume a homogeneous distribution of couplings to simplify the computation of integrated information. This simplified model allows us to critically review some of the design assumptions behind the measure and connect its properties with well-known phenomena in phase transitions in statistical mechanics. As a result, we point to some aspects of the mathematical definitions of IIT 3.0 that fail to capture critical phase transitions and propose a reformulation of the assumptions made by integrated information measures.
Keywords: Integrated Information Theory; Phi; Ising model; criticality; phase transitions

1. Introduction

Integrated Information Theory (IIT [1]) was developed to address the problem of consciousness by characterizing its underlying processes in a quantitative manner. It provides a measure of integration, Φ , that quantifies to what extent a dynamical system generates information that is irreducible to the sum of its parts, considered independently. Beyond IIT as a theory of consciousness, Φ has received attention as a general measure of complexity, and different versions of the measure have been applied to capture to what extent the behaviour of a system is both differentiated (displaying diverse local patterns) and integrated (maintaining a global coherence). Furthermore, Φ attempts to capture the level of irreducibility of the causal structures of a system, revealing the boundaries in the organisation of complex dynamical systems (i.e., delimiting the parts of the system that are integrated into a functional unit [2]).
Despite its promising features, IIT is still controversial and has received different critiques, of both its theoretical and philosophical foundations (e.g., see [3,4]) and the mathematical definitions therein. Some of the latter suggest that Φ might not be well defined and presents a number of problems [5]. Furthermore, there is a myriad of different definitions of Φ, and experimental testing of these competing definitions in small systems shows that their properties radically diverge in some cases [6]. Despite the abundance of critiques and alternative definitions of Φ, it is not clear which is the appropriate direction to settle theoretical differences or test different approaches experimentally. In our view, there are two main obstacles that hinder this endeavour: (1) the difficulty of testing integration measures in well-known dynamical systems where integrated information measures can be evaluated against known properties of the system, and (2) the computational cost of calculating IIT measures, preventing their application beyond small networks.
The first problem is, in general, difficult, as even in simple nonlinear models the relation between the network topology and dynamics is complex, and it is not always clear which topology should yield larger integrated information. Nevertheless, there is a family of models in which the relation between segregation and integration is well characterised: homogeneous systems exhibiting order-disorder critical phase transitions [7]. In these systems, there is a transition in their phase space from a disordered state, in which activity is random and segregated, to an ordered one, where units of the system are strongly coordinated. Just at the boundary separating ordered and disordered dynamics we find criticality, a state where a compromise between coordination and segregation is found [7,8]. Even in simple systems, critical dynamics are dominated by small bursts of local (i.e., segregated) activity, yet display large avalanches of globally coordinated (i.e., integrated) activity. In neural systems, these are generally referred to as “neuronal avalanches”, and experimental evidence suggests that neural dynamics exhibit a degree of concordance with those expected for a system near criticality [9].
Critical phenomena are theoretically characterised for some systems of infinite size, and they can be experimentally identified as divergent tendencies in large, finite systems as they grow in size. This refers us to the second problem, which is the very large computational cost of measuring integrated information. Due to combinatorial explosion, computing Φ is only possible in general for small, discrete systems. In practice, this prevents measuring integrated information in the very large or even infinite systems where critical dynamics can be appreciated. In IIT 3.0 [1], the latest version of the framework of Integrated Information Theory, Φ can only be computed for binary networks composed of up to a dozen units. There is a variety of other measures to compute integrated information [6], and some of them are computationally lighter, but all share these limits to some extent. As most versions of Φ require computing distance measures between probability distributions of a system and finding the minimum information partition (MIP), they present important restrictions in terms of computational cost as a system scales in size. Note that some simplified measures of Φ can be computed analytically for the case of Gaussian distributions (e.g., see [10,11]), but, in this paper, we will focus on the general case of binary variables, without considering the specific case of Gaussian assumptions.
In general, IIT measures have been limited to very small systems, and it is not well understood how integrated information scales with the size or temporal scale of the system. Previous work [12] has analysed how integrated information changes with spatial and temporal coarse graining in small networks, but it is still difficult to connect these results with well-known scaling effects like the ones that take place in critical phase transitions. Luckily, a family of simplified models, generally referred to as Ising models, can capture critical transitions with very simple topologies. Some of the simplest ones present homogeneous architectures that can greatly simplify the calculation of information theoretical quantities of the system, see, e.g., [13]. Using this idea, recent work using mean-field approximations [14] has shown that under some assumptions it is possible to compute integrated information in infinite-size networks with some homogeneous properties, showing that integrated information measures diverge at certain critical points. In this work, we extend those results by finding methods for computing the integrated information of similar models of finite size. Specifically, we explore how integrated information measures scale with size in large networks, proposing methods for simplifying the computation of distance metrics between probability distributions and the search for the MIP. In doing so, we critically assess some of the assumptions behind IIT measures and propose some modifications to better capture the properties of second-order phase transitions.
Specifically, we will explore different aspects of the definition of integrated information: (1) the dynamics and temporal span of integrated information, (2) assumptions for the computation of the cause repertoire, (3) the choice of distance metrics between the Wasserstein distance and the Kullback–Leibler divergence, (4) the effect of considering the environment from a situated perspective, (5) the relation between mechanism-level integration φ and system-level integration Φ and (6) the importance of identifying diverging tendencies in the system.

2. Model

To show how integrated information behaves around critical points in Ising models, we describe a (slightly modified) version of IIT 3.0. Then, we introduce a family of homogeneous kinetic Ising models and a series of methods that simplify the computation of integrated information in large networks.

2.1. IIT 3.0

We critically revise and adapt the framework of IIT 3.0 [1], which originally computes the integrated information of a subset of elements of a system as follows. For a system of N elements with state $\mathbf{s}$ at time t (we use boldface letters and symbols for vectors and matrices, e.g., $\mathbf{s}(t) = (s_1(t), s_2(t), \dots, s_N(t))^T$, with T indicating transposition), we characterise the input–output relationship of the system elements through its corresponding transition probability function $P(\mathbf{s}(t+\tau)\,|\,\mathbf{s}(t))$, describing the probabilities of the transitions from one state to another for all possible system states. IIT 3.0 requires systems to satisfy the Markov property (i.e., that the distribution of states of a process at time t depends only upon the state at time $t-\tau$), and that the current states of elements are conditionally independent given the past state of the system, i.e., $P(\mathbf{s}(t+\tau)\,|\,\mathbf{s}(t)) = \prod_i P(s_i(t+\tau)\,|\,\mathbf{s}(t))$.
IIT 3.0 computes integrated information in the causal mechanisms of a system by defining two subsets of $\mathbf{s}(t)$ and $\mathbf{s}(t\pm\tau)$, called the mechanism $M_t = \{s_i(t)\}_{i \in I_{M_t}}$ and the purview $P_{t\pm\tau} = \{s_i(t\pm\tau)\}_{i \in I_{P_{t\pm\tau}}}$, to represent the current state of part of the system and how it constrains future or past states. The cause and effect repertoires of the system are described, respectively, by the probability distributions $P(P_{t-\tau}\,|\,M_t)$ and $P(P_{t+\tau}\,|\,M_t)$.
The integrated cause–effect information of M t is then defined as the distance between the cause–effect repertoires of a mechanism and the cause–effect repertoires of its minimum information partition (MIP) over the maximally irreducible purview,
$$\varphi_{\mathrm{cause}}(\tau) = \max_{P}\,\min_{\mathrm{cut}}\, D_W\!\big(P(P_{t-\tau}\,|\,M_t),\, P^{\mathrm{cut}}(P_{t-\tau}\,|\,M_t)\big),$$
$$\varphi_{\mathrm{effect}}(\tau) = \max_{P}\,\min_{\mathrm{cut}}\, D_W\!\big(P(P_{t+\tau}\,|\,M_t),\, P^{\mathrm{cut}}(P_{t+\tau}\,|\,M_t)\big),$$
where $D_W(P, Q)$ refers to the Wasserstein distance (also known as the Earth Mover's Distance), used by IIT 3.0 to quantify the statistical distance between probability distributions P and Q. The subindex cut specifies a bipartition of the mechanism into two halves, and $P^{\mathrm{cut}}$ represents the cause or effect probability distribution under such a partition,
$$\mathrm{cut} = \{M_t^1, P_{t\pm\tau}^1\} \otimes \{M_t^2, P_{t\pm\tau}^2\}, \qquad P^{\mathrm{cut}}(P_{t\pm\tau}\,|\,M_t) = P(P_{t\pm\tau}^1\,|\,M_t^1)\, P(P_{t\pm\tau}^2\,|\,M_t^2).$$
Here, cut specifies the partition applied over the elements of the mechanism $M$, where $M_t^1, M_t^2$ designate the blocks of a bipartition of the mechanism at the current state at time t, $M_t$; $P_{t\pm\tau}^1, P_{t\pm\tau}^2$ refer to the blocks of a bipartition (not necessarily the same one) of the future or past units $P_{t\pm\tau}$. Figure 1B represents the partition $M^1 = \{s_1(t), s_2(t)\}$, $M^2 = \{s_3(t)\}$, $P^1 = \{s_1(t+1), s_2(t+1), s_3(t+1)\}$, $P^2 = \{\}$. The interaction between the partitioned systems (the ⊗ operator) is defined by injecting uniform random noise into the partitioned connections when defining the transition probability matrix $P(\mathbf{s}(t\pm\tau)\,|\,\mathbf{s}(t))$.
The integrated information of the mechanism $M_t$ with a time span τ, $\varphi(\tau)$, is the minimum of its corresponding integrated cause and effect information,
$$\varphi(\tau) = \min(\varphi_{\mathrm{cause}}(\tau), \varphi_{\mathrm{effect}}(\tau)).$$
The integrated information of the entire system, $\Phi(\tau)$, is then defined as the distance between the cause–effect structure of the system and the cause–effect structure of its minimum information partition, eliminating constraints from one part of the system to the rest:
$$\Phi(\tau) = \min_{\mathrm{cut}} D_W^*\big(C(\tau),\, C^{\mathrm{cut}}(\tau)\big),$$
where $C(\tau)$ stands for a “constellation of concepts”, constructed from the set of points with position $\{P(\mathbf{s}(t-\tau)\,|\,\mathbf{s}(t)), P(\mathbf{s}(t+\tau)\,|\,\mathbf{s}(t))\}$ and value $\varphi(\tau)$ corresponding to all the mechanisms in the system. Similarly, $C^{\mathrm{cut}}(\tau)$ stands for the constellation of a system in which a new unidirectional partition has been applied, injecting noise into the partitioned connections as in the previous case (note that now φ is computed applying two different partitions). In this case, a special distance measure is used (we label it $D_W^*$), which is an extended version of the Wasserstein distance that measures the minimal transport cost of transforming one constellation into another [1] (Text S2): the $\varphi(\tau)$ values are the quantities to be transported, and the Wasserstein distance between $\{P(\mathbf{s}(t-\tau)\,|\,\mathbf{s}(t)), P(\mathbf{s}(t+\tau)\,|\,\mathbf{s}(t))\}$ in the original system and under the partition is the distance these values have to be transported. Finally, if the system is a subset of elements of a larger system, all elements outside the system are considered part of the environment and are conditioned on their current state throughout the causal analysis. Similarly, when evaluating a mechanism, all elements inside the system (where Φ is determined) but outside the mechanism (where φ is analysed) are considered as uniform, independent noise sources. Further details of the steps described here can be found in [1].

Working Assumptions

In order to compute integrated information in large systems, we modify some aspects of the theory. In IIT 3.0, the integrated information φ of a mechanism is evaluated for a particular mechanism $M_t$ and purview $P_{t\pm\tau}$. Here, for simplicity, we assume that the purview always includes the same units as the mechanism (although we allow them to be partitioned differently; see, e.g., Figure 1B). Allowing more options for the purview could make a big difference in some systems, although in the homogeneous systems tested here the differences are small. Also, the distance for computing integrated information is measured over all elements of the system, not only the elements contained in the purview. In IIT 3.0, only elements in the purview are affected by a partition. In our modified version of the measure, in some cases ($\tau > 1$, see Section 3.1) the outside of the purview can change as well, and capturing these changes offers a better characterisation of the effects of partitions.
Moreover, as we assume a homogeneous architecture, in some cases mechanism integration φ and system-level integration Φ have a similarly diverging behaviour (as we explore in Section 3.5). Thus, for simplicity, in most cases we compute only the integrated information φ of a mechanism comprising the system of interest.
This homogeneous architecture also allows us to assume that under some conditions (systems with positive couplings and near the thermodynamic limit) the MIP is always either a partition that cuts a single node from the mechanism of the system or a cut that separates entire regions into different partitions (see Appendix B.3). This assumption drastically reduces the computational cost of calculating integrated information.
Other assumptions will be studied in different sections of the article. Table A1 describes these different assumptions, and Table A2 indicates whether they are used by IIT 3.0 and whether they are applied to obtain the results in the different figures of Section 3.

2.2. Kinetic Ising Model with Homogeneous Regions

We define a general model capturing causal interactions between variables. Aiming for generality, we use the least structured statistical model defining causal correlations between pairs of units from one time step to the next [15]. We study a kinetic Ising model where N binary spins $s_i \in \{-1, +1\}$ evolve in discrete time, with synchronous parallel dynamics (Figure 1A). Given the configuration of spins at time t, $\mathbf{s}(t) = \{s_1(t), \dots, s_N(t)\}$, the spins $s_i(t)$ are independent random variables drawn from the distribution:
$$P(s_i(t)\,|\,\mathbf{s}(t-1)) = \frac{e^{\beta s_i(t) h_i(t)}}{2\cosh(\beta h_i(t))},$$
$$h_i(t) = H_i + \sum_j J_{ij}\, s_j(t-1).$$
The parameters H and J represent the local fields at each spin and the couplings between pairs of spins, and β is the inverse temperature of the model. Without loss of generality, we assume that β = 1 .
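The update rule above can be sketched numerically. A minimal implementation, assuming a NumPy representation of spins as a ±1 vector and the function name `glauber_step` (both our choices, not from the paper):

```python
import numpy as np

def glauber_step(s, J, H, beta=1.0, rng=None):
    """One synchronous parallel update of the kinetic Ising model.

    Each spin s_i(t) is drawn independently from
    P(s_i(t) | s(t-1)) = exp(beta * s_i * h_i) / (2 * cosh(beta * h_i)),
    with local field h_i = H_i + sum_j J_ij * s_j(t-1).
    """
    rng = rng or np.random.default_rng()
    h = H + J @ s                                      # local fields h_i(t)
    p_up = np.exp(beta * h) / (2 * np.cosh(beta * h))  # P(s_i(t) = +1)
    return np.where(rng.random(len(s)) < p_up, 1, -1)

# Example: N homogeneous spins with couplings J_ij = J / N and H = 0
N, J_val = 64, 1.5
s = np.ones(N, dtype=int)
J = np.full((N, N), J_val / N)
s = glauber_step(s, J, np.zeros(N), rng=np.random.default_rng(0))
```

Note that $e^{\beta h}/(2\cosh(\beta h)) = 1/(1+e^{-2\beta h})$, so the activation probability is a sigmoid of the local field.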
In general, computing the probability distributions $P(\mathbf{s}(t))$ of a kinetic Ising model is a difficult task, as it requires summing over all previous trajectories. The distribution $P(\mathbf{s}(t+\tau)\,|\,\mathbf{s}(t))$ is computed recursively by applying the equation
$$P(\mathbf{s}(t+\tau)\,|\,\mathbf{s}(t)) = \sum_{\mathbf{s}(t+\tau-1)} P(\mathbf{s}(t+\tau)\,|\,\mathbf{s}(t+\tau-1))\, P(\mathbf{s}(t+\tau-1)\,|\,\mathbf{s}(t)).$$
The cost of calculating this equation grows exponentially with the size of the system, with a computational cost of $O(2^{2N})$.
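For small N, this recursion can be carried out exactly by enumerating all $2^N$ states; the sketch below (helper names are ours) makes the $O(2^{2N})$ cost explicit:

```python
import numpy as np
from itertools import product

def transition_matrix(J, H, beta=1.0):
    """Exact one-step matrix P(s(t) | s(t-1)) over all 2^N spin states.

    Rows index s(t-1), columns index s(t); storage and time are O(2^{2N}).
    """
    N = J.shape[0]
    states = np.array(list(product([-1, 1], repeat=N)))
    P = np.zeros((2**N, 2**N))
    for a, s_prev in enumerate(states):
        h = H + J @ s_prev
        p_up = np.exp(beta * h) / (2 * np.cosh(beta * h))
        for b, s_next in enumerate(states):
            # conditional independence of units given s(t-1)
            P[a, b] = np.prod(np.where(s_next == 1, p_up, 1 - p_up))
    return P

def propagate(P, tau):
    """P(s(t+tau) | s(t)) by recursive application of the one-step matrix."""
    return np.linalg.matrix_power(P, tau)
```

The τ-step recursion in the equation above is exactly a matrix power of the one-step transition matrix.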
This computation can be simplified for certain architectures of the model. We divide the system into different regions, and assume that the coupling values $J_{ij}$ are positive and homogeneous for all intra- and inter-region connections, $J_{ij} = \frac{1}{N_V} J_{UV}$, where U and V are regions of the system with sizes $N_U$, $N_V$ and $i \in U$, $j \in V$. Also, for simplicity, we assume that $\mathbf{H} = 0$.
When the system is divided in homogeneous regions, the calculation of the probability distribution of the system is simplified to computing the number of active units in each region, $S_U(t) = \sum_{i \in U} (1 + s_i(t))/2$. With this, we simplify the transition probability matrix to
$$P(s_i(t)\,|\,\mathbf{S}(t-1)) = \frac{e^{\beta s_i(t) h_i(t)}}{2\cosh(\beta h_i(t))}, \qquad h_i(t) = H_i + \sum_V J_{UV}\, S_V(t-1),$$
$$P(S_U(t)\,|\,\mathbf{S}(t-1)) = \binom{N_U}{S_U(t)}\, P(s_i(t)=+1\,|\,\mathbf{S}(t-1))^{S_U(t)}\, P(s_i(t)=-1\,|\,\mathbf{S}(t-1))^{N_U - S_U(t)}.$$
We then have
$$P(\mathbf{S}(t+\tau)\,|\,\mathbf{S}(t)) = \sum_{\mathbf{S}(t+\tau-1)} \prod_U P(S_U(t+\tau)\,|\,\mathbf{S}(t+\tau-1))\, P(\mathbf{S}(t+\tau-1)\,|\,\mathbf{S}(t)).$$
Now the cost is reduced to $O(\prod_U (N_U + 1)^2)$, which, for a limited number of regions, makes the computation much lighter.
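As a sketch of the binomial reduction above for a single homogeneous region, the following builds the $(N+1)\times(N+1)$ matrix over the count variable S. We assume here that the local field is computed from the original ±1 spins via $\sum_j s_j = 2S - N$ (our reading of the region decomposition); the function name is our own:

```python
import numpy as np
from math import comb

def region_transition(N, J, H=0.0, beta=1.0):
    """P(S(t) | S(t-1)) for one homogeneous region of size N.

    S counts active (+1) units. With couplings J_ij = J / N, the field on
    every unit given S(t-1) active units is h = H + (J / N) * (2*S - N).
    The number of active units at t then follows a binomial distribution.
    """
    P = np.zeros((N + 1, N + 1))
    for S_prev in range(N + 1):
        h = H + (J / N) * (2 * S_prev - N)
        p_up = np.exp(beta * h) / (2 * np.cosh(beta * h))
        for S_next in range(N + 1):
            P[S_prev, S_next] = comb(N, S_next) * p_up**S_next * (1 - p_up)**(N - S_next)
    return P
```

Each step now costs $O((N+1)^2)$ instead of $O(2^{2N})$.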
Interestingly, as shown in [14], if the size of the regions tends to infinity, the behaviour of a region U is described simply by the mean-field activity of its units, $m_U(t) = \frac{1}{N_U}\sum_{j \in U} s_j(t)$, and its evolution becomes deterministic, with the behaviour of any unit i belonging to a region V determined by the value of the input field of the region, $h_V$:
$$P(S_U(t)\,|\,\mathbf{S}(t-1)) = \delta\!\left(S_U(t) - N_U\,\frac{1 + m_U(t)}{2}\right),$$
$$m_U(t) = \tanh\!\left(\sum_V J_{UV}\, m_V(t-1)\right),$$
where δ(x) is the Kronecker delta function and $m_U(t-1)$ is the mean field of region U at time $t-1$.
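The deterministic mean-field map above is straightforward to iterate; a minimal sketch (naming ours), where `J` is the region-level coupling matrix $J_{UV}$:

```python
import numpy as np

def mean_field_trajectory(J, m0, steps):
    """Iterate the deterministic mean-field map m_U(t) = tanh(sum_V J_UV m_V(t-1))."""
    J = np.atleast_2d(np.asarray(J, dtype=float))
    m = np.atleast_1d(np.asarray(m0, dtype=float))
    traj = [m.copy()]
    for _ in range(steps):
        m = np.tanh(J @ m)
        traj.append(m.copy())
    return np.array(traj)

# Single region with self-coupling J: subcritical (J < 1) decays to m = 0,
# supercritical (J > 1) converges to the nonzero root of m = tanh(J m).
sub = mean_field_trajectory([[0.5]], [0.9], 200)
sup = mean_field_trajectory([[1.5]], [0.9], 200)
```

The qualitative change of the fixed point at J = 1 is the critical point discussed in Section 3.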

2.3. Integrated Information in the Kinetic Ising Model with Homogeneous Regions

Describing an Ising model with homogeneous regions simplifies the computation of integrated information in two important ways: by reducing the costs of finding the minimum information partition and computing statistical distances between distributions.
As the connectivity of the system is homogeneous for all nodes in the same region, in Appendix B.3 we show that, near the thermodynamic limit and for the case of only positive couplings, the MIP is always a partition that either (a) isolates only one unit from one of the regions of the system, or (b) separates entire regions such that all elements of a region in the current or the future (or past) states always belong to the same partition. Also, in case (a), the partition that isolates a single unit at time t always has a smaller value of φ than the partition isolating a node at time $t \pm \tau$, as partitioning the posterior distribution corresponds to a larger distance between probability distributions (see Appendix B.3). We tested both cases (a) and (b) and found that, in all the examples presented in this article, since all couplings are of a similar order of magnitude, the MIP always corresponds to case (a): it cuts only a node at time t from one region of the system (see, e.g., Figure 1B). In this work, we also compute the value of Φ for a homogeneous system with just one region. In this case, as there is only one region, the MIP at the system level is also a partition that isolates only one node, as this is the intervention yielding a minimal distance (see Appendix B.4).
In systems with finite size, the evolution of the probability distribution of the activity at the regions of the system is calculated using Equation (11). From there, Equations (1) and (2) can be computed. For large systems, it becomes unfeasible to compute the Wasserstein distance between distributions due to the combinatorial explosion of states. Nevertheless, when regions are homogeneous this computation is greatly simplified if the number of regions is not too large. As all the units within a region are statistically identical, in terms of Wasserstein distances it is equivalent to work with aggregate variables representing the sum of the units of a region (see Appendix B.1):
$$\varphi_M^{\mathrm{cut}}(\tau) = D_W\big(P(\mathbf{s}(\tau_0+\tau)\,|\,\mathbf{s}(\tau_0))\,\|\,P^{\mathrm{cut}}(\mathbf{s}(\tau_0+\tau)\,|\,\mathbf{s}(\tau_0))\big) = D_W\big(P(\mathbf{S}(\tau_0+\tau)\,|\,\mathbf{s}(\tau_0))\,\|\,P^{\mathrm{cut}}(\mathbf{S}(\tau_0+\tau)\,|\,\mathbf{s}(\tau_0))\big).$$
This equivalence is possible because the transport cost between two states $\mathbf{s}(t)$ and $\mathbf{s}^*(t)$ is defined as $\frac{1}{2}\sum_i |s_i(t) - s_i^*(t)|$. As the Wasserstein distance always chooses minimal transport costs, the cost between states $\mathbf{S}(t)$, $\mathbf{S}^*(t)$ is defined as $|\mathbf{S}(t) - \mathbf{S}^*(t)|$. If instead of the Wasserstein distance we use the Kullback–Leibler divergence, then $D_{KL}(\mathbf{s}(t)\,\|\,\mathbf{s}^*(t)) = D_{KL}(\mathbf{S}(t)\,\|\,\mathbf{S}^*(t))$ (see Appendix C).
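With transport cost $|S - S^*|$, the Wasserstein distance between two distributions over the aggregate variable reduces to the L1 distance between their cumulative distributions (a standard identity for one-dimensional distributions); a minimal sketch:

```python
import numpy as np

def wasserstein_aggregate(p, q):
    """Earth Mover's Distance between two distributions over S = 0, ..., N,
    with transport cost |S - S*| between states: sum over S of |CDF_p - CDF_q|."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum()
```

For example, moving a point mass from S = 0 to S = 2 costs exactly 2 units of transport.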
Finally, when the partition only affects one node of the mechanism of the system at region V (see, e.g., Figure 1B), the computation of $P^{\mathrm{cut}}$ is performed by transforming the transfer probability matrix as
$$P^{\mathrm{cut}}(S_U(t)\,|\,\mathbf{S}(t-1)) = \tfrac{1}{2}\Big(P(S_U(t)\,|\,\mathbf{S}(t-1)) + \big(1 - \tfrac{S_V(t-1)}{N_V}\big)\, P(S_U(t)\,|\,S_V(t-1)+1,\, \mathbf{S}_{\bar V}(t-1)) + \tfrac{S_V(t-1)}{N_V}\, P(S_U(t)\,|\,S_V(t-1)-1,\, \mathbf{S}_{\bar V}(t-1))\Big),$$
where $\bar V$ is the complement set of $\{V\}$. The origin of this expression is that injecting uniform noise into a single unit of region V can have three possible outcomes: leaving the system as it was ($\frac{1}{2}$ chance), adding one to the value of $S_V$ ($\frac{1}{2}(1 - \frac{S_V}{N_V})$ chance) or subtracting one from the value of $S_V$ ($\frac{1}{2}\frac{S_V}{N_V}$ chance). The linear combination of these three cases yields the final transfer probability matrix.
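The mixture above translates directly into code for the single-region case; a minimal sketch (function name ours), where rows of `P` index S(t−1):

```python
import numpy as np

def cut_transition(P):
    """Apply the single-unit partition to a one-region transition matrix
    P(S(t) | S(t-1)) of shape (N+1, N+1).

    Noising one unit keeps S(t-1) with prob 1/2, increments it with prob
    (1 - S/N)/2 (a noised inactive unit turns active), and decrements it
    with prob (S/N)/2 (a noised active unit turns inactive).
    """
    N = P.shape[0] - 1
    Pcut = np.zeros_like(P)
    for S in range(N + 1):
        Pcut[S] += 0.5 * P[S]
        if S < N:
            Pcut[S] += 0.5 * (1 - S / N) * P[S + 1]
        if S > 0:
            Pcut[S] += 0.5 * (S / N) * P[S - 1]
    return Pcut
```

The three mixture weights sum to one, so the partitioned matrix remains row-stochastic.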
When computing integrated information of the system, Φ , Equation (5) computes the distance between concepts (i.e., values of integration of the mechanisms of a system) of the original system and the system under a unidirectional partition. Mechanisms affected by the partition will always have a value of φ = 0 (and are transported to a residual, “null” concept located at the unconstrained distribution, see [1] (Text S2)). In our example, we find that these concepts contribute the most to the value of Φ (see Section 3.5).

3. Results

Criticisms concerning the definition of integrated information measures have addressed a variety of topics, e.g., the existence of “trivially non-conscious” systems, composed of units distributed in relatively simple arrangements yielding arbitrarily large values of Φ, or the absence of a canonical metric on the space of probability distributions [5]. There are also some aspects that are not well understood, such as the dependence of Φ on the scale or graining of a system, or the differences and dependencies between cause and effect repertoires. Here, we explore some of these aspects and consider possible reformulations of the measures, focusing on how current measures behave around critical phase transitions. As a reference, we introduce the behaviour of the kinetic Ising model with homogeneous regions of infinite size. As described in Equation (13), behaviour in the thermodynamic limit can be described by the evolution of the mean firing rates. Also, computing the derivative of the mean firing rates at infinite size (which will be used to compute the distances between distributions) is straightforward:
$$\frac{\partial m_U(t)}{\partial J_{UV}} = (1 - m_U^2(t))\left(m_V(t-1) + \sum_{V'} J_{UV'}\, \frac{\partial m_{V'}(t-1)}{\partial J_{UV}}\right).$$
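For a single region with self-coupling J, this chain-rule recursion can be propagated alongside the mean field itself and checked against a finite difference; a sketch under our own naming:

```python
import numpy as np

def mean_field_with_sensitivity(J, m0, steps):
    """Iterate m(t) = tanh(J m(t-1)) together with its derivative in J:
    dm(t)/dJ = (1 - m(t)^2) * (m(t-1) + J * dm(t-1)/dJ)."""
    m, dm = float(m0), 0.0
    for _ in range(steps):
        m_new = np.tanh(J * m)
        dm = (1 - m_new**2) * (m + J * dm)
        m = m_new
    return m, dm
```

At a stable fixed point the recursion converges to the implicit-function derivative $\partial m^*/\partial J = (1 - m^{*2})\,m^* / (1 - J(1 - m^{*2}))$.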
In Figure 2 we observe an example for an infinite-size kinetic Ising model with just one homogeneous region U with self-couplings of value $J_{UU} = J/N$. In this case, the model displays a critical point at J = 1.
We argue that this critical phase transition offers an interesting case for studying integrated information. First, systems at criticality display long-range correlations and maximal energy fluctuations at the critical point [7,8], which should produce maximal dynamical integration, as noted in [16]. However, IIT 3.0 is concerned not with dynamics but with causal interactions (i.e., how the states of mechanisms generate information by constraining future/past states), thus critical dynamics alone are not enough to expect maximum integration in the terms of IIT 3.0. Still, we can argue that (a) phase transitions in an Ising model mark a discontinuity between different modes of operation, and (b) the critical point is characterised by maximum susceptibility to external perturbations (i.e., sensitivity to changes in intensive physical properties of mechanisms; e.g., Figure 2B) [7]. Because of this, when measuring integrated information in Ising models, we expect critical phase transitions to be observable in terms of integrated information, and critical points to have distinguishable properties with respect to other points of the phase space.
Using this toy model, we explore different aspects of the mathematical definitions of integrated information and the assumptions behind these definitions, using critical phase transitions as a reference. We will compute $\varphi(\tau)$ as follows. First, we select the initial state $\mathbf{s}(t)$. To find a representative initial state, we start from a uniform distribution $P(\mathbf{s})$ (and the corresponding $P_I(S) = \frac{1}{2^N}\binom{N}{S}$) and update it until it stabilizes (using Equation (11) or Equation (13) for the finite and infinite cases, respectively). Then, we choose $S(t) = \arg\max P(S)$. From there, we update the probability distributions forward or backwards τ times with and without applying a partition, and compute the distance between the distributions to obtain $\varphi_{\mathrm{effect}}$ and $\varphi_{\mathrm{cause}}$. Total integrated information φ is computed as the minimum of the two. In all sections, we will assume that mechanism and purview contain the same units, $I_{M_t} = I_{P_{t\pm\tau}}$. We will also assume that the mechanism and the purview are composed of the whole system under analysis, except for Section 3.4 and Section 3.5, where smaller subsystems are analysed. Only in Section 3.5 do we compute the value of Φ, and we will argue that in our examples the first level of integration φ is enough for describing the behaviour of the system.
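The procedure just described can be sketched end to end for a single homogeneous region with continuous noise injection (see Section 3.1). All helper names are ours, and this is a simplified reading of the full pipeline, assuming the local field is obtained from the ±1 spin sum $2S - N$:

```python
import numpy as np
from math import comb

def phi_effect(N, J, tau, beta=1.0):
    """Sketch of phi_effect(tau) for one homogeneous region of size N with
    couplings J/N, over the aggregate variable S (number of active units)."""
    # one-step transition matrix P(S(t) | S(t-1))
    P = np.zeros((N + 1, N + 1))
    for Sp in range(N + 1):
        h = (J / N) * (2 * Sp - N)
        p = np.exp(beta * h) / (2 * np.cosh(beta * h))
        for Sn in range(N + 1):
            P[Sp, Sn] = comb(N, Sn) * p**Sn * (1 - p)**(N - Sn)
    # partitioned matrix: uniform noise injected into one unit
    Pc = np.zeros_like(P)
    for S in range(N + 1):
        Pc[S] += 0.5 * P[S]
        if S < N:
            Pc[S] += 0.5 * (1 - S / N) * P[S + 1]
        if S > 0:
            Pc[S] += 0.5 * (S / N) * P[S - 1]
    # representative initial state: relax the binomial prior, take its mode
    prior = np.array([comb(N, S) for S in range(N + 1)], float) / 2**N
    for _ in range(200):
        prior = prior @ P
    S0 = int(np.argmax(prior))
    # propagate tau steps with and without the partition (continuous injection)
    p_full = np.linalg.matrix_power(P, tau)[S0]
    p_cut = np.linalg.matrix_power(Pc, tau)[S0]
    # Wasserstein distance over the aggregate variable
    return float(np.abs(np.cumsum(p_full) - np.cumsum(p_cut)).sum())
```

With J = 0 the units are independent, the cut changes nothing and the measure vanishes, as expected.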

3.1. Dynamics and Temporal Span of Integrated Information

First, we explore the integrated information of the effect repertoire of a system for a time span τ, $\varphi_{\mathrm{effect}}(\tau)$. In IIT 3.0, integrated information is defined as the level of irreducibility of the causal structure of a dynamical system over one time step [1]. The level of irreducibility is computed by applying partitions over the system, in which noise is injected into the connections affected by the partition. This is done by manipulating the transition probability matrix of the system. In previous work, integrated information has been applied to different temporal spans by changing the temporal graining of the system and joining consecutive states in a new Markov chain [12] or by concealing the micro levels in black box mappings [17]. Another possibility could be describing the transition probability matrix of the system from t to t + τ. However, is this an adequate way to capture integration at larger timescales? As IIT 3.0 operates with the transition probability matrix of a system, one could compute this matrix from time t to time t + τ and compute a new transition probability matrix for a bipartition by injecting noise into the connections affected by it at time t. This implies that noise is injected at the first step (at time t) and that the system then behaves normally for the following steps. We will refer to this way of applying a partition as an “initial noise injection” (in contrast with a “continuous noise injection”, see below).
We explore this by computing integrated information for a system with only one region of size N = 256, with coupling values $J_{ij} = J/N$. If we compute φ for different values of τ (Figure 3A), we observe that for different couplings J integrated information always peaks at the ordered side of the phase transition. As τ is increased, this peak moves towards the critical point and its size decreases, tending to zero. The assumption of an initial noise injection yields $\varphi(\tau) = 0$ at the critical point and maximum integration at the ordered side of the phase transition. Thus, integrated information in this case is not able to characterise the phase transition of the system.
A different metric can be defined if, instead of applying the partition just at the initial step, we apply it to all τ updates (Figure 3B). We will refer to this way of applying a partition as a “continuous noise injection”, in contrast with the case in which noise is only injected at the first step. We propose that this is a more natural way to apply a partition, capturing larger integrated information around the critical point as we consider larger timescales. Moreover, as opposed to the previous case, which captured zero integration on the disordered side, this measure is able to capture increasing integration as the system approaches the critical transition from either side. One may note that in the mean-field approximation for infinite size (as shown in [14] and also Figure 5A), integration is zero when approaching the critical point from the disordered side. This is not a problem of the measure but a characteristic of the system, in which units have independent dynamics until J reaches the threshold of the critical point. For finite size, units are not completely independent and our measure correctly captures non-zero integration.
Still, some important considerations need to be taken into account when applying a continuous noise injection. In an initial noise injection, φ decreases with time, as the effect of causal structures is diluted. In contrast, a continuous noise injection accumulates the effects of each time step, making integration grow with larger temporal spans. These are very different assumptions, but we propose that the latter is more appropriate in our case in order to capture the long-range correlations and critical slowing down displayed by systems at criticality. Therefore, for the remainder of the article, we will assume a continuous noise injection.
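The two conventions differ only in how the partitioned matrix enters the τ-step propagation; a minimal sketch of both (names ours), for row-stochastic matrices applied as `p @ M`:

```python
import numpy as np

def tau_step_partitioned(P, Pcut, tau, mode="continuous"):
    """tau-step dynamics under a partition.

    'initial' injects noise only at the first step, then the intact dynamics
    run: Pcut followed by P^(tau-1).  'continuous' injects noise at every one
    of the tau updates: Pcut^tau.
    """
    if mode == "continuous":
        return np.linalg.matrix_power(Pcut, tau)
    # initial noise injection
    return Pcut @ np.linalg.matrix_power(P, tau - 1)
```

For τ = 1 the two conventions coincide; they diverge as the temporal span grows.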

3.2. Integrated Information of the Cause Repertoire

In the previous section, we explored the behaviour of φ_effect around a critical phase transition, i.e., the value of integrated information for the repertoire of states generated by the mechanisms of a system at time t + τ. IIT 3.0 proposes that integrated information should be computed as the minimum of φ_effect and φ_cause (Equation (4)). This is motivated by the “intrinsic information bottleneck principle”, which proposes that information about the causes of a state only exists to the extent that it can also specify information about its effects, and vice versa [1].
Describing the cause repertoire is more complicated than the effect repertoire. IIT 3.0 [1] (Text S2) proposes to tackle the problem of defining P(s(t−1) | s(t)) by assuming a uniform prior distribution of past states P_U(s(t−1)). This takes the form
$$P(s(t-1)\,|\,s(t)) = \frac{P(s(t)\,|\,s(t-1))\, P_U(s(t-1))}{\sum_{s(t-1)} P(s(t)\,|\,s(t-1))\, P_U(s(t-1))},$$
where P U stands for a uniform probability distribution. This is equivalent to
$$P(S(t-1)\,|\,S(t)) = \frac{P(S(t)\,|\,S(t-1))\, P_I(S(t-1))}{\sum_{S(t-1)} P(S(t)\,|\,S(t-1))\, P_I(S(t-1))},$$
where $P_I(S) = \frac{1}{2^N}\binom{N}{S}$ is the binomial distribution resulting from combining N independent distributions (obtained directly from P_U(s)). Similarly, P^cut(s(t−1) | s(t)) and P^cut(S(t−1) | S(t)) can be computed as in Equations (17) and (18), assuming a modified conditional probability P^cut(S(t) | S(t−1)).
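As an illustration, this cause repertoire can be computed directly for a single homogeneous region, where the aggregate state S(t) (the number of units with value +1) is binomially distributed given S(t−1). The sketch below (illustrative N and J; it follows the aggregate-variable simplification of this paper, not the full IIT 3.0 algorithm) applies Bayes' rule with the binomial prior P_I(S):

```python
import numpy as np
from math import comb

N, J = 16, 1.2  # illustrative region size and coupling

def p_up(S_prev):
    # probability that a single unit is +1 given aggregate state S_prev
    m = (2 * S_prev - N) / N
    return 1.0 / (1.0 + np.exp(-2.0 * J * m))

# forward conditional P(S(t) | S(t-1)) over aggregate states 0..N
T = np.zeros((N + 1, N + 1))
for Sp in range(N + 1):
    p = p_up(Sp)
    for S in range(N + 1):
        T[S, Sp] = comb(N, S) * p**S * (1 - p)**(N - S)

# binomial prior P_I(S) = C(N, S) / 2^N from N independent uniform units
P_I = np.array([comb(N, S) for S in range(N + 1)], float) / 2**N

# cause repertoire via Bayes' rule: cause[S_now, S_prev] = P(S(t-1) | S(t))
cause = T * P_I[None, :]
cause /= cause.sum(axis=1, keepdims=True)
```

Each row of `cause` is a normalised distribution over past aggregate states, conditioned on the present one.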
What is the effect of considering an independent prior? As we observe in Figure 4A, for τ = 1, integration remains high even for large values of J. As we increase τ, integration decreases. This behaviour is completely different from that of the effect repertoire (Figure 3B). Intuitively, such a difference in behaviour between cause and effect mechanisms is strange for a homogeneous system in a stationary state such as the one under study here. More importantly, the measure of φ_cause fails to capture integration around the critical point, and displays the largest values of integration far into the ordered side of the phase space. Note that, as φ = min(φ_cause, φ_effect), in this case the value of integration would be dominated by the cause repertoire, and φ would not diverge around the critical point.
It is possible to drop the assumption of an independent prior, but some assumption about the prior distribution is still needed. A simple alternative is to assume that the system is in a stationary state with distribution P_st(s(t)) = P_st(s(t−1)), yielding
$$P(s(t-1)\,|\,s(t)) = \frac{P(s(t)\,|\,s(t-1))\, P_{st}(s(t-1))}{\sum_{s(t-1)} P(s(t)\,|\,s(t-1))\, P_{st}(s(t-1))} = \frac{P(s(t)\,|\,s(t-1))\, P_{st}(s(t-1))}{P_{st}(s(t))},$$
$$P(S(t-1)\,|\,S(t)) = \frac{P(S(t)\,|\,S(t-1))\, P_{st}(S(t-1))}{\sum_{S(t-1)} P(S(t)\,|\,S(t-1))\, P_{st}(S(t-1))} = \frac{P(S(t)\,|\,S(t-1))\, P_{st}(S(t-1))}{P_{st}(S(t))}.$$
In this case, computing φ_cause(τ) (Figure 4B), we observe that the integration of the cause and effect repertoires behaves similarly as J changes, yielding curves similar to those of φ_effect(τ). Still, note that integration values are slightly lower for the cause repertoire.
Thus, the assumption of an independent prior has dramatic consequences, which can be avoided by assuming a stationary distribution. Another alternative, for systems undergoing a transient, is to compute the trajectory of probability distributions P(s(t)) and use it as the prior, though this makes the computation much more costly. For the rest of the manuscript, we will assume a stationary prior. Note that the noise injected when partitioning the system is still uniform in all cases.
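A minimal sketch of the stationary-prior alternative (illustrative N and J, using the aggregate-variable dynamics of a single homogeneous region): the stationary distribution is obtained by power iteration of the forward transition matrix and then used as the prior in Bayes' rule:

```python
import numpy as np
from math import comb

N, J = 16, 1.2  # illustrative values

def forward(N, J):
    # aggregate transition matrix T[S(t), S(t-1)] for one homogeneous region
    T = np.zeros((N + 1, N + 1))
    for Sp in range(N + 1):
        m = (2 * Sp - N) / N
        p = 1.0 / (1.0 + np.exp(-2.0 * J * m))
        for S in range(N + 1):
            T[S, Sp] = comb(N, S) * p**S * (1 - p)**(N - S)
    return T

T = forward(N, J)

# stationary distribution by power iteration (T is column-stochastic)
P_st = np.full(N + 1, 1.0 / (N + 1))
for _ in range(5000):
    P_st = T @ P_st

# cause repertoire with the stationary prior:
# P(S(t-1) | S(t)) = T[S(t), S(t-1)] P_st(S(t-1)) / P_st(S(t))
cause = T * P_st[None, :] / (T @ P_st)[:, None]
```

At stationarity the denominator T @ P_st coincides with P_st itself, so the second equality of the equations above holds exactly in this construction.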

3.3. Divergence of Integrated Information: Wasserstein and Kullback–Leibler Distance Measures

We have seen that, using our assumptions, φ grows with τ around the critical point in a finite system, suggesting that the value of integration may diverge in the thermodynamic limit. We test this divergence by computing integrated information φ for networks of different size N and a given τ. In general, the relationship between φ and τ is complex, as for each value of J it depends on the transient dynamics of the system. It is not the goal of this article to explore this issue in detail, but we want to ensure that finite systems have enough time to get close to a stationary regime. Thus, from now on, for simplicity, we will use a value of $\tau = 10 \log_2 N$, where N is the size of the system. We choose this relation because we have tested that it ensures the divergence of integrated information around critical points, although the other relations we tested (e.g., τ ∝ N) maintain the qualitative results shown in the following sections.
To test the divergence of φ, we compute the value of integrated information for the largest mechanism of a kinetic Ising model with a homogeneous region of varying size N, assuming continuous noise injection and a stationary prior. We observe in Figure 5A that, for finite sizes, φ_cause (black line) shows a diverging tendency around the critical point. Effect integration φ_effect (grey line) shows a similar divergence, with slightly larger values. In this case, we also computed the value of φ_effect for infinite size. When N → ∞, units s_i(t + τ) become independent, and the Wasserstein distance of a system with one region can be computed analytically as
$$D_W\big(P(s(t+\tau)\,|\,s(t)) \,\|\, P^{cut}(s(t+\tau)\,|\,s(t))\big) = \frac{1}{2}\,\frac{\partial m(t+\tau)}{\partial J}\,\delta J$$
(see Appendix B.2). The divergence of φ_effect for infinite size shown in Figure 5A was also analytically characterised in [14]. As φ_effect is always larger than φ_cause, in this case the total integration is always φ = φ_cause. Summarising, we can conclude that φ computed using the Wasserstein distance shows a divergence around the critical point of the kinetic Ising model.
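The divergence at the critical point can be checked against the standard mean-field self-consistency of the infinite-range model, m = tanh(Jm) (for H = 0). The sketch below is not the derivation of Appendix B.2; it only illustrates that the susceptibility-like factor ∂m/∂J, which enters the infinite-size expression, grows without bound as J approaches 1 from the ordered side:

```python
import math

def mf_magnetisation(J, iters=20000):
    # fixed point of the mean-field self-consistency m = tanh(J m)
    # (positive branch for J > 1; m = 0 for J <= 1)
    m = 1.0
    for _ in range(iters):
        m = math.tanh(J * m)
    return m

def dm_dJ(J):
    # differentiating m = tanh(J m) gives
    # dm/dJ = m (1 - m^2) / (1 - J (1 - m^2))
    m = mf_magnetisation(J)
    return m * (1 - m**2) / (1 - J * (1 - m**2))
```

Evaluating `dm_dJ` at J = 1.5, 1.05, and 1.01 shows the derivative growing rapidly as the critical point J = 1 is approached, which is the mechanism behind the divergence of the infinite-size Wasserstein expression.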
Many versions of φ use the Kullback–Leibler divergence as an alternative to the Wasserstein distance. As seen in Figure 5B, this change can lead to an important difference in the results for φ_cause (black line) and φ_effect (grey line). The figure shows that φ tends to peak around the critical point but decreases with the size of the system. In this case too, φ = φ_cause for the cases we computed. An approximation similar to that of the previous case can be used to compute φ_effect in the infinite-size limit (see Appendix C): using the well-known relation between the Kullback–Leibler divergence and the Fisher information, it can be shown that
$$D_{KL}\big(P(s(t+\tau)\,|\,s(t)) \,\|\, P^{cut}(s(t+\tau)\,|\,s(t))\big) = \frac{1}{2}\,\frac{1}{1 - m^2(t+\tau)}\,\frac{(\delta J)^2}{N}\left(\frac{\partial m(t+\tau)}{\partial J}\right)^2,$$
tending to zero for diverging size N. Using this expression, we find that for infinite size the value of N φ_effect diverges. However, computing the values for finite networks, we find that N φ_effect and N φ_cause do not diverge for finite values of N (Figure 5C). This can be interpreted as a phenomenon similar to that found in homogeneous Ising models (e.g., the Curie–Weiss model [13]), where the heat capacity, equivalent to the Fisher approximation of the Kullback–Leibler divergence computed here, does not diverge for finite sizes as the size of the system grows.
These results illustrate that different distance measures can have important effects on the behaviour of φ. Our results also show that different metrics can hold different relations between finite models and the mean-field behaviour of the model at infinite size. For the Wasserstein distance, φ in finite systems tends towards the diverging behaviour around the critical point characterised for the infinite mean-field model. Conversely, for the Kullback–Leibler divergence, φ does not diverge in finite models but does diverge for the mean-field infinite model (for the effect repertoire). In this case, the symmetry breaking of the system at infinite size causes the mean-field model to behave differently (a similar phenomenon takes place with simple measures such as the average magnetisation). This effect can be relevant when studying φ in real finite systems by computing their mean-field approximations.
Although most versions of integrated information measures have used the Kullback–Leibler divergence, IIT 3.0 recently suggested that the Wasserstein distance is a more appropriate measure [1] (Text S2). The results presented here show that the choice of distance measure can have important implications when measuring large systems. Further work should inspect whether different distance measures are able to capture the scaling behaviour of different systems and how this coheres with the properties of the systems under study. One way to do so could be to explore the relation of the Wasserstein and Kullback–Leibler versions of integrated information with well-known variables in Ising models, such as the magnetic susceptibility or the heat capacity of the system (see Appendix B and Appendix C).
As we have shown that, under the appropriate assumptions, both φ_cause and φ_effect have similar diverging tendencies, in the remaining examples we will not show these variables separately and will just show φ = min(φ_cause, φ_effect).

3.4. Situatedness: Effect of the Environment of a System

In IIT 3.0, there is a difference between a mechanism, for which a first level of integration φ is computed, and a candidate set, composed of different mechanisms, to which a second-order level of integration Φ applies. When computing integration at these two levels, assumptions are made about how the elements outside of the system are considered. In IIT 3.0, the elements inside the candidate set but outside of the mechanism are treated as independent sources of noise. In turn, elements outside the candidate set are treated as background conditions and considered fixed external constraints.
What are the effects of these assumptions when computing integrated information in a critical phase transition? We again measure the integration of a kinetic Ising model with one region of size N and coupling J. However, instead of considering the whole system, we measure the level of integration of a subsystem or mechanism M covering a fraction M/N of the system, where M is the size of the mechanism. We choose a value of M = 3N/4, although other fractions yield similar results. To compute Equation (11) with and without the partition, we divide the system into two regions: one consisting of the units belonging to the mechanism, and the other containing the units outside the mechanism. We measure the integrated information of the mechanism φ_M under three different assumptions: (a) units outside of the mechanism operate normally, (b) units outside the mechanism are independent noise sources, and (c) units outside the mechanism are fixed as external constraints.
In the first case, when external units operate normally (Figure 6A), we observe that the divergence of φ_M is maintained (although testing different values of M shows that φ_M increases with the size of the mechanism, see [14]). In contrast, if we accept the assumptions of IIT 3.0 and take the elements outside the mechanism as independent sources of noise or as static variables, the behaviour of φ_M changes radically. In the former case, when outside elements are independent noise sources, the divergence is maintained but takes place at a different value of the parameter J (Figure 6B). This happens because inputs from uniform independent sources of noise are distributed around a zero mean-field value, and thus the phase transition of the system takes place at larger values of J that compensate for the units that are now uncorrelated. Thus, considering the elements outside of the mechanism as independent sources of noise can be misleading, as it shows maximum integration taking place at different points of parameter space. In this case, the position of the divergence is located at larger values of J, corresponding to significantly lower values of covariance and fluctuations in the units of the system, and therefore not reflecting the actual operation of the mechanisms.
The latter assumption implies keeping the units outside of the mechanism frozen at the static values they had at time t. In this case (Figure 6C), we find that φ_M does not diverge; instead, it shows a peak on the ordered side of the phase transition. We can understand this by noting that the effect of constant fields is equal to adding a value of H_i equal to the input from static units, thereby breaking the symmetry of the system and precluding a critical phase transition. In both cases, we observe that ignoring or simplifying the coupling between a system and its environment can drastically affect the values of integration as a system scales.
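The symmetry-breaking effect of a static background can be illustrated with the mean-field self-consistency m = tanh(Jm + H), where a constant field H stands in for the input from frozen external units (an illustrative simplification, not the exact computation behind Figure 6C):

```python
import math

def mf_magnetisation(J, H, iters=20000):
    # mean-field self-consistency with a constant background field H:
    # m = tanh(J m + H)
    m = 1.0
    for _ in range(iters):
        m = math.tanh(J * m + H)
    return m

def dm_dJ(J, H, dJ=1e-4):
    # numerical susceptibility-like derivative of m with respect to J at fixed H
    return (mf_magnetisation(J + dJ, H) - mf_magnetisation(J - dJ, H)) / (2 * dJ)
```

With H = 0 the derivative blows up near J = 1, while any nonzero H leaves only a smooth, finite bump: the constant field removes the critical transition, just as the frozen background units do in the model.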

3.5. System-Level or Mechanism-Level Integration: Big Phi versus Small Phi

So far, we have analysed the behaviour of integration measures φ describing the integration of mechanisms of a system. IIT 3.0 postulates that the integration of a system is defined by a second-order measure of integration, Φ, applied over the set of all its mechanisms. To explore how Φ behaves for systems of different sizes, we compute it for a homogeneous system with one region, including the different modifications assumed in the previous subsections. Note that this modifies the measure, but it still allows us to inspect some of its scaling aspects.
For measuring Φ , first φ is computed for the different mechanisms of the system, and then the integration of the set of mechanisms is compared with the set of values of φ of the system under unidirectional partitions, using a modified Wasserstein distance [1] (Text S2). In the case of a homogeneous system with just one region, the MIP is any of the partitions that isolates one single node from the rest of the system. The value of Φ is the modified Wasserstein distance (with and without applying the MIP) between the values of φ of the set of mechanisms of the system.
In Figure 7, we compare the values of φ of the largest mechanism (Figure 7A) with the normalised values of Φ of the whole system (Figure 7B), for a homogeneous system with only one region with self-couplings J. The value of φ of the largest mechanism diverges around the critical point, as expected. In the case of Φ, we find that for all values of J integration grows very rapidly with size. This is due to the fact that the number of concepts (i.e., the number of mechanisms) of the system grows exponentially with size: $N_C = \sum_{k=1}^{N} \binom{N}{k} = 2^N - 1$. Thus, we normalise the value of Φ by dividing by N_C (Figure 7B). Using normalised values of Φ, we observe that integration still diverges at the critical point. Furthermore, in this case the divergence is faster than for φ, as it accumulates the effects of the divergence of many mechanisms under a second partition.
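The counting identity behind this normalisation is easy to verify directly:

```python
from math import comb

def n_mechanisms(N):
    # number of possible mechanisms: all non-empty subsets of N units
    return sum(comb(N, k) for k in range(1, N + 1))

# the identity used to normalise Phi: sum_k C(N, k) = 2^N - 1
assert n_mechanisms(128) == 2**128 - 1
```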
If we examine the contributions of different mechanisms to Φ, we observe that most of them are determined by the mechanisms affected by the MIP. In this case, all the value of φ is transported by the Wasserstein distance to a new point defined by an independent distribution [1] (Text S2) (e.g., for N = 128, around 98% of the value of Φ is defined by the value of φ of the mechanisms under the MIP).
In our example, the relation between φ and Φ does not seem particularly revealing (the divergence of the latter appears to be an amplified version of the former). Heterogeneous or sparsely connected systems may present more complicated relations and important differences between the behaviour of the highest-order φ and the total Φ. Still, we believe that our simple example calls for a better justification of the need to measure a second-order level of integration in IIT 3.0, and of how the two levels differ with respect to well-studied properties of systems.

3.6. Values versus Tendencies of Integration

Finally, we explore mechanism integration φ in the case of two homogeneous regions: a region A with self-interaction and a region E, which is coupled to the first without recurrent connections (i.e., J_EE = 0, Figure 8A). This case was used in [14] to represent an agent interacting with an environment, exploring the power of integrated information to delimit the most integrated part of the system. This delimitation has been proposed to identify the autonomy of small biological circuits [2], but it is still unclear whether the conclusions of analyses of such small systems and simplified models can be extended to larger models of neural and biological systems.
For different values of the recurrent connections J_AA = J_R, two values of the bidirectional couplings J_AE = J_C, J_EA = 2J_C are tested: J_C = 0.8 and J_C = 1.2. The results in [14] showed that, for infinite sizes, in the weaker coupling condition J_C = 0.8, A was the most integrated unit of the system at the critical point. In contrast, for the stronger coupling J_C = 1.2, the joint AE system was the one presenting higher integration for infinite size. In Figure 8, we show the values of integration of A and AE for different sizes (integration of E is always zero, as there are no recurrent connections for this region), with J_C = 0.8 (Figure 8B,C) and J_C = 1.2 (Figure 8D,E).
For J_C = 0.8, we observe that φ_A is always larger, independently of the size of the system, showing that A is always more integrated than AE. However, for J_C = 1.2, we find an interesting behaviour. We can observe that for small sizes (N = 8, 16) A is more integrated. Conversely, for larger sizes (N = 64, 128) we observe that AE is more integrated, as its value of φ_AE diverges faster with size than φ_A.
This is relevant because, in many cases, integrated information can only be measured in rather small systems. When analysing models of real neural or biological systems, these should be coarse-grained or discretised in order for φ measures to be applicable. In such cases, we can expect the delimitation of the most integrated units of the system to differ from that obtained at larger scales. Thus, rather than the exact value of φ, the diverging tendencies in the model might be most informative about the behaviour of the real observed system when small networks are considered.

4. Discussion

In this article, we critically reviewed different aspects of the definition of integrated information proposed by IIT 3.0, exemplifying them in toy models displaying critical phase transitions. Using a homogeneous Ising model, we simplified the calculations needed to measure integrated information in large systems. It is well known from the theory of spin glasses that the infinite-range homogeneous Ising model (also known as the Curie–Weiss model) presents a critical point at J = 1 and H = 0 [13]. Although we argue that critical phase transitions should be observable in integrated information measures (as critical points display long-range correlations, maximal susceptibility to parametric changes, and a balance of integrative and segregative tendencies, see [7,16]), we have shown how different aspects of the definition of φ prevent it from capturing the critical phase transition of the system as it grows larger in size. This investigation has led us to propose reformulations of some aspects of the theory in order to address some of the problems encountered during the study.
As IIT 3.0 has been mostly tested in small logic gate circuits, exploring the behaviour of integrated information in large Ising models has allowed us to investigate questions that were so far unexplored and inspect some of the assumptions of the theory from a new perspective. We consider that the value of the study is twofold. On one hand, we propose a family of models with known statistical properties, where calculations of integrated information are simplified. These and similar models could work as a benchmark for testing properties of integrated information in large systems. On the other hand, the reformulations of different aspects of the theory proposed during the paper could be considered by future versions of IIT, to capture some of the phenomena that we could expect in large, complex systems.
First, we explored how applying integrated information over an adequate timescale is important in order to observe increasing values of integration as the system scales. The dynamics of the Ising model are characterised by a “critical slowing down” as the critical point is approached. Consequently, we observed that, to capture critical diverging tendencies, timescales larger than one time step should be used. As the dynamics of critical systems display correlations at very different timescales, and the span of these timescales increases with the size of the system, integrated information should be evaluated in a way that captures this diversity of timescales. In our analysis, we found that capturing integration near critical points requires applying partitions differently than IIT 3.0 does. In IIT 3.0, partitions are applied by injecting noise in the input state of the system and then computing the forward and backward distributions, but this approach did not capture the phase transition in the model. In contrast, we successfully characterised the phase transition as diverging integrated information around the critical point by applying several updates of the state of the system and injecting noise at each update.
Second, to capture the cause repertoire of a state (integration in the causal dependencies of a mechanism with previous states), IIT 3.0 proposes to assume a uniform prior distribution of past states. We show that this assumption can distort the observed values of integration, losing an adequate characterisation of the critical phase transition. We suggest that the real prior distribution (either stationary or transient) should be used if cause repertoires are considered.
The third aspect we studied is the use of different distance measures between probability distributions. Specifically, we compared the Wasserstein distance used by IIT 3.0 with the Kullback–Leibler divergence, which is the choice for many competing definitions of integrated information. First, we show that values of the Kullback–Leibler divergence should be weighted by the size of the system in order to be comparable to the Wasserstein distance under the MIP; otherwise, they tend to zero as the system grows. We also show that, in a homogeneous kinetic Ising model at criticality, the Wasserstein distance shows diverging tendencies for finite sizes, whereas the Kullback–Leibler divergence only shows a finite peak. This shows that, in some cases, the Wasserstein distance may detect some divergences that would be ignored by the Kullback–Leibler divergence. Still, it should be debated whether it is adequate that a system like the toy model presented here shows a diverging value of integration. A closer examination of the behaviour of known quantities in an Ising model could constitute an adequate starting point for this discussion. In this sense, the results of the Wasserstein distance and Kullback–Leibler divergence can be connected with the behaviour of known quantities in the homogeneous Ising model. For example, the susceptibility of the system diverges at the critical point while the heat capacity of the system only shows a peak [13]. Both measures can be related to approximations of φ using the Wasserstein and Kullback–Leibler measures, respectively (from Equations (A22) and (A9)).
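This contrast between a diverging susceptibility and a merely peaking heat capacity can be checked directly in the equilibrium Curie–Weiss model by exact enumeration over magnetisation sectors (an illustrative equilibrium computation, not the kinetic model of the main text; the coupling grid and sizes are arbitrary choices):

```python
import numpy as np
from math import comb, log

def curie_weiss(N, J):
    # exact enumeration over magnetisation sectors of the equilibrium
    # Curie-Weiss model, E(M) = -J M^2 / (2 N), at inverse temperature 1
    M = np.array([2 * k - N for k in range(N + 1)], float)
    logw = np.array([log(comb(N, k)) for k in range(N + 1)]) + J * M**2 / (2 * N)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    E = -J * M**2 / (2 * N)
    # finite-size susceptibility proxy (|M| is used because <M> = 0 by symmetry)
    chi = (w @ M**2 - (w @ np.abs(M))**2) / N
    C = (w @ E**2 - (w @ E)**2) / N  # heat-capacity-like energy fluctuation
    return chi, C

Js = np.linspace(0.5, 1.5, 101)
peaks = {}
for N in (32, 256):
    vals = [curie_weiss(N, J) for J in Js]
    peaks[N] = (max(v[0] for v in vals), max(v[1] for v in vals))
```

In this sketch, the susceptibility peak grows with N (consistent with a divergence in the thermodynamic limit), while the energy-fluctuation peak stays bounded, mirroring the contrast between the Wasserstein- and Kullback–Leibler-based behaviours discussed above.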
Furthermore, we analysed a crucial aspect of integration measures that is often overlooked: the situatedness of the system. The central claim of situated approaches to cognitive behaviour is that the agent–environment coupling shapes brain dynamics in a manner that is essential to behavioural or cognitive functionality [18,19]. Thus, ignoring or dismissing this brain–body–environment coupling can result in a substantial quantitative and qualitative distortion of the activity of a neural system [20]. Besides, there are deep theoretical reasons, coming from the enactive perspective on cognition, establishing that the very notion of cognitive performance is world-involving, i.e., that it is co-constituted by agent and environment [21]. In contrast, IIT 3.0 dismisses the bidirectional interaction between the system under evaluation and its environment when computing integration, with the aim of assessing integrated information from the “intrinsic perspective” of the system itself. Specifically, IIT 3.0 considers the units outside the system (i.e., outside the candidate set) as static variables and the units within the system but outside the evaluated mechanism as independent sources of noise. We show in the model that both assumptions can have dramatic effects on the behaviour of the system. The assumption of static variables makes the divergence at the critical point disappear, and the assumption of independent sources of noise creates spurious divergences of integrated information at positions different from those of the original model. Only a situated version of integrated information, which does not dismiss the activity of the environment and its couplings to the system, can correctly measure integrated information even for a model as simple as ours. This suggests that the intrinsic notion of information, or the intrinsic perspective of a system, cannot dismiss the system's regulation of its coupling with the environment [22].
Thus, ignoring the coupling with the outside of a system can have important consequences for the application of integrated information measures in simulated and experimental setups. For example, in [23], different agents are characterised by the integrated information of their neural mechanisms, but ignoring the environment might miss important channels of sensorimotor coordination contributing to the integration of the system. Similarly, attempts to identify the physical substrate of consciousness in brains [24] should take into account situated and embodied aspects of brain activity, or even consider the possibility that (at least at some moments) this substrate can cut across the brain–body–world divisions, rather than being confined inside the brain [25].
In other experiments, we compared the values of mechanism-level and system-level integration (φ and Φ) in a homogeneous system with one region, finding that some normalisation constants are required to compare Φ across systems of different size. We found that Φ also diverges at the critical point, and does so faster than φ, due to the second partition applied and the accumulation of the different mechanisms of the system. Although here we compute Φ only for a very simple system, we suggest that the introduction of this second level of analysis should be better justified. In that sense, recent work explores very small networks, showing how the compositional definition of measures like Φ can yield very different results than the non-compositional mechanism-level measures φ [26]. Further work could try to better characterise the difference between the two levels in systems with analytically tractable properties, like the Ising systems with multiple regions presented here.
Finally, we compared the diverging tendencies of two coupled subregions, showing that the delimitation of integrated information might change with size, as the integration of some regions diverges faster than others. This is especially relevant as IIT 3.0 gives prominent relevance to the areas of the brain with maximal integration (the “neural substrate of consciousness”). If integration takes the form of diverging values of φ around certain classes of critical points, or regions (see [14]), then the neural substrate supporting maximal integration should be characterised by how fast integration diverges with size, and not by the value of integration yielded by simplified models (e.g., by coarse-graining observed time series), which can be potentially misleading.
These results on homogeneous kinetic Ising models show that the calculation of integrated information presents important challenges even in simple models. This work demonstrates that the measure is very susceptible to design assumptions and that its behaviour changes drastically because of them. In this scenario, we show how the connection between the theory (IIT 3.0), a theoretical understanding of complex dynamical systems (critical phase transitions), and the study of simplified models exemplifying known phenomena (homogeneous Ising models) offers a path to systematically study the implications of these assumptions. Our results compel researchers interested in IIT and related indices of complexity to apply such measures under careful examination of their design assumptions. Rather than applying the measure off-the-shelf, researchers should ideally be aware of the assumptions behind the measure and how it applies to each situation. In this way, theory can go hand in hand with cautious experimental applications, avoiding potentially misleading interpretations and ensuring that such measures are used to improve our understanding of biological and neural phenomena.

Funding

Miguel Aguilera was supported by the UPV/EHU post-doctoral training program ESPDOC17/17 and supported by project TIN2016-80347-R funded by the Spanish Ministry of Economy and Competitiveness and project IT1228-19 funded by Basque Government.

Acknowledgments

The author would like to thank Ezequiel Di Paolo for constructive criticism of the manuscript.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. List of Assumptions and Experiments

We show here a summary of the different assumptions made by IIT 3.0 and different sections of the manuscript. The different assumptions are described in Table A1. Moreover, in Table A2, we indicate which assumptions are considered by IIT 3.0 and the figures corresponding to different experiments in the article.
Note that, due to the computational cost of some of the measures, different experiments use different system sizes, and consequently they explore different ranges of couplings, as in larger systems the peaks of values related to critical divergences take place closer to the critical point.
Table A1. Description of the different assumptions considered in the article.
Assumption | Description
Homogeneous connectivity | In order to simplify the computation of probability distributions and the MIP in the thermodynamic limit, we assume that the system is divided into a number of homogeneous regions. All units within a region share the same inter-/intra-region coupling values.
Equal mechanism and purview | To simplify calculations, we assume that the purview of the system is equal to the mechanism. In contrast, IIT 3.0 selects the purview that yields maximum integration φ.
MIP cuts either a single node or entire regions | In the thermodynamic limit, when all couplings are positive, the MIP of a homogeneous Ising model is either a partition that cuts a single node from the mechanism or one that separates an entire region (see Appendix B.3). We assume that the same applies to finite systems.
Initial noise injection | When transition probability matrices describe several updates, IIT 3.0 assumes that partitions only inject noise in the initial state.
Continuous noise injection | In contrast with IIT 3.0, in some cases we assume that partitions inject noise at every update of the system.
Independent prior | In order to compute the cause repertoire of a mechanism, IIT 3.0 assumes a uniform prior distribution to apply Bayes' rule (Equation (17)).
Stationary prior | Alternatively, in some cases we assume a stationary prior to compute cause repertoires of a mechanism (Section 3.2).
Wasserstein distance | In IIT 3.0, distances between distributions are computed using the Wasserstein distance.
Kullback–Leibler divergence | Many alternative measures of integration (including previous versions of IIT) are based on the Kullback–Leibler divergence.
Table A2. List of assumptions considered by IIT 3.0 and by the results of the different experiments in the article.
Assumptions & Experiments: IIT 3.0 | Figure 3A | Figure 3B | Figure 4A | Figure 4B | Figure 5A | Figure 5B,C | Figure 6A and Figure 8 | Figure 6B,C | Figure 7A,B
Homogeneous connectivity
Equal mechanism and purview
MIP is a single node or entire regions
Initial noise injection
Continuous noise injection
Independent prior
Stationary prior
Wasserstein distance
KL divergence
Environment decoupling
Environment coupling

Appendix B. Wasserstein Distance

The Wasserstein distance (or Earth Mover's distance), $D_W(P(s), Q(s))$, is defined as the minimum "cost of transportation" incurred when transforming one probability distribution $P(s)$ into another $Q(s)$. This cost combines the amount of "mass" moved from each state of $P(s)$ to each state of $Q(s)$, given by the matrix $\mathbf{W}$, and the distance over which this mass is transported, given by $d(s_I, s_J)$, the Hamming distance between $s_I$ and $s_J$, which counts the number of positions in which two strings differ. Thus, the Wasserstein distance is defined as
$$ D_W(P(s), Q(s)) = \min_{\mathbf{W} \in \mathcal{W}(P,Q)} \sum_{s_I, s_J} d(s_I, s_J)\, W_{IJ}, $$
where the indices $I, J$ run over the set of possible states of the array $s$, and $\mathcal{W}(P,Q)$ represents the set of transport matrices that satisfy $\sum_J W_{IJ} = P(s_I)$, $\sum_I W_{IJ} = Q(s_J)$ and $\sum_{IJ} W_{IJ} = 1$.
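The optimisation above is a small linear program, and for systems of a few binary units it can be solved exactly. The following is an illustrative sketch, not part of the article's code: it assumes SciPy is available, and the function name and example distributions are ours.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def hamming(a, b):
    # number of positions in which two binary strings differ
    return sum(x != y for x, y in zip(a, b))

def wasserstein_hamming(P, Q, n_units):
    """Solve the optimal-transport linear program with Hamming ground distance."""
    states = list(itertools.product([0, 1], repeat=n_units))
    S = len(states)
    # cost vector d(s_I, s_J) for the flattened transport matrix W_IJ
    c = np.array([hamming(sI, sJ) for sI in states for sJ in states], dtype=float)
    # marginal constraints: sum_J W_IJ = P(s_I) and sum_I W_IJ = Q(s_J)
    A_eq = np.zeros((2 * S, S * S))
    for I in range(S):
        A_eq[I, I * S:(I + 1) * S] = 1.0   # row sums -> P
    for J in range(S):
        A_eq[S + J, J::S] = 1.0            # column sums -> Q
    b_eq = np.concatenate([P, Q])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

# point mass on (0,0) vs point mass on (1,1): both units must flip
P = np.array([1.0, 0.0, 0.0, 0.0])
Q = np.array([0.0, 0.0, 0.0, 1.0])
print(wasserstein_hamming(P, Q, 2))
```

The normalisation constraint $\sum_{IJ} W_{IJ} = 1$ is implied by the marginal constraints, since $P$ sums to one.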

Appendix B.1. Finite Size

For networks with homogeneous regions of finite size, the Wasserstein distance can be computed directly from the aggregate variables S . This is justified as follows.
For each value of $S_K$, there will be a set of corresponding states $\{s_I\}_{\Sigma(s_I) = S_K}$, where $\Sigma(s_I)$ is the transformation that obtains the aggregate value of each region, such that for each region $U$ we have $S_U(t) = \sum_{i \in U} (1 + s_i(t))/2$.
We note that if $\Sigma(s_I) = \Sigma(s_J) = S_K$, then in homogeneous systems the probabilities of those states are identical: $P(s_I) = P(s_J)$. For a pair of values $S_K, S_L$ with $\Sigma(s_I) = S_K$, $\Sigma(s_J) = S_L$, consider transporting the set of identical probabilities $\{P(s_I)\}_{\Sigma(s_I) = S_K}$ into $\{P(s_J)\}_{\Sigma(s_J) = S_L}$. As the same amount of mass sits at every source and every destination, there is always an optimal transport scheme such that
$$ W_{IJ} = \begin{cases} 0, & \text{if } d(s_I, s_J) > \min\limits_{\Sigma(s_I) = S_K,\, \Sigma(s_J) = S_L} d(s_I, s_J), \\ W^*_{IJ}, & \text{otherwise.} \end{cases} $$
Moreover,
$$ \min_{\Sigma(s_I) = S_K,\, \Sigma(s_J) = S_L} d(s_I, s_J) = d(S_K, S_L) = \sum_U \big| S_{K,U} - S_{L,U} \big|. $$
In this case, we can rewrite Equation (A1) as
$$ D_W(P(s), Q(s)) = D_W(P(S), Q(S)) = \min_{\mathbf{W} \in \mathcal{W}(P(S), Q(S))} \sum_{S_K, S_L} d(S_K, S_L)\, W_{KL}. $$

Appendix B.2. Infinite Size

In the infinite-size kinetic Ising model with homogeneous connectivity, probability distributions are the product of independent unit distributions $P(s_i(t\pm\tau) \mid s(t)) = \frac{1 + s_i(t\pm\tau)\, m_i(t\pm\tau)}{2}$. Thus, the cost of transport can be defined as the sum of the individual costs of the independent distributions:
$$ D_W\big(P(s(t\pm\tau)\mid s(t)) \,\|\, P^{cut}(s(t\pm\tau)\mid s(t))\big) = \frac{1}{2} \sum_i \big| m_i(t\pm\tau) - m_i^{cut}(t\pm\tau) \big|. $$
The mean activation rate of a unit can be interpreted as a function of the couplings $\mathbf{J}$, $m_i(t\pm\tau \mid \mathbf{J})$. We describe the mean rate under a partition cut as $m_i^{cut}(t\pm\tau) = m_i(t\pm\tau \mid \mathbf{J} + \mathbf{dJ})$, where $dJ_{ij} = -J_{ij}$ if $J_{ij} \in J^{cut}$ and $dJ_{ij} = 0$ otherwise.
We assume a homogeneous system divided into a number of regions, with $J_{ij} = \frac{1}{N} J_{U,V}$, and that the partition affects a small number of connections (see Appendix B.3). In the thermodynamic limit, the mean rate can then be described by the first-order term of a Taylor expansion around $\mathbf{dJ} = 0$:
$$ m_i(t\pm\tau \mid \mathbf{J} + \mathbf{dJ}) = m_i(t\pm\tau \mid \mathbf{J}) + \sum_{J_{ij} \in J^{cut}} \frac{\partial m_i(t\pm\tau)}{\partial J_{ij}}\, dJ_{ij}, $$
then the Wasserstein distance is
$$ D_W\big(P(s(t\pm\tau)\mid s(t)) \,\|\, P^{cut}(s(t\pm\tau)\mid s(t))\big) = \frac{1}{2} \sum_i \Big| \sum_{J_{kl} \in J^{cut}} \frac{\partial m_i(t\pm\tau)}{\partial J_{kl}}\, J_{kl} \Big|. $$
We assume that the system is divided into homogeneous regions, with $J_{kl} = \frac{1}{N} J_{UV}$ for $k \in U$, $l \in V$:
$$ D_W\big(P(s(t\pm\tau)\mid s(t)) \,\|\, P^{cut}(s(t\pm\tau)\mid s(t))\big) = \frac{1}{2} \sum_i \Big| \sum_{UV} \sum_{\substack{k \in U,\, l \in V \\ J_{kl} \in J^{cut}}} \frac{\partial m_i(t\pm\tau)}{\partial J_{kl}}\, J_{kl} \Big| = \frac{1}{2} \sum_W N_W \Big| \sum_{UV} N^{cut}[U,V]\, \frac{\partial m_W(t\pm\tau)}{\partial J_{UV}} \frac{J_{UV}}{N} \Big|, $$
where $N^{cut}[U,V]$ is the number of connections from region $U$ to region $V$ affected by the partition cut, as shown in [14]. Note that the terms $\frac{\partial m_i(t\pm\tau)}{\partial J_{kl}} J_{kl}$ are always equal for $i \neq k$, and the case $i = k$ can be neglected in the thermodynamic limit. Also note that integrated information for infinite size in the homogeneous kinetic Ising model with one region is equivalent to the magnetic susceptibility of the system [13].
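For intuition, the distance of Equation (A7) can be evaluated directly from the region-level mean-field update of the model, $m(t+1) = \tanh(\mathbf{J} m(t))$. This is a minimal sketch with assumed coupling values (the variable names and example numbers are ours, not from the article):

```python
import numpy as np

def mean_field_update(m, J):
    # region-level mean-field update of the kinetic Ising model: m(t+1) = tanh(J m(t))
    return np.tanh(J @ m)

# two homogeneous regions with assumed region-level couplings J_UV;
# the partition removes the connections from region V=1 into region U=0
J = np.array([[1.2, 0.3],
              [0.3, 1.2]])
J_cut = J.copy()
J_cut[0, 1] = 0.0

m0 = np.array([0.5, 0.5])              # current mean rates
m = mean_field_update(m0, J)           # intact effect repertoire
m_cut = mean_field_update(m0, J_cut)   # partitioned effect repertoire

# per-unit Wasserstein distance, following Eq. (A7): (1/2) * sum |m_i - m_i^cut|
D_W_per_unit = 0.5 * np.abs(m - m_cut).sum()
```

With equal-size regions, multiplying the per-unit distance by $N/2$ recovers the extensive sum over units in Equation (A7).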

Appendix B.3. Minimum Information Partition in the Thermodynamic Limit

In the homogeneous kinetic Ising model, any conditional distribution can be computed as a product of independent distributions by recursively computing Equation (13). When computing integrated information at infinite size using Equation (A9), if all couplings are positive, the sign of all $\frac{\partial m_W(t\pm\tau)}{\partial J_{UV}}$ is the same. The MIP is then the partition that minimises
$$ D_W\big(P(s(t\pm\tau)\mid s(t)) \,\|\, P^{cut}(s(t\pm\tau)\mid s(t))\big) = \frac{1}{2} \sum_{UV} N^{cut}[U,V]\, F[U,V], $$
$$ F[U,V] = \sum_W N_W \frac{\partial m_W(t\pm\tau)}{\partial J_{UV}} \frac{J_{UV}}{N}, $$
with $F[U,V] > 0$.
We can describe the number of cut connections as $N^{cut}[U,V] = N_U N_V (f_U^c + f_V^f - 2 f_U^c f_V^f)$, where $f_U^c$ is the fraction of units of region $U$ cut by the partition in the current state, and $f_V^f$ is the fraction of units of region $V$ cut by the partition in the future (or past) state. The only constraints are that $f_U = n_U / N_U$ with $n_U \in \mathbb{Z}$, $0 \le n_U \le N_U$, $\max(\sum_U n_U^c, \sum_U n_U^f) > 0$ and $\min(\sum_U n_U^c, \sum_U n_U^f) < N$.
Take, for example, a specific region $U$; we can decompose the distance function as
$$ D_W = \sum_V N_U N_V \Big( f_V^f + f_U^c (1 - 2 f_V^f) \Big) F[U,V] + \sum_{U' \neq U,\, V} N_{U'} N_V \big( f_{U'}^c + f_V^f - 2 f_{U'}^c f_V^f \big) F[U',V]. $$
If we only consider changes in the value of $f_U^c$, the distance is minimised by
$$ f_U^c = \begin{cases} 0, & \text{if } \sum_V (1 - 2 f_V^f)\, F[U,V] > 0, \\ 1, & \text{otherwise,} \end{cases} $$
and the function varies monotonically with $f_U^c$.
Repeating this observation for every possible $f_U^c$, $f_V^f$, any partition with some $0 < f < 1$ cannot be the MIP; the MIP must be a partition in which all values of $f$ are either 0 or 1. A trivial solution is one where all $f = 0$ or all $f = 1$, but this violates one of the last two constraints above. The closest solutions that comply with the constraints are those in which one $f_U^c = \frac{1}{N_U}$ or one $f_V^f = \frac{1}{N_V}$, with all remaining values $f_U^c = 0$, $f_V^f = 0$.
That is, the space of possible partitions that could constitute the MIP is restricted to (a) partitions that isolate a single unit in the current or the future (or past) states, or (b) partitions in which all elements of a region in the current or the future (or past) states belong to the same part.
In case (a), if the unit isolated by the partition belongs to region U in the current state, the distance is
$$ D_W\big(P(s(t\pm\tau)\mid s(t)) \,\|\, P^{cut}(s(t\pm\tau)\mid s(t))\big) = \frac{1}{2} \sum_V N_V\, F[U,V], $$
whereas when the isolated unit belongs to the future (or past) state, the mean rate of the isolated unit $i$ becomes $m_i^{cut}(t\pm\tau) = 0$. Thus, according to Equation (A7), the distance is
$$ D_W\big(P(s(t\pm\tau)\mid s(t)) \,\|\, P^{cut}(s(t\pm\tau)\mid s(t))\big) = \frac{1}{2} m_U(t\pm\tau) + \frac{1}{2} \Big(1 - \frac{1}{N_U}\Big) \sum_V N_V\, F[U,V]. $$
Thus, in the thermodynamic limit, in case (a) the isolated unit will always belong to the present state, and never to the purview. An equivalent argument applies to the cause repertoire when a stationary prior is assumed.
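The counting formula $N^{cut}[U,V] = N_U N_V (f_U^c + f_V^f - 2 f_U^c f_V^f)$ used in this argument can be checked by brute force; a minimal sketch (function and variable names are ours):

```python
def n_cut(N_U, N_V, f_Uc, f_Vf):
    """Connections from U (current state) to V (future state) severed by a
    partition cutting a fraction f_Uc of U's units and f_Vf of V's units:
    a connection is cut when its two endpoints fall on different sides."""
    return N_U * N_V * (f_Uc + f_Vf - 2 * f_Uc * f_Vf)

# brute-force check: mark n_c units of U and n_f units of V as cut,
# then count connections whose endpoints lie on different sides
N_U, N_V, n_c, n_f = 4, 5, 1, 2
cut_u = [i < n_c for i in range(N_U)]
cut_v = [j < n_f for j in range(N_V)]
count = sum(cut_u[i] != cut_v[j] for i in range(N_U) for j in range(N_V))
assert abs(count - n_cut(N_U, N_V, n_c / N_U, n_f / N_V)) < 1e-9
```

Note that isolating a single unit of $U$ in the current state ($f_U^c = 1/N_U$, all other $f = 0$) severs exactly $N_V$ connections per target region, as used in case (a) above.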

Appendix B.4. Minimum Information Partition in the Thermodynamic Limit for Computing Φ

The MIP of $\Phi$ in the thermodynamic limit of a kinetic Ising model with one homogeneous region with couplings $J_{ij} = \frac{J}{N}$ can be approximated by computing the first term of the Taylor expansion of $D_W^*(C(J), C^{cut}(J + dJ))$ (from Equation (5)) around $dJ = 0$:
$$ D_W^*\big(C(J), C^{cut}(J + dJ)\big) = N^{cut} \left. \frac{\partial D_W^*\big(C(J), C^{cut}(J + dJ)\big)}{\partial J} \right|_{dJ = 0} \frac{J}{N}. $$
As in previous cases, if the approximation is accurate (the partition is small), then $\frac{\partial D_W^*(C(J), C^{cut}(J + dJ))}{\partial J}$ should be positive, and the MIP will be a partition that cuts one single node, i.e., $N^{cut} = 1$.

Appendix C. Kullback–Leibler Divergence

Many versions of φ use the Kullback–Leibler divergence as an alternative distance measure to the Wasserstein distance.
The Kullback–Leibler divergence is defined as
$$ D_{KL}(P(s) \,\|\, Q(s)) = \sum_s P(s) \log \frac{P(s)}{Q(s)}. $$

Appendix C.1. Finite Size

For finite sizes, the Kullback–Leibler divergence $D_{KL}(P(s) \| Q(s))$ is equivalent to the divergence of the aggregate variables, $D_{KL}(P(S) \| Q(S))$. This can be shown as follows. The probability $P(S)$ is equal to the sum of the probabilities $P(s)$ over all states such that $\Sigma(s) = S$, so that
$$ P(S) = \sum_{\Sigma(s) = S} P(s) = P(s) \prod_U \binom{N_U}{S_U}; $$
then, since the combinatorial factors cancel in the ratio $P(s)/Q(s) = P(S)/Q(S)$, we have that
$$ D_{KL}(P(s) \,\|\, Q(s)) = \sum_s P(s) \log \frac{P(s)}{Q(s)} = \sum_s P(s) \log \frac{P(S)}{Q(S)} = D_{KL}(P(S) \,\|\, Q(S)). $$
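This equivalence can be checked numerically for a single homogeneous region of independent binary units, where the aggregate variable follows a binomial distribution. A sketch with assumed activation probabilities (all names are ours):

```python
from itertools import product
from math import comb, log, prod

def kl(p, q):
    # discrete Kullback-Leibler divergence
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

N, a, b = 4, 0.7, 0.55   # region size and unit "on" probabilities (assumed values)
states = list(product([0, 1], repeat=N))
# full-state distributions: products of independent unit probabilities
P = [prod(a if s else 1 - a for s in st) for st in states]
Q = [prod(b if s else 1 - b for s in st) for st in states]
# aggregate variable S = number of active units: binomial distributions,
# including the combinatorial factor from the equation above
P_S = [comb(N, k) * a**k * (1 - a)**(N - k) for k in range(N + 1)]
Q_S = [comb(N, k) * b**k * (1 - b)**(N - k) for k in range(N + 1)]
assert abs(kl(P, Q) - kl(P_S, Q_S)) < 1e-12
```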

Appendix C.2. Infinite Size

In the infinite-size kinetic Ising model with homogeneous connectivity, as the states of the units of the system are independent, the Kullback–Leibler divergence can be defined as the sum of individual divergences. As $P(s_i(t\pm\tau) \mid s(t)) = \frac{1 + s_i(t\pm\tau)\, m_i(t\pm\tau)}{2}$, we have that
$$ D_{KL}\big(P(s(t\pm\tau)\mid s(t)) \,\|\, P^{cut}(s(t\pm\tau)\mid s(t))\big) = -\sum_i \sum_{s_i} P(s_i(t\pm\tau)\mid s(t)) \log\left( 1 + \frac{\big(m_i^{cut}(t\pm\tau) - m_i(t\pm\tau)\big)\, s_i}{1 + m_i(t\pm\tau)\, s_i} \right). $$
In the thermodynamic limit, if a partition cuts a small number of connections, $m_i^{cut}(t\pm\tau) - m_i(t\pm\tau)$ is small, and we can approximate the divergence by a second-order Taylor expansion (the first-order term vanishes):
$$ D_{KL}\big(P(s(t\pm\tau)\mid s(t)) \,\|\, P^{cut}(s(t\pm\tau)\mid s(t))\big) = \frac{1}{2} \sum_i \sum_{s_i} \frac{1}{2} \frac{1}{1 + m_i(t\pm\tau)\, s_i} \Big( \sum_{J_{kl} \in J^{cut}} \frac{\partial m_i(t\pm\tau)}{\partial J_{kl}}\, J_{kl} \Big)^2 = \frac{1}{2} \sum_i \frac{1}{1 - m_i^2(t\pm\tau)} \sum_{J_{kl}, J_{mn} \in J^{cut}} \frac{\partial m_i(t\pm\tau)}{\partial J_{kl}} \frac{\partial m_i(t\pm\tau)}{\partial J_{mn}}\, J_{kl} J_{mn}. $$
We assume that the system is divided into homogeneous regions, with $J_{kl} = \frac{1}{N} J_{UV}$ for $k \in U$, $l \in V$:
$$ D_{KL}\big(P(s(t\pm\tau)\mid s(t)) \,\|\, P^{cut}(s(t\pm\tau)\mid s(t))\big) = \frac{1}{2} \sum_i \frac{1}{1 - m_i^2(t\pm\tau)} \sum_{UV} \sum_{\substack{k \in U,\, l \in V \\ J_{kl} \in J^{cut}}} \sum_{ST} \sum_{\substack{m \in S,\, n \in T \\ J_{mn} \in J^{cut}}} \frac{\partial m_i(t\pm\tau)}{\partial J_{kl}} \frac{\partial m_i(t\pm\tau)}{\partial J_{mn}}\, J_{kl} J_{mn} = \frac{1}{2} \sum_W \frac{N_W}{1 - m_W^2(t\pm\tau)} \sum_{UV, ST} N^{cut}[U,V]\, N^{cut}[S,T]\, \frac{\partial m_W(t\pm\tau)}{\partial J_{UV}} \frac{\partial m_W(t\pm\tau)}{\partial J_{ST}} \frac{J_{UV} J_{ST}}{N^2}, $$
where $N^{cut}[U,V]$ is the number of connections from region $U$ to region $V$ affected by the partition cut, as shown in [14]. Note that the terms $\frac{\partial m_i(t\pm\tau)}{\partial J_{kl}} J_{kl}$ are always equal for $i \neq k$, and the case $i = k$ can be neglected in the thermodynamic limit.

Appendix C.3. Minimum Information Partition in the Thermodynamic Limit for the Kullback–Leibler Divergence

The approximation of the MIP in the thermodynamic limit is harder in the case of the Kullback–Leibler divergence due to the quadratic terms. However, in this article we only compute φ using the Kullback–Leibler divergence in the case of one region with coupling $J$. In this case, Equation (A22) becomes
$$ D_{KL}\big(P(s(t\pm\tau)\mid s(t)) \,\|\, P^{cut}(s(t\pm\tau)\mid s(t))\big) = \frac{1}{2} \frac{N}{1 - m^2(t\pm\tau)}\, N_{cut}^2 \left( \frac{\partial m(t\pm\tau)}{\partial J} \right)^2 \left( \frac{J}{N} \right)^2, $$
where $N_{cut}$ is the number of connections cut by the partition. In this case, as $\left(\frac{\partial m(t\pm\tau)}{\partial J}\right)^2 J^2$ is always positive for positive couplings, the divergence grows with $N_{cut}^2$, and the MIP is just the partition that cuts one single node.
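The behaviour of this single-region expression near the critical point can be explored numerically, using the implicit derivative of the stationary solution $m = \tanh(Jm)$. A sketch under our own naming and a simple fixed-point iteration:

```python
import numpy as np

def stationary_m(J, iters=10000):
    # positive fixed point of the mean-field equation m = tanh(J m)
    m = 0.9
    for _ in range(iters):
        m = np.tanh(J * m)
    return m

def dm_dJ(J, m):
    # implicit differentiation of m = tanh(J m):
    # dm/dJ = m (1 - m^2) / (1 - J (1 - m^2)), diverging as J -> 1+
    return m * (1 - m**2) / (1 - J * (1 - m**2))

def d_kl_cut(J, N, N_cut=1):
    # single-region KL divergence under a partition cutting N_cut connections:
    # (1/2) * N / (1 - m^2) * N_cut^2 * (dm/dJ)^2 * (J/N)^2
    m = stationary_m(J)
    return 0.5 * (N / (1 - m**2)) * N_cut**2 * dm_dJ(J, m)**2 * (J / N)**2
```

The divergence vanishes in the subcritical phase (where $m = 0$), grows quadratically with $N_{cut}$ (so the MIP cuts a single node), and peaks just above the critical point $J = 1$, where $\partial m / \partial J$ diverges.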

References

1. Oizumi, M.; Albantakis, L.; Tononi, G. From the phenomenology to the mechanisms of consciousness: Integrated information theory 3.0. PLoS Comput. Biol. 2014, 10, e1003588.
2. Marshall, W.; Kim, H.; Walker, S.I.; Tononi, G.; Albantakis, L. How causal analysis can reveal autonomy in models of biological systems. Phil. Trans. R. Soc. A 2017, 375, 20160358.
3. Miyahara, K.; Witkowski, O. The integrated structure of consciousness: Phenomenal content, subjective attitude, and noetic complex. Phenom. Cogn. Sci. 2019, 18, 731–758.
4. Cerullo, M.A. The Problem with Phi: A Critique of Integrated Information Theory. PLoS Comput. Biol. 2015, 11.
5. Barrett, A.B.; Mediano, P.A. The Phi measure of integrated information is not well-defined for general physical systems. J. Conscious. Stud. 2019, 26, 11–20.
6. Mediano, P.A.M.; Seth, A.K.; Barrett, A.B. Measuring Integrated Information: Comparison of Candidate Measures in Theory and Simulation. Entropy 2019, 21, 17.
7. Salinas, S.R.A. The Ising Model. In Introduction to Statistical Physics; Salinas, S.R.A., Ed.; Graduate Texts in Contemporary Physics; Springer: New York, NY, USA, 2001; pp. 257–276.
8. Salinas, S.R.A. Scaling Theories and the Renormalization Group. In Introduction to Statistical Physics; Springer: New York, NY, USA, 2001; pp. 277–304.
9. Beggs, J.M. The criticality hypothesis: How local cortical networks might optimize information processing. Philos. Trans. R. Soc. A 2007, 366, 329–343.
10. Barrett, A.B.; Seth, A.K. Practical Measures of Integrated Information for Time-Series Data. PLoS Comput. Biol. 2011, 7, e1001052.
11. Oizumi, M.; Amari, S.; Yanagawa, T.; Fujii, N.; Tsuchiya, N. Measuring Integrated Information from the Decoding Perspective. PLoS Comput. Biol. 2016, 12, e1004654.
12. Hoel, E.P.; Albantakis, L.; Marshall, W.; Tononi, G. Can the macro beat the micro? Integrated information across spatiotemporal scales. Neurosci. Conscious. 2016, 2016.
13. Kochmański, M.; Paszkiewicz, T.; Wolski, S. Curie–Weiss magnet: A simple model of phase transition. Eur. J. Phys. 2013, 34, 1555–1573.
14. Aguilera, M.; Di Paolo, E. Integrated information in the thermodynamic limit. Neural Netw. 2019.
15. Pressé, S.; Ghosh, K.; Lee, J.; Dill, K.A. Principles of maximum entropy and maximum caliber in statistical physics. Rev. Mod. Phys. 2013, 85, 1115–1141.
16. Tegmark, M. Consciousness as a state of matter. Chaos Soliton. Fract. 2015, 76, 238–270.
17. Marshall, W.; Albantakis, L.; Tononi, G. Black-boxing and cause-effect power. PLoS Comput. Biol. 2018, 14, e1006114.
18. Chiel, H.J.; Beer, R.D. The brain has a body: Adaptive behavior emerges from interactions of nervous system, body and environment. Trends Neurosci. 1997, 20, 553–557.
19. Clark, A. The Dynamical Challenge. Cogn. Sci. 1997, 21, 461–481.
20. Aguilera, M.; Bedia, M.G.; Santos, B.A.; Barandiaran, X.E. The situated HKB model: How sensorimotor spatial coupling can alter oscillatory brain dynamics. Front. Comput. Neurosci. 2013, 7.
21. Di Paolo, E.; Buhrmann, T.; Barandiaran, X. Sensorimotor Life: An Enactive Proposal; Oxford University Press: Oxford, UK, 2017.
22. Di Paolo, E.A. Autopoiesis, Adaptivity, Teleology, Agency. Phenomenol. Cogn. Sci. 2005, 4, 429–452.
23. Albantakis, L.; Hintze, A.; Koch, C.; Adami, C.; Tononi, G. Evolution of integrated causal structures in animats exposed to environments of increasing complexity. PLoS Comput. Biol. 2014, 10, e1003966.
24. Tononi, G.; Boly, M.; Massimini, M.; Koch, C. Integrated information theory: From consciousness to its physical substrate. Nat. Rev. Neurosci. 2016, 17, 450–461.
25. Thompson, E.; Varela, F.J. Radical embodiment: Neural dynamics and consciousness. Trends Cogn. Sci. 2001, 5, 418–425.
26. Albantakis, L.; Tononi, G. Causal Composition: Structural Differences among Dynamically Equivalent Systems. Entropy 2019, 21, 989.
Figure 1. (A) Description of the infinite size kinetic Ising model. (B) Description of the partition schema used to define perturbations. Partitioned connections (black arrows) are injected with random noise. Nonpartitioned connections operate normally or are independent sources of noise (see Section 3.4).
Figure 2. Description of the behaviour of the homogeneous Ising model with one region and coupling J, showing a critical point at J = 1. (A) Values of the mean firing rate m for the stationary solution of the kinetic Ising model with one homogeneous region. (B) Values of $\frac{\partial m}{\partial J}$ for the positive stationary solution, which diverge at the critical point.
Figure 3. Integration of the effect repertoire $\varphi_{\mathrm{effect}}(\tau)$ of the largest mechanism of a homogeneous Ising model with one region of size N = 256 and coupling J for different temporal spans τ, assuming (A) initial injection of noise and (B) continuous injection of noise. Note that for τ = 1, $\varphi_{\mathrm{effect}}$ has the same value in both cases.
Figure 4. Integration of the cause repertoire $\varphi_{\mathrm{cause}}(\tau)$ of the largest mechanism of a homogeneous Ising model with one region of size N = 256 and coupling J for different temporal spans τ, assuming (A) an independent prior and (B) the stationary distribution as a prior. Continuous noise injection is assumed.
Figure 5. Integrated information φ(τ) for the cause (black lines) and effect (grey lines) repertoires of the largest mechanism of a homogeneous kinetic Ising model with one region of size N (and infinite size when N → ∞) and coupling J, using (A) the Wasserstein distance and (B) the Kullback–Leibler divergence; (C) values of φN using the Kullback–Leibler divergence. Note that in all cases φ(τ) = φ_cause(τ). All cases are computed with $\tau = 10 \log_2 N$ for finite systems and τ → ∞ for infinite systems. Continuous noise injection and a stationary prior are assumed.
Figure 6. Effects of the environment on integrated information. Integrated information $\varphi^M(\tau)$ (black lines) of a mechanism M of size $\frac{3N}{4}$ in a homogeneous kinetic Ising model with one region of size N and coupling J, assuming that elements outside of the mechanism operate (A) normally, (B) as independent sources of noise and (C) as static input fields. Values of $\varphi^M(\tau)$ are compared with $\varphi^M_{\mathrm{effect}}(\tau)$ (grey lines), which tends to show larger values, to highlight the diverging tendencies of the effect repertoire. Values of φ are computed with $\tau = 10 \log_2 N$ for finite systems and τ → ∞ for infinite systems. Continuous noise injection and a stationary prior are assumed.
Figure 7. Mechanism-level and system-level integration in a homogeneous system with one region of size N and coupling J. (A) Values of φ of the largest mechanism and (B) values of Φ for the whole system. Measures are computed with $\tau = 10 \log_2 N$, assuming continuous noise injection, stationary priors and environment coupling.
Figure 8. Integrated information in a system coupled to an environment. (A) Structure of couplings between the two regions A, E of size $N_A = N_E = \frac{N}{2}$ in a homogeneous kinetic Ising model with couplings $J_{AA} = J_R$, $J_{EE} = 0$, $J_{AE} = J_C$, $J_{EA} = 2 J_C$. (B–E) Integrated information of the mechanism A, $\varphi^A$, and of the mechanism AE, $\varphi^{AE}$, for values of $J_R = 1$ and $J_C = 0.8$ and $J_C = 1.2$, respectively. Values of φ are computed with $\tau = 10 \log_2 N$ for finite systems and τ → ∞ for infinite systems. Continuous noise injection, stationary priors and environment coupling are assumed.