Measuring Integrated Information: Comparison of Candidate Measures in Theory and Simulation

Integrated Information Theory (IIT) is a prominent theory of consciousness that has at its centre measures that quantify the extent to which a system generates more information than the sum of its parts. While several candidate measures of integrated information (“Φ”) now exist, little is known about how they compare, especially in terms of their behaviour on non-trivial network models. In this article, we provide clear and intuitive descriptions of six distinct candidate measures. We then explore the properties of each of these measures in simulation on networks consisting of eight interacting nodes, animated with Gaussian linear autoregressive dynamics. We find a striking diversity in the behaviour of these measures—no two measures show consistent agreement across all analyses. A subset of the measures appears to reflect some form of dynamical complexity, in the sense of simultaneous segregation and integration between system components. Our results help guide the operationalisation of IIT and advance the development of measures of integrated information and dynamical complexity that may have more general applicability.


Introduction
Since the seminal work of Tononi, Sporns and Edelman [45], and more recently, of Balduzzi and Tononi [5], there have been many valuable contributions in neuroscience towards understanding and quantifying the dynamical complexity of a wide variety of systems. A system is said to be dynamically complex if it shows a balance between two competing tendencies, namely
• integration, i.e. the system behaves as one; and
• segregation, i.e. the parts of the system behave independently.
The notion of dynamical complexity has also been variously described as a balance between order and disorder, or between chaos and synchrony, and has been related to criticality and metastability [31]. Many quantitative measures of dynamical complexity have been proposed, but a theoretically principled, one-size-fits-all measure remains elusive.
A prominent framework highlighting the extent of simultaneous integration and segregation is Integrated Information Theory (IIT), which studies dynamical complexity from information-theoretic principles. Measures of integrated information attempt to quantify the extent to which the whole system is generating more information than the 'sum of its parts'. The information to be quantified is typically the information that the current state contains about a past state (for the information integrated over time window τ, the past state to be considered is that at time τ from the present). The partitioning is done such that one considers the parts with the weakest links between them; in other words, the partition across which integrated information is computed is the 'minimum information partition'. There are many ways one can operationalise this concept of integrated information. Consequently, there now exists a range of distinct integrated information measures.
Proponents of IIT claim that measures of integrated information potentially relate to the quantity of consciousness generated by any physical system [34]. This is however controversial, and empirical evidence of a relationship between any particular measure of integrated information and consciousness remains scarce [15]. Here, we do not focus on the connections of IIT to consciousness, although we do comment on the application of IIT to neural data (see Discussion). We instead consider measures of integrated information more generally as useful operationalisations of notions of dynamical complexity.
We have two goals. First, to provide a unified source of explanation of the principles and practicalities of the various candidate measures of integrated information. Second, to examine the behaviour of candidate measures on non-trivial network models, in order to shed light on their comparative practical utility.
In a recent related paper, Tegmark [41] developed a theoretical taxonomy of all integrated information measures that can be written as a distance between a probability distribution pertaining to the whole and that obtained from the product of probability distributions pertaining to the parts. Here we review in detail five distinct and prominent proposed measures of integrated information, including two (ψ and Φ_G) that were not covered in Tegmark's taxonomy. These are: whole-minus-sum integrated information Φ [5]; integrated stochastic interaction Φ̃ [11]; integrated synergy ψ [19]; decoder-based integrated information Φ* [35]; and geometric integrated information Φ_G [37]. We also consider, for comparison, the measure causal density (CD) [39], which can be considered as the sum of independent information transfers in the system (without reference to a minimum information partition). This measure has previously been discussed in conjunction with integrated information measures [40,39].
All of the measures have the potential to behave in ways which are not obvious a priori, and in a manner difficult to express analytically. While some simulations of some of the measures (Φ, Φ̃ and CD) on networks have been performed [11,39], other measures (Φ* and Φ_G) have not previously been computed on any model consisting of more than two components. This paper provides a comparison of the full suite of measures on non-trivial network models. We consider eight-node networks with a range of different architectures, animated with basic noisy vector autoregressive dynamics. We examine how network topology as well as coupling strength and correlation of noise inputs affect each measure. We also plot the relation between each measure and the global correlation (a simple dynamical control). Based on these comparisons we discuss the extent to which each measure appears genuinely to capture the co-existence of integration and segregation central to the concepts of dynamical complexity and integrated information.
After covering the necessary preliminaries in Section 2, Section 3 sets out the intuition behind the measures, and summarises the mathematics behind the definition of each measure. In Section 4 we present the simulations. Section 5 contains the Discussion. In the Appendix, Section A.1, we derive new formulae for computing the decoder-based integrated information Φ* for Gaussian systems, correcting the previous formulae in Ref. [35]. Other Appendices contain further derivations of mathematical properties of the measures.

Notation, convention and preliminaries
In this section we review the fundamental concepts needed to define and discuss the candidate measures of integrated information. In general, we will denote random variables with uppercase letters (e.g. X, Y) and particular instantiations with the corresponding lowercase letters (e.g. x, y). Variables can be either continuous or discrete, and we assume that continuous variables can take any value in ℝ^n and that a discrete variable X can take any value in the finite set Ω_X. Whenever there is a sum involving a discrete variable X we assume the sum runs over all possible values of X (i.e. the whole of Ω_X). A partition P = {M_1, M_2, ..., M_r} divides the elements of system X into r non-overlapping, non-empty sub-systems (or parts), such that X = M_1 ∪ M_2 ∪ ⋯ ∪ M_r and M_i ∩ M_j = ∅ for any i ≠ j. We denote each variable in X as X_i, and the total number of variables in X as n. When dealing with time series, time will be indexed with a subscript, e.g. X_t.
Entropy H quantifies the uncertainty associated with a random variable X (the higher H(X), the harder it is to make predictions about X) and is defined as
$$ H(X) =: -\sum_{x} p(x) \log p(x) . $$
In many scenarios, a discrete set of states is insufficient to represent a process or time series. This is the case, for example, with brain recordings, which come in real-valued time series and with no a priori discretisation scheme. In these cases, using a continuous variable X ∈ ℝ^n we can similarly define the differential entropy,
$$ H(X) =: -\int p(x) \log p(x) \, dx . $$
However, differential entropy is not as interpretable and well-behaved as its discrete-variable counterpart. For example, differential entropy is not invariant to rescaling or other transformations on X. Moreover, it is only defined if X has a density with respect to the Lebesgue measure dx; this assumption will be upheld throughout this paper. We can also define the conditional and joint entropies as
$$ H(X|Y) =: -\sum_{x,y} p(x,y) \log p(x|y) , \qquad H(X,Y) =: -\sum_{x,y} p(x,y) \log p(x,y) , $$
respectively. Conditional and joint entropies can be analogously defined for continuous variables by appropriately replacing sums with integrals.
The Kullback-Leibler (KL) divergence quantifies the dissimilarity between two probability distributions p and q:
$$ D_{KL}(p \,\|\, q) =: \sum_{x} p(x) \log \frac{p(x)}{q(x)} . $$
The KL divergence represents a notion of (non-symmetric) distance between two probability distributions. It plays an important role in information geometry, which deals with the geometric structure of manifolds of probability distributions.
Finally, mutual information I quantifies the interdependence between two random variables X and Y. It is the KL divergence between the full joint distribution and the product of marginals, but it can also be expressed as the average reduction in uncertainty about X when Y is given:
$$ I(X;Y) =: D_{KL}\big( p(X,Y) \,\|\, p(X)\,p(Y) \big) = H(X) - H(X|Y) . $$
Mutual information is symmetric in the two arguments X and Y. We make use of the following properties of mutual information: 1. I(X; Y) = I(Y; X), 2. I(X; Y) ≥ 0, and 3. I(f(X); g(Y)) = I(X; Y) for any injective functions f, g.
We highlight one implication of property 3: I is upper-bounded by the entropy of both X and Y .This means that the entropy H(X) of a random variable X is the maximum amount of information X can have about any other variable Y (or another variable Y can have about X).
Mutual information is defined analogously for continuous variables and, unlike differential entropy, it retains its interpretability in the continuous case. Furthermore, one can track how much information a system preserves during its temporal evolution by computing the time-delayed mutual information (TDMI) I(X_t; X_{t−τ}).
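The discrete definitions above can be checked numerically on a small example. The following is a minimal sketch in plain Python; the joint distribution (a noisy copy of a fair bit) and all function names are illustrative assumptions, not part of the original analysis. It evaluates mutual information as a KL divergence and compares it with H(X) − H(X|Y), using the chain rule H(X|Y) = H(X,Y) − H(Y).

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a distribution given as {outcome: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def marginal(joint, axis):
    """Marginalise a joint table {(x, y): p} onto one coordinate (0 for X, 1 for Y)."""
    out = {}
    for outcome, q in joint.items():
        out[outcome[axis]] = out.get(outcome[axis], 0.0) + q
    return out

def kl_divergence(p, q):
    """D_KL(p || q) in bits; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log2(pi / q[k]) for k, pi in p.items() if pi > 0)

def mutual_information(joint):
    """I(X;Y) as the KL divergence between the joint and the product of marginals."""
    px, py = marginal(joint, 0), marginal(joint, 1)
    product = {(x, y): px[x] * py[y] for x in px for y in py}
    return kl_divergence(joint, product)

# Hypothetical example: Y is a noisy copy of a fair bit X
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
h_x = entropy(marginal(joint, 0))
h_x_given_y = entropy(joint) - entropy(marginal(joint, 1))  # H(X|Y) = H(X,Y) - H(Y)
mi = mutual_information(joint)
```

In this example the two routes to I(X;Y) agree to numerical precision, and the value lies between 0 and H(X), as the bound discussed above requires.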
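Next, we introduce notation and several useful identities to handle Gaussian variables. Given an n-dimensional real-valued system X, we denote its covariance matrix as Σ(X)_ij =: cov(X_i, X_j). Similarly, cross-covariance matrices are denoted as Σ(X, Y)_ij =: cov(X_i, Y_j). We will make use of the conditional (or partial) covariance formula,
$$ \Sigma(X|Y) =: \Sigma(X) - \Sigma(X,Y)\, \Sigma(Y)^{-1}\, \Sigma(Y,X) . \qquad (7) $$
For Gaussian variables,
$$ H(X) = \tfrac{1}{2} \log \det \Sigma(X) + \tfrac{n}{2} \log (2 \pi e) , $$
$$ H(X|Y) = \tfrac{1}{2} \log \det \Sigma(X|Y) + \tfrac{n}{2} \log (2 \pi e) , $$
$$ I(X;Y) = \tfrac{1}{2} \log \frac{\det \Sigma(X)}{\det \Sigma(X|Y)} . $$
All systems we deal with in this article are stationary and ergodic, so throughout the paper Σ(X_t) = Σ(X_{t−τ}) for any τ.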
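As an illustration of the Gaussian identities, the sketch below computes the partial covariance and the Gaussian mutual information I(X;Y) = ½ log[det Σ(X) / det Σ(X|Y)] with NumPy. The function names and the bivariate example are assumptions made for illustration; for two unit-variance scalars with correlation ρ the result can be compared against the standard closed form −½ log(1 − ρ²).

```python
import numpy as np

def conditional_cov(cov_x, cov_xy, cov_y):
    """Partial covariance: Sigma(X|Y) = Sigma(X) - Sigma(X,Y) Sigma(Y)^-1 Sigma(Y,X)."""
    return cov_x - cov_xy @ np.linalg.solve(cov_y, cov_xy.T)

def gaussian_mi(cov_x, cov_xy, cov_y):
    """I(X;Y) = 0.5 * log(det Sigma(X) / det Sigma(X|Y)), in nats."""
    _, logdet_x = np.linalg.slogdet(cov_x)
    _, logdet_x_given_y = np.linalg.slogdet(conditional_cov(cov_x, cov_xy, cov_y))
    return 0.5 * (logdet_x - logdet_x_given_y)

# Hypothetical example: two unit-variance scalars with correlation rho
rho = 0.6
mi = gaussian_mi(np.array([[1.0]]), np.array([[rho]]), np.array([[1.0]]))
closed_form = -0.5 * np.log(1.0 - rho ** 2)
```

Working with log-determinants (`slogdet`) rather than raw determinants is a design choice that avoids overflow or underflow for larger systems.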
3 Integrated information measures

Overview
In this section we review the theoretical underpinnings and practical considerations of several proposed measures of integrated information, and in particular how they relate to intuitions about segregation, integration and complexity. These measures are:
• Whole-minus-sum integrated information, Φ;
• Integrated stochastic interaction, Φ̃;
• Integrated synergy, ψ;
• Decoder-based integrated information, Φ*;
• Geometric integrated information, Φ_G; and
• Causal density, CD.
All of these measures (besides CD) have been inspired by the measure proposed by Balduzzi and Tononi in [5], which we call Φ_2008. Φ_2008 was based on the information the current state contains about a hypothetical maximum entropy past state. In practice, this results in measures that are applicable only to discrete Markovian systems [11]. For broader applicability, it is more practical to build measures based on the ongoing spontaneous information dynamics, that is, based on p(X_t, X_{t−τ}) without applying a perturbation to the system. Measures are then well-defined for any stochastic system (with a well-defined Lebesgue measure across the states), and can be estimated for real data using empirical distributions if stationarity can be assumed. All of the measures we consider in this paper are based on a system's spontaneous information dynamics.
Table 1 contains a brief description of each measure and a reference to the original publication that introduced it. We refer the reader to the original publications for more detailed descriptions of each measure. Table 2 contains a summary of properties of the measures considered, proven for the case in which the system is ergodic and stationary, and the spontaneous distribution is used.

Minimum information partition
Key to all measures of integrated information is the notion of splitting or partitioning the system to quantify the effect of such a split on the system as a whole. In that spirit, integrated information measures are defined through some measure of effective information, which operationalises the concept of "information beyond a partition" P. This typically involves splitting the system according to P and computing some form of information loss, via (for example) mutual information (Φ), conditional entropy (Φ̃), or decoding accuracy (Φ*) (see Table 1). Integrated information is then the effective information with respect to the partition that identifies the "weakest link" in the system, i.e. the partition for which the parts are least integrated. Formally, integrated information is the effective information beyond the minimum information partition (MIP), which, given an effective information measure f[X; τ, P], is defined as
$$ P^{MIP} =: \arg\min_{P} \frac{f[X; \tau, P]}{K(P)} , $$
where K(P) is a normalisation coefficient. In other words, the MIP is the partition across which the (normalised) effective information is minimum, and integrated information is the (unnormalised) effective information beyond the MIP. The purpose of the normalisation coefficient is to avoid biasing the minimisation towards unbalanced bipartitions (recall that the extent of information sharing between parts is bounded by the entropy of the smaller part). Balduzzi and Tononi [5] suggest the form
$$ K(P) =: (r - 1) \min_{k} H(M_k) . $$
However, not all contributions to IIT have followed Balduzzi and Tononi's treatment of the MIP.
Of the measures listed above, Φ and Φ̃ share this partition scheme, ψ defines the MIP through an unnormalised effective information, and Φ*, Φ_G and CD are defined via the atomic partition without any reference to the MIP. These differences are a confounding factor when it comes to comparing measures: it becomes difficult to ascertain whether differences in behaviour of various measures are due to their definitions of effective information, to their normalisation factor (or lack thereof), or to their partition schemes. We return to this discussion in Sec. 5.1.
In the following we present all measures as they were introduced in their original papers (see Table 1), although it is trivial to combine different effective information measures with different partition optimisation schemes. However, all results presented in Sec. 4 are calculated by minimising each unnormalised effective information measure over even-sized bipartitions, i.e. bipartitions in which both parts have the same number of components. This is to avoid conflating the effect of the partition scan method with the effect of the integrated information measure itself.
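The search over even-sized bipartitions can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, and the toy "effective information" (number of links cut in a four-node ring) merely stands in for any of the measures discussed in this section. Fixing node 0 in the first part ensures each bipartition is visited once rather than twice.

```python
from itertools import combinations

def even_bipartitions(n):
    """Yield each split of range(n) into two equal halves exactly once (n must be even)."""
    if n % 2:
        raise ValueError("n must be even")
    nodes = range(n)
    for part in combinations(nodes, n // 2):
        if 0 in part:  # fix node 0 in the first part so mirrored splits are not repeated
            rest = tuple(i for i in nodes if i not in part)
            yield part, rest

def minimise_over_bipartitions(effective_information, n):
    """Return (minimum value, partition) of an unnormalised effective information."""
    return min(((effective_information(p), p) for p in even_bipartitions(n)),
               key=lambda pair: pair[0])

# Hypothetical toy measure: number of links cut, for an undirected ring of 4 nodes
links = [(0, 1), (1, 2), (2, 3), (3, 0)]

def links_cut(partition):
    first = set(partition[0])
    return sum((i in first) != (j in first) for i, j in links)

value, mip = minimise_over_bipartitions(links_cut, 4)
n_eight = sum(1 for _ in even_bipartitions(8))  # number of even bipartitions of 8 nodes
```

For an eight-node network this scan visits 35 bipartitions, so exhaustive minimisation is cheap at the network sizes considered here.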

Whole-minus-sum integrated information Φ
We next turn to the different measures of integrated information. As highlighted above, a primary difference among them is how they define the effective information beyond a given partition. Since most measures were inspired by Balduzzi and Tononi's Φ_2008, we start there.
For Φ_2008, the effective information ϕ_2008 is given by the KL divergence between p_c(X_0 | X_1 = x) and the product of the corresponding distributions of the parts, ∏_k p_c(M^k_0 | M^k_1 = m^k). Here p_c(X_0 | X_1 = x) is the conditional distribution for X_0 given X_1 = x under the perturbation at time 0 into all states with equal probability, i.e. given that the joint distribution is p_ce(X_0, X_1) =: p(X_1|X_0) p_u(X_0), where p_u is the uniform (maximum entropy) distribution.
Averaging ϕ_2008 over all states x, the result can be expressed as either
$$ I(X_0; X_1) - \sum_{k=1}^{r} I(M^k_0; M^k_1) $$
or
$$ \sum_{k=1}^{r} H(M^k_0 | M^k_1) - H(X_0 | X_1) , $$
where M^k_t denotes the state of part M_k at time t. These two expressions are equivalent under the uniform perturbation, since they differ only by a term that vanishes if p(X_0) is the uniform distribution. However, they are not equivalent if the spontaneous distribution of the system is used instead, i.e. if p(X_{t−τ}, X_t) is used instead of p_ce(X_0, X_1). This means that for application to spontaneous dynamics (i.e. without perturbation) we have two alternatives that give rise to two measures that are both equally valid analogs of Φ_2008. We call the first alternative whole-minus-sum integrated information Φ (Φ_E in [11]). The effective information ϕ is defined as the difference in time-delayed mutual information between the whole system and the parts. The effective information of the system beyond a certain partition P is
$$ \varphi[X; \tau, P] =: I(X_{t-\tau}; X_t) - \sum_{k=1}^{r} I(M^k_{t-\tau}; M^k_t) . \qquad (15) $$
We can interpret I(X_t; X_{t−τ}) as how good the system is at predicting its own future or decoding its own past. Then ϕ here can be seen as the loss in predictive power incurred by splitting the system according to P.
The details of the calculation of Φ (and the MIP) are shown in Box 3.1. Φ is often regarded as a poor measure of integrated information because it can be negative [35]. This is indeed conceptually awkward if Φ is seen as an absolute measure of integration between the parts of a system, though it is a reasonable property if Φ is interpreted as a "net synergy" measure [9], quantifying to what extent the parts have shared or complementary information about the future state. That is, if Φ > 0 we infer that the whole is better than the parts at predicting the future (i.e., Φ > 0 is a sufficient condition), but a negative or zero Φ does not imply the opposite. Therefore, from an IIT perspective a negative Φ can lead to the understandably confusing interpretation of a system having "negative integration," but through a different lens (net synergy) it can be more easily interpreted as (negative) overall redundancy in the evolution of the system. See Section 3.5 and Ref. [9] for further discussion on whole-minus-sum measures.
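Box 3.1 (calculating Φ):
1. For discrete variables: compute each mutual information term of ϕ directly from the joint probability tables of the whole system and of each part.
2. For continuous, linear-Gaussian variables: compute each mutual information term from the stationary covariance matrices, using I(X_{t−τ}; X_t) = ½ log [det Σ(X_t) / det Σ(X_t | X_{t−τ})] and the analogous expression for each part.
3. For continuous variables with an arbitrary distribution, we must resort to the nearest-neighbour methods introduced by [25]. See reference for details.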
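For the linear-Gaussian case, whole-minus-sum integrated information can be sketched in a few lines of NumPy. The code below is a hedged illustration rather than the authors' implementation: it assumes a lag of τ = 1, an AR(1) process written as X_t = A X_{t−1} + ε_t, minimisation over even-sized bipartitions without normalisation, and hypothetical function names and example networks.

```python
import numpy as np
from itertools import combinations

def stationary_cov(A, noise_cov):
    """Solve Sigma = A Sigma A^T + noise_cov via row-major vectorisation."""
    n = A.shape[0]
    vec = np.linalg.solve(np.eye(n * n) - np.kron(A, A), noise_cov.reshape(-1))
    return vec.reshape(n, n)

def tdmi(cov, lagged, idx):
    """I(M_{t-1}; M_t) in nats for the sub-system with node indices idx."""
    idx = list(idx)
    s = cov[np.ix_(idx, idx)]               # Sigma(M_t)
    l = lagged[np.ix_(idx, idx)]            # cov(M_{t-1}, M_t)
    cond = s - l.T @ np.linalg.solve(s, l)  # Sigma(M_t | M_{t-1})
    return 0.5 * (np.linalg.slogdet(s)[1] - np.linalg.slogdet(cond)[1])

def whole_minus_sum(A, noise_cov):
    """Minimum over even bipartitions of TDMI(whole) minus the sum of TDMI(parts)."""
    n = A.shape[0]
    cov = stationary_cov(A, noise_cov)
    lagged = cov @ A.T                      # assumes X_t = A X_{t-1} + eps_t
    whole = tdmi(cov, lagged, range(n))
    values = []
    for part in combinations(range(n), n // 2):
        if 0 in part:                       # count each bipartition once
            rest = [i for i in range(n) if i not in part]
            values.append(whole - tdmi(cov, lagged, part) - tdmi(cov, lagged, rest))
    return min(values)

# Hypothetical 4-node examples: a bidirectional ring and a fully decoupled system
ring = 0.3 * np.array([[0, 1, 0, 1], [1, 0, 1, 0],
                       [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)
phi_ring = whole_minus_sum(ring, np.eye(4))
phi_decoupled = whole_minus_sum(0.5 * np.eye(4), np.eye(4))
```

For the decoupled system the whole carries exactly as much time-delayed mutual information as its parts, so the value is zero up to rounding, which gives a simple sanity check on the implementation.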

Integrated stochastic interaction Φ̃
We next consider the second alternative for Φ_2008 for spontaneous information dynamics: integrated stochastic interaction Φ̃. Also introduced in Barrett and Seth [11], this measure embodies similar concepts as Φ, with the main difference being that Φ̃ utilises a definition of effective information in terms of an increase in uncertainty instead of in terms of a loss of information.
Φ̃ is based on stochastic interaction φ̃, introduced by Ay [4]. Akin to Eq. (15), we define stochastic interaction beyond partition P as
$$ \tilde{\varphi}[X; \tau, P] =: \sum_{k=1}^{r} H(M^k_{t-\tau} | M^k_t) - H(X_{t-\tau} | X_t) . $$
Stochastic interaction quantifies to what extent uncertainty about the past is increased when the system is split in parts, compared to considering the system as a whole. The details of the calculation of Φ̃ are similar to those of Φ and are described in Box 3.2.
The most notable advantage of Φ̃ over Φ as a measure of integrated information is that Φ̃ is guaranteed to be non-negative. In fact, as mentioned above ϕ and φ̃ are related through the equation
$$ \tilde{\varphi}[X; \tau, P] = \varphi[X; \tau, P] + I(M^1_t; M^2_t; \ldots; M^r_t) , $$
where
$$ I(M^1_t; M^2_t; \ldots; M^r_t) =: \sum_{k=1}^{r} H(M^k_t) - H(X_t) . $$
This measure is also linked to information destruction, as presented in Wiesner et al. [48]. The quantity H(X_{t−τ}|X_t) measures the amount of irreversibly destroyed information, since H(X_{t−τ}|X_t) > 0 indicates that more than one possible past trajectory of the system converged on the same present state, making the system irreversible and indicating a loss of information about the past states. From this perspective, φ̃ can be understood as the difference between the information that is considered destroyed when the system is observed as a whole, or split into parts. Note however that this measure is time-symmetric when applied to a stationary system; for stationary systems total instantaneous entropy does not increase with time.
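Box 3.2 (calculating Φ̃):
1. For discrete variables: compute each conditional entropy term of φ̃ directly from the joint probability tables of the whole system and of each part.
2. For continuous, linear-Gaussian variables: compute each conditional entropy term from the stationary covariance matrices, using H(X_{t−τ} | X_t) = ½ log det Σ(X_{t−τ} | X_t) + (n/2) log(2πe) and the analogous expression for each part.
3. For continuous variables with an arbitrary distribution, we must resort to the nearest-neighbour methods introduced by [25]. See reference for details.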
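The relationship between stochastic interaction and whole-minus-sum effective information can be verified numerically for a Gaussian system. The sketch below is illustrative only: it assumes τ = 1, an AR(1) process X_t = A X_{t−1} + ε_t, a fixed bipartition, and a hypothetical four-node coupling matrix and noise covariance. It computes stochastic interaction from conditional entropies and checks that it equals ϕ plus the multi-information between the parts, a consequence of the definitions under stationarity.

```python
import numpy as np

def stationary_cov(A, noise_cov):
    """Solve Sigma = A Sigma A^T + noise_cov via row-major vectorisation."""
    n = A.shape[0]
    vec = np.linalg.solve(np.eye(n * n) - np.kron(A, A), noise_cov.reshape(-1))
    return vec.reshape(n, n)

def sub(m, idx):
    return m[np.ix_(idx, idx)]

def logdet(m):
    return np.linalg.slogdet(m)[1]

def cov_past_given_present(s, l):
    """Sigma(M_{t-1} | M_t), given Sigma(M) = s and cov(M_{t-1}, M_t) = l."""
    return s - l @ np.linalg.solve(s, l.T)

def stochastic_interaction(cov, lagged, parts):
    """Sum of part conditional entropies minus the whole conditional entropy (nats).
    The log(2*pi*e) constants cancel because the part sizes sum to n."""
    split = sum(logdet(cov_past_given_present(sub(cov, p), sub(lagged, p))) for p in parts)
    return 0.5 * (split - logdet(cov_past_given_present(cov, lagged)))

def whole_minus_sum(cov, lagged, parts):
    def mi(s, l):  # I(M_{t-1}; M_t)
        return 0.5 * (logdet(s) - logdet(cov_past_given_present(s, l)))
    return mi(cov, lagged) - sum(mi(sub(cov, p), sub(lagged, p)) for p in parts)

def multi_information(cov, parts):
    """Sum of part entropies minus the whole entropy (nats)."""
    return 0.5 * (sum(logdet(sub(cov, p)) for p in parts) - logdet(cov))

# Hypothetical 4-node system with a directed ring and correlated noise
A = np.array([[0.3, 0.2, 0.0, 0.0], [0.0, 0.3, 0.2, 0.0],
              [0.0, 0.0, 0.3, 0.2], [0.2, 0.0, 0.0, 0.3]])
noise_cov = 0.8 * np.eye(4) + 0.2 * np.ones((4, 4))
cov = stationary_cov(A, noise_cov)
lagged = cov @ A.T
parts = [[0, 1], [2, 3]]
phi_tilde = stochastic_interaction(cov, lagged, parts)
phi = whole_minus_sum(cov, lagged, parts)
multi = multi_information(cov, parts)
```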

Integrated synergy ψ
Originally designed as a "more principled" integrated information measure [19], ψ shares some features with Φ and Φ̃ but is grounded in a different branch of information theory, namely the Partial Information Decomposition (PID) framework, as described by Williams and Beer [49]. In the PID, the information that two (source) variables provide about a third (target) variable is decomposed into four non-negative terms as
$$ I(X, Y; Z) = U_X + U_Y + R + S , $$
where U_α is the unique information of source α, R is the redundancy between both sources and S is their synergy. Figure 1 illustrates the involved quantities in a Venn diagram. Integrated synergy ψ is the information that the parts provide about the future of the system that is exclusively synergistic, i.e. cannot be provided by any combination of parts independently:
$$ \psi[X; \tau, P] =: I(X_{t-\tau}; X_t) - I_\cup(M^1_{t-\tau}, \ldots, M^r_{t-\tau}; X_t) , $$
where the union information I_∪ is obtained from the redundancy I_∩ by inclusion-exclusion,
$$ I_\cup(M^1_{t-\tau}, \ldots, M^r_{t-\tau}; X_t) =: \sum_{\emptyset \neq S \subseteq \{M^1_{t-\tau}, \ldots, M^r_{t-\tau}\}} (-1)^{|S|+1} I_\cap(S^1, \ldots, S^{|S|}; X_t) . $$
The main problem of PID is that it is underdetermined. For example, for the case of two sources, Shannon's information theory specifies three quantities (I(X, Y; Z), I(X; Z), I(Y; Z)) whereas PID specifies four (S, R, U_X, U_Y). Therefore, a complete operational definition of ψ requires a definition of redundancy from which to construct the partial information components [49]. In this sense, the main shortcoming of ψ, inherited from PID, is that there is no agreed consensus on a definition of redundancy [9,12].
Here, we take Griffith's conceptual definition of ψ and we complement it with available definitions of redundancy. For the linear-Gaussian systems we will be studying in Sec. 4, we use the minimum mutual information PID presented in [9]. Although we do not show any discrete examples here, for completeness we provide complete formulae to calculate ψ for discrete variables using Griffith and Koch's redundancy measure [20]. Note that alternatives are available for both discrete and linear-Gaussian systems [38,23,49,13,24].
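Under the minimum mutual information (MMI) PID, redundancy is taken to be the smallest of the mutual informations between each source and the target. For a bipartition, the PID identities then give a synergy equal to the whole-system time-delayed mutual information minus the larger of the two part-to-whole mutual informations. The sketch below illustrates this for a Gaussian AR(1) process; it assumes τ = 1, a fixed partition, the form X_t = A X_{t−1} + ε_t, and hypothetical function names and parameters, and should be read as an illustration of the idea rather than the exact procedure used for the results.

```python
import numpy as np

def stationary_cov(A, noise_cov):
    """Solve Sigma = A Sigma A^T + noise_cov via row-major vectorisation."""
    n = A.shape[0]
    vec = np.linalg.solve(np.eye(n * n) - np.kron(A, A), noise_cov.reshape(-1))
    return vec.reshape(n, n)

def mi_past_to_present(cov, lagged, src):
    """I(M_{t-1}; X_t) in nats, where M is the set of nodes src and X the whole system."""
    src = list(src)
    s = cov[np.ix_(src, src)]                  # Sigma(M_{t-1})
    l = lagged[src, :]                         # cov(M_{t-1}, X_t)
    cond = cov - l.T @ np.linalg.solve(s, l)   # Sigma(X_t | M_{t-1})
    return 0.5 * (np.linalg.slogdet(cov)[1] - np.linalg.slogdet(cond)[1])

def psi_mmi(A, noise_cov, parts):
    """Synergy beyond a partition under MMI redundancy: TDMI minus the largest
    information that any single part's past carries about the whole present."""
    n = A.shape[0]
    cov = stationary_cov(A, noise_cov)
    lagged = cov @ A.T                         # assumes X_t = A X_{t-1} + eps_t
    whole = mi_past_to_present(cov, lagged, range(n))
    return whole - max(mi_past_to_present(cov, lagged, p) for p in parts), whole

# Hypothetical two-node example with uncorrelated noise
psi, tdmi = psi_mmi(0.4 * np.ones((2, 2)), np.eye(2), parts=[[0], [1]])
```

Because a part's past can never carry more information about the present than the whole system's past does, the value returned is non-negative and bounded above by the TDMI.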

Decoder-based integrated information Φ*
Introduced by Oizumi et al. in Ref. [35], decoder-based integrated information Φ* takes a different approach from the previous measures. In general, Φ* is given by
$$ \Phi^*[X; \tau, P] =: I(X_{t-\tau}; X_t) - I^*[X; \tau, P] , $$
where I* is known as the mismatched decoding information, and quantifies how much information can be extracted from a variable if the receiver is using a suboptimal (or mismatched) decoding distribution [27,33]. This mismatched information has been used in neuroscience to quantify the contribution of neural correlations in stimulus coding [36], and can similarly be used to measure the contribution of inter-partition correlations to predictive information.
To calculate Φ* we formulate a restricted model q in which the correlations between partitions are ignored, and we calculate I* for the case where the sender is using the full model p as an encoder and the receiver is using the restricted model q as a decoder. The details of the calculation of Φ* and I* are shown in Box 3.4. Unlike the previous measures shown in this section, Φ* does not have an interpretable formulation in terms of simpler information-theoretic functionals like entropy and mutual information. Calculating I* involves a one-dimensional optimisation problem, which is straightforwardly solvable if the optimised quantity, Ĩ(β), has a closed form expression [27]. For systems with continuous variables, it is in general very hard to estimate Ĩ(β). However, for continuous linear-Gaussian systems and for discrete systems Ĩ(β) has an analytic closed form as a function of β if the covariance or joint probability table of the system are known, respectively. In Appendix A we derive the formulae. (Note the version written down in [35] is incorrect, although their simulations match our results; we checked results from our derived version of the formulae versus results obtained from numerical integration, and confirmed that our derived formulae are the correct ones.) Conveniently, in both the discrete and the linear-Gaussian case Ĩ(β) is concave in β (proofs in [27] and in Appendix A, respectively), which makes the optimisation significantly easier.
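Box 3.4 (calculating Φ*):
1. For discrete variables: maximise over β the closed-form Ĩ(β) computed from the joint probability table.
2. For continuous, linear-Gaussian variables: maximise over β the closed-form Ĩ(β) computed from the covariance matrices (see appendix for details).
3. For continuous variables with an arbitrary distribution: unknown.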

Geometric integrated information Φ_G
In [37], Oizumi et al. approach the notion of dynamical complexity via yet another formalism. Their approach is based on information geometry [2,1]. The objects of study in information geometry are spaces of families of probability distributions, considered as differentiable (smooth) manifolds. The natural metric in information geometry is the Fisher information metric, and the KL divergence provides a natural measure of (asymmetric) distance between probability distributions. Information geometry is the application of differential geometry to the relationships and structure of probability distributions.
To quantify integrated information, Oizumi et al. [37] consider the divergence between the complete model of the system under study p(X_{t−τ}, X_t) and a restricted model q(X_{t−τ}, X_t) in which links between the parts of the system have been severed. This is known as the M-projection of the system onto the manifold of restricted models
$$ Q = \{ q : q(M^i_t | X_{t-\tau}) = q(M^i_t | M^i_{t-\tau}) \ \text{for all } i \} , $$
so that
$$ \Phi_G[X; \tau, P] =: \min_{q \in Q} D_{KL}\big( p(X_{t-\tau}, X_t) \,\|\, q(X_{t-\tau}, X_t) \big) . $$
Key to this measure is that in considering the partitioned system, it is only the connections that are cut; correlations between the parts are still allowed on the partitioned system. Although conceptually simple, Φ_G is very hard to calculate compared to all other measures we consider here (see Box 3.5).
There is no known closed form solution for any system, and we can only find approximate numerical estimates for some systems.In particular, for discrete and linear-Gaussian variables we can formulate Φ G as the solution of a pure constrained multivariate optimisation problem, with the advantage that the optimisation objective is differentiable and convex [14].
Box 3.5 (calculating Φ_G):
1. For discrete variables: numerically optimise the objective D_KL(p ‖ q) subject to the constraints that sever the links between parts in the restricted model q.
2. For continuous, linear-Gaussian variables: numerically optimise the Gaussian form of the same objective, where Σ(E) = Σ(X_t | X_{t−1}), subject to the analogous constraints.
3. For continuous variables with an arbitrary distribution: unknown.

Causal density
Causal density (CD) is somewhat distinct from the other measures considered so far, in the sense that it is a sum of information transfers rather than a direct measure of the extent to which the whole is greater than the parts. Nevertheless, we include it here because of its relevance and use in the dynamical complexity literature. CD was originally defined in terms of Granger causality [18], but here we write it in terms of Transfer Entropy (TE) which provides a more general information-theoretic definition [6]. The conditional transfer entropy from X to Y conditioned on Z is defined as
$$ TE_\tau(X \to Y | Z) =: I(X_t; Y_{t+\tau} | Z_t, Y_t) . $$
With this definition of TE we define CD as the average pairwise conditioned TE between all variables in X,
$$ CD[X; \tau, P] =: \frac{1}{r(r-1)} \sum_{i \neq j} TE_\tau(M^i \to M^j | M^{[ij]}) , $$
where M^{[ij]} is the subsystem formed by all variables in X except for those in parts M^i and M^j. In a practical sense, CD has many advantages. It has been thoroughly studied in theory [7] and applied in practice, with application domains ranging from complex systems to neuroscience [28,29,32]. Furthermore, there are off-the-shelf algorithms that calculate TE in discrete and continuous systems [8]. For details of the calculation of CD see Box 3.6.
Causal density is a principled measure of dynamical complexity, as it vanishes for purely segregated or purely integrated systems. In a highly segregated system there is no information transfer at all, and in a highly integrated system there is no transfer from one variable to another beyond the rest of the system [39]. Furthermore, CD is non-negative and upper-bounded by the total time-delayed mutual information (proof in Appendix B), therefore satisfying what other authors consider an essential requirement for a measure of integrated information [37].
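Box 3.6 (calculating CD):
1. For discrete variables: compute each conditional transfer entropy directly from the joint probability tables.
2. For continuous, linear-Gaussian variables: compute each conditional transfer entropy as half the log-ratio of the partial covariance determinants of the target given the conditioning variables without and with the source.
3. For continuous variables with an arbitrary distribution, we must resort to the nearest-neighbour methods introduced by [25]. See reference for details.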
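For a Gaussian AR(1) process the conditional transfer entropies in CD reduce to log-ratios of residual variances, which the following sketch illustrates. It assumes τ = 1, the atomic partition (each node is its own part), the form X_t = A X_{t−1} + ε_t, and hypothetical function names and coupling matrices; it is meant as an illustration of the definition rather than a replacement for established toolboxes.

```python
import numpy as np

def stationary_cov(A, noise_cov):
    """Solve Sigma = A Sigma A^T + noise_cov via row-major vectorisation."""
    n = A.shape[0]
    vec = np.linalg.solve(np.eye(n * n) - np.kron(A, A), noise_cov.reshape(-1))
    return vec.reshape(n, n)

def causal_density(A, noise_cov):
    """Average conditional transfer entropy between single nodes at lag 1 (nats)."""
    n = A.shape[0]
    cov = stationary_cov(A, noise_cov)
    lagged = cov @ A.T                     # lagged[k, j] = cov(X_{k,t-1}, X_{j,t})

    def residual_var(j, predictors):
        """Variance of X_{j,t} given the past of the nodes listed in predictors."""
        s = cov[np.ix_(predictors, predictors)]
        l = lagged[predictors, j]
        return cov[j, j] - l @ np.linalg.solve(s, l)

    everyone = list(range(n))
    total = 0.0
    for i in everyone:
        without_i = [k for k in everyone if k != i]
        for j in without_i:
            # TE(i -> j | rest): how much the past of node i reduces the
            # uncertainty about X_{j,t} beyond the past of all other nodes
            total += 0.5 * np.log(residual_var(j, without_i) / residual_var(j, everyone))
    return total / (n * (n - 1))

# Hypothetical examples: node 0 drives node 1, versus two independent nodes
cd_driven = causal_density(np.array([[0.5, 0.0], [0.4, 0.5]]), np.eye(2))
cd_decoupled = causal_density(0.5 * np.eye(2), np.eye(2))
```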

Other measures
As already mentioned, all the measures reviewed here (besides CD) were inspired by the Φ_2008 measure, which arose from the version of IIT laid out in Ref. [5]. The most recent version of IIT [34] is conceptually distinct, and the associated "Φ-3.0" is consequently different to the measures we consider here. The consideration of perturbation of the system, as well as all of its subsets, in both the past and the future renders Φ-3.0 considerably more computationally expensive than other Φ measures. We do not here attempt to consider the construction of an analogue of Φ-3.0 for spontaneous information dynamics. Such an undertaking lies beyond the scope of this paper.
Recently, Tegmark [41] developed a comprehensive taxonomy of all integrated information measures that can be written as a distance between a probability distribution pertaining to the whole and one obtained as a product of probability distributions pertaining to the parts. Tegmark further identified a shortlist of candidate measures, based on a set of explicit desiderata. This shortlist overlaps with the measures we consider here, and also contains other measures which are minor variants. Of Tegmark's shortlisted measures, φ M is equivalent to Φ under the system's spontaneous distribution, φ M kk is its state-resolved version, φ oak is transfer entropy (which we cover here through CD), and φ npk is not defined for continuous variables. The measures Φ_G and ψ are outside of Tegmark's classification scheme.

Results
All of the measures of integrated information that we have described have the potential to behave in ways which are not obvious a priori, and in a manner difficult to express analytically. While some simulations of Φ, Φ̃ and CD on networks have been performed [11,39], Φ* and Φ_G have not previously been computed on models consisting of more than two components, and ψ has not previously been explored at all on systems with continuous variables. In this section, we study all the measures together on small networks. We compare the behaviour of the measures, and assess the extent to which each measure is genuinely capturing dynamical complexity.
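To recap, we consider the following 6 measures:
• Whole-minus-sum integrated information, Φ;
• Integrated stochastic interaction, Φ̃;
• Integrated synergy, ψ;
• Decoder-based integrated information, Φ*;
• Geometric integrated information, Φ_G; and
• Causal density, CD.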
We use models based on stochastic linear auto-regressive (AR) processes with Gaussian variables. These constitute appropriate models for testing the measures of integrated information. They are straightforward to parameterise and simulate, and are amenable to the formulae presented in Section 3. Mathematically, we define an AR process (of order 1) by the update equation
$$ X_t = A X_{t-1} + \varepsilon_t , \qquad (33) $$
where ε_t is a serially independent random sample from a zero-mean Gaussian distribution with given covariance Σ(ε), usually referred to as the noise or error term. A particular AR process is completely specified by the coupling matrix or network A and the noise covariance matrix Σ(ε). An AR process is stable, and stationary, if the spectral radius of the coupling matrix is less than 1 [30]. (The spectral radius is the largest of the absolute values of its eigenvalues.) All the example systems we consider are calibrated to be stable, so the Φ measures can be computed from their stationary statistics.
We shall consider how the measures vary with respect to: (i) the strength of connections, i.e. the magnitude of non-zero terms in the coupling matrix; (ii) the topology of the network, i.e. the arrangement of the non-zero terms in the coupling matrix; (iii) the density of connections, i.e. the density of non-zero terms in the coupling matrix; and (iv) the correlation between noise inputs to different system components, i.e. the off-diagonal terms in Σ(ε). The strength and density of connections can be thought of as reflecting, in different ways, the level of integration in the network. The correlation between noise inputs reflects (inversely) the level of segregation, in some sense. We also, in each case, compute the control measures
• Time-delayed mutual information (TDMI), I(X_{t−τ}; X_t); and
• Average absolute correlation Σ, defined as the average absolute value of the non-diagonal entries in the system's correlation matrix.
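The stability condition and the AR(1) dynamics described above can be sketched as follows. The function names, the seed handling and the example matrices are assumptions made for illustration; the update is written as X_t = A X_{t−1} + ε_t, and in practice one would discard an initial transient before estimating stationary statistics from the simulated data.

```python
import numpy as np

def spectral_radius(A):
    """Largest absolute value among the eigenvalues of A."""
    return max(abs(np.linalg.eigvals(A)))

def simulate_ar(A, noise_cov, steps, seed=0):
    """Simulate X_t = A X_{t-1} + eps_t with eps_t ~ N(0, noise_cov).
    Returns an array of shape (steps, n), started from the zero state."""
    if spectral_radius(A) >= 1:
        raise ValueError("unstable coupling matrix: spectral radius must be < 1")
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    noise = rng.multivariate_normal(np.zeros(n), noise_cov, size=steps)
    x = np.zeros((steps, n))
    for t in range(1, steps):
        x[t] = A @ x[t - 1] + noise[t]
    return x

# Hypothetical two-node example with noise correlation 0.5
A = 0.4 * np.ones((2, 2))
noise_cov = np.array([[1.0, 0.5], [0.5, 1.0]])
x = simulate_ar(A, noise_cov, steps=1000)
```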
These simple measures quantify straightforwardly the level of interdependence between elements of the system, across time and space respectively. TDMI captures the total information generated as the system transitions from one time-step to the next, and Σ is another basic measure of the level of integration. We report the unnormalised measures minimised over even-sized bipartitions, i.e. bipartitions in which both parts have the same number of components. In doing this we avoid conflating the effects of the choice of definition of effective information with those of the choice of partition search (see Sec. 3.2). See Discussion (Sec. 5.1) for more on this.
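The average absolute correlation and the search over even-sized bipartitions described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the function names are hypothetical, and `effective_info` stands for any user-supplied function that scores a bipartition.

```python
from itertools import combinations
import numpy as np

def average_abs_correlation(cov):
    """Mean absolute off-diagonal entry of the correlation matrix."""
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    off_diagonal = ~np.eye(cov.shape[0], dtype=bool)
    return float(np.mean(np.abs(corr[off_diagonal])))

def even_bipartitions(n):
    """Yield each split of range(n) into two equal-sized parts exactly once."""
    if n % 2:
        raise ValueError("n must be even")
    # Fixing node 0 in the first part avoids listing (M, N) and (N, M) twice.
    for rest in combinations(range(1, n), n // 2 - 1):
        part_m = (0,) + rest
        part_n = tuple(i for i in range(n) if i not in part_m)
        yield part_m, part_n

def minimise_over_even_bipartitions(effective_info, n):
    """Return (minimum value, partition) of effective_info(M, N)."""
    scored = ((effective_info(m, k), (m, k)) for m, k in even_bipartitions(n))
    return min(scored, key=lambda pair: pair[0])
```

For a system of eight nodes, `even_bipartitions(8)` yields 35 bipartitions, so an exhaustive minimum is cheap at this system size.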

Key quantities for computing the integrated information measures
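To compute the integrated information measures, the stationary covariance and lagged partial covariance matrices are required. By taking the expected value of X_t X_t^T with Eq. (33), and given that ε_t is white noise, uncorrelated in time, one obtains that the stationary covariance matrix Σ(X) is given by the solution to the discrete-time Lyapunov equation,

$$\Sigma(X) = A\,\Sigma(X)\,A^{\mathsf{T}} + \Sigma(\varepsilon).$$

This can be easily solved numerically, for example in Matlab via use of the dlyap command. The lagged covariance can also be calculated from the parameters of the AR process as

$$\Sigma(X_{t-1}, X_t) = \Sigma(X)\,A^{\mathsf{T}},$$

and partial covariances can be obtained by applying Eq. (7). Finally, we obtain the analogous quantities for the partitions by the marginalisation properties of the Gaussian distribution. Given a bipartition X_t = {M_t, N_t}, we write the covariance and lagged covariance matrices as

$$\Sigma(X) = \begin{pmatrix} \Sigma(X)_{MM} & \Sigma(X)_{MN} \\ \Sigma(X)_{NM} & \Sigma(X)_{NN} \end{pmatrix}, \qquad \Sigma(X_{t-1}, X_t) = \begin{pmatrix} \Sigma(X_{t-1}, X_t)_{MM} & \Sigma(X_{t-1}, X_t)_{MN} \\ \Sigma(X_{t-1}, X_t)_{NM} & \Sigma(X_{t-1}, X_t)_{NN} \end{pmatrix},$$

and we simply read the partition covariance matrices as

$$\Sigma(M) = \Sigma(X)_{MM}, \qquad \Sigma(M_{t-1}, M_t) = \Sigma(X_{t-1}, X_t)_{MM},$$

and analogously for N.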
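The quantities above can be computed numerically along the following lines. This is a minimal NumPy sketch under stated assumptions: column-vector states with X_{t+1} = A X_t + ε_t, the Lyapunov equation solved by vectorisation rather than by a dedicated solver (SciPy's `scipy.linalg.solve_discrete_lyapunov` plays a role similar to Matlab's dlyap), and the standard Gaussian conditional covariance, which we assume corresponds to Eq. (7). All function names are hypothetical.

```python
import numpy as np

def stationary_covariance(A, noise_cov):
    """Solve Sigma = A Sigma A^T + noise_cov via row-major vectorisation."""
    n = A.shape[0]
    lhs = np.eye(n * n) - np.kron(A, A)
    return np.linalg.solve(lhs, noise_cov.reshape(-1)).reshape(n, n)

def lagged_covariance(A, sigma, lag=1):
    """Cov(X_{t-lag}, X_t) = Sigma (A^T)^lag for column-vector states."""
    return sigma @ np.linalg.matrix_power(A.T, lag)

def tdmi(A, noise_cov, lag=1):
    """Time-delayed mutual information I(X_{t-lag}; X_t) in nats."""
    sigma = stationary_covariance(A, noise_cov)
    gamma = lagged_covariance(A, sigma, lag)
    # Partial covariance of X_t given X_{t-lag} (Gaussian conditioning).
    partial = sigma - gamma.T @ np.linalg.solve(sigma, gamma)
    _, logdet_full = np.linalg.slogdet(sigma)
    _, logdet_partial = np.linalg.slogdet(partial)
    return 0.5 * (logdet_full - logdet_partial)

def partition_blocks(cov, idx):
    """Read off the block of a (lagged) covariance belonging to one part."""
    return cov[np.ix_(idx, idx)]

# Hypothetical two-node example.
A = np.array([[0.4, 0.4], [0.4, 0.4]])
noise_cov = np.array([[1.0, 0.3], [0.3, 1.0]])
sigma = stationary_covariance(A, noise_cov)
sigma_m = partition_blocks(sigma, [0])  # covariance of the first node alone
```

The vectorised solve builds an n² × n² matrix, which is unproblematic for the small systems considered here but would not be the method of choice for large networks.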

Two-node network
We begin with the simplest non-trivial AR process, the two-node network of Eq. (38), in which two connected nodes with coupling strength a receive noise with correlation c. Setting a = 0.4 we obtain the same model as depicted in Fig. 3 in Ref. [35]. We simulate the AR process with different levels of noise correlation c and show results for all the measures in Fig. 2. Note that as c approaches 1 the system becomes degenerate, so some matrix determinants in the formulae become zero, causing some measures to diverge.

Inspection of Figure 2 immediately reveals a wide variability of behaviour among the measures, in both value and trend, even for this minimally simple model. Nevertheless, some patterns emerge. Both TDMI and Φ_G are unaffected by noise correlation, and both Φ̃ and Σ grow monotonically with c. In fact, Φ̃ diverges to infinity as c → 1. The measures ψ, Φ*, and CD decrease monotonically to 0 when the effect of the coupling cannot be distinguished from the noise. On the other hand, Φ also decreases monotonically but becomes negative for large enough c.
In Fig. 3 we analyse the same system, but now varying both noise correlation c and coupling strength a. As per the stability condition presented above, any value of a ≥ 0.5 makes the system's spectral radius greater than or equal to 1, so the system becomes non-stationary and variances diverge. Hence in these plots we evaluate all measures for values of a below the limit a = 0.5. Again, the measures behave very differently. In this case TDMI and Φ_G remain unaffected by noise correlation, and grow with increasing coupling strength as expected. In contrast, Φ̃ and Σ increase with both a and c. Φ decreases with c but shows non-monotonic behaviour with a. Of all the measures, ψ, Φ*, and CD show desirable properties consistent with capturing conjoined segregation and integration: they monotonically decrease with noise correlation and increase with coupling strength.
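Two of the observations above can be checked directly in a short sketch: that TDMI does not depend on c, and that whole-minus-sum Φ becomes negative for large c. The sketch rests on assumptions that are not confirmed by the text: we take the coupling matrix to have all entries equal to a (which gives spectral radius 2a, consistent with the stability limit a = 0.5), the noise covariance to have unit variances and correlation c, a time lag of one step, and Φ to be TDMI of the whole minus the summed TDMI of the two single-node parts.

```python
import numpy as np

def two_node_measures(a, c):
    """Return (TDMI, whole-minus-sum Phi) in nats at lag 1.

    Assumed model: A = [[a, a], [a, a]], noise covariance [[1, c], [c, 1]].
    """
    A = np.array([[a, a], [a, a]])
    noise_cov = np.array([[1.0, c], [c, 1.0]])
    # Stationary covariance from the discrete-time Lyapunov equation.
    sigma = np.linalg.solve(np.eye(4) - np.kron(A, A),
                            noise_cov.reshape(-1)).reshape(2, 2)
    gamma = sigma @ A.T  # lag-1 covariance Cov(X_{t-1}, X_t)
    # At lag 1 the partial covariance of X_t given X_{t-1} is the noise covariance.
    tdmi = 0.5 * np.log(np.linalg.det(sigma) / np.linalg.det(noise_cov))
    # Each part is a single node: its TDMI depends only on its autocorrelation.
    rho = np.diag(gamma) / np.diag(sigma)
    parts_tdmi = -0.5 * np.log(1.0 - rho ** 2)
    return float(tdmi), float(tdmi - parts_tdmi.sum())

tdmi_low, phi_low = two_node_measures(0.4, 0.0)
tdmi_high, phi_high = two_node_measures(0.4, 0.9)
```

Under these assumptions TDMI equals −½ ln(1 − 4a²) for every c, while Φ is positive at c = 0 and negative at c = 0.9, in line with the behaviour described for Fig. 2.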

Eight-node networks
We now turn to networks with eight nodes, enabling examination of a richer space of dynamics and topologies.
We first analyse a network optimised using a genetic algorithm to yield high Φ [11]. The noise covariance matrix has ones on the diagonal and c everywhere else, and now a is a global factor applied to all edges of the network. The adjacency matrix is scaled such that its spectral radius is 1 when a = 1. Similar to the previous section, we evaluate all measures for multiple values of a and c and show the results in Fig. 4.
Moving to a larger network mostly preserves the features highlighted above. TDMI is unaffected by c; Φ̃ behaves like Σ and diverges for large c; and Φ* and CD have the same trend as before, although now the decrease with c is less pronounced. Interestingly, ψ and Φ_G increase slightly with c, and Φ does not show the instability and negative values seen in Fig. 3. Overall, in this more complex network the effect of increasing noise correlation on Φ, ψ, Φ*, and CD is not as pronounced as in simpler networks, where these measures decrease rapidly towards zero with increasing c.

Figure 4: All integrated information measures for the Φ-optimal AR process proposed by [11], for different coupling strengths a and noise correlation levels c. Vertical axis is inverted for visualisation purposes.
Thus far we have studied the effect of AR dynamics on integrated information measures, keeping the topology of the network fixed and changing only global parameters. We next examine the effect of network topology, on a set of 6 networks:
(A) A fully connected network without self-loops.
(B) The Φ-optimal binary network presented in [11].
(C) The Φ-optimal weighted network presented in [11].
(D) A bidirectional ring network.
(E) A "small-world" network, formed by introducing two long-range connections to a bidirectional ring network.
(F) A unidirectional ring network.
In each network the adjacency matrix has been normalised to a spectral radius of 0.9. As before, we simulate the system following Eq. (33), and here set noise input correlations to zero (c = 0) so the noise input covariance matrix is just the identity matrix. Figure 5 shows connectivity diagrams of the networks for visual comparison, and Fig. 6 shows the values of all integrated information measures evaluated on all networks.
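The normalisation step can be sketched for the three networks whose structure is fully determined by the description (fully connected, bidirectional ring and unidirectional ring). This is an illustrative sketch only: the function names are hypothetical, the edge direction chosen for the unidirectional ring is an assumption, and the Φ-optimal and small-world networks are omitted because their exact connectivity is not specified here.

```python
import numpy as np

def normalise_spectral_radius(adj, target=0.9):
    """Rescale an adjacency matrix so that its spectral radius equals target."""
    radius = np.max(np.abs(np.linalg.eigvals(adj)))
    return adj * (target / radius)

def fully_connected(n):
    """All-to-all network without self-loops."""
    return np.ones((n, n)) - np.eye(n)

def ring(n, bidirectional=True):
    """Ring network; with A[i, j] read as the influence of node j on node i."""
    adj = np.zeros((n, n))
    for i in range(n):
        adj[(i + 1) % n, i] = 1.0  # assumed direction: node i drives node i+1
        if bidirectional:
            adj[i, (i + 1) % n] = 1.0
    return adj

networks = {
    "A": normalise_spectral_radius(fully_connected(8)),
    "D": normalise_spectral_radius(ring(8, bidirectional=True)),
    "F": normalise_spectral_radius(ring(8, bidirectional=False)),
}
```

Because the spectral radius is fixed, denser networks end up with weaker individual connections: each edge of the fully connected network is scaled to 0.9/7, against 0.9 for the unidirectional ring.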

Figure 5: Networks used in the comparative analysis of integrated information measures. (A) Fully connected network, (B) Φ-optimal binary network from [11], (C) Φ-optimal weighted network from [11], (D) bidirectional ring network, (E) small world network, and (F) unidirectional ring network.
As before, there is substantial variability in the behaviour of all measures, but some general patterns are apparent. Intriguingly, the unidirectional ring network is consistently judged by all measures (except for Φ) as the most complex, followed in most cases by the weighted Φ-optimal network.⁷ On the other end of the spectrum, the fully connected network A is also consistently judged as the least complex network, which is explained by the large correlation between its nodes as shown by Σ.
The results here can be summarised by comparing the relative complexity assigned to the networks by each measure, that is, to what extent the measures agree on which network is more complex than which. For convenience, we show the measure-dependent ranking of the network complexity in Table 3.
Inspecting this table reveals a remarkable alignment between TDMI, Φ_G, Φ*, and ψ, especially given how much their behaviour diverges when varying a and c. Although the particular values are different, the measures largely agree on the ranking of the networks based on their integrated information. This consistency of ranking is initially encouraging with regard to empirical application. However, the ranking is not what might be expected from topological complexity measures from network theory. If we ranked these networks by e.g. small-world index, we would expect networks B, C, and E to be at the top and networks A, D, and F to be at the bottom, very different from any of the rankings in Table 3.⁸ In fact, the Spearman correlation between the ranking by small-world index and those by TDMI, Φ_G, Φ*, and ψ is around −0.4, leading to the counterintuitive conclusion that more complex networks in fact integrate less information. We note that these rankings are very robust to noise correlation (results not shown) for all measures except Φ. Indeed, across all simulations in this study the behaviour of Φ is erratic, undermining prospects for empirical application. (This behaviour is even more prevalent if Φ is optimised over all bipartitions, as opposed to over even bipartitions.)

Table 3: Networks ranked according to their value of each integrated information measure (highest value to the left). We add small-world index as a dynamics-agnostic measure of network complexity.

⁸ The networks we consider are small and sparse, so we use 4th-order cliques (instead of triangles, which are 3rd-order cliques) to calculate the clustering coefficient [50].
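A rank comparison of this kind can be sketched as below. The function name is hypothetical, the formula assumes rankings without ties, and the two example orderings are placeholders chosen only to exercise the code; they are not the rankings of Table 3.

```python
def spearman_from_rankings(ranking_a, ranking_b):
    """Spearman correlation between two orderings of the same items (no ties)."""
    if sorted(ranking_a) != sorted(ranking_b):
        raise ValueError("both rankings must contain the same items")
    n = len(ranking_a)
    position_b = {item: rank for rank, item in enumerate(ranking_b)}
    # Sum of squared differences between the rank of each item in both lists.
    d_squared = sum((rank - position_b[item]) ** 2
                    for rank, item in enumerate(ranking_a))
    return 1.0 - 6.0 * d_squared / (n * (n ** 2 - 1))

# Placeholder orderings of the six network labels (hypothetical, not Table 3).
order_one = ["A", "B", "C", "D", "E", "F"]
order_two = list(reversed(order_one))
rho = spearman_from_rankings(order_one, order_two)
```

With only six networks, a single swap of adjacent items already changes the coefficient by about 0.06, so values such as −0.4 should be read as indicative rather than precise.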

Random networks
We next perform a more general analysis of the performance of measures of integrated information, using Erdős-Rényi random networks. We consider Erdős-Rényi random networks parametrised by two numbers: the edge density of the network ρ and the noise correlation c (defined as above), both in the [0, 1) interval. To sample a network with a given ρ, we generate a matrix in which each possible edge is present with probability ρ and then remove self-loops. The stochasticity in the construction of the Erdős-Rényi network induces fluctuations on the integrated information measures, such that for each (ρ, c) we calculate the mean and variance of each measure. First, we generate 50 networks for each point in the (ρ, c) plane and take the mean of each integrated information measure evaluated on those 50 networks. As before, the adjacency matrices are normalised to a spectral radius of 0.9. Results are shown in Fig. 7.
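The sampling procedure can be sketched as follows. Several details are assumptions of the sketch rather than statements about the original simulations: edges are treated as directed, networks without any cycle (whose spectral radius is zero and which therefore cannot be rescaled) are resampled, and `measure` is a placeholder for any function of the coupling matrix and noise covariance. All function names are hypothetical.

```python
import numpy as np

def sample_erdos_renyi(n, density, rng):
    """Directed graph: each ordered pair is an edge with probability density."""
    adj = (rng.random((n, n)) < density).astype(float)
    np.fill_diagonal(adj, 0.0)  # remove self-loops
    return adj

def normalise_spectral_radius(adj, target=0.9):
    """Rescale to the target spectral radius; None if the graph has no cycle."""
    n = adj.shape[0]
    if not np.any(np.linalg.matrix_power(adj, n)):
        return None  # nilpotent matrix: spectral radius is zero
    radius = np.max(np.abs(np.linalg.eigvals(adj)))
    return adj * (target / radius)

def uniform_noise_covariance(n, c):
    """Ones on the diagonal and c everywhere else."""
    return (1.0 - c) * np.eye(n) + c * np.ones((n, n))

def mean_measure(measure, n, density, c, n_networks=50, seed=0):
    """Mean and variance of measure(A, noise_cov) over sampled networks."""
    if density <= 0:
        raise ValueError("density must be positive")
    rng = np.random.default_rng(seed)
    noise_cov = uniform_noise_covariance(n, c)
    values = []
    while len(values) < n_networks:
        A = normalise_spectral_radius(sample_erdos_renyi(n, density, rng))
        if A is not None:  # assumption: resample acyclic graphs
            values.append(measure(A, noise_cov))
    return float(np.mean(values)), float(np.var(values))
```

Calling `mean_measure` on a grid of (ρ, c) values with a fixed seed per grid point would give one panel of a figure like Fig. 7 for the chosen measure.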
Φ_G increases markedly with ρ and moderately with c, Σ increases sharply with both, and the rest of the measures can be divided into two groups: Φ, ψ and CD, which decrease with c, and TDMI, Φ̃ and Φ*, which increase. Notably, all integrated information measures except Φ_G show a band of high value at an intermediate value of ρ. This demonstrates their sensitivity to the level of integration. The decrease when ρ is increased beyond a certain point is due to the weakening of the individual connections in that case (due to the fixed overall coupling strength, as quantified by spectral radius).
Secondly, in Fig. 8 we plot each measure against the average correlation of each network, following the rationale that a good complexity index should peak at an intermediate value of Σ, i.e. it should reach its maximum value in the middle range of Σ. To obtain this figure we sampled a large number of Erdős-Rényi networks with random (ρ, c), and evaluated all integrated information measures, as well as their average correlation Σ. Fig. 8 shows that some of the measures have this intermediate peak, in particular Φ*, ψ, Φ_G, and CD. Although also showing a modest intermediate peak, Φ̃ has a stronger overall positive trend with Σ, and Φ an overall negative trend. These analyses further support Φ*, ψ, Φ_G, and CD as valid complexity measures, although the relation between them remains unclear and is not always consistent in other scenarios.
One might worry that these peaks could be due to a biased sampling of the Σ axis: if our sampling scheme were obtaining many more samples in, say, the 0.2 < Σ < 0.4 range, then the points with high Φ we see in that range could be explained by the fact that the high-Φ tails of the distribution are sampled better in that range than in the rest of the Σ axis. However, the histogram at the bottom of Fig. 8 shows this is not the case; on the contrary, the samples are relatively uniformly spread along the axis. Therefore, the peaks shown by Φ*, ψ, Φ_G, and CD are not sampling artefacts.

Discussion
In this study we compared several candidate measures of integrated information in terms of their theoretical construction, and their behaviour when applied to the dynamics generated by a range of non-trivial network architectures. We found that no two measures had precisely the same basic mathematical properties (see Table 2). Empirically, we found a striking variability in the behaviour among the measures even for simple systems (see Table 4 for a summary).

Of the measures we have considered, ψ, Φ* and CD best capture conjoined segregation and integration on small networks, when animated with Gaussian linear AR dynamics (Fig. 2). These measures decrease with increasing noise input correlation and increase with increasing coupling strength (Fig. 4). Further, on random networks with fixed overall coupling strength (as quantified by spectral radius), they achieve their highest scores when an intermediate number of connections are present (Fig. 7). They also obtain their highest scores when the average correlation across components takes an intermediate value (Fig. 8).

In terms of network topology, none of the measures strongly reflect complexity of the network structure in a graph theoretic sense. At fixed overall coupling strength, a simple ring structure (Fig. 5) leads in most cases to the highest scores. Among the other measures: Φ̃ is largely determined by the level of correlation amongst the noise inputs, and is not very sensitive to changes in coupling strength; Φ_G depends mainly on the overall coupling strength, and is not very sensitive to changes in noise input correlation; and Φ generally behaves erratically.
Considered together, our results motivate the continued development of ψ, Φ* and CD as theoretically sound and empirically adequate measures of integrated information.

Partition selection
Integrated information is typically defined as the effective information beyond the minimum information partition [5,44]. However, when a particular measure of integrated information has been first introduced, it is often with a new operationalisation of both effective information and the minimum information partition. In this paper we have restricted attention to comparing different choices of measure of effective information, while keeping the same partition selection scheme across all measures. Specifically, we restricted the partition search to even-sized bipartitions, which has the advantage of obviating the need for introducing a normalisation factor when comparing bipartitions with different sizes (see Section 3.2). For uneven partitions, normalisation factors are required to compensate for the fact that there is less capacity for information sharing as compared to even partitions. However, such factors are known to introduce instabilities, both under continuous parameter changes, and in terms of numerical errors [11]. Further research is needed to compare different approaches to defining the minimum information partition, or finding an approximation to it in reasonable computation time [42].
In terms of computation time, performing the most thorough search, through all partitions, as in the early formulation of Φ by Balduzzi and Tononi [5], requires time O(n^n).⁹ Restricting attention to bipartitions reduces this to O(2^n), whilst restricting to even bipartitions reduces this further to O(n^2). These observations highlight a trade-off between computation time and comprehensive consideration of possible partitions. Future comparisons of integrated information measures may benefit from more advanced methods for searching among a restricted set of partitions to obtain a good approximation to the minimum information partition. For example, Toker and Sommer use graph modularity, stochastic block models or spectral clustering as informed heuristics to suggest a small number of partitions likely to be close to the MIP, and then take the minimum over those. With these approximations they are able to calculate the MIP of networks with hundreds of nodes [42,43]. Alternatively, Hidaka and Oizumi make use of the submodularity of mutual information to perform efficient optimisation and find the bipartition across which there is the least instantaneous mutual information of the system [21]. Presently, however, their method is valid only for instantaneous mutual information and is therefore not applicable to finding the bipartition that minimises any form of normalised effective information as described in Section 3.2.
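To give a concrete sense of the sizes of these search spaces, the following sketch counts the candidate partitions exactly for a system of n elements. The function names are hypothetical; the counts themselves are standard combinatorial quantities (the Bell number for all partitions, 2^(n−1) − 1 for unordered bipartitions into two non-empty parts, and half the central binomial coefficient for even bipartitions).

```python
from math import comb

def bell_number(n):
    """Number of partitions of an n-element set, via the Bell triangle."""
    row = [1]
    for _ in range(n - 1):
        new_row = [row[-1]]  # each row starts with the last entry of the previous one
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]

def count_bipartitions(n):
    """Unordered splits of n elements into two non-empty parts."""
    return 2 ** (n - 1) - 1

def count_even_bipartitions(n):
    """Unordered splits of n elements into two parts of size n/2."""
    if n % 2:
        raise ValueError("n must be even")
    return comb(n, n // 2) // 2

counts = (bell_number(8), count_bipartitions(8), count_even_bipartitions(8))
```

For the eight-node systems studied here these counts are 4140, 127 and 35 respectively, so the restriction to even bipartitions reduces the search by roughly two orders of magnitude relative to all partitions.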
Further, each measure carries special considerations regarding partition search. For example, for ψ, taking the minimum across all partitions is equivalent to taking it across bipartitions only, thanks to the properties of I_∩ [49,9,38]. Arsiwalla and Verschure [3] used Φ and suggested always using the atomic partition on the basis that it is fast, well-defined, and for Φ specifically it can be proven to be the partition of maximum information; and thus it provides a quickly computable upper bound for the measure.

Continuous variables and the linear Gaussian assumption
We have compared the various integrated information measures only on systems whose states are given by continuous variables with a Gaussian distribution. This is motivated by measurement variables being best characterised as continuous in many domains of potential application. Future research should continue the comparison of these measures on a test-bed of systems with discrete variables. Moreover, non-Gaussian continuous systems should also be considered because the Gaussian approximation is not always a good fit to real data. For example, the spiking activity of populations of neurons typically exhibits exponentially distributed dynamics [17]. Systems with discrete variables are in principle straightforward to deal with, since calculating probabilities (following the most brute-force approach) amounts simply to counting occurrences of states. General continuous systems, however, are less straightforward. Estimating generic probability densities in a continuous domain is challenging, and calculating information-theoretic quantities on these is difficult [25,46]. The AR systems we have studied here are a rare exception, in the sense that their probability density can be calculated and all relevant information-theoretic quantities have an analytical expression. Nevertheless, the Gaussian assumption is common in biology, and knowing now how these measures behave on these Gaussian systems will inform further development of these measures, and motivate their application more broadly.

⁹ More precisely, as the Bell number B_n.

Empirical as opposed to maximum entropy distribution
We have considered versions of each measure that quantify information with respect to the empirical, or spontaneous, stationary distribution for the state of the system. This constitutes a significant divergence from the supposedly fundamental measures of intrinsic integrated information of IIT versions 2 and 3 [5,34]. Those measures are based on information gained about a hypothetical past moment in which the system was equally likely to be in any one of its possible states (the 'maximum entropy' distribution). However, as pointed out previously [11], it is not possible to extend those measures, developed for discrete Markovian systems, to continuous systems. This is because there is no uniquely defined maximum entropy distribution for a continuous random variable (unless it has hard bounds, i.e. a closed and bounded set of states). Hence, quantification of information with respect to the empirical distribution is the pragmatic choice for construction of an integrated information measure applicable to continuous time-series data.
The consideration of information with respect to the empirical, as opposed to maximum entropy, distribution does however have an effect on the concept underlying the measure of integrated information: it results in a measure not of mechanism, but of dynamics [10]. That is, what is measured is not information about what the possible mechanistic causes of the current state could be, but rather what the likely preceding states actually are, on average, statistically; see [11] for further discussion. Given the diversity of behaviour of the various integrated information measures considered here even on small networks with linear dynamics, one must remain cautious about considering them as generalisations or approximations of the proposed 'fundamental' Φ measures of IIT versions 2 or 3 [5,34].
A remaining important challenge, in many practical scenarios, is the identification of stationary epochs. For a relatively long data segment, it can be unrealistic to assume that all the statistics are constant throughout. For shorter data segments, one cannot be confident that the system has explored all the states that it potentially would have, given enough time.

Final remarks
The further development and empirical application of Integrated Information Theory require a satisfactory informational measure of dynamical complexity. During the last few years several measures have been proposed, but their behaviour in any but the simplest cases has not been extensively characterised or compared. In this study, we have reviewed several candidate measures of integrated information, and provided a comparative analysis on simulated data, generated by simple Gaussian dynamics applied to a range of network topologies.
Assessing the degree of dynamical complexity, integrated information, or co-existing integration and segregation exhibited by a system remains an important outstanding challenge. Progress in meeting this challenge will have implications not only for theories of consciousness, such as Integrated Information Theory, but more generally in situations where relations between local and global dynamics are of interest. The review presented here identifies promising theoretical approaches for designing adequate measures of integrated information. Further, our simulations demonstrate the need for empirical investigation of such measures, since measures that share similar theoretical properties can behave in substantially different ways, even on simple systems.
Bounded by TDMI: TDMI can be defined as the M-projection of the full model p to a manifold of restricted models Q_MI = {q : q(X_t, X_{t−τ}) = q(X_t) q(X_{t−τ})} [37]. The bound Φ_G ≤ I(X_t; X_{t−τ}) follows from the fact that Q_MI ⊂ Q.

Causal density
Time-symmetric: Follows from the non-symmetry of transfer entropy [47].
Non-negative: Re-writing CD as a sum of conditional MI terms, this follows from (MI-2).
Bounded by TDMI: Proven in Appendix B.

Figure 2: (A) Graphical representation of the two-node AR process described in Eq. (38). Two connected nodes with coupling strength a receive noise with correlation c, which can be thought of as coming from a common source. (B) All integrated information measures for different noise correlation levels c.

Figure 3: All integrated information measures for the two-node AR process described in Eq. (38), for different coupling strengths a and noise correlation levels c. Vertical axis is inverted for visualisation purposes.

Figure 6: Integrated information measures for all networks in the suite shown in Fig. 5, normalised to spectral radius 0.9 and under the influence of uncorrelated noise. The ring and weighted Φ-optimal networks score consistently at the top, while denser networks like the fully connected and the binary Φ-optimal networks are usually at the bottom. Most measures disagree on specific values but agree on the relative complexity ranking of the networks.

Figure 7: Average integrated information measures for Erdős-Rényi random networks with given density ρ and noise correlation c.

Figure 8: Integrated information measures of random Erdős-Rényi networks, plotted against the average correlation Σ of the same network. (bottom) Normalised histogram of Σ for all sampled networks.

Table 1: Integrated information measures considered and original references.

Table 4: Integrated information measures considered and brief summary of our results.