# Measuring Integrated Information: Comparison of Candidate Measures in Theory and Simulation


## Abstract


## 1. Introduction

**integration**, i.e., the system behaves as one; and **segregation**, i.e., the parts of the system behave independently.

## 2. Methods

#### 2.1. Notation, Convention and Preliminaries

- $I(X;Y)=I(Y;X)$,
- $I(X;Y)\ge 0$, and
- $I(f(X);g(Y))=I(X;Y)$ for any injective functions $f,g$.
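The three properties listed above can be checked numerically on a small discrete example. The sketch below is illustrative only: the joint table and the helper name `mutual_information` are assumptions, not taken from the article.

```python
from math import log

def mutual_information(joint):
    """Mutual information (in nats) of a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * log(pxy / (px[i] * py[j]))
    return mi

# Hypothetical joint distribution of two correlated binary variables.
joint = [[0.4, 0.1],
         [0.1, 0.4]]

# Swapping the roles of X and Y transposes the table (symmetry).
transposed = [list(col) for col in zip(*joint)]
# Relabelling the states of X, an injective map, permutes the rows.
relabelled = [joint[1], joint[0]]

mi = mutual_information(joint)
mi_swapped = mutual_information(transposed)
mi_relabelled = mutual_information(relabelled)
```

All three values coincide and are non-negative, as the listed properties require.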

#### 2.2. Integrated Information Measures

#### 2.2.1. Overview

- Whole-minus-sum integrated information, $\mathsf{\Phi}$;
- Integrated stochastic interaction, $\tilde{\mathsf{\Phi}}$;
- Integrated synergy, $\psi $;
- Decoder-based integrated information, ${\mathsf{\Phi}}^{*}$;
- Geometric integrated information, ${\mathsf{\Phi}}_{G}$; and
- Causal density, CD.

#### 2.2.2. Minimum Information Partition

#### 2.2.3. Whole-Minus-Sum Integrated Information $\mathsf{\Phi}$

**Box 1. Calculating whole-minus-sum integrated information Φ.**

- For **discrete variables**:
  $$I(X_{t-\tau};X_{t})=\sum_{x,x^{\prime}}p(X_{t-\tau}=x,X_{t}=x^{\prime})\log\left(\frac{p(X_{t-\tau}=x,X_{t}=x^{\prime})}{p(X_{t-\tau}=x)\,p(X_{t}=x^{\prime})}\right)$$
- For **continuous, linear-Gaussian variables**:
  $$I(X_{t-\tau};X_{t})=\frac{1}{2}\log\left(\frac{\det\mathsf{\Sigma}(X_{t})}{\det\mathsf{\Sigma}(X_{t}\,|\,X_{t-\tau})}\right)$$
- For **continuous variables** with an arbitrary distribution, we must resort to the nearest-neighbour methods introduced by [24]. See reference for details.
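As a rough illustration of the linear-Gaussian formula in Box 1, the sketch below evaluates the time-delayed mutual information of a two-node AR(1) process at lag 1, then subtracts the mutual information of each node taken alone. Several things here are assumptions rather than content of the article: the coupling matrix with all entries equal to `a`, the parameter values, the helper names, the fixed-point iteration for the stationary covariance, and the plain whole-minus-sum form (whole-system mutual information minus the sum over parts, without normalisation).

```python
from math import log

def matmul(P, Q):
    """Product of two square matrices stored as nested lists."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(P):
    return [list(col) for col in zip(*P)]

def det2(P):
    """Determinant of a 2x2 matrix."""
    return P[0][0] * P[1][1] - P[0][1] * P[1][0]

def stationary_covariance(A, noise_cov, iterations=500):
    """Fixed-point iteration S <- A S A^T + noise_cov (needs spectral radius < 1)."""
    n = len(A)
    S = [row[:] for row in noise_cov]
    for _ in range(iterations):
        ASA = matmul(matmul(A, S), transpose(A))
        S = [[ASA[i][j] + noise_cov[i][j] for j in range(n)] for i in range(n)]
    return S

# Assumed two-node AR(1) system X_t = A X_{t-1} + eps (illustrative values).
a, c = 0.4, 0.0
A = [[a, a], [a, a]]
noise_cov = [[1.0, c], [c, 1.0]]

S = stationary_covariance(A, noise_cov)  # Sigma(X_t)
lagged = matmul(A, S)                    # Cov(X_t, X_{t-1})

# Whole system: at lag 1, Sigma(X_t | X_{t-1}) equals the noise covariance.
tdmi = 0.5 * log(det2(S) / det2(noise_cov))

# Each node on its own: scalar version of the same formula.
part_mi = []
for i in range(2):
    cond_var = S[i][i] - lagged[i][i] ** 2 / S[i][i]
    part_mi.append(0.5 * log(S[i][i] / cond_var))

# Whole-minus-sum for the only bipartition of a two-node system (assumed form).
phi = tdmi - sum(part_mi)
```

The fixed-point iteration was chosen only to keep the example dependency-free; a dedicated discrete Lyapunov solver would normally be used instead.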

#### 2.2.4. Integrated Stochastic Interaction $\tilde{\mathsf{\Phi}}$

- For **discrete variables**:
  $$H(X_{t-\tau}\,|\,X_{t})=-\sum_{x,x^{\prime}}p(X_{t-\tau}=x,X_{t}=x^{\prime})\log\left(\frac{p(X_{t-\tau}=x,X_{t}=x^{\prime})}{p(X_{t}=x^{\prime})}\right)$$
- For **continuous, linear-Gaussian variables**:
  $$H(X_{t-\tau}\,|\,X_{t})=\frac{1}{2}\log\det\mathsf{\Sigma}(X_{t-\tau}\,|\,X_{t})+\frac{1}{2}n\log(2\pi e)$$
- For **continuous variables** with an arbitrary distribution, we must resort to the nearest-neighbour methods introduced by [24]. See reference for details.
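The Gaussian conditional entropy above depends on the scale of the variables, which is why measures built from differential entropies are not rescaling-invariant. A minimal one-dimensional sketch, with an illustrative conditional variance and a hypothetical helper name, shows the shift:

```python
from math import log, pi, e

def gaussian_conditional_entropy(cond_cov_det, n):
    """H = 0.5*log(det Sigma_cond) + 0.5*n*log(2*pi*e), in nats."""
    return 0.5 * log(cond_cov_det) + 0.5 * n * log(2 * pi * e)

# One-dimensional case with an illustrative conditional variance.
cond_var = 0.5
h = gaussian_conditional_entropy(cond_var, n=1)

# Rescaling the variable by s multiplies its conditional variance by s**2,
# which shifts the differential entropy by log(s).
s = 3.0
h_rescaled = gaussian_conditional_entropy(cond_var * s ** 2, n=1)
shift = h_rescaled - h
```

Mutual information does not show this shift, because the scale-dependent terms of the entropies involved cancel.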

#### 2.2.5. Integrated Synergy $\psi $

**Box 3. Calculating integrated synergy ψ.**

- For **discrete variables**: (following Griffith and Koch’s [30] PID scheme)
  $$\begin{array}{rl}I_{\cup}(M_{t-\tau}^{1},\dots,M_{t-\tau}^{r};X_{t})= & \underset{q}{\min}\sum_{x,x^{\prime}}q(x,x^{\prime})\log\left(\frac{q(x,x^{\prime})}{q(x)\,q(x^{\prime})}\right)\\ \mathrm{s.t.} & q(M_{t-\tau}^{i},X_{t})=p(M_{t-\tau}^{i},X_{t})\end{array}$$
- For **continuous, linear-Gaussian variables**:
  $$I_{\cup}(M_{t-\tau}^{1},\dots,M_{t-\tau}^{r};X_{t})=\underset{k}{\max}\,I(M_{t-\tau}^{k};X_{t})$$
- For **continuous variables** with an arbitrary distribution: unknown.
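For the linear-Gaussian case in Box 3, the union information is the largest mutual information between any single part's past and the whole present state. The sketch below applies this to a two-node AR(1) process at lag 1. It assumes that integrated synergy is the time-delayed mutual information minus the union information; the coupling matrix, parameter values and helper names are likewise assumptions made for illustration.

```python
from math import log

def matmul(P, Q):
    """Product of two square matrices stored as nested lists."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det2(P):
    """Determinant of a 2x2 matrix."""
    return P[0][0] * P[1][1] - P[0][1] * P[1][0]

def stationary_covariance(A, noise_cov, iterations=500):
    """Fixed-point iteration S <- A S A^T + noise_cov (needs spectral radius < 1)."""
    n = len(A)
    At = [list(col) for col in zip(*A)]
    S = [row[:] for row in noise_cov]
    for _ in range(iterations):
        ASA = matmul(matmul(A, S), At)
        S = [[ASA[i][j] + noise_cov[i][j] for j in range(n)] for i in range(n)]
    return S

# Assumed two-node AR(1) system X_t = A X_{t-1} + eps (illustrative values).
a, c = 0.4, 0.0
A = [[a, a], [a, a]]
noise_cov = [[1.0, c], [c, 1.0]]

S = stationary_covariance(A, noise_cov)  # Sigma(X_t)
lagged = matmul(A, S)                    # Cov(X_t, X_{t-1})
tdmi = 0.5 * log(det2(S) / det2(noise_cov))

def source_target_mi(k):
    """I(M^k_{t-1}; X_t) for the single-node part k."""
    v = [lagged[0][k], lagged[1][k]]     # Cov(X_t, X^k_{t-1})
    cond = [[S[i][j] - v[i] * v[j] / S[k][k] for j in range(2)]
            for i in range(2)]           # Sigma(X_t | X^k_{t-1})
    return 0.5 * log(det2(S) / det2(cond))

union_info = max(source_target_mi(k) for k in range(2))
psi = tdmi - union_info                  # assumed definition of integrated synergy
```

In this symmetric example both parts are equally informative, so the maximum is attained by either one.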

#### 2.2.6. Decoder-Based Integrated Information ${\mathsf{\Phi}}^{*}$

**Box 4. Calculating decoder-based integrated information Φ*.**

- For **discrete variables**:
  $$\begin{array}{rl}\tilde{I}(\beta;X,\tau,\mathcal{P})= & -\sum_{x^{\prime}}p(X_{t}=x^{\prime})\log\sum_{x}p(X_{t-\tau}=x)\,q(X_{t}=x^{\prime}|X_{t-\tau}=x)^{\beta}\\ & +\sum_{x,x^{\prime}}p(X_{t-\tau}=x,X_{t}=x^{\prime})\log q(X_{t}=x^{\prime}|X_{t-\tau}=x)^{\beta}\end{array}$$
- For **continuous, linear-Gaussian variables**: (see appendix for details)
  $$\tilde{I}(\beta;X,\tau,\mathcal{P})=\frac{1}{2}\log\left(\left|Q\right||\mathsf{\Sigma}_{x}|\right)+\frac{1}{2}\mathrm{tr}\left(\mathsf{\Sigma}_{x}R\right)+\beta\,\mathrm{tr}\left(\mathsf{\Pi}_{x|\tilde{x}}^{-1}\mathsf{\Pi}_{x\tilde{x}}\mathsf{\Pi}_{x}^{-1}\mathsf{\Sigma}_{\tilde{x}x}\right)$$
- For **continuous variables** with an arbitrary distribution: unknown.

#### 2.2.7. Geometric Integrated Information ${\mathsf{\Phi}}_{G}$

**Box 5. Calculating geometric integration ${\mathsf{\Phi}}_{G}$.**

- For **discrete variables**: numerically optimise the objective ${D}_{KL}(p\parallel q)$ subject to the constraints
  $$\sum_{x,x^{\prime}}q(X_{t-\tau}=x^{\prime},X_{t}=x)=1\quad\mathrm{and}\quad q(M_{t}^{i}\,|\,X_{t-\tau})=q(M_{t}^{i}\,|\,M_{t-\tau}^{i})\ \ \forall i.$$
- For **continuous, linear-Gaussian variables**: numerically optimise the objective
  $$\mathsf{\Phi}_{G}[X;\tau,\mathcal{P}]=\underset{\mathsf{\Sigma}(E)^{\prime}}{\min}\frac{1}{2}\log\frac{|\mathsf{\Sigma}(E)^{\prime}|}{|\mathsf{\Sigma}(E)|},$$
  $$\begin{array}{l}\mathsf{\Sigma}(E)^{\prime}=\mathsf{\Sigma}(E)+(A-A^{\prime})\mathsf{\Sigma}(X)(A-A^{\prime})^{\mathrm{T}}\quad\mathrm{and}\\ \left(\mathsf{\Sigma}(X)(A-A^{\prime})\mathsf{\Sigma}(E)^{\prime-1}\right)_{ii}=0.\end{array}$$
- For **continuous variables** with an arbitrary distribution: unknown.

#### 2.2.8. Causal Density

**Box 6. Calculating causal density CD.**

- For **discrete variables**:
  $$TE_{\tau}(X^{i}\to X^{j}\,|\,X^{[ij]})=\sum_{x,x^{\prime}}p\left(X_{t+\tau}^{j}=x^{\prime j},X_{t}=x\right)\log\left(\frac{p\left(X_{t+\tau}^{j}=x^{\prime j}\,|\,X_{t}=x\right)}{p\left(X_{t+\tau}^{j}=x^{\prime j}\,|\,X_{t}^{j}=x^{j},X_{t}^{[ij]}=x^{[ij]}\right)}\right)$$
- For **continuous, linear-Gaussian variables**:
  $$TE_{\tau}(X^{i}\to X^{j}\,|\,X^{[ij]})=\frac{1}{2}\log\left(\frac{\det\mathsf{\Sigma}\left(X_{t+\tau}^{j}\,|\,X_{t}^{j}\oplus X_{t}^{[ij]}\right)}{\det\mathsf{\Sigma}\left(X_{t+\tau}^{j}\,|\,X_{t}\right)}\right)$$
- For **continuous variables** with an arbitrary distribution, we must resort to the nearest-neighbour methods introduced by [24]. See reference for details.
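The Gaussian transfer entropy in Box 6 reduces to a ratio of two scalar variances when the system has only two nodes, because the conditioning set $X^{[ij]}$ is then empty. The sketch below computes it at lag 1 for a two-node AR(1) process and averages over the two ordered pairs. Taking causal density to be that average is an assumption, as are the coupling matrix, parameter values and helper names.

```python
from math import log

def matmul(P, Q):
    """Product of two square matrices stored as nested lists."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def stationary_covariance(A, noise_cov, iterations=500):
    """Fixed-point iteration S <- A S A^T + noise_cov (needs spectral radius < 1)."""
    n = len(A)
    At = [list(col) for col in zip(*A)]
    S = [row[:] for row in noise_cov]
    for _ in range(iterations):
        ASA = matmul(matmul(A, S), At)
        S = [[ASA[i][j] + noise_cov[i][j] for j in range(n)] for i in range(n)]
    return S

# Assumed two-node AR(1) system X_t = A X_{t-1} + eps (illustrative values).
a, c = 0.4, 0.0
A = [[a, a], [a, a]]
noise_cov = [[1.0, c], [c, 1.0]]

S = stationary_covariance(A, noise_cov)  # Sigma(X_t)
lagged = matmul(A, S)                    # Cov(X_{t+1}, X_t)

def transfer_entropy(source, target):
    """TE from source to target at lag 1 in a two-node system.

    With two nodes, removing the source leaves only the target's own past,
    so the source index does not appear explicitly in the computation.
    """
    # Variance of the target's future given only its own past.
    own_past_var = S[target][target] - lagged[target][target] ** 2 / S[target][target]
    # Variance of the target's future given the full past: the noise variance.
    full_past_var = noise_cov[target][target]
    return 0.5 * log(own_past_var / full_past_var)

pairs = [(0, 1), (1, 0)]
causal_density = sum(transfer_entropy(i, j) for i, j in pairs) / len(pairs)
```

For larger systems the conditioning set is non-empty, and the numerator requires the conditional covariance given every node except the source.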

#### 2.2.9. Other Measures

## 3. Results

- Whole-minus-sum integrated information, $\mathsf{\Phi}$,
- Integrated stochastic interaction, $\tilde{\mathsf{\Phi}}$,
- Decoder-based integrated information, ${\mathsf{\Phi}}^{*}$,
- Geometric integrated information, ${\mathsf{\Phi}}_{G}$,
- Integrated synergy, $\psi $,
- Causal density, CD.

- Time-delayed mutual information (TDMI), $I({X}_{t-\tau};{X}_{t})$; and
- Average absolute correlation $\overline{\mathsf{\Sigma}}$, defined as the average absolute value of the non-diagonal entries in the system’s correlation matrix.
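The average absolute correlation defined above can be computed directly from a covariance matrix. The following is a small sketch with a hypothetical helper name and an illustrative three-variable covariance matrix.

```python
from math import sqrt

def average_absolute_correlation(cov):
    """Mean absolute value of the off-diagonal entries of the correlation matrix."""
    n = len(cov)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += abs(cov[i][j]) / sqrt(cov[i][i] * cov[j][j])
    return total / (n * (n - 1))

# Illustrative covariance matrix: two correlated pairs and one uncorrelated pair.
cov = [[2.0, 1.0, 0.0],
       [1.0, 2.0, -1.0],
       [0.0, -1.0, 2.0]]
sigma_bar = average_absolute_correlation(cov)
```

The pairwise correlations here are 0.5, 0 and -0.5, so the average of their absolute values is 1/3.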

#### 3.1. Key Quantities for Computing the Integrated Information Measures

`dlyap` command. The lagged covariance can also be calculated from the parameters of the AR process as

#### 3.2. Two-Node Network

#### 3.3. Eight-Node Networks

- **A**: A fully connected network without self-loops.
- **B**: The $\mathsf{\Phi}$-optimal binary network presented in [2].
- **C**: The $\mathsf{\Phi}$-optimal weighted network presented in [2].
- **D**: A bidirectional ring network.
- **E**: A “small-world” network, formed by introducing two long-range connections to a bidirectional ring network.
- **F**: A unidirectional ring network.
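Three of the networks listed above have a simple regular structure and can be written down directly. The sketch below builds eight-node coupling matrices for networks A, D and F and rescales them to a spectral radius of 0.9. The convention that entry `[i][j]` is the weight from node `j` to node `i`, the unit weights, and the function names are assumptions. The rescaling shortcut relies on every row of these non-negative matrices having the same sum, which then equals the spectral radius; it does not apply to the other networks.

```python
def fully_connected(n):
    """All-to-all coupling without self-loops (network A)."""
    return [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]

def unidirectional_ring(n):
    """Node i receives input only from node i-1 (network F)."""
    return [[1.0 if j == (i - 1) % n else 0.0 for j in range(n)]
            for i in range(n)]

def bidirectional_ring(n):
    """Node i receives input from both ring neighbours (network D)."""
    return [[1.0 if j in ((i - 1) % n, (i + 1) % n) else 0.0 for j in range(n)]
            for i in range(n)]

def scale_regular(W, radius):
    """Rescale a non-negative matrix with equal row sums to a given spectral radius."""
    row_sum = sum(W[0])
    return [[radius * w / row_sum for w in row] for row in W]

networks = {
    "A": fully_connected(8),
    "D": bidirectional_ring(8),
    "F": unidirectional_ring(8),
}
scaled = {name: scale_regular(W, 0.9) for name, W in networks.items()}
```

For networks without equal row sums, such as the weighted or small-world ones, the spectral radius would have to be obtained from an eigenvalue computation instead.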

**A** consistently scores lowest, which is explained by the large correlation between its nodes as shown by $\overline{\mathsf{\Sigma}}$.

**B**, **C**, and **E** to be at the top and networks **A**, **D**, and **F** to be at the bottom, which is very different from any of the rankings in Table 3. In fact, the Spearman correlation between the ranking by small-world index and those by TDMI, ${\mathsf{\Phi}}_{G}$, ${\mathsf{\Phi}}^{*}$, and $\psi $ is around $-0.4$, leading to the conclusion that more structurally complex networks integrate less information. We note that these rankings are very robust to noise correlation (results not shown) for all measures except $\mathsf{\Phi}$. Across all simulations in this study, the behaviour of $\mathsf{\Phi}$ is erratic, undermining prospects for empirical application. (This behaviour is even more prevalent if $\mathsf{\Phi}$ is optimised over all bipartitions, as opposed to over even bipartitions.)

#### 3.4. Random Networks

## 4. Discussion

#### 4.1. Partition Selection

#### 4.2. Continuous Variables and the Linear Gaussian Assumption

#### 4.3. Empirical as Opposed to Maximum Entropy Distribution

## 5. Final Remarks

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Appendix A. Derivation and Concavity Proof of ${I}^{*}$

#### Appendix A.1. Derivation of ${I}^{*}$ in Gaussian Systems

#### Appendix A.2. $\tilde{I}(\beta)$ Is Concave in $\beta$ in Gaussian Systems

- An affine function preserves concavity, in the sense that a linear combination of convex (concave) functions is also convex (concave).
- A non-negative weighted sum preserves concavity. Since $p(x)>0$, the outer integral in Equation (A20) preserves concavity,

## Appendix B. Bounds on Causal Density

## Appendix C. Properties of Integrated Information Measures

- (MI-1) $I(X;Y)=I(Y;X)$,
- (MI-2) $I(X;Y)\ge 0$,
- (MI-3) $I(f(X);g(Y))=I(X;Y)$ for any injective functions $f,g$,

#### Appendix C.1. Whole-Minus-Sum Integrated Information Φ

- **Time-symmetric**: Follows from (MI-1).
- **Non-negative**: Does not hold; proof by counterexample. If ${X}_{t}^{i}={X}_{t}^{j}$ for all $i,j$, we have $\mathsf{\Phi}=(1-N)I({X}_{t}^{i};{X}_{t-\tau}^{i})\le 0$.
- **Rescaling-invariant**: Follows from (MI-3) when Balduzzi and Tononi’s [12] normalisation factor is not used.
- **Bounded by TDMI**: Follows from (MI-2).
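The counterexample for non-negativity can be reproduced with two identical binary nodes. In the sketch below, the sticky Markov chain, its parameter `stay`, the state encoding and the helper name are illustrative assumptions, and $\mathsf{\Phi}$ is taken in its plain whole-minus-sum form without normalisation.

```python
from math import log

def mutual_information(joint):
    """Mutual information (in nats) of a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * log(pxy / (px[i] * py[j]))
    return mi

# One binary node that keeps its state with probability `stay`,
# started from its uniform stationary distribution.
stay = 0.9
single = [[0.5 * stay, 0.5 * (1 - stay)],
          [0.5 * (1 - stay), 0.5 * stay]]

# Two identical copies of that node. Whole-system states are encoded as
# 2*x1 + x2, so only states 0 (both off) and 3 (both on) ever occur.
whole = [[0.0] * 4 for _ in range(4)]
for x in (0, 1):
    for y in (0, 1):
        whole[3 * x][3 * y] = single[x][y]

n_parts = 2
whole_mi = mutual_information(whole)
part_mi = mutual_information(single)
phi = whole_mi - n_parts * part_mi
```

The whole system carries exactly as much information as one node, so the sum over parts overshoots and `phi` equals `(1 - n_parts) * part_mi`, which is negative.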

#### Appendix C.2. Integrated Stochastic Interaction $\tilde{\mathsf{\Phi}}$

- **Time-symmetric**: Follows from $H({X}_{t}|{X}_{t-\tau})=H({X}_{t-\tau}|{X}_{t})$, which can be proved starting from the system temporal joint entropy
- **Non-negative**: Follows from the fact that $\tilde{\mathsf{\Phi}}$ is an M-projection (see Reference [4]).
- **Rescaling-invariant**: Does not hold; this follows from the non-invariance of differential entropy [18] (regardless of whether a normalisation factor is used).
- **Bounded by TDMI**: Does not hold; proof by counterexample. In the two-node AR process of the main text, $\tilde{\mathsf{\Phi}}\to \infty $ as $c\to 1$, although TDMI remains finite.

#### Appendix C.3. Integrated Synergy ψ

- **Time-symmetric**: Proof by counterexample, for the AR system with
  $$A=\left(\begin{array}{cc}a& a\\ 0& 0\end{array}\right),\qquad\mathsf{\Sigma}(\epsilon )=\left(\begin{array}{cc}1& 0\\ 0& 1\end{array}\right).$$

#### Appendix C.4. Decoder-Based Integrated Information Φ*

- **Non-negative**: Follows from ${I}^{*}[X;\tau ,\mathcal{P}]\le I({X}_{t};{X}_{t-\tau})$, proven in Reference [36].
- **Rescaling-invariant**: Assume that the measure is computed on a time series of rescaled data ${X}_{t}^{r}={X}_{t}A$, where $A$ is a diagonal matrix with positive real entries. Then, its covariance is related to the covariance of the original time series as ${\mathsf{\Sigma}}_{X}^{r}=\mathbb{E}\left[{{X}_{t}^{r}}^{\mathrm{T}}{X}_{t}^{r}\right]=\mathbb{E}\left[{A}^{\mathrm{T}}{X}_{t}^{\mathrm{T}}{X}_{t}A\right]=A{\mathsf{\Sigma}}_{X}A$. We can analogously calculate ${\mathsf{\Pi}}_{X},{\mathsf{\Pi}}_{X\tilde{X}},{\mathsf{\Pi}}_{X|\tilde{X}}$ and verify that all factors of $A$ cancel out, proving the invariance.
- **Bounded by TDMI**: Follows from ${I}^{*}[X;\tau ,\mathcal{P}]\ge 0$, proven in Reference [36].

#### Appendix C.5. Geometric Integrated Information ${\mathsf{\Phi}}_{G}$

- **Time-symmetric**: Follows from the symmetry in the constraints that define the manifold of restricted models $Q$ [4].
- **Non-negative**: Follows from the fact that ${\mathsf{\Phi}}_{G}$ is an M-projection [4].
- **Rescaling-invariant**: Given a Gaussian distribution $p$ with covariance ${\mathsf{\Sigma}}_{p}$, its M-projection in $Q$ is another Gaussian $q$ with covariance ${\mathsf{\Sigma}}_{q}$. Given a new distribution ${p}^{\prime}$ formed by rescaling some of the variables in $p$, the M-projection ${q}^{\prime}$ of ${p}^{\prime}$ is a Gaussian with covariance $A{\mathsf{\Sigma}}_{q}A$, with $A$ a diagonal positive matrix (see above), which satisfies ${D}_{KL}(p\parallel q)={D}_{KL}({p}^{\prime}\parallel {q}^{\prime})$. Therefore ${\mathsf{\Phi}}_{G}$ is invariant to rescaling.
- **Bounded by TDMI**: TDMI can be defined as the M-projection of the full model $p$ to a manifold of restricted models ${Q}^{MI}=\{q\,:\,q({X}_{t},{X}_{t-\tau})=q({X}_{t})q({X}_{t-\tau})\}$ [4]. The bound ${\mathsf{\Phi}}_{G}\le I({X}_{t};{X}_{t-\tau})$ follows from the fact that ${Q}^{MI}\subset Q$.

#### Appendix C.6. Causal Density

- **Time-symmetric**: Does not hold; this follows from the non-symmetry of transfer entropy [61].
- **Non-negative**: Follows from (MI-2) after re-writing CD as a sum of conditional MI terms.
- **Rescaling-invariant**: Follows from (MI-3).
- **Bounded by TDMI**: Proven in Appendix B.

## References

1. Holland, J. Complexity: A Very Short Introduction; Oxford University Press: Oxford, UK, 2014.
2. Barrett, A.B.; Seth, A.K. Practical measures of integrated information for time-series data. PLoS Comput. Biol. **2011**, 7, e1001052.
3. Griffith, V. A principled infotheoretic ϕ-like measure. arXiv, 2014; arXiv:1401.0978.
4. Oizumi, M.; Tsuchiya, N.; Amari, S.-I. A unified framework for information integration based on information geometry. arXiv, 2015; arXiv:1510.04455.
5. Oizumi, M.; Amari, S.-I.; Yanagawa, T.; Fujii, N.; Tsuchiya, N. Measuring integrated information from the decoding perspective. arXiv, 2015; arXiv:1505.04368.
6. Toker, D.; Sommer, F.T. Greater than the sum: Integrated information in large brain networks. arXiv, 2017; arXiv:1708.02967.
7. Mediano, P.A.M.; Farah, J.C.; Shanahan, M.P. Integrated information and metastability in systems of coupled oscillators. arXiv, 2016; arXiv:1606.08313.
8. Tagliazucchi, E. The signatures of conscious access and its phenomenology are consistent with large-scale brain communication at criticality. Conscious. Cogn. **2017**, 55, 136–147.
9. Oizumi, M.; Albantakis, L.; Tononi, G. From the phenomenology to the mechanisms of consciousness: Integrated information theory 3.0. PLoS Comput. Biol. **2014**, 10, e1003588.
10. Tononi, G.; Sporns, O.; Edelman, G.M. A measure for brain complexity: Relating functional segregation and integration in the nervous system. Proc. Natl. Acad. Sci. USA **1994**, 91, 5033–5037.
11. Sporns, O. Complexity. Scholarpedia **2007**, 2, 1623.
12. Balduzzi, D.; Tononi, G. Integrated information in discrete dynamical systems: Motivation and theoretical framework. PLoS Comput. Biol. **2008**, 4, e1000091.
13. Seth, A.K.; Barrett, A.B.; Barnett, L. Causal density and integrated information as measures of conscious level. Philos. Trans. A **2011**, 369, 3748–3767.
14. Granger, C.W.J. Investigating causal relations by econometric models and cross-spectral methods. Econometrica **1969**, 37, 424.
15. Seth, A.K.; Izhikevich, E.; Reeke, G.N.; Edelman, G.M. Theories and measures of consciousness: An extended framework. Proc. Natl. Acad. Sci. USA **2006**, 103, 10799–10804.
16. Kanwal, M.S.; Grochow, J.A.; Ay, N. Comparing information-theoretic measures of complexity in Boltzmann machines. Entropy **2017**, 19, 310.
17. Tegmark, M. Improved measures of integrated information. arXiv, 2016; arXiv:1601.02626.
18. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley: Hoboken, NJ, USA, 2006.
19. The formal derivation of the differential entropy proceeds by considering the entropy of a discrete variable with k states, and taking the k→∞ limit. The result is the differential entropy plus a divergent term that is usually dropped and is ultimately responsible for the undesirable properties of differential entropy. In the case of I(X;Y) the divergent terms for the various entropies involved cancel out, restoring the useful properties of its discrete counterpart.
20. Although the origins of causal density go as far back as 1969, it was not until the last decade that it found its way into neuroscience. The paper referenced in the table acts as a modern review of the properties and behaviour of causal density. This measure is somewhat distinct from the others, but is still a measure of complexity based on information dynamics between the past and current state; therefore its inclusion here will be useful.
21. Krohn, S.; Ostwald, D. Computing integrated information. arXiv, 2016; arXiv:1610.03627.
22. The c and e here stand respectively for cause and effect. Without an initial condition, here that the uniform distribution holds at time 0, there would be no well-defined probability distribution for these states. Further, Markovian dynamics are required for these probability distributions to be well-defined; for non-Markovian dynamics, a longer chain of initial states would have to be specified, going beyond just that at time 0.
23. Barrett, A.B. An exploration of synergistic and redundant information sharing in static and dynamical Gaussian systems. arXiv, 2014; arXiv:1411.2832.
24. Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Phys. Rev. E **2004**, 69, 066138.
25. Ay, N. Information geometry on complexity and stochastic interaction. Entropy **2015**, 17, 2432–2458.
26. Wiesner, K.; Gu, M.; Rieper, E.; Vedral, V. Information-theoretic bound on the energy cost of stochastic simulation. arXiv, 2011; arXiv:1110.4217.
27. Williams, P.L.; Beer, R.D. Nonnegative decomposition of multivariate information. arXiv, 2010; arXiv:1004.2515.
28. Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J. Shared information—New insights and problems in decomposing information in complex systems. In Proceedings of the European Conference on Complex Systems 2012; Gilbert, T., Kirkilionis, M., Nicolis, G., Eds.; Springer: Berlin, Germany, 2012.
29. Barrett’s derivation of the MMI-PID, which follows Williams and Beer and Griffith and Koch’s procedure, gives this formula when the target is univariate. We generalise the formula here to the case of multivariate target in order to render ψ computable for Gaussians. This formula leads to synergy being the extra information contributed by the weaker source given the stronger source was previously known.
30. Griffith, V.; Koch, C. Quantifying synergistic mutual information. arXiv, 2012; arXiv:1205.4265.
31. Rosas, F.; Ntranos, V.; Ellison, C.; Pollin, S.; Verhelst, M. Understanding interdependency through complex information sharing. Entropy **2016**, 18, 38.
32. Ince, R.A.A. Measuring multivariate redundant information with pointwise common change in surprisal. Entropy **2017**, 19, 318.
33. Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J.; Ay, N. Quantifying unique information. Entropy **2014**, 16, 2161–2183.
34. Kay, J.W.; Ince, R.A.A. Exact partial information decompositions for Gaussian systems based on dependency constraints. arXiv, 2018; arXiv:1803.02030.
35. Latham, P.E.; Nirenberg, S. Synergy, redundancy, and independence in population codes, revisited. J. Neurosci. **2005**, 25, 5195–5206.
36. Merhav, N.; Kaplan, G.; Lapidoth, A.; Shitz, S.S. On information rates for mismatched decoders. IEEE Trans. Inf. Theory **1994**, 40, 1953–1967.
37. Oizumi, M.; Ishii, T.; Ishibashi, K.; Hosoya, T.; Okada, M. Mismatched decoding in the brain. J. Neurosci. **2010**, 30, 4815–4826.
38. Amari, S.-I.; Nagaoka, H. Methods of Information Geometry; American Mathematical Society: Providence, RI, USA, 2000.
39. Amari, S.-I. Information geometry in optimization, machine learning and statistical inference. Front. Electr. Electron. Eng. China **2010**, 5, 241–260.
40. Boyd, S.P.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004.
41. Seth, A.K. Causal connectivity of evolved neural networks during behavior. Netw. Comput. Neural Syst. **2005**, 16, 35–54.
42. Barnett, L.; Barrett, A.B.; Seth, A.K. Granger causality and transfer entropy are equivalent for Gaussian variables. Phys. Rev. Lett. **2009**, 103, 238701.
43. Barnett, L.; Seth, A.K. Behaviour of Granger causality under filtering: Theoretical invariance and practical application. J. Neurosci. Methods **2011**, 201, 404–419.
44. Lindner, M.; Vicente, R.; Priesemann, V.; Wibral, M. TRENTOOL: A MATLAB open source toolbox to analyse information flow in time series data with transfer entropy. BMC Neurosci. **2011**, 12, 119.
45. Lizier, J.T.; Heinzle, J.; Horstmann, A.; Haynes, J.-D.; Prokopenko, M. Multivariate information-theoretic measures reveal directed information structure and task relevant changes in fMRI connectivity. J. Comput. Neurosci. **2010**, 30, 85–107.
46. Mediano, P.A.M.; Shanahan, M.P. Balanced information storage and transfer in modular spiking neural networks. arXiv, 2017; arXiv:1708.04392.
47. Barnett, L.; Seth, A.K. The MVGC multivariate Granger causality toolbox: A new approach to Granger-causal inference. J. Neurosci. Methods **2014**, 223, 50–68.
48. Lütkepohl, H. New Introduction to Multiple Time Series Analysis; Springer: New York, NY, USA, 2005.
49. According to an anonymous reviewer, ${\mathsf{\Phi}}_{G}$ does decrease with noise correlation in discrete systems, although in this article we focus exclusively on Gaussian systems.
50. Note that in Figure 5 the Φ-optimal networks **B** and **C** score much less than the simpler network **F**. This is because all networks have been scaled to a spectral radius of 0.9; when the networks are normalised to a spectral radius of 0.5, as in the original paper, then **B** and **C** are, as expected, the networks with highest Φ.
51. Humphries, M.D.; Gurney, K. Network ‘small-world-ness:’ A quantitative method for determining canonical network equivalence. PLoS ONE **2008**, 3, e0002051.
52. Yin, H.; Benson, A.R.; Leskovec, J. Higher-order clustering in networks. arXiv, 2017; arXiv:1704.03913.
53. The small-world index of a network is defined as the ratio between its clustering coefficient and its mean minimum path length, normalised by the expected value of these measures on a random network of the same density. Since the networks we consider are small and sparse, we use the 4th order cliques (instead of triangles, which are 3rd order cliques) to calculate the clustering coefficient.
54. Tononi, G.; Sporns, O. Measuring information integration. BMC Neurosci. **2003**, 4, 31.
55. Toker, D.; Sommer, F. Moving past the minimum information partition: How to quickly and accurately calculate integrated information. arXiv, 2016; arXiv:1605.01096.
56. Hidaka, S.; Oizumi, M. Fast and exact search for the partition with minimal information loss. arXiv, 2017; arXiv:1708.01444.
57. Arsiwalla, X.D.; Verschure, P.F.M.J. Integrated information for large complex networks. In Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), Dallas, TX, USA, 4–9 August 2013; pp. 1–7.
58. Dayan, P.; Abbott, L.F. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems; MIT Press: Cambridge, MA, USA, 2001.
59. Wang, Q.; Kulkarni, S.R.; Verdu, S. Divergence estimation for multidimensional densities via k-nearest-neighbor distances. IEEE Trans. Inf. Theory **2009**, 55, 2392–2405.
60. Barrett, A.B.; Barnett, L. Granger causality is designed to measure effect, not mechanism. Front. Neuroinform. **2013**, 7, 6.
61. Wibral, M.; Vicente, R.; Lizier, J.T. (Eds.) Directed Information Measures in Neuroscience; Understanding Complex Systems; Springer: Berlin/Heidelberg, Germany, 2014.

**Figure 1.** (**A**) graphical representation of the two-node AR process described in Equation (37). Two connected nodes with coupling strength $a$ receive noise with correlation $c$, which can be thought of as coming from a common source; (**B**) all integrated information measures for different noise correlation levels $c$.

**Figure 2.** All integrated information measures for the two-node AR process described in Equation (37), for different coupling strengths $a$ and noise correlation levels $c$. The vertical axis is inverted for visualisation purposes.

**Figure 3.** All integrated information measures for the $\mathsf{\Phi}$-optimal AR process proposed by [2], for different coupling strengths $a$ and noise correlation levels $c$. The vertical axis is inverted for visualisation purposes.

**Figure 4.** Networks used in the comparative analysis of integrated information measures. (**A**) fully connected network; (**B**) $\mathsf{\Phi}$-optimal binary network from [2]; (**C**) $\mathsf{\Phi}$-optimal weighted network from [2]; (**D**) bidirectional ring network; (**E**) small-world network; and (**F**) unidirectional ring network.

**Figure 5.** Integrated information measures for all networks in the suite shown in Figure 4, normalised to spectral radius $0.9$ and under the influence of uncorrelated noise. The ring and weighted $\mathsf{\Phi}$-optimal networks score consistently at the top, while denser networks like the fully connected and the binary $\mathsf{\Phi}$-optimal networks are usually at the bottom. Most measures disagree on specific values but agree on the relative ranking of the networks.

**Figure 7.** (top) integrated information measures of random Erdős–Rényi networks, plotted against the average correlation $\overline{\mathsf{\Sigma}}$ of the same network; (bottom) normalised histogram of $\overline{\mathsf{\Sigma}}$ for all sampled networks.

| Measure | Description | Reference |
|---|---|---|
| $\mathsf{\Phi}$ | Information lost after splitting the system | [12] |
| $\tilde{\mathsf{\Phi}}$ | Uncertainty gained after splitting the system | [2] |
| $\psi $ | Synergistic predictive information between parts of the system | [3] |
| ${\mathsf{\Phi}}^{*}$ | Past state decoding accuracy lost after splitting the system | [5] |
| ${\mathsf{\Phi}}_{G}$ | Information-geometric distance to system with disconnected parts | [4] |
| CD | Average pairwise directed information flow | [13] |

**Table 2.** Overview of properties of integrated information measures; proofs are given in Appendix C.

| Property | $\mathbf{\Phi}$ | $\tilde{\mathbf{\Phi}}$ | $\mathbf{\psi}$ | ${\mathbf{\Phi}}^{*}$ | ${\mathbf{\Phi}}_{\mathit{G}}$ | CD |
|---|---|---|---|---|---|---|
| Time-symmetric | ✓ | ✓ | ✕ | ? | ✓ | ✕ |
| Non-negative | ✕ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Invariant to variable rescaling | ✓ | ✕ | ✓ | ✓ | ✓ | ✓ |
| Upper-bounded by time-delayed mutual information | ✓ | ✕ | ✓ | ✓ | ✓ | ✓ |
| Known estimators for arbitrary real-valued systems | ✓ | ✓ | ✕ | ✕ | ✕ | ✓ |
| Closed-form expression in discrete and Gaussian systems | ✓ | ✓ | ✓ | ✕ | ✕ | ✓ |

**Table 3.** Networks ranked according to their value of each integrated information measure (highest value to the left). We add the small-world index (SWI) as a dynamics-agnostic measure of network complexity.

| Measure | Ranking | | | | | |
|---|---|---|---|---|---|---|
| $I({X}_{t};{X}_{t+\tau})$ | F | C | D | E | B | A |
| ${\mathsf{\Phi}}_{G}$ | F | C | D | E | B | A |
| $\mathsf{\Phi}$ | F | C | B | E | D | A |
| ${\mathsf{\Phi}}^{*}$ | F | C | B | E | D | A |
| $\overline{\mathsf{\Sigma}}$ | C | B | A | E | D | F |
| $\tilde{\mathsf{\Phi}}$ | C | F | B | D | E | A |
| $\psi $ | F | C | D | E | B | A |
| CD | C | F | B | D | E | A |
| SWI | C | E | B | A | D | F |

| Measure | Summary of Results |
|---|---|
| $\mathsf{\Phi}$ | Erratic behaviour, negative when nodes are strongly correlated. |
| $\tilde{\mathsf{\Phi}}$ | Mostly reflects noise input correlation, not sensitive to changes in coupling. |
| $\psi $ | Reflects both segregation and integration. |
| ${\mathsf{\Phi}}^{*}$ | Reflects both segregation and integration. |
| ${\mathsf{\Phi}}_{G}$ | Mostly reflects changes in coupling, not sensitive to noise input correlation. |
| CD | Reflects both segregation and integration. |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Mediano, P.A.M.; Seth, A.K.; Barrett, A.B.
Measuring Integrated Information: Comparison of Candidate Measures in Theory and Simulation. *Entropy* **2019**, *21*, 17.
https://doi.org/10.3390/e21010017
