Peer-Review Record

Scaling Behaviour and Critical Phase Transitions in Integrated Information Theory

Entropy 2019, 21(12), 1198; https://doi.org/10.3390/e21121198
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Received: 14 March 2019 / Revised: 28 November 2019 / Accepted: 30 November 2019 / Published: 5 December 2019
(This article belongs to the Special Issue Integrated Information Theory)

Round 1

Reviewer 1 Report

Integrated Information Theory is aimed to define a measure of conscious activity (Φ), characterized as the irreducibility of a dynamical system to the sum of its components.

Grammar – “is aimed to define” is awkward.


Due to its computational cost, current versions of the theory (IIT 3.0) are difficult to apply to systems larger than a dozen of units, and in general it is not well known how integrated information scales in time and space.

This makes it sound as if IIT has never had any work on how integrated information changes across space or time – whereas Hoel et al. 2016 “Can the macro beat the micro? Integrated information across spatiotemporal scales” shows that integrated information can increase at higher scales. I think the author means how the calculation of integrated information changes as systems get larger in size.

As a result, we point to some aspects of the mathematical definitions in IIT 3.0 that are flawed to capture critical phenomena and propose a reformulation of some aspects of the theory.
Grammar – it doesn’t make sense to point to flawed definitions to capture critical phenomena.

 

Integrated Information Theory (IIT, [1]) was developed to address the problem consciousness by providing a measure of integration, Φ, that quantifies to what extent a dynamical system generates information that is irreducible to the sum of its parts considered independently.
Grammar – the problem of

(displaying a diversity locally specific patterns)

Grammar

 

Furthermore, IIT attempts to quantify the irreducibility of these complex patterns, revealing the boundaries in the organization (i.e. delimiting the parts of the system that are integrated into a functional unit) of complex dynamical systems

IIT doesn’t quantify the complexity of patterns, it quantifies the causal structure. These are very different, for instance, see Larissa et al. 2016 “The intrinsic cause-effect power of discrete dynamical systems.” Additionally, it’s unclear how IIT attempts to quantify the integrated information for complex dynamical systems, given that previously the author stated that IIT can only be run on small systems.

 

In IIT 3.0 [1], the lastest version of the framework proposed by the lab of Giulio Tononi

Spelling mistake. Additionally, I’m not sure this is the best way to reference the paper.

 

can only be computed for binary networks composed of up to a dozen of units. A variety of other measures exist to compute integrated information [3] and some of them are computationally lighter, but all share these limits to some extent.

What about all the metrics and adaptations that are used for brain imaging? Are those for binary networks as well? In what sense are large systems still a problem for the existing heuristics? There is not enough background to motivate these issues here.

 

Since most versions of Φ require computing distance measures between probability distributions of a system and finding the minimum information partition (MIP), they present important restrictions in terms of computational cost as a system scales in size.

Is the restriction really because of differences between probability distributions? Isn’t the MIP by far the limiting factor, given it is a Bell number?

As a consequence, IIT measures have been limited to very small systems in general, and it is not well understood how they scale with the size or temporal scale of the system

Grammar – they

 

Previous work [4] has tried to analyse how integrated information changes with spatial and temporal coarse graining in small networks

Did they try and fail? Grammar.

 

 Aside from parctical difficulties due to its computational costs

Grammar

 

the measure of Φ might not be well defined

Grammar

 

In general, measures of Φ can only be applied to small networks

What about all the ones for brain imaging? They are just ignored consistently throughout the motivating section of the text.

 

dynamics its complex even for simple models

grammar

 

In the critical state, even in simple systems, dynamics are dominated by small bursts of local (i.e. segregated) activity, yet large avalanches of globally coordinated (i.e. integrated) activity

Grammar

 

Despite its appealing properties, critical phenomena are theoretically characterized for systems of infinite size, and can be characterized in large finite systems as divergent tendencies as they scale

Grammar


In the framework of IIT 3.0, it is required that a system satisfies the Markov property (i.e., the state at time t only depends on the state at time t − τ)

In Hoel et al. 2016 “Can the macro beat the micro? Integrated information across spatiotemporal scales” there are clearly systems that are second-order

and how it constraints the potential future or past states states.

Multiple grammatical mistakes

 

as the distance between the cause-effect structure of the system, and cause-effect structure defined by its minimum information partition

Grammar

 

we define a general model defining causal temporal interactions between variables

What would a causal non-temporal interaction be?

 

We divide system is divided into 125 different regions

Grammar

 

the MIP in most cases one of the partitions that cuts the lowest number of connections for each region.

Grammar

 

in general the MIP still is one that isolates one unit of one of the regions comprised by the mechanism.

Where is this shown “in general”

 

the level of irreducibility of the causal distributions

What are causal distributions?

 

As IIT 3.0 operates with the transition probability matrix of a system, one could compute this matrix from time t to time t + τ and compute a new probability transition matrix for a bipartition by injecting noise in the connections affected by it at time t. This implies that the system behaves normally for the following steps.

It’s unclear what’s going on here. More time needs to be explicitly spent saying why this is different, perhaps with an actual example figure if this difference is a key point of the paper.

 

The figure text from this figure is just lifted from the figure text of the previous figure. And there’s no figure 3 even cited in the text. And B in this “Figure 3” looks a lot like the C in Figure 2, but cropped differently.

 

This behaviour is completely different to the one from in the effect repertoire. This should not happen for a homogeneous system in a stationary state as the one under study here.

Grammar. Also, how is the reader to know this isn’t due to your modifications in the calculation of integrated information to adapt to Ising models, rather than if one actually calculates using the real values from IIT 3.0, even though it is computationally intractable?

 

Also, again the measure of φcause fails to capture integration around the critical point.

What’s the argument behind this being necessary?

 

in [5, see Appendix B] We can see in Figure 4.A it is shown how the value of φ diverges with size.

Grammar

 

These results illustrate that different distance metrics can have important effects in the behaviour of φ.

This is an interesting result and should be highlighted a bit more.

We measure the integrated information of the mechanism φM under three different assumptions: a) that units outside of the mechanism operate normally, b) units outside the mechanism are independent noise sources, and c) units outside the mechanism are fixed as external constraints.

Again, this is quite interesting. Both the notion of exploring the difference between distributions and how to treat outside sources (as noise or so on) is extremely interesting. However, this is only explored in relationship to the Ising model. So the claims are somewhat incomplete, given the adaptations that are made. What about doing this in Markov chains and asking the same questions?

 

Second, some of the assumptions made by IIT 3.0 for computing integration over the cause repertoire of a state (the distribution of previous states) present problems for for capturing the integration at a critical point.

Grammar.

 

Plus, in the stationary case, the cause repertoire is identical to the effect repertoire, which allows to ignore the distinction.

Grammar.


GENERAL THOUGHTS 
The paper begins by arguing that IIT 3.0 doesn't scale in terms of its calculation and then introduces all sorts of assumptions in the Ising model to make it calculable. This doesn't actually solve the scaling problem so much as show that as long as enough assumptions are made, some version of IIT can be calculated. The paper begins by motivating itself by talking about scale but the takeaway is actually that assumptions in the calculation of IIT, specifically in terms of the chosen difference method and then also the choice of what distribution to use for elements outside the system, can be supported or disproven by watching how the measure operates in the Ising model. These are two mostly different problems, and yet the paper is motivated by the one that's not solved or really even argued for in the paper itself.

Overall, my impression of this paper is that it is a draft. There are numerous mistakes throughout involving grammatical errors, misplaced figures and figure texts. Two different times the figure texts of two different figures are identical. There is a figure included that is never cited. This is not a polished or final paper and should not have been submitted as such. There is a lot of repetition, and many sections could be a lot shorter. That said, some of the results are quite interesting and the general idea of the paper is based on an interesting argument – that PHI needs to be computed in a contextual manner dependent on the system. I do not believe it should be published in its current condition, but is due for a major revision that involves polishing and editing. In general the paper needs a stronger motivation, with the conclusions driving the original motivation.

Author Response

- We thank the reviewer for the thorough review of the manuscript. We have corrected the grammar errors pointed out here and others, and we have sent the manuscript for English proofreading for further revision (we will incorporate the corrections when they arrive).

- We have also corrected other aspects of the manuscript. We comment on the most relevant changes below.


Due to its computational cost, current versions of the theory (IIT 3.0) are difficult to apply to systems larger than a dozen of units, and in general it is not well known how integrated information scales in time and space.

This makes it sound as if IIT has never had any work on how integrated information changes across space or time – whereas Hoel et al. 2016 “Can the macro beat the micro? Integrated information across spatiotemporal scales” shows that integrated information can increase at higher scales. I think the author means how the calculation of integrated information changes as systems get larger in size.

- We rephrased this in different places to refer to systems getting larger in size.

Furthermore, IIT attempts to quantify the irreducibility of these complex patterns, revealing the boundaries in the organization (i.e. delimiting the parts of the system that are integrated into a functional unit) of complex dynamical systems

IIT doesn’t quantify the complexity of patterns, it quantifies the causal structure. These are very different, for instance, see Larissa et al. 2016 “The intrinsic cause-effect power of discrete dynamical systems.” Additionally, it’s unclear how IIT attempts to quantify the integrated information for complex dynamical systems, given that previously the author stated that IIT can only be run on small systems.

- We changed "patterns" to "causal structures" in a couple of places.

- With the new structure of the introduction (see below) we do not introduce the issues with the system size until later, thus we hope this does not sound confusing anymore.

In IIT 3.0 [1], the lastest version of the framework proposed by the lab of Giulio Tononi

Spelling mistake. Additionally, I’m not sure this is the best way to reference the paper.

- Changed to “the latest version of the framework of Integrated Information Theory”

can only be computed for binary networks composed of up to a dozen of units. A variety of other measures exist to compute integrated information [3] and some of them are computationally lighter, but all share these limits to some extent.

What about all the metrics and adaptations that are used for brain imaging? Are those for binary networks as well? In what sense are large systems still a problem for the existing heuristics? There is not enough background to motivate these issues here.

- We now explicitly mention these versions for continuous distributions. To our knowledge, these versions of Phi are applied assuming Gaussian distributions of the data. In this new version we mention that they exist and are a way to compute integration analytically (therefore avoiding computational costs), and explain that we do not consider Gaussian assumptions in this paper and focus on the computation of phi for general cases (lines 56-59).

Since most versions of Φ require computing distance measures between probability distributions of a system and finding the minimum information partition (MIP), they present important restrictions in terms of computational cost as a system scales in size.

Is the restriction really because of differences between probability distributions? Isn’t the MIP by far the limiting factor, given it is a Bell number?

- The cost of computing the MIP scales faster with size, but the computation of distances between probability distributions is still an important problem (e.g. for Ising models of size 20-30 it would already be difficult to calculate on a standard computer). If the reviewer feels that we should clarify this further, we are open to suggestions.
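- For a rough illustration of these two costs (a generic back-of-the-envelope sketch in Python, not code from the manuscript): the joint distribution over binary states has 2^N entries, the number of bipartitions of N units grows as 2^(N-1), and the number of all partitions is the Bell number B_N.

    # Growth of the two costs discussed above with the number of binary units N.
    from math import comb

    def n_states(N):
        # entries of the joint probability distribution over binary states
        return 2 ** N

    def n_bipartitions(N):
        # ways to split N units into two non-empty parts
        return 2 ** (N - 1) - 1

    def bell_number(N):
        # total number of partitions of N units (relevant if the MIP search
        # is not restricted to bipartitions)
        B = [1]
        for n in range(N):
            B.append(sum(comb(n, k) * B[k] for k in range(n + 1)))
        return B[N]

    for N in (10, 20, 30):
        print(N, n_states(N), n_bipartitions(N), bell_number(N))

Already at N = 30 the joint distribution has about 10^9 entries and the Bell number exceeds 10^23, so both factors quickly become prohibitive.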


Previous work [4] has tried to analyse how integrated information changes with spatial and temporal coarse graining in small networks

Did they try and fail? Grammar.

- We have rephrased this to remove the “try”

In general, measures of Φ can only be applied to small networks

What about all the ones for brain imaging? They are just ignored consistently throughout the motivating section of the text.

- We think that our comment above also applies to this.

In the framework of IIT 3.0, it is required that a system satisfies the Markov property (i.e., the state at time t only depends on the state at time t − τ)

In Hoel et al. 2016 “Can the macro beat the micro? Integrated information across spatiotemporal scales” there are clearly systems that are second-order

- I’m afraid I do not understand what the reviewer means here or what should be corrected. Could the reviewer clarify this?

we define a general model defining causal temporal interactions between variables

What would a causal non-temporal interaction be?

- We removed “temporal”

in general the MIP still is one that isolates one unit of one of the regions comprised by the mechanism.

Where is this shown “in general”

- We removed “in general” and tried to make this paragraph clearer.

the level of irreducibility of the causal distributions

What are causal distributions?

- Changed to “causal structures”

As IIT 3.0 operates with the transition probability matrix of a system, one could compute this matrix from time t to time t + τ and compute a new probability transition matrix for a bipartition by injecting noise in the connections affected by it at time t. This implies that the system behaves normally for the following steps.

It’s unclear what’s going on here. More time needs to be explicitly spent saying why this is different, perhaps with an actual example figure if this difference is a key point of the paper.

- We have rewritten parts of this section, trying to make the distinction between what we call “initial noise injection” and “continuous noise injection” for computing partitions clearer. We hope the new version is better explained, but we are open to suggestions for further changes.

The figure text from this figure is just lifted from the figure text of the previous figure. And there’s no figured 3 even cited in the text. And B in this “Figure 3” looks a lot the C in Figure 2, but cropped differently.

- This was a mistake, the legend has been corrected.

- Yes, the two figures are identical since, as explained in the text, the value of phi is equivalent in these two cases. We have referenced the two figures together in the text for clarity.

This behaviour is completely different to the one from in the effect repertoire. This should not happen for a homogeneous system in a stationary state as the one under study here.

Grammar. Also, how is the reader to know this isn’t due to your modifications in the calculation of integrated information to adapt to Ising models, rather than if one actually calculates using the real values from IIT 3.0, even though it is computationally intractable?

- We modified the text to point out that this is more of an intuition, given the homogeneity of the system, and placed more weight on the fact that phi_cause fails in this case to characterize the critical point as the point with the most integrated information.

- Still, we think that our example shows the effects of using a uniform prior even for tau=1, which should be equivalent to the results in IIT 3.0 (since our assumptions mostly involve the cases where tau>1). If the reviewer thinks that this should be discussed further, we are open to the possibility.

Also, again the measure of φcause fails to capture integration around the critical point.

What’s the argument behind this being necessary?

- Outside the critical point, in the ordered and disordered regions, the fluctuations of the energy and the correlations of an Ising model are very small, and they tend to zero as the system grows in size. Thus, the only point where integrated information can be significantly high is around the critical point. This however was not clear in the previous version of the text. We now mention this more clearly in the introduction section and at the beginning of the results section.
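- As a standard mean-field illustration of this point (textbook Curie-Weiss relations, not results from the manuscript): with zero external field, the magnetization obeys the self-consistency equation m = tanh(βJm), and the susceptibility χ = β(1 − m²) / (1 − βJ(1 − m²)) diverges only at the critical coupling βJ = 1, while away from it fluctuations and pairwise correlations vanish as the system grows.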



These results illustrate that different distance metrics can have important effects in the behaviour of φ.

This is an interesting result and should be highlighted a bit more.

- We have extended the discussion of these results in section 3.3 and highlighted their importance for debates in IIT.

We measure the integrated information of the mechanism φM under three different assumptions: a) that units outside of the mechanism operate normally, b) units outside the mechanism are independent noise sources, and c) units outside the mechanism are fixed as external constraints.

Again, this is quite interesting. Both the notion of exploring the difference between distributions and how to treat outside sources (as noise or so on) is extremely interesting. However, this is only explored in relationship to the Ising model. So the claims are somewhat incomplete, given the adaptations that are made. What about doing this in Markov chains and asking the same questions?

- The dynamical Ising model presented here would be an example of a Markov chain (see the sketch at the end of this response). The results of the paper could be reproduced in other Markov chains as long as they had the same homogeneous distribution of coupling parameters as the Ising model tested, but then the models would be very similar to the Ising models in the paper.

- I’m afraid that it is difficult to reproduce the results here in more general Markov chains with heterogeneous parameters (at least without any kind of approximation).

- These questions could be asked in general for small Markov chains, but I guess that would be a different paper. If the reviewer has a more specific suggestion we’d be happy to consider it.
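- A minimal sketch of such a chain (a generic kinetic Ising model with parallel Glauber updates in Python, assuming this standard form; it is not the code used in the paper), which makes the Markov property explicit: the state at t+1 depends only on the state at t through the local fields.

    import numpy as np

    def step(s, J, H, beta, rng):
        # s: spin configuration in {-1, +1}^N at time t
        h = H + J @ s  # local fields given the state at time t
        p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * h))  # P(s_i(t+1) = +1 | s(t))
        return np.where(rng.random(s.size) < p_up, 1, -1)

    rng = np.random.default_rng(0)
    N = 50
    J = np.ones((N, N)) / N  # homogeneous couplings, as in the Ising examples
    np.fill_diagonal(J, 0.0)
    H = np.zeros(N)
    s = rng.choice([-1, 1], size=N)
    for _ in range(100):
        s = step(s, J, H, beta=1.0, rng=rng)
    print(s.mean())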



GENERAL THOUGHTS 
The paper begins by arguing that IIT 3.0 doesn't scale in terms of its calculation and then introduces all sorts of assumptions in the Ising model to make it calculable. This doesn't actually solve the scaling problem so much as show that as long as enough assumptions are made, some version of IIT can be calculated. The paper begins by motivating itself by talking about scale but the takeaway is actually that assumptions in the calculation of IIT, specifically in terms of the chosen difference method and then also the choice of what distribution to use for elements outside the system, can be supported or disproven by watching how the measure operates in the Ising model. These are two mostly different problems, and yet the paper is motivated by the one that's not solved or really even argued for in the paper itself.

- We believe this point is very important, and we have tried to rearrange the whole introduction section along the lines suggested by the reviewer. Now, we first introduce the problem of evaluating assumptions about how IIT can be defined, and only then introduce the issue of size as a particular problem that we find when we think about testing IIT in models of critical phase transitions. We hope that this new structure is better for motivating the problem.

Overall, my impression of this paper is that it is a draft. There are numerous mistakes throughout involving grammatical errors, misplaced figures and figure texts. Two different times the figure texts of two different figures are identical. There is a figure included that is never cited. This is not a polished or final paper and should not have been submitted as such. There is a lot of repetition, and many sections could be a lot shorter. That said, some of the results are quite interesting and the general idea of the paper is based on an interesting argument – that PHI needs to be computed in a contextual manner dependent on the system. I do not believe it should be published in its current condition, but is due for a major revision that involves polishing and editing. In general the paper needs a stronger motivation, with the conclusions driving the original motivation.

- We thank the reviewer for the many corrections and suggestions. Given the short time given for returning the review, we performed important changes in some sections and we hope we have brought the paper closer to a final form. Still, we will further proofread the manuscript to polish the current version.



Reviewer 2 Report

In the present paper, the authors analyze the limitations of different versions of Phi, and they propose ways to overcome such limitations. However, talking about different versions of Phi is misleading since they blend together Phi 3.0, which can be computed over binary networks composed of up to a dozen units, and approximations of Phi computed over continuous and real-valued time series. For this reason, I would suggest improving the first part of the introduction by specifying which version of Phi they are questioning on a case-by-case basis.

 

The mean-field approximation presented in (Aguilera et al., Neural Networks, 2019) shows integration divergence when J approaches the critical point from the right. On the contrary, the level of integration is zero on the left of the critical point. Is it correct looking for a phi measure able to capture integration as the system approaches the critical transition from any side? Could the author expand this point and elaborate more about the relationship between the mean-field and the finite-size networks in designing this work?

 

I would also ask to clarify the concept explained at lines 193-194 and 322-326.

 

 

Minor: 

 

-      Legend in Figure 2 is imprecise: B and C do not describe the corresponding graphs;

-      Legend in Figure 4 is incomplete (it actually misses the description of Fig.4 B)

-      Line 118: “…mechanism integration of a mechanism…”, could the author rephrase this sentence?

 

 

Typos:

 

There are many typos in the text. For instance,

-      Line 14: the problem consciousness

-      Line 39: paractical difficulties

-      Line 47: dynamics its complex

-      Line 94: {s_i(t+1)}\in P, instead of \in M

-      Line 124: we divide system is divided into

-      Line 136: MIP in most cases one, verb missing

-      Line 156: There is also some aspects

-      Etc.

-      Line 247: It is well that

Author Response

In the present paper, the authors analyze the limitations of different versions of Phi, and they propose ways to overcome such limitations. However, talking about different versions of Phi is misleading since they blend together Phi 3.0, which can be computed over binary networks composed of up to a dozen units, and approximations of Phi computed over continuous and real-valued time series. For this reason, I would suggest improving the first part of the introduction by specifying which version of Phi they are questioning on a case-by-case basis.

We have rearranged the introduction to make the claims of the paper clearer. We have also tried to differentiate between IIT and its versions of Phi (as a theory of consciousness) and other versions of Phi (as general complexity and integration measures). Of the latter, we explicitly mention Gaussian approximations that are used by some authors to avoid some of the computational costs by computing the values of phi analytically, and say that our work does not consider these approximations. We hope this new version is less confusing, but we could make the distinction more explicit if the reviewer feels that is necessary.

The mean-field approximation presented in (Aguilera et al., Neural Networks, 2019) shows integration divergence when J approaches the critical point from the right. On the contrary, the level of integration is zero on the left of the critical point. Is it correct looking for a phi measure able to capture integration as the system approaches the critical transition from any side? Could the author expand this point and elaborate more about the relationship between the mean-field and the finite-size networks in designing this work?

Regarding integration being zero when approaching the critical point from the disordered side, we have introduced a clarification in Section 3.1. There we explain that the fact that integration is zero on the disordered side for the infinite system and not in the finite system is not a problem of the measure but a characteristic of the system. In the infinite-size system, units have independent dynamics until J reaches the threshold of the critical point. For finite sizes, units are not completely independent and our measure correctly captures non-zero integration.

We also extended, in Section 3.3, the discussion about the relation between the results for finite sizes and the infinite mean-field case. In particular, it is interesting that the Wasserstein distance has the same behaviour in the finite and infinite cases, while the Kullback-Leibler divergence does not.

I would also ask to clarify the concept explained at lines 193-194 and 322-326.

In lines 193-194 we have tried to clarify the difference between what we called initial and continuous noise injections. We hope the new version is clearer with respect to what we did in each case.

We also extended lines 322-326 to better describe what we did.

 

Minor:

-      Legend in Figure 2 is imprecise: B and C do not describe the corresponding graphs;

-      Legend in Figure 4 is incomplete (it actually misses the description of Fig.4 B)

-      Line 118: “…mechanism integration of a mechanism…”, could the author rephrase this sentence?

These errors have been corrected.



Typos:

 

There are many typos in the text. For instance,

-      Line 14: the problem consciousness

-      Line 39: paractical difficulties

-      Line 47: dynamics its complex

-      Line 94: {s_i(t+1)}\in P, instead of \in M

-      Line 124: we divide system is divided into

-      Line 136: MIP in most cases one, verb missing

-      Line 156: There is also some aspects

-      Etc.

-      Line 247: It is well that

We have corrected these and other typos.



Reviewer 3 Report

In this paper, the author critically reviews the assumptions IIT3.0 uses to calculate integrated information using a homogeneous kinetic Ising model. I believe that doing IIT’s analyses using the Ising model provides valuable information about how integrated information behaves. This work is relevant because it shows some useful properties that can massively reduce computational cost. On the other hand, there are some parts I had trouble with understanding. So, I recommend the author to address my concerns below:

(1) In figure 2 and figure 3, the author used ‘continuous’ and ‘initial’ injection of noise. Although these different ways of noise injections cause qualitatively distinct behaviors of integrated information, I could not find the description of the difference between them. Please clarify this.

(2) The author examines how phi value changes with the time-lag used to calculate phi, which is important to find the best time-lag that maximizes phi value according to the exclusion axiom of IIT. However, I could not understand his following point: ‘As we increase tau, this peak moves towards the critical point. Also, as tau increases the size of the peak decreases, tending to zero. This is a problem since phi(tau) is not able to capture maximum integration at the critical point.’ I understand that it’s natural to expect that phi would be maxed at the critical temperature. However, I’m not sure that it’s generally true that infinite time-lag gives the max phi. Rather, I expect to see a decrease of phi with time-lag, because the causality between mechanism and purview should be diluted with it. Please explain the necessity to take the infinite for the time-lag.

(3) The author argued that the behavior of cause phi should be similar to effect phi because the different behavior ‘should not happen for a homogeneous system in a stationary state as the one under study here’. However, I could not find any description of the reason the author thought so. Please clarify it.

(4) The author found weird behavior of phi when he/she injected independent noise to calculate phi (Figure 5B) in order to marginalize out units outside the mechanism. However, the standard way of IIT’s marginalization would be marginalizing the probability distribution with the uniform joint distribution. Please explain why the author did not do this and show what happens if it is done.

(5) I cannot see how the author constructs the system shown in figure 6A. What is the local connection in A and E? Also, although the author described that ‘one region A with self-interaction and another region E which is just coupled to the first without recurrent connections,’ it looks like there are bidirectional connections between A and E. Also, the legend of figure 6 does not make sense at all. It’s important to correctly understand the results shown in figure 6, so please fully revise the relevant parts.

Minor points

(1) The legend is hard to understand. It seems the difference between figures 2A and 2B is the difference in how noise is injected, but the y-axis label says it is cumulative phi instead of actual phi.

(2) There is a typo in line 195: ‘integrated information integration’

(3) The legend of figure 3 looks identical to that of figure 2. Please revise it.

(4) A space between ‘vice’ and ‘versa’ is missed in line 211.

(5) ‘Figure 2.A’ in line 213 might be ‘Figure 3A’.

(6) ‘Figure 2.B’ in line 222 might be ‘Figure 3B’.

(7) The legend of figure 4 does not make sense.


Author Response

In this paper, the author critically reviews the assumptions IIT3.0 uses to calculate integrated information using a homogeneous kinetic Ising model. I believe that doing IIT’s analyses using the Ising model provides valuable information about how integrated information behaves. This work is relevant because it shows some useful properties that can massively reduce computational cost. On the other hand, there are some parts I had trouble with understanding. So, I recommend the author to address my concerns below:

We thank the reviewer for these comments. We have tried to address all the points raised, which we comment on below each paragraph. The introduction has also been rearranged in response to comments from the other reviewers.

(1) In figure 2 and figure 3, the author used ‘continuous’ and ‘initial’ injection of noise. Although these different ways of noise injections cause qualitatively distinct behaviors of integrated information, I could not find the description of the difference between them. Please clarify this.

This comment is right; the terms were described very superficially. We have tried to define both terms and clarify the difference between them in Section 3.1.

(2) The author examines how phi value changes with the time-lag used to calculate phi, which is important to find the best time-lag that maximizes phi value according to the exclusion axiom of IIT. However, I could not understand his following point: ‘As we increase tau, this peak moves towards the critical point. Also, as tau increases the size of the peak decreases, tending to zero. This is a problem since phi(tau) is not able to capture maximum integration at the critical point.’ I understand that it’s natural to expect that phi would be maxed at the critical temperature. However, I’m not sure that it’s generally true that infinite time-lag gives the max phi. Rather, I expect to see a decrease of phi with time-lag, because the causality between mechanism and purview should be diluted with it. Please explain the necessity to take the infinite for the time-lag.

This comment reveals a problem that is interesting, but not easy to solve. The main difficulty is that systems at critical points tend to display infinite correlation lengths and critical slowing down, thus their dynamics can only be completely captured with infinite time-lags.
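As a standard relation from critical dynamics (a textbook statement, not a result from the manuscript), the correlation time of the system diverges at the critical point as ξ^z ∼ |J − J_c|^(−zν), where ξ is the correlation length and z and ν are critical exponents, so any fixed finite time-lag misses part of the dynamics of a large critical system.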

With the ‘initial noise injection’ version of partitions, it is hard to capture the long timescales of a critical system, and phi is maximized at points on the ordered side of the phase transition. With ‘continuous noise injection’, these long timescales are captured. The price to pay is that the measure now captures the influence of the different timescales cumulatively, and thus there is no longer a dilution of causal influences with larger lags.

We have included this discussion in section 3.1 and discuss why we propose that the second approach is more adequate for the examples covered in the paper.
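A minimal sketch of the distinction in Python (with step_full and step_partitioned as hypothetical helper names, not the paper’s implementation): step_full applies one step of the intact dynamics, while step_partitioned applies one step in which the connections cut by the bipartition are replaced by independent noise.

    def evolve_initial_injection(s, tau, step_full, step_partitioned):
        # noise is injected on the cut connections only at the first step,
        # after which the system evolves normally
        s = step_partitioned(s)
        for _ in range(tau - 1):
            s = step_full(s)
        return s

    def evolve_continuous_injection(s, tau, step_partitioned):
        # noise is injected on the cut connections at every one of the tau steps,
        # removing influences across the cut at all intermediate times
        for _ in range(tau):
            s = step_partitioned(s)
        return s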

(3) The author argued that the behavior of cause phi should be similar to effect phi because the different behavior ‘should not happen for a homogeneous system in a stationary state as the one under study here’. However, I could not find any description of the reason the author thought so. Please clarify it.

We have clarified in the text that this claim is just an intuition of what should happen, and we have placed more importance on the fact that in this case the integration does not match the description of the phase space, displaying very little integrated information at the critical point and large integrated information deep into the ordered phase.

(4) The author found weird behavior of phi when he/she injected independent noise to calculate phi (Figure 5B) in order to marginalize out units outside the mechanism. However, the standard way of IIT’s marginalization would be marginalizing the probability distribution with the uniform joint distribution. Please explain why the author did not do this and show what happens if it is done.

I am not sure if I understand this problem correctly. In IIT 3.0, uniform independent noise is injected into units to create a uniform distribution, and this is what we did. Other versions of IIT may involve marginalization over probability distributions in the same way as when computing mutual information, but that is the difference between traditional information-theoretic approaches and the ‘interventional’ or ‘perturbational’ approach used by IIT 3.0 (along with Pearl, and others like Ay and Polani). I’m not sure if the question of the reviewer is related to this, but we can clarify it in the text if necessary.

(5) I cannot see how the author constructs the system shown in figure 6A. What is the local connection in A and E? Also, although the author described that ‘one region A with self-interaction and another region E which is just coupled to the first without recurrent connections,’ it looks like there are bidirectional connections between A and E. Also, the legend of figure 6 does not make sense at all. It’s important to correctly understand results shown figure 6 so please fully revise relevant parts.

There was a mistake in the legend of Figure 6; it is corrected now. We have also tried to clarify the description of the system connectivity in the text (Section 3.5).

Minor points

(1) The legend is hard to understand. It seems the difference between figures 2A and 2B is the difference in how to inject noise, but the label of y says it is cumulative phi instead of actual phi.

(2) There is a typo in line 195: ‘integrated information integration’

(3) The legend of figure 3 looks identical to that of figure 2. Please revise it.

(4) A space between ‘vice’ and ‘versa’ is missed in line 211.

(5) ‘Figure 2.A’ in line 213 might be ‘Figure 3A’.

(6) ‘Figure 2.B’ in line 222 might be ‘Figure 3B’.

(7) The legend of figure 4 does not make sense.

Thanks for the corrections. We have incorporated all of them in the text. We have also corrected the legends of figures 2 and 3.


Round 2

Reviewer 1 Report

Small changes

(IIT 3.0) are difficult to apply to systems larger than a dozen of  units

Grammar.

Due to combinatorial explosion, computing Φ in practice is only possible in small discrete  systems, preventing its application to the very large or even infinite system where critical dynamics can be appreciated

English

Moreover, since we will assume a homogeneous architecture, mechanisms in the system have a similar behaviour in most cases, and the distinction between mechanism integration φ and system-level integration Φ is not quite revealing (as we will find diverging values of φ at some points, this will correspond to equally diverging values of Φ). Thus, for simplicity, we will compute only the integrated information of a mechanism, φ, comprising the whole system of interest

We need something more than the author’s word on this, particularly since “big PHI” may explain some of the discrepancies that the paper points out. Why can “big PHI” or system-level integration be completely ignored in this account? We never see any evidence of this, even just something showing how it grows or obeys the same rules as the “small PHI” directly. Is it really true that the values are identical, or do they follow very simply from the “small PHI” calculation in this system? If true, this seems a point of interest given the new direction of the paper of discussing how the assumptions behind IIT depend on the system. I would suggest incorporating this into the results in some way.

 

Figure 6E is not referenced in the figure text.

Second, some of the assumptions made by IIT 3.0 for computing integration over the cause repertoire of a state (the distribution of previous states) present problems for capturing the integration  of a system at a critical point. The assumption of a uniform prior distribution simplifies the calculations of cause repertoires but distorts the behaviour of the system.

In the section on the cause information, it is not taken into account that within IIT 3.0 the minimum bit value is taken between the cause and effect information. This should be compared in the paper to draw this conclusion. In general the conclusions of the piece are a bit stronger than the evidence provided, since IIT is generally only calculated in parts. I’m not sure why these calculations weren’t done, other than ease of calculation.

In contrast, IIT 3.0 quantifies the level of integration dismissing the  interaction between a system and its environment.

Grammar

Specifically, IIT 3.0 considers the units outside of a system as static variables and the units within a system but outside of a mechanisms as independent sources of noise.

Grammar

This allows to draw important consequences for the application of integrated information measures in simulated and experimental setups.

 

Grammar

 

Overall comments and further recommendations

Overall, this is an interesting paper with a lot to say. It is much easier to understand now that problems with presentation and image/text duplication are resolved and the structure is much clearer, as well as its reasoning and its final conclusions. I do think the author’s work has a lot of merit and that there are some great contributions to the literature here, particularly for people looking to design future versions of IIT and also apply IIT to different systems. However, the remaining problem that I see is that the paper takes leaps in a few places, but does not back up those leaps with its simulations. This undermines the strength of the conclusions by just leaving out various simulations that could be run. For example, this paper never once calculates the “big PHI” metric, which is the main metric of IIT 3.0, yet still takes all of its conclusions as applying directly to IIT 3.0. A similar problem is that IIT 3.0 takes the minimum between the cause/effect information. They are analyzed only separately here yet the conclusions are assumed to still apply directly to IIT 3.0. Given these two missing components I don’t think the author’s conclusions actually can be proven to follow from what is presented. I worry that the paper’s results will be dismissed simply because it does not account for the simple criticism of “the big PHI was never calculated and the cause/effect information were never treated in relation to each other. Therefore, it’s unknown how these results actually apply to the IIT 3.0 calculation, merely its parts.”


I suggest that the paper’s strong conclusions be backed up by actually examining these two values: a) the big PHI and b) the min(cause/effect) and their behavior in the model. It may not be necessary to calculate the “big PHI” for all the examples if, as the author claims, big PHI acts identically to small PHI, and this is actually shown in the paper rather than merely being assumed. Indeed, if big PHI is essentially unnecessary in the calculation of IIT in an Ising model, this also supports the author’s point that a sensible calculation of integrated information changes depending on the system under question. Otherwise the author needs to be constantly mentioning that their conclusions apply only to the “small PHI” calculation in IIT 3.0, and even that is not done 100% as in IIT 3.0 since the behavior of the minimum is never shown in this paper. Without these two comparisons the conclusions of the paper in their current form are presented far too strongly. However, once those two things are shown in the paper I’d strongly recommend it for publication as a solid contribution to the existing literature.


Author Response

We introduced the grammar corrections pointed out by the reviewer.


Figure 6E is not referenced in the figure text.
The figure (now Figure 7E) is referenced in the text together with Figure 7D (line 393).

We also thank the reviewer for their reflections about big PHI and the conclusions of the paper. Motivated by these comments, we decided to apply the same methods we used to simplify computations in order to compute big PHI in the system with just one region.
We added a new section (Section 3.5) discussing the results of measuring big PHI in this simple system, showing that divergences at the critical point are conserved (and amplified) and discussing the necessity of adding this second level of integration on top of the values of small phi.

About the min(cause/effect) metric, we think that some parts of the text addressing this might not have been clear, and we have rewritten them. First, when discussing the value of phi_cause in Figure 3.A, we now describe more clearly that its values are much lower than those of phi_effect (Figure 2.C), and thus the values of phi_cause dominate phi.
As well, when we assume stationarity instead of an independent prior distribution, phi_cause = phi_effect = phi, so it is not necessary to compute the minimum (and this is what we assume during the rest of the paper).
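For reference, IIT 3.0 defines the integrated information of a mechanism in a state as phi = min(phi_cause, phi_effect); under the stationarity assumption just described, where phi_cause = phi_effect, the minimum therefore coincides with either term.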
We also suggest that min(phi_cause, phi_effect) should be computed in cases where, instead of a stationary distribution, some other specific transient distribution is assumed, although in this case the computational complexity of the measure would increase.
We hope that these clarifications address the concerns of the reviewer and that it will be clear to a reader that the conclusions are backed by the results (at least in the stationary case).
If the reviewer thinks that it is necessary to compute some instance of min(phi_cause, phi_effect), it could be done, though it would take some time to implement and perform new simulations, and it might also require adding a subsection explaining the methods and showing those results.

Round 3

Reviewer 1 Report

The comparisons of BIG PHI make the paper much more complete. Indeed, BIG PHI does seem to have good respect for the critical point under certain assumptions (which the author rightly points out are needed). The discussion is still quite negative, but I view this as actually somewhat supportive of IIT. I think it was always a straw-man to assume that a measure designed to work in discrete finite systems of logic gates would always work exactly the same across all system types (like continuous systems, physically-realistic systems, etc). However, the author has shown that with certain (sensible) assumptions IIT does indeed capture the critical point; there just needs to be thought put into what those assumptions are on a case-by-case basis. So the author could land on a slightly more positive note than currently.

Please do a last grammatical check.

Author Response

Many thanks for your comments.

We reviewed the text and corrected a couple of typos. We will perform a thorough revision before production.


As for the discussion, we have added a new paragraph inspired by your comment. We hope that it helps the results of the paper to be read as a constructive contribution to IIT and future versions of the theory.


"Since IIT~3.0 has been mostly tested in small logic gate circuits, exploring the behaviour of integrated information in large Ising models has allowed us to investigate questions that were so far unexplored and inspect some of the assumptions of the theory from a new perspective. We consider that the value of the study is twofold. On the one hand, we propose a family of models with known statistical properties, where calculations of integrated information are simplified. These and similar models could work as a benchmark for testing properties of integrated information in large systems. On the other hand, the reformulations of different aspects of the theory proposed during the paper could be considered by future versions of IIT in order to capture some of the phenomena that we could expect in large complex systems."
