Peer-Review Record

Causal Emergence: When Distortions in a Map Obscure the Territory

Philosophies 2022, 7(2), 30; https://doi.org/10.3390/philosophies7020030
by Frederick Eberhardt * and Lin Lin Lee
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 1 October 2021 / Revised: 15 February 2022 / Accepted: 17 February 2022 / Published: 9 March 2022
(This article belongs to the Special Issue The Problem of Induction throughout the Philosophy of Science)

Round 1

Reviewer 1 Report

The authors present a critical review of the theory of causal emergence developed by Hoel and collaborators. In doing so, the authors display great skill in navigating the conceptual and technical challenges posed by this exciting but convoluted topic. First and foremost, I’d like to congratulate the authors for putting together an extremely interesting manuscript, which I believe will be a useful contribution to the literature. Below I present a few comments and suggestions, which I hope the authors may find helpful to further refine their ideas.

In Section 2, the authors describe Hoel’s framework as if it were designed primarily to study time series; however, I believe this may not be the best way to present it. It is true that Hoel is relatively unclear about this very important point, but my understanding is that in Hoel’s view the primary target of the analysis is not an ongoing process, but a given input-output mechanism characterised by the TPM. The question is then what the mechanism looks like when operated at a microscale (distinguishing each entry) or at a coarse-grained level. This scenario is very different from studying an ongoing process, which might be Markovian but with its own marginal (stationary or not) probability distribution, which may very well not be uniform. Therefore, I’d suggest introducing the theory in Section 2 without an explicit reference to a time series setup. Of course the authors can mention this type of scenario, but I believe framing it as the main case could be detrimental.

I’d also suggest that the authors rethink the notation used in Section 2. First, I find it unnecessary to use S_M and S_m (instead of simply using different letters), as this distinction cannot be made with the lowercase instantiations. More importantly, there is an abuse of notation that I find confusing: eq. (1) states that H^{max} is a uniform distribution, but then equations (2) and (6) use it as if it were a random variable. Given that there are not so many variables involved, I’d suggest that the authors find a clearer notation (even if this implies moving away from the original notation, which I don’t think is particularly clear anyway). Furthermore, I’d suggest replacing eq. (7) with the one in the footnote, followed by the clarification note, as I believe eq. (7) is not so simple anyway and adds more confusion than anything else.

About eq. (8), I may be wrong but I fear there may be an omission, as (unless I’m missing something) the maximisation affects not only S but also R. In effect, my understanding is that every time a different coarse-graining is considered, this imposes a different input distribution (as it is a MaxEnt distribution over a different partition) but also a different coarse-graining on the receiver side. Furthermore, I’d suggest that the authors include a clarification of the set over which p(S) is maximised (which is not the set of all possible input distributions, but a very peculiar subset), as this is a key difference from the maximisation that takes place in the channel capacity.

Regarding the statement that eq. (8) tends to the channel capacity as the system grows, I wonder whether this holds in general. Please note that Ref. [1] states “causal capacity *can* approximate channel capacity”, but doesn’t seem to say that this is necessarily so in all cases. Please double-check whether this is the case, or whether some clarification on potential limitations of this parallel is needed.

About Section 3, I have to confess that while I’m convinced that using MaxEnt inputs is arbitrary and a very important issue to be flagged, I’m not so convinced by some of the reasons given. For example, considering the example of HDL and LDL, I’d think that if the two effects counteract each other, then conflating them should generate less causal power, and hence that coarse-graining should not be selected as a causally emergent one. In my view, this approach is a bit mercenary in the sense of just focusing on the aggregated causal power (as measured by the mutual information), without paying much attention to the underlying semantics. And I don’t see this as intrinsically problematic. That being said, my suggestion would be to change the order of the section: first present the problem (that using MaxEnt distributions is arbitrary, and that it carries a great deal of the weight of the overall framework; see the comment below), and only then present the reasoning for why this is problematic. In this way, the reader will get the problem before the interpretation, and hence it is less of a problem if she or he doesn’t agree with the latter.

On this issue, it may be interesting to remark on the critical role that the MaxEnt interventions play in Hoel’s framework. In fact, it has been noted that the only reason why the macro level can have more mutual information than the micro level is that the framework allows interventions at the macro and micro levels to be “incompatible”, in the sense that a MaxEnt macro intervention is usually not a coarse-graining of the MaxEnt micro intervention. The critical role of this “incompatibility” has been remarked on in (Aaronson2017; please see also the interesting rebuttal letter by Hoel) and in (Rosas2020). In my opinion there is nothing wrong with this choice, but it restricts the domain of applicability of this framework for measuring emergence to scenarios where one cares about the intrinsic nature of a mechanism but not about the actual activity displayed by a system. Other approaches to emergence, such as those formulated in (Seth2010), (Barnett2021), and (Rosas2020), would be more appropriate for addressing the latter cases.
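
The incompatibility described in this comment can be illustrated with a minimal sketch. The 4-state mechanism, the partition into macro states A = {0, 1, 2} and B = {3}, and all function and variable names are hypothetical choices made for illustration only; they are not taken from the manuscript. The sketch computes the mutual information between input and output under (i) a uniform micro intervention, (ii) a uniform macro intervention, and (iii) the macro intervention obtained by coarse-graining the uniform micro intervention.

```python
from math import log2


def mutual_information(p_in, tpm):
    """I(S;R) in bits for input distribution p_in and a row-stochastic TPM."""
    n_in, n_out = len(tpm), len(tpm[0])
    # Output (effect) distribution induced by the chosen input distribution.
    p_out = [sum(p_in[i] * tpm[i][j] for i in range(n_in)) for j in range(n_out)]
    mi = 0.0
    for i in range(n_in):
        for j in range(n_out):
            joint = p_in[i] * tpm[i][j]
            if joint > 0:
                mi += joint * log2(tpm[i][j] / p_out[j])
    return mi


def uniform(n):
    """MaxEnt (uniform) distribution over n states."""
    return [1.0 / n] * n


# Hypothetical micro mechanism: states 0-2 scatter uniformly among
# themselves, state 3 maps to itself.
third = 1.0 / 3.0
micro_tpm = [
    [third, third, third, 0.0],
    [third, third, third, 0.0],
    [third, third, third, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# Macro mechanism for the partition A = {0, 1, 2}, B = {3}: A always goes
# to A and B always goes to B (hard-coded here for brevity).
macro_tpm = [
    [1.0, 0.0],
    [0.0, 1.0],
]

# (i) uniform intervention over the four micro states
ei_micro = mutual_information(uniform(4), micro_tpm)
# (ii) uniform intervention over the two macro states
ei_macro = mutual_information(uniform(2), macro_tpm)
# (iii) coarse-graining of the uniform micro intervention: P(A) = 3/4, P(B) = 1/4
mi_macro_compatible = mutual_information([0.75, 0.25], macro_tpm)

print(ei_micro, ei_macro, mi_macro_compatible)
```

In this toy case the uniform macro intervention yields 1 bit, more than the roughly 0.81 bits at the micro level, whereas the "compatible" macro intervention (3/4, 1/4) yields the same value as the micro level. The apparent gain therefore comes entirely from the change of input distribution, which is the point the comment makes.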

About Section 4, I was confused in the beginning by the use of the term “aggregation”. I think the authors mean “sub-sampling”, i.e. considering only one of every x samples, which effectively reduces the sampling frequency of a signal, whereas with aggregation I thought the authors were implying some sort of temporal coarse-graining. Maybe a clarification could help readers avoid a similar confusion. Additionally, I’d appreciate a bit more elaboration on why commutativity is desirable, or conversely why non-commutativity is problematic. Finally, perhaps for future work, it would be very interesting to check whether other frameworks for quantifying causal emergence, such as the ones presented in (Seth2010), (Rosas2020), and (Barnett2021), satisfy this desideratum.
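
A small sketch may help make the commutativity question concrete. It assumes that sub-sampling a time-homogeneous Markov process at every second step corresponds to squaring its TPM, and that coarse-graining averages the rows within each macro state with uniform weights and sums the columns of each macro state. Both the coarse-graining rule and the 3-state example are assumptions made for illustration, not the manuscript's own construction.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def coarse_grain(tpm, blocks):
    """Macro TPM: uniform average over the rows in each source block,
    summed over the columns in each target block."""
    return [[sum(tpm[i][j] for i in src for j in dst) / len(src)
             for dst in blocks]
            for src in blocks]


# Hypothetical deterministic micro dynamics: 0 -> 2, 1 -> 0, 2 -> 1.
micro = [
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
blocks = [[0, 1], [2]]  # macro states A = {0, 1}, B = {2}

# Order 1: sub-sample first (two micro steps), then coarse-grain.
sub_then_coarse = coarse_grain(matmul(micro, micro), blocks)

# Order 2: coarse-grain first, then take two macro steps.
macro = coarse_grain(micro, blocks)
coarse_then_sub = matmul(macro, macro)

print(sub_then_coarse)
print(coarse_then_sub)
```

Under these assumptions the two orders give different macro transition matrices, so any quantity computed from the macro TPM can depend on the order in which the two operations are applied.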

I am a bit confused about the article’s structure in Sections 5 and 6. On the one hand, Section 5 presents further comments about the issues discussed before, so it is not clear why these comments need to be here and couldn’t be discussed earlier. On the other hand, Section 6 feels more like a discussion section than a conclusion. If I may suggest some changes, I’d suggest moving the dedicated comments of Section 5 to the corresponding previous sections, then merging the medium-level observations into a new Section 5 named “Discussion”, and then concluding with a shorter Section 6 that could summarise more directly the results of this investigation. Please feel free to disagree with these suggestions.

About Section 6, just two small comments. First, the comment in the first paragraph of Section 6 about the interaction of the system with measurement is confusing. Please reconsider this passage (lines 359-361). Second, in lines 374-375, it is mentioned that something may not be “appropriate”. This description sounds a bit vague or arbitrary; please clarify if possible.

 

References 

Aaronson, S. (2017). Higher-level causation exists (but I wish it didn’t). https://www.scottaaronson.com/blog/?p=3294

Rosas, F. E., Mediano, P. A., Jensen, H. J., Seth, A. K., Barrett, A. B., Carhart-Harris, R. L., & Bor, D. (2020). Reconciling emergences: An information-theoretic approach to identify causal emergence in multivariate data. PLOS Computational Biology, 16(12), e1008289.

Seth, A. K. (2010). Measuring autonomy and emergence via Granger causality. Artificial Life, 16(2), 179-196.

Barnett, L., & Seth, A. K. (2021). Dynamical independence: discovering emergent macroscopic processes in complex dynamical systems. arXiv preprint arXiv:2106.06511.

Author Response

Dear Reviewer, please see the attached file.

Author Response File: Author Response.pdf

Reviewer 2 Report

See the attached PDF.

Comments for author File: Comments.pdf

Author Response

Dear Reviewer, please see the attached file

Author Response File: Author Response.pdf

Reviewer 3 Report

Summary: In the manuscript, the author(s) (henceforth: AU) raise criticisms against Hoel’s attempt to provide an analysis of causal emergence using the tools of causal modeling and information theory. Many of AU’s criticisms rely on two forms of averaging that play a role in Hoel’s account:

1) Effective information is calculated by reference to a maximum entropy distribution (i.e. a uniform distribution over the different possible interventions). It averages over the Kullback-Leibler divergence of each possible intervention from the uniform distribution.

2) Interventions on macrovariables are defined as the average of interventions on each of the microvariables that jointly constitute the macrovariable.

AU’s first criticism is that Hoel’s account allows for “ambiguous” interventions, in which a causal macrovariable consists of microvariables whose effects are very different. Hoel’s stipulation that interventions on macrovariables are performed in the way specified by 2) eliminates the ambiguity, but it is totally arbitrary. AU’s second criticism is that the operators for abstraction and marginalization (i.e. coarse-graining and iterating the process over multiple time steps) do not commute. AU traces this problem to the specification of a reference distribution, and thus to the first form of averaging. Finally, AU raises some additional concerns related to the divergence between the normative concept of information theory and the empirical aims of an account of emergence, as well as concerns about the uniqueness of the coarse-graining.

 

In general, I find the discussion in this paper to be extremely clear and effective. The criticisms are compelling. Additionally, the analysis provides many insights into the relationship between causation and information theory that potentially extend beyond the aim of criticizing Hoel’s account. In my view the paper could be published as is. I’ll nevertheless add a few minor comments in case they are of use to AU or the editors.

  1. Ambiguous manipulations. The discussion of how Hoel’s account will involve ambiguous manipulations was my favorite in the paper. It’s a good example of the importance of paying careful attention to the details of causal modeling. Because some readers might be unfamiliar with these details, it might be useful to further clarify precisely why this ambiguity is so problematic. Crucially, lots of forms of averaging are perfectly ok from the perspective of causal models. There can be all types of effect heterogeneity resulting from different experimental units having different effect modifiers. In contrast, combining meaningfully different versions of the causal variable leads to a problem because of the assumption that the influence of a cause on its effect needs to be independent of the intervention on the cause itself. When there is ambiguous manipulation, this assumption appears to be violated, since learning about the particular nature of the intervention can provide information about which version of the cause was manipulated, and thus about the downstream effects of the intervention.

I’m not totally sure how to get into this without providing too much information. But some comment that the mixing of heterogeneous causes is especially problematic in causal modeling, and that aggregation is not always a problem, could preempt some confusion.
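
The contrast drawn here can be sketched with a toy example. Everything in it is hypothetical and chosen only for illustration: a macro cause whose value is the sum of two micro "versions", `version_a` and `version_b`, with opposite linear effects on the outcome. Setting the macro variable to a single value is then compatible with many micro settings whose downstream effects differ, and averaging over them hides that spread.

```python
# Hypothetical per-unit effects of two versions of the same macro cause.
EFFECT_PER_UNIT = {"version_a": 1.0, "version_b": -1.0}


def downstream_effect(a_units, b_units):
    """Outcome produced by a specific micro-level setting of the cause."""
    return (EFFECT_PER_UNIT["version_a"] * a_units
            + EFFECT_PER_UNIT["version_b"] * b_units)


def realizations(macro_value):
    """All integer micro settings (a, b) with a + b == macro_value."""
    return [(a, macro_value - a) for a in range(macro_value + 1)]


# "Set the macro cause to 4" does not say which micro setting is used.
effects = {setting: downstream_effect(*setting) for setting in realizations(4)}

# Averaging uniformly over the micro settings removes the ambiguity by
# stipulation, but the single number conceals the heterogeneity.
averaged_effect = sum(effects.values()) / len(effects)

print(effects)
print(averaged_effect)
```

In this sketch the same macro intervention produces outcomes ranging from -4 to 4 depending on the micro setting, while the averaged effect is 0. Knowing how the intervention was carried out therefore carries information about the outcome, which is the violated independence assumption mentioned in the comment.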

 

2. I was surprised by the claim on page 12 that the requirement against ambiguous manipulations is closely related to Yablo’s proportionality requirement. Failure of proportionality might suggest that the cause variable is specified in too fine-grained a manner, but does not seem to entail ambiguity. In any event, the relationship between the two concepts remains murky to me. So I’d recommend either clarifying this further or eliminating it.

 

Author Response

Dear Reviewer, please see the attached file

Author Response File: Author Response.pdf
