What Caused What? A Quantitative Account of Actual Causation Using Dynamical Causal Networks

Actual causation is concerned with the question: “What caused what?” Consider a transition between two states within a system of interacting elements, such as an artificial neural network or a biological brain circuit. Which combination of synapses caused the neuron to fire? Which image features caused the classifier to misinterpret the picture? Even detailed knowledge of the system’s causal network, its elements, their states, connectivity, and dynamics does not automatically provide a straightforward answer to the “what caused what?” question. Counterfactual accounts of actual causation, based on graphical models paired with system interventions, have demonstrated initial success in addressing specific problem cases, in line with intuitive causal judgments. Here, we start from a set of basic requirements for causation (realization, composition, information, integration, and exclusion) and develop a rigorous, quantitative account of actual causation that is generally applicable to discrete dynamical systems. We present a formal framework to evaluate these causal requirements based on system interventions and partitions, which considers all counterfactuals of a state transition. This framework is used to provide a complete causal account of the transition by identifying and quantifying the strength of all actual causes and effects linking the two consecutive system states. Finally, we examine several exemplary cases and paradoxes of causation and show that they can be illuminated by the proposed framework for quantifying actual causation.


Introduction
The nature of cause and effect has been much debated in both philosophy and the sciences. To date, there is no single widely-accepted account of causation, and the various sciences focus on different aspects of the issue [1]. In physics, no formal notion of causation even seems to be required for describing the dynamical evolution of a system by a set of mathematical equations. At most, the notion of causation is reduced to the basic requirement that causes must precede and be able to influence their effects; no further constraints are imposed with regard to "what caused what".
However, a detailed record of "what happened" prior to a particular occurrence rarely provides a satisfactory explanation for why it occurred in causal, mechanistic terms (see Theory 2.2 for a formal definition of the term "occurrence" as a set of random variables in a particular state at a particular time). As an example, take AlphaGo, the deep neural network that repeatedly defeated human champions in the game Go [2]. Understanding why AlphaGo chose a particular move is a non-trivial problem [3], even though all its network parameters and its state evolution can be recorded in detail. Identifying "what caused what" becomes particularly difficult in complex systems with a distributed, recurrent architecture. Integrated information theory (IIT) provides the tools to characterize potential causation, that is, the causal constraints exerted by a mechanism in a given state.
In particular, our objective is to provide a complete quantitative causal account of "what caused what", within a transition between consecutive system states. Our approach differs from previous accounts of actual causation in what constitutes a complete causal account: Unlike most accounts of actual causation (e.g., [7,10,12], but see [29]), causal links within a transition are considered from the perspective of both causes and effects. Additionally, we not only evaluate actual causes and effects of individual variables, but also actual causes and effects of high-order occurrences, comprising multiple variables. While some existing accounts of actual causation include the notion of being "part of a cause" [12,21], the possibility of multi-variate causes and effects is rarely addressed, or even outright excluded [11].
Despite the differences in what constitutes a complete causal account, our approach remains compatible with the traditional view of actual causation, which considers only actual causes of individual variables (no high-order causation, and no actual effects). In this context, the main difference between our proposed framework and existing "contingency"-based definitions is that we simultaneously consider all counterfactual states of the transition, rather than a single contingency (e.g., as in [8,11,19–21,30,31]). This allows us to express the causal analysis in probabilistic, informational terms [25,32–34], which has the additional benefit that our framework naturally extends from deterministic to probabilistic causal networks, and also from binary to multi-valued variables. Finally, it allows us to quantify the strength of all causal links between occurrences and their causes and effects within the transition.
In the following, we will first formally describe the proposed causal framework of actual causation. We, then, demonstrate its utility on a set of examples, which illustrates the benefits of characterizing both causes and effects, the fact that causation can be compositional, and the importance of identifying irreducible causes and effects for obtaining a complete causal account. Finally, we illustrate several prominent paradoxical cases from the actual causation literature, including overdetermination and prevention, as well as a toy model of an image classifier, based on an artificial neural network.

Theory
Integrated information theory is concerned with the intrinsic cause-effect power of a physical system (intrinsic existence). The IIT formalism [25,27] starts from a discrete distributed dynamical system in its current state and asks how the system elements, alone and in combination (composition), constrain the potential past and future states of the system (information), and whether they do so above and beyond their parts (integration). The potential causes and effects of a system subset correspond to the set of elements over which the constraints are maximally informative and integrated (exclusion). In the following we aim to translate the IIT account of potential causation into a principled, quantitative framework for actual causation, which allows for the evaluation of all actual causes and effects within a state transition of a dynamical system of interacting elements, such as a biological or artificial neural network (see Figure 1). For maximal generality, we will formulate our account of actual causation in the context of dynamical causal networks [32,34,35].

Dynamical Causal Networks
Our starting point is a dynamical causal network: A directed acyclic graph (DAG) G u = (V, E) with edges E that indicate the causal connections among a set of nodes V and a given set of background conditions (state of exogenous variables) U = u (see Figure 1B). The nodes in G u represent a set of associated random variables (which we also denote by V) with state space Ω = ∏ i Ω V i and probability function p(v|u), v ∈ Ω. For any node V i ∈ V, we can define the parents of V i in G u as all nodes with an edge leading into V i , pa(V i ) = {V j | e ji ∈ E}.
A causal network G u is dynamical, in the sense that we can define a partition of its nodes V into k + 1 temporally ordered "slices", V = {V 0 , V 1 , . . . , V k }, starting with an initial slice without parents (pa(V 0 ) = ∅) and such that the parents of each successive slice are fully contained within the previous slice (pa(V t ) ⊆ V t−1 , t = 1, . . . , k). This definition is similar to the one proposed in [32], but is stricter, requiring that there are no within-slice causal interactions. This restriction prohibits any "instantaneous causation" between variables (see also [7], Section 1.5) and signifies that G u fulfills the Markov property. Nevertheless, recurrent networks can be represented as dynamical causal models when unfolded in time (see Figure 1B) [20]. The parts of V = {V 0 , V 1 , . . . , V k } can thus be interpreted as consecutive time steps of a discrete dynamical system of interacting elements (see Figure 1); a particular state V = v, then, corresponds to a system transient over k + 1 time steps.
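As a concrete illustration, the two-node OR-AND system of Figure 1 can be encoded directly as a two-slice dynamical causal network. The following is a minimal sketch, assuming (consistent with the example transitions discussed later) that each node at t reads the state of both nodes at t − 1; the names `PARENTS`, `MECHANISMS`, and `next_state` are our own, not from the text.

```python
from itertools import product

# Two-slice dynamical causal network for the OR-AND example of Figure 1.
# Assumption: each node at t receives input from both nodes at t-1.
NODES = ("OR", "AND")
PARENTS = {"OR": ("OR", "AND"), "AND": ("OR", "AND")}  # pa(V_i), all in V_{t-1}

# Elementary mechanisms (structural equations) of the nodes at t.
MECHANISMS = {
    "OR":  lambda a, b: int(a or b),
    "AND": lambda a, b: int(a and b),
}

def next_state(v_prev):
    """Apply each node's mechanism to its parents' states in the previous slice."""
    prev = dict(zip(NODES, v_prev))
    return tuple(MECHANISMS[n](*(prev[p] for p in PARENTS[n])) for n in NODES)

# Enumerate the transitions induced over the full state space Omega.
for v in product((0, 1), repeat=len(NODES)):
    print(v, "->", next_state(v))
```

Because the network is unfolded in time, there are no within-slice edges: the state at t is a function of the state at t − 1 alone, as the Markov property requires.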
In a Bayesian network, the edges of G u fully capture the dependency structure between nodes V. That is, for a given set of background conditions, each node is conditionally independent of every other node, given its parents in G u , and the probability function can be factored as

p(v | u) = ∏ i p(v i | pa(v i ), u).

For a causal network, there is the additional requirement that the edges E capture causal dependencies (rather than just correlations) between nodes. This means that the decomposition of p(v | u) holds, even if the parent variables are actively set into their state, as opposed to passively observed in that state ("Causal Markov Condition", [7,15]):

p(v | u) = ∏ i p(v i | do(pa(v i )), u).

As we assume, here, that U contains all relevant background variables, any statistical dependencies between V t−1 and V t are, in fact, causal dependencies, and cannot be explained by latent external variables ("causal sufficiency", see [34]). Moreover, because time is explicit in G u and we assume that there is no instantaneous causation, there is no question of the direction of causal influences: it must be that the earlier variables (V t−1 ) influence the later variables (V t ). By definition, V t−1 contains all parents of V t for t = 1, . . . , k. In contrast to the variables V within G u , the background variables U are conditioned to a particular state U = u throughout the causal analysis and are, otherwise, not further considered.
Together, these assumptions imply a transition probability function for V, such that the nodes at time t are conditionally independent given the state of the nodes at time t − 1 (see Figure 1C):

p u (v t | v t−1 ) = ∏ i p u (v i,t | v t−1 ).   (1)

To reiterate, a dynamical causal network G u describes the causal interactions among a set of nodes (the edges in E describe the causal connections between the nodes in V) conditional on the state of the background variables U, and the transition probability function p u (v t | v t−1 ) (Equation (1)) fully captures the nature of these causal dependencies. Note that p u (v t | v t−1 ) is generally undefined in the case where p u (v t−1 ) = 0. However, in the present context, it is defined as p u (v t | v t−1 ) = p u (v t | do(v t−1 )) using the do(v t−1 ) operation. The interventional probability p u (v t | do(v t−1 )) is well-defined for all v t−1 ∈ Ω and can typically be inferred from the mechanisms associated with the variables in V t .
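Under these assumptions, Equation (1) can be evaluated mechanically: the transition probability is the product of each node's conditional probability given the (intervened-on) previous state. A minimal sketch for the deterministic OR-AND example, again assuming both nodes read both inputs at t − 1 (`p_node` and `p_transition` are our own names):

```python
from itertools import product

# Deterministic mechanisms for the OR-AND example (assumption: each node
# reads both nodes at t-1).
MECHANISMS = {"OR": lambda a, b: int(a or b), "AND": lambda a, b: int(a and b)}

def p_node(node, value, v_prev):
    """p_u(V_{i,t} = value | do(V_{t-1} = v_prev)); 0 or 1 for deterministic nodes."""
    return 1.0 if MECHANISMS[node](*v_prev) == value else 0.0

def p_transition(v_t, v_prev):
    """Equation (1): the nodes at t are conditionally independent given v_{t-1},
    so the transition probability is a product of per-node conditionals."""
    return p_node("OR", v_t[0], v_prev) * p_node("AND", v_t[1], v_prev)

# The transition {(OR, AND)_{t-1} = 10} -> {(OR, AND)_t = 10} discussed in the
# text has p_u(10 | 10) = 1, so it satisfies the realization requirement.
print(p_transition((1, 0), (1, 0)))  # -> 1.0
```

Because the conditionals are defined via do(v t−1 ), every row of the resulting transition matrix is a proper probability distribution, even for previous states that would never be observed.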
In summary, we assume that G u fully and accurately describes the system of interest for a given set of background conditions. In reality, a causal network reflects assumptions about a system's elementary mechanisms. Current scientific knowledge must inform which variables to include, what their relevant states are, and how they are related mechanistically [7,36]. Here, we are primarily interested in natural and artificial systems, such as neural networks, for which detailed information about the causal network structure and the mechanisms of individual system elements is often available, or can be obtained through exhaustive experiments. In such systems, counterfactuals can be evaluated by performing experiments or simulations that assess how the system reacts to interventions. The transition probabilities can, in principle, be determined by perturbing the system into all possible states while holding the background variables fixed and observing the resulting transitions. Alternatively, the causal network can be constructed by experimentally identifying the input-output function of each element (i.e., its structural equation [7,34]). Merely observing the system without experimental manipulation is insufficient to identify causal relationships in most situations. Moreover, instantaneous dependencies are frequently observed in (experimentally obtained) time-series data of macroscopic variables, due to unobserved interactions at finer spatio-temporal scales [37]. In this case, a suitable dynamical causal network may still be obtained, simply by discounting such instantaneous dependencies, since these interactions are not due to the macroscopic mechanisms themselves.
Our objective, here, is to formulate a quantitative account of actual causation applicable to any predetermined, dynamical causal network, independent of practical considerations about model selection [12,36]. Confounding issues due to incomplete knowledge, such as estimation biases of probabilities from finite sampling, or latent variables, are, thus, set aside for the present purposes.
To what extent and under which conditions the identified actual causes and effects generalize across possible levels of description, or under incomplete knowledge, is an interesting question that we plan to address in future work (see also [38,39]).

Occurrences and Transitions
In general, actual causation can be evaluated over multiple time steps (e.g., considering indirect causal influences). Here, however, we specifically focus on direct causes and effects without intermediary variables or time steps. For this reason, we only consider causal networks containing nodes from two consecutive time points, V = {V t−1 , V t } (see Figure 1D).
Note that our approach generalizes, in principle, to system transitions across multiple (k > 1) time steps, by considering the transition probabilities (1). While this practice would correctly identify counterfactual dependencies between v t−k and v t , it ignores the actual states of the intermediate time steps (v t−k+1 , . . . , v t−1 ). As a consequence, this approach cannot, at present, address certain issues regarding causal transitivity across multiple paths, incomplete causal processes in probabilistic causal networks [40], or causal dependencies in non-Markovian systems.
Within a dynamical causal network G u = (V, E) with V = {V t−1 , V t }, our objective is to determine the actual cause or actual effect of occurrences within a transition v t−1 ≺ v t . Formally, an occurrence is defined to be a sub-state corresponding to a subset of elements at a particular time and in a particular state. This corresponds to the general usage of the term "event" in the computer science and probability literature. The term "occurrence" was chosen, instead, to avoid philosophical baggage associated with the term "event".

Cause and Effect Repertoires
Before defining the actual cause or actual effect of an occurrence, we first introduce two definitions from IIT which are useful in characterizing the causal powers of occurrences in a causal network: Cause/effect repertoires and partitioned cause/effect repertoires. In IIT, a cause (or effect) repertoire is a conditional probability distribution that describes how an occurrence (set of elements in a state) constrains the potential past (or future) states of other elements in a system [25,26] (see also [27,41] for a general mathematical definition). In the present context of a transition v t−1 ≺ v t , an effect repertoire specifies how an occurrence x t−1 ⊆ v t−1 constrains the potential future states of a set of nodes Y t ⊆ V t . Likewise, a cause repertoire specifies how an occurrence y t ⊆ v t constrains the potential past states of a set of nodes X t−1 ⊆ V t−1 (see Figure 2).
The effect and cause repertoire can be derived from the system transition probabilities in Equation (1) by conditioning on the state of the occurrence and causally marginalizing the variables outside the occurrence, V t−1 \ X t−1 and V t \ Y t (see Discussion 4.1). Causal marginalization serves to remove any contributions to the repertoire from variables outside the occurrence by averaging over all their possible states. Explicitly, for a single node Y i,t , the effect repertoire is:

π(Y i,t | x t−1 ) = |Ω W | −1 ∑ w∈Ω W p u (Y i,t | x t−1 , w),   (2)

where W = V t−1 \ X t−1 with state space Ω W . Note that, for causal marginalization, each possible state W = w ∈ Ω W is given the same weight |Ω W | −1 in the average, which corresponds to imposing a uniform distribution over all w ∈ Ω W . This ensures that the repertoire captures the constraints due to the occurrence, and not to whatever external factors might bias the variables in W to one state or another (this is discussed in more detail in Section 4.1).
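To make causal marginalization concrete, the following sketch computes the single-node effect repertoire of Equation (2) for the OR-AND example (same connectivity assumption as above; `effect_repertoire` is our own name). Conditioning on the occurrence {OR t−1 = 1} and averaging uniformly over the states of W = {AND t−1 } gives π(OR t = 1 | OR t−1 = 1) = 1, while marginalizing everything gives the unconstrained value 0.75.

```python
from itertools import product

MECHANISMS = {"OR": lambda a, b: int(a or b), "AND": lambda a, b: int(a and b)}
PREV = ("OR", "AND")  # node order in the previous slice V_{t-1}

def effect_repertoire(node, value, occurrence):
    """pi(node_t = value | occurrence), Equation (2): condition on the occurrence
    (a dict over nodes in V_{t-1}) and causally marginalize the remaining nodes
    W = V_{t-1} \\ X_{t-1}, each state weighted uniformly by 1/|Omega_W|."""
    options = [[occurrence[n]] if n in occurrence else [0, 1] for n in PREV]
    states = list(product(*options))
    return sum(MECHANISMS[node](*s) == value for s in states) / len(states)

print(effect_repertoire("OR", 1, {"OR": 1}))  # constrained: 1.0
print(effect_repertoire("OR", 1, {}))         # unconstrained: 0.75
```

The uniform weighting is what distinguishes causal marginalization from ordinary (observational) marginalization, which would weight the states of W by their observed frequencies.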
In graphical terms, causal marginalization implies that the connections from all W i ∈ W to Y i,t are "cut" and independently replaced by an un-biased average across the states of the respective W i , which also removes all dependencies between the variables in W. Causal marginalization, thus, corresponds to the notion of cutting edges proposed in [34]. However, instead of feeding all open ends with the product of the corresponding marginal distributions obtained from the observed joint distribution, as in Equation (7) of [34], here we impose a uniform distribution p = |Ω W | −1 , ∀w ∈ Ω W , as we are interested in quantifying mechanistic dependencies, which should not depend on the observed joint distribution. The complementary cause repertoire of a singleton occurrence y i,t , obtained using Bayes' rule with a uniform prior over Ω X t−1 , is:

π(X t−1 | y i,t ) = π(y i,t | X t−1 ) / ∑ x∈Ω X t−1 π(y i,t | x).

In the general case of a multi-variate Y t (or y t ), the transition probability function p u (Y t | x t−1 ) not only contains dependencies of Y t on x t−1 , but also correlations between the variables in Y t due to common inputs from nodes in W t−1 = V t−1 \ X t−1 , which should not be counted as constraints due to x t−1 . To discount such correlations, we define the effect repertoire over a set of variables Y t as the product of the effect repertoires over individual nodes (Equation (2)) (see also [34]):

π(Y t | x t−1 ) = ∏ i π(Y i,t | x t−1 ).   (3)

In the same manner, we define the cause repertoire of a general occurrence y t over a set of variables X t−1 as the normalized product of the cause repertoires of its individual nodes:

π(X t−1 | y t ) = ∏ i π(X t−1 | y i,t ) / ∑ x∈Ω X t−1 ∏ i π(x | y i,t ).   (4)

We can also define unconstrained cause and effect repertoires, a special case of cause or effect repertoires where the occurrence that we condition on is the empty set. In this case, the repertoire describes the causal constraints on a set of the nodes due to the structure of the causal network, under maximum uncertainty about the states of variables within the network. With the convention that π(∅) = 1, we can derive these unconstrained repertoires directly from the formulas for the cause and effect repertoires, Equations (3) and (4).
The unconstrained cause repertoire simplifies to a uniform distribution, representing the fact that the causal network itself imposes no constraint on the possible states of variables in V t−1 :

π(X t−1 ) = |Ω X t−1 | −1 .   (5)

The unconstrained effect repertoire is shaped by the update function of each individual node Y i,t ∈ Y t under maximum uncertainty about the state of its parents:

π(Y t ) = ∏ i π(Y i,t ), where π(Y i,t ) = |Ω V t−1 | −1 ∑ v∈Ω V t−1 p u (Y i,t | v).   (6)

In summary, the effect and cause repertoires π(Y t | x t−1 ) and π(X t−1 | y t ), respectively, are conditional probability distributions that specify the causal constraints due to an occurrence on the potential past and future states of variables in a causal network G u . The cause and effect repertoires discount constraints that are not specific to the occurrence of interest; possible constraints due to the state of variables outside of the occurrence are causally marginalized from the distribution, and constraints due to common inputs from other nodes are avoided by treating each node in the occurrence independently. Thus, we denote cause and effect repertoires with π, to highlight that, in general, they are not conditional probability distributions of G u ; the product probability π(Y t | x t−1 ) coincides with p u (Y t | do(x t−1 )) (imposing a uniform distribution over the marginalized variables) only in the special case that all variables Y i,t ∈ Y t are conditionally independent, given x t−1 (see also [34], Remark 1). This is the case, for example, if X t−1 already includes all inputs (all parents) of Y t , or determines Y t completely.
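The cause repertoire can be sketched in the same style: by Bayes' rule with a uniform prior over Ω X t−1 , it is the (causally marginalized) likelihood of the occurrence, normalized over all counterfactual past states. A minimal illustration for the OR-AND example (same connectivity assumption; `cause_repertoire` is our own name):

```python
from itertools import product

MECHANISMS = {"OR": lambda a, b: int(a or b), "AND": lambda a, b: int(a and b)}

def cause_repertoire(node, value):
    """pi(X_{t-1} | y_{i,t}) over the full previous slice X_{t-1} = (OR, AND)_{t-1}:
    likelihood of the occurrence y_{i,t} = (node, value) under each counterfactual
    past state, normalized (Bayes' rule with a uniform prior over Omega_{X_{t-1}})."""
    likelihood = {x: float(MECHANISMS[node](*x) == value)
                  for x in product((0, 1), repeat=2)}
    z = sum(likelihood.values())
    return {x: p / z for x, p in likelihood.items()}

rep = cause_repertoire("OR", 1)
print(rep)
# {OR_t = 1} rules out the past state (0,0) and weights the three compatible
# past states 1/3 each; the unconstrained cause repertoire (Equation (5))
# would instead be uniform at 1/4.
```

Because the prior over past states is uniform, the repertoire reflects only the mechanism of the node, not any observed statistics of the system's behavior.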
An objective of IIT is to evaluate whether the causal constraints of an occurrence on a set of nodes are "integrated", or "irreducible"; that is, whether the individual variables in the occurrence work together to constrain the past or future states of the set of nodes in a way that is not accounted for by the variables taken independently [25,42]. To this end, the occurrence (together with the set of nodes it constrains) is partitioned into independent parts, by rendering the connection between the parts causally ineffective [25,26,34,42]. The partitioned cause and effect repertoires describe the residual constraints under the partition. Comparing the partitioned cause and effect repertoires to the intact cause and effect repertoires reveals what is lost or changed by the partition.
A partition ψ of the occurrence x t−1 (and the nodes it constrains, Y t ) into m parts is defined as:

ψ = {(x j,t−1 , Y j,t )} j=1,...,m ,   (7)

such that {x j,t−1 } j=1,...,m is a partition of x t−1 and Y j,t ⊆ Y t with Y j,t ∩ Y k,t = ∅ for j ≠ k. Note that this includes the possibility that any Y j,t = ∅, which may leave a set of nodes Y t \ ∪ j Y j,t completely unconstrained (see Figure 3 for examples and details).
The partitioned effect repertoire of an occurrence x t−1 over a set of nodes Y t under a partition ψ is defined as:

π ψ (Y t | x t−1 ) = ∏ j π(Y j,t | x j,t−1 ) × π(Y t \ ∪ j Y j,t ).   (8)

This is the product of the corresponding m effect repertoires, multiplied by the unconstrained effect repertoire (Equation (6)) of the remaining set of nodes Y t \ ∪ j Y j,t , as these nodes are no longer constrained by any part of x t−1 under the partition.
Figure 3. Example partitions of the occurrence x t−1 and its constraints on Y t . (A) The set of all possible partitions of an occurrence, Ψ(x t−1 , Y t ), includes all partitions of x t−1 into 2 ≤ m ≤ |x t−1 | parts, according to Equation (7), as well as the special case ψ = {(x t−1 , ∅)}. Considering this special case a potential partition has the added benefit of allowing us to treat singleton occurrences and multi-variate occurrences in a common framework. (B) Except for this special case, in which the occurrence is completely cut from the nodes it constrains, we generally do not consider cases with m = 1 as partitions of the occurrence. The partition must eliminate the possibility of joint constraints of x t−1 onto Y t . The set of all partitions Ψ(X t−1 , y t ) of a cause repertoire π(X t−1 | y t ) includes all partitions of y t into 2 ≤ m ≤ |y t | parts, according to Equation (9), and, again, the special case of ψ = {(∅, y t )} for m = 1.
In the same way, a partition ψ of the occurrence y t (and the nodes it constrains, X t−1 ) into m parts is defined as:

ψ = {(X j,t−1 , y j,t )} j=1,...,m ,   (9)

such that {y j,t } j=1,...,m is a partition of y t and X j,t−1 ⊆ X t−1 with X j,t−1 ∩ X k,t−1 = ∅ for j ≠ k. The partitioned cause repertoire of an occurrence y t over a set of nodes X t−1 under a partition ψ is defined as:

π ψ (X t−1 | y t ) = ∏ j π(X j,t−1 | y j,t ) × π(X t−1 \ ∪ j X j,t−1 ),   (10)

the product of the corresponding m cause repertoires multiplied by the unconstrained cause repertoire (Equation (5)) of the remaining set of nodes X t−1 \ ∪ j X j,t−1 , which are no longer constrained by any part of y t due to the partition.

Actual Causes and Actual Effects
The objective of this section is to introduce the notion of a causal account for a transition of interest v t−1 ≺ v t in G u as the set of all causal links between occurrences within the transition. There is a causal link between occurrences x t−1 and y t if y t is the actual effect of x t−1 , or if x t−1 is the actual cause of y t . Below, we define causal link, actual cause, actual effect, and causal account, following five causal principles: Realization, composition, information, integration, and exclusion.
Realization. A transition v t−1 ≺ v t must be consistent with the transition probability function of a dynamical causal network G u ; that is, p u (v t | v t−1 ) > 0. Only occurrences within a transition v t−1 ≺ v t may have, or be, an actual cause or actual effect. (This requirement corresponds to the first clause ("AC1") of the Halpern and Pearl account of actual causation [20,21]; that is, for C = c to be an actual cause of E = e, both must actually happen in the first place.) As a first example, we consider the transition {(OR, AND) t−1 = 10} ≺ {(OR, AND) t = 10}, shown in Figure 1D. This transition is consistent with the conditional transition probabilities of the system, shown in Figure 1C.
Composition. Occurrences and their actual causes and effects can be uni- or multi-variate. For a complete causal account of the transition v t−1 ≺ v t , all causal links between occurrences x t−1 ⊆ v t−1 and y t ⊆ v t should be considered. For this reason, we evaluate every subset x t−1 ⊆ v t−1 as an occurrence that may have actual effects and every subset y t ⊆ v t as an occurrence that may have actual causes (see Figure 4). For a particular occurrence x t−1 , all subsets y t ⊆ v t are considered as candidate effects (Figure 5A). For a particular occurrence y t , all subsets x t−1 ⊆ v t−1 are considered as candidate causes (see Figure 5B). In what follows, we refer to occurrences consisting of a single variable as "first-order" occurrences and to multi-variate occurrences as "high-order" occurrences, and, likewise, to "first-order" and "high-order" causes and effects. In the example transition shown in Figure 4, {OR t−1 = 1} and {AND t−1 = 0} are first-order occurrences that could have an actual effect in v t , and {(OR, AND) t−1 = 10} is a high-order occurrence that could also have its own actual effect in v t . On the other side, {OR t = 1}, {AND t = 0} and {(OR, AND) t = 10} are occurrences (two first-order and one high-order) that could have an actual cause in v t−1 . To identify the respective actual cause (or effect) of any of these occurrences, we evaluate all possible sets {OR = 1}, {AND = 0}, and {(OR, AND) = 10} at time t − 1 (or t). Note that, in principle, we also consider the empty set, again using the convention that π(∅) = 1 (see "exclusion", below).
Information. An occurrence must provide information about its actual cause or effect. This means that it should increase the probability of its actual cause or effect compared to its probability if the occurrence is unspecified. To evaluate this, we compare the probability of a candidate effect y t in the effect repertoire of the occurrence x t−1 (Equation (3)) to its corresponding probability in the unconstrained repertoire (Equation (6)). In line with information-theoretical principles, we define the effect information ρ e of the occurrence x t−1 about a subsequent occurrence y t (the candidate effect) as:

ρ e (x t−1 , y t ) = log 2 (π(y t | x t−1 ) / π(y t )).   (11)

In words, the effect information ρ e is the relative increase in probability of an occurrence at t when constrained by an occurrence at t − 1, compared to when it is unconstrained. A positive effect information ρ e (x t−1 , y t ) > 0 means that the occurrence x t−1 makes a positive difference in bringing about y t . Similarly, we compare the probability of a candidate cause x t−1 in the cause repertoire of the occurrence y t (Equation (4)) to its corresponding probability in the unconstrained repertoire (Equation (5)). Thus, we define the cause information ρ c of the occurrence y t about a prior occurrence x t−1 (the candidate cause) as:

ρ c (x t−1 , y t ) = log 2 (π(x t−1 | y t ) / π(x t−1 )).   (12)

In words, the cause information ρ c is the relative increase in probability of an occurrence at t − 1 when constrained by an occurrence at t, compared to when it is unconstrained. Note that the unconstrained repertoire (Equations (5) and (6)) is an average over all possible states of the occurrence. The cause and effect information thus take all possible counterfactual states of the occurrence into account in determining the strength of constraints.
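For the OR-AND example, these definitions yield concrete numbers. A sketch for the causal link between {OR t−1 = 1} and {OR t = 1} (same connectivity assumption as above; `pbar_or` is our own helper name):

```python
from math import log2
from itertools import product

MECH_OR = lambda a, b: int(a or b)  # mechanism of OR_t (reads both nodes at t-1)

def pbar_or(value, or_prev=None):
    """Causally marginalized p(OR_t = value | OR_{t-1}); None marginalizes OR_{t-1}.
    The second input, AND_{t-1}, is always marginalized uniformly here."""
    firsts = [or_prev] if or_prev is not None else [0, 1]
    vals = [MECH_OR(a, b) == value for a, b in product(firsts, [0, 1])]
    return sum(vals) / len(vals)

# Effect information: pi(OR_t=1 | OR_{t-1}=1) = 1 vs. unconstrained pi = 0.75.
rho_e = log2(pbar_or(1, 1) / pbar_or(1))

# Cause information: Bayes with a uniform prior gives pi(OR_{t-1}=1 | OR_t=1) = 2/3,
# vs. the unconstrained (uniform) pi(OR_{t-1}=1) = 1/2.
pi_cause = pbar_or(1, 1) / (pbar_or(1, 0) + pbar_or(1, 1))
rho_c = log2(pi_cause / 0.5)

print(round(rho_e, 3), round(rho_c, 3))  # both log2(4/3), about 0.415 bits
```

In this symmetric case ρ e and ρ c happen to coincide; as noted below, that is not true in general.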
In an information-theoretic context, the formula log 2 (p(x | y)/p(x)) is also known as the "pointwise mutual information" (see [43], Chapter 2). While the pointwise mutual information is symmetric, the cause and effect information of an occurrence pair (x t−1 , y t ) are not always identical, as they are defined based on the product probabilities in Equations (3) and (4). Nevertheless, ρ e and ρ c can be interpreted as the number of bits of information that one occurrence specifies about the other. (Figure 5 caption: The state that actually occurred is selected from the effect or cause repertoire (green is used for effects, blue for causes). Its probability is compared to the probability of the same state when unconstrained (overlaid distributions without fill). All repertoires are based on product probabilities, π (Equations (3) and (4)), that discount correlations due to common inputs when variables are causally marginalized.) In addition to the pointwise mutual information, ρ e/c (x t−1 , y t ) is also related to information-theoretic divergences that measure differences in probability distributions, such as the Kullback-Leibler divergence D KL (p(x | y) || p(x)), which corresponds to an average of log 2 (p(x | y)/p(x)) over all states x ∈ Ω X , weighted by p(x | y). Here, we do not include any such weighting factor, since the transition specifies which states actually occurred. While other definitions of cause and effect information are, in principle, conceivable, ρ e/c (x t−1 , y t ) captures the notion of information in a general sense and in basic terms.
Note that ρ e > 0 is a necessary, but not sufficient, condition for y t to be an actual effect of x t−1 and ρ c > 0 is a necessary, but not sufficient, condition for x t−1 to be an actual cause of y t . Further, ρ c/e = 0 if and only if conditioning on the occurrence does not change the probability of a potential cause or effect, which is always the case when conditioning on the empty set.
Occurrences x t−1 that lower the probability of a subsequent occurrence y t have been termed "preventative causes" by some [33]. Rather than counting a negative effect information ρ e (x t−1 , y t ) < 0 as indicating a possible "preventative effect", we take the stance that such an occurrence x t−1 has no effect on y t , since it actually predicts other occurrences Y t = ¬y t that did not happen. By the same logic, a negative cause information ρ c (x t−1 , y t ) < 0 means that x t−1 is not a cause of y t within the transition. Nevertheless, the current framework can, in principle, quantify the strength of possible "preventative" causes and effects.
In Figure 5A, the occurrence {OR t−1 = 1} raises the probability of {OR t = 1}, and vice versa (Figure 5B). By contrast, the occurrence {OR t−1 = 1} lowers the probability of occurrence {AND t = 0} and also of the second-order occurrence {(OR, AND) t = 10}, compared to their unconstrained probabilities. Thus, neither {AND t = 0} nor {(OR, AND) t = 10} can be actual effects of {OR t−1 = 1}. Likewise, the occurrence {OR t = 1} lowers the probability of {AND t−1 = 0}, which can, thus, not be its actual cause.
Integration. A high-order occurrence must specify more information about its actual cause or effect than its parts when they are considered independently. This means that the high-order occurrence must increase the probability of its actual cause or effect beyond the value specified by its parts.
As outlined in Section 2.3, a partitioned cause or effect repertoire specifies the residual constraints of an occurrence after applying a partition ψ. We quantify the amount of information specified by the parts of an occurrence based on partitioned cause/effect repertoires (Equations (8) and (10)). We define the effect information under a partition ψ as:

ρ e ψ (x t−1 , y t ) = log 2 (π ψ (y t | x t−1 ) / π(y t )),

and the cause information under a partition ψ as:

ρ c ψ (x t−1 , y t ) = log 2 (π ψ (x t−1 | y t ) / π(x t−1 )).

The information a high-order occurrence specifies about its actual cause or effect is integrated to the extent that it exceeds the information specified under any partition ψ. Out of all permissible partitions Ψ(x t−1 , Y t ) (Equation (7)), or Ψ(X t−1 , y t ) (Equation (9)), the partition that reduces the effect or cause information the least is denoted the "minimum information partition" (MIP) [25,26], respectively:

MIP e = argmax ψ∈Ψ(x t−1 ,Y t ) ρ e ψ (x t−1 , y t ) and MIP c = argmax ψ∈Ψ(X t−1 ,y t ) ρ c ψ (x t−1 , y t ).

We can, then, define the integrated effect information α e as the difference between the effect information and the information under the MIP:

α e (x t−1 , y t ) = ρ e (x t−1 , y t ) − ρ e MIP (x t−1 , y t ),

and the integrated cause information α c as:

α c (x t−1 , y t ) = ρ c (x t−1 , y t ) − ρ c MIP (x t−1 , y t ).

For first-order occurrences x i,t−1 or y i,t , there is only one way to partition the occurrence: the special case ψ = {(x i,t−1 , ∅)} or ψ = {(∅, y i,t )}, which cuts the occurrence away from the nodes it constrains, so that α e/c = ρ e/c in this case. A positive integrated effect information (α e (x t−1 , y t ) > 0) signifies that the occurrence x t−1 has an irreducible effect on y t , which is necessary, but not sufficient, for y t to be an actual effect of x t−1 . Likewise, a positive integrated cause information (α c (x t−1 , y t ) > 0) means that y t has an irreducible cause in x t−1 , which is a necessary, but not sufficient, condition for x t−1 to be an actual cause of y t .
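For small systems, the partition search can be carried out exhaustively. The sketch below (same connectivity assumption as above; the helper names `pbar` and `part_pi` are our own) computes the integrated cause information of the high-order occurrence {(OR, AND) t = 10} over the candidate cause (OR, AND) t−1 = 10, enumerating the partitions of Equation (9) and evaluating Equation (10):

```python
from math import log2
from itertools import product

MECH = (lambda a, b: int(a or b), lambda a, b: int(a and b))  # OR_t, AND_t
Y = ((0, 1), (1, 0))   # occurrence y_t as (node index, value): OR_t=1, AND_t=0
XSTAR = (1, 0)         # candidate cause state (OR, AND)_{t-1} = 10

def pbar(node, value, fixed):
    """Causally marginalized p(node_t = value | fixed); fixed: {index: state}."""
    opts = [[fixed[i]] if i in fixed else [0, 1] for i in range(2)]
    vals = [MECH[node](a, b) == value for a, b in product(*opts)]
    return sum(vals) / len(vals)

def part_pi(y_part, x_idx):
    """pi(X_part = XSTAR restricted to x_idx | y_part): normalized product of
    single-node cause repertoires over Omega_{X_part} (Equation (4)); pi(empty)=1."""
    if not x_idx:
        return 1.0
    states = list(product((0, 1), repeat=len(x_idx)))
    def weight(s):
        w = 1.0
        for node, val in y_part:
            num = pbar(node, val, dict(zip(x_idx, s)))
            den = sum(pbar(node, val, dict(zip(x_idx, s2))) for s2 in states)
            w *= num / den
        return w
    z = sum(weight(s) for s in states)
    return weight(tuple(XSTAR[i] for i in x_idx)) / z

rho_c = log2(part_pi(Y, (0, 1)) / 0.25)  # intact repertoire vs. uniform 1/4

# Enumerate partitions: split y_t into its two nodes and assign disjoint subsets
# of X_{t-1} to the parts; the remainder is unconstrained (uniform, 1/2 per
# node). The special case {(empty, y_t)} yields rho = 0 and starts the search.
best = 0.0
subsets = ((), (0,), (1,), (0, 1))
for x1 in subsets:
    for x2 in subsets:
        if set(x1) & set(x2):
            continue
        rest = 2 - len(x1) - len(x2)
        pi = part_pi((Y[0],), x1) * part_pi((Y[1],), x2) * 0.5 ** rest
        if pi > 0:
            best = max(best, log2(pi / 0.25))

alpha_c = rho_c - best  # = 1 - log2(16/9) = log2(9/8), about 0.17 bits
print(round(rho_c, 2), round(alpha_c, 2))  # -> 1.0 0.17
```

The MIP here pairs each node at t with its same-labeled node at t − 1, and the residual α c = log2(9/8) > 0 confirms that the second-order occurrence is irreducible, in line with the example discussed next.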
In our example transition, the occurrence {(OR, AND) t−1 = 10} ( Figure 5C) is reducible. This is because {OR t−1 = 1} is sufficient to determine that {OR t = 1} with probability 1 and {AND t−1 = 0} is sufficient to determine that {AND t = 0} with probability 1. Thus, there is nothing to be gained by considering the two nodes together as a second-order occurrence. By contrast, the occurrence {(OR, AND) t = 10} determines the particular past state {(OR, AND) t−1 = 10} with higher probability than the two first-order occurrences {OR t = 1} and {AND t = 0}, taken separately ( Figure 5D, right). Thus, the second-order occurrence {(OR, AND) t = 10} is irreducible over its candidate cause {(OR, AND) t−1 = 10} (see also Discussion 4.4).
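The direction of this result can likewise be illustrated by comparing the joint constraint of {(OR, AND) t = 10} on the past state with the product of the constraints of its parts. This sketch replaces the full partition scheme of Equations (7)-(10) with a simple product of single-occurrence posteriors, which suffices to show that the second-order occurrence specifies more than its parts; the update functions are the assumed OR/AND mechanisms from above.

```python
from itertools import product
from math import log2

past_states = list(product([0, 1], repeat=2))      # (OR_{t-1}, AND_{t-1})
update = lambda o, a: (int(o or a), int(o and a))  # (OR_t, AND_t)

def posterior(pred_future):
    """Uniform posterior over past states, given a condition on t."""
    sel = [s for s in past_states if pred_future(update(*s))]
    return {s: 1 / len(sel) for s in sel}

# Joint constraint of the second-order occurrence {(OR, AND)_t = 10}:
joint = posterior(lambda f: f == (1, 0)).get((1, 0), 0.0)
# Factorized constraint from the parts {OR_t = 1} and {AND_t = 0}:
p_or = posterior(lambda f: f[0] == 1)
p_and = posterior(lambda f: f[1] == 0)
factorized = (sum(v for s, v in p_or.items() if s[0] == 1)
              * sum(v for s, v in p_and.items() if s[1] == 0))
irreducibility_gain = log2(joint / factorized)  # > 0: irreducible to its parts
```

The joint posterior assigns probability 1/2 to the past state 10, whereas the product of the parts' posteriors assigns only 4/9, so the gain is positive.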
Exclusion: An occurrence should have at most one actual cause and one actual effect (which, however, can be multi-variate; that is, a high-order occurrence). In other words, only one occurrence y t ⊆ v t can be the actual effect of an occurrence x t−1 , and only one occurrence x t−1 ⊆ v t−1 can be the actual cause of an occurrence y t .
It is possible that there are multiple candidate causes or effects for a given occurrence. The integrated effect or cause information of an occurrence quantifies the strength of its causal constraint on a candidate effect or cause. When there are multiple candidate causes or effects for which α c/e (x t−1 , y t ) > 0, we select the strongest of those constraints as its actual cause or effect (that is, the one that maximizes α). Note that adding unconstrained variables to a candidate cause (or effect) does not change the value of α, as the occurrence still specifies the same irreducible constraints about the state of the extended candidate cause (or effect). For this reason, we include a "minimality" condition, such that no subset of an actual cause or effect should have the same integrated cause or effect information. This minimality condition between overlapping candidate causes or effects is related to the third clause ("AC3") in the various Halpern-Pearl (HP) accounts of actual causation [20,21], which states that no subset of an actual cause should also satisfy the conditions for being an actual cause. Under uncertainty about the causal model, or other practical considerations, the minimality condition could, in principle, be replaced by a more elaborate criterion, similar to, for example, the Akaike information criterion (AIC), which weighs increases in causal strength, as measured here, against the number of variables included in the candidate cause or effect.
We define the irreducibility of an occurrence as its maximum integrated effect (or cause) information over all candidate effects (or causes):

α max e (x t−1 ) = max y t ⊆v t α e (x t−1 , y t ),
α max c (y t ) = max x t−1 ⊆v t−1 α c (x t−1 , y t ).

Considering the empty set as a possible cause or effect guarantees that the minimal value that α max can take is 0. Accordingly, if α max = 0, then the occurrence is said to be reducible, and it has no actual cause or effect.
For the example in Figure 2A, {OR t = 1} has two candidate causes with α max c ({OR t = 1}) = 0.415 bits, the first-order occurrence {OR t−1 = 1} and the second-order occurrence {(OR, AND) t−1 = 10}. In this case, {OR t−1 = 1} is the actual cause of {OR t = 1}, by the minimality condition across overlapping candidate causes.
The exclusion principle avoids causal over-determination, which arises from counting multiple causes or effects for a single occurrence. Note, however, that symmetries in G u can give rise to genuine indeterminism about the actual cause or effect (see Results 3). This is the case if multiple candidate causes (or effects) are maximally irreducible and they are not simple sub- or super-sets of each other. Upholding the causal exclusion principle, such degenerate cases are resolved by stipulating that the one actual cause remains undetermined between all minimal candidate causes (or effects).
To summarize, we formally translate the five causal principles of IIT into the following requirements for actual causation: Realization: The transition v t−1 ≺ v t must be compatible with the transition probabilities of the dynamical causal network G u (p u (v t |v t−1 ) > 0). Composition: All occurrences x t−1 ⊆ v t−1 may have actual effects and be actual causes, and all y t ⊆ v t may have actual causes and be actual effects. Information: Occurrences must increase the probability of their causes or effects (ρ(x t−1 , y t ) > 0). Integration: Moreover, they must do so above and beyond their parts (α(x t−1 , y t ) > 0).
Exclusion: An occurrence has only one actual cause (or effect), and it is the occurrence that maximizes α c (or α e ).
Having established the above causal principles, we now formally define the actual cause and the actual effect of an occurrence within a transition v t−1 ≺ v t of the dynamical causal network G u : Definition 1. Within a transition v t−1 ≺ v t of a dynamical causal network G u , the actual cause of an occurrence y t ⊆ v t is an occurrence x t−1 ⊆ v t−1 which satisfies the following conditions: 1. The integrated cause information of y t over x t−1 is maximal: α c (x t−1 , y t ) = α max (y t ); and 2. No subset of x t−1 satisfies condition (1). Define the set of all occurrences that satisfy the above conditions as x * (y t ). As an occurrence can have, at most, one actual cause, there are three potential outcomes: 1. If x * (y t ) = {x t−1 }, then x t−1 is the actual cause of y t ; 2. if |x * (y t )| > 1, then the actual cause of y t is indeterminate; and 3. if x * (y t ) = {∅}, then y t has no actual cause.

Definition 2.
Within a transition v t−1 ≺ v t of a dynamical causal network G u , the actual effect of an occurrence x t−1 ⊆ v t−1 is an occurrence y t ⊆ v t which satisfies the following conditions: 1. The integrated effect information of x t−1 over y t is maximal: α e (x t−1 , y t ) = α max (x t−1 ); and 2. No subset of y t satisfies condition (1). Define the set of all occurrences that satisfy the above conditions as y * (x t−1 ). As an occurrence can have, at most, one actual effect, there are three potential outcomes: 1. If y * (x t−1 ) = {y t }, then y t is the actual effect of x t−1 ; 2. if |y * (x t−1 )| > 1, then the actual effect of x t−1 is indeterminate; and 3. if y * (x t−1 ) = {∅}, then x t−1 has no actual effect.
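For given α values, Definitions 1 and 2 reduce to a simple selection rule. The following hypothetical helper (not part of the PyPhi toolbox) makes the three potential outcomes explicit; candidate occurrences are represented as frozensets of variable labels, and the empty set (with α = 0) is always included as a candidate.

```python
def resolve_actual_cause(candidates, eps=1e-9):
    """Apply the exclusion and minimality conditions of Definition 1.

    candidates: dict mapping frozenset-of-variables -> alpha_c value.
    Returns ('cause', x), ('indeterminate', {x, ...}), or ('none', None).
    """
    candidates = dict(candidates)
    candidates.setdefault(frozenset(), 0.0)  # empty set bounds alpha_max by 0
    alpha_max = max(candidates.values())
    if alpha_max <= eps:
        return ('none', None)                # reducible: no actual cause
    maximal = [x for x, a in candidates.items() if abs(a - alpha_max) <= eps]
    # Minimality: discard any maximal candidate with a maximal proper subset.
    minimal = [x for x in maximal if not any(y < x for y in maximal)]
    if len(minimal) == 1:
        return ('cause', minimal[0])
    return ('indeterminate', set(minimal))
```

For instance, with the two candidate causes of {OR t = 1} from our example (both at 0.415 bits), the minimality condition selects the first-order occurrence; two non-overlapping maximal candidates instead yield an indeterminate outcome.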
Based on Definitions 1 and 2, an integrated occurrence defines a single causal link, regardless of whether its actual cause (or effect) is unique or indeterminate. When the actual cause (or effect) is unique, we sometimes refer to it explicitly in the causal link, x t−1 ← y t (or x t−1 → y t ). The strength of a causal link is determined by its α max e or α max c value. Reducible occurrences (α max = 0) cannot form a causal link.
Under this definition, all actual causes and actual effects contribute to the causal account C(v t−1 ≺ v t ). Notably, the fact that there is a causal link x t−1 → y t does not necessarily imply that the reverse causal link x t−1 ← y t is also present, and vice versa. In other words, just because y t is the actual effect of x t−1 , the occurrence x t−1 does not have to be the actual cause of y t . It is, therefore, not redundant to include both directions in C(v t−1 ≺ v t ), as illustrated by the examples of over-determination and prevention in the Results section (see, also, Discussion 4.2).

Figure 6 shows the entire causal account of our example transition. Intuitively, in this simple example, {OR t−1 = 1} has the actual effect {OR t = 1} and is also the actual cause of {OR t = 1}, and the same holds for {AND t−1 = 0} and {AND t = 0}. Nevertheless, there is also a causal link between the second-order occurrence {(OR, AND) t = 10} and its actual cause {(OR, AND) t−1 = 10}, which is irreducible to its parts, as shown in Figure 5D (right). However, there is no complementary link from {(OR, AND) t−1 = 10} to {(OR, AND) t = 10}, as the occurrence at t − 1 is reducible ( Figure 5C, right). The causal account, shown in Figure 6, thus provides a complete causal explanation for "what happened" and "what caused what" in the transition.

Similar to the notion of system-level integration in IIT [25,26], the principle of integration can also be applied to the causal account as a whole, not only to individual causal links (see Appendix A). In this way, it is possible to evaluate to what extent the causal account itself is irreducible. In summary, the measures defined in this section provide the means to exhaustively assess "what caused what" in a transition v t−1 ≺ v t , and to evaluate the strength of specific causal links of interest under a particular set of background conditions, U = u.
Software to analyze transitions in dynamical causal networks with binary variables is freely available within the "PyPhi" toolbox for integrated information theory [44] at https://github.com/wmayner/pyphi, including documentation at https://pyphi.readthedocs.io/en/stable/examples/actual_causation.html.

Results
In the following, we will present a series of examples to illustrate the quantities and objects defined in the theory section and address several dilemmas taken from the literature on actual causation. While indeterminism may play a fundamental role in physical causal models, the existing literature on actual causation largely focuses on deterministic problem cases. For ease of comparison, most causal networks analyzed in the following are, thus, deterministic, corresponding to prominent test cases of counterfactual accounts of actual causation (e.g., [8,11,19-21,45]).

Same Transition, Different Mechanism: Disjunction, Conjunction, Bi-Conditional, and Prevention
From a dynamical point of view, without taking the causal structure of the mechanisms into account, the same occurrences happen in all four situations shown in Figure 7. However, analyzing the causal accounts of these transitions reveals differences in the number, type, and strength of causal links between occurrences and their actual causes or effects. Figure 7A,B are often referred to as the disjunctive and conjunctive versions of the "forest-fire" example [12,20,21], where lightning and/or a match being dropped result in a forest fire. In the case that lightning strikes and the match is dropped, {A = 1} and {B = 1} are typically considered two separate (first-order) causes in both the disjunctive and conjunctive version (e.g., [20]). This result is not a valid solution within our proposed account of actual causation, as it violates the causal exclusion principle. We explicitly evaluate the high-order occurrence {AB = 11} as a candidate cause, in addition to {A = 1} and {B = 1}. In line with the distinct logical structure of the two examples, we identify the high-order occurrence {AB = 11} as the actual cause of {D = 1} in the conjunctive case, while we identify either {A = 1} or {B = 1} as the actual cause of {C = 1} in the disjunctive case, but not both. By separating actual causes from actual effects, acknowledging causal composition, and respecting the causal exclusion principle, our proposed causal analysis can illuminate and distinguish all situations displayed in Figure 7.
Bi-conditional: The significance of high-order occurrences is further emphasized by the third example ( Figure 7C), where E is a "logical bi-conditional" (an XNOR) of its two inputs. In this case, the individual occurrences {A = 1} and {B = 1} by themselves make no difference in bringing about {E = 1}; their effect information is zero. For this reason, they cannot have actual effects and cannot be actual causes. Only the second-order occurrence {AB = 11} specifies {E = 1}, which is its actual effect {AB = 11} → {E = 1}. Likewise, {E = 1} only specifies the second-order occurrence {AB = 11}, which is its actual cause {AB = 11} ← {E = 1}, but not its parts taken separately. Note that the causal strength in this example is lower than in the case of the AND-gate, since, everything else being equal, {D = 1} is, mechanistically, a less-likely output than {E = 1}.
Prevention: In the final example ( Figure 7D), the occurrence {A = 1} has no actual effect, since A could be partitioned away without loss. This example can be seen as a case of prevention: {B = 1} causes {F = 1}, which prevents any effect of {A = 1}. In a popular narrative accompanying this example, {A = 1} is an assassin putting poison in the King's tea, while a bodyguard administers an antidote {B = 1}, and the King survives {F = 1} [12]. The bodyguard thus "prevents" the King's death. (However, the causal model is also equivalent to an OR-gate, as can be seen by switching the state labels of A from '0' to '1' and vice versa. The discussed transition would correspond to the case of one input to the OR-gate being '1' and the other '0'. As the OR-gate switches on ('1') in this case, the '0' input has no effect and is not a cause.) Note that the causal account is state-dependent: For a different transition, A may have an actual effect or contribute to an actual cause; if the bodyguard does not administer the antidote ({B = 0}), whether the King survives depends on the assassin (the state of A).
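The prevention structure can be made explicit with a two-line model. The sketch below assumes the reading F = ¬A ∨ B described in the narrative (the King survives unless poisoned without antidote); it shows why A makes no difference when B = 1, but fully determines the outcome when B = 0.

```python
# Assumed prevention mechanism: survive iff not poisoned, or antidote given.
survive = lambda a, b: int((not a) or b)

# With the antidote given (B = 1), the poison A makes no difference:
outcomes_given_antidote = {survive(a, 1) for a in (0, 1)}
# Without the antidote (B = 0), survival depends entirely on A:
outcomes_without_antidote = {a: survive(a, 0) for a in (0, 1)}
```

The first set collapses to a single outcome, which is exactly the sense in which A "could be partitioned away without loss" in this transition; the second mapping shows the state-dependence noted above.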
Taken together, the above examples demonstrate that the causal account and the causal strength of individual causal links within the account capture differences in sufficiency and necessity of the various occurrences in their respective transitions. Including both actual causes and effects, moreover, contributes to a mechanistic understanding of the transition, since not all occurrences at t − 1 with actual effects end up being actual causes of occurrences at t.

Linear Threshold Units
A generalization of simple, linear logic gates, such as OR- and AND-gates, are binary linear threshold units (LTUs). Given n equivalent inputs V t−1 = {V 1,t−1 , V 2,t−1 , . . . , V n,t−1 } to a single LTU V t , V t will turn on ('1') if the number of inputs in state '1' reaches a given threshold k, that is, if ∑ v t−1 ≥ k. LTUs are of great interest, for example, in the field of neural networks, since they comprise one of the simplest model mechanisms for neurons, capturing the notion that a neuron fires if it receives sufficient synaptic input. One example is a Majority-gate, which outputs '1' if and only if more than half of its inputs are '1'. Figure 8 shows the causal account of an example transition of a Majority-gate with four inputs. In this case, it happens to be that the third-order occurrence {ABC = 111} is minimally sufficient for {M = 1}-no smaller set of inputs would suffice. Note, however, that the actual cause is not determined based on sufficiency, but because {ABC = 111} is the set of nodes maximally constrained by the occurrence {M = 1}. Nevertheless, causal analysis, as illustrated here, will always identify a minimally sufficient set of inputs as the actual cause of an LTU v t = 1, for any number of inputs n and any threshold k. Furthermore, any occurrence of input variables x t−1 ⊆ v t−1 with at most k nodes, all in state '1', will be irreducible, with the LTU v t = 1 as their actual effect.

Theorem 1. Consider a dynamical causal network G u , such that V t = {Y t } is a linear threshold unit with n inputs and threshold k ≤ n, and V t−1 is the set of n inputs to Y t . For a transition v t−1 ≺ v t , with y t = 1 and ∑ v t−1 ≥ k, the following holds: 1. The actual cause of {Y t = 1} is an occurrence {X t−1 = x t−1 } with |x t−1 | = k and min(x t−1 ) = 1; and 2. if min(x t−1 ) = 1 and |x t−1 | ≤ k, then the actual effect of {X t−1 = x t−1 } is {Y t = 1}. Note that a LTU in the off ('0') state, {Y t = 0}, has equivalent results with the role of '0' and '1' reversed, and a threshold of n − k.
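The sufficiency side of Theorem 1 can be spot-checked by brute force. The sketch below enumerates the minimally sufficient sets of inputs for a Majority-gate with the assumed parameters n = 4 and k = 3; it checks sufficiency only and does not compute α values.

```python
from itertools import product, combinations

def ltu(inputs, k):
    """Linear threshold unit: output 1 iff at least k inputs are 1."""
    return int(sum(inputs) >= k)

def sufficient(idx, n, k):
    """Does fixing the inputs in idx to 1 force the output to 1,
    for every setting of the remaining inputs?"""
    rest = [i for i in range(n) if i not in idx]
    for fill in product([0, 1], repeat=len(rest)):
        state = [0] * n
        for i in idx:
            state[i] = 1
        for i, v in zip(rest, fill):
            state[i] = v
        if ltu(state, k) == 0:
            return False
    return True

n, k = 4, 3  # Majority of four inputs: 'on' iff more than half are 1
min_sufficient = [set(c) for r in range(n + 1)
                  for c in combinations(range(n), r)
                  if sufficient(set(c), n, k)
                  and not any(sufficient(set(s), n, k)
                              for s in combinations(c, r - 1))]
```

Every minimally sufficient set turns out to have exactly k = 3 members, consistent with clause 1 of the theorem (which additionally singles out one such set as the actual cause).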
In the case of over-determination (e.g., the transition v t−1 = {ABCD = 1111} ≺ v t = {M = 1}, where all inputs to the Majority-gate are '1'), the actual cause will again be a subset of three input nodes in the state '1'. However, which of the possible sets remains undetermined, due to symmetry, just as in the case of the OR-gate in Figure 7A.
For comparison, the original and updated Halpern-Pearl (HP) definitions of actual causation [20] generally identify all individual variables in state '1' as causes of an LTU v t = 1. The modified HP definition proposed in [21], roughly speaking, identifies the actual causes as the set of variables whose state needs to be flipped in order to change the outcome, which may vary depending on the state v t−1 and the threshold k (compare the example in Figure 8).

Distinct Background Conditions
The causal network in Figure 8A considers all inputs to M as relevant variables. Under certain circumstances, however, we may want to consider a different set of background conditions. For example, in a voting scenario it may be a given that D always votes "no" (D = 0). In that case, we may want to analyze the causal account of the same transition with D fixed in state '0' as a background condition (G u ′ , Figure 8B). Doing so results in a causal account with the same causal links but higher causal strengths. This captures the intuition that the "yes votes" of A, B, and C are more important if it is already determined that D will vote "no".
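The strengthening effect of the background condition can be illustrated with the simple (unpartitioned) effect ratio ρ e of {A = 1} on {M = 1} for the four-input Majority-gate; the numbers are illustrative and are not the α values reported in Figure 8.

```python
from itertools import product
from math import log2

def majority4(state):
    """M = 1 iff more than half of the four inputs are 1."""
    return int(sum(state) >= 3)

def effect_ratio(n_free, fixed):
    """log2 of P(M=1 | A=1, fixed background) / P(M=1 | fixed background),
    with the n_free remaining inputs causally marginalized (uniform)."""
    states = [(1,) + f + fixed for f in product([0, 1], repeat=n_free)]
    p_constrained = sum(majority4(s) for s in states) / len(states)
    states_free = [(a,) + f + fixed
                   for a in (0, 1) for f in product([0, 1], repeat=n_free)]
    p_free = sum(majority4(s) for s in states_free) / len(states_free)
    return log2(p_constrained / p_free)

rho_marginalized = effect_ratio(n_free=3, fixed=())   # D treated as a variable
rho_background = effect_ratio(n_free=2, fixed=(0,))   # D fixed to 0 (G_u')
```

Fixing D = 0 raises the ratio from log2(8/5) ≈ 0.68 bits to 1 bit, in line with the observation that the causal links keep their structure but gain strength.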
The difference between the causal accounts of v t−1 ≺ v t in G u , compared to G u ′ , moreover, highlights the fact that we explicitly distinguish fixed background conditions U = u from relevant variables V, whose counterfactual relations must be considered (see also [46]). While the background variables are fixed in their actual state U = u, all counterfactual states of the relevant variables V are considered when evaluating the causal account of v t−1 ≺ v t in G u .

Disjunction of Conjunctions
Another case often considered in the actual causation literature is a disjunction of conjunctions (DOC); that is, an OR-operation over two or more AND-operations. In the general case, a disjunction of conjunctions is a variable V t that is a disjunction of k conditions, each of which is a conjunction of n j inputs. Here, we consider a simple example, (A ∧ B) ∨ C (see Figure 9). The debate over this example is mostly concerned with the type of transition shown in Figure 9A, and the question of whether {A = 1} is a cause of {D = 1}, even if B = 0. One story accompanying this example is: "a prisoner dies either if A loads B's gun and B shoots, or if C loads and shoots his gun, . . ., A loads B's gun, B does not shoot, but C does load and shoot his gun, so that the prisoner dies" [12,47].
The quantitative assessment of actual causes and actual effects can help to resolve issues of actual causation, in this type of example. As shown in Figure 9A, the actual cause of {D = 1} in this transition is the minimally sufficient occurrence {C = 1}. The results from this example extend to the general case of disjunctions of conjunctions. In the situation where v t = 1, the actual cause of v t is a minimally sufficient occurrence. If multiple conjunctive conditions are satisfied, the actual cause of v t remains indeterminate between all minimally sufficient sets (asymmetric over-determination). At t − 1, any first-order occurrence in state '1', as well as any high-order occurrence of such nodes that does not overdetermine v t , has an actual effect. This includes any occurrence in state all '1' that contains only variables from exactly one conjunction, as well as any high-order occurrence of nodes across conjunctions, which does not fully contain any specific conjunction.
If, instead, v t = 0, then its actual cause is an occurrence that contains a single node in state '0' from each conjunctive condition. At t − 1, any occurrence in state all '0' that does not overdetermine v t has an actual effect, which is any all '0' occurrence that does not contain more than one node from any conjunction.
These results are formalized by the following theorem.
Theorem 2. Consider a dynamical causal network G u , such that V t = {Y t } is a DOC element that is a disjunction of k conditions, each of which is a conjunction of n j inputs, and V t−1 = {{V i,j,t−1 } n j i=1 } k j=1 is the set of its n = ∑ j n j inputs. Let c j denote the number of inputs of conjunction j contained in an occurrence x t−1 ⊆ v t−1 . For a transition v t−1 ≺ v t , the following holds: 1. If y t = 1, (a) the actual cause of {Y t = 1} is an occurrence x t−1 ⊆ v t−1 such that min(x t−1 ) = 1 and |x t−1 | = c j = n j for a single satisfied conjunction j; and (b) if min(x t−1 ) = 1 and x t−1 does not over-determine y t (see text), then the actual effect of {X t−1 = x t−1 } is {Y t = 1}; otherwise, x t−1 is reducible.
2. If y t = 0, (a) the actual cause of {Y t = 0} is an occurrence x t−1 ⊆ v t−1 such that max(x t−1 ) = 0 and c j = 1 ∀ j; and (b) if max(x t−1 ) = 0 and c j ≤ 1 ∀ j, then the actual effect of {X t−1 = x t−1 } is {Y t = 0}. Proof. See Appendix C.
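Clause 2(a) can be checked by enumeration for the example (A ∧ B) ∨ C: the minimal sets of '0' nodes that force the output to '0' contain exactly one node from each conjunction. This brute-force check verifies minimal sufficiency only, not the full α computation.

```python
from itertools import product, combinations

doc = lambda a, b, c: int((a and b) or c)  # (A AND B) OR C, as in Figure 9

def forces_zero(fixed):
    """Do the variables in `fixed`, clamped to 0, force the output to 0
    for every setting of the remaining inputs?"""
    free = [v for v in 'ABC' if v not in fixed]
    for fill in product([0, 1], repeat=len(free)):
        s = dict.fromkeys(fixed, 0)
        s.update(zip(free, fill))
        if doc(s['A'], s['B'], s['C']):
            return False
    return True

subsets = [frozenset(c) for r in range(4) for c in combinations('ABC', r)]
# Minimal forcing sets: one '0' node per conjunction ({A,C} and {B,C} here).
minimal_zero = [s for s in subsets if forces_zero(s)
                and not any(forces_zero(t) for t in subsets if t < s)]
```

The two minimal forcing sets each pick one node from the conjunction {A, B} and the single-node conjunction {C}, matching c j = 1 ∀ j in clause 2(a).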

Complicated Voting
As has already been demonstrated in the examples in Figure 7C,D, the proposed causal analysis is not restricted to linear update functions or combinations thereof. Figure 10 depicts an example transition featuring a complicated, non-linear update function. This specific example is taken from [12,21]: If A and B agree, F takes their value; if B, C, D, and E agree, F takes A's value; otherwise, the majority decides. The transition of interest is shown in Figure 10 (compare the analysis in [21]).

Non-Binary Variables
To demonstrate the utility of our proposed framework in the case of non-binary variables, we consider a voting scenario with three possible candidates ("1", "2", and "3"), as originally suggested by [48]. Let us assume that there are seven voters, five of which vote in favor of candidate "1", and the remaining two vote in favor of candidate "2"; therefore, candidate "1" wins ( Figure 11). This corresponds to the transition v t−1 = {ABCDEFG = 1111122} ≺ v t = {W = 1}. A simple majority is sufficient for any candidate to win. The winner is indicated by {W = 1/2/3}, respectively. Throughout, we assume that no candidate wins in case of a tie for the maximum number of votes, in which case {W = 0}.
If there were only two candidates, this example would reduce to a simple linear threshold unit with n = 7 inputs and threshold k = 4. To recall, according to Theorem 1, one out of all minimally sufficient sets of 4 voters in favor of candidate "1" would be chosen as the actual cause of {W = 1} for such a binary LTU; which one remains undetermined. However, the fact that there are three candidates changes the mechanistic nature of the example, as the number of votes necessary for winning now depends on the particular input state. While four votes are always sufficient to win, three votes suffice if the other two candidates each receive two votes.
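The state-dependence of the winning margin can be written out directly. The sketch below encodes the plurality rule with the tie convention {W = 0} described above and checks the two claims: four votes always suffice, while three votes suffice only for some completions of the ballot.

```python
from itertools import product
from collections import Counter

def winner(votes):
    """Plurality among candidates 1-3; 0 encodes a tie for the maximum."""
    counts = Counter(votes)
    top = max(counts.values())
    leaders = [c for c, n in counts.items() if n == top]
    return leaders[0] if len(leaders) == 1 else 0

actual = winner((1, 1, 1, 1, 1, 2, 2))  # the transition of Figure 11
# Four votes for candidate 1 win against every completion of the ballot:
four_suffice = all(winner((1, 1, 1, 1) + rest) == 1
                   for rest in product((1, 2, 3), repeat=3))
# Three votes win only for some completions, e.g., a 2-2 split of the rest:
split_case = winner((1, 1, 1, 2, 2, 3, 3))
tie_case = winner((1, 1, 1, 2, 2, 2, 3))
```

Here, three votes for candidate "1" win when the remaining four votes split 2-2, but not when a rival also gathers three votes, which is exactly why the number of votes necessary for winning depends on the particular input state.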

Noise and Probabilistic Variables
The examples, so far, have involved deterministic update functions. Probabilistic accounts of causation are closely related to counterfactual accounts [10]. Nevertheless, certain problem cases only arise in probabilistic settings (e.g., that of Figure 12B). The present causal analysis can be applied equally to probabilistic and deterministic causal networks, as long as the system's transition probabilities satisfy conditional independence (Equation (1)). No separate, probabilistic calculus for actual causation is required. In the simplest case, where noise is added to a deterministic transition v t−1 ≺ v t , the noise will generally decrease the strength of the causal links in the transition (Figure 12). For the transition shown in Figure 12B, the result of the causal analysis is that there are no integrated causal links within the transition. We have that {A = 1} decreases the probability of {N = 0}, and vice versa, which leads to α c/e < 0. Consequently, α max c/e = 0, as specified by the empty set. One interpretation is that the actual cause of {N = 0} must lie outside of the system, such as a missing latent variable. Another interpretation is that the actual cause for {N = 0} is genuine 'physical noise'; for example, within an element or connection. In any case, the proposed account of actual causation is sufficiently general to cover both deterministic, as well as probabilistic, systems.
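The sign argument can be verified with a minimal probabilistic model in which N copies its input A with probability 1 − ε; this parametrization of Figure 12B is assumed for illustration.

```python
from math import log2

eps = 0.1  # assumed noise level: N copies A with probability 1 - eps

def p_n(n, a):
    """P(N_t = n | A_{t-1} = a) for the noisy-copy mechanism."""
    return 1 - eps if n == a else eps

# Observed transition: {A = 1} followed by {N = 0}.
p_constrained = p_n(0, 1)                       # 0.1
p_unconstrained = (p_n(0, 0) + p_n(0, 1)) / 2   # A marginalized: 0.5
rho_c = log2(p_constrained / p_unconstrained)   # < 0: {A=1} lowers P({N=0})
alpha_max_c = max(0.0, rho_c)  # shorthand: the empty set bounds alpha_max by 0
```

Since ρ c is negative, no candidate cause within the system can be irreducible, and α max falls back to the empty set's value of 0, matching the analysis above.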

Simple Classifier
As a final example, we consider a transition with a multi-variate v t : The three variables A, B, and C provide input to three different "detectors", the nodes D, S, and L. D is a "dot-detector": It outputs '1' if exactly one of the 3 inputs is in state '1'; S is a "segment-detector": It outputs '1' for input states {ABC = 110} and {ABC = 011}; and L detects lines-that is, {ABC = 111}. Figure 13 shows the causal account of the transition {ABC = 001} ≺ {DSL = 100}. In addition to the first-order occurrences, all high-order occurrences y t are irreducible, each having their own actual cause above those of their parts. The actual cause identified for these high-order occurrences can be interpreted as the "strongest" shared cause of nodes in the occurrence; for example, {B = 0} ← {DS = 10}. While only the occurrence {ABC = 001} is sufficient to determine {DS = 10}, this candidate causal link is reducible, because {DS = 10} does not constrain the past state of ABC any more than {D = 1} by itself. In fact, the occurrence {S = 0} does not constrain the past state of AC at all. Thus, {ABC = 001} and all other candidate causes of {DS = 10} that include these nodes are either reducible (because their causal link can be partitioned with α max c = 0) or excluded (because there is a subset of nodes whose causal strength is at least as high). In this example, {B = 0} is the only irreducible shared cause of {D = 1} and {S = 0}, and, thus, is also the actual cause of {DS = 10}.

Figure 13. Simple classifier. D is a "dot-detector", S is a "segment-detector", and L is a "line-detector" (see text).
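The reducibility argument for {DS = 10} can be checked by enumerating the detector mechanisms. The sketch assumes the detector definitions given in the text and uniform marginalization over input states; it compares set-based posteriors, which suffices to show that {S = 0} adds no constraint here.

```python
from itertools import product

def detectors(a, b, c):
    """The three detectors of Figure 13."""
    d = int((a + b + c) == 1)                     # dot: exactly one '1'
    s = int((a, b, c) in ((1, 1, 0), (0, 1, 1)))  # segment
    l = int((a, b, c) == (1, 1, 1))               # line
    return (d, s, l)

states = list(product([0, 1], repeat=3))

def posterior(pred):
    """Uniform posterior over past input states, given a detector condition."""
    sel = [s for s in states if pred(detectors(*s))]
    return {s: 1 / len(sel) for s in sel}

transition_output = detectors(0, 0, 1)                # (1, 0, 0), i.e., DSL=100
given_d = posterior(lambda f: f[0] == 1)              # constraint of {D = 1}
given_ds = posterior(lambda f: f[:2] == (1, 0))       # constraint of {DS = 10}
```

The posterior over past states given {DS = 10} coincides with the posterior given {D = 1} alone, and in both cases {B = 0} holds with probability 2/3, consistent with {B = 0} being the irreducible shared constraint.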

Discussion
In this article, we presented a principled, comprehensive formalism to assess actual causation within a given dynamical causal network G u . For a transition v t−1 ≺ v t in G u , the proposed framework provides a complete causal account of all causal links between occurrences at t − 1 and t of the transition, based on five principles: Realization, composition, information, integration, and exclusion. In what follows, we review specific features and limitations of our approach, discuss how the results relate to intuitive notions about actual causation and causal explanation, and highlight some of the main differences with previous proposals aimed at operationalizing the notion of actual causation. First, our framework considers all counterfactual states, rather than a single contingency, which makes it possible to assess the strength of causal links. Second, it distinguishes between actual causes and actual effects, which are considered separately. Third, it allows for causal composition, in the sense that first- and high-order occurrences can have their own causes and effects within the same transition, as long as they are irreducible. Fourth, it provides a rigorous treatment of causal overdetermination. As demonstrated in the results section, the proposed formalism is generally applicable to a vast range of physical systems, whether deterministic or probabilistic, with binary or multi-valued variables, feedforward or recurrent architectures, as well as narrative examples; as long as they can be represented as a causal network with an explicit temporal order.

Testing All Possible Counterfactuals with Equal Probability
In the simplest case, counterfactual approaches to actual causation are based on the "but-for" test [12]: C = c is a cause of E = e if C = ¬c implies E = ¬e ("but for c, e would not have happened"). In multi-variate causal networks, this condition is typically dependent on the remaining variables W. What differs among current counterfactual approaches are the permissible contingencies (W = w) under which the "but-for" test is applied (e.g., [8,11,19-21,30,31]). Moreover, if there is one permissible contingency (counterfactual state) {¬c, w} that implies E = ¬e, then c is identified as a cause of e in an "all-or-nothing" manner. In summary, current approaches test for counterfactual dependence under a fixed contingency W = w, evaluating a particular counterfactual state C = ¬c. This holds true, even for recently-proposed extensions of contingency-based accounts of actual causation to probabilistic causal models [49,50] (see, however, [51] for an alternative approach, based on CP-logic).
Our starting point is a realization of a dynamical causal network G u , which is a transition v t−1 ≺ v t that is compatible with G u 's transition probabilities (p u (v t |v t−1 ) > 0) given the fixed background conditions U = u ( Figure 14A). However, we employ causal marginalization, instead of fixed W = w and C = ¬c, within the transition. This means that we replace these variables with an average over all their possible states (see Equation (2)).
Applied to variables outside of the candidate causal link (see Figure 14B), causal marginalization serves to remove the influence of these variables on the causal dependency between the occurrence and its candidate cause (or effect), which is, thus, evaluated based on its own merits. The difference between marginalizing the variables outside the causal link of interest and treating them as fixed contingencies becomes apparent in the case of the XOR ("exclusive OR") mechanism in Figure 14 (or, equivalently, the bi-conditional (XNOR) in Figure 7C). With the input B fixed in a particular state ('0' or '1'), the state of the XOR will completely depend on the state of A. However, the state of A alone does not determine the state of the XOR at all if B is marginalized. The latter better captures the mechanistic nature of the XOR, which requires a difference in A and B to switch on ('1').
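The XOR contrast between fixed contingencies and causal marginalization is easy to verify: with B fixed, A fully determines the output, whereas with B marginalized, A carries no information about the output. This small sketch assumes a plain XOR mechanism with uniform marginalization over B.

```python
xor = lambda a, b: a ^ b  # XOR mechanism of Figure 14

# With B fixed as a contingency, A fully determines the XOR output:
dependence_fixed = {b: {a: xor(a, b) for a in (0, 1)} for b in (0, 1)}

# With B causally marginalized (uniform), A is uninformative about the output:
def p_out1_given_a(a):
    return sum(xor(a, b) for b in (0, 1)) / 2

p_given_a0 = p_out1_given_a(0)
p_given_a1 = p_out1_given_a(1)
```

Both conditional probabilities equal 0.5, reflecting that the XOR requires a difference between A and B to switch on, a property that only marginalization (not a fixed contingency) makes visible.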
We also marginalize across all possible states of C, in order to determine whether e counterfactually depends on c. Instead of identifying one particular C = ¬c for which E = ¬e, all of C's states are equally taken into account. The notion that counterfactual dependence is an "all-or-nothing concept" [12] becomes problematic; for example, if non-binary variables are considered, and also in non-deterministic settings. By contrast, our proposed approach, which considers all possible states of C, naturally extends to the case of multi-valued variables and probabilistic causal networks. Moreover, it has the additional benefit that we can quantify the strength of the causal link between an occurrence and its actual cause (effect). In the present framework, having positive effect information ρ e (x t−1 , y t ) > 0 is necessary, but not sufficient, for x t−1 → y t , and the same for positive cause information ρ c (x t−1 , y t ) > 0. Taken together, we argue that causal marginalization-that is, averaging over contingencies and all possible counterfactuals of an occurrence-reveals the mechanisms underlying the transition. By contrast, fixing relevant variables to any one specific state largely ignores them. This is because a mechanism is only fully described by all of its transition probabilities, for all possible input states (Equation (1)). For example, the bi-conditional E (in Figure 7C) only differs from the conjunction D (in Figure 7B) for the input state AB = 00. Once the underlying mechanisms are specified, based on all possible transition probabilities, causal interactions can be quantified in probabilistic terms [25,32], even within a single transition v t−1 ≺ v t (i.e., in the context of actual causation [33,52]). However, this also means that all transition probabilities have to be known for the proposed causal analysis, even for states that are not typically observed (see also [25,32,34,42]).
Finally, in our analysis, all possible past states are weighted equally in the causal marginalization. Related measures of information flow in causal networks [32], causal influence [34], and causal information [33] consider weights based on a distribution of p(v t−1 ); for example, the stationary distribution, observed probabilities, or a maximum entropy distribution (equivalent to weighting all states equally). Janzing et al. [34], for example, proposed to quantify the "factual" direct causal influence across a set of edges in a causal network by "cutting" those edges, and comparing the joint distribution before and after the cut. Their approach is very similar to our notion of partitioning. However, instead of weighting all states equally in the marginalization, they marginalized each variable according to its probabilities in the joint distribution, which typically depend on the long-term dynamics of the system (and, thus, on other mechanisms within the network than the ones directly affected by the cut), as well as the state in which the system was initialized. While this makes sense for a measure of expected causal strength, in the context of actual causation the prior probabilities of occurrences at t − 1 are extraneous to the question "what caused what?" All that matters is what actually happened, the transition v t−1 ≺ v t , and the underlying mechanisms. How likely v t−1 was to occur should not influence the causes and effects within the transition, nor how strong the causal links are between actual occurrences at t − 1 and t. In other words, the same transition, involving the same mechanisms and background conditions, should always result in the same causal account. Take, for instance, a set of nodes A, B that output to C, which is a deterministic OR-gate. 
If C receives no further inputs from other nodes, then whenever {AB = 11} and {C = 1}, the causal links, their strength, and the causal account of the transition {AB = 11} ≺ {C = 1} should be the same as in Figure 7A ("Disjunction"). Which larger system the set of nodes was embedded in, or what the probability was for the transition to happen in the first place, according to the equilibrium, observed, or any other distribution, is not relevant in this context. Let us assume, for example, that {A = 1} was much more likely to occur than {B = 1}. This bias in prior probability does not change the fact that, mechanistically, {A = 1} and {B = 1} have the same effect on {C = 1}, and are equivalent causes.
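As an illustration, the interventional effect repertoire of the OR-gate example can be computed by brute force. The following Python sketch (the function `effect_prob` and its names are ours, not part of the formal framework) marginalizes all unconstrained inputs with equal weights:

```python
from itertools import product

def or_gate(a, b):
    # Deterministic OR mechanism: C = A OR B.
    return int(a or b)

def effect_prob(fixed):
    """Interventional probability pi(C=1 | fixed inputs), marginalizing all
    unconstrained inputs with equal weights (causal marginalization)."""
    states = [dict(zip("ab", s)) for s in product((0, 1), repeat=2)]
    kept = [s for s in states if all(s[k] == v for k, v in fixed.items())]
    return sum(or_gate(s["a"], s["b"]) for s in kept) / len(kept)

# Mechanistically, {A=1} and {B=1} constrain pi(C=1) identically; any prior
# bias toward {A=1} occurring plays no role in this comparison.
p_given_a1 = effect_prob({"a": 1})   # 1.0
p_given_b1 = effect_prob({"b": 1})   # 1.0
p_unconstrained = effect_prob({})    # 0.75
```

Both {A = 1} and {B = 1} fix π(C = 1) = 1, while the unconstrained probability is 3/4; a biased prior over A would change none of these interventional quantities.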

Distinguishing Actual Effects and Actual Causes
An implicit assumption, commonly made about (actual) causation, is that the relation between cause and effect is bidirectional: If occurrence C = c had an effect on occurrence E = e, then c is assumed to be a cause of e [8,11,[19][20][21]30,31,49,50]. As demonstrated throughout the Results section, however, this conflation of causes and effects is untenable once multi-variate transitions v t−1 ≺ v t are considered (see also Section 4.3 below). An asymmetry between causes and effects arises simply because the set of variables that is affected by an occurrence x t−1 ⊆ v t−1 typically differs from the set of variables that affects an occurrence y t ⊆ v t , as in the toy classifier example of Figure 13. Accordingly, we propose that a comprehensive causal understanding of a given transition is provided by its complete causal account C (Definition 4), including both actual effects and actual causes. Actual effects are identified from the perspective of occurrences at t − 1, whereas actual causes are identified from the perspective of occurrences at t. This means that the causal principles of composition, integration, and exclusion are also applied from these two perspectives. When we evaluate causal links of the form x t−1 → y t , any occurrence x t−1 may have one actual effect y t ⊆ v t if x t−1 is irreducible (α max e (x t−1 ) > 0) (Definition 2). When we evaluate causal links of the form x t−1 ← y t , any occurrence y t may have one actual cause x t−1 ⊆ v t−1 if y t is irreducible (α max c (y t ) > 0) (Definition 1). As seen in the first example (Figure 6), there may be a high-order causal link in one direction, while the reverse link is reducible.
As mentioned in the Introduction and exemplified in the Results, our approach has a more general scope, but is still compatible with the traditional view of actual causation, concerned only with actual causes of singleton occurrences. Nevertheless, even in the limited setting of a singleton v t , considering both causes and effects may be illuminating. Consider, for example, the transition shown in Figure 9A: By itself, the occurrence {A = 1} raises the probability of {D = 1} (ρ e (x t−1 , y t ) = α e (x t−1 , y t ) > 0), which is a common determinant of being a cause in probabilistic accounts of (actual) causation [13,14,53,54]. (Note, though, that Pearl initially proposed maximizing the posterior probability p(c | e) as a means of identifying the best ("most probable") explanation for an occurrence e ( [16]; Chapter 5). However, without a notion of irreducibility, as applied in the present framework, explanations based on p(c | e) tend to include irrelevant variables [29,55].) Even in deterministic systems with multi-variate dependencies, however, the fact that an occurrence c, by itself, raises the probability of an occurrence e does not necessarily determine that E = e will actually occur [10].
In summary, an actual effect x t−1 → y t does not imply the corresponding actual cause x t−1 ← y t , and vice versa. Including both directions in the causal account may, thus, provide a more comprehensive explanation of "what happened" in terms of "what caused what".

Composition
The proposed framework of actual causation explicitly acknowledges that there may be high-order occurrences which have genuine actual causes or actual effects. While multi-variate dependencies play an important role in complex distributed systems [4,5,56], they are largely ignored in the actual causation literature.
From a strictly informational perspective focused on predicting y t from x t−1 , one might be tempted to disregard such compositional occurrences and their actual effects, since they do not add predictive power. For instance, the actual effect of {AB = 11} in the conjunction example of Figure 7B is informationally redundant, since {D = 1} can be inferred (predicted) from {A = 1} and {B = 1} alone. From a causal perspective, however, such compositional causal links specify mechanistic constraints that would not be captured otherwise. It is these mechanistic constraints, and not predictive powers, that provide an explanation for "what happened" in the various transitions shown in Figure 7, by revealing "what caused what". To illustrate this, with respect to both actual causes and actual effects, we can extend the XNOR example of Figure 7C to a "double bi-conditional" and consider the transition shown in Figure 15. In the figure, both D and E are XNOR nodes that share one of their inputs (node B), and {AB = 11} ← {D = 1} and {BC = 11} ← {E = 1}. As illustrated by the cause repertoires shown in Figure 15B, and in accordance with D's and E's logic function (mechanism), the actual cause of {D = 1} can be described as the fact that A and B were in the same state, and the actual cause of {E = 1} as the fact that B and C were in the same state. In addition to these first-order occurrences, the second-order occurrence {DE = 11} also has an actual cause, {ABC = 111}, which can be described as the fact that all three nodes A, B, and C were in the same state. Crucially, this fact is not captured by the actual causes of the first-order occurrences alone. In summary, high-order occurrences capture multi-variate mechanistic dependencies between the occurrence variables that are not revealed by the actual causes and effects of their parts. Moreover, a high-order occurrence does not exclude lower-order occurrences over their parts, which specify their own actual causes and effects.
In this way, the composition principle makes explicit that high-order and first-order occurrences all contribute to the explanatory power of the causal account.
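The constraint structure behind the double bi-conditional can be checked by enumerating past states. A small Python sketch (helper names ours, not part of the formal framework):

```python
from itertools import product

def xnor(x, y):
    # Bi-conditional (XNOR) mechanism: output 1 iff both inputs match.
    return int(x == y)

states = list(product((0, 1), repeat=3))  # all past states (a, b, c)

def compatible(constraint):
    """Past states (a, b, c) consistent with a constraint on the outputs
    D = XNOR(A, B) and E = XNOR(B, C)."""
    return [s for s in states if constraint(xnor(s[0], s[1]), xnor(s[1], s[2]))]

cause_d  = compatible(lambda d, e: d == 1)             # "A equals B"
cause_e  = compatible(lambda d, e: e == 1)             # "B equals C"
cause_de = compatible(lambda d, e: d == 1 and e == 1)  # "A, B, C all equal"
```

Only {ABC = 000} and {ABC = 111} are compatible with {DE = 11}; that is, the second-order occurrence constrains all three inputs to the same state, which neither first-order cause repertoire does on its own.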

Integration
As discussed above, high-order occurrences can have actual causes and effects, but only if they are irreducible to their parts. This is illustrated in Figure 16, in which a transition equivalent to our initial example in Figure 6 ( Figure 16A) is compared against a similar, but reducible transition ( Figure 16C) in a different causal network. The two situations differ mechanistically: The OR and AND gates in Figure 16A receive common inputs from the same two nodes, while the OR and AND in Figure 16C have independent sets of inputs. Nevertheless, the actual causes and effects of all single-variable occurrences are identical in the two cases. In both transitions, {OR = 1} is caused by its one input in state '1', and {AND = 0} is caused by its one input in state '0'. What distinguishes the two causal accounts is the additional causal link, in Figure 16A, between the second-order occurrence {(OR,AND) = 10} and its actual cause {AB = 10}. Furthermore, {(OR,AND) = 10} raises the probability of both {AB = 10} (in Figure 16A) and {AD = 10} (in Figure 16C), compared to their unconstrained probability π = 0.25 and, thus, ρ c (x t−1 , y t ) > 0 in both cases. Yet, only {AB = 10} ← {(OR,AND) = 10} in Figure 16A is irreducible to its parts. This is shown by partitioning across the MIP with α c (x t−1 , y t ) = 0.17. This second-order occurrence, thus, specifies that the OR and AND gates in Figure 16A receive common inputs-a fact that would, otherwise, remain undetected.
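These numbers can be reproduced by brute force. The sketch below (helper names ours) assumes, per the figure, that the relevant partition pairs {OR = 1} with its input A and {AND = 0} with its input B; it recovers α c ≈ 0.17 for the shared-input network and α c = 0 for the independent-input network:

```python
from itertools import product
from math import log2

def cond_prob(n_inputs, event, evidence):
    """pi(event | evidence) under a uniform interventional distribution over
    the n_inputs past variables (causal marginalization)."""
    states = [s for s in product((0, 1), repeat=n_inputs) if evidence(s)]
    return sum(event(s) for s in states) / len(states)

# Shared inputs (as in Figure 16A): OR and AND both read (A, B).
orf, andf = (lambda s: s[0] | s[1]), (lambda s: s[0] & s[1])
whole = cond_prob(2, lambda s: s == (1, 0),
                  lambda s: orf(s) == 1 and andf(s) == 0)            # 1/2
parts = (cond_prob(2, lambda s: s[0] == 1, lambda s: orf(s) == 1) *
         cond_prob(2, lambda s: s[1] == 0, lambda s: andf(s) == 0))  # 4/9
alpha_shared = log2(whole / parts)   # log2(9/8) ~ 0.17 bits

# Independent inputs (as in Figure 16C): OR reads (A, B), AND reads (C, D).
or2, and2 = (lambda s: s[0] | s[1]), (lambda s: s[2] & s[3])
whole_i = cond_prob(4, lambda s: (s[0], s[3]) == (1, 0),
                    lambda s: or2(s) == 1 and and2(s) == 0)          # 4/9
parts_i = (cond_prob(4, lambda s: s[0] == 1, lambda s: or2(s) == 1) *
           cond_prob(4, lambda s: s[3] == 0, lambda s: and2(s) == 0))
alpha_indep = log2(whole_i / parts_i)  # 0: the joint repertoire factorizes
```

In the independent-input case the joint cause repertoire already equals the product of its parts, so the partition makes no difference and the second-order link is reducible.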
As described in Appendix A, using the measure A(v t−1 ≺ v t ), we can also quantify the extent to which the entire causal account C of a transition v t−1 ≺ v t is irreducible. The case where A(v t−1 ≺ v t ) = 0 indicates that v t−1 ≺ v t can either be decomposed into multiple transitions without causal links between them (e.g., Figure 16C), or includes variables without any causal role in the transition (e.g., Figure 7D).

Exclusion
That an occurrence can affect several variables (high-order effect), and that the cause of an occurrence can involve several variables (high-order cause), is uncontroversial [57]. Nevertheless, the possibility of multi-variate causes and effects is rarely addressed in a rigorous manner. Instead of one high-order occurrence, contingency-based approaches to actual causation typically identify multiple first-order occurrences as separate causes in these cases. This is because some approaches only allow for first-order causes by definition (e.g., [11]), while other accounts include a minimality clause that does not consider causal strength and, thus, excludes virtually all high-order occurrences in practice (e.g., [20]; but see [21]). Take the example of a simple conjunction AND = A ∧ B in the transition {AB = 11} ≺ {AND = 1} (see Figures 7B and 17). To our knowledge, all contingency-based approaches regard the first-order occurrences {A = 1} and {B = 1} as two separate causes of {AND = 1} in this case (but see [58]); while we identify the second-order occurrence {AB = 11} (the conjunction) as the one actual cause, with maximal causal strength α max c (y t ). Given a particular occurrence x t−1 in the transition v t−1 ≺ v t , we explicitly consider the whole power set of v t as candidate effects of x t−1 , and the whole power set of v t−1 as candidate causes of a particular occurrence y t (see Figure 17). However, the possibility of genuine multi-variate actual causes and effects requires a principled treatment of causal over-determination. While most approaches to actual causation generally allow both {A = 1} and {B = 1} to be actual causes of {AND = 1}, this seemingly-innocent violation of the causal exclusion principle becomes prohibitive once {A = 1}, {B = 1}, and {AB = 11} are recognized as candidate causes. In this case, either {AB = 11} was the actual cause, or {A = 1}, or {B = 1}. Allowing for any combination of these occurrences, however, would be illogical.
Within our framework, any occurrence can, thus, have, at most, one actual cause (or effect) within a transition: the minimal occurrence with maximal causal strength α max (Figure 17). Finally, cases of true mechanistic over-determination, due to symmetries in the causal network, are resolved by leaving the actual cause (effect) undetermined between all x * (y t ) with α max c (see Definitions 1 and 2). In this way, the causal account provides a complete picture of the actual mechanistic constraints within a given transition.
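A minimal brute-force sketch in Python (helper names ours) makes the comparison explicit. For the first-order effect {AND = 1}, the only partition severs the whole link, so causal strength reduces to informativeness:

```python
from itertools import product
from math import log2

# Candidate causes of {AND = 1} in the transition {AB = 11} -> {AND = 1}.
past = [s for s in product((0, 1), repeat=2) if (s[0] & s[1]) == 1]  # [(1, 1)]

def strength(occurrence):
    """Causal strength of a candidate cause. For the first-order effect
    {AND = 1} the only partition severs the whole link, so the strength is
    log2(pi(x | AND=1) / pi(x)), with pi(x) the unconstrained probability."""
    p_cond = sum(1 for s in past
                 if all(s[i] == v for i, v in occurrence)) / len(past)
    return log2(p_cond / 2 ** -len(occurrence))

candidates = {"A=1": [(0, 1)], "B=1": [(1, 1)], "AB=11": [(0, 1), (1, 1)]}
strengths = {name: strength(occ) for name, occ in candidates.items()}
# {AB=11} wins with 2 bits; {A=1} and {B=1} reach only 1 bit each.
```

{AB = 11} specifies 2 bits, its parts 1 bit each, so exclusion selects the second-order occurrence as the one actual cause.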

Intended Scope and Limitations
The objective of many existing approaches to actual causation is to provide an account of people's intuitive causal judgments [12,51]. For this reason, the literature on actual causation is largely rooted in examples involving situational narratives, such as "Billy and Suzy throw rocks at a bottle" [7,12], which are then compressed into a causal model to be investigated. Such narratives can serve as intuition pumps, but can also lead to confusion if important aspects of the story are omitted in the causal model applied to the example [9][10][11].
Our objective is to provide a principled, quantitative causal account of "what caused what" within a fully-specified (complete) model of a physical system of interacting elements. We purposely set aside issues regarding model selection or incomplete causal knowledge, in order to formulate a rigorous theoretical framework applicable to any pre-determined dynamical causal network [12,36]. This puts the explanatory burden on the formal framework of actual causation, rather than on the adequacy of the model. In this setting, causal models should always be interpreted mechanistically, and time is explicitly taken into account. Rather than capturing intuition, the emphasis is put on explanatory power and consistency (see, also, [10]). With a proper formalism in place, future work should address to what extent and under which conditions the identified actual causes and effects generalize across possible levels of description (macro versus micro causes and effects), or under incomplete knowledge (see, also, [38,39]). While the proposed theoretical framework assumes idealized conditions, and an exhaustive causal analysis is only feasible in rather small systems, a firm theoretical basis should facilitate the development of consistent empirical approximations for assessing actual causation in practice (see, also, [7,34]).
In addition, the examples examined in this study have been limited to direct causes and effects within transitions v t−1 ≺ v t across a single system update. The explanatory power of the proposed framework was illustrated in several examples, which included paradigmatic problem cases involving overdetermination and prevention. Yet, some prominent examples that raise issues of "pre-emption" or "causation by omission" have no direct equivalent in these basic types of physical causal models. While the approach can, in principle, identify and quantify counterfactual dependencies across k > 1 time steps, by replacing the transition probabilities in Equation (1) with those across k updates, for the purpose of tracing a causal chain back in time [58], the role of intermediary occurrences remains to be investigated. Nevertheless, the present framework is unique in providing a general, quantitative, and principled approach to actual causation that naturally extends beyond simple, binary, and deterministic example cases, to all mechanistic systems that can be represented by a set of transition probabilities (as specified in Equation (1)).

Accountability and Causal Responsibility
This work presents a step towards a quantitative causal understanding of "what is happening" in systems such as natural or artificial neural networks, computers, and other discrete, distributed dynamical systems. Such causal knowledge can be invaluable, for example, to identify the reasons for an erroneous classification by a convolutional neural network [59], or the source of a protocol violation in a computer network [60]. A notion of multi-variate actual causes and effects, in particular, is crucial for addressing questions of accountability, or sources of network failures [12] in distributed systems. A better understanding of the actual causal links that govern system transitions should also improve our ability to effectively control the dynamical evolution of such systems and to identify adverse system states that would lead to unwanted system behaviors.
Finally, a principled approach to actual causation in neural networks may illuminate the causes of an agent's actions or decisions (biological or artificial) [61][62][63], including the causal origin of voluntary actions [64]. However, addressing the question "who caused what?", as opposed to "what caused what", implies modeling an agent with intrinsic causal power and intention [60,65]. Future work will extend the present mechanistic framework for "extrinsic" actual causation with a mechanistic account of "intrinsic" actual causation in autonomous agents [25,66].

Conclusions
We have presented a principled, comprehensive formalism to assess actual causation within a given dynamical causal network G u , which can be interpreted as consecutive time steps of a discrete dynamical system (feed-forward or recurrent). Based on five principles adopted from integrated information theory (IIT) [25,27]-realization, composition, information, integration, and exclusion-the proposed framework provides a quantitative causal account of all causal links between occurrences (including multi-variate dependencies) for a transition v t−1 ≺ v t in G u .
The strength of a causal link between an occurrence and its actual cause (or effect) is evaluated in informational terms, comparing interventional probabilities before and after a partition of the causal link, which replaces the state of each partitioned variable with an average across all its possible states (causal marginalization). Additionally, the remaining variables in G u but outside the causal link are causally marginalized. Rather than a single contingency, all counterfactual states are, thus, taken into account in the causal analysis. In this way, our framework naturally extends from deterministic to probabilistic causal networks, and also from binary to multi-valued variables, as exemplified above.
The generality of the proposed framework, moreover, makes it possible to derive analytical results for specific classes of causal networks, as demonstrated here for the case of linear threshold units and disjunctions of conjunctions. In the absence of analytical results, the actual cause (or effect) of an occurrence within G u can be determined based on an exhaustive search. Software to evaluate the causal account of simple binary networks (deterministic and probabilistic) is available within the PyPhi software package [44]. While approximations will have to be developed in order to apply our framework to larger systems and empirical settings, our objective here was to lay the theoretical foundation for a general approach to actual causation that allows moving beyond intuitive toy examples to scientific problems where intuition is lacking, such as understanding actual causation in biological or artificial neural networks.

Acknowledgments: We thank Matteo Mainetti for early discussions concerning the extension of IIT to actual causation.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A. Irreducibility of the Causal Account
Similar to the notion of system-level integration in integrated information theory (IIT) [25,26], the principle of integration can also be applied to the causal account as a whole, not only to individual causal links. The causal account of a particular transition v t−1 ≺ v t of the dynamical causal network G u is defined as the set of all causal links within the transition (Definition 4, main text).
In the following, we define the quantity A(v t−1 ≺ v t ), which measures to what extent the transition v t−1 ≺ v t is irreducible to its parts. Moreover, we introduce A e (v t−1 ≺ v t ), which measures the irreducibility of v t−1 and its set of "effect" causal links {x t−1 → y t } ∈ C(v t−1 ≺ v t ), and A c (v t−1 ≺ v t ), which measures the irreducibility of v t and its set of "cause" causal links {x t−1 ← y t } ∈ C(v t−1 ≺ v t ). In this way, we can:

• identify irrelevant variables within a causal account that do not contribute to any causal link (Figure A1A);
• evaluate how entangled the sets of causes and effects are within a transition v t−1 ≺ v t (Figure A1B); and
• compare A values between (sub-)transitions, in order to identify clusters of variables whose causes and effects are highly entangled, or only minimally connected (Figure A1C).
We can assess the irreducibility of v t−1 and its set of "effect" causal links {x t−1 → y t } ∈ C(v t−1 ≺ v t ) in parallel to α e (x t−1 , y t ), by testing all possible partitions Ψ(v t−1 , V t ) (Equation (7)). This means that the transition v t−1 ≺ v t is partitioned into independent parts, in the same manner that an occurrence x t−1 is partitioned when assessing α e (x t−1 , y t ). We then define the irreducibility of v t−1 as the difference in the total strength of actual effects (causal links of the form x t−1 → y t ) in the complete causal account C, compared to the causal account under the MIP, which, again, denotes the partition in Ψ(v t−1 , V t ) that makes the least difference to C:

A e (v t−1 ≺ v t ) = ∑ (x t−1 → y t ) ∈ C α e (x t−1 , y t ) − ∑ (x t−1 → y t ) ∈ C MIP α e (x t−1 , y t ).

In the same way, the irreducibility of v t and its set of causal links {x t−1 ← y t } ∈ C(v t−1 ≺ v t ) is defined as the difference in the total strength of actual causes (causal links of the form x t−1 ← y t ) in the causal account C, compared to the causal account under the MIP:

A c (v t−1 ≺ v t ) = ∑ (x t−1 ← y t ) ∈ C α c (x t−1 , y t ) − ∑ (x t−1 ← y t ) ∈ C MIP α c (x t−1 , y t ),

where the MIP is, again, the partition that makes the least difference out of all possible partitions Ψ(V t−1 , v t ) (Equation (9)). This means that the transition v t−1 ≺ v t is partitioned into independent parts in the same manner that an occurrence y t is partitioned when assessing α c (x t−1 , y t ).
The irreducibility of a single-variable v t−1 or v t reduces to α max e of its one actual effect y t , or α max c of its one actual cause x t−1 , respectively. By considering the union of both sets of possible partitions, we can moreover assess the overall irreducibility of the causal account as a whole, such that no causal link is left unaffected by the partition. Based on this notion, we define the irreducibility of a transition v t−1 ≺ v t as:

A(v t−1 ≺ v t ) = ∑ C α − ∑ C MIP α,

where ∑ C α is a summation over the strength of all causal links in the causal account C(v t−1 ≺ v t ), and the same for the partitioned causal account C MIP . Figure A1A shows the "Prevention" example of Figure 7D, main text, where {A = 1} has no effect and is not a cause in this transition. Replacing {A = 1} with an average over all its possible states does not make a difference to the causal account and, thus, A(v t−1 ≺ v t ) = 0 in this case. Figure A1B shows the causal account C MIP of a transition under its MIP. The irreducibility A(v t−1 ≺ v t ) provides a measure of how causally "entangled" the variables V are during the transition v t−1 ≺ v t . In a larger system, we can measure and compare the A values of multiple (sub-)transitions. In Figure A1C, for example, the causes and effects of the full transition are only weakly entangled (A = 0.03 bits), while the transitions involving the four upper or lower variables, respectively, are much more irreducible (A = 0.83 bits). In this way, A(v t−1 ≺ v t ) may be a useful quantity when evaluating more parsimonious causal explanations against the complete causal account of the full transition.

Appendix B. Supplementary Proof 1
The first theorem describes the actual causes and effects for an observation of a linear threshold unit (LTU) V t = {Y t } with n inputs and threshold k, and its inputs V t−1 . First, a series of lemmas is demonstrated, based on transition probabilities q c,j from an effect repertoire: If X t−1 = x t−1 ⊆ V t−1 = v t−1 is an occurrence with size |X t−1 | = c and j of the c elements in X t−1 are in the 'ON' state (∑ x∈x t−1 x = j), then, by Equation (3),

q c,j = π(y t | x t−1 ) = 2^{−(n−c)} ∑ i=max(0,k−j) ... n−c C(n − c, i),

where C(·, ·) denotes the binomial coefficient.
First, we demonstrate that the probabilities q c,j are non-decreasing as the number of 'ON' inputs j increases, for a fixed size of occurrence c, and that there is a specific range of values of j and c, such that the probabilities are strictly increasing.

Proof.
q c,c = ½ (q c+1,c + q c+1,c+1 ) (Lemma A2). Finally, we consider a quantity Q(c), the sum of q c,j over all possible states for an occurrence of size c. The value Q(c) acts as a normalization term when calculating the cause repertoire of the occurrence {Y t = 1}. Here, we demonstrate a relationship between these normalization terms across occurrences of different sizes: Q(c + 1) = 2 Q(c). Proof.
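The lemma properties of q c,j can be verified numerically. The following Python sketch (function names ours) computes q c,j directly from the binomial sum and checks monotonicity in j, the averaging identity, and the doubling of the normalization terms:

```python
from math import comb

def q(n, k, c, j):
    """LTU transition probability: P(Y_t = 1) when j of c fixed inputs are
    'ON' and the remaining n - c inputs are marginalized uniformly."""
    free = n - c
    return sum(comb(free, i) for i in range(max(0, k - j), free + 1)) / 2 ** free

def Q(n, k, c):
    # Normalization term: sum of q over all 2^c states of a size-c occurrence.
    return sum(comb(c, j) * q(n, k, c, j) for j in range(c + 1))

n, k = 5, 3
# q is non-decreasing in the number of 'ON' inputs j (cf. Lemma A1).
assert all(q(n, k, c, j) <= q(n, k, c, j + 1)
           for c in range(1, n + 1) for j in range(c))
# Marginalizing one extra input averages the two extended states (cf. Lemma A2).
assert all(q(n, k, c, c) == (q(n, k, c + 1, c) + q(n, k, c + 1, c + 1)) / 2
           for c in range(n))
# Normalization terms double with occurrence size: Q(c + 1) = 2 Q(c).
assert all(Q(n, k, c + 1) == 2 * Q(n, k, c) for c in range(n))
```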
Using the above lemmas, we are now in a position to prove the actual causes and actual effects in the causal account of a single LTU in the 'ON' state. The causal account for a LTU in the 'OFF' state follows, by symmetry.
Theorem A1. Consider a dynamical causal network G u , such that V t = {Y t } is a linear threshold unit with n inputs and threshold k ≤ n, and V t−1 is the set of n inputs to Y t . For a transition v t−1 ≺ v t , with y t = 1 and ∑ v t−1 ≥ k, the following hold:

1. The actual cause of {Y t = 1} is an occurrence {X t−1 = x t−1 }, such that |x t−1 | = k and min(x t−1 ) = 1, with causal strength α max c (y t ) = log 2 ( 2^k / Q(k) ).
2. The actual effect of {X t−1 = x t−1 } is {Y t = 1}, if min(x t−1 ) = 1 and |x t−1 | = c ≤ k; otherwise, x t−1 is reducible. Furthermore, the causal strength of the link is α e (x t−1 , y t ) = log 2 ( q c,c / q c−1,c−1 ).

Proof. Part 1:
Consider an occurrence {X t−1 = x t−1 }, such that |x t−1 | = c ≤ n and ∑ x∈x t−1 x = j. Then, the probability of x t−1 in the cause repertoire of y t is

π(x t−1 | y t ) = q c,j / Q(c).

As Y t is a first-order occurrence, there is only one possible partition, and the causal strength of a potential link is, thus,

α c (x t−1 , y t ) = log 2 ( π(x t−1 | y t ) / π(x t−1 ) ) = log 2 ( 2^c q c,j / Q(c) ).

For a fixed value of c, the maximum value of causal strength occurs at j = c (since adding 'ON' elements can only increase q c,j , by Lemma A1). Applying Lemmas A3 and A4, we see that, across different values of c, this maximum is increasing for 0 < c < k,

log 2 ( 2^{c+1} q c+1,c+1 Q(c) / (2^c q c,c Q(c + 1)) ) = log 2 ( q c+1,c+1 / q c,c ) > 0,

and that, for k ≤ c, the causal strength is constant, since q c+1,c+1 = q c,c = 1,

log 2 ( q c+1,c+1 / q c,c ) = 0.

By setting c = j ≥ k, we find that the maximum causal strength is

α c (x t−1 , y t ) = log 2 ( 2^c q c,j / Q(c) ) = log 2 ( 2^k / Q(k) ) = α max c (y t ).

Any occurrence x t−1 with j ≥ k thus has maximal causal strength and satisfies condition (1) for being an actual cause.
If c > k, then there exists a subset x′ t−1 ⊂ x t−1 with j′ ≥ k and c′ < c, such that x′ t−1 also satisfies condition (1) and, thus, x t−1 does not satisfy condition (2). However, if j = c = k, then any subset x′ t−1 of x t−1 has j′ < k and does not satisfy condition (1). Thus, x t−1 satisfies condition (2). Therefore, we have that the actual cause of y t is an occurrence x t−1 , such that |x t−1 | = k and min(x t−1 ) = 1,

x * (y t ) = {x t−1 ⊆ v t−1 | |x t−1 | = k and min(x t−1 ) = 1}.

Part 2: Again, consider occurrences X t−1 = x t−1 with |x t−1 | = c and ∑ x∈x t−1 x = j. The probability of y t in the effect repertoire of such an occurrence is

π(y t | x t−1 ) = q c,j .
As there is only one element in v t , the only question is whether or not x t−1 is reducible. If it is reducible, it has no actual effect; otherwise, its actual effect must be y t . First, if j < c, then there exists x = 0 ∈ x t−1 and we can define a partition ψ = {{(x t−1 − x), y t }, {x, ∅}}, such that

α e (x t−1 , y t ) ≤ log 2 ( π(y t | x t−1 ) / π(y t | x t−1 ) ψ ) = log 2 ( q c,j / q c−1,j ) ≤ 0 (Lemmas A1 and A2),

so x t−1 is reducible. Next, we consider the case where j = c but c > k. In this case, we define a partition ψ = {{(x t−1 − x), y t }, {x, ∅}} (where x ∈ x t−1 is any element), such that

π(y t | x t−1 ) ψ = π(y t | (x t−1 − x)) × π(∅ | x) = π(y t | (x t−1 − x)) = q c−1,c−1 ,

and, since c > k, q c,c = q c−1,c−1 = 1, so that

α e (x t−1 , y t ) ≤ log 2 ( q c,c / q c−1,c−1 ) = 0,

and so x t−1 is, again, reducible. Finally, we show that, for j = c and c ≤ k, x t−1 is irreducible with actual effect {Y t = 1}. All possible partitions of the pair of occurrences can be formulated as ψ = {{(x t−1 − x d ), y t }, {x d , ∅}}, where x d ⊆ x t−1 is a subset of size d ≥ 1, and

α e (x t−1 , y t ) = min d log 2 ( q c,c / q c−d,c−d ).

The minimum information partition occurs when d = 1 (by Lemma A3) and, thus, {X t−1 = x t−1 } is irreducible with actual effect {Y t = 1} and causal strength

α e (x t−1 , y t ) = log 2 ( q c,c / q c−1,c−1 ).
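Theorem A1, Part 1, can also be checked by brute force for a small LTU; by symmetry, a candidate cause is characterized by its size c and its number of 'ON' elements j. A Python sketch (names ours), using n = 4 and k = 2:

```python
from math import comb, log2

def q(n, k, c, j):
    # LTU transition probability (j of c fixed inputs 'ON', rest marginalized).
    free = n - c
    return sum(comb(free, i) for i in range(max(0, k - j), free + 1)) / 2 ** free

def Q(n, k, c):
    # Normalization term for size-c occurrences.
    return sum(comb(c, j) * q(n, k, c, j) for j in range(c + 1))

n, k = 4, 2
# Causal strength of every candidate cause (c, j) of {Y_t = 1}:
# alpha_c = log2(2^c q_{c,j} / Q(c)); zero-probability states are skipped.
alpha = {(c, j): log2(2 ** c * q(n, k, c, j) / Q(n, k, c))
         for c in range(1, n + 1) for j in range(c + 1)
         if q(n, k, c, j) > 0}
alpha_max = max(alpha.values())
winners = [cj for cj, a in alpha.items() if abs(a - alpha_max) < 1e-12]
```

All maximizers have j ≥ k; the minimal ones have c = j = k, matching the theorem.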

Appendix C. Supplementary Proof 2
The second theorem describes the actual causes and effects for an observation of a disjunction of conjunctions (DOC) V t = {Y t }, which is a disjunction of k conjunctions, each over n j elements, and its inputs V t−1 = {{V i,j,t−1 } n_j i=1 } k j=1 . The total number of inputs to the DOC element is n = ∑ k j=1 n j . We consider occurrences x t−1 that contain c j ≤ n j elements from each of the k conjunctions, and the total number of elements is |x t−1 | = c = ∑ k j=1 c j . To simplify notation, we further define x̄ j,t−1 = {v i,j,t−1 } n_j i=1 , an occurrence with c j = n j and c j′ = 0 for j′ ≠ j. In other words, x̄ j,t−1 is the set of elements that make up the j-th conjunction. First, a series of lemmas is demonstrated, based on the transition probabilities q(s) from an effect repertoire (Equation (3)). To isolate the specific conjunctions, we define s j ⊆ x t−1 to be the state of X t−1 within the j-th conjunction, and s̄ j = ∪ j i=1 s i ⊆ x t−1 to be the state of X t−1 within the first j conjunctions. For a DOC with k conjunctions, we consider occurrences with c j elements from each conjunction; the probability q(s) can be written explicitly in the specific case of a disjunction of two conjunctions and, in the case of k > 2 conjunctions, is defined recursively. The first two lemmas demonstrate the effect of adding an additional element to an occurrence: Adding an 'ON' input to an occurrence x t−1 can never decrease the probability of {Y t = 1} (Lemma A5), while adding an 'OFF' input to an occurrence x t−1 can never increase the probability of {Y t = 1} (Lemma A6).
Proof. The proof is again given by induction. We first consider the case where k = 2. Assume (without loss of generality) that the additional element belongs to the first conjunction (c′ 1 = c 1 + 1). Then, we have that, when k = 2, Q(c′)/Q(c) = 2. Next, we assume the result holds for k − 1 and demonstrate the result for general k. Using the recursive relationship for q and, again, assuming that the additional element is from the first conjunction (c′ 1 = c 1 + 1), we obtain Q(c′)/Q(c) = 2 for the ratio in the general case. The final two lemmas demonstrate conditions under which the probability of {Y t = 1} is either strictly increasing or strictly decreasing. Lemma A8. If min(x′ t−1 ) = 1, c j < n j ∀ j and x t−1 ⊂ x′ t−1 , then q(s) < q(s′).
Proof. The proof is given by induction. We first consider the case where k = 2. Assume (without loss of generality) that x′ t−1 has an additional element in the first conjunction, relative to x t−1 (c′ 1 = c 1 + 1 = 1, c′ 2 = c 2 ). The result can be applied recursively for differences of more than one element. First, consider the case where c 2 = 1. Then, we have q(s′) = 1/2^{n 2 − c 2} > 0 = q(s).
Therefore, when k = 2, we have that q(s) < q(s′). Next, we assume the result holds for k − 1, q(s̄ k−1 ) < q(s̄′ k−1 ), and demonstrate the result for general k. Again, assume that x t−1 and x′ t−1 differ by a single element in the first conjunction (c′ 1 = c 1 + 1, c′ j = c j for j > 1). As min(s k ) = 0,

q(s̄ k ) / q(s̄′ k ) = q(s̄ k−1 ) / q(s̄′ k−1 ) < 1.
Using the above Lemmas, we are now in a position to prove the actual causes and actual effects in the causal account of a single DOC and its inputs. We separately consider the case where the DOC is in the 'ON' and the 'OFF' state.
Theorem A2. Consider a dynamical causal network G u , such that V t = {Y t } is a DOC element that is a disjunction of k conditions, each of which is a conjunction of n j inputs, and V t−1 = {{V i,j,t−1 } n_j i=1 } k j=1 is the set of its n = ∑ j n j inputs. For a transition v t−1 ≺ v t , the following hold:

1. If y t = 1,
(a) The actual cause of {Y t = 1} is an occurrence {X t−1 = x t−1 }, where x t−1 = x̄ j,t−1 = {x i,j,t−1 } n_j i=1 ⊆ v t−1 , such that min(x t−1 ) = 1; and
(b) The actual effect of {X t−1 = x t−1 } is {Y t = 1}, if min(x t−1 ) = 1 and |x t−1 | = c j = n j ; otherwise, x t−1 is reducible.
2. If y t = 0, the actual cause of {Y t = 0} is an occurrence {X t−1 = x t−1 } ⊆ v t−1 , such that max(x t−1 ) = 0 and c j = 1 ∀ j.

Proof. Part 1a:
The actual cause of {Y t = 1}. For an occurrence {X t−1 = x t−1 }, the probability of x t−1 in the cause repertoire of y t is π(x t−1 | y t ) = q(s) Q(c) .
As Y t is a first-order occurrence, there is only one possible partition, and the causal strength of a potential link is, thus, α c (x t−1 , y t ) = log 2 ( π(x t−1 | y t ) / π(x t−1 ) ) = log 2 ( 2^c q(s) / Q(c) ) = log 2 (Q 1 q(s)), where Q 1 = 2^c / Q(c) ∀ c (by Lemma A7). If we then consider adding a single element to the occurrence, x′ t−1 = {x t−1 , x i,j,t−1 } (x i,j,t−1 ∉ x t−1 ), then the difference in causal strength is α c (x t−1 , y t ) − α c (x′ t−1 , y t ) = log 2 ( Q 1 q(s) / (Q 1 q(s′)) ) = log 2 ( q(s) / q(s′) ).
Combining the above with Lemma A6, adding an element x i,j,t−1 = 0 to an occurrence cannot increase the causal strength and, thus, occurrences that include elements in the state 'OFF' cannot be the actual cause of y t . By Lemma A5, adding an element x i,j,t−1 = 1 to an occurrence cannot decrease the causal strength. Furthermore, if c j = n j and min(x̄ j,t−1 ) = 1, then q(s) = 1 and α c (x t−1 , y t ) = log 2 (Q 1 q(s)) = log 2 (Q 1 ), independent of the number of elements in the occurrence from other conjunctions c j′ and their states s j′ (j′ ≠ j). As the value Q 1 does not depend on the specific value of j, it must be the case that this is the maximum value of causal strength, α max c (y t ). Furthermore, if c j < n j ∀ j, then α c (x t−1 , y t ) = log 2 (Q 1 q(s)) < log 2 (Q 1 ). Therefore, the maximum value of causal strength is log 2 (Q 1 ), and an occurrence x t−1 achieves this value (satisfying condition (1) of being an actual cause) if and only if there exists j, such that c j = n j and min(x̄ j,t−1 ) = 1 (i.e., the occurrence includes a conjunction whose elements are all 'ON'). Consider an occurrence that satisfies condition (1), such that there exists j 1 with c j 1 = n j 1 . If there exists j 2 ≠ j 1 such that c j 2 > 0, then we can define a subset x′ t−1 ⊂ x t−1 with c′ j 1 = n j 1 and c′ j 2 = 0 that also satisfies condition (1) and, thus, x t−1 does not satisfy condition (2). Finally, if no such j 2 exists (x t−1 = x̄ j,t−1 ), then any subset x′ t−1 ⊂ x t−1 has c′ j < n j ∀ j and does not satisfy condition (1), so x t−1 satisfies condition (2). Therefore, we have that the actual cause of y t is an occurrence x t−1 = x̄ j,t−1 , such that min(x t−1 ) = 1,

x * (y t ) = {x̄ j,t−1 ⊆ v t−1 | min(x̄ j,t−1 ) = 1}.
Part 1b: Actual effect of $x_{t-1}$ when $y_t = 1$. Again, consider occurrences $X_{t-1} = x_{t-1}$ with $c_j$ elements from each of the $k$ conjunctions. The effect repertoire of a DOC with $k$ conjunctions over such occurrences is $\pi(y_t \mid x_{t-1} = s) = q(s)$.
As there is only one element in $v_t$, the only question is whether or not $x_{t-1}$ is reducible. If it is reducible, it has no actual effect; otherwise, its actual effect must be $y_t$. First, if there exists $x \in x_{t-1}$ with $x = 0$, then we can define $x'_{t-1}$ such that $x_{t-1} = \{x'_{t-1}, x\}$, and a partition $\psi = \{(x'_{t-1}, y_t), (x, \varnothing)\}$ (i.e., cutting away $x$), such that
$$\pi(y_t \mid x_{t-1})_\psi = \pi(y_t \mid x'_{t-1}) \times \pi(\varnothing \mid x) = \pi(y_t \mid x'_{t-1}) = q(s').$$
Part 2a: Actual cause of $y_t$ when $y_t = 0$. By the same argument as in Part 1a, the causal strength of a potential link is now $\alpha_c(x_{t-1}, y_t) = \log_2(Q_0\,(1 - q(s)))$, where $Q_0 = 2^c / \sum_{s'}(1 - q(s'))$, with the sum running over all $2^c$ states $s'$ of the occurrence's elements, is again independent of $c$ (by Lemma A7). By Lemma A6, adding an element $x = 1$ to an occurrence cannot increase the causal strength and, thus, occurrences that include elements in the state 'ON' cannot be the actual cause of $y_t$. By Lemma A5, adding an element $x = 0$ to an occurrence cannot decrease the causal strength. If $c_j > 0\ \forall\, j$ and $\max(x_{t-1}) = 0$, then $q(s) = 0$ and $\alpha_c(x_{t-1}, y_t) = \log_2(Q_0\,(1 - q(s))) = \log_2(Q_0)$, independent of the actual values of $c_j$. As this holds for any set of $c_j$ that satisfies the conditions, this value must be $\alpha_{\max}(y_t)$. Furthermore, if there exists $j$ such that $c_j = 0$, then $q(s) > 0$ and $\alpha_c(x_{t-1}, y_t) = \log_2(Q_0\,(1 - q(s))) < \log_2(Q_0)$. Therefore, the maximum value of causal strength is $\log_2(Q_0)$, and an occurrence $x_{t-1}$ achieves this value (satisfying condition (1) of being an actual cause) if and only if $c_j > 0\ \forall\, j$ and $\max(x_{t-1}) = 0$ (i.e., the occurrence contains elements from every conjunction, and only elements whose state is 'OFF').
Consider an occurrence $x_{t-1}$ that satisfies condition (1). If there exists $j_1$ such that $c_{j_1} > 1$, then we can define a subset $x'_{t-1} \subset x_{t-1}$ with $c'_{j_1} = 1$ that also satisfies condition (1), and thus $x_{t-1}$ does not satisfy condition (2). Finally, if $c_j = 1\ \forall\, j$, then for any subset $x'_{t-1} \subset x_{t-1}$ there exists $j$ such that $c'_j = 0$, so $x'_{t-1}$ does not satisfy condition (1); thus, $x_{t-1}$ satisfies condition (2). Therefore, the actual cause of $y_t$ is an occurrence $x_{t-1}$ such that $\max(x_{t-1}) = 0$ and $c_j = 1\ \forall\, j$:
$$x^*(y_t) = \{x_{t-1} \subseteq v_{t-1} \mid \max(x_{t-1}) = 0 \text{ and } c_j = 1\ \forall\, j\}.$$
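The $y_t = 0$ case admits the same exhaustive check on a hypothetical DOC $y = (A \wedge B) \vee (C \wedge D)$ (again our own illustration): maximizing $\alpha_c = \log_2(Q_0(1 - q(s)))$ is equivalent to maximizing $1 - q(s)$, and the minimal maximizers are exactly the occurrences consisting of one 'OFF' element per conjunction:

```python
from itertools import combinations, product

# Hypothetical DOC used for illustration: y = (A AND B) OR (C AND D).
CONJUNCTIONS = [[0, 1], [2, 3]]
N = 4

def y_out(state):
    return int(any(all(state[i] for i in conj) for conj in CONJUNCTIONS))

def p_y0(occ):
    """1 - q(s) = pi(y_t = 0 | do(x_{t-1} = s)), averaging over free inputs."""
    free = [i for i in range(N) if i not in dict(occ)]
    hits = 0
    for vals in product((0, 1), repeat=len(free)):
        full = dict(occ)
        full.update(zip(free, vals))
        hits += 1 - y_out(tuple(full[i] for i in range(N)))
    return hits / 2 ** len(free)

occurrences = [frozenset(zip(sub, vals))
               for r in range(1, N + 1)
               for sub in combinations(range(N), r)
               for vals in product((0, 1), repeat=r)]

# Condition (1): maximal 1 - q(s); condition (2): no maximizing proper subset.
best = max(p_y0(o) for o in occurrences)
maximizers = [o for o in occurrences if p_y0(o) == best]
minimal = [o for o in maximizers
           if not any(s < o and p_y0(s) == best for s in occurrences)]
print(sorted(sorted(o) for o in minimal))  # one OFF element per conjunction
```

The four minimal maximizers pair one 'OFF' input from $\{A, B\}$ with one from $\{C, D\}$, matching $x^*(y_t)$ above ($\max(x_{t-1}) = 0$ and $c_j = 1\ \forall\, j$).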
Part 2b: Actual effect of $x_{t-1}$ when $y_t = 0$. Again, consider occurrences $X_{t-1} = x_{t-1}$ with $c_j$ elements from each of the $k$ conjunctions. The probability of $y_t$ in the effect repertoire of $x_{t-1}$ is $\pi(y_t \mid x_{t-1} = s) = 1 - q(s)$.
As there is only one element in $v_t$, the only question is whether or not $x_{t-1}$ is reducible. If it is reducible, it has no actual effect; otherwise, its actual effect must be $y_t$. First, if there exists $x_{i,j,t-1} \in x_{t-1}$ such that $x_{i,j,t-1} = 1$, then we can define $x'_{t-1}$ such that $x_{t-1} = \{x'_{t-1}, x_{i,j,t-1}\}$, and a partition $\psi = \{(x'_{t-1}, y_t), (x_{i,j,t-1}, \varnothing)\}$, such that
$$\pi(y_t \mid x_{t-1})_\psi = \pi(y_t \mid x'_{t-1}) \times \pi(\varnothing \mid x_{i,j,t-1}) = \pi(y_t \mid x'_{t-1}) = 1 - q(s').$$
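The partition $\psi$ can be illustrated numerically (our own sketch, on the hypothetical DOC $y = (A \wedge B) \vee (C \wedge D)$): cutting away an element $x$ with $\pi(\varnothing \mid x) = 1$ replaces $\pi(y_t \mid x_{t-1})$ by $\pi(y_t \mid x'_{t-1})$, and for a DOC removing an 'ON' element can never decrease the probability of $y_t = 0$, which the loop below confirms exhaustively for this example:

```python
from itertools import combinations, product

# Hypothetical DOC used for illustration: y = (A AND B) OR (C AND D).
CONJUNCTIONS = [[0, 1], [2, 3]]
N = 4

def y_out(state):
    return int(any(all(state[i] for i in conj) for conj in CONJUNCTIONS))

def p_y0(occ):
    """pi(y_t = 0 | do(x_{t-1} = s)) = 1 - q(s), averaging over free inputs."""
    free = [i for i in range(N) if i not in occ]
    hits = 0
    for vals in product((0, 1), repeat=len(free)):
        full = dict(occ)
        full.update(zip(free, vals))
        hits += 1 - y_out(tuple(full[i] for i in range(N)))
    return hits / 2 ** len(free)

# Example of the construction: x_{t-1} = {A=1, C=0} with y_t = 0.
occ = {0: 1, 2: 0}
cut = {2: 0}  # x'_{t-1}: psi cuts away the ON element A=1
print(p_y0(occ), p_y0(cut))  # 0.5 0.75

# Cutting any ON element never decreases pi(y_t = 0 | .) for this DOC:
for r in range(1, N + 1):
    for sub in combinations(range(N), r):
        for vals in product((0, 1), repeat=r):
            occ = dict(zip(sub, vals))
            for i, v in list(occ.items()):
                if v == 1:
                    rest = {k: w for k, w in occ.items() if k != i}
                    assert p_y0(rest) >= p_y0(occ)
```

Here the partitioned value $1 - q(s') = 0.75$ exceeds the unpartitioned $1 - q(s) = 0.5$, so the cut loses none of the occurrence's constraint on $y_t = 0$.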