Integrated Information in Process-Algebraic Compositions

Integrated Information Theory (IIT) is most typically applied to Boolean Nets, a state transition model in which system parts cooperate by sharing state variables. By contrast, in Process Algebra, whose semantics can also be formulated in terms of (labeled) state transitions, system parts—“processes”—cooperate by sharing transitions with matching labels, according to interaction patterns expressed by suitable composition operators. Despite this substantial difference, questioning how much additional information is provided by the integration of the interacting partners above and beyond the sum of their independent contributions appears perfectly legitimate with both types of cooperation. In fact, we collect statistical data about ϕ—integrated information—relative to pairs of boolean nets that cooperate by three alternative mechanisms: shared variables—the standard choice for boolean nets—and two forms of shared transition, inspired by two process algebras. We name these mechanisms α, β and γ. Quantitative characterizations of all of them are obtained by considering three alternative execution modes, namely synchronous, asynchronous and “hybrid”, by exploring the full range of possible coupling degrees in all three cases, and by considering two possible definitions of ϕ based on two alternative notions of distribution distance.


Introduction
Integrated Information Theory (IIT) [1,2] is concerned with the study of natural or artificial systems formed by many interconnected micro-components. One of the key steps in this study is the identification of the "hidden" macro-components of the system, namely its Minimum Information Partition (MIP). The macro-components of the MIP can be seen as distinct but interacting parts: φ measures the added value provided by their integration/composition with respect to the plain sum of their contributions, that is, how much the integrated whole is more than the sum of the separate parts.
In Boolean Nets [3], the state transition model predominantly used for illustrating IIT, the integration/cooperation among the MIP parts occurs via the directed edges that interconnect them: the parts influence one another by reading each other's boolean variables, a form of shared-variable cooperation.
In this paper, we contrast the above cooperation mechanism with an alternative one, based on shared transitions, that arises in Process Algebras (or "Calculi") [4][5][6][7][8]. Furthermore, viewing the interacting partners from the process-algebraic perspective has prompted us to extend our analysis to three execution modes for boolean nets: synchronous, asynchronous and "hybrid" (although the second one is soon dropped).
We have three main objectives in mind.
The first is to put the interaction mechanisms of shared variables and shared transitions on an equal footing, and to obtain some numerical characterization of their "performance" with respect to the ability to produce integrated information.
The second is related to the central application area of IIT: the modeling and quantification of the emergence of consciousness from the complex structure of the brain. Given that the brain architecture is indeed intrinsically structured into macro-components, the investigation of alternative or additional mechanisms of cooperation among them, which explicitly reflect such higher-level structure, could be an interesting complement to the study of cooperation mechanisms that only address micro-components.
The third objective is relevant to the areas from which these additional cooperation mechanisms are borrowed, namely Process Algebra and, more generally, formal methods for Software Engineering. Using informational measures from IIT appears as a completely novel and attractive approach to characterizing quantitatively these practically useful mechanisms and their associated operators.
The paper is organized as follows.
In Section 2, we briefly recall the definition of Boolean Net and introduce the three execution modes: synchronous, asynchronous and hybrid. The first, yielding deterministic behaviors, is the traditional mode; however, the nondeterministic asynchronous and hybrid modes appear more in line with the nondeterminism of the systems typically addressed by Process Algebra. Here, we also discuss the three modes with respect to the property of conditional independence.
In Section 3, we introduce the interaction mechanism adopted in process-algebraic calculi/languages, one based on shared labeled transitions (abbreviated "sharTrans") as opposed to shared variables ("sharVar"). We in particular illustrate the flexible parametric operator of parallel composition from the LOTOS language, denoted "|β|": expression P|β|Q describes a system composed of two processes P and Q that cooperate by sharing some transitions, where β defines the degree of coupling between them.
In Section 4, we show that the parallel composition operator |β| can be readily used also for composing two boolean nets P and Q (still written P|β|Q), provided these are enriched with transition labels, and regardless of the chosen execution mode. This enables us to put the newly considered form of composition/integration under the lens of IIT without the need to import and discuss any other element of Process Algebras.
In Section 5, we introduce the notation P<α>Q and the idea of controlling the degree of coupling between two bool nets P and Q, under the sharVar mechanism, by controlling the number α of edges crossing between them.
In Section 6, we recall the notion of integrated information, the central concept of IIT, both in its state-dependent form φ(X) and in its averaged form, which we denote φ. These definitions are based on a distance function $d(y_{coop}, y_{indep})$ between two probabilistic state distributions, where $y_{coop}$ reflects inter-part cooperation while $y_{indep}$ corresponds to their independent operation. In IIT 2.0 [1], d is Relative Entropy (or Kullback-Leibler divergence, denoted dkl). We show that the definition of $y_{indep}$ for the sharVar context is such as to avoid the "dkl-mismatch problem" that may arise when applying dkl to generic distributions. Then, we conduct a statistical analysis of $φ_{mode}$(P<α>Q) in order to study its dependency on α for the sync and hybrid execution modes of P<α>Q, using 10 pairs (P, Q) of randomly generated bool nets. To facilitate the comparison of P<α>Q with sharTrans compositions (in view of potential dkl-mismatch problems in the latter), we extend our statistical analysis by using a version of φ in which dkl is replaced by Manhattan distance.
In Section 7, we address the problem of defining φ in the very different context of sharTrans bool net compositions P|β|Q. Here, we have to face two problems: the presence of deadlocks and the mentioned dkl-mismatches. The first problem is solved easily; a drastic way to bypass the second one is to switch to the φ variant based on Manhattan distance.
Wishing to stick to the original, dkl-based definition of φ, in Section 8, we consider an alternative, process-algebraic cooperation mechanism, borrowed from CCS (Calculus of Communicating Systems) [5], that avoids the dkl-mismatch problem. In fact, we combine CCS parallelism ("P|Q") and restriction ("\γ") into the convenient syntactic form P[γ]Q ≡ (P|Q)\γ, where parameter γ still expresses the degree of coupling between the interacting parties. This enables us to compare, by statistical experiments, the trends of φ, in its original dkl-based definition [1], for P<α>Q and P[γ]Q.
In Section 9, we regroup the 15 plots introduced in the previous sections into a compact table that facilitates the comparison of mechanisms α, β and γ.
Some closing remarks are given in Section 10.

Boolean Nets: Sync, Async and Hybrid Execution Modes
Boolean nets [3] are discrete sequential dynamical systems. An (n, k)-boolean net ("bool net" in the sequel) is a pair (G(B, E), F) where:
• G(B, E) is a directed graph with n vertices $B = \{b_1, \ldots, b_n\}$ and edge set E; each vertex $b_i \in B$ has exactly k incoming edges (this limitation on node in-degree is not essential; we adopt it only for convenience of implementation and notation);
• $F = \{f_1, \ldots, f_n\}$ is a set of n boolean functions of k arguments, one for each vertex in B.

Each vertex $b_i \in B$ is a boolean variable controlled by boolean function $f_i(b_{i,1}, \ldots, b_{i,k})$, where the argument list $(b_{i,1}, \ldots, b_{i,k})$ corresponds to the edges incident to $b_i$. In the sequel, an (n, k)-bool net P is sometimes denoted P(n, k).
A bool net computation is a sequence of steps, assumed to take place in discrete time, one step at each clock tick. Each step consists of the instantaneous and simultaneous firing of a group of nodes, called the firing group. A firing group is a subset of B, which can be conveniently identified also by its characteristic function (e.g., characteristic function {1,1,0} indicates that only the first two nodes fire, out of three). When node $b_i$ fires, its value is updated according to boolean function $f_i$.
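As an illustration of the definitions above, here is a minimal Python sketch of an (n, k)-bool net and of a single step driven by a firing group. The names `random_bool_net` and `step` are hypothetical, and the random generation scheme (sources drawn with repetition, self-loops allowed, uniformly random truth tables) is an assumption made for the example, not something the text prescribes.

```python
import random
from itertools import product

def random_bool_net(n, k, seed=0):
    """Return (inputs, tables) for a random (n, k)-bool net.

    inputs[i] lists the k source nodes read by node i (assumption: sources
    may repeat and may include i itself); tables[i] is the truth table of f_i.
    """
    rng = random.Random(seed)
    inputs = [[rng.randrange(n) for _ in range(k)] for _ in range(n)]
    tables = [{args: rng.randint(0, 1) for args in product((0, 1), repeat=k)}
              for _ in range(n)]
    return inputs, tables

def step(net, state, firing_group):
    """Fire simultaneously the nodes flagged 1 in firing_group (a
    characteristic function); the other nodes keep their current value."""
    inputs, tables = net
    return tuple(
        tables[i][tuple(state[j] for j in inputs[i])] if fire else state[i]
        for i, fire in enumerate(firing_group)
    )

net = random_bool_net(n=3, k=2)
x = (0, 1, 1)
y = step(net, x, (1, 1, 0))   # only the first two nodes fire
assert y[2] == x[2]           # the third node did not fire, so it is unchanged
```

All fired nodes read the same current state, which is what makes the firing simultaneous rather than sequential.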

Notation.
Lower case letters x and y denote discrete random variables. In particular, x or x(t) is the current state at time t of an (n, k)-bool net, consisting of an n-tuple of binary random variables $(b_1(t) \ldots b_n(t))$. Similarly, y or $y(t+1) = (b_1(t+1) \ldots b_n(t+1))$ is the next state at time (t + 1).

Upper case letters X and Y denote actual n-tuples of bits, i.e., the values that variables x and y may assume: $X = (γ_1 \ldots γ_n)$ and $Y = (δ_1 \ldots δ_n)$, where the γs and δs are bits. Subscript i in $X_i$ is used when we want $X_i$ to range over a set of n-tuples, for example the whole set $\{0, 1\}^n$, not for selecting an element inside the tuple. For example, writing prob(…

We consistently use identifiers x and X for predecessor states, and y and Y for successor states. The densities of random variables x and y, often called here "distributions", are denoted $p_x$ and $p_y$, but sometimes also x and y, with symbol overloading; the meaning should be clear from the context. For example, the probability for variable y to assume value $Y_i$ is written $p_y(Y_i)$ but also $y(Y_i)$.

tpm. In the sequel, an essential role is played by the transition probability matrix (tpm), in which entry $tpm(X_{PQ}, Y_{PQ})$ expresses the conditional probability $prob(Y_{PQ} \mid X_{PQ})$ obtained by counting all possible transitions that lead from state $X_{PQ}$ to state $Y_{PQ}$.
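Given an (n, k)-bool net, we consider three execution modes for it, which differ in the way we define FG, the set of firing groups possible at each step. (Note that FG does not depend on the current state.)
• Sync. All nodes fire simultaneously at each step: FG has a single element, the firing group of all n nodes. Evolution is deterministic.
• Async. Exactly one node fires at each step: FG is the set of the n singleton firing groups. The choice of the firing node is made by a uniform random distribution, so that evolution is nondeterministic.
• Hybrid. Any subset of nodes may fire at each step: FG is the set of all subsets of B. The choice is made, again, by a uniform random distribution: the probability to pick any specific firing group is $1/2^n$, where n is the number of nodes. Note that this is equivalent to firing each node with probability 1/2, independently node by node. Evolution is nondeterministic: each global state may have multiple distinct successors, as many as $2^n$. Note that the empty firing group is also included.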
We write $tpm^m_S$ to denote the transition probability matrix of bool net S executed in mode m (sync, async or hybrid).
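The following sketch shows one way to enumerate the firing groups of the three modes and to build a transition probability matrix by counting transitions, as described above. It is an illustrative sketch only: the function names, the `update(i, X)` callback and the two-node toy net are assumptions introduced here, and exact fractions are used so that each row can be checked to sum to 1.

```python
from fractions import Fraction
from itertools import product

def firing_groups(n, mode):
    """Characteristic functions of the firing groups FG of an n-node net."""
    if mode == "sync":                      # all nodes fire together
        return [(1,) * n]
    if mode == "async":                     # exactly one node fires
        return [tuple(int(i == j) for i in range(n)) for j in range(n)]
    if mode == "hybrid":                    # any subset, empty set included
        return list(product((0, 1), repeat=n))
    raise ValueError(mode)

def tpm(n, update, mode):
    """tpm[X][Y] = fraction of (equiprobable) firing groups leading X to Y.

    update(i, X) returns the value that f_i assigns to node i in state X.
    """
    groups = firing_groups(n, mode)
    states = list(product((0, 1), repeat=n))
    matrix = {X: {Y: Fraction(0) for Y in states} for X in states}
    for X in states:
        for g in groups:
            Y = tuple(update(i, X) if g[i] else X[i] for i in range(n))
            matrix[X][Y] += Fraction(1, len(groups))
    return matrix

def flip_other(i, X):
    """Toy 2-node net: each node takes the negation of the other node."""
    return 1 - X[1 - i]

for mode in ("sync", "async", "hybrid"):
    rows = tpm(2, flip_other, mode)
    assert all(sum(row.values()) == 1 for row in rows.values())
```

In sync mode every row has a single entry equal to 1, reflecting determinism; in the other two modes a row may spread its mass over several successors.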
We soon deal with composite bool nets. The easiest way to compose two independent bool nets P and Q is to take their union, defined in the obvious way and denoted P ∪ Q. P and Q are disconnected, and do not communicate. It is trivial to see that P ∪ Q is itself a bool net, which can be executed in any of the three modes.
Let us then establish some simple facts about the relations between the set $FG^m_{P \cup Q}$ of firing groups of P ∪ Q and the sets $FG^m_P$ and $FG^m_Q$ of firing groups of the components, in the three modes. In the equations below, firing groups are conceived as node sets.

$$FG^{sync}_{P \cup Q} = \{B_P \cup B_Q\} = FG^{sync}_P \times_\cup FG^{sync}_Q \tag{1}$$

$$FG^{async}_{P \cup Q} = \{\{b_i\} \mid b_i \in B_P\} \cup \{\{b_j\} \mid b_j \in B_Q\} = FG^{async}_P \cup FG^{async}_Q \tag{2}$$

$$FG^{hybrid}_{P \cup Q} = 2^{B_P \cup B_Q} = FG^{hybrid}_P \times_\cup FG^{hybrid}_Q \tag{3}$$

In Equation (1), $\{B_P \cup B_Q\}$ is a singleton set, i.e., a set whose unique element is the set $B_P \cup B_Q$ of nodes. In sync mode, FG (be it referred to P, Q or P ∪ Q) has only one element, namely the firing group involving all available nodes. Thus, $FG^{sync}_P = \{B_P\}$ and $FG^{sync}_Q = \{B_Q\}$. Symbol "$\times_\cup$" denotes a Cartesian product that takes the union of the paired elements, which are node sets: $A \times_\cup B = \{a \cup b \mid a \in A, b \in B\}$.

Executing P ∪ Q in async mode (Equation (2)) means to fire (update) one node at a time. Thus, in this equation, we make use of singleton sets (e.g., $\{b_i\}$ or $\{b_j\}$) formed from the individual nodes of P and Q, where $b_i \in B_P$ and $b_j \in B_Q$.

Executing P ∪ Q in hybrid mode (Equation (3)) means to fire any possible subset of $B_P \cup B_Q$, including the empty set. This set of firing groups can also be seen as the "special" Cartesian product $FG^{hybrid}_P \times_\cup FG^{hybrid}_Q$.

Note that the FG of the whole system is a Cartesian product only for the sync and hybrid modes, and that these results clearly hold also when P and Q are connected by some edges, i.e., are not independent, since firing groups are defined relative to node sets, regardless of node interconnections.
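The relations between the firing-group sets of P ∪ Q and those of its components can be checked mechanically on a small example. In the sketch below, firing groups are Python frozensets of node names; `fg` and `union_product` are hypothetical helper names, with `union_product` standing for the Cartesian product that takes the union of the paired node sets.

```python
from itertools import chain, combinations

def fg(nodes, mode):
    """Set of firing groups (as frozensets of nodes) for the given mode."""
    nodes = frozenset(nodes)
    if mode == "sync":
        return {nodes}
    if mode == "async":
        return {frozenset([b]) for b in nodes}
    if mode == "hybrid":
        ordered = sorted(nodes)
        subsets = chain.from_iterable(
            combinations(ordered, r) for r in range(len(ordered) + 1))
        return {frozenset(s) for s in subsets}
    raise ValueError(mode)

def union_product(A, B):
    """Cartesian product of two families of node sets, pairing by union."""
    return {a | b for a in A for b in B}

BP, BQ = {"p1", "p2"}, {"q1", "q2", "q3"}

# Sync and hybrid: the FG of the union is the union-product of the parts.
assert fg(BP | BQ, "sync") == union_product(fg(BP, "sync"), fg(BQ, "sync"))
assert fg(BP | BQ, "hybrid") == union_product(fg(BP, "hybrid"), fg(BQ, "hybrid"))

# Async: the FG of the union is a plain set union, not a product.
assert fg(BP | BQ, "async") == fg(BP, "async") | fg(BQ, "async")
assert fg(BP | BQ, "async") != union_product(fg(BP, "async"), fg(BQ, "async"))
```

The last assertion fails to be an equality because the union-product of singletons yields two-node groups, whereas async mode only admits one-node groups.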

Conditional Independence in the Three Modes
Let $x = \{b_1(t) \ldots b_n(t)\}$ denote the current global state of an (n, k)-bool net at time t, and $y = \{b_1(t+1) \ldots b_n(t+1)\}$ be the next global state, at time t + 1. Following Pearl [9], we say that, for any i, j ∈ {1 . . . n}, $b_i(t+1)$ is conditionally independent from $b_j(t+1)$, given x, if

$$prob(b_i(t+1) \mid x, b_j(t+1)) = prob(b_i(t+1) \mid x). \tag{4}$$

Once x is known, the additional knowledge of $b_j(t+1)$ does not add anything to what we already know about $b_i(t+1)$ (and vice versa). Note that the above equation means: for all γs and δs such that $prob(b_1(t) = γ_1 \ldots b_n(t) = γ_n, b_j(t+1) = δ_j) > 0$,

$$prob(b_i(t+1) = δ_i \mid b_1(t) = γ_1 \ldots b_n(t) = γ_n, b_j(t+1) = δ_j) = prob(b_i(t+1) = δ_i \mid b_1(t) = γ_1 \ldots b_n(t) = γ_n). \tag{5}$$

Two random variables $y_1$ and $y_2$ are independent if and only if their mutual information [10] is null: $M(y_1, y_2) = 0$. Similarly, two random variables $y_1$ and $y_2$ are conditionally independent, given x, a third variable, if and only if their conditional mutual information is null: $M(y_1, y_2 \mid x) = 0$.
Recall that mutual information $M(y_1, y_2)$, a symmetric quantity representing the information provided on average by one variable about the other, is:

$$M(y_1, y_2) = \sum_{Y_1, Y_2} p_{y_1 y_2}(Y_1, Y_2) \log_2 \frac{p_{y_1 y_2}(Y_1, Y_2)}{p_{y_1}(Y_1)\, p_{y_2}(Y_2)} \tag{6}$$

where $p_{y_1 y_2}$ is the joint distribution of the two variables, while $p_{y_1}$ and $p_{y_2}$ are the respective marginal distributions. The conditional mutual information between variables $y_1$ and $y_2$, relative to variable x, is

$$M(y_1, y_2 \mid x) = \sum_{X_k, Y_1, Y_2} p_{x y_1 y_2}(X_k, Y_1, Y_2) \log_2 \frac{p_{y_1 y_2 \mid x}(Y_1, Y_2 \mid X_k)}{p_{y_1 \mid x}(Y_1 \mid X_k)\, p_{y_2 \mid x}(Y_2 \mid X_k)} \tag{7}$$

which can also be formulated as the weighted sum of the mutual information relative to the individual values $X_k$ of variable x.

IIT attributes much importance to conditional independence: when the property is satisfied, each element $b_i$, with its function $f_i(b_{i,1} \ldots b_{i,k})$, can be interpreted as an individual causal element within the system; when it is violated, a possibly undesirable form of instantaneous causal influence between $b_i(t+1)$ and $b_j(t+1)$ arises.
The three considered execution modes perform differently with respect to conditional independence.

• The sync mode entails conditional independence for the simple reason that, due to transition determinism, knowledge of the current state X already provides complete information about $b_i(t+1)$ (and $b_j(t+1)$).
• With the async mode, conditional independence is violated: knowing $b_j(t+1)$, in the case $b_j(t+1) \neq b_j(t)$, reveals that $b_j$ has been the only firing (updating) node, which implies $b_i(t+1) = b_i(t)$, a conclusion that we cannot draw from the pure knowledge of x.
• The hybrid mode entails conditional independence. As already observed, picking a firing group with uniform probability $1/2^n$ is equivalent to firing each node with probability 1/2, independently node by node. Thus, finding that $b_j$ has fired does not provide additional information on whether or not $b_i$ has fired, and thus on $b_i(t+1)$.
It is straightforward to see that the above definition of conditional independence, and the results for the three modes, are valid not only for individual nodes but also for groups of nodes, i.e., for parts of the net, such as P and Q in the sequel.
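The behavior of the three modes with respect to conditional independence can be observed numerically on a toy net. The sketch below computes the conditional mutual information between two next-state bits, given the current state, as the weighted sum of the per-state mutual informations. It assumes a uniform distribution on the current state x (an assumption made for this example only) and uses a hypothetical two-node net in which each node takes the negation of the other.

```python
from itertools import product
from math import log2

def firing_groups(n, mode):
    if mode == "sync":
        return [(1,) * n]
    if mode == "async":
        return [tuple(int(i == j) for i in range(n)) for j in range(n)]
    return list(product((0, 1), repeat=n))          # hybrid

def cond_mutual_info(n, update, mode, i, j):
    """M(b_i(t+1), b_j(t+1) | x) in bits, with x assumed uniform."""
    groups = firing_groups(n, mode)
    states = list(product((0, 1), repeat=n))
    total = 0.0
    for X in states:
        joint = {}                                   # distribution of (b_i', b_j') given X
        for g in groups:
            Y = tuple(update(m, X) if g[m] else X[m] for m in range(n))
            key = (Y[i], Y[j])
            joint[key] = joint.get(key, 0.0) + 1 / len(groups)
        p_i = {v: sum(p for (a, _), p in joint.items() if a == v) for v in (0, 1)}
        p_j = {v: sum(p for (_, b), p in joint.items() if b == v) for v in (0, 1)}
        mi = sum(p * log2(p / (p_i[a] * p_j[b])) for (a, b), p in joint.items())
        total += mi / len(states)                    # weight 1/|states| per value of x
    return total

def flip_other(i, X):
    return 1 - X[1 - i]

for mode in ("sync", "async", "hybrid"):
    print(mode, cond_mutual_info(2, flip_other, mode, 0, 1))
```

For this particular net the sync and hybrid values vanish, while the async value is strictly positive: in states (0, 0) and (1, 1) exactly one of the two bits changes, so the two next-state bits are perfectly anti-correlated.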

Two-Step Conditional Independence
It could be of some interest to see how conditional dependence/independence carries over to the case of two or more transitions, e.g., for analyzing behaviors under macro-transitions, or temporal coarse-graining. Somewhat surprisingly, the scenario changes as follows.
In sync mode, conditional independence is still valid, for the same argument of the case of one transition: z(t + 2) is completely defined, once x(t) is known.
In async mode, conditional independence is still violated. If $b_j(t+2) \neq b_j(t)$, we know that node j has fired at least once: this fact reduces the probability that node i has fired at time t + 1 or t + 2, providing us with additional information about $b_i(t+2)$.
The change occurs with respect to the hybrid mode: while the property is satisfied after one transition, it is violated after two. Informally, finding $b_j(t+2) \neq b_j(t)$ reveals that node j has fired at least once, which yields additional information about $b_j(t+1)$. This, in turn, may provide additional information about $b_i(t+2)$ beyond what is already given by x(t). Of course, knowing who fired in the first step still does not say anything about who fired in the second step: the point is that additional knowledge about the intermediate values of y(t + 1) does refine our knowledge about the possible final values of z(t + 2).
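The loss of conditional independence after two hybrid-mode transitions can be seen on a tiny example. The sketch below builds the one-step hybrid transition probabilities of a hypothetical two-node net (each node takes the negation of the other), composes them into two-step probabilities, and compares the mutual information between the two bits after one and after two steps, for a fixed initial state. Function names and the toy net are assumptions of this example.

```python
from itertools import product
from math import log2

def hybrid_tpm(n, update):
    """One-step hybrid-mode transition probabilities, as nested dicts."""
    states = list(product((0, 1), repeat=n))
    T = {X: {} for X in states}
    for X in states:
        for g in states:                 # every bit pattern is a firing group
            Y = tuple(update(i, X) if g[i] else X[i] for i in range(n))
            T[X][Y] = T[X].get(Y, 0.0) + 1 / len(states)
    return T

def two_steps(T):
    """Two-step probabilities prob(z | x), summing over intermediate states y."""
    T2 = {X: {} for X in T}
    for X, row in T.items():
        for Y, p in row.items():
            for Z, q in T[Y].items():
                T2[X][Z] = T2[X].get(Z, 0.0) + p * q
    return T2

def mutual_info(dist, i, j):
    """Mutual information (bits) between components i and j of a state distribution."""
    joint, p_i, p_j = {}, {}, {}
    for S, p in dist.items():
        joint[(S[i], S[j])] = joint.get((S[i], S[j]), 0.0) + p
        p_i[S[i]] = p_i.get(S[i], 0.0) + p
        p_j[S[j]] = p_j.get(S[j], 0.0) + p
    return sum(p * log2(p / (p_i[a] * p_j[b]))
               for (a, b), p in joint.items() if p > 0)

def flip_other(i, X):
    return 1 - X[1 - i]

T1 = hybrid_tpm(2, flip_other)
T2 = two_steps(T1)
x = (0, 0)
one_step = mutual_info(T1[x], 0, 1)      # null: the bits are independent given x
two_step = mutual_info(T2[x], 0, 1)      # positive: independence is lost
print(one_step, two_step)
```

Starting from (0, 0), two of the four intermediate states are fixed points of this net, which skews the two-step distribution towards states in which the two bits differ.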

Parallel Composition of LOTOS Processes: P|β|Q
In Process Algebras [4][5][6][7][8], a distributed concurrent system is formally described as a set of interacting processes. Each of these formalisms offers its own set of operators for specifying actions, interactions, concurrency, choice, nondeterminism, recursion, etc. By the Structural Operational Semantics [11], the syntactic expressions built by these operators, describing system structure and behavior, can be formally interpreted as labeled transition systems.
Of crucial importance for specifying the macro-structure and interaction patterns of the system are the parallel composition operators. We in particular refer to the flexible, parametric parallel composition operator of the process-algebraic language LOTOS (Language of Temporal Ordering Specification) [8].
When two processes P and Q are composed by the parallel composition operator "|β|", where β is the set of "synchronization labels", the resulting labeled transition system is obtained by forcing the processes to proceed jointly-in synchrony-with the transitions with labels in β, while proceeding independently-in "interleaving"-with their other transitions.
The Structural Operational Semantics provides one or more axioms or inference rules specifying the transitions associated with (the expressions formed by) each operator. The inference rules are usually written as "fractions", and define the transitions of an expression formed by that operator, appearing in the "denominator" (the conclusion), in terms of the transitions of the operator arguments, appearing in the "numerator" (the premise).
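Three inference rules define the semantics of the LOTOS parallel composition expression P|β|Q, where P and Q are themselves expressions (processes):

$$\frac{P \xrightarrow{a} P' \qquad a \notin \beta}{P|\beta|Q \xrightarrow{a} P'|\beta|Q} \tag{8}$$

$$\frac{Q \xrightarrow{a} Q' \qquad a \notin \beta}{P|\beta|Q \xrightarrow{a} P|\beta|Q'} \tag{9}$$

$$\frac{P \xrightarrow{a} P' \qquad Q \xrightarrow{a} Q' \qquad a \in \beta}{P|\beta|Q \xrightarrow{a} P'|\beta|Q'} \tag{10}$$

For example, when two processes P[a, b, c] and Q[b, c, d], able to perform transitions with labels in, respectively, sets {a, b, c} and {b, c, d}, are composed by the expression "P[a, b, c]|{b, c}|Q[b, c, d]", they will interleave their local transitions labeled a and d, and synchronize those labeled b and c.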
When the set of synchronization labels is empty-β = ∅-we have the special case of pure interleaving composition P|∅|Q, also denoted P|||Q, where "|||" is called the interleaving operator. In this case, it is clear that the rules in Equations (8) and (9) are still applicable while the rule in Equation (10) is not; thus, in composition P|||Q, the components can only proceed one at a time.
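The effect of the three rules can be sketched on explicit labeled transition systems: a component moves alone on labels outside β (rules in Equations (8) and (9)), and both components move jointly on matching labels in β (rule in Equation (10)). In the Python sketch below, a process is simply a set of (source, label, target) triples; this encoding, the function name `compose` and the two cyclic example processes are assumptions made for illustration.

```python
def compose(P, Q, sync_labels):
    """Transitions of P|beta|Q over pairs of local states.

    P and Q are sets of (source, label, target) triples; sync_labels is beta.
    """
    def states(T):
        return {s for s, _, _ in T} | {t for _, _, t in T}

    SP, SQ = states(P), states(Q)
    result = set()
    for (p, a, p2) in P:
        if a not in sync_labels:                 # P moves alone, Q idles
            result |= {((p, q), a, (p2, q)) for q in SQ}
    for (q, a, q2) in Q:
        if a not in sync_labels:                 # Q moves alone, P idles
            result |= {((p, q), a, (p, q2)) for p in SP}
    for (p, a, p2) in P:
        for (q, b, q2) in Q:
            if a == b and a in sync_labels:      # joint move on a shared label
                result.add(((p, q), a, (p2, q2)))
    return result

# P cycles through a, b, c; Q cycles through b, c, d.
P = {("p0", "a", "p1"), ("p1", "b", "p2"), ("p2", "c", "p0")}
Q = {("q0", "b", "q1"), ("q1", "c", "q2"), ("q2", "d", "q0")}

T = compose(P, Q, {"b", "c"})
assert (("p0", "q0"), "a", ("p1", "q0")) in T    # a is interleaved
assert (("p1", "q0"), "b", ("p2", "q1")) in T    # b is synchronized
assert (("p1", "q1"), "b", ("p2", "q1")) not in T  # P cannot do b alone
```

With an empty synchronization set, `compose(P, Q, set())` yields pure interleaving: every global transition changes exactly one of the two local states.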
In the next section, we discuss how to apply the above parallel composition operator to bool nets, and the way this operator performs with respect to the conditional independence property.

Parallel Compositions of Bool Nets: P|β|Q
Bool nets are state transition systems, and since the rules in Equations (8)-(10) for parallel composition are applicable to labeled transition systems, it is perfectly feasible to apply them to the composition P|β|Q of boolean nets. The only missing elements are transition labels. For our investigations, we adopt pairs (P, Q) of nets with identical (n, k) parameters; for the labels, we proceed as follows.
First, we choose the label alphabet, which consists of the set {1, 2 . . . 2nk} of natural numbers (the choice of size 2nk is justified below). We overload symbol β to denote both a natural number, with 0 ≤ β ≤ 2nk, and the set of synchronization labels {1, 2 . . . β}, so that P|{1, 2 . . . β}|Q is written P|β|Q. In particular, β = 0 corresponds to the pure interleaving case P|||Q mentioned in the previous section. As a natural number, β represents the coupling factor between P and Q: the larger β is, the more frequent are the steps in which P and Q must synchronize.
Second, we turn P and Q into labeled bool nets by adding two independent functions $L_P$ and $L_Q$ that, respectively, assign a label to each transition $x_P \to y_P$ and $x_Q \to y_Q$:

$$L_P, L_Q : \{0,1\}^n \times \{0,1\}^n \to \{1, 2, \ldots, 2nk\}$$

Aiming at maximum generality, our labels depend both on the source and on the target state, and are picked at random from set {1, 2 . . . 2nk}.
On this basis, the application of the rules in Equations (8)-(10) to P|β|Q becomes possible also when P and Q are labeled bool nets. Note that this can be done regardless of the mode (sync, async or hybrid) in which P and Q are executed.
It is important not to confuse the concept of sync/async execution mode of P and Q with the (orthogonal) concept of synchronous/asynchronous transition of P|β|Q. The execution mode refers to the individual component P or Q, and when we attribute some execution mode to the whole P|β|Q we mean that both P and Q operate, internally, according to that mode; in principle, we could even imagine composing a P operating in sync mode with, e.g., a Q operating in async or hybrid mode (but in this paper we never do that). On the other hand, a synchronous transition of P|β|Q is one in which P and Q proceed jointly, each contributing with a local transition performed according to its own mode; furthermore, the two local and simultaneous transitions must have the same label γ, with γ ∈ {1, 2 . . . β}. Conversely, an asynchronous transition of P|β|Q corresponds to a local, γ-labeled transition performed autonomously (and according to its own mode) by only one of the two components, where γ ∉ {1, 2 . . . β}.

Conditional Dependence in Parallel Composition
We discuss the issue of conditional independence in Section 2, relative to pure bool nets. How does parallel bool net composition P|β|Q perform with respect to this property?
The question involves comparing prob(y P |x PQ , y Q ) with prob(y P |x PQ ) where, as before, x and y are states at time t and t + 1, respectively, and the subscripts identify the relevant system components.
Regardless of the execution mode of the two components, parallel composition does violate conditional independence. The reason is that knowing $y_Q$ and finding $y_Q \neq x_Q$ indicates that Q has indeed performed a local transition, whose label, e.g., γ, we can partly or completely deduce from labeling function $L_Q$, which is known. If γ ∉ {1, 2 . . . β}, the system as a whole must have performed an asynchronous (interleaving) transition, in which P must have idled: we immediately deduce $y_P = x_P$. If, conversely, γ ∈ {1, 2 . . . β}, the system as a whole must have performed a synchronous transition, one in which P has performed a γ-labeled transition jointly with Q: this still tells us something about $y_P$. In both cases, we acquire more information about $y_P$ than what $x_{PQ}$ alone can give.
In the area of formal methods for Software Engineering, to which Process Algebras belong, it is indeed conditional dependence that plays an important role. Consider, for example, the constraint-oriented specification style [12,13]. In this style, the parallel composition operator is used as a sort of logical conjunction: system behavior is specified by progressively accumulating constraints (processes) on the ordering of communication events and, possibly, on the exchanged data values. Each constraint reflects a different, partial view on the global system behavior, and all these views should agree on each global transition x → y. This agreement, governed by the inference rules in Equations (8)-(10), reflects a sort of on-the-fly communication between P and Q, as the global transition occurs. Overall, the effect of those rules is to introduce a mutual dependency among local transitions, which, in terms of conditional mutual information between local state components, means $M(y_P, y_Q \mid x_{PQ}) \neq 0$.

Deadlocks
No matter which execution mode is considered, a bool net will always be able to perform transitions from any state. This is not the case for bool net composition P|β|Q, when β > 0. A deadlock occurs at global state $X_{PQ} = X_P.X_Q$, formed by the concatenation of local states $X_P$ and $X_Q$, when: (i) no a-labeled local transitions are possible from state $X_P$ or $X_Q$, with a ∉ {1 . . . β} (these would become global, interleaving, asynchronous transitions by the rule in Equation (8) or Equation (9)); and (ii) no pair of local b-labeled transitions is possible from $X_P$ and $X_Q$, with b ∈ {1 . . . β} (yielding global, synchronous transitions by the rule in Equation (10)). In this case, $X_{PQ}$ is a deadlock state.
Each tpm row should be a probability vector: its total must be 1. However, when $X_{PQ}$ is a deadlock state there is no possible successor $Y_{PQ}$, and all elements $tpm^m_{P|β|Q}(X_{PQ}, *)$ of the corresponding row would be 0s, thus violating the probability vector property. One option sometimes adopted for restoring that property is to set $tpm(X_{PQ}, X_{PQ}) = 1$, forcing the system to permanently remain in that state, and turning a static into a dynamic deadlock:
• static deadlocks: some tpm rows, called null rows, only have 0s, and are not proper probability vectors;
• dynamic deadlocks: all rows are probability vectors, with loop-edges added.
The introduction of dynamic deadlocks preserves the probabilistic nature of the tpm, but does not discriminate between actual deadlocks and loop-transitions-those for which the source and target state coincide.
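The conversion from static to dynamic deadlocks amounts to a small transformation of the matrix. The sketch below represents a tpm as a dictionary of rows (an encoding chosen for this example; `null_rows` and `with_dynamic_deadlocks` are hypothetical names), finds the null rows, and adds a loop-edge of probability 1 to each of them.

```python
def null_rows(tpm):
    """States whose row contains only zeros (static deadlocks)."""
    return [X for X, row in tpm.items() if not any(row.values())]

def with_dynamic_deadlocks(tpm):
    """Copy of tpm in which every null row X gets tpm[X][X] = 1."""
    fixed = {X: dict(row) for X, row in tpm.items()}
    for X in null_rows(tpm):
        fixed[X][X] = 1.0
    return fixed

# Toy matrix: state "s1" has no successor, "s2" has a genuine loop-transition.
tpm = {
    "s0": {"s0": 0.0, "s1": 0.5, "s2": 0.5},
    "s1": {"s0": 0.0, "s1": 0.0, "s2": 0.0},
    "s2": {"s0": 0.0, "s1": 0.0, "s2": 1.0},
}
fixed = with_dynamic_deadlocks(tpm)
assert null_rows(tpm) == ["s1"]
assert fixed["s1"] == fixed_row if (fixed_row := {"s0": 0.0, "s1": 1.0, "s2": 0.0}) else False
```

After the transformation, rows "s1" and "s2" have the same shape, which illustrates why dynamic deadlocks can no longer be told apart from ordinary loop-transitions by inspecting the matrix alone.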
Deadlocks tend to increase as the coupling between the interacting parties becomes stronger:

Proposition 1. Let P and Q be two labeled bool nets, and D(P|β|Q) be the set of deadlock states of system P|β|Q. Then, $β_1 < β_2$ implies $D(P|β_1|Q) \subseteq D(P|β_2|Q)$.

Proof.
We prove by contradiction that if global state x is a deadlock for $P|β_1|Q$, it is also a deadlock for $P|β_2|Q$. Assume x is not a deadlock for $P|β_2|Q$. Then, $P|β_2|Q$ can perform (at least) a labeled transition $x \xrightarrow{a} y$. If $a \in \{1 \ldots β_2\}$, the transition is a synchronization between P and Q, supported by the inference rule in Equation (10): then, either $a \in \{1 \ldots β_1\}$ or $a \in \{β_1 + 1 \ldots β_2\}$. In the first case, transition $x \xrightarrow{a} y$ would be feasible also for $P|β_1|Q$ (a contradiction); in the second case, the two component transitions $x_P \xrightarrow{a} y_P$ and $x_Q \xrightarrow{a} y_Q$ would enable, by the inference rules in Equations (8) and (9), two global, interleaving transitions of $P|β_1|Q$ (a contradiction).
If, on the other hand, $a \notin \{1 \ldots β_2\}$, then $x \xrightarrow{a} y$ is an interleaving transition for $P|β_2|Q$, which would be a fortiori a feasible interleaving transition for $P|β_1|Q$ (a contradiction).
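The monotonic growth of the deadlock set with β can also be checked experimentally. The sketch below generates two random labeled (3, 2)-bool nets, lets each of them run in hybrid mode, and computes the deadlock set of the composition for every β from 0 to 2nk, using the deadlock conditions stated earlier (no local label above β, and no shared label up to β). The generation scheme, the seed and all function names are assumptions made for this example.

```python
import random
from itertools import product

def local_transitions(n, k, rng):
    """Hybrid-mode labeled transitions of a random (n, k)-bool net.

    Returns a dict: state -> set of (label, successor); each transition
    gets a random label in {1 .. 2nk}.
    """
    inputs = [[rng.randrange(n) for _ in range(k)] for _ in range(n)]
    tables = [{a: rng.randint(0, 1) for a in product((0, 1), repeat=k)}
              for _ in range(n)]
    trans = {}
    for X in product((0, 1), repeat=n):
        new = [tables[i][tuple(X[j] for j in inputs[i])] for i in range(n)]
        successors = {tuple(new[i] if g[i] else X[i] for i in range(n))
                      for g in product((0, 1), repeat=n)}
        trans[X] = {(rng.randint(1, 2 * n * k), Y) for Y in successors}
    return trans

def deadlocks(TP, TQ, beta):
    """Global states of P|beta|Q from which no transition is possible."""
    dead = set()
    for XP, XQ in product(TP, TQ):
        labels_P = {a for a, _ in TP[XP]}
        labels_Q = {a for a, _ in TQ[XQ]}
        interleave = any(a > beta for a in labels_P | labels_Q)
        synchronize = any(a <= beta for a in labels_P & labels_Q)
        if not (interleave or synchronize):
            dead.add((XP, XQ))
    return dead

n, k = 3, 2
rng = random.Random(1)
TP, TQ = local_transitions(n, k, rng), local_transitions(n, k, rng)
sets = [deadlocks(TP, TQ, beta) for beta in range(2 * n * k + 1)]
assert all(a <= b for a, b in zip(sets, sets[1:]))   # nested deadlock sets
print([len(s) for s in sets])
```

The assertion holds for any choice of nets and labels, in agreement with the proposition; the printed counts depend on the random seed.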
Furthermore, given composition P|β|Q, we can establish the following relations among the deadlock sets for the different execution modes.

Proposition 2. Let P and Q be two labeled bool nets, let $P_{mode}|β|Q_{mode}$ be system P|β|Q executed in the specified mode, and let D be the deadlock set function of Proposition 1. Then, (i) $D(P_{hybrid}|β|Q_{hybrid}) \subseteq D(P_{sync}|β|Q_{sync})$; and (ii) $D(P_{hybrid}|β|Q_{hybrid}) \subseteq D(P_{async}|β|Q_{async})$.

Proof. Part (i). We show by contradiction that, if global state x is a deadlock for $P_{hybrid}|β|Q_{hybrid}$, it is also a deadlock for $P_{sync}|β|Q_{sync}$. If x were not a deadlock for $P_{sync}|β|Q_{sync}$, then this system could escape state x by some transition involving the simultaneous firing of all nodes of $B_P$ and $B_Q$ (by the inference rule in Equation (10)), or the firing of all nodes of $B_P$ or $B_Q$ (by the inference rule in Equation (8) or Equation (9)). These three firing scenarios are feasible also under the hybrid execution mode (see the definitions of the firing group sets FG for the three modes in Section 2), yielding a transition escaping state x also for system $P_{hybrid}|β|Q_{hybrid}$, a contradiction.
The proof for Part (ii) is analogous.

Figure 1 shows the count of deadlock states, out of $2^{5+5} = 1024$ possible states, as a function of the coupling parameter β, for the parallel composition $P_m(5, 3)|β|Q_m(5, 3)$ of two randomly generated, labeled (5, 3)-bool nets executed in mode m = sync, async or hybrid. The plots in Figure 1 provide experimental evidence for Propositions 1 and 2. Indeed, they might also suggest that a deadlock state x for $P_{async}|β|Q_{async}$ must also be a deadlock state for $P_{sync}|β|Q_{sync}$. However, this is not always the case, as shown by the following simple counterexample.
Assume P and Q are two labeled (n, k)-bool nets with n = 2 and k = 1. P and Q have identical topology (node 1 reads node 2 and vice versa), and all nodes are associated with the same bit-flip boolean function. The label set is {1, 2, 3, 4}. Assume labeling functions $L_P$ and $L_Q$ are defined so that […] Then, if we impose maximum synchronization between P and Q, by writing P|4|Q, we find that global state x = (0, 0, 0, 0) is a deadlock for $P_{async}|4|Q_{async}$ (since $P_{async}$ can only perform local […], with no label matching between P and Q) but it is not a deadlock for $P_{sync}|4|Q_{sync}$ (since $P_{sync}$ and $Q_{sync}$ can synchronize by performing local transitions $(0, 0) \xrightarrow{1} (1, 1)$).

Bool Nets as P<α>Q sharVar Compositions
In analogy with the expression P|β|Q for the composition of two separate bool nets P and Q by shared transitions (Section 4), we let P<α>Q denote a single bool net whose nodes are partitioned into sets B P and B Q , and where there are exactly α "bridges", i.e., directed edges with one endpoint in B P and the other in B Q . Bridges allow the two bool net parts-called P and Q-to share and cross-read some of their variables; the other edges are "local" to P or Q. We take α as the degree of coupling between P and Q. Furthermore, P α and Q α , later equivalently denoted P * and Q * , represent the two components after separation: a bridge directed from P to Q (or vice versa) turns into a dangling edge of Q (or P), with no specified source node (the notation P α and Q α is meant to recall the presence of α bridges in the original, uncut bool net; however, it may still happen that one of the components, or both, when α = 0, has no dangling edges after separation).
What if we are now given two independent bool nets P and Q and we want to derive from them some system P<α>Q with target coupling factor α? This is done by some surgery: we turn α local edges of P and/or Q into bridges between P and Q. The choice of which local edge to turn into a bridge is made at random, and so is the choice of a new source node for it.
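The surgery just described can be sketched as follows. Each net is given by its per-node source lists; the function picks α distinct local edges at random and re-attaches the source end of each to a random node of the other net, leaving targets (and hence in-degrees) untouched. The encoding of nodes as ('P', i) or ('Q', i) pairs, the function name `couple` and the seed handling are assumptions of this example.

```python
import random

def couple(inputs_P, inputs_Q, alpha, seed=0):
    """Edge list of a joint net with exactly alpha bridges.

    inputs_P[i] (resp. inputs_Q[i]) lists the local source nodes of node i
    of P (resp. Q). Edges are (source, target) pairs of tagged nodes.
    """
    rng = random.Random(seed)
    edges = [[("P", s), ("P", i)] for i, srcs in enumerate(inputs_P) for s in srcs]
    edges += [[("Q", s), ("Q", i)] for i, srcs in enumerate(inputs_Q) for s in srcs]
    sizes = {"P": len(inputs_P), "Q": len(inputs_Q)}
    for edge in rng.sample(edges, alpha):            # alpha distinct local edges
        other = "Q" if edge[1][0] == "P" else "P"
        edge[0] = (other, rng.randrange(sizes[other]))   # new source across the cut
    return [tuple(edge) for edge in edges]

# Two (3, 2)-nets: 2nk = 12 edges overall, so alpha may range from 0 to 12.
inputs_P = [[0, 1], [1, 2], [2, 0]]
inputs_Q = [[1, 1], [0, 2], [2, 1]]
joint = couple(inputs_P, inputs_Q, alpha=4)
bridges = [(s, t) for s, t in joint if s[0] != t[0]]
assert len(bridges) == 4
```

Since the two nets contribute nk edges each, the maximum number of bridges obtainable in this way is 2nk.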
Thus, while in building P|β|Q the two arguments of the composition are unaffected, except for the addition of the labeling functions L P and L Q , for building P<α>Q we do change the topology of the components, although the node sets B P and B Q and the sets of boolean functions F P and F Q are preserved. Strictly, <α> should not be regarded as an algebraic operator, since the operation affects the operands. However, notations P<α>Q and P|β|Q are useful for highlighting the system bipartition and the involved degree of coupling.
Let us now clarify a final, subtle point about execution modes for the two types of cooperation-<α> and |β|. While expression P m |β|Q m completely defines the behavior of the system, expression P m <α>Q m would not.
In the first case, execution mode m defines the individual behaviors of P m and Q m in terms of their possible firing groups, while |β| defines the possible transition pairings, i.e., whether or not, given current state X PQ , a firing group of P can fire simultaneously with one of Q, which depends on the involved transition labels. In other words, m defines the firing groups at the local level and |β| controls them at the global level, by the mediation of transition labels.
In the second case, the potential firing groups of P m and Q m are well defined too, but we have no indication of how they should be combined to yield global transitions: should they act simultaneously or not? The solution is to understand the execution mode as applied to the net as a whole. Correspondingly, the correct, unambiguous notation for the sharVar cooperation mechanism would be (P<α>Q) m , although this will often be left implicit.

Integrated Information φ̄ for P<α>Q (sharVar)
In very abstract terms, state-dependent integrated information φ(Y), relative to a global system state Y, reduces to the distance or difference d between two Y-dependent probability distributions:

$$\varphi(Y) = d\big(x_{coop}(Y),\; x_{indep}(Y)\big) \tag{12}$$

Note the slight abuse of notation: x(Y) denotes here, and in similar contexts in the sequel, a distribution x that depends, as a whole, on some (state) value Y; x(X) elsewhere is used to select a specific element of distribution x. The meaning should be clear from the context, and is facilitated by our consistent use of symbols x/X and y/Y, for predecessor and successor states. In IIT terminology, x(Y) denotes a cause repertoire, as is clear in Equation (15); similarly, y(X) would denote an effect repertoire.
Furthermore, x coop (Y), x indep (Y) and consequently φ(Y) are defined with a system partition {P, Q} in mind (we restrict to bipartitions) (strictly, φ should refer to a specific partition, namely the Minimum Information Partition (MIP) [1,2], but we apply it to any (bi)partition). Distribution x coop refers to the system behavior in which parts P and Q cooperate according to the relevant interaction mechanism, e.g., sharVar or sharTrans. With distribution x indep the parts are assumed to operate independently. Hence, their difference d is meant to measure the added value provided by cooperation over independent operation.
In IIT 2.0 [1], d is relative entropy dkl (Kullback-Leibler divergence):

$$dkl[x_1 \,\|\, x_2] = \sum_{i=1}^{N} x_1(X_i)\,\log_2\frac{x_1(X_i)}{x_2(X_i)} \tag{13}$$

where x 1 and x 2 are two distributions on the same discrete domain {X 1 . . . X N }. Note that dkl[x||x] = 0: the dkl of two equal distributions is null. Note also that, in light of Equation (13), one can express the mutual information in Equation (6) as follows:

$$M(x_1, x_2) = dkl[x_1 x_2 \,\|\, x_1 \times x_2] \tag{14}$$

where x 1 and x 2 are two random variables with joint distribution denoted x 1 x 2 and respective marginal distributions x 1 and x 2 (we hope symbol overloading is not too confusing here!). Symbol "×" in Equation (14) denotes distribution product, which is defined in the main text.
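As a small illustration of the two quantities just introduced, the following Python sketch computes relative entropy in bits and derives mutual information from it as the divergence between a joint distribution and the product of its marginals. The function names and the representation of a joint distribution as a nested list are assumptions made for this example only.

```python
from math import log2

def dkl(x1, x2):
    """Relative entropy dkl[x1||x2] in bits.

    Terms with x1(X) = 0 contribute 0; the value is undefined when
    x1(X) > 0 and x2(X) = 0 for some X.
    """
    total = 0.0
    for a, b in zip(x1, x2):
        if a > 0:
            if b == 0:
                raise ValueError("dkl undefined: x1 > 0 where x2 = 0")
            total += a * log2(a / b)
    return total

def mutual_information(joint):
    """M(x1, x2) = dkl[joint || x1 × x2] for a joint given as a matrix."""
    m1 = [sum(row) for row in joint]            # marginal of x1
    m2 = [sum(col) for col in zip(*joint)]      # marginal of x2
    flat_joint = [p for row in joint for p in row]
    flat_prod = [a * b for a in m1 for b in m2]  # distribution product
    return dkl(flat_joint, flat_prod)
```

Two perfectly correlated fair bits give one bit of mutual information, while two independent fair bits give zero.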
Consider an (n, k)-bool net (P<α>Q) m executed in mode m (sync, async or hybrid), denoted PQ for short. The behavior of the net is fully defined by the transition probability matrix tpm m PQ . It is easy to see that, regardless of the mode m, tpm m PQ cannot have null-rows (all 0s), corresponding to deadlocks. Note, however, that one may find null-columns with the sync and async modes, corresponding to "Garden of Eden" states Y PQ that have no predecessor state X PQ . This does not happen with the hybrid mode, since the firing groups for this mode include the empty firing group (no node fires), which creates a loop-edge: any state has itself as a predecessor.
For the subsequent definitions of integrated information for PQ, we also need tpm m P * and tpm m Q * : these are the tpms that characterize the independent behaviors, under mode m, of P * and Q * , i.e., the two components P and Q after separation, when the data flowing across the α bridges from one to the other are lost due to the cut, and replaced by white noise, i.e., uniformly distributed bit tuples.
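One possible reading of the noising step, sketched below in Python, derives the tpm of P * from the global tpm by averaging over uniformly distributed states of Q and summing over all successor states of Q. This marginalization is an assumption made for illustration (it matches the "cumulative contribution" wording used later in the proof of Proposition 3, but replacing each dangling edge with an independent random bit may differ when several bridges share a source node). The state indexing, with the bits of P as the most significant ones, is also an assumption.

```python
def noised_tpm_p(tpm_pq, n_p, n_q):
    """tpm of the cut component P* from the global tpm.

    Global states are indexed as X_P.X_Q, i.e. index = x_p * 2**n_q + x_q
    (assumed convention). The component Q* is obtained symmetrically.
    """
    size_p, size_q = 2 ** n_p, 2 ** n_q
    tpm_p = [[0.0] * size_p for _ in range(size_p)]
    for x_p in range(size_p):
        for x_q in range(size_q):
            row = tpm_pq[x_p * size_q + x_q]
            for y_p in range(size_p):
                # Sum over every Y_Q, average over uniformly distributed X_Q.
                block = row[y_p * size_q:(y_p + 1) * size_q]
                tpm_p[x_p][y_p] += sum(block) / size_q
    return tpm_p
```

For a two-node net in which each node copies the other in sync mode, the cut component sees pure noise and its tpm has all entries equal to 0.5.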
We are finally ready to actualize the abstract definition of Equation (12) into the concrete definition given in [1]. The state-dependent integrated information φ m P<α>Q (Y PQ ) for global state Y PQ of (n, k)-bool net PQ = P<α>Q executed in mode m is:

$$\varphi^m_{P\langle\alpha\rangle Q}(Y_{PQ}) = dkl\big[\,pre^m_{P\langle\alpha\rangle Q}(Y_{PQ}) \;\big\|\; pre^m_{P^*}(Y_P) \times pre^m_{Q^*}(Y_Q)\,\big] \tag{15}$$

where:
• pre m P<α>Q (Y PQ ) is the distribution of the predecessors of state Y PQ , obtained by normalizing tpm m P<α>Q ( * , Y PQ ), the Y PQ -indexed column of tpm m P<α>Q ;
• Y P and Y Q are the P and Q components of state Y PQ : Y PQ = Y P .Y Q (concatenation);
• pre m P * (Y P ) and pre m Q * (Y Q ) are the distributions of the predecessors of, respectively, Y P and Y Q , obtained as done for pre m P<α>Q (Y PQ ) but using, respectively, tpm m P * and tpm m Q * ; and
• "×" is distribution multiplication: if d 1 and d 2 are probability distributions defined, respectively, over {0, 1} n1 and {0, 1} n2 (the sets of bit tuples of lengths n1 and n2) and d = d 1 × d 2 is the distribution product, then, for the generic (n1 + n2)-bit tuple X n1+n2 = X n1 .X n2 , we have d(X n1+n2 ) = d 1 (X n1 ) * d 2 (X n2 ).
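The ingredients of this definition (predecessor distributions obtained from tpm columns, distribution product, relative entropy) can be put together in a short Python sketch. The function names, the representation of tpms as nested lists and the indexing convention index = y_p * |states of Q| + y_q are assumptions made for illustration.

```python
from math import log2

def pre(tpm, y):
    """Predecessor distribution of state y: the normalized y-th column."""
    col = [row[y] for row in tpm]
    total = sum(col)
    if total == 0:
        raise ValueError("state has no predecessors (Garden of Eden)")
    return [c / total for c in col]

def product(d1, d2):
    """Distribution product: d(X1.X2) = d1(X1) * d2(X2)."""
    return [a * b for a in d1 for b in d2]

def dkl(x1, x2):
    """Relative entropy in bits; assumes no mismatch between x1 and x2."""
    return sum(a * log2(a / b) for a, b in zip(x1, x2) if a > 0)

def phi_state(tpm_pq, tpm_p_cut, tpm_q_cut, y_p, y_q):
    """State-dependent phi for global state Y_PQ = Y_P.Y_Q."""
    size_q = len(tpm_q_cut)
    x_coop = pre(tpm_pq, y_p * size_q + y_q)
    x_indep = product(pre(tpm_p_cut, y_p), pre(tpm_q_cut, y_q))
    return dkl(x_coop, x_indep)
```

For a two-node net in which each node copies the other in sync mode, every state has exactly one predecessor, while the cut components yield a uniform product distribution over four states: the resulting value is log2(4) = 2 bits.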
(The interested reader can find in [14] a freely downloadable demonstration tool illustrating state-dependent φ for bool nets executed in the standard, sync mode, for generic partitions.)

The averaged form φ̄ m dkl (P<α>Q) of integrated information (subscript "dkl" is convenient in light of subsequent developments) is defined as a weighted sum, over all states Y PQ , of the state-dependent values φ m P<α>Q (Y PQ ), a weighted sum that we conveniently express as a dot product ("."):

$$\bar\varphi^m_{dkl}(P\langle\alpha\rangle Q) = post^m_{P\langle\alpha\rangle Q}(\bar x_{PQ}) \cdot \varphi^m_{P\langle\alpha\rangle Q}(Y_{PQ}) \tag{16}$$

where:
• x̄ PQ denotes the uniform distribution of PQ states (n-tuples of bits);
• post m P<α>Q (x̄ PQ ), expressing the weights of the sum, is the distribution of the successors of state distribution x̄ PQ ;
• the second factor of the dot product is the vector of the values φ m P<α>Q (Y PQ ) for all bit n-tuples Y PQ , listed in lexicographic order.

Note that we conceive functions pre and post to be applicable both to a specific state (some bit tuple X or Y) and to a distribution of such states, e.g., to x̄ PQ . No ambiguity arises, since we always use lowercase to denote random variables or their distributions (x and y), and uppercase to denote specific state values (X and Y). (Using a distribution as argument of pre or post is preferred, since a specific state, e.g., state {0, 0, 1} of a three-bit bool net, can be represented as distribution {0, 1, 0, 0, 0, 0, 0, 0} assigning probability 1 to the second triple of bits, when these are presented in lexicographic order, and probability 0 to all other triples.)

In [15], it is shown that φ̄ dkl (P<α>Q) can also be computed as M(x̄ PQ , y PQ ) − [M(x̄ P * , y P * ) + M(x̄ Q * , y Q * )], where the barred symbols denote uniform distributions (maximum entropy), and M is mutual information between current and next state, both referred to the global system PQ and to the two noised components P * and Q * . In our experiments, we took advantage of this alternative definition, which is computationally more efficient.
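The two routes to the averaged measure (weighted sum of state-dependent values, and difference of mutual informations) can be compared on a toy example with the Python sketch below. It is a minimal illustration, assuming tpms given as nested lists, the indexing convention index = y_p * |states of Q| + y_q, and cut-component tpms supplied by the caller; all function names are hypothetical.

```python
from math import log2

def post(tpm, x):
    """Successor distribution of state distribution x (row vector times tpm)."""
    n = len(tpm)
    return [sum(x[i] * tpm[i][j] for i in range(n)) for j in range(n)]

def pre(tpm, y):
    """Predecessor distribution of state y (normalized column)."""
    col = [row[y] for row in tpm]
    total = sum(col)
    return [c / total for c in col]

def dkl(x1, x2):
    return sum(a * log2(a / b) for a, b in zip(x1, x2) if a > 0)

def phi_bar(tpm_pq, tpm_p_cut, tpm_q_cut):
    """Weighted sum of state-dependent phi values, weights post(uniform)."""
    size_q = len(tpm_q_cut)
    n = len(tpm_pq)
    weights = post(tpm_pq, [1.0 / n] * n)
    total = 0.0
    for y, w in enumerate(weights):
        if w == 0:
            continue  # state without predecessors: weight 0, term skipped
        y_p, y_q = divmod(y, size_q)
        x_indep = [a * b for a in pre(tpm_p_cut, y_p) for b in pre(tpm_q_cut, y_q)]
        total += w * dkl(pre(tpm_pq, y), x_indep)
    return total

def mutual_info(tpm):
    """M between a uniformly distributed current state and the next state."""
    n = len(tpm)
    y = post(tpm, [1.0 / n] * n)
    return sum(tpm[i][j] / n * log2(tpm[i][j] / y[j])
               for i in range(n) for j in range(n) if tpm[i][j] > 0)

def phi_bar_via_mi(tpm_pq, tpm_p_cut, tpm_q_cut):
    return mutual_info(tpm_pq) - (mutual_info(tpm_p_cut) + mutual_info(tpm_q_cut))
```

On the two-node net in which each node copies the other in sync mode, both functions return 2 bits.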

The dkl-Mismatch Problem for P<α>Q
In light of its definition in Equation (13), dkl[x 1 ||x 2 ] is undefined when x 1 (X i ) > 0 and x 2 (X i ) = 0 for some state X i : this is what we call the "dkl-mismatch problem".
Proposition 3 establishes that this problem does not arise with the φ̄ m dkl (P<α>Q) we are considering, at least relative to two execution modes.

Proposition 3. For m = sync and m = hybrid, the computation of φ̄ m dkl (P<α>Q) is not affected by the dkl-mismatch problem.

Proof. In light of the definition in Equation (15), we must prove that, when an element of distribution pre m P<α>Q (Y PQ ) is different from zero, so is the corresponding element of distribution pre m P * (Y P ) × pre m Q * (Y Q ). For notational convenience, let these two distributions be called, respectively, x 1 and x 2 , as in Equation (13). Then, we must prove that, for any state X PQ : x 1 (X PQ ) > 0 implies x 2 (X PQ ) > 0, in sync and hybrid mode. Now, x 1 (X PQ ) > 0 means that there exists (at least) one transition X PQ → Y PQ triggered by some firing group f g m PQ . Correspondingly, tpm m P<α>Q (X PQ , Y PQ ) > 0. Representing firing groups as node sets, and observing that the firing groups of the whole system P<α>Q and of its parts are independent of α, we can take advantage of Equations (1)-(3), which refer to P ∪ Q ≡ P<0>Q, finding that a global firing group can be decomposed into two local firing groups, under the same mode, f g m PQ = f g m P ∪ f g m Q , only for m = sync and m = hybrid (for m = async, f g async PQ must include exactly one node, while any f g async P and any f g async Q must include one node each, so that f g async P ∪ f g async Q includes two).
As a consequence, using a functional notation for transitions, for m = sync and m = hybrid we can write:

$$Y_P.Y_Q = fg^m_{PQ}(X_P.X_Q) = fg^m_P(X_P.X_Q)\,.\,fg^m_Q(X_P.X_Q)$$

The fact that Y P = f g m P (X P .X Q ) guarantees that tpm m P * (X P , Y P ) > 0: by definition, element tpm m P * (X P , Y P ) of "noised" matrix tpm m P * is obtained by the cumulative contribution of all values tpm m PQ (X P . * , Y P . * ), and we have assumed above that at least one of them, namely tpm m P<α>Q (X P .X Q , Y P .Y Q ), gives a non-null contribution. Similarly, we find that tpm m Q * (X Q , Y Q ) > 0. We conclude that the cut sub-systems P * and Q * separately support transitions X P → Y P and X Q → Y Q , meaning that distribution pre m P * (Y P ) (respectively, pre m Q * (Y Q )) assigns a non-null probability to its X P -indexed (respectively, X Q -indexed) element. Since x 2 was defined as pre m P * (Y P ) × pre m Q * (Y Q ), we conclude that x 2 (X P .X Q ) > 0.
The "anomaly" of the async mode already observed in Equations (1)-(3), and in Section 2, is further highlighted in Proposition 3. For these reasons, we drop this execution mode, and in the sequel m is only sync or hybrid.
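The decomposition argument used in the proof can be checked mechanically. The Python sketch below enumerates firing groups as node sets for the three modes (one group with all nodes for sync, singletons for async, every subset including the empty one for hybrid, as described in the text) and tests whether the unions of local firing groups coincide with the global ones. Function names are illustrative assumptions.

```python
from itertools import combinations

def firing_groups(nodes, mode):
    """Firing groups of a bool net, represented as frozensets of nodes."""
    nodes = list(nodes)
    if mode == "sync":
        return [frozenset(nodes)]                      # all nodes fire together
    if mode == "async":
        return [frozenset([v]) for v in nodes]         # exactly one node fires
    if mode == "hybrid":
        return [frozenset(c)                           # any subset, also empty
                for r in range(len(nodes) + 1)
                for c in combinations(nodes, r)]
    raise ValueError("unknown mode: " + mode)

def decomposes(p_nodes, q_nodes, mode):
    """True if {fg_P ∪ fg_Q} equals the set of global firing groups."""
    global_groups = set(firing_groups(list(p_nodes) + list(q_nodes), mode))
    unions = {fp | fq
              for fp in firing_groups(p_nodes, mode)
              for fq in firing_groups(q_nodes, mode)}
    return unions == global_groups
```

For two five-node components the check succeeds for sync and hybrid and fails for async, where each union contains two nodes while a global async firing group contains one.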
The computation ofφ m dkl (P<α>Q) is not even affected by the presence of "Garden of Eden" states. If Y PQ is such a state, we might perhaps represent the predecessor distribution pre m P<α>Q (Y PQ ) as the null "probability vector"; then, regardless of the second argument pre m P * (Y P ) × pre m Q * (Y Q ) of dkl, we would obtain φ m P<α>Q (Y PQ ) = 0. However, in any case, these null values are selected away in the weighted sum of the definition in Equation (16), since state distribution post m P<α>Q (x PQ )-providing the weights-assigns probability 0 to Y PQ since, by the definition of Garden of Eden state, none of the X PQ s can transition to the latter.

Statistical Results for φ̄ m dkl (P<α>Q)
Having defined φ̄ m dkl (P<α>Q), we wish to investigate the dependence of this measure on α, the degree of coupling between P and Q when they cooperate by shared variables.
Letting n = 5 and k = 3, we have built ten pairs (P i (n, k), Q i (n, k)), i = 1 . . . 10, of randomly generated (n, k)-bool nets and have derived, for each pair, the sequence of P i <α>Q i systems for values α = 0, 1 . . . 2nk (with 2nk = 30), as described at the beginning of Section 5. Then, for each α, we have computed the mean of φ̄ m dkl (P i <α>Q i ) over i = 1 . . . 10 and the associated standard deviation, for m = sync and hybrid. The results of the simulation are shown in the plots of Figure 2.

The plots in Figure 2 confirm the intuitive expectation that integrated information grows with the coupling factor α between P and Q. Recall that φ̄ is a weighted sum of φ(Y PQ ) values (Equation (16)), and that φ(Y PQ ) is a dkl "distance" between an x coop and an x indep distribution (Equation (15)). As α grows, it is to be expected that x coop and x indep drift apart, since a larger α means a stronger mutual influence between the behaviors of P and Q, thus a more marked departure from the behaviors they exhibit when acting independently.
The fact that φ̄ sync dkl > φ̄ hybrid dkl can be intuitively explained as follows. Due to the different sizes of the involved firing group sets (|FG sync PQ | = 1 and |FG hybrid PQ | = 2^(2n)), in sync mode the successor distribution post sync PQ (X PQ ) is "punctual" (one element has probability 1, the others have probability 0), while post hybrid PQ (X PQ ) is much more spread over the states. An analogous difference in spread can be observed also when looking backward, with predecessor distributions pre sync PQ (Y PQ ) and pre hybrid PQ (Y PQ ). The local post and pre distributions for P and Q follow a similar pattern with respect to sync vs. hybrid. As a consequence, the two argument distributions of dkl are closer to each other for m = hybrid than for m = sync, due to the higher spread of the distributions for the hybrid mode. Note that the local distributions are also affected by noise injection, which cannot but amplify their spread, pushing them closer to the global distribution with higher spread, namely the distribution for the hybrid mode, and smaller distance means smaller φ̄.

Statistical Results for φ̄ m Manh (P<α>Q) Using Manhattan Distance
As shown later, for the sharTrans cooperation mechanism |β| the dkl-mismatch problem becomes pervasive. Thus, for enabling comparisons between sharVar and sharTrans cooperation in terms of integrated information, we consider also a state-dependent version φ̂ of this measure in which the dkl of Equation (15) is replaced by Manhattan distance (Manh):

$$\hat\varphi^m_{P\langle\alpha\rangle Q}(Y_{PQ}) = Manh\big(\,pre^m_{P\langle\alpha\rangle Q}(Y_{PQ}),\; pre^m_{P^*}(Y_P) \times pre^m_{Q^*}(Y_Q)\,\big) \tag{17}$$

φ̂ m P<α>Q (Y PQ ) is then used, as in Equation (16), for obtaining the corresponding state-independent φ̄ m Manh (P<α>Q), where the second factor of the dot product is the vector of the φ̂ values over all states Y PQ :

$$\bar\varphi^m_{Manh}(P\langle\alpha\rangle Q) = post^m_{P\langle\alpha\rangle Q}(\bar x_{PQ}) \cdot \hat\varphi^m_{P\langle\alpha\rangle Q}(Y_{PQ}) \tag{18}$$

We have conducted a statistical analysis for φ̄ m Manh (P<α>Q) analogous to that presented in Section 6.2. The results are illustrated in Figure 3. This figure indicates that Manhattan distance broadly agrees with dkl (while not suffering from the mismatch problem), and confirms the two general facts already established with Figure 2: integrated information grows with the coupling factor α, and is higher for the sync than for the hybrid execution mode.
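The practical advantage of Manhattan distance is that it stays finite for every pair of distributions, including pairs that would make dkl undefined. The Python sketch below illustrates this; it assumes the plain L1 form (sum of absolute differences, without normalizing factor), which is an assumption of this example since the distance is not spelled out here.

```python
def manh(d1, d2):
    """Manhattan (L1) distance between two distributions on the same domain.

    Assumed form: sum of absolute differences, no normalizing factor.
    """
    return sum(abs(a - b) for a, b in zip(d1, d2))

def phi_hat_state(x_coop, x_indep):
    """State-dependent 'hat' measure: Manhattan distance in place of dkl."""
    return manh(x_coop, x_indep)

# A pair with a mismatch (x_coop > 0 where x_indep = 0): dkl would be
# undefined here, while the Manhattan-based measure is well defined.
x_coop = [0.5, 0.5, 0.0, 0.0]
x_indep = [0.0, 0.5, 0.5, 0.0]
value = phi_hat_state(x_coop, x_indep)
```

Under this assumed form, the distance ranges from 0 (equal distributions) to 2 (distributions with disjoint supports).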

Integrated Information φ̄ for P|β|Q (LOTOS sharTrans)
How can one define integrated information φ̄ for sharTrans composition P|β|Q? The problem reduces to one of adapting to the new context the state-dependent measure φ(Y) of Equation (12), which is given a concrete form in Equation (15).
In fact, in dkl(x coop (Y), x indep (Y)), the first argument is readily defined even in the new setting, since it refers to the "cooperative" behavior of P|β|Q in which P and Q interact as specified by operator |β|. This behavior is fully defined by tpm m P|β|Q , which is, in turn, fully defined by the inference rules in Equations (8)-(10). The difficulty arises with the definition of x indep : what does it mean, in the sharTrans context, for P and Q to operate independently?
Our proposed solution to this problem stems from the observation that, while under sharVar the cooperating parts share knowledge about each other's variables, under sharTrans they share knowledge about the order of transitions in time, since each part must follow the ordering of transitions of the other, at least limited to the transitions whose labels are in the synchronization set β. We must then conclude that the absence of cooperation occurs when there is no shared knowledge about local transition ordering, and no concern to agree on it. This immediately suggests identifying independent behavior, and x indep in expression dkl(x coop , x indep ), with the pure interleaving composition P|||Q (see Section 3). The resulting definition of state-dependent integrated information for the |β| mechanism is then:

$$\varphi^m_{P|\beta|Q}(Y_{PQ}) = dkl\big[\,pre^m_{P|\beta|Q}(Y_{PQ}) \;\big\|\; pre^m_{P|||Q}(Y_{PQ})\,\big] \tag{19}$$

Note that the above definition is conceptually (and computationally) simpler than the corresponding definition for <α> (Equation (15)): no input noise for the cut components (P * and Q * ) is involved, and the second argument of dkl is not a distribution product but simply the first element of the sequence P|β|Q for β = 0, 1 . . . .
Then, the definitions of state-independent integrated information φ̄ for P|β|Q and for P<α>Q are essentially the same (compare with Equation (16)), the second factor of the dot product being the vector of φ values over all states Y PQ :

$$\bar\varphi^m_{dkl}(P|\beta|Q) = post^m_{P|\beta|Q}(\bar x_{PQ}) \cdot \varphi^m_{P|\beta|Q}(Y_{PQ}) \tag{20}$$

Note that n is now the number of nodes in P, which has the same size as Q, yielding a total of 2n nodes.

The dkl-Mismatch Problem for P|β|Q
As anticipated, the dkl-mismatch problem becomes pervasive with systems of type P|β|Q. Let state Y PQ be fixed and consider distributions x 1 = pre m P|β|Q (Y PQ ) and x 2 = pre m P|||Q (Y PQ ) in Equation (19). The mismatch occurs when x 1 (X PQ ) > 0 and x 2 (X PQ ) = 0 for some X PQ . In sync mode, it is easy to imagine a transition X PQ → Y PQ of system P|β|Q such that the two bit tuples X PQ and Y PQ differ both in their P and in their Q component: this may happen when the transition is a synchronization. Since X PQ is a predecessor state of Y PQ , we have x 1 (X PQ ) > 0. On the contrary, no predecessor of Y PQ under system P|||Q can differ from Y PQ in both state components, since system P|||Q must fire one component at a time, as explained at the end of Section 3. Thus, necessarily, x 2 (X PQ ) = 0: this yields the mismatch. The argument for the hybrid mode is analogous, the key point being that a firing group of P|β|Q may involve nodes from both P and Q, while a firing group of P|||Q involves nodes exclusively from one component, by the definition of "|||".
To give an idea of how severe the dkl-mismatch problem is for P|β|Q, we have counted the number of states Y PQ yielding a dkl-mismatch for each of the 310 systems (P i |β|Q i ), for i = 1 . . . 10 and β = 0 . . . 30, where the (P i , Q i ) pairs are those already used in Section 6 (Figure 2). These numbers are collected in the 10 × 31 grey-level matrix of Figure 4.
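The kind of count just described can be sketched as follows. The Python function below takes a cooperative and an independent tpm (nested lists, an assumed representation) and counts the states Y for which some predecessor under cooperation is not a predecessor under independence; normalization of the columns is skipped because it does not change their supports. The toy system, two one-bit components that flip one at a time when interleaved and together when synchronized, is a made-up example, not one of the systems of the paper.

```python
def count_mismatch_states(tpm_coop, tpm_indep):
    """Number of states Y whose coop predecessor distribution has a
    non-zero entry where the indep predecessor distribution has zero."""
    size = len(tpm_coop)
    return sum(
        any(tpm_coop[x][y] > 0 and tpm_indep[x][y] == 0 for x in range(size))
        for y in range(size))

# Toy example: state index = 2 * x_p + x_q, one bit per component.
size = 4
interleaved = [[0.0] * size for _ in range(size)]
synchronized = [[0.0] * size for _ in range(size)]
for x in range(size):
    interleaved[x][x ^ 2] = 0.5   # only P flips its bit
    interleaved[x][x ^ 1] = 0.5   # only Q flips its bit
    synchronized[x][x ^ 3] = 1.0  # both flip in one joint transition

mismatches = count_mismatch_states(synchronized, interleaved)
```

In this toy case every state Y is affected, since its only predecessor under synchronization differs from Y in both components.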

Statistical Results for φ̄ m Manh (P|β|Q) Using Manhattan Distance
In light of the impossibility of using the dkl-based definition of φ (Equation (19)) for |β| cooperation, we switch, again, to a "hat" version φ̂ in which dkl is replaced by Manhattan distance:

$$\hat\varphi^m_{P|\beta|Q}(Y_{PQ}) = Manh\big(\,pre^m_{P|\beta|Q}(Y_{PQ}),\; pre^m_{P|||Q}(Y_{PQ})\,\big) \tag{21}$$

and use it, in turn, for defining the state-independent φ̄ m Manh (P|β|Q), as in Equation (20):

$$\bar\varphi^m_{Manh}(P|\beta|Q) = post^m_{P|\beta|Q}(\bar x_{PQ}) \cdot \hat\varphi^m_{P|\beta|Q}(Y_{PQ}) \tag{22}$$

Analogous to Figure 3, in Figure 5 we plot the values of φ̄ m Manh (P|β|Q) as a function of the coupling factor β, each point obtained by averaging over 10 (P, Q) pairs, both using static deadlocks (Figure 5, left) and dynamic deadlocks (Figure 5, right). The distinction between static and dynamic deadlocks was introduced in Section 4.2. Note that when using static deadlocks (no 1s added on the diagonal of tpm m P|β|Q ) the weights post m P|β|Q (x̄ PQ ) in Equation (22) will in general not total 1, and must be re-normalized.
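The re-normalization of the weights under static deadlocks can be illustrated with a short Python sketch. It assumes a tpm given as a nested list in which a deadlock state appears as an all-zero row; the function name is hypothetical.

```python
def post_weights(tpm):
    """Successor weights of the uniform state distribution.

    Returns the raw weights, which total less than 1 when the tpm has
    null rows (static deadlocks), and their re-normalized version.
    """
    size = len(tpm)
    raw = [sum(tpm[x][y] for x in range(size)) / size for y in range(size)]
    total = sum(raw)
    if total == 0:
        raise ValueError("every state is a deadlock")
    return raw, [w / total for w in raw]

# Three-state toy tpm whose second row is a static deadlock.
tpm = [[0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0],
       [0.5, 0.5, 0.0]]
raw, weights = post_weights(tpm)
```

Here the raw weights total 2/3, the fraction of non-deadlock states, and the re-normalized weights are 0.25, 0.75 and 0.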

Integrated Information φ̄ for P[γ]Q (CCS sharTrans)
Kullback-Leibler divergence dkl is a central element of Integrated Information Theory 2.0 [1]; thus, it is indeed desirable to apply it in the new sharTrans context without incurring the dkl-mismatch problem. In this section, we propose a slightly different version of sharTrans cooperation, directly inspired by Robin Milner's seminal process algebra CCS (Calculus of Communicating Systems) [5], that avoids precisely that problem.
Consider the abstract expression:

$$\varphi^m(Y_{PQ}) = dkl\big[\,pre^m_{coop}(Y_{PQ}) \;\big\|\; pre^m_{indep}(Y_{PQ})\,\big]$$

How can we conceive the cooperative PcoopQ and independent PindepQ behaviors of bipartite system PQ (as defined by matrices tpm m coop and tpm m indep , from which pre m coop (Y PQ ) and pre m indep (Y PQ ) are derived) so that the dkl-mismatch problem is ruled out?
Clearly, a sufficient condition for avoiding dkl-mismatches is the following: the existence of a transition X → Y of PcoopQ implies the existence of a transition X → Y of PindepQ between the same states.
The two CCS behavioral operators of (non-parametric) parallel composition ("P|Q") and (parametric) restriction ("\γ"), where γ is a label set {1, 2 . . . γ} (using the same convention as for β), offer us a way to define cooperation and independence so that they satisfy the above condition.
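In CCS, symbol τ denotes a special internal, not observable transition label: no synchronization is possible with a process that performs a τ-labeled transition. Let A be the set of observable labels and define A + = A ∪ {τ}. Symbol a ranges in A and symbol x ranges in A + . We provide below the four inference rules of the Structural Operational Semantics of CCS for the two mentioned operators (we depart from the standard definition of Milner [5] only in one aspect: we drop the idea of a synchronization based on the matching between a label a and its corresponding "co-label" ā, and revert to the LOTOS requirement that the two labels be simply equal).

$$\frac{P \xrightarrow{x} P'}{P|Q \xrightarrow{x} P'|Q} \tag{23}$$

$$\frac{Q \xrightarrow{x} Q'}{P|Q \xrightarrow{x} P|Q'} \tag{24}$$

$$\frac{P \xrightarrow{a} P' \qquad Q \xrightarrow{a} Q'}{P|Q \xrightarrow{\tau} P'|Q'} \tag{25}$$

$$\frac{P \xrightarrow{x} P' \qquad x \notin \gamma}{P\backslash\gamma \xrightarrow{x} P'\backslash\gamma} \tag{26}$$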
The "interleaving" rules in Equations (23) and (24) establish that any transition that P or Q could perform locally-in itself-can be performed globally by composite system P|Q. Additionally, the rule in Equation (25) establishes that parallel composition P|Q can also perform synchronization transitions whenever two equally labeled observable transitions are available at the two sides. The rule in Equation (26) defines the restriction P\γ as a filter that enables P (which can be itself a two-process composition) to perform a transition only if its label is not in the specified set γ of forbidden labels (thus, τ is always admitted), pruning away all other transitions.
We now combine CCS parallel composition and restriction into the convenient syntactic form

$$P[\gamma]Q \equiv (P|Q)\backslash\gamma$$

where parameter γ is enclosed in square brackets to distinguish it from the LOTOS form "|β|", and use it for actualizing the cooperation and independence relations between P and Q:

$$P\,coop\,Q = P[\gamma]Q, \qquad P\,indep\,Q = P[0]Q = P|Q$$

Note that no deadlock can occur in P|Q. With the LOTOS-based sharTrans composition, we had assumed PindepQ = P|||Q, a form of independence by which P and Q will never deadlock and will never synchronize: their respective transitions can only interleave. This is not the case for the CCS-based approach, where PindepQ = P[0]Q = P|Q: the two independent systems, by the rule in Equation (25), can indeed synchronize any pair of transitions P → P' and Q → Q' carrying the same observable label a.
However, by the rules in Equations (23) and (24), these same transitions can be executed separately, in interleaving-in independence. Cooperation P[γ]Q ≡ (P|Q)\γ, then, consists in ruling out these independent transitions-at least those specified in set γ-while preserving their synchronizations.
One could argue that P|Q already entails a sort of cooperation, via all the synchronization transitions it supports, and that pure LOTOS interleaving P|||Q is a more appropriate form of independence. This is true only in part. The "cooperation" that takes place in P|Q is not private: the a-labeled transition P → P' that P shares with Q, forming a τ-labeled synchronization, is also "offered" separately by P (and by Q too) for further two-way synchronizations with other potential partners. When we apply restriction to the composition, obtaining (P|Q)\γ, we rule out this possibility, and cooperation via joint transitions with labels in γ becomes exclusive to the (P, Q) pair, occurring via a global, τ-labeled transition. In other words, in P|Q, the parties are not forced to wait for each other at specific transitions, as in P|β|Q, while this effect of mutual influence on transition ordering is enforced in P[γ]Q, when restriction is in action.
It is clear that the above sufficient condition for ruling out dkl-mismatches is satisfied by our newly adopted CCS-based definitions since, by the definition of the restriction operator, the transitions of P[γ]Q are a subset of those of P[0]Q.

Deadlocks in P[γ]Q
While deadlocks can never occur in P|Q, they may occur in P[γ]Q ≡ (P|Q)\γ: this happens when P and Q offer disjoint sets of labels, thus preventing any synchronization between them, and when all these labels are members of γ, the set of forbidden labels.
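The composition and restriction rules, the subset property that rules out dkl-mismatches, and the deadlock condition just stated can all be checked on small examples. The Python sketch below is an illustration under stated assumptions: a labeled transition system is represented as a dictionary mapping each state to a list of (label, next state) pairs, observable labels are positive integers, τ is the string "tau", and γ denotes the forbidden set {1, ..., γ}. None of these names comes from the paper.

```python
TAU = "tau"

def parallel(lts_p, lts_q):
    """Transitions of P|Q as (source, label, target) triples."""
    trans = set()
    for p, p_moves in lts_p.items():
        for q, q_moves in lts_q.items():
            for x, p2 in p_moves:                 # P moves alone
                trans.add(((p, q), x, (p2, q)))
            for x, q2 in q_moves:                 # Q moves alone
                trans.add(((p, q), x, (p, q2)))
            for a, p2 in p_moves:                 # synchronization on an
                for b, q2 in q_moves:             # equal observable label
                    if a == b and a != TAU:
                        trans.add(((p, q), TAU, (p2, q2)))
    return trans

def compose(lts_p, lts_q, gamma):
    """P[gamma]Q = (P|Q)\\gamma: prune transitions labeled 1..gamma."""
    forbidden = set(range(1, gamma + 1))
    return {t for t in parallel(lts_p, lts_q) if t[1] not in forbidden}

def deadlocks(lts_p, lts_q, gamma):
    """Global states of P[gamma]Q with no outgoing transition."""
    sources = {src for src, _, _ in compose(lts_p, lts_q, gamma)}
    return {(p, q) for p in lts_p for q in lts_q} - sources

# Toy processes: both offer label 1 in state 0; in state 1 they offer
# the disjoint labels 2 and 3.
P = {0: [(1, 1)], 1: [(2, 0)]}
Q = {0: [(1, 1)], 1: [(3, 0)]}
```

With these toy processes, P[γ]Q never has a transition that P[0]Q lacks, P[0]Q has no deadlock, and P[3]Q deadlocks in every global state from which no synchronization is available.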
In Figure 6, we show the count of deadlock states, out of 1024 possible states, as a function of the coupling parameter γ, for the composition P m (5, 3)[γ]Q m (5, 3) of two randomly generated labeled (5,3)-bool nets executed in modes sync and hybrid. Recall that we have dropped the async mode for its various anomalies. For the remaining two modes, deadlocks under the [γ] and |β| interaction mechanisms seem to behave quite similarly (compare with Figure 1).

Statistical Results for φ̄ m dkl (P[γ]Q)
Before presenting the plots, we need to deal again with static vs. dynamic deadlocks (Section 4.2). Referring to the sync mode, tpms with dynamic deadlocks turn out to be inappropriate for computing φ̄ sync dkl (P[γ]Q), since they would re-introduce the dkl-mismatch problem that we have managed to rule out by switching to the [γ] cooperation mechanism! The reason is that a static deadlock is made dynamic by adding a "1" on the diagonal of tpm sync P[γ]Q , at an otherwise null row. This entry is unlikely to find a non-zero counterpart in tpm sync P[0]Q , since P[0]Q (i.e., P|Q) has no deadlocks (no 1s added on the diagonal) and the only possibility to have a non-zero entry on the diagonal is that an actual loop-transition X PQ → X PQ be possible for that system. However, when P and Q are executed in sync mode, this is unlikely, both when they operate in interleaving (i.e., when only one of them updates all its nodes) and, even worse, when they synchronize (i.e., when all nodes of P and Q are updated). Thus, for the sync mode, the option is to use static deadlocks, with no 1s added on the diagonal of tpm m P[γ]Q . As observed above for the |β| composition, the weights post m P[γ]Q (x̄ PQ ) in Equation (31) will in general not total 1, and must be re-normalized.
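The way dynamic deadlocks can re-introduce a mismatch is easy to reproduce on a toy example. The Python sketch below assumes tpms given as nested lists, with static deadlocks appearing as null rows; the two-state matrices are invented for illustration and are not derived from any bool net of the paper.

```python
def make_dynamic(tpm):
    """Turn static deadlocks (null rows) into dynamic ones by adding a 1
    on the diagonal of each null row."""
    out = [list(row) for row in tpm]
    for x, row in enumerate(out):
        if not any(row):
            row[x] = 1.0
    return out

def mismatch_states(tpm_coop, tpm_indep):
    """States Y having a coop predecessor that is not an indep predecessor."""
    size = len(tpm_coop)
    return [y for y in range(size)
            if any(tpm_coop[x][y] > 0 and tpm_indep[x][y] == 0
                   for x in range(size))]

# Independent behavior: no deadlocks and no loop transitions.
indep = [[0.0, 1.0],
         [1.0, 0.0]]
# Cooperative behavior: the transition out of state 1 has been pruned,
# leaving a static deadlock.
coop_static = [[0.0, 1.0],
               [0.0, 0.0]]
coop_dynamic = make_dynamic(coop_static)
```

With the static version no mismatch arises, because the cooperative transitions are a subset of the independent ones; the loop added by the dynamic version has no counterpart in the independent tpm, so state 1 becomes a mismatch state.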
The dkl-mismatch problem does not arise with the hybrid mode since, using the empty firing group, a loop edge X PQ → X PQ is always possible for system P|Q, for any state X PQ , so the elements on the diagonal of tpm hybrid P[0]Q are all different from 0. In this case, we can then safely use both dynamic deadlocks and static deadlocks with re-normalization.
In Figure 7, we plot the values of φ̄ m dkl (P[γ]Q) as a function of the coupling factor γ, each point obtained by averaging over 10 (P, Q) pairs, using static deadlocks (for the sync and hybrid modes) and dynamic deadlocks (only for the hybrid mode).

Statistical Results for φ̄ m Manh (P[γ]Q) Using Manhattan Distance
The potential mismatch between distributions d 1 and d 2 in dkl[d 1 ||d 2 ] is not a concern when using Manhattan distance Manh(d 1 , d 2 ) for defining φ̄ m Manh (P[γ]Q) (by equations analogous to Equations (30) and (31)). Thus, we can handle both static deadlocks, with renormalization of the weights, and dynamic deadlocks. Figure 8 is analogous to Figure 7, except that Manhattan distance is used in place of dkl.
As stated initially, our interest is primarily in the comparison of cooperation mechanisms (i). It is then useful to aggregate the statistical data collected in the previous sections so that the plots for α, β and γ appear in the same diagram. This is done in Figure 9 which shows, for each fixed choice of execution mode (the columns) and distance function (the rows), the "performance" of the applicable mechanisms in terms of integrated information, as a function of the coupling factor ("coup"). Note that Manhattan distance is represented in Rows 2 and 3, corresponding to systems implementing, respectively, static and dynamic deadlocks: this distinction only affects the β and γ plots, since the α mechanism is immune to deadlocks. For convenience, the α plots in Row 2 are replicated in Row 3. (Recall also that the dkl-mismatch problem prevented us from applying distance dkl to the β mechanism.)

Figure 9. Rearranging the plots of the previous sections for a comparison of the α, β, γ cooperation mechanisms, given a particular choice of execution mode (columns) and distance function (rows). For Manhattan distance, we differentiate between systems with static or dynamic deadlocks (Rows 2 and 3).
In the following subsections, we consider all three "dimensions" (iii), (ii) and (i), precisely in this order, with (i) being the dominant one.

Distribution Distances: Kullback-Leibler Divergence dkl vs. Manhattan
It is not our goal here to assess the various (pseudo-)distances used for defining φ̄: the interested reader can find an accurate study involving seven options for this metric in [16]. However, we are interested in checking whether, despite the different ranges of values and plot shapes that they yield, the two alternative distances give analogous indications about the mutual relations among the α, β and γ mechanisms.
A quick look at the grid of Figure 9 suggests that the relations between the plots for α and γ are not "qualitatively" different under dkl (Row 1) and under Manhattan distance (Rows 2 and 3). By this, we mean that the relative order between φ̄ values for the different mechanisms and modes, as the coupling factor moves in its range, is substantially the same for the two distances.
More precisely, we can split the comparisons in two steps.
Fix the mode and vary the mechanism. In mode sync (Column 1 of Figure 9), the relation between the α and γ plots is qualitatively the same for dkl and for Manhattan distance. A similar observation applies relative to mode hybrid (Column 2).
Fix the mechanism and vary the mode. Consider the α mechanism: under dkl, the relation between φ̄ sync dkl and φ̄ hybrid dkl is depicted in Figure 2; under Manhattan distance, the relation between φ̄ sync Manh and φ̄ hybrid Manh is qualitatively the same, and is depicted in Figure 3. For mechanism β, the comparison dkl/Manhattan distance does not apply. For mechanism γ, under dkl the relation between φ̄ sync dkl and φ̄ hybrid dkl is depicted in Figure 7 for static deadlocks; under Manhattan distance, the relation between φ̄ sync Manh and φ̄ hybrid Manh is qualitatively the same, and is depicted in Figure 8 (left), which also assumes static deadlocks.
In conclusion, the choice to use Manhattan distance as an alternative to dkl, for comparing the α, β and γ mechanisms appears, a posteriori, convenient and fully legitimate.

Modes: Sync vs. Hybrid
Having found that the choice of distribution distance does not affect the picture of the relations among α, β and γ, we may wonder whether the same happens with the choice between sync and hybrid mode. It turns out that this is not the case: the pictures that emerge under the two modes are different, as a quick comparison of the two columns of Figure 9 reveals. In light of the findings of the previous subsection, it is sufficient, convenient and safe to focus on Manhattan distance (Rows 2 and 3). Indeed, given the minimal differences between these two rows, choosing one or the other is irrelevant: implementing static or dynamic deadlocks in the tpms does not significantly affect our comparisons.
In sync mode, the α plot is constantly higher than the β and γ plots, which are close to each other; in hybrid mode, the α plot crosses the other two. This crossing is mainly due to a substantial decrement of the values of the α plots when switching from sync to hybrid (Figure 3; a justification of this is given in Section 6.2), whereas for the β mechanism the effect of the mode switch is reversed, with a moderate increase of the values of the hybrid over the sync plot (Figure 5). For the γ mechanism (Figure 8) the effect of switching from sync to hybrid is similar to that observed for β.
Why does this happen? Why do the arguments related to distribution spread in Section 6.2 not apply here? Our conjecture is as follows. Under the α mechanism, the much higher abundance of transition possibilities provided by the hybrid over the sync mode for system (P<α>Q) hybrid causes a reduction of the distance between distributions x coop and x indep in φ hybrid = d(x coop , x indep ), as discussed. On the contrary, in system (P|β|Q) hybrid , written more accurately P hybrid |β|Q hybrid , the higher abundance of transitions in P hybrid and Q hybrid , with respect to those in P sync and Q sync , offers more opportunities to P hybrid |β|Q hybrid than to P sync |β|Q sync to perform synchronization transitions involving P and Q (the reader is invited to recall the discussed difference between sync execution mode of a bool net and synchronization transition for parallel composition P|β|Q). More synchronization transitions for P hybrid |β|Q hybrid yield a bigger gap between distributions post hybrid P|β|Q (X PQ ) and post hybrid P|||Q (X PQ ) (P|||Q cannot perform any synchronization transition), or between distributions pre hybrid P|β|Q (Y PQ ) and pre hybrid P|||Q (Y PQ ) (switching from forward to backward reasoning), the latter being the distributions that feature in the definition of φ m P|β|Q (Y PQ ) in Equation (19). The conclusion is that the comparison among α, β and γ cannot be done independently of the execution mode.

Mechanisms: α vs. β vs. γ
We come finally to the comparison among cooperation mechanisms, one that we must contextualize to one or the other execution modes, as just established.
For further assessment of the data in Figure 9, we have included in the plots of Row 1, relative to the dkl distance, a gridline corresponding to the expected relative entropy dkl[rand 1 ||rand 2 ], where rand 1 and rand 2 are two random distributions. Similarly, in the plots of Rows 2 and 3, relative to Manhattan distance, gridlines indicate the expected Manhattan distance Manh[rand 1 ||rand 2 ]. These expected values are established by the next two propositions.

Proposition 4. If p = {p 1 . . . p N } and q = {q 1 . . . q N } are random discrete distributions over the same domain of size N, where each element p i (respectively, q i ) is obtained by picking a real number x i (respectively, y i ) at random in (0, 1] and normalizing it so that p (respectively, q) is a probability vector, then as N → ∞ the expected KL-divergence between p and q is:

$$Mean(dkl[p\,\|\,q]) = \frac{1}{2\ln 2} \approx 0.7213 \tag{32}$$

Proof. We have:

$$Mean(dkl[p\,\|\,q]) = Mean\Big(\sum_{i=1}^{N} p_i \log_2\frac{p_i}{q_i}\Big) \tag{33}$$

$$= Mean\Big(\sum_{i=1}^{N} \frac{2x_i}{N} \log_2\frac{x_i}{y_i}\Big) \tag{34}$$

$$= 2\,Mean\Big(x_i \log_2\frac{x_i}{y_i}\Big) \tag{35}$$

$$= 2\int_0^1\!\!\int_0^1 x \log_2\frac{x}{y}\,dx\,dy \tag{36}$$

$$= \frac{2}{\ln 2}\cdot\frac{1}{4} \tag{37}$$

$$= \frac{1}{2\ln 2} \approx 0.7213 \tag{38}$$

For Equation (34), we use the definitions of p i and q i , and Mean(∑ N i=1 x i ) = N/2. For Equation (35), we swap Mean and summation and use the fact that Mean(x i Log 2 (x i /y i )) is the same for all values of i. In Equation (36), we express the Mean as an integral on the unit square. The integral is routinely solved by parts, yielding Equations (37) and (38), where "ln" is the natural logarithm.

Proposition 5. If p and q are random distributions of length N as defined in Proposition 4, then as N → ∞ the expected Manhattan distance between p and q is:

$$Mean(Manh(p, q)) = \frac{2}{3} \tag{39}$$

Proof. The easy proof is analogous to that of Proposition 4, and is omitted.
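The two expected values can be cross-checked numerically. The following Python sketch is a Monte Carlo estimate under stated assumptions: random distributions are built as in Proposition 4 (uniform draws in (0, 1], then normalization), Manhattan distance is taken as the plain sum of absolute differences, and the reference values 1/(2 ln 2) ≈ 0.7213 and 2/3 are those obtained by carrying out the integrals on the unit square under these assumptions. Function names, the seed and the sample sizes are arbitrary choices for this example.

```python
import random
from math import log, log2

def random_distribution(n, rng):
    """Normalize n uniform draws from (0, 1] into a probability vector."""
    xs = [1.0 - rng.random() for _ in range(n)]  # random() is in [0, 1)
    total = sum(xs)
    return [x / total for x in xs]

def estimate(n, trials, seed=0):
    """Average dkl (in bits) and Manhattan distance over random pairs."""
    rng = random.Random(seed)
    kl_sum = 0.0
    l1_sum = 0.0
    for _ in range(trials):
        p = random_distribution(n, rng)
        q = random_distribution(n, rng)
        kl_sum += sum(a * log2(a / b) for a, b in zip(p, q))
        l1_sum += sum(abs(a - b) for a, b in zip(p, q))
    return kl_sum / trials, l1_sum / trials

# N = 1024 is the number of states of a system with ten boolean nodes.
mean_kl, mean_l1 = estimate(1024, 50)
expected_kl = 1 / (2 * log(2))
expected_l1 = 2 / 3
```

For N = 1024 the estimates should already fall close to the asymptotic values, up to sampling noise.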

Comparing Mechanisms under the sync Mode
The relevant plots for this comparison are those in Column 1 of the grid in Figure 9, Row 2 or 3.
φ values for the α mechanism are remarkably higher than those attained by the β and γ mechanisms, which are very close to each other.
Bool nets executing in the sync mode exhibit fully deterministic behavior and, owing to their simplicity, are probably the model most widely used in the literature to illustrate the basic concepts of IIT. If we were to accept them as a sufficiently realistic model for consciousness phenomena then, according to our findings, we would conclude that the traditional, simple α cooperation mechanism outperforms the alternative and, in a way, more sophisticated process-algebraic mechanisms β and γ in achieving high values of integrated information and, potentially, consciousness. This gap is particularly marked in the central segment of the range of coupling values, where P and Q show an even balance of cooperation and independence. However, the long-standing debate on determinism vs. nondeterminism in the natural sciences necessarily extends to the neurosciences and, although we cannot provide an accurate picture of the status of this discussion in that field, we believe that the assumption of a fully deterministic model of the brain is too restrictive, if not naive. The hybrid mode may then be a better option. (Of course, the nondeterministic hybrid mode appears to be the better option only within the relatively small family of models investigated in this paper; there may well be other nondeterministic models that lend themselves to interesting and perhaps more appropriate and realistic applications to neuroscience. For example, an option that seems to have gained attention inside the IIT community, as emerged in a private communication, is that of noisy mechanisms run in sync mode, e.g., the idea that the computations performed by the boolean functions associated with bool net nodes are affected by some percentage of error.)

Comparing Mechanisms under the Hybrid Mode
The hybrid mode produces nondeterministic behavior. Nondeterminism may be a desirable feature when dealing with complex systems and models in neuroscience; for system models in Software Engineering, however, it is a must. In the early phases of software development, for example, nondeterminism is typically used to prevent premature design choices, which are postponed to later phases, down to the final implementation: a convenient way to offer implementation freedom. Hence, if we set out to assess the three cooperation mechanisms in the context of formal models for Software Engineering, or system engineering in general, we believe that it is much more appropriate to refer to the hybrid execution mode.
Here, the relevant plots are those in the last column of Figure 9, Row 2 or 3. The φ plots for the LOTOS-inspired mechanism β and the CCS-inspired mechanism γ end up performing in a similar way, as happens under the sync mode. However, the difference between α, on the one hand, and β-γ, on the other hand, is now considerably reduced, and a faster growth of the α plot in the lower part of the coup range is counterbalanced by the slightly higher values of the β-γ plots in the upper part.
It seems arduous, and perhaps even pointless, to speculate on the detailed differences among those three plots and to try to justify them formally. Such differences are to be expected, given the substantial difference between the sharVar and the sharTrans mechanisms. On the other hand, by taking a coarser look at the mentioned plots, we can reasonably conclude that, in terms of integrated information, the performances of the three mechanisms are roughly equivalent. Furthermore, by comparing their plots with the reference values (the gridlines) that derive from purely random distributions, we can additionally claim that all three methods of coupling two systems P and Q do their job quite well: at their highest values, all of them roughly double those reference values.

Conclusions
In this paper, we have addressed the scenario of two state transition systems P and Q that exhibit different types of cooperation and a variable degree of coupling. We have applied the informational measure of averaged Integrated Information φ for the assessment and comparison of two fundamentally different cooperation mechanisms: (i) the standard shared variables mechanism associated with bool nets and very often adopted in the IIT literature, expressed as P<α>Q; and (ii) the shared transitions mechanism typical of process algebras, which we have studied in the two forms P|β|Q and P[γ]Q. In each case, α, β and γ control and measure the degree of coupling between P and Q. Exporting φ from its standard application context and adapting it to a completely novel field, which required re-defining what cooperation and independence mean in the new setting, is, in our opinion, one of the interesting and original contributions of our work on the conceptual side.
We have modeled P and Q as boolean nets, and have considered three possible execution modes, namely synchronous, asynchronous and hybrid, although the anomalies of the second mode soon led us to drop it. Furthermore, we have considered two variants of Integrated Information, based on two distinct measures of distribution distance, namely Kullback-Leibler divergence "dkl" and Manhattan distance, which avoids some limitations of dkl. With the main objective of comparing the α, β and γ cooperation mechanisms (coop), articulating our experimental analysis along those two additional dimensions, execution modes (m) and distribution distances (dd), has been useful for obtaining a sufficiently large set of plots for $\varphi^{m}_{dd}(P\text{-}coop\text{-}Q)$ on which to ponder.
In summary, the inspection of these plots has led us to the following main conclusions.
• Adopting a definition of φ based on Manhattan distance rather than dkl makes averaged integrated information more widely applicable; furthermore, when both variants apply, they yield nicely compatible indications. For our purposes, Manhattan distance is therefore the more convenient and safer choice. It is worth noting that the Earth Mover's Distance (EMD) [17], adopted in IIT 3.0 [2] but computationally more costly than Manhattan distance, would also avoid the mismatch problem arising with dkl.

• Under the deterministic sync execution mode, the IIT-standard cooperation mechanism α performs considerably better than β and γ, especially for bipartite systems structured so that the two parts exhibit an intermediate degree of coupling. Conversely, under the nondeterministic hybrid mode, which may be more appropriate for cognitive system models, but is definitely more appropriate for Software Engineering models, the three mechanisms exhibit roughly equivalent performances, and good ones, at least compared with those achieved by using randomized state distributions.
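The first conclusion above can be illustrated with a small numerical sketch. We assume here that the mismatch problem refers to the case in which the reference distribution assigns zero probability to a state that the other distribution can reach, so that dkl is not finite. The two distributions `x_coop` and `x_indep` below are invented for illustration and are not taken from the experiments.

```python
import math


def dkl(p, q):
    """KL-divergence in bits, returning infinity on a support mismatch."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # a term with p_i = 0 contributes 0 by convention
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log2(pi / qi)
    return total


def manhattan(p, q):
    """Manhattan (L1) distance, finite for any pair of distributions."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


# Hypothetical distributions over four states with different supports.
x_coop = [0.5, 0.5, 0.0, 0.0]
x_indep = [0.5, 0.0, 0.5, 0.0]

kl_value = dkl(x_coop, x_indep)          # not finite: supports differ
manh_value = manhattan(x_coop, x_indep)  # finite, bounded above by 2
```

On this pair dkl is infinite while the Manhattan distance equals 1, which is why a definition based on the latter remains applicable where the former does not.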
Could the latter approximate equivalence be intuitively expected a priori? For the author, this was not the case. Let us explain.
In the general context of discrete state-transition models for distributed, concurrent systems, it seems reasonable to consider a cooperation mechanism as effective when the cooperating parts can produce state distributions (successor or predecessor state distributions, corresponding to effect or cause reasoning) that are markedly different from those achieved when the parts work independently. The reason one might expect, at least for the hybrid mode, higher φ values for the sharTrans than for the sharVar mechanism has to do with the difference in intrinsic complexity of the two mechanisms.
For simplicity, let us refer to effect reasoning, i.e., successor state distributions. For finding the next state $y_{PQ}$ under α-cooperation, we only need to evaluate the n boolean functions of the bool net, where n is the overall number of nodes; the interactivity between P and Q comes "for free", depending only on the fact that, for intermediate values of α, both P and Q read a mix of local and remote nodes.
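A minimal sketch of this next-state computation is given below. It rests on an illustrative assumption that is ours and not necessarily the paper's definition: the coupling degree α is read as the probability that an input wire of a node is connected to a node of the other part. The names `make_coupled_net` and `sync_step`, the random wiring and the random truth tables are all hypothetical.

```python
import random


def make_coupled_net(n_p, n_q, k, alpha, seed=0):
    """Build a hypothetical bool net with parts P (first n_p nodes) and Q.

    Each node reads k input nodes; each input is remote (taken from the
    other part) with probability alpha, and local otherwise. Each node
    gets a random boolean function stored as a truth table of size 2**k.
    """
    rng = random.Random(seed)
    p_nodes = list(range(n_p))
    q_nodes = list(range(n_p, n_p + n_q))
    inputs, tables = [], []
    for node in range(n_p + n_q):
        local, remote = (p_nodes, q_nodes) if node < n_p else (q_nodes, p_nodes)
        wires = [rng.choice(remote if rng.random() < alpha else local)
                 for _ in range(k)]
        inputs.append(wires)
        tables.append([rng.randint(0, 1) for _ in range(2 ** k)])
    return inputs, tables


def sync_step(state, inputs, tables):
    """Evaluate all n boolean functions on the current state at once."""
    next_state = []
    for wires, table in zip(inputs, tables):
        index = 0
        for w in wires:
            index = (index << 1) | state[w]  # pack input bits into an index
        next_state.append(table[index])
    return tuple(next_state)
```

In this sketch no extra machinery is needed for P and Q to interact: the interaction is entirely encoded in the wiring, and one call to `sync_step` yields the next composite state.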
Under β- and γ-cooperation, boolean function evaluation is still necessary, but then the possible local transition labels must be computed, and these depend both on current local states $x_P$ and $x_Q$ and on next states $y_P$ and $y_Q$, according to our labeling policy. Then, transition $x_{PQ} \to y_{PQ}$ is determined after a sort of negotiation between P and Q, based both on the locally available labeled transitions and on the set β of synchronization labels. It is clear that mechanisms β and γ are intrinsically more complex than α: they manipulate more information and do more work. It then seemed plausible that these mechanisms would be able to exploit this additional information and machinery to create next state distributions $y_{coop}$ that depart more markedly from the reference distribution $y_{indep}$, thus achieving higher φ values. (We also expected that the noise injected in $P^*$ and $Q^*$ for computing the distribution product $pre^{m}_{P^*}(Y_P) \times pre^{m}_{Q^*}(Y_Q)$ in Equation (15) could act as a limiting factor for the gap between this distribution and distribution $pre^{m}_{P\langle\alpha\rangle Q}(Y_{PQ})$, thus limiting the growth of $\varphi^{hybrid}(P\langle\alpha\rangle Q)$, and keeping it well below $\varphi^{hybrid}(P|\beta|Q)$ and $\varphi^{hybrid}(P[\gamma]Q)$.) Experimental evidence has shown that this expectation was wrong.
The combination of measure φ with mechanisms β and γ may appear bizarre to the IIT expert ("why β and γ") as well as to the expert in process algebra ("why φ").
The first expert may criticize the adoption of additional interaction mechanisms for modeling brain-like systems, when several phenomena related to consciousness have already been successfully investigated by the plain bool net model without super-imposed features, and given that parallel compositions |β| and [γ] fail to satisfy conditional independence. Nevertheless, flat bool nets using the basic α mechanism suffer from serious scalability problems. The state space of an (n, k)-bool net has size $2^n$ and the associated tpm is a $2^n \times 2^n$ matrix: given this exponential growth, standard computers and algorithms can successfully deal only with "toy" models (such as those investigated in this paper), while there is no hope of handling realistic systems whose size n is, e.g., 5 or 10 orders of magnitude larger. We still believe that exploring macro-structured bool nets and higher-level interaction mechanisms (β, γ or others) may help alleviate those problems.
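The scalability problem can be quantified with a few lines of arithmetic. The sketch below assumes a dense tpm stored with 8 bytes per entry; this storage format is our assumption, chosen only to give concrete orders of magnitude, and the function names are hypothetical.

```python
def tpm_entries(n):
    """Number of entries of the 2**n x 2**n tpm of a bool net with n nodes."""
    states = 2 ** n
    return states * states


def dense_tpm_bytes(n, bytes_per_entry=8):
    """Memory for a dense tpm, assuming bytes_per_entry bytes per entry."""
    return tpm_entries(n) * bytes_per_entry


if __name__ == "__main__":
    for n in (4, 10, 20, 30):
        print(n, 2 ** n, dense_tpm_bytes(n))
```

Under these assumptions a net with only 20 nodes already needs 2^43 bytes (8 TiB) for its dense tpm, and every additional node multiplies that figure by four.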
The process-algebra expert, in turn, may be puzzled by the fact that φ is defined in terms of just one step $x_{PQ} \to y_{PQ}$ of system behavior, but considers the full repertoire of conceivable system states, including those unreachable from the initial state (in this respect, φ, especially in the state-independent form, reflects the "counterfactual reasoning" that informs J. Pearl's Do-Calculus of intervention [9]). φ may then appear: (i) inadequate to cope with systems in which the existence and importance of an initial state is beyond question, as in Process Algebra and Software Engineering; and (ii) insensitive to phenomena that emerge only with longer transition sequences, e.g., attractors (the interested reader may look at the demonstration in [18], where attractors play a role in the analysis of asymptotic mutual information between boolean net components P and Q). With respect to this objection, we do agree that the two analytical approaches, one-step-from-all-states and all-steps-from-one-state, may address and reveal different system properties; we can, however, take them as complementary techniques and explore their potential synergy. In any case, using φ in the context of formal models and languages for software engineering is, to our knowledge, a radically novel way to assess the "power" of their structuring principles and operators.
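The contrast between the two analytical approaches can be sketched on a toy example. The 3-node deterministic net below, with its update rule in `step`, is invented for illustration and is not one of the systems studied in the paper; the function names are hypothetical.

```python
from itertools import product


def step(state):
    """Hypothetical 3-node sync bool net: a' = b, b' = a AND c, c' = NOT a."""
    a, b, c = state
    return (b, a & c, 1 - a)


def one_step_from_all_states(n=3):
    """Successor of every conceivable state, reachable or not."""
    return {s: step(s) for s in product((0, 1), repeat=n)}


def all_steps_from_one_state(initial):
    """Follow the trajectory from one initial state until it repeats.

    Returns (transient, attractor): the states visited before entering
    the cycle, and the states of the cycle itself.
    """
    trajectory, first_seen = [], {}
    state = initial
    while state not in first_seen:
        first_seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state)
    start = first_seen[state]
    return trajectory[:start], trajectory[start:]


successors = one_step_from_all_states()
transient, attractor = all_steps_from_one_state((0, 0, 0))
```

Here the one-step view covers all 8 states, while the trajectory from (0, 0, 0) visits only that state before settling on the fixed point (0, 0, 1). The first view says nothing about the attractor, and the second ignores most of the state space.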