On Thermodynamic Interpretation of Transfer Entropy

We propose a thermodynamic interpretation of transfer entropy near equilibrium, using a specialisation of Boltzmann's principle. The approach relates conditional probabilities to the probabilities of the corresponding state transitions. This in turn characterises transfer entropy as a difference of two entropy rates: the rate for a resultant transition and another rate for a possibly irreversible transition within the system affected by an additional source. We then show that this difference, the local transfer entropy, is proportional to the external entropy production, possibly due to irreversibility. Near equilibrium, transfer entropy is also interpreted as the difference in equilibrium stabilities with respect to two scenarios: a default case and the case with an additional source. Finally, we demonstrate that such a thermodynamic treatment is not applicable to information flow, a measure of causal effect.

mutual information I(Y; X). However, Schreiber [1] points out that this ignores the more fundamental problem that mutual information measures the statically shared information between the two elements.

To address these inadequacies Schreiber introduced transfer entropy (TE) [1]: the deviation from independence (in bits) of the state transition (from the previous state to the next state) of an information destination X from the previous state of an information source Y:

T_{Y→X}(k, l) = Σ p(x_{n+1}, x_n^{(k)}, y_n^{(l)}) log_2 [ p(x_{n+1} | x_n^{(k)}, y_n^{(l)}) / p(x_{n+1} | x_n^{(k)}) ] . (1)

Here n is a time index, and x_n^{(k)} and y_n^{(l)} represent past states of X and Y (i.e., the k and l past values of X and Y up to and including time n). Schreiber points out that this formulation is a truly directional, dynamic measure of information transfer, and is a generalisation of the entropy rate to more than one element to form a mutual information rate. That is, transfer entropy may be seen as the difference between two entropy rates:

T_{Y→X}(k, l) = h_X - h_{X,Y} , (2)

where h_X is the entropy rate:

h_X = - Σ p(x_{n+1}, x_n^{(k)}) log_2 p(x_{n+1} | x_n^{(k)}) , (3)

and h_{X,Y} is a generalised entropy rate conditioning on the source state as well:

h_{X,Y} = - Σ p(x_{n+1}, x_n^{(k)}, y_n^{(l)}) log_2 p(x_{n+1} | x_n^{(k)}, y_n^{(l)}) . (4)
The entropy rate h_X accounts for the average number of bits needed to encode one additional state of the system if all previous states are known [1], while the entropy rate h_{X,Y} captures the average number of bits required to represent the value of the next destination state if source states are included in addition. Since one can always write

p(x_{n+1}, x_n^{(k)}) = Σ_{y_n^{(l)}} p(x_{n+1}, x_n^{(k)}, y_n^{(l)}) , (5)

it is easy to see that the entropy rate h_X is equivalent to the rate h_{X,Y} when the next state of the destination is independent of the source [1]:

p(x_{n+1} | x_n^{(k)}) = p(x_{n+1} | x_n^{(k)}, y_n^{(l)}) . (6)

Thus, in this case the transfer entropy reduces to zero.

Similarly, the TE can be viewed as a conditional mutual information I(Y^{(l)}; X' | X^{(k)}) [17], that is, as the average information contained in the source about the next state X' of the destination that was not already contained in the destination's past X^{(k)}:

T_{Y→X}(k, l) = I(Y^{(l)}; X' | X^{(k)}) = H(X' | X^{(k)}) - H(X' | X^{(k)}, Y^{(l)}) . (7)
This could be interpreted (following [45] and [44]) as the diversity of state transitions in the destination minus the assortative noise between those state transitions and the state of the source. Furthermore, we note that Schreiber's original description can be rephrased as the information provided by the source about the state transition in the destination. We shall adjust our notation from here onwards, writing x_n for the state x_n^{(k)} and y_n for y_n^{(l)}, so that we are always speaking about interactions between source states y_n and destination state transitions x_n → x_{n+1} (with embedding lengths l and k implied).
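To make the estimation of Eq. (1) concrete, the following minimal Python sketch computes a plug-in estimate of T_{Y→X} with k = l = 1 from two discrete time series. The series, the 0.9 copy probability and the function name transfer_entropy are hypothetical choices for illustration; this is a sketch of the definition, not an optimised estimator.

    from collections import Counter
    from math import log2
    import random

    def transfer_entropy(source, dest):
        # T_{Y->X} with embedding lengths k = l = 1, from relative frequencies (Eq. (1))
        triples = list(zip(dest[1:], dest[:-1], source[:-1]))   # (x_{n+1}, x_n, y_n)
        c_xxy = Counter(triples)                                 # counts for (x_{n+1}, x_n, y_n)
        c_xy = Counter((x0, y0) for _, x0, y0 in triples)        # counts for (x_n, y_n)
        c_xx = Counter((x1, x0) for x1, x0, _ in triples)        # counts for (x_{n+1}, x_n)
        c_x = Counter(x0 for _, x0, _ in triples)                # counts for x_n
        n = len(triples)
        te = 0.0
        for (x1, x0, y0), cnt in c_xxy.items():
            p_joint = cnt / n                                    # p(x_{n+1}, x_n, y_n)
            p_cond_y = cnt / c_xy[(x0, y0)]                      # p(x_{n+1} | x_n, y_n)
            p_cond = c_xx[(x1, x0)] / c_x[x0]                    # p(x_{n+1} | x_n)
            te += p_joint * log2(p_cond_y / p_cond)
        return te

    # Toy system: X copies the previous value of Y with probability 0.9,
    # so T_{Y->X} should be close to 1 - H_b(0.1), i.e. about 0.53 bits.
    random.seed(0)
    y = [random.randint(0, 1) for _ in range(100000)]
    x = [0] + [yi if random.random() < 0.9 else 1 - yi for yi in y[:-1]]
    print(round(transfer_entropy(y, x), 3))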

Importantly, the TE remains a measure of observed (conditional) correlation rather than direct effect.

In fact, the TE is a non-linear extension of a concept known as the "Granger causality".

Using the technique originally described in [7], we observe that the TE is an average (or expectation value) of a local transfer entropy at each observation n, i.e.,

T_{Y→X} = ⟨ t_{Y→X}(n+1) ⟩_n , (8)

where the local transfer entropy is

t_{Y→X}(n+1) = log_2 [ p(x_{n+1} | x_n, y_n) / p(x_{n+1} | x_n) ] , (9)

with embedding lengths l and k implied as described above. The local transfer entropy quantifies the information contained in the source state y_n about the next state of the destination x_{n+1} at time step n+1, in the context of what was already contained in the past state of the destination x_n. The measure is local in that it is defined at each time n for each destination X in the system and each causal information source Y of the destination.
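As an illustration of Eqs. (8) and (9), the sketch below (a toy plug-in computation with hypothetical series and parameter values, not part of the original formulation) returns one local transfer entropy value per time step; their average recovers the averaged TE, while individual values may fall outside the averaged range, including below zero.

    from collections import Counter
    from math import log2
    import random

    def local_transfer_entropy(source, dest):
        # one value of t_{Y->X}(n+1) per observed transition, Eq. (9), with k = l = 1
        triples = list(zip(dest[1:], dest[:-1], source[:-1]))    # (x_{n+1}, x_n, y_n)
        c_xxy = Counter(triples)
        c_xy = Counter((x0, y0) for _, x0, y0 in triples)
        c_xx = Counter((x1, x0) for x1, x0, _ in triples)
        c_x = Counter(x0 for _, x0, _ in triples)
        out = []
        for x1, x0, y0 in triples:
            p_cond_y = c_xxy[(x1, x0, y0)] / c_xy[(x0, y0)]      # p(x_{n+1} | x_n, y_n)
            p_cond = c_xx[(x1, x0)] / c_x[x0]                    # p(x_{n+1} | x_n)
            out.append(log2(p_cond_y / p_cond))
        return out

    random.seed(1)
    y = [random.randint(0, 1) for _ in range(50000)]
    x = [0] + [yi if random.random() < 0.8 else 1 - yi for yi in y[:-1]]   # noisy copy of Y
    t_local = local_transfer_entropy(y, x)
    print(min(t_local), max(t_local))       # misinformative steps give negative local values
    print(sum(t_local) / len(t_local))      # the average, Eq. (8), lies within [0, log2 b]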

The local TE may also be expressed as a local conditional mutual information, or a difference between local conditional entropies:

t_{Y→X}(n+1) = i(y_n; x_{n+1} | x_n) = h(x_{n+1} | x_n) - h(x_{n+1} | x_n, y_n) , (10)

where the local conditional mutual information is given by

i(y_n; x_{n+1} | x_n) = log_2 [ p(x_{n+1} | x_n, y_n) / p(x_{n+1} | x_n) ] , (11)

and the local conditional entropies are defined analogously:

h(x_{n+1} | x_n) = - log_2 p(x_{n+1} | x_n) , (12)

h(x_{n+1} | x_n, y_n) = - log_2 p(x_{n+1} | x_n, y_n) . (13)

The average transfer entropy T_{Y→X}(k, l) is always positive but is bounded above by the information capacity of a single observation of the destination. For a discrete system with b possible observations this is log_2 b bits. As a conditional mutual information, it can be either larger or smaller than the corresponding mutual information [51]. The local TE, however, is not so constrained, so long as it averages into this range: it can be greater than log_2 b for a large local information transfer, and can also in fact be measured to be negative. Local transfer entropy is negative where (in the context of the history of the destination) the probability of observing the actual next state of the destination given the source state, p(x_{n+1} | x_n, y_n), is lower than that of observing that actual next state independently of the source, p(x_{n+1} | x_n).

Information from causal effect can be seen to flow through the system, like injecting dye into a river [18].

It is well-recognised that measurement of causal effect necessitates some type of perturbation or intervention of the source so as to detect the effect of the intervention on the destination (e.g., see [52]). Attempting to infer causality without doing so leaves one measuring correlations of observations, regardless of how directional they may be [18]. In this section, we adopt the measure information flow introduced by Ay and Polani [18]:

I_p(Y → X | Ŝ) = Σ_s p(s) Σ_y p(y | ŝ) Σ_x p(x | ŷ, ŝ) log_2 [ p(x | ŷ, ŝ) / Σ_{y'} p(y' | ŝ) p(x | ŷ', ŝ) ] , (14)

where the hat notation (e.g., ŝ, ŷ) denotes a value imposed on, rather than observed in, the system, and S is a set of conditioning variables. Where imposing a value ŝ has no effect on the value of y, this measure validly infers no direct causal effect.

Here we are interested in measuring the direct causal information flow from Y to X, so we must either include all possible other sources in S or at least include enough sources to "block" all non-immediate directed paths from Y to X [18]. The minimum to satisfy this is the set of all direct causal sources of X excluding Y, including any past states of X that are direct causal sources. That is, in alignment with transfer entropy, S would include X^{(k)}.

The major task in computing I_p(Y → X | Ŝ) is the determination of the underlying interventional conditional PDFs in Eq. (14). By definition these may be gleaned by observing the results of intervening in the system; however, this is not possible in many cases.

One alternative is to use detailed knowledge of the dynamics, in particular the structure of the causal links and possibly the underlying rules of the causal interactions. This, too, is often unavailable, and indeed is often the very goal for which one turned to such analysis in the first place.

Regardless, where such knowledge is available it may allow one to make direct inferences.

Under certain constrained circumstances, one can construct these values from observational probabilities only [18], e.g., with the "back-door adjustment" [52]. A particularly important constraint on using the back-door adjustment here is that all {s, y} combinations must be observed.

A local information flow can be defined following the argument that was used to define local information transfer:

f(y → x | ŝ) = log_2 [ p(x | ŷ, ŝ) / Σ_{y'} p(y' | ŝ) p(x | ŷ', ŝ) ] .

The meaning of the local information flow is slightly different, however. Certainly, it is an attribution ...

The thermodynamic state is generally considered as a fluctuating entity, so that transition probabilities like p(x_{n+1} | x_n) are clearly defined and can be related to a sampling procedure. Each macrostate can be realised by a number of different microstates consistent with the given thermodynamic variables.

Importantly, in the theory of non-equilibrium thermodynamics close to equilibrium, the microstates belonging to one macrostate x are equally probable. The thermodynamic entropy was originally defined by Clausius as a state function S which satisfies

S_B - S_A = ∫_A^B dq_rev / T , (16)

where q_rev is the heat transferred to an equilibrium thermodynamic system during a reversible process from state A to state B. Note that this path integral is the same for all reversible paths between the past and next states. According to Boltzmann's principle, the entropy is in turn given by S = k log W, where k is Boltzmann's constant and W is the number of microstates corresponding to the given macrostate (a number greater than or equal to one). While it is not a mathematical probability between zero and one, it is sometimes called the "thermodynamic probability", noting that W can be normalised to a probability p = W/N, where N is the number of possible microstates for all macrostates.
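As a small numerical illustration of Boltzmann's principle and of the normalised "thermodynamic probability" just mentioned, the sketch below uses purely hypothetical microstate counts W and N.

    import math

    k = 1.380649e-23            # Boltzmann constant, J/K
    W = 10**20                  # microstates compatible with the macrostate (toy value)
    N = 10**22                  # total number of microstates over all macrostates (toy value)

    S = k * math.log(W)         # Boltzmann entropy of the macrostate, S = k log W
    p = W / N                   # normalised probability of the macrostate, p = W/N
    print(S, p)                 # roughly 6.4e-22 J/K and 0.01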

The Shannon entropy that corresponds to the Boltzmann entropy S = k log W is the uncertainty in the microstate which has produced the given macrostate. A specialisation of Boltzmann's principle by Einstein [56], for two states with entropies S and S_0 and "relative probability" W_r (the ratio of the numbers W and W_0 that account for the numbers of microstates in the macrostates with S and S_0 respectively), is given by:

S - S_0 = k log W_r . (17)

The expression in these relative terms is important, as pointed out by Norton [57], because the probability W_r is the probability of the transition between the two states under the system's normal time evolution.

In the example considered by Einstein [56,57], S_0 is the entropy of an (equilibrium) state, e.g., "a volume V_0 of space containing n non-interacting, moving points, whose dynamics are such as to favor no portion of the space over any other", while S is the entropy of the (non-equilibrium) state with the "same system of points, but now confined to a sub-volume V of V_0". Specifically, Einstein defined the transition probability W_r = (V/V_0)^n, yielding

S - S_0 = k n log (V / V_0) . (18)

Since the dynamics favour no portion of the space over any other, all the microstates are equiprobable.

In general, the variation of entropy of a system ΔS is equal to the sum of the internal entropy production σ inside the system and the entropy change due to the interactions with the surroundings ΔS_ext:

ΔS = σ + ΔS_ext . (19)

In the case of a closed system, ΔS_ext is given by the expression

ΔS_ext = ∫ dq / T , (20)

where q represents the heat flow received by the system from the exterior and T is the temperature of the system. This expression is often written as

σ = ΔS - ∫ dq / T , (21)

so that when the transition from the initial state S_0 to the final state S is irreversible, the entropy production σ > 0, while for reversible processes σ = 0, that is

S - S_0 = ∫ dq / T . (22)

We shall consider another state vector, y, describing a state of a part Y of the exterior possibly coupled to the system represented by X. In other words, X and Y may or may not be dependent. In general, we shall say that σ_y is the internal entropy production in the context of some source Y, while ΔS_ext is the entropy production attributed to Y.
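A brief numerical sketch of Einstein's example, with hypothetical values of n and V/V_0, illustrates Eqs. (17) and (18): the spontaneous confinement to the sub-volume is exponentially unlikely, and the corresponding entropy difference is negative.

    import math

    k = 1.380649e-23            # J/K
    n = 100                     # number of non-interacting points (toy value)
    ratio = 0.9                 # V / V_0 (toy value)

    W_r = ratio**n                               # relative probability of the confinement
    dS = k * n * math.log(ratio)                 # S - S_0 = k n log(V/V_0), Eq. (18)
    print(W_r, dS)                               # about 2.7e-5 and about -1.5e-22 J/K
    print(math.isclose(dS, k * math.log(W_r)))   # consistency with S - S_0 = k log W_r, Eq. (17)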

Alternatively, one may consider two scenarios for such a general physical system. In the first scenario, the system X undergoes the transition from the state x_n to the state x_{n+1} by itself (the default case), while in the second scenario the same transition occurs in the context of the source Y.

In an attempt to provide a thermodynamic interpretation of transfer entropy we make two important assumptions, defining the range of applicability for such an interpretation. The first one relates the transition probability W_{r1} of the system's reversible state change to the conditional probability p(x_{n+1} | x_n), obtained by sampling the process X:

p(x_{n+1} | x_n) = (1/Z_1) W_{r1} , (23)

where Z_1 is a normalisation factor which depends on x_n. According to the expression for transition probability (17), under this assumption the conditional probability of the system's transition from state x_n to state x_{n+1} corresponds to some number W_{r1}, such that S(x_{n+1}) - S(x_n) = k log W_{r1}, and hence

p(x_{n+1} | x_n) = (1/Z_1) e^{(S(x_{n+1}) - S(x_n))/k} . (24)

The second assumption relates the transition probability W_{r2} of the system's possibly irreversible internal state change, due to the interactions with the external surroundings represented in the state vector y, to the conditional probability p(x_{n+1} | x_n, y_n), obtained by sampling the systems X and Y:

p(x_{n+1} | x_n, y_n) = (1/Z_2) W_{r2} . (25)

Under this assumption the conditional probability of the system's (irreversible) transition from state x_n to state x_{n+1} in the context of y_n corresponds to some number W_{r2}, such that σ_y = k log W_{r2}, where σ_y is the system's internal entropy production in the context of y, and thus

p(x_{n+1} | x_n, y_n) = (1/Z_2) e^{σ_y / k} , (26)

where Z_2 is a normalisation factor which depends on x_n.
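The following toy sketch illustrates what assumption (24) means operationally. Over a small hypothetical state space (with entropy values in units of k and an assumed accessibility structure, both invented purely for illustration), the sampled transition probabilities are set proportional to e^{(S(x_{n+1}) - S(x_n))/k}, with a normalisation factor Z_1 that indeed depends on the current state x_n.

    import math

    S_over_k = {"a": 0.0, "b": 0.5, "c": 1.2}                              # hypothetical entropies S(x)/k
    reachable = {"a": ["a", "b"], "b": ["a", "b", "c"], "c": ["b", "c"]}   # assumed accessible transitions

    def p_next_given(x_n):
        # p(x_{n+1} | x_n) = exp((S(x_{n+1}) - S(x_n)) / k) / Z_1(x_n), cf. Eq. (24)
        weights = {x1: math.exp(S_over_k[x1] - S_over_k[x_n]) for x1 in reachable[x_n]}
        Z1 = sum(weights.values())                                         # normalisation, depends on x_n
        return {x1: w / Z1 for x1, w in weights.items()}, Z1

    for x_n in S_over_k:
        dist, Z1 = p_next_given(x_n)
        print(x_n, "Z_1 =", round(Z1, 3), {x1: round(p, 3) for x1, p in dist.items()})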

An example: random fluctuation near equilibrium
Let us consider the above-defined stochastic process X for a small random fluctuation around equilibrium: x_{n+1} = Λ x_n + ξ, where ξ is a multi-variate Gaussian noise process, with covariance matrix Σ_ξ, uncorrelated in time.

Starting at time n with state x_n having entropy S(x_n), the state develops into x_{n+1}, with entropy S(x_{n+1}).

From the probability distribution function of the above multi-variate Gaussian process, we obtain

p(x_{n+1} | x_n) ∝ exp( -(1/2) (x_{n+1} - Λ x_n)^T Σ_ξ^{-1} (x_{n+1} - Λ x_n) ) . (28)

We now demonstrate that this expression concurs with the corresponding expression obtained under assumption (24). To do so we expand the entropies around x = 0 with entropy S(0):

S(x) ≈ S(0) - (k/2) x^T Σ_x^{-1} x , (29)

where Σ_x is the covariance matrix of the process X.

Then, according to the assumption (24),

p(x_{n+1} | x_n) = (1/Z_1) e^{(S(x_{n+1}) - S(x_n))/k} ∝ exp( -(1/2) x_{n+1}^T Σ_x^{-1} x_{n+1} ) , (30)

where the term e^{(1/2) x_n^T Σ_x^{-1} x_n} is absorbed into the normalisation factor, being only dependent on x_n. In general [58,59], we have

Σ_x = Λ Σ_x Λ^T + Σ_ξ . (31)

Given the quasistationarity of the relaxation process, assumed near an equilibrium, Λ → 0, and hence Σ_x → Σ_ξ. Then the equation (30) reduces to

p(x_{n+1} | x_n) ∝ exp( -(1/2) x_{n+1}^T Σ_ξ^{-1} x_{n+1} ) . (32)

The last expression concurs with (28) when Λ → 0. Supported by this background, we proceed to interpret transfer entropy via transitions between states.
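The relaxation example can be checked numerically. The sketch below (assuming NumPy, with arbitrary toy values for Λ and Σ_ξ) simulates x_{n+1} = Λ x_n + ξ, compares the empirical covariance of X against the stationary relation (31), and shows that Σ_x approaches Σ_ξ as Λ → 0.

    import numpy as np

    rng = np.random.default_rng(0)
    Sigma_xi = np.array([[1.0, 0.3], [0.3, 0.5]])
    L_chol = np.linalg.cholesky(Sigma_xi)        # to draw noise with covariance Sigma_xi

    def empirical_cov(Lam, steps=100000):
        x = np.zeros(2)
        samples = np.empty((steps, 2))
        for n in range(steps):
            x = Lam @ x + L_chol @ rng.standard_normal(2)
            samples[n] = x
        return np.cov(samples.T)

    def stationary_cov(Lam, iters=1000):
        # fixed-point iteration of Eq. (31): Sigma_x = Lam Sigma_x Lam^T + Sigma_xi
        S = Sigma_xi.copy()
        for _ in range(iters):
            S = Lam @ S @ Lam.T + Sigma_xi
        return S

    for lam in (0.8, 0.1, 0.01):
        Lam = lam * np.eye(2)
        print("Lambda =", lam)
        print(np.round(empirical_cov(Lam), 3))   # sampled covariance of X
        print(np.round(stationary_cov(Lam), 3))  # solution of Eq. (31); close to Sigma_xi for small Lambda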

In doing so, we shall operate with local information-theoretic measures (such as the local transfer entropy (9)), as we are dealing with (transitions between) specific states y_n, x_n, x_{n+1}, etc., and not with all possible state spaces X, Y, etc., containing all realizations of specific states.
Transfer entropy is a difference not between entropies, but rather between entropy rates or conditional entropies, specified on average by (2) or (7), or for local values by (10):

t_{Y→X}(n+1) = h(x_{n+1} | x_n) - h(x_{n+1} | x_n, y_n) . (33)

As mentioned above, the first assumption (23), taken to define the range of applicability for our interpretation, entails (24). It then follows that the first component of equation (33) is

h(x_{n+1} | x_n) = - (S(x_{n+1}) - S(x_n)) / (k ln 2) + log_2 Z_1 . (35)

That is, the local conditional entropy h(x_{n+1} | x_n) corresponds to the resultant entropy change of the transition from the past state x_n to the next state x_{n+1}.

Now we need to interpret the second component of (33): the local conditional entropy h(x_{n+1} | x_n, y_n) in the presence of some other factor or extra source, y_n. Importantly, we must keep both the past state x_n and the next state x_{n+1} the same: only then can we characterise the internal entropy change, offset by some contribution of the source y_n. Under the second assumption (26), this component becomes

h(x_{n+1} | x_n, y_n) = - σ_y / (k ln 2) + log_2 Z_2 . (36)

Transfer entropy as entropy production
At this stage we can bring the two right-hand side components of transfer entropy (33), represented by (35) and (36), together:

t_{Y→X}(n+1) = ( σ_y - (S(x_{n+1}) - S(x_n)) ) / (k ln 2) + log_2 (Z_1 / Z_2) . (37)

When one considers a small fluctuation near an equilibrium, Z_1 ≈ Z_2, as the number of microstates does not change much in the relevant macrostates. This removes the additive constant. Then, using the expression for entropy production (21), we obtain

t_{Y→X}(n+1) = - ΔS_ext / (k ln 2) . (38)

If Z_1 ≠ Z_2, the relationship includes some additive constant log_2 (Z_1 / Z_2). That is, the transfer entropy is proportional to the external entropy production, brought about by the source of irreversibility Y. It captures the difference between the entropy rates that correspond to two transitions between the same states x_n and x_{n+1}: the resultant transition and the possibly irreversible transition affected by the additional source.

There is another possible interpretation that considers a fluctuation near the equilibrium. Using the Kullback-Leibler divergence between discrete probability distributions p and q,

D_{KL}(p || q) = Σ_i p(i) log_2 ( p(i) / q(i) ) ,

and its local counterpart,

d(i) = log_2 ( p(i) / q(i) ) ,

we may also express the local conditional entropies in these terms. It is known in macroscopic thermodynamics that the stability of an equilibrium can be measured with respect to fluctuations around it: the entropy change in the default scenario corresponds to a Kullback-Leibler divergence and can be seen as a measure of stability with respect to the spontaneous fluctuation. Analogously, the entropy change in another scenario, where an additional source y contributes to the fluctuation around the equilibrium, corresponds to a Kullback-Leibler divergence and can be seen as a measure of stability with respect to the fluctuation that is now affected by the extra source y.
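A back-of-the-envelope check of Eqs. (35)-(38), with purely hypothetical values for the entropy changes and normalisation factors, confirms that the local transfer entropy computed from the two conditional entropies equals -ΔS_ext/(k ln 2) once Z_1 = Z_2.

    import math

    k = 1.380649e-23                  # J/K
    dS = 5.0e-24                      # S(x_{n+1}) - S(x_n), toy value
    sigma_y = 2.0e-24                 # internal entropy production in the context of y, toy value
    dS_ext = dS - sigma_y             # entropy change due to the surroundings, Eq. (19)
    Z1 = Z2 = 2.0                     # equal normalisation factors (small-fluctuation case)

    h_x = -dS / (k * math.log(2)) + math.log2(Z1)          # Eq. (35)
    h_xy = -sigma_y / (k * math.log(2)) + math.log2(Z2)    # Eq. (36)
    t_local = h_x - h_xy                                   # Eqs. (33) and (37)
    print(t_local, -dS_ext / (k * math.log(2)))            # identical, as in Eq. (38)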

Contrasting both these fluctuations around the same equilibrium, we obtain the local transfer entropy as the difference of the corresponding Kullback-Leibler divergences. In these terms, transfer entropy contrasts the stability of the equilibrium between two scenarios: the first one corresponds to the original system, and the second one disturbs the system by the source Y. If, for instance, the source Y is such that the system X is independent of it, then there is no difference in the extents of disturbances to the equilibrium, and the transfer entropy is zero.

Under the new, stronger, assumptions the conditional entropies can be related to the heat transferred in the transition, per temperature. Specifically, assumption (24) entails

h(x_{n+1} | x_n) = - (S(x_{n+1}) - S(x_n)) / (k ln 2) + log_2 Z_1 = - (1 / (k ln 2)) ∫_{x_n}^{x_{n+1}} dq_rev / T + log_2 Z_1 , (44)

where the last step used the definition of Clausius entropy (16). As per (16), this quantity is the same for all reversible paths between the past and next states. An example illustrating the transition (x_n → x_{n+1}) can be given by a simple thermal system x_n that is connected to a heat bath, that is, to a system in contact with a source of energy at temperature T. When the system X reaches a (new) equilibrium, e.g., the state x_{n+1}, due to its connection to the heat bath, the local conditional entropy h(x_{n+1} | x_n) of the transition undergone by system X represents the heat transferred in the transition, per temperature.

Similarly, assumption (26) leads to

h(x_{n+1} | x_n, y_n) = - σ_y / (k ln 2) + log_2 Z_2 = - (1 / (k ln 2)) ∫_{x_n →^{y_n} x_{n+1}} dq / T + log_2 Z_2 , (45)

where x_n →^{y_n} x_{n+1} is the new path between x_n and x_{n+1} brought about by y_n, and the entropy produced along this path is σ_y. That is, the first and the last points of the path over which we integrate heat transfers per temperature are unchanged, but the path is affected by the source y. This can be illustrated by a modified thermal system, still at temperature T but with heat flowing through some thermal resistance Y, while the system X repeats its transition from x_n to x_{n+1}.

Transfer entropy captures the difference between expressions (44) and (45), i.e., between the relevant amounts of heat transferred to the system X, per temperature.
Assuming that Z_1 ≈ Z_2 is realistic, e.g., for quasistatic processes, the additive constant disappears as well.
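As a rough sense of scale (with hypothetical values of the heat and temperature), the sketch below converts "heat transferred per temperature" into bits using the factor k ln 2 that appears throughout this section.

    import math

    k = 1.380649e-23          # J/K
    T = 300.0                 # temperature of the heat bath, K (toy value)
    q = 3.0e-21               # heat received from the exterior, J (toy value)

    dS_ext = q / T                         # entropy change due to the surroundings, J/K
    bits = dS_ext / (k * math.log(2))      # the same change expressed in bits
    print(dS_ext, round(bits, 3))          # about 1e-23 J/K, i.e. about 1.045 bits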

It is clear that if the new path is still reversible (e.g., when the thermal resistance is zero) then the source y has not affected the resultant entropy change: we must have h(x_{n+1} | x_n) = h(x_{n+1} | x_n, y_n), and hence t_{Y→X}(n+1) = 0. If, however, the source Y changed the path in such a way that the process became irreversible, bringing about an irreversible internal change, then t_{Y→X}(n+1) ≠ 0.

Finally, according to (19) and (20), the difference between the relevant heats transferred is ∫ dq / T, where q represents the heat flow received by the system from the exterior via the source Y, and hence

t_{Y→X}(n+1) = - (1 / (k ln 2)) ∫ dq / T .

In other words, local transfer entropy is proportional to the heat received or dissipated by the system from/to the exterior.

Returning to the causal effect, the local information flow from Y to X, conditioned on imposing the destination's past state x̂_n, is given by

f(y_n → x_{n+1} | x̂_n) = log_2 [ p(x_{n+1} | ŷ_n, x̂_n) / Σ_{y'_n} p(y'_n | x̂_n) p(x_{n+1} | ŷ'_n, x̂_n) ] . (49)
Let us first consider conditions under which this representation reduces to the local transfer entropy.

As pointed out by Lizier and Prokopenko [41], there are several conditions for such a reduction.

Firstly, y_n and x_n must be the only causal contributors to x_{n+1}. In a thermodynamic setting, this means that there are no other sources affecting the transition from x_n to x_{n+1}, apart from y_n.

Whenever this condition is met, and in addition, the combination (y_n, x_n) is observed, it follows that

p(x_{n+1} | ŷ_n, x̂_n) = p(x_{n+1} | y_n, x_n) , (50)

simplifying the numerator of Eq. (49).

Furthermore, there is another condition:

p(y_n | x̂_n) = p(y_n | x_n) . (51)

For example, it is met when the source y_n is both causally and conditionally independent of the destination's past x_n. Specifically, causal independence means p(y_n) ≡ p(y_n | x̂_n), while conditional independence is simply p(y_n) ≡ p(y_n | x_n). Intuitively, the situation of causal and conditional independence means that the inner workings of the system X under consideration do not interfere with the source Y. Alternatively, if X is the only causal influence on Y, the condition (51) also holds, as Y is perfectly "explained" by X, whether X is observed or imposed on. In general, though, the condition (51) means that the probability of y_n if we impose a value x̂_n is the same as if we had simply observed the value x_n = x̂_n without intervening in the system X.
Under the conditions (50) and (51), the denominator of Eq. (49) reduces to p(x_{n+1} | x_n), yielding the equivalence between local causal effect and local transfer entropy:

f(y_n → x_{n+1} | x̂_n) = t_{Y→X}(n+1) . (52)

In this case, the thermodynamic interpretation of transfer entropy would be applicable to causal effect as well.
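The equivalence (52) can be checked on a toy system in which conditions (50) and (51) hold by construction: y_n and x_n are the only parents of x_{n+1} (so interventional and observational transition probabilities coincide) and Y is independent of the destination's past (so the probability of y_n is the same whether x_n is observed or imposed). All distributions below are hypothetical.

    from math import log2, isclose
    from itertools import product

    p_y = {0: 0.5, 1: 0.5}                                        # source distribution, independent of x_n
    mech1 = {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.4, (1, 1): 0.9}  # p(x_{n+1} = 1 | x_n, y_n)

    def p_next(x1, x, y):
        # p(x_{n+1} | x_n, y_n); equals the interventional probability here, cf. Eq. (50)
        return mech1[(x, y)] if x1 == 1 else 1.0 - mech1[(x, y)]

    p_y_given_x = {(y, x): p_y[y] for y in p_y for x in (0, 1)}   # condition (51): no dependence on x_n

    def t_local(x1, x, y):
        # local transfer entropy, Eq. (9), built from observational probabilities
        p_x1_given_x = sum(p_y_given_x[(y2, x)] * p_next(x1, x, y2) for y2 in p_y)
        return log2(p_next(x1, x, y) / p_x1_given_x)

    def f_local(x1, x, y):
        # local information flow, Eq. (49), built from interventional probabilities
        denom = sum(p_y[y2] * p_next(x1, x, y2) for y2 in p_y)
        return log2(p_next(x1, x, y) / denom)

    for x, y, x1 in product((0, 1), repeat=3):
        assert isclose(t_local(x1, x, y), f_local(x1, x, y))      # Eq. (52) holds for every configuration
    print("local causal effect equals local transfer entropy in this toy system")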

Whenever one of these conditions is not met, however, the reduction fails. Consider, for instance, the case when the condition (51) is satisfied, but the condition (50) is violated. For example, we may assume that there is some hidden source affecting the transition to x_{n+1}. In this case, the denominator of Eq. (49) does not simplify much, and the component which may have corresponded to the entropy rate of the transition between x_n and x_{n+1} becomes log_2 Σ_{y'_n} p(y'_n | x̂_n) p(x_{n+1} | ŷ'_n, x̂_n).

... their study. Their main result characterised entropy production as k d_{KL}(ρ || ρ′), which is equal to the total entropy change in the total device. In contrast, in our study we consider the system of interest X specifically, and characterise various entropy rates of X, but in doing so compare how these entropy rates are affected by some source of irreversibility Y. In short, transfer entropy is shown to concur with the entropy produced/dissipated by the system attributed to the external source Y.

We also briefly considered a case of fluctuations in the system X near an equilibrium, relating transfer entropy to the difference in stabilities of the equilibrium, with respect to two scenarios: a default case and the case with an additional source Y. This comparison was carried out with Kullback-Leibler divergences of the corresponding transition probabilities.

Finally, we demonstrated that such a thermodynamic treatment is not applicable to information flow: a measure introduced by Ay and Polani [18] in order to capture a causal effect. We argue that the main reason is the interventional approach adopted in the definition of causal effect. We identified several conditions ensuring certain dependencies between the involved variables, and showed that the causal effect may also be interpreted thermodynamically, but in this case it reduces to transfer entropy anyway. This once more highlights a fundamental difference between transfer entropy and causal effect: the former has a thermodynamic interpretation relating to the source of irreversibility Y, while the latter is a construct that in general assumes an observer intervening in the system in a particular way.

We hope that the proposed interpretation will further advance studies relating information theory and thermodynamics, both in equilibrium and non-equilibrium settings, reversible and irreversible scenarios, average and local scopes, etc.