Linearized Transfer Entropy for Continuous Second Order Systems

The transfer entropy has proven a useful measure of coupling among components of a dynamical system. This measure effectively captures the influence of one system component on the transition probabilities (dynamics) of another. The original motivation for the measure was to quantify such relationships among signals collected from a nonlinear system. However, we have found the transfer entropy to also be a useful concept in describing linear coupling among system components. In this work we derive the analytical transfer entropy for the response of coupled, second order linear systems driven with a Gaussian random process. The resulting expression is a function of the auto- and cross-correlation functions associated with the system response for different degrees-of-freedom. We show clearly that the interpretation of the transfer entropy as a measure of "information flow" is not always valid. In fact, in certain instances the "flow" can appear to switch directions simply by altering the degree of linear coupling. A safer way to view the transfer entropy is as a measure of the ability of a given system component to predict the dynamics of another.


Introduction
One of the biggest challenges in the modeling and analysis of dynamical systems is understanding coupling mechanisms among different system components. Whether one is studying coupling on a small scale (e.g., neurons in a biological system) or a large scale (e.g., coupling among widely separated geographical locations due to climate), understanding the functional form, strength, and/or direction of the coupling between two or more system components is a non-trivial task. However, this understanding is necessary if we are to build accurate models of the coupled system and make predictions (our ultimate goal). Accurately assessing the functional form of the coupling is beyond the scope of this work. To do so would require positing various models for a particular coupled system and then testing the predictive power of those models against observed data. Rather, the focus here is on understanding the strength and direction of the coupling between two system components. This task can be accomplished by forming a general hypothesis about what it means for two system components to be coupled, and then testing that hypothesis against observation. It is in this framework that the transfer entropy operates.
The transfer entropy (TE) is a scalar measure designed to capture both the magnitude and direction of coupling between two components of a dynamical system. This measure was posed initially for data described by discrete probability distributions [1] and was later extended to continuous random variables [2]. By construction, this measure quantifies a general definition of coupling that is appropriate for both linear and nonlinear systems. Moreover, TE is defined in such a way as to provide insight into the direction of the coupling (is component A driving component B or vice-versa?). Since its introduction, the TE has been applied to a diverse set of systems, including biological [1,3], chemical [4], economic [5], structural [6,7], and climate [8]. A number of papers in the neurosciences have also focused on the TE as a useful way to draw inference about coupling [9-11]. In each case the TE provided information about the system that traditional linear measures of coupling (e.g., cross-correlation) could not.
The TE has also been linked to other concepts of coupling such as "Wiener-Granger causality". In fact, for the class of systems studied in this work the TE can be shown to be entirely equivalent to measures of Granger causality [12]. Linkages to other models and concepts of dynamical coupling, such as conditional mutual information [13] and Dynamic Causal Modeling (DCM) [14], are also possible for certain special cases. The connectivity model assumed by DCM is fundamentally nonlinear (specifically bilinear); however, as the degree of nonlinearity decreases the form of the DCM model approaches that of the model studied here.
Although the TE was designed as a way to gain insight into nonlinear system coupling, we have found the TE to be quite useful in the study of linear systems as well. In this special case, analytical expressions for the TE are possible and can be used to provide useful insight into the behavior of the TE. Furthermore, unlike in the general case, the linearized TE can be easily estimated from observed data. This work is therefore devoted to the understanding of TE as applied to coupled, driven linear systems. Specifically, we consider coupling among components of a general, second order linear structural system driven by a Gaussian random process. The particular model studied is used to describe numerous phenomena, including structural dynamics, electrical circuits, heat transfer, etc. [15]. As such, it presents an opportunity to better understand the properties of the TE for a broad class of dynamical systems. Section 1 develops the general analytical expression for the TE in terms of the covariance matrices associated with different combinations of system response data. Section 2 specifies the general model under study and derives the TE for the model response data. Sections 3 and 4 present results and concluding remarks.

Mathematical Development
In what follows we assume that we have observed the signals $x_i(t_n)$, $i = 1 \cdots M$, as the output of a dynamical system and that we have sampled these signals at times $t_n$, $n = 1 \cdots N$. The system is assumed to be appropriately modeled as a mixture of deterministic and stochastic components, hence we choose to model each sampled value $x_i(t_n)$ as a random variable $X_{in}$. That is to say, for any particular observation time $t_n$ we can define a function $P_{X_{in}}(x_i(t_n))$ that assigns a probability to the event that $X_{in} < x_i(t_n)$. We further assume that these are continuous random variables and that we may also define the probability density function (PDF) $p_{X_{in}}(x_i(t_n)) = dP_{X_{in}}/dx_i(t_n)$. The vector of random variables $X_i \equiv (X_{i1}, X_{i2}, \cdots, X_{iN})$ defines a random process and will be used to model the $i$th signal $x_i \equiv (x_i(t_1), x_i(t_2), \cdots, x_i(t_N))$. Using this notation, we can also define the joint PDF $p_{X_i}(x_i)$ which specifies the probability of observing such a sequence. In this work we further assume that the random processes are strictly stationary, that is to say the joint PDF obeys $p_{X_i}(x_i(t_1), x_i(t_2), \cdots, x_i(t_N)) = p_{X_i}(x_i(t_1 + \tau), x_i(t_2 + \tau), \cdots, x_i(t_N + \tau))$ for any time shift $\tau$. The joint probability density functions are models that predict the likelihood of observing a particular sequence of values. These same models can be extended to include dynamical effects by including conditional probability, $p_{X_{in}}(x_i(t_n)|x_i(t_{n-1}))$, which can be used to specify the probability of observing the value $x_i(t_n)$ given that we have already observed $x_i(t_{n-1})$. The idea that knowledge of past observations changes the likelihood of future events is certainly common in dynamical systems. A dynamical system whose output is a repeating sequence of $010101\cdots$ is equally likely to be in state 0 or state 1 (probability 0.5) if the system is observed at a randomly chosen time. However, if we know that the value at $t_1$ is 0, then the value at $t_2$ is 1 with probability 1. This concept lies at the heart of the $P$th order Markov model, which by definition obeys

$$p_{X_i}(x_i(t_{n+1})|x_i(t_n), x_i(t_{n-1}), \cdots) = p_{X_i}(x_i(t_{n+1})|x_i(t_n), \cdots, x_i(t_{n-P+1})) \equiv p_{X_i}\left(x_i^{(1)}(t_n)|x_i^{(P)}(t_n)\right)$$

That is to say, the probability of the random variable attaining the value $x_i(t_{n+1})$ is conditional on the previous $P$ values only. The shorthand notation used here specifies relative lags/advances as a superscript.
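The repeating-sequence example above can be checked numerically. The following is a minimal sketch (the variable names are illustrative and not part of the original text) that estimates the marginal and one-step conditional probabilities from a sampled $0101\cdots$ sequence by simple counting.

```python
from collections import Counter

# Repeating 0101... sequence, as in the example in the text.
seq = [0, 1] * 50

# Marginal probability of observing state 1 at a randomly chosen time.
counts = Counter(seq)
p_one = counts[1] / len(seq)

# One-step transition counts (previous value, next value).
pairs = Counter(zip(seq[:-1], seq[1:]))
n_from_zero = sum(v for (prev, _), v in pairs.items() if prev == 0)

# Conditional probability of state 1 given the previous state was 0.
p_one_given_zero = pairs[(0, 1)] / n_from_zero

print(p_one, p_one_given_zero)
```

Knowing the previous value removes all uncertainty about the next one, which is the sense in which a first order Markov model captures the dynamics of this sequence.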
Armed with this notation we consider the work of Kaiser and Schreiber [2] and define the continuous transfer entropy between processes $X_i$ and $X_j$ as

$$TE_{j\to i}(t_n) = \int_{\mathbb{R}^{P+Q+1}} p\left(x_i^{(1)}(t_n), x_i^{(P)}(t_n), x_j^{(Q)}(t_n)\right) \log_2\left[\frac{p\left(x_i^{(1)}(t_n)\,|\,x_i^{(P)}(t_n), x_j^{(Q)}(t_n)\right)}{p\left(x_i^{(1)}(t_n)\,|\,x_i^{(P)}(t_n)\right)}\right] dx_i^{(1)}(t_n)\,dx_i^{(P)}(t_n)\,dx_j^{(Q)}(t_n) \qquad (2)$$

where $x_i^{(1)}(t_n)$ is the next value of $x_i$, $x_i^{(P)}(t_n)$ and $x_j^{(Q)}(t_n)$ collect the previous $P$ and $Q$ values of $x_i$ and $x_j$, and $\int_{\mathbb{R}^N}$ is used to denote the $N$-dimensional integral over the support of the random variables. By definition, this measure quantifies the ability of the random process $X_j$ to predict the dynamics of the random process $X_i$. To see why, we can examine the argument of the logarithm. In the event that the two random processes are not coupled, the dynamics will obey the Markov model in the denominator of Equation (2). However, should $X_j$ carry added information about the transition probabilities of $X_i$, the numerator is a better model. The transfer entropy is effectively mapping the difference between these hypotheses to the scalar $TE_{j\to i}(t_n)$. In short, the transfer entropy measures deviations from the hypothesis that the dynamics of $X_i$ can be described entirely by its own past history and that no new information is gained by considering the dynamics of system $X_j$.
Two simplifications are possible which will aid in the evaluation of Equation (2). First, recall that we assumed the processes were stationary such that the joint probability distributions are invariant to the particular temporal location $t_n$ at which they are evaluated (only relative lags between observations matter). Hence, in what follows we may drop this index from the notation, i.e., $TE_{j\to i}(t_n) \to TE_{j\to i}$. Secondly, we may use the law of conditional probability to write the ratio inside the logarithm as $p(x_i^{(1)}, x_i^{(P)}, x_j^{(Q)})\,p(x_i^{(P)}) / [p(x_i^{(P)}, x_j^{(Q)})\,p(x_i^{(1)}, x_i^{(P)})]$ and expand Equation (2) as

$$TE_{j\to i} = -h_{X_i^{(1)}X_i^{(P)}X_j^{(Q)}} + h_{X_i^{(P)}X_j^{(Q)}} + h_{X_i^{(1)}X_i^{(P)}} - h_{X_i^{(P)}} \qquad (3)$$

where the terms $h_X = -\int_{\mathbb{R}^M} p_X(x)\log_2(p_X(x))\,dx$ are the joint differential entropies associated with the $M$-dimensional random variable $X$. In the next section we evaluate Equation (3) among the outputs of a second-order linear system driven with a jointly Gaussian random process.

Time-Delayed TE
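The only multivariate probability distribution that readily admits an analytical solution for the differential entropies is the jointly Gaussian distribution. Consider the general case of the two data vectors $x \in \mathbb{R}^N$ and $y \in \mathbb{R}^M$. Stacking them into the vector $z \equiv (x, y) \in \mathbb{R}^{N+M}$ with mean $\bar{z}$, the jointly Gaussian model for these data vectors is

$$p_{XY}(x, y) = \frac{1}{(2\pi)^{(N+M)/2}|C_{XY}|^{1/2}} \exp\left[-\frac{1}{2}(z - \bar{z})^T C_{XY}^{-1}(z - \bar{z})\right] \qquad (4)$$

where $C_{XY}$ is the $(N+M) \times (N+M)$ covariance matrix and $|\cdot|$ takes the determinant. Substituting Equation (4) into the expression for the corresponding differential entropy yields

$$h_{XY} = \frac{1}{2}\log_2\left[(2\pi e)^{N+M}|C_{XY}|\right] \qquad (5)$$

Therefore, assuming that both random processes $X_i$ and $X_j$ are jointly Gaussian distributed, we may substitute Equation (4) into Equation (3) for each of the differential entropies (the factors of $2\pi e$ cancel), yielding

$$TE_{j\to i} = \frac{1}{2}\log_2\left[\frac{|C_{X_i^{(P)}X_j^{(Q)}}|\,|C_{X_i^{(1)}X_i^{(P)}}|}{|C_{X_i^{(1)}X_i^{(P)}X_j^{(Q)}}|\,|C_{X_i^{(P)}}|}\right] \qquad (6)$$

For $P, Q$ large the needed determinants become difficult to compute. We therefore employ a simplification to the model that retains the spirit of the transfer entropy, but that makes an analytical solution more tractable. In our approach, we set $P = Q = 1$, i.e., both random processes are assumed to follow a first order Markov model. However, we allow the time interval between the random processes to vary, just as is typically done for the mutual information and/or linear cross-correlation functions [6]. Specifically, we model $X_i(t)$ as the first order Markov model $p_{X_i}(x_i(t_n + \Delta_t)|x_i(t_n))$. Note that in anticipation of dealing with measured data, sampled at constant time interval $\Delta_t$, we have made the replacement $t_{n+1} = t_n + \Delta_t$. Although we are only using first order Markov models, by varying the time delay $\tau$ we can explore whether or not the random variable $X_j(t_n + \tau)$ carries information about the transition probability $p_{X_i}(x_i(t_n + \Delta_t)|x_i(t_n))$. Should consideration of $x_j(t_n + \tau)$ provide no additional knowledge about the dynamics of $x_i(t_n)$ the transfer entropy will be zero, rising to some positive value should $x_j(t_n + \tau)$ carry information not already contained in $x_i(t_n)$. In what follows we refer to this particular form of the TE as the time-delayed transfer entropy, or TDTE. In this simplified situation the needed covariance matrices are

$$C_{X_i^{(1)}X_iX_j}(\tau) = \begin{bmatrix} \sigma_i^2 & E[\tilde{x}_i(t_n + \Delta_t)\tilde{x}_i(t_n)] & E[\tilde{x}_i(t_n + \Delta_t)\tilde{x}_j(t_n + \tau)] \\ E[\tilde{x}_i(t_n + \Delta_t)\tilde{x}_i(t_n)] & \sigma_i^2 & E[\tilde{x}_i(t_n)\tilde{x}_j(t_n + \tau)] \\ E[\tilde{x}_i(t_n + \Delta_t)\tilde{x}_j(t_n + \tau)] & E[\tilde{x}_i(t_n)\tilde{x}_j(t_n + \tau)] & \sigma_j^2 \end{bmatrix}$$

together with its sub-matrices $C_{X_i^{(1)}X_i}$ (the upper-left $2 \times 2$ block), $C_{X_iX_j}(\tau)$ (the lower-right $2 \times 2$ block) and $C_{X_i} = \sigma_i^2$. Here $\tilde{x}_i \equiv x_i - \bar{x}_i$, $\sigma_i^2 \equiv E[\tilde{x}_i^2(t_n)]$ is simply the variance of the random process $X_i$ and $\bar{x}_i$ its mean. The assumption of stationarity also allows us to write $E[\tilde{x}_i(t_n + \Delta_t)\tilde{x}_j(t_n + \tau)] = E[\tilde{x}_i(t_n)\tilde{x}_j(t_n + \tau - \Delta_t)]$. Making these substitutions into Equation (6) yields the expression

$$TE_{j\to i}(\tau) = \frac{1}{2}\log_2\left[\frac{\left(1 - \rho^2_{X_iX_i}(\Delta_t)\right)\left(1 - \rho^2_{X_iX_j}(\tau)\right)}{1 - \rho^2_{X_iX_j}(\tau) - \rho^2_{X_iX_j}(\tau - \Delta_t) - \rho^2_{X_iX_i}(\Delta_t) + 2\rho_{X_iX_i}(\Delta_t)\rho_{X_iX_j}(\tau)\rho_{X_iX_j}(\tau - \Delta_t)}\right] \qquad (8)$$

where we have defined particular expectations in the covariance matrices using the shorthand $\rho_{X_iX_j}(\tau) \equiv E[\tilde{x}_i(t_n)\tilde{x}_j(t_n + \tau)]/(\sigma_i\sigma_j)$. This particular quantity is referred to in the literature as the cross-correlation function [16]. Note that the covariance matrices are positive-definite matrices and that the determinant of a positive definite matrix is positive [17]. Thus the quantity inside the logarithm will always be positive and the logarithm will exist. Now, the hypothesis that the TE was designed to test is whether or not past values of the process $X_j$ carry information about the transition probabilities of the second process $X_i$. Thus, if we are to keep with the original intent of the measure we would only consider $\tau < 0$. However, this restriction is only necessary if one implicitly assumes a non-zero TE means $X_j$ is influencing the transition $p_{X_i}(x_i(t_n + \Delta_t)|x_i(t_n))$ as opposed to simply carrying additional information about the transition. Again, this latter statement is a more accurate depiction of what the TE is really quantifying and we have found it useful to consider both negative and positive delays $\tau$ in trying to understand coupling among system components.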
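For $P = Q = 1$ the Gaussian TDTE is a ratio of determinants of the covariance matrix of $(X_i(t_n + \Delta_t), X_i(t_n), X_j(t_n + \tau))$ and its sub-blocks. The following is a minimal sketch of that calculation, assuming the three needed correlation coefficients are already known. The function name and argument names are hypothetical, and correlation matrices are used in place of covariance matrices because the variances cancel in the ratio.

```python
import numpy as np

def tdte_gaussian(rho_ii_dt, rho_ij_tau, rho_ij_tau_minus_dt):
    """Time-delayed transfer entropy (bits) from X_j to X_i, Gaussian case.

    rho_ii_dt           : correlation of X_i(t) with X_i(t + dt)
    rho_ij_tau          : correlation of X_i(t) with X_j(t + tau)
    rho_ij_tau_minus_dt : correlation of X_i(t + dt) with X_j(t + tau)
    """
    # Correlation matrix of (X_i(t + dt), X_i(t), X_j(t + tau)).
    c3 = np.array([
        [1.0, rho_ii_dt, rho_ij_tau_minus_dt],
        [rho_ii_dt, 1.0, rho_ij_tau],
        [rho_ij_tau_minus_dt, rho_ij_tau, 1.0],
    ])
    c_ii = c3[:2, :2]   # (X_i(t + dt), X_i(t))
    c_ij = c3[1:, 1:]   # (X_i(t), X_j(t + tau))
    num = np.linalg.det(c_ij) * np.linalg.det(c_ii)
    den = np.linalg.det(c3)  # the scalar |C_{X_i}| is 1 after normalisation
    if den <= 0.0:
        raise ValueError("inputs do not form a positive-definite correlation matrix")
    return 0.5 * np.log2(num / den)

# X_j unrelated to X_i: no additional predictive information.
print(tdte_gaussian(0.5, 0.0, 0.0))
# X_j correlated with both the current and the next value of X_i.
print(tdte_gaussian(0.5, 0.3, 0.6))
```

Working from determinants rather than an expanded formula keeps the sketch close to the derivation and makes invalid (non positive-definite) inputs easy to detect.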
It is also interesting to note the bounds of this function. Certainly for constant signals (i.e., $x_i(t_n)$, $x_j(t_n)$ are single-valued for all time) we have $\rho_{X_iX_i}(\Delta_t) = \rho_{X_iX_j}(\tau) = 0\ \forall\ \tau$ and the transfer entropy is zero for any choice of time-scales $\tau$ defining the Markov processes. Knowledge of $X_j$ does not aid in forecasting $X_i$ simply because the transition probability in going from $x_i(t_n)$ to $x_i(t_n + \Delta_t)$ is always unity. Likewise, if there is no coupling between system components we have $\rho_{X_iX_j}(\tau) = 0\ \forall\ \tau$ and again $TE_{j\to i}(\tau) = 0$. At the other extreme, for perfectly coupled systems, i.e., $X_i = X_j$, consider $\tau \to 0$. In this case, we have $\rho^2_{X_iX_j}(\tau) \to 1$ and $\rho_{X_iX_j}(\tau - \Delta_t) \to \rho_{X_iX_i}(-\Delta_t) = \rho_{X_iX_i}(\Delta_t)$ (in this last expression we have noted the symmetry of the function $\rho_{X_iX_i}(\tau)$ with respect to the time-delay). The transfer entropy then becomes $TE_{j\to i}(0) \to 0$, and the random process $X_j$ at $\tau = 0$ is seen to carry no additional information about the dynamics of $X_i$ simply due to the fact that in this special case we have $p_{X_i}(x_i(t_n + \Delta_t)|x_i(t_n)) = p_{X_i}(x_i(t_n + \Delta_t)|x_i(t_n), x_j(t_n))$. These extremes highlight the care that must be taken in interpreting the transfer entropy. Because the TDTE is zero for both the perfectly coupled and uncoupled case we must not interpret the measure to quantify the coupling strength between two random processes. Rather, the TDTE measures the additional information provided by one random process about the dynamics of another. We should point out that the average mutual information function can resolve the ambiguity in the TDTE as a measure of coupling strength. For two Gaussian random processes the time-delayed mutual information is known to be $I_{X_iX_j}(\tau) = -\frac{1}{2}\log_2\left[1 - \rho^2_{X_iX_j}(\tau)\right]$. Hence, for perfect coupling $I_{X_iX_j}(0) \to \infty$ whereas for uncoupled systems $I_{X_iX_j}(0) \to 0$. Estimating both time-delayed mutual information and transfer entropies can therefore permit stronger inference about dynamical coupling.
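The contrast between the two limits can be illustrated with the standard closed form for the mutual information of two jointly Gaussian variables, $I = -\frac{1}{2}\log_2(1 - \rho^2)$. The sketch below is illustrative only; the function name is hypothetical.

```python
import numpy as np

def gaussian_mutual_information(rho):
    """Mutual information (bits) between two jointly Gaussian variables
    with correlation coefficient rho, using I = -0.5 * log2(1 - rho**2)."""
    rho = np.asarray(rho, dtype=float)
    return -0.5 * np.log2(1.0 - rho ** 2)

# Uncoupled processes share no information; the value grows without
# bound as |rho| approaches 1 (perfect coupling).
for rho in (0.0, 0.5, 0.9, 0.999999):
    print(rho, gaussian_mutual_information(rho))
```

Unlike the TDTE, this quantity distinguishes the uncoupled case from the perfectly coupled one, which is why the two measures are complementary.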

Analytical Cross-Correlation Function
To fully define the TDTE, the auto- and cross-correlation functions $\rho_{ii}(T)$, $\rho_{ij}(T)$ are required. They are derived here for a general class of linear system found frequently in the modeling and analysis of physical processes. Consider the system

$$\mathbf{M}\ddot{x}(t) + \mathbf{C}\dot{x}(t) + \mathbf{K}x(t) = f(t) \qquad (10)$$

where $x(t) \equiv (x_1(t), \cdots, x_M(t))^T$ is the response vector, $f(t) \equiv (f_1(t), \cdots, f_M(t))^T$ is the forcing vector, and $\mathbf{M}$, $\mathbf{C}$, $\mathbf{K}$ are $M \times M$ constant coefficient matrices that capture the system's physical properties. Thus, we are considering a second-order, constant coefficient, $M$-degree-of-freedom (DOF) linear system. It is assumed that we may measure the response of this system at any of the DOFs and/or the forcing functions.
One physical embodiment of this system is shown schematically in Figure 1. Five masses are coupled together via restoring elements $k_i$ (springs) and dissipative elements $c_i$ (dash-pots). The first mass is fixed to a boundary while the driving force is applied at the end mass. If the response data $x(t)$ are each modeled as a stationary random process we may use the analytical TDTE to answer questions about shared information between any two masses. We can explore this relationship as a function of coupling strength and also of which particular mass response data we choose to analyze.
However, before proceeding we require a general expression for the cross-correlation between any two DOFs, $i, j \in [1, M]$. In other words, we require the expectation $E[x_i(n)x_j(n + T)]$ for any combination of $i, j$. Such an expression can be obtained by first transforming coordinates. Let $x(t) = u\eta(t)$ where the matrix $u$ contains the non-trivial solutions to the eigenvalue problem $(\mathbf{M}^{-1}\mathbf{K} - \omega_i^2 I)u_i = 0$ as its columns [18]. Here the eigenvalues are the squared natural frequencies of the system, with the natural frequencies denoted $\omega_i$, $i = 1 \cdots M$. Making the above coordinate transformation, substituting into Equation (10) and then pre-multiplying both sides by $u^T$ allows the equations of motion to be uncoupled and written separately as

$$\ddot{\eta}_i(t) + 2\zeta_i\omega_i\dot{\eta}_i(t) + \omega_i^2\eta_i(t) = u_i^T f(t) \equiv q_i(t) \qquad (11)$$

where the eigenvectors have been normalized such that $u^T\mathbf{M}u = I$ (the identity matrix). In the above formulation we have also made the assumption that $\mathbf{C} = \alpha\mathbf{K}$, i.e., the dissipative coupling $\mathbf{C}\dot{x}(t)$ is of the same form as the restoring term, albeit scaled by the constant $\alpha \ll 1$ (i.e., a lightly damped system). To obtain the form shown in Equation (11) we introduce the dimensionless damping coefficient $\zeta_i = \frac{\alpha}{2}\omega_i$. The general solution to these uncoupled, linear equations is well-known [18] and can be written as the convolution

$$\eta_i(t) = \int_0^\infty h_i(\theta)q_i(t - \theta)\,d\theta \qquad (12)$$

where $h_i(\theta)$ is the impulse response function

$$h_i(\theta) = \frac{1}{\omega_{di}}e^{-\zeta_i\omega_i\theta}\sin(\omega_{di}\theta) \qquad (13)$$

and $\omega_{di} \equiv \omega_i\sqrt{1 - \zeta_i^2}$ is the damped natural frequency. In general terms, we therefore have

$$x_i(t) = \sum_{l=1}^{M} u_{il}\eta_l(t) = \sum_{l=1}^{M} u_{il}\int_0^\infty h_l(\theta)q_l(t - \theta)\,d\theta$$

If we further consider the excitation $f(t)$ to be a zero-mean random process, so too will be $q_l(t)$. Using this model, we may construct the covariance

$$R_{X_iX_j}(\tau) \equiv E[x_i(t)x_j(t + \tau)] = \sum_{l=1}^{M}\sum_{m=1}^{M} u_{il}u_{jm}\int_0^\infty\int_0^\infty h_l(\theta_1)h_m(\theta_2)E[q_l(t - \theta_1)q_m(t + \tau - \theta_2)]\,d\theta_1 d\theta_2$$

which is a function of the eigenvectors $u_i$, the impulse response function $h(\cdot)$ and the covariance of the modal forcing matrix. Knowledge of this covariance matrix can be obtained from knowledge of the forcing covariance matrix. It is assumed that the random vibration inputs are uncorrelated, i.e., $E[f_q(t)f_p(t)] = 0\ \forall\ q \neq p$, with variance $\sigma^2_{F_p}$, which simplifies the above. The most common linear models assume the input is applied at a single DOF, i.e., $f_p(t)$ is non-zero only for $p = P$. For a load applied at DOF $P$, the inner integral of the resulting auto-covariance can be further evaluated by re-writing the forcing auto-covariance as the inverse Fourier transform of the associated power spectral density function, denoted $S_{FF}(\omega)$, via the well-known Wiener-Khinchin relation [16]. We have already assumed the forcing is comprised of independent, identically distributed values, in which case the forcing power spectral density $S_{FF}(\omega) = \text{const}\ \forall\ \omega$. Denoting this constant $S_{FF}(0)$, we note that the Fourier transform of a constant is simply a Dirac delta function. Returning to Equation (19), the delta function removes one of the two integrals, and at this point we can simplify the expression by carrying out the remaining integral. Substituting the expression for the impulse response in Equation (13), the needed expectation in Equation (22) becomes [19,20] a double sum over the modes $l, m$ with terms of the form

$$u_{lP}u_{mP}u_{il}u_{jm}\left[A_{lm}e^{-\zeta_m\omega_m\tau}\cos(\omega_{dm}\tau) + B_{lm}e^{-\zeta_m\omega_m\tau}\sin(\omega_{dm}\tau)\right] \qquad (23)$$

where the coefficients $A_{lm}$ and $B_{lm}$ depend on the modal frequencies and damping ratios. We can further normalize this function to give $\rho_{ij}(\tau) = R_{X_iX_j}(\tau)/\sqrt{R_{X_iX_i}(0)R_{X_jX_j}(0)}$ for the normalized auto- and cross-correlation functions.
It will also prove instructive to study the TDTE between the drive and response. This requires the cross-covariance between the response $x_i(t)$ and the forcing $f_P(t)$. Following the same procedure as above yields this expression; normalizing by the variance of the random process $X_i$ and assuming $\sigma^2_{F_P} = 1$ then yields the needed correlation function $\rho_{if}(\tau)$. This expression may be substituted into the expression for the transfer entropy to yield the TDTE between drive and response. At this point we have completely defined the analytical TDTE for a broad class of second order linear systems. The behavior of this function is described next. Before concluding this section we note that it also may be possible to derive expressions for the TDTE for different types of forcing functions. Impulse excitation and also non-Gaussian inputs where the marginal PDF can be described as a polynomial transformation of a Gaussian random variable (see e.g., [21]) are two such possibilities.

Behavior of the TDTE
Before proceeding with an example, we first require a means of estimating $TE_{j\to i}(\tau)$ from observed data. Assume we have recorded the signals $x_i(n\Delta_t)$, $x_j(n\Delta_t)$, $n = 1 \cdots N$ with a fixed sampling interval $\Delta_t$. In order to estimate the TDTE we require a means of estimating the normalized correlation functions $\rho_{ij}(\tau)$ which can be substituted into Equation (8). While different estimators of correlation functions exist (see e.g., [16]), we use a frequency domain estimator. This estimator relies on the assumption that the observed data are the output of an ergodic (therefore stationary) random process. If we further assume that the correlation functions are absolutely integrable, i.e., $\int |R_{ij}(\tau)|\,d\tau < \infty$, the Wiener-Khinchin theorem tells us that the cross-spectral density and cross-covariance functions are related via Fourier transform as [16]

$$S_{X_jX_i}(f) \equiv \int_{-\infty}^{\infty} R_{X_iX_j}(\tau)e^{-i2\pi f\tau}\,d\tau = \lim_{T\to\infty} E\left[\frac{X_i^*(f)X_j(f)}{2T}\right]$$

where $X_i(f)$ denotes the Fourier transform of the signal $x_i(t)$ observed over $t \in [-T, T]$. One approach is therefore to estimate the spectral density $\hat{S}_{X_jX_i}(f)$ and then inverse Fourier transform to give $\hat{R}_{X_iX_j}(\tau)$. We further rely on the ergodic theorem of Birkhoff [22] which (when applied to probability) allows expectations defined over multiple realizations to be well approximated by temporally averaging over a finite number of samples. More specifically, we divide the temporal sequences into $S$ segments of length $N_s$, (possibly) overlapping by $L$ points. Taking the discrete Fourier transform of each segment, e.g., $X_{is}(k) = \sum_{n=0}^{N_s-1} x_i(n + sN_s - L)e^{-i2\pi kn/N_s}$, $s = 0 \cdots S - 1$, and averaging gives the estimator

$$\hat{S}_{X_jX_i}(k) = \frac{\Delta_t}{N_sS}\sum_{s=0}^{S-1} X_{is}^*(k)X_{js}(k)$$

at discrete frequency $k$. This quantity is then inverse discrete Fourier transformed to give

$$\hat{R}_{X_iX_j}(n) = \frac{1}{N_s\Delta_t}\sum_{k=0}^{N_s-1}\hat{S}_{X_jX_i}(k)e^{i2\pi kn/N_s}$$

Finally, we may normalize the estimate to give the cross-correlation coefficient

$$\hat{\rho}_{X_iX_j}(n) = \frac{\hat{R}_{X_iX_j}(n)}{\sqrt{\hat{R}_{X_iX_i}(0)\hat{R}_{X_jX_j}(0)}}$$

This estimator is asymptotically consistent and unbiased and can therefore be substituted into Equation (8) to produce very accurate estimates of the TE (see examples to follow). In the general (nonlinear) case, kernel density estimators are typically used but are known to be poor in many cases, particularly when data are scarce (see e.g., [6,23]). We also point out that for this study stationarity (and ergodicity) only up to second order (covariance) is required. In general the TDTE is a function of all joint moments, hence higher-order ergodicity must be assumed.
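The segment-averaged, frequency-domain estimator described above can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions rather than the authors' code: the function name `xcorr_coeff` and its arguments are hypothetical, segments advance by `ns - overlap` samples, the global mean is removed first, and the lags are circular within each segment (no zero-padding), so estimates at lags comparable to the segment length should be treated with caution.

```python
import numpy as np

def xcorr_coeff(xi, xj, ns, overlap=0):
    """Estimate rho_ij(n) ~ E[x_i(t) x_j(t + n)] / (sigma_i sigma_j) by
    averaging cross-spectra over segments of length ns and inverse
    transforming. Returns (lags in samples, correlation coefficients)."""
    if not 0 <= overlap < ns:
        raise ValueError("overlap must satisfy 0 <= overlap < ns")
    xi = np.asarray(xi, dtype=float)
    xj = np.asarray(xj, dtype=float)
    xi = xi - xi.mean()
    xj = xj - xj.mean()
    step = ns - overlap
    s_ij = np.zeros(ns, dtype=complex)  # accumulated cross-spectrum
    p_i = 0.0                           # accumulated power of x_i
    p_j = 0.0                           # accumulated power of x_j
    for start in range(0, len(xi) - ns + 1, step):
        Xi = np.fft.fft(xi[start:start + ns])
        Xj = np.fft.fft(xj[start:start + ns])
        s_ij += np.conj(Xi) * Xj
        p_i += np.sum(np.abs(Xi) ** 2)
        p_j += np.sum(np.abs(Xj) ** 2)
    # Inverse DFT of conj(Xi) * Xj is the circular sum of x_i(n) x_j(n + lag).
    r_ij = np.fft.ifft(s_ij).real
    # By Parseval, sum |X|^2 / ns is the zero-lag sum of squares.
    rho = r_ij / np.sqrt((p_i / ns) * (p_j / ns))
    lags = np.arange(ns) - ns // 2
    return lags, np.fft.fftshift(rho)

# Usage: x_j is x_i delayed by 5 samples, so the peak sits at lag +5.
rng = np.random.default_rng(0)
x = rng.standard_normal(2 ** 13)
y = np.roll(x, 5)
lags, rho = xcorr_coeff(x, y, ns=2 ** 10)
print(lags[np.argmax(rho)], rho.max())
```

The overall scale factors ($\Delta_t$, $N_s$, $S$) cancel in the normalized coefficient, which is why the sketch accumulates raw sums rather than a calibrated spectral density.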
As an example, consider a five-DOF system governed by Equation (10), where

$$\mathbf{M} = \begin{bmatrix} m_1 & 0 & 0 & 0 & 0 \\ 0 & m_2 & 0 & 0 & 0 \\ 0 & 0 & m_3 & 0 & 0 \\ 0 & 0 & 0 & m_4 & 0 \\ 0 & 0 & 0 & 0 & m_5 \end{bmatrix}, \quad \mathbf{C} = \begin{bmatrix} c_1 + c_2 & -c_2 & 0 & 0 & 0 \\ -c_2 & c_2 + c_3 & -c_3 & 0 & 0 \\ 0 & -c_3 & c_3 + c_4 & -c_4 & 0 \\ 0 & 0 & -c_4 & c_4 + c_5 & -c_5 \\ 0 & 0 & 0 & -c_5 & c_5 \end{bmatrix}, \quad \mathbf{K} = \begin{bmatrix} k_1 + k_2 & -k_2 & 0 & 0 & 0 \\ -k_2 & k_2 + k_3 & -k_3 & 0 & 0 \\ 0 & -k_3 & k_3 + k_4 & -k_4 & 0 \\ 0 & 0 & -k_4 & k_4 + k_5 & -k_5 \\ 0 & 0 & 0 & -k_5 & k_5 \end{bmatrix}$$

are constant coefficient matrices commonly used to describe structural systems. In this case, these particular matrices describe the motion of a cantilevered structure where we assume a joint normally distributed random process applied at the end mass, i.e., $f(t) = (0, 0, 0, 0, N(0, 1))$. In this first example we examine the TDTE between response data collected from two different points on the structure. We fix $m_i = 0.01$ kg, $c_i = 0.1$ N·s/m, and $k_i = 10$ N/m for each of the $i = 1 \cdots 5$ degrees of freedom (thus we are using $\alpha = 0.01$ in the modal damping model $\mathbf{C} = \alpha\mathbf{K}$). The system response data $x_i(n\Delta_t)$, $n = 1 \cdots 2^{15}$, to the stochastic forcing are then generated via numerical integration. For simulation purposes we used a time-step of $\Delta_t = 0.01$ s which is sufficient to capture all five of the system natural frequencies (the lowest of which is $\omega_1 = 9.00$ rad/s). Based on these parameters, we generated the analytical expressions $TE_{3\to 2}(\tau)$ and $TE_{2\to 3}(\tau)$ and also $TE_{5\to 1}(\tau)$ and $TE_{1\to 5}(\tau)$ for illustrative purposes. These are shown in Figure 2 along with the estimates formed using the Fourier transform-based procedure. In forming the estimates we used $L = 0$, $S = 2^3$, $N_s = 2^{12}$, resulting in low bias and variance, and providing curves that are in very close agreement with theory.
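The modal quantities for this example can be reproduced with a short NumPy sketch. It assumes the standard fixed-free spring-mass chain (mass 1 tied to the boundary, mass 5 free), which is an assumption about the matrix layout rather than something confirmed by the text, although it does reproduce the quoted lowest natural frequency of about 9.00 rad/s. The function names are hypothetical, and the symmetrization step assumes a diagonal mass matrix.

```python
import numpy as np

def chain_matrices(m, c, k):
    """Mass, damping and stiffness matrices for a fixed-free chain:
    element 0 ties mass 1 to the boundary, the last mass is free."""
    m, c, k = (np.asarray(a, dtype=float) for a in (m, c, k))
    n = len(m)

    def tridiag(v):
        a = np.zeros((n, n))
        for i in range(n):
            a[i, i] = v[i] + (v[i + 1] if i + 1 < n else 0.0)
            if i + 1 < n:
                a[i, i + 1] = a[i + 1, i] = -v[i + 1]
        return a

    return np.diag(m), tridiag(c), tridiag(k)

def modal_properties(M, K, alpha):
    """Natural frequencies, modal damping ratios (for C = alpha * K) and
    mass-normalised mode shapes, assuming M is diagonal."""
    m_inv_sqrt = 1.0 / np.sqrt(np.diag(M))
    # Symmetric form of the eigenproblem: M^{-1/2} K M^{-1/2} v = w^2 v.
    k_sym = K * np.outer(m_inv_sqrt, m_inv_sqrt)
    w2, v = np.linalg.eigh(k_sym)      # eigenvalues in ascending order
    u = v * m_inv_sqrt[:, None]        # u = M^{-1/2} v, so u.T @ M @ u = I
    omega = np.sqrt(w2)
    zeta = 0.5 * alpha * omega         # zeta_i = alpha * omega_i / 2
    return omega, zeta, u

M, C, K = chain_matrices([0.01] * 5, [0.1] * 5, [10.0] * 5)
omega, zeta, u = modal_properties(M, K, alpha=0.01)
print("natural frequencies (rad/s):", np.round(omega, 2))
print("modal damping ratios:", np.round(zeta, 3))
```

With these parameters the damping matrix equals $0.01\,\mathbf{K}$, so the proportional damping assumption holds exactly and the modal damping ratios follow directly from the natural frequencies.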
With Figure 2 in mind, first consider negative delays only, where $\tau < 0$. Clearly, the further the random variable $X_j(t_n + \tau)$ is from $X_i(t_n)$, the less information it carries about the probability of $X_i$ transitioning to a new state $\Delta_t$ seconds into the future. This is to be expected from a stochastically driven system and accounts for the decay of the transfer entropy to zero for large $|\tau|$. However, we also see periodic returns to the point $TE_{j\to i}(\tau) = 0$ for even small temporal separation. Clearly this is a reflection of the periodicity observed in second order linear systems. In fact, for this system the dominant period of oscillation is $2\pi/\omega_1 = 0.698$ seconds. It can be seen that the argument of the logarithm in Equation (8) periodically reaches a minimum value of unity at precisely half this period, thus we observe zeros of the TDTE at times $(i - 1) \times \pi/\omega_1$, $i = 1, 2, \cdots$. In this case the TDTE is going to zero not because the random variables $X_j(t_n + \tau)$, $X_i(t_n)$ are unrelated, but because knowledge of one allows us to exactly predict the position of the other (no additional information is present). We believe this is likely to be a feature of most systems possessing an underlying periodicity and is one reason why using the TE as a measure of coupling must be done with care.
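As a quick check of the spacing of these nulls, the arithmetic can be written out using the quoted value $\omega_1 = 9.00$ rad/s (values are rounded to three decimal places):

```latex
\begin{align*}
T_1 &= \frac{2\pi}{\omega_1} = \frac{2\pi}{9.00\ \mathrm{rad/s}} \approx 0.698\ \mathrm{s}, \\
\frac{T_1}{2} &= \frac{\pi}{\omega_1} \approx 0.349\ \mathrm{s}, \\
|\tau_{\mathrm{null}}| &= (i - 1)\,\frac{\pi}{\omega_1} \approx 0,\ 0.349,\ 0.698,\ 1.047\ \mathrm{s}, \ldots \quad (i = 1, 2, 3, 4, \ldots)
\end{align*}
```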
One possible way to eliminate this feature is to condition the measure on more of the signal's past history. In fact, several papers (see e.g., [9,13]) mention the importance of conditioning on the full state vector of the random process $X_j$, whose dimensionality $d$ is, in a loose sense, the number of dynamical degrees of freedom. Building in more past history would almost certainly remove the oscillations as some of the past observations would always be providing additional predictive power. However, building in more history significantly complicates the ability to derive closed-form expressions. Moreover, for this simple linear system the basic envelope of the TDTE curves would not likely be affected by altering the model in this way.
We also point out that values of the TDTE are non-zero for positive delays as well. Again, so long as we interpret the TE as a measure of predictive power this makes sense. That is to say, future values of $X_j$ can aid in predicting the current dynamics of $X_i$. Interestingly, the asymmetry in the TE peaks near $\tau = 0$ may provide the largest clue as to the location of the forcing signal. Consistently we have found that the TE is larger for negative delays when the mass closest to the driven end plays the role of $X_j$; conversely it is larger for positive delays when the mass furthest from the driven end plays this role. So long as the coupling is bi-directional, results such as those shown in Figure 2 can be expected in general.
However, the situation is quite different if we consider the case of uni-directional coupling. For example, we may consider $TE_{f\to i}(\tau)$, i.e., the TDTE between the forcing signal and response variable $i$. This is a particularly interesting case as, unlike in previous examples, there is no feedback from DOF $i$ to the driving signal. Figure 3 shows the TDTE between drive and response and clearly highlights the directional nature of the coupling. Past values of the forcing function clearly help in predicting the dynamics of the response. Conversely, future values of the forcing say nothing about transition probabilities for the mass response simply because the mass has not "seen" that information yet. Thus, for uni-directional coupling, the TDTE can easily diagnose whether $X_j$ is driving $X_i$ or vice-versa. It can also be noticed from these plots that the drive signal is not that much help in predicting the response, as the TDTE is much smaller in magnitude than when computed between masses. We interpret this to mean that the response data are dominated by the physics of the structure (e.g., the structural modes), which is information not carried in the drive signal. Hence, the drive signal offers little in the way of additional predictive power. While the drive signal puts energy into the system, it is not very good at predicting the response. It should also be pointed out that the kernel density estimation techniques are not able to capture these small values of the TDTE. The error in such estimates is larger than these subtle fluctuations. Only the "linearized" estimator is able to capture the fluctuations in the TDTE for small ($O(10^{-2})$) values.
Figure 3. Time delay transfer entropy between the forcing (denoted as DOF "0") and mass three for the 5 DOF system driven at mass $P = 5$. The plot is consistent with the interpretation of information moving from the forcing to mass three.

It has been suggested that the main utility of the TE is, given a sequence of observations, to assess the direction of information flow in a coupled system. More specifically, one computes the difference $TE_{i\to j} - TE_{j\to i}$ with a positive difference suggesting information flow from $i$ to $j$ (negative differences indicating the opposite) [2,4]. In the system modeled by Equation (10) one would heuristically understand the information as flowing from the drive signal to the response. This is certainly reinforced by Figure 3. However, by extension it might seem probable that information would similarly flow from the mass closest to the drive signal to the mass closest to the boundary (e.g., DOF 5 to DOF 1). We test this hypothesis as a function of the coupling strength between masses. Fixing each stiffness and damping coefficient to the previously used values, we vary $k_3$ from 1 N/m to 40 N/m and examine the quantity $TE_{i\to j} - TE_{j\to i}$ evaluated at $\tau^*$, taken as the delay at which the TDTE reaches its maximum. Varying $k_3$ slightly alters the dominant period of the response. By accounting for this shift we eliminate the possibility of capturing the TE at one of its nulls (see Figure 2). For example, in Figure 2 we see that $\tau^* = -0.15$ in the plot of $TE_{3\to 2}(\tau)$.
Figure 4 shows the difference in TDTE as a function of the coupling strength. The result is non-intuitive if one assumes information would move from the driven end toward the non-driven end of the system. For certain DOFs this interpretation holds; for others, it does not. Herein lies the difficulty in interpreting the TE when bi-directional coupling exists. This was also pointed out by Schreiber [1] who noted "Reducing the analysis to the identification of a "drive" and a "response" may not be useful and could even be misleading". The above results certainly reinforce this statement.
Rather than being viewed as a measure of information flow, we find it more useful to interpret the difference measure as simply one of predictive power. That is to say, does knowledge of system $j$ help predict system $i$ more so than $i$ helps predict $j$? This is a slightly different question. Our analysis suggests that if $X_i$ and $X_j$ are both near the driven end but with DOF $i$ the closer of the two, then knowledge of $X_j$ is of more use in predicting $X_i$ than vice-versa. This interpretation also happens to be consistent with the notion of information moving from the driven end toward the base. However, as $i$ and $j$ become de-coupled (physically separated) it appears the reverse is true. The random process $X_i$ is better at predicting $X_j$ than $X_j$ is at predicting $X_i$. Thus, for certain pairs of masses information seems to be traveling from the base toward the drive. One possible explanation is that because the mass $X_i$ is further removed from the drive signal it is strongly influenced by the vibration of each of the other masses. By contrast, a mass near the driven end is strongly influenced only by the drive signal. Because the dynamics of $X_i$ are influenced heavily by the structure (as opposed to the drive), $X_i$ does a good job in helping to predict the dynamics everywhere. The main point of this analysis is that the difference in TE is not at all an unambiguous measure of the direction of information flow.
To further explore this question, we have repeated this numerical experiment for all possible combinations of masses. These results are displayed in Figure 5 where the same basic phenomenology is observed. If both masses being analyzed are near the driven end, the mass closest to the drive is a better predictor of the one that is further away. However, again, as $i$ and $j$ become decoupled the reverse is true. Our interpretation is that the further the process is removed from the drive signal, the more it is dominated by the other mass dynamics and the boundary conditions. Because such a process is strongly influenced by the other DOFs, it can successfully predict the motion for these other DOFs.
It is also interesting to note how the strength, and even directionality (sign), of the difference in TDTE changes with variations in a single stiffness element. Depending on the value of $k_3$ we see changes in which of the two masses is a better predictor. In some cases we even see zero TDTE difference, implying that the dynamics of the constituent signals are equally useful in predicting one another. Again, this does not support our intuitive notion of what it means for information to travel through a structural system. Only in the case of uni-directional coupling can we unambiguously use the TE to indicate directionality of information transport.
One of the strengths of our analysis is that these conclusions are not influenced by estimation error. In studying heart and breath rate interactions, for example, the ambiguity in information flow was attributed to difficulties in the estimation process [2]. We have shown here that even when estimation error is not a factor, the ambiguity remains. We would imagine a similar result would hold for more complex systems; however, developing analytical expressions for such systems is beyond our ability. The difference in TDTE is, however, a useful indicator of which system component carries the most predictive power about the rest of the system dynamics.
In short, the TDTE can be a very useful descriptor of system dynamics and of coupling among system components. However, any real understanding is only likely to be obtained in the context of a particular system model, or class of models (e.g., linear). Absent physical insight into the process that generates the observations, understanding the results of a TDTE analysis can be challenging at best. However, it is perhaps worth mentioning that the expressions derived here might permit inference about the general form of the underlying "linearized" system model. Different linear system models yield different expressions for ρ_ij(τ), and hence different expressions for the TDTE. One could then conceivably use estimates of the TDTE as a means to select among this class of models given observed data. Whether or not the TDTE is of use in the context of model selection remains to be seen.
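To make the dependence on the correlation functions concrete, the following sketch evaluates the transfer entropy for three jointly Gaussian samples directly from their correlation coefficients. It uses the standard identity that the conditional mutual information of Gaussian variables equals -0.5 ln(1 - r^2), where r is the partial correlation. The labelling a = x_i(t + Δt), b = x_i(t), c = x_j(t + τ), the function name, and the numerical correlation values are assumptions made for illustration; they are not taken from the models studied here.

```python
import math


def te_from_correlations(rho_ab, rho_bc, rho_ac):
    """Gaussian transfer entropy I(a; c | b) in nats from correlations.

    Hypothetical labelling: a is the future of the target, b is the
    present of the target, and c is the (delayed) source sample.
    """
    # Partial correlation between a and c after removing the effect of b.
    partial = (rho_ac - rho_ab * rho_bc) / math.sqrt(
        (1.0 - rho_ab**2) * (1.0 - rho_bc**2)
    )
    return -0.5 * math.log(1.0 - partial**2)


# Two made-up "models" sharing the same autocorrelation rho_ab = 0.5
# but having different cross-correlations.
te_model_a = te_from_correlations(0.5, 0.0, 0.5)  # source adds new information
te_model_b = te_from_correlations(0.5, 0.4, 0.2)  # rho_ac = rho_ab * rho_bc
print(f"model A: {te_model_a:.4f} nats, model B: {te_model_b:.4f} nats")
```

In the second case the cross-correlation is fully explained by the target's own past, so the partial correlation and the transfer entropy both vanish. This is the sense in which candidate models with different correlation functions could, in principle, be distinguished by their TDTE.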

Conclusions
In this work we have derived an analytical expression for the time-delayed transfer entropy (TDTE) among components of a broad class of second order linear systems driven by a jointly Gaussian input. This solution has proven particularly useful in understanding the behavior of the TDTE as a measure of dynamical coupling. In particular, when the coupling is uni-directional, we have found the TDTE to be an unambiguous indicator of the direction of information flow in a system. However, for bi-directional coupling the situation is significantly more complicated, even for linear systems. We have found that a heuristic understanding of information flow is not always accurate. For example, one might expect information to travel from the driven end of a system toward the non-driven end. In fact, we have shown that, for certain pairs of masses, precisely the opposite appears to be true. Simply varying a linear stiffness element can cause the apparent direction of flow to change. It would seem a safer interpretation is that a positive difference in the transfer entropy between two system components tells the practitioner which component has the greater predictive power.

Figure 1. Physical system modeled by Equation (10). Here, an M = 5 DOF structure is represented by masses coupled together via both restoring and dissipative elements. Forcing is applied at the end mass.

Figure 2. Time-delayed transfer entropy between masses two and three (top row) and one and five (bottom row) of a 5 DOF system driven at mass P = 5.

Figure 4. Difference in time-delayed transfer entropy between the driven mass five and each other DOF as a function of k_3. A positive difference indicates TE_{i→j} > TE_{j→i} and is commonly used to indicate that information is moving from mass i to mass j. Based on this interpretation, negative values indicate information moving from the driven end to the base; positive values indicate the opposite. Even for this linear system, choosing different masses in the analysis can produce very different results. In fact, TE_{2→5} − TE_{5→2} implies a different direction of information transfer, depending on the strength of the coupling, k_3.

Figure 5. Difference in time-delayed transfer entropy (TDTE) among different combinations of masses. By the traditional interpretation of TE, negative values indicate information moving from the driven end to the base; positive values indicate the opposite.