Transfer Entropy for Coupled Autoregressive Processes

A method is shown for computing transfer entropy over multiple time lags for coupled autoregressive processes using formulas for the differential entropy of multivariate Gaussian processes. Two examples are provided: (1) a first-order filtered noise process whose state is measured with additive noise, and (2) two first-order coupled processes each of which is driven by white process noise. We found that, for the first example, increasing the first-order AR coefficient while keeping the correlation coefficient between filtered and measured process fixed, transfer entropy increased since the entropy of the measured process was itself increased. For the second example, the minimum correlation coefficient occurs when the process noise variances match. It was seen that matching of these variances results in minimum information flow, expressed as the sum of transfer entropies in both directions. Without a match, the transfer entropy is larger in the direction away from the process having the larger process noise. Fixing the process noise variances, transfer entropies in both directions increase with the coupling strength. Finally, we note that the method can be generally employed to compute other information theoretic quantities as well.


Introduction
Transfer entropy [1] quantifies the information flow between two processes.Information is defined to be flowing from system X to system Y whenever knowing the past states of X reduces the uncertainty of one or more of the current states of Y above and beyond what uncertainty reduction is achieved by only knowing the past Y states.Transfer entropy is the mutual information between the current state of system Y and one or more past states of system X, conditioned on one or more past states of system Y.We will employ the following notation.Assume that data from two systems X and Y are simultaneously available at k timestamps: . Then we express transfer entropies as: Each of the two transfer entropy values TE x→y and TE y→x is nonnegative and both will be positive (and not necessarily equal) when information flow is bi-directional.Because of these properties, transfer entropy is useful for detecting causal relationships between systems generating measurement time series.Indeed, transfer entropy has been shown to be equivalent, for Gaussian variables, to Granger causality [2].Reasons for caution about making causal inferences in some situations using transfer entropy, however, are discussed in [3][4][5][6].A formula for normalized transfer entropy is provided in [7].
The contribution of this paper is to explicitly show how to compute transfer entropy over a variable number of time lags for autoregressive (AR) processes driven by Gaussian noise and to gain insight into the meaning of transfer entropy in such processes by way of two example systems: (1) a first-order AR process X = {x n } with its noisy measurement process Y = {y n }, and (2) a set of two mutually-coupled AR processes.Computation of transfer entropies for these systems is a worthwhile demonstration since they are simple models that admit intuitive understanding.In what follows we first show how to compute the covariance matrix for successive iterates of the example AR processes and then use these matrices to compute transfer entropy quantities based on the differential entropy expression for multivariate Gaussian random variables.Plots of transfer entropies versus various system parameters are provided to illustrate various relationships of interest.
Note that Kaiser and Schreiber [8] have previously shown how to compute information transfer metrics for continuous-time processes.In their paper they provide an explicit example, computing transfer entropy for two linear stochastic processes where one of the processes is autonomous and the other is coupled to it.To perform the calculation for the Gaussian processes the authors utilize expressions for the differential entropy of multivariate Gaussian noise.In our work, we add to this understanding by showing how to compute these quantities analytically for higher time lags.We now provide a discussion of differential entropy, the formulation of entropy appropriate to continuousvalued processes as we are considering.

Differential Entropy
The entropy of a continuous-valued process is given by its differential entropy.Recall that the entropy of a discrete-valued random variable is given by the Shannon entropy (we shall always choose log base 2 so that entropy will be expressed in units of bits) where p i is the probability of the i th outcome and the sum is over all possible outcomes.Following [9] we derive the appropriate expression for differential entropies for conditioned and unconditioned continuous-valued random variables.When a process X is continuous-valued we may approximate it as a discrete-value process by identifying p i = f i Δx where f i is the value of the pdf at the i th partition point and Δx is the refinement of the partition.We then obtain: Note that since the X process is continuous-valued, then, as Δx → 0, we have H(X) → + infinity.Thus, for continuous-valued processes, the quantity h(X), when itself defined and finite, is used to represent the entropy of the process.This quantity is known as the differential entropy of random process X.
Closed-form expressions for the differential entropy of many distributions are known.For our purposes, the key expression is the one for the (unconditional) multivariate normal distribution [10].Let the probability density function of the n-dimensional random vector x be denoted f(x), then the relevant expressions are: where detC is the determinant of matrix C, the covariance of x.In what follows, this expression will be used to compute differential entropy of unconditional and conditional normal probability density functions.The case for conditional density functions warrants a little more discussion.
Recall that the relationships between the joint and conditional covariance matrices, C XY and C Y|X , respectively, of two random variables X and Y (having dimensions n x and n y , respectively) are given by:   Here blocks  11 and  22 have dimensions n x by n x and n y by n y , respectively.Now, using Leibniz's formula, we have that: Hence the conditional differential entropy of Y, given X, may be conveniently computed using: This formulation is very handy as it allows us to compute many information-theoretic quantities with ease.The strategy is as follows.We define C (k) to be the covariance of two random processes sampled at k consecutive timestamps {t n−k+2 , t n− k+1 , …, t n , t n+1 }.We then compute transfer entropies for values of k up to k sufficiently large to ensure that their valuations do not change significantly if k is further increased.For our examples, we have found k = 10 to be more than sufficient.A discussion of the importance of considering this sufficiency is provided in [11].

Transfer Entropy Computation Using Variable Number of Timestamps
We wish to consider two example processes each of which conforms to one of the two model systems having the general expressions: and: ~0, , ~0, .
Here, v n and w n are zero mean uncorrelated Gaussian noise processes having variances R and Q, respectively.For system stability, we require the model poles to lie within the unit circle.
The first model is of a filtered process noise X one-way coupled to an instantaneous, but noisy measurement process Y.The second model is a two-way coupled pair of processes, X and Y.
Transfer entropy (as defined by Schreiber [1]) considers the flow of information from past states (i.e., state values having, timetags ) of one process to the present   1  n t state of another process.However, note that in the first general model (measurement process) there is an explicit flow of information from the present state of the X process; x n+1 determines the present state of the Y process y n+1 (assuming c − 1 is not zero).To fully capture the information transfer from the X process to the current state of the Y process we must identify the correct causal states [4].For the measurement system, the causal states include the current (present) state.This state is not included in the definition of transfer entropy, being a mutual information quantity conditioned on only past states.Hence, for the purpose of this paper, we will temporarily define a quantity, "information transfer," similar to transfer entropy, except that the present of the driving process, x n+1 , will be lumped in with the past values of the X process: x n−k+2 :x n .For the first general model there is no information transferred from the Y to the X process.We define the (non-zero) information transfer from the X to the Y process (based on data from k timetags) as: The major contribution of this paper is to show how to analytically compute transfer entropy for AR Gaussian processes using an iterative method for computing the required covariance matrices.Computation of information transfer is additionally presented to elucidate the power of the method when similar information quantities are of interest and to make the measurement example more interesting.We now present a general method for computing the covariance matrices required to compute information-theoretic quantities for the AR models above.Two numerical examples follow.
To compute transfer entropy over a variable number of multiple time lags for AR processes of the general types shown above, we compute its block entropy components over multiple time lags.By virtue of the fact that the processes are Gaussian we can avail ourselves of analytical entropy expressions that depend only on the covariance of the processes.In this section we show how to analytically obtain the required covariance expressions starting with the covariance for a single time instance.Taking expectations, using the AR equations, we obtain the necessary statistics to characterize the process.Representing these expectation results in general, the process covariance matrix C (1) (t n ) corresponding to a single timestamp, t n , is: To obtain an expanded covariance matrix, accounting for two time instances (t n and t n+1 ), we compute the additional expectations required to fill in the matrix C (2) (t n ): Because the process is stationary, we may write: (13) where: Thus we have found the covariance matrix C (2) required to compute block entropies based on two timetags or, equivalently, one time lag.Using this matrix the single-lag transfer entropies may be computed.
We now show how to compute the covariance matrices corresponding to any finite number of time stamps.Define vector . Using the definitions above, write the matrix C (2) as a block matrix and, using standard formulas, compute the conditional mean and covariance C c of 1 Note that the expected value of the conditional mean is zero since the mean of the n z process, z  , is itself zero.With these expressions in hand, we note that we may view propagation of the state n z to its value Here S is the principal square root of the matrix C c .It is conveniently computed using the inbuilt Matlab function sqrtm.To see analytically that the recursion works, note that using it we recover at each timestamp a process having the correct mean and covariance: Thus, because the process is Gaussian and fully specified by its mean and covariance, we have verified that the recursive representation yields consistent statistics for the stationary AR system.Using the above insights, we may now recursively compute the covariance matrix C (k) for a variable number (k) of timestamps.Note that C (k) has dimensions of 2k × 2k.We denote 2 × 2 blocks of C (k) as C (k)  ij for i, j = 1,2, ..., k , where C (k) ij is the 2-by-2 block of C (k) consisting of the four elements of C (k) that are individually located in row 2i − 1 or 2i and column 2j − 1 or 2j.
The above recursion is now used to compute the block elements of C (3) .Then each of these block elements is, in turn, expressed in terms of block elements of C (2) .These calculations are shown in detail below where we have also used the fact that the mean of the z n vector is zero: By continuation of this calculation to larger timestamp blocks (k > 3), we find the following pattern that can be used to extend (augment) C (k−1) to yield C (k) .The pattern consists of setting most of the augmented matrix equal to that of the previous one, and then computing two additional rows and columns for C (k) , k > 2, to fill out the remaining elements.The general expressions are: At this point in the development we have shown how to compute the covariance matrix: Since the system is linear and the process noise w n and measurement noise v n are white zero-mean Gaussian noise processes, we may express the joint probability density function for the 2k variates as: Note that the mean of all 2k variates is zero.Finally, to obtain empirical confirmation of the equivalence of the covariance terms obtained using the original AR system and its recursive representation, numerical simulations were conducted.Using the example 1 system (below) 500 sequences were generated each of length one million.For each sequence the C (3) covariance was computed.The error for all C (3) matrices was then averaged, assuming that the C (3) matrix calculated using the method based on the recursive representation was the true value.The result was that for each of the matrix elements, the error was less than 0.0071% of its true value.We are now in position to compute transfer entropies for a couple of illustrative examples.

Example 1: A One-Way Coupled System
For this example we consider the following system: Parameter h c specifies the coupling strength of the Y process to the first-order AR process X, and R and Q are their respective (w n and v n ) zero-mean Gaussian process noise variances.For stability, we require |a| < 1. Comparing to the first general representation given above, we have . The system models filtered noise x n and a noisy measurement, y n , of x n .Thus the x n sequence represents a hidden process (or model) which is observable by way of another sequence, y n .We wish to examine the behavior of transfer entropy as a function of the correlation  between x n and y n .One might expect that the correlation  between x n and y n to be proportional of the degree of information flow; however, we will see that the relationship between transfer entropy and correlation is not quite that simple.
Both the X and Y processes have zero mean.Computing the joint covariance matrix C (1) for x n and y n and their correlation we obtain: Hence the process covariance matrix C (1) corresponding to a single timestamp, t n is: In order to obtain an expanded covariance matrix, accounting for two time instances (t n and t n+1 ) we compute the additional expectations required to fill in the matrix C (2) :  Thus we have found the covariance matrix C (2) required to compute block entropies based on a single time lag.Using this matrix the single-lag transfer entropies may be computed.Using the recursive process described in the previous section we can compute C (1 °).We have found that using higher lags does not change the entropy values significantly.
To aid the reader in understanding the calculations required to compute transfer entropies using higher time lags, it is worthwhile to compute transfer entropy for a single lag.We first define transfer entropy using general notation indicating the partitioning of the X and Y sequences in to past and future    , and   y y   , , respectively.We then compute transfer entropy as a sum of block entropies: The Y states have no influence on the X sequence in this example.Hence TE y→x = 0. Since we are here computing transfer entropy for a single lag (i.e., two time tags t n and t n+1 ) we have: By substitution of the expression for the differential entropy of each block we obtain: For this example, note from the equation for y n+1 that state x n+1 is a causal state of X influencing the value of y n+1 .In fact, it is the most important such state.To capture the full information that is transferred from the X process to the Y process over the course of two time tags we need to include state x n+1 .Hence we compute the information transfer from x → y as: Here the notation indicates the determinant of the matrix composed of the rows and columns of C (2) indicated by the list of indices i shown in the subscripted brackets.For example, is the determinant of the matrix formed by extracting columns {1, 2, 3, 4} and rows {1, 2, 3, 4} from matrix C (2) .In later calculations we will use slightly more complicated-looking notation.For example, is the determinant of the matrix formed by extracting columns {2, 4 ,…, 18, 20} and the same-numbered rows from matrix C (1 °).(Note C (k)  [i],[i] is not the same as C (k)  ii as used in Section 3).
It is interesting to note that a simplification in the expression for information transfer can be obtained by writing the expression for it in terms of conditional entropies: From the fact that y n+1 = x n+1 + v n+1 we see immediately that: Hence we may write: To compute transfer entropy using nine lags (ten timestamps) assume that we have already computed C (10) as defined above.We partition the sequence Now, using these definitions, and substituting in expressions for differential block entropies we obtain: As a numerical example we set h c = 1, Q = 1, and for three different values of a (0.5, 0.7 and 0.9) we vary R so as to scan the correlation  between the x and y processes between the values of 0 and 1.
In Figure 1 it is seen that for each value of parameter a there is a peak in the transfer entropy TE (k)  x→y .As the correlation  between x n and y n increases from a low value the transfer entropy increases since the amount of information shared between y n+1 and x n is increasing.At a critical value of  transfer entropy peaks and then starts to decrease.This decrease is due to the fact that at high values of  the measurement noise variance R is small.Hence y n becomes very close to equaling x n so that the amount of information gained (about y n+1 ) by learning x n , given y n , becomes small.Hence h(y n+1 | y n ) -h(y n+1 | y n , x n ) is small.This difference is TE (2)  x→y .The relationship between  and R is shown in Figure 2. Note that when parameter a is increased, a larger value of R is required to maintain  at a fixed value.Also, in Figure 1 we see the effect of including more timetags in the analysis.When k is increased from 2 to 10 transfer entropy values fall, particularly for the largest value of parameter a.It is known that entropies decline when conditioned on additional variables.Here, transfer entropy is acting similarly.In general, however, transfer entropy, being a mutual information quantity, has the property that conditioning could make it increase as well [12].The observation that the transfer entropy decrease is greatest for the largest value of parameter a is perhaps due to the fact that the entropy of the X process is itself greatest for the largest a value and therefore has more sensitivity to an increase in X data availability (Figure 3).
From Figure 1 it is seen that as the value of parameter a is increased, transfer entropy is increased for a fixed value of .The reason for this increase may be gleaned from Figure 3 where it is clear that the amount of information contained in the x process, H X , is greater for larger values of a. Hence more information is available to be transferred at the fixed value of  when a is larger.In the lower half of Figure 3 we see that as  increases the entropy of the Y process, H Y , approaches the value of H X .This result is due to the fact that the mechanism being used to increase  is to decrease R. Hence as R drops close to zero y n looks increasingly identical to x n (since h c = 1).Figure 4 shows information transfer IT (k)  x→y plotted versus correlation coefficient .Now note that the trend is for information transfer to increase as  is increased over its full range of values.° This result is obtained since as  is increased y n+1 becomes increasingly correlated with x n+1 .Also, for a fixed  the lowest information transfer occurs for the largest value of parameter a.We obtain this result since at the higher a values x n and x n+1 are more correlated.Thus the benefit of learning the value of y n+1 through knowledge of x n+1 is relatively reduced, given that y n (itself correlated with x n ) is presumed known.Finally, we have IT (10)  x→y < IT (2)  x→y since conditioning the entropy quantities comprising the expression for information transfer with more state data acts to reduce their difference.Also, by comparison of Figure 2 and Figure 4, it is seen that information transfer is much greater than transfer entropy.This relationship is expected since information transfer as defined herein (for k = 2) is the amount of information that is gained about y n+1 from learning x n+1 and x n , given that y n is already known.Whereas transfer entropy (for k = 2) is the information gained about y n+1 from learning only x n , given that y n is known.Since the state y n+1 in fact equals x n+1 , plus noise, learning x n+1 is highly informative, especially when the noise variance is small (corresponding to high values of ).The difference between transfer entropy and information transfer therefore quantifies the benefit of learning x n+1 , given that x n and y n are known (when the goal is to determine y n+1 ).
Figure 5 shows how information transfer varies with measurement noise variance R. As R increases the information transfer decreases since measurement noise makes determination of the value of y n+1 from knowledge of x n and x n+1 less accurate.Now, for a fixed R, the greatest value for information transfer occurs for the greatest value of parameter a.This is the opposite of what we obtained for a fixed value of  as shown in Figure 4.The way to see the rationale for this is to note that, for a fixed value of information transfer, R is highest for the largest value of parameter a.This result is obtained since larger values of a yield the most correlation between states x n and x n+1 .Hence, even though the measurement y n+1 of x n+1 is more corrupted by noise (due to higher R), the same information transfer is achieved nevertheless, because x n provides a good estimate of x n+1 and, thus, of y n+1 .  (10 x→y versus measurement error variance R for three different values of parameter a (see legend).

Example 2: Information-theoretic Analysis of Two Coupled AR Processes.
In example 1 the information flow was unidirectional.We now consider a bidirectional example achieved by coupling two AR processes.One question we may ask in such a system is how transfer entropies change with variations in correlation and coupling coefficient parameters.It might be anticipated that increasing either of these quantities will have the effect of increasing information flow and thus transfer entropies will increase.
The system is defined by the equations: For stability, we require that the eigenvalues of the constant matrix lie in the unit circle.
The means of processes X and Y are zero.The terms w n and v n are the X and Y processes noise terms respectively.Using the following definitions: we may solve for the correlation coefficient  between x n and y n to obtain: Now, as we did previously in example 1 above, compute the covariance C (2) of the variates obtained at two consecutive timestamps to yield: At this point the difficult part is done and the same calculations can be made as in example 1 to obtain C (k) ; k = 3,4, …, 10 and transfer entropies.For illustration purposes, we define the parameters of the system as shown below, yielding a symmetrically coupled pair of processes.To generate a family of curves for each transfer entropy we choose a fixed coupling term  from a set of four values.We set Q = 1000 and vary R so that  varies from about 0 to 1.For each  value we compute the transfer entropies.The relevant system equations and parameters are: Hence, we make the following substitutions to compute C (2) : . 2 For each parameter set {, Q, R} there is a maximum possible ,   obtained by taking the limit as R→ ∞ of the expression for  given above.Doing so, we obtain: where: There is a minimum value of  also.The corresponding value for R, R min , was found by means of the inbuilt Matlab program fminbnd.This program is designed to find the minimum of a function in this case (a, b, c ,d, R, Q)) with respect to one parameter (in this case R) starting from an initial guess (here, R = 500).The program returns the minimum functional value ( min ) and the value of the parameter at which the minimum is achieved (R min ).After identifying R min a set of R values were computed so that the corresponding set of  values spanned from  min to the maximum   in fixed increments of  (here equal to 0.002).This set of R values was generated using the iteration: For the four selections of parameter  we obtain the functional relationships shown in Figure 6.From Figure 6 we see that for a fixed , increasing R increases (or decreases)  depending on whether R is less than (or greater than) Q (Q = 1000).Note that large increases in R > Q are required to marginally increase  when  nears its maximum value.The reason that the minimum  value occurs when Q equals R is because whenever they are unequal one of the processes dominates the other, leading to increased correlation.Also, note that if R << Q, then increasing  will cause  to decrease since increasing the coupling will cause the variance of the y process Var(y n ), a term appearing in the denominator of the expression for , to increase.If Q << R, a similar result is obtained when  is increased.Transfer entropies in both directions are shown in Figure 7. Fixing , we note that as R is increased from a low value both  and TE x− >y initially decrease while TE y− >x increases.Then for further increases of R,  reaches a minimum value then begins to increase, while TE x→y continues to decrease and TE y→x continues to increase.By plotting the difference TE x→y -TE y→x in Figure 8 we see the symmetry that arises as R increases from a low value to a high value.What is happening is that when R is low, the X process dominates the Y process so that TE x→y > TE y→x .As R increases, the two entropies equilibrate.Then, as R rises above Q, the Y process dominates giving TE x→y < TE y→x .The sum of the transfer entropies shown in Figure 8 reveal that the total information transfer is minimal at the minimum value of  and increases monotonically with .The minimum value for  in this example occurs when the process noise variances Q and R are equal (matched).Figure 9 shows the changes in the transfer entropy values explicitly as a function of R. Clearly, when R is small (as compared to Q = 1000), TE x→y > TE y→x .Also it is clear that at every fixed value of R, both transfer entropies are higher at the larger values for the coupling term .Here it is evident how TE y→x increases from a value less than TE x→y to a value greater than TE x→y as R increases.Note that for higher coupling values  this relative increase is more abrupt.
Finally, we consider the sensitivity of the transfer entropies to the coupling term .We reprise example system 2 where now  is varied in the interval (0, ½) and three values of R (somewhat arbitrarily selected to provide visually appealing figures to follow) are considered:  The reason for this increase is that when the noise driving one process is greater in amplitude than the amplitude of the noise driving the other process, the first process becomes dominant over the other.This domination increases as the disparity between the process noise variances increases (R versus Q).Note also that as the disparity increases, the maximum correlation occurs at increasingly lower values of the coupling term .As the disparity increases at fixed  = ¼ the correlation coefficient  increases.However, the variance in the denominator of  can be made smaller and thus  larger, if the variance of either of the two processes can be reduced.This can be accomplished by reducing .
The sensitivities of the transfer entropies to changes in coupling term  are shown in Figure 12.Consistent with intuition, all entropies increase with increasing .Also, when R < Q (blue trace) we have TE x->y > TE y->x and the reverse for R > Q. (red).For R = Q, TE x->y = TE y->x (green).Finally, it is interesting to note that whenever we define three cases by fixing Q and varying the setting for R ( one of R 1 , R 2 and R 3 for each case) such that R 1 < Q, R 2 = Q and R 3 = Q 2 /R 1 (so that R i+1 = QR i /R 1 for i = 1 and i = 2) we then obtain the symmetric relationships TE x ->y (R 1 ) = TE y ->x (R 3 ) and TE x ->y (R 3 ) = TE y ->x (R 1 ) for all  in the interval (1, ½).For these cases we also obtain (R 1 ) = (R 3 ) on the same  interval.

Conclusions
It has been shown how to compute transfer entropy values for Gaussian autoregressive processes for multiple timetags.The approach is based on the iterative computation of covariance matrices.Two examples were investigated: (1) a first-order filtered noise process whose state is measured with additive noise, and (2) two first-order symmetrically coupled processes each of which is driven by independent process noise.We found that, for the first example, increasing the first-order AR coefficient at a fixed correlation coefficient, transfer entropy increased since the entropy of the measured process was itself increased.
For the second example, it was discovered that the relationships between the coupling and correlation coefficients and the transfer entropies is more complicated.The minimum correlation coefficient occurs when the process noise variances match.It was seen that matching of these variances results in minimum information flow, expressed as the sum of both transfer entropies.Without a match, the transfer entropy is larger in the direction away from the process having the larger process noise.Fixing the process noise variances, transfer entropies in both directions increase with coupling strength .
Finally, it is worth noting that the method for computing covariance matrices for a variable number of timetags as presented here facilitates the calculation of many other information-theoretic quantities of interest.To this purpose, the authors have computed such quantities as crypticity [13] and normalized transfer entropy using the reported approach.

z
at the next timestamp as accomplished by the recursion:

Figure 2 .
Figure 2. Example 1: Logarithm of R versus  for three values of parameter a (see legend).

Figure 3 .
Figure 3. Example 1: Process entropies H X and H Y versus correlation coefficient  for three values of parameter a (see legend).

Figure 4 .
Figure 4. Example 1: Information transfer IT (k) x→y versus correlation coefficient  for three different values of parameter a (see legend) for k = 10 (solid trace) and k = 2 (dotted trace).

Figure 5 .
Figure 5. Example 1: Information transfer IT(10)  x→y versus measurement error variance R for three different values of parameter a (see legend).

Figure 6 .
Figure 6.Example 2: Process noise variance R versus correlation coefficient  for a set of  parameter values (see figure legend).

Figure 7 .
Figure 7. Example 2: Transfer entropy values versus correlation  for a set of  parameter values (see figure legend).Arrows indicate direction of increasing R values.

Figure 8 .
Figure 8. Example 2: Transfer entropies difference (TE x− >y -TE y − > x ) and sum (TE x− > y + TE y− > x ) versus correlation  for a set of  parameter values (see figure legend).Arrow indicates direction of increasing R values.

Figure 9 .
Figure 9. Example 2: Transfer entropies TE x→y and TE y→x versus process noise variance R for a set of  parameter values (see figure legend).

Figure 10 .
Figure 10.Example 2: Transfer entropy TE x− > y plotted versus TE y− > x for a set of  parameter values (see figure legend).The black diagonal line indicates locations where equality obtains.Arrow indicates direction of increasing R values.

Figure 11
Figure11shows the relationship between  and , where  x =  y =  for the three R values.Note that for the case R = Q the relationship is symmetric around  = ¼.As R departs from equality more correlation between x n and y n is obtained.

Figure 11 .
Figure 11.Example 2: Correlation coefficient  vs coupling coefficient  for a set of R values (see figure legend).

Figure 12 .
Figure 12.Example 2: Transfer entropies TE x→y (solid lines) vs TE y→x (dashed lines) vs coupling coefficient  for a set of R values (see figure legend).