Residual Predictive Information Flow in the Tight Coupling Limit: Analytic Insights from a Minimalistic Model

In a coupled system, predictive information flows from the causing to the caused variable. The amount of transferred predictive information can be quantified through the use of transfer entropy or, for Gaussian variables, equivalently via Granger causality. It is natural to expect and has been repeatedly observed that a tight coupling does not permit to reconstruct a causal connection between causing and caused variables. Here, we show that for a model of interacting social groups, carried from the master equation to the Fokker–Planck level, a residual predictive information flow can remain for a pair of uni-directionally coupled variables even in the limit of infinite coupling strength. We trace this phenomenon back to the question of how the synchronizing force and the noise strength scale with the coupling strength. A simplified model description allows us to derive analytic expressions that fully elucidate the interplay between deterministic and stochastic model parts.


Introduction
In many scientific problems, causal interactions of different processes or process components are clear from a mechanistic understanding of the system dynamics and reflected by model formulation. In other situations, the interaction structure of a large and/or complex system, e.g., the assembly dynamics of a chemical system [1,2], of an ecological [3,4] or social community [5,6] or of neuronal populations [7,8], may not be clear in advance. Then, the empirical reconstruction of causal interactions from multivariate time series can be attempted using a suitable concept of causality. Since the seminal works of Wiener [9] and Granger [10], this reconstruction is based on the rationale that a causal interaction can be inferred from an increased prediction error when excluding the causing process component from the prediction of the caused component. Operationalizing this idea leads to Granger causality (GC), a measure commonly used to compare causal interactions of different process components. The quantification of a predictive information flow is also behind the definition of transfer entropy [11], which quantifies the predictive information transfer from the causing to the caused component. As shown by Barnett et al. [12], Granger causality and transfer entropy are equivalent for Gaussian variables.
From the outlined definition, it is plausible to expect that a stronger coupling will enhance the predictive information flow. On the other hand, it can be expected that an increasing coupling strength will eventually synchronize the caused with the causing variable. The approach to the synchronization manifold can be monitored by the synchronization index (phase locking value) or, for a linear dynamics, by the cross-correlation coefficient. Once the dynamics is tied to the synchronization manifold, a reconstruction is no longer possible because in a synchronous mode of motion leader and follower cannot be told apart-early reports of this insight [13,14] were followed by repeated observations of the phenomenon in the context of various directionality indices and synchronization measures (e.g., [15][16][17][18][19]), which underpins a generic phenomenon. Combining the synchronization triggered decline of directionality measures with the initial increase in the small coupling regime immediately suggests unimodal curve describing the relation between directionality and coupling strength, where any directionality index vanishes for zero (uncoupled) and infinite (tight coupling limit) coupling strength.
We note that, to the best of our knowledge, all reports of this phenomenon are solely based on numerical or empirical data and subsequent estimation of directionality indices. Moreover, directionality indices are mostly constructed as differences (asymmetries) between directed coupling indices, thus quantifying a two-way net directionality. Quite often estimators of the one-way directionality indices are plagued by a considerable bias and, in particular, non-zero values for uncoupled variables. Computing the net directionality index this bias may (or may not) be removed by forming differences. Therefore, an analytic treatment of a one-way directionality index vs. coupling strength relation for a tractable but generic dynamics seems desirable.
In the following, we address such a system dynamics that follows from a socio-dynamic model originally formulated in the framework of a master equation [20]. In the thermodynamic limit, the description is shifted to the level of a Fokker-Planck equation [21] with related drift and diffusion coefficients. The resulting expressions show that the coupling strength enters the diffusion matrix which means that random fluctuations scale up together with increasing coupling strength. Our analytic derivation leads to the interesting result that even in the tight coupling limit a residual directionality remains. Since the resultant dynamics can effectively be mapped to a vector-autoregressive process (linear stochastic process with Gaussian variables), the residual Granger causality can be interpreted as a residual predictive information flow remaining for tight coupling. Although we use a specific model at the outset of our presentation, we argue that the finding of a residual GC is a generic effect.

Granger Causality in a Nutshell
To set the stage, we briefly recall how GC is defined in the framework of a (two-dimensional) vector autoregressive process VAR[∞] (of order infinity). In this case, two time series are fitted by the following full model with white noise covariance matrix In describing stable (non-divergent) time series by a VAR, the dynamical coefficients a j , b j , c j and d j will obey a stationarity condition [22].
The prediction error of X 2 , using both its own past and the past of X 1 , is quantified by Σ 22 , the variance of η t . Predicting X 2 only from its own past means to use a reduced univariate autoregressive processes description for the second time series In passing, we note that the parameters of this reduced model are not fitted independently but derived from Equation (1) for example through a spectral factorization [23]. Moreover, based on a state-space formulation of GC, a closed form involving partial covariances can be obtained for finite order vector-autoregressive models [24]. Now, the prediction error of X 2 without inclusion of X 1 is quantified by Ξ 2 , the variance ofη t . Since in this step no information is added (but rather removed), Ξ 2 will never be smaller than Σ 22 , guaranteeing that Granger causality GC (from X 1 → X 2 ) defined as the quantity log( VAR processes belong to the model class of time discrete linear stochastic processes. In a recent work, we extended the concept of GC to diffusion processes which constitute a special class of time continuous stochastic processes. Diffusion processes can be formulated either via a Langevin type stochastic differential equation system or via the related Fokker-Planck equation (FPE) [25]. In the framework of a local linearization of the dynamics and a stroboscopic map (i.e., the temporal integration of the dynamic flow over a time increment ∆t), this allows establishing a connection between drift and diffusion coefficients of the FPE and a local GC. Averaging the local GC with respect to the invariant measure yields a global GC for the original diffusion process. For details, we refer the reader to our recent publication [26].

Weidlich's Socio-Dynamic Model
Here, we apply this connection to a stochastic socio-dynamics model proposed by Weidlich [20] to describe opinion formation in a system with two groups, leaders (Group 1 comprising n 1 members) and followers (Group 2 with n 2 members). Individuals of each group i = 1, 2 can adopt one of two opinions j = 1, 2. Opinion diversity across the community is coded by variables n i,j specifying the number of individuals of group i sharing opinion j. Since it is assumed that everybody has an opinion, n i,1 + n i,2 = n i . Eventually, the system is considered in the "thermodynamic limit" n 1 , n 2 → ∞ with n 1 /n 2 = , which suggests a change to quasi-continuous variables Members of group i change their opinion from j to j with a transition rate p j→j that may depend on the present opinion configuration {y α,β } (with α, β = 1, 2). In line with Weidlich, we employ the following expressions with a straightforward interpretation: without coupling (µ = 0) leaders and followers have a diametrical preference for one opinion over the alternative one; while leaders flip opinion without reference to the population of followers (y 2,j ), with increasing coupling followers enhance the rate of switching their opinion to the opinion populated by leaders (y 1,j ). This means that followers are uni-directionally forced by leaders with the coupling strength scaled by the parameter µ. These rates enter a master equation describing the evolution of a statistical ensemble of interacting social groups. After another change of variables x i = y i,2 − y i,1 , reflecting directed imbalance and opinion diversity within group i, the related Fokker-Planck equation can be written down [20]. It is completely specified by its drift vector and its diffusion matrix Notice that the diffusion matrix is positive definite since −1 ≤ x 1 , x 2 ≤ 1. The inverse scaling with n 1 and n 2 , respectively, means that diffusion vanishes in the "thermodynamic limit" n 1 , n 2 → ∞ (with = n 1 /n 2 ). Moreover, since the parameter p (controlling the intrinsic opinion flip rate) scales both drift vector and diffusion matrix in the same way, it may well be absorbed in a reparameterization of time. Finally, we point out that in the step from the master equation to the FPE the coupling parameter µ does not only enter the drift vector but also the diffusion matrix.

Simulation Results
Being equipped with drift vector and diffusion matrix, we can compute the GC of the socio-dynamic diffusion process (as outlined in [26]) for various values of the coupling parameter µ. This means we compute GCs locally for subsampled vector-Ornstein-Uhlenbeck processes with local additive noise; global GC is then obtained as the statistical average of local GCs with respect to a stationary distribution estimated from the simulation (1,000,000 data points for each µ value). Additionally, we compute the GC of a global vector-Ornstein-Uhlenbeck process with constant average-noise. The results are plotted with circle and crosses, respectively, in Figure 1.  (6) and (7), respectively); crosses, GC of the global VAR(1) (Equation (8)) with average-noise; solid line, analytic GC (Equation (20)) of the minimalistic model.
As expected, for µ = 0, the GC is zero because in this uncoupled situation inclusion of x 1 cannot reduce the prediction error of x 2 . With increasing coupling, parameter µ GC initially increases up to a maximum around µ = 3 before it decreases again. Surprisingly, we find that a linearized dynamics with average-noise (notice that the only non-linearity ∼ x 1 x 2 resides in D 22 ) with state space averaged additive noise gives rise to the same values (plotted in Figure 1 with crosses). Therefore, the non-monotonic characteristic of GC requires neither a non-linear dynamics nor multiplicative noise.

Analytic Description
To explain the non-monotonic behavior of the GC, we connect the diffusion process specified by Equations (6) and (7) with a VAR(1), i.e., vector-autoregressive process of first order. The general connection is elaborated in [27] and proceeds via the VAR(1) written as where and θ t is a two-component vector of zero-mean Gaussian white noise residuals with a covariance matrix Σ given by the following matrix integral where Γ is the Jacobian matrix of the diffusion process, i.e., of its drift part. The above connection is derived for the standard vector-Ornstein-Uhlenbeck process with additive noise, which corresponds to a diffusion matrixD (2) ij that is constant in state space. By contrast, in Weidlich's model we see that the diffusion coefficients D   (7)) vary with x 1 and x 2 . Since the gradients scale with the inverse of n 1 and n 2 , respectively, for rather large n 1 , n 2 , the noise is weakly multiplicative [26]. Therefore, we may consider Σ(x 1 , x 2 ) as a local covariance matrix. A rigorous derivation would now strive to compute local GC maps and perform a subsequent average over state space (with respect to the stationary probability distribution, essentially a bivariate Gaussian). This route seems feasible but resultant expressions appear far from being handy. Therefore, we approximate state dependent diffusion coefficients by substituting mean values for state variables, i.e.,

(cf. Equation
The mean values can be read from Equation (6) x 1 = 1 2 and with the consequence that we approximate In line with these simplifications, we now consider the case of a VAR(1) with the following parameterization where the parameters a, b, c are to be identified with the entries in Equation (9), i.e., where we have used the abbreviation h = p∆t.
The covariance matrix of residuals is obtained by inserting σ 2 1 and σ 2 2 (µ) of Equation (13) into Equation (10). The integration can be performed analytically and leads to the following expressions Notice that With η := n 1 /n 2 2(1−e −2h ) ∼ being small (since n 1 n 2 and h 1), this matrix has the eigenvalues which shows that even in the tight coupling limit the covariance ellipsoid of residuals possesses two finite semi-axes. Equation (14) defines a VAR(1) that was recently considered by Barnett and Seth as a minimal model that allows deriving an analytic expression for GC that can be found in [28] and was restricted to the special case that Σ was the identity matrix, i.e., θ 1,t and θ 2,t were uncorrelated (Σ 12 = Σ 21 = 0) and both with unit variance (Σ 11 = Σ 22 = 1). Relaxing these restrictions and following the same steps outlined by Barnett and Seth we can derive an explicit formula for GC that is structurally identical to Barnett's formula even for the most general, bivariate VAR(1) process in Equation (8) with bidirectional coupling (A 12 Through Equations (20) and (21) in conjunction with Equation (15), we have everything at hand to evaluate the dependence of GC on the coupling parameter µ. Note that the GC does not depend on the parameter a, and for a fixed covariance matrix is a monotonically increasing function of c growing to infinity; indeed, in a simple autoregressive model, the parameter c would be naturally considered (a monotonous function of) the coupling strength µ. However, in our example, the dependence of the c on coupling parameter µ is more complicated, as it also affects the covariance matrix. Moreover, the dependence is such that there is a delicate balance for infinite coupling strength µ between the synchronizing influence of the driver and the diffusion term leading to the finite residual GC. The result is plotted in Figure 1 (with solid line) and shows that the analytic formula perfectly matches the numerical values (plotted with circles and crosses) extracted from the simulation. The matching of our analytical results, based on Equation (11), and global GC is surprising but fortunate. The coincidence of global GC and average-noise GC, on the other hand, is not surprising since, for weakly multiplicative noise, the latter is in line with our analytic approach.
The analytic formula allows considering the asymptotic limit µ → ∞, which is given by inserting into Barnett's formula (Equation (20)). Interestingly, n 1 and n 2 again enter this expression only through the ratio = n 1 /n 2 . For the chosen parameters, the resulting asymptotic value lim µ→∞ GC is approximately 0.027, which is larger than zero. As the scaling with ∼ η indicates, the asymptotic residual GC is a consequence of the residual variance in the direction of the second semi-axes of the covariance ellipsoid in Equation (18). Residual fluctuations remaining in the tight coupling limit enable an improvement of prediction. These residual fluctuations reflect the fact that even for infinite coupling strength synchronization is not complete. To show this, we compute the cross-correlation function by solving the Yule-Walker equations [22]. As can be seen in Figure 2, the maxima of the curves rise with increasing coupling strength while the related lag approaches zero (in discrete steps) reflecting instantaneous response of the caused variable. However, our analytic treatment reveals that even in the tight coupling limit µ → ∞ the maximal cross-correlation is 0.995 and not one; the reason is, again, the balance between the synchronizing force (exponential relaxation) and the increasing noise strength σ 22 (µ). We note that this value is not plagued by estimation errors. The analytic approach also puts us in the position to sort out how the different system properties contribute to the formation of a maximum in the GC. In particular, we can investigate the effect of coupling dependent diffusivity through replacing σ 2 2 (µ) (cf. Equation (13)) by the constant value σ 2 2 (µ = 0) = σ 2 1 . Through these investigations, we arrive at the following conclusions: • For uncoupled systems (µ = 0), GC is zero because c = 0.

•
The initial increase of GC is caused by the exponential increase of c towards b with increasing coupling c (cf. Equation (15)). Residual variance of the process component X 2 is reduced via information transferred from the driver X 1 .

•
Simultaneously with the information transferred from the driver intrinsic fluctuations, modeled by increasing diffusivity σ 2 2 (µ), scale up (asymptotically ∼ µ). Nevertheless, the residual variance Σ 22 approaches a constant because tighter coupling to X 1 (as expressed by the denominator µ + 1 in Equation (16)) compensates increasing intrinsic noise. In the limit µ → ∞, the residual variance Σ 22 is the residual variance Σ 11 of the driver plus an extra term resulting from the balance between intrinsic noise and tight coupling, i.e., lim µ→∞ Σ 22 = 1 + n 1 /n 2 1−e −2h Σ 11 • From the imbalance of diffusivities follows the ratio which means that for = 0.01 the residual variance Σ 22 is increasing roughly by the factor 100. As can be read from Equation (21), this tends to reduce c 2 with increasing µ. From the opposing trends -c 2 increases with µ while 1/Σ 22 decreases with increasing µ -the peak results as the point where the dominance of both trends changes. We note that analytic expressions can be derived since we have all ingredients at hand, however, explicit formulas are quite lengthy and do not provide deeper insights. In all cases considered, we found that √ det Σ varied only mildly with µ and the variation of b 2 is of minor importance for Barnett's formula (Equation (20)).

Discussion
The increase of GC with µ is a natural consequence of enhanced information transfer with tighter coupling to the driver. The asymptotic decrease and the formation of a maximum, on the other hand, follow from the fact that the caused variable X 2 asymptotically adopts the residual variance of the driver which is two orders larger because n 2 /n 1 = −1 = 100, meaning that "every leader has one hundred followers". This allows predicting that no such maximum can be found if n 1 ≈ n 2 or for n 1 > n 2 , i.e., if the leader group is of comparable or even larger size than the follower group-a statement that we can confirm through our analytic analysis. It is interesting to note that the asymptotic residual GC, i.e., an improvement of prediction in the tight-coupling limit, remains even in the thermodynamic limit n 1 , n 2 → ∞, i.e., when the diffusion constants approach zero, provided = n 1 /n 2 > 0. This results from the scaling behavior of the variance Σ 22 .
The question whether the found residual GC is a property of the specific socio-dynamic model or a generic property of a larger class of systems can be answered in the following way: our results follow from a minimalistic linear system (vector-Ornstein-Uhlenbeck process or VAR(1)), which means it is not a fancy effect of non-linear dynamics. In this setup, the rGC results from the linear asymptotic scaling behavior of σ 2 2 (µ) (Equation (13)), which balances the µ in the denominator of the first term in Equation (16) resulting from integrating the exponentials (in Equation (10)) and reflecting the faster relaxation with increasing coupling strength. The question as to which other systems might give rise to a residual GC thus leads back to the linear scaling of the diffusion coefficient D (2) 22 (µ). As the derivation in Weidlich's original paper [20] shows, this is a consequence of the ansatz of coupling groups via a coupling parameter that enters the flipping rates (Equation (5)) additively and linearly. Therefore, we conclude that a residual GC will not be restricted to the specific details of Weidlich's model but be a quite generic aspect of a stochastic dynamics described by a similar master equation.
The observed non-monotonic behavior of GC with coupling strength does not rely on non-linear dynamics or multiplicative noise. A peak of the information flow was pointed out and a finite value of the information flow, at large couplings, was also reported in [29]. However, it is interesting to consider how our insights can be transferred to a general diffusion process. Our conclusions drawn from the minimalistic system (see Equation (14) with Equation (15)), i.e., a linear dynamics with additive noise, can be carried over to the class of nonlinear diffusion processes with weakly multiplicative noise in the framework of a local linearization generating a GC map in state space. Subsequent averaging with respect to the invariant measure on the attractor leads to the global GC of the system, a concept that we put forward in a recent publication [26]. In this way, the global GC vs. coupling relation will be computable and the resultant curve will depend on the details of the considered system. The question whether a non-monotonic GC vs. coupling relation can result for a system with diffusion constants that do not vary with coupling strength remains open and will be dealt with in a forthcoming publication.
Finally, one may ask whether the non-monotonic GC curve plays any role in natural systems. We speculate that coupling enhanced predictability might be useful for the performance of cognitive functions and that certain brain regions are tuned to optimal synaptic coupling. In this context, a scale up of effects through coupling large neuronal ensembles can also be imagined.

Summary
We analytically investigated the non-monotonic behavior of Granger causality, defined on a socio-dynamic model of leaders and followers, with respect to coupling strength. Our derivations finally substantiate frequent observations of this phenomenon for various directionality indices and agree with the common sense that synchronization impedes a reconstruction of causal relationships. To make this point, it was necessary to extend Granger causality of Barnett's minimal model to a general vector-autoregressive process of order one. Fortunately, we were able to demonstrate that a tight coupling in the master equation can map not only to the dynamics but also to the noise and therefore does not automatically imply synchronization and a vanishing Granger causality.