Information Theory Analysis of Cascading Process in a Synthetic Model of Fluid Turbulence

Transfer entropy has proven helpful in detecting the direction of dynamical driving in the interaction of two processes, X and Y. In this paper, we present a different normalization for the transfer entropy, which is better able to detect the direction of information transfer. This new normalized transfer entropy is applied to the detection of the direction of energy flux transfer in a synthetic model of fluid turbulence, namely the Gledzer–Ohkitani–Yamada shell model. This is a well-known model that reproduces fully developed turbulence in Fourier space, which is characterized by an energy cascade towards the small scales (large wavenumbers k), so that applying the information-theory analysis to its output tests the reliability of the analysis tool rather than exploring the physics of the model. As a result, the presence of a direct cascade along the scales in the shell model and the locality of the interactions in the space of wavenumbers come out as expected, indicating the validity of this data analysis tool. In this context, a normalized version of transfer entropy, able to account for the difference in the intrinsic randomness of the interacting processes, appears to perform better, avoiding the wrong conclusions to which the “traditional” transfer entropy would lead.


Introduction
This paper is about the use of quantities, referred to as information dynamical quantities (IDQ), derived from the Shannon information [1] to determine cross-predictability relationships in the study of a dynamical system. By "cross-predictability" we mean the possibility of predicting the (near) future behavior of a process, X, by observing the present behavior of a process, Y, likely to be interacting with X. Observing both X and Y is of course better than observing only X, as far as predicting X is concerned: the interesting point is to compare how much the predictability of X is increased given Y with how much the predictability of Y is increased given X. To our understanding, this cross-predictability analysis (CPA) gives an idea of the direction of dynamical driving between Y and X. In particular, the data analysis technique presented here is tested on a synthetic system completely known by construction. The system at hand is the Gledzer-Ohkitani-Yamada (GOY) shell model, describing the evolution of turbulence in a viscous incompressible fluid. In this model, the interaction of the Fourier components takes place locally in the space of wavenumbers, and due to dissipation growing with k, a net flux of energy flows from the larger to the smaller scales (direct cascade). A dynamical "driving" of the small scales by the large ones is then expected, which is verified here through simulations: mutual information and transfer entropy analysis are applied to the synthetic time series of the interacting Fourier amplitudes.
The purpose of applying a certain data analysis technique to a completely known model is to investigate the potential of the analysis tool in retrieving the expected information, preparing it for future applications to real systems. The choice of the GOY model as a test-bed for the IDQ-based CPA is due both to its high complexity, which makes the test rather solid with respect to the possible intricacies expected in natural systems, and to its popularity in the scientific community, due to how faithfully it simulates real features of turbulence.
In order to focus on how IDQ-based CPA tools are applied in the study of coupled dynamical processes, let us consider two processes, X and Y, whose proxies are two physical variables, x and y, evolving with time, and let us assume that the only things one measures are the values, $x(t)$ and $y(t)$, as time series. In general, one may suppose the existence of a stochastic dynamical system (SDS) governing the interaction between X and Y, expressed mathematically as:

$$\dot{x} = f(x, y, t), \qquad \dot{y} = g(x, y, t), \tag{1}$$

where the terms, f and g, contain stochastic forces rendering the dynamics of x and y probabilistic [2]. Actually, the GOY model studied here is defined as deterministic, but the procedures discussed are perfectly applicable, in principle, to any closed system in the form of (1). The sense of applying probabilistic techniques to deterministic processes is that such processes may be so complicated and rich that a probabilistic picture is often preferable, not to mention the school of thought according to which physical chaos is stochastic, even if formally deterministic [3,4]. Indeed, since real-world measurements always have finite precision and most real-world systems are highly unstable (according to the definition of Prigogine and his co-workers), deterministic trajectories turn out to be unrealistic, and hence of little use in describing reality.
Through the study of the IDQs obtained from $x(t)$ and $y(t)$, one may, for example, hope to deduce whether the dependence of $\dot{x}$ on y is "stronger" than the dependence of $\dot{y}$ on x, and hence whether Y is driving X.
The IDQs discussed here have been introduced and developed over some decades. After Shannon's work [1], where the information content of a random process was defined, Kullback used it to make comparisons between different probability distributions [5], which soon led to the definition of mutual information (MI) as a way to quantify how much two processes deviate from statistical independence. The application of MI to time series analysis appears natural: Kantz and Schreiber defined and implemented a set of tools to deduce dynamic properties from observed time series (see [6] and the references therein), including time-delayed mutual information.
The tools of Kantz and Schreiber were augmented in [7] with the introduction of the so-called transfer entropy (TE): information was no longer describing the underdetermination of the system observed, but rather the interaction between two processes, studied in terms of how much information is gained on one process by observing the other, i.e., in terms of information transfer. An early review of the aforementioned IDQs can be found in [8].
TE was soon adopted as a time series analysis tool in many complex system fields, such as space physics [9,10] and industrial chemistry [11], even though Kaiser and Schreiber developed a criticism and presented many caveats to the extension of the TE to continuous variables [12]. Since then, however, many authors have been using TE to detect causality, for example, in stock markets [13], symbolic dynamics [14], biology and genetics [15] and meteorology [16,17]. In the field of neuroscience, TE has been applied broadly, due to the intrinsic intricacy of the matter [18], and recently extended to multi-variate processes [19].
A very important issue is the physical meaning of the IDQs described above. Indeed, while the concept of Shannon entropy is rather clear and has been related to the thermodynamical entropy in classical works, such as [20], mutual information and transfer entropy have not yet been given a clear significance relevant for statistical mechanics. The relationship between TE and the mathematical structure of the system (1) has been investigated in [16], while a more exhaustive study on the application of these information theoretical tools to systems with local dynamics is presented in [21] and the references therein. This "physical sense" of the IDQs will be the subject of our future studies.
The paper is organized as follows.
A short review of the IDQs is given in Section 2. Then, the use of MI and TE to discriminate the "driver" and the "driven" process is criticized, and new normalized quantities are introduced, more suitable for analyzing the cross-predictability in dynamical interactions of different "intrinsic" randomness. Normalizing the information theoretical quantities, modifying them with respect to their initial definitions, is not new: in [22], the transfer entropy is modified so as to include some basic null hypothesis in its own definition; in [23], the role of information compression in the definitions of IDQs is stressed, a role that will emerge in the present paper as well.
The innovative feature of the IDQs described here is the introduction of a variable delay, $\tau$: the IDQs peak at values of $\tau$ that estimate the characteristic time scale(s) of the interaction, which may be a very important point in predictability matters [24]. The problem of inferring interaction delays via transfer entropy has also been given a rigorous treatment in [25], where ideas explored in [24] are discussed with mathematical rigor.
In Section 3, the transfer entropy analysis (TEA) is applied to synthetic time series obtained from the GOY model of turbulence, both in the form already described, e.g., in [10,24], and in the new normalized version, defined in Section 2. The advantages of using the new normalized IDQs are discussed, and conclusions are drawn in Section 4.

Normalized Mutual Information and Transfer Entropy
In all our reasoning, we will use four time series: those representing two processes, $x(t)$ and $y(t)$, and those series themselves time-shifted forward by a certain amount of time, $\tau$. The convenient notation adopted reads:

$$x := x(t), \quad y := y(t), \quad x_\tau := x(t+\tau), \quad y_\tau := y(t+\tau) \tag{2}$$

(in Equation (2) and everywhere, ":=" means "equal by definition"). The quantity, $p_t(x, y)$, is the joint probability of having a certain value of x and y at time t; regarding notation, the convention:

$$p_t(x, y) := p(x(t), y(t))$$

is understood. Shannon entropy is defined for a stochastic process, A, represented by the variable, a, as:

$$I_t(A) := -\sum_a p_t(a) \log p_t(a), \tag{4}$$

quantifying the uncertainty on A before measuring a at time t. Since, in practice, discretized continuous variables are often dealt with, all the distributions, $p_t(a)$, as in Equation (4), are then probability mass functions (pmfs) rather than probability density functions (pdfs). For two interacting processes, X and Y, it is worth defining the conditional Shannon information entropy of X given Y:

$$I_t(X|Y) := -\sum_{x,y} p_t(x, y) \log p_t(x|y).$$

The instantaneous MI shared by X and Y is defined as:

$$M_t(X, Y) := \sum_{x,y} p_t(x, y) \log \frac{p_t(x, y)}{p_t(x)\, p_t(y)};$$

positive $M_t(X, Y)$ indicates that X and Y interact. The link between factorization of probabilities and non-interaction has an important dynamical explanation in stochastic system theory: when the dynamics in Equation (1) is reinterpreted in terms of probabilistic path integrals [26], the factorization of probabilities expresses the absence of interaction terms in stochastic Lagrangians. Indeed, (stochastic) Lagrangians appear in the exponent of transition probabilities, and their non-separable addenda, representing interactions among sub-systems, are the terms preventing probabilities from being factorizable. About $M_t(X, Y)$, one should finally mention that it is symmetric:

$$M_t(X, Y) = M_t(Y, X).$$

There may be reasons to use the MI instead of, say, the cross-correlation between X and Y: the commonly used cross-correlation encodes only information about the second-order moments, while MI uses all the information contained in the probability distributions. Hence, it is more suitable for studying non-linear dependencies [27,28], expected to show up at higher-order moments.
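The definitions above can be illustrated with a minimal histogram-based sketch. It assumes stationary series (so that time statistics stand in for $p_t$), a base-2 logarithm (the text does not fix the base) and an arbitrary number of equal-width bins; the function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a pmf stored in an array of any shape."""
    p = p[p > 0]                      # empty bins contribute nothing
    return -np.sum(p * np.log2(p))

def joint_pmf(x, y, bins=16):
    """Joint pmf of two series estimated from a 2D histogram (bins is an assumption)."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    return counts / counts.sum()

def conditional_entropy(x, y, bins=16):
    """I(X | Y) = I(X, Y) - I(Y)."""
    pxy = joint_pmf(x, y, bins)
    return entropy(pxy) - entropy(pxy.sum(axis=0))

def mutual_information(x, y, bins=16):
    """M(X, Y) = I(X) + I(Y) - I(X, Y); zero when the joint pmf factorizes."""
    pxy = joint_pmf(x, y, bins)
    return entropy(pxy.sum(axis=1)) + entropy(pxy.sum(axis=0)) - entropy(pxy)

# Toy data: y depends on x, z does not.
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = x + 0.5 * rng.normal(size=5000)
z = rng.normal(size=5000)
print("M(X, Y) =", mutual_information(x, y))
print("M(X, Z) =", mutual_information(x, z))
```

With finite data the plug-in estimate of $M(X, Z)$ is small but not exactly zero, which is one reason a significance threshold is needed in practice.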
In the context of information theory (IT), we state that a process, Y, drives a process, X, between t and $t + \tau$ (with $\tau > 0$) if, observing y at the time t, we are less ignorant of what x is going to be at the time $t + \tau$ than we are of y at the time $t + \tau$ when observing x at the time t. The delayed mutual information (DMI):

$$M_{Y \to X}(\tau; t) := \sum_{x_\tau, y} p_t(x_\tau, y) \log \frac{p_t(x_\tau, y)}{p_t(x_\tau)\, p_t(y)}$$

turns out to be very useful for this purpose. DMI is clearly a quantity with which cross-predictability is investigated.
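As a rough illustration of how the DMI can be scanned over the delay, the sketch below estimates it from a 2D histogram of $x(t+\tau)$ against $y(t)$. The bin count, the base-2 logarithm, the toy data and the function name are assumptions made for the example only.

```python
import numpy as np

def delayed_mutual_information(x, y, lag, bins=16):
    """Histogram estimate of M_{Y->X}(lag): MI between x(t + lag) and y(t)."""
    x_future, y_now = x[lag:], y[:len(y) - lag]
    counts, _, _ = np.histogram2d(x_future, y_now, bins=bins)
    p = counts / counts.sum()
    px = p.sum(axis=1, keepdims=True)     # marginal of x(t + lag), shape (bins, 1)
    py = p.sum(axis=0, keepdims=True)     # marginal of y(t), shape (1, bins)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / (px @ py)[mask]))

# Toy data: x follows y with a delay of 3 samples.
rng = np.random.default_rng(0)
y = rng.normal(size=20000)
x = np.roll(y, 3) + 0.5 * rng.normal(size=20000)

lags = np.arange(10)
dmi_yx = np.array([delayed_mutual_information(x, y, lag) for lag in lags])
dmi_xy = np.array([delayed_mutual_information(y, x, lag) for lag in lags])
print("lag of maximum M_{Y->X}:", lags[np.argmax(dmi_yx)])
```

In this toy case the peak of $M_{Y \to X}$ along the lag axis recovers the built-in delay, while $M_{X \to Y}$ stays near the estimator's bias level.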
In [9], a generalization of DMI is presented and referred to as transfer entropy (TE), by adapting the quantity originally introduced by Schreiber in [7] to dynamical systems, such as Equation (1), and to time delays $\tau$ that may be varied, in order to test the interaction at different time scales:

$$T_{Y \to X}(\tau; t) := \sum_{x_\tau, x, y} p_t(x_\tau, x, y) \log \frac{p_t(x_\tau | x, y)}{p_t(x_\tau | x)}.$$

In practice, the TE provides the amount of knowledge added to X at time $t + \tau$, knowing $x(t)$, by the observation of $y(t)$.
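A minimal sketch of a delayed TE estimator follows, using the equivalent entropy form $T_{Y \to X} = I(X_\tau | X) - I(X_\tau | X, Y)$ and a 3D histogram. The number of bins, the base-2 logarithm and the function names are assumptions; the estimator used by the authors may differ.

```python
import numpy as np

def _entropy(counts):
    """Shannon entropy in bits of the pmf obtained by normalizing a count array."""
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def transfer_entropy(x, y, lag, bins=8):
    """Histogram estimate of T_{Y->X}(lag) = I(X_tau | X) - I(X_tau | X, Y)."""
    sample = np.column_stack([x[lag:], x[:len(x) - lag], y[:len(y) - lag]])
    joint, _ = np.histogramdd(sample, bins=bins)   # axes: (x_tau, x, y)
    h_xtau_given_x = _entropy(joint.sum(axis=2)) - _entropy(joint.sum(axis=(0, 2)))
    h_xtau_given_xy = _entropy(joint) - _entropy(joint.sum(axis=0))
    return h_xtau_given_x - h_xtau_given_xy

# Toy data: y drives x with a delay of 3 samples.
rng = np.random.default_rng(0)
y = rng.normal(size=20000)
x = np.roll(y, 3) + 0.5 * rng.normal(size=20000)
print("T_{Y->X}(3) =", transfer_entropy(x, y, 3))
print("T_{X->Y}(3) =", transfer_entropy(y, x, 3))
```

The plug-in estimate is never negative and carries a positive bias that grows with the number of occupied cells, so a coarse binning is used here for the 3D histogram.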
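The easiest way to compare the two directions of cross-predictability is of course that of taking the difference between the two:

$$\Delta M_{Y \to X}(\tau; t) := M_{Y \to X}(\tau; t) - M_{X \to Y}(\tau; t), \qquad \Delta T_{Y \to X}(\tau; t) := T_{Y \to X}(\tau; t) - T_{X \to Y}(\tau; t), \tag{9}$$

as done in [9,10,24]. The direction of prevailing cross-predictability is stated to be that of information transfer. Some comments on the quantities in Equation (9) are necessary.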
Consider taking the difference between $T_{Y \to X}(\tau; t)$ and $T_{X \to Y}(\tau; t)$ in order to understand which is the prevalent direction of information transfer: if one of the two processes were inherently noisier than the other, then the comparison between X and Y via such differences would be uneven. Since cross-predictability via information transfer is about the efficiency of driving, the quantities, $M_{Y \to X}(\tau; t)$ and $T_{Y \to X}(\tau; t)$, must be compared to the uncertainty induced on the "investigated system" by everything else acting on it and rendering its motion unpredictable, in particular the noise of its own dynamics (here, we are not referring only to the noisy forces on it, but also, and mainly, to the internal instabilities resulting in randomness). When working with $M_{Y \to X}(\tau; t)$, this "uncertainty" is quantified by $I_t(X_\tau)$, while when working with $T_{Y \to X}(\tau; t)$, the "uncertainty-of-reference" must be $I_t(X_\tau | X)$. One can then define a normalized delayed mutual information (NDMI):

$$R_{Y \to X}(\tau; t) := \frac{M_{Y \to X}(\tau; t)}{I_t(X_\tau)}$$

and a normalized transfer entropy (NTE):

$$K_{Y \to X}(\tau; t) := \frac{T_{Y \to X}(\tau; t)}{I_t(X_\tau | X)},$$

or equally:

$$K_{Y \to X}(\tau; t) = 1 - \frac{I_t(X_\tau | X, Y)}{I_t(X_\tau | X)}.$$

These new quantities, $R_{Y \to X}(\tau; t)$ and $K_{Y \to X}(\tau; t)$, give a measure of how much the presence of an interaction augments the predictability of the evolution, i.e., they quantify cross-predictability.
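The normalization can be sketched in a few lines, computing $K_{Y \to X} = 1 - I(X_\tau | X, Y) / I(X_\tau | X)$ from the same 3D histogram that yields the TE. The binning, the base-2 logarithm, the toy data and the function names are illustrative assumptions.

```python
import numpy as np

def _h(counts):
    """Shannon entropy in bits of the pmf obtained by normalizing a count array."""
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def te_and_nte(x, y, lag, bins=8):
    """Return (T_{Y->X}, K_{Y->X}) at the given lag from histogram estimates."""
    sample = np.column_stack([x[lag:], x[:len(x) - lag], y[:len(y) - lag]])
    joint, _ = np.histogramdd(sample, bins=bins)            # axes: (x_tau, x, y)
    h_ref = _h(joint.sum(axis=2)) - _h(joint.sum(axis=(0, 2)))   # I(X_tau | X)
    h_res = _h(joint) - _h(joint.sum(axis=0))                    # I(X_tau | X, Y)
    # h_ref vanishes only for a trivially predictable x; not handled in this sketch.
    return h_ref - h_res, 1.0 - h_res / h_ref

def delta_k(x, y, lag, bins=8):
    """Delta K_{Y->X} = K_{Y->X} - K_{X->Y}; positive suggests that Y drives X."""
    return te_and_nte(x, y, lag, bins)[1] - te_and_nte(y, x, lag, bins)[1]

# Toy data: y drives x with a delay of 2 samples.
rng = np.random.default_rng(0)
y = rng.normal(size=20000)
x = np.roll(y, 2) + 0.5 * rng.normal(size=20000)
print("Delta K_{Y->X}(2) =", delta_k(x, y, 2))
```

Because $K$ is a fraction of the reference uncertainty, it lies between 0 and 1 for the plug-in estimate, which makes the two directions directly comparable.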
The positivity of $\Delta R_{Y \to X}(\tau; t)$ or $\Delta K_{Y \to X}(\tau; t)$ is a better criterion than the positivity of $\Delta M_{Y \to X}(\tau; t)$ or $\Delta T_{Y \to X}(\tau; t)$ for discerning the driving direction, since the quantities involved in $R_{Y \to X}(\tau; t)$ and $K_{Y \to X}(\tau; t)$ single out the intrinsic randomness of a process and try to remove it through the normalization by $I_t(X_\tau)$ and $I_t(X_\tau | X)$, respectively. Despite this, $\Delta M_{Y \to X}(\tau; t)$ and $\Delta T_{Y \to X}(\tau; t)$ can still be used for that analysis when the degrees of stochasticity of X and Y are comparable. Consider, for instance, that at $t + \tau$, the Shannon entropies of X and Y are both equal to a quantity, $I_0$, and the conditioned ones are both equal to $J_0$; clearly, one has:

$$\Delta R_{Y \to X}(\tau; t) = \frac{\Delta M_{Y \to X}(\tau; t)}{I_0}, \qquad \Delta K_{Y \to X}(\tau; t) = \frac{\Delta T_{Y \to X}(\tau; t)}{J_0},$$

and the quantity, $\Delta R_{Y \to X}(\tau; t)$, is proportional to $\Delta M_{Y \to X}(\tau; t)$ through a number, $I_0^{-1}$, so they encode the same knowledge. The same holds for $\Delta K_{Y \to X}(\tau; t)$ and $\Delta T_{Y \to X}(\tau; t)$. This is why we claim that the diagnoses in [9,10,24] were essentially correct, even if we will try to show here that the new normalized quantities work better in general.
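A short worked example may help to see how the two criteria can disagree when the reference uncertainties differ. The numbers below are purely hypothetical (in bits) and are not taken from the simulations; they only respect the constraint that each TE cannot exceed its own reference entropy.

```latex
% Hypothetical values, chosen only to illustrate the sign flip
T_{Y \to X} = 0.30, \quad I_t(X_\tau | X) = 1.0
  \;\Rightarrow\; K_{Y \to X} = 0.30 / 1.0 = 0.30
T_{X \to Y} = 0.40, \quad I_t(Y_\tau | Y) = 4.0
  \;\Rightarrow\; K_{X \to Y} = 0.40 / 4.0 = 0.10

\Delta T_{Y \to X} = 0.30 - 0.40 = -0.10 < 0
\Delta K_{Y \to X} = 0.30 - 0.10 = +0.20 > 0
```

Here the unnormalized difference would point to X driving Y, only because Y is intrinsically much less predictable, while the normalized difference points to Y removing a larger fraction of the uncertainty on X.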
Before applying the calculation of the quantities, $\Delta T_{Y \to X}(\tau; t)$ and $\Delta K_{Y \to X}(\tau; t)$, to the turbulence model considered in Section 3, it is worth underlining again the dependence of all these IDQs on the delay, $\tau$: the peaks of the IDQs on the $\tau$ axis indicate those delays after which the process, X, shares more information with the process, Y, i.e., the characteristic time scales of their cross-predictability, due to their interaction.

Turbulent Cascades and Information Theory
This section considers an example in which we know what must be expected, and we apply our analysis tools to it in order to check and refine them. In this case, the application of the normalized quantities instead of the traditional ones reviewed in Section 2 is investigated in some detail. In the chosen example, the IDQs are used to recognize the existence of cascades in a synthetic model of fluid turbulence [29,30].
Some theoretical considerations are worth making in advance. The quantities described above are defined using instantaneous pmfs, i.e., pmfs that exist at time t. As a result, the quantities may vary with time. Unfortunately, it is difficult to recover the statistics associated with such pmfs, except in artificial cases, when running ensemble simulations on a computer. When examining real-world systems, for example in geophysics, this luxury is not available. Hence, in many cases, one can only calculate time statistics rather than ensemble statistics; since any analysis in terms of time statistics is only valid when the underlying system is sufficiently ergodic, what one can do is restrict the analysis to locally ergodic cases, picking data segments in which ergodicity apparently holds. In the following experiment, pmfs are calculated by collecting histograms from the time series, with appropriate choices of bin-width, e.g., see [31].
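As a rough sketch of this step, a pmf can be estimated from a single time series by time statistics, with the bin width set by an automatic rule. The Freedman-Diaconis rule used below is an assumption made for illustration; the rule actually adopted in [31] is not specified here, and the function name is hypothetical.

```python
import numpy as np

def pmf_from_series(x, rule="fd"):
    """Estimate a pmf from a time series by histogramming (time statistics).

    rule is passed to NumPy's bin-edge selection; "fd" is the
    Freedman-Diaconis bin-width rule, chosen here only as an example.
    """
    edges = np.histogram_bin_edges(x, bins=rule)
    counts, _ = np.histogram(x, bins=edges)
    return counts / counts.sum(), edges

# Toy stationary segment standing in for a locally ergodic data window.
rng = np.random.default_rng(0)
segment = rng.normal(size=5000)
pmf, edges = pmf_from_series(segment)
print("number of bins:", pmf.size, " bin width:", edges[1] - edges[0])
```

The resulting pmf depends on the bin width, so entropy-based quantities computed from different series are comparable only if a consistent binning policy is used.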
The system at hand is described in [30,32] and the references therein and is referred to as the Gledzer-Ohkitani-Yamada shell model (the GOY model, for short): this is a dynamical system model, which can be essentially considered as a discretization, in Fourier space, of the fluid motion problem governed by the Navier-Stokes equation. The GOY model was one of the first available shell models for turbulence. Other, more advanced models exist (see e.g., [30]). However, we will limit our discussion to the GOY model, because the more refined ones do not differ substantially in the energy cascading mechanism in the inertial domain.
The physical variable that evolves is the velocity of the fluid, which is assigned as a value, $V_h$, at the h-th site of a 1D lattice; each of these $V_h$ evolves with time as $V_h = V_h(t)$. The dependence upon the index, h, in $V_h$ is the space-dependence of the velocity field. With respect to this space dependence, a Fourier transform can be performed: out of the real functions, $V_h(t)$, a set of complex functions $u_n = u_n(t)$ will be constructed, where $u_n$ is the n-th Fourier amplitude of the velocity fluctuation field at the n-th shell, characterized by a wavenumber, $k_n$. The n-th wavenumber $k_n$ is given by:

$$k_n = k_0 q^n,$$

$k_0$ being the fundamental, lowest wavenumber and q a magnifying coefficient relating the n-th to the (n + 1)-th wavenumber as $k_{n+1} = q k_n$. In the case examined, the coefficient, q, is two, which roughly means that the cascade proceeds by halving the size of eddies from $u_n$ to $u_{n+1}$. Each Fourier mode, $u_n(t)$, is a physical process in its own right, and all these physical processes interact. The velocity field, $V_h$, is supposed to be governed by the usual Navier-Stokes equation, whose non-linearities yield a coupling between different modes [29]. The system is not isolated: an external force stirs the medium. The force is assigned by giving its Fourier modes, and here, it is supposed to have only the n = 4 mode different from zero. The complex Fourier amplitude, $f_n$, of the stirring external force is hence given by $f_n = \delta_{4,n} (1 + i) f$, f being a constant. The system of ordinary differential equations governing the $u_n$s according to the GOY model turns out to be written as:

$$\dot{u}_n = i \left( k_n u_{n+1} u_{n+2} - \tfrac{1}{2} k_{n-1} u_{n+1} u_{n-1} - \tfrac{1}{2} k_{n-2} u_{n-1} u_{n-2} \right)^* - \nu k_n^2 u_n + f_n, \tag{14}$$

where $n = 1, 2, \dots$ and $z^*$ is the complex conjugate of z. Each mode, $u_n$, is coupled to $u_{n+1}$, $u_{n+2}$, $u_{n-1}$ and $u_{n-2}$ in a non-linear way, and in addition, it possesses a linear coupling to itself via the dissipative term, $-\nu k_n^2 u_n$. There is also a coupling to the environment through $f_n$, which actually takes place only for the fourth mode. In the present simulations, the values of $f = 5 \times 10^{-3} (1 + i)$ and $\nu = 10^{-7}$ were used. The integration procedure is the one due to Adams and Bashforth, described in [32], with an integration step of $10^{-4}$.
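To make the structure of the shell equations concrete, here is a minimal integration sketch. It assumes the standard GOY coefficients for 3D turbulence (1, -1/2, -1/2 on $k_n$, $k_{n-1}$, $k_{n-2}$), zero boundary shells, 22 shells and $k_0 = 2^{-4}$; none of these choices is stated in the text. It also uses a plain second-order Adams-Bashforth step started with one Euler step, which is not necessarily the scheme of [32]. The forcing amplitude on shell 4 is taken as $5 \times 10^{-3}(1 + i)$.

```python
import numpy as np

def goy_nonlinear(u, k):
    """Non-linear term i (k_n u_{n+1} u_{n+2} - k_{n-1} u_{n+1} u_{n-1} / 2
    - k_{n-2} u_{n-1} u_{n-2} / 2)^*, with shells outside 1..N set to zero."""
    q = k[1] / k[0]
    pad = np.concatenate((np.zeros(2), u, np.zeros(2)))
    um2, um1, up1, up2 = pad[:-4], pad[1:-3], pad[3:-1], pad[4:]
    s = k * up1 * up2 - 0.5 * (k / q) * up1 * um1 - 0.5 * (k / q**2) * um1 * um2
    return 1j * np.conj(s)

def goy_rhs(u, k, nu, f):
    """Right-hand side: non-linear coupling, viscous damping and forcing."""
    return goy_nonlinear(u, k) - nu * k**2 * u + f

def integrate(u0, k, nu, f, dt, steps):
    """Second-order Adams-Bashforth integration, started with one Euler step."""
    u = u0.astype(complex)
    prev = goy_rhs(u, k, nu, f)
    u = u + dt * prev
    for _ in range(steps - 1):
        cur = goy_rhs(u, k, nu, f)
        u = u + dt * (1.5 * cur - 0.5 * prev)
        prev = cur
    return u

n_shells, k0, q = 22, 2.0 ** -4, 2.0            # assumed values
k = k0 * q ** np.arange(1, n_shells + 1)        # k_n = k_0 q^n, n = 1..N
f = np.zeros(n_shells, dtype=complex)
f[3] = 5e-3 * (1 + 1j)                          # forcing on shell n = 4 only
u0 = 1e-3 * k ** (-1.0 / 3.0) * (1 + 1j)        # arbitrary small initial state
u = integrate(u0, k, nu=1e-7, f=f, dt=1e-4, steps=1000)
print("total energy after 1000 steps:", np.sum(np.abs(u) ** 2))
```

With these coefficients the non-linear term alone conserves the total energy $\sum_n |u_n|^2$, so energy enters only through $f_n$ and leaves only through the viscous term, which is the mechanism behind the net flux towards large wavenumbers. A production run would need far more steps than shown here to pass the transient regime.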
Even if the lattice is 1D, the coefficients in Equations (14) are suitably adapted so that the spectral and statistical properties of the turbulence are those of real 3D fluid dynamics; in this sense, 3D turbulence is really being studied.
The stirring force pumps energy and momentum into the fluid, injecting them at the fourth scale, and the energy and momentum are transferred from $u_4$ to all the other modes via the non-linear couplings. To each $k_n$ there corresponds a scale and an eddy turnover time, $\tau_n = \frac{2\pi}{k_n |u_n|}$. The characteristic times of the system will be naturally assigned in terms of the zeroth mode eddy turnover time, $\tau_0$, or of other $\tau_n$s. After a certain transitory regime, the system reaches an "equilibrium" from the time-average point of view, in which the Fourier spectrum shows the classical energy cascade of turbulence, as predicted by Kolmogorov [29] (see Figure 1).
Figure 1. The time-average instantaneous power spectral density (PSD) of the velocity field of the Gledzer-Ohkitani-Yamada (GOY) model after a certain transitory regime. The modes chosen for the transfer entropy analysis (TEA) are indicated explicitly, together with the n = 4 scale at which the external force, f, is acting. The quantity on the ordinate is $PSD(k_n)$, as defined in Equation (15) below.

The quantity represented along the ordinate axis of Figure 1 is the average power spectral density (PSD), defined as follows:

$$PSD(k_n) := \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} \frac{|u_n(t)|^2}{k_n} \, dt, \tag{15}$$

where $[t_1, t_2]$ is a time-interval taken after a sufficiently long time, such that the system (14) has already reached its stationary regime (i.e., after some initial transient regime with heterogeneous fluctuations in which the turbulence is not fully developed yet). The duration of the transient regime is evaluated by visual inspection of the plot of $|u_n|^2 / k_n$ versus $k_n$ as the simulation time runs, picking the moment after which this plot does not change any more. In terms of the quantities involved in Equation (15), this means $t_1$ is "many times" the largest eddy turnover time.
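The time-averaged PSD and the eddy turnover times can be computed from stored shell amplitudes along the following lines. The sketch assumes uniformly sampled output, so that the time integral reduces to a sample mean, and it uses a typical (e.g., root-mean-square) amplitude in the turnover time; both choices, the function names and the synthetic test data with a Kolmogorov-like scaling $|u_n| \propto k_n^{-1/3}$ are assumptions for illustration.

```python
import numpy as np

def time_averaged_psd(u_t, k, t, t1, t2):
    """Average of |u_n(t)|^2 / k_n over the window [t1, t2].

    u_t has shape (n_times, n_shells); uniform sampling in t is assumed,
    so the time integral divided by (t2 - t1) becomes a sample mean.
    """
    window = (t >= t1) & (t <= t2)
    return np.mean(np.abs(u_t[window]) ** 2, axis=0) / k

def eddy_turnover_time(u_typical, k):
    """tau_n = 2 pi / (k_n |u_n|), with |u_n| a typical amplitude of shell n."""
    return 2.0 * np.pi / (k * u_typical)

# Synthetic amplitudes with |u_n| = k_n^(-1/3), constant in time.
k = 2.0 ** np.arange(1, 11)
t = np.linspace(0.0, 10.0, 101)
u_t = np.tile(k ** (-1.0 / 3.0), (t.size, 1)).astype(complex)

psd = time_averaged_psd(u_t, k, t, t1=5.0, t2=10.0)
slope = np.polyfit(np.log(k), np.log(psd), 1)[0]
tau = eddy_turnover_time(k ** (-1.0 / 3.0), k)
print("spectral slope:", slope)
```

For this synthetic input the fitted slope is close to -5/3 and the turnover times shorten with increasing wavenumber, which is the behavior expected in the inertial range.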
The energetic, and informatic, behavior of the GOY system is critically influenced by the form of the dissipative term, $-\nu k_n^2 u_n$, in Equation (14): the presence of the factor, $k_n^2$, implies that the energy loss is more and more important for higher and higher $|k_n|$, i.e., smaller and smaller scales. The energy flows from any mode, $u_n$, towards both the smaller and the larger scales, since $u_n$ is coupled with both. Energy is dissipated at all the scales, but the dissipation efficiency grows with $k_n^2$, so that almost no net energy can really flow back from the small scales to the large ones. In terms of cross-predictability, a transfer of information in both directions is expected, but the direct cascade (i.e., from small to large $|k_n|$s) should be prevalent. Not only this: since the ordinary differential equations (14) indicate a k-local interaction (up to the second-adjacent n, i.e., n ± 1 and n ± 2 coupling with n), one also expects the coupling between $u_m$ and $u_n$ with $n \gg m$ or $n \ll m$ to be almost vanishing, and the characteristic interaction times to be shorter for closer values of m and n. Our program is to check all these expectations by calculating the transfer entropy and its normalized version for the synthetic data obtained by running the GOY model (14). In particular, we would like to detect the direction of information transfer along the inertial domain between shells not directly coupled in the evolution Equation (14).
To reach the above target, to investigate the application of TEA to the GOY model and to illustrate the advantages of using the new normalized quantities discussed in Section 2, we selected three non-consecutive shells. In particular, the choice:

$$\#1 := u_9, \qquad \#2 := u_{13}, \qquad \#3 := u_{17}$$

is made. The real parts of $u_9$, $u_{13}$ and $u_{17}$ are reported in Figure 2 as functions of time for a short time interval of $15\tau_4$. For each of the selected shells, we considered very long time series of the corresponding energy $e_n = |u_n|^2$. The typical length of the considered time series is of many (1,000) eddy turnover times of the injection scale. The quantities, $\Delta T_{1 \to 2}$ and $\Delta T_{1 \to 3}$, and $\Delta K_{1 \to 2}$ and $\Delta K_{1 \to 3}$, can be calculated as functions of the delay, $\tau$. The differences $\Delta T_{i \to j}$ or $\Delta K_{i \to j}$ are calculated as prescribed in Section 2 using the time series, $e_i(t)$ and $e_j(t)$, in the place of $y(t)$ and $x(t)$, respectively. The calculations of the quantities, $T_{1 \to 2}$, $T_{2 \to 1}$, $T_{1 \to 3}$, $T_{3 \to 1}$, and of the corresponding normalized quantities, i.e., $K_{1 \to 2}$, $K_{2 \to 1}$, $K_{1 \to 3}$ and $K_{3 \to 1}$, give the results portrayed in Figure 3, where all these quantities are reported synoptically as functions of $\tau$ in units of the eddy turnover time, $\tau_{\#1}$, that pertains to the #1 mode (with n = 9).

Figure 3. The quantities, $T_{1 \to 2}$, $T_{2 \to 1}$, $T_{1 \to 3}$, $T_{3 \to 1}$, $K_{1 \to 2}$, $K_{2 \to 1}$, $K_{1 \to 3}$ and $K_{3 \to 1}$ (see Section 2), calculated for the three modes of the GOY model chosen. In the case of the quantities $K_{a \to b}$, the inset shows the normalization factor. All the quantities are expressed as functions of the delay in units of the eddy turnover time $\tau_{\#1} = \tau_9$. Note that transfer entropy is always positive, indicating one always learns something from observing another mode; transfer entropy decreases as $\tau$ increases, since the distant past is not very helpful for predicting the immediate future (about the positiveness of these quantities, it should be mentioned that this has been tested against surrogate data series, as described in Figure 4 below).
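A surrogate data test of this kind can be sketched as follows. The surrogates here are random shuffles of the source series, which destroy any temporal relation with the target; this choice, the 95% quantile, the small number of surrogates, the binning and all function names are assumptions for illustration, since the procedure actually used is not detailed in this section.

```python
import numpy as np

def _h(counts):
    """Shannon entropy in bits of the pmf obtained by normalizing a count array."""
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def transfer_entropy(x, y, lag, bins=8):
    """Histogram estimate of T_{Y->X}(lag) = I(X_tau | X) - I(X_tau | X, Y)."""
    sample = np.column_stack([x[lag:], x[:len(x) - lag], y[:len(y) - lag]])
    joint, _ = np.histogramdd(sample, bins=bins)
    return (_h(joint.sum(axis=2)) - _h(joint.sum(axis=(0, 2)))) - (
        _h(joint) - _h(joint.sum(axis=0)))

def surrogate_threshold(x, y, lag, n_surrogates=50, quantile=0.95, bins=8, seed=0):
    """Significance level for T_{Y->X}: quantile of the TE over shuffled sources."""
    rng = np.random.default_rng(seed)
    null = [transfer_entropy(x, rng.permutation(y), lag, bins)
            for _ in range(n_surrogates)]
    return np.quantile(null, quantile)

# Toy data: y drives x with a delay of 3 samples.
rng = np.random.default_rng(1)
y = rng.normal(size=5000)
x = np.roll(y, 3) + 0.5 * rng.normal(size=5000)
te = transfer_entropy(x, y, 3)
level = surrogate_threshold(x, y, 3)
print("T_{Y->X}(3) =", te, " significance level =", level)
```

Shuffling removes the autocorrelation of the source as well as the coupling, so for strongly autocorrelated series a surrogate scheme that preserves the spectrum may be preferable.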
The use of non-adjacent shells to calculate the transfer of information is a deliberate choice: the interaction between nearby shells is obvious from Equation (14), while checking the existence of an information transfer cascade from the large down to the small scales requires checking it non-locally in the k-space.
All the plots show clearly that there is a direct cascade for short delays. The first noticeable difference between the transfer entropies and the normalized transfer entropies is that in the #1 ↔ #3 coupling, a puzzling inverse regime appears after about $4\tau_{\#1}$ when the "traditional" transfer entropy is used. Instead, the use of the normalized quantities suggests decoupling after long times (after about $6\tau_{\#1}$). A comparison between the #1 ↔ #2 and #1 ↔ #3 interactions is also interesting: the maximum of the "direct cascade" coupling is reached at less than $0.5\tau_{\#1}$ for both interactions if the TEs are used. However, in the plot of $K_{1 \to 2}$, $K_{2 \to 1}$, $K_{1 \to 3}$ and $K_{3 \to 1}$, some time differences appear; this is clarified when difference quantities are plotted, as in Figure 4. Both analyses diagnose a prevalence of the smaller-onto-larger wavenumber drive for sufficiently small delays: $\Delta T_{1 \to 2}$ indicates a driving of $u_9$ onto $u_{13}$ (Mode #1 onto Mode #2) for $\tau \lesssim 5\tau_{\#1}$, while $\Delta T_{1 \to 3}$ indicates a driving of $u_9$ onto $u_{17}$ (Mode #1 onto Mode #3) for $\tau \lesssim 3.5\tau_{\#1}$. This is expected, due to how the system (14) is constructed. What is less understandable is the fact that for large values of $\tau$, the quantities $\Delta T_{1 \to 2}(\tau)$ and, even more, $\Delta T_{1 \to 3}(\tau)$ become negative. This would suggest that after a long time a (weaker) "inverse cascade" prevails; however, this is not contained in the system (14) in any way and, hence, is either evidence of chaos-driven unpredictability or an erroneous interpretation. For this reason, it is instructive to examine the plots of $\Delta K_{1 \to 2}(\tau)$ and $\Delta K_{1 \to 3}(\tau)$: after roughly $6.5\tau_{\#1}$, the modes appear to become decoupled, since $\Delta K_{1 \to 2} \approx 0$ and $\Delta K_{1 \to 3} \approx 0$, albeit with significant noise. The lack of evidence of an inverse cascade in these plots suggests that the inverse cascade inferred from the old TEA was a misinterpretation.
The misleading response of the $\Delta T$ analysis may be well explained by looking at the insets in the lower plots of Figure 3, where the quantities reported as functions of $\tau$ are the normalization coefficients, $I_t(X_\tau | X)$, indicating the levels of inherent unpredictability of the two $e_n(t)$ compared. In the case of the 1 → 2 comparison, the levels of inherent unpredictability of the two time series, $e_9(t)$ and $e_{13}(t)$, become rather similar for large $\tau$, while the asymptotic levels of inherent unpredictability are very different for $e_9(t)$ and $e_{17}(t)$ (indeed, one should expect that the more m differs from n, the more the levels of inherent unpredictability of $e_m$ and $e_n$ will differ). This means that $\Delta T_{1 \to 2}(\tau)$ and $\Delta K_{1 \to 2}(\tau)$ are expected to give a rather similar diagnosis, while the calculation of $\Delta K_{1 \to 3}(\tau)$ will probably fix any misleading indication of $\Delta T_{1 \to 3}(\tau)$.
Another observation that deserves to be made is about the maxima of $\Delta T_{1 \to 2}(\tau)$, $\Delta T_{1 \to 3}(\tau)$, $\Delta K_{1 \to 2}(\tau)$ and $\Delta K_{1 \to 3}(\tau)$ with respect to $\tau$: these should detect the characteristic interaction time for the interaction, $(e_9, e_{13})$, and for the interaction, $(e_9, e_{17})$. In the plots of $\Delta T_{1 \to 2}(\tau)$ and $\Delta T_{1 \to 3}(\tau)$, [...] of the correct direction of dynamical enslaving is better visible using $K_{Y \to X}$ and $\Delta K_{Y \to X}$, in which the intrinsic randomness of the two processes is taken into account (in particular, the inspection of $\Delta T_{Y \to X}$ indicated the appearance of an unreasonable inverse cascade for large $\tau$, which was ruled out by looking at $\Delta K_{Y \to X}$).
An indication is then obtained that, for the irregular non-linear dynamics at hand, the use of the TEA via $\Delta K_{Y \to X}(\tau; t)$ is promising in order to single out relationships of cross-predictability (transfer of information) between processes.
The systematic application of the TEA via $\Delta K_{Y \to X}$ to models and natural systems will be carried out in the authors' forthcoming works.

Figure 2. Time series plots showing the real part of the processes, $u_9(t)$, $u_{13}(t)$ and $u_{17}(t)$. The time is given in units of the eddy turnover time, $\tau_4$, of the scale forced.

Figure 4. Comparison between TEA via the "traditional" transfer entropies (TEs) (left) and the new renormalized quantities (right), in the case of the modes selected in the GOY model. The significance limit appearing in the plot on the right was obtained by a surrogate data test, through $N = 10^4$ surrogate data realizations, as described in the text.