A Study of the Cross-Scale Causation and Information Flow in a Stormy Model Mid-Latitude Atmosphere

A fundamental problem regarding the storm–jet stream interaction in the extratropical atmosphere is how energy and information are exchanged between scales. While energy transfer has been extensively investigated, the latter has been mostly overlooked, mainly due to a lack of appropriate theory and methodology. Using a recently established rigorous formalism of information flow, this study attempts to examine the problem in the setting of a three-dimensional quasi-geostrophic zonal jet, with storms excited by a set of optimal perturbation modes. We choose for this study a period when the self-sustained oscillation is in quasi-equilibrium, and when the energetics mimic the mid-latitude atmospheric circulation, where available potential energy is cascaded downward to smaller scales and kinetic energy is inversely transferred upward toward larger scales. By inverting a three-dimensional elliptic differential operator, the model is first converted into a low-dimensional dynamical system, where the components correspond to different time scales. The information exchange between the scales is then computed through ensemble prediction. For this particular problem, the resulting cross-scale information flow is mostly from smaller scales to larger scales. That is to say, during this period, this model extratropical atmosphere is dominated by a bottom-up causation, as collective patterns emerge out of independent entities and macroscopic thermodynamic properties evolve from random molecular motions. This study makes a first step toward an important field in understanding the eddy–mean flow interaction in weather and climate phenomena such as atmospheric blocking, storm tracks, and the North Atlantic Oscillation, to name a few.


Introduction
The atmospheric motion is rich in scale. In many cases, the formation of weather and climate patterns can be attributed to the interaction between a few scale ranges, such as that between synoptic eddies and the jet stream, or that among the synoptic eddies, the low-frequency variability, and the mean flow. Examples include the storm track in the Northern Hemisphere (e.g., [1][2][3]), the blocking high (e.g., [4,5]), the sudden stratospheric warming (e.g., [6,7], and references therein), and the North Atlantic Oscillation (e.g., [8][9][10][11][12][13]), to name but a few. This makes multiscale interaction a central issue in dynamic meteorology.
An important problem in multiscale interaction is how energy is transferred across scales; this transfer is closely related to the fundamental processes in atmospheric flows, namely, barotropic instability and baroclinic instability. The extratropical atmosphere is special in that, while on the whole the available potential energy is cascaded toward smaller scales, the kinetic energy is inversely transferred upward to larger scales (e.g., [14,15]). More specifically, there exists a symbiotic relationship between the synoptic-scale and planetary-scale disturbances. In a 3-scale setting, Cai and Mak [1] found that the former are produced and maintained through extracting energy from the zonal flow via Reynolds stress; they then supply energy to the latter through upscale energy transfer, while the latter form regions of enhanced baroclinicity where the former preferentially grow. This kind of energetic cycle has been employed to interpret many weather/climate phenomena. In the case of atmospheric blockings, for example, Holopainen and Fortelius [16] identified an enhanced transfer of eddy kinetic energy (KE) to the mean flow over the storm tracks. Hansen and Chen [17] found that the Realizing that information flow is a real physical notion, Liang and Kleeman [34] put the problem on a rigorous footing, and obtained in a closed form the information flow between the components of a 2D dynamical system. This formalism was soon generalized by Majda and Harlim [35] to a setting with two subspaces. Recently, it was successfully extended by Liang [36] to systems of arbitrary dimensionality. The following is a brief review of the work that pertains to this study.
We begin by stating a principle or an observational fact about causality: If the evolution of an event, say, $X_1$, is independent of another one, $X_2$, then the causality from $X_2$ to $X_1$ is zero.

Since it is the only quantitatively stated fact about causality, all previous empirical/half-empirical causality formalisms have attempted to verify it in applications. Considering its importance, it has been referred to as the principle of nil causality [36]. Recently, Smirnov [37] systematically examined the traditional formalisms, i.e., transfer entropy analysis and/or Granger causality testing, and found that they cannot verify the principle in a wide range of situations; similar conclusions have been drawn by Lizier and Prokopenko [38]. We will see below that, within our framework, this principle turns out to be a proven theorem. Now, consider an n-dimensional continuous-time stochastic system for state variables $\mathbf{x} = (x_1, \ldots, x_n)$,

$$\frac{d\mathbf{x}}{dt} = \mathbf{F}(\mathbf{x}, t) + \mathbf{B}(\mathbf{x}, t)\,\dot{\mathbf{w}}, \qquad (1)$$

where $\mathbf{F} = (F_1, \ldots, F_n)$ may be arbitrary nonlinear functions of $\mathbf{x}$ and $t$, $\dot{\mathbf{w}}$ is a vector of white noise, and $\mathbf{B} = (b_{ij})$ is the matrix of perturbation amplitudes, which may also be any functions of $\mathbf{x}$ and $t$.

Here, we adopt the convention in physics and do not distinguish deterministic and random variables; in probability theory, they are usually distinguished with capital and lower-case symbols. Assume that $\mathbf{F}$ and $\mathbf{B}$ are both differentiable with respect to $\mathbf{x}$ and $t$. We then have the following theorem [36]:

Theorem 1. For the system (1), the rate of information flowing from $x_j$ to $x_i$ (in nats per unit time) is

$$T_{j\to i} = -E\left[\frac{1}{\rho_i}\int_{\mathbb{R}^{n-2}} \frac{\partial (F_i \rho_{\backslash j})}{\partial x_i}\, d\mathbf{x}_{\backslash i \backslash j}\right] + \frac{1}{2} E\left[\frac{1}{\rho_i}\int_{\mathbb{R}^{n-2}} \frac{\partial^2 (g_{ii} \rho_{\backslash j})}{\partial x_i^2}\, d\mathbf{x}_{\backslash i \backslash j}\right], \qquad (2)$$

where $d\mathbf{x}_{\backslash i \backslash j}$ signifies $dx_1 \ldots dx_{i-1}\, dx_{i+1} \ldots dx_{j-1}\, dx_{j+1} \ldots dx_n$, $E$ stands for mathematical expectation, $g_{ii} = \sum_{k=1}^{n} b_{ik} b_{ik}$, $\rho_i = \rho_i(x_i)$ is the marginal probability density function (pdf) of $x_i$, $\rho_{j|i}$ is the pdf of $x_j$ conditioned on $x_i$, and $\rho_{\backslash j} = \int_{\mathbb{R}} \rho(\mathbf{x})\, dx_j$.

If $T_{j\to i} = 0$, then $x_j$ is not causal to $x_i$; otherwise, it is causal, and the absolute value measures the magnitude of the causality from $x_j$ to $x_i$. For discrete-time mappings, the information flow is in a much more complicated form; see [36].

Corollary 1. [39] When n = 2,

$$T_{2\to 1} = -E\left[\frac{1}{\rho_1}\frac{\partial (F_1 \rho_1)}{\partial x_1}\right] + \frac{1}{2} E\left[\frac{1}{\rho_1}\frac{\partial^2 (g_{11} \rho_1)}{\partial x_1^2}\right]. \qquad (3)$$

In the absence of noise, this is precisely the result of [34] based on a heuristic argument. There is a nice property for the above information flow:

Theorem 2. (Principle of nil causality) If in Equation (1) neither $F_1$ nor $g_{11}$ depends on $x_2$, then $T_{2\to 1} = 0$.
Note that this is precisely the principle of nil causality. Remarkably, here it appears as a proven theorem, while the classical ansatz-like formalisms fail to verify it in many problems (e.g., [37]).
In the case with only two time series (no dynamical system is given), we have the following result [31]:

Theorem 3. Given two time series $X_1$ and $X_2$, under the assumption of a linear model with additive noise, the maximum likelihood estimator (mle) of the rate of information flowing from $X_2$ to $X_1$ is

$$\hat{T}_{2\to 1} = \frac{C_{11} C_{12} C_{2,d1} - C_{12}^2 C_{1,d1}}{C_{11}^2 C_{22} - C_{11} C_{12}^2}, \qquad (4)$$

where $C_{ij}$ is the sample covariance between $X_i$ and $X_j$, and $C_{i,dj}$ the sample covariance between $X_i$ and a series derived from $X_j$ using the Euler forward differencing scheme: $\dot{X}_{j,n} = (X_{j,n+k} - X_{j,n})/(k\Delta t)$, with $k \geq 1$ some integer.
Equation (4) is rather concise in form; it only involves the common statistics, i.e., sample covariances. In other words, a combination of some sample covariances will give a quantitative measure of the causality between the time series. This makes causality analysis, which otherwise would be complicated with the classical empirical/half-empirical methods, very easy. Nonetheless, note that Equation (4) cannot replace (3); it is just the mle of the latter. A statistical significance test must be performed before a causal inference is made based on the computed T 2→1 . For details, refer to [31].
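As a rough illustration of how such a covariance-based estimate can be evaluated, the following sketch codes the estimator in the form usually quoted for Theorem 3 (stated in the function comment) and applies it to a synthetic pair of series. The function names, the toy autoregressive model, its coefficients, and the sample size are all assumptions made for this example; no significance test is included.

```python
import numpy as np

def information_flow_mle(x1, x2, dt=1.0, k=1):
    """Estimate the information flow rate from x2 to x1 (nats per unit time).

    Assumed form of the estimator:
        T = (C11*C12*C2d1 - C12**2*C1d1) / (C11**2*C22 - C11*C12**2)
    where Cid1 is the sample covariance between x_i and the Euler
    forward difference of x1.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    dx1 = (x1[k:] - x1[:-k]) / (k * dt)            # Euler forward difference
    C = np.cov(np.vstack([x1[:-k], x2[:-k], dx1]))  # 3 x 3 sample covariances
    C11, C12, C22 = C[0, 0], C[0, 1], C[1, 1]
    C1d1, C2d1 = C[0, 2], C[1, 2]
    return (C11 * C12 * C2d1 - C12**2 * C1d1) / (C11**2 * C22 - C11 * C12**2)

def simulate(n=20000, seed=0):
    """Hypothetical toy model: x2 drives x1, but x1 does not drive x2."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal((n, 2))
    x1, x2 = np.zeros(n), np.zeros(n)
    for i in range(n - 1):
        x1[i + 1] = 0.6 * x1[i] + 0.5 * x2[i] + e[i, 0]
        x2[i + 1] = 0.7 * x2[i] + e[i, 1]
    return x1, x2

x1, x2 = simulate()
T21 = information_flow_mle(x1, x2)   # flow from x2 to x1
T12 = information_flow_mle(x2, x1)   # flow from x1 to x2
print(f"T(2->1) = {T21:.4f}, T(1->2) = {T12:.4f}")
```

In this toy setting the estimate from the driving series to the driven one should be clearly nonzero, while the reverse estimate should be close to zero, which is the one-way pattern the principle of nil causality requires.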
Considering the long-standing debate ever since George Berkeley in 1710 over correlation versus causation, we may rewrite (4) in terms of linear correlation coefficients, which immediately implies [31]: Causation implies correlation, but correlation does not imply causation.
Causality can be normalized in order to reveal the relative importance of a causal relation. However, the normalization is by no means as trivial as that for covariance, considering that information flow is asymmetric in direction ($T_{2\to 1} \neq T_{1\to 2}$ in general), and, in addition, there is no property like the Cauchy-Schwarz inequality that makes it possible for covariance to be normalized. In [40], a way of normalization is given, but a complete solution is yet to be sought.
The above formalism has been validated with many benchmark systems (e.g., [36]) such as Baker transformation, Hénon map, Kaplan-Yorke map, and Rössler system, to name a few. Particularly, Equation (4) has been validated with touchstone problems where the traditional Granger causality test and transfer entropy analysis fail. An example is the highly chaotic anticipatory system problem described in [41], which with Equation (4) turns out not to be a problem at all.
The formalism has been successfully applied to the studies of many real-world problems, among them the causal relation between El Niño and the Indian Ocean Dipole [31], tropical cyclone genesis prediction [42], near-wall turbulence [28], global climate change [43,44], and financial time series analysis [40], to name but a few. Here, we particularly want to mention the study by Stips et al. [43] who, through examining with Equation (4) the causality between the CO$_2$ index and the surface air temperature, identified a reversing causal relation with time scale. They found that, during the past century, CO$_2$ emission indeed drives the recent global warming; the causal relation is one-way, i.e., from CO$_2$ to global mean atmosphere temperature. Moreover, they were able to find how the causality is distributed over the globe, thanks to the quantitative nature of our formalism. However, on a time scale of 1000 years or over, the causality is completely reversed; that is to say, on a paleoclimate scale, it is global warming that causes CO$_2$ concentration to rise!

The Governing Equation
Consider a three-dimensional (3D) quasi-geostrophic (QG) model on a β-plane within a channel between latitudes y = ±1 (cf. [45,46]), where $\psi$ is the streamfunction, $\zeta$ the potential vorticity (Equation (6)), $J$ the Jacobian operator such that $J(\psi, \zeta) = \frac{\partial\psi}{\partial x}\frac{\partial\zeta}{\partial y} - \frac{\partial\zeta}{\partial x}\frac{\partial\psi}{\partial y}$, $N$ the buoyancy frequency, and $F_r$ the rotational internal Froude number. On the right-hand side, the first term stands for the Rayleigh-type friction and the second is the horizontal dissipation of the potential vorticity. In this study, these two terms are set to zero ($r = A_H = 0$). This equation, together with the boundary conditions ($\psi$ is periodic in x; the remaining conditions, including Equation (9), involve the bottom relief $b$), forms the problem that we are about to solve. In Equation (9), at z = 1, the atmosphere is taken as a rigid lid (vertical velocity w(z = 1) = 0); at z = 0, a flat bottom is considered and hence b = 0. We choose to solve the equation for the perturbation around the mean state $\bar{\psi}$. The mean state is a zonally homogeneous jet $\bar{U} = \bar{U}(y, z)$.
It is easy to verify that it satisfies the QG equation for an ideal fluid. Here, $\bar{U}$ is assumed horizontally to be a cosine jet within [−L, L], L ≤ 1, as in [47]; outside [−L, L], the fluid is motionless. In this study, L is chosen to be 1, and $Z(z)$ is prescribed such that $\partial\bar{U}/\partial z = 0$ near z = 0 and z = 1, and is maximized in the upper troposphere. From $\bar{U}$, the mean states $\bar{\psi}$ and $\partial\bar{\zeta}/\partial y$ can be easily obtained, where we have shortened $F_r^2/N^2$ as $S(z)$.
Correspondingly, the boundary conditions are changed to: $\psi$ vanishes at y = ±1, and at z = 0, 1. Note that in this study $\bar{U}_z = 0$ at z = 0, 1, hence the third term vanishes. Thus, this is simply the horizontal advection of $S\psi_z$. If initially $\psi_z = 0$, and there is no energy flux toward the upper and lower boundaries, it will remain unperturbed. To simplify the problem, we then set the vertical boundary condition as $\partial\psi/\partial z = 0$ at z = 0, 1. As we will see later, even with such a rather strong condition, the result does reproduce the expected energetics typical of the large-scale mid-latitude atmospheric motion.

Model Setup
In this study, we choose a mesh with spacings of ∆x = 0.2, ∆y = 0.04, and ∆z = 0.2, which results in a grid with 50 × 51 × 5 points. Choose a vertical profile for the basic flow $Z(z_k) = (0.2, 0.2, 0.6, 1, 1)$ such that $\partial\bar{U}/\partial z = 0$ at z = 0 (k = 1) and z = 1 (k = 5). To determine $S(z) = F_r^2/N^2$, first notice that $F_r = f_0 L_0/(N_0 H_0)$ is the rotational Froude number; usually, it is taken as 1. We hence only need to pay attention to the buoyancy frequency $N$, scaled by $N_0$. In dimensional form, it is defined as

$$N^2 = -\frac{g}{\rho}\frac{\partial\rho}{\partial z},$$

where $g$ is the gravitational acceleration and $\rho$ is the density of the fluid. While for oceans this can be directly computed, for the atmosphere it is usually converted into another form,

$$N^2 = \frac{g}{\theta}\frac{\partial\theta}{\partial z}.$$

Here, $\theta$ is potential temperature; it is the temperature of an air parcel moving adiabatically to some reference pressure $p_0$ (usually 1000 hPa), i.e., a temperature with the effect of pressure change excluded:

$$\theta = T\left(\frac{p_0}{p}\right)^{\kappa_d}, \qquad \kappa_d = \frac{R}{c_p},$$

where $\kappa_d \approx 0.286$, $R$ = 8.314 J/(mol·K), i.e., 287 J/(kg·K) for dry air, is the gas constant, and $c_p$ = 1004 J/(kg·K) is the specific heat capacity at constant pressure. This yields

$$N^2 = \frac{g}{T}\left(\frac{\partial T}{\partial z} + \frac{g}{c_p}\right),$$

where $-\partial T/\partial z$ is the lapse rate. For the troposphere, the lapse rate can usually be taken as roughly constant, $\partial T/\partial z \approx -0.65$ °C/100 m $= -6.5 \times 10^{-3}$ °C/m. In Figure 1, the vertical profile of the scaled $N^2$ for the atmosphere between latitudes 20°N–60°N is computed, and its reciprocal is also shown. Considering that the lapse rate $-\partial T/\partial z$ is nearly a constant 0.65 °C/100 m, $S(z) = N_0^2/N^2$ (with $F_r^2 = 1$) decreases almost linearly with z, at a rate of (1 − 1/1.4)/5 ≈ 0.057 per level. In brief, Table 1 gives a list of the parameters for the model.
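To give a feel for the magnitudes involved, the short sketch below evaluates the textbook dry-air relations for potential temperature and for the squared buoyancy frequency in terms of the lapse rate. The function names and the illustrative temperature of 288 K are assumptions for this example, not values taken from Table 1.

```python
G = 9.81        # gravitational acceleration, m/s^2 (assumed value)
CP = 1004.0     # specific heat at constant pressure, J/(kg K)
R_D = 287.0     # gas constant for dry air, J/(kg K)
KAPPA = R_D / CP                     # approximately 0.286

def potential_temperature(T, p, p0=1000.0):
    """theta = T * (p0 / p)**kappa, with T in K and p, p0 in hPa."""
    return T * (p0 / p) ** KAPPA

def buoyancy_frequency_sq(T, lapse_rate):
    """N^2 = (g / T) * (g / c_p - lapse_rate), lapse_rate = -dT/dz in K/m."""
    return (G / T) * (G / CP - lapse_rate)

# Illustrative numbers only: T = 288 K with the constant tropospheric lapse rate.
N2 = buoyancy_frequency_sq(T=288.0, lapse_rate=6.5e-3)
print(f"kappa = {KAPPA:.3f}, N^2 = {N2:.3e} s^-2")
```

With a constant lapse rate, the height dependence of $N^2$ in this relation enters only through $T$, which is consistent with the nearly linear variation of $S(z)$ noted above.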

Solution Strategy
The problem (15)–(17) is solved with a leap-frog scheme. To suppress the computational mode that may arise due to the time splitting, the result at each integration step is filtered in time with a weak filter coefficient α = 0.01. This is equivalent to a weak dissipation of the flow. At each step, it is required to solve the 3D elliptic equation for $\psi$, subject to the vertical boundary condition, a no-flux condition in y, and a periodic condition in x. We may separate z from (x, y) to convert the 3D equation to a set of 2D equations. Separation of variables results in an eigenvalue problem (27) together with a boundary condition (28). This is a Sturm-Liouville problem (cf. [48]), and it can be proven that the resulting eigenvectors $\{\theta_k\}$ form an orthogonal set. The set is complete and can be normalized; it can thus serve as an orthonormal basis. Expand $\psi$ and $\zeta$ (time-dependence suppressed for notational simplicity) with the basis, where n is the number of levels (of the discretized model) in the vertical direction, and substitute the expansions into the original Equation (25). By the orthonormality of $\{\theta_k\}$, the original 3D equation is transformed into n decoupled 2D equations, which can be solved individually.
The eigenvalue problem (27)–(28) is solved with the parameters listed in Table 1. The resulting eigenvalues $\lambda_k$, k = 1, . . . , 5, are all negative; see Figure 2. The corresponding eigenvectors $\theta_k$ are displayed in Figure 3; it is easy to verify that they are orthonormal (cf. [48]).
When $N^2$ is constant, the problem can be solved analytically (e.g., [49]). In that case, the most rudimentary mode is barotropic and the remaining ones are baroclinic. In the present case, $\theta_1$ is the barotropic mode, and $\theta_3$, $\theta_4$, $\theta_5$ are approximately sinusoidal and hence resemble the baroclinic modes. Here, as $N^2$ varies with height, mode 2 has a somewhat different form. Note that $c_k = 1/\sqrt{-\lambda_k}$ corresponds to the phase speed of mode k.
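The time stepping described at the start of this subsection can be sketched on a scalar test problem. The filter form used below is the common Robert-Asselin filter, which is an assumption here since the exact filter expression is not reproduced in the text; the oscillator test equation, the step size, and the function name are likewise hypothetical.

```python
import numpy as np

def leapfrog_asselin(f, x0, dt, nsteps, alpha=0.01):
    """Leap-frog integration of dx/dt = f(x) with a Robert-Asselin time filter."""
    x_prev = x0                        # (filtered) state at step n-1
    x_curr = x0 + dt * f(x0)           # forward-Euler start for step 1
    for _ in range(nsteps - 1):
        x_next = x_prev + 2.0 * dt * f(x_curr)            # leap-frog step
        # Filter the middle time level to damp the computational mode.
        x_prev = x_curr + alpha * (x_next - 2.0 * x_curr + x_prev)
        x_curr = x_next
    return x_curr

# Test problem: a neutral oscillation dx/dt = i*omega*x, exact solution exp(i*omega*t).
omega, dt, nsteps = 1.0, 0.01, 10000
x_end = leapfrog_asselin(lambda x: 1j * omega * x, 1.0 + 0.0j, dt, nsteps)
amplitude = abs(x_end)
print(f"|x(t=100)| = {amplitude:.4f}")
```

The slight loss of amplitude over the integration illustrates why a weak filter of this kind behaves like a weak dissipation of the flow.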
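A minimal numerical sketch of such a vertical-mode computation is given below. It assumes that the vertical structure equation takes the Sturm-Liouville form $\frac{d}{dz}\left(S\frac{d\theta}{dz}\right) = \lambda\theta$ with $d\theta/dz = 0$ at both ends, discretized in flux form on cell-centred levels; the function name, the grid, and the constant $S = 1$ test case are assumptions for illustration. In this particular discretization the barotropic mode has a zero eigenvalue to round-off, which may differ from the treatment used for Figure 2.

```python
import numpy as np

def vertical_modes(S_interfaces, h):
    """Eigenpairs of d/dz(S dtheta/dz) = lambda*theta with no-flux ends.

    S_interfaces holds S at the n-1 interfaces between n cell-centred levels.
    """
    n = len(S_interfaces) + 1
    A = np.zeros((n, n))
    for i, s in enumerate(S_interfaces):
        w = s / h**2
        # Flux between levels i and i+1 (no flux through the two end walls).
        A[i, i] -= w
        A[i + 1, i + 1] -= w
        A[i, i + 1] += w
        A[i + 1, i] += w
    lam, theta = np.linalg.eigh(A)      # symmetric matrix: orthonormal eigenvectors
    order = np.argsort(-lam)            # eigenvalue closest to zero first
    return lam[order], theta[:, order]

n = 50
h = 1.0 / n
lam, theta = vertical_modes(np.ones(n - 1), h)   # constant S = 1 test case
print(lam[:3])   # compare with the analytic values 0, -pi^2, -4 pi^2
```

Because the discrete operator is symmetric, the columns of `theta` are orthonormal, so projecting $\psi$ and $\zeta$ onto them decouples the vertical levels in the way described above.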

Initialization
Different initial disturbances will in general yield different solutions, many of which may eventually be damped out. To obtain a quasi-equilibrium oscillatory state, we look for an "optimal perturbation." To find it, linearize Equation (15); this, together with the boundary condition, forms a linear system, Equations (33) and (34). Write (33) and (34) in operator form, and write $\zeta'$ together with (34) in terms of a linear operator $L$, Equation (36). As $L$ is linear and independent of t, it commutes with ∂/∂t. We hence obtain a linear dynamical system, Equation (38), for the perturbation field $\psi'$. The discretized version of (38) is denoted as

$$\frac{d\mathbf{u}}{dt} = \mathbf{A}\mathbf{u},$$

where $\mathbf{u}$ is a vector of the values of $\psi'$. Initialized with a vector $\mathbf{u}_0$, its solution is $\mathbf{u}(t) = e^{\mathbf{A}t}\mathbf{u}_0$. The optimal perturbation corresponds to the largest singular value of the matrix $e^{\mathbf{A}t}$. To see this, it suffices to consider one particular time, say, t = 1. Perform a singular value decomposition of $e^{\mathbf{A}\cdot 1}$. Perturbations of the modal forms corresponding to singular values greater than 1 will grow. Here, the singular values become smaller than 1 after the modal number m > 4333, as shown in Figure 4. In order to have the disturbance grow, we need to choose the modes with numbers lower than 4333. Displayed in Figures 5 and 6 are modes 2, 100, 500, and 1000 (mode 1 is trivial). Clearly, they have different structures in both horizontal and vertical directions. Theoretically, the lower the mode, the more efficient the perturbation. However, since this is just a linear solution, the perturbation may not grow as expected once nonlinearity takes effect. Here, we find that modes 100, 500, and 1000, among others, are satisfactory. In the following, we choose mode 1000 as the perturbation to initialize the system.
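The singular-value argument can be illustrated on a very small example. The sketch below uses a hypothetical 2 × 2 non-normal matrix (not the discretized QG operator) and a helper `propagator` written for this example; it forms the propagator at t = 1, takes its singular value decomposition, and checks that the leading right singular vector is amplified by the leading singular value.

```python
import numpy as np

def propagator(A, t):
    """Matrix exponential e^{At} via eigendecomposition (A assumed diagonalizable)."""
    w, V = np.linalg.eig(A)
    return ((V * np.exp(w * t)) @ np.linalg.inv(V)).real

# Hypothetical stable but non-normal operator: both eigenvalues are negative,
# yet some initial perturbations grow transiently.
A = np.array([[-0.1, 5.0],
              [0.0, -0.2]])
P = propagator(A, t=1.0)

U, s, Vh = np.linalg.svd(P)
u0 = Vh[0]                              # optimal initial perturbation (unit norm)
growth = np.linalg.norm(P @ u0)         # amplification of u0 over one time unit
n_growing = int(np.sum(s > 1.0))        # number of growing singular modes
print(f"singular values: {s}, growth of optimal mode: {growth:.3f}")
```

Counting the singular values above 1, as `n_growing` does here, is the small-scale analogue of reading the cutoff mode number off Figure 4.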

Results of the Quasi-Geostrophic Model
After initialization, the model reaches a quasi-equilibrium after some 400,000 steps. Figure 7 shows the evolution of total perturbation kinetic energy. Consider for our purpose the time interval between steps 500,000 and 650,000. We choose such an interval because (1) it is not too large (otherwise, too many processes may be involved and hence the model cannot be reduced much), and (2) the processes during the interval appear to be stationary. Another reason is that the energetic cycle mimics well that in the mid-latitude atmosphere. To see this, apply a multiscale window transform (MWT) to separate the process. MWT is a functional analysis tool developed by Liang and Anderson [24] which can decompose a function space into a direct sum of orthogonal subspaces, each with an exclusive range of scales, while preserving the local properties of the functions. Such a subspace is called a "scale window". Originally, MWT was developed for a physically consistent expression of multiscale quadratic quantities, and hence to make multiscale energetics analysis possible. Traditionally, filters have been widely used for multiscale studies in atmospheric research, but, in a rigorous sense, the traditional filters are generally incapable of representing multiscale energy, which is a concept in phase space (it is connected to energy in the physical sense thanks to the Parseval identity), while the outputs of traditional filters are fields in physical space. Liang and Anderson [24] found that, for a class of specially designed orthogonal filters, there exists a transform-reconstruction pair, i.e., a pair of MWT and multiscale window reconstruction (MWR), just as the Fourier transform and the inverse Fourier transform. An MWR is just like a filtered quantity, while the corresponding MWT coefficient squared gives the energy on the scale window of concern.
With MWT and MWR, it has been established that, for an atmospheric/oceanic flow, at each location, there exists a local Lorenz cycle consisting of three conservative processes. The resulting energy transfers have been referred to as canonical transfers; they all bear a Lie bracket form, in contrast to the classical empirically obtained energy transfers. A comprehensive introduction of the theory is beyond the scope of this study; for details, see [25]. Now, perform a two-scale window decomposition, and choose the longest scale to be the whole interval. (This can be done by setting the lowest scale window index to be zero; see [24].) With this setting, compute the canonical transfers using the localized multiscale energetics of Liang [25]. The horizontally integrated canonical transfers are shown in Figure 8, where positive values indicate a transfer of energy from the mean to the perturbation. Clearly, here the transfer of available potential energy overwhelmingly dominates that of kinetic energy (two orders larger), and, on the whole, the former is cascaded downward (left panel), while the latter is transferred inversely upward (right panel). This seems to agree with what has been observed in the mid-latitude atmosphere (e.g., [14,15]).

Principal Component Analysis
With the modeled 4D field $\psi(x, y, z; t)$ on the chosen time interval, we perform a principal component (PC) analysis, also known as empirical orthogonal function (EOF) analysis. The eigenvalues λ are shown in Figure 9a. Obviously, the first three modes possess most of the variance. Displayed in Figure 9b are some of the corresponding PCs. It appears that the first and second PCs are approximately in quadrature; they should form a harmonic subsystem. The third and the fourth have similar frequencies, though that of the latter is a little higher.
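As a hedged illustration of this step, the sketch below carries out an EOF analysis through a singular value decomposition of the anomaly matrix. The synthetic field (a quadrature pair plus a weaker, faster mode and noise), the array shapes, and the function name are all assumptions for the example; in the actual application each row would hold one flattened snapshot of $\psi(x, y, z)$.

```python
import numpy as np

def eof_analysis(field):
    """EOF/PC analysis of a (ntime, nspace) array via SVD of the anomalies."""
    anomalies = field - field.mean(axis=0)          # remove the time mean
    U, s, Vt = np.linalg.svd(anomalies, full_matrices=False)
    variance_fraction = s**2 / np.sum(s**2)
    pcs = U * s            # column m is the principal component p_m(t)
    eofs = Vt              # row m is the orthonormal spatial pattern e_m
    return eofs, pcs, variance_fraction

# Synthetic test field: two patterns in quadrature plus a faster, weaker one.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 20.0 * np.pi, 400)
xg = np.linspace(0.0, 2.0 * np.pi, 60)
field = (3.0 * np.outer(np.cos(t), np.sin(xg))
         + 3.0 * np.outer(np.sin(t), np.cos(xg))
         + 1.0 * np.outer(np.cos(3.0 * t), np.sin(2.0 * xg))
         + 0.05 * rng.standard_normal((t.size, xg.size)))

eofs, pcs, vf = eof_analysis(field)
print("variance fractions of the first four modes:", np.round(vf[:4], 4))
```

The rows of `eofs` are orthonormal, so the anomaly field is recovered exactly as `pcs @ eofs`; truncating that product to the leading modes is what reduces the dimension.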

Model Reduction
The EOF modes form an orthonormal basis for ψ. In this subsection, we use the basis to reduce the original governing Equation (15) into a low-dimensional dynamical system.
With the operator $L$ as defined in (36), Equation (15) can be written in terms of $\psi$ alone. Expand $\psi$ in the EOF basis $\{e_m\}$ with coefficients $p_m$. Since $\{e_m\}$ is orthonormal, taking an inner product on both sides with $e_m$ results in an evolution equation for $dp_m/dt$, with linear coefficients $\alpha^L$; likewise, from Equation (45), the coefficients for the quadratic terms are computed. Note that in reconstructing $\psi$ there is actually a mean part $\bar{\psi} \equiv p_0$ to be added. Theoretically, this part should vanish in the system, but, in reality, it may not. If it is added to Equation (43), a new term appears in comparison to the original equation. Thus, the term $-\langle e_m, L^{-1}[J(p_0, L e_i) + J(e_i, L p_0)]\rangle$ should be added to the above coefficients $\alpha^L$. In addition, there exists a nonautonomous term. However, all of these turn out to be negligible here. Thus, it is adequate to use the above autonomous system in the following studies.

Information Flow between the Scales of the Model Atmosphere
As we showed above, the interactions among the first four EOF modes can be utilized to study the multiscale interactions typical in the problem of concern, as the modes occur on different time scales. In order to examine the information flow between the modes, we make random draws for $(p_1, p_2, p_3, p_4)$ from a pool of values, and then, starting from these initial conditions, run the system forward to generate an ensemble of solutions. Assume that the initial values obey a normal distribution with a mean vector (0.1, 0.1, 0.1, 0.1) and a 4 × 4 identity covariance matrix. Here, the variance is set rather small in order for the trajectories to stay under effective control. The sample space is assumed to be [−6, 6] × [−6, 6] × [−6, 6] × [−6, 6], which makes sense if we do not make too long an integration, as made evident in Figure 10, where the trajectory of a sample path is plotted. The space is discretized using a spacing ∆ = 0.2 (the same for the four dimensions), and the probability density functions are then estimated at each time step by counting the bins in the coarse-grained space. To compute the information flows among the four components, recall the deterministic version of Equation (2).

Figure 11. The computed information flows among the components of the reduced system. Note the range scale for the first two subplots ($T_{1\to 2}$ and $T_{2\to 1}$) is twice that for the others.

For a system with four components, there are in general 4 × 3 = 12 information flows. As we have shown before, the four components make two pairs, i.e., $(p_1, p_2)$ and $(p_3, p_4)$, which essentially represent two scales. Thus, the cross-scale information flows are those between modes (1,2) and modes (3,4). In Figure 11, $T_{1\to 2}$ and $T_{2\to 1}$ are overwhelmingly large (note the different scale range in the first two subplots); second to them are $T_{3\to 4}$ and $T_{4\to 3}$. By the property of causality (ideally, nonzero information flow implies causality), $p_1$ and $p_2$ are mutually causal, and so are $p_3$ and $p_4$. These are the information flows within their respective scales. These causal patterns are similar to that between the displacement and linear momentum of a harmonic oscillator, as is shown in Liang [36]. Indeed, from the table of $\alpha^L_{m,i}$, to the first order the system for $(p_1, p_2)$ resembles a linear oscillator, just as the computed $T_{1\to 2}$ and $T_{2\to 1}$ would imply. The other information flows are interscale. Strictly speaking, there exist flows in both directions (small-scale → large-scale and large-scale → small-scale). However, by observation, $|T_{4\to 1}|$ and $|T_{3\to 2}|$ are much larger than the others. This asymmetric flow structure indicates that the causation between the scales is dominantly one-way, i.e., from higher frequency modes (modes 3 and 4) to lower frequency modes (modes 1 and 2).
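The ensemble initialization and the bin-counting estimate of the probability density described above can be sketched as follows. Only the initial ensemble and a two-dimensional marginal are shown; in the actual procedure the members would be advanced with the reduced system and re-binned at every time step. The ensemble size, the random seed, and the helper `joint_pdf` are assumptions for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_members = 200_000                        # assumed ensemble size
mean = np.full(4, 0.1)                     # mean vector (0.1, 0.1, 0.1, 0.1)
# Unit covariance: independent standard normal draws shifted by the mean.
ensemble = mean + rng.standard_normal((n_members, 4))

delta = 0.2                                # bin spacing, same in every dimension
edges = np.linspace(-6.0, 6.0, 61)         # 60 bins of width 0.2 on [-6, 6]
centers = 0.5 * (edges[:-1] + edges[1:])

def joint_pdf(samples, i, j):
    """Estimate the joint pdf of components i and j by counting bins."""
    counts, _ = np.histogramdd(samples[:, [i, j]], bins=[edges, edges])
    return counts / (len(samples) * delta**2)

rho_12 = joint_pdf(ensemble, 0, 1)         # joint pdf of (p1, p2)
rho_1 = rho_12.sum(axis=1) * delta         # marginal pdf of p1
print("total probability in the box:", rho_12.sum() * delta**2)
```

Marginal and conditional densities of the kind entering Equation (2) can then be formed from such binned joint densities by summation and division, with derivatives approximated by finite differences on the same grid.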
It should be mentioned that what has been solved is actually the QG equation for the perturbation field; the mean flow is not included in the four components (p 1 , p 2 , p 3 , p 4 ) of the reduced system. However, the influence has been embedded in the system. Here, we give it an evaluation.
For notational convenience, let $p_0$ denote the "mean component." Since here the mean flow is prescribed, it does not vary in time, so there is no way to examine the influence of other components on it. That is to say, there is no basis to study $T_{i\to 0}$, but nonetheless we can evaluate $T_{0\to i}$, i = 1, 2, 3, 4. We know, from Bayes' rule, that the density ratio in (48) can be written in terms of $\rho(p_i|p_0)$. Since the mean flow is prescribed, it is certain; $\rho(p_i|p_0)$ is hence in fact $\rho_i(p_i)$. Thus, the whole term is equal to 1. This substituted into (48) yields $T_{0\to i} = 0$ by the compact support of ρ. That is to say, the information flow between the mean flow and the higher frequency components, if it exists, cannot be toward higher modes. In other words, if existing, it must be one way, i.e., in the direction upward toward the mean. It should be emphasized that, generally, the mean flow should also have a distribution, and hence the information flow may not be this easy to evaluate. However, in this case, as we have shown in the preceding section, the variation around the mean is so small that it can be neglected in forming the low-dimensional system. In any case, for this particular problem, the computed information flow, and hence causation, is essentially one-way, i.e., from high frequency modes to low frequency modes.
We want to mention that here EOF analysis has been used to reduce the model order. The advantages of using it include its orthonormality, the maximization of variance toward lowest modes, etc. The limitations of this approach are also well known. The most serious one is that the EOF modes may not be real modes in the physical sense. Here, what we are investigating is the information exchange between processes on different temporal scales, and, fortunately, the principal components of the lowest modes do reflect such temporal variabilities ( Figure 9). However, in a more general situation, this may not be true. We hope some advanced methods, such as the recently developed method by Majda and Qi [50] to efficiently reduce models, can help here.
An alternative approach is to use Equation (4) to estimate from data, rather than directly compute, the information flow, and hence avoid solving a large-dimensional Liouville equation (the curse of dimensionality). However, here comes another issue: Theorem 3 relies on the assumption of Gaussianity. Though (4) has also been successfully applied to some highly nonlinear systems, e.g., the chaotic anticipatory system in [41] (see [31]), caution should be used, as non-Gaussianity may be significant in realistic atmospheres. In any case, these are topics for future studies; here, as the first step, we only consider what we have generated with the QG model.

Discussion and Conclusions
How processes on different scales interact to form weather and climate patterns is one of the central issues in dynamic meteorology. Traditionally, it is studied by diagnosing the exchange of energy (such as the Lorenz cycle), or, equivalently, momentum/angular momentum, between the scales. However, it has long been realized that just multiscale energetics based on the governing equation may not be enough. In a nonlinear dynamical system, as time moves on, two highly correlated events may soon lose correlation, while, on the other hand, two completely irrelevant events could turn out to be correlated in the end. As remarked by Corning [51], the underlying causal efficacy may actually be missing in the equations or "rules". In addition, in the classical multiscale formalism, cyclogenesis is driven by Reynolds stress, which is essentially the linear correlation between the perturbation fields.
As we proved earlier on, while causation implies correlation, correlation does not necessarily imply causation. That said, the traditional perspective on the problem may be limited.
In physics, entropy is another concept as important as energy. The transference of entropy results in a flow of information, but how information flows or transfers across scales has been overlooked in dynamic meteorology, in contrast to the extensively studied energy transfer. Recently, information flow has been rigorously formulated in the framework of dynamical systems; it proves to satisfy the "principle of nil causality" (see [36]), an observational fact which people endeavor to verify in real applications. In this study, this formalism is applied to study the information flow among the scales within a three-dimensional quasigeostrophic (QG) circulation. The basic flow is a zonal jet mimicking the atmospheric jet stream. We chose a period when the system is in equilibrium with an energetic scenario typical of a mid-latitude atmosphere: the mean state is releasing available potential energy to eddies, while the latter feeds kinetic energy back to the mean state. We first solved the 3D QG equation; then, for the period of concern, performed a principal component analysis and obtained the EOF modes to construct a basis. It has been shown that these modes characterize the desired temporal scales. The state variable, i.e., streamfunction, is then expanded with the aid of the basis, and the expansion is truncated at the fourth term. By inverting a 3D elliptic differential operator, the QG equation is converted into a four-dimensional dynamical system. The study of the information flows among the scales is then converted into the investigation of the information flows among the components of the low-dimensional system.
Initialized with an ensemble of streamfunctions drawn randomly according to a normal distribution, the system is integrated forward and, at each step, a probability density function is estimated, which, by Formula (2), allows us to obtain the desired information flow pairs. The computation shows that mode 1 and mode 2, which represent the long temporal scale, are mutually causal, functioning like the components of a 2D harmonic oscillator; this is also the case for mode 3 and mode 4, which represent the motion on a short scale. These are the information flows within their respective scales. The interscale flows are significant only for that from mode 4 to mode 1 and that from mode 3 to mode 2, i.e., from modal pair (3,4) to modal pair (1,2). In addition, the possibility that the mean state has information flow to these four modes is excluded. That is to say, for this particular problem, the information flow is mostly one-way, from higher frequency modes to lower frequency modes. Hence, for this particular problem, underlying the multiscale interaction is mostly a bottom-up causation.
The bottom-up causation, or the information flow from the low levels to higher levels, is actually seen in many natural and social phenomena. In investigating the transition in biological complexity, for example, a reductionist will view the emergence of new, higher level, aggregate entities as a result of lower level entities (e.g., [52][53][54]). Similarly, it is found that some simple computer networks may transit from a low traffic state to a high congestion state, entailing a flow of information from a combination of independent objects to a collective pattern representing a higher level of organization. Most of all, bottom-up causation lays the theoretical foundation for statistical physics [55,56], in which the macroscopic thermodynamic properties can be traced back to random molecular motions.
However, we did not exclude the existence of information flow the other way around; it is just weak by comparison in this example. Top-down causation has been found in many fields. For example, in community ecology, it has been argued that host community-level structures may determine the disease dynamics and hence control the constituent populations (e.g., [57]). Nonetheless, here we showed that a prescribed mean flow seems to be unlikely to have information flow to the anomalies.
Of course, the result here is just for a particular case with a reduced-order model; in reality, the problem could be very complicated, depending on the stage of the evolving state. In addition, for simplicity, we have adopted a rigid-lid assumption on the top, and an idealized boundary condition ($\partial\psi/\partial z = 0$, i.e., no density perturbation) at the bottom, although the simplified model does reproduce the desired downward transfer of available potential energy and upward transfer of kinetic energy. Nonetheless, the resulting interaction scenario is encouraging, in agreement with those in complex systems, although it is quite different from the corresponding energetic cycle. This result, though preliminary at this stage, may help better understand the mean flow-eddy interaction, and gain deeper insight into phenomena such as cyclogenesis, atmospheric blocking, and sudden stratospheric warming, to name a few. On the other hand, the asymmetric causation (mostly bottom-up) provides an observational basis for the parameterization of the subgrid processes in numerical models, such as the stochastic closure scheme of Majda et al. [58]. All of these are interesting and deserve further investigation. We want to emphasize that information flow is a large field in atmospheric research, and the present study makes only a first attempt; much is yet to be explored in the future.