Information length as a useful index to understand variability in the global circulation

With improved measurement and modelling technology, variability has emerged as an essential feature of non-equilibrium processes. Mean values and variance have traditionally been used heavily, but they are not appropriate for describing extreme events, where significant deviations from the mean often occur. Furthermore, stationary Probability Density Functions (PDFs) miss crucial information about the dynamics associated with variability. It is thus critical to go beyond a traditional approach and deal with time-dependent PDFs. Here, we consider atmospheric data from the Whole Atmosphere Community Climate Model (WACCM). From these data we calculate time-dependent PDFs and the information length, which is the total number of statistically different states that a system passes through in time. Time-dependent PDFs are shown to be non-Gaussian in general. The information length calculated from these PDFs offers a new perspective for understanding variability and the correlation among different variables and regions. Specifically, we show that the information length tends to increase with altitude, albeit in a complex form. This tendency is more robust for flows/shears than for temperature. Also, flows and shears are much more similar to each other in information length than to temperature. This means a stronger correlation among flows/shears, owing to a strong coupling through gravity waves in this particular WACCM model. We also find that the information length increases with latitude, with an interesting hemispheric asymmetry for flows/shears/temperature. In addition, there is a stronger anti-correlation (correlation) between flows/shears and temperature at higher (lower) latitude. These results suggest the importance of high latitudes/altitudes in the information budget of the Earth's atmosphere. They also suggest the spatial gradient of information as a useful proxy for the transport of physical quantities.
Tornadoes are rare, large-amplitude events, but can cause very substantial damage when they do occur. Furthermore, gene expression and protein production, which used to be thought of as smooth processes, have also been observed to occur in bursts (e.g. [16][17][18][19][20]).
How to quantify variability mathematically, however, does not seem to be well established.
For unpredictable events, we use a Probability Density Function (PDF) to describe the likelihood of a certain event taking place. The simplest and most popular example is the Gaussian PDF. It has the nice property of being symmetric, and it is uniquely defined by only two parameters: the mean value µ for the peak position and the standard deviation σ for the width of the PDF. Note that the variance is the square of the standard deviation. A broad PDF assigns a finite probability to a wide range of values, suggesting less predictability, so variability can mean a large variance. On the other hand, the temporal change in the mean value is also used as a measure of variability. How, then, do we treat the case where the variance increases or decreases in time? Furthermore, for a non-Gaussian PDF, we also need to consider changes in other characteristics, such as asymmetry (skewness), kurtosis and all other higher moments. This is especially important for the extreme events noted above. For such events, the assumption of small fluctuations with short correlation times underlying the Gaussian PDF fails badly, and the mean value and variance are of very limited use. This highlights the importance of considering the entire PDF and its time evolution in defining variability.
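Mean and variance alone cannot distinguish PDFs of very different shapes. The short sketch below illustrates this point, assuming two hypothetical PDFs chosen only for illustration. The first is a standard Gaussian. The second is a symmetric two-peaked mixture with the same mean (0) and variance (1). The two agree in their first three moments and differ only in the fourth (the kurtosis).

```python
import numpy as np

def gaussian(x, mu, sigma):
    """Gaussian PDF with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def pdf_moments(p, x):
    """Mean, variance, skewness and kurtosis of a PDF sampled on a uniform grid x."""
    dx = x[1] - x[0]
    p = p / (np.sum(p) * dx)                  # enforce normalisation
    mean = np.sum(x * p) * dx
    var = np.sum((x - mean) ** 2 * p) * dx
    z = (x - mean) / np.sqrt(var)             # standardised variable
    skew = np.sum(z**3 * p) * dx
    kurt = np.sum(z**4 * p) * dx              # equals 3 for any Gaussian
    return mean, var, skew, kurt

x = np.linspace(-10.0, 10.0, 4001)
unimodal = gaussian(x, 0.0, 1.0)
a, s = np.sqrt(0.75), 0.5                     # a**2 + s**2 = 1, so the variance is also 1
bimodal = 0.5 * gaussian(x, -a, s) + 0.5 * gaussian(x, a, s)

print(pdf_moments(unimodal, x))   # approx. (0, 1, 0, 3)
print(pdf_moments(bimodal, x))    # approx. (0, 1, 0, 1.875)
```

The two-peaked PDF has a kurtosis of a⁴ + 6a²s² + 3s⁴ = 1.875 instead of 3, even though its mean and variance are identical to the Gaussian's.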
In our previous work, we showed that time-dependent PDFs provide key insights that are completely missing from studies using only mean values, variance or stationary PDFs.
Specifically, we quantify the similarity and disparity between PDFs by assigning a metric such that the distance between two PDFs increases with the disparity between them [22,23]. For Gaussian PDFs, a statistically different state is attained when the physical distance exceeds the resolution set by the uncertainty (the PDF width). We extended this concept to time-dependent problems, where a PDF changes continuously in time. For these problems we introduced the information length L, which quantifies the number of statistically different states that a system passes through from an initial PDF at time 0 to time t [24][25][26][27][28][29][30][31]. One of the merits of L is that it is invariant under a (time-independent) change of variables. It can therefore be compared directly between different variables, unlike physical variables, which have different units. For instance, it can be used to quantify the correlation between different variables [32].
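Rigorously, L can be shown to be related to the sum of the infinitesimal relative entropy along the trajectory of the system [31,33]. It is, however, instructive to consider two steps. First, define a dynamical time scale τ(t) associated with the rate of information change. Second, measure the clock time t in units of τ. For example, for a time-dependent PDF p(x, t), τ is calculated as

$$\frac{1}{\tau(t)^{2}} = \int dx\, \frac{1}{p(x,t)} \left[\frac{\partial p(x,t)}{\partial t}\right]^{2}. \tag{1}$$

From Eq. (1), we can see that τ = τ(t) has the dimension of time and serves as a dynamical time unit for information change. L(t) is the total information change between time 0 and t:

$$L(t) = \int_{0}^{t} \frac{dt_1}{\tau(t_1)} = \int_{0}^{t} dt_1 \sqrt{\int dx\, \frac{1}{p(x,t_1)} \left[\frac{\partial p(x,t_1)}{\partial t_1}\right]^{2}}. \tag{2}$$

Since τ(t) itself generally changes in time, measuring the elapsed time in units of τ requires the integral of 1/τ over time, as in Eq. (2).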
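The definition of L(t) through Eq. (1) and the time integral of 1/τ can be evaluated numerically when the PDFs are only known on a grid, for instance as histograms built from data. The following is a minimal sketch of such a computation, not the authors' code. It assumes PDFs stored as an array `p[i, j] = p(x_j, t_i)`, and `information_length` and `_trapezoid` are hypothetical helper names. It uses the algebraically equivalent form 1/τ² = 4∫(∂√p/∂t)² dx, which avoids dividing by p where the PDF vanishes. It also uses simple finite differences and trapezoidal integration.

```python
import numpy as np

def _trapezoid(y, x):
    """Trapezoidal rule along the last axis of y on a (possibly non-uniform) grid x."""
    y = np.asarray(y, dtype=float)
    dx = np.diff(np.asarray(x, dtype=float))
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * dx, axis=-1)

def information_length(p, x, t):
    """Cumulative information length L(t_i) for PDFs p[i, j] = p(x_j, t_i).

    Uses 1/tau^2 = 4 * int (d sqrt(p) / dt)^2 dx, which equals Eq. (1)
    but avoids dividing by p where the PDF is numerically zero.
    """
    p = np.clip(np.asarray(p, dtype=float), 0.0, None)
    dsqrtp_dt = np.gradient(np.sqrt(p), t, axis=0)            # finite-difference time derivative
    inv_tau = np.sqrt(4.0 * _trapezoid(dsqrtp_dt**2, x))      # 1/tau(t_i)
    steps = 0.5 * (inv_tau[1:] + inv_tau[:-1]) * np.diff(t)   # trapezoid in time
    return np.concatenate(([0.0], np.cumsum(steps)))          # L(t_0) = 0

# Check: a Gaussian of fixed width sigma moving at speed v has 1/tau = v / sigma,
# so L(t) = v * t / sigma (here 2 * 1 / 0.5 = 4 distinguishable states).
x = np.linspace(-10.0, 10.0, 2001)
t = np.linspace(0.0, 1.0, 201)
sigma = 0.5
mu = 2.0 * t[:, None]
p = np.exp(-0.5 * ((x[None, :] - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
L = information_length(p, x, t)
print(L[-1])  # close to 4
```

The moving-Gaussian check connects to the Gaussian picture of statistically different states: translating a PDF of width σ by a distance d yields L ≈ d/σ. Because L is invariant under a time-independent change of variables, the same data expressed in y = eˣ (with p_y(y) = p(ln y)/y) gives the same L up to discretisation error.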
As a measure of the total information change, L∞ ≡ L(t → ∞) was shown to map out an attractor structure.
In particular, in the case of a stable equilibrium, the effect of different deterministic forces was demonstrated by the scaling of L∞ against the peak position of a narrow initial PDF, with the minimum value of L∞ occurring at the equilibrium point. Furthermore, L∞ varies smoothly with the initial conditions (e.g. the distance of an initial PDF from the attractor point). In sharp contrast, in the case of a chaotic attractor, L∞ varies abruptly with the peak position of a narrow initial PDF. This sensitive dependence on initial conditions is reminiscent of a Lyapunov exponent. That is, L provides a new way of understanding dynamical systems.
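For a stable equilibrium, the behaviour of L∞ described above can be illustrated in the simplest case that is solvable in closed form. As our own assumption, take a linear restoring force with additive noise, i.e. an Ornstein-Uhlenbeck process dx = -γx dt + √(2D) dW. This is not necessarily one of the forces studied in the cited works. For this process, a Gaussian initial PDF stays Gaussian. Its mean μ(t) and standard deviation σ(t) are then known exactly. Eq. (1) reduces to 1/τ² = (dμ/dt)²/σ² + 2(dσ/dt)²/σ². The parameters `gamma`, `D`, `sigma0` and `t_max`, and the function name `ou_L_infinity`, are hypothetical. L∞ is approximated by integrating up to a large but finite time.

```python
import numpy as np

def ou_L_infinity(y0, gamma=1.0, D=1.0, sigma0=0.1, t_max=50.0, n=20001):
    """Approximate L_inf for dx = -gamma*x dt + sqrt(2D) dW, starting from N(y0, sigma0^2).

    The PDF stays Gaussian, so 1/tau^2 = (dmu/dt)^2/sigma^2 + 2 (dsigma/dt)^2/sigma^2.
    """
    t = np.concatenate(([0.0], np.geomspace(1e-8, t_max, n)))  # fine steps near t = 0
    var_inf = D / gamma                                         # stationary variance
    mu = y0 * np.exp(-gamma * t)
    dmu = -gamma * mu
    var = var_inf + (sigma0**2 - var_inf) * np.exp(-2.0 * gamma * t)
    dvar = -2.0 * gamma * (var - var_inf)
    sigma = np.sqrt(var)
    dsigma = dvar / (2.0 * sigma)
    inv_tau = np.sqrt(dmu**2 / var + 2.0 * dsigma**2 / var)
    # trapezoidal integral of 1/tau over the non-uniform time grid
    return float(np.sum(0.5 * (inv_tau[1:] + inv_tau[:-1]) * np.diff(t)))

for y0 in (0.0, 1.0, 2.0, 4.0):
    print(y0, ou_L_infinity(y0))
# L_inf(0) should be close to sqrt(2) * ln(10), about 3.256 (width change only)
```

In this linear example, L∞ is smallest when the narrow initial PDF is centred on the equilibrium point x = 0. It then increases with the distance |y0| of the initial peak from the equilibrium.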
Finally, the information length can be applied to any data, including music, where the information flow in different pieces of classical music was calculated [29].
In this work, we apply this to the Whole Atmosphere Community Climate Model (WACCM) and show that the information length L(t) is a useful index to measure 'dynamic variability' and 'correlation'. The remainder of this paper is organized as follows.
Section II provides the analysis of WACCM data. Discussion and conclusions are provided in Section III. [34,35].
We are interested in the information budget in 7 layers of the atmosphere. From the top to the bottom of the atmosphere, these are the thermosphere, mesopause, mesosphere, stratopause, stratosphere, tropopause and troposphere. We consider three different cases of data sampling for PDFs:
• All longitude and latitude data, to understand the global information budget at the 7 layers.
• All longitude data, to understand the information budget across latitude at the 7 layers.
• All longitude and latitude data, to understand the global information budget at all altitudes.
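The three sampling cases differ only in which spatial points are pooled into the PDF at each time. The sketch below shows one way to build such time-dependent PDFs from a gridded field by histogramming, assuming a hypothetical array `field[time, lat, lon]` for one variable at one level. Synthetic random data stands in for WACCM output, and the function name and grid sizes are illustrative. Using common bin edges for all times keeps the PDFs comparable bin by bin, which matters when differencing them in time to compute L. The third case (all altitudes) would apply the first case at every pressure level.

```python
import numpy as np

def time_dependent_pdfs(samples, bins):
    """Histogram PDFs p[i, k] from samples[i, ...] taken at time index i.

    All times share the same bin edges, so the PDFs can be compared
    (and differenced in time) bin by bin.
    """
    samples = np.asarray(samples, dtype=float)
    flat = samples.reshape(samples.shape[0], -1)        # pool all spatial points per time
    edges = np.histogram_bin_edges(flat, bins=bins)     # common edges over all times
    pdfs = np.array([np.histogram(row, bins=edges, density=True)[0] for row in flat])
    centres = 0.5 * (edges[1:] + edges[:-1])
    return pdfs, centres

rng = np.random.default_rng(0)
n_time, n_lat, n_lon = 24, 18, 36                       # hypothetical grid sizes
drift = np.linspace(0.0, 1.0, n_time)[:, None, None]    # slow change of the mean in time
field = rng.normal(size=(n_time, n_lat, n_lon)) + drift  # synthetic stand-in for one variable

# Case 1: all longitudes and latitudes -> one global PDF per time
p_global, x_global = time_dependent_pdfs(field, bins=40)

# Case 2: all longitudes at each latitude -> one PDF per time and latitude
p_by_lat = [time_dependent_pdfs(field[:, j, :], bins=20) for j in range(n_lat)]
```

Each row of `p_global` integrates to one over the bins. The rows can be passed directly to a numerical evaluation of Eq. (2) with `x_global` as the grid.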
According to the sampling in each case above, we calculate the time-dependent PDFs for the six variables:
• Temperature T.
• Zonal flow U.
• Meridional flow V.
• Zonal shear.
• Meridional shear.
• Total (vertical) shear.
(Figure caption fragment: three dashed lines mark the three pauses, with blue, green and red representing the mesopause, stratopause and tropopause, respectively.)
It is noticeable in Fig. 1 that the standard deviations of U and V tend to be much larger than their mean values at all 7 layers.
At any fixed time, a clear phase shift between U and V at the same level is prominent.
This is due to the presence of strong gravity waves, which drive an almost isotropic turbulence with a phase shift between U and V. Also, the change in the mean temperature is much smaller than its standard deviation. However, there is no systematic variation in either the mean or the standard deviation from the top to the bottom layers of the atmosphere. This is to be contrasted with the behaviour of the information length, discussed in detail below.
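A phase shift between two time series such as U and V can be quantified as the lag at which their normalised cross-correlation peaks. The sketch below is one way to do this and is not taken from this work. It assumes two evenly sampled, hypothetical series, `u_like` and `v_like`, which are synthetic sinusoids rather than WACCM data. The function name `phase_lag` is illustrative.

```python
import numpy as np

def phase_lag(a, b, dt=1.0):
    """Lag at which the normalised cross-correlation of b with a peaks.

    A positive lag means b lags a (b(t) resembles a(t - lag)).
    """
    a = (np.asarray(a, dtype=float) - np.mean(a)) / np.std(a)
    b = (np.asarray(b, dtype=float) - np.mean(b)) / np.std(b)
    n = len(a)
    corr = np.correlate(b, a, mode="full") / n   # entry i corresponds to shift i - (n - 1)
    shifts = np.arange(-(n - 1), n)
    i = int(np.argmax(corr))
    return shifts[i] * dt, corr[i]

# Hypothetical example: two oscillations with the same period, the second delayed by 0.5
t = np.arange(400) * 0.1
u_like = np.sin(t)
v_like = np.sin(t - 0.5)
print(phase_lag(u_like, v_like, dt=0.1))  # lag close to 0.5
```

Dividing by the full length n (rather than by the overlap) down-weights large shifts with little overlap. For periodic signals, this favours the smallest lag among those separated by whole periods.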
The first 8 panels in Fig. 2 […] results from the similar evolution of the time-dependent PDFs of U and V. This signifies a strong correlation between U and V due to strong gravity waves (isotropic turbulence), as noted above.
The case of temperature T is shown in the last 8 panels in Fig. 2, where we observe quite broad PDFs with more than one peak at some layers. The ordering of L(t) for T differs somewhat from that for U and V near the top layer but is similar near the bottom layer. Also, for U and V, the largest information gradient is between the thermosphere and the mesopause. In comparison, for T, the largest information gradient is observed between the tropopause and the stratopause, while the thermosphere is well coupled to the mesopause, with a similar L. Compared with the flows in Fig. 2, the PDFs of zonal shear, meridional shear and total shear in Fig. 3 all have much simpler shapes with narrower widths. The evolution and ordering of L for all shears are remarkably similar to those for the zonal/meridional flows shown in Fig. 2. However, the value of L for shears is smaller than that for flows, owing to less change in the time-dependent PDFs. A more detailed investigation of the dependence of L on altitude is presented in §II.C, where L is calculated for all pressure levels (instead of the 7 levels).
Finally, we check the robustness of our results by performing a similar analysis with different 1-day data and with 2-day data for the zonal flow, meridional flow and temperature.

B. Information budget across latitude
The global information budget studied in §II.A includes the contribution from all different latitudes. To understand how the information is distributed across latitude, we now use data from all longitudes and 5 points in each layer for each variable. The correlation between flows/shears and temperature was shown to depend on altitude, with an anti-correlation (correlation) between flows/shears and temperature at higher (lower) altitude. Overall, the information length was found to tend to increase with latitude, with an interesting hemispheric asymmetry for flows/shears/temperature.
We propose that information tends to flow from a higher information length to a lower information length. The reasoning is that a small information length is due to a large entropy, and the direction of time follows the direction of entropy increase. The general information flow from higher to lower altitude/latitude then highlights the importance of high latitudes and altitudes in the information budget of the Earth's atmosphere.
In summary, our results suggest that the information length is a useful index for understanding the correlation among different variables and regions, as well as information flow.

IV. ACKNOWLEDGEMENTS
EK acknowledges the Leverhulme Trust Research Fellowship (RF-2018-142-9). EK and JH acknowledge the HAO visitor programmes for their support and are grateful for the hospitality during their one-month visit to HAO.