Information Geometry in Classical and Quantum Systems

A probabilistic description is essential for understanding the dynamics of stochastic systems far from equilibrium. To compare different Probability Density Functions (PDFs), it is extremely useful to quantify the difference among different PDFs by assigning an appropriate metric to probability such that the distance increases with the difference between the two PDFs. This metric structure then provides a key link between stochastic processes and geometry. For a non-equilibrium process, we define an infinitesimal distance at any time by comparing two PDFs at times infinitesimally apart and sum these distances in time. The total distance along the trajectory of the system quantifies the total number of different states that the system undergoes in time and is called the information length. By using this concept, we investigate classical and quantum systems and demonstrate the utility of the information length as a unique Lagrangian diagnostic to quantify the information change as a system continuously evolves in time and to map out attractor structure. We further elucidate quantum effects (uncertainty relation) and the dual role of the width of PDF in quantum systems.


Introduction
Stochastic processes are ubiquitous in nature and laboratories, and play a major role across traditional disciplinary boundaries. For a proper understanding and description of such processes, it is essential to utilize a probabilistic methodology such as a Probability Density Function (PDF). Furthermore, in order to compare different systems, it is invaluable to employ a measure which is independent of any particular realization of a system. This can very conveniently be achieved by using a geometric measure [1] in a statistical space by assigning a metric between PDFs such that the distance increases with the difference between the two PDFs. This metric structure then provides a key link between stochastic processes and information geometry. One important example is a dimensionless, statistical distance [2] based on the Fisher (or Fisher-Rao) metric [3], which represents the number of distinguishable states between two PDFs. For example, for a Gaussian distribution, statistically distinguishable states are determined by the standard deviation, which increases with the level of fluctuations; two PDFs which have the same standard deviation and differ in peak positions by less than one standard deviation are statistically indistinguishable. Previous work using this fluctuation-based metric includes [4][5][6][7][8][9][10][11][12][13][14].
Compared with a metric defined for any given two PDFs, significantly less work has been done in the case of a time-dependent PDF in non-equilibrium systems. A continuous change in PDFs in this case necessitates defining a distance at any time by comparing two PDFs at times infinitesimally apart and then integrating these distances over time (see Section 2). The cumulative change in information is mathematically quantified by the information length L(t), which is uniquely defined as a function of time for a given initial condition. This paper reviews the utility of L(t) in understanding information change and attractor structure in classical and quantum systems [15][16][17][18][19][20][21][22][23].
The remainder of the paper is organized as follows. §2 discusses information length and §3 investigates attractor structure. §4 and §5 present the analysis of classical music and quantum systems, respectively. Conclusions are found in §6.

Information length
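Intuitively, we define the information length L by computing how quickly information changes in time and then measuring the clock time in units of that time scale. Specifically, the characteristic time scale τ over which the information changes can be computed as the correlation time of a time-dependent PDF, say p(x, t), as follows:
\[ \frac{1}{\tau(t)^2} = \int dx\, \frac{1}{p(x,t)} \left[ \frac{\partial p(x,t)}{\partial t} \right]^2. \tag{1} \]
As defined in Eq. (1), τ(t) has the dimension of time and serves as a dynamical time unit for information change. L(t) is the accumulated change in information between the initial and final times, 0 and t respectively, given by
\[ L(t) = \int_0^t \frac{dt_1}{\tau(t_1)} = \int_0^t dt_1 \sqrt{ \int dx\, \frac{1}{p(x,t_1)} \left[ \frac{\partial p(x,t_1)}{\partial t_1} \right]^2 }. \tag{2} \]
Note that τ(t) in Eq. (1) can vary with time, so we need the integral for L in Eq. (2). To make an analogy, we can consider an oscillator with a period τ = 2 secs. Then, within the clock time 10 secs, there are 5 oscillations. When the period τ is changing with time, we need an integration of dt/τ over the time interval.
We now recall how τ(t) and L(t) in Eqs. (1)-(2) are related to the relative entropy (or Kullback-Leibler divergence) [17,18]. To this end, we consider p_1 = p(x, t_1) and p_2 = p(x, t_2) and the relative entropy D(p_1, p_2) = ∫dx p_2 ln(p_2/p_1). We compute this to the second order in an infinitesimally small |t_2 − t_1| by Taylor expansion. To this end, we calculate
\[ \frac{\partial}{\partial t_1} D(p_1,p_2) = -\int dx\, p_2 \frac{\partial_{t_1} p_1}{p_1}, \tag{3} \]
\[ \frac{\partial^2}{\partial t_1^2} D(p_1,p_2) = \int dx\, p_2 \left[ \frac{(\partial_{t_1} p_1)^2}{p_1^2} - \frac{\partial_{t_1}^2 p_1}{p_1} \right], \tag{4} \]
\[ \frac{\partial}{\partial t_2} D(p_1,p_2) = \int dx \left[ \partial_{t_2} p_2 + \partial_{t_2} p_2 \,(\ln p_2 - \ln p_1) \right], \tag{5} \]
\[ \frac{\partial^2}{\partial t_2^2} D(p_1,p_2) = \int dx \left[ \partial_{t_2}^2 p_2 + \frac{(\partial_{t_2} p_2)^2}{p_2} + \partial_{t_2}^2 p_2 \,(\ln p_2 - \ln p_1) \right]. \tag{6} \]
By taking the limit where t_2 → t_1 = t (p_2 → p_1 = p) and by using the total probability conservation ∫dx ∂_t p = 0, Eqs. (3) and (5) above lead to
\[ \lim_{t_2 \to t_1} \frac{\partial}{\partial t_1} D(p_1,p_2) = \lim_{t_2 \to t_1} \frac{\partial}{\partial t_2} D(p_1,p_2) = \int dx\, \partial_t p = 0, \tag{7} \]
while Eqs. (4) and (6) give
\[ \lim_{t_2 \to t_1} \frac{\partial^2}{\partial t_1^2} D(p_1,p_2) = \lim_{t_2 \to t_1} \frac{\partial^2}{\partial t_2^2} D(p_1,p_2) = \int dx\, \frac{(\partial_t p)^2}{p} = \frac{1}{\tau^2}. \tag{8} \]
Since D(p_1, p_1) = 0, Eqs. (7) and (8) give, for t_2 = t_1 + dt,
\[ D(p_1,p_2) = \frac{1}{2} \left[ \int dx\, \frac{(\partial_{t_1} p(x,t_1))^2}{p(x,t_1)} \right] (dt)^2 + O((dt)^3), \tag{9} \]
where O((dt)^3) is a higher-order term in dt. We can then define the infinitesimal distance dl(t_1) between t_1 and t_1 + dt by
\[ dl(t_1) = \sqrt{D(p_1,p_2)} = \frac{1}{\sqrt{2}} \sqrt{ \int dx\, \frac{(\partial_{t_1} p(x,t_1))^2}{p(x,t_1)} }\; dt + O((dt)^2). \tag{10} \]
We sum dl(t_1) for t_1 = 0, dt, ..., t − dt by using Eq. (10) and then take the limit of dt → 0 as
\[ l(t) = \lim_{dt \to 0} \left[ dl(0) + dl(dt) + \dots + dl(t - dt) \right] = \frac{1}{\sqrt{2}} \int_0^t dt_1 \sqrt{ \int dx\, \frac{(\partial_{t_1} p(x,t_1))^2}{p(x,t_1)} } = \frac{L(t)}{\sqrt{2}}, \tag{11} \]
where L(t) is the information length. Thus, L is related to the sum of the square roots of the infinitesimal relative entropy. It is important to note that Eq. (11), that is, L, depends not only on the initial and final PDFs p(x, 0) and p(x, t), but also on the particular path that a system takes. Thus, in general, l(t)^2 in Eq. (11) is not simply proportional to the relative entropy D(p(x, 0), p(x, t)), which depends only on p(x, 0) and p(x, t).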
To recapitulate, Eq. (2) establishes a Lagrangian distance between the initial and final PDFs according to the total number of different states that a system passes through in time. For example, in equilibrium, ∂p/∂t = 0 and hence τ(t_1) → ∞ for all time t_1; thus dt_1/τ(t_1) = 0 in Eq. (2), and ∫_0^t dt_1/τ(t_1) = 0. This reflects that there is no flow of information in equilibrium. In the opposite limit of a small τ, information changes very quickly. Eq. (1) can be written in terms of the Fisher metric [3], as shown in §3.1.
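As an illustration of how L(t) might be evaluated from PDFs sampled on a grid, the following is a minimal sketch rather than the procedure used in the cited works. It assumes the definition 1/τ² = ∫dx (∂_t p)²/p and uses the equivalent form 4∫dx (∂_t √p)², which avoids dividing by small values of p. The function name `information_length` and the grid parameters are hypothetical.

```python
import numpy as np

def information_length(pdfs, x, t):
    """Approximate L = integral of dt/tau for PDFs sampled on a uniform x grid.

    pdfs has shape (len(t), len(x)); row k holds p(x, t[k]).
    Assumes 1/tau^2 = int dx (d_t p)^2 / p = 4 * int dx (d_t sqrt(p))^2.
    """
    sqrt_p = np.sqrt(pdfs)
    d_sqrt = np.gradient(sqrt_p, t, axis=0)        # finite-difference d/dt of sqrt(p)
    dx = x[1] - x[0]
    inv_tau = 2.0 * np.sqrt(np.sum(d_sqrt**2, axis=1) * dx)
    # trapezoidal rule in time
    return float(np.sum(0.5 * (inv_tau[1:] + inv_tau[:-1]) * np.diff(t)))

if __name__ == "__main__":
    # Demo: a Gaussian of fixed standard deviation whose mean relaxes to zero.
    x = np.linspace(-4.0, 7.0, 1101)
    t = np.linspace(0.0, 10.0, 1001)
    sigma = 0.5
    mu = 3.0 * np.exp(-t)
    pdfs = np.exp(-(x[None, :] - mu[:, None])**2 / (2.0 * sigma**2))
    pdfs /= np.sqrt(2.0 * np.pi * sigma**2)
    print(information_length(pdfs, x, t))
```

For a Gaussian that only translates, 1/τ equals the speed of the mean divided by the standard deviation, so the result should be close to the total displacement of the mean measured in units of the standard deviation.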

Attractor structure
Since L(t) represents the accumulated change in information (due to the change in PDF) at time t, L(t) approaches a constant value L_∞ when a PDF settles into its final equilibrium PDF. The latter is a unique representation of the total number of statistically different states that a PDF evolves through in reaching a final PDF; the smaller L_∞, the fewer states the initial PDF passes through to reach the final equilibrium. Therefore, L_∞ provides us with a path-dependent, Lagrangian measure of the distance between a given initial and final PDF. This enables us to map out the attractor structure (the proximity of y_0 to an equilibrium) by choosing a narrow initial PDF at a different peak position y_0 and by measuring L_∞ against y_0. We are particularly interested in the behaviour of L_∞ against y_0 depending on whether a system has a stable equilibrium point in the absence of noise or is chaotic.

Linear vs cubic forces
We first consider the case where a system has a stable equilibrium point in the absence of noise and investigate how L_∞ is affected by different deterministic forces [17,18]. To this end, we consider a stochastic variable x driven by a random noise ξ with the strength D, governed by the following Langevin equation for over-damped oscillators [24]:
\[ \frac{dx}{dt} = F(x) + \xi. \tag{12} \]
Here, x is a random variable of interest, and ξ is a white noise with a short correlation time with the following property:
\[ \langle \xi(t)\, \xi(t') \rangle = 2D\, \delta(t - t'). \tag{13} \]
Here, D is the strength of the forcing. We note that the dimension of D is length²/time since the dimensions of ξ and δ(t − t') are length/time and 1/time, respectively. F(x) is a deterministic force, which can be interpreted as the gradient of the potential U(x) as F(x) = −∂U(x)/∂x. We consider two types of F, both of which have a stable equilibrium point x = 0: the first is the linear force F = −γx (U = γx²/2) and the second is the cubic force F = −µx³ (U = µx⁴/4). Here γ and µ have dimensions of 1/time and 1/(time × length²), respectively. The linear system is the familiar Ornstein-Uhlenbeck (O-U) process, which has been widely used as a model for a noisy relaxation system in many areas of physical science and financial mathematics (e.g. [?]). The Fokker-Planck equation corresponding to Eqs. (12) and (13) is [24]
\[ \frac{\partial p(x,t)}{\partial t} = -\frac{\partial}{\partial x} \left[ F(x)\, p(x,t) \right] + D \frac{\partial^2 p(x,t)}{\partial x^2}. \tag{14} \]
As an initial PDF, we consider a Gaussian PDF with the inverse temperature β_0 as
\[ p(x,0) = \sqrt{\frac{\beta_0}{\pi}} \exp\left[ -\beta_0 (x - y_0)^2 \right]. \tag{15} \]
Then, for the O-U process, we have an exact time-dependent PDF [17,18]:
\[ p(x,t) = \sqrt{\frac{\beta}{\pi}} \exp\left[ -\beta (x - \langle x \rangle)^2 \right], \tag{16} \]
where
\[ \frac{1}{2\beta(t)} = \frac{e^{-2\gamma t}}{2\beta_0} + \frac{D \left( 1 - e^{-2\gamma t} \right)}{\gamma}, \tag{17} \]
\[ \langle x \rangle = y_0\, e^{-\gamma t}. \tag{18} \]
Here ⟨x⟩(t) is the mean position of the Gaussian profile, and y_0 is its initial value. Similarly, β(t) is the inverse temperature, and β_0 is its initial value. As t tends to infinity, ⟨x⟩ → 0 and β(t) → γ/(2D) ≡ β_*. To compare initial and final equilibrium states, it is convenient also to introduce D_0 = γ/(2β_0). The variance at t = 0 and t → ∞ is then given by ⟨(x_0 − y_0)²⟩ = 1/(2β_0) = D_0/γ and ⟨x²⟩ = 1/(2β_*) = D/γ, respectively. We note from Eqs. (16)-(18) that when D = D_0, β(t) = β_0 = γ/(2D) for all time. In this case, the Gaussian PDF simply moves from y_0 to 0 without changing its shape. If D is greater (less) than D_0, it also broadens (narrows) as it moves.
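For this Gaussian process, β and ⟨x⟩ constitute a parameter space on which the distance is defined with the Fisher metric tensor [3] g_ij (i, j = 1, 2) as [18]
\[ g_{ij} = \left\langle \frac{\partial \ln p}{\partial z^i} \frac{\partial \ln p}{\partial z^j} \right\rangle = \begin{pmatrix} \dfrac{1}{2\beta^2} & 0 \\ 0 & 2\beta \end{pmatrix}, \tag{19} \]
where i, j = 1, 2, z^1 = β, z^2 = ⟨x⟩. This enables us to recast 1/τ² in Eq. (1) in terms of g_ij as
\[ \frac{1}{\tau^2} = \frac{1}{2\beta^2} \left( \frac{d\beta}{dt} \right)^2 + 2\beta \left( \frac{d\langle x \rangle}{dt} \right)^2 = g_{ij} \frac{dz^i}{dt} \frac{dz^j}{dt}. \tag{20} \]
The derivation of the first relation in Eq. (20) is provided in Appendices A-B. Using Eqs. (2) and (20), we can calculate L analytically for this O-U process (see also Appendices A-B).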
In comparison, for the cubic process, theoretical analysis can be done only in limiting cases such as small and large times [19,25]. In particular, the stationary PDF for large time can be shown to have the form
\[ p(x, t \to \infty) = \frac{\beta_c^{1/4}}{2\,\Gamma(5/4)} \exp\left( -\beta_c x^4 \right), \tag{21} \]
where β_c = µ/(4D). Eq. (14) must be solved numerically to calculate L(t).
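The quartic-exponential stationary PDF and the value of β_c can be checked with a short worked derivation. The sketch below assumes the standard Fokker-Planck equation for a Langevin equation with noise correlation 2Dδ(t − t'), and a stationary state with zero probability flux; both are stated here as assumptions rather than taken from the source.

```latex
% Stationary state of  \partial_t p = \partial_x(\mu x^3 p) + D\,\partial_x^2 p
% with zero probability flux:
\mu x^3 p + D \frac{dp}{dx} = 0
\;\;\Longrightarrow\;\;
\frac{d \ln p}{dx} = -\frac{\mu x^3}{D}
\;\;\Longrightarrow\;\;
p(x) \propto \exp\!\left( -\frac{\mu}{4D}\, x^4 \right) = \exp\!\left( -\beta_c x^4 \right),
\qquad \beta_c = \frac{\mu}{4D}.

% Normalization, using \int_0^\infty e^{-\beta x^4} dx = \tfrac{1}{4}\Gamma(1/4)\,\beta^{-1/4}
% and \Gamma(5/4) = \tfrac{1}{4}\Gamma(1/4):
\int_{-\infty}^{\infty} e^{-\beta_c x^4}\, dx = 2\,\Gamma(5/4)\, \beta_c^{-1/4}
\;\;\Longrightarrow\;\;
p(x) = \frac{\beta_c^{1/4}}{2\,\Gamma(5/4)}\, e^{-\beta_c x^4}.
```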
To summarise, due to the restoring force F, the equilibrium is given by a PDF around x = 0, Gaussian for the linear force and quartic exponential for the cubic force. If we were to pick any point in x, say y_0, we are curious about how close y_0 is to the equilibrium and how F(x) affects it. To determine this, we use a range of points around x = y_0 to make a narrow PDF (see Fig. 1) and measure L_∞. The question is how this L_∞ depends on y_0. We repeat the same procedure for the cubic process, as shown in Fig. 1, and examine how L_∞ depends on y_0. L_∞ as a function of y_0 is shown for both linear (red dashed line) and cubic (blue solid line) processes in Fig. 2. In the linear case we can see a clear linear relation between y_0 and L_∞, meaning that the information length preserves the linearity of the system. This linear relationship holds for all D and D_0. In contrast, for the cubic process, the relation is not linear, and the log-log plot on the right in Fig. 2 shows a power-law dependence with the power-law index p. This power-law index p varies between 1.52 and 1.91 and depends on the width (∝ D_0^{1/2}) of the initial PDF and the stochastic forcing amplitude D, as shown in [18]. This demonstrates that nonlinear interaction tends to change the geometric structure of a non-equilibrium process from linear to power-law scalings. In either case, L_∞ varies smoothly with y_0, with its minimum value at y_0 = 0, since the equilibrium point 0 is stable. This will be compared with the behaviour in chaotic systems in §3.2.
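The linear dependence of L_∞ on y_0 for the O-U process can be checked numerically. The following is a minimal sketch, assuming the Gaussian solution with mean y_0 e^{−γt} and variance 1/(2β) = e^{−2γt}/(2β_0) + D(1 − e^{−2γt})/γ, together with 1/τ² = (dβ/dt)²/(2β²) + 2β(d⟨x⟩/dt)². The function name and default parameters are hypothetical.

```python
import numpy as np

def ou_information_length(y0, beta0, gamma, D, t_end=20.0, n_steps=20001):
    """Numerically integrate dt/tau for the O-U Gaussian solution (sketch)."""
    t = np.linspace(0.0, t_end, n_steps)
    decay = np.exp(-2.0 * gamma * t)
    var = decay / (2.0 * beta0) + D * (1.0 - decay) / gamma   # 1 / (2 beta)
    beta = 1.0 / (2.0 * var)
    mean = y0 * np.exp(-gamma * t)
    # analytic time derivatives of the variance, beta and the mean
    dvar_dt = 2.0 * decay * (D - gamma / (2.0 * beta0))
    dbeta_dt = -2.0 * beta**2 * dvar_dt
    dmean_dt = -gamma * mean
    inv_tau = np.sqrt(dbeta_dt**2 / (2.0 * beta**2) + 2.0 * beta * dmean_dt**2)
    # trapezoidal rule in time
    return float(np.sum(0.5 * (inv_tau[1:] + inv_tau[:-1]) * np.diff(t)))

if __name__ == "__main__":
    gamma, beta0 = 1.0, 2.0
    D = gamma / (2.0 * beta0)          # D = D_0: the PDF keeps its width
    for y0 in (1.0, 2.0, 4.0):
        print(y0, ou_information_length(y0, beta0, gamma, D))
```

With D = D_0 the width does not change, and the result should approach y_0 divided by the standard deviation (D/γ)^{1/2}, which is linear in y_0. Other values of D can be passed in to explore the case where the PDF also broadens or narrows.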

Chaotic attractor
Section 3.1 demonstrates that L_∞ takes its minimum value at a stable equilibrium point [17,18]. We now show that, in contrast, in a chaotic attractor, L_∞ is minimum for an unstable equilibrium point [15]. To this end, we consider a chaotic attractor using a logistic map [15]. The latter is simply given by a rule as to how to update the value x at t + 1 from its previous value at t as follows [27]:
\[ x_{t+1} = 1 - a\, x_t^2, \tag{22} \]
where x ∈ [−1, 1] and a is a parameter which controls the stability of the system. As we are interested in a chaotic attractor, we choose the value a = 2 so that any initial value x_0 evolves to a chaotic attractor given by an invariant density (shown in the right panel of Fig. 3). A key question is then whether all values of x_0 are similar, as they all evolve to the same invariant density in the long time limit. To address how close a particular point x_0 is to equilibrium, we i) consider a narrow initial PDF around x_0 at t = 0, ii) evolve it until it reaches the equilibrium distribution, iii) measure the L_∞ between initial and final PDF, and iv) repeat steps i)-iii) for many different values x_0. For example, for x_0 = 0.7, the initial PDF is shown on the left and the final PDF on the right in Fig. 3. We show L_∞ against x_0 in Fig. 4. A striking feature of Fig. 4 is an abrupt change in L_∞ for a small change in x_0. This means that the distance between x_0 and the final chaotic attractor depends sensitively on x_0: a small change in the initial condition x_0 causes a large difference in the path that a system evolves through, and thus in L_∞. This is quite similar to the sensitive dependence of the Lyapunov exponent on the initial condition [27]. That is, L_∞ provides a new methodology to test chaos, and it is a good illustration of a chaotic equilibrium. Another interesting feature of Fig. 4 is the presence of several points with small values of L_∞, shown by red circles. In particular, x_0 = 0.5 has the smallest value of L_∞, indicating that the unstable point is closest to the chaotic attractor. That is, an unstable point is most similar to the chaotic attractor and thus minimises L_∞.
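Steps i)-iv) above can be sketched numerically as follows. This is an illustrative approximation, not the method of [15]: it assumes the map x → 1 − a x² with a = 2, evolves the PDF with an Ulam-type transfer matrix on a uniform grid (a technique swapped in here for convenience), and replaces the time integral by a sum over map iterations using 2[Σ(√P_{t+1} − √P_t)²]^{1/2} as the distance per step. All function names and parameters are hypothetical.

```python
import numpy as np

def ulam_matrix(n_bins=400, n_sub=200, a=2.0):
    """Column-stochastic transfer matrix for x -> 1 - a*x**2 on [-1, 1]."""
    edges = np.linspace(-1.0, 1.0, n_bins + 1)
    width = edges[1] - edges[0]
    offsets = (np.arange(n_sub) + 0.5) / n_sub      # sample points inside a bin
    M = np.zeros((n_bins, n_bins))
    for j in range(n_bins):
        pts = edges[j] + offsets * width
        img = 1.0 - a * pts**2                      # image of the sample points
        idx = np.floor((img + 1.0) / width).astype(int)
        idx = np.clip(idx, 0, n_bins - 1)
        M[:, j] = np.bincount(idx, minlength=n_bins) / n_sub
    return M, edges

def l_infinity(x0, M, edges, sigma=0.01, n_steps=60):
    """Discrete-time estimate of the information length from a narrow PDF at x0."""
    centers = 0.5 * (edges[1:] + edges[:-1])
    p = np.exp(-(centers - x0)**2 / (2.0 * sigma**2))
    p /= p.sum()                                    # bin probabilities
    total = 0.0
    for _ in range(n_steps):
        p_next = M @ p
        total += 2.0 * np.sqrt(np.sum((np.sqrt(p_next) - np.sqrt(p))**2))
        p = p_next
    return total, p

if __name__ == "__main__":
    M, edges = ulam_matrix()
    for x0 in (0.3, 0.5, 0.7):
        L, _ = l_infinity(x0, M, edges)
        print(x0, L)
```

Because one map iteration is not an infinitesimal step, the sum is only a coarse proxy for L_∞, and the values depend on the bin width, the initial width `sigma`, and the number of iterations.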

Music: can we see the music?
Our methodology can be applied to any system as long as time-dependent PDFs can be computed, e.g. from data. As an example, we apply our theory to music data and discuss the information change associated with different pieces of classical music. In particular, we are interested in understanding differences among famous pieces of classical music in view of information change. To gain an insight, we used the MIDI file [26], and computed time-dependent PDFs and the information length as a function of time [16]. Fig. 6 shows L(t) against time for Vivaldi's Summer, Mozart, Tchaikovsky's 1812 Overture, and Beethoven's Ninth Symphony 2nd movement. We observe differences among the composers; in particular, the more classical the music, the more subtle the information change. We then look at the rate of information change against time for different music by calculating the gradient of L (dL/dt = 1/τ) in Fig. 6, which also manifests the most subtle change in information length for Vivaldi and Mozart.

Quantum systems
Finally, we examine quantum effects on information length [23]. In Quantum Mechanics (QM), stochasticity arises due to the uncertainty relation ∆x∆p ≥ ħ/2 even in the absence of an external noise. In particular, in the semi-classical limit, ħ serves as a unit of information in the p − x phase space, since each quantum state corresponds to a classical volume ħ; the total number of states is the classical volume of phase space divided by ħ. A wider PDF corresponds to a QM wave function with larger variance, and occupies a larger x region in the phase space; it is thus expected to cause more change in information, opposite to what might be expected in classical systems (e.g. discussed in §3.1). To elucidate quantum effects, for simplicity, we consider a particle of mass m under a constant force F and assume that it has an initial Gaussian wave function localised around x = 0 [23]:
\[ \psi(x,0) = \left( \frac{2\beta_0}{\pi} \right)^{1/4} \exp\left[ -\beta_0 x^2 + i k_0 x \right], \tag{23} \]
where k_0 = p(t = 0)/ħ is the wave number at t = 0, d_x = (2β_0)^{−1/2} is the width of the wave packet, and p is the momentum. A PDF P(x, t) = |ψ(x, t)|² is then found (e.g. see [23,28]):
\[ P(x,t) = \sqrt{\frac{\beta}{\pi}} \exp\left[ -\beta (x - \langle x \rangle)^2 \right], \tag{24} \]
where
\[ \beta(t) = \frac{2\beta_0\, m^2}{m^2 + (2\hbar\beta_0 t)^2}, \qquad \langle x \rangle = v_0 t + \frac{F t^2}{2m}, \qquad v_0 = \frac{\hbar k_0}{m}. \tag{25} \]
Here, v_0 is the constant velocity and the angular brackets denote the average. Eq. (24) obviously represents a Gaussian, with the mean ⟨x⟩ = ħk_0 t/m + Ft²/(2m) and the variance
\[ \Delta(t) = \langle (x - \langle x \rangle)^2 \rangle = \frac{1}{2\beta} = \frac{1}{4\beta_0} + \frac{\hbar^2 \beta_0\, t^2}{m^2}. \tag{26} \]
The first term in Eq. (26) is due to the variance of the initial PDF, ∆(0) = ⟨(x(0) − ⟨x(0)⟩)²⟩ = 1/(4β_0). The second term represents the spreading of the wave packet/PDF in time due to quantum effects, which disappears in the classical limit ħ → 0 since β = β(0) = 2β_0. These quantum effects give rise to a super-diffusion ∝ t², occurring faster than the Brownian motion ∆ ∝ t, in the limit of a very narrow initial wave packet (as β(0) → ∞).
Since the PDF in Eq. (24) is Gaussian, we can use Eq. (20) to find (e.g. see [18])
\[ \frac{1}{\tau^2} = \frac{2t^2}{(T^2 + t^2)^2} + \frac{4\beta_0 T^2}{T^2 + t^2} \left( v_0 + \frac{F t}{m} \right)^2. \tag{27} \]
Here, we defined a characteristic time T = m/(2ħβ_0). It is interesting to rewrite T using ∆x(0)∆p(0) ∼ ħ/2 so that
\[ \frac{1}{T} = \frac{2\hbar\beta_0}{m} \sim \frac{\Delta v_0}{\Delta x(0)}, \tag{28} \]
where ∆v_0 = ∆p(0)/m. Thus, T represents the characteristic time scale for the spreading of the initial Gaussian wave packet. Without the constant force F, the right-hand side of Eq. (27) decreases with time at large t and τ increases, so that information change takes longer. However, with a constant force in Eq. (27), it is possible to have a constant value of τ (geodesic) for a sufficiently large t, when the increase in the momentum ∝ Ft compensates the increase in the width of the PDF ∝ t. Specifically, as t → ∞, Eq. (27) is reduced to
\[ \frac{1}{\tau^2} \to \frac{F^2}{\hbar^2 \beta_0}, \qquad L(t) \sim \frac{F t}{\hbar \sqrt{\beta_0}} = \frac{\sqrt{2}\, F t\, d_x}{\hbar}. \tag{29} \]
In Eq. (29), Ft represents the momentum due to the constant force F while d_x = (2β_0)^{−1/2} is the width of the initial wave packet. Thus, Ft d_x in Eq. (29) represents the phase volume covered by the motion due to the constant force F in the p − x phase space, demonstrating information changes associated with the coverage of a phase volume ħ due to the constant force F. Interestingly, similar results are also obtained in the momentum representation where L is computed from the PDF P(p, t) in the momentum space (see [29]):
\[ \frac{1}{\tau^2} = 2\alpha F^2, \tag{30} \]
where α = 1/(2ħ²β_0). Thus, a (non-zero) constant τ (geodesic) is induced by the force F. Furthermore, L = √(2α) Ft ∼ (Ft)∆(0)^{1/2}/ħ, similar to L in Eq. (29). This is an elegant result showing the robustness of L in either position or momentum representation for a geodesic solution, despite the complementary relation between position and momentum in quantum systems.
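The approach to a constant τ under a constant force can be illustrated numerically. The sketch below assumes the standard Gaussian wave-packet result β(t) = 2β_0/(1 + t²/T²) with T = m/(2ħβ_0) and mean ħk_0 t/m + Ft²/(2m), inserted into 1/τ² = (dβ/dt)²/(2β²) + 2β(d⟨x⟩/dt)². The function name and the parameter values (in units with m = ħ = 1) are hypothetical.

```python
import numpy as np

def inv_tau_quantum(t, m, hbar, beta0, k0, F):
    """1/tau for a Gaussian wave packet under a constant force (sketch)."""
    T = m / (2.0 * hbar * beta0)                 # spreading time scale
    beta = 2.0 * beta0 / (1.0 + (t / T)**2)      # inverse width of |psi|^2
    dlnbeta_dt = -2.0 * t / (T**2 + t**2)        # (1/beta) d(beta)/dt
    dmean_dt = hbar * k0 / m + F * t / m         # velocity of the mean position
    return np.sqrt(0.5 * dlnbeta_dt**2 + 2.0 * beta * dmean_dt**2)

if __name__ == "__main__":
    m, hbar, beta0, k0, F = 1.0, 1.0, 0.5, 1.0, 2.0
    for t in (1.0, 10.0, 100.0, 1.0e4):
        print(t, inv_tau_quantum(t, m, hbar, beta0, k0, F))
    print("large-t limit:", F / (hbar * np.sqrt(beta0)))
```

With F set to zero the same function shows 1/τ decaying at large t, in line with the statement that information change slows down without the force.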

Conclusions
We investigated the information geometry of non-equilibrium processes involved in classical and quantum systems. Specifically, we introduced τ(t) as a dynamical time scale quantifying information change and calculated L(t) by measuring the total elapsed time t in units of τ. As a unique representation of the total number of statistically different states that a PDF evolves through in reaching a final PDF, L_∞ was demonstrated to be a novel diagnostic for mapping out an attractor structure. In particular, L_∞ preserves the linear geometry of a linear process while manifesting nonlinear geometry in a cubic (nonlinear) process; it takes its minimum value at the stable equilibrium point. In the case of a chaotic attractor, L_∞ exhibits a sensitive dependence on initial conditions like a Lyapunov exponent. Thus, L_∞ is a useful diagnostic for mapping out an attractor structure. To illustrate that L can be applied to any data as long as time-dependent PDFs can be constructed from the data, we presented the analysis of different pieces of classical music (e.g. see [16]). Finally, the width of PDFs was shown to play a dual role in information length in quantum systems. It cannot be over-emphasized that L is path-specific and is a dynamical measure of the metric, capturing the actual statistical change that occurs during time evolution. This path-specificity would be crucial when it is desirable to control certain quantities according to the state of the system (e.g. time-dependent PDFs) at any given time (e.g. see [18]). Due to the generality of our methodology, we envision a large scope for further applications to different phenomena.
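
Appendix
For the O-U process, using Eqs. (17) and (18) in Eq. (20) gives
\[ \frac{1}{\tau^2} = \frac{2\gamma^2 \left( r^2 + qT \right)}{T^2}, \tag{A1} \]
where
\[ q = \beta_0 \gamma y_0^2, \qquad r = 2\beta_0 D - \gamma, \qquad T = 2\beta_0 D \left( e^{2\gamma t} - 1 \right) + \gamma. \tag{A2} \]
From Eq. (A2), we can see that the dimension of q, r and T is the inverse of time. By using (A1) and (A2) in Eq. (2), together with dT/dt = 2γ(T + r), we obtain
\[ L = \frac{1}{\sqrt{2}} \int_{T_i}^{T_f} dT\, \frac{\sqrt{r^2 + qT}}{T (T + r)}. \tag{A3} \]
To compute Eq. (A3) for r ≠ 0, we let Y = (r² + qT)^{1/2} and recast it as
\[ L = \frac{1}{\sqrt{2}} \left[ \ln \frac{Y - r}{Y + r} \right]_{Y_i}^{Y_f} + \sqrt{2}\, (q - r)\, H, \tag{A4} \]
where Y_i and Y_f are Y evaluated at T_i and T_f, and H is defined as
\[ H = \int_{Y_i}^{Y_f} \frac{dY}{Y^2 + r (q - r)}. \tag{A5} \]
Eq. (A5) is to be evaluated separately for two cases: q ≥ r and q < r (taking r > 0). First, for q ≥ r, we use Y = (qr − r²)^{1/2} tan θ in (A5) to obtain
\[ H = \frac{1}{\sqrt{r (q - r)}} \left[ \tan^{-1} \frac{Y}{\sqrt{r (q - r)}} \right]_{Y_i}^{Y_f}. \tag{A6} \]
Second, for q < r, Eq. (A5) can be integrated to obtain
\[ H = \frac{1}{2 \sqrt{r (r - q)}} \left[ \ln \frac{Y - \sqrt{r (r - q)}}{Y + \sqrt{r (r - q)}} \right]_{Y_i}^{Y_f}. \tag{A7} \]
We note that Eq. (A7) is continuous across q = r. In Eq. (A4), the contribution from the difference in PDF width through r ≠ 0 and that from the difference in mean value of x (e.g. PDF peaks) through q ≠ 0 appear in both first and second terms.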
In the case r = 0 where the initial and final PDFs have the same width, β(t) = β_0 for all time, and Eqs. (A1) and (2) give us
\[ L = \frac{1}{\sqrt{2}} \int_{T_i}^{T} dT_1\, \frac{\sqrt{q}}{T_1^{3/2}} = \sqrt{2q} \left( \frac{1}{\sqrt{T_i}} - \frac{1}{\sqrt{T}} \right). \tag{A8} \]
We use that for r = 0, T = γe^{2γt}, T_i = γ at t = 0 and simplify Eq. (A8) as
\[ L = \sqrt{2\beta_0}\; y_0 \left( 1 - e^{-\gamma t} \right) = \frac{y_0 - y}{\sqrt{D/\gamma}}, \tag{A9} \]
where y = y_0 e^{−γt} = ⟨x⟩ is the mean position. Thus, L in Eq. (A9) is the change in the mean position y_0 − y between initial time 0 and time t measured in units of the resolution (D/γ)^{1/2}. Interestingly, this resolution (D/γ)^{1/2} is the standard deviation, which is the square root of the variance ⟨(x − ⟨x⟩)²⟩ = D/γ = 1/(2β) = 1/(2β_0). In general, when q ≠ 0 and r ≠ 0, L results from the mixed contribution from the entropy change (r ≠ 0) and the change in y (q ≠ 0) measured in units of the resolution. In more technical terms, β and y in Eq. (A1) constitute a hyperbolic geometry upon a suitable change of variables (e.g. see [18]).

Figure 1. Initial (red) and final (blue) PDFs for the O-U process on the left and the cubic process on the right.

Figure 2. Left: L_∞ against x(t = 0) = y_0 for the linear process in red dashed line and for the cubic process in blue solid line. Right: L_∞ against x(t = 0) = y_0 for the cubic process on log-log scale [19].

Figure 3. Left: An initial narrow PDF at the peak x 0 = 0.7.Right: The invariant density of a logistic map.

Figure 4. L ∞ against the peak position x = x 0 of an initial PDF in the chaotic regime of a logistic map (from [15]).