Investigating Information Geometry in Classical and Quantum Systems through Information Length

Stochastic processes are ubiquitous in nature and laboratories, and play a major role across traditional disciplinary boundaries. These stochastic processes are described by different variables and are thus very system-specific. In order to elucidate the underlying principles governing different phenomena, it is extremely valuable to utilise a mathematical tool that is not specific to a particular system. We provide such a tool based on information geometry by quantifying the similarity and disparity between Probability Density Functions (PDFs) with a metric, such that the distance between two PDFs increases with the disparity between them. Specifically, we invoke the information length L(t) to quantify the information change associated with a time-dependent PDF. L(t) is uniquely defined as a function of time for a given initial condition. We demonstrate the utility of L(t) in understanding information change and attractor structure in classical and quantum systems.


Introduction
Stochastic processes are ubiquitous in nature and laboratories, and play a major role across traditional disciplinary boundaries. Due to the randomness associated with stochasticity, the evolution of these systems is not deterministic but probabilistic. Furthermore, these stochastic processes are described by different variables and are thus very system-specific. This system-specificity makes it difficult to compare different processes. In order to understand universality or the underlying principles governing different phenomena, it is extremely valuable to utilise a mathematical tool that is not specific to a particular system. This is especially indispensable given the diversity of stochastic processes and the growing amount of data.
Information geometry provides a powerful methodology to achieve this goal. Specifically, the similarity and disparity between Probability Density Functions (PDFs) are quantified by a metric [1] such that the distance between two PDFs increases with the disparity between them. This was the very idea behind the statistical distance [2] based on the Fisher (or Fisher-Rao) metric [3], which represents the total number of statistically different states between two PDFs in Hilbert space for quantum systems. The analysis in [2] was extended to impure (mixed-state) quantum systems using a density operator by [4]. Other related work includes [5-12]. For Gaussian PDFs, a statistically different state is attained when the physical distance exceeds the resolution set by the uncertainty (the PDF width).
This paper presents a method to define such a distance for a PDF which changes continuously in time, as is often the case in non-equilibrium systems. Specifically, we invoke the information length L(t) according to the total number of statistically different states that a system evolves through in time. L(t) is uniquely defined as a function of time for a given initial condition. We define the dynamical time scale τ(t) by

\[ \frac{1}{\tau(t)^2} = \int dx\, \frac{1}{p(x,t)} \left[ \frac{\partial p(x,t)}{\partial t} \right]^2. \quad (1) \]

From Equation (1), we can see that τ = τ(t) has the dimension of time and serves as a dynamical time unit for information change. L(t) is the total information change between time 0 and t:

\[ L(t) = \int_0^t \frac{dt'}{\tau(t')}. \quad (2) \]

In principle, τ(t) in Equation (1) can depend on time, so we need the integral for L in Equation (2). To make an analogy, we can consider an oscillator with a period τ = 2 s. Then, within the clock time 10 s, there are five oscillations. When the period τ changes with time, we need an integration of dt/τ over the time interval.
We now recall how τ(t) and L(t) in Equations (1) and (2) are related to the relative entropy (Kullback-Leibler divergence) [15,16]. We consider two nearby PDFs p₁ = p(x, t₁) and p₂ = p(x, t₂) at times t = t₁ and t₂, and take the limit of a very small δt = t₂ − t₁ in order to Taylor-expand

\[ D[p_1, p_2] = \int dx\, p_2 \ln\frac{p_2}{p_1}. \]

Up to O((dt)²) (dt = t₂ − t₁), the Taylor expansion together with D[p₁, p₁] = 0 leads to

\[ D[p_1, p_2] \simeq \frac{(dt)^2}{2} \int dx\, \frac{1}{p} \left( \frac{\partial p}{\partial t} \right)^2 \bigg|_{t=t_1} = \frac{(dt)^2}{2\,\tau(t_1)^2}, \]

and thus to the infinitesimal distance dl(t₁) between t₁ and t₁ + dt as

\[ dl(t_1) = \sqrt{2 D[p_1, p_2]} = \frac{dt}{\tau(t_1)}. \]

By summing dl(t_i) for i = 0, 1, 2, ..., n − 1 (where n = t/dt) in the limit dt → 0, we have

\[ L(t) = \lim_{dt \to 0} \sum_{i=0}^{n-1} dl(t_i) = \int_0^t \frac{dt'}{\tau(t')}, \quad (10) \]

where L(t) is the information length. Thus, L is related to the sum of infinitesimal relative entropies. It cannot be overemphasised that L is a Lagrangian distance between PDFs at time 0 and t, and sensitively depends on the particular path that a system passes through in reaching the final state.
In contrast, the relative entropy D[p(x, 0), p(x, t)] depends only on PDFs at time 0 and t and thus does not tell us about intermediate states between initial and final states.
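To make these definitions concrete, here is a minimal numerical sketch (Python; the grid, time step, and drifting-Gaussian example are our own illustrative choices, not taken from the references) that accumulates L(t) as the sum of infinitesimal relative entropies, dl = √(2 D_KL), between PDFs at consecutive time steps:

```python
import numpy as np

def information_length(pdfs, dx):
    """Accumulate L = sum_i dl(t_i), with dl = sqrt(2 * D_KL(p_i, p_{i+1})),
    between PDFs at consecutive (closely spaced) time steps."""
    L = 0.0
    for p1, p2 in zip(pdfs[:-1], pdfs[1:]):
        mask = (p1 > 0) & (p2 > 0)
        dkl = np.sum(p2[mask] * np.log(p2[mask] / p1[mask])) * dx
        L += np.sqrt(max(2.0 * dkl, 0.0))
    return L

# Example: a Gaussian of fixed width sigma whose mean drifts at unit speed.
# Here 1/tau = |d<x>/dt| / sigma = 1/sigma, so L(t) should approach t/sigma.
x = np.linspace(-20.0, 20.0, 4001)
dx = x[1] - x[0]
sigma = 1.0
times = np.linspace(0.0, 2.0, 2001)
pdfs = [np.exp(-(x - t) ** 2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
        for t in times]
print(information_length(pdfs, dx))  # close to 2.0 (= t / sigma at t = 2)
```

The drifting Gaussian makes the Lagrangian character explicit: L counts the number of statistically distinguishable states traversed, here one state per width σ of displacement.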

Attractor Structure
Since L(t) represents the accumulated change in information (due to the change in PDF) at time t, L(t) settles to a constant value L∞ when a PDF reaches its final equilibrium PDF. The smaller L∞, the smaller the number of states that the initial PDF passes through to reach the final equilibrium. Therefore, L∞ provides us with a unique representation of a path-dependent, Lagrangian measure of the distance between a given initial and final PDF. We will utilise this property to map out the attractor structure by considering a narrow initial PDF at different peak positions y₀ and by measuring L∞ against y₀. We are particularly interested in how the behaviour of L∞ against y₀ depends on whether a system has a stable equilibrium point or is chaotic.

Linear vs. Cubic Forces
We first consider the case where a system has a stable equilibrium point when there is no stochastic noise, and investigate how L∞ is affected by different deterministic forces [15,16]. We consider the following Langevin equation [22] for a variable x:

\[ \frac{dx}{dt} = F(x) + \xi. \quad (11) \]

Here, ξ is a short (delta-) correlated stochastic noise with strength D:

\[ \langle \xi(t) \xi(t') \rangle = 2 D\, \delta(t - t'), \quad (12) \]

where the angular brackets denote the average over ξ and ⟨ξ⟩ = 0. We consider two types of F, which both have a stable equilibrium point x = 0. The first is the linear force F = −γx (γ > 0 is the frictional constant), giving the familiar Ornstein-Uhlenbeck (O-U) process, a popular model for a noisy relaxation system (e.g., [23]). The second is the cubic force F = −µx³, where µ represents the frictional constant. Note that, in these models, the dimensions of γ (s⁻¹) and µ (s⁻¹ m⁻²) are different. Equivalent to the Langevin equation governed by Equations (11) and (12) is the Fokker-Planck equation [22]

\[ \frac{\partial p}{\partial t} = -\frac{\partial}{\partial x}\left(F p\right) + D \frac{\partial^2 p}{\partial x^2}. \quad (13) \]

As an initial PDF, we consider a Gaussian PDF

\[ p(x, 0) = \sqrt{\frac{\beta_0}{\pi}}\, e^{-\beta_0 (x - y_0)^2}. \quad (14) \]

Entropy 2018, 20, 574

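The Langevin dynamics of Equations (11) and (12) can also be simulated directly. The following sketch (Python, with illustrative parameter values of our own choosing) uses the Euler-Maruyama scheme, where the noise amplitude √(2D dt) per step follows from the correlation in Equation (12):

```python
import numpy as np

rng = np.random.default_rng(0)

def euler_maruyama(F, x0, D, dt, n_steps):
    """Simulate dx/dt = F(x) + xi, with <xi(t) xi(t')> = 2 D delta(t - t'),
    for an ensemble of trajectories (Euler-Maruyama scheme)."""
    x = np.array(x0, dtype=float)
    noise_amp = np.sqrt(2.0 * D * dt)
    for _ in range(n_steps):
        x = x + F(x) * dt + noise_amp * rng.standard_normal(x.shape)
    return x

# O-U process F = -gamma * x: the stationary variance should approach D / gamma.
gamma, D = 1.0, 0.5
x = euler_maruyama(lambda x: -gamma * x, np.zeros(100_000), D, 1e-2, 1000)
print(x.var())  # close to D / gamma = 0.5
```

Evolving an ensemble (rather than a single orbit) gives direct access to the time-dependent PDF, from which τ(t) and L(t) can be estimated by binning.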
Then, for the O-U process, the PDF remains Gaussian for all time with the following form [15,16]:

\[ p(x, t) = \sqrt{\frac{\beta(t)}{\pi}}\, e^{-\beta(t) (x - \langle x \rangle)^2}. \quad (15) \]

In Equations (14) and (15), ⟨x⟩ = y₀ e^{−γt} is the mean position and y₀ is its initial value; β₀ is the inverse temperature at t = 0, which is related to the variance at t = 0 as ⟨(x − ⟨x⟩)²⟩ = 1/(2β₀), while β(t) evolves as

\[ \frac{1}{2\beta(t)} = \frac{e^{-2\gamma t}}{2\beta_0} + \frac{D}{\gamma}\left(1 - e^{-2\gamma t}\right). \quad (16) \]

Note that, when D = D₀ (where D₀ ≡ γ/(2β₀) sets the initial width), β(t) = β₀ = γ/(2D) for all t, so the PDF maintains the same width for all t. For this Gaussian process, β and ⟨x⟩ constitute a parameter space on which the distance is defined with the Fisher metric tensor [3] g_ij (i, j = 1, 2) as [16]

\[ ds^2 = \sum_{i,j} g_{ij}\, dz_i\, dz_j = \frac{d\beta^2}{2\beta^2} + 2\beta\, d\langle x \rangle^2, \quad (17) \]

where z₁ = β, z₂ = ⟨x⟩. This enables us to recast 1/τ² in Equation (1) in terms of g_ij as

\[ \frac{1}{\tau^2} = \sum_{i,j} g_{ij}\, \frac{dz_i}{dt} \frac{dz_j}{dt} = \frac{1}{2\beta^2}\left(\frac{d\beta}{dt}\right)^2 + 2\beta \left(\frac{d\langle x \rangle}{dt}\right)^2. \quad (18) \]

The derivation of the first relation in Equation (18) is provided in Appendix A (see Equation (A2)). Using Equations (2) and (18), we can calculate L analytically for this O-U process (see also Appendix A).
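Since the O-U process stays Gaussian, L(t) can be evaluated by quadrature of dt/τ using Equation (18). A minimal sketch (Python; parameter values are chosen only for illustration) is:

```python
import numpy as np

def ou_information_length(y0, gamma, D, D0, t_max=50.0, n=200_000):
    """Integrate dt/tau for the O-U process using the Gaussian parameters:
    1/tau^2 = (dbeta/dt)^2 / (2 beta^2) + 2 beta (d<x>/dt)^2  (Eq. (18))."""
    t = np.linspace(0.0, t_max, n)
    beta0 = gamma / (2.0 * D0)                    # initial inverse temperature
    mean = y0 * np.exp(-gamma * t)                # <x>(t)
    var = (np.exp(-2 * gamma * t) / (2 * beta0)
           + (D / gamma) * (1.0 - np.exp(-2 * gamma * t)))
    beta = 1.0 / (2.0 * var)
    inv_tau = np.sqrt(np.gradient(beta, t) ** 2 / (2 * beta**2)
                      + 2 * beta * np.gradient(mean, t) ** 2)
    # trapezoidal rule for L = int_0^t dt' / tau(t')
    return float(np.sum(0.5 * (inv_tau[1:] + inv_tau[:-1]) * np.diff(t)))

# When D = D0 the width stays fixed, and L_inf = y0 / sqrt(D / gamma):
L = ou_information_length(y0=2.0, gamma=1.0, D=0.5, D0=0.5)
print(L, 2.0 / np.sqrt(0.5))  # the two values should agree closely
```

For D = D₀ the metric contribution from β vanishes, and L reduces to the displacement of the mean measured in units of the fixed PDF width, reproducing the linear scaling with y₀ discussed below.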
In comparison, theoretical analysis can be done only in limiting cases, such as small and large times, for the cubic process [17,24]. In particular, the stationary PDF for large time is readily obtained as

\[ p(x) = \frac{2\beta_c^{1/4}}{\Gamma(1/4)}\, e^{-\beta_c x^4}, \qquad \beta_c = \frac{\mu}{4D}. \]

For the exact calculation of L(t), Equation (13) is to be solved numerically. To summarise, due to the restoring force F, the equilibrium is given by a PDF around x = 0: Gaussian for the linear force and quartic exponential for the cubic force. If we pick any point in x, say y₀, we are interested in how close y₀ is to the equilibrium and how F(x) affects this. To determine it, we make a narrow PDF around x = y₀ (see Figure 1) at t = 0 and measure L∞. The question is how this L∞ depends on y₀. We repeat the same procedure for the cubic process, as shown in Figure 1, and examine how L∞ depends on y₀. L∞ as a function of y₀ is shown for both linear (red dotted line) and cubic (blue solid line) processes in Figure 2. In the linear case, we can see a clear linear relation between y₀ and L∞, meaning that the information length preserves the linearity of the system. This linear relationship holds for all D and D₀. In particular, when D = D₀, we can show that L∞ = y₀/√(D/γ) by taking the limit t → ∞ (⟨x⟩ → 0) in Equation (A10). In contrast, for the cubic process, the relation is not linear, and the log-log plot on the right in Figure 2 shows a power-law dependence with power-law index p. This index varies between 1.52 and 1.91, depending on the width (∝ D₀^{1/2}) of the initial PDF and the stochastic forcing amplitude D, as shown in [16]. This indicates that a nonlinear force breaks the linear scaling of the geometric structure and changes it to power-law scalings. In either case, L∞ varies smoothly with y₀, with its minimum value at y₀ = 0, since the equilibrium point 0 is stable. This will be compared with the behaviour of chaotic systems in Section 3.2.

Chaotic Attractor
Section 3.1 demonstrates that the minimum value of L∞ occurs at a stable equilibrium point [15,16]. We now show that, in contrast, in the case of a chaotic attractor, the minimum value of L∞ occurs at an unstable point [13]. To this end, we consider a chaotic attractor using a logistic map [13]. The latter is simply given by a rule for updating the value x at t + 1 from its previous value at t as follows [25]:

\[ x_{t+1} = 1 - a\, x_t^2, \]

where x ∈ [−1, 1] and a is a parameter which controls the stability of the system. As we are interested in a chaotic attractor, we chose the value a = 2 so that any initial value x₀ evolves to a chaotic attractor given by an invariant density (shown in the right panel of Figure 3). A key question is then whether all values of x₀ are similar, as they all evolve to the same invariant density in the long-time limit. To address how close a particular point x₀ is to equilibrium, we (i) consider a narrow initial PDF around x₀ at t = 0, (ii) evolve it until it reaches the equilibrium distribution, (iii) measure L∞ between the initial and final PDF, and (iv) repeat steps (i)-(iii) for many different values of x₀. For example, for x₀ = 0.7, the initial PDF is shown on the left and the final PDF on the right in Figure 3. We show L∞ against x₀ in Figure 4. A striking feature of Figure 4 is an abrupt change in L∞ for a small change in x₀. This means that the distance between x₀ and the final chaotic attractor depends sensitively on x₀. This sensitive dependence of L∞ on x(t = 0) means that a small change in the initial condition x₀ causes a large difference in the path that a system evolves through, and thus in L∞. This is a good illustration of a chaotic equilibrium and is quite similar to the sensitive dependence of the Lyapunov exponent on the initial condition [25]. That is, our L∞ provides a new methodology to map out the attractor structure. Note, however, that for almost the entire evolution of the system we have zero-probability states, p(x, t) = 0 while p(x, t′) ≠ 0.
This means that our set representation of the evolution is vital to avoid unphysical infinite lengths. We recall that the logistic map above describes the position of an orbit x_{t+1} at time t + 1 as a function of its position x_t at the earlier time t; a is the control parameter, which is taken to be 2 for simulating a chaotic region.
The stationary density for a = 2 is given by

\[ p_0(x) = \frac{1}{\pi \sqrt{1 - x^2}}. \]

In this chaotic region, the map has the two unstable fixed points x = −1 and x = 1/2, which turn out to play an interesting role in L(t), as shown later.
A key question of our interest is how an initial state far from equilibrium approaches p₀(x) in probability space in terms of L(t). For instance, is there any unique property of L(t) that can be identified for all evolutions starting from different initial conditions? To answer this question, we perform numerical simulations of the logistic map starting from an initial PDF which is strongly localised at x = x₀, approximated by a delta function. For each simulation using a different initial x₀, the domain [−1, 1] is broken into M bins, each of width 2/M. The number of bins used is a free parameter; after all, "There is no law of nature that defines the coarse grains" [17]. Here, we have fixed the number so as to make each simulation comparable. p(x, t) thus represents the probability of finding an orbit in bin x at time t.
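A minimal version of this procedure can be sketched as follows (Python; the ensemble size, bin count, and iteration count are our own illustrative choices, and the √(2 D_KL) form is applied only on overlapping bins, side-stepping the zero-probability subtlety noted above rather than using the set-theoretic treatment of [13]):

```python
import numpy as np

def logistic_L(x0, a=2.0, n_bins=2000, n_points=200_000, n_steps=50, seed=1):
    """Evolve an ensemble under x_{t+1} = 1 - a x_t^2, bin it into n_bins
    over [-1, 1], and accumulate L as a sum of sqrt(2 D_KL) between binned
    PDFs at consecutive steps (bins where either PDF vanishes are skipped)."""
    rng = np.random.default_rng(seed)
    width = 1.0 / n_bins                  # narrow initial PDF around x0
    x = x0 + width * (rng.random(n_points) - 0.5)
    edges = np.linspace(-1.0, 1.0, n_bins + 1)
    p_old = np.histogram(x, bins=edges)[0] / n_points
    L = 0.0
    for _ in range(n_steps):
        x = 1.0 - a * x * x               # one iteration of the map
        p_new = np.histogram(x, bins=edges)[0] / n_points
        m = (p_old > 0) & (p_new > 0)
        dkl = np.sum(p_new[m] * np.log(p_new[m] / p_old[m]))
        L += np.sqrt(max(2.0 * dkl, 0.0))
        p_old = p_new
    return L

# Sensitive dependence: nearby x0 values can give very different L values.
La, Lb = logistic_L(0.70), logistic_L(0.71)
print(La, Lb)
```

Scanning x₀ over [−1, 1] with this routine reproduces the kind of abrupt, initial-condition-sensitive variation of L∞ described in the text.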
Using a random initial distribution of M = 9 × 10⁷ points, centred at x₀ = −0.553, we first check the validity of approximating Eq. (5) with Eq. (10). Interestingly, Fig. 1 shows that L given in Eq. (5), plotted as the solid black line with solid dots, agrees very well for most of the evolution with L given by Eq. (10), shown by the line with circles. It is seen from Fig. 1 that, initially, the PDFs never overlap at two consecutive times, occupying only set Q_p. For 12 < t ≤ 15, the PDFs overlap and change rather rapidly but still do not fill the whole state space. In this regime, the approximation of the derivative seems to give errors.

[Fig. 4 caption: all of these initial conditions reach fixed points in 5 iterations or less.]

Since L represents the statistical distance between the initial PDFs and the final, invariant density, this means that the unstable fixed points are what most efficiently converts available work into wasted work such as heat. Phrased another way, the fixed points reduce the information of the PDF, bringing each PDF nearer to the invariant density, which is the distribution with the highest disorder [9]. If one wished to prolong the distance to a stationary distribution, or conversely to find the shortest path to said distribution, one simply finds the path avoiding or passing through these fixed points.

[Fig. 5 caption: the evolution of L starting from x₀ = 0.7071 breaks into four main phases: for 0 < t ≤ 4, all x ∈ Q_p; for 4 < t ≤ 7, all orbits are near the x = −1 fixed point, though the operator that would map the orbits is reducible; for 7 < t ≤ 16, L(t) < 1 as the PDFs overlap and change; for t > 16, the system settles into p₀(x).]

Conclusion
In this paper, we have investigated both theoretically and numerically the information length using our set-theoretic approach. We have shown that dL/dt > 0 is guaranteed for systems out of equilibrium as long as the system is evolved under the evolution operator.
One of the sets that contribute to L is Q_p, which accounts for probability not being used in the system's evolution. By conservation of probability, when one PDF does not intersect the PDF at the next time step, only set Q_p is occupied, meaning the system has no correlation with itself. Once the system's PDFs start intersecting at two successive times, we have non-zero Q_w, and the rate of change of L decreases in time. This is because the available work attributed to the system (measured with DS) is reduced through the consumption of available probability in Q_p. The logistic map was used to illustrate our results. An interesting result of this simulation is that the system almost always follows the minimum path. The only case that appears to deviate from this is when the system evolves from a non-stationary distribution that fills the entire state space to the invariant distribution. We also showed the unstable fixed points to be the most efficient areas of the map to convert a non-equilibrium distribution into the invariant one for the logistic map. This curious result may warrant further investigation as to the scope of its generality in other maps. Future work will also include a more detailed investigation of the total change in L and the structure of attractors.

Music: Can We See the Music?
Our methodology is not system-specific and applicable to any stochastic processes. In particular, given any time-dependent PDFs that are computed from a theory, simulations or from data, we can compute L(t) to understand information change. As an example, we apply our theory to music data and discuss information change associated with different pieces of classical music. In particular, we are interested in understanding differences among famous classical music in view of information change.
To gain an insight, we used the MIDI file [26], computed time-dependent PDFs and the information length as a function of time [14].
Specifically, the MIDI file stores music as MIDI numbers according to 12 different musical notes (C, C#, D, D#, E, F, F#, G, G#, A, A#, B) and 11 different octaves, with the typical time Δt between two adjacent notes of order Δt ∼ 10⁻³ s. In order to construct a PDF, we specify 129 statistically different states according to the MIDI number and one extra rest state (see Table 1 in [14]), and calculate an instantaneous PDF (see Figure S1 in [14]) from an orchestral piece by measuring the frequency (the total number of times) with which a particular state is played by all instruments at a given time. Thus, the time-dependent PDFs are defined in discrete time steps with Δt ∼ 10⁻³ s, and the discrete version of L (Equation (7) in [14]) is used in the numerical computation. Figure 5 (reproduced from [14]) shows L(t) against time for Vivaldi's Summer, Mozart, Tchaikovsky's 1812 Overture, and Beethoven's Ninth Symphony 2nd movement. We observe differences among the composers; in particular, the more classical the music, the more subtle the change in information. We then look at the rate of information change against time for the different pieces by calculating the gradient of L (dL/dt = 1/τ) in Figure 6, which also manifests the most subtle change in information length for Vivaldi and Mozart.
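Schematically, this computation can be sketched as follows (Python; the event format, frame construction, and the use of √(2 D_KL) on overlapping states are illustrative assumptions, not the exact discrete formula of [14], and the events here are synthetic rather than parsed from a real MIDI file):

```python
import numpy as np

N_STATES = 130  # 129 MIDI note states plus 1 rest state, as in the text

def discrete_L(pdf_sequence):
    """Discrete information length: L = sum_t sqrt(2 D_KL(p_t, p_{t+1}))
    over successive instantaneous PDFs on the N_STATES states."""
    L = 0.0
    for p1, p2 in zip(pdf_sequence[:-1], pdf_sequence[1:]):
        m = (p1 > 0) & (p2 > 0)
        dkl = float(np.sum(p2[m] * np.log(p2[m] / p1[m])))
        L += np.sqrt(max(2.0 * dkl, 0.0))
    return L

def pdfs_from_events(events, n_frames):
    """events: list of (frame_index, state) note events, e.g. parsed from a
    MIDI file; returns one normalised histogram over states per frame."""
    counts = np.zeros((n_frames, N_STATES))
    for frame, state in events:
        counts[frame, state] += 1.0
    counts += 1e-12                       # avoid empty frames
    return counts / counts.sum(axis=1, keepdims=True)

# Toy usage with synthetic events (a real analysis would parse a MIDI file):
rng = np.random.default_rng(0)
events = [(t, int(rng.integers(0, N_STATES)))
          for t in range(100) for _ in range(8)]
pdfs = pdfs_from_events(events, 100)
L_music = discrete_L(pdfs)
print(L_music)
```

Pieces whose note-state histograms change gently from frame to frame accumulate L slowly, which is the sense in which "more subtle" music shows smaller information change.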

Quantum Systems
Finally, we examine quantum effects on the information length [21]. In Quantum Mechanics (QM), the uncertainty relation ΔxΔP ≥ ℏ/2 between position x and momentum P gives us an effect quite similar to a stochastic noise. We note here that we are using P to denote the momentum to distinguish it from a PDF (p(x, t)). For instance, the trajectory of a particle in the x − P phase space is random and not smooth. Furthermore, the phase volume h plays the role of resolution in the phase space, with one unit of information given by the phase volume h. Thus, the total number of states is given by the total phase volume divided by h. This observation points out a potentially different role of the width of the PDF in QM in comparison with the classical system, since a wider PDF in QM occupies a larger region of x in the phase space, with the possibility of increasing the information.
To investigate this, for simplicity, we consider a particle of mass m under a constant force F and assume an initial Gaussian wave function around x = 0 [21]:

\[ \psi(x, 0) = \left(\frac{2\beta_0}{\pi}\right)^{1/4} e^{-\beta_0 x^2 + i k_0 x}, \]

where k₀ = P₀/ℏ is the wave number at t = 0, D_x = (2β₀)^{−1/2} is the width of the initial wave function, and P₀ is the initial momentum. A time-dependent PDF p(x, t) = |ψ(x, t)|² is then found as (e.g., see [21,27]):

\[ p(x, t) = \sqrt{\frac{\beta(t)}{\pi}}\, e^{-\beta(t)(x - \langle x \rangle)^2}. \quad (22) \]

Equation (22) clearly shows that the PDF is Gaussian, with the mean ⟨x⟩ = ℏk₀t/m + Ft²/(2m) and the variance

\[ \mathrm{Var}(t) = \frac{1}{2\beta(t)} = \frac{1}{4\beta_0} + \frac{\hbar^2 \beta_0 t^2}{m^2}. \quad (24) \]

In Equation (24), Var(0) = ⟨(x(0) − ⟨x(0)⟩)²⟩ = 1/(4β₀) = D_x²/2 is the initial variance. We note that the last term in Equation (24) increases quadratically with time t due to the quantum effect, the width of the wave function becoming larger over time. Obviously, this effect vanishes as ℏ → 0.
Since the PDF in Equation (22) is Gaussian, we can use Equation (18) to find (e.g., see [16])

\[ \frac{1}{\tau^2} = \frac{2t^2}{T^4 \left[1 + (t/T)^2\right]^2} + \frac{4\beta_0 (\hbar k_0 + F t)^2}{m^2 \left[1 + (t/T)^2\right]}, \quad (25) \]

where T = m/(2ℏβ₀) is the time scale of the broadening of the initial wave function [21]. It is interesting to note that, when there is no external constant force F, the two terms in Equation (25) decrease for large time t, making τ large. The situation changes dramatically in the presence of F in Equation (25), as the second term approaches a constant value for large time. A region with the same value of τ signifies that the rate of change of information is constant in time, and was argued to be an optimal path that minimises the irreversible dissipation (e.g., [16]). Physically, this geodesic arises when the broadening of the PDF is compensated by the momentum Ft, which increases with time. Mathematically, the limit t → ∞ reduces Equation (25), and thus L, to

\[ \frac{1}{\tau} \to \frac{\sqrt{2}\, F D_x}{\hbar}, \qquad L(t) \to \frac{\sqrt{2}\, F D_x t}{\hbar}. \quad (26) \]

Since Ft = P and D_x = (2β₀)^{−1/2} is the width of the wave function at t = 0, FtD_x in Equation (26) represents the volume in the P − x phase space spanned by this wave function. This reflects the information change associated with the coverage of a phase volume ℏ. Interestingly, similar results are also obtained in the momentum representation, where L is computed from the PDF p(P, t) in momentum space:

\[ p(P, t) = \sqrt{\frac{\lambda}{\pi}}\, e^{-\lambda \left(P - (m v_0 + F t)\right)^2}, \qquad \frac{1}{\tau^2} = 2\lambda F^2, \quad (27) \]

where λ = 1/(2ℏ²β₀). In Equation (27), τ is obviously constant, and L linearly increases with time t.
We can see an even stronger similarity between Equations (27) and (26) as t → ∞ by using L ∝ √(2λ)Ft ∼ (Ft)D_x/ℏ. In view of the complementary relation between position and momentum in quantum systems, the similar result for L in momentum and position space highlights the robustness of the geodesic.
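The approach of τ to a constant can be checked directly from Equation (25); a small sketch (Python, in illustrative units where ℏ = m = 1, with β(t) = 1/(2 Var(t)) = 2β₀/[1 + (t/T)²] following from the variance in Equation (24)) is:

```python
import numpy as np

hbar, m, F, beta0, k0 = 1.0, 1.0, 2.0, 0.5, 0.0   # illustrative units
T = m / (2.0 * hbar * beta0)          # broadening time scale
Dx = (2.0 * beta0) ** -0.5            # initial width of the wave function

def inv_tau(t):
    """1/tau from Eq. (18) applied to the Gaussian PDF of Eq. (22):
    beta(t) = 2 beta0 / (1 + (t/T)^2), d<x>/dt = (hbar k0 + F t)/m."""
    s = 1.0 + (t / T) ** 2
    beta = 2.0 * beta0 / s
    dbeta_over_beta = -2.0 * t / (T**2 * s)       # (dbeta/dt)/beta
    dmean = (hbar * k0 + F * t) / m
    return np.sqrt(0.5 * dbeta_over_beta**2 + 2.0 * beta * dmean**2)

# With F != 0, 1/tau tends to the constant sqrt(2) F Dx / hbar at large t,
# so L grows linearly in time (constant rate of information change).
print(inv_tau(1e3), np.sqrt(2.0) * F * Dx / hbar)
```

At large t, the first (broadening) term of Equation (25) decays while the second saturates, so the printed values converge: the geodesic behaviour described above.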

Conclusions
We investigated the information geometry associated with stochastic processes in classical and quantum systems. Specifically, we introduced τ(t) as a dynamical time scale quantifying information change and calculated L(t) by measuring the total clock time t in units of τ. As a unique Lagrangian measure of information change, L∞ was demonstrated to be a novel diagnostic for mapping out an attractor structure. In particular, L∞ was shown to capture the effect of different deterministic forces through the scaling of L∞ against the peak position of a narrow initial PDF. For a stable equilibrium, the minimum value of L∞ occurs at the equilibrium point. In comparison, in the case of a chaotic attractor, L∞ exhibits a sensitive dependence on initial conditions, like a Lyapunov exponent. We then showed the application of our method to characterising the information change associated with classical music (e.g., see [14]). Finally, we elucidated the effect of the width of a PDF on the information length in quantum systems. Extension of this work to impure (mixed-state) quantum systems and investigation of Riemannian geometry on the space of density operators would be of particular interest for future work.
Funding: This research received no external funding.