Information Geometry, Fluctuations, Non-Equilibrium Thermodynamics, and Geodesics in Complex Systems

Information theory provides an interdisciplinary method to understand important phenomena in many research fields ranging from astrophysical and laboratory fluids/plasmas to biological systems. In particular, information geometric theory enables us to envision the evolution of non-equilibrium processes in terms of a (dimensionless) distance by quantifying how information unfolds over time as a probability density function (PDF) evolves in time. Here, we discuss some recent developments in information geometric theory focusing on time-dependent dynamic aspects of non-equilibrium processes (e.g., time-varying mean value, time-varying variance, or temperature, etc.) and their thermodynamic and physical/biological implications. We compare different distances between two given PDFs and highlight the importance of a path-dependent distance for a time-dependent PDF. We then discuss the role of the information rate Γ = dL/dt and relative entropy in non-equilibrium thermodynamic relations (entropy production rate, heat flux, dissipated work, non-equilibrium free energy, etc.), and various inequalities among them. Here, L is the information length representing the total number of statistically distinguishable states a PDF evolves through over time. We explore the implications of a geodesic solution in information geometry for self-organization and control.


Introduction
Information geometry refers to the application of the techniques of differential geometry to probability and statistics. Specifically, it uses differential geometry to define a metric tensor that endows the statistical space (consisting of probabilities) with a notion of distance. While seemingly abstract, this permits us to measure quantitative differences among different probabilities. It then makes it possible to link a stochastic process, complexity, and geometry, which is particularly useful for classifying the growing volume of data from different research areas (e.g., from astrophysical and laboratory systems to biosystems). Furthermore, it can be used to obtain desired outcomes [6][7][8][9][10][15] or to understand statistical complexity [4].
For instance, the Wasserstein metric [6][7][8][9][10] has been widely used in the optimal transport problem, where the main interest is to minimize a transport cost that is a quadratic function of the distance between two locations. In this setting, the Fokker-Planck equation arises as the gradient flow that minimizes an entropy/energy functional [7]. For Gaussian distributions, the Wasserstein metric space consists of physical distances: the Euclidean distance for the mean and a distance on positive symmetric matrices for the variance (e.g., see [8]).
In comparison, the Fisher (Fisher-Rao) information [32] can be used to define a dimensionless distance in statistical manifolds [33,34]. For instance, the statistical distance ds, representing the number of statistically distinguishable states, is given by [5,33]

$$(ds)^2 \equiv \sum_j \frac{(dp_j)^2}{p_j} = \sum_j p_j\, (d \ln p_j)^2 = \sum_{j,\alpha,\beta} p_j\, \frac{\partial \ln p_j}{\partial \lambda_\alpha}\, \frac{\partial \ln p_j}{\partial \lambda_\beta}\, d\lambda_\alpha\, d\lambda_\beta = \sum_{\alpha,\beta} d\lambda_\alpha\, g_{\alpha\beta}\, d\lambda_\beta. \qquad (1)$$

Here the Fisher information metric $g_{\alpha\beta} = \left\langle \frac{\partial \ln p_j}{\partial \lambda_\alpha} \frac{\partial \ln p_j}{\partial \lambda_\beta} \right\rangle = \sum_j p_j \frac{\partial \ln p_j}{\partial \lambda_\alpha} \frac{\partial \ln p_j}{\partial \lambda_\beta}$ provides a natural (Riemannian) distinguishability metric on the space of probability distributions; the λ_α's are the parameters of the probability p_j, and the angular brackets represent the ensemble average over p_j. Note that Equation (1) is written for a discrete probability p_j. For a continuous Probability Density Function (PDF) p(x) of a variable x, Equation (1) becomes $(ds)^2 = \int dx\, p(x)\, [d \ln p(x)]^2 = \sum_{\alpha,\beta} d\lambda_\alpha\, g_{\alpha\beta}\, d\lambda_\beta$, where $g_{\alpha\beta} = \int dx\, p(x)\, \frac{\partial \ln p(x)}{\partial \lambda_\alpha}\, \frac{\partial \ln p(x)}{\partial \lambda_\beta}$. For Gaussian processes, the Fisher metric is inversely proportional to the covariance matrix of the fluctuations in the system. Thus, in thermodynamic equilibrium, strong fluctuations lead to a strong correlation and a shorter distance between neighboring states [34,35]. Alternatively, fluctuations determine the uncertainty in measurements, providing the resolution (the distance unit) that normalizes the distance between different thermodynamic states.
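To make Equation (1) concrete, the short sketch below (our own illustration, not from the original work; the grid and parameter values are arbitrary) evaluates the Fisher metric g_αβ of a one-dimensional Gaussian with parameters λ = (μ, σ) by direct numerical integration and compares it with the closed form diag(1/σ², 2/σ²), illustrating the inverse scaling with the fluctuation level discussed above.

```python
import numpy as np

def gaussian(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def fisher_metric(mu, sigma, h=1e-5):
    """g_ab = int dx p (d ln p/d lam_a)(d ln p/d lam_b) for lam = (mu, sigma)."""
    x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 20001)
    dx = x[1] - x[0]
    p = gaussian(x, mu, sigma)
    # Central finite differences of ln p with respect to each parameter.
    d_mu = (np.log(gaussian(x, mu + h, sigma)) - np.log(gaussian(x, mu - h, sigma))) / (2 * h)
    d_sg = (np.log(gaussian(x, mu, sigma + h)) - np.log(gaussian(x, mu, sigma - h))) / (2 * h)
    grads = (d_mu, d_sg)
    return np.array([[np.sum(p * ga * gb) * dx for gb in grads] for ga in grads])

print(fisher_metric(1.0, 0.5))      # ~ [[4, 0], [0, 8]]
print(1 / 0.5 ** 2, 2 / 0.5 ** 2)   # closed form: 1/sigma^2 = 4, 2/sigma^2 = 8
```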
To appreciate the meaning of a fluctuation-based metric, let us consider the (equilibrium) Maxwell-Boltzmann distribution

$$p(E_j) = \beta\, e^{-\beta E_j} \qquad (2)$$

for the energy state E_j. Here, β = 1/k_B T is the inverse temperature, k_B is the Boltzmann constant, and T is the temperature of the heat bath. In Equation (2), the thermal energy k_B T = ⟨E⟩ of the heat bath (the width/uncertainty of the probability) provides the resolution with which to differentiate states ΔE = E_i − E_j. The smaller the resolution (temperature), the more distinguishable states (the more accessible information in the system) there are. This agrees with the expectation that the PDF gradient (the Fisher information) increases with information [32]. This concept has been generalized to non-equilibrium systems [36][37][38][39][40][41][42][43], including its utilization for controlling systems to minimize entropy production [38,40,42], the measurement of the statistical distance in experiments to validate theoretical predictions [41], etc. However, some of these works rely on the equilibrium distribution in Equation (2), which is valid only in or near equilibrium, while many important phenomena in nature and laboratories are often far from equilibrium with strong fluctuations, variability, heterogeneity, or stochasticity [44][45][46][47][48][49][50][51][52]. Far from equilibrium, there is no (infinite-capacity) heat bath that can maintain the system at a certain temperature or constant fluctuation level; one of the important questions far from equilibrium is precisely to understand how the fluctuation level β(t)^{-1} changes with time. Furthermore, PDFs no longer follow the Maxwell-Boltzmann or Gaussian distributions and can involve contributions from (rare) large-amplitude fluctuation events [53][54][55][56][57][58][59][60][61][62]. Therefore, full knowledge of time-varying PDFs and the application of information geometry to such PDFs have become of considerable interest.
The paper aims to discuss some recent developments in the information geometric theory of non-equilibrium processes. Since this would undoubtedly span a broad range of topics, this paper is necessarily selective and focuses on elucidating the dynamic aspects of non-equilibrium processes and their thermodynamic and physical/biological implications. Throughout the paper, we highlight that time-varying measures (especially the variance) introduce extra complications into various relations, in particular between the information geometric measures and the entropy production rate. We endeavor to make this paper self-contained (e.g., by including the derivations of some well-known results) wherever possible.
The remainder of this paper is organized as follows. Section 2 discusses different distances between two PDFs and the generalization for a time-dependent non-equilibrium PDF. Section 3 compares the distances from Section 2. Section 4 discusses key thermodynamic relations that are useful for non-equilibrium processes. Section 5 establishes relations between information geometric quantities (in Section 2) and thermodynamics (in Section 4). In Section 6, we discuss the concept of a geodesic in information geometry and its implications for self-organization or designing optimal protocols for control. Conclusions are provided in Section 7.

Distances/Metrics
This section discusses the distance defined between two probabilities (Section 2.1) and along the evolution path of a time-dependent probability (Section 2.2). Examples and comparisons of these distances are provided in Section 3.1. For illustration, we use a PDF p(x,t) of a stochastic variable x and the differential entropy $S = -\int dx\, p(x,t) \ln p(x,t)$, in units where k_B = 1.

Distance Between Two PDFs
We consider the distance between two PDFs p_1 = p(x, t_1) and p_2 = p(x, t_2) of a stochastic variable x at two times t_1 and t_2, respectively, where in general t_1 = t_2 or t_1 ≠ t_2.

Wootters' Distance
The Wootters' distance [5,33] is defined in quantum mechanics as the shortest distance between two PDFs p_1 and p_2 that have the wave functions ψ_1 and ψ_2 (p_1 = |ψ_1|² and p_2 = |ψ_2|²), respectively. Specifically, for given p_1 and p_2, the distance s(p_1, p_2) can be parameterized by infinitely many different paths between p_1 and p_2. Letting z be the affine parameter of a path, we have

$$s(p_1, p_2) = \int_{z_1}^{z_2} \frac{ds}{dz}\, dz, \qquad (3)$$

where ds is given in Equation (1). Among all possible paths, the minimum of s(p_1, p_2) is obtained for a particular path that optimizes the quantum distinguishability; the (Hilbert-space) angle between the two wave functions provides this minimum distance as

$$s(p_1, p_2) = \cos^{-1}\left( \sum_j \sqrt{p_{1j}\, p_{2j}} \right). \qquad (4)$$

Equation (4) is for a pure state and has been generalized to mixed states (e.g., see [37,102] and references therein). Note that the Wootters' distance is related to the Hellinger distance [43].
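As a small numerical illustration (ours, using the continuous analogue of Equation (4) with the sum replaced by an integral), the Wootters distance between two Gaussians is the angle arccos of their overlap integral ∫dx √(p_1 p_2); for two unit-variance Gaussians this overlap is exp(−(μ_1 − μ_2)²/8), which the sketch below reproduces.

```python
import numpy as np

def gaussian(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-20, 20, 40001)
dx = x[1] - x[0]
p1 = gaussian(x, 0.0, 1.0)
p2 = gaussian(x, 2.0, 1.0)

overlap = np.sum(np.sqrt(p1 * p2)) * dx     # Bhattacharyya coefficient, <= 1
s = np.arccos(np.clip(overlap, -1.0, 1.0))  # Hilbert-space angle, Equation (4)
print(s, np.arccos(np.exp(-2.0 ** 2 / 8)))  # numerical vs closed form for equal widths
```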

Kullback-Leibler (K-L) Divergence/Relative Entropy
The Kullback-Leibler (K-L) divergence between two PDFs [83], also called the relative entropy, is defined by

$$K[p_1 | p_2] = \int dx\, p_1(x)\, \ln \frac{p_1(x)}{p_2(x)}. \qquad (5)$$

The relative entropy quantifies the difference between a PDF p_1 and another PDF p_2. It takes its minimum value of zero for two identical PDFs p_1 = p_2 and becomes large as p_1 and p_2 become more different. However, as defined in Equation (5), it is not symmetric between p_1 and p_2 and does not satisfy the triangle inequality. It is thus not a metric in a strict sense.

Jensen Divergence
The Jensen divergence (also called the Jensen distance) is a symmetrized version of the Kullback-Leibler divergence, defined by

$$J(p_1|p_2) = \frac{1}{2}\left( K\!\left[p_1 \,\middle|\, \frac{p_1+p_2}{2}\right] + K\!\left[p_2 \,\middle|\, \frac{p_1+p_2}{2}\right] \right). \qquad (6)$$

While the square root of the Jensen-Shannon divergence J(p_1|p_2) is a metric [4,103], J(p_1|p_2) itself has also been used in examining statistical complexity (e.g., see [43,104,105]).

Euclidean Norm
In the analysis of big data, the Euclidean norm [5,106] is often used, defined by

$$d_E(p_1, p_2) = \left[ \sum_j \left( p_{1j} - p_{2j} \right)^2 \right]^{1/2}. \qquad (7)$$

While Equation (7) has a direct analogy to the physical distance, it has limitations in measuring statistical complexity because it neglects the stochastic nature of the underlying process [5]. For instance, the Wootters' distance in Equation (4) was shown to capture the complexity of the logistic map better than Equation (7) [5].
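The hedged sketch below (our own; the parameter choices are arbitrary, and the continuous analogues of Equations (5)-(7) are used) evaluates the three divergences for a pair of Gaussians, making explicit that the K-L divergence is asymmetric while the Jensen divergence is symmetric.

```python
import numpy as np

def gaussian(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def kl(p, q, dx):
    """K[p|q] = int dx p ln(p/q), Equation (5)."""
    return np.sum(p * np.log(p / q)) * dx

x = np.linspace(-20, 20, 40001)
dx = x[1] - x[0]
p1 = gaussian(x, 0.0, 1.0)
p2 = gaussian(x, 1.0, 2.0)
m = 0.5 * (p1 + p2)

print("K[p1|p2] =", kl(p1, p2, dx))   # asymmetric: differs from K[p2|p1]
print("K[p2|p1] =", kl(p2, p1, dx))
print("Jensen   =", 0.5 * (kl(p1, m, dx) + kl(p2, m, dx)))   # Equation (6)
print("Euclid   =", np.sqrt(np.sum((p1 - p2) ** 2) * dx))    # Equation (7)
```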

Distance along the Path
Equations (4)-(7) can be used to define the distance between two given PDFs p(x, t_1) and p(x, t_2) at times t_1 and t_2 (t_2 > t_1). However, p(x, t) at intermediate times t ∈ (t_1, t_2) can take an infinite number of different values depending on the exact path that a system takes between p(x, t_1) and p(x, t_2). One example is (i) p(x, t_1) = p(x, t_2) = p(x, t) for all t ∈ (t_1, t_2) and all x, in comparison with (ii) p(x, t_1) = p(x, t_2) but p(x, t) ≠ p(x, t_1) and p(x, t) ≠ p(x, t_2) at intermediate times. What is needed is a path-dependent distance that depends on the exact evolution and form of p(x, t) for t ∈ (t_1, t_2).
Such a path-dependent measure is constructed from the information rate. Specifically, we define

$$E(t) = \int dx\, \frac{[\partial_t p(x,t)]^2}{p(x,t)} = \int dx\, p(x,t)\, [\partial_t \ln p(x,t)]^2, \qquad (8)$$

$$\Gamma(t) = \sqrt{E(t)}. \qquad (9)$$

Note that E can be viewed as the Fisher information [32] if time is interpreted as a parameter (e.g., [97]); however, time in classical mechanics is a passive quantity that cannot be changed by external control. Γ is also called the entropy production rate in quantum mechanics [107]; however, as shown in Sections 4.1 and 4.4, the relation between Γ and the thermodynamic entropy production rate is more complicated (see Equation (28)).
Γ in Equation (9) is the information rate, representing how quickly new information is revealed as a PDF evolves in time; Γ^{-1} = τ is the characteristic time scale of this information change. To show that Γ is related to the smallest time scale of fluctuations [97], we assume that the λ_α's are the estimators (parameters) of p(x,t), so that Equation (1) gives

$$\Gamma^2(t) = \sum_{\alpha,\beta} \frac{d\lambda_\alpha}{dt}\, g_{\alpha\beta}\, \frac{d\lambda_\beta}{dt}, \qquad (10)$$

and use the Cramér-Rao bound on the Fisher information, $C_{\alpha\beta} \geq (g^{-1})_{\alpha\beta}$, where C_αβ ≡ ⟨δλ_α δλ_β⟩ is the covariance matrix (e.g., see [32]) and δλ_α = λ_α − ⟨λ_α⟩ denotes a fluctuation. For a diagonal metric g_αβ = g_α δ_αβ, Equation (10) is then bounded as

$$\Gamma^2 = \sum_\alpha g_\alpha \left( \frac{d\lambda_\alpha}{dt} \right)^2 \geq \sum_\alpha \frac{1}{\langle (\delta\lambda_\alpha)^2 \rangle} \left( \frac{d\lambda_\alpha}{dt} \right)^2. \qquad (11)$$

Equation (11) shows that the RMS fluctuation-normalized rate at which each parameter λ_α can change is bounded above by Γ. If there is only α = 1 (λ_α = λ δ_{α,1}), Equation (11) is further simplified to

$$\Gamma \geq \frac{1}{\sqrt{\langle (\delta\lambda)^2 \rangle}} \left| \frac{d\lambda}{dt} \right|, \qquad (12)$$

clearly showing that λ, normalized by its RMS fluctuation, cannot change faster than the information rate. Finally, it is worth highlighting that Equation (9) is general and can be used even when the parameters λ_α and the metric g_αβ in Equation (10) are unknown; this includes cases where PDFs are empirically inferred from experimental/observational data (readers are referred to Refs. [21,23,28] for examples). It is only in the special case where we have a complete set of parameters λ_α of a PDF that we can express Γ through the Fisher information as in Equation (10). For instance, a Gaussian p(x,t) is fully described by the mean value ⟨x⟩ and the variance 1/2β, so that (λ_1, λ_2) = (⟨x⟩, β).
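As a consistency check (our own sketch), the code below compares the direct definition of Γ in Equation (9) with the parametric form of Equation (10) for a Gaussian with prescribed mean y(t) and inverse temperature β(t); for this family the Fisher metric is g = diag(2β, 1/(2β²)) (see Appendix C), so that Γ² = 2β ẏ² + β̇²/(2β²). The test functions y(t) and β(t) are arbitrary smooth choices of ours.

```python
import numpy as np

def gaussian(x, y, beta):
    return np.sqrt(beta / np.pi) * np.exp(-beta * (x - y) ** 2)

# Arbitrary smooth test choices for the parameters and their exact rates.
y, ydot = lambda t: np.sin(t), lambda t: np.cos(t)
beta, betadot = lambda t: 1.0 + 0.5 * np.cos(t), lambda t: -0.5 * np.sin(t)

t, dt = 1.3, 1e-5
x = np.linspace(-15, 15, 30001)
dx = x[1] - x[0]

# Equation (9): Gamma^2 = int dx (d_t p)^2 / p, via central difference in t.
dpdt = (gaussian(x, y(t + dt), beta(t + dt)) - gaussian(x, y(t - dt), beta(t - dt))) / (2 * dt)
gamma2_direct = np.sum(dpdt ** 2 / gaussian(x, y(t), beta(t))) * dx

# Equation (10) with g = diag(2*beta, 1/(2*beta^2)) for (lambda1, lambda2) = (<x>, beta).
gamma2_param = 2 * beta(t) * ydot(t) ** 2 + betadot(t) ** 2 / (2 * beta(t) ** 2)
print(gamma2_direct, gamma2_param)   # the two agree
```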
The information length, measuring the total information change along the evolution path between times 0 and t, is then defined by

$$L(t) = \int_0^t dt_1\, \Gamma(t_1). \qquad (13)$$

[We note that different names (e.g., statistical length [108] or statistical distance [97]) have also been used for L.] It is important to note that unlike the Wootters' distance (the shortest distance among all possible paths between two PDFs) in Equation (3) (e.g., [5]), L(t) in Equation (13) is fixed for a given time-evolving PDF p(x,t).
By the definition in Equation (13), L(t = 0) = 0, and L(t) monotonically increases with time since Γ ≥ 0 (e.g., see Figure A2 in [22]). L(t) takes a constant value only in a stationary state (where ∂_t p = 0 and thus Γ = 0). One important consequence is that when p(x,t) relaxes into a stationary PDF in the long time limit t → ∞, Γ(t) → 0 and L(t) → L_∞, where L_∞ is a constant depending on the initial conditions and parameters. This property of L_∞ was used to understand the attractor structure in relaxation problems; specifically, Refs. [15,16,18,22,28] calculated L_∞ for different values of the mean position x_0 of an initial PDF and examined how L_∞ depends on x_0. Furthermore, Γ and L were shown to be useful for quantifying hysteresis in forward-backward processes [19], correlation and self-regulation among different players [23,25], and predicting the occurrence of sudden events [27] and phase transitions [23,25]. Some of these points are illustrated in Section 3.1.

Model and Comparison of Metrics
To discuss and compare the different metrics in Section 2 and the statistical measures in Section 4, we use the following Langevin model [109]:

$$\frac{dx}{dt} = f(x,t) + \xi, \qquad f(x,t) = -\frac{\partial V(x,t)}{\partial x}. \qquad (14)$$

Here, V(x,t) is, in general, a time-dependent potential which can include an internal potential and an external force; ξ is a short (delta-)correlated Gaussian noise with the following statistical property:

$$\langle \xi(t) \rangle = 0, \qquad \langle \xi(t)\, \xi(t') \rangle = 2D\, \delta(t - t'). \qquad (15)$$

Here, the angular brackets represent the ensemble average over the stochastic noise ξ, and D ≥ 0 is the amplitude of ξ. It is important to note that far from equilibrium, averages (e.g., ⟨x(t)⟩) are in general functions of time.
The exact PDF can be obtained for the Ornstein-Uhlenbeck (O-U) process, which has f = −γ(x − v(t)) in Equation (14). Here, v(t) is a deterministic function of time. Specifically, for the initial Gaussian PDF

$$p(x, 0) = \sqrt{\frac{\beta_0}{\pi}}\, e^{-\beta_0 (x - x_0)^2}, \qquad (16)$$

the time-dependent PDF remains Gaussian at all times:

$$p(x, t) = \sqrt{\frac{\beta(t)}{\pi}}\, e^{-\beta(t) (x - \langle x \rangle(t))^2}. \qquad (17)$$

In Equations (16) and (17),

$$\frac{1}{2\beta(t)} = \sigma^2(t) = \frac{D}{\gamma}\left(1 - e^{-2\gamma t}\right) + \frac{e^{-2\gamma t}}{2\beta_0}, \qquad (18)$$

$$\langle x \rangle(t) = x_0\, e^{-\gamma t} + \gamma\, e^{-\gamma t} \int_0^t dt_1\, e^{\gamma t_1}\, v(t_1). \qquad (19)$$

Here, β, σ, and σ² are the inverse temperature, standard deviation, and variance, respectively; β_0 and x_0 are the values of β and ⟨x⟩ at t = 0. Equation (18) shows that β(t) → γ/2D as t → ∞. Note that we use both β and σ here to clarify the connections to the previous works [15,17,22,[26][27][28].
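A minimal Euler-Maruyama check (our construction; all parameter values are arbitrary test choices) of the exact moments in Equations (18) and (19): for v(t) = ut, the integral in Equation (19) evaluates to u(t − (1 − e^{−γt})/γ).

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, D, u = 1.0, 0.5, 0.5          # O-U with v(t) = u*t
x0, beta0 = 2.0, 4.0                 # initial mean and inverse temperature
N, T, dt = 100_000, 3.0, 1e-3

x = x0 + rng.normal(0.0, np.sqrt(1 / (2 * beta0)), N)   # sample p(x, 0), Equation (16)
t = 0.0
while t < T:
    x += -gamma * (x - u * t) * dt + np.sqrt(2 * D * dt) * rng.normal(size=N)
    t += dt

mean_exact = x0 * np.exp(-gamma * T) + u * (T - (1 - np.exp(-gamma * T)) / gamma)
var_exact = D / gamma * (1 - np.exp(-2 * gamma * T)) + np.exp(-2 * gamma * T) / (2 * beta0)
print(x.mean(), mean_exact)   # Equation (19) for v(t) = u*t
print(x.var(), var_exact)     # Equation (18): sigma^2 = 1/(2*beta)
```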

Geometric Structure of Equilibrium/Attractors
To elucidate the main difference between the distances in Equations (4)-(6) and (13), we consider the relaxation problem by assuming v(t) = 0. In the following, we compare the distance between p(x, 0) and p(x, t → ∞) using Equations (4)-(6) and Equation (13). Analytical expressions for these distances are given in [22].
Each curve in Figure 1 shows how each distance depends on the initial mean position x_0. The four curves are for L_∞ (blue), the Wootters' distance (orange), the K-L relative entropy (green), and the Jensen divergence (red). The relative entropy and Jensen divergence exhibit similar behavior, the red and green curves being superimposed on each other. Of note is the linear relation between L_∞ and x_0 in Figure 1; such a linear relation is not seen in the other distances. This means that the information length is a unique measure that manifests a linear geometry around the equilibrium point of a linear Gaussian process [28,30]. Note that for a nonlinear force f, L_∞ has a power-law relation with x_0 for sufficiently large x_0 [18,28]. These behaviors contrast with that of a chaotic system [16,28], where L_∞ depends sensitively on the initial condition and changes abruptly with x_0. Thus, the information length provides a useful tool for geometrically understanding attractor structures in relaxation problems.
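The linear relation between L_∞ and x_0 can be reproduced in a few lines (our own sketch): using the exact Gaussian solution of Equations (17)-(19) with v = 0 and the parametric information rate Γ² = [(∂_t⟨x⟩)² + 2(∂_tσ)²]/σ² (Equation (27) below), L_∞ = ∫₀^∞ Γ dt grows linearly with x_0 (exactly so when β_0 = γ/2D, for which Γ = |∂_t⟨x⟩|/σ and L_∞ = √(2β_0) x_0).

```python
import numpy as np

gamma, D, beta0 = 1.0, 0.5, 4.0
t = np.linspace(0.0, 30.0, 300001)
dt = t[1] - t[0]

def L_infinity(x0):
    y = x0 * np.exp(-gamma * t)                     # Equation (19), v = 0
    ydot = -gamma * y
    s2 = D / gamma * (1 - np.exp(-2 * gamma * t)) + np.exp(-2 * gamma * t) / (2 * beta0)
    sigma = np.sqrt(s2)                             # Equation (18)
    sigmadot = np.gradient(sigma, t)
    Gamma = np.sqrt(ydot ** 2 + 2 * sigmadot ** 2) / sigma
    return np.sum(0.5 * (Gamma[1:] + Gamma[:-1])) * dt   # L_inf = int_0^inf Gamma dt

for x0 in (1.0, 2.0, 4.0, 8.0, 16.0):
    print(x0, L_infinity(x0), L_infinity(x0) / x0)  # ratio tends to a constant
```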

Correlation between Two Interacting Components
We next show that the information length is also useful in elucidating the correlation between two interacting species, such as two competing components relaxing to the same equilibrium in the long time limit. Specifically, the two interacting components with the time-dependent PDFs P_1(x,t) and P_2(x,t) are coupled through the Dichotomous noise [110,111] (see Appendix B) and relax into the same equilibrium Gaussian PDF P_1(x, t → ∞) = P_2(x, t → ∞) = (1/2) P(x, t → ∞) around x = 0 in the long time limit. Here, P(x,t) = P_1(x,t) + P_2(x,t) is the total PDF. For the case considered below, P(x,t) satisfies the O-U process (see Appendix B for details). We choose initial conditions where P_1(t = 0) = P_1(t → ∞) with zero initial mean value, while P_2(t = 0) takes an initial mean value x_0. These are demonstrated in the cartoon figure, Figure 2a,c.
Even though P_1 starts from the final equilibrium PDF, it evolves in time due to its coupling to P_2, and thus P_1(x,t) ≠ P_1(x, t = 0), as shown in Figure 2b. Consequently, L(t) calculated from P_1 monotonically increases to its asymptotic value L_∞ as it reaches equilibrium (see Figure A2 in [22] for the time evolution of L from P_1 and P_2). On the other hand, P_2, with an initial mean value x_0, undergoes a different time evolution (unless x_0 = 0) until it reaches equilibrium.
For the total P, a linear relation between L_∞ and x_0 is seen in Figure 3a (as in Figure 1). This linear relation is not seen in L_∞ calculated from either P_1 or P_2 in Figure 3b or Figure 3c; the non-monotonic dependence of L_∞ in Figure 3b,c is due to large fluctuations and the strong correlation between P_1 and P_2 during the time evolution for large x_0. What is quite remarkable is that, in contrast to the other distances, L_∞ calculated from P_1 and from P_2 in Figure 3b,c exhibits a very similar dependence on x_0. This means that despite the very different time evolutions of P_1 and P_2 (see Figure 2), they undergo a similar total change in information. These results suggest that strong coupling between two components can be inferred from their similar information lengths (see also [24,25]).

Thermodynamic Relations
To elucidate the utility of information geometric theory in understanding non-equilibrium thermodynamics, we review some of the important thermodynamic measures of irreversibility and dissipation [112] and relate them to the information geometric measures Γ and K [29]. For illustration, we use the model in Equations (14) and (15) unless stated otherwise. Corresponding to Equations (14) and (15) is the following Fokker-Planck equation [109]:

$$\frac{\partial p(x,t)}{\partial t} = -\frac{\partial J}{\partial x} = \frac{\partial}{\partial x}\left( -f\, p + D\, \frac{\partial p}{\partial x} \right), \qquad (20)$$

where J = f p − D ∂_x p is the probability current.

Entropy Production Rate and Flow
For non-equilibrium thermodynamics, we need to consider the entropy of the system S, the entropy of the environment S_m, and the total entropy S_T = S + S_m. To clarify the difference among these, we go over some derivation: using ∂_t p = −∂_x J and J = f p − D ∂_x p, we obtain Ṡ = Ṡ_T − Ṡ_m, where

$$\dot S_T = \frac{1}{D} \int dx\, \frac{J^2}{p} \geq 0, \qquad (21)$$

$$\dot S_m = \frac{1}{D} \int dx\, f\, J. \qquad (22)$$

Here, we used integration by parts in t and x. Ṡ_T denotes the (total) entropy production rate, which is non-negative (Ṡ_T ≥ 0) by definition and serves as a measure of irreversibility [112]. The sign of Ṡ_m in Equation (22) represents the direction in which entropy flows between the system and the environment: Ṡ_m > 0 (Ṡ_m < 0) when entropy flows from the system to the environment (from the environment to the system). Ṡ_m is related to the heat flux from the system to the environment by Q̇ = D Ṡ_m. The equality Ṡ_T = 0 holds in an equilibrium reversible process; in this case, Ṡ = −Ṡ_m = −Q̇/D, which is the usual equilibrium thermodynamic relation. In comparison, when Ṡ = 0, Ṡ_T = Ṡ_m ≥ 0.
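A numerical sanity check of this decomposition (ours; the O-U relaxation parameters are arbitrary): for the Gaussian solution of Equations (17)-(19), the current J = f p − D∂_x p gives Ṡ_T and Ṡ_m via Equations (21) and (22), and their difference matches Ṡ computed directly from the differential entropy.

```python
import numpy as np

gamma, D, beta0, x0 = 1.0, 0.5, 4.0, 2.0   # O-U relaxation, v = 0
x = np.linspace(-12.0, 12.0, 40001)
dx = x[1] - x[0]

def pdf(t):                                 # exact Gaussian, Equations (17)-(19)
    y = x0 * np.exp(-gamma * t)
    s2 = D / gamma * (1 - np.exp(-2 * gamma * t)) + np.exp(-2 * gamma * t) / (2 * beta0)
    return np.exp(-(x - y) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)

t, dt = 0.7, 1e-5
p = pdf(t)
f = -gamma * x                              # force, v = 0
J = f * p - D * np.gradient(p, x)           # probability current

S_T_dot = np.sum(J ** 2 / p) * dx / D       # Equation (21)
S_m_dot = np.sum(f * J) * dx / D            # Equation (22)
S = lambda tt: -np.sum(pdf(tt) * np.log(pdf(tt))) * dx
S_dot = (S(t + dt) - S(t - dt)) / (2 * dt)  # direct d/dt of differential entropy
print(S_T_dot - S_m_dot, S_dot)             # agree: S_dot = S_T_dot - S_m_dot
```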
For the O-U process with V = (γ/2)(x − v(t))² and f = −γ(x − v(t)) in Equation (14), Equations (17)-(19), (21) and (22) lead to (see [29] for details)

$$\dot S = \frac{\partial_t \sigma}{\sigma}, \qquad (23)$$

$$D\, \dot S_T = (\partial_t \langle x \rangle)^2 + (\partial_t \sigma)^2. \qquad (24)$$

Here, we used

$$\partial_t \langle x \rangle = -\gamma\, (\langle x \rangle - v(t)), \qquad (25)$$

$$\partial_t \sigma = \frac{D}{\sigma} - \gamma\, \sigma. \qquad (26)$$

In order to relate the thermodynamic quantities Ṡ_T and Ṡ above to the information rate Γ, we recall that for the O-U process [15,17,[26][27][28],

$$\Gamma^2 = \frac{(\partial_t \langle x \rangle)^2 + 2\, (\partial_t \sigma)^2}{\sigma^2}. \qquad (27)$$

Equations (24) and (27) then give us

$$\Gamma^2 = \frac{D\, \dot S_T}{\sigma^2} + \dot S^2. \qquad (28)$$

Interestingly, Equation (28) reveals that the entropy production rate needs to be normalized by the variance σ². This is because of the extensive nature of Ṡ_T, unlike Γ or Ṡ: Ṡ_T changes its value when the variable x is rescaled by a scalar factor α (> 0) as y = αx. Furthermore, Equation (28) shows that the information rate Γ in general does not have a simple relation to the entropy production rate (c.f., [107]).
One interesting limit of Equation (28) is the case of constant β(t), for which Ṡ = 0. In that case, Equation (24) becomes DṠ_T = (∂_t⟨x⟩)², while Equations (13), (27) and (28) give us

$$L(t) = \int_0^t dt_1\, \frac{|\partial_{t_1} \langle x \rangle|}{\sigma}, \qquad (29)$$

$$\Gamma^2 = \frac{D\, \dot S_T}{\sigma^2}. \qquad (30)$$

Equation (29) simply states that L measures the total change in the mean value normalized by the fluctuation level σ. Equation (30) manifests a linear relation between Γ² and Ṡ_T when ∂_tσ = 0, as invoked in previous works (e.g., [107]). Furthermore, the linear relation between Γ² and Ṡ_T in Equation (30) implies that minimizing the entropy production $\int^t dt_1\, \dot S_T(t_1)$ along the trajectory corresponds to minimizing $\int_0^t dt_1\, \Gamma^2(t_1)$, which, in turn, is equivalent to minimizing L(t) (see Section 5 for further discussion).
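The relation in Equation (28) can be checked numerically (a sketch under our own parameter choices), using Ṡ = ∂_tσ/σ (Equation (23)), DṠ_T = (∂_t⟨x⟩)² + (∂_tσ)² (Equation (24)) and the exact O-U moments of Equations (18) and (19):

```python
import numpy as np

gamma, D, beta0, x0 = 1.0, 0.5, 4.0, 2.0
t = np.linspace(1e-3, 5.0, 5001)

y = x0 * np.exp(-gamma * t)                      # Equation (19), v = 0
ydot = -gamma * y
s2 = D / gamma * (1 - np.exp(-2 * gamma * t)) + np.exp(-2 * gamma * t) / (2 * beta0)
sigma = np.sqrt(s2)                              # Equation (18)
sigmadot = np.gradient(sigma, t)

S_dot = sigmadot / sigma                         # Equation (23)
D_S_T_dot = ydot ** 2 + sigmadot ** 2            # Equation (24)
lhs = (ydot ** 2 + 2 * sigmadot ** 2) / s2       # Gamma^2, Equation (27)
rhs = D_S_T_dot / s2 + S_dot ** 2                # Equation (28)
print(np.max(np.abs(lhs - rhs)))                 # ~ 0 (exact identity)
```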
Finally, to demonstrate how the entropy production rate and the thermal bath temperature (D) are linked to the speed of fluctuations c = σΓ [97], we rewrite Equation (28) as

$$c^2 = \sigma^2\, \Gamma^2 = D\, \dot S_T + (\partial_t \sigma)^2. \qquad (31)$$

For constant variance (β̇ = 0), Equation (31) gives the simple relation $c = \sigma\Gamma = \sqrt{D\, \dot S_T}$.

Non-Equilibrium Thermodynamical Laws
To relate the statistical measures in Section 4.1 to thermodynamics, we let U (the internal energy) be the average potential energy U = ⟨V⟩ and obtain (see also [66,113] and references therein)

$$\dot U = \frac{d}{dt} \langle V \rangle = \dot W - \dot Q, \qquad (32)$$

$$\dot W = \langle \partial_t V \rangle, \qquad (33)$$

$$\dot Q = \int dx\, f\, J = D\, \dot S_m. \qquad (34)$$

The power Ẇ represents the average rate at which work is done on the system by the time-varying potential; the average work during the time interval [t_0, t] is calculated as $W = \int_{t_0}^{t} dt'\, \dot W(t')$. On the other hand, Q̇ represents the rate of dissipated heat. Equation (32) establishes the non-equilibrium thermodynamic relation U̇ = Ẇ − Q̇: physically, the work done on the system (Ẇ) increases U while the heat dissipated to the environment (Q̇) decreases it. Equations (21), (32), and (34) permit us to define a non-equilibrium (information) free energy F(t) = U(t) − DS(t) [92] and its time derivative

$$\dot F = \dot U - D\, \dot S = \dot W - D\, \dot S_T, \qquad (35)$$

where Ḟ = dF/dt and U̇ = dU/dt. Since Ṡ_T ≥ 0, Equation (35) leads to the inequality

$$\dot W_D \equiv \dot W - \dot F = D\, \dot S_T \geq 0, \qquad (36)$$

which defines the non-negative dissipated power Ẇ_D (lost to the environment). Finally, the time-integrated version of Equation (36) provides the bound on the average work performed on the system: W − ΔF = W_D ≥ 0 (e.g., [68]).

Relative Entropy as a Measure of Irreversibility
The relative entropy has proven to be useful in understanding irreversibility and non-equilibrium thermodynamic inequality relations [91][92][93][94][114][115][116]. In particular, the dissipated work W_D = W − ΔF (in Equation (36)) is related to the relative entropy between the PDFs in the forward and reverse processes (e.g., see [91][92][93][94]):

$$W_D = W - \Delta F = D\, K[p_F(\gamma_F(t))\, |\, p_R(\gamma_R(t))]. \qquad (37)$$

Here, p_F(γ_F(t)) and p_R(γ_R(t)) are the PDFs for the forward and reverse processes driven by the forward γ_F(t) and reverse γ_R(t) protocols, respectively. Using Equation (36) in Equation (37) immediately gives

$$\int dt\, \dot S_T = K[p_F(\gamma_F(t))\, |\, p_R(\gamma_R(t))], \qquad (38)$$

which is a proxy for irreversibility (see [115,116] for a slightly different expression of Equation (38)). It is useful to note that forward and reverse protocols are also used to establish various fluctuation theorems for different dissipative measures such as the entropy production, dissipated work, etc. (see, e.g., [80] for a nice review and references therein). However, we cannot consider forward and reverse protocols in the absence of a model control parameter that can be prescribed as a function of time. Even in this case, the relative entropy is useful in quantifying irreversibility through inequalities, and this is what we focus on in the remainder of Section 4.3.
To this end, let us consider a non-equilibrium state p(x,t) which has an instantaneous non-equilibrium stationary state p_s(x,t), and calculate the relative entropy between the two. Here, p_s(x,t) is a steady solution of the Fokker-Planck Equation (20) with ∂_t p_s = 0 (e.g., see [29]). Specifically, one finds $p_s(x,t) = e^{-(V(x,t) - F_s(t))/D}$ by treating the parameters as constant (frozen at their instantaneous values at the given time). Here, V and F_s are the potential energy and the stationary free energy, respectively. For clarity, an example of p_s(x,t) is given in Section 4.4.
The average of ln p_s in the non-equilibrium state p(x,t) can be expressed as follows:

$$\langle \ln p_s \rangle = -\frac{1}{D}\, \langle V - F_s \rangle = -\frac{1}{D}\, (U - F_s). \qquad (39)$$

Equations (35) and (39) then give

$$K[p(x,t)\, |\, p_s(x,t)] = \int dx\, p\, \ln \frac{p}{p_s} = \frac{1}{D}\, (F - F_s) \geq 0. \qquad (40)$$

Here, we used the fact that the relative entropy is non-negative. Equation (40) explicitly shows that the non-equilibrium free energy is bounded below by the stationary one, F ≥ F_s (see also [1,92] and references therein for open Hamiltonian systems).
Equation (40) together with Equation (35) then leads to the following irreversible work W_irr [29,92]:

$$W_{irr} \equiv W - \Delta F_s = D \int dt\, \dot S_T + D\, \Delta K[p\, |\, p_s], \qquad (41)$$

where ΔF_s and ΔK denote the changes in F_s and K over the time interval of interest. The derivation of Equation (41) for open driven Hamiltonian systems is provided in [92] (see their Equation (38)).
On the other hand, we can directly calculate the time derivative of K[p(x,t)|p_s(x,t)]:

$$\frac{d}{dt} K[p\, |\, p_s] = \frac{1}{D} \left( \dot W - D\, \dot S_T - \frac{d}{dt} F_s \right). \qquad (42)$$

One can easily see that equating Equation (42) with the time derivative of Equation (40) simply recovers Ẇ − dF/dt = DṠ_T in Equation (35). Finally, we obtain a differential form of Equation (41):

$$\dot W_{irr} = \dot W - \frac{d}{dt} F_s = D\, \dot S_T + D\, \frac{d}{dt} K[p\, |\, p_s]. \qquad (43)$$

Example
We consider v(t) = ut with a constant u, so that V = (γ/2)(x − ut)² in Equation (14). While the discussion below explicitly involves v(t), the results are general and remain valid in the limiting case v(t) = 0. The case v(t) = 0 is an example where the forward and reverse protocols do not exist while a (non-equilibrium) stationary state does.
For f = −γ(x − ut), Equation (19) simplifies to

$$\langle x \rangle(t) = x_0\, e^{-\gamma t} + u \left( t - \frac{1 - e^{-\gamma t}}{\gamma} \right). \qquad (44)$$

For the non-equilibrium stationary state with fixed γ and D, β_s = γ/2D is also constant (dF_s/dt = 0); the corresponding instantaneous stationary PDF is p_s(x,t) ∝ e^{−β_s(x−ut)²} (Equation (45)). We can then find explicit expressions for Ṡ_T, Q̇, Γ², Ẇ, and K in Equations (46)-(50) (see [29] for details), using Equations (23)-(26) and ⟨x⟩ and β in Equations (44) and (18), respectively. It is worth looking at two interesting limits of Equations (46)-(50). First, in the long time limit t → ∞, simpler relations follow (Equation (51)). Equation (51) illustrates how the external force v(t) = ut with u ≠ 0 keeps the system out of equilibrium even in the long time limit, with non-zero entropy production and dissipation. When there is no external force (u = 0), the system reaches equilibrium as t → ∞, and all quantities in Equation (51) apart from β become zero.
The second is when the system is initially in equilibrium, with β(t = 0) = β(t → ∞) = γ/2D and x_0 = 0, and evolves in time as it is driven out of equilibrium by u ≠ 0. As u does not affect the variance, β(t) = β_0 = γ/2D (∂_tσ = 0) and Ṡ = 0 for all time. In this case, we find (Equation (52)) that Ṡ_T, Q̇, Γ², Ẇ, and K start from zero at t = 0 and monotonically increase to their asymptotic values as t → ∞. Finally, both cases considered above in Equations (51) and (52) have ∂_tσ = 0 and thus recover Equation (30):

$$\sigma^2\, \Gamma^2 = D\, \dot S_T. \qquad (53)$$

Inequalities
Section 4 utilized the average (first moment) of a variable (e.g., ⟨V⟩) and the average of its first time derivative (⟨∂_t V⟩ = Ẇ), while the work W = ∫dt Ẇ is defined by the time integral of Ẇ = ⟨∂_t V⟩ in Equation (33). This section aims to show that the rates at which average quantities vary with time are bounded by fluctuations and Γ. Since the average and time derivative do not commute, we pay particular attention to when the average is taken.
To this end, let us first define the microscopic free energy

$$\mu = V + D \ln p \qquad (54)$$

(called the chemical potential energy in [113]). In terms of µ, we have J = −p ∂_x µ and ⟨µ⟩ = U − DS = F. On the other hand, ⟨∂_t µ⟩ = Ẇ means that the average rate of change in the microscopic free energy is the power. From Equation (54), it follows that D ∂_t ln p = ∂_t µ − ∂_t V, and thus

$$\Gamma^2 = \int dx\, p\, (\partial_t \ln p)^2 = \frac{1}{D^2} \int dx\, p\, (\partial_t \mu - \partial_t V)^2. \qquad (55)$$

Equation (55) establishes the relation between the microscopic free energy and Γ. Next, we calculate the time derivative of F:

$$\frac{d}{dt} F = \frac{d}{dt} \langle \mu \rangle = \langle \partial_t \mu \rangle + \int dx\, \mu\, \partial_t p = \dot W + \int dx\, \mu\, \partial_t p. \qquad (56)$$

Using dF/dt = Ẇ − DṠ_T in Equation (56) gives Ṡ_T in terms of µ as

$$\dot S_T = -\frac{1}{D} \int dx\, \mu\, \partial_t p. \qquad (57)$$

Equation (57) is used in Section 5.1 to link Ṡ_T to Γ through an inequality.

General Inequality Relations
We now use $\int dx\, \dot p = 0$ and $\int dx\, \dot p\, \langle A \rangle = \langle A \rangle \int dx\, \dot p = 0$ for any A = A(x,t), and apply the Schwartz inequality to obtain, for any A,

$$\left| \int dx\, \dot p\, A \right| = \left| \int dx\, \dot p\, \delta A \right| \leq \sqrt{\langle (\delta A)^2 \rangle}\; \Gamma, \qquad (58)$$

where δA = A − ⟨A⟩. Applying Equation (58) to $\dot Q = -\int dx\, (\partial_t p)\, V$ and to Equation (57) gives

$$|\dot Q| \leq \sqrt{\langle (\delta V)^2 \rangle}\; \Gamma, \qquad (59)$$

$$D\, \dot S_T \leq \sqrt{\langle (\delta \mu)^2 \rangle}\; \Gamma. \qquad (60)$$

Equation (60) (Equation (59)) establishes the inequality between the entropy production rate (heat flux) and the product of the RMS fluctuation of the microscopic free energy (potential energy) and Γ. Since δµ = δV + D δ(ln p), we further have

$$\sqrt{\langle (\delta \mu)^2 \rangle} \leq \sqrt{\langle (\delta V)^2 \rangle} + D\, \sqrt{\langle (\delta \ln p)^2 \rangle}. \qquad (61)$$

These relations are used in Section 5.2 below.
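The inequality in Equation (60) can be illustrated numerically (our own sketch; an O-U relaxation with arbitrary parameters), comparing DṠ_T against the product of the RMS fluctuation of µ and Γ:

```python
import numpy as np

gamma, D, beta0, x0 = 1.0, 0.5, 4.0, 2.0
x = np.linspace(-12.0, 12.0, 40001)
dx = x[1] - x[0]

def pdf(t):                                       # exact Gaussian, Equations (17)-(19)
    y = x0 * np.exp(-gamma * t)
    s2 = D / gamma * (1 - np.exp(-2 * gamma * t)) + np.exp(-2 * gamma * t) / (2 * beta0)
    return np.exp(-(x - y) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)

t, dt = 0.5, 1e-5
p = pdf(t)
V = 0.5 * gamma * x ** 2                          # potential, v = 0
mu = V + D * np.log(p)                            # microscopic free energy, Equation (54)
dmu = mu - np.sum(p * mu) * dx                    # fluctuation delta mu

J = -gamma * x * p - D * np.gradient(p, x)        # probability current
lhs = np.sum(J ** 2 / p) * dx                     # D * S_T_dot, Equation (21)
dp = (pdf(t + dt) - pdf(t - dt)) / (2 * dt)
Gamma = np.sqrt(np.sum(dp ** 2 / p) * dx)         # Equation (9)
rhs = np.sqrt(np.sum(p * dmu ** 2) * dx) * Gamma  # RMS(delta mu) * Gamma
print(lhs, rhs, lhs <= rhs)                       # Equation (60): lhs <= rhs
```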

Geodesics, Control and Hyperbolic Geometry
This section aims to discuss geodesics in information geometry and their implications for self-organization and control. To illustrate the key concepts, we utilize an analytically solvable, generalized O-U process given by

$$\frac{dx}{dt} = -\gamma(t)\, (x - v(t)) + \xi, \qquad (64)$$

where γ(t) > 0 is a damping constant; v(t) is a deterministic force which determines the time evolution of the mean value of x; and ξ is a short (delta-)correlated noise with an amplitude D(t) that is, in general, time-dependent, satisfying Equation (15). For the initial condition in Equation (16), the mean value ⟨x⟩ ≡ y(t) and β(t) are given by

$$y(t) = e^{-G(t)} \left( x_0 + \int_0^t dt_1\, e^{G(t_1)}\, \gamma(t_1)\, v(t_1) \right), \qquad (65)$$

$$\frac{1}{\beta(t)} = \frac{e^{-2G(t)}}{\beta_0} + 4\, e^{-2G(t)} \int_0^t dt_1\, e^{2G(t_1)}\, D(t_1), \qquad (66)$$

where $G(t) = \int_0^t dt_1\, \gamma(t_1)$ and x_0 = ⟨x(t = 0)⟩.

Geodesics-Shortest-Distance Path
A geodesic between two spatial locations is the unique path with the shortest distance. A similar concept can be applied to information geometry to define a unique evolution path between two given PDFs, say p(x, t_1) and p(x, t_2), in the statistical space. The Wootters' distance in quantum mechanics in Equation (4) is such an example. For time-varying stochastic processes, there is an infinite number of different trajectories between the two PDFs at different times. The key question that we address in this section is how to find the exact time evolution of p(x,t) when the initial and final PDFs are given [15]. This is a much more difficult problem than finding a minimum distance between two PDFs (like the Wootters' distance). In the following, we sketch the main steps needed for finding such a unique evolution path (the so-called geodesic) between given initial and final PDFs by minimizing L (see [15] for detailed steps).
For the O-U process in Equation (64), a geodesic solution does not exist for constant γ, v(t), and D. Thus, finding a geodesic solution boils down to determining suitable functions γ(t), v(t), or D(t) [15]. To be specific, let p(x, t_0) and p(x, t_F), respectively, be the PDFs at the times t = t_0 and t_F (> t_0), and find a geodesic solution by minimizing

$$L = \int_{t_0}^{t_F} dt\, \Gamma(t) = \int_{t_0}^{t_F} dt\, \sqrt{2\beta\, \dot y^2 + \frac{\dot\beta^2}{2\beta^2}}. \qquad (67)$$

The Euler-Lagrange equations (with β̇ = dβ/dt and ẏ = dy/dt) then give Equations (69) and (70), where c is a constant associated with the invariance of L under translations of y; an alternative method of obtaining Equations (69) and (70) is provided in Appendix C. From Equations (69) and (70), one obtains Equation (71) for dβ/dt, where α is another (integration) constant. General solutions to Equations (70) and (71) for c ≠ 0 were found in terms of hyperbolic functions (Equations (72) and (73)), where A and B are constants [15]. Equation (73) can be rewritten using σ = (2β)^{-1/2} and z = s y/√2, where s denotes the sign of c (s = 1 when c > 0, s = −1 when c < 0), as

$$(z - z_c)^2 + \sigma^2 = R^2. \qquad (74)$$

Equation (74) is the equation of a circle for the variables z and σ with radius R and center z_c, defined in the upper half-plane where σ ≥ 0. Thus, geodesic motions occur along portions of a circle as long as c ≠ 0 (as can be seen in Figure 4); a geodesic moves on a circle with a larger radius for a larger information rate Γ and speed, and vice versa. This manifests the hyperbolic geometry of the upper half-plane Poincaré model [13,117], whose coordinates are z and σ ≥ 0 (see also Appendix D). The constants c, α, A, and B determine the coordinates of the center and the radius R of the circle; they are fixed by the boundary conditions at the initial time t_0 and the final time t_F. Having found the most general form of the geodesic solution for y(t) and β(t), the next steps are to find the values of the constants c, α, A, and B that satisfy the boundary conditions at t = t_0 and t_F, and then to find appropriate γ(t), D(t), and v(t) that ensure the geodesic solution. This means that the O-U process should be controlled through γ(t), D(t), and v(t) to realize a geodesic.

Figure 4 shows an example of a geodesic solution in the upper half-plane (y, β^{-1/2}) when γ(t) = 1 is constant while D(t) and v(t) are time-dependent. The boundary conditions are chosen as y(t_0) = y_0 = 5/6 and y(t_F) = y_F = 1/30 in all panels (a)-(d); β(t_0) = β_0 = β(t_F) = β_F = 0.3 in panels (a) and (b), while β_0 = β_F = 3 in panels (c) and (d). Interestingly, circular phase portraits are seen in panels (b) and (d), reflecting the hyperbolic geometry noted above (see also Appendix D) [13,117]. The speed at which the geodesic motion takes place in the phase portrait is determined by the constant value of Γ (set by α; the larger α, the faster the time evolution). Figure 5a,b shows the corresponding PDF snapshots at different times (shown in different colors), demonstrating how the PDF evolves from the initial PDF in red to the final PDF in blue. In both cases, it is prominent that the PDF width (∝ β^{-1/2}) initially broadens and then becomes narrower.
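The circular geodesics of Equation (74) are easy to construct directly (a sketch of our own; it assumes z = y/√2, under which the Gaussian metric ds² = (dy² + 2dσ²)/σ² becomes 2(dz² + dσ²)/σ², i.e., the Poincaré upper half-plane metric up to a constant factor): the geodesic connecting two Gaussians is the arc of the circle through their (z, σ) points centered on the σ = 0 axis. Note the initial broadening and subsequent narrowing of σ along the path, as in Figure 5.

```python
import numpy as np

def geodesic(y0, s0, yF, sF, n=201):
    """Geodesic between Gaussians (mean y, width sigma) as a circle arc, Equation (74)."""
    z0, zF = y0 / np.sqrt(2), yF / np.sqrt(2)
    if np.isclose(z0, zF):                      # vertical line: pure width change
        return np.full(n, y0), np.geomspace(s0, sF, n)
    # Center z_c (on sigma = 0) and radius R of the circle through both endpoints.
    zc = (z0 ** 2 + s0 ** 2 - zF ** 2 - sF ** 2) / (2 * (z0 - zF))
    R = np.hypot(z0 - zc, s0)
    th = np.linspace(np.arctan2(s0, z0 - zc), np.arctan2(sF, zF - zc), n)
    return np.sqrt(2) * (zc + R * np.cos(th)), R * np.sin(th)

y, sigma = geodesic(y0=5 / 6, s0=1.0, yF=1 / 30, sF=1.0)
print(y[0], sigma[0], y[-1], sigma[-1])   # endpoints recovered
print(sigma.max())                        # width broadens midway, then narrows
```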

Comments on Self-Organization and Control
Self-organization (also called homeostasis) is a phenomenon in which order spontaneously emerges out of disorder and is maintained by different feedbacks in complex systems [45,52,53,[118][119][120][121][122][123]. The extremum principles of thermodynamics, such as minimum entropy production (e.g., [119,121]) or maximum entropy production (e.g., [122,123]), have been proposed for steady states or at an instant of time in different problems.
However, far from equilibrium, self-organization can be a time-varying non-equilibrium process involving perpetual or large fluctuations (e.g., see [52][53][54]). In this case, the extremization should act on the entropy production accumulated over time, rather than at one instant of time or in a steady state. That is, we should consider the time integral of the entropy production Ṡ_T or, equivalently, the time integral of √(Ṡ_T). As seen from Equations (24) and (53), for a linear O-U process with constant variance, there is an exact proportionality between √(Ṡ_T) and Γ. In this case, the extremum of L(t) = ∫^t dt_1 Γ(t_1) is the same as the extremum of ∫^t dt_1 √(Ṡ_T(t_1)). However, as noted previously, Γ ∝ √(Ṡ_T) does not hold in general (e.g., see Equation (28)).
With these comments, we now look at the implications of a geodesic for self-organization, in particular in biosystems. For the very existence and optimal function of a living organism, it is critical to minimize the dispersion of its physical states and to maintain its states within certain bounds under changing conditions [124]. How fast its state changes in time can be quantified by the surprise rate ∂_t ln p(x,t). Since ∫dx p(x,t) ∂_t ln p(x,t) = 0, we use its RMS value √⟨(∂_t ln p)²⟩ = Γ (see Equation (9)) and realize that the total change over a finite time interval [t_0, t_F] is nothing other than $L = \int_{t_0}^{t_F} dt\, \Gamma(t)$. Since surprise is linked to a biological cost (e.g., in the future prediction based on the current state [124,125]), we can then interpret L as an accumulated biological cost. Thus, a geodesic would be an optimal path that minimizes such an accumulated biological cost.
Ref. [15] addressed how to utilize this idea to control populations (tumors). Specifically, the results in Section 6.1 were applied to a nonlinear stochastic growth model (obtained by a nonlinear change of variables of the O-U process), and the geodesic solution in Equation (73) was used to find the optimal protocols v(t) and D(t) for reducing a large tumor to a smaller one. In this problem, D(t) represents the heterogeneity of tumor cells (e.g., larger D for a metastatic tumor), which can be controlled by gene reprogramming, while v(t) models the effect of a drug or radiation that reduces the mean tumor population/size.

Discussions and Conclusions
There has been a growing interest in information geometry from theoretical and practical considerations. This paper discussed some recent developments in information geometric theory, focusing on time-dependent dynamic aspects of non-equilibrium processes (e.g., time-varying mean value, time-varying variance, or temperature) and their thermodynamic and physical/biological implications.
In Sections 2 and 3, by utilizing a Langevin model of an over-damped stochastic process x(t), we highlighted the importance of a path-dependent distance L in describing time-varying processes. In Sections 4 and 5, we elucidated the thermodynamic meanings of the relative entropy and the information rate Γ by relating them to the entropy production rate Ṡ_T, the system entropy change Ṡ, the heat flux Q̇ = DṠ_m, the dissipated power Ẇ_D, etc., and demonstrated the role of Γ in determining bounds (or speed limits) on thermodynamic quantities.
Specifically, for the O-U process, we showed the exact relation Γ² = DṠ_T/σ² + Ṡ² (Equation (28)), which simplifies to σΓ = √(DṠ_T) when ∂_tσ = 0 (σ = √⟨(δx)²⟩ is the standard deviation of x). Finally, Section 6 discussed geodesics and their implications for self-organization, as well as the underlying hyperbolic geometry. It remains future work to explore the link between Γ and the entropy production rate in other (e.g., nonlinear) systems consisting of three or more interacting components, or in data from self-organizing systems (e.g., the normal brain). Acknowledgments: Eun-jin Kim acknowledges the Leverhulme Trust Research Fellowship (RF-2018-142-9) and thanks the collaborators, especially James Heseltine, who contributed to the works in this paper.

Conflicts of Interest:
The author declares no conflicts of interest.

Appendix A
In this Appendix, we show the invariance of Equation (9) under a change of variables y = F(x). Using the conservation of probability, p(y,t) dy = p(x,t) dx, we have

$$\int dy\, \frac{[\partial_t p(y,t)]^2}{p(y,t)} = \int dx\, \frac{[\partial_t p(x,t)]^2}{p(x,t)}, \qquad (A1)$$

which shows that p(x,t) and p(y,t) give the same Γ²(t).
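A quick numerical check of this invariance (ours; F(x) = eˣ is an arbitrary monotonic choice of variable change): Γ² computed from p(x,t) and from the transformed PDF p(y,t) = p(x,t)/|dF/dx| agree.

```python
import numpy as np

def p_x(x, t):                    # Gaussian with drifting mean and growing width
    mu, sig = np.sin(t), 1.0 + 0.3 * t
    return np.exp(-(x - mu) ** 2 / (2 * sig ** 2)) / (sig * np.sqrt(2 * np.pi))

def gamma2(grid, pdf, t, dt=1e-5):
    """Gamma^2 = int (d_t p)^2 / p, by the trapezoid rule on a (possibly nonuniform) grid."""
    dp = (pdf(grid, t + dt) - pdf(grid, t - dt)) / (2 * dt)
    f = dp ** 2 / pdf(grid, t)
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(grid))

x = np.linspace(-14.0, 14.0, 60001)
y = np.exp(x)                                    # y = F(x) = e^x (log-spaced grid in y)
p_y = lambda yy, t: p_x(np.log(yy), t) / yy      # conservation of probability
print(gamma2(x, p_x, 1.0), gamma2(y, p_y, 1.0))  # the two values agree
```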

Appendix B. The Coupled O-U Process
The coupled O-U process for Figure 3 in Section 3.1 is governed by the coupled Fokker-Planck equations [22]

$$\frac{\partial P_1}{\partial t} = \frac{\partial}{\partial x}\left( \gamma_1\, x\, P_1 \right) + D\, \frac{\partial^2 P_1}{\partial x^2} - f_0\, P_1 + g_0\, P_2, \qquad (A2)$$

$$\frac{\partial P_2}{\partial t} = \frac{\partial}{\partial x}\left( \gamma_2\, x\, P_2 \right) + D\, \frac{\partial^2 P_2}{\partial x^2} + f_0\, P_1 - g_0\, P_2. \qquad (A3)$$

Here, D is the strength of the short-correlated Gaussian noise given by Equation (15). These equations are coupled O-U processes with the coupling constants f_0 and g_0 through the Dichotomous noise [110,111].
For simplicity, we use γ_1 = γ_2 = γ and f_0 = g_0, with initial conditions in which P_1 starts from (half of) the equilibrium Gaussian PDF and P_2 from a Gaussian with initial mean x_0 (Equations (A5) and (A6)). The solutions are given by Equations (A7) and (A8) for m = 1, 2. In the limit t → ∞, P_1 and P_2 in Equations (A7) and (A8) approach the same equilibrium distribution, where β_m(t → ∞) = γ/2D. We note that the total PDF P = P_1 + P_2 satisfies the single O-U process, where the initial PDF is given by the sum of Equations (A5) and (A6). Figure 3 is shown for the fixed parameter values γ = 0.1, D = 1, f_0 = g_0 = 0.5, and β_20 = β_10 = 0.05 = β(t → ∞) = γ/2D. Different values of the initial mean position x_0 of P_2 are used to examine how the metrics depend on x_0. As noted in Section 2.2, P_1 at t = 0 is chosen to be the same as the final equilibrium state, which has zero mean value and inverse temperature β_10 = 0.05.

Appendix C. Curved Geometry: The Christoffel and Ricci-Curvature Tensors
A geodesic solution in Section 6.1 can also be found by solving the geodesic equation of general relativity (e.g., [31,107]). To this end, we let the two parameters be λ_1 = ⟨x⟩ = y and λ_2 = β and express the line element (Equation (A11)) in terms of the metric tensor g_ij as (see also Equation (10))

$$g_{ij} = \begin{pmatrix} 2\beta & 0 \\ 0 & \dfrac{1}{2\beta^2} \end{pmatrix}, \qquad (A12)$$

so that ds² = 2β dy² + dβ²/(2β²). Note that while g_ij is diagonal, the first diagonal component depends on β (the second parameter); that is, g_ii is not, in general, independent of the j-th parameter for j ≠ i. From Equation (A12), we can find the non-zero components of the connection tensor $\Gamma_{ijk} = \frac{1}{2}\left( \partial_i g_{jk} + \partial_j g_{ik} - \partial_k g_{ij} \right)$ (with $\Gamma^i_{jk} = g^{im} \Gamma_{jkm}$):

$$\Gamma^1_{12} = \Gamma^1_{21} = \frac{1}{2\beta}, \qquad \Gamma^2_{11} = -2\beta^2, \qquad \Gamma^2_{22} = -\frac{1}{\beta}. \qquad (A13\text{-}A15)$$

Equations (A13)-(A15) give Equation (70). Note that if g_ii were independent of λ_j (j ≠ i) for all i and j, the Christoffel tensors would have non-zero values only for Γ^i_ii, leading to a much simpler geodesic solution (e.g., see [31]).
Finally, to appreciate the curved geometry associated with this geodesic solution, we calculate the Riemann curvature tensor $R^i_{\ kmn} = \partial_m \Gamma^i_{nk} + \Gamma^i_{mp} \Gamma^p_{nk} - \partial_n \Gamma^i_{mk} - \Gamma^i_{np} \Gamma^p_{mk}$ and the Ricci tensor $R_{ij} = R^k_{\ ikj}$ from Equations (A13)-(A15) and find the following non-zero components [15]:

$$R^1_{\ 212} = -\frac{1}{4\beta^2}, \qquad R^2_{\ 121} = -\beta. \qquad (A16)$$

Non-zero curvature tensors indicate that the metric space is curved, with a finite curvature. Specifically, we find the Ricci tensor and the scalar curvature R:

$$R_{11} = -\beta, \qquad R_{22} = -\frac{1}{4\beta^2}, \qquad R = g^{ij} R_{ij} = -1. \qquad (A17)$$

The negative curvature is typical of hyperbolic geometry. Finally, using R = −1, we calculate the Einstein tensor:

$$G_{ij} = R_{ij} - \frac{1}{2}\, R\, g_{ij} = 0. \qquad (A18)$$

Since G_ij = 8πT_ij, where T_ij is the stress-energy tensor, we see that T_ij = 0 for this problem.
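The Christoffel symbols and the result R = −1 above can be reproduced symbolically (a sketch of ours using sympy, following the index conventions of this Appendix):

```python
import sympy as sp

y, b = sp.symbols('y beta', positive=True)
coords = (y, b)
g = sp.Matrix([[2 * b, 0], [0, 1 / (2 * b ** 2)]])   # metric of Equation (A12)
ginv = g.inv()

# Gamma^i_{jk} = (1/2) g^{im} (d_j g_{mk} + d_k g_{mj} - d_m g_{jk})
Gam = [[[sum(ginv[i, m] * (sp.diff(g[m, k], coords[j]) + sp.diff(g[m, j], coords[k])
             - sp.diff(g[j, k], coords[m])) for m in range(2)) / 2
         for k in range(2)] for j in range(2)] for i in range(2)]

def riemann(i, k, m, n):   # R^i_{kmn}, same index convention as in the text
    r = sp.diff(Gam[i][n][k], coords[m]) - sp.diff(Gam[i][m][k], coords[n])
    r += sum(Gam[i][m][p] * Gam[p][n][k] - Gam[i][n][p] * Gam[p][m][k] for p in range(2))
    return sp.simplify(r)

Ricci = sp.Matrix(2, 2, lambda i, j: sum(riemann(k, i, k, j) for k in range(2)))
R = sp.simplify(sum(ginv[i, j] * Ricci[i, j] for i in range(2) for j in range(2)))
print(sp.simplify(Ricci))   # diag(-beta, -1/(4*beta^2))
print(R)                    # -1
```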

Appendix D. Hyperbolic Geometry
The hyperbolic geometry in the upper half-plane [13,117] becomes more obvious when Equation (A12) is expressed in terms of the two parameters ⟨x⟩(t) and σ(t), where the horizontal and vertical axes represent ⟨x⟩(t) and σ(t), with the metric tensor

$$g_{ij} = \begin{pmatrix} \dfrac{1}{\sigma^2} & 0 \\ 0 & \dfrac{2}{\sigma^2} \end{pmatrix}. \qquad (A20)$$