Information Length Analysis of Linear Autonomous Stochastic Processes

When studying the behaviour of complex dynamical systems, a statistical formulation can provide useful insights. In particular, information geometry is a promising tool for this purpose. In this paper, we investigate the information length for n-dimensional linear autonomous stochastic processes, providing a basic theoretical framework that can be applied to a large set of problems in engineering and physics. A specific application is made to a harmonically bound particle system with the natural oscillation frequency ω, subject to a damping γ and a Gaussian white-noise. We explore how the information length depends on ω and γ, elucidating the role of critical damping γ=2ω in information geometry. Furthermore, in the long time limit, we show that the information length reflects the linear geometry associated with the Gaussian statistics in a linear stochastic process.


Introduction
Stochastic processes are common in nature or laboratories, and play a major role across traditional disciplinary boundaries (e.g., see [1,2]). These stochastic processes often exhibit complex temporal behaviour and even the emergence of order (self-organization). The latter can also be artificially designed to complete an orderly task (guided self-organization) [3][4][5][6]. In order to study and compare the dynamics of different stochastic processes and self-organization, it is valuable to utilize a measurement which is independent of any specifics of a system [7][8][9][10][11] (e.g., physical variables, units, dimensions, etc.). This can be achieved by using information theory based on probability density functions (PDFs) and working in terms of information content or information change, e.g., by quantifying the statistical difference between two states [12][13][14]. Mathematically, we do this by assigning a metric to probability and by using the notion of 'length' or 'distance' in the statistical space.
One method of measuring the information content in a system is utilizing the Fisher information, which represents the degree of certainty, or order. The opposite is entropy, which is a popular concept for the uncertainty or amount of disorder. Comparing entropy at different times then gives a measure of the difference in information content between the two states, which is known as relative entropy (e.g., see [15]). Another example is the Wasserstein metric [16,17], which provides an exact solution to the Fokker-Planck equation for a gradient flow subject to the minimization of the energy functional defined as the sum of the entropy and potential energy [18][19][20]. This metric has units of a physical length in comparison with other metrics, for instance the dimensionless statistical distance based on the Fisher information metric [21][22][23]. Interestingly, there is a link between the Fisher information and the Wasserstein distance [24]. Furthermore, the relative entropy can be expressed by the integral of the Fisher information along the same path [25].
Although quite useful, the relative entropy lacks the locality of a metric as it concerns only the difference between two given PDFs. For instance, when these two PDFs represent the states at two different times, the relative entropy between them tells us nothing about how one PDF evolves into the other over time, or what intermediate states the system passes through between the two PDFs. As a result, it can only inform us of the changes that affect the overall system evolution [26]. To overcome this limitation, the information length L(t) was proposed in recent works; it quantifies the total number of statistically different states that the system evolves through in time [27,28]. This means that the information length is a measure that depends on the evolution path between two states (PDFs). Its formulation allows us to measure local changes in the evolution of the system, as well as providing an intriguing link between stochastic processes and geometry [26].
For instance, the relation between the information length L∞ = L(t → ∞) and the mean value of the initial PDF for fixed values of all other parameters was invoked as a new way of mapping out an attractor structure in a relaxation problem where any initial PDF relaxes into its equilibrium PDF in the long time limit. Specifically, for the Ornstein-Uhlenbeck (O-U) process driven by a Gaussian white-noise (which is a linearly damped relaxation problem), L∞ increases linearly with the distance between the mean position of an initial PDF and the stable equilibrium point (for further details, see [28,29]), with its minimum value zero at the stable equilibrium point. This linear dependence manifests that the information length preserves the linear geometry of the underlying Gaussian process, which is lost in other metrics [26]. For a nonlinear stochastic process with nonlinear damping, L∞ still takes its minimum value at the stable equilibrium point but exhibits a power-law dependence on the distance between the mean value of an initial PDF and the stable equilibrium point. In contrast, for a chaotic attractor, L∞ changes abruptly under an infinitesimal change of the mean value of an initial PDF, reminiscent of the sensitive dependence on initial conditions quantified by the Lyapunov exponent [30]. These results suggest that L∞ elucidates how different (non)linear forces affect (information) geometry.
With the above background in mind, this paper aims to extend the analysis of the information length of the O-U process to arbitrary n-th order linear autonomous stochastic processes, providing a basic theoretical framework to be utilized in a large set of problems in both engineering and physics. In particular, we provide a useful analytical result that expresses the information diagnostics as a function of the covariance matrix and the mean vector of the system, which enormously reduces the computational cost of numerical simulations of high-order systems. This is followed by a specific application to a harmonically bound particle system (Kramers equation) for the position x and velocity v = dx/dt, with the natural oscillation frequency ω, subject to a damping constant γ and a (short-correlated) Gaussian white-noise. We find an exact time-dependent joint PDF p(x, v, t) starting from an initial Gaussian PDF of finite width. Note that, as far as we are aware, our result for p(x, v, t) is original since, in the literature, the calculation was done only for the case of a delta-function initial PDF. Since this process is governed by the two variables x and v, we investigate how L∞ depends on their initial mean values ⟨x(0)⟩ and ⟨v(0)⟩. Here, the angular brackets ⟨·⟩ denote the average. Furthermore, the two characteristic time scales associated with ω and γ raise the interesting question as to their role in L∞. Thus, we explore how the information length depends on ω and γ. Our principal results are as follows: (i) L∞ tends to increase linearly with the deviation of either the initial mean position ⟨x(0)⟩ or the initial mean velocity ⟨v(0)⟩ from their equilibrium values ⟨x(0)⟩ = ⟨v(0)⟩ = 0; (ii) a linear geometry is thus preserved for our linearly coupled stochastic processes driven by a Gaussian noise; (iii) L∞ tends to take its minimum value near the critical damping γ = 2ω for the same initial conditions and other parameters.
The remainder of this paper is organized as follows: Section 2 presents the basic concept of information length and the formulation of our problem. In Section 3, our main theoretical results are provided (see also Appendix A). In Section 4, we apply the results in Section 3 to analyze a harmonically bound particle system with the natural oscillation frequency ω subject to a damping constant γ and a Gaussian white-noise. Finally, Section 5 contains our concluding remarks.
To help readers, we here provide a summary of our notation: R and C are the sets of real and complex numbers, respectively. x ∈ R^n represents a column vector x of real numbers of dimension n, A ∈ R^{n×n} represents a real matrix of dimension n × n, tr(A) is the trace of the matrix A, and A^T and A^{-1} are the transpose and inverse of the matrix A, respectively. (Bold-face letters are used to represent vectors and matrices.) In some places, ∂_t and the prime are both used for the partial derivative with respect to time. Besides, i = √−1 and, for s ∈ C, L^{-1}F(s) = (1/(2πi)) lim_{b→∞} ∫_{a−ib}^{a+ib} e^{st} F(s) ds is the inverse Laplace transform of the complex function F(s). Finally, the average of a random vector x is denoted by ⟨x⟩.

Information Length
As noted in Section 1, the information length [26,27,31] is a dimensionless measure of the total number of statistically different states that a system passes through in time in non-equilibrium processes. We emphasize that it is a measure that depends on the evolution path of the system, making it a useful index for understanding the information geometry underlying non-equilibrium processes. For example, for a time-dependent PDF p(x, t) of one stochastic variable x, the information length L(t) is the total information change between time 0 and t, defined by

L(t) = ∫_0^t dt_1/τ(t_1) = ∫_0^t √(E(t_1)) dt_1.    (1)

Here,

E(t) = ∫ dx [∂_t p(x, t)]^2 / p(x, t)

is the square of the information velocity (recall that we work in units where the distance given by the information length is dimensionless). As we can see, to define the information length we compute the dynamic time unit τ(t) = 1/√E, which quantifies the correlation time over which the PDF p(x, t) changes; τ serves as the time unit in the statistical space. Alternatively, the information velocity 1/τ(t_1) quantifies the (average) rate of change of information in time.
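For readers who wish to experiment, the definitions above are easy to check numerically. The following minimal Python sketch computes E(t) directly from the definition by finite differences in time and compares it with the closed form E = (∂_t μ)²/σ² + 2(∂_t σ/σ)², which holds for a one-dimensional Gaussian PDF with mean μ(t) and standard deviation σ(t); the particular time dependence of μ and σ below is a hypothetical choice for illustration only.

```python
import numpy as np

def gaussian(x, mu, sig):
    """1D Gaussian PDF with mean mu and standard deviation sig."""
    return np.exp(-0.5 * ((x - mu) / sig) ** 2) / (sig * np.sqrt(2.0 * np.pi))

# Hypothetical time dependence (illustrative only, not from the analysis above):
mu = lambda t: np.exp(-t)        # relaxing mean
sig = lambda t: 0.5 + 0.1 * t    # slowly widening PDF

X = np.linspace(-10.0, 10.0, 200001)
DX = X[1] - X[0]

def E_numeric(t, dt=1e-5):
    """E(t) = integral dx (d_t p)^2 / p, with d_t p by central differences."""
    dpdt = (gaussian(X, mu(t + dt), sig(t + dt))
            - gaussian(X, mu(t - dt), sig(t - dt))) / (2.0 * dt)
    p = gaussian(X, mu(t), sig(t))
    return float(np.sum(dpdt ** 2 / p) * DX)

def E_gaussian(t, dt=1e-5):
    """Closed form for a 1D Gaussian: E = (mu')^2/sig^2 + 2 (sig'/sig)^2."""
    mud = (mu(t + dt) - mu(t - dt)) / (2.0 * dt)
    sigd = (sig(t + dt) - sig(t - dt)) / (2.0 * dt)
    return mud ** 2 / sig(t) ** 2 + 2.0 * (sigd / sig(t)) ** 2

def L_info(T, n=2000):
    """L(T) = integral_0^T sqrt(E) dt, by a simple Riemann sum."""
    ts = np.linspace(1e-3, T, n)
    return float(np.sum(np.sqrt([E_gaussian(t) for t in ts])) * (ts[1] - ts[0]))
```

As expected, the direct integral and the Gaussian closed form agree, and L(t) is monotonically increasing since its integrand √E is non-negative.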

Problem Formulation
We consider the following linear autonomous process

dx(t)/dt = A x(t) + Γ(t).    (2)

Here, A is an n × n constant real matrix; Γ ∈ R^n is a stochastic driving given by an n-dimensional vector of δ-correlated Gaussian noises Γ_i (i = 1, 2, ..., n), with the statistical property

⟨Γ_i(t)⟩ = 0,  ⟨Γ_i(t)Γ_j(t′)⟩ = 2 D_ij δ(t − t′).    (3)

Note that D_ii represents the strength of the i-th stochastic noise while D_ij for i ≠ j denotes the correlation between the i-th and j-th noises (i.e., random fluctuations). Then, from the joint PDF p(x, t), we define the information length L of system (2) by the following integral

L(t) = ∫_0^t √(E(t_1)) dt_1,  where E(t) = ∫ dx [∂_t p(x, t)]^2 / p(x, t)    (4)

is the square of the information velocity.
The first goal of this paper is to provide theoretical results for the information length (4) for the system (2) and (3). This is done in the following Section 3.
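Although the treatment that follows is analytical, the system (2) and (3) can also be sampled directly. The Euler-Maruyama sketch below simulates Equation (2); the matrices A and D and the initial state are illustrative choices (the Kramers form used later in Section 4), not prescribed by the formulation above. As a deterministic sanity check, switching the noise off must recover x(t) = e^{At} x(0).

```python
import numpy as np
from scipy.linalg import expm

# Illustrative choices: Kramers-type drift (omega = 1, gamma = 2), noise on v only.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
D = np.array([[0.0, 0.0], [0.0, 0.0005]])
rng = np.random.default_rng(0)

def simulate(x0, T, dt, noise=True):
    """Euler-Maruyama: x_{k+1} = x_k + A x_k dt + dW_k, <dW dW^T> = 2 D dt."""
    # Cholesky factor of 2D (a tiny jitter keeps the singular D factorizable)
    Lc = np.linalg.cholesky(2.0 * D + 1e-15 * np.eye(2))
    x = np.array(x0, dtype=float)
    for _ in range(int(round(T / dt))):
        dW = Lc @ rng.normal(size=2) * np.sqrt(dt) if noise else 0.0
        x = x + A @ x * dt + dW
    return x

# Sanity check: with the noise switched off, the scheme must track e^{At} x(0).
x_num = simulate([-0.5, 0.7], T=1.0, dt=1e-4, noise=False)
x_ref = expm(A * 1.0) @ np.array([-0.5, 0.7])
```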

General Analytical Results
In this section, we provide the analytical results for Problem 2.1, summarizing the main steps required to calculate the information length (4). To this end, we assume that the initial PDF is Gaussian and then take advantage of the fact that a linear process driven by a Gaussian noise with an initial Gaussian PDF remains Gaussian at all times. The joint PDF for (2) and (3) is thus Gaussian, whose form is provided below.
Proposition 1 (Joint probability). The system (2) and (3) for a Gaussian random variable x has, at any time t, the joint PDF

p(x, t) = (1/√(det(2πΣ(t)))) exp[−(1/2) (x − ⟨x(t)⟩)^T Σ^{-1}(t) (x − ⟨x(t)⟩)],    (5)

where

⟨x(t)⟩ = e^{At} ⟨x(0)⟩,    (6)

Σ(t) = e^{At} Σ(0) e^{A^T t} + ∫_0^t e^{A(t−t_1)} 2D e^{A^T(t−t_1)} dt_1,    (7)

and D ∈ R^{n×n} is the matrix of elements D_ij. Here, ⟨x(t)⟩ is the mean value of x(t) while Σ is the covariance matrix.
Proof. For a Gaussian PDF of x, all we need to calculate are the mean and covariance of x and substitute them in the general expression for the multi-variable Gaussian distribution (5). To this end, we first write down the solution of Equation (2) as

x(t) = e^{At} x(0) + ∫_0^t e^{A(t−t_1)} Γ(t_1) dt_1.    (8)

By taking the average of Equation (8), we find the mean value of x(t) as

⟨x(t)⟩ = e^{At} ⟨x(0)⟩,    (9)

which is Equation (6). On the other hand, to find the covariance Σ(t), we let x = ⟨x⟩ + δx and use the property ⟨δx(0)Γ^T(t)⟩ = 0 to find

Σ(t) = ⟨δx(t)δx^T(t)⟩ = e^{At} ⟨δx(0)δx^T(0)⟩ e^{A^T t} + ∫_0^t e^{A(t−t_1)} 2D e^{A^T(t−t_1)} dt_1.    (10)

Here δx(0) = δx(t = 0) is the initial fluctuation at t = 0. Equation (10) thus proves Equation (7). Substitution of Equations (6) and (7) in Equation (5) then gives us the joint PDF p(x, t). Next, in order to calculate the information length from the joint PDF p(x, t) in Equation (5), we use the following Theorem:

Theorem 1 (Information Length). The information length of the joint PDF of system (2) and (3) is given by the following integral

L(t) = ∫_0^t √( (∂_{t_1}⟨x⟩)^T Q (∂_{t_1}⟨x⟩) + (1/2) tr[(Q′Σ)^2] ) dt_1,    (11)

where Q = Σ^{-1} (recall, a prime denotes ∂/∂t).
Proof. To prove this theorem, we use the PDF (5) in (4). To simplify the expression, we introduce the shorthand notation (12)-(13) and then compute step by step as follows. Using Equation (12) in Equation (14), we compute the integral (15). To calculate the three averages in (15), we use the Gaussian moment properties, starting from ∫_{−∞}^{∞} e^{−(1/2) w^T Q w} dw = √(det(2πΣ)) [32], to obtain (16)-(18). Here, ω_i′, q_ij′ and ω_i″, q_ij″ denote the first and second time derivatives of the elements ω_i and q_ij, respectively. By substituting (17) in (16) and rearranging, we obtain (19). Finally, with the help of the relations between Q, Σ and their time derivatives, Equation (19) proves Equation (11).
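As a practical note, Theorem 1 can be evaluated without computing matrix exponentials explicitly, by integrating the differential forms ⟨x⟩′ = A⟨x⟩ and Σ′ = AΣ + ΣA^T + 2D, which are equivalent to Equations (6) and (7). The Python sketch below does this with a forward-Euler scheme (the step size and scheme are illustrative choices); it uses E = (⟨x⟩′)^T Q ⟨x⟩′ + (1/2) tr[(QΣ′)^2], which equals the integrand of Equation (11) since Q′Σ = −QΣ′.

```python
import numpy as np

def information_length(A, D, mu0, Sigma0, T, dt=1e-4):
    """Accumulate L(T) = sum sqrt(E) dt by integrating
    <x>' = A <x> and Sigma' = A Sigma + Sigma A^T + 2D
    (the differential form of Equations (6)-(7)), with
    E = (<x>')^T Q <x>' + 0.5 tr[(Q Sigma')^2] and Q = Sigma^{-1}."""
    mu = np.array(mu0, dtype=float)
    Sig = np.array(Sigma0, dtype=float)
    L = 0.0
    for _ in range(int(round(T / dt))):
        mud = A @ mu
        Sigd = A @ Sig + Sig @ A.T + 2.0 * D
        Q = np.linalg.inv(Sig)
        QS = Q @ Sigd
        E = float(mud @ Q @ mud + 0.5 * np.trace(QS @ QS))
        L += np.sqrt(E) * dt
        mu = mu + mud * dt       # forward-Euler update of the mean
        Sig = Sig + Sigd * dt    # forward-Euler update of the covariance
    return L
```

For the scalar (n = 1) O-U case, the routine reproduces the value obtained from the known closed-form mean and variance, which provides a convenient correctness check.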
Given important properties of the covariance matrix eigenvalues (see, e.g., [35]), it is useful to express Equation (19) and the information length as a function of these eigenvalues. This is done in the following Corollary. Corollary 1. Let ϕ_i(t) (i = 1, ..., n) be the eigenvalues of the covariance matrix Σ, and let x̃ = P^T ⟨x(t)⟩, where P is an orthonormal matrix whose column vectors are linearly independent eigenvectors of Q = Σ^{-1}. Then the information length (11) can be rewritten as Equation (20). Proof. The proof follows straightforwardly from the fact that Σ is a symmetric matrix which can be diagonalized by the orthonormal matrix P such that P^T Σ^{-1} P = Φ. Here, Φ is the diagonal matrix whose entries are 1/ϕ_i(t) for i = 1, 2, . . . , n (recall that ϕ_i(t) is the i-th eigenvalue of Σ, so 1/ϕ_i(t) is the i-th eigenvalue of Σ^{-1}). This gives us Equation (21), which finishes the proof.
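The diagonalisation used in Corollary 1 is easy to verify numerically: for any symmetric positive-definite Σ, an orthonormal eigenvector matrix P satisfies P^T Σ^{-1} P = diag(1/ϕ_i). A short sketch (the particular Σ is an arbitrary illustrative choice):

```python
import numpy as np

# Any symmetric positive-definite covariance matrix serves as an illustration.
Sigma = np.array([[0.20, 0.05],
                  [0.05, 0.10]])
phi, P = np.linalg.eigh(Sigma)          # phi: eigenvalues of Sigma; P orthonormal
Phi = P.T @ np.linalg.inv(Sigma) @ P    # should equal diag(1/phi_i)
```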
It is useful to check that Equation (20) reproduces the previous result for the O-U process [36], where β = 1/(2⟨(x − ⟨x⟩)^2⟩) is the inverse temperature and β′ denotes its time derivative. To show this, we note that for the O-U process the covariance matrix is a scalar (n = 1) with the value Σ = 1/(2β) = ϕ(t), and thus Q = 1/ϕ(t) = 2β, while x̃(t) = ⟨x(t)⟩. Thus,

E = 2β (∂_t⟨x⟩)^2 + (1/2)(β′/β)^2.    (22)

In sum, for the O-U process, the square of the information velocity (expression (22)) increases with the 'roughness' of the process, as quantified by the squared ratio of the rate of change of the inverse temperature (or precision) to the precision itself, plus a term given by this precision times the squared rate of change of the mean.

Kramers Equation
In this section, we apply the results of Section 3 to the Kramers equation for a harmonically bound particle [19,37]. As noted in the Introduction, we investigate the behaviour of the information length when varying various parameters and initial conditions, to elucidate how the information geometry is affected by the damping, oscillations, strength of the stochastic noise and initial mean values.
Consider the Kramers equation

dx/dt = v,  dv/dt = −γv − ω^2 x + ξ.    (23)

Here, ω is the natural frequency and γ is the damping constant, both positive real numbers; ξ is a Gaussian white-noise acting on v with zero mean ⟨ξ(t)⟩ = 0 and the statistical property

⟨ξ(t)ξ(t′)⟩ = 2D δ(t − t′).    (24)

Comparing Equations (23) and (24) with Equations (2) and (3), we note that x_1 = x, x_2 = v, ξ_1 = 0, ξ_2 = ξ, D_11 = D_12 = 0 and D_22 = D, while the matrix A for (23) has the elements A_11 = 0, A_12 = 1, A_21 = −ω^2, A_22 = −γ. Thus, the eigenvalues of A are λ_{1,2} = (−γ ± √(γ^2 − 4ω^2))/2. To find the information length for the system (23), we use Proposition 1 and Theorem 1. First, Proposition 1 requires the computation of the matrix exponential e^{At}, which involves rather long algebra with the help of [38]. The result can be written compactly as

e^{At} = [e^{λ_1 t}(A − λ_2 I) − e^{λ_2 t}(A − λ_1 I)]/(λ_1 − λ_2).    (25)

Here, I ∈ R^{n×n} is the identity matrix. Similarly, we can obtain the covariance matrix Σ(t) from Equation (7) (Equation (26)). Using Equations (25) and (26) in Equations (6) and (7), we have the time-dependent (joint) PDF (5) at any time t for our system (23) and (24). To calculate Equation (11) with the help of Equations (25) and (26), we perform numerical simulations (integrations) for various parameters in Equations (23) and (24), as well as various initial conditions. While we have simulated many different cases, for illustration we show some representative cases by varying D, ω, γ and ⟨x(0)⟩, ⟨v(0)⟩ in Sections 4.1-4.3 and Appendix A, respectively, for the same initial covariance matrix Σ(0) with elements Σ_11(0) = Σ_22(0) = 0.01 and Σ_12(0) = Σ_21(0) = 0. Note that the initial marginal distributions p(x, 0) and p(v, 0) are then Gaussian with the same variance 0.01. Results in the limit ω → 0 are presented in Section 4.4.

Varying D

Figure 1 shows the results of varying D ∈ (0.0005, 0.04) for the fixed parameters γ = 2 and ω = 1. The initial joint PDFs are Gaussian with the fixed mean values ⟨x(0)⟩ = −0.5 and ⟨v(0)⟩ = 0.7; as noted above, the covariance matrix Σ(0) has elements Σ_11(0) = Σ_22(0) = 0.01 and Σ_12(0) = Σ_21(0) = 0.
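The spectral structure of A underlying the matrix exponential can be checked numerically. The sketch below builds A for this 2 × 2 case, forms e^{At} from the Cayley-Hamilton (Lagrange interpolation) formula valid for distinct eigenvalues (i.e., away from the critical damping γ = 2ω), and compares it against a general-purpose matrix exponential; the parameter values are illustrative.

```python
import numpy as np
from scipy.linalg import expm

def kramers_A(omega, gamma):
    """Drift matrix of Equation (23): x' = v, v' = -omega^2 x - gamma v."""
    return np.array([[0.0, 1.0], [-omega**2, -gamma]])

def expAt_closed(omega, gamma, t):
    """e^{At} for distinct eigenvalues l_{1,2} = (-gamma +/- sqrt(gamma^2
    - 4 omega^2))/2, via Cayley-Hamilton (Lagrange interpolation):
    e^{At} = [e^{l1 t}(A - l2 I) - e^{l2 t}(A - l1 I)] / (l1 - l2)."""
    A = kramers_A(omega, gamma)
    disc = np.sqrt(complex(gamma**2 - 4.0 * omega**2))
    l1, l2 = (-gamma + disc) / 2.0, (-gamma - disc) / 2.0
    I = np.eye(2)
    M = (np.exp(l1 * t) * (A - l2 * I) - np.exp(l2 * t) * (A - l1 * I)) / (l1 - l2)
    return M.real  # imaginary parts cancel for real A
```

The formula covers both the underdamped (γ < 2ω, complex eigenvalues) and overdamped (γ > 2ω, real eigenvalues) regimes.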
Consequently, at t = 0, the marginal distributions p(x, 0) and p(v, 0) are Gaussian PDFs with the same variance 0.01 and the mean values ⟨x(0)⟩ = −0.5 and ⟨v(0)⟩ = 0.7, respectively. Figure 1a,b show the snapshots of the time-dependent joint PDF p(x, t) (in contour plots) for the two different values D = 0.0005 and D = 0.04, respectively. The black solid line represents the phase portrait of the mean values of x(t) and v(t), while the red arrows display the direction of increasing time. Note that in Figure 1b, only some of the initial snapshots of the PDFs are shown for clarity, given the large overlap between different PDFs. Figure 1c,d show the time-evolution of the information velocity E(t) and the information length L(t), respectively, for different values of D ∈ (0.0005, 0.04). It can be seen that the system approaches a stationary (equilibrium) state for t ≳ 20 for all values of D, with L(t) approaching constant values (recall that L(t) does not change in a stationary state). Therefore, we approximate the total information length as L∞ = L(t = 50), for instance. Finally, the total information length L∞ = L(t = 50) is shown in Figure 1e. We determine the dependence of L∞ on D by fitting the exponential function L∞(D) = 7.84e^{−329.05D} + 11.21e^{−11.86D} (shown as the red solid line).

Varying ω or γ
We now explore how the results depend on the two parameters ω and γ, associated with oscillation and damping, respectively. To this end, we use D = 0.0005 and the same initial conditions as in Figure 1, but vary ω ∈ (0, 2) and γ ∈ (0, 6) in Figures 2 and 3, respectively. Specifically, in the different panels of these figures, we show the snapshots of the joint PDF p(x, t), the time-evolutions of E(t) and L(t) for different values of ω ∈ (0, 2) and γ ∈ (0, 6), and L∞ against either ω or γ. From Figures 2e and 3e, we can see that the system is in a stationary state for sufficiently large t = 10 and t = 100, respectively. Thus, we use L∞ = L(t = 10) = L(10) in Figure 2f,g and L∞ = L(t = 100) = L(100) in Figure 3f,g.

Varying ⟨x(0)⟩ or ⟨v(0)⟩
To elucidate the information geometry associated with the Kramers equation (Equations (23) and (24)), we now vary the initial mean values. In Figure 4, we use the same initial covariance matrix Σ(0) as in Figures 1-3, D = 0.0005, ω = 1 and a few different values of γ (above/below/at the critical value γ = 2). We note that the information geometry near a non-equilibrium point is studied in Appendix A. Specifically, snapshots of p(x, t) are shown in Figure 4a,f for γ = 2.5 (above its critical value γ = 2 = 2ω) while those in Figure 4b,g are for γ = 0.1, below the critical value 2. By approximating L∞ = L(t = 100), we then show how L∞ depends on ⟨x(0)⟩ and ⟨v(0)⟩ for different values of γ in Figure 4d,e and Figure 4h,i, respectively. Figure 4d,e show the presence of a minimum in L∞ at the equilibrium ⟨x(0)⟩ = 0 (recall ⟨v(0)⟩ = 0); L∞ is a linear function of ⟨x(0)⟩ for |⟨x(0)⟩| ≳ 0.1, which can be described as L∞(⟨x(0)⟩, γ) = h(γ)|⟨x(0)⟩| + f(γ). Here, h(γ) and f(γ) are functions of γ for a fixed ω, representing the slope and the y-axis intercept, respectively. A non-zero value of L∞ at ⟨x(0)⟩ = 0 is caused by the adjustment (oscillation and damping) of the width of the PDFs in time, due to the disparity between the widths of the initial and equilibrium PDFs (see Figure 4b). In other words, even though the mean values remain in equilibrium for all time, [⟨x(0)⟩, ⟨v(0)⟩]^T = lim_{t→∞} ⟨x(t)⟩ = [0, 0]^T, the information length (11) depends on the covariance matrix Σ, which changes from its initial value to the final equilibrium value

lim_{t→∞} Σ(t) = diag(D/(γω^2), D/γ).

On the other hand, L∞ against ⟨x(0)⟩ shows parabolic behaviour for |⟨x(0)⟩| < 0.1 in Figure 4e. This is caused by the finite width √Σ_11(0) = √Σ_22(0) = 0.1 of the initial p(x, 0); we see that |⟨x(0)⟩| < 0.1 is within the uncertainty of the initial p(x, 0).
Similarly, Figure 4h,i exhibit a minimum in L∞ at the equilibrium ⟨v(0)⟩ = 0 (recall ⟨x(0)⟩ = 0 in this case); L∞ is a linear function of ⟨v(0)⟩ for |⟨v(0)⟩| ≳ 0.1, described by L∞(⟨v(0)⟩, γ) = H(γ)|⟨v(0)⟩| + F(γ) (and again parabolic for |⟨v(0)⟩| < 0.1, see Figure 4i). Here again, H(γ) and F(γ) are functions of γ for a fixed ω, representing the slope and the y-axis intercept, respectively.
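As a consistency check on the long-time behaviour discussed above, the stationary covariance of the system (23) and (24) must solve the Lyapunov equation AΣ + ΣA^T + 2D_mat = 0 (obtained by setting Σ′ = 0 in the covariance evolution), whose solution for this system is the standard equilibrium result Σ(∞) = diag(D/(γω^2), D/γ). A short numerical verification, using the parameter values of Figure 4:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Stationary covariance solves A Sigma + Sigma A^T + 2 D_mat = 0.
D, omega, gamma = 0.0005, 1.0, 2.0
A = np.array([[0.0, 1.0], [-omega**2, -gamma]])
D_mat = np.array([[0.0, 0.0], [0.0, D]])        # noise acts on v only
Sigma_inf = solve_continuous_lyapunov(A, -2.0 * D_mat)
Sigma_expected = np.diag([D / (gamma * omega**2), D / gamma])
```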

The Limit ω → 0

When the natural frequency ω = 0 (i.e., a damped-driven system like the O-U process [36]) in Equation (23), the two eigenvalues of the matrix A become λ_1 → −γ and λ_2 → 0. It then easily follows that

e^{At} = [[1, (1 − e^{−γt})/γ], [0, e^{−γt}]],

and the elements of Σ(t) follow from Equation (7). To investigate the case ω → 0, we consider the scan over D ∈ (0.0005, 0.04) for the same parameter value γ = 2 and the same initial conditions as in Figure 1, apart from using ω = 0 instead of ω = 1. Figure 5 shows the corresponding results.
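The closed form of e^{At} in the ω → 0 limit is elementary to verify against a general-purpose matrix exponential; the values of γ and t below are illustrative choices.

```python
import numpy as np
from scipy.linalg import expm

# For omega -> 0 the drift matrix reduces to A = [[0, 1], [0, -gamma]], whose
# matrix exponential has the elementary closed form checked here.
gamma, t = 2.0, 0.9
A = np.array([[0.0, 1.0], [0.0, -gamma]])
expAt = np.array([[1.0, (1.0 - np.exp(-gamma * t)) / gamma],
                  [0.0, np.exp(-gamma * t)]])
```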

Concluding Remarks
We have presented theoretical results for the time-dependent PDFs and the information length for n-th order linear autonomous stochastic processes, which can be applied to a variety of practical problems. In particular, the information length diagnostic was found as a function of the mean vector and covariance matrix; the latter was further expressed in terms of the covariance matrix eigenvalues. A specific application was made to a harmonically bound particle system with the natural oscillation frequency ω, subject to a damping γ and a Gaussian white-noise (Kramers equation). We investigated how the information length depends on ω and γ, elucidating the role of critical damping γ = 2ω in information geometry. The fact that the information length tends to take its minimum value near the critical damping can be viewed as a simplification of the dynamics, and thus a decrease in information change, due to the merging of the two characteristic time scales associated with ω and γ into a single value. On the other hand, the information length in the long time limit was shown to preserve the linear geometry associated with the Gaussian statistics in a linear stochastic process, as in the case of the O-U process.
Future work includes the exploration of our results when applied to high-dimensional processes and the extension of our work to more general stochastic noise (e.g., with a finite correlation time), non-autonomous systems, or non-linearly coupled systems. In particular, it will be of interest to look for a geodesic solution in non-autonomous systems [9] with the help of an external force, optimization or guided self-organization (multi-agent systems), as well as to elucidate the role of critical damping and resonances in self-organization. In addition, it would be interesting to utilize the results introduced in [39] to predict the bound on the evolution of any observable for the Kramers problem (23), and to compare it with a natural observable in such a system, for instance the energy.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Analysis for Non-Zero Fixed Initial Conditions
In Section 4.3, we analysed the behaviour of the information geometry associated with the Kramers equation (Equations (23) and (24)) for different γ ∈ (0, 2.5) near the equilibrium point ⟨x(0)⟩ = ⟨v(0)⟩ = 0. To this end, we plotted L∞ when varying ⟨x(0)⟩ and ⟨v(0)⟩ for a fixed ⟨v(0)⟩ = 0 and ⟨x(0)⟩ = 0, respectively. In this Appendix, we show how such information geometry changes near a non-equilibrium point by scanning over ⟨x(0)⟩ and ⟨v(0)⟩ for a fixed non-zero ⟨v(0)⟩ = 0.7 and ⟨x(0)⟩ = −0.5, respectively. We show that the use of non-zero fixed initial conditions changes the location of the minimum of L∞ depending on γ. Here, we use the same parameter values D = 0.0005, ω = 1, Σ_12(0) = Σ_21(0) = 0 and Σ_11(0) = Σ_22(0) = 0.01. First, snapshots of p(x, t) are shown in Figure A1a,f for γ = 2.5 (above its critical value γ = 2 = 2ω), while those in Figure A1b,g are for γ = 0.1, below the critical value 2. It is important to notice the non-symmetric behaviour of the trajectories of the system for γ ≠ 0: the trajectories in Figure A1a,f vary asymmetrically with the initial conditions, in comparison with the results shown in Figure 4a,f. By approximating L∞ = L(t = 100), we then show how L∞ depends on ⟨x(0)⟩ and ⟨v(0)⟩ for different values of γ in Figure A1c,d and Figure A1h,i, respectively. Of prominence in Figure A1c,d is the presence of a distinct minimum in L∞ at a particular value ⟨x(0)⟩ = x_c, with L∞ increasing linearly with |⟨x(0)⟩ − x_c| for sufficiently large |⟨x(0)⟩ − x_c|; similarly, Figure A1h,i show a distinct minimum in L∞ at a particular value ⟨v(0)⟩ = v_c, with L∞ increasing linearly with |⟨v(0)⟩ − v_c| for sufficiently large |⟨v(0)⟩ − v_c|.