Comparing Information Metrics for a Coupled Ornstein–Uhlenbeck Process

It is often the case when studying complex dynamical systems that a statistical formulation can provide the greatest insight into the underlying dynamics. When discussing the behavior of such a system which is evolving in time, it is useful to have the notion of a metric between two given states. A popular measure of information change in a system under perturbation has been the relative entropy of the states, as this notion allows us to quantify the difference between states of a system at different times. In this paper, we investigate the relaxation problem given by a single and coupled Ornstein–Uhlenbeck (O-U) process and compare the information length with entropy-based metrics (relative entropy, Jensen divergence) as well as others. By measuring the total information length in the long time limit, we show that it is only the information length that preserves the linear geometry of the O-U process. In the coupled O-U process, the information length is shown to be capable of detecting changes in both components of the system even when other metrics would detect almost nothing in one of the components. We show in detail that the information length is sensitive to the evolution of subsystems.


Introduction
Describing natural systems statistically can give great insight into their dynamics when the uncertainty or the number of degrees of freedom is too high to do otherwise. Measures of information change can be particularly useful in understanding the evolution of a system under perturbation, or in comparing data (e.g., see [1]). Here, by information, we specifically refer to a measurable, statistical difference between the states of a system, defined by probability density functions (PDFs), and avoid any of the more diaphanous definitions of the term. The statistical difference can be quantified by assigning a metric to probability, which then endows a stochastic system with a geometric structure. Previously, different metrics (e.g., Refs. [2][3][4][5][6][7][8][9][10]) have been considered depending on the question of interest. A popular measure of the information change in a system is entropy, which measures the uncertainty or 'disorder' of the system. More specifically, it is a measure of the number of states that are accessible from the current state. Comparing entropy at different times gives a measure of the difference in information for the system, called the relative entropy, which can be used as a metric. Another example is the Wasserstein metric, which was used to optimize transport cost in the optimal transport problem [4,6]; for Gaussian PDFs, the Wasserstein metric is defined in the product space consisting of Euclidean space and positive symmetric matrices for the mean and variance, respectively (e.g., see [2]). The link between the Fisher information [8] and the Wasserstein distance was made in [6], where the integral of the Fisher information along the Ornstein-Uhlenbeck semigroup was shown to be the same as the Wasserstein distance. Furthermore, [1] stated that relative entropy was the integral of Fisher information along the same path.
However, the way in which the relative entropy has mostly been used in the past lacks a sense of locality, as it focuses on quantifying the difference between two given PDFs, for instance, PDFs at times $t_1$ and $t_2$. As a result, it is independent of the intermediate PDFs between $t_1$ and $t_2$ (the history/path of a system), and thus can only inform us about changes which affect the overall structure of the system. The work of [11] was, in part, a search for a disequilibrium component for a statistical complexity measure (SCM). In short, an SCM is a measure of both the 'order' and 'disorder' of a system, which can help to reveal hidden structures of a disordered system. The authors of [11] proposed several metrics for the 'disorder' or disequilibrium component of the SCM.
In this paper, we compare several of the metrics proposed in [11] with the information length $L$ [12][13][14][15][16][17][18][19][20]. The information length, proportional to the time integral of the square root of the infinitesimal relative entropy, depends on the intermediate states between $t = t_1$ and $t = t_2$ and is thus a Lagrangian measure. Also, the formulation of the information length allows us to measure local change for the system in time. $L_\infty$, the total information length over the entire evolution from $t = t_1 = 0$ to $t = t_2 \to \infty$, was shown to be useful for quantifying the proximity of any initial PDF to a final attractor of a dynamical system. For instance, for the Ornstein-Uhlenbeck (O-U) process [16,18], $L_\infty$ was shown to take its minimum value at the stable equilibrium point and to increase linearly with the distance of the mean position of an initial PDF from the stable equilibrium point. This linear dependence shows that the information length preserves the linear geometry of the underlying Gaussian process. In this paper, we will show that this linear relation is lost for other metrics (e.g., relative entropy, Jensen divergence). Note that for a chaotic attractor, $L_\infty$ varies sensitively with the mean position of a narrow initial PDF, taking its minimum value at the most unstable point [21]. This sensitive dependence of $L_\infty$ on the initial PDF is similar to a Lyapunov exponent.
We note that the O-U process is a prototypical relaxation problem and is particularly useful to study, as its attractor provides a natural equilibrium state. It can model many stochastic systems which relax to a stable equilibrium. The solution to this process is Gaussian, and so is analytically tractable, permitting a detailed investigation under changes of parameters. We first compare different metrics for a single O-U process and then move to a coupled O-U process. The O-U process is a well-studied model, though less so for the coupled system. Our focus is to compare different metrics and to see whether the information length may be more revealing of the behavior of the components of the coupled system, as well as of the overall system. The remainder of this paper is organized as follows. Section 2 provides the definition of different metrics. Section 3 is devoted to the discussion of a single O-U process. Section 4 provides analytical solutions to the coupled O-U process and Section 5 compares different metrics for the coupled O-U process. Conclusions are found in Section 6. In Appendix A, we present how to solve the Fokker-Planck equation(s) numerically by using a method that is second-order accurate in time and compare analytical results with numerical results. Appendix B comments on the Langevin equation for our coupled O-U process.

Information Length and Other Metrics
We consider a PDF P(x, t) for a stochastic variable x in the following.

Information Length
The information length $L(t)$ between time 0 and $t$ is given by

$$L(t) = \int_0^t \frac{dt_1}{\tau(t_1)} = \int_0^t dt_1 \sqrt{\int dx\, \frac{1}{P(x, t_1)} \left[\frac{\partial P(x, t_1)}{\partial t_1}\right]^2}, \qquad (1)$$

where $1/|\tau(t)|^2$ is the second moment given by

$$\frac{1}{|\tau(t)|^2} = \int dx\, \frac{1}{P(x, t)} \left[\frac{\partial P(x, t)}{\partial t}\right]^2. \qquad (2)$$

Here, $\tau(t)$ has the unit of time while $L$ has no dimension. The parameter $\tau(t)$ is the characteristic timescale of the system, and quantifies the correlation time for the system [15]. Hence, $1/\tau(t)$ is the rate of change of the information in time. Integrating $1/\tau(t)$ over $[0, t]$ gives the total number of statistically different states that a system passes through in time. We note that $L$ quantifies the information change in time through the root-mean-squared fluctuating energy, using the second moment of the partial derivative with respect to time. When the parameters governing a PDF are known, the information length $L(t)$ can be written in terms of the Fisher information metric [8,12,15].
As noted in Section 1, L(t) is a Lagrangian quantity and has the property of being a local measure, being sensitive to how P(x, t) evolves at different x in time. In comparison with entropy which is independent of the spatial (x) gradient of a PDF, it is this property that may elevate L(t) above entropy in revealing micro-scale interactions within a system.
The discrete versions of Equations (1) and (2) are obtained by replacing the time and space integrals with sums over the grid. Here, $i$ and $j$ represent the discrete time and spatial point, respectively; $P_i^j$ is the discrete version of $P(x, t)$, $h = t/n$ is the time step, and $s$ is the spatial step.
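As a rough illustration of this discretization, the following Python sketch accumulates $h/\tau_i$ over successive PDF snapshots. The function names `gaussian` and `information_length`, the forward difference in time, and the use of the time-averaged PDF in the denominator are assumptions made for this sketch, not necessarily the exact scheme used in the paper. For a Gaussian of fixed width $\sigma$ whose mean moves monotonically over a distance $d$, the exact information length is $d/\sigma$, which gives a simple check.

```python
import math

def gaussian(x, mean, sigma):
    """Normalized Gaussian PDF evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def information_length(pdfs, h, s):
    """Approximate L = sum_i h / tau_i from snapshots pdfs[i][j].

    i indexes time (step h) and j indexes space (step s).
    Assumption of this sketch: dP/dt is a forward difference and the
    denominator P is the average of the two neighbouring snapshots.
    """
    total = 0.0
    for p_now, p_next in zip(pdfs[:-1], pdfs[1:]):
        rate_sq = 0.0  # discrete estimate of 1 / tau_i^2
        for a, b in zip(p_now, p_next):
            p_mid = 0.5 * (a + b)
            if p_mid > 0.0:  # skip points where the PDF has underflowed
                dp_dt = (b - a) / h
                rate_sq += s * dp_dt * dp_dt / p_mid
        total += h * math.sqrt(rate_sq)
    return total

# Example: unit-width Gaussian whose mean drifts from 0 to 2 at unit speed.
s, h = 0.05, 0.01
x = [-8.0 + s * j for j in range(361)]  # spatial grid from -8 to 10
pdfs = [[gaussian(xj, mean=h * i, sigma=1.0) for xj in x] for i in range(201)]
L = information_length(pdfs, h, s)
print(L)  # should be close to d / sigma = 2
```

Averaging the two snapshots in the denominator keeps each step symmetric in time; a one-sided choice would also converge as $h \to 0$.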

Other Metrics
Here, we list the metrics taken from [11]. We will calculate each metric relative to the initial state PDF $P(x, 0)$ at $t = 0$ in order to compare their time-evolution with that of the information length. That is, the metrics below are based on comparing two PDFs $P(x, 0)$ and $P(x, t)$. For convenience, the metrics are given both for the continuous process and for the discrete approximation (which is used for numerical calculation), using $i$ and $j$ as indices representing time and space, respectively, and $s$ as the spatial step, as above. The reference probability $P(x, 0)$ [$P_0$] will be the initial PDF while $P(x, t)$ [$P_i$] is the PDF at time $t$ [$i$].

Euclidean norm
Applying our standard notion of distance in Euclidean space seems like a natural extension. However, it quickly becomes apparent that the statistical space of stochastic systems is rarely well described by Euclidean metrics. We include it mostly as a base case: whilst this formulation seems appealing, it does not yield illuminating details about the disequilibrium of the system. We will use it as an example of a poor measure of information change.

Wootters' Distance
This metric, like the notion of statistical distance itself, originates in quantum information theory [9]. However, as quantum information theory is purely statistical in formulation, it can be applied to any system defined by a PDF. Fundamentally, this metric is based on the principle that any finite number of measurements on a stochastic system will yield results that may not be exactly the same as the underlying probability distributions. It would be impossible to distinguish two states whose real underlying probabilities differ by less than a typical fluctuation of the error of measurement. This intrinsically defines a resolution for the system. The Wootters' distance was shown to be a monotone transformation of the Hellinger distance [22].

Kullback-Leibler relative entropy
Kullback-Leibler relative entropy was first introduced by Solomon Kullback and Richard Leibler [10], and sometimes is referred to as the Kullback-Leibler divergence. It represents a measure of the difference between a probability distribution and some other reference probability distribution. Whilst a useful tool, it is not strictly a metric as it does not satisfy the triangle inequality. It is however used in the definition of some other quantities, such as the mutual information of two co-varying random variables, and the Jensen divergence.

Jensen Divergence
The Jensen divergence is simply the symmetric version of the Kullback-Leibler divergence. Often it is referred to as the Jensen distance, and the square root of this quantity can be shown to be a metric [11], which can allow us to examine the statistical geometry of the system. The Jensen divergence is the mutual information of a random variable x, with a mixture distribution from P(x, 0) and P(x, t), and a binary indicator variable used to build the distribution. In other words it is a measure of the mutual dependence of x on the way you construct the mixture, and thus quantifies the amount of information difference between the two distributions.
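The four comparison metrics can be sketched for PDFs sampled on a uniform grid as follows. These are the commonly used textbook forms (Euclidean norm, arccosine of the Bhattacharyya overlap for Wootters' distance, K-L relative entropy, and the mixture-based Jensen divergence); their normalization and the direction of the K-L divergence are assumptions of this sketch and may differ from the exact definitions used in the paper. All function names are hypothetical.

```python
import math

def euclidean_norm(p, q, s):
    """sqrt( integral (p - q)^2 dx ) on a grid with spacing s."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) * s)

def wootters_distance(p, q, s):
    """arccos of the overlap integral of sqrt(p * q)."""
    overlap = sum(math.sqrt(a * b) for a, b in zip(p, q)) * s
    return math.acos(min(1.0, overlap))  # guard against round-off above 1

def kl_relative_entropy(p, q, s):
    """integral p ln(p / q) dx; points where either PDF is zero are skipped."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0.0 and b > 0.0) * s

def jensen_divergence(p, q, s):
    """Average K-L divergence of p and q from their equal-weight mixture."""
    m = [0.5 * (a + b) for a, b in zip(p, q)]
    return 0.5 * kl_relative_entropy(p, m, s) + 0.5 * kl_relative_entropy(q, m, s)

def gaussian(x, mean, sigma):
    """Normalized Gaussian PDF evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# Example: two unit-width Gaussians with means 0 and 1.
s = 0.05
x = [-10.0 + s * j for j in range(421)]  # grid from -10 to 11
p = [gaussian(xj, 0.0, 1.0) for xj in x]
q = [gaussian(xj, 1.0, 1.0) for xj in x]
kl = kl_relative_entropy(p, q, s)
js = jensen_divergence(p, q, s)
wd = wootters_distance(p, q, s)
en = euclidean_norm(p, q, s)
print(kl, js, wd, en)
```

For equal-width Gaussians the K-L relative entropy is $(\Delta\mu)^2/(2\sigma^2)$, so it grows quadratically with the separation, whereas the Jensen divergence in this form is bounded above by $\ln 2$.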

The O-U Process
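The one-dimensional O-U process is based on the Langevin equation

$$\frac{dx}{dt} = -\gamma (x - \mu) + \xi, \qquad (9)$$

where $x$ is the stochastic variable (e.g., position, velocity, etc.), $\gamma$ is the damping constant, $\mu$ is the position of the attractor for the system, and $\xi$ is a $\delta$-correlated, Gaussian-distributed stochastic forcing, i.e.,

$$\langle \xi(t)\, \xi(t') \rangle = 2D\, \delta(t - t'), \qquad (10)$$

where $D$ is the strength of the stochastic forcing. The corresponding Fokker-Planck equation [23,24] is given by

$$\frac{\partial P}{\partial t} = \frac{\partial}{\partial x}\left[\gamma (x - \mu) P\right] + D \frac{\partial^2 P}{\partial x^2}, \qquad (11)$$

where the solution $P = P(x, t)$ is the time-dependent PDF which describes the evolution of the system. It can be shown that the solution to Equation (11) is given by [13]

$$P(x, t) = \sqrt{\frac{\beta}{\pi}} \exp\left[-\beta (x - \langle x \rangle)^2\right], \qquad (12)$$

given the initial condition

$$P(x, 0) = \sqrt{\frac{\beta_0}{\pi}} \exp\left[-\beta_0 (x - x_0)^2\right], \qquad (13)$$

where $\beta = 1/(2\langle (x - \langle x \rangle)^2 \rangle)$ and $\langle x \rangle$ in Equation (12) represent the inverse temperature and the mean value of $x$, respectively, and $\beta_0$ and $x_0$ in Equation (13) are the values of $\beta$ and $\langle x \rangle$ at $t = 0$, respectively.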

Information Length
In [13], we showed that the information length $L(t)$ for the O-U process is given by Equation (14), where $y = r^2 + qT$, $r = 2\beta_0 D - \gamma$, $q = \beta_0 \gamma x_0^2$, and $T = \beta_0 D (e^{2\gamma t} - 1) + \gamma$. Here, $y_i$ is $y$ evaluated at the initial time ($t = 0$ in our case), $y_f$ is $y$ evaluated at the final time, and $x_0$ is the initial mean position.
Let the integral in the last term of Equation (14) be $H$, and let $r \neq q \neq 0$. Then Equation (14) can be written in closed form, and the result is continuous through $q = r$. The special cases $q = 0$ and $r = 0$ can be computed directly.

Jensen Divergence
By using Equations (12) and (13) in Equation (8), we obtain the Jensen divergence between $P(x, 0)$ and $P(x, t)$ for the O-U process.

Comparison
Figure 1 shows the final value for each metric as we vary the initial position $x_0$ of the system. The total information length $L_\infty = L(t \to \infty)$, Wootters' distance, K-L relative entropy and Jensen divergence against $x_0$ are shown in blue, orange, green and red, respectively, in the long time limit as $t \to \infty$. Note that the green and red lines overlap. It is notable in Figure 1 that a linear relation between the metric and the initial mean value $x_0$ is obtained only for the information length. That is, it is only the information length that preserves the linear geometry underlying a linear stochastic process. For all other metrics, this linear relation is lost. We show in Appendix A that our analytical metrics in Figure 1 are in good agreement with those calculated directly from the numerical solutions to the Fokker-Planck Equation (11) by time-stepping (see Figure A1).
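The linear dependence of $L_\infty$ on $x_0$ can be checked with a short numerical sketch. For a Gaussian PDF, $1/\tau^2$ reduces to $\frac{1}{2}(\dot\beta/\beta)^2 + 2\beta \langle \dot x \rangle^2$, which in terms of the variance $v = 1/(2\beta)$ is $\dot v^2/(2v^2) + \langle \dot x \rangle^2 / v$. The code below integrates the square root of this rate in time using the standard O-U mean and variance with the attractor placed at zero. The function name, the parameter values, and the trapezoidal quadrature are assumptions of this sketch rather than the procedure behind Figure 1.

```python
import math

def total_information_length(x0, beta0, gamma=1.0, D=1.0, t_end=30.0, n=30000):
    """Trapezoidal estimate of L(t_end) for a single O-U process (attractor at 0)."""
    var0 = 1.0 / (2.0 * beta0)  # initial variance
    var_eq = D / gamma          # equilibrium variance, i.e. beta = gamma / (2 D)

    def rate(t):
        """1 / tau(t) for a Gaussian with the O-U mean and variance."""
        decay = math.exp(-2.0 * gamma * t)
        var = var0 * decay + var_eq * (1.0 - decay)
        dvar = -2.0 * gamma * (var0 - var_eq) * decay
        dmean = -gamma * x0 * math.exp(-gamma * t)
        return math.sqrt(dvar * dvar / (2.0 * var * var) + dmean * dmean / var)

    h = t_end / n
    total = 0.5 * (rate(0.0) + rate(t_end))
    total += sum(rate(i * h) for i in range(1, n))
    return h * total

# Example: start with the equilibrium width (beta0 = gamma / (2 D)) and vary x0.
for x0 in (2.0, 4.0, 8.0):
    print(x0, total_information_length(x0, beta0=0.5))
```

When the initial width already equals the equilibrium width, only the mean evolves and the integral gives $L_\infty = x_0 \sqrt{\gamma / D}$ exactly, so doubling $x_0$ doubles $L_\infty$.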

The Coupled O-U Process
We now consider the coupled system of Fokker-Planck equations, Equations (22) and (23), where $D$ is the strength of a short-correlated Gaussian noise given by Equation (10). These equations are a pair of O-U processes, linked by coupling constants $f_0$ and $g_0$. The couplings $f_0$ and $g_0$ are due to dichotomous noise [25] (see Appendix B for the Langevin equation corresponding to Equations (22) and (23)). We choose a coupled system like this one to examine the localized dynamics of these interacting sub-processes. This system could model any process for which there are two competing components relaxing to an equilibrium, like evaporation in a closed system, or a reversible chemical reaction.
Since we are mainly interested in the relaxation process from non-equilibrium initial states, we choose different initial conditions for $P_1$ and $P_2$ while, for simplicity, considering the case where $\gamma_1 = \gamma_2 = \gamma$ and $f_0 = g_0$, for which the Fokker-Planck Equations (22) and (23) reduce to Equations (24) and (25). Specifically, as initial conditions, we use two different Gaussian PDFs, Equations (26) and (27), where $\beta_{10}$ and $\beta_{20}$ are the initial inverse temperatures for $P_1$ and $P_2$, respectively. We fix the initial mean of $P_1$ to be zero, while the initial $P_2$ may have an arbitrary mean value $x_0$. We also note that as $t \to \infty$, $P_1$ and $P_2$ approach the same PDF.
To solve Equations (24) and (25), we take the Fourier transform [$\tilde{P}_m(k, t) = \int dx\, e^{ikx} P_m(x, t)$ for $m = 1, 2$] and use the method of characteristics to recast Equations (24) and (25) as Equations (28) and (29). Here $\frac{d}{dt} = \frac{\partial}{\partial t} + \frac{dk}{dt} \frac{\partial}{\partial k}$ is the total derivative along the characteristic, whose solution is written in terms of the initial wavenumber $k_0 = k(0)$. We solve Equations (28) and (29) in terms of new variables, defined for $m = 1, 2$ in Equation (32), which simplify the coupled equations to Equations (33) and (34). The solutions to Equations (33) and (34) involve two constants $a$ and $b$. To determine $a$ and $b$, we take the Fourier transform of the initial conditions, Equations (26) and (27), to obtain Equations (37) and (38). Evaluating Equation (32) for $m = 1, 2$ at $t = 0$ and equating the result to Equations (37) and (38) then fixes $a$ and $b$. Using Equation (31) in Equation (32), we obtain an explicit expression for $\tilde{P}_m$ ($m = 1, 2$). Finally, taking the inverse Fourier transform [$P_m(x, t) = \frac{1}{2\pi} \int dk\, e^{-ikx} \tilde{P}_m(k, t)$ for $m = 1, 2$] and performing several Gaussian integrals, we obtain Equations (42) and (43) for $m = 1, 2$. We can check that at $t = 0$, Equations (42) and (43) recover Equations (26) and (27). On the other hand, in the limit of $t \to \infty$, Equations (42) and (43) give Equation (44), which is the stationary solution to a single O-U process with $\beta_m(t \to \infty) = \gamma/(2D)$. We note that the total PDF $P = P_1 + P_2$ is the solution of this single O-U process with the initial condition given by the sum of Equations (26) and (27).
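For orientation, the same Fourier-characteristic procedure can be written out for the uncoupled, single O-U Fokker-Planck equation with the attractor at $\mu = 0$. This is a sketch under that simplifying assumption and is not the coupled calculation itself; it only illustrates how the characteristic and the Gaussian factor arise.

```latex
\begin{align}
% Single O-U Fokker-Planck equation (assumed form, \mu = 0)
\frac{\partial P}{\partial t} &= \gamma \frac{\partial (x P)}{\partial x} + D \frac{\partial^2 P}{\partial x^2}, \\
% Fourier transform with \tilde{P}(k,t) = \int dx\, e^{ikx} P(x,t)
\frac{\partial \tilde{P}}{\partial t} + \gamma k \frac{\partial \tilde{P}}{\partial k} &= -D k^2 \tilde{P}, \\
% Characteristic and its solution
\frac{dk}{dt} = \gamma k \quad &\Rightarrow \quad k(t) = k_0 e^{\gamma t}, \\
% Evolution along the characteristic
\frac{d\tilde{P}}{dt} &= -D k_0^2 e^{2\gamma t} \tilde{P}, \\
\tilde{P}(k, t) &= \tilde{P}(k_0, 0) \exp\left[-\frac{D k_0^2}{2\gamma}\left(e^{2\gamma t} - 1\right)\right],
\qquad k_0 = k e^{-\gamma t}.
\end{align}
```

Written in terms of $k$, the exponent is $-\frac{D}{2\gamma}(1 - e^{-2\gamma t}) k^2$, which corresponds to an added variance $\frac{D}{\gamma}(1 - e^{-2\gamma t})$. As $t \to \infty$ the variance tends to $D/\gamma$, consistent with the stationary inverse temperature $\gamma/(2D)$.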

Results for the Coupled O-U Process
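For the illustration in this section, we show results for the fixed parameter values $\gamma = 0.1$, $D = 1$ and coupling strength $f_0 = g_0 = 0.5$. We recall from Section 4 that for these parameter values, in equilibrium (see Equation (44)), the inverse temperature is $\beta_m(t \to \infty) = \gamma/(2D) = 0.05$ for $m = 1, 2$.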

Varying $\beta_{20}$
We first examine how the system changes when varying the initial inverse temperature $\beta_{20}$ of $P_2$ for the fixed zero mean position ($x_0 = 0$, the equilibrium value). We investigate what changes are detected by each metric. Figure 2 shows the total information length (in blue), Euclidean norm (in orange), Wootters' distance (in green), K-L relative entropy (in red), and Jensen divergence (in purple) against $\beta_{20}$ in the long time limit as $t \to \infty$. Panels (a), (b) and (c) are for the overall system $P$ and the components $P_1$ and $P_2$, respectively. Since $P_1$ is initially chosen to be in its equilibrium state, the behavior of the overall system $P$ in panel (a) is more similar to that of $P_2$ in panel (c) than to that of $P_1$ in panel (b). What is prominent in both panels (a) and (c) is the presence of a distinct minimum in the information length around $\beta_{20} = 0.05$. This is because the final equilibrium has the inverse temperature 0.05, demonstrating that the total information length also maps out the underlying attractor structure when varying $\beta_{20}$, taking its minimum value at the equilibrium state (reminiscent of the results for the single O-U process above and previous works [13,16,18]). The minimum around $\beta_{20} = 0.05$ is also observed for the other metrics (apart from the Euclidean norm), although it is less pronounced than for the total information length. In fact, for $\beta_{20} = \beta_{10} = \beta(t \to \infty)$, there is no temporal change in either $P_1$ or $P_2$, so all the metrics apart from the Wootters' distance take the value of zero. Also of interest is an almost linear increase in the total information length for $P_2$ in panel (c) as $|\beta_{20} - 0.05|$ increases. Now, what is happening to $P_1$, which starts in the equilibrium state ($\beta_{10} = 0.05$ and zero mean value)? While Equation (44) shows that $\beta_1 = \beta_{10} = \gamma/(2D) = 0.05$ for all time, the actual PDF $P_1$ in Equation (42) changes with time due to its coupling to $P_2$ via the coupling constants $f_0 = g_0$.
That is, $P_1$ changes over time, initially deviating from the equilibrium state due to the interaction with $P_2$ and then finally relaxing back to the final equilibrium state. Associated with this time evolution of $P_1$ is the information change which can be measured by different metrics. However, Figure 2b shows that for $P_1$, it is only the information length that detects a noticeable difference in the information change as $\beta_{20}$ changes. Furthermore, the information length for $P_1$ takes its minimum value at the equilibrium value $\beta_{20} = 0.05$, as was the case for $P_2$. This result demonstrates that the information length is sensitive to the evolution of the component $P_1$ (a subsystem).
To investigate the evolution of the metrics for $P_1$ around the equilibrium value of $\beta_{20}$, we show in Figure 3 how the metrics evolve over time for the component $P_1$ for two values, $\beta_{20} = 0.02$ and $0.08$, near the equilibrium value $\beta_{20} = 0.05$. Panel (a) is for $\beta_{20} = 0.02 < 0.05$ and panel (b) is for $\beta_{20} = 0.08 > 0.05$. The different metrics are denoted by the same colors as in Figure 2. This small deviation of $\beta_{20}$ from the equilibrium value induces a small change in $P_1$ over time due to the coupling to $P_2$, as noted above. However, in panels (a) and (b), we see a significant change in the Euclidean norm, K-L relative entropy, and Jensen divergence in time, with a large increase before settling down to a lower value. In comparison, the information length shows no such spike. These spikes are caused by the deviation of $P_1$ from its initial equilibrium state due to the coupling to $P_2$, before $P_1$ settles back into the equilibrium state as $P_2$ approaches the equilibrium. What is interesting is that the information length does not show such a spike since it measures the local information change; this is the more sensible view in this instance since the change in the component $P_1$ is small compared to its width (uncertainty). Furthermore, such spikes do not appear for $P_2$ or $P$ (results not shown) since $P_2$, starting from a non-equilibrium state, monotonically approaches its equilibrium.

Varying $x_0$
We now fix $\beta_{20} = 0.05$ and vary the initial mean position $x_0$ of $P_2$ to examine how the metrics depend on $x_0$, as we have done for the single O-U process in Section 3. Figure 4 shows the total information change for each metric for different values of $x_0$, for $P$, $P_1$ and $P_2$ in panels (a), (b), and (c), respectively. Specifically, Figure 4a shows that for $P$, the total information length against $x_0$ is linear, capturing the linearity of the system as expected from the single O-U process. None of the other metrics is capable of showing the linear relationship in the same way. On the other hand, Figure 4b,c show an interesting non-monotonic dependence of the information length on $x_0$.
To understand this, we show in Figure 5 the time evolution of $P$, $P_1$, and $P_2$ in panels (a), (b), and (c), respectively, for $x_0 = 20$. Of interest is that the evolution of $P_1$ and $P_2$ in Figure 5b,c involves the formation of two peaks from the initial single peak, followed by the merging of these two peaks into one as the system settles into the equilibrium. This formation of the two peaks is due to the interaction between $P_1$ and $P_2$ when they are initially widely separated, for a sufficiently large $x_0 \gtrsim 20$. The formation of two PDF peaks for $x_0 \gtrsim 20$ leads to the maximum in the total information length around $x_0 = 20$ in Figure 4b,c. Specifically, the formation of the two peaks in $P_1$ and $P_2$ shown in Figure 5b,c takes place when the two peaks are a full PDF width apart, and facilitates broadening of both PDFs in the relaxation process. As $x_0$ is further increased from 20, $P_1$ and $P_2$ form two peaks which are more widely separated, leading to $P_1$ and $P_2$ becoming effectively broader. This in turn reduces the information length (as $x_0$ increases further from $x_0 = 20$) since the large fluctuations (uncertainty) associated with a broad PDF reduce the information length. Again, the information length is the only measure which detects the difference in the overall information change for $P_1$, due to its sensitive dependence on the local evolution of a system.

Conclusions
When searching for a way to quantify the information change in a given dynamical system, our choices are many and varied. Our aim here was to show the power of the information length $L$ when compared with some of the more popular methods of measuring information change. Using the O-U process, we compared several relative-entropy formulations with our information length to investigate what each could reveal about the dynamics of the system.
Specifically, we investigated the relaxation problem given by a single and coupled O-U process and compared the information length $L$ with K-L relative entropy, Jensen divergence, Wootters' distance, and Euclidean norm. By measuring the total information length in the long time limit, we showed that $L$ was unique in detecting the linear spatial relationship between the total information change and the initial position of a PDF. In the coupled O-U process, the information length was shown to be the most effective in detecting changes in the components of the system even when the others would detect almost nothing in one of the components. In particular, when $P_1$ started in an equilibrium state with zero mean value, the formation of two peaks of $P_1$ (or $P_2$) from an initial single peak and the merging of the two peaks into one as the system settled into equilibrium was detected only by the information length, with its intriguing non-monotonic dependence on $x_0$ (the initial mean value of $P_2$). This underscores the sensitivity of the information length to the evolution of subsystems.
Future work will include the study of a system with multiple attractor positions, or of how the system behaves when the position of the attractor is changed. It would also be interesting to examine the case where the coupling parameters $f_0$ and $g_0$ are not constant, but are functions of time. This could result in a periodic equilibrium where the PDF varies between two or more unstable states. This could represent physical systems like reversible chemical reactions, or even fluctuating financial markets. It would also be of interest to investigate the implications of the information length for deep neural networks [26], in particular, to elucidate the role of the geodesic along which the information length is minimized [27].

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
In this appendix, we show the numerical solutions to the Fokker-Planck equation by time-stepping and metrics using these numerical solutions for the O-U process and the coupled O-U processes.

The O-U Process
We use the finite difference method to approximate the partial derivatives from Equation (11), where h is our time-step, s is our spatial step, i is the current time step, and j is the current spatial step.
Then, using a second-order Runge-Kutta method, we create a time-stepped solution. Specifically, we first create an intermediate value of $P$ from the time-gradient of the system evaluated at the current time value $t_i$. We then use the intermediate value to calculate the value at the next time step, $P^j_{i+1}$. For simplicity, we set $D = 1$. This is to consider the case of the one-parameter system, as the O-U process can always be arranged to combine $\gamma$ and $D$ into a single parameter. For our stepping, we use $h = 0.04$ and $s = 0.1$. We check the convergence of our solutions by reducing $h$ and $s$. We use the initial condition given by Equation (13). All solutions are coded in Python. We present the numerically computed metrics from the time-stepping in Figure A1, which shows the final value for each metric as we vary the initial position of the system, $x_0$. Figure A1 is quite similar to the analytically calculated metrics shown in Figure 1, demonstrating a good agreement between the analytical and numerical solutions.
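A minimal sketch of such a time-stepper is given below. It assumes the midpoint variant of second-order Runge-Kutta, central differences in space, an attractor at zero, and a PDF that vanishes at the domain boundaries. The function names and the parameter values ($\gamma = D = 1$, $s = 0.1$, $h = 0.002$) are illustrative choices for this explicit sketch, picked so that $D h / s^2 = 0.2$ keeps the explicit diffusion step stable; they are not the settings reported above.

```python
import math

def fokker_planck_rhs(p, x, s, gamma, D):
    """Central-difference estimate of dP/dt = d(gamma x P)/dx + D d2P/dx2."""
    n = len(p)
    rhs = [0.0] * n  # end points stay fixed (PDF assumed negligible there)
    for j in range(1, n - 1):
        drift = gamma * (x[j + 1] * p[j + 1] - x[j - 1] * p[j - 1]) / (2.0 * s)
        diffusion = D * (p[j + 1] - 2.0 * p[j] + p[j - 1]) / (s * s)
        rhs[j] = drift + diffusion
    return rhs

def rk2_step(p, x, s, h, gamma, D):
    """One midpoint (second-order Runge-Kutta) step of size h."""
    k1 = fokker_planck_rhs(p, x, s, gamma, D)
    p_mid = [pj + 0.5 * h * kj for pj, kj in zip(p, k1)]  # intermediate value
    k2 = fokker_planck_rhs(p_mid, x, s, gamma, D)
    return [pj + h * kj for pj, kj in zip(p, k2)]

def solve_ou(x0, beta0, gamma=1.0, D=1.0, s=0.1, h=0.002, t_end=1.0, half_width=10.0):
    """Evolve a Gaussian initial PDF with mean x0 and inverse temperature beta0."""
    n = int(round(2.0 * half_width / s)) + 1
    x = [-half_width + j * s for j in range(n)]
    p = [math.sqrt(beta0 / math.pi) * math.exp(-beta0 * (xj - x0) ** 2) for xj in x]
    for _ in range(int(round(t_end / h))):
        p = rk2_step(p, x, s, h, gamma, D)
    return x, p

# Example: the mean should relax as x0 * exp(-gamma * t).
x, p = solve_ou(x0=2.0, beta0=0.5)
norm = sum(p) * 0.1
mean = sum(xj * pj for xj, pj in zip(x, p)) * 0.1
print(norm, mean, 2.0 * math.exp(-1.0))
```

Comparing the numerical mean with $x_0 e^{-\gamma t}$ and checking that the PDF stays normalized is a quick sanity test before computing any of the metrics from the stepped solution.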
To test the accuracy of our time-stepped model further, Figure A2 compares the time evolution of the different metrics for the largest $x_0$ value, i.e., the largest perturbation from equilibrium in our testing, with analytic and time-stepped solutions on the left and right, respectively. The agreement between the analytical and numerical solutions is observed to be quite good.

For the coupled O-U process, we use Gaussian initial PDFs, where $\beta_1$ and $\beta_2$ are the inverse temperatures for $P_1$ and $P_2$, respectively, and $\nu_1$ and $\nu_2$ are their initial mean positions. When varying the system parameters, we fix the parameters for $P_1$ at equilibrium ($\nu_1 = 0$, $\beta_1 = 0.05$, $\gamma_1 = 0.1$) and vary the parameters for $P_2$ ($\nu_2$, $\beta_2$). The choice between fixing $P_1$ or $P_2$ is arbitrary. We solve the Fokker-Planck equations for the coupled O-U process (Equations (22) and (23)) by time-stepping and then numerically calculate the metrics. These are shown to be in good agreement with the results using the analytic formulation of the PDF (Equations (42) and (43)) in Section 5.