
Entropy 2019, 21(8), 775; https://doi.org/10.3390/e21080775

Article
Comparing Information Metrics for a Coupled Ornstein–Uhlenbeck Process
School of Mathematics and Statistics, University of Sheffield, Sheffield S3 7RH, UK
*
Authors to whom correspondence should be addressed.
Received: 9 July 2019 / Accepted: 6 August 2019 / Published: 8 August 2019

Abstract

It is often the case when studying complex dynamical systems that a statistical formulation can provide the greatest insight into the underlying dynamics. When discussing the behavior of such a system which is evolving in time, it is useful to have the notion of a metric between two given states. A popular measure of information change in a system under perturbation has been the relative entropy of the states, as this notion allows us to quantify the difference between states of a system at different times. In this paper, we investigate the relaxation problem given by a single and coupled Ornstein–Uhlenbeck (O-U) process and compare the information length with entropy-based metrics (relative entropy, Jensen divergence) as well as others. By measuring the total information length in the long time limit, we show that it is only the information length that preserves the linear geometry of the O-U process. In the coupled O-U process, the information length is shown to be capable of detecting changes in both components of the system even when other metrics would detect almost nothing in one of the components. We show in detail that the information length is sensitive to the evolution of subsystems.
Keywords:
stochastic processes; Langevin equation; Fokker–Planck equation; information length; Fisher information; metrics; O-U process; probability density function

1. Introduction

Describing many natural systems statistically can give great insight into the system’s dynamics when the uncertainty or the number of degrees of freedom is too high to do otherwise. Measures of information change can be particularly useful in understanding the evolution of a system under perturbation, or comparing data (e.g., see [1]). Here, by information, we specifically refer to a measurable, statistical difference between the states of a system, defined by probability density functions (PDFs), and avoid any of the more diaphanous definitions of the term. The statistical difference can be quantified by assigning a metric to probability, which then endows a stochastic system with a geometric structure. Previously, different metrics (e.g., Refs. [2,3,4,5,6,7,8,9,10]) have been considered depending on the question of interest.
A popular measure of the information change in a system would be entropy, which measures the uncertainty or ‘disorder’ of the system. More specifically, it is a measure of the number of states that are accessible from the current state. Comparing entropy at different times gives a measure of the difference in information for the system, called the relative entropy. We can use this relative entropy as a metric. Another example is the Wasserstein metric, which was used to optimize transport cost in the optimal transport problem [4,6]; for Gaussian PDFs, the Wasserstein metric is defined in the product space consisting of Euclidean and positive symmetric matrices for the mean and variance, respectively (e.g., see [2]). The link between the Fisher information [8] and the Wasserstein distance was made in [6] where the integral of the Fisher information [8] along the Ornstein–Uhlenbeck semigroup was shown to be the same as the Wasserstein distance. Furthermore [1] stated that relative entropy was the integral of Fisher information along the same path.
However, the way in which the relative entropy has mostly been used in the past lacks a sense of locality, as it focuses on quantifying the difference between two given PDFs, for instance, the PDFs at times $t_1$ and $t_2$. As a result, such measures are independent of the intermediate PDFs between $t_1$ and $t_2$ (the history/path of a system), and thus can only inform us about changes which affect the overall structure of the system. The work of [11] was, in part, a search for a disequilibrium component for a statistical complexity measure (SCM). In short, an SCM is a measure of both the ‘order’ and ‘disorder’ of a system, which can help to reveal hidden structures of a disordered system. They proposed several metrics for the ‘disorder’ or disequilibrium component of the SCM.
In this paper we compare several of the proposed metrics of [11] with the information length $L$ [12,13,14,15,16,17,18,19,20]. The information length, proportional to the time integral of the square root of the infinitesimal relative entropy, depends on the intermediate states between $t = t_1$ and $t = t_2$ and is thus a Lagrangian measure. Also, the formulation of the information length allows us to measure local change for the system in time. The total information length $L$ over the entire evolution from $t = t_1 = 0$ to $t = t_2$ was shown to be useful to quantify the proximity of any initial PDF to a final attractor of a dynamical system. For instance, for the Ornstein–Uhlenbeck (O-U) process [16,18], $L$ was shown to take its minimum value at the stable equilibrium point and to increase linearly with the distance of the mean position of an initial PDF from the stable equilibrium point. This linear dependence shows that the information length preserves the linear geometry of the underlying Gaussian process. In this paper, we will show that this linear relation is lost for other metrics (e.g., relative entropy, Jensen divergence). Note that for a chaotic attractor, $L$ varies sensitively with the mean position of a narrow initial PDF, taking its minimum value at the most unstable point [21]. This sensitive dependence of $L$ on the initial PDF is similar to a Lyapunov exponent.
We note that the O-U process is a prototypical relaxation problem and can be particularly useful to study, as its attractor provides a natural equilibrium state. It can model many stochastic systems which relax to a stable equilibrium. The solution to this process is Gaussian, and so has ‘nice’ properties of analytical tractability, permitting us to perform a detailed investigation under the change of parameters. We first compare different metrics for a single O-U process and then move to a coupled O-U process. The O-U process is a well-studied model, though less so for the coupled system. Our focus is to compare different metrics and to see whether the information length is more revealing of the behavior of the components of the coupled system, as well as of the overall system. The remainder of this paper is organized as follows. Section 2 provides the definition of different metrics. Section 3 is devoted to the discussion of a single O-U process. Section 4 provides analytical solutions to the coupled O-U process and Section 5 compares different metrics for the coupled O-U process. Conclusions are found in Section 6. In Appendix A, we present how to solve the Fokker–Planck equation(s) numerically by using a method of second-order accuracy in time and compare analytical results with numerical results. Appendix B comments on the Langevin equation for our coupled O-U process.

2. Information Length and Other Metrics

We consider a PDF P ( x , t ) for a stochastic variable x in the following.

2.1. Information Length

The information length L ( t ) between time 0 and t is given by
$$L(t) = \int_0^t \frac{dt_1}{\tau(t_1)} = \int_0^t dt_1 \sqrt{\int dx \, \frac{1}{P(x,t_1)} \left[\frac{\partial P(x,t_1)}{\partial t_1}\right]^2}, \tag{1}$$
where $1/[\tau(t)]^2$ is the second moment given by
$$\frac{1}{[\tau(t)]^2} = \int dx \, \frac{1}{P(x,t)} \left[\frac{\partial P(x,t)}{\partial t}\right]^2. \tag{2}$$
Here, $\tau(t)$ has the unit of time while $L$ has no dimension. The parameter $\tau(t)$ is the characteristic timescale of the system, and quantifies the correlation time for the system [15]. Hence, $1/\tau(t)$ is the rate of change of the information in time. Integrating $1/\tau(t)$ over $[0, t]$ gives the total number of statistically different states that a system passes through in time. We note that $L$ quantifies the information change in time through the root-mean-squared fluctuating energy, using the second moment of the partial derivative with respect to time. When the parameters governing a PDF are known, the information length $L(t)$ can be written in terms of the Fisher information metric [8,12,15].
As noted in Section 1, L ( t ) is a Lagrangian quantity and has the property of being a local measure, being sensitive to how P ( x , t ) evolves at different x in time. In comparison with entropy which is independent of the spatial (x) gradient of a PDF, it is this property that may elevate L ( t ) above entropy in revealing micro-scale interactions within a system.
The discrete versions of Equations (1) and (2) are as follows:
$$L_n = h \sum_{i=0}^{n} \frac{1}{\tau_i}, \tag{3}$$
$$\frac{1}{\tau_i^2} = \frac{s}{h^2} \sum_j P_i^j \left[\ln(P_{i+1}^j) - \ln(P_i^j)\right]^2. \tag{4}$$
Here, $i$ and $j$ represent the discrete time and spatial point, respectively; $P_i^j$ is the discrete version of $P(x,t)$, $h = t/n$ is the time step while $s$ is the spatial step.
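The discrete formulas above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code used for the figures: the function name `information_length`, the array layout (time along the first axis) and the `floor` guard against taking the logarithm of zero are all assumptions made here. Because each term needs both $P_i$ and $P_{i+1}$, the sum runs over the available consecutive pairs of snapshots.

```python
import numpy as np

def information_length(P, h, s, floor=1e-300):
    """Cumulative discrete information length, following Eqs. (3) and (4).

    P     : array of shape (n_t, n_x), row i is the PDF at time step i
    h, s  : time step and spatial step
    floor : hypothetical guard so that log(0) is never evaluated
    """
    log_p = np.log(np.maximum(P, floor))
    # ln(P_{i+1}^j) - ln(P_i^j) for every consecutive pair of snapshots
    dlog = np.diff(log_p, axis=0)
    # Eq. (4): 1/tau_i^2 = (s/h^2) * sum_j P_i^j [ln P_{i+1}^j - ln P_i^j]^2
    inv_tau_sq = (s / h**2) * np.sum(P[:-1] * dlog**2, axis=1)
    # Eq. (3): L_n = h * sum_i 1/tau_i, accumulated so L[k] is the length up to step k
    L = h * np.cumsum(np.sqrt(inv_tau_sq))
    return np.concatenate(([0.0], L))
```

For a Gaussian of fixed width whose mean drifts, the result should approach the displacement of the mean divided by the standard deviation, which is a convenient sanity check.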

2.2. Other Metrics

Here, we list the metrics taken from [11]. We will calculate each metric relative to the initial state PDF P ( x , 0 ) at t = 0 in order to compare their time-evolution with that of the information length. That is, the metrics below are based on comparing two PDFs P ( x , 0 ) and P ( x , t ) . For convenience, the metrics are given both for the continuous process and discrete approximation (that is used for numerical calculation) by using i , j as an index representing time and space, respectively and s as the spatial step, as above. The reference probability P ( x , 0 ) [ P 0 ] will be the initial PDF while P ( x , t ) [ P i ] is the PDF at time t [i].

2.2.1. Euclidean norm

$$\|P(x,t) - P(x,0)\|^2 = \int dx \left[P(x,0) - P(x,t)\right]^2, \qquad \|P_0 - P_i\|^2 = s \sum_j \left[P_0^j - P_i^j\right]^2. \tag{5}$$
Applying our standard notion of distance in Euclidean space seems like a natural extension. However, it quickly becomes apparent that the statistical space of stochastic systems is rarely well described by Euclidean metrics. The Euclidean norm is included here mainly as a base case: although the formulation seems appealing, it does not yield illuminating details about the disequilibrium of the system. We will use it as an example of a poor measure of information change.

2.2.2. Wootters’ Distance

$$W[P(x,0), P(x,t)] = \cos^{-1} \int dx \, [P(x,0)]^{1/2} [P(x,t)]^{1/2}, \qquad W[P_0, P_i] = \cos^{-1} \left\{ s \sum_j [P_0^j]^{1/2} [P_i^j]^{1/2} \right\}. \tag{6}$$
This metric, like the notion of statistical distance itself, originates in quantum information theory [9]. However, as quantum information theory is purely statistical in formulation, it can be applied to any system defined by a PDF. Fundamentally, this metric is based on the principle that any finite number of measurements on a stochastic system will yield results that may not be exactly the same as the underlying probability distribution. It is impossible to distinguish two states whose underlying probabilities differ by less than a typical fluctuation of the measurement error. This intrinsically defines a resolution for the system. The Wootters’ distance was shown to be a monotone transformation of the Hellinger distance [22].

2.2.3. Kullback-Leibler Relative Entropy

$$K(P(x,0) | P(x,t)) = \int dx \, P(x,0) \ln \frac{P(x,0)}{P(x,t)}, \qquad K[P_0 | P_i] = s \sum_j P_0^j \log \frac{P_0^j}{P_i^j}. \tag{7}$$
Kullback–Leibler relative entropy was first introduced by Solomon Kullback and Richard Leibler [10], and sometimes is referred to as the Kullback–Leibler divergence. It represents a measure of the difference between a probability distribution and some other reference probability distribution. Whilst a useful tool, it is not strictly a metric as it does not satisfy the triangle inequality. It is however used in the definition of some other quantities, such as the mutual information of two co-varying random variables, and the Jensen divergence.

2.2.4. Jensen Divergence

$$J(P(x,0) | P(x,t)) = \frac{1}{2} \left[ K(P(x,0) | P(x,t)) + K(P(x,t) | P(x,0)) \right], \qquad J[P_0 | P_i] = \frac{1}{2} \left[ K[P_0 | P_i] + K[P_i | P_0] \right]. \tag{8}$$
The Jensen divergence is simply the symmetric version of the Kullback–Leibler divergence. Often it is referred to as the Jensen distance, and the square root of this quantity can be shown to be a metric [11], which can allow us to examine the statistical geometry of the system. The Jensen divergence is the mutual information of a random variable x, with a mixture distribution from P ( x , 0 ) and P ( x , t ) , and a binary indicator variable used to build the distribution. In other words it is a measure of the mutual dependence of x on the way you construct the mixture, and thus quantifies the amount of information difference between the two distributions.
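The discrete forms of Equations (5)–(8) translate directly into array operations. The sketch below is illustrative only: the function names, the `floor` guard inside the logarithm and the clipping of the overlap before `arccos` (to absorb rounding slightly above 1) are assumptions introduced here, not part of the original calculation.

```python
import numpy as np

def euclidean_sq(P0, Pi, s):
    """Squared Euclidean norm, discrete form of Eq. (5)."""
    return s * np.sum((P0 - Pi) ** 2)

def wootters(P0, Pi, s):
    """Wootters' distance, discrete form of Eq. (6)."""
    overlap = s * np.sum(np.sqrt(P0 * Pi))
    # Clip guards against overlap = 1 + rounding error, which would give NaN
    return np.arccos(np.clip(overlap, -1.0, 1.0))

def kl(P0, Pi, s, floor=1e-300):
    """Kullback-Leibler relative entropy K[P0 | Pi], discrete form of Eq. (7)."""
    ratio = np.maximum(P0, floor) / np.maximum(Pi, floor)
    return s * np.sum(P0 * np.log(ratio))

def jensen(P0, Pi, s):
    """Jensen divergence as the symmetrised K-L entropy, Eq. (8)."""
    return 0.5 * (kl(P0, Pi, s) + kl(Pi, P0, s))
```

Each function takes the reference PDF `P0` and the PDF `Pi` at a later time, both sampled on the same uniform grid with spacing `s`.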

3. The O-U Process

The one-dimensional O-U process is based on the Langevin equation
$$\dot{x} = -\gamma (x - \mu) + \xi, \tag{9}$$
where $x$ is the stochastic variable (e.g., position, velocity, etc.), $\gamma$ is the damping constant, $\mu$ is the position of the attractor for the system, and $\xi$ is a $\delta$-correlated, Gaussian-distributed stochastic forcing, i.e.,
$$\langle \xi(t_1) \xi(t_2) \rangle = 2 D \delta(t_1 - t_2), \tag{10}$$
where $D$ is the strength of the stochastic forcing. The corresponding Fokker–Planck equation [23,24] is given by
$$\frac{\partial P}{\partial t} = \frac{\partial}{\partial x} \left[ \gamma (x - \mu) P \right] + D \frac{\partial^2 P}{\partial x^2}, \tag{11}$$
where the solution $P = P(x,t)$ is the time-dependent PDF which describes the evolution of the system. It can be shown that the solution to Equation (11) is given by [13]
$$P(x,t) = \sqrt{\frac{\beta}{\pi}} \, e^{-\beta (x - \langle x \rangle)^2}, \tag{12}$$
where $\frac{1}{2\beta} = \frac{e^{-2\gamma t}}{2\beta_0} + \frac{D}{\gamma} \left(1 - e^{-2\gamma t}\right)$ and $\langle x \rangle = x_0 e^{-\gamma t}$, given the initial condition
$$P(x,0) = \sqrt{\frac{\beta_0}{\pi}} \, e^{-\beta_0 (x - x_0)^2}, \tag{13}$$
where $\beta = 1 / \left(2 \langle (x - \langle x \rangle)^2 \rangle\right)$ and $\langle x \rangle$ in Equation (12) represent the inverse temperature and the mean value of $x$, respectively, and $\beta_0$ and $x_0$ in Equation (13) are the values of $\beta$ and $\langle x \rangle$ at $t = 0$, respectively.
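As a small illustration of the Gaussian solution in Equations (12) and (13), the following sketch evaluates $\beta(t)$ and $P(x,t)$ on a grid. The function names `ou_beta` and `ou_pdf` are hypothetical, and the attractor is taken at the origin, as implied by $\langle x \rangle = x_0 e^{-\gamma t}$.

```python
import numpy as np

def ou_beta(t, beta0, gamma, D):
    """Inverse temperature beta(t) from
    1/(2 beta) = exp(-2 gamma t)/(2 beta0) + (D/gamma) (1 - exp(-2 gamma t))."""
    decay = np.exp(-2.0 * gamma * t)
    return 1.0 / (2.0 * (decay / (2.0 * beta0) + (D / gamma) * (1.0 - decay)))

def ou_pdf(x, t, beta0, x0, gamma, D):
    """Gaussian solution of Eq. (12) with mean x0 * exp(-gamma t)."""
    beta = ou_beta(t, beta0, gamma, D)
    mean = x0 * np.exp(-gamma * t)
    return np.sqrt(beta / np.pi) * np.exp(-beta * (x - mean) ** 2)
```

In the long time limit `ou_beta` tends to $\gamma / (2D)$ whatever the value of $\beta_0$, so the stationary width is set by the balance between damping and forcing alone.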

3.1. Information Length

In [13], we showed that the information length L ( t ) for the O-U process is given by
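$$L = \frac{1}{\sqrt{2}} \left[ \ln \frac{y - r}{y + r} \right]_{y_i}^{y_f} + \frac{\sqrt{2}}{r} \int_{y_i}^{y_f} \frac{q r - r^2}{y^2 + q r - r^2} \, dy, \tag{14}$$
where $y = \sqrt{r^2 + q T}$, $r = 2 \beta_0 D - \gamma$, $q = \beta_0 \gamma x_0^2$, and $T = 2 \beta_0 D (e^{2 \gamma t} - 1) + \gamma$. Here $y_i$ is $y$ evaluated at the initial time ($t = 0$ in our case), $y_f$ is $y$ evaluated at the final time, and $x_0$ is the initial mean position.
Let the integral in the last term of Equation (14) be $H$, and let $r q \neq 0$. Then Equation (14) can be written as
$$L = \frac{1}{\sqrt{2}} \left[ \ln \frac{y - r}{y + r} \right]_{y_i}^{y_f} + \frac{\sqrt{2}}{r} H, \tag{15}$$
where
$$H = \begin{cases} \sqrt{q r - r^2} \, \tan^{-1} \left( \dfrac{Y}{\sqrt{q r - r^2}} \right) & \text{if } q r - r^2 > 0, \\[2ex] -\dfrac{\sqrt{r^2 - r q}}{2} \ln \left( \dfrac{Y - \sqrt{r^2 - r q}}{Y + \sqrt{r^2 - r q}} \right) & \text{if } q r - r^2 < 0. \end{cases} \tag{16}$$
Note that this is continuous through $q = r$. For $q = 0$, we can directly compute
$$L = \frac{1}{\sqrt{2}} \frac{|r|}{r} \ln \left( \frac{T}{T + r} \right). \tag{17}$$
For $r = 0$ we have
$$L = -\sqrt{2 q} \left[ \frac{1}{\sqrt{T}} \right]_{T_i}^{T_f}. \tag{18}$$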
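As a worked special case, the $r = 0$ result can be written out explicitly. The lines below are a sketch under two stated assumptions: $r = 0$ means $2 \beta_0 D = \gamma$ (the initial width already equals the equilibrium width), and $T = 2 \beta_0 D (e^{2 \gamma t} - 1) + \gamma$, which is the form implied by the expression for $\beta$ below Equation (12).

```latex
\begin{aligned}
T &= \gamma e^{2\gamma t}, \qquad T_i = \gamma, \qquad q = \beta_0 \gamma x_0^2, \\
L(t) &= \sqrt{2q}\left(\frac{1}{\sqrt{T_i}} - \frac{1}{\sqrt{T}}\right)
      = \sqrt{2\beta_0}\,|x_0|\left(1 - e^{-\gamma t}\right), \\
\lim_{t\to\infty} L(t) &= \sqrt{2\beta_0}\,|x_0| .
\end{aligned}
```

Since the standard deviation is fixed at $1/\sqrt{2 \beta_0}$ in this case, the total information length is simply the displacement of the mean measured in units of the PDF width, which makes the linear dependence on $x_0$ explicit.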

3.2. Wootters’ Distance

By using Equations (12) and (13) in (6), we obtain
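$$W[P(x,0), P(x,t)] = \left( \frac{\beta_0 \beta}{\pi^2} \right)^{1/4} \int dx \, e^{-\frac{\beta_0}{2} (x - x_0)^2 - \frac{\beta}{2} (x - \langle x \rangle)^2} = \left( \frac{2 \sqrt{\beta_0 \beta}}{\beta_0 + \beta} \right)^{1/2} \exp \left[ -\frac{\beta \beta_0 (x_0 - \langle x \rangle)^2}{2 (\beta_0 + \beta)} \right], \tag{19}$$
where $\langle x \rangle = x_0 e^{-\gamma t}$.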

3.3. Kullback–Leibler Relative Entropy

By using Equations (12) and (13) in (7), we can show
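$$\begin{aligned} K[P(x,0) | P(x,t)] &= \int dx \, P(x,0) \ln \left[ \sqrt{\frac{\beta_0}{\beta}} \, e^{-\beta_0 (x - x_0)^2 + \beta (x - \langle x \rangle)^2} \right] \\ &= \ln \sqrt{\frac{\beta_0}{\beta}} + \int dx \, P(x,0) \left[ -\beta_0 (x - x_0)^2 + \beta (x - \langle x \rangle)^2 \right] \\ &= -\ln \sqrt{\frac{\beta}{\beta_0}} + \beta (x_0 - \langle x \rangle)^2 + \frac{\beta}{2 \beta_0} - \frac{1}{2}, \end{aligned} \tag{20}$$
where $\langle x \rangle = x_0 e^{-\gamma t}$.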

3.4. Jensen Divergence

By using Equations (12) and (13) in (8), we obtain
$$J[P(x,0) | P(x,t)] = \frac{\beta + \beta_0}{2} (x_0 - \langle x \rangle)^2 + \frac{\beta^2 + \beta_0^2}{4 \beta_0 \beta} - \frac{1}{2}. \tag{21}$$

3.5. Comparison

Figure 1 shows the final value for each metric as we vary the initial position $x_0$ of the system. The total information length $L = L(t)$, Wootters’ distance, K-L relative entropy and Jensen divergence against $x_0$ are shown in blue, orange, green and red, respectively, in the long time limit as $t \to \infty$. Note that the green and red lines overlap. It is notable in Figure 1 that the linear relation between the metric and the initial mean value $x_0$ is obtained only by the information length. That is, it is only the information length that preserves the linear geometry underlying a linear stochastic process. For all other metrics, this linear relation is lost. We show in Appendix A that our analytical metrics in Figure 1 agree well with those calculated directly from the numerical solutions to the Fokker–Planck Equation (11) by time-stepping (see Figure A1).
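The contrast between linear and non-linear dependence on $x_0$ can be seen directly from the closed forms. The sketch below codes Equations (19)–(21) as written (so `wootters_ou` returns the overlap expression of Equation (19)), together with the long-time limit of Equation (18) for the special case $r = 0$, which evaluates to $\sqrt{2 \beta_0} |x_0|$ when $T = 2 \beta_0 D (e^{2 \gamma t} - 1) + \gamma$. All function names are hypothetical, and `mean` stands for $\langle x \rangle$.

```python
import numpy as np

def wootters_ou(beta0, beta, x0, mean):
    """Closed form of Eq. (19)."""
    prefactor = np.sqrt(2.0 * np.sqrt(beta0 * beta) / (beta0 + beta))
    return prefactor * np.exp(-beta * beta0 * (x0 - mean) ** 2 / (2.0 * (beta0 + beta)))

def kl_ou(beta0, beta, x0, mean):
    """Closed form of Eq. (20)."""
    return -0.5 * np.log(beta / beta0) + beta * (x0 - mean) ** 2 + beta / (2.0 * beta0) - 0.5

def jensen_ou(beta0, beta, x0, mean):
    """Closed form of Eq. (21)."""
    return (0.5 * (beta + beta0) * (x0 - mean) ** 2
            + (beta ** 2 + beta0 ** 2) / (4.0 * beta0 * beta) - 0.5)

def info_length_r0_limit(beta0, x0):
    """Long-time limit of Eq. (18), valid only for r = 0 (assumed here)."""
    return np.sqrt(2.0 * beta0) * abs(x0)
```

With $\beta_0 = \beta$ and $\langle x \rangle = 0$, `kl_ou` and `jensen_ou` both reduce to $\beta x_0^2$, so doubling $x_0$ quadruples them, whereas `info_length_r0_limit` only doubles.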

4. The Coupled O-U Process

We now consider the coupled system of equations
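$$\frac{\partial P_1}{\partial t} = \frac{\partial}{\partial x} \left[ \gamma_1 (x - \mu) P_1 \right] + D \frac{\partial^2 P_1}{\partial x^2} - f_0 P_1 + g_0 P_2, \tag{22}$$
$$\frac{\partial P_2}{\partial t} = \frac{\partial}{\partial x} \left[ \gamma_2 (x - \mu) P_2 \right] + D \frac{\partial^2 P_2}{\partial x^2} + f_0 P_1 - g_0 P_2, \tag{23}$$
where $D$ is the strength of a short-correlated Gaussian noise given by Equation (10). These equations are a pair of O-U processes, linked by coupling constants $f_0$ and $g_0$. The couplings $f_0$ and $g_0$ are due to the dichotomous noise [25] (see Appendix B for the Langevin equation corresponding to Equations (22) and (23)).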
We choose a coupled system like this one to examine the localized dynamics of these interacting sub-processes. This system could model any process for which there are two competing components relaxing to an equilibrium, like evaporation in a closed system, or a reversible chemical reaction.
Since we are mainly interested in the relaxation process from non-equilibrium initial states, we choose different initial conditions for $P_1$ and $P_2$. For simplicity, we consider the case where $\gamma_1 = \gamma_2 = \gamma$ and $f_0 = g_0 = \epsilon$, for which the Fokker–Planck Equations (22) and (23) reduce to
$$\frac{\partial}{\partial t} P_1(x,t) = \frac{\partial}{\partial x} \left[ \gamma x P_1(x,t) \right] + D \frac{\partial^2}{\partial x^2} P_1(x,t) + \epsilon (P_2 - P_1), \tag{24}$$
$$\frac{\partial}{\partial t} P_2(x,t) = \frac{\partial}{\partial x} \left[ \gamma x P_2(x,t) \right] + D \frac{\partial^2}{\partial x^2} P_2(x,t) + \epsilon (P_1 - P_2). \tag{25}$$
Specifically, as initial conditions, we use the following different Gaussian PDFs
$$P_1(x,0) = \frac{1}{2} \sqrt{\frac{\beta_{10}}{\pi}} \exp \left[ -\beta_{10} x^2 \right], \tag{26}$$
$$P_2(x,0) = \frac{1}{2} \sqrt{\frac{\beta_{20}}{\pi}} \exp \left[ -\beta_{20} (x - x_0)^2 \right]. \tag{27}$$
Note that $\beta_{10}$ and $\beta_{20}$ are the initial inverse temperatures for $P_1$ and $P_2$, respectively. We fix the initial mean of $P_1$ to be zero, while the initial $P_2$ is taken to have an arbitrary mean value $x_0$. We also note that as $t \to \infty$, $P_1$ and $P_2$ approach the same PDF.
To solve Equations (24) and (25), we take the Fourier transform [$\tilde{P}_m(k,t) = \int dx \, e^{i k x} P_m(x,t)$ for $m = 1, 2$] and use the characteristic equation to recast Equations (24) and (25) as
$$\frac{d \tilde{P}_1}{d t} = -D k^2 \tilde{P}_1 + \epsilon (\tilde{P}_2 - \tilde{P}_1), \tag{28}$$
$$\frac{d \tilde{P}_2}{d t} = -D k^2 \tilde{P}_2 + \epsilon (\tilde{P}_1 - \tilde{P}_2). \tag{29}$$
Here $\frac{d}{dt} = \frac{\partial}{\partial t} + \frac{dk}{dt} \frac{\partial}{\partial k}$ is the total derivative along the characteristic defined by
$$\frac{dk}{dt} = \gamma k, \tag{30}$$
which has the solution
$$k(t) = k_0 e^{\gamma t}, \tag{31}$$
where $k_0 = k(0)$ is the initial wavenumber. We solve Equations (28) and (29) in terms of new variables
$$\bar{P}_m = \tilde{P}_m \exp \left[ \epsilon t + D \int_0^t dt_1 \, k(t_1)^2 \right], \tag{32}$$
for $m = 1, 2$. The coupled Equations (28) and (29) are then simplified as
$$\frac{d \bar{P}_1}{d t} = \epsilon \bar{P}_2, \tag{33}$$
$$\frac{d \bar{P}_2}{d t} = \epsilon \bar{P}_1. \tag{34}$$
We write down the solutions to Equations (33) and (34) using the two constants $a$ and $b$
$$\bar{P}_1 = a e^{\epsilon t} + b e^{-\epsilon t}, \tag{35}$$
$$\bar{P}_2 = a e^{\epsilon t} - b e^{-\epsilon t}. \tag{36}$$
To determine $a$ and $b$, we take the Fourier transform of the initial conditions Equations (26) and (27) to obtain
$$\tilde{P}_1(0) = \frac{1}{2} e^{-\frac{k_0^2}{4 \beta_{10}}}, \tag{37}$$
$$\tilde{P}_2(0) = \frac{1}{2} e^{i k_0 x_0 - \frac{k_0^2}{4 \beta_{20}}}. \tag{38}$$
Thus, evaluating Equations (32), (35) and (36) at $t = 0$ and equating the results to Equations (37) and (38), we find
$$a = \frac{1}{2} \tilde{P}_1(0) + \frac{1}{2} \tilde{P}_2(0), \tag{39}$$
$$b = \frac{1}{2} \tilde{P}_1(0) - \frac{1}{2} \tilde{P}_2(0). \tag{40}$$
On the other hand, using Equation (31) in Equation (32), we write $\tilde{P}_m$ ($m = 1, 2$) in terms of $\bar{P}_m$ as
$$\tilde{P}_m = \exp \left[ -\epsilon t - \frac{D}{2 \gamma} \left( 1 - e^{-2 \gamma t} \right) k^2 \right] \bar{P}_m. \tag{41}$$
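Finally, taking the inverse Fourier transform [$P_m(x,t) = \frac{1}{2\pi} \int dk \, e^{-i k x} \tilde{P}_m(k,t)$ for $m = 1, 2$] and performing several Gaussian integrals, we obtain
$$P_1(x,t) = \frac{1}{4} \left[ \sqrt{\frac{\beta_1}{\pi}} \left( 1 + e^{-2 \epsilon t} \right) e^{-\beta_1 x^2} + \sqrt{\frac{\beta_2}{\pi}} \left( 1 - e^{-2 \epsilon t} \right) e^{-\beta_2 (x - e^{-\gamma t} x_0)^2} \right], \tag{42}$$
$$P_2(x,t) = \frac{1}{4} \left[ \sqrt{\frac{\beta_1}{\pi}} \left( 1 - e^{-2 \epsilon t} \right) e^{-\beta_1 x^2} + \sqrt{\frac{\beta_2}{\pi}} \left( 1 + e^{-2 \epsilon t} \right) e^{-\beta_2 (x - e^{-\gamma t} x_0)^2} \right], \tag{43}$$
where
$$\frac{1}{2 \beta_m} = \frac{e^{-2 \gamma t}}{2 \beta_{m0}} + \frac{D}{\gamma} \left( 1 - e^{-2 \gamma t} \right), \tag{44}$$
for $m = 1, 2$. We can check that at $t = 0$, Equations (42) and (43) recover Equations (26) and (27). On the other hand, in the limit of $t \to \infty$, Equations (42) and (43) give us
$$P_m(x, t \to \infty) = \frac{1}{2} \sqrt{\frac{\beta_m(t \to \infty)}{\pi}} \, e^{-\beta_m(t \to \infty) x^2}, \tag{45}$$
which is the stationary solution to a single O-U process where $\beta_m(t \to \infty) = \frac{\gamma}{2D}$. We note that the total PDF $P = P_1 + P_2$ is the solution of this single O-U process with the initial condition given by the sum of Equations (26) and (27).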
Using these analytical solutions in Equations (42)–(44), we present the different metrics in Equations (1) and (5)–(8) in Section 5.
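For readers who want to evaluate the coupled solution numerically, the following sketch codes Equations (42)–(44) as two weighted Gaussians. The function names and argument order are hypothetical choices made for this illustration.

```python
import numpy as np

def beta_m(t, beta_m0, gamma, D):
    """Inverse temperature of component m, Eq. (44)."""
    decay = np.exp(-2.0 * gamma * t)
    return 1.0 / (2.0 * (decay / (2.0 * beta_m0) + (D / gamma) * (1.0 - decay)))

def coupled_pdfs(x, t, beta10, beta20, x0, gamma, D, eps):
    """Return (P1, P2) at time t from Eqs. (42) and (43)."""
    b1 = beta_m(t, beta10, gamma, D)
    b2 = beta_m(t, beta20, gamma, D)
    # Gaussian centred on the attractor and Gaussian relaxing from x0
    g1 = np.sqrt(b1 / np.pi) * np.exp(-b1 * x ** 2)
    g2 = np.sqrt(b2 / np.pi) * np.exp(-b2 * (x - np.exp(-gamma * t) * x0) ** 2)
    w = np.exp(-2.0 * eps * t)  # decays as the coupling mixes the two components
    P1 = 0.25 * ((1.0 + w) * g1 + (1.0 - w) * g2)
    P2 = 0.25 * ((1.0 - w) * g1 + (1.0 + w) * g2)
    return P1, P2
```

At $t = 0$ the weight `w` equals one, so each component reduces to its own initial Gaussian; as $t$ grows, `w` vanishes and the two components become identical, each carrying half of the total probability.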

5. Results for the Coupled O-U Process

For the illustration in this section, we show results for the fixed parameter values $\gamma = 0.1$, $D = 1$ and $\epsilon = 0.5$. We recall from Section 4 that for these parameter values, in equilibrium (see Equation (44)), $\beta_1 = \beta_2 = \beta(t \to \infty) = \frac{\gamma}{2D} = 0.05$ while the mean values in Equation (45) are zero for both $P_1$ and $P_2$. For comparing metrics, we consider the case where $P_1$ is initially in the final equilibrium with zero mean value and inverse temperature $\beta_{10} = 0.05$. On the other hand, $P_2$ at $t = 0$ is taken to have either different mean values $x_0$ or different inverse temperatures $\beta_{20}$. Here, we present results obtained by using the analytical solutions in Section 4 only. (See Appendix A for the numerical solutions and comparison with the analytical solutions.)

5.1. Varying $\beta_{20}$

We first examine how the system changes when varying the initial inverse temperature $\beta_{20}$ of $P_2$ for the fixed zero mean position ($x_0 = 0$, the equilibrium value). We investigate what changes are detected by each metric.
Figure 2 shows the total information length (in blue), Euclidean norm (in orange), Wootters’ distance (in green), K-L relative entropy (in red), and Jensen divergence (in purple) against $\beta_{20}$ in the long time limit as $t \to \infty$. Panels (a), (b) and (c) are for the overall system $P$, and the components $P_1$ and $P_2$, respectively. Since $P_1$ is initially chosen to be in its equilibrium state, the behavior for the overall system $P$ in panel (a) is more similar to $P_2$ in panel (c) than to $P_1$ in panel (b). What is prominent in both panels (a) and (c) is the presence of the distinct minimum in the information length around $\beta_{20} = 0.05$. This is because the final equilibrium has the inverse temperature $0.05$, demonstrating that the total information length also maps out the underlying attractor structure when varying $\beta_{20}$, taking its minimum value at the equilibrium state (reminiscent of the results for the single O-U process above and previous works [13,16,18]). The minimum value around $\beta_{20} = 0.05$ is also observed for other metrics (apart from the Euclidean norm), although it is less pronounced than for the total information length. In fact, for $\beta_{20} = \beta_{10} = \beta(t \to \infty)$, there is no temporal change in either $P_1$ or $P_2$, so all the metrics apart from the Wootters’ one take the value of zero. Also of interest is an almost linear increase in the total information length for $P_2$ in panel (c) as $|\beta_{20} - 0.05|$ increases.
Now, what is happening to $P_1$, which starts in the equilibrium state ($\beta_{10} = 0.05$ and zero mean value)? While Equation (44) shows that $\beta_1 = \beta_{10} = \gamma / (2D) = 0.05$ for all time, the actual PDF $P_1$ in Equation (42) changes with time due to its coupling to $P_2$ via $\epsilon$. That is, $P_1$ changes over time, initially deviating from the equilibrium state due to the interaction with $P_2$ and then finally relaxing back to the final equilibrium state. Associated with this time evolution of $P_1$ is the information change which can be measured by different metrics. However, Figure 2b shows that for $P_1$, it is only the information length that detects a noticeable difference in the information change as $\beta_{20}$ changes. Furthermore, the information length for $P_1$ takes its minimum value at the equilibrium value of $\beta_{20} = 0.05$, as was the case for $P_2$. This result demonstrates that the information length is sensitive to the evolution of the component $P_1$ (a subsystem).
To investigate the evolution of the metrics for $P_1$ around the equilibrium value of $\beta_{20}$, we show in Figure 3 how the metrics evolve over time for the component $P_1$ for two values, $\beta_{20} = 0.02$ and $0.08$, near the equilibrium value $\beta_{20} = 0.05$. Panel (a) is for $\beta_{20} = 0.02 < 0.05$ and panel (b) is for $\beta_{20} = 0.08 > 0.05$. The different metrics are denoted by using the same colors as used in Figure 2. This small deviation of $\beta_{20}$ from the equilibrium value induces a small change in $P_1$ over time due to the coupling to $P_2$, as noted above. However, in panels (a) and (b), we see a significant change in the Euclidean norm, K-L relative entropy, and Jensen divergence in time, with a large increase before settling down to a lower value. In comparison, the information length shows no such spike. These spikes are caused by the deviation of $P_1$ from its initial equilibrium state due to the coupling to $P_2$ before it settles back into the equilibrium state as $P_2$ approaches the equilibrium. What is interesting is that the information length does not show such a spike since it measures the local information change; this would thus be the more sensible view in this instance since the change in the component $P_1$ is small compared to its width (uncertainty). Furthermore, such spikes do not appear for $P_2$ or $P$ (results not shown) since $P_2$, starting from a non-equilibrium state, monotonically approaches its equilibrium.

5.2. Varying $x_0$

We now fix β 20 = 0.05 and vary the initial mean position x 0 of P 2 to examine how metrics depend on x 0 , as we have done for the single O-U process in Section 3. Figure 4 shows the total information change for each metric for different values of x 0 , for P, P 1 and P 2 in panels (a), (b), and (c) respectively. Specifically, Figure 4a shows that for P, the total information length against x 0 is linear, capturing the linearity of the system as expected from the single O-U process. None of the other metrics are capable of showing the linear relationship in the same way. On the other hand, Figure 4b,c shows an interesting non-monotonic dependence of the information length on x 0 .
To understand this, we show in Figure 5 the time evolution of $P$, $P_1$, and $P_2$ in panels (a), (b), and (c), respectively, by using $x_0 = 20$. Of interest is that the evolution of $P_1$ and $P_2$ in Figure 5b,c involves the formation of two peaks from the initial single peak, followed by the merging of these two peaks into one as the system settles into the equilibrium. This formation of the two peaks is due to the interaction between $P_1$ and $P_2$ when they are initially widely separated for a sufficiently large $x_0 \gtrsim 20$. The formation of two PDF peaks for $x_0 \gtrsim 20$ leads to the maximum in the total information length around $x_0 = 20$ in Figure 4b,c. Specifically, the formation of the two peaks in $P_1$ and $P_2$ shown in Figure 5b,c takes place when the two peaks are a full PDF width apart, and facilitates broadening of both PDFs in the relaxation process. As $x_0$ is further increased from 20, $P_1$ and $P_2$ form two peaks which are more widely separated, leading to $P_1$ and $P_2$ becoming effectively broader. This in turn reduces the information length (as $x_0$ increases further from $x_0 = 20$) since the large fluctuations (uncertainty) associated with a broad PDF reduce the information length. Again, the information length is the only measure which detects the difference in the overall information change for $P_1$ due to its sensitive dependence on the local evolution of a system.
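The appearance and merging of the two peaks can be checked directly from the analytical form of $P_1$ in Equation (42). The sketch below uses the parameter values quoted in this section ($\gamma = 0.1$, $D = 1$, $\epsilon = 0.5$, $x_0 = 20$, $\beta_{10} = \beta_{20} = 0.05$, for which Equation (44) keeps $\beta_1 = \beta_2 = 0.05$ at all times). The grid, the sample times and the helper `count_peaks` are illustrative assumptions.

```python
import numpy as np

GAMMA, D, EPS, X0 = 0.1, 1.0, 0.5, 20.0
BETA = GAMMA / (2.0 * D)  # stays constant because beta_10 = beta_20 = gamma/(2D)

def p1(x, t):
    """Component P1 from Eq. (42) for the parameters above."""
    w = np.exp(-2.0 * EPS * t)
    g1 = np.sqrt(BETA / np.pi) * np.exp(-BETA * x ** 2)
    g2 = np.sqrt(BETA / np.pi) * np.exp(-BETA * (x - np.exp(-GAMMA * t) * X0) ** 2)
    return 0.25 * ((1.0 + w) * g1 + (1.0 - w) * g2)

def count_peaks(p):
    """Number of strict interior local maxima of a sampled curve."""
    interior = p[1:-1]
    return int(np.sum((interior > p[:-2]) & (interior > p[2:])))

x = np.arange(-30.0, 50.0, 0.1)
peaks = {t: count_peaks(p1(x, t)) for t in (0.0, 1.0, 30.0)}
```

The dictionary `peaks` maps each sample time to the number of maxima of $P_1$, covering the initial state, an early time when probability has leaked in from $P_2$ while the two Gaussians are still far apart, and a late time when they have merged.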

6. Conclusions

When searching for a way to quantify the information change in a given dynamical system, our choices are many and varied. Our aim here was to show the power of the information length $L$ when compared with some of the more popular methods of measuring information change. Utilizing the O-U process, we compared several relative-entropy formulations with our information length to investigate what each could reveal about the dynamics of the system.
Specifically, we investigated the relaxation problem given by a single and coupled O-U process and compared the information length $L$ with K-L relative entropy, Jensen divergence, Wootters’ distance, and Euclidean norm. By measuring the total information length in the long time limit, we showed that $L$ was unique in detecting the linear spatial relationship between the total information change and the initial position of a PDF. In the coupled O-U process, the information length was shown to be the most effective in detecting changes in the components of the system even when the others would detect almost nothing in one of the components. In particular, when $P_1$ started in an equilibrium state with zero mean value, the formation of the two peaks of $P_1$ (or $P_2$) from an initial single peak and the merging of the two peaks into one as the system settled into equilibrium was detected only by the information length, with its intriguing non-monotonic dependence on $x_0$ (the initial mean value of $P_2$). This underscores the sensitivity of the information length to the evolution of subsystems.
Future work will include the study of a system with multiple attractor positions, or of how the system behaves when the position of the attractor is changed. It would also be interesting to examine the case where the coupling parameters $f_0$ and $g_0$ are not constant, but are functions of time. This could result in a periodic equilibrium where the PDF varies between two or more unstable states. This could represent physical systems like reversible chemical reactions, or even fluctuating financial markets. It would also be of interest to investigate the implications of the information length for deep neural networks [26], in particular, to elucidate the role of the geodesic along which the information length is minimized [27].

Author Contributions

Conceptualization, E.-j.K.; methodology, J.H. and E.-j.K.; software, J.H.; investigation, J.H. and E.-j.K.; writing—original draft preparation, J.H.; writing—review and editing, E.-j.K.; visualization, J.H.; supervision, E.-j.K.

Funding

This research was funded by Engineering & Physical Sciences Research Council Doctoral Training Partnership Scholarship (J.H.) and by Leverhulme Trust Research Fellowship RF-2018-142-9 (E.-j.K.).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

In this appendix, we show the numerical solutions to the Fokker-Planck equation by time-stepping and metrics using these numerical solutions for the O-U process and the coupled O-U processes.

Appendix A.1. The O-U Process

We use the finite difference method to approximate the partial derivatives from Equation (11), where h is our time-step, s is our spatial step, i is the current time step, and j is the current spatial step.
$$\frac{P_{i+1}^j - P_i^j}{h} = \gamma (x_j - x_0) \frac{P_i^{j+1} - P_i^j}{s} + D \frac{P_i^{j+1} - 2 P_i^j + P_i^{j-1}}{s^2}.$$
Then, using a second-order Runge–Kutta method, we create a time-stepped solution. Specifically, we first create an intermediate value $\tilde{P}$ by
$$(\tilde{P})_{i+1}^j = P_i^j + h f(t_i, P_i^j),$$
where
$$f(t_i, P_i^j) = \gamma (x_j - x_0) \frac{P_i^{j+1} - P_i^j}{s} + D \frac{P_i^{j+1} - 2 P_i^j + P_i^{j-1}}{s^2},$$
which is the time-gradient for the system. Here, $t_i$ is the current time value. Then we use the intermediate value to calculate the next time step value, $P_{i+1}^j$, using the formula
$$P_{i+1}^j = P_i^j + \frac{h}{2} \left[ f(t_i, P_i^j) + f(t_{i+1}, (\tilde{P})_{i+1}^j) \right].$$
For simplicity, we set $D = 1$. This is to consider the case of the one-parameter system, as the O-U process can always be arranged to combine $\gamma$ and $D$ into a single parameter. For our stepping, we use $h = 0.04$ and $s = 0.1$. We check the convergence of our solutions by reducing $h$ and $s$. We use the initial condition given by Equation (13). All solutions are coded in Python. We present the numerically computed metrics from the time-stepping in Figure A1, which shows the final value for each metric as we vary the initial position of the system, $x_0$. Figure A1 is quite similar to the analytically calculated metrics shown in Figure 1, demonstrating a good agreement between the analytical and numerical solutions.
To test the accuracy of our time-stepped model further, Figure A2 compares the time evolution of the different metrics for the largest $x_0$ value, i.e., the largest perturbation from equilibrium in our testing, with the analytical and time-stepped solutions shown in panels (a) and (b), respectively. The agreement between the analytical and numerical solutions is observed to be quite good.
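As a hedged illustration of how such metrics might be evaluated on a grid, the sketch below computes the relative entropy between two snapshots and the information length along a sequence of snapshots, using the standard definitions $K(p\,\|\,q) = \int p \ln(p/q)\, dx$ and $\mathcal{L} = \int dt \sqrt{\int dx\, (\partial_t p)^2 / p}$. The function names, the use of `np.gradient` for the time derivative, and the parameter values are assumptions made for this example; the snapshots are taken from a closed-form Gaussian with fixed width and a relaxing mean rather than from the time-stepper.

```python
import numpy as np


def relative_entropy(p, q, s):
    """Kullback-Leibler divergence, sum of p ln(p/q) times the grid spacing s."""
    mask = (p > 0.0) & (q > 0.0)
    return np.sum(p[mask] * np.log(p[mask] / q[mask])) * s


def information_length(P, t, s):
    """Information length of snapshots P[k, j] = p(x_j, t_k).

    E(t) = sum_j (dp/dt)^2 / p * s, and L is the time integral of sqrt(E).
    """
    dPdt = np.gradient(P, t, axis=0)           # finite differences in time
    safe = np.where(P > 0.0, P, 1.0)           # avoid division by zero
    E = np.sum(np.where(P > 0.0, dPdt**2 / safe, 0.0), axis=1) * s
    rate = np.sqrt(E)
    return np.sum(0.5 * (rate[1:] + rate[:-1]) * np.diff(t))  # trapezoid rule


def gaussian(x, mean, beta):
    """Normalised Gaussian with inverse temperature beta (variance 1/(2 beta))."""
    return np.sqrt(beta / np.pi) * np.exp(-beta * (x - mean) ** 2)


if __name__ == "__main__":
    gamma, beta, x0 = 0.1, 0.05, 5.0           # illustrative values only
    x = np.linspace(-60.0, 60.0, 1201)
    s = x[1] - x[0]
    t = np.linspace(0.0, 20.0, 2001)
    means = x0 * np.exp(-gamma * t)            # O-U mean relaxing towards 0
    P = gaussian(x[None, :], means[:, None], beta)
    print("information length:", information_length(P, t, s))
    print("relative entropy  :", relative_entropy(P[0], P[-1], s))
```

In this fixed-width case the information length reduces to $\sqrt{2\beta}\, |x_0| (1 - e^{-\gamma T})$, which is linear in $x_0$, whereas the relative entropy between the first and last snapshots equals $\beta (\Delta m)^2$, where $\Delta m$ is the displacement of the mean, and so grows quadratically.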
Figure A1. The metrics calculated by the numerical solutions to the Fokker–Planck equation by time-stepping.
Figure A2. Comparing the evolution of the metrics with x 0 = 30 . Panels (a) and (b), respectively, are obtained by using the analytical and numerical solutions to the Fokker–Planck equation.

Appendix A.2. Coupled O-U Process

We solve (22) and (23) numerically by time-stepping, using the same method as for the single-equation case. Our discrete versions of (22) and (23) become
$$\frac{(P_1)_{i+1}^{j} - (P_1)_{i}^{j}}{h} = \gamma_1 (x_j - \mu_1) \frac{(P_1)_{i}^{j+1} - (P_1)_{i}^{j}}{s} + D_1 \frac{(P_1)_{i}^{j+1} - 2 (P_1)_{i}^{j} + (P_1)_{i}^{j-1}}{s^2} - \epsilon (P_1)_{i}^{j} + \epsilon (P_2)_{i}^{j},$$
$$\frac{(P_2)_{i+1}^{j} - (P_2)_{i}^{j}}{h} = \gamma_2 (x_j - \mu_2) \frac{(P_2)_{i}^{j+1} - (P_2)_{i}^{j}}{s} + D_2 \frac{(P_2)_{i}^{j+1} - 2 (P_2)_{i}^{j} + (P_2)_{i}^{j-1}}{s^2} + \epsilon (P_1)_{i}^{j} - \epsilon (P_2)_{i}^{j},$$
where $h$ is our time step and $s$ is our spatial step. We then use Heun’s method to approximate the time evolution of the system, so our final numerical model becomes
$$(\tilde{P}_1)_{i+1}^{j} = (P_1)_{i}^{j} + h f_1(t_i, (P_1)_{i}^{j}, (P_2)_{i}^{j}),$$
$$(\tilde{P}_2)_{i+1}^{j} = (P_2)_{i}^{j} + h f_2(t_i, (P_1)_{i}^{j}, (P_2)_{i}^{j}),$$
$$(P_1)_{i+1}^{j} = (P_1)_{i}^{j} + \frac{h}{2} \left[ f_1(t_i, (P_1)_{i}^{j}, (P_2)_{i}^{j}) + f_1(t_{i+1}, (\tilde{P}_1)_{i+1}^{j}, (\tilde{P}_2)_{i+1}^{j}) \right],$$
$$(P_2)_{i+1}^{j} = (P_2)_{i}^{j} + \frac{h}{2} \left[ f_2(t_i, (P_1)_{i}^{j}, (P_2)_{i}^{j}) + f_2(t_{i+1}, (\tilde{P}_1)_{i+1}^{j}, (\tilde{P}_2)_{i+1}^{j}) \right],$$
where
$$f_1(t_i, (P_1)_{i}^{j}, (P_2)_{i}^{j}) = \gamma_1 (x_j - \mu_1) \frac{(P_1)_{i}^{j+1} - (P_1)_{i}^{j}}{s} + D_1 \frac{(P_1)_{i}^{j+1} - 2 (P_1)_{i}^{j} + (P_1)_{i}^{j-1}}{s^2} - \epsilon (P_1)_{i}^{j} + \epsilon (P_2)_{i}^{j},$$
$$f_2(t_i, (P_1)_{i}^{j}, (P_2)_{i}^{j}) = \gamma_2 (x_j - \mu_2) \frac{(P_2)_{i}^{j+1} - (P_2)_{i}^{j}}{s} + D_2 \frac{(P_2)_{i}^{j+1} - 2 (P_2)_{i}^{j} + (P_2)_{i}^{j-1}}{s^2} - \epsilon (P_2)_{i}^{j} + \epsilon (P_1)_{i}^{j},$$
and $(\tilde{P}_k)_{i+1}^{j}$ are the intermediate values. Here we set $D_1 = D_2 = 1$ and use $h = 0.04$ and $s = 0.1$.
We give our system components a Gaussian initial condition, given by
$$(P_1)_{0}^{j} = \frac{1}{2} \sqrt{\frac{\beta_1}{\pi}}\, e^{-\beta_1 (x_j - \nu_1)^2},$$
$$(P_2)_{0}^{j} = \frac{1}{2} \sqrt{\frac{\beta_2}{\pi}}\, e^{-\beta_2 (x_j - \nu_2)^2},$$
where $\beta_1$ and $\beta_2$ are the inverse temperatures for $P_1$ and $P_2$, respectively, and $\nu_1$ and $\nu_2$ are their initial mean positions.
When varying the system parameters, we fix the parameters for $P_1$ at equilibrium ($\nu_1 = 0$, $\beta_1 = 0.05$, $\gamma_1 = 0.1$) and vary the parameters for $P_2$ ($\nu_2$, $\beta_2$). The choice between fixing $P_1$ or $P_2$ is arbitrary. We solve the Fokker–Planck equations for the coupled O-U process (Equations (22) and (23)) by time-stepping and then calculate the metrics numerically. These are shown to be in good agreement with the results obtained from the analytical form of the PDF (Equations (42) and (43)) in Section 5.
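The coupled update can be sketched along the same lines. The code below is a minimal, hypothetical illustration rather than the implementation behind the figures: the function names, the parameter dictionary `par`, the value of `eps`, the centred conservative stencil for the drift term, and the step sizes are all assumptions. The time step is again taken smaller than the quoted $h = 0.04$ so that the explicit update stays stable for $D_1 = D_2 = 1$ and $s = 0.1$.

```python
import numpy as np


def ou_operator(P, x, s, gamma, mu, D):
    """Centred-difference estimate of d/dx[gamma (x - mu) P] + D d2P/dx2."""
    flux = gamma * (x - mu) * P
    out = np.zeros_like(P)
    out[1:-1] = (flux[2:] - flux[:-2]) / (2.0 * s) \
        + D * (P[2:] - 2.0 * P[1:-1] + P[:-2]) / s**2
    return out


def coupled_rhs(P1, P2, x, s, par):
    """Time derivatives (f1, f2), including the exchange terms."""
    f1 = ou_operator(P1, x, s, par["gamma1"], par["mu1"], par["D1"])
    f2 = ou_operator(P2, x, s, par["gamma2"], par["mu2"], par["D2"])
    exchange = par["eps"] * (P2 - P1)          # +eps P2 - eps P1 for component 1
    return f1 + exchange, f2 - exchange


def heun_step(P1, P2, h, x, s, par):
    """One Heun step applied to both components simultaneously."""
    a1, a2 = coupled_rhs(P1, P2, x, s, par)
    T1, T2 = P1 + h * a1, P2 + h * a2          # intermediate (predictor) values
    b1, b2 = coupled_rhs(T1, T2, x, s, par)
    return P1 + 0.5 * h * (a1 + b1), P2 + 0.5 * h * (a2 + b2)


def half_gaussian(x, nu, beta):
    """Gaussian carrying half of the total probability."""
    return 0.5 * np.sqrt(beta / np.pi) * np.exp(-beta * (x - nu) ** 2)


def solve_coupled(par, nu1, beta1, nu2, beta2, s=0.1, h=0.002,
                  t_end=5.0, L=40.0):
    x = np.linspace(-L, L, int(round(2.0 * L / s)) + 1)
    P1, P2 = half_gaussian(x, nu1, beta1), half_gaussian(x, nu2, beta2)
    for _ in range(int(round(t_end / h))):
        P1, P2 = heun_step(P1, P2, h, x, s, par)
    return x, P1, P2


if __name__ == "__main__":
    par = {"gamma1": 0.1, "gamma2": 0.2, "mu1": 0.0, "mu2": 0.0,
           "D1": 1.0, "D2": 1.0, "eps": 0.5}  # illustrative values only
    x, P1, P2 = solve_coupled(par, nu1=0.0, beta1=0.05, nu2=5.0, beta2=0.5)
    s = x[1] - x[0]
    print("norms:", P1.sum() * s, P2.sum() * s)
```

Because the exchange terms cancel in the sum $P_1 + P_2$, the total probability should remain equal to one, which gives a simple check on any implementation of this kind.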

Appendix B

The Langevin equation corresponding to Equations (22) and (23) is given by
$$\partial_t x = -(x - \mu)\, \eta(t) + \xi.$$
Here, $\xi$ is the Gaussian noise given by Equation (10), and $\eta(t)$ is the dichotomous Markov noise (DMN) [25,28], which takes the two values $\{\gamma_1, \gamma_2\}$, with constant transition rates $k_{\pm} = f_0, g_0$ between the two states. A pure DMN case is governed by
$$\partial_t x = \eta(t),$$
and the waiting times $\tau_{\pm}$ in the two states are exponentially distributed stochastic variables, with the probability density
$$P(\tau_{\pm}) = k_{\pm}\, e^{-k_{\pm} \tau_{\pm}}.$$
Note that $\eta(t)$ is not a white noise, but has an exponentially decaying correlation function,
$$\langle \eta(t)\, \eta(t') \rangle = \frac{Q}{\tau_c} \exp\left( -\frac{|t - t'|}{\tau_c} \right).$$
Here, $Q = k_{+} k_{-} \tau_c^{3} (\gamma_1 + \gamma_2)^2$, and $\tau_c = \frac{1}{k_{+} + k_{-}}$ is the characteristic relaxation time to the stationary state of the DMN. Because the DMN has a finite correlation time, an exact analytical time-dependent PDF is, in general, unavailable. For instance, the pure DMN case given by Equation (A17) leads to coupled linear Fokker–Planck equations (e.g., see Equations (9) and (10) in [28]), but the resulting equation for the total PDF is an integro-differential equation in time (see, e.g., Equation (11) in [28]).
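To make the waiting-time description concrete, the following minimal sketch samples a DMN path as a sequence of alternating states with exponentially distributed waiting times. The function name `sample_dmn`, the labelling of the "+" state with the value `gamma1`, and the numerical rates are assumptions made for this illustration only.

```python
import numpy as np


def sample_dmn(k_plus, k_minus, gamma1, gamma2, n_intervals, rng):
    """Sample a dichotomous Markov noise path as (durations, values).

    The '+' state (value gamma1, an assumed labelling) is left at rate
    k_plus and the '-' state (value gamma2) at rate k_minus, so the
    waiting times are exponential with means 1/k_plus and 1/k_minus.
    """
    in_plus = np.arange(n_intervals) % 2 == 0       # alternate +, -, +, ...
    rates = np.where(in_plus, k_plus, k_minus)
    durations = rng.exponential(1.0 / rates)        # scale = mean waiting time
    values = np.where(in_plus, gamma1, gamma2)
    return durations, values


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k_plus, k_minus = 2.0, 0.5                      # illustrative rates only
    tau, eta = sample_dmn(k_plus, k_minus, 0.1, 0.3, 200_000, rng)
    tau_c = 1.0 / (k_plus + k_minus)                # relaxation time of the DMN
    time_average = np.sum(tau * eta) / np.sum(tau)
    print("tau_c:", tau_c, "time-averaged eta:", time_average)
```

For a long path, the fraction of time spent in the "+" state should approach $k_{-}/(k_{+} + k_{-})$, since the mean residence times are $1/k_{+}$ and $1/k_{-}$.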

References

  1. Zamir, R. A proof of the Fisher information inequality via a data processing argument. IEEE Trans. Inf. Theory 1998, 44, 1246–1250.
  2. Gibbs, A.L.; Su, F.E. On choosing and bounding probability metrics. Int. Stat. Rev. 2002, 70, 419–435.
  3. Jordan, R.; Kinderlehrer, D.; Otto, F. The variational formulation of the Fokker–Planck equation. SIAM J. Math. Anal. 1998, 29, 1–17.
  4. Lott, J. Some geometric calculations on Wasserstein space. Commun. Math. Phys. 2008, 277, 423–437.
  5. Takatsu, A. Wasserstein geometry of Gaussian measures. Osaka J. Math. 2011, 48, 1005–1026.
  6. Otto, F.; Villani, C. Generalization of an Inequality by Talagrand and Links with the Logarithmic Sobolev Inequality. J. Funct. Anal. 2000, 173, 361–400.
  7. Costa, S.; Santos, S.; Strapasson, J. Fisher information distance. Discret. Appl. Math. 2015, 197, 59–69.
  8. Frieden, B.R. Science from Fisher Information; Cambridge University Press: Cambridge, UK, 2004.
  9. Wootters, W.K. Statistical distance and Hilbert space. Phys. Rev. D 1981, 23, 357–362.
  10. Kullback, S. Letter to the Editor: The Kullback-Leibler distance. Am. Stat. 1951, 41, 340–341.
  11. Kowalski, A.M.; Martin, M.T.; Plastino, A.; Rosso, O.A.; Casas, M. Distances in Probability Space and the Statistical Complexity Setup. Entropy 2011, 13, 1055–1075.
  12. Information Length. Available online: https://encyclopedia.pub/238 (accessed on 29 July 2019).
  13. Heseltine, J.; Kim, E. Novel mapping in non-equilibrium stochastic processes. J. Phys. A 2016, 49, 175002.
  14. Kim, E. Investigating Information Geometry in Classical and Quantum Systems through Information Length. Entropy 2018, 20, 574.
  15. Kim, E.; Lewis, P. Information length in quantum systems. J. Stat. Mech. 2018, 043106.
  16. Kim, E.; Hollerbach, R. Signature of nonlinear damping in geometric structure of a nonequilibrium process. Phys. Rev. E 2017, 95, 022137.
  17. Kim, E.; Hollerbach, R. Geometric structure and information change in phase transitions. Phys. Rev. E 2017, 95, 062107.
  18. Hollerbach, R.; Dimanche, D.; Kim, E. Information geometry of nonlinear stochastic systems. Entropy 2018, 20, 550.
  19. Hollerbach, R.; Kim, E.; Mahi, Y. Information length as a new diagnostic in the periodically modulated double-well model of stochastic resonance. Physica A 2019, 525, 1313–1322.
  20. Kim, E.; Hollerbach, R. Time-dependent probability density function in cubic stochastic processes. Phys. Rev. E 2016, 94, 052118.
  21. Nicholson, S.B.; Kim, E. Investigation of the statistical distance to reach stationary distributions. Phys. Lett. A 2015, 379, 83–88.
  22. Majtey, A.; Lamberti, P.W.; Martin, M.T.; Plastino, A. Wootters’ distance revisited: A new distinguishability criterium. Eur. Phys. J. D 2005, 32, 413–419.
  23. Risken, H. The Fokker-Planck Equation: Methods of Solution and Applications; Springer: Berlin, Germany, 1996.
  24. Klebaner, F. Introduction to Stochastic Calculus with Applications; Imperial College Press: London, UK, 2012.
  25. Bena, I. Dichotomous Markov Noise: Exact results for out-of-equilibrium systems (a brief overview). Int. J. Mod. Phys. B 2006, 20, 2825–2888.
  26. Shwartz-Ziv, R.; Tishby, N. Opening the Black Box of Deep Neural Networks via Information. arXiv 2017, arXiv:1703.00810.
  27. Kim, E.; Lee, U.; Heseltine, J.; Hollerbach, R. Geometric structure and geodesic in a solvable model of nonequilibrium process. Phys. Rev. E 2016, 93, 062127.
  28. Van Den Broeck, C. On the relation between white shot noise, Gaussian white noise, and the dichotomic Markov process. J. Stat. Phys. 1983, 31, 467–483.
Figure 1. The metrics against x 0 in the long time limit for a single Ornstein–Uhlenbeck (O-U) process.
Figure 2. Behavior of the metrics for varying β 20 for the overall system in panel (a) and components P 1 in panel (b) and P 2 in panel (c).
Figure 3. The metrics against time for P 1 around β 20 = 0.05 . β 20 = 0.02 in panel (a) and β 20 = 0.08 in panel (b). The Y-axis scaling on the panels is 10 6 .
Figure 4. Behavior of the metrics for varying x 0 for the overall system P in panel (a), P 1 in (b), and P 2 in (c).
Figure 5. Time-dependent probability density functions (PDFs) for $P$ in panel (a), for $P_1$ in panel (b), and for $P_2$ in panel (c); $x_0 = 20$.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).