Information Geometry of Spatially Periodic Stochastic Systems

We explore the effect of different spatially periodic, deterministic forces on the information geometry of stochastic processes. The three forces considered are f0 = sin(πx)/π and f± = sin(πx)/π ± sin(2πx)/2π, with f− chosen to be particularly flat (locally cubic) at the equilibrium point x = 0, and f+ particularly flat at the unstable fixed point x = 1. We numerically solve the Fokker–Planck equation with an initial condition consisting of a periodically repeated Gaussian peak centred at x = μ, with μ in the range [0, 1]. The strength D of the stochastic noise is in the range 10⁻⁴–10⁻⁶. We study the details of how these initial conditions evolve toward the final equilibrium solutions and elucidate the important consequences of the interplay between an initial PDF and a force. For initial positions close to the equilibrium point x = 0, the peaks largely maintain their shape while moving. In contrast, for initial positions sufficiently close to the unstable point x = 1, there is a tendency for the peak to slump in place and broaden considerably before reconstituting itself at the equilibrium point. A consequence of this is that the information length L∞, the total number of statistically distinguishable states that the system evolves through, is smaller for initial positions closer to the unstable point than for more intermediate values. We find that L∞ as a function of initial position μ is qualitatively similar to the force, including the differences between f0 and f±, illustrating the value of information length as a useful diagnostic of the underlying force in the system.


Introduction
It is of interest to apply the idea of a metric to problems involving stochastic processes, e.g., [1][2][3][4][5][6]. Given a metric, the differences between different Probability Density Functions (PDFs) can be quantified, with different metrics emphasizing different aspects and hence being suited to different applications. Fisher information [7] yields a metric where distance is measured in units of the PDF's width. The distance in the Fisher metric is thus dimensionless, and represents the number of statistically different states [8].
By extending the statistical distance in [8] to time-dependent situations, we recently introduced a way of quantifying information changes associated with time-varying PDFs [9][10][11][12][13][14][15][16]. We first compare two PDFs separated by an infinitesimal increment in time, and consider the corresponding infinitesimal distance. Integrating in time gives the total number of statistically distinguishable states that the system passes through, called the information length L, e.g., [6][7][8][14]. Another interpretation of L that can be useful is as a measure of the total elapsed time in units of an 'information-change' dynamical timescale.
We start by defining the dynamical time τ(t) as

E(t) ≡ 1/τ(t)² = ∫ dx [∂p(x, t)/∂t]²/p(x, t). (1)

That is, τ(t) is the characteristic timescale over which the information changes, and quantifies the PDF's correlation time. Alternatively, 1/τ quantifies the (average) rate of change of information in time. A PDF that evolves such that E is constant in time is referred to as a geodesic, along which the information propagates at a uniform rate [6]. The information length L(t) is then defined by

L(t) = ∫₀ᵗ dt₁/τ(t₁) = ∫₀ᵗ √E(t₁) dt₁, (2)

which can be interpreted as measuring time in units of τ. It is important to note that L has no dimension (unlike entropy) and represents the total number of statistically different states that a system passes through in time between 0 and t. If we know the parameters that determine the PDF p(x, t), E and L in Equations (1) and (2) can be written in terms of the Fisher metric tensor defined in the statistical space spanned by those parameters. However, it is not always possible to have access to the parameters that govern PDFs, for instance, in the case of PDFs calculated from data. The merit of Equations (1) and (2) is thus that E and L can be directly calculated from PDFs even without knowing the parameters governing the PDFs, or the Fisher metric. For instance, L was calculated from PDFs of music data in [10]. In the work here, we first compute time-dependent PDFs by solving the Fokker-Planck equation numerically, and then calculate E and L from these PDFs as additional diagnostics. Unlike quantities such as entropy, relative entropy, Kullback-Leibler divergence, or Jensen divergence, information length is a Lagrangian measure, that is, it includes the full details of the PDF's evolution, and not just the initial and final states. L∞, the total information length over the entire evolution, is then particularly useful to quantify the proximity of any initial PDF to a final attractor of a dynamical system.
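When PDFs are available only as numerical snapshots, E and L in Equations (1) and (2) can be estimated directly by finite differences. The following minimal Python sketch (function names and the drifting-Gaussian test case are illustrative, not from the paper) does this:

```python
import numpy as np

def information_rate_and_length(x, p_snapshots, dt):
    """Estimate E(t) = \\int dx (dp/dt)^2 / p and L(t) = \\int sqrt(E) dt
    directly from a sequence of PDF snapshots, as in Equations (1) and (2)."""
    dx = x[1] - x[0]
    p = np.asarray(p_snapshots)
    dpdt = np.gradient(p, dt, axis=0)            # finite-difference time derivative
    E = np.sum(dpdt**2 / np.maximum(p, 1e-300), axis=1) * dx
    L = np.concatenate(([0.0], np.cumsum(np.sqrt(E[:-1]) * dt)))
    return E, L

# Sanity check: a Gaussian of fixed width sigma translating at speed v has
# E = (v/sigma)^2, so L(t) = v*t/sigma: the distance moved in units of the width.
x = np.linspace(-5, 5, 4001)
sigma, v, dt = 0.5, 1.0, 1e-3
ts = np.arange(0, 1.0, dt)
snaps = [np.exp(-(x - v*t)**2/(2*sigma**2))/np.sqrt(2*np.pi*sigma**2) for t in ts]
E, L = information_rate_and_length(x, snaps, dt)   # E ≈ 4, L(1) ≈ 2 here
```

This directly reflects the point made above: no knowledge of the parameters governing the PDFs, or of the Fisher metric, is needed.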
In previous work [12,15] we explored these aspects of L for restoring forces that were power-laws in the distance to the attractor. For instance, for the Ornstein-Uhlenbeck process, which is a linear relaxation process, we showed that L consists of two parts: the first is due to the movement of the mean position measured in units of the width of the PDF, and the second is due to the entropy change. Thus, the total entropy change that is often discussed in previous works (e.g., [17]) contributes only partially to L. Importantly, for the Ornstein-Uhlenbeck process, L∞ increases linearly from the stable equilibrium point (with its minimum value there) with the mean position of the initial PDFs, regardless of the strength of the stochastic noise and the width of the initial PDFs. This indicates that the information length preserves the linearity of the underlying process. Heseltine & Kim [18] show that this linear relation is lost for other metrics (e.g., Kullback-Leibler divergence, Jensen divergence). Note that L is related to the integral of the square root of the infinitesimal relative entropy (see Appendix A). In comparison, for a chaotic attractor, L∞ varies sensitively with the mean position of a narrow initial PDF, taking its minimum value at the most unstable point [9]. This sensitive dependence of L∞ on the initial PDF is similar to that of a Lyapunov exponent.
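The linear dependence of L∞ on the initial mean position for the Ornstein-Uhlenbeck process is easy to verify in the simplest setting: if the initial width already equals its equilibrium value σ² = D/γ, the PDF is a Gaussian of fixed width whose mean decays as m(t) = µ e^(−γt), so E = (dm/dt)²/σ² and L∞ = µ/σ. A small sketch (parameter values illustrative):

```python
import numpy as np

# Ornstein-Uhlenbeck process dx/dt = -gamma*x + xi, with the initial PDF
# width already at its equilibrium value sigma^2 = D/gamma.  The mean then
# decays as m(t) = mu*exp(-gamma*t) at fixed width, so E = (dm/dt)^2/sigma^2
# and L_inf = mu/sigma = mu*sqrt(gamma/D): linear in mu.
gamma, D = 1.0, 1e-4
sigma = np.sqrt(D/gamma)
t = np.linspace(0.0, 30.0, 300001)
dt = t[1] - t[0]

def L_inf(mu):
    dmdt = -gamma*mu*np.exp(-gamma*t)        # rate of change of the mean
    return np.sum(np.abs(dmdt)/sigma)*dt     # L = \int sqrt(E) dt

ratios = [L_inf(mu)/mu for mu in (0.2, 0.4, 0.8)]  # all equal sqrt(gamma/D)
```

The ratio L∞/µ is the same for every µ, which is the linearity discussed above; note it depends on D only through the overall factor √(γ/D).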
These results highlight L∞ as an alternative diagnostic for understanding the attractor structure of dynamical systems. It is this attractor structure that interests us in this paper. We thus focus on the relaxation problem as in [9,12,15,18], considering periodic deterministic forces, and elucidate the importance of the initial condition and its interplay with the deterministic forces in the relaxation, and thus in the attractor structure.

Model
We consider the following nonlinear Langevin equation:

dx/dt = −f(x) + ξ(t). (3)

Here x is a random variable; f(x) is a deterministic force; ξ is a stochastic forcing, which for simplicity can be taken as a short-correlated Gaussian random forcing with

⟨ξ(t)ξ(t′)⟩ = 2D δ(t − t′), (4)

where the angular brackets represent the average over ξ, ⟨ξ⟩ = 0, and D is the strength of the forcing.
In [15] we considered the choice f(x) = xⁿ and investigated how varying the degree of nonlinearity n = 3, 5, 7 affects the system. In this work we take f(x) to be periodic in x, and explore some of the new effects this can create. The three choices of f(x) we consider are

f0(x) = sin(πx)/π,    f±(x) = sin(πx)/π ± sin(2πx)/(2π). (5)

Figure 1 shows these profiles, which are all anti-symmetric in x, and periodic on the interval x ∈ [−1, 1]. All three choices have x = 0 as an attractor, and x = 1 as an unstable fixed point. The particular combinations of harmonics for f± were chosen so that they are locally cubic rather than linear at either x = 0 (for f−) or x = 1 (for f+). In applications such as Brownian motors many specific choices of f(x) are considered to model particular physics. However, as noted in the introduction, we are here more interested in attractor structures in the relaxation problem, in particular, how initial conditions and stochastic noise interact with deterministic forces, and the role of the asymmetry of the deterministic force and of the stable and unstable fixed points in the local dynamics. Comparing these three periodic functions with the previous choices, two significant differences stand out. First, for f(x) = xⁿ with n = 3, 5, 7, all initial conditions are pushed directly toward the origin, and there are no unstable fixed points. It is therefore of particular interest to see how the choices here behave for initial conditions near x = 1. Second, f(x) = xⁿ with n = 3, 5, 7 all curve upward (that is, have f″ > 0 for all x > 0), whereas the choices here have different combinations of curvatures, which will turn out to have clearly identifiable effects.
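The local flatness of f± is easy to check: differentiating Equation (5) gives f0′ = cos(πx) and f±′ = cos(πx) ± cos(2πx), so f−′(0) = 0 and f+′(1) = 0, and a Taylor expansion gives f− ≈ (π²/2)x³ near x = 0. A short check in Python:

```python
import numpy as np

# The three periodic forces of Equation (5), all antisymmetric in x and
# periodic on x in [-1, 1], with x = 0 stable and x = 1 unstable.
f0     = lambda x: np.sin(np.pi*x)/np.pi
fplus  = lambda x: np.sin(np.pi*x)/np.pi + np.sin(2*np.pi*x)/(2*np.pi)
fminus = lambda x: np.sin(np.pi*x)/np.pi - np.sin(2*np.pi*x)/(2*np.pi)

# Derivatives: f0' = cos(pi x), f±' = cos(pi x) ± cos(2 pi x).
dfplus  = lambda x: np.cos(np.pi*x) + np.cos(2*np.pi*x)
dfminus = lambda x: np.cos(np.pi*x) - np.cos(2*np.pi*x)

print(dfminus(0.0), dfplus(1.0))     # both vanish: locally cubic points
print(fminus(0.01)/0.01**3)          # ≈ pi^2/2, the cubic coefficient at x = 0
```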
The Fokker-Planck equation [19,20] corresponding to Equation (3) is

∂p(x, t)/∂t = ∂/∂x [f(x) p(x, t)] + D ∂²p(x, t)/∂x². (6)

In [15] we solved the corresponding equation by finite-differencing in x. For the periodic systems considered here, it is more convenient to start with the Fourier expansion

p(x, t) = Σ_{k=0}^{K} [a_k(t) cos(kπx) + b_k(t) sin(kπx)]. (7)

The coefficients a_k and b_k are then time-stepped using second-order Runge-Kutta. The term ∂ₓ(f(x)p) is separated out into the relevant Fourier components using a fast Fourier transform. (For the very simple choices of f(x) considered here, consisting of at most two Fourier modes, it would be straightforward to do this separation analytically, and thereby do the entire calculation purely in Fourier space. However, the code was developed with more general choices of f(x) in mind, for which this analytic approach becomes increasingly cumbersome as the number of harmonics in f(x) increases; for such choices the FFT approach is most convenient.) Resolutions in the range K = 2¹¹–2¹⁴ are used, and carefully checked to ensure fully resolved solutions. Time-steps were in the range 10⁻⁴–10⁻⁵, and were again varied to ensure proper accuracy. Another useful test of the numerical implementation is to monitor the coefficient a₀: this is time-stepped along with the others, but must in fact remain constant if the total probability ∫p dx = 2a₀ is to remain constant. It was found that if the initial condition is correctly set to have a₀ = 0.5, then this was maintained throughout the entire subsequent evolution.
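The scheme described above can be sketched as follows. This minimal Python version (resolution, time-step, and names are illustrative, and it uses a complex FFT representation of Equation (7) rather than separate a_k, b_k arrays) time-steps Equation (6) with second-order Runge-Kutta, evaluates ∂ₓ(f p) via FFTs, and monitors a₀:

```python
import numpy as np

N = 2048                                   # collocation points on x in [-1, 1)
x = -1 + 2*np.arange(N)/N
k = np.pi*np.arange(N//2 + 1)              # period 2, so wavenumbers are multiples of pi
f = np.sin(np.pi*x)/np.pi                  # the force f0
D, D0, mu, dt = 1e-4, 1e-3, 0.5, 1e-4

def rhs(p_hat):
    """Spectral right-hand side of dp/dt = d/dx(f p) + D d2p/dx2."""
    p = np.fft.irfft(p_hat)
    return 1j*k*np.fft.rfft(f*p) - D*k**2*p_hat

def step_rk2(p_hat):
    """One second-order (midpoint) Runge-Kutta step."""
    mid = p_hat + 0.5*dt*rhs(p_hat)
    return p_hat + dt*rhs(mid)

p_hat = np.fft.rfft(np.exp(-(x - mu)**2/(2*D0))/np.sqrt(2*np.pi*D0))  # Equation (8)
for _ in range(10000):                     # integrate to t = 1
    p_hat = step_rk2(p_hat)

a0 = p_hat[0].real/N                       # must stay 0.5 so that \int p dx = 2 a0 = 1
peak = x[np.argmax(np.fft.irfft(p_hat))]   # peak has moved from mu = 0.5 toward x = 0
```

Because the k = 0 component of the right-hand side vanishes identically, a₀ is conserved to machine precision, which is exactly the consistency check described above.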
The initial conditions are of the form

p(x, 0) = exp[−(x − µ)²/(2D₀)]/√(2πD₀), (8)

that is, Gaussians centred at x = µ and having half-width scaling as √D₀. We are interested in the range µ ∈ [0, 1]; by symmetry the range µ ∈ [−1, 0] would behave the same, simply approaching x = 0 from the other direction.
This initial condition is also periodic, on the same x ∈ [−1, 1] interval as the entire problem. For the purposes of actually implementing Equation (8), it was most convenient to consider the range as being x ∈ [−0.5, 1.5]. In particular, for µ ∈ [0, 1] and the values of D₀ considered here, Equation (8) yields results at x = −0.5 and x = 1.5 that are different, but both are so vanishingly small that the discrepancy does not need to be smoothed out in defining the initial condition. If instead Equation (8) were implemented on either x ∈ [0, 2] or x ∈ [−1, 1], then µ near either 0 or 1 would be more awkward to handle correctly.
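The convenience of the x ∈ [−0.5, 1.5] interval can be quantified: for every µ ∈ [0, 1] and the values of D₀ used here, both endpoints lie at least 0.5 (many half-widths √D₀) from the peak, so the endpoint mismatch of a single, non-periodized Gaussian is negligible. A quick check (illustrative):

```python
import numpy as np

def initial_pdf(x, mu, D0):
    """Equation (8): a Gaussian of half-width ~ sqrt(D0) centred at x = mu."""
    return np.exp(-(x - mu)**2/(2*D0))/np.sqrt(2*np.pi*D0)

# Worst-case endpoint mismatch of a single Gaussian on x in [-0.5, 1.5],
# relative to the peak height, over mu in [0, 1] and D0 down to 1e-2.
worst = max(
    abs(initial_pdf(-0.5, mu, D0) - initial_pdf(1.5, mu, D0))/initial_pdf(mu, mu, D0)
    for mu in np.linspace(0, 1, 101) for D0 in (1e-2, 1e-3, 1e-4)
)
print(worst)   # ~ exp(-0.25/(2e-2)) ≈ 4e-6, attained at mu = 0 or 1 with D0 = 1e-2
```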
In [15] we also used a Gaussian initial condition, with D₀ = 10⁻⁶, and then explored the regime D = 10⁻⁶ to 10⁻⁹. Here we are again interested in the regime D₀ ≥ D, which allows at least the initial parts of the evolution to be nondiffusive. Having the initial peak be so narrow that D₀ < D can also be interesting in other contexts (e.g., [21]), but diffusive effects are then necessarily important from the outset, which would obscure some of the dynamics of interest here. We therefore focus on the range D₀ = 10⁻² to 10⁻⁴, and D = 10⁻⁴ to 10⁻⁶.

Figure 2 shows the evolution of the peak amplitudes, for the initial positions µ = 0.5 and µ = 0.95; for f0 the equilibration timescale is relatively short and depends only weakly on D. In contrast, the timescales for f− are much longer, and vary substantially with D. Comparing the f− results here with Figure 1 in [15], we see that f− is exactly analogous to the previous f(x) = x³. This is because for x ≤ 0.5 the shape of f− is very close to a cubic. Similarly, for x ≤ 0.5 the shape of f0 is still reasonably close to linear, and the evolution is therefore essentially like the linear Ornstein-Uhlenbeck process f(x) = x, for which an exact analytic solution exists [21]. It is only f+ whose shape is already substantially different from either linear or cubic even on the interval x ≤ 0.5, being close to linear for x ≤ 0.2 but strongly curved for 0.2 < x ≤ 0.5. Correspondingly, f+ also shows a new effect, namely an initial reduction in the peak amplitudes. This effect becomes even more pronounced for f0 and f− and the initial position µ = 0.95, in the bottom row of Figure 2. This reduction in the peak amplitudes is not caused by diffusive spreading but is a consequence of the non-diffusive (ξ = D = 0) evolution resulting from the interplay between an initial PDF and the deterministic force. We note in particular how D = 10⁻⁴ to 10⁻⁶ yield identical reductions in amplitudes here. It is worth comparing this with the non-diffusive evolution in [15], where the opposite behavior was observed: an initial increase in peak amplitudes (the same effect as seen here for f−, µ = 0.5).
The interplay between the initial PDF and the deterministic force is elaborated below.

Results
If f (x) is such that it increases more rapidly than linearly, i.e., curves upward, then those parts of any initial condition furthest from the origin are pushed toward it fastest, whereas those parts closest move more slowly. The result is that an initial Gaussian peak bunches up on itself, causing the amplitude to increase. In contrast, if f (x) curves downward the opposite effect occurs, and an initial Gaussian peak is spread out, even before diffusion starts to play a role. Eventually of course the peak moves sufficiently close to the origin that the behaviour is as before, explaining why the behaviour at later times is similar to the previous µ = 0.5 results.
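This bunching/spreading mechanism can be illustrated by following noise-free characteristics for the two edges of a narrow peak (an illustrative sketch, using the convention dx/dt = −f(x) so that x = 0 attracts; edge positions and times are arbitrary):

```python
import numpy as np

def advect(x0, f, t_end, dt=1e-3):
    """Integrate the noise-free characteristics dx/dt = -f(x) with RK4."""
    x = np.array(x0, dtype=float)
    for _ in range(int(round(t_end/dt))):
        k1 = -f(x); k2 = -f(x + 0.5*dt*k1)
        k3 = -f(x + 0.5*dt*k2); k4 = -f(x + dt*k3)
        x += dt*(k1 + 2*k2 + 2*k3 + k4)/6
    return x

# Upward-curving force x^3: the edge farther from the origin moves faster,
# so the two edges of a peak at mu = 0.5 bunch up (separation shrinks).
bunched = advect([0.45, 0.55], lambda x: x**3, 1.0)

# f0 curves downward on (0, 1): near x = 0.95 the leading edge outruns the
# trailing edge, so the peak spreads out (separation grows), as in Figure 3.
spread = advect([0.90, 0.99], lambda x: np.sin(np.pi*x)/np.pi, 1.0)
```

With these numbers the x³ separation shrinks from 0.1 to roughly half that, while the f0 separation grows from 0.09 to over 0.2; since the non-diffusive evolution conserves the probability between characteristics, the peak amplitude rises in the first case and slumps in the second.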
Finally, the behaviour for f+ with µ = 0.95 is yet again different, namely an initial reduction in amplitude up to t ≈ 10, followed by an abrupt increase. This is caused by a fundamentally new peak forming at the origin, rather than the initial peak moving toward it. Note also that time here is on a logarithmic scale, corresponding to a very slow equilibration process, unlike the previous case f+ with µ = 0.5.

Figures 3-5 illustrate these various behaviours in more detail, showing the actual PDFs at different times for f0, f− and f+, respectively. Starting with f0, we see how the peak initially located at µ = 0.95 becomes broader as it moves toward the origin, an effect again not caused by diffusion, but rather by the curvature of f0 at these values of x. Note for example how the solutions at t = 3 or 4 have much steeper leading edges (nearer to the origin) than trailing edges, caused by the trailing edges moving so much more slowly. Another feature to note is how parts of the solution reach the origin coming from the 'other' direction. That is, if the initial condition is a peak centred at µ = 0.95, and having half-width 0.07 (corresponding to D₀ = 10⁻³), then a small but non-negligible portion of the initial condition is in the range x ≥ 1, as seen also in Figure 3. For this part of the initial condition the nearest attractor is x = 2 rather than x = 0. Viewed on the interval x ∈ [−0.2, 1.2], this part therefore approaches from negative x values, as seen at times t = 5 and 6. (The interval x ∈ [1.2, 1.8], or equivalently x ∈ [−0.8, −0.2], is not shown in these figures because the amplitudes are rather small there, due to the PDFs being very spread out as they traverse this range.) Finally, between t = 6 and 9 we see how the two peaks coming from negative and positive x values combine to form the single final equilibrium consisting of a Gaussian centred at the origin. Figure 4 shows the corresponding solutions for f−.
For small durations the behaviour is very similar to that seen in Figure 3, except that it happens roughly twice as fast (e.g., compare t = 0.5 in Figure 4 with t = 1 in Figure 3). This is readily understandable by noting that the slope of f− near x = 0.95 is roughly twice that of f0, yielding faster evolution. The later evolution is much slower though, with the merging of the two peaks only occurring between t = 30 and 100, and even t = 300 still displaying some asymmetry, and hence not yet the final quartic profile. This is the same very slow final adjustment process previously analysed in detail in [15], and is caused by f− being cubic rather than linear near the origin.

Figure 5 shows the solutions for f+. We see the behaviour alluded to above, of an abrupt transition from one peak to another. Because f+ is so flat near x = 1, there is hardly any tendency to push the initial peak away. Instead, it simply broadens out, slumping as it spreads. A new peak then forms at the origin, overtaking the original one in amplitude around t ≈ 10, as previously noted in Figure 2. Note though that long after this time a significant portion of the original peak still remains near x = 1, and this portion only fades away on very long timescales; x = 1 is an unstable fixed point, but f+ is so small everywhere near x = 1 that there is very little tendency to push the solutions away from there.

As noted in the introduction, we are particularly interested in the effects that these various different types of behaviour have on the information length quantities E(t) and L∞. Figure 6 shows E(t) for the same solutions as before in Figure 2. We see that E is initially uniform, and independent of D (provided D is sufficiently small in comparison with D₀), corresponding to the 'geodesic' behaviour first identified by [6].
For some configurations, E then immediately transitions to an exponential decay, whereas for others it first has a power-law decay before ultimately decaying exponentially. Correspondingly, the timescales to achieve E ≤ 10⁻⁸ also vary dramatically, as seen by the various linear and logarithmic scales for t; the different scaling regimes signify fundamentally different dynamics.

Figure 7 shows L∞ as a function of the initial position µ. Broadly, larger values of L∞ correspond to initial conditions that have farther to travel to reach the origin. Such initial conditions also spread out much more though, as seen above, and according to the interpretation of information length, this should indeed reduce L. Very close to x = 1 the L∞ values are particularly small, because having peaks collapse in place and reform at the new location is an informationally very efficient way to move, as seen also in other contexts [13,22,23].
Finally, Figure 8 shows the time, call it T∞, needed for E to drop to 10⁻⁸. The precise cutoff E = 10⁻⁸ is of course somewhat arbitrary, but as seen in Figure 6 it is sufficiently small to be in the exponential decay regime in all cases. This is therefore a convenient measure of the time taken to reach L∞, and any even smaller cutoff would only add small increments to T∞ (and essentially nothing to L∞).
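The insensitivity of L∞ to the precise cutoff is easy to quantify: once E decays exponentially, E(t) = E₀e^(−2γt), the information length remaining beyond the cutoff time T is ∫_T^∞ √E dt = √E(T)/γ, i.e., at most ~10⁻⁴/γ for the cutoff E = 10⁻⁸. A quick numerical check (illustrative values):

```python
import numpy as np

gamma, E0 = 1.0, 1.0
t = np.linspace(0.0, 30.0, 300001)
dt = t[1] - t[0]
E = E0*np.exp(-2*gamma*t)                  # exponential-decay regime of E(t)

T_inf = t[np.argmax(E < 1e-8)]             # first time E drops below the cutoff
tail = np.sum(np.sqrt(E[t >= T_inf]))*dt   # information length accrued after T_inf
print(T_inf, tail)                         # T_inf = 4 ln(10) ≈ 9.21, tail ≈ 1e-4
```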
Starting with f0, we note first that T∞ is on a linear scale, meaning that each reduction of D by a factor of 10 only adds a constant amount to T∞. This is the same effect already seen in Figure 2, where smaller D requires slightly longer to settle into the final states. Equivalently, smaller D in Figure 6 remains in the flat, geodesic regime for slightly longer times. The other feature to note for f0 is the behaviour near µ = 1, where T∞ increases strongly, and increasingly abruptly for smaller D₀. This can be understood by noting that if √D₀ ≪ 1 − µ, the initial condition Equation (8) is essentially zero at x = 1, whereas if µ is within √D₀ of 1, Equation (8) does have a non-negligible component at x = 1. Therefore, if √D₀ ≪ 1 − µ the initial peak will simply move monotonically toward the origin, which occurs on a rapid timescale, whereas if 1 − µ ≤ √D₀ the evolution will include a significant component of the slumping-in-place behaviour, which we saw only happens on slower timescales.

For f−, the scale for T∞ is logarithmic, so that each reduction of D by a factor of 10 increases T∞ by a factor of √10. For intermediate values of µ, T∞ is also essentially independent of µ. The equilibration time is completely dominated by the final settling-in time, just as in the cubic case in [15], and the initial motion of the peak toward the origin is negligible in comparison. For very small values of µ the behaviour is different, with much smaller values of T∞. If µ ≤ √D₀, the peak is essentially at the origin already, making the adjustment quicker. Finally, there is a similar end-effect for µ sufficiently close to 1; if 1 − µ ≤ √D₀, the initial peak is essentially at the unstable fixed point, and the evolution is the slumping-in-place behaviour, which has a faster final adjustment than if the peak moves toward the origin and then adjusts its shape there (but still scaling as D⁻¹/²).
Finally, f+ is qualitatively similar to f0, in the sense that T∞ is a monotonically increasing function of µ. Indeed, for intermediate values of µ the behaviour is virtually identical to f0, with T∞ increasing by a constant amount every time D is decreased by a constant factor. (This is simply not visible because T∞ is on a logarithmic rather than linear scale here.) Because f0 and f+ are both linear near the origin, the extremely slow final adjustment that happens for f− does not apply to either of them, leaving only this much weaker dependence on D. The behaviour near µ = 1, with the very strong increase in T∞, and again more abruptly for smaller D₀, is again because this is the regime where the slumping-in-place behaviour occurs. Also, because f+ is so much flatter near x = 1 than either f0 or f−, this slumping-in-place behaviour is much slower for f+ than for the other choices (recall again how long the peak at x = 1 lasts in Figure 5). This explains why T∞ is on a logarithmic scale for f+ but on a linear scale for f0, even though for intermediate values of µ they exhibit the same (weak) scaling with D.

Conclusions
The results presented here extend our previous work [12,15] to deterministic forces that are periodic in space. This naturally allows for forces f(x) that curve in opposite directions in different regions, as well as unstable fixed points. The deterministic force can also be adjusted to be particularly flat at either the stable equilibrium or the unstable fixed points, both of which turn out to have important consequences, with either choice yielding particularly long timescales, scaling as D⁻¹/². The interesting consequences of the interplay between an initial PDF and a deterministic force on the PDF evolution and the scalings of different quantities were discussed in detail by considering three types of periodic deterministic forces and comparing the results.
In particular, we computed how the rate of information change and the resulting total information length L∞ depend on the position of an initial Gaussian peak. We found that for all choices of f(x), the unstable fixed points yield comparatively small L∞, even though they are farthest away from the final equilibrium points. It is particularly interesting that L∞ as a function of initial position qualitatively follows f(x), indicating the close connection between the information geometry and the underlying forcing.
Finally, we note that this work can be extended in many different directions, including: (1) If the initial condition is not one Gaussian peak for every period of f(x), but only one peak for some much larger (tending to infinity) number of periods of f(x), then one can study how this initial peak gradually spreads out, how that compares with pure diffusion, and what effect the precise shape of f(x) might have [24][25][26]. This is also related to so-called anomalous diffusion [27][28][29], which can, however, be considerably more general, involving fractional derivatives. (2) If the force also includes a constant component tending to push the initial condition in a particular direction, it is of interest to study situations where the constant force is comparable to the periodic component of the force [30][31][32].
(3) Allowing the force to vary in time as well as space is relevant to so-called Brownian motors [33][34][35][36]. Work in some of these areas is currently ongoing.

Appendix A

For two PDFs p₁ = p(x, t₁) and p₂ = p(x, t₁ + dt) that are infinitesimally close, the relative entropy D[p₁, p₂] is quadratic in the increment dt, and thus the infinitesimal distance dl(t₁) between t₁ and t₁ + dt is

dl(t₁) = √D[p₁, p₂] = (1/√2) √{∫ dx [∂p(x, t₁)/∂t₁]²/p(x, t₁)} dt. (A7)

By summing dl(tᵢ) for i = 0, 1, 2, ..., n − 1 (where n = t/dt) in the limit dt → 0, we have

lim_{dt→0} Σᵢ dl(tᵢ) = (1/√2) ∫₀ᵗ dt₁ √{∫ dx [∂p(x, t₁)/∂t₁]²/p(x, t₁)} = L(t)/√2, (A8)

where L(t) is the information length. Thus, L is related to the sum of infinitesimal relative entropies. Note that L is a Lagrangian distance between the PDFs at times 0 and t, and sensitively depends on the particular path that the system passed through in reaching the final state. In contrast, the relative entropy D[p(x, 0), p(x, t)] depends only on the PDFs at times 0 and t, and thus does not tell us about the intermediate states between the initial and final states.
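The quadratic relation underlying Equation (A7), D[p₁, p₂] ≈ (dt²/2) ∫ dx (∂ₜp)²/p for infinitesimally separated PDFs, can be checked numerically with a translating Gaussian (an illustrative sketch):

```python
import numpy as np

x = np.linspace(-10, 10, 20001)
dx = x[1] - x[0]
gauss = lambda m: np.exp(-(x - m)**2/2)/np.sqrt(2*np.pi)   # unit-width Gaussian

dt, v = 1e-3, 1.0                       # mean moves as m(t) = v*t
p1, p2 = gauss(0.0), gauss(v*dt)

kl = np.sum(p1*np.log(p1/p2))*dx        # relative entropy D[p1, p2]
dpdt = (p2 - p1)/dt
E = np.sum(dpdt**2/p1)*dx               # \int dx (dp/dt)^2 / p
print(kl, 0.5*dt**2*E)                  # agree to leading order in dt
```

Taking the square root of both sides recovers dl = (dt/√2)√E, i.e., Equation (A7).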