## 1. Introduction

In classical statistical mechanics, the Gaussian (or normal) distribution and mean-field type theories based on such distributions have been widely used to describe equilibrium or near-equilibrium phenomena. The ubiquity of the Gaussian distribution stems from the central limit theorem, according to which random variables governed by different distributions tend to follow the Gaussian distribution in the limit of a large sample size [1,2,3]. In such a limit, fluctuations are small and have a short correlation time, and the mean value and variance completely describe all moments, greatly facilitating analysis.

Many systems in nature and laboratories are, however, far from equilibrium, exhibiting significant fluctuations. Examples are found not only in turbulence in astrophysical and laboratory plasmas, but also in forest fires, the stock market and biological ecosystems [4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]. Specifically, anomalous (much larger than average) transport associated with large fluctuations in fusion plasmas can degrade the confinement, potentially even terminating fusion operation [6]. Tornadoes are rare, large-amplitude events, but can cause very substantial damage when they do occur. In biology, the pioneering work of Delbrück on bacteriophages showed that viruses replicate in strongly fluctuating bursts [24]. The fluctuations of the burst amplitudes were explained mathematically by stochastic autocatalytic reaction models first introduced in [25]. Delbrück's autocatalytic models predict discrete negative-binomial distributions, which are well approximated by gamma distributions when the average number of particles is large. Furthermore, gene expression and protein production, which used to be thought of as smooth processes, have also been observed to occur in bursts, leading to negative-binomial and gamma-distributed protein copy numbers (e.g., [19,20,21,22,23]). Such rare events of large amplitude (called intermittency) can dominate the entire transport even if they occur infrequently [8,26]. They thus invalidate the assumption of small fluctuations with short correlation times, making mean values and variances meaningless. Therefore, to understand the dynamics of a system far from equilibrium, it is crucial to have full knowledge of the Probability Distribution Functions (PDFs), including time-dependent PDFs [27].

The consequences of strong fluctuations in far-from-equilibrium systems are multiple. In physics, far-from-equilibrium fluctuations produce dissipative patterns, shift or wipe out phase transitions, etc. In economics, finance and actuarial science, strong fluctuations are important issues in risk evaluation. In biology, strong fluctuations generate phenotypic heterogeneity that helps multicellular organisms or microbial populations to adapt to changes of the environment by so-called "bet-hedging" strategies. In such a strategy, only a part of the cell population adapts upon emergence of new environmental conditions. The remaining part retains the memory of the old conditions and is thus already adapted if environmental conditions revert to previous ones [28]. Exceptional behaviour can also rescue cell subpopulations from drug-induced lethal conditions, thus generating drug resistance [29]. In particular, because of the skewness and exponential tail of the gamma distribution, gamma-distributed populations contain a significant proportion of individuals with an exceptionally high phenotypic variation. Bet-hedging being a dynamic phenomenon, it is important, for biological studies, to be able to predict not only steady-state, but also time-dependent distributions.

Obtaining good quality PDFs is often very challenging, as it requires a sufficiently large number of simulations or observations. Therefore, a PDF is usually constructed by averaging data from a long time series and is thus stationary (independent of time). Unfortunately, such stationary PDFs miss crucial information about the dynamics/evolution of non-equilibrium processes (e.g., tumour evolution). Theoretical prediction of time-dependent PDFs has proven to be no less challenging due to the limitation in our understanding of nonlinear stochastic dynamical systems, as well as the complexity in the required analysis.

Spectral analysis, for example, using theoretical tools similar to those used in quantum mechanics (e.g., raising and lowering operators), is useful (e.g., [1]), but the summation over all eigenfunctions is necessary for time-dependent PDFs far from equilibrium. Various other methodologies have also been developed to obtain approximate PDFs, such as the variational principle, the rate equation method or the moment method [30,31,32,33,34,35]. In particular, the rate equation method [31,32] assumes that the form of a time-dependent PDF during the relaxation is similar to that of the stationary PDF, and thus approximates a time-dependent PDF during transient relaxation by a PDF having the same functional form as the stationary PDF, but with time-varying parameters.

In this work, we show that this assumption is not always appropriate. We consider a stochastic logistic model with multiplicative noise. We show that for fixed parameter values, the stationary PDFs are always gamma distributions (e.g., [36,37]), one of the most popular distributions used in fitting experimental data. However, we find numerically that the time-dependent PDFs in transitioning from one set of parameter values to another are significantly different from gamma distributions, especially for strong stochastic noise. For sufficiently strong multiplicative noise, it is necessary to introduce additive noise as well in order to obtain stationary distributions at all. We note that in inferential statistics, gamma distributions facilitate Bayesian model learning from data, as a gamma distribution is a conjugate prior to many likelihood functions. It is therefore interesting to test whether models with stationary gamma distributions also have time-dependent gamma distributions.

## 2. Stochastic Logistic Model

We consider logistic growth with a multiplicative noise, given by the following Langevin equation:

$$\frac{dx}{dt}=x(\gamma +\xi )-\epsilon x^{2},\tag{1}$$

where $x$ is a random variable and $\xi$ is a stochastic forcing, which, for simplicity, can be taken as a short-correlated random forcing as follows:

$$\langle \xi (t)\xi (t^{\prime })\rangle =2D\delta (t-t^{\prime }).\tag{2}$$

In Equation (2), the angular brackets represent the average over $\xi$, $\langle \xi \rangle =0$, and $D$ is the strength of the forcing. $\gamma$ is the control parameter in the positive feedback, representing the growth rate of $x$, while $\epsilon$ represents the efficiency of self-regulation by a negative feedback. $\gamma x-\epsilon x^{2}$ can be considered as the gradient of a potential $V$, with $\gamma x-\epsilon x^{2}=-\frac{\partial V}{\partial x}$, where $V=-\frac{\gamma }{2}x^{2}+\frac{\epsilon }{3}x^{3}$. Thus, $V$ has its minimum at $x=\frac{\gamma }{\epsilon }$. When $\xi =0$, $x=\frac{\gamma }{\epsilon }$ (the carrying capacity) is a stable equilibrium point since $\partial_{xx}V|_{x=\gamma /\epsilon }=\gamma >0$; $x=0$ is an unstable equilibrium point since $\partial_{xx}V|_{x=0}=-\gamma <0$.
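These equilibrium properties are straightforward to verify symbolically. The following sketch (using sympy, with illustrative symbol names of our choosing) confirms that the extrema of $V$ lie at $x=0$ and $x=\gamma/\epsilon$, with curvatures $-\gamma$ and $\gamma$, respectively:

```python
import sympy as sp

x = sp.symbols('x', real=True)
gam, eps = sp.symbols('gamma epsilon', positive=True)

# Potential V(x) = -gamma*x^2/2 + epsilon*x^3/3
V = -gam * x**2 / 2 + eps * x**3 / 3

equilibria = sp.solve(sp.diff(V, x), x)   # extrema of V: x = 0 and x = gamma/epsilon
curvature = sp.diff(V, x, 2)              # d^2 V / dx^2 = -gamma + 2*epsilon*x

print(equilibria)
print(sp.simplify(curvature.subs(x, 0)))          # -gamma  (unstable point)
print(sp.simplify(curvature.subs(x, gam / eps)))  # gamma   (stable point)
```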

The multiplicative noise in Equation (1) shows that the linear growth rate contains the stochastic part $\xi$. This model is entirely phenomenological, and $x$ can be interpreted as the size of a critical physical phenomenon (vortex, tornado, etc.), a stock market, or the number of biological cells, viruses or proteins. It is reminiscent of Delbrück's autocatalytic processes [25], but differs from these in many ways, the most important being the lack of discreteness and the possibility of reaching a steady state due to the finite capacity of logistic growth. We will show in the following that, in spite of these differences, our model is capable of producing large fluctuations.

By using the Stratonovich calculus [2,3,38], we can obtain the following Fokker–Planck equation for the PDF $p(x,t)$ (see Appendix A for details):

$$\frac{\partial p}{\partial t}=-\frac{\partial }{\partial x}\left[(\gamma x-\epsilon x^{2})p\right]+D\frac{\partial }{\partial x}\left[x\frac{\partial }{\partial x}(xp)\right],\tag{3}$$

corresponding to the Langevin Equation (1). By setting $\partial_{t}p=0$, we can analytically solve for the stationary PDF as:

$$p(x)=\frac{b^{a}}{\Gamma (a)}x^{a-1}e^{-bx},\tag{4}$$

which is the well-known gamma distribution. The two parameters $a$ and $b$ are given by $a=\gamma /D$ and $b=\epsilon /D$. The mean value and variance of the gamma distribution are found to be:

$$\langle x\rangle =\frac{a}{b}=\frac{\gamma }{\epsilon },\qquad \mathrm{Var}(x)=\frac{a}{b^{2}}=\frac{\gamma D}{\epsilon^{2}},\tag{5}$$

where $\sigma =\sqrt{\mathrm{Var}(x)}$ is the standard deviation. We recognise $\langle x\rangle$ as the carrying capacity of the deterministic system with $\xi =0$. It is useful to note that $\langle x\rangle$ is given by the linear growth rate scaled by $\epsilon$, while $\mathrm{Var}(x)$ is given by the product of the linear growth rate and the diffusion coefficient, each scaled by $\epsilon$. That is, the effect of stochasticity should be measured relative to the linear growth rate.
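These relations are easy to check numerically. The sketch below (with illustrative parameter values of our choosing, using scipy.stats) builds the stationary gamma distribution with shape $a=\gamma/D$ and rate $b=\epsilon/D$ and confirms that its mean and variance reproduce $\gamma/\epsilon$ and $\gamma D/\epsilon^{2}$:

```python
from scipy.stats import gamma as gamma_dist

# Illustrative parameter values (not taken from the paper)
gam, eps, D = 2.0, 1.0, 0.5

# Stationary gamma distribution: shape a = gamma/D, rate b = epsilon/D
a, b = gam / D, eps / D
stat = gamma_dist(a, scale=1.0 / b)   # scipy parameterises by scale = 1/rate

mean_x, var_x = stat.mean(), stat.var()
print(mean_x)   # 2.0 = gamma/eps, the carrying capacity
print(var_x)    # 1.0 = gamma*D/eps**2
```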

Therefore, the case of small fluctuations is modelled by values of $D$ small compared with $\gamma$ and $\epsilon$. In such a limit, $a$ and $b$ are large, making $\sqrt{\mathrm{Var}(x)}\ll \langle x\rangle$ in Equation (5). That is, the width of the PDF is much smaller than its mean value. In this limit, Equation (4) reduces to a Gaussian distribution. To show this, we express Equation (4) in the following form:

$$p(x)=\frac{b^{a}}{\Gamma (a)}e^{-f(x)},\tag{6}$$

where

$$f(x)=bx-(a-1)\ln x.\tag{7}$$

For large $b$, we expand $f(x)$ around the stationary point $x=x_{p}$, where $\partial_{x}f(x)=0=b-(a-1)/x$, up to second order in $x-x_{p}$ to find:

$$f(x)\simeq f(x_{p})+\frac{b^{2}}{2a}(x-x_{p})^{2},\qquad x_{p}=\frac{a-1}{b}\simeq \frac{a}{b}=\langle x\rangle .\tag{8}$$

Here, $a\gg 1$ was used. Using Equation (8) in Equation (6) then gives us:

$$p(x)\simeq \sqrt{\frac{\beta }{2\pi }}\,e^{-\frac{\beta }{2}(x-\langle x\rangle )^{2}},\tag{9}$$

which is a Gaussian PDF with mean value $\langle x\rangle$. Here, $\beta =1/\mathrm{Var}(x)$ is the inverse temperature, and the variance $\mathrm{Var}(x)$ is given by Equation (5). Therefore, for sufficiently small $D$, the gamma distribution is approximated by a Gaussian PDF, which is consistent with the central limit theorem, as small $D$ corresponds to small fluctuations and a large system size. See also [39] for a different derivation.
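The accuracy of this Gaussian approximation can be checked directly. In the sketch below (illustrative parameter values, assuming scipy is available), the gamma distribution with $a=\gamma/D=50$ deviates from the Gaussian of Equation (5)'s mean and variance by only a few percent of the peak height, and the deviation shrinks further as $D$ decreases:

```python
import numpy as np
from scipy.stats import gamma as gamma_dist, norm

gam, eps, D = 1.0, 1.0, 0.02            # small D, so a = gamma/D = 50 >> 1
a, b = gam / D, eps / D

x = np.linspace(0.4, 1.6, 2001)
p_gamma = gamma_dist.pdf(x, a, scale=1.0 / b)

# Gaussian with the same mean and variance as in Equation (5)
mean, var = gam / eps, gam * D / eps**2
p_gauss = norm.pdf(x, loc=mean, scale=np.sqrt(var))

rel_err = np.max(np.abs(p_gamma - p_gauss)) / np.max(p_gamma)
print(rel_err)   # small relative to the peak height
```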

As $D$ increases, the Gaussian approximation becomes increasingly less valid. Indeed, even the gamma distribution becomes invalid asymptotically, as $t\to \infty$, if $D>\gamma$; according to Equation (4), having $a<1$ yields $\lim_{x\to 0}p=\lim_{x\to 0}\frac{\partial p}{\partial x}=\infty$. However, from the full time-dependent Fokker–Planck Equation (3), one finds that if the initial condition satisfies $p=0$ at $x=0$, then $p(x=0)$ will remain zero for all later times. As we will see, the resolution to this seeming paradox is that no stationary distribution is ever reached for $D>\gamma$; instead, the peak approaches ever closer to $x=0$, without ever reaching it.
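The divergence at the origin for $a<1$ is easy to see numerically. In this sketch (illustrative values, with $D>\gamma$ so that $a=1/2$), the stationary gamma PDF of Equation (4) grows without bound as $x\to 0$, following $p\sim x^{a-1}=x^{-1/2}$:

```python
from scipy.stats import gamma as gamma_dist

gam, eps, D = 0.5, 1.0, 1.0        # D > gamma, so a = gamma/D = 0.5 < 1
a, b = gam / D, eps / D

values = [gamma_dist.pdf(xi, a, scale=1.0 / b) for xi in (1e-2, 1e-4, 1e-6)]
print(values)   # grows as x -> 0
```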

If we are interested in obtaining stationary solutions even when $D>\gamma$, one way to achieve this is to return to the original Langevin Equation (1), but now include a further additive stochastic noise $\eta$:

$$\frac{dx}{dt}=x(\gamma +\xi )-\epsilon x^{2}+\eta ,\tag{10}$$

where $\xi$ and $\eta$ are uncorrelated and $\eta$ satisfies $\langle \eta (t)\eta (t^{\prime })\rangle =2Q\delta (t-t^{\prime })$. The new version of the Fokker–Planck Equation (3) then becomes:

$$\frac{\partial p}{\partial t}=-\frac{\partial }{\partial x}\left[(\gamma x-\epsilon x^{2})p\right]+D\frac{\partial }{\partial x}\left[x\frac{\partial }{\partial x}(xp)\right]+Q\frac{\partial^{2}p}{\partial x^{2}},\tag{11}$$

which has stationary solutions given by:

$$p(x)=p(0)\exp \left(\int_{0}^{x}\frac{(\gamma -D)y-\epsilon y^{2}}{Dy^{2}+Q}\,dy\right).\tag{12}$$

This integral can be evaluated analytically, but the final form is not particularly illuminating. The only point to note is that for non-zero $Q$, the denominator is never zero, even as $x\to 0$, which avoids any possible singularities at the origin. For $\gamma >D$ and $Q\ll D$, the solutions are also essentially indistinguishable from the previous gamma distribution (4). The only significant effect of including $\eta$, therefore, is to avoid the previous difficulties at the origin when $D>\gamma$.
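The regularising effect of $Q$ can be illustrated by quadrature. Setting the stationary probability flux to zero in the additive-noise Fokker–Planck equation gives $d\ln p/dx=[(\gamma -D)x-\epsilon x^{2}]/(Dx^{2}+Q)$ (our derivation sketch under the zero-flux assumption; parameter values are illustrative). Integrating this numerically for $D>\gamma$ yields a PDF that is finite and non-zero at the origin:

```python
import numpy as np

# Illustrative values with D > gamma (no stationary solution would exist for Q = 0)
gam, eps, D, Q = 0.5, 1.0, 1.0, 0.01

x = np.linspace(0.0, 10.0, 100001)
dx = x[1] - x[0]

# d(ln p)/dx = ((gamma - D)x - eps*x^2) / (D*x^2 + Q)   [zero-flux balance]
slope = ((gam - D) * x - eps * x**2) / (D * x**2 + Q)

# Trapezoidal cumulative integral of the slope, then normalise p
lnp = np.concatenate(([0.0], np.cumsum(0.5 * (slope[1:] + slope[:-1]) * dx)))
p = np.exp(lnp)
p /= np.trapz(p, dx=dx)

print(p[0])   # finite and non-zero: the singularity at the origin is removed
```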

As we have seen, both Fokker–Planck Equations (3) and (11) can be solved exactly for their stationary solutions. This is unfortunately not the case for time-dependent solutions, where no closed-form analytic solutions exist (see Appendix B for the extent to which analytic progress can be made). We therefore developed finite-difference codes, second-order accurate in both space and time. Most aspects of the numerics are standard and similar to previous work [40,41,42]. The only point that requires discussion is the boundary conditions. As noted above, for (3), the equation itself states that $p=0$ at $x=0$ is the appropriate boundary condition, provided only that the initial condition also satisfies this. In contrast, for (11), the appropriate boundary condition is $\frac{\partial p}{\partial x}=0$ at $x=0$. To derive this boundary condition for (11), we simply integrate (11) over the range $x\in [0,\infty )$ and require that the total probability should always remain one, so that $\frac{d}{dt}\int p\,dx=0$. Regarding the outer boundary, choosing some moderately large outer value of $x$ and imposing $p=0$ there was sufficient. Resolutions up to $10^{6}$ grid points were used, and results were carefully checked to ensure that they were independent of the grid size, time step and precise choice of outer boundary.
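To illustrate the idea of such a scheme (a minimal explicit sketch, not the authors' production code; the parameter values, grid sizes and time step are ours), the Fokker–Planck Equation (3) can be advanced in flux form with second-order central differences and $p=0$ imposed at both boundaries. Starting from the stationary gamma distribution, the solution should remain essentially unchanged:

```python
import numpy as np
from scipy.stats import gamma as gamma_dist

gam, eps, D = 2.0, 1.0, 0.5                 # illustrative: a = 4, b = 2
x = np.linspace(0.0, 10.0, 1001)
dx = x[1] - x[0]
dt = 2e-7                                    # well below the explicit limit dx^2/(2 D x_max^2)

def step(p):
    """One explicit step of Eq. (3): dp/dt = -d/dx[(gam*x - eps*x^2)p] + D d/dx[x d/dx(x p)]."""
    dxp = np.gradient(x * p, dx)             # second-order central differences inside the grid
    flux = (gam * x - eps * x**2) * p - D * x * dxp
    p_new = p - dt * np.gradient(flux, dx)
    p_new[0] = p_new[-1] = 0.0               # boundary conditions: p(0) = p(x_max) = 0
    return p_new

p = gamma_dist.pdf(x, gam / D, scale=D / eps)   # stationary gamma distribution of Eq. (4)
for _ in range(200):
    p = step(p)

drift = np.max(np.abs(p - gamma_dist.pdf(x, gam / D, scale=D / eps)))
print(drift)   # small: the stationary profile is preserved by the scheme
```

A fully implicit or semi-implicit time step would allow far larger time steps; the explicit form is used here only to keep the sketch short.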

## 3. Diagnostics

Once the time-dependent solutions are computed, we can analyse them using a number of diagnostics. First, we can evaluate the mean value $\langle x\rangle$ and standard deviation $\sigma$ from (5). Next, to explore the extent to which the time-dependent PDFs differ from gamma distributions, we can simply compare them with ‘equivalent’ gamma distributions and compute the difference. That is, given $\langle x\rangle$ and $\sigma$, the gamma distribution $p_{\mathrm{equiv}}$ having the same mean and variance has as its two parameters $a=\langle x\rangle^{2}/\sigma^{2}$ and $b=\langle x\rangle /\sigma^{2}$. With these values, we define a measure of how different the actual time-dependent PDF is from its equivalent gamma distribution.
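This construction is straightforward to implement. The sketch below (our helper; the L1 distance is used as one possible difference measure, which is an assumption on our part, not the paper's definition) recovers $p_{\mathrm{equiv}}$ from the first two moments of a gridded PDF:

```python
import numpy as np
from scipy.stats import gamma as gamma_dist

def equivalent_gamma(x, p, dx):
    """Gamma PDF with the same mean and variance as the gridded PDF p(x)."""
    mean = np.trapz(x * p, dx=dx)
    var = np.trapz((x - mean)**2 * p, dx=dx)
    a, b = mean**2 / var, mean / var         # a = <x>^2/sigma^2, b = <x>/sigma^2
    return gamma_dist.pdf(x, a, scale=1.0 / b)

x = np.linspace(0.0, 20.0, 4001)
dx = x[1] - x[0]
p = gamma_dist.pdf(x, 4.0, scale=0.5)        # sanity check: p is itself a gamma PDF

# One possible difference measure (our choice): the L1 distance
diff = np.trapz(np.abs(p - equivalent_gamma(x, p, dx)), dx=dx)
print(diff)   # essentially zero when p is already a gamma distribution
```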

Two other familiar quantities often useful in analysing PDFs are the skewness and kurtosis, defined by:

$$\mathrm{skewness}=\frac{\langle (x-\langle x\rangle )^{3}\rangle }{\sigma^{3}},\qquad \mathrm{kurtosis}=\frac{\langle (x-\langle x\rangle )^{4}\rangle }{\sigma^{4}}-3.$$

Skewness measures the extent to which a PDF is asymmetric about its peak, whereas kurtosis measures how concentrated a PDF is in the peak versus the tails, relative to a Gaussian having the same variance (the $-3$ is included in the definition of the kurtosis to ensure that a Gaussian yields zero). For gamma distributions, one finds analytically that the skewness is $2\sqrt{D/\gamma }$ and the kurtosis is $6D/\gamma$. Comparing the skewness and kurtosis of the time-dependent PDFs with these formulas is therefore another useful way of quantifying how similar to or different they are from gamma distributions.
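The corresponding grid-based moments can be computed in the same way as the mean and variance. This sketch (illustrative parameters of our choosing) checks the quoted gamma-distribution values $2\sqrt{D/\gamma}$ and $6D/\gamma$:

```python
import numpy as np
from scipy.stats import gamma as gamma_dist

def skew_kurt(x, p, dx):
    """Skewness and excess kurtosis of a PDF sampled on a grid."""
    m = np.trapz(x * p, dx=dx)
    c2 = np.trapz((x - m)**2 * p, dx=dx)
    c3 = np.trapz((x - m)**3 * p, dx=dx)
    c4 = np.trapz((x - m)**4 * p, dx=dx)
    return c3 / c2**1.5, c4 / c2**2 - 3.0

gam, eps, D = 2.0, 1.0, 0.5                  # a = gamma/D = 4
x = np.linspace(0.0, 30.0, 6001)
dx = x[1] - x[0]
p = gamma_dist.pdf(x, gam / D, scale=D / eps)

s, k = skew_kurt(x, p, dx)
print(s)   # ~ 2*sqrt(D/gamma) = 1.0
print(k)   # ~ 6*D/gamma       = 1.5
```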

Another quantity that can be useful is the so-called differential entropy, as a measure of order versus disorder (as entropy always is):

$$S=-\int p\,\ln p\,dx.$$

In particular, we expect $S$ to be small for localised PDFs and large for spread-out ones (e.g., [40,41,42,43]). For unimodal PDFs such as the ones studied here, entropy and standard deviation are typically comparably good measures of localisation, but for bimodal PDFs, entropy can be significantly better [42]. For the gamma distribution in Equation (4), the differential entropy can be shown to be given by:

$$S=a-\ln b+\ln \Gamma (a)+(1-a)\,\psi (a),$$

where $\psi (a)=\frac{d\ln \Gamma (x)}{dx}\big|_{x=a}$ is the digamma function.
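Both forms of the entropy can be cross-checked numerically. The sketch below (illustrative parameters, using scipy's `gammaln` and `psi`) compares the grid integral of $-\int p\ln p\,dx$ with the standard closed-form result for a gamma distribution:

```python
import numpy as np
from scipy.stats import gamma as gamma_dist
from scipy.special import gammaln, psi

gam, eps, D = 2.0, 1.0, 0.5
a, b = gam / D, eps / D

x = np.linspace(0.0, 30.0, 6001)
dx = x[1] - x[0]
p = gamma_dist.pdf(x, a, scale=1.0 / b)

# Grid approximation of S = -int p ln p dx (the integrand -> 0 as p -> 0)
integrand = np.where(p > 0, -p * np.log(np.where(p > 0, p, 1.0)), 0.0)
S_grid = np.trapz(integrand, dx=dx)

# Standard closed form for a gamma distribution with shape a and rate b
S_exact = a - np.log(b) + gammaln(a) + (1.0 - a) * psi(a)
print(S_grid, S_exact)   # the two agree closely
```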

Our final diagnostic quantity is what is known as the information length. Unlike all the previous diagnostics, which are simply evaluated at any instant in time but otherwise do not involve $t$, the information length is a Lagrangian quantity, explicitly concerned with the full time-history of the evolution of a given PDF. It is thus ideally suited to understanding time-dependent PDFs. Very briefly, we begin by defining:

$$\mathcal{E}(t)=\frac{1}{\tau (t)^{2}}=\int \frac{1}{p(x,t)}\left[\frac{\partial p(x,t)}{\partial t}\right]^{2}dx.$$

Note how $\tau$ has units of time and quantifies the correlation time over which the PDF changes, thereby serving as a time unit in statistical space. Alternatively, $1/\tau$ quantifies the (average) rate of change of information in time. $\mathcal{E}$ is due to the change in either the width (variance) of the PDF or the mean value, which are determined by $\gamma$, $D$ and $\epsilon$ for the gamma distribution (e.g., see Equation (4)). In standard Brownian motion, the mean value is zero, so that $\mathcal{E}$ is due to the change in the variance of the PDF.

The total change in information between the initial and final times, zero and $t$ respectively, is then defined by measuring the total elapsed time in units of $\tau$ as:

$$\mathcal{L}(t)=\int_{0}^{t}\frac{dt^{\prime }}{\tau (t^{\prime })}=\int_{0}^{t}\sqrt{\mathcal{E}(t^{\prime })}\,dt^{\prime }.$$

This information length $\mathcal{L}$ measures the total number of statistically distinguishable states that a system evolves through, thereby establishing a distance between the initial and final PDFs in statistical space. Note that $\mathcal{L}$ is by construction a continuous variable and thus measures the total “number” of statistically different states as a continuous number. See also [40,41,42,43,44,45,46,47,48] for further applications and theoretical background of $\mathcal{E}$ and $\mathcal{L}$.
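Given the numerically computed PDFs at successive times, $\mathcal{E}$ and $\mathcal{L}$ follow from straightforward quadrature. The sketch below (our helper, not the authors' code) demonstrates this on an analytically tractable family: a unit-variance Gaussian whose mean translates at unit speed, for which one expects $\mathcal{L}(t)\approx t$ (each unit of time moves the PDF by one standard deviation, i.e., one distinguishable state):

```python
import numpy as np

def information_length(pdfs, dx, dt):
    """L(t) from a time series of PDFs on a fixed x grid:
    E = int (1/p) (dp/dt)^2 dx,  L = int_0^t sqrt(E) dt'."""
    p = np.asarray(pdfs)
    dpdt = np.gradient(p, dt, axis=0)            # second-order time derivative estimate
    E = np.trapz(dpdt**2 / np.maximum(p, 1e-300), dx=dx, axis=1)
    return np.trapz(np.sqrt(E), dx=dt)

# Test case: unit-variance Gaussian translating at unit speed
x = np.linspace(-10.0, 12.0, 2201)
dx = x[1] - x[0]
t = np.linspace(0.0, 1.0, 401)
dt = t[1] - t[0]

pdfs = [np.exp(-0.5 * (x - ti)**2) / np.sqrt(2 * np.pi) for ti in t]
L = information_length(pdfs, dx, dt)
print(L)   # close to 1
```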