Large Deviations for Continuous Time Random Walks

Recently observation of random walks in complex environments like the cell and other glassy systems revealed that the spreading of particles, at its tails, follows a spatial exponential decay instead of the canonical Gaussian. We use the widely applicable continuous time random walk model and obtain the large deviation description of the propagator. Under mild conditions that the microscopic jump lengths distribution is decaying exponentially or faster i.e., Lévy like power law distributed jump lengths are excluded, and that the distribution of the waiting times is analytical for short waiting times, the spreading of particles follows an exponential decay at large distances, with a logarithmic correction. Here we show how anti-bunching of jump events reduces the effect, while bunching and intermittency enhances it. We employ exact solutions of the continuous time random walk model to test the large deviation theory.


Introduction
Following the erratic motion of pollen under his microscope Robert Brown discovered what is called today: Brownian motion. This phenomenon was modeled by Einstein and others, with the random walk theory while the mathematical description of Brownian motion, i.e., the Wiener process [1], was quickly established soon after. Two ingredients of this widely observed dynamics are that the mean square displacement increases linearly with time and that the spreading of the packet of particles all starting on a common origin is Gaussian [2,3]. The latter is the physical manifestation of the central limit theorem. However, in Reference [4] Chaudhuri, Berthier, and Kob, discovered what might turn out to be a no less universal feature of random walks. Analyzing experimental and numerical data, e.g., the motion of particles in glass forming systems, they revealed an exponential decay of the packet of spreading particles; see related works in References . Soon after this, a similar behavior described by the Laplace distribution, was found for many other systems, including for molecules in the cell environment [7,26,27]. In [4] it was suggested that a specific type of continuous time random walk (CTRW) model could explain the physics of the observed behavior. Two of us have recently shown that the phenomenon is indeed universal as it holds in general [28] and under very mild conditions (see below). The goal of this paper is to produce further evidence for the phenomenon, present exact solutions of the model and compare it with the new theory. We also present a more detailed analysis of the model covering cases not discussed previously.
The observed behavior, is related to the way experimentalists record the erratic paths. Following many trajectories one may find universal features of the motion in at least two different ways. In some situations the measurement time is very long such that many jumps occurred and consequently Gaussian statistics will take place. However, in practice experiments are not conducted for an infinite time. In fact in many experiments, the motion follows some trapping and then released motion, i.e., a hop and then wait dynamics, as modeled by the CTRW. In these experiments one follows many paths, e.g., many single molecules, however typically each trajectory is recorded separately. When the averaged number of jumps recorded under the microscope (or in simulations) within the observation time (0, t) is not large, one would expect naively that this would imply the non-existence of universal statistical laws. However a second limit, i.e., the limit of a large number of jumps when |x|/t → ∞, is important, which is large positions (and finite times). In this case, we will promote the idea that exponential spreading is the rule. Thus the observation of either Gaussian or exponential spreading depends on the time scale and length scale of the motion and in statistical sense the temporal and spatial extent of the field of view. The idea is that by observing the density of packet of particles in its tails (large |x|) we are in fact considering trajectories where the number of jumps is large, compared to the mean. We advocate that universal laws, that demand a large number of steps, can be found for large displacements (and fixed time) or for large time as usual. Theoretically since we have two parameters that can be made large, i.e., t/ τ or |x|/σ we may obtain more than one limiting law. Here τ is the average time between jumps, and σ is the variance of jumps lengths. The motion is unbiased hence the finite timemean jump size is zero. We will consider, among other things the case, when both t/ τ and |x|/σ are large and their ratio is finite.
The remainder of the manuscript is organized as follows. In Sections 2 and 3, after presenting the CTRW model, we consider the far tails of the distribution of the number of renewals and the position of the random walker, where the waiting times are drawn from the exponential and the Erlang distributions, respectively. The bunching effect is investigated in Section 4 using the sum of two exponential probability density functions (PDF). Finally, we conclude with a discussion.

Appetizer for Exponential Tails
We consider the well known CTRW model [2,29,30]. This model describes a wait and then jump process. A particle in dimension one, starts on the origin at time t = 0. We draw a waiting time from the PDF ψ(τ) and then after the wait, the particle will perform a jump whose length, χ, is drawn from f (χ). The process is then renewed. In our case the waiting times and the jump lengths are not correlated. The position of the particle at time t is x(t) = ∑ n i=1 χ i , and here n is the random number of jumps within the time of observation (0, t). The focus of this manuscript is the PDF of finding the particle on x at time t which is denoted P(x, t). In particular the large x limit of this distribution is of interest. We focus on non-biased CTRWs, and then if the mean waiting times and the variance of jump lengths are finite the density P(x, t) will converge to a Gaussian as expected from the central limit theorem. This limit theorem is valid for t → ∞ and |x| ∝ t 1/2 , while here we are interested in finite time effects and the large |x| limit, to be defined more precisely below. Generally, the solution of the model is given by Here the sum is over the possible outcomes of the number of jumps in the process, Q t (n) is the probability of attaining n jumps [31], while P(x|n) is the probability of finding the particle on x conditioned it made n jumps. In Laplace space [28], where s is the Laplace pair of t. In particular, when n = 0, Q t (n = 0) = ∞ t ψ(τ)dτ is called the survival probability.
Generally finding Q t (n) and P(x|n) is non trivial. Here the goal is to consider special choices of jump length distributions f (χ) and waiting time PDFs ψ(τ) which allow us to express the solution as an infinite sum over n explicitly, i.e., without restoring to inverse Fourier and Laplace transforms and/or numerical simulations. This allows us to compare between exact solutions of the problem, and the large deviation theory we have promoted previously [28]. Further, with the specific choices of the input distributions ψ(τ) and f (χ) we can derive large deviations theory in a straight forward way. The price that we pay is that we do not consider here the theory in its full generality, since we stick to exactly solvable models. The theory of large deviations of stochastic models, for example random walks, has been investigated [32][33][34][35][36][37][38][39][40][41][42], however as we show the fluctuations of the number of jumps, that is a feature of CTRW, is a key issue for appearance of exponential tails.

Displacement Follows Gaussian Distribution
We start the analysis with the simplest choice of waiting times and jump lengths distributions. The jump length PDF is assumed to be Gaussian with zero mean and variance σ = 1, i.e., f (χ) = exp(−χ 2 /2)/ √ 2π. It then follows that P(x|n) is Gaussian as well. The waiting times are exponentially distributed ψ(τ) = exp(−τ) for τ > 0. Hence the mean waiting time is τ = 1 and Q t (n) obeys Poisson statistics. The density of spreading particles all starting on the origin is therefore and in this sense we have an exactly solvable model valid for any time and x. Here the term n = 0 contains a delta function on the origin, which of course is not plotted in the figures we present below. This describes particles not moving at all. To analyze this sum in the large |x| limit we use Cramér-Daniels approach [43] to large deviations. Essentially this is a saddle point method, exploiting the fact that one may relate the cumulant generating function of P(x, t) and the large deviations in the system. For that let us take the Fourier transform of Equation (3) and then we have where we used the Fourier transform of f (χ) denoted f (k) = exp(−k 2 /2). We have exploited the fact that the jump lengths are independent and identically (IID) hence from convolution theorem of Fourier transform we have the expression P(k|n) = f n (k). The moment generating function is exp(ux) = ∞ −∞ P(x, t) exp(ux)dx. Replacing k → −iu and summing the series Then the cumulant generating function is simply the log of the above expression, we denote it Now we are interested in the large |x| limit of P(x, t), which by definition is the inverse Fourier transform P(k, t). Instead of integrating over k one may switch to integration over u and using the saddle point method valid for the large |x| one then finds a well known large deviation formula [43] P(x, t) ∼ 1 Hereû is given by the solution K (u) = x. While in this section we are treating a special choice of jump length and waiting time, this tool will serve us all along the paper. One should notice that a cumulant generating function does not always exist. For example if f (χ) decays as a power law for large |x|, the moments of the process will diverge. From here we see the first condition of the theory to hold in generality: we assume that cumulant generating function of f (χ) and P(x, t) exist, so the decay of f (χ) is faster than exponential [44].
For the Gaussian choice of f (χ), to findû we have The solution of this equation is given by the Lambert W 0 function [45] where the subscript zero denotes the branch of this function. The latter is the solution of the equation y exp(y) = z given by where the sign ofû is dictated by the sign of x. Using Equation (7) we then find with This is plotted in Figure 1. Using the large and small z limits of the Lambert function, i.e., W 0 (z 2 ) ∼ 2 ln(z) when z >> 1 and W 0 (z 2 ) ∼ z 2 for |z| << 1, we find the following two limits: valid when x/t >> 1, and P(x, t) exp[−x 2 /2t] (12) in the opposite limit when |x|/t → 0. The first is the mentioned exponential decay, and it includes what we call the ln(|x|) correction. As far as we know, in the experimental literature this ln(|x|) is not reported. However this fact is not surprising as the ln(|x|) is clearly a slowly varying function and hence difficult to detect in reality. We note that the log correction implies that the decay is in reality slightly faster than exponential, implying the existence of the cumulant generating function. The second case is the well known Gaussian behavior found in the centre of the packet. We may further formulate our results using two very much related approaches, both consider the limits x → ∞ and t → ∞, and their ratio x/t = l is kept fixed. Using Equation (10) the first limiting law is lim x→∞ − ln(P(x, t))/x = I x (l) (13) with I x (l) = l −1 − 1/ W 0 (l 2 ) + W 0 (l 2 ) and the second refers to lim t→∞ − ln(P(x, t))/t = I t (l) (14) with I t (l) = lI x (l). The functions I t (l) and I x (l) are called the rate functions [34,46,47]. All these approaches are essentially identical. Most experimentalists in the field plot P(x, t) versus x on a semi-log scale, and for that aim Equation (10) including the prefactor 1/ 2πK (û) is useful. Mathematicians influenced by long time (large x) ideas, and the large deviation literature, would possibly find it natural to use I t (l) (I x (t)) respectively. In demonstration below we plot the exact solution for finite times, using both approaches; see Figures 1-3. We now present a simple explanation for the exponential tail using a lower bound. Clearly and here n * is the value of n for which the summand in Equation (3) is the maximum. To find n * , we use Stirling's approximation, i.e., n! ∼ √ 2πn(n/e) n , and The solution of the above equation is n * ∼ |x|/ W 0 (x 2 /t 2 ). As expected, in the limit |x|/t → ∞, we find the exponential decay again where we used the asymptotic behavior of the Lambert function, i.e., W 0 (|x|) ∼ ln(|x|) with |x| → ∞. The claim in [28] is much more general as a similar behavior is found for a large class of waiting times and jump lengths PDFs, see details below.   (13), describes the rate function I x (x/t) capturing the behavior of large deviations. Note that for large l we have I x (l) ∼ 2 ln(l) and this means that P(x, t) is decaying exponentially with the distance x with a correction. The dashed lines are the plot of ln(P(x, t))/(−|x|) calculated from Equation (3).

Displacement Drawn from the Discrete PDF
We now investigate a second example, which further demonstrates the exponential far tails of P(x, t). Consider a CTRW on a one dimensional lattice, with exponential waiting times. Hence f (χ) = [δ(χ − 1) + δ(χ + 1)]/2 and ψ(τ) = exp(−τ). The Fourier transform of the jump length distribution is f (k) = cos(k) and the moment generating function is exp(uχ) = cosh(u), where cosh(u) is hyperbolic cosine function. Here P(x|n) is the well known Binomial distribution where x is an integer since the random walker is on a lattice with a unit spacing. Notice that when n is odd/even the same holds for x. The CTRW probability is then Switching to the Fourier space from the convolution theorem, the Fourier transform of P(x|n) is f n (k). Then after summation it is easy to find the moment generating function M(u) = exp(ux) = exp[−t(1 − cosh(u))]. We then use, as before, the cumulant generating function K(u) = −t[1 − cosh(u)]. We now need to find the solution of K (u) = x which is given byû = sinh −1 (x/t) = ln(x/t + √ 1 + x 2 /t 2 ), then using Equation (7) we get with Equation (19) provides for small x/t the standard Gaussian behavior, P(x, t) exp[−x 2 /2t]. In the opposite limit of large x which demonstrates the exponential decay. We may also write this solution like with the rate function Notice that in Equation (17) P(x|n) = 0 for x > n since the particle walking on a lattice cannot reach a distance larger than the number of steps. Thus for standard random walks, with a fixed number of jumps, the exponential tails are not a generic feature. It implies that the fluctuations of the number of steps are crucial for the observation of exponential tails, like those in Equation (20).

The Exponential Tails for the Erlang Waiting Time PDF
So far we exposed the exponential tails of P(x, t) assuming that the statistics of the numbers of jumps is Poissonian. To advance the theory further we now consider Q t (n) for the Erlang PDF of waiting times. For the Erlang distribution, the variance of the waiting time is finite. In the long time limit, from the central limit theorem n follows well known Gaussian distribution [31,48]. This indicates that the spreading of the particles obeys Gaussian distribution for the central part of the distribution of the position. For more details see Appendix A. For the tails, large number of jumps n is the cause for large displacement from the origin of the random walker. Hence the limit n large and t fixed, or both these observables large while their ratio is finite is of interest.

The Exponential Tail of the Number of Events
In this regard, two of us have found a general solution of the problem [28]. The large n limit of Q t (n) is controlled by the small τ limit of the waiting time PDF, since to have many jumps within a finite fixed time t, the time intervals between jumps must be made small. The main requirement we use here is that ψ(τ) is analytical in the vicinity of τ → 0 and hence Here A is a non negative integer, and C A and C A+1 are coefficients. Then in the large n limit, and t fixed we have [28] Beyond A, C A and C A+1 other properties of ψ(τ) are irrelevant. Below, e.g., in Figure 11, we call Equation (24) the Q large-n formula which denotes the limiting law of Q t (n) for n → ∞. We now consider an example, the Erlang distribution [49] where m is an integer and the mean τ = m, see Figure 4. Note that Equation (25) is a special case of the well known Gamma distribution φ(τ) = β α τ α−1 exp(−βτ)/Γ(α) with τ, α, β > 0. Here exponential waiting times are recovered when m = 1. The Laplace transform of ψ(τ) iŝ Thus, the Erlang PDF of order m is the m fold convolution of the exponential PDF. As already mentioned, a well known formula Equation (2) yields the Laplace transform t → s of Q t (n). Inserting Equation (26) and using the Laplace pairs where Γ(a, z) = ∞ z τ a−1 exp(−τ)dτ is the upper incomplete Gamma function [50], we find and Using the identity valid for a positive integer m we get We will soon use this expression to find an exact solution of the CTRW with the Erlang PDF of waiting times.   (25)) for various m. Increasing m leads to decreasing probability of obtaining small τ that leads to transition from bunching to anti-bunching [see discussion preceding Equation (54)].
From the above theorem, and the definition of the Erlang PDF we have for this example A = m − 1, C A = 1/(m − 1)! and C A+1 = −C A . Hence according to Equation (24), we have As expected this is the same as the leading term in the expansion Equation (31), when t/nm 1. We now find the rate function of this example. Using the Stirling approximation n! ∼ √ 2πn(n/e) n , Equation (31) reduces to where I n (l) = − ln(l) + l − 1, l = t/nm and For l = t/nm → 0, i.e., n → ∞ and m is fixed, the asymptotic behavior of Equation (34) follows This result is verified in Figure 5. From Equation (33), the rate function becomes where the limit is valid when l is fixed so here t is made large. See the red solid line in Figure 6. Finally we consider also and here the limit is taken such that l = mn/t remains fixed.
Remark 1. As mentioned, we use in our plots two graphical representations of the results, the distribution of observables of interest and the rate function. At least to the naked eye we see that in Figures 1, 3 and 5 the results converge already for a relatively short time, e.g., t = 2, while in Figures 2 and 6, we see that the convergence is achieved for much larger times, say t = 100. This discrepancy dissolves if all the prefactors like ln 2K (û) are included, and not only the rate function. The rate function formalism is a limit where we take say x → ∞ or n → ∞. Taking the log of the distribution, be it P(x, t) or Q t (n) and dividing by a large number we get rid of the prefactors. Hence the two representations are of course not identical.  The dashed lines are the plot of the function ln(Q t (n))/(−nm) with Q t (n) calculated from Equation (31). Clearly, with the increase of the observation time t, the function ln(Q n (t))/(−nm) versus l approaches to the rate function I n (l).

The Far Tails of the Positional PDF
For a Gaussian PDF of jump lengths and Erlang's waiting times we obtain a formal solution from Equations (1) and (31) Similar to the previous examples, this function is easy to represent graphically for any reasonable x and t with a program like MATHEMATICA. Here our goal is to find the large |x| behavior. As briefly mentioned in the introduction, it was shown previously that P(x, t) decays exponentially with the position x, more correctly like |x| multiplied by a slowly varying function. We briefly outline the main result in [28]. Using Cramérs theorem [34] it was shown that under certain conditions, in particular that the PDF of jump length decays faster than exponential decay, P(x|n) ∼ exp(−n(x/δn) β ) with β > 1 and |x|/n → ∞. Here we use Gaussian statistics for the jump lengths so we have δ = √ 2 and β = 2. Then for large x the main result in [28] reads where g 0 = (β(β − 1)/(A + 1)) 1/β /δ and g 1 = [g 0 (A + 1)/(C A Γ(A + 1)) 1/(1+A) ] β . Recall that the Lambert function W 0 (y) is a monotonically increasing function of y. Further W 0 (|y|) ∼ ln(|y|) for large y. Hence from Equation (39) we learn that P(x, t) decays exponentially (neglecting log terms) as mentioned. This result establishes that the exponential decay of the PDF P(x, t) is a universal feature of CTRW, in the same spirit as experimental and numerical evidence demonstrates in many examples.
To appreciate this result and further test it, let us derive it with our example (in this case the proof is somewhat easier as compared to the general case). When x → ∞ and t fixed, say t of order of c τ and c = 1 or c = 137 etc, the terms contributing to the infinite sum giving P(x, t) are those with a large n. Physically and mathematically this is obvious, as to reach large x in a finite time we need many jumps since the average of displacement is finite. We therefore use large n approximation of Q t (n) Equation (31) and then Only large n terms contribute, hence including small n in the summation is not a problem, since these terms are of order exp(−x 2 ) while |x| → ∞. We now switch to Fourier space, and find with the sum S m (y) = ∑ ∞ n=0 y n /(nm)!. The next step is to find the asymptotic behavior of S m (y) for large y. In order to do so, let us consider some special cases. Hence for m = 1, S 1 (y) = exp(y), and similarly S 2 (y) = cosh( √ y) ∼ exp( √ y)/2, etc. For a general m, we write the infinite sum as an integral Note that Equation (45) can also be obtained directly by using the asymptotic behavior of the Mittag-Leffler function E α,β (y) = ∑ ∞ j=0 y j /Γ(αj + β) since S m (y) = E m,1 (y) [51]. Below, we will use the above equation to calculate the far tails of the distribution of the position. As usual switching to k = −iu we obtain the moment generating function, and taking the log, we get the cumulant generating function K(u) −t + t exp(u 2 /(2m)).
This is valid for large u since that limit is the relevant one for the calculations of P(x, t) for large x, similar to our previous examples. Here we neglected a ln(m) term that is negligible in the limit under study. Using the standard large deviation technique Equation (7), we find the solution of K (u) = x denotedû and this solves the equationû and hence we find yet again the Lambert type of solution for x > 0 otherwise the right hand side of Equation (48) has a negative sign. If m = 1, as expected, we relax to Equation (9). Using the large deviation formula Equation (7), we find This solution and the more general one Equation (39) are of course in agreement. As shown in Figure 7, the far tails of the distribution of the position follow exponential decay predicted by Equation (49) since as mentioned W 0 (z) ∼ ln(z). But as shown in Figure 7 the rate of convergence is slow, for example for m = 3.
To check better the convergence issue, we consider the rate function. Based on Equation (49), the rate function with respect to the position becomes lim t→∞ − ln(P(x, t)) |x| = I x (l) (50) with l = x/(t/ τ ) and It can be seen in Figure 8 that with the increase of observation time t, the left-hand of Equation (50) tends to I x (l) (Equation (51)) slowly. One contribution to this slow convergence effect, is that t is not very large, the prefactor of exp(−|x|I x (l)), i.e., 1/ 2πK (û), is of importance. As expected, when mx 2 /t 2 → 0, we get the Gaussian distribution and since τ = m from Equation (25) we have P(x,t) exp(−x 2 /2t) witht = t/ τ . Recall that W 0 (y) ∼ ln(y) for y 1, hence when mx 2 /t 2 1 we may approximate Equation (49) with Thus using the rescaled timet = t/ τ we have implying that as we increase m for fixedt we get a suppression of the exponential tails. As we increase m the condition on x √ mt implies that we may observe the nearly exponential decay of the packet but only for a very large x, or only for very short times. This is because when we increase m, the waiting time exhibits a strong anti-bunching effect. Namely, to find a large x we need to have a large fluctuations of n. For example if the average of n(t) ∼ 2 we may still see realizations with many jumps, leading to exponential decay in the far tails of x. However, this is less likely for large m if compared with small m; see Figure 4. This is because when m is large we have vanishing fluctuations of n. To quantify this we may use a tool from quantum optic, namely the Mandel Q parameter [52] defined as Q = n 2 − n 2 n − 1.
The case Q < 0 is called sub-Poissonian (anti-bunching ) and Q > 0 sup-Poissonian. If Q = −1, we have no fluctuations of n at all. For the Erlang PDF, and in the long time limit Q = −1 + 1/m. Namely, when m → ∞ the fluctuations of n vanish (see Figure 9) , and hence with it the effect of exponential tails of P(x, t). This is easy to see since in the absence of fluctuations of n, the number of jumps is fixed and since we have Gaussian jump lengths, the total displacement will be Gaussian and not exponential. Thus we see that as we increase m, namely make the process more anti-bunched we see less exponential tails. Anti-bunching means the effective repulsion of the dots on the time axis on which jump events takes place. And this is controlled by the small τ behavior of ψ(τ). The anti-bunching Q reduces to a negative value, and hence kills fluctuations of n which are the key to the observation of exponential tails of P(x, t). When m increases the distribution narrows, indicating that fluctuations of n disappear. We further plot the central part of Q t (n) using the symbols according to Gaussian approximation. Here for Gaussian approximation we use Q t (n) ∼ exp(−(n − t/m) 2 /(2t/m 2 ))/ √ 2πt/m 2 (see Appendix A).

Bunching Case of Waiting Time PDF
We further explore the behavior of probability to observe a large number of renewals n. For some special distributions of the waiting times it is unwise simply to take the n → ∞ limit since the rate of convergence to Equation (24) is very slow. Here we consider ψ(τ) as a sum of two exponential waiting times PDFs We choose a < b while the opposite situation is merely relabeling. The average waiting time is τ = (a + b)/2. In what follows we shall fix τ = 1, so a + b = 2 and this means that we have one free parameter say a with 0 < a < 1. We also consider the case where the jump size distribution is Gaussian with a variance equal unity and zero mean. This implies that in the long time limit, independently of the specific choice of a, the mean square displacement x 2 ∼ t and the process is Gaussian. However, for short times the dynamics of P(x, t) is of course a sensitive. The question we wish to address: how does a control the exponential tails of P(x, t)? This is related to effect of bunching. Here for short waiting times we have ψ(0) = (1/a + 1/b)/2. When a = b = 1 we have a Poisson process, while when say a → 0, we get a large value of ψ(0). This means that there is an increased probability (compared to Poissonian case) to obtain a jump shortly after a jump event. This is an effect of bunching where the jumps come in groups, and then separated by the relatively large waiting times. Such intermittent behavior is quantified with Mandel Q parameter, which for large time is where we assumed that a + b = 2. This is a super Poissonian behavior since Q > 0. Note that in the previous example of the Erlang PDF, we had the opposite behavior, i.e., Q < 0, since there lim τ→0 ψ(τ) = 0 for any m = 1.
We can obtain Q t (n) in terms of an integral; see details in Appendix B. Aside Q t (n) we also find P(x, t), which is compared to the previous theory when x → ∞, i.e., Equation (39); see Figure 10.
Here we see that by making a smaller, namely making the process more bunched, we get large exponential tails (see Figure 10). Thus bunching makes the observation of the exponential tails effect more readily achievable in experiments. In principle for any PDF of the waiting times (provided ψ(τ) is analytical in the vicinity of τ = 0) an exponential decay of P(x, t) is obtained, but this decay can be sometimes achieved for extremely large values of x. We also observe that simply including the asymptotic behavior of Q t (n) in Equation (24) leads to slow convergence in the a → 0 limit. To further understand the effect of bunching we also plot Q t (n) versus n; see Figure 11. Strong bunching in this model, implies a relatively high probability for seeing a large n (many short time intervals between jumps since a is small). As we see, the decay of Q t (n) with n is relatively slow when bunching is pronounced, and then the appearance of non negligible probability for large n, implies that particles can travel large distances, and then the tails posses more statistical weight. In Figure 11 the slow convergence to asymptotic behavior of Equation (24) is observed in the a → 0 limit and the delicate treatment of Appendix B is preferable.

Conclusions
Following the work of Kob and co-workers we have formalized the problem of nearly exponential decay of P(x, t) using the CTRW framework. Exponential decay is the rule, and it should be considered a natural consequence of large deviation theory. In the long time limit the packet of particles is typically Gaussian, hence the phenomenon can be found/measured for intermediate and short times. To describe the dynamics we have considered a few examples where we may find exact solutions to the problem. This allowed us to find the far tail, obtain the rate function, and compare finite time solutions with asymptotic expressions.
We distinguished between bunching and anti-bunching processes. These are determined by the behavior of ψ(τ) in the vicinity of of τ close to zero. The short time behavior of ψ(τ) determines the statistical behavior of the number of jumps, when the latter is large. And large number of jumps, lead the particle to non-typical large x, where the phenomenon is found. Using the example of ψ(τ) expressed as a sum of two exponential waiting times, we show that as we increase the bunching effect, the exponential tails of P(x, t) are more pronounced. Similarly as we increase anti-bunching, by increasing the parameter m in the Erlang distribution, the exponential tails are suppressed. Indeed as m → ∞ we get the usual random walk where n is not fluctuating and then exponential tails cannot be found.
These effects are related to the rather universal behavior of Q t (n) found for large n. As we have demonstrated here, for large n Q t (n) universally attains exponential decay (with log corrections). This, as mentioned, is valid for any ψ(τ) which is analytic for small τ. Thus the agreement with the experimental observation that finds exponential tails, is a manifestation of the widely applicable CTRW model under study and not merely consequence of a fitting procedure. This universal property of CTRW is the crucial difference from the diffusing diffusivity model [10,18,19,[53][54][55][56][57].
When dealing with a regular diffusive motion the only transport parameter that determines the behavior is the diffusive constant D. The diffusive constant also determines the position range where the Gaussian behavior is to be expected, i.e., for any x such that |x| ∝ (Dt) 1/2 . D itself is defined as a ratio of two microscopic quantities, the variance of the jump lengths (σ 2 ) and the average time between jumps τ , i.e., D = σ 2 /2 τ . The tails of the density, investigated in this paper, are clearly not determined by D. When the jumps are distributed according to a Gaussian distribution with the variance σ 2 and the waiting time posses an exponential distribution, with τ as a mean waiting time, the exponential tails are expected for any x such that |x|/σ > t/ τ . We see that for this specific case the important quantity is not D but rather σ/ τ . While σ and τ are the microscopic parameters that determine the large, and small x for a given time t, for any other case with no Gaussian jumps or exponential distribution of waiting times other parameters emerge. The average waiting time τ is replaced by 1/(C A ) 1/(A+1) from the expansion of ψ(τ) in Equation (23) and σ is replaced by δ whose role was exposed in [28], a parameter that determines f (χ) for large |χ| ( f (|χ|) ∼ exp(−(|χ|/δ) β ) when |χ| → ∞). Overall the condition for appearance of exponential tails is modified to |x| > t(C A ) 1/(A+1) δ (see Reference [28]). Another thing to notice about this condition is that the presented results are not limited to large times but emerge for any sufficiently large |x|.
As mentioned in Section 3, in this manuscript we assume that the PDF of waiting time ψ(τ) is analytic at the vicinity of τ = 0. Clearly not all of the PDFs of waiting time satisfy this property, for example ψ(τ) = τ −3/2 exp[−1/(4τ)]/2 √ π. Roughly speaking, for this case the probability of performing many jumps at a finite time is much smaller than the cases studied here since ψ(τ → 0) = 0. More specifically, for this example the analytical behavior Equation (23) is not valid, and hence our main results do not apply. This implies that the far tails of distribution of the position decay faster than exponential tails. It would be of interest to investigate this problem, which is left for future work.
Finally, let us mention a few other open problems: • In the present case we assumed that the jump process starts at time t = 0. This is called an ordinary renewal process. If the processes started long before the measurement, we will have a modification of the PDF of the first waiting time [3,58]. How does this effect the large x behavior of P(x, t)? This issue seems important since the phenomenon can be found for relatively short times.

•
We focused on models that in the long time limit converge to Gaussian statistics. What happens if ψ(τ) is fat tailed [59,60] with diverging mean? Our results are certainly valid for this case as well, however we did not explore this in detail.

•
What happens in dimension d > 1? • What are ideal waiting time PDFs and jump length distributions, where exponential tails are pronounce and if possible maintained for longer times. We showed how this is related to bunching and anti-bunching, however more refined work can help to clarify a better the widely observed behavior.

•
We used CTRW, instead one could use the noisy CTRW model [61]. This adds to the jumps also noise when the particle is waiting for its next jump. Thus noisy CTRW is much more similar to real experiments.

•
Here we considered the decoupled CTRW, where jump lengths and waiting times are uncorrelated. The general framework of CTRW, goes beyond this simplification [62,63].

•
If the jump length PDF is sub-exponential, the far tail of P(x, t) will deviate from what we found here. Most likely the principle of the single big jump [64][65][66] will hold in some form, but the details of the theory are left unknown.

•
Recently Dechant et al. showed how the CTRW picture emerges from an under-damped Langevin description of a particle in a periodic potential [29]. And then showed how this model can be used to analyse dynamics of Cesium atoms in optical lattices. Thus we expect to find also here exponential tails of packets, however influence of the control parameters of this phenomenon such as the depth of the optical potential, the noise etc, are left unknown to us. Similarly, over damped Brownian motion in corrugated channels, a model of biophysical transport, is likely related to CTRW as a coarse grained description. In the former system exponential decay was already explored in Reference [67]. Thus exponential tails are found both via Langevin dynamics and within CTRW, the two approaches are related in some limits.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. PDFs of n and x in the Long Time Limit
When the waiting time has a finite variance and the observation time t is large, Q t (n) obeys the limiting law described by Gaussian distribution [31,48] In turn n(t) and the variance of n, i.e., n 2 (t) − n(t) 2 denoted as (σ n ) 2 , can be obtained from where L −1 [ĝ(s)] means the inverse Laplace transform ofĝ(s). In the limit of long t, we have n(t) ∼ t/ τ and (σ n ) 2 ∼ t( τ 2 − τ 2 )/ τ 3 .
We now consider the model with the Erlang waiting time Equation (25) in the main text and Gaussian jump length f (χ) = exp(−χ 2 /2)/ √ 2π. In the long time limit we have n(t) ∼ t/m and (σ n ) 2 ∼ t/m 2 . According to Equation (1), for large t, P(x, t) reads Figure A1 demonstrates that Equation (A2), plotted by the solid lines, is an excellent approximation.
In the limit t → ∞, Gaussian distribution is found where we used the relation Q t (n) ∼ δ(n − t/m). Here we stress that Equation (A3) is just valid for the central part of the distribution of the position, while the large deviation theory developed in the main text is applicable in the large x limit. Appendix B. Calculation of Q t (n) and P(x, t) with Waiting Time Following Equation (55) Here the aim is to find the exponential tails of the distribution of the number of renewals and the position. As mentioned in the main text, we fix the mean of waiting times, namely τ = (a + b)/2 = 1. From Equation (2), we haveQ Thus, in real space, the formal solution is Let a = 0.01. So C A = (1/a + 1/b)/2 ∼ 1/(2a) and C A+1 ∼ −1/(2a 2 ) according to Equation (23). This indicates that exp(tC A+1 /C A ) ∝ exp(−t/a) = exp(−200). It means that the tails are sensitive to value a and it's not measurable in real experiment for such a small a. As shown in Figure 11, for small a, the limiting law Equation (24) loses its role in real experiment since it is too small to observe this phenomenon. Below, the aim is to find a new formula to predicte such behavior. In the limit of a → 0, Equation (A4) reduces toQ t (s) ∼ (s + 1) n (2s + 1) n+1 . (A7) One can do the similar calculation, like Equation (A4), but there is a simple way. From Cauchy's integral formula, the inverse Laplace transform of Equation (A7) yields Q t (n) ∼ 1 2 n+1 n!
where we used the fact that Equation (A7) has poles at s = −1/2. The Equation (A8), plotted by the symbols ('+') in Figure A2, is consistent with the exact result for a = 0.01. For this case, the law Equation (24) should be modified since the rate of convergence is extremely slow (for example, see the solid line for a = 0.1 in Figure 11). Another interesting problem is how the far tail of the distribution of n decays with respect to n. Note that the limit between a and n is a bit subtle. Let a be small and n be large instead of infinity. Using the saddle point approximation, we get The corresponding asymptotic behavior of large n follows which is plotted by the dash-dotted line in Figure A2. In this limit, we still find the universal exponential decay of the far tail since the leading term of Equation (A10) is n ln(2). This is verified by the symbols in Figure A3 with a = 0.01. Clearly, the leading term 2 ln(2)|x| in the numerator of the above equation is responsible for exponential decay. While, note that with the growing of x and n, Equation (A11) fails since for large x the parameter a comes into play, see the symbols in Figure A2. On the contrary, the law Equation (24) will work for the extremely large n, which is difficult to see for small a if compared with Equation (A11). As mentioned in the main text this is related to bunching effects.  Figure A3. P(x, t) for the case where the jump lengths are Gaussian and ψ(τ) is given by Equation (55), in the limit a → 0. The exact result (Equations (1) and (A5)) for the case of a = 0.01 is described by the solid line. The theoretical approximation in the limit a → 0 (Equation (A11)) is presented by the dashed line and the symbols present the asymptotic behavior of the approximation (Equation (A12)). The Gaussian distribution (dashed-dotted line) is provided for comparison.