The Role of the Central Limit Theorem in the Heterogeneous Ensemble of Brownian Particles Approach

: The central limit theorem (CLT) and its generalization to stable distributions have been widely described in literature. However, many variations of the theorem have been deﬁned and often their applicability in practical situations is not straightforward. In particular, the applicability of the CLT is essential for a derivation of heterogeneous ensemble of Brownian particles (HEBP). Here, we analyze the role of the CLT within the HEBP approach in more detail and derive the conditions under which the existing theorems are valid.


Introduction
The heterogeneous ensemble of Brownian particle (HEBP) [1,2] describes a large class of anomalous diffusion phenomena, observed in many physical and biological systems [3][4][5][6]. The HEPB approach is based on the Langevin stochastic equation of diffusion of a free particle, i.e., the mesoscopic description of Brownian motion (Bm). The relaxation time (τ) and the diffusivity (ν) of the particle constitute two important scales of Bm process. In the classic Langevin approach, τ and ν are constant parameters. Instead, in the HEPB approach, it is assumed that τ and ν are time-independent random variables. In HEPB the single-particle trajectory (SPT) follows a classic Langevin dynamics and it is characterized by a stochastic realization of the parameters (τ i and ν i for particle i). The random nature of the scales τ and ν of the SPTs mimics the heterogeneity of the environment and/or the heterogeneity of an ensemble of particles diffusing in the environment. In fact, the anomalous diffusion behavior described by HEPB is generated by the heterogeneity of τ and ν values in different SPT realizations.
Long time and space correlation, characteristics of many anomalous diffusion processes [7][8][9], are often introduced by modifying the laws of the dynamics, by including memory kernels and/or integral operators [10,11] in the equations, for example, the fractional derivatives [12]. These changes in the dynamics introduce often non-Markovianity and/or non-locality in the processes.
The HEPB approach maintains the Markovianity of the process because the fundamental process remains the classical Brownian motion (Bm), but the heterogeneity of the scales involved in the system permits to describe a process with stationary features different from the Bm [13]. We will refer to this heterogeneity as to a population of scales, because the values of these scales follow a given probability distribution. Furthermore, the model structure permits to keep the standard dynamical laws, with integer time derivative of physical variables like the velocity (V) and the spatial coordinate (X) of the particle, and to avoid the introduction of fractional time derivatives.
One of the random scales contributing to the anomalous behavior the Langevin description of HEPB [1] is the time scale τ. When the distribution of τ is properly chosen and ν is kept constant, the HEPB describes the same one-time one-point probability density function (PDF) of the fractional Brownian motion (fBm), i.e., a normal distribution with variance (the mean squared displacement of the process, MSD) scaling as a power law of time in the long time limit: where 0 < α ≤ 2 and D α is the constant playing the role of diffusion coefficient. Depending on the value of the exponent α, it is possible to distinguish what is called super-diffusion and sub-diffusion, associated respectively to super-linear and sub-linear values of the parameter. The convergence of the PDF to a normal distribution depends on the applicability of the classical central limit theorem (CLT). We will demonstrate later that by choosing properly the population of the time scales according to certain PDFs, both the Gaussian shape of the PDF and the anomalous scaling of the variance can be guaranteed.
The CLT represents a cornerstone in probability theory. It states that when a large amount of one -or multi-dimensional, real-valued and independent (or weakly dependent [14]) random variables are summed, the probability distribution of their sum will tend to the Gaussian distribution G, defined by its characteristic function:ĝ This result has been generalized by the generalized CLT to a larger class of stable distributions described by the following characteristic function [15]: where α, β, µ, C ∈ R, ω(k, α) = tan(απ/2) if α = 1, else ω(k, α) = 2/πln(|k|). The Gaussian distribution can be found to be a special yet fundamental case when α = 2. The generalized CLT [15] describes the convergence of the sum of stable variables with also infinite variance, for example, the symmetric Levy stable distribution. The stability property of the symmetric Levy stable distribution is fundamental to obtain a random walk with infinite large displacements as the well known Lévy-Feller diffusion process [8,16,17]. The PDF of this process converges in fact to the non-Gaussian but symmetric Levy stable distributions.
In the following sections, we briefly review the CLT formulation, then we introduce the problem of the convergence in the distribution of a mixture of Gaussian random components with random variances when the variance distribution is particularly extreme. Thereafter we recall the HEBP model formulation and define the sufficient conditions over τ to obtain a fBm-like process.

The Classical CLT Formulation
For completeness, we summarize here the most famous versions of the CLT and introduce some useful notation and definitions.
For parameters µ ∈ R and σ ∈ R + , a normal (or Gaussian) distribution N (µ, σ 2 ) is a continuous probability distribution defined by its density function where µ and σ 2 are the expectation and variance of the distribution, respectively. For µ = 0 and σ = 1, we obtain what is usually called the standard normal distribution N (0, 1). For the sequence of random variables (X n ) n≥1 , we define random variables (S n ) n≥1 as partial sums S n = X 1 + X 2 + · · · + X n . The theory of central limit theorem derives conditions for which there exist sequences of constants (a n ) n≥1 , a n > 0, and (b n ) n≥1 such that the sequence S n −b n a n n≥1 converges in distribution to a non-degenerate random variable. In particular, CLT describes the convergence to standard normal distribution with constants defined as a 2 n = ∑ n k=1 Var [X k ] and b n = ∑ n k=1 E [X k ]. Different constraints on variables X 1 , X 2 , . . . lead to different versions of the CLT. We will briefly review the most prominent results of the theory of central limit theorems. For a more pedagogical and/or historical perspective, see [18][19][20][21][22][23][24].
We start with the case when variables X 1 , X 2 , . . . are independent and identically distributed. With additional requirements of finite mean µ and positive and finite variance σ 2 , we obtain: Dealing with independent, but not necessary identically distributed, random variables X 1 , X 2 , . . . with finite variance, we define µ k = EX k , σ 2 k = VarX k and s 2 n = ∑ n k=1 σ 2 k for every k ≥ 1. To obtain the main result, we need two Lindeberg's conditions: and The Lindeberg-Lévy-Feller theorem provides sufficient and necessary conditions for the following result: Lindeberg and Lévy proved (using different techniques) that if (7) holds, so do (6) and (8). Feller proved that if both (6) and (8) are satisfied, then so is (7).
Since Lindeberg's condition (7) can be hard to verify, we can instead use the Lyapounov's condition which assumes that for some δ > 0, E |X k | 2+δ < ∞ (for all k ≥ 1) and 1 s 2+δ If for independent random variables X 1 , X 2 , . . . the Lyapounov's condition is satisfied, then the central limit theorem (8) holds. Since the Lyapounov's condition implies the Linderberg's second condition this result follows directly from the Lindeberg-Lévy theorem.
In all versions of the CLT mentioned so far, the assumption of finite variance was crucial. To extend our observations to the case when variance does not exist (or is infinite), we introduce the notion of domains of attraction. We are observing a sequence X, X 1 , X 2 , . . . of independent, identically distributed random variables. We say that X, or, equivalently, its distribution function F X , belongs to the domain of attraction of the (non-degenerate) distribution G if there exist normalizing sequences (a n ) n≥1 , a n > 0, and (b n ) n≥1 , such that Another important concept is the one of stable distribution. Retaining the same notion, the distribution X is stable if there exist constants (c n ) n≥1 , c n > 0, and (d n ) n≥1 such that S n d = c n X + d n (for all n ≥ 1).
It can be shown that only stable distributions possess a domain of attraction [18]. The most notable stable distribution is Gaussian and by the classical CLT we know that all distributions X with finite variance belong to the domain of attraction of the Gauss Law. However, there are also some distributions with infinite variance that belong to it. More precisely, it can be shown [25] that random variable X with the distribution function F X belongs to the domain of the attraction of the Gauss law if and only if

CLT for a Population of Gaussian Random Variables
We reviewed the fundamental theorems related to the classical CLT, having the Gaussian distribution as limit distribution of the sum of random variables S n . The recurrent and sufficient (but not necessary) condition leading to the classical CLT description is that the variance of the i.i.d. random variables that are summed should be finite. However, there exist distributions with infinite variance that fall in the Gaussian domain of attraction [15,25]. In this paragraph, we provide a preparatory example to introduce the role of the CLT in the HEBP. The sum of a population of Gaussian variables with random variances (which may tend to infinity), is here rewritten as the sum of i.i.d. random variables defined as the mixture of random Gaussian components with random variances. If such a mixture has finite variance, the standard CLT conditions are satisfied. In fact, as it will be explained in more details hereafter, the convergence in distribution of the sum to a Gaussian is not always guaranteed when some extreme distribution of the random variance is considered.
Let us consider partial sums of independent Gaussian random variables where, denoting with f k (x k ) the PDF of X k , we have: The distribution of the sum of n random variables can be exploited in term of a convolution integral. Thus, we can derive explicitly the limit distribution of Equation (12) by inverting the characteristic function φ(ω) of S n , which corresponds to the product of the characteristics φ k (ω) of X k : which gives Let us assume σ k ∼ √ Λ, with Λ distributed according to a generic PDF f (λ). If the first moment of Λ exists in the limit of large n, by applying the law of large numbers, we can well approximate the Equation (16) in terms of EΛ: which is indeed the characteristic function of a Gaussian distribution with variance n · EΛ for finite expectation of f (λ) even if the supremum of Λ does not exists. The convergence of S n can be proven using the CLT for the sequence of independent, identically distributed random variables X, X 1 , X 2 , . . . with X ∼ N (0, Λ). These variables, in general, will not have a Gaussian shape and can equivalently be defined as the product of the independent random variables: where Z ∼ f 1 (z) = N(0, 1), Λ ∼ f 2 (λ), Λ ∈ R + . The PDF f (x) of X can be represented by the integral form [26] f Since Z is a Gaussian distribution, it follows that 1 N(0, λ). Using Fubini's theorem, now it is easy to compute the second moment of X: If EΛ is finite the partial sums S n = X 1 + · · · + X n of i.i.d. random variables X k converge in distribution to a Gaussian S n √ n d − → N (0, EΛ) .
If EΛ is not finite, the distribution f (x) may fall out of the Gaussian domain of attraction. For example, by choosing Λ (the random variance) to be the extremal Lévy density distribution, it follows that f (x) (the mixture defined by Equation (19)) corresponds to the symmetric Lévy stable distribution [27]. In fact in the case f (x) is itself a stable distribution, like the Levy stable distribution is, its sum belongs to its own domain of attraction.
However, infinite variance is not a synonym of stability. In fact, despite the presence of infinite EΛ, under certain constraints on the tail of the distribution f (x), f (x) still satisfies (11) and falls in Gaussian domain of attraction, for example if its PDF for large x is proportional to x −3 , x −3 log(x) , x −3 /log(x) [15].

Application of the CLT in the HEBP
In the HEBP Langevin model [1] the anomalous time scaling of the ensemble-averaged MSD is generated by the superposition of a population of Bm processes in a similar way to equation (12), where each single process is characterized by its own independent timescale, and with frequency of appearance of such timescale described by the same PDF.
CLT applicability guarantees that after averaging over a properly chosen timescale distribution the shape of the PDF will remain Gaussian despite the time scaling will change from being linear in time to be a power low of time in the long time limit, following Equation (1). In order to show this applicability let first introduce the HEPB construction.
Let us start with the classic Langevin equation describing the dynamics of a free particle moving in a viscous medium (or Bm): where V is the random process representing the particle velocity, τ in classical approach corresponds to the characteristic time scale of the process, i.e., the scale of decorrelation of the system. In the classic Langevin description the timescale is defined by the ratio m γ , with m being the mass of the diffusing particle and γ the Stoke's drag force coefficient of the velocity. The multiplicative constant of the Wiener noise increment dW in the square root, ν, represents the velocity diffusivity and is related to the drag term by the fluctuation-dissipation theorem (FDT) [28]. This relation does not depend on the mass of the particle but on the average energy of the environment (the fluid) and the cross-sectional interaction between the medium and the particle moving. The Wiener increment dW is the increment per infinitesimal time induced by the presence of a Gaussian white noise with unit variance and is hence fully characterized by its first two moments: The presence of Gaussian increments in the stochastic equation leads to the stationary state V ∼ N(0, kT/m) and, being V = dX/dt, to the stationary increments process X(t) ∼ N(x 0 , σ 2 x (t)), with σ 2 x (t) = ντ 2 t. Let now the parameters ν and τ be time independent random variables: ν ∼ p ν (ν) and τ ∼ p τ (τ). The way it will affect V(t) and X(t) is clear in the case of ν, but more complicated to specify in the case of τ.
Let us consider the velocity defined as a product of random variables V = √ νV . It is easy to show that √ ν factorizes out from the stochastic differential equation, resulting in the following description of the evolution of V : Therefore, the PDF associated to the processes V(t) and X(t) can be derived by applying the same integral formula of Equation (19), eventually producing non-Gaussian PDF and weak ergodicity breaking stochastic processes as result [29][30][31].
Dealing with random timescales is much more tricky because the variable τ is embedded in the correlation functions and is not possible to factorize it out without simultaneously transforming the time variable. Furthermore, because of the time variable transformation different realizations of the process would not be comparable directly anymore without reverse transformation.
To avoid these complications, we define V as the superposition of N τ independent Bm processes each with its own timescale: where V (t) can still be described by the Equation (25). If the resulting process V (t) is still a Gaussian process, the previously described approach to derive V = √ νV can be applied without further changes. However, all the correlation functions of V and moments will become the sum of the correlation functions of the single processes V (t|τ), which is equivalent to averaging with respect to p τ (τ). A careful choice of this distribution permits to obtain non-linear time scaling of the MSD of V .
Let us demonstrate the applicability of the CLT explicitly making use of the Equation (17). Assuming that a global stationary state (in the sense of stationary increments) has been reached, the relation between the MSD and the VACF determined by the free particle Langevin dynamics can be expressed by: where R(t, τ) = ντe −t/τ , with ν being an arbitrary constant, is the stationary VACF of the process associated to the realization τ of the timescale, V (t|τ). By omitting time dependence for sake of conciseness, we can define λ = σ 2 x = f (τ), which can be considered as a random variable itself distributed according to the PDF P(λ) = p τ ( f −1 (λ)) · ∂ λ ( f −1 (λ)). The average over λ is thus equivalent to computation of the expectation f (τ) with respect to τ.
In principle we may compute the expectation after the integration of Equation (27), however, it is much easier to compute it before performing the integration to avoid self-canceling terms: For a generic PDF p τ (τ) we obtain: This expression is finite for any value of time only if τ is finite. Moreover, this is a very important physical condition. In fact, when times goes to zero, R(t = 0, τ) τ is proportional to the average kinetic energy of the system.
The distribution p τ (τ) should have a power-law tail to introduce the desired anomalous time scaling of λ but a finite value of the first moment of τ to maintain CLT applicability. The importance of this assumption can be seen explicitly by solving the integral in Equation (29) for the distribution employed in [1]: where the constant C = τ Γ(1/α) α serves to control the value of the average and L α α (·) is the extreme Levy density distribution [10].
By considering the integral representation of the extremal Lévy density distribution and the Euler's gamma function with some more simplification (see Section 3.5, equation 3.109 in [32]), the result in (29) can be represented by the integral form: This expression can be solved through the residues theorem considering the poles z/α + 1 = −n or z = n, with n = 0, 1, 2, . . . , ∞, to obtain the short or the long time scaling of the variable. An interested reader can verify the explicit derivation in [1,32]. By plugging this result in Equation (28), without any assumption about time values, we observe that the condition of finite τ is necessary to guarantee λ to be finite too, ensuring the Gaussian form of the PDF.

Discussions
The CLT has a fundamental role in the HEBP approach and, generally, in the theory of stochastic processes. The domain of attraction of the distribution of the increments determines the shape of the PDF of the stochastic process in the long time limit. In this work, we reviewed the main conditions of the classical CLT, by including also the less known case of distributions with infinite variance which fall in the Gaussian domain (with slower convergence). We proposed and analyzed a preparatory exercise to give the mathematical foundations to understand the approach used in HEBP to generate PDFs with Gaussian shape and non-linear scaling of the variance in time for the long time limit. It is shown that the sum of such a population of Gaussian random variables is mathematically defined by the sum of a more complex, and in general non-Gaussian, i.i.d. random variables. The population of Gaussian distributions can be interpreted, within a Bayesian approach, as the likelihood modulated by the prior distribution of a parameter of the model. The formal randomization of the parameter of the distribution (Equation (19)) is equivalent to the computation of the marginal likelihood, which corresponds indeed to the PDF of the i.i.d. random variables. This approach could be easily generalized to other distributions and parameters for statistical application purposes. The role of CLT in HEPB is then clarified. After recalling the derivation of the model, the conditions obtained in the preparatory example have been explicitly proven.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: