The Emergence of the Normal Distribution in Deterministic Chaotic Maps

The central limit theorem states that, in the limit of a large number of terms, an appropriately scaled sum of independent random variables yields another random variable whose probability distribution tends to attain a stable distribution. The condition of independence, however, only holds in real systems as an approximation. To extend the theorem to more general situations, previous studies have derived a version of the central limit theorem that also holds for variables that are not independent. Here, we present numerical results that characterize how convergence is attained when the variables being summed are deterministically related to one another through the recurrent application of an ergodic mapping. In all the explored cases, the convergence to the limit distribution is slower than for random sampling. Yet, the speed at which convergence is attained varies substantially from system to system, and these variations imply differences in the way information about the deterministic nature of the dynamics is progressively lost as the number of summands increases. Some of the identified factors in shaping the convergence process are the strength of mixing induced by the mapping and the shape of the marginal distribution of each variable, most particularly, the presence of divergences or fat tails.


I. INTRODUCTION
According to the central limit theorem (CLT), the sum of independent variables with finite first and second moments is governed by a Gaussian distribution when the number of summands is asymptotically large. The mean value and the variance of the Gaussian equal the sum of the individual mean values and variances, respectively. The Gaussian distribution has maximal entropy for a given variance and is reached independently of the distributions from which the summands are sampled. The convergence to the Gaussian limit, therefore, can be viewed as a loss of information about the original data. Extensions to sums of variables with diverging first and second moments have been derived [1][2][3][4], the asymptotic distributions of which are no longer Gaussian but are still members of a family of so-called stable distributions.
Experience shows that many systems are successfully modeled by stable distributions, for example, in the theory of errors and propagation of uncertainty. This is often justified by the fact that errors, as well as many other quantities of interest, can be conceived as the sum of a large number of variables representing disparate magnitudes that appear to be unrelated. Yet, for instance, physics dictates that all the variables describing a system of interacting particles (as opposed to an ensemble of free particles) are correlated with one another. Therefore, the independence condition is no more than an approximation.
To improve this approximation, extensions of the CLT have also been developed for variables that bear different degrees of statistical dependence, including those obtained by successive application of a deterministic rule that produces ergodicity and aperiodicity [5][6][7][8][9][10][11]. Here we analyze several systems of this type. As discussed in the next section, a conveniently modified version of the CLT exists for appropriately scaled sums of variables deterministically related to one another. Notably, the family of stable distributions for these cases coincides with the one obtained for independent variables. These extensions provide mathematical certainty that sums of strongly correlated variables, if produced by a chaotic dynamical system, lose all memory of their original distribution, and asymptotically approach a distribution that also happens to be the limit of sums of independent variables sampled from a certain distribution. The strong statistical dependencies governing the physical world, therefore, may be legitimately ignored when describing the probability distributions of macroscopic variables, and it is legitimate to conceive the latter as a sum of a large number of microscopic, independent variables. This property greatly simplifies the description of macroscopic systems and has probably played a crucial role in the development of the theory of probability.
In practical situations, however, it is important to know how many terms a sum needs to include for its distribution to be well described by the asymptotic result. To shed light on this question, we study in this paper the convergence of the probability distribution of a sum of perfectly correlated variables, generated by the iteration of a chaotic, deterministic map, towards the asymptotic distribution predicted by the extensions of the CLT. The aim is to characterize how the loss of information about the deterministic nature of the map depends on the number of variables that are summed together. Since previous theoretical results do not predict the rate of convergence towards asymptotic distributions in deterministic systems, our analysis is based on numerical simulations of several paradigmatic examples, and on a comparison with the behavior of randomly sampled systems with the same distributions.
The paper is organized as follows. In Section II we present the main theoretical tools to be employed later, including the extension of the CLT to variables that are strongly correlated, information-theoretical measures that quantify the differences between probability distributions, and the behavior of the variance of a sum of variables that are correlated. The following three sections apply these tools to the analysis of a chaotic dynamical system with a uniform marginal distribution and varying Lyapunov exponent (the Bernoulli map, Section III), a chaotic dynamical system with a highly nonuniform marginal distribution and several types of orbits (the logistic map, Section IV), and an example of a process with a fat-tailed distribution (Section V). Our main conclusions are summarized in Section VI.

II. CENTRAL LIMIT THEOREM FOR DETERMINISTIC MAPS
We consider a generic one-dimensional map, $x(t+1) = f[x(t)]$, with a well-defined invariant measure $\rho_x(x)$. This measure is determined by the invariance identity of Equation (1), which involves $[\rho_x \circ f](x)$, the composition of the functions $\rho_x(x)$ and $f(x)$, and the derivative $f'(x)$, where the prime indicates differentiation with respect to $x$. We assume that the mean value $\bar{x}$ is finite over the distribution $\rho_x(x)$ and, for now, that the variance $\sigma_x^2$ of $x$ is also finite:
$$\bar{x} = \int x\, \rho_x(x)\, dx, \qquad \sigma_x^2 = \int (x - \bar{x})^2 \rho_x(x)\, dx, \tag{2}$$
where the integrals run over the whole domain of variation of $x$. In Section V we study a case where we relax the condition that $\sigma_x^2$ is finite. A central limit theorem (CLT) for this kind of system applies [6][7][8][9][10][11] when the map under study is ergodic and aperiodic. We recall that a map is ergodic if all its invariant sets are null or co-null, and aperiodic if its periodic orbits form a null set [11]. The combination of ergodicity and aperiodicity is typically equivalent to the dynamics being chaotic [12]. In this case, the CLT states that the distribution of the (centered, suitably normalized) sums of $N$ successive values of $x(t)$,
$$s_N(t) = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} \left[ x(t+k) - \bar{x} \right], \tag{3}$$
becomes normal for $N \to \infty$:
$$\rho_{s_N}(s) \to G_{\sigma_s}(s) = \frac{1}{\sqrt{2\pi\sigma_s^2}} \exp\left( -\frac{s^2}{2\sigma_s^2} \right), \tag{4}$$
for some value of the variance $\sigma_s^2$. Here, $G_{\sigma_s}$ denotes the Gaussian centered at zero, with standard deviation $\sigma_s$.
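For each value of $N$, the variables $x(t)$ and $s_N(t)$ can be integrated into a single two-dimensional map:
$$x(t+1) = f[x(t)], \qquad s_N(t+1) = s_N(t) + \frac{1}{\sqrt{N}} \left\{ f^N[x(t)] - x(t) \right\}, \tag{5}$$
where $f^N$ denotes the $N$-th iterate of $f$. Thus, for $N \to \infty$, the marginal invariant measures of the variables $x$ and $s_N$ in map (5) are, respectively, $\rho_x(x)$ and the Gaussian $\rho_s(s) = G_{\sigma_s}(s)$ of Equation (4).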
In contrast with the sums of statistically independent random variables drawn from a given distribution, in the limit $N \to \infty$, the variance of the sums $s_N(t)$ does not necessarily coincide with that of the summands, $\sigma_x^2$. The difference arises from the correlations between successive values of $x(t)$, induced by the map, with the ensuing mutual correlations between the values of $s_N(t)$. For a finite number of summands $N$, the variance of $s_N(t)$ is given by the Green-Kubo formula [13]:
$$\sigma_{s_N}^2 = \sigma_x^2 + 2 \sum_{k=1}^{N-1} \left( 1 - \frac{k}{N} \right) \left[ \overline{x(t)\, x(t+k)} - \bar{x}^2 \right], \tag{6}$$
where the overline indicates the average with respect to the distribution $\rho_x(x)$. The value of $\sigma_{s_N}^2$ becomes independent of $t$ when the process $x(t)$ has reached a stationary regime. For $N \to \infty$, the variance is
$$\sigma_s^2 = \sigma_x^2 + 2 \sum_{k=1}^{\infty} \left[ \overline{x(t)\, x(t+k)} - \bar{x}^2 \right]. \tag{7}$$
Provided that the sum converges, this formula gives the variance of the asymptotic normal distribution $G_{\sigma_s}(s)$ of increasingly long sums $s_N(t)$.
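The finite-$N$ variance can be evaluated numerically once the autocovariances of $x(t)$ are known. The following is a minimal sketch, assuming the Green-Kubo sum has the standard form $\sigma_{s_N}^2 = \sigma_x^2 + 2\sum_{k=1}^{N-1}(1-k/N)\,c_k$ with $c_k = \overline{x(t)x(t+k)} - \bar{x}^2$; the helper names `autocovariances` and `green_kubo_variance` are hypothetical and not part of the original work.

```python
import numpy as np


def autocovariances(x, kmax):
    """Sample estimates of c_k = <x(t) x(t+k)> - <x>^2 for k = 1..kmax."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    return np.array([np.mean(xc[:-k] * xc[k:]) for k in range(1, kmax + 1)])


def green_kubo_variance(var_x, c, N):
    """Variance of s_N from var_x and autocovariances c[k-1] = c_k.

    Assumes s_N is the sum of N centered successive values divided by sqrt(N).
    """
    c = np.asarray(c, dtype=float)
    if len(c) < N - 1:
        raise ValueError("need c_k for k = 1..N-1")
    k = np.arange(1, N)
    return var_x + 2.0 * np.sum((1.0 - k / N) * c[: N - 1])
```

With geometrically decaying autocovariances, the sum converges quickly; slowly decaying or oscillating autocovariances make the truncated sum unreliable, in which case a direct estimate from the sampled sums is the safer choice.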
In the following, we study the process of convergence towards the asymptotic distribution predicted by the above CLT for some selected deterministic maps, as the number of terms in the sums $s_N$ grows. For each $N$, we numerically iterate Equations (5) and estimate the distribution of the sums $s_N$, $\rho_{s_N}(s_N)$, as a suitably normalized $10^3$-column histogram built from, typically, $10^7$ values of $s_N$. To quantify the difference between $\rho_{s_N}$ and the expected asymptotic Gaussian distribution $G_{\sigma_s}(s)$, we use the Kullback-Leibler divergence (KLD). Recall that the KLD between two distributions $\rho_1(s)$ and $\rho_2(s)$ is defined as
$$D(\rho_1 \| \rho_2) = \int \rho_1(s) \log_2 \frac{\rho_1(s)}{\rho_2(s)}\, ds.$$
This quantity measures the inefficiency with which the data $s$ are represented by a code optimized to be maximally compact under the assumption that the distribution is $\rho_2$ when, in reality, the data are generated from $\rho_1$. The inefficiency equals the mean number of extra bits per sample [14]. The divergence only vanishes when the two distributions coincide, and is otherwise positive. For brevity, we hereafter denote as $D_G$ the KLD between the distribution $\rho_{s_N}$ and the asymptotic normal distribution $G_{\sigma_s}$:
$$D_G = D(\rho_{s_N} \| G_{\sigma_s}).$$
Additionally, for each $N$, it is interesting to compare $\rho_{s_N}$ with a normal distribution with the variance $\sigma_{s_N}^2$ given by Equation (6), namely, the same variance as the sums $s_N$. Since $\sigma_{s_N}^2 \to \sigma_s^2$ as $N \to \infty$, this is an alternative way of characterizing the convergence to the asymptotic Gaussian $G_{\sigma_s}$. For this comparison, we introduce the KLD
$$D_{G_N} = D(\rho_{s_N} \| G_{\sigma_{s_N}}).$$
Finally, in order to contrast the deterministic dynamics of the chaotic map under study with a genuinely random process, we calculate the KLD for the distribution of sums of the same form as in Equation (3), but with the $N$ values of the variable $x$ drawn at random from the invariant measure $\rho_x(x)$. According to the standard CLT for statistically independent variables, as $N$ grows, the distribution $\rho^{\rm random}_{s_N}$ of these random-sampling sums is expected to asymptotically converge to a Gaussian with variance $\sigma_x^2$. To quantify this convergence, we compute
$$D_{\rm random} = D(\rho^{\rm random}_{s_N} \| G_{\sigma_x}).$$
The measures $D_{\rm random}$, $D_G$ and $D_{G_N}$ reflect three different aspects of the convergence of $\rho_{s_N}$ to $G_{\sigma_s}$. The process by which $D_{\rm random}$ tends to zero describes how independent variables, when summed together, lose the memory of the distribution from which they are sampled and approach a Gaussian. The Gaussian distribution is the one with maximal entropy among those with fixed variance. By acquiring a Gaussian shape, therefore, the distribution of the sum maximizes uncertainty. In Appendix B, we show that for large $N$ the divergence $D_{\rm random}$ decays as $N^{-1}$ if $\rho_x$ is not symmetric around its mean value, and at least as fast as $N^{-2}$ if there is symmetry.
A steep decay of $D_{G_N}$ with $N$, at a faster rate than $D_G$, implies a rapid evolution of $\rho_{s_N}$ towards a bell-shaped distribution, whose variance may still have to evolve to its asymptotic value $\sigma_s^2$. The convergence process can therefore be conceived as a sequence of two stages, the first one consisting of shedding all the structure in $\rho_x(x)$ and becoming Gaussian-like, and the second, adjusting the variance. Once $\rho_{s_N}$ is approximately Gaussian, its KLD with the asymptotic distribution $G_{\sigma_s}$ can be analytically calculated in terms of their respective variances:
$$D(G_{\sigma_{s_N}} \| G_{\sigma_s}) = \frac{1}{2 \ln 2} \left[ \frac{\sigma_{s_N}^2}{\sigma_s^2} - 1 - \ln \frac{\sigma_{s_N}^2}{\sigma_s^2} \right]. \tag{9}$$
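The histogram-based estimation of the KLD described in this section can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used for the figures: the reference Gaussian is evaluated at bin centers, empty bins are skipped, and the function names (`gaussian_pdf`, `kld_histogram_vs_gaussian`, `kld_gaussians`) are hypothetical. The last function is the closed-form KLD, in bits, between two zero-mean Gaussians.

```python
import numpy as np


def gaussian_pdf(s, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return np.exp(-0.5 * (s / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def kld_histogram_vs_gaussian(samples, sigma, bins=1000):
    """Estimate D(rho || G_sigma) in bits from a normalized histogram of samples."""
    density, edges = np.histogram(samples, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    widths = np.diff(edges)
    occupied = density > 0  # empty bins contribute nothing to the integral
    ratio = density[occupied] / gaussian_pdf(centers[occupied], sigma)
    return float(np.sum(widths[occupied] * density[occupied] * np.log2(ratio)))


def kld_gaussians(var_a, var_b):
    """Closed-form D(G_a || G_b) in bits for zero-mean Gaussians."""
    r = var_a / var_b
    return float((r - 1.0 - np.log(r)) / (2.0 * np.log(2.0)))
```

A histogram estimate of this kind carries a small positive bias that grows with the number of bins and shrinks with the number of samples, which sets a floor on the smallest KLD that can be resolved.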

III. THE BERNOULLI MAP
As a first case study, we take the generalized Bernoulli map
$$x(t+1) = \{ m\, x(t) \}, \tag{10}$$
where $\{\cdot\}$ indicates the fractional part, and $m > 1$ is an integer factor. This map has long been studied as a paradigm of deterministic chaotic systems, due to its combination of complex behavior and analytical tractability. Its Lyapunov exponent equals $\ln m$. The invariant measure of $x(t)$ is particularly simple:
$$\rho_x(x) = 1 \quad \text{for } 0 \le x < 1, \tag{11}$$
and zero otherwise, for all $m$, with $\bar{x} = 1/2$ and $\sigma_x^2 = 1/12$. We show in Appendix A that the variances of the sums $s_N$ can be explicitly calculated:
$$\sigma_{s_N}^2 = \frac{1}{12} \left[ \frac{m+1}{m-1} - \frac{2m}{N} \frac{1 - m^{-N}}{(m-1)^2} \right]. \tag{12}$$
Note that for $N \gg (\ln m)^{-1}$, this variance takes the approximate form
$$\sigma_{s_N}^2 \approx \frac{1}{12}\, \frac{m+1}{m-1} \left( 1 - \frac{N_0}{N} \right), \tag{13}$$
with $N_0 = 2m/(m^2 - 1)$. For $N \to \infty$, in turn,
$$\sigma_s^2 = \frac{1}{12}\, \frac{m+1}{m-1}. \tag{14}$$
We first consider the Bernoulli map for $m = 2$. Dark full lines on the left column of Figure 1 show numerical results for the distributions of the sums $s_N$, $\rho_{s_N}$, for three small values of $N$. Light-colored curves stand for the asymptotic Gaussian $\rho_s = G_{\sigma_s}$, and dashed curves are the Gaussians $G_{\sigma_{s_N}}$ for each $N$. Their respective variances, $\sigma_s^2$ and $\sigma_{s_N}^2$, are given by Equations (14) and (12). On the right column, dark and light-colored curves respectively show the distributions of the sums of randomly sampled values of $x$, $\rho^{\rm random}_{s_N}$, calculated analytically as $N$-th order self-convolutions of $\rho_x(x)$, and the expected asymptotic Gaussian $G_{\sigma_x}$. A comparison of the two columns illustrates the difference between the distributions of the sums generated by map iteration on one side and by random sampling on the other. It also shows that convergence to the asymptotic distribution is faster in the latter case.
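A practical caveat when reproducing these results for $m = 2$: iterating $x \to \{2x\}$ directly in double precision shifts one bit out of the mantissa at each step, so the orbit collapses to zero after a few dozen iterations. The sketch below sidesteps this by building a typical orbit from a stream of independent fair bits, using the fact that the map acts as a shift on the binary expansion of $x$. This is an assumption of the example, not necessarily the procedure used for the figures, and all function names are hypothetical. The closed-form variance used for comparison is the Green-Kubo sum evaluated with autocovariances $m^{-k}/12$.

```python
import numpy as np


def bernoulli_orbit_m2(T, rng, nbits=53):
    """Typical orbit of x -> {2x}: x(t) = sum_j b[t+j] 2^-(j+1), with fair bits b."""
    bits = rng.integers(0, 2, size=T + nbits - 1).astype(float)
    weights = 0.5 ** np.arange(1, nbits + 1)
    # 'valid' correlation gives out[t] = sum_j bits[t+j] * weights[j], length T
    return np.correlate(bits, weights, mode="valid")


def scaled_sums(x, N, mean):
    """Sliding sums s_N(t) = N^-1/2 * sum_{k<N} [x(t+k) - mean]."""
    c = np.concatenate(([0.0], np.cumsum(np.asarray(x, dtype=float) - mean)))
    return (c[N:] - c[:-N]) / np.sqrt(N)


def bernoulli_var_sN(N, m):
    """Closed-form variance of s_N for the Bernoulli map with integer m > 1."""
    m = float(m)
    return ((m + 1) / (m - 1) - 2 * m * (1 - m ** (-N)) / (N * (m - 1) ** 2)) / 12
```

For example, `scaled_sums(bernoulli_orbit_m2(200_000, np.random.default_rng(0)), 5, 0.5).var()` can be compared with `bernoulli_var_sN(5, 2)`.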
The main panel of Figure 2 shows, with different symbols, the KLDs $D_G$, $D_{G_N}$, and $D_{\rm random}$, defined in the preceding section, as functions of $N$. The inset in Figure 2 shows, as dots, the numerical estimation of the variance of $s_N$ over the distribution $\rho_{s_N}$ as a function of $N$. The dashed curve corresponds to the analytical expression of Equation (12).
In the range shown in the figure, for $N \gtrsim 10$, $D_G$ is larger than $D_{\rm random}$ by a factor of around 14. Meanwhile, in the same range, $D_{G_N}$ decays faster, approximately as $N^{-2.3}$. As discussed at the end of Section II, this faster decay of $D_{G_N}$ suggests that $\rho_{s_N}$ is rapidly approaching a Gaussian distribution, with a KLD with the asymptotic distribution $\rho_s$ as given by Equation (9). Replacing Equation (13) into Equation (9) and expanding up to second order in $N_0/N$, we obtain
$$D_G \approx \frac{N_0^2}{4 \ln 2}\, N^{-2}. \tag{15}$$
For $m = 2$ we have $N_0 = 4/3$ so that, according to the above equation, $D_G \approx 0.64\, N^{-2}$. A power-law fitting of the data for $D_G$ for $20 \le N \le 50$ gives $D_G \approx 0.69\, N^{-1.9}$, which fits the prediction of Equation (15) remarkably well. This agreement provides strong evidence in favor of the hypothesis that $\rho_{s_N}$ converges to $\rho_s$ in two stages, acquiring a Gaussian shape in the first, and adjusting the variance in the second. The transition from the first stage to the second, however, does not imply that $\rho_{s_N}$ is, strictly speaking, a Gaussian distribution.
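The second-order expansion can be spelled out as follows. This is a sketch that assumes the KLD between two zero-mean Gaussians is measured in bits and that the variance ratio is $\sigma_{s_N}^2/\sigma_s^2 = 1 - \varepsilon$ with $\varepsilon = N_0/N$.

```latex
\begin{align*}
D_G &\approx \frac{1}{2\ln 2}\left[(1-\varepsilon) - 1 - \ln(1-\varepsilon)\right]
     = \frac{1}{2\ln 2}\left[-\varepsilon + \varepsilon + \frac{\varepsilon^2}{2} + O(\varepsilon^3)\right] \\
    &\approx \frac{\varepsilon^2}{4\ln 2} = \frac{N_0^2}{4\ln 2}\,N^{-2}, \\
m = 2:\quad N_0 &= \frac{2m}{m^2-1} = \frac{4}{3}, \qquad
\frac{N_0^2}{4\ln 2} = \frac{16/9}{2.7726} \approx 0.64 .
\end{align*}
```

The first-order terms cancel, which is why the decay goes as $N^{-2}$ rather than $N^{-1}$ even though the variance itself approaches its limit as $N^{-1}$.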
What are the implications of the fact that, after the initial transient, $D_G$ and $D_{\rm random}$ both decay with the same power law, approximately proportional to $N^{-2}$? In this regime, $D_G \approx 14\, D_{\rm random}$, which means that, for each $N$, $D_{\rm random}(N)$ is approximately equal to $D_G(\sqrt{14}\, N)$. By increasing the number of random samples drawn from the invariant measure (11), $D_{\rm random}$ diminishes by the same amount as $D_G$ diminishes when the deterministic Bernoulli mapping is run for a rescaled, larger number of samples, with a scaling factor of $\alpha \approx \sqrt{14} \approx 3.7$. In other words, $\alpha$ samples of the deterministic map are as informative about the asymptotic distribution as a single sample in the random drawing. The presence of correlations makes each new sample from the deterministic dynamics less informative (by a factor of $\alpha$) than from purely independent draws.

Figure 2 (caption fragment): [...] 3), for the Bernoulli map (10) with $m = 2$. The straight lines in this log-log plot have a slope $-2$. The inset shows, as dots, numerical results for the variance $\sigma_{s_N}^2$ over the distribution $\rho_{s_N}(s_N)$. The dashed line joins the analytical values predicted from Equation (12).
The factor $\alpha$ may also be semi-quantitatively associated with the relation between the asymptotic variance $\sigma_s^2$ and the original variance $\sigma_x^2$. In Equation (3), the normalization factor $1/\sqrt{N}$ compensates for the fact that the variance of a sum of $N$ independent samples is proportional to $N$. Yet, when the summands bear statistical interdependence, the intended compensation need not be attained. The higher the correlations in the deterministic map, the less informative each new datum is, the more unsuccessful the compensation, and the larger the increase of the asymptotic variance. In the present case, the variance increases threefold, from 1/12 to 1/4, which is similar to the factor relating $D_G$ and $D_{\rm random}$, namely, $\alpha$.
Considering now other values of $m$ in the Bernoulli map (10), the numerical results presented in Figure 3 show that the dependence of $D_G$ on $N$ is similar to that obtained for $m = 2$, with the only difference that $D_G$ becomes progressively smaller as $m$ grows. As before, the convergence may be conceived as consisting of two stages, with Equation (9) approximately holding for the second stage. According to the results of Figure 3, the second stage is reached faster for larger values of $m$. As expected from the large-$N$ asymptotic behavior of $D_G$ predicted by Equation (15) with $N_0 = 2m/(m^2 - 1)$ [cf. Equation (13)], $D_G$ approaches $D_{\rm random}$ for large $m$. This implies that the effect of the statistical dependencies induced by the deterministic nature of the map decreases as $m$ grows. The KLD $D_{G_N}$ is not shown in Figure 3, but its behavior is similar to that of the case of $m = 2$ (Figure 2).
In summary, in the Bernoulli map, $D_{G_N}$ decreases faster than $D_G$ during the first stage of the convergence process, where $\rho_{s_N}$ acquires a Gaussian-like shape. Only later is the variance adjusted towards its final value. The second stage can be modeled analytically, providing a good qualitative description of the asymptotic behavior inferred from numerical results.

IV. THE LOGISTIC MAP: FULL CHAOS AND INTERMITTENCY
We now turn our attention to the logistic map [17,18]
$$x(t+1) = \lambda\, x(t) \left[ 1 - x(t) \right], \tag{16}$$
with $0 < \lambda \le 4$. Much like Bernoulli's, the logistic map hardly needs any presentation. We first consider the case $\lambda = 4$, which we call the regime of "full chaos". For this value of $\lambda$, the dynamics are chaotic and therefore comply with the hypotheses of the CLT for deterministic systems discussed in Section II. Moreover, due to the existence of a nonlinear change of variables that transforms the logistic map with $\lambda = 4$ into the Bernoulli map of Equation (10) with $m = 2$, several analytical results for the latter can be extended to the former. In spite of this connection, as we show below, the statistics of the sums $s_N$ are qualitatively different between the two maps. For $\lambda = 4$, the invariant measure of the logistic map can be written explicitly as [19]
$$\rho_x(x) = \frac{1}{\pi \sqrt{x(1-x)}} \tag{17}$$
for $0 \le x \le 1$, and 0 otherwise. The mean value is $\bar{x} = 1/2$ and the variance is $\sigma_x^2 = 1/8$. As we show in Appendix A, the correlations between iterations of the map, $c_k = \overline{x(t)\, x(t+k)} - \bar{x}^2$, vanish for all $k \ge 1$. From Eqs. (6) and (7), this implies that the variances of the sums $s_N$ are the same for all $N$, and therefore coincide with both the variance of $x$ and with the limit for $N \to \infty$: $\sigma_{s_N}^2 = \sigma_s^2 = \sigma_x^2$. Therefore, in contrast with the Bernoulli map studied in the preceding section, it is not possible to discern between a first stage of convergence to a Gaussian profile, and a second stage of adjustment of the variance.
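The vanishing of the correlations for $\lambda = 4$ is easy to check numerically. The sketch below iterates the map in double precision from an arbitrary initial condition (the value 0.1234, the burn-in length, and the function names are assumptions of this example) and estimates the mean, the variance, and the first few autocovariances $c_k$.

```python
import numpy as np


def logistic_orbit(x0, T, lam=4.0, burn_in=1000):
    """Iterate x -> lam * x * (1 - x), discarding an initial transient."""
    x = x0
    for _ in range(burn_in):
        x = lam * x * (1.0 - x)
    out = np.empty(T)
    for t in range(T):
        out[t] = x
        x = lam * x * (1.0 - x)
    return out


def autocovariance(x, k):
    """Sample estimate of c_k = <x(t) x(t+k)> - <x>^2 for k >= 1."""
    xc = x - x.mean()
    return float(np.mean(xc[:-k] * xc[k:]))
```

Note that vanishing autocovariances do not make successive values independent: $x(t+1)$ is still a deterministic function of $x(t)$, which is what shapes the distributions of the sums discussed next.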
In Figure 4, the left column shows numerical estimations of the distributions $\rho_{s_N}(s_N)$ of the sums of $N$ consecutive iterations of the logistic map with $\lambda = 4$, for three values of $N$. The light-colored curve corresponds to the expected asymptotic Gaussian.
In addition to the sharp peaks in the profile of $\rho_{s_N}$ for small $N$, an important difference with the Bernoulli map (Figure 1) is that $\rho_{s_N}$ is no longer symmetric with respect to zero. This asymmetry may come as a surprise, taking into account that both $f(x)$ and $\rho_x(x)$ are symmetric around the mean value $\bar{x}$. The asymmetry, however, originates from the fact that the functions $x + f(x)$, $x + f(x) + f[f(x)]$, ..., which ultimately determine the distributions of the sums $s_N$, are not symmetric around $\bar{x}$.
On the right column of Figure 4 we show, for the same values of $N$, the distributions $\rho^{\rm random}_{s_N}$ of sums of $N$ random values of $x$ sampled from $\rho_x$. In contrast with the case of the Bernoulli map, $\rho^{\rm random}_{s_N}$ is here estimated numerically. As expected, the distributions of random-sampling sums are now symmetric with respect to zero, and exhibit a fast convergence to the asymptotic Gaussian.
Figure 5 shows $D_G$ and $D_{\rm random}$ for the fully chaotic logistic map, as functions of $N$. Since, as explained above, $\sigma_{s_N}^2$ equals $\sigma_s^2$ for all $N$, now $D_{G_N}$ coincides with $D_G$. Due to the symmetry of $\rho_x$ with respect to its mean value, the arguments given in Appendix B apply to this case, and $D_{\rm random}$ decays as $N^{-2}$ for large $N$. The full straight line in the log-log plot of the figure has slope $-2$, confirming this prediction in the plotted range. Yet, the behavior of $D_G$ is considerably different. It starts with a small increment between $N = 1$ and 2, where it attains a maximum, and thereafter decays rapidly up to $N \approx 20$. This decay corresponds to the interval of $N$ for which the distribution $\rho_{s_N}$ displays identifiable singularities. For $N \gtrsim 20$, the singularities start to overlap, and the distribution $\rho_{s_N}$ varies more smoothly and displays a well-defined asymmetric bell-shaped profile. In this zone, the decay of $D_G$ is slower and approximately behaves as $N^{-1}$, as illustrated by the dashed straight segment of slope $-1$. As shown in Appendix B, a decay as $N^{-1}$ is expected for the KLD of the distribution of random-sampling sums when the distribution of the individual summands is not symmetric with respect to the mean value. If the disparate dependence on $N$ between $D_G$ and $D_{\rm random}$ persists as $N$ grows beyond the range considered here, their relative difference would increase indefinitely for $N \to \infty$.
Although still chaotic, other values of $\lambda$ in Equation (16) give rise to qualitatively different dynamical features in the logistic map. For $\lambda = 3.828$, which is our next case study, the dynamics are intermittent. Just above this value of $\lambda$, at $\lambda_3 = 1 + 2\sqrt{2} \approx 3.8284$, the logistic map enters the largest stability window within its chaotic regime, where $x(t)$ becomes asymptotically locked in a period-3 orbit. For $\lambda \lesssim \lambda_3$, the vicinity of the critical point manifests itself in the form of intermittent behavior for $x(t)$. Namely, the dynamics alternate intermittently between intervals of "turbulent" evolution, where its behavior is conspicuously chaotic, and "laminar" evolution, where $x(t)$ remains temporarily close to the period-3 orbit, but eventually departs from it. The left panel of Figure 6 shows 900 successive iterations of $x(t)$ for the above value of $\lambda$, illustrating both kinds of behavior.
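The intermittent regime can be explored with a vectorized ensemble of orbits, which is a convenient alternative to one very long trajectory. The sketch below is illustrative only: the ensemble size, burn-in, initial-condition range, and the name `logistic_ensemble` are assumptions of this example. After a short transient every orbit is confined between $f(\lambda/4)$ and $\lambda/4$, which for $\lambda = 3.828$ gives approximately the interval from 0.157 to 0.957.

```python
import numpy as np


def logistic_ensemble(lam, n_orbits, n_steps, burn_in, rng):
    """Iterate many logistic orbits in parallel; returns an (n_steps, n_orbits) array."""
    x = rng.uniform(0.05, 0.95, size=n_orbits)
    for _ in range(burn_in):
        x = lam * x * (1.0 - x)
    out = np.empty((n_steps, n_orbits))
    for t in range(n_steps):
        out[t] = x
        x = lam * x * (1.0 - x)
    return out


# Onset of the period-3 window, lambda_3 = 1 + 2*sqrt(2)
LAMBDA_3 = 1.0 + 2.0 * np.sqrt(2.0)
```

Plotting a single column of the returned array against time shows the alternation of "laminar" stretches near the period-3 orbit and "turbulent" bursts; `traj.mean()` and `traj.var()` give ensemble estimates of the first two moments of $\rho_x$.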
For $\lambda = 3.828$, no analytical description of the logistic map exists, and we must resort to numerical techniques. As inferred from the left panel of Figure 6, in this case, $\rho_x(x)$ covers only a portion of the interval $[0, 1]$, between $x \approx 0.157$ and 0.957, and displays three peaks near the values of $x$ in the period-3 orbit. Our numerical estimations for the mean value and the variance are $\bar{x} \approx 0.593$ and $\sigma_x^2 \approx 0.0864$. In principle, the variance of the sums $s_N$ could be obtained from Equations (6) and (7) by numerically computing the correlations $c_k = \overline{x(t)\, x(t+k)} - \bar{x}^2$. These quantities, however, exhibit sharp oscillations and slow convergence as $k$ grows, as well as persistent fluctuations for large $k$. The right panel in Figure 6 shows $c_k$ up to $k = 90$. In practice, such features make it impossible to evaluate the variances $\sigma_{s_N}^2$ and $\sigma_s^2$ using the sums in Equations (6) and (7). We therefore resort to their direct numerical calculation using the values of $s_N(t)$ obtained from successive map iterations. In particular, our estimation for the variance of the sums in the limit $N \to \infty$ is $\sigma_s^2 \approx 0.0403$.
Colored symbols in the main panel of Figure 7 stand for $D_G$ in the case of the logistic map with $\lambda = 3.828$, as a function of $N$. As with full chaos (cf. Figure 5), two distinct decay regimes are identifiable. Moreover, the behavior for $N \lesssim 50$ now contains signatures of the pseudoperiodic nature of the mapping in the "laminar" intervals, namely, the relatively large values of $D_G$ when $N$ is a multiple of 3 (triangles). In fact, for those values of $N$, the distributions $\rho_{s_N}$ are narrower and sharper than for the remaining values, giving rise to higher KLDs. This is clearly illustrated by the dependence of the variance $\sigma_{s_N}^2$ on $N$, shown in the inset of the figure. After an abrupt initial decay, $\sigma_{s_N}^2$ displays oscillations of period 3, which progressively damp out as $N$ grows. For $N \gtrsim 50$, the difference in $D_G$ for multiples of 3 rapidly smooths out, as the KLD enters a regime where it decays approximately as $N^{-1}$, as indicated by the dashed segment of slope $-1$.
For this case of intermittent dynamics, we have also calculated $D_{G_N}$, finding qualitatively the same behavior as for $D_G$. As a matter of fact, $D_G$ and $D_{G_N}$ typically differ from each other by only about 10%. Thus, for the sake of clarity, the numerical estimations of $D_{G_N}$ were not included in Figure 7. As for the KLD of the distribution of random-sampling sums, $D_{\rm random}$, the results of Appendix B indicate that it should decay as $N^{-1}$ for large $N$. This behavior, however, has not yet been reached in the range of values displayed in Figure 7. Assuming nevertheless that this is the asymptotic dependence of $D_{\rm random}$, our results suggest that the KLD for random-sampling sums is no less than three orders of magnitude smaller than $D_G$ for large $N$.
In summary, both for $\lambda = 4$ and 3.828, the main difference between the statistics of the sums $s_N$ obtained from the iteration of the logistic map and from a random sampling of the corresponding invariant measures, as $N$ grows, resides in their disparate rates of approach towards the asymptotic distribution. Within the range of $N$ considered in our numerical calculations, the decay of $D_G$ as $N^{-1}$ can be qualitatively understood by the lack of symmetry in the invariant measures although, strictly speaking, the corresponding result in Appendix B holds for random sampling only.
Both when $\lambda = 4$ and 3.828, for $N \gtrsim 20$, the difference between $D_G$ and $D_{\rm random}$ is well above two orders of magnitude. In the intermittent case, moreover, the pseudoperiodic character of the "laminar" dynamics reveals itself in the form of oscillations in $D_G$ for small $N$, which are naturally absent in $D_{\rm random}$. Plausibly, pseudoperiodicity is also responsible for the slow decrease of $D_G$ during the oscillatory regime. Intermittency degrades the mixing properties of the mapping since, during the pseudoperiodic intervals, the dynamics only explore a reduced portion of the available range in $x$.

V. A FAT-TAILED INVARIANT DISTRIBUTION
Much like the standard CLT, the CLT for deterministic systems can be generalized to the situation where the variance of the relevant variable $x$ diverges [11]. In particular, this is the case of invariant distributions with a sufficiently slow algebraic decay for large $|x|$: $\rho_x(x) \sim |x|^{-\alpha-1}$ with $0 < \alpha < 2$. Under the same hypotheses of ergodicity and aperiodicity stated in Section II, and assuming for simplicity that $\bar{x} = 0$ (for instance, due to the symmetry of $\rho_x(x)$ around zero), the distribution of the sums
$$s_N(t) = \frac{1}{N^{1/\alpha}} \sum_{k=0}^{N-1} x(t+k) \tag{18}$$
[cf. Equation (3)] converges to a stable distribution given by the inverse Fourier transform of $Q_{\gamma_s}(k) = \exp\left( -\gamma_s^\alpha |k|^\alpha \right)$, for some value of the dispersion parameter $\gamma_s$. The result for distributions with finite variance is reobtained in the limit $\alpha = 2$, with $\gamma_s \equiv \sigma_s$ as defined in Equation (7).
In this section, we give an example of convergence toward a stable distribution different from a Gaussian in the case of a map with a fat-tailed invariant distribution decaying as $|x|^{-2}$ for large $|x|$ (i.e., $\alpha = 1$). This specific case has the analytical advantage that the stable distribution predicted by the CLT can be explicitly written out, namely,
$$C_{\gamma_s}(s) = \frac{1}{\pi}\, \frac{\gamma_s}{s^2 + \gamma_s^2}, \tag{19}$$
which is nothing but the Cauchy (or Lorentzian) distribution. Like the Gaussian, the Cauchy distribution is a maximum-entropy distribution, but with a different constraint.
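The stability of the Cauchy distribution under sums normalized by $1/N$ (the case $\alpha = 1$) can be illustrated with independent draws. This is a minimal sketch that uses NumPy's standard Cauchy generator rather than any deterministic map; the function names are hypothetical. For a Cauchy distribution centered at zero, the upper quartile equals the dispersion parameter $\gamma$, so the quartile of the normalized sums should stay close to 1 for any $N$.

```python
import numpy as np


def cauchy_pdf(s, gamma):
    """Cauchy (Lorentzian) density with dispersion parameter gamma."""
    return gamma / (np.pi * (s ** 2 + gamma ** 2))


def normalized_cauchy_sums(N, n_samples, rng):
    """Sums of N independent standard Cauchy variables, divided by N."""
    x = rng.standard_cauchy(size=(n_samples, N))
    return x.sum(axis=1) / N


def upper_quartile(samples):
    """75th percentile; equals gamma for a zero-centered Cauchy distribution."""
    return float(np.quantile(samples, 0.75))
```

Quantile-based estimates are used here because sample means and variances do not converge for fat-tailed data.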
To get a deterministic chaotic map with a variable distributed following a fat-tailed function, we use the ad hoc procedure of applying a suitable transformation to a map whose invariant distribution is known in advance. Specifically, we take the Bernoulli map of Equation (10) with $m = 2$, for which we know that the invariant distribution is the function given by Equation (11), and introduce a change of variables that transforms this function into the desired fat-tailed profile. This is formally achieved by defining the two-variable map of Equation (20), in which a function of the Bernoulli variable transforms a variable $u$ with uniform distribution in $(0, 1)$ into a variable $x$ with the desired distribution. By construction, thus, the invariant measure of variable $x$ in map (20) is the prescribed fat-tailed profile, decaying as $|x|^{-2}$ for large $|x|$, with $x$ varying from $-\infty$ to $\infty$. By analyzing the behavior of the Fourier transform of $\rho_x(x)$ near the origin, it is possible to obtain the dispersion parameter for the Cauchy distribution of sums of independently chosen values of $x$, which turns out to be $\gamma_s = \pi/2$. Unfortunately, the value of $\gamma_s$ when the summands are successive iterations of $x$ in map (20) cannot be found analytically in an explicit way. However, we have numerically found that, for $N \to \infty$, the dispersion parameter again coincides with $\gamma_s = \pi/2$ to a high precision. This is the value of $\gamma_s$ that we use to compute the KLD $D_C = D(\rho_{s_N} \| C_{\gamma_s})$ between the distribution of the sums $s_N$ of Equation (18) and the Cauchy distribution (19). In addition, we do not have a practical procedure to assign a value to the dispersion parameter when the number of summands $N$ is finite. Therefore, in the present case, we do not calculate a quantity analogous to the KLD $D_{G_N}$ of Sections III and IV. Regarding $D_{\rm random}$, due to the non-analytic behavior of the Fourier transform of $\rho_x(x)$ at the origin, it is now not possible to use the procedure of Appendix B to predict how this KLD decreases as $N$ grows. Our analysis must thus rely on numerical results.
In Figure 8, we show the distributions $\rho_{s_N}(s_N)$ (left column) and $\rho^{\rm random}_{s_N}(s_N)$ (right column) for three small values of $N$. Light-colored curves correspond to the expected asymptotic Cauchy distribution, given by Equation (19) with $\gamma_s = \pi/2$. Note that for $N = 2$, due to the peak at $s_N = 0$, the difference between $\rho^{\rm random}_{s_N}$ and the asymptotic distribution seems to be larger than that of $\rho_{s_N}$. The KLDs, however, reveal that $\rho^{\rm random}_{s_N}$ is slightly closer to the Cauchy distribution (see Figure 9). For $N = 10$, it is already clear that the approach to the Cauchy distribution is faster for the random-sampling sums. Comparison with the results for the Bernoulli and the logistic maps (cf. Figures 1 and 4) suggests, however, that in the present situation the convergence to the corresponding asymptotic distribution is considerably slower than for those cases.

Figure 6 (caption fragment): [...] 16), in the intermittent regime, $\lambda = 3.828$. The arrows at $t = 300$ and 500 point at "turbulent" and period-3 "laminar" intervals, respectively. Right: The correlation $c_k$ as a function of $k$ in the same intermittent regime, calculated numerically from sequences of $10^7$ iterations of $x(t)$. Symbols are connected by lines to facilitate visualization.
Figure 9 presents numerical results for the KLDs $D_C$ and $D_{\rm random}$. In order to have significant statistics in the construction of the 1000-column histogram that represents $\rho_{s_N}(s)$ from $10^7$ samples of the sums $s_N$, we have cut off the interval of variation of $s_N$ to $(-10, 10)$, disregarding samples outside that interval. Otherwise, for the fat-tailed distributions involved in the present case, the calculation of the KLDs would be dominated by sampling fluctuations for large values of $|s_N|$. Along most of the range of $N$ spanned by the figure, both $D_C$ and $D_{\rm random}$ exhibit rather well-defined power-law decays. Their different exponents, however, cause them to diverge progressively from each other as $N$ grows. While $D_{\rm random}$ approximately decays as $N^{-1}$, as illustrated by the full straight line of slope $-1$, a linear fitting of $D_C$ for $N \ge 2$, shown as a dashed line, points to a slower decay with a nontrivial exponent: $N^{-0.68}$. This result suggests that the convergence to an asymptotic distribution for the sums $s_N$ in the case of fat-tailed invariant measures may generally be characterized by unusual exponents in the decay of the KLD. This conjecture will be thoroughly explored in future work, by both analytical and numerical means.
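Decay exponents such as the ones quoted above are commonly extracted with a least-squares straight-line fit in log-log coordinates. The following sketch shows one way to do it; the function name `fit_power_law` is hypothetical, and the fitting procedure actually used for the figures may differ in its details (range of $N$, weighting).

```python
import numpy as np


def fit_power_law(N, D):
    """Fit D ~ A * N**b by linear regression of log D on log N; returns (A, b)."""
    N = np.asarray(N, dtype=float)
    D = np.asarray(D, dtype=float)
    slope, intercept = np.polyfit(np.log(N), np.log(D), 1)
    return float(np.exp(intercept)), float(slope)
```

Because the fit is done on logarithms, all values of $D$ must be strictly positive, and points at small $N$ that lie outside the power-law regime should be excluded before fitting.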

VI. CONCLUSION
We here analyzed the convergence to the asymptotic probability density distribution $\rho_s$ of a succession of distributions $\rho_{s_N}$ for a conveniently scaled sum of $N$ samples obtained from iterations of a deterministic map. Previous analytical studies had established that a modified version of the central limit theorem (CLT) exists for these cases. Yet, as far as we know, the convergence to the limit had not been characterized. Here, we studied several archetypal examples that expose a variety of ways in which the limiting distributions are approached.
Our characterization was based on the behavior of the Kullback-Leibler divergence (KLD) $D_G$ between $\rho_s$ and $\rho_{s_N}$, in that specific order. With this choice, the KLD equals the number of extra bits required to encode a sample from $\rho_s$ if the code has been optimized for $\rho_{s_N}$. The CLT for sums of random samples with finite variance predicts a KLD $D_{\rm random}$ that decreases as $N^{-2}$ if each sample is drawn from a distribution that is symmetric around its mean value, and as $N^{-1}$ if it is not. This is a bold statement, since an infinitesimal modification may suffice to turn a symmetric distribution into an asymmetric one and thereby change the entire asymptotic behavior of the KLD. The change, however, would only become relevant at increasingly larger values of $N$ as the asymmetry tended to disappear. We are not aware of an analogous theoretical prediction for the case of correlated samples, but the results presented have revealed similar behaviors: $D_G$ decreased as $N^{-2}$ for the Bernoulli map, for which the sums are distributed symmetrically around their mean value, and as $N^{-1}$ for the logistic map, where the distributions are asymmetric.
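The remark about a vanishing asymmetry can be made semi-quantitative with a rough estimate. The sketch below assumes the lowest-order expansion of Appendix B for independent samples. The standardized cumulants $\lambda_j$ and the crossover value $N^*$ are notation introduced here for illustration only, and terms of order $\lambda_3^2 \lambda_4 / N^2$ and $\lambda_3^4 / N^2$ are neglected, which is reasonable only when the asymmetry is small.

```latex
% Notation assumed here: \lambda_j = \kappa_j / \sigma_x^j (standardized cumulants).
% Leading asymmetric (j = 3) and symmetric (j = 4) contributions, using
% \int He_3^2(z)\,\phi(z)\,dz = 6 and \int He_4^2(z)\,\phi(z)\,dz = 24
% for the Hermite polynomials He_j and the standard Gaussian \phi:
D_{\rm random}(N) \approx \frac{1}{\ln 2}
  \left[ \frac{\lambda_3^2}{12\,N} + \frac{\lambda_4^2}{48\,N^2} \right],
\qquad
N^* \sim \frac{\lambda_4^2}{4\,\lambda_3^2}.
```

For $N \ll N^*$ the decay looks like $N^{-2}$, and the $N^{-1}$ regime only sets in for $N \gg N^*$, which grows without bound as $\lambda_3 \to 0$.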
In both the Bernoulli and the logistic map, the rates at which $\rho_{s_N}$ approached the asymptotic distribution increased with the strength of mixing. Moreover, for the intermittent logistic map, where mixing is virtually absent during pseudo-periodic intervals, convergence to the asymptotic distribution was particularly slow. Therefore, even though all the explored examples were equally deterministic, their behavior differed considerably. Details of the chaotic dynamics are crucial to the behavior of $D_G$ for large $N$.
The convergence of $\rho_{s_N}$ in the Bernoulli map could be divided into two stages: one in which the distribution acquired an approximately Gaussian profile, and a subsequent one in which the variance was adjusted to approach its asymptotic value. Remarkably, in the second stage and for sufficiently large $N$, the divergence $D_G(N)$ was equal to $D_{\rm random}(\alpha N)$ with $\alpha \approx 3.74$, implying that each sample of the deterministic map was as informative about the asymptotic distribution as $\alpha$ random samples. This equivalence could not be established in the other explored examples, since in all of them $D_{\rm random}$ and $D_G$ decreased with $N$ following different power laws. Hence, no rescaling procedure could transform one into the other.
The last example involved variables with divergent variance. In this case, the derivation of Appendix B is no longer valid, and no theoretical formulation describing how $D_{\rm random}$ tends to zero is known to us. Our numerical explorations revealed a behavior proportional to $N^{-1}$ for $D_{\rm random}$, even for samples drawn from distributions that are symmetric around their mean values. The deterministic counterpart $D_G$ exhibited an even slower evolution, at a rate that is also slower than the one observed in the cases of finite variance.
In conclusion, in all the examples explored here, the asymptotic trend of the KLD behaved as a power law. Different deterministic maps yielded different exponents, displaying a variety of behaviors. The factors that influenced the exponents were (a) the strength of mixing in the chaotic map, (b) the tendency of the system to evolve near periodic orbits, and (c) the tails of the distribution of individual variables. We stress that establishing a quantitative connection between the rate of mixing, on the one hand, and the rate of KLD decay, on the other, remains an interesting open problem for future work. Remarkably, except for the logistic map in the intermittent regime, all the maps explored here are related to each other by simple nonlinear transformations. Despite these deterministic functional relations, their nonlinear nature determines differences in the statistical behavior of the sums of samples drawn from each map, with a large impact on the convergence towards their asymptotic distributions.
The integral in this equation may look somewhat intimidating but, using the change of variables $x = \sin^2 \xi$, it takes the much simpler form $2 \int_0^{\pi/2} \sin^2 \xi \, \sin^2(K\xi) \, d\xi$, with $K = 2^k$. Now, it can be easily shown (for instance, by induction over $K$) that $\int_0^{\pi/2} \sin^2 \xi \, \sin^2(K\xi) \, d\xi = \pi/8$ for all integers $K > 1$. From this result, it follows that $c_k = 0$ for all $k \ge 1$. Remarkably, therefore, successive iterations of the logistic map in the fully chaotic regime are linearly uncorrelated with each other, although their functional correlation is obviously very large. Thus, the variance of the sums $s_N$ takes the same value for all $N$.
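Both statements above are easy to check numerically. The following is a minimal sketch, not the procedure used for the figures: it evaluates the integral with a midpoint rule for a few values of $K$, and then estimates $c_k$ from an ensemble of initial conditions drawn as $x = \sin^2 \xi$ with $\xi$ uniform in $(0, \pi/2)$, which is assumed here to reproduce the invariant density of the fully chaotic logistic map. The sample sizes and variable names are illustrative choices.

```python
import numpy as np

def logistic(x):
    """Logistic map in the fully chaotic regime (lambda = 4)."""
    return 4.0 * x * (1.0 - x)

# Midpoint-rule evaluation of the integral of sin^2(xi) sin^2(K xi) over (0, pi/2).
n_grid = 200_000
xi = (np.arange(n_grid) + 0.5) * (np.pi / 2) / n_grid
integrals = {
    K: float(np.mean(np.sin(xi) ** 2 * np.sin(K * xi) ** 2) * np.pi / 2)
    for K in (2, 4, 8, 16)
}

# Ensemble estimate of the linear correlation c_k between x(t) and x(t + k).
rng = np.random.default_rng(1)
x0 = np.sin(rng.uniform(0.0, np.pi / 2, size=1_000_000)) ** 2
mean_x, var_x = float(x0.mean()), float(x0.var())

c = {}
x = x0.copy()
for k in range(1, 6):
    x = logistic(x)                                   # x now holds the k-th iterate of x0
    c[k] = float(np.mean((x0 - mean_x) * (x - x.mean())))

print("integrals:", integrals, " pi/8 =", np.pi / 8)
print("variance of x:", var_x)
print("c_k:", c)
```

Using an ensemble of short orbits instead of one long orbit avoids relying on the long-time behavior of a single floating-point trajectory.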
Appendix B: Kullback-Leibler divergence for the distribution of random-sampling sums
According to the Berry-Esséen theorem [22,23], the difference between the distribution for the sum of $N$ independent random variables and the Gaussian predicted by the standard central limit theorem decays as $1/\sqrt{N}$ or faster as $N$ grows. We show in this Appendix that, when the distribution of the individual random variables $\rho_x(x)$ admits a cumulant expansion (i.e., when the logarithm of its Fourier transform can be expanded in powers of its variable), that difference decays as $1/\sqrt{N}$ if $\rho_x(x)$ is asymmetric with respect to the mean value $\bar{x}$, and as $1/N$ or faster if it is symmetric. This implies that the Kullback-Leibler divergence $D_{\rm random}$ defined in the main text decays as $1/N$ in the former case, and as $1/N^2$ or faster in the latter. In the distributions considered in the main text, the symmetry with respect to the mean value is verified for the Bernoulli map and for the logistic map in the fully chaotic regime.
Without loss of generality, we assume that the mean value over the distribution $\rho_x(x)$ of the individual random variables is zero. For the sums $s = \sum_{i=1}^{N} x_i / \sqrt{N}$, where $x_i$ are independent samples of $\rho_x$, the distribution $\rho_s(s)$ results from the $N$-th order self-convolution of $\rho_x(x)$. This operation is most conveniently expressed in terms of the characteristic functions (Fourier transforms) $G_x(k)$ and $G_s(k)$ of, respectively, $\rho_x(x)$ and $\rho_s(s)$. Namely,
$$G_s(k) = \left[ G_x\!\left( \frac{k}{\sqrt{N}} \right) \right]^N = \exp\left[ N \sum_{j=1}^{\infty} \frac{\kappa_j}{j!} \left( \frac{ik}{\sqrt{N}} \right)^j \right], \qquad \text{(B1)}$$
where the sum on the right-hand side is the power expansion of $\ln G_x$ around $k = 0$, evaluated at $k/\sqrt{N}$, which we assume to exist, and $\kappa_j$ is the $j$-th order cumulant of $\rho_x(x)$ [24]. We recall that $\kappa_1 = \bar{x} = 0$, $\kappa_2 = \sigma_x^2$ is the variance of $x$ over $\rho_x$, and $\kappa_3 = \overline{(x - \bar{x})^3}$.
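Using this information, the antitransform of $G_s(k)$ can be written as
$$\rho_s(s) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp\left( -iks - \frac{\sigma_x^2 k^2}{2} \right) \exp\left[ \sum_{j=3}^{\infty} \frac{\kappa_j (ik)^j}{j! \, N^{j/2-1}} \right] dk \qquad \text{(B2)}$$
$$= G_{\sigma_x}(s) + \Delta\rho_s(s), \qquad \Delta\rho_s(s) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp\left( -iks - \frac{\sigma_x^2 k^2}{2} \right) \left[ \sum_{j=3}^{\infty} \frac{\kappa_j (ik)^j}{j! \, N^{j/2-1}} + \cdots \right] dk, \qquad \text{(B3)}$$
where the ellipsis stands for higher-order terms in the power expansion of the second exponential in the integrand of Equation (B2). Note that $\Delta\rho_s(s)$ is nothing but the difference between $\rho_s(s)$ and the asymptotic Gaussian distribution $G_{\sigma_x}(s)$ and that, due to normalization, it must verify $\int \Delta\rho_s(s) \, ds = 0$. If $\rho_x(x)$ is asymmetric around zero, the third-order cumulant $\kappa_3$ is different from zero, and the leading term in powers of $N$ in $\Delta\rho_s(s)$ is given by the summand with $j = 3$ in Equation (B3), which implies $\Delta\rho_s \sim 1/\sqrt{N}$. If, on the other hand, $\rho_x(x)$ is symmetric around zero, we have $\kappa_3 = 0$. In this case, the sum effectively starts at $j = 4$, and $\Delta\rho_s$ decreases as $1/N$ or faster. Of course, if $\rho_x$ is Gaussian from the start, all the higher-order cumulants vanish, and $\Delta\rho_s(s)$ is trivially equal to 0 for all $N$.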
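For the Kullback-Leibler divergence we have, from Equation (8),
$$D_{\rm random} = \int G_{\sigma_x}(s) \log_2 \frac{G_{\sigma_x}(s)}{G_{\sigma_x}(s) + \Delta\rho_s(s)} \, ds \approx \frac{1}{2 \ln 2} \int \frac{[\Delta\rho_s(s)]^2}{G_{\sigma_x}(s)} \, ds, \qquad \text{(B4)}$$
where the first-order term vanishes because $\int \Delta\rho_s(s) \, ds = 0$, and the approximation holds if $\Delta\rho_s(s)$ is sufficiently small. If the distribution $\rho_x(x)$ is asymmetric around zero, since $\Delta\rho_s(s)$ decays asymptotically as $1/\sqrt{N}$, $D_{\rm random}$ decays as $1/N$. If it is symmetric, $D_{\rm random}$ decays as $1/N^2$ or faster, depending on whether the subsequent cumulants vanish or not.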
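The two decay laws derived in this Appendix can be checked on a case where the distribution of the sums is known in closed form. The sketch below is an illustration under assumptions that are not part of the main text: each sample is drawn from a two-component Gaussian mixture $p\,\mathcal{N}(\mu_1, 1) + (1-p)\,\mathcal{N}(\mu_2, 1)$, chosen because the density of the scaled sum is then a binomial mixture of unit-variance Gaussians, and because its infinite support keeps the divergence finite in the order used here. The function name, parameter values, and integration grid are hypothetical choices.

```python
import math
import numpy as np

def kld_to_gaussian_bits(n, p, mu1, mu2, half_width=10.0, n_grid=8001):
    """KLD (bits) from the limiting Gaussian to the density of s = (sum - n*mean)/sqrt(n)."""
    mean = p * mu1 + (1 - p) * mu2
    sigma = math.sqrt(1.0 + p * (1 - p) * (mu1 - mu2) ** 2)   # std of a single sample
    s = np.linspace(-half_width * sigma, half_width * sigma, n_grid)
    ds = s[1] - s[0]
    # Binomial weights: probability that k of the n samples come from the first component.
    k = np.arange(n + 1)
    log_binom = np.array([math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                          for i in k])
    w = np.exp(log_binom + k * math.log(p) + (n - k) * math.log(1 - p))
    centers = (k * mu1 + (n - k) * mu2 - n * mean) / math.sqrt(n)
    rho = np.zeros_like(s)
    for wk, ck in zip(w, centers):                            # each component has unit variance
        rho += wk * np.exp(-0.5 * (s - ck) ** 2) / math.sqrt(2 * math.pi)
    gauss = np.exp(-0.5 * (s / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    return float(np.sum(gauss * np.log2(gauss / rho)) * ds)

ns = np.array([16, 32, 64, 128])
d_sym = [kld_to_gaussian_bits(n, 0.5, 1.0, -1.0) for n in ns]   # symmetric: kappa_3 = 0
d_asym = [kld_to_gaussian_bits(n, 0.3, 2.0, 0.0) for n in ns]   # asymmetric: kappa_3 != 0

# Slopes of log D versus log N (least-squares fit).
slope_sym = float(np.polyfit(np.log(ns), np.log(d_sym), 1)[0])
slope_asym = float(np.polyfit(np.log(ns), np.log(d_asym), 1)[0])
print(f"symmetric slope: {slope_sym:.3f}, asymmetric slope: {slope_asym:.3f}")
```

The fitted slopes are expected to lie close to $-2$ and $-1$, respectively, with small deviations due to higher-order terms at the moderate values of $N$ used here.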

FIG. 1. Left, dark line: Numerical results for the distribution $\rho_{s_N}$ of the sums $s_N$ defined in Equation (3), for the Bernoulli map (10) with $m = 2$, for three small values of $N$. The light curve is the Gaussian expected for $N \to \infty$, and the dashed curve is a Gaussian with the same variance as predicted for $\rho_{s_N}$. Right, dark curve: The distribution $\rho^{\rm random}_{s_N}$ for sums of $N$ values of $x$ randomly sampled from $\rho_x(x)$ is a normalized version of the Irwin-Hall distribution [15, 16], which can be obtained analytically by successive self-convolution of $\rho_x$. The light curve is the Gaussian expected for $N \to \infty$. Note the different scales on the left and right columns.

FIG. 2. Main panel: The Kullback-Leibler divergences $D_G$, $D_{G_N}$, and $D_{\rm random}$, defined in the text, as functions of the number of terms in the sums $s_N$ of Equation (3), for the Bernoulli map (10) with $m = 2$. The straight lines in this log-log plot have slope $-2$. The inset shows, as dots, numerical results for the variance $\sigma^2_{s_N}$ over the distribution $\rho_{s_N}(s_N)$. The dashed line joins the analytical values predicted from Equation (12).
FIG. 3. The Kullback-Leibler divergence $D_G$ for the Bernoulli map (10) with various values of $m$, and $D_{\rm random}$ (which is the same for all $m$). The straight lines have slope $-2$.

FIG. 4. As in Figure 1, for the logistic map of Equation (16) in the regime of full chaos, $\lambda = 4$. The distributions $\rho^{\rm random}_{s_N}$, dark lines on the right column, have now been obtained numerically. Note the different scales in different panels.

FIG. 5. The Kullback-Leibler divergence $D_G$ for the logistic map of Equation (16) in the regime of full chaos, $\lambda = 4$, and $D_{\rm random}$, as functions of $N$. In this case, $D_{G_N}$ coincides with $D_G$. The full and dashed straight lines have slopes $-2$ and $-1$, respectively.

FIG. 6. Left: 900 successive iterations of the logistic map, Equation (16), in the intermittent regime, $\lambda = 3.828$. The arrows at $t = 300$ and $500$ point at "turbulent" and period-3 "laminar" intervals, respectively. Right: The correlation $c_k = \overline{[x(t) - \bar{x}][x(t+k) - \bar{x}]}$ as a function of $k$ in the same intermittent regime, calculated numerically from sequences of $10^7$ iterations of $x(t)$. Symbols are connected by lines to facilitate visualization.

FIG. 7. The Kullback-Leibler divergences $D_G$ for the logistic map (16) in the intermittent regime, $\lambda = 3.828$, and $D_{\rm random}$, as functions of $N$. For the former, triangles correspond to values of $N$ which are multiples of 3. The slope of the dashed straight line is $-1$. Inset: Numerical results for the variance $\sigma^2_{s_N}$ of the sums $s_N$, as a function of $N$. The arrow to the right indicates the variance obtained for large $N$. Symbols are connected by dashed lines to facilitate visualization.

FIG. 8. As in Figure 4, for the sums of Equation (18) with $x(t)$ obtained from map (20). Note that the scales are the same in all plots.

FIG. 9. The Kullback-Leibler divergences $D_C$ and $D_{\rm random}$ for the distributions of the sums of Equation (18), with the values of $x$ obtained from the map (20) and the distribution of Equation (22), respectively. The full straight line has slope $-1$, and the dashed line, with slope $-0.68$, is a linear fit of $D_C$ for $N \ge 2$.