Distinguishing Log-Concavity from Heavy Tails

Well-behaved densities are typically log-convex with heavy tails and log-concave with light ones. We discuss a benchmark for distinguishing between the two cases, based on the observation that large values of a sum X_1 + X_2 occur as the result of a single big jump with heavy tails, whereas X_1 and X_2 are of equal order of magnitude in the light-tailed case. The method is based on the ratio |X_1 − X_2|/(X_1 + X_2), for which sharp asymptotic results are presented, as well as a visual tool for distinguishing between the two cases. The study supplements modern non-parametric density estimation methods, where log-concavity plays a main role, as well as heavy-tailed diagnostics such as the mean excess plot.


Introduction
General interest in non-parametric methods has increased over the last few years. One example is density estimation under shape constraints instead of requiring membership of a parametric family. Here, a particularly robust alternative to parametric tests is provided by searching for the best-fitting log-concave density. Another example is the mean excess plot, which aims at distinguishing light and heavy tails.
Throughout the paper, we consider i.i.d. random variables X, X_1, X_2, . . . > 0 with common distribution F having density f and tail F(x) = P(X > x). Then, X is (right) heavy-tailed if E[e^{sX}] = ∞ for all s > 0 and light-tailed otherwise. The density f is log-concave if f(x) = e^{φ(x)}, where φ is a concave function. If φ is convex, then f is log-convex. This paper aims to illustrate that light-tailed asymptotic behaviour is associated with log-concave densities. Likewise, log-convexity seems to be connected to heavy-tailed behaviour. One can use this connection to assess potential heavy-tailedness by searching for patterns that are typically present among distributions with log-concave or log-convex densities.
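As a quick numerical illustration of these definitions (our addition, not part of the original analysis), the sign of the second derivative of φ(x) = log f(x) can be checked by finite differences. The sketch below uses the Weibull density f(x) = αx^{α−1}e^{−x^α}, which is log-concave for α = 2 and log-convex for α = 1/2:

```python
import math

def log_weibull_density(x, alpha):
    # phi(x) = log f(x) for the Weibull density f(x) = alpha * x**(alpha-1) * exp(-x**alpha)
    return math.log(alpha) + (alpha - 1) * math.log(x) - x ** alpha

def second_diff(phi, x, h=1e-4):
    # central finite-difference approximation of phi''(x)
    return (phi(x + h) - 2 * phi(x) + phi(x - h)) / h ** 2

# alpha = 2: light tail, phi'' < 0 (log-concave)
# alpha = 1/2: heavy tail, phi'' > 0 (log-convex)
for alpha, concave in [(2.0, True), (0.5, False)]:
    for x in (0.5, 1.0, 2.0, 4.0):
        curv = second_diff(lambda t: log_weibull_density(t, alpha), x)
        assert (curv < 0) == concave
```

The same check applies to any density with a closed-form log-density; the grid of test points here is an arbitrary choice.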
Log-concavity is a widely studied topic in its own right [1,2]. There also exists substantial literature regarding its connections to probability theory and statistics [3,4]. Several papers concentrate on the statistical estimation of density functions assuming log-concavity [5,6]. This is because log-concavity provides desirable statistical properties for estimators. For instance, maximum likelihood estimation becomes applicable and the estimate is unique. The topic is discussed in detail in the beginning of [7]. Unfortunately, much less emphasis seems to be put on verification of the log-concavity property itself. Specifically, it remains relatively little studied whether it is plausible that a given sample was generated by a log-concave distribution. See, for example, [8,9].
A distribution with a log-concave density f is necessarily light-tailed. In contrast, f is log-convex in the tail in the standard examples of heavy tails, such as regular variation, the lognormal distribution and the Weibull case F(x) = e^{−x^α} with α < 1. An important class of heavy-tailed distributions are the subexponential ones, defined by P(X_1 + X_2 > d) ∼ 2F(d). The intuition underlying this definition is the principle of a single big jump: X_1 + X_2 is large if one of X_1, X_2 is large, whereas the other remains typical. This then motivates the ratio R = |X_1 − X_2|/(X_1 + X_2) being close to 1. In contrast, the folklore is that X_1 and X_2 contribute equally to X_1 + X_2 with light tails. We are not aware of general rigorous formulations of this principle, but it is easily verified in explicit examples like a gamma or normal F (see further below) and, for a large number of summands rather than just 2, it is supported by conditioned limit theorems (see e.g., [10] (VI.5)). However, it was recently shown in [4] that these properties of R hold in greater generality and that asymptotic properties of the corresponding conditioned random variable provide a sharp borderline between log-convexity and log-concavity.
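The single-big-jump intuition is easy to probe by simulation. The following sketch (our illustration; the thresholds, parameters and sample sizes are arbitrary choices) compares the conditional share of the larger summand in a Pareto pair against an exponential pair with the same mean:

```python
import random

def max_share_given_large_sum(sampler, d, n=200_000, seed=1):
    # Monte Carlo estimate of E[ max(X1, X2)/(X1 + X2) | X1 + X2 > d ]
    rng = random.Random(seed)
    total, hits = 0.0, 0
    for _ in range(n):
        x, y = sampler(rng), sampler(rng)
        if x + y > d:
            total += max(x, y) / (x + y)
            hits += 1
    return total / hits

pareto = lambda rng: rng.paretovariate(1.5)    # heavy-tailed, mean 3
expo = lambda rng: rng.expovariate(1.0 / 3.0)  # light-tailed, mean 3

heavy = max_share_given_large_sum(pareto, d=100)  # single big jump: share near 1
light = max_share_given_large_sum(expo, d=30)     # summands share the load
assert heavy > 0.9
assert light < 0.85
assert heavy > light
```

For the exponential pair, the split of a large sum is uniform, so the larger summand carries about 3/4 of it; for the Pareto pair, it carries almost everything.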
In this paper, we provide a wider perspective, in terms both of sharper and more general limit results and of usefulness for visual statistical data exploration. To this end, we propose a feature-based non-parametric test. It can be used as a visual aid in the identification of log-concavity or heavy-tailed behaviour. It complements earlier ways to detect signs of heavy-tailedness, such as the mean excess plot [11]. Further tests based on probabilistic features have previously been utilised in, e.g., [12-14].

Background
A property holds eventually if there exists a number y_0 such that the property holds on the set [y_0, ∞). Standard asymptotic notation is used for limiting statements. These conventions, and basic properties of regularly varying functions with parameter α, denoted RV(α), can be recalled from, e.g., [15].
We note that the principle of a single big jump relates to the fact that joint distributions of independent random variables concentrate probability mass in different regions. For example, a distribution with tail function F(x) = e^{−x^α} with α ∈ (0, 1) satisfies a relation of this type as x → ∞. We refer to [16-19] for related work in this direction. It is shown in Lemma 1.2 of [4] that log-concavity or log-convexity of the density is closely related to the occurrence of the principle of a single big jump. A further observation in this direction is the following lemma. It states that contour lines of joint densities of independent variables behave differently for log-concave and log-convex densities, and thereby leads naturally to different concentrations of probability mass of joint densities (recall that the contour line corresponding to a value p ∈ R of a joint density f : R² → R is the set of points in the plane {(x, y) ∈ R² : f(x, y) = p}).
Lemma 1. Suppose X_1 and X_2 are i.i.d. unbounded non-negative random variables. Assume further that they have a common twice differentiable density function f of the form f(x) = e^{−h(x)}, where h is a strictly increasing function.
If f is log-convex (log-concave), then, for any fixed p ∈ (0, e^{−h(0)}), there exists a convex (concave) function ψ_p defining a contour line of f_{X_1,X_2} corresponding to p, such that f_{X_1,X_2}(x, ψ_p(x)) = p.

Lemma 1 implies that log-convex and log-concave densities cause maximal points of joint densities to accumulate in different regions of the plane. Log-convex densities tend to put probability mass near the axes, while log-concave densities have a tendency to concentrate mass near the graph of the identity function. The exponential density is the limiting case where all contour lines are straight lines. More generally, for f_α(x) = C_α e^{−x^α}, where C_α > 0 is a normalising constant, the contour lines are circles for α = 2, straight lines for α = 1, and parabolas for α = 1/2.
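The contour behaviour described by Lemma 1 can be checked directly for f_α: with h(x) = x^α, the contour h(x) + h(y) = −log p solves to ψ_p(x) = (−log p − x^α)^{1/α}, and a midpoint test reveals its convexity or concavity. A small sketch (our illustration; the normalising constant is dropped, so p lives on the unnormalised scale):

```python
import math

def psi(x, p, alpha):
    # contour y = psi_p(x) of the (unnormalised) joint density
    # exp(-(x**alpha + y**alpha)) = p, i.e. x**alpha + y**alpha = -log(p)
    c = -math.log(p)
    return (c - x ** alpha) ** (1.0 / alpha)

def midpoint_convex(p, alpha, a, b):
    # midpoint convexity test on [a, b]: psi((a+b)/2) <= (psi(a) + psi(b))/2
    return psi((a + b) / 2, p, alpha) <= (psi(a, p, alpha) + psi(b, p, alpha)) / 2

p = 0.05
assert midpoint_convex(p, 0.5, 0.5, 2.0)      # log-convex density: convex contour
assert not midpoint_convex(p, 2.0, 0.1, 1.0)  # log-concave density: concave contour (circle arc)
```

The test intervals are arbitrary points inside the contour's domain; any such choice gives the same sign of curvature.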

Theoretical Results
The emphasis of the paper is on the mathematical formulation of the connection between log-convexity and the principle of a single big jump. However, some additional theoretical results are provided concerning convergence rates of the conditional ratio defined in Equation (3). These rates, or estimates for the rates, are obtained in some standard distribution classes. Their proofs are mainly based on sharp asymptotics of subexponential distributions obtained in [20-22]. Recall that some main classes of such distributions are RV(α), meaning the regularly varying ones, where F(x) = L(x)/x^α with α > 0 and L(·) slowly varying; Weibull tails with F(x) = e^{−x^α} for some α ∈ (0, 1); and lognormal tails, which are close to the case γ = 2 of F(x) = e^{−log^γ x} for x ≥ 1 and some γ > 1; we refer to this class in the following as lognormal-type tails.

Define the function

g(d) = E[ |X_1 − X_2| / (X_1 + X_2) | X_1 + X_2 > d ].    (3)

It can be viewed as a generalisation of the function f_Z^d considered in [4] and has the same interpretation as in the case with densities: if both X_1 and X_2 contribute equally to the sum X_1 + X_2, then g should eventually obtain values close to 0; similarly, if only one of the variables tends to be of the same magnitude as the whole sum, then g is close to 1 for large d. Note also that g is scale-independent in the sense that g_{aX}(d) = g_X(d/a) for all a > 0. Due to this property, two or more samples can be standardised to have, say, equal means in order to obtain graphs on the same scale.
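A Monte Carlo sketch of g illustrates both properties (our illustration, using the exponential distribution: there, given the sum, the ratio is distributed as |2U − 1| with U uniform, so g(d) = 1/2 for every d). Reusing the same random draws for X and aX makes g_{aX}(ad) and g_X(d) agree exactly:

```python
import random

def g_mc(d, scale=1.0, n=400_000, seed=7):
    # Monte Carlo estimate of g(d) = E[ |X1 - X2|/(X1 + X2) | X1 + X2 > d ]
    # for X = scale * Exp(1)
    rng = random.Random(seed)
    total, hits = 0.0, 0
    for _ in range(n):
        x, y = scale * rng.expovariate(1.0), scale * rng.expovariate(1.0)
        if x + y > d:
            total += abs(x - y) / (x + y)
            hits += 1
    return total / hits

g1 = g_mc(d=3.0, scale=1.0)
g2 = g_mc(d=6.0, scale=2.0)  # same draws scaled by a = 2: g_{aX}(a*d) = g_X(d)
assert abs(g1 - g2) < 1e-12
# the exponential sits exactly on the boundary: g(d) = 1/2 for every d
assert abs(g1 - 0.5) < 0.02
```

The exact agreement of g1 and g2 is a common-random-numbers artefact of the fixed seed; with independent samples, the two estimates would agree only up to Monte Carlo error.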
In Proposition 1, sharp asymptotic forms of g are exhibited in some classes of distributions.
Proposition 1. The following convergence rates hold for g defined in Equation (3).

3. Let X be of lognormal type. Then, the rate in Equation (5) holds.

Remark 1. In the case of the Weibull and lognormal distributions, the implication is that g(d) converges to 1 at a faster rate than the associated hazard rates tend to zero. In addition, inspection of the proof shows a matching lim inf bound. This implies that the actual convergence rate cannot be substantially larger than in the regularly varying case, where the leading term is explicitly identified.
The light-tailed case appears to be more difficult to study than the heavy-tailed one. The difficulty arises mainly from the lack of good asymptotic approximations for probabilities of the form P(X_1 + X_2 > d) when P(X_1 > d) decays much faster than e^{−d}. Interestingly, the full asymptotic form of g can be recovered in the special case of the normal distribution if we allow X to take negative values.

Proposition 2. Suppose that X is normally distributed with E[X] = 0 and Var[X] = 1/√2. Then, g(d) = (c/d)(1 + o(1)) for an explicit constant c > 0.

The following theorem can be used to assess whether a sample comes from a source with a log-concave density. It can be seen as a natural continuation of, as well as a generalisation of, [4].
Theorem 1. Assume the density f is twice differentiable and eventually log-concave. Then,

lim sup_{d→∞} g(d) ≤ 1/2.    (7)

Similarly, if f is eventually log-convex, then

lim inf_{d→∞} g(d) ≥ 1/2.    (8)

Statistical Application: Visual Test
Suppose (X_1, Y_1), (X_2, Y_2), . . . is a sequence of i.i.d. vectors whose components are also i.i.d. One can formulate the empirical counterpart of Quantity (3) by setting

ĝ(d, n) = ( Σ_{k=1}^n (|X_k − Y_k|/(X_k + Y_k)) 1(X_k + Y_k > d) ) / ( Σ_{k=1}^n 1(X_k + Y_k > d) ),    (9)

where 1(A) is the indicator function of the event A.
Remark 2. Equation (9) requires as input a two-dimensional sequence of random variables. One can form such a sequence from a real-valued i.i.d. source Z_1, Z_2, . . ., Z_N using any pairing of the Z_i. Obvious examples are to take X_k = Z_{2k−1}, Y_k = Z_{2k}, or to take the set {(X_k, Y_k)} as all pairings of the Z_k or as a randomly sampled subset of these N(N − 1)/2 pairings. If the data are truly i.i.d., this should not have any effect on the outcome.
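A direct implementation of ĝ(d, n) together with the consecutive pairing of Remark 2 might look as follows (our sketch; the Pareto sample and the thresholds are arbitrary illustrative choices):

```python
import random

def g_hat(pairs, d):
    # empirical counterpart of g: average of |X - Y|/(X + Y) over pairs with X + Y > d
    vals = [abs(x - y) / (x + y) for x, y in pairs if x + y > d]
    return sum(vals) / len(vals) if vals else float("nan")

def pair_consecutive(z):
    # the pairing X_k = Z_{2k-1}, Y_k = Z_{2k} from Remark 2
    return list(zip(z[0::2], z[1::2]))

rng = random.Random(3)
z = [rng.paretovariate(1.5) for _ in range(100_000)]  # heavy-tailed i.i.d. source
pairs = pair_consecutive(z)
assert g_hat(pairs, d=50) > g_hat(pairs, d=2)  # ĝ drifts towards 1 as d grows
assert g_hat(pairs, d=50) > 0.75
```

Plotting g_hat over a grid of d values then yields the graphical test discussed in the next section.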

Examples and Applications
A graph of ĝ(d, n) as a function of d can be used to determine whether the data support a log-concave density, or rather light-tailed behaviour. According to Theorem 1, the graph should then stay below 1/2. Figures 1-4 illustrate such graphs using experimental data.
The test method is visual. A similar idea has been used at least in the classical mean excess plot, where one visually assesses whether the mean excess of the sample points is increasing in the threshold level, as is the case for heavy tails.
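For comparison, the empirical mean excess function underlying the classical plot is equally short to compute. In this sketch (our illustration; parameters and thresholds are arbitrary), the Pareto mean excess grows roughly linearly in the threshold, while the exponential one stays flat:

```python
import random

def mean_excess(sample, u):
    # empirical mean excess e(u): average of X - u over observations X > u
    exc = [x - u for x in sample if x > u]
    return sum(exc) / len(exc) if exc else float("nan")

rng = random.Random(11)
pareto = [rng.paretovariate(1.5) for _ in range(200_000)]  # e(u) grows linearly in u
expo = [rng.expovariate(1.0) for _ in range(200_000)]      # e(u) is constant (= 1)

assert mean_excess(pareto, 10) > 2 * mean_excess(pareto, 1)
assert abs(mean_excess(expo, 1) - 1.0) < 0.1
```

Plotting mean_excess over a grid of thresholds gives the usual mean excess plot, against which the ĝ(d, n) plot can be compared on the same data.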

Finer Diagnostics
The idea of plotting ĝ(d, n) as a function of d was introduced as a graphical test for distinguishing between heavy-tailed and log-concave light-tailed distributions. It seems reasonable to ask if the plot can be used for finer diagnostics, in particular to further specify the tail behaviour of F when F has been found to be heavy-tailed. Such an idea would be based on the rate of convergence of ĝ(d, n) to 1.
A first conclusion is that a sample size of R = 5000 is grossly insufficient for drawing conclusions about the way in which ĝ(d, n) approaches 1: random fluctuations take over long before a systematic trend is apparent. The sample size R = 5 · 10^6 is presumably unrealistic in most cases, but even for this, the picture is only clear in the RV case. Here, d(1 − ĝ(d, n)) seems to have a limit c, as it should, and the plot is in good agreement with the value 2.4 = 2αE[X]/(α + 1) of c predicted by Proposition 1.
Whether a limit exists in the lognormal or Weibull case is less clear. The results of Proposition 1 are less definite here but, actually, a heuristic argument suggests that the limit c should exist and equal 2E[X]. To this end, let X_(1) < X_(2) be the order statistics. According to subexponential theory (see, in particular, [25]), X_(1) and X_(2) are asymptotically independent given X_1 + X_2 > d, with X_(1) having asymptotic distribution F and X_(2) being of the form d + e(d)E, with e(d) and E as in the proof of Proposition 1. For large d, this gives the approximation in Quantity (10). For the Weibull, 2E[X] = 6.65, and the R = 5 · 10^6 part of Figure 5 is rather inconclusive concerning the conjecture. We did one more run with R = 5 · 10^7 over a range of parameters (the variance σ² of log X for the lognormal and β for the Weibull). All X_1, X_2 were normalised to have mean 1, so that the conjecture would assert convergence to c = 2. This is not seen in the results in Figure 6. Large values of σ² and small values of β could appear to give convergence, but not to 2. It should be noted that the heuristics give the correct result in the RV case. Namely, here we can take e(d) = d and P(E > x) = 1/(1 + x)^α. This easily gives E[1/(1 + E)] = α/(α + 1), so that Quantity (10) is approximately 1 − 2αE[X]/(d(α + 1)), as rigorously verified in Proposition 1.
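The RV(α) prediction d(1 − g(d)) → 2αE[X]/(α + 1) can be probed by simulation (our sketch; the threshold d, the sample size and the tolerance are arbitrary choices, and the finite-d bias is still visible at this range):

```python
import random

# Monte Carlo probe of the RV(alpha) rate d*(1 - g(d)) -> 2*alpha*E[X]/(alpha + 1)
alpha = 1.5
mean_x = alpha / (alpha - 1)                  # Pareto(alpha) with x_m = 1: E[X] = 3
predicted = 2 * alpha * mean_x / (alpha + 1)  # = 3.6

rng = random.Random(5)
d, total, hits = 100.0, 0.0, 0
for _ in range(2_000_000):
    x, y = rng.paretovariate(alpha), rng.paretovariate(alpha)
    if x + y > d:
        total += abs(x - y) / (x + y)
        hits += 1
observed = d * (1 - total / hits)
# finite-d bias and Monte Carlo noise are both present, hence the loose tolerance
assert abs(observed - predicted) / predicted < 0.25
```

Even with two million pairs, only a few thousand exceed d, which illustrates why moderate sample sizes are uninformative here.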
The overall conclusion is that the finer diagnostic value of the method is quite limited, and restricted to RV and sample sizes which may be unrealistically large in many contexts.

Proofs
Proof of Lemma 1. Suppose h is concave and p ∈ (0, 1). The contour line corresponding to the value p is formed by the set of points (x, y) that satisfy f_{X_1,X_2}(x, y) = p, or equivalently Equation (11). For any such pair (x, y), one can solve Equation (11) for y to obtain Expression (12). Firstly, h^{−1} is convex as the inverse of an increasing concave function. Secondly, the composition of an increasing convex function and a convex function remains convex. Thus, as a function of x, Expression (12) defines a convex function for x ∈ [0, h^{−1}(−log p − h(0))]. Thus, one can define ψ_p accordingly. If h is convex, the proof is analogous.
The following technical lemma is needed in the proof of Proposition 1. It applies to Pareto, Weibull and lognormal-type distributions. Indeed, Condition (13) follows from Proposition 1.2 (ii) of [21], and the further needed assumptions are easily verified apart from strong subexponentiality, which is known to hold in the mentioned examples.

Lemma 2. Suppose X_1 and X_2 are non-negative i.i.d. variables with a common density f, where the hazard rate r satisfies Condition (13). Then, Equation (14) holds. If, in addition, the stated assumptions hold, then Equation (15) follows.

Proof. Equality (13) implies subexponentiality of X_1. Writing the corresponding decomposition and observing that the numerator on the right-hand side is of order E[X]r(d)o(1)2F(d) proves Equation (14), since 2P(X_1 > d)/P(X_1 + X_2 > d) → 1 by subexponentiality. Equality (13) implies Equation (16). On the other hand, writing the complementary decomposition and noting that, by Equation (16), the leading term tending to zero must be E[X]r(d), Equation (15) holds.
Proof of Proposition 1. Suppose X is regularly varying with index α. In light of Lemma 2, we only need to establish Equation (17). The contribution to the l.h.s. of Equation (17) from the remaining region is asymptotically negligible for any A > 0 and ε > 0. Thus, it can be neglected. We are left with estimating Quantity (18). We bound this quantity from above and below, assuming A < 1/2. Firstly, consider the upper bound. Given X > x, X − x is approximately distributed as xE for large x, where P(E > z) = 1/(1 + z)^α. Hence, dominated convergence gives the required limit as z → ∞.

We get an upper estimate with error terms η_1(d) and η_2(d) of order o(F(d)/d). The latter error comes from the Taylor expansion of the function F(d − y) around the point y = 0. The fact that f is assumed eventually decreasing guarantees control of f(x) in the relevant range. Secondly, for the lower bound, we argue as before and obtain error terms η_1 and η_2 of order o(F(d)/d).
Repeating the argument with arbitrarily small A > 0 and combining the upper and lower estimates allows one to deduce the claim. Suppose then that X is Weibull distributed. Now the assumptions of Lemma 2 are satisfied with r(d) = αd^{α−1}. Since F(d/2)² = O(e^{−cd^α}) for some c > 1 depending on α, we only need to find the order of Quantity (18). In fact, proceeding similarly as in the regularly varying case, it can be seen that Quantity (18) is of the required order. It is known that (X_1 − z)/e(z) | X_1 > z, where e(z) = 1/(αz^{α−1}), converges in distribution to a standard exponential variable as z → ∞. Because e(z/2)/z = o(1), the corresponding limit holds for the relevant y (the interchange of expectation and convergence is justified by dominated convergence). In addition, the same error term can be used for any y. Thus, Quantity (19) can be rewritten accordingly. Now, using the definition of A_2 from Lemma 2 together with Equality (15), we get Equation (20), which follows from the fact that, conditionally on A_2, all probability mass concentrates near small values of X_1/d. Gathering the estimates and using Equation (14) of Lemma 2 yields the claim. This shows Equation (4), and Equation (5) can be obtained using similar calculations with e(z) = z/log^{γ−1} z.
Proof of Proposition 2. Note first that X_1 + X_2 and X_1 − X_2 are independent in the normal case. Denote Z = X_1 + X_2, so that Z ∼ N(0, √2). It follows in the same way as in the proof of Proposition 1 that the r.h.s. of Equation (21) is (c/d)(1 + o(1)). This proves the claim.
Proof of Theorem 1. Suppose f is log-concave and twice differentiable. By symmetry, one only needs to show Inequality (23). It is known from the proof of Proposition 2.1 of [4] that f_Z^z is increasing on [0, 1/2]. Since f_Z^z is non-negative and integrates to one over the interval [0, 1], there exists a number a ∈ (0, 1/2) such that f_Z^z(s) ≤ 1 when s ≤ a and f_Z^z(s) > 1 when s > a. Therefore, ∫_0^1 (f_Z^z(s) − 1) ds = 0, which proves Inequality (23). More generally, if f is log-concave and twice differentiable on the set [x_0, ∞), then f_Z^z is increasing on the set [x_0/z, 1/2]. The difference from the presented calculation vanishes in the limit d → ∞, and thus Inequality (7) holds.
If f is eventually log-convex, the proof is analogous and Inequality (8) holds.

Figure 3. Graph of ĝ(d, n) from a classical set of Danish fire insurance data, obtainable for instance as the data set 'danish' in the R package [23]. The data are scaled to have mean 1. The sample is traditionally used to illustrate how heavy-tailed data behave. A similar set of data was previously used in [24]. The graph supports the usual finding that the data set is heavy-tailed.

Figure 4. Graphs of multiple versions of ĝ(d, n) based on a data set obtained from Hansjörg Albrecher (private communication) and related to occurrences of floods in a particular area. The data are scaled to have mean 1. The sample size is n = 39. Bivariate vectors (X_1, Y_1), . . ., (X_19, Y_19) were sampled several times randomly without replacement from the original data. The overall appearance of the paths points to the data being heavy- rather than light-tailed.
Let e(d) = F(d)/f(d) be the mean excess function of Z (inverse hazard rate). It is then standard that e(d) is of order 1/d and that (Z − d)/e(d) | Z > d converges in distribution to a standard exponential. Writing g(d) = E[|X_1 − X_2|/(X_1 + X_2) | Z > d] and using the independence of X_1 − X_2 and Z then gives the result.