2. The statistical meaning of kurtosis
Karl Pearson [5] defined a distribution's degree of kurtosis as:

$\beta_2 = \frac{E[(X-\mu)^4]}{\sigma^4} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^4}{\sigma^4}$

where X denotes the sequence of inputs, μ represents the mean of X, σ denotes the standard deviation of X, and n the length of the input sequence X. In other words, β2 is the expected value of the standardized scores raised to the fourth power. β2 is often referred to as “Pearson's kurtosis”, and β2 − 3 (often symbolized with γ2, that is, γ2 = β2 − 3) as “kurtosis excess” or “Fisher's kurtosis”.
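As a concrete illustration (a minimal Python sketch, not part of the original derivation; the Gaussian sample is only a test case), β2 can be computed as the mean fourth power of the standardized scores:

```python
import numpy as np

def pearson_kurtosis(x):
    """Pearson's beta_2: the mean of standardized scores raised to the
    fourth power (fourth central moment over the squared variance)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()      # standardize with the population SD
    return np.mean(z ** 4)

rng = np.random.default_rng(0)
sample = rng.normal(size=100_000)
beta2 = pearson_kurtosis(sample)
print(beta2, beta2 - 3)               # ~3 and ~0 for Gaussian data
```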
An unbiased estimator [6] for γ2 is:

$g_2 = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$

where $\bar{x}$ is the sample mean and s the sample standard deviation. For large sample sizes (n > 1000), g2 may be distributed approximately normally, with a standard error of approximately $\sqrt{24/n}$.
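A sketch of this estimator, assuming the standard bias-corrected formula (algebraically the same correction SciPy applies with bias=False), together with the large-sample standard error:

```python
import numpy as np
from scipy.stats import kurtosis

def g2_unbiased(x):
    """Bias-corrected sample excess kurtosis (the usual g2 estimator)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    m2 = np.mean((x - x.mean()) ** 2)          # second central moment
    m4 = np.mean((x - x.mean()) ** 4)          # fourth central moment
    g2 = m4 / m2 ** 2 - 3                      # biased excess kurtosis
    return ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))

rng = np.random.default_rng(1)
x = rng.normal(size=5_000)
print(g2_unbiased(x))
print(kurtosis(x, fisher=True, bias=False))    # SciPy's bias-corrected value
print(np.sqrt(24 / x.size))                    # large-sample standard error
```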
Pearson [7] introduced kurtosis as a measure of how flat the top of a symmetric distribution is when compared to a normal distribution of the same variance. He referred to more flat-topped distributions (γ2 < 0) as “platykurtic”, less flat-topped distributions (γ2 > 0) as “leptokurtic”, and equally flat-topped distributions (γ2 ≈ 0) as “mesokurtic”. Kurtosis is actually more influenced by scores in the tails of the distribution than by scores in the center of a distribution [9]. Accordingly, it is often appropriate to describe a leptokurtic distribution as “fat in the tails” and a platykurtic distribution as “thin in the tails”: platykurtic curves have shorter tails than the normal curve of error, and leptokurtic curves longer tails.
Moors [10] demonstrated that β2 can be written as:

$\beta_2 = \mathrm{Var}(Z^2) + 1$

where Z = (X − μ)/σ. Accordingly, it may be best to treat kurtosis as the extent to which scores are dispersed away from the shoulders of a distribution, where the shoulders are the points where Z2 = 1, that is, Z = ±1. Balanda and MacGillivray [11] wrote that “it is best to define kurtosis vaguely as the location- and scale-free movement of probability mass from the shoulders of a distribution into its centre and tails”. If one starts with a normal distribution and moves scores from the shoulders into the center and the tails, keeping variance constant, kurtosis will increase. The distribution will likely appear more peaked in the center and fatter in the tails, like a Laplace distribution (γ2 = 3).
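This movement of mass is easy to demonstrate numerically. A short sketch (assuming NumPy and SciPy; the unit-variance Laplace scale is our choice) compares equal-variance Gaussian and Laplace samples:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(2)
n = 200_000
gauss = rng.normal(0.0, 1.0, n)
laplace = rng.laplace(0.0, 1.0 / np.sqrt(2.0), n)   # scale set so variance = 1
print(gauss.var(), laplace.var())                   # both close to 1
print(kurtosis(gauss), kurtosis(laplace))           # ~0 versus ~3
```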
Let us denote by p(x) the probability density function (pdf) of a random process x(t) and by E(·) the expectation. The kurtosis k[x(t)] is:

$k[x(t)] = \frac{E\{[x(t)-E(x(t))]^4\}}{\left(E\{[x(t)-E(x(t))]^2\}\right)^2} - 3$

Assume the mean is zero; then k[p(x)] can be written as:

$k[p(x)] = \frac{\int_{-\infty}^{+\infty} x^4\,p(x)\,dx}{\left(\int_{-\infty}^{+\infty} x^2\,p(x)\,dx\right)^2} - 3$
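This integral form can be evaluated directly by numerical quadrature; a short sketch (the two test densities are our own choices):

```python
import numpy as np
from scipy.integrate import quad

def kurtosis_of_pdf(pdf):
    """k[p(x)] for a zero-mean pdf, by numerical integration."""
    m2, _ = quad(lambda x: x**2 * pdf(x), -np.inf, np.inf)
    m4, _ = quad(lambda x: x**4 * pdf(x), -np.inf, np.inf)
    return m4 / m2**2 - 3

laplace_pdf = lambda x: 0.5 * np.exp(-abs(x))                  # zero-mean Laplace
gauss_pdf = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal
print(kurtosis_of_pdf(laplace_pdf))   # 3.0
print(kurtosis_of_pdf(gauss_pdf))     # ~0.0
```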
Clearly, the sign of the kurtosis, denoted ks(x), is equal to the sign of the fourth-order cumulant $\mathrm{Cum}_4(x) = E(x^4) - 3[E(x^2)]^2$ for a zero-mean process. Some properties can easily be derived.
$\mathrm{Cum}_4(ax + b) = a^4\,\mathrm{Cum}_4(x)$, so ks(x) is invariant under any linear transformation: ks(ax + b) = ks(x).
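A quick numerical check of this invariance, using SciPy's k-statistic as a sample estimate of the fourth cumulant (the constants a and b are arbitrary choices):

```python
import numpy as np
from scipy.stats import kstat

rng = np.random.default_rng(3)
x = rng.laplace(size=50_000)          # non-Gaussian, so Cum4 is nonzero
a, b = -2.5, 7.0
# Fourth k-statistic of the transformed data over that of the original:
print(kstat(a * x + b, 4) / kstat(x, 4))   # close to a**4 = 39.0625
```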
Let p(x) = pe(x) + po(x), where pe(x) is even and po(x) is odd. It is easy to prove that ks(x) depends only on pe(x), and that pe(x) can itself be considered a pdf.
Therefore, in the following, the study may be restricted to a zero-mean process x(t) whose pdf p(x) is even and has unit variance, $\sigma_x^2 = 1$.
It is well known that the kurtosis of a Gaussian distribution is equal to zero. Intuitively, the sign of the kurtosis seems related to a comparison between p(x) and the Gaussian distribution, by considering the asymptotic properties of the distribution and the following definition:
A pdf p(x) is said to be over-Gaussian (respectively, sub-Gaussian) if ∀x ≥ x0, p(x) > g(x) (respectively, p(x) < g(x)), where g(x) is the normalized Gaussian pdf. In many examples, it seems that ks(x) is positive for over-Gaussian signals and negative for sub-Gaussian signals.
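For instance, the unit-variance Laplace density is over-Gaussian in this sense; the sketch below (the search bracket [1, 10] is our own choice) locates its last crossing with g(x) and checks that it stays above g(x) beyond that point:

```python
import numpy as np
from scipy.optimize import brentq

# Unit-variance Laplace density versus the standard normal density g(x)
p = lambda x: np.exp(-np.sqrt(2) * abs(x)) / np.sqrt(2)
g = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

x0 = brentq(lambda x: p(x) - g(x), 1.0, 10.0)   # last crossing of the two pdfs
print(x0)                                        # about 2.34
print(all(p(x) > g(x) for x in np.linspace(x0 + 0.01, 20.0, 500)))  # True
```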
Let us consider the case where, for x > 0, the equation p(x) = g(x) has only one solution ρ > 0. It is known that the fourth-order cumulant of a Gaussian distribution is zero. As a consequence, we can write:

$\mathrm{Cum}_4(x) = \int_{-\infty}^{+\infty} x^4\,[p(x) - g(x)]\,dx$

In addition, since p(x) and g(x) are both even, we may restrict the study to the sign of

$\gamma = \int_{0}^{+\infty} x^4\,[p(x) - g(x)]\,dx$

and we can prove that ks(x) = sign(γ).
Let us consider that the pdf p(x) is over-Gaussian. Then the sign of p(x) − g(x) remains constant on each of the intervals [0, ρ] and [ρ, +∞). Using the second mean value theorem, γ can be rewritten as:

$\gamma = \xi_1^4 \int_{0}^{\rho} [p(x) - g(x)]\,dx + \xi_2^4 \int_{\rho}^{+\infty} [p(x) - g(x)]\,dx$

where 0 < ξ1 < ρ < ξ2.
Using the fact that p(x) and g(x) are both pdfs, we can deduce that

$\int_{0}^{\rho} [p(x) - g(x)]\,dx = -\int_{\rho}^{+\infty} [p(x) - g(x)]\,dx$
Taking into account that p(x) is over-Gaussian, we deduce

$\int_{\rho}^{+\infty} [p(x) - g(x)]\,dx > 0$

Combining the two equations above, we obtain:

$\gamma = \left(\xi_2^4 - \xi_1^4\right) \int_{\rho}^{+\infty} [p(x) - g(x)]\,dx > 0$
Finally, if p(x) is an over-Gaussian pdf, then its kurtosis is positive. By the same reasoning and under the same condition, we can claim that a sub-Gaussian pdf has a negative kurtosis.
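Both conclusions are easy to verify by simulation; a sketch under the same unit-variance normalization (the particular distributions are illustrative):

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(4)
n = 300_000
over = rng.laplace(0.0, 1.0 / np.sqrt(2.0), n)      # over-Gaussian, unit variance
sub = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), n)   # sub-Gaussian, unit variance
print(kurtosis(over))   # positive, ~ +3
print(kurtosis(sub))    # negative, ~ -1.2
```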
There are some basic results about kurtosis given by Richard [12]. These results are helpful for understanding the statistical meaning of kurtosis. Here are some of them.

For standard scores

$Z = (X - \mu)/\sigma$

the kurtosis of X is:

$\beta_2 = E(Z^4)$
Assume the two points of the distribution are at 0 and 1, with p being the frequency at 1 and q the frequency at 0. Then

$\beta_2 = \frac{q p^4 + p q^4}{(pq)^2} = \frac{p^3 + q^3}{pq}$

As p + q = 1, we have $p^3 + q^3 = 1 - 3pq$, so we have

$\beta_2 = \frac{1}{pq} - 3$
For a symmetric three-point distribution with density p at each of the points ±1 and density 1 − 2p at the point 0, the same computation gives

$\beta_2 = \frac{1}{2p}$
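Both closed forms are easy to verify numerically; the sketch below assumes the symmetric three-point layout stated above (mass p at each of ±1), which should be read as our reconstruction:

```python
import numpy as np

def beta2(values, probs):
    """Pearson kurtosis of a discrete distribution."""
    values = np.asarray(values, float)
    probs = np.asarray(probs, float)
    mu = np.sum(probs * values)
    var = np.sum(probs * (values - mu) ** 2)
    return np.sum(probs * (values - mu) ** 4) / var ** 2

p = 0.2
q = 1.0 - p
print(beta2([0, 1], [q, p]), 1.0 / (p * q) - 3.0)           # two-point: 3.25, 3.25
print(beta2([-1, 0, 1], [p, 1 - 2 * p, p]), 1.0 / (2 * p))  # three-point: 2.5, 2.5
print(beta2([0, 1], [0.5, 0.5]) - 3.0)                      # p = 0.5: gamma_2 = -2
```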
Starting again with a normal distribution, moving scores from the tails and the center to the shoulders will decrease kurtosis. A uniform distribution certainly has a flat top, with γ2 = −1.2, but γ2 can reach its minimum value of −2 only when two score values are equally probable and all other score values have probability zero (a rectangular U distribution, that is, a binomial distribution with n = 1, p = 0.5). One might object that the rectangular U distribution has all of its scores in its tails, but closer inspection will reveal that it has no tails, and that all of its scores are in its shoulders, exactly one standard deviation from its mean.
Kurtosis is usually of interest only when dealing with approximately symmetric distributions. Skewed distributions are always leptokurtic [15]. Among the several alternative measures of kurtosis that have been proposed (none of which has often been employed) is one which adjusts the measurement of kurtosis to remove the effect of skewness [16].
There is much confusion about how kurtosis is related to the shape of distributions. Many people have asserted that kurtosis is a measure of the peakedness of distributions, which is not strictly true.
It is easy to confuse low kurtosis with high variance, but distributions with identical kurtosis can differ in variance, and distributions with identical variances can differ in kurtosis. Here are some simple distributions that may help explain what kurtosis is, in part: a measure of tail heaviness relative to the total variance in the distribution.
A has the least kurtosis (−2 is the smallest possible value of kurtosis) and G the most. In the maximally platykurtic distribution A, which initially appears to have all its scores in its tails, no score is more than one σ away from the mean; that is, it has no tails! In the leptokurtic distribution G, which seems to have only a few scores in its tails, one must remember that those scores (5 and 15) are much farther away from the mean (3.3 σ) than are the 5's and 15's in distribution A. In fact, in G nine percent of the scores are more than three σ from the mean, far more than one would expect in a mesokurtic distribution (such as a normal distribution), so G does indeed have fat tails.
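These figures are easy to reproduce; in the sketch below the frequency tables for A and G are reconstructed from the description above (twenty scores at each of 5 and 15 for A; one score at each of 5 and 15 with twenty at 10 for G) and should be read as an assumption:

```python
import numpy as np
from scipy.stats import kurtosis

A = np.repeat([5, 15], [20, 20])       # no tails: every score is 1 sigma out
G = np.repeat([5, 10, 15], [1, 20, 1]) # fat tails: 5 and 15 are ~3.3 sigma out
for name, d in (("A", A), ("G", G)):
    z = (d - d.mean()) / d.std()
    # excess kurtosis, and the fraction of scores beyond 3 sigma
    print(name, kurtosis(d), np.mean(np.abs(z) > 3))
```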
Kurtosis is the degree of peakedness of a distribution, defined as a normalized form of the fourth central moment of a distribution. The kurtosis of a number of common distributions is shown below.
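For instance, SciPy's built-in distributions report their excess kurtosis directly (the selection of distributions here is illustrative):

```python
from scipy import stats

# Excess kurtosis (gamma_2 = beta_2 - 3) of a few common distributions
for name, dist in [("normal", stats.norm), ("uniform", stats.uniform),
                   ("logistic", stats.logistic), ("Laplace", stats.laplace),
                   ("exponential", stats.expon)]:
    print(f"{name:12s} gamma_2 = {float(dist.stats(moments='k')):+.1f}")
```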
The following example makes it quite clear that higher kurtosis implies that there are more extreme observations (or that the extreme observations are more extreme). It is also evident that higher kurtosis implies that the distribution is more “single-peaked” (this would be even more evident if the sum of the frequencies were constant).
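Consider three discrete distributions with identical variance (the particular values and probabilities are our own construction, chosen to make the effect plain):

```python
import numpy as np

def gamma2(values, probs):
    """Excess kurtosis of a discrete distribution."""
    values = np.asarray(values, float)
    probs = np.asarray(probs, float)
    mu = np.sum(probs * values)
    var = np.sum(probs * (values - mu) ** 2)
    return np.sum(probs * (values - mu) ** 4) / var ** 2 - 3

# All three have variance 1; as the outer values move outward (and the
# spike at 0 grows), the kurtosis rises.
print(gamma2([-1, 0, 1], [0.5, 0.0, 0.5]))          # -2.0
print(gamma2([-2, 0, 2], [0.125, 0.75, 0.125]))     # +1.0
print(gamma2([-4, 0, 4], [1/32, 30/32, 1/32]))      # +13.0
```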
We may define mesokurtic as “having β2 equal to 3”, while platykurtic curves have β2 < 3 and leptokurtic curves β2 > 3. The important property which follows from this is that platykurtic curves have shorter “tails” than the normal curve of error and leptokurtic curves longer “tails”.
From the discussion above, the statistical meaning of kurtosis can be stated: kurtosis is a measure of the degree to which the data contain outliers, and of the data's peakedness.