1. Introduction
This paper addresses an important aspect of heavy-tail modeling. Heavy tails, though often underestimated, occur frequently in real life. The story of tails is inevitably connected to the notion of outliers. Outliers, anomalies, oddities, discordant observations, contamination, deviants, exceptions, or aberrations represent “unusual events that occur more often than seldom” [1]. They appear frequently, for various and mostly unknown reasons. Historically, statisticians were the first to encounter, identify, and analyze them. Real data observations enable the modeling of fundamental stochastic processes. Probability density functions play an important role, as the assessment of their properties makes it possible to extrapolate our knowledge.
Heavy tails can be successfully used to describe such phenomena. They may mean that a given distribution follows a power law. In other cases, they mean that it is scale-free. They may refer to stable or subexponential distributions, or they may indicate an infinite variance. Formally, a given distribution is heavy-tailed if its tail is heavier than that of any exponential distribution [2]. Detailed explanations may be found in [3].
Stochastic data analysis allows us to identify and model tails. However, this is only one side of the coin. Abnormal observations and tails are closely interconnected with other data interpretations and various phenomena, such as persistence [4], fractality [5,6], the Hurst exponent [7], and fractionality [8]. Stable functions, and especially the α-stable distribution, allow for addressing that issue efficiently through the stability exponent α.
This work addresses the heavy-tailed properties of the α-stable distribution and their detection. These features are closely connected to fractionality, although they go by a different name and are assessed from a statistical perspective.
The problem of how to fit an unknown PDF to experimental data is well known. It consists of two elements: the choice of the function itself and the estimation of its coefficients. These two elements can be addressed separately or together. One may utilize different approaches [9], such as fitting a distribution function to the histogram, the method of moments, quantile Q-Q plots, or L-moment ratio diagrams [10,11]. Each method has its scope of applicability and exhibits certain properties.
Apart from probabilistic distributions and their factors, statistics deliver alternative formulations of moments. Recent research by [9,12] shows that L-moments, which were proposed by [13], can play the role of the PDF fitting mechanisms [14]. They are successfully used not only in extreme data analysis [15,16] but also in economics [17] or control engineering [18].
L-moments exhibit many advantages, as they are analogous to the conventional moment estimates: shift, scale, skewness, and kurtosis. They introduce a new characterization of the PDF shape and help to estimate its factors [19]. Similarly to the method of moments, this is achieved by fitting the empirical L-moments to the exact theoretical values. We may evaluate them theoretically for many distributions, but the moments must exist. As the theoretical moments of the α-stable distribution may not exist, depending on its parametrization, this approach cannot be applied directly [20].
This paper presents an original solution to the open problem of fitting a stable probability density function (PDF) to experimental data for distributions with non-existent moments. This task is solved using L-moment ratio diagrams (LMRD). The most challenging aspect of this work is the use of a Monte-Carlo approach, which makes it possible to apply a method of moments despite the infinite moments of the α-stable distribution. It is shown that, by repeated estimation, we may identify polynomial curves in the LMRD that relate to the factors of the α-stable distribution, although the estimation quality depends on the stability exponent and the skewness.
The description starts with Section 2, which describes the methodology and algorithms used. It allows for the further formulation of the α-stable estimation experiments in Section 3. Section 4 concludes the paper and shows areas for further research.
3. Simulation Analysis
It is said that L-moments exhibit good approximation quality in the case of small sample sizes [13,43]. Thus, one could take a short time series, evaluate its L-moments, draw the L-moment diagram, and find an appropriate distribution [14]. This approach is a successor to the method of moments (MoM), which was considered the standard before the development of maximum likelihood (ML) approaches. Generally, the MoM approach is known to be less accurate than ML. Furthermore, the information about the distribution shape carried by third- and higher-order moments is rather difficult to assess, particularly for small sample sizes, as the numerical values of sample moments can differ significantly from those of the original PDF [44].
This approach is questionable in the case of the α-stable distribution. In the case of , the first moment is infinite, but this does not exclude the distribution from further research. Generally, one can evaluate L-moments for the α-stable distribution whenever the first moment exists. However, the analysis might be seriously biased and difficult to interpret. This work tries to address this issue using the LMRD representation.
Let us start with a simple case of and PDF in form of . Once , the function simplifies to the normal one . Thus, the third and fourth L-moments equal to and and data in LMRD is reflected by a single point .
Let us first assess how a sample size affects the LMRD estimation. The experiment is as follows: We take normal distribution
and generate
samples. Next, we divide the set into subsets of length
. We assess these sample sizes. We obtain
datasets, respectively. Finally, we calculate the L-skewness and L-kurtosis for each dataset.
Figure 2 presents the obtained results. The smaller the sample size, the more biased the estimation of the L-moments appears to be. This aligns with expectations, as reduced sample sizes limit the robustness of statistical measures. Figure 3 extends this analysis by showing the histograms of L-skewness for various sample sizes, highlighting how the distribution stabilizes as the sample size increases. These figures underscore the need for larger sample sizes to achieve reliable Monte-Carlo estimations, particularly for accurate representations in the LMRD.
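The sample-size experiment described above can be sketched as follows. This is a hedged illustration only: the total number of samples, the subset lengths, the seed, and the function names `l_ratios` and `scatter_by_sample_size` are assumptions made for this example and are not the values used in the paper. The scatter is summarized here by the standard deviation of the estimated ratios across subsets.

```python
import numpy as np

def l_ratios(x):
    """L-skewness and L-kurtosis of one sample (unbiased PWM estimators)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(1, n + 1, dtype=float)
    b0 = x.mean()
    b1 = np.sum((i - 1) * x) / (n * (n - 1))
    b2 = np.sum((i - 1) * (i - 2) * x) / (n * (n - 1) * (n - 2))
    b3 = np.sum((i - 1) * (i - 2) * (i - 3) * x) / (n * (n - 1) * (n - 2) * (n - 3))
    l2 = 2 * b1 - b0
    t3 = (6 * b2 - 6 * b1 + b0) / l2
    t4 = (20 * b3 - 30 * b2 + 12 * b1 - b0) / l2
    return t3, t4

def scatter_by_sample_size(total=100_000, sizes=(50, 500, 5000), seed=0):
    """Split one normal sample into subsets and measure the LMRD scatter.

    All default values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    data = rng.standard_normal(total)
    spread = {}
    for n in sizes:
        k = total // n                       # number of subsets of length n
        subsets = data[: k * n].reshape(k, n)
        ratios = np.array([l_ratios(s) for s in subsets])
        spread[n] = ratios.std(axis=0)       # std of (L-skewness, L-kurtosis)
    return spread

if __name__ == "__main__":
    for n, s in scatter_by_sample_size().items():
        print(f"n={n}: std(L-skewness)={s[0]:.4f}, std(L-kurtosis)={s[1]:.4f}")
```

For the normal distribution, the theoretical reference point is L-skewness 0 and L-kurtosis of approximately 0.1226, so the scatter can be read as the spread of the estimated points around that location.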
The second experiment investigates how large the sample size should be for a good estimation. We repeat the estimation
times for different sample sizes, from
to
in steps of 25 samples. For each case, we evaluate the histogram and the quartiles: Q1, Q2 (median), and Q3. Histograms for
are shown in
Figure 3, while
Figure 4 shows plots for
. This confirms that the sample size must not be too low.
A summary of evaluated estimation metrics, i.e., the quantiles, is shown in
Figure 5. An observation of the resulting diagrams leads to the rational decision that the sample size
allows for the reliable estimation of L-moments. That number is used in the subsequent experiments.
Figure 6 shows LMRD diagrams plotted for normal distribution. Each circle in these plots reflects one population of size
. We increase the population number and observe properties of LMRD and the estimation. We measure the robust center of scattered points generated for each population as a two-dimensional geometric median (GeoMed)
, which is defined as a value of argument
, to which the sum of all Euclidean distances for
is minimized
where
L is the number of points (populations). We evaluate the GeoMed with Weiszfeld’s algorithm, proposed in [45].
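The geometric median described above can be sketched with a basic Weiszfeld iteration. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the starting point (the centroid), the stopping tolerance, the iteration limit, the clamping of very small distances, and the function name `geometric_median` are all choices made for this example.

```python
import numpy as np

def geometric_median(points, tol=1e-10, max_iter=1000):
    """Approximate the geometric median of a point cloud (Weiszfeld iteration).

    The geometric median minimizes the sum of Euclidean distances
    to all points. Assumed safeguards: distances are clamped away from
    zero so that an iterate landing on a data point does not divide by zero.
    """
    pts = np.asarray(points, dtype=float)
    y = pts.mean(axis=0)                      # start from the centroid
    for _ in range(max_iter):
        d = np.linalg.norm(pts - y, axis=1)   # distances to the current iterate
        w = 1.0 / np.maximum(d, 1e-15)        # inverse-distance weights
        y_new = (w[:, None] * pts).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:   # stop when the update is tiny
            return y_new
        y = y_new
    return y

if __name__ == "__main__":
    cloud = [(0, 0), (1, 0), (0, 1), (1, 1), (100, 100)]
    print("geometric median:", geometric_median(cloud))
    print("arithmetic mean: ", np.mean(cloud, axis=0))
```

In the usage example, the single distant point pulls the arithmetic mean far away from the cluster, while the geometric median stays inside the unit square, which is why a robust center is attractive for scattered LMRD points.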
Once the center is known, we measure the distance from the GeoMed point to the point representing the ideal normal PDF. This distance measures the estimation efficiency. The more populations we have, the better the estimation. The relationship between the number of populations and the scatter of the points is shown in
Figure 7. Population number
enables a reliable PDF fitting.
Using the above data, i.e., the sample size
and the number of populations
, we may extend the analysis towards skewed independent distribution described by PDFs
. Therefore, we set
and
, and we draw resulting L-moment ratio diagrams.
Figure 8 shows charts for
and
. The diagrams are identical, and the skewed normal PDF
is always reflected in LMRD(
) diagrams by a single point
.
We continue with the general
α-stable PDF, i.e., for
. The LMRD-based estimation procedure relies on the above assumptions about the number of samples in each population and the number of populations. Three LMRD(
) diagrams are initially presented. Each of them is prepared for a different set of
α-stable PDF coefficients. The difficulty in interpreting the results lies in the lack of theoretical target values. Reference values, called “limiting”, are therefore estimated using the Monte-Carlo approach with a very large sample size of
and a high number of populations equal to
.
Figure 9 presents the estimation for the right-skewed data, while Figure 10 presents the left-skewed case.
Figure 11 shows symmetrical SαS variants. The one with
and
denotes Cauchy PDF. As
diminishes
, the estimation error increases. Estimations tend to decrease values of
and
. Therefore, the approach using previously estimated sample and population sizes is highly biased as we recede from independent realization of
.
These results show that the sample size found adequate for the normal distribution is too small here. The required size depends strongly on the stability exponent
value. The estimation performance significantly depends on data tailedness, and it decreases for heavy-tailed data. It is interesting to note that the Cauchy case shown in
Figure 11 shows the estimated Monte-Carlo values of L-skewness and L-kurtosis:
and
. Generally, it is expected that L-kurtosis for Cauchy distribution should converge to some value. However, this value is a theoretical expectation limit achieved for an infinite sample size.
It is interesting to see how the Monte-Carlo L-moment estimation converges for the Cauchy PDF, which is a special case of the α-stable distribution (Figure 12). We repeat consecutive estimations, assuming a sample size of n = 10,000 and changing the number of populations
. The target values of the L-moments are evaluated using a robust location estimator with the logistic M-function, as in [46]. In conclusion, we might use the following estimated L-moments for the Cauchy distribution:
and
. L-kurtosis differs significantly from TL-kurtosis given by [
47], i.e.,
.
The respective plot for the highest population number l = 100,000 is given in Figure 13. The obtained points can be approximated by upper and lower bounds. The estimation is given by the analogous Equation (13), with its coefficients shown in Table 3.
The obtained boundary curves are highly similar to the skewness-kurtosis chart for the truncated Cauchy distribution [48]. The results are somewhat disappointing, as a proper and converging estimation requires an extremely large number of observations, both in sample size and in the number of populations. We also notice that the heavier the tail is, the more points are required to evaluate a proper GeoMed Monte-Carlo estimator of the L-moments. This is hardly achievable with real-life datasets.
Nonetheless, the same experiment is repeated for the α-stable distribution. First, we generate a large number, l = 10,000, of populations of size . We use an α-stable random number generator and generate data for . The stability factor changes with a decrement and with a decrement . For each point, we find a GeoMed center.
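The Monte-Carlo procedure above can be sketched for the symmetric case only. This is a hedged illustration under several assumptions that are not taken from the paper: the generator is the Chambers-Mallows-Stuck method restricted to zero skewness, the grid of stability exponents, the sample size, and the number of populations are small illustrative values, and the coordinate-wise median is used as a simpler stand-in for the GeoMed center. The function names are hypothetical.

```python
import numpy as np

def sas_rvs(alpha, size, rng):
    """Symmetric alpha-stable variates (zero skewness), Chambers-Mallows-Stuck."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    if alpha == 1.0:
        return np.tan(u)                      # Cauchy special case
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

def l_ratios(x):
    """L-skewness and L-kurtosis of one sample (unbiased PWM estimators)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(1, n + 1, dtype=float)
    b0 = x.mean()
    b1 = np.sum((i - 1) * x) / (n * (n - 1))
    b2 = np.sum((i - 1) * (i - 2) * x) / (n * (n - 1) * (n - 2))
    b3 = np.sum((i - 1) * (i - 2) * (i - 3) * x) / (n * (n - 1) * (n - 2) * (n - 3))
    l2 = 2 * b1 - b0
    return ((6 * b2 - 6 * b1 + b0) / l2,
            (20 * b3 - 30 * b2 + 12 * b1 - b0) / l2)

def lmrd_centers(alphas=(2.0, 1.5, 1.0), n=2000, populations=200, seed=0):
    """Center and spread of LMRD points for each stability exponent.

    Assumption: the coordinate-wise median replaces the geometric median.
    """
    rng = np.random.default_rng(seed)
    result = {}
    for a in alphas:
        pts = np.array([l_ratios(sas_rvs(a, n, rng)) for _ in range(populations)])
        result[a] = (np.median(pts, axis=0), pts.std(axis=0))
    return result

if __name__ == "__main__":
    for a, (center, spread) in lmrd_centers().items():
        print(f"alpha={a}: center={center}, spread={spread}")
```

With a stability exponent of 2 the generator reduces to a normal distribution, so the center should fall close to the normal reference point, while lower exponents are expected to show a visibly larger scatter.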
Figure 14 presents a clear L-moment ratio diagram with the GeoMed centers evaluated in this way, shown against the characteristic points of other known distributions, as given in Table 1 and Table 2. We see an exact match for the two special cases evaluated previously, i.e., the Gaussian and Cauchy functions.
These points arrange themselves into specific shapes, which are symmetrical in the vertical axis
.
Figure 15a connects these points with straight lines, while Figure 15b interpolates them with the polynomial functions of Equation (13). The coefficients of the polynomials for
are given in
Table 4 and for
in
Table 5.
Polynomials that reflect variations in L-skewness are symmetrical and have an order 2. The polynomials reflecting variations of stability exponent
are of order 3, and the ones with the same absolute value of
are symmetrical to each other about the OY axis. Monte-Carlo estimation allows us to obtain LMRD estimates for additional special cases, i.e., the Holtsmark and Landau distributions. The moments of the Landau PDF, such as the mean or variance, are undefined, while the Holtsmark distribution has a mean, but its variance is infinite and its higher moments are undefined.
Table 6 shows estimated LMRD(
) points.
The estimation error rises as the tails get heavier, i.e., as the stability exponent decreases.
Figure 16a compares Monte-Carlo simulations (denoted [MC]) with sample size
and number of populations
.
Figure 16b relates
to the estimation mean square error (MSE).
Error performance related to the stability index
is shown in
Figure 17. The deterioration of the estimation error and the need for larger sample sizes are evident. We observe a close-to-linear trend. The trend slope increases with the increasing absolute value of skewness
. This feature is highly disappointing, because it seriously limits practical implementations by demanding an extremely large number of observations to obtain a reliable estimation. Moreover, we have to keep in mind that, similarly to the Cauchy case, the true theoretical values might even be different, as the values converge with an increasing number of samples and infinity cannot be captured in simulations.