Modelling Heavy Tailed Phenomena Using a LogNormal Distribution Having a Numerically Verifiable Infinite Variance

Cococcioni, Marco; Fiorini, Francesco; Pagano, Michele

doi:10.3390/math11071758

by Marco Cococcioni *,†, Francesco Fiorini † and Michele Pagano †

Department of Information Engineering, L.go Lucio Lazzarino, 1-56122 Pisa, Italy

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Mathematics 2023, 11(7), 1758; https://doi.org/10.3390/math11071758

Submission received: 15 January 2023 / Revised: 1 April 2023 / Accepted: 4 April 2023 / Published: 6 April 2023

(This article belongs to the Section E2: Control Theory and Mechanics)

Abstract

One-sided heavy tailed distributions have been used in many engineering applications, ranging from teletraffic modelling to financial engineering. In practice, the most interesting heavy tailed distributions are those having a finite mean and a diverging variance. The LogNormal distribution is sometimes discarded from modelling heavy tailed phenomena because it has a finite variance, even when it seems the most appropriate one to fit the data. In this work we provide for the first time a LogNormal distribution having a finite mean and a variance which converges to a well-defined infinite value. This is possible thanks to the use of Non-Standard Analysis. In particular, we have been able to obtain a Non-Standard LogNormal distribution, for which it is possible to numerically and experimentally verify whether the expected mean and variance of a set of generated pseudo-random numbers agree with the theoretical ones. Moreover, such a check would be much more cumbersome (and sometimes even impossible) when considering heavy tailed distributions in the traditional framework of standard analysis.

Keywords: non-standard analysis; alpha-theory; algorithmic numbers; non-archimedean scientific computing; heavy tailed distributions

MSC: 03H10; 60E05; 65C20

1. Introduction

Robinson’s Non-Standard Analysis introduces a field $R^{*}$ (called the field of “hyperreals”), which includes infinitesimal and infinite quantities. In contrast, standard analysis is performed over the field of real numbers $R$, which is made of finite numbers only. Frequently, the extended set $\bar{R}$ is defined as the union of $R$ and the two new symbols $- \infty$ and $+ \infty$. Such symbols are usually added for topological reasons, to introduce the neighbourhoods of the “point at infinity”. Sometimes, with an abuse of notation, we say that an integral is $+ \infty$ as a shortcut for saying that it is not finite (“it diverges towards $+ \infty$”). This clarification is important for understanding the rest of the paper and the terminology used therein. In particular, it makes clear from the beginning that the infinite numbers present in Robinson’s hyperreal set $R^{*}$ must not be confused with $\infty$: the former are numbers, the latter is a mere symbol.

Standard one-sided heavy tailed distributions (i.e., distributions defined over the non-negative real numbers) have been applied in many engineering fields, ranging from teletraffic modelling [1,2], to the modelling of risk in the insurance business [3], to asset management in quantitative finance [4], etc. Typically used heavy tailed distributions are characterized by a diverging variance. As a matter of fact, the LogNormal distribution is sometimes discarded for modelling heavy tailed phenomena because of its finite variance (see Section 4 for more details). Therefore, it is important to review when the use of a LogNormal distribution is appropriate, and when a heavy tailed one should be used instead.

The use of a LogNormal distribution is justified when: (i) it seems an appropriate model, which fits the observed data, or (ii) when its use is due to some theoretical properties of the underlying phenomena (e.g.: it is a multiplicative one, instead of an additive one).

Similarly, resorting to heavy tailed distributions can be justified by: (i) the presence of outliers in the data, or (ii) purely theoretical considerations (e.g.: we know the phenomenon under study is self-similar, and hence better modelled by a heavy tailed distribution [5]).

From the observations above, it cannot be excluded that a given problem requires both: (i) a LogNormal distribution, and (ii) a heavy tailed one. One solution is to consider, as some authors do, the LogNormal distribution as a heavy tailed one, even if it has a finite variance. In this paper, instead, we show that it is possible to build LogNormal distributions having infinite variance. This would not be possible without Non-Standard Analysis, in which both infinitesimal and infinite quantities can be used; with it, we are able to provide a non-standard version of the LogNormal distribution having infinite variance.

Our new definition of heavy tailed distribution, which works both for standard and non-standard distributions, includes both those having diverging variances and those having infinite variances.

In Section 6 we will introduce a LogNormal distribution having finite mean and infinite variance by using Benci’s Alpha Theory and the related Bounded Algorithmic Numbers. In the same section we will show how to numerically verify that the variance is actually infinite by computing the sample variance of pseudo-random numbers generated according to it.

We hope that the LogNormal distribution with finite mean and infinite variance introduced in this work will find practical applications in the near future. At the very least, it could help in avoiding statements like the following one, quoted from [1]: “although a LogNormal distribution would have better fitted the data, we did not use it because it has not infinite variance”.

In addition, we are able to numerically verify that pseudo-random values generated according to a specific non-Archimedean LogNormal actually have the expected theoretical mean and variance (of course, only up to numerical errors, as usual when doing numerical computations).

Organization of the Work

In Section 2, a particular version of Non-Standard Analysis is provided (to be more precise, Benci’s Alpha Theory), together with the associated set of Euclidean numbers (a non-standard field which includes the real numbers). Then, in Section 3, we introduce the Bounded Algorithmic Numbers, i.e., a fixed-length representation for Euclidean numbers, which mimics IEEE floating point numbers. In Section 4, we provide a new definition of heavy tailed distributions. In Section 5, we focus on the Euclidean Gaussian distribution, i.e., a non-standard distribution defined over the Euclidean numbers. Although it is not one-sided, we have analyzed it anyway since it is preparatory to the construction of the Euclidean LogNormal distribution, investigated in Section 6. For the latter, we derived the values of its parameters that allow a finite mean and an infinite (yet numerically verifiable) variance. Section 7 concludes the paper.

2. Benci’s Alpha-Theory and the Euclidean Numbers

The purpose of this section is to give a further contribution to the applications of Non-Archimedean Mathematics via numerical computations. In Mathematics, non-Archimedean refers to an ordered field which does not satisfy the axiom of Archimedes (or, equivalently, that lacks the Archimedean property). The latter states [6]:

Axiom 1. Axiom of Archimedes.

Let F be any totally ordered field. Then,

\forall x, y \in F, 0 < x < y, \exists n \in N : y < n x .

The axiom was called Archimedean by Otto Stolz, since it appears as Axiom V of Archimedes’ On the Sphere and Cylinder.

Non-Archimedean Analysis is the branch of Mathematics which deals with fields lacking the Archimedean property; examples of non-Archimedean ordered fields are the Levi-Civita field [7], the hyperreal numbers [8], the surreal numbers [9], and the Dehn field [10]. A very important branch of non-Archimedean analysis is Non-Standard Analysis (NSA), originally proposed in 1961 by Robinson, who published the milestone book about it in 1966 [8]. By now, there are many books on NSA; we refer the interested reader to [11,12], which are closer to our approach.

Actually, our approach is based on Alpha-Theory (AT) [11,13] and the theory of numerosity [14]:

AT is an introduction to Non-Standard Analysis based on the notion of $α$-limit. The notion of $α$-limit is a version of the transfer principle that is easier for practitioners to use; roughly speaking, it can be stated as follows: “every relation between sequences is preserved by the limit”.
The theory of numerosity is strictly related to AT. It gives a meaning to some infinite numbers, such as the number $α$, which is the numerosity of the set of positive natural numbers, and it can be useful in some applications; see, e.g., [15].

Benci’s Alpha-Theory

The essence of any axiomatic approach to non-standard fields relies on two points: (i) the existence of at least one infinite (or infinitesimal) number and its algebraic properties; (ii) the transfer principle. Actually, they correspond to Axioms 3 and 4 shown later in the section, respectively.

Axiom 2.

There exists an ordered field $E \supseteq R$ whose numbers are called α-Euclidean numbers.

In the following we will refer to $E$ as the $α$-Euclidean line.

Before introducing Axiom 3, the following definition is needed. It introduces a partitioning of the set $E$ into three categories: infinite, finite, and infinitesimal numbers.

Definition 1.

Given $ξ \in E$, then

$ξ$ is infinite $⟺ \forall n \in N, | ξ | > n$
$ξ$ is finite $⟺ \exists n \in N, \frac{1}{n} < | ξ | < n$
$ξ$ is infinitesimal $⟺ \forall n \in N, | ξ | < \frac{1}{n} .$

Let $V (N)$ denote the superstructure on $N$, namely

V (N) = \bigcup_{i = 0}^{\infty} V_{i} (N),

where

V_{0} (N) = N,

and

V_{i + 1} (N) = V_{i} (N) \cup P (V_{i} (N))

(where $P (\cdot)$ indicates the power set of a given set). We set

U := \{X \in V (N) \mid X \text{ is countable}\} .

Axiom 3.

There exists a function $num : U \to E$ which satisfies the following properties:

if $A$ is finite, $num (A) = | A |$ (where $| \cdot |$ denotes the cardinality of a set)
$num (A) < num (B)$ if $A \subset B$
$num (A \cup B) = num (A) + num (B) - num (A \cap B)$
$num (A \times B) = num (A) \cdot num (B)$
$α = num (N) .$

Axiom 3 introduces the infinite number $α$ and states that it can be manipulated like any finite real number using the field rules such as commutativity, associativity, etc. As an example, the following relations hold true within AT:

0 < \frac{1}{α} = α^{- 1} < α^{0} = 1 < α^{1} = α < (α + 1),

α \cdot (α + 2) = α^{2} + 2 α,

\frac{- 10.0 α^{2} + 16.0 + 42.0 η^{2}}{5.0 α^{2} + 7.0} = - 2.0 + 6.0 η^{2},

where we have set

η : = \frac{1}{α} .
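The relations above can be checked mechanically. The following is a minimal sketch, assuming a toy representation in which a finite sum of monosemia is stored as a Python dictionary mapping each power of α to its real coefficient; the helper `mul` and the variable names are hypothetical and are not the authors' implementation.

```python
from collections import defaultdict

def mul(x, y):
    """Multiply two finite sums of monosemia stored as {power_of_alpha: coefficient}."""
    out = defaultdict(float)
    for px, cx in x.items():
        for py, cy in y.items():
            out[px + py] += cx * cy  # alpha^px * alpha^py = alpha^(px + py)
    return {p: c for p, c in out.items() if c != 0.0}

ALPHA = {1: 1.0}                              # alpha
numerator = {2: -10.0, 0: 16.0, -2: 42.0}     # -10 alpha^2 + 16 + 42 eta^2
denominator = {2: 5.0, 0: 7.0}                # 5 alpha^2 + 7
quotient = {0: -2.0, -2: 6.0}                 # -2 + 6 eta^2 (eta = alpha^-1)

# alpha * (alpha + 2) should give alpha^2 + 2 alpha
product = mul(ALPHA, {1: 1.0, 0: 2.0})

# Verify the claimed quotient by multiplying it back by the denominator.
check = mul(quotient, denominator)
print(product, check == numerator)
```

Multiplying back avoids implementing a division, which in general does not terminate for sums of monosemia.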

Hence $η$ is an infinitesimal, being the reciprocal of $α$. Now we will axiomatically introduce the notion of $α$-limit:

Axiom 4.

Every sequence $φ : N \to R$ has a unique α-limit, denoted by $\lim_{n ↑ α} φ (n)$, which satisfies the following properties:

1. if $ξ \in E$, then there exists a sequence $φ : N \to R$ such that $ξ = \lim_{n ↑ α} φ (n)$;
2. if $φ (n) = n$, then $\lim_{n ↑ α} φ (n) = α$;
3. if eventually $φ (n) \geq ψ (n)$ (namely $\exists n_{0} \in N$ such that $\forall n \geq n_{0}, φ (n) \geq ψ (n)$), then $\lim_{n ↑ α} φ (n) \geq \lim_{n ↑ α} ψ (n)$;
4. for every pair of sequences $φ, ψ$:

$\begin{aligned} \lim_{n ↑ α} φ (n) + \lim_{n ↑ α} ψ (n) & = \lim_{n ↑ α} (φ (n) + ψ (n)) \\ \lim_{n ↑ α} φ (n) \cdot \lim_{n ↑ α} ψ (n) & = \lim_{n ↑ α} (φ (n) \cdot ψ (n)) . \end{aligned}$

Notice that, in order to distinguish the usual limit of a sequence (which we will call Cauchy limit) from the $α$-limit, we use the symbols “$n \to \infty$” and “$n ↑ α$”, respectively. Properties 1–3 are not surprising, since we expect them to be satisfied by any notion of limit, provided the target space is equipped with a reasonable topology. In Axiom 4, the new (and, maybe for someone, surprising) fact is that every sequence has an $α$-limit. Nevertheless, Axiom 4 is not contradictory and a model for it can be constructed within the Zermelo-Fraenkel set theory with the axiom of choice (see [11]).

The first consequence of the $α$-limit is that every real function $f : R \to R$ can be extended to a function $f^{*} : E \to E$ by setting:

f^{*} (\lim_{n ↑ α} φ (n)) = \lim_{n ↑ α} f (φ (n)) .

It is not difficult to prove that this is a “good” definition, that is, $f^{*} (ξ)$ does not depend on the sequence $φ (n)$ which defines $ξ$. In the rest of this paper, when no ambiguity is possible, we will omit the “∗” and therefore $f$ and $f^{*}$ will be denoted by the same symbol.

As we remarked above, Axiom 4 can be seen as a weak form of Transfer Principle. For example, suppose that you want to transfer the following property of trigonometric functions:

\sin (2 x) = 2 \cos x \cdot \sin x,

to their extension over $E$. If we take a generic point $ξ \in E$, by Axiom 4 there exists a sequence $φ$ such that:

ξ = \lim_{n ↑ α} φ (n) .

Since $φ (n) \in R$, we have that:

\sin (2 φ (n)) = 2 \cos φ (n) \cdot \sin φ (n) .

We can take the $α$-limit of both sides:

\lim_{n ↑ α} \sin (2 φ (n)) = \lim_{n ↑ α} [2 \cos φ (n) \cdot \sin φ (n)],

and use the properties of the $α$-limit to get:

\sin (2 ξ) = 2 \cos ξ \cdot \sin ξ .

Now, let us see a basic theorem in non-Archimedean Mathematics. The theorem states that any non-infinite hyper-real number is infinitely close to only one real number, namely its standard part, and there exists a function which associates the former to the latter. Such a function is defined from all the non-infinite values of $E$ (indicated with $E_{fin}$) to $R$ and it is indicated by $st$.

Theorem 1.

(Standard part) There exists a function $st : E_{fin} \to R$ satisfying

$x \sim y \Rightarrow st (x) = st (y)$
$st (st (x)) = st (x)$

where ∼ stands for “is infinitely close to”, i.e., $x - y$ is an infinitesimal number.

Proof.

See e.g., [12] or [11]. □

As an example, the following equations hold true:

st (2.1 - 5 η) = 2.1, \quad st (- 3) = - 3, \quad st (- \sqrt{2} η^{2} + π η^{3}) = 0, \quad st (α) \text{ does not exist} .

Put another way, the standard part function maps any finite number to its closest real.
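The examples above can be reproduced with a few lines of code. This is only a sketch under an assumed toy representation (a dictionary from powers of α to real coefficients, so that η corresponds to power −1); the function name `st` mirrors the text, but the code is not the authors' implementation.

```python
import math

def st(x):
    """Standard part of a non-infinite number stored as {power_of_alpha: coefficient}."""
    if any(p > 0 and c != 0.0 for p, c in x.items()):
        # A non-zero coefficient on a positive power of alpha makes the number infinite.
        raise ValueError("infinite number: the standard part does not exist")
    return x.get(0, 0.0)  # infinitesimal terms (negative powers) are dropped

a = st({0: 2.1, -1: -5.0})                    # 2.1 - 5 eta
b = st({0: -3.0})                             # -3
c = st({-2: -math.sqrt(2), -3: math.pi})      # purely infinitesimal
try:
    st({1: 1.0})                              # alpha itself
    alpha_has_st = True
except ValueError:
    alpha_has_st = False
print(a, b, c, alpha_has_st)
```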

The standard part of a number is related to the Cauchy limit in the following way. If a sequence $φ (n)$ admits the Cauchy limit, the relation with the $α$-limit is given by the following identity:

\lim_{n \to \infty} φ (n) = st (\lim_{n ↑ α} φ (n)) .    (1)

Another important relation between the two limits is the following. If $\lim_{n ↑ α} φ (n) = ξ$ is not infinite, then there exists a subsequence $φ (n_{k})$ of $φ (n)$ such that:

\lim_{k \to \infty} φ (n_{k}) = st (ξ) .

For the consistency of the axioms and the proofs of the facts claimed in this section we refer to [11]. Again in [11], it is proved that the theory can be specialized to get the following theorem, which is very important for defining the concept of algorithmic numbers (introduced in Section 3) and the operations between them:

Theorem 2.

The number α satisfies the following properties:

Divisibility Property: For every $k \in N$, the number α is a multiple of k, and the numerosity of the set of multiples of k is:

$num (\{k, 2 k, 3 k, \dots, n k, \dots\}) = \frac{α}{k} .$
Root Property: For every $k \in N$, the number α is a k-th power, and the numerosity of the set of k-th powers is:

$num (\{1^{k}, 2^{k}, 3^{k}, \dots, n^{k}, \dots\}) = \sqrt[k]{α} .$
Power Property: If we set $P_{fin} (A) = \{F \in P (A) \mid F \text{ is a finite set}\}$, then

$num (P_{fin} (N^{+})) = 2^{α} .$
Integer numbers Property:

$num (Z) = 2 α + 1 .$
Rational numbers Property:

$num (Q) = 2 α^{2} + 1,$

and, for every $q \in Q$,

$num ((q, q + 1] \cap Q) = num ((0, 1] \cap Q) = α .$

Proof.

See [11], Sections 16.6 and 16.7. □

3. The Algorithmic Numbers

The Algorithmic Numbers (ANs) were introduced in [16]. They form a subset of the numbers in $E$ which can be more easily standardized, and therefore more easily manipulated by a computer. The definition of AN, which follows, is inspired by the work of Levi-Civita [7] and leverages the concept of monosemium, i.e., a number of the form $r α^{p}$, where $r \in R$ and $p \in Q$.

Definition 2.

(Algorithmic number) A number $ξ \in E$ is called algorithmic if it can be represented as a finite sum of monosemia, namely

ξ = \sum_{k = 0}^{ℓ} r_{k} α^{s_{k}}; \quad r_{k} \in R, \; s_{k} \in Q; \; s_{k} > s_{k + 1} .    (2)

Moreover, one can always represent it in the following form, called “normal form”

ξ = α^{p} P (η^{\frac{1}{m}}),

where $η$ is an infinitesimal (the reciprocal of $α$, see above), $p \in Q$, $m \in N$ and $P (x)$ is a polynomial with real coefficients such that $P (0) \neq 0$.

A parallel with scientific notation for real numbers may help in suggesting the uniqueness of the representation. As an example, consider the number 1.3675E3: the term E3 means $10^{3}$ and plays the same role as $α^{p}$, while the number 1.3675 can be represented by a polynomial of non-positive powers of 10, i.e., $1 \cdot 10^{0} + 3 \cdot 10^{- 1} + 6 \cdot 10^{- 2} + \dots$. Since a polynomial in $η$ is a polynomial in non-positive powers of $α$, the parallel is complete.

In particular, two issues arise when one tries to deal with ANs within a computer:

the inverse of an AN is not always an AN, e.g., ${(α + 1)}^{- 1}$ is not an AN;
they have a variable length coding, and hence they are not suitable to number crunching-oriented applications.

To overcome these drawbacks, a notion of truncation is needed.

The truncation of a generic AN $ξ$ affects both the number of monosemia used to build it and the fixed-length representation of the coefficients $r_{i}$, $i = 0, \dots, ℓ$. The second issue is not discussed further, since there is plenty of literature in this area and a very efficient mechanism to handle it already exists [11]. The first one reduces to the truncation of the polynomial $P (\cdot)$ in the normal form. To tackle it, it is enough to define the following truncation function, applied to a generic polynomial $P (x) = p_{0} x^{z_{0}} + \dots + p_{m} x^{z_{m}}$, $z_{i - 1} < z_{i}$, $i = 1, \dots, m$:

{tr}_{n} [P (x)] = \begin{cases} P (x) & n \geq m \\ p_{0} x^{z_{0}} + \dots + p_{n} x^{z_{n}} & n < m \end{cases}

The encoding of an AN truncated at n is referred to as ANn. For example, if the machine precision allows one to set n at most to 3, then the encoding used for representing ANs in an experiment on that machine is AN3.
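The truncation function can be sketched in a few lines. The representation below (a list of `(coefficient, exponent)` pairs) and the name `truncate` are assumptions made for illustration only.

```python
def truncate(terms, n):
    """tr_n: keep the terms p_0 x^{z_0}, ..., p_n x^{z_n} of a polynomial.

    `terms` is a list of (coefficient, exponent) pairs; it is sorted by
    increasing exponent so that z_0 < z_1 < ... as required by the definition.
    """
    ordered = sorted(terms, key=lambda t: t[1])
    return ordered[: n + 1]  # if n >= m the slice returns the whole polynomial

P = [(1.0, 0), (2.0, 1), (3.0, 2), (4.0, 3)]   # 1 + 2x + 3x^2 + 4x^3, so m = 3
tr1 = truncate(P, 1)                            # n < m: two terms survive
tr5 = truncate(P, 5)                            # n >= m: nothing is dropped
print(tr1, tr5)
```

Since the argument of the polynomial is the infinitesimal η, the dropped terms are the ones of smallest magnitude.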

3.1. The BANs (Bounded Algorithmic Numbers)

There is a particular subset of ANs which deserves particular attention: the Bounded Algorithmic Numbers (BANs). A BAN is defined by the normal form $α^{p} P (η)$, where $p \in Z$ and $P (0) \neq 0$. Below, we report the approximated algebraic operations between truncated approximations of two BANs, namely $ξ = α^{p} P (η)$ and $ζ = α^{q} Q (η)$. Notice that the order of truncation n is an arbitrary natural value specified at compile time. In general, the user tunes it taking into consideration several aspects, e.g., the required precision of the computations, the computation time available, the architectural properties of the machine, and so on.

Sum (assuming $p \geq q$):

\begin{aligned} ξ + ζ & = α^{p} P (η) + α^{p} (Q (η) η^{p - q}) \\ & = α^{p} (P (η) + {tr}_{n} [Q (η) η^{p - q}]) . \end{aligned}

Product:

ξ ζ = α^{p + q} {tr}_{n} [P (η) \cdot Q (η)] .

Division:

After having rewritten $ζ$ as:

ζ = α^{q} (q_{0} + \sum_{k = 1}^{n} q_{k} η^{k}) = q_{0} α^{q} (1 - ε),

where $ε = - \sum_{k = 1}^{n} \frac{q_{k}}{q_{0}} η^{k}$, the definition of the division becomes:

\begin{aligned} \frac{ξ}{ζ} & = α^{p - q} {tr}_{n} [\frac{P (η)}{q_{0}} (1 + ε + ε^{2} + \dots + ε^{n})] \\ & = α^{p - q} (\frac{P (η)}{q_{0}} + {tr}_{n} [ε \frac{P (η)}{q_{0}}] + \dots + {tr}_{n} [ε^{n} \frac{P (η)}{q_{0}}]) . \end{aligned}

The operations between BANs can be sped up by using the BAN Processing Unit described in [17]. Further details about these and other operations, such as the square root of a generic AN, can be found in the original work [16]. A few engineering applications of Alpha-Theory in general, and of BANs in particular, can be found in [18,19].
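The sum, product and division rules of Section 3.1 can be sketched as follows. This is an illustrative toy, not the authors' Matlab classes: a BAN $α^{p} P (η)$ is assumed to be stored as a pair `(p, coeffs)`, where `coeffs[k]` is the coefficient of $η^{k}$; the names `N_TRUNC`, `tr`, `ban_add`, `ban_mul` and `ban_div` are hypothetical, and renormalisation after a cancellation of the leading term is deliberately left out.

```python
N_TRUNC = 3  # truncation order n (arbitrary choice for this sketch)

def tr(coeffs):
    """Keep the coefficients of eta^0 .. eta^n, padding with zeros if needed."""
    return (list(coeffs) + [0.0] * (N_TRUNC + 1))[: N_TRUNC + 1]

def poly_mul(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def ban_add(x, y):
    (p, P), (q, Q) = x, y
    if p < q:                                   # enforce p >= q
        (p, P), (q, Q) = (q, Q), (p, P)
    shifted = tr([0.0] * (p - q) + list(Q))     # tr_n[Q(eta) * eta^(p - q)]
    return p, [a + b for a, b in zip(tr(P), shifted)]

def ban_mul(x, y):
    (p, P), (q, Q) = x, y
    return p + q, tr(poly_mul(P, Q))

def ban_div(x, y):
    (p, P), (q, Q) = x, y
    q0 = Q[0]
    eps = [0.0] + [-c / q0 for c in tr(Q)[1:]]  # eps = -sum_{k>=1} (q_k / q_0) eta^k
    term = [c / q0 for c in tr(P)]              # P(eta) / q_0
    total = [0.0] * (N_TRUNC + 1)
    for _ in range(N_TRUNC + 1):                # add the terms eps^0 .. eps^n
        total = [a + b for a, b in zip(total, term)]
        term = tr(poly_mul(term, eps))
    return p - q, total

alpha_plus_2 = ban_add((1, [1.0]), (0, [2.0]))            # alpha + 2 = alpha (1 + 2 eta)
alpha_times = ban_mul((1, [1.0]), alpha_plus_2)           # alpha^2 + 2 alpha
inv_alpha_plus_1 = ban_div((0, [1.0]), (1, [1.0, 1.0]))   # 1 / (alpha + 1)
print(alpha_plus_2, alpha_times, inv_alpha_plus_1)
```

The last line illustrates the point made earlier in Section 3: $(α + 1)^{- 1}$ is not an AN, but its truncation $η (1 - η + η^{2} - η^{3})$ is representable with a fixed number of coefficients.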

3.2. A Summary: Real Numbers vs. Euclidean Numbers

Table 1 contrasts Real with Euclidean numbers, with respect to their representation and visualization.

4. A New Definition of Heavy Tailed Distribution

Let us now introduce a new definition of heavy tailed distribution. Before doing so, let us recall some basic facts about random variables.

Let X be a random variable with the cumulative distribution function:

F (x) = P [X \leq x],

and survival function $\bar{F} (x)$, defined as:

\bar{F} (x) = 1 - F (x) = P [X > x] .

Considering its moment-generating function:

M_{X} (t) = E \{e^{t X}\},

a distribution is said to be heavy tailed if, for all t > 0,

M_{X} (t) = E \{e^{t X}\} \to \infty .

More details on this definition of heavy tailedness can be found in [3,4].

An implication of this is that:

\lim_{x \to \infty} e^{t x} \bar{F} (x) = \infty \quad \forall t > 0 .

This means that its complementary distribution function decays slower than exponentially.
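As a worked illustration of the limit above (a sketch that is not taken from the paper; the symbols $a$, $x_{m}$ and $λ$ are notation introduced here), consider a Pareto survival function and, for contrast, an exponential one:

```latex
% Pareto with shape a > 0 and scale x_m > 0 (notation assumed for this example)
\bar{F}(x) = \left(\frac{x_m}{x}\right)^{a}, \quad x \geq x_m
\quad\Longrightarrow\quad
\lim_{x \to \infty} e^{t x} \bar{F}(x)
  = \lim_{x \to \infty} x_m^{a} \, \frac{e^{t x}}{x^{a}} = \infty
  \quad \forall t > 0 .

% Exponential with rate \lambda > 0: the limit is not infinite for small t
\bar{F}(x) = e^{-\lambda x}
\quad\Longrightarrow\quad
\lim_{x \to \infty} e^{t x} \bar{F}(x)
  = \lim_{x \to \infty} e^{(t - \lambda) x} = 0
  \quad \text{for } 0 < t < \lambda .
```

The Pareto tail therefore satisfies the condition for every $t > 0$, while the exponential tail violates it, which is the sense in which the former decays "slower than exponentially".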

However, there is still some ambiguity in the scientific world about the term “heavy tailed” and its definition.

Two different definitions are usually presented, which differ in how restrictive a condition they impose for a distribution to be called heavy tailed.

A first, general definition states that a distribution is said to be heavy tailed if its tails decay more slowly than an exponential curve [4].

This universally recognised definition coincides, on a mathematical level, with what has been analytically presented above and imposes a rather “weak” requirement. According to this approach, such distributions exhibit many small observations mixed with a few large ones (outliers), so that most of the contribution to the sample mean or variance comes from the latter. This justifies in a “qualitative” way why they may have a divergent variance or mean. A practical example can be found in distributions with a hyperbolic trend of the survival function (as in the case of the well-known Pareto distribution).

A second definition of heavy tailed distributions requires that their variance, or more generally one of their moments, diverges to infinity [3,4]. This second approach therefore imposes a more stringent constraint than the first one. It may be difficult to verify this definition in a simulation environment.

Suppose you have generated 5000 samples of random numbers according to a distribution having a finite mean and an infinite variance (like some Pareto distributions); if you want to check whether or not your algorithm for generating pseudo-random numbers is accurate, you might try to compute the sample variance. However, the sample variance tends to increase when the size of the sample increases, though it always appears as a finite number (as obtained with C/C++/Matlab pseudo-random number generators).
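The experiment just described can be sketched with the Python standard library. The shape value 1.5 (finite mean, diverging variance), the seed and the function name `pareto_sample_variance` are arbitrary choices made for this illustration.

```python
import random

def pareto_sample_variance(size, shape=1.5, seed=0):
    """Sample variance of `size` Pareto variates; shape in (1, 2) gives a diverging variance."""
    rng = random.Random(seed)                    # fixed seed, only for repeatability
    data = [rng.paretovariate(shape) for _ in range(size)]
    mean = sum(data) / size
    return sum((x - mean) ** 2 for x in data) / (size - 1)

for size in (10**2, 10**3, 10**4, 10**5):
    # Every printed value is an ordinary finite float, whatever the sample size.
    print(size, pareto_sample_variance(size))
```

Whatever values are printed, none of them can be compared with a theoretical target, since the theoretical variance is not a number in standard analysis.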

Therefore, the sample variance cannot provide direct information that is aligned with the theoretical variance (which is known to be diverging towards $+ \infty$). This problematic situation can be overcome by introducing a third “fair” definition of heavy tailed distributions, which is intermediate between the first and the second presented above as regards the strictness of the required constraint.

This is possible thanks to a non-Archimedean mathematical approach like the Alpha Theory.

New Definition of heavy tailed distributions: A distribution is considered heavy tailed if its variance diverges towards $+ \infty$, or if it assumes a (concrete) infinite value.

Of course, such a definition only makes sense when using non-Archimedean mathematics, where infinite values are allowed.

This novel definition (and the use of Alpha-Theory) opens the way to numerically compute the sample variance (which now in some cases will converge towards a concrete infinite value) and to compare it with its theoretical counterpart. This can be achieved thanks to our implementation of BANs (i.e., Euclidean numbers) in Matlab.

In the next section we will introduce, as a first example, a Gaussian probability density function having an infinite variance, although not a diverging one. This is a preparatory step towards the creation of the Euclidean LogNormal distribution described in Section 6.

5. The Euclidean Gaussian Distribution

We recall that the probability density function of a Euclidean Gaussian random variable is the following:

f (x) = \frac{1}{σ \sqrt{2 π}} e^{- \frac{1}{2} {(\frac{x - μ}{σ})}^{2}},

where the only difference with its standard counterpart is that $μ$ and $σ$ are now Euclidean numbers and can therefore also be infinite or infinitesimal.

A particularly interesting case is when $μ$ is finite and $σ$ is infinite, such as $2.3 α^{2}$, $4.7 α^{4}$, etc.

5.1. How to Generate Pseudo-Random Numbers According to It

The simulation of such a distribution on a computer is based on an algorithmic approach to generating pseudo-random numbers, which makes use of our implementation of two Matlab classes, called Ban and BanArray, to handle scalar and vector quantities, respectively.

The generation algorithm is the “Euclidean” version of the classic procedure based on standard normal variates. In more detail, we want to allocate a BanArray, constructed as a column vector of N rows of Ban (where N is the required size of the BanArray), distributed according to the desired Gaussian probability density. To do this, we start from the realisations of a standard Gaussian distribution with zero mean and unit variance, obtained with the built-in function randn(N,1); we then multiply each realisation by $σ$ (i.e., the theoretical standard deviation, which this time is a Ban, i.e., a Euclidean number) to obtain the desired variance, and add the desired mean element-wise (which can again be a Euclidean number).

Some important parameters that must be set during this generation phase are the degree of the Ban used, i.e., the degree of the polynomial in $η$ of the Ban, and the theoretical values of mean and standard deviation of the Gaussian probability density to be generated. The leading exponent is set at will through an input parameter in the call to the constructor of the BanArray class.
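The generation procedure of this subsection can be sketched in Python as follows. The code does not use the authors' Ban and BanArray classes: it assumes a toy representation in which a Euclidean number is a dictionary from powers of α to real coefficients, and the function name `euclidean_gaussian` and the parameter values are hypothetical.

```python
import random

def euclidean_gaussian(mu, sigma, size, seed=0):
    """Draw `size` samples mu + sigma * u, where u is a real standard normal variate.

    mu and sigma are dicts {power_of_alpha: real coefficient}.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(size):
        u = rng.gauss(0.0, 1.0)                  # plays the role of one entry of randn(N,1)
        x = dict(mu)                             # start from the mean
        for power, coeff in sigma.items():       # add sigma * u, monosemium by monosemium
            x[power] = x.get(power, 0.0) + coeff * u
        samples.append(x)
    return samples

# Hypothetical parameters: infinite mean -1.5 * alpha, finite standard deviation 2.
xs = euclidean_gaussian(mu={1: -1.5}, sigma={0: 2.0}, size=5)
for x in xs:
    print(x)
```

With these parameters the coefficient of α is the same in every sample, because the random perturbation only enters at the finite level.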

5.2. How to Numerically Verify the Sample Mean and Variance

The quality of the algorithm is tested by checking/fitting the sample mean and variance against the true theoretical values used in generating the samples.

The calculation of the estimated mean (est_mean) and the estimated variance (est_var) is entrusted, respectively, to the two Matlab functions mean and var, that we developed in order to exploit the classical sample definitions of mathematical statistics, appropriately overloaded for the BanArray class:

est_mean = mean(x); % x is a BanArray, est_mean is a Ban

est_var = var(x); % x is a BanArray, est_var is a Ban
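What such overloaded functions have to compute can be sketched as follows, again under the assumption of a toy dictionary representation (powers of α mapped to real coefficients) instead of the BanArray class; all function names here are hypothetical, and the three samples are chosen by hand so that the result can be followed step by step.

```python
def add(x, y):
    out = dict(x)
    for p, c in y.items():
        out[p] = out.get(p, 0.0) + c
    return out

def mul(x, y):
    out = {}
    for px, cx in x.items():
        for py, cy in y.items():
            out[px + py] = out.get(px + py, 0.0) + cx * cy
    return out

def sample_mean(xs):
    total = {}
    for x in xs:
        total = add(total, x)
    return {p: c / len(xs) for p, c in total.items()}

def sample_var(xs):
    minus_mean = {p: -c for p, c in sample_mean(xs).items()}
    total = {}
    for x in xs:
        d = add(x, minus_mean)                   # deviation from the sample mean
        total = add(total, mul(d, d))            # squared deviation, still a sum of monosemia
    return {p: c / (len(xs) - 1) for p, c in total.items()}

# Three hand-picked samples with infinite part -1.5 * alpha and finite parts 1, 2, 3.
xs = [{1: -1.5, 0: 1.0}, {1: -1.5, 0: 2.0}, {1: -1.5, 0: 3.0}]
est_mean = sample_mean(xs)
est_var = sample_var(xs)
print(est_mean, est_var)
```

In this example the coefficients of $α^{2}$ and $α$ in the sample variance vanish: the mean is infinite, but the variance is finite.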

Table 2 reports four numerical tests performed in Matlab, where a specific Euclidean Gaussian with infinite mean and finite variance is considered, for a varying number of samples (ranging from $10^{2}$ to $10^{6}$). Results show that, as the number of samples increases, the estimated values of mean and variance get closer and closer to the theoretical ones.

In Appendix A, Table A1, we report all the $10^{2}$ samples associated with the first row of Table 2, to provide a concrete example of the generated samples (i.e., of the generated Euclidean Gaussian pseudo-random numbers).

Observe how the coefficient of the leading monosemium of the generated pseudo-random numbers is always equal to $- 1.5$. This could appear strange or counterintuitive, but it makes perfect sense, since our Euclidean Gaussian has an infinite mean but a finite standard deviation. If we had used an infinite standard deviation too, we would have observed perturbed values around $- 1.5$.

6. Euclidean LogNormal Distributions

In this section we introduce a Euclidean LogNormal probability density function, which is of particular interest in modelling heavy tailed phenomena: that is, one that has a finite mean and an infinite (but not diverging and also verifiable) variance.

If a random variable X is log-normally distributed, then the random variable Z obtained by taking its natural logarithm is distributed according to a Gaussian. Equivalently:

X = e^{μ + σ U},

with

U \sim N (0, 1) .

Indeed

Z = \ln (X) = μ + σ U,

and

Z \sim N (μ, σ^{2}) .

The only difference with its standard version, again, lies in the fact that $μ$ and $σ$ are now Euclidean numbers.

It is well known that its theoretical mean and variance are:

\overset{˘}{E} \{X\} = e^{μ + \frac{σ^{2}}{2}},

\overset{˘}{VAR} \{X\} = \overset{˘}{E} \{X^{2}\} - \overset{˘}{E} {\{X\}}^{2} = e^{2 μ + 2 σ^{2}} - e^{2 μ + σ^{2}} .
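For completeness, the two formulas can be derived in one line each from the moment-generating function of a standard normal variable. The derivation below is the classical one for real parameters; that it carries over unchanged to Euclidean $μ$ and $σ$ is assumed here, in line with the transfer argument of Section 2.

```latex
% k-th moment of X = e^{\mu + \sigma U}, with U standard normal,
% using E\{e^{sU}\} = e^{s^{2}/2}
E\{X^{k}\} = e^{k\mu}\, E\{e^{k\sigma U}\} = e^{k\mu + \frac{k^{2}\sigma^{2}}{2}}

% k = 1 and k = 2
E\{X\} = e^{\mu + \frac{\sigma^{2}}{2}}, \qquad
E\{X^{2}\} = e^{2\mu + 2\sigma^{2}}

% variance
VAR\{X\} = E\{X^{2}\} - E\{X\}^{2} = e^{2\mu + 2\sigma^{2}} - e^{2\mu + \sigma^{2}}
```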

6.1. A Euclidean LogNormal with Finite Mean and Infinite Variance

As numerous studies have already reported, the LogNormal distribution plays a central role in data fitting in the context of telecommunications networks [1,20].

In particular, in computer networks and Internet traffic analysis, LogNormal is shown as a good statistical model to represent the amount of traffic per time unit. This has been shown by applying a robust statistical approach on a large group of real Internet traces [21,22].

It has also been found, from a collection of large datasets from several sources, that the comment length distributions in Internet discussion fora are very regular and are described by the LogNormal shape with very high precision [23].

However, this distribution, in its standard version, has one significant shortcoming: it does not exhibit infinite variance. Therefore (as also reported in [1]), it cannot be considered a heavy tailed distribution (according to the more “stringent” definition presented above), and consequently cannot be studied with the classical methods usually used for this class of distributions.

This limitation can surprisingly be overcome by the use of its Euclidean version.

We were in fact able to obtain a finite mean and an infinite variance by properly exploiting the two degrees of freedom granted by the two freely settable parameters $μ$ and $σ$ of the “underlying” generating Gaussian distribution.

As an example, by fixing:

σ = α and μ = - \frac{α^{2}}{2},

we obtain a finite value for the theoretical mean:

\overset{˘}{E} \{X\} = e^{μ + \frac{σ^{2}}{2}} = e^{- \frac{α^{2}}{2} + \frac{α^{2}}{2}} = 1,

and an infinite value for the theoretical variance:

\overset{˘}{VAR} \{X\} = e^{2 μ + 2 σ^{2}} - e^{2 μ + σ^{2}} = e^{- α^{2} + 2 α^{2}} - e^{- α^{2} + α^{2}} = e^{α^{2}} - 1 .

Now we only need to show that, once pseudo-random numbers have been generated according to this Euclidean LogNormal, we obtain a sample mean close to 1 and a sample variance close to $e^{α^{2}} - 1$.

We will do this experimental verification in next subsection. Meanwhile, observe how the technique shown above can be generalized: anytime we choose an infinite value for

σ

, if we choose a value for

μ

equal to

- \frac{σ^{2}}{2}

, we will always obtain a Euclidean LogNormal having finite mean and infinite variance. This can be further generalized, as shown in next subsection:

μ

is not required to be exactly equal to

- \frac{σ^{2}}{2}

, it suffices that

μ + \frac{σ^{2}}{2}

is finite or infinitesimal to have a finite theoretical mean and an infinite theoretical variance (obviously, the assumption that

σ

has been chosen infinite must still hold true).
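The relation $μ = - \frac{σ^{2}}{2}$ can be explored with ordinary floating-point numbers, using increasingly large but finite values of $σ$ as stand-ins for the infinite $α$. The following Python sketch is only an illustration under that assumption (it is not the Matlab `Ban`/`BanArray` code used in this work, and the function names are hypothetical): it evaluates the standard LogNormal formulas for the mean and the variance.

```python
import math

def lognormal_mean(mu: float, sigma: float) -> float:
    """Theoretical mean of LogNormal(mu, sigma): exp(mu + sigma^2 / 2)."""
    return math.exp(mu + sigma**2 / 2)

def lognormal_variance(mu: float, sigma: float) -> float:
    """Theoretical variance: exp(2 mu + 2 sigma^2) - exp(2 mu + sigma^2)."""
    return math.exp(2 * mu + 2 * sigma**2) - math.exp(2 * mu + sigma**2)

# Finite stand-ins for an infinite sigma, with mu tied to sigma.
for sigma in (1.0, 2.0, 4.0, 8.0):
    mu = -sigma**2 / 2
    print(sigma, lognormal_mean(mu, sigma), lognormal_variance(mu, sigma))
```

In this finite setting the mean stays at 1 for every $σ$, while the variance $e^{σ^{2}} - 1$ grows without bound as $σ$ increases.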

6.2. How to Numerically Assess whether the Euclidean LogNormal Distribution Has the Desired Mean and Variance

Unlike diverging-variance distributions, such as some Pareto ones, the theoretical mean and variance of a Euclidean LogNormal distribution can be easily verified by comparing them with the sample mean and sample variance, respectively.

To show this, let us consider another Euclidean LogNormal, characterized again by an infinite value for $σ$, equal to $α$, and the following infinite value for $μ$:

μ = - 0.5 α^{2} + 9.54681 .

Under these choices, the theoretical mean is finite and equal to $\overset{˘}{E} \{X\} = e^{9.54681}$, while the theoretical variance is infinite and equal to $\overset{˘}{VAR} \{X\} = e^{(1 + 19.0936 η^{2}) α^{2}} - e^{19.0936}$.

Indeed:

\overset{˘}{E} \{X\} = e^{μ + \frac{σ^{2}}{2}} = e^{- 0.5 α^{2} + 9.54681 + 0.5 α^{2}} = e^{9.54681},

and

\overset{˘}{VAR} \{X\} = e^{2 μ + 2 σ^{2}} - e^{2 μ + σ^{2}}

= e^{- α^{2} + 19.0936 + 2 α^{2}} - e^{- α^{2} + 19.0936 + α^{2}} = e^{(1 + 19.0936 η^{2}) α^{2}} - e^{19.0936} .
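The exponents above can be re-derived mechanically by treating $μ$ and $σ^{2}$ as polynomials in $α$. The following Python sketch stores such a polynomial as a plain dictionary mapping powers of $α$ to real coefficients. This representation and the helper names `add` and `scale` are assumptions made for illustration; they are not the `Ban` class used in this work. Recalling that $η = α^{- 1}$, the exponent $α^{2} + 19.0936$ is the same quantity as $(1 + 19.0936 η^{2}) α^{2}$.

```python
from collections import defaultdict

def add(p, q):
    """Add two polynomials in alpha, stored as {power: coefficient}."""
    out = defaultdict(float)
    for poly in (p, q):
        for power, coeff in poly.items():
            out[power] += coeff
    # Drop the terms that cancelled exactly.
    return {k: v for k, v in out.items() if v != 0.0}

def scale(p, c):
    """Multiply a polynomial in alpha by the real constant c."""
    return {k: c * v for k, v in p.items()}

mu = {2: -0.5, 0: 9.54681}   # mu = -0.5 alpha^2 + 9.54681
sigma_sq = {2: 1.0}          # sigma^2 = alpha^2

mean_exponent = add(mu, scale(sigma_sq, 0.5))            # mu + sigma^2 / 2
var_exponent_1 = add(scale(mu, 2), scale(sigma_sq, 2))   # 2 mu + 2 sigma^2
var_exponent_2 = add(scale(mu, 2), sigma_sq)             # 2 mu + sigma^2

print(mean_exponent)    # only a power-0 term: finite exponent
print(var_exponent_1)   # contains an alpha^2 term: infinite exponent
print(var_exponent_2)   # only a power-0 term: finite exponent
```

The $α^{2}$ terms cancel in the exponent of the mean and in the second exponent of the variance, which is why the mean is finite while the variance keeps an infinite leading term.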

We now only need to verify that the sample mean and variance are the expected ones. We have generated pseudo-random numbers according to the considered Euclidean LogNormal distribution, following the same approach shown in Section 5.2. The sample mean and variance are reported in Table 3, as a function of the number of generated samples. To perform the simulations, we used a PC equipped with 64 GB of RAM and an Intel Xeon processor (Cascade Lake) with a clock frequency of 3.8 GHz. The simulation took 859 s.

From Table 3, we can observe that the estimated mean is infinite, instead of finite. However, this is only due to unavoidable numerical errors. As the number of samples increases, the leading coefficient decreases from the order of $10^{- 2}$ to the order of $10^{- 4}$. If we introduce a numerical threshold equal to $2.5 \cdot 10^{- 3}$ and treat any coefficient whose absolute value is below that threshold as zero, we obtain the situation depicted in Table 4. From that table, we observe that, starting from $10^{6}$ samples, the estimated mean is finite, as predicted by the theory. Moreover, the agreement between the estimated mean and variance and their nominal counterparts becomes very good.
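The thresholding step can be sketched in a few lines of Python. The tuple layout (coefficients of $1$, $η$ and $η^{2}$, all multiplying $α^{2}$) and the function name `apply_threshold` are assumptions made for this illustration; the numbers are the sample-mean exponents reported in Table 3 for $10^{5}$ and $10^{6}$ samples.

```python
THRESHOLD = 2.5e-3  # numerical threshold used in the text

def apply_threshold(coeffs, threshold=THRESHOLD):
    """Zero every coefficient whose absolute value is below the threshold."""
    return tuple(0.0 if abs(c) < threshold else c for c in coeffs)

# Exponent of the sample mean, as coefficients of (1, eta, eta^2) times alpha^2.
row_1e5 = (0.00157857, -0.00351095, 9.54501)
row_1e6 = (-0.00103881, 0.000230083, 9.54617)

print(apply_threshold(row_1e5))  # the eta coefficient survives: exponent still infinite
print(apply_threshold(row_1e6))  # only the eta^2 coefficient survives: exponent is finite
```

For $10^{6}$ samples only the $η^{2}$ coefficient remains, and since $η^{2} α^{2} = 1$ the exponent reduces to the finite value 9.54617, as in the corresponding row of Table 4.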

6.3. An Observation about Geometric and Harmonic Means

It is worth noting that the theoretical Geometric Mean (GM) of the first considered Euclidean LogNormal (the one having $σ = α$ and $μ = - \frac{α^{2}}{2}$) is infinitesimal. Indeed, it is well known that the geometric mean of the LogNormal($μ, σ$) is $e^{μ}$, therefore:

\overset{˘}{G M} {X} = e^{- \frac{α^{2}}{2}},

which is infinitesimal. Overall we have created a Euclidean LogNormal distribution having:

an infinitesimal theoretical geometric mean: $\overset{˘}{G M} {X} = e^{- \frac{α^{2}}{2}},$
a finite theoretical mean: $\overset{˘}{E} {X} = 1,$
an infinite theoretical variance: $\overset{˘}{V A R} {X} = e^{α^{2}} - 1 .$

We can also compute the theoretical Harmonic Mean (HM), which has the following expression as a function of the two parameters $μ$ and $σ$ (see [24]):

\overset{˘}{H M} {X} = e^{μ - \frac{σ^{2}}{2}} .

In our case, $\overset{˘}{H M} {X} = e^{- α^{2}}$, which, again, is infinitesimal and strictly lower than the GM ($e^{- α^{2}} < e^{- \frac{α^{2}}{2}}$). This is not surprising, since we know from the theory that $\overset{˘}{H M} {X} ⩽ \overset{˘}{G M} {X} ⩽ \overset{˘}{A M} {X}$, $\overset{˘}{A M} {X}$ being the arithmetic mean, i.e., $\overset{˘}{E} {X}$.

In our case the chain of inequalities becomes:

\overset{˘}{H M} {X} = e^{- α^{2}} ⩽ \overset{˘}{G M} {X} = e^{- \frac{α^{2}}{2}} ⩽ \overset{˘}{A M} {X} = 1 .

As a final observation, we might ask ourselves whether there exists a Euclidean LogNormal having infinitesimal HM, finite GM and infinite AM. This is possible, for instance by again choosing $σ$ infinite and equal to $α$, and $μ$ equal to zero. Indeed, in this case we obtain:

\overset{˘}{H M} {X} = exp (- \frac{α^{2}}{2}) ⩽ \overset{˘}{G M} {X} = 1 ⩽ \overset{˘}{A M} {X} = exp (\frac{α^{2}}{2}) .

Of course, the Euclidean LogNormal built in this way also has an infinite variance. In any case, probability density functions having infinite mean have not yet found practical applications; hence, this remark is nothing more than a curiosity, although an interesting one.
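The two chains of inequalities can be compared side by side by looking only at the exponents of the three means, namely $μ - \frac{σ^{2}}{2}$ (HM), $μ$ (GM) and $μ + \frac{σ^{2}}{2}$ (AM). The following Python sketch uses finite values of $σ$ as stand-ins for $α$; the function name `mean_exponents` is hypothetical and the finite stand-ins are an assumption of this illustration.

```python
def mean_exponents(mu: float, sigma: float):
    """Exponents of the harmonic, geometric and arithmetic means of LogNormal(mu, sigma)."""
    return (mu - sigma**2 / 2, mu, mu + sigma**2 / 2)

for sigma in (2.0, 4.0, 8.0):
    first = mean_exponents(-sigma**2 / 2, sigma)   # case mu = -sigma^2 / 2
    second = mean_exponents(0.0, sigma)            # case mu = 0
    print(sigma, first, second)
```

In the first case the AM exponent stays at zero while the HM and GM exponents decrease with $σ$; in the second case the GM exponent stays at zero while the HM exponent decreases and the AM exponent increases.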

6.4. Generalizing our Euclidean LogNormal: A 3-Parameter Version

The Euclidean LogNormal distributions introduced above, having finite mean and infinite variance, have the following limitation: the two parameters $μ$ and $σ$ are related to each other (in particular, $μ$ must be equal to $- \frac{σ^{2}}{2}$, and, of course, $σ$ must be infinite).

In this subsection we show how to build a more general one, for which both the mean and the variance can be set to the desired values, while keeping the mean finite and the variance infinite (but unrelated to each other).

Suppose that we need a Euclidean LogNormal to have variance $α$, instead of $e^{α^{2}} - 1$. It suffices to set $σ = \sqrt{\ln (α + 1)}$ (with $μ = - \frac{σ^{2}}{2}$, as before), and then its variance $e^{σ^{2}} - 1$ will be exactly $α$.

If such a variance is still too high, we can easily generate another variant having variance equal to $\ln α$. Indeed, by setting $σ = \sqrt{\ln (\ln α + 1)}$, we would get the desired variance. If we want to set the mean to a finite value different from 1, we can use the following three-parameter Euclidean LogNormal, obtained by translation by a positive amount equal to $δ$. The new Euclidean LogNormal, valid for $x > δ$, is:

\frac{1}{(x - δ) σ \sqrt{2 π}} e^{- \frac{{[\ln (x - δ) - μ]}^{2}}{2 σ^{2}}},

and its expected mean is $δ + 1$ when $μ = - \frac{1}{2} σ^{2}$, as can be easily proved. Its variance is not affected by such a translation, for obvious reasons. This means that, by setting, as previously done, $μ = - \frac{1}{2} σ^{2}$, we obtain a finite mean equal to $δ + 1$ and an infinite variance equal to $e^{σ^{2}} - 1$, provided that $σ$ is infinite. Also, the infinite variance $e^{σ^{2}} - 1$ can be made of the order of $\ln α$, $α$, $α^{2}$, $e^{α}$, $e^{α^{2}}$, etc., depending on the value assigned to $σ$.
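The role of the three parameters can be illustrated with standard floating-point numbers, replacing the infinite variance with a finite target value. The following Python sketch is an illustration under that assumption, not the Matlab code used in this work; the function names are hypothetical. It picks $σ$ so that $e^{σ^{2}} - 1$ equals a chosen variance, sets $μ = - \frac{σ^{2}}{2}$, shifts the samples by $δ$, and compares the sample mean and variance with $δ + 1$ and with the target variance.

```python
import math
import random

def sigma_for_variance(target_variance: float) -> float:
    """Return sigma such that exp(sigma^2) - 1 equals target_variance (with mu = -sigma^2 / 2)."""
    return math.sqrt(math.log(target_variance + 1.0))

def sample_shifted_lognormal(n, sigma, delta, rng):
    """Draw n samples of delta + LogNormal(-sigma^2 / 2, sigma)."""
    mu = -sigma**2 / 2
    return [delta + rng.lognormvariate(mu, sigma) for _ in range(n)]

rng = random.Random(0)             # fixed seed, for reproducibility
sigma = sigma_for_variance(0.25)   # finite stand-in for an infinite variance
samples = sample_shifted_lognormal(200_000, sigma, delta=3.0, rng=rng)

n = len(samples)
sample_mean = sum(samples) / n
sample_var = sum((x - sample_mean) ** 2 for x in samples) / (n - 1)

print(sample_mean)   # close to delta + 1 = 4
print(sample_var)    # close to the target variance 0.25
```

The shift $δ$ moves the mean without changing the variance, so the mean and the variance can be set independently of each other.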

7. Conclusions

In this work we have used Non-Standard Analysis to derive a formula for a LogNormal probability density function having a finite mean and an infinite (although not diverging) variance. More precisely, we have used Benci’s Alpha-Theory and the field of Euclidean Numbers, which is a non-standard extension of the real numbers. Euclidean numbers can be represented on a computer using a polynomial-like finite-length representation, and therefore they can be used in number-crunching-oriented numerical applications.

In this work, we have implemented BANs in Matlab by developing the `Ban` and `BanArray` classes. We have then used them to design a LogNormal distribution having a finite mean and an infinite variance, and to numerically verify that the sample mean and variance of the pseudo-random numbers generated according to this distribution match (apart from numerical estimation errors) the ones predicted by the theory, with the numerical estimation error decreasing as the number of samples increases.

We think that this work will allow the modelling and the numerical verification of phenomena which require LogNormal density functions having finite mean and infinite variance. Examples of application are multiplicative stochastic processes which require heavy tailed PDFs. This kind of application will be investigated in a future study.

Author Contributions

Conceptualization, M.C.; Methodology, M.P.; Software, F.F.; Validation, M.P.; Writing–original draft, F.F.; Writing–review and editing, M.C. and M.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been partly funded by the Italian Ministry of Education, Universities and Research, FoReLab project (Departments of Excellence, 2023–2027), and partly by PNRR-M4C2-Investimento 1.3, Partenariato Esteso PE00000013-“FAIR-Future Artificial Intelligence Research”-Spoke 1 “Human-centered AI” (the latter program is funded, in turn, by the European Commission under the NextGeneration EU programme).

Data Availability Statement

Not applicable.

Acknowledgments

We wish to express our warmest appreciation to Vieri Benci, for his generous help in teaching us Robinson’s Non-Standard Analysis and his Alpha-Theory.

Conflicts of Interest

The authors declare that there is no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

VAR	Variance
BAN	Bounded Algorithmic Number
`Ban`	Bounded Algorithmic Number (name of the Matlab class)
`BanArray`	Vector of Bounded Algorithmic Numbers (name of the Matlab class)
GM	Geometric Mean
HM	Harmonic Mean
AM	Arithmetic Mean

Appendix A. An Example of the Samples Generated for a Euclidean Gaussian

Table A1 provides a concrete example of the BANs generated for the first row of Table 2, i.e., for the case of $10^{2}$ samples.

Table A1. Pseudo-random samples related to the Euclidean Gaussian considered in Section 5.2, and associated to Table 2, for the case of $10^{2}$ samples. We recall that its theoretical mean is $(- 1.5 + 20.33 η - 403.26 η^{2}) α^{1}$, its theoretical variance is $(11.56 - 97.444 η - 768.853 η^{2})$, while the estimated mean is $(- 1.5 + 19.8637 η - 400.701 η^{2}) α^{1}$ and its estimated variance is $(13.7816 - 114.018 η + 246.011 η^{2})$ (both estimated using the 100 samples below).

Sample #	X	Sample #	X
1	$(- 1.5 + 22.4962 η - 416.787 η^{2}) α^{1}$	51	$(- 1.5 + 23.8709 η - 417.318 η^{2}) α^{1}$
2	$(- 1.5 + 25.2201 η - 426.399 η^{2}) α^{1}$	52	$(- 1.5 + 25.0048 η - 420.661 η^{2}) α^{1}$
3	$(- 1.5 + 21.6425 η - 413.680 η^{2}) α^{1}$	53	$(- 1.5 + 20.6005 η - 404.389 η^{2}) α^{1}$
4	$(- 1.5 + 21.8335 η - 409.761 η^{2}) α^{1}$	54	$(- 1.5 + 14.7786 η - 385.606 η^{2}) α^{1}$
5	$(- 1.5 + 18.6227 η - 396.666 η^{2}) α^{1}$	55	$(- 1.5 + 22.9006 η - 413.423 η^{2}) α^{1}$
6	$(- 1.5 + 19.1047 η - 399.796 η^{2}) α^{1}$	56	$(- 1.5 + 18.9222 η - 393.984 η^{2}) α^{1}$
7	$(- 1.5 + 16.3357 η - 385.383 η^{2}) α^{1}$	57	$(- 1.5 + 22.9960 η - 411.908 η^{2}) α^{1}$
8	$(- 1.5 + 15.6063 η - 380.955 η^{2}) α^{1}$	58	$(- 1.5 + 8.97068 η - 358.333 η^{2}) α^{1}$
9	$(- 1.5 + 20.3847 η - 405.783 η^{2}) α^{1}$	59	$(- 1.5 + 22.5302 η - 407.495 η^{2}) α^{1}$
10	$(- 1.5 + 22.1259 η - 405.929 η^{2}) α^{1}$	60	$(- 1.5 + 19.0771 η - 392.468 η^{2}) α^{1}$
11	$(- 1.5 + 20.8233 η - 409.860 η^{2}) α^{1}$	61	$(- 1.5 + 18.4388 η - 394.555 η^{2}) α^{1}$
12	$(- 1.5 + 14.2839 η - 376.800 η^{2}) α^{1}$	62	$(- 1.5 + 18.3367 η - 386.855 η^{2}) α^{1}$
13	$(- 1.5 + 24.5148 η - 415.332 η^{2}) α^{1}$	63	$(- 1.5 + 21.2766 η - 402.290 η^{2}) α^{1}$
14	$(- 1.5 + 16.8114 η - 389.671 η^{2}) α^{1}$	64	$(- 1.5 + 16.8916 η - 392.727 η^{2}) α^{1}$
15	$(- 1.5 + 20.7545 η - 406.961 η^{2}) α^{1}$	65	$(- 1.5 + 21.6471 η - 410.025 η^{2}) α^{1}$
16	$(- 1.5 + 19.0400 η - 399.484 η^{2}) α^{1}$	66	$(- 1.5 + 13.9704 η - 375.993 η^{2}) α^{1}$
17	$(- 1.5 + 16.1227 η - 386.673 η^{2}) α^{1}$	67	$(- 1.5 + 18.6799 η - 394.883 η^{2}) α^{1}$
18	$(- 1.5 + 28.1959 η - 437.504 η^{2}) α^{1}$	68	$(- 1.5 + 18.7584 η - 392.743 η^{2}) α^{1}$
19	$(- 1.5 + 18.0416 η - 392.397 η^{2}) α^{1}$	69	$(- 1.5 + 12.5249 η - 370.446 η^{2}) α^{1}$
20	$(- 1.5 + 22.1598 η - 414.842 η^{2}) α^{1}$	70	$(- 1.5 + 22.8677 η - 417.810 η^{2}) α^{1}$
21	$(- 1.5 + 24.7817 η - 419.190 η^{2}) α^{1}$	71	$(- 1.5 + 26.9057 η - 426.264 η^{2}) α^{1}$
22	$(- 1.5 + 15.8078 η - 385.981 η^{2}) α^{1}$	72	$(- 1.5 + 19.7376 η - 396.241 η^{2}) α^{1}$
23	$(- 1.5 + 20.4450 η - 401.143 η^{2}) α^{1}$	73	$(- 1.5 + 18.5704 η - 395.276 η^{2}) α^{1}$
24	$(- 1.5 + 21.2017 η - 402.164 η^{2}) α^{1}$	74	$(- 1.5 + 12.6691 η - 376.308 η^{2}) α^{1}$
25	$(- 1.5 + 20.4024 η - 408.407 η^{2}) α^{1}$	75	$(- 1.5 + 16.8693 η - 389.817 η^{2}) α^{1}$
26	$(- 1.5 + 19.6540 η - 398.629 η^{2}) α^{1}$	76	$(- 1.5 + 19.0667 η - 394.974 η^{2}) α^{1}$
27	$(- 1.5 + 20.2672 η - 407.769 η^{2}) α^{1}$	77	$(- 1.5 + 23.6460 η - 414.981 η^{2}) α^{1}$
28	$(- 1.5 + 18.0191 η - 391.469 η^{2}) α^{1}$	78	$(- 1.5 + 15.9128 η - 381.605 η^{2}) α^{1}$
29	$(- 1.5 + 20.3515 η - 400.110 η^{2}) α^{1}$	79	$(- 1.5 + 22.5581 η - 415.944 η^{2}) α^{1}$
30	$(- 1.5 + 22.1714 η - 407.758 η^{2}) α^{1}$	80	$(- 1.5 + 22.7076 η - 416.133 η^{2}) α^{1}$
31	$(- 1.5 + 17.9911 η - 391.271 η^{2}) α^{1}$	81	$(- 1.5 + 13.0442 η - 367.553 η^{2}) α^{1}$
32	$(- 1.5 + 20.0107 η - 399.708 η^{2}) α^{1}$	82	$(- 1.5 + 18.5946 η - 396.663 η^{2}) α^{1}$
33	$(- 1.5 + 24.2437 η - 421.933 η^{2}) α^{1}$	83	$(- 1.5 + 18.2342 η - 392.373 η^{2}) α^{1}$
34	$(- 1.5 + 15.2698 η - 384.007 η^{2}) α^{1}$	84	$(- 1.5 + 18.8147 η - 394.900 η^{2}) α^{1}$
35	$(- 1.5 + 17.5150 η - 388.349 η^{2}) α^{1}$	85	$(- 1.5 + 18.8560 η - 395.944 η^{2}) α^{1}$
36	$(- 1.5 + 21.9145 η - 406.013 η^{2}) α^{1}$	86	$(- 1.5 + 17.7508 η - 393.736 η^{2}) α^{1}$
37	$(- 1.5 + 20.9541 η - 403.053 η^{2}) α^{1}$	87	$(- 1.5 + 20.4584 η - 402.923 η^{2}) α^{1}$
38	$(- 1.5 + 19.0133 η - 399.941 η^{2}) α^{1}$	88	$(- 1.5 + 20.5856 η - 398.918 η^{2}) α^{1}$
39	$(- 1.5 + 15.4042 η - 390.100 η^{2}) α^{1}$	89	$(- 1.5 + 28.1696 η - 436.693 η^{2}) α^{1}$
40	$(- 1.5 + 15.8819 η - 383.715 η^{2}) α^{1}$	90	$(- 1.5 + 12.2121 η - 367.368 η^{2}) α^{1}$
41	$(- 1.5 + 22.4115 η - 406.794 η^{2}) α^{1}$	91	$(- 1.5 + 25.8154 η - 427.502 η^{2}) α^{1}$
42	$(- 1.5 + 22.6872 η - 407.943 η^{2}) α^{1}$	92	$(- 1.5 + 20.3636 η - 403.066 η^{2}) α^{1}$
43	$(- 1.5 + 26.5761 η - 425.247 η^{2}) α^{1}$	93	$(- 1.5 + 19.2718 η - 393.848 η^{2}) α^{1}$
44	$(- 1.5 + 19.9352 η - 403.147 η^{2}) α^{1}$	94	$(- 1.5 + 13.0922 η - 366.607 η^{2}) α^{1}$
45	$(- 1.5 + 18.4954 η - 396.429 η^{2}) α^{1}$	95	$(- 1.5 + 18.8844 η - 396.287 η^{2}) α^{1}$
46	$(- 1.5 + 25.9097 η - 429.933 η^{2}) α^{1}$	96	$(- 1.5 + 23.7794 η - 413.745 η^{2}) α^{1}$
47	$(- 1.5 + 23.4765 η - 413.668 η^{2}) α^{1}$	97	$(- 1.5 + 20.4874 η - 405.223 η^{2}) α^{1}$
48	$(- 1.5 + 16.7999 η - 391.804 η^{2}) α^{1}$	98	$(- 1.5 + 22.9617 η - 414.269 η^{2}) α^{1}$
49	$(- 1.5 + 27.1523 η - 431.919 η^{2}) α^{1}$	99	$(- 1.5 + 18.9252 η - 396.288 η^{2}) α^{1}$
50	$(- 1.5 + 13.9762 η - 373.639 η^{2}) α^{1}$	100	$(- 1.5 + 21.5426 η - 412.867 η^{2}) α^{1}$

References

1. Crovella, M. Explaining World Wide Web Traffic Self-Similarity; Technical Report TR-95-015; Boston University Computer Science Department. 1995. Available online: https://cs-www.bu.edu/faculty/crovella/paper-archive/self-sim/tr-version.pdf (accessed on 6 March 2023).
2. Willinger, W.; Taqqu, M.; Sherman, R.; Wilson, D. Self-similarity through high-variability: Statistical analysis of Ethernet LAN traffic at the source level. IEEE/ACM Trans. Netw. 1997, 5, 71–86.
3. Konstantinides, D.G. Risk Theory: A Heavy Tail Approach; World Scientific Publishing Co. Pte. Ltd.: Singapore, 2018.
4. Bianchi, M.L.; Stoyanov, S.V.; Tassinari, G.L.; Fabozzi, F.J.; Focardi, S.M. Handbook of Heavy-Tailed Distributions in Asset Management and Risk Management; World Scientific: Singapore, 2019.
5. Mandelbrot, B.B. The Fractal Geometry of Nature; Freeman: San Francisco, CA, USA, 1982.
6. Deveau, M.; Teismann, H. 72 + 42: Characterizations of the completeness and Archimedean properties of ordered fields. Real Anal. Exch. 2014, 39, 261–304.
7. Levi-Civita, T. Sugli Infiniti ed Infinitesimi Attuali Quali Elementi Analitici; Series 7; Atti del R. Istituto Veneto di Scienze Lettere ed Arti: Venezia, Italy, 1892. (In Italian)
8. Robinson, A. Non-Standard Analysis, 2nd ed.; Princeton University Press: Princeton, NJ, USA, 1996.
9. Conway, J.H. On Numbers and Games; CRC Press: Boca Raton, FL, USA, 2000.
10. Dehn, M. Die Legendre’schen Satze uber die Winkelsumme im Dreieck; BG Teubner: Leipzig, Germany, 1900.
11. Benci, V.; Di Nasso, M. How to Measure the Infinite: Mathematics with Infinite and Infinitesimal Numbers; World Scientific: Singapore, 2018.
12. Keisler, H.J. Foundations of Infinitesimal Calculus; Prindle, Weber & Schmidt: Boston, MA, USA, 1976; Volume 20.
13. Benci, V.; Di Nasso, M.; Forti, M. The eightfold path to nonstandard analysis. In Nonstandard Methods and Applications in Mathematics; Cutland, N.J.M., Di Nasso, D.A.R., Eds.; Association for Symbolic Logic, AK Peters: Wellesley, MA, USA, 2006; Volume 25, pp. 3–44.
14. Benci, V.; Di Nasso, M. Numerosities of labelled sets: A new way of counting. Adv. Math. 2003, 173, 50–67.
15. Benci, V.; Horsten, L.; Wenmackers, S. Infinitesimal Probabilities. Br. J. Philos. Sci. 2018, 69, 509–552.
16. Benci, V.; Cococcioni, M. The Algorithmic Numbers in Non-Archimedean Numerical Computing Environments. Discret. Contin. Dyn. Syst. Ser. S 2021, 14, 1673–1692.
17. Rossi, F.; Fiaschi, L.; Cococcioni, M.; Saponara, S. Design and FPGA Synthesis of BAN Processing Unit for non-Archimedean Number Crunching. In Proceedings of the International Conference on Applications in Electronics Pervading Industry, Environment and Society (ApplePies’22), Genova, Italy, 8 September 2022.
18. Benci, V.; Cococcioni, M.; Fiaschi, L. Non–Standard Analysis Revisited: An Easy Axiomatic Presentation Oriented Towards Numerical Applications. Int. J. Appl. Math. Comput. Sci. 2022, 32, 65–80.
19. Fiaschi, L.; Cococcioni, M. A Non-Archimedean Interior Point Method and Its Application to the Lexicographic Multi-Objective Quadratic Programming. Mathematics 2022, 10, 4536.
20. Crovella, M.; Bestavros, A. Self-similarity in World Wide Web traffic: Evidence and possible causes. IEEE/ACM Trans. Netw. 1997, 5, 835–846.
21. Alasmar, M.; Parisis, G.; Clegg, R.; Zakhleniuk, N. On the Distribution of Traffic Volumes in the Internet and its Implications. In Proceedings of the IEEE INFOCOM 2019—IEEE Conference on Computer Communications, Paris, France, 29 April–2 May 2019; pp. 955–963.
22. Alasmar, M.; Clegg, R.; Zakhleniuk, N.; Parisis, G. Internet Traffic Volumes are Not Gaussian—They are Log-Normal: An 18-Year Longitudinal Study With Implications for Modelling and Prediction. IEEE/ACM Trans. Netw. 2021, 29, 1266–1279.
23. Sobkowicz, P.; Thelwall, M.; Buckley, K.; Paltoglou, G.; Sobkowicz, A. Lognormal distributions of user post lengths in Internet discussions—A consequence of the Weber-Fechner law? EPJ Data Sci. 2013, 2, 2.
24. Limbrunner, J.F.; Vogel, R.M.; Brown, L.C. Estimation of Harmonic Mean of a Lognormal Variable. J. Hydrol. Eng. 2000, 5, 59–66.

Table 1. Real Numbers vs. Euclidean Numbers.

Real Numbers	Euclidean Numbers
Can be represented on a computer using a fixed-length representation, such as the 64-bit IEEE 754/2019 standard.	Can be represented on a computer using a fixed-length representation, such as BAN2 (a polynomial in $η$ having degree two).
An example: $2.714 \cdot 10^{- 8}$	An example: $(5 - 7 η + 11 η^{2}) \cdot α^{- 3}$
Important remark:	Important remark:
The result of the multiplication of two 64-bit floating point numbers is still a 64-bit floating point number (this is really helpful in numerical computations when using iterative methods, and holds true for the other arithmetic operations as well)	The result of the multiplication of two BAN2 numbers is still a BAN2 number (this is really helpful in numerical computations when using iterative methods, and holds true for the other arithmetic operations as well)
Convenient ASCII display format:	Convenient ASCII display format:
`2.714E-8`	`(5 - 7 11)A-3`
(where E-8 means $10^{- 8}$ )	(where A-3 means $α^{- 3}$ )

Table 2. Numerical verification for the Euclidean Gaussian having infinite theoretical mean $μ = (- 1.5 + 20.33 η - 403.26 η^{2}) α^{1}$ and finite theoretical variance $σ^{2} = (11.56 - 97.444 η + 217.044 η^{2})$.

# of Samples	$E {X}$ (Sample Mean)	$VAR {X}$ (Sample Variance)
$10^{2}$	$(- 1.5 + 19.8637 η - 400.701 η^{2}) α^{1}$	$(13.7816 - 114.018 η + 246.011 η^{2})$
$10^{3}$	$(- 1.5 + 20.3742 η - 403.674 η^{2}) α^{1}$	$(12.0228 - 101.538 η + 226.115 η^{2})$
$10^{4}$	$(- 1.5 + 20.3353 η - 403.262 η^{2}) α^{1}$	$(11.5572 - 97.4869 η + 217.224 η^{2})$
$10^{5}$	$(- 1.5 + 20.3318 η - 403.270 η^{2}) α^{1}$	$(11.5690 - 97.5113 η + 217.045 η^{2})$
$10^{6}$	$(- 1.5 + 20.3316 η - 403.276 η^{2}) α^{1}$	$(11.5695 - 97.5118 η + 217.043 η^{2})$

Table 3. Sample Mean and Sample Variance of the Euclidean LogNormal having a finite mean and infinite variance, obtained by setting $σ = α$ and $μ = (- \frac{1}{2} α^{2} + 9.54681)$. The finite theoretical mean is $\overset{˘}{E} {X} = e^{9.54681}$, while the infinite theoretical variance is $\overset{˘}{V A R} {X} = e^{(1 + 19.0936 η^{2}) α^{2}} - e^{19.0936}$. If the coefficients highlighted in boldface were zero, we would have obtained an almost perfect match between theory and practice.

# of Samples	$E {X}$ (Sample Mean)	$VAR {X}$ (Sample Variance)
$10^{3}$	$e^{(0.0260943 + 0.000490988 η + 9.55455 η^{2}) α^{2}}$	$e^{(1.10438 + 0.000981975 η + 19.1091 η^{2}) α^{2}} - e^{(0.0521886 + 0.000981975 η + 19.1091 η^{2}) α^{2}}$
$10^{4}$	$e^{(- 0.0126293 + 0.0190467 η + 9.56013 η^{2}) α^{2}}$	$e^{(0.949483 + 0.0380934 η + 19.1203 η^{2}) α^{2}} - e^{(- 0.0252587 + 0.0380934 η + 19.1203 η^{2}) α^{2}}$
$10^{5}$	$e^{(0.00157857 - 0.00351095 η + 9.54501 η^{2}) α^{2}}$	$e^{(1.00631 - 0.0070219 η + 19.09 η^{2}) α^{2}} - e^{(0.00315714 - 0.0070219 η + 19.09 η^{2}) α^{2}}$
$10^{6}$	$e^{(- 0.00103881 + 0.000230083 η + 9.54617 η^{2}) α^{2}}$	$e^{(0.995845 + 0.000460166 η + 19.0923 η^{2}) α^{2}} - e^{(- 0.00207762 + 0.000460166 η + 19.0923 η^{2}) α^{2}}$
$10^{7}$	$e^{(0.000162514 + 0.000439597 η + 9.54666 η^{2}) α^{2}}$	$e^{(1.00065 + 0.000879195 η + 19.0933 η^{2}) α^{2}} - e^{(0.000325027 + 0.000879195 η + 19.0933 η^{2}) α^{2}}$

Table 4. This table is another version of Table 3, where we have considered any real coefficient below the threshold $2.5 \cdot 10^{- 3}$ to be equal to zero. Now the disagreement between theory and practice is still in place for small sample sizes (up to $10^{5}$), but it disappears when the sample size is $⩾ 10^{6}$. In boldface are highlighted the coefficients for which the disagreement is still in place.

# of Samples	$E {X}$ (Sample Mean)	$VAR {X}$ (Sample Variance)
$10^{3}$	$e^{(0.0260943 + 9.55455 η^{2}) α^{2}}$	$e^{(1.10438 + 19.1091 η^{2}) α^{2}} - e^{(0.0521886 + 19.1091 η^{2}) α^{2}}$
$10^{4}$	$e^{(- 0.0126293 + 0.0190467 η + 9.56013 η^{2}) α^{2}}$	$e^{(0.949483 + 0.0380934 η + 19.1203 η^{2}) α^{2}} - e^{(- 0.0252587 + 0.0380934 η + 19.1203 η^{2}) α^{2}}$
$10^{5}$	$e^{(- 0.00351095 + 9.54501 η) α}$	$e^{(1.00631 - 0.0070219 η + 19.09 η^{2}) α^{2}} - e^{(0.00315714 - 0.0070219 η + 19.09 η^{2}) α^{2}}$
$10^{6}$	$e^{9.54617}$	$e^{(0.995845 + 19.0923 η^{2}) α^{2}} - e^{19.0923}$
$10^{7}$	$e^{9.54666}$	$e^{(1.00065 + 19.0933 η^{2}) α^{2}} - e^{19.0933}$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
