Proceeding Paper

Comparing the Zeta Distributions with the Pareto Distributions from the Viewpoint of Information Theory and Information Geometry: Discrete versus Continuous Exponential Families of Power Laws †

Sony Computer Science Laboratories Inc., Tokyo 141-0022, Japan
Presented at the 41st International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, Paris, France, 18–22 July 2022.
Phys. Sci. Forum 2022, 5(1), 2; https://doi.org/10.3390/psf2022005002
Published: 31 October 2022

Abstract

We consider the zeta distributions, which are discrete power law distributions that can be interpreted as the counterparts of the continuous Pareto distributions with a unit scale. The family of zeta distributions forms a discrete exponential family with normalizing constants expressed using the Riemann zeta function. We present several information-theoretic measures between zeta distributions, study their underlying information geometry, and compare the results with their continuous counterparts, the Pareto distributions.

1. Introduction

Zeta distributions [1,2] are parametric discrete distributions with probability mass functions indexed by a scalar parameter $s \in (1,\infty)$ whose support is the set of positive integers $\mathbb{N}$:
$$p_s(x) = \Pr[X = x] \propto \frac{1}{x^s}, \quad x \in \mathcal{X} = \mathbb{N} = \{1, 2, \ldots\}.$$
The normalizing function $\zeta(s)$ of the zeta distributions $p_s(x) = \frac{1}{\zeta(s)}\frac{1}{x^s}$, chosen such that $\sum_{x \in \mathbb{N}} p_s(x) = 1$, is the real Riemann zeta function [3,4,5]:
$$\zeta(s) = \sum_{i=1}^{\infty} \frac{1}{i^s} = 1 + \frac{1}{2^s} + \frac{1}{3^s} + \cdots, \quad s > 1.$$
The set of zeta distributions $\mathcal{Z} = \{p_s(x) : s \in (1,\infty)\}$ forms a discrete exponential family [6,7] with natural parameter $\theta(s) = s$ lying in the natural parameter space $\Theta = (1,\infty)$, sufficient statistic $t(x) = -\log x$, and cumulant function (or log-normalizer) $F(\theta) = \log \zeta(\theta)$. Therefore, it follows from the theory of exponential families [7] that $\log \zeta(\theta)$ is a strictly convex and real analytic function (see Figure 1). Thus, the pmf of the zeta distributions can be rewritten in the canonical form of exponential families as:
$$p_s(x) = \exp\big(\theta(s)\, t(x) - F(\theta(s))\big).$$
The characteristic function is thus $\phi_s(t) = \frac{\zeta(s + it)}{\zeta(s)}$.
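The canonical form above can be checked numerically against the direct normalized power law. The sketch below uses a plain truncated series for $\zeta$; the function names are ours, not from the paper:

```python
import math

def zeta_fn(s, terms=100_000):
    """Truncated series for the real Riemann zeta function, s > 1."""
    return sum(1.0 / i**s for i in range(1, terms + 1))

def pmf_direct(x, s):
    # p_s(x) = 1 / (x^s * zeta(s))
    return 1.0 / (x**s * zeta_fn(s))

def pmf_canonical(x, s):
    # Canonical form exp(theta * t(x) - F(theta)) with theta = s,
    # t(x) = -log x and F(theta) = log zeta(theta).
    theta, t, F = s, -math.log(x), math.log(zeta_fn(s))
    return math.exp(theta * t - F)

assert abs(zeta_fn(4) - math.pi**4 / 90) < 1e-9   # zeta(4) = pi^4 / 90
assert abs(pmf_direct(3, 4) - pmf_canonical(3, 4)) < 1e-12
```

The two writings agree up to floating-point rounding, since the canonical form simply exponentiates $-s \log x - \log \zeta(s)$.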
Thus, a zeta distribution $p_s(x)$ can be interpreted as the discrete equivalent of a Pareto distribution $q_s(x)$ of scale $1$ and shape $s-1$, with probability density function $q_s(x) = \frac{s-1}{x^s}$ for $x > 1$ (see Table 1).
The values of the zeta function at many positive odd integers are known to be irrational [8,9,10], and the zeta function can be calculated at positive even integers using the Bernoulli numbers $B_{2n}$ [11]: $\zeta(2n) = \frac{(-1)^{n+1} B_{2n} (2\pi)^{2n}}{2\,(2n)!}$ for $n \in \mathbb{N}$. The zeta function can be computed fast [12] and to high precision [13]. The derivatives of the zeta function have also been studied [12,14].
The zeta distributions are related to the Zipf distributions [15] $p_{s,N}(x) \propto \frac{1}{x^s}$ for $x \in \{1, \ldots, N\}$ and to the Zipf–Mandelbrot distributions [16,17] $p_{s,q,N}(x) \propto \frac{1}{(x+q)^s}$ for $x \in \{1, \ldots, N\}$, which play an important role in quantitative linguistics; see [6] for more details. The Zipf and Zipf–Mandelbrot distributions both have finite support and can be interpreted as truncated zeta distributions (a right truncation for the Zipf distributions, and both left and right truncations for the Zipf–Mandelbrot distributions) with normalizing constants that can be approximated using properties of the zeta function [18]. Left-only truncations of the zeta distributions are called Hurwitz zeta distributions [19]. Similarly, truncated Pareto distributions are used in applications [20]. Notice that the truncated distributions of an exponential family with a fixed truncation support form another exponential family [21]. The zeta distributions are infinitely divisible [19,22]: for any number $n$, a random variable following a zeta distribution can be expressed as the sum of $n$ independent and identically distributed random variables. In applications, it is important to quantitatively discriminate between zeta distributions (see, for example, [23,24] or [25]). Mixtures of zeta distributions have also been used to model social networks [26]. In general, products of exponential families yield other exponential families; the products of $d$ zeta distributions form an exponential family called the zeta-star distributions [22].
In this paper, we first study various information-theoretic measures between zeta distributions by considering them as a discrete exponential family [7]: we consider the $\alpha$-divergences [27] between zeta distributions in Section 2 and study their limits, the oriented Kullback–Leibler divergences obtained when $\alpha \to 1$ and $\alpha \to 0$, in Section 3. We then compare these results with the counterpart results obtained for the continuous exponential family of Pareto distributions in Section 4. Finally, we conclude in Section 5.

2. Amari’s α -Divergences and Sharma–Mittal Divergences

To measure the dissimilarity between two zeta distributions $p_{s_1}$ and $p_{s_2}$, one can use the $\alpha$-divergences [27], defined for a real $\alpha \in (0,1)$ as follows:
$$D_\alpha[p_{s_1} : p_{s_2}] := \frac{1}{\alpha(1-\alpha)}\left(1 - I_\alpha[p_{s_1} : p_{s_2}]\right) = D_{1-\alpha}[p_{s_2} : p_{s_1}],$$
where
$$I_\alpha[p_1 : p_2] := \sum_{x=1}^{\infty} p_1(x)^\alpha\, p_2(x)^{1-\alpha} = I_{1-\alpha}[p_2 : p_1]$$
is the $\alpha$-Bhattacharyya coefficient (a similarity measure, also called an affinity coefficient).
It follows from [28] that the skewed Bhattacharyya coefficient amounts to a skewed Jensen divergence between the natural parameters of the exponential family $\mathcal{E}$:
$$I_\alpha[p_{s_1} : p_{s_2}] = \exp\big(-J_{F,\alpha}(s_1 : s_2)\big),$$
where $J_{F,\alpha}$ is the skewed Jensen divergence induced by the strictly convex and smooth function $F(\theta)$:
$$J_{F,\alpha}(s_1 : s_2) := \alpha F(s_1) + (1-\alpha) F(s_2) - F(\alpha s_1 + (1-\alpha) s_2) \geq 0$$
$$= \log \frac{\zeta(s_1)^\alpha\, \zeta(s_2)^{1-\alpha}}{\zeta(\alpha s_1 + (1-\alpha) s_2)}.$$
Thus, we have the α -divergences between two zeta distributions p s 1 and p s 2 available in closed form.
Theorem 1
($\alpha$-divergences between two zeta distributions). The $\alpha$-divergence for $\alpha \in (0,1)$ between two zeta distributions $p_{s_1}$ and $p_{s_2}$ is:
$$D_\alpha[p_{s_1} : p_{s_2}] = \frac{1}{\alpha(1-\alpha)}\left(1 - \frac{\zeta(\alpha s_1 + (1-\alpha) s_2)}{\zeta(s_1)^\alpha\, \zeta(s_2)^{1-\alpha}}\right).$$
It follows that when s 1 , s 2 , and α s 1 + ( 1 α ) s 2 are all positive even integers, we can evaluate exactly the α -divergences between p s 1 and p s 2 .
Example 1.
Consider $s_1 = 4$ and $s_2 = 12$ with $\alpha = \frac{1}{2}$, so that $\alpha s_1 + (1-\alpha) s_2 = 8$. Using the formula [11] $\zeta(2n) = \frac{(-1)^{n+1} B_{2n} (2\pi)^{2n}}{2\,(2n)!}$, $n \in \mathbb{N}$, where $B_{2n}$ denotes the Bernoulli numbers, the zeta function can be calculated exactly at $4$, $8$ and $12$: $\zeta(4) = \frac{\pi^4}{90}$, $\zeta(8) = \frac{\pi^8}{9450}$, and $\zeta(12) = \frac{691\pi^{12}}{638512875}$. The $\alpha$-divergence for $\alpha = \frac{1}{2}$ is twice the squared Hellinger distance: $D_{\frac{1}{2}}[p_{s_1} : p_{s_2}] = 2\sum_{i=1}^{\infty} \big(\sqrt{p_{s_1}(i)} - \sqrt{p_{s_2}(i)}\big)^2$. Thus, we find the exact value $D_{\frac{1}{2}}[p_4 : p_{12}] = 4\left(1 - 3\sqrt{\frac{715}{6910}}\right) \approx 0.139929$.
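The closed form of Theorem 1 can be checked against Example 1 with a short numerical sketch (a truncated $\zeta$ series; the helper names are ours):

```python
import math

def zeta_fn(s, terms=100_000):
    # Truncated series for the real Riemann zeta function, s > 1.
    return sum(1.0 / i**s for i in range(1, terms + 1))

def alpha_divergence(s1, s2, alpha):
    # Theorem 1: closed-form alpha-divergence between two zeta distributions.
    num = zeta_fn(alpha * s1 + (1 - alpha) * s2)
    den = zeta_fn(s1)**alpha * zeta_fn(s2)**(1 - alpha)
    return (1.0 - num / den) / (alpha * (1 - alpha))

# Example 1: D_{1/2}[p_4 : p_12] = 4 (1 - 3 sqrt(715/6910)) ~ 0.139929
exact = 4 * (1 - 3 * math.sqrt(715 / 6910))
assert abs(alpha_divergence(4, 12, 0.5) - exact) < 1e-9
```

The reference-duality $D_\alpha[p_{s_1} : p_{s_2}] = D_{1-\alpha}[p_{s_2} : p_{s_1}]$ also holds numerically for this implementation.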
Let us report another example where the squared Hellinger divergence is expressed using the zeta function:
Example 2.
We consider $s_1 = 3$, $s_2 = 7$ and $\alpha = \frac{1}{2}$, so that $\alpha s_1 + (1-\alpha) s_2 = 5$. Then, we have $D_{\frac{1}{2}}[p_3 : p_7] = 4\left(1 - \frac{\zeta(5)}{\sqrt{\zeta(3)\,\zeta(7)}}\right) \approx 0.23261$.
Since $\lim_{\alpha \to 1} D_\alpha[p_{s_1} : p_{s_2}] = D_{\mathrm{KL}}[p_{s_1} : p_{s_2}]$ is the Kullback–Leibler divergence (KLD) [27],
$$D_{\mathrm{KL}}[p_{s_1} : p_{s_2}] := \sum_{i=1}^{\infty} p_{s_1}(i) \log \frac{p_{s_1}(i)}{p_{s_2}(i)},$$
we can approximate the KLD by $D_{1-\epsilon}[p_{s_1} : p_{s_2}]$ for a small value of $\epsilon$ (say, $\epsilon = 10^{-3}$) using fast methods to compute the zeta function [12].
Corollary 1
(Approximation of the Kullback–Leibler divergence). The Kullback–Leibler divergence between two zeta distributions $p_{s_1}$ and $p_{s_2}$ can be approximated for small values $\epsilon > 0$ by
$$D_{\mathrm{KL}}[p_{s_1} : p_{s_2}] \approx D_{1-\epsilon}[p_{s_1} : p_{s_2}] = \frac{1}{\epsilon(1-\epsilon)}\left(1 - \frac{\zeta((1-\epsilon) s_1 + \epsilon s_2)}{\zeta(s_1)^{1-\epsilon}\, \zeta(s_2)^{\epsilon}}\right).$$
Example 3.
We let $1-\epsilon = 0.99$, $0.999$, $0.9999$, and $0.99999$, and find the following numerical approximations:
$$D_{\mathrm{KL}}[p_{s_1} : p_{s_2}] \approx 0.473 \;(1-\epsilon = 0.99), \quad 0.482 \;(1-\epsilon = 0.999), \quad 0.483 \;(1-\epsilon = 0.9999), \quad 0.483 \;(1-\epsilon = 0.99999).$$
We can also calculate the KLD $D_{\mathrm{KL}}[p_{s_1}^{\mathcal{X}_1} : p_{s_2}^{\mathcal{X}_2}]$ between two truncated zeta distributions with nested supports $\mathcal{X}_1 \subseteq \mathcal{X}_2$; see [21]. A truncated zeta distribution on the support $\{a, a+1, \ldots, b\} \subset \mathbb{N}$ (with $b > a$) has pmf $p_s^{a,b}(x) = \frac{p_s(x)}{\Phi_s(b) - \Phi_s(a-1)}$, where $\Phi_s(u)$ is the cumulative distribution function $\Phi_s(u) = \sum_{x \in \{1, \ldots, u\}} p_s(x) = \frac{1}{\zeta(s)} \sum_{x \in \{1, \ldots, u\}} \frac{1}{x^s}$.
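A truncated zeta pmf can equivalently be normalized directly over its finite support, which sidesteps any CDF bookkeeping; a minimal sketch (our own helper, not from the paper):

```python
def truncated_zeta_pmf(x, s, a, b):
    # pmf proportional to x^{-s} on the finite support {a, ..., b}.
    if not (a <= x <= b):
        return 0.0
    Z = sum(1.0 / i**s for i in range(a, b + 1))  # = zeta(s) * (Phi_s(b) - Phi_s(a-1))
    return (1.0 / x**s) / Z

# The pmf sums to one over its support.
total = sum(truncated_zeta_pmf(x, 2.0, 3, 50) for x in range(3, 51))
assert abs(total - 1.0) < 1e-12
```

With $a = 1$ this recovers a (right-truncated) Zipf distribution.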
The Chernoff information [29] is defined by $C[p_1, p_2] = -\log \min_{\alpha \in (0,1)} I_\alpha[p_1 : p_2]$. The unique optimal value $\alpha^*$ maximizing the Chernoff $\alpha$-divergences $C_\alpha[p_1, p_2] = -\log I_\alpha[p_1 : p_2]$ is called the Chernoff exponent [29], owing to its role in bounding the probability of error in Bayesian hypothesis testing. When both pdfs or pmfs belong to the same exponential family, we have [29]
$$C[p_{\theta_1}, p_{\theta_2}] = J_{F,\alpha^*}(\theta_1 : \theta_2) = B_F\big(\theta_1 : (\theta_1\theta_2)_{\alpha^*}\big) = B_F\big(\theta_2 : (\theta_1\theta_2)_{\alpha^*}\big),$$
where $B_F$ denotes the Bregman divergence (corresponding to the KLD) and $(\theta_1\theta_2)_{\alpha^*} = \alpha^* \theta_1 + (1-\alpha^*) \theta_2$. For a uni-order exponential family such as the zeta distributions, a closed-form formula for the optimal Chernoff exponent $\alpha^*$ is reported in [29]:
$$\alpha^* = \frac{(F')^{-1}\left(\frac{F(\theta_2) - F(\theta_1)}{\theta_2 - \theta_1}\right) - \theta_2}{\theta_1 - \theta_2}.$$
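The defining property of the Chernoff exponent, that the two Bregman divergences to the interpolated parameter coincide at $\alpha^*$, can be verified numerically for the zeta family. The sketch below finds $\alpha^*$ by ternary search on the Jensen divergence (which is strictly concave in $\alpha$) rather than by inverting $F'$; all helper names are ours:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def zeta_fn(s, terms=10_000):
    # Truncated series for the real Riemann zeta function, s > 1.
    return sum(1.0 / i**s for i in range(1, terms + 1))

def F(theta):
    return math.log(zeta_fn(theta))

def jensen(alpha, t1, t2):
    # Skewed Jensen divergence J_{F,alpha}(t1 : t2).
    return alpha * F(t1) + (1 - alpha) * F(t2) - F(alpha * t1 + (1 - alpha) * t2)

def bregman(t_from, t_to, h=1e-6):
    # B_F(t_from : t_to), with a central-difference derivative of F.
    Fp = (F(t_to + h) - F(t_to - h)) / (2 * h)
    return F(t_from) - F(t_to) - (t_from - t_to) * Fp

t1, t2 = 2.0, 6.0
lo, hi = 1e-6, 1 - 1e-6
for _ in range(100):  # ternary search: J_{F,alpha} is strictly concave in alpha
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if jensen(m1, t1, t2) < jensen(m2, t1, t2):
        lo = m1
    else:
        hi = m2
alpha_star = (lo + hi) / 2
t_star = alpha_star * t1 + (1 - alpha_star) * t2
# At alpha*, both Bregman divergences to the interpolated parameter agree
# and equal the Chernoff information.
assert abs(bregman(t1, t_star) - bregman(t2, t_star)) < 1e-6
```

Ternary search is valid here because $\frac{\mathrm{d}^2}{\mathrm{d}\alpha^2} J_{F,\alpha}(\theta_1:\theta_2) = -(\theta_1-\theta_2)^2 F''(\theta_\alpha) < 0$.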
The Sharma–Mittal divergences [30] between two densities $p$ and $q$ form a biparametric family of relative entropies defined by
$$D_{\alpha,\beta}[p : q] = \frac{1}{\beta - 1}\left(\left(\int p(x)^\alpha\, q(x)^{1-\alpha}\, \mathrm{d}x\right)^{\frac{1-\beta}{1-\alpha}} - 1\right), \quad \alpha > 0,\ \alpha \neq 1,\ \beta \neq 1.$$
The Sharma–Mittal divergences are induced by the Sharma–Mittal entropies, which unify the extensive Rényi entropies with the non-extensive Tsallis entropies [30]. They include the Rényi divergences ($\beta \to 1$), the Tsallis divergences ($\beta \to \alpha$), and, in the limit case $\alpha, \beta \to 1$, the Kullback–Leibler divergence [31]. When both densities $p = p_{\theta_1}$ and $q = p_{\theta_2}$ belong to the same exponential family, we have the following closed-form formula [31]:
$$D_{\alpha,\beta}[p_{\theta_1} : p_{\theta_2}] = \frac{1}{\beta - 1}\left(e^{-\frac{1-\beta}{1-\alpha} J_{F,\alpha}(\theta_1 : \theta_2)} - 1\right).$$
Thus, we get the following theorem:
Theorem 2.
For $\alpha > 0$, $\alpha \neq 1$, $\beta \neq 1$, the Sharma–Mittal divergence between two zeta distributions $p_{s_1}$ and $p_{s_2}$ is
$$D_{\alpha,\beta}[p_{s_1} : p_{s_2}] = \frac{1}{\beta - 1}\left(\left(\frac{\zeta(\alpha s_1 + (1-\alpha) s_2)}{\zeta(s_1)^\alpha\, \zeta(s_2)^{1-\alpha}}\right)^{\frac{1-\beta}{1-\alpha}} - 1\right).$$
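As a quick check of Theorem 2, the $\beta \to 1$ limit of the Sharma–Mittal divergence should recover the Rényi divergence $J_{F,\alpha}/(1-\alpha)$; a numerical sketch with assumed helper names:

```python
import math

def zeta_fn(s, terms=100_000):
    # Truncated series for the real Riemann zeta function, s > 1.
    return sum(1.0 / i**s for i in range(1, terms + 1))

def sharma_mittal(s1, s2, alpha, beta):
    # Theorem 2: closed-form Sharma-Mittal divergence between zeta distributions.
    ratio = zeta_fn(alpha * s1 + (1 - alpha) * s2) / (
        zeta_fn(s1)**alpha * zeta_fn(s2)**(1 - alpha))
    return (ratio**((1 - beta) / (1 - alpha)) - 1) / (beta - 1)

def renyi(s1, s2, alpha):
    # Renyi divergence J_{F,alpha}(s1 : s2) / (1 - alpha) for the zeta family.
    J = (alpha * math.log(zeta_fn(s1)) + (1 - alpha) * math.log(zeta_fn(s2))
         - math.log(zeta_fn(alpha * s1 + (1 - alpha) * s2)))
    return J / (1 - alpha)

# beta -> 1 recovers the Renyi divergence.
assert abs(sharma_mittal(3, 7, 0.5, 1 + 1e-7) - renyi(3, 7, 0.5)) < 1e-6
```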

3. The Kullback–Leibler Divergence between Two Zeta Distributions

It is well known that the KLD between two probability mass functions of an exponential family amounts to a reverse Bregman divergence induced by the cumulant function [32]: $D_{\mathrm{KL}}[p_{s_1} : p_{s_2}] = B_F^*(\theta_1 : \theta_2) := B_F(\theta_2 : \theta_1)$ (with $\theta_1 = s_1$ and $\theta_2 = s_2$). Furthermore, this Bregman divergence amounts to a Fenchel–Young divergence [33], so that we have
$$D_{\mathrm{KL}}[p_{s_1} : p_{s_2}] = B_F(\theta_2 : \theta_1) = F(\theta(s_2)) + F^*(\eta(s_1)) - \theta(s_2)\, \eta(s_1),$$
where $F^*(\eta)$ denotes the Legendre convex conjugate of $F$, $\theta(s) = s$, and $\eta(s) = F'(\theta(s)) = E_{p_s}[t(x)] = E_{p_s}[-\log x]$; see [7]. Moreover, the convex conjugate $F^*(\eta(s))$ corresponds to the negentropy [34]: $F^*(\eta(s)) = -H[p_s]$, where the entropy of a zeta distribution $p_s$ is defined by:
$$H[p_s] := \sum_{i=1}^{\infty} p_s(i) \log \frac{1}{p_s(i)}.$$
Using the fact that $\sum_{i=1}^{\infty} p_s(i) = 1 = \sum_{i=1}^{\infty} \frac{1}{i^s \zeta(s)}$, we can express the entropy as follows:
$$H[p_s] = \sum_{i=1}^{\infty} \frac{1}{i^s \zeta(s)} \log i^s + \log(\zeta(s)) \sum_{i=1}^{\infty} \frac{1}{i^s \zeta(s)} = \sum_{i=1}^{\infty} \frac{1}{i^s \zeta(s)} \log\big(i^s \zeta(s)\big).$$
Since $F(\theta) = \log \zeta(\theta)$, we have $\eta(\theta) = F'(\theta) = \frac{\zeta'(\theta)}{\zeta(\theta)}$. The function $\frac{\zeta'(\theta)}{\zeta(\theta)}$ has been tabulated in [35] (page 400). Notice that the maximum likelihood estimator [7] from $n$ independently and identically distributed observations $x_1, \ldots, x_n$ is $\hat{\eta} = \frac{1}{n} \sum_{i=1}^n t(x_i)$. Thus we have:
$$\hat{\eta} = \frac{\zeta'(\hat{\theta})}{\zeta(\hat{\theta})} = -\frac{1}{n} \sum_{i=1}^n \log x_i.$$
The inverse of the zeta function $\zeta^{-1}(\cdot)$ has been studied in [36].
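The MLE equation $\hat{\eta} = \zeta'(\hat{\theta})/\zeta(\hat{\theta}) = -\frac{1}{n}\sum_i \log x_i$ can be solved numerically, e.g. by bisection, since $\eta(\theta)$ is strictly increasing on $(1,\infty)$ (its derivative is the Fisher information). A sketch with truncated series; the helper names are ours:

```python
import math

def zeta_fn(s, terms=10_000):
    # Truncated series for the real Riemann zeta function, s > 1.
    return sum(1.0 / i**s for i in range(1, terms + 1))

def zeta_prime(s, terms=10_000):
    # zeta'(s) = -sum log(i) / i^s
    return -sum(math.log(i) / i**s for i in range(1, terms + 1))

def mle_shape(samples):
    # Solve eta(theta) = zeta'(theta)/zeta(theta) = -(1/n) sum log x_i by
    # bisection; eta increases from -inf (theta -> 1+) to 0- (theta -> inf).
    target = -sum(math.log(x) for x in samples) / len(samples)
    lo, hi = 1.0 + 1e-6, 50.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if zeta_prime(mid) / zeta_fn(mid) < target:
            lo = mid   # eta(mid) too small: increase theta
        else:
            hi = mid
    return 0.5 * (lo + hi)

s_hat = mle_shape([1, 1, 1, 2, 1, 3, 1, 1, 2, 1])
assert 1.0 < s_hat < 50.0
```

Samples concentrated on small integers yield a large estimated exponent; heavier-tailed samples pull the estimate toward 1.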
Proposition 1
(KLD between zeta distributions). The Kullback–Leibler divergence between two zeta distributions can be written as:
$$D_{\mathrm{KL}}[p_{s_1} : p_{s_2}] = \log(\zeta(s_2)) - H[p_{s_1}] + s_2\, E_{p_{s_1}}[\log x] = \log(\zeta(s_2)) - \sum_{i=1}^{\infty} \frac{1}{i^{s_1} \zeta(s_1)} \log\big(i^{s_1} \zeta(s_1)\big) - s_2\, \frac{\zeta'(s_1)}{\zeta(s_1)}.$$
Moreover, the logarithmic derivative of the zeta function can be expressed using the von Mangoldt function [37] (page 1850) for $\theta > 1$:
$$\eta(\theta) = \frac{\zeta'(\theta)}{\zeta(\theta)} = -\sum_{i=1}^{\infty} \frac{\Lambda(i)}{i^\theta},$$
where $\Lambda(i) = \log p$ if $i = p^k$ for some prime $p$ and integer $k \geq 1$, and $\Lambda(i) = 0$ otherwise. Notice that the zeta function can be calculated using the Euler product formula: $\zeta(\theta) = \prod_{p\ \mathrm{prime}} \frac{1}{1 - p^{-\theta}}$.
Theorem 3.
The Kullback–Leibler divergence between two zeta distributions can be expressed using the real zeta function $\zeta$ and the von Mangoldt function $\Lambda$ as:
$$D_{\mathrm{KL}}[p_{s_1} : p_{s_2}] = \log(\zeta(s_2)) - \sum_{i=1}^{\infty} \frac{1}{i^{s_1} \zeta(s_1)} \log\big(i^{s_1} \zeta(s_1)\big) + s_2 \sum_{i=1}^{\infty} \frac{\Lambda(i)}{i^{s_1}}.$$
Example 4.
Consider $s_1 = 4$ and $s_2 = 12$. Letting $1-\epsilon = 0.9999$ and using Corollary 1, we obtain
$$D_{\mathrm{KL}}[p_{s_1} : p_{s_2}] \approx D_{1-\epsilon}[p_{s_1} : p_{s_2}] = 0.430479743738878.$$
Let us now calculate the KLD using Theorem 3; we obtain $\log(\zeta(s_2)) = \log \frac{691\pi^{12}}{638512875}$, $H[p_{s_1}] \approx 0.3337829096182664$ (using 100 terms), and $\eta(s_1) \approx -0.06366938697034288$ (using 100 terms), so that we have
$$D_{\mathrm{KL}}[p_{s_1} : p_{s_2}] = \log(\zeta(s_2)) - \sum_{i=1}^{\infty} \frac{1}{i^{s_1} \zeta(s_1)} \log\big(i^{s_1} \zeta(s_1)\big) + s_2 \sum_{i=1}^{\infty} \frac{\Lambda(i)}{i^{s_1}} \approx 0.430495790304827.$$
It is well known that the KLD between two arbitrarily close zeta distributions $p_s$ and $p_{s+\mathrm{d}s}$ amounts to half of the quadratic distance induced by the Fisher information:
$$D_{\mathrm{KL}}[p_s : p_{s+\mathrm{d}s}] \approx \frac{1}{2}\, I(s)\, \mathrm{d}s^2,$$
where
$$I(s) = E_{p_s}\big[\big((\log p_s(x))'\big)^2\big] = -E_{p_s}\big[(\log p_s(x))''\big],$$
and where the first-order and second-order derivatives are taken with respect to the parameter $s$. Thus, for uni-order exponential families, the Fisher information is
$$I(s) = -E_{p_s}\big[(\log p_s(x))''\big] = (\log \zeta(s))'' = \frac{\zeta''(s)\, \zeta(s) - \zeta'(s)^2}{\zeta^2(s)}.$$
This second-order derivative $(\log \zeta(s))''$ has been studied in [38]. We have
$$I(s) = \sum_{n=1}^{\infty} \frac{\Lambda(n) \log(n)}{n^s},$$
where $\Lambda(n)$ is the von Mangoldt function.
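The von Mangoldt series for $I(s)$ can be cross-checked against a finite-difference second derivative of $\log \zeta$; a sketch (helper names assumed, not from the paper):

```python
import math

def zeta_fn(s, terms=10_000):
    # Truncated series for the real Riemann zeta function, s > 1.
    return sum(1.0 / i**s for i in range(1, terms + 1))

def von_mangoldt(n):
    # Lambda(n) = log p if n = p^k for a prime p and k >= 1, else 0.
    if n < 2:
        return 0.0
    for p in range(2, math.isqrt(n) + 1):
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return math.log(n)  # n is prime

def fisher_information(s, terms=10_000):
    # I(s) = sum_n Lambda(n) log(n) / n^s = (log zeta)''(s)
    return sum(von_mangoldt(n) * math.log(n) / n**s for n in range(2, terms + 1))

# Central-difference second derivative of log zeta at s = 3.
s, h = 3.0, 1e-3
numeric = (math.log(zeta_fn(s + h)) - 2 * math.log(zeta_fn(s))
           + math.log(zeta_fn(s - h))) / h**2
assert abs(fisher_information(s) - numeric) < 1e-4
```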

4. Comparison of the Zeta Family with a Pareto Subfamily

The zeta distribution is also called the “pure power-law distribution” in the literature [2].
We can compute the $\alpha$-divergences between two Pareto distributions $q_{s_1}$ and $q_{s_2}$ with fixed scale $1$ and respective shapes $s_1 - 1$ and $s_2 - 1$. The Pareto density is $q_s(x) = \frac{s-1}{x^s}$ for $x \in \mathcal{X} = (1, \infty)$. The family of such Pareto distributions forms a continuous exponential family with natural parameter $\theta = s$, sufficient statistic $t(x) = -\log(x)$, and convex cumulant function $F(\theta) = -\log(\theta - 1)$ for $\theta \in \Theta = (1, \infty)$. Thus we have [28]:
$$I_\alpha[q_{s_1} : q_{s_2}] = \int q_{s_1}(x)^\alpha\, q_{s_2}(x)^{1-\alpha}\, \mathrm{d}x = \exp\big(-J_{F,\alpha}(\theta_1 : \theta_2)\big) = \frac{(s_1 - 1)^\alpha\, (s_2 - 1)^{1-\alpha}}{\alpha s_1 + (1-\alpha) s_2 - 1},$$
and we obtain the following closed form for the $\alpha$-divergences between two Pareto distributions $q_{s_1}$ and $q_{s_2}$:
$$D_\alpha[q_{s_1} : q_{s_2}] = \frac{1}{\alpha(1-\alpha)}\left(1 - \frac{(s_1 - 1)^\alpha\, (s_2 - 1)^{1-\alpha}}{\alpha s_1 + (1-\alpha) s_2 - 1}\right).$$
The moment parameter is $\eta(\theta) = F'(\theta) = -\frac{1}{\theta - 1}$, so that $\theta(\eta) = 1 - \frac{1}{\eta}$ and $F^*(\eta) = \theta(\eta)\, \eta - F(\theta(\eta)) = \eta - 1 - \log(-\eta)$. It follows that the KLD is
$$D_{\mathrm{KL}}[q_{s_1} : q_{s_2}] = B_F(\theta_2 : \theta_1) = \log \frac{s_1 - 1}{s_2 - 1} + \frac{s_2 - s_1}{s_1 - 1}.$$
The differential entropy of the Pareto distribution $q_s$ is
$$h[q_s] = -\int_1^{\infty} q_s(x) \log q_s(x)\, \mathrm{d}x = -F^*(\eta(s)),$$
with $\eta(s) = -\frac{1}{s-1}$. We find that
$$h[q_s] = 1 + \frac{1}{s-1} - \log(s-1).$$
Example 5.
For comparison, we calculate the KLD between two Pareto distributions with parameters $s_1 = 4$ and $s_2 = 12$. We find
$$D_{\mathrm{KL}}[q_{s_1} : q_{s_2}] = \log \frac{3}{11} + \frac{8}{3} \approx 1.367383682536406.$$
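The closed form of Example 5 can be double-checked by Monte Carlo, sampling from $q_{s_1}$ by inverse transform ($X = U^{-1/(s-1)}$ for $U$ uniform on $(0,1)$); the helper names are ours:

```python
import math
import random

def kl_pareto(s1, s2):
    # Closed-form KLD between unit-scale Pareto densities q_s(x) = (s-1) x^{-s}.
    return math.log((s1 - 1) / (s2 - 1)) + (s2 - s1) / (s1 - 1)

def kl_pareto_mc(s1, s2, n=200_000, seed=0):
    # Monte Carlo estimate of E_{q_{s1}}[log(q_{s1}(X) / q_{s2}(X))].
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.random() ** (-1.0 / (s1 - 1))  # inverse-transform sampling
        total += math.log((s1 - 1) / (s2 - 1)) + (s2 - s1) * math.log(x)
    return total / n

exact = math.log(3 / 11) + 8 / 3   # Example 5: s1 = 4, s2 = 12
assert abs(kl_pareto(4, 12) - exact) < 1e-12
assert abs(kl_pareto_mc(4, 12) - exact) < 0.05
```

Comparing with Example 4, the continuous Pareto KLD ($\approx 1.3674$) is substantially larger than the zeta KLD ($\approx 0.4305$) for the same parameters $(s_1, s_2) = (4, 12)$.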

5. Conclusions

Table 1 compares the discrete exponential family of zeta distributions with the continuous exponential family of Pareto distributions with fixed scale 1.
In general, it is interesting to consider discrete counterparts of continuous exponential families. For example, the discrete Gaussian distributions (or discrete normal distributions), defined as maximum entropy distributions, have been studied in [39,40]. The log-normalizer (or cumulant function) of the discrete Gaussian distributions is related to the Riemann theta function [41]. Given a prescribed sufficient statistic $t(x)$, we may define the continuous exponential family with respect to the Lebesgue measure $\mu$ as the set of probability density functions $p(x)$ maximizing the differential entropy under the moment constraint $E_p[t(x)] = \eta$. The corresponding discrete exponential family is obtained from the probability mass functions maximizing the Shannon entropy under the same moment constraint $E_p[t(x)] = \eta$.
Additional material is available online at https://franknielsen.github.io/ZetaParetoExpFam/index.html (accessed on 18 October 2022).

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Kotz, S.; Balakrishnan, N.; Read, C.; Vidakovic, B. Encyclopedia of Statistical Sciences; Wiley: Hoboken, NJ, USA, 2005; Volume 15.
  2. Goldstein, M.L.; Morris, S.A.; Yen, G.G. Problems with fitting to the power-law distribution. Eur. Phys. J. Condens. Matter Complex Syst. 2004, 41, 255–258.
  3. Titchmarsh, E.C.; Heath-Brown, D.R.; Titchmarsh, E.C.T. The Theory of the Riemann Zeta-Function; Oxford University Press: Oxford, UK, 1986.
  4. Tempesta, P. Group entropies, correlation laws, and zeta functions. Phys. Rev. E 2011, 84, 021121.
  5. Iwaniec, H. Lectures on the Riemann Zeta Function; American Mathematical Society: Providence, RI, USA, 2014; Volume 62.
  6. Nielsen, F. A note on some information-theoretic divergences between Zeta distributions. arXiv 2021, arXiv:2104.10548.
  7. Barndorff-Nielsen, O. Information and Exponential Families in Statistical Theory; John Wiley & Sons: Hoboken, NJ, USA, 2014.
  8. Apéry, R. Irrationalité de ζ(2) et ζ(3). Astérisque 1979, 61, 1.
  9. Rivoal, T. La fonction zêta de Riemann prend une infinité de valeurs irrationnelles aux entiers impairs. C. R. de l'Académie des Sci.-Ser. I-Math. 2000, 331, 267–270.
  10. Fischler, S.; Sprang, J.; Zudilin, W. Many odd zeta values are irrational. Compos. Math. 2019, 155, 938–952.
  11. Graham, R.L.; Knuth, D.E.; Patashnik, O. Concrete Mathematics: A Foundation for Computer Science; Addison-Wesley Professional: Boston, MA, USA, 1994.
  12. Hiary, G.A. Fast methods to compute the Riemann zeta function. Ann. Math. 2011, 174, 891–946.
  13. Johansson, F. Rigorous high-precision computation of the Hurwitz zeta function and its derivatives. Numer. Algorithms 2015, 69, 253–270.
  14. Yildirim, C.Y. A Note on ζ′′(s) and ζ′′′(s). Proc. Am. Math. Soc. 1996, 124, 2311–2314.
  15. Powers, D.M. Applications and explanations of Zipf's law. In New Methods in Language Processing and Computational Natural Language Learning; ACL Anthology: Cambridge, MA, USA, 1998.
  16. Mandelbrot, B. Information Theory and Psycholinguistics: A Theory of Word Frequencies; Readings in Mathematical Social Sciences; MIT Press: Cambridge, MA, USA, 1966.
  17. Lovričević, N.; Pečarić, D.; Pečarić, J. Zipf–Mandelbrot law, f-divergences and the Jensen-type interpolating inequalities. J. Inequalities Appl. 2018, 2018, 36.
  18. Naldi, M. Approximation of the truncated Zeta distribution and Zipf's law. arXiv 2015, arXiv:1511.01480.
  19. Hu, C.Y.; Iksanov, A.M.; Lin, G.D.; Zakusylo, O.K. The Hurwitz zeta distribution. Aust. N. Z. J. Stat. 2006, 48, 1–6.
  20. Deluca, A.; Corral, Á. Fitting and goodness-of-fit test of non-truncated and truncated power-law distributions. Acta Geophys. 2013, 61, 1351–1394.
  21. Nielsen, F. Statistical Divergences between Densities of Truncated Exponential Families with Nested Supports: Duo Bregman and Duo Jensen Divergences. Entropy 2022, 24, 421.
  22. Saito, S.; Tanaka, T. A note on infinite divisibility of zeta distributions. Appl. Math. Sci. 2012, 6, 1455–1461.
  23. Wang, T.; Zhang, W.; Maunder, R.G.; Hanzo, L. Near-capacity joint source and channel coding of symbol values from an infinite source set using Elias gamma error correction codes. IEEE Trans. Commun. 2013, 62, 280–292.
  24. Oosawa, T.; Matsuda, T. SQL injection attack detection method using the approximation function of zeta distribution. In Proceedings of the 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC), San Diego, CA, USA, 5–8 October 2014; pp. 819–824.
  25. Doray, L.G.; Luong, A. Quadratic distance estimators for the zeta family. Insur. Math. Econ. 1995, 16, 255–260.
  26. Jung, H.; Phoa, F.K.H. A Mixture Model of Truncated Zeta Distributions with Applications to Scientific Collaboration Networks. Entropy 2021, 23, 502.
  27. Cichocki, A.; Amari, S.-i. Families of alpha-, beta- and gamma-divergences: Flexible and robust measures of similarities. Entropy 2010, 12, 1532–1568.
  28. Nielsen, F.; Boltz, S. The Burbea–Rao and Bhattacharyya centroids. IEEE Trans. Inf. Theory 2011, 57, 5455–5466.
  29. Nielsen, F. An information-geometric characterization of Chernoff information. IEEE Signal Process. Lett. 2013, 20, 269–272.
  30. Sharma, B.D.; Mittal, D.P. New non-additive measures of entropy for discrete probability distributions. J. Math. Sci. 1975, 10, 28–40.
  31. Nielsen, F.; Nock, R. A closed-form expression for the Sharma–Mittal entropy of exponential families. J. Phys. A Math. Theor. 2011, 45, 032003.
  32. Azoury, K.S.; Warmuth, M.K. Relative loss bounds for on-line density estimation with the exponential family of distributions. Mach. Learn. 2001, 43, 211–246.
  33. Nielsen, F. On geodesic triangles with right angles in a dually flat space. In Progress in Information Geometry; Springer: Berlin/Heidelberg, Germany, 2021; pp. 153–190.
  34. Nielsen, F.; Nock, R. Entropies and cross-entropies of exponential families. In Proceedings of the 2010 IEEE International Conference on Image Processing, Washington, DC, USA, 11–12 November 2010; pp. 3621–3624.
  35. Walther, A. Anschauliches zur Riemannschen Zetafunktion. Acta Math. 1926, 48, 393–400.
  36. Kawalec, A. The inverse Riemann zeta function. arXiv 2021, arXiv:2106.06915.
  37. Weisstein, E.W. CRC Concise Encyclopedia of Mathematics; CRC Press: Boca Raton, FL, USA, 2002.
  38. Stopple, J. Notes on log(ζ(s))′′. Rocky Mt. J. Math. 2016, 46, 1701–1715.
  39. Agostini, D.; Améndola, C. Discrete Gaussian distributions via theta functions. SIAM J. Appl. Algebra Geom. 2019, 3, 1–30.
  40. Nielsen, F. The Kullback–Leibler Divergence Between Lattice Gaussian Distributions. J. Indian Inst. Sci. 2022, 1–12.
  41. Deconinck, B.; Heil, M.; Bobenko, A.; Van Hoeij, M.; Schmies, M. Computing Riemann theta functions. Math. Comput. 2004, 73, 1417–1442.
Figure 1. Plot of $F(\theta) = \log \zeta(\theta)$, a strictly convex and analytic function.
Table 1. Comparisons between the zeta family and the Pareto subfamily, both univariate uni-order exponential families $\exp(\theta t(x) - F(\theta))$. The function $\zeta(s)$ is the real Riemann zeta function.

| Quantity | Zeta distribution (discrete EF) | Pareto distribution (continuous EF) |
|---|---|---|
| PMF/PDF | $p_s(x) = \frac{1}{x^s \zeta(s)}$ | $q_s(x) = \frac{s-1}{x^s}$ |
| Support $\mathcal{X}$ | $\mathbb{N} = \{1, 2, \ldots\}$ | $(1, \infty)$ |
| Natural parameter $\theta$ | $s$ | $s$ |
| Cumulant $F(\theta)$ | $\log \zeta(\theta)$ | $-\log(\theta - 1)$ |
| Sufficient statistic $t(x)$ | $-\log x$ | $-\log x$ |
| Moment parameter $\eta$ | $\frac{\zeta'(\theta)}{\zeta(\theta)}$ | $-\frac{1}{s-1}$ |
| Conjugate $F^*(\eta)$ | $-H[p_s] = -\sum_{i=1}^{\infty} \frac{1}{i^s \zeta(s)} \log(i^s \zeta(s))$ | $\eta - 1 - \log(-\eta)$ |
| Maximum likelihood estimator | $\hat{\eta} = \frac{\zeta'(\hat{\theta})}{\zeta(\hat{\theta})} = -\frac{1}{n}\sum_{i=1}^n \log x_i$ | $\hat{s} = 1 + \frac{n}{\sum_{i=1}^n \log x_i}$ |
| Fisher information | $\sum_{i=1}^{\infty} \frac{\Lambda(i) \log(i)}{i^s}$ | $\frac{1}{(s-1)^2}$ |
| Entropy $-F^*(\eta(s))$ | $\sum_{i=1}^{\infty} \frac{1}{i^s \zeta(s)} \log(i^s \zeta(s))$ | $1 + \frac{1}{s-1} - \log(s-1)$ |
| Bhattacharyya coefficient $I_\alpha$ | $\frac{\zeta(\alpha s_1 + (1-\alpha) s_2)}{\zeta(s_1)^\alpha \zeta(s_2)^{1-\alpha}}$ | $\frac{(s_1-1)^\alpha (s_2-1)^{1-\alpha}}{\alpha s_1 + (1-\alpha) s_2 - 1}$ |
| Kullback–Leibler divergence | $\log(\zeta(s_2)) - \sum_{i=1}^{\infty} \frac{1}{i^{s_1} \zeta(s_1)} \log(i^{s_1} \zeta(s_1)) - s_2 \frac{\zeta'(s_1)}{\zeta(s_1)}$ | $\log \frac{s_1-1}{s_2-1} + \frac{s_2-s_1}{s_1-1}$ |

Nielsen, F. Comparing the Zeta Distributions with the Pareto Distributions from the Viewpoint of Information Theory and Information Geometry: Discrete versus Continuous Exponential Families of Power Laws. Phys. Sci. Forum 2022, 5, 2. https://doi.org/10.3390/psf2022005002
