Modelling the Frequency of Interarrival Times and Rainfall Depths with the Poisson Hurwitz-Lerch Zeta Distribution

Carmelo Agnese; Giorgio Baiamonte; Elvira Di Nardo; Stefano Ferraris; Tommaso Martini

doi:10.3390/fractalfract6090509

¹ Department of Agricultural, Food and Forest Sciences, Università di Palermo, Viale delle Scienze, 13, 90128 Palermo, Italy

² Department of Mathematics “G. Peano”, Università degli Studi di Torino, Via Carlo Alberto, 10, 10123 Torino, Italy

³ Interuniversity Department of Regional and Urban Studies and Planning, Università degli Studi di Torino, Politecnico di Torino, Viale Pier Andrea Mattioli, 39, 10125 Torino, Italy

* Author to whom correspondence should be addressed.

Fractal Fract. 2022, 6(9), 509; https://doi.org/10.3390/fractalfract6090509

This article belongs to the Special Issue Stochastic Modeling in Biological System

Abstract

The Poisson-stopped sum of the Hurwitz–Lerch zeta distribution is proposed as a model for interarrival times and rainfall depths. Its theoretical properties and characterizations are investigated and compared with those of two other models used for the same task: the Hurwitz–Lerch zeta distribution and the one-inflated Hurwitz–Lerch zeta distribution. Within this framework, the capability of these three distributions to fit the main statistical features of rainfall time series was tested on a dataset never previously considered in the literature and chosen to represent very different climates from the point of view of rainfall characteristics. The results point to the Hurwitz–Lerch zeta distribution as a natural framework for rainfall modelling, with the additional random convolution induced by the Poisson-stopped model acting as a further refinement. Indeed, the Poisson contribution allows more flexibility and detail in reproducing statistical features, even in the presence of very different climates.

Keywords:

Hurwitz-Lerch zeta distribution; log-concavity; compound Poisson distribution; one-inflated model; moment; simulated annealing

1. Introduction

Analysis of rainfall data, and the subsequent modelling of the many variables concerning rainfall, is fundamental to many areas such as agricultural, ecological and engineering disciplines. From assessing hydrological risk to crop and hydropower planning, rainfall modelling is of the utmost importance. Moreover, reliable rainfall modelling is essential in addressing the well-known issue of climate change. Due to the complexity of hydrological systems, their analysis and modelling rely heavily on historical records. Rainfall historical records come at various time scales, from hourly to annual data. However, daily rainfall series are arguably the most used information in environmental, climate, hydrological, and water resources studies [1]. Rainfall manifests one peculiar characteristic which is common to many other geophysical processes: intermittence [2]. Intermittence is found in variables related to the internal and external structure of rainfall. The variables most commonly considered for the external structure are the Wet Spells (WS) and Dry Spells (DS), meaning the sequences of rainy days and non-rainy days, respectively. One way of studying the alternation of WS and DS is through the Interarrival Times (IT), that is, the time elapsed between two consecutive rainy days. If we suppose that the IT observations are independent and identically distributed (i.i.d.), one natural way to model them is through the well-known renewal processes [3]. Many examples can be found in the literature. The simplest renewal process, the Bernoulli process, has been used in [4], for example. In this case, the IT’s are geometrically distributed. Its continuous counterpart, the Poisson process, has been used for its greater mathematical tractability, but requires treating the IT random variable (r.v.) as continuous, despite its discrete nature. Allowing for a non-constant probability of rain requires slightly more sophisticated models.

The challenge of this paper is to propose a suitable discrete distribution to fit IT at the daily scale. It is on this time scale that the intermittent character of precipitation can be appreciated and, at the same time, most practical applications are possible. The proposed distribution must be able to model both the numerous occurrences of the value one, which represent sequences of rainy days, and some large values scattered over time and responsible for drought phenomena. Our starting point is the three-parameter Hurwitz–Lerch Zeta distribution (HLZD) family, successfully proposed in [5]. Such a distribution represents a step forward with respect to other distributions commonly used to model IT, such as the logarithmic one. In Section 3, we summarize the main properties of the HLZD and state new results on its log-concavity and convolution. As a further step, in this paper we propose to model the IT r.v. using the Poisson-stopped HLZD (PSHLZD). This discrete distribution presents excess zeroes (paralleling the excess of IT = 1) and a long tail [6]. The PSHLZD has been used in [6] for comparisons with the negative binomial distribution, a popular model for fitting over-dispersed data. Indeed, the PSHLZD can be seen both as a Poisson-stopped sum of HLZD’s and as a generalization of the negative binomial distribution. The Poisson contribution allows us to model the superposition of i.i.d. HLZD’s in the observed time series as a rare event. In Section 4, we summarize its main properties using the combinatorics of exponential Bell polynomials. It is worth mentioning that Bell polynomials are used within fractional calculus, see for example [7,8], and within fractal models [9]. Moreover, new results on the PSHLZD are added, for example on log-concavity.

A second goal of this paper is to show that the PSHLZD is also a suitable model for a different feature strictly related to the internal structure of rainfall: the depth (or the intensity) of the rainy days [10]. In the literature [11,12], rainfall depths are more often treated as continuous, even though such models sometimes fail to account for the time discreteness of the sample process [13]. Daily rainfall depth measurements are almost always performed by automatically counting how many times a small bucket corresponding to 0.2 mm is filled. This led us to treat them as discrete, because of the abundance of ties in the data. Finally, in Section 5 we also consider a third modelling distribution: the One Inflated HLZD (OIHLZD). Such a distribution mixes two generating processes: the first generates ones and the second is governed by an HLZD. This stochastic structure takes into account the dominance of ones in the rainfall depth or interarrival time series.

In Section 6, we discuss the results of fitting all these models to rainfall data, showing that the PSHLZD provides a very general framework for rainfall modelling. Indeed, the PSHLZD replicates the fitting features of the OIHLZD and outperforms the fitted HLZD in some cases. The PSHLZD has a limited number of parameters and at the same time adapts very well to data collected in very different climates, from England to Sicily. We underline that the analyzed dataset has never been considered in the literature and consists of measurements sampled over 70 years at 5 different stations. These stations were chosen to represent different climates from the point of view of rainfall characteristics. In fact, the interarrival data examined are very different from each other, with a regular pattern of many rainy days in England, and a rainy winter season alternating with long rainless periods in summer, typical of the Mediterranean climate of Sicily. The same holds for the rainfall depth, namely many small depths in England and few big storms in Sicily. This made it possible to confirm the utility of the proposed statistical models within rainfall modelling. Some concluding remarks and future developments are addressed at the end of the paper.

2. Bell Polynomials in a Nutshell

The partial exponential Bell polynomials are usually written as [14]

B_{n, j} (z_{1}, \dots, z_{n - j + 1}) = \sum \frac{n!}{{(1!)}^{r_{1}} r_{1}! {(2!)}^{r_{2}} r_{2}! \dots} z_{1}^{r_{1}} \dots z_{n - j + 1}^{r_{n - j + 1}}, \qquad n \in N, j \leq n

(1)

where the summation is over all the solutions in non-negative integers

r_{1}, \dots, r_{n - j + 1}

of

r_{1} + 2 r_{2} + \dots + (n - j + 1) r_{n - j + 1} = n

and

r_{1} + r_{2} + \dots + r_{n - j + 1} = j .

A lighter expression is obtained using partitions of the integer n with length

j .

Recall that a partition of an integer n is a sequence

π = (π_{1}, π_{2}, \dots)

of weakly decreasing positive integers, named parts of

π,

such that

π_{1} + π_{2} + \dots = n .

A different notation is

π = (1^{r_{1}}, 2^{r_{2}}, \dots),

where

r_{1}, r_{2}, \dots,

named multiplicities of

π,

are the number of parts of

π

equal to

1, 2, \dots

respectively. The length of the partition is

l (π) = r_{1} + r_{2} + \dots

and the vector of multiplicities is

m (π) = (r_{1}, r_{2}, \dots) .

We write

π ⊢ n

to denote that

π

is a partition of

n .

Thus the partial exponential Bell polynomials (1) can be rewritten as [15]

B_{n, j} (z_{1}, \dots, z_{n - j + 1}) = \sum_{π ⊢ n, l (π) = j} d_{π} z_{π}

(2)

where the sum is over all the partitions

π ⊢ n

with length

l (π) = j

and

z_{π} = z_{1}^{r_{1}} z_{2}^{r_{2}} \dots, \qquad d_{π} = \frac{n!}{{(1!)}^{r_{1}} r_{1}! {(2!)}^{r_{2}} r_{2}! \dots} .

(3)

Using integer partitions, the explicit expression of the partial exponential Bell polynomials can be recovered in R using the kStatistics package [16]. A useful property used in the following is

B_{n, j} (a b z_{1}, \dots, a b^{n - j + 1} z_{n - j + 1}) = a^{j} b^{n} B_{n, j} (z_{1}, \dots, z_{n - j + 1})

(4)

with

a, b

constants. Equation (4) follows from (2) since from (3) we have

{(a b z_{1})}^{r_{1}} {(a b^{2} z_{2})}^{r_{2}} \dots = a^{r_{1} + r_{2} + \dots} b^{r_{1} + 2 r_{2} + \dots} z_{π} = a^{j} b^{n} z_{π}

taking into account that

l (π) = r_{1} + r_{2} + \dots = j

and

r_{1} + 2 r_{2} + \dots = n .
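The partition form (2)-(3) and the scaling property (4) can be checked numerically. The following is a minimal sketch in plain Python, assuming integer or float arguments; the function names `partitions` and `partial_bell` are illustrative choices and are not taken from the kStatistics package.

```python
from math import factorial

def partitions(n, max_part=None):
    """Yield the partitions of n as weakly decreasing tuples of positive integers."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def partial_bell(n, j, z):
    """B_{n,j}(z_1, ..., z_{n-j+1}) as in (2)-(3); z[i-1] holds z_i."""
    total = 0
    for pi in partitions(n):
        if len(pi) != j:          # keep only partitions of length j
            continue
        d, z_pi = factorial(n), 1
        for part in set(pi):
            r = pi.count(part)    # multiplicity of this part
            d //= factorial(part) ** r * factorial(r)
            z_pi *= z[part - 1] ** r
        total += d * z_pi
    return total
```

For instance, `partial_bell(4, 2, [z1, z2, z3])` evaluates 4 z1 z3 + 3 z2^2, since the only partitions of 4 with two parts are (3, 1) and (2, 2).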

The n-th complete exponential Bell polynomial in the indeterminates

z_{1}, \dots, z_{n}

is defined as [14]

B_{n} (z_{1}, \dots, z_{n}) = \sum_{j = 0}^{n} B_{n, j} (z_{1}, \dots, z_{n - j + 1})

(5)

with

{B_{n, j}}

the partial exponential Bell polynomials (1). Note that n is the positive integer corresponding to the maximum degree of the monomials in (5). This polynomial sequence satisfies the following recurrence [14]

B_{n + 1} (z_{1}, \dots, z_{n + 1}) = \sum_{j = 0}^{n} (\binom{n}{j}) z_{j + 1} B_{n - j} (z_{1}, \dots, z_{n - j})

(6)

with the initial value

B_{0} = 1 .
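Recurrence (6) gives a direct way to evaluate the complete exponential Bell polynomials numerically. A minimal sketch, assuming the values z_1, ..., z_n are supplied as a Python list (the name `complete_bell` is an illustrative choice):

```python
from math import comb

def complete_bell(z):
    """Return [B_0, B_1, ..., B_n] for z = [z_1, ..., z_n] using recurrence (6)."""
    B = [1]                                   # initial value B_0 = 1
    for n in range(len(z)):
        # z[j] stores z_{j+1}, so this is sum_j binom(n, j) z_{j+1} B_{n-j}
        B.append(sum(comb(n, j) * z[j] * B[n - j] for j in range(n + 1)))
    return B
```

With all indeterminates equal to one, the values B_n are the Bell numbers, which gives a quick sanity check of the implementation.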

The generating function of

{B_{n}}

is the formal power series composition [14]

exp [h_{z} (t) - z_{0}] = \sum_{n \geq 0} \frac{t^{n}}{n!} B_{n} (z_{1}, \dots, z_{n}) \in R [[t]]

(7)

where

R [[t]]

is the ring of formal power series in t and

h_{z} (t)

is the generating function of

{z_{k}}_{k \geq 0}

, that is

h_{z} (t) = \sum_{k \geq 0} \frac{t^{k}}{k!} z_{k} .

A different expression of the n-th complete exponential Bell polynomial involves integer partitions [15] as follows

B_{n} (z_{1}, \dots, z_{n}) = \sum_{π ⊢ n} d_{π} z_{π}

(8)

where the sum is over all the partitions

π ⊢ n, d_{π}

and

z_{π}

are given in (3). In particular we have

B_{n} (λ z_{1}, \dots, λ z_{n}) = \sum_{π ⊢ n} λ^{l (π)} d_{π} z_{π}

(9)

with

λ

a constant. Now suppose we replace

λ^{l (π)}

in (9) with a numerical sequence

{a_{l (π)}} .

With this device, the complete exponential Bell polynomials become a special case of a wider class of polynomial families, the generalized partition polynomials [16]

B_{n} (a_{1}, \dots, a_{n}; z_{1}, \dots, z_{n}) = \sum_{π ⊢ n} d_{π} a_{l (π)} z_{π}

(10)

where the sum is again over all the partitions

π ⊢ n .

A different expression of (10) involves the partial exponential Bell polynomials

{B_{n, j}}

in (1)

B_{n} (a_{1}, \dots, a_{n}; z_{1}, \dots, z_{n}) = \sum_{j = 1}^{n} a_{j} B_{n, j} (z_{1}, \dots, z_{n - j + 1}) .

(11)

An example of a well known polynomial family, arising from (11) is the logarithmic one [14]

L_{n} (z_{1}, \dots, z_{n}) = \sum_{j = 1}^{n} {(- 1)}^{j - 1} (j - 1)! B_{n, j} (z_{1}, \dots, z_{n - j + 1}) .

(12)

3. The Hurwitz-Lerch Zeta Distribution

Definition 1.

A discrete random variable

Y \overset{d}{=} HLZD (a, θ, s)

if

\begin{matrix} q_{y} : = P (Y = y) = \frac{θ^{y}}{T (θ, s, a) {(y + a)}^{s + 1}}, θ \in (0, 1), a > - 1, s \in R \end{matrix}

(13)

for

y = 1, 2, \dots,

with

T (θ, s, a) = θ Φ (θ, s + 1, a + 1)

, where

Φ (θ, s, a) = \sum_{n = 0}^{\infty} \frac{θ^{n}}{{(n + a)}^{s}}

(14)

is the Lerch Transcendent function.

The probability generating function (pgf) of

Y \overset{d}{=} HLZD (a, θ, s)

is

G_{Y} (z) = \frac{T (z θ, s, a)}{T (θ, s, a)} = \frac{z Φ (z θ, s + 1, a + 1)}{Φ (θ, s + 1, a + 1)}, | z | \leq 1

(15)

with

G_{Y} (0) = 0 .
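The probabilities (13) are easy to evaluate once the series (14) is truncated. The sketch below is one possible way to do so, assuming 0 < θ < 1 so that the series converges geometrically; the stopping rule, the tolerance and the names `lerch_phi` and `hlzd_pmf` are illustrative assumptions.

```python
def lerch_phi(theta, s, a, tol=1e-15, max_terms=100000):
    """Truncated series (14): sum_{n>=0} theta^n / (n + a)^s, for 0 < theta < 1, a > 0."""
    total, power = 0.0, 1.0
    for n in range(max_terms):
        term = power / (n + a) ** s
        total += term
        # stop once the current term is negligible relative to the partial sum
        if n > 10 and term < tol * total:
            break
        power *= theta
    return total

def hlzd_pmf(y, a, theta, s):
    """q_y in (13) for y = 1, 2, ..., with T(theta, s, a) = theta * Phi(theta, s+1, a+1)."""
    T = theta * lerch_phi(theta, s + 1, a + 1)
    return theta ** y / (T * (y + a) ** (s + 1))
```

Summing `hlzd_pmf(y, a, theta, s)` over a long enough range of y should return a value numerically close to one, which is a convenient check on the truncation.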

3.1. Moments and Cumulants

HLZD moments have a closed form expression involving the Lerch Transcendent function. Differently from [17], we find this closed form expression using (13).

Proposition 1.

If

Y \overset{d}{=} HLZD (a, θ, s)

, then

ξ_{k} : = E [Y^{k}] = \sum_{j = 0}^{k} {(- a)}^{k - j} (\binom{k}{j}) \frac{Φ (θ, s + 1 - j, a + 1)}{Φ (θ, s + 1, a + 1)} .

(16)

Proof.

Using the binomial expansion of

y^{k} = {(y - a + a)}^{k},

we have

\begin{matrix} ξ_{k} & = \sum_{y = 1}^{\infty} y^{k} P (Y = y) = \sum_{y = 1}^{\infty} y^{k} \frac{θ^{y - 1}}{Φ (θ, s + 1, a + 1) {(y + a)}^{s + 1}} \\ = \sum_{y = 1}^{\infty} (\sum_{j = 0}^{k} (\binom{k}{j}) {(y + a)}^{j} {(- a)}^{k - j}) \frac{θ^{y - 1}}{Φ (θ, s + 1, a + 1) {(y + a)}^{s + 1}} \\ = \sum_{j = 0}^{k} (\binom{k}{j}) {(- a)}^{k - j} \frac{1}{Φ (θ, s + 1, a + 1)} \sum_{y = 1}^{\infty} \frac{θ^{y - 1}}{{(y + a)}^{s + 1 - j}} \end{matrix}

from which (16) follows by taking into account (14). □

As a corollary, the mean and the variance are respectively:

E [Y] = \frac{T (θ, s - 1, a)}{T (θ, s, a)} - a, \qquad Var [Y] = \frac{T (θ, s - 2, a)}{T (θ, s, a)} - {(\frac{T (θ, s - 1, a)}{T (θ, s, a)})}^{2} .
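The mean and variance expressions in terms of T can be verified against direct summation of the probabilities (13). A minimal sketch, assuming T is approximated by a fixed number of summands (adequate when θ is not too close to 1); the names `T` and `hlzd_mean_var` and the default `terms=500` are illustrative assumptions.

```python
def T(theta, s, a, terms=500):
    """T(theta, s, a) = sum_{y>=1} theta^y / (y + a)^(s + 1), truncated to `terms` summands."""
    return sum(theta ** y / (y + a) ** (s + 1) for y in range(1, terms + 1))

def hlzd_mean_var(a, theta, s):
    """Mean and variance of HLZD(a, theta, s) from the ratios of T given above."""
    t0, t1, t2 = T(theta, s, a), T(theta, s - 1, a), T(theta, s - 2, a)
    mean = t1 / t0 - a
    var = t2 / t0 - (t1 / t0) ** 2
    return mean, var
```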

More generally, the k-th central moment can be recovered as

ξ_{k}^{'} : = E [{(Y - ξ_{1})}^{k}] = \sum_{j = 0}^{k} (\binom{k}{j}) {(- ξ_{1})}^{j} ξ_{k - j}, \qquad ξ_{0} = 1,

and the factorial moments as

{(ξ)}_{k} = E [Y (Y - 1) \dots (Y - k + 1)] = \sum_{j = 0}^{k} s (k, j) ξ_{j}

(17)

with

s (k, j)

Stirling numbers of the first kind [14]. HLZD cumulants are such that [14]

κ_{n} (Y) = L_{n} (ξ_{1}, \dots, ξ_{n}) n = 1, 2, \dots

where

{ξ_{j}}

are the moments of

Y \overset{d}{=} HLZD (a, θ, s),

given in (16), and

L_{n}

is the n-th logarithmic polynomial (12). Let us recall that, if the moment generating function (mgf)

M_{Y} (t)

of Y is well defined in a suitable neighborhood of

0,

then the coefficients

{κ_{n} (Y)}

in the expansion

M_{Y} (t) = exp (\sum_{n \geq 1} \frac{t^{n}}{n!} κ_{n} (Y))

are the cumulants of

Y .

The first cumulant is the mean

E [Y],

the second cumulant is the variance

Var (Y),

the skewness and the kurtosis of Y can be recovered using the third and the fourth cumulant of Y respectively.

3.2. Mode

The HLZ distribution is a particular case of a wider class of distributions called the Modified Power Series Distributions (MPSD) [18].

Definition 2.

A discrete random variable

Y \overset{d}{=} MPSD (a, g, f)

if

p_{y} : = P (Y = y) = \frac{a (y) g {(θ)}^{y}}{f (θ)}, y \in T \subset N

(18)

where

a (y)

,

g (θ)

and

f (θ)

are positive, bounded, and differentiable functions of y and θ respectively with

f (θ) = \sum_{y \in T} a (y) g {(θ)}^{y} .

Using this wider class of distributions, we will prove that

Y \overset{d}{=} HLZD (a, θ, s)

is unimodal for all

s \in R .

To this aim, let us recall that a discrete distribution with support

T \subset Z

is said to be strongly unimodal if and only if the sequence

{p_{y}}_{y \in T},

with

p_{y} = P (Y = y),

is a logarithmically concave sequence [19], that is if and only if

p_{y}^{2} \geq p_{y - 1} p_{y + 1}, \forall y \in T .

(19)

Proposition 2.

Suppose

Y \overset{d}{=} HLZD (a, θ, s) .

(i) If $s \geq - 1,$ the sequence $\{q_{y}\}_{y \geq 1}$ is monotonically decreasing and the mode is $y = 1$ .
(ii) If $s < - 1,$ Y is strongly unimodal.

Proof.

Similarly to what is stated in Section 2.3 of [20], we have

\frac{q_{y}}{q_{y - 1}} = θ {(1 - \frac{1}{a + y})}^{s + 1} y = 2, 3, \dots .

(20)

Since

θ \in (0, 1), a > - 1

and

s \geq - 1,

the rhs of (20) is always between 0 and

1,

thus (i) follows. For (ii) we have to prove that

{q_{y}}_{y \geq 1}

is log-concave, that is it satisfies (19). Using (13), (19) reduces to

{(1 - {(y + a)}^{- 2})}^{s + 1} \geq 1,

which is always true if

s < - 1

. □
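Both cases of Proposition 2 can be illustrated numerically. The sketch below works with the unnormalised weights θ^y / (y + a)^{s+1}, since the normalising constant affects neither monotonicity nor log-concavity; the helper names and the small relative tolerance used to absorb rounding are illustrative assumptions.

```python
def hlzd_weights(a, theta, s, n):
    """Unnormalised q_y, proportional to theta^y / (y + a)^(s + 1), for y = 1..n."""
    return [theta ** y / (y + a) ** (s + 1) for y in range(1, n + 1)]

def is_decreasing(seq):
    """True if the sequence is strictly decreasing."""
    return all(x > y for x, y in zip(seq, seq[1:]))

def is_log_concave(seq, rel_tol=1e-12):
    """Check inequality (19) at every interior index, up to a relative tolerance."""
    return all(q * q >= p * r * (1 - rel_tol)
               for p, q, r in zip(seq, seq[1:], seq[2:]))
```

With s = 1 the weights decrease from y = 1 but are not log-concave, while with s = -3 they are log-concave with a mode that may lie away from y = 1.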

3.3. Convolution

The family of HLZ distributions is not closed under convolution. Nevertheless, as a subclass of MPS distributions, the HLZD convolution still returns an MPSD. Indeed, we will prove that the family of MPS distributions is closed under convolution.

Theorem 1.

If

Y_{1}, \dots, Y_{j}

are independent r.v.’s identically distributed to

Y \overset{d}{=} HLZD (a, θ, s),

then

Y_{1} + \dots + Y_{j} \overset{d}{=} MPSD (a_{j}, g, f_{j})

with

f_{j} (θ) = {[T (θ, s, a)]}^{j}

and

a_{j} (y) = \frac{j!}{y!} \sum_{π ⊢ y, l (π) = j} d_{π} {(a_{π})}^{s + 1} with a_{π} = {(a + 1)}^{- r_{1}} {(a + 2)}^{- r_{2}} \dots

(21)

and

d_{π}

given in (3).

Proof.

Observe that if

Y_{1}, \dots, Y_{j}

are r.v.’s i.i.d. to

Y \overset{d}{=} MPSD (a, g, f),

then

Y_{1} + \dots + Y_{j} \overset{d}{=} MPSD (a_{j}, g, f_{j})

with

f_{j} (θ) = f^{j} (θ)

and

a_{j} (y) = \{\begin{matrix} \frac{j!}{y!} B_{y, j} [a (1), \dots, a (y - j + 1)], & y \in T_{j} \\ 0, & y \notin T_{j} \end{matrix}

(22)

with

T_{j} = {y_{1} + \dots + y_{j} \in N | y_{1}, \dots, y_{j} \in T} .

Indeed in (18), set

a (y) = 0

if

y \notin T

and consider the sequence

{p_{y}}_{y \geq 1}

such that

p_{y} = 0

if

y \notin T .

By using Lemma 1 in [21], we have

P (Y_{1} + \dots + Y_{j} = y) = \frac{j!}{y!} B_{y, j} (p_{1}, \dots, p_{y - j + 1})

(23)

where

{B_{y, j}}

are the partial exponential Bell polynomials (1). From (23) with

p_{i}

replaced by

a (i) g {(θ)}^{i} / f (θ)

for

i = 1, \dots, y - j + 1

and using (4) we have

P (Y_{1} + \dots + Y_{j} = y) = \frac{j!}{y!} \frac{g {(θ)}^{y}}{f^{j} (θ)} B_{y, j} [a (1), \dots, a (y - j + 1)] .

(24)

Thus

Y_{1} + \dots + Y_{j} \overset{d}{=} MPSD (a_{j}, g, f_{j})

with

f_{j} (θ) = f^{j} (θ)

and

a_{j} (y)

given in (22). From (24) note that

a_{j} (y) = 0

if

1, 2, \dots, y - j + 1 \notin T .

By replacing

g (θ) = θ, f (θ) = T (θ, s, a)

and

a (k) = {(k + a)}^{- (s + 1)}

for

k = 1, \dots, y - j + 1

in (24) we have

a_{j} (y) = \frac{j!}{y!} B_{y, j} [{(a + 1)}^{- (s + 1)}, \dots, {(a + y - j + 1)}^{- (s + 1)}] y = 1, 2, \dots .

The result follows after some manipulations, rewriting the partial Bell exponential polynomials as in (2). □

3.4. Maximum Likelihood Estimation

Consider a vector

y = (y_{1}, \dots, y_{n})

of independent observations of

Y \overset{d}{=} HLZD (a, θ, s)

. The maximum likelihood estimation (MLE) of

(θ, s, a)

is

(\hat{θ}, \hat{s}, \hat{a}) = \underset{(θ, s, a) \in Θ}{argmax} ℓ_{n} (θ, s, a, y),

(25)

with

Θ = (0, 1) \times (- \infty, + \infty) \times (- 1, \infty),

ℓ_{n} (θ, s, a, y) = ln L_{n} (θ, s, a, y)

the log-likelihood function and

L_{n} (θ, s, a, y) = \prod_{i = 1}^{n} P (Y = y_{i}) .

The MLE of the HLZD parameters

(θ, s, a)

has been studied by Gupta in [20]. He showed that the three likelihood equations arising from maximizing the log-likelihood correspond to the equations of the method of moments. In particular we have

\sum_{i = 1}^{n} \frac{y_{i}}{n} = E [Y], \sum_{i = 1}^{n} \frac{log (a + y_{i})}{n} = E [log (a + Y)], \sum_{i = 1}^{n} \frac{1}{n (a + y_{i})} = E [\frac{1}{a + Y}] .

Unfortunately, closed form solutions of the above equations are not available and also the moments

E [log (a + Y)]

and

E [1 / (a + Y)]

must be numerically approximated. As noted in [20], the likelihood equations may be solved by standard numerical methods to obtain the MLE. However, it is well known that this does not guarantee that a global maximum of the likelihood has been reached. To avoid this problem, a global optimization method that takes advantage of the bounds on the parameters can be employed to solve (25). More specifically, the MLE of the parameters can be obtained through the global optimization algorithm known as Simulated Annealing [22]. Simulated annealing is a stochastic global optimization technique applicable to a wide range of discrete and continuous variable problems. It makes use of Markov Chain Monte Carlo samplers to provide a means of escaping local optima by allowing moves which worsen the objective function, with the aim of finding a global optimum. Technical details can be found in [22]; a variant of this algorithm is implemented in the optim function of the base R package stats.
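To make the annealing idea concrete, here is a minimal pure-Python sketch of a Metropolis-type search for the maximiser in (25). It is not the routine of [22] nor the R implementation: the Gaussian proposal, the geometric cooling schedule, the box limits imposed on s and a, the truncation of T, and the names `hlzd_loglik` and `anneal_mle` are all illustrative assumptions.

```python
import math
import random

def hlzd_loglik(params, data, terms=300):
    """Log-likelihood of (theta, s, a) for HLZD data; T is truncated to `terms`
    summands, which is adequate only when theta is not too close to 1."""
    theta, s, a = params
    T = sum(theta ** y / (y + a) ** (s + 1) for y in range(1, terms + 1))
    return (sum(y * math.log(theta) - (s + 1) * math.log(y + a) for y in data)
            - len(data) * math.log(T))

def anneal_mle(data, start=(0.5, 0.0, 0.0), iters=2000, temp0=1.0,
               cooling=0.998, seed=0):
    rng = random.Random(seed)
    cur, cur_ll = start, hlzd_loglik(start, data)
    best, best_ll = cur, cur_ll
    temp = temp0
    for _ in range(iters):
        theta, s, a = (p + rng.gauss(0.0, 0.1) for p in cur)
        # Discard proposals outside the parameter space (the caps on s and a are arbitrary).
        if 0.0 < theta < 1.0 and -1.0 < a < 50.0 and -10.0 < s < 10.0:
            cand = (theta, s, a)
            cand_ll = hlzd_loglik(cand, data)
            # Accept improvements always, and worse points with a temperature-dependent probability.
            if cand_ll >= cur_ll or rng.random() < math.exp((cand_ll - cur_ll) / temp):
                cur, cur_ll = cand, cand_ll
                if cur_ll > best_ll:
                    best, best_ll = cur, cur_ll
        temp *= cooling   # geometric cooling
    return best, best_ll
```

Accepting occasional downhill moves while the temperature is high is what lets the search leave a local maximum; as the temperature decreases the procedure behaves more and more like a greedy local search.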

4. The Poisson-Stopped Hurwitz-Lerch Zeta Distribution

Definition 3.

A discrete random variable

X \overset{d}{=} PSHLZD (λ, a, θ, s)

if its pgf is

G_{X} (t) = exp (λ [\frac{t Φ (t θ, s + 1, a + 1)}{Φ (θ, s + 1, a + 1)} - 1]), \qquad λ > 0,

(26)

where Φ is the Lerch Transcendent function (14).

According to Definition 3,

X \overset{d}{=} PSHLZD (λ, a, θ, s)

takes non-negative integer values and belongs to the class of generalized r.v.’s [23]. Indeed given two independent r.v.’s Z and

Y,

with pgf

G_{Z} (t)

and

G_{Y} (t)

respectively, the generalized r.v. X has pgf

G_{X} (t) = G_{Z} [G_{Y} (t)] .

(27)

The composition (26) matches (27) when

Y \overset{d}{=} HLZD (a, θ, s)

and Z is a Poisson r.v. (PS) of parameter

λ > 0,

independent of

Y,

since

G_{Z} (t) = exp [λ (t - 1)] .

In the following we analyse in detail the properties of the PSHLZD using the complete exponential Bell polynomials. Some of the properties given in [6] will also be briefly recalled.

Proposition 3.

If

X \overset{d}{=} PSHLZD (λ, a, θ, s)

then

p_{x} : = P (X = x) = \{\begin{matrix} e^{- λ}, & x = 0, \\ \frac{e^{- λ}}{x!} B_{x} (λ q_{1}, \dots, λ x! q_{x}), & x = 1, 2, \dots \end{matrix}

(28)

where

B_{x}

is the complete exponential Bell polynomial (5) of degree

x .

Proof.

Observe that

G_{Y} (0) = 0

and

G_{Y} (t) = \sum_{y \geq 1} y! q_{y} t^{y} / y! .

The result follows from (7) with

z_{0} = 0

and

h_{z} (t) = G_{Y} (t),

since from (26) we have

exp (- λ) exp [λ G_{Y} (t)] = \sum_{x \geq 0} \frac{t^{x}}{x!} e^{- λ} B_{x} (λ 1! q_{1}, \dots, λ x! q_{x}) .

□

Corollary 1.

If

X \overset{d}{=} PSHLZD (λ, a, θ, s)

then

p_{0} = P (X = 0) = e^{- λ}

and

p_{x} : = P (X = x) = θ^{x} e^{- λ} \sum_{π ⊢ x} {(\frac{λ}{T (θ, s, a)})}^{l (π)} \frac{{(a_{π})}^{s + 1}}{m_{π}!} x = 1, 2, \dots .

(29)

where the sum is over all the partitions

π ⊢ x,

m_{π}! = r_{1}! r_{2}! \dots

and

a_{π}

is given in (21).

Proof.

From (28), using (8) and (3), we have

\begin{matrix} p_{x} & = \frac{e^{- λ}}{x!} \sum_{π ⊢ x} \frac{x!}{{(1!)}^{r_{1}} r_{1}! {(2!)}^{r_{2}} r_{2}! \dots} {(λ 1! q_{1})}^{r_{1}} {(λ 2! q_{2})}^{r_{2}} \\ = e^{- λ} \sum_{π ⊢ x} \frac{λ^{r_{1} + r_{2} + \dots}}{r_{1}! r_{2}! \dots} \frac{{[{(a + 1)}^{- (s + 1)}]}^{r_{1}} {[{(a + 2)}^{- (s + 1)}]}^{r_{2}} \dots θ^{r_{1} + 2 r_{2} + \dots}}{T {(θ, s, a)}^{r_{1} + r_{2} + \dots}} \end{matrix}

by which (29) follows observing that

r_{1} + r_{2} + \dots = l (π)

and

r_{1} + 2 r_{2} + \dots = x .

□

As a corollary of Proposition 3 and recursion (6), the sequence

{p_{x}}

in (28) satisfies the following equations.

Corollary 2.

If

X \overset{d}{=} PSHLZD (λ, a, θ, s)

then

p_{x + 1} = \frac{λ}{x + 1} \sum_{j = 0}^{x} (j + 1) q_{j + 1} p_{x - j}, \qquad x = 0, 1, 2, \dots, with p_{0} = e^{- λ} .

Proof.

The result follows using (6) since we have

\begin{matrix} p_{x + 1} & = \frac{e^{- λ}}{(x + 1)!} B_{x + 1} (λ 1! q_{1}, \dots, λ (x + 1)! q_{x + 1}) \\ = \frac{e^{- λ}}{(x + 1)!} \sum_{j = 0}^{x} (\binom{x}{j}) λ (j + 1)! q_{j + 1} B_{x - j} (λ 1! q_{1}, \dots, λ (x - j)! q_{x - j}) \\ = \frac{e^{- λ}}{(x + 1)!} \sum_{j = 0}^{x} \frac{x!}{(x - j)! j!} λ (j + 1)! q_{j + 1} p_{x - j} \frac{(x - j)!}{e^{- λ}} . \end{matrix}

□
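The recursion of Corollary 2 avoids enumerating integer partitions when the PSHLZD probabilities are needed. A minimal sketch follows, assuming a truncated sum for T and illustrative function names; the loop starts at x = 0 so that p_1 is generated from p_0.

```python
import math

def hlzd_pmf_table(a, theta, s, n, terms=2000):
    """[q_1, ..., q_n] from (13), with T truncated to `terms` summands."""
    T = sum(theta ** y / (y + a) ** (s + 1) for y in range(1, terms + 1))
    return [theta ** y / (T * (y + a) ** (s + 1)) for y in range(1, n + 1)]

def pshlzd_pmf_table(lam, a, theta, s, n):
    """[p_0, ..., p_n] via p_{x+1} = lam/(x+1) * sum_{j=0}^{x} (j+1) q_{j+1} p_{x-j}."""
    q = hlzd_pmf_table(a, theta, s, n)        # q[j] stores q_{j+1}
    p = [math.exp(-lam)]                      # p_0 = exp(-lam)
    for x in range(n):
        acc = sum((j + 1) * q[j] * p[x - j] for j in range(x + 1))
        p.append(lam * acc / (x + 1))
    return p
```

Each new probability costs a sum over the previous ones, so a table up to n requires on the order of n^2 operations, against the rapidly growing number of partitions involved in (29).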

The PSHLZD is unimodal if

s \geq 0

and

- 1 < a \leq 0

([6], Property 1).

4.1. Log-Concavity

Under suitable conditions, the PSHLZD is log-concave.

Proposition 4.

If

X \overset{d}{=} PSHLZD (λ, a, θ, s)

and

s \geq - 1,

then X has a log-concave cumulative distribution function (cdf), that is

{[P (X \leq x)]}^{2} \geq P (X \leq x - 1) P (X \leq x + 1), x = 0, 1, 2, \dots .

Proof.

According to ([24], Theorem 1), a random sum

\sum_{i = 1}^{Z} Y_{i}

of i.i.d. r.v.’s has a log-concave cdf if Z is strongly unimodal and the distribution of

{Y_{i}}

has a decreasing pdf. Thus, the result follows as

X \overset{d}{=} \sum_{i = 1}^{Z} Y_{i}

with

Z \overset{d}{=} PS (λ),

which has a log-concave pdf (strongly unimodal), and

Y \overset{d}{=} HLZD (a, θ, s)

with a decreasing pdf when

s \geq - 1

(see Proposition 2). □

Proposition 4 gives a sufficient condition to get cdf log-concavity. A different way is to consider the sequence

{p_{x}} .

Indeed, if X has a log-concave pdf (19), then its cdf is also log-concave [24]. In the more general setting of generalized r.v.’s, X has a log-concave pdf if and only if the sequence

p_{x} : = P (X = x) = \frac{1}{x!} B_{x} (1! {\tilde{q}}_{1}, \dots, x! {\tilde{q}}_{x}; 1! q_{1}, \dots, x! q_{x}) x = 1, 2, \dots

(30)

is log-concave with

p_{0} = P (X = 0) = G_{Z} [G_{Y} (0)]

and

{\tilde{q}}_{x} = P (Z = x), q_{x} = P (Y = x) .

Equation (30) follows from Equation (2.3) in [23] using the generalized partition polynomials (10). When

Z \overset{d}{=} P S (λ)

a necessary and sufficient condition to recover strong unimodality is related to the magnitude of

q_{1}

and

q_{2},

as the following theorem shows.

Theorem 2.

If X is a generalized r.v. with Y strongly unimodal and

Z \overset{d}{=} P S (λ),

then X is strongly unimodal if and only if

λ q_{1}^{2} \geq 2 q_{2}

.

Note that a similar result is proved in ([25], Theorem 4). We provide a different proof using the following lemma.

Lemma 1.

If

{z_{j}}_{j \geq 1} \in [0, \infty)

is a log-concave sequence, then the sequence

{\frac{1}{n!} B_{n} (z_{1}, \dots, z_{n})}_{n \geq 1}

is log-concave if and only if

z_{1}^{2} \geq z_{2},

with

{B_{n}}

given in (5).

Proof.

If

{z_{j}}_{j \geq 0}

with

z_{0} = 1

is a log-concave sequence of non-negative real numbers and the sequence

{a (n)}_{n \geq 0}

is defined by

\sum_{n = 0}^{\infty} \frac{a (n)}{n!} y^{n} = exp (\sum_{j = 1}^{\infty} \frac{z_{j}}{j!} y^{j})

(31)

then the sequence

{\frac{a (n)}{n!}}_{n \geq 0}

is log-concave [26]. Equation (31) parallels (7). Therefore, the sequence

{\frac{1}{n!} B_{n} (z_{1}, \dots, z_{n})}_{n \geq 1}

results as log-concave if the sequence

{z_{j}}_{j \geq 0}

is log-concave. Note that for

j \geq 2

we have

\frac{z_{j}^{2}}{{[(j - 1)!]}^{2}} \geq \frac{z_{j - 1}}{(j - 2)!} \frac{z_{j + 1}}{j!},

which easily reduces to

j z_{j}^{2} \geq (j - 1) z_{j - 1} z_{j + 1}

always satisfied when

{z_{j}}_{j \geq 1}

is log-concave. Now let

j = 1

. We have

{z_{j}}_{j \geq 0}

is log-concave if and only if

z_{1}^{2} \geq z_{2}

and the result follows. □

Proof

(Proof of Theorem 2). Following the same arguments of Proposition 3, for a generalized r.v. with

Z \overset{d}{=} P S (λ),

(30) reduces to

P (X = x) = \frac{e^{- λ}}{x!} B_{x} (λ 1! q_{1}, \dots, λ x! q_{x})

with

q_{x} = P (Y = x)

for

x = 1, 2, \dots .

The sequence

{\frac{e^{- λ}}{x!} B_{x} (λ 1! q_{1}, \dots, λ x! q_{x})}_{x \geq 1}

is log-concave if and only if the sequence

{\frac{1}{x!} B_{x} (λ 1! q_{1}, \dots, λ x! q_{x})}_{x \geq 1}

is log-concave. The result follows using Lemma 1. □

Corollary 3.

If

s < - 1,

X \overset{d}{=} PSHLZD (λ, a, θ, s)

is strongly unimodal if and only if

λ q_{1}^{2} \geq 2 q_{2} .

4.2. Moments and Cumulants

PSHLZD moments and cumulants have closed form expressions in terms of moments of

Y \overset{d}{=} HLZD (a, θ, s) .

Proposition 5.

If

X \overset{d}{=} PSHLZD (λ, a, θ, s)

then

μ_{k} : = E [X^{k}] = B_{k} (λ ξ_{1}, \dots, λ ξ_{k}), k = 1, 2, \dots,

(32)

with

B_{k}

the k-th complete exponential Bell polynomial (5) and

ξ_{1}, \dots, ξ_{k}

the first k moments of

Y \overset{d}{=} HLZD (a, θ, s)

given in (16).

Proof.

If

M_{X} (t)

and

M_{Y} (t)

are the mgf’s of

X \overset{d}{=} PSHLZD (λ, a, θ, s)

and

Y \overset{d}{=} HLZD (a, θ, s)

respectively, then

M_{X} (t) = G_{X} (e^{t}) = e^{λ [G_{Y} (e^{t}) - 1]} = e^{λ [M_{Y} (t) - 1]}

(33)

from (27). Equation (32) follows as the rhs of (33) can be written as (7), with

h_{z} (t) = λ M_{Y} (t)

and

z_{0} = λ .

□
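Proposition 5 translates directly into a numerical recipe: compute the raw moments of the HLZD, scale them by λ, and feed them to the complete Bell polynomials. The sketch below assumes truncated sums for the HLZD moments and evaluates the Bell polynomials with recurrence (6); all function names are illustrative.

```python
import math

def hlzd_raw_moments(a, theta, s, kmax, terms=2000):
    """[xi_1, ..., xi_kmax], each a truncated sum of y^k q_y with q_y from (13)."""
    w = [theta ** y / (y + a) ** (s + 1) for y in range(1, terms + 1)]
    T = sum(w)
    return [sum(y ** k * wy for y, wy in enumerate(w, start=1)) / T
            for k in range(1, kmax + 1)]

def complete_bell(z):
    """[B_0, ..., B_n] for z = [z_1, ..., z_n], via recurrence (6)."""
    B = [1.0]
    for n in range(len(z)):
        B.append(sum(math.comb(n, j) * z[j] * B[n - j] for j in range(n + 1)))
    return B

def pshlzd_raw_moments(lam, a, theta, s, kmax):
    """[mu_1, ..., mu_kmax] from (32): mu_k = B_k(lam*xi_1, ..., lam*xi_k)."""
    xi = hlzd_raw_moments(a, theta, s, kmax)
    return complete_bell([lam * x for x in xi])[1:]
```

For k = 2 this reproduces the familiar compound Poisson identity μ_2 = λ ξ_2 + (λ ξ_1)^2.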

Remark 1.

Taking into account (33), if

X \overset{d}{=} PSHLZD (λ, a, θ, s)

then

X \overset{d}{=} Y_{1} + \dots + Y_{Z}

with

Y \overset{d}{=} HLZD (a, θ, s)

and

Z \overset{d}{=} P S (λ),

that is X is a compound Poisson r.v. Therefore the PSHLZD is an infinitely divisible distribution [27].

Moments (32) can be written explicitly using (9). A straightforward corollary of recursion (6) is the following.

Corollary 4.

μ_{k + 1} = λ \sum_{j = 0}^{k} (\binom{k}{j}) μ_{k - j} ξ_{j + 1}, \qquad k = 0, 1, 2, \dots, with μ_{0} = 1 .

If

μ_{k}^{'} : = E [{(X - μ_{1})}^{k}]

denotes the k-th central moment of

X \overset{d}{=} PSHLZD (λ, a, θ, s)

then

μ_{k}^{'} = \sum_{j = 0}^{k} (\binom{k}{j}) {(- λ ξ_{1})}^{k - j} B_{j} (λ ξ_{1}, \dots, λ ξ_{j}), \qquad k = 1, 2, \dots

Proposition 6.

If

X \overset{d}{=} PSHLZD (λ, a, θ, s)

then

{(μ)}_{k} : = E [X (X - 1) \dots (X - k + 1)] = B_{k} (λ {(ξ)}_{1}, \dots, λ {(ξ)}_{k}), k = 1, 2, \dots

(34)

where

{(ξ)}_{1}, \dots, {(ξ)}_{k}

are the first k factorial moments of

Y \overset{d}{=} HLZD (a, θ, s)

given in (17).

Proof.

Let us recall that, if

Q_{X} (t)

is the factorial mgf of

{{(μ)}_{k}},

then

Q_{X} (t) = G_{X} (t + 1)

with

G_{X}

the pgf of

X .

Therefore we have

Q_{X} (t) = G_{X} (t + 1) = exp (λ [G_{Y} (t + 1) - 1]) = exp (λ [Q_{Y} (t) - 1]),

(35)

with

Q_{Y} (t)

the generating function of the factorial moments

{{(ξ)}_{k}} .

Equation (34) follows as the rhs of (35) can be written as (7), with

z_{0} = λ

and

h_{z} (t) = λ Q_{Y} (t) .

□

Proposition 7.

If

κ_{n} (X)

is the n-th cumulant of

X \overset{d}{=} PSHLZD (λ, a, θ, s)

then

κ_{n} (X) = λ ξ_{n},

for

n = 1, 2, \dots

where

ξ_{n}

is the n-th moment of

Y \overset{d}{=} HLZD (a, θ, s)

given in (16).

Proof.

The result follows since

log M_{X} (t) = log [e^{λ (M_{Y} (t) - 1)}] = λ [M_{Y} (t) - 1] = \sum_{n \geq 1} \frac{t^{n}}{n!} λ E [Y^{n}] .

□
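As a worked consequence of Proposition 7, the first descriptive statistics of X follow directly from the raw moments of Y. The skewness and excess kurtosis below use the standard cumulant ratios κ_3 / κ_2^{3/2} and κ_4 / κ_2^2, which are textbook definitions rather than formulas stated in the text above:

```latex
E[X] = \kappa_1(X) = \lambda \xi_1, \qquad
Var[X] = \kappa_2(X) = \lambda \xi_2, \qquad
\frac{\kappa_3(X)}{\kappa_2(X)^{3/2}} = \frac{\xi_3}{\sqrt{\lambda}\, \xi_2^{3/2}}, \qquad
\frac{\kappa_4(X)}{\kappa_2(X)^{2}} = \frac{\xi_4}{\lambda\, \xi_2^{2}} .
```

Since Y takes values in {1, 2, ...}, we have Y^2 ≥ Y and hence ξ_2 ≥ ξ_1, with equality only when Y is degenerate at 1. It follows that Var[X] ≥ E[X], so the PSHLZD is over-dispersed, and both the skewness and the excess kurtosis shrink as λ grows.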

4.3. Maximum Likelihood Estimation

Suppose we have a vector

x = (x_{1}, \dots, x_{n})

of independent observations of

X \overset{d}{=} PSHLZD (λ, a, θ, s) .

The MLE of

(λ, θ, s, a)

is

(\hat{λ}, \hat{θ}, \hat{s}, \hat{a}) = \underset{(λ, θ, s, a) \in Θ}{argmax} ℓ_{n} (λ, θ, s, a, x),

with

Θ = (0, \infty) \times (0, 1) \times (- \infty, + \infty) \times (- 1, \infty),

ℓ_{n} (λ, θ, s, a, x) = ln L_{n} (λ, θ, s, a, x)

the log-likelihood function and

L_{n} (λ, θ, s, a, x) = \prod_{i = 1}^{n} P (X = x_{i}) .

The MLE of the PSHLZD parameters in this case must be directly tackled with the global optimization method described in Section 3.4, since

ℓ_{n} (λ, θ, s, a)

is not analytically tractable, in view of (29).

5. The One Inflated Hurwitz-Lerch Zeta Distribution

Definition 4.

A discrete random variable

Z \overset{d}{=} OIHLZD (p, a, θ, s)

if

\begin{matrix} P (Z = 1) & = & p + (1 - p) P (Y = 1), \\ P (Z = x) & = & (1 - p) P (Y = x), x = 2, 3, \dots \end{matrix}

(36)

with

p \in [0, 1]

and

Y \overset{d}{=} HLZD (a, θ, s) .

This definition parallels the definition of the Zero Inflated Modified Power Series Distribution given by Gupta [28]. If

G_{Z} (t)

denotes the pgf of

Z \overset{d}{=} OIHLZD (p, a, θ, s)

then

G_{Z} (t) = p t + (1 - p) G_{Y} (t)

(37)

and the HLZD is retrieved by setting

p = 0 .
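Definition (36) translates directly into a few lines of Python. The sketch below is illustrative only: the HLZD pmf is assumed proportional to θ^x / (a + x)^{s + 1} on x = 1, 2, ..., the series is truncated at a hypothetical `n_terms`, and the parameter values are arbitrary.

```python
def hlzd_pmf(theta, a, s, n_terms=2000):
    """Truncated, renormalised HLZD pmf; index x holds P(Y = x)."""
    weights = [theta ** x / (a + x) ** (s + 1) for x in range(1, n_terms + 1)]
    total = sum(weights)
    return [0.0] + [w / total for w in weights]


def oihlzd_pmf(p, theta, a, s, n_terms=2000):
    """Definition (36): extra mass p at 1, HLZD scaled by (1 - p) elsewhere."""
    z = [(1 - p) * py for py in hlzd_pmf(theta, a, s, n_terms)]
    z[1] += p
    return z


def mean(pmf):
    return sum(x * px for x, px in enumerate(pmf))


y = hlzd_pmf(0.6, 1.0, 0.5)
z = oihlzd_pmf(0.3, 0.6, 1.0, 0.5)

print(sum(z))                        # total mass, should be 1 up to rounding
print(mean(z), 0.3 + 0.7 * mean(y))  # E[Z] against p + (1 - p) E[Y]
```

Calling `oihlzd_pmf` with p = 0 returns the HLZD probabilities unchanged, mirroring the remark after (37).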

5.1. Moments and Cumulants

If

Z \overset{d}{=} OIHLZD (p, a, θ, s)

then

M_{Z} (t) = G_{Z} (e^{t}) = p e^{t} + (1 - p) G_{Y} (e^{t}) = p e^{t} + (1 - p) M_{Y} (t)

from (37). Thus

ν_{k} : = E [Z^{k}] = p + (1 - p) ξ_{k}, \quad k = 0, 1, \dots

(38)

with

ξ_{k}

the k-th moment of

Y \overset{d}{=} HLZD (a, θ, s)

given in (16). For example, we have

E [Z] = p + (1 - p) E [Y]

and

Var [Z] = (1 - p) [Var [Y] + p (1 + {E [Y]}^{2} - 2 E [Y])] .
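For completeness, the variance formula can be checked from the first two moments in (38). The following worked derivation uses only ν_1 = p + (1 - p) ξ_1 and ν_2 = p + (1 - p) ξ_2; the final line simply rewrites 1 + E[Y]^2 - 2 E[Y] as a square.

```latex
\begin{aligned}
\mathrm{Var}[Z] &= \nu_2 - \nu_1^2
  = p + (1-p)\,\xi_2 - \bigl(p + (1-p)\,\xi_1\bigr)^2 \\
&= (1-p)\,(\xi_2 - \xi_1^2) + p\,(1-p)\,\bigl(1 - 2\,\xi_1 + \xi_1^2\bigr) \\
&= (1-p)\,\bigl[\mathrm{Var}[Y] + p\,(1 - E[Y])^2\bigr].
\end{aligned}
```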

Similarly, if

Q_{Z} (t)

is the factorial mgf of

Z \overset{d}{=} OIHLZD (p, a, θ, s),

since

Q_{Z} (t) = G_{Z} (t + 1) = p (t + 1) + (1 - p) G_{Y} (t + 1) = p (t + 1) + (1 - p) Q_{Y} (t),

with

Q_{Y} (t)

the factorial mgf of

Y \overset{d}{=} HLZD (a, θ, s),

then

{(ν)}_{k} : = E [Z (Z - 1) \dots (Z - k + 1)] = \begin{cases} p + (1 - p) {(ξ)}_{1}, & k = 1, \\ (1 - p) {(ξ)}_{k}, & k = 2, 3, \dots \end{cases}

with

{(ξ)}_{k}

the k-th factorial moment of

Y \overset{d}{=} HLZD (a, θ, s)

given in (17).

OIHLZD cumulants are

κ_{n} (Z) = L_{n} (ν_{1}, \dots, ν_{n}), n = 1, 2, \dots

with

{ν_{j}}

moments of

Z \overset{d}{=} OIHLZD (p, a, θ, s)

given in (38), and

L_{n}

the n-th logarithmic polynomial (12).
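As a reminder of what this relation gives in low order, the first three cumulants written in terms of the raw moments are the standard moment-to-cumulant identities below (stated here from general theory, since equation (12) is not reproduced in this section).

```latex
\begin{aligned}
\kappa_1(Z) &= \nu_1, \\
\kappa_2(Z) &= \nu_2 - \nu_1^2, \\
\kappa_3(Z) &= \nu_3 - 3\,\nu_1\,\nu_2 + 2\,\nu_1^3 .
\end{aligned}
```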

5.2. Maximum Likelihood Estimation

To estimate the OIHLZD parameters using the MLE, let us first rewrite (36) using (18), that is

\begin{matrix} P (Z = 1) & = & 1 - w, \\ P (Z = x) & = & (1 - p) \frac{a (x) g {(θ)}^{x}}{f (θ)}, x = 2, 3, \dots \end{matrix}

(39)

where

w = (1 - p) [1 - P (Y = 1)], g (θ) = θ, a (x) = {(a + x)}^{- (s + 1)}

and

f (θ) = T (θ, a, s) .

Rewrite (39) as

\begin{matrix} P (Z = 1) = 1 - w, \\ P (Z = x) = w P (W = x), x = 2, 3, \dots \end{matrix}

where W has a One Truncated Hurwitz-Lerch Zeta Distribution (OTHLZD) [29], that is

P (W = x) : = \frac{1}{1 - \frac{a (1) g (θ)}{f (θ)}} \frac{a (x) g {(θ)}^{x}}{f (θ)}, x = 2, 3, \dots

(40)

Suppose

z = (z_{1}, \dots, z_{n})

is a vector of independent observations of

Z \overset{d}{=} OIHLZD (p, a, θ, s)

and

l (θ, a, s, w, z) = ln L_{n} (θ, a, s, w, z)

is the log-likelihood function with

L_{n} (θ, a, s, w, z) = \prod_{i = 1}^{n} P (Z = z_{i}) .

If

n_{j}

is the number of times the integer j appears in the vector

z

for

j = 1, 2, \dots,

then the log-likelihood

l (θ, a, s, w, z)

can be written as

\begin{matrix} l (θ, a, s, w, z) = n_{1} log (1 - w) + (n - n_{1}) log (w) + \sum_{j = 2}^{\infty} n_{j} log (\frac{P (Y = j)}{1 - P (Y = 1)}) . \end{matrix}

Now set

l_{1} (w, z) = n_{1} log (1 - w) + (n - n_{1}) log (w)

(41)

and

l_{2} (θ, a, s, z) = \sum_{j = 2}^{\infty} n_{j} log (\frac{P (Y = j)}{1 - P (Y = 1)}) .

(42)

From (41) and (42), the parameters (θ, a, s, w) can be estimated separately: the estimate \hat{w} is recovered from l_{1} (w, z) and the estimates (\hat{θ}, \hat{a}, \hat{s}) from l_{2} (θ, a, s, z). The latter are the MLE of the parameters of W \overset{d}{=} OTHLZD (θ, s, a) in (40), computed on the vector z restricted to the observations greater than 1. As a consequence, the estimate \hat{p} of p can be recovered from \hat{w} as

\hat{p} = 1 - \frac{\hat{w} T (\hat{θ}, \hat{a}, \hat{s})}{T (\hat{θ}, \hat{a}, \hat{s}) - a (1) \hat{θ}} .
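The two-step recipe can be sketched in Python. Setting the derivative of l_1 in (41) to zero gives \hat{w} = (n - n_1) / n, the fraction of observations larger than 1; the sketch then applies the back-transformation above. It assumes that estimates of (θ, a, s) are already available from maximising l_2 (not shown), that T(θ, a, s) is the series \sum_{x \geq 1} θ^x / (a + x)^{s + 1} truncated at a hypothetical `n_terms`, and the data and parameter values are made up.

```python
def hlz_series(theta, a, s, n_terms=2000):
    """Truncated normalising series T(theta, a, s) (assumed form)."""
    return sum(theta ** x / (a + x) ** (s + 1) for x in range(1, n_terms + 1))


def w_hat(z):
    """Maximiser of l_1 in (41): the fraction of observations larger than 1."""
    n_1 = sum(1 for v in z if v == 1)
    return (len(z) - n_1) / len(z)


def p_hat(w, theta, a, s):
    """Back-transform w to p given estimates of (theta, a, s)."""
    T = hlz_series(theta, a, s)
    a_1 = (a + 1) ** (-(s + 1))  # a(1) = (a + 1)^{-(s + 1)}
    return 1 - w * T / (T - a_1 * theta)


z = [1, 1, 1, 2, 3, 1, 5, 2]  # illustrative sample: four ones out of eight
w = w_hat(z)
print(w)                           # 0.5
print(p_hat(w, 0.6, 1.0, 0.5))     # p implied by w and the assumed (theta, a, s)
```

Note that the back-transformed value is not constrained to [0, 1]: if the observed share of ones is smaller than the HLZD probability P(Y = 1) implied by the fitted (θ, a, s), the formula returns a negative number, which would have to be handled separately.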

6. Data-Fitting

6.1. Rainfall Depths and Interarrival Times

By rainfall depth we mean the depth to which liquid precipitation would cover a horizontal surface in an observation period if nothing could drain, evaporate or percolate from this surface. Let a time series of rainfall data be defined as h = {h_{1}, \dots, h_{n}}, where h_{k} (mm) is the rainfall depth recorded in the k-th interval of a fixed uniform time unit τ (e.g., a day). A day k is considered rainy if the rainfall depth h_{k} \geq h^{*}, where h^{*} is a fixed rainfall threshold. The sub-series of h consisting of the rainy days defines the event series E = {t_{1}, \dots, t_{n_{r}}}, where n_{r} \leq n and each t_{k} is an integer multiple of the time-scale τ. The sequence of the times elapsed between each element of E (except the first one) and the immediately preceding one is defined as the interarrival time series I T = {I T_{2}, \dots, I T_{n_{r}}}. In order to select an appropriate distribution for I T, some statistical characteristics usually observed in I T samples have to be considered: very high variance and skewness, a relatively high frequency of the observation I T = 1, monotonically decreasing frequencies with a slowly decaying tail, and a drop from the frequency at I T = 1 to the one at I T = 2. The HLZD in (13) has been fitted to rain I T in [5] for stations in Sicily and in [30] for stations in Piedmont, with good results. However, it has not yet been considered for rainfall depths. Recall that in the following we model rainfall depths with a discrete r.v.
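The construction of the interarrival time series from a daily record can be sketched in a few lines of Python. This is a minimal illustration of the definitions above with τ equal to one day; the function name, the default threshold and the toy record are assumptions made for the example.

```python
def interarrival_times(h, threshold=1.0):
    """Interarrival times (in units of tau) between consecutive rainy days.

    h is the sequence of rainfall depths h_1, ..., h_n in mm, one per day;
    a day k is rainy when h_k >= threshold.
    """
    event_series = [k for k, depth in enumerate(h, start=1) if depth >= threshold]
    return [later - earlier for earlier, later in zip(event_series, event_series[1:])]


h = [0, 2, 0, 0, 5, 1, 0.5, 3]  # toy daily depths in mm
it = interarrival_times(h)
print(it)  # [3, 1, 2]: rainy days are 2, 5, 6 and 8
```

Two consecutive rainy days give I T = 1, so the smallest possible value is 1 and the series has one element fewer than the event series.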

6.2. The Data

In this paper, the I T series analyzed were obtained from the recorded rainfall observations, using the rainfall threshold h^{*} = 1 mm, which is the conventional threshold stated by the World Meteorological Organization to discriminate between rainy and non-rainy days. This dataset has not been previously considered in the literature and consists of both I T and h measured over 70 years at the following five stations: Floresta, Trapani, Torino, Oxford, Ceva. They were chosen in order to represent different climates from the rainfall characteristics point of view. Floresta and Trapani represent the Mediterranean climate with a very wet and a very dry situation respectively. For both stations, the rainfall is concentrated in the colder part of the year, as is typical of the Mediterranean climate. Torino and Ceva are more continental, but Ceva is more influenced by the Ligurian sea. Therefore, Torino has its maximum rain in Spring, while that of Ceva is in Autumn, because of the heating of the sea in the Summer. Finally, Oxford is a northern Europe station with rainfall homogeneously spread across the whole year. The recordings start in 1947 and end in 2017, for a total of 70 years. Moreover, the time series were further subdivided. Thus for each station we considered

the whole time series, without subdividing the different seasons;
all the wet seasons and all the dry seasons;
the standard meteorological seasons,

for a total of 33 samples. Note that we did not consider wet and dry seasons for the Oxford station due to its climate.

More specifically let station_name ∈ {Floresta, Trapani, Torino, Oxford, Ceva} and season_name ∈ {wet, dry, spring, summer, autumn, winter}. Then the samples tagged with station_name year span the whole length of the series for the station_name station, while the samples tagged with station_name season_name are the union of all the season_name seasons in the whole time series for the station_name station, omitting all the other seasons from the dataset. The MLE was used to fit the HLZD (Section 3), the PSHLZD (Section 4) and the OIHLZD (Section 5) to the dataset. (Note that the PSHLZD has support k = 0, 1, \dots and the r.v. I T naturally has support k = 1, 2, \dots, so we had to consider the shifted r.v. I T^{'} = I T - 1.) In all cases, the MLE has been tackled with the method described in Section 3.4. The global optimization procedure was further simplified by the previously mentioned statistical characteristics of the data, which allowed us to work on a subset of the parameter space Θ.
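The count of 33 samples follows from the subdivision just described. The short Python sketch below enumerates the sample tags; the tag format is only an illustration of the station_name / season_name convention.

```python
stations = ["Floresta", "Trapani", "Torino", "Oxford", "Ceva"]
meteorological_seasons = ["spring", "summer", "autumn", "winter"]

samples = []
for station in stations:
    samples.append((station, "year"))  # the whole series
    if station != "Oxford":            # no wet/dry split for Oxford
        samples += [(station, "wet"), (station, "dry")]
    samples += [(station, season) for season in meteorological_seasons]

print(len(samples))  # 33 = 4 stations x 7 samples + 5 samples for Oxford
```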

6.3. Results

In the following we summarize the results of the distribution fitting for

I T

and h data. The fitting was satisfactory for both the PSHLZD and the HLZD. The assessment of the goodness-of-fit was obtained by following the methodology suggested by [31]. In the case of long tailed distributions, the goodness-of-fit through the classical

χ^{2}

test might be biased, because if there are several small classes, strong asymmetry might occur [31] and some problems of inaccuracy might appear if the classes are grouped [32]. The alternative procedure used to test the goodness-of-fit relies on Monte Carlo simulation to numerically reconstruct the null hypothesis of the

χ^{2}

test to compute the p-values [33].
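The idea of a Monte Carlo reconstruction of the null distribution can be sketched as follows. This Python example is a simplified illustration and not the exact procedure of [31] or [33]: the fitted class probabilities are treated as fixed (no re-estimation on each simulated sample), every class is assumed to have positive probability, and the number of simulations, the seed and the example counts are arbitrary.

```python
import random


def chi2_statistic(observed, expected):
    """Pearson chi-square statistic for counts against expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def monte_carlo_p_value(observed, probs, n_sim=999, seed=0):
    """Share of simulated statistics at least as large as the observed one."""
    rng = random.Random(seed)
    n = sum(observed)
    expected = [n * q for q in probs]
    stat_obs = chi2_statistic(observed, expected)
    categories = list(range(len(probs)))
    exceed = 0
    for _ in range(n_sim):
        counts = [0] * len(probs)
        for c in rng.choices(categories, weights=probs, k=n):
            counts[c] += 1
        if chi2_statistic(counts, expected) >= stat_obs:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)  # the observed sample counts as one draw


probs = [0.5, 0.3, 0.2]  # hypothetical fitted class probabilities
p_good = monte_carlo_p_value([50, 30, 20], probs)  # counts match the model
p_bad = monte_carlo_p_value([20, 30, 50], probs)   # counts contradict the model
print(p_good, p_bad)
```

Because the reference distribution is simulated rather than taken from the asymptotic χ^{2} law, sparse tail classes do not have to be pooled for the approximation to hold, which is the motivation given above.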

To further inspect the differences between the distributions, we have measured the fitting errors whose magnitude is strictly related to the discrepancy between the empirical frequencies and the fitted ones. Since many empirical frequencies are zero (in the tail), the cdf has been considered. In particular we considered the mean absolute error (MAE) and the mean relative absolute error (MRAE). Let us recall that, if

x = (x_{1}, \dots, x_{n})

is an ordered sample, then

MAE (x) = \sum_{j = 1}^{n} \frac{1}{n} | F_{n} (x_{j}) - \hat{F} (x_{j}) |

and

MRAE (x) = \sum_{j = 1}^{n} \frac{1}{n} | F_{n} (x_{j}) - \hat{F} (x_{j}) | / F_{n} (x_{j})

with

F_{n}

the empirical cdf and

\hat{F}

the fitted cdf.
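Both error measures are straightforward to compute once the empirical and fitted cdf's are available. The Python sketch below follows the two definitions literally; the geometric cdf used as the fitted model and the four-point sample are placeholders chosen so that the result can be checked by hand.

```python
import bisect


def empirical_cdf(sample):
    """Return F_n, the empirical cdf of the sample."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect.bisect_right(ordered, x) / n


def mae_mrae(sample, fitted_cdf):
    """Mean absolute error and mean relative absolute error between cdf's."""
    F_n = empirical_cdf(sample)
    n = len(sample)
    errors = [abs(F_n(x) - fitted_cdf(x)) for x in sample]
    mae = sum(errors) / n
    mrae = sum(e / F_n(x) for e, x in zip(errors, sample)) / n
    return mae, mrae


sample = [1, 1, 2, 4]
fitted = lambda k: 1 - 0.5 ** k  # placeholder fitted cdf (geometric)
mae, mrae = mae_mrae(sample, fitted)
print(mae, mrae)  # 0.015625 0.015625
```

Only the largest observation contributes here: the empirical cdf equals 1 at x = 4 while the placeholder cdf gives 0.9375, so both averages are 0.0625 / 4.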

6.3.1. Interarrival Times

We have compared the fitted PSHLZD with the fitted HLZD and the fitted PSHLZD with the fitted OIHLZD. To summarize the results, we have selected 4 of the 33 available samples since they have been considered particularly meaningful with respect to the whole dataset. The selected samples were Floresta Summer, Trapani Wet, Trapani Dry and Torino Winter.

Figure 1 is an example of

I T

empirical frequencies: they usually range from a high peak located at

I T = 1

to a multitude of much smaller values in the slowly decaying tail. Therefore, to perform comparisons, a log-log scale has been adopted for all the plots.

Figure 1. Histogram of the empirical I T frequency for the Trapani station over the whole year. The range is up to 133. The mode is I T = 1 with relative frequency 0.44. The mean and the standard deviation are 5.89 and 11.97 respectively.

Figure 2 shows plots of the fitted PSHLZD (solid line) and HLZD (dotted line) compared with the empirical frequencies (dot line) for the 4 selected samples. The fitting in both cases is very good. In particular, in the cases of Floresta Summer, Torino Winter and Trapani Dry the PSHLZD succeeds in fitting the drop from

I T = 1

to

I T = 2

whereas the HLZD fails. This happens in the drier periods, where this drop is more prominent.

Figure 2. Log-log plots of the fitted HLZD (green dotted line), the fitted PSHLZD (black solid line) and the I T empirical frequencies (red dot line) for the 4 selected samples Floresta Summer, Torino Winter, Trapani Wet and Trapani Dry.

Moreover, Figure 1 shows the dominance of the frequencies corresponding to

I T = 1

and

I T = 2,

which are particularly meaningful in hydrology. Figure 3 shows the plots of MAE and MRAE obtained by comparing the fitted cdf’s of the PSHLZD (circle) and the HLZD (triangle) with the empirical

I T

cdf’s. Note that the MAE and the MRAE are in general lower for the PSHLZD. Due to the dominance of the frequency corresponding to

I T = 1,

we explored modelling

I T

with the OIHLZD for all the samples. In all cases, the fitted OIHLZD and the fitted PSHLZD differ only minimally and are almost indistinguishable (see Figure 4 for an example), confirming the great flexibility of the latter distribution.

Figure 3. Dot plots of MAE and MRAE taking as reference the cdf of the PSHLZD (black circle) and of the HLZD (red triangle) for all the I T samples. The maximum MAE as well as the mean MAE are given in the top left for both the fitting distributions.

Figure 4. (a) The fitted PSHLZD (black solid line) plotted together with the fitted HLZD (green dotted line) and the I T empirical frequencies (red dot line) for the sample of Trapani Dry. (b) The fitted OIHLZD (black solid line) plotted together with the fitted HLZD (green dotted line) and the I T empirical frequencies (red dot line) for the same sample.

To conclude the validation analysis, we compared sample means and sample variances with the corresponding theoretical moments of the HLZD and the PSHLZD computed in Section 3 and Section 4 respectively. Table 1 shows the results for the 4 selected samples. In all cases, the means of the fitted distributions agree with the sample means. For the variances, the PSHLZD performs better in many cases. In Table 1, the exception is Trapani Wet, where the data are highly dispersed.

Table 1. The sample means and the sample variances for the 4 selected samples are given in the first column. The means and the variances of the fitted HLZD and of the fitted PSHLZD are given in the second and in the third column respectively.

6.3.2. Rainfall Depths

In this section, we summarize the fitting of the rainfall depth time series using both the PSHLZD and the HLZD. We omit the comparison with the OIHLZD since, as for the I T datasets, this distribution does not add further insight on the fitting.

Figure 5 shows again an empirical frequency histogram ranging from a high peak in

h = 1

to a multitude of rather smaller values in the slowly decaying tails. As in the previous section, we employed a log-log scale for all the plots. The selected samples were Ceva Winter, Torino Winter, Floresta Dry and Trapani Summer.

Figure 5. Plot of the empirical h frequency histogram for the Trapani station over the whole year. The maximum registered depth is 111. The mode is h = 1 with relative frequency 0.22. The mean and the standard deviation are 6.81 and 9.16 respectively.

In Figure 6 we have plotted the fitted PSHLZD and HLZD compared with the empirical frequencies. As with the I T samples, the fitting is very good, even better than in the I T case. Moreover, there is less difference between the performances of the PSHLZD and the HLZD.

Figure 6. Log-log plots of the fitted HLZD (green dotted line), the fitted PSHLZD (black solid line) and the h empirical frequencies (red dot line) for the 4 selected samples Ceva Winter, Torino Winter, Floresta Dry and Trapani Summer.

Figure 7 shows the plots of the MAE and the MRAE obtained by comparing the fitted cdf’s of the PSHLZD (circle) and the HLZD (triangle) with the empirical h cdf’s. Even though both errors are smaller for the PSHLZD, there is less difference between the two distributions, and the errors are generally lower than in the I T case.

Figure 7. Dot plots of MAE and MRAE taking as reference the cdf of the PSHLZD (black circle) and of the HLZD (red triangle) for all the h samples. The maximum MAE as well as the mean MAE are given in the top left for both the fitting distributions.

7. Conclusions

The first part of this paper focuses on a class of discrete distributions useful to describe data with a very high count of ones and long tails. We have reviewed the main properties using the combinatorics of exponential Bell polynomials. This device has permitted the derivation of closed form expressions for the pdf’s and their convolutions, as well as moments and cumulants. Moreover, new results on log-concavity have been presented. We have also considered the OIHLZD to compare its features with the HLZD and the PSHLZD. This in-depth analysis was aimed at investigating how to use these models to find a better fit for rainfall data. Indeed, the PSHLZD and the HLZD were fitted on Interarrival Times

I T

and rainfall depths h data coming from 5 different stations, which composed a dataset never previously analyzed in the literature. The h data were treated as observations of a discrete r.v., which is not the usual practice in the literature, but seems reasonable when taking into account how they are measured. The fitting was performed with the classical MLE method, but the likelihood was maximized using the Simulated Annealing procedure, which turns out to be fundamental since there are no closed forms of the likelihood equations. The fit was very good for both distributions, with the PSHLZD performing slightly better than the HLZD. This mostly happens for the

I T

data. Moreover, the PSHLZD was also able to replicate the fit of the OIHLZD further validating its flexibility.

From the modelling point of view, let us underline two final remarks. Firstly, the fit was excellent for both the

I T

and the h data, suggesting that the PSHLZD can be proposed as a general framework in rainfall modelling. Secondly, it is noteworthy that these models capture the variability of rainfall stochastic phenomena even though the 5 considered stations represent very different climates, a case not yet considered in the previous applications of the HLZD in the literature. Future work will consider modelling the dependence (inter-correlation) between

I T

and h. Given the remarkable performance of these distribution families in the univariate modelling, a first step would be to consider bivariate modified power series distributions [34] and the methods to estimate their parameters on a rainfall time series. This is on the agenda for our future developments.

Author Contributions

Conceptualization, C.A., G.B., S.F., E.D.N. and T.M.; methodology, E.D.N. and T.M.; software, T.M.; validation, E.D.N. and T.M.; formal analysis, E.D.N. and T.M.; data curation, C.A., G.B. and S.F.; writing—original draft preparation, E.D.N. and T.M.; writing—review and editing, C.A., G.B., S.F., E.D.N. and T.M.; funding acquisition, S.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by MIUR—Dipartimento di Eccellenza- DIST department funds, and by the “PRIN MIUR 2017SL7ABC_005 WATZON Project”.

Data Availability Statement

Data are available upon request to the Società Italiana di Meteorologia.

Acknowledgments

The authors would like to thank the anonymous referees and the associate editor for carefully reading this manuscript and giving valuable comments to improve the previous version of this paper. The authors would like to thank Nicholas Howden of University of Bristol and Daniele Cat Berro and Luca Mercalli of Società Meteorologica Italiana for providing the English and North-Italy rainfall time series. The authors would like to thank the “Risk Responsible Resilience Interdepartmental Centre” (R3C).

Conflicts of Interest

The authors declare no conflict of interest.

References

[1] Serinaldi, F. A multisite daily rainfall generator driven by bivariate copula-based mixed distributions. J. Geophys. Res. Atmos. 2009, 114, D10103.
[2] Davis, A.; Wiscombe, W.; Cahalan, R.; Marshak, A. Multifractal characterizations of nonstationarity and intermittency in geophysical fields: Observed, retrieved, or simulated. J. Geophys. Res. 1994, 99, 8055–8072.
[3] Lawrance, A.J. Stochastic Modelling of Daily Rainfall Sequences. J. R. Stat. Soc. Ser. C Appl. Stat. 1979, 28, 84.
[4] Chatfield, C. Wet and dry spells. Weather 1966, 21, 308–310.
[5] Agnese, C.; Baiamonte, G.; Cammalleri, C. Modelling the occurrence of rainy days under a typical Mediterranean climate. Adv. Water Resour. 2014, 64, 62–76.
[6] Liew, K.W.; Ong, S.H.; Toh, K.K. The Poisson-stopped Hurwitz–Lerch zeta distribution. Commun. Stat.-Theory Methods 2022, 51, 5638–5652.
[7] Liu, M.; Zhang, L.; Fang, Y.; Dong, H. Exact solutions of fractional nonlinear equations by generalized bell polynomials and bilinear method. Therm. Sci. 2021, 25, 1373–1380.
[8] Taghavian, H. The use of partition polynomial series in Laplace inversion of composite functions with applications in fractional calculus. Math. Methods Appl. Sci. 2019, 42, 2169–2189.
[9] Fathizadeh, F.; Kafkoulis, Y.; Marcolli, M. Bell polynomials and Brownian bridge in spectral gravity models on multifractal Robertson–Walker cosmologies. In Annales Henri Poincaré; Springer: Berlin/Heidelberg, Germany, 2020; Volume 21, pp. 1329–1382.
[10] Bernardara, P.; De Michele, C.; Rosso, R. A simple model of rain in time: An alternating renewal process of wet and dry states with a fractional (non-Gaussian) rain intensity. Atmos. Res. 2007, 84, 291–301.
[11] Yang, L.; Franzke, C.L.E.; Fu, Z. Power-law behaviour of hourly precipitation intensity and dry spell duration over the United States. Int. J. Climatol. 2020, 40, 2429–2444.
[12] Porporato, A.; Vico, G.; Fay, P.A. Superstatistics of hydro-climatic fluctuations and interannual ecosystem productivity. Geophys. Res. Lett. 2006, 33, L15402.
[13] Foufoula-Georgiou, E.; Lettenmaier, D.P. Continuous-time versus discrete-time point process models for rainfall occurrence series. Water Resour. Res. 1986, 22, 531–542.
[14] Charalambides, C.A. Enumerative Combinatorics; CRC Press Series on Discrete Mathematics and its Applications; Chapman & Hall/CRC: Boca Raton, FL, USA, 2002; Volume 16, p. 609.
[15] Di Nardo, E.; Guarino, G.; Senato, D. A unifying framework for k-statistics, polykays and their multivariate generalizations. Bernoulli 2008, 14, 440–468.
[16] Di Nardo, E.; Guarino, G. kStatistics: Unbiased Estimates of Joint Cumulant Products from the Multivariate Faà Di Bruno’s Formula. arXiv 2022, arXiv:2206.15348.
[17] Aksenov, S.V.; Savageau, M.A. Some properties of the Lerch family of discrete distributions. arXiv 2005, arXiv:math/0504485.
[18] Gupta, R.C. Modified power series distribution and some of its applications. Sankhyā Ser. B 1974, 36, 288–298.
[19] Keilson, J.; Gerber, H. Some Results for Discrete Unimodality. J. Am. Stat. Assoc. 1971, 66, 386–389.
[20] Gupta, P.L.; Gupta, R.C.; Ong, S.H.; Srivastava, H. A class of Hurwitz–Lerch Zeta distributions and their applications in reliability. Appl. Math. Comput. 2008, 196, 521–531.
[21] Eger, S. Identities for partial Bell polynomials derived from identities for weighted integer compositions. Aequationes Math. 2016, 90, 299–306.
[22] Bélisle, C.J.P. Convergence theorems for a class of simulated annealing algorithms on R^d. J. Appl. Probab. 1992, 29, 885–895.
[23] Charalambides, C.A. On the generalized discrete distributions and the Bell polynomials. Sankhyā Ser. B 1977, 39, 36–44.
[24] Badía, F.G.; Sangüesa, C.; Federgruen, A. Log-concavity of compound distributions with applications in operational and actuarial models. Probab. Eng. Inf. Sci. 2021, 35, 210–235.
[25] Yu, Y. On the entropy of compound distributions on nonnegative integers. IEEE Trans. Inform. Theory 2009, 55, 3645–3650.
[26] Bender, E.A.; Canfield, E.R. Log-concavity and related properties of the cycle index polynomials. J. Comb. Theory Ser. A 1996, 74, 57–70.
[27] Sato, K.i. Lévy processes and infinitely divisible distributions. In Cambridge Studies in Advanced Mathematics; Translated from the 1990 Japanese original, Revised edition of the 1999 English translation; Cambridge University Press: Cambridge, UK, 2013; Volume 68, pp. 14, 521.
[28] Gupta, P.L.; Gupta, R.C.; Tripathi, R.C. Inflated modified power series distributions with applications. Commun. Stat.-Theory Methods 1995, 24, 2355–2374.
[29] Conceição, K.S.; Louzada, F.; Andrade, M.G.; Helou, E.S. Zero-modified power series distribution and its Hurdle distribution version. J. Stat. Comput. Simul. 2017, 87, 1842–1862.
[30] Baiamonte, G.; Mercalli, L.; Berro, D.C.; Agnese, C.; Ferraris, S. Modelling the frequency distribution of inter-arrival times from daily precipitation time-series in North-West Italy. Hydrol. Res. 2019, 50, 339–357.
[31] Martínez-Rodríguez, A.M.; Sáez-Castillo, A.; Conde-Sánchez, A. Modelling using an extended Yule distribution. Comput. Stat. Data Anal. 2011, 55, 863–873.
[32] Spierdijk, L.; Voorneveld, M. Superstars without talent? The Yule distribution controversy. Rev. Econ. Stat. 2009, 91, 648–652.
[33] Hope, A.C. A simplified Monte Carlo significance test procedure. J. R. Stat. Soc. Ser. B Methodol. 1968, 30, 582–598.
[34] Shoukri, M.M.; Consul, P.C. Bivariate Modified Power Series Distribution Some Properties, Estimation and Applications. Biom. J. 1982, 24, 787–799.


Table 1. The sample means and the sample variances for the 4 selected samples are given in the first column. The means and the variances of the fitted HLZD and of the fitted PSHLZD are given in the second and in the third column respectively.

Trapani Dry
	Sample	HLZD	PSHLZD
Mean	3.758	3.758	3.758
Var	26.173	26.668	26.588
Trapani Wet
	Sample	HLZD	PSHLZD
Mean	12.788	12.788	12.788
Var	461.938	461.168	529.513
Floresta Summer
	Sample	HLZD	PSHLZD
Mean	8.93	8.93	8.93
Var	161.344	189.699	153.808
Torino Winter
	Sample	HLZD	PSHLZD
Mean	6.594	6.594	6.594
Var	99.561	130.709	107.723
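To read the variance rows of Table 1 at a glance, the relative deviation of each fitted variance from the sample variance can be computed as in the Python sketch below. The numbers are copied from Table 1; the dictionary layout and the choice of relative deviation as the comparison measure are assumptions made for this illustration.

```python
# Variances from Table 1: sample value and the two fitted models.
table_1_variances = {
    "Trapani Dry": {"sample": 26.173, "HLZD": 26.668, "PSHLZD": 26.588},
    "Trapani Wet": {"sample": 461.938, "HLZD": 461.168, "PSHLZD": 529.513},
    "Floresta Summer": {"sample": 161.344, "HLZD": 189.699, "PSHLZD": 153.808},
    "Torino Winter": {"sample": 99.561, "HLZD": 130.709, "PSHLZD": 107.723},
}

closest = {}
for name, var in table_1_variances.items():
    rel = {m: (var[m] - var["sample"]) / var["sample"] for m in ("HLZD", "PSHLZD")}
    closest[name] = min(rel, key=lambda m: abs(rel[m]))
    print(f"{name}: HLZD {rel['HLZD']:+.3f}, PSHLZD {rel['PSHLZD']:+.3f}, "
          f"closer: {closest[name]}")
```

On these four samples the PSHLZD variance is the closer one except for Trapani Wet, in line with the discussion in Section 6.3.1.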

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
