A Novel Generalization of Zero-Truncated Binomial Distribution by Lagrangian Approach with Applications for the COVID-19 Pandemic

Muhammed Rasheed Irshad; Christophe Chesneau; Damodaran Santhamani Shibu; Mohanan Monisha; Radhakumari Maya

doi:10.3390/stats5040060

¹ Department of Statistics, Cochin University of Science and Technology, Cochin 682 022, India

² Department of Mathematics, Université de Caen Basse-Normandie, F-14032 Caen, France

³ Department of Statistics, University College, Thiruvananthapuram 695 034, India

* Author to whom correspondence should be addressed.

Stats 2022, 5(4), 1004-1028; https://doi.org/10.3390/stats5040060


Abstract

The importance of Lagrangian distributions and their applicability to real-world events have been highlighted in several studies. In light of this, we introduce a new zero-truncated Lagrangian distribution. It is presented as a generalization of the zero-truncated binomial distribution (ZTBD) and hence named the Lagrangian zero-truncated binomial distribution (LZTBD). The moments, probability generating function, factorial moments, as well as skewness and kurtosis measures of the LZTBD are discussed. We also show that the new model’s finite mixture is identifiable. The unknown parameters of the LZTBD are estimated using the maximum likelihood method. A broad simulation study is carried out to evaluate the performance of the maximum likelihood estimates. The likelihood ratio test is used to assess the significance of the third parameter in the new model. Six COVID-19 datasets are used to demonstrate the LZTBD’s applicability, and we conclude that the LZTBD is very competitive in terms of fit.

Keywords:

Lagrangian zero-truncated binomial distribution; index of dispersion; maximum likelihood method; generalized likelihood ratio test; COVID-19; simulation

MSC:

60E05; 62E10; 62F10

1. Introduction

Discrete distributions whose support is the set of positive integers are known as zero-truncated discrete distributions (ZTDDs). ZTDDs are used in ecology to represent count data, such as the number of flower heads, fly eggs, or European red mites, or the number of times snowshoe hares were collected over seven days. These distributions are also employed in sociology to model data such as the size of human groups in parks, on beaches, and in other public locations. More broadly, ZTDDs have applications in many disciplines, including biology, medicine, psychology, demography, and political science. In particular, the zero-truncated Poisson distribution (ZTPD) was used in [1] to analyze the number of eggs and gall-cell counts in flower heads. The authors of [2] used the ZTPD to model deer hunting in California. The author of [3] employed the zero-truncated negative binomial distribution (ZTNBD) to model the number of children ever born to a sample of mothers over 40 years old; additionally, the authors of [4] used the ZTNBD in a regression model to treat over-dispersed count data of ischemic stroke hospitalizations. The author of [5] analyzed stroke count data based on the ZTPD, ZTNBD, and zero-truncated generalized negative binomial distribution (ZTGNBD). The application of the ZTNBD in the investigation of rare species abundance and hospital stays was discussed in [6]. The authors of [7] considered the use of the ZTBD as a randomization device.

Turning to health, many different diseases, ranging from the common cold to much more dangerous illnesses such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS), are caused by the large family of viruses known as coronaviruses. The first cases of the novel coronavirus disease (COVID-19) were found in Wuhan, China, in 2019, and the World Health Organization (WHO) subsequently declared it a pandemic. A coordinated international effort has been launched to halt the virus from spreading further, and the scientific community has contributed by starting various investigations. Statisticians play a critical role in modeling such phenomena, and several attempts have already been made in the statistical literature. To estimate the daily new COVID-19 cases in China, the author of [8] used a mathematical model known as the SIR model. The authors of [9] developed a discrete version of the generalized Lindley distribution to model the daily new cases and deaths in the COVID-19 count data. A discrete type-2 half logistic exponential distribution was presented in [10] for estimating the number of COVID-19 deaths in Pakistan and Saudi Arabia. To model COVID-19 data in Singapore, the authors of [11] employed a discrete Marshall–Olkin inverted Topp–Leone distribution. Following the outbreak of such a widespread epidemic, at least one new positive case is reported daily in practically all nations, so the daily counts are strictly positive. ZTDDs are therefore well suited to such a situation. As far as we know, no study has attempted to model the daily positive cases using ZTDDs. Hence, in this article, our aim is to propose a novel ZTDD to model the daily new positive cases. Furthermore, based on the same ZTDD, we also model the number of deaths attributable to COVID-19 in a day.

On the other hand, Lagrangian distributions arise from Lagrangian expansions, which were initially introduced in [12]. The authors of [13,14] introduced a discrete Lagrangian family (DLF) of probability distributions, which encompasses a vast and important class of probability distributions, including many well-known families. Additionally, the authors of [14] showed that, under certain conditions, all discrete Lagrangian distributions converge to the normal and inverse Gaussian distributions. The author of [15], who discovered the Lagrangian negative binomial distribution, demonstrated its utility in a queuing process. The authors of [16] introduced the Lagrangian Katz family. The authors of [17] looked at how Lagrangian probability distributions can be employed to solve inferential difficulties in random mapping theory. The generalized Poisson gamma dependency model was developed in [18] using Lagrangian probability models. For collisional turbulent fluid-particle flows, the authors of [19] used Lagrangian probability density function (pdf) models. The wide applicability of Lagrangian distributions motivated us to propose a new ZTDD based on the Lagrangian approach. Therefore, based on the Lagrangian technique, we propose a new ZTDD, called the Lagrangian zero-truncated binomial distribution (LZTBD), that can serve as a discrete model for a variety of count datasets.

The remaining parts of the paper are organized as follows: Section 2 presents some preliminaries of the Lagrangian probability distribution. In Section 3, we discuss the definition and properties of the LZTBD. The finite mixture of the new Lagrangian model is displayed in Section 4. In Section 5, we derive the maximum likelihood (ML) estimation method to estimate the unknown parameters of the LZTBD. The significance of the additional parameter is tested by using a generalized likelihood ratio test in Section 6. The finite sample performance of the ML estimation method is analyzed in Section 7 with a simulation study. Six real-world datasets are considered in Section 8 to demonstrate the usefulness of the proposed model. The concluding remarks are given in Section 9.

2. Some Preliminaries

In order to introduce the DLF, we consider the following Lagrange expansion presented in [20,21]:

\frac{g_{2} (z)}{1 - \frac{z g_{1}^{'} (z)}{g_{1} (z)}} = \sum_{r = 0}^{\infty} b_{r} u^{r},

(1)

where

u = \frac{z}{g_{1} (z)}

,

b_{0} = g_{2} (0)

,

b_{r} = \frac{1}{r!} D^{r} \{{(g_{1} (z))}^{r} g_{2} (z)\} |_{z = 0}

, with

D^{r} = \frac{d^{r}}{d z^{r}}

and

r = 0, 1, 2, 3, \dots

Consider the Lagrange expansion given in (1), where

g_{1} (z)

and

g_{2} (z)

are successively differentiable analytic functions over [−1, 1] such that

g_{1} (1) = g_{2} (1) = 1

,

g_{1} (0) \neq 0

, and

g_{2} (0) \geq 0

. A new type of probability mass function (pmf) was defined in [13,22], and it is indicated as follows:

P (X = r) = \frac{b_{r}}{\sum_{r = 0}^{\infty} b_{r}}, r = 0, 1, 2, 3, \dots

(2)

provided that

\sum_{r = 0}^{\infty} b_{r}

is finite.

Putting

z = u = 1

into (1), we obtain

\frac{g_{2} (1)}{1 - g_{1}^{'} (1)} = \frac{1}{1 - g_{1}^{'} (1)} = \sum_{r = 0}^{\infty} b_{r},

which gives, from (2),

P (X = r) = \frac{(1 - g_{1}^{'} (1)) D^{r} [\{{(g_{1} (z))}^{r} g_{2} (z)\}] |_{z = 0}}{r!}, r = 0, 1, 2, 3 \dots

(3)

This pmf defined the DLF in the broad sense.

The corresponding probability generating function (pgf) is given by

Ψ (u) = \frac{(1 - g_{1}^{'} (1)) g_{2} (z)}{1 - \frac{z g_{1}^{'} (z)}{g_{1} (z)}},

(4)

where

z = u g_{1} (z)

.

Given the applications of the DLF built with $g_{1} (z)$ and $g_{2} (z)$ in (3), it is worthwhile to investigate further distributions obtained from a new choice of the function $g_{2} (z)$. This is the basis for the distribution proposed in this study, which is presented below.

3. Construction of Lagrangian Zero-Truncated Binomial Distribution

The LZTBD is introduced in this section as a new member of the DLF.

Proposition 1.

Assume that the random variable (rv) X follows the LZTBD, in which

0 < α < β^{- 1}

,

0 < β < 1

and

γ > 0

. Then, the pmf of X is given by

f (x) = P (X = x) = \frac{(1 - α β) [(\binom{γ + α x}{x}) - (\binom{α x}{x})]}{1 - {(1 - β)}^{γ}} β^{x} {(1 - β)}^{γ + α x - x}, x = 1, 2, 3 \dots,

(5)

where

(\binom{y}{x})

stands for the generalized binomial coefficient, that is

(\binom{y}{x}) = \frac{y (y - 1) \dots (y - x + 1)}{x!}

.

Proof.

Let

g_{1} (z) = {(1 - β + β z)}^{α}

and

g_{2} (z) = \frac{{(1 - β + β z)}^{γ} - {(1 - β)}^{γ}}{1 - {(1 - β)}^{γ}}

which satisfy the statements given in Section 2. Using the DLF given in (3), the pmf of the LZTBD can be derived as follows:

\begin{matrix} f (x) & = (1 - g_{1}^{'} (1)) \frac{D^{x} [{(g_{1} (z))}^{x} g_{2} (z)] |_{z = 0}}{x!} \\ = (1 - α β) \frac{D^{x} [{(1 - β + β z)}^{α x} \frac{{(1 - β + β z)}^{γ} - {(1 - β)}^{γ}}{1 - {(1 - β)}^{γ}}] |_{z = 0}}{x!} \\ = \frac{(1 - α β) [(\binom{γ + α x}{x}) - (\binom{α x}{x})]}{1 - {(1 - β)}^{γ}} β^{x} {(1 - β)}^{γ + α x - x} . \end{matrix}

Thus, the proof is completed. □
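As a numerical sanity check on (5), the pmf can be evaluated on the log scale and summed over a long stretch of the support. The Python sketch below is illustrative only: the function names `log_binom` and `lztbd_pmf`, the parameter values, and the truncation point 400 are assumptions made here, and the log-gamma form of the generalized binomial coefficient assumes α ≥ 1 so that every gamma-function argument is positive.

```python
import math

def log_binom(a, x):
    """Log of the generalized binomial coefficient C(a, x), assuming a - x + 1 > 0."""
    return math.lgamma(a + 1) - math.lgamma(x + 1) - math.lgamma(a - x + 1)

def lztbd_pmf(x, alpha, beta, gamma):
    """pmf (5) of the LZTBD at x = 1, 2, ...; this sketch assumes alpha >= 1."""
    log_common = (math.log(1 - alpha * beta) + x * math.log(beta)
                  + (gamma + alpha * x - x) * math.log(1 - beta)
                  - math.log(1 - (1 - beta) ** gamma))
    first = math.exp(log_binom(gamma + alpha * x, x) + log_common)
    second = math.exp(log_binom(alpha * x, x) + log_common)
    return first - second

# Illustrative parameter values (assumed, not taken from the paper).
alpha, beta, gamma = 1.2, 0.3, 2.0
total = sum(lztbd_pmf(x, alpha, beta, gamma) for x in range(1, 401))
print(f"sum of the pmf over x = 1..400: {total:.12f}")
```

For these values the terms decay geometrically, so the truncated sum should be numerically indistinguishable from 1.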

The distribution described in (5) is denoted as LZTBD(

α, β, γ

), and one can note

X \sim LZTBD (α, β, γ)

to inform that an rv denoted by X follows the LZTBD with parameters

α

,

β

, and

γ

. Some special cases from the LZTBD are described below:

For $α \to 0$ , the LZTBD( $α, β, γ$ ) reduces to the one-parameter ZTBD. In this sense, the LZTBD is a generalization of the ZTBD;
For $γ = 1$ , LZTBD( $α, β, γ$ ) reduces to the Lagrangian weighted Consul distribution given in [23].
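Both special cases can be read off from (5) directly. The short derivation below is a sketch that uses only Pascal's rule for generalized binomial coefficients and the definition of the generalized binomial coefficient given after (5).

```latex
% Case \gamma = 1: Pascal's rule and 1 - (1 - \beta)^{1} = \beta give
\binom{1 + \alpha x}{x} - \binom{\alpha x}{x} = \binom{\alpha x}{x - 1},
% so that (5) becomes
f(x) = (1 - \alpha\beta) \binom{\alpha x}{x - 1} \beta^{x - 1} (1 - \beta)^{\alpha x - x + 1},
\qquad x = 1, 2, 3, \dots
% Case \alpha \to 0: \binom{\alpha x}{x} \to 0 for every x \geq 1, so that
f(x) \to \frac{\binom{\gamma}{x} \beta^{x} (1 - \beta)^{\gamma - x}}{1 - (1 - \beta)^{\gamma}},
\qquad x = 1, 2, 3, \dots
```

The first expression coincides with the pmf derived in Section 3.2, and the second is a binomial pmf with index γ and success probability β, renormalized after removing the mass at zero.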

Now, Figure 1 portrays the graphical representation of the LZTBD for different parameter values of

α

,

β

, and

γ

.

Figure 1. Various shapes of the probability mass function (pmf) of the LZTBD for different values of the parameters.

The hazard rate function (hrf) of the LZTBD is obtained by substituting the pmf in the following equation:

\begin{matrix} f_{x} = P (X = x | X \geq x) = \frac{f (x)}{\sum_{j = x}^{\infty} f (j)}, x = 1, 2, 3, \dots \end{matrix}

(6)

Because the sum in the denominator of (6) has no closed form, a closed-form expression for the hrf is not available; we therefore examine its shape graphically. Figure 2 shows that the hrf of the LZTBD can take all of the typical shapes, namely increasing, decreasing, and bathtub shapes, for varying parameter values.

Figure 2. Various shapes of the hazard rate function (hrf) of the LZTBD for different parameter values.
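Since (6) has no closed form, the hrf can be approximated numerically by truncating the infinite sum. The following Python sketch is one way to do this under stated assumptions: the helper names, the truncation point `n_terms`, and the parameter values are hypothetical, and the log-gamma evaluation of the pmf assumes α ≥ 1.

```python
import math

def log_binom(a, x):
    """Log of the generalized binomial coefficient C(a, x), assuming a - x + 1 > 0."""
    return math.lgamma(a + 1) - math.lgamma(x + 1) - math.lgamma(a - x + 1)

def lztbd_pmf(x, alpha, beta, gamma):
    """pmf (5) of the LZTBD; this sketch assumes alpha >= 1."""
    log_common = (math.log(1 - alpha * beta) + x * math.log(beta)
                  + (gamma + alpha * x - x) * math.log(1 - beta)
                  - math.log(1 - (1 - beta) ** gamma))
    return (math.exp(log_binom(gamma + alpha * x, x) + log_common)
            - math.exp(log_binom(alpha * x, x) + log_common))

def lztbd_hazard(x_max, alpha, beta, gamma, n_terms=400):
    """Return [h(1), ..., h(x_max)] with h(x) = f(x) / sum_{j >= x} f(j), as in (6)."""
    pmf = [lztbd_pmf(x, alpha, beta, gamma) for x in range(1, n_terms + 1)]
    tails = [0.0] * n_terms
    running = 0.0
    for idx in range(n_terms - 1, -1, -1):  # accumulate tail sums from the right
        running += pmf[idx]
        tails[idx] = running
    return [pmf[i] / tails[i] for i in range(x_max)]

# Illustrative parameter values (assumed, not taken from the paper).
hazard = lztbd_hazard(10, 1.2, 0.3, 2.0)
for x, h in enumerate(hazard, start=1):
    print(x, round(h, 4))
```

Accumulating the tail sums from the right avoids the cancellation that would occur when computing 1 minus the cumulative probability for large x.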

Furthermore, the choice of various specific functions for

g_{2} (z)

will provide various members of DLF. In the following, we list some DLF distributions available in the literature.

3.1. Poisson-Binomial Distribution

If we take

g_{1} (z) = {(1 - β + β z)}^{α}

and

g_{2} (z) = e^{γ (z - 1)}

, based on (3), the pmf of the considered distribution is obtained as

\begin{matrix} f_{1} (x) & = (1 - α β) \frac{D^{x} [{(1 - β + β z)}^{α x} e^{γ (z - 1)}] |_{z = 0}}{x!} \\ & = \frac{(1 - α β) e^{- γ} γ^{x} {(1 - β)}^{α x}}{x!} \, {}_{2} F_{0} (- x, - α x; \frac{β}{γ (1 - β)}), x = 0, 1, 2, 3 \dots, \end{matrix}

where

_{2} F_{0}

is the hypergeometric function. Since their pmfs coincide, the corresponding distribution is identified as the Poisson-binomial distribution (see [23]).

3.2. Weighted Consul Distribution

If we take

g_{1} (z) = {(1 - β + β z)}^{α}

and

g_{2} (z) = z

, based on (3), the pmf of the considered distribution can be derived as

\begin{matrix} f_{2} (x) & = (1 - α β) \frac{D^{x} [{(1 - β + β z)}^{α x} z] |_{z = 0}}{x!} \\ = (1 - α β) (\binom{α x}{x - 1}) β^{x - 1} {(1 - β)}^{α x - x + 1}, x = 1, 2, 3 \dots, \end{matrix}

which is the pmf of the weighted Consul distribution (see [23]).

3.3. Weighted Delta Binomial Distribution

If we take

g_{1} (z) = {(1 - β + β z)}^{α}

and

g_{2} (z) = z^{γ}

, based on (3), the pmf of the considered distribution is obtained as

\begin{matrix} f_{3} (x) & = (1 - α β) \frac{D^{x} [{(1 - β + β z)}^{α x} z^{γ}] |_{z = 0}}{x!} \\ = (1 - α β) (\binom{α x}{x - γ}) β^{x - γ} {(1 - β)}^{α x - x + γ}, x = γ, γ + 1, γ + 2 \dots, \end{matrix}

which corresponds to the pmf of the weighted delta binomial distribution (see [23]).

3.4. Linear Function Binomial Distribution

If we take

g_{1} (z) = {(1 - β + β z)}^{α}

and

g_{2} (z) = {(1 - β + β z)}^{γ}

, based on (3), the pmf of the considered distribution can be derived as

\begin{matrix} f_{4} (x) & = (1 - α β) \frac{D^{x} [{(1 - β + β z)}^{α x} {(1 - β + β z)}^{γ}] |_{z = 0}}{x!} \\ = (1 - α β) (\binom{γ + α x}{x}) β^{x} {(1 - β)}^{γ + α x - x}, x = 0, 1, 2 \dots, \end{matrix}

which is the pmf of the linear function binomial distribution (see [23]).

Proposition 2.

Let X be an rv following the LZTBD. Then, the median of X is the smallest integer m greater than or equal to 1 such that

\sum_{x = 1}^{m} [(\binom{γ + α x}{x}) - (\binom{α x}{x})] β^{x} {(1 - β)}^{α x - x} \geq \frac{1 - {(1 - β)}^{γ}}{2 {(1 - β)}^{γ} (1 - α β)} .

(7)

Proof.

By the definition, m is the smallest integer in the support of the rv, i.e.,

\{1, 2, \dots\}

, such that

P (X \leq m) \geq \frac{1}{2}

, which is equivalent to the desired result. □
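Inequality (7) translates directly into a search over m. The Python sketch below accumulates the left-hand side of (7) until it reaches the right-hand side; the function names, the cap `max_m`, and the parameter values are assumptions for illustration, and the log-gamma form of the binomial coefficients assumes α ≥ 1.

```python
import math

def log_binom(a, x):
    """Log of the generalized binomial coefficient C(a, x), assuming a - x + 1 > 0."""
    return math.lgamma(a + 1) - math.lgamma(x + 1) - math.lgamma(a - x + 1)

def lztbd_median(alpha, beta, gamma, max_m=10_000):
    """Smallest m >= 1 satisfying inequality (7); this sketch assumes alpha >= 1."""
    rhs = (1 - (1 - beta) ** gamma) / (2 * (1 - beta) ** gamma * (1 - alpha * beta))
    lhs = 0.0
    for m in range(1, max_m + 1):
        # log of beta^m (1 - beta)^(alpha m - m), kept on the log scale for stability
        log_w = m * math.log(beta) + (alpha * m - m) * math.log(1 - beta)
        lhs += (math.exp(log_binom(gamma + alpha * m, m) + log_w)
                - math.exp(log_binom(alpha * m, m) + log_w))
        if lhs >= rhs:
            return m
    raise ValueError("median not reached within max_m terms")

# Illustrative parameter values (assumed, not taken from the paper).
median = lztbd_median(1.2, 0.3, 2.0)
print("median:", median)
```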

Proposition 3.

Let X be an rv following the LZTBD. Then, the mode of X, denoted by $x_{m}$, exists in

\{1, 2, \dots\}

and satisfies

\frac{η (x_{m} + 1)}{η (x_{m})} \leq \frac{1}{β {(1 - β)}^{α - 1}} \leq \frac{η (x_{m})}{η (x_{m} - 1)},

(8)

where

η (x_{m}) = (\binom{γ + α x_{m}}{x_{m}}) - (\binom{α x_{m}}{x_{m}})

.

Proof.

We must find the integer

x = x_{m}

for which

f (x)

has the greatest value. That is, we aim to solve

f (x) \geq f (x - 1)

and

f (x) \geq f (x + 1)

. First, note that

f (x)

can also be written as

f (x) = \frac{(1 - α β) β^{x} {(1 - β)}^{γ + α x - x} η (x)}{1 - {(1 - β)}^{γ}},

where

η (x) = (\binom{γ + α x}{x}) - (\binom{α x}{x})

.

Obviously,

f (x) \geq f (x - 1)

implies that

\frac{η (x)}{η (x - 1)} \geq \frac{1}{β {(1 - β)}^{α - 1}} .

(9)

Additionally,

f (x) \geq f (x + 1)

implies that

\frac{η (x + 1)}{η (x)} \leq \frac{1}{β {(1 - β)}^{α - 1}} .

(10)

By combining (9) and (10), we obtain (8). □
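The characterization (8) can be checked numerically by locating the mode with a direct search over the pmf and then evaluating both ratios. This Python sketch is illustrative: the helper names, the truncation point, and the parameter values are assumptions, α ≥ 1 is assumed for the log-gamma evaluation, and the right-hand inequality is only evaluated when the mode exceeds 1, since η(0) = 0.

```python
import math

def log_binom(a, x):
    """Log of the generalized binomial coefficient C(a, x), assuming a - x + 1 > 0."""
    return math.lgamma(a + 1) - math.lgamma(x + 1) - math.lgamma(a - x + 1)

def eta(x, alpha, gamma):
    """eta(x) = C(gamma + alpha x, x) - C(alpha x, x), as defined in Proposition 3."""
    return math.exp(log_binom(gamma + alpha * x, x)) - math.exp(log_binom(alpha * x, x))

def lztbd_pmf(x, alpha, beta, gamma):
    """pmf (5) of the LZTBD; this sketch assumes alpha >= 1."""
    log_common = (math.log(1 - alpha * beta) + x * math.log(beta)
                  + (gamma + alpha * x - x) * math.log(1 - beta)
                  - math.log(1 - (1 - beta) ** gamma))
    return (math.exp(log_binom(gamma + alpha * x, x) + log_common)
            - math.exp(log_binom(alpha * x, x) + log_common))

def lztbd_mode(alpha, beta, gamma, n_terms=400):
    """Mode found by direct search over x = 1, ..., n_terms."""
    return max(range(1, n_terms + 1), key=lambda x: lztbd_pmf(x, alpha, beta, gamma))

# Illustrative parameter values (assumed, not taken from the paper).
alpha, beta, gamma = 1.2, 0.3, 10.0
x_m = lztbd_mode(alpha, beta, gamma)
bound = 1 / (beta * (1 - beta) ** (alpha - 1))
left_ok = eta(x_m + 1, alpha, gamma) / eta(x_m, alpha, gamma) <= bound
right_ok = x_m == 1 or bound <= eta(x_m, alpha, gamma) / eta(x_m - 1, alpha, gamma)
print("mode:", x_m, "| (8) left holds:", left_ok, "| (8) right holds:", right_ok)
```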

Proposition 4.

The pgf of an rv X following the LZTBD is expressed as

Ψ (u) = E (u^{X}) = \frac{(1 - α β) {{(1 - β + β z)}^{γ} - {(1 - β)}^{γ}}}{(1 - {(1 - β)}^{γ}) (1 - \frac{z α β}{1 - β + β z})},

(11)

where

z = u {(1 - β + β z)}^{α}

.

Proof.

Using (4), the pgf of the LZTBD is of the following form:

\begin{matrix} Ψ (u) = \frac{(1 - g_{1}^{'} (1)) g_{2} (z)}{(1 - \frac{z g_{1}^{'} (z)}{g_{1} (z)})} = \frac{(1 - α β) {{(1 - β + β z)}^{γ} - {(1 - β)}^{γ}}}{(1 - {(1 - β)}^{γ}) (1 - \frac{z α β}{1 - β + β z})} . \end{matrix}

Thus, the proof is complete. □

Corollary 1.

The moment generating function (mgf) of an rv X following the LZTBD is obtained by putting

z = e^{s}

and

u = e^{k}

in (11). That is,

M (k) = E (e^{k X}) = \frac{(1 - α β) {{(1 - β + β e^{s})}^{γ} - {(1 - β)}^{γ}}}{(1 - {(1 - β)}^{γ}) (1 - \frac{α β e^{s}}{1 - β + β e^{s}})},

(12)

where

s = k + α log (1 - β + β e^{s})

.

Corollary 2.

The cumulant generating function (cgf) of an rv X following the LZTBD becomes

C (k) = log [M (k)] = log [\frac{(1 - α β) {{(1 - β + β e^{s})}^{γ} - {(1 - β)}^{γ}}}{(1 - {(1 - β)}^{γ}) (1 - \frac{α β e^{s}}{1 - β + β e^{s}})}],

(13)

where

s = k + α log (1 - β + β e^{s})

.

Proposition 5.

Let

X_{1}, X_{2}, \dots, X_{n}

be n independently and identically distributed (iid) rvs following the LZTBD(

α, β, γ

). Then, the distribution of the random sum variable

V = \sum_{i = 1}^{n} X_{i}

has the following pgf:

\begin{matrix} Ψ_{1} (u) = \frac{{(1 - α β)}^{n} {{(1 - β + β z)}^{γ} - {(1 - β)}^{γ}}^{n}}{{(1 - {(1 - β)}^{γ})}^{n} {(1 - \frac{z α β}{1 - β + β z})}^{n}}, \end{matrix}

where

z = u {(1 - β + β z)}^{α}

.

Proof.

Based on the pgf of the LZTBD given in (11), the pgf of the rv V becomes

\begin{matrix} Ψ_{1} (u) & = E (u^{V}) = E (u^{X_{1} + X_{2} + \dots + X_{n}}) = \prod_{i = 1}^{n} E (u^{X_{i}}) = \prod_{i = 1}^{n} Ψ (u) = {[Ψ (u)]}^{n} \\ = \frac{{(1 - α β)}^{n} {{(1 - β + β z)}^{γ} - {(1 - β)}^{γ}}^{n}}{{(1 - {(1 - β)}^{γ})}^{n} {(1 - \frac{z α β}{1 - β + β z})}^{n}} . \end{matrix}

This completes the proof. □
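At the level of the pmf, raising the pgf to the power n corresponds to an n-fold convolution, which gives a simple numerical route to the distribution of V. The Python sketch below is illustrative only: the function names, the truncation `size`, the choice n = 3, and the parameter values are assumptions, and α ≥ 1 is assumed for the log-gamma evaluation of the pmf.

```python
import math

def log_binom(a, x):
    """Log of the generalized binomial coefficient C(a, x), assuming a - x + 1 > 0."""
    return math.lgamma(a + 1) - math.lgamma(x + 1) - math.lgamma(a - x + 1)

def lztbd_pmf(x, alpha, beta, gamma):
    """pmf (5) of the LZTBD; this sketch assumes alpha >= 1."""
    log_common = (math.log(1 - alpha * beta) + x * math.log(beta)
                  + (gamma + alpha * x - x) * math.log(1 - beta)
                  - math.log(1 - (1 - beta) ** gamma))
    return (math.exp(log_binom(gamma + alpha * x, x) + log_common)
            - math.exp(log_binom(alpha * x, x) + log_common))

def convolve(p, q, size):
    """Discrete convolution of two pmf vectors, truncated to `size` entries."""
    out = [0.0] * size
    for i, p_i in enumerate(p):
        if p_i == 0.0:
            continue
        for j, q_j in enumerate(q):
            if i + j >= size:
                break
            out[i + j] += p_i * q_j
    return out

def sum_pmf(n, alpha, beta, gamma, size=201):
    """Vector whose entry v approximates P(V = v) for V = X_1 + ... + X_n."""
    base = [0.0] + [lztbd_pmf(x, alpha, beta, gamma) for x in range(1, size)]
    result = base
    for _ in range(n - 1):
        result = convolve(result, base, size)
    return result

# Illustrative choice (assumed): n = 3 iid copies.
v_pmf = sum_pmf(3, 1.2, 0.3, 2.0)
print("P(V = 3) =", v_pmf[3])
```

Because each summand is at least 1, the support of V starts at n, so the entries below index n are zero.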

Proposition 6.

For any integer

r \geq 1

, the

r^{t h}

factorial moment of an rv X following the LZTBD(

α, β, γ

) is given by

\begin{matrix} μ_{[r]} & = E [X (X - 1) (X - 2) \dots (X - r + 1)] \\ & = \{ {(1 - {(1 - β)}^{γ})}^{- 1} D^{r} ({(1 - β + β z)}^{γ}) \\ & + \frac{α β \sum_{i = 1}^{r} (r - i + 1) μ_{[r - i]} D^{i} (u {(1 - β + β z)}^{α - 1})}{1 - α β} \} |_{u = z = 1}, \end{matrix}

(14)

where

z = u {(1 - β + β z)}^{α}

.

Proof.

By definition, the

r^{t h}

factorial moment of the LZTBD(

α, β, γ

) is obtained by successively differentiating

Ψ (u)

given in (11) r times with respect to (wrt) u and by putting

u = z = 1

. First, note that

(1 - u g_{1}^{'} (z)) Ψ (u) = (1 - g_{1}^{'} (1)) g_{2} (z) .

Taking the first derivative wrt u on both sides, we obtain

Ψ (u) D^{1} (1 - u g_{1}^{'} (z)) + Ψ^{'} (u) (1 - u g_{1}^{'} (z)) = (1 - g_{1}^{'} (1)) D^{1} (g_{2} (z)) .

Taking the second derivative of the above equation wrt u on both sides, we obtain

Ψ (u) D^{2} (1 - u g_{1}^{'} (z)) + 2 D^{1} (1 - u g_{1}^{'} (z)) Ψ^{'} (u) + (1 - u g_{1}^{'} (z)) Ψ^{″} (u) = (1 - g_{1}^{'} (1)) D^{2} (g_{2} (z)) .

Proceeding in a similar manner, the

r^{t h}

derivative is of the following form:

D^{r} Ψ (u) = \frac{(1 - g_{1}^{'} (1)) D^{r} (g_{2} (z)) - \sum_{i = 1}^{r} (r - i + 1) D^{i} (1 - u g_{1}^{'} (z)) D^{r - i} Ψ (u)}{1 - u g_{1}^{'} (z)} .

(15)

Substituting

g_{1} (z) = {(1 - β + β z)}^{α}

,

g_{2} (z) = \frac{{(1 - β + β z)}^{γ} - {(1 - β)}^{γ}}{1 - {(1 - β)}^{γ}}

and

z = u = 1

, in (15), we obtain (14). □

Proposition 7.

The mean (μ) and variance ($σ^{2}$) of the LZTBD are, respectively,

μ = \frac{γ β}{(1 - {(1 - β)}^{γ}) (1 - α β)} + \frac{α β (1 - β)}{{(1 - α β)}^{2}}

(16)

and

\begin{matrix} σ^{2} & = \frac{α β + (α^{2} - 3 α) β^{2} + (2 α - α^{2}) β^{3}}{{(1 - α β)}^{4}} \\ & + \frac{γ^{2} β^{2} + γ β (1 - β) - γ^{2} α β^{3}}{(1 - {(1 - β)}^{γ}) {(1 - α β)}^{3}} - \frac{γ^{2} β^{2}}{{(1 - {(1 - β)}^{γ})}^{2} {(1 - α β)}^{2}} . \end{matrix}

(17)

Proof.

Using (14), we obtain

\begin{matrix} μ = E (X) & = \frac{g_{2}^{'} (1)}{1 - g_{1}^{'} (1)} + \frac{g_{1}^{″} (1) + g_{1}^{'} (1) - {(g_{1}^{'} (1))}^{2}}{{(1 - g_{1}^{'} (1))}^{2}} \\ & = \frac{γ β}{(1 - {(1 - β)}^{γ}) (1 - α β)} + \frac{α β (1 - β)}{{(1 - α β)}^{2}}, \end{matrix}

since $g_{1}^{'} (1) = α β$, $g_{1}^{″} (1) = α (α - 1) β^{2}$, and $g_{2}^{'} (1) = γ β / (1 - {(1 - β)}^{γ})$.

On the other hand, with $g_{1}^{‴} (1) = α (α - 1) (α - 2) β^{3}$ and $g_{2}^{″} (1) = γ (γ - 1) β^{2} / (1 - {(1 - β)}^{γ})$, we have

\begin{matrix} σ^{2} & = E (X (X - 1)) + E (X) - {(E (X))}^{2} \\ & = \frac{g_{2}^{″} (1) + g_{2}^{'} (1) - {(g_{2}^{'} (1))}^{2}}{{(1 - g_{1}^{'} (1))}^{2}} + \frac{(1 + g_{2}^{'} (1)) (g_{1}^{″} (1) + g_{1}^{'} (1) - {(g_{1}^{'} (1))}^{2})}{{(1 - g_{1}^{'} (1))}^{3}} \\ & + \frac{g_{1}^{‴} (1) + g_{1}^{'} (1) g_{1}^{″} (1) + 2 g_{1}^{″} (1)}{{(1 - g_{1}^{'} (1))}^{3}} + \frac{2 {(g_{1}^{″} (1))}^{2}}{{(1 - g_{1}^{'} (1))}^{4}} \\ & = \frac{α β + (α^{2} - 3 α) β^{2} + (2 α - α^{2}) β^{3}}{{(1 - α β)}^{4}} \\ & + \frac{γ^{2} β^{2} + γ β (1 - β) - γ^{2} α β^{3}}{(1 - {(1 - β)}^{γ}) {(1 - α β)}^{3}} - \frac{γ^{2} β^{2}}{{(1 - {(1 - β)}^{γ})}^{2} {(1 - α β)}^{2}}, \end{matrix}

where the terms in $β^{4}$ arising from the last two fractions cancel. The desired expressions are obtained. □

A normalized measure of dispersion can be obtained from the variance-to-mean ratio. This measure is the well-known index of dispersion (IOD). The next result gives the IOD and the coefficient of variation for the LZTBD.

Proposition 8.

The IOD and coefficient of variation (CV) for the LZTBD are given as, respectively,

IOD = \frac{\frac{α β + (α^{2} - 3 α) β^{2} + (2 α - α^{2}) β^{3}}{{(1 - α β)}^{4}} + \frac{γ^{2} β^{2} + γ β (1 - β) - γ^{2} α β^{3}}{(1 - {(1 - β)}^{γ}) {(1 - α β)}^{3}} - \frac{γ^{2} β^{2}}{{(1 - {(1 - β)}^{γ})}^{2} {(1 - α β)}^{2}}}{\frac{γ β}{(1 - {(1 - β)}^{γ}) (1 - α β)} + \frac{α β (1 - β)}{{(1 - α β)}^{2}}}

(18)

and

CV = \frac{\sqrt{\frac{α β + (α^{2} - 3 α) β^{2} + (2 α - α^{2}) β^{3}}{{(1 - α β)}^{4}} + \frac{γ^{2} β^{2} + γ β (1 - β) - γ^{2} α β^{3}}{(1 - {(1 - β)}^{γ}) {(1 - α β)}^{3}} - \frac{γ^{2} β^{2}}{{(1 - {(1 - β)}^{γ})}^{2} {(1 - α β)}^{2}}}}{\frac{γ β}{(1 - {(1 - β)}^{γ}) (1 - α β)} + \frac{α β (1 - β)}{{(1 - α β)}^{2}}} .

(19)

Proof.

By definition, $IOD = σ^{2} / μ$ and $CV = \sqrt{σ^{2}} / μ$. Substituting (16) and (17) into these ratios gives (18) and (19). □

A probabilistic model’s asymmetry degree and flatness are commonly assessed by their skewness and kurtosis coefficients, respectively. The third central moment, normalized by the variance raised to the power of

3 / 2

, can be used to calculate the first, whereas the fourth central moment divided by the square of the variance can be used to calculate the second. Mean, variance, CV, IOD, skewness, and kurtosis for selected values of parameters of the LZTBD(

α, β, γ

) are summarized in Table 1. From this table, it is evident that the LZTBD possesses both over-dispersion (IOD

> 1

) and under-dispersion (IOD

< 1

) for varying parameter values. It is also noted that the LZTBD is mainly right-skewed, and has several kurtosis levels.

Table 1. Values of some moment measures of the LZTBD for various values of parameters

α

,

β

, and

γ

.

4. Identifiability

Finite mixture models have received a lot of attention in recent years in real contexts. In astronomy, biology, genetics, medicine, psychiatry, marketing, and other fields, mixture models are widely utilized (see [24]). We derive finite mixtures of the LZTBD(

α, β, γ

) in this section. This mixed model may be appropriate in the context of future initiatives.

Let Y be a discrete rv with the pmf

h (y) = \sum_{i = 1}^{g} l_{i} h_{i} (y)

, where

i = 1, 2, \dots g

,

l_{i} > 0

such that

\sum_{i = 1}^{g} l_{i} = 1

,

h_{i} (y) \geq 0

and

\sum_{y} h_{i} (y) = 1

. Then, we state that Y has a mixture distribution and

h (y)

is a finite mixture of distributions. The constants

l_{1}, l_{2}, \dots, l_{g}

are known as mixing weights and

h_{1} (y), h_{2} (y), \dots, h_{g} (y)

, the components of the mixture. We denote as

Θ

the collection of all distinct parameters in the components.

Let

Σ = {U (y; θ_{i}) : θ_{i} \in Θ}

be the class of pmf’s from which mixtures are to be formed. Then, the class of finite mixtures of

Σ

with the appropriate class of pmf’s is

\hat{Δ} = {Δ (y) : Δ (y) = \sum_{i = 1}^{g} l_{i} U (y; θ_{i}), l_{i} > 0, U (y; θ_{i}) \in Σ, i = 1, 2, \dots g}

. So that

\hat{Δ}

is the convex hull of

Σ

.

Definition 1.

An integer-valued rv Y is said to have a g-component mixture of the LZTBDs if it has the pmf

h (y) = P (Y = y)

of the following form:

h (y) = \sum_{i = 1}^{g} l_{i} h_{i} (y),

(20)

where

0 \leq l_{i} \leq 1

, for each

i = 1, 2, 3 \dots, g

,

\sum_{i = 1}^{g} l_{i} = 1

,

h_{i} (y) = \frac{(1 - α_{i} β_{i}) [(\binom{γ_{i} + α_{i} y}{y}) - (\binom{α_{i} y}{y})] β_{i}^{y} {(1 - β_{i})}^{γ_{i} + α_{i} y - y}}{(1 - {(1 - β_{i})}^{γ_{i}})}, y = 1, 2, \dots,

(21)

with

γ_{i} > 0

,

0 \leq α_{i} < β_{i}^{- 1}

and

0 < β_{i} < 1

for each

i = 1, 2, \dots, g

.

A distribution with pmf given in (20) is called the Lagrangian zero-truncated binomial mixture distribution with g components (LZTBMDg).
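To make Definition 1 concrete, the mixture pmf (20) can be evaluated as a weighted sum of component pmfs of the form (21). The Python sketch below uses a hypothetical two-component mixture; the function names, weights, and component parameters are assumptions for illustration, and each component assumes α ≥ 1 for the log-gamma evaluation.

```python
import math

def log_binom(a, x):
    """Log of the generalized binomial coefficient C(a, x), assuming a - x + 1 > 0."""
    return math.lgamma(a + 1) - math.lgamma(x + 1) - math.lgamma(a - x + 1)

def lztbd_pmf(x, alpha, beta, gamma):
    """Component pmf (21); this sketch assumes alpha >= 1."""
    log_common = (math.log(1 - alpha * beta) + x * math.log(beta)
                  + (gamma + alpha * x - x) * math.log(1 - beta)
                  - math.log(1 - (1 - beta) ** gamma))
    return (math.exp(log_binom(gamma + alpha * x, x) + log_common)
            - math.exp(log_binom(alpha * x, x) + log_common))

def mixture_pmf(y, weights, params):
    """Mixture pmf (20): sum of l_i * h_i(y) over the g components."""
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("mixing weights must sum to 1")
    return sum(l * lztbd_pmf(y, *theta) for l, theta in zip(weights, params))

# Hypothetical two-component mixture (all values assumed for illustration).
weights = [0.4, 0.6]
params = [(1.2, 0.3, 2.0), (1.1, 0.2, 3.0)]  # (alpha_i, beta_i, gamma_i)
total = sum(mixture_pmf(y, weights, params) for y in range(1, 401))
print("P(Y = 1) =", mixture_pmf(1, weights, params), "| total mass:", total)
```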

The following theorem from [25] is adopted to construct the identifiability conditions of the finite mixture model.

Theorem 1.

A necessary and sufficient condition for

\hat{Δ}

to be identifiable is that Σ be a linearly independent set over the field of real numbers.

Proof.

The proof is given in [25]; hence, it is omitted here. □

Next, applying Theorem 1, we outline the LZTBMDg’s identifiability requirements.

Theorem 2.

The identifiability conditions for the LZTBMDg with the pmf

h (y)

as given in (20) are

α_{i} \neq α_{j}

,

β_{i} \neq β_{j}

, and

γ_{i} \neq γ_{j}

for

i, j \in {1, 2, \dots, g}

, such that

i \neq j

.

Proof.

For the first step, take

g = 2

and consider the following equation:

b_{1} F_{1} (y) + b_{2} F_{2} (y) = 0,

(22)

where

b_{1}

and

b_{2}

are any two arbitrary real numbers,

F_{1} (y) = \sum_{j = 1}^{y} h (j)

and

F_{2} (y) = \sum_{j = 1}^{y} ϕ (j)

for

y = 1, 2, \dots,

in which

ϕ (j)

is obtained from

h (j)

by replacing

α_{j}

by

τ_{j}

,

β_{j}

by

δ_{j}

and

γ_{j}

by

ω_{j}

.

Assume that, for each

i = 1, 2

, we have

α_{i} \neq τ_{i}

,

β_{i} \neq δ_{i}

and

γ_{i} \neq ω_{i}

. Thus, for

l_{1} = l

, we have

\begin{matrix} F_{1} (y) & = l \sum_{j = 1}^{y} \frac{(1 - α_{1} β_{1}) [(\binom{γ_{1} + α_{1} j}{j}) - (\binom{α_{1} j}{j})] β_{1}^{j} {(1 - β_{1})}^{γ_{1} + α_{1} j - j}}{1 - {(1 - β_{1})}^{γ_{1}}} \\ + (1 - l) \sum_{j = 1}^{y} \frac{(1 - α_{2} β_{2}) [(\binom{γ_{2} + α_{2} j}{j}) - (\binom{α_{2} j}{j})] β_{2}^{j} {(1 - β_{2})}^{γ_{2} + α_{2} j - j}}{1 - {(1 - β_{2})}^{γ_{2}}} \end{matrix}

(23)

and

\begin{matrix} F_{2} (y) & = l \sum_{j = 1}^{y} \frac{(1 - τ_{1} δ_{1}) [(\binom{ω_{1} + τ_{1} j}{j}) - (\binom{τ_{1} j}{j})] δ_{1}^{j} {(1 - δ_{1})}^{ω_{1} + τ_{1} j - j}}{1 - {(1 - δ_{1})}^{ω_{1}}} \\ + (1 - l) \sum_{j = 1}^{y} \frac{(1 - τ_{2} δ_{2}) [(\binom{ω_{2} + τ_{2} j}{j}) - (\binom{τ_{2} j}{j})] δ_{2}^{j} {(1 - δ_{2})}^{ω_{2} + τ_{2} j - j}}{1 - {(1 - δ_{2})}^{ω_{2}}} . \end{matrix}

(24)

Now, from (22)–(24), we obtain the following equations:

\begin{matrix} b_{1} \sum_{j = 1}^{y} \frac{(1 - α_{1} β_{1}) [(\binom{γ_{1} + α_{1} j}{j}) - (\binom{α_{1} j}{j})] β_{1}^{j} {(1 - β_{1})}^{γ_{1} + α_{1} j - j}}{1 - {(1 - β_{1})}^{γ_{1}}} \\ + b_{2} \sum_{j = 1}^{y} \frac{(1 - τ_{1} δ_{1}) [(\binom{ω_{1} + τ_{1} j}{j}) - (\binom{τ_{1} j}{j})] δ_{1}^{j} {(1 - δ_{1})}^{ω_{1} + τ_{1} j - j}}{1 - {(1 - δ_{1})}^{ω_{1}}} = 0 \end{matrix}

(25)

and

\begin{matrix} b_{1} \sum_{j = 1}^{y} \frac{(1 - α_{2} β_{2}) [(\binom{γ_{2} + α_{2} j}{j}) - (\binom{α_{2} j}{j})] β_{2}^{j} {(1 - β_{2})}^{γ_{2} + α_{2} j - j}}{1 - {(1 - β_{2})}^{γ_{2}}} \\ + b_{2} \sum_{j = 1}^{y} \frac{(1 - τ_{2} δ_{2}) [(\binom{ω_{2} + τ_{2} j}{j}) - (\binom{τ_{2} j}{j})] δ_{2}^{j} {(1 - δ_{2})}^{ω_{2} + τ_{2} j - j}}{1 - {(1 - δ_{2})}^{ω_{2}}} = 0 . \end{matrix}

(26)

Solving (25) and (26), we obtain

\begin{matrix} b_{1} \sum_{j = 1}^{y} & (1 - α_{1} β_{1}) (1 - τ_{2} δ_{2}) {(β_{1} δ_{2})}^{j} {(1 - β_{1})}^{γ_{1} + α_{1} j - j} {(1 - δ_{2})}^{ω_{2} + τ_{2} j - j} \\ \{\frac{[(\binom{γ_{1} + α_{1} j}{j}) - (\binom{α_{1} j}{j})] [(\binom{ω_{2} + τ_{2} j}{j}) - (\binom{τ_{2} j}{j})]}{(1 - {(1 - β_{1})}^{γ_{1}}) (1 - {(1 - δ_{2})}^{ω_{2}})}\} \\ = b_{1} \sum_{j = 1}^{y} (1 - α_{2} β_{2}) (1 - τ_{1} δ_{1}) {(β_{2} δ_{1})}^{j} {(1 - β_{2})}^{γ_{2} + α_{2} j - j} {(1 - δ_{1})}^{ω_{1} + τ_{1} j - j} \\ \{\frac{[(\binom{γ_{2} + α_{2} j}{j}) - (\binom{α_{2} j}{j})] [(\binom{ω_{1} + τ_{1} j}{j}) - (\binom{τ_{1} j}{j})]}{(1 - {(1 - β_{2})}^{γ_{2}}) (1 - {(1 - δ_{1})}^{ω_{1}})}\} . \end{matrix}

(27)

Hence, by (27), we have

b_{1} = 0

and thus,

b_{2} = 0

. Therefore,

F_{1} (y)

and

F_{2} (y)

are linearly independent and, by Theorem 1, the mixture is identifiable. Since the same argument applies to any positive integer g, the proof follows. □

Proposition 9.

The pgf of the LZTBMDg given in (20) is given by

Ψ (u) = \sum_{i = 1}^{g} l_{i} \frac{(1 - α_{i} β_{i}) {{(1 - β_{i} + β_{i} z_{i})}^{γ_{i}} - {(1 - β_{i})}^{γ_{i}}}}{(1 - {(1 - β_{i})}^{γ_{i}}) (1 - \frac{z_{i} α_{i} β_{i}}{1 - β_{i} + β_{i} z_{i}})},

(28)

where

z_{i} = u {(1 - β_{i} + β_{i} z_{i})}^{α_{i}}

.

Proof.

The proof follows simply from Definition 1, given the pgf of the LZTBD mentioned in (11). □

5. Estimation of Parameters

In this section, we estimate the unknown parameters of the LZTBD by the ML estimation method.

It is worth mentioning that the model corresponding to the LZTBD

(α, β, γ)

is a tri-parametric model with parameters

α, β, and γ

. Consider a random sample of size n from the LZTBD, and let the observed frequency of the value x be

n_{x}

,

x = 1, 2, \dots, k

, so that

\sum_{x = 1}^{k} n_{x} = n

, where k is the largest observed value with non-zero frequency. Then, the likelihood function is given by

L = \prod_{x = 1}^{k} {\{\frac{(1 - α β) β^{x} {(1 - β)}^{γ + α x - x} [(\binom{γ + α x}{x}) - (\binom{α x}{x})]}{(1 - {(1 - β)}^{γ})}\}}^{n_{x}} .

Therefore, the log-likelihood function is given by

\begin{matrix} L_{n} = logL & = n log (1 - α β) + n \bar{x} log β + (n γ + n \bar{x} (α - 1)) log (1 - β) \\ - n log (1 - {(1 - β)}^{γ}) + \sum_{x = 1}^{k} n_{x} \{log [\prod_{i = 0}^{x - 1} (γ + α x - i) - \prod_{i = 0}^{x - 1} (α x - i)]\} \\ - \sum_{x = 1}^{k} n_{x} log (x!), \end{matrix}

where

\bar{x} = \frac{1}{n} \sum_{x = 1}^{k} x n_{x}

. The ML estimates (MLEs) are defined by maximizing

L_{n}

wrt the parameters. Let us denote by

\hat{α}

,

\hat{β}

, and

\hat{γ}

the MLEs of

α

,

β

, and

γ

, respectively. On the computational side, the score vector is

S = {(\begin{matrix} \frac{\partial L_{n}}{\partial α} & \frac{\partial L_{n}}{\partial β} & \frac{\partial L_{n}}{\partial γ} \end{matrix})}^{T},

where the partial derivatives of

L_{n}

wrt the parameters are

\frac{\partial L_{n}}{\partial α} = n \bar{x} log (1 - β) - \frac{n β}{1 - α β} + \sum_{x = 1}^{k} n_{x} \frac{\frac{\partial}{\partial α} [\prod_{i = 0}^{x - 1} (γ + α x - i) - \prod_{i = 0}^{x - 1} (α x - i)]}{\prod_{i = 0}^{x - 1} (γ + α x - i) - \prod_{i = 0}^{x - 1} (α x - i)},

\frac{\partial L_{n}}{\partial β} = \frac{n \bar{x}}{β} - \frac{n α}{(1 - α β)} - \frac{n γ + n (α - 1) \bar{x}}{1 - β} - \frac{n γ {(1 - β)}^{γ - 1}}{1 - {(1 - β)}^{γ}}

and

\frac{\partial L_{n}}{\partial γ} = n log (1 - β) + \frac{n {(1 - β)}^{γ} log (1 - β)}{1 - {(1 - β)}^{γ}} + \sum_{x = 1}^{k} n_{x} \frac{\frac{\partial}{\partial γ} [\prod_{i = 0}^{x - 1} (γ + α x - i)]}{\prod_{i = 0}^{x - 1} (γ + α x - i) - \prod_{i = 0}^{x - 1} (α x - i)} .

The MLEs can then be found by setting the score vector to zero, i.e., $S = 0$, and solving the resulting equations simultaneously. These equations cannot be solved analytically, but they can be solved numerically in the R statistical software by means of iterative techniques such as the Newton–Raphson algorithm.
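As a small illustration of the numerical side, the log-likelihood above can be evaluated in a few lines. The sketch below is written in Python rather than R purely for illustration; the function names `gen_binom`, `lztbd_pmf`, and `log_likelihood`, as well as the frequency table, are assumptions introduced here and are not part of the original analysis. The generalized binomial coefficient is computed as a running product, which is adequate for moderate counts but may overflow for very large $x$.

```python
import math


def gen_binom(a, x):
    """Generalized binomial coefficient C(a, x) for real a and integer x >= 0."""
    result = 1.0
    for i in range(x):
        result *= (a - i) / (i + 1)
    return result


def lztbd_pmf(x, alpha, beta, gamma):
    """LZTBD pmf at x = 1, 2, ..., read off the factor inside the likelihood."""
    bracket = gen_binom(gamma + alpha * x, x) - gen_binom(alpha * x, x)
    return ((1 - alpha * beta) * beta ** x
            * (1 - beta) ** (gamma + alpha * x - x) * bracket
            / (1 - (1 - beta) ** gamma))


def log_likelihood(freq, alpha, beta, gamma):
    """Log-likelihood L_n for observed frequencies freq = {x: n_x}."""
    return sum(n_x * math.log(lztbd_pmf(x, alpha, beta, gamma))
               for x, n_x in freq.items())


# Hypothetical frequency table {x: n_x}, for illustration only.
freq = {1: 12, 2: 9, 3: 5, 4: 3, 6: 1}
print(log_likelihood(freq, alpha=1.1, beta=0.5, gamma=1.0))
```

A function of this kind, once negated, can be handed to a general-purpose numerical optimizer to approximate the MLEs.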

6. Likelihood Ratio Test

In this section, we test the significance of the additional parameter included in the LZTBD using the generalized likelihood ratio test (GLRT) (see [26]).

More precisely, to test the significance of the parameter $α$ of the LZTBD $(α, β, γ)$, we consider the GLRT procedure with the null hypothesis $H_{0}$: “X follows the ZTBD” against the alternative hypothesis $H_{1}$: “X follows the LZTBD”. The test statistic is given by

- 2 log λ^{*} = 2 (L_{n} (\hat{Θ}) - L_{n} ({\hat{Θ}}^{*})),

(29)

where $\hat{Θ}$ is the vector of MLEs of $Θ = (α, β, γ)$ with no constraints, and ${\hat{Θ}}^{*}$ is the vector of MLEs of $Θ$ under $H_{0}$.
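The statistic in (29) is a simple function of the two maximized log-likelihoods. A minimal Python sketch follows; the function name `glrt_statistic` and the numerical inputs are hypothetical and serve only to illustrate the arithmetic.

```python
def glrt_statistic(loglik_full, loglik_null):
    """Return -2 log(lambda*) = 2 * (L_n(full model) - L_n(null model))."""
    return 2.0 * (loglik_full - loglik_null)


# Hypothetical maximized log-likelihoods, for illustration only.
stat = glrt_statistic(loglik_full=-100.0, loglik_null=-104.5)
print(stat)  # 9.0
```

When a table reports the negative log-likelihood $- log L$, the signs must be flipped before the values are passed to such a function.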

7. Simulation

We perform a simulation study, generating observations with the R software, to examine the asymptotic behavior of the MLEs of the parameters of the LZTBD. Here, we apply the inverse transformation method to simulate an LZTBD random sample (see [27]). The algorithm is as follows:

Step 1: Generate a random number $U$ from the uniform distribution on $(0, 1)$.
Step 2: Set $i = 1$, $P = \frac{(1 - α β) β γ {(1 - β)}^{γ + α - 1}}{1 - {(1 - β)}^{γ}}$, and $F = P$.
Step 3: If $U < F$, set $X = i$ and stop.
Step 4: Set $P = P \times \frac{β {(1 - β)}^{α - 1} [(\binom{γ + α (i + 1)}{i + 1}) - (\binom{α (i + 1)}{i + 1})]}{(\binom{γ + α i}{i}) - (\binom{α i}{i})}$, $F = F + P$, and $i = i + 1$.
Step 5: Go to Step 3.

Conceptually, $P$ is the probability that $X = i$, and $F$ is the probability that $X$ is less than or equal to $i$.
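The inverse transformation idea can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the R implementation used for the study: it accumulates the LZTBD pmf directly from the expression inside the likelihood of Section 5 instead of updating $P$ by a recurrence, and the names `gen_binom`, `lztbd_pmf`, `sample_lztbd`, and the cap `max_x` are hypothetical.

```python
import random


def gen_binom(a, x):
    """Generalized binomial coefficient C(a, x) for real a and integer x >= 0."""
    result = 1.0
    for i in range(x):
        result *= (a - i) / (i + 1)
    return result


def lztbd_pmf(x, alpha, beta, gamma):
    """LZTBD pmf at x = 1, 2, ..., taken from the likelihood in Section 5."""
    bracket = gen_binom(gamma + alpha * x, x) - gen_binom(alpha * x, x)
    return ((1 - alpha * beta) * beta ** x
            * (1 - beta) ** (gamma + alpha * x - x) * bracket
            / (1 - (1 - beta) ** gamma))


def sample_lztbd(alpha, beta, gamma, rng, max_x=500):
    """Draw one LZTBD value by inverse transformation.

    max_x is an illustrative safety cap on the search, not part of the algorithm.
    """
    u = rng.random()                                # U ~ uniform(0, 1)
    cdf = 0.0
    for i in range(1, max_x + 1):
        cdf += lztbd_pmf(i, alpha, beta, gamma)     # F = P(X <= i)
        if u < cdf:                                 # stop at the first i with U < F
            return i
    return max_x


rng = random.Random(2022)                           # fixed seed for reproducibility
sample = [sample_lztbd(1.1, 0.5, 1.0, rng) for _ in range(1000)]
print(sum(sample) / len(sample))                    # sample mean
```

For these parameter values ($β = 0.5$, $γ = 1$, $α = 1.1$), Table 1 reports a theoretical mean of 3.5802, so the sample mean should land near that value.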

Additionally, the average MLEs, absolute biases, and mean squared errors (MSEs) are calculated using the following equations:

Average value of MLEs: MLE( $\hat{a}$ ) = $\frac{1}{N} \sum_{i = 1}^{N} {\hat{a}}_{i}$ .
Absolute average bias: Bias( $\hat{a}$ ) = $\frac{1}{N} \sum_{i = 1}^{N} | {\hat{a}}_{i} - a |$ .
MSE: MSE( $\hat{a}$ ) = $\frac{1}{N} \sum_{i = 1}^{N} {({\hat{a}}_{i} - a)}^{2}$ .

Here, $a = α$, $β$, or $γ$, and the index $i$ represents the $i$th generated sample. The simulation takes into account sample sizes of $n = 15$, 50, 175, 500, and 1000 for two different sets of parameter values of the LZTBD. We repeat the process $N = 1000$ times and report the estimates, absolute biases, and MSEs in Table 2. From this table, one can infer that the estimates become stable and close to the true parameter values as the sample size grows. A decreasing trend is observed overall in the absolute average bias and MSEs as the sample size increases. Hence, the ML estimation performs consistently and reliably for moderate to large samples.
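The three summary indices defined above are straightforward to compute once the $N$ estimates are available. The following Python sketch mirrors the formulas as stated; the function name `simulation_summary` and the input values are hypothetical.

```python
def simulation_summary(estimates, true_value):
    """Average estimate, average absolute bias, and MSE over N replications."""
    n_rep = len(estimates)
    mean_est = sum(estimates) / n_rep
    abs_bias = sum(abs(e - true_value) for e in estimates) / n_rep
    mse = sum((e - true_value) ** 2 for e in estimates) / n_rep
    return mean_est, abs_bias, mse


# Hypothetical estimates of a parameter whose true value is 0.5.
print(simulation_summary([0.4, 0.5, 0.6, 0.7], true_value=0.5))
```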

Table 2. The maximum likelihood estimates (MLEs) simulation results for the parameters $α$, $β$, and $γ$.

8. Applications and Empirical Study

The aim of this section is to show the empirical importance of the LZTBD. We use six real COVID-19 datasets from different countries, namely Italy, Senegal, Pakistan, Saudi Arabia, Belgium, and Ethiopia, to illustrate the quality of the LZTBD fit. The shape of the hrf of each data set is assessed graphically with the Total Time on Test (TTT) plot. Convex, concave, convex-then-concave, and concave-then-convex empirical TTT plots correspond to decreasing, increasing, bathtub-shaped, and upside-down bathtub-shaped hrfs, respectively (see [28]). We employ the statistical software R to analyze these datasets numerically. To show the possible benefit of the LZTBD, it is compared with the following distributions:

ZTBD with parameters $β$ and $γ$, which has the following pmf:

$f_{5} (x) = (\binom{γ}{x}) \frac{β^{x} {(1 - β)}^{γ - x}}{1 - {(1 - β)}^{γ}}, x = 1, 2, \dots, γ$
Zero-truncated generalized binomial distribution (ZTGBD) with parameters $α$, $β$, and $γ$, which has the following pmf:

$f_{6} (x) = \frac{γ}{γ + α x} (\binom{γ + α x}{x}) \frac{β^{x} {(1 - β)}^{γ + α x - x}}{1 - {(1 - β)}^{γ}}, x = 1, 2, 3, \dots$
Zero-truncated discrete two-parameter Poisson–Lindley distribution (ZTDTPPLD) with parameters $γ$ and $β$ (see [29]), which has the following pmf:

$f_{7} (x) = \frac{γ^{2}}{γ^{2} + 2 γ β + γ + β} \frac{β x + γ + β + 1}{{(γ + 1)}^{x}}, x = 1, 2, 3, \dots$
Zero-truncated Poisson–Lindley distribution (ZTPLD) with parameter $α$ (see [30]), which has the following pmf:

$f_{8} (x) = \frac{α^{2} (x + α + 2)}{(α^{2} + 3 α + 1) {(α + 1)}^{x}}, x = 1, 2, 3, \dots$
Intervened generalized Poisson distribution (IGPD) with parameters $α$, $β$, and $γ$ (see [31]), which has the following pmf:

$f_{9} (x) = \frac{α \{(1 + γ) {((1 + γ) α + β x)}^{x - 1} - γ {(γ α + β x)}^{x - 1}\}}{e^{α γ + β x} (e^{α} - 1) x!}, x = 1, 2, 3, \dots$

8.1. COVID-19 Data Set from Italy

The first data set covers 61 days of COVID-19 in Italy, from 13 June to 12 August 2021, and is accessible in [11]. It consists of the daily newly reported cases. The descriptive measures of the data set, which include the sample size ($n$), minimum (min), first quartile ($Q_{1}$), median ($M_{d}$), third quartile ($Q_{3}$), maximum (max), and inter-quartile range ($IQR$), are given in Table 3.

Table 3. Descriptive statistics for the COVID-19 data set of Italy.

In addition, Figure 3 shows an empirical TTT plot of the data, from which we deduce an increasing hrf.

Figure 3. Total Time on Test (TTT) plot for the COVID-19 data set of Italy.
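For readers who wish to reproduce a plot of this kind, the empirical scaled TTT transform can be computed as sketched below. The sketch assumes the standard definition used with [28], namely $φ(r/n) = [\sum_{i = 1}^{r} y_{(i)} + (n - r) y_{(r)}] / \sum_{i = 1}^{n} y_{(i)}$ for the ordered observations $y_{(1)} \le \dots \le y_{(n)}$; the function name `scaled_ttt` and the toy data are hypothetical.

```python
def scaled_ttt(data):
    """Return the points (r/n, phi(r/n)) of the empirical scaled TTT transform."""
    y = sorted(data)
    n = len(y)
    total = sum(y)
    points = [(0.0, 0.0)]
    running = 0.0
    for r, value in enumerate(y, start=1):
        running += value                              # sum of the r smallest values
        points.append((r / n, (running + (n - r) * value) / total))
    return points


# Toy data, for illustration only.
for u, phi in scaled_ttt([1, 2, 3]):
    print(round(u, 4), round(phi, 4))
```

Plotting the second coordinate against the first and comparing the curve with the diagonal gives the shape diagnosis described at the start of this section: a curve lying above the diagonal (concave) points to an increasing hrf.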

We compare the competing distributions with the LZTBD using the negative log-likelihood ($- log L$), Akaike information criterion (AIC), Bayesian information criterion (BIC), and $χ^{2}$ statistic. Table 4 displays the corresponding MLEs, model adequacy measures, and $χ^{2}$ values. The LZTBD has lower model adequacy measures and a lower $χ^{2}$ value than the other distributions studied, as shown in Table 4. As a result, among the models considered, the suggested model is the most appropriate for modeling the given COVID-19 data. It is interesting to note that the empirical mean, variance, and IOD of this COVID-19 data set are 22.6229, 160.3388, and 7.0874, respectively, and the theoretical values for the mean, variance, and IOD measures of the LZTBD are 21.6234, 160.3248, and 7.4144, respectively. Thus, the empirical and theoretical variances are almost the same, and the empirical and theoretical means and IOD values are close to each other.
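The information criteria can be recomputed from the reported negative log-likelihoods. The sketch below assumes the standard definitions AIC $= 2p - 2 log L$ and BIC $= p log n - 2 log L$, where $p$ is the number of estimated parameters and $n$ the sample size; the function names `aic` and `bic` are introduced here for illustration.

```python
import math


def aic(neg_loglik, n_params):
    """Akaike information criterion from the negative log-likelihood."""
    return 2.0 * neg_loglik + 2.0 * n_params


def bic(neg_loglik, n_params, n_obs):
    """Bayesian information criterion from the negative log-likelihood."""
    return 2.0 * neg_loglik + n_params * math.log(n_obs)


# LZTBD fit to the Italy data: -log L = 234.5071, 3 parameters, n = 61.
print(round(aic(234.5071, 3), 4))
print(round(bic(234.5071, 3, 61), 4))
```

With these inputs, the results agree, up to rounding, with the AIC and BIC values reported for the LZTBD in Table 4 (475.0141 and 481.3468).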

Table 4. Maximum likelihood estimates (MLEs), model adequacy measures, and $χ^{2}$ values for the COVID-19 data set of Italy.

For the GLRT, the value of the test statistic (29) is $2 (- 234.5071 + 485.9380) = 502.8618$ (p-value $= 0.0001$). As a result, at any significance level greater than 0.0001, the null hypothesis is rejected in favor of the alternative hypothesis. Hence, we conclude that the additional parameter $α$ in the LZTBD is significant in light of the test procedure outlined in Section 6.

8.2. COVID-19 Data Set from Senegal

The LZTBD is fitted to a second data set, covering 56 days of COVID-19 infection in Senegal, recorded from 29 March 2021 to 23 May 2021. These data, which show the daily incidence of COVID-19 cases, were gathered by the World Health Organization (WHO) and are accessible at http://covid19.who.int/data (accessed on 24 August 2022). Table 5 reports descriptive statistics for these data.

Table 5. Descriptive statistics for the COVID-19 data set of Senegal.

In addition, Figure 4 shows an empirical TTT plot of the data from which an increasing hrf is revealed.

Figure 4. Total Time on Test (TTT) plot for the COVID-19 data set of Senegal.

We compare the competing distributions with the suggested distribution using the $- log L$, AIC, BIC, and $χ^{2}$ values. Table 6 displays the corresponding MLEs, model adequacy measures, and $χ^{2}$ values for all fitted models. The LZTBD’s model adequacy measures and $χ^{2}$ value are lower than those of the other examined models. As a result, among the models considered, the suggested model is the most appropriate for modeling the COVID-19 data from Senegal. It is worth noting that the empirical mean, variance, and IOD of this COVID-19 data set are 46.54, 394.326, and 8.47, respectively, and the theoretical values for the mean, variance, and IOD measures of the LZTBD are 46.4, 394.324, and 8.49, respectively. Thus, the empirical and theoretical means are almost the same, and the empirical and theoretical variances and IOD values are very close to each other.
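The empirical quantities quoted above follow from the raw counts, with the IOD being the ratio of the variance to the mean. A minimal Python sketch is given below; the function name `index_of_dispersion` and the toy counts are hypothetical, and the use of the sample variance (divisor $n - 1$) is an assumption, since the divisor is not stated in the text.

```python
import statistics


def index_of_dispersion(data):
    """Return (mean, variance, IOD) with IOD = variance / mean."""
    mean = statistics.mean(data)
    variance = statistics.variance(data)   # sample variance; an assumption
    return mean, variance, variance / mean


# Toy counts, for illustration only.
print(index_of_dispersion([2, 4, 4, 6]))
```

An IOD above one indicates over-dispersion relative to the mean, which is the situation in all six datasets of this section.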

Table 6. Maximum likelihood estimates (MLEs), model adequacy measures, and $χ^{2}$ value for the Senegal data set.

For the GLRT, the value of the test statistic (29) is $2 (- 244.0212 + 584.8307) = 681.6190$ (p-value $= 0.0004$). As a result, at any significance level greater than 0.0004, the null hypothesis is rejected in favor of the alternative hypothesis. Hence, we conclude that the additional parameter $α$ in the LZTBD is significant in light of the test procedure outlined in Section 6.

8.3. COVID-19 Data Set from Pakistan

The LZTBD is fitted to a third data set, covering 95 days of COVID-19 infection in Pakistan, recorded from 23 May 2021 to 25 August 2021. These data, which are available at http://covid19.who.int/data (accessed on 24 August 2022), were acquired by the WHO and show the daily incidence of COVID-19 cases. Table 7 reports descriptive statistics for these data.

Table 7. Descriptive statistics for the COVID-19 data set of Pakistan.

In addition, Figure 5 shows an empirical TTT plot of the data, showing an increasing hrf.

Figure 5. Total Time on Test (TTT) plot for the COVID-19 data set of Pakistan.

Using the $- log L$, AIC, BIC, and $χ^{2}$ values, we compare the competing distributions with the suggested distribution. Table 8 displays the corresponding MLEs, model adequacy measures, and $χ^{2}$ values for all fitted models. The LZTBD’s model adequacy measures and $χ^{2}$ value are lower than those of the other examined models. Among the models considered, the suggested model is, therefore, the most suitable one to model the COVID-19 data from Pakistan. In addition, the empirical mean, variance, and IOD of this COVID-19 data set are 52.6842, 538.5801, and 10.2228, respectively, and the theoretical values for the mean, variance, and IOD measures of the LZTBD are 52.6704, 538.5509, and 10.2249, respectively. We thus observe that the empirical and theoretical means are almost equal, and the empirical and theoretical variances and IOD values are very close to each other.

Table 8. Maximum likelihood estimates (MLEs), model adequacy measures, and $χ^{2}$ value for the Pakistan data set.

For the GLRT, the value of the test statistic (29) is $2 (- 431.4451 + 1313.275) = 1763.6598$ (p-value $= 0.0001$). As a result, at any significance level greater than 0.0001, the null hypothesis is rejected in favor of the alternative hypothesis. Hence, we conclude that the additional parameter $α$ in the LZTBD is significant in light of the test procedure outlined in Section 6.

8.4. COVID-19 Data Set from Saudi Arabia

The LZTBD is fitted to a fourth data set, consisting of COVID-19 mortality numbers in Saudi Arabia over 83 days, recorded from 30 May to 20 August 2020. The WHO gathered these data, which represent the number of deaths per day, and they are available at http://covid19.who.int/data (accessed on 24 August 2022). Table 9 reports descriptive statistics for these data.

Table 9. Descriptive statistics for the COVID-19 data set of Saudi Arabia.

In addition, Figure 6 shows an empirical TTT plot of the data and it shows an increasing hrf.

Figure 6. Total Time on Test (TTT) plot for the COVID-19 data set of Saudi Arabia.

We compare the competing distributions with the suggested distribution using the $- log L$, AIC, BIC, and $χ^{2}$ values. Table 10 displays the corresponding MLEs, model adequacy measures, and $χ^{2}$ values for all fitted models. The LZTBD’s model adequacy measures and $χ^{2}$ value are lower than those of the other examined models. Among the models considered, the suggested model is therefore the most suitable for modeling the COVID-19 data from Saudi Arabia. Furthermore, the empirical mean, variance, and IOD of this COVID-19 data set are 36.9277, 70.5313, and 1.9099, respectively, and the theoretical values for the mean, variance, and IOD measures of the LZTBD are 36.8724, 71.0567, and 1.9270, respectively. Hence, the empirical and theoretical means are almost the same, and the empirical and theoretical variances and IOD values are very close to each other.

Table 10. Maximum likelihood estimates (MLEs), model adequacy measures, and $χ^{2}$ value for the Saudi Arabia data set.

For the GLRT, the value of the test statistic (29) is $2 (- 294.288 + 415.1392) = 241.7024$ (p-value $= 0.0002$). As a result, at any significance level greater than 0.0002, the null hypothesis is rejected in favor of the alternative hypothesis. Hence, we conclude that the additional parameter $α$ in the LZTBD is significant in light of the test procedure outlined in Section 6.

8.5. COVID-19 Data Set from Belgium

The LZTBD is also fitted to a data set on COVID-19 in Belgium covering 425 days (more than a year), recorded from 22 July 2021 to 19 September 2022. The WHO gathered these data, which represent the number of deaths per day, and they are accessible at http://covid19.who.int/data (accessed on 24 August 2022). Table 11 reports descriptive statistics for these data.

Table 11. Descriptive statistics for the COVID-19 data set of Belgium.

In addition, Figure 7 shows an empirical TTT plot from which we can distinguish an increasing hrf.

Figure 7. Total Time on Test (TTT) plot for the COVID-19 data set of Belgium.

We compare the competing distributions with the suggested distribution using the $- log L$, AIC, BIC, and $χ^{2}$ values. Table 12 displays the corresponding MLEs, model adequacy measures, and $χ^{2}$ values for all fitted models. The LZTBD’s model adequacy measures and $χ^{2}$ value are lower than those of the other examined models. As a result, among the models considered, the suggested model is the most appropriate for modeling the COVID-19 data from Belgium. It is worth noting that the empirical mean, variance, and IOD of this COVID-19 data set are 17.122, 178.419, and 10.420, respectively, and the theoretical values for the mean, variance, and IOD measures of the LZTBD are 17.213, 178.412, and 10.365, respectively. Thus, the empirical and theoretical means are almost the same, and the empirical and theoretical variances and IOD values are very close to each other.

Table 12. Maximum likelihood estimates (MLEs), model adequacy measures, and $χ^{2}$ value for the Belgium data set.

For the GLRT, the value of the test statistic (29) is $2 (- 1601.074 + 3825.833) = 4449.518$ (p-value $= 0.0012$). As a result, at any significance level greater than 0.0012, the null hypothesis is rejected in favor of the alternative hypothesis. Hence, we conclude that the additional parameter $α$ in the LZTBD is significant in light of the test procedure outlined in Section 6.

8.6. COVID-19 Data Set from Ethiopia

The LZTBD is fitted to a sixth data set, on COVID-19 in Ethiopia over 301 days, recorded from 25 August 2020 to 21 June 2021. The WHO collected these data, which represent the number of deaths per day, and they are accessible at http://covid19.who.int/data (accessed on 24 August 2022). Table 13 reports descriptive statistics for these data.

Table 13. Descriptive statistics for the COVID-19 data set of Ethiopia.

As an additional result, Figure 8 shows an empirical TTT plot of the data, where an increasing hrf can be seen.

Figure 8. Total Time on Test (TTT) plot for the COVID-19 data set of Ethiopia.

We compare the competing distributions with the suggested distribution using the $- log L$, AIC, BIC, and $χ^{2}$ values. Table 14 displays the corresponding MLEs, model adequacy measures, and $χ^{2}$ values for all fitted models. The LZTBD’s model adequacy measures and $χ^{2}$ value are lower than those of the other examined models. Among the models considered, the suggested model is therefore the most suitable one to model the Ethiopian COVID-19 data. In addition, the empirical mean, variance, and IOD of this COVID-19 data set are 11.973, 67.00, and 5.5959, respectively, and the theoretical values for the mean, variance, and IOD measures of the LZTBD are 11.891, 67.02, and 5.6361, respectively. Thus, the empirical and theoretical means are almost equal, and the empirical and theoretical variances and IOD values are very close to each other.

Table 14. Maximum likelihood estimates (MLEs), model adequacy measures, and $χ^{2}$ value for the Ethiopia data set.

For the GLRT, the value of the test statistic (29) is $2 (- 1003.902 + 1680.734) = 1353.664$ (p-value $= 0.0002$). As a result, at any significance level greater than 0.0002, the null hypothesis is rejected in favor of the alternative hypothesis. Hence, we conclude that the additional parameter $α$ in the LZTBD is significant in light of the test procedure outlined in Section 6.

9. Conclusions

In this article, we used the Lagrange expansion to construct a new three-parameter distribution called the Lagrangian zero-truncated binomial distribution (LZTBD). The proposed distribution generalizes the well-known zero-truncated binomial distribution and the Lagrangian weighted Consul distribution. We investigated the shape properties of its probability mass and hazard rate functions. The expressions for the factorial moments, generating functions, mean, and median were derived. The identifiability of the LZTBD model was also proved. The model parameters were estimated using the maximum likelihood method. A simulation study was performed to assess the performance of the maximum likelihood estimates. Six real datasets were used to validate the applicability of the LZTBD and to demonstrate that it offers a better fit than the competing models.

Author Contributions

Conceptualization, M.R.I., C.C., D.S.S., M.M. and R.M.; methodology, M.R.I., C.C., D.S.S., M.M. and R.M.; software, M.R.I., C.C., D.S.S., M.M. and R.M.; validation, M.R.I., C.C., D.S.S., M.M. and R.M.; formal analysis, M.R.I., C.C., D.S.S., M.M. and R.M.; investigation, M.R.I., C.C., D.S.S., M.M. and R.M.; resources, M.R.I., C.C., D.S.S., M.M. and R.M.; data curation, M.R.I., C.C., D.S.S., M.M. and R.M.; writing—original draft preparation, M.R.I., C.C., D.S.S., M.M. and R.M.; writing—review and editing, M.R.I., C.C., D.S.S., M.M. and R.M.; visualization, M.R.I., C.C., D.S.S., M.M. and R.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank the three reviewers for their thorough comments, which led to improvements in the presentation of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Finney, D.; Varley, G. An example of the truncated Poisson distribution. Biometrics 1955, 11, 387–394.
2. Creel, M.; Loomis, J. Theoretical and Empirical Advantages of Truncated Count Data Estimators for Analysis of Deer Hunting in California. Am. J. Agric. Econ. 1990, 72, 434–441.
3. Brass, W. Simplified methods of fitting the truncated negative binomial distribution. Biometrika 1959, 45, 59–68.
4. Lee, A.; Wang, K.; Yau, K.; Somerford, P. Truncated Negative Binomial Mixed Regression Modelling of Ischaemic Stroke Hospitalizations. Stat. Med. 2003, 22, 1129–1139.
5. Kennedy. Does Race Predict Stroke Readmission? An Analysis Using the Truncated Negative Binomial Model. J. Natl. Med. Assoc. 2005, 97, 699–713.
6. Phange, A.; Loh, E. Zero Truncated Strict Arcsine Model. Int. J. Comput. Electr. Autom. Control. Inf. Eng. 2013, 7, 989–991.
7. Zapata, Z.; Sedory, S.; Singh, S. Zero-truncated Binomial Distribution as a Randomization Device. Sociol. Methods Res. 2019, 51, 800–815.
8. Nesteruk, I. Statistics-based predictions of coronavirus epidemic spreading in mainland China. Innov. Biosyst. Bioeng. 2020, 4, 8–13.
9. El-Morshedy, M.; Altun, E.; Eliwa, M. A new statistical approach to model the counts of novel coronavirus cases. Math. Sci. 2021, 16, 37–50.
10. Ahsan-ul Haq, M.; Babar, A.; Hashmi, S.; Alghamdi, A.S.; Afify, A.Z. The Discrete Type-II Half-Logistic Exponential Distribution with Applications to COVID-19 Data. Pak. J. Stat. Oper. Res. 2021, 7, 921–932.
11. Almetwally, E.; Abdo, D.; Hafez, E.; Jawa, T.; Sayed-Ahmed, N.; Almongy, H. The new discrete distribution with application to COVID-19 Data. Results Phys. 2022, 32, 104987.
12. Lagrange, J.L. Mécanique Analytique; Jacques Gabay: Paris, France, 1788.
13. Consul, P.C.; Shenton, L.R. Use of Lagrange expansion for generating generalized probability distributions. SIAM J. Appl. Math. 1972, 23, 239–248.
14. Consul, P.C.; Shenton, L.R. Some interesting properties of Lagrangian distributions. Commun. Stat. 1973, 2, 263–272.
15. Mohanty, S.G. On a generalized two-coin tossing problem. Biom. Z. 1966, 8, 266–272.
16. Consul, P.C.; Famoye, F. Lagrangian Katz family of distributions. Commun. Stat. Theory Methods 1996, 25, 415–434.
17. Berg, K.; Nowicki, K. Statistical inference for a class of modified power series distribution with applications to random mapping theory. J. Stat. Plan. Inference 1991, 28, 247–261.
18. Li, S.; Black, D.; Lee, C.; Famoye, F. Dependence Models Arising from the Lagrangian Probability Distributions. Commun. Stat. Theory Methods 2010, 29, 1729–1742.
19. Innocenti, A.R.; Fox, O.; Chibbaro, S. A Lagrangian probability density-function model for collisional turbulent fluid-particle flows. J. Fluid Mech. 2019, 862, 449–489.
20. Jensen, J.L. Sur une identité d’Abel et sur d’autres formules analogues. Acta Math. 1902, 26, 307–318.
21. Riordan, J. Combinatorial Identities; John Wiley and Sons, Inc.: New York, NY, USA, 1968.
22. Janardan, K.G.; Rao, B.R. Lagrangian distributions of second kind and weighted distributions. SIAM J. Appl. Math. 1983, 43, 302–313.
23. Consul, P.C.; Famoye, F. Lagrangian Probability Distributions; Birkhäuser: New York, NY, USA, 2006.
24. McLachlan, G.; Peel, D. Finite Mixture Models; Wiley: Hoboken, NJ, USA, 2000.
25. Titterington, D.M.; Smith, A.F.; Makov, U.E. Statistical Analysis of Finite Mixture Distributions; Wiley: Hoboken, NJ, USA, 1985.
26. Rao, C.R. Minimum variance and the estimation of several parameters. Math. Proc. Camb. Philos. Soc. 1947, 43, 280–283.
27. Ross, S. Simulation, 5th ed.; Academic Press: Cambridge, MA, USA, 2013; pp. 5–38.
28. Aarset, M.V. How to identify a bathtub hazard rate. IEEE Trans. Reliab. 1987, 36, 106–108.
29. Shanker, R.; Shukla, K.K. Zero-Truncated Discrete Two-Parameter Poisson-Lindley Distribution with Applications. J. Inst. Sci. Technol. 2018, 22, 76–85.
30. Shanker, R.; Fesshaye, H.; Selvaraj, S.; Yemane, A. On zero-truncation of Poisson and Poisson-Lindley distributions and their applications. Biom. Biostat. Int. J. 2015, 2, 168–181.
31. Scollnik, D.M. On the Intervened Generalized Poisson Distribution. Commun. Stat. Theory Methods 2006, 35, 953–963.

Figure 1. Various shapes of the probability mass function (pmf) of the LZTBD for different values of the parameters.

Figure 2. Various shapes of the hazard rate function (hrf) of the LZTBD for different parameter values.


Table 1. Values of some moment measures of the LZTBD for various values of the parameters $α$, $β$, and $γ$.

$β$	$γ$	$α$	Mean	Variance	CV	IOD	Skewness	Kurtosis
0.5	1	0.3	1.2802	0.1501	0.3026	0.1172	3.8847	14.6003
		0.5	1.5555	0.4444	0.4285	0.2857	3.3511	10.1883
		0.7	1.9526	1.2205	0.5657	0.6250	2.8561	6.9567
		0.9	2.5619	3.4546	0.7254	1.3484	2.3669	4.3115
		1.1	3.5802	10.7636	0.9163	3.0063	1.8712	2.1372
0.4	3	0.3	1.8323	0.7348	0.4678	0.4010	2.9568	7.4378
		0.5	2.1007	1.1358	0.5073	0.5406	2.6612	5.7455
		0.7	2.4499	1.8497	0.5551	0.7550	2.3650	4.2161
		0.9	2.9189	3.2112	0.6139	1.1001	2.0624	2.8257
		1.1	3.5750	6.0267	0.6866	1.6857	1.7492	1.5815
0.3	5	0.3	2.0574	1.0607	0.5005	0.5155	2.7004	5.9584
		0.5	2.2665	1.4134	0.5242	0.6236	2.5057	4.9160
		0.7	2.5178	1.9288	0.5515	0.7660	2.3089	3.9376
		0.9	2.8245	2.7065	0.5824	0.9582	2.1084	3.0172
		1.1	3.2056	3.9230	0.6178	1.2237	1.9027	2.1568
0.2	7	0.3	1.9389	1.0021	0.5163	0.5168	2.8934	7.2006
		0.5	2.0671	1.2174	0.5337	0.5889	2.7471	6.3127
		0.7	2.2113	1.4917	0.5523	0.6746	2.6045	5.5028
		0.9	2.3745	1.8451	0.5720	0.7770	2.4639	4.7524
		1.1	2.5601	2.3060	0.5931	0.9006	2.3242	4.0507
0.1	9	0.3	1.5433	0.5853	0.4957	0.3792	3.7718	13.9025
		0.5	1.5963	0.6626	0.5099	0.4150	3.6638	13.0428
		0.7	1.6526	0.7503	0.5241	0.4540	3.5590	12.2257
		0.9	1.7123	0.8501	0.5384	0.4964	3.4572	11.4502
		1.1	1.7757	0.9639	0.5528	0.5428	3.3580	10.7145

Table 2. The maximum likelihood estimates (MLEs) simulation results for the parameters $α$, $β$, and $γ$.

Parameter Set	Sample Size	Parameters	Estimates	Absolute Bias	MSE
$α = 0.5, β = 0.16, γ = 2.51$	$n =$ 15	$α$	4.3060	3.8060	20.4138
		$β$	0.0838	0.0761	0.0138
		$γ$	1.0452	1.4647	2.7145
	$n =$ 50	$α$	1.2245	0.7245	0.9985
		$β$	0.1302	0.0297	0.0046
		$γ$	1.3618	1.1481	2.0105
	$n =$ 175	$α$	0.6913	0.1913	0.2249
		$β$	0.1582	0.0017	0.0019
		$γ$	1.9122	0.5977	1.6952
	$n =$ 500	$α$	0.4476	0.0523	0.0665
		$β$	0.1586	0.0013	0.0012
		$γ$	2.8752	0.3652	0.4870
	$n =$ 1000	$α$	0.4918	0.0081	0.0243
		$β$	0.1609	0.0009	0.0007
		$γ$	2.5054	0.0049	0.3329
$α = 1.1, β = 0.05, γ = 0.58$	$n =$ 15	$α$	5.4405	4.3405	22.0280
		$β$	0.0429	0.0080	0.0016
		$γ$	0.7075	1.1275	2.5269
	$n =$ 50	$α$	1.2827	0.0627	0.0805
		$β$	0.0614	0.0079	0.0004
		$γ$	0.9028	0.3228	2.1591
	$n =$ 175	$α$	1.3129	0.0529	0.0802
		$β$	0.0421	0.0078	0.0003
		$γ$	0.2129	0.2639	1.8871
	$n =$ 500	$α$	1.0494	0.0505	0.0754
		$β$	0.0566	0.0066	0.0003
		$γ$	0.6411	0.0611	0.2654
	$n =$ 1000	$α$	1.0234	0.0302	0.0752
		$β$	0.0660	0.0030	0.0003
		$γ$	0.6112	0.0312	0.1333

Table 3. Descriptive statistics for the COVID-19 data set of Italy.

Statistic	n	min	$Q_{1}$	$M_{d}$	$Q_{3}$	max	$IQR$
Values	61	3	13	21	28	63	15

Table 4. Maximum likelihood estimates (MLEs), model adequacy measures, and $χ^{2}$ values for the COVID-19 data set of Italy.

Model	ZTBD	ZTDTPPLD	ZTPLD	ZTGBD	IGPD	LZTBD
MLE	$γ = 63$	$γ = 0.0858$	$α = 0.0859$	$α = 1.0276$	$α = 6.8936$	$α = 1.0150$
	$β = 0.3587$	$α = 0.9999$		$β = 0.8142$	$β = 0.6298$	$β = 0.8302$
				$γ = 4.5332$	$γ = 0.2134$	$γ = 3.1778$
$- log L$	485.9380	239.8677	239.8677	234.5301	234.6429	234.5071
$χ^{2}$	12394.92	262.6835	262.6844	203.2570	205.0283	203.2323
$d f$	7	6	7	5	5	4
AIC	973.8760	483.7375	481.7355	475.0602	475.2857	475.0141
BIC	975.9869	487.9572	483.8463	481.3928	481.6184	481.3468

Table 5. Descriptive statistics for the COVID-19 data set of Senegal.

Statistic	n	min	$Q_{1}$	$M_{d}$	$Q_{3}$	max	IQR
Values	56	15	30.75	45.50	62.50	107	31.75

Table 6. Maximum likelihood estimates (MLEs), model adequacy measures, and $χ^{2}$ value for the Senegal data set.

Model	ZTBD	ZTDTPPLD	ZTPLD	ZTGBD	IGPD	LZTBD
MLE	$γ = 107$	$γ = 0.9999$	$α = 0.0420$	$α = 1.0000$	$α = 0.6689$	$α = 1.000$
	$β = 0.4349$	$α = 0.0422$		$β = 0.8838$	$β = 1.7954$	$β = 0.8838$
				$γ = 6.1135$	$γ = 5.5010$	$γ = 5.1175$
$- log L$	584.8307	256.4383	256.6182	244.0218	244.2461	244.0212
$χ^{2}$	702.1028 × 10⁴	547.6933	549.2564	396.7983	404.4258	396.7792
$d f$	7	6	7	5	5	5
AIC	1171.661	516.8765	515.2363	494.0436	494.4922	494.0424
BIC	1173.687	520.9272	517.2617	500.1196	500.5682	500.1184

Table 7. Descriptive statistics for the COVID-19 data set of Pakistan.

Statistic	n	min	$Q_{1}$	$M_{d}$	$Q_{3}$	max	IQR
Values	95	11	33	47	73	102	40

Table 8. Maximum likelihood estimates (MLEs), model adequacy measures, and $χ^{2}$ value for the Pakistan data set.

Model	ZTBD	ZTDTPPLD	ZTPLD	ZTGBD	IGPD	LZTBD
MLE	$γ = 102$	$γ = 0.9999$	$α = 0.0372$	$α = 1.0000$	$α = 0.7132$	$α = 1.0000$
	$β = 0.5165$	$α = 0.0373$		$β = 0.9107$	$β = 0.8605$	$β = 0.9109$
				$γ = 5.1609$	$γ = 8.1173$	$γ = 4.1502$
$- log L$	1313.275	441.8371	448.0789	431.447	432.5312	431.4451
$χ^{2}$	3454.6107 × 10⁴	849.5348	851.525	647.5694	661.2751	647.3886
$d f$	8	7	8	6	6	6
AIC	2628.551	899.6741	898.1577	868.8939	871.0625	868.8902
BIC	2631.104	904.7819	900.7116	876.5556	878.7241	876.5518

Table 9. Descriptive statistics for the COVID-19 data set of Saudi Arabia.

Statistic	n	min	$Q_{1}$	$M_{d}$	$Q_{3}$	max	IQR
Values	83	17	32	37	41	58	9

Table 10. Maximum likelihood estimates (MLEs), model adequacy measures, and $χ^{2}$ value for the Saudi Arabia data set.

Model	ZTBD	ZTDTPPLD	ZTPLD	ZTGBD	IGPD	LZTBD
MLE	$γ = 58$	$γ = 0.9999$	$α = 0.0530$	$α = 1.00008$	$α = 10.1672$	$α = 1.000026$
	$β = 0.6365$	$α = 0.0530$		$β = 0.4774$	$β = 0.2780$	$β = 0.4797$
				$γ = 40.4100$	$γ = 1.6222$	$γ = 39.0503$
$- log L$	415.1392	356.3418	356.3418	294.289	294.3522	294.288
$χ^{2}$	2928.725	486.2035	486.2029	150.3025	150.2124	143.2237
$d f$	7	6	7	5	5	4
AIC	832.2784	716.6837	714.6836	594.5770	594.7044	594.5763
BIC	834.6973	721.5213	717.1025	601.834	601.9609	601.8328

Table 11. Descriptive statistics for the COVID-19 data set of Belgium.

Statistic	n	min	$Q_{1}$	$M_{d}$	$Q_{3}$	max	IQR
Values	425	1	7	13	24	66	17

Table 12. Maximum likelihood estimates (MLEs), model adequacy measures, and $χ^{2}$ value for the Belgium data set.

Model	ZTBD	ZTDTPPLD	ZTPLD	ZTGBD	IGPD	LZTBD
MLE	$γ = 66$	$γ = 0.9568$	$α = 0.1109$	$α = 1.2600$	$α = 0.7212$	$α = 1.1254$
	$β = 0.2594$	$α = 0.1128$		$β = 0.6599$	$β = 0.5740$	$β = 0.7483$
				$γ = 4.3279$	$γ = 2.9265$	$γ = 1.6206$
$- log L$	3825.833	1604.421	1612.689	1602.449	1602.091	1601.074
$χ^{2}$	3338.99	1049.849	1049.988	1068.491	1079.062	1049.466
$d f$	12	11	12	10	10	10
AIC	7653.665	3212.841	3227.379	3210.898	3210.181	3208.149
BIC	7657.717	3220.946	3231.431	3223.054	3222.338	3220.466

Table 13. Descriptive statistics for the COVID-19 data set of Ethiopia.

Statistic	n	min	$Q_{1}$	$M_{d}$	$Q_{3}$	max	IQR
Values	301	1	6	10	15	47	9

Table 14. Maximum likelihood estimates (MLEs), model adequacy measures, and $χ^{2}$ value for the Ethiopia data set.

Model	ZTBD	ZTDTPPLD	ZTPLD	ZTGBD	IGPD	LZTBD
MLE	$γ = 15$	$γ = 0.9999$	$α = 0.1554$	$α = 2.0201$	$α = 0.5923$	$α = 1.1256$
	$β = 0.2547$	$α = 0.1610$		$β = 0.3261$	$β = 0.4168$	$β = 0.6681$
				$γ = 12.4252$	$γ = 3.3590$	$γ = 2.8021$
$- log L$	1680.734	1011.388	1022.044	1004.179	1003.939	1003.902
$χ^{2}$	16363.17	326.5928	346.5871	298.3448	298.8601	294.8328
$d f$	10	9	10	8	8	8
AIC	3363.467	2026.776	2046.087	2014.358	2013.878	2013.803
BIC	3367.175	2034.190	2049.794	2025.480	2025.00	2024.925

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
