1. Introduction
Discrete distributions whose support is the set of positive integers are known as zero-truncated discrete distributions (ZTDDs). ZTDDs are used in ecology to represent count data, such as the number of flower heads, fly eggs, or European red mites, or the number of times snowshoe hares were collected over seven days. These distributions are also employed in sociology to model data such as the size of human groups in parks, on beaches, and in other public locations. More broadly, ZTDDs have applications in many disciplines, including biology, medicine, psychology, demography, and political science. In particular, the zero-truncated Poisson distribution (ZTPD) was used in [1] to analyze the number of eggs and gall-cell counts in flower heads. The authors of [2] used the ZTPD to model deer hunting in California. The author of [3] employed the zero-truncated negative binomial distribution (ZTNBD) to model the number of children ever born to a sample of mothers over 40 years old; additionally, the authors of [4] used the ZTNBD in a regression model to treat over-dispersed count data of ischemic stroke hospitalizations. The author of [5] analyzed stroke count data based on the ZTPD, ZTNBD, and zero-truncated generalized negative binomial distribution (ZTGNBD). The application of the ZTNBD in the investigation of rare species abundance and hospital stays was discussed in [6]. The authors of [7] considered the use of the zero-truncated binomial distribution (ZTBD) as a randomization device.
On the health side, many different diseases, ranging from the common cold to much more dangerous illnesses such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS), are caused by the large family of viruses known as coronaviruses. The first cases of the novel coronavirus disease (COVID-19) were found in Wuhan, China, in 2019, and the World Health Organization (WHO) has declared it a pandemic. A coordinated international effort has been launched to halt the virus from spreading further, and the scientific community has contributed by starting various investigations. Statisticians play a critical role in modeling such phenomena, and several attempts have already been made in the statistical literature. To estimate the daily new COVID-19 cases in China, the author of [8] used a mathematical model called the SIR model. The authors of [9] developed a discrete version of the generalized Lindley distribution to model the daily new cases and deaths in the COVID-19 count data. A discrete type-2 half logistic exponential distribution was presented in [10] for estimating the number of COVID-19 deaths in Pakistan and Saudi Arabia. To model COVID-19 data in Singapore, the authors of [11] employed a discrete Marshall–Olkin inverted Topp–Leone distribution. Following the outbreak of such a widespread epidemic, at least one new positive case is reported daily in practically all nations. Since such daily counts are strictly positive, ZTDDs are a natural statistical model for this situation. To the best of our knowledge, no attempt has yet been made to model daily positive cases using ZTDDs. Hence, in this article, our aim is to propose a novel ZTDD to model the daily new positive cases. Furthermore, based on the same ZTDD, we also model the daily number of deaths attributable to COVID-19.
On the other hand, Lagrangian distributions arise from Lagrange expansions, which were initially introduced in [12]. The authors of [13,14] introduced a discrete Lagrangian family (DLF) of probability distributions, a vast and important class that includes many families of distributions. Additionally, the authors of [14] showed that, under certain conditions, all discrete Lagrangian distributions converge to the normal and inverse Gaussian distributions. The author of [15], who introduced the Lagrangian negative binomial distribution, demonstrated its utility in a queuing process. The authors of [16] created the Lagrangian Katz family. The authors of [17] looked at how Lagrangian probability distributions can be employed to solve inferential problems in random mapping theory. The generalized Poisson gamma dependency model was developed in [18] using Lagrangian probability models. For collisional turbulent fluid-particle flows, the authors of [19] used Lagrangian probability density function (pdf) models. The importance of Lagrangian distributions outlined above motivated us to propose a new ZTDD based on the Lagrangian approach. Therefore, based on the Lagrangian technique, we propose a new ZTDD, called the Lagrangian zero-truncated binomial distribution (LZTBD), that can serve as a discrete model for a variety of count datasets.
The remaining parts of the paper are organized as follows: Section 2 presents some preliminaries on the Lagrangian probability distribution. In Section 3, we discuss the definition and properties of the LZTBD. The finite mixture of the new Lagrangian model is presented in Section 4. In Section 5, we use the maximum likelihood (ML) method to estimate the unknown parameters of the LZTBD. The significance of the additional parameter is tested by using a generalized likelihood ratio test in Section 6. The finite sample performance of the ML estimation method is analyzed in Section 7 with a simulation study. Six real-world datasets are considered in Section 8 to demonstrate the usefulness of the proposed model. The concluding remarks are given in Section 9.
2. Some Preliminaries
In order to introduce the DLF, we consider the following Lagrange expansion presented in [
20,
21]:
where
,
,
, with
and
Consider the Lagrange expansion given in (
1), where
and
are successively differentiable analytic functions over [−1, 1] such that
,
, and
. A new type of probability mass function (pmf) was defined in [
13,
22], and it is indicated as follows:
provided that
is finite.
Putting
into (
1), we obtain
which gives, from (
2),
This pmf defined the DLF in the broad sense.
The corresponding probability generating function (pgf) is given by
where
.
Given the applications of the DLF built with
and
in (
3), it is worthwhile to investigate additional horizon distributions using the new function
. This is the basis for the new distribution proposed in this study, which is introduced below.
3. Construction of Lagrangian Zero-Truncated Binomial Distribution
The LZTBD is introduced in this section as a new member of the DLF.
Proposition 1. Assume that the random variable (rv) X follows the LZTBD, in which , and . Then, the pmf of X is given by
where stands for the generalized binomial coefficient, that is .
Proof. Let
and
which satisfy the statements given in
Section 2. Using the DLF given in (
3), the pmf of the LZTBD can be derived as follows:
Thus, the proof is completed. □
The distribution described in (
5) is denoted as LZTBD(
), and one can note
to inform that an rv denoted by
X follows the LZTBD with parameters
,
, and
. Some special cases from the LZTBD are described below:
For , the LZTBD() reduces to the one-parameter ZTBD. In this sense, the LZTBD is a generalization of the ZTBD;
For
, LZTBD(
) reduces to the Lagrangian weighted Consul distribution given in [
23].
Now,
Figure 1 portrays the graphical representation of the LZTBD for different parameter values of
,
, and
.
The hazard rate function (hrf) of the LZTBD is obtained by substituting the pmf in the following equation:
From (6), a closed-form expression of the hrf is difficult to obtain, so we study its shape graphically. Figure 2 shows that the hrf of the LZTBD can take all of the typical shapes, namely increasing, decreasing, and bathtub shapes, for varying parameter values.
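The graphical study of the hrf can be reproduced numerically for any pmf on the positive integers. The following is a minimal sketch, assuming the common discrete definition h(x) = P(X = x)/P(X ≥ x); because the LZTBD pmf is not restated here, the classical zero-truncated binomial with hypothetical parameters `m` and `p` is used purely as a stand-in, and the function names are illustrative.

```python
from math import comb


def ztb_pmf(x, m, p):
    """Zero-truncated binomial pmf on x = 1, ..., m (stand-in, not the LZTBD)."""
    if x < 1 or x > m:
        return 0.0
    return comb(m, x) * p**x * (1 - p) ** (m - x) / (1 - (1 - p) ** m)


def hazard(pmf_values):
    """Return h(x) = P(X = x) / P(X >= x) for pmf values listed at x = 1, 2, ..."""
    # Build the tail probabilities P(X >= x) by summing from the right.
    tails = []
    running = 0.0
    for px in reversed(pmf_values):
        running += px
        tails.append(running)
    tails.reverse()
    return [px / tail for px, tail in zip(pmf_values, tails)]


m, p = 6, 0.3  # assumed illustrative parameter values
pmf_values = [ztb_pmf(x, m, p) for x in range(1, m + 1)]
hrf = hazard(pmf_values)
for x, h in enumerate(hrf, start=1):
    print(x, round(h, 4))
```

Plotting `hrf` against x for several parameter choices gives the kind of shape comparison described for Figure 2; swapping `ztb_pmf` for the LZTBD pmf is the only change needed.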
Furthermore, the choice of various specific functions for will provide various members of DLF. In the following, we list some DLF distributions available in the literature.
3.1. Poisson-Binomial Distribution
If we take
and
, based on (
3), the pmf of the considered distribution is obtained as
where
is the hypergeometric function. Since their pmfs coincide, the corresponding distribution is identified as the Poisson-binomial distribution (see [
23]).
3.2. Weighted Consul Distribution
If we take
and
, based on (
3), the pmf of the considered distribution can be derived as
which is the pmf of the weighted Consul distribution (see [
23]).
3.3. Weighted Delta Binomial Distribution
If we take
and
, based on (
3), the pmf of the considered distribution is obtained as
which corresponds to the pmf of the weighted delta binomial distribution (see [
23]).
3.4. Linear Function Binomial Distribution
If we take
and
, based on (
3), the pmf of the considered distribution can be derived as
which is the pmf of the linear function binomial distribution (see [
23]).
Proposition 2. Let X be an rv following the LZTBD. Then, the median of X is defined as the smallest integer m greater than or equal to 1 such that
Proof. By definition, m is the smallest integer in the support of the rv, i.e., , such that , which is equivalent to the desired result. □
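In practice, the median in Proposition 2 is found by accumulating the pmf until the cumulative probability first reaches one half. The sketch below assumes the usual convention that the median is the smallest m with P(X ≤ m) ≥ 1/2, and again uses the classical zero-truncated binomial with hypothetical parameters as a stand-in for the LZTBD pmf.

```python
from math import comb


def ztb_pmf(x, m, p):
    """Zero-truncated binomial pmf on x = 1, ..., m (stand-in, not the LZTBD)."""
    if x < 1 or x > m:
        return 0.0
    return comb(m, x) * p**x * (1 - p) ** (m - x) / (1 - (1 - p) ** m)


def median_from_pmf(pmf, support):
    """Smallest value in `support` whose cumulative probability is at least 1/2."""
    cdf = 0.0
    for x in support:
        cdf += pmf(x)
        if cdf >= 0.5:
            return x
    raise ValueError("cumulative probability never reached 1/2 on the given support")


m, p = 10, 0.4  # assumed illustrative parameter values
med = median_from_pmf(lambda x: ztb_pmf(x, m, p), range(1, m + 1))
print(med)
```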
Proposition 3. Let X be an rv following the LZTBD. Then, the mode of X, denoted by xm, exists in and lies in the case:
where .
Proof. We must find the integer
for which
has the greatest value. That is, we aim to solve
and
. First, note that
can also be written as
where .
Obviously,
implies that
Additionally,
implies that
By combining (
9) and (
10), we obtain (
8). □
Proposition 4. The pgf of an rv X following the LZTBD is expressed as
where .
Proof. Using (4), the pgf of the LZTBD is of the following form:
Thus, the proof is complete. □
Corollary 1. The moment generating function (mgf) of an rv X following the LZTBD is obtained by putting and in (11). That is,
where .
Corollary 2. The cumulant generating function (cgf) of an rv X following the LZTBD becomes
where .
Proposition 5. Let be n independent and identically distributed (iid) rvs following the LZTBD(). Then, the distribution of the random sum variable has the following pgf:
where .
Proof. Based on the pgf of the LZTBD given in (11), the pgf of the rv V becomes
This completes the proof. □
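The step behind this result is the standard multiplicativity of pgfs under independence. As a sketch, writing G_X(u) for the pgf in (11) and V = X_1 + ... + X_n for the random sum (this notation is assumed here for illustration):

```latex
G_V(u) = E\left[u^{X_1 + \cdots + X_n}\right]
       = \prod_{i=1}^{n} E\left[u^{X_i}\right]
       = \left[G_X(u)\right]^{n}.
```

The second equality uses independence and the third uses the identical distribution of the X_i.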
Proposition 6. For any integer , the factorial moment of an rv X following the LZTBD() is given by
where .
Proof. By definition, the
factorial moment of the LZTBD(
) is obtained by successively differentiating
given in (
11)
r times with respect to (wrt)
u and by putting
. First, note that
Taking the first derivative wrt
u on both sides, we obtain
Taking the second derivative of the above equation wrt
u on both sides, we obtain
Proceeding in a similar manner, the
derivative is of the following form:
Substitute
,
and
, in (
15), we obtain (
14). □
Proposition 7. The mean (μ) and variance (σ²) for the LZTBD are of the following forms, respectively,
and
Proof. On the other hand, we have
The desired expressions are obtained. □
A normalized measure of dispersion can be obtained from the variance-to-mean ratio. This measure is the well-known index of dispersion (IOD). The next result expresses it for the LZTBD, along with the coefficient of variation.
Proposition 8. The IOD and coefficient of variation (CV) for the LZTBD are given as, respectively,
and
Proof. Analogously, the CV is given by
□
A probabilistic model's asymmetry and flatness are commonly assessed by its skewness and kurtosis coefficients, respectively. The skewness is the third central moment divided by the variance raised to the power of 3/2, whereas the kurtosis is the fourth central moment divided by the square of the variance. The mean, variance, CV, IOD, skewness, and kurtosis for selected parameter values of the LZTBD are summarized in Table 1. From this table, it is evident that the LZTBD can be both over-dispersed (IOD > 1) and under-dispersed (IOD < 1) for varying parameter values. It is also noted that the LZTBD is mainly right-skewed and can attain several kurtosis levels.
4. Identifiability
Finite mixture models have received a lot of attention in recent years in applied contexts. In astronomy, biology, genetics, medicine, psychiatry, marketing, and other fields, mixture models are widely utilized (see [24]). We derive finite mixtures of the LZTBD in this section. This mixture model may be useful in future work.
Let Y be a discrete rv with the pmf , where , such that , and . Then, we state that Y has a mixture distribution and is a finite mixture of distributions. The constants are known as mixing weights and , the components of the mixture. We denote as the collection of all distinct parameters in the components.
Let be the class of pmf’s from which mixtures are to be formed. Then, the class of finite mixtures of with the appropriate class of pmf’s is . So that is the convex hull of .
Definition 1. An integer-valued rv Y is said to have a g component mixture of the LZTBDs if it has the pmf of the following form:
where , for each , ,with , and for each . A distribution with pmf given in (20) is called the Lagrangian zero-truncated binomial mixture distribution with g components (LZTBMDg).
The following theorem from [25] is adopted to construct the identifiability conditions of the finite mixture model.
Theorem 1. A necessary and sufficient condition for to be identifiable is that Δ should be linearly independent over the field of real numbers.
Proof. The proof is given in [25] and is therefore omitted here. □
Next, applying Theorem 1, we outline the LZTBMDg’s identifiability requirements.
Theorem 2. The identifiability conditions for the LZTBMDg with the pmf as given in (20) are , , and for , such that . Proof. For the first step, take
and consider the following equation:
where
and
are any two arbitrary real numbers,
and
for
in which
is obtained from
by replacing
by
,
by
and
by
.
Assume that for each
and
,
and
. Thus, for
, we have
and
Now, from (
22)–(
24), we obtain the following equations:
and
Solving (
25) and (
26), we obtain
Hence, by (
27), we have
and thus,
. Therefore, it may be inferred from Theorem 2 that
and
are linearly independent. Since the same argument can be applied to any positive integer g, the proof follows. □
Proposition 9. The pgf of the LZTBMDg given in (20) is indicated as
where .
Proof. The proof follows directly from Definition 1, given the pgf of the LZTBD mentioned in (11). □
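Both the mixture pmf and its pgf are weighted sums of the component quantities, which is easy to verify numerically. The sketch below is an illustration under stated assumptions: the components are classical zero-truncated binomials (a stand-in for LZTBD components), and the weights and component parameters are hypothetical values.

```python
from math import comb


def ztb_pmf(x, m, p):
    """Zero-truncated binomial pmf on x = 1, ..., m (stand-in component)."""
    if x < 1 or x > m:
        return 0.0
    return comb(m, x) * p**x * (1 - p) ** (m - x) / (1 - (1 - p) ** m)


def ztb_pgf(u, m, p):
    """pgf of the zero-truncated binomial: ((q + p u)^m - q^m) / (1 - q^m)."""
    q = 1 - p
    return ((q + p * u) ** m - q**m) / (1 - q**m)


def mixture_pmf(x, weights, components):
    """Finite mixture pmf: sum of weight_j * pmf_j(x)."""
    return sum(w * ztb_pmf(x, m, p) for w, (m, p) in zip(weights, components))


def mixture_pgf(u, weights, components):
    """Finite mixture pgf: sum of weight_j * pgf_j(u)."""
    return sum(w * ztb_pgf(u, m, p) for w, (m, p) in zip(weights, components))


weights = [0.6, 0.4]               # assumed mixing weights, summing to 1
components = [(5, 0.2), (8, 0.5)]  # assumed (m, p) for each component
max_x = max(m for m, _ in components)

total = sum(mixture_pmf(x, weights, components) for x in range(1, max_x + 1))
print(round(total, 12))
print(mixture_pgf(0.7, weights, components))
```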
5. Estimation of Parameters
In this section, we estimate the unknown parameters of the LZTBD by the ML estimation method.
It is worth mentioning that the model corresponding to the LZTBD
is a tri-parametric model with parameters
. Let us have a random sample of size
n from LZTBD and let the observed frequency be
,
, so that
, where
k is the largest of the observed value having non-zero frequencies. Then, the likelihood function is given by
Therefore, the log-likelihood function is given by
where
. The ML estimates (MLEs) are defined by maximizing
wrt the parameters. Let us denote by
,
, and
the MLEs of
,
, and
, respectively. On the computational side, the score vector is
where the partial derivatives of
wrt the parameters are
and
The MLEs can then be found by setting the score vector to zero and solving the resulting equations simultaneously. These equations cannot be solved analytically, but they can be solved numerically, for example in the R statistical software, by means of iterative techniques such as the Newton–Raphson algorithm.
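To illustrate the numerical step, the following sketch applies Newton–Raphson to the score equation of a simpler one-parameter model, the zero-truncated Poisson, using grouped frequency data of the same form as above. This is a stand-in chosen because its score is available in closed form; it is not the LZTBD likelihood, and the frequency table and function name are hypothetical.

```python
from math import exp


def ztp_mle(freq, tol=1e-10, max_iter=100):
    """Newton-Raphson MLE of lambda for a zero-truncated Poisson.

    `freq` maps each observed value x >= 1 to its observed frequency.
    """
    n = sum(freq.values())
    xbar = sum(x * f for x, f in freq.items()) / n
    if xbar <= 1.0:
        raise ValueError("the MLE does not exist in (0, inf) when the sample mean is <= 1")
    lam = xbar  # starting value
    for _ in range(max_iter):
        e = exp(-lam)
        score = xbar / lam - 1.0 / (1.0 - e)         # per-observation score
        dscore = -xbar / lam**2 + e / (1.0 - e) ** 2  # derivative of the score
        step = score / dscore
        new_lam = lam - step
        while new_lam <= 0.0:  # keep the iterate inside the parameter space
            step /= 2.0
            new_lam = lam - step
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    raise RuntimeError("Newton-Raphson did not converge")


freq = {1: 10, 2: 6, 3: 3, 4: 1}  # assumed illustrative frequency table
lam_hat = ztp_mle(freq)
print(lam_hat)
```

For a three-parameter model such as the LZTBD, the same idea applies with the score vector and the Hessian matrix in place of the scalar score and its derivative.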
7. Simulation
We perform a simulation study by generating observations with the R software to examine the asymptotic behavior of the MLEs of the parameters of the LZTBD. Here, we apply the inverse transformation method to simulate an LZTBD random sample (see [27]). The algorithm is as follows:
- Step 1:
Generate a random number from the uniform distribution.
- Step 2:
.
- Step 3:
If , set and stop.
- Step 4:
, .
- Step 5:
Go to Step 3.
Conceptually, P is the probability that X = i, and F is the probability that X is less than or equal to i.
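The steps above can be sketched as follows for a generic pmf on the positive integers. This is a minimal illustration, assuming the search starts at i = 1 because the support is zero-truncated; the classical zero-truncated binomial with hypothetical parameters stands in for the LZTBD pmf, and the function names are illustrative.

```python
import random
from math import comb


def ztb_pmf(x, m, p):
    """Zero-truncated binomial pmf on x = 1, ..., m (stand-in, not the LZTBD)."""
    if x < 1 or x > m:
        return 0.0
    return comb(m, x) * p**x * (1 - p) ** (m - x) / (1 - (1 - p) ** m)


def sample_inverse_transform(pmf, rng, lower=1, upper=None):
    """Draw one value by sequential search on the cdf."""
    u = rng.random()  # Step 1: uniform random number on [0, 1)
    i = lower         # Step 2: start at the smallest support point
    cdf = pmf(i)      #         F = P(X <= i)
    # Steps 3 to 5: move right until the cdf exceeds u. The upper bound
    # guards against rounding error when the support is finite.
    while u >= cdf and (upper is None or i < upper):
        i += 1
        cdf += pmf(i)
    return i


m, p = 8, 0.3  # assumed illustrative parameter values
rng = random.Random(2024)
draws = [sample_inverse_transform(lambda x: ztb_pmf(x, m, p), rng, upper=m)
         for _ in range(5000)]
print(sum(draws) / len(draws))
```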
Additionally, indices such as MLEs, absolute biases, and mean squared errors (MSEs) are calculated using the following equations:
Average value of MLEs: MLE() = .
Absolute average bias: Bias() = .
MSE: MSE() = .
Here,
or
or
, and the index
i represents the
generated sample. The simulation takes into account sample sizes of
15, 50, 175, 500, and 1000 for two different sets of parameter values of the LZTBD. We repeat the process
times and report the estimates and MSEs in
Table 2. From this table, one can infer that the estimates are quite stable and close to the true parameter values for these sample sizes. The absolute average bias and the MSEs decrease as the sample size increases. Hence, the ML estimation method performs consistently and reliably.
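The three indices used in the simulation study can be computed from the replicated estimates as sketched below. The definitions are the usual Monte Carlo ones and are assumed here: the average of the estimates, the absolute value of the average deviation from the true value, and the average squared deviation. The function name and the numbers are hypothetical.

```python
def monte_carlo_summary(estimates, true_value):
    """Average estimate, absolute average bias and MSE over replications."""
    n_rep = len(estimates)
    average = sum(estimates) / n_rep
    abs_bias = abs(sum(est - true_value for est in estimates) / n_rep)
    mse = sum((est - true_value) ** 2 for est in estimates) / n_rep
    return average, abs_bias, mse


# Assumed toy input: four replicated estimates of a parameter whose true value is 2.0
estimates = [1.9, 2.1, 2.3, 1.9]
average, abs_bias, mse = monte_carlo_summary(estimates, 2.0)
print(average, abs_bias, mse)
```

In a full study, `estimates` would hold one MLE per generated sample, and the summary would be repeated for each parameter and each sample size.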