Abstract
The importance of Lagrangian distributions and their applicability to real-world phenomena has been highlighted in several studies. In light of this, we introduce a new zero-truncated Lagrangian distribution. It is presented as a generalization of the zero-truncated binomial distribution (ZTBD) and is therefore named the Lagrangian zero-truncated binomial distribution (LZTBD). The moments, probability generating function, factorial moments, and skewness and kurtosis measures of the LZTBD are discussed. We also show that finite mixtures of the new model are identifiable. The unknown parameters of the LZTBD are estimated using the maximum likelihood method. An extensive simulation study is carried out to evaluate the performance of the maximum likelihood estimates. The likelihood ratio test is used to assess the significance of the third parameter of the new model. Six COVID-19 datasets are used to demonstrate the applicability of the LZTBD, and we conclude that the LZTBD is highly competitive in terms of goodness of fit.
Keywords:
Lagrangian zero-truncated binomial distribution; index of dispersion; maximum likelihood method; generalized likelihood ratio test; COVID-19; simulation
MSC:
60E05; 62E10; 62F10
1. Introduction
Discrete distributions whose support is the set of positive integers are known as zero-truncated discrete distributions (ZTDDs). In ecology, ZTDDs are used to represent count data such as the number of flower heads, fly eggs, or European red mites, or the number of times snowshoe hares were collected over seven days. In sociology, they are employed to model data such as the size of human groups in parks, on beaches, and in other public places. More generally, ZTDDs have applications in practically every discipline, including biology, medicine, psychology, demography, and political science. In particular, the zero-truncated Poisson distribution (ZTPD) was used in [1] to analyze the number of eggs and gall-cell counts in flower heads. The authors of [2] used the ZTPD to model deer hunting in California. The author of [3] employed the zero-truncated negative binomial distribution (ZTNBD) to model the number of children ever born to a sample of mothers over 40 years old; additionally, the authors of [4] used the ZTNBD in a regression model for over-dispersed count data on ischemic stroke hospitalizations. The author of [5] analyzed stroke count data based on the ZTPD, ZTNBD, and zero-truncated generalized negative binomial distribution (ZTGNBD). The application of the ZTNBD to rare species abundance and hospital stays was discussed in [6]. The authors of [7] considered the use of the zero-truncated binomial distribution (ZTBD) as a randomization device.
Turning to health applications, coronaviruses are a large family of viruses that cause diseases ranging from the common cold to much more severe illnesses such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS). The first cases of the novel coronavirus disease (COVID-19) were identified in Wuhan, China, in 2019, and the World Health Organization (WHO) subsequently declared it a pandemic. A coordinated international effort has been launched to halt the spread of the virus, and the scientific community has contributed through numerous investigations. Statisticians play a critical role in modeling such phenomena, and several attempts have already been made in the statistical literature. To estimate the daily new COVID-19 cases in China, the author of [8] used a mathematical model called the SIR model. The authors of [9] developed a discrete version of the generalized Lindley distribution to model the daily new cases and deaths in COVID-19 count data. A discrete type-2 half logistic exponential distribution was presented in [10] for modeling the number of COVID-19 deaths in Pakistan and Saudi Arabia. To model COVID-19 data in Singapore, the authors of [11] employed a discrete Marshall–Olkin inverted Topp–Leone distribution. Since the onset of such a widespread epidemic, at least one new positive case has been reported daily in practically all countries, so the daily counts take values in the positive integers. ZTDDs are therefore natural candidate models for such data; however, to the best of our knowledge, they have not yet been used to model daily positive cases. Hence, in this article, we propose a novel ZTDD to model the daily new positive cases. Furthermore, based on the same ZTDD, we also model the daily number of deaths attributable to COVID-19.
On the other hand, Lagrangian distributions are generated from Lagrange expansions, which were initially introduced in [12]. The authors of [13,14] introduced a discrete Lagrangian family (DLF) of probability distributions, which encompasses a vast and important class of probability distributions, including many well-known families. Additionally, the authors of [14] showed that, under certain conditions, all discrete Lagrangian distributions converge to the normal and inverse Gaussian distributions. The author of [15], who introduced the Lagrangian negative binomial distribution, demonstrated its utility in a queuing process. The authors of [16] created the Lagrangian Katz family. The authors of [17] examined how Lagrangian probability distributions can be employed to solve inferential problems in random mapping theory. The generalized Poisson gamma dependency model was developed in [18] using Lagrangian probability models. For collisional turbulent fluid-particle flows, the authors of [19] used Lagrangian probability density function (pdf) models. These applications of Lagrangian distributions motivate us to propose a new ZTDD based on the Lagrangian approach. Accordingly, we introduce a new ZTDD, called the Lagrangian zero-truncated binomial distribution (LZTBD), which can serve as a discrete model for a variety of count datasets.
The remaining parts of the paper are organized as follows. Section 2 presents some preliminaries on Lagrangian probability distributions. In Section 3, we discuss the definition and properties of the LZTBD. Finite mixtures of the new Lagrangian model and their identifiability are studied in Section 4. In Section 5, we describe maximum likelihood (ML) estimation of the unknown parameters of the LZTBD. The significance of the additional parameter is tested with a generalized likelihood ratio test in Section 6. The finite-sample performance of the ML estimation method is assessed through a simulation study in Section 7. Six real-world datasets are considered in Section 8 to demonstrate the usefulness of the proposed model. Concluding remarks are given in Section 9.
2. Some Preliminaries
In order to introduce the DLF, we consider the following Lagrange expansion presented in [20,21]:
where , , , with and
Consider the Lagrange expansion given in (1), where and are successively differentiable analytic functions over [−1, 1] such that , , and . A new type of probability mass function (pmf) was defined in [13,22], and it is indicated as follows:
provided that is finite.
This pmf defines the DLF in the broad sense.
The corresponding probability generating function (pgf) is given by
where .
Given the applications of the DLF built with and in (3), it is worthwhile to investigate further distributions obtained with the new function . This motivates the new distribution introduced below.
3. Construction of Lagrangian Zero-Truncated Binomial Distribution
The LZTBD is introduced in this section as a new member of the DLF.
Proposition 1.
Assume that the random variable (rv) X follows the LZTBD, in which , and . Then, the pmf of X is given by
where stands for the generalized binomial coefficient, that is .
Proof.
Let and which satisfy the statements given in Section 2. Using the DLF given in (3), the pmf of the LZTBD can be derived as follows:
Thus, the proof is completed. □
The distribution described in (5) is denoted by LZTBD(), and we write to indicate that an rv X follows the LZTBD with parameters , , and . Some special cases of the LZTBD are described below:
- For , the LZTBD() reduces to the one-parameter ZTBD. In this sense, the LZTBD is a generalization of the ZTBD;
- For , LZTBD() reduces to the Lagrangian weighted Consul distribution given in [23].
Now, Figure 1 portrays the graphical representation of the LZTBD for different parameter values of , , and .

Figure 1.
Various shapes of the probability mass function (pmf) of the LZTBD for different values of the parameters.
The hazard rate function (hrf) of the LZTBD is obtained by substituting the pmf in the following equation:
From (6), a closed-form expression for the hrf is difficult to obtain; therefore, to study its shape, we plot it. Figure 2 shows that, depending on the parameter values, the hrf of the LZTBD can be increasing, decreasing, or bathtub-shaped.
Figure 2.
Various shapes of the hazard rate function (hrf) of the LZTBD for different parameter values.
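Because a closed form of (6) is hard to obtain, the hrf is in practice evaluated numerically. The following Python sketch assumes the usual discrete definition h(x) = P(X = x)/P(X ≥ x). Since the LZTBD pmf (5) is not reproduced in this text, the function `ztbd_pmf` (the standard zero-truncated binomial pmf with hypothetical parameters m and θ) is used only as a stand-in; any pmf with finite support can be passed instead.

```python
from math import comb


def ztbd_pmf(x, m, theta):
    """Zero-truncated binomial pmf on {1, ..., m} (hypothetical stand-in for the LZTBD pmf)."""
    if x < 1 or x > m:
        return 0.0
    return comb(m, x) * theta**x * (1 - theta) ** (m - x) / (1 - (1 - theta) ** m)


def discrete_hrf(pmf, support):
    """Discrete hazard h(x) = P(X = x) / P(X >= x) on an ordered, finite support."""
    probs = [pmf(x) for x in support]
    tail = sum(probs)  # P(X >= smallest support point); equals 1 for a complete support
    hazards = []
    for p in probs:
        hazards.append(p / tail if tail > 0 else float("nan"))
        tail -= p  # P(X >= x + 1) = P(X >= x) - P(X = x)
    return hazards
```

For example, `discrete_hrf(lambda x: ztbd_pmf(x, 10, 0.3), range(1, 11))` returns ten hazard values; on a finite support the last one equals 1, because all remaining mass sits at the final point.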
Furthermore, different specific choices for will yield different members of the DLF. In the following, we list some DLF distributions available in the literature.
3.1. Poisson-Binomial Distribution
If we take and , based on (3), the pmf of the considered distribution is obtained as
where is the hypergeometric function. Since their pmfs coincide, the corresponding distribution is identified as the Poisson-binomial distribution (see [23]).
3.2. Weighted Consul Distribution
If we take and , based on (3), the pmf of the considered distribution can be derived as
which is the pmf of the weighted Consul distribution (see [23]).
3.3. Weighted Delta Binomial Distribution
If we take and , based on (3), the pmf of the considered distribution is obtained as
which corresponds to the pmf of the weighted delta binomial distribution (see [23]).
3.4. Linear Function Binomial Distribution
If we take and , based on (3), the pmf of the considered distribution can be derived as
which is the pmf of the linear function binomial distribution (see [23]).
Proposition 2.
Let X be an rv following the LZTBD. Then, the median of X is the smallest integer m ≥ 1 such that
Proof.
By the definition, m is the smallest integer in the support of the rv, i.e., , such that , which is equivalent to the desired result. □
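Proposition 2 translates directly into a search over the cumulative distribution function. A minimal sketch, again using the zero-truncated binomial pmf as a hypothetical stand-in for (5); `max_x` is an added safeguard, not part of the proposition:

```python
from math import comb


def ztbd_pmf(x, m, theta):
    """Zero-truncated binomial pmf on {1, ..., m} (hypothetical stand-in for the LZTBD pmf)."""
    if x < 1 or x > m:
        return 0.0
    return comb(m, x) * theta**x * (1 - theta) ** (m - x) / (1 - (1 - theta) ** m)


def discrete_median(pmf, max_x=100_000):
    """Smallest integer m >= 1 with P(X <= m) >= 1/2, accumulating the cdf from x = 1."""
    cdf = 0.0
    for x in range(1, max_x + 1):
        cdf += pmf(x)
        if cdf >= 0.5:
            return x
    raise ValueError("cdf did not reach 1/2 before max_x; increase max_x")
```

With m = 10 and θ = 0.5, the truncated cdf is 385/1023 at x = 4 and 637/1023 at x = 5, so the median is 5.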
Proposition 3.
Let X be an rv following the LZTBD. Then, the mode of X, denoted by xm, exists in and lies in the case:
where .
Proof.
We must find the integer for which has the greatest value. That is, we aim to solve and . First, note that can also be written as
where .
Obviously, implies that
Additionally, implies that
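Numerically, the mode can be located by brute force and checked against the two inequalities used in the proof, p(x) ≥ p(x − 1) and p(x) ≥ p(x + 1). This sketch assumes a pmf on {1, 2, ...} evaluated up to a chosen cutoff `max_x` (a hypothetical parameter) and uses the zero-truncated binomial as a stand-in; it is a check, not a substitute for Proposition 3.

```python
from math import comb


def ztbd_pmf(x, m, theta):
    """Zero-truncated binomial pmf on {1, ..., m} (hypothetical stand-in for the LZTBD pmf)."""
    if x < 1 or x > m:
        return 0.0
    return comb(m, x) * theta**x * (1 - theta) ** (m - x) / (1 - (1 - theta) ** m)


def discrete_mode(pmf, max_x):
    """Brute-force mode on {1, ..., max_x}; ties go to the smallest x (max keeps the first)."""
    return max(range(1, max_x + 1), key=pmf)


def satisfies_mode_conditions(pmf, x):
    """Check p(x) >= p(x - 1) and p(x) >= p(x + 1), taking p(0) = 0 under zero truncation."""
    left = pmf(x - 1) if x > 1 else 0.0
    return pmf(x) >= left and pmf(x) >= pmf(x + 1)
```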
Proposition 4.
The pgf of an rv X following the LZTBD is expressed as
where .
Proof.
Using (4), the pgf of the LZTBD is of the following form:
Thus, the proof is complete. □
Corollary 1.
The moment generating function (mgf) of an rv X following the LZTBD is obtained by putting and in (11). That is,
where .
Corollary 2.
The cumulant generating function (cgf) of an rv X following the LZTBD becomes
where .
Proposition 5.
Let be n independently and identically distributed (iid) rvs following the LZTBD(). Then, the distribution of the random sum variable has the following pgf:
where .
Proof.
This completes the proof. □
Proposition 6.
For any integer , the factorial moment of an rv X following the LZTBD() is given by
where .
Proof.
By definition, the factorial moment of the LZTBD() is obtained by successively differentiating given in (11) r times with respect to (wrt) u and by putting . First, note that
Taking the first derivative wrt u on both sides, we obtain
Taking the second derivative of the above equation wrt u on both sides, we obtain
Proceeding in a similar manner, the derivative is of the following form:
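A useful numerical check on Proposition 6 is to compute the factorial moments directly as E[X(X − 1)⋯(X − r + 1)] from the pmf. The sketch below does this for any finite-support pmf and, for the zero-truncated binomial stand-in (hypothetical m and θ), compares the result with the known closed form m!/(m − r)! θ^r / (1 − (1 − θ)^m); with the LZTBD pmf (5) substituted, the same routine could be compared with the expression in Proposition 6.

```python
from math import comb, perm


def ztbd_pmf(x, m, theta):
    """Zero-truncated binomial pmf on {1, ..., m} (hypothetical stand-in for the LZTBD pmf)."""
    if x < 1 or x > m:
        return 0.0
    return comb(m, x) * theta**x * (1 - theta) ** (m - x) / (1 - (1 - theta) ** m)


def falling_factorial(x, r):
    """x (x - 1) ... (x - r + 1); equals 1 when r = 0."""
    out = 1
    for k in range(r):
        out *= x - k
    return out


def factorial_moment(pmf, r, support):
    """r-th factorial moment E[X (X - 1) ... (X - r + 1)] of a finite-support pmf."""
    return sum(falling_factorial(x, r) * pmf(x) for x in support)


def ztbd_factorial_moment(m, theta, r):
    """Closed form for the stand-in (r >= 1): m!/(m - r)! * theta^r / (1 - (1 - theta)^m)."""
    return perm(m, r) * theta**r / (1 - (1 - theta) ** m)
```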
Proposition 7.
The mean (μ) and variance () for the LZTBD are of the following forms, respectively,
and
A normalized measure of dispersion can be obtained from the variance-to-mean ratio; this is the well-known index of dispersion (IOD). The next result gives the IOD of the LZTBD, together with its coefficient of variation.
Proposition 8.
The IOD and coefficient of variation (CV) for the LZTBD are given as, respectively,
and
Proof.
We have
Analogously, the CV is given by
□
The asymmetry and flatness of a probability model are commonly assessed by its skewness and kurtosis coefficients, respectively. The skewness is the third central moment divided by the variance raised to the power 3/2, whereas the kurtosis is the fourth central moment divided by the square of the variance. The mean, variance, CV, IOD, skewness, and kurtosis of the LZTBD() for selected parameter values are summarized in Table 1. This table shows that the LZTBD can exhibit both over-dispersion (IOD > 1) and under-dispersion (IOD < 1), depending on the parameter values. It also indicates that the LZTBD is mainly right-skewed and can take a range of kurtosis levels.
Table 1.
Values of some moment measures of the LZTBD for various values of parameters , , and .
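Moment measures of the kind reported in Table 1 can be computed numerically from any pmf with finite support. A minimal sketch, assuming the definitions above (skewness = μ3/σ³, kurtosis = μ4/σ⁴, not excess kurtosis) and using the zero-truncated binomial as a hypothetical stand-in pmf:

```python
from math import comb, sqrt


def ztbd_pmf(x, m, theta):
    """Zero-truncated binomial pmf on {1, ..., m} (hypothetical stand-in for the LZTBD pmf)."""
    if x < 1 or x > m:
        return 0.0
    return comb(m, x) * theta**x * (1 - theta) ** (m - x) / (1 - (1 - theta) ** m)


def moment_measures(pmf, support):
    """Mean, variance, IOD, CV, skewness and kurtosis of a pmf on a finite support."""
    pairs = [(x, pmf(x)) for x in support]
    mean = sum(x * p for x, p in pairs)

    def central(k):
        # k-th central moment E[(X - mean)^k]
        return sum((x - mean) ** k * p for x, p in pairs)

    var = central(2)
    return {
        "mean": mean,
        "variance": var,
        "IOD": var / mean,  # > 1: over-dispersion, < 1: under-dispersion
        "CV": sqrt(var) / mean,
        "skewness": central(3) / var**1.5,  # third central moment / variance^(3/2)
        "kurtosis": central(4) / var**2,  # fourth central moment / variance^2
    }
```

For the stand-in with m = 10 and θ = 0.5 the IOD is below 1, i.e., under-dispersion, as expected for a binomial-type model.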
4. Identifiability
Finite mixture models have received considerable attention in recent years in applied contexts; they are widely used in astronomy, biology, genetics, medicine, psychiatry, marketing, and other fields (see [24]). In this section, we define finite mixtures of the LZTBD() and establish their identifiability. Such mixtures may be useful in future applications.
Let Y be a discrete rv with the pmf , where , such that , and . Then, we say that Y has a mixture distribution and that is a finite mixture of distributions. The constants are called the mixing weights and , the components of the mixture. We denote by the collection of all distinct parameters in the components.
Let be the class of pmfs from which mixtures are to be formed. Then, the class of finite mixtures of with the appropriate class of pmfs is , so that is the convex hull of .
Definition 1.
An integer-valued rv Y is said to have a g-component mixture of the LZTBDs if it has the pmf of the following form:
where , for each , ,
with , and for each .
A distribution with pmf given in (20) is called the Lagrangian zero-truncated binomial mixture distribution with g components (LZTBMDg).
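The mixture pmf in (20) is a convex combination of component pmfs. The sketch below evaluates a generic g-component mixture; the components used in the example are hypothetical zero-truncated binomial pmfs standing in for LZTBD components.

```python
from math import comb


def ztbd_pmf(x, m, theta):
    """Zero-truncated binomial pmf on {1, ..., m} (hypothetical stand-in for the LZTBD pmf)."""
    if x < 1 or x > m:
        return 0.0
    return comb(m, x) * theta**x * (1 - theta) ** (m - x) / (1 - (1 - theta) ** m)


def mixture_pmf(x, weights, component_pmfs):
    """g-component finite mixture: sum_j w_j * p_j(x), with w_j > 0 and sum_j w_j = 1."""
    if len(weights) != len(component_pmfs):
        raise ValueError("need one mixing weight per component")
    if any(w <= 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("mixing weights must be positive and sum to 1")
    return sum(w * p(x) for w, p in zip(weights, component_pmfs))
```

Because each component is zero-truncated, the mixture also puts no mass at 0.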
The following theorem from [25] is used to establish the identifiability conditions of the finite mixture model.
Theorem 1.
A necessary and sufficient condition for to be identifiable is that Δ should be linearly independent over the field of real numbers.
Proof.
The proof is given in [25] and is therefore omitted here. □
Next, applying Theorem 1, we outline the LZTBMDg’s identifiability requirements.
Theorem 2.
The identifiability conditions for the LZTBMDg with the pmf as given in (20) are , , and for , such that .
Proof.
For the first step, take and consider the following equation:
where and are any two arbitrary real numbers, and for in which is obtained from by replacing by , by and by .
Assume that for each and , and . Thus, for , we have
and
Hence, by (27), we have and thus, . Therefore, and are linearly independent, and identifiability follows from Theorem 1. Since the argument extends to any positive integer g, the proof is complete. □
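For specific parameter values, the linear independence required by Theorem 1 can also be checked numerically: if the vectors of component pmf values on some finite set of support points have full rank, then no nontrivial linear combination of the pmfs can vanish everywhere, so the pmfs are linearly independent. This check is sufficient but not necessary and does not replace the proof above. A pure-Python sketch, with hypothetical zero-truncated binomial components as stand-ins:

```python
from math import comb


def ztbd_pmf(x, m, theta):
    """Zero-truncated binomial pmf on {1, ..., m} (hypothetical stand-in for the LZTBD pmf)."""
    if x < 1 or x > m:
        return 0.0
    return comb(m, x) * theta**x * (1 - theta) ** (m - x) / (1 - (1 - theta) ** m)


def matrix_rank(rows, tol=1e-12):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in rows]
    n_rows = len(a)
    n_cols = len(a[0]) if a else 0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda i: abs(a[i][col]))
        if abs(a[pivot][col]) <= tol:
            continue  # no usable pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        for i in range(rank + 1, n_rows):
            factor = a[i][col] / a[rank][col]
            for j in range(col, n_cols):
                a[i][j] -= factor * a[rank][j]
        rank += 1
    return rank


def pmfs_linearly_independent(component_pmfs, support, tol=1e-12):
    """Sufficient check: full rank of the pmf values on `support` implies independence."""
    rows = [[p(x) for x in support] for p in component_pmfs]
    return matrix_rank(rows, tol) == len(component_pmfs)
```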
Proposition 9.
Proof.
The proof follows simply from Definition 1, given the pgf of the LZTBD mentioned in (11). □
5. Estimation of Parameters
In this section, we estimate the unknown parameters of the LZTBD by the ML estimation method.
The LZTBD is a three-parameter model with parameters . Consider a random sample of size n from the LZTBD, and let the observed frequencies be , , so that , where k is the largest observed value with non-zero frequency. Then, the likelihood function is given by
Therefore, the log-likelihood function is given by
where . The ML estimates (MLEs) are defined by maximizing wrt the parameters. Let us denote by , , and the MLEs of , , and , respectively. On the computational side, the score vector is
where the partial derivatives of wrt the parameters are
and
The MLEs are then obtained by setting the score vector to zero, i.e., , and solving the resulting equations simultaneously. Since these equations have no closed-form solution, they can be solved numerically in the R statistical software using iterative techniques such as the Newton–Raphson algorithm.
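The text solves the score equations with Newton–Raphson in R. As a simpler illustration, the Python sketch below maximizes a grouped-frequency log-likelihood with golden-section search, a derivative-free technique swapped in here for brevity, for the one-parameter zero-truncated binomial with known m. This model is a hypothetical stand-in, since the LZTBD log-likelihood is not reproduced here. Golden-section search assumes the log-likelihood is unimodal on the search interval; a three-parameter LZTBD fit would instead need a multivariate optimizer.

```python
from math import comb, log, sqrt


def ztbd_loglik(theta, freqs, m):
    """Log-likelihood of the zero-truncated binomial stand-in from grouped data {x: f_x}."""
    n = sum(freqs.values())
    ll = sum(
        f * (log(comb(m, x)) + x * log(theta) + (m - x) * log(1 - theta))
        for x, f in freqs.items()
    )
    # Zero truncation: every observation is conditioned on X >= 1.
    return ll - n * log(1 - (1 - theta) ** m)


def golden_section_max(fun, lo, hi, tol=1e-10, max_iter=500):
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = fun(c), fun(d)
    for _ in range(max_iter):
        if b - a < tol:
            break
        if fc > fd:  # the maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = fun(c)
        else:  # the maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = fun(d)
    return (a + b) / 2


def ztbd_mle(freqs, m):
    """MLE of theta for the stand-in model, with m treated as known."""
    eps = 1e-9  # keep theta strictly inside (0, 1)
    return golden_section_max(lambda t: ztbd_loglik(t, freqs, m), eps, 1 - eps)
```

For this stand-in, the MLE satisfies the moment equation mθ/(1 − (1 − θ)^m) = x̄, which gives a convenient correctness check; for instance, with the hypothetical frequencies `{1: 5, 2: 8, 3: 6, 4: 3, 5: 1}` and m = 10, the fitted truncated mean reproduces x̄ = 56/23.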
6. Likelihood Ratio Test
In this section, we test the significance of an additional parameter included in the LZTBD using the generalized likelihood ratio test (GLRT) (see [26]).
More precisely, to test the significance of the parameter of the LZTBD , we use the GLRT procedure: we test the null hypothesis “X follows the ZTBD” against the alternative hypothesis “X follows the LZTBD”, with the test statistic given by
where is the vector of MLEs of with no constraints, and is the vector of MLEs of under .
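Once both models have been fitted, the statistic and an approximate p-value follow from the two maximized log-likelihoods. The sketch below assumes the usual form −2(ℓ0 − ℓ1) with an asymptotic χ² reference distribution with one degree of freedom (one restricted parameter); if the null value of the tested parameter lies on the boundary of the parameter space, this reference distribution may need adjustment. The χ²₁ tail probability is computed as erfc(√(x/2)), which requires only the standard library.

```python
from math import erfc, sqrt


def glrt_chi2_1(loglik_null, loglik_full):
    """GLRT statistic -2 (l0 - l1) and its asymptotic chi-square(1) p-value.

    loglik_null: maximized log-likelihood under H0 (e.g. the ZTBD fit).
    loglik_full: maximized log-likelihood without constraints (e.g. the LZTBD fit).
    """
    # Clamp tiny negative values caused by imperfect numerical optimization.
    stat = max(0.0, -2.0 * (loglik_null - loglik_full))
    # For one degree of freedom, P(Chi2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value
```

For example, a log-likelihood gain of 2 gives a statistic of 4 and a p-value of about 0.0455.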
7. Simulation
We perform a simulation study in the R software to examine the asymptotic behavior of the MLEs of the parameters of the LZTBD. We use the inverse transformation method to simulate an LZTBD random sample (see [27]). The algorithm is as follows:
- Step 1:
- Generate a random number from the uniform distribution.
- Step 2:
- .
- Step 3:
- If , set and stop.
- Step 4:
- , .
- Step 5:
- Go to Step 3.
Conceptually, P is the probability that , and F is the probability that X is less than or equal to i.
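Steps 1 to 5 can be sketched in Python as follows, assuming (as the description of P and F suggests) that Step 2 initializes i = 1, P = P(X = 1), F = P and that Step 4 updates P = P(X = i + 1), F = F + P, i = i + 1. The pmf is passed in as a function on {1, 2, ...}; the zero-truncated binomial is a hypothetical stand-in for (5), and `max_x` is an added safeguard (not part of the algorithm above) that stops the search if floating-point rounding leaves F slightly below 1.

```python
import random
from math import comb


def ztbd_pmf(x, m, theta):
    """Zero-truncated binomial pmf on {1, ..., m} (hypothetical stand-in for the LZTBD pmf)."""
    if x < 1 or x > m:
        return 0.0
    return comb(m, x) * theta**x * (1 - theta) ** (m - x) / (1 - (1 - theta) ** m)


def sample_inverse_transform(pmf, rng, max_x=100_000):
    """Draw one value from a pmf on {1, 2, ...} by sequential inverse transformation."""
    u = rng.random()  # Step 1: U ~ Uniform(0, 1)
    i = 1  # Step 2: start at the smallest support point
    p = pmf(i)  # P = P(X = i)
    f = p  # F = P(X <= i)
    while u >= f and i < max_x:  # Step 3: stop as soon as U < F
        i += 1  # Step 4: move to the next support point
        p = pmf(i)
        f += p
    return i  # Step 5 is the loop back to Step 3
```

Usage: `rng = random.Random(12345)` followed by `[sample_inverse_transform(pmf, rng, max_x=10) for _ in range(n)]` draws n observations; seeding the generator makes a simulation run reproducible.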
Additionally, the average MLEs, absolute biases, and mean squared errors (MSEs) are calculated using the following equations:
- Average value of MLEs: MLE() = .
- Absolute average bias: Bias() = .
- MSE: MSE() = .
Here, or or , and the index i refers to the i-th generated sample. The simulation considers sample sizes of 15, 50, 175, 500, and 1000 for two different sets of parameter values of the LZTBD. We repeat the process times and report the estimates and MSEs in Table 2. The table shows that the estimates are stable and close to the true parameter values for these sample sizes, and that the absolute average bias and MSEs decrease as the sample size increases. Hence, the ML estimation method performs consistently and reliably.
Table 2.
Simulation results for the maximum likelihood estimates (MLEs) of the parameters , , and .
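The summary measures can be computed from the replicated estimates as follows. Because the defining formulas are not reproduced in this text, the sketch reports both the absolute average bias |mean(θ̂) − θ| and the mean absolute error mean(|θ̂ − θ|); which of the two the bias column corresponds to is not stated, so that choice is an assumption.

```python
def simulation_summary(estimates, true_value):
    """Average MLE, bias measures and MSE over N replicated estimates of one parameter."""
    n = len(estimates)
    avg = sum(estimates) / n
    return {
        "average_mle": avg,
        "abs_average_bias": abs(avg - true_value),  # |mean(theta_hat) - theta|
        "mean_abs_error": sum(abs(e - true_value) for e in estimates) / n,
        "mse": sum((e - true_value) ** 2 for e in estimates) / n,
    }
```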
8. Applications and Empirical Study
The aim of this section is to demonstrate the empirical usefulness of the LZTBD. We use six real COVID-19 datasets, from Italy, Senegal, Pakistan, Saudi Arabia, Belgium, and Ethiopia, to compare the fit of the LZTBD with that of competing models. The shape of the hrf underlying each dataset is assessed graphically with the total time on test (TTT) plot: convex, concave, convex-then-concave, and concave-then-convex empirical TTT plots correspond to decreasing, increasing, bathtub-shaped, and upside-down bathtub-shaped hrfs, respectively (see [28]). All computations are carried out in the statistical software R. To show the possible benefit of the LZTBD, we compare it with the following distributions:
- ZTBD with parameters and , which has the following pmf:
- Zero-truncated generalized binomial distribution (ZTGBD) with parameters , , and with the following pmf:
- Zero-truncated discrete two-parameter Poisson–Lindley distribution (ZTDTPPLD) with parameters and (see [29]), which has the following pmf:
- Zero-truncated Poisson–Lindley distribution (ZTPLD) with parameters (see [30]), which has the following pmf:
- Intervened generalized Poisson distribution (IGPD) with parameters , , and (see [31]), which has the following pmf:
8.1. COVID-19 Data Set from Italy
The first dataset, available in [11], consists of the daily new COVID-19 cases reported in Italy over 61 days, from 13 June to 12 August 2021. The descriptive measures of the data, namely the sample size (n), minimum (min), first quartile (), median (), third quartile (), maximum (max), and inter-quartile range (), are given in Table 3.
Table 3.
Descriptive statistics for the COVID-19 data set of Italy.
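Descriptive measures like those in Table 3 can be computed with the Python standard library. This sketch uses `statistics.quantiles` with `method="inclusive"`, which should match R's default quantile definition (type 7); other quantile conventions can give slightly different values of Q1 and Q3.

```python
import statistics


def describe(data):
    """n, min, Q1, median, Q3, max and IQR of a sample of counts."""
    xs = sorted(data)
    q1, q2, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {
        "n": len(xs),
        "min": xs[0],
        "Q1": q1,
        "median": q2,
        "Q3": q3,
        "max": xs[-1],
        "IQR": q3 - q1,
    }
```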
In addition, Figure 3 shows an empirical TTT plot of the data, from which we deduce an increasing hrf.
Figure 3.
Total Time on Test (TTT) plot for the COVID-19 data set of Italy.
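The empirical scaled TTT transform behind plots such as Figure 3 can be computed as below, assuming the standard definition T(i/n) = [Σ_{j≤i} x_(j) + (n − i) x_(i)] / Σ_{j≤n} x_(j) for the ordered sample x_(1) ≤ ⋯ ≤ x_(n). Plotting the returned points with any plotting library gives the TTT curve; a concave curve suggests an increasing hrf.

```python
def scaled_ttt(data):
    """Points (i/n, T(i/n)) of the empirical scaled total time on test transform."""
    xs = sorted(data)
    n = len(xs)
    total = sum(xs)
    points = []
    partial = 0.0  # running sum x_(1) + ... + x_(i)
    for i, x in enumerate(xs, start=1):
        partial += x
        points.append((i / n, (partial + (n - i) * x) / total))
    return points
```

The curve is nondecreasing and ends at (1, 1), since each increment equals (n − i + 1)(x_(i) − x_(i−1))/Σx ≥ 0.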
We compare the competing distributions with the LZTBD using the negative log-likelihood (), the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the statistic. Table 4 displays the corresponding MLEs, model adequacy measures, and values. As Table 4 shows, the LZTBD has lower model adequacy measures and values than the other distributions studied; hence, among the models considered, the LZTBD is the most appropriate for modeling these COVID-19 data. The empirical mean, variance, and IOD of this dataset are 22.6229, 160.3388, and 7.0874, respectively, while the corresponding theoretical values under the fitted LZTBD are 21.6234, 160.3248, and 7.4144. Thus, the empirical and theoretical variances are almost identical, and the means and IOD values are close to each other.
Table 4.
Maximum likelihood estimates (MLEs), model adequacy measures and values for the COVID-19 data set of Italy.
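The AIC, BIC, and empirical IOD used in this comparison follow from the negative log-likelihood and the data. A minimal sketch, assuming AIC = 2k + 2(−ℓ) and BIC = k ln n + 2(−ℓ), with k estimated parameters and n observations, and assuming that the empirical IOD uses the sample variance with divisor n − 1 (the divisor is not stated in the text):

```python
import statistics
from math import log


def aic(neg_loglik, k):
    """Akaike information criterion from the minimized negative log-likelihood."""
    return 2 * k + 2 * neg_loglik


def bic(neg_loglik, k, n):
    """Bayesian information criterion for n observations."""
    return k * log(n) + 2 * neg_loglik


def empirical_iod(data):
    """Sample variance (divisor n - 1) over sample mean."""
    return statistics.variance(data) / statistics.mean(data)
```

Lower AIC and BIC values indicate a better trade-off between fit and the number of parameters; BIC penalizes the third LZTBD parameter more heavily than AIC once n exceeds about 7.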
For the GLRT, the value of the test statistic (29) is (p-value ). Hence, the null hypothesis is rejected in favor of the alternative hypothesis at any significance level greater than 0.0001, and we conclude, based on the test procedure outlined in Section 6, that the additional parameter of the LZTBD is significant.
8.2. COVID-19 Data Set from Senegal
The LZTBD is next fitted to the daily COVID-19 case counts in Senegal over 56 days, recorded from 29 March 2021 to 23 May 2021. These data were collected by the World Health Organization (WHO) and are available at http://covid19.who.int/data (accessed on 24 August 2022). Table 5 reports descriptive statistics for these data.
Table 5.
Descriptive statistics for the COVID-19 data set of Senegal.
In addition, Figure 4 shows an empirical TTT plot of the data from which an increasing hrf is revealed.
Figure 4.
Total Time on Test (TTT) plot for the COVID-19 data set of Senegal.
We compare the competing distributions with the suggested distribution using the , AIC, BIC, and values. Table 6 displays the corresponding MLEs, model adequacy measures, and values. The model adequacy measures and values of the LZTBD are lower than those of the other models examined; hence, among the models considered, the LZTBD is the most appropriate for the COVID-19 data from Senegal. The empirical mean, variance, and IOD of this dataset are 46.54, 394.326, and 8.47, respectively, while the corresponding theoretical values under the fitted LZTBD are 46.4, 394.324, and 8.49. Thus, the empirical and theoretical means are almost the same, and the empirical and theoretical variances and IOD values are very close to each other.
Table 6.
Maximum likelihood estimates (MLEs), model adequacy measures, and value for the Senegal data set.
For the GLRT, the value of the test statistic (29) is (p-value ). Hence, the null hypothesis is rejected in favor of the alternative hypothesis at any significance level greater than 0.0004, and we conclude, based on the test procedure outlined in Section 6, that the additional parameter of the LZTBD is significant.
8.3. COVID-19 Data Set from Pakistan
The LZTBD is next fitted to the daily COVID-19 case counts in Pakistan over 95 days, recorded from 23 May 2021 to 25 August 2021. These data were collected by the WHO and are available at http://covid19.who.int/data (accessed on 24 August 2022). Table 7 reports descriptive statistics for these data.
Table 7.
Descriptive statistics for the COVID-19 data set of Pakistan.
In addition, Figure 5 shows an empirical TTT plot of the data, showing an increasing hrf.
Figure 5.
Total Time on Test (TTT) plot for the COVID-19 data set of Pakistan.
Using the , AIC, BIC, and values, we compare the competing distributions with the suggested distribution. Table 8 displays the corresponding MLEs, model adequacy measures, and values. The model adequacy measures and values of the LZTBD are lower than those of the other models examined; hence, among the models considered, the LZTBD is the most suitable for the COVID-19 data from Pakistan. The empirical mean, variance, and IOD of this dataset are 52.6842, 538.5801, and 10.2228, respectively, while the corresponding theoretical values under the fitted LZTBD are 52.6704, 538.5509, and 10.2249. Thus, the empirical and theoretical means are almost equal, and the empirical and theoretical variances and IOD values are very close to each other.
Table 8.
Maximum likelihood estimates (MLEs), model adequacy measures, and value for the Pakistan data set.
For the GLRT, the value of the test statistic (29) is (p-value ). Hence, the null hypothesis is rejected in favor of the alternative hypothesis at any significance level greater than 0.0001, and we conclude, based on the test procedure outlined in Section 6, that the additional parameter of the LZTBD is significant.
8.4. COVID-19 Data Set from Saudi Arabia
The LZTBD is next fitted to the daily COVID-19 death counts in Saudi Arabia over 83 days, recorded from 30 May to 20 August 2020. These data were collected by the WHO and are available at http://covid19.who.int/data (accessed on 24 August 2022). Table 9 reports descriptive statistics for these data.
Table 9.
Descriptive statistics for the COVID-19 data set of Saudi Arabia.
In addition, Figure 6 shows the empirical TTT plot of the data, which indicates an increasing hrf.
Figure 6.
Total Time on Test (TTT) plot for the COVID-19 data set of Saudi Arabia.
We compare the competing distributions with the suggested distribution using the , AIC, BIC, and values. Table 10 displays the corresponding MLEs, model adequacy measures, and values. The model adequacy measures and values of the LZTBD are lower than those of the other models examined; hence, among the models considered, the LZTBD is the most suitable for the COVID-19 data from Saudi Arabia. The empirical mean, variance, and IOD of this dataset are 36.9277, 70.5313, and 1.9099, respectively, while the corresponding theoretical values under the fitted LZTBD are 36.8724, 71.0567, and 1.9270. Hence, the empirical and theoretical means are almost the same, and the empirical and theoretical variances and IOD values are close to each other.
Table 10.
Maximum likelihood estimates (MLEs), model adequacy measures, and value for the Saudi Arabia data set.
For the GLRT, the value of the test statistic (29) is (p-value ). Hence, the null hypothesis is rejected in favor of the alternative hypothesis at any significance level greater than 0.0002, and we conclude, based on the test procedure outlined in Section 6, that the additional parameter of the LZTBD is significant.
8.5. COVID-19 Data Set from Belgium
The LZTBD is also fitted to COVID-19 data from Belgium covering 425 days (more than a year), recorded from 22 July 2021 to 19 September 2022. These data, which represent the number of deaths per day, were collected by the WHO and are available at http://covid19.who.int/data (accessed on 24 August 2022). Table 11 reports descriptive statistics for these data.
Table 11.
Descriptive statistics for the COVID-19 data set of Belgium.
In addition, Figure 7 shows an empirical TTT plot from which we can distinguish an increasing hrf.
Figure 7.
Total Time on Test (TTT) plot for the COVID-19 data set of Belgium.
We compare the competing distributions with the suggested distribution using the , AIC, BIC, and values. Table 12 displays the corresponding MLEs, model adequacy measures, and values. The model adequacy measures and values of the LZTBD are lower than those of the other models examined; hence, among the models considered, the LZTBD is the most appropriate for the COVID-19 data from Belgium. The empirical mean, variance, and IOD of this dataset are 17.122, 178.419, and 10.420, respectively, while the corresponding theoretical values under the fitted LZTBD are 17.213, 178.412, and 10.365. Thus, the empirical and theoretical means are almost the same, and the empirical and theoretical variances and IOD values are very close to each other.
Table 12.
Maximum likelihood estimates (MLEs), model adequacy measures and value for the Belgium data set.
For the GLRT, the value of the test statistic (29) is (p-value ). Hence, the null hypothesis is rejected in favor of the alternative hypothesis at any significance level greater than 0.0012, and we conclude, based on the test procedure outlined in Section 6, that the additional parameter of the LZTBD is significant.
8.6. COVID-19 Data Set from Ethiopia
Finally, the LZTBD is fitted to COVID-19 data from Ethiopia covering 301 days, recorded from 25 August 2020 to 21 June 2021. These data, which represent the number of deaths per day, were collected by the WHO and are available at http://covid19.who.int/data (accessed on 24 August 2022). Table 13 reports descriptive statistics for these data.
Table 13.
Descriptive statistics for the COVID-19 data set of Ethiopia.
In addition, Figure 8 shows the empirical TTT plot of the data, which indicates an increasing hrf.
Figure 8.
Total Time on Test (TTT) plot for the COVID-19 data set of Ethiopia.
We compare the competing distributions with the suggested distribution using the , AIC, BIC, and values. Table 14 displays the corresponding MLEs, model adequacy measures, and values. The model adequacy measures and values of the LZTBD are lower than those of the other models examined; hence, among the models considered, the LZTBD is the most suitable for the Ethiopian COVID-19 data. The empirical mean, variance, and IOD of this dataset are 11.973, 67.00, and 5.5959, respectively, while the corresponding theoretical values under the fitted LZTBD are 11.891, 67.02, and 5.6361. Thus, the empirical and theoretical means are almost equal, and the empirical and theoretical variances and IOD values are very close to each other.
Table 14.
Maximum likelihood estimates (MLEs), model adequacy measures, and value for the Ethiopia data set.
For the GLRT, the value of the test statistic (29) is (p-value ). Hence, the null hypothesis is rejected in favor of the alternative hypothesis at any significance level greater than 0.0002, and we conclude, based on the test procedure outlined in Section 6, that the additional parameter of the LZTBD is significant.
9. Conclusions
In this article, we used the Lagrange expansion to develop a new three-parameter distribution called the Lagrangian zero-truncated binomial distribution (LZTBD). The proposed distribution generalizes both the well-known zero-truncated binomial distribution and the Lagrangian weighted Consul distribution. We investigated the shapes of its probability mass and hazard functions and derived expressions for the factorial moments, generating functions, mean, and median. The identifiability of the LZTBD model was also proved. The model parameters were estimated by the maximum likelihood method, and a simulation study was performed to assess the performance of the maximum likelihood estimates. Finally, six real COVID-19 datasets were used to illustrate the applicability of the LZTBD, which provided a better fit than the competing models in terms of the model adequacy measures considered.
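For reference, the general Lagrange expansion underlying Lagrangian distributions (Consul and Shenton [13]) is stated below in its generic form. The particular functions f and g that generate the LZTBD are not reproduced in this section. The conditions under which the coefficients sum to one are discussed in Consul and Famoye [23].

```latex
% Let g(z) and f(z) be analytic on [-1, 1] with g(0) \neq 0, and let z = u\,g(z). Then
f(z) = f(0) + \sum_{x=1}^{\infty} \frac{u^{x}}{x!}
       \left[ \frac{d^{x-1}}{dz^{x-1}} \Big( g(z)^{x} f'(z) \Big) \right]_{z=0}.

% When f and g are pgfs with g(0) \neq 0 (under suitable conditions), the coefficients
% define a Lagrangian probability mass function:
P(X = 0) = f(0), \qquad
P(X = x) = \frac{1}{x!} \left[ \frac{d^{x-1}}{dz^{x-1}} \Big( g(z)^{x} f'(z) \Big) \right]_{z=0},
\quad x = 1, 2, \ldots
```

As a check, taking g(z) ≡ 1 gives P(X = x) = f^{(x)}(0)/x!, so the expansion simply returns the distribution whose pgf is f. If f is the ZTBD pgf, the expansion returns the ZTBD itself, with f(0) = 0 ensuring zero truncation. A non-degenerate g is what produces genuinely new distributions.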
Author Contributions
Conceptualization, M.R.I., C.C., D.S.S., M.M. and R.M.; methodology, M.R.I., C.C., D.S.S., M.M. and R.M.; software, M.R.I., C.C., D.S.S., M.M. and R.M.; validation, M.R.I., C.C., D.S.S., M.M. and R.M.; formal analysis, M.R.I., C.C., D.S.S., M.M. and R.M.; investigation, M.R.I., C.C., D.S.S., M.M. and R.M.; resources, M.R.I., C.C., D.S.S., M.M. and R.M.; data curation, M.R.I., C.C., D.S.S., M.M. and R.M.; writing—original draft preparation, M.R.I., C.C., D.S.S., M.M. and R.M.; writing—review and editing, M.R.I., C.C., D.S.S., M.M. and R.M.; visualization, M.R.I., C.C., D.S.S., M.M. and R.M. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Acknowledgments
The authors would like to thank the three reviewers for their thorough comments, which led to improvements in the presentation of the manuscript.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Finney, D.; Varley, G. An example of the truncated Poisson distribution. Biometrics 1955, 11, 387–394.
- Creel, M.; Loomis, J. Theoretical and Empirical Advantages of Truncated Count Data Estimators for Analysis of Deer Hunting in California. Am. J. Agric. Econ. 1990, 72, 434–441.
- Brass, W. Simplified methods of fitting the truncated negative binomial distribution. Biometrika 1959, 45, 59–68.
- Lee, A.; Wang, K.; Yau, K.; Somerford, P. Truncated Negative Binomial Mixed Regression Modelling of Ischaemic Stroke Hospitalizations. Stat. Med. 2003, 22, 1129–1139.
- Kennedy. Does Race Predict Stroke Readmission? An Analysis Using the Truncated Negative Binomial Model. J. Natl. Med. Assoc. 2005, 97, 699–713.
- Phange, A.; Loh, E. Zero Truncated Strict Arcsine Model. Int. J. Comput. Electr. Autom. Control. Inf. Eng. 2013, 7, 989–991.
- Zapata, Z.; Sedory, S.; Singh, S. Zero-truncated Binomial Distribution as a Randomization Device. Sociol. Methods Res. 2019, 51, 800–815.
- Nesteruk, I. Statistics-based predictions of coronavirus epidemic spreading in mainland China. Innov. Biosyst. Bioeng. 2020, 4, 8–13.
- El-Morshedy, M.; Altun, E.; Eliwa, M. A new statistical approach to model the counts of novel coronavirus cases. Math. Sci. 2021, 16, 37–50.
- Ahsan-ul Haq, M.; Babar, A.; Hashmi, S.; Alghamdi, A.S.; Afify, A.Z. The Discrete Type-II Half-Logistic Exponential Distribution with Applications to COVID-19 Data. Pak. J. Stat. Oper. Res. 2021, 7, 921–932.
- Almetwally, E.; Abdo, D.; Hafez, E.; Jawa, T.; Sayed-Ahmed, N.; Almongy, H. The new discrete distribution with application to COVID-19 Data. Results Phys. 2022, 32, 104987.
- Lagrange, J.L. Mécanique Analytique; Jacques Gabay: Paris, France, 1788.
- Consul, P.C.; Shenton, L.R. Use of Lagrange expansion for generating generalized probability distributions. SIAM J. Appl. Math. 1972, 23, 239–248.
- Consul, P.C.; Shenton, L.R. Some interesting properties of Lagrangian distributions. Commun. Stat. 1973, 2, 263–272.
- Mohanty, S.G. On a generalized two-coin tossing problem. Biom. Z. 1966, 8, 266–272.
- Consul, P.C.; Famoye, F. Lagrangian Katz family of distributions. Commun. Stat. Theory Methods 1996, 25, 415–434.
- Berg, K.; Nowicki, K. Statistical inference for a class of modified power series distribution with applications to random mapping theory. J. Stat. Plan. Inference 1991, 28, 247–261.
- Li, S.; Black, D.; Lee, C.; Famoye, F. Dependence Models Arising from the Lagrangian Probability Distributions. Commun. Stat. Theory Methods 2010, 29, 1729–1742.
- Innocenti, A.R.; Fox, O.; Chibbaro, S. A Lagrangian probability density-function model for collisional turbulent fluid-particle flows. J. Fluid Mech. 2019, 862, 449–489.
- Jensen, J.L. Sur une identité d’Abel et sur d’autres formules analogues. Acta Math. 1902, 26, 307–318.
- Riordan, J. Combinatorial Identities; John Wiley and Sons, Inc.: New York, NY, USA, 1968.
- Janardan, K.G.; Rao, B.R. Lagrangian distributions of second kind and weighted distributions. SIAM J. Appl. Math. 1983, 43, 302–313.
- Consul, P.C.; Famoye, F. Lagrangian Probability Distributions; Birkhäuser: New York, NY, USA, 2006.
- McLachlan, G.; Peel, D. Finite Mixture Models; Wiley: Hoboken, NJ, USA, 2000.
- Titterington, D.M.; Smith, A.F.; Makov, U.E. Statistical Analysis of Finite Mixture Distributions; Wiley: Hoboken, NJ, USA, 1985.
- Rao, C.R. Minimum variance and the estimation of several parameters. Math. Proc. Camb. Philos. Soc. 1947, 43, 280–283.
- Ross, S. Simulation, 5th ed.; Academic Press: Cambridge, MA, USA, 2013; pp. 5–38.
- Aarset, M.V. How to identify a bathtub hazard rate. IEEE Trans. Reliab. 1987, 36, 106–108.
- Shanker, R.; Shukla, K.K. Zero-Truncated Discrete Two-Parameter Poisson-Lindley Distribution with Applications. J. Inst. Sci. Technol. 2018, 22, 76–85.
- Shanker, R.; Fesshaye, H.; Selvaraj, S.; Yemane, A. On zero-truncation of Poisson and Poisson-Lindley distributions and their applications. Biom. Biostat. Int. J. 2015, 2, 168–181.
- Scollnik, D.M. On the Intervened Generalized Poisson Distribution. Commun. Stat. Theory Methods 2006, 35, 953–963.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).