Discrete Pseudo Lindley Distribution: Properties, Estimation and Application on INAR(1) Process

: In this paper, we introduce a discrete version of the Pseudo Lindley (PsL) distribution, namely, the discrete Pseudo Lindley (DPsL) distribution, and systematically study its mathematical properties. Explicit forms gathered for the properties such as the probability generating function, moments, skewness, kurtosis and stress–strength reliability made the distribution favourable. Two different methods are considered for the estimation of unknown parameters and, hence, compared with a broad simulation study. The practicality of the proposed distribution is illustrated in the ﬁrst-order integer-valued autoregressive process. Its empirical importance is proved through three real datasets.


Introduction
Count data reflect the non-negative integers which represent the frequency of occurrence of a discrete event. Such datasets can be observed in numerous fields, such as actuarial science, finance, medical, sports, etc. For instance, the yearly number of destructive floods, the number of sports people injured in a month and the hourly number of COVID-19 vaccinations given are some examples of count data. Increasing the utilization of discrete distributions for modelling such datasets influenced researchers to propose more flexible distributions by reducing the estimation errors. Discretizing continuous distributions by survival discretization is one of the widely followed methods for introducing discrete distributions. The most famous discretization technique is described below. Assume that X is a continuous lifetime random variable with the survival function (sf) S(x) = Pr(X > x). Then, the probability mass function (pmf) dealing with X is given by: Pr(X = x) = S(x) − S(x + 1), x = 0, 1, 2, . . . (1) Some of the recently introduced discrete distributions based on this survival discretization method are as follows: Discrete Lindley distribution by [1], discrete inverse Weibull distribution by [2], discrete Pareto distribution by [3], discrete Rayleigh distribution by [4], two-parameter discrete Lindley distribution by [5], exponentiated discrete Lindley distribution by [6], discrete Burr-Hatke distribution by [7], discrete Bilal distribution [8], discrete three-parameter Lindley distribution by [9], etc. Recently, Ref. [10] proposed a discrete version of Ramos-Louzada distribution [11] for asymmetric and over-dispersed data with a leptokurtic shape.
The parameter β can be considered as a shape parameter and θ as a scale parameter. The DPsL distribution can sometimes be denoted by the DPsL (θ, β) distribution to indicate the parameters.
The corresponding cdf and sf are given by: respectively. As a first property, the pmf given in (3) is log concave, since: is a decreasing function in x for every possible value of the parameters. The possible pmf shapes plotted for different values of the parameters of the DPsL distribution are displayed in Figure 1.  The figure clearly indicates that the DPsL distribution is rightly skewed and has a longer right tail.

If
is not an integer, x m is given as the integer part of is an integer, the DPsL distribution is bimodal, with the modes given by x < 0, the mode of the DPsL distribution is x m = 0. The hrf of the DPsL distribution can be obtained as: The hrf of the DPsL distribution was plotted for some set of values for θ and β in Figure 2.

Identifiability
A set of unknown parameters of a model is stated to be identifiable if different sets of parameters give different distributions for a given x. Here, the identifiability property of the DPsL distribution is proved. Let P DPsL (x; λ 1 ) and P DPsL (x; λ 2 ) be different pmfs of the DPsL distribution indexed by λ 1 = (θ 1 , β 1 ) and λ 2 = (θ 2 , β 2 ), respectively. Then, the likelihood ratio is given by: Taking logarithm of this ratio, we obtained: Now, by considering x as a continuous variable and taking the partial derivative of log U with respect to x and equating it to 0, we obtained: which implies that: By performing x → +∞, we obtained 0 = according to θ 2 > θ 1 or θ 2 < θ 1 , respectively, which is impossible since θ 1 > 0 and θ 2 > 0. Therefore, θ 1 = θ 2 . By taking into account this equality, by taking x = 0 in (5), we obtained which is possible if, and only if, β 1 = β 2 . Therefore, we concluded that the DPsL model is identifiable and that the parameters uniquely determine the distribution, that is, P DPsL (x; λ 1 ) = P DPsL (x; λ 2 ) ⇐⇒ λ 1 = λ 2 .

Moments, Skewness and Kurtosis
In the rest of the study, X denotes a random variable that follows the DPsL distribution. Then, the probability generating function (pgf) of X can be derived as: When s in pgf is substituted by e t , the moment generating function (mgf) follows as: By using the well-known relationship between M(t) and the (standard) moments of X, the first four moments of the DPsL distribution are: Based on E(X) and E(X 2 ), the variance of X follows from the Koenig-Huygens formula as: Expressions for skewness and kurtosis of the DPsL distribution can be derived explicitly by using the following formulas: 4 [Var(X)] 2 .

Coefficient of Variation and Dispersion Index
The expressions of the coefficient of variation (CV) and dispersion index (DI) of X are given by: respectively. In full generality, when the DI is one, the distribution is equi-dispersed, and if DI is greater than (less than) one, the distribution is over-dispersed (under-dispersed). Some numerical values of the mean, variance, DI, skewness and kurtosis for the DPsL distribution for some values of the parameters are presented in Tables 1 and 2.
From the information contained in these tables, it is clear that the DPsL distribution would be an appropriate option for modelling under as well as over-dispersed and positively skewed datasets.

Mean Residual Lifetime and Mean Past Lifetime
The mean residual lifetime (mrl) and mean past lifetime (mpl) of a component are two widely used measures to study the ageing behaviour of components. Both measures characterize the distribution uniquely. By assuming that the lifetime of a component is modelled by X, the mrl of X at i = 0, 1, 2, . . . is defined as: That is: Furthermore, the mpl of X is another reliability measure that corresponds to the time elapsed since the failure of X given that the system has already failed before some i. Thus, the mpl of X at i = 1, 2, . . . is defined by: where ζ * (0) = 0. That is:

Stress-Strength Analysis
Stress-strength reliability has wide applications in almost all fields of engineering and machine learning. Let X stress and X strength be random variables that model the stress and strength of a system, respectively. Then, the expected reliability can be calculated by the following formula: where P X (x) and S X (x) denote the pmf and sf, respectively, of a random variable X. Suppose that X stress and X strength are two independent random variables following the DPsL (θ 1 , β 1 ) and DPsL (θ 2 , β 2 ) distributions, respectively. Then, from (3) and (4), the expected reliability is obtained in closed form as: Some numerical values for Re Stress−Strength for different values of the parameters are given in Tables 3-5. From Tables 3 and 4, it is clear that the expected reliability increases (decreases) as β 1 → ∞ (β 2 → ∞). In addition, from Table 5, the expected reliability (decreases) as

Generating Random Values from the DPsL Distribution
Random values from the DPsL distribution can be generated by following the algorithm given below.

1.
Generate u as a realization of a random variable U with the U(0,1) distribution.

2.
With the expression of the quantile function of the PsL distribution in mind, compute: where W −1 (x) denotes the negative branch of Lambert-W function.

3.
Then, x = y represents a realization of a random variable with the DPsL distribution.
To generate a random sample of size n, repeat the algorithm n times.

Estimation Methods
The estimation of unknown parameters of a distribution is critical in accurately determining the behaviour of this distribution. Here, we use classical methods of estimation such as the method of maximum likelihood (mle) and weighted least square (wls) estimation for this purpose.

Maximum Likelihood Estimation
Let X 1 , X 2 , . . . , X n be a random sample taken from the DPsL (θ, β) distribution, and x 1 , x 2 , . . . , x n be observations of this random sample. The likelihood function is given by: and the log likelihood function is given by: Then, the maximum likelihood estimates (MLEs) of θ and β were obtained by maximizing L or log L with respect to these parameters. They can also be determined as the solutions of the normal equations given by: and Equations (9) and (10) can be solved by numerical optimization techniques using mathematical software such as MATHEMATICA, MATHCAD and R.

Weighted Least Squares Estimation
Let X (1) , X (2) , ..., X (n) be the order statistics of a random sample taken from the DPsL (θ, β) distribution, and x (1) , x (2) , . . . , x (n) be observations of these random variables. The weighted least squares estimates (WLEs) of the parameters θ and β of the DPsL distribution were obtained by maximizing the following function with respect to θ and β:

Simulation Study
The current section deals with examining the efficiency of two estimation methods for estimating the parameters of the DPsL distribution using simulation. Estimates were calculated for different values of parameters ((θ = 0.5, β = 1) and (θ = 2.2, β = 1.5)) for various sample sizes (25, 50, 75, 100) using the two estimation methods discussed and, thus, compared. Then, N = 1000 samples of values from the DPsL distribution using methods discussed in Section 2.7 were generated. The indices such as values of the estimates, mean square errors (MSEs), average absolute biases (Bias) and average mean relative estimates (MREs) were calculated in R software using the following formulas: where ζ = θ or β, and the index i refers to the ith sample. Simulation results, including values of estimates, Bias, MSEs and MREs for the two parameters θ and β of the DPsL distribution using the estimation approaches discussed, are reported in Tables 6 and 7.

INAR(1) Process with DPsL Innovations
Numerous fields, such as agriculture, epidemiology, actuarial science, finance, etc., have come across certain time series of counts. Analysing these kinds of datasets using the INAR(1) process was first applied using Poisson innovations by [12,13]. Suppose that {ε t } t∈Z are the innovations, so are independent and identically distributed (iid) random variables, with E(ε t ) = µ ε and variance Var(ε t ) = σ 2 ε . A stochastic process {X t } t∈Z defined as: with 0 ≤ p < 1, is stated to be an INAR(1) process. The symbol • is called as binomial thinning operator, which can be described as: where {U j } j∈Z is a sequence of iid Bernoulli random variables with parameter p. The one step transition probability of the INAR(1) process is given by: where B denotes a random variable following the Binomial (n, p) distribution. The mean, variance and dispersion index (DI) of {X t } t∈Z are given by [21]. They are: Var(X t ) = pµ ε + σ 2 ε 1 − p 2 (12) and where µ ε , σ 2 ε and DI ε are the mean, variance and DI of the innovation distribution. The results of [12,13] influenced us to propose a new INAR(1) process with DPsL innovations, which are capable of modelling over as well as under-dispersed count datasets. Suppose that {ε t } t∈Z follow a DPsL distribution; then, the one step transition probability matrix of the corresponding process is: which hereafter is called the INAR(1)DPsL process. By substituting µ ε , σ 2 ε , and DI ε in (11)-(13) with (6)-(8), the mean, variance and DI of the INAR(1)DPsL process could be attained. The conditional expectation and variance of the INAR(1)DPsL process are given by: and respectively, where µ ε and σ 2 ε are given in (6) and (7), respectively (see [13,21]).

Estimation
Here, the inference of the INAR(1)DPsL process was examined using two estimation methods: the conditional maximum likelihood (CML) and Yule-Walker (YW) methods. A simulation study was performed to assess the efficiency of the two methods.

Conditional Maximum Likelihood
Let X 1 , X 2 , . . . , X T be a random sample taken from the INAR(1)DPsL process, and x 1 , x 2 , . . . , x T be observations of this random sample. Then, the conditional log likelihood function of the INAR(1)DPsL process is given by: where Θ = (θ, β, p) is the vector of unknown parameters to be estimated. Maximizing (16) with respect to Θ yields the CML estimates (CMLEs). In this regard, we used the optimfunction in R software for the same. In addition, the fdHess function in R was used to obtain the observed information matrix and, hence, the standard errors (SE) of estimates of parameters in the INAR(1)DPsL process.

Yule-Walker
The YW estimates (YWEs) of the INAR(1)DPsL process were computed by solving simultaneous equations of sample and theoretical moments. Since the autocorrelation function (ACF) of the INAR(1) process at lag h was ρ x (h) = p h , the YWE of p is given by: Now, the YWEs for θ and β were obtained by solving the equations of sample mean equals theoretical mean and sample dispersion equals theoretical dispersion of the process. Here, by denoting asθ YW andβ YW the YWEs of θ and β, respectively, the following relationship holds:β where x = ∑ T t=1 x t /N. Substitutingβ YW with (17) in (13) and equating (13) to sample dispersion, we obtainedθ YW .

Simulation of INAR(1)DPsL Process
Here, a simulation study was conducted to comprehensively determine the performance of CMLEs and YWEs of the parameters of the INAR(1)DPsL process. In this regard, we generated N = 1000 samples each of sizes n = 25, 50, 100 from the proposed distribution for two sets of parameter values (θ = 0.1, β = 1.1 and θ = 3, β = 4). For each n, average absolute bias, MSE and MRE for the parameters were calculated for the two methods. The simulation results are presented in Table 8. From the above table, we observed that the average biases, MSEs and MREs of CMLEs tended to zero quicker than those of YWEs, making them efficient for small as well as large sample sizes. Therefore, the CML estimation was preferred to attain unknown parameters of the INAR(1)DPsL process.

Empirical Study
Three real datasets were used in this section to illustrate the performance of the DPsL distribution over some competitive distributions. The capability of the fitted distributions was compared using the goodness of fit criterion with its corresponding p-value.

Failure Times
The data of failure times for a sample of 15 electronic components in an acceleration life test (see [22]) were considered here. These data were based on the discretization concept. Adopting a data analysis setting, we compared the DPsL, discrete three-parameter Lindley (DTPL) (see [9]), discrete log-logistic (DLL) (see [23]), discrete inverse Weibull (DIW) (see [2]), discrete Burr-Hutke (DBH) (see [6]), discrete Pareto (DP) (see [3]), Poisson (P) and geometric (G) distributions. The MLEs with standard errors (SEs) and confidence intervals (CIs) for the parameter(s), estimated −log Likelihood (−L), Akaike information criterion (AIC), Bayesian information criterion (BIC) and goodness of fit statistic (Kolmogorov statistic (K-S) and p-value) of these distributions for this dataset are given in Table 9.  Table 9, it is evident that, besides the DPsL distribution, the DTPL, G and DLL distributions also performed quite well, but it is clear that the DPsL distribution was the best among them, since it had the lowest K-S, AIC and BIC, with a higher p-value. In order to illustrate this claim, Figure 3 provides the probability-probability (P-P) plots, and Figure 4 displays the estimated cdfs of the fitted distributions. From the above figures, we could infer that the DPsL distribution yielded a better fit among other fitted distributions. Table 10 completes these results by presenting some descriptive measures of the fitted DPsL distribution. Hence, it is evident that the fitted DPsL distribution was over dispersed, moderately right skewed and leptokurtic.

Numbers of Borers
The second dataset was the biological experiment data, which represented the number of European corn borer (No. ECB) larvae Pyrausta in the field (see [24]). It was an experiment conducted randomly on eight hills in 15 replications, and the experimenter counted the number of borers per hill of corn. The fits of the DPsL distribution were compared together with some competitive distributions which were the new Poisson weighted exponential (NPWE) (see [16]), DIW, discrete Burr-XII (DBXII) (see [23]), discrete Bilal (DBl) (see [8]), DP, DBH and Poisson (P) distributions. The MLEs with their corresponding SEs, CIs under the form (lower bound of the CI (LCI), upper bound of the CI (UCI)) for the parameter(s) and goodness of fit test for the numbers of borers dataset are reported in Table 11. From the above table, it is evident that, besides the DPsL distribution, the NPWE distribution also performed quite well, but it is clear that the DPsL distribution was the best among them, since it had the lowest −L, AIC, BIC and χ 2 value with the highest p-value.
From Figure 5, we could infer that the DPsL distribution yielded a better fit among other fitted distributions. To complete this, Table 12 contains some descriptive measures of the fitted DPsL distribution. Hence, here also, it is evident that the fitted DPsL distribution was over-dispersed, moderately right skewed and leptokurtic.

4.
For the INAR(1)G process: The third data we used here were to illustrate the application of the DPsL distribution in the INAR(1) process. Originally, the data were studied by [25], which consisted of 67 monthly claims for short-term disability benefits made by injured workers to the B.C. Workers' Compensation Board (WCB). These data were reported from the BC Center, Richmond, for the period of 10 years from 1985 to 1994. The mean, variance, and DI of the dataset were 8.6042, 11.2392 and 1.3062, respectively. To check whether the data considered had statistically significant over-dispersion, the hypothesis test proposed by [26] was applied. The value test statistic was 51.971 with a p-value less than 0.001, which showed the data had significant over-dispersion. Figure 6 displays the plots of the autocorrelation function (ACF), partial ACF (PACF), histogram and time series plots, and in the PACF plot the unique first lag significance indicated that these data could be used for modelling the INAR(1) process.   The residual analysis was conducted to check whether the fitted INAR(1)DPsL process was accurate. For that, Pearson residuals for the INAR(1)DPsL process were calculated through the following formula: where E(X t | X t−1 = x t−1 ) and Var(X t | X t−1 = x t−1 ) were derived from (14) and (15), respectively. When the fitted INAR(1) process was statistically valid, the Pearson residual had to be uncorrelated and should have had zero mean and unit variance [27]. Here, we obtained the mean and variance of the Pearson residuals of the INAR(1)DPsL process as 0.035 and 0.967, respectively, which were very close to the desired values. According to the results of [28], the INAR(1)DPsL process for the data was where the innovation process was such that ε t follows the DPsL (0.4835, 1.9214) distribution. Predicted values of the monthly number of claims dataset and the ACF plot of the Pearson residuals via this process were displayed in Figure 7. Based on this figure, the ACF plot of the Pearson residuals specified that there was no presence of autocorrelation for the Pearson residuals.

Concluding Remarks
In this paper, a two-parameter discrete distribution, namely, the discrete Pseudo Lindley (DPsL) distribution, was proposed. Its primary motivation is the ability to model various phenomena with under-and over-dispersed observed values. Various statistical properties, almost all having a closed form, revealed the flexibility and simplicity of the distribution. The estimation of the unknown parameters was performed using two different methods. They conducted an extensive simulation study to reveal the finite sample performance of the distribution. Crucially, a new INAR(1) process with DPsL innovations was developed and studied in detail. Three real-life datasets were considered to prove the efficiency of the proposed distribution. As a future work, we could consider other methods of discretization for the PsL distribution, which would then provide better properties than the survival discretization method. Furthermore, we can attempt to extend it to bivariate models. We hope that the DPsL distribution, as well as the related modelling strategy, will be an interesting alternative to modelling count data, especially in modelling the over-dispersed count data.