Bivariate Poisson 2Sum-Lindley Distributions and the Associated BINAR(1) Processes

: Discrete-valued time series modeling has witnessed numerous bivariate ﬁrst-order integer-valued autoregressive process or BINAR(1) processes based on binomial thinning and different innovation distributions. These BINAR(1) processes are mainly focused on over-dispersion. This paper aims to propose new bivariate distributions and processes based on a recently proposed over-dispersed distribution: the Poisson 2S-Lindley distribution. The new bivariate distributions, denoted by the abbreviations BP2S-L(I) and BP2S-L(II), are then used as innovation distributions for the BINAR(1) process. Properties are investigated for both distributions as well as for the BINAR(1) processes. The distribution parameters are estimated using the maximum likelihood method, and the BINAR(1)BP2S-L(I) and BINAR(1)BP2S-L(II) process parameters are estimated using the conditional least squares and conditional maximum likelihood methods. Monte Carlo simulation experiments are conducted to study large and small sample performances and for the comparison of the estimation methods. The Pittsburgh crime series and candy sales datasets are then used to compare the new BINAR(1) processes to some other existing BINAR(1) processes in the literature.


Introduction
Count data, or the number of times an event occurs over a set period of time, are becoming ever more abundant in all spheres of human life. In medicine, biology, ecology, economics, demography, and other fields, modeling of these data is becoming pivotal. Real-world situations frequently involve discrete bivariate data that are typically highly related. Some of the examples include counting the number of COVID-19 cases reported in a hospital and the number of deaths among them or counting the number of traffic accidents and the corresponding number of deaths. As a result, bivariate discrete models may be ideal for the statistical analysis of such data.
To this aim, multiple strategies for establishing bivariate random variables have been reported in the literature. Most of them are addressed in [1]. The use of mixed methods to construct discrete and continuous bivariate random variables is often explored in the statistical literature. For instance, see [2][3][4]. The key advantage of this approach is that its marginal probability density function (pdf) or even its moments, correlation, and certain other properties will have simple expressions. A further possibility is to construct it using a new family of distributions. The Sarmanov family of distributions (see [5]) can be used to create bivariate distributions with a variable covariance structure, both discrete and continuous. The authors of [6] looked into several generic approaches to the family f (y; θ) = θ 4 (1 + θ) 2 y y 2 6 + y + 1 e −θy , y > 0, with θ > 0. With the same number of parameters (one), the 2S-L distribution has direct stochastic ordering properties with the Lindley distribution, making it a viable alternative. Based on the 2S-L distribution, the authors of [15] recently proposed a discrete distribution by mixing the Poisson and 2S-L distributions, called the Poisson 2S-Lindley (P2S-L) distribution. A random variable X having the P2S-L distribution is characterized by the following stochastic structure: where Λ is a random variable that follows 2S-L(θ), θ > 0, X|Λ = λ ∼ D denotes " conditionally on Λ = λ, X has the D distribution", P(λ) denotes the Poisson distribution with parameter λ, and 2S-L(θ) denotes the 2S-L distribution with parameter θ. One can establish that the unconditional probability mass function (pmf) of X is P(x; θ) = θ 4 (1 + x) 6(1 + θ) 6+x x 2 + 6(θ + 2) 2 + x(11 + 6θ) , x = 0, 1, 2, .... (1) Some of the important properties related to moments of X are discussed below: The probability-generating function (pgf) of X is determined as for |s| < 1 + θ. The moment-generating function (mgf) of X is derived as for t ≤ log(1 + θ). From the function above, the mean and variance of X are easily obtained as and Var(X) = 2 θ 2 + 4θ + 2 respectively. In addition, the Fisher index of dispersion (DI) of X is DI(X) = Var(X) E(X) = 1 θ + 1 2 + 3θ + θ 2 .
One can remark that DI(X) can be less than or greater than 1, since θ is greater than 0. Thus, the P2S-L distribution can have under or over-dispersed properties. However, the study in [15] focuses on the over-dispersed case more deeply. Hence, the P2S-L distribution is effective in using it as an innovation distribution in an INAR(1) process based on binomial thinning, creating the INAR(1)P2S-L process. Such innovation distribution defined the process, its properties, and its effectiveness based on estimation and real data.

Construction of the Bivariate Distributions
In [18], the authors mentioned two methods for the construction of bivariate distributions based on the Poisson-Lindley distribution. That is, one used the basic method of bivariate construction, and the other used the Sarmanov family. In this section, we make use of those two methods for the construction of bivariate distributions: the BP2S-L(I) and BP2S-L(II) distributions.

The BP2S-L(I) Distribution
Here, a basic method is used for the construction of the BP2S-L(I) distribution. Definition 1. Let X = (X 1 , X 2 ) be a bivariate random vector such that with φ i > 0 and θ > 0. That is, conditionally on Λ = λ, the random variables X 1 and X 2 are independent, and the conditional distribution of X i , i = 1, 2, is univariate Poisson with parameters λφ i , denoted by X i | Λ = λ ∼ P(λφ i ). Then, we can say that X has the BP2S-L(I) distribution. In this case, the unconditional pmf of X is where x 1 , x 2 = 0, 1, 2, ..., φ 1 , φ 2 > 0 and θ > 0.
Proof. The pmf of the BP2S-L(I) distribution is obtained via the following integral development: The stated pmf is obtained. Now, a bivariate random vector X = (X 1 , X 2 ) having the BP2S-L(I) distribution is denoted as X ∼ BP2S-L(I)(θ, φ 1 , φ 2 ). If X ∼ BP2S-L(I)(θ, φ 1 , φ 2 ), the marginal pmf of X i , i = 1, 2, can be obtained by the usual procedure, and we have Now, the conditional pmf of X 2 |X 1 = x 1 with x 1 = 0, 1, 2, ... can be easily derived as for x 2 = 0, 1, 2, . . .. The pmf of X 1 |X 2 = x 2 can also be derived in a similar manner. The joint pgf of X is Now, the mean, variance, and covariance of X i , i = 1, 2, are computed as follows: The moment of order two of X i is given by Hence, the variance of X i can be derived as (11) and the covariance of X 1 and X 2 as It should be noted that Cov(X 1 , X 2 ) is always positive, implying that the BP2S-L(I) distribution is only appropriate for modeling bivariate data with positive correlations.

The BP2S-L(II) Distribution
The second distribution is based on the scheme characterized by the Sarmanov family of distributions. Indeed, in [5], the Sarmanov family of bivariate distributions based on specific density definitions was introduced. Here, we stress discrete bivariate densities based on the Sarmanov approach. Let us first define how the Sarmanov bivariate density can be formed. Let X 1 and X 2 be two discrete random variables having the supports χ 1 ⊆ R and χ 2 ⊆ R, respectively. Further, let us consider two functions, denoted by q i (x i ), i = 1, 2, and defined as bounded non-constant functions such that Then the joint pmf of X = (X 1 , X 2 ) for the Sarmanov family can be written as where ω is a real number, and ωq 1 (x 1 )q 2 (x 2 ) is a measure of the departure of X 1 and X 2 from independence such that the following condition is satisfied: Different choices for the functions q i (x i ), i = 1, 2, can give different cases. Here, following the spirit of [6], we use q i (x i ) = e −x i − L i (1), where L i (1) is the value of the Laplace transform of X i evaluated at s = 1. We recall that the Laplace transform of X i is given by The pmf of the Sarmanov-based bivariate P2S-L or BP2S-L(II) distribution having the P2S-L distribution as a marginal distribution is derived below. First, the Laplace transform for the P2S-L distribution is and, at s = 1, Then the joint pmf associated with a Sarmanov bivariate distribution takes the form Definition 2. The joint pmf of X = (X 1 , X 2 ) having the Sarmanov bivariate distribution with the P2S-L distribution as a marginal distribution is indicated as where x 1 , x 2 = 0, 1, 2, ..., with θ 1 , θ 2 > 0 and ω satisfies (15).
The bivariate random vector X = (X 1 , X 2 ), having the joint pmf (20), is hereafter denoted as X ∼ BP2S-L(II)(θ 1 , θ 2 , ω). Hence, if X ∼ BP2S-L(II)(θ 1 , θ 2 , ω), the mean and variance of X i , i = 1, 2, are and respectively. The covariance between X 1 and X 2 is computed as where u i = E(X i e −X i − X i L i (1)). More precisely, for the BP2S-L(II) distribution, we get Further, note that the sign of Cov(X 1 , X 2 ) depends on the sign of ω.

Estimation and Simulation of the Bivariate Distributions
In this section, the estimation of the corresponding parameters of the BP2S-L(I) and BP2S-L(II) distributions, as well as some Monte Carlo experiments for the simulation of parameters, are carried out in detail. The method of maximum likelihood (ML) is used for the estimation of parameters. For both distributions, two sets of parameter values for the sample sizes n = 25, 50, 100, 200, 400 and N = 1000 number of replications are considered.

Simulation of Parameters of the BP2S-L(I) Distribution
The MLEs of the BP2S-L(I) distribution parameters are analyzed using a simulation study. The following two sets of parameter values are considered: (θ = 0.1, φ 1 = 0.2, φ 2 = 0.4) and (θ = 1.5, φ 1 = 1.2, φ 2 = 1.4). The bias and mean square errors (MSEs) of the MLEs of the parameters are computed, and the results are reported in Table 1. Table 1. Simulation results for the BP2S-L(I) distribution.

Sample Size (n)
Parameters

Estimation for BP2S-L(II) Distribution
Let (x 1,i , x 2,i ), i = 1, 2, ..., n, be the observations of a n random sample from X ∼ BP2S-L(II)(θ 1 , θ 2 , ω). Then the log of the likelihood function is where λ = (θ 1 , θ 2 , ω) and P( (20). Then, (26) had to be maximized to find estimates for λ. Here also, the optimization technique PORT routines into the nlminb function in the R software is used to obtain the MLEs of the parameters in the BINAR(1)BP2S-L(I) process.

Simulation of Parameters of the BP2S-L(II) Distribution
In this part, the MLEs of the BP2S-L(II) distribution parameters are analyzed using a simulation study. The two following sets of parameter values are used: (θ 1 = 0.1, θ 2 = 0.6, ω = −0.3) and (θ 1 = 1.7, θ 2 = 1.1, ω = 0.6). The bias and MSEs of the estimates of the parameters are computed, and the results are reported in Table 2. Table 2. Simulation results for the BP2S-L(II) distribution.

Sample Size (n)
Parameters

The Bivariate INAR(1) Processes with Paired P2S-L Innovation Distributions
This section deals with the model development of BINAR(1) processes with the BP2S-L(I) and BP2S-L(II) distributions as innovation distributions.

General Definition
where • in (27) denotes the binomial thinning operator introduced in [19], which is described as where {C j } j∈Z is a sequence of independent and identically distributed Bernoulli random variables with parameter p. We assume that the innovation vector t in (27) is ( t,1 , t,2 ) and that t,j is independent of Y s,j , j = 1, 2 for each t and s, s < t. In addition, let the innovation vector be independent of the counting series in thinning operator •. Now, we proceed by assuming t has both of the discussed distributions, and then we develop the corresponding BINAR(1) processes.

BINAR(1) Process with BP2S-L(I) Innovation Distributions
For the BINAR(1) process discussed in (27), we suppose that the innovation vector Suppose that the process Y t = (Y t,1 , Y t,2 ), t = 1, 2, ..., is a BINAR(1)BP2S-L(I) process. In this case, the mean, variance, and DI of Y t,i , i = 1, 2, are given by respectively. Note that the DI for the BINAR(1)BP2S-L(I) process is over-dispersed marginally, even though the P2S-L distribution shows under and over-dispersion properties. Now, the conditional mean and variance of components of the process are, for i = 1, 2, and respectively. The covariance of Y t,1 and Y t,2 is The conditional joint pmf of the process is given by where y t = (y t,1 , y t,2 ), u = min(y t,1 , y t−1,1 ), v = min(y t,2 , y t−1,2 ),

BINAR(1) Process with BP2S-L(II) Innovation Distributions
Here, we describe the BINAR(1)BP2S-L(II) process. To this aim, based on the BINAR(1) process, we suppose the innovation vector satisfies: t = ( t,1 , t,2 ) ∼ BP2S-L(II)(θ 1 , θ 2 , ω). Further, we assume the assumptions made for the construction of the BINAR (1) and respectively. In particular, based on the DI, we note that the BINAR(1)BP2S-L(II) process is marginally over-dispersed here also. Now, one step ahead gives the conditional expectation and variance of components of the process for i = 1, 2 as and respectively. Furthermore, we have where The conditional joint pmf of the BINAR(1)BP2S-L(II) process can be determined by using (34), with the exception that P( t,1 = 1 , t,2 = 2 ) is swapped by (20).
In addition, under the stationary condition 0 < p i < 1 (see [20]), for both of the models, E(Y t,i ), Var(Y t,i ) for i = 1, 2 and Cov(Y t,1 , Y t,2 ) do not depend on t, and Var(Y t,i ) is finite.

Estimation and Simulation of Parameters of the BINAR(1) Processes
The estimation of the parameters and hence the simulation procedures of both the BINAR(1) processes discussed above are studied in detail. For the estimation, the methods of conditional maximum likelihood (CML) and conditional least squares (CLS) are applied.

Method of Conditional Least Squares Estimation
The CLS estimate (CLSE) of Θ for the BINAR(1)BP2S-L(I) process is obtained by minimizing the following equations with respect to Θ: for i = 1, 2, Here, we used the quasi-Newton approach (BFGS algorithm in particular) available in the optim function of R software for obtaining the CLSE.

Method of Conditional Maximum Likelihood Estimation
The CML estimate(CMLE) of Θ for the BINAR(1)BP2S-L(I) process is computed using the conditional log-likelihood function of the BINAR(1)BP2S-L(I) process. The conditional log-likelihood can be obtained by substituting (34) in the following equation: The CMLE is obtained by maximizing (42) with respect to Θ. Furthermore, the consistency and asymptotic normality of the random version of the CMLE under standard regularity conditions are demonstrated in [21,22]. Further, the covariance is obtained as the inverse of Hessian (see [23]), and standard errors (SEs) are the square root of diagonal elements of the covariance matrix. Here, the optimization routines in the nlminb function and fdHess function in the R software are used to obtain the CMLE, observed information matrix, and SEs of the estimates of the parameters in the BINAR(1)BP2S-L(I) process.

Simulation Study for BINAR(1)BP2S-L(I) Process
The estimates obtained for the unknown parameters of the BINAR(1)BP2S-L(I) process via the two methods discussed above are assessed through a simulation study. Here, N = 1000 samples, each of sizes n = 25, 50, 100, 200, 400 are taken for the two following sets of parameter values: (θ = 0.1, φ 1 = 0.2, φ 2 = 0.4; p 1 = 0.2, p 2 = 0.15) and (θ = 1.2, φ 1 = 1.5, φ 2 = 1.9, p 1 = 0.8, p 2 = 0.6). For each n, bias and MSEs were calculated and are reported in Tables 3 and 4.  Table 3. Simulation results for the BINAR(1)BP2S-L(I) process for the set of parameter values:  Tables 3 and 4 show that as the sample size increases, the bias and MSE corresponding to each parameter decrease for both methods. Although the CLS method performs slightly better than the CML method for the second set of parameters, we further proceed with the CML method since the model comparison is effective with it. Table 4. Simulation results for the BINAR(1)BP2S-L(I) process for the set of parameter values: θ = 1.2, φ 1 = 1.5, φ 2 = 1.9, p 1 = 0.8, p 2 = 0.6.

Method of Conditional Least Squares Estimation
The CLSE of Θ * for the BINAR(1)BP2S-L(II) process is obtained by minimizing H * i , i = 1, 2, 3, with respect to Θ * , where, for i = 1, 2, and Here also, we used the BFGS algorithm available in the optim function of the R software for obtaining the CLSE.

Method of Conditional Maximum Likelihood Estimation
The CMLE of Θ * for the BINAR(1)BP2S-L(II) process utilizes the conditional loglikelihood function indicated as where P(Y t = y t | Y t−1 = y t−1 ) is obtained via the joint pmf of ( t,1 , t,2 ) by (20). The CMLE is obtained by maximizing (45) according to Θ * . Moreover, the consistency and asymptotic normality of the random version of the CMLE under standard regularity conditions can be proven as given by referring to [21,22]. The covariance is obtained as the inverse of the Hessian matrix, and SEs is the square root of diagonal elements of the covariance matrix.
Here also, the optimization routines in the nlminb function and the fdHess function in R software is used to obtain the CMLE, observed information matrix, and hence the SEs of estimates of the parameters in the BINAR(1)BP2S-L(II) process.

Simulation Study for the BINAR(1)BP2S-L(II) Process
The estimates obtained for the unknown parameters of the BINAR (1) Tables 5 and 6. Table 5. Simulation results for the BINAR(1)BP2S-L(II) process for the set of parameter values:  Table 6. Simulation results for the BINAR(1)BP2S-L(II) process for the set of parameter values: Tables 5 and 6 make it clear that as the sample size increases, the bias and MSE of each parameter decrease for both methods. Here we proceed with the CML method for further analysis since it is evident that it performs better than the CLS method.

Empirical Study
The results of the suggested BINAR(1) processes are presented in this section using two real, over-dispersed time series count datasets.
The estimates of the parameters using the CML method (we used CLS estimates as initial values) along with their SEs and confidence intervals (CIs), -Log-Likelihood (-L), Akaike information criterion (AIC), Bayesian information criterion (BIC), and the root MSE (RMSE) of both the series for all the models described above are calculated. The RMSE represents the sum of squared differences between true values and one-step conditional expectations. Further, as the authors of [25] suggested, the standardized Pearson residuals are calculated to check the accuracy of the BINAR(1)BP2S-L(II) process for both datasets. They are calculated with the following formula:

Crime Series Data
The first data we used are the crime series data from the Pittsburgh police agencies in the file PghCarBeat.csv for the monthly period January 1990 to December 2001. The dataset consists of criminal records of drug activities (CDRUGS) and shooting activities (CSHOTS) in the 12 th police car beats in Pittsburgh, with a sample size of 144 each, downloaded from the website www.forecastingprinciples.com. The average CDRUGS and CSHOTS data values are 5.1736 and 5.7569, respectively, with corresponding variances of 13.1794 and 14.2412, indicating a clear over-dispersion. The plots of the time series, ACF, partial ACF (PACF) of the CDRUGS and CSHOTS data are given in Figures 1, 2 and 3, respectively.
The PACF plots make it clear that the data can be used since only the first lag is significant. Further, Figure 4 displays the cross-correlation function (CCF) plot.  From Figure 4, we observe that there is a significant cross-correlation in lag 2 between the two time series, which displays positive autocorrelation among both series, which is obvious since after drug use there is a tendency to shoot and vice versa.  Table 7 consists of the CMLEs, AIC, BIC, and RMSEs for the BINAR(1) processes considered here for the crime series dataset.  From Table 7, we observe that the BINAR(1)BP2S-L(II) process performs well for the data because it has the smallest values for the AIC and BIC. The Pearson standardized residuals were calculated for the BINAR(1)BP2S-L(II) process, and we found out that they have the means −0.046 and −0.0261, and variances 1.060 and 1.076, respectively. The ACF plots for standardized residuals for both time series are plotted in Figure 5. It clearly indicates that they are uncorrelated, which clearly shows that the BINAR(1)BP2S-L(II) process is accurate and gives a good fit to the crime series dataset. The cpgrams of Pearson residuals of both the series for the crime series dataset are plotted in Figure 6. From Figure 6, we can infer that the residuals exhibit randomly and without trend distribution.

Sales data of candies
The second dataset is related to some sales data of products on the market. The data are provided by the Kilts Center for Marketing, Graduate School of Business of the University of Chicago. Approximately nine years (1989-1997) of store-level data on the sales of 3500+ UPCs are available in this database (available on the website: http://research.chicagobooth. edu/marketing/databases/dominicks.) Here, we choose two products of candies, which are chewing gum from store '56'. The first product, named Candies1, is the 'CHICLETS TINY PACK', the UPC of which is '1254612128' and the second product, named Candies2, is the 'TIC TAC WINTERGREEN', the UPC of which is '980000007'. The sales of these two products from   The PACF plots make it clear that the data can be used since only the first lag is significant. Further, Figure 10 displays the CCF plot for candies datasets. The CCF plot shows that the coefficient of lag 0 deviates significantly from 0, which implies the occurrence of cross-correlation. Table 8 consists of the CMLEs, AIC, BIC, and RMSEs for the BINAR(1)s considered here for the sales of candies dataset. From Table 8, we observe that the BINAR(1)BP2S-L(II) process performs well for the data because it has the smallest values for the AIC and BIC. The Pearson standardized residuals were calculated for the BINAR(1)BP2S-L(II) process, and we found out that they have the means −0.0422 and −0.0023, and variances 1.2940 and 0.9683, respectively. The ACF plots for standardized residuals for both time series are plotted in Figure 11. It clearly indicates that they are uncorrelated, which clearly shows that the BINAR(1)BP2S-L(II) process is accurate and gives a good fit to the sales of candies dataset. The Pearson residual coefficients of both series for the candies sales dataset are plotted in Figure 12. From Figure 12, we can also infer that the residuals exhibit randomly and without trend distribution.

Concluding Remarks
In this paper, bivariate distributions, namely the BP2S-L(I) and BP2S-L(II) distributions, are constructed based on two different approaches. Both distributions are studied in detail, and the main mathematical properties are derived. The unknown parameters of these distributions were estimated using the ML method. Simulation studies were carried out, and the results show consistent estimates under both distributions. Most importantly, the BINAR(1)BP2S-L(I) and BINAR(1)BP2S-L(II) processes were created with these paired innovation distributions. The unknown parameters of these BINAR(1) processes were estimated by employing the CML and CLS techniques. Both techniques were compared using simulation studies. The crime series and candy sales datasets are then analyzed using these BINAR(1) processes, and the results show that the BINAR(1)BP2S-L(II) process outperforms some other newly proposed BINAR(1) processes in terms of model adequacy metrics. As a result, the BINAR(1) process with various BP2S-L innovation distributions can be regarded as meritorious bivariate time series models that compete with those already published.

Funding:
The authors received no financial support for the research, authorship, and/or publication of this article.

Data Availability Statement:
The sources of the data used in this study are provided.