A New Bivariate Random Coefﬁcient INAR(1) Model with Applications

: Excess zeros is a common phenomenon in time series of counts, but it is not well studied in asymmetrically structured bivariate cases. To ﬁll this gap, we ﬁrst considered a new ﬁrst-order, bivariate, random coefﬁcient, integer-valued autoregressive model with a bivariate innovation, which follows the asymmetric Hermite distuibution with ﬁve parameters. An attractive advantage of the new model is that the dependence between series is achieved by innovative parts and the cross-dependence of the series. In addition, the time series of counts are modeled with excess zeros, low counts and low over-dispersion. Next, we established the stationarity and ergodicity of the new model and found its stochastic properties. We discuss the conditional maximum likelihood (CML) estimate and its asymptotic property. We assessed ﬁnite sample performances of estimators through a simulation study. Finally, we demonstrate the superiority of the proposed model by analyzing an artiﬁcial dataset and a real dataset.


Introduction
Much effort has recently been put into the study of integer-valued time series models, a particular class of which are known as univariate constant coefficient INAR(1) models [1]; see [2][3][4] for recent reviews on this topic. However, the autoregressive coefficient may be affected by various environmental factors in practice; thus, random coefficient integer-valued autoregressive models have been proposed, such as the random coefficient INAR(1) model [5], and it was generalized to the p-order case by [6]; see [7][8][9][10][11][12][13] for recent developments.
Bivariate count data occur in many contexts, often as the counts of two asymmetric events, objects or individuals during a certain period of time. For example, such counts occur in epidemiology when two kinds of related diseases are examined, and in criminology when two kinds of crimes are considered. The multivariate constant coefficient INAR model was proposed by [14]. The model is defined through the thinning operator "•" due to [15]. Let A be a 2 × 2 matrix with entries α ij satisfying α ij ∈ (0, 1), for i, j = 1, 2 and X = (X 1 , X 2 ) be an integer-valued random vector. Then where α ij • X j is a univariate binomial thinning operator, and ∀i, j, α ij • X j are mutually independent. Then the bivariate INAR(1) model (BINAR(1)) is defined as where t is a sequence of i.i.d. non-negative integer-valued random vectors with a finite mean and finite variance, and it is independent with A • X t−1 ; see [14] for details. Throughout the rest of the paper, t will be referred to as innovational vectors.
Pedeli and Karlis [16] discussed a tractable BINAR(1) model, and their emphasis was placed on models with bivariate Poisson and bivariate negative binomial innovational vectors, which can be used to deal with bivariate count data with equi-dispersion and over-dispersion, but with little flexibility. Pedeli and Karlis [17] gave a further discussion of properties of the multivariate INAR(1) model. Based on a finite range of counts, Scotto et al. [18] considered the density-dependent bivariate binomial autoregressive models by using their state-dependent thinning concept. Popović [19] proposed a BINAR(1) model whose survival components are generated by different binomial distributions.
The above models assumed that parameter α ij is not affected by various environmental factors. Based on this point, a new bivariate random coefficient INAR(1) model was proposed. Popović [20] proposed a bivariate INAR(1) model with random coefficients, but it ignores the cross-correlation of the process {X t } and it assumes random coefficients follow a specified binomial distribution. Cui and Zhu [21] proposed a bivariate integervalued GARCH with flexible cross-correlation, which is other direction. See [22] for the bivariate Poisson integer-valued GARCH model. Excess zeros is a common phenomenon in time series of counts, and it has been addressed by many authors, such as [23,24], but it is not well studied in the bivariate case. The motivation of this paper was to extend the Popović's model and propose a new bivariate random coefficient INAR(1) model, which provides a method to tackle the data with excess zeros, low counts and low over-dispersion.
This paper is organized as follows. We first recall the asymmetric bivariate Hermite distribution, and a bivariate binomial thinning operator with random coefficients, including some basic properties; then we propose the new model and discuss its stochastic properties in Section 2. Section 3 uses the conditional maximum likelihood (CML) to estimate unknown parameters and discusses the asymptotic properties of their CML estimators. A simulation and two examples are given in Sections 4 and 5, respectively. Conclusions are drawn in Section 6. All proofs of theorems and propositions, including some auxiliary results are given in Appendix A.
The index of dispersion I X = Var(X) E(X) = 1 + 2λ 2 ∈ (1, 2); thus, X and Y are slightly over-dispersed random variables. Note that the univariate Hermite distribution is Poisson zero-inflated by P(Z = 0) ≥ exp{−E(Z)}, ∀Z; see [7,26] for more discussion on the univariate Hermite distribution. Hence, the bivariate Hermite distribution provides a good strategy to model series with low counts, excess zeros and slight over-dispersion by property (ii). Applied to the bivariate case with X = (X 1 , X 2 ) , the thinning concept of [15] leads to the operation: ij • X j is a random univariate binomial thinning operator, ∀i, j = 1, 2, and the counting series involving in α ij . Then, we obtain the following properties of the random bivariate binomial thinning operator.
Pedeli and Karlis [16] restricted α ii • X i ; thus, its marginal models behaved like the univariate thinning operation. However, an important limitation is that crosscorrelation in the counts is ignored; i.e., Pedeli and Karlis [17] proposed the full bivariate INAR(1) model with bivariate Poisson innovations which consider the cross-correlation in the counts, but it keeps the innovations followed a bivariate Poisson distribution. In addition, models in Pedeli and Karlis [16,17] restricted A t to be a constant matrix such that the effects of various environmental factors are ignored. Popović [20] proposed a bivariate INAR(1) model which was comprised by random coefficients, but the cross-correlation of {X t , t ∈ Z} was ignored. Therefore, a novel extension of he above BINAR(1) model is proposed which allows autoregressive coefficients to vary over time, and their innovation follows the asymmetric bivariate Hermite distribution. This model will be called the BHRCINAR(1) model, and its definition is given as follows.
The ith equation of model (2) is presented by Notice that the model given in (3) is similar to the HINAR(1) model given in [7]; the main difference is that X it involves two paralleled survivors X 1,t−1 and X 2,t−1 . It is known that the BHRCINAR(1) process {X t , t ∈ Z} has two parts: the first part consists of the survivors A t • X t−1 of the system at the preceding time t − 1, and the survival rate A t varies over time; the other part is comprised by innovation vectors t which captures such counts with many zeros, low counts and low over-dispersion. See Liu et al. [11] for the p-order random coefficient INAR model.

Remark 1.
(i) If A t is a diagonal and constant matrix and λ 2 = λ 4 = 0, model (2) is reduced to the bivariate INAR(1) model with bivariate Poisson innovations; see [16] for detail. (ii) If A t is a non-diagonal and constant matrix and λ 2 = λ 4 = 0, model (2) is reduced to the full bivariate INAR(1) model with bivariate Poisson innovations; see [17] for detail.
(iii) If A t is a diagonal matrix and P(α (2) is reduced to that given in [20].
Denote ς = (ς 1 , ς 2 ) and ϑ = (ϑ 1 , ϑ 2 ) . The conditional probability distribution of BHRCINAR(1) process can be expressed as follows: where To establish the stationary properties of the new model, we make the following two assumptions.
ij ) 4 ≤κ, where κ andκ are finite positive constants; (ii) For any i = j, A t−i is independent with A t−j ; (iii) Let E(A t ) = A, and the largest eigenvalue of non-negative matrix A is less than 1.
Hence, the bivariate marginal distribution in terms of the bivariate innovation vectors of model (2) takes the following form: where α,ᾱ, λ andλ are finite positive constants, and θ 0 is an interior point in Θ. 5 and σ 12 = σ 21 = λ 5 . This also implies that they are all finite. In addition, γ is also finite under the assumption κ ≤ E(α Under the above two assumptions, BHRCINAR(1) admits a strictly stationary and an ergodic solution, which are shown in the following theorem. Theorem 1. If Assumptions 1 and 2 hold, {X t } defined by equation (2) is a strictly stationary and an ergodic Markov chain on N 0 .
Moments of (2) are obtained and given as follows. (2) and Assumptions 1 and 2 hold; then , and I is an identity matrix.
According to (1) and (2), elements of vector of expectations are In addition, R(0) consists of where A 0 is an identity matrix. Thus, the conditional expectation converges to unconditional expectation as k → ∞ by assuming that the largest eigenvalue of non-negative matrix A is less than 1.
Then the CML estimatorθ ml is obtained by maximizing (6) where Let P θ 0 be the probability measure under the true parameter θ 0 , and unless otherwise indicated, E(·) k is taken under θ 0 , ∀k ≥ 1. To analyze the sample properties ofθ ml , the following additional assumptions are made.
Lemmas A1 and A2 in the Appendix A establish the identification of model (2). Firstorder and second-order derivatives can be obtained by Lemma A3 in the Appendix A. (2) and Assumptions 1-4 hold; then, as T → ∞, (1). There exists an unique estimatorθ ml such thatθ ml → θ 0 in probability; (2).

Simulation Study
In this section, we consider the finite sample property of the CML estimators by assuming that the distributions of α ij follow a specific parametric model. Without loss of generality, we assume α (t) ij ∼ U(0, 2α ij ), and let the theoretical mean be the initial value of the process for given parameters. When For the simulated sample, performances of coefficient of variation (CV), standard deviation (SD), mean absolute deviation error (MADE) and bias are given in Tables 1 and 2. To clearly present these results, we also give the graphical inspections for the non-diagonal BHRCINAR(1) model and the parameter combination (A1) for the diagonal model, which are given in Figures 2 and 3; others are similar.   From Tables 1 and 2, we can make the following observations. CV, SD, MADE and bias decreased as T increased for non-diagonal and diagonal models, where α ij , i, j = 1, 2 decreased faster than λ k , k = 1, 2, · · · , 5 and α ij was much closer to the original value for the same sample size, especially that for non-diagonal model. In general, the estimator of the parameter λ k tended to be positively biased in finite samples. In all cases, the parameter λ k was over-estimated compared with α ij , which is not surprising, since the value of α ij was the average value of the regression coefficient α t ij , t = 1, 2, · · · and the regression coefficient varied over time. Regarding parameter λ k , the results indicate that the bias seems to be negligible when the sample size is large enough because the SD gradually decreases to zero, which is a very obvious phenomenon.  Figures 2 and 3 show that there existed some slight fluctuations for CV, SD, bias and MADE for smaller sample sizes, but all of them had downward trends. Based on above discussions, these results illustrate the consistency of the estimators for larger sample size because the CV, SD, bias and MADE of the parameters rapidly decreased to zero as T increased, which was expected. Figures 4 and 5 give the boxplots of the estimates for the parameter combination (A1); others are similar.    5 show that the estimates were consistent for both diagonal and nondiagonal models, because the range of bias gradually decreased as the sample size increased, which is much more obvious for the non-diagonal model.

Real Data Examples
To show the flexibility of the proposed model, we use an artificial example and a real example to illustrate our model.

Artificial Data Example
In this section, we first consider a simulated time series {X t } = {(X 1t , X 2t ) , t = 1, 2, · · · , T} of length T = 200 generated from (2)  To illustrate the data structure, we give the histograms of the data series in Figure 6, in which the range of {X 1t } is between 0 and 9 and that of {X 2t } is between 0 and 11; there are 19 (15) zeros in series {X 1t } ({X 2t }). The path data series and the CCF of two data series are given in Figure 7, and the sample autocorrelation function (ACF) and the sample partial autocorrelation function (PACF) of both {X 1t } and {X 2t } are provided in Figure 8. To fit the data series, we used the proposed BHRCINAR(1) model, full BINAR(1)-BP model and full BINAR(1)-NB model; see [16,17] for details. The CML estimates and approximated standard errors (se) of parameters, including the fitted values of loglikelihood (Log-lik), AIC and root mean square errors (RMSE), are summarized in Table 3, ) and se is computed by Theorem 2.   Table 3 shows that the BHRCINAR(1) model produced the smallest AIC and RMSE. Hence, the BHRCINAR(1) model is more suitable for the analyzed dataset. The resulting Pearson residual of the BHRCINAR(1) model is defined by The resulting mean and variance of the Pearson residuals of series {X 1t } ({X 2t }) were −0.0123 and 1.0012 (−0.0067 and 1.0029), respectively. The residual analysis in Figure 9 adequately illustrates that the BHRCINAR(1) model did rather well.

Crime Data of Pittsburgh
In this section, we consider the counts of forgery and fraud on police car beat 12 of Pittsburgh from 1990 to 2001, which covers T = 144 observations. The data is download at 23 September 2016 from the website http://www.forecastingprinciples.com/index.php/ crimedata.
Empirical mean values (variances) of forgery and fraud are 1.6111 (2.7428) and 4.000 (6.4615), which shows that both of the forgery and fraud counts are slightly over-dispersed. To account for the structure of the considered data, we give their histograms in Figure 10; their path and the CCF in Figure 11; and their ACF and PACF in Figure 12, respectively. The range of series forgery is between 0 and 8 and that of series fraud is between 0 and 17. There are 45 (7) zeros in series forgery (fraud) and the two series are asymmetric but with significant dependencies. The CML estimates and se of parameters, including the fitted values of log-likelihood (Log-lik), AIC and RMSE, are summarized in Table 4.  Table 4 shows that the BHRCINAR(1) model produced the smallest AIC and RMSE. Hence, the BHRCINAR(1) model is more suitable for the analyzed dataset. According to (7), the mean and variance of Pearson residuals of series forgery (fraud) are −0.0726 and 1.0552 (−0.0021 and 1.0825), respectively. The residual analysis is given in Figure 13, which shows that this BHRCINAR(1) model can adequately describe the considered datasets.

Concluding Remarks
This paper considered a more flexible BHRCINAR(1) model for bivariate integervalued time series data. The proposed model extends the random coefficient INAR(1) model to the 2-dimensional case, and it can be regarded as a generalization of the BINAR(1) model Pedeli and Karlis [17] or the model proposed by [20], but with more flexibility for dealing with low counts, excess zeros and slightly over-dispersed data. We discussed the essential properties of the model, the CML estimators and their asymptotic behavior. A simulation study was conducted to examine the finite sample performances of the estimators. A real data application was provided to illustrate our model to be effective relative to existing models.
Beyond this adaption to modeling bivariate time series with bivariate Hermite innovations, there are many further possibilities that deserve to be considered for future research. Local influence analysis will be changeling for the BINAR models in identifying influential observations; see [27,28]. To make the BINAR-type models more flexible in real applications, it may be interesting if the autoregressive coefficient is updated by using past information of the observed process; see [29]. In addition, including explanatory covariates in the model may be an available way to analyze bivariate time series with a certain trend, and will be considered in a future project.

Conflicts of Interest:
The authors declare that there is no conflict of interests regarding the publication of this paper.

Appendix A. Proof of Theorems and Propositions
Proposition A1. The proof of items (1)-(4) and (6) see the Proposition 2.1 in [11]. Here, we focus on the proof of item (5) from the perspective of each component separately, i.e., for k, s = 1, 2, we consider sj • X j are conditionally independent for given X i and X j , and However, if k = s and i = j, we have Denote δ ks = 1 if k = s and δ ks = 0 else. Therefore, we have Theorem A1. The proof comprises two parts as follows: (a). As {X t , t ∈ Z} defined by (2) is a Markov process of order one, thus ∀t 1 , · · · , t n ∈ Z, P(X t 1 +τ = x 1 , · · · , X t n +τ = x n ) = n ∏ i=2 P( X t i +τ = x i |X t i−1 +τ = x i−1 )P(X t 1 +τ = x 1 ), ∀τ ∈ Z.
Denote P n ςϑ be the n-state transition probability from state ς to state ϑ. Then for given X t−1 , conditional probability function of X t is derived by then P 1 τυ > 0 for all τ, υ ∈ N 2 0 . Hence, {X t } is irreducible. In addition that k step conditional probability distribution P k 0,0 is obtained by (5), i.e., where the first multiplier can be obtained by the similar method of (4) and the second part is obtained by (1). Thus, P k 0,0 > 0 with probability one. Hence {X t } is aperiodic. Thus, {X t } is an ergodic Markov chain. The proof is complete.
Theorem A2. For proving (1) and (2), we need to illustrate that the conditions of Theorems 4.1.2 and 4.1.3 in [30] hold for the BHRCINAR(1) model.
The proof of Lemma A3 is obtained directly by the distributions of α (t) • X and Y.