Goodness–of–Fit Tests for Bivariate Time Series of Counts

: This article considers goodness-of-ﬁt tests for bivariate INAR and bivariate Poisson autoregression models. The test statistics are based on an L2-type distance between two estimators of the probability generating function of the observations: one being entirely nonparametric and the second one being semiparametric computed under the corresponding null hypothesis. The asymptotic distribution of the proposed tests statistics both under the null hypotheses as well as under alternatives is derived and consistency is proved. The case of testing bivariate generalized Poisson autoregression and extension of the methods to dimension higher than two are also discussed. The ﬁnite-sample performance of a parametric bootstrap version of the tests is illustrated via a series of Monte Carlo experiments. The article concludes with applications on real data sets and discussion.


Introduction
Time series of counts enjoy numerous applications in such diverse fields as business, economics, engineering, and epidemiology, and corresponding inferential procedures have been intensively studied in recent years. The reader is referred to the earlier work of McKenzie (2003), as well as to the much updated full-book treatment of Davis et al. (2015) and Weiss (2018a) for an overview of the subject. Among the most popular models for counttime series are the integer autoregression (INAR) model and the Poisson autoregressive (PAR) model, also termed Poisson INARCH model. The INAR and PAR models, originally conceived for univariate counts, have been extended in order to accommodate bivariate, and more generally multivariate counts; see Latour (1997) and Liu (2012), respectively. One of the basic parametric elements of these generalizations involves moving from a univariate to a bivariate distribution, and eventually to a multivariate one.
In this connection, when confronted with a vectorial data-set of time series of counts, and since one is presented with a choice of several possible candidate models, it is extremely important to check model-adequacy via some goodness-of-fit (GOF) test. Otherwise a poorly fitted model might yield misleading inference. Various such procedures have been proposed in the literature for univariate series of counts. A brief overview of these approaches is provided in Section 2.
Inspired by the univariate test criteria, we propose GOF tests for bivariate INAR and PAR models. Our tests mainly target the (by far most popular) Poisson specification of these models, but also involve other structural aspects, such as order and functional specification of the underlying model. The suggested test statistic is constructed as a contrast between an estimator of the probability generating function (PGF) which is entirely model-free and a semiparametric counterpart that "respects" the model under test. Hence each new test may be viewed as a bivariate extension of the earlier PGF-based procedure suggested by Meintanis and Karlis (2014), Hudecová et al. (2015) and Schweer (2016).
The remainder of this work unfolds as follows. Section 2 provides a review of the models considered and of goodness-of-fit testing for univariate time series of counts. In Sections 3 and 4 we introduce the bivariate models and the new test criteria. In Section 5 the asymptotic properties of the methods are investigated and resampling versions of the tests are proposed. An extension of one the tests to a more complicated model is discussed in Section 6. The finite-sample properties of the new criteria are studied by means of Monte Carlo methods in Section 7. Applications to real data sets are included in Section 8 while our final thoughts regarding the news methods and potential extensions thereof are summarized in Section 9. Proofs of asymptotics are provided in Appendix A.

Goodness-of-Fit Methods for Univariate Time Series of Counts
Many of the models for time series of low counts assume that conditionally on the past, the distribution of the current variable is fully specified by a family of laws indexed by a certain parameter. An important example is a class of integer autoregressive conditionally heteroscedastic (INGARCH) models, covering PAR models as special cases. Different models are based on the thinning operator see Steutel and van Harn (1979), with the INAR model being the most popular one. In the following, a count distribution refers to a discrete distribution on N 0 = N ∪ {0} and a count variable is a random variable with such distribution.
Let {Y t } = {Y t , t ∈ Z} be a univariate time series and let F t = σ{Y s , s ≤ t} is the information known up to time t. An integer GARCH model, abbreviated as nonlinear INGARCH(p, q), is defined as where F(λ) is some count distribution with mean λ, and the regression function r : [0, ∞) p+q × Θ belongs to some specific parametric family of functions R = {r(·, η) : η ∈ Θ ⊂ R k } for some k ≥ 1. If F is Poisson and R is a family of linear functions then this model is referred to as Poisson (linear) INGARCH, see Ferland et al. (2006). Some authors also use the name Poisson autoregression for cases where F is Poisson and r could be nonlinear, see, e.g., Fokianos andTjøstheim (2009), Fokianos andTjøstheim (2012), Fokianos (2012), for the case p = q = 1. If q = 0 then the model has a purely autoregressive structure and is abbreviated as INARCH(p). In the following the acronym PAR(p) is used for INARCH(p) with F Poisson and r linear. It has been shown that if p = q = 1 and r is linear, i.e., r(x, η) = η 1 + η 2 x 1 + η 3 x 2 , and if η i ≥ 0 and η 2 + η 3 < 1 and F belongs to the single-parameter exponential family of distributions (that includes the Poisson distribution as a special case), then there exists a strictly stationary and ergodic solution of (1), see Davis and Liu (2016). For an overview of conditions for strict stationarity and ergodicity for other choices of F see e.g., (Ahmad and Francq 2016, Section 3). A different class of models consists of integer autoregressive moving average (IN-ARMA) models. These models arise from the same structure as the classical ARMA time series models, but the multiplication sign is replaced by the Steutel and van Harn's thinning operator •. If Y is a count random variable and α ∈ (0, 1) then where {U i } are iid Bernoulli variables with α = P(U i = 1) = 1 − P(U i = 0), which are independent of Y, with the convention that an empty sum (the case Y = 0) equals 0. Let {ε t } be a sequence of iid count random variables with distribution G with a finite variance, and let α i , β j ∈ (0, 1) for i = 1, . . . , p, j = 1, . . . , q. The INARMA(p, q) model is defined as where the Bernoulli variables involved in all the thinning operations are independent and independent of {ε t }. If p = 1, q = 0 then the model (3) (1985) and Al-Osh and Alzaid (1987), taking the following form For α ∈ (0, 1) there exists a strictly stationary solution of (4) and the law of the innovations {ε t } uniquely determines the marginal distribution of Y t as well as the conditional distribution Y t |F t−1 . In particular, if {ε t } are iid Poisson then Y t has Poisson distribution as well, and this special case has been considered in many applications. Model (3) with p > 1, q = 0, was introduced and studied by Du and Li (1991) and since then, many authors have considered variants of model (3), various extensions and modifications, see Scotto et al. (2015) for a comprehensive review.
Various GOF tests have been suggested in the literature for the aforementioned two classes of models. Neumann (2011) and Fokianos and Neumann (2013) considered GOF tests for the regression function r in a Poisson INGARCH(1,1); see model (1) with p = q = 1 and Poisson F. A slightly less formal assessment of model adequacy is explored in Aleksandrov and Weiss (2020) for a PAR(1) model as well as for a Poisson INAR(1). GOF tests based on the sample index of dispersion were considered in Schweer and Weiss (2014) and Weiss et al. (2019) for a Poisson INAR(1), and by Weiss and Schweer (2015) for a PAR(1). A test based on the classical Pearson's χ 2 statistic for the marginal distribution specified by the null hypothesis is proposed by Weiss (2018b)

for a Poisson INAR(p).
A different approach to GOF testing for time series of counts is based on the probability generating function (PGF). Recall that if Y is a nonnegative discrete random variable then its PGF is defined as g Y (u) = Eu Y for all u for which the expectation is finite, which is always the case for u ∈ [0, 1]. The distribution of Y can be easily obtained from g (k) Y (0) and thus the PGF uniquely determines the distribution of Y. If Y = (Y 1 , Y 2 ) is a count random vector then its PGF is defined as g Y (u) = E u Y 1 1 u Y 2 2 and exists for all u = (u 1 , u 2 ) ∈ [0, 1] 2 .
Test criteria involving the estimates of the PGF of (Y t , Y t−1 ) have proved as useful not only for assessing the specification of F in (1) or G in (4), but also for determination the model itself, see Meintanis and Karlis (2014) for INAR(1) and Schweer (2016) for a more general setup involving INAR(1) and PAR(1) as special cases. In both mentioned articles the considered test statistic is an integrated L 2 distance between the empirical PGF of (Y t , Y t−1 ) and the parametric estimate of g (Y t ,Y t−1 ) under the null hypothesis. Hudecová et al. (2015) also consider tests based on PGF, but instead of the estimator g (Y t ,Y t−1 ) , their criteria are constructed as an integrated L 2 distance between the nonparametric estimate g n (u) = n −1 ∑ n t=1 u Y t of the marginal PGF g Y and a semiparametric estimate of g Y , which is model-specific. In the following sections we extend this approach from the univariate to the bivariate setup.
The models based on this kind of bivariate Poisson distribution seem to be the most popular in the literature despite some of their limitations. Please note that a different construction of a bivariate distribution with Poisson marginals is considered in Lakshminarayana et al. (1999).

Bivariate INAR Model
Suppose that we have a time series {Y t } composed by a pair of counts, i.e., Y t := (Y t,1 , Y t,2 ) . Following Latour (1997) we say that {Y t } follows a bivariate INAR model of the first order, in the following referred to as BINAR, if where ε t = (ε t,1 , ε t,2 ) are iid bivariate count random vectors with finite covariance matrix, A denotes a 2 × 2 matrix with elements a ij , i, j = 1, 2, and the operator • is a multivariate generalization of (2). Namely the operator • from (6) acts on a count random vector Y of dimension two by means of the equation where the univariate thinning operators was defined in (2). Latour (1997) showed that if the spectral radius (the absolute value of the largest eigenvalue) of A is smaller than 1 and ε t has a finite covariance matrix then there exists a strictly stationary and ergodic process satisfying (6). Furthermore, the conditional least squares (CLS) estimate is shown to be consistent and asymptotically normal. Maximum likelihood estimation is considered in Pedeli and Karlis (2011) and Pedeli and Karlis (2013c), with a special emphasis on the Poisson specification for ε t . For estimation of BINAR with negative Binomial innovations the reader is referred to Mamode Khan et al. (2019) and references therein.

Bivariate PAR Model
Let F t = σ{Y s , s ≤ t}. We say that {Y t } follows a bivariate Poisson autoregression model of the first order, referred to as BPAR, model if, where α 1 , α 2 ≥ 0 and the matrix B has non-negative entries and is of full-rank. This model is sometimes referred to as a bivariate Poisson INARCH(1) model, e.g., Lee et al. (2018). Liu (2012) proved that if 2 1−1/p B p < 1 for some 1 ≤ p ≤ ∞ then {Y t , λ t } is strictly stationary and ergodic, with B p = sup x =0, x∈C 2 Bx p / x p . Strong consistency of the conditional maximum likelihood estimator (CMLE) of parameters was proved by Andreassen (2013) under the extra assumption φ < min{α 1 , α 2 }, while Lee et al. (2018) further showed that the CMLE has an asymptotic normal distribution.

Goodness-of-Fit Tests
Assume that we have observations Y 1 , . . . , Y n , which come from a stationary bivariate time series of counts and we would like to perform a GOF test to a particular model for this data, with the model being fixed apart from finite-dimensional parameters. Let υ(·, ·) be a non-negative weight function which will be further specified below. We propose as test statistic the weighted distance measure S n,υ = n 1 0 1 0 [ g n (u 1 , u 2 ) − g n,0 (u 1 , u 2 )] 2 υ(u 1 , u 2 )du 1 du 2 , where g n (u 1 , u 2 ) is a non-parametric estimate of g Y (u 1 , u 2 ) based on Y 1 , . . . , Y n , given by g n (u 1 , u 2 ) = n −1 ∑ n t=1 u Y t,1 1 u Y t,2 2 and where g n,0 (u 1 , u 2 ) is a semi-parametric estimate of the PGF g Y (u 1 , u 2 ) of Y, which is constructed specifically under the model being tested. The null hypothesis is rejected for large values of S n,υ .
By analogy to most, if not all, of the previously published work we consider u ∈ [0, 1] 2 as our working interval, despite the fact that uniqueness of PGFs and corresponding consistency of the test might require working on a region of u containing the origin. Nevertheless we further investigate this aspect of our tests by simulations. See Esquível (2008) for a recent account of uniqueness of PGFs.
Although the idea of the proposed test statistics is analogous as for the univariate models mentioned in Section 2, the extension to a bivariate case is not straightforward, neither in terms of asymptotics nor on computational grounds as the multivariate nature of the data brings in some elevated technical difficulties. We state the necessary assumptions and provide a formal proof for the asymptotics of the suggested test statistic. In addition, extension of our GOF tests to more complex models is briefly discussed. In fact, and although here we only treat the bivariate case, extension to higher dimension also comes completely natural with our methods.

Tests for the BINAR Model
Let G = {g ε (·; θ) : θ ∈ Θ} denote a parametric family of PGFs indexed by a parameter θ ∈ Θ, with Θ being a compact subset of R k , k ∈ N. We would like to test the null hypothesis Assume that {Y t } is strictly stationary with PGF g Y . Then it follows from the properties of BINAR that and a := vec(A) = (a 11 , a 21 , a 12 , a 22 ) denotes the vectorized version of A.
Here we used the fact that for a Binomial distribution with parameters (m, p) the PGF is (1 + p(u − 1)) m , so that if additionally a and θ are suitable estimators of the unknown parameters, then (9) yields a semi-parametric estimate of g Y under model (6) as where w ij = w ij (u, a). We will consider a GOF test for the null hypothesis H 0 with G being the family of bivariate Poisson distributions with unspecified parameters. Interest in testing H 0 lies in the fact that while this distribution is by far the most popular in the univariate as well as in the multivariate context, alternative specifications have also been employed such as models with innovations following a bivariate negative Binomial distribution; see for instance Pedeli and Karlis (2013a), Popović et al. (2018), and Kim and Lee (2017). Also BPAR models (see next section) may be considered to be alternative models of interest.

Tests for the BPAR Model
The corresponding null hypothesis for the BPAR model is formulated as If {Y t } is strictly stationary, then it follows from the properties of the model that the PGF of Y t is given by where Thus, if we have an appropriate estimator θ of the parameter θ : , then in view of (11) we may define a semi-parametric estimate of g Y as where Deviations from the null hypothesis H 0 include non-Poisson conditionals (see Heinen and Rengifo (2007)) as well as model violations towards more general specifications.
Remark 1. We should remark that the tests proposed in this article are not for BINAR or BPAR models per se. Specifically the test criterion for BINAR is for bivariate counts for which the PGF satisfies Equation (9), and analogously the test criterion for BPAR is for bivariate counts for which the PGF satisfies Equation (12). In the context of time series of counts however, BINAR and BPAR models are framed by equations (9) and (12), respectively, to such an extend that for all practical purposes these equations may be regarded as characterizing equations for the models themselves. Thus, our tests could be viewed as being on an equal footing with universally consistent methods such as those suggested by Fokianos and Neumann (2013), Jiménez-Gamero et al. (2020), and Leucht et al. (2015).

Computations
From (8) and by means of (10) and (14), we have after straightforward algebra where , for the BINAR model, and for the BPAR model.
The integral figuring in Equation (16) with weight function υ(u 1 , u 2 ) = u γ 1 u γ 2 , γ ≥ 0, may be expressed in more elementary terms. However the final fully reduced explicit forms, which are available from the authors upon request, are not convenient from the computational point of view. Thus, the tests were implemented by numerical evaluation of the corresponding integrals.

Asymptotics
In this section, we study the asymptotic distribution of the test statistic S n,υ under the null hypothesis of a BINAR model as well as the corresponding limit null distribution under a BPAR model. Results for the behavior under fixed alternatives are also provided and show that in both cases the test is consistent against certain fixed alternatives. Finally resampling versions are proposed that circumvent the problem of unknown components in the aforementioned limit null distributions. Please note that under the standing assumptions these results are valid for general innovation distribution and weight function as well as for arbitrary parameter-estimates.

Asymptotics of the Test Statistic: BINAR Case
Assume the test statistic S n,υ (8) is constructed for testing H 0 with specified G = {g ε (·; θ) : θ ∈ Θ}, for a compact Θ, i.e., the semi-parametric estimate g n,0 is constructed by (10). Consider the following assumptions: (6), where the spectral radius of A is smaller than 1.
Let G correspond to distributions with finite second moment. Furthermore, let the second partial derivatives of g ε with respect to θ exist and be continuous in θ and suppose that sup ) form a strictly stationary and ergodic sequence of martingale differences with finite variance. Here, l t,1 is of the same dimension as θ and l t,2 has the dimension of vec(A), i.e., four.
Under (A.2) and (A.3), {Y t } is strictly stationary and ergodic with finite second order moments, see Franke and Rao (1995). Regarding possible estimators ζ in (A.4), Franke and Rao (1995) considered the CMLE and proved its asymptotic normality under a set of regularity conditions. These conditions involve the finiteness of E ε t 3 and some further assumptions on the distribution of ε t . Theorem 1. Under (A.1)-(A.4) and as n → ∞, the limit distribution of S n,υ under the null hypothesis H 0 is the same as the distribution of where {Q(u, ζ), u ∈ [0, 1] 2 } is a Gaussian random field with zero mean and covariance structure where l q+1,1 and l q+1,2 are from assumption (A.4) and 1 (u, a)).
The proof of the assertion is postponed to the Appendix A. The asymptotic distribution of the test statistic S n,υ depends on several unknown quantities. One possibility is to generate the Gaussian random field figuring in Theorem 1 with the theoretical quantities replaced by some consistent estimators and then compute the critical values. Another possibility is the parametric bootstrap, which is quite natural here. The justification of the bootstrap approximation under the null hypothesis proceeds along similar lines as the proof of Theorem 1, conducted conditionally on the observed data Y 1 , . . . , Y n and with the help of assertions for triangular arrays and sums of martingale difference arrays.
Write S n := S n (Y 1 , . . . , Y n ; ζ) for the test statistic based on the original observations Y 1 , . . . , Y n , and the resulting parameter estimate ζ. (Here for simplicity we suppress the dependence of S n,υ on the weight function υ(·, ·)).
Next we shortly discuss the limit behavior of the test statistic under alternatives of type g ε / ∈ G. We assume that model (6) holds true but g ε in the null hypothesis H 0 does not belong to G. Moreover suppose that the estimators ( θ , a ) satisfy for some θ A ∈ Θ and some a A such that the largest eigenvalue of the respective matrix A A is in absolute value smaller then 1.
The proof is omitted since it suffices to follow the line of proof of Theorem 1 and use stationarity and ergodicity of {Y t }.
The right-hand side of (18) is strictly positive unless the true PGF g ε (·) coincides with the PGF g ε (·; θ a ) from the null hypothesis H 0 . This fact and the uniqueness of the PGF implies the consistency of the test which rejects the null hypothesis H 0 for large values of the test statistic S n,v under such fixed alternatives. The test is also consistent for other types of fixed alternatives, e.g., against model violation. This feature of the test is further illustrated by Monte Carlo simulations in Section 7.
Please note that the test even has (non-negligible) power for some local alternatives, i.e., when the difference g ε (u) − g ε (u; θ a ) tends to 0 not too fast as n → ∞ and g ε (u) depends on n. However, a rigorous proof of this result is quite technical, and therefore it is not discussed here any further.

Asymptotics of the Test Statistic: BPAR Case
This section considers the problem of testing the BPAR model. Recall that θ = (α 1 , α 2 , b 11 , . . . , b 22 , φ) stands for the vector of the parameters of the model (7), and suppose that: The series {Y t } is strictly stationary solution of (7) with parameters θ ∈ Θ such that φ < min{α 1 , α 2 } and Θ is compact. (B. 2) The estimator θ of the parameter θ is such that where l t = l(Y t , Y t−1 , . . . , Y t−q , θ) = (l t,1 , . . . , l t,7 ) form a strictly stationary and ergodic sequence of martingale differences with finite variances.
Theorem 3. Under (A.1),(B.1)-(B.2) the limit distribution of S n,υ as n → ∞ is the same as the distribution of where {Q(u, θ), u ∈ [0, 1] 2 } is a Gaussian random field with zero mean and covariance structure with l t defined in assumption (B.2), c(u, θ) defined in (15), Similarly as for the BINAR model, the parametric bootstrap can be carried out in a very natural way and is recommended for practical use. The justification of the bootstrap approximation would again proceed along similar lines as the proof of Theorem 3.
The following theorem is analogous to Theorem 2 and describes the behavior of the test statistic under some alternatives.
The proof is omitted since it suffices to follow the line of the proof of Theorem 3 and to use stationarity and ergodicity of {Y t }. Possible comments for this theorem are quite parallel to Theorem 1 and therefore are omitted.

Extension to Generalized BPAR Model
Extension of the proposed test to arbitrary higher order or dimension will be discussed in Section 9. In this section we aim at a new PGF-based GOF test for a relatively mild, yet very important, generalization of the BPAR model. Specifically consider the following model for {Y t }: where α = (α 1 , α 2 ) with α 1 , α 2 ≥ 0, A, B are matrices of non-negative entries and B is of full rank. (Please note that if A is replaced by the zero matrix, then model (19) reduces to the BPAR model defined in (7)). Model (19) was studied, e.g., in Andreassen (2013), Liu (2012), and Lee et al. (2018), with the acronym INGARCH(1,1). If A p + 2 1−1/p B p < 1 for some 1 ≤ p ≤ ∞, then {Y t , λ t } is strictly stationary and ergodic, see Liu (2012), and if φ < min{a 1 , a 2 }, where (a 1 , a 2 ) = (I − A) −1 α, then the CMLE is consistent and asymptotically normal, see Andreassen (2013), Lee et al. (2018). Assume that {Y t } follows a stationary model (19). Following analogous arguments as in Section 4.2 we have that the PGF of Y under this model is given by, with the joint PGF of the vector (Y , λ ) defined by It is straightforward to device a test for the model (19) by using Equation (20) and proceeding analogously as with Equation (8). However the asymptotics of such a test as well as its actual implementation require a separate investigation. In this connection preliminary Monte Carlo results showed some promise but there were also problems, and therefore we decide not to pursue this extension any further here.

Simulations
The finite sample behavior of the suggested bootstrap test is investigated in the following simulation study. We consider the null hypotheses H 0 of Poisson BINAR and H 0 of BPAR model and investigate the size of the test under the null hypothesis and the power for various alternatives.
The unknown parameters of the BINAR model are estimated using the CLS method, and φ is estimated using the moment method, see Pedeli and Karlis (2013b). The parameters of BPAR model are estimated by the CMLE method. For simplicity the weight function is set to υ(·, ·) ≡ 1, i.e., we take γ = 0 in υ(u 1 , u 2 ) = u γ 1 u γ 2 . The simulations were conducted in the R-computing environment R Core Team (2019) and by employing the warp-speed bootstrap of Giacomini et al. (2013) for M = 1000 repetitions. When using this method, B=1 bootstrap samples are generated for each Monte Carlo repetition and the resulting p-value is computed from the overall bootstrap sample of M replicas.
Our results are for sample size n = 100, 200 and n = 500 at level of significance α = 0.01, 0.05 and 0.1 for both H 0 and H 0 . For the BINAR model, a reasonable alternative might be BPAR model or model (6) with innovations ε t following a distribution other than the bivariate Poisson. Such a popular alternative is a bivariate distribution with negative Binomial marginals. There are several possibilities as to how to generate such variables. Here we consider the bivariate negative Binomial distribution BNB(λ 1 , λ 2 , r) of Dunn (1967), whereby (U 1 , U 2 ) ∼ BNB(λ 1 , λ 2 , r), with U i , i = 1, 2, being marginally negative Binomial with mean λ i and variance λ i (1 + λ i /r) and cov(U 1 , U 2 ) = λ 1 λ 2 /r. This bivariate negative Binomial distribution is also used in an alternative considered to a BPAR model. Namely a model of form (7) with conditional distribution BNB(λ t1 , λ t2 , r) instead of bivariate Poisson is considered, with the dependence of λ ti on F t−1 the same as specified in Equation (7). We will refer to this model as the negative Binomial BINARCH. Finally, we also explored the power of the test when testing H 0 and the data follow BINAR model and vice versa. The results for the size of the two tests are summarized in Table 1 and for the BINAR model were obtained by using either A = A 1 or A = A 2 , where and ε t following a BP(6, 4, 3) distribution. For BPAR, we set α = (6, 4) , φ = 3, and again B equal either to A 1 or A 2 . As it may be seen from Table 1, the tests are conservative if the matrix A 1 is used in the data generating process. On the other hand, the size slightly exceeds the nominal size α if A 2 is used in the data generating process Please note that a similar phenomenon was already observed for the univariate series in Hudecová et al. (2015): The test may be slightly conservative for certain parametric settings of the model at hand, while it can be slightly anticonservative for other settings. Recall also that the unknown parameters are estimated by a CLS method for BINAR models whereas CMLE is used for BPAR models, which might also partly explain the different behaviour observed in Table 1.
In either case however, the observed size approaches the prescribed significance level as the sample size increases. In this connection, the rather poor small sample behavior observed in the left part of Table 1 for the BINAR model parameterised by the matrix A 1 , can be improved by considering a modified test statistic for which integration in Equation (8) is carried over [−1, 1] 2 rather than over [0, 1] 2 ; see the corresponding discussion at the end of Section 4. Specifically this approach (performed on the same simulated data) and for nominal size α = 0.05 leads to empirical size equal to 0.033, 0.039 and 0.052, for n = 100, 200, and 500, respectively. For all other settings however, the results for the modified test statistic (available from the authors upon a request) are very similar and hence we only present here results for the original test statistic S n,υ defined by Equation (8). The power of the test for H 0 with data from a BINAR model with negative Binomial innovations is provided in Table 2. The considered bivariate negative Binomial distribution is BNB(6, 4, r) for r = 5, 10. Please note that for larger r, the BNB distribution is closer to a bivariate Poisson distribution, and this fact is also reflected in the power of the test, which is lower for r = 10. For example, a sample size n = 100 seems to be insufficient for distinguishing between Poisson and negative binomial INAR for this larger r. However, as the sample size n growths, the test performs very well even for r = 10 irrespective of the matrix A being used to simulate the model. A similar observation also holds for the results in Table 3 which correspond to the null hypothesis H 0 with data from a negative Binomial BINARCH. These results were obtained with the same values of α and B used for the null hypothesis, and r = 30, 50. On the other hand, Table 4 reports the power of the test for H 0 with data following a BPAR model and the power of the test for H 0 with data from a Poisson BINAR model. The results in Table 4 show that if the matrix B in the BPAR model equals A 1 then the test for H 0 fails to distinguish between the two models for sample sizes up to n = 500. In contrast, if the matrix B is equal to A 2 in BPAR and we test for Poisson BINAR then the power is satisfactory even for n = 200. The same observation holds for the opposite situation when one tests for BPAR and the data come from a Poisson BINAR.

Real-Data Application
Joint modelling of count observations finds important applications in the insurance industry, see for instance Partrat (1994), andVernic (1997). In this connection, it is a common practice for insurance companies to split the reported claims into several types. Typically, it is reasonable to expect that aggregate amounts (daily or monthly) of these different types of claims to be dependent, see, e.g., Shi et al. (2016). If the mean size of claims counts is high, then classical models for continuous variables could be applied. On the other hand, if the observed counts of claims are formed by small integers it is appropriate to treat the data as genuine counts, and, consequently, engage models and methods specifically tailored for count time series.
We illustrate this kind of application on real data sets on the monthly number of claims of short-term disability benefits made by injured workers to the British Columbia Workers Compensation Board. The time period is from January 1985 to December 1994. The original data set from Freeland (1998) contains five time series corresponding to five different injury categories: burn injuries, soft tissue injuries, cuts, dermatitis and dislocations. These five time series have been previously analyzed by several authors, and separate univariate models were fitted. It has been found that the Poisson INAR is appropriate for all five series, except for series #3 (cuts), for which this model is not appropriate, see e.g., Freeland (1998), Zhu and Joe (2006), Hudecová et al. (2015). In particular, Freeland and McCabe (2004) and Zhu and Joe (2006) suggest to model the time series of cuts claims using an extension of INAR(1) model with a seasonal component. On the other hand, Biswas and Song (2009) argue that the ACF and PACF do not indicate a significant seasonal effect.
We first consider a bivariate series of soft tissue injuries claims and dermatitis claims; see Figure 1 (left panel). Possible dependence among these two series may be due to the fact that a major accident causes several injuries, often of different types. Previous analyses in Freeland (1998) or Hudecová et al. (2015) reveal that the marginal INAR(1) models might be appropriate for these two series. Hence, we consider a Poisson BINAR model with a diagonal matrix A and estimate the parameters using the conditional least squares method (note that if a general, i.e., non-diagonal, matrix A is considered, then the estimates of the off-diagonal entries are very close to zero). Estimators of parameters were obtained by the methods used in the simulations of the previous section. The matrix A is estimated as A = diag(0.452, 0.293), and the parameters of the bivariate Poisson distribution of the innovations ε t are estimated as λ = (5.376, 0.202) , φ = 0.109. The resulting GOF test applied with υ ≡ 1 and B= 999 bootstrap samples leads to S n,υ = 8.618 · 10 −4 with p-value 0.327. Thus, one may conclude that the Poisson BINAR model with a diagonal matrix A seems to fit the data well.
On the other hand, if one considers a bivariate series of soft tissue injuries claims and cuts claims, see Figure 1 (right panel), and tests the GOF for a Poisson BINAR model with a diagonal matrix A (which corresponds to univariate INAR models for the two series), the null hypothesis is rejected with p-value 0.0370. A Poisson BINAR with a general matrix A is rejected as well (p-value 0.002). This is in accordance with the findings of previous papers mentioned above. However, the hypothesis of BPAR is not rejected with

Concluding Remarks
We suggest consistent goodness-of-fit tests for bivariate INAR and bivariate Poisson autoregressive models, estimated by least squares and maximum likelihood, respectively, with well defined limit null distributions. Since these limit distributions are complicated we suggest parametric bootstrap resampling which engages distributional assumptions featuring in the null hypothesis in order to actually carry out the tests. Monte Carlo results show that this bootstrap version of the new tests is generally reasonably sized and has good power against certain popular alternative configurations. Our real-data applications are in the direction of further scrutiny and better understanding of the mechanisms generating the data at hand.
At this point, we wish to discuss potential extension of the tests to models of higher order or dimension, such as the multivariate INAR type processes studied by Franke and Rao (1995), Latour (1997), Karlis (2011), Pedeli andKarlis (2013c), and Pedeli and Karlis (2013a), and corresponding extensions of PAR type models considered by Liu (2012), Andreassen (2013), Lee et al. (2018), Ciu and Zhu (2018) and Ciu et al. (2020). In this connection, and while it is conceptually straightforward to extend the tests for INAR or PAR models of higher order, see also Section 6 of Hudecová et al. (2015), we wish to emphasize the fact that a certain order and/or dimension increase brings about serious challenges in estimation as well as on the actual implementation of the methods due to the potentially great number of new parameters introduced.
Finally, before we close, we wish to point out that while this paper deals solely with stationary time series models, certain real world phenomena, including time series of counts, often exhibit deterministic trends or seasonal patterns, and thus GOF tests for non-stationary models would be of great practical interest. However, despite the fact that univariate INAR and PAR models with deterministic components have been extensively studied in the literature, multivariate extensions of such nonstationary models are rather scarce; we refer to Santos et al. (2019) for some recent works.

Data Availability Statement:
The data presented in this study are available in Freeland (1998) or on request from the corresponding author.

Acknowledgments:
The authors are grateful to the Associated Editor and the anonymous reviewers for their careful inspection of the paper and valuable comments, which led to an improved manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Proofs
Appendix A.1. Proof of Theorem 1.

Proof.
Denote as A ⊂ [0, 1] 4 such that if a ∈ A then A has the largest eigenvalue smaller than 1. The test statistic S n,υ is of the form a). In the following, we omit for a moment the argument u in all the considered functions. Namely we consider W t = W t (a) as a function of the argument a alone, i.e., ∇W t is a differential of W t with respect to a, ∇W t = ∇ a W t = (∂W/∂a 11 , ∂W/∂a 12 , ∂W/∂a 21 , ∂W/∂a 22 ) . Likewise, ∇g ε = ∇ θ g ε is a differential of g ε with respect to θ. Then a Taylor expansion gives: where with a * = a + c(a − a) and θ * = θ + d(θ − θ) for some c, d ∈ (0, 1) and where H g ε = ∂ 2 g ε /(∂θ k ∂θ l ) k,l is the k × k matrix of second partial derivatives of g ε with respect to θ. Similarly, H W t is the 4 × 4 matrix of second partial derivatives of W t with respect to a. For ∇W t we have and clearly ∂W t /∂a ij ≤ Y tj for all u ∈ [0, 1] 2 and a ∈ A. Furthermore, and thus, ∂ 2 W t /(∂a ij ∂a kl ) ≤ Y t−1,j Y t−1,l for all u ∈ [0, 1] 2 and all a ∈ A. Due to the finite second order moments of {Y t } and Assumption (A.2) we have where K i are constants and 1 k×4 is a matrix of 1's of dimension k × 4. Hence, it follows from the Cauchy-Schwartz inequality that [0,1] 2 1 √ n ∑ n t=2 R t 2 υ(u)du P → 0, as n → ∞, and thus the asymptotic distribution of S n,υ is the same as the asymptotic distribution of J 3n (θ, a, u) = −g ε (u, θ) √ n( a − a) 1 n n ∑ t=2 ∇ a W t−1 (u, a).
III. The marginal distributions of Q n converge to the marginal distributions of Q uniformly in θ and a.
First, notice that Q n is of a form 1 √ n ∑ n t=q+1 L t , where {L t } is a martingale difference sequence L t = u Y t1 1 u Y t2 2 − g ε (θ, u)W t−1 (u, a) − l t1 (θ, a) h 2 (θ, a, u) − l t2 (θ, a) h 3 (θ, a, u). Thus EQ 2 n ≤ 1 n ∑ n t=q+1 E|L t | 2 ≤ 4(1 + g 2 ε + h 2 2 E l 11 2 + h 3 2 E l 12 2 ). Since we assume finite variances of l t,j and thus h j 2 are bounded (due to finiteness of EY t,j and assumption A.2), condition I. directly follows. Condition III. follows from the central limit theorem for martingale difference sequences on L t .
By the mean value theorem applied on the function f (x, y) = x a y b and due to the fact that E|Y t,j | is finite, we get that E[u 1 v Y t,2 2 ] 2 ≤ K u − v 2 . Next, |g ε | ≤ 1 and the partial derivatives |∂W 1 /∂u j | are bounded by |Y 1,1 + Y 1,2 |, which has finite expectation. This implies that g ε (u, θ) 2 E[W 1 (u, a) − W 1 (v, a)] 2 ≤ K u − v 2 . Finally, it follows from the definition of g ε as a PGF that it is Lipschitz. Furthermore, E|W 1 (u, a)| 2 ≤ 1, which implies (g ε (u, θ) − g ε (v, θ)) 2 E|W 1 (u, a)| 2 ≤ K u − v 2 . Together, we get E|J 1n0 (θ, a, u) − J 1n0 (θ, a, v)| 2 ≤ K u − v 2 . Similar arguments are used to show that the condition holds also for j = 3. If we use the assumption that l t has finite variances, it remains to show that h j (θ, a, u) − h j (θ, a, v) 2 ≤ K u − v α . First, notice that the partial derivatives of h 3 with respect to u are continuous functions for u ∈ [0, 1] 2 . Thus, they are bounded and we can apply the mean value theorem on the components of h 3 , which implies that h 3 (θ, a, u) − h 3 (θ, a, v) 2 ≤ K u − v 2 . Similarly, if the assumption (A.2) holds then the partial derivatives of h 2 are bounded and this implies that h 2 (θ, a, u) − h 2 (θ, a, v) 2 ≤ K u − v 2 , which completes the proof of condition II., and thus the assertion of the theorem follows.