Abstract
This article considers goodness-of-fit tests for bivariate INAR and bivariate Poisson autoregression models. The test statistics are based on an L2-type distance between two estimators of the probability generating function of the observations: one being entirely nonparametric and the second one being semiparametric, computed under the corresponding null hypothesis. The asymptotic distribution of the proposed test statistics is derived under the null hypotheses as well as under alternatives, and consistency is proved. The case of testing bivariate generalized Poisson autoregression and extension of the methods to dimension higher than two are also discussed. The finite-sample performance of a parametric bootstrap version of the tests is illustrated via a series of Monte Carlo experiments. The article concludes with applications to real data sets and a discussion.
1. Introduction
Time series of counts enjoy numerous applications in such diverse fields as business, economics, engineering, and epidemiology, and corresponding inferential procedures have been intensively studied in recent years. The reader is referred to the earlier work of McKenzie (2003), as well as to the more recent book-length treatments of Davis et al. (2015) and Weiss (2018a), for an overview of the subject. Among the most popular models for count time series are the integer autoregression (INAR) model and the Poisson autoregressive (PAR) model, also termed the Poisson INARCH model. The INAR and PAR models, originally conceived for univariate counts, have been extended in order to accommodate bivariate, and more generally multivariate, counts; see Latour (1997) and Liu (2012), respectively. One of the basic parametric elements of these generalizations involves moving from a univariate to a bivariate distribution, and eventually to a multivariate one.
In this connection, when confronted with a vectorial data-set of time series of counts, and since one is presented with a choice of several possible candidate models, it is extremely important to check model-adequacy via some goodness-of-fit (GOF) test. Otherwise a poorly fitted model might yield misleading inference. Various such procedures have been proposed in the literature for univariate series of counts. A brief overview of these approaches is provided in Section 2.
Inspired by the univariate test criteria, we propose GOF tests for bivariate INAR and PAR models. Our tests mainly target the (by far most popular) Poisson specification of these models, but also involve other structural aspects, such as order and functional specification of the underlying model. The suggested test statistic is constructed as a contrast between an estimator of the probability generating function (PGF) which is entirely model-free and a semiparametric counterpart that “respects” the model under test. Hence each new test may be viewed as a bivariate extension of the earlier PGF-based procedure suggested by Meintanis and Karlis (2014), Hudecová et al. (2015) and Schweer (2016).
The remainder of this work unfolds as follows. Section 2 provides a review of the models considered and of goodness-of-fit testing for univariate time series of counts. In Section 3 and Section 4 we introduce the bivariate models and the new test criteria. In Section 5 the asymptotic properties of the methods are investigated and resampling versions of the tests are proposed. An extension of one of the tests to a more complicated model is discussed in Section 6. The finite-sample properties of the new criteria are studied by means of Monte Carlo methods in Section 7. Applications to real data sets are included in Section 8, while our final thoughts regarding the new methods and potential extensions thereof are summarized in Section 9. Proofs of asymptotics are provided in Appendix A.
2. Goodness-of-Fit Methods for Univariate Time Series of Counts
Many of the models for time series of low counts assume that, conditionally on the past, the distribution of the current variable is fully specified by a family of laws indexed by a certain parameter. An important example is the class of integer autoregressive conditionally heteroscedastic (INGARCH) models, covering PAR models as special cases. Other models are based on the thinning operator, see Steutel and van Harn (1979), with the INAR model being the most popular one. In the following, a count distribution refers to a discrete distribution on the nonnegative integers, and a count variable is a random variable with such a distribution.
Let $\{Y_t\}$ be a univariate time series and let $\mathcal{F}_t$ denote the information known up to time $t$. An integer GARCH model, abbreviated as nonlinear INGARCH($p,q$), is defined as
where is some count distribution with mean , and the regression function belongs to some specific parametric family of functions for some . If F is Poisson and is a family of linear functions then this model is referred to as Poisson (linear) INGARCH, see Ferland et al. (2006). Some authors also use the name Poisson autoregression for cases where F is Poisson and r could be nonlinear, see, e.g., Fokianos and Tjøstheim (2009), Fokianos and Tjøstheim (2012), Fokianos (2012), for the case . If then the model has a purely autoregressive structure and is abbreviated as INARCH. In the following the acronym PAR(p) is used for INARCH(p) with F Poisson and r linear. It has been shown that if and r is linear, i.e., , and if and and F belongs to the single-parameter exponential family of distributions (that includes the Poisson distribution as a special case), then there exists a strictly stationary and ergodic solution of (1), see Davis and Liu (2016). For an overview of conditions for strict stationarity and ergodicity for other choices of F see e.g., (Ahmad and Francq 2016, Section 3).
A different class of models consists of integer autoregressive moving average (INARMA) models. These models arise from the same structure as the classical ARMA time series models, but multiplication is replaced by Steutel and van Harn's thinning operator ∘. If Y is a count random variable and $\alpha \in [0,1]$, then
$$\alpha \circ Y = \sum_{i=1}^{Y} B_i, \qquad (2)$$
where $B_1, B_2, \ldots$ are iid Bernoulli variables with $P(B_i = 1) = \alpha$, which are independent of Y, with the convention that an empty sum (the case $Y = 0$) equals 0. Let be a sequence of iid count random variables with distribution G with a finite variance, and let for , . The INARMA model is defined as
where the Bernoulli variables involved in all the thinning operations are independent and independent of . If then the model (3) corresponds to the INAR(1) model introduced in McKenzie (1985) and Al-Osh and Alzaid (1987), taking the following form
For there exists a strictly stationary solution of (4) and the law of the innovations uniquely determines the marginal distribution of as well as the conditional distribution . In particular, if are iid Poisson then has Poisson distribution as well, and this special case has been considered in many applications. Model (3) with , , was introduced and studied by Du and Li (1991) and since then, many authors have considered variants of model (3), various extensions and modifications, see Scotto et al. (2015) for a comprehensive review.
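The thinning recursion of the INAR(1) model is straightforward to simulate. The following minimal Python sketch is an illustration only and is not part of the original study (which was carried out in R); the helper names `rpois`, `thin` and `simulate_inar1`, the burn-in length and the parameter values are assumptions made for the example.

```python
import math
import random

def rpois(lam, rng):
    """Poisson(lam) draw by Knuth's multiplication method (adequate for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def thin(alpha, y, rng):
    """Binomial thinning alpha o y: a sum of y iid Bernoulli(alpha) variables.
    The sum is empty, hence 0, when y == 0."""
    return sum(1 for _ in range(y) if rng.random() < alpha)

def simulate_inar1(n, alpha, lam, rng, burn_in=200):
    """Simulate Y_t = alpha o Y_{t-1} + eps_t with iid Poisson(lam) innovations.
    The first burn_in values are discarded (an assumption of this sketch)."""
    y, path = 0, []
    for t in range(n + burn_in):
        y = thin(alpha, y, rng) + rpois(lam, rng)
        if t >= burn_in:
            path.append(y)
    return path

rng = random.Random(1)
series = simulate_inar1(5000, alpha=0.4, lam=1.2, rng=rng)
# For Poisson innovations the stationary mean is lam / (1 - alpha), here 2.0.
print(sum(series) / len(series))
```

With Poisson innovations the sample mean and sample variance of a long simulated path should both be close to lam / (1 - alpha), reflecting the Poisson marginal distribution mentioned above.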
Various GOF tests have been suggested in the literature for the aforementioned two classes of models. Neumann (2011) and Fokianos and Neumann (2013) considered GOF tests for the regression function r in a Poisson INGARCH(1,1); see model (1) with and Poisson F. A slightly less formal assessment of model adequacy is explored in Aleksandrov and Weiss (2020) for a PAR(1) model as well as for a Poisson INAR(1). GOF tests based on the sample index of dispersion were considered in Schweer and Weiss (2014) and Weiss et al. (2019) for a Poisson INAR(1), and by Weiss and Schweer (2015) for a PAR(1). A test based on the classical Pearson’s statistic for the marginal distribution specified by the null hypothesis is proposed by Weiss (2018b) for a Poisson INAR(p).
A different approach to GOF testing for time series of counts is based on the probability generating function (PGF). Recall that if Y is a nonnegative discrete random variable then its PGF is defined as for all u for which the expectation is finite, which is always the case for . The distribution of Y can be easily obtained from and thus the PGF uniquely determines the distribution of Y. If is a count random vector then its PGF is defined as and exists for all . Test criteria involving estimates of the PGF of have proved useful not only for assessing the specification of F in (1) or G in (4), but also for assessing the model itself, see Meintanis and Karlis (2014) for INAR(1) and Schweer (2016) for a more general setup involving INAR(1) and PAR(1) as special cases. In both of these articles the test statistic is an integrated distance between the empirical PGF of and the parametric estimate of under the null hypothesis. Hudecová et al. (2015) also consider tests based on the PGF, but instead of the estimator , their criteria are constructed as an integrated distance between the nonparametric estimate of the marginal PGF and a semiparametric estimate of , which is model-specific. In the following sections we extend this approach from the univariate to the bivariate setup.
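To make the PGF-based idea concrete, the sketch below computes an empirical PGF of a univariate sample and an integrated squared distance to a fitted Poisson PGF. It is a simplified illustration of the general principle only, not a reproduction of any of the cited test statistics: the function names, the unit weight function, the midpoint integration rule and the toy data are all assumptions of this example.

```python
import math

def empirical_pgf(sample, u):
    """Model-free PGF estimate: the sample average of u ** y."""
    return sum(u ** y for y in sample) / len(sample)

def poisson_pgf(u, lam):
    """PGF of a Poisson(lam) law, exp(lam * (u - 1))."""
    return math.exp(lam * (u - 1.0))

def integrated_sq_distance(sample, lam, grid=200):
    """Midpoint-rule approximation of the integral over [0, 1] of
    (empirical PGF - Poisson PGF)^2, with unit weight (an assumption)."""
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        u = (i + 0.5) * h
        total += (empirical_pgf(sample, u) - poisson_pgf(u, lam)) ** 2
    return total * h

counts = [0, 1, 2, 1, 0, 3, 1, 2]          # toy data, not from the paper
lam_hat = sum(counts) / len(counts)        # moment estimate of the Poisson mean
print(empirical_pgf(counts, 0.0))          # the PGF at 0 is the proportion of zeros
print(integrated_sq_distance(counts, lam_hat))
```

Evaluating the empirical PGF at u = 0 returns the relative frequency of zeros, a simple instance of the fact that the PGF determines the distribution.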
3. Bivariate Models for Time Series of Counts
The bivariate INAR and PAR models considered in this paper are based on the following bivariate distribution: We say that follows a bivariate Poisson distribution, see Kocherlakota and Kocherlakota (1992), denoted as , if its PGF is
where . This distribution arises via the trivariate reduction method, i.e., , , where are independent with Poisson distribution with mean , respectively.
The models based on this kind of bivariate Poisson distribution seem to be the most popular in the literature despite some of their limitations. Please note that a different construction of a bivariate distribution with Poisson marginals is considered in Lakshminarayana et al. (1999).
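The trivariate reduction construction can be sketched in a few lines of Python. The code below is an illustration under stated assumptions: the names `rbivpois` and `bivpois_pgf` are hypothetical, and the arguments `theta1`, `theta2`, `theta0` denote the means of the three independent Poisson components, which is this sketch's own labelling and need not coincide with the parameterization used in (5).

```python
import math
import random

def rpois(lam, rng):
    """Poisson(lam) draw by Knuth's multiplication method (adequate for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def rbivpois(theta1, theta2, theta0, rng):
    """Bivariate Poisson pair (X1 + X0, X2 + X0) with independent Poisson
    components of means theta1, theta2 and theta0 (the shared component)."""
    x0 = rpois(theta0, rng)
    return rpois(theta1, rng) + x0, rpois(theta2, rng) + x0

def bivpois_pgf(u1, u2, theta1, theta2, theta0):
    """Joint PGF implied by the construction above:
    exp(theta1 (u1 - 1) + theta2 (u2 - 1) + theta0 (u1 u2 - 1))."""
    return math.exp(theta1 * (u1 - 1.0) + theta2 * (u2 - 1.0)
                    + theta0 * (u1 * u2 - 1.0))

rng = random.Random(2)
draws = [rbivpois(1.0, 1.5, 0.5, rng) for _ in range(20000)]
n = len(draws)
m1 = sum(a for a, _ in draws) / n
m2 = sum(b for _, b in draws) / n
cov = sum((a - m1) * (b - m2) for a, b in draws) / n
# In this construction the covariance equals the mean of the shared component.
print(m1, m2, cov)
```

The shared component induces a non-negative covariance only, which is one of the limitations of this family alluded to above.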
3.1. Bivariate INAR Model
Suppose that we have a time series composed of a pair of counts, i.e., . Following Latour (1997) we say that follows a bivariate INAR model of the first order, in the following referred to as BINAR, if
where are iid bivariate count random vectors with finite covariance matrix, denotes a matrix with elements , , and the operator ∘ is a multivariate generalization of (2). Namely the operator ∘ from (6) acts on a count random vector of dimension two by means of the equation
where the univariate thinning operator was defined in (2).
Latour (1997) showed that if the spectral radius (the absolute value of the largest eigenvalue) of is smaller than 1 and has a finite covariance matrix then there exists a strictly stationary and ergodic process satisfying (6). Furthermore, the conditional least squares (CLS) estimate is shown to be consistent and asymptotically normal. Maximum likelihood estimation is considered in Pedeli and Karlis (2011) and Pedeli and Karlis (2013c), with a special emphasis on the Poisson specification for . For estimation of BINAR with negative Binomial innovations the reader is referred to Mamode Khan et al. (2019) and references therein.
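A BINAR path with bivariate Poisson innovations can be simulated by combining matrix thinning with trivariate reduction. The Python sketch below is illustrative only; the function names, the burn-in length, the example matrix `A` and the parameterization `theta = (theta1, theta2, theta0)` (means of the three independent Poisson components of the innovations) are assumptions of this example rather than the settings used in the paper.

```python
import math
import random

def rpois(lam, rng):
    """Poisson(lam) draw by Knuth's multiplication method (adequate for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def thin(alpha, y, rng):
    """Univariate binomial thinning alpha o y."""
    return sum(1 for _ in range(y) if rng.random() < alpha)

def simulate_binar1(n, A, theta, rng, burn_in=200):
    """Simulate Y_t = A o Y_{t-1} + eps_t, where component i of A o Y is the
    sum over j of a_ij o Y_j (independent thinnings) and eps_t is bivariate
    Poisson built by trivariate reduction from theta = (theta1, theta2, theta0)."""
    y, path = (0, 0), []
    for t in range(n + burn_in):
        common = rpois(theta[2], rng)
        eps = (rpois(theta[0], rng) + common, rpois(theta[1], rng) + common)
        y = (thin(A[0][0], y[0], rng) + thin(A[0][1], y[1], rng) + eps[0],
             thin(A[1][0], y[0], rng) + thin(A[1][1], y[1], rng) + eps[1])
        if t >= burn_in:
            path.append(y)
    return path

rng = random.Random(5)
A = [[0.3, 0.1], [0.2, 0.4]]              # spectral radius 0.5, so stationary
path = simulate_binar1(5000, A, (0.7, 1.2, 0.3), rng)
# The stationary mean solves mu = A mu + E(eps), i.e. mu = (I - A)^{-1} E(eps).
print(sum(a for a, _ in path) / len(path), sum(b for _, b in path) / len(path))
```

For the example values, E(eps) = (1.0, 1.5) and the stationary mean works out to (1.875, 3.125), which the simulated averages should approximate.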
3.2. Bivariate PAR Model
Let . We say that follows a bivariate Poisson autoregression model of the first order, referred to as the BPAR model, if
where and the matrix has non-negative entries and is of full-rank. This model is sometimes referred to as a bivariate Poisson INARCH(1) model, e.g., Lee et al. (2018). Liu (2012) proved that if for some then is strictly stationary and ergodic, with Strong consistency of the conditional maximum likelihood estimator (CMLE) of parameters was proved by Andreassen (2013) under the extra assumption , while Lee et al. (2018) further showed that the CMLE has an asymptotic normal distribution.
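For intuition, a BPAR path can be simulated as follows. This Python sketch rests on explicit assumptions that are not confirmed by the text above: the conditional law is taken to be bivariate Poisson with conditional means `lam_t = omega + A Y_{t-1}` and a constant covariance parameter `phi` not exceeding either entry of `omega`, generated by trivariate reduction. All names and numerical values are hypothetical.

```python
import math
import random

def rpois(lam, rng):
    """Poisson(lam) draw by Knuth's multiplication method (adequate for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_bpar1(n, omega, A, phi, rng, burn_in=200):
    """Simulate a first-order bivariate Poisson autoregression.
    Given the past, Y_t = (X1 + X0, X2 + X0) with independent Poisson
    components of means lam1 - phi, lam2 - phi and phi (assumed form)."""
    if phi < 0 or phi > min(omega):
        raise ValueError("phi must lie in [0, min(omega)] in this sketch")
    y, path = (0, 0), []
    for t in range(n + burn_in):
        lam1 = omega[0] + A[0][0] * y[0] + A[0][1] * y[1]
        lam2 = omega[1] + A[1][0] * y[0] + A[1][1] * y[1]
        common = rpois(phi, rng)
        y = (rpois(lam1 - phi, rng) + common, rpois(lam2 - phi, rng) + common)
        if t >= burn_in:
            path.append(y)
    return path

rng = random.Random(8)
path = simulate_bpar1(10000, (1.0, 1.5), [[0.3, 0.1], [0.2, 0.4]], 0.3, rng)
# Under stationarity the mean solves mu = omega + A mu.
print(sum(a for a, _ in path) / len(path), sum(b for _, b in path) / len(path))
```

Unlike the BINAR recursion, the past enters here only through the conditional mean, so no thinning is involved.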
4. Goodness-of-Fit Tests
Assume that we have observations , which come from a stationary bivariate time series of counts, and that we would like to perform a GOF test for a particular model for these data, with the model being fixed apart from finite-dimensional parameters. Let be a non-negative weight function which will be further specified below. We propose as test statistic the weighted distance measure
where is a non-parametric estimate of based on , given by and where is a semi-parametric estimate of the PGF of , which is constructed specifically under the model being tested. The null hypothesis is rejected for large values of .
By analogy to most, if not all, of the previously published work we consider as our working interval, despite the fact that uniqueness of PGFs and corresponding consistency of the test might require working on a region of containing the origin. Nevertheless we further investigate this aspect of our tests by simulations. See Esquível (2008) for a recent account of uniqueness of PGFs.
Although the idea of the proposed test statistics is analogous to that for the univariate models mentioned in Section 2, the extension to the bivariate case is not straightforward, neither in terms of asymptotics nor on computational grounds, as the multivariate nature of the data brings additional technical difficulties. We state the necessary assumptions and provide a formal proof for the asymptotics of the suggested test statistic. In addition, extension of our GOF tests to more complex models is briefly discussed. In fact, although here we only treat the bivariate case, extension to higher dimensions also comes quite naturally with our methods.
4.1. Tests for the BINAR Model
Let denote a parametric family of PGFs indexed by a parameter , with being a compact subset of , . We would like to test the null hypothesis
Assume that is strictly stationary with PGF . Then it follows from the properties of BINAR that
where
and denotes the vectorized version of .
Here we used the fact that for a Binomial distribution with parameters the PGF is , so that if additionally and are suitable estimators of the unknown parameters, then (9) yields a semi-parametric estimate of under model (6) as
where .
We will consider a GOF test for the null hypothesis with being the family of bivariate Poisson distributions with unspecified parameters. Interest in testing lies in the fact that while this distribution is by far the most popular in the univariate as well as in the multivariate context, alternative specifications have also been employed such as models with innovations following a bivariate negative Binomial distribution; see for instance Pedeli and Karlis (2013a), Popović et al. (2018), and Kim and Lee (2017). Also BPAR models (see next section) may be considered to be alternative models of interest.
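The structure of the semi-parametric estimate for the Poisson BINAR null hypothesis can be sketched as follows. Conditionally on the previous observation, each thinned term is Binomial, so its PGF is a power of 1 + a(u - 1); averaging the product of these powers over the lagged observations and multiplying by the innovation PGF gives the estimate. The Python code is an illustration under assumptions: the function names, the averaging over all lagged observations, and the innovation parameterization `theta = (theta1, theta2, theta0)` (component means in the trivariate reduction) are choices of this sketch and may differ in detail from (10).

```python
import math

def bivpois_pgf(u1, u2, theta1, theta2, theta0):
    """PGF of a bivariate Poisson law built from independent components
    with means theta1, theta2 and theta0 (shared)."""
    return math.exp(theta1 * (u1 - 1.0) + theta2 * (u2 - 1.0)
                    + theta0 * (u1 * u2 - 1.0))

def binar_semiparametric_pgf(u1, u2, series, A, theta):
    """Innovation PGF times the average, over lagged observations, of the
    conditional PGF of A o Y_{t-1} given Y_{t-1} = (y1, y2)."""
    lagged = series[:-1]
    total = 0.0
    for y1, y2 in lagged:
        total += ((1.0 + A[0][0] * (u1 - 1.0)) ** y1
                  * (1.0 + A[0][1] * (u1 - 1.0)) ** y2
                  * (1.0 + A[1][0] * (u2 - 1.0)) ** y1
                  * (1.0 + A[1][1] * (u2 - 1.0)) ** y2)
    return bivpois_pgf(u1, u2, *theta) * total / len(lagged)

toy = [(0, 1), (2, 0), (1, 1), (3, 2)]     # toy data, not from the paper
A_hat = [[0.3, 0.1], [0.2, 0.4]]           # placeholder estimates
theta_hat = (0.7, 1.2, 0.3)
print(binar_semiparametric_pgf(0.5, 0.5, toy, A_hat, theta_hat))
```

When all entries of the matrix are zero the estimate reduces to the innovation PGF, and at (1, 1) it equals one, two quick sanity checks for any implementation.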
4.2. Tests for the BPAR Model
The corresponding null hypothesis for the BPAR model is formulated as
If is strictly stationary, then it follows from the properties of the model that the PGF of is given by
where
Thus, if we have an appropriate estimator of the parameter , then in view of (11) we may define a semi-parametric estimate of as
where
and
Deviations from the null hypothesis include non-Poisson conditionals (see Heinen and Rengifo (2007)) as well as model violations towards more general specifications.
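The semi-parametric estimate for the BPAR null hypothesis follows the same logic: the conditional PGF given the past is averaged over the observed lagged values. The Python sketch below assumes, without confirmation from the text, that the conditional law is bivariate Poisson with conditional means `lam_t = omega + A Y_{t-1}` and covariance parameter `phi`, whose PGF is exp(lam1 (u1 - 1) + lam2 (u2 - 1) + phi (u1 - 1)(u2 - 1)). Function and argument names are hypothetical.

```python
import math

def bpar_semiparametric_pgf(u1, u2, series, omega, A, phi):
    """Average over lagged observations of the conditional bivariate
    Poisson PGF evaluated at the fitted conditional means."""
    lagged = series[:-1]
    total = 0.0
    for y1, y2 in lagged:
        lam1 = omega[0] + A[0][0] * y1 + A[0][1] * y2
        lam2 = omega[1] + A[1][0] * y1 + A[1][1] * y2
        total += math.exp(lam1 * (u1 - 1.0) + lam2 * (u2 - 1.0)
                          + phi * (u1 - 1.0) * (u2 - 1.0))
    return total / len(lagged)

toy = [(0, 1), (2, 0), (1, 1), (3, 2)]     # toy data, not from the paper
print(bpar_semiparametric_pgf(0.5, 0.5, toy, (1.0, 1.5),
                              [[0.3, 0.1], [0.2, 0.4]], 0.3))
```

In practice the arguments `omega`, `A` and `phi` would be replaced by estimates such as the CMLE.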
Remark 1.
We should remark that the tests proposed in this article are not for BINAR or BPAR models per se. Specifically the test criterion for BINAR is for bivariate counts for which the PGF satisfies Equation (9), and analogously the test criterion for BPAR is for bivariate counts for which the PGF satisfies Equation (12). In the context of time series of counts however, BINAR and BPAR models are framed by Equations (9) and (12), respectively, to such an extent that for all practical purposes these equations may be regarded as characterizing equations for the models themselves. Thus, our tests could be viewed as being on an equal footing with universally consistent methods such as those suggested by Fokianos and Neumann (2013), Jiménez-Gamero et al. (2020), and Leucht et al. (2015).
4.3. Computations
From (8) and by means of (10) and (14), we have after straightforward algebra
where with
where
for the BINAR model, and
for the BPAR model.
The integral figuring in Equation (16) with weight function , may be expressed in more elementary terms. However the final fully reduced explicit forms, which are available from the authors upon request, are not convenient from the computational point of view. Thus, the tests were implemented by numerical evaluation of the corresponding integrals.
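A numerical evaluation of a statistic of this type can be sketched as below. The code is not the authors' implementation: it assumes that the statistic has the form n times the weighted integral over the unit square of the squared difference of two PGF estimates, uses a simple midpoint rule, and defaults to a unit weight function. The names `empirical_bivariate_pgf` and `pgf_distance_statistic` are hypothetical.

```python
def empirical_bivariate_pgf(series, u1, u2):
    """Non-parametric PGF estimate: average of u1 ** y1 * u2 ** y2."""
    return sum(u1 ** y1 * u2 ** y2 for y1, y2 in series) / len(series)

def pgf_distance_statistic(pgf_a, pgf_b, n, weight=lambda u1, u2: 1.0, grid=40):
    """Midpoint-rule approximation of
    n * integral over [0,1]^2 of (pgf_a - pgf_b)^2 * weight."""
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        u1 = (i + 0.5) * h
        for j in range(grid):
            u2 = (j + 0.5) * h
            diff = pgf_a(u1, u2) - pgf_b(u1, u2)
            total += diff * diff * weight(u1, u2)
    return n * total * h * h

toy = [(0, 0), (1, 0), (0, 2), (0, 0)]     # toy data, not from the paper
stat = pgf_distance_statistic(
    lambda a, b: empirical_bivariate_pgf(toy, a, b),
    lambda a, b: 1.0,                      # PGF of a point mass at (0, 0)
    n=len(toy))
print(stat)
```

The second argument would in practice be the model-specific semi-parametric estimate; a finer grid or a Gauss-type quadrature rule can be substituted if higher accuracy is needed.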
5. Asymptotics
In this section, we study the asymptotic distribution of the test statistic under the null hypothesis of a BINAR model as well as the corresponding limit null distribution under a BPAR model. Results for the behavior under fixed alternatives are also provided and show that in both cases the test is consistent against certain fixed alternatives. Finally resampling versions are proposed that circumvent the problem of unknown components in the aforementioned limit null distributions. Please note that under the standing assumptions these results are valid for general innovation distribution and weight function as well as for arbitrary parameter-estimates.
5.1. Asymptotics of the Test Statistic: BINAR Case
Assume the test statistic (8) is constructed for testing with specified , for a compact , i.e., the semi-parametric estimate is constructed by (10). Consider the following assumptions:
- (A.1) Let be a non-negative function satisfying .
- (A.2) Let follow model (6), where the spectral radius of is smaller than 1.
- (A.3) Let correspond to distributions with finite second moment. Furthermore, let the second partial derivatives of with respect to exist and be continuous in and suppose that hold true for all and , where (recall that) k is the dimension of .
- (A.4) Let be an estimator of such that for some , where form a strictly stationary and ergodic sequence of martingale differences with finite variance. Here, is of the same dimension as and has the dimension of , i.e., four.
Under (A.2) and (A.3), is strictly stationary and ergodic with finite second order moments, see Franke and Rao (1995). Regarding possible estimators in (A.4), Franke and Rao (1995) considered the CMLE and proved its asymptotic normality under a set of regularity conditions. These conditions involve the finiteness of and some further assumptions on the distribution of .
Theorem 1.
Under (A.1)–(A.4) and as , the limit distribution of under the null hypothesis is the same as the distribution of
where is a Gaussian random field with zero mean and covariance structure
where and are from assumption (A.4) and
The proof of the assertion is postponed to the Appendix A.
The asymptotic distribution of the test statistic depends on several unknown quantities. One possibility is to generate the Gaussian random field figuring in Theorem 1 with the theoretical quantities replaced by some consistent estimators and then compute the critical values. Another possibility is the parametric bootstrap, which is quite natural here. The justification of the bootstrap approximation under the null hypothesis proceeds along similar lines as the proof of Theorem 1, conducted conditionally on the observed data and with the help of assertions for triangular arrays and sums of martingale difference arrays.
Write for the test statistic based on the original observations , and the resulting parameter estimate . (Here for simplicity we suppress the dependence of on the weight function ).
1. Generate , where are iid and follow the distribution with PGF .
2. Compute pseudo-observations , using Equation (6) with replaced by , and replaced by .
3. Fit the model (6) using , and compute the bootstrap estimator of .
4. Compute the corresponding test statistic .
5. Repeat steps 1–4 a number of times, say B, and obtain the sequence of test statistics, .
6. Compute the p-value as .
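The resampling steps above can be organized as a generic routine that takes the model-specific ingredients as functions. The Python sketch below is an illustration under assumptions: the p-value is taken to be the proportion of bootstrap statistics that are at least as large as the observed one, and the toy ingredients at the bottom (an iid Poisson null hypothesis with a univariate PGF distance) merely stand in for the BINAR fitting, simulation and test-statistic functions, which are not reproduced here.

```python
import math
import random

def parametric_bootstrap_p_value(data, fit, simulate, statistic, B, rng):
    """Generic parametric bootstrap: fit under the null, simulate B samples
    from the fitted model, refit and recompute the statistic on each."""
    estimate = fit(data)
    t_obs = statistic(data, estimate)
    exceed = 0
    for _ in range(B):
        boot = simulate(len(data), estimate, rng)
        if statistic(boot, fit(boot)) >= t_obs:
            exceed += 1
    return t_obs, exceed / B

# Toy stand-ins (assumptions of this sketch, not the BINAR model).
def rpois(lam, rng):
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def fit_poisson(data):
    return sum(data) / len(data)

def simulate_poisson(n, lam, rng):
    return [rpois(lam, rng) for _ in range(n)]

def pgf_statistic(data, lam, grid=50):
    """n times the midpoint-rule integral over [0, 1] of the squared
    difference between the empirical PGF and the fitted Poisson PGF."""
    n, h, total = len(data), 1.0 / grid, 0.0
    for i in range(grid):
        u = (i + 0.5) * h
        emp = sum(u ** y for y in data) / n
        total += (emp - math.exp(lam * (u - 1.0))) ** 2
    return n * total * h

rng = random.Random(4)
clearly_not_poisson = [0, 10] * 50
print(parametric_bootstrap_p_value(clearly_not_poisson, fit_poisson,
                                   simulate_poisson, pgf_statistic, 99, rng))
```

Passing the ingredients as functions keeps the resampling loop identical for the BINAR and BPAR cases; only `fit`, `simulate` and `statistic` change.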
Next we briefly discuss the limit behavior of the test statistic under alternatives of type . We assume that model (6) holds true but in the null hypothesis does not belong to . Moreover suppose that the estimators satisfy
for some and some such that the largest eigenvalue of the respective matrix is in absolute value smaller than 1.
Theorem 2.
Let be strictly stationary with PGF that is continuous in and let be continuous in θ. Let (A.1) and (17) be satisfied. Then, as ,
where .
The proof is omitted since it suffices to follow the line of proof of Theorem 1 and use stationarity and ergodicity of .
The right-hand side of (18) is strictly positive unless the true PGF coincides with the PGF from the null hypothesis . This fact and the uniqueness of the PGF implies the consistency of the test which rejects the null hypothesis for large values of the test statistic under such fixed alternatives. The test is also consistent for other types of fixed alternatives, e.g., against model violation. This feature of the test is further illustrated by Monte Carlo simulations in Section 7.
Please note that the test even has (non-negligible) power for some local alternatives, i.e., when the difference tends to 0 not too fast as and depends on n. However, a rigorous proof of this result is quite technical, and therefore it is not discussed here any further.
5.2. Asymptotics of the Test Statistic: BPAR Case
This section considers the problem of testing the BPAR model. Recall that stands for the vector of the parameters of the model (7), and suppose that:
- (B.1) The series is a strictly stationary solution of (7) with parameters such that and is compact.
- (B.2) The estimator of the parameter is such that , where form a strictly stationary and ergodic sequence of martingale differences with finite variances.
An estimator that satisfies (B.2) is for instance the CMLE; see Lee et al. (2018).
Theorem 3.
Under (A.1) and (B.1)–(B.2), the limit distribution of as is the same as the distribution of
where is a Gaussian random field with zero mean and covariance structure
with defined in assumption (B.2), defined in (15),
and
As for the BINAR model, the parametric bootstrap can be carried out in a very natural way and is recommended for practical use. The justification of the bootstrap approximation would again proceed along similar lines as the proof of Theorem 3.
The following theorem is analogous to Theorem 2 and describes the behavior of the test statistic under some alternatives.
Theorem 4.
The proof is omitted since it suffices to follow the line of the proof of Theorem 3 and to use stationarity and ergodicity of . Possible comments for this theorem are quite parallel to Theorem 1 and therefore are omitted.
6. Extension to Generalized BPAR Model
Extension of the proposed test to arbitrary higher order or dimension will be discussed in Section 9. In this section we aim at a new PGF-based GOF test for a relatively mild, yet very important, generalization of the BPAR model. Specifically consider the following model for :
where with , are matrices of non-negative entries and is of full rank. (Please note that if is replaced by the zero matrix, then model (19) reduces to the BPAR model defined in (7)). Model (19) was studied, e.g., in Andreassen (2013), Liu (2012), and Lee et al. (2018), with the acronym INGARCH(1,1). If for some , then is strictly stationary and ergodic, see Liu (2012), and if , where , then the CMLE is consistent and asymptotically normal, see Andreassen (2013), Lee et al. (2018).
Assume that follows a stationary model (19). Following analogous arguments as in Section 4.2 we have that the PGF of under this model is given by,
where
with the joint PGF of the vector defined by
It is straightforward to devise a test for the model (19) by using Equation (20) and proceeding analogously as with Equation (8). However the asymptotics of such a test as well as its actual implementation require a separate investigation. In this connection preliminary Monte Carlo results showed some promise but there were also problems, and therefore we decided not to pursue this extension any further here.
7. Simulations
The finite sample behavior of the suggested bootstrap test is investigated in the following simulation study. We consider the null hypotheses of Poisson BINAR and of BPAR model and investigate the size of the test under the null hypothesis and the power for various alternatives.
The unknown parameters of the BINAR model are estimated using the CLS method, and is estimated using the method of moments, see Pedeli and Karlis (2013b). The parameters of the BPAR model are estimated by the CMLE method. For simplicity the weight function is set to , i.e., we take in . The simulations were conducted in the R computing environment (R Core Team 2019), employing the warp-speed bootstrap of Giacomini et al. (2013) for repetitions. With this method, a single bootstrap sample (B=1) is generated for each Monte Carlo repetition and the resulting p-value is computed from the overall bootstrap sample of M replicas.
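The logic of the warp-speed scheme can be sketched as follows. This Python code is an illustration only and does not reproduce the R code of the study: it assumes that each Monte Carlo statistic is compared with the pooled set of M bootstrap statistics (one per repetition) and that rejection occurs when the resulting p-value does not exceed the nominal level. The toy iid Poisson ingredients stand in for the BINAR and BPAR fitting and simulation routines, and all names are hypothetical.

```python
import math
import random

def warp_speed_rejection_rate(M, n, draw_data, fit, simulate_null,
                              statistic, alpha, rng):
    """One bootstrap sample per Monte Carlo repetition; p-values are
    computed against the pooled M bootstrap statistics."""
    stats, boot_stats = [], []
    for _ in range(M):
        data = draw_data(n, rng)
        estimate = fit(data)
        stats.append(statistic(data, estimate))
        boot = simulate_null(n, estimate, rng)      # the single bootstrap draw
        boot_stats.append(statistic(boot, fit(boot)))
    rejections = 0
    for t in stats:
        p_value = sum(1 for tb in boot_stats if tb >= t) / M
        if p_value <= alpha:
            rejections += 1
    return rejections / M

# Toy stand-ins (assumptions of this sketch): iid Poisson null hypothesis.
def rpois(lam, rng):
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def fit_poisson(data):
    return sum(data) / len(data)

def simulate_poisson(n, lam, rng):
    return [rpois(lam, rng) for _ in range(n)]

def pgf_statistic(data, lam, grid=40):
    n, h, total = len(data), 1.0 / grid, 0.0
    for i in range(grid):
        u = (i + 0.5) * h
        emp = sum(u ** y for y in data) / n
        total += (emp - math.exp(lam * (u - 1.0))) ** 2
    return n * total * h

rng = random.Random(6)
size = warp_speed_rejection_rate(
    300, 50, lambda n, r: simulate_poisson(n, 2.0, r),
    fit_poisson, simulate_poisson, pgf_statistic, 0.05, rng)
print(size)   # empirical size under the toy null hypothesis
```

The computational saving comes from fitting the model only twice per Monte Carlo repetition instead of B + 1 times.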
Our results are for sample size and at level of significance and for both and . For the BINAR model, a reasonable alternative might be BPAR model or model (6) with innovations following a distribution other than the bivariate Poisson. Such a popular alternative is a bivariate distribution with negative Binomial marginals. There are several possibilities as to how to generate such variables. Here we consider the bivariate negative Binomial distribution of Dunn (1967), whereby , with , , being marginally negative Binomial with mean and variance and This bivariate negative Binomial distribution is also used in an alternative considered to a BPAR model. Namely a model of form (7) with conditional distribution instead of bivariate Poisson is considered, with the dependence of on the same as specified in Equation (7). We will refer to this model as the negative Binomial BINARCH. Finally, we also explored the power of the test when testing and the data follow BINAR model and vice versa.
The results for the size of the two tests are summarized in Table 1 and for the BINAR model were obtained by using either or , where
and following a distribution. For BPAR, we set , , and again equal either to or . As may be seen from Table 1, the tests are conservative if the matrix is used in the data generating process. On the other hand, the size slightly exceeds the nominal size if is used in the data generating process. Note that a similar phenomenon was already observed for the univariate series in Hudecová et al. (2015): the test may be slightly conservative for certain parametric settings of the model at hand, while it can be slightly anticonservative for other settings. Recall also that the unknown parameters are estimated by a CLS method for BINAR models whereas CMLE is used for BPAR models, which might also partly explain the different behaviour observed in Table 1. In either case however, the observed size approaches the prescribed significance level as the sample size increases.
Table 1.
Size of the test for testing Poisson BINAR in and BPAR in .
In this connection, the rather poor small-sample behavior observed in the left part of Table 1 for the BINAR model parameterised by the matrix can be improved by considering a modified test statistic for which integration in Equation (8) is carried out over rather than over ; see the corresponding discussion at the end of Section 4. Specifically this approach (performed on the same simulated data) and for nominal size leads to empirical size equal to 0.033, 0.039 and 0.052, for , and 500, respectively. For all other settings however, the results for the modified test statistic (available from the authors upon request) are very similar and hence we only present here results for the original test statistic defined by Equation (8).
The power of the test for with data from a BINAR model with negative Binomial innovations is provided in Table 2. The considered bivariate negative Binomial distribution is for . Note that for larger r, the distribution is closer to a bivariate Poisson distribution, and this fact is also reflected in the power of the test, which is lower for . For example, a sample size seems to be insufficient for distinguishing between Poisson and negative Binomial INAR for this larger r. However, as the sample size n grows, the test performs very well even for irrespective of the matrix being used to simulate the model.
Table 2.
Power of the test for for data from BINAR model with negative Binomial marginals.
A similar observation also holds for the results in Table 3 which correspond to the null hypothesis with data from a negative Binomial BINARCH. These results were obtained with the same values of and used for the null hypothesis, and .
Table 3.
Power of the test for with data generated by a negative Binomial BINARCH model.
On the other hand, Table 4 reports the power of the test for with data following a BPAR model and the power of the test for with data from a Poisson BINAR model. The results in Table 4 show that if the matrix in the BPAR model equals then the test for fails to distinguish between the two models for sample sizes up to . In contrast, if the matrix is equal to in BPAR and we test for Poisson BINAR then the power is satisfactory even for . The same observation holds for the opposite situation when one tests for BPAR and the data come from a Poisson BINAR.
Table 4.
Power of the test of Poisson BINAR in for data following BPAR and vice versa.
8. Real-Data Application
Joint modelling of count observations finds important applications in the insurance industry, see for instance Partrat (1994) and Vernic (1997). In this connection, it is common practice for insurance companies to split the reported claims into several types. Typically, it is reasonable to expect aggregate amounts (daily or monthly) of these different types of claims to be dependent, see, e.g., Shi et al. (2016). If the mean claim count is high, then classical models for continuous variables could be applied. On the other hand, if the observed claim counts are small integers, it is appropriate to treat the data as genuine counts and, consequently, to engage models and methods specifically tailored for count time series.
We illustrate this kind of application on real data sets of the monthly number of claims of short-term disability benefits made by injured workers to the British Columbia Workers Compensation Board. The time period is from January 1985 to December 1994. The original data set from Freeland (1998) contains five time series corresponding to five different injury categories: burn injuries, soft tissue injuries, cuts, dermatitis and dislocations. These five time series have been previously analyzed by several authors, and separate univariate models were fitted. It has been found that the Poisson INAR is appropriate for all five series, except for series (cuts), for which this model is not appropriate, see e.g., Freeland (1998), Zhu and Joe (2006), Hudecová et al. (2015). In particular, Freeland and McCabe (2004) and Zhu and Joe (2006) suggest modelling the time series of cuts claims using an extension of the INAR(1) model with a seasonal component. On the other hand, Biswas and Song (2009) argue that the ACF and PACF do not indicate a significant seasonal effect.
We first consider a bivariate series of soft tissue injuries claims and dermatitis claims; see Figure 1 (left panel). Possible dependence between these two series may be due to the fact that a major accident causes several injuries, often of different types. Previous analyses in Freeland (1998) or Hudecová et al. (2015) reveal that marginal INAR(1) models might be appropriate for these two series. Hence, we consider a Poisson BINAR model with a diagonal matrix and estimate the parameters using the conditional least squares method (note that if a general, i.e., non-diagonal, matrix is considered, then the estimates of the off-diagonal entries are very close to zero). Estimators of parameters were obtained by the methods used in the simulations of the previous section. The matrix is estimated as , and the parameters of the bivariate Poisson distribution of the innovations are estimated as , . The resulting GOF test applied with and B bootstrap samples leads to with p-value . Thus, one may conclude that the Poisson BINAR model with a diagonal matrix seems to fit the data well.
Figure 1.
Monthly number of claims. (a) Soft tissue injuries claims (black dots) and dermatitis claims (gray triangles). (b) Cuts claims.
On the other hand, if one considers a bivariate series of soft tissue injuries claims and cuts claims, see Figure 1 (right panel), and tests the GOF for a Poisson BINAR model with a diagonal matrix (which corresponds to univariate INAR models for the two series), the null hypothesis is rejected with p-value . A Poisson BINAR with a general matrix is rejected as well (p-value ). This is in accordance with the findings of the previous papers mentioned above. However, the hypothesis of BPAR is not rejected with p-value . The MLEs in this model are , and is such that . One could further scrutinize this data set by postulating an extended (nonstationary) BPAR model with a seasonal component. This might improve predictions relative to the fitted stationary model, but such a model lies outside the scope of this paper.
9. Concluding Remarks
We suggest consistent goodness-of-fit tests for bivariate INAR and bivariate Poisson autoregressive models, estimated by least squares and maximum likelihood, respectively, with well-defined limit null distributions. Since these limit distributions are complicated, we suggest carrying out the tests by parametric bootstrap resampling, which uses the distributional assumptions specified in the null hypothesis. Monte Carlo results show that this bootstrap version of the new tests is generally reasonably sized and has good power against certain popular alternative configurations. Our real-data applications are in the direction of further scrutiny and better understanding of the mechanisms generating the data at hand.
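The parametric bootstrap idea can be sketched in Python on a deliberately simplified toy problem. The sketch below is an assumption-laden analogue rather than the bivariate procedure of this paper: it tests an i.i.d. univariate Poisson hypothesis, uses an unweighted L2 distance between the empirical PGF and the fitted Poisson PGF on [0, 1], approximates the integral by the trapezoidal rule on a grid, and computes the p-value as (1 + number of bootstrap statistics at least as large as the observed one) / (B + 1). The function names are hypothetical.

```python
import numpy as np


def pgf_l2_statistic(x, lam, grid):
    """n times the integral over the grid of (empirical PGF - Poisson(lam) PGF)^2,
    approximated by the trapezoidal rule."""
    x = np.asarray(x)
    empirical = np.mean(grid[:, None] ** x[None, :], axis=1)  # mean of u^X at each u
    model = np.exp(lam * (grid - 1.0))                         # Poisson PGF
    sq_diff = (empirical - model) ** 2
    integral = np.sum(0.5 * (sq_diff[1:] + sq_diff[:-1]) * np.diff(grid))
    return len(x) * integral


def parametric_bootstrap_pvalue(x, n_boot, rng, grid=None):
    """Parametric bootstrap: simulate from the fitted null model, re-estimate the
    parameter on every resample, and compare the statistics."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 51)
    x = np.asarray(x)
    lam_hat = x.mean()
    t_obs = pgf_l2_statistic(x, lam_hat, grid)
    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        x_star = rng.poisson(lam_hat, size=len(x))       # draw from the fitted null model
        t_boot[b] = pgf_l2_statistic(x_star, x_star.mean(), grid)
    p_value = (1 + np.sum(t_boot >= t_obs)) / (n_boot + 1)
    return t_obs, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sample = rng.poisson(3.0, size=300)
    print(parametric_bootstrap_pvalue(sample, n_boot=199, rng=rng))
```

Re-estimating the parameter on each bootstrap sample matters here: it lets the bootstrap distribution reflect the effect of parameter estimation on the statistic, which is the reason the limit null distribution is complicated in the first place.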
At this point, we wish to discuss potential extensions of the tests to models of higher order or dimension, such as the multivariate INAR type processes studied by Franke and Rao (1995), Latour (1997), Pedeli and Karlis (2011), Pedeli and Karlis (2013c), and Pedeli and Karlis (2013a), and corresponding extensions of PAR type models considered by Liu (2012), Andreassen (2013), Lee et al. (2018), Ciu and Zhu (2018) and Ciu et al. (2020). In this connection, while it is conceptually straightforward to extend the tests to INAR or PAR models of higher order (see also Section 6 of Hudecová et al. (2015)), we wish to emphasize that any increase in order and/or dimension brings serious challenges for estimation as well as for the actual implementation of the methods, due to the potentially large number of new parameters introduced.
Finally, we wish to point out that while this paper deals solely with stationary time series models, real-world phenomena, including time series of counts, often exhibit deterministic trends or seasonal patterns, and thus GOF tests for nonstationary models would be of great practical interest. However, although univariate INAR and PAR models with deterministic components have been extensively studied in the literature, multivariate extensions of such nonstationary models are rather scarce; we refer to Santos et al. (2019) for recent work.
Author Contributions
Conceptualization, Š.H., M.H., S.G.M.; methodology, Š.H., M.H., S.G.M.; software, Š.H.; writing—original draft preparation, Š.H., S.G.M.; writing—review and editing, Š.H., M.H., S.G.M. All authors have read and agreed to the published version of the manuscript.
Funding
The work of Šárka Hudecová and Marie Hušková is funded by the Czech Science Foundation project GA18-08888S.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data presented in this study are available in Freeland (1998) or on request from the corresponding author.
Acknowledgments
The authors are grateful to the Associate Editor and the anonymous reviewers for their careful inspection of the paper and valuable comments, which led to an improved manuscript.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Proofs
Appendix A.1. Proof of Theorem 1
Proof.
Denote as such that if then has the largest eigenvalue smaller than 1. The test statistic is of the form
Let In the following, we omit for a moment the argument in all the considered functions. Namely we consider as a function of the argument alone, i.e., is a differential of with respect to , . Likewise, is a differential of with respect to . Then a Taylor expansion gives:
where
with and for some and where is the matrix of second partial derivatives of with respect to . Similarly, is the matrix of second partial derivatives of with respect to . For we have
and clearly for all and . Furthermore,
and thus, for all and all .
Due to the finite second order moments of and Assumption (A.2) we have
where are constants and is a matrix of 1’s of dimension . Hence, it follows from the Cauchy–Schwarz inequality that as , and thus the asymptotic distribution of is the same as the asymptotic distribution of
where
Regarding the behavior of , notice that and thus, for a given and as
uniformly in due to the uniform ergodicity theorem. Now define
where Due to (A.2), it follows that as . Similarly, in view of the form of given above and the finiteness of , we get from the uniform ergodicity theorem that as
uniformly in . Define
As then Hence, it suffices to study the asymptotic behavior of , where
with . We will make use of Theorem 22 from Ibragimov and Chasminskij (1981) and the subsequent remark in order to show that the integral converges in distribution to , where Q is a Gaussian random field specified in the statement of the theorem. To this end, we need to verify that
- There exist constants and such that
- The marginal distributions of converge to the marginal distributions of Q uniformly in and .
First, notice that is of the form , where is a martingale difference sequence . Thus Since we assume finite variances of and thus are bounded (due to finiteness of and Assumption (A.2)), condition I. directly follows. Condition III. follows from the central limit theorem for martingale difference sequences on .
In order to prove condition II., we will show that for , for some and . In the following K is a generic constant. For we have
By the mean value theorem applied to the function and due to the fact that is finite, we get that Next, and the partial derivatives are bounded by , which has finite expectation. This implies that Finally, it follows from the definition of as a PGF that it is Lipschitz. Furthermore, , which implies Together, we get Similar arguments are used to show that the condition holds also for . If we use the assumption that has finite variances, it remains to show that First, notice that the partial derivatives of with respect to are continuous functions for . Thus, they are bounded and we can apply the mean value theorem to the components of , which implies that Similarly, if Assumption (A.2) holds then the partial derivatives of are bounded and this implies that which completes the proof of condition II., and thus the assertion of the theorem follows. □
Appendix A.2. Proof of Theorem 3
Proof.
The proof proceeds along the same lines as the proof of Theorem 1. A Taylor expansion is used to show that
where are defined in the statement of the theorem,
The term can be replaced by . Finally, conditions I.–III. from Theorem 22 of Ibragimov and Chasminskij (1981) can be verified in the same way as for the BINAR model, using the fact that the fourth moments of are finite and the parameter space is compact. □
References
- Ahmad, Ali, and Christian Francq. 2016. Poisson QMLE of count time series models. Journal of Time Series Analysis 37: 291–314.
- Al-Osh, Mohamed A., and Abdulhamid A. Alzaid. 1987. First–order integer–valued autoregressive (INAR(1)) process. Journal of Time Series Analysis 8: 261–75.
- Aleksandrov, Boris, and Christian H. Weiss. 2020. Testing the dispersion structure of count time series using Pearson residuals. AStA Advances in Statistical Analysis 104: 325–61.
- Andreassen, Camilla Mondrup. 2013. Models and Inference for Correlated Count Data. Ph.D. thesis, Aarhus University, Aarhus, Denmark.
- Biswas, Atanu, and Peter Xue Kun Song. 2009. Discrete-valued ARMA processes. Statistics and Probability Letters 79: 1884–89.
- Ciu, Yan, Qi Li, and Fukang Zhu. 2020. Flexible bivariate Poisson integer-valued GARCH model. Annals of the Institute of Statistical Mathematics 72: 1449–77.
- Ciu, Yan, and Fukang Zhu. 2018. A new bivariate integer-valued GARCH model allowing for negative cross-correlation. TEST 27: 428–52.
- Davis, Richard A., Scott H. Holan, Robert Lund, and Nalini Ravishanker. 2015. Handbook of Discrete-Valued Time Series. New York: Chapman and Hall/CRC.
- Davis, Richard A., and Heng Liu. 2016. Theory and inference for a class of nonlinear models with application to time series of counts. Statistica Sinica 26: 1673–707.
- Du, Jin-Guan, and Yuan Li. 1991. The integer-valued autoregressive (INAR(p)) model. Journal of Time Series Analysis 12: 129–42.
- Dunn, James E. 1967. Characterization of the bivariate negative binomial distribution. Journal of the Arkansas Academy of Science 21: 77–86.
- Esquível, Manuel L. 2008. Probability generating functions for discrete real–valued random variables. Theory of Probability and Its Applications 52: 40–57.
- Ferland, René, Alain Latour, and Driss Oraichi. 2006. Integer-valued GARCH processes. Journal of Time Series Analysis 27: 923–42.
- Fokianos, Konstantinos. 2012. Count time series. In Handbook of Statistics 30: Time Series—Methods and Applications. Edited by Tata Subba Rao, Suhasini Subba Rao and Calyampudi Radhakrishna Rao. Amsterdam: Elsevier, pp. 315–47.
- Fokianos, Konstantinos, and Michael H. Neumann. 2013. A goodness–of–fit test for Poisson count processes. Electronic Journal of Statistics 7: 793–819.
- Fokianos, Konstantinos, Andres Rahbek, and Dag Tjøstheim. 2009. Poisson autoregression. Journal of the American Statistical Association 104: 1430–39.
- Fokianos, Konstantinos, and Dag Tjøstheim. 2012. Nonlinear Poisson autoregression. Annals of the Institute of Statistical Mathematics 64: 1205–25.
- Franke, Jürgen, and T. Subba Rao. 1995. Multivariate First-Order Integer-Valued Autoregressions. Technical Report. Kaiserslautern: Universität Kaiserslautern.
- Freeland, R. Keith. 1998. Statistical Analysis of Discrete Time series with Application to the Analysis of Workers’ Compensation Claims Data. Ph.D. thesis, Management Science Division, Faculty of Commerce and Business Administration, University of British Columbia, Vancouver, BC, Canada. Available online: https://open.library.ubc.ca/cIRcle/collections/ubctheses (accessed on 25 February 2021).
- Freeland, R. Keith, and Brendan P. M. McCabe. 2004. Analysis of low count time series data by Poisson autoregression. Journal of Time Series Analysis 25: 701–22.
- Giacomini, Raffaella, Dmitris Politis, and Halbert White. 2013. A warp-speed method for conducting Monte Carlo experiments involving bootstrap estimators. Econometric Theory 29: 567–89.
- Heinen, Andréas, and Erick Rengifo. 2007. Multivariate autoregressive modeling of time series count data using copulas. Journal of Empirical Finance 14: 564–83.
- Hudecová, Šárka, Marie Hušková, and Simos G. Meintanis. 2015. Tests for time series of counts based on the probability generating function. Statistics 49: 316–37.
- Ibragimov, Il’dar Abdulovich, and Rafail Zalmonovich Chasminskij. 1981. Statistical Estimation, Asymptotic Theory. New York: Springer.
- Jiménez-Gamero, M. Dolores, Sangyeol Lee, and Simos G. Meintanis. 2020. Goodness-of-fit tests for parametric specifications of conditionally heteroscedastic models. TEST 29: 682–703.
- Kim, Hanwool, and Sangyeol Lee. 2017. On first order integer-valued autoregressive process with Katz family innovations. Journal of Statistical Computation and Simulation 87: 546–62.
- Kocherlakota, Subrahmaniam, and Kathleen Kocherlakota. 1992. Bivariate Discrete Distributions. New York: Marcel Dekker Inc.
- Lakshminarayana, J., S. N. Narahari Pandit, and K. Srinivasa Rao. 1999. On a bivariate Poisson distribution. Communications in Statistics—Theory and Methods 28: 267–76.
- Latour, Alain. 1997. The multivariate GINAR(p) process. Advances in Applied Probability 29: 228–48.
- Lee, Youngmi, Sangyeol Lee, and Dag Tjøstheim. 2018. Asymptotic normality and parameter change test for bivariate Poisson INGARCH models. TEST 27: 52–69.
- Leucht, Anne, Jens-Peter Kreiss, and Michael H. Neumann. 2015. A model specification test for GARCH(1,1) processes. Scandinavian Journal of Statistics 42: 1167–93.
- Liu, Heng. 2012. Some Models for Time Series of Counts. Ph.D. thesis, Columbia University, New York, NY, USA.
- Mamode Khan, Naushad Ali, Yuvraj Sunecher, Vandna Jowaheer, Miroslav M. Ristić, and Maleika Heenaye–Mamode Khan. 2019. Investigating GQL–based inferential approaches for non-stationary BINAR(1) model under different quantum of over-dispersion with application. Computational Statistics 34: 1275–313.
- McKenzie, Eddie. 1985. Some simple models for discrete variate time series. Water Resources Bulletin 21: 645–50.
- McKenzie, Eddie. 2003. Discrete variate time series. Stochastic Processes: Modelling and Simulation. In Handbook of Statistics. Amsterdam: Elsevier, vol. 21, pp. 573–606.
- Meintanis, Simos G., and Dimitris Karlis. 2014. Validation tests for the innovation distribution in INAR time series models. Computational Statistics 29: 1221–41.
- Neumann, Michael H. 2011. Absolute regularity and ergodicity of Poisson count processes. Bernoulli 17: 1258–84.
- Partrat, Christian. 1994. Compound model for two dependent kinds of claim. Insurance: Mathematics and Economics 15: 219–31.
- Pedeli, Xanthi, and Dimitris Karlis. 2011. A bivariate INAR(1) process with application. Statistical Modelling 11: 325–49.
- Pedeli, Xanthi, and Dimitris Karlis. 2013a. On composite likelihood estimation of a multivariate INAR(1) model. Journal of Time Series Analysis 34: 206–20.
- Pedeli, Xanthi, and Dimitris Karlis. 2013b. On estimation of the bivariate Poisson INAR process. Communications in Statistics—Simulation and Computation 42: 514–33.
- Pedeli, Xanthi, and Dimitris Karlis. 2013c. Some properties of multivariate INAR(1) processes. Computational Statistics & Data Analysis 67: 213–25.
- Popović, Predrag M., Aleksandar S. Nastić, and Miroslav M. Ristić. 2018. Residual analysis with bivariate INAR models. REVSTAT 16: 349–64.
- R Core Team. 2019. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.
- Santos, Cláudia, Isabel Pereira, and Manuel G. Scotto. 2019. On the theory of periodic multivariate INAR processes. Statistical Papers.
- Schweer, Sebastian. 2016. A goodness–of–fit test for integer valued autoregressive processes. Journal of Time Series Analysis 37: 77–98.
- Schweer, Sebastian, and Christian H. Weiss. 2014. Compound Poisson INAR(1) processes: Stochastic properties and testing for overdispersion. Computational Statistics & Data Analysis 77: 267–84.
- Scotto, Manuel G., Christian H. Weiss, and Sónia Gouveia. 2015. Thinning-based models in the analysis of integer-valued time series: A review. Statistical Modelling 15: 590–618.
- Shi, Peng, Xiaoping Feng, and Jean-Philippe Boucher. 2016. Multilevel modeling of insurance claims using copulas. The Annals of Applied Statistics 10: 834–36.
- Steutel, Fred W., and Klaas van Harn. 1979. Discrete analogues of self-decomposability and stability. Annals of Probability 7: 893–99.
- Vernic, Raluca. 1997. On the bivariate generalized Poisson distribution. ASTIN Bulletin 27: 22–32.
- Weiss, Christian H. 2018a. Discrete-Valued Time Series. Hoboken: John Wiley & Sons.
- Weiss, Christian H. 2018b. Goodness–of–fit testing of a count series’ marginal distribution. Metrika 81: 619–51.
- Weiss, Christian H., Annika Homburg, and Pedro Puig. 2019. Testing for zero inflation and overdispersion in INAR(1) models. Statistical Papers 60: 473–98.
- Weiss, Christian H., and Sebastian Schweer. 2015. Detecting overdispersion in INARCH(1) processes. Statistica Neerlandica 69: 281–97.
- Zhu, Rong, and Harry Joe. 2006. Modelling count data time series with Markov processes based on binomial thinning. Journal of Time Series Analysis 27: 725–38.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).