Partial Autocorrelation Diagnostics for Count Time Series

In a time series context, the study of the partial autocorrelation function (PACF) is helpful for model identification. Especially in the case of autoregressive (AR) models, it is widely used for order selection. During the last decades, the use of AR-type count processes, i.e., which also fulfil the Yule–Walker equations and thus provide the same PACF characterization as AR models, increased a lot. This motivates the use of the PACF test also for such count processes. By computing the sample PACF based on the raw data or the Pearson residuals, respectively, findings are usually evaluated based on well-known asymptotic results. However, the conditions for these asymptotics are generally not fulfilled for AR-type count processes, which deteriorates the performance of the PACF test in such cases. Thus, we present different implementations of the PACF test for AR-type count processes, which rely on several bootstrap schemes for count times series. We compare them in simulations with the asymptotic results, and we illustrate them with an application to a real-world data example.


Introduction
Autoregressive (AR) models for time series date back to Walker [1], Yule [2], and they assume the current observation of the considered process to be generated from its own past by a linear scheme. The ordinary pth-order AR-model for a real-valued process (Z t ) t∈Z={...,−1,0,1,...} , abbreviated as AR(p) model, is defined by the recursive scheme where the innovations (ε t ) Z are independent and identically distributed (i. i. d.) real-valued random variables (rv), which are also assumed to be square-integrable ("white noise").
To ensure a (weakly) stationary and causal solution for the AR(p) recursion (1), the ARparameters α 1 , . . . , α p ∈ R have to be chosen such that the roots of the characteristic polynomial α(z) = 1 − α 1 z − . . . − α p z p are outside the unit circle. Then, if the innovations (ε t ) Z follow a normal distribution, also the observations (Z t ) Z are normal, leading to the Gaussian AR(p) process. A characteristic property of the AR(p) process is given by the fact that its autocorrelation function (ACF), ρ(h) = Corr[Z t , Z t−h ] with h ∈ N = {1, 2, . . .} and ρ(0) = 1, satisfies the following set of linear equations: These Yule-Walker (YW) equations, in turn, give rise to define the partial autocorrelation function (PACF), ρ part (h) with time lags h ∈ N, in the following way (see Appendix A for further details): if R k := ρ(|i − j|) i,j=1,...,k and r k := ρ(1), . . . , ρ(k) ∈ R k for k = 1, 2, . . ., and if a k ∈ R k denotes the solution of the equation R k a k = r k , then the PACF at lag k is defined as the last component of a k , i.e., ρ part (k) := a k,k . Hence, if the YW-equations (2) hold, it follows that ρ part (p) = α p , ρ part (h) = 0 for all h > p. (3) This characteristic abrupt drop-down of the PACF towards zero after lag h = p is commonly used for model identification in practice, namely by inspecting the sample PACF for such a pattern, see the Box-Jenkins program dating back to Box & Jenkins [3]. Details on the PACF's computation are summarized in Appendix A. There, we also provide a brief discussion on some equivalences between ACF, PACF, and the AR-coefficients, in the sense that the AR(p) model (1) is characterized equivalently by either α 1 , . . . , α p , or ρ(1), . . . , ρ(p), or ρ part (1), . . . , ρ part (p).
Since the introduction of the ordinary AR(p) model, several other AR-type models have been proposed in the literature, not only for real-valued processes, but also for different types of quantitative processes such as count processes (and even for categorical processes), see the surveys by Holan et al. [4], Weiß [5]. In the present work, the focus is on (stationary and square-integrable) AR-type count processes (X t ) Z , i.e., where the X t have a quantitative range contained in N 0 = {0, 1, . . .}. Here, the AR(p) structure is implied by requiring the conditional mean at each time t to be linear in the last p observations [6], i.e., because then, the YW-equations (2) immediately follow by using the law of total covariance. Note that one also has to require α 0 > 0 and α 1 , . . . , α p ≥ 0, as the counts X t are non-negative rvs having a truly positive mean, computed as µ = α 0 /(1 − α 1 − . . . − α p ). The considered class of count processes satisfying (4) covers many popular special cases, such as the INAR(p) model (integer-valued AR) by Du & Li [7], the INARCH(p) model ('CH' = conditional heteroscedasticity) by Ferland et al. [8], or their bounded-counts counterparts discussed in Kim et al. [9]; see Section 2 for further details. These count processes satisfying (4), however, are not truly linear processes: by contrast to (1), there is no linear relation between their observations. As all these AR(p)-like count processes satisfy the YW-equations (2) and, thus, the PACF characterization (3), it is common practice to employ the sample PACF (SPACF) for model identification given a count time series X 1 , . . . , X n . More precisely, one commonly computes the SPACF from X 1 , . . . , X n ,ρ part (h) for h = 1, 2, . . ., and checks for the pattern (3) among those SPACF values that are classified as being significantly different from zero. An analogous procedure is common during a later step of the Box-Jenkins program. After having fitted a model to the data, one commonly computes the Pearson residuals to check the model adequacy; see Weiß [5], Jung & Tremayne [10] as well as Section 2. While, for an adequate model fit, the Pearson residuals are expected to be uncorrelated, significant SPACF values computed thereof would indicate that the fitted model does not adequately capture the true dependence structure. In both cases, practitioners usually evaluate the significance ofρ part (h) based on the following asymptotic result (see [11] (Theorem 8.1.2)): i.e., the valueρ part (h) is compared to the critical values ±z 1−α/2 / √ n to test the null hypothesis of an AR(h − 1) process on level α. Here, N(µ, σ 2 ) denotes the normal distribution with mean µ and variance σ 2 , and z γ abbreviates the γ-quantile of N(0, 1). The aforementioned critical values are automatically plotted in SPACF plots by common statistical software, e.g., if one uses the command pacf in R. However, Theorem 8.1.2 in Brockwell & Davis [11] assumes that the SPACF is computed from a truly linear AR(p) process as in (1), which is neither the case for the aforementioned AR-type count processes, nor for the Pearson residuals computed thereof. Thus, it is not clear if the approximation (5) is asymptotically correct and sufficiently precise in finite samples. In fact, some special asymptotic results in Kim & Weiß [12], Mills & Seneta [13], see Section 3 for further details, as well as some simulation results for Pearson residuals in Weiß et al. [14] indicate that this is generally indeed not the case.
Therefore, several alternative ways of implementing the PACF-test are presented in Section 4, namely relying on different types of bootstrap schemes for count time series. The performance of these bootstrap implementations compared to the asymptotic ones is analyzed in a comprehensive simulation study. In Section 5, we start with the case where the SPACF is applied to the original count time series (X t ) with the aim of identifying the AR model order. Afterwards in Section 6, we consider the case of applying the SPACF to the (non-integer) Pearson residuals computed based on a model fit, i.e., the SPACF is used for checking the model adequacy. Our findings are also illustrated by a real-data example on claims counts in Section 7. Here, the computations and simulations in Sections 5-7 have been performed with the software R, and the documented R-code for Section 7 is provided in the Supplementary Materials to this article. Further R-codes can be obtained from the corresponding author upon request. We conclude the article in Section 8.

On AR-Type Count Time Series and Pearson Residuals
Several (stationary and square-integrable) AR-type count processes (X t ) Z , which also have a conditional linear mean according to (4), have been discussed in the literature. Most of these processes either follow a model recursion using so-called thinning operators (typically referred to as INAR models), or they are defined by specifying the conditional distribution of X t |X t−1 , . . . together with condition (4), leading to INARCH models, see Weiß [5] [15]. Having the parameter α ∈ (0; 1) and being applied to a count rv X, it is defined by the conditional binomial distribution α • X|X ∼ Bin(X, α), where the boundary cases are included as 0 • X = 0 and 1 • X = X. Let ( t ) Z be squareintegrable i. i. d. count rv, denote µ = E[ t ] and σ 2 = V[ t ]. Then, the INAR(p) process (X t ) Z is defined by the recursion where all thinnings are executed independently of each other, and where α • := ∑ p j=1 α j < 1 is assumed to ensure a stationary solution. The INAR(p) process (6) constitutes a pth-order Markov process, the transition probabilities of which are a convolution between the p binomial distributions Bin(X t−1 , α 1 ), . . . , Bin(X t−p , α p ) and the innovations' distribution [16] (p. 469). The conditional mean satisfies (4) with α 0 = µ , and the conditional variance is given by see Drost et al. [16] (p. 469). The default choice for t in the literature is a Poisson (Poi) distribution (which is the integer counterpart to the normal distribution), leading to the Poi-INAR(p) process. However, any other (non-degenerate) count distribution for t might be used as well, such as the negative-binomial (NB) distribution for increased dispersion, leading to the NB-INAR(p) process. In the case of such a parametric specification for t , ones computes the moments µ , σ 2 according to this model, and then the conditional mean and variance according to (4) and (7), respectively. The INARCH(p) model of Ferland et al. [8] directly assumes the conditional linear mean (4) to hold, and then specifies the conditional distribution of X t |X t−1 , . . . In Ferland et al. [8], the case of a conditional Poi-distribution is assumed, i.e., altogether such that the conditional variance of this Poi- However, other choices for the conditional distribution of X t |X t−1 , . . . have been investigated in the literature; see [5] (Section 4.2). For parameter estimation, one commonly uses either simple method-of-moment (MM) estimators (i.e., derived from marginal sample moments and the sample ACF, also see Appendix A), or the more advanced conditional maximum likelihood (CML) estimators, which are computed by using a numerical optimization routine (see [5] (Section 2.2)). It should be noted that for the INAR(p) model, also a semi-parametric specification exists (where the innovations' distribution is left unspecified). The corresponding semiparametric CML estimator was analyzed by Drost et al. [16]; see also the small-sample refinement by Faymonville et al. [17]. It leads to non-parametric estimates for the probabilities p ,k = P( t = k) for k between some finite bounds 0 ≤ l < u < ∞ (and p ,k = 0 for k ∈ {l, . . . , u}), which can then be used for computing µ , σ 2 as required for the conditional moments (4) and (7). More precisely, the r th moment, r ∈ N, is given by E[ r t ] = ∑ u k=l k r p ,k . After having fitted a model to the count time series X 1 , . . . , X n , a widely used approach for checking the model adequacy is to investigate the corresponding (standardized) Pearson residuals [5,10,14,18,19]. Let the parameters of the considered AR(p)-type model be comprised in the vector θ, and letθ denote the estimated parameters of the fitted model. Furthermore, let us write the conditional mean as E[X t | X t−1 , . . . ; θ] and the conditional variance as V[X t | X t−1 , . . . ; θ] to express their dependence on the actual parameter values. Then, the Pearson residuals are defined as If the fitted AR(p)-type model is adequate for X 1 , . . . , X n , then R p+1 , . . . , R n should have a sample mean (variance) close to zero (one), and they should be uncorrelated. These necessary criteria are then used as adequacy checks. In the present research, our focus is on the SPACF computed from R p+1 , . . . , R n , which, for an adequate model fit, should not have values being significantly different from zero.

Some Asymptotic Results for the Sample PACF
The basic asymptotic result (5), which has been shown for the SPACF being computed from a true AR(p) process, has been extended in several directions. First, some refinements have been derived by Anderson [20,21] and further investigated by Kwan [22], who, however, assume the data-generating process (DGP) to be i. i. d. Gaussian, i.e., neither AR dependence nor count rvs are covered by their results. More precisely, Anderson [20] complements the asymptotic variance 1/n in (5) by the following O(n −2 ) approximation of the mean: While the Gaussian assumption is weakened by the statement that the result (10) "seems likely to have some validity for many non-Gaussian distributions" [20] (p. 406), the i. i. d.assumption is not relaxed.
While the O(n −3 ) extension in (11) seems relevant only for very small sample sizes n, the alternating pattern for the mean in (10) might affect the performance of the normal approximation also for larger n.
Another extension of the basic asymptotic result (5) is due to Kim & Weiß [12], Mills & Seneta [13]. These authors consider two particular types of AR(1) count process, namely a Poi-INAR(1) and a binomial AR(1) process, respectively, and derive an O(n −2 ) approximation of V ρ part (h) for h ≥ 2. While their exact formulae are not relevant for the present research, the crucial point is as follows: In both cases, the approximate variance is of the form (1 + c)/n, where c is inverse proportional to the mean µ, and also depends on the value of ρ(1). Especially for low means µ, the numerator 1 + c deviates notably from 1. Hence, the basic asymptotics (5) do not hold for these types of count process. An analogous conclusion can be drawn from the simulation results in Weiß et al. [14] (Table 1), where the rejection rate for the SPACF of the Pearson residuals (with CML-fitted Poi-INAR(p) model) under the basic asymptotic critical values (5) is analyzed. These rejection rates are often below the intended level, which indicates that (5) does not hold here.
These possible drawbacks of existing asymptotic results are illustrated by Figure 1. The upper panel refers to the mean of SPACF(h), which is either computed from 10 4 simulated Poi-INAR(1) time series (black and dark grey bars), or according to the refined asymptotic result (11) (light grey bars). Note that the sample size n = 1000 was chosen rather large such that sample properties and (true) asymptotic properties should agree reasonably well. In Figure 1a, where the SPACF is computed from the raw counts (X t ), we omit plotting the mean at h = 1 as this would violate the graphic's Y-range (recall that ρ part (1) = α). From (a) and (b), we recognize that the simple asymptotics (5), where the mean of SPACF is approximated by zero, would be misleading in practice, because a negative bias with an oscillating pattern (odd vs. even lags) is observed. As a consequence, if testing the PACF based on (5) and thus ignoring the bias, we may get unreliable sizes, which is also later observed in our simulation studies. The alternating pattern of the bias in (a) and (b) is similar to the refined asymptotics (11). However, we do not observe an exact agreement to (11), as the simulated means seem to depend on the actual value of the AR-parameter α. The effect of α gets much stronger in (c), where even positive bias values for low h are observed, contradicting (11). This is caused by the use of the MM estimator, which is known to be increasingly biased with increasing α [23]; a possible solution for practice could be to use a bias-corrected version of the MM estimator. The lower panel in Figure 1 shows the corresponding standard deviations (SDs). The strongest deviation between simulated and asymptotic results is observed for lag h = 1, followed by lag h = 2. In particular, for both types of Pearson residuals and both h = 1, 2, the asymptotic SD from (11) is too large (and the asymptotic SD according to (5) would even be larger) such that a corresponding PACF-test is expected to be conservative (which is later confirmed by our simulation study). Therefore, it seems advisable to look for other ways of implementing the PACF-test, neither relying on (5) nor (11). An approximation based on asymptotic results does not look promising in general, as we expect the asymptotics to highly depend on the actual DGP, recall the aforementioned results by Kim & Weiß [12], Mills & Seneta [13]. Thus, in what follows, our idea is to try out different types of bootstrap implementations, i.e., the true distribution of the SPACF is approximated by appropriate resampling schemes. This might also allow to account for the effect of the selected estimator when computing the Pearson residuals.

Bootstrap Approaches for the Sample PACF
Let ϑ denote the parameter of interest for the actual DGP (Y t ), and letθ = T(Y 1 , . . . , Y n ) denote an estimate of it (in the present research, this parameter is the (S)PACF at some lag h ∈ N). Analogously, let (Y * t ) denote a corresponding bootstrap DGP, and letθ * = T(Y * 1 , . . . , Y * n ) be the estimator obtained from a bootstrap sample. If E * [·] denotes the expectation operator of the bootstrap DGP, that is, conditional on the data X 1 , . . . , X n , then the centered bootstrap estimate is given byθ * cent :=θ * − E * θ * . A common approach for constructing a two-sided bootstrap confidence interval (CI) for ϑ with confidence level 1 − α ∈ (0; 1) is given by where q γ (·) denotes the γ-quantile, see Hall [24]. The bootstrap CI (12) is used for testing the null hypothesis "H 0 : ϑ = ϑ 0 " on level α by applying the following decision rule: reject H 0 if ϑ 0 is not contained in the CI (12). This implies the equivalent decision rule to In the present article, ϑ refers to the PACF at lag h, computed from either the original count process (X t ), or from the Pearson residuals (R t ) obtained after model fitting. In both cases, the PACF at lag h is tested against the hypothetical value ϑ 0 = 0, as it would be the case for an AR-type process of order < h.
If we apply the PACF to the original count time series X 1 , . . . , X n , then the following setups are considered: If we apply the PACF to the Pearson residuals (R t ), then again (semi-)parametric setups are considered, where also model fitting is replicated based on the bootstrap time series, as well as the subsequent computation of Pearson residuals based on the bootstrap model fit. This time, a centering is not necessary. Non-parametric bootstrap schemes can be directly applied to the original Pearson residuals (without the need for model fitting during bootstrap replication). Under the null of model adequacy, we expect the available Pearson residuals to be uncorrelated. Thus, a first idea is to apply the classical Efron bootstrap [27], although this bootstrap scheme actually requires i. i. d. data. Therefore, as a second idea, we also apply the aforementioned block bootstrap to (R t ) to account for possible non-linear dependencies.

Remark 1.
For implementing the (semi-)parametric INAR bootstraps, or for computing the Pearson residuals with respect to an INAR model, the model parameters have to be estimated. The following approaches are used for this purpose:

PACF Diagnostics for Raw Counts
In the first part of our simulation study, we analyze the performance of the asymptotic and (semi-)parametric implementations of PACF-tests if these are applied to the raw counts (X t ) (the results of the non-parametric bootstrap schemes are discussed separately in Remark 2). We consider 1st-and 2nd-order AR-type DGPs, where the aim of applying the PACF-tests (nominal level 0.05) is the identification of the correct AR-order p. As the bootstrap versions of these tests are computationally very demanding (especially the semi-parametric INAR bootstrap), we use the warp-speed method of Giacomini et al. [28] for executing the simulations. This, in turn, allows us to use 10 4 replications throughout our simulation study. We also cross-checked that the achieved rejection rates are close to those obtained by a traditional bootstrap implementation with B = 500 bootstrap replications per simulation run. All simulations have been done with the software R, and R-codes can be obtained from the corresponding author upon request. Table 1 shows the rejection rates of the PACF-tests for different types of AR(1)-like count DGP, recall Section 2. There, the PACFs are computed from a simulated count time series x 1 , . . . , x n of length n, where the choice n = 100 (n = 1000) represents the small (large) sample behaviour. The results refer to the medium autocorrelation case ρ(1) = 0.5, but further results for ρ(1) ∈ {0.25, 0.75} are summarized in Appendix B, see Table A1. Five implementations of the PACF-test are considered: using the simple asymptotic approximation (5) or the refined one (11) (recall Section 3), using the parametric Poi-INAR(1) bootstrap with either MM or CML estimates, and using the semi-parametric INAR(1) bootstrap with CML estimates (recall Section 4). If first looking at the block "Poi-INAR(1) DGP" in Table 1, we recognize that all implementations perform roughly the same, i.e., the rejection rate at lag h = 1 (expressing the power of the PACF-test) is close to 1, and the rejection rates at lags h ≥ 2 (expressing the size) are close to the 0.05level. It should be noted, however, that for ρ(1) = 0.25, see Table A1, the asymptotic implementations have notably less power at lag h = 1. An analogous conclusion holds for the NB-INAR(1) block in Table 1, although now, the model behind the parametric Poi-INAR(1) bootstrap is misspecified. So the parametric bootstrap exhibits robustness properties in finite samples. In the third block, "Poi-INARCH(1)", also the semi-parametric bootstrap is misspecified, but again the rejection rates are robust for ρ (1) Table A1, however, we observe size exceedences for lags h ≥ 2, i.e., the misspecification of Poi-INARCH(1) as Poi-INAR(1) is not negligible anymore for this DGP. This is plausible in view of Remark 4.1.7 in Weiß [5], where it is shown that these models lead to different sample paths for high autocorrelation. Much more surprising, also both asymptotic implementations deteriorate (even more severely) for a Poi-INARCH(1) DGP with ρ(1) = 0.75, see Table A1, i.e., we get too many false rejections in any case. Thus, if one anticipates that the data are generated by an INARCH process, a tailor-made parametric bootstrap implementation of the PACF-tests should be used. Table 1. Rejection rates of PACF-tests applied to DGP with µ = 5 and ρ(1) = 0.5, where semiparametric (parametric) bootstrap relies on null of (Poi-)INAR(1) process. Let us continue our performance analyses by turning to 2nd-order DGPs. In Table 2, the (semi-)parametric bootstrap schemes are still executed by (erroneously) assuming a 1st-order INAR DGP (like in Table 1), i.e., they are affected by a (further) source of model misspecification. But as seen from the rejection rates in Table 2, we still have good size (h ≥ 3) and power values (h = 1, 2), comparable to those of the refined asymptotic implementation (11). By contrast, the simple asymptotic (5) leads to a clearly reduced power at lag h = 2. Finally, in Table 3, the bootstrap schemes now correctly assume a 2nd-order INAR DGP, i.e., we only have the following misspecifications left: parametric Poi-INAR(2) bootstrap applied to NB-INAR(2) or Poi-INARCH(2) DGP, and semi-parametric INAR(2) bootstrap applied to Poi-INARCH(2) DGP. It can be seen that the parametric bootstrap using MM estimates as well as the semi-parametric bootstrap lead to improved power at lag h = 2, whereas the parametric CML-setup even deteriorates (especially under Poi-INARCH(2) misspecification). The latter observation fits well to later results in Section 6, where the parametric bootstrap with CML estimates does again worse than its MM-or semi-CML-counterparts. This can be explained by the fact that for a fully parametric CML approach, model misspecification affects the estimates of all parameters simultaneously, while for the MM approach, for example, the estimation of mean and dependence parameters coincide across all three types of DGP. So it does not seem advisable to use a fully parametric bootstrap in combination with CML estimation for PACF diagnostics.  Table 3. Rejection rates of PACF-tests applied to DGP with µ = 5, ρ(1) = 0.5 and α 2 = 0.2, where semi-parametric (parametric) bootstrap relies on null of (Poi-)INAR(2) process. To sum up, if computing the SPACF from the raw counts (X t ), with the aim of identifying the AR-order of the given count DGP, the overally best performance is shown by the MM-based parametric and CML-based semi-parametric bootstrap implementation of the PACF-test, but also the refined asymptotic implementation relying on (11) does reasonably well. The latter is remarkable as these asymptotics are not the correct ones regarding the considered count DGPs (also recall the discussion of Figure 1), but it appears that their approximation quality is sufficient anyway. The simple asymptotic implementation (5), by contrast, as it is used by statistical software packages by default, leads to reduced power in some cases. From a practical point of view, as the additional benefit of the (semi-)parametric bootstrap schemes compared to the refined asymptotic implementation (11) is not that large, especially in view of the necessary computational effort, it seems advisable for practice to use (11) for doing the PACF-test. Recall that this recommendation refers to the case, where the SPACF is computed from the raw counts (X t ) to identify the DGP's AR-order. The case of applying the PACF-test to Pearson residuals for checking the model adequacy is analyzed in the following Section 6.

PACF Diagnostics for Pearson Residuals
While the raw counts' SPACF is typically computed before model fitting (namely for identifying appropriate candidate models), the PACF-analysis of the Pearson residuals is relevant after model fitting, namely for checking the fitted model's adequacy. Thus, the main difference of the simulations in the present section, compared to those of Section 5, is given by the fact that this time, we first fit a (Poi-)INAR model to the data, and then we apply the SPACF to the Pearson residuals computed thereof. For Poi-INAR model fitting, we again use either MM-or CML-estimation, and then we apply the asymptotic or corresponding parametric bootstrap implementations (like before, we use the warpspeed method). An exception is given by the semi-parametric CML estimation, as in this case, also the semi-parametric bootstrap is used for methodological consistency (and Pearson residuals are computed with respect to an unspecified INAR model). We also consider the same scenarios of model orders as before, i.e., 1st-order DGPs and INAR(1)-fit (Tables 4 and 7), 2nd-order DGPs but still INAR(1)-fit (Tables 5 and 8), and 2nd-order DGPs and INAR(2)-fit (Tables 6 and 9). Recall that the fitted model is now used for both the computation of the Pearson residuals and the implementation of (semi-)parametric bootstrap schemes.  Table 4 (1st-order models and DGPs; also see Table A3 in the Appendix B), we recognize that both asymptotic implementations lead to undersizing at lags h = 1, 2 (particularly severe at h = 1). This is in close agreement to our conclusions drawn from Figure 1 as well as to the findings of Weiß et al. [14]. An analogous observation can be done in Table 6 (2nd-order models and DGPs), but now for lags h = 1, 2, 3 (particularly severe at h = 1, 2). In both cases, however, the MM-based parametric bootstrap holds the nominal 0.05-level reasonably well. The drawback resulting from this undersizing gets clear in Table 5, where the wrong AR-order was selected during model fitting: the asymptotic implementations lead to a very low power for sample size n = 100, implying that one will hardly recognize the inadequate model choice. Thus, if model assumptions are used anyway for computing the Pearson residuals, the asymptotic implementation should be avoided, but the model assumptions should also be utilized for executing the PACF-test by using the parametric bootstrap scheme. As a final remark, strictly speaking, we are always concerned with model misspecification if having an NB-INAR or Poi-INARCH DGP. However, all three DGPs per table have the same conditional mean and, thus, the same autocorrelation structure, only their conditional variances differ. Also the MM-estimates required for computing the conditional mean are identical across all models. Thus, it is not surprising that the rejection rates of the PACF-tests do not differ much among these three types of DGP (but again with slight oversizing for the Poi-INARCH DGPs).
Finally, we did the same simulations again, but using CML instead of MM estimation. Table 7 (as well as Table A5 in the Appendix B) refer to the case of both 1st-order models and 1st-order DGPs. In the first block, where the parametric Pearson residuals are computed by correctly assuming a Poi-INAR(1) DGP, we have again strong undersizing at lag 1 for the asymptotic implementation, but a close agreement to the nominal 0.05-level for the parametric bootstrap. The remaining blocks with NB-INAR(1) and Poi-INARCH(1) DGP, however, differ notably from the corresponding blocks in Tables 7 and A5, respectively. This is plausible as the parametric CML approach for a misspecified model leads to misleading estimates for all parameters. While MM estimation leads to the same estimates for the dependence parameters across the three 1st-order models, these differ for parametric CML estimation. Therefore, we have high rejection rates especially at lag 1 (especially if using the parametric bootstrap), which is desirable on the one hand as the fitted model is indeed not adequate. On the other hand, we did not misspecify the (P)ACF structure (a 1st-order model is correct for all DGPs) but the actual data-generating mechanism, i.e., a user might draw the wrong conclusion from this rejection based on the lag-1 PACF. At this point, it is interesting to look at the semi-parametric model fit and bootstrap in Table 7. For both INAR(1) DGPs, the rejection rates are close to the 0.05-level, which is the desirable result as we are concerned with an adequate model fit. For the Poi-INARCH(1) DGP, by contrast, we get moderately increased rejection rates at lag 1, which again has to be assessed ambivalently: on the one hand, the fitted INAR(1) model is indeed not adequate, but on the other hand, the inadequacy does not refer to the autocorrelation structure.     Table 9. Rejection rates of PACF-tests applied to Pearson residuals using CML estimates (DGPs with µ = 5, ρ(1) = 0.5, and α 2 = 0.2), where both residuals and bootstrap rely on null of Poi-INAR(2) process (parametric bootstrap) or unspecified INAR(2) process (semi-parametric bootstrap), respectively. Essentially analogous conclusions can be drawn from Table 9, where we are concerned with both 2nd-order models and 2nd-order DGPs. So let us turn to Table 8, where 1storder models are fitted to 2nd-order DGPs. Thus, we are concerned with at least an inadequate autocorrelation structure (and sometimes also further model misspecification) such that high rejection rates are desirable. Let us start with the first block about the Poi-INAR(2) DGP. As a consequence of the strong undersizing at lag 1, the parametric bootstrap, and especially the asymptotic implementations, show relatively low power values, especially for the small sample size n = 100. The semi-parametric bootstrap, by contrast, has substantially higer power values at lag 1. For lags h ≥ 2, the rejection rates are similar between the different implementations, with a slight advantage for the refined asymptotics as well as the parametric bootstrap. The discrepancy at lag 1 gets even more extreme for the NB-INAR(2) and Poi-INARCH(2) DGP, then all other implementations than the semi-parametric one lead to power close to zero. For lags 2 and 3, by contrast, the refined asymptotics as well as the parametric bootstrap are again more powerful. However, looking back to Table 5, it seems that the overall most appealing power is shown by the MM-based parametric bootstrap. This type of bootstrap also has the advantage that the necessary computational effort is much less than for the CML-based bootstraps. Thus, altogether, while we recommended to use the refined asymptotics (11) if testing the PACF computed from the raw counts, the PACF analysis of Pearson residuals should be done by the MM-based parametric bootstrap: if computing the Pearson residuals from an MMfitted Poi-INAR model, and if using this model fit for parametric bootstrap, one has good size properties and an appealing power performance at the same time. Certainly, this recommendation does not exclude to do CML-fitting in a second step, once the correct AR-order has been identified. But during the phase of model diagnostics, at least if n is not particularly large, the parametric-MM solution seems to be best suited.

Remark 2.
As mentioned in Section 4, we also tried out fully non-parametric bootstrap schemes. For the case where the PACF-tests are applied to the raw counts (X t ), as discussed in Section 5, the circular block bootstrap was used as a fully non-parametric setup, see Table A2 in the Appendix B for the obtained results. While these implementations lead to an appealing power at lag h = 1, strong size deteriorations are observed for h ≥ 2. The strongest deviations are observed for the fixed block length b = 5. Increasing b, first the low-lag rejection rates stabilize at 0.05, while we have undersizing for large lags. For b = 20, 25, we have good sizes for h = 5, 6, but now the low lags lead to exceedances of 0.05. Thus, tailor-made block lengths would be required for different lags h. The automatic block-length selection via b.star typically leads to block lengths between 5 and 10 (depending on the actual extent of ρ(1)), but this causes undersizing throughout, getting more severe for increasing h. The reason why b.star tends to pick block lengths that are too small to capture dependence at larger lags is given by the fact that it is designed to select a block length suitable for inference about the sample mean, but not for the sample PACF. In view of the aforementioned size problems and the unclear choice of block lengths, we discourage from using block-bootstrap implementations of the PACF-test for analyzing the raw counts data.
If doing a PACF-analysis of the Pearson residuals, as we investigate it in the present Section 6, then, besides block-bootstrap implementations, also the Efron bootstrap appears reasonable for this task. For the case where the Pearson residuals rely on MM estimates, simulation results are summarized in Table A4 in the Appendix B. If doing an automatic block-length selection via b.star, we often end up with block length 1 (as the Pearson residuals are uncorrelated under model adequacy). Thus, the b.star-block bootstrap shows nearly the same rejection rates as the Efron bootstrap, but these are too low at lags h = 1, 2, like for the asymptotic implementations. Increasing the block length to the fixed values b = 5 or b = 10, we get an even further decrease in size. Therefore, neither Efron nor block bootstrap offer any advantage compared to the asymptotic implementations. Analogous conclusions hold if model fitting is done by CML estimation, see Table A6 in the Appendix B, so we discourage from the use of Efron and block bootstrap also if doing a PACF-test of the Pearson residuals.

Real-Data Application
For illustration, we pick up a widely discussed data example from the literature, namely the claims counts data introduced by Freeland [29]. These counts express the monthly number of claims caused by burn-related injuries in the heavy manufacturing industry for the period 1987-1994, i.e., the count time series is of length n = 96; see Figure 2. Recall that the R-code used for the subsequent computations is provided in the Supplementary Materials. Freeland [29] suggested to model these data by a Poi-INAR(1) model, but following the discussions of subsequent authors, this model choice is not without controversy. For example, the marginal distribution exhibits moderate overdispersion, as the sample variance 11.357 exceeds the mean 8.604. Therefore, some authors suggested to consider an NB-INAR(1) or Poi-INARCH(1) model instead. Furthermore, one may doubt the 1st-order AR-structure, see Weiß et al. [30], as the SPACF in Figure 2 is only slightly non-significant at lag h = 2, where the plotted critical values (dashed lines) refer to the PACF-test on level 0.05 based on the simple asymptotic implementation (5). Thus, altogether, we are concerned with a scenario that fits very well to our simulation study in Sections 5 and 6: the null hypothesis for the data is that of a Poi-INAR(1) model, but this model might be misspecified in terms of marginal distribution, model order, or the actual AR-type data-generating mechanism. Moreover, the sample size n = 96 and the lag-1 sample ACF 0.452 are close to the parametrizations considered there. In what follows, we apply the different implementations of the PACF-test to (the Pearson residuals computed from) the claims data. Certainly, as we do not know the true model behind the data, we are not in a position to pass definitive judgement on whether or not a test lead to the correct or wrong decision. But we shall discuss the PACF-tests with respect to our simulation results. Let us start with an analysis of the raw counts' SPACF, in analogy to Section 5. Table 10 summarizes the SPACF(h) values for h = 1, . . . , 5 (bold font) as well as the corresponding critical values (level 0.05). The latter are computed by the five methods considered in Section 5, with the number of bootstrap replications chosen as B = 1000. For the simple asymptotic implementation (5), as we have already seen in Figure 2, we get a rejection only at lag 1, whereas the remaining methods reject also at lag 2. Thus, there is indeed evidence that the data might stem from a higher-order model. In addition, the different lag-2 decisions for (5) vs. the remaining implementations appear plausible in view of Table 2, where we found clearly lower power for (5) at h = 2. Note that all critical values except (5) are visibly asymmetric, so the SPACF appears rather biased for n = 96. Furthermore, all bootstrap implementations lead to quite similar critical values, and the refined asymptotic implementation (11) is also similar to them except for the upper critical value at h = 1.
Next, we fit either a Poi-INAR(1) model to the claims counts (via MM or CML), or an unspecified INAR(1) model by the semi-parametric CML approach. Using the resulting model fits, we first compute a set of Pearson residuals for each model, and then the SPACF thereof, like in Section 6. The critical values are determined by both asymptotic approaches as well as by the bootstrap approach corresponding to the respective estimation method. Results are summarized in Table 11. We get only a few rejections anymore, namely for the CML-fit of the Poi-INAR(1) model at lag h = 2, both for the refined asymptotics and the parametric bootstrap. The remaining model fits do not lead to a rejection, and one might ask, why? The reason seems to be the respective estimate of the AR(1)-parameter α 1 = ρ(1), which equals only 0.396 for CML, but 0.452 for MM and 0.434 for semi-CML. So the CML-fit explains less of the dependency in the data. The deeper reason for this ambiguous outcome seems to be the low sample size n = 96; according to Section 6, we can generally expect only mild power values. It is again interesting to compare the different critical values.
For the Poi-INAR(1) CML-fit, bootstrap and refined asymptotics lead to rather similar critical values, in agreement with our simulation results in Section 6, where a similar performance of both methods was observed. For the remaining estimation approaches, the bootstrap critical values tend to be more narrow than the asymptotic ones, especially at lags 1 and 2. The strongest "shrinkage" of the critical values is observed for h = 1, which goes along with our findings in Section 6, where the asymptotic implementation lead to severe undersizing at lag 1, whereas the bootstrap approaches held the nominal level quite well. Furthermore, due to the narrower critical values, the MM and semi-CML bootstraps are also more powerful at lags 1 and 2.

Conclusions
In this paper, we considered PACF model diagnostics for AR-type count processes based on raw data and on Pearson residuals, respectively. At first, we illustrated the limitations of the widely used and well-known asymptotic distribution result (as well as some refinements thereof) for the sample PACF values. Then, we introduced appropriate bootstrap schemes for the approximation of the correct sample PACF distribution. We considered a fully parametric bootstrap combined with MM and CML estimation, a semiparametric bootstrap combined with CML estimation, and a fully non-parametric bootstrap scheme. We compared the performance of the different procedures for first-and secondorder AR-type count processes. In the case where we apply the PACF test directly to the raw count data, the best performance was observed for the MM-based parametric bootstrap, CML-based semi-parametric bootstrap, and the refined asymptotic results, where the latter are preferable for computing time reasons. By contrast, when applying the PACF test to the Pearson residuals, we advise using the MM-based parametric bootstrap procedure which simultaneously provides good size properties and power performance. Finally, we applied our different PACF procedures to a well-known data set on claims counts and found some evidence for a higher-order model. Acknowledgments: The authors thank the two referees for their useful comments on an earlier draft of this article.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. On the Equivalence of ACF, PACF, and AR Coefficients
The first p Yule-Walker (YW) equations in can be rewritten in vector-matrix notation as follows: For k ∈ N, let α k := α 1 , . . . , α k ∈ R k with α i = 0 for i > p, let r k := ρ(1), . . . , ρ(k) ∈ R k , and Then, (A1) implies that the AR(p) process satisfies the linear equation Note that R k constitutes a symmetric Toeplitz matrix, i.e., it is characterized by having constant diagonals. This type of matrix structure was first considered by Toeplitz [31,32], and it is crucial for efficiently solving (A3) in α p (see the details below).
Assume that R k is invertible, and let a k ∈ R k be the unique solution of the equation Then, the PACF at lag k is defined by ρ part (k) := a k,k (last component of a k ); let us denote If (X t ) Z follows an AR(p) model, then (A3) implies that holds; in particular, we have a p = α p . Because of the Toeplitz structure of R k , the YWequations (A3) can be solved recursively for k = 1, 2, . . ., which was first recognized by Durbin [33], Levinson [34]. The recursive scheme, which is commonly referred to as the Durbin-Levinson (DL) algorithm, can be expressed as Given the (sample) ACF, the DL-algorithm (A6) is used to recursively compute the (sample) PACF for k = 1, 2, . . ., where ρ part (1) = a 1,1 = ρ (1). Furthermore, applying the DL-algorithm to (A3), we can compute the AR parameters α 1 , . . . , α p corresponding to the ACF values ρ(1), . . . , ρ(p) (or if using the sample ACF, we end up with moment estimates for the AR parameters, referred to as YW-estimates). In R, this is readily implemented via acf2AR. Given the AR parameters, in turn, (A1) or (A3) can also be solved in the ACF, see Section 3.3 in Brockwell & Davis [11] as well as the R command ARMAacf.
The previous discussion shows that an AR(p) model can be characterized equivalently by either α p or r p . According to Barndorff-Nielsen & Schou [35], this type of "equivalent parametrization" can be further extended by the one-to-one relationship between α p and π p , i.e., we have one-to-one relations between r p ↔ α p ↔ π p . For computing α p from π p , Barndorff-Nielsen & Schou [35] suggest to use the DL-algorithm (A6) together with (A4) as follows: which is initialized by setting a 1,1 = ρ part (1). Then, α p = a p . Altogether, the application of the DL-algorithm allows the transformations By contrast, recall that α p → r p (and thus π p → α p → r p ) is done by solving (A1) or (A3) in the ACF, e.g., by using the "third method" in Brockwell & Davis [11] (Section 3.3) or R's ARMAacf. Table A1. Rejection rates of PACF-test applied to DGP with µ = 5, where semi-parametric (parametric) bootstrap relies on null of (Poi-)INAR(1) process.