Next Article in Journal
Tornado Occurrences in the United States: A Spatio-Temporal Point Process Approach
Previous Article in Journal
Bayesian Model Averaging with the Integrated Nested Laplace Approximation
Previous Article in Special Issue
Distributions You Can Count On …But What’s the Point?
Open AccessFeature PaperArticle

Maximum-Likelihood Estimation in a Special Integer Autoregressive Model

Institut für Volkswirtschaftslehre (520K) and Computational Science Lab (CSL) Hohenheim, Universität Hohenheim, D-70593 Stuttgart, Germany
Management School, University of Liverpool, Liverpool L69 7ZH, UK
Author to whom correspondence should be addressed.
Econometrics 2020, 8(2), 24;
Received: 25 February 2020 / Revised: 14 May 2020 / Accepted: 2 June 2020 / Published: 8 June 2020
(This article belongs to the Special Issue Discrete-Valued Time Series: Modelling, Estimation and Forecasting)


The paper is concerned with estimation and application of a special stationary integer autoregressive model where multiple binomial thinnings are not independent of one another. Parameter estimation in such models has hitherto been accomplished using method of moments, or nonlinear least squares, but not maximum likelihood. We obtain the conditional distribution needed to implement maximum likelihood. The sampling performance of the new estimator is compared to extant ones by reporting the results of some simulation experiments. An application to a stock-type data set of financial counts is provided and the conditional distribution is used to compare two competing models and in forecasting.
Keywords: autoregression; counts; maximum-likelihood; binomial-thinning autoregression; counts; maximum-likelihood; binomial-thinning

1. Introduction

First-order integer autoregressive (INAR) models have been studied extensively in the literature and applied in a wide range of fields. The models emerged following the work of McKenzie (1985) and Al-Osh and Alzaid (1987). Higher-order models that make it possible to capture more complex dependence structures in observed count time series are somewhat less well developed.
Various estimation methods for the parameters of these models, including method of moments and conditional least squares, have been widely used. For efficency considerations, maximum-likelihood estimation (MLE) may be desirable, when it is available. Al-Osh and Alzaid (1987) present details of MLE in the first-order case. Bu et al. (2008) extend this to a particular higher-order model with Poisson innovations using the construction of Du and Li (1991). Extensions to a semi-parametric MLE (where no specific distributional assumption is made for the arrivals) is provided in Drost et al. (2009) and Martin et al. (2011). Jung and Tremayne (2011) discuss MLE in a second order model using the random operator of Joe (see e.g., Joe 1996).
In their elegant paper, Bu et al. (2008) provide the necessary conditional distribution for the higher-order case when the thinning operations are performed independently of one another in their expression (9) and explicitly apply it to the second-order case in their expression (13) and subsequent computations. The parallel expression for a conditional distribution in the case of a pth-order model considered by Alzaid and Al-Osh (1990) has not been given. Indeed, Alzaid and Al-Osh (1990, sec. 6) were of the opinion that “... the MLE for the general model seems intractable”, probably because of the unavailability of a required conditional distribution. In this paper we show that the relevant conditional distribution needed for the likelihood is available, making MLE feasible. We focus on the second-order specification here.
The model of Alzaid and Al-Osh (1990) has been found useful for practical applications, see e.g., Jung and Tremayne (2006), in the context of stock-type counts exhibiting a pronounced positive autocorrelation. A specific feature of the specification is that it invokes closure under convolution to determine the discrete distribution theory needed for MLE. This also makes the model attractive from a theoretical point of view, as may the fact that the time reversibility applying to the first-order model continuous to hold in higher-order cases (Schweer 2015).
In this short paper we provide the MLE for the second-order model, examine its finite sample performance and offer an application. We compare the performance of the second-order specification to the widely used first-order model by means of goodness-of-fit techniques and show that the former is preferred to the latter for the data set considered.
The paper is structured as follows. Section 2 provides some background material for deriving the conditional distributions needed to formulate a MLE and probabilistic forecasting. Then, in Section 3 we focus on the INAR model of order two of Alzaid and Al-Osh (1990) and provide its MLE. The methodology can be extended to higher-order models, though this would be cumbersome. Section 4 contains the results of simulation experiments to assess the small sample performance of the proposed estimator. Section 5 illustrates the model’s application to a time series of iceberg order data. The final short section provides concluding remarks.

2. Preliminaries

This section describes means for obtaining discrete conditional, or predictive, distributions for thinning-based first order INAR models. Our approach to using the relevant (multivariate) distribution theory below is to employ the following familiar relationship
P ( X t = x | X t 1 = y ) p ( x | y ) = p ( x , y ) p ( y ) ,
where p ( x , y ) = P ( X t = x , X t 1 = y ) denotes an unconditional bivariate probability mass function (pmf) and p ( y ) = P ( X t 1 = y ) a corresponding univariate pmf.
To frame the argument, consider the simplest case of a Poisson first-order INAR model, henceforth PAR(1). The model can be written as
X t = R t ( F t 1 ) + W t ,
for t = 0 ± 1 , ± 2 , , where R t ( · ) denotes a random operator, which, in this case, is the well-known binomial thinning operator of Steutel and van Harn (1979)
α X t i = j = 1 X t i Y j , t i ,
and the Y j , t i are assumed to be iid Bernoulli random variables with P ( Y j , t i = 1 ) = α and P ( Y j , t i = 0 ) = 1 α ( α [ 0 , 1 ) ). In the PAR(1) model W t is an iid discrete Poisson innovation with parameter λ and W t and F t 1 are presumed to be stochastically independent for all points in time. The conditional distribution of the number of ‘survivors’, R, given the past X t 1 = y , g ( r | y ) , is well known to be binomial with parameters given by the dependence parameter α and y. The desired predictive pmf is then the convolution of the binomial pmf g ( r | y ) , determining the number of ‘survivors’, with a Poisson pmf p W ( · ) , determining the number of innovations, i.e.,
p ( x | y ) = r = 0 g ( r | y ) · p W ( x r )
and yields (2) of McKenzie (1988).
Alternatively, one can seek to obtain the bivariate pmf p ( x , y ) and the higher dimensional pmf’s in subsequent sections directly. Following Joe (1996) (specifically, the Appendix) the trivariate reduction method (see, Mardia 1970) and its generalization will be used here. Suppose Z 1 , Z 2 , Z 12 are independent random variables in the convolution-closed infinitely divisible family F ( · ) , appropriately indexed by some parameter(s). Then a stochastic representation of the dependent random variables X and Y from the same family is given by
X = Z 1 + Z 12 and Y = Z 2 + Z 12 ,
compare Joe (1996), Equation (A2). It is straightforward to show that C o v ( X , Y ) = V a r ( Z 12 ) (Mardia 1970, eq. 9.1.9).
Define the following independent Poisson random variables: Z 1 , Z 2 i i d P o ( λ ) ; and Z 12 i i d P o ( α λ U ) , where U = ( 1 α ) 1 . By independence, the joint distribution of the Poisson random variables Z 1 , Z 2 , Z 12 is the product of their marginal distributions
p ( z 1 , z 2 , z 12 ) = p ( z 1 ) p ( z 2 ) p ( z 12 ) .
The joint distribution of the dependent variables X and Y can be obtained from (5) by writing the { Z j } (where here and below j is used as generic subscript(s) for the relevant independent random variables) in terms of X and Y. Introduce the ‘dummy’ argument A = Z 12 . Then (4) can be written in convenient form as
X Y A = 1 0 1 0 1 1 0 0 1 Z 1 Z 2 Z 12 .
The linear system (6) can now be solved for the { Z j } variables by inverting the transformation defining matrix, T , to give
( Z 1 , Z 2 , Z 12 ) = T 1 ( X , Y , A ) .
Since the joint distribution of the { Z j } is readily available from (5), p ( x , y ) can be obtained by summing across the ‘dummy’ argument a to give
p ( x , y ) = a = 0 p λ ( x a ) p λ ( y a ) p α λ U ( a ) .
Although not explicitly indicated, the upper summation limit here is in fact m i n ( x , y ) and is typically quite small because we envisage modelling low order counts. Using (7) with the case of Poisson innovations gives
p ( x , y ) = exp λ U ( α 2 ) a = 0 λ x + y 2 a ( α λ U ) α ( x a ) ! ( y a ) ! a ! .
This result corresponds to that given as (3) by McKenzie (1988) and dividing by p ( y ) the resulting conditional pmf is that given as (2) in McKenzie (1988).
Model (2) still applies with W t i i d G P ( λ , η ) , see, e.g., Consul (1989) and Jung and Tremayne (2011) for more details of the associated pmf, interpretation of the parameter η and so forth. Such a model will facilitate the handling of overdispersed count data. If closure under convolution is still to apply, it is no longer appropriate to use the binomial thinning operator as R t ( . ) . However, the approach due to Joe (1996) can still be employed with an alternative random operator. This leads to the conditional distribution of the survivors, given X t 1 = y , following a quasi-binomial distribution of the form
g ( r | y ) = y r α ( 1 α ) ( α + ψ r ) r 1 [ 1 α + ψ ( y r ) ] y r 1 ( 1 + ψ y ) y 1 , r = 0 , 1 , , y ,
where ψ = η ( 1 α ) / λ (Consul 1989, p. 195), rather than a binomial distribution. The convolution of this distribution with that of W t being GP gives the transition distribution required for MLE and the joint bivariate pmf of two consecutive observations is GP, thereby preserving closure under convolution. With this modification, a parallel argument following (1) down to (7) remains applicable though, of course, the p ( · ) random variables of, say, (5) are now GP.

3. The INAR model of Alzaid and Al-Osh

The model specification to be discussed here in detail follows the work of Alzaid and Al-Osh (1990). It is based on a second order difference equation, where the thinning operations are applied in a different way from Du and Li (1991). The innovation process is assumed to be W t P o ( λ ) , so
X t = α 1 X t 1 + α 2 X t 2 + W t .
The resulting process will henceforth be denoted the PAR(2)-AA model. Process (8) is stationary provided α 1 + α 2 < 1 . The special character of the PAR(2)-AA process stems from the fact that the two thinning operations in (8) are not performed independently of one another. The bivariate random variable ( α 1 X t , α 2 X t ) , with the thinnings otherwise independent of the past history, is structured so as to follow a trinomial conditional distribution with parameters ( α 1 , α 2 , X t ) . See e.g., Jung and Tremayne (2006) for details.
The particular nature of this construction has various important consequences. First, the process has a physical interpretation as a special branching process with immigration (see e.g., Alzaid and Al-Osh 1990). Second, the autocorrelation structure of the model is more complicated than that of a standard Gaussian AR(2) model (compare Du and Li 1991), for the specific random thinning operations introduce a moving average component into the autocorrelation structure. This is, in fact, of the form generally associated with a continuous ARMA(2,1) model and arises from the mutual dependence structure between the components of X t . In particular, the first and second order autocorrelations are Corr ( X t , X t 1 ) = α 1 and Corr ( X t , X t 2 ) = α 1 2 + α 2 (see, e.g., Jung and Tremayne 2003).
Note that the PAR(2)-AA model outlined above embodies closure under convolution so that the marginal distribution of X t is Poisson with parameter λ / ( 1 α 1 α 2 ) = λ U , thereby redefining U as used in Section 2. Using an extension of the technique outlined in Section 2, we now proceed to obtain the necessary predictive distribution for maximum likelihood estimation of the parameters of this model.
For this purpose we derive the trivariate Poisson distribution P ( X t = x , X t 1 = y , X t 2 = v ) = p ( x , y , v ) and introduce seven independent Poisson random variables: Z 1 , Z 3 i i d P o ( λ ) ; Z 2 i i d P o ( τ ) , where τ = ( 1 α 1 ) 2 λ U ; Z 12 , Z 23 i i d P o ( ϕ ) , where ϕ = α 1 ( 1 α 1 ) λ U ; Z 13 i i d P o ( α 2 λ U ) ; and Z 123 i i d P o ( α 1 2 λ U ) .
The (dependent) random variables X , Y , V can be written in terms of the independent { Z j } random variables as follows:
X = Z 1 + Z 12 + Z 13 + Z 123 Y = Z 2 + Z 12 + Z 23 + Z 123 V = Z 3 + Z 23 + Z 13 + Z 123 ,
where, as in (4), the ‘=’ means ‘equivalent in distribution to’. Such an extension of the trivariate reduction method has been used, inter alia, by Mahamunulu (1967), Loukas and Kemp (1983) and Karlis (2003). Introducing the following ‘dummy’ arguments: A = Z 12 ; B = Z 13 ; C = Z 23 ; and D = Z 123 , we then have, in compact form,
X Y V A B C D = 1 0 0 1 1 0 1 0 1 0 1 0 1 1 0 0 1 0 1 1 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 Z 1 Z 2 Z 3 Z 12 Z 13 Z 23 Z 123 .
Solving the linear system (9) in terms of the { Z j } random variables by inverting the transformation defining matrix, T , gives
( Z 1 , , Z 123 ) = T 1 ( X , Y , V , A , B , C , D ) .
Proceeding as in Section 2, we obtain the joint distribution p ( x , y , v ) by summing across the ‘dummy’ arguments, i.e.,
p ( x , y , v ) = a = 0 b = 0 c = 0 d = 0 p ϕ ( a ) p α 2 λ U ( b ) p ϕ ( c ) p α 1 2 λ U ( d ) × p λ ( x a b d ) · p τ ( y a c d ) × p λ ( v b c d ) .
Note that the first order autocorrelation of the PAR(2)-AA model is α 1 and remember that U has been redefined in this section. Define Z 1 = Z 2 + Z 12 , Z 2 = Z 3 + Z 13 and Z 12 = Z 23 + Z 123 , then the argument above from (5) to (7) in these newly defined independent random variables still applies and the bivariate pmf p ( y , v ) for the PAR(2)-AA model is easily seen to be
p ( y , v ) P ( X t 1 = y , X t 2 = v ) = exp ( α 1 2 ) λ U × a = 0 ( 1 α 1 ) λ U y + v 2 a ( α 1 λ U ) α ( y a ) ! ( v a ) ! a ! .
Using (10) and (11) the required conditional distribution is
p ( x | y , v ) = a b c d exp { λ [ ( 1 + α 2 ) U + 2 ] } λ x + v + y a b c 2 d × U y + b α 1 a + c + 2 d α 2 b ( 1 α 1 ) 2 y 2 d a c × [ a ! b ! c ! d ! ( x a b d ) ! ( y a c d ) ! ( v b c d ) ! ] 1 × [ p ( y , v ) ] 1 .
The unwieldy conditional distribution (12) can now be used for all purposes where predictive distributions are needed, including MLE, probabilistic forecasting and predictive model assessment. As the PAR(2)-AA model is a stationary Markov chain (see Alzaid and Al-Osh 1990), conditional MLE (ignoring end effects) based on the logarithm of (12) can be implemented on the basis of observed counts x 1 , , x T using the conditional log-likelihood
L T ( θ ) = t = 3 T log p ( x t | x t 1 , x t 2 ; θ ) ,
where θ = ( α 1 , α 2 , λ ) is the parameter vector of interest. Despite the quadruple summation, the log-likelihood is not too burdensome to calculate, because the upper summation limits are typically quite small. Maximizing (13) yields MLEs that have the usual desirable asymptotic properties.
We now provide the conditional mean E ( X t | X t 1 , X t 2 ) for this model. It can be derived from (12) or, alternatively, using the method described in Alzaid and Al-Osh (1990). Utilizing results from Mahamunulu (1967) we find, after some tedious algebra (details available on request), the conditional mean to be
E ( X t | X t 1 = y , X t 2 = v ) μ X | F | ( θ ) = λ [ 1 + ( α 1 + α 1 α 2 U ) p ( y 1 , v ) p ( y , v ) + α 2 U p ( y , v 1 ) p ( y , v ) + α 1 2 U p ( y 1 , v 1 ) p ( y , v ) ] ,
which is clearly nonlinear. This is a distinctive feature of the PAR(2)-AA model relative to the one of Du and Li (1991) with its linear regression function. The shorthand notation F in the middle part of (14) indicates the relevant past history ( X t 1 , X t 2 ) . Finally, note that Equation (5.4) given in Alzaid and Al-Osh (1990, p. 323) differs slightly from (14); after careful checking, we believe our result to be correct.
The argument between (8) and (12) can be extended in principle to deal with a p-th order model of Alzaid and Al-Osh (1990, sec. 2). Full details are not provided, because the argument is laborious and e.g., for p = 3 fifteen independent Z variables would be needed.

4. Finite Sample Performance of Parameter Estimators

In this Section, the results of a series of Monte Carlo experiments to assess the finite sample properties of estimators in the PAR(2)-AA model are discussed. Specifically, the performance of the MLE of this paper is assessed relative to three competing estimators. Consistent estimates for the starting values for the MLE procedure can either be based on method of moments (MM) estimators, or obtained from a conditional, or nonlinear, least-squares (NLS) procedure. Parameter estimates may also be obtained using the ubiquitous generalized method of moments (GMM) principle, first introduced into the econometrics literature by Hansen (1982).
Here and in what follows, all quantities based on X t are, of course, computed using sample analogues. Method of moments estimators can be obtained from moment conditions based on the marginal mean of the counts and on the first and second order sample autocorrelations (see e.g., (Jung and Tremayne 2006)). An important quantity in the construction of both NLS and GMM estimators is the one-step ahead prediction error, or residual, defined as e t = X t μ X | F | ( θ ) . Conditional or non-linear least squares estimators can be based on the criterion function
S ( θ ) = X t μ X | F | ( θ ) 2 ,
which is to be minimized with respect to the parameter vector θ .
GMM has been used in the context of count data modelling using binomial thinning in at least two contexts previously. In an early paper, Brännäs (1994) advocated its used in a univariate INAR(1) context while, more recently, Silva et al. (2020) have employed it in a bivariate INMA(1) context where no conditional, or predictive, distribution is available in closed form, thereby precluding the use of MLE. Here we employ an extension of the approach adopted by Brännäs (1994) to GMM; see that paper for full details. As there, we entertain only a conditional estimation problem. Based upon the residuals e t , which should themselves have zero mean, a suitable set of moment condition follows readily. Since θ is comprised of three elements, we employ four moment conditions to give an overidentified GMM problem.
Introduce the notation σ X | F | 2 ( θ ) to denote the (conditional) variance of the random variable with pmf given in (12) and noting that, conditionally, E [ e t 2 ] = σ X | F | 2 ( θ ) , m 2 follows. Further, the residuals e t should be serially uncorrelated (conditionally as well as unconditionally), so we may also use E [ e t e t 1 ] and E [ e t e t 2 ] . These two conditions are denoted as m 3 and m 4 below. Hence, the conditions used are:
m 1 = e t m 2 = e t 2 σ X | F | 2 ( θ ) m 3 = e t e t 1 m 4 = e t e t 2
Since we have no closed form for the (conditional) variance, we use
Var [ X t | X t 1 , X t 2 ] σ X | F | 2 ( θ ) = x 2 p ( x | y , v ) μ X | F | ( θ ) 2 .
These moment conditions can all be thought of as conditional moment restrictions. Summing over all available observations, dividing by the (adjusted) sample size and collecting them in a vector gives m ¯ ( θ ) , the vector of moment conditions. The GMM criterion function to be minimized is
Q ( θ ) = m ¯ ( θ ) W 4 1 m ¯ ( θ ) ,
where W 4 is a positive definite weighting matrix. Since Brännäs (1994) finds limited benefit to calculating two-step GMM estimates in the context of the INAR(1) model, we base our results on one-step GMM estimation, where an identity matrix serves as the weighting matrix.
The simulations are designed to resemble situations typically encountered in empirical research and to cover a range of different points in the relevant parameter space. Moreover, one set of parameters is chosen such that it mimics the situation found in the application presented in Section 5. Series of low counts of length T = 125 , 250, and 500 are generated. The design parameters for the simulations are chosen as follows. The level of the generated processes E ( X t ) varies between 0.6 and 4 in order to ensure low level counts. We experimented with other (low) values for the mean of the process and found qualitatively similar results. The dependence structure of the generated processes is governed by a combination of α 1 and α 2 , where α 1 + α 2 < 1 . In an extension of Figure 1 in Jung and Tremayne (2003) the α 1 / α 2 parameter space, along with some informative partitions, is depicted in Figure 1 here. Combinations of α 1 and α 2 are taken from each of the three areas labelled A, B and C. All combinations from area A exhibit exponentially decaying autocorrelation functions (ACFs), while those from area B exhibit an exponential decaying behaviour for lags higher than two but Corr ( X t , X t 1 ) < Corr ( X t , X t 2 ) . Parameter combinations from area C exhibit more complex dependence patterns, where the ACF is oscillating at least for the first four lags before decaying exponentially to zero. The innovations W t are generated as independent Poisson random variables with parameter λ . The simulations are carried out using programs written in GAUSS Version 19.
Of course, due to the restricted parameter space and the unconstrained estimation procedures employed in this study, the problem of inadmissible parameter estimates has to be dealt with. The number of simulations runs is chosen such that the statistics displayed in the tables below are based on a minimum of 5000 replications. Simulated data series which lead to inadmissible parameter estimates are discarded. A Referee has indicated that it is helpful to expand on this feature of the results (full details in tabular form are available on request). Overwhelmingly, such replications stem from negative estimates of α ^ 2 at the lowest sample size T = 125 and are predominantly infrequent, though they never exceed 16 % . Simulation results from four points in the parameter space are reported below. When T > 125 , inadmissible parameter estimates never arise with the second parameterization; at most 0.3 % of replications are discarded with the third; and at most 5 % with the fourth (and then only ever because of inadmissible α ^ 2 ). Parametrizations one and four lead to more inadmissible replications, probably stemming from the small value used for α 2 in data generation. The results of simulation experiments are exposited by means of tables whose entries summarize the outcomes through the bias, percentage bias and mean squared error (MSE) for the MM, NLS, GMM and ML estimators.
For the first set of simulation experiments we specify a dependence structure with an exponentially decaying ACF. Here α 1 = 0.3 and α 2 = 0.1 are chosen to represent a point in area A in Figure 1. By setting λ to 1.8 , the mean of the process is 3. The basic results are displayed in Table 1; we subsequently display the results from the other three experimants in Table 2, Table 3 and Table 4 and then provide some general discussion.
In the next set of simulation experiments we select a more complex dependence structure. Here α 1 = 0.4 and α 2 = 0.3 are chosen as representive of area B in Figure 1. By setting λ to 1.2 , the mean of the process is now 4.
For the third set of simulation experiments, the dependence parametrs are α 1 = 0.3 and α 2 = 0.4 and chosen so as to stem area C in Figure 1. Choosing λ to be 1.2 , the mean of the process is again 4.
Finally, the fourth and last set of simulation experiments use a point in the parameter space that is close to the estimated parameter values found for the data employed in Section 5. For this purpose α 1 = 0.5 and α 2 = 0.1 and λ = 0.24 , yielding a mean of 0.6 .
Across all four experiments, the estimates for the α parameters tend to be downward biased, while those for λ are upward biased. This stems from the negative correlation between the two types of parameter estimators (dependence and location) and is equivalent to the findings reported in Jung et al. (2005). While the downward bias for α 1 can be observed over all four tables above, in those cases where α 2 is quite small ( 0.1 ), i.e., in the first and fourth experiments, an upward bias results in particular for low sample sizes. This arises due to the fact that the sampling distribution of α ^ 2 is truncated at zero as only non-negative estimates for that parameter are admissible.
The more complex the dependence structure of the data generating process is, the higher the bias for all estimators tends to become. For instance, in the third experiment, stemming from data generated using parameter values from area C in Figure 1, the bias for the λ parameter can be as large as 20 % for the MM estimation method in small sample sizes. In all cases considered in the simulation experiments, it is seen that use of MLE to estimate the model parameters of the PAR(2)-AA model, in terms of lower biases and smaller mean squared errors, is most efficacious.
With regard to the final set of simulations and the parameter constellation found in the application in the next section, it is evident from Table 4 that biases in the estimated parameters are negligible, in particular when MLE is employed.

5. Application

In this section we demonstrate the applicability of the PAR(2)-AA model to a real-world data set. We consider iceberg order data from the ask side of the Lufthansa stock sampled from the XETRA trading system at a frequency of 15 minutes during 15 consecutive trading days in the first quarter of 2004. For some explanatory remarks about the nature of iceberg orders, see e.g., Jung and Tremayne (2011). The sample consists of 510 (low) counts ranging from 0 to 4. The marginal distribution of the counts is shown in Figure 2; the sample mean and variance of the data are, respectively, 0.616 and 0.673. A time series plot and the sample (partial) autocorrelations (S(P)ACFs) are depicted in Figure 3.
The time series plot of the data shows no discernable trend, the sample mean and variance do not suggest overdispersion and the SACF and SPACF of the data point toward a first or second order autoregressive dynamic structure. Based on these findings we fit a PAR(1) and a PAR(2)-AA model to the first 505 observations, with the last five reserved for an illustrative out-of-sample prediction exercise.
To compare the two model fits, we exploit goodness-of-fit techniques based on: (a) the predictive distributions of the two models, in particular the PIT histogram and scoring rules; (b) the correlogram of the Pearson residuals computed as e t / σ X | F | 2 ( θ ) ; and (c) a parametric resampling procedure based on Tsay (1992). See Jung et al. (2016) and Czado et al. (2009) for more details. Figure 4 depicts the results for the two fitted models.
It is evident from the panels in Figure 4 that the fit of the PAR(2)-AA model is superior to that of the PAR(1) model, certainly when it comes to capturing the serial dependence in the data. Further evidence favouring the PAR(2)-AA specification is provided by comparing the scoring rules provided in Table 5 for the two model fits. We, therefore, report parameter estimates and out-of-sample results for the preferred second order model only.
Table 6 provides the parameter estimates for the three parameters of the PAR(2)-AA model based on method of moments, NLS, GMM and ML estimation, respectively. Estimated standard errors are provided for the ML estimates only. It is evident that the parameter estimate for the α 2 parameter is statistically different from zero, providing further statistical evidence of a preference for the PAR(2)-AA model over the PAR(1) one.
We now present an illustration of out-of-sample prediction for the PAR(2)-AA model. We employ a fixed forecasting environment using only the fit to observations 1 to 505. Observations 504 to 510 are, in fact, (2,0,2,2,3,3,2); since a value 3 is only observed 2.4 % of times in the entire realization, these data represent a short epoch of values that will perforce be difficult to forecast. Using (12) we select three predictive distributions with different Markovian histories (pairs of lagged counts) for graphical presentation in Figure 5. Panel (a) in the Figure corresponds to the prediction of observation number 506 (observed value 2, relevant past history 2,0); the predictive distribution has a mode of zero with an estimated probability close to 0.5 so that the observed value of 2 is seen as unlikely, having a predicted probability of around 0.1 . In Panel (b) of the Figure, the predictive distribution for observation number 508 is provided (observed value 3, relevant past history 2,2). The predictive distribution for the one-step ahead forecast is now markedly different; the mode has shifted up to 1, the probability of observing another 2 has risen to 0.3 and even the large value of 3 is forecast to occur with estimated probability 0.08 . Finally, Panel (c) portrays the predictive distribution for observation 510 (observed value 2 with relevant past history 3,3). Note that the estimated probability of observing yet another 3 has risen to about 0.18 and the modal forecast is in fact the value observed. Overall, the three panels of the Figure illustrate clearly how rapidly the predictive distribution responds to altering relevant past history.

6. Concluding Remarks

This paper considers maximum likelihood estimation in an integer autoregressive model for which such parameter estimators have not been available hithero. The basic techniques used are reduction methods and change of variable techniques. The expressions that emerge for the conditional (predictive) distributions are complicated involving, as they do, multiple summations. We demonstrate that they can be fruitfully used for inferential purposes using a real-world data set. Moreover, the predictive distributions derived can be employed in goodness-of-fit analyses and out-of-sample predictions leading to probabilistic forecasts.

Author Contributions

Both authors have contributed equally to all aspects of the preparation and writing of this paper. They have both read and agreed to the published version of the manuscript.


This research received no external funding.


We are grateful to two anonymous Referees whose comments led to improvements to the original version of the paper. In particular, the simulation evidence relating to GMM estimation stemmed from one such suggestion.

Conflicts of Interest

The authors declare no conflict of interest.


  1. Al-Osh, M. A., and A. A. Alzaid. 1987. First-order integer-valued autoregressive (INAR (1)) process. Journal of Time Series Analysis 8: 261–75. [Google Scholar] [CrossRef]
  2. Alzaid, A. A., and M. Al-Osh. 1990. An integer-valued pth-order autoregressive structure (INAR(p)) process. Journal of Applied Probability 27: 314–23. [Google Scholar] [CrossRef]
  3. Brännäs, Kurt. 1994. Estimation and Testing in Integer-Valued AR(1) Models. University of Umeå Discussion Paper No. 335. Umeå: University of Umeå. [Google Scholar]
  4. Bu, Ruijun, Kaddour Hadri, and Brendan P. M. McCabe. 2008. Maximum likelihood estimation of higher-order integer-valued autoregressive processes. Journal of Time Series Analysis 29: 973–94. [Google Scholar] [CrossRef]
  5. Consul, P. C. 1989. Generalized Poisson Distributions: Properties and Applications. New York: Marcel Dekker. [Google Scholar]
  6. Czado, Claudia, Tilmann Gneiting, and Leonhard Held. 2009. Predictive model assessment for count data. Biometrics 65: 1254–61. [Google Scholar] [CrossRef] [PubMed]
  7. Drost, Feike C., Ramon van den Akker, and Bas J. M. Werker. 2009. Efficient estimation of auto-regression parameters and innovation distributions for semiparametric integer-valued AR(p) models. Journal of the Royal Statistical Society, Series B 71: 467–85. [Google Scholar] [CrossRef]
  8. Du, Jin-Guan, and Yuan Li. 1991. The integer-valued autoregressive (INAR (p)) model. Journal of Time Series Analysis 12: 129–42. [Google Scholar]
  9. Hansen, Lars Peter. 1982. Large sample properties of generalized method of moments estimators. Econometrica 50: 1029–54. [Google Scholar] [CrossRef]
  10. Joe, Harry. 1996. Time series models with univariate margins in the convolution-closed infinitely divisible class. Journal of Applied Probability 33: 664–77. [Google Scholar] [CrossRef]
  11. Jung, Robert C., Brendan P. M. McCabe, and Andrew R. Tremayne. 2016. Model validation and diagonostics. In Handbook of Discrete-Valued Time Series. Edited by Richard A. Davis, Scott H. Holan, Robert Lund and Nalini Ravishanker. Boca Raton: CRC Press, pp. 189–218. [Google Scholar]
  12. Jung, Robert C., Gerd Ronning, and Andrew R. Tremayne. 2005. Estimation in conditional first order autoregression with discrete support. Statistical Papers 46: 195–224. [Google Scholar] [CrossRef]
  13. Jung, Robert C., and Andrew R. Tremayne. 2003. Testing for serial dependence in time series of counts. Journal of Time Series Analysis 24: 65–84. [Google Scholar] [CrossRef]
  14. Jung, Robert C., and Andrew R. Tremayne. 2006. Coherent forecasting in integer time series models. International Journal of Forecasting 22: 223–38. [Google Scholar] [CrossRef]
  15. Jung, Robert C., and Andrew R. Tremayne. 2011. Convolution-closed models for count time series with applications. Journal of Time Series Analysis 32: 268–80. [Google Scholar] [CrossRef]
  16. Karlis, Dimitris. 2003. An EM algorithm for multivariate Poisson distribution and related models. Journal of Applied Statistics 30: 63–77. [Google Scholar] [CrossRef]
  17. Loukas, S., and C. D. Kemp. 1983. On computer sampling from trivariate and multivariate discrete distributions. Journal of Statistical Compuation and Simulation 17: 113–23. [Google Scholar] [CrossRef]
  18. Mahamunulu, D. M. 1967. A note on regression in the multivariate Poisson distribution. Journal of the American Statistical Association 62: 251–58. [Google Scholar] [CrossRef]
  19. Mardia, K. V. 1970. Families of Bivariate Distributions. London: Charles Griffin & Co. [Google Scholar]
  20. Martin, Gael, Brendan P. M. McCabe, and David Harris. 2011. Optimal probabalistic forecasts for counts. Journal of the Royal Statistical Society: Series B 73: 253–72. [Google Scholar]
  21. McKenzie, Ed. 1985. Some simple models for discrete variate time series. Water Resources Bulletin 21: 645–50. [Google Scholar] [CrossRef]
  22. McKenzie, Ed. 1988. Some ARMA models for dependent sequences of Poisson counts. Advances in Applied Probability 20: 822–35. [Google Scholar] [CrossRef]
  23. Schweer, Sebastian. 2015. On the time-reversibility of integer-valued autoregressive processes of general order. In Stochastic Models, Statistics and Their Applications. Edited by Ansgar Steland, Ewaryst Rafajłowicz and Krzysztof Szajowski. Cham: Springer International Publishing, pp. 169–77. [Google Scholar]
  24. Silva, Isabel, Maria Eduarda Silva, and Cristina Torres. 2020. Inference for bivariate integer-valued moving average models based on binomial thinning operation. Journal of Applied Statistics. [Google Scholar] [CrossRef]
  25. Steutel, Fred W., and Klaas van Harn. 1979. Discrete analogues of self-decomposability and stability. The Annals of Probability 7: 893–99. [Google Scholar] [CrossRef]
  26. Tsay, Ruey S. 1992. Model checking via parametric bootstraps in time series analysis. Applied Statistics 41: 1–15. [Google Scholar] [CrossRef]
Figure 1. Partition of the α 1 / α 2 parameter space in the PAR(2)-AA model.
Figure 1. Partition of the α 1 / α 2 parameter space in the PAR(2)-AA model.
Econometrics 08 00024 g001
Figure 2. Marginal distribution of the iceberg order data.
Figure 2. Marginal distribution of the iceberg order data.
Econometrics 08 00024 g002
Figure 3. Time series plot (panel (a)) and SACF and SPACF (panels (b,c)) of the iceberg order data.
Figure 3. Time series plot (panel (a)) and SACF and SPACF (panels (b,c)) of the iceberg order data.
Econometrics 08 00024 g003
Figure 4. PAR(1) model fit: residual autocorrelations (panel (a)), PIT histogram (panel (b)) and parametric resampling procedure (panel (c)); Panels (df) depict the corresponding information for the PAR(2)-AA model fit.
Figure 4. PAR(1) model fit: residual autocorrelations (panel (a)), PIT histogram (panel (b)) and parametric resampling procedure (panel (c)); Panels (df) depict the corresponding information for the PAR(2)-AA model fit.
Econometrics 08 00024 g004
Figure 5. One-step ahead predictive distributions of the iceberg order data based on the PAR(2)-AA model.
Figure 5. One-step ahead predictive distributions of the iceberg order data based on the PAR(2)-AA model.
Econometrics 08 00024 g005
Table 1. Bias, percentage bias and MSE for the MM, NLS, GMM and ML parameter estimates for the first set of simulation experiments.
Table 1. Bias, percentage bias and MSE for the MM, NLS, GMM and ML parameter estimates for the first set of simulation experiments.
T = 125T = 250T = 500
Bias% BiasMSEBias% BiasMSEBias% BiasMSE
α 1 | M M −0.0184−6.12950.0095−0.0082−2.71900.0048−0.0031−1.02990.0024
α 2 | M M 0.00606.00030.0045−0.0026−2.57950.0027−0.0045−4.53950.0016
λ M M 0.03812.11820.12590.03061.70130.06650.02391.32740.0357
α 1 | N L S −0.0181−6.04940.0097−0.0081−2.68760.0048−0.0030−0.99760.0024
α 2 | N L S 0.00595.90880.0044−0.0022−2.22550.0026−0.0043−4.31510.0015
λ N L S 0.03742.07800.12880.02941.63590.06630.02271.26130.0350
α 1 | G M M 0.00170.57720.00810.00080.28330.00420.00120.41300.0020
α 2 | G M M 0.016316.25880.00520.00343.41030.0029−0.0020−2.00000.0017
λ G M M −0.0760−4.22020.1463−0.0269−1.49190.0743−0.0037−0.20590.0382
α 1 | M L −0.0091−3.04810.0079−0.0038−1.26810.0039−0.0009−0.31330.0019
α 2 | M L 0.00929.16550.0046−0.0006−0.61370.0026−0.0034−3.36610.0015
λ M L 0.00040.02310.10950.01230.68350.05670.01380.76590.0350
Table 2. Bias, percentage bias and MSE for the MM, NLS, GMM and ML parameter estimates for the second set of simulation experiments.
Table 2. Bias, percentage bias and MSE for the MM, NLS, GMM and ML parameter estimates for the second set of simulation experiments.
T = 125T = 250T = 500
Bias% BiasMSEBias% BiasMSEBias% BiasMSE
α 1 | M M −0.0398−9.95780.0161−0.0234−5.84880.0081−0.0120−3.01090.0041
α 2 | M M −0.0218−7.25670.0067−0.0096−3.18340.0031−0.0046−1.51900.0015
λ M M 0.239719.97730.32300.129510.79390.14530.06435.36130.0656
α 1 | N L S −0.0359−8.96340.0158−0.0202−5.04050.0078−0.0107−2.68080.0036
α 2 | N L S −0.0196−6.54750.0066−0.0082−2.74610.0031−0.0039−1.30100.0015
λ N L S 0.214217.84670.30970.11019.17260.13470.05624.68090.0595
α 1 | G M M −0.0196−4.89960.0114−0.0177−4.43340.0062−0.0173−4.33010.0032
α 2 | G M M −0.0125−4.18210.0075−0.0040−1.34940.0037−0.0018−0.61450.0019
λ G M M 0.06225.18250.15340.04814.01220.07650.03923.27070.0394
α 1 | M L −0.0102−2.55880.0090−0.0053−1.32620.0044−0.0026−0.63810.0021
α 2 | M L −0.0124−4.12260.0058−0.0043−1.43820.0028−0.0016−0.53260.0014
λ M L 0.08306.91710.12380.03512.92390.05250.01441.20410.0252
Table 3. Bias, percentage bias and MSE for the MM, NLS, GMM and ML parameter estimates for the third set of simulation experiments.
Table 3. Bias, percentage bias and MSE for the MM, NLS, GMM and ML parameter estimates for the third set of simulation experiments.
T = 125T = 250T = 500
Bias% BiasMSEBias% BiasMSEBias% BiasMSE
α 1 | M M −0.0337−11.22010.0172−0.0208−6.92540.0100−0.0128−4.27410.0052
α 2 | M M −0.0303−7.58640.0076−0.0149−3.73460.0035−0.0065−1.61590.0017
λ M M 0.249020.75340.35560.137911.49330.17750.07566.30160.0839
α 1 | N L S −0.0299−9.95350.0165−0.0171−5.70060.0093−0.0103−3.43620.0943
α 2 | N L S −0.0280−7.00740.0074−0.0135−3.37850.0035−0.0057−1.43590.0017
λ N L S 0.221818.48640.33190.11649.70040.16030.06215.17260.0728
α 1 | G M M −0.032210.74680.0158−0.0188−6.25150.0106−0.0093−3.13560.0897
α 2 | G M M −0.01894.71780.0077−0.0038−0.94120.0038−0.0023−0.56430.0017
λ G M M −0.10980.15310.20680.10308.58470.11950.04323.57850.0410
α 1 | M L −0.0100−3.34390.0121−0.0053−1.76540.0065−0.0036−1.20380.0032
α 2 | M L −0.0176−4.40070.0062−0.0071−1.76310.0031−0.0022−0.54190.0015
λ M L 0.10018.3440.14750.04363.63430.06940.02131.77270.0313
Table 4. Bias, percentage bias and MSE for the MM, NLS, GMM and ML parameter estimates for the fourth set of simulation experiments.
Table 4. Bias, percentage bias and MSE for the MM, NLS, GMM and ML parameter estimates for the fourth set of simulation experiments.
T = 125T = 250T = 500
Bias% BiasMSEBias% BiasMSEBias% BiasMSE
α 1 | M M −0.0340−6.80630.0181−0.0181−3.61140.0058−0.0078−1.56590.0029
α 2 | M M 0.00696.90040.0045−0.0019−1.87960.0026−0.0033−3.28750.0016
λ M M 0.01405.82520.00590.01024.25850.00320.00522.16740.0015
α 1 | N L S −0.0321−6.42570.0119−0.0172−3.42760.0057−0.0077−1.53030.0028
α 2 | N L S 0.00595.90700.0043−0.0012−1.19180.0023−0.0020−2.04640.0013
λ N L S 0.01255.12380.00190.00873.61180.00290.00431.77530.0014
α 1 | G M M −0.0085−1.70070.0118−0.0032−0.63400.0040−0.0016−0.32300.0020
α 2 | G M M 0.016816.84470.00450.00343.41050.0029−0.0019−1.87000.0017
λ G M M −0.0079−5.87510.0059−0.0027−1.12080.00250.00060.24450.0014
α 1 | M L −0.0144−2.87590.0072−0.0068−1.21150.0034−0.0032−0.63590.0016
α 2 | M L 0.00414.09200.0035−0.0009−0.07280.0019−0.0014−1.43250.0010
λ M L 0.00261.06810.00350.00230.67560.00170.00120.49760.0009
Table 5. Scoring rules for the PAR(1) and the PAR(2)-AA model fit.
Table 5. Scoring rules for the PAR(1) and the PAR(2)-AA model fit.
Scoring RulePAR(1)PAR(2)-AA
log score0.91740.9058
quadratic score0.51610.5081
ranked prob. score0.33150.3258
Table 6. Parameter estimates for the iceberg order data based on the PAR(2)-AA model.
Table 6. Parameter estimates for the iceberg order data based on the PAR(2)-AA model.
ParameterMM EstimatesNLS EstimatesGMM EstimatesML EstimatesML Standard Errors
α 1 0.49900.49540.47120.46680.0410
α 2 0.11470.09240.08060.09990.0349
λ 0.23100.24850.27590.26140.0296
Back to TopTop