A New First-Order Integer-Valued Autoregressive Model with Bell Innovations

A Poisson distribution is commonly used as the innovation distribution for integer-valued autoregressive models, but its mean is equal to its variance, which limits flexibility, so a flexible, one-parameter, infinitely divisible Bell distribution may be a good alternative. In addition, for a parameter with a small value, the Bell distribution approaches the Poisson distribution. In this paper, we introduce a new first-order, non-negative, integer-valued autoregressive model with Bell innovations based on the binomial thinning operator. Compared with other models, the new model is not only simple but also particularly suitable for time series of counts exhibiting overdispersion. Some properties of the model are established here, such as the mean, variance, joint distribution functions, and multi-step-ahead conditional measures. Conditional least squares, Yule–Walker, and conditional maximum likelihood are used for estimating the parameters. Some simulation results are presented to access these estimates’ performances. Real data examples are provided.


Introduction
In recent years, studying count time series has attracted a lot of attention in different fields, such as finance, medical science, and insurance. There are many models for count data that have been proposed by scholars. The most famous model was first introduced by McKenzie (1985) [1] and Al-Osh and Alzaid (1987) [2] based on the binomial thinning • operator (Steutel and van Harn 1979 [3]) called the first-order integer-valued autoregressive (INAR(1)) process. Given a non-negative integer-valued random variable (r.v.) X and a constant α ∈ (0, 1), the binomial thinning operator • is defined as α • X = ∑ X i=1 ξ i , where the counting series ξ i is a sequence of independent identically distributed (i.i.d.) Bernoulli r.v.s with P(ξ i = 1) = 1 − P(ξ i = 0) = α. Then, the form of the INAR(1) model is where t is a sequence of i.i.d. discrete r.v.s, with the mean µ and finite variance σ 2 . t is independent of ξ i and X t−s for s ≥ 1. According to Alzaid and Al-Osh (1988) [4], we know that the mean and variance of the INAR(1) model are µ := µ X = µ 1 − α and σ 2 := σ 2 X = σ 2 + αµ 1 − α 2 , respectively.
For innovation t , the Poisson distribution is often assumed as the distribution of t in the INAR(1) model. A natural characteristic of the Poisson distribution is equidispersion; i.e., its mean and variance are equal to each other. In practice, however, many data examples are overdispersed (variance is greater than mean) relative to the Poisson distribution. For this reason, the INAR(1) model with Poisson innovations is not always suitable for modeling integer-valued time series. Therefore, several models which describe the overdispersion phenomena have been discussed in the statistical literature.

The BL-INAR(1) Model
In this section, we present a brief review of the Bell distribution (Castellares et al., 2018 [16]). Its definition and some properties are presented. Later we introduce the BL-INAR(1) model and derive some basic properties of it.

The Bell Distribution
At first, we introduce the Bell numbers. Bell (1934) [21] has provided the following expansion: where B n is the Bell number defined by The Bell number B n is the n-th moment of the Poisson distribution with parameter equal to 1. Some Bell numbers are listed as follows. For the convenience of the reader, we introduce the following definition and properties of the Bell distribution described in Castellares et al. (2018) [16]: . .} has a Bell distribution with parameter θ > 0, denoted as Z ∼ Bell(θ), if its probability mass function is given by where B z is the Bell number in (2).
We can see that the Bell distribution has only one parameter, and it belongs to the oneparameter exponential family of distributions. If Z ∼ Bell(θ), the probability generating function is G Z (s) = E s Z = exp e sθ − e θ , |s| < 1.
Note that Var(Z)/E(Z) = 1 + θ > 1; hence, the Bell distribution is overdispersed, which means the Bell distribution may be suitable for count data with overdispersion in certain situations. There are some other interesting properties of the Bell distribution, including the following: (i) the Poisson distribution is not nested in the Bell family, but for small values of the parameter, the Bell distribution approaches the Poisson distribution; (ii) it is identifiable, strongly unimodal and infinitely divisible; (iii) a r.v. Z ∼ Bell(θ) has the same distribution as Y 1 + Y 2 + · · · + Y N , where Y n has zero-truncated Poisson distribution with parameter θ, and N ∼ Poisson(e θ − 1). See Castellares et al. (2018) [16] for more properties.
Additionally, there are some papers based on the Bell distribution, and the following are a few related references: Batsidis et al. (2020) [22] proposed and studied a goodnessof-fit test for the Bell distribution, which is consistent against fixed alternatives; Castellares et al. (2020) [23] presented a new two-parameter Bell-Touchard discrete distribution; Lemonte et al. (2020) [24] introduced a zero-inflated Bell regression model for count data; Muhammad et al. (2021) [25] proposed a Bell ridge regression as a solution to the multicollinearity problems.

Definition and Properties of the BL-INAR(1) Process
In this section, we give the definition of the BL-INAR(1) process, and its basic statistical properties are derived. Definition 2. Let {X t } t∈N 0 be an INAR(1) process according to (1). It refers to a BL-INAR(1) model if the innovations { t } t∈N 0 are a sequence of i.i.d. Bell(θ) r.v.s given by (3); i.e., where 0 < α < 1 and θ > 0, and t is independent of ξ i and X t−1 for t ≥ 1.
According to Equation (4), we know the mean and variance of t are finite; therefore, the process of {X t } t∈N 0 in (5) is an ergodic stationary Markov chain (Du and Li, (1991) [26]) with transition probabilities Further, we can obtain the joint probability function as follows: The conditional mean, conditional variance, mean, variance, covariance and autocorrelation function of the BL-INAR(1) process are given in the following lemma. Lemma 1. Let X t be the process in Definition 2. Then it has the following properties: The proof of Lemma 1 is similar to that of Theorem 1 of Qi et al. (2019) [14], so it is omitted.
According to Lemma 1, the dispersion index (Fisher, 1950 [27]) of X t is derived as follows: thus, the BL-INAR(1) process is suited for overdispersed integer-valued time series. Additionally, we can obtain the k-step ahead conditional mean and k-step ahead conditional variance of the BL-INAR(1) process in the following theorem. Theorem 1. The k-step ahead conditional mean and k-step ahead conditional variance of the BL-INAR(1) process are given, respectively, by: and For more details about the proof of this theorem, see Qi et al. (2019) [14] and Ristić, Bakouch, and Nastić (2009) [6]. It is easy to see that if , which are the unconditional mean and unconditional variance of X t , respectively.

Estimation of Parameters
The true values of parameters α and θ are unknown in practice; therefore, we need to estimate the value of (α, θ). Sometimes we have to give an estimate of (α, µ) first to get the estimate of (α, θ). In this section, we consider three methods for estimating parameters, namely, CLS, YW and CML.

Conditional Least Squares Estimation
The CLS estimates of the parameters α and θ are obtained by and the CLS estimates of (α, µ) are given bŷ Then, the CLS estimate of θ can be obtained by solving the equationθ CLS eθ CLS =μ CLS (1 −α CLS ). According to Theorems 3.1 and 3.2 in Tjøstheim (1986) [28], we can establish the consistency and asymptotic normality of the CLS estimatesα CLS andμ CLS in the following theorem. The proofs of Theorem 2 and the following theorem are given in Appendix A.

Theorem 2.
Letα CLS andμ CLS be the CLS estimates of the BL-INAR(1) process; then (α CLS ,μ CLS ) is strongly consistent for (α, µ); and the asymptotic distribution follows as: Using the delta method, we can obtain the limit distribution of (α,θ), and we can also know thatθ is consistent.

Yule-Walker Estimation
Let X 1 , . . . , X T come from the process {X t } in Definition 2. The sample mean is X = 1 T ∑ T t=1 X t , and the sample autocorrelation function iŝ From Lemma 1, we know ρ k = α k , thus the Yule-Walker (YW) estimate of α is given bŷ then the estimate of θ can be obtained.
For asymptotic properties of the YW estimates, Freeland and McCabe (2005) [29] showed that the YW and CLS estimates are asymptotically equivalent for a Poisson INAR (1) process. The next theorem shows that the conclusion holds for our BL-INAR(1) process.

Conditional Maximum Likelihood Estimation
According to the joint probability function (6), the likelihood function can be obtained as: To condition on variable X 1 , we can obtain the conditional log likelihood function as: the CML estimates of (α, θ) are the values of (α CML ,θ CML ) obtained by maximizing the conditional log likelihood function L(α, θ). It is easy to check that the BL-INAR(1) process satisfies conditions (C1)-(C6) of Franke and Seligmann (1993) [30]; thus, the CML estimates (α CML ,θ CML ) are consistent and asymptotically normal. The proof is similar to those of Theorems 22.4 and 22.5 of Franke and Seligmann (1993) [30], so it is omitted.

Simulation
A Monte Carlo simulation was conducted to study the performances of the CLS, YW, and CML estimates of the BL-INAR(1) model. The CML estimates were obtained by using the BFGS quasi-Newton nonlinear optimization algorithm with numerical derivatives. We considered YW estimates as initial values for the algorithm. The simulation was conducted using R programming language, and the size of the sample was 100, 250, 500, or 1000.
The number of replicates was 1000. For the true values of parameters, we considered α = 0.25, 0.5, and 0.75 and θ = 0.5, and 1.5.
First, we give the Q-Q plots of the CLS, YW, and CML estimates for the BL-INAR(1) model with sample size T = 1000, α = 0.5, and θ = 1.5 in Figure 1. From the six Q-Q plots, we can see that they contain roughly straight lines; i.e., the estimates of the parameters are normally distributed. Then, the numerical simulation results are presented in Tables 1 and 2. By comparing the two tables, we can find that with the same θ and T, the mean squared error (MSE) for the estimate of θ increased with the increase of α, but the MSE for the estimate of α decreased. Additionally, the MSE for the estimate of θ increased with the increase of θ with the same α and T, but the MSE for the estimate of α decreased. Furthermore, we can observe that the estimates of CLS and YW are similar, and the bias tended toward zero for all estimates as the sample size increased. The estimates of CML converged faster to the true parameter values. We conclude that the CML estimates produced the smallest mean square errors, and CML performed better than CLS and YW.

Real Data Examples
In

Disconduct Data
The first dataset is a monthly count of disconduct in the first census tract in Rochester, which can be obtained from Available online: http://www.forecastingprinciples.com (accessed on 8 May 2012). The data comprise 132 observations (T = 132) starting from January 1991 and ending in December 2001.
The time plot, histogram, autocorrelation function (ACF), and partial autocorrelation function (PACF) are provided in Figure 2. We applied the Ljung-Box test (Ljung and Box (1978) [31]) to check whether this time series dataset has any autocorrelation. The p-value of the Ljung-Box test is 1.317 × 10 −5 , which is less than 0.05. This means that the time series data have some autocorrelation, and according to the PACF diagram, the data are first-order autocorrelated, which shows that the AR(1)-type process is appropriate for modeling this dataset.
The sample mean and variance of the data areX = 1.6288 and S 2 X = 2.4455, respectively. Thus, we got the dispersion indexÎ x = S 2 X /X = 1.5014. According to the overdispersion test of Schweer and Weiß (2014) [11], the critical value of the data is 1.1994. The dispersion indexÎ x exceeds the critical value, which means that the equidispersed P-INAR(1) model is not a good choice for the data.  Table 3. We found that the AIC, BIC, CAIC, and HQIC of the BL-INAR(1) model were smaller than those of others. We also found that the fitted means of all eight models were near to the sample mean, and the fitted mean of the PL-INAR(1) model was the closest to the sample mean. In terms of fitted variance, Table 3 shows that the fitted variance of the BL-INAR(1) model performed better than those of the other seven models. For the prediction, we used the first 126 observations to estimate the parameters, and then predicted the last six observations. The predicted values of the disconduct data could be given by E(X t+k | X t ) = α k X t + µ For a further comparison of models, we calculated the root mean square values of the prediction errors (RMSEs) for the last 6 months of the data, and the RMSE is defined as RMSE = 1 6 ∑ 6 k=1 (X t+k −X t+k ) 2 . We present the RMSE results of eight models in the last column of Table 3.  Figure 3 plots the ACF, PACF, and Q-Q plots of residuals. The ACF and PACF graphs show no correlation between residuals, which is supported by the result of the Ljung-Box test with a p-value of 0.05251 > 0.05. The Q-Q plots appear to be roughly normally distributed, as we expected. Hence, we can conclude that the BL-INAR(1) model is the most suitable among those available for this dataset.

Strikes Data
The second dataset, which was analyzed by Weiß (2010) [32], is the monthly number of work stoppages (strikes and lock-outs) of 1000 or more workers for the period 1994-2002. It was published by the US Bureau of Labor Statistics and can be obtained by online at the address Available online: http://www.bls.gov/wsp/ (accessed on 8 May 2012). The data contain 108 observations, and the time plot, histogram, ACF, and PACF are provided in Figure 4. As with the previous example, the Ljung-Box test was used to check whether the strike data have any autocorrelation. The p-value of the Ljung-Box test was 2.372 × 10 −8 , which shows that the time series data have some autocorrelation, and according to the PACF diagram, it is also first-order autocorrelated, so an AR(1)-type process is appropriate for modeling this dataset.
The sample mean, variance, and dispersion index were calculated to be 4.9444, 7.8488, and 1.5874, respectively. According to the overdispersion test, the critical value of the data is 1.2808, and we observe that it was inappropriate to use the P  Table 4. We see that the AIC, BIC, CAIC, and HQIC of the BL-INAR(1) model are smaller than those of others, and the fitted mean of the BL-INAR(1) model is not much different from those of the other seven models. Further, we can see that the BL-INAR(1) model performed better than others when calculating the fitted variance. Similarly to the previous example, the first 102 observations were used to estimate the parameters and predict the last six observations. The RMSE of the predictions is also presented in Table 4. We can observe that the RMSE of the G-INAR(1) model is the smallest; however, it is only 0.05 less than the RMSE of the BL-INAR(1) model. As in the previous example, although the BL-INAR(1) model was not the best under the fitted mean and RMSE criteria, it performed best under the other five criteria. Additionally, we show the Pearson residuals analysis. Figure 5 gives the ACF, PACF, and Q-Q plots of the residuals. We found that there is no evidence of any significant correlation within the residuals, a finding also supported by the Ljung-Box test with a p-value of 0.9522, which is greater than 0.05. The Q-Q plot also appears to be roughly normally distributed. Thus, according to above discussions and its simplicity, we can conclude that the BL-INAR(1) model was the most appropriate.

Conclusions
A new INAR(1) model with Bell innovations based on the binomial thinning operator was introduced in this paper. Based on the overdispersed property of the Bell distribution, we found that the BL-INAR(1) model is suitable for overdispersed data. Some basic properties of the model were obtained, such as transition probabilities, conditional mean, conditional variance, mean, variance, covariance, autocorrelation function, and k-step ahead conditional mean and variance. For unknown parameters, CLS, YW, and CML methods are used to estimate them. The Q-Q plots showed that the estimates of the parameters are normally distributed. The simulated results revealed that the CML estimates of parameters of the BL-INAR(1) model were better than the CLS and YW estimates. Finally, by comparing the AIC values, BIC values, CAIC values, HQIC values, fitted means, fitted variances, and RMSE values of the predictions among eight INAR(1) models, two real datasets both showed that the BL-INAR(1) model fits better than other INAR(1) models.
The analysis of residuals also shows that the BL-INAR(1) model provided adequate fits to those datasets.
Although there are many overdispersed INAR(1) models, some interesting properties of the Bell distribution, such as having one parameter, infinitely divisibility, having a simple probability mass function, belonging to the one-parameter exponential family of distributions, and for a parameter with a small value, having the Bell distribution approach the Poisson distribution, make the BL-INAR(1) model competitive. Some extended distributions of the Bell distribution, such as the zero-inflated Bell distribution and the Bell-Touchard distribution, provide ideas for us to study related INAR models in the future. To prove this theorem, we need to show that the conditions given in Theorems 3.1 and 3.2 of Tjøstheim (1986) [28] are satisfied.

Condition 2:
The vectors ∂E(X t | X t−1 )(θ 0 )/∂φ i , i, j = 1, 2 are linearly independent in the sense that if a 1 and a 2 are arbitrary real numbers such that