1. Introduction
Time series models with an integer-valued structure are prevalent in fields such as economics, insurance, medicine and crime. Recent reviews of count time series models based on thinning operators, covering both modeling approaches and numerous examples, can be found in refs. [1,2,3]. The most classic is the first-order integer-valued autoregressive INAR(1) model, introduced by refs. [4,5] using the properties of the binomial thinning operator (ref. [6]). The class of INAR models is typically based on the assumption that the observations follow a Poisson distribution, which facilitates subsequent computations due to the equidispersion of the Poisson distribution (mean and variance being equal). Parameter estimation for these models is usually carried out by Yule–Walker, conditional least squares and conditional maximum likelihood methods, as discussed in refs. [7,8,9], among others. Recently, there has been growing interest in the autoregressive coefficients of these models. The autoregressive coefficients can generally be treated as random variables, subject to rules such as following a stationary process or having a specified mean and variance; see refs. [10,11,12]. Unlike the above models, this paper proposes a time-varying integer-valued autoregressive (TV-INAR) model in which the state equation takes a nonstationary form.
In our work, the model is characterized by a state equation and time-varying parameters. The concept of state-space time series analysis was first introduced by ref. [13] in the field of engineering. Over time, the term "state space" became entrenched in statistics and econometrics, as the state-space model provides an effective framework for a wide range of problems in time series analysis. Ref. [14] presented a comprehensive treatment of the state-space approach to time series analysis. Recently, progress has been made on the use of state-space methods in integer-valued autoregressive and count time series models. Ref. [15] proposed a first-order random coefficient integer-valued autoregressive model by introducing a Markov chain with a finite state space and derived some statistical properties of the model in a random environment. Ref. [16] introduced a parameter-driven state-space model to analyze integer-valued time series data. To accommodate the overdispersion, zero-inflation and temporal correlation of count time series, ref. [17] proposed a flexible class of dynamic models in the state-space framework. As noted in ref. [18], the popularity of these models stems in large part from the development of the Kalman recursion, which provides a quick updating scheme for predicting, filtering and smoothing a time series. The traditional treatment of Kalman filtering can be found in refs. [14,19,20]. Nevertheless, most of this research has focused on continuous-valued time series models in an economic context, such as the analysis of stocks, trade and currency. For example, ref. [21] proposed a class of time-varying parameter autoregressive models and proved the equivalence of the Kalman-smoothed estimate and the generalized least squares estimate. Following this lead, ref. [22] developed a trade growth relationship model with time-varying parameters and estimated the transition of the changing parameters with a Kalman filter. Both involved Gaussian and linear settings. Ref. [14] attested that the results obtained by Kalman smoothing remain equivalent when the model is non-Gaussian or nonlinear. Therefore, integer-valued time series models in the non-Gaussian case are also worth investigating.
Time series models with state-space structures and time-varying parameters are common in economics. Many macroeconomists believe that time-varying parameter models predict and adapt to the data better than fixed-parameter models. Early research mostly focused on time-varying continuous-valued time series models, such as autoregressive (AR), vector autoregressive (VAR) and autoregressive moving average (ARMA) models (see refs. [21,23,24]). In recent years, research on INAR models with time-varying parameters has attracted more attention and has been applied to natural disasters and medical treatment. Ref. [25] introduced a multivariate integer-valued autoregressive model of order one with periodic time-varying parameters and adopted a composite likelihood-based approach. Ref. [26] considered a change-point analysis of count time series data through a Poisson INAR(1) process with time-varying smoothing covariates. However, the time-varying parameters in the above models are not governed by a state equation. Additionally, time-varying parameter models are difficult to handle when unobserved variables need to be estimated, and only a few effective methods are available. Ref. [27] proposed a Bayesian estimation method for time-varying parameters and claimed that the Bayesian method is superior to the maximum likelihood method. Both the Bayesian and maximum likelihood methods require Kalman filtering to estimate the state vectors that contain the time-varying parameters, and the Kalman filter becomes applicable once the model is put into state-space form. It is therefore worth exploring new methods for TV-INAR models.
Motivated by ref. [21], a new TV-INAR model with a state equation is presented in this paper. Since the model is in state-space form, the Kalman smoothing method uses only the observed variables to estimate the parameters. Unlike traditional INAR models, which are limited to modeling stationary time series data, the TV-INAR model can handle nonstationary structures and time-varying dynamics, resulting in improved model fit and more accurate predictions. The Kalman-smoothed estimates of the time-varying unobserved state variables are derived. Furthermore, the mean of the Poisson error term is estimated using the estimates obtained in the previous step together with the conditional least squares method.
The rest of this paper is organized as follows. A new TV-INAR model is presented and its basic properties are established in Section 2. In Section 3, the Kalman smoother is used to derive an estimate of the time-varying parameter; then, incorporating the conditional least squares (CLS) method, an estimator of the mean of the Poisson error term is established. Numerical simulations and results are discussed in Section 4. In Section 5, the proposed model is applied to a real offense count data set. The conclusion is given in Section 6.
2. Poisson TV-INAR Model
In this section, the INAR(1) model is reviewed, a new TV-INAR model incorporating time-varying and nonstationary features is introduced, and some basic statistical properties are derived.
Suppose $Y$ is a non-negative integer-valued random variable and $\alpha \in [0,1]$. The binomial thinning operator $\circ$, introduced in ref. [6], is defined as $\alpha \circ Y = \sum_{i=1}^{Y} B_i$, where $\{B_i\}$ is a sequence of independent and identically distributed (i.i.d.) Bernoulli random variables, independent of $Y$, satisfying $P(B_i = 1) = 1 - P(B_i = 0) = \alpha$. Then, the INAR(1) model is given by
$$X_t = \alpha \circ X_{t-1} + \varepsilon_t,$$
where $\{\varepsilon_t\}$ is a sequence of uncorrelated non-negative integer-valued random variables with mean $\mu_\varepsilon$ and finite variance $\sigma_\varepsilon^2$, and $\varepsilon_t$ is independent of all $X_s$, $s < t$.
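As a concrete illustration of the binomial thinning mechanism and the INAR(1) recursion, the following sketch simulates both (in Python rather than the paper's Matlab; the function names are ours, and the Poisson innovation is the classic choice discussed above):

```python
import numpy as np

rng = np.random.default_rng(42)

def binomial_thinning(alpha, y, rng):
    """alpha ∘ y: the sum of y i.i.d. Bernoulli(alpha) variables."""
    return rng.binomial(y, alpha)  # equivalent to summing y Bernoulli draws

def simulate_inar1(alpha, lam, T, rng, x0=0):
    """Simulate X_t = alpha ∘ X_{t-1} + eps_t with eps_t ~ Poisson(lam)."""
    x = np.empty(T, dtype=int)
    prev = x0
    for t in range(T):
        prev = binomial_thinning(alpha, prev, rng) + rng.poisson(lam)
        x[t] = prev
    return x

path = simulate_inar1(alpha=0.5, lam=1.0, T=5000, rng=rng)
# The stationary mean of a Poisson INAR(1) process is lam / (1 - alpha) = 2.0
```

The long-run sample mean of `path` should settle near $\lambda/(1-\alpha)$, the well-known stationary mean of the Poisson INAR(1) process.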
2.1. Definition of TV-INAR Process
It is very common to extend the autoregressive coefficient to a random parameter in time series models. However, this differs from the time-varying parameter introduced in this paper. In random coefficient (RC) models, the coefficient is usually assigned a definite distribution, or its expectation and variance are specified. Although it is a random variable, its expectation and variance do not change with time. In contrast, in time-varying parameter models, the parameter does not have a fixed distribution, and its expectation and variance often change over time. This is one of the challenges of such models compared with ordinary RC models. Thus, based on the above INAR(1) process, we define the time-varying integer-valued autoregressive (TV-INAR) process as follows.
Definition 1. Let $\{Y_t\}$ be an integer-valued regressive process. It is a time-varying integer-valued regressive model if
$$Y_t = \alpha_t \circ Y_{t-h} + \varepsilon_t, \qquad \mu_t = \mu_{t-1} + \eta_t, \qquad (2)$$
where $\alpha_t$ is a function of $\mu_t$; $\{\varepsilon_t\}$ is a sequence of i.i.d. Poisson-distributed random variables with mean λ; $\{\eta_t\}$ is a sequence of i.i.d. standard normally distributed random variables; and $\varepsilon_t$ is independent of $Y_s$ and $\eta_s$ when $s < t$.
In the model (2), $Y_t$ and $Y_{t-h}$ are observation variables, and $\mu_t$ is an unobserved variable, often called a time-varying parameter. Our model allows for a class of TV-INAR models. For example, $h = 1$ yields a TV-INAR(1) model. Model (2) becomes a TV-INAR(p) model when the single lag is extended to lags $1, \ldots, p$ and the autocorrelation coefficient expands to $(\alpha_{1t}, \ldots, \alpha_{pt})$. Furthermore, $Y_{t-h}$ can also be considered as a covariate of $Y_t$.
In this paper, we are devoted to the case of $h = 1$. The idea of this model is that the development of the system over time is determined by $\mu_t$ in the second equation of (2). However, since $\mu_t$ cannot be observed directly, the analysis must be based on the observations $Y_1, \ldots, Y_T$. The first equation of (2) is called the observation equation; the second is called the state equation [14]. Because the state equation is nonstationary, we assume initially that $\mu_1 \sim N(a_1, P_1)$, where $a_1$ and $P_1$ are known. The variances of the error terms $\varepsilon_t$ and $\eta_t$ in (2) are time-invariant.
Studies on the class of TV-AR models have been extensive, especially in the field of economics. Such models are flexible enough to capture the complexity of macroeconomic systems and fit the data better than models with constant parameters. Retaining the advantages of these models, we propose the TV-INAR model, which can deal with the integer-valued discrete data that arise in real life. It is of interest that $\mu_t$ is always nonstationary, since it is a random walk; the existence of the above state equation is therefore justified. Moreover, the binomial thinning operator ∘ represents the probability of an event, so $\alpha_t$ is given a form that ensures this probability lies in $(0,1)$.
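A sample path of the TV-INAR(1) process can be simulated directly from the definition. In the sketch below (Python; our own function names), the random-walk state equation follows model (2), while the logistic form of $\alpha_t$ is only an illustrative assumption, since the definition merely requires $\alpha_t$ to be a function of $\mu_t$ taking values in $(0,1)$:

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_tv_inar1(lam, T, mu1, rng):
    """Simulate Y_t = alpha_t ∘ Y_{t-1} + eps_t,  mu_t = mu_{t-1} + eta_t,
    with eps_t ~ Poisson(lam) and eta_t ~ N(0, 1).  The logistic link
    alpha_t = exp(mu_t) / (1 + exp(mu_t)) is our illustrative choice."""
    y = np.empty(T, dtype=int)
    mu = np.empty(T)
    mu[0] = mu1
    prev_y = 0
    for t in range(T):
        if t > 0:
            mu[t] = mu[t - 1] + rng.normal()    # random-walk state equation
        alpha_t = 1.0 / (1.0 + np.exp(-mu[t]))  # keeps alpha_t inside (0, 1)
        prev_y = rng.binomial(prev_y, alpha_t) + rng.poisson(lam)
        y[t] = prev_y
    return y, mu

y, mu = simulate_tv_inar1(lam=1.0, T=300, mu1=0.0, rng=rng)
```

Because $\mu_t$ wanders without mean reversion, the simulated path exhibits the nonstationary behavior that distinguishes the TV-INAR model from a constant-coefficient INAR(1).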
2.2. Statistical Properties
To have a better understanding of the model and to apply it directly to parameter estimation, some statistical properties of the model are provided. The mean, conditional mean, second-order conditional origin moment, variance and conditional variance of the time-varying integer-valued regressive process are given in the following proposition. The case of TV-INAR(1) is given in the corollary.
Proposition 1. Suppose $\{Y_t\}$ is a process defined by (2), and $\mathcal{F}_t$ is the σ-field generated by the past of the process. Then, we have - (1)
- (2)
- (3)
Corollary 1. Suppose $\{Y_t\}$ satisfies the TV-INAR(1) model, i.e., model (2) with $h = 1$. Then,
- (1)
- (2)
- (3)
- (4)
Clearly, when $h = 1$, $\{(Y_t, \mu_t)\}$ is a bivariate Markov chain with the following transition probabilities:
4. Simulation
In this section, we perform simulations using Matlab R2018b to assess the performance of the two-step estimation approach discussed in
Section 3. The approach recovers the true time-varying parameters, as well as the performance of the CLS estimator.
For the data generating process, using Equation (
3), pseudodata are randomly generated by changing time
T (
) and the parameter
(
) of the error term in the observation equation. The number of replications was 100. Here, the choice of
is based on the size of the signal-to-noise ratio (SNR), which is defined as the variance in
relative to that in
, i.e., SNR
. In our model, the error of the state equation is constant. So, we consider the SNRs for 1/0.05, 1/0.1, 1/0.8, 1/1.3 and 1/4 by changing the variance in the error (
) in the observation equation. And these five sample paths of the TV-INAR(1) model are plotted in
Figure 1, respectively. We can see that the sample path of
is unsteady, and the variation in parameter combinations results in a change in the sample dispersion of the samples.
Next, consider the choice of the initial value of the parameter. As mentioned in Section 2.1, the initial value $\mu_1$ follows a known normal distribution. In practice, it is difficult to gauge the true mean and variance of this distribution. For simplicity, we assume $\mu_1$ is known and nonstochastic, i.e., its variance is set to zero. This assumption brings great convenience in simulation studies and does not affect the numerical results, as can be observed from Proposition 3 when the sample size is sufficiently large.
Our simulation concerns a first-order integer-valued autoregressive model with time-varying parameters (TV-INAR(1)). For the first-step estimation, let $\mu_t$ denote the true value of the parameter in the data-generating process and $\hat{\mu}_t$ denote its Kalman-smoothed estimate. We compute the sample means and sample standard deviations of the true and estimated values, respectively, and after $N = 100$ replications, the averages of these four indicators are obtained. We take the difference between the average of the estimates and the average of the true values directly to evaluate the effectiveness of the Kalman-smoothed estimation approach. In addition, we record the mean of the ratio of the standard deviation of the estimated process $\hat{\mu}_t$ to the standard deviation of the real process $\mu_t$; the quality of the estimator depends on whether this ratio is close to one. This is similar to the criterion in ref. [21].
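The paper's exact treatment of the non-Gaussian observation equation is not reproduced here; as a building block, the following sketch (Python; our own function names) implements the standard Kalman filter with a Rauch–Tung–Striebel backward pass for a linear-Gaussian model with a random-walk state, which is the form the Kalman recursions assume:

```python
import numpy as np

def kalman_smoother(y, Z, sig_eps, sig_eta, a1, P1):
    """Kalman filter + RTS smoother for the model
        y_t  = Z_t * mu_t + eps_t,   eps_t ~ N(0, sig_eps)
        mu_t = mu_{t-1}  + eta_t,    eta_t ~ N(0, sig_eta)
    Returns the smoothed state estimates E(mu_t | y_1, ..., y_T)."""
    T = len(y)
    a_filt = np.empty(T); P_filt = np.empty(T)   # filtered mean and variance
    a, P = a1, P1                                # prior for the first state
    for t in range(T):
        F = Z[t] ** 2 * P + sig_eps              # innovation variance
        K = P * Z[t] / F                         # Kalman gain
        a = a + K * (y[t] - Z[t] * a)            # measurement update
        P = P - K * Z[t] * P
        a_filt[t], P_filt[t] = a, P
        P = P + sig_eta                          # random-walk prediction step
    # Rauch-Tung-Striebel backward smoothing pass
    a_smooth = np.empty(T)
    a_smooth[-1] = a_filt[-1]
    for t in range(T - 2, -1, -1):
        C = P_filt[t] / (P_filt[t] + sig_eta)    # smoothing gain
        a_smooth[t] = a_filt[t] + C * (a_smooth[t + 1] - a_filt[t])
    return a_smooth

# Example: recover a slowly varying random-walk signal from noisy data
rng = np.random.default_rng(0)
mu_true = np.cumsum(rng.normal(0.0, 0.1, 200))
y = mu_true + rng.normal(0.0, 1.0, 200)
mu_hat = kalman_smoother(y, np.ones(200), sig_eps=1.0, sig_eta=0.01, a1=0.0, P1=10.0)
```

Because the smoother pools information from the whole sample, `mu_hat` tracks the hidden state far more closely than the raw observations do.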
Next, we evaluate the performance of the second-step estimation, which applies the Kalman-smoothed estimates and the conditional least squares (CLS) method to estimate the parameter λ in model (2). As mentioned above, the true values of λ are those used in the data-generating process. To evaluate the estimation performance, besides the bias, two other indicators are considered, the mean absolute deviation (MAD) and the mean squared error (MSE), defined as
$$\mathrm{MAD} = \frac{1}{N}\sum_{n=1}^{N} \bigl|\hat{\lambda}_n - \lambda\bigr|, \qquad \mathrm{MSE} = \frac{1}{N}\sum_{n=1}^{N} \bigl(\hat{\lambda}_n - \lambda\bigr)^2,$$
where $\hat{\lambda}_n$ is the estimate of λ at the nth replication. Then, for various sample sizes and initial parameter values, the simulation results of the two-step estimation process are listed in Table 1 and Table 2.
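The Monte Carlo evaluation criteria can be computed in a few lines; the sketch below (Python; our own function names) implements the bias, MAD and MSE over replications exactly as defined above:

```python
import numpy as np

def replication_metrics(estimates, true_value):
    """Bias, mean absolute deviation (MAD) and mean squared error (MSE)
    of a parameter estimate over N Monte Carlo replications."""
    est = np.asarray(estimates, dtype=float)
    err = est - true_value
    return {
        "bias": err.mean(),
        "MAD": np.abs(err).mean(),
        "MSE": (err ** 2).mean(),
    }

m = replication_metrics([0.9, 1.1, 1.0, 1.2], true_value=1.0)
# bias = 0.05, MAD = 0.1, MSE = 0.015
```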
The results show that the smaller the variance λ of the error term, the smaller the biases of the estimates, implying that the Kalman smoothing and CLS approaches work better in the sense of bias. In the first-step estimation, the larger the variance λ, the closer the standard-deviation ratio is to one; in this sense, the Kalman smoothing method performs best when λ equals 1.3. This suggests that a single criterion is insufficient to measure the effectiveness of the estimation method. In addition, as the sample size increases, the bias becomes smaller and the ratio gets closer to one, showing that the Kalman-smoothed estimate approaches the true process. From the perspective of the SNR, the larger the SNR, the smaller the estimation bias; the smaller the SNR, the closer the estimated sample standard deviation is to that of the true process. In summary, when the SNR is around 1, the estimation is relatively good. In the second-step estimation, the values of MAD and MSE are small, suggesting a relatively acceptable estimation effect; overall, the larger the SNR, the smaller the estimation error, so the CLS estimation method works better. Additionally, as the sample size T increases, the corresponding MAD and MSE gradually decrease, and the parameter estimates gradually converge to the true values. In conclusion, the proposed two-step estimation method is a credible approach.
5. Case Application
In this section, we apply the model and methods of Section 3 to predict a real count time series. The data set, obtainable from the NSW Bureau of Crime Statistics and Research, consists of monthly counts of sexual offenses in Ballina, New South Wales, Australia, from January 2000 to December 2009, comprising 120 monthly observations. Figure 2 shows the time plot and the partial autocorrelation function (PACF). The data are nonstationary and first-order autocorrelated, which indicates that it may be reasonable to model this data set with our model. Descriptive statistics of the data are displayed in Table 3. Next, we compare the TV-INAR(1) model with the INAR(1) model in fitting the data set. In general, the better-fitting model is expected to present smaller values of the negative log-likelihood, AIC and BIC. From the results in Table 4, we conclude that the proposed model fits the data better.
For the prediction, the one-step-ahead conditional expectation point predictor $\hat{Y}_t = E(Y_t \mid \mathcal{F}_{t-1})$ is used. We compute the root mean squared prediction error (RMSE) over the last 6 months of data,
$$\mathrm{RMSE} = \sqrt{\frac{1}{6}\sum_{t=T-5}^{T} \bigl(Y_t - \hat{Y}_t\bigr)^2}.$$
The estimates and RMSE results are also shown in Table 4. To analyze the adequacy of the model, we examine the standardized Pearson residuals, defined as
$$e_t = \frac{Y_t - E(Y_t \mid \mathcal{F}_{t-1})}{\sqrt{\mathrm{Var}(Y_t \mid \mathcal{F}_{t-1})}}.$$
For our model, the mean and variance of the Pearson residuals are 0.8022 and 1.2000, respectively. As discussed in ref. [28], for an adequately chosen model, the variance of the residuals should take a value close to 1, and 1.2 is reasonably close. Therefore, we conclude that the proposed model fits the data well.
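The prediction and adequacy checks can be sketched as follows (Python; our own function names). The conditional mean and variance of the fitted model must be supplied by the user, since they depend on the estimated quantities; the toy check uses a plain Poisson series, for which both equal λ:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared prediction error over a holdout window."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def pearson_residuals(y, cond_mean, cond_var):
    """Standardized Pearson residuals
    (Y_t - E[Y_t | F_{t-1}]) / sqrt(Var[Y_t | F_{t-1}]).
    For an adequate model, their sample variance should be close to 1."""
    y = np.asarray(y, float)
    return (y - np.asarray(cond_mean, float)) / np.sqrt(np.asarray(cond_var, float))

# Toy check: for a plain Poisson(lam) series, cond. mean = cond. var = lam,
# so the residual variance should be near 1.
rng = np.random.default_rng(1)
lam = 3.0
y = rng.poisson(lam, 10000)
e = pearson_residuals(y, np.full(10000, lam), np.full(10000, lam))
```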