Higher-Order INAR Model Based on a Flexible Innovation and Application to COVID-19 and Gold Particles Data

Abstract: INAR models have the great advantage of being able to capture the conditional distribution of a count time series based on its past observations, which allows them to be tailored to the characteristics of count data. This paper reviews the two-parameter Poisson extended exponential (PEE) distribution and its corresponding INAR(1) process. Then the INAR model of order p (INAR(p)) with PEE innovations is proposed, its statistical properties are presented, and its parameters are estimated using the conditional least squares and conditional maximum likelihood methods. Two practical data sets are analyzed and the proposed model is compared with competing INAR models to gauge its performance. On both data sets, the proposed model attains the largest log-likelihood and the smallest AIC and BIC among the models compared.


Introduction
Exploring discrete data has become a specialized area in time series analysis, with applications in several industries, including banking, epidemiology, and telecommunications. Integer-valued autoregressive models of order p (INAR(p)) have become effective tools for tackling the complexities of count data. With the help of these models, which reveal the underlying dynamics and temporal connections in discrete time series, phenomena that are measured in discrete units can be understood more precisely.
Count data differ from conventional continuous data in that they are non-negative and take only discrete values. Conventional continuous models often struggle to capture the complexities inherent in count-based time series. INAR(p) models, however, offer a solution tailored to the distinct properties of count data. The core strength of an INAR(p) model lies in its capacity to capture the conditional distribution of the current count variable based on its past p observations. This autoregressive framework forms the foundation of the model, allowing it to account for the temporal dependencies commonly found in count time series. As a result, INAR(p) models facilitate accurate forecasting and a deeper comprehension of patterns in discrete data.
The INAR(p) model is a development of the INAR(1) model proposed by [1,2]. The initial approach was based on binomial thinning and Poisson innovations.
Because count time series are generally overdispersed, Poisson innovations are often inadequate for the INAR(1) model. Researchers have proposed various innovation distributions and thinning operations to overcome this issue. For instance, [3] developed a family of models using Poisson marginal distributions. Subsequently, [4] investigated a novel stationary INAR(1) process with geometric marginal distributions, constructed through a negative binomial thinning operator, and obtained several properties of the process. Ref. [5] introduced another stationary INAR(1) process. In a different work, [6] considered the compound Poisson INAR(1) model, which deals with time series of overdispersed counts. Additionally, [7] investigated first-order non-negative integer-valued autoregressive processes with power series innovations. Furthermore, [8] constructed an INAR(1) model using Poisson-Lindley distributed innovations. References [9–13], among others, pursue the same line of work.
The studies of [14,15] extended the work of [2] to INAR models of order p. Ref. [15] proposed an alternative to the more general INAR(p) process introduced by [14]. According to [15], an INAR(p) process has the same autocorrelation structure as an AR(p) process, whereas according to [14], it has the same autocorrelation structure as an ARMA(p, p − 1) process. Most authors follow the set-up described by [15]. References [16,17] also contributed to the class of INAR(p) models. Later, [18] introduced an INAR(p) process with a signed generalized power series thinning operator.
Although the INAR(p) model is flexible in dealing with higher-order autoregressive processes, it does not incorporate periodicity, which is a common time series characteristic in a wide range of applications, including air quality and health. Periodically correlated stochastic processes, with periodically varying mean, variance and covariance, are described in [19]. The flexibility of seasonal and/or periodic INAR models is studied by [20–23], among others.
The Poisson extended exponential (PEE) distribution and its related INAR(1) model are discussed at length in [24], and the current study extends that discussion to the INAR(p) model. On the basis of the weekly number of syphilis cases in the United States during 2007-2010, [24] showed that the INAR(1) model with PEE innovations performed better than other competitive INAR(1) models. These findings motivate studying PEE innovations in other important settings, such as the INAR(p) model, and comparing the resulting model with other popular INAR(p) models.
In Section 2, we review the PEE distribution and the associated INAR(1) model. Section 3 presents an INAR(p) model based on the PEE distribution, named the PEE-INAR(p) model. Section 4 discusses conditional least squares (CLS) estimation as well as conditional maximum likelihood (CML) estimation. Section 5 provides a simulation study. Section 6 examines how the proposed model works on practical data sets to illustrate its effectiveness. A conclusion is provided in Section 7.

The INAR(1) Process with the PEE Innovations
A comprehensive definition of the PEE distribution can be found in [24]. The PEE distribution results from compounding the Poisson distribution with the extended exponential (EE) distribution. The probability density function of the EE distribution is given by

$$f(\lambda;\eta,\gamma) = \frac{\eta^{2}(1+\gamma\lambda)\,e^{-\eta\lambda}}{\eta+\gamma}, \quad \lambda > 0,$$

where η > 0 and γ ≥ 0. A random variable X follows the PEE distribution if it has the stochastic representation X|λ ∼ P(λ) and λ|η, γ ∼ EE(η, γ). The unconditional probability mass function (pmf) of X then has the following form:

$$P(X = x) = \frac{\eta^{2}\,[1+\eta+\gamma(1+x)]}{(\eta+\gamma)(1+\eta)^{x+2}}, \quad x = 0, 1, 2, \ldots \tag{1}$$

It is important to note that when γ = 1, (1) reduces to the discrete Poisson-Lindley distribution (see [8]). Some of the important properties of the PEE distribution are listed below. The probability generating function of a random variable X with the PEE distribution is

$$G_X(s) = \frac{\eta^{2}(1+\eta+\gamma-s)}{(\eta+\gamma)(1+\eta-s)^{2}}$$

for |s| < η + 1. The mean and variance are given by

$$E(X) = \frac{\eta+2\gamma}{\eta(\eta+\gamma)}$$

and

$$V(X) = \frac{\eta+2\gamma}{\eta(\eta+\gamma)} + \frac{\eta^{2}+4\eta\gamma+2\gamma^{2}}{\eta^{2}(\eta+\gamma)^{2}},$$

respectively. The dispersion index (DI) of the PEE distribution is given by

$$DI = \frac{V(X)}{E(X)} = 1 + \frac{\eta^{2}+4\eta\gamma+2\gamma^{2}}{\eta(\eta+\gamma)(\eta+2\gamma)}.$$

Since η > 0 and γ ≥ 0, the DI is always greater than one, so the PEE distribution is overdispersed. The PEE distribution is effective as an innovation distribution in the INAR(1) process based on binomial thinning, resulting in the PEE-INAR(1) model [24].
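The properties above can be checked numerically. The following is a minimal sketch, assuming the EE mixing density η²(1 + γλ)e^(−ηλ)/(η + γ), a form consistent with the stated reduction to the Poisson-Lindley distribution at γ = 1. Under that assumption the PEE pmf, mean and variance follow from the Poisson mixture. The function names are illustrative and are not taken from [24].

```python
def pee_pmf(x, eta, gamma):
    """P(X = x), x = 0, 1, 2, ..., for PEE(eta, gamma) with eta > 0, gamma >= 0."""
    return eta**2 * (1 + eta + gamma * (x + 1)) / ((eta + gamma) * (1 + eta) ** (x + 2))


def pee_mean(eta, gamma):
    """Mean of the PEE distribution (equal to the mean of the EE mixing variable)."""
    return (eta + 2 * gamma) / (eta * (eta + gamma))


def pee_var(eta, gamma):
    """Variance: E(lambda) + Var(lambda), as for any mixed Poisson variable."""
    mixing_var = (eta**2 + 4 * eta * gamma + 2 * gamma**2) / (eta**2 * (eta + gamma) ** 2)
    return pee_mean(eta, gamma) + mixing_var


def pee_di(eta, gamma):
    """Dispersion index: variance divided by mean."""
    return pee_var(eta, gamma) / pee_mean(eta, gamma)


def numeric_moments(eta, gamma, upper=300):
    """Total mass, mean and variance by direct summation of the pmf over 0..upper-1."""
    probs = [pee_pmf(x, eta, gamma) for x in range(upper)]
    mean = sum(x * p for x, p in enumerate(probs))
    second = sum(x * x * p for x, p in enumerate(probs))
    return sum(probs), mean, second - mean**2


eta, gamma = 1.6, 0.7  # one of the parameter settings used later in the simulation study
total, mean_num, var_num = numeric_moments(eta, gamma)
print("total mass:", total)
print("mean (sum, formula):", mean_num, pee_mean(eta, gamma))
print("variance (sum, formula):", var_num, pee_var(eta, gamma))
print("DI:", pee_di(eta, gamma))
```

The truncation point `upper=300` is an arbitrary choice; for moderate η the neglected tail mass is far below floating-point precision.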

The PEE-INAR(1) Model
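The PEE-INAR(1) process is given by

$$X_t = \alpha \circ X_{t-1} + \epsilon_t, \quad t \in \mathbb{Z}, \tag{3}$$

where 0 < α < 1 and the innovation process {ϵ_t}, t ∈ Z, consists of independent and identically distributed (iid) integer-valued random variables with mean E(ϵ_t) = µ_ϵ and variance V(ϵ_t) = σ²_ϵ; here ϵ_t ∼ PEE(η, γ). The binomial thinning operator is denoted by the symbol ∘ and is defined as

$$\alpha \circ X_{t-1} = \sum_{j=1}^{X_{t-1}} U_j, \tag{4}$$

where {U_j}, j ≥ 1, is a sequence of Bernoulli random variables, called the counting series, with P(U_j = 1) = α = 1 − P(U_j = 0). It is important to note that in (4) the U_j are independent of each other and of {ϵ_t}. For the PEE-INAR(1) process, the one-step transition probability is given by

$$P(X_t = k \mid X_{t-1} = l) = \sum_{i=0}^{\min(k,l)} \binom{l}{i}\alpha^{i}(1-\alpha)^{l-i}\, P(\epsilon_t = k-i),$$

where P(ϵ_t = k − i) is the PEE pmf in (1) evaluated at k − i. Ref. [25] provides the mean, variance and DI of {X_t} in terms of the mean, variance and DI of the innovation distribution. For the PEE-INAR(1) process, they are

$$E(X_t) = \frac{\mu_\epsilon}{1-\alpha}, \qquad V(X_t) = \frac{\alpha\mu_\epsilon+\sigma_\epsilon^{2}}{1-\alpha^{2}}, \qquad DI_X = \frac{DI_\epsilon+\alpha}{1+\alpha},$$

where µ_ϵ, σ²_ϵ and DI_ϵ are the mean, variance and DI of the PEE distribution. According to [25] and [26], the conditional expectation, conditional variance, covariance and correlation of the PEE-INAR(1) process are given by

$$E(X_t \mid X_{t-1}) = \alpha X_{t-1} + \mu_\epsilon, \qquad V(X_t \mid X_{t-1}) = \alpha(1-\alpha)X_{t-1} + \sigma_\epsilon^{2},$$

$$Cov(X_t, X_{t-h}) = \alpha^{h}\, V(X_t),$$

and Cor(X_t, X_{t−h}) = α^h.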
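As an illustration of the one-step transition probability, the sketch below convolves the binomial distribution of the thinned count with the innovation pmf. It assumes the PEE pmf η²[1 + η + γ(1 + k)] / ((η + γ)(1 + η)^(k+2)). It then checks that a transition row sums to one and that its mean equals αl + µ_ϵ. The function names and parameter values are illustrative only.

```python
from math import comb


def pee_pmf(k, eta, gamma):
    """PEE(eta, gamma) pmf (assumed form); returns 0 for negative k."""
    if k < 0:
        return 0.0
    return eta**2 * (1 + eta + gamma * (k + 1)) / ((eta + gamma) * (1 + eta) ** (k + 2))


def inar1_transition(k, l, alpha, eta, gamma):
    """P(X_t = k | X_{t-1} = l): i survivors out of l, plus an innovation of size k - i."""
    return sum(
        comb(l, i) * alpha**i * (1 - alpha) ** (l - i) * pee_pmf(k - i, eta, gamma)
        for i in range(min(k, l) + 1)
    )


alpha, eta, gamma = 0.5, 1.6, 0.7  # illustrative parameter values
l = 3                              # previous count X_{t-1}
row = [inar1_transition(k, l, alpha, eta, gamma) for k in range(300)]
cond_mean = sum(k * p for k, p in enumerate(row))
mu_eps = (eta + 2 * gamma) / (eta * (eta + gamma))

print("row sum:", sum(row))
print("conditional mean (sum, formula):", cond_mean, alpha * l + mu_eps)
```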

The INAR(p) Model with PEE Innovations
The INAR(1) process in (3) can be extended to the general INAR(p) process to yield

$$X_t = \sum_{m=1}^{p} \alpha_m \circ X_{t-m} + \epsilon_t, \qquad \alpha_m \circ X_{t-m} = \sum_{j=1}^{X_{t-m}} U_{j,m},$$

where ϵ_t ∼ PEE(η, γ) and 0 < α_m < 1 for m = 1, 2, ..., p. As in (4), for each m the U_{j,m} are independent Bernoulli random variables with success probability α_m. Additionally, ϵ_t is assumed to be independent of {X_s}, s < t, for every t. According to [15], if ∑_{m=1}^{p} α_m < 1, then a unique stationary and ergodic solution exists for {X_t}. Also, under stationarity, Cov(X_t, X_{t+h}) = ϕ_{h,t} = ϕ_h, that is, the autocovariance does not depend on t.
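From [27], for the PEE-INAR(p) process, the transition probabilities given the previous p observations are

$$P(X_t = x_t \mid X_{t-1}=x_{t-1},\ldots,X_{t-p}=x_{t-p}) = \sum_{i_1=0}^{\min(x_{t-1},\,x_t)} \binom{x_{t-1}}{i_1}\alpha_1^{i_1}(1-\alpha_1)^{x_{t-1}-i_1} \cdots \sum_{i_p=0}^{\min(x_{t-p},\,x_t-(i_1+\cdots+i_{p-1}))} \binom{x_{t-p}}{i_p}\alpha_p^{i_p}(1-\alpha_p)^{x_{t-p}-i_p}\, P(\epsilon_t = x_t-(i_1+\cdots+i_p)).$$

The innovation term P(ϵ_t = x_t − (i_1 + ... + i_p)) is obtained by substituting k = x_t − (i_1 + ... + i_p) into the PEE pmf,

$$P(\epsilon_t = k) = \frac{\eta^{2}\,[1+\eta+\gamma(1+k)]}{(\eta+\gamma)(1+\eta)^{k+2}}.$$

From [15,27,28] and [25], the mean and variance of {X_t} are, respectively,

$$\mu_X = \frac{\mu_\epsilon}{1-\sum_{m=1}^{p}\alpha_m}, \qquad \sigma_X^{2} = \frac{\mu_X\sum_{m=1}^{p}\alpha_m(1-\alpha_m)+\sigma_\epsilon^{2}}{1-\sum_{m=1}^{p}\alpha_m\rho_m},$$

where ρ_m = Cor(X_t, X_{t−m}). The conditional moments are given by

$$E(X_t \mid X_{t-1},\ldots,X_{t-p}) = \sum_{m=1}^{p}\alpha_m X_{t-m} + \mu_\epsilon$$

and

$$V(X_t \mid X_{t-1},\ldots,X_{t-p}) = \sum_{m=1}^{p}\alpha_m(1-\alpha_m) X_{t-m} + \sigma_\epsilon^{2}.$$

Figure 1 shows simulated sample paths of the PEE-INAR(2) process for n = 400 and different parameter values. These plots illustrate the stationary and ergodic behavior of the process.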
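A sample path of the kind described here can be generated with a short simulation. The sketch below is not the authors' R code: it is a hypothetical Python version that draws innovations by inverse-transform sampling from an assumed PEE pmf, η²[1 + η + γ(1 + k)] / ((η + γ)(1 + η)^(k+2)), and implements binomial thinning as a sum of Bernoulli trials. The burn-in length, the seed and the cap in `draw_pee` are arbitrary choices.

```python
import random


def pee_pmf(k, eta, gamma):
    """PEE(eta, gamma) pmf (assumed form) for k = 0, 1, 2, ..."""
    return eta**2 * (1 + eta + gamma * (k + 1)) / ((eta + gamma) * (1 + eta) ** (k + 2))


def draw_pee(rng, eta, gamma, cap=300):
    """Inverse-transform draw from the PEE pmf; cap guards against round-off in the cdf."""
    u, k = rng.random(), 0
    cdf = pee_pmf(0, eta, gamma)
    while u > cdf and k < cap:
        k += 1
        cdf += pee_pmf(k, eta, gamma)
    return k


def thin(rng, alpha, n):
    """Binomial thinning: number of successes among n Bernoulli(alpha) trials."""
    return sum(rng.random() < alpha for _ in range(n))


def simulate_pee_inar(alphas, eta, gamma, n, burn_in=200, seed=0):
    """Simulate n observations of a PEE-INAR(p) process with p = len(alphas)."""
    rng = random.Random(seed)
    path = [0] * len(alphas)  # arbitrary start, discarded with the burn-in
    for _ in range(n + burn_in):
        # path[-m] is X_{t-m}; each lag is thinned with its own alpha_m
        survivors = sum(thin(rng, a, path[-m]) for m, a in enumerate(alphas, start=1))
        path.append(survivors + draw_pee(rng, eta, gamma))
    return path[-n:]


alphas, eta, gamma = (0.5, 0.3), 1.6, 0.7
x = simulate_pee_inar(alphas, eta, gamma, n=400)
mu_eps = (eta + 2 * gamma) / (eta * (eta + gamma))
print("sample mean:", sum(x) / len(x))
print("stationary mean mu_eps / (1 - sum(alphas)):", mu_eps / (1 - sum(alphas)))
```

For a single path of length 400 the sample mean can differ noticeably from the stationary mean, because the series is strongly autocorrelated when α_1 + α_2 = 0.8.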

Estimation
Two different methods are used here to estimate the unknown parameters of the model.

Conditional Least Squares
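The CLS estimators of the parameters of the PEE-INAR(p) process are obtained by minimizing the function

$$Q_n(\theta) = \sum_{t=p+1}^{n}\left[X_t - E(X_t \mid X_{t-1},\ldots,X_{t-p})\right]^{2} = \sum_{t=p+1}^{n}\left(X_t - \sum_{m=1}^{p}\alpha_m X_{t-m} - \mu_\epsilon\right)^{2},$$

where θ = (α_1, ..., α_p, η, γ) and µ_ϵ is the mean of the PEE innovation. A simulation study is used to test the performance of the CML and CLS estimators in the next section.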
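The CLS criterion based on the conditional mean is a linear least-squares problem in (α_1, ..., α_p, µ_ϵ), so it can be minimized by solving the normal equations. The sketch below does this with the standard library only. The function names and the short count series are hypothetical. Note that this criterion involves η and γ only through µ_ϵ, so separating the two innovation parameters needs an additional step (for example, one based on the conditional variance); how the authors handle this is not shown in the text and is not reproduced here.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def cls_inar(x, p):
    """CLS estimates ([alpha_1, ..., alpha_p], mu_eps) for an INAR(p) series x."""
    # Regress X_t on (X_{t-1}, ..., X_{t-p}, 1) for t = p, ..., len(x) - 1 (0-based).
    rows = [[x[t - m] for m in range(1, p + 1)] + [1.0] for t in range(p, len(x))]
    y = [x[t] for t in range(p, len(x))]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    return beta[:p], beta[p]


# Made-up count series, for illustration only (not one of the data sets in the paper).
counts = [2, 3, 1, 4, 2, 2, 5, 3, 1, 0, 2, 4, 3, 2, 1, 3, 2, 4, 5, 2]
alphas_hat, mu_eps_hat = cls_inar(counts, p=2)
print("alpha estimates:", alphas_hat, "mu_eps estimate:", mu_eps_hat)
```

The unconstrained solution is not guaranteed to satisfy 0 < α_m < 1 on short series; a constrained optimizer would be needed to enforce that.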

Simulation Study
Simulation studies were conducted to evaluate the finite-sample performance of the CML and CLS estimators. We consider five sample sizes (50, 100, 200, 300 and 400) with 1000 replications each, using the PEE-INAR(2) model as a representative higher-order case of the PEE-INAR(p) model. The parameter combinations are (α_1 = 0.5, α_2 = 0.3, η = 1.6, γ = 0.7) and (α_1 = 0.4, α_2 = 0.2, η = 1.2, γ = 0.9). Tables 1 and 2 present the simulation results. The bias and MSE decrease as the sample size increases. Although the two estimators behave similarly, CML often yields smaller bias and MSE than CLS; biases are compared in absolute value, so a negative bias is smallest when it is closest to 0. We have included a condensed version of the underlying R code in Appendix A.
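The two summary measures can be stated concretely. The sketch below is a hypothetical helper, not the Appendix A code. For the estimates of one parameter across replications, it computes the bias as the average estimate minus the true value, and the MSE as the average squared deviation from the true value. The estimates in the example are made-up numbers.

```python
def bias_and_mse(estimates, true_value):
    """Monte Carlo bias and mean squared error of a list of estimates."""
    n = len(estimates)
    bias = sum(estimates) / n - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return bias, mse


# Made-up estimates of alpha_1 = 0.5 from four replications, for illustration only.
bias, mse = bias_and_mse([0.46, 0.52, 0.49, 0.47], true_value=0.5)
print("bias:", bias, "MSE:", mse)
```

A negative bias here means the estimator undershoots on average; when comparing estimators, its magnitude is what matters.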

Empirical Study
The data sets considered are fitted with INAR(1) and INAR(2) models under different innovation distributions, and the model parameters are estimated by the CML method. The following competitive models are taken for comparison: (i) the INAR model based on discrete Teissier innovations (DT-INAR), see [29].
The log-likelihood, log L, is calculated for each model, along with the Akaike information criterion (AIC) and Bayesian information criterion (BIC) values, which are given as

$$AIC = -2\log L + 2r, \qquad BIC = -2\log L + r\log n,$$

where r is the number of parameters and n is the sample size. The practical data sets considered and their analyses follow.
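To make the model comparison concrete, the sketch below evaluates a conditional log-likelihood for a PEE-INAR(2) model and the two criteria AIC = −2 log L + 2r and BIC = −2 log L + r log n. It assumes the PEE pmf η²[1 + η + γ(1 + k)] / ((η + γ)(1 + η)^(k+2)), conditions on the first two observations, and takes n to be the number of observations. These are assumptions, since the text does not spell out the conventions. The series and parameter values are made up, and no optimization is performed.

```python
from math import comb, log


def pee_pmf(k, eta, gamma):
    """PEE(eta, gamma) pmf (assumed form); returns 0 for negative k."""
    if k < 0:
        return 0.0
    return eta**2 * (1 + eta + gamma * (k + 1)) / ((eta + gamma) * (1 + eta) ** (k + 2))


def binom_pmf(i, n, a):
    """Probability of i survivors out of n under binomial thinning with parameter a."""
    return comb(n, i) * a**i * (1 - a) ** (n - i)


def inar2_transition(x, x1, x2, a1, a2, eta, gamma):
    """P(X_t = x | X_{t-1} = x1, X_{t-2} = x2) for the PEE-INAR(2) model."""
    return sum(
        binom_pmf(i, x1, a1) * binom_pmf(j, x2, a2) * pee_pmf(x - i - j, eta, gamma)
        for i in range(min(x1, x) + 1)
        for j in range(min(x2, x - i) + 1)
    )


def cond_loglik(series, a1, a2, eta, gamma):
    """Conditional log-likelihood, conditioning on the first two observations."""
    return sum(
        log(inar2_transition(series[t], series[t - 1], series[t - 2], a1, a2, eta, gamma))
        for t in range(2, len(series))
    )


def aic_bic(loglik, r, n):
    """AIC and BIC for a model with r parameters fitted to n observations."""
    return -2 * loglik + 2 * r, -2 * loglik + r * log(n)


series = [2, 3, 1, 4, 2, 2, 5, 3, 1, 0, 2, 4]  # made-up counts, illustration only
ll = cond_loglik(series, a1=0.5, a2=0.3, eta=1.6, gamma=0.7)
print("log L:", ll, "AIC, BIC:", aic_bic(ll, r=4, n=len(series)))
```

In practice the log-likelihood would be maximized over (α_1, α_2, η, γ) with a numerical optimizer before computing the criteria.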

COVID-19 Data
This section applies the proposed model to real data. The first data set consists of 91 observations of daily COVID-19 death cases in Switzerland from 1 June 2021 to 30 August 2021, taken from the website https://covid19.who.int/data (accessed on 14 September 2023). The mean, variance, and DI are 1.8462, 3.9094 and 2.1175, respectively, which reflects the overdispersion of the data. Figure 2 presents the time series plot of the COVID-19 data. Figure 3 illustrates the autocorrelation function (ACF), partial autocorrelation function (PACF) and histogram plots. The ACF and PACF suggest that a second-order autoregressive model is appropriate. Table 3 presents the data analysis results, which show that PEE-INAR(2) has the maximum log L and minimum AIC and BIC. On this basis, the PEE-INAR(2) process provides a better fit than the other competing models.

Gold Particles Data
The second data set includes 380 observations and is taken from [25]. The counts were measured over time in a fixed volume element of a colloidal solution, where the particles move in Brownian motion. In this case, the mean, variance, and DI are, respectively, 1.5605, 1.6242, and 1.0409, which indicates that the data are only slightly overdispersed. The time series plot of the gold particles data is illustrated in Figure 4. Figure 5 illustrates the ACF, PACF, and histogram plots. The ACF and PACF plots suggest that second-order models are preferred. The data analysis results can be found in Table 4; they indicate that PEE-INAR(2) has the maximum log L and minimum AIC and BIC. On this basis, the PEE-INAR(2) process provides a better fit than the competing models.
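The dispersion indices quoted for the two data sets can be re-derived from the reported means and variances, since DI is the variance divided by the mean. The short check below uses only the rounded summary values given in the text, so agreement is expected only up to rounding; the helper name is illustrative.

```python
def dispersion_index(mean, variance):
    """Dispersion index: variance divided by mean (greater than 1 means overdispersion)."""
    return variance / mean


# (mean, variance, reported DI) as quoted in the text for each data set
reported = {
    "COVID-19": (1.8462, 3.9094, 2.1175),
    "gold particles": (1.5605, 1.6242, 1.0409),
}

for name, (mean, variance, di) in reported.items():
    print(name, "recomputed DI:", dispersion_index(mean, variance), "reported DI:", di)
```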

Conclusions
The non-negative and discrete nature of count data makes it a distinct type of data with specific challenges, and conventional continuous models often struggle to capture the complexities of count-based time series. The core strength of INAR(p) models lies in their ability to capture the conditional distribution of the current count variable based on its past p observations, which makes them well suited to count data. This paper reviews the PEE distribution and the associated INAR(1) process and then proposes the PEE-INAR(p) model. Various properties of the model are presented, and parameter estimation is based on the CML and CLS methods. In both applications, the PEE-INAR(2) model attained the largest log L and the smallest AIC and BIC among the models compared. Part of the improvement in fit is attributed to the order of the INAR model, that is, the number of past values (plus the innovation) that it uses. The rest of the improvement is due to the recommended innovation distribution, namely the PEE distribution. In summary, the strong performance of the model on the chosen data sets is attributable to these two factors. A natural extension is a bivariate INAR(p) process built on bivariate PEE innovations; this would require substantial further development and is left for future research.

Figure 4. The time series plot of the gold particles data.

Figure 5. The ACF, PACF, and histogram plots of the gold particles data.

Table 3. Fitting results of the COVID-19 data.

Table 4. Fitting results of the gold particles data.