Consistent Model Selection Procedure for Random Coefficient INAR Models

In the realm of time series data analysis, information criteria constructed on the basis of likelihood functions serve as crucial instruments for determining the appropriate lag order. However, the intricate structure of random coefficient integer-valued time series models, which are founded on thinning operators, complicates the establishment of likelihood functions. Consequently, employing information criteria such as AIC and BIC for model selection becomes problematic. This study introduces an innovative methodology that formulates a penalized criterion by utilizing the estimation equation within conditional least squares estimation, effectively addressing the aforementioned challenge. Initially, the asymptotic properties of the penalized criterion are derived, followed by a numerical simulation study and a comparative analysis. The findings from both theoretical examinations and simulation investigations reveal that this novel approach consistently selects variables under relatively relaxed conditions. Lastly, the applications of this method to infectious disease data and seismic frequency data produce satisfactory outcomes.


Introduction
Integer-valued time series are ubiquitous in scientific research and everyday life, encompassing examples such as the daily count of patients admitted to hospitals and the number of crimes committed daily or monthly. Consequently, integer-valued time series have increasingly garnered attention from scholars. However, traditional continuous-valued time series models fail to capture the integer-valued characteristics and can only approximate integer-valued data. This approximation may result in model misspecification, complicating statistical inference. As a result, the modeling and analysis of integer-valued time series data have become a growing area of focus in academia. Among the variety of integer-valued time series modeling methods, thinning operator models have gained favor due to their resemblance to the autoregressive moving average (ARMA) models of traditional continuous-valued time series theory. Thinning operator models substitute the multiplication in ARMA models with the binomial thinning operator introduced by Steutel and Van Harn [1]:

φ ∘ Y = Σ_{i=1}^{Y} B_i,   (1)

where Y is a non-negative integer-valued count and {B_i} denotes a series of Bernoulli random variables independent of Y, whose probability mass function satisfies P(B_i = 1) = 1 − P(B_i = 0) = φ with φ ∈ [0, 1). Building on this foundation, Al-Osh and Alzaid [2] developed the first-order integer-valued autoregressive (INAR(1)) model for t ∈ N_+:

Y_t = φ ∘ Y_{t−1} + Z_t,   (2)

where Z_t is regarded as the innovation term entering the model at period t, with its marginal distribution being a Poisson distribution with expected value λ. Consequently, model (2) is called the Poisson INAR(1) model. Later, Du and Li [3] introduced the INAR(p) model and provided conditions ensuring the stationarity and ergodicity of the INAR(p) process; the incorporation of additional lag terms increased the model's flexibility. Subsequently, Joe [4] and Zheng, Basawa, and Datta [5] developed the random coefficient thinning operator model (RCINAR(1)) by allowing the parameter φ in the INAR(1) model to follow a specific random distribution. Zheng, Basawa, and Datta [6] extended the RCINAR(1) model to the p-th order integer-valued autoregressive model, known as the RCINAR(p) model. Zhang, Wang, and Zhu [7] established a data-driven empirical likelihood (EL) interval estimation for the RCINAR(p) model. By employing the geometric thinning operator (also referred to as the negative binomial thinning operator) proposed by Ristić, Bakouch, and Nastić [8], Tian, Wang, and Cui [9] constructed an INAR(1) model capable of describing seasonal effects. Lu [10] investigated the prediction problem of thinning operator models using the Taylor expansion.
For further discussions on thinning operator models, readers can consult the textbook by Weiß [11].
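The thinning recursion is straightforward to simulate: φ ∘ Y, a sum of Y i.i.d. Bernoulli(φ) variables, is simply a Binomial(Y, φ) draw. The following minimal sketch (my own illustration, assuming a Poisson INAR(1) with φ = 0.5 and λ = 2) generates a sample path and checks the stationary mean λ/(1 − φ):

```python
import numpy as np

rng = np.random.default_rng(42)

def binomial_thinning(phi, y, rng):
    """phi o y: the sum of y i.i.d. Bernoulli(phi) variables, i.e. a Binomial(y, phi) draw."""
    return rng.binomial(y, phi)

def simulate_inar1(phi, lam, T, rng, burn_in=200):
    """Simulate the Poisson INAR(1) recursion Y_t = phi o Y_{t-1} + Z_t, Z_t ~ Poisson(lam)."""
    y, path = rng.poisson(lam), []
    for _ in range(T + burn_in):
        y = binomial_thinning(phi, y, rng) + rng.poisson(lam)
        path.append(y)
    return np.array(path[burn_in:])

y = simulate_inar1(phi=0.5, lam=2.0, T=5000, rng=rng)
print(round(y.mean(), 2))   # stationary mean is lam / (1 - phi) = 4
```

The burn-in discards the transient so that the retained path is approximately a draw from the stationary distribution.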
In general, researchers engaged in statistical analysis, particularly during the initial stages of a time series investigation, frequently encounter the challenge of model selection. Current model selection techniques can be broadly categorized into three groups. The first group relies on the sample autocorrelation (ACF) and partial autocorrelation (PACF) functions, as exemplified by Latour [12]. The second group, which is the most prevalent, comprises a series of information criteria founded on maximum likelihood estimation: Akaike [13] introduced the Akaike Information Criterion (AIC) by performing an unbiased estimation of the expected log-likelihood function, while Schwarz [14] established the Bayesian Information Criterion (BIC) by employing a Laplace expansion for the posterior estimation of the expected log-likelihood function, and Ding, Tarokh, and Yang [15] devised a novel information criterion for autoregressive time series models by connecting AIC and BIC. The third group consists of data-driven information criteria based on empirical likelihood estimation, motivated by the fact that empirical likelihood estimation can substantially circumvent issues stemming from model misspecification while retaining certain features of maximum likelihood estimation. Variyath, Chen, and Abraham [16] formulated the Empirical Akaike Information Criterion (EAIC) and the Empirical Bayesian Information Criterion (EBIC) by combining the principles of AIC and BIC with empirical likelihood estimation, and demonstrated that EBIC is consistent in variable selection. Chen, Wang, Wu, and Li [17] addressed potential computational convergence problems in empirical likelihood estimation by incorporating external estimators (typically moment estimators) into the empirical likelihood function, thereby developing a robust and consistent information criterion. For additional discussions on information criteria, readers may consult the textbook by Konishi and Kitagawa [18] and the review article by Ding, Tarokh, and Hong [19].
In the specific domain of integer-valued time series analysis, our objective is to determine which lagged variables of Y_t ought to be incorporated into the model. Extensive research has been conducted on model selection for integer-valued autoregressive conditional heteroskedasticity (INARCH) models, whose likelihood functions are relatively straightforward to establish. Notable examples include Weiß and Feld [20], who provided comprehensive numerical simulations for integer-valued time series model selection using information criteria, and Diop and Kengne [21], who introduced consistent model selection methods for INARCH models based on quasi-maximum likelihood estimation. However, the task becomes more challenging for higher-order and random coefficient INAR(p) models constructed using thinning operators: the complexity of the likelihood functions and the substantial computational requirements make information criteria difficult to establish and use. Consequently, Zheng, Basawa, and Datta [6] proposed estimating the model from its conditional moments rather than relying on likelihood functions. While this approach facilitates the estimation of unknown parameters, it complicates variable selection. To overcome this hurdle, Wang, Wang, and Yang [22] implemented penalty functions and pseudo quasi-maximum likelihood estimation (PQML) for variable selection, demonstrating the robustness of their method even when faced with contaminated data. Drawing inspiration from these preceding studies, this paper endeavors to establish a novel model selection method, akin to an information criterion, founded upon the estimating equations of conditional least squares (CLS) estimation. Furthermore, we demonstrate the consistency of this model selection method for variable selection problems in integer-valued time series. The approach circumvents the need for complex probability distribution assumptions while preserving effective variable selection capabilities.
The organization of this paper is as follows: In Section 2, we revisit the RCINAR(p) model, introduce the proposed information criterion, and outline its asymptotic properties. In Section 3, we carry out numerical simulation studies on variable selection utilizing this criterion. In Section 4, we apply this criterion for variable selection in real data sets. Lastly, in Section 5, we engage in a discussion and offer concluding remarks.

RCINAR Model and Model Selection Procedure
In this section, we discuss the ergodic stationary RCINAR model and its associated model selection methods.

RCINAR(p) Model and Its Estimation
The INAR(p) model with constant coefficients, as introduced by Du and Li [3], is formulated as follows:

Y_t = φ_1 ∘ Y_{t−1} + φ_2 ∘ Y_{t−2} + ⋯ + φ_p ∘ Y_{t−p} + Z_t.   (3)

Let {Y_t}_{t=1}^{T} represent a non-negative integer-valued sequence. The RCINAR(p) model is defined by the following equation:

Y_t = φ_1^{(t)} ∘ Y_{t−1} + φ_2^{(t)} ∘ Y_{t−2} + ⋯ + φ_p^{(t)} ∘ Y_{t−p} + Z_t,   (4)

where "∘" denotes the thinning operator defined in Equation (1). Let θ_0 = (φ_{10}, ..., φ_{p0}, λ_0)′ be the true parameter vector of this data-generating process, with θ_0 ∈ Θ, where Θ is a compact subset of R^{p+1}, and let θ = (φ_1, ..., φ_p, λ)′ represent the (p + 1)-dimensional parameter vector to be estimated. Here, the φ_j^{(t)} are sequences of independent and identically distributed random variables on [0, 1) with mean φ_j and probability density function f_{φ_j}(φ) ≥ 0 for all φ ∈ [0, 1), with Σ_{j=1}^{p} φ_j < 1. Moreover, we do not assume a specific parametric distribution for {Z_t}, requiring only that {Z_t} be an independent and identically distributed non-negative integer-valued random variable sequence with mean λ and probability mass function f_Z(z) ≥ 0, ∀z ∈ N. In this context, we consider the semiparametric INAR model as described by Drost, Van den Akker, and Werker [23].
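The RCINAR(p) process can be simulated directly from its definition. The sketch below is my own illustration, assuming beta-distributed random coefficients (a parameterization used later in the paper's simulations) with means φ_1 = 0.4 and φ_3 = 0.2 and Poisson(2) innovations; it checks the stationary mean λ/(1 − Σ_j φ_j):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_rcinar(phi_means, lam, T, rng, a=4.0, burn_in=300):
    """Simulate Y_t = sum_j phi_j^(t) o Y_{t-j} + Z_t.

    Each random coefficient phi_j^(t) is drawn fresh every period from a
    Beta(a, b_j) law, with b_j chosen so that its mean equals phi_means[j-1];
    the thinning phi o Y is realized as a Binomial(Y, phi) draw."""
    p = len(phi_means)
    y = list(rng.poisson(lam, size=p))
    for _ in range(T + burn_in):
        total = rng.poisson(lam)                      # innovation Z_t
        for j, phi_j in enumerate(phi_means, start=1):
            if phi_j > 0:
                b_j = a * (1.0 - phi_j) / phi_j       # Beta(a, b_j) has mean phi_j
                phi_t = rng.beta(a, b_j)              # random coefficient phi_j^(t)
                total += rng.binomial(y[-j], phi_t)   # phi_j^(t) o Y_{t-j}
        y.append(total)
    return np.array(y[p + burn_in:])

y = simulate_rcinar([0.4, 0.0, 0.2], lam=2.0, T=5000, rng=rng)
print(round(y.mean(), 2))   # stationary mean is lam / (1 - 0.4 - 0.2) = 5
```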

Remark 1.
As can be discerned from the preceding discussion, the INAR(p) model (3) represents a special case of the RCINAR(p) model (4): when the coefficients φ_j^{(t)} are constants, the RCINAR(p) model reduces to the INAR(p) model. As demonstrated by Zheng, Basawa, and Datta [6], the statistical methods employed in the study of the RCINAR(p) model can also be applied directly to the INAR(p) model. Consequently, in order to cater to a wider range of application scenarios, the academic community tends to prioritize the RCINAR model when investigating thinning operator models. For instance, Kang and Lee [24] investigated change-point detection in the RCINAR model by leveraging the cumulative sum (CUSUM) test. Similarly, Zhang, Wang, and Zhu [7] proposed an interval estimation method for the RCINAR model based on empirical likelihood estimation. Awale, Balakrishna, and Ramanathan [25] constructed a locally most powerful-type test devised specifically for examining structural changes within the RCINAR framework. Therefore, this paper centers its research on the RCINAR model.
To estimate the RCINAR(p) model and establish model selection criteria, we draw on the assumptions delineated by Zhang, Wang, and Zhu [7]:

(A1) {Y_t} constitutes an ergodic and strictly stationary RCINAR(p) process.
(A2) There exists δ > 0 such that E|Y_t|^{4+δ} < ∞.
Derived from Equation (4), one can write down the one-step-ahead transition probability P(Y_t = y_t | Y_{t−1}, ..., Y_{t−p}), which is a convolution of the p thinning components and the innovation distribution. Utilizing this transition probability, we can construct the likelihood function as the product of the one-step-ahead transition probabilities. The likelihood function L(θ) for model (4) is notably complex, involving numerous multivariate numerical computations that demand substantial resources. Consequently, Zheng, Basawa, and Datta [6] advocated estimating the model from its conditional moments rather than employing the likelihood function. This preference also underlies the prevalent use of conditional least squares (CLS) estimation in the study of RCINAR(p) models. In the subsequent section, we offer a concise introduction to the CLS estimation methodology for the RCINAR(p) model.
We can obtain the first-order conditional moment of model (4) as follows:

E(Y_t | F_{t−1}) = φ_1 Y_{t−1} + ⋯ + φ_p Y_{t−p} + λ.

This derivation allows us to compute the conditional least squares (CLS) estimator. Let

S(θ) = Σ_{t=p+1}^{T} (Y_t − φ_1 Y_{t−1} − ⋯ − φ_p Y_{t−p} − λ)²

represent the CLS objective function. The CLS estimator is then given by θ̂_CLS = argmin_θ S(θ), and the corresponding estimating equations are

Σ_{t=p+1}^{T} Ψ_t(θ) = 0, where Ψ_t(θ) = (Y_t − φ_1 Y_{t−1} − ⋯ − φ_p Y_{t−p} − λ) · (Y_{t−1}, ..., Y_{t−p}, 1)′.

For the estimating equation Ψ_t(θ), we introduce an additional assumption:

(A3) Ψ_t(θ) is identifiable; that is, E(Ψ_t(θ_0)) = 0, and for any θ ≠ θ_0 in a neighborhood of θ_0, E(Ψ_t(θ)) exists and is nonzero.

Assumption (A3) is the identifiability assumption, which further implies that model (4) is identifiable: only the correctly specified model satisfies E(Ψ_t(θ_0)) = 0. Based on these assumptions, the following lemma can be deduced.

Lemma 1. Under assumptions (A1) to (A3): (i) T^{−1} Σ_{t=p+1}^{T} Ψ_t(θ_0) converges almost surely to 0; (ii) ∂²S(θ)/(∂θ ∂θ′) remains continuous within a neighborhood of θ_0; (iii) the limiting matrices V = E(−∂Ψ_t(θ_0)/∂θ′) and W = E(Ψ_t(θ_0) Ψ_t(θ_0)′) both exist and are finite.

Moreover, Zheng, Basawa, and Datta [6] established that θ̂_CLS is a consistent estimator with asymptotic distribution

√T (θ̂_CLS − θ_0) →_d N(0, V^{−1} W V^{−1}).
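Because the conditional mean is linear in θ, minimizing S(θ) reduces to an ordinary least-squares problem. The sketch below is my own illustration, assuming a constant-coefficient INAR(2) with φ = (0.4, 0.2) and λ = 2; it recovers the parameters from a simulated path:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate an INAR(2) path: Y_t = 0.4 o Y_{t-1} + 0.2 o Y_{t-2} + Z_t, Z_t ~ Poisson(2)
T = 3000
y = list(rng.poisson(2.0, size=2))
for _ in range(T):
    y.append(rng.binomial(y[-1], 0.4) + rng.binomial(y[-2], 0.2) + rng.poisson(2.0))
y = np.array(y)

def cls_estimate(y, p):
    """CLS: minimize S(theta) = sum_t (Y_t - phi_1 Y_{t-1} - ... - phi_p Y_{t-p} - lam)^2,
    i.e. an ordinary least-squares fit of Y_t on its p lags and a constant."""
    n = len(y) - p
    X = np.column_stack([y[p - j:len(y) - j] for j in range(1, p + 1)] + [np.ones(n)])
    theta, *_ = np.linalg.lstsq(X, y[p:].astype(float), rcond=None)
    return theta            # (phi_1, ..., phi_p, lam)

theta_hat = cls_estimate(y, p=2)
print(np.round(theta_hat, 2))   # close to (0.4, 0.2, 2.0)
```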

Model Selection Procedure
For the data-generating process defined by Equation (4), we establish the following settings:

1. A candidate model m specifies which lagged variables Y_{t−j}, j ∈ m ⊆ {1, ..., p}, enter the conditional mean; every candidate model includes the innovation mean λ, and we write |m| for the dimension of the corresponding parameter vector θ(m).
2. Consequently, p + 1 represents the maximum model dimension we consider, denoted the full model M, while the minimum model dimension we consider is 1, corresponding to an independent and identically distributed non-negative integer-valued random variable sequence. Let the true model be m_0.
3. All possible θ̃(m) values, when restricted to the |m|-dimensional vector θ(m), are interior points of the corresponding compact subset Θ(m). Furthermore, we denote θ̃ = θ(M) as the parameter vector to be estimated in Θ̃(M) = Θ(M), i.e., the parameter vector of the full model M.
For model m, we partition the full parameter vector as θ̃ = (θ(1)′, θ(2)′)′, where θ(1) collects the parameters included in model m and θ(2) the excluded ones, and we divide the estimating equation Ψ_t(θ̃) correspondingly into Ψ_t^{(1)} and Ψ_t^{(2)}. Write θ̂_CLS(m) = (θ̂_{(1),CLS}(m)′, 0′)′, i.e., θ̂_{(1),CLS}(m) is the solution to Σ_{t=p+1}^{T} Ψ_t^{(1)}((θ(1)′, 0′)′) = 0 and represents the CLS estimator of model m. Define the function H(θ̂_CLS(m)) from the estimating equations evaluated at this restricted estimator. We can then derive the following lemma; because its proof closely resembles the proof of Theorem 1 in Zhang, Wang, and Zhu [7], we omit the details. It is important to note that when m = M, θ̂_CLS(M) is the solution to the estimating equation Σ_{t=p+1}^{T} Ψ_t(θ̃) = 0, and in this case H(θ̂_CLS(M)) = 0.
Definition 1. We propose the following penalized criterion:

H(θ̂_CLS(m)) + P_T · |m|,   (6)

where the penalty term P_T is an increasing sequence with P_T → ∞ and P_T = o(T^{1/2}).

Remark 2. Intuitively, in this penalized criterion, H(θ̂_CLS(m)) serves as a measure of the model's fit to the data. If it can be demonstrated that H(θ̂_CLS(m)) diverges more slowly when m_0 ⊆ m than H(θ̂_CLS(m_1)) does when m_0 ⊄ m_1, then a smaller H(θ̂_CLS(m)) suggests a superior fit of model m to the data. However, upon closer examination, it becomes evident that if we merely adopt the full model M, then H(θ̂_CLS(M)) = 0. Consequently, it is necessary to introduce a penalty term, P_T · |m|, to constrain the number of lagged variables incorporated by model m. By striking a balance between the degree of data fitting H(θ̂_CLS(m)) and the number of lagged variables P_T · |m|, Theorems 1-3 substantiate the ability to select the appropriate model.
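The paper's exact H function is not reproduced here, but its role can be illustrated with a simple stand-in: a normalized quadratic form of the full-model estimating equations evaluated at each restricted CLS fit, which is zero at the full model and blows up under misspecification. The sketch below is my own illustration, assuming a constant-coefficient INAR process whose true lags are {1, 3} and the penalty P_T = T^{1/3}; it selects a model by minimizing H(θ̂_CLS(m)) + P_T·|m|:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(7)

# True model m0 = {1, 3}: Y_t = 0.4 o Y_{t-1} + 0.3 o Y_{t-3} + Z_t, Z_t ~ Poisson(2)
T, p = 3000, 3
y = list(rng.poisson(2.0, size=p))
for _ in range(T):
    y.append(rng.binomial(y[-1], 0.4) + rng.binomial(y[-3], 0.3) + rng.poisson(2.0))
y = np.array(y, dtype=float)

n = len(y) - p
X = np.column_stack([y[p - j:len(y) - j] for j in range(1, p + 1)] + [np.ones(n)])
target = y[p:]

# Weight matrix from full-model residuals (illustrative normalization of the score)
beta_full, *_ = np.linalg.lstsq(X, target, rcond=None)
r_full = target - X @ beta_full
W = (X * (r_full ** 2)[:, None]).T @ X / n

def H(model):
    """Restricted CLS fit for model m, then a quadratic form of the FULL-model
    estimating equations at that fit; H is numerically zero at the full model."""
    cols = [j - 1 for j in sorted(model)] + [p]          # selected lags + intercept
    theta = np.zeros(p + 1)
    sol, *_ = np.linalg.lstsq(X[:, cols], target, rcond=None)
    theta[cols] = sol
    score = X.T @ (target - X @ theta)                   # sum_t Psi_t(theta_CLS(m))
    return float(score @ np.linalg.solve(W, score)) / n

P_T = n ** (1 / 3)
models = [set(c) for r in range(p + 1) for c in combinations(range(1, p + 1), r)]
best = min(models, key=lambda m: H(m) + P_T * (len(m) + 1))
print(sorted(best))
```

The trade-off in Remark 2 is visible here: H is exactly zero for the full model, so only the penalty term P_T·|m| prevents the procedure from always choosing it.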
Under the correct model specification, the following theorem can be derived.

Theorem 1. Given assumptions (A1) and (A2), under the correct model specification (m_0 ⊆ m), H(θ̂_CLS(m)) converges in distribution to a weighted sum of independent χ²(1) random variables, with weights Λ_j given by the eigenvalues of the matrix Σ_{11}.

Theorem 1 establishes the asymptotic distribution of θ̂_{(1),CLS}(m) and H(θ̂_CLS(m)) under the correct model specification, which serves as a crucial component in deriving the consistency of our penalized criterion (6). In the following, we discuss the behavior of H(θ̂_CLS(m)) when the model specification m is incorrect.

Theorem 2. Given assumptions (A1)-(A3), for any misspecified model m (i.e., m_0 ⊄ m), H(θ̂_CLS(m)) diverges to positive infinity at a rate of at least T^{1/2}.

Assumptions (A1)-(A2) and the identifiability assumption (A3) together ensure this divergence under misspecification. Combining Theorems 1 and 2, we can present the primary conclusion of this paper.
Theorem 3. Given assumptions (A1)-(A3), the penalized criterion (6) selects the true model with probability tending to one; that is, lim_{T→∞} P(argmin_{m ⊆ M} {H(θ̂_CLS(m)) + P_T · |m|} = m_0) = 1.

From the proof of Theorem 3 and Lemma A.1, we can observe that the divergence rate of P_T needs to be at least as fast as log(T). In practical applications, we may use settings such as P_T = T^{1/5}. Under this setting, although log(T)/P_T → 0, in finite samples we may have P_T < log(T).
In fact, T^{1/5} < log(T) on the interval [4, 332106] (approximately), which may result in the performance of P_T = T^{1/5} being less effective than P_T = log(T) in finite samples. Nevertheless, such penalty term settings still hold value, and we discuss this situation in the numerical simulation section.
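The crossover points of T^{1/5} and log(T) can be verified numerically; the short sketch below checks the sign pattern and locates the upper end of the interval on which T^{1/5} < log(T) by bisection:

```python
import math

f = lambda T: T ** 0.2 - math.log(T)   # sign of T^(1/5) - log(T)

# Sign pattern: positive below ~4, negative in between, positive again past ~3.3e5
print(f(3) > 0, f(4) < 0, f(100_000) < 0, f(400_000) > 0)

# Bisection for the upper crossover where T^(1/5) = log(T)
lo, hi = 100_000.0, 400_000.0
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
print(round(lo))   # roughly 3.32e5, the upper end of the interval
```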
Theorem 3 provides the consistency of the penalized criterion (6) for model selection. It is evident that Theorem 3 holds under very relaxed assumptions and relies solely on the CLS estimation, which can be completed rapidly in any statistical software, and on the estimating equation constructed from first-order conditional moments, which is easy to derive. This makes the penalized criterion (6) highly suitable for use in INAR models, particularly in RCINAR models. Now let m̂ be the model selected by criterion (6):

m̂ = argmin_{m ⊆ M} {H(θ̂_CLS(m)) + P_T · |m|}.
We now present the asymptotic properties of the estimator under the selected model.

Theorem 4. Given assumptions (A1)-(A3), the CLS estimator based on the selected model m̂ has the same asymptotic distribution as the CLS estimator based on the true model m_0.

Remark 3. From the derivations in this section, we can see that the estimating equation used in constructing the penalized criterion (6) actually utilizes only the information of the conditional mean E(Y_t | F_{t−1}), which is linear in the lagged values, and does not involve the thinning operators themselves. Therefore, the penalized criterion (6) can be applied to models with conditional expectations of the same linear form, such as INARCH models and continuous-valued AR models. The likelihood functions of INARCH and AR models can be established with relative ease, enabling us to compare the efficacy of the penalized criterion (6) with that of AIC and BIC for both model classes.

Numerical Simulations
In this section, we first conduct a simulation study to evaluate the performance of the penalized criterion proposed in this paper for INAR models. Second, to compare the proposed criterion with the traditional likelihood-based AIC and BIC, we apply these criteria to INARCH models and AR models. Finally, by utilizing innovation terms with different random distributions, we carry out a simulation study on the robustness of the proposed criterion.

Performance of the Penalized Criteria in INAR Models
In this subsection, we consider the true data-generating process to be

Y_t = φ_1^{(t)} ∘ Y_{t−1} + φ_3^{(t)} ∘ Y_{t−3} + Z_t,   (7)

where the mean of φ_1^{(t)} is 0.4, the mean of φ_3^{(t)} is 0.2, and λ = 2, i.e., θ̃_0 = (0.4, 0, 0.2, 2)′. By applying the penalized criterion (6), we attempt to select the true model from all RCINAR models up to the third order. In Table 1, "Coef" denotes the random distribution of the coefficients, and we use boldface to highlight the true model, i.e., {y_{t−1}, y_{t−3}}. We compare three different penalty term settings, P_T = log(T), P_T = T^{1/5}, and P_T = T^{1/3}, and consider coefficient settings in which φ_1^{(t)} follows a uniform distribution on the interval [0, 0.8], or φ_1^{(t)} follows a beta distribution with a mean of 0.4 and φ_3^{(t)} follows a beta distribution with a mean of 0.2. In the beta scenario, we fix the parameter a = 4 of the beta distribution's parameter vector (a, b) and control the parameter b to achieve different means. We consider sample sizes T = 100, 200, 300, 500, 1000, and for each sample size T and parameter setting we perform 1000 independent repeated experiments.
As shown in Table 1, for all three penalty terms, the accuracy of model selection using the penalized criterion (6) increases with the sample size T, consistent with the asymptotic conclusion of Theorem 3. However, when the sample size is large, we find that the accuracy of P_T = T^{1/5} is slightly worse than that of P_T = T^{1/3} and P_T = log(T). This is because, on the interval [4, 332106], T^{1/5} < log(T), which may cause the performance of P_T = T^{1/5} in larger finite samples to fall short of P_T = log(T). Nonetheless, the penalty setting P_T = T^{1/5} is not without merit: as shown in Table 1, when the sample size is small, i.e., T ≤ 500, the performance of P_T = T^{1/5} is better.

To investigate the performance of the three penalty terms under varying sample sizes and coefficient mean settings, we continue to consider model (7), where φ_1^{(t)} follows a beta distribution with a mean of 0.4 and φ_3^{(t)} follows a beta distribution with mean φ_3. In Figure 1, we report the impact of sample size on the accuracy of the penalized criterion under the three penalty terms for different φ_3 settings. In Figures 1 and 2, the red line represents P_T = T^{1/3}, the black line represents P_T = log(T), and the blue line represents P_T = T^{1/5}; the vertical axis of both figures represents the frequency with which the penalized criterion (6) selects the correct model. It can be observed that when φ_3 is small or the sample size is small, the performance of P_T = T^{1/5} is superior. However, as φ_3 gradually moves further from 0 and the sample size increases, P_T = T^{1/3} and P_T = log(T) perform better.

In Figure 2, we report the frequency of selecting the model φ_1^{(t)} ∘ Y_{t−1} + φ_3^{(t)} ∘ Y_{t−3} + Z_t using the penalized criterion (6) as φ_3 gradually varies from 0 to 0.4 under different sample sizes. It should be noted that when φ_3 = 0, this represents an incorrect model specification, and the correct model in this case is φ_1^{(t)} ∘ Y_{t−1} + Z_t. As shown in Figure 2, when the sample size is small, particularly when the sample size is 100, the performance of P_T = T^{1/5} is notably better than that of P_T = log(T) and P_T = T^{1/3}. As the sample size increases, this advantage gradually diminishes, but the penalty setting P_T = T^{1/5} still maintains an advantage when φ_3 is relatively close to 0.
Based on the numerical simulation results presented in this subsection, we can offer recommendations for applying the penalized criterion (6): when the sample size is small, or some coefficients in the true model are relatively close to 0, we can employ the penalty setting P_T = T^{1/5}; in other cases, the performance of the penalty settings P_T = T^{1/3} and P_T = log(T) is comparable and slightly better than that of P_T = T^{1/5}. Furthermore, we also conducted a simulation study on lag variable selection for a second data-generating process involving the lag term Y_{t−2}, where the mean of φ_2^{(t)} is 0.3. The results can be found in Table A1 in Appendix A.

Performance of Penalized Criteria in INARCH Models and AR Models
As stated in Remark 3 of Section 2, we can apply the penalized criterion (6) to both INARCH and AR models. Because the likelihood functions for these two models can be established easily, we can compare the performance of criterion (6) with that of AIC and BIC for both model classes.

INARCH Model
In this subsection, we consider the true data-generating process to be the INARCH specification

Y_t | F_{t−1} ~ Poisson(φ_0 + φ_1 Y_{t−1} + φ_3 Y_{t−3}),

where φ_1 = 0.4, φ_3 = 0.2, and φ_0 = 2. Fokianos, Rahbek, and Tjøstheim [26] proposed this model class and derived the conditions for its stationarity and ergodicity. By applying the penalized criterion (6) alongside AIC and BIC, we attempt to select the true model from all INARCH models up to the third order. In Table 2, "Criterion" denotes the model selection criterion used, and we write H + P_T · |m| for the penalized criterion (6). Furthermore, we have bolded the true model {y_{t−1}, y_{t−3}}. We consider sample sizes T = 100, 200, 300, 500, 1000, and for each sample size T and parameter setting we conduct 1000 independent repeated experiments.
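The INARCH data-generating process above (conditionally Poisson with a linear intensity) is easy to simulate; the sketch below, my own illustration using the stated values φ_0 = 2, φ_1 = 0.4, φ_3 = 0.2, draws a path and checks the stationary mean φ_0/(1 − φ_1 − φ_3):

```python
import numpy as np

rng = np.random.default_rng(3)

# INARCH: Y_t | F_{t-1} ~ Poisson(phi_0 + phi_1 * Y_{t-1} + phi_3 * Y_{t-3})
phi0, phi1, phi3 = 2.0, 0.4, 0.2
T, burn_in = 5000, 300
y = list(rng.poisson(phi0, size=3))
for _ in range(T + burn_in):
    y.append(rng.poisson(phi0 + phi1 * y[-1] + phi3 * y[-3]))
y = np.array(y[3 + burn_in:])
print(round(y.mean(), 2))   # stationary mean phi0 / (1 - phi1 - phi3) = 5
```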
From Table 2, we can observe that, similar to the INAR case, the accuracy of P_T = T^{1/5} is slightly worse than that of P_T = T^{1/3} and P_T = log(T) for larger sample sizes, but for smaller sample sizes, i.e., T ≤ 500, the performance of P_T = T^{1/5} is superior. In addition, Table 2 shows that the accuracy of the proposed penalized criterion is roughly equivalent to that of BIC when P_T = T^{1/3} or P_T = log(T), while with P_T = T^{1/5} it is roughly equivalent to AIC in small samples but far better than AIC when the sample size is large. Additionally, we provide a simulation study on lag variable selection for a second INARCH data-generating process; the results can be found in Table A2 in Appendix A.

AR Model
In this subsection, we consider the true data-generating process to be the AR specification

Y_t = φ_0 + φ_1 Y_{t−1} + φ_3 Y_{t−3} + Z_t,   (8)

where φ_1 = 0.4, φ_3 = 0.2, φ_0 = 1, and Z_t follows a normal distribution with mean 0 and standard deviation 2. By applying the penalized criterion (6) alongside AIC and BIC, we attempt to select the true model from all AR models up to the third order. In Table 3, "Criterion" denotes the model selection criterion used, we write H + P_T · |m| for the penalized criterion (6), and we use boldface to highlight the true model {y_{t−1}, y_{t−3}}. From Table 3, we can observe that, similar to the INAR case, the accuracy of P_T = T^{1/5} is slightly worse than that of P_T = T^{1/3} and P_T = log(T) for larger sample sizes, but for smaller sample sizes, i.e., T ≤ 500, the performance of P_T = T^{1/5} is superior. The comparison of the proposed penalized criterion with AIC and BIC in the AR model is analogous to that in the INARCH model; further elaboration is therefore not required.
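For the AR case, the likelihood-based criteria are simple to compute from the residual sum of squares of an OLS fit. The sketch below is my own illustration of the comparison, using the Gaussian forms AIC = n·log(RSS/n) + 2k and BIC = n·log(RSS/n) + log(n)·k, applied to a simulated path of the AR process above:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)

# AR truth: Y_t = 1 + 0.4 Y_{t-1} + 0.2 Y_{t-3} + Z_t, Z_t ~ N(0, 2^2)
T, p, burn_in = 1000, 3, 300
y = np.zeros(T + burn_in)
for t in range(3, T + burn_in):
    y[t] = 1.0 + 0.4 * y[t - 1] + 0.2 * y[t - 3] + rng.normal(0.0, 2.0)
y = y[burn_in:]

n = len(y) - p
X = np.column_stack([y[p - j:len(y) - j] for j in range(1, p + 1)] + [np.ones(n)])
target = y[p:]

def gaussian_ic(model, penalty_per_param):
    """Gaussian information criterion n*log(RSS/n) + penalty * k for an OLS fit
    using the lags in `model` plus an intercept (k counts coefficients and sigma^2)."""
    cols = [j - 1 for j in sorted(model)] + [p]
    beta, *_ = np.linalg.lstsq(X[:, cols], target, rcond=None)
    rss = float(((target - X[:, cols] @ beta) ** 2).sum())
    k = len(cols) + 1
    return n * np.log(rss / n) + penalty_per_param * k

models = [set(c) for r in range(p + 1) for c in combinations(range(1, p + 1), r)]
best_aic = min(models, key=lambda m: gaussian_ic(m, 2.0))
best_bic = min(models, key=lambda m: gaussian_ic(m, np.log(n)))
print(sorted(best_aic), sorted(best_bic))
```

With the larger log(n) penalty, BIC is less prone than AIC to admitting the spurious lag 2, mirroring the comparison reported in Table 3.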

Robustness of Variable Selection Procedure
In this subsection, we investigate the robustness of the penalized criterion (6) under different distributions of the innovation term Z_t in model (7). Specifically, we consider Z_t following a Poisson distribution, a geometric distribution with a mean of 2, and a uniform distribution over {0, 1, 2, 3, 4}. In Table 4, "Z_t" denotes the random distribution of the innovation term, and "geom" denotes the geometric distribution.
From Table 4, we observe that the penalized criterion (6) remains robust across the various distributions of the innovation term Z_t, suggesting that the proposed criterion can effectively select the correct lag order even when the innovation term follows different distributions. As before, we use boldface in Table 4 to highlight the true model {y_{t−1}, y_{t−3}}. Furthermore, we compare the performance of the proposed penalized criterion, AIC, and BIC when the innovation term Z_t in AR model (8) follows a uniform distribution over [−2, 2] while the working assumption is that Z_t is normally distributed with mean 0 and unknown variance σ²_Z. Table A3 in Appendix A shows that regardless of the distribution of the innovation term, when the conditional mean is specified correctly, the performance and robustness of the proposed criterion are generally equivalent to those of AIC and BIC.

COVID-19 Infection Data
The investigation of infectious disease data constitutes a crucial application of integer-valued time series models in the public health domain. In May 2020, the Ministry of Health in Cyprus disseminated a national epidemic surveillance report displaying the temporal data on the number of infections during the initial phase of the COVID-19 outbreak. Research on these data helps the public health community uncover the intrinsic mechanisms governing epidemic propagation. Owing to the incubation period associated with the coronavirus, individuals who contract the virus typically disclose their infection status to governmental statistical departments after a lapse of several days. As a result, it becomes imperative to scrutinize lag variable selection for this time series; see Figure 3. Based on the ACF plot, it can be inferred that the data may stem from an autoregressive data-generating process. The PACF plot suggests that selecting either the model Y_t = φ_1^{(t)} ∘ Y_{t−1} + Z_t or the model Y_t = φ_1^{(t)} ∘ Y_{t−1} + φ_3^{(t)} ∘ Y_{t−3} + Z_t is reasonable, as the partial autocorrelation at lag three does not significantly exceed the critical value. Consequently, we employ the model selection procedure (6) for variable selection and report the result in Table 5. Given that the penalized criterion (6) favors the model containing lags one and three under all three penalty settings, we adopt this model. The estimated results are as follows: the mean of Z_t is 1.8567, and φ_1^{(t)} and φ_3^{(t)}, as two non-negative random variables, have expected values of 0.5736 and 0.2933, respectively. This finding suggests that during the initial stage of the outbreak in Cyprus, the number of infections on a given day may have been influenced by the numbers of infections one day and three days prior.

Seismic Frequency Data
The exploration of earthquake frequency constitutes a significant application frontier for integer-valued time series models. Zucchini, MacDonald, and Langrock [27] provide comprehensive annual counts of global seismic occurrences of magnitude seven or above for the period from 1900 to 2007. These data offer a promising platform for scholars seeking to unravel the mechanisms underpinning the mutual interactions among earthquakes; a meticulous investigation of this time series may yield insights into whether the interplay is mediated by crustal stress dynamics or alternative conduits; see Figure 4 below.
Informed by the ACF, we hypothesize that the underlying data-generating process might be suitably modeled by an autoregressive construct, while the PACF advocates a first-order autoregressive model. To substantiate this conjecture, we invoke the penalized criterion (6) and report the result in Table 6. Given that the penalized criterion (6) prefers the model Y_t = φ_1^{(t)} ∘ Y_{t−1} + Z_t under all three penalty settings, we adopt this model. The estimated results are as follows: the mean of Z_t is 2.1014, and the random coefficient φ_1^{(t)} has an expected value of 0.5799. The estimates suggest that each occurrence of a magnitude-seven-or-higher earthquake in a given year induces, in the subsequent year, a count of similar-intensity earthquakes with an expected value of 0.5799, while the number of major earthquakes occurring independently each year is approximately two. These results substantiate a year-on-year time-varying dependency mechanism in the frequency of major seismic disasters.

Discussion and Conclusions
In this paper, we propose a model selection criterion based on the estimating equation of conditional least squares estimation. This penalized method does not rely on detailed distributional assumptions about the data-generating process; it circumvents the complex likelihood function construction of random coefficient integer-valued autoregressive models and can consistently select the correct variables under relatively mild assumptions.
In our numerical simulations, we compared the impact of three penalty term settings on the performance of the penalized criterion. We found that the effect of these penalty terms varies as some coefficients in the RCINAR model move farther from 0 or as the sample size increases. Moreover, we applied the proposed model selection method to both INARCH and traditional continuous-valued AR models. In both scenarios, where likelihood functions can be constructed easily, the proposed criterion and the traditional likelihood-based criteria AIC and BIC exhibit similar model selection efficiency. Specifically, under the settings P_T = T^{1/3} and P_T = log(T), the accuracy of the proposed method is similar to that of BIC; with P_T = T^{1/5}, it performs similarly to AIC in smaller samples while outperforming AIC in larger samples.
In the future, model selection methods based on estimating equations have considerable potential for development. We briefly discuss three directions.

(1) Distinguishing between different thinning operators or innovation terms with varying distributions. The criterion (6) provided in this paper is primarily used for lag-variable selection but cannot differentiate between various thinning operators or distinct innovation distributions. It is well known that INAR models can describe scenarios such as zero inflation, variance inflation, and extreme values by flexibly selecting thinning operators and innovation terms. A selection criterion able to distinguish between them would therefore have a far more extensive scope of application.

(2) Incorporating higher-order conditional moments of the data-generating process into the criterion. From the form of the function $H(\tilde{\theta})$, it is evident that criterion (6) contains only the mean-structure information of the model and cannot describe higher-order moment information. Since many variants of the INAR model differ in their higher-order moments, incorporating such information would enable criterion (6) to perform model selection in a broader context.

(3) Detecting change points. In time series research, the change point detection problem has a long history. Within the integer-valued time series domain, the change point problem refers to the existence of positive integers $\tau_1, \tau_2, \cdots, \tau_m$ such that the model parameters differ across the resulting segments. For continuous-valued time series models, Chen and Gupta [28] introduced a method for change point detection using AIC and BIC. Since parameter changes are prominently reflected in the mean structure of INAR models, it is likely feasible to perform change point detection using criterion (6) based on the estimating equations.
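Direction (3) can be made concrete with a toy sketch, an assumption-laden illustration rather than a worked-out procedure: scan candidate change points, fit the mean structure by least squares on each segment, and penalize the number of segments in the same spirit as the penalized criterion. Here a single change in a constant-mean count series is tested with a stand-in objective $T \log(\mathrm{RSS}/T) + k \cdot P_T$:

```python
import numpy as np

def split_rss(y, tau):
    """RSS when a constant mean is fitted on each of the two segments."""
    a, b = y[:tau], y[tau:]
    return np.sum((a - a.mean()) ** 2) + np.sum((b - b.mean()) ** 2)

rng = np.random.default_rng(1)
# Poisson(2) counts for 100 periods, then Poisson(6): a mean shift at t = 100.
y = np.concatenate([rng.poisson(2.0, 100), rng.poisson(6.0, 100)]).astype(float)
T = len(y)
P_T = np.log(T)

# "No change" score vs. the best single-change-point score, each penalized
# by its number of mean parameters (1 vs. 2).
crit0 = T * np.log(np.sum((y - y.mean()) ** 2) / T) + 1 * P_T
best_tau = min(range(10, T - 10), key=lambda t: split_rss(y, t))
crit1 = T * np.log(split_rss(y, best_tau) / T) + 2 * P_T
change_detected = crit1 < crit0
```

With a mean shift of this magnitude the penalized split score beats the no-change score decisively, and the estimated change point lands near the true location.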
where, according to Equation (5), the $o_p$ term involves $\hat{\theta}_{CLS}(m)$. Thus, we have: where $I$ is the identity matrix, and let we have: Let $\Omega = \Sigma_{11}$, which implies that $\Omega$ is a positive semi-definite matrix. Consequently, according to Theorem 2.1.6 of Johnston [29], we have $\Omega = U \Lambda U^{\top}$, where $U$ is an orthogonal matrix and $\Lambda$ is a diagonal matrix whose diagonal elements are the eigenvalues of $\Omega$, denoted $\Lambda_1, \ldots, \Lambda_{p+1}$. Thus, where $[J]_j$ denotes the $j$-th element of the vector $J$, we obtain: It follows that $\Lambda_j$ is an eigenvalue of the matrix $\Sigma_{11}$.
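The spectral decomposition invoked above is easy to verify numerically. In the sketch below the matrix is an arbitrary positive semi-definite stand-in for $\Sigma_{11}$, not the paper's matrix:

```python
import numpy as np

A = np.array([[1.0, 0.5, 0.2],
              [0.3, 1.0, 0.4],
              [0.2, 0.1, 1.0]])
omega = A.T @ A                    # positive semi-definite by construction
lam, U = np.linalg.eigh(omega)     # U orthogonal, lam the real eigenvalues
recon = U @ np.diag(lam) @ U.T     # should reproduce omega exactly
```

`np.linalg.eigh` is the appropriate routine here because it exploits symmetry and guarantees real eigenvalues and an orthogonal eigenvector matrix, exactly the structure $\Omega = U \Lambda U^{\top}$ used in the proof.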
Proof of Theorem 3. We divide the proof of this theorem into two parts: (1) If $m_0 \not\subseteq m$, by applying Theorem 1 and Lemma A1 to $m_0$, and Theorem 2 to $m$, we know that for $P_T = o(T)$: (2) If $m_0 \subseteq m$, then by applying Theorem 1 and Lemma A1 to both $m_0$ and $m$, we know that for $P_T \to \infty$:

Proof of Theorem 4. Following the steps in Diop and Kengne [21], we have: For $x = (x_i)_{1 \le i \le p+1}$, $x_i \in \mathbb{R}$, define: Then we have: According to Theorem 3, as $T \to \infty$: If $(x_i)_{i \in m_0} = 0$ and $(x_i)_{i \notin m_0}$ is a set of real numbers, i.e., $(x_i)_{i \notin m_0} < \infty$, then by Lemma 1:

Given the past observations, the thinning components $\phi_1 \circ Y_{t-1}, \ldots, \phi_p \circ Y_{t-p}$ are deemed to be mutually conditionally independent. This conditional independence ensures that the autocorrelation function of the INAR(p) model is congruent with that of its continuous-valued autoregressive (AR(p)) counterpart. Moreover, Du and Li [3] substantiated that, under these model settings, stationarity of the INAR(p) model necessitates that the roots of the polynomial $h(z) = 1 - \phi_1 z - \cdots - \phi_p z^p = 0$ lie outside the unit circle. This implies that the INAR(p) model attains stationarity when the sum $\sum_{i=1}^{p} \phi_i$ is less than 1. Building upon these foundational insights, Zheng, Basawa, and Datta [6] relaxed the constant-coefficient assumption of the INAR(p) model, giving rise to the Random Coefficient Integer-valued Autoregressive (RCINAR(p)) model.
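The root condition for INAR(p) stationarity can be checked numerically. A sketch with illustrative coefficients ($\phi_1 = 0.3$, $\phi_2 = 0.2$, chosen here only so that $\sum \phi_i < 1$):

```python
import numpy as np

phi = [0.3, 0.2]   # illustrative INAR(2) coefficients
# h(z) = 1 - phi_1 z - phi_2 z^2; np.roots expects the highest-degree
# coefficient first, so build [-phi_2, -phi_1, 1].
coeffs = [-c for c in reversed(phi)] + [1.0]
roots = np.roots(coeffs)
stationary = bool(np.all(np.abs(roots) > 1)) and sum(phi) < 1
```

Both characterizations agree: all roots of $h(z)$ lie outside the unit circle exactly when the coefficient sum is below 1 (for nonnegative $\phi_i$).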

Figure residue: the three penalty settings $P_T = \log(T)$, $P_T = T^{1/3}$, and $P_T = T^{1/5}$ are compared under three different distributions for $\phi^{(t)}$, including fixed coefficients; $P_T = T^{1/5}$ performs worse than $P_T = \log(T)$ and $P_T = T^{1/3}$.

Figure 1. The impact of sample size on accuracy under different $\phi_3$ settings.

Figure 2. The impact of $\phi_3$ settings on accuracy under different sample sizes.

Figure 4. Global frequency of earthquakes of magnitude seven or greater between 1900 and 2007.
Note: i.i.d. represents an i.i.d. Poisson random variable sequence.

Table 1. Frequency of model selection for INAR model of order 2 by the penalized criterion (6).

Table 2. Frequency of model selection for INARCH model of order 2 by the penalized criterion (6).

Table 3. Frequency of model selection for AR model of order 1 by the penalized criterion (6).

Table 4. Frequency of model selection of INAR model by the penalized criterion (6) with $Z_t$ misspecification.

Table 5. Result of model selection with COVID-19 data.

Table 6. Result of model selection with seismic frequency data.

Table A1. Frequency of model selection for INAR model of order 1 by the penalized criterion (6). ($Z$ is a standard normal random vector of dimension $|m_0|$.)

Table A2. Frequency of model selection for INARCH model of order 1 by the penalized criterion (6).

Table A3. Frequency of model selection for AR model by the penalized criterion (6) with $Z_t$ misspecification.