Statistical Inference for Periodic Self-Exciting Threshold Integer-Valued Autoregressive Processes

This paper considers periodic self-exciting threshold integer-valued autoregressive processes under a weaker condition, in which the second moment is assumed finite instead of the innovation distribution being specified. The basic statistical properties of the model are discussed, quasi-likelihood inference for the parameters is investigated, and the asymptotic behavior of the estimators is obtained. Threshold estimates based on quasi-likelihood and least squares methods are given. Simulation studies show that the quasi-likelihood methods perform well with realistic sample sizes and may be superior to least squares and maximum likelihood methods. The practical application of the processes is illustrated by a time series dataset concerning the monthly counts of claimants collecting short-term disability benefits from the Workers' Compensation Board (WCB). In addition, the forecasting problem for this dataset is addressed.


Introduction
There has been considerable interest in integer-valued time series because of their wide range of applications, including epidemiology, finance, and disease modeling. Examples of such data include the number of major global earthquakes per year, monthly crime counts in a particular country or region, and monthly patient counts in a hospital over a period of time. Following the first-order integer-valued autoregressive (INAR(1)) models introduced by Al-Osh and Alzaid [1], INAR models have been widely used; see Du and Li [2], Jung et al. [3], Weiß [4], Ristić et al. [5], Zhang et al. [6], Li et al. [7], Kang et al. [8] and Yu et al. [9], among others. However, for so-called piecewise phenomena such as high thresholds, sudden bursts of large values, and time-varying volatility, the INAR model does not work well. Threshold models (Tong [10]; Tong and Lim [11]) have attracted much attention and have been widely used to model nonlinear phenomena. To capture the piecewise phenomena of integer-valued time series, Monteiro et al. [12] introduced a class of self-exciting threshold integer-valued autoregressive (SETINAR) models driven by independent Poisson-distributed random variables. Wang et al. [13] proposed a self-excited threshold Poisson autoregressive (SETPAR) model. Yang et al. [14] considered a class of SETINAR processes that properly capture flexible asymmetric and nonlinear responses without assuming the distributions of the errors. Yang et al. [15] introduced an integer-valued threshold autoregressive process based on a negative binomial thinning operator (NBTINAR(1)).
In addition, many business, economic and meteorological time series exhibit a periodically varying pattern that repeats itself after a regular period of time, possibly driven by seasonal factors and human activities. To deal with processes exhibiting periodic patterns, Bennett [16] and Gladyshev [17] proposed periodically correlated random processes. Then, Bentarzi and Hallin [18], Lund and Basawa [19], Basawa and Lund [20], and Shao [21], among other authors, studied the periodic autoregressive moving-average (PARMA) models in some detail. To capture the periodic phenomena of integer-valued time series, Monteiro et al. [22] proposed the periodic integer-valued autoregressive models of order one (PINAR(1)) with period T, driven by a periodic sequence of independent Poisson-distributed random variables. Hall et al. [23] considered the extremal behavior of periodic integer-valued moving-average sequences. Santos et al. [24] introduced a multivariate PINAR model with time-varying parameters. The analysis of periodic self-exciting threshold integer-valued autoregressive (PSETINAR(2; 1, 1) T ) processes was introduced by Pereira et al. [25]. Manaa and Bentarzi [26] established the existence of higher-order moments and the strict periodic stationarity of the PSETINAR(2; 1, 1) T processes. The CLS and CML methods are applied to estimate the parameters, while the nested sub-sample search (NeSS) algorithm proposed by Li and Tong [27] is used to estimate the periodic threshold parameters. A drawback of this PSETINAR(2; 1, 1) T model is that the mean and variance of the Poisson distribution are equal, which is not always true for real data. Therefore, in this paper, we remove the assumption of a Poisson distribution, specify only the relationship between the mean and variance of the observations, develop quasi-likelihood inference for the PSETINAR(2; 1, 1) T processes, and consider the estimation of the thresholds.
Quasi-likelihood is a non-parametric inference method proposed by Wedderburn [28]. It is very useful in cases where exact distributional information is not available and only the relation between the mean and variance of the observations is given, and it enjoys a certain robustness of validity. Quasi-likelihood has been widely applied. For example, Azrak and Mélard [29] proposed a simple and efficient algorithm to evaluate the exact quasi-likelihood of ARMA models with time-dependent coefficients; Christou and Fokianos [30] studied probabilistic properties and quasi-likelihood estimation for negative binomial time series models; Li et al. [31] studied quasi-likelihood inference for the self-exciting threshold integer-valued autoregressive (SETINAR(2,1)) processes under a weaker condition; Yang et al. [32] modeled overdispersed or underdispersed count data with generalized Poisson integer-valued autoregressive (GPINAR(1)) processes and investigated the maximum quasi-likelihood estimators.
The remainder of this paper is organized as follows. In Section 2, we redefine the PSETINAR(2; 1, 1) T processes under weak conditions and discuss their basic properties. In Section 3, we consider the quasi-likelihood inference for the unknown parameters. Thresholds estimation is also discussed. Section 4 presents some simulation results for the estimates. In Section 5, we give an application of the proposed processes to a real dataset. The forecasting problem of this dataset is addressed. Concluding remarks are given in Section 6. All proofs are postponed to the Appendix A.
The following proposition establishes the conditional mean and the conditional variance of the PSETINAR(2; 1, 1) T process, which play an important role in the study of the process properties and parameter estimation.

Proposition 1. For any fixed j = 1, 2, . . . , T, with T ∈ N 0 , the conditional mean and the conditional variance of the process {X t } for t = j + sT and s ∈ Z defined in (2) are given by

(i) E(X j+sT | X j+sT−1 ) = α (1) j X j+sT−1 I{X j+sT−1 ≤ r j } + α (2) j X j+sT−1 I{X j+sT−1 > r j } + λ j ;

(ii) Var(X j+sT | X j+sT−1 ) = α (1) j (1 − α (1) j ) X j+sT−1 I{X j+sT−1 ≤ r j } + α (2) j (1 − α (2) j ) X j+sT−1 I{X j+sT−1 > r j } + σ 2 z,j .

The following theorem states the ergodicity of the PSETINAR(2; 1, 1) T process (2). This property is useful in deriving the asymptotic properties of the parameter estimators.

Theorem 1. For any fixed j = 1, 2, . . . , T, with T ∈ N 0 , the process {X t } for t = j + sT and s ∈ Z defined in (2) is an ergodic Markov chain.
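To make the regime-switching recursion behind Proposition 1 concrete, the sketch below simulates a path from a PSETINAR(2; 1, 1) T -type model with binomial thinning. The function names are hypothetical, and the Poisson choice for the innovations is purely illustrative — the paper only assumes a finite second moment for {Z t }.

```python
import math
import random

def thinning(alpha, x, rng):
    """Binomial thinning alpha ∘ x: sum of x i.i.d. Bernoulli(alpha) draws."""
    return sum(1 for _ in range(x) if rng.random() < alpha)

def poisson_draw(lam, rng):
    """Poisson(lam) variate via Knuth's multiplication method (illustrative)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def simulate_psetinar(n, T, alpha1, alpha2, lam, r, seed=0):
    """Simulate a length-n path.  alpha1, alpha2, lam, r are length-T lists
    of per-season parameters; the regime is chosen by comparing the previous
    value with the season's threshold, as in Proposition 1."""
    rng = random.Random(seed)
    x, path = 0, []
    for t in range(1, n + 1):
        j = (t - 1) % T                              # season index (0-based)
        a = alpha1[j] if x <= r[j] else alpha2[j]    # lower vs upper regime
        x = thinning(a, x, rng) + poisson_draw(lam[j], rng)
        path.append(x)
    return path
```

The conditional mean and variance in Proposition 1 follow directly from this construction, since the thinning sum is Binomial(X t−1 , α) given X t−1 .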

Parameter Estimation
Suppose we have a series of observations {X j+sT , j = 1, 2, . . . , T, s ∈ N 0 } generated from the PSETINAR(2; 1, 1) T process. The goal of this section is to estimate the unknown parameter vector β = (α (1) 1 , α (2) 1 , λ 1 , . . . , α (1) T , α (2) T , λ T ) and the threshold vector r = (r 1 , r 2 , . . . , r T ) . This section is divided into two subsections. In Section 3.1, we estimate the parameter vector β by the maximum quasi-likelihood (MQL) method when the threshold values are known. We consider the maximum quasi-likelihood (MQL) and conditional least squares (CLS) estimators of the thresholds r in Section 3.2.

Estimation of Parameters β
As described in Proposition 1 (ii), the variance of X t conditional on X t−1 involves θ (k) j = α (k) j (1 − α (k) j ), k = 1, 2, j = 1, 2, . . . , T; then Var(X j+sT | X j+sT−1 ) admits the representation

Var(X j+sT | X j+sT−1 ) = θ (1) j X j+sT−1 I{X j+sT−1 ≤ r j } + θ (2) j X j+sT−1 I{X j+sT−1 > r j } + σ 2 z,j , for all j = 1, 2, . . . , T, s ∈ N 0 .

As discussed in Wedderburn [28], we have the set of standard quasi-likelihood estimating equations

∑ T j=1 ∑ N−1 s=0 [∂E(X j+sT | X j+sT−1 )/∂β i ] (X j+sT − E(X j+sT | X j+sT−1 )) / Var(X j+sT | X j+sT−1 ) = 0, (4)

for i = 1, . . . , 3T, where N is the total number of cycles. By solving (4), the quasi-likelihood estimator can be obtained. This method is essentially a two-step estimation: if θ j is unknown, we propose substituting a suitable consistent estimator of θ j obtained by other means, yielding modified quasi-likelihood estimating equations, and then solving them for the primary parameters of interest. In the modified quasi-likelihood estimating equations, we replace θ j with a suitable consistent estimator θ̂ j . This approach leads to the modified quasi-likelihood estimator β̂ MQL of β (see Zheng, Basawa and Datta [33]):

β̂ MQL = Q −1 N q N , (5)

where Q N = diag(Q 1,N , . . . , Q T,N ) is block-diagonal, q N = (q 1,N , q 2,N , . . . , q T,N ) , the off-diagonal 0's are (3 × 3)-null matrices, and the blocks Q j,N and q j,N (j = 1, 2, . . . , T) are the variance-weighted cross-product matrices and vectors of the within-season regressors. Note that we use the consistent estimator θ̂ (k) j = α̂ (k) j (1 − α̂ (k) j ). Next, the following proposition gives consistent estimators σ̂ 2 z,j of σ 2 z,j , which depend on consistent estimators α̂ (k) j and λ̂ j with k = 1, 2, j = 1, 2, . . . , T.
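Because the conditional mean is linear in (α (1) j , α (2) j , λ j ), the modified quasi-likelihood equations for one season reduce to a weighted least squares problem once a pilot variance estimate is plugged in. The sketch below illustrates this closed form; the function name and interface are assumptions for illustration and do not follow the paper's matrix notation exactly.

```python
import numpy as np

def mql_season(x_prev, x_curr, r_j, var_hat):
    """Modified quasi-likelihood estimate (alpha1_j, alpha2_j, lambda_j)
    for one season, given within-season pairs (x_prev, x_curr), the
    threshold r_j, and a pilot estimate var_hat(x) of Var(X_t | X_{t-1}=x).
    The estimating equations become weighted least squares of x_curr on
    the regressors [x*I(x<=r), x*I(x>r), 1] with weights 1/var_hat."""
    x_prev = np.asarray(x_prev, float)
    x_curr = np.asarray(x_curr, float)
    low = (x_prev <= r_j).astype(float)
    D = np.column_stack([x_prev * low, x_prev * (1 - low),
                         np.ones_like(x_prev)])        # design matrix
    w = 1.0 / np.maximum(var_hat(x_prev), 1e-8)        # QL weights
    WD = D * w[:, None]
    return np.linalg.solve(D.T @ WD, WD.T @ x_curr)    # (a1, a2, lam)
```

When the pilot variance is replaced by the estimated θ̂ j -weighted form, this is the season-j block of the estimator in (5).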

Proposition 2.
The following variance estimators for {Z j+sT } with j = 1, 2, . . . , T, s ∈ N 0 are consistent for k = 1, 2, where α̂ (k) j and λ̂ j are consistent estimators of α (k) j and λ j (for example, the CLS estimators given in Theorem 3.1 of Pereira et al. [25]). The two estimators are based on the conditional variance Var(X j+sT | X j+sT−1 ) and the unconditional variance Var(X j+sT ), respectively. The details can be found in Appendix A.
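The conditional-variance-based construction in Proposition 2 can be sketched as a residual calculation: under binomial thinning, Var(X t | X t−1 = x) = a(1 − a)x + σ 2 z with a the regime's thinning parameter, so averaging the squared residuals minus the thinning part is consistent for σ 2 z . The exact estimator in Proposition 2 may differ in notation; this is an illustrative sketch.

```python
import numpy as np

def sigma2_z_hat(x_prev, x_curr, r_j, a1, a2, lam):
    """Residual-based estimate of the innovation variance for one season.
    a1, a2, lam are pilot (consistent) estimates of the season's thinning
    and innovation-mean parameters; r_j is the season's threshold."""
    x_prev = np.asarray(x_prev, float)
    x_curr = np.asarray(x_curr, float)
    a = np.where(x_prev <= r_j, a1, a2)          # regime-dependent thinning
    resid2 = (x_curr - a * x_prev - lam) ** 2    # squared mean residuals
    # subtract the thinning contribution a(1-a)x and average
    return float(np.mean(resid2 - a * (1 - a) * x_prev))
```

For Poisson innovations this should recover σ 2 z = λ, which gives a simple sanity check in simulation.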
To study the asymptotic behavior of the estimator β̂ MQL , we make the following assumptions about the process {X t }: (C1) by Proposition 1 in Pereira et al. [25], we assume that {X t } is a strictly cyclostationary process; (C2) E|X t | 4 < ∞.
Now, for the asymptotic properties of the quasi-likelihood estimator β̂ MQL given by (5), we have the following asymptotic distribution.

Theorem 2. Let {X t } be a PSETINAR(2; 1, 1) T process defined in (2). Then, under assumptions (C1) and (C2), the estimator β̂ MQL given by (5) is asymptotically normal, with asymptotic covariance matrix determined by the matrices H j (j = 1, 2, . . . , T).

It is worth mentioning that this theorem implies the consistency of the estimator β̂ MQL .

Estimation of Thresholds Vector r
Note that in real data applications, the threshold values are also unknown. In this subsection, we estimate the threshold vector r = (r 1 , r 2 , . . . , r T ) . Here, we further extend the nested sub-sample search (NeSS) algorithm (see, e.g., Yang et al. [15], Li and Tong [27], and Li et al. [31]) and use the conditional least squares (CLS) and modified quasi-likelihood (MQL) principles to estimate r.
For some fixed λ = (λ 1 , λ 2 , . . . , λ T ) , the application of the conditional least squares principle yields the sum of squared errors S N (r, λ), and the threshold vector r can then be estimated by minimizing S N (r, λ) over the candidate range, where r and r̄ are known lower and upper bounds of r. In practice, they can be selected as the minimum and maximum values within each cycle of the sample. For convenience, we consider an alternative objective function J N (r, λ). The optimization in (8) is then equivalent to (9), where r̂ CLS denotes the conditional least squares estimator of the threshold vector r. Inspired by the conditional least squares method, we also investigate the performance of estimating r by the quasi-likelihood principle: the modified quasi-likelihood estimator r̂ MQL of r is obtained by maximizing the expression J̃ N (r, λ) in (10). It is worth mentioning that the unknown parameters λ j , j = 1, . . . , T, enter (9) and (10). As argued in Li and Tong [27], Yang et al. [14], and Yang et al. [15], when λ and r are one-dimensional parameters, we can choose any positive number as the value of λ without risking a wrong result for r̂. Fortunately, our simulations also show that the estimates of r obtained by maximizing J̃ N (r, λ) and J N (r, λ) do not depend on the value of λ. To give an intuitive impression of J̃ N (r, λ)/N, we generate a dataset from Model I (given in Section 4, i.e., T = 2, N = 50, β = (0.2, 0.1, 3, 0.8, 0.1, 7) , r = (8, 4) ) and plot the shape of J̃ N (r, λ)/N. From Figure 1, we can see that for different values of λ the shape of J̃ N (r, λ)/N changes, but the maximum in each subfigure is attained at the true threshold vector r = (8, 4) . In practice, we can choose the sample mean within each cycle as the value of λ j , j = 1, 2, . . . , T.
Using the quasi-likelihood method to estimate the thresholds is thus a three-step estimation procedure, implemented by the following algorithm:
Step 1: Choose the upper bound r̄ and lower bound r of r, and solve (9) to obtain r̂ CLS with λ j = X̄ j = (1/N) ∑ N−1 s=0 X j+sT , j = 1, 2, . . . , T;
Step 2: Fix r̂ CLS at its current value and solve (6) or (7) to obtain σ̂ 2 z,j , j = 1, 2, . . . , T, where α (k) j and λ j with k = 1, 2 can be estimated by other methods; then solve (5) to obtain β̂ MQL ;
Step 3: Fix β̂ MQL and σ̂ 2 z,j at their current values and maximize (10) to obtain r̂ MQL .
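For one season, the CLS step reduces to a one-dimensional search over integer threshold candidates: each candidate r splits the regressors into two regimes, and the candidate with the smallest residual sum of squares is kept. The brute-force scan below is an illustrative stand-in for the NeSS algorithm of Li and Tong [27], which narrows the candidate set recursively rather than scanning it exhaustively; the function name is hypothetical.

```python
import numpy as np

def cls_threshold_season(x_prev, x_curr, r_lo, r_hi):
    """Grid-search CLS estimate of one season's threshold: for each
    candidate r in [r_lo, r_hi], fit the two-regime conditional mean by
    ordinary least squares and keep the r with the smallest SSE."""
    x_prev = np.asarray(x_prev, float)
    x_curr = np.asarray(x_curr, float)
    best_r, best_sse = None, np.inf
    for r in range(r_lo, r_hi + 1):
        low = (x_prev <= r).astype(float)
        D = np.column_stack([x_prev * low, x_prev * (1 - low),
                             np.ones_like(x_prev)])
        coef, *_ = np.linalg.lstsq(D, x_curr, rcond=None)
        sse = float(np.sum((x_curr - D @ coef) ** 2))
        if sse < best_sse:
            best_r, best_sse = r, sse
    return best_r
```

The bounds r_lo and r_hi play the role of r and r̄, taken as the within-cycle minimum and maximum in practice.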

Simulation Study
In this section, we conduct simulation studies to illustrate the finite-sample performance of the estimates. The initial value X 0 is fixed at 0. To capture the characteristics of data from the PSETINAR(2; 1, 1) T process, we first generate a dataset with the distribution of innovations {Z t } given by Model I (defined below in this section) and parameters β = (0.2, 0.45, 1, 0.2, 0.45, 2, 0.65, 0.45, 1, 0.65, 0.45, 2, 0.2, 0.45, 3, 0.2, 0.45, 7, 0.8, 0.45, 7, 0.2, 0.1, 3, 0.8, 0.1, 7, 0.2, 0.1, 7, 0.8, 0.45, 2) , r = (3, 3, 3, 1, 3, 3, 5, 9, 3, 6, 7) , T = 11, N = 50. The parameter vectors here are randomly selected, with slight differences between the parameters of each cycle; the threshold vector r was chosen such that there are enough data in each regime. Figure 2 shows the sample path over the first six cycles (N = 6). We can see that even though there are slight differences between the parameters of each cycle, the dataset still exhibits periodic characteristics.
To report the performance of the estimates, we conduct simulation studies under the following three models. Model II assumes that {Z t } is a sequence of i.i.d. periodic Geometric-distributed random variables. For {Z 1t } given in Model I and {Z 2t } given in Model II, the relationship between the mean and variance is easily obtained. For each model, we generate the data with X 0 = 0, set T = 3, and use sample sizes n = NT = 150, 300, 900. All calculations are performed in R 3.6.2 with 1000 replications. We use the command constrOptim to optimize the objective function of the maximum likelihood estimation. The threshold vector is calculated by the algorithms discussed in Section 3.2. The other estimators are computed from their explicit expressions.

Performance of β̂ CLS , β̂ MQL and β̂ CML
Pereira et al. [25] provided a theoretical basis for the conditional least squares (CLS) and conditional maximum likelihood (CML) estimators of the parameter vector β in the PSETINAR(2; 1, 1) T process but did not conduct a simulation study. Manaa and Bentarzi [26] provided the asymptotic properties of the estimators and compared their performance through a simulation study. To compare the performance of the three estimators β̂ CLS , β̂ CML and β̂ MQL (given in Section 3), we conduct simulation studies for these estimators under Models I to III. The parameters are selected as follows, with r = (12, 7, 9) .
To eliminate the influence of parameter changes on the estimates, we choose the series randomly and change the parameters with fixed α (k) , k = 1, 2, or fixed λ, separately. The selected thresholds ensure there are enough data in each regime.
Spectral analysis starts by searching for hidden periodicities, and it is an important subject of frequency-domain time series analysis. The standard frequency-domain approach for studying hidden periods is the periodogram method, proposed by Schuster [34]; a rigorous examination is given in Fisher [35]. For a series of observations {X t }, t = 1, 2, . . . , n, the periodogram is defined as

I n ( f k ) = (1/n) | ∑ n t=1 X t e −i2π f k t | 2 , f k = k/n, k = 1, 2, . . . , [n/2].

The sample paths and periodograms of Series A, B and C under Model I are plotted in Figure 3 to show the periodic characteristics. Because the period (three) is short, it is difficult to see the period from the sample path, but the periodogram shows it very well. In addition, the simulation results are summarized in Tables 1-9.
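The periodogram calculation can be sketched as follows; a peak at frequency f k suggests a hidden period of 1/f k . The normalization here is one common textbook choice and the function name is illustrative.

```python
import numpy as np

def periodogram(x):
    """Periodogram at the Fourier frequencies f_k = k/n, k = 1..floor(n/2):
    I_n(f_k) = |sum_t x_t exp(-2 pi i f_k t)|^2 / n, computed after
    removing the sample mean so the zero frequency does not dominate."""
    x = np.asarray(x, float)
    n = len(x)
    dft = np.fft.rfft(x - x.mean())           # one-sided DFT
    freqs = np.arange(1, n // 2 + 1) / n      # skip the zero frequency
    I = (np.abs(dft[1:n // 2 + 1]) ** 2) / n
    return freqs, I
```

For a series with period 12, the largest ordinate should appear at f k = 1/12, which is exactly how the period of the WCB dataset is identified in Section 5.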
As expected, the biases and MSE of the estimators decrease as the sample size N increases, in agreement with the asymptotic properties of the estimators: asymptotic unbiasedness and consistency. Most of the biases and MSE in Model II are larger than those in Model I. This may be because the variance of {Z t } in Model II is larger than that in Model I, which leads to larger fluctuations in the data.
Tables 1-6 summarize the simulation results for different series under Model I and Model II. From these tables, we can see that most of the biases and MSE of β̂ MQL are smaller than those of β̂ CLS . This is perhaps because the MQL method uses more information about the data than the CLS method, so the MQL method can locate the optimum more accurately. In addition, most of the biases of β̂ MQL are smaller than those of β̂ CML , while the MSE is larger, because the CML method uses the distribution; if the distribution is correctly specified, it is indeed better than the MQL. It is worth mentioning that the CML method is more complicated and time-consuming than the MQL method in the simulation procedure. We conclude that the MQL estimators are better than the CLS estimators, and the CML estimators are not uniformly better than the MQL estimators.

To demonstrate the robustness of the MQL method, we consider simulations for Model III with different series using the CLS, MQL and CML methods, setting N = 300 and ρ = (0.9, 0.9, 0.9) and (0.8, 0.8, 0.8), respectively. From Tables 7-9, we can see that when ρ varies from (0.9, 0.9, 0.9) down to (0.8, 0.8, 0.8), the effect on the CLS and MQL estimators is slight. Most of the biases and MSE of the MQL estimators are smaller than those of the CLS estimators. However, because an incorrect distribution is used, the biases and MSE of the CML estimators increase. This indicates that the MQL method is more robust than the CLS and CML methods.

Table 9. Bias and MSE for Series C of Model III with N = 300 (MSE in parentheses).

Performance of r̂ MQL and r̂ CLS
As discussed in Section 3.2, we estimate the threshold vector using the conditional least squares and modified quasi-likelihood methods. The performances of r̂ MQL and r̂ CLS are compared in this subsection through simulation studies. From the simulation results in Section 4.1, we find that the contaminated data generated from Model III have little influence on the least squares and quasi-likelihood estimators, so we only simulate threshold estimation for different series under Model I and Model II. We assess the performance of r̂ by the bias, MSE and bias median, where the bias median is defined by Bias median = median i∈{1,2,...,K} (r̂ ij − r 0j ), j = 1, 2, . . . , T, where r̂ ij is the i-th estimate of r 0j , r 0j is the true value with j = 1, 2, . . . , T, and K is the number of replications. The simulation results are summarized in Tables 10-15. Most of the biases, bias medians and MSE under Model II (Tables 13-15) are larger than those under Model I (Tables 10-12). This might be because the variance of Model II is larger than that of Model I for each series. Moreover, almost all the biases, bias medians and MSE of the MQL estimators are smaller than those of the CLS estimators, and the MSE of some MQL estimators is even half that of the CLS. Because the thresholds are integer-valued, the bias median can be a more reasonable measure when assessing the accuracy of the estimators. We conclude that it is much better to estimate the thresholds with the MQL method than with CLS.
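The three summary measures used above can be computed in one pass over the replicated estimates; the function name and array layout are assumptions for illustration.

```python
import numpy as np

def threshold_metrics(r_hats, r_true):
    """Bias, MSE and bias median for replicated threshold estimates.
    r_hats is a (K, T) array of estimates over K replications; r_true is
    the length-T vector of true thresholds.  The bias median,
    median_i(r_hat_ij - r_0j), is the robust summary favoured in the text
    because the thresholds are integer-valued."""
    err = np.asarray(r_hats, float) - np.asarray(r_true, float)
    bias = err.mean(axis=0)
    mse = (err ** 2).mean(axis=0)
    bias_median = np.median(err, axis=0)
    return bias, mse, bias_median
```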
In the simulations above, we generate the data with X 0 = 0; however, 0 is not the mean of the process. We therefore also generate data, discard the first part of the generated series, and use the remaining data for inference, namely "burn-in" samples. Here, we generate a series of length 1800. We run the simulations for Series A of Model I, Model II and Model III (ρ = (0.8, 0.8, 0.8)); the other simulation settings are the same as before. The simulation results are listed in Tables 16-20. From these tables, we can see that with the "burn-in" samples, the estimated results are similar to those obtained with initial value 0, which indicates that the initial value does not affect our estimation results.

Real Data Example
In this section, we use the PSETINAR(2; 1, 1) T process to fit the series of monthly counts of claimants collecting short-term disability benefits. In the dataset, all the claimants are male, have cuts, lacerations or punctures, and are between the ages of 35 and 54. In addition, they all work in the logging industry and collect benefits from the Workers' Compensation Board (WCB) of British Columbia. The dataset consists of 120 observations, from 1985 to 1994 (Freeland [36]). The computations were performed in R 3.6.2. The threshold vector was calculated by the algorithms described in Section 3.2 (the three-step NeSS algorithm combined with the quasi-likelihood principle, and the NeSS algorithm combined with the least squares principle). We use the command constrOptim to optimize the objective function of the maximum likelihood estimation. Figure 4 shows the sample path, ACF and PACF plots of the observations. It can be seen from Figure 4 that this dataset is a dependent counting time series with a periodic characteristic. We use the periodogram method to determine the period of this dataset and draw Figure 5, from which it can be seen that I n ( f k ) reaches its maximum at f k = 1/12; we therefore conclude that T = 12. This displays the periodic characteristic of the data, which exhibits a yearly pattern. Table 21 displays the descriptive statistics for the monthly counts of claimants collecting short-term disability benefits from the WCB. From Table 21, we can see that the mean and variance are approximately equal in some months, which suggests assuming a periodic Poisson distribution for the innovations. However, some months, as well as the data overall, indicate overdispersion. We also find that the dataset contains no zeros and has minimum value one. This leads us to consider the periodic Poisson, periodic Geometric, zero-truncated periodic Poisson and zero-truncated periodic Geometric distributions for the innovations to fit the model, respectively.
Before the model fitting, we first estimate the threshold vector. The estimate r̂ CLS is calculated by (9), and r̂ MQL is calculated through (10) using the three-step algorithm. Table 22 summarizes the fitting results of r̂ CLS and r̂ MQL . Because the dataset is small, to fit the model better, when the number of observations in a regime is smaller than two or the estimated threshold equals the maximum or minimum value of the boundary, we consider the corresponding monthly data not to exhibit a piecewise phenomenon; by this criterion, March, July, and August show no piecewise phenomena. To capture the piecewise phenomenon of this time series dataset, we use the PINAR(1) T and PSETINAR(2; 1, 1) T models with period T = 12 to fit the dataset, respectively. The PINAR(1) process proposed by Monteiro et al. [22] follows the recursive equation X t = α t • X t−1 + Z t , with α t = α j ∈ (0, 1) for t = j + sT (j = 1, . . . , T, s ∈ N 0 ), where the definitions of the thinning operator "•" and the innovation process {Z t } are the same as for the PSETINAR(2; 1, 1) T process. It is worth mentioning that for this dataset, the conditional least squares and quasi-likelihood methods produce non-admissible estimators for some months, so we use the conditional maximum likelihood approach to estimate the parameters. Next, we use the PSETINAR(2; 1, 1) 12 and PINAR(1) 12 models to fit the dataset in combination with the four innovation distributions mentioned above. Here, the threshold vector is based on r̂ MQL . The AIC and BIC values are listed in Table 23; smaller AIC and BIC values indicate a better fit. From the results in Table 23, we conclude that the PSETINAR(2; 1, 1) 12 model with zero-truncated periodic Poisson distribution is more suitable. We then perform conditional maximum likelihood estimation, and the results are listed in Table 24.
Some estimators of the parameters in Table 24, for example, α (2) for January, May, June, September, October and November, are not statistically significant, suggesting that in those months the number of claims is mainly modeled through the innovation process. To check the predictability of the PSETINAR(2; 1, 1) T model, we carry out h-step-ahead forecasting for varying h. The h-step-ahead conditional expectation point predictor of the PSETINAR(2; 1, 1) T model is X̂ j+sT+h = E(X j+sT+h | X j+sT ); in particular, the one-step-ahead conditional expectation point predictor is X̂ j+sT+1 = E(X j+sT+1 | X j+sT ). However, the conditional expectation will seldom produce integer-valued forecasts. Recently, coherent forecasting techniques have been recommended, which only produce forecasts in N 0 . This is achieved by computing the h-step-ahead forecasting conditional distribution. As pointed out by Möller et al. [37], this approach allows forecasts to be easily obtained from the median or the mode of the forecasting distribution. In addition, Li et al. [38] and Kang et al. [8] have applied this method to forecast integer-valued processes. Homburg et al. [39] discussed prediction methods based on conditional distributions and on Gaussian approximations, applied them to some integer-valued processes, and compared them. For the PSETINAR(2; 1, 1) T process, the one-step-ahead conditional distribution of X j+sT+1 given X j+sT is the convolution of the thinning component with the innovation distribution. Due to the existence of the threshold, when we use the conditional expectation method to predict X j+sT+h with h > 1, we have to predict the previous value X j+sT+h−1 first and compare it with the corresponding threshold before making the next prediction. We do the same for the conditional distribution method. (To prevent confusion, we call this method a point-wise conditional distribution forecast. Predictors based entirely on the h-step-ahead conditional distribution, without intermediate step prediction, are discussed later.)
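For one regime, the one-step-ahead conditional distribution can be sketched as the convolution of a Binomial(x, α) thinning component with the innovation p.m.f. The sketch below makes an illustrative assumption about the interface: innov_pmf is any innovation p.m.f. (e.g. a zero-truncated Poisson for the WCB data), and the support is truncated at m_max for practical computation.

```python
import math
import numpy as np

def one_step_dist(x, alpha, innov_pmf, m_max):
    """pmf of X_{t+1} given X_t = x within one regime:
    X_{t+1} = alpha ∘ x + Z, so P(X_{t+1} = m | X_t = x) is the
    convolution of Binomial(x, alpha) with the innovation pmf,
    truncated to the support {0, ..., m_max}."""
    binom = [math.comb(x, k) * alpha**k * (1 - alpha)**(x - k)
             for k in range(x + 1)]
    pmf = np.zeros(m_max + 1)
    for k, pb in enumerate(binom):          # thinning outcome k
        for z in range(m_max + 1 - k):      # innovation outcome z
            pmf[k + z] += pb * innov_pmf(z)
    return pmf
```

The mode or median of this p.m.f. gives a coherent (integer-valued) forecast, and iterating it regime-by-regime yields the point-wise conditional distribution forecasts used above.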
The mode of the h-step-ahead point-wise conditional distribution can be viewed as the point prediction. To compare the two forecasting methods, a standard descriptive measure of forecasting accuracy, the h-step-ahead predicted root mean squared error (PRMSE), is adopted:

PRMSE = ( (1/(K − K 0 )) ∑ K t=K 0 +1 (X t − X̂ t ) 2 ) 1/2 ,

where K is the full sample size; we split the data into two parts and use the last K − K 0 observations as the forecasting evaluation sample. We forecast the values of the last year for h = 1, 2, 3, 12.
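The PRMSE above is simply the root mean squared error over the evaluation sample; a minimal sketch (function name illustrative):

```python
import numpy as np

def prmse(actual, predicted):
    """Predicted root mean squared error over the evaluation sample:
    sqrt of the average squared forecast error (X_t - X_hat_t)^2."""
    actual = np.asarray(actual, float)
    predicted = np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))
```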
The PRMSEs of the h-step-ahead point predictors are listed in Table 25. For the conditional expectation point predictors, the PRMSEs of PSETINAR(2; 1, 1) 12 with zero-truncated periodic Poisson distribution are smaller than those of PINAR(1) 12 with periodic Poisson and zero-truncated periodic Poisson distributions. This further shows the superiority of our model. The PRMSEs of the one-step-ahead point predictors are smaller than the others, which is natural because we use the value at the previous time point as the explanatory variable. For PSETINAR(2; 1, 1) 12 with zero-truncated periodic Poisson distribution, the PRMSEs of the twelve-step-ahead predictors are smaller than those of the other h-step-ahead predictors except the one-step-ahead, which may be because the period is 12. The PRMSE of the one-step-ahead conditional expectation point predictor is smaller than that of the point-wise conditional distribution point predictor; thus, the former method is better for this dataset.
The PRMSEs of the one-step-ahead fitted series calculated by conditional expectation and by conditional distribution are 2.434 and 3.565, respectively. This further illustrates that, for our dataset, one-step-ahead forecasting by conditional expectation is better than by conditional distribution. The original data and the fitted series (calculated by the one-step-ahead conditional expectation based on the observations at the previous time points) from the PSETINAR(2; 1, 1) 12 model with zero-truncated periodic Poisson distribution are plotted in Figure 6. The trend of the fitted series is similar to that of the original data. Except for the points with large values (where the unexpected predictions may be due to wrong judgement of the regime), the model fits the data well. In fact, we can obtain the full h-step-ahead conditional distribution; here, we list the two-step-ahead and three-step-ahead conditional distributions as an example, where m ∈ {0, 1, . . . , n} is the possible domain of X j+sT , j = 1, . . . , T, and s ∈ N 0 . For h = 1, 2, 3, we show the plots of the h-step-ahead conditional distribution in Figure 7, where x j+sT represents the count of claimants in December 1993 and February 1994, respectively. The mode of the h-step-ahead conditional distribution can be viewed as the point prediction. The PRMSEs of the two-step-ahead and three-step-ahead point predictors for the last year are 3.227 and 3.215, respectively, which are larger than those of the point-wise conditional distribution method described before. For other datasets or models, the h-step-ahead forecasting conditional distribution may show some advantages; we do not go into details here.

Conclusions
This paper extended the PSETINAR(2; 1, 1) T process proposed by Pereira et al. [25] by removing the assumption of a Poisson distribution for {Z t } and considering the PSETINAR(2; 1, 1) T process under the weaker condition that the second moment of {Z t } is finite. The ergodicity of the process is established. MQL estimators of the model parameter vector β, as well as MQL and CLS estimators of the threshold vector r, are obtained. Moreover, the simulations show the advantages of the quasi-likelihood method in comparison with the conditional maximum likelihood and conditional least squares methods. An application to a real dataset is presented. In addition, the forecasting problem for this dataset is addressed.
In this paper, we only discuss the PSETINAR(2; 1, 1) T process for univariate time series. Hence, an extension to a multivariate PSETINAR(2; 1, 1) T process with a diagonal or cross-correlated autoregressive matrix is a topic for future investigation. Furthermore, beyond this extension, there are a number of interesting problems for future research in this area. For example, even a simple periodic model can have an inordinately large number of parameters; this is also true for PSETINAR(2; 1, 1) T models, and even more so for multi-period models. Therefore, the development of dimensionality-reduction procedures to overcome the computational difficulties is a pressing problem. This remains a topic for future research.

Data Availability Statement:
The dataset is available in the book Freeland [36].

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Proof of Theorem 1. According to Theorem 2 of Tweedie [40] (see also Zheng and Basawa [41]), for the process defined by (2) and for all j = 1, 2, . . . , T, s ∈ Z, we verify the drift criterion, where [·] denotes the integer part of a number: the required bound holds for x ≥ K, and the transition kernel behaves suitably for x < K. Therefore, the process {X t } for t = j + sT defined in (2) is an ergodic Markov chain.
Note that the expectation E(X j+sT−1 I{·}) can be computed as above. Therefore, by substituting a suitable consistent estimator of α (k) j based on moment estimation, we obtain the estimator σ̂ 2 2,z,j in Proposition 2.

Here the 0's are (3 × 3)-null matrices. Now, we replace V θ j (X j+sT | X j+sT−1 ) by V θ̂ j (X j+sT | X j+sT−1 ), where θ̂ j is a consistent estimator of θ j . We aim to obtain the result.