Change Point Test for the Conditional Mean of Time Series of Counts Based on Support Vector Regression

This study considers support vector regression (SVR) and twin SVR (TSVR) for time series of counts, wherein the hyperparameters are tuned using the particle swarm optimization (PSO) method. For prediction, we employ the framework of integer-valued generalized autoregressive conditional heteroskedasticity (INGARCH) models. As an application, we consider change point problems, using the cumulative sum (CUSUM) test based on the residuals obtained from the PSO-SVR and PSO-TSVR methods. We conduct Monte Carlo simulation experiments to illustrate the methods' validity with various linear and nonlinear INGARCH models. Subsequently, a real data analysis, with the return times of extreme events constructed from the daily log-returns of Goldman Sachs stock prices, is conducted to exhibit the scope of application.


Introduction
In this study, we developed a forecasting method for time series of counts based on support vector regression (SVR) with particle swarm optimization (PSO), and used it to detect a change in the conditional mean of the time series via the cumulative sum (CUSUM) test calculated from integer-valued generalized autoregressive conditional heteroskedasticity (INGARCH) residuals. Over the past few decades, time series of counts have gained increasing attention from researchers in diverse scientific areas. As seen in the research of [1–5], two classes of models, namely the integer-valued autoregressive (INAR) and INGARCH models, have been popular for analyzing time series of counts. See [6] for more details. These models have been harnessed to analyze polio data [7], crime data [8], car accident traffic data [9], and financial data [10].
Although the basic theories and analytical tools for these models are well developed in the literature, as seen in [11–15], their usage is restricted because both INAR and INGARCH models are mostly assumed to have a linear structure in their conditional mean. In INGARCH models, the Poisson and negative binomial distributions have been widely adopted as the conditional distribution of the current observation given past information. These choices are reasonable in practice because the correct specification of the underlying distribution is not essential when estimating the conditional mean equation, as demonstrated by [16], who considered the quasi-maximum likelihood estimation (QMLE) method for time series of counts. However, for the QMLE approach to perform adequately, the conditional mean structure must be correctly specified. As misspecification can potentially lead to false conclusions in real situations, we considered SVR as a nonparametric algorithm for forecasting time series of counts. To our knowledge, the current study is the first attempt in the literature to use SVR for the prediction of time series of counts based on the INGARCH scheme.
SVR has been one of the most popular nonparametric algorithms for forecasting time series and has been proven to outperform classical time series models, such as autoregressive moving average (ARMA) and GARCH models, as SVR can approximate

INGARCH Model-Based Change Point Test
Let {Y_t, t ≥ 1} be a time series of counts. To make inferences for {Y_t}, one can fit a parametric model to Y_t, for instance, the INGARCH model with a conditional distribution from the one-parameter exponential family and a link function f_θ, parameterized by θ ∈ Θ ⊂ R^d, that describes the conditional expectation, namely,

Y_t | F_{t−1} ~ p(y | η_t), X_t := E(Y_t | F_{t−1}) = f_θ(X_{t−1}, Y_{t−1}), (1)

where F_t denotes the past information up to time t, f_θ is defined on [0, ∞) × N_0 with N_0 = {0, 1, . . .}, and p(·|·) is a probability mass function given by p(y|η) = exp{ηy − A(η)}h(y), y ≥ 0, where η is the natural parameter, A(·) and h(·) are known real-valued functions, B = A′ is strictly increasing, and η_t = B^{−1}(X_t). Then B(η_t) and B′(η_t) are the conditional mean and variance of Y_t given past observations, respectively. The symbols X_t(θ) and η_t(θ) are used to emphasize the dependence on θ. Conventionally, f_θ is assumed to be bounded below by some real number c > 0 and to satisfy

|f_θ(x, y) − f_θ(x′, y′)| ≤ ν_1 |x − x′| + ν_2 |y − y′| (2)

for all x, x′ ≥ 0 and y, y′ ∈ N_0, where ν_1, ν_2 ≥ 0 satisfy ν_1 + ν_2 < 1, which, according to [12], ensures that {Y_t} is strictly stationary and ergodic, as required for the consistency of the parameter estimates. In practice, one frequently uses Poisson or negative binomial (NB) linear INGARCH(1,1) models, X_t = ω + α X_{t−1} + β Y_{t−1}, the NB case corresponding to the negative binomial distribution with mass function P(Y = y | X_t) = [Γ(y + r) / (Γ(r) y!)] (r / (r + X_t))^r (X_t / (r + X_t))^y, y ∈ N_0, for some r > 0. Let θ_0 be the true parameter, assumed to be an interior point of the compact parameter space Θ. Then θ_0 is estimated using the conditional likelihood function of Model (1), based on the observations Y_1, . . . , Y_n:

L̃_n(θ) = Σ_{t=1}^n {η̂_t(θ) Y_t − A(η̂_t(θ))}, (3)

where η̂_t(θ) = B^{−1}(X̂_t(θ)) is updated through the equations X̂_t(θ) = f_θ(X̂_{t−1}(θ), Y_{t−1}) for t ≥ 2 and X̂_1(θ) = X̂_1, with an initial value X̂_1. The conditional maximum likelihood estimator (CMLE) of θ_0 is then obtained as the maximizer of the likelihood function in Equation (3): θ̂_n = argmax_{θ∈Θ} L̃_n(θ).
The authors of the references [12,50] showed that, under certain conditions, θ̂_n converges to θ_0 in probability and √n(θ̂_n − θ_0) is asymptotically normally distributed as n tends to ∞. This θ̂_n is then used for making predictions and calculating residuals.
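To make the INGARCH recursion concrete, the following minimal Python sketch simulates a Poisson linear INGARCH(1,1) process with X_t = ω + αX_{t−1} + βY_{t−1}; the parameter defaults and the initialization at the stationary mean are illustrative choices, not taken from the paper.

```python
import numpy as np

def simulate_poisson_ingarch(n, omega=1.0, alpha=0.3, beta=0.4, seed=0):
    """Poisson linear INGARCH(1,1): X_t = omega + alpha*X_{t-1} + beta*Y_{t-1}."""
    rng = np.random.default_rng(seed)
    mu = omega / (1.0 - alpha - beta)      # stationary mean (requires alpha+beta<1)
    x = np.empty(n)
    y = np.empty(n, dtype=int)
    x[0] = mu
    y[0] = rng.poisson(x[0])
    for t in range(1, n):
        x[t] = omega + alpha * x[t - 1] + beta * y[t - 1]
        y[t] = rng.poisson(x[t])           # conditional Poisson draw
    return y, x
```

For instance, with ω = 1, α = 0.3, and β = 0.4, the simulated sample mean should be close to ω/(1 − α − β) ≈ 3.33.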
In our current study, we aim to extend Model (1) to the nonparametric model:

Y_t | F_{t−1} ~ p(y | η_t), X_t = E(Y_t | F_{t−1}) = g(X_{t−1}, Y_{t−1}), (4)

where p(·|·) and g are unknown, and g is implicitly assumed to satisfy Equation (2). Provided that g ∈ {f_θ; θ ∈ Θ} and p(·|·) is known a priori, one can estimate g with ĝ = f_θ̂. Even if p(·|·) is unknown, one can still consider the Poisson or NB quasi-maximum likelihood estimation (QMLE) method as in [16]. See also [54] for various types of CUSUM tests based on the QMLEs. However, when one has no prior information on g, parametric modeling may hamper the inference, and in this case, one can estimate g with the nonparametric SVR method described in Section 3. For Model (1), we set up the null and alternative hypotheses:

H_0: θ remains the same over t = 1, . . . , n vs. H_1: not H_0.

The authors of the reference [50] considered the problem of detecting a change in θ based on the residual-based CUSUM test:

T̂_n^res = max_{1≤k≤n} (1 / (τ̂_n √n)) | Σ_{t=1}^k ε̂_t − (k/n) Σ_{t=1}^n ε̂_t |, (5)

where ε̂_t = Y_t − X̂_t(θ̂_n) and τ̂_n^2 is a consistent estimator of Var(ε_1). Furthermore, the authors of the references [55,56] employed the residual-based CUSUM of squares test:

T̂_n^square = max_{1≤k≤n} (1 / (τ̂_n √n)) | Σ_{t=1}^k ε̂_t^2 − (k/n) Σ_{t=1}^n ε̂_t^2 |, (6)

where τ̂_n^2 is here based on the squared residuals ε̂_t^2 and h_n = √2 (log_10 n)^2 is as defined in [55,56]. The authors of the reference [50] verified that, under the null H_0, T̂_n^res behaves asymptotically the same as

T_n = max_{1≤k≤n} (1 / (τ √n)) | Σ_{t=1}^k ε_t − (k/n) Σ_{t=1}^n ε_t |,

where ε_t = Y_t − X_t(θ_0) and τ^2 = Var(ε_1). As {ε_t} forms a sequence of martingale differences, we obtain T_n ≈ T := sup_{0≤s≤1} |B°(s)| in distribution [57], where B° denotes a Brownian bridge, owing to Donsker's invariance principle, so that, as T̂_n^res ≈ T_n, we have T̂_n^res ≈ T in distribution. For instance, H_0 is rejected if T̂_n^res ≥ 1.3397 at the level of 0.05, a critical value obtainable with Monte Carlo simulations. Similarly, the authors of the reference [55] verified that T̂_n^square ≈ T in distribution, so that the same critical values as for T̂_n^res can be harnessed.
Provided that a change point exists, its location is estimated as the point at which the CUSUM statistic attains its maximum:

k̂ = argmax_{1≤k≤n} | Σ_{t=1}^k ε̂_t − (k/n) Σ_{t=1}^n ε̂_t |.

This CUSUM framework for parametric models can easily be adopted for nonparametric models as long as the residuals ε̂_t can be accurately calculated, as seen in [21,22], which deal with the change point problem for SVR-ARMA and SVR-GARCH models. Below, when dealing with Model (4), instead of the ε̂_t in Equation (5), we use ε̂_t = Y_t − ĝ(X̂_{t−1}, Y_{t−1}) in the construction of the CUSUM tests in Equations (5) and (6).
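As an illustrative sketch of Equation (6) and the change location estimator, the statistic can be computed from a residual series as follows; here τ̂_n is taken to be the sample standard deviation of the squared residuals, which is one common choice and an assumption of this sketch.

```python
import numpy as np

def cusum_of_squares(resid):
    """CUSUM of squares statistic and estimated change location k-hat."""
    e2 = np.asarray(resid, float) ** 2          # squared residuals
    n = len(e2)
    tau = e2.std()                              # sample sd of squared residuals (assumed scaler)
    partial = np.cumsum(e2)
    k = np.arange(1, n + 1)
    dev = np.abs(partial - k / n * partial[-1]) # centered partial sums
    stat = dev.max() / (tau * np.sqrt(n))
    return stat, int(dev.argmax()) + 1          # statistic and change location
```

H_0 would be rejected when the statistic exceeds the chosen critical value (e.g., 1.3397 at the 5% level, per the text), and the returned index estimates the change location.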
When estimating g with SVR and TSVR, we train the pairs (y_t, x_t), either with y_t = X̃_t and x_t = (X̃_{t−1}, Y_{t−1})^T or with y_t = Y_t and x_t = (X̃_{t−1}, Y_{t−1})^T, where X̃_t is a proper proxy of X_t. The former was used for the SVR-GARCH model in [21], while the latter is newly considered here, inspired by the fact that Y_t = g(X_{t−1}, Y_{t−1}) + ν_t, where the error process {ν_t} is a sequence of martingale differences; this also holds for Model (1) because we can express Y_t = X_t + ν_t in that case. See Step 3 in Section 4 below for more details.

SVR-INGARCH Model
In this section, we provide an outline of the SVR, TSVR, and PSO methods for a quick reference and describe the change point test based on the SVR-INGARCH model.

Support Vector Regression
SVR is an extension of the SVM, originally proposed by [58], and provides accurate nonlinear prediction. SVR aims to identify a nonlinear function of the form f(x) = w^T φ(x) + b, where x denotes a vector of inputs, w and b are regression parameters, and φ is a feature map associated with a kernel function. The optimal w and b are determined from the ε-insensitive loss function (Vapnik, 2000):

L_ε(y, f(x)) = max{0, |y − f(x)| − ε}. (7)

Given input vectors x_i, scalar outputs y_i, i = 1, . . . , n, and a constant C > 0, we construct the objective function of the SVR as follows:

min_{w, b, ξ} (1/2) ‖w‖^2 + C Σ_{i=1}^n (ξ_{1,i} + ξ_{2,i}) (8)

subject to

y_i − w^T φ(x_i) − b ≤ ε + ξ_{1,i}, w^T φ(x_i) + b − y_i ≤ ε + ξ_{2,i}, ξ_{1,i}, ξ_{2,i} ≥ 0,

where ξ_{1,i}, ξ_{2,i} denote slack variables that allow some points to lie outside the ε-band with a penalty, and C controls the trade-off between the function complexity and the training error.
To obtain the optimal w and b, we formulate an unconstrained optimization problem using Lagrange multipliers [27]. The Karush-Kuhn-Tucker (KKT) conditions then lead to the following dual form [26]:

max_{α_1, α_2} −(1/2) Σ_{i,j} (α_{1,i} − α_{2,i})(α_{1,j} − α_{2,j}) K(x_i, x_j) − ε Σ_i (α_{1,i} + α_{2,i}) + Σ_i y_i (α_{1,i} − α_{2,i}) (9)

subject to Σ_i (α_{1,i} − α_{2,i}) = 0 and 0 ≤ α_{1,i}, α_{2,i} ≤ C, where α_{1,i} and α_{2,i} denote dual variables and K(x_i, x_j) = φ(x_i)^T φ(x_j). Solving the optimization problem in Equation (9) yields the regressor

f(x) = Σ_{i=1}^n (α_{1,i} − α_{2,i}) K(x_i, x) + b. (10)

In particular, we employ the Gaussian kernel K(x, x′) = exp(−‖x − x′‖^2 / γ^2) for K in Equation (10), and determine the tuning parameters γ^2 and C in Equation (8) and ε in the loss function in Equation (7) using the PSO method on the cube of (C, γ^2, ε) with 1 ≤ C ≤ 100, 0.1 ≤ γ^2 ≤ 1, and 0.1 ≤ ε ≤ 1.
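As a minimal illustration of ε-insensitive SVR with a Gaussian kernel, one can use scikit-learn's SVR class (the paper's experiments use R packages; the use of scikit-learn and the hyperparameter values here are our own assumptions):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 200).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(0.0, 0.05, 200)

# RBF (Gaussian) kernel with epsilon-insensitive loss, cf. Equations (7)-(10)
model = SVR(kernel="rbf", C=10.0, gamma=1.0, epsilon=0.1)
pred = model.fit(x, y).predict(x)
mae = float(np.mean(np.abs(pred - y)))  # in-sample mean absolute error
```

In the paper's setting, (C, γ², ε) would be selected by PSO over the stated cube rather than fixed as here.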

Twin Support Vector Regression
TSVR is a modified version of SVR [30]. Similar to TSVM [59], TSVR derives two nonparallel functions that respectively determine the ε_1-insensitive down-bound and the ε_2-insensitive up-bound of the data. Given input vectors x_i and outputs y_i, i = 1, . . . , n, the linear TSVR can be formulated as the following pair of constrained minimization problems:

min_{w_1, b_1, ξ_1} (1/2) ‖Y − e ε_1 − (A w_1 + e b_1)‖^2 + C_1 e^T ξ_1 subject to Y − (A w_1 + e b_1) ≥ e ε_1 − ξ_1, ξ_1 ≥ 0, (11)

min_{w_2, b_2, ξ_2} (1/2) ‖Y + e ε_2 − (A w_2 + e b_2)‖^2 + C_2 e^T ξ_2 subject to (A w_2 + e b_2) − Y ≥ e ε_2 − ξ_2, ξ_2 ≥ 0, (12)

where Y = (y_1, . . . , y_n)^T, A = (x_1 . . . x_n)^T, ε_1, ε_2 ≥ 0, e denotes a vector whose components are all equal to 1, C_1, C_2 ≥ 0 are hyperparameters, and ξ_1, ξ_2 ≥ 0 are slack variables. Each QPP has n constraints instead of 2n constraints and thus has the advantage of faster computational speed. To obtain the optimal w_1 and b_1 in Equation (11), we solve the QPP using the Lagrangian function:

L(w_1, b_1, ξ_1; α_1, β_1) = (1/2) ‖Y − e ε_1 − (A w_1 + e b_1)‖^2 + C_1 e^T ξ_1 − α_1^T (Y − (A w_1 + e b_1) − e ε_1 + ξ_1) − β_1^T ξ_1, (13)

where α_1 ≥ 0 and β_1 ≥ 0 are Lagrange multiplier vectors. If there is an optimal solution, it must satisfy the KKT conditions; writing G = (A e), u_1 = (w_1^T b_1)^T, and h_1 = Y − e ε_1, the stationarity conditions (14) and (15) give

u_1 = (G^T G)^{−1} G^T (h_1 − α_1). (16)

However, since G^T G is only positive semidefinite, we introduce a regularization term σI, where σ > 0 is very small, to overcome ill-conditioning, and use u_1 = (G^T G + σI)^{−1} G^T (h_1 − α_1). Next, substituting Equation (16) and the KKT conditions into Equation (13), we obtain the dual QPP form:

max_{α_1} −(1/2) α_1^T G (G^T G)^{−1} G^T α_1 + h_1^T G (G^T G)^{−1} G^T α_1 − h_1^T α_1 subject to 0 ≤ α_1 ≤ C_1 e,

which yields u_1. Likewise, Equation (12) admits the dual QPP form:

max_{α_2} −(1/2) α_2^T G (G^T G)^{−1} G^T α_2 − h_2^T G (G^T G)^{−1} G^T α_2 + h_2^T α_2 subject to 0 ≤ α_2 ≤ C_2 e,

where α_2 is the Lagrange multiplier vector and h_2 = Y + e ε_2. This yields u_2 = (w_2^T b_2)^T = (G^T G)^{−1} G^T (h_2 + α_2). The estimated regressor is then formulated as follows:

f(x) = (1/2)(f_1(x) + f_2(x)) = (1/2)(w_1 + w_2)^T x + (1/2)(b_1 + b_2).

To extend the linear TSVR to a nonlinear one, we use the kernel-generated nonparallel hyperplanes, that is, f_1(x) = K(x^T, A^T) w_1 + b_1 and f_2(x) = K(x^T, A^T) w_2 + b_2. The optimization problem in this case is similar to that of the linear TSVR, and the nonlinear TSVR regressor is obtained as f(x) = (1/2)(K(x^T, A^T)(w_1 + w_2) + b_1 + b_2).
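The two dual QPPs above can be sketched numerically as follows; this illustrative implementation solves the box-constrained duals with SciPy's bounded L-BFGS-B solver and applies the σI regularization (the solver choice and the default hyperparameters are assumptions of this sketch, not the paper's implementation):

```python
import numpy as np
from scipy.optimize import minimize

def tsvr_fit(X, y, eps1=0.1, eps2=0.1, C1=1.0, C2=1.0, sigma=1e-6):
    """Linear TSVR: solve the two box-constrained dual QPPs and average the bounds."""
    y = np.asarray(y, float)
    n = len(y)
    G = np.hstack([np.asarray(X, float), np.ones((n, 1))])   # G = (A  e)
    M = np.linalg.inv(G.T @ G + sigma * np.eye(G.shape[1]))  # (G^T G + sigma I)^{-1}
    P = G @ M @ G.T

    def solve_dual(h, C, sgn):
        # down-bound (sgn=+1): min 0.5 a^T P a - ((P - I) h)^T a,  0 <= a <= C
        # up-bound   (sgn=-1): min 0.5 a^T P a + ((P - I) h)^T a,  0 <= a <= C
        q = sgn * ((P - np.eye(n)) @ h)
        res = minimize(lambda a: 0.5 * a @ P @ a - q @ a,
                       np.zeros(n), jac=lambda a: P @ a - q,
                       method="L-BFGS-B", bounds=[(0.0, C)] * n)
        return M @ G.T @ (h - sgn * res.x)                   # u = (w^T, b)^T

    u1 = solve_dual(y - eps1, C1, +1.0)                      # f1: down-bound
    u2 = solve_dual(y + eps2, C2, -1.0)                      # f2: up-bound
    return 0.5 * (u1 + u2)                                   # averaged regressor
```

The returned vector stacks the averaged weights and intercept, i.e., ((w_1 + w_2)/2, (b_1 + b_2)/2); on near-noiseless linear data, the ±ε shifts of the two bounds cancel in the average.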

Particle Swarm Optimization Method
In a standard PSO algorithm [34], a set of d hyperparameters is treated as a particle, a d-dimensional vector in the search region S = {p = (p_1, . . . , p_d)^T ∈ R^d; l_k ≤ p_k ≤ u_k with l_k, u_k ∈ R for k = 1, . . . , d}. Here, N particles move in S, with positions p_i = (p_{i1}, . . . , p_{id})^T and velocities v_i = (v_{i1}, . . . , v_{id})^T for i = 1, . . . , N. The previous best position of the i-th particle is represented by p_i^best = (p_{i1}^best, . . . , p_{id}^best)^T, and the previous best position over all particles is represented by g^best = (g_1^best, . . . , g_d^best)^T. At each iteration k, where 1 ≤ k ≤ K_max with maximum iteration number K_max, the velocity and position of the i-th particle are updated as follows:

v_i^{k+1} = w_k v_i^k + c_1 r_1 (p_i^best − p_i^k) + c_2 r_2 (g^best − p_i^k), p_i^{k+1} = p_i^k + v_i^{k+1},

where c_1 and c_2 are two acceleration factors, r_1 and r_2 are two random variables following a uniform distribution over [0, 1], and w_k is an inertia factor defined by

w_k = w_start − (w_start − w_end) k / K_max,

where w_start and w_end are the initial and final values of the inertia. As the positions of the particles are updated, p_i^best and g^best are also updated: p_i^best is replaced by p_i^{k+1} whenever the latter attains a smaller validation error, and g^best is the best among {p_1^best, . . . , p_N^best}. The finally updated g^best in the above procedure is used as the optimal hyperparameter vector in estimating the SVR and TSVR models, as seen below.
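A minimal numpy sketch of the PSO updates above, with the inertia factor decreasing linearly from w_start to w_end (the default constants here are illustrative, not the paper's settings):

```python
import numpy as np

def pso(objective, lower, upper, n_particles=20, k_max=100,
        c1=2.0, c2=2.0, w_start=0.9, w_end=0.4, seed=0):
    """Minimize `objective` over the box [lower, upper] with standard PSO."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    d = len(lower)
    p = rng.uniform(lower, upper, (n_particles, d))     # positions
    v = np.zeros((n_particles, d))                      # velocities
    p_best = p.copy()
    f_best = np.array([objective(z) for z in p])
    g_best = p_best[f_best.argmin()].copy()
    for k in range(k_max):
        w_k = w_start - (w_start - w_end) * k / k_max   # linearly decreasing inertia
        r1 = rng.random((n_particles, d))
        r2 = rng.random((n_particles, d))
        v = w_k * v + c1 * r1 * (p_best - p) + c2 * r2 * (g_best - p)
        p = np.clip(p + v, lower, upper)                # keep particles inside S
        f = np.array([objective(z) for z in p])
        improved = f < f_best
        p_best[improved], f_best[improved] = p[improved], f[improved]
        g_best = p_best[f_best.argmin()].copy()
    return g_best
```

In the paper's use case, `objective` would be the validation MAE of an SVR or TSVR fit as a function of its hyperparameters.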

PSO-TSVR Model-Based CUSUM Test
In this subsection, we explain how to estimate X_t in Model (4) using the SVR and TSVR methods with PSO, as described above, and how to construct the CUSUM test based on the residuals obtained from the SVR-INGARCH model. In the following steps, we assume that the time series {Y_1, . . . , Y_n, Y_{n+1}, . . . , Y_{n+n′}} has no change points.

• Step 1. As in Section 3.3, in order to apply the PSO method, we set a space of hyperparameters and initialize the positions {p_1^0, . . . , p_N^0} and velocities {v_1^0, . . . , v_N^0} of the particles to be evaluated within this space. Subsequently, we divide the given time series into two groups, {Y_1, . . . , Y_n} and {Y_{n+1}, . . . , Y_{n+n′}}. The former is used as a training set, while the latter is used as a validation set.

• Step 2. Compute initial estimates of X_t based on the training time series. For this task, we use two different methods. The first uses moving averages (Niemira, 1994):

X̃_t = (1/m) Σ_{j=1}^m Y_{t−j+1}, (17)

where m is a positive integer; when t is smaller than m, X̃_t is computed as the average of the first t observations, i.e., X̃_t = Σ_{j=1}^t Y_{t−j+1}/t for t < m. The second uses the Poisson QMLE [54], assuming that the time series follows Model (1), for example, the fitted linear INGARCH(1,1) recursion:

X̃_t = ω̂ + α̂ X̃_{t−1} + β̂ Y_{t−1}. (18)

These estimates play the role of a proxy for X_t and replace the true conditional mean.

• Step 3. For particles p_i^k, k = 1, 2, . . . , K_max, we train the pairs (y_t, x_t), either with y_t = X̃_t and x_t = (X̃_{t−1}, Y_{t−1})^T or with y_t = Y_t and x_t = (X̃_{t−1}, Y_{t−1})^T, using the SVR and TSVR models to obtain ĝ. For the first choice, named "X̂_t-targeting", we obtain

X̂_t = ĝ(X̃_{t−1}, Y_{t−1}), (19)

and for the second, named "Ŷ_t-targeting",

Ŷ_t = ĝ(X̃_{t−1}, Y_{t−1}), (20)

where Ŷ_t is an estimate of Y_t, which also serves as an estimate of X_t, since X_t is itself the conditional expectation of Y_t.

• Step 4. Applying the estimated SVR and TSVR models and using the same proxy formula as in Step 2 for the validation time series, the mean absolute error (MAE) is computed as

MAE = (1/n′) Σ_{t=n+1}^{n+n′} |X̃_t − X̂_t| for the case of Equation (19), and MAE = (1/n′) Σ_{t=n+1}^{n+n′} |Y_t − Ŷ_t| for the case of Equation (20).

The MAE is employed here because it is more robust against outliers in model fitting than the root mean square error.

• Step 5. Update p_i^k, v_i^k, p_i^{best,k}, and g^{best,k} as in Section 3.3, and repeat Steps 3 and 4 until the MAE in Step 4 converges within a limit or k reaches the maximum iteration number K_max.

• Step 6. Apply the estimated SVR and TSVR models with the hyperparameters selected in Step 5 to a testing time series to perform the CUSUM tests in Equations (5) and (6), based on the residuals ε̂_t = Y_t − ĝ(X̃_{t−1}, Y_{t−1}).
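For instance, the moving-average proxy of Step 2, including the adjustment for t < m, can be sketched as:

```python
import numpy as np

def ma_proxy(y, m=5):
    """Moving-average proxy for the conditional mean X_t (Equation (17))."""
    y = np.asarray(y, float)
    x = np.empty_like(y)
    for t in range(len(y)):
        lo = max(0, t - m + 1)        # for t < m, average the first t+1 observations
        x[t] = y[lo:t + 1].mean()
    return x
```

For example, `ma_proxy([1, 2, 3, 4, 5, 6], m=3)` yields the running averages over windows of at most three past observations.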

Simulation Results
In this section, we apply the PSO-SVR and PSO-TSVR models to the INAR(1) and INGARCH(1,1) models and evaluate the performance of the proposed CUSUM tests. For this task, we generate a time series of length 1000 (n = 500, n′ = 500) to evaluate the empirical sizes and powers at the nominal level of 0.05. The size and power are calculated as the proportion of rejections of the null hypothesis of no change over 500 repetitions. The simulations were conducted with R version 3.6.3, running on Windows 10. Moreover, we use the following R packages: "pso" for the PSO [60], "kernlab" for the Gaussian kernel [61], and "osqp" [62] for solving the quadratic problem. The procedure for the simulation is as follows.

• Step 1. Generate a time series of length 1000 to train the PSO-SVR and PSO-TSVR models.
• Step 2. Apply the estimation scheme described in Section 4. For the moving-average proxy, we use m = 5. In this procedure, we divide the given time series into n = 500 and n′ = 500 as in Step 1 of Section 4.
• Step 3. Generate a testing time series of length 1000 to evaluate the size and power. For computing the sizes, we generate a time series with no changes, whereas, to examine the power, we generate a time series with a change point in the middle.
• Step 4. Apply the model estimated in Step 2 to the time series of Step 3 and conduct the residual CUSUM and CUSUM of squares tests.
• Step 5. Repeat the above steps N times, e.g., N = 500, and then compute the empirical sizes and powers.
We consider the INGARCH(1,1) and INAR(1) models, as these are among the most widely used models in practice:

• Model 1 (INGARCH(1,1)): X_t = ω + α X_{t−1} + β Y_{t−1};
• Model 2 (INAR(1)): Y_t = φ ∘ Y_{t−1} + e_t,

where ∘ denotes the binomial thinning operator and |φ| < 1. Further, upon one referee's suggestion, we also consider the softplus INGARCH(1,1) model in [63]:

• Model 3 (softplus INGARCH(1,1)): X_t = sp(ω + α X_{t−1} + β Y_{t−1}), where sp(x) = log(1 + e^x) denotes the softplus function.

Under the null hypothesis, we use the parameter settings as follows.
• Model 1: -

Under the alternative hypothesis, we only consider the case in which one parameter changes while the other parameters remain the same. Tables A1–A4 in Appendix A summarize the results for Model 1. Here, MA and ML denote the proxies obtained from the moving average in Equation (17) and the Poisson QMLE in Equation (18), and Ŷ_t and X̂_t denote the two targeting methods in Equations (20) and (19), respectively. The tables show that the difference between the SVR and TSVR methods is marginal. Moreover, in most cases, T̂_n^square appears to be much more stable than T̂_n^res; that is, the latter test suffers from more severe size distortions. In terms of power, T̂_n^res with the ML proxy and X̂_t-targeting tends to outperform the others. However, the gap between this test and T̂_n^square is only marginal; therefore, considering the stability of the test, T̂_n^square is highly favored for Model 1. Tables A5–A7 summarize the results for Model 2, showing that T̂_n^res exhibits a more stable performance for the INAR models than for the INGARCH models. However, it is still not as stable as T̂_n^square, although it tends to outperform T̂_n^square in terms of power. Table A8 summarizes the results for Model 3, showing no significant differences from the results for the previous models. This, to a certain extent, coincides with the findings of Lee and Lee (2020), who considered parametric INGARCH models for a change point test. Overall, our findings strongly confirm the reliability of using T̂_n^square, particularly with the ML proxy and X̂_t-targeting. However, in practice, one can additionally implement T̂_n^res because either test can react more sensitively than the other in specific situations.
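For reference, Model 2 (INAR(1)) can be simulated via binomial thinning as follows; the Poisson innovation distribution and the parameter values here are illustrative assumptions:

```python
import numpy as np

def simulate_inar1(n, phi=0.5, lam=1.0, seed=0):
    """INAR(1): Y_t = phi ∘ Y_{t-1} + e_t, with binomial thinning and Poisson(lam) innovations."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n, dtype=int)
    y[0] = rng.poisson(lam / (1.0 - phi))          # start near the stationary mean
    for t in range(1, n):
        # binomial thinning: each of the Y_{t-1} counts survives with probability phi
        y[t] = rng.binomial(y[t - 1], phi) + rng.poisson(lam)
    return y
```

With φ = 0.5 and λ = 1, the stationary mean is λ/(1 − φ) = 2, which the simulated sample mean should approximate.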
Table A9 lists the computing times (in seconds) of the SVR and TSVR methods when implemented in R on Windows 10, running on a PC with an Intel i7-3770 processor (3.4 GHz) and 8 GB of RAM. The figures denote the averages of the training times in the simulations, and the values in parentheses indicate the sample standard deviations. In each model and parameter setting, the two quickest results are written in boldface. As reported by [21,30], the TSVR method is shown to markedly reduce the CPU time. In particular, the results indicate that the TSVR-based method with the ML proxy and X̂_t-targeting is the fastest in most cases. This suggests that the TSVR-based CUSUM tests are beneficial when computational speed is of significance to the implementation.

Real Data Analysis
In this section, we analyze the return times of extreme events constructed from the daily log-returns of GS stock prices from 1 January 2003 to 28 June 2019, obtained using the R package "quantmod". We used the data from 2 January 2003 to 29 June 2007 as the training set and those from 1 July 2009 to 28 June 2019 as the test set. Figures 1 and 2 exhibit the GS stock prices and 100 times the log-returns, with the ranges of the training and test sets denoted by the green and blue vertical lines, respectively. As shown in Figure 2, the time series between the training and test sets exhibits severe volatility, owing to the financial crisis of 2008; it is therefore omitted from our data analysis. Before applying the PSO-SVR-INGARCH and PSO-TSVR-INGARCH methods, similarly to [12,14], we first transform the given time series into the hitting times τ_1, τ_2, . . ., at which the log-returns of the GS stock fall outside the 0.15 and 0.85 quantiles of the training data, that is, −1.242 and 1.440, respectively. More specifically, τ_1 = inf{t ≥ 1; w_t ∉ [−1.242, 1.440]}, τ_2 = inf{t > τ_1; w_t ∉ [−1.242, 1.440]}, . . ., where w_t denotes 100 times the log-return at time t. We then set Y_t := τ_t − τ_{t−1}, which forms the return times of these extreme events. Consequently, the training set is transformed into an integer-valued time series of length 341, and the test set into one of length 844 (see Figure 3, which plots Y_t).
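The construction of the return times can be sketched as follows, where `lo` and `hi` play the roles of the empirical 0.15 and 0.85 quantiles (−1.242 and 1.440 in the application):

```python
import numpy as np

def return_times(w, lo, hi):
    """Return times Y_t = tau_t - tau_{t-1} of exits of w_t from [lo, hi]."""
    w = np.asarray(w, float)
    hits = np.flatnonzero((w < lo) | (w > hi)) + 1   # hitting times tau_1, tau_2, ... (1-based)
    return np.diff(np.concatenate(([0], hits)))      # gaps between consecutive hits
```

For example, a series with exceedances at times 2, 4, and 5 yields the return times 2, 2, 1.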
To determine whether the training set exhibits a change, we apply the Poisson QMLE method and the CUSUM of squares test of [54]. The resulting CUSUM statistic T̂_n^square has a value of 0.590, which is smaller than the theoretical critical value of 1.358; thus, the null hypothesis of no change is not rejected at the nominal level of 0.05, supporting the adequacy of the training set. The residual-based CUSUM of squares tests based on the SVR and TSVR models with the ML proxy and X̂_t-targeting are then applied; both tests detect a change point at the 441st observation of the testing data, corresponding to 16 October 2013. The red vertical line in Figure 3 denotes the detected change point.
To examine how the change affects the dynamic structure of the time series, we fit a Poisson linear INGARCH model to the training time series and to the testing time series before and after the change point. For the training time series, the fitted INGARCH model has ω̂ = 0.334, α̂ = 0.813, and β̂ = 0.086. For the testing time series before the change, we obtain ω̂ = 0.135, α̂ = 0.878, and β̂ = 0.067, which are not very different from those of the training time series. However, after the change point, the fitted parameters are ω̂ = 1.045, α̂ = 0.560, and β̂ = 0.147, confirming a significant change in the parameters. For instance, the sum of α̂ and β̂ is 0.899 in the training data, whereas it changes from 0.945 to 0.707 in the testing data.

Concluding Remarks
In this study, we proposed CUSUM tests based on the residuals obtained from the SVR- and TSVR-INGARCH models to detect a parameter change in the conditional mean of a time series of counts. To improve accuracy and efficiency, we also employed the PSO method to obtain an optimal set of hyperparameters. Monte Carlo simulations were conducted using the INAR and INGARCH models with various parameter settings. The results showed that the TSVR method using the ML proxy and the conditional mean X̂_t-targeting method is recommendable, as it generally performs well and markedly reduces computational time. Our method was then applied to the analysis of the return times of extreme events constructed from the daily log-returns of Goldman Sachs stock prices and, subsequently, detected one change point. Overall, our findings, based on a simulation study and real data analysis, demonstrate the validity of our method. Although the proposed method performs well in general, its performance may be limited when the amount of available training data is not large enough or when the dataset has features that violate stationarity, e.g., high volatility. The method can also suffer from over-fitting to a specific training sample. Thus, developing more robust methods would be an important task, which we leave as a future project.

Data Availability Statement: Publicly available datasets were analyzed in this study. The data can be found here: https://finance.yahoo.com (accessed on 10 August 2020).

Acknowledgments:
We sincerely thank the Editor and the two anonymous reviewers for their precious time and valuable comments.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: