Monitoring Volatility Change for Time Series Based on Support Vector Regression

This paper considers monitoring an anomaly from sequentially observed time series with heteroscedastic conditional volatilities based on the cumulative sum (CUSUM) method combined with support vector regression (SVR). The proposed online monitoring process is designed to detect a significant change in volatility of financial time series. The tuning parameters are optimally chosen using particle swarm optimization (PSO). We conduct Monte Carlo simulation experiments to illustrate the validity of the proposed method. A real data analysis with the S&P 500 index, Korea Composite Stock Price Index (KOSPI), and the stock price of Microsoft Corporation is presented to demonstrate the versatility of our model.


Introduction
In this paper, we study the cumulative sum (CUSUM) monitoring procedure to sequentially detect a significant change in time series with conditional volatilities. Since [1,2], the CUSUM method has been an acclaimed tool for detecting anomalies among observations in statistical process control (SPC). In SPC, a control chart is a primary component that graphically describes the behavior of sequentially observed time series. For examples of SPC, see [3] for control charts, [4] for the analysis of complex environmental time series, and [5] for backcasting-forecasting time series. In particular, the CUSUM chart is one of the most frequently adopted methods across various fields of SPC. For an overview of SPC, we refer to [6]. In general, the performance of control charts is measured with the average run length (ARL). However, instead of using conventional control charts, some authors, such as [7], took the approach of controlling the type I error probability rather than the ARL to deal with the monitoring process in autoregressive time series. This design of the sequential monitoring method has merit in its ability to attain a lower false alarm rate, as seen in [8], who took a similar approach for generalized autoregressive conditional heteroscedastic (GARCH) time series. For more references on monitoring processes in time series, see [9]. Here, inspired by these previous studies, we aim to hybridize the CUSUM monitoring scheme with support vector regression (SVR) for GARCH-type time series. The support vector machine (SVM) is one of the most popular nonparametric learning methods, used predominantly for classification and regression [10]. In particular, compared with traditional model-based methods, SVR is better suited to approximating the nonlinearity of the underlying dynamic structure of datasets; refer to [11][12][13][14][15] and the papers cited therein. Moreover, SVR is formulated as a quadratic programming problem and implements the structural risk minimization principle [16]. This makes SVR an estimation scheme that minimizes the empirical risk while remaining relatively parsimonious [17].

The CUSUM Monitoring Process
In this section, we introduce our monitoring process, starting from the independent and identically distributed (iid) sample case. Let us consider the problem of monitoring an anomaly in the variance of a stream of mean-zero observations ε_t up to time n. Under the null hypothesis of no anomalies, we assume that the ε_t are iid with unit variance over time t = 1, ..., n. Namely, we test

H_0: Var(ε_t) = 1 for t = 1, ..., n

against

H_1: Var(ε_t) = 1 for t = 1, ..., k and Var(ε_t) ≠ 1 for t = k + 1, ..., n, for some k = 2, ..., n − 1.
For this task, we first define, for each k ≥ 1,

W_k = (1/τ) ∑_{t=1}^{k} (ε_t² − 1),   (1)

where τ² = Var(ε_1²), which is assumed to be known. Ref. [7] considered the monitoring process based on

T_n^(1) = (1/√n) max_{1≤k≤n} ( max_{1≤m≤k} W_m − W_k ),   (2)

and later [8] additionally considered the monitoring process based on

T_n = (1/√n) max_{1≤k≤n} | W_k − (k/n) W_n |.   (3)

In this study, however, we consider another monitoring process based on

T_n^max = T_n^(1) ∨ T_n^(2),   (4)

where

T_n^(2) = (1/√n) max_{1≤k≤n} ( W_k − min_{1≤m≤k} W_m ).   (5)

We employ the test statistic T_n^max because it combines the strengths of T_n^(1) and T_n^(2), which detect well the decrease and increase of variance, respectively. Note that if E ε_k² has a positive shift, say, from 1 to 1 + δ with δ > 0, at some point k_0 = 2, ..., n − 1, then W_k − min_{m≤k} W_m for k > k_0 would tend to take larger values owing to the shift, which is not the case for max_{m≤k} W_m − W_k; this indicates that T_n^(2) would be able to detect the shift well while T_n^(1) would not. By contrast, if E ε_k² has a negative shift, T_n^(1) would be able to detect the shift well while T_n^(2) would not. Additionally, T_n^max is preferable to T_n in (3), as the latter tends to detect well only a change that occurs in the middle of the time series.
Using Donsker's invariance principle [31] and the fact that the processes {sup_{0≤s≤t} B(s) − B(t) : 0 ≤ t ≤ 1} and {|B(t)| : 0 ≤ t ≤ 1} are equal in distribution for any standard Brownian motion B, as described in [7], we have, as n → ∞,

T_n^(1) →_D sup_{0≤t≤1} ( sup_{0≤s≤t} B(s) − B(t) ) =_D sup_{0≤t≤1} |B(t)| =: T^(1),
T_n^(2) →_D sup_{0≤t≤1} ( B(t) − inf_{0≤s≤t} B(s) ) =_D sup_{0≤t≤1} |B(t)| =: T^(2),

where →_D denotes convergence in distribution and X =_D Y means that the random variables X and Y have the same distribution. Furthermore, by the continuous mapping theorem, T_n^max in (4) satisfies

T_n^max →_D T^(1) ∨ T^(2).   (6)

The null hypothesis is then rejected if T_n^max is larger than a constant c, which is determined asymptotically as the number satisfying P(T^(1) ∨ T^(2) > c) = α for a given significance level α ∈ (0, 1). In practice, we can obtain the critical value c empirically using Monte Carlo simulations; for example, c = 2.46509 when α = 0.05. Thus, an anomaly is signaled at time k when T_n^max(k) = T_n^(1)(k) ∨ T_n^(2)(k) > c for some k = 1, ..., n.
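To make the construction concrete, the following is a minimal sketch (in Python, with hypothetical helper names) of how W_k, T_n^(1), T_n^(2), and T_n^max could be computed for an iid stream and compared against the critical value c = 2.46509 at α = 0.05; for N(0,1) errors, τ² = Var(ε_1²) = 2.

```python
# Minimal sketch of the CUSUM statistics T_n^(1), T_n^(2), and T_n^max for an iid
# sample. The helper function and the example data are illustrative assumptions.
import numpy as np

def cusum_max_statistic(eps, tau2):
    """Compute T_n^(1), T_n^(2), and T_n^max = T_n^(1) v T_n^(2)."""
    n = len(eps)
    w = np.cumsum(eps**2 - 1.0) / np.sqrt(tau2)      # W_k, k = 1, ..., n
    run_max = np.maximum.accumulate(w)               # max_{m<=k} W_m
    run_min = np.minimum.accumulate(w)               # min_{m<=k} W_m
    t1 = np.max(run_max - w) / np.sqrt(n)            # sensitive to variance decreases
    t2 = np.max(w - run_min) / np.sqrt(n)            # sensitive to variance increases
    return t1, t2, max(t1, t2)

rng = np.random.default_rng(0)
eps = rng.standard_normal(1000)                      # H0: unit variance throughout
t1, t2, tmax = cusum_max_statistic(eps, tau2=2.0)    # tau^2 = Var(eps^2) = 2 for N(0,1)
print(tmax > 2.46509)                                # signal an anomaly at level 0.05
```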
This monitoring process can be extended to the GARCH(1,1) model [32]:

y_t = σ_t ε_t,  σ_t² = ω + α y_{t−1}² + β σ_{t−1}²,   (7)

with ω > 0, α ≥ 0, β ≥ 0 satisfying α + β < 1 and E ε_1⁴ < ∞. This model is widely used to describe financial time series with high volatilities, as it captures well the volatility clustering phenomenon. For its properties and characteristics and various GARCH variants, see [19,33] and the papers cited therein. Here, the monitoring process can be constructed based on the residuals ε̂_t = y_t/σ̂_{t,0} with σ̂²_{t,0} = ω_0 + α_0 y_{t−1}² + β_0 σ̂²_{t−1,0}, given some initial values y_0 and σ̂²_{0,0}, when the true parameters ω_0, α_0, β_0 are known in advance from past experience. However, when they are unknown, which is prevalent in practice, we instead construct the CUSUM test with the residuals ε̂_t = y_t/σ̂_t, where σ̂²_t = ω̂ + α̂ y_{t−1}² + β̂ σ̂²_{t−1}, and ω̂, α̂, β̂ are estimators of ω, α, β obtained from a given training sample. Then, we employ

T̂_n^max = T̂_n^(1) ∨ T̂_n^(2),   (8)

where the T̂_n^(i), i = 1, 2, are the same as the T_n^(i) in (2) and (5) with W_k in (1) replaced by

Ŵ_k = (1/τ̂) ∑_{t=1}^{k} (ε̂_t² − ε̄²),

and ε̄² and τ̂² are the sample mean and variance of the squared residuals obtained from the training sample, respectively. We reject the null hypothesis if T̂_n^max > c for the aforementioned critical value c > 0. Theoretically, the limiting distribution of T̂_n^max is anticipated to be the same as the one in (6) when certain conditions are fulfilled, namely, that the training sample size m = m(n) satisfies m/n → ∞ as n → ∞ (e.g., m ∼ cn log log n for some c > 0) and that (ω̂, α̂, β̂) is a (√m-consistent) Gaussian quasi-maximum likelihood estimator (QMLE) as in [33]; the proof is rather standard and omitted for brevity. The monitoring process for nonlinear time series via SVR is provided in Section 3.3 below.
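As an illustration of the residual-based version, the sketch below filters a GARCH(1,1) volatility recursion with given parameters, forms the residuals, and evaluates the sequential statistic T̂_n^max with centering and scaling taken from the training residuals. The parameter values and function names are assumptions; in practice (ω̂, α̂, β̂) would be the Gaussian QMLE obtained from the training sample.

```python
# Hedged sketch of the residual-based statistic T-hat_n^max for a GARCH(1,1) model.
# (omega, alpha, beta) are placeholders standing in for QMLE estimates.
import numpy as np

def garch_residuals(y, omega, alpha, beta, sigma2_0):
    """Recursive volatility filter and residuals eps_t = y_t / sigma_t."""
    sigma2 = np.empty(len(y))
    sigma2[0] = sigma2_0
    for t in range(1, len(y)):
        sigma2[t] = omega + alpha * y[t - 1] ** 2 + beta * sigma2[t - 1]
    return y / np.sqrt(sigma2)

def monitor(eps_test, eps_train, c=2.46509):
    """CUSUM monitoring with centering/scaling estimated from the training residuals."""
    mean2 = np.mean(eps_train**2)                    # sample mean of squared residuals
    tau = np.std(eps_train**2, ddof=1)               # tau-hat
    n = len(eps_test)                                # prescribed monitoring horizon
    w = np.cumsum(eps_test**2 - mean2) / tau         # W-hat_k
    run_max = np.maximum.accumulate(w)
    run_min = np.minimum.accumulate(w)
    stat = np.maximum(run_max - w, w - run_min) / np.sqrt(n)   # T-hat^max at each k
    hits = np.nonzero(stat > c)[0]
    return hits[0] + 1 if hits.size else None        # first alarm time, if any
```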

Support Vector Regression
Support vector regression (SVR) is a nonparametric function estimating method, which is a branch of the support vector machine that originated from [34] as a classification tool. Here, SVR is utilized to conduct CUSUM tests, as it effectively incorporates circumstances where the underlying structure of the conditional variance presented in (7) is unknown, and possibly nonlinear. Its details will be elaborated in Section 3.3 below.
In SVR, we seek to find a function of the form

f(x) = wᵀ φ(x) + b,

where x denotes a vector of explanatory variables, w and b are regression parameters to be estimated, and φ is an implicit feature map associated with a kernel function K satisfying K(x_1, x_2) = φ(x_1)ᵀ φ(x_2). To obtain the estimates of the regression parameters, we formulate a problem whose structure can be exploited by quadratic programming methods [17]:

minimize (1/2) ||w||² + C ∑_i (ξ_i + ξ_i*)
subject to y_i − wᵀ φ(x_i) − b ≤ ε + ξ_i,  wᵀ φ(x_i) + b − y_i ≤ ε + ξ_i*,  ξ_i, ξ_i* ≥ 0,   (9)

where the ξ_i, ξ_i* ≥ 0 denote slack variables, C > 0 denotes a penalty term, and ε > 0 is a tuning parameter that determines the level of tolerance regarding the error. Note that C and ε are user-defined tuning parameters of the model.
By duality and the Karush-Kuhn-Tucker conditions, we obtain the following optimization problem with respect to the Lagrange multipliers α_i and α_i* as dual variables [16]:

maximize −(1/2) ∑_i ∑_j (α_i − α_i*)(α_j − α_j*) K(x_i, x_j) − ε ∑_i (α_i + α_i*) + ∑_i y_i (α_i − α_i*)
subject to ∑_i (α_i − α_i*) = 0,  0 ≤ α_i, α_i* ≤ C.   (10)

Consequently, the solutions of the optimization problem (10) construct the estimate f̂ of f as

f̂(x) = ∑_i (α̂_i − α̂_i*) K(x_i, x) + b̂,

where ŵ = ∑_i (α̂_i − α̂_i*) φ(x_i), and b̂ can be obtained with various approaches; see [17] or [35] for references. The tuning parameters of our SVR-GARCH model consist of C, ε, and γ², as we utilize the Gaussian kernel function

K(x_1, x_2) = exp( −||x_1 − x_2||² / γ² ).

Moreover, as the tuning parameters are user-defined, they must either be selected prior to employing the SVR or be optimized. Here, we utilize the PSO to select the set of tuning parameters. The procedure of the PSO algorithm is elucidated in more detail in the next section.
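For reference, the snippet below shows how an ε-SVR with a Gaussian (RBF) kernel can be fitted with scikit-learn. Note that scikit-learn parameterizes the RBF kernel as exp(−gamma·||x_1 − x_2||²), so its gamma argument corresponds to 1/γ² in the notation above; the data and tuning values are purely illustrative.

```python
# Minimal sketch of epsilon-SVR with a Gaussian (RBF) kernel using scikit-learn.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(200, 1))
y = np.sinc(X).ravel() + 0.1 * rng.standard_normal(200)   # noisy nonlinear target

# C, epsilon, gamma are the user-defined tuning parameters discussed in the text;
# in the proposed procedure they would be selected by PSO rather than fixed here.
model = SVR(kernel="rbf", C=10.0, epsilon=0.05, gamma=1.0)
model.fit(X, y)
y_hat = model.predict(X)
print(np.mean(np.abs(y - y_hat)))                          # in-sample MAE
```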

Particle Swarm Optimization
PSO is one of the most widely acclaimed meta-heuristic methodologies and is extensively adopted for optimizing tuning parameters in various machine learning techniques. Initially proposed by [24], its ability to optimize nonlinear and nonconvex problems is inspired by the movement of organisms in bird flocks or animal herds. Here, we refer to [30] to describe the procedure.
Given a suitable objective function f, a standard PSO algorithm aims to optimize a d-dimensional parameter x over a search space of the form

X = [x_min,1, x_max,1] × ··· × [x_min,d, x_max,d],

where each particle x_j(t) ∈ X, j = 1, ..., N, is considered a candidate for the optimal solution. Moreover, a swarm at time t is defined as S(t) = (x_1(t), ..., x_N(t))ᵀ. Each particle j has its own d-dimensional velocity vector at time t, denoted by v_j(t) ∈ V for j = 1, ..., N, where

V = [v_min,1, v_max,1] × ··· × [v_min,d, v_max,d]

is a velocity space with v_min and v_max being the lower and upper bounds of each velocity element, respectively [36]. Furthermore, the personal best solution p_j(t) is defined as

p_j(t) = argmin_{x ∈ X(j,t)} f(x),

where X(j,t) := {x_j(1), ..., x_j(t)} denotes the trajectory of particle j up to time t. Then, the global best solution over the trajectories of all particles up to time t, g(t) = (g_1, ..., g_d)ᵀ, is defined as the element of {p_1(t), ..., p_N(t)} attaining the smallest objective value. In each generation t ∈ [1, T_max], where T_max is a prescribed maximum generation time, the j-th particle in swarm S(t) is updated as follows:

v_j(t + 1) = w(t) v_j(t) + c_1 z_1 (p_j(t) − x_j(t)) + c_2 z_2 (g(t) − x_j(t)),
x_j(t + 1) = x_j(t) + v_j(t + 1),

where c_1, c_2 ∈ R are acceleration factors and z_1, z_2 are random variables generated from the uniform distribution U[0, 1]. In addition, w(t) is an inertia term at time t, which is calculated as

w(t) = w_start + (w_end − w_start) t / T_max,

where w_start and w_end are the predefined initial and final inertia values, respectively. Ref. [30] recently proved that the optimal solution obtained from the PSO algorithm converges to the global optimum with probability 1 under regularity conditions. The aforementioned PSO algorithm is condensed in Algorithm 1 below.
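The following is a compact sketch of the PSO updates described above (personal and global bests, velocity and position updates, and a linearly interpolated inertia weight). The objective function, bounds, and hyperparameter values are illustrative assumptions rather than the settings used in the paper.

```python
# Compact sketch of a standard PSO loop with linearly interpolated inertia.
import numpy as np

def pso(objective, lower, upper, n_particles=30, t_max=100,
        c1=2.0, c2=2.0, w_start=0.9, w_end=0.4, seed=0):
    rng = np.random.default_rng(seed)
    d = len(lower)
    x = rng.uniform(lower, upper, size=(n_particles, d))        # swarm S(t)
    v = np.zeros((n_particles, d))                               # velocities
    p_best = x.copy()                                            # personal bests p_j(t)
    p_val = np.apply_along_axis(objective, 1, x)
    g_best = p_best[np.argmin(p_val)]                            # global best g(t)
    for t in range(1, t_max + 1):
        w = w_start + (w_end - w_start) * t / t_max              # inertia w(t)
        z1 = rng.random((n_particles, d))
        z2 = rng.random((n_particles, d))
        v = w * v + c1 * z1 * (p_best - x) + c2 * z2 * (g_best - x)
        x = np.clip(x + v, lower, upper)                         # keep particles in the search space
        val = np.apply_along_axis(objective, 1, x)
        improved = val < p_val
        p_best[improved], p_val[improved] = x[improved], val[improved]
        g_best = p_best[np.argmin(p_val)]
    return g_best

# Example: minimize a simple quadratic over [-5, 5]^3.
print(pso(lambda z: np.sum(z**2), lower=np.full(3, -5.0), upper=np.full(3, 5.0)))
```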

Monitoring Nonlinear Time Series via SVR
We extend the monitoring process introduced in Section 2 to embrace nonlinear GARCH models of the following form:

y_t = σ_t ε_t,  σ_t² = g(y_{t−1}², σ_{t−1}²),   (11)

where g is an unknown, possibly nonlinear, function to be estimated, σ_t² is the conditional volatility, and the ε_t are iid errors with zero mean and unit variance. To obtain the residuals that constitute the test statistic (8) of the monitoring process, we estimate g by ĝ using a training sample y_1, ..., y_m. For the estimation procedure, we utilize the SVR and then adopt the notion of the retrospective test described in [19].
To estimate g via SVR, we first divide the training sample into two distinct subsamples, say {y_1, ..., y_l} and {y_{l+1}, ..., y_m}, and regard the latter as the validation set. Subsequently, we replace σ_t² with a proxy σ̃_t², as σ_t² is unknown in practice. The authors of [14] employed the moving average method with a window size s ≥ 1:

σ̃_t² = (1/s) ∑_{i=0}^{s−1} y_{t−i}²,  t = 1, ..., n,

where the required initial values y_0, y_{−1}, ... are set to some sensible values. Although its validity is advocated for s = 5 in [19], we alternatively consider the exponentially weighted moving average (EWMA) estimator

σ̃_t² = α σ̃_{t−1}² + (1 − α) y_{t−1}²,

where α ∈ (0, 1) is a tuning parameter, which we set to 0.94 in our study, and σ̃_0² > 0 is some initial value. This alteration improves both the stability and the performance of the model, as portrayed in Section 4.
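A minimal sketch of the EWMA proxy is given below, assuming the RiskMetrics-style recursion with smoothing parameter α = 0.94; the initial value and function name are illustrative.

```python
# Small sketch of the EWMA volatility proxy, assuming the recursion
# sigma2_t = alpha * sigma2_{t-1} + (1 - alpha) * y_{t-1}^2.
import numpy as np

def ewma_proxy(y, alpha=0.94, sigma2_0=None):
    """Exponentially weighted moving-average proxy for the conditional variance."""
    sigma2 = np.empty(len(y))
    sigma2[0] = np.var(y) if sigma2_0 is None else sigma2_0     # initial value
    for t in range(1, len(y)):
        sigma2[t] = alpha * sigma2[t - 1] + (1.0 - alpha) * y[t - 1] ** 2
    return sigma2
```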
Moreover, to further enhance the stability of the estimation of ĝ, we log-transform the response variable in (11) and then take the exponential to obtain σ̂_t². To elaborate, letting γ_t := log σ_t², we fit the SVR with response log σ̃_t² and covariates (y_{t−1}², σ̃_{t−1}²) on the training set, which yields an estimate f̂ of the log-volatility function, and then recursively obtain

σ̂_t² = ĝ(y_{t−1}², σ̂_{t−1}²) := exp{ f̂(y_{t−1}², σ̂_{t−1}²) }.   (12)

Given ĝ and a space of tuning parameters Θ, we then employ the PSO algorithm to obtain an optimal set of tuning parameters θ* ∈ Θ by evaluating the mean absolute error (MAE) on the validation set:

MAE(θ) = (1/(m − l)) ∑_{t=l+1}^{m} | σ̃_t² − σ̂_t²(θ) |.

In practice, the test set of length n emerges sequentially and is denoted by y_{m+1}, ..., y_{m+n}. Utilizing ĝ, we obtain the residuals ε̂_t = y_t/σ̂_t, where σ̂_t² is computed recursively through (12). Upon observing y_l, m + 1 ≤ l ≤ m + n, the monitoring procedure computes the CUSUM test statistic as in (8), and the process is declared out of control when the statistic exceeds the prescribed critical value under the nominal level α ∈ (0, 1).
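Putting the pieces together, the sketch below builds on the ewma_proxy and monitor functions from the earlier snippets: it fits an SVR to the log-transformed proxy with covariates (y_{t−1}², σ̃_{t−1}²), reconstructs σ̂_t² recursively by exponentiation as in (12), and indicates how the residuals feed the CUSUM statistic. The tuning values shown are placeholders; in the proposed procedure they would be chosen by PSO via the validation MAE.

```python
# Hedged sketch of the SVR-GARCH estimation steps; ewma_proxy and monitor refer to
# the earlier sketches, and the tuning values are illustrative placeholders.
import numpy as np
from sklearn.svm import SVR

def fit_svr_garch(y_train, C=10.0, epsilon=0.01, gamma=0.5, alpha=0.94):
    proxy = ewma_proxy(y_train, alpha=alpha)
    X = np.column_stack([y_train[:-1] ** 2, proxy[:-1]])        # (y_{t-1}^2, sigma2-tilde_{t-1})
    z = np.log(proxy[1:])                                        # log-transformed response
    return SVR(kernel="rbf", C=C, epsilon=epsilon, gamma=gamma).fit(X, z)

def recursive_sigma2(model, y, sigma2_0):
    """Reconstruct sigma2-hat_t = exp(f-hat(y_{t-1}^2, sigma2-hat_{t-1})) recursively."""
    sigma2 = np.empty(len(y))
    sigma2[0] = sigma2_0
    for t in range(1, len(y)):
        feat = np.array([[y[t - 1] ** 2, sigma2[t - 1]]])
        sigma2[t] = np.exp(model.predict(feat)[0])
    return sigma2

# Residuals for the training and test streams would then feed monitor(), e.g.:
# eps_train = y_train / np.sqrt(recursive_sigma2(g_hat, y_train, sigma2_0))
# eps_test  = y_test  / np.sqrt(recursive_sigma2(g_hat, y_test,  sigma2_0))
```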

Remark 1.
In our proposed monitoring procedure, we use the theoretically obtained critical value c, as supported by its notable performance in Section 4. However, if m is not large enough relative to n, the monitoring procedure might be undermined by the parameter estimation error. In this case, one may obtain c empirically through a wild bootstrap approach along the lines of [21]:
the bootstrap analogue T̂_n^{max,b} of T̂_n^max is computed for each bootstrap replication b = 1, ..., B, and the critical value c is determined as the 100α% upper quantile of {T̂_n^{max,b} : b = 1, ..., B}. The critical value c can likewise be obtained via a bootstrap method similar to that for the GARCH(1,1) model in (7); in this case, one may use σ̂_t^{*b2} := ĝ(y_{t−1}^{b2}, σ̂_{t−1}^{b2}) to obtain the residuals ε̂_t^{*b} := y_t^b/σ̂_t^{*b} from the bootstrap series y_t^b.
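As a rough illustration only, the snippet below implements one plausible residual-resampling scheme for obtaining c empirically; the exact wild-bootstrap steps referenced from [21] are not reproduced here, so resampling the squared training residuals should be read as an assumption rather than the authors' procedure.

```python
# Heavily hedged sketch: empirical critical value from resampled squared residuals.
import numpy as np

def bootstrap_critical_value(eps_train, n, B=2000, level=0.05, seed=0):
    rng = np.random.default_rng(seed)
    mean2 = np.mean(eps_train**2)
    tau = np.std(eps_train**2, ddof=1)
    stats = np.empty(B)
    for b in range(B):
        eps2_b = rng.choice(eps_train**2, size=n, replace=True)   # resampled squared residuals
        w = np.cumsum(eps2_b - mean2) / tau
        t1 = np.max(np.maximum.accumulate(w) - w) / np.sqrt(n)
        t2 = np.max(w - np.minimum.accumulate(w)) / np.sqrt(n)
        stats[b] = max(t1, t2)                                     # T-hat_n^{max,b}
    return np.quantile(stats, 1.0 - level)                         # empirical upper quantile
```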

Simulation Experiments
We assess the proposed SVR-GARCH monitoring process to measure its performance when the time series is simulated from linear or nonlinear variants of GARCH models, namely GARCH(p, q), asymmetric GARCH(p, q) (AGARCH), GJR-GARCH(p, q), and Box-Cox transformed threshold GARCH(p, q) (BCTT-GARCH), each of the form y_t = σ_t ε_t with its own conditional variance specification, where the orders p, q are fixed at 1 and the errors ε_t are iid N(0, 1) random variables.
In the implementation, we generate a time series of length m from each of the above models and fit the SVR-GARCH model to it as presented in Section 3. In this procedure, we utilize the first ⌊0.7m⌋ observations as a training set and the rest as a validation set, where ⌊x⌋ denotes the largest integer not exceeding x. Subsequently, to mimic the circumstance where the dataset for monitoring (the testing sample) is observed sequentially, we design the underlying parametric model with a change located at a point 1 < s < n. With this model, we generate one observation at a time, obtain the residual with the trained SVR-GARCH model, and then compute the test statistic. We terminate the monitoring process either when the test statistic exceeds the critical value c or when the accumulated number of observations reaches the prescribed length n. To obtain the empirical sizes and powers, we iterate the procedure 1000 times at the significance level α = 0.05. The empirical sizes and powers are calculated as the proportion of rejections of the null hypothesis H_0 out of the 1000 repetitions.
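To illustrate the simulation design, the sketch below generates a GARCH(1,1) training sample and a monitoring stream whose intercept ω shifts at observation ⌊γn⌋; the parameter values are illustrative placeholders, not the settings used in the experiments, and the resulting series would then be fed to the fitting and monitoring sketches given earlier.

```python
# Hedged sketch of the simulation design with a variance change at floor(gamma * n).
import numpy as np

def simulate_garch(n, omega, alpha, beta, rng, sigma2_init=None):
    """Generate a GARCH(1,1) path y_t = sigma_t * eps_t with standard normal errors."""
    sigma2 = omega / (1.0 - alpha - beta) if sigma2_init is None else sigma2_init
    y = np.empty(n)
    for t in range(n):
        y[t] = np.sqrt(sigma2) * rng.standard_normal()
        sigma2 = omega + alpha * y[t] ** 2 + beta * sigma2
    return y, sigma2

rng = np.random.default_rng(0)
m, n, gamma = 1000, 1000, 0.3
y_train, s2 = simulate_garch(m, 0.1, 0.2, 0.7, rng)             # training sample (H0 regime)
k0 = int(np.floor(gamma * n))                                    # change location in the test stream
y_pre, s2 = simulate_garch(k0, 0.1, 0.2, 0.7, rng, s2)           # in-control segment
y_post, _ = simulate_garch(n - k0, 0.3, 0.2, 0.7, rng, s2)       # omega shifted after the change
y_test = np.concatenate([y_pre, y_post])
# y_train is used to fit the SVR-GARCH model and y_test is monitored sequentially
# with the residual CUSUM statistic, as sketched in the earlier code blocks.
```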
For the construction of the null hypothesis, we consider the following settings: we inspect the cases of (m, n) = (1000, 1000) and (2000, 2000), where m = cn log log n is used, i.e., c = 1.933, 2.028 for n = 1000, 2000, respectively, and count the number of false alarms and anomaly detections when the underlying model experiences a change at diverse locations, namely at observation ⌊γn⌋ of the monitoring time series for γ ∈ {0.1, 0.2, ..., 0.7}. For each set of experiments, we employ the PSO search algorithm to optimize the tuning parameters. Table 1 and Figure 1 summarize the empirical sizes and powers for the above four cases. In a nutshell, the empirical sizes of all four models tend to stabilize around α = 0.05. Additionally, their empirical powers approached 1 in the vast majority of the experiments, exhibiting a remarkable ability to detect changes across the experiments. In particular, as anticipated, the power became larger as γ decreased. Nevertheless, even when the change was located towards the end of the observations (i.e., for a larger γ), the detection ability was not heavily deteriorated. This indicates that the performance of our monitoring process is not much affected by the location of the change.
The results for the AGARCH model, presented in Figure 1c,d, are particularly prominent. Despite the model being systematically unstable because α + β is close to 1, the empirical sizes and powers remained highly reliable even at the relatively small sample size of n = 1000. This robustly confirms the validity of our proposed method. Overall, our findings show that the proposed monitoring process is highly applicable to datasets with high volatilities, such as the daily returns of financial time series.

Table 1. Empirical size and power of the SVR-GARCH monitoring procedure for the GARCH(1,1), AGARCH(1,1), GJR-GARCH(1,1), and BCTT-GARCH(1,1) models.

Real Data Analysis
In this section, we demonstrate the real-world applicability of our proposed SVR-GARCH monitoring scheme with three financial time series: the S&P 500 Composite Index (2 January 1991∼31 December 2003), the Korea Composite Stock Price Index (KOSPI) (2 July 2012∼29 September 2020), and the stock price of Microsoft Corporation (1 July 2009∼30 September 2020), obtainable from Yahoo Finance and Investing.com. Moreover, the log returns, defined as r_t = 100 × {log(y_t) − log(y_{t−1})} (t ≥ 2), are used throughout the analysis and are denoted by "S&P500", "KOSPI", and "Microsoft".
The analysis proceeds in four steps. We first divided the time series of log returns into two chunks and assigned the former as the training set of size m. To reflect the monitoring situation, we set the maximum number of observations to n = 1500 and regarded the latter series as being observed sequentially. We then overviewed the general behavior of each time series through its summary statistics, autocorrelation functions (ACFs), and partial ACFs (PACFs). Table 2 and Figure 2 report the basic characteristics of the datasets and the plots of the ACFs and PACFs up to lag 25, respectively. The characteristics of KOSPI are similar to those of the standard normal distribution in terms of skewness and excess kurtosis. By contrast, the excess kurtosis of S&P500 and Microsoft deviated from that of a standard normal distribution, and both series were moderately skewed compared to KOSPI. Moreover, the ACFs and PACFs of all three time series did not suggest significant autocorrelations, which supports the stationarity of the datasets. Additionally, to examine the existence of a change within the training series, we fitted the SVR-GARCH model of Lee et al. (2020b) and applied their retrospective CUSUM test to the training set. The resulting test statistic did not exceed the critical value of 1.3397, which supports the absence of a change within the training set at the significance level of 0.05. The results of the monitoring procedure and the identified locations of change in conditional volatility are as follows. The change of volatility for S&P500 appears to have occurred on 28 October 1997. Notably, our model promptly responded to the event on the previous day, when the log returns fluctuated by more than 7%. Additionally, we can observe an increase in volatility after the detected location. Indeed, the detected location falls within the period of the Asian financial crisis of 1997, specifically on the exact date when the South Korean won plummeted to a new low. This instantaneous response after an abrupt change of volatility strengthens the validity of our monitoring process.
The locations of change for KOSPI and Microsoft were detected on 11 March and 13 March 2020, respectively. Notice that the detected locations are close to the predetermined n. These findings indicate that the detection ability of our monitoring process is not impaired even when the volatility change is located near the end of the monitoring period. In addition, the change point in KOSPI predates the major spread of the global pandemic by a few days, when the log returns were observed to fluctuate abruptly, exceeding 8%. In fact, the identified locations of change for KOSPI and Microsoft fall within the period of the stock market crash resulting from the pandemic, which still affects the global financial market to a certain extent. This result illuminates the potential of our monitoring scheme to signal a change before the underlying structure of the model fully changes, and to serve as a solution to the need for prompt detection of structural changes across various fields of research.

Concluding Remarks
In this study, we considered a novel monitoring process for detecting a significant change of conditional volatilities in time series. For this task, we proposed a procedure based on a CUSUM test with a new test statistic and utilized the SVR-GARCH model to calculate the residuals that construct the monitoring process, wherein the PSO is employed to obtain an optimal set of tuning parameters. Our simulation study demonstrated the adequacy of the proposed monitoring process for various linear and nonlinear GARCH-type time series. A real data analysis of three financial time series, namely S&P500, KOSPI, and Microsoft, was conducted to affirm the practicability of our model in various real-world circumstances. Our proposed model can also be adopted in classical SPC, where one controls the ARL directly rather than the type I error; this could be achieved empirically using the bootstrap method, as stated in Section 3.3. As the necessity for control charts tailored to various circumstances is still increasing, we leave this issue as a future research project.

Acknowledgments: We thank the three anonymous referees for their careful reading and valuable comments that improved the quality of the paper.

Conflicts of Interest:
The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: