Stock Price Volatility Estimation Using Regime Switching Technique: Empirical Study on the Indian Stock Market

Volatility is the degree of variation in a stock's price over time. Stock prices are volatile due to many factors, such as demand, supply, economic policy, and company earnings, and investing in a volatile market is riskier for stock traders. Most existing work uses Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models to capture volatility, but these models fail to capture periods of very high volatility. This paper estimates stock price volatility using the Markov regime-switching GARCH (MSGARCH) and Self-Exciting Threshold Autoregressive (SETAR) models. Model selection was carried out using the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). Model performance was evaluated using the root mean square error (RMSE) and the mean absolute percentage error (MAPE). We found that volatility estimation using the MSGARCH model performed better than the SETAR model. The experiments used Indian stock market data.


Introduction
Estimating stock price volatility is difficult because of nonlinear patterns in the data. Early interpretation of stock price volatility helps traders make more profit. In finance, volatility is a statistic that measures the rate of change in a stock's price over time, and it is usually calculated as the standard deviation of returns. Volatility statistics help investors estimate the risk of a stock or stock index: when volatility is very high, investing is riskier. Identifying volatility is therefore essential in the stock market.
In the literature, stock market volatility has been estimated using autoregressive conditional heteroscedasticity (ARCH) models [1,2]. The GARCH model was proposed to capture different families of conditional volatility [3], and it is useful when the variance of the stock price is not constant [4]. Most studies have used the GARCH model to estimate volatility in the stock market [5][6][7]. However, the GARCH model does not capture the different variations of volatility periods, because the GARCH parameters alpha and beta are restricted so that alpha + beta is less than one [8][9][10][11][12]. When volatility is high, the estimated values can exceed this bound, so the model fails to capture periods of high volatility. One solution to this problem is to allow the GARCH parameters to vary over time according to a Markov process [13][14][15][16][17]. Therefore, in this work, we consider regime-switching based on the Markov-switching GARCH (MSGARCH) and Self-Exciting Threshold Autoregressive (SETAR) models instead of the plain GARCH model. The contribution of this paper is to capture the dynamic volatility in time series data using the MSGARCH and SETAR models and to improve the forecasting results. To the best of our knowledge, this is the first empirical study of regime-switching models on Indian stock market data.

… GJR-GARCH(1,1). To enhance model performance, the EGARCH model residuals are then given as input to an ANN model, which is trained with the back-propagation method. Seventy percent of the data was used for training, 20% for validation, and 10% for testing. The experiments considered S&P 500 data from 1998 to 2009. Model performance was evaluated using the mean forecast error, RMSE, MAE, and MAPE.
Sharma et al. [29] studied volatility forecasting for daily stock indices using seven GARCH models. The experiments considered 21 global market indices from 2000 to 2013. The AIC metric was used to select the best model, and the model parameters were estimated by maximum likelihood. Forecast performance was evaluated using the MSE and MAE metrics. The study found that the standard GARCH model performs better than the TGARCH, EGARCH, AVGARCH, NGARCH, APARCH, and GJR-GARCH models.
The volatility of the exchange rate between the Bangladeshi and U.S. currencies was estimated using GARCH models [30], with data from 2008 to 2015. Normal and Student's t-distribution assumptions were considered in the GARCH models. The AR(2)-GARCH(1,1) model performed better than the EGARCH and TGARCH models.
A hybrid model has been proposed to estimate the volatility of the gold price [31]. It combines the ANN and GARCH models: the residuals are captured using the GARCH model and given as input to the ANN model, which forecasts the price and is trained using the backpropagation method. Forecast performance for gold prices was evaluated using the MSE, RMSE, and MAE metrics. The results show that the hybrid model performs better than the GARCH model. The volatility of the copper price was estimated using a hybrid deep learning method that combines the GARCH and RNN models [32].
Many studies have indicated that GARCH models perform exceptionally well in volatility forecasting. GARCH(1,1) [33] was tested on the Swiss index, and the outcomes showed good parameter optimization for the returns. Studies have also explored the impact of outliers on volatility forecasting [34], as well as different GARCH model versions [35][36][37]. The studies in References [38][39][40] indicate that asymmetric extensions of GARCH perform better than symmetric ones. Reference [38] explores different forecast methods for asymmetric models. Reference [41] compares the GARCH-SGED and GARCH-N models, and the results showed that GARCH-SGED performed better than GARCH-N. Reference [42] studied the Israeli stock markets, indicating that asymmetric models provide better performance in forecasting volatility. Tan et al. [43] proposed the MSGARCH model to estimate the volatility of bitcoin prices.
In statistics, a structural break in a dataset can lead to large forecasting errors. Allaro et al. [44] used the Chow test to identify structural breaks in the datasets. The Chow test checks whether the coefficients of two linear regressions fitted to the same dataset are equal. In this method, the time series is split into two equal parts, and the coefficients of the two linear fits are compared to determine whether a structural break is present.
Caporale and Zekokh [45] investigated the volatility of cryptocurrencies using Markov-switching GARCH models. Because GARCH models may produce incorrect results under high volatility, the authors proposed regime-switching based on the MSGARCH method. The experiments considered CoinDesk Bitcoin data from 2010 to 2018. Chen et al. [46] proposed GARCH models for estimating the volatility of wind data. Data from the Yancheng wind farm were used in the experiments; the wind power data were recorded every 5 min, and around 2016 samples were collected.
Fakhfekh and Jeribi [47] investigated different types of GARCH models for volatility estimation, considering Student t and normal error distributions. The AIC and BIC information criteria were used to evaluate the models, and the EGARCH model performed better than the others. CoinMarketCap data from August 2017 to December 2018 were used in the experiments. Sun and Yu [48] used the threshold GARCH method to analyze the effects of positive and negative news, using S&P 500 data. Trinh et al. [49] investigated Vietnamese government bond price volatility using the GARCH, TGARCH, and EGARCH models and concluded that the GARCH model performed better than the TGARCH and EGARCH models. The experiments considered Vietnamese government bond data from 2006 to 2019.
Emenogu et al. [7] investigated stock price volatility using nine variants of GARCH models and found that the NGARCH model performed better than the others. Nigeria Plc data from 2001 to 2017 were used in the experiments. Cao et al. [50] investigated the volatility of VIX options using the GARCH model, with OptionMetrics VIX and SPX options data collected from 2008 to 2012. Sapuric et al. [51] studied bitcoin volatility using the EGARCH model.
In most of the literature, summarized in Table 1, stock price volatility is estimated using the GARCH model, and most of this work ignores regime changes. Therefore, in this work we apply the MSGARCH and SETAR models to estimate volatility.
The dataset includes both bull and bear market data and has 3259 rows. We use the closing price of each stock to estimate volatility. The annualized volatility of the stock indices and stock prices is estimated using the standard deviation, as depicted in Figures 1 and 2. Stock returns are computed from the closing prices. Let $Y_t$ denote the closing price of the stock at time $t$; the return is defined in Equation (1). Stock price returns are depicted in Figures 3-6.
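Equation (1) is not reproduced in this text. The sketch below assumes log returns, $\ln(Y_t / Y_{t-1})$, and an annualization factor of 252 trading days. Both are common conventions, but they are assumptions and not necessarily the choices behind Figures 1-6. The function names and the 21-day rolling window are hypothetical.

```python
import numpy as np

TRADING_DAYS = 252  # assumption: annualization factor for daily data


def log_returns(close):
    """Log returns ln(Y_t / Y_{t-1}) from a series of closing prices (assumed definition)."""
    close = np.asarray(close, dtype=float)
    return np.diff(np.log(close))


def annualized_volatility(returns, trading_days=TRADING_DAYS):
    """Sample standard deviation of daily returns scaled to an annual figure."""
    return float(np.std(returns, ddof=1) * np.sqrt(trading_days))


def rolling_annualized_volatility(returns, window=21, trading_days=TRADING_DAYS):
    """Annualized volatility over a sliding window (hypothetical 21-day default)."""
    r = np.asarray(returns, dtype=float)
    stds = np.array([np.std(r[i - window:i], ddof=1) for i in range(window, len(r) + 1)])
    return stds * np.sqrt(trading_days)
```

Passing a closing-price array through `log_returns` and then `rolling_annualized_volatility` produces a volatility path of the kind plotted in Figures 1 and 2, under the assumptions above.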

Structural Break
Structural breaks in the data lead to forecasting errors, so in this work we account for structural breaks when forecasting volatility [44]. The Chow test is used to test for a structural break in the stock datasets. In this method, the time series is split into two equal parts, and the coefficients of the two linear fits are compared to determine whether a structural break is present. Consider the linear regression equations (2) and (3):

$$Y = C + \beta X + \epsilon \quad (2)$$

$$Y_1 = C_1 + \beta_1 X_1 + \epsilon_1 \quad (3)$$
Here, $Y$ and $Y_1$ are the dependent variables, $X$ and $X_1$ are the independent variables, $C$ and $C_1$ are the constants, $\beta$ and $\beta_1$ are the slopes of the lines, and $\epsilon$ and $\epsilon_1$ are the error terms of the regression models. We use Equations (2) and (3) to fit the datasets. To check whether the two linear regression fits are the same, we define the null and alternative hypotheses:

$H_0: \beta = \beta_1$ and $C = C_1$; $H_1: \beta \neq \beta_1$ or $C \neq C_1$.

A p-value below the 0.05 significance level rejects $H_0$ and indicates a structural break. The experimental results show that there is a structural break in the datasets, as described in Figure 7 and Table 2.
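The test described above can be sketched with the standard Chow F statistic. It compares the residual sum of squares of one pooled fit with the combined residual sum of squares of the two split fits from Equations (2) and (3). This is a minimal illustration that assumes a single regressor (for example, a time index) and a split at the midpoint. The helper names are hypothetical. The p-value would come from an F distribution with $(k, n - 2k)$ degrees of freedom, for example via `scipy.stats.f.sf`.

```python
import numpy as np


def ols_rss(x, y):
    """Fit y = C + beta * x by ordinary least squares and return the residual sum of squares."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def chow_f_statistic(x, y, split=None):
    """Chow F statistic for a break at index `split` (default: midpoint, i.e. two equal parts)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    split = n // 2 if split is None else split
    k = 2  # parameters per regression: intercept C and slope beta
    rss_pooled = ols_rss(x, y)  # a single line for the whole sample
    # separate fits for the two halves, as in Equations (2) and (3)
    rss_split = ols_rss(x[:split], y[:split]) + ols_rss(x[split:], y[split:])
    return ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))
```

A large F value, relative to the F distribution's critical value, is evidence that the two halves follow different lines.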

Non-Regime Switching
Stock price data are dynamic. The Autoregressive Moving Average (ARMA) model is useful for linear time series data [55][56][57]. However, ARMA models assume a constant conditional variance, so they cannot capture the dynamic volatility in time series data. Therefore, most studies have used the GARCH model for dynamic volatility estimation [58][59][60][61][62]. The GARCH model is useful when the variance of the time series is not constant. The GARCH model is defined in Equations (4) and (5), where $\epsilon_t$ is a random variable with zero mean and unit variance.
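Equations (4) and (5) are not reproduced in the text. The sketch below assumes they take the standard GARCH(1,1) form $Y_t = \sqrt{h_t}\,\epsilon_t$ with $h_t = \omega + \alpha Y_{t-1}^2 + \beta h_{t-1}$. It computes the conditional variance path and the Gaussian negative log-likelihood that maximum-likelihood estimation would minimize. The intercept symbol $\omega$, the start-up value, and the function names are our assumptions.

```python
import numpy as np


def garch11_variance(returns, omega, alpha, beta, h0=None):
    """Conditional variance path h_t = omega + alpha * r_{t-1}^2 + beta * h_{t-1}."""
    r = np.asarray(returns, dtype=float)
    h = np.empty(len(r))
    # start-up value: sample variance by default (a common, but not unique, choice)
    h[0] = np.var(r) if h0 is None else h0
    for t in range(1, len(r)):
        h[t] = omega + alpha * r[t - 1] ** 2 + beta * h[t - 1]
    return h


def garch11_neg_loglik(params, returns):
    """Gaussian negative log-likelihood of a zero-mean GARCH(1,1).

    Returns np.inf outside the usual constraints (omega > 0, alpha, beta >= 0, alpha + beta < 1).
    """
    omega, alpha, beta = params
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return np.inf
    r = np.asarray(returns, dtype=float)
    h = garch11_variance(r, omega, alpha, beta)
    return 0.5 * float(np.sum(np.log(2 * np.pi) + np.log(h) + r ** 2 / h))
```

In practice, the negative log-likelihood would be passed to a numerical optimizer such as `scipy.optimize.minimize`. Dedicated packages handle starting values and constraints more carefully.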
The literature shows that GARCH models do not capture variations between volatility periods. Most GARCH models in the literature use one ARCH term and one GARCH term, i.e., GARCH(1,1). However, the GARCH model ignores regime-switching in volatility estimation: the alpha and beta parameters in Equation (5) can exceed one when a structural break is present. Therefore, we consider regime-switching using the MSGARCH and SETAR models.
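The role of $\alpha$ and $\beta$ can be made concrete with a few standard GARCH(1,1) identities:

- the persistence $\alpha + \beta$;
- the unconditional variance $\omega / (1 - \alpha - \beta)$, which is finite only when $\alpha + \beta < 1$;
- the half-life of a volatility shock;
- the $k$-step-ahead variance forecast that a ten-day horizon would use.

This is a sketch assuming the standard GARCH(1,1) form with intercept $\omega$. The function names are hypothetical.

```python
import math


def garch11_unconditional_variance(omega, alpha, beta):
    """Long-run variance omega / (1 - alpha - beta); finite only if alpha + beta < 1."""
    persistence = alpha + beta
    if persistence >= 1:
        raise ValueError("alpha + beta >= 1: no finite unconditional variance")
    return omega / (1 - persistence)


def garch11_half_life(alpha, beta):
    """Number of periods for a variance shock to decay by half (requires 0 < alpha + beta < 1)."""
    persistence = alpha + beta
    if not 0 < persistence < 1:
        raise ValueError("half-life is defined only for 0 < alpha + beta < 1")
    return math.log(0.5) / math.log(persistence)


def garch11_forecast(omega, alpha, beta, h_next, horizon=10):
    """Variance forecasts for steps 1..horizon, given the one-step-ahead variance h_next.

    Uses E[h_{t+k}] = vbar + (alpha + beta)^(k-1) * (h_next - vbar), where vbar is the
    unconditional variance.
    """
    vbar = garch11_unconditional_variance(omega, alpha, beta)
    persistence = alpha + beta
    return [vbar + persistence ** (k - 1) * (h_next - vbar) for k in range(1, horizon + 1)]
```

For example, with $\alpha = 0.1$ and $\beta = 0.8$ the half-life is about 6.6 periods. As $\alpha + \beta$ approaches one, the half-life grows without bound and the long-run variance disappears.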

Regime Switching Based on MSGARCH
Structural changes in time series data are referred to as regime-switching. The overall proposed work is depicted in Figure 8. Regime switching is essential when stock prices are highly volatile, so we apply a Markov-switching GARCH (MSGARCH) model to estimate stock price volatility. We consider two MSGARCH models. The first uses homogeneous regime-switching, in which every regime has a GARCH conditional variance and the errors are assumed to be normally distributed. The second uses heterogeneous regime-switching, in which the regimes can have GARCH, EGARCH, or TGARCH conditional variances, and both normal and Student t error distributions are considered.
In the proposed work, stock returns are the input to the MSGARCH model for estimating volatility. The calculation of stock returns is discussed in Section 3. In the MSGARCH model, the stock return at time $t$ is denoted $Y_t$; we assume that $Y_t$ has zero mean and is not serially correlated. The MSGARCH model is defined in Equation (6):

$$Y_t \mid (S_t = r, P_{t-1}) \sim D(0, H_{r,t}, \xi_r) \quad (6)$$

where $D(0, H_{r,t}, \xi_r)$ is a continuous distribution with zero mean, $H_{r,t}$ is the conditional variance in regime $r$, $\xi_r$ is the vector of distribution parameters for regime $r$, and $P_{t-1}$ is the stock return information set up to time $t-1$. Switching from one regime to another evolves according to a first-order homogeneous Markov chain with state $S_t$, an integer taking discrete values in $\{1, \ldots, r\}$. In Equation (6), $S_t = r$ is the current state of the Markov process, and $P_{t-1}$ is the information available at the previous time step.
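To make the regime-switching mechanism concrete, the sketch below simulates returns from a two-regime Markov-switching GARCH(1,1) with normal errors. It makes two assumptions:

- Each regime keeps its own GARCH(1,1) variance recursion, driven by the observed returns.
- Regime changes follow a transition matrix `transition[i, j]` $= \Pr(S_t = j \mid S_{t-1} = i)$.

This is one common specification and is not taken from the paper. It only simulates data; it does not estimate the model as the MSGARCH R package does. All names and parameter values are illustrative.

```python
import numpy as np


def simulate_ms_garch(n, omega, alpha, beta, transition, seed=0):
    """Simulate returns from a K-regime Markov-switching GARCH(1,1) with normal errors.

    Each regime k keeps its own variance h_k = omega_k + alpha_k * y_{t-1}^2 + beta_k * h_k,
    updated with the observed return at every step; the Markov state S_t selects which
    regime's variance scales the current shock.
    """
    rng = np.random.default_rng(seed)
    omega, alpha, beta = (np.asarray(v, dtype=float) for v in (omega, alpha, beta))
    transition = np.asarray(transition, dtype=float)  # rows sum to one
    K = len(omega)
    h = omega / (1.0 - alpha - beta)  # start each regime at its unconditional variance
    state = 0
    y = np.empty(n)
    states = np.empty(n, dtype=int)
    for t in range(n):
        if t > 0:
            state = rng.choice(K, p=transition[state])  # first-order Markov transition
            h = omega + alpha * y[t - 1] ** 2 + beta * h  # update all regimes in parallel
        y[t] = np.sqrt(h[state]) * rng.standard_normal()
        states[t] = state
    return y, states
```

With a calm regime (small $\omega$) and a turbulent regime (larger $\omega$), the simulated returns show volatility clusters that a single GARCH(1,1) would have to average over.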
We consider GARCH, EGARCH, and TGARCH conditional variances in the MSGARCH model to estimate stock price volatility.
Student t and normal distributions are assumed for the errors, and the models are compared using the AIC and BIC metrics. The steps are detailed in Algorithm 1.
Algorithm 1
Error distribution using Student t and normal distributions.
GARCH conditional variance with heterogeneous regime-switching method.
6: Estimate conditional distribution.
7: Model estimation using AIC and BIC metrics.
8: Ten-day forecast.

Regime Switching Based on Self-Exciting Threshold AutoRegressive (SETAR) Model
The SETAR model is a popular time series model for forecasting future trends in data, and it is used when there is a structural break in the datasets. A SETAR(R, AR) model has two parts: R, the number of regimes, and AR, the order of autoregression.
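Consider a simple autoregression of order $P$ for the stock price $Y_t$:

$$Y_t = X_t^{\top}\phi + \sigma\epsilon_t \quad (7)$$

where $X_t = (1, Y_{t-1}, \ldots, Y_{t-P})^{\top}$ is the column vector of variables and $\phi$ is the coefficient vector. A threshold autoregressive (TAR) model allows the model parameters to change according to the value of a weakly exogenous threshold variable $z_t$, in order to capture nonlinear trends:

$$Y_t = X_t^{\top}\phi^{(j)} + \sigma^{(j)}\epsilon_t \quad \text{if } r_{j-1} < z_t \le r_j \quad (8)$$

where the thresholds $-\infty = r_0 < r_1 < \cdots < r_k = \infty$ divide the domain of the threshold variable $z_t$ into $k$ different regimes.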
Within each regime, the stock price $Y_t$ follows a different autoregression of order $P$. When the threshold variable is $z_t = Y_{t-d}$, with the delay parameter $d$ a positive integer, the regime of $Y_t$ is determined by its own lagged value $Y_{t-d}$, and the TAR model is called a self-exciting TAR (SETAR) model.
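A two-regime SETAR model with delay $d$ can be fitted by least squares with a grid search over the threshold. For each candidate threshold, fit a separate AR(P) model to the observations with $Y_{t-d}$ below and above it. Then keep the threshold with the smallest total residual sum of squares. The sketch below makes three assumptions:

- There are exactly two regimes.
- Candidate thresholds are quantiles of $Y_{t-d}$, trimmed at 15% on each side so that both regimes keep enough observations.
- Regime-specific variances are ignored during fitting.

The function names are hypothetical, and this is not necessarily how the paper's R implementation estimates the model.

```python
import numpy as np


def lagged_design(y, p):
    """Rows X_t = (1, y_{t-1}, ..., y_{t-p}) and targets y_t, for t = p, ..., n-1."""
    n = len(y)
    cols = [np.ones(n - p)] + [y[p - i:n - i] for i in range(1, p + 1)]
    return np.column_stack(cols), y[p:]


def fit_setar2(y, p=1, d=1, trim=0.15, n_grid=50):
    """Fit a two-regime SETAR(2, p) with threshold variable z_t = y_{t-d}.

    Returns (total RSS, threshold, coefficients for z_t <= r, coefficients for z_t > r).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    m = max(p, d)  # first time index with all lags available
    X, target = lagged_design(y, p)
    X, target = X[m - p:], target[m - p:]  # align rows to t = m, ..., n-1
    z = y[m - d:n - d]  # threshold variable z_t = y_{t-d}
    best = None
    for r in np.quantile(z, np.linspace(trim, 1 - trim, n_grid)):
        low = z <= r
        if min(low.sum(), (~low).sum()) <= X.shape[1]:
            continue  # too few points to fit one of the regimes
        rss, coefs = 0.0, []
        for mask in (low, ~low):
            c, *_ = np.linalg.lstsq(X[mask], target[mask], rcond=None)
            resid = target[mask] - X[mask] @ c
            rss += float(resid @ resid)
            coefs.append(c)
        if best is None or rss < best[0]:
            best = (rss, float(r), coefs[0], coefs[1])
    return best
```

Each returned coefficient vector is ordered as (intercept, $\phi_1, \ldots, \phi_P$) for its regime.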

Experiment Results
The experiments were carried out in RStudio using the MSGARCH R package developed by [14]. Around 3259 data samples were collected, covering 2007 to April 2021, and given as input to the proposed model. We considered two MSGARCH models: homogeneous MSGARCH and heterogeneous MSGARCH. For the conditional distribution, we tested both the normal and Student t distributions and found that the Student t distribution performs better, as depicted in Figure 9. Therefore, the remaining experiments use the Student t distribution. The performance of each model is evaluated using the AIC and BIC metrics, defined in Equations (9) and (10):

$$\mathrm{AIC} = 2P - 2\ln L \quad (9)$$

$$\mathrm{BIC} = P\ln n - 2\ln L \quad (10)$$

where $L$ is the maximized value of the likelihood function, $P$ is the number of estimated model parameters, and $n$ is the number of observations.
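The criteria take the standard forms $\mathrm{AIC} = 2P - 2\ln L$ and $\mathrm{BIC} = P\ln n - 2\ln L$. The sketch below computes both and picks the candidate with the lowest score. The log-likelihood values used in the usage note are hypothetical. They were chosen only to show that BIC's heavier penalty can prefer a smaller model than AIC does, and they are not results from this paper.

```python
import math


def aic(loglik, n_params):
    """Akaike Information Criterion: 2P - 2 ln L."""
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion: P ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * loglik


def select_model(candidates, n_obs, criterion="bic"):
    """Return the candidate name with the lowest criterion value.

    `candidates` maps a model name to (maximized log-likelihood, number of parameters).
    """
    if criterion not in ("aic", "bic"):
        raise ValueError("criterion must be 'aic' or 'bic'")

    def score(name):
        loglik, k = candidates[name]
        return aic(loglik, k) if criterion == "aic" else bic(loglik, k, n_obs)

    return min(candidates, key=score)
```

With the made-up candidates `{"A": (-100.0, 3), "B": (-96.0, 5)}` and `n_obs=100`, AIC selects `"B"` (202 versus 206). BIC selects `"A"` (about 213.8 versus 215.0).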
The results of the proposed work are reported in Tables 3 and 4 and compared with the GARCH and SETAR models. The heterogeneous MSGARCH model has the lowest AIC and BIC values, compared with the GARCH and SETAR models. The forecasting performance of the MSGARCH and SETAR models is calculated using the RMSE and MAPE metrics; the MSGARCH model has the lowest RMSE and outperforms the SETAR model, as shown in Table 5. The ten-day forecast is shown in Figure 10. MSGARCH regime-switching performs better than the GARCH and SETAR models because SETAR regime-switching is based on an autoregressive process and ignores regime changes when the stock price becomes highly volatile. There are, however, performance differences between the homogeneous and heterogeneous MSGARCH models: for some stocks the homogeneous MSGARCH model fits well, and for others the heterogeneous MSGARCH model fits better, because each stock has its own volatility dynamics.
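The two forecast-accuracy metrics can be written down directly. The paper does not state which series serves as the "actual" volatility when RMSE and MAPE are computed. A realized-volatility proxy such as squared or absolute returns is a common choice, but that is an assumption here. MAPE is undefined when an actual value is zero, so the sketch raises an error in that case.

```python
import numpy as np


def rmse(actual, forecast):
    """Root mean square error between actual and forecast values."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((a - f) ** 2)))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent; undefined if any actual value is zero."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    if np.any(a == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return float(np.mean(np.abs((a - f) / a)) * 100)
```

For example, with actual values `[1, 2, 3]` and forecasts `[1, 2, 5]`, RMSE is $\sqrt{4/3} \approx 1.155$ and MAPE is about 22.2%.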

Conclusions
Stock price volatility estimation is an essential topic for traders and stock analysts. Stock prices are affected by many events, such as political uncertainty, bond market rates, and global market trends, and company earnings and other financial factors also affect stock prices. As a result, stock prices may fluctuate, which increases volatility in the stock market. Capturing the volatility of stock prices is difficult. Most existing work uses GARCH models to capture volatility, but these models fail to capture variations between volatility periods [14,43,45]. To capture stock price volatility, we considered regime-switching based on the MSGARCH and SETAR models. We used two MSGARCH models: the first with homogeneous switching and the second with heterogeneous switching. Their performance varies: for some stocks the homogeneous MSGARCH model fits well, and for others the heterogeneous MSGARCH model fits better, because each stock has its own volatility dynamics. Regime-switching model selection was carried out using the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). The forecasting performance of the MSGARCH and SETAR models was calculated using the RMSE and MAPE metrics, and the MSGARCH model had the lowest RMSE. We found that MSGARCH regime-switching performed better than the GARCH and SETAR models. The reason is that stock prices are volatile and the data contain structural breaks, so GARCH models cannot fit them correctly. Moreover, estimating the GARCH α and β parameters is difficult. Future work could optimize the GARCH α and β parameters using genetic algorithms and the grey wolf optimizer, or apply machine learning algorithms for GARCH parameter selection.
Stock prices can also react to company balance sheet variations, sudden changes in management, and dividend and bonus announcements. Stock price movements in financial markets depend on information from many sources, and interpreting this information is not easy. Aggregating and processing information from various platforms is a crucial challenge for future work.