Forecasting the Volatility of the Cryptocurrency Market by GARCH and Stochastic Volatility

This study examines the volatility of nine leading cryptocurrencies by market capitalization (Bitcoin, XRP, Ethereum, Bitcoin Cash, Stellar, Litecoin, TRON, Cardano, and IOTA) using a Bayesian stochastic volatility (SV) model and several GARCH models. We find that when dealing with extremely volatile financial data, such as cryptocurrencies, the SV model performs better than the GARCH family models. Moreover, the forecast errors of the SV model, compared with those of the GARCH models, become relatively more accurate as the forecast horizon lengthens. This deepens our insight into volatility forecasting models in the complex cryptocurrency market.


Introduction
Understanding the relationships among cryptocurrencies is important for policymakers, whose role is to maintain the stability of financial markets, as well as for investors whose portfolios contain a portion of cryptocurrencies. A cryptocurrency is a decentralized digital currency that is exchanged between peers without the need for a central authority. Bitcoin [1,2], the first cryptocurrency, operates with blockchain technology, a system of recording information in a way that makes it difficult or impossible to change, hack, or cheat the system. Because the prices of cryptocurrencies have risen, driven by speculative investment and/or their use as digital assets, they have received growing attention from the media, academics, and the finance industry. Since the inception of Bitcoin in 2009, several thousand alternative digital currencies have been developed, and there have been a number of studies analyzing cryptocurrency exchange rates [3]. The degree of return volatility has been regarded as a crucial characteristic of cryptocurrencies for investors who include them in their portfolios. The prices of Bitcoin and Ethereum have increased rapidly: over one year, the price of Bitcoin rose almost 400 percent, to USD 40,406 on 15 June 2021 from USD 9451 on 15 June 2020. Empirical investigations of Bitcoin [4,5] showed that Bitcoin is more characteristic of an asset than of a currency, and Bitcoin also possesses risk management and hedging capabilities [6]. To predict the exchange rate of the Bitcoin electronic currency against the US dollar, ref. [7] proposed a non-causal autoregressive process with Cauchy errors. The volatility of Bitcoin, measured on monthly return series, is higher than that of gold or of some foreign currencies against the dollar [8].
A number of academic studies have investigated the factors influencing the price and volatility of cryptocurrencies ([9][10][11][12]). In particular, GARCH family models have been employed to estimate the time-varying volatility of cryptocurrencies. Ref. [13] proposed the AR-CGARCH model to estimate the volatility of Bitcoin by comparing GARCH models. Ref. [14] looked at the tail behavior of the returns of the five major cryptocurrencies (Bitcoin,

Materials and Methods
In Section 2, we introduce the two different traditional classes of volatility models used in this study and compare their relative efficiency. Our study applies the SV model to forecast unobserved volatility in financial economics; however, there is another class of models that is frequently used: refs. [20,21] developed the autoregressive conditional heteroscedasticity (ARCH) and generalized ARCH (GARCH) models, respectively. We employ the GARCH(1,1) with constant-in-mean, Threshold GARCH (TGARCH), and Integrated GARCH (IGARCH) models from the GARCH family. The standard GARCH model assumes that positive and negative error terms have a symmetric effect on volatility; that is, good and bad news have the same effect on volatility. However, this assumption is easily violated in financial stock markets, in that a negative change in the stock market has a bigger effect on the volatility index than a positive change, or vice versa. Ref. [22] called this a leverage effect. Accordingly, asymmetric GARCH models were developed to accommodate the leverage effect. The SV model allows for two error processes, while the GARCH model considers a single error term. Therefore, the SV model yields a better in-sample fit ([23]) and thus could provide better forecasts, although it potentially involves a heavy computational burden.

GARCH Models
For a log return series r_t = log(S_t/S_{t−1}), we let a_t = r_t − E_{t−1}[r_t] be the innovation at time t. All members of the GARCH family are obtained from a transformation of the conditional standard deviation, σ_t, determined by a transformation f(·) of the innovations a_t and of lagged transformed conditional standard deviations. In particular, we employ three such transformation models (GARCH, IGARCH, and TGARCH). An extensive discussion of the nested GARCH models is given in [7].
The mean model is chosen to be ARMA(0,0) with a constant, so that r_t = µ + a_t, where µ is the mean constant and a_t = r_t − E_{t−1}[r_t] is the innovation at time t. Then a_t follows a GARCH(p,q) with constant-in-mean model if

a_t = σ_t e_t,   σ_t² = α_0 + Σ_{i=1}^{p} α_i a²_{t−i} + Σ_{j=1}^{q} β_j σ²_{t−j},

where α_0 > 0, α_i ≥ 0, β_j ≥ 0, and e_t follows a Student-t distribution described by its location, scale, and shape parameters, as in Equation (25) from [24].
If Σ_{i=1}^{p} α_i + Σ_{j=1}^{q} β_j = 1, then the GARCH(p,q) process is called an IGARCH model [25], which is either non-stationary or has an infinite variance. In order to model the persistence of high volatility, we need an IGARCH(p,q) model with q ≥ 1. The TGARCH model [26], which captures the asymmetric effect in volatility, is given by

σ_t² = α_0 + Σ_{i=1}^{p} (α_i + η_i 1{a_{t−i} < 0}) a²_{t−i} + Σ_{j=1}^{q} β_j σ²_{t−j},

where 1{·} is the indicator function and the coefficient of the leverage term, η_i, satisfies the condition −1 < η_i < 1.
For model selection among the GARCH models considered (p = 1 and q = 1), we used the Akaike Information Criterion (AIC). In addition, this study considered Student-t errors to take into account the possible fatness of the distribution tails of e_t.
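As a concrete illustration of the recursions above, the following sketch simulates a GARCH(1,1) process and recovers its conditional variance path. This is a minimal pure-Python example with Gaussian shocks rather than the Student-t errors used in the paper; the function names and the parameter values `omega`, `alpha`, and `beta` are illustrative assumptions, not the paper's fitted estimates:

```python
import math
import random

def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate n returns from a GARCH(1,1) with standard-normal shocks:
    a_t = sigma_t * e_t,  sigma2_t = omega + alpha * a_{t-1}^2 + beta * sigma2_{t-1}."""
    rng = random.Random(seed)
    sigma2 = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    out = []
    for _ in range(n):
        a = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        out.append(a)
        sigma2 = omega + alpha * a * a + beta * sigma2  # variance recursion
    return out

def garch11_path(returns, omega, alpha, beta):
    """Filter the conditional variance path implied by observed returns,
    initialised at the unconditional variance (requires alpha + beta < 1)."""
    sigma2 = [omega / (1.0 - alpha - beta)]
    for a in returns[:-1]:
        sigma2.append(omega + alpha * a * a + beta * sigma2[-1])
    return sigma2
```

In practice the parameters would be estimated by (quasi-)maximum likelihood rather than fixed, e.g., via a dedicated library such as the Python `arch` package.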

Stochastic Volatility Model
In the standard SV model framework [27,28], the data returns r are generated from a probability model f(r|g), where g is a vector of log-volatilities, and this unobserved vector g in turn has a probabilistic structure f(g|θ), where θ is a vector of parameters (see [23,29,30] for details). In the standard form of the model, the log-volatility is modeled as a Gaussian first-order linear autoregressive (AR(1)) process with mean µ, driven by a series of white-noise innovations {η_t}:

g_t = µ + φ(g_{t−1} − µ) + η_t,   η_t ~ iid N(0, σ²_η),   |φ| < 1,

where the η_t are independent and identically distributed (iid) as normal. A useful feature of Gaussian AR(1) processes is that the marginal distribution is also normal, g_t ~ N(µ, σ²_η/(1 − φ²)), and the returns are given by

r_t = exp(g_t/2) e_t,

where {e_t} are iid standard normal. We denote by P_t the spot price of a financial asset at time t; its one-period return is defined as r_t = ln(P_t/P_{t−1}). Let r = (r_1, r_2, . . . , r_n)^T be a vector of returns with mean zero. In the SV model, each observation r_t is assumed to have its own contemporaneous variance, with exp(g_t/2) = √h_t, so that g_t = ln h_t. The quantity exp(g_t) is the latent, time-varying volatility that follows a stochastic evolution. The SV model in this paper is given through:

r_t | g_t ~ N(0, exp(g_t)),   (7)
g_t | g_{t−1}, µ, φ, σ_η ~ N(µ + φ(g_{t−1} − µ), σ²_η),

where θ = (µ, φ, σ_η) is the parameter vector: µ is the level of log-variance, φ is the persistence of log-variance, and σ_η is the volatility of log-variance. The initial state g_0 is distributed according to the stationary distribution of the autoregressive process of order one. Following [29,31], we specify a prior distribution for the parameter vector θ with independent components for each parameter, p(θ) = p(µ)p(φ)p(σ_η), where µ follows the usual normal prior µ ~ N(b_µ, B_µ). Ref. [32] notes that the prior of µ is usually chosen to be rather uninformative, e.g., by setting b_µ = 0 and B_µ ≥ 100 for daily log returns.
The persistence parameter φ ∈ (−1, 1) is chosen so that (φ + 1)/2 follows the beta distribution B(α, β), where α and β are positive hyperparameters and B(α, β) = Γ(α)Γ(β)/Γ(α + β). Because this prior restricts φ to the interval (−1, 1), the autoregressive volatility process is stationary. The implied expected value and variance are

E[φ] = 2α/(α + β) − 1,   Var[φ] = 4αβ / ((α + β)²(α + β + 1)).

Ref. [31] chooses the prior for σ²_η, the volatility of log-variance, as the hyperparameter B_{σ_η} multiplied by a χ²(df = 1) random variable. In Section 2.3, we compare the volatility forecast abilities of the models introduced in Sections 2.1 and 2.2.
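The state-space structure described above can be made concrete with a small simulation of the data-generating process. This is a hypothetical pure-Python sketch only (the paper estimates the model with Bayesian MCMC in the spirit of [31]); the parameter values and function name are our illustrative assumptions:

```python
import math
import random

def simulate_sv(n, mu, phi, sigma_eta, seed=0):
    """Simulate the standard SV model:
    g_t = mu + phi * (g_{t-1} - mu) + eta_t,  eta_t ~ N(0, sigma_eta^2),
    r_t | g_t ~ N(0, exp(g_t)),
    with g_0 drawn from the stationary N(mu, sigma_eta^2 / (1 - phi^2))."""
    rng = random.Random(seed)
    g = rng.gauss(mu, sigma_eta / math.sqrt(1.0 - phi * phi))  # stationary start
    returns, logvols = [], []
    for _ in range(n):
        g = mu + phi * (g - mu) + rng.gauss(0.0, sigma_eta)  # AR(1) log-variance
        logvols.append(g)
        returns.append(math.exp(g / 2.0) * rng.gauss(0.0, 1.0))  # r_t given g_t
    return returns, logvols
```

Note the two independent error processes (one for the log-variance, one for the return), which is precisely what distinguishes the SV model from the single-error GARCH recursion.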

Volatility Forecast Evaluation
In this subsection, we carry out an empirical exercise to measure predictive accuracy. The SV and GARCH models are first estimated on the in-sample data of observed returns. Forecasts are generated at horizons of 3, 5, 10, 20, 30, and 44 days. Then the out-of-sample data are added to the sample, and the parameters of each model are re-estimated. The purpose of considering multiple forecast horizons is to see whether our approach improves the predictive ability of a time-series model at all horizons in a large sample. Accuracy is evaluated with a common loss function, a logarithmic version of the mean squared (prediction) error (MSE):

MSE_k = (1/h) Σ_{t=1}^{h} (ln ĥ_t − ln σ²_t)²,

where ĥ_t is the estimate of the conditional volatility. This mathematically simple loss function is a popular measure of forecasting performance in the literature (e.g., [11]). We consider two alternative ex post proxies for the conditional volatility, σ²_t = r²_t and σ²_t = |r_t|, denoted k = 1, 2, respectively. A smaller average loss indicates a more accurate and, therefore, preferred forecast.
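A minimal sketch of this loss computation, assuming the forecasts `h_hat` are conditional variance estimates and using the two ex post proxies named above (the function and argument names are ours, not the paper's):

```python
import math

def log_mse(h_hat, returns, proxy="squared"):
    """Logarithmic MSE loss between forecast log-volatility ln(h_hat_t)
    and a log ex-post proxy: ln(r_t^2) for proxy='squared' (MSE1),
    or ln(|r_t|) for proxy='abs' (MSE2)."""
    losses = []
    for h, r in zip(h_hat, returns):
        target = math.log(r * r) if proxy == "squared" else math.log(abs(r))
        losses.append((math.log(h) - target) ** 2)
    return sum(losses) / len(losses)
```

Averaging this loss over the out-of-sample horizon for each model, as in Tables 7-9 of the paper, allows a direct comparison: the model with the smaller average loss is preferred.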

Results
For the volatility efficiency comparison, the nine cryptocurrencies are applied to the models introduced in Section 2. Considering the sensitivity of the time period when predicting the volatility of financial time-series return data such as cryptocurrencies, we examine two different time periods, a short-term and a long-term one. The sample consists of the daily log-returns of the nine cryptocurrencies over period 1 (from 19 August 2018) and period 2 (from January 2018), both ending in November 2018. Period 1 was a low-volatility time period and period 2 a high-volatility one, which makes the two samples well suited for comparing the volatility forecasts of the GARCH models and the SV model on both low- and high-volatility cryptocurrency data. The data set consists of the daily historical prices and volumes of the nine cryptocurrencies. Figures 1 and 2 present the daily prices of the nine cryptocurrencies: Figure 1 shows the scatterplots among the nine cryptocurrencies from August 2018 to November 2018 (period 1) and Figure 2 from January 2018 to November 2018 (period 2). According to the figures, each pair of cryptocurrencies studied shows similar results: a positive and relatively high correlation regardless of period. For the nine time-series analyses in this section, daily log-returns in percentage are defined as r_t = 100 × ln(P_t/P_{t−1}). Table 1 shows the summary statistics of the log returns of the nine cryptocurrencies. In general, they share the fat-tailed distribution, one of the common characteristics found in the return series of financial assets. Based on the kurtosis statistics, a fat-tailed distribution is observed in all nine cryptocurrencies, though the degree of fat-tailedness differs considerably among them. One of the interesting observations is the change in kurtosis values between periods 1 and 2.
In period 1, the kurtosis values of the major cryptocurrencies BTC, XRP, ETH, and BCH (the top 4 by market capitalization) are higher than those of the relatively small market-cap cryptocurrencies (XLM, LTC, TRX, ADA, and MIOTA). This phenomenon is reversed in period 2, where the small-cap cryptocurrencies generally have higher kurtosis values than the large-cap ones. It suggests that over the longer 2018 sample (period 2), small-cap cryptocurrencies tend to have more extreme daily returns in both directions, which can be identified by the magnitude of their minimum and maximum returns in each period. When we focus on the recent data of period 1, however, this trend is reversed: the large-cap cryptocurrencies show higher kurtosis.
As for skewness, all the cryptocurrencies except XRP show lower skewness in the more recent period (period 1), while XRP displays even higher positive skewness. This is an interesting observation because most financial asset returns show negative skewness. Only XRP has moved from a negatively-skewed to a positively-skewed return series, while the other cryptocurrencies have increased the magnitude of their skewness in the negative direction. This might be explained by the fact that the most recent bull market in the crypto market, from late 2017 to early 2018, is covered in the data period, and the positive returns during that period dominate, in magnitude, the negative returns before and after the bull market. XRP, however, deviates from this trend: in the more recent period (period 1), a bear market, XRP is the only currency that tends to have more extreme positive returns. This implies that the co-movement of XRP with the cryptocurrency market is lower than that of any other cryptocurrency, and thus its systematic risk in the cryptocurrency market would be low. It might, therefore, attract more attention from potential investors looking to build a market portfolio in the cryptocurrency market. Figures 3 and 4 support these arguments with boxplots of the cryptocurrencies in each period. All cryptocurrencies tend to have extreme log returns on both sides (high kurtosis), and in Figure 3, XRP shows a positively-skewed distribution in the more recent period (period 1). Panels A and B in Tables 2 and 3 show the correlations of the nine cryptocurrencies for each period. We report both Pearson's and Kendall's correlation coefficients. Based on the Pearson correlation matrix, the magnitude of the correlation coefficients tends to increase from period 1 to period 2, except for several pairs, especially XRP-related pairs (e.g., BTC-XRP, from 0.68 to 0.54), which show lower magnitudes.
When we look into the results of Kendall's coefficients, however, the trends observed in the Pearson results [33] show the opposite direction: most correlation pairs in the Kendall table decreased from period 1 to period 2. Regarding this seemingly contradictory behavior of the two types of correlation coefficients, we suggest that Kendall's coefficient reflects the skewed and fat-tailed distribution of cryptocurrency return data. Kendall's correlation coefficient is a rank correlation, a non-parametric measure of the strength of dependence between two variables, whereas Pearson's is calculated under the normality assumption. Principal component analysis (hereafter, PCA) is an effective multivariate statistical technique for reducing the dimension of large data sets with minimal loss of information and extracting their structural features ([2]). It transforms a number of correlated variables into a series of linearly uncorrelated variables called principal components by projecting the observations onto axes that capture the maximum amount of variability in the original data. The first principal component explains the largest possible variability of the original data, and each succeeding component in turn explains the highest remaining variability under the constraint that it is orthogonal to the preceding components. PCA is optimal from the perspective of minimizing the squared distance between the observed values in the input space and the mapped values in the low-dimensional subspace ([12]). Table 4 shows the PCA [34] results for periods 1 and 2. The proportion of variance explained by the first principal component is 81% in period 1 and 75% in period 2. Figure 5 shows the factor loadings of the first two main components in panels A and B and of the three main components in panels C and D.
ETH, TRX, and XLM are the variables with high-magnitude (in absolute terms) factor loadings on the first component in both periods. Interestingly, the signs of the factor scores are positive across all factors in period 1. On the second component, BCH, XLM, and BTC have influential factor loadings in both periods. In period 1, the XRP return series shows the largest factor loading in absolute terms, around −0.8, on the second component, whereas LTC does in period 2.
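The variance-decomposition logic behind PCA can be illustrated in the two-variable case, where the eigenvalues of the 2×2 sample covariance matrix have a closed form. This toy sketch (ours, not the paper's procedure, which uses all nine return series) returns the proportion of total variance explained by the first principal component:

```python
import math

def pca2_explained(x, y):
    """Proportion of total variance captured by the first principal component
    of two series, via the closed-form eigenvalues of their 2x2 sample
    covariance matrix: lambda = tr/2 +/- sqrt(tr^2/4 - det)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    lam1 = tr / 2.0 + math.sqrt(tr * tr / 4.0 - det)  # larger eigenvalue
    return lam1 / tr  # eigenvalues sum to the trace (total variance)
```

Two perfectly correlated series give a ratio of 1.0 (one component suffices), while uncorrelated equal-variance series give 0.5; the paper's 81% and 75% figures sit between these extremes, consistent with the high pairwise correlations reported above.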
Table 5 shows the values of the AIC (Akaike information criterion) for the different GARCH models (GARCH, TGARCH, and IGARCH) across the nine cryptocurrencies in each period. We include TGARCH to handle the asymmetric distribution of errors commonly reported for cryptocurrencies ([30]). In period 1, the IGARCH model provides the lowest AIC except for XRP, BCH, and LTC, for which the TGARCH models have the lowest AIC. In period 2, however, IGARCH shows the lowest AIC for all cryptocurrencies. Given the AIC model selection criterion, this indicates that, in general, IGARCH is superior to the other GARCH family models. Table 6 shows the reliability of the IGARCH with ARMA(0,0) for LBTC for periods 1 and 2, even though α_1 is statistically significant at the 0.10 significance level for period 1.
Tables 7-9 demonstrate representative results regarding forecast accuracy for an h-step-ahead forecast. We report out-of-sample MSE losses for both the SV and GARCH models with the observed time-series data, where the evaluation is based on the two different proxies for the conditional volatility. The forecast losses of the SV model are systematically lower over all horizons and across all cryptocurrencies. The results exhibit the superior forecasting accuracy of the SV method over the GARCH models, especially for volatility forecasting over longer time horizons. For example, the 3-day out-of-sample MSEs (using the squared return as the conditional volatility proxy) of BTC over period 1 are 8.485 and 8.165 for IGARCH and SV, respectively, and those of period 2 are 8.038 and 7.407. At the longest forecasting horizon (h = 44), the MSE of the SV method in period 1 is 5.761, whereas that of the IGARCH is 9.198. Thus, the forecasting advantage of the SV method over the IGARCH grows as the forecasting horizon lengthens.
This trend appears in all the other cryptocurrencies, regardless of the conditional volatility proxy (MSE1 and MSE2). The difference in MSEs between TGARCH and SV is large for TRX, ADA, and MIOTA, while ETH shows almost no difference in period 1 (5.669 for the TGARCH and 6.625 for the SV method). In general, the SV method shows better forecasting accuracy than the GARCH models across all the cryptocurrencies, especially over longer forecasting horizons. One plausible reason is that the SV model allows for two error processes and is thus more flexible for modeling financial time series, while the GARCH model considers a single error term. In addition, the SV model allows us to use the Bayesian approach, drawing inferences on the volatilities via simulation algorithms such as Markov chain Monte Carlo (MCMC) methods, whereas the GARCH family models can face difficulty in obtaining maximum likelihood estimates because of the complexity of the likelihood function. Therefore, the SV model offers a better in-sample fit ([3]).

Discussion
Over period 1 (the low-volatility period) and period 2 (the high-volatility period), our finding is that the SV method delivers better volatility forecasting accuracy. This indicates that institutional and individual investors who hold cryptocurrency in their portfolios may better prepare for future risk management by utilizing SV models. Recently, cryptocurrency prices have fallen by about 50 percent from their peak in early April 2021, and investors are experiencing another round of high volatility. Such unexpected, abrupt price changes may leave institutional investors poorly prepared for risk management, even when employing neural-network-based volatility models, because of the lack of training data on the cryptocurrency investment environment. In a financial situation where a sudden increase in portfolio volatility can bring tremendous risks, including higher currency-hedging costs, greater losses on loans for institutions, and a decline in the value of beneficiary certificates, we can strongly recommend that investors use the SV method, a recommendation supported by our findings on both the low- and high-volatility time-series data.

Conclusions
Understanding the volatility of the most popular cryptocurrencies is important to both investors and policymakers. In this study, we examined the volatility of nine cryptocurrencies using GARCH and SV models. While previous studies have employed variations of GARCH models, we introduced another statistical method with better out-of-sample forecasting power, the SV model. Our results provide strong empirical evidence that, when dealing with extremely volatile financial data such as cryptocurrencies, the SV method has better volatility forecasting accuracy than the GARCH models, and this tendency strengthens as the forecasting horizon lengthens. Finally, our SV model sheds light on its significance as a risk management tool for extremely volatile assets such as cryptocurrency. In this study, we used only nine cryptocurrency coins to compare the SV and GARCH models for volatility forecasting. In future work, we will use more than 30 cryptocurrencies to compare a neural-network-based volatility model with the SV model on recent cryptocurrency time-series data. For forecasting cryptocurrency prices, we plan to use recurrent neural network and long short-term memory models together with traditional time-series models such as the autoregressive integrated moving average (ARIMA) and ETS (Error, Trend, Seasonal) models.