Comparing the forecasting of cryptocurrencies by Bayesian time-varying volatility models

This paper studies the predictability of cryptocurrency time series. The study covers the four most capitalized cryptocurrencies: Bitcoin, Ethereum, Litecoin and Ripple. Different Bayesian models are compared, including models with constant volatility and models with time-varying volatility, such as stochastic volatility and GARCH. Moreover, some crypto-predictors are included in the analysis, such as the S&P 500 and the Nikkei 225. The results show that stochastic volatility significantly outperforms the VAR benchmark in both point and density forecasting. Regarding the distribution of the errors of the stochastic volatility model, the student-t distribution outperforms the standard normal approach.

Vector autoregressive (VAR) models have been widely adopted for forecasting and analysis of macroeconomic variables. Although the formulation of VARs is simple, they tend to forecast well and are often used as the benchmark against which the forecast performance of other models is compared. Sims and Zha (2006) have emphasized the value of volatility modeling for improving efficiency. Accordingly, taking time variation in volatility into account should improve the estimation of, and inference from, the VAR-based models common in the analysis of macroeconomic variables. Modeling changes in the volatility of VARs should also improve the accuracy of density forecasts: when shifts in volatility are ignored, forecast densities are potentially either far too wide or too narrow. D'Agostino et al. (2013) show that the combination of time-varying parameters and stochastic volatility improves the accuracy of point and density forecasts. One application of these models is investing in assets, stocks and, as is the purpose of this paper, cryptocurrencies.
VAR models can have many parameters if they include many lags; however, turning non-data information into priors has been found to greatly improve forecast performance. In Bayesian estimation algorithms, the stochastic volatility specification is computationally tractable, while in frequentist estimation it is captured with a single model. This is one of the reasons why the Bayesian approach is used in this paper. Another reason is that the Bayesian approach offers advantages in handling parameter uncertainty, computing probabilistic statements and estimating models with many parameters. As a standard procedure, the normal distribution is often used as the distribution of the so-called "noise". In this research, not only the normal distribution but also the student-t distribution is used for modeling the errors.
A main contribution of our paper is the introduction of time-varying specifications for multivariate models in order to better forecast the behavior of cryptocurrencies. In particular, the use of time-varying volatility jointly with the multivariate time series is of interest for capturing the possible heteroscedasticity of the shocks and non-linearities in the simultaneous relations among the different cryptocurrencies in the models. Moreover, taking into account the time variation in volatility improves the VAR-based estimation and inference that have been shown in the preliminary analysis of the cryptocurrencies.
Our results show that including time-varying volatility, and in particular stochastic volatility, provides forecasting gains in terms of point and density forecasting relative to the multivariate autoregressive model. The inclusion of cryptopredictors can lead to better forecasts than the benchmark, but not to strong improvements over time-varying volatility models that include only lags of the cryptocurrencies. Directional predictability indicates that stochastic volatility with heavy tails can be used to create profitable investment strategies.
The paper is structured as follows. Section 2 reviews the literature used as research background, especially research in the field of Bayesian VARs and cryptocurrencies. Section 3 describes the data. Section 4 presents our models, the estimation methodology and the metrics used to assess our results, which are discussed in Section 5 together with the major findings. Finally, Section 6 concludes.

Literature review
Cryptocurrencies are becoming a hot topic inside and outside academia. In particular, in recent years the interest in cryptocurrencies has exploded from around 19 billion in February 2017 to around 800 billion in December 2017, so a lot of research has been done on this subject. Although Bitcoin is a relatively new currency, there have already been some studies on this topic. Hencic and Gourieroux (2014) investigated the presence of bubbles in the Bitcoin/US Dollar exchange rate by applying a non-causal AR model; the dynamics of the daily Bitcoin/USD exchange rate show episodes of local trends, which can be modelled and interpreted as speculative bubbles. Cheah and Fry (2015) focused on the same issue and show that, as with many asset classes, Bitcoin exhibits bubbles. They find empirical evidence that the fundamental price of Bitcoin is zero. Sapuric and Kokkinaki (2014) measured the volatility of Bitcoin against that of six major currencies; the results indicate a high volatility for the Bitcoin exchange rate. Chu, Nadarajah, and Chan (2015) carried out a statistical analysis of the log-returns of the exchange rate of Bitcoin against the US Dollar and show that the generalized hyperbolic distribution gives the best fit. Yermack (2015) asked whether Bitcoin can be considered a real currency on the financial market.
Fernández-Villaverde and Sanches (2016) analysed privately issued fiat currencies, checked the existence of price equilibria and show that there exists an equilibrium in which price stability is consistent with competing private monies. However, they also conclude that the value of private currencies monotonically converges to zero along equilibrium trajectories. Dyhrberg (2016) shows that the movements of the volatility of Bitcoin have several similarities to those of gold and the dollar. Bianchi (2018) investigated whether there is a relationship between returns on cryptocurrencies and traditional asset classes. There was a mild correlation with some commodities, but not with many macroeconomic variables. Catania, Grassi and Ravazzolo (2018) showed that volatility predictions can be improved by using leverage and time-varying skewness at different forecast horizons. Hotz-Behofsits, Huber and Zörner (2018) used a time-varying parameter VAR with t-distributed measurement errors and stochastic volatility to model three cryptocurrencies: Bitcoin, Ethereum and Litecoin. Griffin and Shams (2018) investigated whether a cryptocurrency called Tether is directly manipulating the price of Bitcoin, increasing its predictability. By using algorithms to analyze the data, they find that purchases with Tether go along with sizable increases in Bitcoin prices.
In 2019, more studies on cryptocurrencies appeared. Muglia, Santabarbara and Grassi (2019) investigated the predictability of the S&P 500 by the movement of Bitcoin and show that Bitcoin does not have any direct impact on the predictability of the S&P 500. Catania, Grassi and Ravazzolo (2019) found statistically significant improvements in point forecasting for Bitcoin and Ethereum when using combinations of univariate models. They also conclude that density forecasting improves significantly for all four cryptocurrencies when relying on time-varying multivariate models.
The exercise in this paper is generalised to multivariate models where the four cryptocurrencies are predicted jointly using Bayesian VAR models with stochastic volatility as in Koop and Korobilis (2013). Johannes, Korteweg, and Polson (2014) predicted stock prices using time-varying parameter and stochastic volatility VAR models and find statistically and economically significant portfolio benefits for an investor who uses models of return predictability.
Many institutions have tried to investigate the relationship between Bitcoin and the stock market. In an article by Bloomberg (2018), analysts stated that "big investors may be dragging Bitcoin toward Market correlation"; investors looking for high gains may be attracted to the increasing risk of this cryptocurrency. Stavroyiannis and Babalos (2019) studied the relation between Bitcoin and the S&P 500 and found that Bitcoin does not hold any of the hedge, diversifier, or safe-haven properties and that its intrinsic value is not related to US markets.
There are still no studies that can confirm that Bitcoin is a good stock market predictor. This paper tries to fill the gap by analyzing whether Bitcoin, Ethereum, Litecoin and Ripple can be forecasted by their own lags and other macroeconomic variables.

Data
The sample spans from August 8, 2015 to February 28, 2019, giving a total of 1301 observations. The data are shown in Figure 1, which displays a big spike around the end of 2017.
China's Big Three exchanges were pending closure around that time; however, the cryptocurrencies were largely buoyed by bullish sentiment and went up. In December 2017 the peaks were reached, and a couple of days later prices dropped. At present, cryptocurrencies are mainly considered an alternative investment, because their use for payment is still limited. This can create correlations with other assets in the financial market for at least two main reasons. The first regards investors, who usually allocate wealth in a global portfolio and hedge across investments; the second relates to market sentiments that spread fast among different assets. See Bianchi (2018) for similar arguments. In this paper, we consider different cryptopredictors, chosen because of these possible correlations between cryptocurrencies and other assets. Following Catania et al. (2019), we use the following list of predictors for cryptocurrencies as proxies for market sentiment: international stock index prices (the S&P 500, Nikkei 225 and Stoxx Europe 600); commodity prices (gold and silver); interest rates (the 1-month and 10-year US Treasury rates); and the VIX closing price. In order to study the possible dependence between cryptocurrencies, a transformation is necessary.
The percentage daily log returns of the cryptocurrencies are computed as follows:

$$y_t = 100 \times \left( \ln S_t - \ln S_{t-1} \right),$$

where $S_t$ is the price on day $t$ and $y_t$ is the cryptocurrency log return. Table 1 reports the descriptive statistics of the cryptocurrencies. In Figure 2 the transformed data are plotted against time. As documented in Chu et al. (2015), the cryptocurrencies display high volatility, non-zero skewness, very high kurtosis and several spikes. Ripple has the highest volatility and also the highest kurtosis. Litecoin also has a high volatility, but not as high as that of Ripple. The other two (Bitcoin and Ethereum) are less volatile than the aforementioned cryptocurrencies; however, their kurtosis is still far from that of the normal distribution, which has a kurtosis of three. Another interesting statistic is the skewness: Bitcoin is the only cryptocurrency with a negative skewness. This indicates that the long tail is on the left side of the distribution, so extreme values below the mean are more likely than under the normal distribution, which has a skewness of zero. With a positive skewness, which is the case for the other cryptocurrencies, the opposite is true. As before, Ripple has the highest skewness, which indicates that Ripple has the highest probability of extreme values above its mean.
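The return transformation can be sketched in a few lines of Python. This is a minimal illustration, assuming a plain list of daily closing prices; the function name and the example prices are hypothetical and not taken from the paper.

```python
import math

def pct_log_returns(prices):
    """Percentage daily log returns: y_t = 100 * (ln S_t - ln S_{t-1})."""
    return [100.0 * (math.log(cur) - math.log(prev))
            for prev, cur in zip(prices, prices[1:])]

# Hypothetical closing prices for three consecutive days.
prices = [100.0, 110.0, 99.0]
returns = pct_log_returns(prices)  # one fewer element than prices
```

A series of n prices yields n - 1 returns, which is why the first observation is lost after the transformation.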
Figure 2 shows the daily log returns, which gives more insight into the cryptocurrencies. Ripple is the most volatile cryptocurrency, as the descriptive statistics also indicated. Ethereum also stands out in the first half of the sample; after that it is more stable, that is, less volatile. Bitcoin is the most stable of the four. The crypto market is open 24/7, whereas the markets for the predictor variables are not. For this reason, the data have to be adapted before they can be used for forecasting. The procedure is simple: when the market for a variable is closed, the previous value of that variable is used. This gives a return of zero, which is appropriate since the variable does not actually change on that day. Figure 3 shows the plots of the predictor variables.
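The carry-forward rule for closed markets can be illustrated as follows. This is a sketch under the assumption that missing days are encoded as None; the index values are made up for the example.

```python
import math

def forward_fill(values):
    """Replace None (market closed) with the last observed value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

# Hypothetical index closes; None marks a weekend day with no trading.
index_close = [2700.0, 2710.0, None, None, 2725.0]
filled = forward_fill(index_close)

# Log returns on the filled series are exactly zero on closed days.
filled_returns = [100.0 * (math.log(b) - math.log(a))
                  for a, b in zip(filled, filled[1:])]
```

The sketch assumes the first value is observed; a leading None would stay None and would need separate handling.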

Methodology
Studies have provided strong evidence of time-varying volatility in macroeconomic variables; nevertheless, VARs with constant volatility are also used in this paper. With constant volatility, the performance of point forecasting should not be affected much by conditional heteroscedasticity, which is modelled explicitly by heteroscedastic models such as GARCH and stochastic volatility.
Heteroscedasticity is a major concern in regression analysis and in the analysis of variance, as it can invalidate statistical tests. These tests assume that the modelling errors have constant variance and are uncorrelated. For example, the ordinary least squares (OLS) estimator is still unbiased in the case of heteroscedasticity, but it is inefficient and the usual estimates of its variance and covariance are biased.
In this paper, three types of specifications are analysed: the standard VAR model, the VAR with stochastic volatility and the VAR with GARCH. The reason for using multiple specifications is to see whether the forecasting performance of a more complex model is really better than that of a simple model. The Bayesian approach gives some advantages: parameter uncertainty can be mitigated, probabilistic statements can be computed without further assumptions, and the estimation of complex nonlinear models with many parameters is feasible.
For the stochastic volatility specification, two different models are investigated: one where the normal distribution is used and one where the student-t distribution is used. The estimation procedures for these models differ, so they can lead to different results. This also allows a conclusion about which distribution gives more accurate forecasts.
As stated in Catania, Grassi and Ravazzolo (2019), the number of lags of the VAR models is set to three based on the BIC. The lag of interest for the cryptopredictors is the first lag. Thus, eight models are discussed and used in this paper: the Bayesian VAR(3), VAR-SV(3), VAR-GARCH(3) and VAR-SVt(3) described in the following subsections, each with and without the cryptopredictors. Vector autoregressive models are among the most common models applied in financial and macroeconomic forecasting; see Lütkepohl (2007) and Koop and Korobilis (2010).
Time-varying parameters are left for future research. To compare the models with each other, the Bayesian VAR(3) is chosen as the benchmark. In the next subsections, the models used for the in-sample analysis and the forecasting exercise are explained briefly.

Bayesian VAR
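First of all, the focus is on the benchmark model. The Bayesian VAR(3) model is described as follows:

$$y_t = \sum_{j=1}^{3} \Phi_j y_{t-j} + \varepsilon_t, \qquad t = 1, \ldots, T,$$

with $T$ the total number of days in the data, $\Phi_j$ the coefficients on the $j$-th lag and $\varepsilon_t$ the innovations. Since this model holds for every cryptocurrency, the equation above can be rewritten in stacked form:

$$y_t = X_t \beta + \varepsilon_t,$$

where $X_t = [y_{t-1}, y_{t-2}, y_{t-3}]$ and $\beta$ collects the coefficients, for every cryptocurrency.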

Bayesian VARX
In order to introduce possible dependence on other variables, the Bayesian VAR model can be extended by including other variables of interest. The so-called VARX model can be described as:

$$y_t = \sum_{j=1}^{3} \Phi_j y_{t-j} + \sum_{j=1}^{8} \gamma_j W_{j,t} + \varepsilon_t, \qquad t = 1, \ldots, T,$$

with $T$ the total number of days in the data, where $\gamma_j$ and $W_{j,t}$ are the parameter and cryptopredictor respectively, $\Phi_j$ are the coefficients on the $j$-th lag and $\varepsilon_t$ are the innovations. Since this model holds for every cryptocurrency, the equation above can be rewritten in stacked form:

$$y_t = X_t \beta + \varepsilon_t,$$

where $X_t = [y_{t-1}, y_{t-2}, y_{t-3}, W_{1,t}, \ldots, W_{8,t}]$ and $\beta$ collects the coefficients, for every cryptocurrency.

Bayesian VAR-SV
This and the following subsections describe the models with time-varying volatility in detail, distinguishing between SV and GARCH. The first of these is the Bayesian VAR-SV(3) model, in which the variances of the innovations to the VAR vary stochastically over time.
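For orientation, a stochastic volatility specification in the style of Clark and Ravazzolo (2015), consistent with the definitions of $A$ and $\Lambda_t$ given for the VAR-SVt model, could be written as below. This is an assumed form rather than the paper's own equation: the symbols $\Phi_j$, $u_t$, $\lambda_{i,t}$ and $\nu_{i,t}$, the random-walk law of motion for the log variances and the absence of an intercept are all assumptions.

```latex
\begin{align*}
y_t &= \sum_{j=1}^{3} \Phi_j y_{t-j} + \varepsilon_t, \qquad t = 1, \ldots, T, \\
\varepsilon_t &= A^{-1} \Lambda_t^{1/2} u_t, \qquad u_t \sim N(0, I_k), \\
\Lambda_t &= \mathrm{diag}(\lambda_{1,t}, \ldots, \lambda_{k,t}), \\
\ln \lambda_{i,t} &= \ln \lambda_{i,t-1} + \nu_{i,t}, \qquad i = 1, \ldots, k.
\end{align*}
```

Under these assumptions the innovation covariance is $A^{-1} \Lambda_t (A^{-1})'$, which matches the reduced-form variance-covariance matrix stated for the VAR-SVt model.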

Bayesian VAR-GARCH
The Bayesian VAR(3) with GARCH(1,1) innovations is almost the same as the VAR-SV model; the difference lies in the innovations term. The GARCH specification allows the variance of the errors to change over time: for example, in times of high uncertainty the variance of the errors can be higher. It also has memory over time, so that past observations inform the current variance and can help to obtain better predictions. This memory over time is a reason why one might prefer GARCH over SV.
In the Bayesian VAR(3) with GARCH(1,1) innovations, $T$ is the total number of days in the data and $R$ is the conditional correlation matrix. The vector $h_t$ follows a GARCH(1,1) model, where $h_t = [h_{1t}, h_{2t}, \ldots, h_{kt}]$ and $\varepsilon_t^{(2)} = [\varepsilon_{1t}^2, \varepsilon_{2t}^2, \ldots, \varepsilon_{kt}^2]$ are the conditional variances and squared errors respectively, and $\omega$, $B$ and $G$ are matrices of coefficients (Carnero and Eratalay (2014)).
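To make the notion of a conditional variance with memory concrete, the following is a minimal univariate GARCH(1,1) recursion. It is only a sketch: the paper's model is multivariate with coefficient matrices, whereas the scalar parameters omega, alpha and beta used here (and their values) are assumptions for illustration.

```python
def garch11_variances(residuals, omega, alpha, beta):
    """Conditional variances h_t = omega + alpha * e_{t-1}^2 + beta * h_{t-1}.

    Assumes alpha + beta < 1 so that the unconditional variance exists;
    the recursion is started at that unconditional variance.
    """
    h = [omega / (1.0 - alpha - beta)]
    for eps in residuals[:-1]:
        h.append(omega + alpha * eps ** 2 + beta * h[-1])
    return h

# Hypothetical residuals: a calm day, a shock, then another calm day.
variances = garch11_variances([0.0, 2.0, 0.0], omega=0.1, alpha=0.1, beta=0.8)
```

In this example the variance on the third day rises because the shock on the second day enters through the squared-error term, while the lagged-variance term carries earlier information forward.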

Bayesian VAR-SVt
The following model is similar to the VAR-SV, but now with a student-t distribution for the errors.
This model is referred to in this paper as VAR-SVt. In its description, $T$ is the total number of days in the data and $\eta$ the degrees of freedom. $A$ is a lower triangular matrix with ones on the diagonal and non-zero coefficients below the diagonal, and $\Lambda_t$ is a diagonal matrix which contains the time-varying variances of the shocks. This model implies that the reduced-form variance-covariance matrix of the innovations to the VAR is $\mathrm{var}(\varepsilon_t) \equiv \Sigma_t = A^{-1} \Lambda_t (A^{-1})'$ (Clark and Ravazzolo (2015)).

Forecasting
To forecast the cryptocurrencies, a rolling window methodology is used. The first estimation window runs from August 8, 2015 to August 8, 2017, a two-year estimation window. Using the results from this estimation, the one-day-ahead point forecast is calculated. The next forecast is obtained by shifting the estimation window one day forward, so from August 9, 2015 to August 9, 2017. This procedure continues until the end of the data is reached (February 28, 2019), which is after 567 days; thus the number of one-day-ahead forecasts is 567. As a prior for the SV and GARCH models, the Minnesota prior is used as a starting point. This approach is standard and can be extended to other priors; for this paper the standard approach is sufficient to investigate the cryptocurrencies. For every one-day forecast, a total of 6000 simulations are drawn and the first 1000 simulations are discarded as burn-in.
The first simulations are discarded because they can still be correlated with the starting values and/or inaccurate. Later simulations no longer depend on the starting point and can be used for the measures.
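The rolling-window scheme and the burn-in step can be sketched as below. The helper names and the toy sizes are hypothetical; only the 6000 draws and the burn-in of 1000 come from the text.

```python
def rolling_windows(n_obs, window):
    """Yield (start, end, target): estimate on [start, end), forecast index target."""
    for start in range(n_obs - window):
        yield start, start + window, start + window

def drop_burn_in(draws, burn_in):
    """Discard the first burn_in simulation draws."""
    return draws[burn_in:]

# Toy example: 10 observations with a window of 4 give 6 one-step forecasts.
windows = list(rolling_windows(n_obs=10, window=4))

# Placeholder draws standing in for 6000 simulations of one forecast.
kept = drop_burn_in(list(range(6000)), burn_in=1000)
```

Each window has the same length, so the oldest observation is dropped whenever a new one is added.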

Measures
To compare the performance of the forecasts, we use five different measures. The first three are measures of point forecasts; the last two are measures of density forecasts. The difference is that measures of point forecasts use the mean of the simulations, whereas measures of density forecasts use all the simulations. Measures of density forecasts therefore give a view of the full simulated distribution and are not averaged out like the measures of point forecasts.
However, measures of point forecasts still give a good indication of performance and are cheaper to compute.
The first measure is the so-called 95% credible interval, an interval obtained from the simulations. The 2.5% and 97.5% quantiles of the simulations are the lower and upper bound respectively. The idea behind this credible interval is that in 95% of the cases the actual observation should fall inside this interval. Another measure is the sign predictability, referred to in this paper as the 'Success rate': the percentage of forecasts that are in the same direction as the actual observations. When the actual observation and the forecast both go down, it counts as a 'success'; it is also a 'success' when both go up. In the two other cases it counts as a 'fail'. We do not perform sign predictability tests, for the reason indicated by Christoffersen and Diebold (2006): tests that rely on the sign give no information about volatility dynamics, which is potentially valuable for detecting sign predictability.
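Both measures can be computed directly from the simulation draws, as in the following sketch. The quantile rule (linear interpolation between order statistics) and the treatment of a zero return as "not up" are assumptions made for this illustration; the function names are hypothetical.

```python
def quantile(sorted_vals, q):
    """Quantile by linear interpolation between order statistics."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval from simulation draws."""
    s = sorted(draws)
    tail = (1.0 - level) / 2.0
    return quantile(s, tail), quantile(s, 1.0 - tail)

def success_rate(forecasts, actuals):
    """Percentage of forecasts whose sign matches the realised return."""
    hits = sum(1 for f, a in zip(forecasts, actuals) if (f > 0) == (a > 0))
    return 100.0 * hits / len(actuals)

lower, upper = credible_interval([float(i) for i in range(101)])
rate = success_rate([1.0, -1.0, 2.0, -2.0], [0.5, -0.3, -1.0, -4.0])
```

The share of actual observations falling outside the interval can then be counted over all forecast days and compared with the nominal 5%.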
The third measure is the Root Mean Squared Error (RMSE). The RMSE is preferred over the Mean Squared Error (MSE) since it is on the same scale as the data. Some authors (e.g., Armstrong, 2001) recommend the use of the RMSE since it is more sensitive to outliers than the commonly used Mean Absolute Error (MAE). The RMSE is computed for each cryptocurrency series, i = Bitcoin, Ethereum, Ripple and Litecoin:

$$\mathrm{RMSE}_i = \sqrt{\frac{1}{T - R} \sum_{t=R}^{T-1} \left( \hat{y}_{i,t+1} - y_{i,t+1} \right)^2},$$

where $R$ is the length of the rolling window, $T$ the number of observations, $\hat{y}_{i,t+1}$ the forecast for the $i$-th cryptocurrency made at time $t$, and $y_{i,t+1}$ the actual observation at time $t+1$.
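A minimal sketch of the RMSE and of a ratio against a benchmark is given below. The forecast values are invented for the example, and the ratio convention (model RMSE divided by benchmark RMSE, so that values below one favour the model) is an assumption about how such ratios are usually reported.

```python
import math

def rmse(forecasts, actuals):
    """Root mean squared error over the forecast sample."""
    n = len(actuals)
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / n)

actuals = [1.0, 2.0, 5.0]
benchmark_forecasts = [1.0, 2.0, 3.0]  # hypothetical benchmark forecasts
model_forecasts = [1.0, 2.0, 4.0]      # hypothetical competing forecasts

rmse_benchmark = rmse(benchmark_forecasts, actuals)
rmse_ratio = rmse(model_forecasts, actuals) / rmse_benchmark
```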
The fourth measure evaluates the density forecasts and is called the Log Predictive Score (LS). In the same way as for the RMSE, it is computed for each series:

$$\mathrm{LS}_i = \frac{1}{T - R} \sum_{t=R}^{T-1} \ln f(y_{i,t+1}),$$

where $f(y_{i,t+1})$ is the predictive density for $y_{i,t+1}$, given the information up to time $t$. The fifth measure is the Continuous Ranked Probability Score (CRPS). This is a continuous extension of the RPS and can be defined by considering an integral of the Brier scores over all possible thresholds $x$. Denoting the predicted cumulative distribution function by $F(x) = p(X \leq x)$ and the observed value of $X$ by $y_i$, the continuous ranked probability score can be written for each series as:

$$\mathrm{CRPS}_i = \int_{-\infty}^{\infty} \left( F(x) - H(x - y_i) \right)^2 dx,$$

where $H(x - y_i)$ is the Heaviside function that takes the value 0 when the threshold $x$ is smaller than the observed value, and 1 otherwise (Jolliffe and Stephenson, 2003, Forecast Verification).
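Both density measures can be approximated from simulation draws, as sketched below. Two assumptions are made that do not come from the paper: the CRPS uses the sample identity E|X - y| - 0.5 E|X - X'|, which equals the integral definition when F is the empirical distribution of the draws, and the log score uses a Gaussian approximation to the predictive density.

```python
import math

def crps_from_draws(draws, y):
    """Sample CRPS: mean|x - y| - 0.5 * mean|x - x'| (O(n^2) pairwise loop)."""
    n = len(draws)
    term1 = sum(abs(x - y) for x in draws) / n
    term2 = sum(abs(a - b) for a in draws for b in draws) / (n * n)
    return term1 - 0.5 * term2

def gaussian_log_score(draws, y):
    """Log predictive density at y under a normal fit to the draws (assumption)."""
    n = len(draws)
    mean = sum(draws) / n
    var = sum((x - mean) ** 2 for x in draws) / (n - 1)
    return -0.5 * math.log(2.0 * math.pi * var) - (y - mean) ** 2 / (2.0 * var)

crps_value = crps_from_draws([0.0, 2.0], 1.0)
crps_point = crps_from_draws([1.0], 3.0)
log_score = gaussian_log_score([-1.0, 1.0], 0.0)
```

A lower CRPS and a higher log score indicate a better density forecast; with a single draw the CRPS reduces to the absolute error. For thousands of draws, a sorting-based formula would be faster than the pairwise loop used here.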
For the RMSE, LS and CRPS we apply the t-test of Diebold and Mariano (1995). The other procedure we use is the model confidence set procedure of Hansen, Lunde, and Nason (2011), using an R package called MCS, detailed by Bernardi and Catania (2016). The model confidence set procedure compares all the predictions jointly and deletes a model if it is significantly worse, so that one ends up with the best possible models among those that were put in. The models with a grey background in the tables belong to the model confidence set, that is, they are not significantly worse than the other models.
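A basic version of the Diebold and Mariano comparison for one-step-ahead forecasts can be sketched as follows. It assumes that the loss differentials are serially uncorrelated (so no long-run variance correction is applied) and uses the normal approximation for the p-value; the loss values are invented for the example, and this is not the MCS package procedure.

```python
import math

def diebold_mariano(loss_a, loss_b):
    """DM statistic and two-sided normal p-value for equal predictive accuracy."""
    d = [a - b for a, b in zip(loss_a, loss_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    stat = mean_d / math.sqrt(var_d / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value

# Hypothetical per-day losses (for example squared errors) of two models.
dm_stat, dm_p = diebold_mariano([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 5.0, 5.0])
```

A negative statistic means the first model has the lower average loss; the difference counts as significant when the p-value is below the chosen level.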

Results
As stated in Section 4.6, we use different measures for point and density forecasting. Initially the focus is on point forecasting. The first results concern the 95% credible interval. Overall, the crypto-predictor variables are helpful for simulating forecasts, since in almost every case including them gives a lower percentage of actual observations outside the 95% credible interval. With a student-t distribution in the SV model, only for Bitcoin are the observations more often outside the interval, which is expected as Bitcoin is the least volatile of the cryptocurrencies. When the crypto-predictor variables are included in the SV-t model, this percentage is smaller only for Ripple, and not by much.
For every cryptocurrency the credible intervals are also plotted (see Figures A.1-A.4 in the Appendix). In these figures the credible intervals of the BVAR models are fairly steady for all cryptocurrencies, which indicates that these models do not capture the volatile movements of the data well. With a more elaborate model such as the BVAR-SV or BVAR-GARCH, the credible intervals capture the movements better: when there are shocks, the credible intervals adapt to them. However, the BVARX-SV models stand out the most: there is much noise in the credible intervals, so the predictors do not help to give a narrower credible interval for one-day-ahead prediction. Table 3 shows the results for the second point forecasting measure previously described. This predictability is not statistically tested but gives an insight into the accuracy of the direction of the forecasts. The returns are used to see whether the direction of the predictions is correct. Compared to the BVAR and BVAR-GARCH models, the BVAR-SV model is in all cases more often in the right direction. Another observation is that only for Ethereum and Ripple does including the cryptopredictor variables predict the direction more precisely. A possible reason is that Ripple is more dependent on market movements than the other cryptocurrencies. However, the percentages for the BVAR and BVAR-GARCH models are below or close to 50%, which implies that these models cannot predict the direction very precisely. That statement only applies to the prediction of whether the cryptocurrency goes up or down.
An important observation from this table is that the stochastic volatility models have the best scores overall, in some cases about 60-67%, which is much more precise than, for example, the 35.45% of the BVAR-GARCH for Bitcoin. This is especially the case for the SV model with a student-t distribution; thus, among these models, the SV model with a student-t distribution is the best way to forecast the direction of the cryptocurrencies. Moving to the last point forecast measure, Table 4 contains the results for the ratio of the RMSE.
For these results, the RMSE of the benchmark model (BVAR) and the ratios of the other models relative to it are reported. As expected from the descriptive statistics, Ripple is the cryptocurrency with the highest RMSE, due to its high kurtosis.
For Ripple and Litecoin the SV models are significantly better than the benchmark model. The GARCH model is in no case significantly better than the benchmark; the cause could be that cryptocurrencies do not follow such dynamics. Including the crypto-predictor variables does not affect the RMSE of the models enough to improve the performance of the forecasts. For Bitcoin no model performs significantly better than the VAR, which could be caused by the aforementioned stability of Bitcoin compared to the other cryptocurrencies.
The grey areas indicate the model confidence set. This confirms our conclusion, since the SV model is in this set in almost every case (the exception being the VARX-SV for Litecoin). If one wants to forecast these cryptocurrencies with one of these models, then the preferred option, judging by the RMSE, is stochastic volatility.
The last two tables (see Tables 5 and 6) report the density forecasting results. The SV model is preferred to the GARCH model, since the values of the SV model are in many cases lower. The model confidence set now also includes the GARCH model for Bitcoin.
The conclusion drawn from the first measure of density forecasting (CRPS) is that for Ethereum the situation is now the same as for Bitcoin under the RMSE: there is no model significantly better than the benchmark. The reason could be that the forecast densities of Ethereum do not follow the movements captured by the models used, so that the predictability of Ethereum is low because of uncertainty higher than for the other cryptocurrencies.
Regarding the density forecasts measured by the CRPS, the main conclusion is that including stochastic volatility in the model formulation leads to better results than the benchmark (VAR model) and the GARCH specification. In particular, the student-t specification of the errors in the SV models leads to better results and to great improvements for every cryptocurrency.
If the crypto-predictors are included in the analysis, there are no great improvements, except when the errors of the stochastic volatility model are student-t distributed.
The predictive likelihood (PL), or log predictive score (LS), gives somewhat different results compared to the previous measures. First, the predictive likelihoods are very close to each other across the cryptocurrencies, which indicates that the models perform similarly for all of them. Only for Ethereum are there models that perform significantly better than the VAR. The SV models are in that case the most significant, and the GARCH and the VAR including the crypto-predictor variables are less significant.
Overall, the model confidence set contains the SV models, as before. However, this time the SV-t models are not in this set, except for Litecoin when the crypto-predictor variables are included.
Litecoin has almost a full set, in which only the SV-t model is missing, so Litecoin does not follow a single model but can be explained by multiple models. The GARCH models are now in the model confidence set as well, which illustrates that, in terms of the log score, the forecasts can be described by GARCH dynamics.
Regarding the density forecasts measured by the PL, the main conclusion is that including stochastic volatility in the model formulation leads, for Ethereum, to better results than the benchmark (VAR model) and the GARCH specification. In contrast to the CRPS, including the student-t specification of the errors in the SV model does not lead to significantly better results. If crypto-predictors are included in the analysis, there are improvements only for Ethereum, and only without the student-t specification.

Robustness check
In this section, we perform the forecasting exercises with different univariate models, which serve as alternative benchmark models. We consider the following two univariate models: an autoregressive model with one lag (AR(1)) and an autoregressive model with the first three lags (AR(3)) based on the BIC. Table 7 reports the point and density forecasting results for the AR(1) and AR(3) versus the benchmark model considered in Section 5. All these models are run using the usual Bayesian priors and 5,000 iterations. We compute the root mean squared error (RMSE) and the CRPS for the four main cryptocurrencies. As shown in Table 7, the results for point and density forecasting are qualitatively similar to the multivariate benchmark case, VAR(3).

Conclusion
Recently, cryptocurrencies have attracted attention from researchers and financial institutions due to their importance. In this paper, the performance of several models has been compared for predicting four of the most capitalised cryptocurrencies: Bitcoin, Ethereum, Ripple and Litecoin. A set of crypto-predictors is applied and eight model specifications are proposed for using these predictors. The results show statistically significant improvements in point forecasting for all the cryptocurrencies when using a combination of stochastic volatility and a student-t distribution. In density forecasting, the stochastic volatility model gives the best predictability for all cryptocurrencies. One recommendation for future research is to allow different weights across time and time-varying parameters to improve the point and density forecasting.
Moreover, other crypto-predictors based on the dynamics of the crypto-market might be interesting for modeling.