Forecasting Bitcoin Volatility Using Hybrid GARCH Models with Machine Learning

Abstract: The time series movements of Bitcoin prices are commonly characterized as highly nonlinear and volatile in nature across economic periods, when compared to the characteristics of traditional asset classes, such as equities and commodities. From a risk management perspective, such behaviors pose challenges, given the difficulty in quantifying and modeling Bitcoin’s price volatility. In this study, we propose hybrid analytical techniques that combine the strengths of the non-stationary properties of Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models with the nonlinear modeling capabilities of deep learning algorithms, such as Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Bidirectional LSTM (BiLSTM) algorithms with single, double, and triple layer network architectures to forecast Bitcoin’s realized price volatility. Our findings, both in-sample and out-of-sample, show that such hybrid models can generate accurate forecasts of Bitcoin’s price volatility.


Introduction
The rapid development of technology has spurred changes in the structure of financial markets and payment systems. In recent years, global financial markets have become increasingly digital, supporting cashless payment systems across our society. Present-day technology allows people to invest and generate their own money through virtual currency, known as cryptocurrency (Pabuçcu et al. 2020).
Bitcoin is the most popular and commonly utilized among the cryptocurrencies available on the crypto market. Bitcoin's market capitalization exceeded USD 600 billion as of January 2021 (Aysan et al. 2021). Since Bitcoin's inception in 2009, its price has experienced volatility unlike anything we have witnessed from traditional asset classes, such as equities and commodities (Pabuçcu et al. 2020).
These irregular Bitcoin price spikes are a common feature of the extreme volatility that is regularly witnessed in cryptocurrency markets (Gbadebo et al. 2021). This heightened volatility in cryptocurrency markets also indicates that traders might quickly make or lose a significant amount of money as investor sentiment and risk aversion change (Koutmos 2022). This is one of the reasons why much of the research has focused on Bitcoin's price volatility and the degree to which price movements can be exploited (Wellenreuther and Voelzke 2019; Al-Yahyaee et al. 2019). In addition, there is a growing interest in how cryptocurrencies' unique microstructures, such as their coin mining methods, can play a role in their price behaviors (Bowden et al. 2021; King et al. 2021).
The Generalized Autoregressive Conditional Heteroscedastic (GARCH) model has been widely used in financial economic research to model asset price volatility. Many extensions of the plain vanilla GARCH model have been proposed by researchers to better capture the stylized facts of the returns of a variety of assets and markets (Lim and Sek 2013; Koutmos 2015; Kyriazis 2021). Bouoiyour and Selmi (2016) examined the volatility dynamics of Bitcoin using several types of asymmetric GARCH models and concluded that the Component with Multiple Threshold (CMT-GARCH) model properly reflects the dynamic aspects of Bitcoin price volatility. Katsiampa (2017) argued that the Autoregressive-Component GARCH (AR-CGARCH) model provides the best fit among GARCH family models for modeling Bitcoin volatility. Similarly, Chu et al. (2017) concluded that the IGARCH (1,1) model provides a good fit for the volatilities of cryptocurrencies. Conrad et al. (2018) found that the GARCH-MIDAS (Mixed Data Sampling) model improves Bitcoin's long-term volatility modeling. Baur and Dimpfl (2018) investigated the asymmetric volatility characteristic in different cryptocurrencies by applying the threshold GARCH (TGARCH) model. Charles and Darné (2019) studied GARCH-type models to assess the volatility of Bitcoin by considering different stylized facts. Similarly, Gyamerah (2019) discovered that asymmetric GARCH models with long-memory and heavy-tailed error distributions produce better volatility forecasts for Bitcoin as well as other cryptocurrencies. Zahid et al. (2022) applied Realized HA-GARCH-type models with jumps and inverse leverage effect to model and forecast the realized volatility of Bitcoin.
The successful development of machine learning (ML) techniques in time series has encouraged analysts to apply these techniques for modeling the dynamics of financial markets. This has also motivated researchers to apply ML models for predicting cryptocurrencies (Shen et al. 2021). Unlike traditional models, ML approaches do not require strict assumptions. While conventional time series and econometrics models look at the whole data, ML models split the data into training and testing datasets. These models aim to increase the accuracy by decreasing predefined loss functions (Butner et al. 2019; Makridakis et al. 2018). The ML models have generally performed better than traditional models when specially developed to deal with the particular problems of big datasets (Pabuçcu et al. 2020).
Recently, many researchers have introduced hybrid methods by combining ML methods, such as Artificial Neural Networks (ANN) and Support Vector Regression (SVR), with GARCH-type models to improve the volatility forecasts of cryptocurrencies. For instance, Kristjanpoller and Minutolo (2018) demonstrated that an ANN-GARCH model improves the volatility forecasts of Bitcoin prices. On the other hand, Peng et al. (2018) found that the SVR-GARCH model outperforms asymmetric GARCH models. Seo and Kim (2020) emphasized that the GARCH hybrid with Higher Order Neural Network (HONN) provided more accurate forecasts than the GARCH-ANN model.
In recent years, experts from various financial sectors have shifted their focus away from machine learning and onto its more sophisticated form, deep learning (DL). While ML algorithms, such as feed-forward neural networks, excel in forecasting nonlinear series, they are limited in understanding temporal dependencies within the data. Recurrent Neural Networks (RNNs) are a different class of neural networks suited for modeling time series problems because they can learn the temporal correlations in sequential and time series data. DL approaches are regarded as extremely effective in recognizing the financial market's chaotic characteristics (Kamnitsas et al. 2017; Lahmiri and Bekiros 2019). Explanatory factors are supplied as inputs to improve the neural network models' volatility estimates. Hinton and Salakhutdinov (2006) proposed the DL method and developed its various extensions to deal with classification and regression problems. These methods reflect how the human brain processes information and strengthen the ANN by using a series of hidden layers, improving models' forecasting ability. The DL algorithm has also been modified into different architectures, providing a novel perspective on financial time series (Feuerriegel and Fehrer 2016).
The DL models are commonly employed to forecast financial assets such as bonds and stocks. These methods are useful in predicting prices, volatility, and directional movements or trends (Singh and Srivastava 2017; Sim et al. 2019; Zhang et al. 2019; Makinen et al. 2019). However, according to a survey, just 13.8 percent of academics who researched cryptocurrencies between 2013 and 2019 used DL techniques (Fang et al. 2022). Most researchers forecasted cryptocurrency prices rather than volatility using various DL approaches. Shen et al. (2021) compared the RNN and GARCH models for forecasting Bitcoin's volatility.
GARCH-type models generally outperform neural networks for forecasting the volatility of financial assets. This is because neural networks perform best in nonlinear environments and require large data to approximate the estimated function (Laily et al. 2018). The DL models, however, can capture short-term and long-term features by learning complex and nonlinear relationships in financial time series data (Vidal and Kristjanpoller 2020). The DL models capture market dynamics more efficiently than traditional and ML models and, in some cases, can even be used to create automated trading systems (Jeenanunta et al. 2018).
Due to the rapid advancement of DL algorithms in time series forecasting, researchers have begun to utilize them alone or in combination with one or more classical approaches to enhance forecasts. However, a limited number of studies have integrated GARCH-type models with the DL algorithm in cryptocurrency price modeling. We believe that merging volatility models with an RNN can help improve volatility forecasts by appropriately modeling volatility processes using the sequence-based learning capabilities of RNNs. Additionally, combining the strengths of the two models may result in more precise volatility projections by combining the data derived by GARCH models and using it as input data to enable RNNs to adjust to volatility processes. Future studies on GARCH-DL hybrid models may use this study's findings as a motivation.
This research contributes to the existing literature on the hybrid DL model in the following ways. Firstly, it combines various GARCH models with DL algorithms such as Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Bidirectional LSTM (BiLSTM) algorithms with single, double, and triple layer network architectures to forecast the realized volatility of Bitcoin. Secondly, it utilizes the parameters of GARCH models, daily log-returns, squared log-returns, and volatility indicators, such as the relative volatility index (RVI) and relative strength index (RSI), as inputs into the DL algorithms to improve the volatility forecasts of Bitcoin. These combinations of hybrid models and indicators have yet to be considered extensively for forecasting the realized volatility of Bitcoin's price movements.
In our study, a combination of hybrid models was used to forecast realized Bitcoin price volatility at multiple time horizons (7-, 14-, and 21-day-ahead) using a rolling window technique. Performance was assessed using the Heteroscedasticity-Adjusted Mean Absolute Error (HMAE) and Heteroscedasticity-Adjusted Mean Squared Error (HMSE) loss functions, as well as the Model Confidence Set (MCS) procedure. These contributions have the potential to have a significant impact on the field of Bitcoin volatility forecasting. The hybrid models considered in this study may help practitioners and investors obtain accurate and reliable volatility estimates, thereby improving risk management initiatives.
The remainder of this study is structured as follows: Section 2 provides the specification of DL and hybrid models, and Section 3 discusses the data, methods, and evaluation measures. Section 4 provides a discussion of the findings, while Section 5 concludes our study.

Volatility Models
The standard GARCH model and two asymmetric GARCH models were used to model the volatility of Bitcoin. The GARCH model of Bollerslev (1986) has been considered one of the most popular volatility models. In this model, conditional variance is defined as a function of the historical values of both the squared residuals and the conditional variance. Leptokurtosis and volatility clustering are both captured by the GARCH model, but time-dependent asymmetry, which is regarded as a key stylized aspect of volatility, is frequently left out (Kim and Won 2018). Hence, the GARCH model considers the shock's magnitude but not the positive or negative direction (Muhammed and Faruk 2018). Various extensions of the standard GARCH model have been proposed to accommodate this asymmetric volatility characteristic. Nelson's (1991) Exponential GARCH (EGARCH) model accurately depicts the asymmetric effect of both positive and negative shocks on volatility. The parameters can be unconstrained in this model's logarithmic form while still maintaining a positive conditional variance. Additionally, the conditional variance is a function of prior standardized innovations rather than using past innovations (Naimy et al. 2021). The GJR model of Glosten et al. (1993), which is comparable to the EGARCH model in that it additionally takes into account the asymmetric impact of positive and negative shocks, is another widely used asymmetric GARCH model.
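The variance recursions described above can be illustrated with a minimal Python sketch. It assumes first-order models, the common GJR convention in which an indicator term adds extra weight to negative shocks, and a recursion initialized at the sample variance; the function name, parameter names, and all numeric values are hypothetical and are not the estimates reported in this study.

```python
import numpy as np

def conditional_variance(residuals, omega, alpha, beta, gamma=0.0):
    """GJR-GARCH(1,1) variance recursion; gamma=0 reduces to plain GARCH(1,1).

    sigma2[t] = omega + (alpha + gamma * 1[e[t-1] < 0]) * e[t-1]**2 + beta * sigma2[t-1]
    """
    e = np.asarray(residuals, dtype=float)
    sigma2 = np.empty_like(e)
    sigma2[0] = e.var()  # assumption: initialize at the sample variance
    for t in range(1, len(e)):
        negative_shock = 1.0 if e[t - 1] < 0 else 0.0
        sigma2[t] = (omega
                     + (alpha + gamma * negative_shock) * e[t - 1] ** 2
                     + beta * sigma2[t - 1])
    return sigma2

# Made-up residuals and parameters, for illustration only
eps = np.array([0.01, -0.03, 0.02, -0.01, 0.04])
garch = conditional_variance(eps, omega=1e-5, alpha=0.10, beta=0.85)
gjr = conditional_variance(eps, omega=1e-5, alpha=0.10, beta=0.85, gamma=0.05)
```

With `gamma > 0`, the GJR path exceeds the GARCH path only after a negative residual, which is the asymmetry the text refers to. The EGARCH model is not covered by this sketch because it specifies the logarithm of the variance.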

Recurrent Neural Networks
The RNN is generally suited for processing sequential problems (Siegelmann and Sontag 1991). It is a specific type of ANN that consists of multiple layers, known as recurrent layers. These layers operate sequentially and map the sequences to other sequences, resulting in a network with a higher overall performance than ANN. The RNN retains information in its internal state, referred to as a memory cell. The output of RNN networks at a specific time interval depends on the input at that time interval and the network's state in the preceding time interval. During training, however, the network either stops learning or keeps learning at a very high learning rate; as a result, it is unable to converge to the smallest error. This vanishing gradient problem makes it more difficult to train the network (Mehtab et al. 2020).
In addition, the RNN is incapable of storing long-term memory (Moghar and Hamiche 2020). Therefore, its performance is not considered adequate when the learning requires long-term sequential dependencies, and it is hence considered incapable of forecasting samples with long-time data (Hochreiter and Schmidhuber 1997). The RNN architecture is presented in Figure 1.

Long Short-Term Memory
The LSTM is a particular type of RNN that deals with sequential data (Hochreiter and Schmidhuber 1997). It consists of a memory cell and different gates; the former stores the information while the latter update it (Kraus and Feuerriegel 2017). The LSTM often provides better predictions than the RNN for long sequential data. The architecture of LSTM networks makes it possible to forget past irrelevant information, thus resolving the vanishing gradient problem. Hence, such networks are very suitable for modeling complex time series. Each memory cell has three sigmoid layers and one tanh layer (Mehtab et al. 2020). Figure 2 displays the structure of the LSTM cell, including three special gates: forget gate, input gate, and output gate. The output of the LSTM unit is represented by $h_t$, while $C_t$ represents the value of the memory cell.
The forget gate determines which cell state information is deleted from the LSTM unit. It is employed to weed out extraneous memories from the past and retain only knowledge pertinent to the present situation. The memory cell accepts the output $h_{t-1}$ of the previous moment and the external information $X_t$ of the current moment as inputs, combines them in a long vector $[h_{t-1}, X_t]$, and applies the $\sigma$ transformation to obtain the forget gate:
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, X_t] + b_f\right) \tag{1}$$
In Equation (1), $W_f$ and $b_f$ represent the weight matrix and bias of the forget gate, respectively, and $\sigma$ is the sigmoid function. The forget gate's main function is to record how much of the cell state $C_{t-1}$ of the previous time is retained in the cell state $C_t$ of the current time.
The input gate controls the new information that acts as the input to the current state of the network. It retains the current input $X_t$ and passes it into the cell state $C_t$, which prevents insignificant content from entering the memory cells. It has two functions: one is to find the state of the cell that must be updated, with the updated value selected by the sigmoid layer, as in Equation (2), and the other is to update the information in the cell state. The input gate determines how much the cell state is updated:
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, X_t] + b_i\right) \tag{2}$$
Meanwhile, a new candidate vector $\hat{C}_t$ is created through the tanh layer to control how much new information is added, as presented in Equation (3):
$$\hat{C}_t = \tanh\left(W_c \cdot [h_{t-1}, X_t] + b_c\right) \tag{3}$$
Finally, Equation (4) is used to update the cell state of the memory cells:
$$C_t = f_t * C_{t-1} + i_t * \hat{C}_t \tag{4}$$
Here $f_t$ is the forget gate output and $*$ denotes element-wise multiplication. The old cell state scaled by the forget gate and the candidate state scaled by the input gate are added, and the new state is the result of this operation. Thus, LSTM is ideal for sequence prediction thanks to this enhanced cell with four interacting layers instead of the single sigmoid or tanh layer of a standard RNN (Struga and Qirici 2018).
The output gate determines the LSTM cell's next hidden state. A sigmoid layer determines the output information first, and the cell state is processed by tanh and multiplied by the sigmoid layer's output to generate the final output component. Finally, the output gate serves to output the network's output at the specified time (Qiu et al. 2020):
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, X_t] + b_o\right) \tag{5}$$
The final output value of the cell is defined as follows:
$$h_t = o_t * \tanh(C_t) \tag{6}$$
The output can be considered as the forecasted value computed by the model for the current state (Struga and Qirici 2018).
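The gate computations described above can be traced step by step in a small NumPy sketch of a single LSTM cell. This is an illustration under stated assumptions, not the implementation used in this study: the function names, the dictionary layout of the weights (keys `f`, `i`, `c`, `o` for the forget gate, input gate, candidate state, and output gate), the layer sizes, and the random inputs are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step; W and b are dicts keyed by 'f', 'i', 'c', 'o'."""
    z = np.concatenate([h_prev, x_t])       # long vector [h_{t-1}, X_t]
    f_t = sigmoid(W["f"] @ z + b["f"])      # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])      # input gate
    c_hat = np.tanh(W["c"] @ z + b["c"])    # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat        # cell state update
    o_t = sigmoid(W["o"] @ z + b["o"])      # output gate
    h_t = o_t * np.tanh(c_t)                # new hidden state / cell output
    return h_t, c_t

# Hypothetical sizes and random weights, for illustration only
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.normal(scale=0.1, size=(n_hid, n_hid + n_in)) for k in "fico"}
b = {k: np.zeros(n_hid) for k in "fico"}

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):        # run over a 5-step input sequence
    h, c = lstm_step(x, h, c, W, b)
```

Because the hidden state is a sigmoid output multiplied by a tanh output, every element of `h` stays strictly between -1 and 1 regardless of the input scale.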

Gated Recurrent Unit
The GRU is a more recent variant of the LSTM, introduced by Cho et al. (2014) to analyze sequential data. It combines the forget gate and input gate into a single update gate, $z_t$, and merges the cell state $C_t$ and hidden state $h_t$. The GRU structure is simple compared to the LSTM. The hidden state $h_t$ generated at each time step $t$ is defined by the GRU update equations. By streamlining the LSTM's architectural design, the GRU might be able to learn the data at the combined gate. This single update gate, though, might not be able to fully uncover some hidden information. Consequently, the effectiveness of GRU networks may be diminished when attempting to forecast long-term time series (Vo et al. 2019).
The GRU has fewer tensor operations; therefore, it is faster to train than the LSTM. Researchers usually try both to determine which one works better for analyzing different financial markets and their unique data sets.
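For reference, one common statement of the GRU update is sketched below. The weight symbols $W_z$, $W_r$, and $W_h$, the reset gate $r_t$, and the candidate state $\tilde{h}_t$ are notation assumed here for illustration; bias terms are omitted, and some references swap the roles of $z_t$ and $1 - z_t$ in the last line.

$$
\begin{aligned}
z_t &= \sigma\left(W_z \cdot [h_{t-1}, X_t]\right) \\
r_t &= \sigma\left(W_r \cdot [h_{t-1}, X_t]\right) \\
\tilde{h}_t &= \tanh\left(W_h \cdot [r_t * h_{t-1}, X_t]\right) \\
h_t &= (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t
\end{aligned}
$$

In this form, the update gate $z_t$ interpolates between keeping the previous hidden state and replacing it with the candidate, which is how a single gate takes over the roles of the LSTM's forget and input gates.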

Bidirectional LSTM
Graves and Schmidhuber (2005) introduced the BiLSTM model, a bidirectional RNN variant that combines a forward and backward unidirectional LSTM as expressed in Equation (11).
A unidirectional LSTM can only store long-term information from prior observations, whereas the BiLSTM combines two hidden layers. As shown in Figure 3, it divides the RNN's neurons into two directions: one for forward states (the positive time direction) and the other for backward states (the negative time direction). As a result, this technique uses two time directions and input data from both the past and the future of the current time frame (Schuster and Paliwal 1997).

Data Description
The data used in this study consist of daily Bitcoin closing prices from 1 January 2015 to 31 March 2021, a total of 2283 observations. The data were extracted from www.Bitcoincharts.com (accessed on 5 April 2022). The log-returns are calculated as:
$$r_t = \ln(p_t) - \ln(p_{t-1})$$
where $p_t$ is the closing price of Bitcoin at time $t$, and $p_{t-1}$ represents the closing price of Bitcoin at the previous time. The realized variance was calculated by aggregating the squares of Bitcoin log-returns:
$$RV^2 = \sum_{t=1}^{n} r_t^2$$
where $r_t^2$ represents the square of log-returns, and $n$ represents the number of observations within a day. The square root of the realized variance is known as realized volatility:
$$RV = \sqrt{\sum_{t=1}^{n} r_t^2}$$
The initial $N$ observations were considered for the in-sample period to estimate the parameters, while the remaining observations, from 15 June 2020 through 31 March 2021, which resulted in approximately 287 observations, were left for the forecast evaluation.
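The return and realized volatility calculations described above can be sketched in a few lines of NumPy. The price values are made up, and the aggregation window of three observations is an assumption chosen only to keep the example small; the function names are hypothetical.

```python
import numpy as np

def log_returns(prices):
    """r_t = ln(p_t) - ln(p_{t-1})."""
    p = np.asarray(prices, dtype=float)
    return np.diff(np.log(p))

def realized_volatility(returns, window):
    """Square root of the rolling sum of squared returns over `window` observations."""
    r2 = np.asarray(returns, dtype=float) ** 2
    csum = np.concatenate([[0.0], np.cumsum(r2)])
    realized_variance = csum[window:] - csum[:-window]
    return np.sqrt(realized_variance)

# Made-up closing prices, for illustration only
prices = np.array([100.0, 102.0, 101.0, 105.0, 103.0, 108.0])
r = log_returns(prices)                  # 5 returns from 6 prices
rv = realized_volatility(r, window=3)    # 3 overlapping windows of 3 returns
```

The cumulative-sum trick avoids an explicit loop; each entry of `rv` is the square root of the sum of the squared returns in one window.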

Evaluation Measures
Two nonlinear loss functions, HMSE and HMAE, were used in this study (Kristjanpoller and Minutolo 2016; Kim and Won 2018; Fuertes et al. 2009) to assess the forecasting performance of models. The mathematical formulas of the loss functions are given as follows:
$$\mathrm{HMAE} = \frac{1}{n}\sum_{t=1}^{n}\left|1 - \frac{\hat{\sigma}_t}{RV_t}\right|$$
$$\mathrm{HMSE} = \frac{1}{n}\sum_{t=1}^{n}\left(1 - \frac{\hat{\sigma}_t}{RV_t}\right)^2$$
In these equations, $RV_t$ and $\hat{\sigma}_t$ represent the observed and forecasted realized volatility, respectively, while $n$ represents the out-of-sample size. The relative performance of the models was evaluated, and the MCS procedure was used to select the best model. The MCS process consists of a series of tests that, by accepting the equal predictive ability (EPA) null hypothesis at a specified confidence level, enable the building of a set of superior models. This procedure evaluates the EPA statistic for the loss functions HMSE and HMAE (Shang and Haberman 2018).
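As a minimal sketch, the two loss functions can be computed as below, assuming the usual heteroscedasticity-adjusted definitions in which the forecast error is measured as one minus the ratio of forecasted to observed realized volatility. The function names and the sample values are hypothetical.

```python
import numpy as np

def hmae(rv, forecast):
    """Heteroscedasticity-adjusted MAE: mean of |1 - forecast / rv|."""
    rv = np.asarray(rv, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(1.0 - forecast / rv)))

def hmse(rv, forecast):
    """Heteroscedasticity-adjusted MSE: mean of (1 - forecast / rv)**2."""
    rv = np.asarray(rv, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean((1.0 - forecast / rv) ** 2))

# Made-up observed and forecasted realized volatility
observed = [0.02, 0.04]
predicted = [0.01, 0.05]
loss_mae = hmae(observed, predicted)
loss_mse = hmse(observed, predicted)
```

Because the error is scaled by the observed volatility, a given absolute miss is penalized more heavily in calm periods than in turbulent ones.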
For training the DL models, the network architecture was specified to have at most three LSTM/BiLSTM/GRU layers with 128, 64, and 32 neurons and a dense layer. The Adam optimizer was used for training the network and a dropout of 0.3 was added between the first and second layers. All the networks were trained for 150 epochs. The architecture was kept the same for all DL and hybrid models to allow for a more even assessment of the forecasts under the same network architecture. For initial training of the DL models, the rolling window was set at (t − 12) to forecast (t + n)-day-ahead realized volatility, with n = 7, 14, and 21. All the models were fitted using Python.
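One way to prepare inputs for such a network is sketched below. It assumes that the (t − 12) window means the 12 most recent observations up to and including time t, and that the target is the realized volatility n days after t; the function name `make_windows` and the toy series are hypothetical.

```python
import numpy as np

def make_windows(features, target, lookback=12, horizon=7):
    """Build supervised samples for a recurrent network.

    Each sample uses rows t-lookback+1 .. t of `features` to predict
    target[t + horizon]. Returns X with shape (samples, lookback, n_features)
    and y with shape (samples,).
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    if features.ndim == 1:
        features = features[:, None]     # treat a 1-D series as one feature
    X, y = [], []
    for t in range(lookback - 1, len(target) - horizon):
        X.append(features[t - lookback + 1 : t + 1])
        y.append(target[t + horizon])
    return np.stack(X), np.array(y)

# Toy series 0, 1, ..., 39 so that window contents are easy to read off
series = np.arange(40, dtype=float)
X, y = make_windows(series, series, lookback=12, horizon=7)
```

The three-dimensional shape of `X` (samples, time steps, features) is the layout that recurrent layers in common deep learning libraries expect; additional inputs such as GARCH volatility estimates, RSI, or RVI would be stacked as extra feature columns.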

Results and Discussion
Table 1 reports the summary statistics of Bitcoin closing prices, Bitcoin log-returns, Bitcoin squared log-returns, and the volatility indicators RSI and RVI. The Bitcoin closing prices showed a large mean value, high standard deviation, positive skewness, and high kurtosis, which are commonly observed in financial time series. Other researchers have also observed these anomalies in cryptocurrency prices (see Phillip et al. 2020; Zahid et al. 2022, among others). To fit an ARCH method, stationary and normal data are required. The Jarque-Bera (JB) and Augmented Dickey-Fuller (ADF) tests were used to analyze the two conditions. A highly significant value (20,227) of the JB test, at a 5% significance level, confirmed the non-normality in the Bitcoin prices. The ADF statistic of −4.89 at a 5% significance level showed that the series was not stationary for ARCH modeling.
Furthermore, the ARCH-LM test with a highly significant value (243.5) at a 5% significance level indicated strong autoregressive conditional heteroscedastic effects in the residual variances. The ADF test on log-returns rejected the null hypothesis of a unit-root process. The JB test rejected the hypothesis that the log-returns distribution was normal. Similarly, the null hypothesis of the ARCH-LM test at 12 lags was also rejected, indicating heteroscedasticity in Bitcoin log-returns. The null hypothesis of the Ljung-Box test up to lag 20 was also rejected. These results revealed that Bitcoin log-returns were appropriate for fitting the GARCH model.
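To make the normality check concrete, the JB statistic can be computed directly from the sample skewness and kurtosis, as in the hedged sketch below. The simulated samples are stand-ins for illustration only and are not the Bitcoin data; the function name is hypothetical.

```python
import numpy as np

def jarque_bera(x):
    """JB = n/6 * (S**2 + (K - 3)**2 / 4), with sample skewness S and kurtosis K."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    skew = np.mean(d ** 3) / m2 ** 1.5
    kurt = np.mean(d ** 4) / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

# Simulated stand-ins: a normal sample and a heavy-tailed Student-t sample
rng = np.random.default_rng(42)
normal_sample = rng.normal(size=2000)
heavy_tailed = rng.standard_t(df=3, size=2000)

jb_normal = jarque_bera(normal_sample)
jb_heavy = jarque_bera(heavy_tailed)
# Under normality JB is asymptotically chi-squared with 2 degrees of freedom,
# so values above roughly 5.99 reject normality at the 5% level.
```

Heavy-tailed series such as cryptocurrency returns inflate the kurtosis term, which is why JB statistics in the thousands are common for such data.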
In Figure 7, the probability plot of the Bitcoin log-returns confirms the results of Table 1. Similarly, the Q and ADF statistics indicated statistical significance, as in Table 1. The autocorrelation plot showed that the autocorrelations at five lags (1, 10, 20, 28, and 45) appear significant, while the remaining lags approach the significance line. This confirms the stationarity of the time series. On the other hand, partial autocorrelations at lags 1, 10, and 28 were found to be statistically significant. The subsequent lags are near the significance line. As a result, the PACF suggested fitting a first-order autoregressive model.

Estimation Results
Three GARCH-type models (GARCH, GJR, and EGARCH) were fitted to the in-sample data, and the parameter estimates, along with their standard errors, are presented in Table 2. All three GARCH models' estimated parameters were highly significant, with the GJR and EGARCH models confirming the leverage effect in log-returns of Bitcoin.

Out-of-Sample Forecast Results
This section presents the results of out-of-sample forecasts of models considered in this study. For the sake of brevity, the results of models with better performance are presented. Table 3 shows the result of out-of-sample forecasts of GARCH models and single GARCH models combined with two layers of DL models.
It was observed from this table that the EGARCH model has a lower HMAE and HMSE than the GARCH and GJR models. The EGARCH model exhibited 1.57%, 2.24%, and 2.37% improvement in HMAE, whereas it attained 2.86%, 3.08%, and 3.14% improvement in HMSE against the GARCH model at 7-, 14-, and 21-day-ahead forecasts, respectively. Similarly, the EGARCH achieved 1.66%, 1.76%, and 1.87% improvement in HMAE, while it attained 2.61%, 2.55%, and 3.13% in HMSE over the GJR model at the selected forecast horizons, respectively. Next, we combined single GARCH models with two-layer DL models to investigate whether including single GARCH volatility as inputs to DL models improves the volatility forecasts based on the HMAE and HMSE loss functions at the selected time horizons. The results are tabulated in Table 3 and show that the EGARCH-LSTM$_2$ model had the minimum errors in the class of single hybrid models. This model exhibited 4.45%, 4.85%, and 5.58% relative improvement in HMAE, and 5.5%, 5.78%, and 6.06% in HMSE, against the EGARCH model at 7-, 14-, and 21-day-ahead forecasts, respectively.
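The improvement percentages quoted above can be read as relative reductions in a loss value. The sketch below assumes the common definition, the benchmark loss minus the model loss, divided by the benchmark loss; the source does not state the formula explicitly, and the loss values used here are made up.

```python
def relative_improvement(loss_benchmark, loss_model):
    """Percentage reduction in loss relative to the benchmark (positive = model is better)."""
    return 100.0 * (loss_benchmark - loss_model) / loss_benchmark

# Made-up loss values: a benchmark HMAE of 0.50 against a model HMAE of 0.45
gain = relative_improvement(0.50, 0.45)   # about a 10% reduction
```

Under this definition, a negative value would indicate that the candidate model performs worse than the benchmark.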
These results revealed that a single GARCH-type model combined with DL improved the volatility forecasts and that adding other features, such as RSI and RVI, further strengthened the predictability of Bitcoin price volatility. The EGARCH-LSTM$_2$ performed best in the class of single GARCH-type models combined with the DL algorithm. Next, we combined double and triple GARCH models with two-layer DL models to investigate whether including double and triple GARCH volatility as inputs to DL models improves the volatility forecasts. Volatility indicators such as RSI and RVI were also added as input variables to further improve the volatility forecast of hybrid models. Table 4 presents the out-of-sample forecasts for double and triple GARCH-type models combined with two layers of DL models along with the volatility indicators.
The results indicate that double GARCH-type models combined with DL models were statistically better than single GARCH-type models combined with DL models. More specifically, the GARCH-EGARCH-LSTM$_2$, combining the GARCH and EGARCH models with a two-layer LSTM model, had the minimum errors in the class of double GARCH-type models. This model achieved 8.94%, 9.36%, and 9.87% improvement in HMAE for 7-, 14-, and 21-day-ahead forecasts, respectively, when benchmarked against the best-performing single GARCH hybrid model (which was the EGARCH-LSTM$_2$ model). Similarly, this model attained 9.64%, 10.19%, and 10.30% improvement in HMSE against the best single GARCH hybrid model for 7-, 14-, and 21-day-ahead forecasts, respectively.

Rolling Window Forecast Results
In addition, this study adopted a rolling window approach for improving the performances of hybrid models. This approach used a fixed window length and generated a one-step-ahead forecast while dropping the oldest observation and including a new observation. The aforementioned models were estimated again and a one-step-ahead forecast was generated. In this way, 7-, 14-, and 21-day-ahead forecasts were made. The last 288 days were chosen as a test size for out-of-sample forecasting. The results of the rolling window forecasts are shown in Tables 5 and 6.
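The fixed-length rolling scheme described above can be sketched as a simple loop. The forecaster used here, the root mean square of the window, is only a stand-in chosen so that the example runs on its own; in the study the window would instead be passed to the re-estimated GARCH and hybrid models. The function names and the return values are hypothetical.

```python
import numpy as np

def rolling_one_step_forecasts(series, window, fit_predict):
    """Slide a fixed-length window over `series`, refit, and forecast one step ahead."""
    series = np.asarray(series, dtype=float)
    forecasts = []
    for end in range(window, len(series)):
        train = series[end - window : end]   # oldest observation dropped, newest added
        forecasts.append(fit_predict(train))
    return np.array(forecasts)

def rms_forecast(train):
    """Stand-in forecaster (assumption): root mean square of the window."""
    return float(np.sqrt(np.mean(train ** 2)))

# Made-up daily returns, for illustration only
returns = np.array([0.01, -0.02, 0.015, -0.03, 0.02, 0.005, -0.01])
preds = rolling_one_step_forecasts(returns, window=4, fit_predict=rms_forecast)
```

Passing the forecaster as a function keeps the windowing logic separate from the model, so the same loop can be reused for each model being compared.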
Table 5 presents the out-of-sample forecast results for single GARCH-type models combined with DL models. It was observed from the table that the rolling window affected the models' performance and reduced the HMAE and HMSE further, as compared to the results of Table 3. The EGARCH-LSTM$_2$ was found to have the lowest HMSE and HMAE values, with a relative improvement of 5.18%, 10.56%, and 12.28% in HMAE and 8.71%, 11.41%, and 13.29% in HMSE for 7-, 14-, and 21-day-ahead forecasts, respectively.
Table 6 presents out-of-sample forecast results of double and triple GARCH-type models hybrid with DL models.The results indicated that the double GARCH-EGARCH-LSTM 2 improved by 9. 83%,11.70%,and 11.47% in HMAE and 16.00%,16.41%,and 18.08% improvement as compared to its single benchmark (which is the EGARCH-LSTM 2 ).
The triple GARCH-type hybrid with DL models further improved the model's forecasting performance by using a fixed rolling window scheme.More specifically, the triple GARCH-type models with two layers of LSTM (i.e., GARCH-GJR-EGARCH-LSTM 2 ) improved the performance by 14.45%, 14.64%, and 14.94% against the GARCH-EGARCH-LSTM 2 in terms of HMAE.Likewise, this model attained 16.27%, 17.51%, and 19.53% enhancement over the GARCH-EGARCH-LSTM 2 by employing a rolling window scheme for a one-day-ahead forecast at different time horizons.
To focus more on predictive accuracy dynamics, the performance of the GARCH-GJR-EGARCH-LSTM2 was compared with that of the EGARCH-LSTM2, the best-performing model in the single class. The triple GARCH-LSTM model showed improvements of 22.86%, 24.63%, and 24.87% in HMAE and of 29.70%, 31.05%, and 33.92% in HMSE for the 7-, 14-, and 21-day-ahead forecasts, respectively. It is also noteworthy that the models that achieved the minimum errors under the selected loss functions attained greater significance, as measured by their p-values, than the other models.
To identify a significantly better model, the analysis proceeded by applying the MCS procedure. Table 7 shows the results of the MCS, which compares multiple forecasting models by generating a set of superior models and selecting those that are statistically significant. In this analysis, 24 models were assessed by the MCS, and 15 of them were statistically significant. This represents 62.5% of the models assessed.
The EGARCH-LSTM2 was found to be statistically significant among the single GARCH family models hybridized with LSTM models at the different forecast horizons. Similarly, the GARCH-EGARCH-LSTM2 model was statistically significant among the double GARCH family models hybridized with LSTM models. Notably, the triple GARCH models hybridized with DL models attained the highest count of statistically significant results compared with the single and double GARCH hybrids. The MCS shows that the GARCH-GJR-EGARCH-LSTM2 is the best model for forecasting Bitcoin volatility at the selected time horizons, considering a p-value of less than 0.01.
With respect to existing studies, Hu et al. (2020) incorporated GARCH forecasts into ANN, LSTM, and BiLSTM methods, along with a group of explanatory variables, to create hybrid models for forecasting the volatility of copper price futures. Unlike their study, this study incorporated the GARCH-type models' forecasts into LSTM, BiLSTM, and GRU algorithms with different numbers of layers, along with explanatory variables, to generate hybrid models for forecasting Bitcoin's volatility. This study also highlights that GARCH forecasts can serve as informative features that substantially boost the volatility prediction of asset prices. Our findings complement Kristjanpoller and Hernández (2017) as well as Verma (2021). Empirically, this study further elaborates the significance of multiple hybrid models by using another DL algorithm (i.e., GRU), which has been shown to be an advanced form of RNN in previous studies.
Kim and Won (2018) combined the LSTM model with GARCH, EGARCH, and Exponential Weighted Moving Average (EWMA) models to develop the GEW-LSTM hybrid model for forecasting stock price volatility. They compared the hybrid model's performance against the benchmark model as well as a deep feed-forward neural network (DFN) and E-DFN. In contrast, this study compared the performance of their benchmark model while considering measurement errors and also calculated the relative importance of the models. They found that GEW-LSTM has the lowest prediction errors across different error measures compared with the other models considered. Somewhat consistent with their study, this study also found that a triple GARCH hybrid with LSTM attained the minimum prediction errors and was statistically significant. Our findings, along with those of Kim and Won (2018), show the importance of combining multiple traditional models, rather than only a single traditional model, with a DL algorithm to enhance prediction performance.
Note to the MCS results: The significance level associated with the p-value of the MCS test is taken at the 1%, 5%, and 10% levels, represented by ***, **, and *, respectively. HMAE denotes heteroscedasticity-adjusted mean absolute error and HMSE denotes heteroscedasticity-adjusted mean squared error. Bold values show the best-performing models.

Concluding Remarks
Bitcoin volatility forecasting has emerged as a dominant research area within cryptocurrency research for academics, market participants, and regulators alike. This study builds and compares three forms of hybrid models by combining GARCH, EGARCH, and GJR-GARCH individually with DL algorithms and then combining single, double, and triple GARCH models with the DL algorithm in order to construct and assess volatility forecasts.
The presence of GARCH, EGARCH, and GJR-GARCH in a hybrid model captured well-known stylized facts, such as volatility clustering, leptokurtosis, and the leverage effect, more effectively, and their forecasts were then used as inputs to the LSTM, BiLSTM, and GRU models. These models were used to learn the high-level temporal patterns in Bitcoin price data, thereby increasing the amount of information available for predicting RV. Other features, such as RSI and RVI, which are correlated with Bitcoin price volatility, were also used in the DL algorithms to further improve the forecasts of Bitcoin price volatility. The predictive performance of the best triple hybrid model (GARCH-GJR-EGARCH-LSTM2) compared to the best single hybrid model (EGARCH-LSTM2) improved by 18.61%, 19.88%, and 20.51% in HMAE, and 20.04%, 19.88%, and 20.51% in HMSE, for the selected days-ahead forecasts.
We then applied the rolling window scheme to generate one-day-ahead forecasts and to investigate whether this approach reduces the errors and improves model performance. The results were encouraging: using the fixed window size, the prediction performance of the best triple hybrid model (GARCH-GJR-EGARCH-LSTM2) improved by 22.86%, 24.63%, and 24.87% in terms of HMAE compared to the best single hybrid model (EGARCH-LSTM2). Similarly, under the rolling window approach, the best triple hybrid model attained improvements of 29.70%, 31.05%, and 33.92% in HMSE compared to the best single hybrid model for the selected days-ahead forecasts.
The empirical results herein provide evidence that GARCH-type model forecasts can serve as informative features, along with volatility indicators, to significantly improve the models' predictive power. Moreover, integrating GARCH-type models with a DL algorithm was found to be an effective approach for constructing hybrid models that boost the forecast performance of Bitcoin price volatility. The findings of this study can assist regulators and other market participants in better modeling the volatility of cryptocurrencies using hybrid models rather than individual models.
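As an illustration of how GARCH-type forecasts and the volatility indicators can be arranged as inputs to a recurrent network, the sketch below stacks the feature series into fixed-length sequences of shape (samples, time steps, features). The helper name `make_sequences`, the lookback of 10 days, and the random toy data are assumptions for illustration; they are not the study's settings or data.

```python
import numpy as np

def make_sequences(features, target, lookback):
    """Build (samples, lookback, n_features) inputs and aligned targets.

    Each sample holds the `lookback` most recent feature rows, and its
    target is the value of `target` on the following day.
    """
    features = np.asarray(features, dtype=float)   # shape (T, n_features)
    target = np.asarray(target, dtype=float)       # shape (T,)
    X, y = [], []
    for t in range(lookback, len(target)):
        X.append(features[t - lookback:t])
        y.append(target[t])
    return np.stack(X), np.array(y)

# Toy stand-ins (hypothetical) for the three GARCH-type volatility series,
# the RSI and RVI indicators, and realized volatility (RV).
rng = np.random.default_rng(0)
T = 50
garch_vol, gjr_vol, egarch_vol, rsi, rvi = (rng.random(T) for _ in range(5))
realized_vol = rng.random(T)

features = np.column_stack([garch_vol, gjr_vol, egarch_vol, rsi, rvi])
X, y = make_sequences(features, realized_vol, lookback=10)
print(X.shape, y.shape)
```

A single, double, or triple hybrid then differs only in how many GARCH-type columns are included in `features`, which is what makes the three model classes directly comparable.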
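The triple-versus-single HMAE gains quoted here are consistent with chaining the two pairwise gains reported in the rolling window results (double over single, then triple over double), since successive percentage reductions multiply rather than add. The short check below is a sketch of that arithmetic; `chain_improvements` is a hypothetical helper name, and the inputs are the percentages reported in the text.

```python
def chain_improvements(*pairwise_pct):
    """Combine successive percentage loss reductions into one overall reduction."""
    remaining = 1.0
    for pct in pairwise_pct:
        remaining *= 1.0 - pct / 100.0   # fraction of the loss left after each step
    return 100.0 * (1.0 - remaining)

# HMAE, rolling window: double vs. single, then triple vs. double.
hmae_7d = chain_improvements(9.83, 14.45)
hmae_14d = chain_improvements(11.70, 14.64)
print(f"7-day: {hmae_7d:.2f}%  14-day: {hmae_14d:.2f}%")
```

The chained values land close to the reported 22.86% and 24.63%; small gaps at other horizons would be expected from rounding in the published percentages.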

Figure 2. The architecture of the Gated Recurrent Unit. The GRU is a more recent alteration of LSTM introduced by Cho et al. (2014) to analyze sequential data. It combines the forget gate and input gate into a single update gate and merges the cell state and hidden state.
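The gating described in this caption can be sketched as a single GRU time step in NumPy. This is a minimal illustration under assumptions: the parameter names, the dimensions, and the random weights are hypothetical, and the sign convention for the update gate differs between references.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, params):
    """One GRU step: a single update gate and no separate cell state."""
    # Update gate: plays the combined role of the LSTM forget and input gates.
    z = sigmoid(params["W_z"] @ x + params["U_z"] @ h_prev + params["b_z"])
    # Reset gate: controls how much of the past state feeds the candidate.
    r = sigmoid(params["W_r"] @ x + params["U_r"] @ h_prev + params["b_r"])
    # Candidate hidden state.
    h_cand = np.tanh(params["W_h"] @ x + params["U_h"] @ (r * h_prev) + params["b_h"])
    # Convex combination of the old state and the candidate (one common convention).
    return (1.0 - z) * h_prev + z * h_cand

rng = np.random.default_rng(0)
n_in, n_hid = 5, 3                      # hypothetical sizes
params = {}
for gate in ("z", "r", "h"):
    params[f"W_{gate}"] = rng.normal(size=(n_hid, n_in))
    params[f"U_{gate}"] = rng.normal(size=(n_hid, n_hid))
    params[f"b_{gate}"] = np.zeros(n_hid)

h = np.zeros(n_hid)
for x in rng.normal(size=(4, n_in)):    # run four time steps of toy input
    h = gru_step(x, h, params)
print(h.shape)
```

The hidden state `h` is the only quantity carried between steps, which is the sense in which the GRU merges the cell state and hidden state of the LSTM.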

Figure 4 shows the charts of daily prices, log-returns, squared log-returns, and realized volatility of Bitcoin from 1 January 2015 to 31 March 2021. It can be observed from the figure that the price of Bitcoin started increasing in mid-2017, with slight variations, until it reached its top value at the start of 2020. The daily log-returns of Bitcoin show volatility clustering.

Figure 5 compares Bitcoin's log-returns with the estimated volatility of GARCH-type models.


Figure 6 uses Bitcoin's squared return as a proxy for realized volatility (RV) and compares it with the volatility estimated by the GARCH-type models. The RV values appear close to the volatility estimated by the GARCH-type models.
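For reference, the squared-return proxy mentioned in this caption can be computed from closing prices as sketched below. The function names and the toy prices are hypothetical, and the sketch uses plain daily log-returns without any scaling or demeaning, which is an assumption.

```python
import numpy as np

def log_returns(prices):
    """Daily log-returns r_t = ln(P_t) - ln(P_{t-1})."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices))

def squared_return_proxy(prices):
    """Squared daily log-returns, used as a proxy for realized volatility."""
    return log_returns(prices) ** 2

prices = np.array([100.0, 110.0, 99.0, 99.0])   # toy closing prices
returns = log_returns(prices)
rv_proxy = squared_return_proxy(prices)
print(returns, rv_proxy)
```

The proxy is zero on days when the price does not move and grows with the size of the move in either direction, so clusters of large values correspond to the volatility clustering visible in the return series.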

Table 1. Summary statistics of Bitcoin closing prices, volatility indicators, Bitcoin log-returns, and squared log-returns.
ARCH(•) and JB denote the ARCH-LM test statistic and the Jarque-Bera normality test statistic, respectively, while Q²(20) denotes the Ljung-Box Q-statistic for the squared error terms up to lag 20. *** denotes significance of the test at the 1% level.

Table 2. Parameter estimates of the GARCH-type models.
Standard errors in parentheses. *** denotes significance at the 1% level.

Table 3. Out-of-sample forecast results for GARCH-type and hybrid models.
HMAE: heteroscedasticity-adjusted mean absolute error; HMSE: heteroscedasticity-adjusted mean squared error; bold values represent the lowest value in each column.

Table 4. Out-of-sample forecast results for double and triple GARCH-type models hybridized with DL models.

Table 5. Out-of-sample forecast results for single GARCH-type models hybridized with DL models, with a fixed rolling window size at different forecast horizons.
Column groups: One-Day Rolling Window at 7 Days Ahead Forecast; One-Day Rolling Window at 14 Days Ahead Forecast; One-Day Rolling Window at 21 Days Ahead Forecast.
HMAE: heteroscedasticity-adjusted mean absolute error; HMSE: heteroscedasticity-adjusted mean squared error; bold values represent the lowest value in each column.

Table 6. Out-of-sample forecast results for double and triple GARCH-type models hybridized with DL models, with a fixed rolling window size at different forecast horizons.
HMAE: heteroscedasticity-adjusted mean absolute error; HMSE: heteroscedasticity-adjusted mean squared error; bold values represent the lowest value in each column.

Table 7. Model confidence set (MCS) test for significance across forecasting models.