Article

Stacked ML-GARCH for Bitcoin Risk Forecasting: A Novel Ensemble Approach for Superior Value-at-Risk Estimation

1 Department of Mathematics and Statistics, Universidad del Norte, Barranquilla 080001, Colombia
2 Departamento de Engenharia Nuclear, Escola de Engenharia, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte 31270-901, MG, Brazil
3 CEAUL—Centro de Estatística e Aplicações, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisbon, Portugal
4 CETRAD—Centre for Transdisciplinary Development Studies, Faculdade de Ciências Sociais e Tecnologia, Universidade Europeia, 1500-210 Lisbon, Portugal
* Authors to whom correspondence should be addressed.
Mathematics 2026, 14(4), 624; https://doi.org/10.3390/math14040624
Submission received: 8 January 2026 / Revised: 4 February 2026 / Accepted: 6 February 2026 / Published: 10 February 2026

Abstract

Accurately forecasting Bitcoin’s conditional variance is essential for reliable Value-at-Risk (VaR) estimation, yet it remains challenging due to nonlinear dynamics, volatility clustering, and heavy-tailed return distributions. This study develops a novel stacking ensemble that integrates econometric and machine-learning models through XGBoost meta-learning to produce improved variance forecasts. Hybrid ML-GARCH specifications are incorporated separately to enrich the comparative analysis. All estimators are trained with time-aware cross-validation to ensure temporal coherence and prevent look-ahead bias. Using Bitcoin data from 2014 to 2020, the empirical results show that the stacking ensemble consistently outperforms both standalone and hybrid alternatives in conditional variance forecasting and VaR accuracy, including during periods of severe market stress such as the COVID-19 episode. Residual diagnostics confirm that the ensemble effectively captures persistent temporal dependencies in volatility dynamics. Overall, the proposed methodology offers an innovative and interpretable risk-management tool for financial institutions, combining statistical rigor with the adaptability of machine-learning techniques in digital asset markets.

1. Introduction

Accurate forecasting of conditional variance constitutes a fundamental component of financial risk management because it directly supports reliable estimation of Value at Risk (VaR). VaR represents a central metric for quantifying potential portfolio losses under adverse market conditions. The effectiveness of VaR estimation depends critically on the precision of variance forecasts, a requirement that becomes particularly demanding for assets characterized by nonlinear dynamics, persistent volatility, and structural instability. Bitcoin exemplifies this asset class, exhibiting pronounced price fluctuations driven by its decentralized architecture and heightened sensitivity to speculative behavior, regulatory interventions, and technological developments [1,2,3]. Consequently, accurate estimation of Bitcoin’s conditional variance $\sigma_t^2$ remains essential for robust risk measurement, effective hedging, and dependable VaR computation [4,5]. Despite substantial progress in econometric modeling and artificial intelligence, forecasting Bitcoin conditional variance continues to pose significant challenges. Nonlinear dependencies, volatility clustering, heavy-tailed distributions, and frequent structural breaks undermine the assumptions underlying many conventional approaches.
Traditional generalized autoregressive conditional heteroskedasticity models, including Exponential GARCH, Asymmetric Power ARCH, Baba–Engle–Kraft–Kroner, and Dynamic Conditional Correlation specifications, establish a rigorous framework for modeling volatility persistence, asymmetry, leverage effects, heavy-tailed behavior, and dynamic correlations across cryptocurrencies and conventional financial assets [6,7,8,9,10,11,12,13,14,15]. Recent advancements in regime-switching econometrics offer powerful alternatives to address structural instability. For example, the Markov-switching threshold BLGARCH model integrates regime-dependent dynamics with threshold effects to capture abrupt volatility transitions [16]. Although such models effectively characterize conditional heteroskedasticity, they frequently exhibit limited adaptability to pronounced asymmetries and abrupt market shocks [17,18,19]. Extending this line of inquiry, research on bilinear time series with Markov-switching mechanisms provides sophisticated diagnostic tools for analyzing nonlinear dependencies and higher-order moments across regimes. In particular, bispectral analysis offers a deeper theoretical understanding of complex financial dynamics [20].
Artificial intelligence approaches, including Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, Gated Recurrent Units (GRUs), Convolutional Neural Networks (CNNs), and transformer-based architectures, demonstrate a strong capacity to learn complex nonlinear temporal dependencies and enhance predictive stability across multiple forecasting horizons [4,21,22,23,24,25,26,27,28,29,30,31,32,33]. Moreover, ensemble and sentiment-aware extensions indicate that integrating heterogeneous information sources improves forecasting robustness [34,35,36,37,38,39]. Despite these methodological advances, the practical deployment of deep learning, ensemble, and hybrid frameworks remains constrained by vulnerability to overfitting, limited interpretability, and insufficient temporal coherence for reliable VaR calibration [40,41,42,43,44]. The influential studies by [16,20] provide important conceptual foundations to expand the current research trajectory by facilitating theoretical reinforcement and more rigorous diagnostic assessment. In particular, the Markov-switching threshold BLGARCH model offers a relevant benchmark for handling regime shifts and asymmetric effects, thereby providing an advanced standard for evaluating ensemble performance during acute market transitions.
Consequently, a substantive gap persists in developing a unified methodology capable of delivering predictive accuracy, interpretability, and temporal consistency, particularly during speculative episodes or crisis-driven regimes. This gap imposes a critical limitation on institutional risk management applications that require transparency, auditability, and robustness.
This study introduces a stacking ensemble forecasting framework for Bitcoin conditional variance and VaR estimation to address this research gap. In contrast to prior approaches grounded in isolated econometric or artificial intelligence paradigms, the proposed methodology integrates GARCH-family models with nonlinear learners, including MLP, RNN, and LSTM architectures. This integration captures volatility persistence, nonlinear dynamics, and structural breaks within a unified hierarchical design [45,46,47]. Within this architecture, econometric specifications provide interpretable baseline forecasts, nonlinear models capture higher-order temporal dependencies, and an extreme gradient boosting meta-learner aggregates base predictions to reduce forecast error while preserving explainability through feature-importance analysis [7,37,48,49,50]. Consequently, the proposed framework aligns methodological rigor with explainable artificial intelligence principles and directly satisfies institutional requirements for transparent and auditable risk modeling in digital asset markets.
Methodological implementation emphasizes reproducibility, temporal coherence, and statistical validity. Time series cross-validation preserves chronological integrity, while diagnostic procedures verify residual independence, model adequacy, and stability across volatility regimes [17,18,19]. Theoretical and diagnostic insights from Markov-switching bilinear analysis further reinforce the validation protocol by underscoring the importance of scrutinizing higher-order moment structures within the ensemble residuals across different market phases [20]. Model performance is evaluated across multiple loss functions and forecasting horizons to assess robustness and economic relevance [32,33]. Empirical evidence demonstrates that the proposed stacking ensemble consistently outperforms individual and hybrid benchmarks in conditional variance forecasting and VaR calibration. This superior performance remains evident even during periods of pronounced market stress, such as the COVID-19 disruption spanning June 2019 to July 2020. Consequently, violation rates align more closely with nominal tail probabilities, providing evidence of enhanced risk sensitivity.
Although the empirical analysis focuses on Bitcoin, the proposed methodology remains inherently domain-agnostic and readily extensible to other high-volatility financial environments. Cryptocurrency markets rank among the most volatile asset classes in global finance, characterized by extreme price fluctuations and complex market microstructure dynamics that often exceed those of traditional equity and fixed income markets. These characteristics make them particularly challenging for risk forecasting and management models. The empirical literature documents these unique volatility features and the interconnected behavior of major cryptocurrencies relative to traditional financial markets, underscoring the central role of Bitcoin as the dominant digital asset driving broader market movements [51]. Recent econometric advances illuminate promising avenues for future research, including explicit incorporation of regime-switching mechanisms or bilinear components into ensemble base learners or meta-learners to enhance adaptability to structural breaks [16,20].
The remainder of the paper is organized as follows. Section 2 presents the data sources and provides an exploratory data analysis, followed by the formal definitions of each model and the corresponding architectures. The section also describes the methodology adopted for temporal cross-validation and for backtesting in the estimation of VaR. Section 3 reports the empirical results and the robustness analyses and offers a comprehensive discussion of the findings. Section 4 presents the conclusions and examines the implications and limitations of the study and the directions for future research.

2. Materials and Methods

2.1. Data and Preprocessing Procedures

Accurate forecasting of conditional variance within cryptocurrency markets requires precise characterization of price behavior together with rigorous preprocessing. Bitcoin is selected as the dominant reference asset within this environment, where concentrated capitalization and sustained trading activity transmit variance patterns across alternative tokens. Empirical research reports pronounced co-movement and persistent spillover effects that originate in Bitcoin and propagate toward other digital assets [52,53]. Its return sequence exhibits heavy tails, rapid regime transitions, and enduring heteroskedasticity, producing conditions that challenge the evaluation of econometric and learning-based variance models. Although the proposed framework remains applicable to any financial series, Bitcoin provides a demanding and representative benchmark suitable for methodological assessment in extreme market dynamics.
All models were implemented in Python 3.10, using TensorFlow 2.10 for the neural network components, scikit-learn for support vector regression, and the ARCH library for GARCH-family specifications. Daily closing prices were retrieved from Yahoo Finance through a Python interface for the period from 14 September 2014 to 30 July 2020, yielding $N = 2137$ observations. The out-of-sample VaR evaluation spans 2 June 2019 to 30 July 2020, producing 418 one-step-ahead forecasts in a rolling assessment scheme. This interval is selected because it encompasses substantial uncertainty induced by COVID-19, an episode that prompted abrupt risk revaluation across cryptocurrency markets, as illustrated in Figure 1. Assessing models during such volatile conditions provides a rigorous examination of robustness and offers insight relevant for practical risk control. Let $C_t$ denote the daily closing price and compute log-returns as
$$r_t = \ln\left(\frac{C_t}{C_{t-1}}\right).$$
Conditional variance over window $\omega = 7$ is defined as
$$\sigma_{\omega,t}^2 = \frac{1}{\omega}\sum_{j=0}^{\omega-1} r_{t-j}^2, \qquad x_t(\omega, \alpha_{\mathrm{inp}}) = \left(\sigma_{\omega,t}^2,\ \sigma_{\omega,t-1}^2,\ \ldots,\ \sigma_{\omega,t-\alpha_{\mathrm{inp}}+1}^2\right).$$
Here, $\alpha_{\mathrm{inp}}$ denotes the number of lags employed as regressors in the ML, ANN, and SVR models, with symbol definitions provided in Table 1. The realized volatility proxy is constructed as a 7-day rolling average of squared daily returns, with the aggregation window fixed at $\omega = 7$. Although higher-efficiency estimators based on intraday data have been proposed in the literature, their implementation requires high-frequency observations that are not consistently available across cryptocurrency exchanges or over extended sample periods [54]. Consequently, forecasting-oriented studies that rely on such estimators may be affected by data heterogeneity and microstructure noise, particularly when econometric and machine learning models are evaluated jointly. In this setting, the adopted proxy constitutes a robust and widely established alternative. Importantly, forecast evaluation based on noisy or error-contaminated volatility proxies remains valid for relative model comparison, as formally demonstrated by Patton [55], provided that the same proxy is applied consistently across competing specifications. Therefore, the objective of this study is not the estimation of latent volatility itself but the assessment of predictive accuracy and stability across alternative modeling approaches.
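The preprocessing pipeline described above (log-returns, the 7-day realized-variance proxy, and the lagged feature vector $x_t$) can be sketched in a few lines. Function names are illustrative, not taken from the authors' code:

```python
import numpy as np

def log_returns(close: np.ndarray) -> np.ndarray:
    """r_t = ln(C_t / C_{t-1})."""
    return np.diff(np.log(close))

def realized_variance(r: np.ndarray, window: int = 7) -> np.ndarray:
    """Rolling proxy: mean of squared returns over the last `window` days."""
    kernel = np.ones(window) / window
    # entry t of the result aligns with return index t + window - 1
    return np.convolve(r ** 2, kernel, mode="valid")

def lag_matrix(sigma2: np.ndarray, n_lags: int) -> np.ndarray:
    """Rows x_t = (sigma2_t, sigma2_{t-1}, ..., sigma2_{t-n_lags+1}) for ML learners."""
    return np.stack([sigma2[i:i + n_lags][::-1]
                     for i in range(len(sigma2) - n_lags + 1)])
```

A deterministic series makes the alignment easy to check: a price path growing by 1% per day yields constant returns of 0.01 and a constant proxy of $10^{-4}$.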
Moreover, the choice of $\omega = 7$ reflects the effective periodicity of cryptocurrency markets, which operate continuously and exhibit well-documented calendar regularities in returns and conditional variance. Empirical evidence demonstrates systematic weekday–weekend differences in Bitcoin volatility, thereby supporting the inclusion of a complete weekly information block [56,57]. Recent research in cryptocurrency risk forecasting adopts comparable horizons and documents improvements in accuracy, robustness, and stability across structural changes, particularly when econometric models and machine learning procedures are combined [58]. This specification follows an established risk management rationale in which the conditional variance forecast $\hat{\sigma}_{t+1}^2$ is incorporated into VaR estimation and subsequent backtesting procedures, as documented in prior cryptocurrency risk studies [59,60]. Overall, theoretical considerations on weekly seasonality and empirical evidence on predictive performance jointly support $\omega = 7$ as an effective specification for continuously traded cryptocurrency series, since it mitigates weekend distortions and captures the dynamic evolution of conditional variance more effectively [57,58].
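As a concrete illustration of how a variance forecast enters VaR estimation and backtesting, the sketch below computes a one-step-ahead parametric VaR under a Gaussian-tail assumption and the empirical violation rate used in backtests. The helpers are hypothetical and simplified relative to the paper's full procedure:

```python
from statistics import NormalDist

def parametric_var(mu: float, sigma2_forecast: float, alpha: float = 0.05) -> float:
    """One-step-ahead parametric VaR: the alpha-quantile of N(mu, sigma^2_{t+1}).
    A violation occurs when the realized return falls below this threshold."""
    z = NormalDist().inv_cdf(alpha)  # e.g. about -1.645 for alpha = 0.05
    return mu + (sigma2_forecast ** 0.5) * z

def violation_rate(returns, var_series) -> float:
    """Share of days the realized return breaches the VaR bound.
    Good calibration means this rate is close to the nominal alpha."""
    hits = [r < v for r, v in zip(returns, var_series)]
    return sum(hits) / len(hits)
```

In the rolling scheme of the paper, `sigma2_forecast` would be replaced each day by the model's conditional variance forecast, and the resulting violation sequence feeds the backtesting statistics.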
Data segmentation follows chronological order, assigning eighty percent of observations to the training subset and reserving the remainder for evaluation. Standardization to zero mean and unit variance is estimated solely on training folds to maintain data integrity during subsequent transformations. Time-series cross-validation (TSCV) guides model choice and produces out-of-fold predictions, which safeguards against leakage and sustains an unbiased examination of forecasting accuracy [61]. Two univariate series, daily returns and closing prices, exhibit distinct distributional patterns across $N = 2137$ entries, as reported in Table 2. For returns, the sample mean equals $\mu = 0.001515$ and the dispersion measure equals $\sigma = 0.039378$, with support $[-0.464730,\ 0.225119]$. Quartiles $(Q_1, Q_2, Q_3)$ indicate pronounced variability within a wide interval. Price levels measured in USD display an average magnitude of $4256.15 and volatility of $4048.08, contained within [$178.10, $19,497.40]. The corresponding quartiles ($430.011, $3486.182, $7653.980) reveal substantial asymmetry and extensive fluctuations over the entire sample span.
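The chronological 80/20 split, train-only standardization, and expanding-window time-series cross-validation described above can be sketched as follows. These are illustrative helpers, not the authors' implementation (which uses scikit-learn utilities):

```python
import statistics

def chrono_split(n: int, train_frac: float = 0.8):
    """Chronological split: the last (1 - train_frac) of observations form the test set."""
    cut = int(n * train_frac)
    return list(range(cut)), list(range(cut, n))

def standardize_train_only(x, train_idx):
    """Fit mean/std on the training portion only, then transform the full series,
    so no test-set information leaks into the scaling."""
    train_vals = [x[i] for i in train_idx]
    mu, sd = statistics.fmean(train_vals), statistics.pstdev(train_vals)
    return [(v - mu) / sd for v in x]

def expanding_folds(n: int, n_folds: int = 5):
    """Expanding-window TSCV: each validation block is preceded only by earlier data,
    which preserves temporal order and prevents look-ahead bias."""
    block = n // (n_folds + 1)
    for k in range(1, n_folds + 1):
        yield list(range(k * block)), list(range(k * block, (k + 1) * block))
```

Every fold satisfies the invariant that all training indices precede all validation indices, which is the property that rules out leakage.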
Table 3 reports inferential indicators for the BTC return sequence, covering distributional behavior, stochastic stability, and short-range structure. The evidence rejects normality, with marked excess kurtosis $\kappa = 13.431$, negative asymmetry $\gamma = -0.950$, and a Jarque–Bera statistic $\mathrm{JB} = 16{,}383.47$ with $p$-value $< 10^{-12}$, signaling a heavy-tailed pattern with pronounced left bias. Consistent support for stationarity arises from the augmented Dickey–Fuller statistic $\mathrm{ADF} = -14.161$ with $p$-value $= 2.08 \times 10^{-26}$, together with the KPSS statistic $\mathrm{KPSS} = 0.150$ with $p$-value $= 0.10$. Dependence assessment rejects the absence of linear structure at the five percent threshold, documented by the Ljung–Box value $Q = 19.736$ with $p$-value $= 0.0319$, which reveals persistent autocorrelation within the return sequence. Joint interpretation emphasizes non-Gaussian behavior, stable stochastic dynamics, and residual temporal structure, thereby shaping subsequent specification choices for conditional variance retrieval and forecasting in learning-based procedures.
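For reference, the Jarque–Bera statistic reported in Table 3 combines sample skewness and kurtosis; a minimal sketch of its computation (not the authors' code) is:

```python
import numpy as np

def jarque_bera(r: np.ndarray) -> float:
    """JB = n/6 * (S^2 + (K - 3)^2 / 4), where S is sample skewness and
    K sample kurtosis; large values reject normality."""
    n = len(r)
    c = r - r.mean()
    s2 = np.mean(c ** 2)
    S = np.mean(c ** 3) / s2 ** 1.5   # skewness
    K = np.mean(c ** 4) / s2 ** 2     # kurtosis (3 under normality)
    return n / 6.0 * (S ** 2 + (K - 3.0) ** 2 / 4.0)
```

For the heavy-tailed, left-skewed BTC returns, both the $S^2$ and $(K-3)^2$ terms are large, which is why the reported statistic is in the tens of thousands.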
Figure 1 provides additional evidence regarding BTC daily dynamics. Figure 1 (left) depicts observations from 14 September 2014 to 30 July 2020 and marks the terminal segment in red. Return trajectories remain close to zero yet show clear conditional variance clustering with intermittent abrupt variations, including a marked contraction on 12 March 2020 with value −0.485. Figure 1 (right) illustrates annual box plots in which central positions stay near zero across years, although interquartile ranges and whiskers exhibit substantial variation. Years 2018 and 2020 present noticeably wider spreads together with many outliers. Such dispersion and frequency of extreme values visually support rejection of normality and independence documented in Table 3, whereas persistent stability of central tendency across years is consistent with stationarity of returns. Joint patterns reveal intervals with pronounced conditional variance clustering, which motivates adoption of specifications capable of reproducing evolving risk and extreme fluctuations in financial series.
Figure 2 shows BTC closing values on the left and corresponding traded volume on the right from 2015 to 30 July 2020. Price movements follow wide oscillations with a marked rise near $20,000 in late 2017, followed by a retreat during 2018 and partial recoveries across 2019 and 2020. Each terminal segment appears in red to mark the forecasting horizon. Volume displays intermittent surges that broadly align with upward and downward phases, most visibly in late 2017 and early 2018, mid-2019, and late 2019 through early 2020, indicating intervals of heightened market activity. Behavior around late 2019 and early 2020 reflects uncertainty associated with pandemic-related macroeconomic stress that encouraged liquidity shifts and, for some agents, use of BTC as an inflation-hedging vehicle [62]. Participation intensified further during 2021 in connection with major adoption events such as corporate balance-sheet exposure by Tesla, the public listing of Coinbase in the United States, and recognition of BTC as legal tender in El Salvador, generating sustained price–volume co-movements [63,64]. Observed dynamics reveal recurrent clusters in conditional variance and shifts across regimes that motivate use of adaptive structures designed to capture nonlinear patterns in financial time series.
Figure 3 illustrates BTC 7-day realized volatility from 14 September 2014 to 30 July 2020. Figure 3 (left) reveals pronounced clustering with brief, intense surges followed by gradual decay, a behavior particularly evident during the market disruptions of March 2020. The largest peaks occur on 12 and 17 March, with approximate values of 0.170 and 0.190, followed by slow normalization reflecting post-shock persistence. Figure 3 (right) provides complementary evidence through annual box plots, where medians remain relatively low while dispersion fluctuates substantially. Box widths expand across 2017–2018 and early 2020, and upper whiskers extend markedly with dense high outliers, indicating heavy upper tails and temporal heterogeneity in conditional variance. Consistent with prior observations on cryptocurrency sensitivity to external shocks and asymmetric responses, where losses induce stronger volatility than gains [65,66,67], conditional variance remains elevated during the subsequent weeks of 22 and 29 March, aligning with increased skewness and kurtosis in returns. From an investment standpoint, these features support the development of risk management and allocation strategies conditioned on both short- and long-horizon forecasts, emphasizing continuous monitoring to inform decision-making and enhance understanding of market dynamics [68,69].

2.2. Model Specification and Analytical Framework

2.2.1. GARCH Models

Conditional heteroskedasticity in asset returns, reflecting time-varying volatility, is effectively captured by GARCH structures originally proposed by [70] and extended by [71]. These formulations provide a parsimonious and interpretable mechanism that accommodates persistence and clustering in conditional variance, thereby supporting probabilistic forecasts, risk evaluation, and uncertainty quantification. Robust statistical properties and computational tractability have facilitated extensive application in econometric and quantitative finance studies. Subsequent analysis explores alternative GARCH formulations that extend baseline GARCH(1,1) specifications to incorporate stylized features commonly observed in financial data. Modifications include heavy-tailed error distributions, asymmetric responses to positive and negative shocks, and nonlinear variance dynamics. Each extension enhances representation of empirical characteristics of returns and improves accuracy of volatility estimation and risk measurement within algorithmic and data-driven decision frameworks.

2.2.2. GARCH(1,1) with Gaussian Innovations

Financial returns $r_t$ are defined as logarithmic variations in asset prices, as in Equation (1). Each return $r_t$ can be represented as a stochastic process
$$r_t = \mu_t + \varepsilon_t,$$
where $\mu_t$ denotes the conditional mean and $\varepsilon_t$ the innovation. Innovations follow
$$\varepsilon_t = \sigma_t z_t, \qquad z_t \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1),$$
with $\sigma_t^2$ denoting the conditional variance and $z_t$ a standard normal variable. The conditional variance evolves according to GARCH(1,1):
$$\sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2,$$
where $\omega > 0$ captures long-term variance, $\alpha \ge 0$ quantifies the influence of previous squared innovations, and $\beta \ge 0$ reflects volatility persistence. This parsimonious formulation, known as GARCH-normal, constitutes a standard benchmark for volatility forecasting due to its tractability, interpretability, and stable dynamics [71]. GARCH(1,1) provides the foundational structure for conditional variance modeling and persistence characterization in financial returns. Empirical studies frequently reveal heavy tails, asymmetry, and nonlinear behavior that this baseline does not fully accommodate. Extensions enhance flexibility, allowing more accurate representation of complex stylized features and improving volatility estimation in dynamic markets.
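The GARCH(1,1) recursion can be illustrated directly. The sketch below filters a conditional variance path given fixed parameters, initialized at the unconditional variance $\omega/(1-\alpha-\beta)$; in the paper, parameter estimation itself is delegated to the ARCH library via maximum likelihood, so this is only an illustration of the variance dynamics:

```python
import numpy as np

def garch11_variance(eps: np.ndarray, omega: float, alpha: float, beta: float) -> np.ndarray:
    """Filter sigma^2_t = omega + alpha * eps_{t-1}^2 + beta * sigma^2_{t-1}
    given an innovation series eps, starting from the unconditional variance."""
    assert omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1
    sigma2 = np.empty(len(eps))
    sigma2[0] = omega / (1.0 - alpha - beta)  # unconditional variance
    for t in range(1, len(eps)):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2
```

With $\omega = 0.1$, $\alpha = 0.1$, $\beta = 0.8$, the unconditional variance is 1.0; feeding unit shocks keeps the filtered variance at that level, while zero shocks let it decay toward $\omega/(1-\beta)$, which is the clustering-and-decay behavior described in the text.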

2.2.3. GARCH-t (Student-t Innovations)

GARCH-t refines the baseline GARCH(1,1) introduced in Equations (3)–(5) by incorporating a probabilistic structure able to represent substantial tail mass in return dynamics. This specification replaces Gaussian innovations with standardized Student-t shocks
$$z_t \sim t_\nu(0,1),$$
where the parameter $\nu$ governs tail magnitude. This distributional choice enhances accuracy in volatility inference because it accommodates the pronounced deviations generated by market stress. Accordingly, it yields a flexible mechanism that aligns conditional variance estimation with empirical patterns documented in environments characterized by abrupt fluctuations, as indicated in [72].

2.2.4. GJR-GARCH (Glosten–Jagannathan–Runkle)

GJR-GARCH expands the baseline GARCH(1,1) in Equations (3)–(5) through a mechanism capable of reflecting the differential variance reactions produced by positive and negative shocks, as indicated in [73]. The conditional variance at time $t$, written $\sigma_t^2$, evolves with the past innovation $\varepsilon_{t-1}$ and the previous variance $\sigma_{t-1}^2$ according to
$$\sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \gamma\, \mathbb{1}\{\varepsilon_{t-1} < 0\}\, \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2,$$
where the parameter $\omega > 0$ introduces a baseline variance component, the coefficient $\alpha \ge 0$ determines the impact of past squared disturbances, the coefficient $\beta \ge 0$ shapes persistence, and the coefficient $\gamma \ge 0$ governs asymmetric amplification under negative innovations. The indicator $\mathbb{1}\{\varepsilon_{t-1} < 0\}$ activates additional variance when shocks have adverse sign. This structure provides an effective representation of leverage dynamics and yields improved conditional variance inference in environments dominated by asymmetric return adjustments.

2.2.5. EGARCH (Exponential GARCH)

EGARCH specifies the conditional variance in logarithmic form, which enables explicit modeling of asymmetry and leverage effects while avoiding the parameter restrictions that enforce non-negativity, following [74]. The conditional variance $\sigma_t^2$ evolves through
$$\log \sigma_t^2 = \omega + \beta \log \sigma_{t-1}^2 + \alpha\left(\left|\frac{\varepsilon_{t-1}}{\sigma_{t-1}}\right| - \mathbb{E}\left|\frac{\varepsilon_{t-1}}{\sigma_{t-1}}\right|\right) + \gamma \frac{\varepsilon_{t-1}}{\sigma_{t-1}},$$
where the parameter $\omega$ introduces a baseline log-variance component, the coefficient $\beta$ describes persistence in the transformed variance, the coefficient $\alpha$ regulates magnitude reactions to standardized disturbances, and the coefficient $\gamma$ characterizes asymmetric adjustments linked to the sign of shocks. Innovations $\varepsilon_{t-1}$ are normalized by lagged volatility $\sigma_{t-1}$ to ensure scale invariance in their effect. This specification guarantees positivity of $\sigma_t^2$ through its logarithmic transformation and enhances conditional variance inference in environments exhibiting nonlinear and asymmetric volatility dynamics.

2.2.6. APARCH (Asymmetric Power ARCH)

APARCH specifies the conditional variance through a power transformation and an asymmetry component, which introduces flexibility to represent the nonlinear and asymmetric volatility observed in financial returns, following [75]. The conditional variance evolves according to
$$\sigma_t^\delta = \omega + \alpha\left(|\varepsilon_{t-1}| - \gamma \varepsilon_{t-1}\right)^\delta + \beta \sigma_{t-1}^\delta,$$
where the parameter $\omega$ introduces a baseline variance level, the coefficient $\alpha$ regulates reactions to past shocks, the coefficient $\gamma$ characterizes the asymmetry associated with stronger effects of negative disturbances, and the exponent $\delta > 0$ controls the power transformation that governs nonlinear behavior. Disturbances $\varepsilon_{t-1}$ are influenced by past volatility through this structure. APARCH coincides with standard GARCH(1,1) when $\delta = 2$ and $\gamma = 0$, which positions APARCH as a unifying framework encompassing various ARCH-type specifications. This formulation enhances inference regarding conditional variance in environments marked by asymmetry and nonlinear adjustments.
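To make the three asymmetric recursions concrete, their one-step variance updates can be sketched as small functions. Parameters are assumed given rather than estimated, and for EGARCH the expectation term $\mathbb{E}|z|$ is taken as $\sqrt{2/\pi}$, its value under Gaussian innovations:

```python
import math

def gjr_step(eps_prev, sig2_prev, omega, alpha, gamma, beta):
    """GJR-GARCH: an extra gamma * eps^2 term fires when the lagged shock is negative."""
    ind = 1.0 if eps_prev < 0 else 0.0
    return omega + (alpha + gamma * ind) * eps_prev ** 2 + beta * sig2_prev

def egarch_step(eps_prev, sig2_prev, omega, alpha, gamma, beta):
    """EGARCH: recursion on log sigma^2, so positivity holds by construction."""
    z = eps_prev / math.sqrt(sig2_prev)  # standardized shock
    log_s2 = (omega + beta * math.log(sig2_prev)
              + alpha * (abs(z) - math.sqrt(2 / math.pi)) + gamma * z)
    return math.exp(log_s2)

def aparch_step(eps_prev, sig_prev, omega, alpha, gamma, beta, delta):
    """APARCH: recursion on sigma^delta (note: takes and returns powers of sigma,
    not sigma^2); delta = 2, gamma = 0 recovers plain GARCH(1,1)."""
    return omega + alpha * (abs(eps_prev) - gamma * eps_prev) ** delta + beta * sig_prev ** delta
```

The nesting relations in the text are easy to verify numerically: with $\gamma = 0$ the GJR update loses its asymmetry, and the APARCH update with $\delta = 2$, $\gamma = 0$ reproduces the same value.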
GARCH-type models provide a rigorous and interpretable structure for conditional variance dynamics and empirical characteristics of financial returns; however, parametric assumptions may restrict predictive performance in settings influenced by complex interactions, nonlinear dependencies, or structural variability. Such restrictions motivate nonparametric methodologies that infer functional relationships directly from historical information, thereby improving conditional variance forecasting in markets shaped by intricate volatility patterns.

2.2.7. Support Vector Regression

SVR addresses nonlinear regression by mapping input observations into high-dimensional feature spaces through transformations guided by structural risk minimization principles [76]. This mapping relies on a feature-lifting operator $\psi(\cdot)$, which captures the nonlinear structure implicitly [77,78,79,80]. The regression function in this lifted representation is specified as
$$\hat{z} = u^\top \psi(q) + \theta,$$
where the vector $q$ denotes input information, $\hat{z}$ the predicted output, $u$ the model weights, and $\theta$ an offset. Model calibration involves solving a convex program that balances predictive accuracy and model parsimony through slack variables $\eta_k^+$ and $\eta_k^-$:
$$\min_{u,\theta}\ \frac{1}{2}\|u\|^2 + \lambda \sum_{k=1}^m \left(\eta_k^+ + \eta_k^-\right), \quad \text{subject to}$$
$$r_k - u^\top \psi(q_k) - \theta \le \epsilon + \eta_k^+, \qquad u^\top \psi(q_k) + \theta - r_k \le \epsilon + \eta_k^-, \qquad \eta_k^+, \eta_k^- \ge 0, \quad k = 1, \ldots, m,$$
where $r_k$ denotes the observed responses, $\epsilon$ sets the insensitive band, and $\lambda$ controls regularization strength. Dual variables $\beta_k^+$ and $\beta_k^-$ yield the dual formulation
$$\max_{\beta_k^+, \beta_k^-}\ -\frac{1}{2}\sum_{i,j=1}^m \left(\beta_i^+ - \beta_i^-\right)\left(\beta_j^+ - \beta_j^-\right)\kappa(q_i, q_j) - \epsilon \sum_{k=1}^m \left(\beta_k^+ + \beta_k^-\right) + \sum_{k=1}^m r_k \left(\beta_k^+ - \beta_k^-\right),$$
$$\text{subject to} \quad \sum_{k=1}^m \left(\beta_k^+ - \beta_k^-\right) = 0, \qquad 0 \le \beta_k^{\pm} \le \lambda,$$
with $\kappa(\cdot,\cdot)$ denoting a Mercer kernel satisfying $\kappa(q_i, q_j) = \psi(q_i)^\top \psi(q_j)$.
Kernel selection determines functional flexibility. A linear kernel,
$$\kappa(a,b) = a^\top b,$$
preserves the original feature geometry and applies to nearly linear interactions. A Gaussian radial basis kernel,
$$\kappa(a,b) = \exp\left(-\gamma \|a - b\|_2^2\right),$$
enables smooth nonlinear transformations suited for complex structures. A polynomial kernel,
$$\kappa(a,b) = \left(\gamma\, a^\top b + c_0\right)^p,$$
captures higher-order interactions, whereas a sigmoid kernel,
$$\kappa(a,b) = \tanh\left(\gamma\, a^\top b + c_0\right),$$
represents smooth nonlinear transitions akin to neural activation patterns. The constants $\gamma$, $c_0$, and $p$ denote kernel-specific hyperparameters. After dual optimization, predictions follow
$$\hat{z}^* = \sum_{k=1}^m \left(\beta_k^+ - \beta_k^-\right)\kappa(q_k, q^*) + \theta,$$
where $q^*$ denotes an unseen input. This structure supports high-capacity function approximation while maintaining generalization through margin-based control [81,82].
By capturing nonlinear relationships implicitly through kernel mappings, SVR offers a robust framework for conditional variance prediction in financial time series. Its ability to learn functional associations directly from historical variance information, without strong parametric restrictions, strengthens volatility inference and enhances forecasting accuracy in environments characterized by dynamic market conditions.
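The dual prediction rule and two of the kernels above can be sketched as follows. The dual coefficients $\beta_k^+ - \beta_k^-$ and support points here are hypothetical stand-ins for what a solver such as scikit-learn's SVR would return after optimization:

```python
import numpy as np

def linear_kernel(a: np.ndarray, b: np.ndarray) -> float:
    """kappa(a, b) = a^T b."""
    return float(np.dot(a, b))

def rbf_kernel(a: np.ndarray, b: np.ndarray, gamma: float = 0.5) -> float:
    """kappa(a, b) = exp(-gamma * ||a - b||^2)."""
    return float(np.exp(-gamma * np.sum((a - b) ** 2)))

def svr_predict(q_star, support_q, dual_coef, theta, kernel=rbf_kernel):
    """z_hat(q*) = sum_k (beta_k^+ - beta_k^-) * kappa(q_k, q*) + theta,
    with dual_coef[k] = beta_k^+ - beta_k^-."""
    return sum(c * kernel(qk, q_star) for c, qk in zip(dual_coef, support_q)) + theta
```

Note that prediction never touches $\psi(\cdot)$ explicitly: only kernel evaluations against the support points are required, which is the "implicit mapping" property emphasized in the text.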

2.2.8. Multilayer Perceptron

The MLP provides a flexible neural architecture frequently applied in regression and classification research [83,84,85]. Its structure integrates an input representation, at least one nonlinear intermediate transformation, and a scalar output. Empirical evidence indicates that a single hidden layer often attains universal approximation over compact domains [86]. Let $\tilde{u} = (1, u)$ denote the augmented regressor that incorporates a bias component. An MLP with $H$ hidden units and one output adopts the functional form
$$\hat{z} = \omega_0 + \sum_{h=1}^H \omega_h\, \alpha\!\left(v_h^\top \tilde{u}\right),$$
where the vector $v_h$ contains the synaptic weights of hidden unit $h$, and the parameters $\omega_0, \omega_1, \ldots, \omega_H$ govern signal transmission to the output. The nonlinear operator $\alpha(\cdot)$ is commonly specified as the hyperbolic tangent within internal layers, whereas an identity mapping regulates the output transformation [87,88]. This formulation has been incorporated into hybrid designs to improve volatility predictions in financial settings [45,89,90]. By recovering latent nonlinear relations through the MLP and refining residual structure with statistical or kernel-based regressors, hybrid systems exploit complementary representational strengths. The MLP is particularly suitable for dynamic environments, given its capacity to reconstruct intricate dependencies and heteroskedastic patterns observed in sequential data [88,91].
Parameter estimation in the MLP relies on minimizing a smooth objective $E(\Theta)$ over the parameter set $\Theta$, which contains synaptic coefficients and offsets. Let $\Theta^{(k)}$ denote the parameter configuration at iteration $k$. Learning advances through stochastic gradient descent,
$$\Theta^{(k+1)} = \Theta^{(k)} - \eta\, \nabla_{\Theta} E\!\left(\Theta^{(k)}\right),$$
where $\eta > 0$ denotes the learning coefficient and $\nabla_{\Theta} E$ represents the gradient of the objective with respect to $\Theta$. For a single parameter $w$, the update rule becomes
$$w^{(k+1)} = w^{(k)} - \eta \left.\frac{\partial E}{\partial w}\right|_{\Theta^{(k)}},$$
which secures descent under suitable regularity conditions [92]. A feedforward configuration adopting the mean squared error objective uses
$$E = \frac{1}{2} \sum_{i=1}^{N} \left(y_i - \hat{z}_i\right)^2,$$
and its gradient with respect to weight $w_{ji}$ is written as
$$\frac{\partial E}{\partial w_{ji}} = -\left(y_i - \hat{z}_i\right)\, \alpha'\!\left(\mathbf{v}_j^{\top} \tilde{u}_i\right) \tilde{u}_i,$$
where $y_i$ denotes the observed target, $\hat{z}_i$ the model output, $\alpha'$ the derivative of the activation operator, $\tilde{u}_i$ the augmented input, and $w_{ji}$ the weight connecting input $i$ to hidden unit $j$ [93]. Later studies introduced curvature-aware adjustments, adaptive learning coefficients, and second-order optimization procedures to strengthen convergence and numerical stability in MLP training [94].
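As a concrete illustration of the forward pass and the stochastic-gradient update, the following pure-Python sketch implements a one-hidden-layer MLP with tanh hidden units; the weights, inputs, and learning rate are toy values chosen for the example, not parameters from the study.

```python
import math

def mlp_forward(u, V, omega):
    """Forward pass of a one-hidden-layer MLP with tanh hidden units:
    z_hat = omega_0 + sum_h omega_h * tanh(v_h . u_tilde)."""
    u_tilde = [1.0] + list(u)                       # augmented regressor (1, u)
    hidden = [math.tanh(sum(vi * xi for vi, xi in zip(v, u_tilde))) for v in V]
    return omega[0] + sum(w * a for w, a in zip(omega[1:], hidden))

def sgd_step_output(u, y, V, omega, eta=0.05):
    """One stochastic-gradient step on E = 0.5 * (y - z_hat)^2, updating only
    the output weights (gradient of E w.r.t. omega_h is -(y - z_hat) * a_h)."""
    u_tilde = [1.0] + list(u)
    acts = [1.0] + [math.tanh(sum(vi * xi for vi, xi in zip(v, u_tilde))) for v in V]
    z_hat = sum(w * a for w, a in zip(omega, acts))
    return [w + eta * (y - z_hat) * a for w, a in zip(omega, acts)]

# Toy network: two inputs, two hidden units (weights are illustrative only)
V = [[0.1, 0.4, -0.2], [0.0, -0.3, 0.5]]
omega = [0.0, 0.2, -0.1]
z0 = mlp_forward([1.0, 2.0], V, omega)
omega = sgd_step_output([1.0, 2.0], 0.5, V, omega)
z1 = mlp_forward([1.0, 2.0], V, omega)
# After one step the prediction has moved toward the target y = 0.5
assert abs(0.5 - z1) < abs(0.5 - z0)
```

In practice the hidden weights $\mathbf{v}_h$ are updated jointly with the output weights through backpropagation; restricting the update to the output layer keeps the sketch short while preserving the descent logic.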

2.2.9. Recurrent Neural Network

Recurrent neural networks constitute a neural family designed for sequential structures with variable length [95,96,97]. Their design incorporates hidden units with recurrent connectivity that act as internal states supporting information retention across time. Parameter sharing occurs through matrices $U \in \mathbb{R}^{n \times m}$, $W \in \mathbb{R}^{n \times n}$, and $V \in \mathbb{R}^{k \times n}$, where $U$ governs transformations from inputs to hidden representations, $W$ regulates temporal propagation within hidden states, and $V$ produces outputs from internal representations. At time index $t$, the input vector $x_t \in \mathbb{R}^m$ interacts with the previous representation $h_{t-1} \in \mathbb{R}^n$ to generate the hidden response $h_t \in \mathbb{R}^n$, while the output vector $o_t \in \mathbb{R}^k$ emerges from the transformed internal activity. The dynamics are formalized as
$$h_t = f_h\!\left(U x_t + W h_{t-1}\right),$$
$$o_t = f_o\!\left(V h_t\right),$$
where $f_h$ and $f_o$ denote nonlinear mappings such as the hyperbolic tangent or rectified linear functions [98,99,100]. Training relies on Backpropagation Through Time, which unfolds recurrent operations across temporal indices to compute the partial derivatives associated with each parameter configuration [101]. Although this mechanism effectively captures short-lived interactions, gradient magnitudes often contract rapidly as sequence length increases, impairing sensitivity to distant events and limiting predictive performance under long-term dependency conditions [102,103].
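The recurrence $h_t = f_h(U x_t + W h_{t-1})$ can be sketched in a few lines of pure Python; the matrices below are illustrative placeholders, not fitted values.

```python
import math

def rnn_step(x_t, h_prev, U, W, b):
    """Single Elman-style update: h_t = tanh(U x_t + W h_{t-1} + b)."""
    return [math.tanh(sum(U[i][j] * x_t[j] for j in range(len(x_t)))
                      + sum(W[i][j] * h_prev[j] for j in range(len(h_prev)))
                      + b[i])
            for i in range(len(h_prev))]

def rnn_forward(xs, U, W, b, n_hidden):
    """Run the shared-parameter recurrence over a whole sequence; the hidden
    state carries information from all earlier inputs."""
    h = [0.0] * n_hidden
    states = []
    for x_t in xs:
        h = rnn_step(x_t, h, U, W, b)
        states.append(h)
    return states

# Toy scalar-input sequence with two hidden units (illustrative weights)
U = [[0.5], [-0.3]]
W = [[0.1, 0.2], [0.0, 0.4]]
b = [0.0, 0.1]
states = rnn_forward([[1.0], [0.5], [-1.0]], U, W, b, n_hidden=2)
```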

2.2.10. Long Short-Term Memory

Long short-term memory provides a recurrent mechanism tailored for sequential structures requiring sustained retention across extended horizons [102]. Its design incorporates a chain of interconnected units that regulate information flow through three multiplicative components referred to as forget, input, and output gates (see Figure 4). Each component applies a sigmoid transformation followed by elementwise interactions that modulate internal dynamics. For a sequence of vectors $x = \{x_1, x_2, \ldots, x_t, \ldots\}$ with $x_t \in \mathbb{R}^m$, each element represents a multivariate observation collected at index $t$. Subsequences arise from sliding windows constructed over longer records, and predictive quality depends on structural complexity and the intrinsic variability of the input signals.
Given an incoming observation $x_t$ at index $t$, the unit updates its internal representations through a sequence of controlled transformations. First, it evaluates the relevance of past content by producing a gating value within $[0, 1]$,
$$f_t = \sigma_1\!\left(W_f [h_{t-1}, x_t] + b_f\right),$$
where $h_{t-1}$ denotes the previous hidden output and $(W_f, b_f)$ represent the parameters associated with the forgetting mechanism. New information is processed through an input transformation defined by
$$i_t = \sigma_2\!\left(W_i [h_{t-1}, x_t] + b_i\right),$$
$$\tilde{C}_t = \tanh\!\left(W_c [h_{t-1}, x_t] + b_c\right),$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t,$$
where $(W_i, b_i)$ and $(W_c, b_c)$ govern the admission of incoming content and the generation of candidate representations. The updated memory $C_t$ reflects a convex combination of retained and newly synthesized components. Output modulation depends on
$$o_t = \sigma_3\!\left(W_o [h_{t-1}, x_t] + b_o\right),$$
$$h_t = o_t \odot \tanh\!\left(C_t\right),$$
with $(W_o, b_o)$ governing the selection of information made available to subsequent units. This recurrent structure maintains an internal path for $C_t$ that propagates across temporal indices and supports long-range dependencies while mitigating the gradient decay observed in simpler recurrent schemes. Empirical studies report consistent gains from LSTM variants in diverse forecasting environments, including financial volatility modeling, where sequential patterns exhibit nonlinear and heteroskedastic behavior [104,105,106,107,108].
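The gate equations map directly to code. The following minimal sketch implements a single LSTM step for scalar inputs and states; the gate parameters are illustrative and untrained.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM update for scalar input and state; each gate reads the
    concatenation [h_{t-1}, x_t], matching the gate equations above."""
    z = (h_prev, x_t)
    lin = lambda W, b: W[0] * z[0] + W[1] * z[1] + b
    f = sigmoid(lin(p["Wf"], p["bf"]))            # forget gate in (0, 1)
    i = sigmoid(lin(p["Wi"], p["bi"]))            # input gate
    c_tilde = math.tanh(lin(p["Wc"], p["bc"]))    # candidate memory
    c = f * c_prev + i * c_tilde                  # gated memory update
    o = sigmoid(lin(p["Wo"], p["bo"]))            # output gate
    h = o * math.tanh(c)                          # exposed hidden state
    return h, c

# Illustrative (untrained) gate parameters
p = {"Wf": (0.5, 0.5), "bf": 0.0, "Wi": (0.4, -0.2), "bi": 0.1,
     "Wc": (0.3, 0.8), "bc": 0.0, "Wo": (0.6, 0.1), "bo": 0.0}
h, c = 0.0, 0.0
for x in [0.2, -0.1, 0.4]:            # short toy input sequence
    h, c = lstm_step(x, h, c, p)
```

The additive update of the memory cell $C_t$ is what allows gradients to flow across many time steps without vanishing, in contrast to the purely multiplicative propagation of the plain RNN.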

2.2.11. Hybrid Frameworks

Hybrid modeling that integrates artificial neural structures with support vector regression provides a flexible approach for nonlinear and nonstationary sequences, as each component contributes distinct inductive properties. Neural architectures extract intricate relationships through adaptive layers, whereas support vector regression enhances generalization with scarce or noisy samples by controlling structural risk in feature-induced manifolds [45,88,89,90,91]. This framework examines sequential integration in which each learner approximates a separate source of variability. A scalar process $Z_t$ is expressed as
$$Z_t = X_t + Y_t,$$
with $X_t$ representing the dominant nonlinear behavior and $Y_t$ capturing residual structure. In a first arrangement, neural approximation generates $\hat{X}_t$ based on lagged vectors and yields residuals
$$\hat{\varepsilon}_t = Z_t - \hat{X}_t,$$
which are then represented through a support vector regression function
$$\hat{\varepsilon}_t = \hat{f}_{\mathrm{SVR}}\!\left(\hat{\varepsilon}_{t-1}, \hat{\varepsilon}_{t-2}, \ldots, \hat{\varepsilon}_{t-n}\right) + \Delta_t,$$
where $\hat{f}_{\mathrm{SVR}}$ denotes a nonlinear estimator defined in kernel space and $\Delta_t$ reflects unpredictable perturbations. Combined predictions follow
$$\hat{Z}_t = \hat{X}_t + \hat{Y}_t.$$
A second arrangement begins with support vector regression producing $\hat{X}_t$ and residuals
$$\hat{\varepsilon}_t = Z_t - \hat{X}_t,$$
which are subsequently approximated by a neural mapping that generates $\hat{Y}_t$ and produces forecasts
$$\hat{Z}_t = \hat{X}_t + \hat{Y}_t.$$
This study further evaluates combinations incorporating generalized autoregressive conditional heteroskedasticity, producing hybrid structures that merge conditional variance models with neural or kernel-based estimators. Inclusion of GARCH provides an explicit representation of volatility clustering, whereas the neural or support vector components refine structures unexplained by conditional variance. The resulting hybrids exploit residual learning to improve forecasting accuracy in environments marked by heteroskedasticity or abrupt swings. Neural-first configurations are advantageous when nonlinear features dominate, support vector-first structures perform well with limited samples or elevated conditional variability, and GARCH-based hybrids show particular suitability for sequences governed by persistent variance dynamics. Application to cryptocurrency markets enables a systematic assessment of flexibility, residual propagation, and predictive stability across all configurations.
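The residual-passing mechanics of the two arrangements can be illustrated with deliberately simple stand-ins: a lag-1 persistence predictor in place of the first-stage neural model and a closed-form AR(1) fit in place of the SVR residual learner. The series and both learners are toys; only the composition $\hat{Z}_t = \hat{X}_t + \hat{Y}_t$ is the point being demonstrated.

```python
def stage_one_persistence(z):
    """Stage 1 stand-in for the neural approximation: X_hat_t = z_{t-1}."""
    return [z[t - 1] for t in range(1, len(z))]

def fit_ar1(e):
    """Closed-form least squares for e_t ~ phi * e_{t-1}, standing in for
    the SVR residual learner of the first arrangement."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(e[t - 1] ** 2 for t in range(1, len(e)))
    return num / den if den else 0.0

z = [1.0, 1.2, 1.1, 1.3, 1.2, 1.4, 1.3, 1.5]              # toy series Z_t
x_hat = stage_one_persistence(z)                          # X_hat_t for t = 1..T-1
resid = [z[t] - x_hat[t - 1] for t in range(1, len(z))]   # eps_hat_t = Z_t - X_hat_t
phi = fit_ar1(resid)                                      # residual model
y_hat_next = phi * resid[-1]                              # Y_hat for the next step
z_hat_next = z[-1] + y_hat_next                           # Z_hat = X_hat + Y_hat
```

The second arrangement simply swaps the roles of the two stages; the residual series passed from stage one to stage two is constructed identically.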

2.2.12. Stacked Gradient Boosted Meta Regressor

Time-indexed quantitative behavior is represented by a univariate sequence $\{y_t\}_{t=1}^{T}$, where each realization $y_t \in \mathbb{R}$ corresponds to a scalar observation recorded at index $t$. A set of $K$ heterogeneous forecasting engines, denoted by $\mathcal{B} = \{b^{(1)}, b^{(2)}, \ldots, b^{(K)}\}$, provides point predictions for horizons defined by positive integers $h$. Elements in this collection include the five GARCH variants introduced in Section 2.2.1, together with the SVR, MLP, RNN and LSTM specifications. Historical information at time $t$ is summarized as $H(t) = (y_t, y_{t-1}, y_{t-2}, \ldots)$, which permits autoregressive inference under causal requirements. Each engine applies a model-specific transformation $\Psi_k(\cdot)$ that maps $H(t)$ into a representation aligned with its architecture and capable of encoding sequential structure, cyclical evolution or rolling summaries, consistent with recent advances in short-term demand prediction [109].

Each element $b^{(k)}$ generates a forecast
$$\hat{y}_{t+h}^{(k)} = b^{(k)}\!\left(\Psi_k(H(t))\right),$$
interpreted as an estimate of the conditional expectation of a future outcome based on information available at $t$. The resulting predictions form a concatenated vector
$$z(t) = \left(\hat{y}_{t+h}^{(1)}, \hat{y}_{t+h}^{(2)}, \ldots, \hat{y}_{t+h}^{(K)}, \tau_t\right)^{\top} \in \mathbb{R}^{K+1},$$
where $\tau_t \in \mathbb{R}$ conveys temporal context such as index progression or cyclical position. This stacked design captures complementarities among forecasting engines and systematic bias patterns, an effect documented in wind and load prediction studies [110]. A meta-level regression function $F_{\Theta}: \mathbb{R}^{K+1} \to \mathbb{R}$ with parameter vector $\Theta$ is trained to approximate
$$\hat{y}_{t+h} = F_{\Theta}\!\left(z(t)\right),$$
yielding refinements beyond individual engines. Extreme gradient boosting constructs $F_{\Theta}$ as an additive ensemble of $M$ regression trees,
$$F_{\Theta}\!\left(z(t)\right) = \sum_{m=1}^{M} T_m\!\left(z(t)\right),$$
where each $T_m$ divides the input space into regions associated with piecewise constant responses. This strategy, initially proposed as a scalable second-order boosting framework [111], relies on a twice-differentiable loss $L(y, \hat{y})$ and an ensemble $F_{\Theta}^{(m-1)}$ comprising the first $m-1$ trees. A second-order expansion provides the approximation
$$L\!\left(y_{t+h},\, F_{\Theta}^{(m-1)}(z(t)) + T_m(z(t))\right) \approx L\!\left(y_{t+h},\, F_{\Theta}^{(m-1)}(z(t))\right) + g_t\, T_m(z(t)) + \tfrac{1}{2}\, h_t\, T_m(z(t))^2,$$
with derivatives
$$g_t = \left.\frac{\partial L}{\partial \hat{y}}\right|_{\hat{y} = F_{\Theta}^{(m-1)}(z(t))}, \qquad h_t = \left.\frac{\partial^2 L}{\partial \hat{y}^2}\right|_{\hat{y} = F_{\Theta}^{(m-1)}(z(t))}.$$
Samples routed into leaf $j$ of $T_m$ form a set $S_{m,j}$, with aggregated quantities
$$G_{m,j} = \sum_{t \in S_{m,j}} g_t, \qquad H_{m,j} = \sum_{t \in S_{m,j}} h_t.$$
A penalized objective controls structural complexity,
$$J(T_m) = \sum_{j=1}^{J_m} \left[ G_{m,j}\, \omega_{m,j} + \tfrac{1}{2} \left(H_{m,j} + \lambda\right) \omega_{m,j}^2 \right] + \gamma J_m,$$
where $\omega_{m,j} \in \mathbb{R}$ is a leaf value, $\lambda > 0$ stabilizes leaf magnitudes and $\gamma > 0$ penalizes excess branching. Minimization yields
$$\omega_{m,j}^{*} = -\frac{G_{m,j}}{H_{m,j} + \lambda},$$
which enhances numerical stability. Multi-horizon prediction may rely on horizon-specific meta-regressors
$$\hat{y}_{t+h} = F_{\Theta_h}\!\left(z(t)\right), \qquad h = 1, \ldots, H,$$
or a multi-output formulation,
$$\hat{y}_{t+1:t+H} = F_{\Theta}\!\left(z(t)\right) \in \mathbb{R}^{H}.$$
Model assessment requires a walk-forward evaluation strategy [112]. With nonstationary behavior, forecast risk across a sliding window of width $W$ is expressed as
$$R_W(F_{\Theta}) = \mathbb{E}\!\left[\, L\!\left(y_{t+h},\, F_{\Theta}(z(t))\right) \,\middle|\, t - W < \tau \le t \,\right],$$
which facilitates detection of evolving regimes. Observable changes in $R_W$ motivate periodic updates of fitted structures. This stacked design captures nonlinear interactions, bias propagation and latent shifts in dynamics. Integrated predictions within a boosted meta-regressor improve robustness and reduce variance [109,110], while second-order boosting offers convergence properties suited to real-time forecasting pipelines [111].
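Independent of the particular meta-learner, the stacking step reduces to regressing realized targets on the vector $z(t)$ of base forecasts. The sketch below uses a ridge-penalized linear meta-learner as a stand-in for the XGBoost meta-regressor described above; the data and penalty are illustrative.

```python
def meta_fit(Z, y, lam=1e-3):
    """Fit a ridge-penalized linear meta-learner on base forecasts (a simple
    stand-in for the XGBoost meta-regressor): solve (Z'Z + lam*I) w = Z'y."""
    K = len(Z[0])
    A = [[sum(row[i] * row[j] for row in Z) + (lam if i == j else 0.0)
          for j in range(K)] for i in range(K)]
    b = [sum(row[i] * yt for row, yt in zip(Z, y)) for i in range(K)]
    for i in range(K):                      # Gaussian elimination (A is SPD)
        for j in range(i + 1, K):
            f = A[j][i] / A[i][i]
            A[j] = [a - f * ai for a, ai in zip(A[j], A[i])]
            b[j] -= f * b[i]
    w = [0.0] * K
    for i in reversed(range(K)):            # back substitution
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, K))) / A[i][i]
    return w

# z(t): one row per training date, one column per base model's forecast
Z = [[0.9, 1.1], [1.8, 2.1], [3.1, 2.8], [4.2, 3.9]]
y = [1.0, 2.0, 3.0, 4.0]                    # realized targets
w = meta_fit(Z, y)
y_hat = sum(wi * zi for wi, zi in zip(w, [5.0, 5.2]))   # meta-forecast, new date
```

Replacing the linear solve with an additive tree ensemble (as in XGBoost) lets the meta-learner also capture nonlinear interactions among base forecasts and the temporal context feature $\tau_t$.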

2.2.13. Estimation and Backtesting of Value at Risk

Value at Risk (VaR) characterizes the maximal anticipated loss over a fixed forecast horizon at a given confidence level. Let $r_t$ denote the logarithmic return introduced in Equation (1), and let $F(\cdot)$ denote the cumulative distribution function of $r_t$. For confidence level $q$, VaR at tail probability $\alpha = 1 - q$ satisfies
$$\alpha = P\!\left(r_t \le \mathrm{VaR}_t^{\alpha}\right) = F\!\left(\mathrm{VaR}_t^{\alpha}\right),$$
which implies that $\mathrm{VaR}_t^{\alpha}$ corresponds to the $\alpha$-quantile of the predictive return distribution. Within parametric volatility formulations such as GARCH families, the conditional mean is expressed as $\mu_t = E[r_t \mid \mathcal{F}_{t-1}]$ and the conditional standard deviation as $\sigma_{t|t-1} = \sqrt{\mathrm{Var}(r_t \mid \mathcal{F}_{t-1})}$, where $\mathcal{F}_{t-1}$ denotes the information available at $t-1$. The standardized innovation is defined as
$$\varepsilon_t = \frac{r_t - \mu_t}{\sigma_{t|t-1}},$$
which centers and rescales the return dynamics. Since returns are demeaned, the conditional mean equals zero, $\mu_t = 0$, and the standardized residuals remain centered with unit variance. Consequently, $\varepsilon_t$ follows a distribution $F_{\varepsilon}(\cdot)$ with zero mean and unit variance. In such a structure, the one-step-ahead VaR forecast is computed as
$$\mathrm{VaR}_t^{\alpha} = \mu_t + F_{\varepsilon}^{-1}(\alpha)\, \sigma_{t|t-1},$$
where $F_{\varepsilon}^{-1}(\alpha)$ denotes the inverse quantile function of the standardized innovation at probability $\alpha$. When $\varepsilon_t$ is assumed Gaussian, the required quantile is obtained from the standard normal distribution. Parameters are estimated recursively over rolling windows of fixed length $\omega$, yielding sequences of out-of-sample VaR forecasts suited for assessing performance under evolving volatility conditions [113,114]. The reliability of these forecasts is examined through backtesting, which evaluates whether the empirical frequency and temporal structure of violations align with the nominal $\alpha$. A violation occurs when $r_t < \mathrm{VaR}_{t|t-1}^{\alpha}$. Define the indicator
$$I_t(\alpha) = \mathbf{1}\!\left\{ r_t < \mathrm{VaR}_{t|t-1}^{\alpha} \right\},$$
which equals one under violation conditions and zero otherwise. Total violations are given by $N = \sum_{t=1}^{T} I_t(\alpha)$, and the empirical rate is $\hat{p}_{\alpha} = N/T$. The Unconditional Coverage (UC) procedure of [113] assesses the equality between $\hat{p}_{\alpha}$ and $\alpha$ under $H_0: p = \alpha$. Its likelihood ratio is
$$LR_{UC} = -2 \ln \frac{(1-\alpha)^{T-N}\, \alpha^{N}}{(1-\hat{p})^{T-N}\, \hat{p}^{N}},$$
which converges to $\chi^2(1)$. Rejection indicates inadequate unconditional coverage. The Independence (IND) test of [115] evaluates the serial independence of violations. Let $N_{ij}$ denote the number of transitions from state $i$ at $t-1$ to state $j$ at $t$, with $i, j \in \{0, 1\}$. The estimated conditional probabilities are
$$\hat{\pi}_{01} = \frac{N_{01}}{N_{00} + N_{01}}, \qquad \hat{\pi}_{11} = \frac{N_{11}}{N_{10} + N_{11}}.$$
Under the null of independence, $\pi_{01} = \pi_{11} = \pi$, estimated by the pooled rate $\hat{\pi} = (N_{01} + N_{11})/(N_{00} + N_{01} + N_{10} + N_{11})$. The likelihood ratio is
$$LR_{IND} = -2 \ln \frac{(1-\hat{\pi})^{N_{00}+N_{10}}\, \hat{\pi}^{N_{01}+N_{11}}}{(1-\hat{\pi}_{01})^{N_{00}}\, \hat{\pi}_{01}^{N_{01}}\, (1-\hat{\pi}_{11})^{N_{10}}\, \hat{\pi}_{11}^{N_{11}}},$$
which converges to $\chi^2(1)$. The Conditional Coverage (CC) statistic of [115] combines both aspects as
$$LR_{CC} = LR_{UC} + LR_{IND},$$
which converges to $\chi^2(2)$ and, when not rejected, supports accurate coverage with independent violations. The Dynamic Quantile (DQ) procedure introduced by [114] evaluates whether exceedances display zero conditional mean and an absence of serial dependence. Let
$$\mathrm{Hit}_t(\alpha) = I_t(\alpha) - \alpha.$$
With correct calibration, $E[\mathrm{Hit}_t(\alpha) \mid \mathcal{F}_{t-1}] = 0$. This property can be examined through the regression
$$\mathrm{Hit}_t(\alpha) = \gamma_0 + \sum_{i=1}^{k} \gamma_i\, \mathrm{Hit}_{t-i}(\alpha) + \sum_{j=1}^{m} \delta_j\, Z_{t-j} + u_t,$$
where $Z_{t-j}$ denotes lagged regressors such as returns or previous VaR forecasts, and $u_t$ is an i.i.d. disturbance. The null hypothesis $H_0: \gamma_0 = \cdots = \gamma_k = \delta_1 = \cdots = \delta_m = 0$ implies correct dynamic quantile calibration. The statistic $DQ = n R^2$ converges to $\chi^2(p)$ with $p = k + m + 1$. Finally, the Dynamic Binary (DB) formulation generalizes DQ through a logistic structure [116,117]. The conditional violation probability is modeled as
$$P\!\left(I_t(\alpha) = 1 \mid \mathcal{F}_{t-1}\right) = G\!\left(\beta_0 + \sum_{i=1}^{p} \beta_i\, I_{t-i}(\alpha) + \sum_{j=1}^{q} \theta_j\, X_{t-j}\right),$$
where $G(x) = \exp(x)/(1+\exp(x))$, $X_{t-j}$ denotes auxiliary variables such as lagged returns or volatility forecasts, and the coefficients $\beta_i, \theta_j$ are parameters. Under $H_0: \beta_i = \theta_j = 0$ for all $i, j$, the violation probability equals $\alpha$. This structure captures nonlinear persistence and feedback within exceedances. Joint application of the UC, IND, CC, DQ and DB procedures yields a coherent statistical framework for assessing the calibration, serial behavior and robustness of VaR forecasts under evolving volatility conditions.
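The UC and IND likelihood-ratio statistics can be computed directly from the violation indicator series. A minimal sketch, using a toy indicator sequence and the pooled transition probability under the independence null:

```python
import math

def lr_uc(hits, alpha):
    """Kupiec unconditional-coverage LR statistic (asymptotically chi2(1))."""
    T, N = len(hits), sum(hits)
    p_hat = N / T
    ll = lambda p: (T - N) * math.log(1 - p) + N * math.log(p)
    return -2.0 * (ll(alpha) - ll(p_hat))

def lr_ind(hits):
    """Christoffersen independence LR statistic (asymptotically chi2(1)),
    built from the 2x2 transition counts of the violation indicator and the
    pooled violation rate under the independence null."""
    n = [[0, 0], [0, 0]]
    for prev, cur in zip(hits, hits[1:]):
        n[prev][cur] += 1
    pi01 = n[0][1] / (n[0][0] + n[0][1])
    pi11 = n[1][1] / (n[1][0] + n[1][1])
    pi = (n[0][1] + n[1][1]) / (len(hits) - 1)
    ll = lambda p01, p11: (n[0][0] * math.log(1 - p01) + n[0][1] * math.log(p01)
                           + n[1][0] * math.log(1 - p11) + n[1][1] * math.log(p11))
    return -2.0 * (ll(pi, pi) - ll(pi01, pi11))

# Toy violation indicator: 1 marks a day where r_t < VaR
hits = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
uc = lr_uc(hits, alpha=0.05)
ind = lr_ind(hits)
cc = uc + ind          # conditional-coverage statistic, chi2(2)
```

The sketch assumes both violation states occur in the sample; degenerate cases (no violations, or no repeated violations) require the usual zero-count conventions before evaluating the log-likelihoods.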

2.2.14. Experimental Design

This study adopts a unified experimental design for volatility forecasting and risk evaluation with strict temporal causality. The analysis focuses on one-step-ahead ( h = 1 ) forecasting of conditional variance and the associated VaR measures. All forecasts are generated exclusively from information available at the forecast origin, thereby ensuring chronological consistency and precluding any form of look-ahead bias. Moreover, model assessment is conducted within a common walk-forward evaluation framework applied consistently across econometric, machine learning, and deep learning models. Although these model classes differ in their estimation and training mechanisms, they are evaluated under identical temporal constraints. Consequently, the resulting performance comparisons remain methodologically coherent. The detailed implementation of temporal validation, model estimation, and training procedures is provided in the subsequent subsections.

Temporal Validation Framework

The temporal validation framework establishes a rigorous approach to model training and evaluation under strict chronological ordering, ensuring that all forecasts rely exclusively on information available prior to the forecast origin. Within this framework, the univariate volatility series is reformulated into a two-dimensional supervised input–output representation suitable for training neural networks and support vector regression models. This transformation is implemented through the construction of lagged feature vectors of realized variance, with alternative input-window lengths denoted by $\alpha_{\mathrm{inp}}$. Candidate lag structures $\alpha_{\mathrm{inp}} \in \{7, 14, 21, 28\}$ are systematically evaluated to determine the configuration that maximizes predictive accuracy.
The forecasting horizon remains fixed at one-step-ahead ( h = 1 ) , which enables rolling-origin evaluation over the test set and ensures direct comparability with conditional variance forecasts generated by GARCH-family models with an identical temporal assessment protocol. Realized variance is computed using a fixed aggregation window of length ω = 7 , without adopting multi-horizon or multi-window configurations. The rationale for this specification appears in Section 2.1, while Table 1 summarizes the role of each hyperparameter within this configuration.
Model validation is conducted through a forward-expanding cross-validation procedure based on the TSCV methodology. As illustrated in Figure 5, each fold generates sequential partitions consisting of training, validation, and test segments that strictly preserve the temporal ordering of the data. At each iteration, training inputs $X_{\mathrm{train}}$ and corresponding targets $y_{\mathrm{train}}$ are constructed first, followed by validation subsets $X_{\mathrm{cv}}$ and $y_{\mathrm{cv}}$, and finally by test sequences $X_{\mathrm{test}}$ and $y_{\mathrm{test}}$. Out-of-fold one-step-ahead forecasts are produced for every candidate model at each iteration. This procedure enforces chronological integrity and effectively prevents information leakage throughout the evaluation process.
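The forward-expanding partitions described above can be generated with a small index routine; the fold count and segment lengths below are illustrative, not the study's configuration.

```python
def expanding_folds(n, n_folds, cv_len, test_len):
    """Forward-expanding time-series folds: fold k trains on all data before
    its validation block, which in turn strictly precedes its test block,
    so no fold ever sees future information."""
    step = (n - cv_len - test_len) // n_folds
    for k in range(1, n_folds + 1):
        tr_end = step * k
        train = list(range(0, tr_end))
        cv = list(range(tr_end, tr_end + cv_len))
        test = list(range(tr_end + cv_len, tr_end + cv_len + test_len))
        yield train, cv, test

# Illustrative configuration: 100 observations split into 4 forward folds
for train, cv, test in expanding_folds(n=100, n_folds=4, cv_len=10, test_len=5):
    # every segment strictly follows the one it is trained/validated against
    assert max(train) < min(cv) and max(cv) < min(test)
```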

Model Training and Hyperparameter Configuration

The experimental design implements a reproducible pipeline for one-step-ahead forecasting of Bitcoin conditional variance. The process begins with daily closing prices, from which log-returns are computed. Realized variance is subsequently derived using a fixed window ω = 7 . The dataset is partitioned chronologically into training and testing subsets comprising 80 percent and 20 percent of the observations, respectively, yielding n = 418 out-of-sample observations. Standardization parameters are estimated exclusively on the training set and subsequently applied to the test set to guarantee leakage-free preprocessing.
Econometric models from the generalized autoregressive conditional heteroskedasticity family, including GARCH-t, GJR-GARCH-t, EGARCH-t, APARCH-t, and GARCH-normal, are estimated directly on the return series $r_t$ using a rolling-origin walk-forward scheme. At each forecast origin $t$, all model parameters, including the degrees of freedom of the Student-t distribution when applicable, are re-estimated using information available up to time $t-1$. These models produce one-step-ahead conditional variance forecasts $\hat{\sigma}_{t|t-1}^2$ that constitute model-implied predictions of the unobserved realized variance for period $t$.
Machine-learning models, such as support vector regression, multilayer perceptrons, recurrent neural networks, and long short-term memory architectures, are trained directly on observed realized-variance measures. These models receive as inputs lagged vectors of realized variance $x_t(\omega, \alpha_{\mathrm{inp}})$ and are optimized to predict the same target quantity $\sigma_t^2$ that the econometric specifications forecast. Therefore, although econometric and machine-learning approaches differ in their input representations, both modeling streams generate forecasts that are conceptually and numerically comparable because they refer to identical conditional variance outcomes.
Hyperparameters for all machine-learning models, as well as for the stacking meta-learner based on extreme gradient boosting, are selected exclusively within the training sample using a leakage-safe blocked time-series cross-validation procedure with sequential folds. Each validation block is constructed to occur strictly after its corresponding training block, thereby preserving the temporal ordering of information. The optimal configuration is selected by minimizing the average validation root mean squared error across folds and is subsequently refitted on the complete training window before out-of-sample evaluation. Consequently, this unified evaluation protocol guarantees that forecasts generated by econometric and machine-learning models remain fully aligned and directly comparable despite their different estimation mechanisms.
All experiments were executed on a workstation equipped with an AMD Ryzen 9 processor, an NVIDIA RTX 4070 GPU, and 32 GB of RAM. Using this hardware configuration, the complete training pipeline required several days of computation, primarily due to two factors. First, deep-learning models were trained repeatedly across multiple blocked time-series cross-validation folds with early stopping. Second, GARCH-family models were recursively re-estimated at each forecasting origin using quasi-maximum likelihood procedures. In contrast, the computational cost of fitting the XGBoost meta-learner was comparatively minor, typically requiring only minutes once base-model forecasts were available. The overall computational burden occurs exclusively during the offline training stage, whereas the operational forecasting phase remains lightweight and suitable for real-time deployment.
Identical settings are adopted for all time-series cross-validation routines, with numJumps and stepsToForecast both set to 1, following the timeseries-cv framework of [61]. Regularization strategies, including $\ell_1$ and $\ell_2$ penalties together with dropout, are applied to mitigate overfitting. A detailed summary of the hyperparameter search space and the selected configurations appears in Table 4.

Hybrid and Stacked Architectures

Hybrid architectures are developed through a sequential modeling strategy in which a single GARCH specification is first estimated to characterize conditional variance dynamics, and either a support vector regression model or a neural network is subsequently applied to the resulting standardized residuals. This two-stage configuration enables the secondary model to capture nonlinear structures that are not fully accommodated by the parametric volatility component, thereby enhancing predictive performance while preserving the econometric interpretability of the underlying GARCH framework as illustrated in Figure 6.
Moreover, a stacking ensemble denoted STACKED is constructed using XGBoost as the meta-learner. The meta-model receives as inputs the one-step-ahead forecasts generated by all base learners, including the GARCH specifications and the machine learning models. By integrating heterogeneous predictive signals within a unified learning structure, the stacking approach leverages complementarities across model classes and mitigates limitations associated with individual specifications. Consequently, this integration produces superior generalization under sequential forecasting conditions.
From a methodological perspective, the proposed stacking framework remains inherently asset-independent. The architecture operates exclusively on predictive signals generated by diverse base models and learns to combine them through a data-driven meta-learning mechanism. Therefore, its effectiveness depends on the presence of complementary predictive information across models rather than on any asset-specific characteristic. The same mechanism applies without structural modification to any financial time series exhibiting volatility clustering and nonlinear dynamics. Across different assets, only the relative importance assigned to each base model by the XGBoost meta-learner may vary, as quantified through entropy-based feature-importance measures. However, the overall implementation, training protocol, and predictive architecture remain fully valid and directly replicable for any highly volatile asset with complexity comparable to Bitcoin.
Finally, the conditional variance forecasts produced by all hybrid and ensemble configurations are converted into conditional standard deviations. Value-at-Risk measures are then obtained by combining a robust location estimator with quantiles derived from a Student-t distribution. This formulation effectively accommodates the heavy-tailed characteristics commonly observed in financial return series and ensures that the resulting risk estimates remain consistent with empirically documented deviations from Gaussian assumptions.
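The variance-to-VaR conversion reduces to $\mathrm{VaR} = \mu + q_{\alpha}\, \sigma$. The sketch below uses the standard library's Gaussian quantile as a stand-in for the tail quantile; the study instead uses Student-t quantiles (for example, via scipy.stats.t.ppf with $\nu = 12$, rescaled to unit variance, for the machine-learning models), which produce more negative tail quantiles and therefore more conservative VaR.

```python
from statistics import NormalDist

def var_from_variance(var_forecast, mu=0.0, alpha=0.05, quantile=None):
    """Map a conditional-variance forecast to a one-step VaR via
    VaR = mu + q_alpha * sigma. The Gaussian quantile from the standard
    library is a stand-in here; the study applies Student-t quantiles
    (heavier tails, hence more negative q_alpha) instead."""
    sigma = var_forecast ** 0.5
    q = NormalDist().inv_cdf(alpha) if quantile is None else quantile
    return mu + q * sigma

# sigma = 2% daily volatility -> 5% one-day Gaussian VaR of about -3.3%
var_5pct = var_from_variance(0.0004, alpha=0.05)
```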

2.3. Student-t Quantiles, Unified Temporal Validation, and VaR Backtesting

Value-at-Risk forecasts are constructed using Student-t quantiles and evaluated through a backtesting framework designed to assess empirical adequacy under strict out-of-sample conditions. Conditional variance forecasts are consistently transformed into risk measures across all competing modeling approaches, ensuring that exceedance behavior and statistical reliability are examined within a unified and fully comparable evaluation environment. For models that explicitly assume Student-t innovations, including GARCH-t and related variants, this transformation relies on degrees of freedom jointly estimated with the remaining parameters at each re-estimation step. In the rolling-origin walk-forward scheme, the Student-t degrees-of-freedom parameter $\nu$ is updated at every estimation point. Specifically, for each out-of-sample evaluation date $t$, the model is re-estimated using information available up to $t-1$, producing a one-step-ahead conditional variance forecast together with an updated estimate $\nu_t$. The corresponding VaR forecast is obtained by combining the predicted conditional standard deviation with the Student-t quantile at confidence level $\alpha$, evaluated using the iteration-specific parameter $\nu_t$.
Machine learning and deep learning models, including the SVR, MLP, RNN, and LSTM architectures, generate point forecasts of conditional variance without explicitly specifying a parametric innovation distribution. Therefore, comparability with the econometric models requires a consistent mapping from variance forecasts to VaR. This mapping is implemented using a fixed Student-t tail specification: the degrees-of-freedom parameter is set to $\nu = 12$ for all out-of-sample periods, and Student-t quantiles are computed accordingly. This approach provides a parsimonious heavy-tailed calibration while avoiding an additional time-varying tail-parameter estimation layer for models trained exclusively to predict variance rather than the full return density [118]. The procedure aligns with the established backtesting literature, which demonstrates that distributional assumptions and parameter estimation are integral components of the risk-measurement process and can materially influence backtest outcomes. VaR backtesting is conducted over the entire out-of-sample period by defining an exceedance whenever $r_t < \mathrm{VaR}_t^{\alpha}$ and constructing the associated indicator $I_t(\alpha)$. Model adequacy is assessed using tests of unconditional coverage based on the Kupiec test, independence of exceedances based on the Christoffersen test, and conditional coverage. When reported, these procedures are complemented by dynamic assessments, including dynamic quantile and duration-based tests, which jointly evaluate correct exceedance frequency and the absence of clustering or serial dependence in violations.
Temporal validation for machine learning and deep learning models is explicitly aligned with the rolling estimation protocol applied to GARCH-type models through the generation of all out-of-sample results within a common rolling-origin walk-forward evaluation framework that enforces strict causality. For every evaluation date $t$ within the test segment, no model exploits information beyond $t-1$. Hyperparameter selection for machine learning and deep learning models is performed exclusively within the training sample using leakage-safe blocked time-series cross-validation with sequential folds, as illustrated in Figure 5. Each validation block occurs strictly after its corresponding estimation block, thereby replicating the same walk-forward logic employed in out-of-sample evaluation while remaining restricted to the training data. The final configuration is selected by minimizing the average validation root mean square error and is subsequently refitted on the full training window before forecasting over the held-out 20 percent test segment.
In parallel, GARCH-family models are estimated through a rolling re-estimation procedure applied directly to returns. At each evaluation date $t$, parameters are re-estimated using observations available up to $t-1$, and a one-step-ahead conditional variance forecast $\hat{\sigma}_{t|t-1}^2$ is generated. TSCV is not employed for GARCH models, as these approaches do not require the series to be restructured into supervised input–output representations for hyperparameter tuning, in contrast to machine learning and deep learning methods. Competing GARCH specifications are instead evaluated across model families and innovation distributions under an identical sequential estimation protocol. Consequently, time-series cross-validation for machine learning and deep learning models and rolling estimation for GARCH models constitute two coherent implementations of a single unified temporal framework grounded in strict causality, one-step-ahead forecasting, and sequential evaluation over the full test horizon, thereby ensuring complete comparability across model classes.
The same temporal principle is preserved within the stacked framework. During training, meta-features are constructed from out-of-fold base-model predictions generated through time-series cross-validation. During testing, the meta-learner operates exclusively on base forecasts computed using information available up to $t-1$, ensuring that strict chronological integrity is maintained throughout the stacking procedure.

Performance Metrics

Forecast evaluation relies on a carefully selected set of complementary performance metrics designed to capture distinct dimensions of predictive accuracy. This study employs three loss functions: the root mean square error (RMSE), the quasi-likelihood loss (QLIKE), and the symmetric mean absolute percentage error (SMAPE). Model selection is conducted primarily on the basis of RMSE, whereas QLIKE and SMAPE are used as complementary evaluation criteria. RMSE provides an interpretable and scale-consistent measure that penalizes large forecast deviations through a squared loss structure, which makes it particularly appropriate in settings characterized by heavy-tailed distributions and extreme observations [119]. Empirical evidence demonstrates that RMSE-driven optimization improves forecast accuracy across multiple horizons and enhances robustness under parameter uncertainty and distributional shift conditions [120,121]. By assigning disproportionate weight to large errors, RMSE effectively captures economically meaningful deviations in conditional variance forecasts.
QLIKE evaluates asymmetric forecast errors in conditional variance and is strictly consistent for σ². Its robustness to noise in volatility proxies, together with its analytical equivalence to the Gaussian log-likelihood, justifies its widespread use as a benchmark loss function [55,122]. Because near-zero variance proxies can destabilize the ratio and logarithm terms, a small regularization constant ε = 10⁻⁸ is introduced to ensure numerical stability without affecting relative model rankings. The resulting loss function is defined as

QLIKE_ε = (1/n) Σ_{t=1}^{n} [ (σ_t² + ε)/(σ̂_t² + ε) − log( (σ_t² + ε)/(σ̂_t² + ε) ) − 1 ],   ε = 10⁻⁸,
where σ t 2 denotes the realized conditional variance and σ ^ t 2 represents its corresponding forecast. SMAPE provides a normalized, scale-independent measure of accuracy that remains stable in the vicinity of zero and mitigates distortions commonly associated with percentage-based loss functions. Its symmetric formulation facilitates interpretation across heterogeneous volatility magnitudes. Therefore, it is particularly well suited to highly volatile markets such as cryptocurrencies [123]. SMAPE is defined as
SMAPE = (100%/n) Σ_{t=1}^{n} |σ̂_t² − σ_t²| / [ (|σ̂_t²| + |σ_t²|) / 2 ].
Overall, RMSE serves as the primary ranking criterion, while QLIKE and SMAPE enrich the evaluation by incorporating both scale-dependent and scale-independent perspectives. In this evaluation protocol, the STACKED model attains the lowest RMSE, which demonstrates systematic gains in conditional variance forecasting and aligns with the improved VaR calibration documented in subsequent analyses. Building on this evaluation framework, the subsequent section presents a detailed comparative assessment of forecasting performance across all competing models.
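For concreteness, the three loss functions can be implemented directly from the definitions above. This is a minimal sketch assuming NumPy arrays of realized and forecast variances; the function names are illustrative and the ε constant matches the QLIKE_ε definition in the text.

```python
import numpy as np

EPS = 1e-8  # regularization constant from the QLIKE_eps definition

def rmse(s2, s2_hat):
    """Root mean square error between realized and forecast variance."""
    s2, s2_hat = np.asarray(s2), np.asarray(s2_hat)
    return float(np.sqrt(np.mean((s2_hat - s2) ** 2)))

def qlike_eps(s2, s2_hat, eps=EPS):
    """Regularized QLIKE: mean of r - log(r) - 1 with r the variance ratio."""
    r = (np.asarray(s2) + eps) / (np.asarray(s2_hat) + eps)
    return float(np.mean(r - np.log(r) - 1.0))

def smape(s2, s2_hat):
    """Symmetric MAPE in percent, stable near zero variance levels."""
    s2, s2_hat = np.asarray(s2), np.asarray(s2_hat)
    denom = (np.abs(s2_hat) + np.abs(s2)) / 2.0
    return float(100.0 * np.mean(np.abs(s2_hat - s2) / denom))
```

A perfect forecast drives all three losses to zero, while QLIKE's asymmetry penalizes under-prediction of variance more heavily than over-prediction of the same relative magnitude, which is the economically relevant direction for risk management.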

3. Results and Discussion

3.1. Conditional Variance, Forecasting Accuracy and Diagnostic Assessment

To address the methodological gap identified in Section 1 regarding the need for a unified framework that combines econometric structure with the flexibility of machine learning, a comprehensive forecasting experiment was implemented. The analysis evaluated a diversified ensemble composed of four machine-learning systems (SVR, MLP, RNN, and LSTM) and five GARCH specifications (GARCH-normal, GARCH-t, GJR-GARCH-t, EGARCH-t, and APARCH-t), together with six hybrid families constructed through sequential compositions of individual learners (SVR–ANN, ANN–SVR, SVR–GARCH, GARCH–SVR, ANN–GARCH, and GARCH–ANN). This experimental configuration responds directly to the existing literature on hybrid and ensemble approaches [34,37,39] and extends previous contributions through a systematic assessment of stacking as a meta-learning strategy. For conciseness, Table 5 reports the five configurations that achieved the highest empirical accuracy among the full set of hybrid structures, together with the STACKED system, which aggregates predictions from all base learners using an XGBoost meta-learner. The proposed stacking architecture constitutes the core methodological contribution by coherently integrating the interpretable volatility dynamics inherent in GARCH models with the nonlinear representational capacity of neural networks [45,47]. Consequently, this approach overcomes the documented limitations of isolated econometric and machine-learning models described in prior studies [17,18,19].
The empirical assessment reported in Table 5, together with Figure 7, Figure 8, Figure 9, Figure 10 and Figure 11, demonstrates that the STACKED ensemble achieves the lowest root mean squared error while preserving favorable regularized QLIKE (QLIKE_ε) and symmetric mean absolute percentage error performance. The Brock–Dechert–Scheinkman (BDS) test produces a p-value of 0.36, which exceeds the conventional significance threshold of 0.05, indicating the absence of nonlinear dependence in the residual series. This diagnostic result accords with the methodological arguments advanced by Cavicchioli et al. [20] regarding the relevance of examining higher-order moment structures and confirms the adequacy of the stacked mechanism as a coherent data-generating process for subsequent risk analysis. The feature-importance configuration illustrated in Figure 11 (center) indicates that the multilayer perceptron constitutes the dominant contributor to predictive accuracy, followed by heavy-tailed GARCH components. This outcome provides evidence that the ensemble successfully combines nonlinear feature extraction with an econometric foundation, thereby improving sensitivity to tail dynamics without generating excessive expansion in the upper quantiles at α = 5%. Figure 7 further demonstrates a stable correspondence between predicted variance σ̂² and realized conditional variance σ_R² across the entire evaluation horizon. In particular, the STACKED ensemble responds rapidly to the abrupt volatility surge observed in March 2020, reaches an appropriate peak on 12 March, and subsequently reverts to pre-shock levels without persistent overshooting. This behavior surpasses the performance of all alternative hybrid architectures.
The adaptive response observed during this regime shift aligns with the empirical insights derived from Markov-switching threshold models discussed by Alraddadi [16], demonstrating the practical capacity of the ensemble to accommodate structural breaks without requiring explicit regime specification.
The learning trajectories depicted in Figure 8 decline smoothly and display a limited discrepancy between training and validation errors, with early stopping activated near optimal iterations for the multilayer perceptron at epoch 188, for the long short-term memory network at epoch 98, and for the recurrent neural network at epoch 112. This pattern confirms the stability of the optimization process for neural components embedded within individual, hybrid, and stacked systems, and alleviates concerns regarding overfitting that frequently restrict deep learning applications [40,44]. Taken together, these diagnostic findings indicate that the STACKED ensemble operates as a robust engine for conditional variance forecasting. The accurate variance projections σ̂² generated by this framework provide a reliable foundation for improved alignment of Value-at-Risk estimates, as demonstrated in the subsequent subsection.

3.2. Value-at-Risk Implementation and Backtesting Assessment

The conditional variance projections σ̂² obtained from individual, hybrid, and stacked formulations serve as primary inputs for out-of-sample VaR estimation. This design establishes a direct linkage between forecast accuracy and tail-risk measurement quality, addressing the core requirement emphasized in the Introduction: that reliable VaR depends critically on precise variance forecasts. Each model generates VaR by combining its projected conditional variance with an appropriate innovation distribution; performance is then examined through regulatory backtesting procedures that assess coverage and independence. Figure 9, Figure 10 and Figure 11 illustrate this mechanism across all architectures.
Figure 9. Out-of-sample conditional VaR coverage and exceedances for individual ML and GARCH models. Gray dots represent daily returns.
Exceedances concentrate around March 2020, with a peak on 12 March, and remain sparse outside this stress episode. Heavy-tailed GARCH variants maintain realized violation frequencies p̂_α near nominal targets for α = 1% and 2.5%, whereas standalone machine learning systems tend to produce excessively wide envelopes at α = 5%, reflecting the calibration challenges noted in [40,41]. Hybrid MLGARCH structures reduce such overcoverage while preserving tail responsiveness and generate a less clustered pattern of breaches, suggesting weaker serial dependence. Among all candidates, the stacked formulation achieves the most consistent calibration. Its VaR bands expand rapidly during the March 2020 shock and contract appropriately once market conditions stabilize; realized violation frequencies remain close to nominal values across all confidence levels; stacking weights emphasize heavy-tailed GARCH components alongside nonlinear ML contributors; and realized conditional variance aligns closely with its predicted counterpart even at the volatility peak. These outcomes confirm that deriving VaR from σ̂² and integrating individual and hybrid mechanisms through stacking substantially improves risk calibration, thereby supporting more efficient capital allocation [4,5].
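The mapping from a variance forecast to a VaR band and its realized violation frequency can be sketched with standard-library tools only. Gaussian innovation quantiles are used purely for illustration (the Student-t quantiles used for the GARCH-t family in the text would produce wider tails), and the function names are hypothetical.

```python
from statistics import NormalDist

def var_forecast(sigma2_hat, alpha=0.01, mu=0.0):
    """One-day-ahead VaR from conditional-variance forecasts.

    VaR_t(alpha) = mu + sqrt(sigma2_hat_t) * z_alpha, where z_alpha is
    the alpha-quantile of the innovation distribution (standard normal
    here for illustration; heavy-tailed innovations widen the band).
    """
    z = NormalDist().inv_cdf(alpha)  # negative for small alpha
    return [mu + (s2 ** 0.5) * z for s2 in sigma2_hat]

def violation_rate(returns, var_series):
    """Realized share of days on which the return breaches VaR."""
    hits = [r < v for r, v in zip(returns, var_series)]
    return sum(hits) / len(hits)
```

Under correct calibration the realized violation rate should stay close to the nominal level α, which is exactly the property the backtests in the next subsection examine.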
Figure 10. Out-of-sample VaR performance for hybrid MLGARCH models (June 2019–July 2020). The 3 × 3 panel shows one-day-ahead VaR forecasts at three confidence levels (95%, 97.5%, and 99%) alongside actual returns, with exceedances marked where returns fall below the predicted VaR threshold. Gray dots represent daily returns.
Figure 11. Stacked ensemble performance and diagnostics. (Left): Out-of-sample VaR forecasts (95%, 97.5%, and 99% levels) with return exceedances. Gray dots represent daily returns. (Center): Feature-importance plot showing the relative contribution (stacking weights) of each base learner in the XGBoost meta-model. (Right): Calibration plot comparing realized conditional variance ( σ 2 ) against predicted variance ( σ ^ 2 ) during the evaluation period.
Table 6 synthesizes out-of-sample diagnostics for α ∈ {1%, 2.5%, 5%} using the Kupiec unconditional coverage (UC), independence (IND), conditional coverage (CC), dynamic quantile (DQ), and dynamic binary (DB) tests. Throughout these assessments, a p-value below 0.05 implies rejection of the null hypothesis and indicates inadequate performance. At α = 1%, heavy-tailed GARCH models (GARCH-t, EGARCH-t, APARCH-t, GJR-GARCH-t) together with several recurrent networks maintain violation frequencies near 1% and seldom fail UC or CC, signaling satisfactory tail calibration. Pure machine learning architectures, especially SVR and occasionally the MLP, display systematic overcoverage (p̂_{1%} < 1%) and occasional CC or IND rejection, largely driven by temporal concentration of exceedances during the March 2020 event. This pattern persists at α = 2.5%, where most heavy-tailed GARCH and hybrid mechanisms avoid UC or IND rejection, whereas several pure ML architectures continue to fail UC due to overcoverage and sometimes CC due to dependence in breaches. At α = 5%, such overcoverage becomes more pronounced for both ML architectures and some standard GARCH specifications, resulting in UC rejection. Hybrid alternatives mitigate this effect and yield fewer concurrent UC and CC failures, suggesting enhanced calibration. Across all probability levels, the stacked formulation delivers the most balanced results: violation frequencies remain near nominal targets, and UC or CC rejections occur infrequently. These observations support the conclusion that VaR constructed from stacked conditional variance projections yields statistically reliable risk calibration and enables more efficient capital usage throughout the evaluation period, directly addressing the institutional need for robust, auditable risk measures highlighted in Section 1.
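As an illustration of the first of these diagnostics, the Kupiec unconditional-coverage likelihood-ratio test can be written in a few lines. This is a generic textbook implementation, not the authors' code; the χ²(1) p-value is obtained through the identity P(χ²₁ > c) = erfc(√(c/2)).

```python
import math

def kupiec_uc(n_obs, n_viol, alpha):
    """Kupiec unconditional-coverage LR test.

    H0: the VaR violation probability equals the nominal level alpha.
    LR_UC is asymptotically chi-square with 1 degree of freedom.
    """
    pi_hat = n_viol / n_obs
    if n_viol in (0, n_obs):  # degenerate MLE: likelihood terms vanish
        log_l1 = 0.0
    else:
        log_l1 = ((n_obs - n_viol) * math.log(1 - pi_hat)
                  + n_viol * math.log(pi_hat))
    log_l0 = ((n_obs - n_viol) * math.log(1 - alpha)
              + n_viol * math.log(alpha))
    lr = -2.0 * (log_l0 - log_l1)
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value
```

For example, 3 violations in 250 trading days at α = 1% (2.5 expected) is consistent with the null, whereas 15 violations at the same level yields a decisive rejection, mirroring how the overcovering and undercovering models in Table 6 are flagged.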

Diebold–Mariano Test for Forecast Comparison

To provide formal statistical validation of the forecasting advantage of the proposed ensemble, a two-sided Diebold–Mariano test was implemented to determine whether the stacked and hybrid mechanisms generate significantly lower conditional variance loss relative to individual benchmark models. This inferential comparison responds to the requirement for rigorous evaluation emphasized in the Introduction and supplies empirical evidence that the ensemble delivers benefits beyond simple point-forecast accuracy measures. The results presented in Table 7 demonstrate consistent and statistically significant improvements for the STACKED ensemble. Pairwise comparisons against all machine-learning benchmarks, including SVR, MLP, LSTM, and RNN, as well as against GARCH-family specifications, namely, GARCH-normal, GARCH-t, GJR-GARCH-t, EGARCH-t, and APARCH-t, yield p-values close to 0.00265 . These outcomes indicate a systematic reduction in average conditional variance loss attributable to the meta-learning integration.
Hybrid structures display more selective gains. For example, the LSTM–GJR-GARCH-t configuration significantly outperforms the EGARCH-t benchmark with a p-value near 0.029 and approaches statistical significance relative to the RNN model with a p-value close to 0.051. Similarly, the GARCH-t–RNN hybrid surpasses the SVR benchmark with a p-value around 0.048. For the remaining pairwise comparisons, p-values exceed the 0.05 threshold, indicating no significant difference in mean loss. This pattern is consistent with the capacity of heavy-tailed GARCH formulations to capture volatility clustering and excess kurtosis, as documented in the earlier literature [72]. The repeated occurrence of identical p-values across several comparisons, specifically 0.288881, 0.090084, 0.068224, and 0.256055, arises from equal loss differentials generated by the common rolling-window evaluation protocol and does not reflect any computational irregularity.
The Diebold–Mariano procedure is inherently two-sided, and therefore the direction of improvement is inferred from the accuracy rankings reported in Table 5 based on regularized QLIKE and root mean squared error criteria. The significant outcomes consistently favor either a hybrid configuration or the stacked formulation. In practical forecasting environments, the superiority of stacked learning corresponds to improved Value-at-Risk calibration, reflected in fewer violations and reduced coverage error. Consequently, the STACKED ensemble emerges as a statistically sound and reliable solution for conditional variance prediction, while hybrid machine-learning and GARCH combinations provide targeted advantages in specific comparative settings.
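A minimal one-step-ahead (h = 1) version of the test can be sketched as follows. The published results may rely on a HAC-adjusted long-run variance for multi-step losses, so this simple form, using the plain sample variance of the loss differential, is illustrative only.

```python
import math
from statistics import NormalDist

def diebold_mariano(loss_a, loss_b):
    """Two-sided Diebold-Mariano test for equal predictive accuracy.

    Works on per-period losses (e.g. QLIKE or squared errors) from two
    competing forecasts. With h = 1, the long-run variance of the loss
    differential d_t = loss_a_t - loss_b_t reduces to its sample
    variance. A significantly negative statistic favours model A.
    """
    n = len(loss_a)
    d = [a - b for a, b in zip(loss_a, loss_b)]
    d_bar = sum(d) / n
    var_d = sum((x - d_bar) ** 2 for x in d) / (n - 1)
    dm = d_bar / math.sqrt(var_d / n)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(dm)))
    return dm, p_value
```

When one model's losses are systematically smaller, the statistic is large in magnitude and the two-sided p-value falls below 0.05; the sign (or the accuracy ranking, as done in the text via Table 5) then identifies the better model.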
The superior performance of the STACKED ensemble during the COVID-19 volatility surges underscores its practical relevance for institutions managing cryptocurrency exposure. During episodes of abrupt market stress, such as the intense turbulence observed in March 2020, the ensemble adapts rapidly to regime transitions and preserves stability in one-step-ahead conditional variance forecasts. This capability supports reliable risk quantification across evolving market conditions, directly addressing the institutional need for robust risk measures highlighted in the Introduction. Improved VaR accuracy facilitates more efficient capital allocation, reduces unnecessary overcoverage while maintaining regulatory alignment, and enhances institutional capacity to respond to structural breaks that frequently characterize digital-asset environments [51].
The feature-importance analysis presented in Figure 11(center) provides transparent evidence of the relative contribution of individual base models to the final variance forecasts. These importance scores constitute a meaningful form of global interpretability by revealing how econometric specifications and machine-learning components interact within the ensemble structure. The results indicate that the meta-learner assigns differentiated weights to models according to their predictive relevance, thereby supporting an interpretable integration of heterogeneous forecasting signals. At the same time, the analysis highlights an inherent methodological limitation since feature-importance measures summarize average contributions across the entire sample and do not provide instance-level explanations for specific forecasting dates. Consequently, the current approach primarily facilitates aggregate model understanding rather than localized attribution of predictions. A relevant avenue for future research involves the incorporation of SHAP-based techniques applied to the XGBoost meta-learner to enhance granular explainability. This methodological extension would enable instance-level attribution of VaR forecasts on a day-by-day basis, particularly during periods of market stress while preserving the predictive accuracy and operational viability of the STACKED framework.

4. Conclusions

This study presents a methodological advancement in financial risk modeling through a stacked ensemble framework developed for improved conditional variance forecasting and Value-at-Risk estimation for Bitcoin. The proposed methodology demonstrates consistent improvements over conventional econometric models, established machine learning benchmarks, and sequential hybrid configurations. The framework integrates heavy-tailed generalized autoregressive conditional heteroskedasticity components that preserve distributional fidelity during market stress with adaptive neural layers capable of capturing regime shifts. Consequently, the approach provides a comprehensive structure for modeling the complex dynamics of cryptocurrency markets. The stacking mechanism combines these complementary methodologies within a unified predictive architecture, generating accuracy gains that exceed those achieved by individual models and traditional hybrid strategies.
From a methodological perspective, the stacking framework remains inherently asset-independent. The architecture operates exclusively on predictive signals generated by heterogeneous base models and combines them through a data-driven meta-learner. Therefore, its effectiveness depends on the presence of complementary information across models rather than on any Bitcoin-specific characteristic. The same structure can be applied without modification to any financial time series that exhibits volatility clustering and nonlinear behavior. Across different assets, only the relative importance assigned to each base model by the extreme gradient boosting meta-learner may vary, as determined through entropy-based feature-importance measures, while the overall implementation, training procedure, and predictive design remain directly transferable.
The selection of XGBoost as the sole meta-learner in this implementation arises from its capacity to capture nonlinear and conditional interactions among heterogeneous predictors while preserving robustness against overfitting in small-sample and high-noise environments. Gradient-boosted decision trees provide a flexible and computationally efficient mechanism for learning complex combinations of base-model forecasts, which proves particularly suitable for sequential prediction under nonstationary conditions. Alternative meta-learning approaches, including linear aggregation methods, neural network meta-models, and probabilistic ensemble techniques, were intentionally excluded to preserve methodological focus and ensure a controlled and interpretable evaluation. A systematic comparison of different meta-learner architectures represents a relevant direction for future research that aims to assess potential benefits from alternative aggregation paradigms.
Extensive empirical evaluation conducted during the COVID-19 market crisis from June 2019 to July 2020 provides strong evidence of the robustness of the framework under extreme volatility conditions. The STACKED ensemble achieves the lowest root mean square error while maintaining competitive performance in regularized quasi-likelihood and symmetric mean absolute percentage error metrics. Diagnostic testing confirms the absence of residual nonlinear dependence, with Brock–Dechert–Scheinkman statistics yielding non-significant results with p-values greater than 0.05. Diebold–Mariano comparisons further establish statistically significant superiority over all competing models. Most importantly, enhanced conditional variance forecasts translate directly into more accurate Value-at-Risk estimates, with empirical violation frequencies p ^ α closely matching theoretical levels for α { 1 % , 2.5 % } and substantially reducing overcoverage at the 5% level. These findings emphasize the critical link between advanced variance modeling and reliable regulatory-aligned risk measurement.
The analysis of feature importance reveals a coherent integration of nonlinear machine learning representations and heavy-tailed econometric structures, offering financial institutions a transparent and sophisticated tool for digital asset risk management. The framework supports optimized capital allocation through improved Value-at-Risk calibration and demonstrates strong adaptability to volatility regime changes and market disruptions. It provides interpretable risk analytics that successfully combine statistical rigor with the flexibility of artificial intelligence, thereby addressing the growing demand for explainable artificial intelligence in financial applications. Beyond cryptocurrency markets, the domain-agnostic design enables application to diverse financial contexts, including portfolio optimization, derivative pricing, and systemic risk assessment, where heavy tails, structural shifts, and heteroskedastic dynamics are prevalent.
Despite this method’s demonstrated effectiveness for Bitcoin, several limitations remain. The computational requirements associated with the training process, although justified by predictive improvements, may present challenges in latency-constrained institutional environments. The exclusive focus on Bitcoin restricts direct conclusions regarding generalizability to other digital assets and traditional financial instruments. Future research should explore probabilistic deep learning approaches for improved uncertainty quantification and transformer-based architectures for modeling long-range volatility dependencies. Cross-asset extensions to multi-currency portfolios and conventional financial markets offer additional opportunities, as does the incorporation of alternative data sources such as blockchain indicators and sentiment measures.
Overall, the proposed stacked ensemble constitutes a principled, practical, and interpretable framework for conditional variance estimation that advances both artificial intelligence research and financial risk management practice. By demonstrating how meta-learning effectively integrates econometric structure with machine learning adaptability, this work contributes to the development of explainable artificial intelligence in finance. The results provide financial institutions with a robust foundation for next-generation risk management systems capable of addressing the distinctive challenges of digital asset markets, thereby representing a meaningful innovation in financial technology and quantitative risk modeling.

Author Contributions

Conceptualization, L.R., K.V.A., C.E.V. and F.R.R.; Methodology, L.R., K.V.A. and F.R.R.; Software, L.R. and K.V.A.; Validation, L.R., K.V.A., C.E.V. and F.R.R.; Formal analysis, L.R., K.V.A., C.E.V. and F.R.R.; Investigation, L.R., K.V.A., C.E.V. and F.R.R.; Resources, L.R.; Data curation, K.V.A.; Writing—original draft, L.R. and K.V.A.; Writing—review and editing, L.R., K.V.A., C.E.V. and F.R.R.; Visualization, K.V.A.; Supervision, L.R.; Project administration, L.R.; Funding acquisition, L.R. and F.R.R. All authors have read and agreed to the published version of the manuscript.

Funding

The authors gratefully acknowledge the institutional support received for this research. L.R. and K.V.A. acknowledge the support and resources provided by Universidad del Norte. K.V.A. acknowledges financial support from the Ministry of Science, Technology and Innovation of Colombia (Minciencias) through the National Doctoral Scholarship Program for Higher Education Faculty (Call No. 909), awarded in February 2022 for doctoral studies at Universidad del Norte. C.E.V. acknowledges the support of Universidade Federal de Minas Gerais (UFMG), Brazil. F.R.R. acknowledges funding from national funds provided by FCT—Fundação para a Ciência e a Tecnologia, I.P., under projects UID/00006/2025 (DOI: 10.54499/UID/00006/2025) and UID/04011/2025 (DOI: 10.54499/UID/PRR/04011/2025).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Kristoufek, L. What are the main drivers of the Bitcoin price? Evidence from wavelet coherence analysis. PLoS ONE 2015, 10, e0123923.
2. Alexander, C.; Heck, D.F. Price discovery in Bitcoin: The impact of unregulated markets. J. Financ. Stab. 2020, 50, 100776.
3. Böhmecke-Schwafert, M. The role of blockchain for trade in global value chains: A systematic literature review and guidance for future research. Telecommun. Policy 2024, 48, 102835.
4. Pratas, T.E.; Ramos, F.R.; Rubio, L. Forecasting bitcoin volatility: Exploring the potential of deep learning. Eurasian Econ. Rev. 2023, 13, 285–305.
5. Berger, T.; Koubová, J. Forecasting Bitcoin returns: Econometric time series analysis vs. machine learning. J. Forecast. 2024, 43, 2904–2916.
6. Martinet, G.G.; McAleer, M. On the invertibility of EGARCH(p, q). Econom. Rev. 2018, 37, 824–849.
7. Aras, S. On improving GARCH volatility forecasts for Bitcoin via a meta-learning approach. Knowl.-Based Syst. 2021, 230, 107393.
8. Bergsli, L.Ø.; Lind, A.F.; Molnár, P.; Polasik, M. Forecasting volatility of Bitcoin. Res. Int. Bus. Financ. 2022, 59, 101540.
9. Musunuru, N. Examining Volatility Persistence and News Asymmetry in Soybeans Futures Returns. Atl. Econ. J. 2016, 44, 487–500.
10. Tavares, A.B.; Curto, J.D.; Tavares, G.N. Modelling heavy tails and asymmetry using ARCH-type models with stable Paretian distributions. Nonlinear Dyn. 2008, 51, 231–243.
11. Zhang, B. A study of financial time series volatility forecasting method based on GARCH modeling. In Proceedings of the International Conference on Digital Economy and Intelligent Computing; Association for Computing Machinery: New York, NY, USA, 2025; pp. 54–59.
12. Güngör, A.; Güngör, M.S. Modeling time-varying co-movements between major cryptocurrencies and foreign exchange markets. In Emerging Insights on the Relationship Between Cryptocurrencies and Decentralized Economic Models; IGI Global: Hershey, PA, USA, 2023; pp. 86–106.
13. Gatfaoui, H. Translating financial integration into correlation risk. Econ. Model. 2013, 30, 776–791.
14. Harris, R.D.F.; Stoja, E.; Tucker, J. A simplified approach to modeling the co-movement of asset returns. J. Futur. Mark. 2007, 27, 575–598.
15. Mili, M.; Bouteska, A. Forecasting nonlinear dependency between cryptocurrencies and foreign exchange markets using dynamic copula: Evidence from GAS models. J. Risk Financ. 2023, 24, 464–482.
16. Alraddadi, R. The Markov-switching threshold BLGARCH model. AIMS Math. 2025, 10, 18838–18860.
17. Bond, S. An Econometric Model of Downside Risk; Butterworth-Heinemann: Oxford, UK, 2007; pp. 301–331.
18. Telmoudi, F.; El Ghourabi, M.; Limam, M. On Conditional Risk Estimation Considering Model Risk. J. Appl. Stat. 2016, 43, 1386–1399.
19. Fan, L.; Li, H. Volatility Analysis and Forecasting Models of Crude Oil Prices: A Review. Int. J. Glob. Energy Issues 2015, 38, 5–17.
20. Cavicchioli, M.; Ghezal, A.; Zemmouri, I. (Bi)spectral analysis of Markov switching bilinear time series. Stat. Methods Appl. 2025, 1–30.
21. Shen, Z.; Wan, Q.; Leatham, D.J. Bitcoin return volatility forecasting: A comparative study between GARCH and RNN. J. Risk Financ. Manag. 2021, 14, 337.
22. Lu, X.; Liu, C.; Lai, K.K.; Cui, H. Risk measurement in Bitcoin market by fusing LSTM with the joint-regression-combined forecasting model. Kybernetes 2021, 52, 1487–1502.
23. Tapia, S.; Kristjanpoller, W. Framework based on multiplicative error and residual analysis to forecast bitcoin intraday-volatility. Phys. A Stat. Mech. Its Appl. 2022, 589, 126613.
24. Dutta, A.; Kumar, S.; Basu, M. A gated recurrent unit approach to bitcoin price prediction. J. Risk Financ. Manag. 2020, 13, 23.
25. Yan, X.; Weihan, W.; Chang, M. Research on financial assets transaction prediction model based on LSTM neural network. Neural Comput. Appl. 2021, 33, 257–270.
26. Rokde, C.; Chakole, J.; Ukey, A. Financial Forecasting with Deep Learning Models Based Ensemble Technique in Stock Market Analysis. Int. J. Inf. Eng. Electron. Bus. 2025, 17, 1–13.
27. Andri; Harumy, T.H.F.; Efendi, S. Evaluating the Impact of Weight Initialization on Recurrent and Transformer-Based Models in Financial Asset Price Prediction. East.-Eur. J. Enterp. Technol. 2025, 4, 19–31.
28. Ahmad, Z.; Bao, S.; Chen, M. DeepONet-Inspired Architecture for Efficient Financial Time Series Prediction. Mathematics 2024, 12, 3950.
29. Chuku, C.; Simpasa, A.; Oduor, J. Intelligent Forecasting of Economic Growth for Developing Economies. Int. Econ. 2019, 159, 74–93.
30. Perez-Bernabeu, E.; Polat, O. AI and Machine Learning in Macroeconomic Forecasting: A Systematic Review of Models, Trends, and Challenges. In Proceedings of the 31st ICE IEEE/ITMC Conference on Engineering, Technology, and Innovation: AI-Driven Industrial Transformation; IEEE: New York, NY, USA, 2025.
31. Hai, D.H.; Van Tuan, P. AI and Econometric Modeling: Deep Reinforcement Learning in Predictive Modeling; Springer: Cham, Switzerland, 2024; Volume 556, pp. 53–60.
32. Reina-Jiménez, P.; Martínez-Ballesteros, M.; Riquelme, J.C. Deep Learning or Trees? A Trade-off Analysis for Multivariate Time Series Forecasting. In Proceedings of the Advances in Computational Intelligence; Rojas, I., Joya, G., Catala, A., Eds.; Springer: Cham, Switzerland, 2026; pp. 616–627.
33. He, X. A Survey on Time Series Forecasting; Springer: Singapore, 2023; Volume 348, pp. 13–23.
34. Alghamdi, S.; Alqethami, S.; Alsubait, T.; Alhakami, H. Cryptocurrency price prediction using forecasting and sentiment analysis. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 891–900.
35. Anam, M.K.; Lestari, T.P.; Yenni, H.; Nasution, T.; Firdaus, M.B. Enhancement of Machine Learning Algorithm in Fine-grained Sentiment Analysis Using the Ensemble. ECTI Trans. Comput. Inf. Technol. 2025, 19, 159–167.
36. Zhang, X.; Liu, P.; Feng, J. A semi-heterogeneous ensemble forecasting method for stock returns based on sentiment analysis. Inf. Sci. 2025, 723, 122655.
37. Aras, S. Stacking hybrid GARCH models for forecasting Bitcoin volatility. Expert Syst. Appl. 2021, 174, 114747.
38. Xie, T. Forecast bitcoin volatility with least squares model averaging. Econometrics 2019, 7, 40.
39. Borup, D.; Jakobsen, J.S. Capturing volatility persistence: A dynamically complete realized EGARCH-MIDAS model. Quant. Financ. 2019, 19, 1839–1855.
40. Yang, Y.; Wen, L.; Li, L. Explainable AI for Time Series Prediction in Economic Mental Health Analysis. Front. Med. 2025, 12, 1591793.
41. Setiawan, H.; Setyanto, A.; Utami, E.; Kusrini. Advancements and Challenges in Financial Forecasting Models: A Systematic Literature Review. Adv. Transdiscipl. Eng. 2025, 73, 690–697.
42. Castellani, M.; Dos Santos, E.A. Prediction of Long-Term Government Bond Yields Using Statistical and Artificial Intelligence Methods. Stud. Comput. Intell. 2014, 514, 341–367.
43. Dang, J.; Ullah, A. Machine-Learning-Based Semiparametric Time Series Conditional Variance: Estimation and Forecasting. J. Risk Financ. Manag. 2022, 15, 38.
44. Nezhad, M.T.F.; Rezaei, M. Stock Price Prediction Using Intelligent Models, Ensemble Learning and Feature Selection. In Proceedings of the 2nd International Conference on Distributed Computing and High Performance Computing (DCHPC 2022); IEEE: New York, NY, USA, 2022; pp. 15–25.
45. Rubio, L.; Alba, K. Forecasting selected colombian shares using a hybrid ARIMA-SVR model. Mathematics 2022, 10, 2181.
46. Abbasi, M.; Dehban, H.; Farokhnia, A.; Roozbahani, R.; Bahreinimotlagh, M. Long-term streamflow prediction using hybrid SVR-ANN based on Bayesian model averaging. J. Hydrol. Eng. 2022, 27, 05022018.
47. Rubio, L.; Palacio Pinedo, A.; Mejía Castaño, A.; Ramos, F. Forecasting volatility by using wavelet transform, ARIMA and GARCH models. Eurasian Econ. Rev. 2023, 13, 803–830.
48. Solís, M.; Gil-Gamboa, A.; Troncoso, A. Metalearning for improving time series forecasting based on deep learning: A water case study. Results Eng. 2025, 28, 107541.
  49. Waqar, M.; Kim, Y.W.; Byun, Y.C. A stacking ensemble framework leveraging synthetic data for accurate and stable crop yield forecasting. IEEE Access 2025, 13, 136909–136926. [Google Scholar] [CrossRef]
  50. El Hafyani, M.; El Himdi, K.; El Adlouni, S.E. Improving monthly precipitation prediction accuracy using machine learning models: A multi-view stacking learning technique. Front. Water 2024, 6, 1378598. [Google Scholar] [CrossRef]
  51. Almeida, J.; Gonçalves, T.C. Cryptocurrency market microstructure: A systematic literature review. Ann. Oper. Res. 2024, 332, 1035–1068. [Google Scholar] [CrossRef]
  52. Katsiampa, P. Volatility co-movement in the cryptocurrency market. J. Int. Financ. Mark. Institutions Money 2019, 62, 101–124. [Google Scholar]
  53. Auer, R.; Claessens, S. Cryptocurrency market reactions to regulatory news 1. In The Routledge Handbook of FinTech; Routledge: New York, NY, USA, 2021; pp. 455–468. [Google Scholar]
  54. Barndorff-Nielsen, O.E.; Hansen, P.R.; Lunde, A.; Shephard, N. Designing realized kernels to measure the ex post variation of equity prices in the presence of noise. Econometrica 2008, 76, 1481–1536. [Google Scholar] [CrossRef]
  55. Patton, A.J. Volatility forecast comparison using imperfect volatility proxies. J. Econom. 2011, 160, 246–256. [Google Scholar] [CrossRef]
  56. Caporale, G.M.; Plastun, A. The day of the week effect in the cryptocurrency market. Financ. Res. Lett. 2019, 31. [Google Scholar] [CrossRef]
  57. Ma, D.; Tanizaki, H. The day-of-the-week effect on Bitcoin return and volatility. Res. Int. Bus. Financ. 2019, 49, 127–136. [Google Scholar] [CrossRef]
  58. Dudek, G.; Fiszeder, P.; Kobus, P.; Orzeszko, W. Forecasting cryptocurrencies volatility using statistical and machine learning methods: A comparative study. Appl. Soft Comput. 2024, 151, 111132. [Google Scholar] [CrossRef]
  59. Kwon, J.H. On the factors of Bitcoin’s value at risk. Financ. Innov. 2021, 7, 87. [Google Scholar] [CrossRef]
  60. Syuhada, K.; Tjahjono, V.; Hakim, A. Improving Value-at-Risk forecast using GA-ARMA-GARCH and AI-KDE models. Appl. Soft Comput. 2023, 148, 110885. [Google Scholar] [CrossRef]
  61. Ramos, F.R. Data Science na Modelação e Previsão de Séries Económico-Financeiras: Das Metodologias Clássicas ao Deep Learning. Ph.D. Thesis, ISCTE-Instituto Universitario de Lisboa, Lisboa, Portugal, 2021. [Google Scholar]
  62. Sarkodie, S.A.; Ahmed, M.Y.; Owusu, P.A. COVID-19 pandemic improves market signals of cryptocurrencies–evidence from Bitcoin, Bitcoin Cash, Ethereum, and Litecoin. Financ. Res. Lett. 2022, 44, 102049. [Google Scholar] [CrossRef]
  63. Tan, X.; Tao, Y. Trend-based forecast of cryptocurrency returns. Econ. Model. 2023, 124, 106323. [Google Scholar] [CrossRef]
  64. Ozdamar, M.; Sensoy, A.; Akdeniz, L. Retail vs institutional investor attention in the cryptocurrency market. J. Int. Financ. Mark. Institutions Money 2022, 81, 101674. [Google Scholar] [CrossRef]
  65. Corbet, S.; Larkin, C.; Lucey, B. The contagion effects of the COVID-19 pandemic: Evidence from gold and cryptocurrencies. Financ. Res. Lett. 2020, 35, 101554. [Google Scholar] [CrossRef]
  66. Jia, Y.; Liu, Y.; Yan, S. Higher moments, extreme returns, and cross–section of cryptocurrency returns. Financ. Res. Lett. 2021, 39, 101536. [Google Scholar] [CrossRef]
  67. Baur, D.G.; Dimpfl, T. Asymmetric volatility in cryptocurrencies. Econ. Lett. 2018, 173, 148–151. [Google Scholar] [CrossRef]
  68. Conrad, C.; Custovic, A.; Ghysels, E. Long-and short-term cryptocurrency volatility components: A GARCH-MIDAS analysis. J. Risk Financ. Manag. 2018, 11, 23. [Google Scholar] [CrossRef]
  69. Hatemi-J, A. Modeling the Asymmetric and Time-Dependent Volatility of Bitcoin: An Alternative Approach. Eng. Proc. 2024, 68, 15. [Google Scholar]
  70. Engle, R.F. Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of United Kingdom Inflation. Econometrica 1982, 50, 987–1007. [Google Scholar] [CrossRef]
  71. Bollerslev, T. Generalized Autoregressive Conditional Heteroskedasticity. J. Econom. 1986, 31, 307–327. [Google Scholar] [CrossRef]
  72. Bollerslev, T. A Conditionally Heteroskedastic Time Series Model for Speculative Prices and Rates of Return. Rev. Econ. Stat. 1987, 69, 542–547. [Google Scholar] [CrossRef]
  73. Glosten, L.R.; Jagannathan, R.; Runkle, D.E. On the Relation between the Expected Value and the Volatility of the Nominal Excess Return on Stocks. J. Financ. 1993, 48, 1779–1801. [Google Scholar] [CrossRef]
  74. Nelson, D.B. Conditional Heteroskedasticity in Asset Returns: A New Approach. Econometrica 1991, 59, 347–370. [Google Scholar] [CrossRef]
  75. Ding, Z.; Granger, C.W.J.; Engle, R.F. A Long Memory Property of Stock Market Returns and a New Model. J. Empir. Financ. 1993, 1, 83–106. [Google Scholar] [CrossRef]
  76. Liu, J.; Seraoui, R.; Vitelli, V.; Zio, E. Nuclear power plant components condition monitoring by probabilistic support vector machine. Ann. Nucl. Energy 2013, 56, 23–33. [Google Scholar] [CrossRef]
  77. Yu, P.S.; Chen, S.T.; Chang, I.F. Support vector regression for real-time flood stage forecasting. J. Hydrol. 2006, 328, 704–716. [Google Scholar] [CrossRef]
  78. Wu, C.; Chau, K.W.; Li, Y.S. River stage prediction based on a distributed support vector regression. J. Hydrol. 2008, 358, 96–111. [Google Scholar] [CrossRef]
  79. Wu, M.C.; Lin, G.F.; Lin, H.Y. Improving the forecasts of extreme streamflow by support vector regression with the data extracted by self-organizing map. Hydrol. Process. 2014, 28, 386–397. [Google Scholar] [CrossRef]
  80. Dibike, Y.B.; Velickov, S.; Solomatine, D.; Abbott, M.B. Model induction with support vector machines: Introduction and applications. J. Comput. Civ. Eng. 2001, 15, 208–216. [Google Scholar] [CrossRef]
  81. Chang, C.C.; Lin, C.J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2011, 2, 1–27. [Google Scholar] [CrossRef]
  82. Suganyadevi, M.; Babulal, C. Support vector regression model for the prediction of loadability margin of a power system. Appl. Soft Comput. 2014, 24, 304–315. [Google Scholar] [CrossRef]
  83. Rosenblatt, F. The perceptron: A probabilistic model for information storage and organization in the brain. Psychol. Rev. 1958, 65, 386. [Google Scholar] [CrossRef] [PubMed]
  84. Almeida, L.B. Multilayer perceptrons. In Handbook of Neural Computation; CRC Press: Boca Raton, FL, USA, 2020; pp. C1–C2. [Google Scholar]
  85. Murtagh, F. Multilayer perceptrons for classification and regression. Neurocomputing 1991, 2, 183–197. [Google Scholar] [CrossRef]
  86. Hornik, K.; Stinchcombe, M.; White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989, 2, 359–366. [Google Scholar] [CrossRef]
  87. Bishop, C.M. Neural networks: A pattern recognition perspective. In Handbook of Neural Computation; CRC Press: Boca Raton, FL, USA, 2020. [Google Scholar]
  88. Zhang, G.P. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 2003, 50, 159–175. [Google Scholar] [CrossRef]
  89. Khashei, M.; Bijari, M. A novel hybridization of artificial neural networks and ARIMA models for time series forecasting. Appl. Soft Comput. 2011, 11, 2664–2675. [Google Scholar] [CrossRef]
  90. Wang, L.; Zou, H.; Su, J.; Li, L.; Chaudhry, S. An ARIMA-ANN hybrid model for time series forecasting. Syst. Res. Behav. Sci. 2013, 30, 244–259. [Google Scholar] [CrossRef]
  91. Khashei, M.; Bijari, M. A new class of hybrid models for time series forecasting. Expert Syst. Appl. 2012, 39, 4344–4357. [Google Scholar] [CrossRef]
  92. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
  93. McCulloch, W.S.; Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 1943, 5, 115–133. [Google Scholar] [CrossRef]
94. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  95. Hopfield, J.J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 1982, 79, 2554–2558. [Google Scholar] [CrossRef]
  96. Jordan, M.I. Serial order: A parallel distributed processing approach. In Advances in Psychology; Elsevier: Amsterdam, The Netherlands, 1997; Volume 121, pp. 471–495. [Google Scholar]
  97. Elman, J.L. Finding structure in time. Cogn. Sci. 1990, 14, 179–211. [Google Scholar] [CrossRef]
  98. LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989, 1, 541–551. [Google Scholar] [CrossRef]
  99. Glorot, X.; Bordes, A.; Bengio, Y. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 11–13 April 2011; pp. 315–323. [Google Scholar]
  100. Maas, A.L.; Hannun, A.Y.; Ng, A.Y. Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; Volume 30, p. 3. [Google Scholar]
  101. Werbos, P.J. Backpropagation through time: What it does and how to do it. Proc. IEEE 1990, 78, 1550–1560. [Google Scholar] [CrossRef]
102. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar]
  103. Bengio, Y.; Simard, P.; Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 1994, 5, 157–166. [Google Scholar] [CrossRef]
  104. Sagheer, A.; Kotb, M. Time series forecasting of petroleum production using deep LSTM recurrent networks. Neurocomputing 2019, 323, 203–213. [Google Scholar] [CrossRef]
  105. Shahid, F.; Zameer, A.; Muneeb, M. Predictions for COVID-19 with deep learning models of LSTM, GRU and Bi-LSTM. Chaos Solitons Fractals 2020, 140, 110212. [Google Scholar] [CrossRef]
  106. Xiang, Z.; Yan, J.; Demir, I. A rainfall-runoff model with LSTM-based sequence-to-sequence learning. Water Resour. Res. 2020, 56, e2019WR025326. [Google Scholar] [CrossRef]
  107. Hu, C.; Wu, Q.; Li, H.; Jian, S.; Li, N.; Lou, Z. Deep learning with a long short-term memory networks approach for rainfall-runoff simulation. Water 2018, 10, 1543. [Google Scholar] [CrossRef]
  108. Peng, L.; Liu, S.; Liu, R.; Wang, L. Effective long short-term memory with differential evolution algorithm for electricity price prediction. Energy 2018, 162, 1301–1314. [Google Scholar] [CrossRef]
  109. Li, C.; Zheng, X.; Yang, Z.; Kuang, L. Predicting short-term electricity demand by combining advantages of ARMA and XGBoost in fog computing environment. Wirel. Commun. Mob. Comput. 2018, 2018, 5018053. [Google Scholar] [CrossRef]
  110. Zheng, H.; Wu, Y. An XGBoost model with weather similarity analysis and feature engineering for short-term wind power forecasting. Appl. Sci. 2019, 9, 3019. [Google Scholar] [CrossRef]
  111. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794. [Google Scholar]
  112. Zhao, Q.; Wen, X.; Jong, W.; Fang, J. Optimised extreme gradient boosting model for short-term electric load demand forecasting of regional grid system. Sci. Rep. 2022, 12, 11029. [Google Scholar]
  113. Kupiec, P.H. Techniques for verifying the accuracy of risk measurement models. J. Deriv. 1995, 3, 73–84. [Google Scholar] [CrossRef]
  114. Engle, R.; Manganelli, S. CAViaR: Conditional autoregressive value at risk by regression quantiles. J. Bus. Econ. Stat. 2004, 22, 367–381. [Google Scholar] [CrossRef]
  115. Christoffersen, P.F. Evaluating interval forecasts. Int. Econ. Rev. 1998, 39, 841–862. [Google Scholar] [CrossRef]
  116. Christoffersen, P.F.; Pelletier, D. Backtesting value-at-risk: A duration-based approach. J. Financ. Econom. 2004, 2, 84–108. [Google Scholar] [CrossRef]
  117. Pajhede, T. Backtesting value-at-risk: A generalized Markov framework. J. Forecast. 2015, 34, 376–390. [Google Scholar] [CrossRef]
  118. Barendse, S.; Kole, E.; van Dijk, D. Backtesting Value-at-Risk and Expected Shortfall in the Presence of Estimation Error. J. Financ. Econom. 2023, 21, 528–568. [Google Scholar] [CrossRef]
  119. Bao, W.; Yue, J.; Rao, Y. A deep learning framework for financial time series using stacked autoencoders and long-short term memory. PLoS ONE 2017, 12, e0180944. [Google Scholar] [CrossRef] [PubMed]
  120. Conrad, C.; Engle, R.F. Modelling Volatility Cycles: The MF2-GARCH Model. J. Appl. Econom. 2025, 40, 438–454. [Google Scholar] [CrossRef]
  121. Akgun, O.B.; Gulay, E. Dynamics in Realized Volatility Forecasting: Evaluating GARCH Models and Deep Learning Algorithms Across Parameter Variations. Comput. Econ. 2025, 65. [Google Scholar] [CrossRef]
  122. Bollerslev, T.; Patton, A.J.; Quaedvlieg, R. Exploiting the errors: A simple approach for improved volatility forecasting. J. Econom. 2016, 192, 1–18. [Google Scholar] [CrossRef]
  123. Makridakis, S.; Hibon, M. The M3-Competition: Results, conclusions and implications. Int. J. Forecast. 2000, 16, 451–476. [Google Scholar] [CrossRef]
Figure 1. Bitcoin daily return series (September 2014–July 2020). (Left): The full return trajectory with the terminal segment (June 2019–July 2020) highlighted in purple. (Right): Annual box plots illustrating the distribution of daily returns across calendar years.
Figure 2. Bitcoin daily price and trading volume (September 2014–July 2020). (Left): Daily closing prices in USD. (Right): Corresponding daily trading volume. The shaded purple segment marks the terminal period (June 2019–July 2020) used for out-of-sample evaluation.
Figure 3. Bitcoin 7-day realized volatility from September 2014 to July 2020. (Left): Weekly rolling volatility time series, highlighting in purple the pronounced spike during the COVID-19 market crisis of March 2020. (Right): Annual box plots illustrating the distribution of realized volatility across calendar years, with 2020 exhibiting the highest dispersion and outlier frequency (outliers shown in red).
Figure 4. Architecture of the LSTM network applied to Bitcoin volatility prediction.
Figure 5. Two-dimensional representation of volatility series, illustrating supervised input–output structure and chronological data partitions for model training and evaluation.
Figure 6. BTC conditional variance forecasting architecture with hybrids, stacking, and VaR.
Figure 7. Out-of-sample conditional variance for hybrid MLGARCH models: predicted σ̂² versus realized σ²_R.
Figure 8. Learning curves for neural network architectures during Bitcoin volatility forecasting. (Left): Multilayer perceptron (MLP). (Center): Long short-term memory network (LSTM). (Right): Recurrent neural network (RNN). Each panel shows the average RMSE across folds for training, validation, and test sets as a function of training epochs, demonstrating convergence and generalization performance.
Table 1. Symbol definitions employed to transform a univariate sequence into supervised learning input–output objects ( X , y ) and to compute associated volatility measures.
Symbol | Description
α_inp | Length of the input segment, determining how many past observations serve as regressors within each model. Configurations comprise windows of 7, 14, 21, or 28 consecutive time points, selected to capture short- and medium-range temporal structure.
h | Prediction horizon: the number of future observations produced at each iteration. This study employs a one-step-ahead specification (h = 1) in a rolling evaluation routine to ensure consistent alignment between estimation and forecast generation.
β_jump | Increment governing movement across folds within temporal cross-validation. This parameter remains fixed at β_jump = 1 to preserve continuity and avoid loss of sequential information.
ω | Size of the rolling window used to compute conditional variance, set to 7 in accordance with empirical evidence supporting weekly periodicity in cryptocurrency dynamics and its influence on volatility behavior.
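The windowing in Table 1 can be sketched in a few lines. This is our own illustration, not code from the paper; the helper names `realized_vol` and `to_supervised` are assumptions.

```python
import numpy as np

def realized_vol(returns, omega=7):
    """Rolling realized volatility: sample std of returns over an omega-day window."""
    r = np.asarray(returns, dtype=float)
    return np.array([r[i - omega + 1:i + 1].std(ddof=1)
                     for i in range(omega - 1, len(r))])

def to_supervised(series, alpha_inp=7, h=1, beta_jump=1):
    """Slide a window of alpha_inp past values over the series (step beta_jump);
    each window becomes a row of X and the next h values form the target y."""
    s = np.asarray(series, dtype=float)
    X, y = [], []
    for start in range(0, len(s) - alpha_inp - h + 1, beta_jump):
        X.append(s[start:start + alpha_inp])
        y.append(s[start + alpha_inp:start + alpha_inp + h])
    return np.array(X), np.array(y)

rng = np.random.default_rng(0)
vol = realized_vol(rng.normal(0.0, 0.04, 200), omega=7)  # 194 volatility points
X, y = to_supervised(vol, alpha_inp=7, h=1)              # one-step-ahead samples
```

With h = 1 and β_jump = 1 this reproduces the one-step-ahead rolling design of Table 1: consecutive rows of X overlap in all but one observation, so no fold ever uses future information.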
Table 2. Descriptive metrics for Bitcoin return sequence and associated price series.
Parameter | Daily Returns | Close Price
N_records | 2137 | 2137
μ | 0.001515 | 4256.149419
σ | 0.039378 | 4048.077591
y_min | −0.464730 | 178.102997
Q₁ | −0.012083 | 430.010986
x̃ | 0.001835 | 3486.181641
Q₃ | 0.017170 | 7653.979980
y_max | 0.225119 | 19,497.400391
Table 3. Diagnostic indicators for BTC daily returns covering distributional behavior (normality), stationarity via unit-root assessment, and short-range dependence.
Normality: Kurt., Skew., JB | Unit root/stationarity: ADF, KPSS | Independence: LB
Test | Kurt. | Skew. | JB | ADF | KPSS | LB
Statistic | 13.430763 | −0.950316 | 16383.47 | −14.1608 | 0.150372 | 19.7355
p-value | <10⁻¹² | 0.341952 | <10⁻¹² | 2.0 × 10⁻²⁶ | 0.100000 | 0.031856
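As a cross-check on Table 3, the Jarque–Bera statistic follows directly from the sample moments: plugging the reported skewness (−0.950316), kurtosis (13.430763, which must therefore be excess kurtosis), and n = 2137 into JB = n/6 (S² + K²/4) reproduces JB ≈ 16,383. A minimal sketch (our own helper, not the authors' code):

```python
import numpy as np

def moments_and_jb(x):
    """Sample skewness, excess kurtosis, and the Jarque-Bera statistic
    JB = n/6 * (S^2 + K^2/4), which is chi-square(2) under normality."""
    x = np.asarray(x, dtype=float)
    n = x.size
    z = x - x.mean()
    m2 = np.mean(z**2)
    skew = np.mean(z**3) / m2**1.5
    kurt = np.mean(z**4) / m2**2 - 3.0   # excess kurtosis: 0 under normality
    jb = n / 6.0 * (skew**2 + kurt**2 / 4.0)
    return skew, kurt, jb
```

For a large Gaussian sample the function returns skewness and excess kurtosis near zero and a small JB; the heavy-tailed Bitcoin returns instead yield a JB in the tens of thousands, hence the <10⁻¹² p-value.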
Table 4. Hyperparameter tuning summary and selection protocol. The search is performed within the training set using blocked TSCV.
Model | Hyperparameter | Search Space | Selected Value
MLP | Depth/width (L, U) | L ∈ {1, 2, 3}; U ∈ {64, 128, 256, 512} (units per layer). Fixed: batch = 128, Adam LR = 3 × 10⁻⁴, max epochs = 500, patience = 70. Input lags α_inp ∈ {7, 14, 21, 28}. | Best-by-TSCV (RMSE): L = 1, U = (256), α_inp = 7
RNN | Depth/cells (L, C) | L ∈ {1, 2}; C ∈ {64, 128, 256} (cells). Fixed: batch = 128, Adam LR = 3 × 10⁻⁴, max epochs = 500, patience = 70. Input lags α_inp ∈ {7, 14, 21, 28}. | Best-by-TSCV (RMSE): L = 2, C = (64, 32), α_inp = 7
LSTM | Depth/cells (L, C) | L ∈ {1, 2}; C ∈ {64, 128, 256} (cells). Fixed: batch = 128, Adam LR = 3 × 10⁻⁴, max epochs = 500, patience = 70. Input lags α_inp ∈ {7, 14, 21, 28}. | Best-by-TSCV (RMSE): L = 2, C = (64, 64), α_inp = 7
SVR | Kernel and regularization | Kernel ∈ {RBF}. Grid: C ∈ {0.1, 1, 10, 100}; γ ∈ {10⁻³, 10⁻², 10⁻¹, 1}; ϵ ∈ {10⁻³, 10⁻², 10⁻¹}. Input lags α_inp ∈ {7, 14, 21, 28}. | Best-by-TSCV (RMSE): kernel = RBF, C = 10, γ = 0.01, ϵ = 0.01, α_inp = 28
XGBoost | Depth/trees/η | Meta-learner XGBoost trained on TSCV fold predictions from the base learners. Depth ∈ {3, 5, 7}; trees (n_estimators) ∈ {500, 1000, 2000}; learning rate η ∈ {0.01, 0.05, 0.1} (remaining regularization parameters fixed or tuned as in the stacking grid). | Best-by-TSCV (RMSE): stacked XGBoost with tuned depth/trees/η
GARCH family | Orders/specification and innovation | Candidate set: {GARCH, GJR-GARCH, EGARCH, APARCH} with (p, q) ∈ {1, 2, 3}² and innovations ∈ {N, t_ν}; parameters (including ν for t innovations) are (Q)MLE re-estimated at each rolling step using data up to t − 1. | Best specification by out-of-sample RMSE (and VaR backtesting): EGARCH-t
Notes. Tuning uses blocked TSCV with numJumps = stepsToForecast = 1 (see [61]); ANNs use Adam with early stopping, and GARCH models are re-estimated at each rolling step by (quasi-)maximum likelihood ((Q)MLE) with innovations ∈ {N, t_ν} (N = Gaussian).
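The blocked TSCV with numJumps = stepsToForecast = 1 described in the notes amounts to a rolling-origin splitter: the training window expands one observation at a time and each fold forecasts the single next point. A minimal sketch under that reading (`blocked_tscv` is our hypothetical helper, not the paper's code):

```python
import numpy as np

def blocked_tscv(n, initial_train, steps_to_forecast=1, num_jumps=1):
    """Yield (train_idx, test_idx) pairs with an expanding training window.
    Each fold forecasts the next steps_to_forecast points, then the origin
    advances by num_jumps, so no test point ever precedes its training data."""
    origin = initial_train
    while origin + steps_to_forecast <= n:
        yield (np.arange(origin),
               np.arange(origin, origin + steps_to_forecast))
        origin += num_jumps

# 10 observations, 6 used for the first training window -> 4 one-step folds
folds = list(blocked_tscv(10, initial_train=6))
```

Because every test index strictly follows its training indices, hyperparameters selected this way cannot exploit look-ahead information, which is the temporal-coherence property Table 4 relies on.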
Table 5. Accuracy of hybrid variance models using a reduced metric set: QLIKE, RMSE, SMAPE, and BDS p-value.
Model | QLIKEε | RMSE | SMAPE (%) | BDS (p)
STACKED | 5.388 | 0.006420 | 145.1 | 0.362
SVR-GARCH-normal | 5.443 | 0.009893 | 121.4 | 0.443
MLP-SVR | 4.598 | 0.009938 | 128.2 | 0.086
SVR-MLP | 33.675 | 0.010279 | 125.5 | 0.192
LSTM-GJR-GARCH-t | 44.115 | 0.010460 | 122.2 | 0.869
Notes. In hybrid models, the hyphen distinguishes the first component, which estimates conditional variance, from the second, which forecasts the resulting residuals; QLIKEε denotes the regularised QLIKE.
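We read the regularised QLIKEε of Table 5 as Patton's QLIKE loss [55] with a small floor ε applied to both variances so that near-zero forecasts cannot blow up the ratio; this interpretation of the ε subscript, and the `qlike` helper below, are our own assumptions, not the authors' implementation.

```python
import numpy as np

def qlike(realized_var, forecast_var, eps=1e-8):
    """Patton-style QLIKE loss, mean of r/f - log(r/f) - 1; the eps floor on
    both arguments is the 'regularisation' keeping the ratio well defined."""
    r = np.maximum(np.asarray(realized_var, dtype=float), eps)
    f = np.maximum(np.asarray(forecast_var, dtype=float), eps)
    ratio = r / f
    return float(np.mean(ratio - np.log(ratio) - 1.0))
```

With the constant −1 included, a perfect variance forecast scores exactly zero and any deviation is penalised; QLIKE is favored over squared error when the volatility proxy is noisy, which is why Table 5 reports it alongside RMSE.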
Table 6. Backtesting results: observed vs. expected violation rates (%) with test p-values (UC, IND, CC, DQ, DB) for lower-tail quantiles α ∈ {1%, 2.5%, 5%}.
Model | Viol (%) | Exp (%) | UC (p) | IND (p) | CC (p) | DQ (p) | DB (p)

α = 1% (lower tail)
SVR | 0.47 | 1.0 | 0.2228 | 0.0894 | 0.04711 | 0.8654 | 0.0000
MLP | 1.00 | 1.0 | 0.0517 | 0.0587 | 0.9544 | 0.0001 | 0.0000
LSTM | 0.71 | 1.0 | 0.5233 | 0.3682 | 0.9782 | 0.9999 | 0.9964
RNN | 0.94 | 1.0 | 0.9059 | 0.7823 | 0.9558 | 0.0004 | 0.4932
GJR-GARCH-t | 0.93 | 1.0 | 0.8906 | 0.7833 | 0.9538 | 0.0001 | 0.2563
GARCH-t | 0.70 | 1.0 | 0.5111 | 0.8368 | 0.7889 | 0.9965 | 0.9975
GARCH-normal | 0.47 | 1.0 | 0.2163 | 0.0889 | 0.4612 | 0.9996 | 0.9998
EGARCH-t | 0.47 | 1.0 | 0.2163 | 0.0899 | 0.0967 | 0.9993 | 0.9993
APARCH-t | 0.70 | 1.0 | 0.5111 | 0.8368 | 0.7889 | 0.9965 | 0.9975
LSTM-GJR-GARCH-t | 0.93 | 1.0 | 0.8906 | 0.7833 | 0.9538 | 0.9983 | 0.9892
GARCH-t-RNN | 0.93 | 1.0 | 0.8906 | 0.7833 | 0.9538 | 0.0001 | 0.2563
SVR-MLP | 0.94 | 1.0 | 0.8982 | 0.0233 | 0.0756 | 0.0001 | 0.0000
MLP-SVR | 0.93 | 1.0 | 0.8906 | 0.7833 | 0.9538 | 0.0502 | 0.9982
APARCH-t-SVR | 1.87 | 1.0 | 0.1006 | 0.0058 | 0.0006 | 0.0000 | 0.0500
SVR-GARCH-normal | 0.23 | 1.0 | 0.0552 | 0.9454 | 0.1586 | 0.7236 | 1.0000
STACKED | 0.70 | 1.0 | 0.5082 | 0.8369 | 0.78653 | 0.9925 | 0.9965

α = 2.5% (lower tail)
SVR | 1.18 | 2.5 | 0.0524 | 0.7294 | 0.1436 | 0.0174 | 0.0000
MLP | 1.17 | 2.5 | 0.0492 | 0.7307 | 0.1362 | 0.0082 | 0.3475
LSTM | 1.65 | 2.5 | 0.2331 | 0.6274 | 0.4366 | 0.2568 | 0.4932
RNN | 1.65 | 2.5 | 0.2331 | 0.6274 | 0.4366 | 0.3010 | 0.4932
GJR-GARCH-t | 1.17 | 2.5 | 0.0492 | 0.7307 | 0.1362 | 0.0082 | 0.3475
GARCH-t | 1.24 | 2.5 | 0.1045 | 0.5446 | 0.2716 | 0.0548 | 0.5063
GARCH-normal | 1.24 | 2.5 | 0.1045 | 0.5446 | 0.2716 | 0.0548 | 0.5063
EGARCH-t | 0.93 | 2.5 | 0.0176 | 0.7883 | 0.0575 | 0.0954 | 0.9917
APARCH-t | 0.93 | 2.5 | 0.0176 | 0.7883 | 0.0575 | 0.0954 | 0.9917
LSTM-GJR-GARCH-t | 2.34 | 2.5 | 0.8266 | 0.2212 | 0.4620 | 0.3020 | 0.3732
GARCH-t-RNN | 1.64 | 2.5 | 0.2219 | 0.0943 | 0.1170 | 0.0089 | 0.1473
SVR-MLP | 2.11 | 2.5 | 0.5990 | 0.1736 | 0.3451 | 0.5211 | 0.5335
MLP-SVR | 1.40 | 2.5 | 0.1131 | 0.0645 | 0.0516 | 0.0762 | 0.3830
APARCH-t-SVR | 2.10 | 2.5 | 0.5886 | 0.0010 | 0.0317 | 0.0000 | 0.0000
SVR-GARCH-normal | 0.47 | 2.5 | 0.0010 | 0.8909 | 0.0043 | 0.1779 | 0.9993
STACKED | 2.33 | 2.5 | 0.8206 | 0.4891 | 0.7672 | 0.5312 | 0.5824

α = 5% (lower tail)
SVR | 2.83 | 5.0 | 0.0261 | 0.4025 | 0.0593 | 0.8724 | 0.5638
MLP | 1.04 | 5.0 | 0.0964 | 0.0988 | 0.0160 | 0.0821 | 0.02785
LSTM | 2.83 | 5.0 | 0.0261 | 0.4025 | 0.0593 | 0.8528 | 0.5638
RNN | 3.77 | 5.0 | 0.2267 | 0.2620 | 0.2567 | 0.3085 | 0.1953
GJR-GARCH-t | 3.27 | 5.0 | 0.0806 | 0.4709 | 0.1676 | 0.1763 | 0.7131
GARCH-t | 3.27 | 5.0 | 0.0806 | 0.4709 | 0.1676 | 0.1763 | 0.7131
GARCH-normal | 3.04 | 5.0 | 0.0452 | 0.4011 | 0.0945 | 0.5714 | 0.5612
EGARCH-t | 1.17 | 5.0 | 0.0500 | 0.7307 | 0.0001 | 0.0084 | 0.3475
APARCH-t | 1.17 | 5.0 | 0.0452 | 0.7307 | 0.1362 | 0.0945 | 0.9794
LSTM-GJR-GARCH-t | 4.91 | 5.0 | 0.9291 | 0.3691 | 0.6654 | 0.7672 | 0.5398
GARCH-t-RNN | 3.97 | 5.0 | 0.3122 | 0.1667 | 0.2306 | 0.1842 | 0.2096
SVR-MLP | 3.76 | 5.0 | 0.2186 | 0.1318 | 0.1057 | 0.0454 | 0.3537
MLP-SVR | 2.10 | 5.0 | 0.0002 | 0.1727 | 0.0003 | 0.5462 | 0.5477
APARCH-t-SVR | 3.74 | 5.0 | 0.2108 | 0.1306 | 0.1458 | 0.1069 | 0.2723
SVR-GARCH-normal | 2.80 | 5.0 | 0.0235 | 0.4048 | 0.0543 | 0.5453 | 0.5087
STACKED | 3.26 | 5.0 | 0.0788 | 0.4697 | 0.1644 | 0.1436 | 0.0000
Underlined bold entries denote p-value < 0.05. For hybrids A–B, the first model (A) estimates the conditional variance σ² and the second (B) is fitted to the residuals ε_t from the first model's variance. UC: Unconditional Coverage; IND: Independence; CC: Conditional Coverage; DQ: Dynamic Quantile; DB: Dynamic Binary.
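The UC column reports Kupiec's unconditional coverage test [113], which only asks whether the observed violation frequency x/n matches the nominal rate α. A compact sketch (our own implementation, using the erfc identity for the χ²(1) survival function):

```python
import math

def kupiec_uc(x, n, alpha):
    """Kupiec (1995) unconditional coverage test: x VaR violations observed
    over n days versus nominal rate alpha; the LR statistic is chi-square(1)."""
    def loglik(p):
        # Bernoulli log-likelihood of x violations in n trials with prob p
        ll = 0.0
        if x < n:
            ll += (n - x) * math.log(1.0 - p)
        if x > 0:
            ll += x * math.log(p)
        return ll
    pi_hat = x / n                                       # observed rate (MLE)
    lr = -2.0 * (loglik(alpha) - loglik(pi_hat))
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))   # chi2(1) survival
    return lr, p_value
```

A model with exactly nominal coverage yields LR = 0 and p = 1; the IND, CC, DQ, and DB columns then probe whether the violations also arrive independently rather than in clusters.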
Table 7. Diebold–Mariano p-values comparing hybrid specifications (columns) with individual models (rows) for conditional variance σ².
Model | LSTM-GJR-GARCH-t | GARCH-t-RNN | SVR-MLP | MLP-SVR | APARCH-t-SVR | SVR-GARCH-Nor | STACKED
SVR | 0.168654 | 0.048157 | 0.288881 | 0.090084 | 0.068224 | 0.256055 | 0.002651
MLP | 0.095983 | 0.417821 | 0.288882 | 0.090085 | 0.068225 | 0.256056 | 0.002652
LSTM | 0.313064 | 0.313064 | 0.569048 | 0.342083 | 0.631873 | 0.337664 | 0.344827
RNN | 0.051152 | 0.116911 | 0.288880 | 0.090083 | 0.068223 | 0.256054 | 0.002650
GJR-GARCH-t | 0.133034 | 0.594210 | 0.288879 | 0.090086 | 0.068226 | 0.256057 | 0.002653
GARCH-t | 0.119611 | 0.823742 | 0.288883 | 0.090082 | 0.068222 | 0.256053 | 0.002649
GARCH-Nor | 0.152989 | 0.868992 | 0.288878 | 0.090087 | 0.068227 | 0.256058 | 0.002654
EGARCH-t | 0.029165 | 0.828949 | 0.288885 | 0.090081 | 0.068221 | 0.256052 | 0.002648
APARCH-t | 0.139667 | 0.732228 | 0.288876 | 0.090088 | 0.068228 | 0.256059 | 0.002655
Notes. Two-sided Diebold–Mariano p-values; significant cases are underlined and bold (p-value < 0.05 ). Abbreviations: GARCH-Nor = Gaussian; t = Student-t.
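Table 7's pairwise comparisons rest on the Diebold–Mariano test of equal predictive accuracy on a loss differential. Below is a minimal one-step (h = 1) sketch with absolute-power loss and the plain asymptotic normal approximation; the paper's exact loss function and long-run variance estimator may differ.

```python
import numpy as np
from math import erfc, sqrt

def diebold_mariano(e1, e2, power=2):
    """DM test for one-step forecasts: d_t = |e1_t|^power - |e2_t|^power,
    DM = mean(d) / sqrt(var(d)/n), compared to N(0, 1) two-sided.
    With h = 1 no HAC lag correction is applied in this sketch."""
    d = (np.abs(np.asarray(e1, dtype=float))**power
         - np.abs(np.asarray(e2, dtype=float))**power)
    n = d.size
    dm = d.mean() / np.sqrt(d.var(ddof=1) / n)
    p_value = erfc(abs(dm) / sqrt(2.0))   # two-sided normal p-value
    return float(dm), float(p_value)
```

A negative DM with a small p-value indicates the first error series achieves significantly lower loss; in Table 7, the uniformly small p-values in the STACKED column (≈0.0026 against most individual models) are what supports the ensemble's superiority claim.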
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Rubio, L.; Alba, K.V.; Velasquez, C.E.; Ramos, F.R. Stacked ML-GARCH for Bitcoin Risk Forecasting: A Novel Ensemble Approach for Superior Value-at-Risk Estimation. Mathematics 2026, 14, 624. https://doi.org/10.3390/math14040624
