For this study, daily price data were retrieved from the CoinMarketCap website using the crypto2 package in R, which performs automated scraping of the historical records of several financial assets, including Bitcoin, the focus of this research. The analyzed period spans from 1 January 2019 to 31 December 2024. The dataset consists of daily closing prices, which represent the final recorded value of the asset at the end of each trading day.
4.1. Data Description
It is essential to understand the dynamics of Bitcoin prices over time in order to provide a solid foundation before applying statistical modeling. Thus,
Figure 1 shows the trajectory of this cryptocurrency from 1 January 2019 to 31 December 2024.
As shown in
Figure 1, the analyzed period encompasses different market conditions. Between January 2019 and early 2020, Bitcoin prices remained relatively stable, staying below US$15,000. In 2021, there was a strong upward trend in the first half of the year, followed by a sharp correction. This price surge coincided, in part, with the COVID-19 pandemic, during which many investors began to view Bitcoin as a potential hedge against global economic instability [46].
Between 2023 and 2024, Bitcoin experienced another significant increase in value, driven by macroeconomic factors and specific events in the cryptocurrency market. This period culminated in a new price record, surpassing US$98,000 in November 2024, possibly influenced by political events such as the United States elections.
The presence of these different behaviors—from relative calm to high volatility, including sharp uptrends and corrections—makes the 2019–2024 interval ideal for testing the selected stochastic models. This allows for evaluating whether these models adequately adapt to drastic changes in price dispersion. Moreover, the occurrence of persistent trends and abrupt reversals creates a suitable context to test the ability of models based on fractional Brownian motion to capture temporal dependence.
Therefore, the choice of this period is not intended to avoid the influence of external factors or price instability but rather to use them as a stress test for the models. The ability of a stochastic model to describe and predict Bitcoin behavior over such a heterogeneous period, including both calm phases and major turbulence, attests to its robustness and relevance. In this context, the randomness and long memory, intrinsic characteristics of the studied models, will be assessed against a rich and challenging price history.
4.2. Daily Analysis
In this section, we use the abbreviations GOUFE-CIR and GOUFE-CONST to refer to the GOU-FE process driven by fBm (3) with the CIR model as the volatility function and to the GOU-FE process with constant volatility, respectively. Similarly, we use GFBM-CIR and GFBM-CONST to denote the corresponding GBM models obtained from (4).
To assess the performance of all four models (GOUFE-CIR, GOUFE-CONST, GFBM-CIR, and GFBM-CONST), their parameters were estimated. The results are presented in
Table 1 and
Table 2, where dashes (–) indicate parameters not applicable to the constant volatility specification.
From
Table 1 and
Table 2, it can be observed that the models with constant volatility (GOUFE-CONST and GFBM-CONST) yield higher log-likelihood values than their respective counterparts with CIR-type stochastic volatility. This initial analysis suggests that, for the daily Bitcoin prices, the additional complexity introduced by modeling volatility through the CIR process may not be justified, and that assuming constant volatility provides a more efficient fit to the data. Among all analyzed models, GOUFE-CONST stands out with the highest log-likelihood (31,871.26), indicating the best overall fit. Furthermore, the estimated Hurst exponent ($H$) remains consistently in the range of 0.54 to 0.55 across all models; values above 0.5 indicate persistence, that is, long memory in the daily Bitcoin prices.
After evaluating the estimated parameters, predictive performance was assessed using error metrics that quantify the deviation between observed and predicted values. Accordingly,
Figure 2 presents a comparison between the observed daily Bitcoin prices and the forecasts generated by the four models throughout the analysis period, while
Table 3 summarizes this comparison based on the prediction errors.
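The error metrics used in this comparison can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name `error_metrics` and the toy inputs are hypothetical, and the bias is assumed to be the mean of predicted minus observed values, so that a positive bias corresponds to overestimation.

```python
import math

def error_metrics(observed, predicted):
    """Return RMSE, MAE, MAPE (%), bias and R^2 for two paired series."""
    n = len(observed)
    # Assumed sign convention: error = predicted - observed
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n
    bias = sum(errors) / n  # positive means overestimation on average
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return {"rmse": rmse, "mae": mae, "mape": mape, "bias": bias, "r2": r2}

# Toy values for illustration only (not taken from the study)
obs = [100.0, 110.0, 120.0, 130.0]
pred = [102.0, 108.0, 123.0, 129.0]
metrics = error_metrics(obs, pred)
```

Because RMSE squares the errors, it penalizes large deviations more heavily than MAE, which is one reason the two metrics can rank models differently.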
Analyzing
Figure 2, it can be seen that all models are able to follow the general trajectory of the observed prices, capturing the main market trends and movements. However,
Table 3 shows that, from the perspective of prediction errors, the GFBM-CIR model had the best performance, presenting the lowest RMSE (US$1201.802) and the lowest MAPE (2.25%). On the other hand, GOUFE-CONST achieved the lowest MAE (US$716.2269).
Additionally, the GOUFE-CONST model also obtained the highest $R^2$ value (0.9969822), indicating that it explains approximately 99.70% of the variability in observed prices, although the other models have very similar values. Regarding bias, the GFBM-CIR model showed the smallest absolute bias (US$9.7625), suggesting a very small tendency toward overestimation, whereas GOUFE-CIR presented the highest bias (US$215.2351).
Overall, the error metrics suggest that in terms of predictive accuracy, the GFBM-CIR and GOUFE-CONST models performed the best, although the other models are not far behind. It is worth noting that, for the GOUFE model, the adoption of stochastic volatility (CIR) did not bring a significant gain in accuracy despite the additional parameters. In contrast, for the GFBM model, introducing CIR volatility resulted in a noticeable improvement, with the CIR version having a bias of only US$9.7625, compared to US$−72.3004 for the constant volatility specification.
Nevertheless, to more confidently determine which model performs best, it is necessary to consider additional factors, such as the Information Criteria (AIC, BIC, and EDC), where lower values indicate a better fit by penalizing model complexity.
Table 4 presents these results.
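For reference, the three criteria can be computed from a maximized log-likelihood as sketched below. The AIC and BIC formulas are standard; for the EDC, the penalty $c_n = 0.2\sqrt{n}$ is a common choice and is assumed here, since the text does not state which penalty was used. The function name and the inputs are hypothetical.

```python
import math

def information_criteria(log_lik, n_params, n_obs):
    """Return (AIC, BIC, EDC); lower values indicate a better trade-off."""
    aic = -2.0 * log_lik + 2.0 * n_params
    bic = -2.0 * log_lik + n_params * math.log(n_obs)
    # Assumption: EDC penalty c_n = 0.2 * sqrt(n); other choices exist
    edc = -2.0 * log_lik + n_params * 0.2 * math.sqrt(n_obs)
    return aic, bic, edc

# Hypothetical inputs for illustration only
aic, bic, edc = information_criteria(log_lik=-120.0, n_params=3, n_obs=100)
```

All three criteria share the term $-2\ell$ and differ only in how strongly each extra parameter is penalized.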
According to the results in
Table 4, the GOUFE-CIR model achieved the lowest values for all three information criteria (AIC, BIC, and EDC), indicating the best overall fit by balancing likelihood and model complexity. The GFBM-CIR model ranked second in this evaluation.
These results suggest that introducing stochastic volatility via the CIR (Cox–Ingersoll–Ross) process provided a significant gain in model fit that justifies the increased number of parameters, particularly when compared to the constant volatility versions (GOUFE-CONST and GFBM-CONST), which yielded consistently higher AIC, BIC, and EDC values.
However, it is important to note that information criteria are based on in-sample fit. When comparing these findings with the out-of-sample predictive performance reported in
Table 3, a discrepancy in model rankings emerges. While GFBM-CIR and GOUFE-CONST performed better in terms of predictive accuracy, GOUFE-CIR remained the best model based on in-sample fit.
4.2.1. Residual Analysis
To evaluate model adequacy, a residual analysis was performed. This allows us to examine the distribution of residuals, including median, quartiles, and outliers, as shown in
Figure 3 and
Table 5.
Based on
Figure 3 and
Table 5, we conclude that the GFBM-CIR and GOUFE-CONST models exhibit low bias in forecasting the central trend of Bitcoin prices, as their medians are very close to zero (2.81 and −5.84, respectively).
Regarding the dispersion of the residuals, the interquartile range (IQR = Q3 − Q1) is similar across models, ranging from 666.72 (GOUFE-CONST) to 698.34 (GOUFE-CIR). In this aspect, GOUFE-CIR shows the highest IQR and also the highest standard deviation (SD = 1271.11), indicating greater variability in its residuals. On the other hand, GFBM-CIR (SD = 1212.76) and GOUFE-CONST (SD = 1212.72) have the lowest standard deviations, indicating less dispersion.
In terms of symmetry in the residual distribution, the GFBM-CIR model appears to be the most symmetric around its median, with similar distances between the median and the two quartiles. The GFBM-CONST model shows a slight right-skew, while the GOUFE-CIR model exhibits a more pronounced left-skew, consistent with its negative median.
A common feature in all four models is the presence of numerous outliers, which contribute to large prediction errors. The total range of residuals (Max–Min) is quite similar across models, varying from approximately 15,694 (GFBM-CONST) to 15,979 (GOUFE-CIR).
In
Figure 4, we present the scatter plots of residuals against the observed Bitcoin prices for the four models fitted to the daily data: GFBM-CIR, GFBM-CONST, GOUFE-CIR, and GOUFE-CONST.
According to
Figure 4, in all four models, the dispersion of residuals is not constant across the range of observed prices. For lower prices (approximately below US$25,000), the residuals are more concentrated around zero, indicating smaller errors. However, as the price increases, the variability of the residuals also increases considerably, forming a pattern resembling a cone or fan. This behavior indicates the presence of heteroskedasticity, meaning that the error variance is not constant and tends to be larger at higher price levels. Thus, the models’ prediction errors tend to increase as Bitcoin prices rise.
To gain further insight into the distribution of residuals, QQ-plots are provided below, comparing the quantiles of the residuals with the quantiles of a standard normal distribution, $N(0,1)$. This is presented in
Figure 5.
According to
Figure 5, all four models show significant deviations from the reference line, with a large number of outliers and heavy tails, suggesting that the residuals do not follow a standard normal distribution.
This heavy-tailed behavior implies that prediction errors can occasionally be much larger than what is suggested by average-based measures such as RMSE or MAE. To better assess the risk associated with such extreme events, we use the Expected Shortfall (ES), also known as Conditional Value at Risk. The $\mathrm{ES}_{1-\alpha}$ measures the average expected loss in the worst $100\alpha\%$ of scenarios.
Table 6 presents the Expected Shortfall values calculated at a 95% confidence level ($\mathrm{ES}_{95}$) for each fitted model.
Analyzing the results in
Table 6, we conclude that the GOUFE-CIR model presented the smallest expected shortfall in absolute value (US$−2764.197), considering the average losses in the worst 5% of scenarios. Next in ranking are the GOUFE-CONST (US$−3020.207) and GFBM-CIR (US$−3070.517) models, with very similar values.
In contrast, the GFBM-CONST model recorded the largest expected shortfall in absolute value (US$−3212.101), indicating that it is the least conservative model.
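An empirical Expected Shortfall of this kind can be sketched as the mean of the worst 5% of residuals. The sketch assumes that the lower tail is the "worst" one, which is consistent with the negative values reported above but is not stated explicitly; the function name and the synthetic residuals are hypothetical.

```python
def expected_shortfall(values, level=0.95):
    """Mean of the worst (1 - level) share of values, lower tail taken as worst."""
    ordered = sorted(values)
    # Number of tail observations; the small epsilon guards against float rounding
    k = max(1, int(len(ordered) * (1.0 - level) + 1e-9))
    tail = ordered[:k]
    return sum(tail) / len(tail)

# Synthetic residuals -50, -49, ..., 49 for illustration only
residuals = [float(v) for v in range(-50, 50)]
es95 = expected_shortfall(residuals)  # mean of the five smallest values
```

Unlike RMSE or MAE, this measure ignores the bulk of the distribution and summarizes only the extreme errors.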
4.2.2. Normality and Autocorrelation Tests on Residuals
To formally test the normality of the residuals, which Figure 5 calls into question, the Shapiro–Wilk test was applied, and its results are presented in Table 7. The null hypothesis ($H_0$) of this test is that the data follow a normal distribution.
As shown in
Table 7, the
p-values for all four models are extremely small, leading to rejection of the null hypothesis of normality in all cases.
To assess whether autocorrelation is present in the residuals of the fitted models, the Autocorrelation Function (ACF) plots are presented in
Figure 6. These plots display the estimated autocorrelations of the residuals at different lags for each model. The blue dashed lines represent approximate significance limits (usually $\pm 1.96/\sqrt{N}$, where $N$ is the sample size). Bars exceeding these limits indicate statistically significant autocorrelation at that specific lag.
As seen in
Figure 6, all four models show several bars exceeding the significance limits, suggesting that some autocorrelation structures are not fully captured by the models.
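The check behind such a plot can be sketched as follows: compute the sample autocorrelations and flag the lags whose absolute value exceeds an approximate 95% bound. The bound $1.96/\sqrt{N}$ is the usual white-noise approximation and is assumed here; the function names and the synthetic series are hypothetical.

```python
import math

def acf(series, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]

def significant_lags(series, max_lag):
    """Lags whose autocorrelation lies outside the assumed +/- 1.96/sqrt(N) band."""
    bound = 1.96 / math.sqrt(len(series))
    return [k for k, r in enumerate(acf(series, max_lag), start=1) if abs(r) > bound]

# Synthetic alternating series, strongly autocorrelated by construction
series = [1.0, -1.0] * 50
rho = acf(series, 3)
flagged = significant_lags(series, 3)
```

With many lags inspected at once, a few bars outside the band are expected by chance alone, which is why a joint test is a useful complement.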
To formally and globally test for the presence of autocorrelation, the Ljung–Box test was applied, and its results are presented in
Table 8. Its null hypothesis ($H_0$) is that autocorrelations up to a given lag are jointly equal to zero, i.e., the residuals show no serial correlation up to that lag. According to Burns [47], the chosen lag should not exceed 5% of the sample size. For the current study, a lag of 60 days was selected, which falls well below the upper limit (109).
As evident from
Table 8, the null hypothesis ($H_0$) is strongly rejected for all four models, as the p-values are far below any conventional significance level. Thus, the residuals exhibit dependence at lags of up to 60 days, confirming the initial suspicion that the residuals of these models retain some long-range temporal dependence.
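The Ljung–Box statistic is $Q = N(N+2)\sum_{k=1}^{h} \hat{\rho}_k^2/(N-k)$, compared against a chi-square distribution. The sketch below is an illustration under stated assumptions: the degrees of freedom are taken equal to the lag $h$ (no correction for estimated parameters), the p-value uses the closed form of the chi-square survival function that holds only for even degrees of freedom, and the sample size of 2192 is the number of calendar days from 2019 to 2024 rather than a figure quoted in the text.

```python
import math

def ljung_box(series, h):
    """Ljung-Box Q statistic for lags 1..h."""
    n = len(series)
    mean = sum(series) / n
    c = [x - mean for x in series]
    denom = sum(v * v for v in c)
    total = 0.0
    for k in range(1, h + 1):
        rho_k = sum(c[t] * c[t - k] for t in range(k, n)) / denom
        total += rho_k * rho_k / (n - k)
    return n * (n + 2) * total

def chi2_sf_even(x, df):
    """Exact chi-square survival function, valid for even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term, acc = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        acc += term
    return math.exp(-half) * acc

# Rule of thumb: lag at most 5% of the sample size (2192 days assumed)
max_lag = 2192 * 5 // 100

# Synthetic alternating series for illustration only
series = [1.0, -1.0] * 50
q_stat = ljung_box(series, 2)
p_value = chi2_sf_even(q_stat, 2)
```

A lag of 60 is even, so the same closed form would apply with `df=60`.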
4.3. Intraday Analysis
The choice of the price difference threshold is a crucial step in computing price durations, as it determines which price changes should be considered significant for generating durations (intervals between meaningful price changes). If the threshold is too small, the analysis may be dominated by market noise, leading to an excessive number of short durations. Conversely, if the threshold is too large, important market movements might be overlooked.
To find an appropriate balance, tests were conducted with price thresholds of 0.05%, 0.075%, 0.10%, and 0.15% of the average Bitcoin price on 20 January 2025. The impact of this choice on the resulting distribution of durations is presented in
Figure 7 and
Table 9.
As shown in
Figure 7 and
Table 9, a 0.05% threshold is highly sensitive to minor price fluctuations, generating 8893 durations with a median of just 4 s. In contrast, a 0.15% threshold drastically reduces the event count to 1579 and increases the median duration to 21 s. The 0.10% threshold offers a balanced trade-off, producing a substantial number of durations (3114) with a median of 10 s, thereby capturing significant market movements without being overly influenced by noise. Based on these findings, a threshold of 0.10%, corresponding to price changes of approximately US$104.74, was adopted for the remainder of this study.
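The effect of the threshold on the number and length of durations can be illustrated with a minimal sketch. It assumes one common definition of a price duration: the time elapsed until the price has moved by at least the threshold from the last reference price, after which the reference is reset. The function name and the toy data are hypothetical, and the exact rule used in the study may differ.

```python
def price_durations(times, prices, threshold):
    """Times between successive price moves of at least `threshold` (absolute)."""
    durations = []
    ref_time, ref_price = times[0], prices[0]
    for t, p in zip(times[1:], prices[1:]):
        if abs(p - ref_price) >= threshold:
            durations.append(t - ref_time)
            # Reset the reference point at each significant price change
            ref_time, ref_price = t, p
    return durations

# Toy data (seconds and integer price units), for illustration only
times = [0, 1, 2, 3, 4, 5, 6, 7]
prices = [1000, 1004, 1012, 1010, 1009, 1000, 1003, 989]
wide = price_durations(times, prices, threshold=10)
narrow = price_durations(times, prices, threshold=3)
```

In this toy example the narrower threshold produces twice as many durations and mostly shorter ones, mirroring the pattern described above.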
4.3.1. Duration Analysis Throughout the Day
To understand how market activity varies over a 24 h period, the generated durations were analyzed. In ACD models, shorter durations indicate periods of high market activity, whereas longer durations suggest lower liquidity or reduced volatility.
Figure 8 displays the evolution of durations over the course of the day, highlighting periods of higher and lower price variation intensity.
From
Figure 8, it can be seen that durations vary considerably throughout the day. The longest durations, during which prices remained stable for over 6 min, occurred around 3:00 a.m., 7:00 a.m., 9:00 a.m., and 7:00 p.m. (GMT-3 timezone).
To remove the underlying diurnal pattern from durations, a smoothing technique known as the “Super Smoother”, proposed by Friedman [
48], was applied. This method identifies long-term trends in how durations vary over the day. The resulting pattern is shown in
Figure 9.
Figure 9 reveals a cyclical intraday pattern in event durations. Durations are initially high, then drop sharply around 5:00 a.m., signaling an increase in trading activity. Between 5:00 a.m. and 3:00 p.m., successive peaks and troughs suggest fluctuating volatility, likely influenced by overlapping market sessions and liquidity windows. Towards the end of the day (after 3:00 p.m.), durations begin to rise again as market activity subsides.
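The idea of a diurnal adjustment can be sketched as dividing each duration by an estimate of the typical duration at that time of day. The sketch below deliberately replaces the Super Smoother with a much simpler estimate, the mean duration within each hourly bin, and it assumes a ratio adjustment (raw duration divided by the diurnal factor); both choices are illustrative and not confirmed by the text. All names and data are hypothetical.

```python
from collections import defaultdict

def diurnal_adjust(seconds_of_day, durations, bin_seconds=3600):
    """Divide each duration by the mean duration of its time-of-day bin."""
    sums, counts = defaultdict(float), defaultdict(int)
    for s, d in zip(seconds_of_day, durations):
        b = int(s // bin_seconds)
        sums[b] += d
        counts[b] += 1
    # Diurnal factor per bin (a crude stand-in for a smoothed curve)
    factor = {b: sums[b] / counts[b] for b in sums}
    return [d / factor[int(s // bin_seconds)] for s, d in zip(seconds_of_day, durations)]

# Toy data: three events in the first hour, two in the second
stamps = [100, 200, 300, 4000, 5000]
raw = [10.0, 20.0, 30.0, 2.0, 6.0]
adjusted = diurnal_adjust(stamps, raw)
```

After a ratio adjustment of this kind the durations are expressed relative to the typical level at each time of day, so their mean is close to one.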
After adjusting the durations to remove the time-of-day effect, the distribution of the adjusted durations is presented in
Figure 10. The histogram reveals a highly right-skewed distribution, with the vast majority of durations concentrated near zero, indicating that most significant price changes follow one another within extremely short time intervals.
This sharp concentration around zero is consistent with the microstructure of financial markets, where bursts of activity tend to cluster in high-liquidity periods. Despite the adjustment, the presence of a long right tail suggests that some prolonged gaps between price events still occur, potentially driven by episodic drops in trading intensity or structural breaks not fully captured by the diurnal correction.
The resulting shape underscores the need for flexible parametric models, such as the Generalized Gamma distribution, that can accommodate both heavy tails and asymmetry. This distributional feature will directly inform the choice of conditional duration models in subsequent sections and highlights the limitations of more restrictive specifications like the exponential distribution, which assumes memorylessness and constant hazard rate.
Table 10 provides summary statistics for the adjusted durations, which are essential for selecting and parameterizing an appropriate ACD model. The statistics confirm right skewness (3.877) and high kurtosis (26.698). The mean (1.055 s) is greater than the median (0.704 s), which is characteristic of a right-skewed distribution. The high coefficient of variation (110.62%) underscores the significant dispersion of the data.
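Summary statistics of this kind can be computed as sketched below. The sketch assumes moment-based skewness and non-excess kurtosis (equal to 3 for a normal distribution) and a coefficient of variation based on the sample standard deviation; the text does not say which variants were used. The function name and the toy data are hypothetical.

```python
import math
import statistics

def shape_statistics(values):
    """Mean, median, CV (%), skewness and (non-excess) kurtosis of a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    sd = math.sqrt(m2 * n / (n - 1))  # sample standard deviation
    return {
        "mean": mean,
        "median": statistics.median(values),
        "cv_percent": 100.0 * sd / mean,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }

# Toy right-skewed sample for illustration only
stats = shape_statistics([1.0, 1.0, 1.0, 1.0, 6.0])
```

In the toy sample the mean (2.0) exceeds the median (1.0) and the skewness is positive, the same qualitative signature of right skewness reported for the adjusted durations.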
4.3.2. Model Fit Analysis for Durations
To model the adjusted durations, three ACD(1,1) models were fitted and compared: Exponential, Weibull, and Generalized Gamma. The parameter estimates for each model are presented in
Table 11,
Table 12, and
Table 13.
In all three models, the parameters $\alpha$ and $\beta$ are highly statistically significant, indicating a strong temporal dependence where past durations heavily influence the current duration.
Exponential Model: The parameter estimates show high persistence, as measured by the sum $\alpha + \beta$.
Weibull Model: This model also indicates strong temporal dependence. The shape parameter is significant and greater than 1, suggesting that the hazard rate of durations is increasing rather than constant, as the exponential distribution would imply.
Generalized Gamma Model: The estimates for $\alpha$ and $\beta$ again confirm temporal persistence. The two additional shape parameters are both significant, affording the model greater flexibility to capture the complex skewness and kurtosis of the duration data.
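To make the notion of persistence concrete, the conditional mean recursion of an ACD(1,1) model can be sketched as $\psi_i = \omega + \alpha x_{i-1} + \beta \psi_{i-1}$, where $x_i$ is the adjusted duration. The notation $\omega$, $\alpha$, $\beta$ is the standard one and is assumed here; the parameter values, the initialization at the sample mean, and the function names are hypothetical and do not correspond to the estimates in the tables.

```python
import math

def acd_conditional_means(durations, omega, alpha, beta):
    """psi_i = omega + alpha * x_{i-1} + beta * psi_{i-1}, started at the sample mean."""
    psi = [sum(durations) / len(durations)]
    for x_prev in durations[:-1]:
        psi.append(omega + alpha * x_prev + beta * psi[-1])
    return psi

def exponential_loglik(durations, psi):
    """Log-likelihood of the Exponential ACD model given the conditional means."""
    return -sum(math.log(p) + x / p for x, p in zip(durations, psi))

# Hypothetical parameters with persistence alpha + beta = 0.9
omega, alpha, beta = 0.1, 0.2, 0.7
x = [1.0, 2.0, 0.5]
psi = acd_conditional_means(x, omega, alpha, beta)
loglik = exponential_loglik(x, psi)
unconditional_mean = omega / (1.0 - alpha - beta)
```

The closer $\alpha + \beta$ is to one, the more slowly shocks to the expected duration die out; the unconditional mean $\omega/(1-\alpha-\beta)$ exists only when the sum is below one.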
The evaluation of model fit for the ACD(1,1) specification was conducted using a combination of log-likelihood values and standard model selection criteria: AIC (Akaike Information Criterion), BIC (Bayesian Information Criterion), and MSE (Mean Squared Error). As presented in
Table 14, the Generalized Gamma distribution produced the best overall performance, attaining the highest log-likelihood and the lowest values for AIC and BIC. Although the MSE differences across models are subtle, they remain consistent with the ranking obtained from the likelihood-based criteria.
These results indicate that the Generalized Gamma model provides a more flexible and accurate representation of the conditional distribution of durations, particularly when compared to the more restrictive Exponential and Weibull alternatives.
In addition to the numerical criteria, the adequacy of the fitted models was also examined through Cox–Snell residual analysis. QQ-plots of the residuals (
Figure 11) illustrate the extent to which the transformed residuals conform to the theoretical unit exponential distribution, $\mathrm{Exp}(1)$. The Exponential and Weibull models display clear deviations from the reference line, especially in the upper quantiles, indicating poor tail modeling. Conversely, the Generalized Gamma model yields residuals that closely align with the diagonal, suggesting a more accurate capture of the underlying duration dynamics.
Overall, the combination of information criteria and residual diagnostics supports the Generalized Gamma distribution as the most appropriate choice for modeling intraday durations under the ACD(1,1) structure.
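The construction of such a QQ-plot can be sketched as follows, assuming the Cox–Snell residuals have already been computed and are compared with unit exponential quantiles $-\ln(1-p)$. The plotting positions $(i-0.5)/n$ are one common convention and are an assumption here, as are the function name and the synthetic residuals.

```python
import math

def exp1_qq_points(residuals):
    """Pairs (theoretical Exp(1) quantile, empirical quantile) for a QQ-plot."""
    ordered = sorted(residuals)
    n = len(ordered)
    # Assumed plotting positions p_i = (i - 0.5) / n
    theoretical = [-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Synthetic residuals equal to exact Exp(1) quantiles, supplied in reverse order
n = 5
sample = [-math.log(1.0 - (i - 0.5) / n) for i in range(n, 0, -1)]
points = exp1_qq_points(sample)
```

For a well-specified model the points lie close to the 45-degree line; systematic departures in the upper quantiles point to a poorly modeled right tail.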
To quantify the fit in the tail of the distribution, the Expected Shortfall (ES) was computed at a 95% confidence level ($\mathrm{ES}_{95\%}$), representing the average of the most extreme 5% values; see Table 15. From this table, we note that the Generalized Gamma model achieved an $\mathrm{ES}_{95\%}$ of 3.9753, which is lower than the $\mathrm{ES}_{95\%}$ of the Exponential model (4.0759) and the Weibull model (4.882). For a unit exponential distribution, the theoretical value is $1 + \ln 20 \approx 3.996$, so the Generalized Gamma estimate lies closest to it. This result indicates that the Generalized Gamma model provides a better fit to the expected behavior of an $\mathrm{Exp}(1)$ distribution.
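The unit exponential benchmark for this comparison follows from the memoryless property: the 95% quantile is $-\ln(0.05) = \ln 20$, and the mean excess above any threshold is 1, so the upper-tail Expected Shortfall is $1 + \ln 20$. The short sketch below computes this benchmark and the distance of each reported value from it, assuming the ES refers to the upper tail of the residuals; the function and variable names are hypothetical.

```python
import math

def exp1_expected_shortfall(level):
    """Upper-tail ES of a unit exponential: quantile plus mean excess of 1."""
    return 1.0 - math.log(1.0 - level)

benchmark = exp1_expected_shortfall(0.95)

# ES values as reported in the text for the three duration models
reported = {"Generalized Gamma": 3.9753, "Exponential": 4.0759, "Weibull": 4.882}
gaps = {name: abs(value - benchmark) for name, value in reported.items()}
closest = min(gaps, key=gaps.get)
```

Measuring the distance to the benchmark, rather than only ranking the values, makes explicit why a lower ES is read here as a better fit.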
Finally, to check whether the models captured the temporal dependence in the durations, the Cox–Snell residuals were tested for autocorrelation using ACF plots (Figure 12) and the Ljung–Box test (Table 16). The Ljung–Box test results show p-values well above 0.05 for all models, so the null hypothesis of no autocorrelation is not rejected. The ACF plots in Figure 12 visually support this, with most correlations falling within the confidence bands.