Cryptocurrency Forecasting: More Evidence of the Meese-Rogoff Puzzle

Abstract: This paper tests the random walk hypothesis in the cryptocurrency market. Based on the well-known Meese-Rogoff puzzle, we evaluate whether cryptocurrency returns are predictable. For this purpose, we conduct in-sample and out-of-sample analyses to examine the forecasting power of our models, built with autoregressive components and lagged returns of BITCOIN, compared with the random walk benchmark. To this end, we consider the 13 major cryptocurrencies between 2018 and 2022. Our results indicate that our models significantly outperform the random walk benchmark. In particular, cryptocurrencies tend to be far more persistent than regular exchange rates, and BITCOIN (BTC) seems to improve the predictive accuracy of our models for some cryptocurrencies. Furthermore, while the predictive performance is time varying, we find predictive ability in different regimes both before and during the pandemic crisis. We believe these results are helpful to policymakers and investors because they open a new perspective on cryptocurrency investing strategies and on regulations to improve financial stability.


Introduction
Blockchain technology is an integral part of scientific progress, especially in finance and capital markets. Because of the development of cryptocurrencies, tokens, and nonfungible tokens (NFTs), this technology is changing financial markets and the portfolios of institutional and retail investors. For instance, in October 2021, ProShares issued the first Bitcoin exchange-traded fund. Today, this ETF manages a total of USD 728 million (market statistics available at www.etf.com, accessed on 30 May 2022). This phenomenon motivates a deeper understanding of the behavior of cryptocurrencies, considered both as exchange rates and as financial assets for the short and long terms and, more importantly, of the evaluation of forecasting models for investment and its risk management.
It is a stylized fact in the financial literature that forecasting exchange rate returns is highly challenging. The authors of [1,2] find that exchange rate models exhibit poor forecasting performance, especially compared with simple naive models such as the random walk (RW) and the driftless random walk (DRW): this is known as the Meese-Rogoff puzzle (see Appendix A for a detailed discussion). While there is still debate in the literature on whether exchange rates are predictable or not, it is a fact that the RW and the DRW are considered the most challenging benchmarks to outperform; as commented in [3], "The toughest benchmark is the random walk without drift". We emphasize that the literature reports poor forecasting performance for exchange rates and financial returns in general (see [4][5][6]). Other approaches in the literature include moving average rule models [36]; a hybrid neuro-fuzzy controller [37]; an adaptive multilevel time-series detection methodology [38]; a nonparametric causality-in-quantiles test [39]; and technical analysis to predict BITCOIN [17].

Momentum Effect and Cryptocurrency Predictability
The momentum effect is a change in price generated by an overreaction of investors. This phenomenon can provoke irrational behavior, especially in cryptocurrencies, where the fundamentals are opaque and confusing. Some biases well documented in the literature are positive feedback trading [23], overconfidence bias [40], herding behavior [41], general sentiment [42], conservatism bias [43], gradual diffusion of information [44], and trend following [45]. In other words, investors irrationally overreact to positive returns today by increasing their long positions, pushing prices higher tomorrow, so that positive returns today predict positive returns tomorrow, and vice versa.
The authors of [18] argue that cryptocurrency returns follow a martingale in that they cannot be predicted. However, [19,20,22] find that the momentum factor has predictive power for returns. In these works, momentum is measured by the lagged average returns of cryptocurrencies. For instance, [19] find that current coin market returns predict cumulative future coin market returns from one week to eight weeks ahead. This effect is related to investor attention. The researchers in [20] developed a dynamic cryptocurrency valuation model in which the momentum effect is generated by the positive externality of the network effect, which is not immediately incorporated into the cryptocurrency price. Momentum then arises because users have incorrect expectations about future prices. In this vein, [19] show that the momentum effect is stronger for the relatively low-attention coins. In their results, the coefficient estimates for both the high-attention and the low-attention subgroups are positive, suggesting that there are time-series momentum effects for both groups, but the effects are only significant in the low-attention group.

Dominance Effect and Cryptocurrency Predictability
The dominance effect is the influence of the lagged returns of the biggest cryptocurrency by market capitalization. The authors of [46] indicate that investment in Bitcoin has attracted considerable attention from investors and researchers in recent years. There are thousands of coins in the cryptocurrency market, but very few reach a sufficient market capitalization to be followed by investors and included in large-volume investment portfolios. Additionally, BITCOIN accounts for 40% of the total cryptocurrency market, while ETHEREUM accounts for 15%. This dominance effect is important because the cryptocurrency market is interconnected and synchronized. In [47], the authors find that BITCOIN and other altcoins are interdependent in the short and long terms using autoregressive distributed lags. This means that BITCOIN returns could have predictive power over other altcoins.
Furthermore, [48] find that BITCOIN has predictive power over other assets such as the SP500. Related to this, [36] show that BITCOIN spot volatility increased following the launch of BITCOIN futures contracts traded on the Chicago Board of Options Exchange (CBOE) and the Chicago Mercantile Exchange (CME). Similarly, [49] show a dynamic linkage of cryptocurrencies and BITCOIN.
In [50], the authors find that BITCOIN prices are positively related to those of other altcoins. Chang and Shi (2020) conclude that BITCOIN exhibits some predictive power over the cryptocurrency market. A possible explanation for this predictive ability is synchronization (see [51] for an example of synchronization between stock indices). Moreover, [47,52] point out that an interesting feature of the cryptocurrency market is the interconnection between BITCOIN and the rest of the crypto assets. According to Coinmarketcap, in the last ICO crash in 2018, BITCOIN lost 80% of its value, while other altcoins lost 99%.

Materials and Methods
We consider daily closing prices of the 13 cryptocurrencies with the highest market capitalization (BNB, USDC, USDT, XRP, SOL, MATIC, LTC, WVTC, VOGE, VUSD, ADA, LUNA, UST). Additionally, we consider BITCOIN and ETHEREUM as the "dominant" cryptocurrencies used to construct our forecasting models. Our sample goes from 1 January 2018 through 13 April 2022 at daily frequency. We download our series from Yahoo! Finance. The beginning of our sample period is mainly determined by data availability: most of the 13 cryptocurrencies were not regularly traded before 1 January 2018. Finally, we use closing prices exclusively in order to avoid any concerns about spurious autocorrelations induced by taking averages of intra-day prices [53][54][55][56].
Our purpose is to analyze predictability in the cryptocurrency market. To this end, we compare the random walk (henceforth RW) and the driftless random walk (henceforth DRW) benchmarks with two econometric models (see Equations (1) and (2)). There is a vast literature suggesting that the RW and the DRW are the toughest benchmarks when forecasting exchange rates, especially at short forecasting horizons: this is known as the Meese-Rogoff puzzle. Our models consider autoregressive components and the lags of a dominant cryptocurrency (either BITCOIN or ETHEREUM); the number of lags is determined according to our empirical analysis of the cryptocurrencies' autocorrelations and of their cross-correlations with the two dominant cryptocurrencies, BITCOIN and ETHEREUM (see Appendices B and C, respectively).
∆ln(S_t) = c_1 + Σ_{i=1}^{4} α_i ∆ln(S_{t−i}) + Σ_{j=1}^{4} β_j ∆ln(D_{t−j}) + ε_{1,t}   (1)

∆ln(S_t) = c_2 + Σ_{i=1}^{4} α_i ∆ln(S_{t−i}) + Σ_{j=1}^{2} β_j ∆ln(D_{t−j}) + ε_{2,t}   (2)

where ∆ln(S_t) ≡ ln(S_t) − ln(S_{t−1}). In the above equations, S_t is the cryptocurrency price at time t, D_t stands for the price of the dominant cryptocurrency, either BITCOIN or ETHEREUM, and finally, ε_{1,t} and ε_{2,t} are error terms.
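As an illustration, Equation (1) can be estimated by OLS on a matrix of own lags and dominant-coin lags. The following is a minimal numpy sketch; the function names and the simulated data are ours, not part of the paper:

```python
import numpy as np

def make_lagged_design(y, x, p=4, q=4):
    """Build the regressors of Equation (1): a constant, p own lags of the
    cryptocurrency log-return y, and q lags of the dominant-coin
    log-return x. Returns the trimmed target and the design matrix."""
    m = max(p, q)
    T = len(y)
    cols = [np.ones(T - m)]
    cols += [y[m - i:T - i] for i in range(1, p + 1)]  # own lags 1..p
    cols += [x[m - j:T - j] for j in range(1, q + 1)]  # dominant-coin lags 1..q
    return y[m:], np.column_stack(cols)

def ols(y, X):
    """Plain OLS coefficients: constant first, then alphas, then betas."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

On simulated returns with a known first own lag and a known first dominant-coin lag, the estimated coefficients recover the true values.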
The number of lags considered in Equation (1) is not economically based but purely empirical. First, Appendix C presents the correlogram for each cryptocurrency; as expected, the correlation (in absolute value) rapidly decays with a greater lag. For instance, the average correlation between each cryptocurrency and its own first lag is surprisingly high: 0.17. Moreover, the average correlation of each cryptocurrency with its own second lag is 0.11. In the same line, the average correlation of the first four lags with each cryptocurrency is 0.10. In contrast, the average correlation with the fifth lag is considerably lower (0.04), and it is exponentially lower for longer lags (smaller than 0.03). Appendix D indicates a similar result: the cross-correlations between each cryptocurrency and the dominant cryptocurrencies rapidly decay with longer lags. For instance, the average correlation (in absolute value) between the first lag of BTC and the target cryptocurrencies is 0.06; in contrast, the average correlation with the fifth lag of BTC is about 0.02.
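The decay of the correlogram described above can be checked with a simple sample-autocorrelation helper. This is a sketch with our own function name; the actual analysis would of course use the observed return series:

```python
import numpy as np

def autocorr(y, lag):
    """Sample autocorrelation of a return series at a positive lag."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    return float(y[lag:] @ y[:-lag] / (y @ y))
```

For a persistent series, the autocorrelation at lag one is large while the fifth-lag autocorrelation is already close to zero, mirroring the decay pattern reported in Appendix C.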
Second, Tables 1-3 suggest that the fourth lags may be playing a role, at least in some regimes. In particular, the fourth lag (either the autoregressive component or BTC) is statistically significant in at least one regime for 9 out of 13 cryptocurrencies (about 70%).
For instance, for UST, at least one of the two fourth lags is significant in all the four regimes.
Third, in unreported results (available upon request), we estimate a model similar to Equation (1) but this time considering up to six lags. Out of the 52 new coefficients included, only 6 are statistically significant in sample (about 11%), suggesting very limited predictive content in those longer lags.
Fourth, in unreported results (available upon request), we compare the Schwarz information criterion (SIC) for three different specifications: our model (four lags), a specification with five lags, and a specification with up to six lags. In 10 out of 13 cryptocurrencies, the specification with four lags exhibits the best SIC, suggesting the best trade-off between fit and number of parameters. Additional analyses using other information criteria, such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), lead to qualitatively similar conclusions and are available upon request.
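A lag-order comparison by SIC can be sketched as follows. To keep the comparison fair, every candidate order is estimated on the same effective sample, trimmed at the maximum lag considered; the helper name and the simulated AR(2) data are ours:

```python
import numpy as np

def sic_for_ar(y, p, m=6):
    """Schwarz criterion n*ln(SSR/n) + k*ln(n) of an AR(p) fit; the sample
    is trimmed at m observations so that all lag orders up to m are
    compared on the same effective sample."""
    T = len(y)
    yt = y[m:]
    X = np.column_stack([np.ones(T - m)] +
                        [y[m - i:T - i] for i in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, yt, rcond=None)
    resid = yt - X @ beta
    n, k = X.shape
    return n * np.log(resid @ resid / n) + k * np.log(n)
```

Minimizing the SIC over candidate orders then selects the most parsimonious specification consistent with the data.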
Finally, while it is true that we could analyze specifications with even more lags, it is well known that parsimony is a virtue when considering forecasting models. As commented by [57,58], the inclusion of too many parameters may pollute the out-of-sample forecasts of the nesting model. For this reason, we disregard more complex models with higher numbers of lags and parameters.

In-Sample Test
In this section, we describe our in-sample analyses. Since the relevant benchmarks are the RW and the DRW, using Equation (1), we test the null hypothesis H0: α1 = α2 = α3 = α4 = β1 = β2 = β3 = β4 = 0 through a simple Wald statistic. This null hypothesis posits that our model is a simple RW.
Additionally, we study the t-statistic associated with each of the coefficients on the lags of the cryptocurrency and the lags of BITCOIN. It is generally accepted that the prices of financial assets are unit-root processes and that their first differences (log-returns) are covariance stationary. In this sense, in order to properly apply the central limit theorem, we require a proper estimation of the long-run variance; to this end, we use HAC standard errors, as suggested by [59,60]. In particular, [60] propose a Bartlett kernel to ensure a positive definite variance matrix, and [59] consider an automatic selection of the lag truncation parameter.
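A minimal sketch of OLS with Newey-West (Bartlett-kernel) HAC standard errors, in the spirit of [59,60]; the automatic lag truncation below is the common Newey-West rule of thumb and is our assumption, not necessarily the exact selection rule used in the paper:

```python
import numpy as np

def hac_ols(y, X, L=None):
    """OLS with Newey-West (Bartlett-kernel) HAC standard errors.
    L is the lag truncation; if None, the rule of thumb
    floor(4*(T/100)^(2/9)) is applied."""
    T, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    if L is None:
        L = int(np.floor(4 * (T / 100.0) ** (2.0 / 9.0)))
    Xu = X * u[:, None]
    S = Xu.T @ Xu / T                      # lag-0 term of the long-run variance
    for j in range(1, L + 1):
        w = 1.0 - j / (L + 1.0)            # Bartlett weight
        G = Xu[j:].T @ Xu[:-j] / T
        S += w * (G + G.T)
    Qinv = np.linalg.inv(X.T @ X / T)
    V = Qinv @ S @ Qinv / T                # HAC sandwich covariance
    return beta, np.sqrt(np.diag(V))
```

The t-statistics discussed above are then simply the coefficient estimates divided by these HAC standard errors.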
Finally, we conclude our in-sample analysis with multiple-unknown-break regressions. The rationale of this exercise is that there is a long tradition in the literature of reporting instabilities in the predictive performance of forecasting models; in other words, many papers find evidence of predictability, but this evidence is sporadic and unstable, appearing as "pockets of predictability" [4]. For this reason, we explore the possibility of multiple unknown breaks in the parameters of Equation (1) (as commented by [61], this is a simple case of a time-varying coefficients model).
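To give a flavor of the break-search logic, the sketch below runs a global grid search for a single break date minimizing the combined SSR of two regime-specific OLS fits. This is a deliberately simplified, one-break version of the multiple-break machinery of [9]; the function name and the trimming fraction are our choices:

```python
import numpy as np

def _ssr(y, X):
    """Sum of squared residuals of an OLS fit on (y, X)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return float(r @ r)

def best_break(y, X, trim=0.15):
    """Global grid search for a single break date: pick the date that
    minimizes the total SSR of two regime-specific OLS fits, searching
    inside the trimmed interval [trim*T, (1-trim)*T]."""
    T = len(y)
    lo, hi = int(trim * T), int((1 - trim) * T)
    return min(range(lo, hi),
               key=lambda d: _ssr(y[:d], X[:d]) + _ssr(y[d:], X[d:]))
```

On a simulated series with a mean shift, the estimated break date lands close to the true one.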

Out of Sample Test
While in-sample analyses provide interesting evidence about the predictive relationships in cryptocurrencies, they are not exempt from criticism from a methodological standpoint. First, in-sample analyses do not really emulate forecasts in real time. Second, as pointed out by [63], they are prone to overfitting. As commented by [3], page 1253: "A second area where in-sample fit does not provide reliable guidance for out-of-sample forecasting ability is when predicting exchange rates. [1,64] have shown that, although models of exchange rate determination based on traditional fundamentals fit well in sample, their forecasting performance is much worse than a simple, a-theoretical random walk model." To partially overcome these problems, we conduct an out-of-sample analysis.
In the context of nested models, we compare the predictive performance of our model (see Equation (2)) against the RW and DRW benchmarks. The null hypothesis, H0: α1 = α2 = α3 = α4 = β1 = β2 = 0, indicates that our model reduces to a RW. To conduct the out-of-sample analyses, we split our sample into two windows: an estimation window of size R and an evaluation window of size P; of course, T = P + R, where T is the total number of observations. Even though one of the arguments for conducting out-of-sample analyses is to avoid overfitting, [65] pointed out that results from only one ad hoc window size may still be highly controversial. This is so because predictability may be confined to one particular subsample and hence not robust to different window sizes. To address this concern, we consider two completely different window sizes. In the first exercise, we use R = (1/2) × T, estimating our model with 782 observations and evaluating its forecasts against the RW and DRW over the remaining 782 observations. In the second exercise, we use R = (1/3) × T, which employs 521 observations to estimate our model and 1044 observations to evaluate its forecasting power with respect to the RW and DRW models. In the following, we describe the out-of-sample tests considered in this paper.
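The rolling-window scheme described above can be sketched as follows: at each date in the evaluation window, the model is re-estimated on the most recent R observations and a one-step-ahead forecast is produced. This is a generic sketch with our own helper name, not the paper's code:

```python
import numpy as np

def rolling_forecasts(y, X, R):
    """One-step-ahead rolling-window forecasts: at each t >= R, estimate
    OLS on the most recent R observations and forecast y[t] from X[t]."""
    T = len(y)
    preds = np.empty(T - R)
    for t in range(R, T):
        b, *_ = np.linalg.lstsq(X[t - R:t], y[t - R:t], rcond=None)
        preds[t - R] = X[t] @ b
    return preds
```

When the target is genuinely predictable, the resulting forecast errors have a smaller MSPE than the zero forecast of the DRW benchmark.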
The intuition of the ENC-t (forecast encompassing) test is as follows. Consider a "combined forecast," where the combination is simply the weighted average of the two competing forecasts. Suppose that the weights are assigned to minimize the mean squared prediction error (MSPE). If the entire weight is assigned to one forecast, we say that this forecast "encompasses" the other; if the optimal weight is a combination of both forecasts, then neither forecast encompasses the other. In the specific case of nested models, a rejection of the null hypothesis of encompassing means that a combination of the forecasts from the nested and nesting models is better (in terms of MSPE) than either individual forecast. See [12,67,68] for more insights into the interpretation of the ENC-t.
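A minimal sketch of the ENC-t statistic in its common t-ratio form, computed from the benchmark and model forecast errors. The simple i.i.d. standard error below is our simplification; in the nested case, the statistic would be compared against the nonstandard critical values discussed next, not against normal ones:

```python
import numpy as np

def enc_t(e_bench, e_model):
    """ENC-t encompassing statistic: with c_t = e_bench_t*(e_bench_t -
    e_model_t), this is a t-ratio for mean(c) = 0. Under the null the
    benchmark encompasses the larger model; large positive values reject."""
    c = e_bench * (e_bench - e_model)
    P = len(c)
    return float(np.sqrt(P) * c.mean() / c.std(ddof=1))
```

When the larger model truly helps, its errors are uncorrelated with the benchmark's and the statistic is large and positive; when the model only adds noise, the statistic fluctuates around zero.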
The main caveat of the asymptotic theory of [66] is that it remains silent about the effects of parameter uncertainty on the asymptotic distribution of the test. In other words, [67] assume that forecasts are primitives of the forecasting problem, but forecasts constructed by models with estimated parameters are polluted with estimation error. To address this point, [69] develops an asymptotic theory that accommodates parameter uncertainty and shows that the ENC-t is asymptotically normal. Nevertheless, the idea of [69] is valid exclusively for nonnested models; if the models are nested (as in our case), the asymptotic distribution of the ENC-t becomes degenerate under the null hypothesis.
The critical values of the test depend on the number of excess parameters in the nesting model, on the scheme used for updating our parameters (either rolling, recursive, or fixed), and on the ratio P/R. Let k2 be the number of excess parameters in the nesting model and W a (k2 × 1) vector of Brownian motions with a covariance kernel equal to the identity matrix. The limit distribution of the ENC-t is then a functional of W.
The authors of [70] provide critical values for a comprehensive set of P/R and k2. When our exercises match the cases considered by [70], we use their critical values. If not, we simulate these distributions and obtain the corresponding critical values (see Appendix D for the critical values considered).

Wild Clark and West Test (WCW)
The authors of [57,58] consider the comparison of MSPEs between nested models. Even though the null hypothesis of equal predictability implies that both models have the same population MSPE, the authors demonstrate that the sample MSPE of the nesting (larger) model is expected to be greater than that of the nested (smaller) model. In this sense, small parsimonious models such as the DRW and the RW have a natural advantage in terms of MSPE over nesting models. In other words, under the null hypothesis, the larger model introduces "additional noise" into its forecasts because of parameter uncertainty (parameters need to be estimated). Clark and West (CW) propose adjusting this upward shift of the sample MSPE of the alternative model, that is, using an adjusted MSPE. Notably, this adj-MSPE has an equivalence with the ENC-t, even though the interpretation of the adj-MSPE is not encompassing. Furthermore, while CW do not provide the asymptotic distribution of this adj-MSPE, they argue that it is approximately standard normal.
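The CW adjusted-MSPE statistic can be sketched directly from the description above: the larger model's squared errors are corrected by the squared difference between the two forecasts, the term that captures the noise from its estimated extra parameters. The function name and the simple t-ratio (no HAC correction) are our simplifications:

```python
import numpy as np

def clark_west(y, f_small, f_big):
    """CW adjusted-MSPE t-ratio for nested models: the larger model's
    squared errors e2^2 are adjusted by (f_small - f_big)^2 before
    being compared with the benchmark's squared errors e1^2."""
    e1 = y - f_small
    e2 = y - f_big
    f = e1**2 - (e2**2 - (f_small - f_big)**2)
    P = len(f)
    return float(np.sqrt(P) * f.mean() / f.std(ddof=1))
```

A large positive value indicates that the larger model improves on the small benchmark once the parameter-uncertainty noise is accounted for.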
In [12], PHM propose a simple modification of the test by CW that ensures asymptotic normality: they label their test the "wild Clark and West" (WCW) test. The critical point of their approach is to introduce an independent i.i.d. normal random variable θ_t with E[θ_t] = 1 and a "small" variance V[θ_t] = σ_θ², which prevents the test from becoming degenerate under the null hypothesis of equal predictive ability. Moreover, this random variable keeps the asymptotic distribution of the WCW centered around zero and eliminates the autocorrelation structure. Even though the asymptotic theory of [69] does not apply to CW, it does apply to the WCW, as the variance of the statistic remains positive under the null hypothesis. PHM show that the parameter uncertainty in the WCW is "asymptotically irrelevant"; hence, the WCW is extremely simple to compute. The authors report simulations and empirical illustrations in which the WCW tends to exhibit better size properties than CW but a slight disadvantage in terms of power; in other words, the WCW is a more conservative test.

Correlation Test
In two recent papers [13,14], PH argue that traditional MSPE comparisons may be misleading: When some efficiency conditions are not met, the forecast with a higher correlation with the target variable may also display the lowest MSPE. Through a simple decomposition, the authors show that the null hypothesis of equal correlations is not necessarily equivalent to a null hypothesis of equal MSPE. They name this opposite behavior the MSPE Paradox. Moreover, PH find that a useless forecast with no relationship with the target variable whatsoever may also be more accurate than a forecast displaying a positive correlation.
As a complementary approach to MSPE comparisons, PH propose a simple correlation-based test to compare two competing forecasts. Based on weak assumptions of dependency, their test directly applies the delta method and the central limit theorem for stationary series [71,72]; hence, their test is asymptotically normal.
To implement the correlation test, we use a forecast X (built with Equation (2)) and a forecast Z (obtained from the RW benchmark), and we let Y be the target variable (the realized cryptocurrency returns). (Note that when using rolling-window regressions, the forecasts from the random walk are not constant; hence, they exhibit positive variance.) Let ρ_XY and ρ_ZY be the correlations of Y with X and with Z, respectively. Then, the null hypothesis is simply H0: ρ_ZY = ρ_XY; under this null, both forecasts have the same correlation with the target variable Y. The test statistic is

t = √T (r_ZY − r_XY) / σ̂, with σ̂² = ∇h′ [Σ_{j=−∞}^{∞} Ω_j] ∇h,

where r_ZY and r_XY stand for the sample correlations of Z and X with Y, respectively, and ∇h is the delta-method gradient built from the sample means, standard deviations, and covariances of Z, X, and Y. PH show that this statistic is standard normal, and they suggest estimating the 7 × 7 long-run variance matrix Σ_{j=−∞}^{∞} Ω_j through a HAC estimator such as in [59,60]. We emphasize that this test does not compare forecasting models. As a consequence, we are just evaluating the correlations of our forecasts without addressing the implications of our parameters.
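As a rough illustration of the correlation comparison, the sketch below computes the difference in sample correlations and approximates its sampling variance with a simple i.i.d. bootstrap instead of the HAC delta-method variance of the original test; the function name, the bootstrap, and all parameters are our assumptions:

```python
import numpy as np

def corr_diff_test(y, x, z, B=1000, seed=0):
    """t-ratio for H0: corr(X,Y) = corr(Z,Y). The point estimate is the
    difference of sample correlations; its standard error is approximated
    with an i.i.d. bootstrap (B resamples) rather than a HAC
    delta-method variance."""
    rng = np.random.default_rng(seed)
    T = len(y)
    def diff(idx):
        return (np.corrcoef(x[idx], y[idx])[0, 1]
                - np.corrcoef(z[idx], y[idx])[0, 1])
    d0 = diff(np.arange(T))
    boot = np.array([diff(rng.integers(0, T, T)) for _ in range(B)])
    return float(d0 / boot.std(ddof=1))
```

When X is genuinely correlated with the target and Z is pure noise, the statistic is large and positive; when both forecasts are uninformative, it fluctuates around zero.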

Trading Strategy
Based on a simple modification to [73], [74] (PHB) provide a simple out-of-sample test of the random walk hypothesis: the test evaluates the particular case in which the benchmark model is a zero-mean forecast. The trading strategy is to buy the cryptocurrency if f_t ≥ 0 (where f_t is the forecast of the cryptocurrency return) and to sell it otherwise. The investor revises this decision at each period based on the latest forecast; the one-period strategy return is then r_{t+1} = s(f_t) y_{t+1}, where s(f_t) = 1 if f_t ≥ 0, s(f_t) = −1 otherwise, and y_{t+1} is the realized return. In our exercises, the straightforward excess profitability (SEP) test by PHB evaluates the random walk hypothesis; hence, we test the null hypothesis H0: α1 = α2 = α3 = α4 = β1 = β2 = 0. Moreover, the core statistic of the SEP has an interesting interpretation: it provides a sense of the economic and financial implications of the predictive ability. Under the null hypothesis, cryptocurrency returns are martingale difference sequences; PHB show that the average strategy return, suitably standardized by σ̂ (where σ̂² is a consistent estimator of the variance of r_{t+1}), is asymptotically standard normal. In summary, the core statistic of the SEP is just the average return of a trading strategy based on the forecasts of the nesting model, and a rejection of the null hypothesis implies a rejection of the martingale difference hypothesis.

Table 1 reports our in-sample results using OLS with a HAC estimator of the long-run variance, following [59,60]. We show that cryptocurrencies exhibit considerable persistence in contrast to traditional exchange rates. For example, 67% of the cryptocurrencies have at least one statistically significant autoregressive component. Moreover, the first three lags are statistically significant for some cryptocurrencies: USDC, USDT, WVTC, VUSD, LUNA, and UST. The only cryptocurrencies that do not seem to behave like an autoregressive time series are BNB, XRP, LTC, ADA, and ETH, for which we do not reject the null hypothesis for any of the first four lags.
Notes: Z (−1), Z (−2), Z (−3), and Z (−4) represent the first, second, third, and fourth lags of the one-day returns of the cryptocurrency, respectively. BTC (−1), BTC (−2), BTC (−3), and BTC (−4) are the first, second, third, and fourth lags of the one-day returns of BITCOIN, respectively. The results using ETHEREUM are qualitatively the same, and they are available upon request. Table 1 exhibits estimations of Equation (1), with HAC estimators of the long-run variance, according to [59,60]. We do not report the constant term to save space. Standard deviations in parentheses. * p < 10%, ** p < 5%, *** p < 1%. Source: Authors' elaboration.

In-Sample Analysis
Another interesting result relates to the "dominance effect," i.e., BTC exhibits some predictive ability over minor cryptocurrencies. In 50% of our exercises, we reject the null hypothesis for at least one coefficient associated with the lags of BTC; for instance, for some cryptocurrencies, such as WVTC, VOGE, and LUNA, we reject the null hypothesis for the first two lags of BTC at least at the 5% significance level. Notably, the predictive ability of BTC goes beyond the persistence of the time series. For example, while ADA and ETH do not exhibit strong persistence (we do not reject the null for any of the autoregressive components), we do reject the null for some lags of BTC.
Third, the Wald statistic rejects the null hypothesis H0: α1 = α2 = α3 = α4 = β1 = β2 = β3 = β4 = 0 in 13 out of 15 of our exercises, with most of these rejections being significant at 5%; in other words, either the own lags of the cryptocurrency or the lags of BTC seem to improve the accuracy of our models (at least in sample).
Finally, the R² ranges from approximately 1% (BNB, XRP, LTC, ADA, and BTC) to an outstanding 37% (VUSD). Notably, 40% of our cryptocurrencies exhibit an R² over 5%: this is a remarkably high figure considering that typical exchange rate models generally do not show high R². It is a well-known fact in the financial literature that the in-sample R² in predictive regressions tends to be quite small: put simply, financial returns are expected to exhibit a large portion of unpredictable variance. As commented by [75]: "It should be noted that the R2 statistic for the predictive regression […] is usually quite small; for example, less than 5% for monthly stock returns. This simply indicates that stock returns (and asset returns more generally) contain an intrinsically large unpredictable component, so that-at the risk of stating the obvious-it is extremely difficult to predict returns." [75], page 8. In this sense, our results are quite encouraging with respect to the economic forecasting literature.
In the context of commodity price forecasting, [61] report in-sample R² relatively similar to ours, claiming robust results for predictive ability: "The highest R2 is found for copper (7.3%) and the lowest for nickel (1.1%). All in all, our monthly results suggest an interesting ability of the Chilean peso to predict base metals returns other than nickel." [61], page 259. Again, compared with their results, ours seem significantly stronger in most cases.
All in all, the in-sample analyses provide sound evidence that: 1. Cryptocurrencies do not seem to behave like random walks. 2. Cryptocurrencies tend to be far more persistent than regular exchange rates.
3. BTC appears to improve the predictive ability of our models in some of the cryptocurrencies.
To give some intuition about these instabilities, Tables 2 and 3 report our coefficients allowing for multiple regimes, where the break dates are unknown and determined according to the UDMax statistic proposed by [9]. Several features in Tables 2 and 3 are worth mentioning. Table 2. The in-sample results for one-step-ahead forecasting with breaks.
Notes: Breaks are determined according to the UDMax statistic of [9]. Each panel reports estimates of our parameters for a different regime. Table 2 reports the results of Equation (1) for the last three regimes. We consider a HAC estimator according to [59,60]. The results using ETHEREUM are qualitatively the same, and they are available upon request. We do not report the constant term or the standard deviations to save space. * p < 10%, ** p < 5%, *** p < 1%. Source: Authors' elaboration. Table 3. Additional in-sample results for one-step-ahead forecasting with breaks. Notes: Breaks are determined according to the UDMax statistic of [9]. Each panel reports estimates of our parameters for a different regime. Table 3 reports the results of Equation (1) for the last three regimes. We consider a HAC estimator according to [59,60]. The results using ETHEREUM are qualitatively the same, and they are available upon request. We do not report the constant term or the standard deviations to save space. * p < 10%, ** p < 5%, *** p < 1%. Source: Authors' elaboration.
Except for SOL, LTC, and ADA, all the cryptocurrencies display a significant break in their coefficients; in this regard, it is safe to say that our parameters exhibit substantial changes over time. Specifically, in 62% of these cryptocurrencies, we find three or more regimes, consistent with the predictive instabilities reported in the literature.
In some cases, we notice strong swings in the sign and magnitude of the coefficients: for instance, in the first regime, the first autoregressive component for BNB is positive and significant, and the first BTC lag is negative and significant. In contrast, both coefficients remain significant in the second regime but with the opposite sign. We highlight that in Table 1 none of the predictors of BNB appears statistically significant; in our breaks analysis, by contrast, several coefficients are significant within regimes, which suggests that these strong swings in sign cancel out when the OLS regression is estimated over the whole sample. See [3,80,81] for a detailed discussion on Granger causality with parameter instabilities and on exchange rate forecasting robust to instabilities.
Along the lines of [82], we observe strong breaks in the neighborhood of the COVID-19 pandemic crisis; consistent with the authors, we notice an increase in the magnitude and significance of our parameters (e.g., USDC, USDT, and VUSD) during this period. While there is sound evidence of predictability during the pandemic, we emphasize that there is predictive ability in multiple regimes and not exclusively during the pandemic crisis. For example, USDC and USDT reject the null hypothesis for the first autoregressive components in all the regimes (both before and during COVID-19).
We emphasize that the reason to perform the analysis with structural breaks is to highlight the role of instabilities. For instance, Table 1 suggests that BTC has no role in Granger-causing UST, whereas Tables 2 and 3 reveal that in three out of four regimes, at least one lag of BTC is statistically significant. In other words, the in-sample analysis using the whole sample might hide some sporadic predictability because of the coefficients' strong swings in sign and magnitudes. In this sense, we argue that a real-time forecast might indeed capture some of this sporadic predictability.
In [9], Bai and Perron propose several tests for structural breaks with unknown dates. Roughly speaking, there are two types of procedures: sequential estimation of the breaks and global estimation of the breaks. Bai and Perron conclude that all their approaches have decent properties but that there are situations in which the former type of test may lack power. This is evident from their simulations and empirical illustrations, as they comment: "There are instances where the sequential procedure can be improved. The problem is that, in the presence of multiple breaks, certain configurations of changes are such that it is difficult to reject the null hypothesis of 0 versus 1 break but it is not difficult to reject the null hypothesis of 0 versus a higher number of breaks (this occurs, for example, when 2 changes are present and the value of the coefficient returns to its original value after the second break). In such cases, the sequential procedure breaks down. A useful strategy is to first look at the UD max or WD max tests to see if at least one break is present. If these indicate the presence of at least one break, then the number of breaks can be decided based upon a sequential examination […]" page 15. In this sense, a sequential approach would lead to fewer breaks than our approach; this is the reason we choose a global approach. We acknowledge, however, that we could also use the WDMax test. In unreported results, we considered the WDMax statistic for robustness. While the WDMax generally detects more breaks, the general message remains the same: cryptocurrency predictability is highly unstable.
All in all, Tables 2 and 3 exhibit some exciting results:
1. The coefficients are time varying, with significant swings in sign and magnitude.
2. These swings in sign suggest that some of the results of no predictability in Table 1 may be misleading: some coefficients are canceling out.
3. The predictive ability is not confined to the pandemic crisis but appears in many regimes.

Out-of-Sample Analysis
Table 4 reports our results for the ENC-t test, using Equation (2). In each panel, we report our results compared with the RW and the DRW, considering BTC or ETH as the dominant cryptocurrency. Panel A presents the results with R = 1/3 × T, while Panel B uses R = 1/2 × T. The results using BTC or ETH are qualitatively the same, with minor differences in terms of predictability. Specifically, the ETH (BTC) model outperforms the DRW in 16 (15) out of 26 exercises, while the ETH (BTC) model outperforms the RW in 14 (13) out of 26 exercises. This evidence of predictability is robust to the use of either BTC or ETH and to how we split our data (R = 1/3 × T or R = 1/2 × T): for instance, for USDC, USDT, VUSD, ADA, and LUNA, we reject the null hypothesis of no predictability in all our exercises. Figure 1 shows our out-of-sample forecast for ADA, highlighting its good performance. Some cases of predictability seem to be unstable and sporadic. An example is UST, where we find strong evidence of predictability for R = 1/2 × T (we reject the null hypothesis at the 1% significance level in all cases), but this significance disappears for R = 1/3 × T. Finally, some cryptocurrencies show no predictability in any of the exercises: BNB, XRP, MATIC, and VOGE. Table 5 is akin to Table 4, but this time we report the WCW statistic. Similar to Table 4, the ETH (BTC) model outperforms the DRW in 14 (15) out of 26 exercises. Moreover, the ETH (BTC) model outperforms the RW in 12 (12) out of 26 exercises. For example, compared with the ENC-t, we find stronger predictive ability for VOGE but weaker evidence for ADA. Notes: Forecasts are built using Equation (2) with either BITCOIN (BTC) or ETHEREUM (ETH), against either the random walk with drift (RW) or the driftless random walk (DRW). * p < 10%, ** p < 5%, *** p < 1%. Source: Authors' elaboration.
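To make the out-of-sample comparison concrete, the following sketch computes an ENC-t-style encompassing statistic for a recursive forecasting exercise against a driftless random walk (whose return forecast is simply zero). The simulated AR(1) data and the plain t-ratio form are our own simplifying assumptions; the actual ENC-t of Clark and McCracken involves finite-sample adjustments and non-standard critical values for nested models.

```python
import numpy as np

def enc_t(e_bench, e_model):
    """ENC-t-style encompassing statistic: a t-ratio on
    c_t = e_bench_t * (e_bench_t - e_model_t). Large positive values
    reject the null that the benchmark encompasses the model."""
    c = e_bench * (e_bench - e_model)
    P = len(c)
    return np.sqrt(P) * c.mean() / c.std(ddof=1)

# Toy recursive exercise: AR(1) returns vs. a driftless random walk.
rng = np.random.default_rng(1)
T, R = 600, 200                      # sample size, initial estimation window
r = np.zeros(T)
for t in range(1, T):
    r[t] = 0.4 * r[t - 1] + rng.standard_normal()

e_rw, e_ar = [], []
for t in range(R, T - 1):
    y, x = r[1:t + 1], r[:t]         # estimate only with data up to t
    X = np.column_stack([np.ones(t), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    fc = b[0] + b[1] * r[t]          # AR(1) one-step-ahead forecast
    e_ar.append(r[t + 1] - fc)
    e_rw.append(r[t + 1] - 0.0)      # DRW forecasts a zero return
stat = enc_t(np.array(e_rw), np.array(e_ar))
print("ENC-t statistic:", round(stat, 2))
```

With genuinely persistent returns, the statistic is large and positive, rejecting the random walk; with serially uncorrelated returns it fluctuates around zero, which is the Meese-Rogoff outcome.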
The results of the two tests are similar, with a slight edge for the ENC-t. We reject the null hypothesis in 56% of our exercises using the ENC-t, and we reject the null hypothesis 52% of the time using the WCW. This result is consistent with the WCW being a "conservative" test of predictability (as noticed by PHM, the WCW seems to be correctly sized at the cost of deteriorating some of its power). Overall, both Tables 4 and 5 exhibit sound evidence of the out-of-sample predictive ability of several cryptocurrencies despite the well-known fact that exchange rates are difficult to forecast. Table 6 reports the differences in correlations with the target variable (realized returns) between our forecast (Equation (2)) and the random walk. First, notice that 92% of the entries report a positive difference; in other words, the correlations of our forecasts are higher than the correlations of the historical sample mean. The only exception is XRP, in which the differences are not statistically significant.
Second, the differences in correlations are striking in 5 out of 13 cryptocurrencies (USDC, USDT, VUSD, LUNA, and UST). This result is robust to both the model and the estimation window R. Finally, the model that considers lags of ETH statistically outperforms the benchmark in 14 out of 26 exercises, suggesting that some cryptocurrencies are far more predictable than the random walk. Table 6. The out-of-sample results using a correlation test for different benchmarks and estimation windows. Notes: Each entry reports the difference in correlations with the target variable between a forecast built using Equation (2) and the RW (historical average). The null hypothesis of equal correlations with the target variable is evaluated according to the correlation-based test proposed by [13]. This test is two-tailed with standard normal critical values. The long-run variances are estimated according to [59,60]. * p < 10%, ** p < 5%, *** p < 1%. Source: Authors' elaboration.

Trading Strategy
Table 7 reports the annualized returns of a trading strategy based on our forecasts. The null hypothesis is that the cryptocurrencies follow a DRW, and we consider the SEP test by PHB, which is asymptotically normal. First, most of our trading strategies display a positive return, highlighting the economic relevance of this predictive ability. In 88% of our exercises, the trading rule based on our forecasts yields positive returns. Another piece of compelling evidence is that the model built with lags of ETH statistically outperforms the DRW in 16 out of 32 exercises. Complementarily, for some cryptocurrencies, we find that this predictive ability (using ETH) is robust to the sample division R. In particular, our model outperforms the DRW in both windows for 7 out of 13 cryptocurrencies: BNB, USDC, USDT, VOGE, VUSD, LUNA, and UST. Finally, we notice a substantial financial benefit to following the trading rule based on our forecasts for some cryptocurrencies. Similarly, [49] found that some cryptocurrencies increased their value significantly during the COVID-19 pandemic. Table 7. The out-of-sample results using a trading strategy for different benchmarks and estimation windows. Notes: Each entry reports the annualized financial profit from a trading strategy based on our forecasts. The trading strategy is inspired by [73]. The null hypothesis posits that cryptocurrency returns are DRW. We consider the straightforward excess profitability test in [74] to evaluate the null hypothesis; this is a one-sided test with standard normal critical values. * p < 10%, ** p < 5%, *** p < 1%. Source: Authors' elaboration.
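A stylized sketch of a forecast-based trading rule helps fix ideas: go long when the forecast is positive and short otherwise, then annualize the average daily return. This is a generic sign rule evaluated with a naive t-statistic, not the strategy of [73] or the excess profitability test of [74]; the simulated signal, return process, and 365-day annualization are hypothetical.

```python
import numpy as np

def sign_rule_profit(forecasts, realized, periods_per_year=365):
    """Annualized mean return from going long when the forecast is
    positive and short otherwise (daily data, no transaction costs)."""
    strat = np.sign(forecasts) * realized
    return strat.mean() * periods_per_year, strat

# Toy example with mildly predictable daily returns.
rng = np.random.default_rng(2)
T = 1000
signal = rng.standard_normal(T)                          # hypothetical forecast
realized = 0.002 * np.sign(signal) + 0.01 * rng.standard_normal(T)
ann, strat = sign_rule_profit(signal, realized)
t_stat = np.sqrt(T) * strat.mean() / strat.std(ddof=1)   # naive t-test
print(f"annualized return: {ann:.1%}, t-stat: {t_stat:.1f}")
```

Under the driftless-random-walk null the sign rule earns nothing on average, so a significantly positive strategy return is direct economic evidence against the random walk.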

Conclusions
Cryptocurrencies do not follow a random walk. We find predictability in several of the major cryptocurrencies. This is remarkable given the extensive evidence of no predictability in financial markets.
Notably, our models significantly outperform the random walk in many exercises. Furthermore, this predictability is not confined to the COVID-19 pandemic. The in-sample tests show that 67% of cryptocurrencies are predicted by their own lags, consistent with a "momentum effect." Interestingly, 50% of our exercises exhibit predictability with BITCOIN's lags, confirming the existence of the "dominance effect." Nevertheless, both effects display time-varying features. Furthermore, we report significant breaks in regression coefficients; for instance, 62% of the cases have three or more regimes.
The out-of-sample results confirm the predictive evidence. For example, using the ENC-t test, our models outperform the DRW in 16 out of 26 exercises. Additionally, in 9 out of 13 cases, the correlation between our forecast and the realized returns is significantly higher than a naive benchmark. Finally, in 88% of our exercises, the trading rule based on our forecasts yields positive returns.
It is well known in the economic forecasting literature that forecasting financial returns is highly challenging. This is because financial returns usually exhibit a sizable unpredictable component; in this sense, even the best forecasting model may explain only a small part of future returns. Moreover, as [4] commented, the competition between professional forecasters and traders has the interesting implication of eliminating potential forecasting ability in given models. The Meese-Rogoff puzzle posits that most exchange rate models fail to outperform simple naive benchmarks such as the random walk. Interestingly, our paper shows that cryptocurrencies have a substantial predictable component. While we are not settling the debate, we contribute sound evidence of predictability in cryptocurrencies.
Our results have several implications. For example, we included several stablecoins in our analysis, which is useful for rethinking models of financial stability, even more so given the growing interest of central banks in issuing digital money. There are also implications for investors. Our results should motivate computer scientists in finance to develop algorithms for trading cryptocurrencies, because we demonstrate predictability in this asset class. In this sense, for future research, we think it would be exciting to replicate our exercises with high-frequency prices and to develop trading algorithms with multivariate methodologies to improve forecasting.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Is the Random Walk the Proper Benchmark for Cryptocurrency Forecasting?
We argue that the random walk is a proper benchmark for this exercise. We discuss in detail the following six arguments: (1) While financial assets may not be factually random walks, the literature suggests that economic models fail to outperform a random walk in terms of mean squared prediction error (MSPE). Recent literature on the subject continues to support this puzzle. (2) Some rational expectations models have the interesting implication of exchange rates being near-random walks. In this sense, even if exchange rates are not random walks, they may behave similarly to them. (3) Cryptocurrencies may be seen both as exchange rates and as financial assets. In this sense, the literature on financial returns is similar: most asset pricing models fail to forecast better than a simple random walk in terms of MSPE. (4) Most of the predictability claims seem sporadic: rarely does an econometric model robustly outperform the random walk in different time periods. (5) Even the detractors of the Meese-Rogoff puzzle acknowledge that the random walk is a challenging benchmark to outperform in MSPE. Finally, (6) cryptocurrencies have no clear economic fundamentals; hence, these "hard to value" assets may not have an economic-based benchmark.
(1) First, we are not arguing that exchange rates are indeed random walks; for instance, the literature proposes a plethora of variables (i.e., fundamentals) that should be related to exchange rates, such as purchasing power parity (PPP), inflation, output, prices of commodity exports, productivity, and monetary-driven models. In this sense, some of these variables may explain some portion of the variance of exchange rates (hence, exchange rates are not factually random walks). Nevertheless, the stylized fact in the literature is that these models fail to outperform the random walk in terms of MSPE. Put simply, even if exchange rates are not random walks, it is well known that economic-based models do not produce better forecasts than a simple random walk; this is the so-called Meese-Rogoff puzzle [1]. While many years have passed since Meese and Rogoff's seminal paper, this is still an open debate, as commented by [83] in the handbook of economic forecasting: "The inability of macroeconomic models to generate point forecasts of bilateral exchange rates that are more accurate than forecasts based on the assumption of a random walk, particularly at horizons shorter than one year, is known as the "exchange rate disconnect puzzle" (Obstfeld & Rogoff, 2001). Since it was first established by Meese and Rogoff (1983), this puzzle has mostly resisted a quarter-century of empirical work attempting to overturn it, and it remains one of the best-known challenges in international finance." Along those lines, [83,84] pointed out that this puzzle "has proven robust over the decades", and that it is "the most researched puzzle in macroeconomics". Some papers providing an overview of this literature with similar conclusions are those of [83][84][85][86][87].
While we agree with the referee that exchange rates are not random walks (random walks are just parsimonious representations of their dynamics), it is our reading of the literature that the random walk is the proper benchmark when forecasting exchange rates with econometric models. As summarized by [83], "Meese and Rogoff (1983) spawned a leap in research attention as scholars attempted to take up the challenge of developing new models to beat a random walk for exchange rates. However, it is not clear that we have learned much since the 1980s other than it is still quite challenging to construct a model that is capable of systematically outperforming a random walk in predicting future spot exchange rates." Along these lines, [3] pointed out that "[…] it has been well known that exchange rates are very difficult to predict […]; in particular, a simple, a-theoretical model such as the random walk is frequently found to generate better exchange rate forecasts than economic models" [3], page 1063. Moreover, in the context of exchange rate forecasts, Rossi concludes that "The toughest benchmark is the random walk without drift." (2) Second, many rational expectations models have the interesting implication of exchange rates being near-random walks. In this sense, a natural implication of asset pricing models (e.g., [88]) is simply that random walks should be very difficult models to outperform. As pointed out by [89], "We show analytically that in a rational expectations present-value model, an asset price manifests near-random walk behavior if fundamentals are I(1) and the factor for discounting future fundamentals is near one. We argue that this result helps explain the well-known puzzle that fundamental variables such as relative money supplies, outputs, inflation, and interest rates provide little help in predicting changes in floating exchange rates.
As well, we show that the data do exhibit a related link suggested by standard models-that the exchange rate helps predict these fundamentals. The implication is that exchange rates and fundamentals are linked in a way that is broadly consistent with asset-pricing models of the exchange rate.
[…] We also show theoretically that under some empirically plausible circumstances the inability to forecast exchange rates is a natural implication of the models." (3) Third, the inability of econometric models to outperform a simple random walk is not exclusively observed in the exchange rates literature but more generally in financial returns. For instance, [6] studied the predictive performance of 17 variables proposed by the academic literature as predictors of the equity premium. Notably, they found that none of the variables could significantly outperform a simple random walk in terms of MSPE: "As of the end of 2005, most models have lost statistical significance, both IS and OOS. OOS, most models not only fail to beat the unconditional benchmark (the prevailing mean) in a statistically or economically significant manner, but underperform it outright. If we focus on the most recent decades, that is, the period after 1975, we find that no model had superior performance OOS and few had acceptable performance IS." [6], page 1504. Moreover, in a recent update of their original paper, [90] examine 29 predictors proposed by the literature after the publication of [91], as well as the 17 original variables. The authors conclude that "Overall, the predictive performance remains disappointing." Moreover, the authors claim, "We remain comfortable with the original claims in Goyal and Welch (2008). Standing here today in 2021, even as risk-neutral investors willing to take on more risk, we do not believe that we know what variables should help us today to predict the equity premium forward-looking for 2022." [91], page 32.
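The near-random-walk result quoted above can be illustrated numerically. The sketch below is our own simulation, not the authors' code: the fundamental's growth follows a hypothetical AR(1), the price is the present value p_t = (1 - b) Σ_j b^j E_t f_{t+j} (which has the closed form p_t = f_t + bρ/(1 - bρ) Δf_t under these assumptions), and we compare the R² of predicting price changes for a discount factor b far from one versus near one.

```python
import numpy as np

def price_change_r2(b, rho, T=200_000, seed=3):
    """Simulate a present-value price p_t = (1-b) * sum_j b^j E_t f_{t+j}
    when fundamental growth follows Delta f_t = rho * Delta f_{t-1} + eps,
    and return the R^2 of regressing Delta p_{t+1} on Delta f_t."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)
    df = np.zeros(T)
    for t in range(1, T):
        df[t] = rho * df[t - 1] + eps[t]          # AR(1) fundamental growth
    f = np.cumsum(df)                             # I(1) fundamental
    p = f + (b * rho / (1 - b * rho)) * df        # closed-form present value
    dp = np.diff(p)                               # dp[t] = p_{t+1} - p_t
    x, y = df[:-1], dp                            # predict Delta p_{t+1} with Delta f_t
    return np.corrcoef(x, y)[0, 1] ** 2

r2_low_b = price_change_r2(0.50, 0.5)
r2_high_b = price_change_r2(0.99, 0.5)
print("R^2 with b = 0.50:", round(r2_low_b, 4))
print("R^2 with b = 0.99:", round(r2_high_b, 5))
```

As the discount factor approaches one, the predictable component of price changes shrinks toward zero even though fundamentals genuinely drive the price, which is exactly the sense in which the asset price "manifests near-random walk behavior".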
(4) Fourth, it is true that some papers have found some success outperforming the random walk; however, this predictability is usually unstable and short-lived. In other words, they find that there is no predictability "on average" but rather sporadic predictability. For instance, [4] pointed out that, most of the time, financial returns are not predictable. In particular, the author concludes that there are "pockets of predictability" in which some predictors may outperform the random walk in the moment, but on average, none of the models consistently outperform this benchmark (just to emphasize the weak results found by [4], the title of his paper is "Elusive Return Predictability"). Along the same lines, [90] addresses the problems of forecasting under instabilities. She concludes that rarely does a predictor outperform a simple random walk for every period: "[…] some predictors would have been potentially useful, but, at the same time, those that would have been useful at the time of the crisis were different from those in normal times", page 5.
(5) Fifth, most of the Meese-Rogoff puzzle detractors, such as [92][93][94][95][96], argue that exchange rates may be predictable if we consider different loss functions (other than MSPE). In other words, they argue that investors are not really interested in MSPE, and other loss functions such as mean directional accuracy may suggest some degree of predictability. While we partially agree with Moosa and Burns in this observation (this is the reason we consider the tests by [13,55,74]), we emphasize that the authors acknowledge that the random walk is a difficult benchmark to outperform in terms of MSPE: "We suggest that a simple explanation for the puzzle is the use of the root mean square error (RMSE) to measure forecasting accuracy, presenting a rationale as to why it is difficult to beat the random walk in terms of the RMSE." In this sense, the observation that the random walk is challenging to outperform in MSPE remains even for detractors of the Meese-Rogoff puzzle.
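The point about loss functions can be made with a stylized numerical example (our own illustration, with hypothetical data): a forecast that always gets the direction of the return right but badly overshoots its magnitude loses to the driftless random walk on MSPE yet achieves perfect mean directional accuracy.

```python
import numpy as np

def mda(forecasts, realized):
    """Mean directional accuracy: share of periods in which the forecast
    has the same sign as the realized return (0.5 = coin flip)."""
    return np.mean(np.sign(forecasts) == np.sign(realized))

rng = np.random.default_rng(4)
realized = rng.standard_normal(1000)
overshooting = 2.5 * realized                 # right direction, wrong magnitude
mspe_model = np.mean((realized - overshooting) ** 2)
mspe_rw = np.mean(realized ** 2)              # driftless RW forecasts zero
print("model MSPE worse than RW:", mspe_model > mspe_rw)
print("model directional accuracy:", mda(overshooting, realized))
```

This is why evaluating forecasts only in MSPE can understate their value to an investor who trades on direction, which is the detractors' argument, and why we complement MSPE-based tests with the tests of [13,55,74].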
(6) Finally, another approach could be constructing benchmarks based on their fundamentals (similar to [89]). In our opinion, there are two caveats to this approach. First, as commented before, overwhelming evidence reports that these models tend to be inaccurate in predicting exchange rates, especially compared with a simple random walk. Second, while extensive literature discusses the fundamentals of an exchange rate (e.g., inflation, output, productivity, etc.) or a financial stock (e.g., cash flows, dividend-price ratios, etc.), it is unclear which is the proper fundamental of a cryptocurrency. As commented by [17], "Cryptocurrencies' fundamental source of intrinsic value remains unclear.
[…] unlike cash flows from more typical financial assets, such as stocks and bonds, cryptocurrencies' fundamentals have few, if any, publicly available predictive signals such as analyst coverage […]. We refer to fundamentals with these characteristics of uncertainty, opacity, disagreement, and lack of predictive information as "hard to value." [17], page 107. Table A1 reports descriptive statistics of the daily log-returns of all 15 cryptocurrencies. Table A1. Descriptive analysis.