Do Cryptocurrency Prices Camouflage Latent Economic Effects? A Bayesian Hidden Markov Approach †

Abstract: We study the Bitcoin and Ether price series from a financial perspective. Specifically, we use two econometric models to perform a two-layer analysis of how the Bitcoin and Ether price series correlate with, and can be predicted from, traditional assets. In the first part of this study, we model the probability of positive returns via a Bayesian logistic model. Even though the fitting performance of the logistic model is poor, we find that traditional assets can explain some of the variability of the price returns. Along with the fact that standard models fail to capture the statistical and econometric attributes of cryptocurrencies (such as extreme variability and heteroskedasticity), this motivates us to apply a novel Non-Homogeneous Hidden Markov model to these series. In particular, we model Bitcoin and Ether prices via the non-homogeneous Pólya-Gamma Hidden Markov (NHPG) model, since it has been shown to outperform its counterparts on conventional financial data. The transition probabilities of the underlying hidden process are modeled via a logistic link, whereas the observed series follow a mixture of normal regressions conditionally on the hidden process. Our results show that the NHPG algorithm has good in-sample performance and captures the heteroskedasticity of both series. It identifies frequent changes between the two states of the underlying Markov process. In what constitutes the most important implication of our study, we show that there exist linear correlations between the covariates and the ETH and BTC series. However, only the ETH series is affected non-linearly by a subset of the covariates considered. Finally, we conclude that the large number of significant predictors, along with the weak predictive performance of the algorithm, backs up earlier findings that cryptocurrencies are unlike any other financial assets and that predicting the cryptocurrency price series is still a challenging task. These findings can be useful to investors, policy makers and traders for portfolio allocation, risk management and trading strategies.


Introduction
What are cryptocurrencies? How do they compare to traditional financial instruments? Are they like traditional money, like commodities, a hybrid of the two or an utterly new type of asset that merits its own definition and understanding? Early research, mainly focusing on Bitcoin (henceforth BTC), provides mixed insights. While the creation of new BTCs resembles the mining process of gold (or precious metals in general), its attributes clearly differentiate it from conventional commodities [1]. The claim that BTC is fundamentally different from valuable metals like gold is also backed by Klein et al. [2] due to its lack of stable hedging capabilities. Along with [3], Cheah and Fry [1] also argue that standard economic theories cannot explain BTC price formation and, using data up to 2015, they provide evidence that BTC lacks the qualities necessary to qualify as money. However, using GARCH models, Dyhrberg [4] demonstrates that BTC has similarities to both gold and the US dollar (USD) and, somewhat surprisingly, that it may be ideal for risk-averse investors. Also, while BTC is useful to diversify financial portfolios (due to its negative correlation to the US implied volatility index, VIX), it otherwise has limited safe-haven properties [5][6][7]. Using data from a longer period (between 2010 and 2017), Demir et al. [8] conclude the opposite, namely that BTC may indeed serve as a hedging tool, due to its relationship to the Economic Policy Uncertainty Index (EUI).
The fact that cryptocurrencies are different from any other asset in the financial market is further supported by [9][10][11]. High volatility, speculative forces and a large dependence on social sentiment, at least during its earlier stages, as measured by social media and Internet data (Google trends, Wikipedia searches and Twitter posts), are identified by many as some of the main determinants of BTC prices [12,13]. Yet, a large amount of price variability remains unaccounted for. Moreover, the proliferation of cryptocurrencies other than BTC that are supported by different technologies, i.e., variations of the standard Proof-of-Work distributed consensus of the BTC blockchain, e.g., [14], calls for a more comprehensive research approach. Despite the high documented correlation in the prices of the various cryptocurrencies, it is highly debated whether this trend will continue into the future [15,16].
In the present paper, we make an effort towards understanding the correlation between a set of traditional assets and cryptocurrencies. We adopt an economic/financial perspective and use a set of 14 predictors: 12 financial and economic variables, comprising main exchange rates (4 variables), equity indices (4 variables), commodity future prices (oil and gold) and economic uncertainty indicators (2 variables), along with 2 quasi-economic, cryptocurrency specific variables. The latter are the hash rate, which captures the amount of investment in mining equipment and hence accounts for the economic size of the network, and the average block size, which implicitly measures the amount of transactions and hence the activity in the respective cryptocurrency. All the variables and the applied transformations are summarized in Table 1. Also, we report the correlations between the explanatory variables in Table 2.
Earlier studies highlight the scarcity of results on cryptocurrencies other than BTC and underline the need for a better understanding of the entire cryptocurrency ecosystem and its properties (statistical and economic), see e.g., [11,17]. Studies that go beyond the BTC prices and confirm via various financial models the importance of using diverse cryptocurrencies, rather than a single one, in portfolio optimization include but are not limited to [18] and [15]. In view of the above, in the present study, apart from Bitcoin (BTC), we also focus on Ether (ETH), the native coin of the Ethereum blockchain [14,19], and currently the second largest cryptocurrency in terms of market capitalization [20]. Unlike the BTC blockchain, the Ethereum blockchain has been launched eponymously and is governed, or more aptly researched and developed, by the Ethereum Foundation [21], a non-profit organization based in Switzerland. The architecture of the Ethereum ecosystem has far-reaching implications on its long-term development and sustainability that clearly differentiate it from BTC. Supporting smart contract execution (execution of code snippets that go beyond the simple monetary transactions of BTC), Ethereum has scheduled a transition from the currently computationally heavy Proof of Work (introduced by BTC and followed by most cryptocurrencies) to the computationally efficient alternative of Proof of Stake, which saves on energy resources and provides a scalable infrastructure while retaining the same security guarantees as Proof of Work. Without going further into the technical details, the main motivation to study ETH that stems from these considerations is the following. Given the different technological advancements that are promised by Ethereum, will ETH become independent from BTC and follow its own path as a cryptocurrency, or are the values of all cryptocurrencies inevitably tied after all, as they have been up to now [16]?
Keeping in mind that ETH, i.e., the native coin, is only one of the main applications of the Ethereum blockchain (and of blockchain technology in general), it should also be noted that price movements of ETH may not necessarily align in the future with technological advancements in the Ethereum blockchain.
From a methodological perspective, we perform a two-layer Bayesian analysis. First, we transform the cryptocurrency series into a binary series and apply a logistic regression model to the transformed series. Specifically, if the current price return, i.e., $Y_t - Y_{t-1}$, exceeds a predefined threshold, then we assign the value 1, and 0 otherwise. Then, we investigate whether the logistic regression model (which is widely used by applied statisticians and econometricians for analyzing binary data, see [22] and references therein) with a specific covariate set is an appropriate model for estimating the probability of observing the value 1 in these binary series. We use the methodology of Polson et al. [23] to make inference on the model's parameters with an additional reversible jump step to allow for model uncertainty, cf. Section 2.1. Secondly, we model the log-price series data using a novel Hidden Markov (regime switching) model, namely the non-homogeneous Pólya-Gamma Hidden Markov model (NHPG) of [24], cf. Section 2.2. Hidden Markov models introduce time-variation in the parameters through an underlying unobserved discrete process. In brief, at any given time t, the observed log-price data point depends on a latent (hidden) state. Hence, conditionally on the hidden states, the parameters of the data generating process vary, thus allowing for a flexible data representation. In our setting, the underlying process follows a binomial process with exogenous variables. It has been shown that the NHPG model outperforms similar models in forecasting conventional financial data, cf. [25]. Also, it uses a Bayesian Model Averaging (BMA) approach for inference, which has been shown to possess desirable properties for forecasting applications [26][27][28][29].
With all these in mind, the questions we aim to address are the following: Q1. Does the underlying information from fiat currencies, commodities, stock indices and blockchain specific variables explain/predict the probability of positive returns? Q2. Do the same variables have explanatory/predictive power on both the BTC and ETH cryptocurrencies? Q3. Do the same explanatory variables affect the BTC price series both on the long and short run?
We use daily data (for both the response and the explanatory variables) between 2017 and 2019. For question Q3, we compare the BTC data of the whole 2014-2019 period to the 2017-2019 period (also used in Q1-Q2). As in most of the recent studies, we exclude the period up to 2014 which exhibits markedly different characteristics.
The findings of our experiments can be summarized as follows. The logistic model is not suitable to model the probabilities of positive daily returns of BTC and ETH. However, changing the magnitude of returns, we observe that (a) the logistic model has improved performance, (b) the statistically significant covariates in the logistic regression model change and (c) the in-sample (fitting) results are different for the BTC and ETH series.
Considering the second experiment, we find that the NHPG model identifies periods of different volatility and accounts well for the heteroskedasticity of all three price series (BTC short and long periods and ETH). Graphically, this is illustrated in later figures. The hidden states, which may be described as periods of high and low volatility, are not persistent, i.e., the transitions between the two states are frequent. Based on the same figures, the in-sample performance of the NHPG algorithm is good. However, the set of included predictors (predictors with posterior probability of inclusion above 0.5) is large, which implies that each predictor explains only a small fraction of the volatility of the series. Concerning specific predictors, the exclusion of some of the fiat currency exchange rates for the ETH series suggests a (still) more geographically restricted interest for the currency in comparison to BTC. It is also worth mentioning that the cryptocurrency specific variables, hash rate and average block size, are not significant for modeling the BTC and ETH price series. This may indicate a more mature and stabilizing mining network that is less responsive to price expectations, sentiment or extreme speculation.
Finally, as shown in the last figure, the mean posterior out-of-sample predictions, although better for ETH than for BTC, are in general not good, as they frequently even miss the direction of movement of the series. However, this is a common outcome in exchange rates [30,31]. In sum, our results confirm that the Hidden Markov approach is promising in the understanding of cryptocurrency price formation and back earlier findings that cryptocurrencies are unlike any existing financial asset and hence that their understanding requires novel tools and ideas.
In the related literature, the first layer of our methodology, i.e., the logistic regression model, falls into the binary regression models literature. In the cryptocurrency context, such models have only been applied by [32] to forecast the daily price direction of BTC, by [33] to study the price co-explosivity in leading cryptocurrencies and by [34] to study the herding behavior of BTC. As for the second layer of our methodology, the present NHPG model falls into the Markov-switching literature that is the benchmark for predicting exchange rates, see [35] and [36,37], and explaining financial time series, see [38] and references therein. This class of models accounts for the non-stationarities and non-linearities of the time series. Standard in financial applications [38], Hidden Markov models have been applied in the cryptocurrency context by [39] as Markov-switching GARCH models to model the volatility dynamics of BTC, by [40] as a state-space model for representing the BTC price series, by [18] as multivariate state-space models in forecasting cryptocurrencies, by [41,42] as homogeneous Hidden Markov models, i.e., hidden Markov models with constant transition probabilities, and by [43] in the understanding of price bubbles. Also, [44] study the BTC and ETH prices under a structural break setting, while [45] study the cryptocurrency returns and volatility under a stochastic volatility model with discontinuous jumps. The use of the NHPG model in explaining and predicting the BTC and ETH price series is also supported by the findings of various articles. For example, the authors of [46,47] and [8] demonstrate the non-stationarity of the BTC index and volume and underline the importance of modeling non-linearity in Bitcoin prediction models. This is further elaborated by Beckmann and Schüssler [26], who suggest that model selection and the use of averaging criteria are necessary to avoid poor forecasting results. Following similar reasoning, Phillip et al. [11] posit that standard models are inadequate to capture the extreme variability of cryptocurrencies and argue in favor of more composite approaches. In an important finding, Ciaian et al. [3] show that the Bitcoin price series exhibits structural breaks and identify periods of data (prior to 2013 and between 2013 and 2015) of markedly different variance and other econometric characteristics. Their findings further suggest that significant price predictors may vary over time. Pichl and Kaizoji [48] use data from various time periods to demonstrate, among other results, that the BTC price series exhibits heteroskedasticity.
Finally, the present paper falls into the strand of literature that studies the explanatory and predictive power of traditional financial and economic indices on the cryptocurrency price series.
All in all, our aim is to contribute to the literature that studies the modeling and prediction of cryptocurrencies, using a novel and elaborate Bayesian econometric model, and to gain understanding of the statistical, econometric and financial properties of existing cryptocurrencies.
The rest of the paper is structured as follows. In Section 2, we describe the two econometric models of this study: the logistic regression model is described analytically in Section 2.1 and the NHPG model and simulation scheme are described in Section 2.2. The empirical study is presented in Section 3. In detail, the data set that we used is described in Section 3.1, the results regarding the logistic model are presented in Section 3.2 and, lastly, the results regarding the NHPG model are presented in Section 3.3. We conclude the paper with a discussion of the limitations of the present model and directions for future work in Section 4. This paper considerably extends its earlier conference version. Concerning the applied methodology, we provide a rigorous description of the logistic regression model for studying the probabilities of positive returns (Section 2.1) and of the NHPG model (Section 2.2). In addition, we have updated the data set (BTC and ETH series) and the covariate set. Specifically, in the covariate set (Table 1), we have included the Russell 2000 index, excluded the autoregressive terms and applied different transformations to the variables. More importantly, concerning the results, this paper includes the novel analysis of the logistic model (Section 3.2) and, based on the new covariate set, it offers richer outcomes and more comprehensive insight from the analysis of the NHPG model (Section 3.3).

The Logistic Regression Model
Let $Y_t$ be the ETH or the BTC price series with realization $y_t$. Also, consider a set of $r - 1$ available predictors $\{X_t\}$. The explanatory variables (predictors) $\{X_t\}$ that are used in the present analysis are described in Table 1. We transform the cryptocurrency price series into a binary series, i.e., a series that takes the values 1 or 0, as follows. Let
$$U_{t,\alpha} = I(Y_t - Y_{t-1} \geq \alpha),$$
where $I$ denotes the indicator function that takes the value 1 if $Y_t - Y_{t-1} \geq \alpha$ and 0 otherwise, and $\alpha$ is a predefined threshold. Intuitively, we study the connection of the predictors $\{X_t\}$ with the probability of having daily returns of at least $\alpha$, i.e., of the event $\{Y_t - Y_{t-1} \geq \alpha\}$. We perform our analysis for various non-negative thresholds, $\alpha \in \{0, 1, \ldots, 5\%\}$. We treat the binary series $U_{t,\alpha}$ as a Bernoulli($p_t$) variable. From the class of the generalized linear models, we use a logit link to model the probabilities $p_t$. The standard logistic regression model is defined as
$$g(p_t) = \eta_t,$$
with $\eta_t = x_t \beta$, $\beta$ the logistic regression coefficients and $g(z) = \log \frac{z}{1-z}$ the logit link function. Then, the probabilities are modeled as
$$p_t = \frac{\exp(x_t \beta)}{1 + \exp(x_t \beta)}.$$
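The transformation and the logit link described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names (`binary_series`, `success_probability`, `bernoulli_log_likelihood`) are hypothetical and not taken from the paper, each `x_t` is assumed to be a plain list that already includes the intercept term, and the threshold is applied to the difference $Y_t - Y_{t-1}$ exactly as written in the text.

```python
import math


def binary_series(prices, alpha):
    """U_{t,alpha} = 1 if Y_t - Y_{t-1} >= alpha, else 0 (one value per consecutive pair)."""
    return [1 if (cur - prev) >= alpha else 0
            for prev, cur in zip(prices, prices[1:])]


def linear_predictor(x_t, beta):
    """eta_t = x_t * beta for one observation (x_t is assumed to include the intercept)."""
    return sum(x * b for x, b in zip(x_t, beta))


def success_probability(x_t, beta):
    """p_t = exp(eta_t) / (1 + exp(eta_t)), evaluated without overflow."""
    eta = linear_predictor(x_t, beta)
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def bernoulli_log_likelihood(u, X, beta):
    """Sum over t of u_t * eta_t - log(1 + exp(eta_t))."""
    total = 0.0
    for u_t, x_t in zip(u, X):
        eta = linear_predictor(x_t, beta)
        # log(1 + exp(eta)) written in a numerically stable form
        softplus = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        total += u_t * eta - softplus
    return total
```

With all coefficients equal to zero every $p_t$ equals 0.5, so the log-likelihood of any binary series of length $T$ is $-T \log 2$, which is a convenient sanity check for the sketch.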
We use the recently proposed latent variable scheme, namely the Pólya-Gamma data augmentation method of [23] which has significantly improved results.
The authors of [23] proved that binomial likelihoods (and hence Bernoulli likelihoods) parametrized by log odds can be represented as mixtures of Gaussian distributions with respect to the Pólya-Gamma distribution. Their main result is that, letting $p(\omega)$ be the density of a Pólya-Gamma latent variable $\omega$, with $\omega \sim PG(b, 0)$ for $b > 0$, the following identity holds for all $a \in \mathbb{R}$,
$$\frac{(e^{\psi})^{a}}{(1 + e^{\psi})^{b}} = 2^{-b} e^{k\psi} \int_0^{\infty} e^{-\omega \psi^2 / 2} \, p(\omega) \, d\omega,$$
with $k = a - b/2$. Furthermore, the conditional distribution of $\omega \mid \psi$ is also Pólya-Gamma, $PG(b, \psi)$. When $\psi = x\beta$, the previous identity gives rise to a conditionally conjugate augmentation scheme for Bernoulli likelihoods of logistic parameters. The likelihood is given by
$$L(\beta) = \prod_{t=1}^{T} \frac{(\exp(x_t \beta))^{u_t}}{1 + \exp(x_t \beta)}.$$
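Using the result of [23] with $k_t = u_t - 1/2$ and setting $\Omega = \mathrm{diag}\{\omega_1, \ldots, \omega_T\}$, the augmented likelihood is proportional to
$$\prod_{t=1}^{T} \exp\left( k_t x_t \beta - \frac{\omega_t (x_t \beta)^2}{2} \right) \propto \exp\left( -\frac{1}{2} (z - X\beta)^{\top} \Omega (z - X\beta) \right),$$
where $X$ is the matrix with rows $x_t$ and $z = (k_1/\omega_1, \ldots, k_T/\omega_T)^{\top}$. Assuming as prior distributions $\omega_t \sim PG(1, 0)$ and $\beta \sim N(m_{\beta_0}, V_{\beta_0})$, simulation from the posterior distributions can be done iteratively in two steps:
$$\omega_t \mid \beta \sim PG(1, x_t \beta), \quad t = 1, \ldots, T,$$
$$\beta \mid u, \omega \sim N(m_{\omega}, V_{\omega}),$$
with $V_{\omega} = (X^{\top} \Omega X + V_{\beta_0}^{-1})^{-1}$ and $m_{\omega} = V_{\omega} (X^{\top} \mathbf{k} + V_{\beta_0}^{-1} m_{\beta_0})$, where $\mathbf{k} = (k_1, \ldots, k_T)^{\top}$. To account for model uncertainty on the predictor set, we perform a reversible jump step within the Pólya-Gamma data augmentation scheme, as proposed in [24].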
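To make the two-step scheme concrete, the following Python sketch runs it for a single covariate without intercept, so that $\beta$ is a scalar and no matrix algebra is needed. It is a simplified illustration and not the sampler used in this study: the Pólya-Gamma draws are approximated by truncating the infinite sum-of-gammas representation of the $PG(b, c)$ distribution after `n_terms` terms (an approximation chosen here for self-containedness), the reversible jump step is omitted, and all function and parameter names (`sample_pg_truncated`, `gibbs_logistic_scalar`, `m0`, `v0`) are hypothetical.

```python
import math
import random


def sample_pg_truncated(b, c, rng, n_terms=50):
    """Approximate draw from PG(b, c) using the first n_terms of
    omega = (1 / (2 pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)), g_k ~ Gamma(b, 1)."""
    total = 0.0
    for k in range(1, n_terms + 1):
        g = rng.gammavariate(b, 1.0)  # shape b, scale 1
        total += g / ((k - 0.5) ** 2 + c * c / (4.0 * math.pi ** 2))
    return total / (2.0 * math.pi ** 2)


def gibbs_logistic_scalar(u, x, n_iter=200, m0=0.0, v0=10.0, seed=0):
    """Gibbs sampler for u_t ~ Bernoulli(logistic(x_t * beta)) with prior beta ~ N(m0, v0).
    Scalar special case of the two steps: omega_t | beta, then beta | u, omega."""
    rng = random.Random(seed)
    kappa = [u_t - 0.5 for u_t in u]  # k_t = u_t - 1/2
    beta = 0.0
    draws = []
    for _ in range(n_iter):
        # Step 1: omega_t | beta ~ PG(1, x_t * beta) (truncated approximation)
        omega = [sample_pg_truncated(1.0, x_t * beta, rng) for x_t in x]
        # Step 2: beta | u, omega ~ N(m_post, v_post)
        precision = sum(w * x_t * x_t for w, x_t in zip(omega, x)) + 1.0 / v0
        v_post = 1.0 / precision
        m_post = v_post * (sum(k * x_t for k, x_t in zip(kappa, x)) + m0 / v0)
        beta = rng.gauss(m_post, math.sqrt(v_post))
        draws.append(beta)
    return draws
```

A useful check of the truncated sampler is the known mean of a $PG(1, 0)$ variable, which equals 1/4; the truncation biases the sample mean slightly downwards, and an exact Pólya-Gamma sampler would be preferable in practice.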

Evaluation Metrics
For each iteration, we kept an in-sample replication of the binary series and compared it with the actual binary series U t . Using a 1-0 loss function, we measure the accuracy of the logistic regression model-by means of the average number of misestimated data values per iteration-in representing the studied series. We report the results of this study in Section 3.2.
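The metric described above can be sketched as follows. This is a minimal illustration, assuming that the in-sample replications are stored as one list of 0/1 values per MCMC iteration; the function name `mean_misestimations` is hypothetical.

```python
def mean_misestimations(replications, actual):
    """Average 0-1 loss per iteration between replicated and actual binary series.

    Returns (mean number of mismatches per iteration, corresponding error rate)."""
    if not replications:
        raise ValueError("need at least one replicated series")
    total = 0
    for rep in replications:
        if len(rep) != len(actual):
            raise ValueError("replicated and actual series must have equal length")
        # 0-1 loss: count the positions where the replication differs from the data
        total += sum(1 for r, a in zip(rep, actual) if r != a)
    mean_errors = total / len(replications)
    return mean_errors, mean_errors / len(actual)
```

For example, with the actual series `[1, 0, 1, 1]` and the two replications `[1, 0, 1, 1]` and `[0, 0, 1, 0]`, the function returns a mean of 1.0 misestimated values per iteration and an error rate of 0.25.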

The Non-Homogeneous Pólya-Gamma Hidden Markov Model
Given a time horizon T ≥ 0 and discrete observation times t = 1, 2, . . . , T, we consider an observed random process {Y t } t≤T and a hidden underlying process {Z t } t≤T . The hidden process {Z t } is assumed to be a two-state non-homogeneous discrete-time Markov chain, s = 1, 2, that determines the states of the observed process. In our setting, the observed process is either the BTC or the ETH logarithmic prices series. The description of the hidden states is not pre-determined and is subject to the interpretation of the results.
Let $y_t$ and $z_t$ be the realizations of the random processes $\{Y_t\}$ and $\{Z_t\}$, respectively. We assume that at time $t$, $t = 1, \ldots, T$, $y_t$ depends on the current state $z_t$ and not on the previous states. Consider also the set of predictors $\{X_t\}$ of Section 2.1. A subset of the predictors $X^{(1)}_t \subseteq \{X_t\}$ of length $r_1 - 1$ affects the cryptocurrency linearly. In addition, a subset $X^{(2)}_t \subseteq \{X_t\}$ of length $r_2 - 1$ is used to describe the dynamics of the time-varying transition probabilities, i.e., the probabilities of moving from hidden state $s = 1$ to the hidden state $s = 2$ and vice versa. Thus, we allow the predictors to affect the series $\{Y_t\}$ linearly and non-linearly. Given the above, the cryptocurrency price series $\{Y_t\}$ can be modeled as
$$Y_t \mid Z_t = s \sim N\left( X^{(1)}_t B_s, \sigma_s^2 \right), \quad s = 1, 2,$$
where $B_s = (b_{0s}, b_{1s}, \ldots, b_{r_1 - 1, s})$ are the regression coefficients and $N(\mu, \sigma^2)$ denotes the normal distribution with mean $\mu$ and variance $\sigma^2$. The dependence of the observed process on the unobserved states allows the model to capture the non-stationarities, the non-linearities and the changes in the volatility, i.e., the heteroskedasticity, of the cryptocurrency series. The dynamics of the unobserved process $\{Z_t\}$ can be described by the time-varying (non-homogeneous) transition probabilities, which depend on the predictors $X^{(2)}_t$ and are given by the following relationship
$$P(Z_{t+1} = j \mid Z_t = i) = p^{t}_{ij} = \frac{\exp\left( X^{(2)}_t \beta_{ij} \right)}{\sum_{l=1}^{2} \exp\left( X^{(2)}_t \beta_{il} \right)}, \quad i, j = 1, 2,$$
where $\beta_{ij} = (\beta_{0,ij}, \beta_{1,ij}, \ldots, \beta_{r_2 - 1, ij})$ is the vector of the logistic regression coefficients to be estimated. Please note that for identifiability reasons, we adopt the convention of setting, for each row of the transition matrix, one of the $\beta_{ij}$ to be a vector of zeros. Without loss of generality, we set $\beta_{ij} = \beta_{ji} = 0$ for $i, j = 1, 2$, $i \neq j$.
Hence, for $\beta_i := \beta_{ii}$, $i = 1, 2$, the probabilities can be written in the simpler form
$$p^{t}_{ii} = \frac{\exp\left( X^{(2)}_t \beta_i \right)}{1 + \exp\left( X^{(2)}_t \beta_i \right)}, \qquad p^{t}_{ij} = 1 - p^{t}_{ii} = \frac{1}{1 + \exp\left( X^{(2)}_t \beta_i \right)}, \quad i \neq j.$$
Summing up, the unknown quantities of the NHPG are $\theta_s = (B_s, \sigma_s^2, \beta_s)$, $s = 1, 2$, i.e., the parameters in the mean predictive regression equation and the parameters in the logistic regression equation for the transition probabilities of the unobserved process $\{Z_t\}$, $t = 1, \ldots, T$. We follow the Bayesian methodology of [24] for joint inference on model specification, model parameters and predictions. Specifically, the authors in [24] use conditional conjugate analysis for the parameters in the mean predictive regression equation, i.e., a normal prior on the regression coefficients $B_s$ and an Inverted-Gamma (IG) prior on the variance $\sigma_s^2$. The conditional and the marginal posterior distributions for the state specific parameters $\sigma_s^2$ and $B_s$ follow from standard conjugate results, see [24].

To make inference about the logistic regression coefficients, the authors model the probabilities of staying at the same state for two consecutive time periods, i.e., $p^{t}_{ss}$. They define, for $t = 1, \ldots, T - 1$, the quantity $\bar{Z}^{s}_{t+1} = I[Z_{t+1} = Z_t = s]$, with the sum $\sum_t \bar{Z}^{s}_{t+1}$ being the number of times that the chain was at state $s$ for two consecutive time periods. Then,
$$\bar{Z}^{s}_{t+1} \mid Z_t = s \sim \text{Bernoulli}\left( p^{t}_{ss} \right),$$
and inference for the probabilities reduces to the case of the logistic model of Section 2.1.

Finally, for every dataset we kept $L$ out-of-sample observations and compared them with the forecasts estimated using the NHPG model. Given model $M$, the predictive distribution of $y_{T+L}$, $L \geq 1$, is
$$f(y_{T+L} \mid y^{T}, M) = \int f(y_{T+L} \mid y^{T}, \theta, M) \, \pi(\theta \mid y^{T}, M) \, d\theta,$$
where $y^{T}$ denotes the observations up to time $T$. All in all, the MCMC sampling scheme is constructed with recursive updates of (i) the latent variables $z^{T}$ given the current value of the model parameters, by using the scaled Forward-Backward algorithm (Scott [54]), (ii) the logistic regression coefficients, by adopting the auxiliary variables method of Polson et al. [23] given the sequence of states $z^{T}$, (iii) the mean regression coefficients conditional on $z^{T}$, by using the Gibbs sampling algorithm, (iv) the covariate set, using a couple of reversible jump steps, and (v) the predictive distributions given the parameters and hidden states. The MCMC steps are detailed in Algorithm 1.
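The data generating process of this section (logistic staying probabilities for the hidden chain and a state specific normal regression for the observations) can be illustrated by simulating from it. The Python sketch below is illustrative only and makes several assumptions that are not part of the paper: states are coded 0 and 1 instead of 1 and 2, the initial state is drawn uniformly, every covariate vector already contains the intercept, and all names (`transition_matrix`, `simulate_nhhmm`, `beta_stay`) are hypothetical.

```python
import math
import random


def logistic(eta):
    """exp(eta) / (1 + exp(eta)), evaluated without overflow."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def transition_matrix(x2_t, beta_stay):
    """2x2 matrix with p_ss = logistic(x2_t * beta_s) on the diagonal and 1 - p_ss off it."""
    p_stay = [logistic(dot(x2_t, beta_stay[s])) for s in range(2)]
    return [[p_stay[0], 1.0 - p_stay[0]],
            [1.0 - p_stay[1], p_stay[1]]]


def simulate_nhhmm(X1, X2, B, sigma, beta_stay, seed=0):
    """Simulate (y, z): z is the hidden two-state chain with covariate dependent
    transitions, and y_t | z_t = s ~ N(X1[t] * B[s], sigma[s]^2)."""
    rng = random.Random(seed)
    z = [rng.randrange(2)]  # assumption: uniform initial state
    y = []
    for t in range(len(X1)):
        s = z[t]
        y.append(rng.gauss(dot(X1[t], B[s]), sigma[s]))
        if t < len(X1) - 1:
            P = transition_matrix(X2[t], beta_stay)
            # stay in state s with probability P[s][s], otherwise switch
            z.append(s if rng.random() < P[s][s] else 1 - s)
    return y, z
```

When all logistic coefficients are zero, every row of the transition matrix is (0.5, 0.5); a large positive intercept in `beta_stay` makes the states highly persistent, whereas coefficients on time-varying covariates produce the non-homogeneous behavior discussed in the text.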

Algorithm 1 MCMC Sampling Scheme for Inference on Model Specification and Parameters
% After each procedure the parameters and model space are updated conditionally on the previous quantities
procedure SCALED FORWARD-BACKWARD(θ, y_t)
    % Simulation of a realization of the hidden states z_t
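As an illustration of the first procedure of Algorithm 1, the following Python sketch implements one common variant of the scaled forward-backward idea for a two-state chain: a normalized (scaled) forward filter followed by backward sampling of the hidden states. It is a sketch under assumptions, not the implementation of [24] or [54]: states are coded 0 and 1, the emission means `means[t][s]` and the transition matrices `trans[t]` are taken as given, the initial distribution is uniform by default, and all names are hypothetical.

```python
import math
import random


def normal_pdf(y, mean, sd):
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def forward_filter(y, means, sds, trans, init=(0.5, 0.5)):
    """Scaled forward recursion.

    means[t][s]: emission mean at time t in state s; sds[s]: emission std in state s;
    trans[t][i][j]: P(Z_{t+1} = j | Z_t = i). Returns the filtered probabilities
    P(Z_t = s | y_1..y_t) and the log-likelihood accumulated from the scaling constants."""
    filtered, loglik = [], 0.0
    pred = list(init)
    for t in range(len(y)):
        joint = [pred[s] * normal_pdf(y[t], means[t][s], sds[s]) for s in range(2)]
        c = joint[0] + joint[1]  # scaling constant, keeps the recursion from underflowing
        loglik += math.log(c)
        alpha = [j / c for j in joint]
        filtered.append(alpha)
        if t < len(y) - 1:
            pred = [alpha[0] * trans[t][0][j] + alpha[1] * trans[t][1][j] for j in range(2)]
    return filtered, loglik


def backward_sample(filtered, trans, rng):
    """Draw z_T from the last filtered distribution, then z_t | z_{t+1} backwards,
    with P(Z_t = i | Z_{t+1} = j, y_1..y_t) proportional to filtered[t][i] * trans[t][i][j]."""
    T = len(filtered)
    z = [0] * T
    z[T - 1] = 0 if rng.random() < filtered[T - 1][0] else 1
    for t in range(T - 2, -1, -1):
        w = [filtered[t][i] * trans[t][i][z[t + 1]] for i in range(2)]
        z[t] = 0 if rng.random() * (w[0] + w[1]) < w[0] else 1
    return z
```

In a full sampler, the sampled sequence of states would then be passed on to the updates of the logistic and mean regression coefficients described in the text.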

The Data
We assess the explanatory power of 12 financial/economic and 2 cryptocurrency specific variables, outlined in Table 1, on the BTC and ETH series, through two different experimental exercises: (a) the logistic regression model and (b) the NHPG model. We analyze the daily BTC and ETH price series for the period ranging from 1/1/2017 to 16/11/2019. Missing data due to non-business days are filled with the last available value. For the first exercise, we transform the price series into binary series using the transformation $I(Y_t - Y_{t-1} \geq \alpha)$. The aim of this experiment is to model/explain the probability of a positive return. We repeat the experiment for different magnitudes ($\alpha$) of returns, i.e., gains of more than 0%, 1%, . . . , 5%. For the second exercise, we use the logarithm of the price series of the two aforementioned cryptocurrencies. In addition to the previous data sets, we apply the NHPG methodology to a longer BTC log-price series (for the period ranging from 1/1/2014 until 16/11/2019). The closing BTC prices were downloaded from [20] and the ETH prices were downloaded from [56].
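The preprocessing steps described above can be sketched in Python as follows. The sketch rests on assumptions that are not stated in the text: missing values are represented by `None`, and, since the thresholds are given in percent, the daily return is computed as the relative change $100 \cdot (Y_t - Y_{t-1}) / Y_{t-1}$. All function names are hypothetical.

```python
import math


def forward_fill(values):
    """Replace each missing value (None) with the last available value.
    Leading missing values stay None because nothing precedes them."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def log_prices(prices):
    """Logarithm of the price series (input of the second exercise)."""
    return [math.log(p) for p in prices]


def threshold_indicator(prices, alpha_percent):
    """Binary series of the first exercise.
    ASSUMPTION: the return is the relative daily change in percent."""
    out = []
    for prev, cur in zip(prices, prices[1:]):
        change = 100.0 * (cur - prev) / prev
        out.append(1 if change >= alpha_percent else 0)
    return out
```

For instance, `threshold_indicator([100, 101, 101, 107], 1)` yields `[1, 0, 1]`, while raising the threshold to 5 yields `[0, 0, 1]`, which mirrors how larger thresholds make the event of interest rarer.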

Results: The Logistic Regression Model
We report the in-sample performance of the logistic regression model in Table 3 for every threshold and for both cryptocurrencies. Even though the in-sample performance, judged by the large number of incorrect point estimates of the series, is poor, we see that the in-sample performance improves as the threshold $\alpha$ increases.

Table 3. Mean number of incorrect estimations per iteration, out of the total T = 1017 observations, for the BTC and ETH binary series. As the minimum magnitude of returns increases, the average number of misestimations decreases. This shows that the covariate set has explanatory power on the probability of larger returns. The error rate is reported in parentheses.

To be more precise, we find that for $\alpha = 0\%$ the covariate set has no explanatory power on $U_{t,\alpha}$, since the probability of incorrect estimations is almost 50% for both coins. However, for $\alpha = 5\%$ the probability of incorrect estimations drops to 17% for the binary BTC series and to 25.5% for the binary ETH series. This result is an indication that the covariate set considered, which includes fiat currencies, stock indices and commodities, can be used to explain/predict the possibility of larger positive returns. A visualization of the best in-sample performance for the $U_{t,5}$ series for BTC and ETH is given in Figures 1 and 2.

In Tables 4 and 5, we report the posterior probabilities of inclusion $\pi_k$, $k = 1, \ldots, K$, of the $K$ explanatory variables for the studied logistic models for the BTC and ETH binary series, respectively. For both coins, we observe that the studied covariate set neither affects nor predicts the probability of a positive return, i.e., $Y_t - Y_{t-1} \geq 0$.

Table 4. Posterior probabilities of inclusion of the explanatory variables for the binary BTC series for the period ranging from 1/2016 to 11/2019. The probabilities of the variables that are included in the median probability model, i.e., the variables with probability of inclusion above 0.5, are highlighted in bold.

Table 5. Posterior probabilities of inclusion of the explanatory variables for the binary ETH series for the period ranging from 1/2016 to 11/2019. The probabilities of the variables that are included in the median probability model, i.e., the variables with probability of inclusion above 0.5, are highlighted in bold.

However, if we change the magnitude of return ($\alpha$), we find that there is a correlation between the covariate set and the probabilities of observing $Y_t - Y_{t-1} \geq \alpha$, $\alpha \in \{1, 2, 3, 4, 5\%\}$. We find that the sets of covariates that affect the binary BTC and ETH series (covariates with posterior probability of inclusion above 0.5) are different. Also, we observe that even though the binary ETH series is correlated with more covariates than the binary BTC series, the logistic model performs worse in explaining the ETH series (as seen in Table 3).
The results of this experiment imply that the covariate set considered has some explanatory power on the series, but that a more elaborate model, such as the NHPG model, needs to be considered.
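The posterior probabilities of inclusion and the median probability model used in this section can be computed from the model indicators visited by the sampler. The following Python sketch assumes that each MCMC iteration stores one 0/1 inclusion indicator per covariate; the function names and the covariate labels in the example are hypothetical.

```python
def inclusion_probabilities(indicator_draws):
    """Posterior probability of inclusion of each covariate: the share of MCMC
    iterations in which its inclusion indicator equals 1."""
    n = len(indicator_draws)
    k = len(indicator_draws[0])
    return [sum(draw[j] for draw in indicator_draws) / n for j in range(k)]


def median_probability_model(names, probabilities, cutoff=0.5):
    """Covariates whose posterior probability of inclusion is above the cutoff."""
    return [name for name, p in zip(names, probabilities) if p > cutoff]


# Hypothetical example with three covariates and four stored iterations
draws = [[1, 0, 1], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
probs = inclusion_probabilities(draws)
mpm = median_probability_model(["A", "B", "C"], probs)
```

In this toy example the inclusion probabilities are 0.75, 0.25 and 0.5, so only covariate "A" enters the median probability model, because the cutoff requires a probability strictly above 0.5.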

Results: The NHPG Model
In this section, we present the results of the NHPG model using the logarithmic ETH and BTC price series. Figures 3-5 plot the log ETH, log BTC and extended log BTC datasets (blue line) along with the estimated in-sample time series (gray line). This shows graphically the good in-sample performance of the NHPG model in replicating the log BTC and log ETH series. Shaded bars indicate the time periods during which the underlying hidden process is in state 1. The states alternate between 1 and 2 rather frequently, confirming previous studies on the heteroskedasticity of the series, see e.g., [11], and on the existence of structural breaks and regime switches, see e.g., [39,44].

However, the out-of-sample performance of the NHPG is poor. Figure 6 shows the posterior means (gray lines) of the 30 empirical predictive distributions, along with the actual out-of-sample log prices of ETH and BTC (blue line). The mean posterior out-of-sample predictions are in general not good, as they frequently miss the direction of movement of the series. Even worse, when we examine the posterior prediction intervals bounded by the 2.5% and 97.5% quantiles of the empirical predictive densities, instead of the mean point forecasts, we find that in some cases they do not include the actual out-of-sample values. Hence, we confirm the claim of previous studies that financial and economic variables do not accurately predict the price fluctuation of cryptocurrencies. The good in-sample and poor out-of-sample performance of the two-state Non-Homogeneous Hidden Markov models is also observed in the exchange rate literature, see for example [30,31].
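The two out-of-sample checks discussed above (coverage of the 2.5% to 97.5% prediction intervals and direction of movement of the posterior mean forecasts) can be sketched in Python as follows. The sketch is an illustration under assumptions: the predictive draws are stored as one list per forecast horizon, quantiles are computed by linear interpolation, and the predicted direction is measured relative to the previous actual value. The function names are hypothetical.

```python
def empirical_quantile(sorted_draws, q):
    """Quantile of already sorted draws, using linear interpolation."""
    pos = q * (len(sorted_draws) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_draws) - 1)
    frac = pos - lo
    return sorted_draws[lo] * (1.0 - frac) + sorted_draws[hi] * frac


def forecast_summary(predictive_draws, actual, last_observed):
    """Returns (share of actual values inside the 2.5%-97.5% interval,
    share of horizons where the posterior mean moves in the realized direction)."""
    covered, direction_hits = 0, 0
    prev = last_observed
    for draws, y in zip(predictive_draws, actual):
        s = sorted(draws)
        lower = empirical_quantile(s, 0.025)
        upper = empirical_quantile(s, 0.975)
        mean = sum(draws) / len(draws)
        if lower <= y <= upper:
            covered += 1
        # ASSUMPTION: direction is judged against the previous actual value
        if (mean - prev) * (y - prev) > 0:
            direction_hits += 1
        prev = y
    n = len(actual)
    return covered / n, direction_hits / n
```

A coverage share well below 0.95 or a direction share close to 0.5 would correspond to the weak out-of-sample behavior reported in the text.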
In Table 6, we report the posterior probabilities of inclusion of the explanatory variables for the mean equation (first number) and the logistic regression (second number) in every cell. Variables with posterior probability of inclusion above 0.5 in either the mean equation or the transition probabilities are marked in bold. These variables make up the median probability model (MPM). Although BTC and ETH are correlated [15], the variables that affect the series (by means of the MPM) are not the same. The MPM of ETH consists of 7 covariates: the USD/EUR, USD/GBP and USD/CNY exchange rates and the Russell 2000, S&P 500, Dow Jones and NASDAQ indices. The MPM of BTC additionally contains the USD/JPY exchange rate and the VIX. This difference in the exchange rates indicates that ETH is more geographically restricted. Moreover, the hash rate (HR) and the average block size are insignificant for both cryptocurrencies. Regarding the extended BTC dataset, we find that Gold and Crude Oil future prices are also significant. Furthermore, it is worth mentioning that all the effects are linear in the BTC series and hence the transition probabilities of the hidden states are homogeneous (constant through time). This is in contrast to the ETH price series, where the S&P 500 and USD/GBP also affect the series non-linearly. The inclusion of these two variables in the transition probabilities equation results in non-constant transition probabilities and consequently indicates that the Non-Homogeneous Hidden Markov model is promising in the understanding of cryptocurrency price formation. The non-constant transition probabilities, along with the fact that there exist other variables with non-negligible posterior probabilities of inclusion (above 0.3), imply that there are other drivers of change in the underlying process that go beyond the financial aspects considered in the present setting.
Finally, we observe that even though the number of statistically important variables is large for both coins, the forecasting performance of this model is very modest. This is an indication that non-traditional financial and economic variables need to be considered. To sum up, even though we observe a large number of statistically significant explanatory variables, the insufficient forecasting performance of the NHPG model confirms that cryptocurrencies are still decoupled from the mainstream financial and economic assets [57].

Table 6. Posterior probabilities of inclusion of the explanatory variables. The first value in each cell is the posterior probability of inclusion in the mean equation and the second is the probability of inclusion in the transition probabilities of the underlying Markov process. The probabilities of the variables that are included in the median probability model, i.e., variables with probability above 0.5 in either the mean equation or the logistic regression equation, are highlighted in bold.

Concluding Remarks
We applied a logistic regression model with a predefined covariate set to model the probabilities of observing daily returns exceeding a predefined threshold for the Bitcoin (BTC) and Ether (ETH) series. We show empirically that the logistic model has weak fitting performance. However, we find that changing the magnitude of positive returns improves the fitting performance of the logistic regression model. This result motivated us to incorporate the logistic regression model into a more complex model. Therefore, we applied a specific instance of the Non-Homogeneous Hidden Markov models to the logarithm of the BTC and ETH price series.
We used the non-homogeneous Pólya-Gamma Hidden Markov Model (NHPG) of [24]. Focusing on a data set of financial/economic predictors, we studied general properties of the cryptocurrency price series. While the NHPG algorithm exhibited good in-sample performance, it revealed that changes in the underlying two-state Markov process are frequent, indicating that the states are not persistent and contributing to the already high heteroskedasticity of both the Bitcoin and the Ether data series. Notably, neither of the two cryptocurrency-specific variables was found significant for BTC or ETH. The significance of the exchange rates revealed a more geographically restricted interest for ETH than for BTC.
From a modeling point of view, the median probability model included a large number of covariates, indicating data with high variability and confirming that financial and economic variables, even if cryptocurrency specific, are not enough to explain the formation of cryptocurrency prices. Along with the poor out-of-sample predictions, these findings show that even algorithms with good performance on conventional financial data do not capture all aspects of cryptocurrencies. As the main takeaway of this study, these results back earlier findings that cryptocurrencies are unlike any other financial asset and that understanding their properties requires not only the combination of more sophisticated models but also the inception of novel ideas and tools.
While the current study offers a novel perspective on the hidden states (and hence on the underlying forces) that drive cryptocurrency markets, it also suggests that the analysis of their price formation requires more elaborate tools. Recent advances in deep neural networks provide methods to identify hidden layers that approximate complex non-linear relationships. Specifically, by exploring electronic high-frequency data of supply, demand and prices in financial markets, Deep Learning models can uncover universal price formation mechanisms [58]. This approach seems particularly promising for cryptocurrency markets. Along these lines, the current model may prompt a more extensive application of the rich Hidden Markov theory and analytical toolbox to cryptocurrency markets.
All in all, the investigation of the exogenous variables that affect or drive the cryptocurrency market can be useful to investors, policy makers and traders for portfolio allocation, risk management and trading strategies.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.