1. Introduction
The popularity of cryptocurrency with financial intermediaries came about as a consequence of the perceived failures of the monetary authorities in the global financial crisis of 2008 and the European sovereign debt crisis of 2010 to 2013 [1]. In terms of monetary attributes, Yermack [2] explains that Bitcoin does not behave like a fiat currency according to the criteria widely used by economists. For instance, some economists view the inelastic supply of cryptocurrency as an advantage, whereas others view it as a disadvantage.
Cryptocurrencies are digital coins not issued by any government or legal entity [3]; they rely only on cryptography and a clever system of rules to regulate their supply, control trading operations, and prevent fraud. The transactions are recorded digitally in a blockchain that serves as an accounting system [4]. Digital currencies are based on peer-to-peer authentication with rules that determine the amount produced and the conditions of production [5]. These currencies organize the peer-to-peer network as a set of nodes in a self-organizing connected network. Hayes [6] finds that relative differences in the marginal cost of mining determine the prices of cryptocurrencies. The most popular cryptocurrency is Bitcoin, which was introduced in the seminal paper of Nakamoto [7]. Bitcoin, Ethereum, and Ripple are the main cryptocurrencies by market capitalization, as reported by Blas [8].
Conversely, Dwyer [5] reports that governments create and certify fiat currencies that are used by all. In terms of networks, this is similar to a client–server model in which one server receives requests from clients and responds to them. The server ensures the data precision of whatever information it provides. The issuer designs a fiat currency to hinder counterfeiting and enforces laws that make counterfeiting a crime.
What are the most important challenges in the cryptocurrency market? Encrybit [9], a cryptocurrency exchange platform, conducted a survey among 1108 traders between 23 April and 30 April 2018 and identified the three biggest problems in the cryptocurrency market: lack of security (40%), high trading fees (37%), and lack of liquidity (36%). The first problem concerns increasingly sophisticated hackers who endanger the exchange platforms. The second problem arises because many exchanges split their trading fees into two separate fees, in which the taker fee is higher than the maker fee because the maker adds liquidity to the market. The third problem involves price manipulation and high volatility, under which brokers fail to place or execute orders on time.
Crook [10] argues that the illiquidity problem needs to be solved in order to democratize access to the cryptocurrency market, make it more efficient, and avoid perverse incentives among the different players. Jiang et al. [11] conduct a bibliometric analysis of 918 papers about the cryptocurrency market published between 2009 and 2019 and conclude that the main topic shifted from technological to economic. Furthermore, the most important economic topic is explaining and forecasting the volatility of the cryptocurrency market. The statistical models that studies most commonly use to explain or predict market volatility are parametric, such as the GARCH-type family.
The literature related to the liquidity of the cryptocurrency market is even more recent than the literature related to its volatility and has different strands: market microstructure explanations [12], the relation between liquidity and volatility [13,14], the factors that affect market liquidity [15,16], and how to measure liquidity in the cryptocurrency market [17].
None of the above strands has tried to explain or predict liquidity in the cryptocurrency market. Furthermore, given the microstructure complexity of this liquidity, we argue that nonparametric models are better suited to predicting it. In this sense, we find that the k-nearest neighbor (KNN) approach, which is a supervised machine learning algorithm, is better suited to predict the liquidity of the cryptocurrency market than a classical linear model such as the autoregressive moving average (ARMA) model or a nonlinear model such as the generalized autoregressive conditional heteroskedasticity (GARCH) model, both of which are widely used to predict volatility.
Although there have been several clampdown events by investors on cryptocurrency trading due to market news, this news is subsequently incorporated into prices because of the short-term memory of Bitcoin returns, which exhibit short-term momentum and reversals [18]. It is important to clarify that the mechanism through which news is incorporated into cryptocurrency prices is short-term for returns but long-term for the volatility of returns. Nevertheless, liquidity lies at the root of the explanation.
Khuntia and Pattanayak [19] present an event history of the Bitcoin market from January 2015 to June 2018 and suggest that these events affect the trading volume and the long memory in the volatility of Bitcoin returns. Furthermore, Phillip et al. [20] found that cryptocurrencies with slower transactions, such as Bitcoin, have less long memory, whereas coins with faster transactions, such as Ripple, display more long memory. In other words, the day-to-day volatility correlation (long memory in return volatility) depends on transaction completion times and therefore on liquidity.
Thus, the main objective of this study is to determine the best model for predicting the short-term log rates of the bid-ask spreads in the three biggest cryptocurrencies—Bitcoin, Ripple, and Ethereum—and in the 16 major fiat currencies listed by Bloomberg. Consequently, our research question is the following: Is the KNN approach a better predictor of the short-term liquidity of cryptocurrencies than classical time-series models? To the best of our knowledge, there is no other study that has addressed this question, and given the importance of market liquidity, we argue that it is necessary to find better ways to assess the liquidity of the cryptocurrency market (long-term memory in the cryptocurrency market liquidity is beyond the scope of our study).
This article is organized into five sections. In Section 2, we examine the literature on liquidity. Section 3, Section 4, and Section 5 present the methodology, results, and discussion of the study, respectively.
2. Literature Review
According to Jiang et al. [11], the research trend over the last decade in published papers on cryptocurrency has changed from a technological perspective to an economic one. Specifically, the focus has been on explaining the volatility of cryptocurrencies with traditional statistical models such as the generalized autoregressive conditional heteroskedasticity (GARCH) model and its variants.
Financial time series, as Ruppert [21] mentions, often exhibit volatility clustering in asset returns, where periods of high volatility alternate with periods of low volatility, i.e., time-varying volatility is more common than constant volatility. In this sense, Nelson [22] comments that GARCH models elegantly capture volatility clustering and that this feature accounts for both their theoretical appeal and their empirical success. Thus, as Venter and Maré [23] argue, the GARCH model has become increasingly popular among both academics and practitioners for modelling time-varying volatility in financial time series, including cryptocurrencies.
GARCH-type models are used to examine not only mean returns but also the transmission of volatility, for instance within a VAR-GARCH model; for example, Loverta and López [24] focus on credit default swap (CDS) spreads as a directly observable market indicator of default risk within the VAR-BEKK-GARCH framework.
Kyriazis et al. [25] study the volatility of the three most highly capitalized digital currencies (Bitcoin, Ethereum, and Ripple) during a bearish market and find that during distressed times, no possibilities for hedging exist between the majority of cryptocurrencies and the three major ones. Walther et al. [26] use the GARCH-MIDAS framework to identify drivers of cryptocurrency volatility and find that global real economic activity provides superior volatility predictions for both bull and bear markets. Acereda et al. [27] study the expected shortfall of the main cryptocurrencies and find that the best results come from using an NGARCH model.
Fakhfekh and Jeribi [28] model the volatility of 16 cryptocurrencies and find that the TGARCH was the best specification, whereas Cerqueti et al. [29] find that relaxing the normality assumption and considering skewed distributions, such as in skewed non-Gaussian GARCH models, yields better predictions of the cryptocurrencies’ volatility. One of the most comprehensive studies of the usefulness of GARCH-type models for forecasting Bitcoin’s volatility is the one by Köchling et al. [30]. Interestingly, the authors find that most GARCH-type models have equal predictive ability and that some specifications are outperformed on a regular basis.
In the last two years, several authors have focused their attention on cryptocurrencies’ liquidity. Stenfors [31] defines the bid-ask spread as a bonus that is paid to market makers for standing ready to absorb the risk borne by others “immediately” and notes that the spread is closely connected to market liquidity. For fiat exchange rate markets, Stenfors [31] uses the spread as a proxy for liquidity, whereas Dyhrberg et al. [12] use it for cryptocurrency markets. Kim [32] finds that Bitcoin markets have bid-ask spreads that are approximately 2% lower than those of the main fiat currencies due to lower transaction costs.
Furthermore, there is a strong relation between the volatility and liquidity of the cryptocurrency market. Wei [13] shows that volatility decreases as liquidity increases in cryptocurrencies and that there is no sign of an illiquidity premium. Będowska-Sójka et al. [14] obtain a contrasting result in that high volatility in the cryptocurrency market attracts new investors and that this attraction causes an increase in the market liquidity. Brauneis et al. [15] and Scharnowski [16] show that the liquidity of cryptocurrencies depends specifically on the volatility of their returns, the dollar trading volume, and the number of transactions and that general financial market variables have no influence. Brauneis et al. [15] show that a universal best measure for the liquidity of cryptocurrencies does not exist yet because it depends on the application.
There is also research on the long-term memory of the volatility of cryptocurrency returns, where long memory describes the high-order correlation structure of a series. Fakhfekh and Jeribi [28] studied sixteen of the most popular cryptocurrencies with five GARCH-type models to predict long-term memory in the volatility of cryptocurrency returns and found the TGARCH with a double exponential distribution to be the best model.
Lahmiri et al. [33] studied the nonlinear patterns of volatility in seven Bitcoin markets. They investigated fractional long-range dependence in conjunction with the potential inherent stochasticity of the volatility time series under four different distributional assumptions and found long memory in Bitcoin market volatility, irrespective of the distributional assumption. Hence, in explaining long memory in the volatility of cryptocurrency returns, it is useful to consider Markov-Switching Multifractal (MSM) models or a hybrid between purely parametric and nonparametric models, such as the Fractionally Integrated GARCH model.
Finally, research has shifted towards nonparametric models to forecast the volatility of cryptocurrency returns. Khaldi et al. [34] compared different GARCH-type models and Artificial Neural Network (ANN) models in an attempt to forecast the volatility of Bitcoin returns. They found that one type of ANN, the Multilayer Perceptron (MLP), outperformed all the other parametric and nonparametric models, but it was effective only in short-term forecasting.
The preceding review shows that the most important economic concern in the cryptocurrency market has been its volatility, but the focus is now starting to shift to its liquidity. Nevertheless, so far there has been no attempt to predict this liquidity in the short term. Furthermore, there is no consensus on the best method for predicting market volatility (beyond some type of hybrid or nonparametric model), nor on the best measure of liquidity, but it is possible to use the bid-ask spread as a measure of liquidity together with a nonparametric model. Moreover, although there is a strong relation between volatility and liquidity, there is no consensus on the direction of the relation or on whether all factors that affect this relation are within the market network.
3. Methodology
3.1. Data and Hypothesis
The study is motivated by empirical evidence that, even though cryptocurrencies exhibit features similar to those of fiat currencies, their market structure is fundamentally different, as noted by Dyhrberg et al. [12]. Furthermore, Saadah and Whafa [35] point out that cryptocurrencies are the most volatile product on the market and that their high volatility makes liquidity prediction difficult. Therefore, predicting cryptocurrency liquidity with classical time-series methods, given the scarcity of reliable data sources, could be a challenge. In this sense, the KNN algorithm has been applied as a fundamental prediction technique when there is little or no prior knowledge about the distribution of the data [36]. Due to the complexity of the market microstructure in crypto and fiat currencies, we test whether a nonparametric machine learning method, such as the KNN approach, is better suited for predicting short-term liquidity in the cryptocurrency market than the parametric time-series models that studies have widely used to predict volatility. Hence, we propose the following hypothesis based on Gandal and Halaburda [4], Stenfors [31], Kim [32], and Katsiampa [37].
Hypothesis 1. The nonparametric KNN approach is a better method to predict the short-term liquidity of crypto and fiat currencies than classical ARMA and GARCH models.
The data are the log rates of the daily closing bid-ask spreads from 9 February 2018 to 8 February 2019, for a total of 259 observations for each of the 19 currencies in the study. The log rates were calculated by taking the logarithm of the USD price ratio of the spreads for each currency.
The sample comprises three cryptocurrencies (Bitcoin, Ethereum, and Ripple) and 16 major fiat currencies: the Australian dollar, Brazilian real, British pound, Canadian dollar, Danish krone, euro, Japanese yen, Mexican peso, New Zealand dollar, Norwegian krone, Singaporean dollar, South African rand, South Korean won, Swedish krona, Swiss franc, and Taiwanese dollar. The data are publicly available on the Bloomberg database.
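As a minimal sketch of the data transformation described above, the snippet below computes log rates from a series of daily closing bid-ask spreads. It assumes that "the logarithm of the USD price ratio of the spreads" means the log ratio of consecutive spreads, ln(s_t / s_{t-1}); that reading, the function name `log_rates`, and the sample values are illustrative assumptions, not taken from the study or from Bloomberg data.

```python
import math


def log_rates(spreads):
    """Return ln(s_t / s_{t-1}) for consecutive, strictly positive spreads.

    Assumption: the log rate is the log ratio of consecutive daily spreads.
    """
    if any(s <= 0 for s in spreads):
        raise ValueError("spreads must be strictly positive")
    return [math.log(curr / prev) for prev, curr in zip(spreads, spreads[1:])]


# Hypothetical daily closing bid-ask spreads in USD (illustrative values only).
example_spreads = [2.0, 2.5, 2.0, 4.0]
example_rates = log_rates(example_spreads)
```

Under this reading, a series of n spreads yields n - 1 log rates, and the log rates sum to the log ratio of the last spread to the first.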
3.2. ARMA and GARCH Models
First, we used the augmented Dickey–Fuller (ADF) statistic to test for the presence of a unit root and thereby verify whether the log rates of the bid-ask spreads, $y_t$ with $t = 1, \ldots, T$, are stationary. If the null hypothesis of a unit root is rejected, then the series can be treated as stationary, at least in the mean [38].
Once we applied the ADF test, we continued with the ARMA model proposed by Box and Jenkins [39]. The generalized ARMA($p$, $q$) model that includes both the $p$ autoregressive and the $q$ moving average terms is represented as follows:
$$\left(1 - \sum_{i=1}^{p} \phi_i L^i\right) y_t = c + \left(1 + \sum_{j=1}^{q} \theta_j L^j\right) \varepsilon_t,$$
where $\phi_i$ and $\theta_j$ are the parameters for the autoregressive and moving average models, respectively, with $i = 1, \ldots, p$ and $j = 1, \ldots, q$; $L$ is the delay operator, and the value of its exponent indicates the order of the delay, that is, $L^i y_t = y_{t-i}$; $c$ is the constant parameter; and $\varepsilon_t$ is the residual, with $E[\varepsilon_t] = 0$ and constant variance.
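To make the ARMA recursion concrete, the sketch below evaluates the simplest mixed case, ARMA(1,1), written as y_t = c + phi * y_{t-1} + eps_t + theta * eps_{t-1}. It rebuilds the residuals recursively from given parameters and returns a one-step-ahead forecast. The parameter values are supplied by the caller rather than estimated, and the starting residual eps_0 = 0 is an assumption made for illustration; this is not the estimation procedure used in the study.

```python
def arma11_forecast(y, c, phi, theta):
    """One-step-ahead forecast from an ARMA(1,1) with known parameters.

    Model: y_t = c + phi * y_{t-1} + eps_t + theta * eps_{t-1}.
    Assumption: the residual before the first observation is zero.
    """
    if not y:
        raise ValueError("y must contain at least one observation")
    eps_prev = 0.0
    for t in range(1, len(y)):
        # Fitted value uses the previous observation and previous residual.
        fitted = c + phi * y[t - 1] + theta * eps_prev
        eps_prev = y[t] - fitted
    # Forecast for the period after the last observation.
    return c + phi * y[-1] + theta * eps_prev
```

With theta = 0 the function reduces to an AR(1) forecast, c + phi * y[-1], which is a quick sanity check on the recursion.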
We used the Schwarz criterion (SC) developed by Schwarz [40], which penalizes additional parameters more heavily than the Akaike Information Criterion (AIC) proposed by Akaike [41] and therefore reduces the risk of selecting models with too many parameters. For ARMA models, Koehler and Murphree [42] confirm in an empirical study that the SC is a better criterion than the AIC and validate the results of others showing that the AIC tends to overfit the data. To select the best ARMA($p$, $q$), we used the model with the lowest SC.
Second, we applied the GARCH($p$, $q$) model proposed by Bollerslev [43] that considers the dependence of the conditional variance ($\sigma_t^2$) on the past squared residuals of the model ($\varepsilon_{t-i}^2$) and the past values of the variance ($\sigma_{t-j}^2$) for the $y_t$ time series. In this context, the modeling of the conditional mean and conditional variance is governed by the following:
$$\varepsilon_t = \sigma_t z_t,$$
$$\sigma_t^2 = \omega + \sum_{i=1}^{p} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2,$$
where $p$ is the time delay parameter of the squared residuals; $q$ is the time delay parameter of the past values of the variance; $\omega$, $\alpha_i$ with $i = 1, \ldots, p$, and $\beta_j$ with $j = 1, \ldots, q$ are the estimated coefficients; and $z_t$ represents the standardized residuals with $E[z_t] = 0$ and $\mathrm{Var}(z_t) = 1$. To select the parameters $(p, q)$, we also applied the SC, which has been shown to exhibit a higher degree of accuracy in identifying the true data generating process than AIC [44].
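The order-selection rule described above can be sketched as follows. The snippet uses the standard log-likelihood forms AIC = -2 ln L + 2k and SC = -2 ln L + k ln(n), where k is the number of estimated parameters and n the number of observations. The candidate orders, log-likelihoods, and parameter counts below are hypothetical numbers chosen only to show how the two criteria can disagree; they are not results from the study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 ln L + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def schwarz(log_likelihood, n_params, n_obs):
    """Schwarz criterion: -2 ln L + k ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_by_sc(candidates, n_obs):
    """Return the (p, q) order with the lowest SC.

    candidates maps (p, q) -> (log_likelihood, n_params).
    """
    return min(candidates, key=lambda order: schwarz(*candidates[order], n_obs))


# Hypothetical fits for three candidate orders (illustrative values only).
candidates = {
    (1, 0): (-100.0, 2),
    (1, 1): (-98.0, 3),
    (2, 2): (-95.0, 5),
}
n_obs = 259  # sample size reported in Section 3.1

best_sc = select_by_sc(candidates, n_obs)
best_aic = min(candidates, key=lambda order: aic(*candidates[order]))
```

Because ln(259) is about 5.56, each extra parameter costs more under the SC than the fixed penalty of 2 under the AIC. In this made-up example the AIC prefers the largest model while the SC prefers the smallest.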
3.3. Nearest Neighbor Method
Fix and Hodges [45] introduced the nearest neighbor rule as a nonparametric method for pattern classification. Later, Cover and Hart [46] formalized the method mathematically. As a nonparametric method, the KNN algorithm requires no explicit assumptions about the underlying data distribution. Besides being a classification tool, the KNN is also used as a forecasting technique that considers the spatial correlation between the points of a phase space to improve short-term prediction.
Bajo-Rubio et al. [47] apply the KNN to the foreign exchange market to show its potential utility not only as a tool for predicting the daily exchange rate but also for deriving buy and sell rules in technical analysis. Fernández-Rodríguez et al. [48] state that the basic idea behind these predictors is that pieces of the time series in the past might resemble pieces in the future.
The KNN algorithm used in this study can be explained through the following steps from Finkenstädt and Kuhbier [49] and Arroyo and Maté [50]:
- (1)
The time series considered, $\{y_t\}_{t=1}^{T}$, is transformed into a series of $d$-dimensional vectors:
$$y_t^{d,\tau} = \left(y_t, y_{t-\tau}, \ldots, y_{t-(d-1)\tau}\right),$$
where $t = 1 + (d-1)\tau, \ldots, T$, with $d$ being the number of lags and $\tau$ being the delay parameter. In the KNN forecasting algorithm, $d$ and $\tau$ are pre-determined parameters.
- (2)
To simplify, we shall only consider the case of $\tau = 1$; then the resulting time series of vectors is denoted by $y_t^{d}$, with $t = d, \ldots, T$, which represents a vector of $d$ consecutive observations that can be characterized as a point in $d$-dimensional space:
$$y_t^{d} = \left(y_t, y_{t-1}, \ldots, y_{t-(d-1)}\right).$$
These $d$-dimensional vectors are often called $d$-histories, whereas the $d$-dimensional space is referred to as the phase space of the time series.
- (3)
The distance between the last vector, also called the focal vector, $y_T^{d}$, and each vector in the time series $y_t^{d}$ with $t = d, \ldots, T-1$ is computed. The distance used in this study is the sum over all dimensions of the absolute differences between the values of the cases ($y_{T-i}$ and $y_{t-i}$ with $i = 0, \ldots, d-1$), also called the Manhattan distance or city block metric.
- (4)
The $k$ vectors closest to $y_T^{d}$ are selected and denoted by $y_{t_1}^{d}, \ldots, y_{t_k}^{d}$. The parameter $k$ is also pre-determined using a selection criterion, generally the $k$ with the lowest sum of squared residuals (SSR).
- (5)
Given the $k$ neighboring vectors $y_{t_1}^{d}, \ldots, y_{t_k}^{d}$, their subsequent values, $y_{t_1+1}, \ldots, y_{t_k+1}$, are averaged to obtain the forecast, $\hat{y}_{T+1} = \frac{1}{k} \sum_{j=1}^{k} y_{t_j+1}$.
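The five steps above can be sketched in a few lines of Python. The function below is an illustrative implementation for the case of a delay of one period: it builds the d-histories, measures the Manhattan distance from each history to the focal (last) history, keeps the k closest, and averages the observations that followed them. The function name `knn_forecast` and the toy series are assumptions for illustration; the values of d and k are passed in rather than selected by SSR.

```python
def knn_forecast(series, d, k):
    """One-step-ahead KNN forecast with Manhattan distance and delay 1.

    series: list of observations ordered in time
    d: number of lags in each history (embedding dimension)
    k: number of nearest neighbors to average
    """
    n = len(series)
    if d < 1 or k < 1 or n - d < k:
        raise ValueError("need at least k candidate histories of length d")

    # Focal vector: the last d observations.
    focal = series[n - d:]

    # Every earlier d-history that has a known subsequent value.
    candidates = []
    for end in range(d - 1, n - 1):
        history = series[end - d + 1:end + 1]
        distance = sum(abs(a - b) for a, b in zip(history, focal))
        candidates.append((distance, series[end + 1]))

    # Keep the k closest histories and average what came next.
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return sum(next_value for _, next_value in nearest) / k


# Toy periodic series (illustrative only): the pattern 1, 2 is followed by 3.
toy_series = [1, 2, 3, 1, 2, 3, 1, 2]
toy_forecast = knn_forecast(toy_series, d=2, k=2)
```

In the toy series, the two histories identical to the focal vector (1, 2) were both followed by 3, so the forecast is 3.0. Setting k equal to the number of candidate histories turns the forecast into a plain average of all subsequent values, which shows why a small k is what lets the method exploit local patterns.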
Thus, the KNN searches for segments with similar dynamic behavior and uses them to produce the forecast. In this sense, the future short-term evolution of the time series is calculated from its historical patterns. To compare the above time-series methods and select the best model, we calculated the average sum of squared residuals (SSR) for the ARMA, GARCH, and KNN methods and selected the model with the lowest value.
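A minimal sketch of this comparison is shown below. It assumes that the "average SSR" is the sum of squared forecast errors divided by the number of forecasts; the text does not state whether the average is taken over observations or over currencies, so that choice is an assumption. The actual values and the three sets of forecasts are hypothetical numbers, not results from the study.

```python
def average_ssr(actual, predicted):
    """Sum of squared residuals divided by the number of forecasts."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def best_model(actual, forecasts):
    """Return the name of the model with the lowest average SSR, plus all scores.

    forecasts maps a model name to its list of predictions.
    """
    scores = {name: average_ssr(actual, preds) for name, preds in forecasts.items()}
    return min(scores, key=scores.get), scores


# Hypothetical log rates and forecasts (illustrative values only).
actual = [0.10, -0.20, 0.05]
forecasts = {
    "ARMA": [0.00, 0.00, 0.00],
    "GARCH": [0.05, -0.05, 0.00],
    "KNN": [0.10, -0.15, 0.05],
}
winner, scores = best_model(actual, forecasts)
```

The same helper can be reused to pre-determine k for the KNN method by scoring forecasts produced with different values of k and keeping the one with the lowest average SSR.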
5. Discussion
We compare the predictions of the short-term market liquidity of the major crypto and fiat currencies by using classical time-series models such as ARMA and GARCH and a nonparametric machine learning algorithm, the KNN approach. We find that the KNN algorithm is a better predictor of the log rate of the bid-ask spreads of crypto and fiat currencies than the ARMA and GARCH models given the nonlinearity of the market liquidity and the complexity of its market microstructure, as stated by Bouoiyour et al. [3].
We also find that the log rates of cryptocurrencies behave differently than those of the fiat currencies in developed markets. However, the short-term prediction (KNN approach) is similar in the emerging markets with fiat currencies when using a wider prediction timespan (GARCH model). The result of the cryptocurrency’s log rates is in accordance with its more complex pattern than the fiat currencies, as mentioned in Gandal and Halaburda [4], due to its time spatial dispersion performance derived from the KNN approach and with the absence of outliers, as noted by the kurtosis statistic.
Considering a classical time-series analysis, ARMA models are better at capturing the short-term liquidity of the fiat currencies in developed countries, whereas GARCH models are better suited for estimating the behavior of the fiat currencies in the emerging market countries because their currencies are more susceptible to sudden changes or unexpected news with a lower probability of following trends. Nevertheless, the KNN approach is better suited to capture the short-term liquidity of cryptocurrencies than the ARMA and GARCH models.
The practical implications of this study are twofold. First, as the number of entities that accept cryptocurrencies increases, this study shows that the KNN approach explains the short-term liquidity of the cryptocurrency market better than traditional time-series models. This ability is important given the speculative nature of investors in the cryptocurrency market. Second, other machine learning models are worth testing so that their results can be compared with those of the KNN approach.
Despite the above results, the study has several limitations. First, the sample used in this study is small because Bloomberg has only recently begun publishing cryptocurrency data, and it is difficult to obtain reliable and complete data, i.e., bid-ask rates. Second, the number of cryptocurrencies analyzed in this paper is limited to three, and according to Ong et al. [52], a large array of cryptocurrency variants, alternative coins, or altcoins is introduced to the market on a daily basis, although the information about cryptocurrencies can be sparse [53].
For future research, other GARCH-type models could be included in the comparison, such as the VAR-BEKK-GARCH model proposed by Loverta and López [24], for time series of log spreads. It is also possible to consider, besides KNN, other intelligent algorithms for analyzing the behavior of cryptocurrencies, for example, the Support Vector Machine (SVM) and the Long Short-Term Memory (LSTM) model applied by Saadah and Whafa [35]. In addition, another possible direction for future research is a systematic quantitative analysis of the popularity and impact of cryptocurrencies and of how they spread at a macro level, as suggested by Park and Park [54].
Finally, in this field it is necessary to deal with the interplay between liquidity, returns, and volatility in the context of long-term memory while at the same time using more nonparametric methods, such as machine learning, that may be better suited to deal with the microstructure complexity of the cryptocurrency markets.