Bitcoin is currently the most prominent cryptocurrency, with a market capitalization of approximately $116 billion (as of 26 August 2018). It was created in 2009 by Satoshi Nakamoto (whose identity remains unknown) with the purpose of having “a system for electronic transactions without relying on trust”. However, the nascent literature on bitcoin seems to find converging evidence that people use bitcoin and other cryptocurrencies more as a speculative asset than as a medium of exchange (see, e.g., Baur and Dimpfl 2018; Baur et al. 2017; Cermak 2017; Glaser et al. 2014). Nevertheless, Dimpfl (2017) finds that the adverse selection component of the spread in bitcoin trading is high. This raises an intriguing question about what constitutes private information in the bitcoin market and whether the price discovery process there differs from that of other asset classes.

For equities and fixed income assets, we have the notion of a fundamental price process. What about bitcoin? Reasonable arguments have been put forward questioning the intrinsic value of the cryptocurrency, some even claiming that it follows a purely speculative price bubble process.¹ Yet there is information that should affect the price of bitcoin. For example, the hacking of Mt. Gox clearly undermined the notion that bitcoin is immune to cyber attacks.² Likewise, the ban of bitcoin in countries such as China and South Korea (major hubs for bitcoin mining and trading activities) should matter as well, because it affects the future value of bitcoin as a borderless medium of exchange. How does such information get impounded in the price process of bitcoin? The answer to this question ultimately boils down to how informed agents trade in this market.

The purpose of our paper is to shed light on the microstructure of one of the major bitcoin exchanges, BTC-e. Based on a rich dataset offering a full view of BTC-e’s limit order book across 150 price levels on each side of the market at sub-second frequency, we are able to measure the extent to which information shows up at the various layers of the book (in addition to trades). The data also allow us to see how liquidity shifts across the book and to test important implications of adverse selection for market liquidity, differentiating among various theories of limit order book markets. With nearly ten months of trading, from 7 December 2013 to 24 September 2014, we are able to characterize the intraday behavior and time series variation of volume, volatility, and liquidity on this important bitcoin exchange.³

Our empirical analysis is based on the Ricco et al. (2018) model of a dynamic limit order market with asymmetric information and non-Markovian learning (that is, traders condition their trading and quoting decisions not only on the current state of the limit order book but also on the history leading to the current state). The bitcoin market is well suited to testing this model, given the high level of adverse selection documented by Dimpfl (2017). In addition, the non-Markovian learning feature of the model appears fitting for the bitcoin market, especially when fundamental information is absent or multiple price equilibria may occur (see Sockin and Xiong 2018 for further discussion). In such settings, trade and order history become an even more important source of information. Based on this model, we develop and test three main hypotheses.

The first hypothesis explores the link between orders’ informativeness and aggressiveness, which can shed light on the order strategies pursued by informed traders in this market. Traditional microstructure theories such as Kyle (1985) assume that informed traders exploit their information advantage through market orders, the most aggressive type of order, while recent work such as O’Hara (2015) argues that informed traders might prefer to hide information in various layers of the limit order book (i.e., to use less aggressive orders). The Ricco et al. (2018) model suggests that the relationship between order informativeness and aggressiveness is more nuanced: it depends on the size of the information shock. For large information shocks, informed traders are more likely to trade and quote aggressively to realize the value of their information quickly. For small information shocks, on the other hand, the value of the information might not be sufficient to offset the cost of crossing the spread, inducing informed traders to use less aggressive orders for price improvement.

We find that, on average, there is a positive relationship between order informativeness and aggressiveness up to the 10th level of the book. Beyond this level, however, the information content of an order is greater the deeper it is in the book, a finding consistent with the actions of informed traders with noisy signals. We then test how this relationship changes in large value shock and small value shock environments. Our hypothesis states that in the former, the relationship between informativeness and aggressiveness becomes more positive (or less negative in the outer region of the book, where the relationship is negative). The parallel hypothesis for the low value shock environment is that the positive (negative) relationship between informativeness and aggressiveness weakens (strengthens) as informed traders resort to less aggressive orders for price improvement.

Our results support these hypotheses. On days with large information shocks, limit orders at or near the market (the most aggressive limit orders) become significantly more informative, whereas limit orders farther away from the market (less aggressive limit orders) become significantly less informative. In a low volatility environment, however, the most aggressive order types, namely marketable orders and inside limit orders, become less informative, while the less aggressive limit orders residing between levels 6 and 50 become significantly more informative. Our findings support the idea that the size of the information shock matters for informed traders’ order strategies.

The second hypothesis is concerned with how adverse selection affects market liquidity. Whereas the first hypothesis uncovers the strategies of informed traders, market liquidity, the subject of the second hypothesis, is the outcome of the interaction between informed and uninformed traders, because in a limit order market liquidity is supplied by the participants themselves. How adverse selection influences this outcome is an empirical question.

Ricco et al. (2018) differentiate between adverse selection that is due to an increased fraction of informed traders (holding the shock size fixed) and adverse selection that stems from the occurrence of a large shock. In the former case, the increased fraction of informed traders does not change the informed traders’ strategies but drives uninformed liquidity away from the market, resulting in a liquidity reduction. The latter case is more complex, because a large value shock can induce informed traders to trade and quote aggressively, that is, an inward migration of informed liquidity. At the same time, due to increased pick-off risk, uninformed liquidity is expected to move outward. Accordingly, the net outcome could be either an increase or a decrease in market liquidity.

To test the second hypothesis, we use the measure developed by Cont et al. (2014) to proxy for the amount of adverse selection at the hourly frequency. We use the slope of the limit order book to capture market liquidity along both the quantity and price dimensions. Most importantly, the slope can capture the migration of liquidity toward or away from the market and thus allows us to test the theory discussed above as closely as possible. This is a novel feature of our exercise, made possible only by the full limit order book data used in this paper. We find that in a normal environment, increased adverse selection is associated with a flatter slope, as limit orders migrate to more conservative prices. In a large value shock environment, the slope becomes even flatter, suggesting a significant reduction in liquidity. In contrast, the limit order book slope steepens significantly in a low value shock environment, making it easier and less costly to trade. These results provide empirical evidence that furthers our understanding of the Ricco et al. (2018) model.
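For readers unfamiliar with the Cont et al. (2014) measure, the order flow imbalance (OFI) on which it rests can be sketched as follows. This is a minimal illustration, assuming best-quote snapshots of the form (bid price, bid size, ask price, ask size) and a price impact coefficient estimated as the OLS slope of mid-quote changes on OFI across sub-intervals of the estimation window; a larger coefficient means a given imbalance moves the mid-quote more. The function names are hypothetical, and the interval lengths and exact aggregation used in this paper may differ.

```python
from typing import Sequence, Tuple

# Best-quote snapshot: (bid_price, bid_size, ask_price, ask_size)
Quote = Tuple[float, float, float, float]


def order_flow_imbalance(quotes: Sequence[Quote]) -> float:
    """Sum of Cont-Kukanov-Stoikov order flow contributions over consecutive snapshots."""
    ofi = 0.0
    for (pb0, qb0, pa0, qa0), (pb1, qb1, pa1, qa1) in zip(quotes, quotes[1:]):
        e = 0.0
        # Bid side: a higher or unchanged bid adds the new depth,
        # a lower or unchanged bid removes the old depth.
        if pb1 >= pb0:
            e += qb1
        if pb1 <= pb0:
            e -= qb0
        # Ask side: mirror image, since ask depth represents selling interest.
        if pa1 <= pa0:
            e -= qa1
        if pa1 >= pa0:
            e += qa0
        ofi += e
    return ofi


def price_impact_coefficient(ofi: Sequence[float], mid_changes: Sequence[float]) -> float:
    """OLS slope (with intercept) of mid-quote changes on OFI across sub-intervals."""
    n = len(ofi)
    if n < 2 or n != len(mid_changes):
        raise ValueError("need at least two paired observations")
    mx = sum(ofi) / n
    my = sum(mid_changes) / n
    sxx = sum((x - mx) ** 2 for x in ofi)
    sxy = sum((x - mx) * (y - my) for x, y in zip(ofi, mid_changes))
    return sxy / sxx
```

In a typical use, each hour would be split into short sub-intervals, `order_flow_imbalance` computed for each, and the hourly coefficient estimated from the resulting pairs.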

The third hypothesis directly tests the assumption of non-Markovian learning in financial markets made by Ricco et al. (2018). If price discovery is non-Markovian, lagged market variables should have incremental explanatory power over and above that of the current state variables. In other words, traders learn not only from the current state of the limit order market but also from the path leading to that state. Based on predictive regressions of the hourly price impact measure on current state variables and their 24-h histories, we find no concrete evidence that order book history has significant predictive power beyond that of the current state variables. One important caveat of this exercise is that we examine only the linear form of dependency, so our evidence permits only the qualified conclusion that price discovery does not depend linearly on the history of the individual state variables. Our findings do not rule out the possibility that the dependency takes a non-linear form or involves some combination of the state variables. That said, the lack of evidence for non-Markovian learning in the linear form suggests that the Markovian learning assumption typically adopted in limit order book models might not be unreasonable. This assumption is important because it allows theorists to simplify the state space significantly. Our results here provide some initial empirical support for this assumption in modeling limit order markets. Furthermore, as noted earlier, some bitcoin pricing models suggest that multiple price equilibria may occur; see, in particular, Sockin and Xiong (2018). In their model, multiple equilibria occur endogenously and relate to the future trading benefits of the cryptocurrency. Although this is only one of several theoretical bitcoin pricing models that have been proposed, our empirical analysis suggests that such pricing behavior might be well approximated by a Markovian regime-switching model.

Our study contributes, first and foremost, to the nascent literature on bitcoin in general and on the microstructure of secondary market trading in this cryptocurrency in particular. Bitcoin is traded around the clock on many exchanges globally, all of which operate as limit order markets. As in any limit order market, the key questions of interest are how informed traders choose their strategies and how the uninformed learn value-relevant information from market observables. Even though these questions are relatively well researched across many asset markets, bitcoin is an interesting and unique asset, in part because its asset pricing theory is still in flux. It is not clear what constitutes fundamental information underlying the intrinsic value of bitcoin. Furthermore, as argued by Zimmerman (2018), bitcoin is extremely volatile and attracts speculative trading due to its unique blockchain structure. Accordingly, it is possible that price dynamics are sentiment-driven and that traders’ strategies differ from what we know based on prior theories and empirical work. It is an interesting question whether these unique features of bitcoin could upend the predictions of current theories of limit order markets.

The paper also contributes to the literature on limit order markets. Our paper is based on full limit order book data, which permits a study of information content and liquidity across the whole book, not just the top tier or the top few tiers as is commonly done in the literature. Because we observe a more complete state space and action space, our evidence provides a fuller view of the order strategies of informed agents and of market liquidity. Our evidence can therefore be helpful for developing theoretical models that capture the dynamics of a limit order market more realistically, as well as for interpreting such models. An additional advantage of using the bitcoin market as an empirical lab for testing limit order market models is that it is relatively free from market frictions (e.g., free entry, no minimum order size or tick size), a common modeling assumption that usually does not hold in more traditional asset markets.

The paper proceeds as follows. We review the literature on the microstructure of the bitcoin market and derive testable hypotheses in Section 1. We then describe our data and provide summary statistics in Section 2. Section 3 examines the link between order aggressiveness and informativeness in relation to the size of the information shock, providing evidence on the order strategies of informed traders. Section 4 analyzes the relationship between adverse selection and market liquidity. In Section 5, we discuss our tests of the non-Markovian learning property. Finally, Section 6 concludes.

## 1. Related Literature and Testable Hypotheses

Our paper belongs to the small but growing literature that focuses on the microstructure of the secondary market for bitcoin and determinants of bitcoin returns. We review some of the relevant papers in the first subsection. In the second subsection, we state the hypotheses of direct interest that are subsequently tested in our empirical analysis.

#### 1.1. Related Literature

Balcilar et al. (2017) analyze the relationship between trading volume and bitcoin returns and find that volume can predict returns, except in bear and bull markets, but cannot predict volatility. Eross et al. (2017) find that volume, bid-ask spread, and volatility all have n-shaped patterns, and conclude that European and North American traders are the main drivers of trading activity in this market. Polasik et al. (2015) show that bitcoin returns are driven primarily by bitcoin’s popularity, public sentiment, and the total number of transactions. Dimpfl (2017) finds that the adverse selection component of the spread is high. Caporale and Plastun (2019) examine the day-of-the-week effect in cryptocurrency markets and find no evidence of this anomaly in most cryptocurrencies except bitcoin, whose returns on Mondays are significantly higher than on other days. They view this finding as evidence against the efficiency of the bitcoin market. Detzel et al. (2018) find that bitcoin returns are predictable by 5- to 100-day moving averages of its prices but largely unpredictable by macroeconomic variables. They explain this finding with an equilibrium model of rational learning in the absence of fundamental cash flow information. Makarov and Schoar (2018) document large and persistent arbitrage opportunities across cryptocurrency exchanges, a result they attribute to capital controls that prevent arbitrage capital from moving freely across borders.

Also related are studies that focus on measuring volatility and understanding the dynamics of jumps in bitcoin prices. Conrad et al. (2018) use the GARCH-MIDAS model to extract the long- and short-term components of bitcoin’s volatility and explore potential macroeconomic determinants of its long-term volatility. Of the many macroeconomic variables considered, only four have significant effects on bitcoin’s long-term volatility (i.e., U.S. stock market volatility, the U.S. volatility risk premium, global economic activity, and bitcoin’s own trading volume). Scaillet et al. (2018) find that jumps occur frequently on Mt. Gox, a major bitcoin exchange before its bankruptcy in March 2014. They show that order flow imbalance and the widening of the bid-ask spread can predict jumps, which in turn have a short-term positive impact on market activity and induce persistent price changes.

With bitcoin viewed as an asset (albeit a speculative one) and traded on exchanges that operate as limit order markets, its secondary market is similar in many respects to that of other financial instruments. Thus, our study also belongs to the large literature that models and studies price discovery, liquidity provision, and order strategies in limit order markets. The differences between bitcoin and other financial assets, however, can deliver important insights into the key drivers of market dynamics. We draw upon the theoretical model of Ricco et al. (2018) (henceforth RRS2018) to empirically study the link between information asymmetry and order strategies, which in turn has important implications for the price discovery process. We discuss this model and derive our testable hypotheses in the next subsection.

Finally, it is important to note that alongside the literature on cryptocurrency trading there is a more established literature on blockchain technology, the smart contracts underlying the working of distributed ledgers, and the organization and economics of mining and verifying transactions (see Malinova and Park 2017 for a thorough discussion of this literature). Pagnotta and Buraschi (2018) develop the first equilibrium model of bitcoin value in a decentralized financial network, based on the interaction between agents’ demand for the censorship resistance of transfers and the ability to engage in trustless exchanges, on the one hand, and miners’ endogenous supply of hashing power, on the other. The model thus provides important insights into what can serve as the “fundamentals” of bitcoin value. Zimmerman (2018) provides a model explaining how limited blockchain capacity gives rise to the extreme price volatility and the disproportionate extent of speculative trading in bitcoin, linking the “primary market” to important secondary market features of cryptocurrencies. Our paper, by contrast, is on the microstructure of the secondary market where bitcoin is traded, focusing instead on the interaction of liquidity providers and demanders and how such interaction affects bitcoin returns.

#### 1.2. Testable Hypotheses

The main objective of this paper is to study the extent to which limit order book activities contribute to the dynamics of the bitcoin price. Our empirical analysis builds upon the RRS2018 model, which extends the framework developed in Kyle (1985) to a limit order market in which informed traders can exploit their information advantage through either market orders or limit orders at varying prices. The model features informed traders, who have perfect information on the value shock that is eventually realized in the final trading round, and uninformed traders, who use Bayes’ rule and observable market dynamics over time to learn about the value shock. RRS2018 characterize the optimal order submission strategies of informed and uninformed traders at each point in time, conditioning on the path of the limit order book up to that point. As in other theories of limit order markets, the key trade-off is between price improvement and trading immediacy. Whether information asymmetry arises from a large value shock or from a greater fraction of informed traders can lead to different effects of adverse selection on market liquidity and price discovery. The model provides three important empirical predictions.

First, when information asymmetry arises from a large value shock, informed traders are more likely to increase their trading and quoting activities at or near the market, because those orders have a higher execution probability and the private information is large enough that such executions are profitable. However, this substantially increases the pick-off risk for the uninformed, inducing them to refrain from trading or quoting too close to the market. Accordingly, in this environment, the informativeness of more aggressive orders is higher. On the other hand, in a low volatility environment, the information advantage is too small to outweigh the costs of trading or quoting aggressively. Hence, informed traders are more likely to post orders further out in the book for price improvement. In an experimental study, Bloomfield et al. (2005) also find that informed traders are more likely to take liquidity when the value shock is large, but switch to providing liquidity when the value shock is small.

If the channel discussed above reflects informed traders’ behavior in this bitcoin market, we expect to find:

**Hypothesis** **1A.** The information content of limit orders at or near the market increases, while the information content of limit orders at outer tiers decreases, in a high volatility environment.

**Hypothesis** **1B.** The information content of limit orders at or near the market decreases, while the information content of limit orders at outer tiers increases, in a low volatility environment.

Second, while the prediction on the informativeness of market orders and of limit orders at the inside and outside tiers is unambiguous, the effect of adverse selection on market liquidity is more nuanced. Theoretical work in this area provides different predictions, depending on how adverse selection is defined and which measure of liquidity is used. For example, in Rosu (2019), adverse selection is defined by the fraction of informed traders in the market; the higher this fraction, the faster information is incorporated into prices and thus the higher the market liquidity, as measured by a narrower bid-ask spread. In Goettler et al. (2009), adverse selection arises from innovations in the asset’s fundamentals. It reduces liquidity at the best quote but increases liquidity behind the best quote, because informed agents use marketable orders while other agents submit more conservative limit orders.

The RRS2018 model considers both types of adverse selection: one driven by the size of the value shock and the other by the fraction of informed traders in the market. However, their results are based on numerical analysis and could depend on the specific parameters adopted. They find that when adverse selection increases because of a larger fraction of informed traders, holding the value shock size fixed, the liquidity provision strategy of the informed does not change, while uninformed traders reduce their trading and quoting activities near the market in response to the increased level of adverse selection. This situation results in an unambiguous net reduction in liquidity at or near the market, but increased liquidity at more conservative price levels, a result opposite to that of Rosu (2019). RRS2018’s numerical results for the case in which adverse selection increases due to a large value shock indicate that liquidity at or near the market could improve. This is because informed traders are likely to increase their trading and aggressive limit order activities, thereby raising liquidity provision at more aggressive price levels, which in turn could outweigh the reduction in uninformed liquidity. It remains an empirical question whether the net change is an increase (decrease) in liquidity flowing toward the market, implying a steepening (flattening) of the order book slope. The above discussion gives rise to the second set of hypotheses:

**Hypothesis** **2A.** Adverse selection due to a large value shock can improve liquidity and be associated with a steeper slope of the limit order book.

**Hypothesis** **2B.** Adverse selection due to an increased fraction of informed traders reduces liquidity, i.e., is associated with a flatter slope of the limit order book.

Finally, the model predicts that order history carries information beyond that conveyed by the current state of the limit order book. This arises because uninformed traders use Bayesian updating to learn about the value shock. The path that leads the market to its current state therefore has important implications for the price impact of order flow. If so, we expect lagged limit order book variables to have incremental explanatory power for the price impact of order flow after controlling for the most current state of the limit order book. We thus have the third hypothesis:

**Hypothesis** **3.** Lagged order book variables have significant explanatory power for the price impact of order flow over and above the most current state of the limit order book.

We focus specifically on testing for the linear form of history dependence, that is, lagged order book variables enter a linear regression linking price impact with current order book state. It is important to note that a failure to reject the null does not invalidate the non-Markovian property of learning as featured in this model. All we can conclude from this test is whether price discovery in this market is linear non-Markovian. It is still plausible that the history dependency takes on some non-linear form, or that the history of the variables considered is not the one that matters. The RRS2018 model is still under development with respect to which types of histories are important. Nevertheless, studying one set of order book variables and one specific form of history dependence is the first important step toward understanding the nature of learning in the market, and provides useful empirical evidence for further theory development.
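In its linear form, Hypothesis 3 amounts to comparing two nested regressions of price impact on the order book state. A minimal sketch of how the hourly sample might be aligned is given below, assuming a single state variable `s` and a price impact series `y` observed at the same hourly timestamps: the restricted (Markovian) model uses an intercept and the current value s_t, while the unrestricted model adds the 24 hourly lags s_{t-1}, ..., s_{t-24}. The helper `lagged_design` is hypothetical and is not the authors' code.

```python
from typing import List, Sequence, Tuple


def lagged_design(
    state: Sequence[float], target: Sequence[float], n_lags: int = 24
) -> Tuple[List[List[float]], List[List[float]], List[float]]:
    """Build aligned design matrices for the restricted and unrestricted models.

    Restricted rows:   [1, s_t]
    Unrestricted rows: [1, s_t, s_{t-1}, ..., s_{t-n_lags}]
    The first n_lags observations are dropped so every row has a full history.
    """
    if len(state) != len(target):
        raise ValueError("state and target must have the same length")
    X_restricted: List[List[float]] = []
    X_unrestricted: List[List[float]] = []
    y: List[float] = []
    for t in range(n_lags, len(state)):
        current = [1.0, float(state[t])]
        history = [float(state[t - k]) for k in range(1, n_lags + 1)]
        X_restricted.append(current)
        X_unrestricted.append(current + history)
        y.append(float(target[t]))
    return X_restricted, X_unrestricted, y
```

Both models would then be fitted by OLS on the same rows and compared, for instance with a partial F-test on the 24 lag coefficients; with several state variables, a lag block is appended for each one.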

## 2. Data

We use data from BTC-e, a digital currency trading platform founded in July 2011. It operated as a limit order market in which users could trade with one another via market or limit orders. The platform allowed trading between several cryptocurrencies (bitcoin, litecoin, namecoin, novacoin, peercoin, dash, and ethereum) and three fiat currencies (U.S. dollar (USD), Russian ruble, and euro), but in this paper we focus on the bitcoin–USD pair. The platform charged a flat fee of 0.2% on all transactions. It was seized by U.S. authorities in July 2017 on international money laundering charges. During its time of operation, BTC-e was one of the largest cryptocurrency exchanges in the world, serving approximately 700,000 users, including a large customer base in the U.S.⁴ BTC-e was operated by a complex web of entities located in Russia, Bulgaria, and elsewhere, but its servers were located in the U.S.

#### 2.1. The Data

The data was collected through direct API access provided by BTC-e to query transaction history and limit order book snapshots.⁵ Data collection began on 6 December 2013 and ended on 25 September 2014 (we exclude these two end dates from the sample because their data do not cover the full 24-h day). This period spans several major events in the cryptocurrency world, most notably the collapse of Mt. Gox in February and March 2014. Appendix A provides a list of news articles covering important events in the cryptocurrency world in general and bitcoin in particular.

During the sample period, two computers simultaneously pinged BTC-e’s servers approximately every 0.1 s, took snapshots of the limit order book, and downloaded the most recent 150 transactions. The two computers ran in parallel, so that if one computer was down, the other could continue downloading, ensuring completeness. We unify these two separate datasets and clean them as follows.

For transaction history data, each trade record has a unique order ID, so it is straightforward to merge the two transaction histories and eliminate duplicates. Transaction history data contain the following variables: transaction date-time stamp (one-second resolution), transaction price, quantity transacted, trade type (“ask” for seller-initiated and “bid” for buyer-initiated trades), and the order number. It is important to note that a large market order, upon arrival, might execute against multiple limit orders, each of which appears as a separate record in the transaction history data and is given an order number. As in the literature, we aggregate multiple records that likely belong to the same market order into a single trade. We identify a sequence of trade records as belonging to a “parent” market order if (1) they span at most two consecutive seconds, (2) they have the same trade type (i.e., buy or sell), (3) they occur at non-decreasing prices (for buys) or non-increasing prices (for sells), reflecting the fact that as a market order walks deeper into the book, prices become sequentially worse, and (4) their order numbers are sequential, with no gap between adjacent orders’ numbers (e.g., 12345678L, 12345679L, 12345680L).⁶ We then aggregate each identified sequence of trade records to the parent order level by summing the traded quantities and aggregating the prices in two ways: (1) the price of the last trade record in the sequence and (2) the volume-weighted average price. We refer to aggregated orders as trades; this is the unit of analysis of trading activity in the rest of the paper. As reported later in the paper, the trade size is still very small, likely because traders slice their large orders into multiple smaller ones before sending them to the exchange. Our data do not contain trader IDs, which would be needed to identify and aggregate orders at that level.
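The four grouping rules and the two price aggregations can be sketched in Python. This is a minimal illustration under stated assumptions: records are already de-duplicated, rule (1) is read as the last time stamp in a group being at most one second after the first, and the record layout (`Record`, `Trade`) and function names are hypothetical rather than taken from the authors' code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    ts: int          # time stamp in whole seconds
    price: float
    qty: float
    side: str        # "bid" = buyer-initiated, "ask" = seller-initiated
    order_id: int


@dataclass
class Trade:
    side: str
    qty: float         # summed quantity of all fills
    last_price: float  # price of the last fill in the sequence
    vwap: float        # volume-weighted average price of the fills
    n_fills: int


def _continues(group: List[Record], r: Record) -> bool:
    """Check rules (1)-(4) for appending record r to the current group."""
    prev = group[-1]
    if r.ts - group[0].ts > 1:                    # (1) at most two consecutive seconds
        return False
    if r.side != prev.side:                       # (2) same trade type
        return False
    if r.side == "bid" and r.price < prev.price:  # (3) buys walk up the ask side
        return False
    if r.side == "ask" and r.price > prev.price:  # (3) sells walk down the bid side
        return False
    return r.order_id == prev.order_id + 1        # (4) no gap in order numbers


def aggregate_trades(records: List[Record]) -> List[Trade]:
    """Group fills into parent market orders and aggregate each group."""
    groups: List[List[Record]] = []
    for r in sorted(records, key=lambda rec: rec.order_id):
        if groups and _continues(groups[-1], r):
            groups[-1].append(r)
        else:
            groups.append([r])
    trades: List[Trade] = []
    for g in groups:
        qty = sum(r.qty for r in g)
        vwap = sum(r.price * r.qty for r in g) / qty
        trades.append(Trade(g[0].side, qty, g[-1].price, vwap, len(g)))
    return trades
```

For example, three consecutive buyer-initiated fills at 500.0, 501.0, and 501.0 within one second collapse into a single trade with a last price of 501.0 and a volume-weighted price between the two.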

The merging and cleaning of the two limit order book snapshot datasets is more involved. It is important to ensure the correct time sequencing of snapshots, which is challenging because the latency between each downloading computer and BTC-e’s servers can vary and time stamps have only one-second resolution. The data downloading algorithm captures the start time of a request and the time it takes to finish downloading a snapshot. Thus, each snapshot in each dataset has lower and upper time bounds, based on which we can conservatively fill gaps in one dataset with available snapshots from the other. After this step, we obtain a combined dataset containing snapshots of the limit order book with 150 price levels on each side of the market at an ultra-high frequency.⁷ We remove a snapshot as a duplicate when neither the prices nor the quantities change anywhere in the book. We refer to the resulting dataset as the tick-frequency limit order book data.
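The gap-filling and de-duplication steps might look as follows. This is a sketch under an assumed rule, not necessarily the authors' exact procedure: a snapshot from the secondary dataset is inserted only if its [lower, upper] time interval lies strictly between the intervals of the neighboring primary snapshots, so that its position in the sequence is unambiguous; a snapshot is then dropped if its book is identical to the preceding one. The type aliases and function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple

Level = Tuple[float, float]                         # (price, quantity)
Book = Tuple[Tuple[Level, ...], Tuple[Level, ...]]  # (bids, asks)
Snapshot = Tuple[float, float, Book]                # (lower bound, upper bound, book)


def merge_snapshots(primary: List[Snapshot], secondary: List[Snapshot]) -> List[Snapshot]:
    """Add secondary snapshots whose time interval fits strictly between primary ones."""
    primary = sorted(primary, key=lambda s: s[0])
    starts = [s[0] for s in primary]
    merged = list(primary)
    for snap in secondary:
        lo, hi, _ = snap
        # Index of the first primary snapshot whose lower bound is >= lo.
        i = bisect_left(starts, lo)
        prev_hi = primary[i - 1][1] if i > 0 else float("-inf")
        next_lo = primary[i][0] if i < len(primary) else float("inf")
        if prev_hi < lo and hi < next_lo:  # no overlap, so ordering is unambiguous
            merged.append(snap)
    merged.sort(key=lambda s: s[0])
    return merged


def drop_unchanged(snapshots: List[Snapshot]) -> List[Snapshot]:
    """Remove a snapshot when no price or quantity differs from the previous one."""
    kept: List[Snapshot] = []
    for snap in snapshots:
        if not kept or snap[2] != kept[-1][2]:
            kept.append(snap)
    return kept
```

Rejecting any secondary snapshot that overlaps a primary interval errs on the side of discarding data rather than risking a mis-ordered sequence, which is one way to read "conservatively".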

To see the extent to which the price on BTC-e differs from that on other exchanges, we plot in Figure 1 the evolution of the best mid-quote on BTC-e at the one-minute frequency and the daily closing price collected by CoinMarketCap, a cryptocurrency market tracking company. Except for the earlier part of the sample period, the price on BTC-e aligns quite closely with the general price level in the market. In fact, Brandvold et al. (2015) find that BTC-e, together with Mt. Gox, leads price discovery in a study of seven bitcoin exchanges that collectively account for 90% of bitcoin trades over the period from April 2013 to February 2014. Dimpfl (2017) also finds that BTC-e is the most liquid market for trading bitcoin in U.S. dollars, based on data from November 2016 to January 2017.

#### 2.2. The Microstructure of BTC-e

#### 2.2.1. The Limit Order Book

We plot in Figure 2 the average shape of the limit order book, indicating the percentage of cumulative depth up to the 150th tier. The x-axis shows the distance (in cents) from each price level to the mid-quote. For reference, the plot includes a solid line showing cumulative depth in an equally distributed book. The distribution is convex on both sides, suggesting that relatively more depth is available the further out the price level. The plot also reveals a difference in how depth is distributed on the two sides: on the ask side, depth is more concentrated at or near the market than on the bid side. To supplement this plot, Panel A of Table 1 reports specific depth statistics at select tiers. These statistics further indicate that depth is quite spread out across the book, totaling about 320 coins across 150 price levels on each side, or slightly above 2 coins per tier on average. At the inside tier, there are about 4.1 bitcoins on the ask side and only 2.6 on the bid side.
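The cumulative depth curves can be sketched in a few lines of Python. The sketch indexes by tier rather than by distance in cents from the mid-quote (a simplification relative to Figure 2), and the function names are hypothetical. A curve lying below the equal-distribution benchmark corresponds to the convex shape described above, with relatively more depth further out.

```python
from typing import List, Sequence


def cumulative_depth_share(depth_by_tier: Sequence[float]) -> List[float]:
    """Cumulative share of total depth, from the inside tier (index 0) outward."""
    total = float(sum(depth_by_tier))
    if total <= 0:
        raise ValueError("total depth must be positive")
    shares: List[float] = []
    running = 0.0
    for q in depth_by_tier:
        running += q
        shares.append(running / total)
    return shares


def equal_depth_benchmark(n_tiers: int) -> List[float]:
    """Cumulative share if depth were spread equally across n_tiers."""
    return [(k + 1) / n_tiers for k in range(n_tiers)]
```

For a toy book with depths [1, 1, 2], the cumulative shares are [0.25, 0.5, 1.0], below the benchmark [1/3, 2/3, 1], so depth is tilted toward the outer tier.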

We report the average cost of immediacy, as measured by the relative spread at various price tiers, in Panel B of Table 1. The spread at a given tier is computed as the difference between the price at that tier and the best price on the opposite side, standardized by the mid-quote. The inside bid-ask spread averages 19.5 bps, comparable to the bid-ask spread of a large stock. Adding the flat transaction fee of 20 bps charged by BTC-e brings the total transaction cost to roughly 40 bps.

We also report the volume-weighted spread, computed as the difference between the depth-weighted average of prices up to a given tier and the best price on the opposite side. The simple spreads on the bid side up to the 20th tier are smaller, indicating that these bid orders are placed slightly closer to the market than their ask counterparts. However, taking into account their lower quantities (as shown in Panel A), the volume-weighted spread on the bid side is equal to or greater than that on the ask side. Furthermore, beyond the 50th tier, bid limit orders are placed further from the market than the corresponding ask limit orders. Taken together, these summary statistics indicate that the cost of immediacy for a bitcoin buyer (who takes the asks) is slightly lower than that for a seller (who hits the bids); that is, it is easier to buy than to sell immediately, especially in large quantities.
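The tier spread and the volume-weighted spread defined above can be computed as follows. This sketch assumes each side of the book is a list of `(price, quantity)` pairs ordered best first, and it assumes that the volume-weighted spread is standardized by the mid-quote in the same way as the simple spread; the function names and the toy book are hypothetical.

```python
def mid_quote(asks, bids):
    """Midpoint of the best ask and best bid; books are lists of (price, qty), best first."""
    return 0.5 * (asks[0][0] + bids[0][0])


def tier_spread_bps(asks, bids, tier, side):
    """Spread (bps) between the price at `tier` (1-based) and the best opposite price."""
    mid = mid_quote(asks, bids)
    if side == "ask":
        gap = asks[tier - 1][0] - bids[0][0]
    else:
        gap = asks[0][0] - bids[tier - 1][0]
    return 1e4 * gap / mid


def vw_spread_bps(asks, bids, tier, side):
    """Spread (bps) using the depth-weighted average price of tiers 1..tier on one side."""
    mid = mid_quote(asks, bids)
    levels = (asks if side == "ask" else bids)[:tier]
    depth = sum(qty for _, qty in levels)
    vwap = sum(price * qty for price, qty in levels) / depth
    gap = vwap - bids[0][0] if side == "ask" else asks[0][0] - vwap
    return 1e4 * gap / mid


toy_asks = [(500.5, 1.0), (501.0, 2.0)]
toy_bids = [(499.5, 1.0), (499.0, 3.0)]
print(tier_spread_bps(toy_asks, toy_bids, 2, "ask"))  # 30.0
print(vw_spread_bps(toy_asks, toy_bids, 2, "ask"))    # about 26.67
```

The volume-weighted ask spread up to tier 2 is below the tier-2 spread here because the weighted average also includes the better-priced first tier; how far below depends on how quantity is spread across the tiers, which is why the two measures can rank the bid and ask sides differently.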

We examine the time series variation of depth over the sample period (plot available upon request) and find that ask depth appears to fluctuate more than bid depth. Furthermore, there are several spikes in ask depth not accompanied by similar spikes in bid depth, especially at the top 5 price levels. This indicates that most occasions of significant market imbalances are due to excessive selling pressure on the ask side of the book. Perhaps most interestingly, bid depth often leads ask depth. Granger-causality tests with up to 10 lags on the daily average bid and ask depth at the top five tiers and across the whole book all indicate that depth on the bid side systematically leads depth on the ask side. However, this lead-lag relationship prevails only at the daily frequency and not at intraday frequencies.
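A bare-bones version of the Granger-causality F test mentioned above can be written without external libraries. This sketch assumes equal-length series and a constant term, and it returns only the F statistic; a statistics package (for example, `grangercausalitytests` in statsmodels) would also report p-values. The "depth" series in the usage lines are synthetic, not BTC-e data, and the example uses 2 lags rather than the paper's up to 10.

```python
import random


def ols_rss(y, X):
    """Residual sum of squares from an OLS fit of y on the columns of X (rows = observations)."""
    k = len(X[0])
    # Augmented normal equations [X'X | X'y], one row per coefficient.
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)]
         + [sum(row[a] * yi for row, yi in zip(X, y))]
         for a in range(k)]
    # Gaussian elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(c + 1, k):
            factor = A[r][c] / A[c][c]
            for j in range(c, k + 1):
                A[r][j] -= factor * A[c][j]
    # Back substitution for the coefficient vector.
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (A[r][k] - sum(A[r][j] * beta[j] for j in range(r + 1, k))) / A[r][r]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(X, y))


def granger_f(y, x, p):
    """F statistic for H0: lags 1..p of x add nothing to a regression of y on its own p lags."""
    rows = range(p, len(y))
    restricted = [[1.0] + [y[t - lag] for lag in range(1, p + 1)] for t in rows]
    unrestricted = [r + [x[t - lag] for lag in range(1, p + 1)]
                    for r, t in zip(restricted, rows)]
    target = [y[t] for t in rows]
    rss_r = ols_rss(target, restricted)
    rss_u = ols_rss(target, unrestricted)
    dof = len(target) - (2 * p + 1)  # observations minus unrestricted parameters
    return ((rss_r - rss_u) / p) / (rss_u / dof)


# Synthetic example: "ask depth" responds to the previous day's "bid depth", not vice versa.
random.seed(0)
bid = [random.gauss(0.0, 1.0) for _ in range(300)]
ask = [0.0] + [0.8 * bid[t - 1] + 0.3 * random.gauss(0.0, 1.0) for t in range(1, 300)]
print(granger_f(ask, bid, p=2))  # large: bid lags help predict ask
print(granger_f(bid, ask, p=2))  # small: ask lags do not help predict bid
```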

#### 2.2.2. Trading Activities and Market Volatility

As reported in Table 2, trades on the BTC-e platform occur frequently, with nearly 13,000 trades per 24-h trading day on average, for a total of 5626 coins bought and 5752 coins sold. On average, the total dollar volume transacted each day is roughly \$7 million. Buyer-initiated trades generally occur more frequently than seller-initiated trades, but with a slightly smaller trade size: 0.84 versus 0.93 coins per trade on average (in the \$500 range in dollar terms). The median of 0.1 coin and the tail statistics (5th and 95th percentiles) indicate that the trade size distribution is highly skewed to the right. Ninety-five percent of trades are for 4 coins or less, which can easily be absorbed by limit orders at the top tiers of the book.

We find that trading volume is highly correlated with volatility, as shown in Figure 3. We compute daily realized volatility from 5-min mid-quote returns.$^{8}$ The sample correlation between volume and volatility is nearly 0.68, and both exhibit multiple spikes around major events in the cryptocurrency world. December 2013 appears to be the most volatile month in the sample period and includes the largest volatility spike, of over 500%. An analysis of news events in December points to the Chinese central bank’s decision on 5 December 2013 to ban financial institutions from handling bitcoin transactions as the catalyst. Volatility also shoots up around the collapse of Mt. Gox in late February 2014, the IRS’s decision to tax bitcoin as property in late March 2014, and the closure of bank accounts of Chinese bitcoin exchanges. Interestingly, volatility does not appear to be affected by Dell’s and PayPal’s announcements that they would accept bitcoin. This evidence seems consistent with earlier research finding that bitcoin is used more as an asset than as a currency. The high level of volatility, averaging about 63% during the sample period, also supports earlier studies’ conclusion that bitcoin is too volatile to serve as a store of value and medium of exchange.

#### 2.2.3. Intraday Patterns

Even though the market operates continuously (24/7), the intraday patterns of key market variables in Figure 4 indicate that the most liquid and active period is between 03:00 and 12:00 ET. After 12:00, depth, trading volume, and volatility all trend down. The bid-ask spread stays narrow for a few more hours and widens significantly only after 18:00. We also check whether liquidity exhibits the day-of-week patterns widely documented in U.S. equity markets by regressing various daily liquidity variables on day-of-week dummies. These dummies are rarely significant, providing little evidence of a day-of-week pattern in either depth or trading volume. There is some mild evidence that the bid-ask spread is narrower mid-week (Tuesday through Thursday).

## 3. Adverse Selection and Order Strategies

In this section, we measure and analyze the information content of market and limit orders to test the hypothesis that the link between order informativeness and order aggressiveness depends on the size of the value shock. To do so, we need to identify periods in which a large value shock is likely, and compare the information content of orders of varying aggressiveness in such an environment with that in a normal environment. To strengthen the analysis, we also test whether the reverse holds in a low value shock environment.

#### 3.1. Identification of Large Value Shock Environment

As the first step, we partition the sample period into three subsamples corresponding to large value shock, low value shock, and average value shock environments. We use both realized volatility and the high-low range of the mid-quote to determine these partitions.$^{9}$ Specifically, for each day, the high-low range is computed as the difference between the logs of the highest and lowest mid-quotes of the day, based on the raw snapshot data available at the tick frequency. Essentially, this measure captures the return that an informed trader with perfect information could achieve by buying at the lowest price and selling at the highest price. Thus, it is a reasonable proxy for the asset value shock.

We supplement the high-low range with realized volatility to ensure that a large high-low range reflects an environment with a large value shock, and not an otherwise tranquil day with a fleeting price outlier. We compute daily realized volatility as the square root of the sum of squared log returns based on the mid-quote sampled every five minutes.$^{10}$ The large (low) value shock days are those on which both realized volatility and the high-low range are in the top (bottom) quartile of their respective distributions. The average environment consists of the remaining days in the sample, on which the value shock is neither too large nor too small. This procedure yields 62 days in the large value shock subsample, 58 days in the low value shock subsample, and 172 days in the average value shock subsample.
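The two daily measures and the resulting day classification can be sketched as follows, assuming the 5-minute mid-quotes and the tick-level mid-quotes for each day are already available as lists. The quartile cut-offs use Python's `statistics.quantiles` with its default method; the paper does not state which quantile convention it uses, so this choice is an assumption, as are the function names.

```python
import math
from statistics import quantiles


def realized_volatility(mids_5min):
    """Daily RV: square root of the sum of squared 5-minute log mid-quote returns."""
    rets = [math.log(b / a) for a, b in zip(mids_5min, mids_5min[1:])]
    return math.sqrt(sum(r * r for r in rets))


def high_low_range(tick_mids):
    """Log of the day's highest mid-quote minus log of its lowest mid-quote."""
    return math.log(max(tick_mids)) - math.log(min(tick_mids))


def classify_days(rv, hl):
    """Label each day 'large', 'low', or 'average' using quartiles of both measures."""
    rv_q1, _, rv_q3 = quantiles(rv, n=4)
    hl_q1, _, hl_q3 = quantiles(hl, n=4)
    labels = []
    for v, h in zip(rv, hl):
        if v >= rv_q3 and h >= hl_q3:
            labels.append("large")    # top quartile of both measures
        elif v <= rv_q1 and h <= hl_q1:
            labels.append("low")      # bottom quartile of both measures
        else:
            labels.append("average")  # everything else, including one-sided extremes
    return labels


# Toy inputs for eight days; day 0 and day 7 are extreme on only one measure.
print(classify_days([1, 2, 3, 4, 5, 6, 7, 8], [8, 2, 3, 4, 5, 6, 7, 1]))
```

With these toy inputs, only day 6 is labeled `large` and only day 1 is labeled `low`; days that are extreme on just one measure stay in the average group, which is the point of requiring both conditions.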

We verify that the procedure above delivers a reasonable partition of the data by conducting a comprehensive search for important crypto market events reported by major news outlets, including Bloomberg, Reuters, and the popular cryptocurrency news websites CCN and CoinDesk. We then classify these news articles into three broad categories: cybersecurity threats, regulatory changes, and market acceptance. The table below shows the count of each news type that falls into the high, average, and low value shock subsamples:$^{11}$

| **News Type** | **High** | **Average** | **Low** |
|---|---|---|---|
| Market Acceptance | 2 | 4 | 1 |
| Regulatory | 14 | 14 | 0 |
| Security/Hack | 15 | 16 | 1 |
| Total days with news | 31 | 34 | 2 |

It is clear that a disproportionately large share of important news falls in the high volatility subsample (31 news days among its 62 days, versus 34 among the 172 average days), while there is almost no news on low volatility days. This provides further assurance that the high and low value shock subsamples do indeed reflect differences in their information environments. It is also important to point out that the main type of news during the sample period relates to hacking incidents that expose the security risk of owning and trading bitcoin. Regulatory actions by various governments also seem relevant, much more so than market acceptance. News of bitcoin being accepted by retailers as a method of payment (e.g., Overstock, Dell, and eBay) occurs less frequently and does not seem to be informationally relevant, as indicated by the muted response of trading volume and volatility to such events in Figure 3.

#### 3.2. Information Content of the Limit Order Book

An important task in testing Hypotheses 1A and 1B is measuring the information content of limit orders at different price levels in the book. For this, we use Hasbrouck’s (1995) information shares, similar to Cao et al. (2009). With 150 price levels on each side of the market, it is not econometrically feasible to model their dynamics without some data summarization and dimension reduction. To keep the exercise manageable, we partition the 150 levels of the order book into six categories: tier 1, tiers 2–5, tiers 6–10, tiers 11–50, tiers 51–100, and tiers 101–150, ranging from the most aggressive to the least aggressive limit orders. Together with the trade price series, these are all noisy representations of the true price of one common asset and thus cannot deviate too far from the common underlying price process. We therefore model their joint dynamics with a vector error correction (VEC) model and compute their respective information shares from it. The information share of a price series reflects its contribution to the variance of the permanent price updates: the higher the information share, the more informative a price series is and the more it contributes to price discovery.

Let $\mathbf{X}_i\equiv\left[P^T, P^1, P^2, P^3, P^4, P^5, P^6\right]_i^{\prime}$, where $P^T$ is the last transaction price and the remaining elements are the depth-weighted average prices corresponding to tier 1, tiers 2–5, tiers 6–10, tiers 11–50, tiers 51–100, and tiers 101–150 (averaged across both sides and all relevant tiers). The VEC model is:

where $z_{i-1}$ is a $6\times 1$ vector of correction terms, i.e.,

In our implementation, we choose $k=10$ and estimate the model separately for each day using one-minute price data. We do not estimate the model at intraday frequencies because, given its high dimensionality (7 price series with 10 lags), doing so might compromise the reliability of the information share estimates. Hasbrouck’s (1995) information shares rely on two ingredients derived from the model. The first is the permanent impact of the shock vector on all cointegrated prices in the system (i.e., the long-run multipliers based on the moving average representation of the VEC model). The second is the vector of orthogonalized shocks, which we obtain via a Cholesky decomposition of the covariance matrix of the residuals, $\mathsf{\Omega}=E\left[\eta_i\eta_i^{\prime}\right]$. The information share of price series $j$ is then computed as:

$$IS_j=\frac{\left(\sum_{i}\gamma_i m_{ij}\right)^2}{\sum_{l}\left(\sum_{i}\gamma_i m_{il}\right)^2},$$

where $\gamma_i$ is the permanent price impact of shock $i$, and $m_{ij}$ is the $(i,j)$ element of the lower triangular matrix $M$ such that $MM^{\prime}=\mathsf{\Omega}$. Thus, the information share of a given price series is the contribution of its variation to the total variation of the efficient price updates. Because information shares are sensitive to the ordering of variables in the Cholesky decomposition, we perform the calculation for all possible orderings and average the resulting information shares across orderings. These averages are the information share estimates we use throughout the paper.
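Taking the permanent impacts $\gamma_i$ and the residual covariance $\mathsf{\Omega}$ from an already estimated VEC model as given (the estimation step is not shown), the information share calculation and the averaging over orderings can be sketched as follows. The function names are hypothetical; the share formula is the standard Hasbrouck (1995) one written with $\gamma_i$ and $m_{ij}$ as defined above, and a production implementation would more likely call a numerical library's Cholesky routine.

```python
import math
from itertools import permutations


def cholesky(omega):
    """Lower-triangular M with M M' = omega (omega symmetric positive definite)."""
    n = len(omega)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(M[i][k] * M[j][k] for k in range(j))
            if i == j:
                M[i][j] = math.sqrt(omega[i][i] - s)
            else:
                M[i][j] = (omega[i][j] - s) / M[j][j]
    return M


def information_shares(gamma, omega):
    """Shares for one variable ordering: (sum_i gamma_i m_ij)^2 divided by their total."""
    M = cholesky(omega)
    n = len(gamma)
    contrib = [sum(gamma[i] * M[i][j] for i in range(n)) ** 2 for j in range(n)]
    total = sum(contrib)  # equals gamma' Omega gamma
    return [c / total for c in contrib]


def average_information_shares(gamma, omega):
    """Average each series' share over all orderings of the variables."""
    n = len(gamma)
    orders = list(permutations(range(n)))
    avg = [0.0] * n
    for order in orders:
        g = [gamma[k] for k in order]
        om = [[omega[a][b] for b in order] for a in order]
        for pos, share in enumerate(information_shares(g, om)):
            avg[order[pos]] += share / len(orders)  # map back to the original series
    return avg


# Toy example: two price series, equal permanent impacts, correlated residuals.
toy_omega = [[1.0, 0.5], [0.5, 1.0]]
print(information_shares([1.0, 1.0], toy_omega))          # about [0.75, 0.25]
print(average_information_shares([1.0, 1.0], toy_omega))  # about [0.5, 0.5]
```

The toy case shows why averaging over orderings matters: with correlated residuals, whichever series is placed first in the Cholesky ordering absorbs the common component, while the symmetric setup splits evenly once all orderings are averaged. With seven series there are 5040 orderings, which is still cheap to enumerate.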

We plot the 7-day moving averages of our information share estimates in Figure 5, with the legend indicating the average information share of each order category over the entire sample period. Trades and limit orders posted at the top tier of the book account for the largest portions of the total variation in bitcoin price updates, 31% and 28% on average, respectively. The remaining 41% comes from the rest of the limit order book. Interestingly, information shares do not decrease monotonically as limit orders become less aggressive: they decrease through the 10th level but then increase for orders placed further away. This is surprising and contrasts with the common belief that information content increases with order aggressiveness. Earlier research into the information content of limit order books, such as Cao et al. (2009), does not go this deep into the book. It is therefore unclear whether the same pattern would appear in equity market limit order books beyond the 10th tier, or whether it is a peculiar feature of the bitcoin market, in which extreme volatility might increase the probability of execution for far-away orders and thereby induce informed traders to post more of them. This is similar to out-of-the-money options having a greater chance of moving into the money when volatility is higher, which can make them attractive to informed traders.

#### 3.3. Order Aggressiveness and Information Content

The RRS2018 model predicts that informed traders increase their trading and quoting at the most aggressive price levels when the value shock is sufficiently large. In contrast, in a low value shock environment (i.e., when the value of private information is small), trading and quoting aggressively might not be profitable for them, so orders closer to the market might become less informative.

Panels A1–A3 of Table 3 show the average information share of trades and of limit orders of varying aggressiveness in each of the three information environments. Consistent with Figure 5, trades carry the most information regardless of the information environment, alone accounting for nearly one third of the total variation in price updates. We also continue to observe that limit orders at tiers 6–10 convey very little information (less than 3%). It seems that informed traders either trade or quote very aggressively, or submit orders away from the market.

We then perform a difference-in-means t-test comparing, for each order aggressiveness category, the information share in a large (low) value shock environment with that in an average volatility environment. Panels B1 and B2 contain the results of these tests, and both provide strong support for RRS2018’s model. That is, when the value shock is large, informed traders are more likely to trade and quote aggressively to exploit their information, because the value of the information more than offsets the cost of trading or quoting aggressively. Panel B1 shows that aggressive limit orders (i.e., those placed in the top 10 tiers of the book) become statistically more informative than they are in an average volatility environment, whereas limit orders posted beyond the 10th tier become statistically less informative. Panel B2 reveals the opposite for the low value shock environment: less aggressive orders are statistically more informative, lending further support to the theory that informed traders in such an environment choose to use limit orders deeper in the book, as trading and quoting aggressively is not profitable when the value shock is small.
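The text does not say whether the difference-in-means test assumes equal variances. The sketch below uses Welch's unequal-variance version as an assumption; it returns the t statistic and the Welch-Satterthwaite degrees of freedom, and computing p-values would additionally require a t distribution (e.g., from a statistics package). The function name and the input samples are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Difference-in-means t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na  # squared standard error of mean(a)
    vb = variance(sample_b) / nb  # squared standard error of mean(b)
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, dof


# Hypothetical daily tier-1 information shares on large-shock days vs. average days.
large_days = [0.31, 0.35, 0.29, 0.33, 0.36]
average_days = [0.27, 0.26, 0.30, 0.25, 0.28]
t_stat, dof = welch_t(large_days, average_days)
print(round(t_stat, 2), round(dof, 1))
```

In the paper's setting, one such test is run per order aggressiveness category, comparing the daily information shares on large (or low) value shock days with those on average days.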

## 4. Adverse Selection and Liquidity Provision

We next investigate how adverse selection affects market liquidity through its impact on the order strategies of informed and uninformed traders. This exercise helps demonstrate that whether adverse selection hurts market liquidity depends on the nature of the information asymmetry. We first describe how we construct the hourly measure of adverse selection and the measure that summarizes the state of liquidity supply, and then present and discuss our findings.

#### 4.1. Measuring Adverse Selection

Central to our empirical analysis is a measure of the degree of adverse selection in the market. Previous studies traditionally measure adverse selection by the price impact of trades, reflecting the assumption underlying many traditional microstructure theories that informed traders use market orders to exploit their information advantage. O’Hara (2015) argues that this assumption is increasingly detached from the reality of trading in today’s high-frequency trading world. Recent work has argued that informed traders can also use limit orders to exploit their information advantage (for theoretical work, see, e.g., Boulatov and George 2013; Ricco et al. 2018; and references therein; for empirical evidence, see, e.g., Fleming et al. 2018; Brogaard et al. 2018; Cont et al. 2014, among others). As Cont et al. (2014) (CKS2014) argue, trades alone are not sufficient to capture price movements, given the sheer number of limit order book events between any two trades. Our earlier findings on the information content of trades and limit orders also confirm that trades, while accounting for the largest single share of price discovery, account for only about 30% of it; the rest of price discovery is due to movements in limit orders. Accordingly, the price impact of limit order book events, and not just of trades, is a better measure of adverse selection in a limit order market. Following CKS2014, we estimate this price impact measure with the following regression model:

where $\Delta P_{k,i}$ is the change in the mid-quote over sub-interval $k$ of interval $i$, and $OFI_{k,i}$ is the order flow imbalance that aggregates limit order submissions, cancellations, and executions at the inside tier over the same sub-interval. Estimating Equation (2) separately for each interval $i$, we obtain the interval’s price impact estimate $\widehat{PI}_i$. To ensure that each interval contains enough observations to estimate $\widehat{PI}_i$ reliably, we use hourly intervals (i.e., $i=0,1,\cdots,23$) and a one-minute sub-interval sampling frequency (i.e., $k=1,2,\cdots,60$). To compute $OFI_{k,i}$, we accumulate changes at the top of the book over sub-interval $k$ using the raw snapshot data available at the tick frequency. The net order flow between ticks $t-1$ and $t$ is

$$e_t=\left(\mathbf{1}_{P_t^b\ge P_{t-1}^b}\,q_t^b-\mathbf{1}_{P_t^b\le P_{t-1}^b}\,q_{t-1}^b\right)-\left(\mathbf{1}_{P_t^a\le P_{t-1}^a}\,q_t^a-\mathbf{1}_{P_t^a\ge P_{t-1}^a}\,q_{t-1}^a\right),$$

where $P_t^b$ and $q_t^b$ are the price and quantity of coins available at the best bid ($P_t^a$ and $q_t^a$ are similarly defined for the best ask). If the price does not change, the order flow to the bid side is $q_t^b-q_{t-1}^b$. If the best bid improves (i.e., $P_t^b>P_{t-1}^b$), the order flow to the bid side equals the size of the new price-improving orders. If a cancellation or trade execution completely wipes out the previous tick’s first tier and pushes the second tier forward (i.e., $P_t^b<P_{t-1}^b$), the order flow is decreased by the quantity of the first tier at the previous tick. The same logic applies to the ask side. The net order flow from one tick to the next is therefore the difference between the order flow to the bid (in the first parentheses) and that to the ask (in the second parentheses). We aggregate this net order flow to the sub-interval level, i.e., $OFI_{k,i}=\sum_{t\in(k,i)}e_t$.
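The tick-by-tick order flow $e_t$ and its aggregation into $OFI_{k,i}$ can be sketched as follows. Each snapshot is assumed to be a tuple `(bid_price, bid_qty, ask_price, ask_qty)` for the top of the book, and each tick-to-tick event is assigned to the minute in which the later snapshot falls; both conventions, and the function names, are illustrative assumptions rather than the paper's code.

```python
from collections import defaultdict


def order_flow_event(prev, cur):
    """Net order flow e_t between two consecutive top-of-book snapshots."""
    pb0, qb0, pa0, qa0 = prev
    pb1, qb1, pa1, qa1 = cur
    # Bid side: count new best-bid quantity unless the bid fell; remove old quantity unless it rose.
    bid_flow = (qb1 if pb1 >= pb0 else 0.0) - (qb0 if pb1 <= pb0 else 0.0)
    # Ask side: count new best-ask quantity unless the ask rose; remove old quantity unless it fell.
    ask_flow = (qa1 if pa1 <= pa0 else 0.0) - (qa0 if pa1 >= pa0 else 0.0)
    return bid_flow - ask_flow


def ofi_by_minute(ticks):
    """Sum e_t into one-minute buckets; ticks are (seconds_into_hour, snapshot), time-sorted."""
    buckets = defaultdict(float)
    for (_, prev), (t_cur, cur) in zip(ticks, ticks[1:]):
        buckets[int(t_cur // 60)] += order_flow_event(prev, cur)
    return dict(buckets)


toy_ticks = [
    (0.5, (100.0, 2.0, 101.0, 3.0)),
    (10.0, (100.0, 5.0, 101.0, 3.0)),  # 3 coins added at the best bid: e_t = +3
    (70.0, (100.0, 5.0, 100.5, 1.0)),  # new, better ask of 1 coin: e_t = -1
    (95.0, (99.0, 4.0, 100.5, 1.0)),   # best bid wiped out: e_t = -5
]
print(ofi_by_minute(toy_ticks))  # {0: 3.0, 1: -6.0}
```

Each branch of `order_flow_event` corresponds to one of the three cases described above (unchanged price, improved price, depleted first tier), applied symmetrically to the two sides.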

Panel A of Figure 6 provides summary statistics of $\widehat{PI}_i$. Reassuringly, the estimate has the expected positive sign 95% of the time (i.e., excess buying pressure results in price increases and vice versa). The distribution is right-skewed, indicating the presence of extremely large values, with a mean of 2.4 and a median of 1.65.
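Equation (2) itself is not reproduced here, but assuming it takes the CKS2014 form of a simple regression of $\Delta P_{k,i}$ on $OFI_{k,i}$ with an intercept, the hourly estimate $\widehat{PI}_i$ is an ordinary least squares slope over that hour's one-minute observations. A minimal sketch with a hypothetical function name and toy data:

```python
def price_impact(delta_mid, ofi):
    """OLS slope (with intercept) of mid-quote changes on order flow imbalance for one interval."""
    n = len(delta_mid)
    if n != len(ofi) or n < 2:
        raise ValueError("need two equal-length series with at least two observations")
    ofi_bar = sum(ofi) / n
    dp_bar = sum(delta_mid) / n
    sxx = sum((x - ofi_bar) ** 2 for x in ofi)
    sxy = sum((x - ofi_bar) * (y - dp_bar) for x, y in zip(ofi, delta_mid))
    return sxy / sxx


# Toy hour with four sub-intervals instead of 60: the mid-quote moves 2 per unit of OFI.
toy_ofi = [1.0, 2.0, 3.0, 4.0]
toy_delta_mid = [2.0 * x + 0.5 for x in toy_ofi]
print(price_impact(toy_delta_mid, toy_ofi))  # 2.0
```

Running this once per hourly interval yields one $\widehat{PI}_i$ per hour; a positive slope means net buying pressure at the inside tier pushes the mid-quote up, which is the sign the estimates take 95% of the time.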

#### 4.2. Measuring Liquidity Provision

Given the high dimensionality of the limit order book, the slope of the bid and ask schedules is a useful summary of how willing liquidity providers are to supply bids and offers to meet the trading demand of more impatient traders. Næs and Skjeltorp (2006) examine the slope and show that it is indeed important for the relationship between trading volume and volatility. They first measure the slope locally at each price level as the ratio of the increase in depth to the increase in price, and then average across all price levels and both sides of the book to obtain an average slope measure. Our full limit order book data allow us to estimate the slope compactly with the following regression:

where $QP_{\tau,i}$ is the percentage of cumulative depth at tier $\tau$ ($\tau=1,\cdots,150$) as of the end of interval $i$, and $d_{\tau,i}$ is the distance between the price at tier $\tau$ and the best bid-ask midpoint, $|P_{\tau,i}-P_{mid,i}|$. We estimate the slope separately for the bid and ask sides and for each interval $i$, and denote the estimates by $SL_i^A$ and $SL_i^B$.

Because we standardize cumulative depths by the total depth across all 150 tiers, $QP_{150,i}$ is always 100%. Therefore, the slope measure depends largely on how wide the 150-tier pricing grid is (the intercept already takes care of $QP_{1,i}$). Accordingly, our slope measure captures the tightness of prices in the book: a steeper slope indicates that limit orders are priced closer to the market.
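Since the slope regression is not reproduced above, the following sketch assumes it is an ordinary least squares fit of $QP_{\tau,i}$ on $d_{\tau,i}$ with an intercept, for one side of the book at one point in time; the paper's exact estimator may differ, and the function name and toy books are hypothetical. Distances here are in price units rather than cents.

```python
def book_slope(prices, quantities, mid):
    """OLS slope of cumulative depth (%) on distance from the mid-quote for one side of the book."""
    total = sum(quantities)
    qp, running = [], 0.0
    for qty in quantities:               # cumulative depth as % of total depth
        running += qty
        qp.append(100.0 * running / total)
    d = [abs(p - mid) for p in prices]   # distance of each tier from the mid-quote
    n = len(d)
    d_bar, qp_bar = sum(d) / n, sum(qp) / n
    sxx = sum((x - d_bar) ** 2 for x in d)
    sxy = sum((x - d_bar) * (y - qp_bar) for x, y in zip(d, qp))
    return sxy / sxx


toy_qty = [1.0, 1.0, 1.0, 1.0]
print(book_slope([100.5, 101.0, 101.5, 102.0], toy_qty, mid=100.0))    # wide ask grid: 50.0
print(book_slope([100.25, 100.5, 100.75, 101.0], toy_qty, mid=100.0))  # tight ask grid: 100.0
print(book_slope([99.5, 99.0, 98.5, 98.0], toy_qty, mid=100.0))        # bid side: 50.0
```

Pulling the same four quantities closer to the mid-quote doubles the slope in this toy case, which matches the tightness interpretation: with the last tier pinned at 100%, a narrower pricing grid forces cumulative depth to rise faster per unit of distance.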

Figure 7 illustrates how the slope captures the distribution of liquidity in the book. Another way to interpret the slope is as the responsiveness (loosely, the elasticity) of depth with respect to price: a steeper slope implies that a one-unit move away from the mid-quote is associated with a greater increase in the quantity bid or offered, implying a greater willingness of liquidity providers to meet trading demand on each side of the market.

We plot the histograms of the slope estimates in Panels B and C of Figure 6. Both slopes are strictly positive, reflecting that liquidity supply is greater the better the price for liquidity providers. With zero as the lower bound, both distributions have long right tails. The ask slope is steeper than the bid slope, as indicated by both the mean and the median, and is more variable. This suggests a greater but also more volatile willingness to provide liquidity on the ask side.

#### 4.3. Does Adverse Selection Worsen Market Liquidity?

To test Hypotheses 2A and 2B, we run the following regression:

where $SL_i$ is the slope of the order book at the end of hourly interval $i$, $PI_i$ is the measure of adverse selection for interval $i$, $\mathbf{1}_{HiVol}$ is an indicator for days with a large value shock, $\mathbf{1}_{LoVol}$ is an indicator for days with a low value shock, and $Z_i$ are control variables. Included in $Z_i$ are: (1) the slope on the opposite side (to see whether liquidity provision on one side interacts with that on the other), (2) the total number of coins bid and offered in the book (to control for the absolute amount of liquidity available at any given point in time), (3) the fraction of depth residing at the top (to control for the possibility that a flat slope can also reflect the front-loading of depth at the top tier), (4) the current level of trading activity (to assess whether trading demand plays a role in shaping liquidity provision), and (5) the prevailing volatility in the market (as volatility is well documented to affect liquidity). Depth, trading volume, and trade count are log-transformed. We also estimate a specification that includes hourly dummies to account for any diurnal effects on the order book shape and liquidity provision. We use robust regression to minimize the potential impact of outliers on the results. We report the results in Table 4. The key estimates, $\beta$, $\beta_H$, and $\beta_L$, appear in the first three rows.

Robust across sides and specifications is a significantly negative $\beta$, the coefficient on $PI$, indicating that increased adverse selection is associated with a flatter order book slope. Moreover, the significantly negative coefficient on $PI\times\mathbf{1}_{HiVol}$ indicates that the already negative association between adverse selection and liquidity becomes even stronger on days when large value shocks are likely present. On the other hand, in a low value shock environment, the order book steepens significantly, as shown by the significantly positive estimates of $\beta_L$, suggesting greater liquidity supply. Unreported tests indicate that the sum of $\beta$ and $\beta_L$ is significantly positive; that is, on low volatility days, adverse selection is positively associated with liquidity provision.
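Testing whether $\beta+\beta_L$ is positive is a test on a linear combination of two coefficients, so its standard error must include their covariance. A minimal sketch, assuming one has the two estimates and their estimated (robust) covariance from the fitted regression; the function name and the numbers are hypothetical placeholders, not the paper's estimates.

```python
import math


def linear_combination_t(b1, b2, var1, var2, cov12):
    """t statistic for H0: b1 + b2 = 0, given the estimates' variances and covariance."""
    var_sum = var1 + var2 + 2.0 * cov12  # Var(b1 + b2)
    if var_sum <= 0:
        raise ValueError("implied variance of b1 + b2 must be positive")
    return (b1 + b2) / math.sqrt(var_sum)


# Hypothetical estimates of beta and beta_L with a hypothetical covariance.
print(linear_combination_t(-0.5, 1.2, 0.01, 0.04, -0.01))
```

Ignoring the covariance term would misstate the standard error; here a negative covariance shrinks $\mathrm{Var}(\beta+\beta_L)$ below the sum of the two variances.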

The flattening of the order book in the high value shock environment, compared with its steepening in the low value shock environment, indicates that the size of the value shock matters for both informed and uninformed traders. In an environment where adverse selection is due to a large value shock, informed traders are likely to trade and quote more aggressively, moving informed liquidity toward the inside tier. However, uninformed traders, worried about the risk of being picked off, move their liquidity away from the market. Our findings are consistent with the outward migration of uninformed liquidity outweighing the inward migration of informed liquidity, resulting in a flatter order book slope.

Within each information environment, variation in adverse selection is likely due to changes in the fraction of informed traders. However, the evidence supports Hypothesis 2B (that adverse selection due to an increased fraction of informed traders worsens liquidity) only in the high value shock environment. In such an environment, a larger fraction of informed traders heightens the pick-off risk and helps drive uninformed liquidity away from the market. The opposite evidence obtained for the low value shock environment does not support RRS2018’s prediction, but is instead consistent with Rosu’s (2019) intuition that an increased fraction of informed traders facilitates better learning by uninformed traders, thereby improving liquidity.

## 5. Is Price Discovery History-Dependent?

Another important implication of RRS2018’s model is the non-Markovian property of information learning in a limit order market: the price impact of order flow depends not only on the current state of the limit order book, but also on the history of past order actions leading up to that state. Given that the history space is large, it would be desirable to have theoretical guidance as to which types of histories matter most for price discovery, but the theory is still under development in this regard. Here, we let the data tell us which parts of the history are important.

One important caveat is that order history can enter the price discovery process in some unknown form. Our empirical analysis below looks specifically for linear effects, recognizing that a lack of evidence for the non-Markovian property in the linear sense does not imply that price discovery is Markovian. Price discovery could still be history-dependent, just not in a linear form. Alternatively, the specific elements of the history space we consider may not be the ones that matter. Nevertheless, it is still useful to know that certain elements and certain forms do not work, which is especially helpful for developing theories of non-Markovian learning.

We test for the linear non-Markovian property by regressing the hourly price impact (our measure of price discovery) on a set of variables that capture the current state of the limit order book. These are: (a) the total depth on each side, (b) the concentration of depth at the top tier on each side, (c) the concentration of depth at the top 5 tiers on each side, (d) the slope of the order book on each side, (e) the level of buying and selling activity in the market, and (f) the prevailing short-term volatility. Order book variables (a)–(d) are measured at the beginning of each interval. Trading and volatility variables (e) and (f) are measured over each interval, in the same way as the dependent variable. We then add the history of each of these variables, one at a time, to the regression to determine whether the lagged values of the chosen variable have any incremental explanatory power for the price discovery measure. We allow for a 24-h history in these variables; that is, we include 24 lags of each covariate. Let $Z_i$ be the vector collecting all 11 of the above explanatory variables for hourly interval $i$, and let $Z(j)_i$ denote individual explanatory variable $j$. We run the following regression for each explanatory variable $j$:

Panel A of Table 5 shows the baseline regression results, which include only the explanatory variables that capture the current state of the limit order book. The results are consistent with intuition: (1) price impact is lower when trading is more active (regardless of the side), (2) price impact is higher when the market is more volatile, (3) greater ask depth (indicating increased selling pressure) dampens price impact, while greater bid depth (indicating increased buying pressure) increases it, and (4) a greater concentration of depth at the top 5 tiers reduces price impact.

It is interesting to note that the concentration of depth at the top tier does not explain price impact, nor does the slope. It seems that market participants care only about the total amount of depth in the book and how much of that depth resides at the top 5 tiers. We also include hourly dummies to control for the potential diurnal pattern of price discovery, but find that these dummies have little explanatory power, and that the coefficients of order book variables remain robust to the inclusion of the hourly dummies. Thus, for regressions that include lagged explanatory variables, we use the baseline model without the hourly dummies.

Panel B of Table 5 documents the incremental explanatory power of each covariate’s 24-h history, as indicated by the number of lags with a significant coefficient and the adjusted $R^2$ of the regression. These lags are rarely significant. At the 1% significance level, the history of most order book variables studied here is completely insignificant (the exceptions are total bid depth and realized volatility, each of which has only one significant lag out of 24). Even at the 10% significance level, the number of significant lags is very small for most variables, at most 4 out of 24. In addition, including the lags reduces, rather than increases, the adjusted $R^2$. Overall, Panel B shows little support for a linear history dependence of price discovery.
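The adjusted $R^2$ penalty explains how adding 24 lags can lower the fit measure even when the raw $R^2$ edges up. A minimal sketch of building the lagged regressors and computing adjusted $R^2$; the helper names and the numbers in the usage lines are hypothetical, not values from Table 5.

```python
def add_lags(series, n_lags):
    """Rows [x_{t-1}, ..., x_{t-n_lags}] for t = n_lags, ..., len(series) - 1."""
    return [[series[t - lag] for lag in range(1, n_lags + 1)]
            for t in range(n_lags, len(series))]


def adjusted_r2(r2, n_obs, n_regressors):
    """Adjusted R^2 for a regression with an intercept and n_regressors slope coefficients."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)


print(add_lags([1.0, 2.0, 3.0, 4.0], 2))  # [[2.0, 1.0], [3.0, 2.0]]
# Hypothetical: 11 baseline covariates, then 24 extra lags that raise raw R^2 only slightly.
base = adjusted_r2(0.30, n_obs=1000, n_regressors=11)
with_lags = adjusted_r2(0.31, n_obs=1000, n_regressors=11 + 24)
print(base > with_lags)  # True
```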

## 6. Conclusions

Our paper provides an in-depth analysis of the information content and liquidity of the limit order book on a major bitcoin trading platform. We find several important results that contribute to both the growing literature on bitcoin trading and the literature on limit order book modeling.

First, trades and limit orders at the best bid and ask are the most informative. Information content then decreases as limit prices become less aggressive, up to the 10th best level, and increases mildly for orders posted at far-away tiers. In a high value shock environment, aggressive orders become more informative, suggesting increased use of such orders by informed traders. In a low value shock environment, on the other hand, the informativeness of market orders and best limit orders is reduced, while that of mid-book orders increases. Second, we find that the size of the information shock matters for market liquidity. When asset volatility is high, adverse selection worsens liquidity, in contrast to the improved liquidity in the low volatility environment. Overall, the complex liquidity and volatility dynamics reported in our paper provide further justification for the skepticism of monetary policy authorities towards the broad use of cryptocurrencies.

Lastly, we do not find concrete evidence to support a linear dependence of price discovery on the history of individual state variables. This result indicates that the Markovian learning assumption typically adopted in limit order market models might not be unreasonable after all. Future research can shed further light on this important assumption by employing non-linear models of dynamic limit order books (see, e.g., Nguyen et al. 2019). Such research will be valuable in enhancing our understanding of the complex dynamics of price discovery in limit order book markets.