Abstract
We introduce a class of Markov models to describe the bid–ask price dynamics in the presence of liquidity fluctuations. In a highly competitive regime, the spread evolution belongs to a class of Markov processes known as a population process with uniform catastrophes. Our mathematical analysis focuses on establishing the law of large numbers, the central limit theorem, and large deviations for this catastrophe-based model. Large deviation theory allows us to illustrate how huge deviations in the spread and prices can occur in the model. Moreover, our research highlights how these local trends and volatility are influenced by the typical values of the bid–ask spread. We calibrated the model parameters using available high-frequency data and conducted Monte Carlo numerical simulations to demonstrate its ability to reasonably replicate key phenomena in the presence of liquidity fluctuations.
MSC:
60F05; 60F10; 60J27; 60J28; 60J75
1. Introduction
The “order book” (OB) refers to an electronic list used to describe the evolution of bid and ask prices and sizes in high-frequency electronic markets, such as NYSE-ARCA, LSE, or NASDAQ. The evolution of the OB results from the interaction of buy and sell orders through a rather complex dynamic process. Order book dynamics have been extensively studied in the market microstructure and econophysics literature ([1,2,3]). More recently, based on the empirical characteristics presented in these studies, several models for the evolution of the OB have been proposed; see [4,5,6,7]. These models, which are Markovian queueing systems, focus primarily on the direction of the next price movement and describe price dynamics well under conditions of uninterrupted high liquidity, i.e., they assume an abundant availability of limit orders in the OB. In this high-liquidity context, prices are relatively stable with small temporary fluctuations, and the bid and ask sizes at the top of the OB provide valuable information on these short-term price fluctuations.
Conversely, in various markets, prices are not as stable; they exhibit significant changes and, in some cases, local downtrends, often caused by liquidity wells in the OB. Events such as the 6 May 2010 “flash crash” (see Figure 1), which was a sudden and severe drop in stock prices in a very short time, have raised concerns about the stability of the OB and its suitability as the primary mechanism for trading. Additionally, occurrences of mini flash crashes—rapid and significantly large directional movements in asset prices—have become increasingly common (see [8]). These events lead to temporary liquidity crises, resulting in larger spreads. Consequently, it is both practically and theoretically important to gain a better understanding of how price dynamics depend on the structure and fundamental parameters of the OB (see [9,10,11]).
Figure 1.
A graph of the S&P500 futures on the day of the flash crash of 6 May 2010 at 2:45 p.m.
In the present paper, we are interested in understanding how severe intermittencies in liquidity affect the order book dynamics. We call “liquidity fluctuations” the contexts in which there are significant and intermittent decreases in the OB’s ability to absorb market orders. We propose a simple model for price dynamics in an OB in the presence of liquidity fluctuations. The model explains in a simple way how large price fluctuations, such as those observed in flash crashes, occur. Furthermore, it shows how local trends and volatility are determined by the typical values of the bid–ask spread. A model for the dynamics of the spread is implicitly derived from our price model. We use it to analyze large deviations in the spread and their impact on prices, and we present these large deviations in the form of “optimal trajectories” that give relevant information about how they occur. Finally, we present Monte Carlo simulations to corroborate that our model reproduces relevant empirical characteristics observed in our data and documented in the literature, such as the well-known bid–ask bounce; see [12].
We were initially motivated by the local and seemingly patternless price trends observed in various markets. Our goal was to understand the connection between these long-term trends and the short-term micro-jumps in prices, as shown in Figure 2. Our initial conjecture motivating this work was the existence of a close relationship between the spread, price trends, volatility around these trends, and liquidity fluctuations. Consequently, we needed to jointly model spread and price dynamics. Figure 2 illustrates that the local trend is shared by both bid and ask prices. This observation suggested that our model should not only capture long-term price trends but also incorporate the asymptotically stationary behavior of the spread. Large spread and price changes are typically attributed to liquidity changes ([13]). Our curiosity about this phenomenon grew, leading us to investigate how significant fluctuations in spread and prices, such as those seen in flash crashes, are related to liquidity fluctuations.
Figure 2.
Intraday evolution of the ask (red) and bid (green) prices, Apple Inc. (Cupertino, CA, USA) AAPL stock, 4 March 2011. Left: short-term, 1 min. Right: long-term, 15 min. Figures created by H.Rojas.
Our paper is organized as follows. In Section 2, we elucidate key empirical characteristics within an order book exhibiting liquidity fluctuations. We also introduce a general class of continuous-time Markov processes to model bid–ask price dynamics. Section 3 outlines a comprehensive Markovian model for an order book with high liquidity fluctuations. Here, we present the fundamental mathematical statements concerning stability (invariant measure), the law of large numbers, the central limit theorem, and large deviations. In Section 4, we present our numerical results. Section 5 delves into extensions and proposes models for two additional liquidity regimes: non-competitive and low liquidity. Our conclusions are presented in Section 6. The appendix, our final section, contains auxiliary results and proofs of the main statements presented in Section 3.
2. Markov Model and Regimes in Liquidity Fluctuations
A very important empirical characteristic observed in markets with liquidity fluctuations is the low availability of orders in the OB; the queue sizes at the top of the OB are small most of the time; see, e.g., Figure 3. In this context, the queue sizes of the best bid and ask prices are no longer the determining factors in the dynamics of prices; for more details see [13]. If the liquidity intermittency is severe, even “gaps” are formed in the OB (blocks of adjacent price levels that do not contain quotes). In these cases, the distribution of price changes is mainly determined by the distribution of the gap sizes in the OB. Taking these facts into account, if our interest is to explain the observed long-term price trends, we can focus only on micro-jumps in prices and disregard the size of the queues.
Figure 3.
Joint empirical distribution of bid and ask queue sizes at the top of the order book; Apple Inc. stock, 4 March 2011. Figure created by H.Rojas.
In these liquidity regimes, the spread exhibits a quite flexible dynamic behavior, reaching values much larger than those observed in high-liquidity conditions; see, e.g., Figure 4. Based on our empirical observations, the OB slowly digests liquidity fluctuations, and we can characterize that process in two stages. In the first stage, the spread begins to increase persistently. In the second stage, the spread is reduced; the reduction can be drastic or gradual. The way the spread closes in the second stage depends on the intensity of the liquidity fluctuation.
Figure 4.
Empirical distribution of the bid–ask spread, Apple Inc. stock, 4 March 2011, corresponding to 15 min of observation (blue). The invariant distribution is calculated by Formula (6) (red). Figure created by H.Rojas.
Our empirical observations about the reversion of the bid–ask spread to its typical values, before and after liquidity shocks, have been theoretically corroborated through equilibrium models; see [14]. In this paper, we consider different types of spread reversion, i.e., we consider three low-liquidity regimes: highly competitive, non-competitive, and low liquidity with gaps. The three regimes differ in the way the spread closes and in the gaps present in the OB.
The Markov Model: A General View
Building upon our discussions in the preceding sections, we propose a simplified model for an order book (OB) with liquidity fluctuations. Let represent the (best) bid price and denote the (best) ask price. The state of the OB is characterized by a continuous-time process , taking values in the discrete state space (a two-dimensional lattice). Here, represents the “tick size”, and, as usual, denotes the set of integers.
We now explain why we use the full set of integers rather than restricting ourselves to the positive integers. The primary reason is simplicity: working with all integers simplifies the mathematical description and the subsequent analysis, whereas introducing boundary conditions, such as reflections, can significantly complicate the analysis.
A further reason is that we focus on systems with constant parameters. When the price tends towards zero, this behavior is likely driven by changes in the parameters of the underlying system (due to a crisis, for example) rather than by the dynamics of a model governed by fixed parameters. For our purposes, the full set of integers is therefore the more flexible and relevant choice.
For the sake of simplicity, we consider as the state space of , interpreting each state as a multiple of . The price process manifests piecewise constant sample paths, with transitions corresponding to order book events that trigger price fluctuations (as illustrated in Figure 2). Our objective is to describe the asymptotic behavior of the price process arising from numerous micro-jumps.
Based on this simplified representation, consider a continuous-time Markov chain with state space
Here, represents the bid price, represents the ask price, and is the bid–ask spread. In general, the transitions of the chain are defined by the following transition rates: given a state , then
in all cases, the increment is a positive integer. The function (resp. ) is the rate at which increases (resp. decreases) in the ask (resp. bid) price occur as a result of the execution of market buy (resp. sell) orders or cancellations of limit sell (resp. buy) orders. Similarly, the function (resp. ) is the rate at which decreases (resp. increases) in the ask (resp. bid) price occur as a result of a limit sell (resp. buy) order placed within the spread.
We study the asymptotic behavior of as t goes to infinity. To facilitate this analysis, it is convenient to consider an equivalent process, denoted as with state space . Although both and contain the same information, the latter representation offers better control for our asymptotic examination. The transitions of the chain are defined by the following transition rates: given a state of the Markov chain, then
Since the transition rates of depend only on the second coordinate, the spread, we see that alone is the continuous-time Markov process and has the following transition rates. Suppose that at some moment the spread is , then
Based on the model (1) and its alternative representation (2) and (3), it is possible to define three low-liquidity regimes: highly competitive, non-competitive, and low liquidity with gaps. Any regime is defined by how the rates depend on the increment which is usually determined by the intensity of the liquidity fluctuations.
In this paper, our focus is primarily on the first regime, namely, the highly competitive regime, while the other two regimes are outlined briefly. The findings presented here can be generalized for the other two regimes, but we believe that there will be no significant qualitative difference in the results.
3. The Markov Model: Closing the Spread Uniformly (Highly Competitive Regime)
The highly competitive regime (HC regime) is characterized by very small opening steps of the spread and a rapid decrease in it. This regime is consistent with a rapid reversing process of the spread and the absence of gaps in the order book (OB). The rapid decrease in the spread is caused by the competitive behavior of impatient agents who place quotes within the spread, prioritizing the execution of their placed limit orders. In the considered model (1), we define the rates in such a way that the spread can increase by only one unit. For a given spread length, denoted as k, the next length of the spread is chosen uniformly from the set .
In order to define the rates for the highly competitive regime, we make use of our notation and fix the parameters , , , and , which are strictly positive real numbers. Furthermore, the terms and are employed solely as parameters of the model and not as functions. The transition rates for the Markov chain are defined as follows: given that the chain is in a state at a certain moment, then
For an illustration, see Figure 5 in the case when .
Figure 5.
The rates for the highly competitive model. An illustrative example of the case when . Figure created by A.Yambartsev.
In this regime, the transition rates of , see (3), are the following: suppose that at some moment the spread is , and let and , then
Note that is an irreducible Markov chain in this regime.
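The spread dynamics of this regime can be illustrated with a short Gillespie-type simulation. The following is a minimal sketch under stated assumptions: the spread opens by one tick at a total rate `open_rate`, and, whenever it exceeds one tick, it closes at a total rate `close_rate` to a level drawn uniformly from the smaller admissible values, with one tick assumed to be the minimal spread. The function name, the parameter names, and the numerical values are illustrative and are not taken from the calibration in Section 4.

```python
import random


def simulate_spread(open_rate, close_rate, horizon, s0=1, seed=0):
    """Simulate the spread as a birth process with uniform catastrophes.

    Returns the list of (jump time, spread after the jump), starting at (0, s0).
    """
    rng = random.Random(seed)
    t, s = 0.0, s0
    path = [(t, s)]
    while True:
        # Assumption: the spread cannot close below one tick.
        total = open_rate + (close_rate if s > 1 else 0.0)
        t += rng.expovariate(total)  # exponential holding time
        if t > horizon:
            break
        if rng.random() * total < open_rate:
            s += 1  # the spread opens by exactly one tick
        else:
            s = rng.randint(1, s - 1)  # uniform "catastrophe"
        path.append((t, s))
    return path


# Illustrative values only.
path = simulate_spread(open_rate=2.0, close_rate=1.0, horizon=1000.0)
largest_spread = max(s for _, s in path)
```

The returned path records the embedded jump chain together with the jump times, so both the time-averaged spread and the maximum over the horizon can be read off directly.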
3.1. Ergodicity and Invariant Measure for
We begin by analyzing the stability of the spread . The following theorem establishes ergodicity, representing one of the rare instances where we are able to determine the invariant measure for the process.
Theorem 1.
In a highly competitive regime model, for any positive values of parameters the spread is a positive recurrent Markov process with an invariant measure denoted as given by the following formula: let
These findings regarding the stationary asymptotic behavior of the spread process are in line with the empirical observations illustrated in Figure 4.
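A closed-form invariant measure can be cross-checked numerically by solving the stationarity equations on a truncated state space. The sketch below assumes a spread chain that opens by one tick at rate `open_rate` and closes uniformly at rate `close_rate`, with one tick as the minimal spread; the truncation level `max_spread`, the helper names, and the numerical values are assumptions made only for illustration.

```python
import numpy as np


def spread_generator(open_rate, close_rate, max_spread):
    """Generator matrix of the spread chain truncated to {1, ..., max_spread}."""
    n = max_spread
    Q = np.zeros((n, n))
    for i in range(n):  # row i corresponds to spread k = i + 1
        k = i + 1
        if k < n:
            Q[i, i + 1] = open_rate  # open by one tick
        if k > 1:
            Q[i, :i] = close_rate / (k - 1)  # uniform closing to 1, ..., k - 1
        Q[i, i] = -Q[i].sum()  # rows of a generator sum to zero
    return Q


def stationary_distribution(Q):
    """Solve pi Q = 0 together with sum(pi) = 1 in the least-squares sense."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


# Illustrative values only.
Q = spread_generator(open_rate=2.0, close_rate=1.0, max_spread=40)
pi = stationary_distribution(Q)
```

The truncation is harmless as long as the computed probabilities near `max_spread` are negligible; otherwise, the level should be increased.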
We conclude this section with the following observation: the process falls within the category of processes referred to as population processes with uniform catastrophes. An extension to processes with almost uniform catastrophes (as defined in Section 5.1) was explored in [15]. In that work, the following result was established for the maximum of the process: for any fixed
3.2. Local Drift (LLN for the Prices)
The following theorem addresses the law of large numbers (LLN) as applied to prices. This theorem will illuminate the local trends (local drift) exhibited by the prices.
Theorem 2.
With probability one, the bid price scaled by time converges to a constant
where
This result validates our conjecture regarding the impact of the spread on the local trend of prices. From a practical standpoint, given the jump rates of the bid and ask prices, we can easily compute the price trend.
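The local drift can also be estimated by simulation and compared with the value given by Theorem 2. The following is a minimal sketch, assuming four constant rates with hypothetical names: `ask_up` and `bid_down` open the spread by one tick, while `ask_in` and `bid_in` place a quote uniformly inside the spread and are only active when the spread exceeds one tick. Prices are measured in ticks, and the numerical values are illustrative.

```python
import random


def simulate_quotes(ask_up, bid_down, ask_in, bid_in, horizon, seed=0):
    """Simulate (bid, ask) in ticks up to `horizon` and return the final quotes."""
    rng = random.Random(seed)
    t, bid, ask = 0.0, 0, 1
    while True:
        events = ["ask_up", "bid_down"]
        weights = [ask_up, bid_down]
        if ask - bid > 1:  # quotes inside the spread need room
            events += ["ask_in", "bid_in"]
            weights += [ask_in, bid_in]
        t += rng.expovariate(sum(weights))
        if t > horizon:
            return bid, ask
        event = rng.choices(events, weights=weights)[0]
        if event == "ask_up":
            ask += 1
        elif event == "bid_down":
            bid -= 1
        elif event == "ask_in":
            ask -= rng.randint(1, ask - bid - 1)  # new spread uniform on 1..k-1
        else:
            bid += rng.randint(1, ask - bid - 1)


# Illustrative values only: the ratio bid / horizon approximates the local drift.
horizon = 5000.0
bid, ask = simulate_quotes(1.0, 1.0, 1.5, 1.5, horizon)
drift_estimate = bid / horizon
```

Because the spread is stationary, the ask price scaled by time yields the same limit, which gives a simple consistency check on the simulation.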
3.3. Price Volatility (CLT for the Prices)
In this section, our focus is on examining the connection between price volatility and price jump rates. Specifically, we establish a central limit theorem (CLT) for the price process. We articulate the volatility of price fluctuations around local drift in relation to the jump rates of the ask and bid prices. In other words, the central limit theorem holds for the process depicted by (A3).
As before, we first establish the central limit theorem (CLT) for the embedded discrete-time dynamics of the price (Lemma 1) and then for the continuous-time chain (Theorem 3). The proofs are given in Appendix A.
Lemma 1.
Let , then
in distribution, where
Lemma 1 establishes a connection between the “coarse-grained” volatility of intraday returns at lower frequencies and the high-frequency jump rates of prices. In simpler terms, it asserts that prices exhibit a diffusive behavior around a local drift over time, with a diffusion coefficient of . Consequently, price volatility, as determined by the number of micro-jumps in prices, is given by
Here, n represents the total count of high-frequency price jumps. Equation (8) presents a means to estimate price volatility without requiring long-term price observations. Optionally, the parameter can be interpreted as the intraday realized volatility of the asset. Thus, relation (8) establishes a link between the realized volatility and the high-frequency parameters of the order book.
Based on Lemma 1, we established Theorem 3; see proof in Appendix A.4. Note that the proof of the law of large numbers provides the following representation for local drift D in continuous time , where v is the local drift for an embedded chain provided by Lemma A1 (see Appendix A.2).
Theorem 3.
Let , then there exists such that
in distribution.
3.4. Large Deviations for the Spread
It is known that in the context of liquidity fluctuations, even a small order can trigger a substantial price change, thereby leading to a significant increase in the spread ([3,13]). Consequently, our interest lies in comprehending the mechanisms behind substantial spread changes without altering the model’s parameters. We believe that such analyses can contribute to evaluating the order book’s resilience against severe liquidity fluctuations.
In this section, we present an application of the large deviations theory to the Markov process that describes the dynamics of the spread. Specifically, we investigate the asymptotics of large deviations for the spread process. Our goal is to identify the most probable trajectory associated with a specific state of the spread, particularly when it becomes very large, over a given time interval.
The topic of large deviations for Poisson processes with uniform (or almost uniform) catastrophes has been explored in [15,16]. Large deviation analysis serves as a culminating step within a sequence of limit theorems for such processes. While the theory of large deviations is well-developed, the processes examined here do not satisfy the “classical” conditions. Consequently, the proof of large deviations remains quite technical.
To state the large deviation results, we need a scaling parameter that tends to infinity. Let T be the length of the time interval over which we observe our process. We consider the following scaled process:
We say that the family of random variables satisfies the large deviation principle (LDP) on with the rate function if for any the set is compact and for any set the following inequalities hold:
where is the Borel -algebra on and are the closure and open interior of the set B, respectively. This principle was established in [16], in which the logarithmic asymptotics for the probability was calculated. Note that the principle was proved for the state x of the spread at time T; it is not a principle on the functional space. A principle on the functional (trajectory) space makes it possible to find the (unique) optimal trajectory, that is, the trajectory that shows how such a deviation (a rare event) occurs, taking into account the evolution of the spread.
An initial approach to proving the principle in the functional space is to establish the local large deviation, which involves studying the asymptotic behavior of the probability of the process remaining within a small neighborhood of a given continuous function. We say that the family of the processes satisfies the local large deviation principle (LLDP) on the set with rate function if for any function the following inequalities hold:
where is the space of càdlàg functions, i.e., the functions that are continuous from the right, and have a limit from the left; and where .
The LLDP was proved in [15] for the compound Poisson process with almost uniform catastrophes. We note here only that the process is a special case of the processes considered in [15]. Let G be the set of absolutely continuous functions that are positive on the interval (0, 1]. To write the corresponding rate function, recall that any function with finite variation can be uniquely represented as a difference of two non-decreasing functions and such that . The functions and are called the positive and negative variations of the function f, respectively. The rate function for can then be represented for as follows:
where stands for the derivative of function f and is the indicator function.
We note that the large deviation principle and local large deviation have the same normalization factor for the probabilities, . This provides the existence of an optimal trajectory for the large deviations. The existence of the optimal trajectories of large deviations was established in [16]. If , then there exists the moment such that the spread process stays near zero up to the time and after that , increases according to the straight line which starts at point and grows up to the point with the slope ; see function in Figure 6A. If , then the process grows together with the straight line starting from the origin up to the point , i.e., its slope is x; see function in Figure 6A. For illustrative purposes of comparison, in Figure 6 we represent the optimal trajectories that provide large fluctuations for the Poisson process with rate and the process , that is, the Poisson process (of rate ) with uniform catastrophes (of rate ).
Figure 6.
The optimal trajectories for (A) spread process, which is a Poisson process (of rate ) with uniform catastrophes (of rate ), and (B) Poisson process with rate . If , then the large deviation occurs according to the functions . If , then the large deviation trajectory is in the neighborhood of the straight line . Figures created by A.Yambartsev.
3.5. Large Deviations for the Prices
The large deviation result for the spread suggests the question about the behavior of prices under a large spread. The rate function corresponding to the large deviation is essentially the rate function of a Poisson process with rate , which consists of the rates and . Here, we provide some qualitative behavior of optimal price trajectories without proof. The qualitative picture is represented in Figure 7.
Figure 7.
The optimal trajectories for prices under a large deviation of the spread when (A) the scaled spread x is less than , which consists of the rates , i.e., ; and (B) the scaled spread . Figures created by A.Yambartsev.
The main difference between the behavior of the optimal trajectories of Poisson processes and our process lies in the inclusion of the indicator function within the rate function, as seen in (9). This indicator function imposes a constraint on the possible values of the line slope—it cannot be lower than the rate of the Poisson process. Consequently, when the scaled spread is less than , a “bifurcation” point emerges. After this point, the upper line has a slope of and the lower line has a slope of . As the scaled spread surpasses , the slopes change, but the relationship between the contributions of rates and remains constant.
4. Numerical Simulations and Applications
This section provides a detailed description of the data and the empirical facts relevant to them. We also present computational simulations used to calibrate the model parameters, using the data to validate some qualitative model outcomes. Additionally, we demonstrate a practical application that confirms our model’s short-term predictive capabilities.
4.1. HFT Data
The dataset consists of NASDAQ high-frequency trading (HFT) data for Apple Inc., collected through the Bloomberg stock trading platform. High-frequency data are collected within the day (intraday) and recorded tick by tick. The dataset covers 12 h of trading activity each day, specifically, the entire population of trading events for 3 and 4 March 2011. Within these 12 h, the activity of the order book quotes, measured by the frequency of price jumps, remains stable between 180 and 540 min after the market opens, as shown in Figure 8. Consequently, for both trading days, we analyze data from within these time intervals. It is noteworthy that during these intervals there are approximately 305 thousand price jumps per day, a characteristic that typically persists on a daily basis.
Figure 8.
Intensity of price jumps: average number of micro-jumps per minute. Upper graph: 3 March 2011. Lower graph: 4 March 2011. Figures created by H.Rojas.
4.2. Empirical and Qualitative Facts
In this section, based on the available data presented above, we present some empirical and qualitative characteristics that are typically observed in high-frequency trading markets. The objective of this section is to corroborate whether the assumptions underlying our model align with these recurring qualitative features commonly found in most markets.
As previously mentioned, our model’s key assumption is that in conditions of low liquidity, characterized by a high intensity of price jumps, the sizes of the bid and ask orders diminish in significance as the primary factors influencing the price dynamics. In contrast, under these conditions, the bid–ask spread becomes the determining factor in predicting price dynamics.
To validate this assumption empirically, we classify the observed price variations into two categories: up variations (upward price jumps) and down variations (downward price jumps). We then use these binary labels as the target variable and classify the jumps using the bid–ask spread and the sizes of the bid and ask orders as predictors.
The idea is that the greater the predictive power of a variable, the greater its influence on price dynamics. To measure the predictive power of the order sizes, we use the imbalance, that is, the bid size divided by the sum of the bid and ask sizes.
There are various dissimilarity metrics, known as divergence measures in information theory, that we can use to assess the predictive power of the bid–ask spread and the imbalance. Due to its simplicity and wide use in classification problems, we will employ the Jeffreys divergence, also known as the information value (IV); see, for example, reference [17].
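For a binned predictor, the Jeffreys divergence between the bin distributions of up and down jumps is the sum over bins of (p − q)·ln(p/q), where p and q are the shares of up and down jumps falling in the bin. The following is a minimal sketch of that computation; the function name, the additive `smoothing` constant used to avoid empty bins, and the example counts are assumptions made for illustration and are not taken from the data.

```python
import math


def information_value(up_counts, down_counts, smoothing=0.5):
    """Jeffreys divergence (information value) between two binned samples.

    up_counts[i] and down_counts[i] are the numbers of up and down jumps
    observed in bin i of the predictor (e.g., a spread or imbalance bucket).
    With smoothing=0, every bin must contain both up and down jumps.
    """
    if len(up_counts) != len(down_counts):
        raise ValueError("both classes must be counted on the same bins")
    up = [c + smoothing for c in up_counts]
    down = [c + smoothing for c in down_counts]
    up_total, down_total = sum(up), sum(down)
    iv = 0.0
    for u, d in zip(up, down):
        p, q = u / up_total, d / down_total
        iv += (p - q) * math.log(p / q)  # each term is non-negative
    return iv


# Hypothetical counts for a predictor split into three buckets.
iv_example = information_value([120, 80, 40], [60, 90, 110])
```

The divergence is symmetric in the two classes and equals zero exactly when the two bin distributions coincide, which is why larger values are read as stronger predictive power.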
Based on the information value (IV) (see Figure 9) we corroborate that the bid–ask spread has a persistent and relevant influence on the price dynamics. On the other hand, the imbalance, and consequently the size of the orders, has an influence that disappears quickly over time. Therefore, our main assumption of the model, the prevailing influence of the spread on price dynamics, is empirically validated in our data. It is worth noting that the influence of the bid–ask spread fluctuates, and our model implicitly accounts for this empirical observation.
Figure 9.
The vertical axis corresponds to the information value (IV). The horizontal axis corresponds to the time lag (lags) taken into account for the calculation of the divergence, that is, contiguous periods where price jumps occur. The graph on the left corresponds to the bid–ask spread predictor. The graph on the right corresponds to the imbalance predictor. Figures created by H.Rojas.
4.3. Parameter Estimation and Monte Carlo Experiments
In this section, we explore the steady-state properties of our proposed model using Monte Carlo simulations. We compare the empirically observed long-term behavior (unconditional properties) of the OB to simulations of the fitted model. The goal of these simulations is to indicate how well the model reproduces the average properties of the OB. The transition rates of can be estimated by
where T is the length of our sample (in seconds), () is the total number of jumps where the ask price increases (decreases), and () is the total number of jumps where the bid price increases (decreases).
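The estimator described above divides each jump count by the length of the sample. A minimal sketch of that computation follows, assuming the quotes are available as sequences recorded at every order book event; the function name, the dictionary keys, and the toy series are hypothetical.

```python
def estimate_rates(bid, ask, duration):
    """Estimate jump rates as (number of jumps) / (sample length in seconds).

    `bid` and `ask` are the quote series observed at successive events.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")

    def count_jumps(series):
        ups = sum(1 for x, y in zip(series, series[1:]) if y > x)
        downs = sum(1 for x, y in zip(series, series[1:]) if y < x)
        return ups, downs

    ask_ups, ask_downs = count_jumps(ask)
    bid_ups, bid_downs = count_jumps(bid)
    return {
        "ask_up": ask_ups / duration,
        "ask_down": ask_downs / duration,
        "bid_up": bid_ups / duration,
        "bid_down": bid_downs / duration,
    }


# Toy series in ticks, observed over 2 seconds (not real data).
rates = estimate_rates(bid=[10, 10, 9, 9, 10], ask=[11, 12, 12, 11, 11], duration=2.0)
```

Each estimate is the maximum likelihood estimator of a constant Poisson intensity, which is why the stable window of Figure 8 matters for the choice of sample.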
From the Apple stock data, since our model (4) only allows spread openings in one tick, we selected approximately 15 continuous minutes of trading for which spread openings only occurred in one tick, which corresponds to the interval of 365 to 380 min after the market opens. In this sub-sample, using (10), we obtain the following: s, , , , and . Based on the estimation of the parameters , we simulate the price process over a long horizon of 900 s, which corresponds to what was empirically observed, and observe the evolution of prices in two time windows. The results are displayed in Figure 10. The results of our simulations illustrate that our model reproduces realistic characteristics for both the short- and long-term price behavior, which were presented for the empirical data in Figure 2.
Figure 10.
Simulation of the order book with parameters , , , and . Upper left: Short-term evolution of bid (blue) and ask (red) prices, 1 min sample. Upper right: Long-term evolution of the prices, 15 min sample. Bottom left: short-term path of the price process , 1 min sample. Bottom right: long-term path of the price process, 15 min sample. Figures created by H.Rojas.
The simulation results demonstrate that our model also accurately captures realistic characteristics of the (steady-state) average behavior of the order book (OB) profile. Notably, the model successfully replicates the negative autocorrelation of price changes at the first lag. Empirical observations indicate that there is a pronounced negative autocorrelation at the first lag in the autocorrelation function of the transaction price returns. This negative autocorrelation is significant at the first lag and rapidly diminishes thereafter, as depicted in Figure 11.
Figure 11.
Autocorrelation function of price return based on our simulations of the order book. The red dashed line represents the 95% confidence interval. Figure created by H.Rojas.
This phenomenon is commonly referred to as the bid–ask bounce [12], largely arising from having distinct trading prices for buyer-initiated and seller-initiated transactions. While this negative autocorrelation vanishes when considering aggregate returns, it is a noteworthy microstructural effect that must be considered in an order book model. Our model successfully replicates this empirical characteristic. Therefore, we conclude that we have sufficient evidence to argue that our model reproduces qualitative characteristics that are realistic and relevant.
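The bid–ask bounce is easy to reproduce in isolation. The sketch below is a stylized illustration and not the order book model of this paper: it assumes a constant mid-price, a constant half-spread, and trades that hit the bid or the ask with equal probability, independently of each other. Under these assumptions the lag-one autocorrelation of returns is −0.5 and vanishes at higher lags. All names and values are illustrative.

```python
import numpy as np


def autocorrelation(returns, max_lag):
    """Sample autocorrelation function for lags 0, ..., max_lag."""
    x = np.asarray(returns, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    lags = [np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)]
    return np.array([1.0] + lags)


rng = np.random.default_rng(0)
n_trades = 5000
mid_price, half_spread = 100.0, 0.01  # illustrative values
sides = rng.choice([-1.0, 1.0], size=n_trades)  # -1: trade at bid, +1: at ask
trade_prices = mid_price + half_spread * sides
returns = np.diff(trade_prices)

acf = autocorrelation(returns, max_lag=5)
band = 1.96 / np.sqrt(len(returns))  # approximate 95% band under no correlation
```

Lags at which `abs(acf[k])` exceeds `band` are the ones that would fall outside the dashed lines in a plot such as Figure 11.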
4.4. Application: Prediction of Next Price Jump Direction
In this section, we present a direct application of our model: the short-term forecast of price movement. We use the proposed model to calculate the probability that the price will increase in the next jump, conditional on the observed state of the OB. This quantity is particularly important in financial trading, as it is used in the design of high-frequency trading strategies. From the transition rates of the price process in (4), the probability that the price will increase in the next jump, conditional on the observed state of the OB, is given by
where is the change in the mid-price. To increase the precision of the forecasts, we suggest calculating the theoretical probabilities taking into account the different fixed values of the imbalance and the spread, that is,
where is the number of ask orders and the number of bid orders. In other words, we will use (11) for buckets or bins generated by combinations of bid–ask spread and imbalance values.
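In a continuous-time Markov chain, the probability that the next jump is of a given type equals the rate of that type divided by the total jump rate out of the current state. The sketch below applies this fact to the mid-price under stated assumptions: four constant rates with hypothetical names, where `ask_up` and `bid_in` raise the mid-price, `bid_down` and `ask_in` lower it, and quotes inside the spread are only possible when the spread exceeds one tick. It is an illustration of the principle and is not claimed to coincide with Formula (11); in a bucketed version, one set of rates would be estimated per bucket.

```python
def prob_next_jump_up(spread, ask_up, bid_down, ask_in, bid_in):
    """Probability that the next mid-price jump is upward, given the spread in ticks."""
    if spread < 1:
        raise ValueError("the spread must be at least one tick")
    inside = 1.0 if spread > 1 else 0.0  # no room inside a one-tick spread
    up_rate = ask_up + inside * bid_in
    down_rate = bid_down + inside * ask_in
    return up_rate / (up_rate + down_rate)


# Illustrative rates only.
p_tight = prob_next_jump_up(spread=1, ask_up=2.0, bid_down=1.0, ask_in=3.0, bid_in=0.5)
p_wide = prob_next_jump_up(spread=3, ask_up=2.0, bid_down=1.0, ask_in=3.0, bid_in=0.5)
```

With these illustrative rates the probability drops from 2/3 at a one-tick spread to 2.5/6.5 at a wider spread, which shows how the spread alone can change the predicted direction.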
As for the calculation of the IV, for the purposes of empirical contrast of the theoretical quantities , the observed variations in the mid-price were classified into two categories. Variations with mid-price increases were categorized as up variations. On the other hand, downward variations were categorized as down variations. Once the variations in mid-prices were dichotomized, the total set of 305 thousand observations was divided into two sub-samples.
The first sub-sample, which we call the training sample, corresponded to 70% of the total observations and was used to estimate the model parameters. The second sub-sample, called the test sample, corresponding to 30% of the total observations, was used to validate the performance of the model forecasts.
With the estimated parameters, we calculate the empirical probabilities using the test sample. Additionally, using (11), we calculate the theoretical predicted probabilities for the same dataset. For comparison, Figure 12 presents both quantities. The figure confirms that the model predicts variations in the mid-price with good precision. Furthermore, to confirm the precision of our forecasts numerically, we present the ROC curve in Figure 13, which corroborates, with a score of 96%, the predictive power of the model for up variations in the mid-price.
Figure 12.
Empirical probabilities versus theoretical probabilities. Figure created by H. Rojas.
Figure 13.
Accuracy in classifying variations up; the event of interest (target variable) is the variation up. Figure created by H.Rojas.
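The validation procedure of this section can be sketched as follows. This is a minimal sketch under stated assumptions and not the authors' pipeline: the text does not say whether the 70/30 split is chronological or random (a chronological split is assumed here), and it does not state that the reported score is the area under the ROC curve (that reading is assumed here). The function names are hypothetical.

```python
def chronological_split(observations, train_percent=70):
    """Split a sequence into a leading training part and a trailing test part."""
    cut = len(observations) * train_percent // 100
    return observations[:cut], observations[cut:]


def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Over all (positive, negative) pairs, counts how often the positive
    example receives the higher score; ties count one half.
    labels: 1 for an up variation, 0 for a down variation.
    scores: predicted probabilities of an up variation.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

The pairwise loop is quadratic in the sample size and is meant only to make the definition explicit; for a sample of several hundred thousand observations, a sort-based computation would be the practical choice.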
5. Discussion: Other Regimes
In this section, we outline the formulations of the two remaining regimes that our general model can encompass. As previously indicated, the main results of this article can potentially be generalized to these regimes, and we believe that, qualitatively, the outcomes will differ little. We hope that these formulations will encourage future research aimed at extending our results.
5.1. Almost Uniform Catastrophes
As we mentioned before, the large deviations were proved for the so-called almost uniform catastrophes. Recall that in order to close the spread of length k (with probability ), we choose the next state for the spread with the same probability (uniformly) from the set and denote these probabilities as , and here . The almost uniform distribution is defined by the following form of probabilities : there exists a constant such that for all
for all . This definition extends the class of models for highly competitive regimes. For example, for any spread length k, the set of possible next states can be divided into several parts, say two: with probability , the first part is chosen and the next state is drawn uniformly from it; with probability , the second part is chosen and the next state is drawn uniformly from it.
All the proofs above carry over to this case with minor modifications.
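The two-part example can be made concrete with a short sketch. Several choices below are assumptions made only for illustration: the next state is taken from {1, ..., k-1}, the parts are {1, ..., split} and {split+1, ..., k-1}, and the almost uniform condition is read as a bound on the ratio between each closing probability and the uniform one. The function names are hypothetical.

```python
from fractions import Fraction


def two_part_probabilities(k, split, p_first):
    """Closing probabilities over states 1..k-1 for a two-part mixture.

    States 1..split form the first part (chosen with probability p_first),
    states split+1..k-1 form the second part; within the chosen part the
    next state is uniform. Returns a dict state -> probability.
    """
    if not 1 <= split < k - 1:
        raise ValueError("both parts must be non-empty")
    p_first = Fraction(p_first)
    first = range(1, split + 1)
    second = range(split + 1, k)
    probs = {j: p_first / len(first) for j in first}
    probs.update({j: (1 - p_first) / len(second) for j in second})
    return probs


def max_ratio_to_uniform(probs):
    """Largest ratio between a closing probability and the uniform one."""
    uniform = Fraction(1, len(probs))
    return max(p / uniform for p in probs.values())


# Spread of length 9, parts {1..4} and {5..8}, first part chosen w.p. 3/4.
example = two_part_probabilities(9, 4, Fraction(3, 4))
example_ratio = max_ratio_to_uniform(example)  # equals 3/2 in this example
```

When the split point stays at a fixed fraction of k, the ratio stays bounded as k grows, which is the sense in which such a mixture remains close to uniform under the reading assumed here.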
5.2. Non-Competitive Regime
The main feature of the non-competitive regime (NC regime) is small openings of the spread, as in the HC regime. However, in the NC regime, the spread decreases slowly (following a power law). This slow decrease occurs because agents placing limit orders within the spread prioritize achieving an optimal price in their quotes; compared to the HC regime, the agents are less impatient. The spread opens by one tick at some constant rate. For the closing of the spread, let be the spread size, and the variation in the prices is chosen from according to the rate which is proportional to , where is a fixed positive number. The parameter can be interpreted as a behavioral measure of the agents' tendency to seek a more favorable price in their negotiations.
Model: Closing the Spread Polynomially
In order to define the rates for the NC regime, let us again fix parameters which are strictly positive real numbers. Suppose that at some moment the chain is at some state , then the transition rates for the Markov chain in this regime are defined in the following way:
For illustration, see Figure 14 in the case when .
Figure 14.
The rates for the non-competitive model. An illustrative example for the case when . Figure created by A.Yambartsev.
Again, as before, first, we study the Markov chain . Suppose that at some moment t the chain is at some state , and let and , then
These transitions suggest that the spread reverts slowly to its typical values, because each liquidity provider competes with the others to close the spread.
5.3. Low Liquidity with Gaps Regime
The main feature of the low liquidity with gaps regime (LLG regime) is that the spread can open by more than one tick, owing to the existence of gaps in the OB. The spread decreases similarly to the NC regime.
Model
Let us fix parameters and , which are strictly positive real numbers. Suppose that at some moment the chain is at some state , then the transition rates for the Markov chain for this regime are defined as follows.
For illustration, see Figure 15 in the case when .
Figure 15.
The rates for the low-liquidity model. An illustrative example for the case when . Figure created by A.Yambartsev.
Informally, this Markov chain is easier to describe as follows: for a given state
- with the rate , the chain decides to increase the ask price, and it chooses the increment according to the geometric distribution with parameter ;
- with the rate , the chain decides to decrease the ask price, and it chooses the increment according to the truncated geometric distribution with parameter and values ;
- with the rate , the chain decides to decrease the bid price, and it chooses the increment according to the geometric distribution with parameter ;
- with the rate , the chain decides to increase the bid price, and it chooses the increment according to the truncated geometric distribution with parameter and values .
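The increment laws in the list above can be sketched as follows. This is an illustration under assumptions that the text does not fix: the geometric law is taken on {1, 2, ...} with P(X = j) = p(1 - p)^(j - 1), and the truncated version is the same law restricted to {1, ..., max_value} and renormalized. In the closing moves the truncation level would be tied to the current spread (for instance, spread minus one so that the spread stays positive); that choice is also an assumption. The function names are hypothetical.

```python
import random


def truncated_geometric_pmf(p, max_value):
    """PMF of a geometric law restricted to {1, ..., max_value}, renormalized."""
    weights = [p * (1 - p) ** (j - 1) for j in range(1, max_value + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_geometric(p, rng):
    """Draw from P(X = j) = p (1 - p)^(j - 1), j = 1, 2, ..., by repeated trials."""
    j = 1
    while rng.random() >= p:  # failure with probability 1 - p
        j += 1
    return j


def sample_truncated_geometric(p, max_value, rng):
    """Draw from the truncated geometric law on {1, ..., max_value}."""
    pmf = truncated_geometric_pmf(p, max_value)
    return rng.choices(range(1, max_value + 1), weights=pmf, k=1)[0]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
opening_jump = sample_geometric(0.5, rng)               # unbounded increment
closing_jump = sample_truncated_geometric(0.5, 3, rng)  # increment in {1, 2, 3}
```

The unbounded law models an opening move that can skip several ticks because of gaps, while the truncated law keeps a closing move within the currently available spread.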
Once more, following the same approach, we initiate our analysis by studying the Markov chain . Let us consider a particular time instance t when the chain is situated at state , then the following transitions occur:
6. Conclusions
We propose a straightforward model for price dynamics in an OB with liquidity fluctuations. Unlike [9], our model does not explicitly capture liquidity fluctuations but still offers a reasonable approximation for empirical observations. The continuous-time Markov process describing spread dynamics falls into the category of Poisson processes with uniform catastrophes, where the eliminated fraction of the population follows a uniform distribution. Large deviation results for such processes have been studied in [15,16].
When the spread closure (catastrophe) follows a uniform distribution, it accurately represents a scenario of complete uncertainty in the decisions of bidders. In such a case, any change in the spread is equally probable, indicating an exceptionally unusual situation often associated with extremely high volatility. By examining such scenarios, it is conceivable, to a significant extent, to devise a decision-making algorithm with a substantial degree of reliability. Alternatively, if one seeks to hedge against substantial fluctuations in the spread, our model may offer insights into calculating the appropriate insurance premiums needed to ensure that the risk of financial collapse remains below a predetermined threshold.
We examined the asymptotic behavior of the model, which involved determining the invariant measure (a rare case where it can be derived explicitly), and establishing results such as the law of large numbers, the central limit theorem, and large deviations. These theoretical findings were utilized in Monte Carlo simulations to validate that our model reproduces relevant empirical characteristics.
We conclude our paper by discussing potential future research directions, including the exploration of other liquidity regimes and model extensions for various applications.
Author Contributions
Conceptualization, H.R., A.L. and A.Y. All authors have read and agreed to the published version of the manuscript.
Funding
Artem Logachov is supported by the Ministry of Science and Higher Education of the Russian Federation FWNF-2022-0010. Anatoly Yambartsev thanks the support of FAPESP via grant 2017/10555-0.
Data Availability Statement
The data presented in this study may be available on reasonable request from the first or corresponding author.
Acknowledgments
We thank Sasha Stoikov for providing us with the NASDAQ high-frequency trading (HFT) data for Apple Inc., collected through the Bloomberg stock trading platform. We thank Vadim Scherbakov for fruitful discussions.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A
Appendix A.1. Ergodicity and Invariant Measure for S(t): Proof of Theorem 1
To establish positive recurrence, we introduce a Lyapunov function based on the criteria for continuous-time Markov chains as outlined in [18], Theorem 1.7. Positive recurrence can be inferred from the existence of a non-negative function f (Lyapunov function) across the set of states, a small positive value , and a finite set of states F. Specifically, by applying the process generator to function f, we ensure that holds true for all states x not within the set F.
Recall that the generator for a discrete-state Markov process is represented by the matrix , where for corresponds to the transition rate from state x to state y, and . Applying the generator to the identity function , we obtain
for all . The last inequality provides the finite set for the criteria above with . Thus, the criterion ensures that the Markov process is positively recurrent.
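The drift computation can be illustrated numerically. The sketch below does not use the rates of the model itself; it uses a hypothetical stand-in chain (rate `lam` for an increase by one, and, from states k >= 2, total rate `mu` for a jump to a state chosen uniformly from {1, ..., k-1}) to show how applying the generator to the identity function yields a drift that is negative outside a finite set. All names and parameter values are assumptions.

```python
def generator_applied(rates, f, x):
    """Return (Lf)(x) = sum over y of q(x, y) * (f(y) - f(x)).

    rates(x) must return a dict {y: q(x, y)} of off-diagonal transition rates.
    """
    return sum(q * (f(y) - f(x)) for y, q in rates(x).items())


def uniform_catastrophe_rates(lam, mu):
    """Hypothetical rates: +1 at rate lam; from k >= 2, a jump at total
    rate mu to a state chosen uniformly from {1, ..., k-1}."""
    def rates(k):
        out = {k + 1: lam}
        if k >= 2:
            for j in range(1, k):
                out[j] = mu / (k - 1)
        return out
    return rates


def identity(k):
    return k


toy_rates = uniform_catastrophe_rates(1.0, 1.0)
# For this toy chain the drift equals lam - mu * k / 2 (up to rounding).
drift_at_4 = generator_applied(toy_rates, identity, 4)
```

For this stand-in chain with `lam = mu = 1`, the drift is at most -0.5 for every k >= 3, so the finite exceptional set can be taken as {1, 2}; the same reasoning pattern is what the criterion above requires.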
The formula for the invariant measure can be verified through direct examination using the global balance equations. Beginning with the system of global balance equations, we derive the following.
Dividing both sides by and using the notation , we rewrite the last system as
After normalizing the relation
and since is the probability measure, , we return the notation and obtain Formula (6).
Appendix A.2. Law of Large Numbers: Proof of Theorem 2
Perhaps the most straightforward method to prove the LLN involves the ergodic theorem for discrete-time Markov chains. Let denote the embedded discrete-time Markov chain on derived from , with the following transition probabilities:
Consider the stationary measure associated with the chain . Evidently, the relation (6) for the stationary measure can be transformed into the following relation for the stationary measure :
This transformation can be verified directly using the global balance equations of the discrete-time Markov chain .
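The passage from the stationary measure of a continuous-time chain to that of its embedded jump chain follows a standard rule: the embedded measure is proportional to the continuous-time measure multiplied by the total exit rate of each state. The sketch below illustrates this rule on a two-state toy chain; the function name and the toy rates are assumptions, and the sketch does not check that the rule coincides term by term with the relation displayed above.

```python
def embedded_stationary(pi, exit_rate):
    """Stationary law of the embedded jump chain of a continuous-time chain.

    pi: dict state -> stationary probability of the continuous-time chain.
    exit_rate: dict state -> total rate of leaving that state.
    Returns nu with nu(x) proportional to pi(x) * exit_rate(x).
    """
    weights = {x: pi[x] * exit_rate[x] for x in pi}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


# Toy two-state chain: 0 -> 1 at rate 1 and 1 -> 0 at rate 3, so the
# continuous-time stationary law is (3/4, 1/4).
toy_pi = {0: 0.75, 1: 0.25}
toy_exit = {0: 1.0, 1: 3.0}
toy_nu = embedded_stationary(toy_pi, toy_exit)
```

In the toy chain the embedded chain simply alternates between the two states, so its stationary law is uniform, which is what the reweighting returns.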
Denote as the discrete-time embedding chain corresponding to the continuous-time bid-price process . The behavior of can be expressed as a function of the spread dynamics using the following formula.
where
is the sequence of independent and identically distributed random vectors such that
and where the function F is
Note that
is a Markov chain, and let be its invariant measure. Observe that the discrete part of the invariant measure for the process is the product . By the ergodic theorem, we obtain the LLN for the embedding chain .
Lemma A1.
where
Proof.
The ergodic theorem states the convergence (A6). Thus, we need only to find the v, which is the expectation over the invariant measure of the increments :
which finishes the proof of the lemma. □
To finish the proof of Theorem 2, we observe
where is the Poisson process with rate .
Appendix A.3. CLT for Embedding pn: Proof of Lemma 1
One approach to proving this is to demonstrate the geometric ergodicity of the chain , which indicates a geometric rate of convergence to the invariant measure:
where stands for the total variation norm. Subsequently, we can apply the results applicable to geometrically ergodic chains. Formally, this necessitates establishing that the chain , defined by (A5), is a Harris ergodic Markov chain, which indeed holds true for . For further details, please refer to [19].
Theorem A1
(Corollary 2, [19]). Let X be a Harris ergodic Markov chain on with invariant distribution π and let be a Borel function. Assume that X is geometrically ergodic and for some . Then, for any initial distribution, as
in distribution.
Let us begin by proving the geometric ergodicity of the chain . General results exist concerning the so-called drift conditions for establishing geometric ergodicity in chains, as detailed in [20]. However, for countable Markov chains, we can utilize the criteria outlined in [21] (refer to Theorem 2):
A countable Markov chain is geometrically ergodic if there exists a finite set and function , such that when and , if .
Proof.
Utilizing the criteria, let . To check the conditions we consider the function for the chain , defined as follows:
For simplicity, let us assume and . Then,
Consider two cases. First, we suppose that . Then,
Thus, the condition holds for all such that . In the second case, , we have :
There is no such that (A8) holds for all x. However, it is easy to see that there exist and such that for all under the condition
It is easy to see that
Thus, in this case, we can define the finite set B from the condition as
This completes the proof of the geometrical ergodicity of the chain . □
For the second condition of the CLT theorem, it is necessary to verify that for a certain , where the function F is defined by (A4). To achieve this, we require insights into the behavior of the invariant measure.
Proof.
As before, let be the invariant measure for the chain . The condition takes the following form:
With that, we conclude the proof of the CLT, as stated in Lemma 1.
Appendix A.4. CLT for Continuous-Time P b (t): Proof of Theorem 3
Based on the result above, let us proceed to establish the central limit theorem for the price process . As previously mentioned, consider to be the Poisson process with rate , representing the count of jumps for the process . The subsequent representation holds:
According to Lemma 1 and CLT for the Poisson process, we expect that as
The second and third convergences are well known, while the first can be proved as follows: let be the cumulative distribution function of the scaled embedded price Markov chain from Lemma 1; then for any there exists such that for all
where stands for the cumulative normal distribution with zero mean and variance . Then,
Observe that two normal variables from (A10) are not independent; however, they are asymptotically uncorrelated and, furthermore, asymptotically independent. To demonstrate this, consider variables
We will show that they are asymptotically independent.
Since the second variable is a measurable function of the it suffices to prove that for all , all sets , and all , the following inequality holds:
If the set A is bounded from above, then the inequality holds:
Suppose now that A is not bounded. In this case, for any we obtain
Thus, for any we have
and the following inequality holds
Choosing , we obtain inequality (A11).
References
- Biais, B.; Hillion, P.; Spatt, C. An empirical analysis of the limit order book and the order flow in the Paris Bourse. J. Financ. 1995, 50, 1655–1689.
- Smith, E.; Farmer, J.D.; Gillemot, L.; Krishnamurthy, S. Statistical theory of the continuous double auction. Quant. Financ. 2003, 3, 481–514.
- Bouchaud, J.P.; Farmer, J.D.; Lillo, F. How markets slowly digest changes in supply and demand. In Handbook of Financial Markets: Dynamics and Evolution; Elsevier: Amsterdam, The Netherlands, 2009; pp. 57–160.
- Cont, R.; Stoikov, S.; Talreja, R. A stochastic model for order book dynamics. Oper. Res. 2010, 58, 549–563.
- Avellaneda, M.; Reed, J.; Stoikov, S. Forecasting prices from Level-I quotes in the presence of hidden liquidity. Algorithmic Financ. 2011, 1, 35–43.
- Cont, R.; De Larrard, A. Price dynamics in a Markovian limit order market. SIAM J. Financ. Math. 2013, 4, 1–25.
- Cont, R.; Mueller, M.S. A Stochastic Partial Differential Equation Model for Limit Order Book Dynamics. SIAM J. Financ. Math. 2019, 12, 744–787.
- Golub, A.; Keane, J.; Poon, S.H. High frequency trading and mini flash crashes. arXiv 2012, arXiv:1211.6667.
- Dall’Amico, L.; Fosset, A.; Bouchaud, J.P.; Benzaquen, M. How does latent liquidity get revealed in the limit order book? J. Stat. Mech. Theory Exp. 2019, 2019, 013404.
- Lo, D.K.; Hall, A.D. Resiliency of the limit order book. J. Econ. Dyn. Control 2015, 61, 222–244.
- Riccò, R.; Rindi, B.; Seppi, D.J. Information, Liquidity, and Dynamic Limit Order Markets; Bocconi University: Milano, Italy, 2022.
- Roll, R. A simple implicit measure of the effective bid-ask spread in an efficient market. J. Financ. 1984, 39, 1127–1139.
- Doyne Farmer, J.; Gillemot, L.; Lillo, F.; Mike, S.; Sen, A. What really causes large price changes? Quant. Financ. 2004, 4, 383–397.
- Biais, B.; Weill, P.O. Liquidity Shocks and Order Book Dynamics; Technical Report; National Bureau of Economic Research: Cambridge, MA, USA, 2009.
- Logachov, A.; Logachova, O.; Yambartsev, A. The local principle of large deviations for compound Poisson process with catastrophes. Braz. J. Probab. Stat. 2021, 35, 205–223.
- Logachov, A.; Logachova, O.; Yambartsev, A. Large deviations in a population dynamics with catastrophes. Stat. Probab. Lett. 2019, 149, 29–37.
- Rojas, H.; Alvarez, C.; Rojas, N. Statistical Hypothesis Testing for Information Value (IV). arXiv 2023, arXiv:2309.13183.
- Menshikov, M.; Petritis, D. Explosion, implosion, and moments of passage times for continuous-time Markov chains: A semimartingale approach. Stoch. Process. Their Appl. 2014, 124, 2388–2414.
- Jones, G.L. On the Markov chain central limit theorem. Probab. Surv. 2004, 1, 299–320.
- Meyn, S.P.; Tweedie, R.L. Markov Chains and Stochastic Stability; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012.
- Popov, N. Geometric ergodicity conditions for countable Markov chains. Dokl. Akad. Nauk. Russ. Acad. Sci. 1977, 234, 316–319.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).