1. Introduction
Limit order books (LOBs) are now widely used in financial markets to help traders manage their orders and execute transactions. In the LOB, the outstanding limit orders for a financial asset can be viewed as the liquidity currently provided to the market, while a market order issued by any trader is an instant demand for that liquidity. A liquidity demander who wants to buy (or sell) shares immediately takes the price at the best ask (or the best bid) to complete the transaction. However, because of trading latency, traders face uncertainty about the transaction price, as the quoted prices at the best bid and the best ask can fluctuate between the time an order is issued and the time it is matched. The problem is critical for traders with relatively high trading latency and becomes severe in periods when the quoted price changes quickly.
Because of the development of algorithmic and high-frequency (HF) trading over the last two decades, some traders have a speed advantage in executing their orders. The volatility of the quoted price and the latency of trades together determine the uncertainty of the purchase (or sale) price for liquidity demanders. Hasbrouck [1] pointed out that low-latency traders enjoy a lower cost of demanding liquidity; however, that paper assumes that the price path is already given at the time of order submission. In practice, the dynamics of the quoted price are quite unpredictable, with some periods being much more volatile than others. A trader who arrives when the quoted price is more volatile will face a higher risk in the transaction price.
Different traders have different latencies, from microseconds for high-frequency traders to dozens of seconds for ‘slow’ traders; see [2]. To provide a framework for quantifying quote volatility and the cost of demanding liquidity for different traders, we resort to the instantaneous volatility of the quoted price at any point in time and then construct an integrated variance over different time horizons. However, estimating instantaneous volatility is particularly difficult for irregularly spaced HF LOB data.
With the development of information technology and electronic trading, the updating frequency of order events has reached the level of microseconds, while there are also quiet periods in which the LOB stays unchanged for a couple of seconds. To deal with such irregularly spaced data, point process theory is employed. Under this theory, the arrivals of a certain type of market event are viewed as ordered points occurring in time. A number of econometric models have been built to describe the occurrence of limit order submissions, trade arrivals, price changes, etc. [3,4], and many of them directly study the financial durations between those market events [5,6,7].
Recently, Yang et al. [8] applied the autoregressive conditional duration (ACD) model of [5] and the Markov-switching multifractal inter-trade duration (MSMD) model of [9] to study the LOB and found limitations in their fit to HF transaction data. Abergel and Jedidi [10] introduced the Hawkes process to model the LOB, in which past events can influence the occurrence intensity of a current event. Then, Swishchuk and Huffman [11] constructed general compound Hawkes processes and investigated their properties in the LOB. Later, Morariu-Patrichi and Pakkanen [12] applied state-dependent Hawkes processes to HF LOB data and built a novel model that captures the feedback loop between the order flow and the shape of the LOB. In addition, Li et al. [13] used a time-varying Markov regime-switching model to study the arrival times of trades in the LOB and captured the bimodal distribution of inter-trade durations.
Although point processes have been extensively exploited to model HF financial data, short-term volatility measurement using price durations is not common in the literature. Cho and Frees [14] initiated a discussion of using price durations as a volatility measure, and Gerhard and Hautsch [15] formally proposed a volatility estimator based on price durations. Tse and Yang [16] developed duration-based variance estimators using ACD specifications, and recently, Hong et al. [17] proposed a non-parametric duration-based estimator and concluded that duration-based volatility estimators are more efficient than noise-robust realized volatility estimators. Furthermore, some papers focus on modeling the intensity function itself. By defining the intensity in continuous time and allowing the intensity process to be updated whenever required, the instantaneous volatility can be calculated directly from the intensity. Russell [18] first proposed a univariate dynamic intensity model, the autoregressive conditional intensity (ACI) model, which follows an autoregressive structure that is updated at the time of new market events. The ACI model was then extended to stochastic conditional intensity processes and multivariate processes [19,20,21].
In this paper, we propose a new change-point model to measure quote volatility. The seminal work on change-point models can be traced back to Box and Tiao [22], who addressed a common modeling problem for time series whose parameters may undergo occasional changes. The problem arises in many applications, e.g., engineering, econometrics, and biomedicine. However, a general statistical method for estimation was not developed until Lai et al. [23], who used the BCMIX (bounded complexity mixture) approximation to reduce the computational complexity. In our framework, we view a jump in the intensity of quote-price movements as a change point. Moreover, the domain set for renewing the intensity is infinite, and the renewing distribution is continuous. Thus, our model can generate a much larger state space than the traditional ACI model. The empirical results show that our model fits current HF data much better. Moreover, the change-point intensity model can not only measure the cost of demanding liquidity for traders with different latencies but can also be used to test for volatility jumps in HF environments.
The paper is organized as follows.
Section 2 introduces the method of volatility measurement using the price-change duration and our change-point intensity model.
Section 3 provides the estimation procedure and the simulation results.
Section 4 presents the data we used.
Section 5 shows the in-sample fit of our model and the measurement of HF quote volatility, with a benchmark comparison with the ACI model.
Section 6 further implements the out-of-sample test and evaluates the model’s predictive power.
Section 7 provides the conclusions.
2. Volatility Measurement Using Price Duration
2.1. Instantaneous Volatility Measurement Using Price Duration
According to [5], the instantaneous volatility can be measured by the conditional instantaneous variance of returns, which is defined as follows:

$$\sigma^2(t) = \lim_{\Delta \downarrow 0} \frac{1}{\Delta}\, E\left[ \left( \frac{P(t+\Delta) - P(t)}{P(t)} \right)^2 \,\Big|\, \mathcal{F}_t \right],$$

where $P(t)$ is the price process of a financial security, and $\mathcal{F}_t$ denotes the information set up to and including $t$. Following [15,16], duration-based variance estimators rely on a relationship between the conditional intensity function and the conditional instantaneous variance of a point process. Specifically for price volatility, we can consider the price-change process as a point process. We define $\delta$ as the threshold of the price-change event, and $t_0, t_1, \ldots, t_n$ are the times when these price-change events occur. Clearly, the number of events $n$ depends on the value of $\delta$.
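If we define $d_i = t_i - t_{i-1}$ as the price duration between two consecutive price-change events, then the conditional variance per unit time over the price duration is as follows.

$$\sigma^2(t_{i-1}) = E\left[ \frac{1}{d_i} \left( \frac{\delta}{P(t_{i-1})} \right)^2 \,\Big|\, \mathcal{F}_{t_{i-1}} \right] = \left( \frac{\delta}{P(t_{i-1})} \right)^2 E\left[ \frac{1}{d_i} \,\Big|\, \mathcal{F}_{t_{i-1}} \right].$$

The above calculation requires either specifying a stochastic process for $1/d_i$ or computing the distribution of $1/d_i$ using a transformation of the conditional distribution of $d_i$.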
Here, we introduce more of the point process theory that leads to the formulation of instantaneous volatility in terms of the price-change intensity function. Let $\{t_i\}_{i \geq 0}$ be a sequence of event arrival times with $t_i < t_{i+1}$; then an orderly point process is associated with a counting process, $\{N(t)\}_{t \geq 0}$, where $N(t)$ is the number of events up to and including time $t$. A point process can be characterized by an intensity function $\lambda(t; \mathcal{F}_t)$, which is described as follows.

$$\lambda(t; \mathcal{F}_t) = \lim_{\Delta \downarrow 0} \frac{1}{\Delta} \Pr\left[ N(t+\Delta) - N(t) > 0 \mid \mathcal{F}_t \right].$$

It represents the probability per unit time of a new event arriving in an infinitesimal time interval. In many applications, this is equivalent to the hazard function, particularly in traditional duration or survival analysis, where cross-sectional duration data are analyzed [5,24], whereas the intensity function is mostly defined in continuous time and conditions on a possibly continuously varying information set $\mathcal{F}_t$.
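In particular, for the price-change event with price threshold $\delta$, the absolute price variation in a small time interval $(t, t+\Delta]$ can only be $\delta$ (if a price-change event occurs) or $0$ (otherwise). Hence, the instantaneous variance of returns at time $t$ can be derived in terms of the following expression.

$$\sigma^2(t) = \lim_{\Delta \downarrow 0} \frac{1}{\Delta} \Pr\left[ |P(t+\Delta) - P(t)| \geq \delta \mid \mathcal{F}_t \right] \left( \frac{\delta}{P(t)} \right)^2 = \lambda_\delta(t; \mathcal{F}_t) \left( \frac{\delta}{P(t)} \right)^2.$$

Therefore, the measurement of instantaneous volatility lies in the estimation of the intensity function associated with the process of $\delta$-price changes, i.e., $\lambda_\delta(t; \mathcal{F}_t)$. A similar result is obtained in [4,5,17].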
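As a rough illustration of how an estimated intensity maps into volatility, the following Python sketch evaluates the intensity-based instantaneous variance (the intensity multiplied by the squared relative threshold) and accumulates it over a latency window. The function names, the piecewise-constant intensity, and the use of a single fixed reference price are assumptions made for this example; they are not taken from the original method.

```python
def instantaneous_variance(intensity, threshold, price):
    """Variance of returns per unit time: intensity * (threshold / price)^2."""
    return intensity * (threshold / price) ** 2


def integrated_variance(event_times, intensities, threshold, price, start, horizon):
    """Integrate a piecewise-constant instantaneous variance over [start, start + horizon].

    Assumption: intensities[i] holds on the spell (event_times[i], event_times[i + 1]],
    so len(event_times) == len(intensities) + 1. The price is held fixed for simplicity.
    """
    end = start + horizon
    total = 0.0
    for i, lam in enumerate(intensities):
        # Overlap between the i-th spell and the integration window.
        left = max(start, event_times[i])
        right = min(end, event_times[i + 1])
        if right > left:
            total += instantaneous_variance(lam, threshold, price) * (right - left)
    return total


# Hypothetical numbers: two spells with intensities 2.0 and 0.5 events per second,
# a trader entering at t = 0.5 s with 2 s of latency.
window_variance = integrated_variance([0.0, 1.0, 3.0], [2.0, 0.5], 1.0, 10.0, 0.5, 2.0)
print(window_variance)
```

Evaluating the integrated variance for several horizons gives one way to compare the price risk borne by traders with different latencies.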
Another important property, concerning the integrated intensity function, forms the basis for constructing the likelihood function of an intensity-based model and leads to the mixture-of-exponential representation that is essential for our change-point intensity (CPI) model. According to the random time change theorem of [25], which transforms a wide class of point processes into a unit-rate Poisson process, both Barndorff-Nielsen and Shiryaev [26] and Hautsch [4] have shown that, if the event arrival times of a point process are $\{t_i\}$, then the integrated intensity function satisfies the following:

$$\Lambda(t_{i-1}, t_i) = \int_{t_{i-1}}^{t_i} \lambda(s; \mathcal{F}_s)\, ds \sim \text{i.i.d. } \mathrm{Exp}(1),$$

where $\mathrm{Exp}(1)$ is the exponential distribution with rate parameter 1.
If we further assume that the intensity function is constant between two consecutive events, i.e., $\lambda(t; \mathcal{F}_t) = \lambda_i$ between $t_{i-1}$ and $t_i$, then the above property becomes $\lambda_i d_i \sim \mathrm{Exp}(1)$, where $d_i = t_i - t_{i-1}$ is the event duration between $t_{i-1}$ and $t_i$. Rearranging the equation, we obtain the mixture-of-exponential representation.

$$d_i = \frac{\varepsilon_i}{\lambda_i}, \qquad \varepsilon_i \sim \text{i.i.d. } \mathrm{Exp}(1). \tag{3}$$
Chen et al. [9] also arrive at the same result by interpreting a point process as a dynamic, uncountable set of independent Bernoulli trials.
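The time-change property suggests a simple diagnostic: if the intensities are right, the products of intensity and duration should behave like unit-rate exponential draws, with mean and variance both close to 1. The Python sketch below illustrates this on simulated data. The helper names and the alternating intensity pattern are hypothetical choices for the example.

```python
import random


def simulate_durations(intensities, seed=0):
    """Draw durations as eps / lambda with eps ~ Exp(1), one per intensity value."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0) / lam for lam in intensities]


def time_changed_residuals(durations, intensities):
    """Return lambda_i * d_i, which should resemble Exp(1) draws under the model."""
    return [lam * d for lam, d in zip(intensities, durations)]


def mean_and_variance(values):
    """Sample mean and (population) variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance


# Hypothetical intensity path alternating between a fast and a slow regime.
intensities = [20.0 if i % 2 == 0 else 0.5 for i in range(20000)]
durations = simulate_durations(intensities, seed=0)
residual_mean, residual_variance = mean_and_variance(
    time_changed_residuals(durations, intensities)
)
print(residual_mean, residual_variance)
```

With real data, the same check would be applied using estimated rather than true intensities, so departures from a unit mean and variance point to misspecification.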
2.2. The Benchmark ACI Model
The benchmark autoregressive conditional intensity (ACI) model was proposed by [18]. It presents a dynamic parameterization of the intensity function in continuous time, which allows the intensity process to be updated whenever required. The intensity is characterized in the following form:

$$\lambda(t; \mathcal{F}_t) = \Psi(t)\, \lambda_0(t)\, s(t),$$

which is driven by three components: a component $\Psi(t)$ capturing the dynamic structure, a baseline intensity component $\lambda_0(t)$, and a seasonal periodicity component $s(t)$.
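The core part is to model the dynamic component $\Psi(t)$, which is given by the following:

$$\Psi(t) = \exp\left( \tilde{\Psi}_{N(t)+1} + z_{N(t)}' \gamma \right),$$

where $z_{N(t)}$ is the vector of covariates collected at the time of the preceding event (say $t_{i-1}$), $\gamma$ is the vector of coefficients for these covariates, and $\tilde{\Psi}_i$ is a piecewise-constant dynamic component between the $(i-1)$-th and $i$-th events. This piecewise-constant component follows an ARMA(1,1) form:

$$\tilde{\Psi}_i = a\, \tilde{\varepsilon}_{i-1} + b\, \tilde{\Psi}_{i-1},$$

where $b$ is the persistence parameter, and $a$ is the coefficient associated with the innovation term $\tilde{\varepsilon}_{i-1}$. The innovation term is specified as follows.

$$\tilde{\varepsilon}_i = 1 - \int_{t_{i-1}}^{t_i} \lambda(s; \mathcal{F}_s)\, ds.$$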
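Based on point process theory, the likelihood contribution of an event occurring at time $t_i$, given the preceding event at $t_{i-1}$, is $\lambda(t_i; \mathcal{F}_{t_i}) \exp\left( -\Lambda(t_{i-1}, t_i) \right)$, where $\Lambda(t_{i-1}, t_i) = \int_{t_{i-1}}^{t_i} \lambda(s; \mathcal{F}_s)\, ds$ is the integrated intensity. Hence, the log-likelihood of the observed events is as follows.

$$\ln L = \sum_{i=1}^{n} \left[ \ln \lambda(t_i; \mathcal{F}_{t_i}) - \Lambda(t_{i-1}, t_i) \right].$$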
Specifically, for this quote volatility measurement, set the baseline intensity , where is the backward recurrence time at t and is defined as . is the nearest backward event time, and . For the seasonal factor , we can use 1 h intervals and set it as piecewise linear within one interval: where are the interval cutting time and we set .
Therefore, the parameters in this simple ACI model without covariates are as follows:
and we can use the maximum likelihood (MLE) for estimation.
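To make the likelihood evaluation concrete, here is a minimal Python sketch of a deliberately simplified ACI(1,1) recursion. It is not the specification estimated in this paper: it assumes a constant baseline intensity `exp(omega)`, no seasonal factor, no covariates, an intensity held constant within each spell, and an innovation equal to one minus the integrated intensity of the previous spell. The parameter names `omega`, `a`, and `b` are hypothetical.

```python
import math


def aci_log_likelihood(durations, omega, a, b):
    """Log-likelihood of a simplified ACI(1,1) model with piecewise-constant intensity.

    Assumptions (for illustration only): constant baseline exp(omega), no seasonality,
    no covariates, and innovation = 1 - integrated intensity of the previous spell.
    """
    psi = 0.0       # dynamic component, started at zero
    loglik = 0.0
    for d in durations:
        lam = math.exp(omega + psi)      # intensity held over the current spell
        integrated = lam * d             # integrated intensity over the spell
        loglik += math.log(lam) - integrated
        innovation = 1.0 - integrated    # has mean zero if the model is correct
        psi = a * innovation + b * psi   # ARMA(1,1)-type update for the next spell
    return loglik


# Hypothetical durations in seconds; compare two parameter settings.
sample = [0.5, 1.5, 1.0, 0.2, 0.1, 2.0]
print(aci_log_likelihood(sample, omega=0.0, a=0.0, b=0.0))
print(aci_log_likelihood(sample, omega=0.0, a=0.05, b=0.9))
```

Maximum likelihood estimation would then amount to maximizing this function over the parameters with a numerical optimizer of the reader's choice.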
2.3. The Change-Point Model for Quote Volatility
For quote volatility measurement in an HF environment, we propose a new change-point intensity (CPI) model, following [22,23]. As in other intensity-based duration models, we assume that the price duration follows an exponential distribution that has the price-change intensity $\lambda$ as its rate parameter. In our case, a shift of the intensity is a change point, which may be a response to market environment fluctuations, liquidity shocks, and a myriad of perceived changes in information on the stock. Allowing the intensity to change over time can account for the relatively long durations punctuated by extremely short ones that are observed in the time series.
According to the mixture-of-exponential representation in Equation (3), the duration is as follows:

$$d_i = \frac{\varepsilon_i}{\lambda_i},$$

where $\varepsilon_i$ follows an i.i.d. $\mathrm{Exp}(1)$ distribution. The conditional intensity follows a Markov change-point process with the renewing distribution $G$.

$$\lambda_i = \begin{cases} \lambda_{i-1}, & \text{with probability } 1-p, \\ \text{a new draw from } G, & \text{with probability } p. \end{cases}$$

It belongs to the class of change-point processes because the underlying intensity undergoes occasional changes. It is also a Markov process because the value of the future intensity depends only on the current state, i.e., it either keeps the same value as the current intensity with probability $1-p$ or is randomly drawn from a fixed distribution with probability $p$.
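Given that $G$ is defined on a continuous and infinite space, our model gains much greater flexibility while using a very small number of parameters. Specifically, in our model, we assume that the renewing distribution is a Gamma distribution, and the p.d.f. is as follows:

$$g(\lambda; \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \lambda^{\alpha-1} e^{-\beta\lambda}, \qquad \lambda > 0,$$

where $\alpha$ and $\beta$ are the shape and rate parameters of the Gamma distribution, respectively, and the normalizing constant $\beta^{\alpha}/\Gamma(\alpha)$ is a function of $\alpha$ and $\beta$.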
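This model can also be viewed as a conditional mean duration model (similar to the traditional ACD model) in terms of the following:

$$d_i = \psi_i \varepsilon_i,$$

where $\psi_i = 1/\lambda_i$ is the conditional mean level of the duration, and $\psi_i$ also follows a Markov change-point process:

$$\psi_i = \begin{cases} \psi_{i-1}, & \text{with probability } 1-p, \\ \text{a new draw from } G_\psi, & \text{with probability } p, \end{cases}$$

where $G_\psi$ is the renewing distribution of $\psi_i$.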
The economic intuition of our CPI model is as follows. In a period without a change point (with probability $1-p$), it is plausible to think that the market is behaving consistently and the price-change intensity remains unchanged. During such a period, quote volatility is constant. However, when the market environment changes or there is some liquidity or informational shock to the stock (with probability $p$), the evolution of the quote price enters a new state, the price-change intensity changes, and the quote volatility correspondingly takes a new value.
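The mechanism described above can be sketched as a short simulation: at each price-change event the intensity is kept with probability 1 − p or redrawn from a Gamma distribution with probability p, and the duration is a unit exponential draw divided by the current intensity. The function name, the parameter values, and the choice to draw the initial intensity from the same Gamma distribution are assumptions for the example.

```python
import random


def simulate_cpi(n, p, shape, rate, seed=0):
    """Simulate n intensities and durations from a change-point intensity process.

    Note: random.gammavariate takes (shape, scale), so the rate is inverted.
    Assumption: the initial intensity is drawn from the renewing distribution.
    """
    rng = random.Random(seed)
    lam = rng.gammavariate(shape, 1.0 / rate)
    intensities, durations = [], []
    for _ in range(n):
        if rng.random() < p:                       # change point: renew the intensity
            lam = rng.gammavariate(shape, 1.0 / rate)
        intensities.append(lam)
        durations.append(rng.expovariate(1.0) / lam)   # duration = eps / lambda
    return intensities, durations


# Hypothetical parameters: a change point roughly once every 20 events.
intensities, durations = simulate_cpi(1000, p=0.05, shape=2.0, rate=1.0, seed=1)
change_points = sum(a != b for a, b in zip(intensities, intensities[1:]))
print(change_points, sum(durations) / len(durations))
```

Because the renewing distribution is continuous, a single redraw can move the intensity by orders of magnitude, which is what lets the simulated durations mix very short and very long spells.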
2.4. Model Comparison
In this part, we compare the structure of our CPI model with that of the ACI model, which serves as the benchmark.
The price-change intensity in our model is set to be constant within the interval between price updates, while it may jump suddenly to a new level for subsequent price-change events. We think that this setting is reasonable for ultra-high-frequency data because the time interval between quote-price updates is short, so we can ignore the dynamics in between; more importantly, it helps us circumvent the complex estimation of the time structure of the underlying intensity. Nevertheless, when there is an exogenous shock (an information or liquidity shock), the quote-price change intensity can move to a new level, and quote volatility changes correspondingly.
Although the quote-price change duration is generally short, it may also have a large dispersion, as the shortest durations can be on the order of microseconds while the longest can be a couple of seconds. This is a potential problem for the ACI model, in which the evolution of the price-change intensity is relatively smooth. (In the ACI model, in addition to the baseline component and the seasonal component, the dynamic component follows an ARMA structure.) In our change-point intensity model, however, the new level of intensity is drawn from a continuous distribution. Hence, it allows drastic changes of intensity, depending on the values of the distribution parameters.
Our model can also be extended to incorporate other factors that determine changes in quote volatility. For example, we can allow the intensity renewal probability
p to be time varying and to be driven by some other factors:
where
is a
vector of influencing factors, and
is a
vector of the corresponding factor loadings (or the regression coefficients). With this structure, we can further analyze whether some factors result in a high probability of changing the quote volatility. Nonetheless, this extension is beyond the scope of this paper and is left for future development.
4. Data Environment
Our data are downloaded from LOBSTER (https://lobsterdata.com/, accessed on 2 February 2018), which provides high-quality LOB data for all Nasdaq stocks from June 2007. The LOB data reconstructed by LOBSTER are based on Nasdaq’s Historical TotalView-ITCH data, i.e., the historic record of what Nasdaq calls. LOBSTER simultaneously generates two files for each active trading day of a selected ticker. One is a ‘message’ file, which contains indicators for the type of event causing an update of the LOB in the requested price range. The other is an ‘orderbook’ file, which records the ask and bid quotes of the LOB at the time when the ‘message’ file is updated.
Table 2 shows a sample of the LOB ‘message’ and ‘orderbook’ files of AMZN on 2 January 2013. We show three events of the LOB. In panel A of Table 2, which is the ‘message’ file, the type 3 event and the type 1 event represent the deletion and the submission of a limit order, respectively. The direction ‘−1’ denotes an order event on the ask side, and ‘1’ denotes a bid-side order. Meanwhile, the ‘orderbook’ file in panel B of Table 2 records the ‘shape’ of the LOB after these three events. From this, we can observe that after the submission of a new order on the bid side with a higher price than the existing best bid, the new best bid changes from 2550700 to 2550800, i.e., from $255.07 to $255.08.
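Given a series of best-bid quotes in this format, price-change durations can be built by recording each time the quote has moved by at least the chosen threshold since the last recorded event. The Python sketch below shows one way to do this. The function name, the rule of restarting from the new price level after each event, and measuring the first duration from the first observation are assumptions for the example. Prices are kept in the integer units shown above (dollars times 10,000), so a 3-cent threshold is 300.

```python
def price_change_durations(times, prices, threshold):
    """Return (event_times, durations) for cumulative quote moves of at least `threshold`.

    Assumptions: `times` is increasing, `prices` are in the same units as `threshold`,
    and the first duration is measured from the first observation.
    """
    event_times = []
    reference_price = prices[0]
    for t, price in zip(times[1:], prices[1:]):
        if abs(price - reference_price) >= threshold:
            event_times.append(t)
            reference_price = price   # restart the cumulative move from the new level
    starts = [times[0]] + event_times[:-1]
    durations = [end - start for start, end in zip(starts, event_times)]
    return event_times, durations


# Hypothetical best-bid path (seconds, price in units of 1/10,000 dollar).
times = [0.0, 0.4, 1.1, 1.3, 2.0]
bids = [2550700, 2550800, 2551000, 2550900, 2550600]
events, durations = price_change_durations(times, bids, threshold=300)
print(events, durations)
```

A larger threshold yields fewer events and longer durations, so the threshold controls how finely the quote path is sampled.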
Figure 3 plots the evolution of quote price at the best bid in the first 50 s of AMZN stock on 2 January 2013. From this, we can observe that there is non-constant variation of bid price, with some intervals being much more volatile than others. A liquidity demander with 1 s of latency will encounter a high uncertainty and cost when she posts a (sell-side) market order at
or
, compared with the entering time at
. Therefore, we need a method to effectively quantify the volatility of the quote price at any time and also the cost of demand for traders with different trading latencies.
6. Out-of-Sample Performance and Model Prediction Power
Finally, we examine the out-of-sample performance of our CPI model. However, it is hard to predict volatility directly and to assess the model’s performance on that basis because actual volatility is unobserved. Therefore, we predict only the duration until the next price change, since the actual price-change duration is observable. When the duration between quote-price changes is long, quote volatility is low; conversely, when the duration is short, quote volatility is high.
We use one-step-ahead forecasting based on the model’s in-sample estimation results. Specifically, for the quote prices (at the best bid) of the AMZN stock on 2 January 2013, we use the first 4000 data points (which are the 4000 events of price changes) for parameter estimation and perform a one-step-ahead prediction for the remaining observations of price durations. The expected price-change duration for the quote price can be derived as follows.
This is because, in the CPI model, the intensity for the next price change either retains its past value (with probability ) or renews from the Gamma distribution (with probability 1 − ), and the mean value of the Gamma distribution is .
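We test the model’s out-of-sample performance by using Mincer–Zarnowitz ordinary least squares (OLS) regressions:

$$d_i = c_0 + c_1 \hat{d}_i + u_i,$$

where $d_i$ is the realized price-change duration, $\hat{d}_i$ is its one-step-ahead prediction, $c_0$ is the regression intercept, $c_1$ is the regression coefficient for $\hat{d}_i$, and $u_i$ is the regression error term.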
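A Mincer–Zarnowitz regression of realized durations on predicted durations can be computed with the closed-form simple OLS formulas, as in the sketch below. The function name and the toy numbers are hypothetical; the code returns only the intercept, the slope, and the R-squared, and does not compute standard errors or significance tests.

```python
def mincer_zarnowitz(actual, predicted):
    """Regress actual on predicted by OLS; return (intercept, slope, r_squared)."""
    n = len(actual)
    mean_x = sum(predicted) / n
    mean_y = sum(actual) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, actual))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - intercept - slope * x) ** 2 for x, y in zip(predicted, actual))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Hypothetical realized and predicted durations (seconds).
realized = [0.8, 2.1, 2.9, 4.2, 4.8]
forecast = [1.0, 2.0, 3.0, 4.0, 5.0]
intercept, slope, r_squared = mincer_zarnowitz(realized, forecast)
print(intercept, slope, r_squared)
```

An unbiased and efficient forecast corresponds to an intercept near 0 and a slope near 1, while the R-squared summarizes how much of the variation in realized durations the forecast explains.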
Moreover, we compare the model’s performance with that of the ACI model, and the results are shown in Table 4. The slope coefficients are significantly positive for both models, suggesting that the predictions from both the CPI and ACI models can explain the variation in the realized price-change durations. Nevertheless, our CPI model has clearly greater predictive power, as the R-squared of the CPI regression is much higher.
7. Conclusions
This paper has proposed a new method to measure the volatility of quote prices in the limit order market, which is important for quantifying the cost of demanding liquidity in the HF trading environment. We use a point process to describe price-change events that occur at the best quote level (the best bid or the best ask), and volatility is measured by inferring the price-change intensity from realized price-change durations. In particular, we resort to the change-point model proposed by Lai et al. [23] to describe the dynamics of the price-change intensity and call it the change-point intensity (CPI) model. In the model, the underlying price-change intensity follows a Markov process, i.e., it either maintains its past value or renews from a Gamma distribution. Thus, we can use the data on price-change durations to infer the underlying price-change intensity and then calculate quote volatility based on the method proposed by Engle and Russell [5].
We apply the CPI model to study the quote volatility of the AMZN stock on 2 January 2013. Specifically, we choose a price-change threshold of 3 cents to define the price-change event and construct the series of price-change durations. The instantaneous quote volatility at any time of the trading day can be derived from the price-change intensity estimated by our CPI model. Furthermore, we have calculated the cost of demanding liquidity for traders with different trading latencies based on the integrated variance. In addition, we compare both the in-sample fit and the out-of-sample predictive power with those of the benchmark ACI model by Russell [18], and the results suggest that our model performs better.
Our work has made progress in modeling HF quote volatility. Nonetheless, it leaves much room for future development. The current CPI model has a univariate structure that studies the dynamics of quote volatility itself. It can be extended by incorporating other factors that determine changes in quote volatility, which can be done by letting the intensity renewal probability $p$ be time-varying and driven by those factors.