Multivariate General Compound Point Processes in Limit Order Books

In this paper, we focus on a new generalization of the multivariate general compound Hawkes process (MGCHP), which we refer to as the multivariate general compound point process (MGCPP). Namely, we use a multivariate point process, instead of the Hawkes process, to model the order flow. We prove a law of large numbers (LLN) and two functional central limit theorems (FCLTs) for the MGCPP. Applications of the MGCPP to the limit order market are also considered. We provide numerical simulations and comparisons of the MGCPP and the MGCHP using Google, Apple, Microsoft, Amazon, and Intel trading data.


arXiv:2008.00124v1 [q-fin.MF] 31 Jul 2020

Introduction
In this paper, we study multivariate general compound point processes as models for price processes in limit order books (LOBs). We prove a law of large numbers (LLN) and two functional central limit theorems (FCLTs) for these processes. The two FCLTs are applied to limit order books, where we use these asymptotic methods to study the link between price volatility and order flow in our two models through the diffusion limits of the price processes. The volatilities of price changes are expressed in terms of parameters describing the arrival rates and price changes. Bacry et al. (2013) proved an LLN and an FCLT for the multivariate Hawkes process (HP) [1]. Bowsher (2007) was the first to apply the HP and point processes to financial data modelling [2]. Bauwens and Hautsch (2009) used a five-dimensional HP to estimate multivariate volatility between five stocks, based on price intensities [5]. We note that Brémaud et al. (1996) generalized the HP to its nonlinear form [3]. A functional central limit theorem for the nonlinear HP was obtained in [28]. Some applications of the multivariate HP to financial data are given in [13]. Vinkovskaya (2014) considered a point process model for the dynamics of the LOB, and a regime-switching HP to model its dependency on the bid-ask spread [24]. A semi-Markov process was applied to the LOB in [21] to model the mid-price. We note that level-1 limit order books with time-dependent arrival rates λ(t) were studied in [8], including the asymptotic distribution of the price process. General semi-Markovian models for limit order books were considered in [22]. The book by Cartea et al. (2015) develops models for algorithmic trading in contexts such as executing large orders, market making, trading pairs or collections of assets, and executing in dark pools [7].
That book also contains a link to a website from which many datasets from several sources can be downloaded, together with MATLAB code to assist in experimentation with the data. A detailed description of the mathematical theory of Hawkes processes is given in [16]. Zheng et al. (2014) introduced a multivariate point process describing the dynamics of the bid and ask prices of a financial asset [27]. The point process is similar to a Hawkes process, with additional constraints on its intensity corresponding to the natural ordering of the best bid and ask prices. Eichler et al. (2017) showed that the Granger causality structure of the multivariate HP is fully encoded in the corresponding link functions of the model [12]. They introduced a new nonparametric estimator of the link functions, based on a time-discretized version of the point process and an infinite-order autoregression, derived its consistency, and applied it to simulated data and to neural spike train data from the spinal dorsal horn of a rat. Chen et al. (2019) developed a new approach for investigating the properties of the HP without the restriction to mutual excitation or linear link functions [9]. They employed a thinning process representation and a coupling construction to bound the dependence coefficient of the HP. Using recent developments on weakly dependent sequences, they established a concentration inequality for second-order statistics of the HP. This concentration inequality was applied to cross-covariance analysis in the high-dimensional regime, and the theoretical claims were verified with simulation studies [9]. Lemonnier et al. presented a framework for fitting the multivariate HP to large-scale problems, both in the number of events in the observed history n and in the number of event types d (i.e., dimensions) [15]. Liniger's (2009) thesis addresses theoretical and practical questions arising in connection with multivariate, marked, linear HPs [16]. Yang et al. (2017) developed a nonparametric, online learning algorithm that estimates the triggering functions of a multivariate HP [26]. It was shown in [18] that multivariate Hawkes processes, coupled with a nonparametric estimation procedure, can be successfully used to study complex interactions between the arrival times of orders and their sizes observed in a limit order book market. This methodology was applied to high-frequency order book data of futures traded at EUREX. An introduction to point processes from a martingale point of view may be found in Bjork's (2011) lecture notes [4].
Guo et al. (2020) constructed a multivariate general compound Hawkes process (MGCHP) [14], which extends the models in [10] and [20]. In [14], they applied the multivariate Hawkes process to model the order flow of several stocks in the limit order market and proved limit theorems for the MGCHP. In this paper, we propose a new mid-price model that generalizes the MGCHP, which we call the multivariate general compound point process (MGCPP). For the MGCPP, we use a multi-dimensional simple point process, instead of the Hawkes process, to represent the order flow in the LOB. We also prove the corresponding LLN and FCLTs for the MGCPP. One reason for considering the generalized model is that the parameters of a simple point process are much easier to estimate than those of a Hawkes process. We therefore provide numerical comparisons of the MGCPP and the MGCHP on real high-frequency trading data, and we find that the results of the new generalized model are as good as those of the MGCHP. This paper is organized as follows. The definition and assumptions of the MGCPP are given in Section 2. FCLT I and the law of large numbers are proved in Section 3, which also provides numerical examples for FCLT I based on real data. In Section 4, we consider FCLT II for the MGCPP and apply it to mid-price prediction. Section 5 concludes the paper.

Definition of Multivariate General Compound Point Process (MGCPP)
In this section, we propose a multivariate stochastic model for the mid-price in the limit order book. It generalizes the models in [10], [14], and [20]. We assume that the order flow is described by a multivariate simple point process with good asymptotic properties.
Definition 2.1 (Counting Process) (see, e.g., [11]). A stochastic process $\{N(t), t \ge 0\}$ is called a counting process if $N(t) \ge 0$, $N(0) = 0$, $N(t)$ is an integer, and $N(t + s) \ge N(t)$ for all $t, s \ge 0$.
Definition 2.2 (Point Process) (see, e.g., [11]). Let $(T_1, T_2, T_3, \cdots)$ be a sequence of non-negative random variables with $P(0 \le T_1 \le T_2 \le T_3 \le \cdots) = 1$ such that the number of points in any bounded region is almost surely finite. Then $(T_1, T_2, T_3, \cdots)$ is called a point process.
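The point process is characterized by its conditional intensity function $\lambda(t)$, given by
$$\lambda(t) = \lim_{h \to 0} \frac{E\left[N(t+h) - N(t) \mid \mathcal{F}^N(t)\right]}{h},$$
where $\lambda(t)$ is a non-negative function and $\mathcal{F}^N(t)$, $t > 0$, is the corresponding natural filtration.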
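Assumption 2.0.1 We assume that the point process $N_t$ satisfies a law of large numbers (LLN) of the form
$$\frac{N_{nt}}{n} \to \bar{\lambda} t$$
as $n \to \infty$ almost surely, where $\bar{\lambda} = (\bar{\lambda}_1, \bar{\lambda}_2, \cdots, \bar{\lambda}_d)$.
Assumption 2.0.2 We also assume a functional central limit theorem (FCLT) for $N_t$: the process
$$\sqrt{n}\left(\frac{N_{nt}}{n} - \bar{\lambda} t\right)$$
converges in law for the Skorokhod topology to $\Sigma^{1/2} W_t$ as $n \to \infty$, where $W_t$ is a standard $d$-dimensional Brownian motion and $\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \sigma_3^2, \cdots, \sigma_d^2)$.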
Here, $N_t$ denotes the order flow in the limit order market for $d$ stocks. The liquidity of high-frequency trading data guarantees that there are enough price changes in one day, or even within a small window of size $nt$. It is therefore reasonable to adopt the two limit assumptions above.
Remark 2.1 As a simple example, let the point process be a multivariate homogeneous Poisson process; the two assumptions above are then the LLN and the FCLT for the multi-dimensional Poisson process. Let $P_t$ be a $d$-dimensional Poisson process with intensity $\lambda$, where the notation $P_t$ distinguishes the Poisson example from the general case. Then the LLN takes the form
$$\frac{P_{nt}}{n} \to \lambda t$$
as $n \to \infty$ almost surely, and the FCLT states that
$$\sqrt{n}\left(\frac{1}{n} P_{nt} - t\lambda\right)$$
converges in law for the Skorokhod topology to $W_t \circ \lambda^{1/2}$ as $n \to \infty$, where $\circ$ is the element-wise product.
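The two limits in Remark 2.1 can be checked numerically. The following Python sketch simulates a two-dimensional homogeneous Poisson process and compares, component by component, $P_{nt}/n$ with $\lambda t$ and the sample variance of $\sqrt{n}(P_{nt}/n - t\lambda)$ with $\lambda t$. The intensities, sample sizes, seed, and function names are illustrative assumptions and are not taken from the data used in this paper.

```python
import random
import statistics

def poisson_count(rate, horizon, rng):
    """Count events of a homogeneous Poisson process with the given rate on [0, horizon]."""
    count = 0
    clock = rng.expovariate(rate)        # first arrival time
    while clock <= horizon:
        count += 1
        clock += rng.expovariate(rate)   # i.i.d. exponential inter-arrival times
    return count

rng = random.Random(0)                   # fixed seed for reproducibility
rates = [2.0, 5.0]                       # hypothetical intensities lambda_1, lambda_2
t = 1.0

# LLN: P_{nt} / n should be close to lambda * t for large n.
n_lln = 2000
lln_ratios = [poisson_count(lam, n_lln * t, rng) / n_lln for lam in rates]

# FCLT: sqrt(n) * (P_{nt} / n - t * lambda) is approximately N(0, lambda * t),
# so its sample variance over independent replications should be near lambda * t.
n_clt, replications = 200, 300
clt_variances = []
for lam in rates:
    samples = [
        (n_clt ** 0.5) * (poisson_count(lam, n_clt * t, rng) / n_clt - t * lam)
        for _ in range(replications)
    ]
    clt_variances.append(statistics.variance(samples))

print("LLN ratios:", lln_ratios)
print("FCLT sample variances:", clt_variances)
```

For the Poisson case the limiting variance equals the intensity, which is why the same vector `rates` serves as the reference for both checks.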
Remark 2.2 Another interesting example is the multivariate Hawkes process (MHP). Let $H_t = (H_{1,t}, H_{2,t}, \cdots, H_{d,t})$ be a $d$-dimensional Hawkes process whose intensity function for each $H_i$ is of the form
$$\lambda_i(t) = \lambda_i + \int_{(0,t)} \sum_{j=1}^{d} \mu_{ij}(t-s)\, dH_{j,s}.$$
Let $\mu = (\mu_{ij})_{1 \le i,j \le d}$, $\lambda = (\lambda_1, \lambda_2, \cdots, \lambda_d)^T$, and $K = \int_0^\infty \mu(t)\,dt$. Then the LLN for the MHP is of the form
$$\frac{H_{nt}}{n} \to (I - K)^{-1} \lambda t$$
as $n \to \infty$ almost surely, where $I$ is the $d$-dimensional identity matrix. We also have the FCLT for the MHP:
$$\sqrt{n}\left(\frac{H_{nt}}{n} - (I - K)^{-1} \lambda t\right)$$
converges in law for the Skorokhod topology to $(I - K)^{-1} D^{1/2} W_t$ as $n \to \infty$, where $W_t$ is a standard $d$-dimensional Brownian motion and $D$ is a diagonal matrix such that $D_{ii} = ((I - K)^{-1} \lambda)_i$. Details about the LLN and FCLT for the MHP can be found in [1].

Definition for MGCPP
Next, we consider a price process $S_t = (S_{1,t}, S_{2,t}, \cdots, S_{d,t})$ of the form
$$S_{i,t} = S_{i,0} + \sum_{k=1}^{N_{i,t}} a_i(X_{i,k}), \quad i = 1, 2, \cdots, d, \tag{7}$$
where $X_{i,k}$ are independent ergodic continuous-time Markov chains and $a_i(\cdot)$ are bounded continuous functions on the state space $X$.
We refer to $S_t$ as a multivariate general compound point process (MGCPP).
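To make the definition concrete, the following Python sketch simulates each component of a compound price process of the form $S_{i,t} = S_{i,0} + \sum_{k \le N_{i,t}} a_i(X_{i,k})$. It is only an illustration under stated assumptions: the order flow is taken to be a homogeneous Poisson process (one admissible choice of point process), $a_i(x) = x$, and $X_{i,k}$ is a two-state Markov chain on $\{+\delta, -\delta\}$. All parameter values and the function name `simulate_compound_path` are hypothetical.

```python
import random

def simulate_compound_path(rate, horizon, delta, p_uu, p_dd, s0, rng):
    """Simulate S_t = S_0 + sum_{k <= N_t} X_k for one stock.

    Illustrative assumptions: N_t is a homogeneous Poisson process with the
    given rate, X_k is a two-state Markov chain on {+delta, -delta} with
    stay-probabilities p_uu and p_dd, and a(x) = x.
    """
    times, prices = [0.0], [s0]
    state = delta if rng.random() < 0.5 else -delta   # arbitrary initial state
    clock = rng.expovariate(rate)
    while clock <= horizon:
        # Advance the Markov chain by one step, then apply the price change.
        stay = p_uu if state > 0 else p_dd
        if rng.random() >= stay:
            state = -state
        times.append(clock)
        prices.append(prices[-1] + state)
        clock += rng.expovariate(rate)
    return times, prices

rng = random.Random(1)
delta = 0.005                      # half of a one-cent tick
horizon = 60.0                     # seconds of simulated trading
# Hypothetical parameters for d = 2 stocks: (rate, p_uu, p_dd).
params = [(3.0, 0.45, 0.50), (1.5, 0.55, 0.40)]
paths = [
    simulate_compound_path(rate, horizon, delta, p_u, p_d, 100.0, rng)
    for rate, p_u, p_d in params
]
for i, (times, prices) in enumerate(paths):
    print(f"stock {i}: {len(times) - 1} events, final mid-price {prices[-1]:.3f}")
```

Replacing the exponential inter-arrival times by another arrival mechanism changes only the order flow $N_t$; the compounding step stays the same, which is the sense in which the model generalizes the Hawkes-driven case.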

Remark 2.3
In the one-dimensional case, if $N_t$ is a Poisson process, $a(x) = x$, and $X_k$ is a sequence of independent random variables with $P(X_1 = \delta) = P(X_1 = -\delta) = 1/2$, then $S_t$ is the stochastic model for the dynamics of a limit order book discussed in [10].

Remark 2.4
When $N_t$ is a multivariate Hawkes process, $S_t$ is the multivariate general compound Hawkes process (MGCHP) proposed in [14].

LLNs and Diffusion Limits for MGCPP
In this section, we consider diffusion limit theorems for the MGCPP. They provide a link between the order flow $N_t$ and the price process $S_t$. The functional central limit theorem and the law of large numbers for the MGCPP generalize the diffusion limit theorems for the MGCHP in [14].

LLN for MGCPP
as $n \to \infty$ almost surely.
From (7), we have
Recalling the strong LLN for Markov chains (see, e.g., [17]), we have
Rewriting (9) in the multivariate case, we obtain the LLN for the MGCPP.

Diffusion Limits for MGCPP: Stochastic Centralization
Theorem 3.2 (FCLT I: Stochastic Centralization). Let $X_{i,k}$, $i = 1, 2, \cdots, d$, be independent ergodic Markov chains with $n$ states $\{1, 2, \cdots, n\}$ and ergodic probabilities $\pi^*_{i,1}, \pi^*_{i,2}, \ldots, \pi^*_{i,n}$. Let $S_{nt}$ be a $d$-dimensional general compound point process. Then we have
Here,
where $P_i$ is the transition probability matrix of the Markov chain $X_i$, $\Pi^*_i$ is the matrix of stationary distributions of $P_i$, and $g_i(j)$ is the $j$th entry of $g_i$. Here $a^*_i$ is defined by $a^*_i = \sum_{k \in X_i} \pi^*_{i,k} a_i(X_{i,k})$. Then, for some $n$, we have
Consider the following sums:
where $\lfloor \cdot \rfloor$ is the floor function. By a martingale method similar to that in [21] and [25], we have the following weak convergence in the Skorokhod topology:
From Assumption 2.0.1, we have the LLN for the multivariate point process in the form
Using the change of time $t \to N_i(nt)/n$ in (15), we have
Rewriting (16) in multivariate form, we derive the weak convergence for the MGCPP:
Next, we consider a simple special case. Let $X_{i,k}$ be a Markov chain with two dependent states $(+\delta, -\delta)$ and ergodic probabilities $(\pi^*_i, 1 - \pi^*_i)$. In the limit order market, $\delta$ is the fixed tick size and the $d$-dimensional point process $N_{nt}$ represents the order flow for $d$ stocks. Here, we set $a_i(x) = x$ in equation (7). In this way, we can derive the corresponding limit theorems for the $d$-dimensional price process $S_{nt}$.
where
for all $t > 0$ and some large enough $n$. Since $S_{nt}$ is the price process in high-frequency trading, time is always measured over very short periods (e.g., milliseconds). So, even if the window size is $nt = 10$ seconds with $t = 0.001$, $n$ equals 10,000, which is a very large number. It is therefore reasonable to consider this kind of approximation in the LOB.
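The ergodic probabilities and the centering constant $a^*_i$ used above can be computed directly from a transition matrix. The Python sketch below does this for the two-state special case with $a_i(x) = x$, under illustrative assumptions: the transition probabilities `p_uu` and `p_dd` are hypothetical values, the stationary distribution is obtained by plain power iteration, and the closed form $\pi^* = (1 - p_{dd})/(2 - p_{uu} - p_{dd})$ is the standard stationary probability of the up state for a two-state chain. It also reproduces the arithmetic $n = 10/0.001 = 10{,}000$ from the discussion of window sizes.

```python
def stationary_distribution(P, iterations=10_000, tol=1e-14):
    """Ergodic probabilities of a row-stochastic matrix P, by power iteration."""
    size = len(P)
    pi = [1.0 / size] * size
    for _ in range(iterations):
        nxt = [sum(pi[k] * P[k][j] for k in range(size)) for j in range(size)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

delta = 0.005                       # half of a one-cent tick
p_uu, p_dd = 0.45, 0.52             # hypothetical stay-probabilities
P = [[p_uu, 1.0 - p_uu],
     [1.0 - p_dd, p_dd]]            # state 0 = +delta (up), state 1 = -delta (down)
price_changes = [+delta, -delta]    # a(x) = x evaluated on the two states

pi_star = stationary_distribution(P)
a_star = sum(p * x for p, x in zip(pi_star, price_changes))

# Closed form for a two-state chain, used here only as a cross-check.
pi_up_closed_form = (1.0 - p_dd) / (2.0 - p_uu - p_dd)

# Window arithmetic: nt = 10 seconds with t = 0.001 gives n = 10,000.
t = 0.001
n = round(10 / t)

print("pi* =", pi_star, " a* =", a_star, " n =", n)
```

The same `stationary_distribution` helper applies unchanged to a chain with more than two states, in which case `price_changes` lists the value of $a_i$ on each state.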

Remark 3.4
When $N_t$ is a multivariate Hawkes process, the corresponding FCLTs and LLNs for $S_{nt}$ were considered in [14]. In the one-dimensional case, if $N_t$ is a renewal process, the corresponding limit theorems for the semi-Markovian model $S_t$ were discussed in [21] and [22].

Numerical Examples for FCLT: Stochastic Centralization
In this section, we test FCLT I for the MGCPP model with the LOBSTER data and compare our results with those simulated by the MGCHP in [14].

Data Description and Parameter Estimation for MGCPPDO
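The level-one LOBSTER data for June 21st, 2012 are considered in this paper. In these data, time is measured in milliseconds and the tick size is one cent, which means that the corresponding $\delta = 0.005$. The basic data description, which also allows the liquidity to be checked, is given in Table 1.
Next, we estimate the parameters $\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \sigma_3^2, \cdots, \sigma_d^2)$ and $\bar{\lambda} = (\bar{\lambda}_1, \bar{\lambda}_2, \bar{\lambda}_3, \cdots, \bar{\lambda}_d)$ via the LLN and FCLT assumptions on $N_t$. From Assumptions 2.0.1 and 2.0.2, when $n$ is large enough, we can derive the approximations
$$N_{nt} \approx nt\bar{\lambda} \tag{21}$$
and
$$N_{nt} \approx nt\bar{\lambda} + \sqrt{n}\,\Sigma^{1/2} W_t. \tag{22}$$
Taking the expectation in (21) and the variance in (22), we have
$$E(N_{nt}) \approx nt\bar{\lambda} \tag{23}$$
and
$$\mathrm{Var}(N_{i,nt}) \approx nt\sigma_i^2, \quad i = 1, 2, \cdots, d. \tag{24}$$
In this way, we obtain the estimated parameters for the five stocks in Table 2.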
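The moment-matching step can be sketched in a few lines of Python. The sketch assumes that the two approximations are read as $E(N_{nt}) \approx nt\bar{\lambda}$ and $\mathrm{Var}(N_{i,nt}) \approx nt\sigma_i^2$, so that the sample mean and sample variance of the event counts over disjoint windows of length $nt$, each divided by the window length, give estimates of $\bar{\lambda}_i$ and $\sigma_i^2$. The synthetic Poisson arrivals, the window length, and the function names are illustrative assumptions; they stand in for the real order-flow timestamps.

```python
import random
import statistics

def window_counts(event_times, window, horizon):
    """Number of events in each disjoint window [i * window, (i + 1) * window)."""
    n_windows = int(horizon // window)
    counts = [0] * n_windows
    for s in event_times:
        idx = int(s // window)
        if idx < n_windows:
            counts[idx] += 1
    return counts

def estimate_rate_and_variance(event_times, window, horizon):
    """Moment estimates: mean(count) / window and variance(count) / window."""
    counts = window_counts(event_times, window, horizon)
    lam_hat = statistics.mean(counts) / window
    sigma2_hat = statistics.variance(counts) / window
    return lam_hat, sigma2_hat

# Synthetic order flow for one stock: Poisson arrivals with a hypothetical rate.
rng = random.Random(2)
true_rate, horizon = 4.0, 5000.0
events, clock = [], rng.expovariate(true_rate)
while clock < horizon:
    events.append(clock)
    clock += rng.expovariate(true_rate)

lam_hat, sigma2_hat = estimate_rate_and_variance(events, window=10.0, horizon=horizon)
print(f"lambda_hat = {lam_hat:.3f}, sigma2_hat = {sigma2_hat:.3f}")
```

For Poisson arrivals both estimates should be close to the true rate; for clustered order flow the variance estimate would typically exceed the rate estimate.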
We estimated $p_{uu}$ and $p_{dd}$ in $P$ from the empirical frequencies in our data by
$$p_{uu} = \frac{q_{uu}}{q_{uu} + q_{ud}}, \qquad p_{dd} = \frac{q_{dd}}{q_{dd} + q_{du}},$$
where $q_{uu}$, $q_{dd}$, $q_{ud}$, and $q_{du}$ are the numbers of times the price goes up twice, goes down twice, goes up and then down, and goes down and then up, respectively. The results are given in Table 3.
Table 3: Transition matrix and constant parameters for the two-state MGCPP. $a^*$ and $\sigma^*$ were calculated by equation (19).
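The frequency counts $q_{uu}$, $q_{ud}$, $q_{du}$, and $q_{dd}$ are straightforward to compute from a sequence of mid-price moves. The Python sketch below reads the estimator as $p_{uu} = q_{uu}/(q_{uu} + q_{ud})$ and $p_{dd} = q_{dd}/(q_{dd} + q_{du})$, which is an assumption about the omitted formula, and applies it to a short made-up sequence of moves; the function name is hypothetical.

```python
def estimate_transition_probabilities(moves):
    """Estimate p_uu and p_dd from a sequence of +1 (up) / -1 (down) mid-price moves."""
    q = {("u", "u"): 0, ("u", "d"): 0, ("d", "u"): 0, ("d", "d"): 0}
    labels = ["u" if m > 0 else "d" for m in moves]
    # Count every pair of consecutive moves.
    for prev, curr in zip(labels, labels[1:]):
        q[(prev, curr)] += 1
    p_uu = q[("u", "u")] / (q[("u", "u")] + q[("u", "d")])
    p_dd = q[("d", "d")] / (q[("d", "d")] + q[("d", "u")])
    return p_uu, p_dd, q

# Toy sequence of mid-price moves (made up for illustration).
moves = [1, 1, -1, -1, -1, 1, 1, 1, -1, 1]
p_uu, p_dd, q = estimate_transition_probabilities(moves)
print(q)
print(f"p_uu = {p_uu}, p_dd = {p_dd}")
```

The function assumes that both an up move and a down move occur at least once before the end of the sequence; otherwise one of the denominators is zero.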

Comparison with multivariate general compound Hawkes process with two dependent orders
In this section, we compare the simulation results of the MGCPP with those of the multivariate general compound Hawkes process (MGCHP) model to show that the simpler generalized model can reach an accuracy comparable to that of the MGCHP, which has a more sophisticated intensity function. In [14], the MGCHP with two dependent states was simulated for the Microsoft and Intel data. We therefore also conduct simulations for the Microsoft and Intel data with the two-state MGCPP, in which the Markov chain has two dependent states $(+\delta, -\delta)$.
We tested the MGCPP model by comparing the standard deviations of the left-hand side (LHS) and right-hand side (RHS) of the FCLT:
That is to say, we first cut our data into disjoint windows of size $nt$, specifically $[int, (i + 1)nt]$ with $t = 0.001$, and, by setting the left bound as our starting time, we can calculate:
and the equation for the standard deviation is given by
Figure 1 gives a standard deviation comparison of the MGCPP, the MGCHP, and the raw data for 2 stocks for window sizes from 0.1 second to 12 seconds in steps of 0.1 second. First, the MGCPP parameters make the standard deviation of the LHS very similar to that of the RHS for each stock when $n$ is large. So, generally speaking, our MGCPP model fits the data well. Second, the MGCPP curve is very close to the MGCHP curve; that is, the simulation results for the Intel and Microsoft data are nearly the same. This shows that, even without an intensity function as sophisticated as that of the Hawkes process, we can still reach a relatively good result with a simple point process model. This can help with the computational cost of using the MGCHP model. A more quantitative error analysis is given below.

[Figure 1: standard deviation comparison of MGCPP, MGCHP, and the raw data; panel shown: INTC]
Since the number of windows decreases as the window size $nt$ increases, the spread of the data in Figure 1 increases with the window size. For example, for $nt = 0.1$ second, the number of windows is 234,000, whereas a 12-second window size yields only 1,950 windows, which increases the spread of the standard deviation.
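The windowing procedure can be illustrated with a short Python sketch. It assumes that the quantity of interest in each disjoint window is the price increment centered by $a^*$ times the number of events in that window, and that the empirical standard deviation is taken across windows; this reading, the toy event data, and the function name are assumptions made for illustration. The last lines reproduce the window counts quoted above: 234,000 windows of 0.1 second correspond to 23,400 seconds of data, which gives 1,950 windows of 12 seconds.

```python
import statistics

def centered_increments(times, jumps, window, n_windows, a_star):
    """Per-window price change minus a_star times the per-window event count."""
    price_change = [0.0] * n_windows
    event_count = [0] * n_windows
    for s, x in zip(times, jumps):
        idx = int(s // window)
        if 0 <= idx < n_windows:
            price_change[idx] += x
            event_count[idx] += 1
    return [dp - a_star * dn for dp, dn in zip(price_change, event_count)]

# Toy data: five mid-price changes of size +/- delta at made-up times (seconds).
delta = 0.005
times = [0.5, 1.2, 1.7, 2.1, 3.9]
jumps = [delta, -delta, delta, delta, -delta]

increments = centered_increments(times, jumps, window=2.0, n_windows=2, a_star=0.0)
empirical_std = statistics.stdev(increments)
print("increments:", increments, "std:", empirical_std)

# Window counts quoted in the text.
windows_at_0_1s = 234_000
session_seconds = windows_at_0_1s // 10      # 0.1-second windows -> seconds
windows_at_12s = session_seconds // 12
print(session_seconds, windows_at_12s)
```

With fewer windows, the sample standard deviation is computed from fewer observations, which is consistent with the wider scatter at large window sizes.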
Intuitively, Figure 1 shows that the standard deviations of the MGCHP and the MGCPP are very close and that both fit the real standard deviation very well. Next, we analyze the MGCHP and MGCPP models quantitatively.
We computed the mean square error (MSE) between the real standard deviation and the theoretical standard deviations in Table 4. Recalling equation (25), the standard deviation is linear in the square root of the time step. We can therefore fit the real standard deviation data with a square-root curve using least-squares regression, and then compare the coefficient from the least-squares regression with the coefficients of the two stochastic models. From Table 5, the percentage errors of both stochastic models are smaller than 5%, and there is no significant difference between the MGCPP coefficient and the MGCHP coefficient.
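The error measures used in this comparison can be sketched as follows in Python. The sketch assumes a square-root curve through the origin, $\text{std} \approx c\sqrt{\text{window size}}$, for which the least-squares coefficient has the closed form $c = \sum_j y_j \sqrt{x_j} / \sum_j x_j$; whether the original regression included an intercept is not stated, so this is an assumption. The synthetic standard deviations, the model coefficient 0.0208, and the function names are hypothetical.

```python
import math

def mean_square_error(observed, predicted):
    """Mean of squared differences between two equally long sequences."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def fit_sqrt_coefficient(window_sizes, std_values):
    """Least-squares c minimizing sum_j (y_j - c * sqrt(x_j))^2."""
    numerator = sum(y * math.sqrt(x) for x, y in zip(window_sizes, std_values))
    denominator = sum(window_sizes)
    return numerator / denominator

def percentage_error(model_coefficient, fitted_coefficient):
    """Relative difference between a model coefficient and the fitted one, in percent."""
    return abs(model_coefficient - fitted_coefficient) / abs(fitted_coefficient) * 100.0

# Synthetic example: window sizes 0.1 s, 0.2 s, ..., 12 s and an exact sqrt curve.
window_sizes = [0.1 * k for k in range(1, 121)]
real_std = [0.02 * math.sqrt(w) for w in window_sizes]

c_fit = fit_sqrt_coefficient(window_sizes, real_std)
model_c = 0.0208                                   # hypothetical model coefficient
model_std = [model_c * math.sqrt(w) for w in window_sizes]

print("fitted coefficient:", c_fit)
print("MSE:", mean_square_error(real_std, model_std))
print("percentage error:", percentage_error(model_c, c_fit))
```

On real data the fitted coefficient plays the role of the regression coefficient in the comparison table, and each stochastic model contributes its own theoretical coefficient in place of `model_c`.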

MGCPP with n-state Dependent Orders
In this section, we give more simulation examples using the Google, Apple, and Amazon data with the MGCPP model with n-state dependent orders. According to [23], the accuracy of the general compound Hawkes process model increases with the number of states, and for Google, Apple, and Amazon in the LOBSTER data set the best number of states is 4 to 7. In the previous section, we also showed that the simulation results of the MGCPP are nearly the same as those of the MGCHP. It is therefore reasonable to consider an MGCPP model with a 7-state Markov chain here. Compared with the Intel and Microsoft data, we take larger time steps and window sizes (from 10 seconds to 20 minutes, in steps of 10 seconds) to capture more dynamics. The figures show that the 7-state model is a significant improvement over the 2-state model. The 7-state curves for AAPL and GOOG are very close to the real standard deviation, although the theoretical curve for AMZN underestimates it even with the 7-state model. Table 6 lists the MSE and coefficients of the 2-state and 7-state models for the different tickers, which quantifies the improvement of the 7-state model. The results for AAPL and GOOG are good enough for mid-price modeling. As for AMZN, although we obtain a remarkable improvement from the 2-state model (74.60% error) to the 7-state model (28.29% error), we cannot make the error smaller than 5% or 10%. That is to say, the MGCPP model may not be able to capture the full dynamics of the AMZN data, but it can still be a strong candidate for modeling the mid-price, which is consistent with the conclusion for the MGCHP model in [23].
In general, we can conclude that, as a generalization of the MGCHP, the MGCPP model also performs very well in modeling mid-price dynamics. Using a Markov chain with more states in the MGCPP gives better results.

Remark 3.6
The MGCPP is a generalization not only of the MGCHP, but also of all multivariate compound models whose point processes $N_t$ satisfy Assumptions 2.0.1 and 2.0.2. We use the Hawkes process for comparison because numerical examples for it are available in the references.

Diffusion limit for the MGCPP: Deterministic Centralization
We proved an LLN and an FCLT for the MGCPP in the previous section. These limit theorems provide an approximation for mid-price modeling in the LOB. Recalling the approximation in Remark 3.3, we have
where $S_{nt}$ is the price process and $N_{nt}$ is the order flow. However, in real-world problems, equation (26) cannot be used directly for forecasting, because the order flow $N_{nt}$ is not available in advance. This motivates us to consider FCLT II for the MGCPP in this section.

Remark 4.2
As for FCLT I, we can also consider a special case. Let $X_{i,k}$ be a Markov chain with two dependent states $(+\delta, -\delta)$ and ergodic probabilities $(\pi^*_i, 1 - \pi^*_i)$. Set $a_i(x) = x$ in definition (7). Then we can derive a similar result for FCLT II. The parameters $\tilde{a}^*$ and $\tilde{\sigma}^*$ can be computed by equation (19).

Remark 4.3
For FCLT II, we can also consider an approximation similar to that for FCLT I. For some large enough $n$, we have
To deal with the $E(N_{nt})$ term, we consider the approximation derived from Assumption 2.0.1 in equation (23):
Rewriting equation (32), we have the new approximation

Numerical Examples for FCLT: Deterministic Centralization
In this section, we apply the LOBSTER data to test FCLT II. Following the numerical examples for FCLT I, we consider the standard deviation of the approximation in Remark 4.3, namely
The comparisons of the real standard deviation and the theoretical standard deviation can be found in Figure 5. Since the results for INTC and MSFT are good enough with the 2-state Markov chain $(+\delta, -\delta)$ in FCLT I, we also apply the 2-state Markov chain to INTC and MSFT here. For AAPL, GOOG, and AMZN, we use the MGCPP model with a 7-state Markov chain. Window sizes here start from 1 second and increase to 20 minutes in time steps of 10 seconds. As can be seen in Figure 5, the results for FCLT II are as good as the FCLT I results in Figures 1, 2, 3, and 4. We also computed the MSE and coefficients in Table 7. The percentage errors of MSFT and AAPL are very small (less than 5%), and the results for INTC and GOOG are also good (less than 10%). The percentage error of AMZN is large, but it is still smaller than the error derived from FCLT I in Table 6. In general, the simulation results for FCLT II are as good as those for FCLT I, and we can apply FCLT II to model the mid-price.

Rolling Cross-Validation
In this section, we test the forecasting ability of the MGCPP model. Since we did not assume that the multivariate point process $N_t$ is stationary or independent, we cannot apply K-fold cross-validation directly. Instead, we use the rolling K-fold cross-validation method proposed in [6]. We divide the last 50 minutes of data into 5 disjoint 10-minute windows for each stock. In fold 1, we take the first 280 minutes of data as the training set to estimate the parameters, and then use the data in the next 10-minute window to calculate the percentage error. Next, we merge the test set of fold 1 into the training set to form the new training set of fold 2, and use the next 10-minute window as the new test set. Repeating this procedure 5 times gives 5 percentage errors. The mean of the 5 percentage errors is the test error E for this stock, and the overall test error for our multivariate model is the average of the test errors over all stocks. Figure 6 gives an example diagram of the rolling cross-validation. The test errors in Table 8 are larger than the errors in Table 7. This is because the results in Table 7 are fitting errors, while the test errors in Table 8 are forecast errors: no future information is used when we conduct the forecasting task. So, even though the 15.46% overall test error is not as good as the fitting error, it is still a good prediction in the LOB and can provide useful insights for the forecasting task.
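The fold layout described above can be written down explicitly. The following Python sketch builds the expanding training windows and the 10-minute test windows, starting from 280 minutes of training data, and averages per-fold errors into a test error. The function names and the example error values are hypothetical; the parameter estimation and the percentage-error computation on each fold are not shown.

```python
def rolling_folds(initial_train_minutes, test_minutes, n_folds):
    """Return (train_interval, test_interval) pairs, in minutes, for rolling validation.

    The training interval always starts at 0 and grows by one test window per fold.
    """
    folds = []
    for k in range(n_folds):
        train_end = initial_train_minutes + k * test_minutes
        folds.append(((0, train_end), (train_end, train_end + test_minutes)))
    return folds

def mean_error(errors):
    """Average of a list of percentage errors."""
    return sum(errors) / len(errors)

folds = rolling_folds(initial_train_minutes=280, test_minutes=10, n_folds=5)
for k, (train, test) in enumerate(folds, start=1):
    print(f"fold {k}: train on minutes {train}, test on minutes {test}")

# Hypothetical per-fold percentage errors for one stock, for illustration only.
example_fold_errors = [12.0, 18.0, 15.0, 14.0, 16.0]
stock_test_error = mean_error(example_fold_errors)
print("test error E for this stock:", stock_test_error)
```

Because each test window lies strictly after its training interval, no future information enters the parameter estimates, which is the property that distinguishes this scheme from ordinary K-fold cross-validation.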

Conclusion and future work
In this paper, we proposed a multivariate general compound point process for mid-price modeling in the limit order book. This kind of process is a generalization of several stochastic models in the limit order market. We used LOBSTER data to conduct simulations and found that the multivariate generalized model is as good as the general compound Hawkes process model. We also tested the predictive ability of this kind of process. In general, the MGCPP performs very well in LOB modeling and can be a meaningful reference for mid-price prediction. In the future, we will explore more applications of the MGCPP and consider related option pricing problems within this framework.