1. Introduction
However, the general problem of modeling market agent behavior from historical data is complicated by the sheer number of agents and the diversity of their utility functions; enumerating them all is impractical. Furthermore, most stock trading data are anonymized (Comerton-Forde and Tang 2009), so information about who submitted a given order is not available.
The goal of this study is to learn stock trader behavior patterns from anonymized historical stock order data using neural network-based imitation learning (IL) (Schaal 1999; Schaal et al. 2003). To realize multi-modal imitation learning, we propose latent segmentation (Cohen and Ramaswamy 1998; Swait 1994) of stock trading strategies by trader objective function. Orders are segmented according to the weighted average of the reward function for each stock trader segment. An IL model is defined for each segment and trained to predict which trader segment was most likely to have submitted a particular order at a particular time. We refer to the proposed method as “Latent Segmentation Imitation Learning (LSIL)”.
LSIL was evaluated using both simulated market data and actual historical stock order data. Experiments using simulated data were conducted to evaluate the validity of latent segmentation, and experiments on historical stock order data were conducted to examine the accuracy of stock order predictions made by LSIL. We find that LSIL models are able to predict stock orders with reasonable accuracy and also provide meaningful insights into the drivers of trader behavior. Detailed investigation into changes in market conditions and segments revealed that our proposed segments behave in line with real-market investor sentiments.
The primary contributions of this study can be summarized as follows:
We propose a neural network-based method for imitation learning of stock trading strategies. To account for diverse trading strategies, latent segmentation based on a reward function is introduced.
The proposed method is evaluated using both simulated market data and historical stock order data. The proposed method is confirmed to provide both accurate stock order predictions and a meaningful interpretation of trader segment behavior.
3. Latent Segmentation Imitation Learning (LSIL)
As mentioned previously, financial markets consist of stock orders with various objectives, and these objectives are not self-evident from trading data alone. Since previous studies were seemingly able to classify trading strategies (Yang et al. 2015), we should be able to achieve higher prediction accuracy by modeling each strategy class. In this study, we assume latent segmentation of traders (Cohen and Ramaswamy 1998; Swait 1994): each trader belongs to exactly one segment at any given time and may drift between segments over time, but cannot belong to more than one segment simultaneously.
Let the latent segments
represent specific trading strategies. Then, the probability of submitting stock order
can be written as
where
X are market states,
is the probability that traders belonging to segment
submit an order, and
is the probability that a stock order is submitted by traders in segment
conditioned on parameter
. Each
is predicted using an individual network, which we refer to as segment level order networks.
Although we can obtain pairs from historical stock order data, information about is never available. Therefore, we also predict using another neural network that we refer to as the segment network. The predicted probability of a given segment is written , where is the parameter of the segment network.
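As a minimal sketch of how the two kinds of network outputs combine, the overall order probability can be read as a mixture: the segment probability from the segment network weights the order distribution of each segment level order network, and the weighted distributions are summed over segments. The function and argument names below are hypothetical, and the mixture form is our reading of the description above rather than a verbatim transcription of Equation (1).

```python
from typing import List, Sequence


def mixture_order_probs(
    segment_probs: Sequence[float],
    order_probs_by_segment: Sequence[Sequence[float]],
) -> List[float]:
    """Combine per-segment order distributions into one overall distribution.

    segment_probs[k]             -- segment network output for segment k
    order_probs_by_segment[k][o] -- segment k's order network output for order class o
    (both names are illustrative assumptions)
    """
    n_orders = len(order_probs_by_segment[0])
    overall = [0.0] * n_orders
    for p_seg, order_probs in zip(segment_probs, order_probs_by_segment):
        for o, p_order in enumerate(order_probs):
            # Each segment contributes its order probability, weighted by
            # the probability that the order came from that segment.
            overall[o] += p_seg * p_order
    return overall
```

If the segment probabilities sum to one and each per-segment distribution sums to one, the combined distribution also sums to one.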
As mentioned previously, each segment represents an individual strategy. Much like in reinforcement learning, we introduce an individual reward function for each segment. The reward function for segment
is denoted
. Then, with the predicted segment probability, the expected reward for order
o is calculated as follows.
Since real markets are not perfectly efficient (Jung and Tran 2016), we assume there exists a segment of traders that acts inefficiently. We introduce an exceptional action segment
, which has a uniform reward function and requires
.
represents noise traders (Shleifer and Summers 1990) who do not act efficiently, and/or traders whose investment strategies do not fit our segmentation scheme. The gradient with respect to the parameters in the segment network
is calculated as follows:
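As a rough illustration of how the expected reward responds to the segment network, the sketch below assumes that the segment network ends in a softmax over segments and that the expected reward is the segment-probability-weighted sum of the per-segment rewards of the observed order. Both are assumptions made for illustration, and all names are hypothetical. Under these assumptions, the gradient of the expected reward with respect to logit j is p_j (r_j - E[r]), which raises the probability of segments whose reward for the order is above average.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_reward(logits: Sequence[float], rewards: Sequence[float]) -> float:
    """Segment-probability-weighted reward of one observed order.

    rewards[k] is the reward that segment k's reward function assigns
    to the observed order (assumed to be precomputed).
    """
    probs = softmax(logits)
    return sum(p * r for p, r in zip(probs, rewards))


def expected_reward_grad(logits: Sequence[float], rewards: Sequence[float]) -> List[float]:
    """Gradient of the expected reward w.r.t. the segment logits."""
    probs = softmax(logits)
    mean_r = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - mean_r) for p, r in zip(probs, rewards)]
```

A training loop would ascend this gradient (or descend the gradient of the negated expected reward) through the remaining layers of the segment network by backpropagation.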
Appropriate selection of reward function
is essential for training the segment network. Reward functions across the segments are constrained to similar scales in order for training to converge. In this study, as the simplest case,
is calculated as the profit and loss (P&L) of the order
o. Thus,
can be calculated as follows.
Here, is the mid price of the stock at time after order o is submitted. and are change in inventory and cash due to order o and can be calculated from the change of the market limit-order-book. For LMT and MKT, is the executed quantity and is the transaction amount. and in case of buy LMT or MKT, and and in case of sell LMT or MKT. For CXL, and are excluded quantity and transaction amount; times executed quantity and transaction amount of the canceling order. Therefore, differing from LMT and MKT, and in case of buy CXL, and and in case of sell CXL.
Here, refers to a time scale value, and , , and are affected by . Therefore, segments are considered to have individual values. By setting a value for each segment, the segment network can be trained.
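The P&L reward described above can be sketched as follows, assuming that it equals the change in inventory valued at the later mid price plus the change in cash. The sign conventions (a buy increases inventory and decreases cash, a sell does the opposite, and a cancel carries the opposite sign of the order it cancels) follow our reading of the text, and all function and parameter names are hypothetical.

```python
from typing import Tuple


def inventory_cash_change(
    side: str, quantity: float, amount: float, is_cancel: bool = False
) -> Tuple[float, float]:
    """Return (change in inventory, change in cash) attributed to an order.

    side      -- "buy" or "sell"
    quantity  -- executed quantity (for CXL: quantity of the canceled order)
    amount    -- transaction amount (for CXL: amount of the canceled order)
    is_cancel -- True for CXL, which flips the signs (assumed convention)
    """
    sign = 1.0 if side == "buy" else -1.0
    if is_cancel:
        sign = -sign
    return sign * quantity, -sign * amount


def pnl_reward(mid_price_after: float, d_inventory: float, d_cash: float) -> float:
    """Mark the inventory change to the later mid price and add the cash change."""
    return mid_price_after * d_inventory + d_cash
```

For example, buying 100 shares for 1000 in cash yields a reward of 50 if the mid price one time scale later is 10.5, while a sell of the same size and amount yields -50.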
Each
can be optimized by standard cross entropy minimization for
in Equation (
1). The gradient with respect to each
takes the form
where
is a cross entropy function, and
and
are the observed order and market states. In addition, the cross entropy gradient with respect to
can be calculated and added to Equation (
3) as an adjustment term. We performed training and validation of the proposed model with and without the adjustment term (we refer to these methods as LSIL1 and LSIL2), and the results may be seen in the experiments section.
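For a single observed order, cross entropy against a one-hot target reduces to the negative log of the predicted probability of that order. The sketch below assumes the overall order probability is the segment-weighted mixture of the per-segment order probabilities, and shows the resulting loss together with its derivative with respect to each segment's predicted probability of the observed order. Names are hypothetical.

```python
import math
from typing import List, Sequence


def order_nll(
    segment_probs: Sequence[float],
    order_probs_by_segment: Sequence[Sequence[float]],
    observed_order: int,
) -> float:
    """Negative log-likelihood of the observed order under the mixture."""
    mix = sum(
        p_seg * probs[observed_order]
        for p_seg, probs in zip(segment_probs, order_probs_by_segment)
    )
    return -math.log(mix)


def order_nll_grad(
    segment_probs: Sequence[float],
    order_probs_by_segment: Sequence[Sequence[float]],
    observed_order: int,
) -> List[float]:
    """d(loss) / d(order_probs_by_segment[k][observed_order]) for each segment k."""
    mix = sum(
        p_seg * probs[observed_order]
        for p_seg, probs in zip(segment_probs, order_probs_by_segment)
    )
    return [-p_seg / mix for p_seg in segment_probs]
```

Segments that the segment network considers likely receive a proportionally larger share of the order prediction gradient, so each segment level order network is mainly trained on the orders attributed to it.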
4. Neural Network Configuration
4.1. Feature Engineering
Price series and orderbook features are used as the market states X.
The price series comprises 10 market prices (last or current trade price) taken at fixed time step intervals. The intervals were selected according to the time scale value of each segment.
The orderbook features are arranged as a vector of the latest orderbook volumes at 10 price levels above and below the mid-market price. To distinguish buy and sell orders, buy order volumes are recorded as negative values.
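The two feature types can be sketched as below. This is an illustrative construction under stated assumptions: prices are expressed in integer ticks, the orderbook is given as dictionaries from tick to resting volume, and the output vector is ordered from the lowest to the highest price level. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def price_series(trade_prices: Sequence[float], step: int, length: int = 10) -> List[float]:
    """Sample `length` prices spaced `step` time steps apart, oldest first."""
    needed = (length - 1) * step + 1
    if len(trade_prices) < needed:
        raise ValueError("not enough price history for this step size")
    last = len(trade_prices) - 1
    return [trade_prices[last - i * step] for i in range(length - 1, -1, -1)]


def orderbook_vector(
    buy_volumes: Dict[int, float],
    sell_volumes: Dict[int, float],
    mid: float,
    levels: int = 10,
) -> List[float]:
    """Volumes at `levels` ticks below and above the mid price.

    Buy volumes are stored as negative values and sell volumes as
    positive values, so the sign encodes the side of the book.
    """
    first_below = math.ceil(mid) - 1   # nearest tick strictly below the mid
    first_above = math.floor(mid) + 1  # nearest tick strictly above the mid
    below = [
        0.0 - buy_volumes.get(t, 0.0)
        for t in range(first_below - levels + 1, first_below + 1)
    ]
    above = [
        sell_volumes.get(t, 0.0)
        for t in range(first_above, first_above + levels)
    ]
    return below + above
```

Using a larger `step` for segments with a longer time scale gives each segment a price history that matches its horizon while keeping the input length fixed at 10.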
4.2. Stock Order Digitization
We consider a single market and three order types: Limit order (LMT), Market order (MKT), and Cancel order (CXL). LMT and CXL specify the market side (i.e., buy or sell), price, and quantity, whereas MKT specifies only the market side and quantity.
Orders were digitized based on order type (i.e., LMT, MKT, CXL), order price, and order volume. In practice, some orders are defined as combinations of two or more types of orders. For example, price change orders and volume change orders can be interpreted as a combination of LMT or MKT and CXL. In such a case, the LMT or MKT part of the order, which is considered to reflect the latest intentions of the agents, is extracted.
Order price and volume are digitized into discrete values. For price, 10 price levels above and 10 below the mid price are possible. For volume, up to five times the minimum trading unit is possible, and CXL orders are digitized with negative volume. LMT and CXL orders together have 200 possible values (20 price levels × 10 signed volumes), and MKT orders, which do not specify prices, have the remaining 10 possible values. Thus, 210 values are possible in total, and orders that do not match any condition are discarded.
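One way to realize a 210-class encoding consistent with the counts above is sketched here. The layout is an assumption chosen so that the totals match: 20 price levels times 10 signed volumes (positive for LMT, negative for CXL) give 200 classes, and MKT orders use 2 sides times 5 volumes for the remaining 10. The class ordering and all names are hypothetical.

```python
from typing import Optional

MAX_VOLUME_UNITS = 5      # up to five times the minimum trading unit
LEVELS_PER_SIDE = 10      # price levels above and below the mid price
N_SIGNED_VOLUMES = 2 * MAX_VOLUME_UNITS
N_LMT_CXL = 2 * LEVELS_PER_SIDE * N_SIGNED_VOLUMES   # 200
N_CLASSES = N_LMT_CXL + 2 * MAX_VOLUME_UNITS         # 210


def digitize_order(
    order_type: str,
    volume_units: int,
    price_level: Optional[int] = None,
    side: Optional[str] = None,
) -> Optional[int]:
    """Map an order to a class index in [0, 210), or None if it is discarded.

    price_level -- signed level relative to the mid price: -10..-1 or 1..10
    side        -- "buy" or "sell"; only used for MKT in this layout
    """
    if not 1 <= volume_units <= MAX_VOLUME_UNITS:
        return None
    if order_type in ("LMT", "CXL"):
        if price_level is None or price_level == 0 or abs(price_level) > LEVELS_PER_SIDE:
            return None
        # Levels -10..-1 map to 0..9 and levels 1..10 map to 10..19.
        if price_level < 0:
            price_idx = price_level + LEVELS_PER_SIDE
        else:
            price_idx = price_level + LEVELS_PER_SIDE - 1
        # CXL volumes are treated as negative, so they use the second half.
        vol_idx = volume_units - 1 + (MAX_VOLUME_UNITS if order_type == "CXL" else 0)
        return price_idx * N_SIGNED_VOLUMES + vol_idx
    if order_type == "MKT":
        if side not in ("buy", "sell"):
            return None
        side_idx = 0 if side == "buy" else 1
        return N_LMT_CXL + side_idx * MAX_VOLUME_UNITS + volume_units - 1
    return None
```

Orders that return `None`, such as a limit order more than 10 levels from the mid price or an order larger than five trading units, correspond to the discarded orders.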
4.3. Network Architecture
The proposed network architecture is shown in Figure 1. The network consists of the segment network and the segment level order networks. Network outputs are aggregated by Equation (1) to calculate the overall order probability. Reward and order probability losses are calculated using the predicted probabilities, observed order, and reward functions for each segment.
The segment network and the segment level order networks have the same layer configuration, except for the last fully-connected layer. In these networks, two market state features, price series and orderbook (Section 4.1), are extracted and merged. To extract price series features, a long short-term memory (LSTM) layer (Hochreiter and Schmidhuber 1997) is used, following previous studies (Bao et al. 2017; Fischer and Krauss 2018). Convolutional layers (Krizhevsky et al. 2012) are used to extract orderbook features that have positional information (Tashiro et al. 2019; Tsantekidis et al. 2017). The merged features are transformed by fully-connected layers, and the segment or order probability is output from the last layer.
5. Experiments
Experiments were performed using simulated market data and historical market data. In experiments using simulation data, we ran an artificial market simulation in advance, and trained neural network models using the generated data. The objective of the experiments on artificial data was to verify that the proposed LSIL model could predict segment probability correctly in an idealized setting where order-trader pair information is available. In experiments using historical data, we trained models using actual public stock trading data from the Tokyo Stock Exchange.
The proposed LSIL method was used to train networks with and without the adjustment term (Section 3); we refer to these networks as LSIL1 and LSIL2, respectively. The proposed method was compared to a standard IL model and a generative adversarial imitation learning (GAIL) model. The IL model has the same layer configuration as the LSIL segment level order networks and simply predicts order probabilities from market states. The GAIL model is based on sequence generative adversarial nets (SeqGAN) (Yu et al. 2017) and generates order sequences without using market states. In addition, a network with the same architecture as LSIL, which we refer to as segment IL (SIL), was optimized to minimize only the order probability loss and not the reward loss.
Model performance was compared using the following metrics: precision at k (P@k), area under the receiver operating characteristic curve (AUROC), and expected reward. Precision at k is the percentage of samples whose correct class is included in the top k classes ranked by predicted score. P@k and AUROC are calculated for the predicted order probabilities. The expected reward is calculated to validate the segment probabilities predicted by LSIL. As reward values are centered, a positive expected reward indicates that the LSIL model predicts segment probabilities appropriately.
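Precision at k as defined above can be computed with a few lines. The sketch below is a plain-Python illustration with hypothetical names; `scores` holds one row of predicted class scores per sample and `labels` holds the observed class index for each sample.

```python
from typing import Sequence


def precision_at_k(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Fraction of samples whose true class is among the k highest-scored classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices by predicted score, highest first.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)
```

With 210 order classes, k = 1 corresponds to ordinary top-1 accuracy, and larger k credits a model for ranking the observed order near the top even when it is not the single most likely class.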
5.1. Experiments on Simulated Data
We ran an artificial market simulation to generate a dataset, and used the dataset to train and validate our LSIL models. The artificial market simulator consists of markets and agents, where markets play the role of the environment whose state evolves through the actions of the agents. In each step of the simulation, an agent is sampled, the agent submits an order, and markets process orders and update their orderbooks. Market pricing follows a continuous double auction (Friedman and Rust 1993).
We define a fundamental price
for the market. The fundamental price represents the fair price of the asset/market, is observable by stylized agents, and is used to predict future prices. The fundamental price changes according to a geometric Brownian motion (GBM) process (Eberlein and Keller 1995). The volatility of the GBM was set to
.
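A discretized GBM path for the fundamental price can be generated as in the following sketch. The zero drift, unit time step, and function name are assumptions for illustration, and the volatility argument is a free parameter to be set by the caller.

```python
import math
import random
from typing import List, Optional


def simulate_gbm(
    p0: float,
    volatility: float,
    n_steps: int,
    drift: float = 0.0,
    dt: float = 1.0,
    seed: Optional[int] = None,
) -> List[float]:
    """Simulate a geometric Brownian motion path with exact log-normal steps."""
    rng = random.Random(seed)
    prices = [p0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        # The log return over dt is normal with mean (drift - vol^2 / 2) * dt
        # and standard deviation vol * sqrt(dt).
        log_ret = (drift - 0.5 * volatility ** 2) * dt + volatility * math.sqrt(dt) * z
        prices.append(prices[-1] * math.exp(log_ret))
    return prices
```

Because each step multiplies the previous price by a positive factor, the simulated fundamental price stays strictly positive, which is the main reason GBM is preferred over an additive random walk for price processes.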
Stylized agents are commonly used in artificial market simulations to model the behavior of realistic economic actors (Hommes 2006), and reproduce many stylized facts of actual financial markets (Chiarella and Iori 2002; Chiarella et al. 2009). Stylized agents predict the expected future log return r using the following equation:
where
and
and
are current market price and fundamental price, respectively, and
is the time window size (or time scale). Weight values
,
, and
are sampled randomly and independently from exponential distributions for each agent. The stylized agents predict future market prices
from the predicted log return using the following equation:
A stylized agent submits a buy LMT with price if , and submits a sell LMT with price if . The parameter k is called the order margin and represents the amount of profit that the agent expects from the transaction. In this experiment k was set to . The submitting volume v is fixed to one.
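The decision rule of a stylized agent can be sketched as below. The specific forms used here are assumptions in the style of the cited Chiarella and Iori models: the fundamental term is the log ratio of fundamental price to market price divided by the time window, the chart term is the log return over the window divided by the window, the expected return is the weight-normalized sum of the fundamental, chart, and noise terms, and the predicted price is the current price times exp(r times the window). The order prices (predicted price scaled by one minus or one plus the margin k) are likewise assumed, and all names are hypothetical.

```python
import math
from typing import Optional, Tuple


def expected_log_return(
    price: float,
    fundamental: float,
    past_price: float,
    tau: float,
    w_f: float,
    w_c: float,
    w_n: float,
    noise: float,
) -> float:
    """Weighted average of fundamental, chart, and noise terms (assumed form)."""
    f_term = math.log(fundamental / price) / tau
    c_term = math.log(price / past_price) / tau
    return (w_f * f_term + w_c * c_term + w_n * noise) / (w_f + w_c + w_n)


def predicted_price(price: float, log_return: float, tau: float) -> float:
    """Price expected after the time window under the predicted log return."""
    return price * math.exp(log_return * tau)


def decide_order(
    price: float, pred: float, margin: float
) -> Optional[Tuple[str, float, int]]:
    """Return (side, limit price, volume) or None when no change is predicted."""
    if pred > price:
        return ("buy", pred * (1.0 - margin), 1)
    if pred < price:
        return ("sell", pred * (1.0 + margin), 1)
    return None
```

For a purely fundamental agent (chart and noise weights of zero), the predicted price equals the fundamental price, so the agent buys whenever the market trades below the fundamental price and sells whenever it trades above.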
In this study, the following seven types of stylized agents are registered to the simulator: Type 1 (), Type 2 (), Type 3 (), Type 4 (), Type 5 (), Type 6 (), and Exceptional (). Noise weights were fixed at 0 for for types 1 to 6. To prevent chart term C from becoming too dominant, the expected value of chart weights was attenuated according to .
These types of stylized agents reflect our assumption that agents with some type of time scale exist. Here, 100 agents were registered for types 1 to 6 and 400 agents were registered for the exceptional type.
Each simulation consists of 101,000 steps, of which the first 1000 were used to build up the initial market orderbook and subsequently discarded. Simulations were performed 10 times with different random seeds; data from the first eight simulations were used for training, and data from the remaining two were used for validation.
According to the configured types of stylized agents, LSIL segments were set as follows: : , : , : , : , : , : , and : Exceptional.
The modeling results are shown in Table 1. For all indicators of order prediction accuracy, the proposed LSIL2 outperformed all other methods, which indicates that the proposed method works well without the adjustment term. In addition, since the expected rewards of LSIL1 and LSIL2 were both positive at
and
, we believe LSIL1 and LSIL2 were able to predict segment probabilities appropriately. Appropriate prediction of segment probabilities also contributed to the improvement of prediction accuracy, as shown in Table 1.
5.2. Experiments on Historical Data
We used FLEX_FULL historical full-order-book data from the Tokyo Stock Exchange. FLEX_FULL contains tens of millions of stock orders per day, recorded at millisecond resolution (Brogaard et al. 2014).
In this experiment, data for symbol 9022 (Central Japan Railway Company) collected between 1 January 2018 and 31 December 2018 were used for training, and data collected between 1 January 2019 and 31 August 2019 were used for validation. Every tenth available sample was extracted for training and validation. The segments of LSIL were set to the same values as in the experiments using artificial data.
The average of each segment probability along all validation data is , , , , , , and while the LSIL1 and LSIL2 rewards were and . We thus see that traders with the shortest-term rewards are dominant in this market, which is in agreement with the ratio of orders submitted from the co-location site at the TSE.
The accuracy results are shown in Table 2. We can see that SIL, LSIL1, and LSIL2 predicted orders with similar accuracy. Although LSIL2 outperformed the other methods on simulated data, we attribute its underperformance on historical data to the simplicity of our reward function specification. In general, real-market investors are considered to have a wide variety of “reward functions”, and therefore more diverse types of reward functions are needed for more accurate prediction. Nevertheless, we are able to obtain salient features of the most dominant segments.
An example of predicted segment probabilities is shown in Figure 2, which plots the segment probabilities predicted by the trained LSIL2 model together with market prices over time. Segment probabilities fluctuate as market states (market price, orderbook, etc.) change. In this case, the market price rose suddenly at the end of the plot, and the probability of
temporarily increased just before the rise of the market price.
is the segment of traders with the shortest-term rewards. In general, short-term investors are considered to act when price fluctuations are expected immediately afterwards, so the increase in the probability of this segment in Figure 2 is reasonable. Although some fluctuations in segment probability are not linked to price fluctuations, we are able to interpret meaningful behavior patterns and gain insight into the agents driving the dynamics of real markets.