## 1. Introduction

However, the general problem of modeling market agent behavior from historical data is complicated by the sheer number of agents and the diversity of their utility functions. It is simply impractical to try to enumerate them all. Furthermore, most stock trading data are anonymized (Comerton-Forde and Tang 2009), and information about who submitted a given order is not available.

The goal of this study is to learn stock trader behavior patterns from anonymized historical stock order data using neural network-based imitation learning (IL) (Schaal 1999; Schaal et al. 2003). To realize multi-modal imitation learning, we propose a latent segmentation (Cohen and Ramaswamy 1998; Swait 1994) of stock trading strategies by trader objective function. Orders are segmented according to the weighted average of the reward function for each stock trader segment. An IL model is defined for each segment and trained to predict which trader segment was most likely to have submitted a particular order at a particular time. We refer to the proposed method as “Latent Segmentation Imitation Learning (LSIL)”.

LSIL was evaluated using both simulated market data and actual historical stock order data. Experiments using simulated data were conducted to evaluate the validity of latent segmentation, and experiments on historical stock order data were conducted to examine the accuracy of stock order predictions made by LSIL. We find that LSIL models predict stock orders with reasonable accuracy and also provide meaningful insights into the drivers of trader behavior. A detailed investigation of changes in market conditions and segments revealed that our proposed segments behave in line with real-market investor sentiment.

The primary contributions of this study can be summarized as follows:

- We propose a neural network-based method for imitation learning of stock trading strategies. To account for diverse trading strategies, we introduce a latent segmentation based on a reward function.
- We evaluate the proposed method using both simulated market data and historical stock order data, and confirm that it provides both accurate stock order predictions and a meaningful interpretation of trader segment behavior.

## 3. Latent Segmentation Imitation Learning (LSIL)

As mentioned previously, financial markets consist of stock orders with various objectives, and these objectives are not self-evident from trading data alone. Since previous studies appear to have been able to classify trading strategies (Yang et al. 2015), we expect that modeling each strategy class separately can achieve higher prediction accuracy. In this study, we assume a latent segmentation of traders (Cohen and Ramaswamy 1998; Swait 1994): at each time, every trader belongs to exactly one segment, and although traders may drift between segments over time, no trader belongs to more than one segment simultaneously.

Let the latent segments $s_i\ (i=1,2,\dots,N)$ represent specific trading strategies. Then, the probability $\pi(o\mid X)$ of submitting stock order $o$ can be written as

$$\pi(o\mid X)=\sum_{i=1}^{N} p(s_i\mid X)\,\pi_{s_i}(o\mid X;\theta_i), \qquad (1)$$

where $X$ is the market state, $p(s_i\mid X)$ is the probability that traders belonging to segment $s_i$ submit an order, and $\pi_{s_i}(o\mid X;\theta_i)$ is the probability that stock order $o$ is submitted by traders in segment $s_i$, conditioned on parameters $\theta_i$. Each $\pi_{s_i}(o\mid X;\theta_i)$ is predicted by an individual network, which we refer to as a segment level order network.

Although we can obtain $(X_t,o_t)$ pairs from historical stock order data, information about $s_i$ is never available. Therefore, we also predict $p(s_i\mid X)$ using another neural network, which we refer to as the segment network. The predicted probability of a given segment is written $p(s_i\mid X;\theta_s)$, where $\theta_s$ denotes the parameters of the segment network.
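To make the mixture over segments concrete, the following minimal pure-Python sketch combines per-segment order distributions $\pi_{s_i}(o\mid X)$ with segment probabilities $p(s_i\mid X)$ as a weighted sum. The function name `mix_order_probs` and the toy numbers are illustrative assumptions, not part of the method; the sketch assumes the supplied segment probabilities already sum to one and does not model the exceptional segment $s_*$.

```python
def mix_order_probs(segment_probs, segment_order_probs):
    """pi(o|X) = sum_i p(s_i|X) * pi_{s_i}(o|X; theta_i).

    segment_probs[i]           -- p(s_i | X), output of the segment network
    segment_order_probs[i][o]  -- pi_{s_i}(o | X), output of order network i
    """
    n_orders = len(segment_order_probs[0])
    mixed = [0.0] * n_orders
    for p_s, order_probs in zip(segment_probs, segment_order_probs):
        for o, p_o in enumerate(order_probs):
            mixed[o] += p_s * p_o  # weight each segment's order distribution
    return mixed


# Toy example: 2 segments and 3 order classes (the paper digitizes orders into 210 classes).
p_seg = [0.75, 0.25]
pi_seg = [[0.5, 0.5, 0.0],
          [0.0, 0.0, 1.0]]
print(mix_order_probs(p_seg, pi_seg))  # [0.375, 0.375, 0.25]
```

Because each segment-level distribution sums to one, the mixture also sums to one whenever the segment weights do.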

As mentioned previously, each segment represents an individual strategy. As in reinforcement learning, we introduce an individual reward function for each segment. The reward function for segment $s_i$ is denoted $r_{s_i}(o)$. Then, using the predicted segment probabilities, the expected reward for order $o$ is calculated as

$$E[r(o,X)]=\sum_{i=1}^{N} p(s_i\mid X;\theta_s)\,r_{s_i}(o). \qquad (2)$$

Since real markets are not perfectly efficient (Jung and Tran 2016), we assume there exists a segment of traders that acts inefficiently. We introduce an exceptional action segment $s_*$, which has a uniform reward function, and require $\sum_{i=1}^{N} p(s_i\mid X;\theta_s)+p(s_*\mid X;\theta_s)=1$. The segment $s_*$ represents noise traders (Shleifer and Summers 1990) who do not act efficiently and/or traders whose investment strategies do not fit our segmentation scheme. The gradient with respect to the segment network parameters $\theta_s$ is calculated as follows:

$$\nabla_{\theta_s} E[r(o,X)]=\sum_{i=1}^{N} r_{s_i}(o)\,\nabla_{\theta_s} p(s_i\mid X;\theta_s). \qquad (3)$$
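A minimal sketch of the expected reward (the probability-weighted average of the segment rewards $r_{s_i}(o)$) and its gradient follows. It assumes, hypothetically, that the segment network ends in a softmax over $N+1$ logits with $s_*$ placed last, and that the uniform reward of $s_*$ is a constant `r_star` (0 by default, since rewards are centered). Under a softmax, the derivative of $\sum_j p_j r_j$ with respect to logit $z_j$ is $p_j\,(r_j - E[r])$, which can be checked by finite differences.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_reward_and_grad(segment_logits, segment_rewards, r_star=0.0):
    """Return (E[r], dE[r]/dlogits) for a softmax segment network.

    segment_logits  -- N + 1 logits; the last one belongs to s_* (hypothetical layout)
    segment_rewards -- r_{s_i}(o) for the N regular segments
    r_star          -- constant reward assumed for s_* (hypothetical)
    """
    probs = softmax(segment_logits)
    rewards = list(segment_rewards) + [r_star]
    expected = sum(p * r for p, r in zip(probs, rewards))
    # Softmax Jacobian gives d/dz_j sum_i p_i r_i = p_j * (r_j - E[r]).
    grad = [p * (r - expected) for p, r in zip(probs, rewards)]
    return expected, grad


logits = [0.2, -0.1, 0.5]   # two regular segments followed by s_*
rewards = [1.0, -0.5]       # reward of the observed order under each segment's tau
E, g = expected_reward_and_grad(logits, rewards)
```

Gradient ascent on these logits moves probability toward segments whose reward for the observed order exceeds the current expected reward; with `r_star=0.0`, the expected reward equals the weighted sum over the $N$ regular segments.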

Appropriate selection of the reward function $r_{s_i}(o)$ is essential for training the segment network. The reward functions of the different segments are constrained to similar scales so that training converges. In this study, as the simplest case, $r(o)$ is calculated as the profit and loss (P&L) of order $o$:

$$r(o)=p_{\mathit{mid},\tau}\,\Delta I+\Delta c. \qquad (4)$$

Here, $p_{\mathit{mid},\tau}$ is the mid price of the stock at time $\tau$ after order $o$ is submitted. $\Delta I$ and $\Delta c$ are the changes in inventory and cash due to order $o$ and can be calculated from changes in the market limit order book. For LMT and MKT, $\Delta I$ is the executed quantity and $\Delta c$ is the transaction amount: $\Delta I\ge 0$ and $\Delta c\le 0$ for a buy LMT or MKT, and $\Delta I\le 0$ and $\Delta c\ge 0$ for a sell LMT or MKT. For CXL, $\Delta I$ and $\Delta c$ are the excluded quantity and transaction amount, that is, $-1$ times the executed quantity and transaction amount of the canceled order. Therefore, unlike LMT and MKT, $\Delta I\le 0$ and $\Delta c\ge 0$ for a buy CXL, and $\Delta I\ge 0$ and $\Delta c\le 0$ for a sell CXL.

Here, $\tau$ is a time scale value, and $p_{\mathit{mid},\tau}$, $\Delta I$, and $\Delta c$ all depend on $\tau$. We therefore assign each segment its own $\tau$ value, which gives each segment a distinct reward function; setting a $\tau$ value for each segment is what allows the segment network to be trained.
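The P&L reward and its dependence on $\tau$ can be sketched as follows, assuming the mark-to-market form $r(o)=p_{\mathit{mid},\tau}\,\Delta I+\Delta c$, which is consistent with the sign conventions above. The helper names are hypothetical, and, as a simplification, $\Delta I$ and $\Delta c$ are passed in once rather than recomputed for each $\tau$.

```python
def pnl_reward(delta_inventory, delta_cash, mid_price_tau):
    """Inventory change marked to the mid price tau steps later, plus the cash change."""
    return delta_inventory * mid_price_tau + delta_cash


def segment_rewards(delta_inventory, delta_cash, mid_path, taus):
    """One reward per segment, reading p_mid,tau from mid_path[tau].

    Simplification (hypothetical): delta_inventory and delta_cash are fixed here,
    although in general they may also depend on tau.
    """
    return [pnl_reward(delta_inventory, delta_cash, mid_path[tau]) for tau in taus]


mid_path = [100.0, 100.5, 101.0, 99.0]  # mid price 0, 1, 2, 3 steps after submission
# Buy LMT filled for 1 unit at 100.0: inventory +1, cash -100.0.
print(segment_rewards(1, -100.0, mid_path, [1, 3]))   # [0.5, -1.0]
# Buy CXL of the same order: signs flip (inventory -1, cash +100.0).
print(segment_rewards(-1, 100.0, mid_path, [1, 3]))   # [-0.5, 1.0]
```

The same order can be profitable for a short-horizon segment and unprofitable for a long-horizon one, which is what lets the segments be told apart.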

Each $\pi_{s_i}(o\mid X;\theta_i)$ can be optimized by standard cross entropy minimization of $\pi(o\mid X;\theta)$ in Equation (1). The gradient with respect to each $\theta_i$ takes the form

$$\nabla_{\theta_i}\mathit{CE}\big(o_t,\pi(o\mid X_t;\theta)\big)=\nabla_{\theta_i}\mathit{CE}\Big(o_t,\sum_{j=1}^{N} p(s_j\mid X_t;\theta_s)\,\pi_{s_j}(o\mid X_t;\theta_j)\Big),$$

where $\mathit{CE}$ is the cross entropy function, and $o_t$ and $X_t$ are the observed order and market state. In addition, the cross entropy gradient with respect to $\theta_s$ can be calculated and added to Equation (3) as an adjustment term. We trained and validated the proposed model with and without the adjustment term (we refer to these variants as LSIL1 and LSIL2, respectively); the results are reported in the experiments section.
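If each segment level order network ends in a softmax (an assumption here), the cross entropy $-\log\pi(o_t\mid X_t)$ of the mixture in Equation (1) has a closed-form gradient with respect to logit $k$ of network $i$: $-\rho_i\,(\mathbb{1}[k=o_t]-\pi_{s_i}(k\mid X_t))$, where $\rho_i = p(s_i\mid X_t)\,\pi_{s_i}(o_t\mid X_t)/\pi(o_t\mid X_t)$ is the posterior weight of segment $i$ for the observed order. The sketch below is illustrative; its names and toy values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_ce_and_grad(segment_probs, order_logits, observed):
    """Cross entropy of the segment mixture and its gradient w.r.t. each order network's logits."""
    order_probs = [softmax(z) for z in order_logits]
    pi_obs = sum(p * q[observed] for p, q in zip(segment_probs, order_probs))
    loss = -math.log(pi_obs)
    grads = []
    for p, q in zip(segment_probs, order_probs):
        rho = p * q[observed] / pi_obs  # posterior responsibility of this segment for o_t
        grads.append([-rho * ((1.0 if k == observed else 0.0) - q_k)
                      for k, q_k in enumerate(q)])
    return loss, grads


seg_p = [0.6, 0.4]                         # p(s_i | X_t), held fixed here
logits = [[0.1, 0.3, -0.2], [1.0, -1.0, 0.0]]
loss, grads = mixture_ce_and_grad(seg_p, logits, observed=2)
```

Each order network thus receives an ordinary softmax classification gradient, scaled by how responsible its segment is for the observed order.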

## 4. Neural Network Configuration

#### 4.1. Feature Engineering

We use price series and orderbook features as the market state $X$.

The price series consists of 10 market prices (last or current trade price) taken at fixed time step intervals. The time step interval was selected according to the time scale value $\tau_{s_i}$ of each segment.

The orderbook features are arranged in a vector of the latest orderbook volumes at 10 price levels above and below the mid-market price. To distinguish buy and sell orders, buy order volumes are recorded as negative values.
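A minimal sketch of both feature inputs follows. It assumes, hypothetically, that prices are integer ticks, that the sampling step for the price series is derived from the segment's $\tau$ elsewhere, and that the orderbook vector runs from the deepest bid-side level to the deepest ask-side level; none of these layout choices are specified in the text.

```python
import math


def price_series(prices, t, step, length=10):
    """Return `length` prices ending at index t, spaced `step` steps apart (oldest first)."""
    idx = [t - step * k for k in range(length - 1, -1, -1)]
    if idx[0] < 0:
        raise ValueError("not enough price history for this step size")
    return [prices[i] for i in idx]


def orderbook_vector(bids, asks, mid, levels=10):
    """Net resting volume at `levels` ticks below and above the mid price.

    bids / asks map integer tick price -> resting volume; buy volume is made negative.
    Hypothetical layout: deepest level below mid first, deepest level above mid last.
    """
    below = [math.ceil(mid) - 1 - k for k in range(levels)][::-1]
    above = [math.floor(mid) + 1 + k for k in range(levels)]
    return [asks.get(p, 0) - bids.get(p, 0) for p in below + above]


print(price_series(list(range(100, 120)), t=9, step=2, length=3))   # [105, 107, 109]
print(orderbook_vector({99: 5, 98: 3}, {100: 4, 102: 2}, mid=99.5, levels=3))
# [0, -3, -5, 4, 0, 2]
```

Keeping the levels in price order preserves the positional structure that the convolutional layers are meant to exploit.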

#### 4.2. Stock Order Digitization

We consider a single market and three order types: Limit order (LMT), Market order (MKT), and Cancel order (CXL). LMT and CXL specify the market side (i.e., buy or sell), price, and quantity, whereas MKT specifies only the market side and quantity.

Orders were digitized based on order type (i.e., LMT, MKT, CXL), order price, and order volume. In practice, some orders are defined as combinations of two or more order types. For example, price change orders and volume change orders can be interpreted as a combination of LMT or MKT and CXL. In such cases, we extract the LMT or MKT part of the order, which is considered to reflect the latest intention of the agent.

Order price and volume are digitized into a finite set of values. For price, 10 levels above and 10 levels below the mid price are possible. For volume, up to five times the minimum trading unit is possible, and CXL orders are digitized with negative volume. LMT and CXL orders thus have $(10+10)\times (5+5)=200$ possible values, and MKT orders, which do not specify prices, have $5+5=10$ possible values (five volumes for each side). In total, 210 values are possible, and orders that do not match any of these conditions are discarded.
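The counting above can be made concrete with an encoder that maps each order to one of the 210 class indices. This is a sketch under stated assumptions: the function name and the index layout (LMT/CXL classes first, then MKT) are hypothetical, price levels are given as signed offsets from the mid price, and volume is already expressed in minimum trading units.

```python
N_LEVELS = 10   # price levels on each side of the mid price
MAX_UNITS = 5   # volume cap, in minimum trading units
N_CLASSES = 2 * N_LEVELS * 2 * MAX_UNITS + 2 * MAX_UNITS  # 200 + 10 = 210


def encode_order(order_type, volume_units, price_level=None, side=None):
    """Map an order to a class index in [0, 210), or None if it must be discarded.

    price_level -- for LMT/CXL: -10..-1 (below mid) or 1..10 (above mid)
    side        -- for MKT: "buy" or "sell"
    """
    if not 1 <= volume_units <= MAX_UNITS:
        return None
    if order_type in ("LMT", "CXL"):
        if price_level is None or not 1 <= abs(price_level) <= N_LEVELS:
            return None
        # -10..-1 -> 0..9 and 1..10 -> 10..19
        level_idx = price_level + N_LEVELS if price_level < 0 else price_level + N_LEVELS - 1
        # CXL volumes are negative: -5..-1 -> 0..4; LMT volumes 1..5 -> 5..9
        signed = -volume_units if order_type == "CXL" else volume_units
        vol_idx = signed + MAX_UNITS if signed < 0 else signed + MAX_UNITS - 1
        return level_idx * 2 * MAX_UNITS + vol_idx           # 0..199
    if order_type == "MKT" and side in ("buy", "sell"):
        side_idx = 0 if side == "buy" else 1
        return 2 * N_LEVELS * 2 * MAX_UNITS + side_idx * MAX_UNITS + volume_units - 1  # 200..209
    return None
```

For example, `encode_order("LMT", 6, price_level=1)` returns `None` because the volume exceeds five units, matching the rule that non-conforming orders are discarded.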

#### 4.3. Network Architecture

The proposed network architecture is shown in Figure 1. The network consists of the segment network and the segment level order networks. The network outputs are aggregated by Equation (1) to calculate the overall order probability $\pi(o\mid X)$. Reward and order probability losses are calculated using the predicted probabilities, the observed order, and the reward function of each segment.

The segment network and segment level order networks have the same layer configuration, except for the last fully-connected layer. In these networks, features are extracted from the two market state inputs, the price series and the orderbook (Section 4.1), and then merged. To extract price series features, a long short-term memory (LSTM) layer (Hochreiter and Schmidhuber 1997) is used, following previous studies (Bao et al. 2017; Fischer and Krauss 2018). Convolutional layers (Krizhevsky et al. 2012) are used to extract orderbook features, which carry positional information (Tashiro et al. 2019; Tsantekidis et al. 2017). The merged features are transformed by fully-connected layers, and the segment or order probability is output from the last layer.

## 5. Experiments

Experiments were performed using simulated market data and historical market data. In experiments using simulation data, we ran an artificial market simulation in advance, and trained neural network models using the generated data. The objective of the experiments on artificial data was to verify that the proposed LSIL model could predict segment probability correctly in an idealized setting where order-trader pair information is available. In experiments using historical data, we trained models using actual public stock trading data from the Tokyo Stock Exchange.

The proposed LSIL method was used to train networks with and without the adjustment term (Section 3). We refer to these networks as LSIL1 and LSIL2, respectively, and compared them to a standard IL model and a generative adversarial imitation learning (GAIL) model. The IL model has the same layer configuration as the LSIL segment level order networks and simply predicts order probabilities from market states. The GAIL model is based on sequence generative adversarial nets (SeqGAN) (Yu et al. 2017) and generates order sequences without using market states. In addition, we trained a network with the same architecture as LSIL, which we refer to as segment IL (SIL), to minimize only the order probability loss and not the reward loss.

Model performance was compared using the following benchmarks: precision at k (P@k), area under the receiver operating characteristic curve (AUROC), and expected reward $E[r(o,X)]$. Precision at $k$ is the percentage of samples whose observed order is among the top $k$ classes ranked by predicted score. P@k and AUROC are calculated for the predicted order probabilities. Expected reward $E[r(o,X)]$ is calculated to validate the segment probabilities predicted by LSIL. As reward values are centered, a positive $E[r(o,X)]$ indicates that the LSIL model can predict segment probabilities appropriately.
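Precision at $k$ as defined above can be computed with a few lines of Python. This is a minimal sketch; the function name and toy scores are illustrative, and ties in predicted scores are broken by class index, which is an arbitrary choice.

```python
def precision_at_k(score_rows, observed, k):
    """Fraction of samples whose observed class is among the top-k predicted classes."""
    hits = 0
    for scores, o in zip(score_rows, observed):
        # Rank classes by predicted score (stable sort keeps lower index first on ties).
        top_k = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:k]
        hits += o in top_k
    return hits / len(observed)


scores = [[0.1, 0.7, 0.2],
          [0.5, 0.3, 0.2],
          [0.1, 0.3, 0.6]]
observed = [2, 0, 1]
print(precision_at_k(scores, observed, k=1))  # only the second sample is a top-1 hit
print(precision_at_k(scores, observed, k=2))  # 1.0
```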

#### 5.1. Experiments on Simulated Data

We ran an artificial market simulation to generate a dataset, and used the dataset to train and validate our LSIL models. The artificial market simulator consists of markets and agents, where the markets play the role of an environment whose state evolves through the agents' actions. In each simulation step, an agent is sampled, the agent submits an order, and the markets process the order and update their orderbooks. Market pricing follows a continuous double auction (Friedman and Rust 1993).
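For readers unfamiliar with continuous double auctions, the following is a minimal price-time-priority limit order book in pure Python. It is a hypothetical sketch that handles limit orders only and executes trades at the resting order's price; it is not the simulator used in this study.

```python
import heapq
import itertools


class ContinuousDoubleAuction:
    """Minimal limit order book with price-time priority (LMT orders only)."""

    def __init__(self):
        self._bids = []  # heap of (-price, seq, [volume]) -> highest bid on top
        self._asks = []  # heap of (price, seq, [volume])  -> lowest ask on top
        self._seq = itertools.count()  # arrival order breaks ties at equal prices

    def best_bid(self):
        return -self._bids[0][0] if self._bids else None

    def best_ask(self):
        return self._asks[0][0] if self._asks else None

    def submit(self, side, price, volume):
        """Match a limit order against the opposite side and rest any remainder.

        Returns executions as (price, volume) pairs at the resting orders' prices.
        """
        trades = []
        opposite = self._asks if side == "buy" else self._bids
        while volume > 0 and opposite:
            key, _, remaining = opposite[0]
            best = key if side == "buy" else -key
            if (side == "buy" and best > price) or (side == "sell" and best < price):
                break  # no longer crosses the spread
            fill = min(volume, remaining[0])
            trades.append((best, fill))
            volume -= fill
            remaining[0] -= fill
            if remaining[0] == 0:
                heapq.heappop(opposite)
        if volume > 0:
            own = self._bids if side == "buy" else self._asks
            heapq.heappush(own, (-price if side == "buy" else price, next(self._seq), [volume]))
        return trades


book = ContinuousDoubleAuction()
book.submit("sell", 101, 2)
book.submit("sell", 100, 1)
book.submit("buy", 99, 3)
print(book.submit("buy", 101, 2))  # [(100, 1), (101, 1)]
```

After these orders, the best bid is 99 and the best ask is 101, so the mid price used for rewards would be 100.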

We define a fundamental price $p_F$ for the market. The fundamental price represents the fair price of the asset/market, is observable by stylized agents, and is used to predict future prices. The fundamental price changes according to a geometric Brownian motion (GBM) process (Eberlein and Keller 1995). The volatility of the GBM was set to $5\times 10^{-6}$.
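A fundamental price path of this kind can be generated with a discretized GBM. The sketch below assumes zero drift, one simulation step per time unit, and a per-step volatility of $5\times 10^{-6}$; the drift, step size, starting price, and function name are assumptions not stated in the text. The $-\sigma^2/2$ correction keeps the expected price constant.

```python
import math
import random


def simulate_fundamental(p0, sigma=5e-6, n_steps=1000, seed=0):
    """Zero-drift geometric Brownian motion sampled once per simulation step."""
    rng = random.Random(seed)
    prices = [p0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        prices.append(prices[-1] * math.exp(-0.5 * sigma ** 2 + sigma * z))
    return prices


path = simulate_fundamental(1000.0, n_steps=1000, seed=42)
```

With such a small volatility, the fundamental price drifts only slightly over a single simulation, so price dynamics are driven mainly by the agents' orders.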

Stylized agents are commonly used in artificial market simulations to model the behavior of realistic economic actors (Hommes 2006), and they reproduce many stylized facts of actual financial markets (Chiarella and Iori 2002; Chiarella et al. 2009). Stylized agents predict the expected future log return $r$ using the following equation:

$$r=\frac{1}{w_F+w_C+w_N}\left(w_F F+w_C C+w_N N\right),$$

where

$$F=\frac{1}{\tau}\log\frac{p_t^*}{p_t},\qquad C=\frac{1}{\tau}\log\frac{p_t}{p_{t-\tau}},$$

and $N$ is a noise term; $p_t$ and $p_t^*$ are the current market price and fundamental price, respectively, and $\tau$ is the time window size (or time scale). The weights $w_F$, $w_C$, and $w_N$ are sampled randomly and independently from exponential distributions for each agent. The stylized agents predict the future market price $p_{t+\tau}$ from the predicted log return using the following equation:

$$p_{t+\tau}=p_t\exp(r\tau).$$

A stylized agent submits a buy LMT with price $p_{t+\tau}(1-k)$ if $p_{t+\tau}>p_t$, and a sell LMT with price $p_{t+\tau}(1+k)$ if $p_{t+\tau}<p_t$. The parameter $k$ is called the order margin and represents the amount of profit that the agent expects from the transaction. In this experiment, $k$ was set to $0.01$. The order volume $v$ is fixed at one.
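The order rule can be sketched as a single function. This is a hedged illustration: it assumes the commonly used Chiarella-style forms $F=\log(p_t^*/p_t)/\tau$, $C=\log(p_t/p_{t-\tau})/\tau$, and $p_{t+\tau}=p_t\exp(r\tau)$, with the noise term supplied as an already-drawn sample. The function name and argument layout are hypothetical; the order margin $k=0.01$ is taken from the text.

```python
import math


def stylized_order(p_t, p_star, p_past, tau, w_f, w_c, w_n, noise, k=0.01):
    """Return ("buy" | "sell", limit_price) or None for one stylized agent.

    p_past -- market price tau steps ago; noise -- pre-drawn noise sample (assumed)
    """
    total = w_f + w_c + w_n
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    F = math.log(p_star / p_t) / tau   # fundamental term (assumed form)
    C = math.log(p_t / p_past) / tau   # chart / trend term (assumed form)
    r = (w_f * F + w_c * C + w_n * noise) / total
    p_pred = p_t * math.exp(r * tau)   # predicted price tau steps ahead
    if p_pred > p_t:
        return "buy", p_pred * (1 - k)
    if p_pred < p_t:
        return "sell", p_pred * (1 + k)
    return None  # no expected price change, so no order


# Pure fundamentalist with the fundamental price 1% above the market price.
print(stylized_order(100.0, 101.0, 100.0, tau=20, w_f=1.0, w_c=0.0, w_n=0.0, noise=0.0))
```

In this example, the agent predicts a price of about 101 and bids about 1% below it, near 99.99.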

In this study, the following seven types of stylized agents are registered in the simulator: Type 1 ($15\le \tau \le 25$), Type 2 ($30\le \tau \le 50$), Type 3 ($60\le \tau \le 100$), Type 4 ($120\le \tau \le 200$), Type 5 ($240\le \tau \le 400$), Type 6 ($480\le \tau \le 800$), and Exceptional ($w_F=w_C=0$). Noise weights $w_N$ were fixed at 0 for Types 1 to 6. To prevent the chart term $C$ from becoming too dominant, the expected value of the chart weight $w_C$ was attenuated according to $\tau$.

These types of stylized agents reflect our assumption that agents with characteristic time scales exist. Here, 100 agents were registered for Types 1 to 6, and 400 agents were registered for the exceptional type.

One simulation consists of 101,000 steps, where the first 1000 steps were used to build up the initial market orderbook and subsequently discarded. Simulations were performed 10 times with different random seeds; the data from the first eight simulations were used for training, and the data from the remaining two simulations were used for validation.

According to the configured types of stylized agents, LSIL segments ${s}_{i}$ were set as follows: ${s}_{1}$: $\tau =20$, ${s}_{2}$: $\tau =40$, ${s}_{3}$: $\tau =80$, ${s}_{4}$: $\tau =160$, ${s}_{5}$: $\tau =320$, ${s}_{6}$: $\tau =640$, and ${s}_{*}$: Exceptional.

The modeling results are shown in Table 1. For all indicators of order prediction accuracy, the proposed LSIL2 outperformed all other methods, indicating that our proposed method works well without the adjustment term. In addition, since the expected rewards of LSIL1 and LSIL2 were both positive ($0.1127$ and $0.0721$, respectively), we believe LSIL1 and LSIL2 were able to predict segment probabilities appropriately. As shown in Table 1, appropriate prediction of segment probabilities also contributed to the improvement in prediction accuracy.

#### 5.2. Experiments on Historical Data

We used FLEX_FULL historical full-order-book data from the Tokyo Stock Exchange (TSE). FLEX_FULL contains tens of millions of order records per day, recorded at millisecond resolution (Brogaard et al. 2014).

In this experiment, data for symbol 9022 (Central Japan Railway Company) collected between 1 January 2018 and 31 December 2018 were used for training, and data collected between 1 January 2019 and 31 August 2019 were used for validation. Training and validation samples were taken from every tenth available sample. The segments $s_i$ of LSIL were set to the same values as in the experiments on simulated data.

Averaged over all validation data, the predicted segment probabilities are $p(s_1)=0.4362$, $p(s_2)=0.0517$, $p(s_3)=0.0500$, $p(s_4)=0.0893$, $p(s_5)=0.1348$, $p(s_6)=0.1835$, and $p(s_*)=0.0546$, while the expected rewards of LSIL1 and LSIL2 were $0.0371$ and $0.0179$, respectively. We thus see that traders with the shortest-term rewards are dominant in this market, in agreement with the share of orders submitted from the co-location site at the TSE.

The accuracy results are shown in Table 2. SIL, LSIL1, and LSIL2 predicted orders with similar accuracy. Although LSIL2 outperforms the other methods on simulated data, we attribute its underperformance on historical data to the simplicity of our reward function specification. In general, real-market investors are considered to have a wide variety of “reward functions”, and more diverse types of reward functions are therefore needed for more accurate prediction. Nevertheless, we are able to capture salient features of the most dominant segments.

An example of predicted segment probabilities is shown in Figure 2, which plots the segment probabilities predicted by the trained LSIL2 model together with the market price over time. Segment probabilities fluctuate as market states (market price, orderbook, etc.) change. In this case, the market price rose suddenly at the end of the plot, and the probability of $s_1$ temporarily increased just before the rise.

$s_1$ is the segment of traders with the shortest-term rewards. In general, short-term investors are considered to act when price fluctuations are expected immediately afterwards, so the increase in the probability of $s_1$ in Figure 2 is reasonable. Although some fluctuations in segment probability are not linked to price fluctuations, we are able to interpret meaningful behavior patterns and gain insight into the agents driving the dynamics of real markets.