High-Frequency Trading (HFT) and Market Quality Research: An Evaluation of the Alternative HFT Proxies

We examine the soundness of high-frequency trading (HFT) proxies that are widely defined on the limit order book (LOB) information. We use a unique TRTH (Thomson Reuters Tick History) millisecond time-stamped intraday trades and quotes dataset enriched with 10 levels of LOB depth messages for 149 highly fragmented LSE listed stocks for the period 2005 to 2016. We explore a sharp uptrend in HFT activities and accompanying improvement in market liquidity in the European market. We show that alternative HFT proxies built on LOB are not equally powerful. The HFT proxy defined on the five best LOB prices (the mid point of a typical limit order book) provides a better HFT identification than the one popularly defined on the first best prices (BBO). We suggest that picking the LOB information beyond a certain level (e.g., the best five prices) of market depth in developing HFT proxy is counterintuitive. Evidence indicates that high-frequency traders (HFTs) participate in both competitive (narrow) and passive (wider) quoting as a market making strategy; however, they do not participate in passive quoting excessively.


Introduction
For the last two decades, the advent of sophisticated computing technology has been changing the structure of financial markets at an unprecedented pace. Machines are gradually taking over roles that once required human interaction. High-frequency trading (HFT) is one of the latest and most important technological additions to modern trading platforms. HFT accounts for a large share of equity trading across developed financial markets. For example, in the USA, it was approximately 52% of total equity trading in 2018 (Zaharudin et al. 2021). HFT is extraordinarily fast, and this speed has increased the effectiveness of HFT strategies in the present-day fragmented trading environment (Baldauf and Mollner 2021). The market impact of HFT is now a regulatory concern all over the world. In response, market microstructure research on HFT has grown over the last decade, and an increasing body of empirical literature has produced contradictory evidence.
Surprisingly, to date, there is little consensus on the definition of HFT. Nonetheless, the gap between research evidence and public, media, and regulatory perception regarding the impact of HFT is large (Gomber et al. 2011). For empirical study, HFT identification is a precondition. Limited and costly access to HFT data narrows the choices available to researchers, and studies have to rely on imperfect HFT identification strategies. Proprietary trade and quote data extracted from the limit order book (LOB) are among the most broadly accessible data sources in HFT studies (Ben Ammar and Hellara 2021; Boehmer et al. 2018; Conrad et al. 2015; Friederich and Payne 2015; Frino et al. 2017; Hendershott et al. 2011; Leone and Kwabi 2019). Generally, those datasets are supplied without direct identification or an HFT flag, and users must rely on some definitions or proxies for HFT identification. The choice of proxies is manifold (Biais and Foucault 2014; Bouveret et al. 2014), and a single data source does not support them all. Proprietary LOB datasets commonly used for HFT research are also source-dependent, and the choice of proxies is mostly constrained by the fields and depth levels to which a user gains access.
On top of that, managing LOB data is always challenging because of its volume, its computational burden, and the time and resources its management requires. Therefore, compromises in the choice of necessary data fields are quite common. For example, many studies (Boehmer et al. 2021; Hendershott et al. 2011) do not use LOB information beyond the best market depth level or BBO, either to avoid the related complexities in data management or because they lack the appropriate data access. Yet the evidence (AMF 2017; Hendershott and Riordan 2013) on ATs/HFTs' participation at the different LOB levels shows a participation rate of 52% to 70.8% at the BBO and marketable orders, and 64% to 79.3% behind the BBO and nonmarketable limit orders respectively. It implies that a proxy defined on the first best level of limit order book messages does not incorporate the HFT activities behind that point, which raises a number of caveats concerning the validity of the proxy. On the other hand, expanding the data extraction to many levels inside an order book poses the challenge of data handling. Moreover, the use of diverse HFT proxies in the absence of a proper standard is creating many concerns. Zaharudin et al. (2021) aptly express the concern about using distinct proxies across HFT research: "The lack of a uniform identification for HFT leads to problems, such as research complications, that lead to somewhat conflicting conclusions as to the effect of HFT on equity markets in general and the market microstructure in particular." In this paper, we attempt to evaluate the soundness of a set of alternative HFT proxies that depend on proprietary LOB data and are frequently used in HFT and related studies.
The primary aim is to find the depth level of the limit order book up to which extracting information best identifies HFT footprints while minimizing the issues related to handling voluminous data of little incremental value. In doing so, we gain access to one of the richest millisecond time-stamped TRTH (Thomson Reuters Tick History) intraday trades and quotes datasets, enriched with 10 depth levels of LOB messages, covering the period 2005 to 2016 for a group of 149 large-capitalization stocks listed on the LSE (London Stock Exchange). We develop all proxies available in the relevant literature and supported by our datasets, and evaluate them in two phases: (i) firstly, we discuss their descriptive properties and issues that have already been reported in the literature; (ii) afterwards, we employ them in a broadly used market microstructure empirical setting to understand their effectiveness in identifying the HFT activities.
A first-hand analysis shows that Hendershott et al. (2011)'s proxy (dollar volume per electronic message times (−1)) does not appropriately fit the European equity market setting for the sample period under our consideration.¹ Another proxy, the order-to-trade ratio, has the same limitation. To determine how the choice of limit order book depth level affects HFT proxies, we take the two extremes and a midpoint of the limit order book market depths, and define three alternative HFT proxies on them. We evaluate each of the three proxies by employing them in our empirical models. The analyses show that the HFT proxy defined on the first best bid and offer prices (BBO) fails to include the footprints that HFTs leave on a typical limit order book through their passive (nonmarketable) quoting. The HFT proxy constructed on the other extreme of limit order book depth, i.e., the first 10 best bid and ask prices, also shows no improvement in explaining the changes in market quality parameters. In contrast, the HFT proxy defined on the midpoint, i.e., the first five levels of market depth information, provides the best HFT identification and a reasonable explanation for the changes in market quality parameters over time. The analyses also confirm that market fragmentation has a significant role in understanding HFT behaviour.
This study contributes to the existing body of empirical market microstructure literature in general and high-frequency trading literature in particular in at least two ways.
Firstly, to our knowledge, this is the first study of its kind to evaluate the alternative high-frequency trading proxies built on limit order book information. Secondly, it provides evidence regarding the high-frequency trading intensity and market liquidity changes in the European equity market covering one of the longest periods, using an original dataset. The rest of the paper is organized as follows. Section 2 gives a short account of the HFT market microstructure in the European market, and Section 3 describes the literature. Section 4 provides the description of data, variable definitions, and empirical specification. Section 5 reports and discusses the results, and Section 6 concludes the study.

High-Frequency Trading
As a new field in the Algorithmic Trading (AT) domain, the literature and definition on HFT are in their infancy (Gomber et al. 2011). Aldridge (2013) highlights that computing technology plays the biggest role in the HFT business. HFT identifies every small change in the limit order book quote updates and enables traders to react faster. A fairly volatile financial market, one that ensures price changes large enough to exceed transaction costs and provides ample room for traders to move quickly in and out of market positions, suits HFTs' needs. Many papers (Aldridge 2013; Biais and Foucault 2014; Gomber et al. 2011; O'Hara 2015; The Netherlands Authority for the Financial Markets 2010; Zaharudin et al. 2021) have attempted to paint a descriptive picture of HFT and have drawn a distinction between the working features of HFT and AT. They reveal that HFT and AT share some common features but serve different purposes and operate at different capacities. 'Market making' is one of the main strategies of HFT. Additionally, the domain of HFT strategies includes electronic liquidity provision, statistical and latency arbitrage, liquidity detection, and short-term momentum. Sometimes HFTs are registered as designated liquidity providers in addition to market making. The salient features of HFTs include (Gomber et al. 2011): a very high number of orders generated, fast order revision or cancellation, proprietary trading, generating profit from market making, maintaining a zero or minimal inventory, fast holding revision, using low-latency technology, taking positions in highly liquid instruments, and so on. For a detailed discussion on HFT and AT, we refer the reader to Aldridge (2013); Gomber et al. (2011); The Netherlands Authority for the Financial Markets (2010). Figure 1 shows the domain of HFT.

European Market Structure
The adoption of MiFID in 2007 (MiFID II in 2018) has led to a proliferation of trading venues across the European equity market. One of the vital changes it brought into effect was abolishing the monopoly power of traditional exchanges in trading securities and opening trading to many types of platform, such as regulated markets (RMs), multilateral trading facilities (MTFs), and systematic internalisers (SIs). These platforms have different market structures and reporting systems that are defined under MiFID directives.²
In broad terms, RMs and MTFs operate in a similar fashion, provide an electronic platform for users to transact orders multilaterally. These trading venues generally match orders on a non-discretionary basis according to pre-defined rules that establish price and time priority for submitted orders. RMs and MTFs are required to publish pre-trade quotes and report details of executed trades to the market (CFA Institute 2011). Both RMs and MTFs are allowed to organize primary listing. However, RMs facilitate the listing of regulated instruments and MTFs do the same for the unregulated ones. A firm chooses on which RM to list, and once listed, MTFs may decide to organize trading for that firm as well. SIs are investment firms that internalise order flow to deal on their own account on an organised, frequent, and systematic basis. Trades executed through SIs are reported as the over-the-counter (OTC) trades.
The large RMs in the European equity market include the LSE Group (operator of the London Stock Exchange and Borsa Italiana), NYSE Euronext (which operates exchanges in France, Belgium, the Netherlands, Portugal, and the United Kingdom), and Deutsche Börse Group (operator of the Frankfurt Exchange and the Xetra trading system). The LSE runs electronic order books on which buy and sell orders are continuously matched from the open to the close according to the price-time priority rules. Automated trading sessions start at 8:00 and close at 16:30 local time. As a supply-side response, the LSE has been investing heavily in technology over the last two decades to meet the growing HFT demand for low-latency³ trading. The implementation of the Millennium trading platform has reduced its latency to 113 microseconds, from 600 milliseconds before the year 2000 (Linton and Mahmoodzadeh 2018). Besides RMs, the main MTFs are CHIX, BATS, and Turquoise. These venues are well equipped with modern low-latency technologies, and have become the main rivals of primary exchanges like the LSE.

Limit Order Book (LOB)
An understanding of the limit order book structure is fundamental for developing HFT proxies. Every arrival of new information updates the LOB; an update may be due to a new order, an order modification, an order cancellation, or a trade execution. The limit order and the market order are the two basic order types that investors submit according to their liquidity needs. Investors can design orders differently, according to their needs, while submitting them to an LOB. Execution of a limit order depends on finding a counterparty with the opposite need, so a limit order sometimes remains unexecuted. A market order is executed immediately if it finds an outstanding limit order on the other side of the LOB. All limit orders are arranged on both sides of the LOB according to their price and time priority. The lowest available price on the ask/offer side (sell limit order) at a particular point of time is known as the best offer price. All offer prices other than the best one are queued behind the best price according to their time and price priority. The highest available price on the bid side (buy limit order) at a particular point of time is known as the best bid price. All bid prices other than the highest one are also queued behind the best price according to their time and price priority. The best bid and offer prices, BBO, are the best available prices on both sides of a limit order book at a particular time. The construction of a hypothetical LOB is illustrated in Figure 2. The depth level of an LOB refers to the number of offer and bid prices in a queue. For example, 'n'th-level depth refers to the 'n'th best bid and ask prices. We refer the reader to Table A1 in Appendix A for a real-time LOB update up to 10 levels. HFT studies generally establish a link between the intensity of HFTs' participation in the limit order book and the frequency or speed at which quotes are updated there (Conrad et al. 2003; Hendershott et al. 2011).
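The ordering described above can be sketched in a few lines of Python. This is only an illustrative toy, not the exchange's matching engine: the class name `LimitOrderBook` and its methods are hypothetical, quantities are aggregated by price, and time priority within a price level is deliberately not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class LimitOrderBook:
    """Toy aggregated book (hypothetical): price -> total resting quantity."""
    bids: Dict[float, int] = field(default_factory=dict)  # buy limit orders
    asks: Dict[float, int] = field(default_factory=dict)  # sell limit orders

    def add_limit(self, side: str, price: float, qty: int) -> None:
        """Add resting quantity at a price on the 'bid' or 'ask' side."""
        book = self.bids if side == "bid" else self.asks
        book[price] = book.get(price, 0) + qty

    def levels(self, side: str, n: int) -> List[Tuple[float, int]]:
        """Return the n best (price, quantity) levels on one side."""
        if side == "bid":
            ordered = sorted(self.bids.items(), reverse=True)  # highest bid first
        else:
            ordered = sorted(self.asks.items())                # lowest ask first
        return ordered[:n]

    def bbo(self) -> Tuple[Optional[float], Optional[float]]:
        """Best bid and best offer, or None for an empty side."""
        best_bid = max(self.bids) if self.bids else None
        best_ask = min(self.asks) if self.asks else None
        return best_bid, best_ask


book = LimitOrderBook()
for price, qty in [(99.0, 500), (99.5, 200), (98.5, 300)]:
    book.add_limit("bid", price, qty)
for price, qty in [(100.0, 400), (100.5, 250), (101.0, 100)]:
    book.add_limit("ask", price, qty)

print(book.bbo())             # best bid and best offer
print(book.levels("bid", 2))  # the two best bid levels
```

In this toy book the second-level depth is the second entry returned by `levels`, which corresponds to the 'n'th best bid and ask prices for n = 2.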

High-Frequency Data
High-frequency data are the records of the real-time LOB updates. Every change in a LOB, also known as a 'tick', reflects the continuous update of the limit order book's records as a result of the arrival of new limit orders (buy and sell), revision or modification of existing limit orders, cancellation of existing limit orders, or the entry of new market orders. Ticks are sequentially collected and recorded with a proper timestamp to produce the highest-frequency data. A timestamp records the date and time at which a particular quote update occurs. The timestamp may show the time and date at which either the exchange or the broker-dealer released the quotes, the time at which the automated trading system received the quotes at the exchange, or the time at which the third-party distributor of the tick data stamped the record at its own facility after receiving the real-time data feeds from the exchange. Each record of high-frequency data comprises many fields, including the security identifier, exchange identifier, time region, transaction type, trade or quote volume, prices, qualifiers showing different market conditions, and many more. We refer the reader to Aldridge (2013) for a good coverage of high-frequency data.
The importance of high-frequency data in studying complex microstructure of modern financial markets is well established. Nowadays, market microstructure studies are using high-frequency data more frequently than ever before. Although high-frequency data have concentrated on the major financial markets, many recent studies on less developed or emerging markets report their high-frequency data properties (Markellos et al. 2003). Modern investors search for profitable opportunity across electronic markets through their intraday high-frequency trading windows (The Netherlands Authority for the Financial Markets 2016). Electronic exchanges are connected with each other through the traders. Kollias et al. (2013) draw the inference from relevant studies: "intraday data help reveal a more accurate picture of how markets and market agents react and adapt to changes and exogenous shocks...the more detailed account and the information contained in high frequency data allow more reliable inferences and conclusions to be drawn vis-à-vis daily data".

Relevant Literature
The HFT identification strategy is not only critical but a central issue in empirical studies. The empirical HFT literature is divided into many strands of which HFT is considered a subset. A researcher attempting an empirical HFT study has limited choices, and depends on either one or both of the following approaches for identification: (i) to use an exchange-provided HFT-flagged dataset (direct method); or (ii) to define a proxy which tracks the footprint of HFT (indirect method). Conrad et al. (2015) have pointed out the limitation of using exchange-identified HFT data. Generally, exchanges select samples according to their own criteria, and this may not be free from possible conflicts of interest among the users of HFT-flagged data, HFT firms, and trading platforms. Other than that, sample firms appear to be large and specialized in HFT, and they often operate on several exchanges across countries. Trades may also be distributed non-randomly across trading venues because of heterogeneity in liquidity, fee structure, etc. Therefore, sometimes, inferences drawn from these datasets may not reflect the true HFT behavior. In contrast, a well-defined proxy tracks the predominant market making nature of HFT through posting and renewing quotes.

HFT Studies Using Direct Method
HFT-flagged data internally mark every order originated by high-frequency traders (HFTs) with a distinct flag. Many studies have used a pre-identified dataset. The NASDAQ dataset is the famous flagged dataset that has been used in many HFT papers such as Carrion (2013). The other pre-identified datasets used in prior studies are mostly study-specific and diverse.  used the HFT-flagged FSA dataset measuring the HFT activities of particular HFT firms in UK equities from 5 November 2007 to 5 August 2011. Brogaard and Garriott (2018) used data from one of the alternative trading platforms in Canada with complete order-book information. The dataset provides the identification for direct-access members and enables the identification of HFT firms using inventory and trading behaviour. It also provides the sequence in which the members entered the market. Hendershott and Riordan (2013) used HFT-flagged data from the Deutsche Boerse. The dataset contains all orders generated from AT in 30 DAX stocks (the leading index of the German stock market) between 1 January and 18 January 2008, for a total of 13 trading days.

HFT Studies Using Indirect Method
Unflagged HFT data are trade and quote data supplied by the exchanges without pre-identification or a flag. Users need to develop some sort of proxy for identifying the HFT activities. Ideally, the definition of an HFT proxy targets one or more HFT identifying features, and the measure is developed accordingly. The seminal study of Hendershott et al. (2011) used the 'number of electronic messages per $100 trading volume' as its proxy, which is defined only on BBO-level information for the period 2001-2005. They targeted 'speed' as an identifying feature of HFT. There are also other features upon which many studies have relied. For example, Kirilenko et al. (2017) used the 'daily net position' with the idea that HFTs usually perform high-volume trades and carry low intraday and overnight positions. Hasbrouck and Saar (2013) developed the proxy 'Strategic Runs' of linked messages, which exploits the particular order sending and cancelling pattern of low-latency traders. Moreover, quote updates (Conrad et al. 2015), message-to-trade ratios (Friederich and Payne 2015; Frino et al. 2017), the duration or 'life time of orders' (Bouveret et al. 2014), and HFT strategies (Hagströmer and Nordén 2013), etc., are proxies which have been defined on one or more identifying features of HFT.

Data
The LSE is the largest RM in Europe by the number of listed large-capitalization stocks from the STOXX 800, a benchmark index comprising the 800 largest stocks by market capitalization in Europe. It is widely believed and demonstrated by previous studies that large-cap stocks are highly liquid, attract more HFT, and are extremely fragmented (AMF 2017; Boehmer et al. 2021; Gresse 2017; Hendershott et al. 2011). We analyse the list for the STOXX 800 at the end of 2016 to understand its composition (see Figure A1). It shows that the top 50% of the stocks in the list are from only three primary trading venues, the London Stock Exchange (LSE), Deutsche Boerse (Xetra), and Euronext Paris, of which LSE-listed stocks are more than 50%. It also shows the market share of both primary and alternative lit trading venues in European equity markets. Among the trading venues, CHIX, BATS, and Turquoise facilitate most of the lit trading besides the primary platforms. After a long trial, we finalize a sample of large-capitalization LSE stocks that are considerably liquid and cross-listed on MTFs during 2005-2016. Table 1 lists all securities included in the sample by their RICs (Reuters instrument codes) and available data points (number of days). The primary source of our data is Thomson Reuters Tick History (TRTH), a product of the Securities Industry Research Centre of Asia-Pacific (SIRCA), which is compiled from the Global Thomson Reuters exchange feeds. Two resilient London-based recording devices provide the millisecond timestamps to each recorded message. The primary analysis of the TRTH data structure reveals that time synchronizations of trades and respective quote messages are not uniform across trading venues.
TRTH provides better time synchronization between quotes and trades for the trading venues which are physically closer to the IDN Collection LAN in London (e.g., LSE, CHIX, BATS, Turquoise) than for those which are not (e.g., Deutsche Boerse (Xetra), Euronext Paris). This issue raises some real challenges in determining trade- and quote-based measures of transaction costs, which is particularly true for the effective spread. Considering the TRTH time synchronization issue, we narrow down the sample choices only to the UK-based LSE-listed stocks. To address the fragmented environment of these stocks appropriately, we select CHIX, BATS, and Turquoise as their alternative venue counterparts. These four trading venues facilitated around 99% of lit trading during the period 2014-2016 for the stocks that are primarily listed on the LSE. Trades and quotes data have been available from TRTH since 1996 for most primary trading venues, whereas data for alternative trading venues in the MiFID zone became available from mid-2008. Among the 220 initially selected stocks from the LSE, TRTH provides data support only for 204 stocks. At this point, a primary analysis shows that among those 204 securities, some are not suitable for further analysis due to reasons such as delisting, takeovers, mergers with other firms, liquidation at some point, or insufficient data coverage for all four trading venues for unknown reasons, and we finally reduce the sample to 149 stocks.
TRTH supplies intraday quotes and trades records through two main files: the Time and Sales (TS) and the Market Depth (MD). The time and sales file provides transaction records and the best quote updates, and the market depth file comes with the queue of bid and ask limit prices and respective quantities, as displayed in the limit order book. The records in market depth can be extracted up to the 25 best limit prices (based upon their availability), of which we limit our data to the 10 best prices since no records are found beyond that level. The full limit order book provides important information about the depth and spread beyond the BBO, which essentially affects the trading decisions of market participants. As far as we are aware, the sample we construct here should provide the longest coverage of data with the highest granularity used in any HFT research. We also rely on Thomson Reuters Datastream for some other data, such as control variables, which are not available from TRTH.

High-Frequency Trading
We define all HFT proxies that are available from the literature and supported by our datasets. Consequently, we cannot construct proxies that require a direct link between every limit order book update and an order ID. We can track intraday millisecond records of the trades and quotes data for every update in the LOB. Whenever any field within the 10 depth levels changes, we record the change. The updates are generally caused by any of the following reasons: (i) order execution, (ii) arrival of a new limit order, (iii) quote cancellation, and (iv) quote modification. We then aggregate the updates and divide them by the number of minutes in each daily automated trading session (8:00 to 16:30) to get the daily quote update speed per minute.
We use the method of Hendershott et al. (2011) to define the daily quote update speed per minute, known as the electronic message rate ($hft_{it}$). The electronic message rate measures the intensity of the LOB quote updates according to the HFT speed:

$$hft_{it} = \frac{qupdate_{it}}{T},$$

where $qupdate_{it}$ is the aggregate daily quote update up to the $k$ level depth of the LOB for stock $i$ on day $t$, and $T$ is the length of a daily trading session (in minutes). A recent regulatory study report (AMF 2017) shows that HFTs actively participate beyond the best limit prices. The average market shares of HFT in the BBO, the two best limit prices, and the three best limit prices are 70.8%, 77.3%, and 79.3%, respectively. We use the definition of the daily electronic message rate per minute to measure three alternative HFT proxies ($hft\_bbo_{it}$, $hft\_5bo_{it}$, $hft\_10bo_{it}$) based on information from three different market depth levels ($k$). $hft\_bbo_{it}$, $hft\_5bo_{it}$, and $hft\_10bo_{it}$ measure the quote updates up to the BBO, the 5 best bid and ask prices, and the 10 best bid and ask prices, respectively.
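As a rough illustration of the message rate at different depth levels, the sketch below counts quote updates from a sequence of LOB snapshots and divides by the session length. The snapshot layout, the names `count_updates` and `message_rate`, and the rule that an update is any change within the first k levels relative to the previous snapshot are assumptions made for this example; the paper's own counting procedure on TRTH records may differ.

```python
from typing import List, Sequence, Tuple

# Hypothetical snapshot layout: one tuple per depth level, best level first.
Level = Tuple[float, int, float, int]  # (bid price, bid size, ask price, ask size)
Snapshot = Sequence[Level]

SESSION_MINUTES = 510  # 8:00 to 16:30, the automated trading session


def count_updates(snapshots: List[Snapshot], k: int) -> int:
    """Count snapshots whose first k levels differ from the previous snapshot."""
    updates = 0
    for prev, curr in zip(snapshots, snapshots[1:]):
        if tuple(prev[:k]) != tuple(curr[:k]):
            updates += 1
    return updates


def message_rate(snapshots: List[Snapshot], k: int,
                 minutes: int = SESSION_MINUTES) -> float:
    """Daily quote updates up to depth k, per minute of the trading session."""
    return count_updates(snapshots, k) / minutes


# Three snapshots of a two-level book (toy numbers).
s0 = [(99.5, 200, 100.0, 400), (99.0, 500, 100.5, 250)]
s1 = [(99.5, 200, 100.0, 400), (99.0, 450, 100.5, 250)]  # change at level 2 only
s2 = [(99.5, 150, 100.0, 400), (99.0, 450, 100.5, 250)]  # change at level 1
day = [s0, s1, s2]

print(count_updates(day, k=1))  # updates visible at the BBO
print(count_updates(day, k=2))  # updates visible up to the second level
```

With these toy snapshots, the update that occurs only at the second level is invisible to a BBO-based count, which is the kind of omission a proxy restricted to the first level would suffer from.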
Our fourth alternative measure for HFT is Hendershott et al. (2011)'s proxy, in which the electronic message rate is adjusted for trading volume ($algo\_trad\_k_{it}$). We define

$$algo\_trad\_k_{it} = -\frac{value_{it}/100}{qupdate_{it}},$$

where $value_{it}$ is the value of the trading volume of stock $i$ on day $t$, and $qupdate_{it}$ is the aggregate daily quote update up to the $k$ level depth of the LOB for stock $i$ on day $t$. We also define two alternative measures of the same metric, $algo\_trad\_(10bo)_{it}$ and $algo\_trad\_(5bo)_{it}$, for the 10 best levels and 5 best levels of order book updates, respectively. Our last proxy is the order-to-trade ratio ($ord\_to\_trd_{it}$), defined as

$$ord\_to\_trd_{it} = \frac{qupdate_{it}}{ntrades_{it}},$$

where $qupdate_{it}$ is the aggregate daily quote update up to the 10-level depth of the LOB for stock $i$ on day $t$, and $ntrades_{it}$ is the number of executed trades of stock $i$ on day $t$. The post-MiFID period is prominently characterized by the proliferation of modern low-latency trading venues, through which markets have been experiencing a large influx of HFT investment. A substantial upward trend in message traffic during this period can be seen in Figure 3a. Table 2 shows the descriptive statistics of all HFT measures, where the 149 stocks are divided into 5 quintiles based on market capitalization. It shows that HFT intensity increased remarkably over the sample period (Figure 3). The average message traffic rate $hft\_10bo$ starts at 9 messages per minute in 2005 and rises to 176 messages in 2016. All other HFT proxies also show a similar rise in HFT activity during this period. It can also be seen that HFT activities are more intense in large stocks. The average message speed for the proxy developed on the 5 best bid and offer prices ($hft\_5bo$) is more than double that for the proxy constructed on the BBO ($hft\_bbo$). It can also be seen that the HFT intensity demonstrated in the measure $hft\_10bo$ is not as intense as that captured by $hft\_5bo$. To develop a sound empirical strategy, we incorporate market fragmentation in our setup.
The idea is that technology and regulation drive both HFT and fragmentation, and the two are closely related (Menkveld 2016; Upson and Van Ness 2017). The impact of order fragmentation on liquidity is well documented in many studies, e.g., Degryse et al. (2015); Gresse (2017); O'Hara and Ye (2011); Aitken et al. (2017), etc. We refer the reader to Mishra and Zhao (2021) for a more comprehensive discussion regarding the link between HFTs and fragmented equity markets.
We use the Herfindahl-Hirschman Index (HHI), the most commonly used definition of market concentration in the literature, as the proxy for order fragmentation across markets. We define the proxy for stock-level volume fragmentation, $HHItrd_{it}$, as

$$HHItrd_{it} = \frac{1}{\sum_{j=1}^{n} v_{ij}},$$

where $v_{ij}$ is the square of the trading volume share on venue $j$ among $n$, for security $i$ on day $t$. This is a normalized measure that ranges from 1 to $n$, where 1 stands for no fragmentation or full concentration in the primary venue, and $n$ for evenly distributed order flow across $n$ exchanges. Since our study considers one primary venue (LSE) and three alternative exchanges, the range is $1 \le HHItrd_{it} \le 4$. The trading volume of all stocks, on average, has fragmented across the venues over time. We find that the average $HHItrd$ has increased from 1 in 2005 to 2.89 in 2016 (Figure 4). It may be noted that the market was concentrated before 2007, and the proxy takes the value 1 accordingly.
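The stated range of the fragmentation proxy, from 1 (full concentration) to n (even split), can be checked with a short sketch. It assumes the proxy is the reciprocal of the sum of squared venue volume shares, which is the form consistent with that range; the function name `inverse_hhi` and the volume figures are hypothetical.

```python
from typing import Mapping


def inverse_hhi(volume_by_venue: Mapping[str, float]) -> float:
    """Reciprocal of the sum of squared volume shares across venues."""
    total = sum(volume_by_venue.values())
    if total <= 0:
        raise ValueError("total traded volume must be positive")
    squared_shares = [(v / total) ** 2 for v in volume_by_venue.values()]
    return 1.0 / sum(squared_shares)


# All volume on the primary venue: no fragmentation.
concentrated = {"LSE": 1000.0, "CHIX": 0.0, "BATS": 0.0, "Turquoise": 0.0}
# Volume split evenly across the four venues: maximum fragmentation.
even = {"LSE": 250.0, "CHIX": 250.0, "BATS": 250.0, "Turquoise": 250.0}

print(inverse_hhi(concentrated))  # lower bound of the range
print(inverse_hhi(even))          # upper bound for four venues
```

Any intermediate split, such as a dominant primary venue with some volume on the MTFs, falls strictly between the two bounds.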

Liquidity Measures
We take only the liquidity⁴ dimension of market quality measures. Market microstructure literature classifies liquidity into several dimensions, such as transaction costs, quantity, time, and so on. We use the quoted half-spread ($spread\_bps_{it}$) and the effective half-spread ($espread_{it}$) as measures of the transaction cost dimension of liquidity. To address the quantity dimension of liquidity, the quoted depth ($depth\_k_{it}$) is measured at and beyond the BBO.
For the $t$th intraday quote in stock $i$, the quoted half-spread in basis points ($spread\_bps_{it}$) is defined as

$$spread\_bps_{it} = 10^{4} \times \frac{ask_{it} - bid_{it}}{2\, mp_{it}},$$

where $ask_{it}$ is the best quoted ask price, $bid_{it}$ is the best quoted bid price, and $mp_{it}$ is the quote midpoint at the BBO, calculated as $(ask_{it} + bid_{it})/2$. Finally, we aggregate all intraday $spread\_bps_{it}$ into a time-weighted daily measure. For interpretation, the lower the quoted spread, the higher the liquidity and vice versa.
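A minimal sketch of the time-weighted daily aggregation follows. It assumes each quote stays in force until the next quote arrives (or until the session ends) and that the half-spread is expressed in basis points of the midpoint; the function names, the `(time, bid, ask)` tuple layout, and the numbers are illustrative only.

```python
from typing import List, Tuple

Quote = Tuple[float, float, float]  # (time in seconds, best bid, best ask)


def quoted_half_spread_bps(bid: float, ask: float) -> float:
    """Half of the quoted spread relative to the midpoint, in basis points."""
    mid = (ask + bid) / 2.0
    return 1e4 * (ask - bid) / (2.0 * mid)


def daily_time_weighted_spread(quotes: List[Quote], session_end: float) -> float:
    """Weight each quote's half-spread by the time it remains in force."""
    if not quotes:
        raise ValueError("no quotes supplied")
    # Pair every quote with the time of the next one; the last quote
    # is assumed to stay in force until the end of the session.
    next_times = [q[0] for q in quotes[1:]] + [session_end]
    weighted, total_time = 0.0, 0.0
    for (t, bid, ask), t_next in zip(quotes, next_times):
        duration = t_next - t
        weighted += quoted_half_spread_bps(bid, ask) * duration
        total_time += duration
    return weighted / total_time


quotes = [
    (0.0, 99.5, 100.5),     # 50 bps half-spread, in force for 60 seconds
    (60.0, 99.75, 100.25),  # 25 bps half-spread, in force for 180 seconds
]
print(daily_time_weighted_spread(quotes, session_end=240.0))
```

Because the narrower quote is in force three times as long, the daily figure sits closer to 25 bps than to 50 bps, which a simple average of the two quotes would not show.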
For the $t$th intraday trade in stock $i$, the effective half-spread in bps, $espread_{it}$, is

$$espread_{it} = 10^{4} \times \frac{d_{it}\,(p_{it} - mp_{it})}{mp_{it}},$$

where $d_{it}$ is an indicator variable that equals +1 if the $t$th intraday trade is a liquidity demander's buy and −1 if it is a liquidity demander's sell, $p_{it}$ is the trade price, and $mp_{it}$ is the quote midpoint prevailing at the time of the trade. Finally, all intraday effective spreads are weighted by their trade volume to convert them into a daily measure. For clarification, a narrower effective spread is associated with higher liquidity and vice versa. Generally, empirical studies use the readily available trade signing approaches found in the literature, e.g., Lee and Ready (1991). Signing trades with any of these methods comes at a high cost of inaccuracy because exchange platforms and data providers usually do not follow a uniform data synchronization system. In contrast to common practice, we develop unique algorithms that are capable of signing trades precisely in the TRTH European data structure. The algorithms match every trade price with the immediately prevailing quotes, both bid and ask, and define a trade as a liquidity demander's buy if it matches the quoted ask price, and as a liquidity demander's sell if it matches the quoted bid price.⁵

The average quoted depth can be decomposed into offer depth (the specified quantity that a liquidity supplier is willing to sell at the ask price) and bid depth (the specified quantity that a liquidity supplier is willing to buy at the bid price). The intraday quoted depth for stock $i$ at time $t$, $depth\_k_{it}$, is the average of the offer and bid quantities available at level $k$ of market depth. We define quoted depth for two values of $k$: the first, $depth\_bbo_{it}$, is based on the BBO-level market depth, and the other, $depth\_3bo_{it}$, is based on the first three best market depth levels. Finally, we aggregate all intraday depth into a daily measure by taking a simple average.
For interpretation, the larger the depth size, the higher the liquidity supply at depth level 'k'.
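The quote-matching rule and the volume-weighted daily effective half-spread described above can be sketched as follows. This is only an illustration under stated assumptions, not the algorithm actually applied to the TRTH data: the `Trade` record, the tolerance used for price matching, and the decision to skip trades that match neither quote are all hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Trade:
    price: float   # trade price p_it
    volume: float  # traded quantity, used as the daily weight
    bid: float     # best bid prevailing at the time of the trade
    ask: float     # best ask prevailing at the time of the trade


def sign_trade(trade, tol=1e-9):
    """Return +1 (buy) if the price matches the ask, -1 (sell) if it
    matches the bid, and 0 if it matches neither quote."""
    if math.isclose(trade.price, trade.ask, abs_tol=tol):
        return 1
    if math.isclose(trade.price, trade.bid, abs_tol=tol):
        return -1
    return 0


def effective_half_spread_bps(trade, direction):
    """d * (p - mp) / mp, expressed in basis points."""
    mid = (trade.bid + trade.ask) / 2.0
    return direction * (trade.price - mid) / mid * 1e4


def daily_espread_bps(trades):
    """Volume-weighted average of the intraday effective half-spreads."""
    weighted_sum = 0.0
    total_volume = 0.0
    for trade in trades:
        d = sign_trade(trade)
        if d == 0:
            continue  # assumption: trades matching neither quote are skipped
        weighted_sum += effective_half_spread_bps(trade, d) * trade.volume
        total_volume += trade.volume
    return weighted_sum / total_volume if total_volume > 0 else float("nan")


# Hypothetical trades: one at the ask, one at the bid, one at the midpoint.
trades = [
    Trade(price=100.05, volume=200, bid=99.95, ask=100.05),
    Trade(price=99.95, volume=100, bid=99.95, ask=100.05),
    Trade(price=100.00, volume=50, bid=99.95, ask=100.05),
]
daily = daily_espread_bps(trades)
print(round(daily, 4))
```

In this toy day the two signed trades each sit 5 bps from the midpoint, so the volume-weighted daily measure is also about 5 bps; the midpoint trade is left unsigned and does not enter the average.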
During the period 2005-2016, both quoted spreads and effective spreads reveal a substantial improvement of liquidity in the market (see Figure 5a). For instance, overall quoted spreads decreased by more than one-half, dropping to approx. 13.5 bps at the end of 2016 compared to approx. 30 bps in 2005. The effective spreads also decreased to approx. 4 bps compared to approx. 10.7 bps in 2005. Figure 5b shows that both depth measures of liquidity declined significantly over the years. It would seem surprising if one interprets this movement as liquidity degradation. We rather argue that the decline in depth might be attributable to the narrowed spread, as market makers have less incentive to offer the larger depth that usually accompanies a wider spread. The rising HFT intensity during the sample period is also linked to the decreasing trade sizes. HFT is generally performed in small lots, and it prominently uses a slice-and-dice strategy in executing a large order (Aitken et al. 2017; Gresse 2017; Hendershott et al. 2011).

One of the earliest HFT/AT proxies used in the market microstructure literature is algo_trad_it, which was originally developed and used by Hendershott et al. (2011). The proxy is defined as the negative of trading volume (in hundreds of dollars) divided by the number of messages. The idea of adjusting the message rate by the trading volume originated from the fact that the NYSE sample used in that study, for the period 2001-2005, was associated with an upward trend in trading volume. However, this adjustment makes the proxy difficult (or misleading) to interpret in some cases. Generally, the lower the ratio of volume to messages (i.e., the lower the absolute value of the proxy), the higher the HFT intensity. This interpretation does not hold, however, when the same ratio is used to compare HFT intensities across stocks or groups.
Let us imagine two different stocks, i and j, whose values of |algo_trad_it| are v_i and v_j, respectively. Let us imagine again that, for any one stock, the values of |algo_trad_it| at times t1 and t2 (with t1 earlier than t2) are v_t1 and v_t2, respectively. For showing a rise in the HFT intensity over time, the condition is v_t2 < v_t1. On the contrary, for the same comparison across stocks (let us say the intensity of HFT in i is higher than that in j), the condition should be v_i > v_j, i.e., the opposite. This difficulty of comparison makes the proxy less attractive in different scenarios. For example, we refer to Figure 3b to illustrate the interpretation issue. An intensively growing electronic message rate (a decreasing trend of the absolute value of the proxy) can be noted across all quintiles over the sample period, and the pattern confirms the condition we set, v_t2 < v_t1. Again, in the same figure, the quintile of large stocks, which shows the highest HFT intensity, lies at the bottom of the groups (the lowest value of the proxy, i.e., the highest absolute value), i.e., v_large > v_small. This confirms that comparing the proxy algo_trad_it both across stocks and over time is somewhat dubious.
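The opposite directions of the time-series and cross-sectional conditions can be seen in a small numerical sketch. All volumes and message counts below are hypothetical and chosen only to reproduce the pattern described above; the function simply applies the definition of the proxy (the negative of volume in hundreds of currency units per message).

```python
def algo_trad(volume, messages):
    """Negative of trading volume (in hundreds of currency units) per message."""
    return -(volume / 100.0) / messages


# Time series for one (small) stock: messages grow while volume stays flat.
small_t1 = algo_trad(volume=1_000_000, messages=2_000)
small_t2 = algo_trad(volume=1_000_000, messages=10_000)

# Cross-section at t2: the large stock has ten times more messages than the
# small one, but also far more volume per message.
large_t2 = algo_trad(volume=50_000_000, messages=100_000)

print(abs(small_t1), abs(small_t2), abs(large_t2))
```

Over time the absolute value for the small stock falls as its message traffic rises (v_t2 < v_t1), yet at t2 the more message-intensive large stock has the higher absolute value (v_large > v_small), so the same reading rule cannot be used for both comparisons.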
Another HFT proxy, the order-to-trade ratio (ord_to_trad_it), is a popular measure used by regulators in many countries, such as Canada, Australia, and Germany, for controlling HFTs. Studies such as Frino et al. (2017) and Friederich and Payne (2015) have used the order-to-trade ratio. Among others, Bouveret et al. (2014) evaluated its pros and cons as a proxy for HFT or AT activities, and concluded: "It is a useful metric to assess potential risks linked to trading system overload rather than a method to identify firms carrying out HFT activity". Apart from that, the issue of dubious comparison that we have discussed for algo_trad applies equally to this measure. We refer to Figure 3c for a similar comparison.
'Overnight inventory position' (Jovanovic and Menkveld 2015; Kirilenko et al. 2017), 'strategic runs' (Hasbrouck and Saar 2013), and 'order life time' (Bouveret et al. 2014) are some study-specific proxies whose development has enlarged the choice set of proxies. However, to the best of our knowledge, no studies have used them afterward. The idea of using the intraday and overnight inventory position as a proxy is motivated by the fact that HFTs reportedly carry very low intraday and overnight inventory positions, so that an HFT is distinguished from a non-HFT by comparing the day-closing inventory position levels. The difficult part of this strategy is that one must be able to identify and distinguish all traders using trader-class flags supplied in the dataset. Unfortunately, datasets of this sort are rarely available. The procedures described in Hasbrouck and Saar (2013) for constructing strategic runs are also not easy, and are nearly impossible to replicate when the required data are unavailable. The 'order life time' is defined as the time elapsed before the order is modified or cancelled. Generally, HFTs send orders with a shorter lifetime. To trace the order life time, an order id is an essential field in the HFT dataset. Nonetheless, datasets that are enriched with 'order id' but lack sufficient information on the depth of the limit order book are also insufficient for replicating the proxy.
We argue that, among the proxies referred to here, the electronic message rate (hft_it) is the best choice of HFT proxy. This is because the definition is simple, and the interpretation is quite straightforward. It also highlights the main technological aspect of the HFT revolution of the last two decades, i.e., speed. For this reason, many papers have used this definition and constructed the proxy on limit order book information. However, the concern is that most of the studies have relied only on the first-level market depth (BBO) in defining their proxies. The reasons for this might be that BBO-level data are widely available, less costly, and require less time and resource commitment for processing. Conversely, evidence from regulatory case studies like AMF (2017) shows that HFTs participate in quotes beyond the BBO. The AMF studied the activity of the leading HFTs on Euronext Paris, focusing on the order book liquidity they provide. It reports that the presence of the HFTs at the BBO, the two best price limits, and the three best price limits was 70.8%, 77.3%, and 79.3%, respectively, of the total liquidity provided in the market. We see that the participation of HFTs deeper in the limit order book is quite high, but we do not have evidence on the level up to which they participate.

Methods
To determine the level of HFT participation in the limit order book, we take three benchmark levels (BBO, the five best prices, and the ten best prices) and define three proxies: hft_bbo_it, hft_5bo_it, and hft_10bo_it. Using our empirical setup, we evaluate the power of the three alternative proxies in explaining the liquidity improvement over the last decade, which is mostly attributable to HFT (Boehmer et al. 2021; Hasbrouck and Saar 2013; Hendershott et al. 2011). We apply an ordinary multivariate regression model of the kind generally used in the market microstructure literature for determining the impact of market design changes or market variables on market quality. We use this setup for evaluating the impact of alternative HFT proxies that are defined on the information from different depth levels of the limit order book.
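As a rough illustration of how the three proxies differ only in the depth cut-off, the sketch below counts limit order book update messages per minute at or above a chosen depth level. It is not the paper's construction: the `(timestamp_ms, level)` record, where level 1 denotes the BBO, and the `session_minutes` argument are hypothetical, since the exact layout of the TRTH depth messages is not described here.

```python
def message_rate(updates, max_level, session_minutes):
    """Per-minute count of LOB update messages at depth levels 1..max_level.

    `updates` is an iterable of (timestamp_ms, level) pairs; level 1 is the BBO.
    """
    count = sum(1 for _, level in updates if 1 <= level <= max_level)
    return count / session_minutes


# Hypothetical two-minute window of quote updates at various depth levels.
updates = [(0, 1), (10, 1), (20, 3), (30, 7), (40, 10), (50, 5)]

hft_bbo = message_rate(updates, max_level=1, session_minutes=2)
hft_5bo = message_rate(updates, max_level=5, session_minutes=2)
hft_10bo = message_rate(updates, max_level=10, session_minutes=2)
print(hft_bbo, hft_5bo, hft_10bo)
```

Under this counting rule the proxies are nested by construction, so hft_bbo can never exceed hft_5bo, which in turn can never exceed hft_10bo; the empirical question is how much additional HFT-related traffic each deeper cut-off actually captures.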
A preliminary diagnosis of the relationships among the variables is made in Table 3. All coefficients in this Pearson's correlation matrix are significant at the 1% level against the null hypothesis H0: ρ = 0. All three HFT proxies are strongly correlated and inversely related with the spread-based liquidity measures but positively correlated with the depth-based measure. We also find that HFT and market fragmentation are positively correlated, and market fragmentation has the same relationship with the main response variables as we find for the HFT proxies.

Table 3. ... (2624 days). hft_10bo, hft_5bo, and hft_bbo represent the per minute electronic message rate for the 10 best, 5 best, and best bid and offer (BBO) depth levels in the limit order book, respectively; HHItrd is the Herfindahl-Hirschman index (HHI) showing the degree of market fragmentation; spread_bps is the time-weighted daily quoted spread in basis points; espread is the volume-weighted effective half-spread in basis points; depth_bbo is the average BBO-level quoted depth measured in GBP100; depth_3bo is the average cumulative depth up to the three best limit prices measured in GBP100; voltintra is the intraday mid-price range volatility measured in basis points; mktcap is the average market capitalization measured in million GBP; price is the daily average price level measured in GBX. All are daily measures constructed from the intraday millisecond records.

We take three consecutive specifications for defining our target model. We believe that this helps considerably in observing and explaining the impacts of the main variables on market liquidity. We include both stock (firm) fixed effects and time (day) fixed effects in all three specifications to address potential unobserved heterogeneity across firms and time periods.
To control for the effect of other variables on market liquidity, we also include market capitalization, intraday volatility, and price, with proper standardization, in the model, as they are found to be related to the HFT proxy, the variable of our main interest. The first specification is

$$liq_{it} = \alpha_i + \gamma_t + \beta_1\,HFT_{it} + \delta' X_{it} + \varepsilon_{it}, \qquad \text{(Model 1)}$$

where liq_it represents one of the daily (t) market liquidity measures (spread_bps_it, espread_it, depth_bbo_it, depth_3bo_it) for stock i; HFT_it represents one of the HFT proxies (hft_bbo_it, hft_5bo_it, or hft_10bo_it); the vector X_it includes three control variables, namely log normalized market capitalization (Log(mktcap)), log normalized intraday mid-price volatility (Log(voltintra)), and the inverse of the daily average price (invprice), which are commonly found to be liquidity determinants in the empirical market microstructure literature (Boehmer et al. 2021; Hendershott et al. 2011); α_i is the firm fixed effect; γ_t is the time fixed effect; δ is the coefficient vector on the controls; and ε_it is the error term. The expanded second specification, Model 2, includes the daily market fragmentation proxy, MFrag_it, alongside HFT_it, so that the impact of both HFT and market fragmentation can be assessed in the same model. We use the following specification:

$$liq_{it} = \alpha_i + \gamma_t + \beta_1\,HFT_{it} + \beta_2\,MFrag_{it} + \delta' X_{it} + \varepsilon_{it}, \qquad \text{(Model 2)}$$

where MFrag_it is the market fragmentation proxy measured by HHItrd.

Our final specification includes an additional interaction term between the HFT and market fragmentation proxies. The final specification is

$$liq_{it} = \alpha_i + \gamma_t + \beta_1\,HFT_{it} + \beta_2\,MFrag_{it} + \beta_3\,(HFT_{it} \times MFrag_{it}) + \delta' X_{it} + \varepsilon_{it}. \qquad \text{(Model 3)}$$
The idea of the interaction effect between HFT and market fragmentation arises from the fact that the level of fragmentation and HFT are likely to influence each other (Menkveld 2016).
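To make the structure of the final specification concrete, the following sketch estimates a regression with stock and day fixed effects and an HFT-fragmentation interaction on simulated data. It is a minimal illustration under stated assumptions, not the estimation code of this study: the data are randomly generated, the "true" coefficients are arbitrary, the fixed effects are removed by two-way demeaning (which is valid for a balanced panel), and no Newey-West standard errors are computed. All function and variable names are hypothetical.

```python
import numpy as np


def two_way_demean(a):
    """Remove stock (row) and day (column) means from an (N, T) balanced panel."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()


def fit_interaction_model(liq, hft, mfrag, controls):
    """OLS on demeaned data; returns [beta1, beta2, beta3, control coefficients...]."""
    regressors = [hft, mfrag, hft * mfrag] + list(controls)
    X = np.column_stack([two_way_demean(r).ravel() for r in regressors])
    y = two_way_demean(liq).ravel()
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Simulated balanced panel: N stocks observed over T days.
rng = np.random.default_rng(0)
N, T = 20, 200
hft = rng.normal(size=(N, T))
mfrag = rng.normal(size=(N, T))
ctrl = rng.normal(size=(N, T))
alpha = rng.normal(size=(N, 1))   # stock fixed effects
gamma = rng.normal(size=(1, T))   # day fixed effects
true_beta = np.array([-0.28, 0.14, -0.02, 0.5])  # arbitrary illustrative values
liq = (alpha + gamma
       + true_beta[0] * hft
       + true_beta[1] * mfrag
       + true_beta[2] * hft * mfrag
       + true_beta[3] * ctrl
       + 0.01 * rng.normal(size=(N, T)))

beta_hat = fit_interaction_model(liq, hft, mfrag, [ctrl])
print(np.round(beta_hat, 3))
```

The interaction regressor is formed first and demeaned afterwards, so the fixed effects are removed from the product itself rather than from its two factors separately.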
Our original sample forms an unbalanced panel of 149 stocks. To be on the safe side, and to avoid the probable econometric pitfalls related to the estimation of an unbalanced panel, we construct a balanced panel. The panel is reduced to 132 stocks and 2624 trading days (for the period December 2005-December 2016). All stocks other than those in italics in Table 1 are included in the balanced panel. Table 4, calculated from the balanced panel, shows the relevant descriptive statistics of the regression variables, which should be used as the average reference values for all regression estimates. All measures other than HHItrd and price are natural log transformed. We apply OLS as the estimation method, using the Newey-West HAC estimator for standard errors, a heteroscedasticity and autocorrelation consistent covariance matrix estimator (lags for autocorrelation are optimally determined).

Table 4. ... hft_10bo, hft_5bo, and hft_bbo represent the per minute quote update for the best 10, best 5, and best bid and offer (BBO) depth levels in the limit order book, respectively; HHItrd is the Herfindahl-Hirschman index (HHI) showing the degree of market fragmentation; spread_bps is the time-weighted daily quoted spread in basis points; espread is the volume-weighted effective half-spread in basis points; depth_bbo is the average BBO-level quoted depth measured in GBP100; depth_3bo is the average cumulative depth up to the three best limit prices measured in GBP100; voltintra is the intraday mid-price range volatility measured in basis points; mktcap is the average market capitalization measured in million GBP; price is the daily average price level measured in GBX. All are daily measures constructed from the intraday millisecond records.
Tables 5 and 6 report the regression results of the three models where we take the electronic message rate (hft_it) constructed on the first 10 levels of the LOB (hft_10bo_it) as the main variable of interest (independent variable) and one of the four liquidity measures (spread_bps_it, espread_it, depth_bbo_it, and depth_3bo_it) as the response variable. All variables are natural log transformed except HHItrd and the inverse of the price (invprice). All reported coefficients in Tables 5 and 6 are significant at the 1% level except for the dependent variable depth_3bo. The coefficients of Log(hft_10bo) and HHItrd measure the association of HFT intensity and market fragmentation with liquidity, respectively. We observe from the coefficient of hft_10bo_it that higher HFT intensity is associated with lower quoted and effective spreads and with lower depth (in both depth_bbo_it and depth_3bo_it). The coefficients of MFrag_it show that higher fragmentation is associated with higher quoted and effective spreads, lower BBO depth, and higher depth at the 3 best prices of the LOB (the last of these is not significant). The estimate of −0.278 for Log(hft_10bo_it) (column I, Table 5) means that, ceteris paribus, a 1% increase in HFT is associated with a 0.278% decrease in the quoted spread. For instance, a one-standard-deviation increase in HFT from its sample mean of 114.88 messages per minute to 284.01 (≈147.2%) would narrow the quoted spreads by approx. 41% (147.2 × 0.278), i.e., the sample mean of quoted spreads would fall from 18.37 bps to 10.85 bps (for descriptive statistics, see Table 4). HHItrd is not log transformed, so the estimate of 0.049 (column II) against the log transformed quoted spreads means that, ceteris paribus, a unit increase in HHItrd, for example, from the sample mean of 2.17 to 3.17, is associated with a 4.9% increase in quoted spreads.

Table 5.
Effects of HFT (electronic message traffic rate/minute for the 10 best levels) on market liquidity as measured by relative quoted half-spreads and effective half-spreads. The table reports the panel regression estimates for Models 1-3 where the first two liquidity measures (spread_bps and espread) are regressed on the HFT (Log(hft_10bo)) and market fragmentation (HHItrd) proxies. hft_10bo represents the per minute daily quote update for the best 10 depth levels in the limit order book. HHItrd is the Herfindahl-Hirschman index (HHI), showing the degree of market fragmentation. The liquidity measures are the time-weighted quoted spread (spread_bps) and the volume-weighted effective half-spread (espread). The dependent variables, spread_bps and espread, are natural log transformed; all spread-based measures are in basis points. Control variables are natural log transformed market capitalization (Log(mktcap)), natural log normalized intraday mid-price volatility (Log(voltintra)), and the inverse of the average daily price level (invprice). The regression is based on a balanced panel of 132 stocks and 2624 days (December 2005-December 2016) and has both time (daily) and stock fixed effects. Coefficient estimates are OLS; t-statistics, shown in parentheses below the coefficients, are calculated using Newey-West (HAC) standard errors, a heteroscedasticity and autocorrelation consistent covariance matrix estimator (lags are optimally determined). *** denotes significance at the 1% level.

We turn now to the interpretation of the interaction term, Log(hft_10bo) × HHItrd, in column III. The interaction term captures the partial effect. In Model 3, the partial effect of hft_10bo_it on liquidity depends on the level of HHItrd, and vice versa. To explain, we define here a general expression for the partial effect. The partial effects of HFT_it and MFrag_it on liq_it are defined as ∆liq_it/∆HFT_it = β1 + β3 × MFrag_it and ∆liq_it/∆MFrag_it = β2 + β3 × HFT_it, respectively.
For interpreting the partial effects, the above two expressions need to be evaluated at some values of interest, generally the mean. The estimated coefficient on the interaction term between HFT and market fragmentation for the response variable espread_it is −0.019 (Table 5, column VI), and the respective estimates for the partial effects of HFT (∆espread/∆Log(hft_10bo)) and market fragmentation (∆espread/∆HHItrd) are (−0.279 − 0.019 × 2.17 ≈) −0.32 and (0.139 − 0.019 × Log(114.88) ≈) 0.049, respectively 6. These estimates are close to the estimated coefficients on the same variables where the interaction effects are not introduced (column V). We do not report joint significance tests for the partial effects, which are trivial as both the interaction and the main effects are significant. The interaction effect confirms the claim that HFT and market fragmentation are related to each other (Menkveld 2016). The interaction effect implies that some of the possible benefits of HFT intensity for market liquidity are offset by the extra cost of market making incurred in fragmented markets. Conversely, some of the extra cost of market fragmentation is also offset by the benefits derived from HFT. It seems that fragmented markets would have less liquidity if there were no HFT.
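The arithmetic behind the one-standard-deviation illustration and the partial effects evaluated at the sample means can be retraced in a few lines. The coefficients and means below are the ones quoted in the text; the variable names are hypothetical, and Log is taken to be the natural logarithm, consistent with the variable definitions.

```python
import math

# One-standard-deviation illustration (Model 1, quoted spread).
beta_hft = -0.278                      # Log(hft_10bo) coefficient, column I, Table 5
mean_hft, hft_plus_sd = 114.88, 284.01
mean_spread = 18.37                    # sample mean quoted spread in bps

pct_hft = (hft_plus_sd - mean_hft) / mean_hft * 100.0     # about 147.2 (%)
pct_spread = beta_hft * pct_hft                            # about -40.9 (%)
new_spread = mean_spread * (1.0 + pct_spread / 100.0)      # about 10.85 bps

# Constant-elasticity version of the same change, shown for comparison only.
pct_spread_exact = ((hft_plus_sd / mean_hft) ** beta_hft - 1.0) * 100.0

# Partial effects in Model 3 (effective spread), evaluated at the sample means.
b1, b2, b3 = -0.279, 0.139, -0.019
mean_hhi = 2.17
pe_hft = b1 + b3 * mean_hhi                 # about -0.32
pe_frag = b2 + b3 * math.log(mean_hft)      # about 0.049

print(round(pct_hft, 1), round(new_spread, 2), round(pe_hft, 3), round(pe_frag, 3))
```

Since an elasticity describes small proportional changes, applying it linearly to a change of roughly 147% is an approximation; the constant-elasticity line in the sketch implies a smaller proportional fall in spreads for the same change in HFT.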
The impacts of hft_10bo and HHItrd on the average quoted depth (depth_bbo and depth_3bo) are negative, which implies that both HFT and fragmentation are associated with less average quoted depth (see Table 6). The positive sign of the coefficient of HHItrd in column III implies that more fragmentation is associated with more quoted depth at the deeper levels of a limit order book; however, the coefficient for depth_3bo is not significant. One might argue that the market liquidity depleted through the quoted depth is likely to outweigh the liquidity added through the narrower quoted and effective spreads. To address this doubt, we refer to the calibration exercises of Hendershott et al. (2011), who concluded that the depth reduction is small relative to the narrowing of the spread.
The coefficients of the control variables, in both Tables 5 and 6, have the expected signs and are significant at the 1% level. Stocks with large market capitalization are associated with lower quoted and effective spreads, lower price impacts, greater depth, and higher realized spreads. The inverse price coefficient implies that a stock with a higher price is associated with lower quoted and effective spreads, higher depth, and lower realized spreads and price impacts. The positive estimate of the volatility coefficient implies that higher intraday volatility increases quoted and effective spreads and provides greater depth at the BBO; it is also associated with higher price impact and lower realized spread.

Table 6. Effect of HFT (electronic message rate/minute for the 10 best levels) on market liquidity as measured by the market depth at BBO and the cumulative market depth for the 3 best depth levels. The table reports the panel regression results of Models 1-3 where two depth-based liquidity measures (depth_bbo and depth_3bo) are regressed on the HFT (hft_10bo) and market fragmentation (HHItrd) proxies. hft_10bo represents the per minute daily quote update in the 10 best depth levels of the limit order book. HHItrd is the Herfindahl-Hirschman index (HHI), showing the degree of market fragmentation. The liquidity measures (response variables) are the average quoted depth at the best limit price (depth_bbo) and the accumulated average quoted depth up to the best three limit prices (depth_3bo). All explanatory variables are natural log transformed, and depth measures are in GBP100. Control variables are log market capitalization (Log(mktcap)), natural log normalized intraday mid-price volatility (Log(voltintra)), and the inverse of the average daily price level (invprice). The regression is based on a balanced panel of 132 stocks and 2624 days (December 2005-December 2016) and has both time (day) and stock (firm) fixed effects.
Coefficient estimates are OLS; t-statistics, shown in parentheses below the coefficients, are calculated using Newey-West (HAC) standard errors, a heteroscedasticity and autocorrelation consistent covariance matrix estimator (lags are optimally determined). *** denotes significance at the 1% level.

Tables 7 and 8 report the regression results for the HFT proxy defined on the five best prices of the limit order book. All the coefficient estimates in Table 7 for the three models are significant and larger in magnitude than those estimated for the proxy hft_10bo in Table 5. Interestingly, the HFT proxy based on the 5 best LOB prices shows a stronger impact on market liquidity than the one defined on the 10 best prices. On the contrary, the regression coefficients for all models run on the depth-based liquidity measures (Table 8) are weaker than those in Table 6. The estimated coefficients for market fragmentation and for the interaction between HFT and market fragmentation are not found to be significant for the depth beyond the BBO, but those for HFT are significant and stronger for the depth beyond the BBO. Our evidence agrees with the evidence provided in AMF (2017).

Table 7. Effect of HFT (electronic message traffic rate/minute for the 5 best levels) on market liquidity as measured by quoted spreads and effective spreads. The table presents the panel regression results of Models 1-3 where the first two liquidity measures (spread_bps and espread) are regressed on the HFT (Log(hft_5bo)) and market fragmentation (HHItrd) proxies. hft_5bo represents the per minute daily quote update for the best 5 depth levels in the limit order book. HHItrd is the Herfindahl-Hirschman index (HHI), showing the degree of market fragmentation. The liquidity measures are the time-weighted quoted spread (spread_bps) and the volume-weighted effective half-spread (espread). The dependent variables, spread_bps and espread, are natural log transformed; all spread-based measures are in basis points.
Control variables are natural log transformed market capitalization (Log(mktcap)), natural log normalized intraday mid-price volatility (Log(voltintra)), and the inverse of the average daily price level (invprice). The regression is based on a balanced panel of 132 stocks and 2624 days (December 2005-December 2016) and has both time (daily) and stock fixed effects. Coefficient estimates are OLS; t-statistics, shown in parentheses below the coefficients, are calculated using Newey-West (HAC) standard errors, a heteroscedasticity and autocorrelation consistent covariance matrix estimator (lags are optimally determined). *** denotes significance at the 1% level.

Table 8. Effect of HFT (electronic message traffic rate/minute for the 5 best levels) on market liquidity as measured by the market depth at BBO and the cumulative market depth for the 3 best depth levels. The table presents the panel regression results of Models 1-3 where two depth-based liquidity measures (depth_bbo and depth_3bo) are regressed on the HFT (hft_5bo) and market fragmentation (HHItrd) proxies. hft_5bo represents the per minute quote update for the best 5 depth levels in the limit order book. HHItrd is the Herfindahl-Hirschman index (HHI), showing the degree of market fragmentation. The liquidity measures are the average quoted depth at the best limit price (depth_bbo) and the accumulated average quoted depth up to the best three limit prices (depth_3bo). All dependent variables are log transformed, and depth measures are in GBP100. Control variables are log market capitalization (Log(mktcap)), log normalized intraday mid-price volatility (Log(voltintra)), and the inverse of the average daily price level (invprice). The regression is based on a balanced panel of 132 stocks and 2624 days (December 2005-December 2016) and has both time (daily) and stock fixed effects.
Coefficient estimates are OLS, t-statistics shown in the parentheses below the coefficient, calculated using Newey-West (HAC) for standard errors, a heteroscedasticity and autocorrelation consistent covariance matrix estimator (lags are optimally determined). *** and * denote significance at 1% and 10% levels, respectively.

HFT Proxy: The First Best Price (BBO) of the LOB
For the last HFT proxy, hft_bbo, Tables 9 and 10 report all the estimated regression coefficients. All estimated coefficients for the first two spread-based liquidity measures (Table 9) are significant (except one in column III) and show the weakest association with HFT and market fragmentation among all the estimates reported in Tables 5-10. The impact of market fragmentation on the depth-based liquidity measures is not established if we rely on the HFT proxy built on BBO-level information. The results go against the evidence from one of the important studies on fragmentation in the European market, Gresse (2017). The results obtained by employing the HFT proxy defined on BBO-level information have turned out to be the weakest and most fragile.

Which Depth Levels of LOB Should We Rely on?
At this point, we would like to narrow down our choice among the three specifications. The analyses we have made so far make it clear that the third specification (Model 3) provides the most reliable estimates. We would also like to single out one market liquidity measure so that conclusions can be drawn more sharply. Here, our choice is the effective spread, for the following reasons. Firstly, the results demonstrated so far are similar for both quoted spreads and effective spreads but more robust for the latter as a dependent variable. Secondly, the effective spread is considered a theoretically sounder measure of liquidity. We find that, on average, effective spreads are lower than quoted spreads (see Table 4). For understanding the causes of lower effective spreads, the remark of Petersen and Fialkowski (1994) is significant: "when trades are executed inside the posted bid-ask spread, the posted spread is no longer an accurate measure of transaction costs faced by investor". The same study also shows that the effective spread averages half the posted spread. This phenomenon can be generalized and persists in every market, including emerging ones (Ahn et al. 2018). Hagströmer (2021) confirms the diverse application of effective spreads as a measure of trading cost and as a benchmark measure for many regulators. We summarize next the regression results for all HFT proxies regressed on effective spreads.

Table 11 summarizes the results of Model 3 where liquidity (effective spreads) is regressed on the alternative HFT proxies, i.e., the electronic message rate defined on the three different levels of LOB information. It is clear that the strongest impact of HFT, market fragmentation, and their interaction on liquidity is obtained for the electronic message rate calculated on the 5 best prices of the LOB (hft_5bo).
It appears that neither hft_10bo (the electronic message rate calculated on the 10 best prices) nor hft_bbo (the electronic message rate calculated on the first best prices) explains as much of the variation in the liquidity measures as hft_5bo (the electronic message rate calculated on the five best prices). The evidence supports the finding that HFTs participate beyond the BBO, as reported in AMF (2017), and confirms that HFTs provide both tight (marketable) and wider (non-marketable) quotes. Our study also extends the evidence provided in AMF (2017) by showing that HFTs' participation at the 10 best prices, i.e., too far from the BBO, is not significant.

Table 9. Effect of HFT (electronic message traffic rate/minute at BBO) on market liquidity as measured by relative quoted spreads and effective half-spreads. The table presents the panel regression results of Models 1-3 where the first two liquidity measures (spread_bps and espread) are regressed on the HFT (Log(hft_bbo)) and market fragmentation (HHItrd) proxies. hft_bbo represents the per minute quote update for the BBO depth level in the limit order book. HHItrd is the Herfindahl-Hirschman index (HHI), showing the degree of market fragmentation. The liquidity measures are the time-weighted quoted spread (spread_bps) and the volume-weighted effective half-spread (espread). The dependent variables, spread_bps and espread, are log transformed; all spread-based measures are in basis points. Control variables are natural log transformed market capitalization (Log(mktcap)), natural log normalized intraday mid-price volatility (Log(voltintra)), and the inverse of the average daily price level (invprice). The regression is based on a balanced panel of 132 stocks and 2624 days (December 2005-December 2016) and has both time (daily) and stock fixed effects.
Coefficient estimates are OLS; t-statistics, shown in parentheses below the coefficients, are calculated using Newey-West (HAC) standard errors, a heteroscedasticity and autocorrelation consistent covariance matrix estimator (lags are optimally determined). *** denotes significance at the 1% level.

Table 10. ... proxy. hft_bbo represents the per minute daily quote update for the BBO in the limit order book. HHItrd is the Herfindahl-Hirschman index (HHI), which shows the degree of market fragmentation. The liquidity measures are the average quoted depth at the best limit price (depth_bbo) and the accumulated average quoted depth up to the best three limit prices (depth_3bo). All dependent variables are natural log transformed, and depth measures are in GBP100. Control variables are natural log market capitalization (Log(mktcap)), natural log normalized intraday mid-price volatility (Log(voltintra)), and the inverse of the average daily price level (invprice). The regression is based on a balanced panel of 132 stocks and 2624 days (December 2005-December 2016) and has both time (daily) and stock fixed effects. Coefficient estimates are OLS; t-statistics, shown in parentheses below the coefficients, are calculated using Newey-West (HAC) standard errors, a heteroscedasticity and autocorrelation consistent covariance matrix estimator (lags are optimally determined). ***, **, * denote significance at the 1%, 5%, and 10% levels, respectively.

We would like to explain our results more rigorously. The evidence we provide for the European market in Section 4.2.3, along with recent studies in the US (e.g., Hasbrouck and Saar 2013; Hendershott et al. 2011) and in international markets (e.g., Boehmer et al. 2021), shows a sharp decline in transaction costs over the past years. The same studies have attributed this decline in transaction costs mostly to the simultaneous rise of high-frequency traders in global marketplaces.
In the absence of directly flagged and reliable high-frequency data, recent studies have deliberately used unflagged LOB data for measuring HFT proxies. We find it worth mentioning again that case studies specific to HFT participation at different levels of market depth (e.g., AMF 2017; Hendershott and Riordan 2013), using exchange-supplied flagged data, document that HFTs use both marketable (HFTs' share is 52% to 70.8%) and nonmarketable/passive (HFTs' share is 64% to 79.3%) quotes and trades when participating in the LOB. A marketable quote is one which finds a counterparty and gets executed immediately. The first level of market depth, or the BBO, represents the marketable quotes in a LOB. A passive or nonmarketable quote represents all the orders queued behind the BBO. We have already discussed (see Section 4.2.4) the strength of the electronic message rate (hft_it) over other measures as a candidate HFT proxy. Now, the issue which remains to be settled is the level of limit order book information that should be used for constructing HFT proxies so that they capture most of HFTs' participation in the LOB. Otherwise, one runs either the risk of missing important HFT footprints, which makes the proxy biased or underidentified, or the burden of handling voluminous data, which eventually affects both the financial and non-financial resource commitments of a study.
Prior HFT studies using proprietary LOB data have predominantly used the first level of market depth to construct their proxies. The evidence from the related studies indicates that defining the electronic message rate only on BBO-level information does not give a full picture of HFT and raises concerns of under- or over-identification. As we reported earlier, among the proxies defined on the three chosen levels of market depth, hft_bbo shows the weakest association with HFT and market fragmentation (see Tables 5-10), which is consistent with our concern about under-identification. If our conjecture is valid, it implies that hft_bbo consistently underestimates the coefficient for the liquidity variable (effective spreads), and the underestimation is on average about 10% (see note 7) and 9% compared with the coefficients for hft_5bo and hft_10bo, respectively.
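As a hedged illustration of how the figure of roughly 10% arises, write β_k for the magnitude of the effective-spread coefficient estimated with proxy hft_k (this notation is ours and is introduced only for the example). Assuming the two values quoted in note 7 are the Table 11 coefficients for hft_5bo (0.282) and hft_bbo (0.254), the relative underestimation is:

```latex
\[
\frac{\beta_{5bo} - \beta_{bbo}}{\beta_{5bo}}
  = \frac{0.282 - 0.254}{0.282}
  = \frac{0.028}{0.282}
  \approx 0.099 \approx 10\%
\]
```

The 9% figure relative to hft_10bo would follow from the same ratio with the hft_10bo coefficient in the place of β_5bo; that coefficient is not quoted in this section.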
Why is the use of BBO-level data so widespread in HFT research? Two aspects need to be addressed in answering this question. First, the supply of HFT data is quite limited with regard to both time and instrument coverage. Second, data management remains a challenge because of the volume of data. The more data fields a dataset contains, the higher the data management burden, since most HFT data are recorded at millisecond or microsecond granularity. Even a small file typically contains millions of records. As a result, LOB data beyond the BBO are not frequently used in HFT research. After the seminal study of Hendershott et al. (2011), many papers have followed the same path, and there are reasonable grounds to believe that the limitations associated with HFT data have delayed the valid question of up to which LOB level the data must be used for the proxy to become a valid measure of HFT.
Why is the use of the 10 best levels of data counterintuitive? The arguments and evidence we have provided so far may give the impression that the more information from the limit order book we incorporate, the better the HFT measure becomes, but this is not correct. HFTs participate in nonmarketable quotes, which are queued behind the BBO. The fundamental question is: how passive are these quotes? Since HFTs are fast, they can minimize adverse selection costs through their market monitoring technology. For this reason, it is not intuitive to expect HFTs to post nonmarketable quotes that are too passive. In addition, handling 10-level market depth data is quite costly in terms of computing resources and data subscription costs. The estimates from the empirical market microstructure models (see Tables 5-10) for the liquidity variables against the proxy hft_10bo are weaker than those for hft_5bo. The results support our conjecture that using too much information to construct an HFT proxy increases the implied costs without any benefit, which we refer to as welfare damaging.
The evidence we provide here has a profound impact on the selection of the LOB best prices for constructing HFT proxies. Our results indicate that the electronic message rate (hft_it) defined on the five best prices provides a better estimate than the one defined only on the BBO prices. They also suggest that taking best prices beyond a certain level of the LOB (e.g., the five best prices) to define the electronic message rate (hft_it) is counterintuitive. We do not claim that the electronic message rate (hft_it) defined on the five best prices of the limit order book is optimal. The methodology we adopt in this paper is an exercise in which we take three choices from an order book, the two edges of the LOB (hft_bbo and hft_10bo) and a midpoint (hft_5bo), to show how the regression estimates of generally established market microstructure models change with different uses of LOB data. We could have extended this calibration to more LOB points, but our aim in this paper is to raise the issue and put it forward as an agenda for future research. We admit that the scope of this paper does not allow us to address the issue fully, and we do not have access to both flagged and unflagged HFT data, with which we could have reached a definitive answer.

Conclusions
HFT research has been trending in the last decade. In the absence of a uniform HFT identification strategy, the evidence from empirical HFT studies is somewhat conflicting and inconclusive. In this paper, we evaluate HFT proxies that have been employed in many prior studies. The aim is to check with evidence whether they all have the same HFT identifying qualities. Our study has benefited from rich millisecond time-stamped datasets supplied by TRTH, with data for 149 LSE-listed large-capitalization stocks for the period 2005 to 2016. The datasets enable us to construct many HFT proxies with alternative definitions, different measures of liquidity, and a proxy for market fragmentation.
Our study confirms that the post-MiFID era of the European equity market, particularly for LSE stocks, has seen intense HFT activity along with high fragmentation of trading volume across markets. The joint forces of HFT and market fragmentation have significantly improved liquidity in the European market. However, gauging HFT through proxies remains an issue.
Our analyses show that some HFT proxies, such as algo_trad, used in Hendershott et al. (2011), and the order-to-trade ratio (order_to_trade), may not be applicable in all market settings, and their interpretation also requires proper attention. The analyses reveal that their cross-sectional and time-series comparisons do not rest on the same mathematical arguments and are somewhat dubious. Existing studies, namely Bouveret et al. (2014), have also raised concerns over using the order-to-trade ratio as an HFT proxy. Apart from that, the construction of some other HFT proxies, such as 'lifetime of order' (Bouveret et al. 2014), 'overnight inventory position' (Kirilenko et al. 2017), or 'strategic runs' (Hasbrouck and Saar 2013), is normally not possible because of the limited availability of data fields in an HFT dataset.
Popular HFT proxies generally constructed from trade and quote data are also not uniformly defined but mostly rely on the first best prices of the limit order book. Conrad et al. (2015) have used only BBO-level information to define their proxies, whereas our results, along with the evidence provided in studies such as AMF (2017), support the view that HFTs participate beyond the BBO and provide both tight (marketable) and wider (non-marketable) quotes. The results also show that the electronic message rate defined on BBO-level information provides the weakest estimate among the three alternative depth levels used in this study. Our results also suggest that taking best prices beyond a certain level of the LOB (e.g., the 5 best prices) to define the HFT proxy is counterintuitive. We find that the electronic message rate (hft_it) defined on the 5 best prices of the limit order book provides the largest variations in regressions for explaining the European market quality changes for the period 2005-2016.
The evidence we document in this paper supports the idea that HFT provides liquidity at deeper levels than the best bid and offer price in the limit order book. However, HFTs do not participate in quoting at the deepest levels of a limit order book (e.g., 10 levels of market depth). The HFT proxy defined on updates to the best five levels of the limit order book explains the market liquidity changes most.
HFTs are diverse in their use of trading strategies (Biais and Foucault 2014; Hagströmer and Nordén 2013). The HFT proxies we finally retained in this paper are likely to capture the relative dominance of a subset of HFT, which we believe to be 'market making'. The evidence we provide here has important implications for empirical HFT studies that are not supported by HFT-flagged datasets. We do not claim that the electronic message rate (hft_it) defined on the 5 best prices of the limit order book is optimal, since we take only two extremes and one midpoint of the limit order book information for the analyses. We believe that a similar study designed on additional informative points of the limit order book may produce more precise recommendations. Our paper provides a general guideline for developing HFT proxies such as the electronic message rate (hft_it): one should neither rely only on BBO-level information nor extract too much information from the limit order book, because every additional piece of information requires a greater commitment of time and financial resources.
Funding: The idea for this paper and some of the analyses draw upon Chapter 2 of my Ph.D. thesis at University of Naples Federico II. I thank Marco Pagano for insightful comments and suggestions. Financial support from the University of Naples Federico II is gratefully acknowledged.

Data Availability Statement:
Restrictions apply to the availability of these data. Raw data were obtained from Thomson Reuters Tick History (TRTH), owned by the Securities Industry Research Centre of Asia-Pacific (SIRCA), through the European Capital Markets Cooperative Research Centre (ECMCRC), and are available from the authors with the permission of the European Capital Markets Cooperative Research Centre and SIRCA.

Conflicts of Interest:
The author declares no conflict of interest.
then provide a seller-initiated trade flag or a buyer-initiated flag when they find a match with ask. If the algorithms do not find a match with the immediate quotes, then they look for a match to the one before the immediate one, and so on. In contrast, a traditional trade signing approach compares changes in trade price with the changes in mid price to ascertain whether an executed trade is buyer- or seller-initiated, and does not seem to fit a dynamic low-latency environment where quote update speed is very high and the time synchronization between trades and quotes updates is not quite orderly. The algorithms used in this study can assign a trade sign with accuracies reaching over 99%.
6 The general expressions for the partial effect are evaluated at the sample means of HFT (96) and market fragmentation (2.17), as reported in Table 4.
7 (0.282 − 0.254)/0.282 ≈ 0.10, see Table 11.
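The quote-matching rule described in the trade-signing note above (match a trade to the most recent quote, then step back through earlier quotes) can be sketched as follows. This is a minimal illustration, not the algorithm used in the study: the function name `sign_trade`, the `tol` and `max_lookback` parameters, and the assumption that a match with the bid yields the seller-initiated flag are ours.

```python
def sign_trade(trade_price, quotes, tol=1e-9, max_lookback=None):
    """Sign a trade by matching its price against recent quotes.

    quotes: list of (bid, ask) pairs at or before the trade, oldest first.
    Returns +1 (buyer-initiated), -1 (seller-initiated), or 0 (no match).
    tol and max_lookback are hypothetical knobs added for the illustration.
    """
    candidates = list(reversed(quotes))  # most recent quote first
    if max_lookback is not None:
        candidates = candidates[:max_lookback]
    for bid, ask in candidates:
        if abs(trade_price - ask) <= tol:
            return 1   # executed at the ask: buyer-initiated
        if abs(trade_price - bid) <= tol:
            return -1  # executed at the bid: seller-initiated (assumed)
    return 0


# Two made-up quotes, oldest first.
quotes = [(99.9, 100.1), (100.0, 100.2)]
print(sign_trade(100.2, quotes))   # matches the most recent ask
print(sign_trade(99.9, quotes))    # matches the earlier bid
print(sign_trade(100.05, quotes))  # matches nothing
```

Stepping back through earlier quotes, rather than comparing against a single prevailing mid price, is what lets such a rule tolerate imperfect time synchronization between trade and quote updates.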