Heterogeneous Criticality in High Frequency Finance: A Phase Transition in Flash Crashes

Flash crashes in financial markets have become increasingly important, attracting attention from financial regulators, market makers as well as from the media and the broader audience. Systemic risk and the propagation of shocks in financial markets is also a topic of great relevance that has attracted increasing attention in recent years. In the present work, we bridge the gap between these two topics with an in-depth investigation of the systemic risk structure of co-crashes in high frequency trading. We find that large co-crashes are systemic in their nature and differ from small ones. We demonstrate that there is a phase transition between co-crashes of small and large sizes, where the former involves mostly illiquid stocks, while large and liquid stocks are the most represented and central in the latter. This suggests that systemic effects and shock propagation might be triggered by simultaneous withdrawals or movement of liquidity by HFTs, arbitrageurs and market makers with cross-asset exposures.


Introduction
Flash crashes in financial markets can be defined as extreme changes in the price of one or multiple assets within a short interval of time. These have become increasingly relevant for practitioners and, in particular, market makers whilst being increasingly studied and reported in the quantitative finance literature.
The most notorious flash crash is likely that of 6 May 2010, which involved the major U.S. stock indices (S&P, DJIA, and NASDAQ composite) and caused a ≈9% drop in the DJIA in the 36 min it lasted for. This event led to a variety of empirical and theoretical papers trying to understand the event and its causes, with the aim to shed light on other black swan events too (up/down crashes). High frequency traders are at the center of interest in a large portion of this literature; hence, we report a brief summary of their role in markets and its regulatory concerns.
It has been shown that HFT market players contribute to price efficiency and tighter spreads, thereby improving the price discovery process. These players and electronic trading as a whole have become increasingly dominant in recent years to the point of constituting a large portion of the traded volume in financial markets. On the other hand, some characteristics of HFT players have caused other market players to raise concerns, as the run to incredibly fast execution leaves many behind and allows HFTs to front run other players [1]. The ability of HFTs to process information faster than other players leads to adverse selection and its fixed cost to a size advantage for larger players, which might hurt the overall welfare of market participants [2]. It can now perhaps be argued that the run to faster execution is going beyond price efficiency, which benefits investors and toward an unstable price process driven by competition between large firms. This is supported by a large body of literature on flash crashes, which places HFTs at the center of some disruptive systemic events, as discussed below.
The SEC's report on the flash crash of May 6th [3] finds that most market participants automatically halted their trading due to hard risk constraints triggered by the sudden price change, while some HFT firms kept trading, as it was deemed still profitable by their algorithms. These absorbed most of the original large sell order, but once they reached inventory or loss constraints, they started selling too. This increased the selling pressure in the market, and some works hold that it caused HFTs to trade with each other repeatedly ("hot potato phenomenon"), thereby increasing the traded volume (but not the real liquidity). This apparent increase in liquidity in the form of high trading volume caused large sell orders to get executed faster [4]. This chain of events highlights that the phenomenon has a dangerous positive feedback loop.
The results in [5] show through simulations how reducing either (or both) the number of HFT players or the size of the large sell order greatly reduced the size of the drawdown. Further, other works find that black swan phenomena of duration <1.5 s are about ten times more frequent than longer ones, and their return distribution deviates from the canonical power law distribution of returns. The authors suggest a phase transition to an all-machine environment at ∼1 s, as human reaction time is in the order of seconds. The authors also investigate the time scales via additional simulations to show the rise in extreme events and their magnitude around ∼1 s in what they define as the all-machine phase [6,7]. Findings along those lines, on the distribution of high frequency black swan events deviating from the canonical return distributions, were also recently published by the authors of this work [8].
From the review above, we see that crashes of different sizes seem to involve a selfperpetuating cycle [5] with positive feedback loops.
This type of self-excited process is also investigated in [9] for the liquidity and information dependence between two sample assets, showing how liquidity shocks to an asset can propagate to related ones (and by extension to the wider market).
The frequency and size (in terms of number of securities involved) of simultaneous-like crashes in HFT is also investigated in the literature. For instance, the works by Lillo and coauthors [10,11] investigate the dynamics of simultaneous flash crashes, and motivate their importance by showing the growth in the number of mini crashes in recent years. Further, they show how the number of simultaneously crashing securities has grown over the last 10 years, thereby highlighting the increasing systemic relevance of this phenomenon.
We recognize that systemic risk is traditionally defined as "the risk of a cascading failure in the financial sector" [12]. In this work, we do not investigate interbank connectivity, but rather the connectivity of trading patterns across financial assets which can lead to breakdowns or temporary dysfunctions in financial markets, as per the definition of systemic risk in [13]. We phrase the concept in a slightly different manner in the context of our work as follows. In this paper, we define systemic risk as the risk component of an event (say a flash crash) that is given by the interconnectedness of assets, likely as a result of correlated actions and arbitrage between market participants. This causes isolated events to spread in the market and affect more assets, thereby increasing their impact and relevance for all market participants. A related concept is that of "synchronization" which is the systemic and concentration aspect that arises from the alignment and interdependence of actions between market players (on a single asset) rather than across assets.
Our phrasing of the concept of systemic risk from [13] highlights our microstructural investigation of the trading dynamics which lead to dysfunctions and disruptions in the orderly functioning of financial markets. Indeed, crashes can be just due to microstructural dynamics, but as price efficiency deteriorates and volatility spikes, investors shy away from financial markets. Financial markets allow investors to provide companies in the real economy with capital, and their dysfunction can turn mere trading issues into real economic panic and crisis. Therefore, even high frequency black swan events can have dramatic effects on the real economy, as proven multiple times in recent history, which ties our interpretation of systemic risk back to Ref. [12] as well.
The systemic risk posed by HFTs was investigated in the literature in the last decade. The work by Paulin et al. [14] simulates flash crashes through agent-based modeling and highlights the importance of market structures in the systemic propagation of extreme events. The works by Abreu and Brunnermeier [15] and Bhojraj et al. [16] investigate the risks of synchronization between arbitrageurs in financial markets and acknowledge its existence. Other works investigate the systemic risk of HFT dynamics. Jain et al. [17] investigate how low-latency HFT trading can worsen extreme systemic events in financial markets and argue for the need to incorporate correlation and market structure in regulating these risks. The work by Harris [18] discusses many mechanisms, among which systemic risks originating from order routing and self-reinforcing mechanisms which cause crashes. The review by De Gruyter [19] summarizes the systemic aspects of HFTs and market structure, such as position correlation and herd behavior, adverse selection in orders and crowding, as well as negative contribution to price discovery at times.
Co-crashes are becoming more frequent and systemic. It is, therefore, important to investigate their structure. In particular, it is relevant to understand which stocks are central to larger systemic events as well as the contagion structure between stocks in the market. This is a central theme in market stability for regulators as well as in risk management for market makers.
The present work joins the two themes of flash crashes and systemic risk by delving deeper into the dynamics of simultaneous flash crashes of different sizes throughout 300 liquid stocks traded on the NASDAQ. We investigate the empirical distribution of crash sizes and the structure of these events in the market. We also investigate whether larger systemic events involve highly unstable stocks (which crash often) or stocks that are more stable in their price dynamics, yet more influential to trigger larger systemic events when subject to liquidity shocks. We apply tools from statistical physics to show the difference between crashes which involve a small or large number of assets. We uncover a phase transition occurring when the crash size exceeds five stocks. Implications for systemic risk in high frequency markets are discussed from both a trading and regulatory perspective.

Data
In the present work, we consider a universe of 300 liquid stocks from the NASDAQ exchange between 3 January 2017 and 25 September 2020. High frequency price data are obtained from LOBSTER [20] and sampled to obtain non-overlapping one-minute returns. This frequency was also adopted in [10] and other works in the literature for the detection of price jumps, as it is understood that below this limit, microstructural noise becomes relevant and can impact the validity of the method.

Jump Detection
In the present work, we focus on anomalous movements in the mid-price p t and their co-occurrence structure. To do so, we detect price jumps (up and down crashes) similarly to [10], at least in principle, in 1 min non-overlapping returns.
Specifically, we apply the basic jump detection method from [21] and detect jumps at the 5% significance level. The intuition behind this method is simple: we consider changes in p t in the form of log-returns r t = log p t p t−1 . Those are normalized so that, in the absence of jumps, their distribution is close to being normal and stationary. The method then exploits extreme value theory to obtain thresholds, above or below which, r t can be classified as anomalous (i.e., a jump), with a given confidence level.
To achieve a distribution of log-returns close to normal and stationary in time, we must normalize returns locally to account for two known regularities: daily seasonalities and long memory effects [22][23][24][25]. Mid-price returns have been shown to have approximate zero mean but a non-stationary variance due to the above [26]. Hence, the method empirically measures and discounts daily seasonality patterns and autocorrelations in return variance from the data. This yields a time series of almost normally distributed returns with stationary variance. Extreme value theory can then be applied as described above.
In addition to the basic features of the method for robust volatility estimation in intraday patterns, we obtain a robust estimate of intraweek periodicity and adjust the return series and jump detection according to [27].
As per the description above, null models are calibrated, and price jumps detected individually for each stock. As we consider 1 min non-overlapping returns, our sampling allows for aligned timestamps. We then consider contemporaneous price jump detection across assets in the universe as simultaneous jumps (a single systemic event).
It is important to highlight in the context of risk that crashes are normally associated with negative price returns of anomalous magnitude. The method used here detects both positive and negative anomalous price movements and we consider both as they are "jumps". In our related work [8], we have shown how both up and down jumps are relevant for risk, as market makers can hold inventory and be exposed in either direction. Further, a short squeeze can potentially be more dangerous, as it is often associated with high levels of leverage. Still, we recognize the importance of investigating down jumps (traditionally termed "crashes") and are looking to include a comparison between down and up jump structures in follow-up works.

Crash Size Distribution and Firm Persistence
We investigate whether co-jumps which involve different numbers of stocks originate from the same dynamic process and present the same distribution. We also consider whether individual stocks are involved to the same extent across co-crashes of different sizes or if a pattern emerges.
We define the unnormalized crash frequency for stock x, in co-crashes with m stocks and time range t ∈ [0, T] as The changes in the composition of the crashes are investigated by computing the correlation between the involvement of firms across crashes of different sizes. Namely, for each crash size m, we assign to each firm x a rank in decreasing order by f x,m . We then compute the Spearman correlations between these ranks.

Statistical Testing
To support the visual intuition of our results, we apply statistical testing in the form of null models. We applied the Spearman correlation to test for rank similarity between the crash frequency distributions across stocks at different crash sizes m. As the frequency distributions are noisy and fat-tailed, the correlation p-value seems hard to justify as a valid test. Hence, we follow the idea of Mantegna et al. [28] to create a simple null model of correlation significance.
To do so, we sample without replacement the whole list of stocks S m according to ∝ f m from Section 3.2 to obtain a biased reshuffling G i,m of the stocks according to their crash frequency.
For each shuffled list, we calculate the Spearman correlation coefficient between the sample and the original list to form the empirical null distribution as We then define the significance of the correlation between sizes m, m + τ as the quantile of Spearman(S m+τ , S m ) in D m .

Crash-Weighted Trading Volume
To investigate the relationship between the crash size and the involvement of highly traded stocks, we define a weighted average daily dollar traded volume for each crash size, where the weighting is given by the normalized crash frequency of each stock.
For crash size m and crash frequency distribution f x,m , as per Section 3.2, we define the crash-weighted dollar traded volume DTV m as This measure aims to represent how more highly traded stocks are involved at different crash sizes.

Results and Discussion
The plot in Figure 1a shows the frequency distribution f m of the number of stocks involved in each flash crash. Figure 1b plots the cumulative frequency f (M ≥ m). It is evident from both figures that they are heavy-tailed, and there is a change in the slope around m ≈ 5 and a finite size effect at ≈ 10 2 , which is when the crash involves a large portion of the system (system size is 3 · 10 2 ) [29]. This kind of distribution was already reported in [10], where the authors investigated and modeled flash crash sizes and frequency as a single Hawkes process. The authors there suggest that each security's crash dynamics should be modeled as a self-excitation process, but they point out that this would involve tuning a large number of parameters on very noisy data. They therefore decided to model the collective self-excitation process of securities as the frequency of crashes (or co-crashes) and their size. Hence, all crash sizes are treated as instances of a multi-asset Hawkes process in [10], with no distinction between the assets involved in each crash or their co-occurrence structure.
In the present work, we take a more granular approach and move to investigate the structure of co-crashes and the individual susceptibility of each stock.
To further investigate the difference between small and large crash sizes, we report in Figure 2a the Spearman correlation between the ranks of crash frequency for all stocks. Specifically, each line reports the correlation between the rank of the companies in the initial crash size m (correlation 1) and all other crashes of higher sizes m + τ. We indeed observe how crashes of smaller sizes (m < 5) have a substantially different composition to crashes of larger sizes. We instead observe that for sizes m > 5, a steady state is reached, with a large component of the population having similar ranks in frequency across all crash sizes. These steady states for m > 5 are significantly higher than the ones of smaller sizes, as the structure no longer evolves significantly between higher size crashes. The plot in Figure 2b provides a clearer visualization of this. We highlight that already at size 5, the correlation transitions directly to the steady state, albeit a lower one with respect to the ones for crash size 6 and above.
To validate the visual results from Figure 2, we apply the null model of correlation significance between crash frequency distributions. Figure 3 shows the correlation significance between the starting point m on the horizontal axis and its steady state distribution ∼[m + 2, m + 10]. We observe the first significant value at 1% around m = 4, which confirms the intuition from Figure 1a,b that crash sizes up to ≈4 belong to a different process than larger crashes. Indeed smaller crashes are dominated by less stable stocks and larger ones by very liquid stocks with high market capitalization. This suggests that more influential and systemic stocks are involved in larger crashes and perhaps even trigger those. A reason for why this is not the case in small crashes can be that these stocks are systemic enough to mostly be involved in (or perhaps even cause) crashes of a larger size. These are then even more relevant for systemic risk. Alternatively, only larger crashes involve enough activity to influence highly traded stocks.  Figure 1. Heterogeneous crash distribution. Log-log plot of the flash crash size distribution. We observe that sizes lesser than 4 follow a different trend, with lower than expected frequency. This suggests that crashes of this size and onwards do not belong to the same self-organized process, but that this is rather a heterogeneous distribution.   This is therefore further evidence of the occurrence of a transition in the process between smaller and larger crashes. The slow decay of smaller crash sizes indicates how these belong to similar distributions of non-systemic events, but as the crash size grows, the steady state gets closer to the large crash level. This suggests that larger crashes have some systemic characteristics.
If we take a closer look at the top ranked stocks at each size, we observe that smaller crash sizes are dominated by very volatile and illiquid stocks, which are subject to large jumps perhaps due to the lack of a smooth price process in their trading. We would expect this though to make them susceptible to larger systemic events as well and, hence, stably ranked. Yet, we observe very low to null rank correlation between individual (and small) crash frequencies and the large crash size steady state. It seems as if these crashes are not only non-indicative, but also, as indicated by the phase transition in Figure 3, they belong to an unrelated ranking and distribution. We highlight that we considered rankings and ranking correlation in order to avoid any sensitivity to large values or outliers at smaller frequencies.
Large crash sizes involve stocks such as Microsoft (MSFT) and Apple (AAPL) as being consistently high ranked. We highlight that these stocks are highly liquid and characterized by a stable price process with very few price jumps. Indeed, the few times they get involved into jumps, they are often part of larger simultaneous crashes, which involve more stable and systemic stocks. Further, when analyzing the co-crash relations between pairs of stocks, we observed a heavy-tailed distribution of centrality for these large systemic stocks, which suggests a community and core-periphery-like structure of the contagion network of co-crashes [30][31][32][33][34].
The above observations prompted us to conduct further analyses on the relation between stock liquidity (where average daily dollar traded volume is used as a proxy) and crash frequency at different crash sizes.
To validate visually and numerically our observation that highly traded stocks are more present in large crashes, we present the plots in Figure 4. The plot in Figure 4a shows the average daily traded volume of a stock per crash size, weighted by its crash frequency, as per the definition in Section 3.4. This is plotted against the crash size to show a clearly increasing trend in crash-weighted traded volume with crash size. This shows how larger crashes see stocks with higher traded volumes being more frequently involved.
This could, however, be the consequence of a subset of crashes which involved highly traded stocks. We therefore test this with the results in Figure 4c, which show how not only the average crashing stock is more "liquid" in larger crashes, but also that the fraction of crashes, which involve at least one of the top 20 stocks by traded volume in our universe, increases with crash size.
In line with this, we test how the traded volume of each stock correlates with its crash frequency for each crash size. We report results for the Spearman correlation coefficient in Figure 4b, where dots are used for correlations significant to the 5% confidence level and crosses otherwise. We see that co-crashes of size 1 and 2 seem to have an inverse or no relation between the volume traded and crash size. At our previously identified phase transition point m ≈ 5, we see the first significant positive correlation between the volume traded and crash frequency, which stays somewhat stable or is slightly increasing with crash size.
This last result is less clear than the previous one, but still shows a positive correlation between the volume traded (a proxy for liquidity) and crash frequency at crash sizes m > 5.
The presence of liquid stocks in most large crashes observed in Figure 4c prompts questions around the periphery structure of the different liquid stocks and implications for systemic risk. Further work in this direction is already underway with promising results and will be the topic of a follow-up work. The causality of such co-crash structures is also a very important topic, albeit harder to investigate rigorously, and should be the subject of future work.

Conclusions
The present work analyses co-jump structures in high frequency markets. We investigate the distribution of co-jump sizes for 300 stocks on 1 min returns. We highlight features of this distribution, such as the finite size effect in the tail and the divergence of small crash frequencies from the distribution. We show how the ranking and structure of crash frequency throughout stocks changes drastically through a phase transition between small and large crash sizes at size 5. We quantify this with the Spearman correlation between crash frequency ranks at different crash sizes. We then apply a null model of crash frequency at each crash size to test the hypothesis of a phase transition. Finally, we highlight how larger crashes are dominated not by the less liquid stocks present in small crashes, but rather by highly liquid stocks which are present in most flash crashes as the crash size grows. Preliminary results, which we leave for future work, find these stocks to be systemic in communities and core-periphery like structures of co-crashes. We suggest that these systemic events can be viewed as communities centered around these most influential stocks.
We know from the literature that these structures can be indeed vulnerable and highly unstable, as well as fragmented if characterized by multiple cores. One of the possible reasons for this can be inferred from the interviews with different market players following the crash of May 6th [3]. Many HFTs highlight the centralized risk constraints for volatility and P&L, which cause them to withdraw from the market in the case of extreme conditions or losses. As they constitute much of the liquidity in the market in particular for smaller stocks, withdrawing from those causes liquidity droughts. These are often systemic, as players have central risk constraints and withdraw from the entire market as those are triggered. Further, as systemic stocks crash, arbitrageurs come into play to level prices across the market, thus making the isolated event a systemic one. In this view, well-known stocks are not systemic per se, but rather as a result of non-siloed trading by HFTs and ETFs.
In light of the present results, future works shall investigate the asynchronous price changes of securities and model spreading dynamics of flash crashes and their directed structure. Lead-lag investigations of causality of these larger crashes are also suggested for future work. Already from our results, one can monitor, in particular, the most systemic stocks from larger flash crashes for co-jumps of size 5 and higher and induce trading halts or limitations to avoid further spreading of these systemic events. This is crucial, as our results combined with those of [10] suggest a systemic self-excited process in both frequency and magnitude of those crashes.
We leave the investigation of this structure for future work and highlight that this is of high importance for practitioners and regulators when dealing with market efficiency and stability, particularly as trading frequencies rise and electronic trading becomes widespread across securities.
We conclude by observing that volatility and P&L-based trading breaks used by market players may worsen these events and their systemic characteristics since they cause liquidity withdrawals throughout stocks and market players. This introduces systemic synchronization throughout the market and makes individual assets more susceptible to small trading volumes. Further, we suggest to monitor the stocks we find to be systemic throughout larger crashes to model the contagion of liquidity crises and halt trading before these spread and distort a larger number of assets. This should also be topic of future work aimed at smart and efficient regulation in high frequency markets.

Data Availability Statement:
The data used in this work was obtained from the LOBSTER dataset (https://lobsterdata.com (accessed on 2 January 2022)) under academic license to the Financial Computing and Analytics group at University College London. We are therefore unable to publish the raw data.