Early Warning Signs of Financial Market Turmoil

Volatility clustering and fat tails are prominently observed in financial markets. Here, we analyze the underlying mechanisms of three agent-based models explaining these stylized facts in terms of market instabilities and compare them on empirical grounds. To this end, we first develop a general framework for detecting tail events in stock markets. In particular, we introduce Hawkes processes to automatically identify and date onsets of market turmoil, which result in increased volatility. Second, we introduce three different indicators to predict those onsets. Each of the three indicators is derived from and tailored to one of the models, namely quantifying information content, critical slowing down or market risk perception. Finally, we apply our indicators to simulated and real market data. We find that all indicators reliably predict market events on simulated data and clearly distinguish the different models. In contrast, a systematic comparison on the stocks of the Forbes 500 companies shows a markedly lower performance. Overall, predicting the onset of market turmoil appears difficult; yet, over very short time horizons, high or rising volatility exhibits some predictive power.


Introduction
Financial market prices fluctuate widely, occasionally rallying upwards and suddenly collapsing back down. The notion of "excess volatility" was coined because price fluctuations often cannot be reconciled with changes in economic fundamentals (Shiller 2005). Instead, volatility, i.e., the standard deviation of returns, appears to exhibit dynamics of its own, showing temporal clustering and long-range dependence (Chakraborti et al. 2011; Cont 2001). Econophysics has since developed and proposed models which explain these and other stylized facts of volatility in terms of endogenous dynamics arising from the collective action of many heterogeneous traders (Aymanns and Farmer 2015; Franke and Westerhoff 2016; Lux and Marchesi 1999; Patzelt and Pawelzik 2013; Samanidou et al. 2007).
According to this view, market fluctuations are amplified by feedbacks inherent in the dynamics of traders, e.g., arising from chartist strategies (Lux and Marchesi 1999), herding (Franke and Westerhoff 2016) or leverage targeting (Aymanns and Farmer 2015). While these models can replicate several stylized facts, their empirical estimation and analysis remains challenging. Yet, according to these models, especially large market fluctuations are driven by dynamical instabilities and could potentially be preceded by specific statistical signatures such as critical slowing down. While critical slowing down has successfully been used as an early warning sign in fields from physics to ecology (see Scheffer et al. (2012) and references therein), complex systems thinking is still not widespread among economic scientists and practitioners. Nevertheless, some anecdotal evidence, i.e., focusing on crisis periods alone without a systematic investigation of predictive performance, has found that rising volatility might provide an early warning sign of financial crises (Diks et al. 2018; Guttal et al. 2016). In contrast, within economics, statistical studies of forecasting financial crises provide quantitative comparisons of different methods, e.g., logistic regression, but they have focused on aggregate macroeconomic data such as the growth of equity prices and credit levels (Drehmann and Juselius 2013), thereby remaining in the realm of classical statistics without influences from the science of complex systems.
Motivated by these studies, we analyze a similar question. In particular, we develop and analyze several early warning signs indicating periods of elevated volatility. Our contribution is threefold: First, we provide an operational method to automatically date the onset of volatility clusters, referred to as financial turmoil in the following. In contrast to financial crises, which occur very infrequently and are often dated manually based on the unfolding of macroeconomic developments after the fact (Reinhart and Rogoff 2009), financial turmoil is more prevalent and thereby allows for better statistical analysis. Yet, it is sufficiently different from more common notions of financial risk, such as those captured by (implied) volatility indices (Chicago Board Options Exchange 2019), SRISK (Brownlees and Engle 2017) or CoVaR (Adrian and Brunnermeier 2016). As required of econometric risk indicators, these statistics track financial or systemic risk and, in particular, stay elevated during periods of high volatility. Here, we are instead especially interested in the dynamics preceding such periods and therefore require methods to date their onset. Hawkes processes, i.e., self-exciting point processes, provide an alternative modeling framework and have been shown to capture clustering in extreme market returns (Embrechts et al. 2011; Stindl and Chen 2019). Thus, we build our dating procedure on such methods. To this end, a Hawkes process is fitted to each return series. Large jumps in the Hawkes intensity then reliably identify the first events in each series of clustered tail returns. Note that no prediction is desired here. Instead, our method is designed to date market events, i.e., mark their onset after they have been observed, in an automated and reproducible fashion. Second, we study several models and derive early warning indicators which are meaningful with respect to the instability underlying each model.
In this respect, our investigations complement more direct fitting exercises and comparisons of agent-based models, e.g., via moment matching (Franke and Westerhoff 2012) or likelihood-based methods (Bertschinger and Mozzhorin 2020; Lux 2018). While agent-based models have been shown to capture volatility dynamics with a precision similar to established stochastic volatility models, uncovering the mechanisms driving financial turmoil has remained elusive. A problem appears to be that model fitting requires a match to all of the data and not just to the unstable times preceding financial turmoil. Thus, by focusing on these events specifically, we strive to uncover complementary information and obtain insights into the mechanisms underlying market instabilities as our main research objective. Third, we carry out a systematic and quantitative investigation comparing the performance of several indicators across simulated as well as hundreds of actual stock data series. To this end, we put forward the receiver operating characteristic (ROC), a well-established technique from machine learning and statistics for evaluating the performance of binary classifiers. As we explain below, this method is particularly suited to assess and compare forecasts of rare events.
Overall, our paper is structured as follows: Section 2.1 shortly introduces Hawkes processes and explains their role in identifying and dating onsets of turbulent market periods, i.e., market events. Having thus defined what we consider as market events, Section 2.2 turns to our main research question of which possible mechanisms could drive such events. In particular, we consider three agent-based models illustrating different mechanisms leading to clustered volatility and derive early warning signals, i.e., specific statistical signatures, tailored to each of them. Next, in Section 2.3 we introduce the ROC and explain how it can be used to evaluate the ability of each indicator to predict market events. With all methodological considerations in place, Section 3 contains the evaluation results on both simulated and real market data sets. Last but not least, Section 4 discusses and concludes our study.

Market Event Detection
Detecting the onset of turbulent market periods requires that we can identify and date the onset of volatility clusters, i.e., periods of elevated volatility. To this end, we start by fitting a 2-dimensional Hawkes process, see Appendix A.1 for additional details, to daily stock return data within a certain observation period [T_*, T^*] as follows:

1.
The triplet series (t_n, d_n, r_n) of events at times t_n ∈ [T_*, T^*] is specified as follows: if a positive return r_n which exceeds the upper 5% quantile occurs at day t_n, then d_n = 1, and if a negative return falls below the lower 5% quantile, then d_n = 2. Otherwise, no event is recorded.
Thus, overall we focus on the largest 10% of absolute returns. This threshold has been found to be well suited for detecting financial turmoil, which is thought to be accompanied by the largest absolute returns.

2.
For each component of the Hawkes process, i.e., modeling positive and negative returns, we choose two independent exponential decay functions ω_j, each of the form ω_j(t) = α_j exp(−α_j t) for j = 1, 2.

3.
As mark distributions we choose two independent Pareto distributions f_j, along with a polynomial impact function g_j.

4.
The 2 × 2 branching matrix Q has zero entries on its diagonal. We impose this restriction because, even without it, the maximum likelihood fit of this Hawkes process on the triplet series (t_n, d_n, r_n), for all the stocks and models which we consider in the sequel, sets the diagonal parameters of the branching matrix Q to values close to zero. This implies that neither component of the Hawkes process exhibits self-excitation: large positive returns are triggered by negative tail returns and vice versa.
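To make the event construction concrete, the following minimal Python sketch extracts the triplet series (t_n, d_n, r_n) from a return series. The function name and the use of overall empirical quantiles as tail thresholds are our own illustrative choices, not part of the paper's specification:

```python
import numpy as np

def tail_events(returns, q=0.05):
    """Extract the triplet series (t_n, d_n, r_n) of tail returns.

    A day enters the event series if its return exceeds the upper
    q-quantile (mark d = 1) or falls below the lower q-quantile
    (mark d = 2); all other days are skipped.
    """
    returns = np.asarray(returns, dtype=float)
    upper = np.quantile(returns, 1.0 - q)   # positive tail threshold
    lower = np.quantile(returns, q)         # negative tail threshold
    t, d, r = [], [], []
    for n, ret in enumerate(returns):
        if ret > upper:
            t.append(n); d.append(1); r.append(ret)
        elif ret < lower:
            t.append(n); d.append(2); r.append(ret)
    return np.array(t), np.array(d), np.array(r)
```

With q = 0.05 roughly the largest 10% of absolute returns are retained, matching the threshold discussed above.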
Overall, these Hawkes processes describe the dynamics of the observed event series (t_n, d_n, r_n). As described in Appendix A.1, each time an event occurs, either the intensity for positive returns (d_n = 1) or for negative returns (d_n = 2) jumps with mark value r_n. The process intensities are coupled via the branching parameters Q and the impact functions g_1, g_2. The choice of a polynomial impact function is motivated by the GARCH model. Denoting the intensity of the j-th component at time t by λ_j(t), we can compute λ_j(t) recursively (actually not the intensity λ_j(t) itself but a proxy, see Liniger (2009)), where η_j is the immigration parameter of component j and we have assumed exponential decay functions; θ_{j,d_{n−1}} is the descendant parameter at row j and column d_{n−1} of the Hawkes kernel, as described in Appendix A.1. Inserting the definition of the impact function g_{d_{n−1}}, the recursive formula adopts the form

λ_j(t_n) = ω + α r²_{n−1} + β λ_j(t_{n−1}) .
If we take into account that r_{n−1} is a return beyond a certain quantile, λ_j(t_n) can be interpreted as the variance σ²_{t_n} in terms of a GARCH(1,1) model. Nevertheless, there are important differences between our GARCH-like formula and an actual GARCH(1,1) model: the parameters α, β and ω depend on time via the Hawkes process dynamics, and we do not assume that tail returns r_{n−1} are normally distributed. Furthermore, the Hawkes process only describes the dynamics of tail returns, i.e., the 5% largest positive and negative returns, thereby focusing on turbulent periods alone. Figure 1 shows an example of the 2-dimensional Hawkes process fitted to the returns of the S&P 500 within the time period from T_* = 1950-01-04 to T^* = 2016-12-30. Over this time period the maximum likelihood estimate yields the following parameter values: It is clearly visible how the Hawkes intensities jump most strongly at the beginning of a cluster covering several tail returns. This in turn allows us to date the onset of financial turmoil via large jumps in the Hawkes intensities. Note that the intensity dynamics of the Hawkes processes is inferred by fitting them to the whole observation window. Thus, large jumps in intensity arise after the fact, when an initial event, at the time of the jump, is followed by additional events. In this way, the dynamics of the Hawkes process allows one to date the onset of a turbulent market period, i.e., a cluster of several events corresponding to tail returns, but is not predictive of their occurrence.
Formally, we define a market event as follows: recall that λ_1(t) denotes the intensity of the upper Hawkes process in Figure 1. Here, we assume that a jump in the upper intensity process indicates a market crash and the onset of the subsequent volatility cluster. While it might seem more intuitive to consider the lower intensity process, we found that large negative jumps are purely driven by positive jumps and do not exhibit self-excitation (see remark 4 on the diagonal entries of the branching matrix being zero). Thus, dating is actually more precise when using the upper intensity process alone. We measure the jump size via the ratio I(t) = λ_1(t)/λ_1(t − 1) and introduce a binary event variable Y_m via

Y_m(t) = 1 if I(t) > θ and Y_m(t) = 0 otherwise,

where θ > 0 is a certain threshold. Throughout the paper we take θ as the upper 1.5 per mill quantile of the time series (I(t))_{t ∈ [T_*, T^*]}. Again, we have investigated other thresholds but found this value to work robustly across simulated as well as real market data. Y_m(t) then simply records whether a market event occurred or not; all times where Y_m(t) = 1 are the dates of the desired market events. Figure 1 shows the event dates (marked by red bars) as detected on the S&P 500 returns. It is clearly visible that large negative returns and, arguably, most onsets of volatility clusters are detected as market events. Our goal is now to derive early warning indicators which can predict the occurrence of market events.
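The dating rule can be sketched as follows. Here the intensity series would come from the fitted Hawkes process, which we do not implement; the function name is hypothetical and any positive intensity series can be passed for illustration:

```python
import numpy as np

def date_market_events(lam, q=0.0015):
    """Date market events from a fitted Hawkes intensity series lam.

    I(t) = lam[t] / lam[t-1] measures the relative jump size; days where
    I(t) exceeds its upper q-quantile (q = 1.5 per mill) are flagged
    as market events, i.e., Y_m(t) = 1.
    """
    lam = np.asarray(lam, dtype=float)
    ratio = lam[1:] / lam[:-1]                  # I(t) for t = 1, ..., T-1
    theta = np.quantile(ratio, 1.0 - q)         # upper 1.5 per-mill quantile
    events = np.zeros(len(lam), dtype=int)
    events[1:] = (ratio > theta).astype(int)    # binary event series Y_m(t)
    return events
```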

Indicators
Econophysics provides a variety of explanations for the mechanisms which drive endogenous market events, e.g., in terms of fundamentalists and chartists trading a single asset or leveraged investors. Here, we pick three models and derive indicators from them.

Information Absorption
Patzelt and Pawelzik (2013) present a reduced agent-based model where market instabilities arise from information absorption. In this model, agents trade money or stocks based on public information states µ(t). This information can be either exogenous, i.e., unrelated to past price movements, or endogenous. In the latter case, the information available is derived from the signs of the last K returns r_{t−K}, . . . , r_{t−1}. Thus, with K = 2, four states of information could be distinguished, corresponding to past price movements (u, u), (u, d), (d, u) and (d, d), where u, d denote positive/negative returns, i.e., rising or falling prices, over the last two days respectively. Each agent then uses a fixed strategy mapping observed information to buy or sell decisions. Market prices are in turn determined from the aggregate demand of all traders. The model distinguishes between two types of traders, namely producers and speculators. Whereas producers have fixed resources, speculators gain or lose wealth based on their trading decisions. Appendix A.2 contains further details on this model.
Here, we simulate the model with the parameters detailed in Appendix A.2 and consider only the case of endogenously generated information. Patzelt and Pawelzik (2013) show that with these parameters the model reproduces stylized facts known from stock market return series: volatility clustering, long-range correlation of absolute returns and heavy tails. A typical return time series generated by this parameter set is presented in Figure 2. Note that large negative returns are often followed by periods of elevated volatility and are correctly detected as market events by our automatic dating method. Interestingly, the collective trading activity of the agents corresponds to a learning rule related to gradient descent which reduces price fluctuations. Hence, the system in general tends towards an equilibrium after some time, and price fluctuations, i.e., returns, are reduced. This equilibrium becomes unstable once all predictable information, i.e., binary sequences encoding the return signs, has already been exploited by the agents. Then, an information state which has not been seen in the recent past leads to massive price changes because the market is not well adapted; see Patzelt and Pawelzik (2013) for further details. Therefore, Patzelt and Pawelzik not only provide a reduced agent-based model which reproduces stylized facts of financial markets but also an indicator of when market events occur in their model: they tend to occur after rarely seen information states.
Formally, we define the indicator value X_PP(t) as

X_PP(t) = p̂(sgn(r_{t−K}), . . . , sgn(r_{t−1})),

where the probabilities p̂(sgn(r_{t−K}), . . . , sgn(r_{t−1})) of the binary sequences encoding the signs of the K preceding returns are estimated by their empirical frequencies on the whole time series. Thus, low values of the indicator are expected to precede market events. To aid comparison with the simulation results, we fix K = 9 in the following, but note that other choices would lead to similar results as the mechanism of the model is rather robust to changes of this parameter. Indeed, the indicator merely needs to rank sequences of past returns according to their probability in order to provide a meaningful early warning signal of market instabilities.
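A minimal sketch of this indicator follows; the base-2 integer encoding of sign sequences is an implementation choice of ours, and the function name is hypothetical:

```python
import numpy as np

def pp_indicator(returns, K=9):
    """Information-absorption indicator X_PP(t).

    X_PP(t) is the empirical frequency, estimated on the whole series, of
    the sign sequence of the K returns preceding t; rarely seen sequences
    (low indicator values) are expected to precede market events.
    """
    signs = (np.asarray(returns) > 0).astype(int)
    T = len(signs)
    # encode each length-K sign sequence as a base-2 integer key
    keys = np.array([int("".join(map(str, signs[t - K:t])), 2)
                     for t in range(K, T)])
    counts = np.bincount(keys, minlength=2 ** K)
    freqs = counts / counts.sum()
    x = np.full(T, np.nan)          # undefined for the first K days
    x[K:] = freqs[keys]
    return x
```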

Scaling and Criticality
The seminal paper by Lux and Marchesi (1999) stresses the point that stock prices exhibit universal scaling behavior which can also be found in physical systems where a large number of interacting microscopic particles generates a trend on a macroscopic scale. In (Lux and Marchesi 1999) the particles are thought of as agents which fall into two classes: fundamentalists, who follow the premise of an efficient market and expect the price to track the so-called fundamental value, and chartists, who do not only try to identify price trends and patterns but also take the behavior of other agents into account. The main building block of the model is the possibility for agents to move from one group to another. That is, chartists may become fundamentalists and vice versa based on the current profitability of each strategy. We refer the interested reader to Lux and Marchesi (2000) for the mathematical details of the model.
Here, we simulated the model with the parameters detailed in Appendix A.2. In this specification, the model generates return series with volatility clustering and market crashes whose sizes follow a power law. Such periods of wild market behavior occur in the model if the ratio of chartists to fundamentalists is imbalanced in favor of the chartists; market turmoil then occurs if the ratio exceeds a critical threshold. Figure 3 shows an example return series as well as the fraction of chartist traders. Again, our method for dating market events nicely detects large negative returns starting a volatility cluster. Interestingly, this often concurs with local maxima in the fraction of chartist traders even though that information is not available to the dating procedure.

Figure 3. Return time series in blue along with the ratio of chartists compared to the overall number of trading agents. One clearly sees that volatility clusters if the ratio is close to one. As before, the red lines mark the times of market turmoil as detected by our method based on Hawkes process intensities. Interestingly, these correspond to local maxima of the chartist fraction even though this information was not available to the dating method.
Hence, we observe the phenomenon of criticality in this model, also known from nonlinear dynamical systems, where tuning a parameter, in our case the ratio between chartists and fundamentalists, destabilizes an attracting equilibrium of the system into a repelling one, leading to rapid changes: a massive drop in stock prices. Indeed, Lux and Marchesi (2000) have shown that the deterministic skeleton of the model exhibits a fixed point which loses stability when the fraction of chartists becomes too high. A well-known indicator of a system approaching such a critical transition is critical slowing down. That is, if the system is disturbed, the recovery rate gets slower. Translated into the terms of the model by Lux and Marchesi (1999), this means that correcting deviations of the stock price from its fundamental value takes longer if the ratio of chartists to fundamentalists approaches the previously mentioned threshold. Hence, an indicator which measures the recovery rate provides insight into possible future market bursts.
Formally, critical slowing down leads to a characteristic scaling of the recovery rate, variance and autocorrelation when a system approaches a critical transition (Scheffer et al. 2009). As measuring the recovery rate directly requires repeatable perturbations of the system, and autocorrelations tend to be small in market returns, we focus on variance scaling alone. This is also justified as volatility is large compared to mean returns and exhibits the most clearly visible patterns in market returns. Under suitable regularity conditions, the variance σ² scales as

σ²(τ) ∝ τ^{−α},

where τ denotes the distance of the control parameter from the critical transition and the exponent α depends on the type of instability (α = 0.5 for saddle-node and α = 1 for Hopf bifurcations) (Meisel et al. 2015). It is common practice to assume that the distance to criticality decreases proportionally to the time remaining until the critical transition, i.e., τ = t_crit − t. Furthermore, the increasing trend in variance close to a critical transition is commonly detected by using a sliding window to estimate short-term variances and computing Kendall's rank correlation coefficient; critical slowing down is then associated with a strongly positive correlation of the sliding variances. Unfortunately, this method has been shown to exhibit very low power in detecting critical transitions (Boettiger and Hastings 2012) and is of limited use in our case, where considerable noise is expected. Yet, as we search for a specific scaling, a likelihood ratio test, which by the well-known Neyman-Pearson lemma constitutes the most powerful test for simple hypotheses, can be derived. Thus, we define our indicator as

X_LM(t) = log p̂_α(r(t − w + 1), . . . , r(t)) − log p̂_0(r(t − w + 1), . . . , r(t)),

where p̂_0(r(t − w + 1), . . . , r(t)) and p̂_α(r(t − w + 1), . . . , r(t)) denote the maximized likelihoods of a null model assuming constant variance and of a critical slowing down model assuming variance scaling as above.
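The likelihood-ratio indicator can be sketched as follows. We assume zero-mean Gaussian returns within the window, a grid search over the exponent α and a transition assumed to lie just after the window; all of these are illustrative modeling choices of ours, and for fixed α the scale c has a closed-form maximum likelihood estimate:

```python
import numpy as np

def lm_indicator(returns, w=250, alphas=np.linspace(0.0, 2.0, 41)):
    """Critical-slowing-down indicator X_LM(t): a log-likelihood ratio.

    Null model: zero-mean Gaussian returns with constant variance over the
    window.  Alternative: variance scaling sigma^2_i = c * tau_i**(-alpha)
    with tau_i the remaining distance to an assumed transition just after
    the window.  For each alpha the scale c has the closed-form MLE
    c = mean(r_i^2 * tau_i**alpha); we profile over a grid of alphas.
    """
    r = np.asarray(returns, dtype=float)
    tau = np.arange(w, 0, -1)                    # tau_i = w, ..., 1
    x = np.full(len(r), np.nan)
    for t in range(w, len(r) + 1):
        win = r[t - w:t]
        best = -np.inf
        for a in alphas:
            c = np.mean(win ** 2 * tau ** a)     # profiled MLE of the scale
            ll = -0.5 * np.sum(np.log(2 * np.pi * c * tau ** -a)
                               + win ** 2 / (c * tau ** -a))
            best = max(best, ll)
        # alpha = 0 recovers the constant-variance null model
        c0 = np.mean(win ** 2)
        ll0 = -0.5 * np.sum(np.log(2 * np.pi * c0) + win ** 2 / c0)
        x[t - 1] = best - ll0
    return x
```

Since the grid contains α = 0, the indicator is bounded below by zero by construction.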
Both models are optimized over a window of w preceding returns. Note that the second model nests the first, i.e., with α = 0, and thus the indicator X_LM(t) is bounded below by zero. Yet, large positive values indicate a much better fit of the critical slowing down model compared to the null model and are therefore expected to precede market events.

Leverage Cycle
Aymanns and Farmer (2015) investigate an agent-based model composed of a leveraged bank B and a noise trader N which both invest into a single stock with price p(t) at time t. In the present context, we only outline a simplified model which can be found in Section 4.2 of (Aymanns and Farmer 2015). The bank invests only a fixed fraction w_B ∈ (0, 1) of its capital into the stock while the rest is kept in cash, w_c = 1 − w_B. The weight w_N of capital which the noise traders invest into the stock follows a stochastic process. At time t + 1 the bank perceives its portfolio risk σ(t + 1) according to

σ²(t + 1) = (1 − δ) σ²(t) + δ log²(p(t)/p(t + 1))

with risk perception parameter δ. According to its risk perception, the bank computes its target leverage, that is, the ratio between the total assets and the equity of the bank, according to the formula

λ̄(t + 1) = α σ²(t + 1)^b,

where α and b denote two time-independent parameters. Banks then expand or shrink their balance sheet in order to match the desired target leverage. Details of the full process, i.e., of how the leverage cycle affects asset prices, can be found in Aymanns and Farmer (2015).
Here, we simulated the model with parameters as detailed in Appendix A.2. α is interpreted as the risk appetite of the bank and b is the cyclicality parameter. If b < 0, the bank acts procyclically, that is, the bank's leverage is stimulated by a low risk perception; the case b > 0 is called contracyclical. The terminology comes from the fact that a procyclical parameter produces leverage cycles: the bank's leverage steadily increases until it drops dramatically. This drop in leverage comes along with a price crash of the traded asset, and the cycle starts over again. Since b < 0 in our simulations, leverage and risk perception are inversely related: if leverage is high, then risk perception is low and vice versa. Since risk perception can be computed from previous market prices, being nothing else than an exponentially weighted mean of previous squared logarithmic returns, σ²(t) provides an early warning indicator for future market crashes in this model and we define X_AF(t) = σ²(t).
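A minimal sketch of the risk-perception indicator follows; the default δ and the initialization of σ² are illustrative choices of ours, not values from the paper:

```python
import numpy as np

def af_indicator(prices, delta=0.1, sigma2_0=None):
    """Risk-perception indicator X_AF(t) = sigma^2(t).

    sigma^2 is an exponentially weighted moving average of squared log
    returns: sigma^2(t+1) = (1 - delta)*sigma^2(t) + delta*log(p(t)/p(t+1))**2.
    The initial value defaults to the mean squared log return.
    """
    p = np.asarray(prices, dtype=float)
    logret2 = np.diff(np.log(p)) ** 2
    sigma2 = np.empty(len(p))
    sigma2[0] = sigma2_0 if sigma2_0 is not None else logret2.mean()
    for t in range(len(logret2)):
        sigma2[t + 1] = (1 - delta) * sigma2[t] + delta * logret2[t]
    return sigma2
```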

Performance Evaluation
Having defined market events and model-derived indicators, we now want to evaluate their predictive performance. Obviously, a good indicator should signal market events by adopting large values ahead of each event. On the other hand, only few values of similar magnitude should occur if no event happened. This is especially important as market events are rather rare, about one event every five years, which could lead to a high number of false positive warning signals. In such a situation, an early warning signal which never issues a warning, i.e., being utterly useless, could easily achieve a lower overall error than a more sophisticated indicator as it never raises a false positive warning. Thus, we need a more elaborate method in order to compare different indicators. Here, we use the Receiver Operating Characteristic (ROC), which allows one to assess how much information the indicator reveals about market events.
Assume that the indicator variable X(t) is used to predict market events d days into the future. Then, we fire a warning signal if X(t) > δ for a threshold δ and compare the resulting binary series with the actual series of market events Y_m(t + d) shifted d days ahead. Now, we compute the rate r_c(δ) of correct predictions, i.e., the fraction of market events preceded by a warning, and the rate r_f(δ) of false positives, i.e., the fraction of non-events for which a warning was raised. The ROC curve then is the graph of r_c over r_f for different values of δ, as illustrated in Figure 5. The ROC curve increases monotonically from (0, 0) toward (1, 1). It illustrates the informativity of an indicator in the following way: if the indicator is independent of actual market events, the ROC is the diagonal from (0, 0) to (1, 1), whereas a perfectly informative warning signal corresponds to a ROC curve consisting of the points {(x, 1) : x ∈ (0, 1]}. Note that the ROC curve can lie below the diagonal when raised warnings actually indicate non-events and vice versa. Such a warning signal is still informative but should be inverted, i.e., the omission of a warning should be considered as signaling an event. A common method to summarize the information provided by the ROC curve is to compute the area under the curve (AUC) score. Here, an AUC score of 0.5 identifies an uninformative indicator, whereas scores of 1 (or 0) correspond to perfect detection of market events.

Figure 5. For each threshold, the fraction of correctly predicted events as well as the fraction of false alarms is recorded. Thereby, as shown in the right panel, the ROC curve is obtained. In the example, the indicator is informative with an AUC score of 0.89.
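A minimal sketch of the ROC and AUC computation follows; sweeping the threshold δ over all observed indicator values is an implementation choice of ours:

```python
import numpy as np

def roc_curve(indicator, events, d=1):
    """ROC of warnings X(t) > delta against events Y_m(t + d).

    Sweeps the threshold delta over all observed indicator values and
    records the rate r_c of correctly predicted events and the false
    positive rate r_f.
    """
    x = np.asarray(indicator, dtype=float)[:-d]
    y = np.asarray(events)[d:].astype(bool)
    thresholds = np.unique(x)[::-1]             # strictest threshold first
    r_c, r_f = [0.0], [0.0]
    for delta in thresholds:
        warn = x > delta
        r_c.append(np.mean(warn[y]) if y.any() else 0.0)
        r_f.append(np.mean(warn[~y]) if (~y).any() else 0.0)
    r_c.append(1.0)
    r_f.append(1.0)
    return np.array(r_f), np.array(r_c)

def auc(r_f, r_c):
    """Area under the ROC curve via the trapezoidal rule."""
    return float(0.5 * np.sum((r_f[1:] - r_f[:-1]) * (r_c[1:] + r_c[:-1])))
```

A perfectly informative indicator yields an AUC of 1, an uninformative one an AUC of about 0.5.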

Early Warning Signals for Simulated Market Events
In this section we analyze the performance of the indicators for the various models, PP, LM and AF, introduced above. In all three cases we simulated the models for 50,000 time steps and recorded the last 16,000 returns for further analysis, corresponding roughly to the number of trading days in the S&P 500 data. Each model was simulated 100 times. See Appendix A.2 for further details on these simulations.
For each series we detected market events and then computed the ROC curve for the various indicators. To summarize our results, we in turn computed the corresponding AUC scores. Figure 6 shows the distribution of AUC scores obtained over 100 simulation runs for return series generated according to the PP model. AUC scores are shown for predicting market events d = 1, 7 and 28 days into the future (from top to bottom). Especially when predicting over short horizons (d = 1), the PP indicator works almost perfectly and, as expected, low indicator values precede market events, giving rise to AUC scores close to zero. Over larger horizons, none of the indicators is predictive, with all AUC scores clustering around 0.5. Figure 7 shows the corresponding results on return time series simulated according to the LM model. In this case, all indicators seem to be of similar quality with none being perfect. Especially at larger time horizons there seems to be a slight edge for the LM indicator, though. That the risk perception indicator derived from the AF model exhibits a similar performance could be somewhat expected as it, being a short-term estimate of market volatility, also reacts quickly to rising volatility. Figure 8 shows the corresponding results on return time series simulated according to the AF model. In this case, the AF indicator shows almost perfect performance with low indicator values, i.e., low volatility, preceding market events. In contrast, the LM indicator looking for rising volatility is not useful, reflecting that the mechanism of this model is completely different. Interestingly, the PP indicator is also predictive, but now with large indicator values preceding market events. The performance of all indicators drops only slightly when predicting over larger time horizons.
Overall, the indicators work reasonably well, mostly almost perfectly, with respect to the model they were designed for. In addition, on return series from other models they sometimes exhibit reasonable performance but can nevertheless be well separated, as their behavior is sufficiently different across time series simulated from the different models. We now turn to real market data.

Early Warning Signals for Real Market Events
Here, we investigate the performance of all indicators on the return series of the S&P 500 between 4 January 1950 and 30 December 2016 as well as on stocks of the Forbes 500 universe. Based on availability, we selected 401 stocks for our experiments (see Appendix A.3). Figure 9 shows the ROC curves of all indicators on the S&P 500 when predicting 1, 7 and 28 days ahead (from top to bottom). In each case, the grey areas show the 95% confidence region of ROC curves resulting from totally uninformative indicators. Thus, the performance of an indicator should be considered statistically significant at the 5% level to the extent that its ROC stays outside this region. According to this criterion, only the LM and AF indicators appear to be somewhat predictive, and only for the very short time horizon of a single trading day. Interestingly, in contrast to the results on the AF model, high values of the AF indicator are predictive. Thus, on the S&P 500 data, high or rising volatility indicates the onset of market events.

Figure 9. ROC curves for the different indicators for the S&P 500 time series when predicting 1, 7 and 28 days ahead (from top to bottom). The grey areas show the 95% confidence region of ROC curves resulting from totally uninformative indicators.
To further investigate the predictive performance, we have in turn tested the indicators on stocks of the Forbes 500 universe. Figure 10 shows the corresponding distribution of AUC scores obtained for one-day-ahead predictions. Whereas the PP indicator performs close to chance level on average over all stocks tested, the other indicators show some predictive power. Especially the AF indicator exhibits a very good performance in some cases, closely followed by the LM indicator. Note that the relative ranking of the indicators is most similar to that on data simulated from the LM model. Nevertheless, the AF indicator appears somewhat better and the PP indicator is clearly worse, with no tendency towards lower probabilities. In this respect, it appears that the structure of real market data differs from the model-generated return series.

Conclusions
Starting from three econophysics models, each illustrating a different mechanism generating clustered volatility, we have derived early warning indicators for predicting the onset of turbulent market phases. In order to rigorously evaluate and compare all indicators, we have proposed a novel method for dating the onset of volatility clusters, i.e., market events. To this end, we fit a two-dimensional Hawkes process to tail returns and generate a market event if the intensity of the Hawkes process exhibits a large jump. Using simulated as well as real market data we have visually checked that this method produces market events that capture the intuitive notion of a volatility cluster reasonably well.
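Once the Hawkes process is fitted, the dating step reduces to thresholding jumps of its intensity. A minimal sketch of that step, where the jump threshold and the refractory period are illustrative choices on our part rather than the paper's calibrated values:

```python
import numpy as np

def date_events(intensity, jump_threshold, refractory=20):
    """Flag event onsets where the fitted Hawkes intensity jumps.

    intensity: fitted intensity evaluated on the trading-day grid
    jump_threshold: minimum day-over-day increase counted as a jump
    refractory: days to wait before flagging a new onset (assumption,
                so one volatility cluster yields a single event date)
    """
    jumps = np.diff(intensity)
    onsets, last = [], -refractory
    for t, j in enumerate(jumps, start=1):
        if j > jump_threshold and t - last >= refractory:
            onsets.append(t)
            last = t
    return onsets

# Toy intensity: calm, one burst, calm again -> a single dated onset.
lam = np.concatenate([np.full(50, 0.1), np.full(10, 2.0), np.full(50, 0.1)])
print(date_events(lam, jump_threshold=1.0))  # → [50]
```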
Next, we have introduced the ROC curve in order to systematically evaluate all indicators. These were then compared on simulated as well as real market data in terms of their AUC score. We found that the indicators are able to accurately predict market events on simulated data. Furthermore, all indicators clearly distinguished between data simulated from the different models and achieved their best performance on data simulated from the same model they were derived from, thus confirming our idea that different mechanisms can be detected based on specific statistical signatures. In contrast, the performance on actual market data is markedly lower and also more similar between the different models. Yet, over very short time horizons high or rising volatility appears to be able to predict market events to some extent, thus at least ruling out the mechanism based on low perceived risk. From this perspective, our findings confirm the results of Guttal et al. (2016) who found that financial crises appear to be preceded by increasing volatility. They have not investigated the problem of false alarms in a systematic fashion though. In this respect, our results substantiate their claims as the ROC curve precisely relates the ability to detect actual events with the rate of false alarms. Note that an informative signal, i.e., one with a high AUC score, would not be guaranteed to generate good trading returns, e.g., when used in an investment strategy. Yet, the idea of deriving trading signals from agent-based models might be interesting to investigate directly in the future.
Overall, predicting the onset of market crashes, as attempted in this as well as other studies, remains elusive. We believe that substantial progress on this problem requires first and foremost a workable definition, ideally provided as an algorithm to allow large-scale data screening, of what constitutes a market crash or event that is to be predicted. Modern machine learning has greatly advanced by relying on public data sets which allow for the systematic evaluation and comparison of different models. In particular, the prize offered by Netflix has prompted many other companies and academic institutions alike to open up challenging data sets to scientific research. With models becoming more and more complex, it becomes increasingly difficult to identify significant differences on purely empirical grounds. Thus, models need to be studied and evaluated from multiple angles and over multiple data sets. We believe that our contribution of indirectly testing model predictions in terms of indicators derived from instabilities is valuable in this respect.

Conflicts of Interest:
The authors declare no conflict of interest. The funder had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A.1. Hawkes Processes
Here we recall some basic definitions and results on Hawkes processes, following Embrechts et al. (2011). For a more thorough treatment we refer the interested reader to the doctoral thesis of Liniger (2009).
Consider a multivariate point process N_t with n components specified by a countable series (t_i, d_i, x_i)_{i∈I}. That means at time t_i ∈ ℝ the component d_i ∈ {1, . . . , n} jumps with mark value x_i > 0. Since we assume t_i ≠ t_j if i ≠ j, there are no simultaneous jumps of several components. Hawkes introduced his processes in Hawkes (1971a, 1971b) via intensity functions, which informally are the derivatives of the expected number of jumps with respect to time.

Definition A1. A Hawkes process is a multivariate point process N_t whose intensity is defined as
$$\lambda_j(t) = \eta_j + \sum_{k=1}^{n} \theta_{jk} \int_{(-\infty,t)\times\mathbb{R}} \omega_j(t-s)\, g_j(x)\, N_k(ds \times dx),$$
where ω_j : ℝ_+ → ℝ_+, g_j : ℝ → ℝ_+ and η_j, θ_jk ≥ 0. The numbers θ_jk are called the branching coefficients or descendant parameters whereas the η_j are called immigration parameters. The n × n matrix Q = (θ_jk)_{j,k=1,...,n} is the branching matrix. We call the ω_j decay functions, which allow the following interpretation. Assume a jump occurred in component k at time s and fix some t > s. The intensity of every component j ∈ {1, . . . , n} at time t is increased by ω_j(t − s). The functions g_j are called impact functions: if there is a jump in component k with mark value x_k, the intensity of every component j is increased by g_j(x_k). The mark values x_k for each component are pairwise independently distributed with densities f_k : ℝ → ℝ which are called mark densities. A mark density is called one-sided if its support contains only non-negative numbers. The various functions are subject to the following normalizing conditions for all j, k ∈ {1, . . . , n}:
$$\int_0^\infty \omega_j(t)\, dt = 1, \qquad \int_{\mathbb{R}} g_j(x) f_k(x)\, dx = 1, \qquad \mathrm{Spr}(Q) < 1,$$
where Spr denotes the spectral radius of Q, that is, the largest absolute value of the eigenvalues of Q. If all these conditions are fulfilled, then there is a unique multivariate point process N_t with intensity as in Definition A1; see Theorem 1 of Liniger (2009). In the present paper the parameters of a Hawkes process are estimated via a maximum likelihood fit on data observed within a time period D = [T_*, T^*].
Definition A2. For all t ∈ D we define the compensator by
$$\Lambda_j(t) = \int_{T_*}^{t} \lambda_j(s)\, ds.$$
With these definitions the log-likelihood reads as
$$\log L = \sum_{j=1}^{n} \left( \sum_{i:\, d_i = j} \log \lambda_j(t_i) - \Lambda_j(T^*) + \sum_{i:\, d_i = j} \log f_j(x_i) \right).$$
Note that the individual terms depend on either the jump times t_i or the mark values x_i but not both. Furthermore, assuming exponential decay functions ω_j(t), the integrals can be efficiently computed in a recursive fashion as outlined in the main text. In turn, once the parameters are fitted, one can simulate a Hawkes process by means of Ogata's modified thinning algorithm (Ogata and Akaike 1981). An adaptation of this algorithm to Hawkes processes as well as further details on numerical methods for maximum likelihood fitting are provided by Liniger (2009).
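For the univariate case with normalized exponential decay ω(t) = βe^{−βt} and without marks, the recursive computation of the log-likelihood mentioned above can be sketched as follows; the function name and parameterization are ours.

```python
import numpy as np

def hawkes_loglik(times, T, eta, theta, beta):
    """Log-likelihood of a univariate Hawkes process on [0, T] with
    normalized exponential decay w(t) = beta * exp(-beta * t).

    The sum over past events is computed with the standard O(n)
    recursion R_i = exp(-beta * (t_i - t_{i-1})) * (1 + R_{i-1})
    instead of the naive O(n^2) double sum.
    """
    times = np.asarray(times, dtype=float)
    loglik = 0.0
    R = 0.0
    for i, t in enumerate(times):
        if i > 0:
            R = np.exp(-beta * (t - times[i - 1])) * (1.0 + R)
        loglik += np.log(eta + theta * beta * R)   # intensity at t_i
    # compensator: integral of the intensity over [0, T]
    loglik -= eta * T + theta * np.sum(1.0 - np.exp(-beta * (T - times)))
    return loglik
```

Maximizing this expression over (η, θ, β), e.g., with a numerical optimizer, yields the maximum likelihood fit; stationarity requires θ < 1.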

PP model
The model by Patzelt and Pawelzik (2013) consists of N agents: N_s speculators and N_p producers. Each agent i possesses at day t two types of assets called money M_i(t) and stocks S_i(t). The price of stocks is p(t) = δ(t)/ζ(t) where δ(t) is the demand of stocks and ζ(t) their supply, which shall be defined shortly.
Agents base their trading decisions on public information states µ(t) which can carry endogenous or exogenous information. If µ(t) is endogenous, it is derived from the signs sgn(r_{t−K}), . . . , sgn(r_{t−1}) of the K preceding returns, with the sign function sgn(x) = 1 if x ≥ 0 and −1 otherwise. Hence, there are 2^K different states encoded by binary sequences of length K. In the exogenous case µ(t) is simply drawn randomly from the uniform distribution over the 2^K binary sequences of length K. Each agent i's strategy is determined by a strategy function σ_i : {0, 1}^K → {0, 1}. If σ_i(µ(t)) = 1 then agent i buys stocks by trading a fraction γ of its current monetary wealth M_i(t). If σ_i(µ(t)) = 0 then agent i sells a corresponding fraction γ of its stocks S_i(t). Whereas the wealth of speculators is updated according to their buy and sell decisions, producers have a fixed and unchanging wealth. Note that without external producers, wealth would be conserved and exchanged between speculators in a zero-sum fashion. Thus, with external producers, the resources they provide are exploited by the speculators and redistributed according to the effectiveness of their trading strategies.
Demand and supply are the sums of all buy and sell orders, respectively,
$$\delta(t) = \varepsilon + \gamma \sum_{i:\, \sigma_i(\mu(t)) = 1} M_i(t), \qquad \zeta(t) = \varepsilon + \gamma \sum_{i:\, \sigma_i(\mu(t)) = 0} S_i(t),$$
where ε = 10^{−10} is a small positive number which ensures δ(t), ζ(t) > 0 and therefore well-defined prices and returns. For the simulation studies, we have run the model with parameters as in Equation (1) for 50,000 time steps from random initial conditions. Then, we discarded the initial part of the generated return series and recorded the last 16,000 returns. This corresponds to about 64 years of simulated data, matching the empirical record for the S&P 500, drawn from the stationary distribution of the model dynamics.
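The structure of the dynamics can be sketched in a few lines of Python. The following toy simulation omits the producer agents and uses illustrative parameter values of our own, so it is a structural sketch rather than a replication of the setup in Equation (1).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (not the paper's calibration).
N, K, gamma, eps, steps = 64, 4, 0.8, 1e-10, 200
strategies = rng.integers(0, 2, size=(N, 2 ** K))   # fixed random strategy tables
money = np.ones(N)
stocks = np.ones(N)
signs = list(rng.integers(0, 2, size=K))            # encoded signs of past returns
prices = [1.0]

for t in range(steps):
    mu = int("".join(map(str, signs[-K:])), 2)      # endogenous information state
    buy = strategies[:, mu] == 1
    demand = eps + gamma * money[buy].sum()         # buyers spend a fraction gamma
    supply = eps + gamma * stocks[~buy].sum()       # sellers offer a fraction gamma
    p = demand / supply                             # price p(t) = delta(t)/zeta(t)
    # wealth update: money spent becomes stocks at price p and vice versa
    stocks[buy] += gamma * money[buy] / p
    money[buy] *= 1 - gamma
    money[~buy] += gamma * stocks[~buy] * p
    stocks[~buy] *= 1 - gamma
    r = np.log(p / prices[-1])
    signs.append(1 if r >= 0 else 0)                # feed the return sign back in
    prices.append(p)

returns = np.diff(np.log(prices))
print(len(returns))  # → 200
```

Even this stripped-down version reproduces the self-referential loop from returns to information states that drives the model's endogenous dynamics.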

LM model
The model by Lux and Marchesi (1999) consists of N speculators who adhere either to chartist or to fundamentalist strategies. At each point in time, the numbers of chartists and fundamentalists are denoted by n_c and n_f respectively, i.e., N = n_c + n_f. Furthermore, the model distinguishes between optimistic and pessimistic chartists. Thus, n_c = n_+ + n_− with n_+ optimists and n_− pessimists.
The dynamics of the model is formulated in continuous time and consists of the following switching dynamics:

1. Chartists switch between being optimistic and pessimistic. This decision is based on the price change ṗ = dp/dt, approximated as (p_t − p_{t−∆t})/∆t in simulations, and the opinion index x = (n_+ − n_−)/n_c ∈ [−1, 1] capturing the prevailing population opinion. Chartists then switch with rate π_{+−} from the pessimistic to the optimistic camp and with rate π_{−+} in the reverse direction. Formally, the switching rates are given as
$$\pi_{+-} = \nu_1 \frac{n_c}{N}\, e^{U_1}, \qquad \pi_{-+} = \nu_1 \frac{n_c}{N}\, e^{-U_1},$$
with U_1 = α_1 x + α_2 ṗ/ν_1 and parameters ν_1, α_1 and α_2.

2. Speculators also switch between fundamentalist and chartist strategies. Here, the decision is based on the excess profits gained by chartists, (r + ṗ/ν_2)/p − R, and the difference between the realized price p and the fundamental price p_f. In this context, r denotes nominal dividends and R average real returns from other investments. It is assumed that these are priced at fair value, i.e., r/p_f = R. Switching rates are now given by
$$\pi_{+f} = \nu_2 \frac{n_+}{N}\, e^{U_2}, \qquad \pi_{f+} = \nu_2 \frac{n_f}{N}\, e^{-U_2}, \qquad U_2 = \alpha_3 \left( \frac{r + \dot{p}/\nu_2}{p} - R - s \left| \frac{p_f - p}{p} \right| \right),$$
and analogously for switches between pessimists and fundamentalists with the sign of the profit term reversed.

Prices are then determined from the excess demand of chartists ED_c = (n_+ − n_−)t_c, i.e., optimists buy and pessimists sell a fixed number t_c of assets, and of fundamentalists ED_f = n_f γ(p_f − p), who react to over- or underpricing with intensity γ. The auctioneer then observes the total demand ED = ED_c + ED_f with a small amount of noise µ and adjusts prices up or down by ∆p = ±0.01 with rates
$$\pi_{\uparrow p} = \max\big(0, \beta(ED + \mu)\big), \qquad \pi_{\downarrow p} = -\min\big(0, \beta(ED + \mu)\big),$$
and reaction speed β.
The model is then simulated by discretizing the continuous-time jump dynamics, i.e., transition rates are rescaled as √∆t π. Furthermore, to avoid the absorbing boundaries, transitions are prevented whenever n_+, n_− or n_f would fall below a minimum of 4 speculators. Overall, following Lux and Marchesi (2000) we have used the following parameters:¹ N = 500, ν_1 = 3, ν_2 = 2, β = 60, T_c (≡ N t_c) = 10, T_f (≡ N γ) = 5, α_1 = 0.6, α_2 = 0.2, α_3 = 0.5, p_f = 10, r = 0.004, R = 0.0004, s = 0.75 and µ ∼ N(0, 0.05). As before, we simulated the model for 50,000 time steps and discarded an initial transient, retaining only the last 16,000 returns.

1 In order to obtain volatility clustering, we had to increase β ten-fold compared to the original reference. This adjustment was stable across three independent implementations of the model in Python, Haskell and C++, which all produced identical results.
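A single discretized opinion-switching step of the chartists can be sketched as follows. The exponential form of the rates follows the description above, while the time step dt and the binomial update of the two camps are simplifications of our own.

```python
import numpy as np

rng = np.random.default_rng(1)

def opinion_step(n_plus, n_minus, pdot, N=500, nu1=3.0, a1=0.6, a2=0.2, dt=0.01):
    """One discretized chartist opinion-switching step (sketch).

    Each pessimist turns optimist with probability pi_pm * dt and each
    optimist turns pessimist with probability pi_mp * dt; treating
    rate * dt as a probability is an approximation valid for small dt.
    """
    n_c = n_plus + n_minus
    x = (n_plus - n_minus) / n_c                  # opinion index in [-1, 1]
    U1 = a1 * x + a2 * pdot / nu1
    pi_pm = nu1 * (n_c / N) * np.exp(U1)          # pessimist -> optimist rate
    pi_mp = nu1 * (n_c / N) * np.exp(-U1)         # optimist -> pessimist rate
    up = rng.binomial(n_minus, min(1.0, pi_pm * dt))
    down = rng.binomial(n_plus, min(1.0, pi_mp * dt))
    return n_plus + up - down, n_minus - up + down

n_plus, n_minus = 250, 250
for _ in range(100):
    n_plus, n_minus = opinion_step(n_plus, n_minus, pdot=0.0)
print(n_plus + n_minus)  # → 500, switching conserves the number of chartists
```

The self-reinforcing feedback is visible here: a majority of optimists (x > 0) raises the rate of further switches into the optimistic camp.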

AF model
Aymanns & Farmer consider an agent-based model illustrating the interplay between a leveraged bank B and a noise trader N investing into a single stock. The bank strives to maintain a target leverage λ̄(t) = α(σ(t)² + σ_0²)^b based on its perceived riskiness of the stock.² To this end, the bank adjusts its stock holdings by ∆B(t) = λ̄(t)(A(t) − L(t)) − A(t). Here, A(t) = n(t)p(t)/w_B and L(t) denote the value of its assets and liabilities, respectively. Note that the assets are derived from the bank targeting a portfolio weight w_B and owning a fraction n(t) of the stock priced at p(t) at time t. The model is then completed with a dynamics for the perceived risk and balance-sheet constraints for the liabilities and the stock ownership fraction (the price p(t+1) follows from market clearing between bank and noise trader):
$$\sigma^2(t+1) = (1 - \delta)\,\sigma^2(t) + \delta \left( \log \frac{p(t)}{p(t-1)} \right)^2,$$
$$L(t+1) = L(t) + \Delta B(t),$$
$$n(t+1) = \frac{w_B \big( n(t)\,p(t+1) + c_B(t) + \Delta B(t) \big)}{p(t+1)}.$$
Note that this dynamics is mainly driven by the desired change ∆B(t) in the bank's balance sheet. Indeed, if ∆B(t) > 0 the bank borrows additional funds for investing into the stock market. If ∆B(t) < 0 the bank sells assets and pays back some of its liabilities. The stock ownership fraction n(t) and the stock price p(t) then adjust to the combined demand of bank and noise trader. Here it is assumed that the portfolio weight w_N(t) of the noise trader follows a mean-reverting random walk
$$w_N(t+1) = w_N(t) + \rho\,(0.5 - w_N(t))\,w_N(t) + \eta\, w_N(t)\, dW,$$
where ρ determines the speed of mean reversion and η denotes the noise standard deviation. In turn, the cash holdings c_B(t) and c_N(t) of the bank and the noise trader are updated such that cash is redistributed in order to maintain an equity target E_0 for the bank.³
In particular, as detailed in Aymanns and Farmer (2015), this ensures that equity in the system is conserved.
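The leverage-targeting rule at the heart of the model can be illustrated in isolation. The parameter values below are ours and purely illustrative; the point is the sign of the response of ∆B(t) to perceived risk.

```python
def bank_adjustment(n, p, L, sigma2, alpha=0.1, sigma0_sq=1e-6, b=-0.5, w_B=0.05):
    """Desired balance-sheet change Delta B(t) of the leverage-targeting bank.

    b < 0 corresponds to the procyclical 'value-at-risk' style policy:
    low perceived risk raises the leverage target. All numbers here are
    illustrative, not the paper's calibration.
    """
    A = n * p / w_B                                  # assets implied by portfolio weight
    target_leverage = alpha * (sigma2 + sigma0_sq) ** b
    E = A - L                                        # bank equity E(t) = A(t) - L(t)
    return target_leverage * E - A                   # Delta B(t)

# Lower perceived risk -> higher target leverage -> the bank borrows and buys.
calm = bank_adjustment(n=0.3, p=50.0, L=100.0, sigma2=1e-6)
stressed = bank_adjustment(n=0.3, p=50.0, L=100.0, sigma2=1e-3)
print(calm > stressed)  # → True
```

This is the destabilizing channel the AF indicator targets: in calm markets the bank levers up, so low perceived risk today seeds the next deleveraging spiral.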

2 Aymanns & Farmer also discuss a policy rule for adapting α(t). Since this does not substantially change the qualitative dynamics of the model we stick with α(t) ≡ α for simplicity.

3 The bank's equity at time t is given as E(t) = A(t) − L(t).