Are there Dragon Kings in the Stock Market?

We undertake a systematic study of historic market volatility spanning roughly the five preceding decades. We focus specifically on the time series of realized volatility (RV) of the S&P500 index and its distribution function. As expected, the largest values of RV coincide with the largest economic upheavals of the period: Savings and Loan Crisis, Tech Bubble, Financial Crisis and Covid Pandemic. We address the question of whether these values belong to one of three categories: Black Swans (BS), that is, they lie on scale-free, power-law tails of the distribution; Dragon Kings (DK), defined as statistically significant upward deviations from BS; or Negative Dragon Kings (nDK), defined as statistically significant downward deviations from BS. In analyzing the tails of the distribution with RV > 40, we observe the appearance of "potential" DK, which eventually terminate in an abrupt plunge to nDK. This phenomenon becomes more pronounced as the number of days over which the average RV is calculated increases, here from daily, n = 1, to "monthly," n = 21. We fit the entire distribution with a modified Generalized Beta (mGB) distribution function, which terminates at a finite value of the variable but exhibits a long power-law stretch prior to that, as well as with a Generalized Beta Prime (GB2) distribution function, which has a power-law tail. We also fit the tails directly with a straight line on a log-log scale. In order to ascertain BS, DK or nDK behavior, all fits include their confidence intervals, and p-values are evaluated for the data points to check whether they can come from the respective distributions.


Introduction
Realized volatility RV is the square root of realized variance, which is defined as follows

$$RV^2 = 100^2 \times 252 \times \overline{r^2}, \tag{1}$$

where

$$\overline{r^2} = \frac{1}{n} \sum_{i=1}^{n} r_i^2 \tag{2}$$

is the average realized variance over n days and

$$r_i = \ln \frac{S_i}{S_{i-1}} \tag{3}$$

are the daily returns with $S_i$ being the reference (closing) price on day i. This is an annualized value, where 252 represents the number of trading days in a year. In particular, n = 1 represents daily returns and n = 21, being a typical number of trading days in a month, is useful for evaluating monthly RV. We point out, however, that in our calculation n is simply a number of consecutive trading days that can fall on different weeks and months. Specifically, we performed our analysis for n = 1, 2, 3, 5, 7, 9, 13, 17, 21. Here we present results for n = 1, 7, 21, which already succinctly illustrate the changes in the RV distribution with n.
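The definition above can be illustrated with a short sketch. It assumes the standard form RV = 100 × sqrt(252 × mean of squared daily log returns), expressed in percent, and it assumes overlapping (rolling) windows of n consecutive trading days; the text does not say whether windows overlap, so that choice is ours. The function name `realized_volatility` and the synthetic prices are hypothetical.

```python
import math

def realized_volatility(closes, n):
    """Annualized RV (in percent) for every window of n consecutive trading days."""
    # Daily log returns r_i = ln(S_i / S_{i-1})
    returns = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    rv = []
    # Assumption: overlapping windows, one per starting day
    for start in range(len(returns) - n + 1):
        window = returns[start:start + n]
        mean_sq = sum(r * r for r in window) / n  # average daily realized variance
        rv.append(100.0 * math.sqrt(252.0 * mean_sq))
    return rv

# Synthetic closing prices with a constant 1% daily log return (illustration only)
closes = [100.0 * math.exp(0.01 * i) for i in range(30)]
rv21 = realized_volatility(closes, 21)
```

With a constant 1% daily log return every window gives the same value, 100 × sqrt(252) × 0.01, or roughly 15.9, which is a convenient sanity check of the annualization.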
Since it is based on actual trades, realized volatility (RV) is the ultimate measure of market volatility, although the latter is more often associated with implied volatility, most commonly measured by the VIX index [1,2] (the so-called market "fear index"), which tries to predict RV of the S&P500 index for the following month. Its model-independent evaluation [3] is based on options contracts, which are meant to predict future stock price fluctuations [4]. The question of how well VIX predicts future realized volatility has been of great interest to researchers [5,6,7,8]. Recent results [9,10] show that VIX is only marginally better than past RV in predicting future RV. In particular, it underestimates future low volatility and, most importantly, future high volatility. In fact, while both RV and VIX exhibit scale-free power-law tails, the distribution of the ratio of RV to VIX also has a power-law tail with a relatively small power exponent [9,10], meaning that VIX is incapable of predicting large surges in volatility.
It should be emphasized that RV is agnostic with respect to gains or losses in stock returns. Nonetheless, large gains and losses tend to occur at around the same time. Here we wish to address the question of whether the largest values of RV fall on the power-law tail of the RV distribution. As is well known, the largest upheavals in the stock market happened on, and close to, Black Monday, which was a precursor to the Savings and Loan Crisis, the Tech Bubble, the Financial Crisis and the COVID Pandemic. Plotted on a log-log scale, power-law tails of a distribution show as a straight line. If the largest RV fall on the straight line, they can be classified as Black Swans (BS). If, however, they show statistically significant deviations upward or downward from this straight line, they can be classified as Dragon Kings (DK) [11,12] or negative Dragon Kings (nDK) [13], respectively.
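The statement that a power-law tail appears as a straight line on a log-log scale can be checked numerically. The sketch below builds an empirical CCDF from a synthetic Pareto sample (not market data) and estimates the log-log slope by ordinary least squares; the helper names `empirical_ccdf` and `loglog_slope` are hypothetical, and plain least squares is used only for illustration, not as the fitting procedure of the paper.

```python
import math
import random

def empirical_ccdf(values):
    """Return the sorted values and, for each, the fraction of samples >= it."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(n - i) / n for i in range(n)]

def loglog_slope(xs, ccdf):
    """Ordinary least-squares slope of log10(CCDF) versus log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(c) for c in ccdf]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

random.seed(0)
# Synthetic sample whose CCDF is x^(-3) for x >= 1 (illustration only)
sample = [random.paretovariate(3.0) for _ in range(20000)]
xs, ccdf = empirical_ccdf(sample)
slope = loglog_slope(xs, ccdf)  # close to -3 for this sample
```

Points that sit significantly above or below such a fitted line at the far end of the tail are the DK and nDK candidates discussed in the text.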
The main result of this paper is that the largest values of RV are in fact nDK. We find that daily returns are the closest to the BS behavior. However, with the increase of n we observe the development of "potential" DK with statistically significant deviations upward from the straight line. This trend terminates with the data points returning to the straight line and then abruptly plunging into nDK territory.
To gain further insight into this phenomenon, we start in Sec. 2 with the time series of RV from 1970 to 2021, including expanded views of the aforementioned periods of market upheavals. In Sec. 3 we give analytical expressions of the two distribution functions used to fit the entire RV distribution: modified Generalized Beta (mGB), which is discussed in great detail in a companion paper [14], and Generalized Beta Prime (GB2), which is essentially a limiting case of mGB and is chosen because it has power-law tails. mGB is chosen because it exhibits a long stretch of power-law dependence before dropping off and terminating at a finite value of the variable, thus mimicking the nDK behavior of RV [14]. Additionally, both mGB and GB2 emerge as steady-state distributions of a stochastic differential equation for stochastic volatility [14]. In Sec. 4 we describe fits of RV with mGB and GB2 and give a detailed description of the tails, specifically with regard to possible DK/nDK. Towards this end we also use a linear fit (LF) of the tails. For all three fits, we provide confidence intervals [15] and, more importantly, the results of a U-test [13], which evaluates a p-value for the null hypothesis that a data point comes from a fitting distribution. Sec. 5 is a discussion of the results obtained in Sec. 4.

Time Series of Realized Volatility
Fig. 1 shows the time series of RV for n = 1, based on daily returns, n = 7, and n = 21, where n is the number of days over which daily RV is averaged in (1) and (2). Only values with RV > 17 are shown and black dots mark values RV > $10^{1.75} \approx 56$. It is clear that the time series progression with the increase of n is towards a more pronounced amplification of singularly important events, such as periods corresponding to Black Monday, Financial Crisis and COVID Pandemic, even though the maximum values of RV understandably decrease in the same progression as averaging is taken over a larger number of days n. While such progression naturally leads to a question of whether these events might belong to the DK category, we shall see in what follows that they are actually nDK.

Generalized Beta Distribution Function
A companion paper [14] discusses in great detail the modified Generalized Beta distribution function (mGB) used here to fit the distributions of RV. A generalization of the traditional GB [16] can be written, in a slightly modified form relative to that of [14], as Eq. (4), where $\beta_1$ and $\beta_2$ are scale parameters and α, p and q are shape parameters, all positive, B(p, q) is the beta function and $x \le \beta_1$. Although it has a concise and transparent form, it does not come out as a solution of a stochastic differential equation (SDE) [17], which is desirable for the purpose of modeling the behavior of quantities, such as stochastic volatility, that are important for understanding RV [18]. The probability density function (PDF) of mGB, which comes out as a solution of an SDE (with minor caveats explained in [14]) and which is used here to model the RV distribution, is given by Eq. (5). The cumulative distribution function (CDF) and complementary CDF (CCDF) of mGB are given, respectively, by Eqs. (6) and (7), where the first terms in (6) and (7) represent, respectively, the CDF and CCDF of GB (whose PDF is given by (4)), while I(y; p, q) = B(y; p, q)/B(p, q) and B(y; p, q) are, respectively, the regularized and incomplete beta functions [19].
In what follows, we will be specifically interested in the $\beta_2 \ll \beta_1$ circumstance since, for $\beta_2 \ll x \ll \beta_1$, GB and mGB exhibit a power-law dependence, Eq. (8). In the limit of $\beta_1 \to \infty$, mGB and GB become, respectively, mGB2 and GB2 (the latter also known as Generalized Beta Prime), whose explicit forms are given in [14]. Unlike mGB and GB, for which the power-law dependences in (8) eventually terminate at $\beta_1$, mGB2 and GB2 sustain these power-law dependences indefinitely.
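The indefinite power-law tail of GB2 can be seen numerically. The sketch below uses the textbook GB2 density, f(x) = α (x/β)^(αp−1) / (β B(p, q) (1 + (x/β)^α)^(p+q)), which is our assumption about the parameterization; the notation of [14] may differ, for example in which scale parameter plays the role of β. The parameter values are illustrative and are not fitted values. For x much larger than β the log-log slope of this density approaches −(αq + 1).

```python
import math

def gb2_pdf(x, alpha, beta, p, q):
    """Textbook GB2 density, evaluated in log space for numerical stability."""
    log_beta_fn = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)  # ln B(p, q)
    z = x / beta
    log_pdf = (math.log(alpha) + (alpha * p - 1.0) * math.log(z)
               - math.log(beta) - log_beta_fn
               - (p + q) * math.log1p(z ** alpha))
    return math.exp(log_pdf)

def local_loglog_slope(f, x, step=1.01):
    """Finite-difference estimate of d ln f / d ln x at x."""
    return (math.log(f(x * step)) - math.log(f(x))) / math.log(step)

# Illustrative parameters (assumed, not taken from the fits)
alpha, beta, p, q = 2.0, 1.0, 1.5, 1.2
slope = local_loglog_slope(lambda x: gb2_pdf(x, alpha, beta, p, q), 1e4)
expected = -(alpha * q + 1.0)  # asymptotic power-law exponent of the density
```

Here `slope` and `expected` agree closely, since at x = 10^4 the density is already deep in its power-law regime.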
Below, we will use (7) to fit the CCDF of the distributions of RV. As explained in [14], mGB2 and GB2 are equivalent since q and p are independently defined at this level of the GB family of distributions and q can be shifted by unity in the definition of mGB2/GB2. Consequently, we choose the more familiar CCDF of GB2 to fit the CCDF of the RV data. The main difference between mGB and GB in the present context is their behavior near $\beta_1$ [14]. Namely, $1 - F_{mGB}$ drops off to zero ($F_{mGB}$ saturates to unity) faster than $1 - F_{GB}$ due to the factor $(\beta_2/\beta_1)^{\alpha}$. This feature accounts for a better fit via mGB versus GB, which may be due to the fact that mGB emerges from a physically motivated stochastic model [14].

Fitting Distribution of Realized Volatility
We fit the CCDF of the full RV distribution, for the entire time span discussed in Sec. 2, using mGB (7) and GB2 (11). The fits are shown on the log-log scale in Figs. 4-13, together with the linear fit (LF) of the tails with RV > 40. LF excludes the end points, as prescribed in [13], that visually may be nDK candidates. (In order to mimic LF we also excluded those points in GB2 fits, which has minimal effect on GB2 fits, including the slope and KS statistic.) To make the progression of the fits as a function of n clearer, we included results for n = 5 and n = 17, in addition to n = 1, 7, 21 that we used in Sec. 2. Confidence intervals (CI) were evaluated per [15], via inversion of the binomial distribution. p-values were evaluated in the framework of the U-test, which is discussed in [13] and is based on order statistics:

$$p = 1 - I\left(F(x_{k,n});\, k,\, n-k+1\right),$$

where $x_{k,n}$ is the k-th of n values ordered by increasing magnitude (RV values in this case), and $F(x_{k,n})$ is the assumed CDF (mGB, GB2 and LF here).
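A minimal sketch of an order-statistics p-value of this kind follows. It uses the standard result that, for n independent draws from a CDF F, the probability that the k-th smallest exceeds x is the probability that fewer than k draws fall at or below x, which equals 1 − I(F(x); k, n − k + 1). Whether [13] writes the test in exactly this form is our assumption, as is the function name `u_test_p_value`. Small values flag points that are too large for the assumed distribution (DK) and values near one flag points that are too small (nDK), matching the thresholds p < 0.05 and p > 0.95 used in the text.

```python
import math

def u_test_p_value(F_x, k, n):
    """P(X_{k:n} > x) for the k-th smallest of n i.i.d. draws, given F_x = F(x).

    Evaluated as a binomial sum in log space so that large n does not overflow.
    """
    if F_x <= 0.0:
        return 1.0
    if F_x >= 1.0:
        return 0.0
    log_f, log_1mf = math.log(F_x), math.log1p(-F_x)
    total = 0.0
    for j in range(k):  # exactly j of the n draws fall at or below x, with j < k
        log_term = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                    + j * log_f + (n - j) * log_1mf)
        total += math.exp(log_term)
    return min(1.0, total)

# Largest of n = 1000 points (k = n), with two hypothetical values of F(x)
p_large = u_test_p_value(0.999999, 1000, 1000)  # point far out in the tail
p_small = u_test_p_value(0.99, 1000, 1000)      # maximum that is too small
```

In this example `p_large` is below 0.05 (a DK candidate) and `p_small` is above 0.95 (an nDK candidate); for k = n the sum reduces to 1 − F(x)^n, which gives a quick cross-check.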
For each n, from top to bottom, Figs. 4-13 are organized as follows:
• Full data CDF fit with mGB and GB2 and LF of the tails;
• Same as above shown for RV > 40;
• p-values of all three fits for RV > 40, with p < 0.05 indicating DK and p > 0.95 nDK;
• LF with its CI;
• GB2 fit with its CI;
• mGB fit with its CI.
In the CI plots, upward pointing triangles indicate p-values consistent with DK, while downward pointing triangles indicate p-values consistent with nDK [13]. Fig. 14 shows LF for n = 7, 17, 21, where the last 10% of the range of values were excluded, that is, values greater than 0.9 max{RV}, as opposed to excluding points visually as in Figs. 4-13. Fig. 15 shows LF and GB2 slopes and the Kolmogorov-Smirnov (KS) statistic for GB2 and mGB as a function of n; the horizontal line with the table value of the KS statistic for our sample size [20] is shown for guidance only, since mGB and GB2 here are distributions with estimated parameters.
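For reference, the KS statistic mentioned above is the largest distance between the empirical CDF of the sample and the model CDF. The sketch below computes it for an arbitrary CDF passed as a function; the name `ks_statistic` and the toy sample are hypothetical, and the uniform CDF stands in for a fitted mGB or GB2 CDF.

```python
def ks_statistic(sample, cdf):
    """Largest distance between the empirical CDF of `sample` and a model CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so check both sides
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

# Toy sample compared with the uniform CDF on [0, 1] (stand-in for a fitted CDF)
d_uniform = ks_statistic([0.1, 0.4, 0.7], lambda x: min(max(x, 0.0), 1.0))
```

For this toy sample the largest gap is 0.3, reached just after the last point. As the text notes, tabulated critical values apply strictly to fully specified distributions, so with estimated parameters they serve as guidance only.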

Discussion
While the standard search for Dragon Kings involves performing a linear fit of the tails of the distribution [13,15], here we tried to broaden our analysis by also fitting the entire distribution using mGB (7) and GB2 (11), two members of the Generalized Beta family of distributions [14,16]. As explained in the paragraph that follows (7), the central feature of mGB is that, after exhibiting a long power-law dependence, it eventually terminates at a finite value of the variable. GB2, on the other hand, has a power-law tail that extends mGB's power-law dependence to infinity.
The key to understanding the results of the fits in Sec. 4 is the analysis of the structure of RV used by the markets: a square root of realized variance (1). At its core is the average of the consecutive daily realized variances (2). The distribution of daily realized variance can be modeled using a pair of stochastic differential equations (for stock returns and for stochastic volatility), which produces distributions of daily variance such as mGB [14] and GB2 [18]. Via a simple change of variable, daily RV would then follow the same distributions but with renormalized parameters.
Even assuming knowledge of the distribution of daily realized variance, finding the distribution of the averages constitutes a daunting task. To begin with, using convolution to evaluate the distribution of a sum of just two variables with such complex distributions as mGB and GB2 is already not amenable to analytical evaluation. To complicate things further, the consecutive daily RV cannot be treated as independent identically distributed (i.i.d.) variables due to the correlations that persist up to roughly 5-7 days [10].
With the above in mind, we first address Figs. 4-13. According to Figs. 4 and 5, daily RV appears to be the closest to the Black Swan behavior, as both LF and GB2 approximate the tail of the distribution better than mGB, and LF does not point to the existence of either DK, p < 0.05, or nDK, p > 0.95. The n = 1 behavior undergoes a dramatic change with the increase of n, as seen in Figs. 6-13, where we observe that the "potential" DK, p < 0.05, first develop at the earlier portions of the tails, only to terminate in nDK at the tail ends.
Generally speaking, the existence of a large number of "potential" DK in the tail of the distribution indicates that the distribution is not describing the tail adequately. This becomes pronounced for large n for all three fits (LF, GB2 and mGB), although less so for LF, which also does not exhibit "potential" DK for small n. However, if we adopt a different procedure for LF, whereby instead of visually excluding nDK candidates at the tail end we exclude the values whose RV is greater than 0.9 of the maximum RV, we observe in Fig. 14 that it has little effect on LF for small n but all but eliminates "potential" DK.
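The alternative exclusion rule can be stated in a few lines. The sketch below selects the tail points that would enter the linear fit: values above the tail threshold (RV > 40 in the text) but not above 0.9 of the maximum. The function name `tail_for_linear_fit`, its defaults and the sample values are hypothetical and serve only to make the rule concrete.

```python
def tail_for_linear_fit(values, lower=40.0, max_fraction=0.9):
    """Keep tail values above `lower` that do not exceed max_fraction * max(values)."""
    cutoff = max_fraction * max(values)
    return sorted(v for v in values if lower < v <= cutoff)

# Hypothetical RV values: the cutoff is 0.9 * 100 = 90, so 95 and 100 are excluded
kept = tail_for_linear_fit([12.0, 45.0, 60.0, 80.0, 95.0, 100.0])
```

Unlike visual exclusion, this rule is reproducible, at the cost of tying the number of excluded points to the single largest observation.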
For large n we also observe that mGB approximates the tail end better than GB2, consistent with smaller KS values in Fig. 15 and a smaller number of nDK. However, neither approximates the preceding portion of the tail well, as indicated by the "potential" DK. This has to do with the fact that neither of the distributions appears as a solution of a first-principle model describing average RV. Finally, in the first plot in Fig. 15, we observe that after roughly 5-7 days the slope of the GB2 tail saturates, consistent with the correlation range of daily RV [10]. The slope of LF, on the other hand, increases with n. However, neither is consistent with a naive assumption of the distribution having the same slope as that of the daily RV.
In conclusion, we showed that for daily returns the distribution of realized volatility likely has a power-law tail, consistent with the Black Swan behavior. Multi-day realized volatility develops a strong negative Dragon King signature as the number of days involved in the averaging of daily realized variances increases. The breadth and strength of the S&P index analyzed here may be a contributing factor in suppressing the runaway power-law behavior. A natural extension of this work will be the analysis of gains and losses of stock returns, as well as of other large data sets that call into question possible power-law tails, such as incomes.

Figs. 2 and 3 give snapshots of the time series in Fig. 1 around the largest volatility events: Fig. 2 is based on daily returns, n = 1, and Fig. 3 is for n = 7 and n = 21. Based on Figs. 1-3, Black Monday was clearly the most singular volatility event, while the Financial Crisis and COVID Pandemic were distinguished by more prolonged periods of sustained extraordinarily large RV.

Figure 2: Snapshots of n = 1 time series in Fig. 1 around Black Monday, Tech Bubble, Financial Crisis and Covid Pandemic.

Figure 3: Snapshots of n = 7 (top two) and n = 21 (bottom two) time series in Fig. 1 for the same periods as in Fig. 2.

Figure 7: Same as the middle panel of Fig. 6 with the respective CI, "potential" DK (up triangles) and nDK (down triangles).

Figure 9: Same as the middle panel of Fig. 8 with the respective CI, "potential" DK (up triangles) and nDK (down triangles).

Figure 15: As a function of n: slope values of linear (stars) and GB2 (squares) fits (left); KS statistic values of mGB (stars) and GB2 (squares) fits (right).