A Hypothesis Test for the Goodness-of-Fit of the Marginal Distribution of a Time Series with Application to Stablecoin Data

Presented at the 7th International Conference on Time Series and Forecasting, Gran Canaria, Spain, 19–21 July 2021.

Abstract: A bootstrap-based hypothesis test of the goodness-of-fit for the marginal distribution of a time series is presented. Two metrics, the empirical survival Jensen–Shannon divergence (ESJS) and the Kolmogorov–Smirnov two-sample test statistic (KS2), are compared on four data sets: three stablecoin time series and a Bitcoin time series. We demonstrate that, after applying first-order differencing, all the data sets fit heavy-tailed α-stable distributions with 1 < α < 2 at the 95% confidence level. Moreover, ESJS is more powerful than KS2 on these data sets, since the widths of the derived confidence intervals for KS2 are, proportionately, much larger than those for ESJS.


Introduction
The empirical survival Jensen-Shannon divergence (ESJS) has recently been proposed as a goodness-of-fit measure for a fitted parametric continuous distribution [1]. However, the important question of how to test whether an observed ESJS value is statistically significant was left open.
To address this, we propose a hypothesis test based on the parametric bootstrap [2,3] and evaluate it on time series data [4,5]. As a proof of concept, we chose four cryptocurrency time series: three stablecoin [6] data sets and, for reference, a Bitcoin [7] data set. The stablecoins we chose maintain their "stability" by being pegged to the US dollar, so one would expect their volatility to be low. Apart from the general interest in cryptocurrency time series, Bitcoin data have already been shown to be heavy-tailed [8]; since stablecoins are designed to be stable, demonstrating that they also exhibit heavy tails is of interest in its own right. One reason to experiment with heavy-tailed distributions, such as the α-stable distribution [9] (or simply the stable distribution) employed herein, is that they pose additional problems compared with, say, the normal distribution (the special case α = 2), because their variance is infinite (whenever α < 2).
The rest of the paper is organised as follows. In Section 2, we introduce the ESJS and, for comparison, the well-known Kolmogorov-Smirnov two-sample test statistic (KS2) ([10], Section 6.3). In Section 3, we present a goodness-of-fit hypothesis test based on the parametric bootstrap. Time series do not necessarily comprise independent and identically distributed (iid) random variables (as is assumed in, say, [11]), so more general models, such as autoregressive models (as used in, say, [12]), are more appropriate for generating time series bootstrap samples. Here, we assume an autoregressive process of order one [4,5], abbreviated to AR(1), with α-stable innovations, as in [13,14]. In Section 4, we introduce the cryptocurrency time series used in our experiments and, after applying first-order differencing to the raw data to obtain stationary processes, fit them to a stable distribution. In particular, we show that in all cases α < 2, that is, the data are not normally distributed. In Section 5, we apply the goodness-of-fit hypothesis test of Section 3 to the cryptocurrency time series described in Section 4 and discuss the results. Finally, in Section 6, we give our concluding remarks. All computations were carried out using Matlab.

Empirical Survival Jensen-Shannon Divergence
To set the scene, we assume a time series $x = \{x_1, x_2, \ldots, x_n\}$, where $x_t$, for $t = 1, 2, \ldots, n$, is a value indexed by time $t$, for example, modelling the movement of a stock price. More specifically, a time series of $n$ values is a random sample generated by a stochastic process, that is, a sequence of random variables $X = X_1, X_2, \ldots, X_n$, where each value $x_t$ is a realisation of the random variable $X_t$. The stochastic process $X$ may be a sequence of iid random variables but, more often than not, and more realistically, a time series exhibits temporal dependencies between its values. We also assume that the time series is stationary [4,5]. This makes sense in our context, since stationarity ensures that all the $X_t$ share a common marginal distribution, and we are particularly interested in the marginal distribution of $x$, which we suppose comes from an underlying parametric continuous distribution $D$.
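The empirical survival function of a value $z$ for the time series $x$, denoted by $S(x)[z]$, is given by

$$S(x)[z] = \frac{1}{n} \sum_{t=1}^{n} I(x_t > z),$$

where $I$ is the indicator function. In the following, we let $\widehat{P}(z) = S(x)[z]$ stand for the empirical survival function $S(x)[z]$, where the time series $x$ is understood from the context; we are generally interested in the empirical survival function $\widehat{P}$, which we suppose arises from the survival function $P$ of the parametric continuous distribution $D$ mentioned above. The empirical survival Jensen-Shannon divergence (ESJS) [1] between two empirical survival functions, $\widehat{Q}_1$ and $\widehat{Q}_2$, arising from the survival functions $Q_1$ and $Q_2$, is given by

$$\mathrm{ESJS}(\widehat{Q}_1, \widehat{Q}_2) = \frac{1}{2} \int_{-\infty}^{\infty} \left( \widehat{Q}_1(z) \log \frac{\widehat{Q}_1(z)}{\widehat{M}(z)} + \widehat{Q}_2(z) \log \frac{\widehat{Q}_2(z)}{\widehat{M}(z)} \right) dz,$$

where

$$\widehat{M}(z) = \frac{1}{2} \left( \widehat{Q}_1(z) + \widehat{Q}_2(z) \right),$$

and $0 \log 0 = 0$ by convention. We note that the ESJS is bounded and can thus be normalised, so we may assume that its values lie between 0 and 1; in particular, its value is zero when $\widehat{Q}_1 = \widehat{Q}_2$. Moreover, its square root is a metric (cf. [1]).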
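To make the definition concrete, the following Python sketch evaluates the empirical survival function and approximates the ESJS between two survival functions sampled on a common grid. It is not the authors' Matlab implementation: the finite grid, the trapezoidal rule, the use of natural logarithms and the absence of normalisation are all assumptions of this sketch, and the function names `empirical_survival` and `esjs` are hypothetical. For two empirical survival functions the integrand vanishes outside the pooled data range, so a grid covering that range loses nothing; for a fitted continuous survival function the grid truncates the tails.

```python
import numpy as np


def empirical_survival(x, z):
    """Fraction of the values in x that are strictly greater than each point of z."""
    x_sorted = np.sort(np.asarray(x, dtype=float))
    z = np.asarray(z, dtype=float)
    # searchsorted(..., side="right") counts the values <= z; the rest exceed z.
    n_greater = x_sorted.size - np.searchsorted(x_sorted, z, side="right")
    return n_greater / x_sorted.size


def _weighted_log_ratio(q, m):
    """Pointwise q * log(q / m), using the convention 0 * log 0 = 0."""
    out = np.zeros_like(q, dtype=float)
    pos = q > 0  # where q > 0 we also have m >= q / 2 > 0, so the ratio is safe
    out[pos] = q[pos] * np.log(q[pos] / m[pos])
    return out


def esjs(q1, q2, z):
    """Survival Jensen-Shannon divergence of two survival functions sampled on grid z.

    Assumptions of this sketch: trapezoidal rule on a finite grid, natural
    logarithm, and no normalisation to [0, 1].
    """
    q1 = np.asarray(q1, dtype=float)
    q2 = np.asarray(q2, dtype=float)
    z = np.asarray(z, dtype=float)
    m = 0.5 * (q1 + q2)
    integrand = 0.5 * (_weighted_log_ratio(q1, m) + _weighted_log_ratio(q2, m))
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(z)))


# Usage: two small samples, with the grid covering both supports.
z = np.linspace(-5.0, 15.0, 4001)
p = empirical_survival([1.0, 2.0, 3.0], z)
q = empirical_survival([6.0, 7.0, 8.0], z)
print(esjs(p, q, z), esjs(p, p, z))
```

The divergence is zero for identical inputs and symmetric in its two arguments, as the definition requires.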
For completeness, we provide the definition of the Kolmogorov-Smirnov two-sample test statistic ([10], Section 6.3) between $\widehat{Q}_1$ and $\widehat{Q}_2$ as above, which is given by

$$\mathrm{KS2}(\widehat{Q}_1, \widehat{Q}_2) = \max_{z} \left| \widehat{Q}_1(z) - \widehat{Q}_2(z) \right|,$$

where $\max$ is the maximum function and $|v|$ is the absolute value of a number $v$. We note that KS2 is bounded between 0 and 1, and is also a metric.

Now, for a parametric continuous distribution $D$, we let $\phi = \phi(D, \widehat{P})$ be the parameters obtained from fitting $D$ to the empirical survival function $\widehat{P}$. The distribution $D$ may, in principle, be any continuous distribution, although here we concentrate on the α-stable distribution, since it allows for the modelling of heavy-tailed data, which pose additional problems compared with light-tailed data because their variance (and possibly their mean) is infinite. In particular, we are interested in cryptocurrency data, which are likely to be heavy-tailed [8].
We now let $P_\phi = S_\phi(x)$ be the survival function of $x$ for $D$ with parameters φ. The empirical survival Jensen-Shannon divergence and the Kolmogorov-Smirnov two-sample test statistic between $\widehat{P}$ and $P_\phi$ are then $\mathrm{ESJS}(\widehat{P}, P_\phi)$ and $\mathrm{KS2}(\widehat{P}, P_\phi)$, respectively. These values provide two measures of how well $D$, with parameters φ, fits $x$ (cf. [1]).
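The following Python sketch, with hypothetical function names, evaluates $\mathrm{KS2}(\widehat{P}, P_\phi)$ on a grid. For illustration only, $D$ is taken to be the normal distribution fitted by maximum likelihood with `scipy.stats.norm.fit`, rather than the stable distribution used in this paper, and the data are synthetic Student-t draws; the evaluation grid is also an assumption, so the grid maximum only approximates the maximum over all $z$.

```python
import numpy as np
from scipy import stats


def empirical_survival(x, z):
    """Fraction of the values in x that are strictly greater than each point of z."""
    x_sorted = np.sort(np.asarray(x, dtype=float))
    n_greater = x_sorted.size - np.searchsorted(x_sorted, np.asarray(z, dtype=float), side="right")
    return n_greater / x_sorted.size


def ks2(q1, q2):
    """Maximum absolute difference between two survival functions on a common grid."""
    return float(np.max(np.abs(np.asarray(q1, dtype=float) - np.asarray(q2, dtype=float))))


# Illustration only (assumption): D is the normal distribution, phi = (loc, scale).
rng = np.random.default_rng(0)
x = rng.standard_t(df=3, size=500)         # synthetic heavy-tailed data
loc, scale = stats.norm.fit(x)             # phi = phi(D, P_hat), by maximum likelihood
z = np.linspace(x.min(), x.max(), 2001)    # evaluation grid (assumption)
p_hat = empirical_survival(x, z)           # empirical survival function P_hat
p_phi = stats.norm(loc, scale).sf(z)       # fitted survival function P_phi
print(f"KS2(P_hat, P_phi) = {ks2(p_hat, p_phi):.4f}")
```

When both arguments are empirical survival functions evaluated at all the pooled sample points, the grid maximum equals the statistic returned by `scipy.stats.ks_2samp`, since both step functions change only at those points.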

A Bootstrap-Based Goodness-of-Fit Hypothesis Test
Our hypothesis test makes use of the parametric bootstrap [2,3]; the pseudocode for the parametric bootstrap in our context is given in Algorithm 1. It takes as input a time series $x$, the distribution $D$ we hypothesise $x$ comes from, and the number of bootstrap samples $m$; in the simulations we use the typical value $m = 1000$ [15]. The algorithm outputs two vectors, BV-ESJS and BV-KS2. The first contains $m$ ESJS values, for $i = 1, 2, \ldots, m$, between the empirical survival function $\widehat{P}_i = S(B_i)$ of the $i$th bootstrap sample $B_i$ and the survival function $P_\phi = S_\phi(x)$ of $x$ for $D$ with parameters φ. Correspondingly, the second contains the $m$ KS2 values between $\widehat{P}_i$ and $P_\phi$. The bootstrap samples are generated by an AR(1) process with α-stable innovations [14] (see also [13]), which is more realistic than assuming that the samples are generated by an iid process, as in [11].
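Algorithm 1. Parametric bootstrap for the goodness-of-fit test.
Input: a time series $x$, a hypothesised distribution $D$ and the number of bootstrap samples $m$.
Output: the bootstrap vectors BV-ESJS and BV-KS2.
begin
    Let $n$ be the number of values in $x$;
    Let $\phi = \phi(D, \widehat{P})$;
    Let $P_\phi = S_\phi(x)$;
    for $i = 1$ to $m$ do
        Let $B_i$ be a time series of $n$ values, generated from an AR(1) process with innovations derived from $D$ with parameters $\phi$;
        Let $\widehat{P}_i = S(B_i)$;
        Let BV-ESJS$[i] = \mathrm{ESJS}(\widehat{P}_i, P_\phi)$ and BV-KS2$[i] = \mathrm{KS2}(\widehat{P}_i, P_\phi)$;
    end for
    return BV-ESJS and BV-KS2 sorted in ascending order;
end

As we have assumed that the time series is stationary, the absolute value $|\rho|$ of the parameter $\rho$ of the AR(1) process generating $x$ should be less than one. For the generation process, we use an estimate $\hat{\rho}$ of $\rho$; as we will see in Section 5, $|\hat{\rho}| < 1$ holds for all the data sets we employ, as required. We also add a burn-in period of 100 steps to each generated AR(1) process, which we found to be sufficient for the data sets we used.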
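The sample-generation step of Algorithm 1 can be sketched in Python as follows, assuming the AR(1) model $X_t = \rho X_{t-1} + \varepsilon_t$ with stable innovations $\varepsilon_t$, the lag-one sample autocorrelation as the estimate $\hat{\rho}$, and SciPy's `levy_stable` for drawing the innovations. How the innovation parameters are derived from φ is not spelled out above, so drawing them directly from $D$ with parameters φ is an assumption, as is SciPy's default parameterisation of `levy_stable`, which may differ from that of the fitting code [18]. The function names `estimate_rho` and `ar1_stable_sample` are hypothetical.

```python
import numpy as np
from scipy import stats


def estimate_rho(x):
    """Lag-one sample autocorrelation, used here as the estimate rho_hat of rho."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(xc[1:], xc[:-1]) / np.dot(xc, xc))


def ar1_stable_sample(n, rho, alpha, beta, loc, scale, burn_in=100, rng=None):
    """Generate one bootstrap sample B_i of length n from the AR(1) process
    X_t = rho * X_{t-1} + e_t, where e_t is drawn from a stable distribution.

    The first burn_in values are discarded so that the returned series does
    not depend noticeably on the arbitrary starting value.
    """
    if abs(rho) >= 1:
        raise ValueError("|rho| must be less than 1 for a stationary AR(1) process")
    # Assumption: the innovations use the fitted stable parameters directly.
    eps = stats.levy_stable.rvs(alpha, beta, loc=loc, scale=scale,
                                size=n + burn_in, random_state=rng)
    series = np.empty(n + burn_in)
    series[0] = eps[0]
    for t in range(1, n + burn_in):
        series[t] = rho * series[t - 1] + eps[t]
    return series[burn_in:]


# Usage with hypothetical stable parameters; rho is the Tether value in Table 4.
rng = np.random.default_rng(42)
b = ar1_stable_sample(n=365, rho=-0.3604, alpha=1.5, beta=0.0,
                      loc=0.0, scale=0.001, rng=rng)
print(len(b), estimate_rho(b))
```

Raising an error for $|\rho| \ge 1$ mirrors the stationarity requirement stated above.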
Given the bootstrap vectors BV-ESJS and BV-KS2 output by Algorithm 1, we can form confidence intervals for $\mathrm{ESJS}(\widehat{P}, P_\phi)$ and $\mathrm{KS2}(\widehat{P}, P_\phi)$ using the bootstrap percentile method ([16], Section 3.1.2), which is the simplest way to construct a bootstrap confidence interval; see [16] for improvements on the percentile method. We assume that the significance level of the hypothesis test is given as a percentage, and set it to 5%, the value we use in Section 5.
For a one-sided test, we exclude the highest 5% of the values in a bootstrap vector BV returned by Algorithm 1, whereas for a two-sided test we would exclude the lowest 2.5% and the highest 2.5% of its values. For both ESJS and KS2 only a one-sided test makes sense, since both metrics are bounded below by zero and only large values indicate a poor fit. The null hypothesis is that $\widehat{P}$ arises from $D$, that is, that the marginal distribution of $x$ is $D$, and we reject it at the 5% significance level if $\mathrm{ESJS}(\widehat{P}, P_\phi)$ or, correspondingly, $\mathrm{KS2}(\widehat{P}, P_\phi)$ is greater than the upper bound of the constructed confidence interval, depending on which goodness-of-fit measure we are employing.
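The one-sided percentile rule can be sketched in Python as follows; the function names `upper_bound` and `reject_null` are hypothetical. With the significance level given as an integer percentage, integer arithmetic determines how many of the largest bootstrap values to exclude, so for $m = 1000$ and a 5% level the upper bound is the 950th smallest value.

```python
import numpy as np


def upper_bound(bv_sorted, level_pct=5):
    """Upper end of the one-sided bootstrap percentile interval.

    bv_sorted is a bootstrap vector sorted in ascending order (as returned by
    Algorithm 1); the highest level_pct percent of its values are excluded.
    """
    m = len(bv_sorted)
    n_excluded = (m * level_pct) // 100  # e.g. 50 values when m = 1000
    return float(bv_sorted[m - n_excluded - 1])


def reject_null(observed, bv_sorted, level_pct=5):
    """Reject H0 (the marginal distribution of x is D) if the observed
    ESJS or KS2 value exceeds the upper bound of the interval."""
    return observed > upper_bound(bv_sorted, level_pct)


# Usage with a synthetic, already sorted bootstrap vector of m = 1000 values.
bv = np.sort(np.random.default_rng(0).uniform(size=1000))
print(upper_bound(bv), reject_null(0.99, bv))
```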

Cryptocurrencies and Heavy Tails
As a proof of concept, we analysed four time series data sets. These include the prices of three stablecoins [6]: Tether (https://tether.to, accessed on 1 June 2021), DAI (https://makerdao.com, accessed on 1 June 2021) and USDC (https://www.centre.io/usdc, accessed on 1 June 2021), all of which are pegged to the US dollar. In addition, for comparison, we use a fourth data set: the price of the archetypal decentralised cryptocurrency, Bitcoin [7], which has previously been hypothesised to follow the heavy-tailed stable distribution [8].
In Table 1, we describe the time series data used for the empirical validation of the proposed goodness-of-fit method; the data were obtained from Coin Metrics (https://coinmetrics.io, accessed on 1 June 2021). For the stablecoins, 1 is subtracted from the daily closing rate, so that the resulting value is positive if the rate is above 1, zero if it is exactly 1, and negative if it is below 1. For analysis purposes, we applied first-order differencing [4,5] to all the time series, that is, we computed the differences between consecutive observations; this is useful for removing trends, and transforms each price series into a return series. (In future work we will also analyse the raw data sets without differencing; since our main aim here is to introduce the hypothesis test, we do not pursue this further analysis, for brevity and clarity of exposition.) The differenced time series are shown in Figure 1.

The α-stable distribution (or simply the stable distribution) [9] has four parameters: (i) the characteristic exponent α ∈ (0, 2]; (ii) the skewness parameter β ∈ [−1, 1] (the distribution is symmetric when β = 0); (iii) the scale parameter γ; and (iv) the location parameter δ. When α = 2, the stable distribution reduces to the light-tailed normal distribution (and β then has no effect). When α < 2, the distribution is heavy-tailed, and its variance, as well as all its higher moments, is infinite; when α ≤ 1, its mean is also infinite. In what follows, we refer to a distribution as stable when α < 2, and normal when α = 2.
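The preprocessing described above can be sketched in Python as follows; the function name `preprocess` and the sample rates are hypothetical. Since first-order differencing removes any additive constant, subtracting 1 from a stablecoin series leaves its differenced values unchanged; the shift only affects how the raw series is read.

```python
import numpy as np


def preprocess(closing_rates, stablecoin=True):
    """Shift a stablecoin series by its 1-dollar peg, then take first differences."""
    x = np.asarray(closing_rates, dtype=float)
    if stablecoin:
        x = x - 1.0  # positive above the peg, zero at the peg, negative below it
    return np.diff(x)  # x_t - x_{t-1}, for t = 2, ..., n


rates = [1.001, 0.999, 1.000, 1.002]  # hypothetical daily closing rates
print(preprocess(rates))
```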
In Figure 2, we show histograms of the marginal distributions of the four cryptocurrencies, overlaid with the curve of the maximum likelihood fit of the normal distribution to the data. It is visually evident that the normal distribution is not a good fit for these data sets. The kurtosis of a distribution, in this case the marginal distribution of a time series, indicates the peakedness and tailedness of the data relative to the normal distribution [17]; for ease of comparison with the kurtosis of the normal distribution, which is 3, we subtract 3 from the kurtosis to obtain the excess kurtosis. Table 2 shows the excess kurtosis of the four cryptocurrencies, which provides further evidence that none of them follows a normal distribution and that they are in fact heavy-tailed.

Next, we fitted the stable distribution to the four data sets using the Matlab implementation provided by [18], which is based on the empirical characteristic function method [19]. The fitted parameters are shown in Table 3; in all cases 1 < α < 2, implying that the means of the marginal distributions are finite but their variances are infinite.

Table 3. Parameters from fits of the stable distribution to the data of the four cryptocurrencies.
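Excess kurtosis, as reported in Table 2, can be computed with SciPy, whose `scipy.stats.kurtosis` uses Fisher's definition (kurtosis minus 3) when `fisher=True`. This is a sketch: whether the paper used the biased or the bias-corrected estimator is not stated, so `bias=True` is an assumption, and the sample data are synthetic.

```python
import numpy as np
from scipy import stats


def excess_kurtosis(x):
    """Sample kurtosis minus 3, so that normal data score close to 0.

    Assumption: the biased moment estimator (bias=True) is used.
    """
    return float(stats.kurtosis(x, fisher=True, bias=True))


rng = np.random.default_rng(7)
normal_draws = rng.normal(size=5000)
stable_draws = stats.levy_stable.rvs(1.5, 0.0, size=5000, random_state=rng)
print(excess_kurtosis(normal_draws), excess_kurtosis(stable_draws))
```

For α < 2 the population kurtosis is infinite, so the sample value for stable data tends to be large and to vary considerably from sample to sample.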

Application of the Goodness-of-Fit Hypothesis Test to Cryptocurrencies
We apply the bootstrap goodness-of-fit test presented in Section 3, based on the empirical survival Jensen-Shannon divergence (ESJS) and Kolmogorov-Smirnov two-sample test statistic (KS2) metrics, to construct 95% confidence intervals for $\mathrm{ESJS}(\widehat{P}, P_\phi)$ and $\mathrm{KS2}(\widehat{P}, P_\phi)$, where $\widehat{P}$ is the empirical survival function of the input time series $x$ and $P_\phi$ is the survival function of $x$ for $D$ with parameters φ. When running Algorithm 1, we computed 1000 bootstrap samples, that is, we set $m = 1000$. Moreover, as can be seen in Table 4, for all data sets the estimate $\hat{\rho}$ of the AR(1) parameter is less than one in absolute value, implying that the generated bootstrap time series $B_i$ are stationary, as required.

Table 4. Estimates $\hat{\rho}$ of the parameter $\rho$ of the AR(1) process for the four cryptocurrencies; the process is stationary when $|\rho| < 1$.

Currency    $\hat{\rho}$
Tether      −0.3604

In Tables 5 and 6, we show the results of the bootstrap hypothesis test when employing the ESJS and KS2 metrics, respectively. For all data sets, both statistics lie within their 95% confidence intervals, so we cannot reject, at the 5% significance level, the null hypothesis that the marginal distribution of the input time series is α-stable.
The bar chart in Figure 3 shows that, for all four cryptocurrencies, the width of the confidence interval for the KS2 goodness-of-fit measure is, proportionately, much larger than that for the ESJS goodness-of-fit measure. Statistical tests based on measures that yield narrower confidence intervals are generally considered to be more powerful, since this implies, with high confidence, that a smaller sample size may suffice [20].
Finally, to contrast with the stable distribution result, we now hypothesise that the marginal distribution of the time series is normal (i.e., α = 2). Table 7 shows that, for all four cryptocurrencies, we reject the null hypothesis that the marginal distribution is normal, since both the ESJS and KS2 values lie outside their respective 95% confidence intervals.

Concluding Remarks
We presented a proof of concept of the bootstrap-based goodness-of-fit test on four cryptocurrency time series, concentrating on the α-stable distribution, which allows for the modelling of heavy-tailed data. Our results show that, after first-order differencing, the marginal distributions of all four time series are well fitted by α-stable distributions with α < 2: for both ESJS and KS2, the bootstrap-based test does not reject this hypothesis at the 95% confidence level. Furthermore, ESJS is more powerful than KS2 on these data sets, since the widths of the derived confidence intervals for the KS2 measure are, proportionately, much larger than those for the ESJS measure.