Data-Driven Jump Detection Thresholds for Application in Jump Regressions

This paper develops a method to select the threshold in threshold-based jump detection methods. The method is motivated by an analysis of threshold-based jump detection methods in the context of jump-diffusion models. We show that over the range of sampling frequencies a researcher is most likely to encounter, the usual in-fill asymptotics provide a poor guide for selecting the jump threshold. Because of this we develop a sample-based method. Our method estimates the number of jumps over a grid of thresholds and selects the optimal threshold at what we term the "take-off" point in the estimated number of jumps. We show that this method consistently estimates the jumps and their indices as the sampling interval goes to zero. In several Monte Carlo studies we evaluate the performance of our method based on its ability to accurately locate jumps and its ability to distinguish between true jumps and large diffusive moves. In one of these Monte Carlo studies we evaluate the performance of our method in a jump regression context. Finally, we apply our method in two empirical studies. In one we estimate the number of jumps and report the jump threshold our method selects for three commonly used market indices. In the other empirical application we perform a series of jump regressions using our method to select the jump threshold.


Introduction
Aggregate market risks exhibit discontinuities (i.e., jumps) in their dynamics.[1] Bearing such nondiversifiable jump risk is significantly rewarded, as is evident for example from the expensiveness of short-maturity options written on the market index with strikes that are far from their current levels.[2] Therefore, precise time series estimates of the jumps in asset prices will aid greatly in our understanding of the pricing of jump risk.
Since being introduced in Mancini (2001) and Mancini (2004), threshold-based methods have become a popular way to estimate the jumps in time series data. The essential idea of these methods is that if an observed return is sufficiently large in absolute value then it is likely that the interval over which that return was computed contained a jump. To think about such an idea, consider a standard jump-diffusion process for a log-asset price,

$$X_t = X_0 + \int_0^t b_s\, ds + \int_0^t \sigma_s\, dW_s + J_t, \qquad (1)$$

where $b_s$ is thought of as the drift of the process, $\sigma_s$ is a time-varying volatility process, $W$ is a standard Brownian motion, and $J_t$ is some finite activity jump process. (See Section 2 for a more rigorous definition of the jump-diffusion processes we consider in this paper.) Defining the returns of the observed process as $\Delta_i^n X \equiv X_{i\Delta_n} - X_{(i-1)\Delta_n}$ and given a sequence of thresholds $v_n$, a threshold technique labels a return interval as containing a jump if $|\Delta_i^n X| > v_n$. While Mancini (2001) originally set $v_n = \sqrt{\Delta_n \log(1/\Delta_n)}$, a common practice has emerged of using $v_n = \alpha \Delta_n^{\varpi}$, where $\alpha$ and $\varpi$ are parameters selected by the researcher. Typical values are $\varpi = 0.49$ or $\varpi = 0.45$, and $\alpha$ is left as a tuning parameter. With $\varpi = 0.49$ or $\varpi = 0.45$ the parameter $\alpha$ has a convenient interpretation: since the diffusive moves in $X_t$ are on the order $\Delta_n^{1/2}$ and $\varpi$ is just under $1/2$, the tuning parameter $\alpha$ is essentially the number of local standard deviations of the process.
A threshold-based jump selection scheme of this form then has the convenient interpretation of labeling returns as containing a jump (or multiple jumps) if the return is larger in absolute value than $\alpha$ local standard deviations. While this provides a nice interpretation of the method, unfortunately the literature leaves the choice of $\alpha$ up to the researcher. The goal of the current paper is to provide a method for the selection of $\alpha$. (We leave $\varpi = 0.49$ or $\varpi = 0.45$ since what is important in practice is the relative size of a 'typical' increment $\Delta_i^n X$ and $v_n$; see Jacod and Protter (2012), p. 248, for a discussion.) Our primary focus in this paper is effective jump detection for the jump regression context of Li, Todorov, and Tauchen (2015a) and Li, Todorov, and Tauchen (2015b), where $X = (Z, Y)$. We can think of such a setting as estimating $\beta$ in the following model,

$$\Delta_{i(p)}^n Y = \beta\, \Delta_{i(p)}^n Z + \epsilon_{i(p)}, \quad p \ge 1, \qquad (2)$$

where $\{i(p)\}_{p\ge1}$ are the return intervals in $Z$ thought to contain a jump. Using a set of truncation thresholds of the form $v_n = \alpha \hat\sigma \Delta_n^{\varpi}$ we would estimate $\{i(p)\}_{p\ge1}$ as

$$\{\hat i(p)\}_{p\ge1} = \{ i = 1, \ldots, n : |\Delta_i^n Z| > \alpha \hat\sigma \Delta_n^{\varpi} \}.$$

[1] There is substantial earlier parametric evidence for jumps, as well as more recent strong nonparametric evidence based on high-frequency data and different jump tests; see, for example, Huang and Tauchen (2005), Barndorff-Nielsen and Shephard (2006), Jiang and Oomen (2008), Lee and Mykland (2008) and Aït-Sahalia and Jacod (2009).
[2] The presence of a nontrivial market jump risk premium has been well documented in the empirical option pricing literature.
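As a concrete illustration, the truncation rule and the jump beta estimator just described can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names are ours, the volatility estimate is passed in as a given scalar, and the beta estimator is least squares through the origin over the flagged intervals.

```python
import numpy as np

def detect_jumps(returns, dt, alpha, sigma_hat, varpi=0.49):
    """Flag return intervals as containing a jump when
    |return| > alpha * sigma_hat * dt**varpi."""
    threshold = alpha * sigma_hat * dt ** varpi
    return np.flatnonzero(np.abs(returns) > threshold)

def jump_beta(dz, dy, jump_idx):
    """'Jump beta': least-squares slope through the origin of the Y
    returns on the Z returns over the flagged intervals."""
    z, y = dz[jump_idx], dy[jump_idx]
    return float(z @ y / (z @ z))
```

For example, with five-minute returns (dt = 1/78 of a trading day), sigma_hat = 0.01 and alpha = 4, only returns larger than roughly 0.47% in absolute value are flagged as jumps.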
Notice how crucially $\{\hat i(p)\}_{p\ge1}$, and thereby the estimated $\hat\beta$ in (2), depends on the choice of $\alpha$. If we set too low a threshold we might easily include return intervals in $\{\hat i(p)\}_{p\ge1}$ that do not actually contain jumps, which could potentially bias our estimated 'jump beta'.
To see why we do not want to set too high a threshold we need to think about the precision of our estimated jump beta. Recall that the precision of a regression coefficient is often defined as the inverse of its variance. To motivate a discussion about the impact of the estimated jumps in $Z$ on the precision of the estimated jump beta, make the simplifying assumption that the regression errors in equation (2) are homoskedastic, i.e. $\mathrm{var}(\epsilon_{i(p)}) = \sigma_\epsilon^2$ for every $i(p)$. If that is the case, and we assume the remaining Gauss-Markov assumptions in the regression, then the variance of our jump beta will be

$$\mathrm{var}(\hat\beta) = \frac{\sigma_\epsilon^2}{\sum_{i \in \{i(p)\}} (\Delta_i^n Z)^2}.$$
This motivates a 'precision index' of the jump beta, i.e.

$$PI \equiv \sum_{i \in \{\hat i(p)\}} (\Delta_i^n Z)^2.$$
While our precision index is not equivalent to the true precision of the estimated jump beta, it does capture the idea that decreasing $\alpha$ will likely increase the precision of our estimated jump beta because our regression will, if $\alpha$ is decreased sufficiently, contain more return intervals. (Notice that under the simplifying assumptions above our precision index is proportional to the true precision. Without these assumptions our precision index is likely to be approximately proportional to the true precision, as the variances of the residuals in the regression in (2) are unlikely to change dramatically with the addition of new return intervals in the jump regression.) Figure 1 illustrates these ideas. The left panel plots the jump beta for the jump regression of the SPDR S&P 500 ETF (SPY) against the SPDR utilities ETF (XLU) for the years 2007 to 2014 using five-minute returns over a grid of $\alpha$ jump threshold parameters. Notice that going down from $\alpha = 7$ to about $\alpha = 3.75$ the hypothesis of a constant jump beta might be supported, i.e. that $\Delta Y_{\tau_p} = \beta \Delta Z_{\tau_p}$ where $\{\tau_p\}_{p\ge1}$ are the true jump times in $Z$. The plotted jump beta is obviously noisy, but the estimated jump betas might very well be centered around a true and constant value.
However, after about $\alpha = 3.75$ the estimated jump betas for this asset begin to decline rapidly. It is not hard to imagine that after about $\alpha = 3.75$ the jump regressions become wildly corrupted by the addition of return intervals containing only diffusive moves. The right panel plots our 'precision index' for the estimated jump betas along the same grid of $\alpha$ jump threshold parameters. Notice how the precision index increases as $\alpha$ decreases and we add return intervals to the jump regression.
As in the left panel we plot a line at $\alpha = 3.75$. If after around $\alpha = 3.75$ our jump regression begins to be rapidly corrupted by the addition of return intervals that only contain diffusive moves, then, even though our precision index continues to increase, our estimates of the jump beta are likely to be significantly biased. These panels illustrate the trade-off mentioned earlier in selecting a jump threshold. To increase the precision of our estimated jump beta we would like a low jump threshold, but too low a jump threshold will likely bias our jump beta since we will likely include many returns that only contain diffusive moves.
In this paper we develop a new method to balance the trade-off of setting too low a threshold and potentially including return intervals that only contain diffusive moves versus setting too high a threshold and potentially excluding return intervals that actually contain true jumps. The main idea is to find the value of α for which the jump count function (defined below) 'bends' most sharply. Intuitively this could be thought of as the 'take-off' point of the jump count function.
Selecting a threshold at this 'take-off' point should greatly reduce the number of misclassifications while retaining many of the true jumps. We implement this idea by computing the point of maximum curvature of a smooth sieve-type estimator fitted to the jump count function.
A related paper, Figueroa-López and Nisen (2013), derives an optimal rate for the threshold in a threshold-based jump detection scheme with the goal of estimating the integrated variance. Using a loss function that equally penalizes jump misclassifications and missed jumps, Figueroa-López and Nisen (2013) find that the optimal threshold should be on the order of $v_n^* = \sqrt{3\sigma^2 \Delta_n \log(1/\Delta_n)} + o(\sqrt{\Delta_n \log(1/\Delta_n)})$, similar to the threshold originally proposed in Mancini (2001). However, they do not provide theoretical guidance on the scale of $v_n$. While we could have used such a formulation in our paper, given the similarity in the relative convergence rates of $\sqrt{\Delta_n \log(1/\Delta_n)}$ and $\alpha \Delta_n^{\varpi}$, and given that using $v_n = \alpha \Delta_n^{\varpi}$ provides a convenient interpretation for the parameter $\alpha$, we feel using $v_n = \alpha \Delta_n^{\varpi}$ is preferable.
The rest of the paper is organized as follows. Section 2 presents the setting. Our methodology and the main theory about its consistency are developed in Sections 3-4. Section 5 and Section 6 present the results from a series of Monte Carlo studies and two empirical applications, respectively.
Finally, Section 7 provides a conclusion. All proofs are in the appendix.

The Setting
We start by introducing the formal setup for our analysis. The following notation is used throughout. We denote the transpose of a matrix $A$ by $A^\top$. The adjoint matrix of a square matrix $A$ is denoted $A^{\#}$. For two vectors $a$ and $b$, we write $a \le b$ if the inequality holds componentwise. The functions $\mathrm{vec}(\cdot)$, $\det(\cdot)$ and $\mathrm{Tr}(\cdot)$ denote matrix vectorization, determinant and trace, respectively. The Euclidean norm is denoted $\|\cdot\|$. We use $\mathbb{R}^*$ to denote the set of nonzero real numbers, that is, $\mathbb{R}^* \equiv \mathbb{R} \setminus \{0\}$. The cardinality of a (possibly random) set $\mathcal{P}$ is denoted $|\mathcal{P}|$. For any random variable $\xi$, we use the standard shorthand notation $\{\xi \text{ satisfies some property}\}$ for $\{\omega \in \Omega : \xi(\omega) \text{ satisfies some property}\}$. The floor (largest smaller integer) function is denoted $\lfloor\cdot\rfloor$. For two sequences of positive real numbers $a_n$ and $b_n$, we write $a_n \asymp b_n$ if $b_n/c \le a_n \le c\, b_n$ for some constant $c \ge 1$ and all $n$. All limits are for $n \to \infty$. We use $\xrightarrow{P}$, $\xrightarrow{L}$ and $\xrightarrow{L\text{-}s}$ to denote convergence in probability, convergence in law, and stable convergence in law, respectively.

The underlying processes
The object of study of the paper is the optimal selection of the cutoff level for a threshold-style jump detection scheme. Let $X$ be the process under consideration and, for simplicity of exposition, assume that $X$ is one-dimensional. (The results can be trivially generalized to settings where $X$ is multidimensional and the jump arrival times of its individual elements are disjoint, i.e., the jump components of its elements are independent of each other.) We proceed with the formal setup. Let $X$ be defined on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge0}, \mathbb{P})$. Throughout the paper, all processes are assumed to be càdlàg adapted. Our basic assumption is that $X$ is an Itô semimartingale (see, e.g., Jacod and Protter (2012), Section 2.1.4) with the form

$$X_t = X_0 + \int_0^t b_s\, ds + \int_0^t \sigma_s\, dW_s + \int_0^t \int_{\mathbb{R}} \delta(s, u)\, \mu(ds, du),$$

where the drift $b_t$ takes values in $\mathbb{R}$; the volatility process $\sigma_t$ takes values in $\mathbb{R}_+$, the set of positive real numbers; $W$ is a standard Brownian motion; $\delta : \Omega \times \mathbb{R}_+ \times \mathbb{R} \to \mathbb{R}$ is a predictable function; and $\mu$ is a Poisson random measure on $\mathbb{R}_+ \times \mathbb{R}$ with compensator $\nu(dt, du) = dt \otimes \lambda(du)$ for some $\sigma$-finite measure $\lambda$. Finally, the spot volatility of $X$ at time $t$ is denoted by $c_t \equiv \sigma_t^2$. Our basic regularity condition for $X$ is given by the following assumption.
The only nontrivial restriction in Assumption 1 is the assumption of finite activity jumps in X.
This assumption is used mainly for simplicity, as our focus in the paper is on 'big' jumps, i.e., jumps that are not 'sufficiently' close to zero. Alternatively, we can drop Assumption 1(c) and focus on jumps with sizes bounded away from zero.[3] Turning to the sampling scheme, we assume that $X$ is observed at the discrete times $i\Delta_n$, for $0 \le i \le n \equiv \lfloor T/\Delta_n \rfloor$, within the fixed time interval $[0, T]$. The increments of $X$ are denoted by $\Delta_i^n X \equiv X_{i\Delta_n} - X_{(i-1)\Delta_n}$, $i = 1, \ldots, n$.
Below, we consider an infill asymptotic setting, that is, ∆ n → 0 as n → ∞.

Limits
Here we present some initial results needed to develop the data-driven method described in Section 4.
To do so we first discuss how to think about inference for the jumps; next, we introduce the jump count function, and then we proceed to discuss jump misclassifications.

Inference for the jump marks
As was discussed in the introduction, in order to disentangle jumps from the diffusive component of asset returns, we choose a sequence $v_n$ of truncation threshold values which satisfy the following condition: $v_n \asymp \Delta_n^{\varpi}$ for some constant $\varpi \in (0, 1/2)$.
In order to analyze the jumps of the process $X$ it is helpful to introduce some notation. First, define $\{\tau_p\}_{p\ge1}$ to be the successive jump times of the process $X$. Next, define two random sets $\mathcal{P} \equiv \{p \ge 1 : \tau_p \le T\}$ and $\mathcal{T} \equiv \{\tau_p : p \in \mathcal{P}\}$, which collect respectively the indices of the jump times in the interval $[0, T]$ and the jump times themselves. Since the jumps in $X$ are assumed to be of finite activity, these two sets are almost surely finite. For the jump in $X$ that occurs at time $\tau \in \mathcal{T}$, we call $(\tau, \Delta X_\tau)$ its mark. Finally, define a Borel measurable subset $D \subset [0, T] \times \mathbb{R}^*$ as a (temporal-spatial) region. We do so in order to think about restricting our observation set to only those jumps that fall within a given region. To do so, define the set $\mathcal{P}_D \equiv \{p \ge 1 : (\tau_p, \Delta X_{\tau_p}) \in D\}$.
With these definitions we can think about the true and estimated sets that index the jumps in a given sample. For each $p \in \mathcal{P}$, we denote by $i(p)$ the unique random index $i$ such that $\tau_p \in ((i-1)\Delta_n, i\Delta_n]$. We set

$$I_n(D) \equiv \{ i = 1, \ldots, n : |\Delta_i^n X| > v_n \text{ and } ((i-1)\Delta_n, \Delta_i^n X) \in D \}, \qquad I(D) \equiv \{ i(p) : p \in \mathcal{P}_D \}.$$

The set-valued statistic $I_n(D)$ collects the indices of returns whose 'marks' $((i-1)\Delta_n, \Delta_i^n X)$ are in the region $D$, where the truncation criterion $|\Delta_i^n X| > v_n$ eliminates diffusive returns asymptotically. The set $I(D)$ collects the indices of sampling intervals that contain the jumps with marks in $D$.
Clearly, the set I (D) is random and unobservable. We also impose the following mild regularity condition on D, which amounts to requiring that the jump marks of X almost surely do not fall on the boundary of D.
Under Assumptions 1 and 2 it can be shown that, for a fixed $v_n \asymp \Delta_n^{\varpi}$, $I_n(D)$ consistently estimates the jumps, i.e. $I_n(D) \xrightarrow{P} I(D)$. (See, for example, Li, Todorov, and Tauchen (2015a).) The goal of the current paper is to make $v_n$ dependent on the sample and the sampling frequency.

The jump count function
The now-standard method to define the truncation level is

$$v_n = \alpha \sqrt{\hat\sigma^2}\, \Delta_n^{\varpi} \quad \text{for some constant } \varpi \in (0, 1/2),$$

where $\hat\sigma^2$ is an estimate of the general level of local variance, typical settings are $\varpi = 0.49$ or $\varpi = 0.45$, and $\alpha$ is a tuning parameter. Since the diffusive moves in $X$ are on the order $\Delta_n^{1/2}$ and $\varpi$ is just under $1/2$, the tuning parameter $\alpha$ has the convenient interpretation of essentially being the number of local standard deviations. This definition of $v_n$ motivates a definition of the sample index of the jumps that depends on the truncation threshold $\alpha$. With this in mind define

$$I_n(\alpha, D) \equiv \{ i = 1, \ldots, n : |\Delta_i^n X| > \alpha \hat\sigma \Delta_n^{\varpi} \text{ and } ((i-1)\Delta_n, \Delta_i^n X) \in D \}.$$

By the presumed finite activity of the jump process in $X$ there are only a (random) finite number of jumps, which we wish to identify using the set $I_n(\alpha, D)$.
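The paper does not specify at this point how $\hat\sigma^2$ is formed. One common jump-robust choice, shown below purely as an illustrative assumption, is realized bipower variation (Barndorff-Nielsen and Shephard):

```python
import math
import random

def bipower_variance(returns, dt):
    """Jump-robust estimate of the average variance per unit time via
    realized bipower variation: (1/mu1^2) * sum |r_i||r_{i-1}|, where
    mu1^2 = (E|Z|)^2 = 2/pi, normalized by the sample span n*dt."""
    mu1_sq = 2.0 / math.pi
    bv = sum(abs(a) * abs(b) for a, b in zip(returns[1:], returns[:-1])) / mu1_sq
    return bv / (len(returns) * dt)

# sanity check on pure-diffusion returns with volatility 0.2 per unit time
random.seed(0)
dt = 1.0 / 390
sim = [0.2 * math.sqrt(dt) * random.gauss(0.0, 1.0) for _ in range(39000)]
est = bipower_variance(sim, dt)   # close to 0.2**2 = 0.04
```

Taking products of adjacent absolute returns, rather than squared returns, keeps the estimate robust to a finite number of jumps, which is exactly what is needed when the estimate feeds into a jump threshold.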
In order to do so it proves convenient to define the jump count function

$$N_n(\alpha) \equiv \#\{ i = 1, \ldots, n : |\Delta_i^n X| > \alpha \hat\sigma \Delta_n^{\varpi} \}.$$

Evidently, $N_n(\alpha)$ is non-increasing and piecewise flat, with discontinuities at the (rescaled) order statistics of $|\Delta_i^n X|$, and $N_n(\alpha)$ decreases to zero as $\alpha \to \infty$. For each fixed $\alpha$, it can be shown that for all $n$ large enough $N_n(\alpha)$ equals the number of jumps $|\mathcal{P}|$ with probability approaching one. (See, for example, Li, Todorov, and Tauchen (2015a).)
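The jump count function is straightforward to evaluate on a grid of thresholds; a minimal sketch (function names and the scalar volatility estimate are illustrative assumptions):

```python
import numpy as np

def jump_count(returns, dt, alphas, sigma_hat, varpi=0.49):
    """N_n(alpha) on a grid: the number of returns exceeding the
    threshold alpha * sigma_hat * dt**varpi for each alpha."""
    # rescale once so each return is measured in 'local standard deviations'
    scaled = np.abs(returns) / (sigma_hat * dt ** varpi)
    return np.array([int((scaled > a).sum()) for a in alphas])
```

The resulting array is non-increasing in alpha and drops by one (or more) exactly at the rescaled order statistics of the absolute returns, matching the description above.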

Jump misclassifications
We think of a jump selection procedure as having a 'misclassification' if, for some return interval, we have $|\Delta_i^n X| > \alpha \hat\sigma \Delta_n^{\varpi}$ yet over the interval $((i-1)\Delta_n, i\Delta_n]$ we have $\Delta X_t = 0$ for all $t$. That is, if we label the return interval as containing a jump when no true jump occurred.
In order to think about jump misclassifications, consider the jump count function solely for the diffusive moves. Defining the continuous part of the process as $X_t^c = X_t - \sum_{s \le t} \Delta X_s$ for $t \ge 0$ and $\Delta_i^n X^c \equiv X^c_{i\Delta_n} - X^c_{(i-1)\Delta_n}$, we can define the jump count function of the continuous moves as

$$N_n^c(\alpha) \equiv \#\{ i = 1, \ldots, n : |\Delta_i^n X^c| > \alpha \sigma \Delta_n^{\varpi} \}.$$

For a given jump threshold $\alpha$ and sampling frequency $\Delta_n$, the function $N_n^c(\alpha)$ counts the diffusive moves that are 'incorrectly' labeled as jumps. Since the diffusive moves are locally Gaussian, the indicator of the event $\{|\Delta_i^n X^c| > \alpha \sigma \Delta_n^{\varpi}\}$ is simply a Bernoulli random variable with probability of success equal to $2Q(\alpha \Delta_n^{\varpi - 1/2})$, where $Q(x) \equiv P(Z > x)$ denotes the upper tail of the standard normal distribution. This is because

$$P(|\Delta_i^n X^c| > \alpha \sigma \Delta_n^{\varpi}) = P(|\sigma \sqrt{\Delta_n}\, Z| > \alpha \sigma \Delta_n^{\varpi}) = P(|Z| > \alpha \Delta_n^{\varpi - 1/2}) = 2Q(\alpha \Delta_n^{\varpi - 1/2}),$$

where $Z$ is a standard normal random variable. Because of this, $N_n^c(\alpha)$ is simply a binomial random variable with the same probability and $n$ draws. This implies

$$E[N_n^c(\alpha)] = 2 n Q(\alpha \Delta_n^{\varpi - 1/2}).$$

For a fixed $\alpha > 0$ it is fairly straightforward to show that $E[N_n^c(\alpha)] = 2 n Q(\alpha \Delta_n^{\varpi - 1/2}) \to 0$ as $\Delta_n \to 0$, which implies that in the limit the number of misclassifications goes to zero. This result, however, turns out not to be a good guide for the range of sampling frequencies most often encountered in practice: over the range of sampling frequencies a researcher is most likely to encounter, the truncation threshold should actually be increased, not decreased, as the sampling frequency increases. Because of this result, while we remain alert to the asymptotic theory, we seek an optimal threshold parameter $\alpha$ that is sample driven, not one based solely on the asymptotic theory.
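The expected misclassification count $E[N_n^c(\alpha)] = 2nQ(\alpha\Delta_n^{\varpi-1/2})$ can be evaluated directly. The sketch below, with time measured in trading days, one year of data, and $\alpha = 4$, $\varpi = 0.49$ as purely illustrative choices, exhibits the count rising as the sampling interval shrinks from ten minutes to one second:

```python
import math

def q(x):
    """Standard normal upper-tail probability Q(x) = P(Z > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def expected_misclassifications(dt, T, alpha=4.0, varpi=0.49):
    """E[N_n^c(alpha)] = 2 n Q(alpha * dt**(varpi - 1/2)), with n = T/dt."""
    n = round(T / dt)
    return 2.0 * n * q(alpha * dt ** (varpi - 0.5))

# one year of trading days; intervals per day: 39 (10 min), 78 (5 min),
# 390 (1 min), 23400 (1 sec)
T = 252.0
counts = {m: expected_misclassifications(1.0 / m, T) for m in (39, 78, 390, 23400)}
```

Although each individual return is less likely to be misclassified at finer sampling (the rescaled threshold $\alpha\Delta_n^{\varpi-1/2}$ grows), the number of returns $n$ grows faster over this range, so the expected count of misclassifications increases.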

The Curvature Method
As briefly discussed in the introduction, the selection of a jump threshold, i.e. the selection of $\alpha$ in equation (10), involves a trade-off between setting too high a threshold and failing to include all of the jumps against setting too low a threshold and erroneously labeling diffusive moves as jumps. For example, setting $\alpha = 0$ would correctly identify every jump but would also include every diffusive move. Similarly, setting $\alpha > \max_i |\Delta_i^n X|$ would guarantee that no diffusive moves were incorrectly labeled as jumps, but would fail to identify any of the jumps.
We can use the results of Section 3 to guide the selection of a suitable $\alpha$. Under the modeling assumptions of Section 2, there are a finite (but random) number of jumps $N^*$ on the interval $[0, T]$. From the theory (Jacod and Protter, 2012; Li, Todorov, and Tauchen, 2015a) we know that for any fixed $\alpha$ the truncation scheme correctly classifies all $N^*$ jumps when $n$ is sufficiently large.
Furthermore, for a fixed n and for higher values of α we should expect the jump count function to have a long flat region that is level at about N * , but we should also expect the jump count function to rise sharply at lower values of α where many diffusive moves start getting erroneously classified as jumps. So the task is to determine from the jump count function that value of α where the jump count function starts to increase sharply as α declines. We think of this point as the point at which the jump count function begins to 'take-off'. Our solution to find this 'take-off' point is to look for the value of α at which the jump count function N n (α) 'kinks' or 'bends' most sharply.
The way to mathematically define a 'kink' or sharpest 'bend' in a smooth function is the point of maximum curvature. The curvature of a twice-differentiable function $f : \mathbb{R} \to \mathbb{R}$ at a point $x$ is defined as

$$\kappa(x) = \frac{|f''(x)|}{(1 + f'(x)^2)^{3/2}}.$$

The jump count function $N_n(\alpha)$ itself is a piecewise-constant step function, so its curvature is not directly defined. Given this problem, we work with a smooth sieve estimator fitted to the jump count function.
A natural choice might seem to be kernels or splines, but these turn out to be ineffective due to the small wiggles and discontinuities that such fits have in their higher-order derivatives. These wiggles and discontinuities in turn significantly affect the curvature, making the point of maximum curvature often more dependent on the particular choice of kernel or spline basis than on the data. A far better approach is to do a least-squares projection of the observed jump count function onto a set of smooth basis functions. Given the shape of the jump count function, we use basis functions of order $p$, where we need $p \ge 1$ for the point of maximum curvature to be well-defined. Using these basis functions we define the projection of the jump count function onto $g(\alpha, \gamma)$ as the minimizer over $\gamma$ of $\int_{\underline{\alpha}}^{\bar{\alpha}} (N_n(\alpha) - g(\alpha, \gamma))^2\, d\alpha$. In practice we find that these basis functions result in projections with extremely tight fits that have very high $R^2$s for low values of $p = 3$ or $p = 4$. Because of this, the projection itself amounts to a compact numerical representation of approximately the same information as in the raw jump count function itself.
With this idea in mind we select $\alpha_n^*$ as the value that maximizes the curvature of the appropriately smoothed jump count function, i.e.

$$\alpha_n^* \equiv \arg\max_{\alpha \in [\underline{\alpha}, \bar{\alpha}]} \kappa(\alpha; \hat\gamma),$$

where $\kappa(\alpha; \hat\gamma)$ is the curvature of the fitted projection $g(\alpha, \hat\gamma)$ and $[\underline{\alpha}, \bar{\alpha}] \subset \mathbb{R}_+$. We refer to such a selection method in what follows as the 'curvature method'.
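A minimal numerical sketch of the curvature method follows. A polynomial sieve is used purely for illustration (the paper's actual basis functions may differ), and the curvature of the fitted curve is maximized over a fine grid:

```python
import numpy as np

def curvature(coeffs, x):
    """Curvature |g''(x)| / (1 + g'(x)**2)**1.5 of a polynomial given
    by numpy coefficients (highest degree first)."""
    d1 = np.polyval(np.polyder(coeffs, 1), x)
    d2 = np.polyval(np.polyder(coeffs, 2), x)
    return np.abs(d2) / (1.0 + d1 ** 2) ** 1.5

def curvature_threshold(alphas, counts, degree=4):
    """Fit a smooth curve to the jump count function and return the
    alpha where the fitted curve bends most sharply."""
    coeffs = np.polyfit(alphas, counts, degree)      # least-squares projection
    grid = np.linspace(alphas[0], alphas[-1], 2001)  # fine search grid
    return float(grid[np.argmax(curvature(coeffs, grid))])
```

Differentiating the fitted polynomial analytically (rather than differencing the raw step function) is what makes the point of maximum curvature stable; the raw jump count function has no usable derivatives.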
Setting the threshold right at this point of maximum curvature or 'kink' point then allows a great many of the true jumps to be located, but guards against misclassifying too many diffusive moves.
Because of this, the procedure is evidently very conservative in that it lets through only a small number of diffusive moves. However, in a jump regression setting a very conservative jump selection procedure is to be preferred as the loss from including diffusive moves is very high because doing so potentially biases the estimates whereas incorrectly missing a true jump only entails a small loss of efficiency.
Though conservative, the curvature method is asymptotically accurate. We show this in Theorem 1 below. The theorem shows that in the limit the curvature method correctly identifies all of the jumps and excludes any returns that contain only diffusive moves. The theorem relies on the following definition for the convergence of random vectors with possibly different lengths: for a sequence $N_n$ of random integers and a sequence $((A_{j,n})_{1 \le j \le N_n})_{n \ge 1}$ of random elements, we write $(A_{j,n})_{1 \le j \le N_n} \xrightarrow{P} (A_j)_{1 \le j \le N}$ if $P(N_n = N) \to 1$ and, for every $\varepsilon > 0$, $P(\{N_n = N\} \cap \{\max_{1 \le j \le N} \|A_{j,n} - A_j\| > \varepsilon\}) \to 0$.

Theorem 1. Under Assumptions 1 and 2, and with $\alpha_n^*$ selected over $[\underline{\alpha}, \bar{\alpha}] \subset \mathbb{R}_+$, we have that (a) $P[I_n(\alpha_n^*, D) = I(D)] \to 1$, and (b) $N_n(\alpha_n^*) \xrightarrow{P} |\mathcal{P}|$.

The theorem above shows that as $\Delta_n \to 0$ the jump count function $N_n(\alpha_n^*)$ using our procedure converges in probability to the true number of jumps, and that the estimated index of the jumps $I_n(\alpha_n^*, D)$ over a region $D$ converges in probability to the true index $I(D)$ over that region.

Monte Carlo studies
We evaluate the performance of our threshold selection method on simulated data in three Monte Carlo studies. The first study compares our method with a method that simply chooses a fixed value of the truncation parameter α. The second study evaluates how our method does at recovering jumps of varying magnitudes. Finally, the third study shows the performance of our method in a jump regression setting.

Comparing our method with choosing a fixed truncation constant
In the first Monte Carlo study we evaluate the performance of our threshold selection method against a method that simply chooses a fixed value of $\alpha$. (Recall we label a return interval as containing a jump if $|\Delta_i^n X| > \alpha \hat\sigma \Delta_n^{\varpi}$.) The sample span is one year, containing T = 252 trading days. Each day we simulate N = 390 × 60 high-frequency returns, corresponding to one second sampling, and consider return intervals of one second, one minute, five minutes, and ten minutes. There are 1000 Monte Carlo replications and we set $\varpi = 0.49$.
The data generating process in this Monte Carlo study follows the model in equations (21) and (22). The model is taken from Li, Todorov, and Tauchen (2015b) and accommodates features such as the leverage effect, price-volatility co-jumps, and heteroskedasticity in jump sizes, with W and B independent Brownian motions.

In addition to the selected threshold $\alpha_n^*$, we report two statistics for the Monte Carlo study. The first we term the jump 'recovery rate': the number of correctly identified jumps divided by the number of true jumps. A recovery rate of 100% means every true jump was correctly identified, whereas a recovery rate of 0% means no true jumps were identified. The second statistic we term the 'accuracy' of the jump detection procedure: the number of correctly matched jumps divided by the number of estimated jumps. An accuracy of 100% means that every return interval we estimated to include a jump actually contained a true jump, whereas an accuracy of 0% means that none of the return intervals we estimated to include a jump actually contained a true jump.

NOTE: REC is the average jump recovery rate, ACC is the average accuracy of estimated jumps, and $\alpha_n^*$ is the average selected threshold parameter $\alpha$ across the Monte Carlo replications. The jump recovery rate is defined as the number of correctly matched jumps divided by the number of true jumps. The jump accuracy is defined as the number of correctly matched jumps divided by the number of estimated jumps. The jump accuracy and recovery rate are in percentage terms. The results are based on 1000 replications following the data generating process outlined in (21) and (22).

Notice that the average selected value of $\alpha_n^*$ increases as we move from ten minute sampling down to one second sampling.
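The two reported statistics are straightforward to compute from the true and estimated jump index sets; a sketch (the function name is ours):

```python
def recovery_and_accuracy(true_idx, est_idx):
    """REC = matched jumps / true jumps; ACC = matched jumps /
    estimated jumps.  Both in percent; a 'match' is an estimated
    interval that contains a true jump."""
    true_set, est_set = set(true_idx), set(est_idx)
    matched = len(true_set & est_set)
    rec = 100.0 * matched / len(true_set) if true_set else float("nan")
    acc = 100.0 * matched / len(est_set) if est_set else float("nan")
    return rec, acc
```

The two numbers pull in opposite directions as the threshold moves: lowering it raises the recovery rate while lowering accuracy, which is exactly the trade-off the curvature method is designed to balance.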
Such a result is to be hoped for, since over the sampling range of ten minutes to one second the number of jump misclassifications, as was shown in Section 3.3, actually increases at higher sampling frequencies. A method that attempted to minimize jump misclassifications would ideally raise the jump threshold over this range to guard against such misclassifications. Our method appears to make some effort to do so.
Notice that the average recovery rates of the curvature procedure are generally as good as and sometimes better (rarely worse) than those using a fixed α. At the same time, the average accuracy of the procedure is above 90% for all sampling frequencies, unlike the fixed α = 4 case.
The curvature method can achieve substantially increased recovery rates with little sacrifice in accuracy. As for the other values of α, the accuracy remains high but at the expense of a lower recovery rate than that of the curvature method.

Recovering jumps of varying magnitudes
For the second Monte Carlo study we use a modification of a standard setup to examine how our method performs in recovering jumps of differing magnitudes. To this end we simulate jumps that, with equal probability, take sizes varying from one to ten unit standard deviations of the local volatility.[6] To do this we model the jumps as following a compound Poisson process that, once scaled by the local volatility, has a jump size density following a discrete uniform distribution taking values in $\{1, 2, \ldots, 10\}$. Using such a jump density allows us to observe how well our method can and cannot detect jumps of various magnitudes.
Letting $(W_t, B_t)$ be a vector of Brownian motions with $\mathrm{Corr}(W_t, B_t) = 0.5$, the model is defined below, where $N_t$ is a Poisson process with intensity $\lambda = 1/12 \times 252$ and $u_t$ is an i.i.d. sequence of discrete uniform random variables taking values in $\{1, \ldots, 10\}$. (Setting $\mathrm{Corr}(W_t, B_t) = 0.5$ allows for dependence between $X_t$ and $V_t$, i.e. a leverage effect.) We set $\lambda = 1/12 \times 252$ so that there is on average a one-twelfth chance of a jump occurring each day, consistent with previous studies on market jumps. The data generating process for the diffusive moves and the volatility process is similar to, and based on, that found in Li, Todorov, and Tauchen (2015a).
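The jump component just described can be sketched as follows. This is an illustrative sketch under simplifying assumptions: the local volatility scale is held constant and the function name is ours.

```python
import numpy as np

def simulate_jumps(T, lam, sigma_loc, rng):
    """Compound Poisson jumps on [0, T]: Poisson(lam*T) arrivals,
    i.i.d. sizes u * sigma_loc with u uniform on {1, ..., 10} and a
    random sign.  sigma_loc stands in for the (here constant) local
    standard deviation used to scale the jump sizes."""
    n_jumps = rng.poisson(lam * T)
    times = np.sort(rng.uniform(0.0, T, n_jumps))
    signs = rng.choice([-1.0, 1.0], n_jumps)
    sizes = rng.integers(1, 11, n_jumps) * sigma_loc * signs
    return times, sizes
```

With $\lambda = 1/12 \times 252 = 21$ jumps per year (one-twelfth per day) and T = 3 years, roughly 63 jumps are expected per replication.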
We perform the study using 1000 replications and set T = 3 × 252, which corresponds to three years' worth of simulated data. We use an Euler scheme to simulate the high-frequency data, doing an initial simulation with N = 390 × 60 × 10 increments per day, which corresponds to sampling once every tenth of a second. We then sample these high-frequency returns at one second, one minute, five minute, and ten minute frequencies.

Sampling at a ten minute frequency we recover most jumps greater than eight local standard deviations, a few jumps between five and seven local standard deviations, and virtually no jumps of sizes one to four local standard deviations. Sampling at a five minute frequency we make significant gains in recovering jumps of five to seven local standard deviations. At a one minute sampling frequency we can uncover nearly all jumps except those of one local standard deviation. Finally, at one second sampling all of the jumps are recovered. (Note though that the increase in the sampling frequency going from one minute to one second sampling is significantly greater than going from ten to five to one minute sampling, so the stark contrast between the one minute and the one second sampling should not be exaggerated.) The average selected threshold parameter $\alpha_n^*$ appears to decrease somewhat from ten minute to five minute to one minute sampling, but increases quite significantly going from one minute to one second sampling. Following the discussion in Section 3.3, the large increase in the selected threshold from one minute to one second sampling is to be hoped for, as the number of expected jump misclassifications increases greatly going from one minute to one second sampling.

[6] To preserve the jump sizes across sampling frequencies we use the local volatility in terms of return intervals at the coarsest sampling frequency, which here corresponds to sampling at a ten minute frequency.
The slight decrease in the average selected threshold parameter going from ten minute to one minute sampling, while not ideal in terms of the arguments of Section 3.3, does not appear to drastically change the accuracy of the estimated jumps. The accuracy over these three sampling frequencies is always above 98% and only decreases to 94.28% at one second sampling.

Jump regression setting
The third Monte Carlo study examines how our procedure performs in a jump regression context.
A thorough overview of jump regressions can be found in Li, Todorov, and Tauchen (2015a). Below we only give a brief overview of jump regressions and the results we use in our Monte Carlo study.
Given two series of returns $\Delta_i^n Z$ (often a proxy for the market) and $\Delta_i^n Y$ (often the return on an asset price), a jump regression considers a regression of $\Delta_i^n Y$ on $\Delta_i^n Z$ only over the return intervals in which $Z$ is thought to contain a jump. The null in many jump regression settings is that the jump regression coefficient, termed the jump beta, is constant at every jump time, i.e. $\Delta Y_{\tau_p} = \beta \Delta Z_{\tau_p}$, where $\tau_p$ are the jump times of $Z$.
For this Monte Carlo study we perform a test of a constant jump beta under both a simulated model that has a constant jump beta and a model with a time varying jump beta. We report rejection rates for the test as well as the average selected thresholds α * n and the accuracy and recovery rates of the estimated jumps. For the test of a constant jump beta we use a bootstrap version of the determinant test of Li, Todorov, and Tauchen (2015a).
NOTE: The recovery rates are for jumps of sizes equal to 1-10 unit local standard deviations in terms of return intervals sampled at a ten minute frequency. The σ above indicates a unit of local standard deviation. The average selected threshold parameter $\alpha$ across Monte Carlo replications is denoted $\alpha_n^*$. The jump recovery rate is defined as the number of correctly matched jumps divided by the number of true jumps. The jump accuracy is defined as the number of correctly matched jumps divided by the number of estimated jumps. The results are based on 1000 replications following the data generating process outlined in Section 5.2.

We simulate data using a model adapted from Li, Todorov, and Tauchen (2015a). The model is given below,
where W ,W , and B are three independent Brownian motions. J t andJ t are compound Poisson jump processes where the jump size densities follow double-exponential (or Laplacian) distributions and the jump intensities are λ = 1/12 × 252 andλ = 1/48 × 252 respectively. We set β c = 0.89.
The jump beta process β J t follows the following specifications under the null and the alternative whereB is a Brownian motion independent of W ,W , and B. The unconditional mean of β J t under the alternative is 1. The model differs from Li, Todorov, and Tauchen (2015a) only in the specification of different jump and diffusive betas.
We perform the study using 1000 replications and set T = 3 × 252, which corresponds to three years' worth of simulated data. We use an Euler scheme to simulate the high-frequency data, doing an initial simulation with N = 390 × 60 × 10, which corresponds to sampling once every tenth of a second. We then sample these high-frequency returns at one second, one minute, five minute, and ten minute frequencies. These parameters were chosen to match the Monte Carlo study in Li, Todorov, and Tauchen (2015a) as closely as possible.

Using the curvature method, the test of a constant jump beta is moderately over-sized at ten and five minute sampling and not terribly over-sized at one minute sampling. (This is perhaps to be expected, as Li, Todorov, and Tauchen (2015a) found the test of a constant jump beta to be moderately over-sized.) These fairly mild over-rejections under the curvature method, however, stand in stark contrast to the results using a fixed α = 4. Notice how, with a fixed α = 4, the size of the test becomes progressively worse as the sampling frequency increases; even at ten and five minute sampling the test is quite over-sized. This result is due to the inclusion in the jump regression of return intervals that contained only diffusive moves, thereby biasing the estimated jump beta. To see this, notice that with α = 4 the accuracy of the jump detection procedure deteriorates significantly as the sampling frequency increases: at one minute sampling the average accuracy is 88.1%, and at one second sampling it is a very low 26.5%. This means that in the respective jump regressions, on average, fully 11.9% and 73.5% of the returns labeled as containing jumps did not actually contain a jump. In contrast, using the curvature method the accuracy of the estimated jumps remains high at all sampling frequencies.
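An Euler simulation of this general kind can be sketched as follows. The parameter values and the at-most-one-jump-per-increment simplification are illustrative assumptions, not the paper's calibration.

```python
import numpy as np

def simulate_jump_diffusion(T=1.0, n=39000, b=0.03, sigma=0.15,
                            lam=25.0, jump_scale=0.02, seed=0):
    """Euler discretization of dX = b dt + sigma dW + dJ, where J is a
    compound Poisson process with Laplace (double-exponential) jump
    sizes; at most one jump per increment is assumed for simplicity."""
    rng = np.random.default_rng(seed)
    dt = T / n
    diffusive = b * dt + sigma * np.sqrt(dt) * rng.standard_normal(n)
    has_jump = rng.random(n) < lam * dt           # P(jump in an increment)
    jumps = rng.laplace(0.0, jump_scale, n) * has_jump
    X = np.cumsum(diffusive + jumps)              # simulated log-price path
    return X, has_jump
```

Coarser return series (e.g. one-minute returns from tenth-of-a-second increments) are then obtained by summing blocks of consecutive increments.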
Finally, note that the power of the test using both methods is consistent with the results in Li, Todorov, and Tauchen (2015a).

NOTE: REC is the average jump recovery rate, ACC is the average accuracy of the estimated jumps, and α_n^* is the average selected threshold parameter α across the Monte Carlo replications. The jump recovery rate is defined as the number of correctly matched jumps divided by the number of true jumps. The jump accuracy is defined as the number of correctly matched jumps divided by the number of estimated jumps. The jump accuracy and recovery rate are in percentage terms. The results are based on 1000 replications following the data generating processes for the null and the alternative as outlined in Section 5.3.
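The recovery and accuracy rates used throughout can be computed mechanically. The sketch below assumes jumps are matched simply by comparing interval indices (a hypothetical simplification of the matching used in the study).

```python
def recovery_and_accuracy(true_idx, est_idx):
    """Recovery = matched / true jumps; accuracy = matched / estimated
    jumps, where a match is an estimated interval index that coincides
    with a true jump index."""
    matched = len(set(true_idx) & set(est_idx))
    recovery = matched / len(true_idx)
    accuracy = matched / len(est_idx)
    return recovery, accuracy
```

For instance, with true jumps in intervals {3, 10, 42} and estimated jumps in {3, 10, 7, 99}, two jumps are matched, so the recovery rate is 2/3 and the accuracy is 1/2.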

Empirical application
We consider two empirical applications. The first estimates the jumps and reports the jump threshold selected by our method for three commonly used and highly liquid market indices. The second reports the results of jump regressions of the nine SPDR sector ETFs against the SPDR S&P500 ETF, using our method to select the jump threshold.

Estimating jumps in market indices
For the E-mini S&P500 index futures (ES), the SPDR S&P500 ETF (SPY), and the VIX futures (VIX) we use the tools developed in this study to estimate the optimal jump thresholds for each series over a range of dates and a range of sampling frequencies. We report both the jump threshold selected by our method as well as the estimated number of jumps at each selected jump threshold.
The SPY and ES series span the dates January 3, 2007 to December 12, 2014. The VIX series spans the dates July 2, 2012 to April 30, 2015. Only the more recent VIX futures data are used because Bollen, O'Neill, and Whaley (2015) provide evidence that the VIX futures market was highly illiquid and immature in prior periods. For each series we remove market holidays and partial trading days; and, to guard against possible adverse microstructure effects, we discard the first five minutes and the last five minutes of each trading day.
For each series we performed the estimation over both the entire sample and each complete calendar year within each sample. In addition, we performed the estimation using one minute, five minute, and ten minute intraday returns. Tables 5 and 6 report the selected jump threshold α_n^* and the estimated number of jumps at the selected jump threshold.
In Table 6, which reports the selected jump threshold α_n^*, notice that for the E-mini S&P500 index futures (ES) and the SPDR S&P500 ETF (SPY) there appears to be somewhat of an increase in the selected threshold as the sampling frequency increases from ten minute to five minute to one minute sampling. As discussed in Section 3.3, this is to be hoped for, as the number of jump misclassifications is actually increasing over this range of sampling frequencies. For the VIX futures (VIX) we do not see much of a pattern in the selected jump threshold α_n^*. This, however, should not be seen as evidence against our threshold selection procedure, since Andersen, Bondarenko, Todorov, and Tauchen (2015) provide evidence that the high-frequency returns of the VIX futures might be well modeled as following an α-stable distribution with α ≈ 1.8. If this were true then not only would we not expect the same misclassification dynamics as in the diffusive case, but the correct scaling of the returns would be on the order of ∆_n^{1/α} rather than ∆_n^{1/2}.

For Table 5, which reports the estimated number of jumps, notice that the number of estimated jumps is always increasing as the sampling frequency increases. Note also that the number of jumps detected at the 5-min and 10-min frequencies is very small, reflecting the inherently conservative nature of the curvature method. In practice, common sense suggests that at these coarser frequencies the practitioner might elect to experiment a bit with slightly lower values of α than those produced directly by the curvature method, which nonetheless provides a sensible baseline.

NOTE: The table reports the estimated number of jumps for each sample at the chosen α_n^* jump threshold given in Table 6. The frequency refers to the sampling frequency of the returns. The SPY and ES series span the dates January 3, 2007 to December 12, 2014. The VIX series spans the dates July 2, 2012 to April 30, 2015.

Jump regressions
Using the nine SPDR sector ETFs we perform a series of jump regressions of the sector ETFs against the SPDR S&P500 market ETF (SPY). We determine the jumps in the SPY series via the jump threshold parameter α_n^* based on the curvature method developed in Section 4 above. Then, to examine how sensitive these jump regressions are to different jump thresholds, we consider two other thresholds, α_n^+ and α_n^-, which are equal to α_n^* plus and minus 15% respectively. The reason for basing the jump threshold on the SPY series is that a jump regression only considers the beta for the regression of the specific asset return on the market return over intervals in which the market (SPY) is thought to have jumped. Note that the data are for the year 2009 and that we use one-minute returns to estimate the jumps but five-minute returns to perform the jump regressions.⁷ We chose the year 2009 because it was a representative year and one for which there appeared to be empirical support for a constant jump beta for each asset over the year.⁸

Table 7 reports the jump beta, the standard error of the jump beta, the R² of the regression, and the p-value of the null of a constant jump beta over the year. The p-values are calculated using a bootstrap version of the determinant test in Li, Todorov, and Tauchen (2015a). The standard errors are calculated under the simplifying assumption that the volatility process of the diffusive moves is continuous across the market jump times; relaxing this assumption makes inference far more complicated but in the end barely changed the conclusions. For some of the portfolios the estimated beta reported in the table seems relatively insensitive to the 15% perturbations of α_n^*, but there are some notable exceptions. In particular, the jump beta for the XLF (Finance) portfolio is markedly lower (1.182 vs 1.687) using α_n^- versus α_n^*. The same is also true, though to a lesser degree, for XLK (Technology), XLU (Utilities), and XLV (Health Care). These four are economically important portfolios where the beta value matters, and one does not want a misleading estimate obtained by letting in too many diffusive moves and thereby throwing off the jump regression. At the same time, note that for all nine portfolios the estimation precision obtained with α_n^* is higher (lower standard error) than with α_n^+, which of course reflects the inclusion of more jumps, i.e., more data points.

⁷ The SPY asset is sufficiently liquid to use to identify jump intervals at the 1-min level; the subsequent aggregation to 5-min returns is a correction for possible trading friction noise in the returns of the less liquid sector-specific assets.
⁸ Not all years showed such evidence of constant jump betas. For the sake of exposition we do not report the results from these years, since there is not much to learn from examining jump regression results under different jump thresholds if the jump beta is time-varying. Results for all years are available on request, however.

NOTE: The jumps were located using one-minute returns and the jump regressions were performed using five-minute returns. The threshold α_n^* is the estimated threshold using the curvature method, and α_n^+ and α_n^- are α_n^* plus and minus 15% respectively. The standard errors are calculated under the simplifying assumption that the volatility is continuous over the day. The p-values are from a bootstrap version of the determinant test in Li, Todorov, and Tauchen (2015a).

Conclusion
This paper introduced a method for selecting the threshold in threshold-based jump detection schemes. Previously, the selection of the threshold in such schemes has been left to the discretion of each researcher in each project. This creates a problem because the number of estimated jumps in a series of observed returns can vary substantially depending on which threshold a researcher selects.
Our method therefore advances the existing literature on asset price jumps by providing a data-driven way to select the jump threshold. Moreover, we believe researchers will find our method intuitive and easy to implement in practice.
In developing our method we first showed that, over the range of sampling frequencies a researcher is most likely to encounter, the standard in-fill asymptotics provide a poor guide for the selection of the jump threshold. Because of this we developed a sample-based method, which proceeds as follows. Given a series of observed returns, we first estimate the number of jumps in the series over a grid of possible thresholds. Doing so produces a jump count function whose value at each threshold in the grid is the number of estimated jumps at that threshold. Our method then selects the threshold at which the curvature of a suitably smoothed version of the jump count function is maximized. We think of this point as the point where the estimated number of jumps begins to 'take off'. We argue that selecting the threshold at this point should capture many of the true jumps in the process while guarding against including returns that contain only diffusive moves. We show that, as the sampling interval goes to zero, this methodology consistently estimates the jumps of a jump-diffusion model and asymptotically excludes returns that contain only diffusive moves.
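The steps just described can be sketched schematically as follows. This is a stylized stand-in, not the paper's implementation: the unconditional standard deviation is used as a crude proxy for the local standard deviation, and the moving-average smoother and grid are illustrative choices.

```python
import numpy as np

def select_threshold(returns, alphas, window=5):
    """Count estimated jumps over a grid of thresholds (in units of
    standard deviations), smooth the resulting jump count function, and
    select the threshold at which the curvature
    kappa = g'' / (1 + g'^2)^(3/2) is maximized (the 'take-off' point)."""
    sd = np.std(returns)                            # crude local-sd proxy
    counts = np.array([(np.abs(returns) > a * sd).sum() for a in alphas],
                      dtype=float)
    kernel = np.ones(window) / window
    g = np.convolve(counts, kernel, mode="same")    # smoothed jump counts
    g1 = np.gradient(g, alphas)                     # g'
    g2 = np.gradient(g1, alphas)                    # g''
    curvature = g2 / (1.0 + g1 ** 2) ** 1.5
    return alphas[int(np.argmax(curvature))]
```

A researcher would call this with the observed high-frequency returns and a grid such as `np.linspace(1.0, 8.0, 71)` and use the returned α to flag jump intervals.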
Having developed a methodology for selecting the threshold in threshold-based jump detection schemes, we demonstrated its performance in several Monte Carlo studies and two empirical applications. The Monte Carlo studies showed that our method recovers many of the true jumps in the data generating processes considered while maintaining a high degree of accuracy in the returns it labels as containing jumps. Further, one Monte Carlo study showed the improvement our method provides in a jump regression context. Finally, in two empirical studies we applied the method to real-world data. In the first we estimated the number of jumps and reported the jump threshold selected by our method over a range of dates and sampling frequencies for three commonly used series in finance: the SPDR S&P500 ETF, the S&P500 E-mini futures, and the VIX futures. In the second we performed a series of jump regressions in which we regressed the returns of the SPDR sector ETFs on the corresponding returns of the SPDR S&P500 ETF (SPY) over the intervals thought to contain SPY jumps, using our method to select the jump threshold.

Appendix: Proofs
For notational brevity, below we refer to the optimally selected jump threshold as α_n, that is, α_n = arg max κ(g_n(α)), rather than α_n^* as in the main text. For two positive sequences of real numbers a_n and b_n we use the following notation: we write a_n ∼ b_n if a_n/b_n → 1 as n → ∞; we write a_n = O(b_n) if for some N^* ∈ N there exists c > 0 such that a_n ≤ c b_n for all n > N^*; we write a_n = Ω(b_n) if for some N^* there exists c > 0 such that a_n ≥ c b_n for all n > N^*; and finally we write a_n = Θ(b_n) if for some N^* there exist c > 0 and d > 0 such that c b_n ≤ a_n ≤ d b_n for all n > N^*.
We also drop the dependence of I n (α, D) and I(D) on the region D and simply write I n (α) and I instead.
Proof of Theorem 1. (a) Since the jumps of X have finite activity, we can assume without any loss of generality that each interval ((i − 1)∆_n, i∆_n] contains at most one jump. (If not, we can restrict attention to the w.p.a.1 set of sample paths on which this condition holds.) We denote the continuous part of X by X^c. Following Li, Todorov, and Tauchen (2014), notice that I_n(α) can be broken into two disjoint sets I_{1n}(α) and I_{2n}(α) defined as I_{1n}(α) = I ∩ I_n(α) and I_{2n}(α) = I_n(α) \ I.
To see this notice that, almost surely, Therefore w.p.a.1 the sets I 1n (α n ) and I will coincide.

Part (ii):
We first show a result concerning the distribution of the diffusive moves. Because the diffusive moves are locally Gaussian, for a fixed α > 0 we have P(|∆_i^n X^c| > ασ) = 2Q(α∆_n^{−1/2}), where Q(·) = 1 − Φ(·) and Φ(·) is the cumulative distribution function of the standard normal distribution.
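Spelling out the standardization step behind this equality, under the local approximation ∆_i^n X^c ∼ N(0, σ²∆_n):

```latex
\mathbb{P}\left(|\Delta_i^n X^c| > \alpha\sigma\right)
  = \mathbb{P}\left(\frac{|\Delta_i^n X^c|}{\sigma\sqrt{\Delta_n}}
      > \frac{\alpha\sigma}{\sigma\sqrt{\Delta_n}}\right)
  = 2\,Q\!\left(\alpha\,\Delta_n^{-1/2}\right),
\qquad Q(z) = 1 - \Phi(z).
```

Since Q(z) → 0 as z → ∞, this probability vanishes as ∆_n → 0 for any fixed α > 0.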
Having provided rates at which the coefficients of g_n(α) go to zero, we use these results to describe the limiting behavior of the curvature function of g_n(α), which in turn allows us to analyze how α_n = arg max κ(g_n(α)) behaves in the limit. The curvature of g_n(α) is κ(g_n(α)) = g_n''(α) / (1 + [g_n'(α)]²)^{3/2}.
Having shown that α_n → α, we now find its rate of convergence. First, however, we need a simple result concerning linear functions. For any non-zero linear function f : R → R, the rate at which f(x) → f(c) as x → c, for a constant c ∈ R, is the same as the rate at which x → c, because we can express f(x) = mx + b for some m ≠ 0 and b ∈ R. With this in mind, define the function h_n(α) = κ(g_n(α)) − g_n''(α) and consider the Taylor approximation of h_n around α_n = α when α is 'small'. That is, h_n(α_n) = h_n(α) + h_n'(α)(α_n − α) + O((α_n − α)²).