Abstract
The risk-neutral density is a fundamental concept in pricing financial derivatives, managing risk, and assessing financial markets’ perceptions of significant political or economic events. In this paper, we propose a new nonparametric method for estimating the risk-neutral density using natural cubic splines (NCS). The estimated density is twice continuously differentiable with linear tails at both ends. Our method targets the logarithm of the underlying asset price, which removes the restriction to the positive domain. We prove the consistency of our NCS method theoretically. We conduct a comprehensive empirical study comparing the proposed NCS method with a piecewise constant method, a uniform quartic B-spline method, and a cubic spline method from the literature using 20 years of S&P 500 index option data. The empirical results show that our NCS method is more robust than the piecewise constant method, which can only produce a discontinuous density, especially for options with maturities longer than six months. Moreover, our NCS method outperforms the other existing continuous methods in terms of optimization feasibility and option price estimation.
1. Introduction
Risk is a fundamental factor in asset pricing. Suppose a certain asset has an expected payoff in the future. To determine the current value of this asset, we need to adjust its expected payoff based on investor risk preferences. For example, a risk-averse investor would price the asset lower than its expected future payoff. Nevertheless, adjusted rates can vary among investors due to differing risk preferences. A risk-neutral distribution encapsulates all investors’ risk sentiments into the probability of future outcomes of the asset. To price the asset, we can take the expectation of the future payoff with respect to the risk-neutral distribution, and then discount the expected payoff at a risk-free rate for all investors. As a fundamental theorem of asset pricing in both financial economics and mathematical finance, a risk-neutral distribution exists under the assumption of no arbitrage (Pascucci 2011) or no free lunch with vanishing risk (Björk 2009).
Risk-neutral distributions have many applications. Inspired by Lee (2008), we list three of them. First, they can be used to price complex derivatives (Grünbichler and Longstaff 1996; Rosenberg 1998). A derivative is a contract between two or more parties whose value depends on the performance of the agreed underlying asset. The fair price of a derivative can be calculated given its payoff function, the risk-neutral density (RND) of the underlying asset at maturity, and the risk-free rate. In practice, the risk-free rate is typically proxied by the interest rate of zero-coupon Treasury bills. Second, risk-neutral distributions can be utilized in risk management to calculate Value-at-Risk (VaR), a commonly used metric for measuring the potential risk within a portfolio over a specific time frame (Aït-Sahalia and Lo 2000). Third, they allow us to assess market perceptions of significant political and economic events, which can greatly impact financial markets. For example, Melick and Thomas (1997) examined market expectations during the Persian Gulf Crisis, while Söderlind (2000) investigated perceptions of the European Exchange Rate Mechanism (ERM) Crisis.
There is a rich literature on determining the risk-neutral distribution through option pricing. European options are contingent claims whose values reflect the market’s anticipation of the underlying asset’s future prices. As first mentioned by Cox and Ross (1976), under the assumption of no arbitrage, the fair prices of European call and put options priced at time t with maturity T can be expressed as
$$C_t(K) = e^{-r_{t,T}(T-t)} \int_K^{\infty} (S_T - K)\, f(S_T)\, dS_T \tag{1}$$
and
$$P_t(K) = e^{-r_{t,T}(T-t)} \int_0^{K} (K - S_T)\, f(S_T)\, dS_T, \tag{2}$$
respectively, where K denotes the strike price of the option, $r_{t,T}$ is the risk-free rate for the time period $[t, T]$, and $f(S_T)$ denotes the risk-neutral density of the underlying asset price $S_T$ at the expiry time T. As shown in Breeden and Litzenberger (1978), the second derivative of a European call price over the strikes can be used to infer the risk-neutral density through
$$f(K) = e^{r_{t,T}(T-t)}\, \frac{\partial^2 C_t(K)}{\partial K^2}. \tag{3}$$
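The second-derivative relation of Breeden and Litzenberger (1978) can be checked numerically in a setting where the density is known. The sketch below is only an illustration under stated assumptions: call prices come from the Black-Scholes formula (so the true density is lognormal), the second derivative is approximated by a central difference on a hypothetical strike step `dK`, and all parameter values are made up for the example.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, sigma, tau):
    """Black-Scholes price of a European call (no dividends)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def lognormal_rnd(x, S, r, sigma, tau):
    """Risk-neutral density of S_T under geometric Brownian motion."""
    m = math.log(S) + (r - 0.5 * sigma**2) * tau
    s = sigma * math.sqrt(tau)
    z = (math.log(x) - m) / s
    return math.exp(-0.5 * z * z) / (x * s * math.sqrt(2.0 * math.pi))

def implied_density(K, dK, S, r, sigma, tau):
    """Undiscounted central second difference of call prices in the strike."""
    c_up = bs_call(S, K + dK, r, sigma, tau)
    c_mid = bs_call(S, K, r, sigma, tau)
    c_dn = bs_call(S, K - dK, r, sigma, tau)
    return math.exp(r * tau) * (c_up - 2.0 * c_mid + c_dn) / dK**2

# Hypothetical inputs chosen only for illustration.
S0, RATE, VOL, TAU = 100.0, 0.02, 0.2, 0.5
test_strikes = (80.0, 100.0, 120.0)
bl_values = [implied_density(K, 0.5, S0, RATE, VOL, TAU) for K in test_strikes]
exact_values = [lognormal_rnd(K, S0, RATE, VOL, TAU) for K in test_strikes]
```

With market quotes the strike grid is coarse and prices are noisy, so a raw finite difference of this kind is unstable, which is one motivation for the smoothing and curve-fitting approaches reviewed in this section.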
In the literature, option-implied approaches to recover risk-neutral distributions can be categorized into two classes: parametric methods and nonparametric methods. Parametric methods assume an explicit form of the risk-neutral density $f$, and then leverage (1) and (2) to fit the option prices and estimate the parameters in $f$. Nonparametric methods allow more flexibility in the representation of $f$ and largely rely on data to decide the form of $f$. Jackwerth (1999) and Bondarenko (2003) reviewed the approaches in both categories. Further surveys of existing methods can be found in Bliss and Panigirtzoglou (2002), Markose and Alentorn (2005), Figlewski (2018), and Li et al. (2024).
Black and Scholes (1973) suggested that, assuming the stock price process is governed by a geometric Brownian motion, the stock price can be explicitly represented by
$$S_T = S_t \exp\left\{\left(r - \frac{\sigma^2}{2}\right)(T - t) + \sigma\,(W_T - W_t)\right\}, \tag{4}$$
where $W_t$ denotes the standard Brownian motion, $\sigma$ denotes the volatility of the stock price $S_t$, and r denotes the risk-free rate. Equation (4) indicates that $S_T$ follows a two-parameter log-normal distribution with constant volatility, which conflicts with the well-known “volatility smile” (Hull 2022). This lack of flexibility leads to the problem of under-fitting when estimating the risk-neutral density using observed option prices. To address this issue, more flexible models have been developed (see, e.g., Kumar et al. (2023) and references therein).
Parametric methods have the advantage of parsimony, as they typically rely on only a few parameters to characterize the distribution. Examples include the lognormal distribution (Jarrow and Rudd 1982), a mixture of lognormals (Melick and Thomas 1997; Ritchey 1990), the normal inverse Gaussian distribution (Eriksson et al. 2009), the generalized extreme value distribution (Markose and Alentorn 2011), and a finite lognormal-Weibull mixture distribution (Li et al. 2024). However, the success of parametric methods largely depends on the accuracy of the assumed parametric model. Additionally, the flexibility of a parametric method is limited by its underlying model (Figlewski 2018; Jiang et al. 2021). For example, one major issue with the parametric approach based on normal inverse Gaussian (NIG) distributions (Eriksson et al. 2009) is that the estimated skewness and kurtosis pairs often fall outside the feasible domain as time-to-maturity increases (Ghysels and Wang 2014; Jiang et al. 2021).
Nonparametric methods impose fewer restrictions on the functional form of the risk-neutral density compared to parametric methods. Being primarily data-driven, they can be categorized into three classes: kernel methods, maximum entropy methods, and curve-fitting methods.
Kernel methods are nonparametric regression approaches that fit the target function using a weighted average. As shown in Aït-Sahalia and Lo (1998), kernel methods are used to estimate the option pricing formula $C(\cdot)$ as follows:
$$C_i = C(Z_i) + \epsilon_i, \quad i = 1, \ldots, n,$$
where $C_i$ is the observed option price at $Z_i$, which is a five-dimensional vector describing the option’s characteristics including stock price, strike price, time to expiration, interest rate, and dividend yield, and $\epsilon_i$ is a white noise. Given the smoothness assumption of $C(\cdot)$, the estimated $C(Z)$ is
$$\hat{C}(Z) = \frac{\sum_{i=1}^{n} k\left(\frac{Z - Z_i}{h}\right) C_i}{\sum_{i=1}^{n} k\left(\frac{Z - Z_i}{h}\right)},$$
where $k(\cdot)$ is the kernel function, and h is the bandwidth, both of which determine the size of the neighborhood around $Z$ included in computing the function average. Related studies can also be found in Aït-Sahalia and Lo (2000).
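The kernel-weighted average described above can be sketched in a few lines. This is a simplified illustration, not the estimator of Aït-Sahalia and Lo (1998): it assumes a single regressor (the strike) instead of the five-dimensional characteristic vector, a Gaussian kernel, and hypothetical toy prices.

```python
import math

def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nadaraya_watson(z, z_obs, c_obs, h):
    """Kernel-weighted average of the observed prices c_obs around the point z."""
    weights = [gaussian_kernel((z - zi) / h) for zi in z_obs]
    total = sum(weights)  # positive whenever z is within a few bandwidths of the data
    return sum(w * ci for w, ci in zip(weights, c_obs)) / total

# Hypothetical toy data: a smooth convex price curve observed on a strike grid.
toy_strikes = [90.0 + i for i in range(21)]
toy_prices = [0.02 * (112.0 - k) ** 2 for k in toy_strikes]
smoothed = nadaraya_watson(100.0, toy_strikes, toy_prices, h=2.0)
```

A larger bandwidth `h` averages over more neighboring observations and gives a smoother but more biased estimate; a smaller one follows the data more closely.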
Maximum Entropy methods follow the Bayesian framework by treating the risk-neutral distribution as a posterior density with some predetermined prior. Buchen and Kelly (1996) utilized uniform and lognormal distributions as prior distributions, while Stutzer (1996) selected historical risk-neutral distributions of the underlying asset as priors.
Curve-fitting methods refer to fitting a non-parametric function, such as polynomials and splines, to the risk-neutral density itself or the implied volatility. There are many variants in this category. Shimko (1993) first fitted the implied volatility curve with a quadratic polynomial with the strike price as its variable. The fitted implied volatility can be used to calculate the option prices and then (3) from Breeden and Litzenberger (1978) can be readily applied to derive the corresponding risk-neutral density. Malz (1997) updated Shimko’s method by fitting the implied volatility against option deltas with a same-order polynomial. Campa et al. (1998) used a smoothing spline to fit the implied volatility against strike price to allow more control over the smoothness of the resulting density. Bliss and Panigirtzoglou (2002) followed the methods from Malz (1997) and Campa et al. (1998) and examined the stability of the resulting risk-neutral density. Since there are no direct constraints on the implied risk-neutral density, one needs to check its unity integration and non-negativity. Besides fitting the implied volatility first, one can fit the risk-neutral density itself. Monteiro et al. (2008) used cubic splines to fit the risk-neutral distribution directly. They placed the knots as a superset of the option strikes to achieve the desired fitting flexibility. Lee (2014) proposed a uniform quartic B-spline method with power tails to model the risk-neutral density (RND). A feasibility issue raised by their filtering procedures (Lee 2014; Monteiro et al. 2008) is that fewer options are available for estimating the RND as time-to-maturity increases (Jiang et al. 2021). To address this issue, Jiang et al. (2021) proposed a more robust nonparametric approach that uses all the available options to fit a piece-wise constant function as the estimated RND.
There are other nonparametric methods, such as implied binomial trees developed by Rubinstein (1994) and Jackwerth and Rubinstein (1996), which focus on modeling the price evolution of the underlying asset. Overall, nonparametric methods are data-driven, so over-fitting must be controlled.
In this paper, we propose a nonparametric natural cubic spline (NCS) approach to model the risk-neutral density directly. It falls into the category of using spline functions to fit the risk-neutral density itself. The estimated risk-neutral density is twice continuously differentiable with stable linear tails at both ends. Compared with commonly used parametric and nonparametric models that also produce continuous risk-neutral densities, the NCS method demonstrates superior performance in terms of optimization feasibility. In contrast to the recent piece-wise constant approach (Jiang et al. 2021), which yields discontinuous implied densities, NCS can generate more robust risk-neutral densities, especially for options with expirations longer than six months.
2. Materials and Methods
In this section, we first examine relevant historical approaches in depth, including the cubic spline approach in Monteiro et al. (2008), the uniform quartic B-spline approach in Lee (2014), and the piece-wise constant approach in Jiang et al. (2021). We then introduce our natural cubic spline (NCS) approach.
We consider a cross-section of European options priced at t with maturity T and strike prices $K_1 < K_2 < \cdots < K_n$. We let the range $[a, b]$ be divided into $N$ contiguous sub-intervals by a set of points $\xi_0, \xi_1, \ldots, \xi_N$ with $a = \xi_0 < \xi_1 < \cdots < \xi_N = b$, where $\xi_0, \ldots, \xi_N$ are known as the knots. A spline function defined on $[a, b]$ can be represented by $N$ order-m polynomials on the $N$ sub-intervals, respectively. Based on the order of the piecewise polynomials, the spline functions can be categorized into piecewise constant splines ($m = 0$), piecewise linear splines ($m = 1$), cubic splines ($m = 3$), etc.
2.1. A Cubic Spline Approach
A cubic spline approach has been proposed by Monteiro et al. (2008) for capturing the risk-neutral density using a set of N equidistant knots , such that . The cubic polynomial component defined on interval is
By letting denote the collection of parameters and be the cubic spline defined on , there are a total of parameters to be determined. As suggested by Monteiro et al. (2008), the spline knots are placed as a superset of existing strike prices to achieve fitting flexibility. To remove the options that may violate the non-arbitrage assumption, they first generate call options from the observed put options through the put-call parity. If a call option with the same strike already exists, they retain the one with a higher volume and discard the options that violate monotonicity and strict convexity, which may lead to fewer options, especially with a longer expiration.
The estimated price of a call option with a strike price is
which shows that the estimated call option price is a linear combination of the parameters. A similar formula for put option prices can be derived. To determine the parameters, one may minimize the following least squares loss function
under the constraints that ensure the resulting density is non-negative, twice continuously differentiable, and integrates to 1, where and denote the observed market prices of call and put options, respectively, with strike price .
Monteiro et al. (2008) assert that more knots than existing strike prices are needed to ensure fitting flexibility with observed market option prices. They also suggest using “artificial” call options generated from the put-call parity to eliminate arbitrage opportunities, a step that may cause some loss of market information. We demonstrate later that our proposed natural cubic spline method achieves better fitting results using fewer knots.
2.2. Uniform Quartic B-Spline Approach
Lee (2014) utilized quartic B-spline functions defined on equidistant knots for estimating risk-neutral distribution functions, called the uniform quartic B-spline method. A quartic B-spline refers to a spline where the highest polynomial degree is four and the first three derivatives of the spline functions are continuous within the boundary knots. The format of the uniform quartic B-spline basis functions can be expressed as follows:
where , , and is the indicator function of .
For European options priced at t with maturity T and strike price , the risk-neutral cumulative distribution function (CDF) within the range can be represented by the linear combinations of . The parts of the CDF beyond the range of strike prices, as suggested by Lee (2014), can be represented by power tails. The resulting risk-neutral CDF is represented by
and the estimated price of a call option with a strike price K can be expressed as
where , , and is the annualized risk-free interest rate for period . Equation (5) is linear in the basis function coefficients but non-linear in the power tail parameters , which requires extra computational efforts to find the optimal parameters. A pricing formula for put options can be derived similarly.
As suggested by Lee (2014), only the out-of-the-money (OTM) options are used to estimate the parameters, since OTM options are more liquid than in-the-money (ITM) options. The least square loss combined with a penalty of smoothness is used as the objective function as follows:
where denotes the observed OTM option price with strike price K, and denotes the estimated OTM option price with strike price K. Then is minimized with the constraints that ensure the resulting implied density is non-negative, twice continuously differentiable, and integrating to 1.
Since is highly nonlinear, it is difficult to determine the tail parameter and the B-spline basis function coefficients at the same time. On the other hand, the uniform B-splines are defined on a set of equidistant knots, which do not necessarily coincide with the existing strike prices. Lee (2014) proposed to select the minimum number of knots, such that all OTM option prices estimated from the fitted B-spline model fall between their bid and ask quotes. This procedure is computationally intensive since it iteratively fits the B-spline model till the optimal number of knots is achieved. In cases where no estimated OTM option prices fall between their bid and ask quotes, this uniform B-spline approach is infeasible.
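The knot-selection loop described above can be sketched as follows. This is a schematic illustration only: `fit_prices` is a hypothetical callable standing in for the full B-spline fit, and the search range `min_knots` to `max_knots` is an assumption, not a value taken from Lee (2014).

```python
def select_min_knots(fit_prices, bids, asks, min_knots, max_knots):
    """Return the smallest knot count whose fitted prices all lie within the quotes.

    fit_prices is a hypothetical callable: n_knots -> list of fitted OTM prices,
    in the same order as bids and asks. Returns None when no knot count in the
    search range is feasible.
    """
    for n_knots in range(min_knots, max_knots + 1):
        fitted = fit_prices(n_knots)  # one full model fit per candidate knot count
        if all(b <= p <= a for p, b, a in zip(fitted, bids, asks)):
            return n_knots
    return None
```

Each candidate requires a complete refit, which is why the procedure is computationally intensive, and a `None` result corresponds to the infeasible case mentioned above.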
2.3. A Piece-Wise Constant Approach
Recently, Jiang et al. (2021) proposed a piecewise constant (PC) approach to fit the risk-neutral density (RND). If we denote as the distinct strike prices available in the market, the PC risk-neutral density function takes the forms of
and zero elsewhere, where and , with some predetermined constant . By choosing all available distinct strike prices as knots and fitting a constant line segment between each pair of adjacent knots, this approach is computationally simple and can provide fairly accurate estimates of option prices given the strike prices. Nevertheless, its main drawback is that the resulting density is discontinuous at knots, whereas continuity is a desired property of risk-neutral densities.
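To illustrate why a piecewise constant density is computationally simple, the sketch below prices a call in closed form under such a density. The parametrization is an assumption made for the example (a list of `knots` and one `heights` entry per interval); it does not reproduce the exact boundary extension used by Jiang et al. (2021).

```python
import math

def pc_call_price(K, knots, heights, r, tau):
    """Discounted call price under a piecewise-constant density.

    heights[i] is the density value on [knots[i], knots[i + 1]]; the density is
    zero elsewhere. The price is linear in the heights.
    """
    expected_payoff = 0.0
    for lo, hi, a in zip(knots[:-1], knots[1:], heights):
        if hi <= K:
            continue  # payoff (s - K)^+ vanishes on this interval
        lo = max(lo, K)
        # Integral of (s - K) over [lo, hi], times the constant height a.
        expected_payoff += a * ((hi - K) ** 2 - (lo - K) ** 2) / 2.0
    return math.exp(-r * tau) * expected_payoff

# Hypothetical example: a uniform density on [90, 110] split into two pieces.
pc_knots = [90.0, 100.0, 110.0]
pc_heights = [0.05, 0.05]
atm_price = pc_call_price(100.0, pc_knots, pc_heights, r=0.0, tau=1.0)
```

Because each price is a linear function of the heights, fitting the heights to observed prices reduces to a constrained linear least squares problem.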
2.4. The Proposed Natural Cubic Spline Approach
In this section, we propose a new natural cubic spline approach for estimating the risk-neutral density.
We let $S_t$ denote the underlying asset or security price at time t, and assume that the logarithm of the future security price at time T (denoted by $\log S_T$) follows a risk-neutral distribution whose density can be characterized by a natural cubic spline function (see, e.g., Hastie et al. (2009)). Two desired properties of natural cubic splines are that they are twice continuously differentiable across the support and exhibit linear tails beyond the boundary knots. The first property guarantees the desired smoothness of the resulting risk-neutral density. Given that we only have option information within the strike range, we follow the original natural cubic spline method and assume linear tails beyond the boundaries to reduce the variance in the estimated cubic splines near the boundaries (Hastie et al. 2009).
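More specifically, we adopt the following natural cubic spline basis functions as suggested by Hastie et al. (2009) to represent the risk-neutral density:
$$N_1(x) = 1, \quad N_2(x) = x, \quad N_{k+2}(x) = d_k(x) - d_{K-1}(x),$$
where K denotes the number of knots, $k = 1, \ldots, K-2$,
$$d_k(x) = \frac{(x - \xi_k)_+^3 - (x - \xi_K)_+^3}{\xi_K - \xi_k},$$
with $(u)_+ = \max(u, 0)$, and $\xi_1 < \xi_2 < \cdots < \xi_K$ are the K knots.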
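The natural cubic spline basis of Hastie et al. (2009) can be evaluated directly from its truncated-power form. The sketch below is a minimal illustration; the function names and the example knots are hypothetical, and indices are 0-based in the code while the text uses 1-based indices.

```python
def _pos_cubed(u):
    """Truncated cubic power (u)_+^3."""
    return max(u, 0.0) ** 3

def d_k(x, knots, k):
    """d_k(x) from Hastie et al. (2009); k is a 0-based knot index."""
    last = knots[-1]
    return (_pos_cubed(x - knots[k]) - _pos_cubed(x - last)) / (last - knots[k])

def ncs_basis(x, knots):
    """Values of the K natural cubic spline basis functions at x."""
    n_knots = len(knots)
    values = [1.0, x]
    for k in range(n_knots - 2):
        # N_{k+2} = d_k - d_{K-1}; index n_knots - 2 is the second-to-last knot.
        values.append(d_k(x, knots, k) - d_k(x, knots, n_knots - 2))
    return values

# Hypothetical knots for illustration.
example_knots = [0.0, 1.0, 2.0, 4.0]
basis_inside = ncs_basis(1.5, example_knots)
```

Every basis function is linear to the left of the first knot and to the right of the last knot, so any linear combination inherits the linear tails of a natural cubic spline.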
Assume that there are p traded European call options and q traded European put options in the market at time t with maturity T. We extract all the distinct strike prices and sort them as . Since for a given strike price, there usually exists a pair of put and call options, then , we take the logarithm of those strike prices and denote them as with , . We deliberately set those M distinct log strike prices as the knots of the natural cubic splines. The risk-neutral density of given all the information at time t can be written as
where is twice differentiable within and linear outside. There are three possible outcomes of the slopes of the linear tails, for example, the left tail:
- If the left tail has a negative slope, the density will not shrink to 0 as the value of goes to . This conflicts with the fact that the integral of a risk-neutral density is 1.
- If the left tail has a positive slope, we should restrict it to the nonnegative region; otherwise, the resulting density may take negative values.
- If the left tail has a zero slope, we must ensure the tails remain nonnegative.
To ensure that the tail parts of the density function are nonnegative and converge to zero with extreme strike prices, we extend the support of to , where and are some predetermined positive values. That is, we add the following constraints:
As long as and , (7) restricts the slope of the left tail (or right tail) to nonnegative (or nonpositive) values. For cases where the slope of the left tail (or right tail) is positive (or negative), it ensures the tail parts of the density are nonnegative as well.
We add further restrictions to ensure the non-negativity of within the boundary strike prices. We let at each knot value:
Since represents a density, the integration of over its support must be 1. We let denote the support of with and . Then,
Using (6), the estimated call and put option prices can be explicitly written as
where , and are the basis function coefficients. Equations (10) and (11) indicate that the estimated option prices are linear in .
To solve for , we minimize the following least square loss function
subject to constraints (7) to (9), where
is the vector of non-discounted market option prices, are strike prices of market put options, are strike prices of market call options, and
is the optimization matrix of dimension . Each integration of the natural cubic spline basis functions can be written in the following explicit form. Firstly, we rewrite as
For the elements in the first two columns of , we have
where , and . For the elements in the third column and beyond,
where , , and
Similarly, we have
where , , and
The least square loss function (12) is essentially the sum of squared differences between the estimated option prices and the market prices. Since ITM options are more expensive than OTM options, the least square loss tends to favor more accurate estimates of ITM option prices. However, OTM options typically contain more market information due to their high liquidity, making them critical for characterizing the risk-neutral density. Therefore, it is beneficial to have a loss function that focuses more on accurately pricing OTM options.
Following Jiang et al. (2021), we define the weighted least square loss function as follows:
where is called the weight matrix. The weighted least square loss function is essentially the sum of squared relative differences between the estimated option prices and their market prices. Since OTM options are typically cheaper, their squared relative differences are usually larger. Thus, the loss function (13) focuses more on accurately pricing OTM options. The constrained optimization problems of (12) and (13) can readily be solved by using numerical optimization software, such as R, a free software environment for statistical computing and graphics (https://www.r-project.org/). In this study, we utilize the R package lsei (Wang et al. 2020).
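The effect of weighting can be seen on a two-option toy example. The sketch assumes that the weighting amounts to dividing each residual by the market price, which matches the description of squared relative differences; the exact weight matrix is the one defined in (13), and all prices below are hypothetical.

```python
def squared_errors(estimated, market):
    """Per-option terms of the plain least squares loss."""
    return [(e - m) ** 2 for e, m in zip(estimated, market)]

def squared_relative_errors(estimated, market):
    """Per-option terms of the loss on relative differences (assumed weighting)."""
    return [((e - m) / m) ** 2 for e, m in zip(estimated, market)]

# Hypothetical prices: one expensive ITM option and one cheap OTM option.
market_prices = [50.0, 0.5]
model_prices = [50.5, 0.6]

plain_terms = squared_errors(model_prices, market_prices)
relative_terms = squared_relative_errors(model_prices, market_prices)
```

In the plain loss the ITM term dominates, whereas in the relative loss the OTM term dominates. The constrained minimization itself is left to a constrained least squares solver, such as the R package lsei used in this study.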
3. Consistency of the Natural Cubic Spline Approach
In this section, we demonstrate the consistency property of our proposed natural cubic spline method under certain assumptions, which is a desired property of a risk-neutral density estimator.
3.1. Consistency Property of the Cubic Spline Function of Interpolation
In this section, we follow Ahlberg et al. (1967) and introduce the consistency property of the cubic spline function of interpolation.
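Given a and b such that $-\infty < a < b < \infty$, we consider a set of mesh points
$$\Delta:\; a = x_0 < x_1 < \cdots < x_N = b$$
with $h_j = x_j - x_{j-1}$, $j = 1, \ldots, N$. Suppose a set of corresponding values $y_0, y_1, \ldots, y_N$ is provided. The goal is to find a cubic spline $S_\Delta(x)$ defined on $[a, b]$ with knots $x_0, \ldots, x_N$, such that $S_\Delta(x_j) = y_j$, $j = 0, 1, \ldots, N$.
We let $M_j$ denote the second-order derivative of $S_\Delta(x)$ at knot $x_j$, that is, $M_j = S_\Delta''(x_j)$. Since $S_\Delta''(x)$ is linear between every two adjacent knots, we have, for $x \in [x_{j-1}, x_j]$,
$$S_\Delta''(x) = M_{j-1}\frac{x_j - x}{h_j} + M_j\frac{x - x_{j-1}}{h_j}. \tag{14}$$
By integrating (14) twice and imposing the interpolation conditions at $x_{j-1}$ and $x_j$, we obtain for $x \in [x_{j-1}, x_j]$,
$$S_\Delta(x) = M_{j-1}\frac{(x_j - x)^3}{6h_j} + M_j\frac{(x - x_{j-1})^3}{6h_j} + \left(y_{j-1} - \frac{M_{j-1}h_j^2}{6}\right)\frac{x_j - x}{h_j} + \left(y_j - \frac{M_j h_j^2}{6}\right)\frac{x - x_{j-1}}{h_j}. \tag{15}$$
Equation (15) shows that the cubic spline can be characterized by $M_0, \ldots, M_N$. To determine $M_0, \ldots, M_N$, we show in detail how the fact that a cubic spline has continuous first two derivatives at every interior knot can be leveraged and what conditions need to be added.
It can be readily verified that (14) and (15) are continuous throughout the interval $[a, b]$. Nevertheless, additional conditions need to be imposed to ensure the knot-continuity of $S_\Delta'(x)$. At any given interior knot $x_j$, we have
$$S_\Delta'(x_j^-) = \frac{h_j}{6}M_{j-1} + \frac{h_j}{3}M_j + \frac{y_j - y_{j-1}}{h_j}, \tag{16}$$
$$S_\Delta'(x_j^+) = -\frac{h_{j+1}}{3}M_j - \frac{h_{j+1}}{6}M_{j+1} + \frac{y_{j+1} - y_j}{h_{j+1}}. \tag{17}$$
By letting (16) be equal to (17) for each interior knot, we obtain
$$\frac{h_j}{6}M_{j-1} + \frac{h_j + h_{j+1}}{3}M_j + \frac{h_{j+1}}{6}M_{j+1} = \frac{y_{j+1} - y_j}{h_{j+1}} - \frac{y_j - y_{j-1}}{h_j}, \tag{18}$$
where $j = 1, \ldots, N-1$. So far we have $N-1$ equations but $N+1$ quantities to solve. We need two more end conditions to determine $M_0, \ldots, M_N$. We classify three kinds of end conditions as follows:
At the left boundary $x_0$,
$$\text{(i)}\;\; 2M_0 + M_1 = \frac{6}{h_1}\left(\frac{y_1 - y_0}{h_1} - y_0'\right), \qquad \text{(ii)}\;\; M_0 = y_0'', \qquad \text{(iii)}\;\; 2M_0 + \lambda_0 M_1 = d_0, \tag{19}$$
and at the right boundary $x_N$,
$$\text{(i)}\;\; M_{N-1} + 2M_N = \frac{6}{h_N}\left(y_N' - \frac{y_N - y_{N-1}}{h_N}\right), \qquad \text{(ii)}\;\; M_N = y_N'', \qquad \text{(iii)}\;\; \mu_N M_{N-1} + 2M_N = d_N, \tag{20}$$
where $y_0'$, $y_N'$, $y_0''$, $y_N''$ denote the prescribed first and second-order derivatives of the cubic spline function at the boundary knots, and (i) and (ii) are special cases of (iii).
We rewrite (18) in the similar form of (19) (iii) and (20) (iii) as
$$\mu_j M_{j-1} + 2M_j + \lambda_j M_{j+1} = d_j, \quad j = 1, \ldots, N-1, \tag{21}$$
where $\lambda_j = \dfrac{h_{j+1}}{h_j + h_{j+1}}$, $\mu_j = 1 - \lambda_j$, and $d_j = \dfrac{6}{h_j + h_{j+1}}\left(\dfrac{y_{j+1} - y_j}{h_{j+1}} - \dfrac{y_j - y_{j-1}}{h_j}\right)$. Then, we combine (21), (19) (iii), and (20) (iii) into the following system of linear equations,
$$\begin{pmatrix} 2 & \lambda_0 & & & \\ \mu_1 & 2 & \lambda_1 & & \\ & \ddots & \ddots & \ddots & \\ & & \mu_{N-1} & 2 & \lambda_{N-1} \\ & & & \mu_N & 2 \end{pmatrix} \begin{pmatrix} M_0 \\ M_1 \\ \vdots \\ M_{N-1} \\ M_N \end{pmatrix} = \begin{pmatrix} d_0 \\ d_1 \\ \vdots \\ d_{N-1} \\ d_N \end{pmatrix}. \tag{22}$$
Then, the cubic spline function can be characterized by solving (22).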
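As an illustration of how the tridiagonal system for the second derivatives can be solved, the sketch below handles the natural end conditions, that is, zero second derivatives at both boundary knots (a special case of end condition (ii)). It uses the standard Thomas algorithm; the function names are hypothetical and the code is a sketch rather than the implementation used in this paper.

```python
def natural_spline_moments(x, y):
    """Solve for M_j = S''(x_j), j = 0..N, with M_0 = M_N = 0."""
    n = len(x) - 1
    h = [None] + [x[j] - x[j - 1] for j in range(1, n + 1)]  # h[j] = x_j - x_{j-1}
    mu = [0.0] * (n + 1)   # sub-diagonal; mu[n] = 0 encodes M_N = 0
    lam = [0.0] * (n + 1)  # super-diagonal; lam[0] = 0 encodes M_0 = 0
    d = [0.0] * (n + 1)    # right-hand side; d[0] = d[n] = 0
    for j in range(1, n):
        lam[j] = h[j + 1] / (h[j] + h[j + 1])
        mu[j] = 1.0 - lam[j]
        slope_right = (y[j + 1] - y[j]) / h[j + 1]
        slope_left = (y[j] - y[j - 1]) / h[j]
        d[j] = 6.0 * (slope_right - slope_left) / (h[j] + h[j + 1])
    # Thomas algorithm; every diagonal entry of the system equals 2.
    c_prime = [0.0] * (n + 1)
    d_prime = [0.0] * (n + 1)
    c_prime[0] = lam[0] / 2.0
    d_prime[0] = d[0] / 2.0
    for j in range(1, n + 1):
        denom = 2.0 - mu[j] * c_prime[j - 1]
        c_prime[j] = lam[j] / denom
        d_prime[j] = (d[j] - mu[j] * d_prime[j - 1]) / denom
    moments = [0.0] * (n + 1)
    moments[n] = d_prime[n]
    for j in range(n - 1, -1, -1):
        moments[j] = d_prime[j] - c_prime[j] * moments[j + 1]
    return moments

def spline_value(t, x, y, moments):
    """Evaluate the interpolating cubic spline at t in [x_0, x_N]."""
    j = 1
    while j < len(x) - 1 and t > x[j]:
        j += 1
    hj = x[j] - x[j - 1]
    a, b = x[j] - t, t - x[j - 1]
    return ((moments[j - 1] * a**3 + moments[j] * b**3) / (6.0 * hj)
            + (y[j - 1] - moments[j - 1] * hj**2 / 6.0) * a / hj
            + (y[j] - moments[j] * hj**2 / 6.0) * b / hj)
```

The system is strictly diagonally dominant, so the elimination needs no pivoting and runs in time linear in the number of knots.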
Now we consider a continuous function on and a sequence of mesh points
with , , , , , and . Ahlberg et al. (1967) proved the consistency property of the cubic spline under the following restriction on :
where , and is some finite positive number. We quote Theorem 2.3.1 of Ahlberg et al. (1967) as follows:
3.2. Consistency Property of the Proposed Natural Cubic Spline Approach
Following the results in Theorem 1, we derive our own lemma as follows, whose proof, as well as other proofs, are relegated to Appendix A.
Lemma 1.
Let be a probability density function, which is continuous and bounded on . Given any , let be a sequence of equidistant meshes defined on with . Then, given any and , there exists an , such that for any with , there exists a natural cubic spline defined on , satisfying that
- (i)
- qualifies the conditions for in Theorem 1 and thus ;
- (ii)
- is bounded by some value , which does not depend on ϵ and
- (iii)
- for all .
In the rest of this section, we will show the consistency property of the proposed natural cubic spline (NCS) method when the true risk-neutral density (tRND) is defined on .
We let denote the tRND of conditioned on the information available at time t. To explore the consistency property theoretically, we make the following two assumptions:
- (i)
- is continuous and bounded on , and satisfies .
- (ii)
- The strike prices of put options available in the market are bounded by .
To justify Assumption (i), continuity is a desired property of risk-neutral densities. Based on the pricing formula of call options with strike price K, we have
Since the call option price should not be infinite given a finite K, we infer that . On the other hand, . Therefore, .
Although a probability density function does not have to be bounded, such as a beta distribution with both parameters less than one, an unbounded probability density function can always be approximated by a sequence of bounded density functions. Therefore, it is reasonable to assume that the risk-neutral density is bounded in practice.
As for Assumption (ii), to see why strike prices of put options should be bounded, we first check the limit of the price of a put option as its strike :
A put option grants its owner the right to sell the underlying asset at the strike price K. If the strike price approaches infinity, the put option premium would also become infinite, implying that the owner would need to buy the option with an infinite amount of money and execute it for an infinite amount, which is not practical in financial markets. It is reasonable in practice to assume that the strike prices of put options available in the market are bounded by some positive number .
To show the consistency property of our NCS method in restoring the risk-neutral density, we consider a set of M distinct strike prices available in the market such as
We assume that as , we have both and . For simplicity of notations, we assume that the log strike prices are added equidistantly, and the distance between adjacent log strike prices shrinks to zero as . Nevertheless, a similar conclusion holds as long as the maximum distance between adjacent log strike prices shrinks to zero.
We let denote a natural cubic spline with knots for estimating . Under the no-arbitrage assumption, the estimated put and call option prices with strike price are as follows:
We define the following least square loss function
where denotes the total number of distinct strike prices of put options bounded by , and and are fair prices of put and call options with strike price based on the true risk-neutral density , respectively. We assume that increases as M increases but . Then, the optimal natural cubic spline estimate of is defined as follows:
In Theorem 2 below we show the consistency property of our NCS method when the tRND is defined on .
Theorem 2.
Suppose that the true risk-neutral density defined on is continuous, bounded, and satisfies . Let Δ denote the set of M equidistant log strike prices as defined in (24), such that, , , and , as . Let denote the optimal natural cubic spline estimate of as defined in (25). Then, as M goes to infinity, we have .
Note that at the beginning of this section, we ignore the constraints of non-negativity and unity integration when we set as in (25), where can be any natural cubic spline estimate of the risk-neutral density. Nevertheless, it can be verified that the natural cubic spline function constructed from Lemma 1 is non-negative and integrated to 1 asymptotically as . Actually, according to Lemma 1, for any fixed a and b, as . Therefore, asymptotically for any . On the other hand, since a can be arbitrarily small and b can be arbitrarily large, . In conclusion, there at least exists a natural cubic spline estimate of the tRND which is convergent and satisfies the constraints of non-negativity and unity integration asymptotically (see Section 4.3 for a relevant simulation study).
4. Results
4.1. Data Preparation
Following Jiang et al. (2021), this study considers European options written on the S&P 500 index and continuously compounded zero-coupon interest rates for the period from 2 January 1996 to 31 August 2015 in the United States. We employ an interpolation method to calculate the risk-free rates for the time periods matching the observation and expiration dates of the options.
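The text does not specify which interpolation method is used for the risk-free rates. As one possible sketch, the code below assumes linear interpolation in days to maturity with flat extrapolation outside the quoted range; the function name and the example curve are hypothetical.

```python
from bisect import bisect_left

def interpolate_rate(days, curve_days, curve_rates):
    """Linearly interpolate a continuously compounded zero rate at `days`.

    curve_days must be sorted in increasing order. Outside the quoted range the
    nearest quoted rate is used (flat extrapolation, an assumption).
    """
    if days <= curve_days[0]:
        return curve_rates[0]
    if days >= curve_days[-1]:
        return curve_rates[-1]
    i = bisect_left(curve_days, days)  # first index with curve_days[i] >= days
    d0, d1 = curve_days[i - 1], curve_days[i]
    r0, r1 = curve_rates[i - 1], curve_rates[i]
    return r0 + (r1 - r0) * (days - d0) / (d1 - d0)

# Hypothetical zero curve quoted at 30, 60, and 90 days.
rate_45d = interpolate_rate(45, [30, 60, 90], [0.01, 0.02, 0.03])
```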
The data-cleaning procedure is as follows: We first filter out the options with less than a week to maturity. We then eliminate those options whose trading volume and bid price are both zero. Options expiring in less than a week usually contain too much noise in their listing prices and can hardly be used to capture the true risk-neutral density. We explain the motivation behind the second filter in the next paragraphs.
We favor liquid options when fitting the risk-neutral density due to the fact that they usually contain more market information. Options with higher trading volume are generally considered more liquid. We draw the relation between trading volume and strike prices in Figure 1.
Figure 1.
Trading volume vs. strike price for options traded on 18 August 2015 and expiring between 18 September 2015 and 18 October 2015 (the S&P 500 index price traded on 18 August 2015 was $2096.92).
It is clear that options with strike prices close to the S&P 500 index price have a higher trading volume, while options whose strike prices are far from the index price have little or even zero trading volume. In Jiang et al. (2021), only options with a positive volume were kept and 70% of the raw data were discarded. In this study, we want our implied risk-neutral densities to reflect extreme market events, and options with extreme strike prices are important for estimating the tail behavior of the implied densities. Therefore, we keep options with zero trading volume as long as their bid price is positive, and eliminate only the options whose trading volume and bid price are both zero.
There are 7,385,062 options in our raw data; after the trading volume and bid price filtration, there are 6,854,864 options left. By keeping options that have more than seven days to maturity, we have 6,500,692 options in our cleaned dataset.
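The two cleaning filters can be expressed as a single predicate. The sketch below is illustrative only; the record layout and the field names `days_to_maturity`, `volume`, and `bid` are hypothetical and do not refer to the actual data files.

```python
def keep_option(option):
    """Return True if the option survives both cleaning filters."""
    if option["days_to_maturity"] <= 7:
        return False  # too close to expiry, prices are too noisy
    if option["volume"] == 0 and option["bid"] == 0.0:
        return False  # neither traded nor bid
    return True

# Hypothetical records for illustration.
raw_options = [
    {"days_to_maturity": 30, "volume": 0, "bid": 1.25},   # kept: zero volume, positive bid
    {"days_to_maturity": 30, "volume": 0, "bid": 0.0},    # dropped: volume and bid both zero
    {"days_to_maturity": 5, "volume": 120, "bid": 2.10},  # dropped: under a week to maturity
    {"days_to_maturity": 90, "volume": 15, "bid": 0.0},   # kept: positive volume
]
cleaned_options = [o for o in raw_options if keep_option(o)]
```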
4.2. Empirical Study Regarding NCS and Other Relevant Approaches in the Literature
In this section, we use the European option prices on the S&P 500 index described in Section 4.1 to compare the performance of different methods for estimating the RND. We primarily focus on the nonparametric methods described in Section 2. For comparison, following Jiang et al. (2021), we also include one parametric approach, namely the normal inverse Gaussian (NIG) model. According to the simulation studies by Eriksson et al. (2009), if the stock price and its volatility are generated via a Heston model (Heston 1993), a commonly used stochastic volatility model for stock price (Hainaut 2022), the NIG model performs well in approximating the RND.
We follow Lee (2014) in using only OTM options to fit our NCS model, since they are generally more liquid than ITM options and thus more informative. However, if the purpose of the implied density is to price complex derivatives, it makes more sense to use all available options to fit the model, so that the resulting density incorporates more information about investors’ expectations of the underlying asset.
Following Lee (2014) and Jiang et al. (2021), we treat the observed OTM option prices as the training data to estimate the RND and use the ITM option prices as the testing data to evaluate the performance of the estimated RND. Since the training and testing data are different in nature (due to moneyness), this type of partition is known as a natural partition (see, e.g., Li and Yang (2022)). The corresponding cross-validation procedure also tests the robustness of the implied density. It should be noted that another type of partition, namely random partitions for K-fold cross-validation, is more commonly used for estimating the prediction errors of methods in many other applications (see, e.g., Hastie et al. (2009)).
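The natural partition can be illustrated with a short sketch. It assumes that moneyness is judged against the current index level `spot` (a forward price could be used instead) and that at-the-money quotes fall into the testing set; both choices, the tuple layout, and the sample numbers are assumptions for illustration only.

```python
def is_otm(kind, strike, spot):
    """Return True if the option is out of the money relative to `spot`."""
    if kind == "call":
        return strike > spot
    if kind == "put":
        return strike < spot
    raise ValueError(f"unknown option type: {kind!r}")


def natural_partition(options, spot):
    """Split (kind, strike, price) tuples into OTM training and ITM testing sets."""
    train = [o for o in options if is_otm(o[0], o[1], spot)]
    test = [o for o in options if not is_otm(o[0], o[1], spot)]
    return train, test


# Hypothetical cross-section with the index at 2000.
options = [
    ("put", 1900.0, 3.1),
    ("put", 2100.0, 105.0),
    ("call", 2100.0, 4.2),
    ("call", 1900.0, 108.0),
]
train, test = natural_partition(options, spot=2000.0)
```

Unlike a random K-fold split, this partition is deterministic: the same cross-section always yields the same training and testing sets.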
When fitting the NCS model with OTM options, we need to take care of the boundary conditions. For example, suppose there are five distinct strike prices and the distinct OTM option strike prices are . When deciding the support of the implied density, we use instead of . The values of and are determined by numerical validations.
We numerically examine the performance of the proposed natural cubic spline method (NCS), as well as the normal inverse Gaussian (NIG, Eriksson et al. (2009)), piece-wise constant (PC, Jiang et al. (2021)), uniform quartic B-spline (UQB, Lee (2014)), and cubic spline (CS, Monteiro et al. (2008)) methods in the literature. Following Jiang et al. (2021), we select and sort all available market options into seven categories based on their expiration time, namely, days, days, days, days, days, days, and days. We randomly select 200 cross-sections of options from each expiry category, use the same cross-sections of the data to fit the corresponding model, and then compare their performance in option pricing.
We follow Ghysels and Wang (2014) and define the absolute pricing error and the relative pricing error as follows:
where denotes the estimated price of the option, and denotes the observed market price of the option. From the investment point of view, both the absolute error and the relative error are important when pricing options.
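As a rough illustration, the two error measures can be computed as below. The sketch assumes the usual definitions, namely the absolute difference between estimated and observed prices and that difference divided by the observed price, each averaged over the options in a cross-section; the exact formulas of Ghysels and Wang (2014) may differ in detail, and the function name is hypothetical.

```python
def pricing_errors(estimated, observed):
    """Return (mean absolute error, mean relative error) over paired prices.

    Assumed definitions: |estimated - observed| and
    |estimated - observed| / observed, averaged over all options.
    """
    if len(estimated) != len(observed) or not observed:
        raise ValueError("price lists must be non-empty and of equal length")
    abs_errs = [abs(e - o) for e, o in zip(estimated, observed)]
    # Relative errors require strictly positive observed prices.
    rel_errs = [a / o for a, o in zip(abs_errs, observed)]
    n = len(observed)
    return sum(abs_errs) / n, sum(rel_errs) / n


# Hypothetical estimated and observed prices for two options.
mae, mre = pricing_errors([10.5, 2.0], [10.0, 2.5])
```

The relative error weights cheap options more heavily than the absolute error does, which is why the two measures can rank methods differently.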
The results are listed in Table 1 and Table 2. For example, “PC (all+ls)” in “Method” indicates that the PC model is fitted using all the available options and the least squares loss function (12), while “NCS (otm+wls)” means that the NCS model is fitted with OTM options only and the weighted least squares loss function (13). Columns labeled by or are average pricing errors across 200 randomly selected cross-sections in the corresponding expiry category, and column “200” provides the number out of 200 cross-sections where a risk-neutral density is successfully estimated. Rows labeled by “ITM” (or “OTM”) indicate that the predictions are applied to ITM (or OTM) options only. From Table 1 and Table 2, we can draw the following conclusions:
Table 1. Comparison of five approaches: part 1.
Table 2. Comparison of five approaches: part 2.
- (i) PC and NCS outperform the other methods in terms of computational feasibility. PC and NCS can recover risk-neutral densities for more than 99% of cross-sections in each expiry range, whereas NIG can recover them for at most 86% of cross-sections. When the expiry time exceeds six months, fewer than 10% of the cross-sections yield a feasible NIG model. The reason is that the NIG model has an issue of feasible domain coverage, which becomes more severe as the expiry time increases. UQB and CS have a similar problem when recovering risk-neutral densities, especially for cross-sections with expiry longer than six months. UQB becomes infeasible when we cannot find an optimal number of knots such that all the estimated OTM option prices fall within their bid-ask quotes. CS becomes infeasible because the lsei package (Wang et al. 2020) used in R cannot find solutions for some of the constrained quadratic programming problems.
- (ii) Compared with the continuous methods from the literature, the NCS model fitted with OTM options achieves better average pricing accuracy for both OTM and ITM options, especially in the expiry ranges where the optimization feasibility is comparable.
- (iii) Comparing PC and NCS in terms of option pricing accuracy, when we fit the risk-neutral density with OTM options and use the weighted least squares loss function (that is, otm+wls), NCS prices the ITM options much more accurately for cross-sections with an expiration longer than six months. In other words, for these cross-sections, the risk-neutral densities recovered by NCS are more robust than those recovered by PC.
- (iv) PC fits a constant segment between every two adjacent knots, so the resulting density is a step function with discontinuities at the knots. In contrast, NCS constructs implied densities that are twice continuously differentiable. We further illustrate this difference by plotting the risk-neutral densities produced by NCS and PC in Figure 2 and Figure 3 using all the OTM options priced on 25 September 2009 with 187 days to expire.
Figure 2. RND with NCS method.
Figure 3. RND with PC method.
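To make the smoothness and linear-tail properties concrete, the following sketch builds an interpolating natural cubic spline on an equidistant mesh. It is an illustration only: the NCS estimator in this paper is obtained by optimizing a loss function under constraints, whereas this code simply interpolates given ordinates. The function names and the sample ordinates are hypothetical. The second derivatives at the knots solve the standard tridiagonal system with zero end conditions.

```python
def natural_spline_second_derivs(y, h):
    """Second derivatives M_0..M_n of the natural cubic spline through
    ordinates y on an equidistant mesh with spacing h (M_0 = M_n = 0)."""
    n = len(y) - 1
    if n < 2:
        return [0.0] * len(y)
    # Interior equations: M_{i-1} + 4 M_i + M_{i+1} = 6 (y_{i-1} - 2 y_i + y_{i+1}) / h^2
    rhs = [6.0 * (y[i - 1] - 2.0 * y[i] + y[i + 1]) / h**2 for i in range(1, n)]
    m = n - 1  # number of interior unknowns
    c = [0.0] * m
    d = [0.0] * m
    c[0], d[0] = 0.25, rhs[0] / 4.0
    for i in range(1, m):  # forward sweep of the Thomas algorithm
        denom = 4.0 - c[i - 1]
        c[i] = 1.0 / denom
        d[i] = (rhs[i] - d[i - 1]) / denom
    x = [0.0] * m
    x[m - 1] = d[m - 1]
    for i in range(m - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return [0.0] + x + [0.0]


def eval_natural_spline(x, x0, h, y, M):
    """Evaluate the spline at x; outside the knots it continues linearly."""
    n = len(y) - 1
    xn = x0 + n * h
    if x < x0:  # left linear tail, slope S'(x_0)
        slope = (y[1] - y[0]) / h - h * M[1] / 6.0
        return y[0] + slope * (x - x0)
    if x > xn:  # right linear tail, slope S'(x_n)
        slope = (y[n] - y[n - 1]) / h + h * M[n - 1] / 6.0
        return y[n] + slope * (x - xn)
    i = min(int((x - x0) / h), n - 1)
    a = x0 + i * h
    b = a + h
    return ((M[i] * (b - x) ** 3 + M[i + 1] * (x - a) ** 3) / (6.0 * h)
            + (y[i] - M[i] * h * h / 6.0) * (b - x) / h
            + (y[i + 1] - M[i + 1] * h * h / 6.0) * (x - a) / h)


# Hypothetical ordinates on the knots 0, 1, 2, 3, 4.
ys = [0.0, 1.0, 4.0, 9.0, 16.0]
M = natural_spline_second_derivs(ys, 1.0)
```

Because the second derivative vanishes at both end knots, extending the spline linearly beyond them keeps the function twice continuously differentiable, which is the property that distinguishes it from a step-function density.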
4.3. Numerical Justification on Consistency of NCS
In this section, we run a simulation study to verify the consistency property of the NCS method, assuming that the true RND follows a normal inverse Gaussian distribution. First, we use options observed on 3 December 2014 with maturity on 2 January 2015 to fit the normal inverse Gaussian model. Next, we generate logarithmically equidistant strike prices and their corresponding put and call option prices. Then, we apply our NCS method and calculate the value of the loss function . As we increase the number of distinct strike prices and decrease the distance between adjacent log strike prices, we expect to see a decrease in the loss function.
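The data-generating step of such a simulation can be sketched as follows. For brevity the sketch uses a normal density for the log price as a stand-in for the fitted normal inverse Gaussian density, and it prices puts and calls by trapezoidal integration of the discounted payoff against that density. The stand-in density, all parameter values, and the function names are assumptions for illustration.

```python
import math


def log_equidistant_strikes(k_min, k_max, n):
    """n strikes whose logarithms are equally spaced between log k_min and log k_max."""
    lo = math.log(k_min)
    step = (math.log(k_max) - lo) / (n - 1)
    return [math.exp(lo + i * step) for i in range(n)]


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def option_prices(strike, density, lo, hi, discount, n_grid=4001):
    """Discounted (put, call) prices, integrating over the log price x on [lo, hi]."""
    h = (hi - lo) / (n_grid - 1)
    put = call = 0.0
    for j in range(n_grid):
        x = lo + j * h
        w = 0.5 if j in (0, n_grid - 1) else 1.0  # trapezoid end weights
        s = math.exp(x)  # asset price at expiry
        f = density(x)
        put += w * max(strike - s, 0.0) * f
        call += w * max(s - strike, 0.0) * f
    return discount * put * h, discount * call * h


# Stand-in "true" density of the log price (an assumption, not the NIG fit).
mu, sigma = math.log(2000.0), 0.1
density = lambda x: normal_pdf(x, mu, sigma)
strikes = log_equidistant_strikes(1500.0, 2500.0, 11)
prices = [option_prices(k, density, mu - 8 * sigma, mu + 8 * sigma, discount=1.0)
          for k in strikes]
```

Increasing `n` in `log_equidistant_strikes` while widening the strike range reproduces the refinement of the mesh described above; the fitted spline would then be compared with `density` on each mesh.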
Table 3 shows that as the number of distinct strike prices increases and the distance between adjacent log strikes decreases, the value of the loss function decreases to zero.
Table 3. Simulation results for different sets of strike prices.
For the four different sets of strike prices in Table 3, we plot the fitted NCS model against the true risk-neutral density (“org”) in Figure 4. The results show that the fitted natural cubic spline density converges to the true density as the number of distinct strikes increases, achieving non-negativity and unity integration asymptotically.
Figure 4. Simulated risk-neutral density plot.
5. Discussion
Risk-neutral density can be characterized using market options. In Section 2.4, we propose a nonparametric natural cubic spline (NCS) approach to model the risk-neutral density through available market options. Unlike the piece-wise constant approach, the implied density from the natural cubic spline is twice continuously differentiable with stable linear tails at both ends (see Figure 2).
According to our empirical study in Section 4.2, when compared with the normal inverse Gaussian method, the uniform quartic B-spline method, and the cubic spline method, which also generate continuous risk-neutral densities, the NCS method demonstrates superior optimization feasibility. Indeed, the NCS approach can recover risk-neutral densities from more than 99% of the cross-sections of options regardless of their expiry time, while the other approaches can recover risk-neutral densities from at most 86% of the cross-sections of options (see Table 1 and Table 2). Additionally, when fitted with OTM options, the NCS method outperforms its competitors in pricing both OTM and ITM options, especially for the expiry range where the optimization feasibility is comparable.
When comparing the NCS method with the piece-wise constant approach which can only produce discontinuous implied densities (see Figure 3), the NCS method can generate more robust risk-neutral densities for options with an expiration longer than six months.
In Section 3, we prove the consistency of the NCS approach. Under reasonable assumptions about the true risk-neutral distribution, we show that as the range of option strike prices approaches and the difference between adjacent log strike prices shrinks to 0, the fair prices of options based on the estimated risk-neutral density converge to their market values on average. To simplify the notation, we use equidistant log strike prices, while the key property of meshes is actually . The consistency property is also verified numerically in Section 4.3.
As another important application in practice, the estimated RND can be used to detect potential arbitrage opportunities arising from market prices of options that temporarily violate put-call parity, monotonicity, or strict convexity (Monteiro et al. 2008). As illustrated in Jiang et al. (2021), one can use leave-one-out cross-validation to check if any market prices of options are significantly higher or lower than the fair price provided by the estimated RND, which may indicate profitable opportunities. To capture such opportunities in real time, one needs high-speed internet, efficient algorithms, and high-performance computers (Kumar and Kumar 2023; Monteiro and Santos 2023).
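A simpler, model-free screen for the monotonicity and convexity conditions mentioned above can be sketched as follows. It is not the leave-one-out procedure of Jiang et al. (2021); it only checks whether observed call prices are non-increasing and convex in the strike. The function name, the tolerance, and the sample quotes are hypothetical.

```python
def call_price_violations(strikes, calls, tol=1e-12):
    """Return (monotonicity violations, convexity violations) as index lists.

    Strikes must be strictly increasing. Index i in the first list means
    calls[i + 1] > calls[i]; index i in the second list means the slope to
    the right of strike i is smaller than the slope to its left.
    """
    mono, conv = [], []
    for i in range(len(strikes) - 1):
        # Call prices should not increase with the strike.
        if calls[i + 1] > calls[i] + tol:
            mono.append(i)
    for i in range(1, len(strikes) - 1):
        left = (calls[i] - calls[i - 1]) / (strikes[i] - strikes[i - 1])
        right = (calls[i + 1] - calls[i]) / (strikes[i + 1] - strikes[i])
        # Convexity in the strike requires non-decreasing slopes.
        if right < left - tol:
            conv.append(i)
    return mono, conv


# Hypothetical quotes: the first set is consistent, the second is not.
ks = [90.0, 100.0, 110.0, 120.0]
ok = call_price_violations(ks, [12.0, 6.0, 2.5, 1.0])
bad = call_price_violations(ks, [12.0, 6.0, 6.5, 1.0])
```

The check here is non-strict; a flagged quote is only a candidate for closer inspection, since bid-ask spreads and stale quotes can produce apparent violations.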
The proposed method for estimating the RND is applicable only for a given option expiry date. In the literature, many efforts have been made to estimate option price surfaces and state-price densities under no-arbitrage constraints across both strikes and expiry dates. We refer readers to Fengler and Hin (2015), Kundu et al. (2024), and references therein.
Author Contributions
Conceptualization, S.Z., L.J., F.W. and J.Y.; methodology, S.Z., K.L. and J.Y.; software, S.Z. and L.J.; validation, S.Z.; formal analysis, S.Z.; investigation, S.Z. and J.Y.; resources, F.W. and J.Y.; data curation, S.Z. and L.J.; writing—original draft preparation, S.Z. and J.Y.; writing—review and editing, S.Z., L.J., K.L., F.W. and J.Y.; visualization, S.Z.; supervision, J.Y.; project administration, J.Y. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data used in this research were obtained from OptionMetrics through Wharton Research Data Services (https://wrds-www.wharton.upenn.edu/pages/about/data-vendors/optionmetrics/, accessed on 31 August 2015). These data are not publicly available due to licensing agreements. Researchers interested in accessing similar data can contact Wharton Research Data Services for access details.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| CS | Cubic Spline |
| ITM | In The Money |
| LS | Least Squares |
| NCS | Natural Cubic Spline |
| NIG | Normal Inverse Gaussian |
| OTM | Out of The Money |
| PC | Piece-wise Constant |
| RND | Risk-neutral Density |
| UQB | Uniform Quartic B-spline |
| WLS | Weighted Least Squares |
Appendix A. Proofs
Proof of Lemma 1.
Given , and , to construct a satisfying the three conditions in Lemma 1, we let , , such that
- (i) To simplify the notation, we assume that is fixed such that , , and when k increases, increases, but remains the same value and a knot; similarly, is fixed such that , , and when k increases, increases, but remains the same value and a knot.
- (ii) We define a set of predetermined ordinates , , ⋯, , such that, , . For , we require . We will discuss later what value that should take if .
- (iii)
The three specifications above guarantee that the cubic spline of interpolation defined on satisfies the first condition of Lemma 1, as long as .
To extend the natural cubic spline from to such that for all , we connect the following function
with at and with the end conditions at and :
where is the second derivative of at the knot value . We deliberately set so (A1) (i) is satisfied. Given and in the aforementioned specification (iii), (19) (iii) and (20) (iii) can be used to infer that , therefore, (A1) (iii) is also satisfied. We then show in detail what conditions on the value of for should be set, such that, (A1) (ii) can be satisfied, and the resulting natural cubic spline is bounded by some positive value M independent of .
For notational simplicity, we drop the subscript k for the rest of the proof.
As derived in Ahlberg et al. (1967), when the mesh of points is equidistant, the second derivative of at any interior knot can be represented by the predetermined ordinates and the inverse of the coefficient matrix in (22):
where is the th entry of the inverse of the coefficient matrix in (22) with m replaced by N, and h denotes the distance between every two adjacent knots. From (16) and (17), we have
To satisfy the condition of (A1) (ii), we need to ensure . We explicitly set and plug the value of , , , into (A2) to obtain
We obtain the following solutions to (A5),
As derived by Ahlberg et al. (1967), when the mesh of points is equidistant, we have
where . We further conclude that
For where , we have , such a is bounded as long as is bounded. We set the bound to be M. We also set the value of to be some number bounded by M for . We show that and determined by (A8) are also bounded by M.
Actually, as specified at the beginning of the proof, and while and . We infer that for any sequence denoted by , there are at least 7 knots, that is, . For , we have the following inequalities:
From (A9) we derive
If N is even,
If N is odd,
Then,
Since , we have when N is even, and when N is odd. Furthermore, when N is even we have
and when N is odd, we have
In a similar manner, we obtain
Furthermore, we have
Finally, (A14) and (A15) complete the proof, and the resulting natural cubic spline is bounded by M, which does not depend on . □
Proof of Theorem 2.
Since is bounded, we have throughout for some fixed positive number . Because is continuous, for any , there exist , such that
We let
According to Lemma 1, there exists an , such that as long as , there exists a natural cubic spline defined on satisfying , for and . According to the proof of Lemma 1, is bounded by as well.
We will show that for each individual strike price , the squared residual of the estimated option price is bounded by , for some fixed positive Q.
For the put option with the strike price , we denote I as the squared residual as follows:
where is a fixed and bounded number. In later calculations, we treat such a positive constant as 1 to simplify the notation. Now we examine the value of I under four different scenarios according to the range of :
- (i) If , we have . Since ,
- (ii) If , then and . Since and for all , then
- (iii) If , then and .
- (iv) Since , there are no put options with log strike prices greater than b.
For the call option with the strike price , we denote as the squared residual after ignoring the positive constant :
There are five different scenarios for according to the range of :
- (i) If , we have for all and
- (ii) If , since , we have
- (iii) If , since for all , we have
- (iv) If , since , we have
- (v) If , since , then
References
- Ahlberg, J. Harold, Edwin N. Nilson, and Joseph L. Walsh. 1967. The Theory of Splines and Their Applications. New York and London: Academic Press.
- Aït-Sahalia, Yacine, and Andrew W. Lo. 1998. Nonparametric estimation of state-price densities implicit in financial asset prices. Journal of Finance 53: 499–547.
- Aït-Sahalia, Yacine, and Andrew W. Lo. 2000. Nonparametric risk management and implied risk aversion. Journal of Econometrics 94: 9–51.
- Björk, Tomas. 2009. Arbitrage Theory in Continuous Time. New York: Oxford University Press.
- Black, Fischer, and Myron Scholes. 1973. The pricing of options and corporate liabilities. Journal of Political Economy 81: 637–54.
- Bliss, Robert R., and Nikolaos Panigirtzoglou. 2002. Testing the stability of implied probability density functions. Journal of Banking & Finance 26: 381–422.
- Bondarenko, Oleg. 2003. Estimation of risk-neutral densities using positive convolution approximation. Journal of Econometrics 116: 85–112.
- Breeden, Douglas T., and Robert H. Litzenberger. 1978. Prices of state-contingent claims implicit in option prices. Journal of Business 51: 621–51.
- Buchen, Peter W., and Michael Kelly. 1996. The maximum entropy distribution of an asset inferred from option prices. Journal of Financial and Quantitative Analysis 31: 143–59.
- Campa, José M., P. H. Kevin Chang, and Robert L. Reider. 1998. Implied exchange rate distributions: Evidence from OTC option markets. Journal of International Money and Finance 17: 117–60.
- Cox, John C., and Stephen A. Ross. 1976. The valuation of options for alternative stochastic processes. Journal of Financial Economics 3: 145–66.
- Eriksson, Anders, Eric Ghysels, and Fangfang Wang. 2009. The normal inverse Gaussian distribution and the pricing of derivatives. Journal of Derivatives 16: 23.
- Fengler, Matthias R., and Lin-Yee Hin. 2015. Semi-nonparametric estimation of the call-option price surface under strike and time-to-expiry no-arbitrage constraints. Journal of Econometrics 184: 242–61.
- Figlewski, Stephen. 2018. Risk-neutral densities: A review. Annual Review of Financial Economics 10: 329–59.
- Ghysels, Eric, and Fangfang Wang. 2014. Moment-implied densities: Properties and applications. Journal of Business & Economic Statistics 32: 88–111.
- Grünbichler, Andreas, and Francis A. Longstaff. 1996. Valuing futures and options on volatility. Journal of Banking & Finance 20: 985–1001.
- Hainaut, Donatien. 2022. Continuous Time Processes for Finance: Switching, Self-Exciting, Fractional and Other Recent Dynamics. Cham: Springer Nature.
- Hastie, Trevor, Robert Tibshirani, and Jerome H. Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. New York: Springer.
- Heston, Steven L. 1993. A closed-form solution for options with stochastic volatility with applications to bond and currency options. The Review of Financial Studies 6: 327–43.
- Hull, John C. 2022. Options, Futures, and Other Derivatives, 11th ed. New York: Pearson Education.
- Jackwerth, Jens Carsten. 1999. Option implied risk-neutral distributions and implied binomial trees: A literature review. Journal of Derivatives 7: 66–82.
- Jackwerth, Jens Carsten, and Mark Rubinstein. 1996. Recovering probability distributions from option prices. The Journal of Finance 51: 1611–31.
- Jarrow, Robert, and Andrew Rudd. 1982. Approximate option valuation for arbitrary stochastic processes. Journal of Financial Economics 10: 347–69.
- Jiang, Liyuan, Shuang Zhou, Keren Li, Fangfang Wang, and Jie Yang. 2021. A new nonparametric estimate of the risk-neutral density with applications to variance swaps. Frontiers in Applied Mathematics and Statistics 6: 611878.
- Kumar, Abhimanyu, and Sumit Kumar. 2023. Novel computational technique for the direct estimation of risk-neutral density using call price data quotes. Computational and Applied Mathematics 42: 270.
- Kumar, Sudarshan, Sobhesh Kumar Agarwalla, Jayanth R. Varma, and Vineet Virmani. 2023. Harvesting the volatility smile in a large emerging market: A dynamic Nelson-Siegel approach. Journal of Futures Markets 43: 1615–44.
- Kundu, Arindam, Sumit Kumar, and Nutan Kumar Tomar. 2024. A semi-closed form approximation of arbitrage-free call option price surface. Computational Economics 63: 1431–57.
- Lee, Seung Hwan. 2008. Three Essays on Estimation of Risk Neutral Measures Using Option Pricing Models. Ph.D. thesis, The Ohio State University, Columbus, OH, USA.
- Lee, Seung Hwan. 2014. Estimation of risk-neutral measures using quartic B-spline cumulative distribution functions with power tails. Quantitative Finance 14: 1857–79.
- Li, Keren, and Jie Yang. 2022. Score-matching representative approach for big data analysis with generalized linear models. Electronic Journal of Statistics 16: 592–635.
- Li, Yifan, Ingmar Nolte, and Manh Cuong Pham. 2024. Parametric risk-neutral density estimation via finite lognormal-Weibull mixtures. Journal of Econometrics 241: 105748.
- Malz, Allan M. 1997. Estimating the probability distribution of the future exchange rate from option prices. Journal of Derivatives 5: 18–36.
- Markose, Sheri, and Amadeo Alentorn. 2005. Option pricing and the implied tail index with the generalized extreme value (GEV) distribution. In Computing in Economics and Finance. Cham: Springer Nature, pp. 1–29.
- Markose, Sheri, and Amadeo Alentorn. 2011. The generalized extreme value distribution, implied tail index, and option pricing. Journal of Derivatives 18: 35–60.
- Melick, William R., and Charles P. Thomas. 1997. Recovering an asset’s implied PDF from option prices: An application to crude oil during the Gulf crisis. Journal of Financial and Quantitative Analysis 32: 91–115.
- Monteiro, Ana M., and António A. F. Santos. 2023. Parallel computing in finance for estimating risk-neutral densities through option prices. Journal of Parallel and Distributed Computing 173: 61–69.
- Monteiro, Ana Margarida, Reha H. Tütüncü, and Luís N. Vicente. 2008. Recovering risk-neutral probability density functions from options prices using cubic splines and ensuring nonnegativity. European Journal of Operational Research 187: 525–42.
- Pascucci, Andrea. 2011. PDE and Martingale Methods in Option Pricing. New York: Springer.
- Ritchey, Robert J. 1990. Call option valuation for discrete normal mixtures. Journal of Financial Research 13: 285–96.
- Rosenberg, Joshua V. 1998. Pricing multivariate contingent claims using estimated risk–neutral density functions. Journal of International Money and Finance 17: 229–47.
- Rubinstein, Mark. 1994. Implied binomial trees. The Journal of Finance 49: 771–818.
- Shimko, David. 1993. Bounds of probability. Risk 6: 33–37.
- Söderlind, Paul. 2000. Market expectations in the UK before and after the ERM crisis. Economica 67: 1–18.
- Stutzer, Michael. 1996. A simple nonparametric approach to derivative security valuation. Journal of Finance 51: 1633–52.
- Wang, Yong, Charles L. Lawson, and Richard J. Hanson. 2020. lsei: Solving Least Squares or Quadratic Programming Problems Under Equality/Inequality Constraints. R Package Version 1.3-0. Vienna: R Foundation for Statistical Computing.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).