Detecting and Measuring Nonlinearity

This paper proposes an approach to measure the extent of nonlinearity of the exposure of a financial asset to a given risk factor. The proposed measure exploits the decomposition of a conditional expectation into its linear and nonlinear components. We illustrate the method with the measurement of the degree of nonlinearity of a European style option with respect to the underlying asset. Next, we use the method to identify the empirical patterns of the return-risk trade-off on the SP500. The results are strongly supportive of a nonlinear relationship between expected return and expected volatility. The data seem to be driven by two regimes: one regime with a positive return-risk trade-off and one with a negative trade-off.


Introduction
Economic theories are often operationalized under linearity assumptions on the relationships between the underlying variables or joint normality assumptions on their distributions. Famous examples include the Capital Asset Pricing Model (CAPM) and Value at Risk models. Often, linearity and normality assumptions are needed in order to obtain analytical formulas and elegant characterizations of the phenomena of interest. However, such assumptions may lead to wrong conclusions when they are not valid. Recognizing these limits and observing the relatively high frequency of extreme economic events (e.g., financial crises and recessions), a body of the empirical literature in finance emphasizes the distributional characteristics of assets returns that do not reflect normality, in particular asymmetry and fat tails.
For example, Harvey and Siddique (1999, 2000) propose a model to estimate the conditional skewness and highlight the importance of taking this into account when analyzing the cross-sectional properties of asset prices. Christoffersen et al. (2006) propose a framework to price options in the presence of conditional skewness. Feunou and Tedongap (2012) propose a stochastic volatility model with conditional skewness for asset prices and show that these distributional aspects are very important to explain how investors value options. Gabaix (2009, 2016) argues that stable laws approximate the distribution of many economic and financial variables fairly well, while Gabaix (2011) proves that macroeconomic fluctuations can have granular origins. Indeed, Gabaix's work draws attention to a more general issue, namely, the fact that the aggregation of independent phenomena does not always lead to the normal distribution as stipulated by the central limit theorem.
If non-normality is now well entrenched in the mind of academic researchers and financial risk managers, non-linearity has received much less attention in the literature. The joint normality of two random variables X and Y implies that E(Y | X) is a linear function of X and that conversely E(X | Y) is linear in Y. While a linear relation can still exist between X and Y without joint normality, a nonlinear relationship precludes joint normality. This shows that nonlinearity and non-normality are distinct concepts. Despite the availability of more and more complex computer solutions, linear relationships remain largely advocated for empirical inquiries in economics and finance. Prominent models that are based on linear relationships include the Arbitrage Pricing Theory (APT), Taylor rules and Keynesian consumption functions. Unfortunately, a linearity assumption may lead to wrongly falsifying a theory when the true unknown relationship of interest is nonlinear. Even after acknowledging that the relationship between two variables is nonlinear, finding the functional form that works best for the situation of interest may still be difficult.
In this paper, I propose an approach to measure the degree of nonlinearity of the relationship between two variables. For any pair (X, Y), I define the exposure of Y to X as the expectation of Y given X. The relationship between X and Y is then said to be nonlinear if either Y is nonlinearly exposed to X or X is nonlinearly exposed to Y. Said differently, the relationship between X and Y is linear if and only if Y is linearly exposed to X and in turn X is linearly exposed to Y. Indeed, it is possible that E(Y | X) be a linear function of X without E(X | Y) being linear in Y. In this case, the linearity of E(Y | X) is spurious as it is misleading about the true relationship between the two variables. Please note that the relationship between variables is approached in this paper from a predictive point of view rather than from a causal perspective. Knowing that the predictive relationship between X and Y is nonlinear is a good starting point for the search of the true underlying causal relationship.
The proposed measure for the degree of nonlinearity of the function E(Y | X), denoted κ, exploits the relative importance of the norms of the linear and nonlinear parts of E(Y | X) in a functional space. The decomposition of E(Y | X) into its linear and nonlinear parts is done via a functional projection of E(Y | X) onto a basis of orthogonal polynomials {P_j(X)}, j = 0, 1, 2, ..., where P_j(X) is a polynomial of order j and the orthogonality is defined with respect to a metrics m(x). Upon observing that linear functions of X are loaded only on P_0(X) and P_1(X), a function is said to be purely nonlinear when it is entirely loaded on the higher order polynomials {P_j(X)}, j ≥ 2. For any function of X, the value of κ always lies between 0 and 1, with 0 meaning that E(Y | X) is linear in X and 1 meaning that E(Y | X) is purely nonlinear as per the previous definition. The index κ is invariant to linear transformations of Y as well as to the addition of an independent noise to Y. It is not invariant to the choice of the metrics m(x) and it is therefore sensitive to transformations of X.
Unlike our proposed measure of nonlinearity, the Pearson linear correlation coefficient measures the propensity of a random variable Y of being replicated by a linear function of another variable X. It always lies between −1 and +1, with 1 meaning a perfectly linear and positive relationship, 0 the absence of linear relationship and −1 a perfectly linear and negative relationship. A linear correlation coefficient lying strictly between −1 and 1 indicates that a fit of Y by a linear function of X will not be perfect. This imperfection arises either from the dependence of Y on random factors other than X or from nonlinearity in the relationship between Y and X (or both). In particular, the linear correlation coefficient is smaller than 1 in absolute value when X and Y are bound by a deterministic but nonlinear relationship. In an effort to repair the limitations of the linear correlation, measures of nonlinear association (typically, based on ranks) have been proposed. Notably, we have the rank correlation coefficient of Spearman (1904), Kendall's tau (Kendall 1938, 1970), Goodman and Kruskal's gamma (Goodman and Kruskal 1954, 1959) and the quadrant count ratio (see Holmes 2001). All rank correlation measures are designed to detect the strength of possibly nonlinear but monotonic relationships between Y and X. Unlike the linear correlation, a rank correlation coefficient equals 1 if for instance Y = log X and −1 if Y = 1/X (assuming X > 0). However, like the linear correlation, a rank correlation coefficient can be insignificant if the relationship between X and Y is non-monotonic.
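The contrast between linear and rank correlation described above can be checked numerically. The following is a minimal sketch, not part of the original study: the grids, the helper names `pearson` and `spearman`, and the slightly shifted parabola used for the non-monotonic case (shifted only to avoid ties in the ranks) are all illustrative assumptions.

```python
import numpy as np

def pearson(x, y):
    """Pearson linear correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks (assumes no ties)."""
    def rank(v):
        return np.argsort(np.argsort(v)).astype(float)
    return pearson(rank(x), rank(y))

x = np.linspace(0.1, 10.0, 500)      # strictly positive grid (illustrative)
u = np.linspace(-1.0, 1.0, 501)      # symmetric grid for the non-monotonic case
y_sq = (u - 0.001) ** 2              # parabola, shifted slightly so that no two values tie

rho_log, rs_log = pearson(x, np.log(x)), spearman(x, np.log(x))   # monotonic increasing
rho_inv, rs_inv = pearson(x, 1.0 / x), spearman(x, 1.0 / x)       # monotonic decreasing
rho_sq, rs_sq = pearson(u, y_sq), spearman(u, y_sq)               # non-monotonic

print(f"Y = log X : Pearson {rho_log:+.3f}, Spearman {rs_log:+.3f}")
print(f"Y = 1/X   : Pearson {rho_inv:+.3f}, Spearman {rs_inv:+.3f}")
print(f"Y ~ X^2   : Pearson {rho_sq:+.3f}, Spearman {rs_sq:+.3f}")
```

In this sketch the rank correlation reaches its bounds for the two monotonic maps while the Pearson coefficient does not, and both coefficients are close to zero for the parabola even though Y is a deterministic function of X.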
The remainder of the paper is organized as follows. Section 2 motivates the use of the nonlinearity index (κ) proposed in this paper. Section 3 presents the derivation of κ. Section 4 discusses the choice of the metrics m(x) used to calculate κ. Section 5 examines the invariance properties of κ. Section 6 illustrates the calculation of κ for simple functions and warns against coarse errors when choosing the metrics m(x). Section 7 proposes a feasible estimator for κ and shows its consistency. Section 8 proposes three applications of our methodology. In the first application, I compute the nonlinearity index of a European option relatively to the underlying asset. It is found that the degree of nonlinearity of the option depends on its maturity and strike as well as on the volatility of the underlying asset. The second application underscores the importance of performing a nonlinearity diagnosis prior to designing a hedging strategy for a portfolio. In the third application, I analyze the nature of the relationship between the returns on the SP500 index and the associated risk as measured by the realized variance (RV). The empirical results are supportive of the existence of a nonlinear relationship between the expected return and the expected risk. The SP500 seems to be driven by two regimes, one regime in which the expected return is increasing in the expected risk and another regime in which the trade-off is negative. Within each regime, the return-risk trade-off is approximately linear. Section 9 concludes. The mathematical proofs are gathered in Appendix A.

Motivation
This section starts with an empirical example which shows that E(Y | X) can be linear with E(X | Y) being concomitantly nonlinear. The second subsection presents a theoretical explanation for the empirical example. The third subsection presents a situation where ignoring the presence of nonlinearity in the data can be harmful.

Pitfalls in the Linearity of Conditional Expectations
It is possible to have two random variables X and Y such that E(Y | X) is linear in X while E(X | Y) is nonlinear in Y. To support this point by empirical arguments, I downloaded daily observations on the SP500 from Yahoo Finance covering the period from 1 February 1959 to 30 April 2013 (14,173 days). The daily data are used to generate monthly returns (R_t) and log-returns denoted r_t = log R_t (652 months). The realized volatility (RV_t) is computed as the sum of squared daily log-returns within month t. Figure 1a shows the scatter plot of r_t on the y-axis against RV_t on the x-axis. The relationship between r_t and RV_t is not visible on this plot. Figure 1b,c shows the scatter plots of r_t against v_t = log RV_t and v_{t−1}, respectively. Taking the log of RV_t zooms the pictures out. The shapes of the two graphs are quite similar. Figure 2 shows the nonparametric estimators of E(r_t | v_t) and E(v_t | r_t). For any pair (X, Y), the expectation of Y conditional on X = x is estimated as:

$$\hat{E}(Y \mid X = x) = \frac{\sum_{t=1}^{T} y_t\, K\!\left(\frac{x_t - x}{h}\right)}{\sum_{t=1}^{T} K\!\left(\frac{x_t - x}{h}\right)},$$

where K(·) is a kernel function and h is a bandwidth. Figure 2a suggests that the relationship between r_t and v_t is more or less linear. However, the curve of E(v_t | r_t) shown in Figure 2b is clearly nonlinear.
The shape of E(v_t | r_t) is consistent with the existence of two regimes in the joint dynamics of (v_t, r_t), as found by Ghysels et al. (2014). The first regime is a vicious circle in which higher levels of expected risk are associated with higher losses (financial crises occur during this regime).¹ By contrast, the second regime is a virtuous circle where higher levels of expected risk are associated with higher gains. The nonlinearity index proposed in this paper can be used to identify the two regimes.

Spurious Linearity
The expression "spurious linearity" may be used to describe a situation where the exposure of X to Y is linear while the reverse exposure is nonlinear, as in the previous example. To illustrate this concept theoretically, let us consider a deterministic mapping of the following form:

$$Y = \begin{cases} 2X & \text{if } X \le 1, \\ 3 - X & \text{if } X > 1. \end{cases} \tag{1}$$

The relationship described by Equation (1) is plotted in Figure 3a. In this equation, Y is a deterministic and nonlinear transformation of X, which means that Y = E(Y | X) and that the pseudo-R² of the nonlinear regression of Y onto X is ρ²(Y, E(Y | X)) = 1.² However, the reciprocal mapping from Y to X is not deterministic. To see this, assume that X follows a Uniform distribution on [0, 3]. Figure 3b plots X on the vertical axis against Y on the horizontal axis. Each value of Y is related to two possible values of X. Therefore, the knowledge of Y does not permit to identify with certainty the value of X with which it is associated. For each Y, the possible values of X are Y/2 and 3 − Y.

¹ Please note that outliers have not been removed from the sample.

² Var(E(Y | X))/Var(Y) and ρ²(Y, E(Y | X)) coincide with the R² in linear regression models. In nonlinear regressions, the empirical counterpart of Var(E(Y | X))/Var(Y) can be larger than one in finite samples, depending on the choice of functional form for E(Y | X). However, the empirical version of ρ²(Y, E(Y | X)) always lies between 0 and 1. Hence, ρ²(Y, E(Y | X)) is used here as pseudo-R² for nonlinear regression models.
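To find E(X | Y), it helps to think of the joint distribution of (X, Y) as being generated by two regimes. Under Regime j = 1, Y = 2X and E(X | Y, j = 1) = Y/2 whereas under Regime j = 2, we have Y = 3 − X and E(X | Y, j = 2) = 3 − Y. With X uniform on [0, 3], Regime 1 has probability 1/3, Regime 2 has probability 2/3, and Y is uniform on [0, 2] under both regimes. Hence:

$$E(X \mid Y) = \frac{1}{3}\cdot\frac{Y}{2} + \frac{2}{3}\,(3 - Y) = 2 - \frac{Y}{2}.$$

The pseudo-R² of the nonlinear regression of X onto Y is given by:

$$\rho^2\big(X, E(X \mid Y)\big) = \frac{\mathrm{Var}\big(E(X \mid Y)\big)}{\mathrm{Var}(X)} = \frac{\tfrac{1}{4}\cdot\tfrac{1}{3}}{\tfrac{3}{4}} = \frac{1}{9}.$$

In summary, the exposure of Y to X is nonlinear and strong while the exposure of X to Y is linear and weak. This simply reflects the fact that X is a richer conditioning information set than Y. Indeed, the values of X implicitly determine the regimes so that: [...] where k ∈ {1, 2}. Consequently, more information is gleaned about the type of relationship between X and Y by examining the expectation of Y given X.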
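The two-regime example can be checked by simulation. The sketch below is illustrative only: it assumes the mapping switches from Y = 2X to Y = 3 − X at X = 1 (the threshold at which both regimes cover the same range of Y), and the sample size and seed are arbitrary. Under these assumptions the regression of X on Y should be close to E(X | Y) = 2 − Y/2 with an R² near 1/9.

```python
import numpy as np

rng = np.random.default_rng(0)                  # arbitrary seed, for reproducibility
x = rng.uniform(0.0, 3.0, size=200_000)         # X ~ Uniform[0, 3]

# Assumed threshold at X = 1: regime 1 (Y = 2X) below it, regime 2 (Y = 3 - X) above it.
y = np.where(x <= 1.0, 2.0 * x, 3.0 - x)

# Y is an exact function of X, so the pseudo-R^2 of Y on X equals one by construction.
# The reverse exposure is linear but weak: fit X on Y by least squares.
slope, intercept = np.polyfit(y, x, 1)
r2_x_on_y = float(np.corrcoef(x, y)[0, 1] ** 2)

print(f"E(X | Y) ~ {intercept:.3f} + ({slope:.3f}) Y,  R^2 = {r2_x_on_y:.3f}")
```

The fitted line is informative about X only to a limited extent, which is the sense in which the linearity of E(X | Y) is spurious here.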
The methodology proposed in this paper can be used to compute and compare the degrees of nonlinearity of E(X | Y) and E(Y | X). If one of the two conditional expectations is far more nonlinear than the other, this would be suggestive that the relationship between X and Y is non-monotonic. In the empirical example of Figure 2b and the theoretical illustration of Figure 3a, an examination of the degree of nonlinearity of E(Y | X) on increasing subsets of type (−∞, x] of the support of X can be used to identify a candidate threshold x* for a piecewise linear approximation to the true model.

Disentangling Nonlinearity from Non-Normality
Models of equity premium prediction are typically specified as:

$$y_t = \alpha + \beta' x_t + \varepsilon_t, \tag{2}$$

where y_t is an excess return process, x_t is a vector of risk factors and ε_t is an error term. The Capital Asset Pricing Model (CAPM) is a one-factor model which takes x_t to be the excess return on the market portfolio (see, e.g., Sharpe 1964). Merton (1973) proposed an Intertemporal Capital Asset Pricing Model (ICAPM) where x_t is the lagged realization of the asset's volatility. Merton (1980) further insisted on the fact that empirical predictions of expected returns must be related to changes in expected future market risk in order to be consistent with equilibrium models. Following this idea, French et al. (1987) conducted an empirical study where y_t and x_t are respectively the stock market excess return and expected volatility. They found that expected return is positively related to expected risk while unexpected return is negatively related to unexpected risk (so-called "leverage effect"). Attempts have also been made to predict the equity premium using valuation ratios, such as companies' sizes, earnings-to-price ratios, cash flow-to-price ratios, book-to-market equity, sales growth, etc. See for instance Fama and French (1996) and their famous three-factor model.
Another strand of this literature focuses on the estimation of the long-run return-risk trade-off. Using long horizon regression models, Bandi and Perron (2008) find that past realized market variance is a good predictor of future excess returns. Jacquier and Okou (2014) reappraise this result by separating the market realized variance into its continuous part and its jump component. They find that the power of past realized volatility at predicting the future risk premium is attributable to its continuous part. Okou and Jacquier (2016) refined their own results by performing statistical inferences on the term structure of the return-risk trade-off at long horizon. Their main finding is that the results depend heavily on whether an intercept is included in the regressions or not, and that there is a horizon effect in the risk-return trade-off. They argued that differences in the data frequency may be responsible for the conflicting empirical conclusions on the risk-return trade-off.³ I argue that in the presence of unsuspected nonlinearity, a linear regression can lead to quite misleading conclusions. In fact, the slope of the regression (2) can be insignificant as a result of E(y_t | x_t) being nonlinear in x_t. Moreover, the estimate of β that comes out of this regression is meaningless in the presence of nonlinearity.
Nonparametric regressions are sometimes advocated by empirical researchers on the ground that excess returns are non-normal (see, e.g., Harvey 2001). However, it is easy to conceive a linear relationship between y_t and x_t with non-normal residuals. Likewise, E(y_t | x_t) can be nonlinear while ε_t is Gaussian. In the latter case, nonlinearity can cause the residuals of the linear regression of y_t onto x_t to be skewed or fat-tailed. The κ index proposed in this paper can help diagnose whether the behavior of the model is driven by nonlinearity or non-normality.

Measuring Nonlinearity
Let the exposure of Y to X be given by:

$$E(Y \mid X) = G(X),$$

where X and Y are scalar random variables and G(X) is a possibly nonlinear function of X. In asset pricing, Y could be the risk premium on an asset and X a measure of the risk for bearing that asset, in which case G(X) describes a nonlinear risk-return trade-off. Alternatively, Y can be viewed as an investor's portfolio and X the traditional market index. In the latter case, G(X) would reflect a nonlinear exposure to the market stemming from a non-directional investment strategy. In macroeconomics, Y could be the inflation rate and X the unemployment rate, in which case G(X) features a nonlinear Phillips curve. Finally, Y = Y_t could be an arbitrary process and X = Y_{t−1} its lagged value in a nonlinear time series model. My objective is to assess the extent of nonlinearity of G(X). Let the population linear regression of Y onto X be denoted by:

$$EL(X) = \alpha + \beta X,$$

where α and β are real numbers. Please note that EL(X) coincides with the linear regression of G(X) onto X and that EL(X) coincides with G(X) if and only if Y can be represented as:

$$Y = \alpha + \beta X + \varepsilon,$$

where ε is linearly uncorrelated with X. In the particular case where (X, Y) is bivariate Gaussian, G(X) and EL(X) are necessarily identical and ε is normally distributed as well. That is, the joint Gaussianity of (X, Y) is sufficient but not necessary for linearity. With no loss of generality, I assume that G(X) admits a series representation of the following form:

$$G(X) = \sum_{j=0}^{\infty} \gamma_j P_j(X), \tag{5}$$

where {P_j(X)}, j = 0, 1, 2, ..., is a complete sequence of orthogonal polynomials over the support of X under some metrics m(x), that is:

$$\int_{D(X)} P_j(x)\, P_k(x)\, m(x)\, dx = 0 \quad \text{for all } j \neq k,$$

and D(X) is the tight range of X, that is, the domain over which the values of X are meaningful.

³ Two empirical studies using different databases recorded at the same frequency, or two studies based on the same dataset but using different methods, may still lead to conflicting conclusions. See the introduction of Ghysels et al. (2005).

For instance, D(X) is a priori (0, ∞) for a price process and R for a log-return process. P_j(x) is a polynomial of order j and P_0(x) = 1. Please note that G(x) satisfies (5) if and only if:

$$\|G\|_m^2 = \int_{D(X)} G(x)^2\, m(x)\, dx < \infty. \tag{6}$$

See Carrasco et al. (2007) and the references therein. Upon knowing G(X), the metrics m(x) can always be selected to meet the condition (6). Let the projection of G(X) onto [P_0(.), P_1(.)] under the metrics m(x) be given by:

$$LP(X) = \gamma_0 P_0(X) + \gamma_1 P_1(X),$$

where:

$$\gamma_j = \frac{\langle G, P_j \rangle_m}{\langle P_j, P_j \rangle_m}, \qquad \langle f, g \rangle_m = \int_{D(X)} f(x)\, g(x)\, m(x)\, dx,$$

and ⟨., .⟩_m denotes the scalar product under m(x). The function G(X) is linear if and only if it is loaded only on the first two basis functions, that is:

$$G(X) = \gamma_0 P_0(X) + \gamma_1 P_1(X) = \alpha + \beta X,$$

where the first equality is deduced from (5) and the last equality stems from (4). G(X) is nonlinear if and only if the residual of the projection of G(X) onto [P_0(.), P_1(.)], i.e., G(X) − LP(X), is not identically null. Therefore, the nonlinear part of G(X) may be isolated as:

$$G(X) - LP(X) = \sum_{j=2}^{\infty} \gamma_j P_j(X).$$

Based on this observation, a measure of the degree of nonlinearity of G(X) is given by:

$$\kappa = \frac{\|G - LP\|_m}{\|G - \gamma_0\|_m}.$$

By construction, κ = 0 if G(X) is perfectly linear and κ = 1 if G(X) is fully loaded on the nonlinear basis functions. Hence, κ always lies between 0 and 1 and is decreasing in the degree of linearity of G(X) as measured by the ratio of the norms of LP(X) − γ_0 = γ_1 P_1(X) and G(X) − γ_0 = Σ_{j≥1} γ_j P_j(X). Equivalent expressions of κ are therefore given by:

$$\kappa = \sqrt{1 - \frac{\gamma_1^2\, \|P_1\|_m^2}{\|G - \gamma_0\|_m^2}} = \sqrt{\frac{\sum_{j \ge 2} \gamma_j^2\, \|P_j\|_m^2}{\sum_{j \ge 1} \gamma_j^2\, \|P_j\|_m^2}}.$$

Please note that κ is not defined when G(X) = γ_0, just as the linear correlation coefficient between Y and a constant does not exist.
The choice of the metrics m(x) is a crucial step of the methodology presented above. Indeed, the value of κ depends on the metrics used and a bad choice of metrics may lead to spuriously detect nonlinearity. This issue is discussed in the next section.
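To make the construction concrete, the following minimal sketch computes a nonlinearity index for a known function on the real line. It rests on assumptions that should be read as such: the index is taken to be κ = sqrt(1 − γ_1²‖P_1‖² / ‖G − γ_0‖²), the metrics is m(x) = exp(−x²) with P_0 = 1 and P_1(x) = 2x (the first two Hermite polynomials), and the integrals are approximated by Gauss-Hermite quadrature. The function name `kappa_hermite` and the test functions are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def kappa_hermite(g, n_nodes=60):
    """Nonlinearity index of g under m(x) = exp(-x^2), via Gauss-Hermite quadrature."""
    x, w = hermgauss(n_nodes)               # sum(w * f(x)) ~ integral of f(x) * exp(-x^2)
    gx = g(x)
    norm_p0 = np.sum(w)                     # ||P_0||^2 with P_0 = 1
    norm_p1 = np.sum(w * (2.0 * x) ** 2)    # ||P_1||^2 with P_1(x) = 2x
    gamma0 = np.sum(w * gx) / norm_p0
    gamma1 = np.sum(w * gx * 2.0 * x) / norm_p1
    total = np.sum(w * (gx - gamma0) ** 2)  # ||G - gamma0||^2
    linear = gamma1 ** 2 * norm_p1          # ||gamma1 * P_1||^2
    return float(np.sqrt(max(total - linear, 0.0) / total))

k_linear = kappa_hermite(lambda x: 3.0 + 2.0 * x)           # linear function
k_pure = kappa_hermite(lambda x: 4.0 * x ** 2 - 2.0)        # H_2(x), purely nonlinear
k_mixed = kappa_hermite(lambda x: x + 0.5 * x ** 2)         # mixture of both parts
k_scaled = kappa_hermite(lambda x: 5.0 * (x + 0.5 * x ** 2) - 2.0)
print(k_linear, k_pure, k_mixed, k_scaled)
```

Under these assumptions the linear function gives an index of zero up to rounding, the second Hermite polynomial gives one, and shifting and scaling the mixed function leaves its index unchanged.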

The Conditioning Information Set and the Suitable Choice of Metrics
Let D(X) denote the conditioning information set, that is, the support of X. A probability measure f(x) on D(X) is said to belong to Pearson's family if and only if it satisfies: [...] For instance, letting a_1 = −1 and a_2 = c_1 = c_2 = 0 leads to: [...] which is the Gaussian probability distribution function on (−∞, +∞). Likewise, letting a_1 = a_2 = 0 yields the uniform distribution on $[\underline{b}, \overline{b}]$ while setting c_1 = a_1 a_2 (a_1 ≠ 0, a_2 < 0) and c_2 = 0 yields an exponential distribution on [b, +∞). The Student, Gamma and Beta distributions are also special members of the Pearson family. See Johnson et al. (1994, pp. 15-25) and Bontemps and Meddahi (2012) for more details.
If a probability measure f(x) defined on $[\underline{b}, \overline{b}]$ belongs to the Pearson family, the sequence of orthogonal polynomials under f(x) is given by Rodrigues' formula (see Askey 2005): [...] where d^j/dx^j is the jth-order differentiation operator and e_j is a sequence of normalization factors that could be chosen so as to achieve specific purposes. That is, any sequence P_j(x) given by (14) [...] Subsequently, we consider five cases that are representative of the situations that researchers will often face in practice.
Case 1: When D(X) = R, a suitable choice of metrics is m(x) = e^{−x²}. The corresponding orthogonal basis is given by Hermite polynomials: [...] where H′_j(x) is the derivative of H_j(x) with respect to x. In this case, the nonlinearity of G(X) is given by: [...] where: [...]

Case 2: When D(X) = [0, +∞), one may use m(x) = e^{−x} along with the corresponding orthogonal basis formed by Laguerre polynomials: [...] The nonlinearity of G(X) is then given by: [...] where: [...]

Case 3: If X can realistically not fall below a given threshold b, one may consider defining the domain of X as D(X) = [b, ∞). The corresponding Laguerre polynomials are obtained by noting that: [...]

Case 5: For an arbitrary bounded domain D(X) = $[\underline{b}, \overline{b}]$, we simply note that: [...] Hence, by letting u = [...], it is straightforward to show that: [...] Hence, a suitable choice of basis functions when X ∈ $[\underline{b}, \overline{b}]$ is given by the sequence P_j [...] The measure of nonlinearity for this case is: [...] where: [...] The latter setup is best suited to measuring nonlinearity on segments of the support of X.

Any metrics that follows the guideline described above will deliver a measure of nonlinearity that is reliable. This means that for an appropriately chosen metrics, the function under consideration is nonlinear as soon as κ is strictly positive. However, the interpretation of the result is "metrics specific", meaning that κ is a relative measure of nonlinearity. While the degrees of nonlinearity of different functions obtained under different metrics cannot be compared, different functions sharing the same support can be compared under the same metrics.

⁴ Alternative choices of basis functions are Jacobi, Gegenbauer or Tchebychev's polynomials along with the suitable metrics.
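Measuring nonlinearity on a segment can be sketched as follows. This is an illustration under stated assumptions rather than the formulas of the text: the metrics is taken to be m(x) = 1 on the segment, the segment is mapped to [−1, 1] by an affine change of variable so that P_0 = 1 and P_1(u) = u (the first two Legendre polynomials), the index is κ = sqrt(1 − γ_1²‖P_1‖² / ‖G − γ_0‖²), and the piecewise function with a kink at X = 1 is an assumed example. The name `kappa_segment` is hypothetical.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

def kappa_segment(g, lo, hi, n_nodes=40):
    """Nonlinearity index of g on [lo, hi] under the uniform metrics m(x) = 1."""
    u, w = leggauss(n_nodes)                     # Gauss-Legendre nodes and weights on [-1, 1]
    x = 0.5 * (hi - lo) * u + 0.5 * (hi + lo)    # affine map from [-1, 1] to [lo, hi]
    gx = g(x)
    norm_p1 = np.sum(w * u ** 2)                 # ||P_1||^2 with P_1(u) = u
    gamma0 = np.sum(w * gx) / np.sum(w)
    gamma1 = np.sum(w * gx * u) / norm_p1
    total = np.sum(w * (gx - gamma0) ** 2)       # ||G - gamma0||^2 (the Jacobian cancels in kappa)
    return float(np.sqrt(max(total - gamma1 ** 2 * norm_p1, 0.0) / total))

def square(x):
    return x ** 2

def two_regimes(x):
    # Assumed example: Y = 2X up to X = 1 and Y = 3 - X beyond.
    return np.where(x <= 1.0, 2.0 * x, 3.0 - x)

k_sq_01 = kappa_segment(square, 0.0, 1.0)
k_sq_12 = kappa_segment(square, 1.0, 2.0)
k_reg_01 = kappa_segment(two_regimes, 0.0, 1.0)   # single regime: linear on this segment
k_reg_03 = kappa_segment(two_regimes, 0.0, 3.0)   # both regimes: strongly nonlinear
print(k_sq_01, k_sq_12, k_reg_01, k_reg_03)
```

Evaluating the index on growing segments of the support, as in the last two calls, is one way to look for a candidate threshold for a piecewise linear approximation.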

Invariance Properties
Observe that 1 − κ 2 has the flavor of the R 2 of a linear regression as it measures the "goodness-of-fit" of the functional projection of G(X) onto [P 0 (.), P 1 (.)]. Based on this observation, one is tempted to claim that κ shares all the invariance properties as an R 2 . However, such a statement is only partially true because of the dependence of κ on the metrics m(x). The invariance properties of κ are discussed below.

Proposition 1. κ is invariant to a linear transformation of Y.
Proposition 1 establishes that the amount of nonlinearity remains the same under drifting and scaling of Y. Applied to a portfolio of financial assets, this property means that leverage does not affect the nonlinearity of a financial position. Another property shared by the R 2 is stated below.
Proposition 2. κ is invariant to the addition of a randomness ε to Y provided that ε is independent of X.
This property is rather interesting as it implies that κ may be used to diagnose linear models with additive error terms. Let us assume that X is the return on the market index and Y the return on the portfolio of an investor such that E(Y | X) = α + βX (i.e., we are assuming that the exposure of Y to X is perfectly linear). The strength of the exposure of Y to X is given by the linear correlation between Y and α + βX, that is, the square root of the R² of the regression of Y onto X. Suppose the investor decides to implement a non-directional (i.e., market neutral) diversification strategy by changing Y into (1/2)Y + (1/2)ε, where ε is independent of X. The exposure of the new position is given by E((1/2)Y + (1/2)ε | X) = $\tilde{\alpha} + \tilde{\beta} X$, where $\tilde{\alpha} = \tfrac{1}{2}(\alpha + E(\varepsilon))$ and $\tilde{\beta} = \tfrac{1}{2}\beta$. The alpha of the new position may have increased or decreased depending on the value of E(ε) and its beta is reduced by half. However, the nonlinearity of the new position as measured by κ is unaltered since E((1/2)Y + (1/2)ε | X) remains linear in X. The next proposition examines the consequence of an addition of a linear function of X to Y.
Proposition 3. If Y is nonlinearly exposed to X, adding a linear function of X to Y does not necessarily decrease its degree of nonlinearity.
Indeed, κ is not invariant to the addition of a linear function of X to Y. To understand this result in the context of the proposition (see the proof in Appendix A for more details), suppose γ 1 is positive so that −4γ 1 < 0. Then adding Z = aX + b to Y exacerbates its nonlinearity if a is negative and lies within the range [−4γ 1 , 0]. The degree of nonlinearity decreases only if a < −4γ 1 or a > 0. Alternatively, suppose γ 1 is negative so that −4γ 1 > 0. Then adding Z = aX + b to Y exacerbates its nonlinearity if a is positive and lies within the range [0, −4γ 1 ]. Otherwise, the nonlinearity of Y decreases. Applied to portfolio choice, Proposition 3 implies that an asset that is linearly exposed to X can be used to increase the nonlinearity of an already nonlinear position Y. A sufficient condition for the addition of Z = aX + b to reduce the nonlinearity of Y is that a be of the same sign as γ 1 .
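The range [−4γ_1, 0] can be recovered with a short calculation. The sketch below is not the appendix proof; it assumes the Hermite basis with P_1(x) = 2x, so that aX = (a/2)P_1(X), and takes κ² to be the share of the nonlinear part in ‖G − γ_0‖². The symbol N for the squared norm of the nonlinear part is introduced here for convenience.

```latex
\begin{align*}
\kappa^2(Y) &= \frac{N}{\gamma_1^2\,\|P_1\|_m^2 + N},
  \qquad N = \sum_{j \ge 2} \gamma_j^2\,\|P_j\|_m^2 > 0, \\
\kappa^2(Y + Z) &= \frac{N}{\left(\gamma_1 + \tfrac{a}{2}\right)^2 \|P_1\|_m^2 + N},
  \qquad Z = aX + b, \\
\kappa(Y + Z) > \kappa(Y)
  &\iff \left(\gamma_1 + \tfrac{a}{2}\right)^2 < \gamma_1^2
  \iff \tfrac{a}{4}\,(a + 4\gamma_1) < 0 .
\end{align*}
```

The last inequality holds exactly when a lies strictly between −4γ_1 and 0, and it fails whenever a has the same sign as γ_1, which matches the sufficient condition stated above. The constant b only shifts γ_0 and plays no role.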
The property of κ highlighted by Proposition 3 is also shared by the linear correlation coefficient. Unlike the correlation coefficient however, the value of κ is sensitive to drifting and scaling of X.
Proposition 4. Let Z = aX + b where a and b are some constants. Then the degree of nonlinearity of Y with respect to X under m(x) is equal to its degree of nonlinearity with respect to Z under $\tilde{m}(z) = \frac{1}{a}\, m\!\left(\frac{z - b}{a}\right)$.
The result of Proposition 4 stems from the fact that any transformation of X alters the metrics m(x), which in turn invalidates the orthogonality of {P_j(X)}, j = 0, 1, 2, .... It implies that the values of κ are not directly comparable across different choices of metrics, which is a drawback of the proposed methodology. However, this drawback is a minor one if m(x) is continuous and puts zero weight outside the support of X. Also, functions that are defined on the same domain D(X) may be compared under the same metrics provided that they all have finite norms.

Spurious Nonlinearity
This section illustrates how to compute κ when G(X) is known and, in passing, underscores the importance of selecting the metrics m(x) wisely. Indeed, spurious nonlinearity may arise from a bad choice of metrics. To see this, let us consider the exponential function G(X) = e^X, which also has the following representation:

$$e^{X} = \sum_{j=0}^{\infty} \frac{X^j}{j!} = 1 + X + \frac{X^2}{2} + \dots \tag{24}$$

It is tempting to claim based on (24) that γ_0 = γ_1 = 1. However, such a claim would be false since γ_0 and γ_1 have been defined as the coordinates of G(X) in the basis formed by the orthogonal polynomials {P_j(X)}, j = 0, 1, 2, .... By noting that D(X) = R for the exponential function, we let m(X) = e^{−X²} so that the P_j(X) are Hermite polynomials and: [...] This yields the following measure of nonlinearity: [...] Let us now consider G(X) = log X. This function is defined only for x > 0 and hence, its nonlinearity should not be measured as though x lies on the whole real line. For illustration purposes, let us ignore this warning by letting G(X) = Σ_{j≥0} γ_j H_j(X). This leads to: [...] The measure of nonlinearity that results is: [...] which is quite excessive compared to what is obtained for the exponential. In reality, the nonlinearity calculated above is for the function given by: [...] which is distinct from G(X) = log x on (0, +∞). The domain of $\tilde{G}(X)$ is the whole real line whereas the tight domain of G(X) is D(X) = [0, +∞). This explains why we obtain a spuriously high value of κ.
Let us now account for the fact that the domain of log X is (0, +∞) by letting G(X) = Σ_{j≥0} γ_j L_j(X), where {L_j(X)}, j = 0, 1, 2, ..., are Laguerre polynomials. We obtain: [...] This yields: [...] which is more reasonable than previously.
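The two well-posed calculations of this section can be sketched numerically. The values below are not taken from the text; they follow from assumptions that should be kept in mind: κ = sqrt(1 − γ_1²‖P_1‖² / (‖G‖² − γ_0²‖P_0‖²)), the Hermite basis with H_1(x) = 2x under m(x) = exp(−x²) for the exponential, and the Laguerre basis with L_1(x) = 1 − x under m(x) = exp(−x) for the logarithm. The integrals for log X are standard Gamma-function identities rather than quadrature results.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss

# exp(x) under m(x) = exp(-x^2): Gauss-Hermite quadrature versus a closed form.
x, w = hermgauss(60)
g = np.exp(x)
sqrt_pi = math.sqrt(math.pi)
gamma0 = np.sum(w * g) / sqrt_pi                        # ||H_0||^2 = sqrt(pi)
gamma1 = np.sum(w * g * 2.0 * x) / (2.0 * sqrt_pi)      # ||H_1||^2 = 2 sqrt(pi)
total = np.sum(w * g ** 2) - gamma0 ** 2 * sqrt_pi      # ||G - gamma0||^2
kappa_exp = math.sqrt(1.0 - gamma1 ** 2 * 2.0 * sqrt_pi / total)
kappa_exp_closed = math.sqrt(1.0 - 0.5 * math.sqrt(math.e) / (math.e - math.sqrt(math.e)))

# log(x) under m(x) = exp(-x), using
#   integral of log(x) exp(-x)     = -euler
#   integral of x log(x) exp(-x)   = 1 - euler
#   integral of log(x)^2 exp(-x)   = euler^2 + pi^2 / 6
euler = float(np.euler_gamma)
g0 = -euler                          # coefficient on L_0 = 1
g1 = -euler - (1.0 - euler)          # coefficient on L_1(x) = 1 - x, equal to -1
norm2 = euler ** 2 + math.pi ** 2 / 6.0
kappa_log = math.sqrt(1.0 - g1 ** 2 / (norm2 - g0 ** 2))

print(f"kappa(exp, Hermite)  = {kappa_exp:.4f}")
print(f"kappa(log, Laguerre) = {kappa_log:.4f}")
```

Because the two functions are measured under different metrics, the two numbers describe each function relative to its own domain and are not meant to be ranked against each other.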

Feasible Estimators
Upon observing a sample {(x_1, y_1), ..., (x_T, y_T)} of size T from the joint distribution of (X, Y), the conditional expectation E(Y | X = x) = G(x) can be estimated by the nonparametric method of Nadaraya (1964) and Watson (1964):

$$\hat{G}(x) = \frac{\sum_{t=1}^{T} y_t\, K\!\left(\frac{x_t - x}{h}\right)}{\sum_{t=1}^{T} K\!\left(\frac{x_t - x}{h}\right)}, \tag{28}$$

where K(z) is a kernel function and h is a bandwidth.
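A kernel regression of this type takes only a few lines. The sketch below assumes a Gaussian kernel and a user-supplied bandwidth; the function name `nadaraya_watson` and the toy data are illustrative, and no bandwidth selection rule is implied.

```python
import numpy as np

def nadaraya_watson(x_eval, x, y, h):
    """Kernel regression estimate of E(Y | X = x_eval) with a Gaussian kernel."""
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    z = (x_eval[:, None] - x[None, :]) / h     # scaled distance from each evaluation point
    k = np.exp(-0.5 * z ** 2)                  # kernel weights; constants cancel in the ratio
    return (k @ y) / k.sum(axis=1)             # locally weighted average of y

# Toy data on a symmetric grid (illustrative only).
x = np.linspace(-1.0, 1.0, 201)
fit_const = nadaraya_watson([-0.5, 0.0, 0.5], x, np.full_like(x, 2.0), h=0.2)
fit_line_at_zero = nadaraya_watson(0.0, x, 3.0 * x + 1.0, h=0.2)[0]
print(fit_const, fit_line_at_zero)
```

The estimate is a weighted average of the observed y values, so a constant response is reproduced exactly, and a linear response is reproduced at the center of a symmetric design.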
With the estimator Ĝ(x) above in hand, the sample counterpart of κ is: [...] Given these choices of metrics, integrals of the form ∫_{D(X)} f(x)m(x)dx can be solved analytically when f(x) is a polynomial. For more complicated functions, however, numerical quadratures should be used.
When m(x) = e^{−x²}, the quadrature rule yields: [...] (34) where {x_k}, k = 1, ..., N, are Gauss-Hermite quadrature points associated with the weights {ω_k}, k = 1, ..., N. For the alternative metrics m(x) = e^{−x}, the quadrature rule becomes: [...] Finally, for the natural metrics m(x) = 1, we have: [...] where {x_k}, k = 1, ..., N, are Gauss-Legendre quadrature points associated with the weights {ω_k}, k = 1, ..., N. The quadrature rules above are designed such that they are exact when the integrand is a polynomial of order n ≤ 2N − 1. That is:

$$\sum_{k=1}^{N} \omega_k\, p(x_k) = \int_{D(X)} p(x)\, m(x)\, dx \quad \text{for any polynomial } p \text{ of order at most } 2N - 1.$$

Thus, it is straightforward to show that: [...] If N > j, the approximation error of $\hat{\gamma}_j$ by a quadrature rule is: [...] Please note that the finiteness of the norm of Ĝ(x) imposes that $\hat{\gamma}_l \|P_l\| \to 0$. Furthermore, [...] is an approximation of $\frac{1}{\|P_l\|\,\|P_j\|} \int P_l(x) P_j(x)\, dx = 0$. Consequently, the approximation error (42) converges to zero fast as N increases (especially for j = 0 and j = 1). This is good news given that the expression of $\hat{\kappa}$ only depends on $\hat{\gamma}_0$ and $\hat{\gamma}_1$.
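The exactness property of Gaussian quadrature is easy to verify. The sketch below uses Gauss-Hermite nodes for m(x) = exp(−x²) and the known moments of that weight; the helper name `gh_moment` and the choice of N = 3 are illustrative.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss

def gh_moment(power, n_nodes):
    """Gauss-Hermite approximation of the integral of x**power * exp(-x^2) over R."""
    x, w = hermgauss(n_nodes)
    return float(np.sum(w * x ** power))

exact_x4 = 3.0 * math.sqrt(math.pi) / 4.0     # integral of x^4 exp(-x^2)
exact_x6 = 15.0 * math.sqrt(math.pi) / 8.0    # integral of x^6 exp(-x^2)

err_x4_n3 = abs(gh_moment(4, 3) - exact_x4)   # order 4 <= 2*3 - 1: exact up to rounding
err_x6_n3 = abs(gh_moment(6, 3) - exact_x6)   # order 6 >  2*3 - 1: not exact
err_x6_n4 = abs(gh_moment(6, 4) - exact_x6)   # order 6 <= 2*4 - 1: exact again
print(err_x4_n3, err_x6_n3, err_x6_n4)
```

With three nodes the rule reproduces the fourth moment but not the sixth, and adding one node restores exactness, in line with the bound n ≤ 2N − 1.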
The next proposition states a consistency result for $\hat{\kappa}$ by assuming that its numerical approximation error is negligible.
According to Proposition 5, any consistent estimator Ĝ(X) of G(X) = E(Y | X) may be plugged into (30) to (32) to obtain a consistent estimator of κ. Nonparametric estimators of type (28) have been shown to be consistent under quite general settings (Bierens 1987).

Applications
This section presents three applications of κ. The first subsection discusses the degree of nonlinearity of a European option with respect to the underlying asset. The second subsection discusses the optimal hedge ratio in the presence of nonlinearity. The third subsection presents an empirical example where the relationship between the risk and the returns on the SP500 index is analyzed.

How Nonlinear Are Put and Call Options?
This section examines the nonlinearity of the price of a European style option with respect to the underlying asset. This exercise is trivial in a sense, as options generate nonlinear payoffs by construction. It is nevertheless useful as it gives us a pretext to compare the degree of nonlinearity of several functions under the same metrics and to further illustrate the importance of correctly selecting the metrics m(x).
The payoff of a European Call option at maturity is given by G(X) = max(X − K, 0), where X is the price of the underlying asset and K is the strike.
If one ignores the fact that the support of $X$ is $[0, \infty)$ and uses $m(x) = e^{-x^2}$ to compute the nonlinearity of $G(X)$, then one obtains closed-form expressions for $\gamma_0$, $\gamma_1$ and $\|G\|_m^2$ that involve $\Phi(x)$, the standard normal cumulative distribution function, and hence a closed-form expression for $\kappa_C(K)$. Evaluating this expression at $K = 0$ yields $\kappa_C(0) \neq 0$. This value is clearly misleading since $G(X)$ is linear when $K = 0$.

To avoid spurious nonlinearity, one uses $m(x) = e^{-x}$, whose support $[0, \infty)$ matches that of $X$. Under this metrics, $\gamma_0$, $\gamma_1$ and $\|G\|_m^2$ are again available in closed form, and the resulting formula for the nonlinearity of the payoff of a European Call implies $\kappa_C(0) = 0$, consistent with the fact that $G(X) = X$ when $K = 0$.

I now consider the payoff of a Put option with strike $K$, given by $G(X) = \max(K - X, 0)$. Having learned from the previous example, I set $m(x) = e^{-x}$ and obtain the corresponding closed-form expressions for the projection coefficients, the norm and the nonlinearity $\kappa_P(K)$ of the payoff of a European Put. Please note that for a Put, $G(X) \to K - X$ when $K \to \infty$. Hence, we expect to see $\kappa_P(K) \to 0$ as $K \to \infty$, and the formula confirms this limit.

Figure 4 compares the nonlinearity of the payoffs of a Call and a Put under the metrics $m(x) = e^{-x}$. As $K$ increases from zero to infinity, $\kappa_C(K)$ starts at $\kappa_C(0) = 0$ and converges to 1, whereas $\kappa_P(K)$ starts at 1 and converges to zero.

Over the course of its lifetime, the price of a Call as given by the Black-Scholes formula is:

$$C(X, K, \sigma, t) = X\,\Phi(d_1) - K e^{-r(T-t)}\,\Phi(d_2), \qquad d_1 = \frac{\ln(X/K) + (r + \sigma^2/2)(T-t)}{\sigma\sqrt{T-t}}, \qquad d_2 = d_1 - \sigma\sqrt{T-t},$$

where $X$ is the price of the underlying asset, $K$ is the strike, $t$ is the current date, $T$ is the maturity, $\sigma$ is the spot volatility of $X$ and $r$ is the risk-free rate.
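The behavior of the Call and Put payoffs under $m(x) = e^{-x}$ can be reproduced with a few lines of code. The sketch below rests on two assumptions that are not taken from this section: the payoffs are projected on the standard Laguerre polynomials $L_0(x) = 1$ and $L_1(x) = 1 - x$, and nonlinearity is measured as one minus the share of the non-constant part of $G$ that is captured by the degree-one term. The paper's own definition of $\kappa$ is given in an earlier section and may differ in its normalization. The closed-form integrals in the comments were derived for this sketch, and all function names are illustrative.

```python
import math


def kappa(g0, g1, norm2):
    """Assumed measure: share of the non-constant part of G that is not linear."""
    return 1.0 - g1**2 / (norm2 - g0**2)


def kappa_call(K):
    """Payoff max(x - K, 0) projected on L0 = 1 and L1 = 1 - x under exp(-x)."""
    e = math.exp(-K)
    g0 = e                   # integral of (x - K) exp(-x) over [K, inf)
    g1 = -(1.0 + K) * e      # integral of (x - K)(1 - x) exp(-x) over [K, inf)
    norm2 = 2.0 * e          # integral of (x - K)^2 exp(-x) over [K, inf)
    return kappa(g0, g1, norm2)


def kappa_put(K):
    """Payoff max(K - x, 0) projected on L0 = 1 and L1 = 1 - x under exp(-x)."""
    e = math.exp(-K)
    g0 = K - 1.0 + e                          # integral of (K - x) exp(-x) over [0, K]
    g1 = 1.0 - (1.0 + K) * e                  # integral of (K - x)(1 - x) exp(-x)
    norm2 = K * K - 2.0 * K + 2.0 - 2.0 * e   # integral of (K - x)^2 exp(-x)
    return kappa(g0, g1, norm2)


def crossing(lo=1.0, hi=3.0, tol=1e-10):
    """Bisection on kappa_call(K) - kappa_put(K), which changes sign on [lo, hi]."""
    f = lambda K: kappa_call(K) - kappa_put(K)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


K_star = crossing()
```

Under these assumptions, `kappa_call(0.0)` is zero, `kappa_put(K)` vanishes for large `K`, and the crossing point `K_star` can be compared with the value of about 1.67 reported in the paper.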
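The following quantities are needed in order to evaluate the nonlinearity of $C(X, K, \sigma, t)$ as $K$, $t$ and $\sigma$ vary: the projection coefficients $\gamma_0$ and $\gamma_1$ of $C(\cdot, K, \sigma, t)$ and its squared norm $\|C\|_m^2$, each of which is an integral against $m(x) = e^{-x}$ over $[0, \infty)$.

Footnote 5: The crossing point where $\kappa_C(K) = \kappa_P(K)$ in Figure 4 is approximately $K \simeq 1.67$.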
These integrals cannot be computed in closed form. A numerical approximation based on the Gauss-Laguerre rule replaces each of them by a weighted sum of the form $\sum_{l=1}^{N} \omega_l f(x_l)$, where $f$ is the corresponding integrand and $x_l$, $l = 1, \ldots, N$, are quadrature points associated with weights $\omega_l$, $l = 1, \ldots, N$. The nonlinearity of the Call price follows by plugging these approximations into the expression of $\kappa$.

For a European Put option, the price evolves according to put-call parity:

$$P(X, K, \sigma, t) = C(X, K, \sigma, t) - X + K e^{-r(T-t)},$$

and the nonlinearity of this price with respect to the underlying asset is computed in the same way.

Figure 5 is drawn by assuming that $T = 1$ year, $K \in [0, 5]$, $\sigma \in [0, 80\%]$ and $r = 3\%$. For a Call option (resp., Put option), nonlinearity increases (resp., decreases) in the strike $K$ for all values of volatility and time to maturity. For both types of options, the nonlinearity is decreasing in the volatility and time to maturity. The nonlinearity appears to be more sensitive to the volatility and time to maturity for a Call than for a Put.
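The Gauss-Laguerre approximation of the nonlinearity of an option price can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: the Black-Scholes price uses the standard textbook formula with a risk-free rate `r`, the projection uses the standard Laguerre polynomials $L_0(x) = 1$ and $L_1(x) = 1 - x$, and nonlinearity is taken to be one minus the share of the non-constant part of the price function captured by the degree-one term, which may differ from the paper's exact definition of $\kappa$. The function names and the choice of 30 quadrature points are hypothetical.

```python
import math
import numpy as np
from numpy.polynomial.laguerre import laggauss


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bs_call(X, K, sigma, tau, r):
    """Black-Scholes price of a European Call; tau = T - t is the time to maturity."""
    d1 = (math.log(X / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return X * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)


def bs_put(X, K, sigma, tau, r):
    """European Put price obtained from put-call parity."""
    return bs_call(X, K, sigma, tau, r) - X + K * math.exp(-r * tau)


def kappa_laguerre(price, n_points=30):
    """Assumed nonlinearity measure of x -> price(x) under m(x) = exp(-x)."""
    x, w = laggauss(n_points)
    g = np.array([price(xi) for xi in x])
    g0 = np.sum(w * g)               # projection on L0(x) = 1
    g1 = np.sum(w * g * (1.0 - x))   # projection on L1(x) = 1 - x
    norm2 = np.sum(w * g**2)         # squared norm of the price function
    return float(1.0 - g1**2 / (norm2 - g0**2))


# Illustrative inputs: T - t = 1 year, sigma = 20%, r = 3%, two strikes.
k_low_strike = kappa_laguerre(lambda X: bs_call(X, 0.5, 0.2, 1.0, 0.03))
k_high_strike = kappa_laguerre(lambda X: bs_call(X, 3.0, 0.2, 1.0, 0.03))
```

With these illustrative inputs, the Call price is closer to linear for the low strike than for the high strike, in line with the pattern described for Figure 5.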

Nonlinearity and Optimal Hedging
Assume that an investor wants to hold $-\beta_0$ units of a risk-free asset ($B_{f,t}$) and $-\beta_1$ units of a risky asset ($X_t$) so as to hedge against the volatility of an asset $Y_t$. This investor would form a hedge portfolio whose value is $Y_t - \beta_0 B_{f,t} - \beta_1 X_t$. Let $e_{t+1}$ denote the resulting hedging error. In practice, a perfect hedge that results in $e_{t+1} = 0$ can rarely be achieved. Therefore, the best solution often consists of minimizing the variance of the hedging error $e_{t+1}$ with respect to the "hedge ratio" $k_t = \omega_{x,t} / \omega_{y,t}$, which is the number of units of Asset $X_t$ to hold per unit of Asset $Y_t$. When $Y_t$ is the payoff of an option and $X_t$ the price of the underlying asset, $k_t$ corresponds to the "delta" of the option (hence the expression "Delta Hedging").
The optimal hedging problem boils down to the following minimization:

$$\min_{k_t} \mathrm{Var}_t\left(y_{t+1} - k_t\, x_{t+1}\right).$$

The optimal hedge ratio is given by

$$k_t^{*} = \frac{\mathrm{Cov}_t(y_{t+1}, x_{t+1})}{\mathrm{Var}_t(x_{t+1})}.$$

However, $\mathrm{Cov}_t(y_{t+1}, x_{t+1})$ can be zero if $y_{t+1}$ is nonlinearly exposed to $x_{t+1}$. In this case, one would wrongly conclude that Asset $X_t$ is of no help for hedging against the fluctuations of $Y_t$. Now, suppose we have detected that $E(y_t \mid x_t) = g(x_t)$ is nonlinear using the methodology proposed in this paper. A reasonable hedging strategy would therefore consist of first using state-of-the-art models to predict $x_{t+1}$ as $\hat{x}_{t+1}$ and next linearizing $g$ around this prediction to obtain:

$$g(x_{t+1}) \approx g(\hat{x}_{t+1}) + g'(\hat{x}_{t+1})\,(x_{t+1} - \hat{x}_{t+1}).$$

An approximately optimal hedge ratio would then be given by:

$$k_t \approx g'(\hat{x}_{t+1}).$$

This example underscores the importance of being aware of the presence of nonlinearity in asset returns for sound portfolio risk management.
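The point about a vanishing covariance can be illustrated with simulated data. The sketch below is a stylized example, not an empirical result: it assumes a symmetric factor and the hypothetical exposure $g(x) = x^2$, for which the covariance-based hedge ratio is close to zero even though $y$ depends strongly on $x$. It then compares the variance of the unhedged and locally hedged positions in a neighborhood of an assumed forecast `x_hat`, using the slope $g'(\hat{x})$ as the hedge ratio.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

x = rng.standard_normal(n)                # factor, symmetric around zero
g = lambda z: z**2                        # hypothetical nonlinear exposure
g_prime = lambda z: 2.0 * z               # its derivative
y = g(x) + 0.1 * rng.standard_normal(n)   # exposed asset plus idiosyncratic noise

# Covariance-based hedge ratio Cov(y, x) / Var(x): close to zero here.
k_linear = np.cov(y, x)[0, 1] / np.var(x, ddof=1)

# Local hedge ratio obtained by linearizing g around the forecast x_hat.
x_hat = 1.0
k_local = g_prime(x_hat)

# Compare variances for realizations of x that fall near the forecast.
near = np.abs(x - x_hat) < 0.2
var_unhedged = np.var(y[near])
var_hedged = np.var(y[near] - k_local * x[near])
```

In this example the linear hedge ratio suggests that the factor is useless for hedging, while the local hedge removes most of the variance of the position around the forecast.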

Empirical Application: Return-Risk Trade-Off on the SP500
In this section, I illustrate an empirical use of the measure of nonlinearity $\kappa$ by analyzing the return-risk trade-off based on Merton's (1973) intertemporal capital asset pricing model (ICAPM). This model posits the following relation between the conditional expected return on the market index $E_t(R_{t+1})$ and the conditional expected variance $\mathrm{Var}_t(R_{t+1})$:

$$E_t(R_{t+1}) - R_{f,t+1} = \theta\,\mathrm{Var}_t(R_{t+1}),$$

where $R_{f,t+1}$ is the risk-free rate and $\theta$ is the relative risk aversion coefficient of the representative agent. A multivariate version of the ICAPM is proposed in Bollerslev et al. (1988). Ghysels et al. (2005) applied a MIxed DAta Sampling (MIDAS) methodology to the CRSP value-weighted portfolio and concluded that "there is a [positive] risk-return trade-off after all." Previous studies that found a positive relationship between risk premium and expected risk include French et al. (1987) and Campbell and Hentschel (1992), in contrast with Nelson (1991), who finds a negative relationship, and Glosten et al. (1993) and Harvey (2001), whose conclusions are mixed. Ghysels et al. (2005) attribute the conflicting conclusions to differences in the models posited for the conditional variance. Jacquier and Okou (2014) suspect that differences in data frequencies are responsible for the inconsistencies across findings. The empirical results of the current paper suggest that the controversy on the nature of the return-risk trade-off is mainly due to nonlinearity.
I consider estimating a nonlinear version of the ICAPM. If one assumes that $u_t$ and $\varepsilon_t$ are IID and Gaussian, Equations (54) and (55) imply a closed-form expression for the conditional expected return $E_t(R_{t+1})$. As in the ICAPM, the conditional expected return $E_t(R_{t+1})$ is increasing in the conditional expected variance $E_t(RV_{t+1})$. Unlike in the ICAPM, the relationship between the conditional expected return and the innovation on the log-variance process ($\varepsilon_{t+1}$) is explicitly characterized. Namely, the conditional expected return is decreasing in the variance of $\varepsilon_{t+1}$, as observed empirically by French et al. (1987). Finally, the market price of risk is a nonlinear function of the conditional expected variance.

I estimate the AR(1) $v_t = c_0 + c_1 v_{t-1} + \varepsilon_t$ for the log-RV and compute the fitted values as $\hat{v}_t = \hat{c}_0 + \hat{c}_1 v_{t-1}$. These fitted values are the expected risk at time $t$ as perceived by investors at time $t - 1$. The estimated coefficients are $\hat{c}_0 = -1.8683$ (significant at the 5% level) and $\hat{c}_1 = 0.7209$ (significant at the 5% level). The $R^2$ of this regression is 52% and the estimated error variance is 0.4193. Figure 5a shows the scatter plot of $v_t$ against $\hat{v}_t$.

Next, I regress the log-return $r_t$ on $\hat{v}_t$ and a constant and compute the fitted values as $\hat{r}_t = \hat{\mu}_0 + \hat{\mu}_1 \hat{v}_t$. This yields $\hat{\mu}_0 = -0.0056$ and $\hat{\mu}_1 = -0.0016$. Neither of these coefficients is significant at the 5% level and, not surprisingly, the $R^2$ of the regression is less than 1%. The poor fit provided by the linear regression of $r_t$ onto $\hat{v}_t$ suggests that the relationship between these variables is nonlinear. This is confirmed by the nonparametric estimator of $E(\hat{v}_t \mid r_t)$ shown in Figure 2.

Figure 6a,b respectively show the estimated nonlinearity of $E(r_t \mid \hat{v}_t)$ and $E(\hat{v}_t \mid r_t)$ on increasing segments of the support of $\hat{v}_t$ and $r_t$. The scaling of the x-axis corresponds to the quantiles of the conditioning variable.
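The two-step procedure described above (an AR(1) for the log realized variance, followed by a linear regression of log-returns on the fitted values) can be sketched on simulated data. Everything in this sketch is synthetic: the AR(1) parameters reported in the text are reused only as simulation inputs, and the return process $r_t = e^{v_t/2} z_t$ is a hypothetical choice that makes returns depend on volatility without any linear link. The output is not a replication of the paper's estimates.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 5000
c0, c1, s2 = -1.8683, 0.7209, 0.4193   # simulation inputs borrowed from the text

# Simulate the log realized variance v_t as a Gaussian AR(1).
v = np.empty(T)
v[0] = c0 / (1.0 - c1)                  # start at the unconditional mean
eps = rng.normal(0.0, np.sqrt(s2), T)
for t in range(1, T):
    v[t] = c0 + c1 * v[t - 1] + eps[t]

# Hypothetical return process: volatility scales a standard normal shock.
r = np.exp(0.5 * v) * rng.standard_normal(T)


def ols(y, x):
    """OLS of y on a constant and x; returns the coefficients and fitted values."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X @ beta


# Step 1: AR(1) for v_t; the fitted values play the role of expected risk.
(c0_hat, c1_hat), v_hat = ols(v[1:], v[:-1])

# Step 2: linear regression of r_t on the expected risk.
(mu0_hat, mu1_hat), r_hat = ols(r[1:], v_hat)
resid = r[1:] - r_hat
r2 = 1.0 - np.sum(resid**2) / np.sum((r[1:] - r[1:].mean()) ** 2)
```

In this simulated setting the AR(1) coefficients are recovered well, while the linear regression of returns on expected risk has an $R^2$ close to zero even though returns depend on volatility by construction.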
For any pair $(X, Y)$, the nonlinearity of $\hat{G}(x)$ is computed on the segments $[\underline{x}, x_k]$ using the metrics $m(x) = 1$ along with Gauss-Legendre polynomials, where $x_k = \underline{x} + k(\overline{x} - \underline{x})/20$, $k = 0, \ldots, 20$, $\underline{x} = \min(x_1, \ldots, x_T)$ and $\overline{x} = \max(x_1, \ldots, x_T)$. For each $x_k$, $\hat{G}(x_k)$ is obtained by spline interpolation based on $\{(x_t, \hat{G}(x_t))\}_{t=1}^{T}$.

We see that the nonlinearity of $E(r_t \mid \hat{v}_t)$ increases fast and remains high after the 20th percentile of $\hat{v}_t$. By contrast, the nonlinearity of $E(\hat{v}_t \mid r_t)$ decreases on the first portion of the support of $r_t$ and increases steadily after the median. The point at which the nonlinearity of $E(\hat{v}_t \mid r_t)$ is minimized, $r^* = -0.0075$, is a good candidate for the threshold of a piecewise linear model. Figure 6c,d respectively show the strength of the exposure of $r_t$ to $\hat{v}_t$ and of $\hat{v}_t$ to $r_t$ on increasing subsamples of type $\{(x_t, y_t) : x_t \leq x_k\}$. The correlation of $r_t$ and $E(r_t \mid \hat{v}_t)$ approximately equals 0.1 on a large portion of the support of $\hat{v}_t$, while that of $\hat{v}_t$ and $E(\hat{v}_t \mid r_t)$ is on average equal to 0.4. This means that $E(\hat{v}_t \mid r_t)$ fits $\hat{v}_t$ better than $E(r_t \mid \hat{v}_t)$ fits $r_t$.
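The computation of nonlinearity on increasing segments can be sketched as follows. To keep the example self-contained, a known function, $G(x) = |x|$ on $[-1, 1]$, stands in for the spline-interpolated nonparametric estimate used in the paper. The nonlinearity measure is an assumption of this sketch (one minus the share of the non-constant part of $G$ captured by the degree-one Legendre term) and may differ from the paper's exact definition of $\kappa$; the function name `kappa_on_segment` is hypothetical.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss


def kappa_on_segment(G, a, b, n_points=64):
    """Assumed nonlinearity measure of G on [a, b] under the metrics m(x) = 1."""
    t, w = leggauss(n_points)                 # nodes and weights on [-1, 1]
    x = a + 0.5 * (b - a) * (t + 1.0)         # nodes mapped to [a, b]
    g = G(x)
    centered = g - 0.5 * np.sum(w * g)        # remove the projection on P0 = 1
    slope = 1.5 * np.sum(w * centered * t)    # coefficient on P1(t) = t
    linear_norm2 = slope**2 * (2.0 / 3.0)     # ||P1||^2 = 2/3 on [-1, 1]
    total_norm2 = np.sum(w * centered**2)
    return float(1.0 - linear_norm2 / total_norm2)


x_lo, x_hi = -1.0, 1.0
grid = x_lo + np.arange(21) * (x_hi - x_lo) / 20   # x_k for k = 0, ..., 20

# Nonlinearity on [x_lo, x_k] for k = 1, ..., 20 (k = 0 is a degenerate segment).
kappas = np.array([kappa_on_segment(np.abs, x_lo, xk) for xk in grid[1:]])
```

Here the measure stays at zero as long as the segment lies on one side of the kink at zero and rises once the segment extends past it, which is the kind of pattern used in the text to locate a candidate threshold.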
Acting on the fact that the linearity of $E(\hat{v}_t \mid r_t)$ reaches its maximum at $r^* = -0.0075$, I estimate a piecewise linear regression of $r_t$ on $\hat{v}_t$, with a separate intercept, slope and error term ($u_{1,t}$ and $u_{2,t}$) on each of the two regimes $r_t \leq r^*$ and $r_t > r^*$. Let $\hat{r}_t$ denote the fitted values obtained from this piecewise linear regression. Figure 7a shows the fitted regression lines while Figure 7b plots the linear correlation between $r_t$ and $\hat{r}_t$ against the quantiles of $\hat{v}_t$. The correlation between $r_t$ and $\hat{r}_t$ reaches 50% at the 10th percentile and remains above 75% thereafter. This strong nonlinear relationship between log-returns and log-realized volatility is completely missed by the naive linear regression of $r_t$ onto $\hat{v}_t$. The estimated regression lines are: $\hat{r}_t = -0.1622 - 0.0183\,\hat{v}_t$ for $r_t \leq r^*$ and $\hat{r}_t = 0.1022 + 0.011\,\hat{v}_t$ for $r_t > r^*$.
The estimated error variances are respectively $\mathrm{Var}(u_{1,t}) = 0.00062$ for the first regime and $\mathrm{Var}(u_{2,t}) = 0.00089$ for the second regime. Finally, the estimated nonlinear return-risk trade-offs are power functions of the expected realized variance: $E_t(R_{t+1}) = 0.8539\,E_t(RV_{t+1})^{-0.0183}$ if $r_t \leq r^*$, and $E_t(R_{t+1})$ is proportional to $E_t(RV_{t+1})^{0.011}$ if $r_t > r^*$, in line with the slopes of the two regression lines. Based on this equation, the unconditional correlation between the actual gross returns and their predictions, shown in Figure 7c, is 0.7861. A natural implication of these results is that option pricing models should at least account for the presence of regimes or parameter uncertainty in the distribution of the returns on the underlying asset. Our findings provide strong empirical support for regime switching models in which volatility influences returns (as in Duan et al. 2002), heteroskedastic mixture models or Bayesian models that naturally account for parameter uncertainty (e.g., Stentoft 2014, 2015).
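The gain from the piecewise linear specification over the naive linear regression can be illustrated on simulated data. The sketch below is synthetic throughout: the two regression lines and error variances reported in the text are reused only as simulation inputs, the distribution of the expected log-variance and the equal regime probabilities are hypothetical, and the sample is split on the observed return at the threshold $r^*$, as in the text. The resulting correlations are not a replication of the reported value of 0.7861.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 4000
r_star = -0.0075                           # threshold reported in the text

# Hypothetical expected log-variance and regime assignment.
v_hat = rng.normal(-6.7, 0.6, T)
low_regime = rng.random(T) < 0.5

# Returns generated from two lines with opposite slopes (simulation inputs).
r = np.where(
    low_regime,
    -0.1622 - 0.0183 * v_hat + rng.normal(0.0, np.sqrt(0.00062), T),
    0.1022 + 0.011 * v_hat + rng.normal(0.0, np.sqrt(0.00089), T),
)


def ols_fitted(y, x):
    """Fitted values from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta


# Naive linear regression on the full sample.
r_linear = ols_fitted(r, v_hat)

# Piecewise linear regression: one line per side of the threshold r_star.
below = r <= r_star
r_piecewise = np.empty(T)
r_piecewise[below] = ols_fitted(r[below], v_hat[below])
r_piecewise[~below] = ols_fitted(r[~below], v_hat[~below])

corr_linear = np.corrcoef(r, r_linear)[0, 1]
corr_piecewise = np.corrcoef(r, r_piecewise)[0, 1]
```

Because the two regimes have slopes of opposite sign, the pooled linear fit is nearly uncorrelated with the simulated returns, whereas the piecewise fit tracks them closely. Part of this gain is mechanical, since the sample is split on the dependent variable itself.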

Conclusions
This paper proposes an approach to measure the degree of nonlinearity of the exposure of a variable $Y$ to the movements of another variable $X$. The exposure is defined in terms of the expectation of $Y$ conditional on $X$, denoted $G(X)$. The proposed measure of nonlinearity, denoted $\kappa$, is constructed by exploiting the ratio of the norms of the linear and nonlinear parts of $G(X)$. The separation of $G(X)$ into its linear and nonlinear parts is done via its projection onto an orthogonal basis of polynomials with respect to some metrics $m(x)$. The invariance properties of $\kappa$ are studied. For cases where the exposure function $G(X)$ is unknown, an estimator $\hat{\kappa}$ of $\kappa$ that exploits a consistent nonparametric estimator $\hat{G}(X)$ is proposed. It is shown that $\hat{\kappa}$ inherits the consistency of $\hat{G}(X)$.
Three fields of application are proposed. The first application concerns the measurement of the nonlinearity of the price of a European style option with respect to the underlying asset. I find that the nonlinearity of a Call increases with the strike but decreases with the volatility and time to maturity. For a Put, the nonlinearity decreases with the strike, volatility and time to maturity. The second application attempts to motivate the use of $\kappa$ for portfolio risk management. I argue that hedge ratios are misleading in the presence of nonlinearity and, therefore, that a diagnosis of nonlinearity should be performed prior to designing the optimal hedge strategy. The third application is empirical and concerns the relationship between the return and realized volatility of the SP500 index. The empirical results provide supportive evidence that the return-risk trade-off on the SP500 is nonlinear and governed by regimes.