FEDERAL RESERVE BANK of ATLANTA WORKING PAPER SERIES
Risk, Return, and Volatility Feedback: A Bayesian Nonparametric Analysis

The authors thank conference participants at the 2012 International Conference on Computational and Financial Econometrics, the National Bureau of Economic Research-National Science Foundation’s 2013 Seminar on Bayesian Inference in Econometrics and Statistics, the Rimini Centre for Economic Analysis’s 2013 Bayesian workshop, and the 2014 workshop on applied financial time-series at HEC Montreal. They also thank seminar participants at McMaster University and University of Toronto. A previous version of this work was titled “A Bayesian Nonparametric Analysis of the Relationship between Returns and Realized Variance.” They are also grateful to Tom McCurdy, who supplied the data. The views expressed here are the authors’ and not necessarily those of the Federal Reserve Bank of Atlanta or the Federal Reserve System. Any remaining errors are the authors’ responsibility. Maheu is grateful to the Social Sciences and Humanities Research Council for financial support.


Introduction
This paper investigates the relationship between risk, return and volatility feedback by using a Bayesian nonparametric joint model for market excess returns and realized variance. A contemporaneous model for excess returns and log-realized variance is used to study volatility feedback effects which simultaneously impact prices.
The early literature found conflicting results on the sign and significance of the conditional variance from GARCH models in the conditional mean of market excess returns. A good summary of these results is found in Lettau & Ludvigson (2010). The recent literature has helped to resolve some of the issues. Scruggs (1998) and Guo & Whitelaw (2006) show that additional priced factors can affect the sign and significance of risk. Lundblad (2007) argues that longer samples are necessary to find a significant relationship between the market risk premium and expected volatility with GARCH specifications. Bandi & Perron (2008) document a long-run relationship between expected excess market returns and past market variance, while Maheu & McCurdy (2007) find that the long-run component of realized variance is priced in annual data. Ghysels et al. (2013) find a positive risk and return relationship over sample periods that exclude financial crises. Using daily data, Maheu et al. (2013) find that the conditional variance and the conditional skewness due to jumps are significantly priced. Ignoring higher-order moments may confound the evidence for a positive risk and return relation.
Despite this, little has been done to study the contemporaneous relationship between returns and realized variance without making assumptions about their distribution, or to use that relationship to investigate the role of volatility feedback. Our approach is related to Brandt & Kang (2004) and Harvey (2001) in that we jointly model returns and log-volatility contemporaneously. However, we follow the advice of Harvey (2001) and dispense with parametric assumptions for conditional expectations, modelling the conditional expectation of returns nonparametrically. Like Ludvigson & Ng (2007), we also replace conditional volatilities with observed realized variances. This provides additional flexibility in modelling the joint distribution and provides a better signal on the variance by using daily data to estimate monthly ex post variance. Harrison & Zhang (1999) use a seminonparametric approach (Gallant & Tauchen 1989) based on a Hermite polynomial expansion to estimate the conditional distribution. The leading term in the expansion is a Gaussian ARCH model for excess returns. They find a positive risk and return relation at long holding intervals of one and two years.
In contrast, we use an infinite mixture of distributions with a flexible Dirichlet process prior to jointly model excess returns and log-realized variance. From this, the nonparametric conditional distribution of returns given realized variance consists of an infinite mixture representation whose unknown probabilities and arguments depend on the value of realized variance. This allows for general types of dependence in which the conditional mean of excess returns is a function of the contemporaneous realized variance. A Dirichlet process prior is assumed for the unknown distribution governing the mixture parameters; i.e., the distribution of mixture locations and probabilities is unknown and the Dirichlet process is the prior for this unknown distribution (see Ferguson (1973) and Lo (1984)). Our work extends Muller et al. (1996) and Taddy & Kottas (2010) by using slice sampling methods (Walker 2007) to accommodate non-Gaussian data densities and nonconjugate priors. The Dirichlet process mixture (DPM) is the standard approach to Bayesian nonparametrics and performs well in practice. A Markov chain Monte Carlo (MCMC) procedure delivers posterior samples from which estimates are obtained that account for model and distributional uncertainty.
Volatility feedback refers to a causal relationship between the variance and price changes. If volatility is priced and a positive volatility shock arrives, then, all else equal, the required rate of return increases, which discounts all future cash flows at a higher rate. This results in a simultaneous drop in the current price so as to deliver a higher future return consistent with the increase in risk. The importance of volatility feedback and its relationship to risk and return is discussed in French et al. (1987) and Campbell & Hentschel (1992). Campbell & Hentschel (1992) show that volatility feedback is significant and important in finding a positive risk and return relationship. Ignoring it will tend to obscure any risk and return relationship. Moreover, volatility feedback can be an important source of return asymmetry. For instance, when good (bad) news arrives volatility increases and volatility feedback implies a drop in current prices which mutes (amplifies) the price increase (decrease). Therefore, volatility feedback amplifies the effect of bad news on prices and dampens the effect of good news. This is why it is a leading explanation for asymmetric volatility. That is, price increases from good news will be smaller than what would occur without volatility feedback, while price decreases from bad news will be steeper. Campbell & Hentschel (1992) derive their model by imposing economic restrictions that linearly relate log-returns to log-prices and log-dividends.⁴ From this the impact of different sources of return shocks can be derived. Additional papers that build on this approach and find empirical support for volatility feedback include Turner et al. (1989), Kim et al. (2004), Kim et al. (2005), Bollerslev et al. (2006) and Calvet & Fisher (2007).
Our paper differs in several important ways from the existing literature. First, while almost all the literature has studied volatility feedback from a tightly parametrized model we use a flexible approach with no economic restrictions. Second, we use realized variance which is an accurate ex post measure of the variance of returns and permits the joint modelling of returns and variance. Third, we nonparametrically model the relationship between contemporaneous excess returns and log-realized variance. Volatility feedback implies an instantaneous causal relationship between volatility innovations and price levels or returns and our contemporaneous model is designed to investigate this relationship directly. Fourth, our nonparametric approach allows for conditioning on predetermined conditioning variables.
Using a long calendar span of monthly data we find strong, robust evidence of volatility feedback. Expected excess returns are always positive when volatility shocks are small; however, they become negative once the volatility shock becomes larger. This relationship is very nonlinear and depends on the current level of expected volatility. Ignoring these dynamics will result in confounding evidence for risk and return.
Once volatility feedback is accounted for, there is an unambiguous positive relationship between expected excess returns and expected log-realized variance. This relationship is nonlinear.
Conditional quantile and contour plots support these findings and display significant deviations from the monotonic changes in the conditional distribution of the parametric model. We show that the volatility feedback effect impacts the whole distribution and not just the conditional mean.
This paper is organized as follows. The data and construction of realized variance are discussed in the next section, followed by the nonparametric model for excess market returns and log-realized variance in Section 3. Section 4 discusses estimation of the conditional distribution and conditional mean of excess returns given log-realized variance. Empirical results are found in Section 5, followed by the Conclusion.
⁴ The approximation is based on Campbell & Shiller (1988).

Return and realized variance data
Using high-frequency daily returns permits the construction of monthly realized variance, an ex post, observable variance that is the focus of our study. Realized variance has been used in empirical finance for some time (French et al. 1987), and there now exists a strong theoretical foundation for using it as an essentially nonparametric measure of ex post volatility (for recent reviews see Andersen & Benzoni (2008) and McAleer & Medeiros (2008)).
To compute realized variance, daily price data is obtained from Bill Schwert⁵ for 1885/2-1925/12, and from CRSP for 1926/1-2011/12 on the value-weighted portfolio with distributions for the S&P500. The data is converted to continuously compounded daily returns. If $r_{t,i}$ denotes the continuously compounded return for day $i$ in month $t$ then we compute realized variance according to
$$RV_t^q = \gamma_0 + 2 \sum_{j=1}^{q} \left(1 - \frac{j}{q+1}\right) \gamma_j, \qquad \gamma_j = \sum_{i=1}^{N_t - j} r_{t,i}\, r_{t,i+j}, \tag{1}$$
where $N_t$ denotes the number of daily returns in month $t$. This estimate of realized variance contains a bias adjustment of order $q$ to account for market microstructure dynamics and stale prices and follows Hansen & Lunde (2006). The Bartlett weights in (1) ensure that $RV_t^q$ is always positive. In our work we set $q = 1$ and let $RV_t \equiv RV_t^q$ for the remainder of the paper.
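The Bartlett-weighted estimator described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `realized_variance` and the toy return vector are assumptions introduced here, and the formula is the standard Bartlett-kernel form with autocovariance terms up to order q.

```python
import numpy as np

def realized_variance(daily_returns, q=1):
    """Bartlett-weighted realized variance for one month of daily returns.

    Hypothetical helper: gamma_0 is the sum of squared returns and gamma_j
    is the sum of products of returns j days apart.
    """
    r = np.asarray(daily_returns, dtype=float)
    n = r.size
    if n <= q:
        raise ValueError("need more daily returns than the adjustment order q")
    rv = np.sum(r * r)  # gamma_0
    for j in range(1, q + 1):
        gamma_j = np.sum(r[: n - j] * r[j:])         # j-th order cross products
        rv += 2.0 * (1.0 - j / (q + 1.0)) * gamma_j   # Bartlett weight 1 - j/(q+1)
    return rv

# Toy month with three daily returns (illustrative numbers only).
toy_returns = [0.01, -0.02, 0.015]
rv_q0 = realized_variance(toy_returns, q=0)  # plain sum of squares
rv_q1 = realized_variance(toy_returns, q=1)  # first-order adjustment
```

With q = 0 the function reduces to the sum of squared daily returns; with q = 1 the first-order cross products enter with weight 1/2 before doubling.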
Monthly returns are taken from the associated monthly files from Schwert and CRSP S&P500. The risk-free rate is obtained from Amit Goyal's website for 1885/2-1925/12, and after this time period the risk-free rate equals the 1 month rate from the CRSP Treasury bill file.
Our return-risk analysis dataset thus consists of monthly excess returns $r_t$ and realized variance $RV_t$ from 1885/1-2011/12 for a total of 1519 monthly observations. Returns are scaled by 12 and $RV_t$ by 144 so that our findings can be interpreted in terms of annual returns. When estimating the model we reserve the first 22 observations as conditioning variables. The information set is denoted by $I_t = \{r_1, RV_1, \ldots, r_t, RV_t\}$, for $t = 1, \ldots, T$. Table 1 reports various summary statistics for monthly excess returns and realized variance. Compared to squared returns, realized variance is less noisy. Returns standardized by realized variance are approximately normal with sample skewness of 0.003 and sample kurtosis of 2.6856. Log-realized variance is closer to being bell-shaped than the levels of $RV_t$. Figure 1 displays a scatter plot of market excess returns and $\log(RV_t)$, which is the basis of our time-series models.
⁵ For details on the construction of these data see Schwert (1990).

Nonparametric model of market excess returns and realized variance
In this section we provide some intuition on the nonparametric model we will use to specify the joint relationship between excess returns and realized variance and the implied conditional expectation of the return given realized variance. Since no theoretical reason exists for a particular parametric relationship to hold between the conditional mean and variance (Brandt & Kang 2004), we model the joint distribution nonparametrically by assuming the following infinite mixture representation for the conditional joint probability density function of excess returns and log-realized variance,
$$p(r_t, \log(RV_t) \mid I_{t-1}, \Omega, \Theta) = \sum_{j=1}^{\infty} \omega_j\, f(r_t, \log(RV_t) \mid \theta_j, I_{t-1}), \tag{2}$$
where $\Omega = (\omega_1, \omega_2, \ldots)$ and $\Theta = (\theta_1, \theta_2, \ldots)$, $\omega_j \ge 0$ for all $j$ with $\sum_{j=1}^{\infty} \omega_j = 1$, and $f(\cdot, \cdot \mid \theta_j, I_{t-1})$ is a smooth bivariate density kernel given the parameter $\theta_j$ and information set $I_{t-1}$. This is a nonparametric model in the sense that there is an infinite number of parameters. It can approximate any continuous bivariate distribution to arbitrary accuracy by selecting the appropriate weight $\omega_j$ and parameter $\theta_j$ for the $j$th cluster. To reduce the clutter from carrying around the conditional mixture arguments, $\Theta$ and $\Omega$, we drop them from $p(r_t, \log(RV_t) \mid I_{t-1}, \Omega, \Theta)$ when the meaning is clear.
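In the next section we will discuss how a model consisting of an infinite number of unknowns can be estimated with Bayesian methods, but for the moment we will consider how to obtain a nonparametric representation of the conditional distribution of excess returns that depends on log-realized variance from Eq. (2). From the mixture model above the conditional probability density function can be derived as
$$p(r_t \mid \log(RV_t), I_{t-1}) = \frac{p(r_t, \log(RV_t) \mid I_{t-1})}{p(\log(RV_t) \mid I_{t-1})} \tag{3}$$
$$= \sum_{j=1}^{\infty} q_j(\log(RV_t))\, f(r_t \mid \log(RV_t), \theta_j, I_{t-1}), \tag{4}$$
where $f(r_t \mid \log(RV_t), \theta_j, I_{t-1})$ is the conditional density kernel for the $j$th cluster and $f(\log(RV_t) \mid \theta_j, I_{t-1})$ is the associated marginal density kernel for $\log(RV_t)$. The weights of this mixture have the particular form
$$q_j(\log(RV_t)) = \frac{\omega_j\, f(\log(RV_t) \mid \theta_j, I_{t-1})}{\sum_{k=1}^{\infty} \omega_k\, f(\log(RV_t) \mid \theta_k, I_{t-1})}, \tag{5}$$
so that they sum to 1. From (5) we see that clusters that provide a better fit to $\log(RV_t)$ will receive more weight in the mixture representation (4). Varying $\log(RV_t)$ will produce smooth changes in the conditional distribution.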
Our object of interest is the conditional expectation of market excess returns given log-realized volatility. Since the expectation of a mixture is equivalent to a mixture of the expectations, the expectation of (4) is the desired conditional expectation
$$E[r_t \mid \log(RV_t), I_{t-1}] = \sum_{j=1}^{\infty} q_j(\log(RV_t))\, E[r_t \mid \log(RV_t), \theta_j, I_{t-1}], \tag{6}$$
where $E[r_t \mid \log(RV_t), \theta_j, I_{t-1}]$ is the $j$th cluster-specific conditional expectation. A plot of the conditional expectation as a function of $\log(RV_t)$ will be a smoothly changing function that weights each of the cluster-specific conditional expectations according to how the weight function $q_j(\log(RV_t))$ changes as $\log(RV_t)$ changes. This is true even if the cluster-specific expectations themselves do not depend on $\log(RV_t)$. In this way we can see the contemporaneous effect of log-volatility on the conditional mean of excess returns.
As mentioned above, volatility feedback occurs simultaneously and this specification is designed to shed light on it.
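The role of the weights in (5) and the mixture of expectations in (6) can be illustrated with a small finite mixture. The sketch below is a simplified stand-in for the paper's model: the cluster kernels (a normal marginal for log-realized variance with mean `mu_v` and variance `var_v`, and a cluster mean for returns of the form `alpha0 + alpha1 * RV`) and all numbers are assumptions made for illustration, and lagged conditioning variables are omitted.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Normal density with the given mean and variance."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def conditional_mean(log_rv, weights, mu_v, var_v, alpha0, alpha1):
    """E[r | log RV] for a finite mixture with J clusters.

    weights, mu_v, var_v, alpha0, alpha1 are length-J arrays (hypothetical
    cluster parameters); log_rv is a scalar conditioning value.
    """
    marginal = normal_pdf(log_rv, mu_v, var_v)           # f(log RV | theta_j)
    q = weights * marginal
    q = q / q.sum()                                      # weights sum to 1
    cluster_means = alpha0 + alpha1 * np.exp(log_rv)     # E[r | log RV, theta_j]
    return float(q @ cluster_means), q

# Two illustrative clusters: a low-volatility one with positive mean return
# and a high-volatility one with negative mean return.
weights = np.array([0.5, 0.5])
mu_v = np.array([-3.0, -1.0])
var_v = np.array([0.25, 0.25])
alpha0 = np.array([0.1, -0.2])
alpha1 = np.array([0.0, 0.0])  # cluster means do not depend on RV here

mean_low, q_low = conditional_mean(-3.0, weights, mu_v, var_v, alpha0, alpha1)
mean_mid, q_mid = conditional_mean(-2.0, weights, mu_v, var_v, alpha0, alpha1)
mean_high, q_high = conditional_mean(-1.0, weights, mu_v, var_v, alpha0, alpha1)
```

In this toy setting each cluster mean is constant, yet the mixture's conditional mean moves from about 0.1 to about -0.2 as log-realized variance rises, purely because the weights shift toward the cluster that fits the conditioning value better.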

A Bayesian model
The Dirichlet process prior has a long history, beginning with Ferguson (1973), of use in Bayesian nonparametric problems. It was used as a prior in countably infinite mixtures for density estimation in Ferguson (1983) and Lo (1984), but applications were limited until modern computational techniques became available. The seminal paper by Escobar & West (1995) shows how to perform Bayesian nonparametric density estimation with Gibbs sampling.
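Our approach is similar in that we place a Dirichlet process prior on $\omega_j$ and $\theta_j$. In direct analogy to (2), according to Sethuraman (1994), the model can be represented as
$$p(r_t, \log(RV_t) \mid I_{t-1}, \Omega, \Theta) = \sum_{j=1}^{\infty} \omega_j\, f(r_t, \log(RV_t) \mid \theta_j, I_{t-1}), \tag{7}$$
where
$$\omega_1 = v_1, \qquad \omega_j = v_j \prod_{l=1}^{j-1} (1 - v_l), \quad j > 1, \tag{8}$$
$$v_j \overset{iid}{\sim} \mathrm{Beta}(1, \kappa), \qquad \theta_j \overset{iid}{\sim} G_0. \tag{9}$$
The weights are generated by a stick-breaking process since the unit interval is successively broken into smaller pieces by random draws from the beta distribution. Each cluster has a unique parameter $\theta_j$ independently drawn from the base distribution $G_0$.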
The positive scalar κ controls the dispersion of the unit mass over the set Ω. A small value will put most of the weight on a few clusters while larger values will spread the weight over many clusters.
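The effect of the precision parameter on the stick-breaking weights can be checked by simulation. The following is a minimal sketch that truncates the infinite sequence at a fixed number of clusters; the function name `stick_breaking` and the truncation length are assumptions made for illustration.

```python
import numpy as np

def stick_breaking(kappa, n_clusters, rng):
    """First n_clusters stick-breaking weights with v_j ~ Beta(1, kappa)."""
    v = rng.beta(1.0, kappa, size=n_clusters)
    # Length of stick remaining before each break: 1, (1-v_1), (1-v_1)(1-v_2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

rng = np.random.default_rng(0)
w_small = stick_breaking(0.2, 50, rng)  # small kappa: mass on a few clusters
w_large = stick_breaking(5.0, 50, rng)  # larger kappa: mass spread more widely
```

Because the sequence is truncated, the returned weights sum to slightly less than one; the shortfall is the mass left on the clusters beyond the truncation point.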
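Another representation of (7)-(9) is in terms of the hierarchical model
$$(r_t, \log(RV_t)) \mid \theta_t^*, I_{t-1} \sim f(r_t, \log(RV_t) \mid \theta_t^*, I_{t-1}), \tag{10}$$
$$\theta_t^* \mid G \sim G, \tag{11}$$
$$G \mid G_0, \kappa \sim DP(G_0, \kappa), \tag{12}$$
where the distribution of the mixture parameters, $G$, is unknown and modeled with the Dirichlet process distribution, $DP(G_0, \kappa)$, with precision parameter $\kappa$ and base distribution $G_0$. The key quantity of interest is $G$, the unknown distribution of $\theta_t^*$. Given the stick-breaking definition of the Dirichlet process in (8)-(9), the prior distribution for $G$ is
$$G(\cdot) = \sum_{j=1}^{\infty} \omega_j\, \delta_{\theta_j}(\cdot), \tag{13}$$
where $\delta_{\theta_j}(\cdot)$ denotes a point mass at $\theta_j$ and $\omega_j$ and $\theta_j$ are defined above. Hence, $G$ will almost surely be a discrete distribution, which means the $\theta_t^*$'s will contain repeats over $t = 1, \ldots, T$. This clustering feature is one of the reasons the DP prior is so attractive.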
Several data observations can share the same mixture parameter vector. A set of $\theta_t^*$'s can all have the same unique mixing parameter $\theta_{s_t}$, where $s_t = j$ when $\theta_t^* = \theta_j$. The DP prior is centered around $G_0$ in the sense that for any measurable set $A$,
$$E[G(A)] = G_0(A), \qquad \mathrm{Var}[G(A)] = \frac{G_0(A)\,[1 - G_0(A)]}{\kappa + 1}. \tag{14}$$
As $\kappa \to \infty$ the variance vanishes and $G$ concentrates on $G_0$. In this case, we have a mixture model with mixing measure $G_0$. On the other hand, as $\kappa \to 0$ the mixture model is lost and is replaced by a parametric specification with parameter $\theta$ which has a fixed prior $G_0$.
Based on theoretical considerations (Andersen et al. 2003), the empirical distribution of $\log(RV_t)$ being bell-shaped, and standardized excess returns being approximately normally distributed, we assume the joint kernel density, $f$, in (10) factors into the conditional normal density of $r_t$ given $\log(RV_t)$ in (15) and the marginal normal density of $\log(RV_t)$ in (16), where $f_N(\cdot \mid \mu, \sigma^2)$ denotes the normal density kernel centered at $\mu$ with variance $\sigma^2$. Note that although both excess returns and log-realized variance follow conditional normal distributions, their joint distribution is non-Gaussian since $RV_t$ enters the variance of $r_t$. All of the elements in $\theta$ are permitted to differ over each cluster, providing the maximum flexibility in modelling. For excess returns, $RV_t$ can impact the conditional mean and the variance. Note that under certain conditions $RV_t$ will be an unbiased estimate of the variance of returns, but we allow for deviations that are captured by $\eta_1$ in the mixture model. The specification for $\log(RV_t)$ is along the lines of the models in Andersen et al. (2007), Corsi (2009) and the joint models of Maheu & McCurdy (2007), adapted to monthly data. It features a 6 month component to capture persistence beyond one month as well as asymmetric terms from lagged returns.

Posterior simulation
To sample the posterior density of this model we will exploit the mixture representation in (7) and a slice sampler based on Walker (2007), Kalli et al. (2011) and Papaspiliopoulos (2008). This Markov chain Monte Carlo (MCMC) approach introduces a random auxiliary latent variable, $u_t \in (0, 1)$, that slices away any mixture clusters with a weight $\omega_j$ less than $u_t$. In this way the infinite mixture model is reduced to a finite mixture.
Introducing the latent variable $u_t$, we define the joint conditional density of the observed variables $(r_t, \log(RV_t))$ and $u_t$ as
$$p(r_t, \log(RV_t), u_t \mid \Omega, \Theta, I_{t-1}) = \sum_{j=1}^{\infty} \mathbf{1}(u_t < \omega_j)\, f(r_t, \log(RV_t) \mid \theta_j, I_{t-1}). \tag{17}$$
This infinite mixture is truncated to only include alive clusters with $u_t < \omega_j$, while dead clusters have a weight of 0 and can be ignored. If $u_t$ has a uniform distribution then integration of $p(r_t, \log(RV_t), u_t \mid \Omega, \Theta, I_{t-1})$ with respect to $u_t$ gives back the original model $p(r_t, \log(RV_t) \mid \Omega, \Theta, I_{t-1})$. On the other hand, the marginal density of $u_t$ is $\sum_{j=1}^{\infty} \mathbf{1}(u_t < \omega_j)$.
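Let $s_t = j$ assign observation $(r_t, \log(RV_t))$ to the data density with parameter $\theta_j$. We will augment the parameter space to include estimation of $S = (s_1, \ldots, s_T)$. Let $U = (u_1, \ldots, u_T)$, $\Omega_K = (\omega_1, \ldots, \omega_K)$ and $\Theta_K = (\theta_1, \ldots, \theta_K)$. The full likelihood is
$$p(r, RV, U, S \mid \Omega_K, \Theta_K) = \prod_{t=1}^{T} \mathbf{1}(u_t < \omega_{s_t})\, f(r_t, \log(RV_t) \mid \theta_{s_t}, I_{t-1}) \tag{18}$$
and the joint posterior is
$$p(\Omega_K, \Theta_K, S, U \mid r, RV) \propto p(\Omega_K) \left[\prod_{j=1}^{K} p(\theta_j)\right] \prod_{t=1}^{T} \mathbf{1}(u_t < \omega_{s_t})\, f(r_t, \log(RV_t) \mid \theta_{s_t}, I_{t-1}), \tag{19}$$
where $K$ is the smallest natural number that satisfies the condition $\sum_{j=1}^{K} \omega_j > 1 - \min\{U\}$. This value of $K$ ensures that there are no $\omega_k > u_t$ for $k > K$. In other words, we have the set of all clusters that are alive, $\{j : u_t < \omega_j\}$. Posterior simulation cycles through the following steps.
1. Sample $\theta_j$ for $j = 1, \ldots, K$ from its conditional posterior, which is proportional to $g_0(\theta_j) \prod_{t: s_t = j} f(r_t, \log(RV_t) \mid \theta_j, I_{t-1})$.
2. Sample $v_j \mid S \sim \mathrm{Beta}\left(1 + n_j,\ \kappa + \sum_{l > j} n_l\right)$ for $j = 1, \ldots, K$, where $n_j$ is the number of observations with $s_t = j$.
3. Sample $u_t \mid \Omega_K, S \sim U(0, \omega_{s_t})$ for $t = 1, \ldots, T$.
4. Find the smallest $K$ such that $\sum_{j=1}^{K} \omega_j > 1 - \min\{U\}$.
5. Sample $s_t$ for $t = 1, \ldots, T$ with $P(s_t = j) \propto \mathbf{1}(u_t < \omega_j)\, f(r_t, \log(RV_t) \mid \theta_j, I_{t-1})$, $j = 1, \ldots, K$.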
The first step depends on the model and the density $g_0(\cdot)$ of the DP prior's base measure, $G_0$. For the kernel densities in (15)-(16), specifying a normal prior for the regression coefficients and an independent inverse gamma prior for the variance, in other words, defining $G_0 \equiv N(b, V) \times G(v/2, s/2)$, we can employ standard Gibbs sampling techniques in Step 1 (see Greenberg (2013) for details on the exact form of these conditional distributions).
Step 2 results from the conjugacy of the generalized Dirichlet distribution and multinomial sampling (Ishwaran & James 2001). Given $\Omega_K$ and $S$, each $u_t$ is uniformly distributed on $(0, \omega_{s_t})$. The next step updates the truncation parameter $K$. If $K$ is incremented, Step 4 will also involve drawing additional $\omega_j$ and $\theta_j$ from the DP prior. The final step is a multinomial draw of the cluster assignment variable $s_t$ based on a mixture with equal weights.
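The slice and truncation steps described above can be sketched as follows. This is an illustrative fragment rather than the full sampler: the function names, the fixed weight vector, and the cluster assignments are assumptions made for the example, and the updates of the cluster parameters and stick-breaking draws are omitted.

```python
import numpy as np

def draw_slices(weights, s, rng):
    """Draw u_t uniformly on (0, w_{s_t}) for every observation t."""
    return rng.uniform(0.0, weights[s])

def truncation_level(weights, u):
    """Smallest K with sum_{j<=K} w_j > 1 - min(u).

    Assumes enough weights have already been generated; otherwise more
    stick-breaking draws would be needed before truncating.
    """
    target = 1.0 - np.min(u)
    hits = np.nonzero(np.cumsum(weights) > target)[0]
    if hits.size == 0:
        raise ValueError("extend the stick-breaking weights before truncating")
    return int(hits[0]) + 1

def alive_clusters(weights, u_t, K):
    """Indices (0-based) of clusters j <= K with u_t < w_j."""
    return np.nonzero(weights[:K] > u_t)[0]

# Illustrative weights and slice variables (not estimates from the paper).
weights = np.array([0.6, 0.3, 0.06, 0.03, 0.01])
u = np.array([0.25, 0.05, 0.5])
K = truncation_level(weights, u)           # 3 for these numbers
alive_first = alive_clusters(weights, u[0], K)
```

For the observation with the smallest slice variable every cluster beyond K has weight below that slice, so no alive cluster is lost by truncating at K.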
Repeating all these steps forms one iteration of the sampler. The MCMC sampler yields the following set of variables at each iteration $i$,
$$\{(\theta_{i,j}, v_{i,j}),\ j = 1, 2, \ldots, K_i;\ (s_{i,t}, u_{i,t}),\ t = 1, \ldots, T\}. \tag{20}$$
Note that $v_{i,j}$, $j = 1, 2, \ldots, K_i$, implies $\omega_{i,j}$, $j = 1, 2, \ldots, K_i$, through (8). After dropping the burn-in phase from the above sampler, we collect $i = 1, \ldots, N$ samples. Each iteration of the algorithm produces a draw of the unknown mixing distribution $G$ from its posterior $[G \mid r, RV]$ as
$$G_i = \sum_{j=1}^{K_i} \omega_{i,j}\, \delta_{\theta_{i,j}} + \left(1 - \sum_{j=1}^{K_i} \omega_{i,j}\right) G_0. \tag{21}$$
We will make use of this to form the predictive density and the conditional expectations based on it.

Nonparametric conditional density estimation
To flexibly estimate the conditional density $p(r_t \mid \log(RV_t), I_{t-1})$ found in (4), or the conditional mean in (6), we use the method of Muller et al. (1996). This is an elegant approach to nonparametric estimation that allows the conditional density and expectation of excess returns to depend on covariates, in this case $\log(RV_t)$. The method requires the joint modelling of the dependent variable and its covariates and uses well-known estimation methods for Dirichlet process mixture models. We extend Muller et al. (1996) to the slice sampler to accommodate the non-Gaussian data densities and nonconjugate priors found in our nonparametric model of market excess returns and realized variances. Based on the previous section, and given $G_i$, the $i$th realization from the posterior of the joint conditional predictive density for the generic return, log-realized variance combination, $(r, \log(RV))$, is
$$p(r, \log(RV) \mid G_i, I_{t-1}) = \int f(r, \log(RV) \mid \theta, I_{t-1})\, G_i(d\theta). \tag{22}$$
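Substituting in the stick-breaking representation for $G_i$ found in (21), the posterior draw of the predictive density has the equivalent representation
$$p(r, \log(RV) \mid G_i, I_{t-1}) = \sum_{j=1}^{K_i} \omega_{i,j}\, f(r, \log(RV) \mid \theta_{i,j}, I_{t-1}) + \left(1 - \sum_{j=1}^{K_i} \omega_{i,j}\right) p(r, \log(RV) \mid G_0, I_{t-1}), \tag{23}$$
where $p(r, \log(RV) \mid G_0, I_{t-1}) = \int f(r, \log(RV) \mid \theta, I_{t-1})\, G_0(d\theta)$ is the expectation of (10).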
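To integrate out the uncertainty associated with $G$, one averages (23) over the posterior draws,
$$p(r, \log(RV) \mid I_{t-1}) \approx \frac{1}{N} \sum_{i=1}^{N} p(r, \log(RV) \mid G_i, I_{t-1}). \tag{24}$$
Now, the predictive density of $r$ given $\log(RV)$ can be estimated as well. For each $G_i$,
$$p(r \mid \log(RV), G_i, I_{t-1}) = \sum_{j=1}^{K_i} q_{i,j}(\log(RV))\, f(r \mid \log(RV), \theta_{i,j}, I_{t-1}) + q_{i,0}(\log(RV))\, f(r \mid \log(RV), G_0, I_{t-1}), \tag{25}$$
where $f(r \mid \log(RV), \theta_{i,j}, I_{t-1})$ is the conditional density of (15), $f(\log(RV) \mid \theta_{i,j}, I_{t-1})$ is the marginal density of (16) and
$$q_{i,j}(\log(RV)) = \frac{\omega_{i,j}\, f(\log(RV) \mid \theta_{i,j}, I_{t-1})}{D_i(\log(RV))}, \qquad q_{i,0}(\log(RV)) = \frac{\left(1 - \sum_{k=1}^{K_i} \omega_{i,k}\right) f(\log(RV) \mid G_0, I_{t-1})}{D_i(\log(RV))}, \tag{26}$$
with $D_i(\log(RV)) = \sum_{k=1}^{K_i} \omega_{i,k}\, f(\log(RV) \mid \theta_{i,k}, I_{t-1}) + \left(1 - \sum_{k=1}^{K_i} \omega_{i,k}\right) f(\log(RV) \mid G_0, I_{t-1})$. The denominator of $q_{i,j}(\log(RV))$ is the marginal of (23) obtained by integrating out $r$. Here $f(\log(RV) \mid \theta_{i,j}, I_{t-1})$ is the marginal data density of $\log(RV)$ for the $j$th cluster with the marginal cluster parameter $\theta_j$, and $f(\log(RV) \mid G_0, I_{t-1})$ is the marginal data density with mixing over the base measure. The terms in (25) and (26) involving $G_0$ are defined as follows
$$f(r \mid \log(RV), G_0, I_{t-1}) = \frac{\int f(r, \log(RV) \mid \theta, I_{t-1})\, G_0(d\theta)}{f(\log(RV) \mid G_0, I_{t-1})}, \tag{27}$$
$$f(\log(RV) \mid G_0, I_{t-1}) = \int f(\log(RV) \mid \theta, I_{t-1})\, G_0(d\theta). \tag{28}$$
Assuming the marginal data density $f(\log(RV) \mid \theta, I_{t-1})$ is available in analytic form, both of these expressions can be approximated by the usual simulation methods, averaging over draws of $\theta$ from $G_0$. Averaging (25) over the posterior draws then gives the predictive conditional density
$$p(r \mid \log(RV), I_{t-1}) \approx \frac{1}{N} \sum_{i=1}^{N} p(r \mid \log(RV), G_i, I_{t-1}). \tag{29}$$
Using this result, features of the conditional distribution such as conditional quantiles can be derived.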

Nonparametric conditional mean estimation
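Our focus will be on the conditional expectation, which can be estimated from these results. First, the conditional expectation of $r$ given $\log(RV)$, $G_i$ and the information set $I_{t-1}$ is
$$E[r \mid \log(RV), G_i, I_{t-1}] = \sum_{j=1}^{K_i} q_{i,j}(\log(RV))\, E[r \mid \log(RV), \theta_{i,j}, I_{t-1}] + q_{i,0}(\log(RV))\, E[r \mid \log(RV), G_0, I_{t-1}], \tag{30}$$
where $q_{i,0}$ is the weight on the base-measure term and $E[r \mid \log(RV), G_0, I_{t-1}]$ is taken with respect to (27). Note that this final term is only a function of $G_0$ and can be computed once, at the start of estimation, for a grid of values of $\log(RV_t)$. It is estimated as⁹
$$E[r \mid \log(RV), G_0, I_{t-1}] \approx \frac{\sum_{i=1}^{M} E[r \mid \log(RV), \theta^{(i)}, I_{t-1}]\, f(\log(RV) \mid \theta^{(i)}, I_{t-1})}{\sum_{i=1}^{M} f(\log(RV) \mid \theta^{(i)}, I_{t-1})}, \tag{31}$$
for $\theta^{(i)} \sim G_0$, $i = 1, \ldots, M$.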
Given $G_i$, equation (30) shows the conditional expectation of $r$ is a convex combination of cluster-specific conditional expectations $E[r \mid \log(RV), \theta_j, I_{t-1}]$, $j = 1, \ldots, K_i$, along with the expectation taken with respect to the base measure $G_0$. The weighting function changes with the conditioning variable $\log(RV)$, and it differs for each posterior draw $G_i$ and information set $I_{t-1}$. Finally, with this we can obtain the posterior predictive conditional mean estimate by averaging over (30) as follows
$$E[r \mid \log(RV), I_{t-1}] \approx \frac{1}{N} \sum_{i=1}^{N} E[r \mid \log(RV), G_i, I_{t-1}], \tag{32}$$
to integrate out uncertainty concerning $G$.¹⁰ Pointwise density intervals of the conditional mean can be estimated from the quantiles of $E[r \mid \log(RV), G_i, I_{t-1}]$.
We evaluate the predictive conditional mean for a grid of values over log(RV ). This will produce a smooth curve and we will have a unique curve for each information set I t−1 in our sample t = 1, . . . , T .
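The averaging and interval steps can be illustrated with a short sketch. It assumes the per-draw conditional means have already been evaluated on a grid and stored as an array with one row per posterior draw; the function name `summarize_curves` and the synthetic input are hypothetical, and equal-tailed quantiles stand in for the density intervals.

```python
import numpy as np

def summarize_curves(curves, level=0.9):
    """Posterior mean curve and pointwise equal-tailed interval.

    curves: array of shape (N draws, grid points), where row i holds
    E[r | log RV, G_i] evaluated on the grid (assumed precomputed).
    """
    curves = np.asarray(curves, dtype=float)
    tail = (1.0 - level) / 2.0
    mean = curves.mean(axis=0)                                   # average over draws
    lower, upper = np.quantile(curves, [tail, 1.0 - tail], axis=0)
    return mean, lower, upper

# Synthetic example: 11 draws that shift a common curve up and down.
grid = np.linspace(-4.0, 2.0, 5)
base = np.arange(5.0)
synthetic = base[None, :] + np.arange(-5, 6)[:, None]
mean, lower, upper = summarize_curves(synthetic, level=0.9)
```

In an application the rows would come from evaluating the cluster-weighted expectation for each retained MCMC draw, once per information set.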

Results
For our analysis we specify the following priors. The base measure $G_0$ contains priors for each regression parameter in (15) and (16) as independent $N(0, 1)$, while $\eta_1^{-2} \sim G(5/2, 5/2)$ and $\eta_2^{-2} \sim G(6/2, 3/2)$, where $G(a, b)$ denotes a gamma distribution with mean $a/b$. Note that we expect $\eta_1^2$ to be close to 1 and the prior reflects this with $E[\eta_1^{-2}] = 1$ but allows for deviations from this. The precision parameter of the Dirichlet process is estimated and has a prior $G(2, 10)$. Each cluster contains 9 parameters in $\theta_j$.
⁹ This result makes use of expressing the numerator as $\int\!\!\int x\, p(x, y \mid \theta)\, p(\theta)\, d\theta\, dx = \int\!\!\int x\, p(x \mid y, \theta)\, p(y \mid \theta)\, p(\theta)\, d\theta\, dx = \int E[x \mid y, \theta]\, p(y \mid \theta)\, p(\theta)\, d\theta$.
¹⁰ Note that the quantity $E[r_t \mid \log(RV_t), I_{t-1}]$ in (6) assumes parameters are known. In our case they need to be estimated by the posterior density using the full sample of data $r$, $RV$. Therefore our estimate implicitly conditions on the observed $r$ and $RV$ in $E[r \mid \log(RV), I_{t-1}]$.
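Because the gamma distribution here is parameterized so that its mean is a/b, the second argument acts as a rate. The sketch below checks the implied prior means by simulation; the helper name `draw_precision` and the sample size are assumptions, and NumPy's sampler is called with a scale equal to the reciprocal of the rate.

```python
import numpy as np

def draw_precision(a, b, size, rng):
    """Draw from a gamma distribution with shape a and rate b (mean a / b)."""
    # NumPy's gamma sampler takes a scale argument, the reciprocal of the rate.
    return rng.gamma(shape=a, scale=1.0 / b, size=size)

rng = np.random.default_rng(0)
prec1 = draw_precision(5 / 2, 5 / 2, 200_000, rng)  # prior on the first precision, mean 1
prec2 = draw_precision(6 / 2, 3 / 2, 200_000, rng)  # prior on the second precision, mean 2
kappa = draw_precision(2.0, 10.0, 200_000, rng)     # prior on the DP precision, mean 0.2
```

Passing the rate directly as the scale would instead give means of 6.25, 4.5 and 20, so the distinction matters when reproducing the priors.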
We use 5000 initial iterations of the posterior sampler for burn-in and then collect the following 20000 for posterior inference. The Markov chain mixes well and the posterior mean (0.95 density interval) for $\kappa$ is 0.2046 (0.0439, 0.4831), and the posterior mean (0.95 density interval) for the number of alive clusters is 2.6 (2, 4). In other words, about 2.6 components are used to fit the joint model of $r_t$ and $\log(RV_t)$.
Before we turn to the nonparametric estimates, a parametric version of the model is reported in Table 2. This is a one-state model. The coefficient $\alpha_1$ on $RV_t$ in the excess return equation is significantly negative. $\eta_1^2$ is close to 1 and indicates no systematic bias in $RV_t$. The estimates of $\gamma_1$ and $\gamma_2$ indicate persistence in $\log(RV_t)$. The lagged standardized excess return terms entering the log-volatility equation show asymmetry.
A negative return shock results in a larger conditional mean for log-volatility next period compared to a positive shock.
The conditional expectation of excess returns given log-realized variance is computed for a grid of 100 values over -4.0 to 2.0. We interpolate with a straight line between each of the grid values to approximate the smooth curve $E[r \mid \log(RV), I_{t-1}, r, RV]$ as a function of $\log(RV)$. This is done for every $I_{t-1}$ in our sample. Figure 2 displays the relationship between expected excess returns and $\log(RV)$ for the parametric model. Although the model specifies a linear relation between excess returns and $RV$, this yields a nonlinear relation with $\log(RV)$. Note that this curve holds for every $I_t$ in our sample and is not affected by low or high volatility periods.
In contrast, Figure 3 displays the conditional expectation of excess returns as a function of log-realized variance for every information set in our dataset for the nonparametric model. Overall there is a general increase in the conditional mean of excess returns as log-realized variance increases from low levels, up to a point beyond which expected returns become negative. This is a general pattern that is found in all the plots. However, the point of decrease in the conditional expectation differs. It is clear that averaging over these expectations could produce either a positive or a negative value for expected excess returns. To really see what is happening we need to consider the conditional expectation and the innovation of log-volatility as well.
To better understand what is happening we isolate three typical periods of low, average and high volatility from our sample and report the conditional expectations in Figures 4-6. Each figure contains the conditional expectation of market excess returns given a range of log-realized variance values as well as the conditional expectation of log-realized variance (blue) and the resulting realized value of log-realized variance (black). Pointwise 0.9 probability density intervals are included for the expected excess return.
Recall the discussion of volatility feedback in Section 1. Only if the log-variance happens to occur on the expected value of log-realized variance is the volatility feedback effect 0.
Values of log-realized variance above (below) the expected value are positive (negative) shocks to volatility. As was discussed, this will have a simultaneous impact on current prices, if volatility risk is priced, and result in a decrease (increase) in prices.
This is exactly what these figures show for a positive volatility shock. For instance, consider Figure 4, which conditions on the low volatility information set, $I_{1964:10}$. Expected log-realized variance is −3.158. The expected excess return is positive before and briefly after this value but eventually becomes negative. Before −3.158 there is a gentle increase in the expectation of $r$ but after it there is a strong decrease to negative values. In other words, if the volatility shock is positive and sufficiently large we expect a contemporaneous decrease in prices from volatility feedback. Figure 5 displays a similar pattern for an average value of volatility when we set the information equal to $I_{1996:2}$. Here expected $\log(RV)$ is −2.117. As before, the expected excess return is positive before this value, remains positive just to the right of it, but eventually becomes negative. If the log-volatility shock is sufficiently large (about +0.68) then the expected excess return is negative and decreases as the shock increases. For this information set the realized $\log(RV)$ was −1.43. Finally, notice that the whole posterior curve of $E[r \mid \log(RV), I_{1996:2}, r, RV]$ has shifted rightward as the expected $\log(RV)$ has increased from Figure 4 to 5 (low to average $\log(RV)$). This suggests an increase in compensation for the higher perceived risk.
A high volatility period corresponding to the information set $I_{2008:12}$ is found in Figure 6. Just as before, $E[r \mid \log(RV), I_{2008:12}, r, RV]$ is essentially linear and flat as $\log(RV)$ is increased prior to $E[\log(RV) \mid I_{2008:12}, r, RV]$, but after this point the expectation of excess returns becomes negative. This is consistent with a volatility feedback effect. Note that across the three figures the effect of volatility feedback on returns appears to be stronger, with an increased slope, in moving from expected low to average to high volatility periods. Comparing the conditional expectations for $I_{1964:10}$, $I_{1996:2}$ and $I_{2008:12}$, as $E[\log(RV) \mid I_{t-1}, r, RV]$ increases the conditional expectation of excess returns shifts rightward and up. This is consistent with a positive and increasing reward for bearing higher levels of risk.
In summary, we find a robust volatility feedback effect which is most notable for positive shocks to volatility. Expected excess returns are positive below $E[\log(RV) \mid I_{t-1}, r, RV]$ but after this value eventually become negative. This suggests that risk is priced and the previous figure was consistent with this.

Risk and return
To focus on risk and return we need to account for the volatility feedback effect. We do so by evaluating the conditional expectation of excess returns at the expected value of log-realized variance, the point at which volatility feedback is zero given the information set $I_{t-1}$. The relationship is unambiguously positive and increasing in $\log(RV)$, which accords with theory. The relationship is nonlinear. It is approximately linear for small values of log-volatility but increases sharply as expected log-volatility surpasses 0.
In contrast to Campbell & Hentschel (1992) and the subsequent literature on volatility feedback, we find evidence of a positive risk and return relationship and a volatility feedback effect without imposing any economic restrictions. The key is flexibly modelling the contemporaneous distribution of market excess returns and log-realized variance and accounting for the volatility shock. As $\log(RV)$ increases and the volatility shock becomes larger, most of the mass in each conditional density is over a negative range of excess returns. Here the investor is likely to have a loss from investing in the market. Volatility feedback has an impact on the whole distribution and not just the conditional mean. The changes in the density, as $\log(RV)$ increases, are non-monotonic. In Figures 10 and 11 there is an increase in the spread of the density followed by a decrease and a final increase. The point of these changes in the conditional density is to the right of the conditional mean of $\log(RV_t)$.

Conditional quantiles and contour plots
The parametric quantile plot is inconsistent with these features.
Contour plots of the conditional predictive density for $(r, \log(RV))$, over the selected time periods discussed above, are found in the accompanying figures. In contrast to the parametric estimate in Table 2, $\alpha_1$, the coefficient on $RV_t$, is negative and positive over different time periods. The variability of the parameters in the figures is well beyond the 0.95 density intervals for the parametric model reported in Table 2. Although the parametric model estimate of $\eta_1^2$ is about 1, the nonparametric estimate varies between 0.4 and 0.85. This is due to the significantly improved fit that the nonparametric model offers in the conditional mean, which contributes to a lower innovation variance.

Parameter estimates and robustness
Our results are robust to changes in the priors and the model for the data density.
For instance, we obtain the same qualitative results for $E[r \mid \log(RV), I_{t-1}, r, RV]$ if we omit $RV_t$ ($\alpha_1 = 0$) in (15)

Conclusion
This paper nonparametrically models the contemporaneous relationship between market excess returns and realized variances. An infinite mixture of distributions is given a flexible Dirichlet process prior. From this the nonparametric conditional distribution of returns given realized variance consists of an infinite mixture representation whose probabilities and arguments depend on the value of realized variance. This allows for a smooth nonlinear relationship between the conditional mean of market excess returns and realized variance. The model is estimated with MCMC techniques based on slice sampling methods that extend the posterior sampling methods in the literature.
Applied to a long span of monthly data we find strong, robust evidence of volatility feedback. Once volatility feedback is accounted for, there is an unambiguous positive relationship between expected excess returns and expected log-realized variance. In contrast to the existing literature, we find evidence of a positive risk and return relationship and a volatility feedback effect without imposing any economic restrictions. We show that the volatility feedback impacts the whole distribution and not just the conditional mean.
Figure 5: Expected excess returns given log-realized variance, conditional on regressors in the information set $I_{t-1}$, $t$ = 1996:2, which is an average volatility period. The expected log-realized volatility based on the model is blue while the actual log-realized volatility for $t$ = 1996:2 is the black vertical line.