American Option Pricing with Importance Sampling and Shifted Regressions

Abstract: This paper proposes a new method for pricing American options that uses importance sampling to reduce estimator bias and variance in simulation-and-regression based methods. Our suggested method uses regressions under the importance measure directly, instead of under the nominal measure as is the standard, to determine the optimal early exercise strategy. Our numerical results show that this method successfully reduces the bias plaguing the standard importance sampling method across a wide range of moneyness and maturities, with negligible change to estimator variance. When a low number of paths is used, our method always improves on the standard method and reduces average root mean squared error of estimated option prices by 22.5%.


Introduction
As its name suggests, the least-squares Monte Carlo (LSM) method of Longstaff and Schwartz (2001) employs least-squares regressions to estimate an optimal exercise strategy when pricing American-style options. Unlike deterministic methods such as multinomial trees or finite differences, regression-based estimators like the LSM are well adapted to multivariate settings and allow for great flexibility in modeling the exercise strategy. The estimated exercise strategy is a core element of the LSM procedure, and one of the main obstacles to implementing an accurate LSM estimator is the bias and variance introduced by a suboptimal exercise strategy. To compute accurate LSM estimators efficiently, a simple and effective solution is to employ Monte Carlo variance reduction tools.
One such variance reduction method is importance sampling. The objectives of importance sampling for European option pricing are twofold. First, it reduces the frequency of null payoffs that do not contribute to the estimation of the option premium. The same idea underlies stratified sampling, for instance, where the optimal sampling weights are proportional to the variance of the discounted payoffs within each stratum, so that no effort is spent on sampling unexercised paths. Second, importance sampling reduces the variance of the estimator by generating more payoff events, which are otherwise poorly represented in a small sample drawn from the nominal distribution. Importance sampling thereby reduces the number of simulations needed to obtain a balanced sample that includes some of the outliers that play an important role in the computation of an expectation. For the specific problem of pricing American options, a third objective can be formulated: importance sampling should improve the accuracy of the exercise strategy and, hence, reduce the estimator bias resulting from incorrect exercise decisions. This paper demonstrates that the stability of the LSM algorithm is improved when regression coefficients are estimated from the same importance distribution that is used to simulate option prices. Regressing discounted option cashflows on shifted paths improves continuation value predictions where the paucity of data under the nominal measure otherwise impedes the polynomial approximation. Indeed, when too few nonzero cashflows are simulated, the strategy obtained from a standard LSM approach leads to significantly low-biased prices. Our proposed method corrects this bias. The benefits of our approach are particularly noticeable when only a low number of simulation paths is available.
Hence, the Monte Carlo valuation of deep out of the money options, knock-in options or barrier options with American-style features may greatly benefit from the approach introduced in this paper.
A crucial aspect of importance sampling lies in the choice of importance measure. For American option pricing with the LSM algorithm, Moreni (2003) considers a uniform change in the drift of the simulated geometric Brownian motion (GBM) paths of the Black and Scholes (1973) model. This method uses the standard continuation values estimated under the nominal measure to determine stopping times for paths simulated under the importance measure using the same normal increments to generate paths under both measures. Another suggestion by Moreni (2004) is simply to implement the LSM algorithm under the nominal measure, and apply a uniform change of drift to the discounted option payoffs at the end of the algorithm after stopping times are estimated using paths under the nominal measure. These American option pricing algorithms successfully achieve the two aforementioned variance reduction objectives. First, they increase the frequency of non-zero payoffs. Second, they generate more rare events by redirecting paths deeper in the money. However, they only achieve modest improvements on the third objective.
It is in this spirit that we introduce the new LSM-s estimator, where the "s" stands for "shifted", in which the shifted paths are used in the regressions. To do so, the LSM-s algorithm uses stepwise likelihood ratios to weight the discounted cashflows in the cross-sectional least squares regressions. The LSM-s offers several advantages over the standard LSM. First, the LSM-s has significantly reduced bias and variance compared to the standard LSM. Second, the LSM-s requires no prior knowledge of regression coefficients, since the regressions are carried out directly with the importance paths. Our new approach, therefore, removes the overhead related to estimating an exercise strategy under the nominal measure. Third, the LSM-s enjoys a larger sample of relevant observations in the regressions due to a higher incidence of non-zero values in the cross-section of shifted payoffs. Finally, the polynomial approximation of the exercise strategy is enhanced as more paths are simulated in the vicinity of the exercise region. When a low number of paths is used, these improvements reduce the root mean squared error of the "unshifted" importance sampling LSM estimator by 22.5% on average. Moreover, the LSM-s is particularly effective at bias reduction when the number of simulated paths is small, since it improves the polynomial approximations in data-poor regions where exercise is optimal.
Several contributions to the literature on variance reduction techniques for Monte Carlo derivative pricing have focused on reducing the variance of discounted payoffs to reduce the variance of the estimator. This approach is perfectly reasonable for European-style options because exercise times are known. Whenever early exercise features are involved, however, a significant portion of the estimator error is governed by random errors in the exercise strategy. Consequently, applying variance reduction tools to both continuation values and discounted payoffs achieves a significant reduction in both bias and variance. Our results, therefore, contribute to a strand of the literature on the valuation of American options in which bias reduction is achieved by reducing random exercise errors. A similar approach was used in Rasmussen (2005) in the context of variance reduction with control variates, as well as in Létourneau and Stentoft (2019) in the context of bootstrap aggregation of regression coefficients. See also Kan and Reesor (2012), who derive explicit approximations to the bias of Monte Carlo estimators of American option prices.
The paper is organized as follows: Section 2 outlines the American option pricing problem, provides details on the implementation of the LSM method, discusses the use of importance sampling and proposes a new method for using importance sampling with simulation and regression based algorithms. Section 3 presents the results of our numerical experiments along with robustness checks demonstrating the benefit of using our proposed method. Next, Section 4 explains how our improved approach generalizes to multivariate settings and a wider class of diffusion processes. Finally, Section 5 concludes. Appendix A contains further details on the numerical results.

Pricing Derivatives with Early-Exercise Features
In this section, we first state the valuation problem associated with pricing American options. Next, we illustrate how the price can be approximated using the least squares Monte Carlo (LSM) method of Longstaff and Schwartz (2001). Finally, we explain how importance sampling can be used in the context of simulation and regression methods, and we propose our improved method, in which regressions are carried out directly under the importance measure instead of under the nominal measure, which has been the standard.

The Valuation Problem
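Consider a complete probability space $(\Omega, \mathcal{F}, \mathbb{P})$ equipped with a continuous filtration $\mathbb{F} = \{\mathcal{F}_t : 0 \le t \le T\}$. The valuation of an American option written on the $\mathbb{F}$-adapted underlying asset $\{S(t) : t \in [0, T]\}$ is a stochastic optimal control problem with the objective of maximizing discounted payoffs with respect to an $\mathbb{F}$-adapted class of stopping times $\mathcal{T}$ taking values in $[0, T]$. Assuming a constant continuously compounded interest rate $r$ and a risk-neutral pricing probability measure $\mathbb{P}$, we denote the price process of the option as $\{U(t) : t \in [0, T]\}$ and write

$$U(t) = \sup_{\tau \in \mathcal{T} : t \le \tau} \mathbb{E}^{\mathbb{P}}\left[ e^{-r(\tau - t)} h(S(\tau)) \,\middle|\, \mathcal{F}_t \right], \tag{1}$$

where $h(\cdot) \ge 0$ is the payoff function. For instance, the payoff of a put option with strike price $K$ is $h(S(t)) = \max(K - S(t), 0)$. In a discrete-time formulation of the pricing problem, the complete probability space $(\Omega, \mathcal{G}, \mathbb{P})$ is equipped with a discrete filtration $\mathbb{G} = \{\mathcal{G}_j : j = 0, \ldots, J\}$ to which the discretized asset price process $\{S_j : j = 0, \ldots, J\}$ is adapted. Here we assume that the option may be exercised only at evenly spaced time points $\Delta t = T/J$ apart, which define the time discretization, so that stopping times take values in $\{\tau_j = j\Delta t : j = 0, \ldots, J\}$. Let $V_j(x)$, $j = 0, \ldots, J$, denote the time-$j$ value of an unexercised option with underlying asset value $S_j = x$. The option price can then be written as

$$V_0(S_0) = \sup_{\tau \in \mathcal{T}} \mathbb{E}^{\mathbb{P}}\left[ e^{-r\tau} h(S_\tau) \right], \tag{2}$$

where, with a slight abuse of notation, $S_\tau$ denotes $S_k$ for $\tau = k\Delta t$. One can then show that $V_0(S_0)$ solves the dynamic program

$$V_J(x) = h(x), \qquad V_j(x) = \max\left\{ h(x),\ \mathbb{E}^{\mathbb{P}}\left[ e^{-r\Delta t} V_{j+1}(S_{j+1}) \,\middle|\, S_j = x \right] \right\}, \quad j = J-1, \ldots, 0. \tag{3}$$

Options with such discrete early exercise features are typically called Bermudan options.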
To value an American option numerically, one technically prices a Bermudan option with a large number of exercise opportunities, the American price being recovered in the limit $J \to \infty$ (Glasserman 2013). In line with the literature and for the sake of our discussion, we refer to options as American rather than Bermudan in this paper.
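To make the dynamic program in Equation (3) concrete, the following Python sketch solves it exactly on a Cox et al. (1979) binomial tree, the kind of lattice used later as a benchmark for the bias results. Increasing `steps` plays the role of letting J → ∞. The function name `crr_american_put` and its parameters are illustrative choices, and the example is a plain-Python sketch rather than the authors' implementation.

```python
import math


def crr_american_put(S0, K, r, q, sigma, T, steps):
    """American (Bermudan, `steps` dates) put on a Cox-Ross-Rubinstein tree, Eq. (3)."""
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    disc = math.exp(-r * dt)
    p = (math.exp((r - q) * dt) - d) / (u - d)  # risk-neutral up-move probability

    # Terminal condition V_J(x) = h(x) = max(K - x, 0); node i has i up-moves.
    values = [max(K - S0 * u**i * d**(steps - i), 0.0) for i in range(steps + 1)]

    # Backward induction: V_j(x) = max(h(x), e^{-r dt} E[V_{j+1} | S_j = x]).
    for j in range(steps - 1, -1, -1):
        for i in range(j + 1):
            continuation = disc * (p * values[i + 1] + (1.0 - p) * values[i])
            exercise = max(K - S0 * u**i * d**(j - i), 0.0)
            values[i] = max(exercise, continuation)
    return values[0]


if __name__ == "__main__":
    # Illustrative parameters: S0 = 36, K = 40, r = 0.06, q = 0, sigma = 0.2, T = 1.
    print(crr_american_put(36.0, 40.0, 0.06, 0.0, 0.2, 1.0, 500))
```

Because the maximum with the exercise value is taken at every node, the result can never fall below the intrinsic value $K - S_0$.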

Least Squares Monte Carlo
We can approximate the dynamic program in Equation (3) by estimating the conditional expectations $\mathbb{E}^{\mathbb{P}}\left[e^{-r\Delta t} V_{j+1}(S_{j+1}) \mid \mathcal{G}_j\right]$, $j = 0, \ldots, J-1$, with ordinary least squares (OLS) regressions. For this task, a finite set of $\mathbb{G}$-measurable basis functions $\{\psi_\ell(\cdot) : \ell = 0, \ldots, L\}$ is used. Given a set of $N$ simulated paths $\{S_{n,j} : n = 1, \ldots, N;\ j = 0, \ldots, J\}$, a parametric approximation of the continuation value is obtained with a regression model of the form

$$e^{-r\Delta t}\hat{V}_{j+1}(S_{n,j+1}) = \sum_{\ell=0}^{L} \beta_{j,\ell}\,\psi_\ell(S_{n,j}) + \varepsilon_{n,j}, \tag{4}$$

where the path-$n$, time-$j$ error term $\varepsilon_{n,j}$ satisfies the usual OLS assumptions. Let $\mathcal{N}_j = \{n : h(S_{n,j}) > 0\}$ be the set of time-$j$ in-the-money (ITM) paths and let $N_j = |\mathcal{N}_j|$.
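In the LSM algorithm, the $(L+1)$ vector of coefficient estimates $\hat{\boldsymbol\beta}_j = (\hat\beta_{j,0}, \ldots, \hat\beta_{j,L})^\top$ is obtained by regressing the discounted cashflows $e^{-r\Delta t}\hat{V}_{j+1}(S_{n,j+1})$ on the cross-section of basis functions $\boldsymbol\psi(S_{n,j}) = (\psi_0(S_{n,j}), \ldots, \psi_L(S_{n,j}))$, with $\psi_0$ a constant, for $n \in \mathcal{N}_j$. One can then easily show that

$$\hat{\boldsymbol\beta}_j = \left(\boldsymbol\Psi_j^\top \boldsymbol\Psi_j\right)^{-1} \boldsymbol\Psi_j^\top \hat{\mathbf{V}}_{j+1}, \tag{5}$$

where the $N_j \times (L+1)$ matrix $\boldsymbol\Psi_j$ denotes the time-$j$ cross-section of basis functions and the $N_j \times 1$ vector $\hat{\mathbf{V}}_{j+1}$ is the sample of discounted option cashflows $e^{-r\Delta t}\hat V_{j+1}(S_{n,j+1})$, $n \in \mathcal{N}_j$. The time-$j$ cross-section of fitted continuation values then takes the form

$$\hat{\mathbf{C}}_j = \boldsymbol\Psi_j \hat{\boldsymbol\beta}_j, \tag{6}$$

which, under the OLS assumptions, is an unbiased estimator of the conditional expectation of interest $\mathbb{E}^{\mathbb{P}}\left[e^{-r\Delta t} V_{j+1} \mid \boldsymbol\Psi_j\right]$. Writing $\hat V_{n,j} \equiv \hat V_j(S_{n,j})$ for notational simplicity, the resulting approximate dynamic program computes a sample of discounted cashflows as

$$\hat V_{n,J} = h(S_{n,J}), \qquad \hat V_{n,j} = \begin{cases} h(S_{n,j}) & \text{if } h(S_{n,j}) \ge \hat C_{n,j} \text{ and } n \in \mathcal{N}_j, \\ e^{-r\Delta t}\hat V_{n,j+1} & \text{if } h(S_{n,j}) < \hat C_{n,j} \text{ and } n \in \mathcal{N}_j, \text{ or } n \notin \mathcal{N}_j, \end{cases} \qquad j = J-1, \ldots, 0. \tag{7}$$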
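The regression step in Equations (5) and (6) can be sketched with NumPy as follows. This is a minimal illustration under our own assumptions: the basis is the cubic monomial basis used in the numerical section, the regressor is rescaled by its cross-sectional mean (which leaves the fitted values unchanged, since a rescaled polynomial basis spans the same space), and `numpy.linalg.lstsq` replaces the explicit inverse in Equation (5) for numerical stability. The helper name `fit_continuation` is hypothetical.

```python
import numpy as np


def fit_continuation(S_j, discounted_cf, payoff_j, degree=3):
    """Fitted continuation values (Eq. 6) on the ITM set N_j; NaN elsewhere.

    S_j           : (N,) time-j asset prices S_{n,j}
    discounted_cf : (N,) discounted cash flows e^{-r dt} V_hat_{n,j+1}
    payoff_j      : (N,) exercise values h(S_{n,j})
    """
    itm = payoff_j > 0                        # N_j = {n : h(S_{n,j}) > 0}
    C_hat = np.full(S_j.shape, np.nan)        # undefined outside N_j
    if itm.sum() <= degree:                   # too few ITM paths to identify beta_j
        return C_hat, itm
    x = S_j[itm] / S_j[itm].mean()            # rescaling only improves conditioning
    Psi = np.vander(x, degree + 1, increasing=True)   # columns 1, x, x^2, x^3
    beta, *_ = np.linalg.lstsq(Psi, discounted_cf[itm], rcond=None)  # Eq. (5)
    C_hat[itm] = Psi @ beta                   # Eq. (6)
    return C_hat, itm
```

For example, if the discounted cashflows are an exact linear function of $S$ on the ITM paths, the cubic fit reproduces them and leaves out-of-the-money paths as `NaN`:

```python
S = np.linspace(30.0, 50.0, 20)
C, itm = fit_continuation(S, 0.5 + 0.1 * S, np.maximum(40.0 - S, 0.0))
```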
It is easy to see that the estimated continuation values play a critical role in the LSM algorithm because they form the exercise criterion and, thus, determine the stopping time estimates. For each path, $\hat\tau_n$ is given by the first time at which the exercise value is positive and at least as large as the estimated continuation value. For unexercised paths, we set $\hat\tau_n = J\Delta t = T$.
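In this paper, we focus on asset prices governed by a geometric Brownian motion (GBM) as in the Black and Scholes (1973) model with continuously compounded risk-free rate $r$, dividend yield $q$, and volatility of returns $\sigma$. The path-$n$ simulation of the asset price is obtained from normal increments $Z_n = \{z_{n,1}, \ldots, z_{n,J}\}$ with

$$S_{n,j} = S_{n,j-1}\exp\left(\left(r - q - \tfrac{1}{2}\sigma^2\right)\Delta t + \sigma\sqrt{\Delta t}\, z_{n,j}\right), \qquad S_{n,0} = S_0, \tag{8}$$

where $z_{n,j} \overset{\text{iid}}{\sim} \mathcal{N}(0, 1)$, $n = 1, \ldots, N$, $j = 1, \ldots, J$. The sample-size-$N$ LSM estimator of $V_0(S_0)$ is then computed as the average of the pathwise discounted option cashflows

$$\hat V_0^{\text{LSM}} = \frac{1}{N}\sum_{n=1}^{N} e^{-r\hat\tau_n} h(S_{n,k_n}), \tag{9}$$

where $\hat\tau_n = k_n\Delta t$ is the path-$n$ estimated stopping time from following the LSM exercise strategy derived from the algorithm in Equation (7).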
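A compact, vectorized sketch of the full LSM estimator under the GBM in Equation (8) is given below, assuming NumPy and the cubic monomial basis of the numerical section. Regressors are divided by the strike $K$ purely for numerical conditioning, and at $j = 0$, where every path sits at $S_0$, the regression collapses to a sample mean. The function names are hypothetical, and this is an illustrative sketch rather than the code used for the results in this paper.

```python
import numpy as np


def simulate_gbm(S0, r, q, sigma, dt, increments):
    """Eq. (8): GBM paths from normal increments; returns an (N, J+1) array."""
    log_inc = (r - q - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * increments
    N = increments.shape[0]
    return S0 * np.exp(np.hstack([np.zeros((N, 1)), np.cumsum(log_inc, axis=1)]))


def lsm_put(S0, K, r, q, sigma, T, J, N, degree=3, seed=0):
    """LSM estimate (Eq. 9) of a put with J equally spaced exercise dates."""
    rng = np.random.default_rng(seed)
    dt = T / J
    disc = np.exp(-r * dt)
    S = simulate_gbm(S0, r, q, sigma, dt, rng.standard_normal((N, J)))

    cf = np.maximum(K - S[:, J], 0.0)          # V_hat_{n,J} = h(S_{n,J})
    for j in range(J - 1, 0, -1):              # Eq. (7), backward in time
        cf = disc * cf                          # value of continuing, seen from time j
        h = np.maximum(K - S[:, j], 0.0)
        itm = h > 0
        if itm.sum() <= degree:
            continue
        Psi = np.vander(S[itm, j] / K, degree + 1, increasing=True)
        beta, *_ = np.linalg.lstsq(Psi, cf[itm], rcond=None)   # Eq. (5)
        exercise = h[itm] >= Psi @ beta         # h(S_{n,j}) >= C_hat_{n,j}
        idx = np.flatnonzero(itm)[exercise]
        cf[idx] = h[idx]                        # stop: cash flow becomes h(S_{n,j})

    # j = 0: all paths share S0, so the continuation estimate is a sample mean.
    continuation_0 = disc * cf.mean()
    return max(max(K - S0, 0.0), continuation_0)
```

Averaging the stored cashflows is equivalent to averaging $e^{-r\hat\tau_n}h(S_{n,k_n})$ in Equation (9), since each entry of `cf` already holds the payoff discounted from the path's stopping date.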

Variance Reduction with Importance Sampling
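A potential pitfall with the LSM method arises when only a few of the sample paths meet the optimal exercise threshold. This impedes the reliable approximation of continuation values and reduces the sample of payoffs that measures the early exercise premium. Importance sampling techniques tackle this issue directly by generating more early exercise instances. In accordance with the literature on importance sampling in simulation and regression based approaches, we select an equivalent importance probability measure from the Gaussian family of densities with identical scale, only allowing for a shift in the location parameter. To be specific, we consider the situation in which a uniform drift term is added to the normal increments by setting $\tilde Z_n \equiv \{z_{n,1} + \sqrt{\Delta t}\lambda, \ldots, z_{n,J} + \sqrt{\Delta t}\lambda\}$, and we denote the "shifted" probability measure as $\tilde{\mathbb{P}}$. The same normal variates are used to simulate the GBM under the importance measure with $\tilde S_0 \equiv S_0$, and we have

$$\tilde S_{n,j} = \tilde S_{n,j-1}\exp\left(\left(r - q - \tfrac{1}{2}\sigma^2\right)\Delta t + \sigma\sqrt{\Delta t}\left(z_{n,j} + \sqrt{\Delta t}\lambda\right)\right). \tag{10}$$

By virtue of the Girsanov theorem, the $k$-step likelihood ratio for the discretized diffusion process then takes the form

$$\frac{d\mathbb{P}}{d\tilde{\mathbb{P}}}(\tilde S_{n,j}, k\Delta t) = \exp\left(-\lambda\sqrt{\Delta t}\sum_{i=j+1}^{j+k} z_{n,i} - \tfrac{1}{2}\lambda^2 k\Delta t\right), \tag{11}$$

and the standard importance sampling LSM estimator is given by

$$\hat V_0^{\text{IS}} = \frac{1}{N}\sum_{n=1}^{N} e^{-r\tilde\tau_n} h(\tilde S_{n,\tilde k_n})\,\frac{d\mathbb{P}}{d\tilde{\mathbb{P}}}(\tilde S_{n,0}, \tilde\tau_n), \tag{12}$$

where $\tilde\tau_n = \tilde k_n\Delta t$ denotes the estimated path-$n$ LSM stopping time. It is apparent from Equation (12) that importance sampling simulates option payoffs under the importance measure. However, in the standard implementation of importance sampling, the dynamic program being solved is

$$\tilde V_{n,j} = \begin{cases} h(\tilde S_{n,j}) & \text{if } h(\tilde S_{n,j}) \ge \tilde C_{n,j} \text{ and } n \in \tilde{\mathcal{N}}_j, \\ e^{-r\Delta t}\tilde V_{n,j+1} & \text{if } h(\tilde S_{n,j}) < \tilde C_{n,j} \text{ and } n \in \tilde{\mathcal{N}}_j, \text{ or } n \notin \tilde{\mathcal{N}}_j, \end{cases} \qquad j = J-1, \ldots, 0, \tag{13}$$

where $\tilde{\mathcal{N}}_j = \{n : h(\tilde S_{n,j}) > 0\}$ is the set of time-$j$ ITM paths simulated from the importance measure $\tilde{\mathbb{P}}$ and $\tilde N_j = |\tilde{\mathcal{N}}_j|$. In this problem, the continuation value predictions used to obtain the stopping times are given by

$$\tilde C_{n,j} = \boldsymbol\psi(\tilde S_{n,j})\,\hat{\boldsymbol\beta}_j, \tag{14}$$

where the regression coefficients $\hat{\boldsymbol\beta}_j$ are determined from Equation (5) using paths simulated under the nominal measure.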
Thus, the standard implementation uses a parametrization obtained by projecting discounted payoffs onto the space spanned by the basis functions evaluated under the nominal measure. This method is, therefore, somewhat cumbersome, as it essentially requires one to store two sets of simulated paths to calculate both $\boldsymbol\Psi_j$ and $\tilde{\boldsymbol\Psi}_j$, $j = 0, \ldots, J-1$. Moreover, if the shifted payoffs improve the estimate of the option price, which is essentially an expectation, it stands to reason that using the shifted paths could also improve the estimated continuation values, which are conditional expectations used to estimate the optimal stopping strategy, thus reducing the bias that results from incorrect exercise decisions.
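The standard implementation in Equations (12) to (14) can be sketched as below. Under our own simplifying assumptions (NumPy, cubic monomials in $S/K$, a fixed drift `lam`), coefficients come from nominal paths, stopping decisions are applied to the shifted paths built from the same draws, and payoffs are reweighted by the likelihood ratio in Equation (11). Note that both path sets must be kept in memory, which is the overhead mentioned above. All function names are hypothetical.

```python
import numpy as np


def gbm_paths(S0, r, q, sigma, dt, increments):
    """Eq. (8)/(10): (N, J+1) GBM paths driven by nominal or shifted increments."""
    log_inc = (r - q - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * increments
    N = increments.shape[0]
    return S0 * np.exp(np.hstack([np.zeros((N, 1)), np.cumsum(log_inc, axis=1)]))


def lr_from_zero(Z, lam, dt, k):
    """Eq. (11) with j = 0: dP/dP~ over the first k[n] steps of path n (Z = nominal draws)."""
    N = Z.shape[0]
    cum_z = np.hstack([np.zeros((N, 1)), np.cumsum(Z, axis=1)])
    return np.exp(-lam * np.sqrt(dt) * cum_z[np.arange(N), k] - 0.5 * lam**2 * k * dt)


def is_lsm_put(S0, K, r, q, sigma, T, J, N, lam, degree=3, seed=0):
    """Standard IS LSM: nominal-path coefficients, shifted-path stopping times."""
    rng = np.random.default_rng(seed)
    dt = T / J
    disc = np.exp(-r * dt)
    Z = rng.standard_normal((N, J))
    S = gbm_paths(S0, r, q, sigma, dt, Z)                          # nominal paths
    St = gbm_paths(S0, r, q, sigma, dt, Z + np.sqrt(dt) * lam)     # shifted paths

    def basis(x):
        # cubic monomials in S / K (rescaling only improves conditioning)
        return np.vander(x / K, degree + 1, increasing=True)

    cf = np.maximum(K - S[:, J], 0.0)   # nominal cash flows, used only for Eq. (5)
    stop = np.full(N, J)                 # stopping index k_n of each shifted path
    for j in range(J - 1, 0, -1):
        cf = disc * cf
        h, h_t = np.maximum(K - S[:, j], 0.0), np.maximum(K - St[:, j], 0.0)
        itm, itm_t = h > 0, h_t > 0
        if itm.sum() <= degree:
            continue
        beta, *_ = np.linalg.lstsq(basis(S[itm, j]), cf[itm], rcond=None)  # Eq. (5)
        ex = itm & (h >= basis(S[:, j]) @ beta)
        cf[ex] = h[ex]                                                   # Eq. (7)
        # Eqs. (13)-(14): nominal coefficients applied to the shifted cross-section;
        # sweeping backward, the last overwrite is the earliest exercise date.
        stop[itm_t & (h_t >= basis(St[:, j]) @ beta)] = j

    payoff = np.maximum(K - St[np.arange(N), stop], 0.0)
    weights = lr_from_zero(Z, lam, dt, stop)
    estimate = np.mean(np.exp(-r * stop * dt) * payoff * weights)       # Eq. (12)
    return max(max(K - S0, 0.0), estimate)
```

Since the likelihood ratio has unit mean under the shifted measure, the weights in Equation (12) leave the estimator's expectation unchanged for any fixed stopping rule.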
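Our proposed improvement to the standard implementation of importance sampling with the LSM method leverages this observation and computes the cross-sectional regressions using the shifted paths. We call this estimator the LSM-s and write

$$\tilde V^{(s)}_{n,j} = \begin{cases} h(\tilde S_{n,j}) & \text{if } h(\tilde S_{n,j}) \ge \tilde C^{(s)}_{n,j} \text{ and } n \in \tilde{\mathcal{N}}_j, \\ e^{-r\Delta t}\tilde V^{(s)}_{n,j+1}\,\dfrac{d\mathbb{P}}{d\tilde{\mathbb{P}}}(\tilde S_{n,j}, \Delta t) & \text{if } h(\tilde S_{n,j}) < \tilde C^{(s)}_{n,j} \text{ and } n \in \tilde{\mathcal{N}}_j, \text{ or } n \notin \tilde{\mathcal{N}}_j, \end{cases} \qquad j = J-1, \ldots, 0, \tag{15}$$

with $\tilde V^{(s)}_{n,J} = h(\tilde S_{n,J})$, where the continuation values are now calculated from the cross-section of basis functions evaluated under the importance measure. That is, the $(L+1)$ vector of LSM-s coefficient estimates $\tilde{\boldsymbol\beta}_j = (\tilde\beta_{j,0}, \ldots, \tilde\beta_{j,L})^\top$ is obtained by regressing the shifted discounted cashflows $e^{-r\Delta t}\tilde V^{(s)}_{n,j+1}\,\frac{d\mathbb{P}}{d\tilde{\mathbb{P}}}(\tilde S_{n,j}, \Delta t)$ on the cross-section of basis functions $\boldsymbol\psi(\tilde S_{n,j}) = (\psi_0(\tilde S_{n,j}), \ldots, \psi_L(\tilde S_{n,j}))$, with $\psi_0$ a constant, for $n \in \tilde{\mathcal{N}}_j$. The resulting LSM-s coefficients are then

$$\tilde{\boldsymbol\beta}_j = \left(\tilde{\boldsymbol\Psi}_j^\top \tilde{\boldsymbol\Psi}_j\right)^{-1} \tilde{\boldsymbol\Psi}_j^\top \tilde{\mathbf{V}}^{(s)}_{j+1}, \tag{16}$$

where the $\tilde N_j \times (L+1)$ matrix $\tilde{\boldsymbol\Psi}_j$ denotes the time-$j$ cross-section of basis functions and the $\tilde N_j \times 1$ vector $\tilde{\mathbf{V}}^{(s)}_{j+1}$ is the sample of shifted discounted option cashflows at time $j+1$ multiplied by the corresponding stepwise likelihood ratios, for $n \in \tilde{\mathcal{N}}_j$. The time-$j$ cross-section of fitted continuation values is then written as

$$\tilde{\mathbf{C}}^{(s)}_j = \tilde{\boldsymbol\Psi}_j \tilde{\boldsymbol\beta}_j. \tag{17}$$

Note that, instead of multiplying the likelihood ratios with the payoffs only when pricing the option in Equation (12), the stepwise likelihood ratios are now multiplied with the discounted payoffs at each iteration. Because the stepwise ratios multiply to the $k$-step ratio in Equation (11), this leads to the LSM-s estimator

$$\hat V_0^{\text{LSM-s}} = \frac{1}{N}\sum_{n=1}^{N} e^{-r\hat\tau^{(s)}_n} h(\tilde S_{n,k^{(s)}_n})\,\frac{d\mathbb{P}}{d\tilde{\mathbb{P}}}(\tilde S_{n,0}, \hat\tau^{(s)}_n), \tag{18}$$

where $\hat\tau^{(s)}_n = k^{(s)}_n\Delta t$ denotes the path-$n$ LSM-s stopping time. To see why the LSM-s is better adapted to the task of estimating stopping times, we consider the work by Whitehead et al. (2012) and Kan and Reesor (2012), who derive approximations to the bias of Monte Carlo estimators of American option prices. Whitehead et al. (2012) do this in the context of the easy-to-analyze stochastic tree, while Kan and Reesor (2012) provide analogous derivations for LSM estimators. In both cases, the approximation to the estimator bias depends on the ratio Var/N, where N is the sample size used to construct the continuation value estimator and Var is the estimated variance.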
The bias goes to zero as Var/N goes (monotonically) to zero, which happens as the sample size increases or as the estimated variance decreases. The typical importance sampling technique for American options (e.g., Moreni (2003)) reduces the estimated variance (Var) compared to the regular LSM estimator. Our proposed LSM-s estimator enjoys the same reduced estimated variance as the LSM estimator with importance sampling, but it also increases the sample size used to construct the continuation value estimator, since more of the shifted paths are in the money. Hence, our proposed estimator leads to a further reduction in bias compared to the LSM estimator with importance sampling.
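The LSM-s recursion in Equations (15) to (18) can be sketched in the same style. Only the shifted paths are simulated; before each regression the discounted cashflows are multiplied by the one-step ratio $\frac{d\mathbb{P}}{d\tilde{\mathbb{P}}}(\tilde S_{n,j}, \Delta t)$, so that the regression targets the nominal continuation value. As before, NumPy, the cubic basis in $S/K$, a fixed `lam`, and the function names are our own assumptions for illustration.

```python
import numpy as np


def gbm_paths(S0, r, q, sigma, dt, increments):
    """Eq. (10): (N, J+1) GBM paths driven by (shifted) normal increments."""
    log_inc = (r - q - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * increments
    N = increments.shape[0]
    return S0 * np.exp(np.hstack([np.zeros((N, 1)), np.cumsum(log_inc, axis=1)]))


def lsm_s_put(S0, K, r, q, sigma, T, J, N, lam, degree=3, seed=0):
    """LSM-s estimate: regressions run directly on the shifted paths."""
    rng = np.random.default_rng(seed)
    dt = T / J
    disc = np.exp(-r * dt)
    Z = rng.standard_normal((N, J))                              # nominal draws z_{n,j}
    St = gbm_paths(S0, r, q, sigma, dt, Z + np.sqrt(dt) * lam)   # shifted paths only
    # Stepwise ratios dP/dP~(S~_{n,j}, dt), Eq. (11) with k = 1; column j is step j -> j+1.
    lr_step = np.exp(-lam * np.sqrt(dt) * Z - 0.5 * lam**2 * dt)

    cf = np.maximum(K - St[:, J], 0.0)                  # V~(s)_{n,J} = h(S~_{n,J})
    for j in range(J - 1, 0, -1):
        cf = disc * cf * lr_step[:, j]                   # reweighted discounted cash flows
        h = np.maximum(K - St[:, j], 0.0)
        itm = h > 0
        if itm.sum() <= degree:
            continue
        Psi = np.vander(St[itm, j] / K, degree + 1, increasing=True)
        beta, *_ = np.linalg.lstsq(Psi, cf[itm], rcond=None)   # Eq. (16)
        exercise = h[itm] >= Psi @ beta                         # compare h with Eq. (17)
        idx = np.flatnonzero(itm)[exercise]
        cf[idx] = h[idx]                                        # exercise branch of Eq. (15)

    # Final step 0 -> 1; the product of stepwise ratios gives the ratio in Eq. (18).
    continuation_0 = disc * np.mean(cf * lr_step[:, 0])
    return max(max(K - S0, 0.0), continuation_0)
```

Compared with the standard implementation, no nominal paths or nominal coefficients are needed, and the regressions see the larger number of ITM observations produced by the shifted measure.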

Numerical Results
In this section, we report results from a numerical application of the shifted regression method proposed above and compare them with the results obtained with the standard simulation and regression based method. We first consider two techniques that can be used to approximate the variance-minimizing change of measure in the context of importance sampling. The first technique directly optimizes the drift parameter with a numerical gradient-based method. The second technique uses an approximation of the optimal drift for a European option to price an analogous American option; although suboptimal, this technique is common in the literature. Second, we present numerical results for a sample of options with different moneyness and maturities, highlighting the advantages of applying the optimal importance sampling tools in the regression component of the LSM. To do so, we examine the effect of importance sampling on the standard deviation, bias, and RMSE efficiency of the LSM and LSM-s methods. Note that although the drift is selected with variance reduction in mind, it has a remarkable effect on estimator bias, particularly for the LSM-s. Finally, we provide evidence on the robustness of these improvements when a suboptimal change of drift is considered.

Determination of the Optimal Drift
The optimal parameterization of an importance measure requires prior knowledge about the option price. Consequently, practitioners face the difficult problem of designing an automated procedure to select a distribution; that is, the chosen importance measure should consistently reduce the variance of the nominal estimator. One common approach is to find an easy solution to an approximating problem. With this in mind, Moreni (2003) simply proposes to approximate the optimal change of measure for a European option with the saddle point approximation of Glasserman et al. (1999), here called the GHS drift, and to apply it to the corresponding American option. For put options, the GHS change in drift λ can be computed very quickly. This approach successfully reduces the variance of the nominal estimator and has the advantage of employing a very cost-effective approximation of the optimal drift. However, one can easily find instances in which the GHS drift is significantly suboptimal and where the variance can be further reduced with a more involved optimization scheme. One such example is shown in Figure 1, which plots the standard deviation efficiency (left-hand panel) and the bias with respect to a Cox et al. (1979) binomial tree with a large number of steps (right-hand panel), obtained for different values of the change of drift λ for a put option with $r = q = 0.06$, $\sigma = 0.2$, $S_0 = K = 40$, $T = 1$, and $N = 10^4$. The three lines correspond to using an optimal exercise strategy, labeled FDM, and early exercise strategies estimated with the LSM and LSM-s methods, respectively. The FDM simply replaces the exercise criterion in the LSM algorithm in Equation (13) with a precisely estimated exercise boundary obtained from a finite difference method on a fine grid.
The figure demonstrates that the value of λ obtained with the GHS method, although quite close to the optimal level, is suboptimal in terms of the variance reduction that could be achieved. Moreover, using the GHS drift could lead to price estimates that are more biased than those obtained with the actual variance-minimizing drift.

Figure 1. Standard deviation efficiency (left) and bias (right) as functions of the change of drift λ. Results are for a put option with $r = q = 0.06$, $\sigma = 0.2$, $S_0 = K = 40$, $T = 1$, which is priced using $N = 10^4$ paths. The three lines correspond to optimal, labeled FDM, and LSM and LSM-s estimates obtained with a polynomial of order 3 in the cross-sectional regressions, respectively. The error bars around the bias illustrate 95% confidence intervals.
Motivated by the findings in Figure 1, in Section 3.2 we first consider the relative performance when an optimal drift is used. To implement this, we use a gradient-based solution, as proposed in Morales (2006), that numerically computes the gradient of the variance as a function of the drift term λ. Since Figure 1 indicates that the variance of the estimators is convex in the drift parameter, numerical convergence is achieved fairly quickly. For a fair comparison of the performance of the LSM and LSM-s under optimal importance sampling, we use the FDM exercise strategy with N = 100,000 paths in the simulation to optimize the drift. In Section 3.3, we consider the results obtained when using the easier to calculate but suboptimal GHS change of measure as a robustness check.
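A minimal sketch of a gradient-based drift search is given below. To keep it short, it makes a simplifying assumption that departs from the setup described above: the objective is the sample variance of a one-step European put estimator under the shifted measure, evaluated with common random numbers, rather than the variance of an American estimator following an FDM exercise strategy. The central finite-difference gradient, the Armijo backtracking rule, and the cap on each move (which keeps the search where the sample objective is reliable, since very large shifts can make the sample variance spuriously small) are our own choices; the function names are hypothetical.

```python
import numpy as np


def is_variance(lam, z, S0, K, r, q, sigma, T):
    """Sample variance of the IS European put estimator for drift shift lam.

    Reusing the same draws z (common random numbers) makes the objective smooth in lam.
    """
    w_T = np.sqrt(T) * z + lam * T                         # shifted Brownian motion at T
    S_T = S0 * np.exp((r - q - 0.5 * sigma**2) * T + sigma * w_T)
    lr = np.exp(-lam * np.sqrt(T) * z - 0.5 * lam**2 * T)  # Eq. (11) over [0, T]
    y = np.exp(-r * T) * np.maximum(K - S_T, 0.0) * lr
    return y.var()


def optimize_drift(f, lam0=0.0, h=1e-4, max_move=0.25, tol=1e-6, max_iter=500):
    """Gradient descent with a central-difference gradient and Armijo backtracking."""
    lam = lam0
    for _ in range(max_iter):
        grad = (f(lam + h) - f(lam - h)) / (2.0 * h)
        if grad == 0.0:
            return lam
        f0 = f(lam)
        step = min(1.0, max_move / abs(grad))   # cap |move| at max_move per iteration
        # Written as `not (... <= ...)` so that a NaN objective also triggers backtracking.
        while not f(lam - step * grad) <= f0 - 0.5 * step * grad**2 and step > 1e-12:
            step *= 0.5
        new_lam = lam - step * grad
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam
```

For a put, the search moves λ below zero, pushing simulated paths toward the exercise region:

```python
z = np.random.default_rng(0).standard_normal(100_000)
lam_star = optimize_drift(lambda lam: is_variance(lam, z, 40.0, 40.0, 0.06, 0.06, 0.2, 1.0))
```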

The Benefits of Using Shifted Regressions
In our numerical results, we price a total of 28 put options with 50 exercise opportunities per year (i.e., $J = 50T$), maturities (in years) $T \in \{0.5, 1, 1.5, 2\}$ and strike prices $K \in \{34, 36, 38, 40, 42, 44, 46\}$, where the underlying asset price process is governed by a GBM with initial asset price $S_0 = 40$, volatility of log-returns $\sigma = 0.4$, and risk-free rate $r$ and dividend yield $q$ set at $r = q = 0.06$. Both the LSM and the LSM-s algorithms employ a cubic approximation of the continuation value with the four basis functions $\{\psi_\ell(s) = s^\ell : \ell = 0, 1, 2, 3\}$. We compute Monte Carlo estimators with $N \in \{10^3, 10^4, 10^5\}$ paths and $R = 10^7/N$ replications to illustrate how the relative importance of this method increases for estimators with smaller sample sizes.
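The experimental design can be written out as a small grid. This is only a bookkeeping sketch (the dictionary layout and variable names are our own); it makes explicit that $J = 50T$ and that $R = 10^7/N$ keeps the total number of simulated paths fixed at $10^7$ for every $N$.

```python
from itertools import product

maturities = [0.5, 1.0, 1.5, 2.0]
strikes = [34, 36, 38, 40, 42, 44, 46]
path_counts = [10**3, 10**4, 10**5]

# One entry per option; 50 exercise dates per year gives J = 50T.
options = [{"T": T, "K": K, "J": round(50 * T)} for T, K in product(maturities, strikes)]

# R = 10^7 / N replications, so R * N = 10^7 simulated paths for each N.
replications = {N: 10**7 // N for N in path_counts}
```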

Bias
We first note from Figure 2 that the LSM-s estimators always have the smallest bias. Indeed, a significant bias is observed when importance sampling is used with the LSM (IS LSM), but this bias is eliminated with the importance-sampled LSM-s (IS LSM-s), even when as few as N = 1000 paths are used in the simulation. With N = 10,000 paths, the LSM-s also has a smaller bias than the LSM with and without importance sampling for all 28 options, although the bias magnitudes are generally smaller at this sample size. The detailed results in Table A1 in Appendix A show that this pattern persists when a large sample of N = 100,000 paths is used in the simulation, although in this case the size of the bias is essentially negligible. From these results, we conclude that the LSM-s offers important reductions in estimator bias compared to its LSM counterparts with and without importance sampling, particularly when the number of simulation paths is low.
The results in Figure 2 suggest that fitted regression values are less accurate when coefficients estimated under the nominal measure are used to predict continuation values at states simulated from the importance measure. Indeed, continuation values can be difficult to approximate, especially when they rely on a polynomial extrapolation for deep ITM paths. By regressing on paths from the importance measure, the fitted values are more robust for deep ITM paths. Having a larger sample of deep ITM paths also helps preserve the convexity of continuation values with respect to the asset price. This effect reduces the frequency and importance of exercise errors and thus corrects the bias.

Standard Deviation
We next note from Figure 3 that the variance reductions from importance sampling are remarkably similar for the LSM and LSM-s estimators. This is also confirmed by the detailed results shown in Table A3 in Appendix A. The figure also shows that, compared to the LSM estimator without importance sampling, the standard deviations are significantly lower, often around half of what is obtained without importance sampling; importance sampling is, thus, clearly effective as a variance reduction technique.
The results in Figure 3 indicate that, while the estimated early exercise strategy is worse with the IS LSM than with the IS LSM-s, it is "equally" suboptimal across all repetitions, so the resulting pricing error appears as a bias of similar size in every repetition rather than as additional variance. Once an early exercise strategy has been estimated, valuing the option amounts to averaging discounted payoffs, and the fraction of such payoffs that are non-zero depends much less on the estimated exercise strategy than on the change of measure.

RMSE Efficiency
The results above demonstrate the efficiency of importance sampling as a variance reduction technique. They also demonstrate that the standard implementation may introduce a significant bias in the estimator, which fortunately can be eliminated by using our proposed shifted LSM regression methodology. Combining these two findings, we expect the root mean squared error (RMSE) of the estimated prices to be smaller when using the LSM-s than when using the LSM with importance sampling. We also expect both methods to have RMSEs that are much smaller than that of the LSM method implemented without any importance sampling. To illustrate this more concisely, we compute the RMSE efficiency of a given estimator $\hat f_N$ relative to the standard LSM estimator computed with the same number of paths $N$, so that values above one indicate a smaller RMSE than the standard LSM. In other words, the efficiencies of importance sampling for the LSM and the LSM-s are measured relative to the standard LSM estimator with the same number of paths. Figure 4 plots the efficiencies for the two methods and demonstrates that the LSM-s method has the highest RMSE efficiency for all options, irrespective of the number of paths used. When using only N = 1000 paths, the average efficiency of the LSM-s is an impressive 1.68. The RMSE of the IS LSM-s is on average 22.5% smaller than that of the IS LSM, at least 8.8% smaller, and up to 31.4% smaller. The detailed results in Table A5 in Appendix A show that the LSM-s also exhibits higher efficiencies when a large sample of N = 100,000 paths is used in the simulation. However, the relative difference between the efficiencies of importance sampling for the LSM and LSM-s methods decreases with the sample size because the bias improvement becomes negligible. The results in the above figures, therefore, provide strong support for our proposed method. In particular, they demonstrate that our proposed method should be particularly useful when one is restricted to using a low number of simulated paths for option pricing.
Moreover, because of its reduced bias, this method lends itself to a simple parallel implementation in which many independent jobs (each with a small sample size) are farmed out to processors in parallel and their estimates are averaged. Averaging reduces the variance but not the bias of the individual jobs, so the reduction in bias yields more accurate price estimators computed in this fashion than the LSM and IS LSM estimators.
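The summary statistics used in this section can be computed from $R$ replicated price estimates and a reference price, as sketched below. The exact efficiency formula is not reproduced in this section, so `rmse_efficiency` assumes it is the RMSE of the plain LSM divided by the RMSE of the method at the same N; treat that definition, and all function names, as assumptions.

```python
import numpy as np


def bias_std_rmse(estimates, reference):
    """Bias, standard deviation and RMSE of replicated price estimates."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - reference
    std = est.std()                                   # population std (ddof=0)
    rmse = np.sqrt(np.mean((est - reference) ** 2))   # equals sqrt(bias^2 + std^2)
    return bias, std, rmse


def rmse_efficiency(rmse_method, rmse_lsm):
    """Assumed definition: plain-LSM RMSE over the method's RMSE (same N)."""
    return rmse_lsm / rmse_method


def relative_rmse_reduction(rmse_new, rmse_old):
    """Fractional RMSE reduction, e.g. IS LSM-s relative to IS LSM."""
    return 1.0 - rmse_new / rmse_old
```

The decomposition $\text{RMSE}^2 = \text{bias}^2 + \text{std}^2$ also shows why bias matters for the parallel scheme described above: averaging $M$ independent jobs divides the standard deviation by roughly $\sqrt{M}$ but leaves the bias term untouched.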

Robustness
The results in Section 3.2 are based on the optimal drift, which is difficult (or even impossible) to calculate in practice as it requires knowledge of the optimal early exercise strategy. In this section, we therefore consider the robustness of our results when the easier to calculate but suboptimal GHS change of drift is used. Figure 5 demonstrates that the two changes of drift indeed differ and that the GHS drift tends to be much smaller, particularly for short-maturity options. Figures 6, 7, and 8 report the corresponding bias, standard deviation, and RMSE efficiency results, with details in Tables A2, A4, and A6 in Appendix A. These plots look very similar to what is obtained with an optimal drift. In particular, Figure 6 shows that the LSM-s has the smallest bias in the vast majority of cases, although the bias is slightly smaller with the IS LSM for the three out-of-the-money short-term options when N = 1000 paths are used. Figure 7 shows that the standard deviations of the IS LSM and IS LSM-s are very close and much smaller than that of the LSM without importance sampling, and Figure 8 shows that, as a result, the IS LSM-s is significantly more efficient than the IS LSM in terms of RMSE. With only N = 1000 paths and the GHS drift, the average efficiency of the IS LSM-s is still large at 1.48, and the RMSE of the IS LSM-s is on average 18.1% smaller than that of the IS LSM. These results, therefore, demonstrate that our proposed method is robust to the choice of change of drift and that its benefits hold even when a suboptimal change of drift is considered. Moreover, the results confirm that our proposed method should be particularly useful when one is restricted to using a low number of simulated paths for option pricing.

Extensions and Future Research
Importance sampling can be extended to a multivariate setting as in Moreni (2004).
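In particular, we can model D correlated GBM processes as a (D × 1) vector $\mathbf{S}_j = (S^{(1)}_j, \ldots, S^{(D)}_j)^\top$ with

$$\log\mathbf{S}_j = \log\mathbf{S}_{j-1} + \left(\boldsymbol\mu - \tfrac{1}{2}\operatorname{diag}(\boldsymbol\sigma\boldsymbol\sigma^\top)\right)\Delta t + \sqrt{\Delta t}\,\boldsymbol\sigma\mathbf{z}_j,$$

where the logarithm is taken elementwise, $\boldsymbol\sigma$ is a (D × D) volatility matrix such that $\boldsymbol\sigma\boldsymbol\sigma^\top$ is the covariance matrix of log-returns per unit of time, $\mathbf{z}_j \overset{\text{iid}}{\sim} \mathcal{N}(\mathbf{0}, I_D)$ is a (D × 1) vector of independent normal increments, and $\boldsymbol\mu = (r - q^{(1)}, \ldots, r - q^{(D)})^\top$ is a (D × 1) vector of risk-neutral drifts defined as the difference between the continuously compounded risk-free rate and the asset-specific dividend yields. Paths are simulated under the importance measure with the (D × 1) drift adjustment $\boldsymbol\Lambda$ by using the same multivariate normal increments to generate $\tilde{\mathbf{z}}_j \equiv \mathbf{z}_j + \sqrt{\Delta t}\,\boldsymbol\Lambda$. Under both measures the initial values of each path are the same, $\tilde{\mathbf{S}}_0 = \mathbf{S}_0$, and paths are simulated under the importance measure with

$$\log\tilde{\mathbf{S}}_j = \log\tilde{\mathbf{S}}_{j-1} + \left(\boldsymbol\mu - \tfrac{1}{2}\operatorname{diag}(\boldsymbol\sigma\boldsymbol\sigma^\top)\right)\Delta t + \sqrt{\Delta t}\,\boldsymbol\sigma\tilde{\mathbf{z}}_j.$$

The $k$-step likelihood ratio for the discretized multivariate diffusion process takes the form

$$\frac{d\mathbb{P}}{d\tilde{\mathbb{P}}}(\tilde{\mathbf{S}}_{n,j}, k\Delta t) = \exp\left(-\sqrt{\Delta t}\,\boldsymbol\Lambda^\top\sum_{i=j+1}^{j+k}\mathbf{z}_{n,i} - \tfrac{1}{2}k\Delta t\,\boldsymbol\Lambda^\top\boldsymbol\Lambda\right)$$

for any $j = 0, \ldots, J$ and $k = 0, \ldots, J - j$. One can then optimize the importance drift $\boldsymbol\Lambda$ with a gradient descent routine or use the saddle point approximation of Glasserman et al. (1999) when a closed-form solution to the analogous European-style option price is available.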
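Because the increments are independent across components, the multivariate likelihood ratio factorizes into a product of univariate ratios, and with D = 1 it reduces to Equation (11). The NumPy sketch below (a hypothetical helper, not taken from the paper) computes the k-step ratio for a batch of paths from the nominal draws.

```python
import numpy as np


def mv_likelihood_ratio(Z, Lam, dt):
    """k-step dP/dP~ for a D-dimensional drift shift Lam.

    Z   : (N, k, D) nominal draws z_{n,i} for the k steps of each path
    Lam : (D,) drift adjustment Lambda
    """
    k = Z.shape[1]
    Lam = np.asarray(Lam, dtype=float)
    lin = np.sqrt(dt) * (Z.sum(axis=1) @ Lam)   # sqrt(dt) * Lambda^T sum_i z_{n,i}
    return np.exp(-lin - 0.5 * k * dt * (Lam @ Lam))
```

The same function can therefore be used to weight the shifted regressions of the LSM-s in the multivariate case, with one call per time step (k = 1) or over a whole path.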
In addition to multivariate generalizations, it may also be desirable to consider importance sampling in models with fat-tailed distributions and variance dynamics for the underlying assets, as exhibited in financial time series. When it comes to accommodating non-Gaussian fat-tailed returns, it is important to note that importance sampling techniques are not at all restricted to GBMs, as they only require that the nominal distribution be absolutely continuous with respect to the importance distribution. In such cases, the Radon-Nikodym derivative is well defined, and if it can be computed in a cost-effective way, importance sampling yields appreciable variance reduction for security pricing (see Boyle et al. (1997)). When it comes to allowing for more flexible variance dynamics, Glasserman et al. (1999) also note that their approach can be used with the Hull and White (1987) stochastic volatility model or the Cox et al. (1977) mean-reverting square-root volatility model for interest rate derivatives. For all these cases, the optimization of the drift term can be carried out with a gradient descent for any model specification or option payoff (Su and Fu 2000).
The shifted regression method proposed herein for American options can, in principle, be used with any of the extensions outlined above (and combinations thereof) simply by adjusting the likelihood ratio accordingly. Extensions to the multivariate setting are particularly straightforward to implement. However, to our knowledge, applications of importance sampling to American option pricing in the presence of stochastic volatility and conditional heteroskedasticity (i.e., GARCH-type models) have yet to be thoroughly examined in the literature. The extension to the class of GARCH models, in particular with fat-tailed conditional distributions, is both interesting and challenging, and it is currently the subject of ongoing research. 7
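To make the idea of regressing under the importance measure concrete, here is a minimal single-asset sketch of a shifted LSM pricer for an American put. It is our illustration rather than the authors' implementation: the basis $(1, S/K, (S/K)^2)$, the restriction to in-the-money paths, and the choice of regressand are assumptions. The regressand is the discounted cashflow multiplied by the likelihood ratio from the regression date to the exercise date; because the tilt acts independently on each time step, the conditional expectation of this product under the importance measure equals the nominal continuation value. The final price weights each path by the likelihood ratio from date 0 to its exercise date. The function name `shifted_lsm_put` is hypothetical.

```python
import numpy as np


def shifted_lsm_put(S0, K, r, sigma, T, J, n_paths, lam=0.0, seed=0):
    """LSM price of an American put exercisable at dates j * dt (j = 1, ..., J)
    or immediately, with paths simulated under a constant Brownian drift shift
    `lam` and regressions run directly on the shifted paths (lam = 0 is plain LSM).
    """
    rng = np.random.default_rng(seed)
    dt = T / J
    z_tilde = rng.standard_normal((n_paths, J)) + np.sqrt(dt) * lam   # shifted increments
    log_inc = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z_tilde
    S = S0 * np.exp(np.cumsum(log_inc, axis=1))             # S[:, j - 1] holds S_j
    step_lr = np.exp(-np.sqrt(dt) * lam * z_tilde + 0.5 * dt * lam**2)  # one-step dP/dQ
    disc = np.exp(-r * dt)

    # y = cashflow discounted to the current date times the LR from that date to exercise
    y = np.maximum(K - S[:, -1], 0.0)
    for j in range(J - 1, 0, -1):
        y = y * disc * step_lr[:, j]            # roll back from date j + 1 to date j
        exercise = np.maximum(K - S[:, j - 1], 0.0)
        itm = np.flatnonzero(exercise > 0)
        if itm.size < 3:                        # too few points for three coefficients
            continue
        x = S[itm, j - 1] / K
        X = np.column_stack([np.ones_like(x), x, x**2])
        beta, *_ = np.linalg.lstsq(X, y[itm], rcond=None)   # regression on shifted paths
        stop = exercise[itm] > X @ beta                      # exercise beats continuation
        y[itm[stop]] = exercise[itm[stop]]      # exercise now: LR from j to j equals 1
    y = y * disc * step_lr[:, 0]                # roll back to date 0 (LR of step 1)
    return float(max(K - S0, y.mean()))
```

With `lam=0.0` the function reduces to a standard LSM estimator; a negative `lam` pushes paths toward the exercise region of the put. For example, with `S0=36`, `K=40`, `r=0.06`, `sigma=0.2`, `T=1`, 50 exercise dates and 50,000 paths, both settings should land near the value of about 4.478 reported by Longstaff and Schwartz (2001) for this contract.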

Conclusions
This paper proposes a new method for pricing American options when using importance sampling as a variance reduction technique in simulation-and-regression based methods. Our suggested method uses regressions under the importance measure directly to determine the optimal early exercise strategy. The standard method for using importance sampling is instead to use regressions under the nominal measure.
We show that our proposed method offers several important benefits. First, it requires no prior knowledge of regression coefficients: because the regressions are carried out directly on the importance paths, our approach removes the overhead of estimating an exercise strategy under the nominal measure. Second, it reduces the bias that plagues the standard LSM method, both with and without importance sampling, by improving the accuracy of the estimated exercise strategy. Third, the shifted LSM approach preserves the variance reduction achieved by the standard implementation of importance sampling.
As a result, when a low number of paths is used, our method always improves on the standard method and reduces the average root mean squared error of estimated option prices by 22.5%. Hence, the Monte Carlo valuation of deep out-of-the-money options, knock-in options, or barrier options with American-style features may greatly benefit from the approach introduced in this paper. Moreover, our method lends itself to simple parallel implementations in which many independent small-sample jobs are farmed out to processors and their estimates averaged: averaging reduces the variance but not the bias of the individual estimates, so the reduced bias of our estimator compared to standard (unshifted) estimators carries over directly to the combined estimate.

Acknowledgments: The authors would like to thank three anonymous referees for valuable comments and SHARCNET for providing computational resources.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Tables with Detailed Results
This appendix provides detailed results for the performance of the various methods used in this paper under the two changes of measure considered. Tables A1 and A2 present bias results, calculated with respect to a Cox et al. (1979) binomial tree solution, for the LSM, IS LSM, and IS LSM-s estimators across the 28 options considered, for different numbers of paths and repetitions, with the optimal drift and the GHS approximate drift, respectively. Tables A3–A6 present the corresponding results for the standard deviation and the root mean squared error (RMSE).

See, e.g., Moreni (2003) and Lemieux and La (2005).

3 Note that Boire et al. (2021) show that this approach is not appropriate when importance sampling is combined with other variance reduction techniques. In this situation, the variance is no longer convex in the change of drift, and alternative methods, such as a grid search, are required.

4 Since there is no closed-form solution to the price of an American option, we use as the benchmark a very precise approximation of the price from a Cox et al. (1979) binomial tree with a large number of time steps.

5 Such a decomposition is easily computed with, for example, a Cholesky factorization.

6 They also present a more general framework for importance sampling in the Heath et al. (1990) model.

7 We thank one of the reviewers for suggesting this extension.