Between Scylla and Charybdis: The Bermudan Swaptions Pricing Odyssey

: Bermudan swaptions are options on interest rate swaps which can be exercised on one or more dates before the ﬁnal maturity of the swap. Because the exercise boundary between the continuation area and stopping area is inherently complex and multi-dimensional for interest rate products, there is an inherent “tug of war” between the pursuit of calibration and pricing precision, tractability, and implementation efﬁciency. After reviewing the main ideas and implementation techniques underlying both single- and multi-factor models, we offer our own approach based on dimension reduction via Markovian projection. Speciﬁcally, on the theoretical side, we provide a reinterpretation and extension of the classic result due to Gyöngy which covers non-probabilistic, discounted, distributions relevant in option pricing. Thus, we show that for purposes of swaption pricing, a potentially complex and multidimensional process for the underlying swap rate can be collapsed to a one-dimensional one. The empirical contribution of the paper consists in demonstrating that even though we only match the marginal distributions of the two processes, Bermudan swaptions prices calculated using such an approach appear well-behaved and closely aligned to counterparts from more sophisticated models.


Outline
Interest rate contracts constitute the largest chunk of the global over-the-counter derivatives market, with an estimated total notional outstanding of just over $480 trillion, of which roughly $350 trillion is traded in swaps and almost $50 trillion is traded in interest rate options. The total market value of interest rate swaps and options tallied together exceeds $8 trillion-or about 10% of world GDP.
Anticipating a more formal discussion, an interest rate swap is simply a contract in which two parties exchange two flows of payments ("legs")-one calculated on the basis of a fixed, pre-determined rate (coupon), and one calculated on the basis of a floating interest rate (typically the interbank market benchmark, often referred to as-London Interbank Offered Rate (LIBOR). Swap options, or swaptions, give the owner the right-but not the obligation-to enter into an interest rate swap with a given final maturity and a predetermined strike rate at a certain, specific time in the future called option expiration and corresponding to the fixing date of the swap. Swaptions come in many shapes and forms, some more complex ("exotic") than others. A prototypical example of an exotic interest rate derivative is a Bermudan swaption. Somewhat, the name has nothing to do with geography, but rather with the option exercise profile, as Bermudans, unlike their simpler-so-called European-counterparts, can be exercised not just on a single date, but on any one of a given set of dates prior to the final maturity of the swap (an option that can be exercised at any time is called American).
While we do not know exactly how much of the total market value of interest rate derivatives is made up by Bermudan swaptions, such a breakdown would not do full justice to their importance anyway, as the numbers cited do not account for options and option-like features of Bermudan nature built into simpler fixed income instruments, like callable bonds and mortgage backed securities. Consider that roughly 60% of bonds included in the Bloomberg Barclays Global Aggregate Credit Index-a widely followed benchmark for the corporate bond universe-are issued with call provisions giving issuers the right to repurchase them at par at certain dates in the future. The call provision may be added to increase the coupon, and hence the perceived attractiveness of the issue, but it rarely stays with the issuer, who will instead typically seek to strip the call option and sell it-e.g., in the form of a cancelable swap-to an exotic dealer desk to lower the effective cost of the issuance. For example, Goldman Sachs analysts estimate that Bermudans embedded in 70% of the so-called Formosa bonds (bonds that are denominated in a foreign currency, but registered and listed on the Taipei Exchange) acquire a new life being passed on to exotic dealer desks who tend to actively manage this risk by selling European swaptions with risks most closely mimicking that of the Bermudan (effectively providing "Vega supply" to the market) ("Formosa issuance and USD rate vol supply", Global Rates Insight, Goldman Sachs, 13 February 2020). Other notable natural buyers of such stripped Bermudan swaptions are mortgage originators or issuers of mortgage-backed securities (such as, e.g., Fannie Mae and Freddie Mac in the US), who seek to offset the risk of prepayment options inherent in typical fixed rate mortgage products.
What is special about Bermudan swaptions, and what explains their important role in the financial system, is the complex risk profile stemming from the multiple exercise feature. It turns out that accounting properly for this exotic feature poses a considerable challenge and has been a major driver of the (r)evolution in interest rate modeling-a story that we intend to tell below.
From a mathematical standpoint, the early exercise property gives rise to a free boundary problem not unlike the so-called Stefan problem encountered when analyzing the temperature of water bordering a melting block of ice: if there are N exercise dates T 1 , T 2 , ..., T N , then at each T i , i ∈ {1, 2, ..., N}, the Bermudan holder must decide whether to retain the option or exercise by entering into the i-th co-terminal swap. Naturally, for each T i there will be some critical value of the underlying instrument below which it no longer makes sense to exercise. This reduces the problem of valuing the Bermudan to determining the optimal exercise boundary that separates the "region" where exercise is optimal from the "region" where it is optimal to continue holding the swaption-with obvious analogies to Stefan problems for the location of the interface between water and melting ice (see, e.g., [1]). Such parallels have been known for decades and fruitfully exploited, at least in the case of early exercise boundaries for American-style equity options. And while an early exercise boundary as such cannot be solved for in closed form, it can be represented using an integral equation to be solved iteratively [2] and its upper and lower bounds can be rigorously determined. Barles, Burdeau, Romano, and Samsoen [3] even show that near expiration these upper and lower bounds approach each other, which leads them to postulate an approximation for the value of the option.
Unfortunately, those results do not carry through to interest rate options in a straightforward way. For one, the partial differential equation governing the evolution of the stock option price (the Black-Scholes Partial Differential Equation(PDE)) has been derived under the assumption that the underlying instrument, a share of common stock, is a tradable asset. After all, as shown by Black and Scholes [4], the self-financing dynamic trading strategy replicating the final payoff of the option is implemented by taking positions in, i.e., buying and selling, units of the underlying instrument. That underlying instrument may be-as initially considered-a stock or a commodity futures contract, but not an interest rate that is not tradable-one cannot "buy" or "sell" a yield on, say, a 10-year bond, a swap rate or the LIBOR. As it turns out, this obstacle can be overcome by a judicious application of the change-of-measure technique ( [5]; more on this below). But even if one can come up with an equivalent probability measure under which a given underlying interest rate is a martingale, and thus price the option by simply taking expectation of the terminal payoff, a more profound problem arises: how to ensure consistency between options on different interest rate underlyings? The crucial difference between interest rates and, say, stocks-other than tradability-is that individual interest rates are simply pieces of a greater whole called the term structure, bound together by no-arbitrage relations and complex correlation patterns. This becomes immediately clear when we consider a swaption where the underlying is a (stochastic) average of a number of forward LIBOR rates. As explained above, a Bermudan can even be viewed as a "best of" chooser option to optimally select and enter into one of potentially many co-terminal swaps spanned by the contract. As such, it is by definition an instrument driven by complex dependence patterns among interest rate underlyings-a phenomenon we shall have more to say on later. All this implies that handling interest rate derivatives, and Bermudans especially, requires a consistent model of the term structure. And since different approaches have been proposed to the modeling of interest rates, so too there have been-and still are-different ideas about pricing and risk-managing Bermudan swaptions.
Although it is impossible within the confines of this essay to do justice to the large and still growing literature in the area of interest rate modeling ( [6,7] and vol. II are very good general references), we may, at least at this stage, identify two broad schools of thought: low-factor short-rate models and multi-factor market models. The first camp, drawing inspiration from Vasicek [8], starts from a fundamental assumption that the dynamics of the whole yield curve is driven by the "instantaneous spot rate", or short rate, r(t), at which interest accrues continuously in a locally risk-free way. This definition leads to the so-called money market account, B(t), which evolves according to: and induces the risk-neutral measure Q making all discounted asset prices martingales. Although the short rate is a purely artificial mathematical construct (all be it with an intuitive interpretation), it determines prices of all discount bond that are simply Q-expectations of the future path of r(t), T t r(s)ds. And since all interest rates are derived from bond prices, it follows that the entire discount curve can in principle be parameterized by stipulating a suitable stochastic process for r(t)-a point supported by a large body of empirical evidence suggesting that the first principal component typically accounts for about 90% of the variation in the entire term structure. With two factors-two state variables feeding through to the evolution of r(t)-one would expect to cover well over 95% of yield curve variation. Apart from parsimony and consistency, short rate models have another considerable virtue that came to explain their initial success as tools-of-choice for the valuation of path-dependent interest rate instruments such as Bermudan swaptions: they are relatively easy to implement numerically in lattices, i.e., finite differences or trees, which can efficiently handle the early exercise boundary through backward induction.
However, these virtues do not come without vices to match. With only one or two factors, it might be difficult to generate all practically relevant yield curve shapes (two Principal Components Analysis PCAs would still leave about 5% of variation unexplained) and match a large set of market prices-especially with intuitive and economically justifiable parameter values (using a large set of calibration instruments is necessary if one wishes to consistently apply a single calibrated model to many kinds of exotic interest rate derivatives). More importantly, a low number of factors (one or two) may limit the degree of de-correlation achievable among interest rates, artificially lowering prices of Bermudan swaptions (when correlation among rates is high, the right to choose among different swaptions is lower than in the case when interest rates are decorrelated and subsequent payoffs are unpredictable). Longstaff, Santa-Clara, and Schwartz [9] even thought that the errors and pitfalls of using single-factor models amounted to "throwing away a billion dollars" and in their influential paper argued for a move towards a multi-factor setting. In fact, the stage in that respect had already been set in the form of so-called LIBOR/swap market models [10][11][12] and their later extensions. Models in this class are generally expressed in terms of discretely compounded market rates (hence the term "market models"), each of which can be shown to be a martingale under an appropriate forward measure. This perspective naturally allows for multiple sources of randomness (often there are more than 50 state variables), with almost arbitrarily complex dependence structure, and offers many degrees of freedom to fit observed market patterns, even including the so-called implied volatility smiles (e.g., [13,14] or [15]; cf. also [16][17][18][19][20][21][22]). Whether the number of factors, in and of itself, is enough to improve the pricing of Bermudans has been persuasively contested [23] and we shall review the main arguments presented in this debate. It is clear, however, that while increasing dimensionality improves calibration, it does so at the cost of efficiency and numerical accuracy: with a high number of state variables, lattice techniques are no longer viable, and even if viable (cf. e.g., [24])-not really practicable. One therefore is left with Monte Carlo simulations (with some variance reduction technique to reduce bias). This presents an obvious problem: by evolving the underlying state variables forward in time, the simulation cannot easily determine the optimal time to exercise. And while efficient techniques of circumventing that problem have since been proposed (cf. especially the classic approach of [25,26] for a very good general reference), the exercise region and corresponding Bermudan swaption prices will necessarily be estimated-not exact.
The two approaches sketched broadly above, the single-and multi-factor ones, are like Scylla and Charybdis, the mythical sea monsters, lurking for the Bermudan swaption modeler: pass too close to Scylla, and your model shall crash against the rocks of computational challenges and estimation biases of the multidimensional setting; move towards Charybdis and risk getting sucked by a whirlpool of artificial parameter values and unachievable calibration targets. And yet, unlike in the classical Greek tragedy, we believe there is safe passage between the two hazards, and one that does not involve losing six strongest and bravest companions-a cost tragically borne by Odysseus. Thus, in what follows we try to tell the story-the odyssey-behind the evolution of Bermudan swaption pricing in greater detail, reviewing in a much more formal way the main ideas and solutions underlying both single-and multi-factor models. We discuss their relative merits and drawbacks, presenting especially the crux of the consensus-shattering debate between Longstaff, Santa-Clara, and Schwartz [9] and Andersen and Andreasen [23]. Finally, building on Gatarek and Jabłecki [27], we show our dimension reduction approach based on the concept of Markovian projection pioneered by Gyöngy [28] and independently adapted to the financial landscape by Dupire [29] under the guise of "local volatility". Our idea consists essentially in showing that for purposes of pricing Bermudan swaptions, a potentially complex and multidimensional process for the swap rate can-using the tools of Markovian projection-be collapsed to a one-dimensional one. Naturally, the diffusions of the two processes will differ, but it turns out the early exercise boundaries will approximately agree, with pricing errors within normal bid-ask spreads.
The rest of the essay proceeds as follows. Section 2 presents notation and formally introduces the relevant interest rate instruments. Section 3 presents the evolution of term structure modeling, from single-to multi-factor approaches, zooming in on implementation and calibration aspects. Section 4 then discusses the specific challenges related to applying the models in the context of pricing Bermudan swaptions, and revisits the issue of their factor dependence with some numerical tests. Finally, Section 5 presents a novel argument in the factor-sensitivity debate, demonstrating an elegant extension of the classic result due to Gyöngy [28] on Markovian projection and dimension reduction. Section 6 draws conclusions.

Notation, Modeling Setting and Instruments
Consider a continuous-time economy together with a probability space (Ω, F , P) with a frictionless and arbitrage-free market for zero coupon bonds, i.e., contracts that guarantee holders the payment of one unit of currency at maturity, with no intermediate payments.
Let P(t, T) be the time t price of a zero coupon bond maturing at time T such that P(t, t) = 1 for every t. We assume that such a P(t, T) exists for every 0 < t < T < ∞ and for a given t, P(t, T) is differentiable with respect to maturity time T. The instantaneous forward rate f (t, T) with maturity T contracted at t is defined by: The instantaneous spot rate r(t)-i.e., the short rate-has already been loosely defined above and can now be formalized as: As discussed, r(t) can be thought of as capturing the locally risk-free return from a continuously compounded money market account B(t) ≡ exp t 0 r(s)ds with dynamics dB(t) = r(t)B(t)dt, B(0) = 1. The significance of the money market account in modern finance stems from the fact that it establishes a connection between the economic concept of absence of arbitrage and the mathematical property of existence of the equivalent martingale measure (risk-neutral measure). Specifically, as shown by Harrison and Kreps (1979) and Harrison andPliska (1981, 1983), postulating that our market is arbitrage-free is equivalent to stating that there exists a measure Q on (Ω, F ), equivalent to P, such that all asset prices discounted by B(t) are Q-martingales. In other words, in our economy, for any T we have: so that time-zero bond (asset) prices are Q-expectations of their terminal values (payoffs). We will explore (4) in Section 3 in greater detail as it gives rise to an important class of term structure models, but for now let us simply observe that when we allow for stochastic interest rates-as we naturally should when dealing with interest rate derivatives-the martingale measure Q may not be the most convenient or natural to work with. Thus, Geman et al. (1995) prove that for any positive non-dividend-paying asset N (so-called numeraire) there exists a measure Q N , which is equivalent to the original martingale measure Q, making the prices of all claims normalized by N Q N -martingales. Furthermore, the Radon-Nikodym derivative defining Q N is given simply by B(T)N(T)/N(0). We now move on to introducing other instruments. For this we define a uniformly spaced tenor structure: and set δ n = T n − T n−1 for n = 1, ..., N. A spot LIBOR rate over the interval (T n−1 , T n ) is given by the formula: whereas its forward counterpart, forward LIBOR, is given by: F n (t) = F(t, T n−1 , T n ) = P(t, T n−1 ) − P(t, T n ) δ n P(t, T n ) .
A fixed-for-floating interest rate swap (IRS) with unit notional, fixed rate (coupon) K, and a specified tenor structure T = {T n } β n=α+1 is a contract whereby two parties exchange differently indexed cashflows over a pre-agreed time span. Specifically, on each date T n ∈ T , the fixed leg pays δ n K, whereas the floating leg pays the floating LIBOR rate given by (6). When the fixed leg is paid, the IRS is called a "payer", and conversely the swap is called a "receiver." An interest rate cap/floor is a payer (receiver) IRS in which each payment is executed only if it has positive value. It has become market practice to quote prices of caps in terms of Black's implied volatilities, i.e., parameters σ α,β , which plugged into the formula: give the market price. The Black model itself is inconsistent with market quotes, which tend to exhibit a pronounced implied volatility smile/skew. The forward swap rate S α,β (t) corresponding to the tenor structure T is the rate in the fixed leg that sets it equal to the floating leg and hence makes the net present value of the transaction equal zero: The portfolio of zero-coupon bonds in the denominator of Equation (9) is the so-called annuity factor, for which we will sometimes use a continuous-time definition, writing Note, in passing, that since N α,β (t) is positive and P(t, T α ) − P(t, T β ) is a tradable asset (a portfolio of "long" the T α bond and "short" the T β one), it follows that S α,β (t) is a martingale under Q α,β induced by N α,β (t).
A European payer (receiver) swaption with strike K, maturity T α and tenor T β − T α (henceforth referred to also as T α × (T β − T α ), or T α -into-(T β − T α )) is simply an option that gives the holder the right to enter at T α into a payer (receiver) swap that matures at T β and entitles to pay (receive) a fixed rate K in exchange for a floating LIBOR rate on the tenor dates T . Thus, the time zero price of a European payer swaption with unit notional is given by: Analogously, as with caps/floors, market practice is to quote swaption prices in terms of Black (or more recently-Bachelier) implied volatilities; i.e., parameters σ α,β , which plugged into the Black formula retrieve the market price, e.g., for a payer swaption: Again, from a theoretical point of view, applying the Black model to price swaptions would be inconsistent as the swaptions market exhibits a pronounced implied volatility smile.
Finally, a Bermudan receiver (payer) swaption is an option to enter at any time T i , i ∈ {α, α + 1, ..., β − 1} into a swap that terminates at T β and gives the holder the right to receive (pay) a pre-determined fixed rate K in exchange for floating LIBOR. The period up to T α is called the lockout or no-call period, and hence a Bermudan swaption with final exercise date T β−1 and first exercise T α is often called "T β no-call T α ", or "T β ncT α ." For example, a 11nc1 swaption with annually spaced exercise dates can be exercised at the beginning of any year, starting from year 1. By exercising the option, the holder enters a swap starting at the time of exercise (i.e., years 1, 2, 3,..., 10) and ending at year 11. Accordingly, the price of the "T β no-call T α " Bermudan receiver is given by:

The Modeling Panorama
This section presents an overview of the main approaches to modeling the dynamics interest rates and their term structure. We focus on the distinction between (low-factor) short-rate models and the (multi-factor) market models before presenting the main arguments in the debate as to which school of thought is more appropriate for dealing with contracts with early exercise features. Our treatment is purposefully general and selective, so we refer readers to the excellent magnum opuses of Brigo and Mercurio [7] or Andersen and Piterbarg [6] for a more detailed account.

One-Factor Short-Rate Models
As we have seen in (4), in an arbitrage-free market bond prices are given as riskneutral expectations of a functional of the process of the short rate. And since all kinds of interest rate instruments are themselves defined in terms of bond prices, the whole term structure, or zero-coupon curve, can, in principle, be characterized by the distributional properties of a single state variable-the short rate.

Vasicek
In a seminal contribution, Vasicek [8] leveraged the latter idea by postulating that the dynamics of the short rate (in the real-world measure) is governed by a stochastic differential equation made up of a deterministic mean-reverting component and a stochastic part with a constant diffusion coefficient (later on a variant of the same model with a diffusion coefficient proportional to the short rate itself was proposed by Cox et al., 1985). Formally, Vasicek [8] assumed that r follows an Ornstein-Uhlenbeck process of the form (for practical purposes we present the model in the risk-neutral setting, equivalent to the original formulation provided a specific form of the market price of the risk parameter is chosen): where k, θ and σ are positive constants representing, respectively, the mean reversion speed, the long-term average rate and volatility. The above equation can be integrated to express r(t) as a function of its value at any chosen prior time point s: Thus, r(t) is normally distributed with a conditional mean given by r(s)e −k(t−s) + θ 1 − e −k(t−s) and variance σ 2 2k 1 − e −2k(t−s) . Initially, the positive probability of r < 0 drew some criticism of the model but this property can hardly be held against it in current conditions. And while negative values may appear, they are unlikely to persist, as the instantaneous drift term k[θ − r(t)] ensures that the process is pulled towards its long-term (positive) mean θ at a reversion speed k. Another pleasant feature of the model is that the expectation (4) can be calculated explicitly yielding bond prices of the form: where: Hence, the whole zero-coupon curve is given by a function T → P(t, T; k, θ, σ, r(t)). To use the model for valuation purposes, we need to parameterize this function to make sure that the model incorporates a given observed zero-coupon bond curve T → P M (0, T). This typically consist in running an optimization to find {k, θ, σ} such that T → P(t, T; k, θ, σ, r(t)) is sufficiently close to the market. More generally, consistency with the market is not an inherent feature of the model, but rather needs to be ensured by a judicious choice of parameters. Unfortunately, this is sometimes rather difficult to do, and even outright impossible for some yield curve shapes (e.g., a so-called inverted yield curve typically considered to be a leading indicator of recessions).

Hull and White
The inability of the model (15) to reproduce some typical yield curve shapes presented a major obstacle: if it is impossible to fit the current term structure of interest rates, how can the model be relied upon to price consistently more complex products? To remedy this problem Hull and White [30] have suggested a simple yet profound extension of Vasicek [8] assuming the following dynamics for the short rate: with the mean reversion now depending on time (the most general version of the Gaussian single-factor model allows all parameters to be functions of time; see [6], vol. II).
Integrating (17) now gives: which, plugged into (4), implies after straightforward, if tedious, algebra that ensuring consistency with the spot curve requires: The last term in the above is an adjustment required to preclude arbitrage [31], and it tends to be very small, so that the drift of the short rate is approximately -and hence r follows the slope of the time zero instantaneous forward curve. Moreover, much like in the Vasicek model, we can write down bond reconstitution formulas explicitly: where this time:

Black and Karasinski (1991)
As already hinted above, the possibility to generate negative interest rates has historically been viewed as an important drawback of any interest rate model. And while the Hull-White model improved upon the original Vasicek version in several important respects, it still implies a normal distribution for the short rate, and with it a non-zero probability of negative rates. Accordingly, various attempts have been suggested to overcome that feature, perhaps most notably by Black and Karasinski [32], who postulated the following general process for the logarithm of r(t) under the risk-neutral measure: whereby of course typically one would set σ(t) = σ and k(t) = k, i.e., positive constants. A straightforward application of Ito's lemma shows that: with the solution, for each s ≤ t, Now r(t) is lognormally distributed, precluding the possibility of negative rates. This certainly made sense in the 1990s and early 2000s given the market practice of quoting caps and swaptions prices in terms of (lognormal) Black volatilities, and goes a long way towards explaining the general appeal and early popularity of the model among market practitioners. However, a clear drawback of the model is that it no longer provides closed form bond reconstitution formulas, which renders it much less analytically tractable than Hull and White [30].

Extensions: Two-Factor Hull-White and Cheyette Models
Although relatively simple, intuitive and analytically tractable, single-factor models also tend to share some undesirable properties that become apparent especially when one considers their application to pricing instruments with more involved payoffs. It is perhaps easiest to appreciate this point by recalling that in the Vasicek [8] model all bond prices can be computed directly from Equation (16) and-by extension-all zero coupon yields/spot rates are given as simple affine functions of a single state variable r. This implies in particular that at any instant the model-implied interest rates of all maturities are perfectly correlated-hardly an accurate description of economic reality. Exactly how big a problem that is for the valuation of Bermudan swaptions has been at the very core of the famous debate in the literature (i.e., [9,23] and others), which has hugely influenced both the theory and practice of pricing Bermudans, and one to which we shall return in greater detail in Section 4.2. However, at this point let us simply acknowledge that striving for a more realistic correlation pattern has driven interest in multifactor models, of which we discuss below two important examples: the two-factor Hull-White model and the quasi-Gaussian model of Cheyette [33].

Two-Factor Hull-White (2FHW)
The two-factor Hull-White model (described in [34], see also the classic exposition in [7]) is described by the following system of stochastic differential equations (without loss of generality we change the formulation slightly to better present the two additive factors): where 0 < k 1 < k 2 , σ 1 (·), σ 2 (·) are either constants or deterministic functions of time, θ(t) is deterministic, and finally-(W 1 , W 2 ) is a two-dimensional Brownian motion with instantaneous correlation of the form dW 1 (t)dW 2 (t) = ρdt with ρ constant and obviously ρ ∈ [−1, 1]. Simple integration shows that: which in turn implies that letting t → ∞ both our state variables mean revert to their initial values of zero. It follows that: mean reverts to θ(t). Despite the two-dimensional setting, many features carry through from the single factor case. In particular, we can derive the following bond reconstitution formula: where the convexity correction term V (t, T) can be calculated exactly to be a deterministic function of k 1 , k 2 and ρ (see, e.g., [7] for details). Using (25) we can also determine the functional form of θ(t), which guarantees matching exactly the time-zero term structure of discount factors by solving the following:

Cheyette
Another two-factor (or perhaps more accurately, one-and-a-half-factor) representation of the term structure is afforded by a family of models considered originally by Cheyette [33], Ritchken and Sankarasubramanian [35] and later also others (cf. [36][37][38][39]). These models start from the Heath, Jarrow, and Morton [40] specification of instantaneous forward rates dynamics: and assume that the instantaneous volatility function is separable into a time-and ratedependent term-i.e., the short rate volatility σ(t)-and a maturity-dependent term, h(T): Such volatility of the term structure obviously implies: where H(T) = T 0 h(u)du. Substituting (28) and (29) into (27) and integrating yields: so that: Solving the above for the stochastic integral and plugging into (30) gives: Let x(t) denote the deviation of the short rate from its value implied by the forward curve at inception: and y(t) denote the cumulative short rate volatility: The forward rate can now be written in terms of just these two state variables, x(t) and y(t), as: which, along with (2) and the fact that , yields the following reconstitution formula for the price of any zero coupon bond: As suggested by Cheyette [33] and Ritchken and Sankarasubramanian [35], a natural choice is to set h(t) = exp(−κt) with κ being some small and constant number. Then (36) becomes: Furthermore, the dynamics of the two state variables can be expressed in terms of their current values through differentiation of (33) and (34) as: Thus, with the two state variables (x(t), y(t)) we obtain a two-dimensional Markovian model of the term structure. The yield curve model features an upward drift representing forward curve steepening due to volatility (a "convexity correction") as well as mean reversion captured by κ. Note that although y(t) does not have a diffusion term, it is not deterministic-i.e., not known with certainty at time 0-as it captures information revealed over the time interval [0, t]. Thus, the Cheyette model can be thought of as having one-and-a-half factors. So far, and in the original formulations of the model [33,35], no restrictions were placed on the volatility of the short rate. The Markovian representation is valid even if σ(t) is stochastic or depends on the entire term structure at time t. The fundamental problem in calibration and pricing consists of course in determining the function σ(·), allowing it to be either state-and time-dependent (as in [36,39,41] or [38]) or fully stochastic [42].

Multi-Factor Market Models: LMM and DD-LMM
The final important category of models that we discuss are the market models of the type originally proposed by Brace, Gatarek, and Musiela [10], Jamshidian [11], Miltersen, Sandmann, and Sondermann [12] and later developed and extended in multiple directions by many others. The common conceptual thread of these models is a departure from modeling an unobserved, artificial concept such as the short rate in favor of quantities directly observable and quoted by the market-the LIBOR or swap rates. To fix ideas, consider a tenor structure such as in (5) on which a family of forward LIBOR rates is defined for each of the expiry-maturity pairs (T n−1 , T n ), n = 1, ..., N as in (7) (each rate is "alive" until expiration whereupon it becomes the spot LIBOR rate introduced in (6)). Since the discount bonds P(t, T n−1 ) and P(t, T n ) are tradable assets, F n (t)P(t, T n ) expresses the price of a tradable asset and thus must be a martingale under the forward measure Q n associated with the numeraire P(·, T n ). Thus it is natural to assume the following driftless dynamics for each F n : where W n (t) is a standard Brownian motion under Q n and the instantaneous correlation between individual shocks is given by: Even such a brief introduction allows us to comment on the main virtues of the model. First, the driftless lognormal dynamics (40) ensures that the model-implied prices of caps/floors are given by Black's formula-in line with market practice (an analogous model written for the forward swap rate prices swaptions with Black's formula-again consistently with market practice). None of the previously discussed short rate models could boast such a feat. Second, unlike the one-factor short rate models [30,32], which unrealistically imply perfect instantaneous correlation between forward rates, LMM can easily produce a much richer and more realistic instantaneous correlation structure. On the other hand, with potentially a large number of factors involved, recombining trees or PDE methods is not feasible, and pricing needs to be based on Monte Carlo simulations. To this end, the dynamics of all rates need to be expressed in a single measure Q i : with drifts calculated using the measure change technique as: Although the original LMM formulation is not consistent with the implied volatility smile/skew observed in interest rate markets, and precludes the existence of now pervasive negative interest rates, a relatively straightforward extension can be introduced that addresses both issues. The quick fix entails shifting the lognormal dynamics (40) by a forward-rate specific parameter α n (typically treated as constant) so that: The distribution of each F n (t) thus becomes a shifted lognormal distribution and while the quantities (F n (t) + α n ) still have to be positive, rates as such are permitted to be negative and bounded from below by −α n . More importantly, a shifted lognormal process can consistently generate a (negatively) skewed pattern in Black implied volatilities. Yet another feature of market reality to take into account, especially in the post-crisis environment, is the existence of "basis" between different money market curves reflecting LIBOR rates' lack of risk-free character (see [43] esp. chap. 4 for a good general overview). In practice, this leads to so-called dual-curve pricing, whereby the true risk-free rates (e.g., overnight index swaps, OIS) are modeled and evolved under Monte Carlo and the basis adjustment is applied to the OIS rates in calibration and the pricing of derivatives is derived from LIBOR-based underlyings. In the simplified case of a constant (time-independent) basis the forward LIBOR rates are derived by adding a fixed spread to the simulated paths of the forward OIS rates with no need for extra simulation. To simplify presentation and given that counterparty credit risk issues are not central to the main thrust of the argument, we shall simply assume a flat zero basis. Thus, the model parameters to be calibrated are the functions σ n (·), the shift values α n and the instantaneous correlations ρ n,m . For some products this would entail finding the best fit among potentially hundreds of "free" parameter candidates, rendering the whole procedure prohibitively costly. Therefore, for practical purposes, parametric forms for the volatility term structure and correlation are typically imposed (see, e.g., [44] or [7] for a good discussion of potential choices). In line with well-established practice, we opt for a piecewise constant volatility function (leaving us with N(N + 1)/2 parameters), and Rebonato's classic two-parameter full-factor correlation structure of the form ρ n,m = ρ ∞ + (1 − ρ ∞ ) exp(−ρ dec |n − m|), where ρ ∞ is the base correlation to which off-diagonal matrix entries exponentially decay at a rate of ρ dec . To price Bermudans in LMM (in the displaced diffusion form), we calibrate the model to the entire swaptions matrix, including both ATM and OTM volatilities, making use of approximate swaption pricing formulas [45,46]. Specifically, we take advantage of the fact that the swap rate is a stochastic average of forward rates and thus follows itself in approximately (shifted) lognormal dynamics.

Bermudan Swaption Pricing
Before we turn to pricing Bermudan swaptions more formally, it is useful first to consider some stylized facts related to the economic risk profile of the instrument. As is evident from the preceding discussion, each Bermudan swaption spans a basket of European co-terminal options, all struck at the same rate K, of which only one can be exercised. This basket nature of Bermudan swaptions has two implications.
First, compared to European counterparts, Bermudans typically have much higher sensitivity to changes in implied volatility, i.e., Vega, and their volatility risk profile is also more distributed along the whole term/tenor structure, rather than concentrated at a single point. On the one hand this underscores the attractiveness of Bermudan swaptions as a hedging instrument for investors in callable products, but on the other it suggests that hedging or replicating the risk profile of a Bermudan requires multiple European swaptions to match the former's Vega buckets. For example, the 11nc1 Bermudan receiver swaption priced using current data and a calibrated HW1F model has 1 bp Vega (defined as the change in market value for a 1 bp shift in volatilities of the co-terminal swaptions) of 0.06% of notional, while Vegas of its co-terminal Europeans range from 0.009% (10Y × 1Y) to 0.048% (3Y × 8Y). More importantly, as shown in Figure 1 for a 20Y Bermudan receiver swaption, the volatility risk profile is distributed along the spectrum of term/tenor buckets.
Second, to preclude straightforward arbitrage, a Bermudan swaption must be at least as valuable as any European swaption spanned by it. Consequently, the price of the most expensive European swaption spanned by a given Bermudan tenor structure must be the lower bound on the price of the Bermudan swaption itself. This valuation difference between a Bermudan and its Most Expensive European (MEE) swaption-called the Bermudan-MEE gap, or B/E basis-is a useful simple handle for the economic value of the Bermudan exercise right, i.e., the privilege of having all other exercise possibilities available. Indeed, Qu [41] suggests analyzing the MEE basis in historical perspective as an indication of potential pricing anomalies and a handle for the model risk inherent in the chosen Bermudan pricer (see also [47]). Figure 2 shows the price of the 10Y Bermudan receiver plotted against the prices of its co-terminal European swaptions, all struck at the same rate, K = 1.66%. Evidently, the Bermudan price is higher than each of the Europeans in the "basket" and the gap is tightest for the 3Y-into-8Y swaption, which is therefore the MEE. Both the size of the MEE gap and the MEE swaption itself are likely to depend on a combination of market factors, most importantly yield curve level and steepness, as well as forward swap rate volatilities and correlations. For example, while the 3 × 8 swaption is the MEE for the 1 × 10 Bermudan at current prices, under a significant fall in market interest rates, a 1 × 10 swaption would become the MEE, and should rates rise a 6 × 5 European would become the MEE. The MEE basis in these scenarios would fluctuate from 30 to 200 bp (Figure 3).

Numerical Methods for Pricing Models
Consider now a tenor structure T = {T n } β n=α and a "T β no-call T α " Bermudan receiver swaption introduced above with time t value RBS α,β (t, K). Assuming no prior exercise, at any time point T n the swaption holder has the right to receive the exercise value V e of the swaption, i.e., present value of the underlying swap: The exercise value has to be compared to the so-called continuation value, V c , of holding the option beyond T n : The value of the Bermudan swaption can now be given in terms of (45) and (46) via a dynamic programming recursion: for j = β − 2, β − 3, ..., n. The evaluation of (47) proceeds backward in time: at T β−1 the value of the Bermudan is known and determined by the standard swaption payoff. This allows us to update the continuation value at T β−2 by discounting and comparing it to the exercise value prevailing at the time. The procedure of comparing "backwardly-cumulated" continuation value with the immediate exercise value and deciding upon a swaption exercise is repeated until the initial valuation date is reached, at which point the algorithm yields a price estimate for the Bermudan swaption. The calculation of the continuation value is clearly modeldependent and the choice of modeling framework itself often determines the scope of available numerical techniques. Again without pretense of being exhaustive, we shall consider and briefly discuss three main approaches-binomial/trinomial trees, finite difference schemes based on the derivative PDE and Monte Carlo simulation-traditionally applied in the context of the model categories described above. Our focus will not be on the technical details-for which we refer readers to Glasserman [26], Tavella and Randall [48] as well as Mitchell and Griffiths [49]-but rather on the connection between the modeling framework and numerical implementation techniques.

Tree-Based Methods
Perhaps the most classic approach for pricing derivatives with early exercise features, including Bermudan swaptions, is by building a binomial/trinomial tree for the underlying variable using its discretized SDE (forward induction method). Such methods are only feasible for low-dimensional processes (ideally one-or two-dimensional), which is why they have been extensively used in modeling one-factor short rate models, all of which can be approximated with lattices. Specifically, when approximating a one-factor interest rate diffusion on a trinomial tree, it is assumed that at each time t i from a discretized domain of time points 0 = t 0 < t 1 < ... < t n = T (note that the time points in general do not need to be uniformly spaced) there is a finite number of equispaced interest rate states r i,j (i corresponds to the time point and j to the space index), and at each node (i, j) the short rate can move to r i+1,k+1 , with probability p u ,r i+1,k with probability p m , or r i+1,k−1 , with probability p d , whereby the branching parameters are determined in such a way as to ensure that the conditional mean and variance match as closely as possible their model values. For example, in a simple brute-force implementation of the Hull-White model (17), the probabilities associated with the node (i, j) are: where µ i,j = (θ(i∆t) + κr i,j )∆t is the drift and we have omitted the dependence of transition probabilities on nodes to ease notation. Clearly, building a tree requires as an input the values for short rate volatility σ and mean reversion speed κ, while for numerical stability and convergence considerations ∆r = σ √ 3∆t (in fact, this procedure can be improved upon as explained, e.g., in Hull and White [50]).
The Hull-White model is particularly convenient for pricing Bermudans because knowledge of the short rate is tantamount to the knowledge of all bond prices through the closed reconstitution Formula (20), and thus once an interest rate tree is built along the lines sketched above, we can immediately value all relevant payoffs. Thus, starting from T β−1 , i.e., the last time the swaption can be exercised, and using the readily available bond prices, we calculate for all nodes j in the current column the value RBS j α,β (T β−1 , K) as: We then propagate these continuation values backward in time through the interval T β−1 = t d(β−1) > t d(β−1)−1 > ... > t d(0) = T β−2 using the Formula (46) and the transition probabilities, obtaining "backwardly cumulated" continuation values for each node: Once the exercise point T β−2 is reached we check the exercise opportunity by comparing V j c (T β−2 ) with the exercise value: and update the option value accordingly. Proceeding in this way we finally reach the first exercise date T α and subsequently the initial node of the tree with the time 0 swaption price. While the method sketched in general terms above can be optimized and/or generalized along various dimensions (e.g., varying the discretization step, or including backward propagation of bond prices where no analytical reconstitution formulas are available as in the Black-Karasiński model) it should come as no surprise that the tree can only efficiently handle low-factor short rate rate models. Building a binomial tree for the Hull-White twofactor model is still relatively straightforward; however the added layers of computational complexity seem to defeat the purpose of using a simple (simplistic?) model of the term structure in the first place.

Finite Difference Schemes
The trinomial trees discussed above are in fact a special kind of the so-called explicit finite difference scheme and as such are only conditionally stable. A more efficient method of handling Bermudans in low-factor models, by directly discretizing the partial differential equation for the swaption price, can be deployed as follows. Recall first that if x(t) is the state variable of the model-driving the discount curve and all its derivables-then by no-arbitrage arguments, the swaption price V(t, x) can be expressed as the risk-neutral expectation of the terminal payoff V(T, x(T)). Now, since V(t, x) is a martingale, its drift must be zero (in the appropriate measure), and hence applying Ito's lemma and writing down the dynamics dV, we find that V(t, x) must satisfy the following parabolic partial differential equation (PDE): where is the infinitesimal operator for the SDE describing the dynamics of the state variable. For example, in the special case of the Hull-White model, the European swaption price solves the following parabolic PDE: ∂V(t, r(t)) ∂t Analogously, in the Cheyette model, i.e., Equations (38) and (39), the pricing PDE takes the following form: ∂V(t, x(t), y(t)) ∂y .
Associated with both equations is of course the known terminal swaption payoff condition.
To solve Equation (51) numerically, we discretize it on some finite rectangular domain [0, T] × [x min , x max ], where the grid boundaries x min and x max are either determined a priori as some multiples of the variance of x(T) or within the PDE itself. The partial derivatives are then replaced by corresponding finite difference operators, with schemes differing i.a. in the nature of approximation. The central difference approximation of ∂V/∂t, known as the Crank-Nicolson scheme, is often the method of choice (second-order convergent in ∆t), whereas the backward approximation-given the explicit scheme-should be avoided for stability and convergence considerations. Specifically, for the Crank-Nicolson scheme, where I is the identity matrix, V(t) = [ V(t, x min ), ..., V(t, x max )] and V(t, x) stands for the approximation to the true solution V(t, x), and T is a tri-diagonal matrix of coefficients derived from approximating PDE terms. For a known V(t i+1 ), (53) is simply a linear system of equations to be solved by standard methods. Thus, starting from the terminal condition V(T, x)-where the swaption payoff is known-we use Equation (53) to iterate backwards in time until we get to V(0). Of course, since we are dealing with Bermudan swaptions, at each time step corresponding to exercise time, we additionally need to compare the rollback value with the exercise value-just as has been done on the trinomial tree.
Multi-dimensional PDEs for pricing within the two-factor Hull-White or Cheyette models can still be handled efficiently using the so-called alternating direction implicit method (ADI), introduced by Craig and Sneyd [51]. However, it should be borne in mind that the computational complexity of such problems grows exponentially in the dimension, making it prohibitively inefficient to use finite difference schemes for pricing Bermudans in truly multi-factor models of the term structure.

Monte Carlo
An obvious conclusion would therefore be to switch to Monte Carlo simulation, where the computational cost grows only linearly in dimension. Unfortunately, by design, Monte Carlo simulation works forward in time and therefore dynamic programming of the kind we used above to value Bermudans is not straightforward to implement. A brute-force approach to computing the conditional expectation underlying the continuation value at T n , as per Equation (46), would consist in nesting a new simulation-within the original one-with an initial condition S n,β (T n ), for every T n and each simulation path, and then calculating the expectation directly. Thus, if the simulation consistently uses M paths, then at time 0 we have M paths, at t 1 we generate additional M 2 paths, and so on until at the penultimate exercise date t N = T β−1 we produce another M N simulation paths (this time spanning just the final fixing period [T β−1 , T β ]). Clearly, even for a modest choice of M and N the computational burden of such an embedded scheme becomes prohibitive.
This subtle yet crucial point had arguably been one of the key hindrances holding back the financial industry's embrace of multi-factor market models. Whether many factors per se are desirable for accurate valuation of Bermudans or not-we shall see below that the question is in general not trivial-if they are not amenable to efficient numerical implementation, then they are of little practical significance. A breakthrough came with Longstaff and Schwartz [25], who adapted earlier ideas found in Tilley [52] and Carriere [53], suggesting that the continuation value (46) can be interpreted as a regression of the one-step-ahead payout on the state variables. Indeed, recall that V c (T n ) = E RBS α,β (T n+1 , K) S n,β (T n ) , n = β − 1, β − 2, ..., 1.
Since the conditional expectation of the swaption payoff is an element of the space of square-integrable functions, it has a countable orthonormal basis and can be represented as a linear function of the elements of the basis. Moreover, in a Markovian model, only current values of the state variables are necessary, so we can write: for some unknown function Ψ n and d Markov state variables x 1 (t), ..., x d (t). Longstaff and Schwartz [25] suggest estimating Ψ n -and hence the continuation value-by a linear combination of a finite set of J basis functions where the weights λ n,j are determined by least-squares regression on the Monte Carlo paths (typically, to improve quality of fit and run-time performance, only in-the-money paths may be considered, which limits the region on which the conditional expectation is to be estimated). Since the values of the basis functions are independently and identically distributed, it can be shown that the fitted value Ψ J n (x(T n )) converges in mean-square and in probability to Ψ J n (x(T n )) as the number of paths goes to infinity, which justifies the method.
Once the basis functions and their loadings are determined, valuation follows the dynamic programing method applied above in the context of finite difference schemes and lattices. Thus the algorithm to perform least-squares Monte Carlo (LSMC) to value a Bermudan might look as follows (for concreteness we assume the pricing model is LMM):

1.
Start by specifying and calibrating a desired form of the LMM (number of factors, correlation etc.) as well as the numeraire for the simulation; 2.
Calculate the numeraire, the value of the underlying swap, the values of the basis functions and the exercise values V e along each path and on all relevant exercise dates;

5.
Going backward in time, i.e., for n = β − 2, ..., 1, calculate the weights on the basis functions V c (T n ) = ∑ J j=1 λ n,j ψ j (x(T n )) along all paths ω through standard error minimization: and use the resulting estimates to update the swaption values Several remarks are in order at this point. First, as should be clear from the discussion above, the robustness of our valuation scheme is critically affected by the choice of the explanatory variables in the regression, i.e., the form and number of the basis functions. In their formulation, Longstaff and Schwartz [25] recommend using a polynomial basis, i.e., monomials of the state variables. While sound in principle, this approach might be impractical in the context of LMM due to the possibly high dimension of the vector of state variables (core LIBOR rates). Note also that with too many regressors, the quality of the parameter estimates is likely to deteriorate, given a fixed number of Monte Carlo paths. For example, Glasserman and Yu [54] show that the number of regression variables for which accurate estimation is possible from M paths is O(log M). Thus with, say, 50,000 paths we should not use more than four variables. A popular choice is to choose i.a. the front LIBOR rate (fixing on the exercise date), and several swap rates capturing the overall level and shape of the yield curve.
Second, unlike in lattices or finite difference schemes, in LSMC, exercise decisions will be sub-optimal by design as they rest on estimates of continuations values. At the same time, to the extent that the same set of paths is used to determine both the parameters λ n,j and continuation values, the price estimate might be biased upwards. To reduce bias and make sure that the regression generates a lower bound on the option price, we could use two independent Monte Carlo simulations: the first pilot run to fit the coefficients and then a second run to estimate continuation values for a newly seeded set of paths based on the previously determined decision rule. To gauge the quantitative effect of sub-optimal exercise decisions we conducted a simple experiment in which we priced the same 20-non-call-10 Bermudan swaption with a one-factor Hull-White model using a standard Crank-Nicolson finite-difference grid and LSMC. The model was calibrated to ATM European swaptions with final 20Y maturity, and the mean reversion rate was in each case set at 0.03. The results are shown in Figure 4 and clearly demonstrate the downward bias of the Monte Carlo with spreads relative to the exact finite-difference scheme in the order of 40-50 bps. Comparison of "20-non-call-10" Bermudan swaption prices calculated with a one-factor Hull-White model using a finite difference scheme and Least Square Monte Carlo (in percent; USD market data as of September 2020). Note: for reference, the LSMC uses 25,000 presimulation paths to fit the coefficients and then another 25,000 with antithetic variates (a total of 50,000 paths) to estimate continuation values based on the previously determined decision rule; we used twelve time steps per year and standard Euler discretization; the Crank-Nicolson finite difference gird has a resolution of 20 time steps per year and 400 space steps.

Factor Dependence of Bermudan Swaptions
Having presented the main categories of models and implementation techniques that can be used to price Bermudan swaptions we now turn to the issue of their factor dependence. In particular, using a set of numerical experiments we revisit the debate between Longstaff, Santa-Clara, and Schwartz [9] and Andersen and Andreasen [23], and the threats posed by the modeling equivalents of Scylla and Charybdis mentioned in the introduction.
Consider once again a "T β no-call T α " Bermudan receiver swaption, which, as we know, can be exercised into one of the so-called core swap rates: at either one of the designated exercise times T α , T α+1 , ..., T β , at the same strike rate K. In an important paper Longstaff, Santa-Clara, and Schwartz [9] made a point that this "chooseroption" character of a Bermudan makes is critically sensitive to the correlation among the core rates. Drastically simplifying their argument, we can imagine that when the individual S i,β , S j,β are perfectly correlated, so too should be the respective exercise values. Hence, there is likely to be little value added provided by the option to postpone the exercise. By contrast, the argument goes, when correlation between S i,β , S j,β is low or even negative, a Bermudan's right to choose the exercise date will have more value. Indeed, if the exercise value of the first swaption happens to be positive, it is worth exercising sooner, as payoffs further down the line are likely to turn negative. A straightforward conclusion follows that since single-factor models by design imply perfect correlation between core swap rates, they should artificially under-price Bermudan swaptions, making traders "throw away billions of dollars." From a theoretical point of view, the result-coupled with advances in numerical techniques under the guise of LSMC-provided a powerful argument in favor of switching from short-rate to multi-factor market models. And yet, Andersen and Andreasen [23] pointed out a subtle flaw in the above reasoning: to the extent that our Bermudan is viewed as a chooser option, one has to bear in mind that the choice we refer to is made at different points in time: T α , T α+1 , ..., T β . Consequently, what matters for pricing is not the instantaneous correlation (59) between the Brownian motions driving the dynamics of the core swap rates, but rather the forward or serial correlation between the rates setting at different dates T i and T j , which can be very roughly approximated as: and depend not just on ρ i,j but also on the volatilities of T i -and T j -maturity rates. This shows that while single-factor models do indeed imply perfect instantaneous correlation, they can de-correlate core rates setting at different times (through the time-dependent volatilities), with non-trivial consequences for Bermudan swaptions prices.
To appreciate this last point, we conducted our own set of numerical experiments using the modeling tools described above. Specifically, we priced Bermudan swaptions on a 5Y and 10Y swap with quarterly payment frequency and time to first exercise set at 1, 5 and 10 years. The strikes in each case were set to ATM levels based on the USD interest rate curve as of 11 September 2020. We used the Hull-White one-factor model (HW1F) implemented on a finite-difference grid as a benchmark and considered its twofactor extension (HW2F), and four versions of the LIBOR market model assuming 2-, 3-, 4-and full-factor parameterization. In line with standard practice, we used a piecewise constant volatility function and Rebonato's two-parameter correlation structure of the form ρ n,m = ρ ∞ + (1 − ρ ∞ ) exp(−ρ dec |n − m|), for the full-factor model. We used 25,000 pre-simulation paths to fit the coefficients and then another 25,000 with antithetic variates (a total of 50,000 paths) to estimate continuation values based on the previously determined decision rule.
The results of our numerical experiments, shown in Figure 5, are generally in line with those reported in a much more comprehensive study by Andersen and Andreasen [23] and seem to confirm the de-correlation possibilities of low-factor models. Indeed, prices produced by the HW1F model are consistently higher than those derived from the two-, three-and very high-factor market models. For longer-maturity products this pattern can be quite pronounced with the relative error for the 20-nc-10 swaption reaching almost 15%. Whereas a part of this pattern may be due to different numerical procedures employed (grid solution for HW1F and LSMC for other models), our earlier tests, reported in Figure 4, show that this cannot be the entire explanation. Indeed, to assess the dependence of swaption prices on rate correlation, we used our calibrated full-factor LMM to reprice the same 10 × 10 Bermudan receiver (struck at 1.33%) varying base correlation ρ ∞ from 0.1% to 0.9% ( Figure 6). Again, the results show that increasing correlation increases, not decreases, Bermudan swaption prices.  To provide some intuition for these results, we recall that a forward swap rate is a complex average of underlying forward rates, so we can write: where the weights are given by: . (62) If we now freeze the weights at their time-zero values-as originally proposed by Jackel and Rebonato [55], Rebonato [56] and justified by the fact that the variability of w i s is significantly smaller than the variability of the LIBOR rates-we can differentiate (61) to obtain: Calculating the quadratic variation and approximating it further by freezing all rates at their time-zero values, we obtain the following approximate expression for integrated percentage variance, or (squared) swap rate volatility: which must be plugged into the Black Formula (12) to recover the market price of a European swaption on the swap fixing in T α and maturing in T β . Now, if the volatilities of the forward rates spanning the swap are held fixed (e.g., through thorough calibration), then decreasing ρ i,j by moving to a lower-factor model or by adjusting the base correlation parameter ρ ∞ will necessarily also reduce the swap rate volatility-and hence ultimately the price of the Bermudan. Although other subtle effects can also play a role, justifying more extensive tests than allowed by the scope of this essay (see, e.g., [57,58] and as usual [7] for a general reference), we can nevertheless safely conclude at this point that factor dependency does seem to be an issue in pricing Bermudan swaptions. This empirical conclusion validates our perspective on the evolution of Bermudan swaption pricing as driven by a difficult choice between a more cumbersome and difficult to sensibly calibrate low-factor model that can be more efficiently implemented (Charybdis) and a multi-factor model that allows much more calibration freedom, but raises problems on the numerical implementation front and produces sub-optimal exercise decisions.

Markovian Projection and Dimension Reduction
We now consider a novel approach to deal with the "curse of dimensionality", which rests on the idea of Markovian projection and the seminal contribution of Gyöngy [28], who showed that for every stochastic process with an Ito differential: there exists a stochastic differential equation: having the same one-dimensional probability distribution as ξ(t) for every t, with γ(t, χ) 2 = E(δ(t) 2 |ξ(t) = χ) and b(t, χ) = E(β(t)|ξ(t) = χ). Below we extend and generalize this result to non-probabilistic measures. This will be more relevant to option pricing, which requires taking expectations of discounted terminal payoffs. Concretely, we will show below that for every pair of stochastic processes: there exist Markovian counterparts (χ, U) satisfying the following Ito equations: such that the values E Z(t)1 {ξ(t)>K} and E U(t)1 {χ(t)>K} coincide for every t and K.
If the function ζ is negative, then E U(t)1 {χ(t)>K} = Prob(χ(t) > K), where χ is a Markov process with Ito differential dχ(t) = δ(t, χ(t))dW(t) + β(t, χ(t))dt and killing rate −ζ(t, χ). Furthermore, the infinitesimal generator A of such a process is given by: The practical significance of this result is easy to see. Since ξ(·) and χ(·) have the same marginal probability distributions, it follows that prices of all European options on ξ(·) and χ(·) will be equal, for any expiration time t. For our present purposes this means that even if the swap rate evolves according to some very high-factor model with a stochastic volatility, we can replace it with a much simpler model by "localizing" its parameters, effectively reducing dimensionality. And while in theory this only guarantees that we match European options prices, we shall see below that a Markovian process for the swap rate-with a number of freezing approximations-provides a simple and robust tool for pricing Bermudan swaptions.

Markovian Projection for the Swap Rate
Consider once again the forward swap rate S α,β defined in continuous-time convention by Equation (10), and the corresponding T α -expiry European swaption with time-zero price: If both the expectation and the underlying swap rate are viewed, counter-intuitively, as functions of T α = t, then (for fixed T β = T) there arise two spot processes: For notational convenience denote D(t) = N(t) B(t) and F(t) = D(t)S(t) so that: Now, suppose that the dynamics of S(t) is described by some potentially complicated high-factor model with the following SDE: Note that the notation suggesting that dW(t) is one-dimensional involves no loss of generality since for any multi-dimensional process we can always find a diffusion and one-dimensional Brownian motion such that laws of both processes exactly match.
It is straightforward to calculate that: and for some Wiener processes W 1 and W 2 . We now present the main result in the form of the following theorem.
Then there exists a pair of processes (Z(t), Q(t)) with dynamics such that the discounted one-dimensional distributions of S(t) and Z(t) are equivalent, i.e., for every t, K. The coefficients ϕ,g, h are nonrandom and have the following simple interpretation: is the expectation under t-forward measure defined by: for any ξ.

Remark 1.
The Theorem above can be seen as a generalization of the result due to Gyöngy [28], who first observed that it is possible to construct a process that mimics the one-dimensional distributions of a given Ito process, but is simpler in the sense that its drift and diffusion terms are non-random, localized versions of the drift and diffusion of the original process (which in general may themselves be stochastic processes). We offer a reinterpretation and extension of the result to cover nonprobabilistic and discounted distributions relevant in option pricing. Therefore, the proof we present below is finance-motivated and starts by considering the swaption payoff spot process as per Equation (70).
Proof. By Tanaka's formula, Hence, recalling Equation (70), we can calculate the dynamics of C(t, K) as: use Taking the expected value of both sides under the risk neutral measure Q yields: , the probability distribution of S in the t-forward measure Q t is: Hence, Observe now that: Pick a sequence (x n ) ∞ n=0 of evenly spaced numbers such that x 0 = K, x i+1 − x i = ∆ and lim n→∞ x n = ∞. Then we may write: Letting ∆ → 0, and using Equation (79), we obtain: Analogously, Using (80)-(82) we can finally re-express Equation (78) as: Differentiating (83) twice with respect to the strike rate gives us the Fokker-Planck equation: which describes the time evolution of the probability density implied by swaptions prices {C(t, K} t,K . Note that the unique Markovian solution to Equation (84) is a pair of processes (Z(t), Q(t)) with the following dynamics: It follows that ∂ 2 xx C(t, x) is the "Q-discounted" density of Z(t) and clearly also the "D-discounted" density of S(t), as desired.

Remark 2.
The discounting referred to in the proof is crucial: while the distributions of S(t) and Z(t) are not equivalent, it nonetheless holds that: and non-probabilistic measures P D [S(t) > K] and P Q [Z(t) > K] are equivalent.
As a corollary, observe that the equivalence of non-probabilistic measures implies the equivalence of swaption prices: which in turn implies that functions h(t, K) and g(t, K) are given in closed form. Indeed, and Analogously, and Note that owing to the matching of one-dimensional (discounted) distributions, the equality of European options prices on S(·) and Z(·) is exact. However, Markovian projection does not preserve higher-order distributions, and therefore, in general, (88) does not extend automatically to Bermudan swaptions. That said, we can hope that: Whether and to what extent Equation (93) holds is ultimately an empirical matter, but our numerical experiments reported below suggest that the approximation is in fact good enough to justify its practical use.

Parameter Approximations and Numerical Experiments
Our strategy now is to calculate the right-hand side of (93) and compare the obtained price-across a range of strikes-to RBS α,β (t, K) determined using a proper multi-factor model. Perhaps the most straightforward way involves stripping the functions g(·, ·) and h(·, ·) off the interest rate curve and market swaptions prices using Formulas (90)-(92). But we outline here an even simpler strategy suggested in Gatarek and Jabłecki [59]. Concretely, we assume that P(t, T) ≈ P(0, T) P(0, t) , with d Q(t) = − Q(t)q(t)dt.
Note, in passing, that the dynamics (99) is not only one-dimensional but also resambles one typically used in the equity space to model the evolution of stock prices where the risk neutral drift of a share of stock is made up of the risk-free rate adjusted by the stock's dividend yield. Here, in the drift term q(t) − p(t), q(t) can be thought of as the "short rate" and p(t)-the "dividend yield." The calculation of sup τ∈T E Q Q(τ) K − S(τ) + should now be a straightforward exercise, and can be handled either in a finite difference pricer or LSMC simulation. We favor the former approach. Concretely, we now set T α 1 = 1, T β 1 = 6 and T α 2 = 1, T β 2 = 11 for two sample Bermudans: a 6-nc-1 and an 11-nc-1, respectively. As above, we use USD interest rate curve and volatility data as of September 2020.
First, we calculate p(t) and q(t) using yield curve data which we then use-along with market prices of co-terminal swaptions-to derive the local volatility surface from the approximated PDE: σ(t, K) 2 ≈ 2 ∂ t C(t, K) − (q(t) − p(t))K∂ K C(t, K) + p(t)C(t, K) ∂ 2 KK C(t, K) .
The result of this stripping exercise is shown in Figures 7 and 8.  Figure 7. Implied volatility surface and its corresponding local volatility surface stripped using (101). Note: Terminal swap maturity is fixed at 6 years; US market data as of 28 September 2020; to avoid "sinkholes" local volatility floored at zero. Figure 8. "Short-rate-like" and "dividend-like" parameters q(t) and p(t) of the process (99) (stripped from yield curve data as of 28 September 2020). Note: p(t) = f (0,t)P(0,t) P(0,t)−P(0,T) and q(t) = P(0,t) T t P(0,s)ds , T = 6; plot truncated due to explosive behavior of p with t → T; q(6) = 12.
Once the local volatility has been calculated (separately for each of the swap maturity dates T = 6 and T = 11), we set up the finite difference grid for the process (99). In each case the boundaries are selected such that S max = E(S(T)) + 4 Var(S(T)) and S min = −S max which translate to ±0.06 for T = 6 and ±0.08 for T = 11. For reference, we use 400 time steps and 400 grid intervals and solve the valuation PDE using the Crank-Nicolson method with Gauss-Seidel successive overrelaxation for improved convergence. Calculation time in both cases is instantaneous (Figure 9).
We benchmark valuation results against the whole class of models discussed above: the market standard Hull-White one-factor model, the two-factor Cheyette model, as well as the multi-factor Libor Market Model, obtaining the results shown in Figure 10.
Perhaps the first most striking observation is that the prices for strike rates ranging from 0% to 3% derived from the respective models tend to be generally closely aligned (Cheyette appears to underprice high-strike swaptions which may be due to the Monte Carlo implementation). Remarkably, the approximated model-i.e., the "poor man's" approach, featuring an evolution of only a single process-produces results which seem to be within normal market bid-ask spreads (not to mention model-based uncertainty) of those calculated using a fully-fledged multi-factor Libor Market Model, or a local-volatilityenhanced Cheyette term structure model. As we have stressed above, the "poor man's" model can be readily implemented in a finite difference scheme, with minimal adaptations to standard American equity option pricers, which ensures faster convergence as compared to the Cheyette Monte Carlo implementation (see again Figure 9). This demonstrates not only the fitting capability of the model, but-perhaps equally important-that Markovian projection in our extended framework can be fruitfully employed to valuing derivatives with early exercise features.

Conclusions
We have tried above to sketch our perspective on several decades of evolution in modeling the term structure of interest rates, with a particular focus on the travails involved in pricing Bermudan swaptions. While our treatment is by no means exhaustive, we hope to have shed light on the fascinating "tug of war" between the pursuit of calibration and pricing precision, tractability and implementation efficiency. The curse of dimensionality presented with recourse to Greek mythology boils down to the fact that the exercise boundary between the continuation area and stopping area is inherently complex and multi-dimensional for interest rate products. Ironically, therefore, the better we can capture and describe the product itself (i.e., the more factors we use), the poorer and less exact our handle on the exercise boundary is. And yet, we believe that our extension of the classic approach due to Gyöngy [28] promises a potentially attractive way past both the "Scylla" and "Charybdis" paths. Specifically, on the theoretical side, we show that it is possible to construct a process that mimics the discounted marginal distributions of a given Ito process, but is considerably simpler in the sense that its drift and diffusion terms are non-random, localized versions of the drift and diffusion of the original process (which in general may themselves be stochastic processes). This means that for purposes of swaption pricing, a potentially complex and multidimensional process for the underlying swap rate can be collapsed to a one-dimensional one. The empirical contribution complementing our theoretical observation consists in demonstrating that even though we only match the marginal distributions of the two processes, Bermudan swaption prices calculated using such an approach appear well-behaved and closely aligned to counterparts from more sophisticated models. And while much remains to be done in this area-not least in understanding what exactly enforces the alignment between the true early exercise boundary and the approximated ones-we believe that Markovian projection can be further fruitfully exploited in pricing interest rate products with early exercise features.
Author Contributions: All authors contributed equally. All authors have read and agreed to the published version of the manuscript.