Pricing and Hedging American-Style Options with Deep Learning

Sebastian Becker; Patrick Cheridito; Arnulf Jentzen

doi:10.3390/jrfm13070158

Abstract

In this paper we introduce a deep learning method for pricing and hedging American-style options. It first computes a candidate optimal stopping policy. From there it derives a lower bound for the price. Then it calculates an upper bound, a point estimate and confidence intervals. Finally, it constructs an approximate dynamic hedging strategy. We test the approach on different specifications of a Bermudan max-call option. In all cases it produces highly accurate prices and dynamic hedging strategies with small replication errors.

Keywords:

American option; Bermudan option; optimal stopping; lower bound; upper bound; hedging strategy; deep neural network

1. Introduction

Early exercise options are notoriously difficult to value. For up to three underlying risk factors, tree-based and classical PDE approximation methods usually yield good numerical results; see, e.g., Forsyth and Vetzal (2002); Hull (2003); Reisinger and Witte (2012) and the references therein. To treat higher-dimensional problems, various simulation-based methods have been developed; see, e.g., Tilley (1993); Barraquand and Martineau (1995); Carriere (1996); Andersen (2000); Longstaff and Schwartz (2001); Tsitsiklis and Van Roy (2001); García (2003); Broadie and Glasserman (2004); Bally et al. (2005); Kolodko and Schoenmakers (2006); Egloff et al. (2007); Broadie and Cao (2008); Berridge and Schumacher (2008); Jain and Oosterlee (2015). Haugh and Kogan (2004) as well as Kohler et al. (2010) have already used shallow^{1} neural networks to estimate continuation values. More recently, in Sirignano and Spiliopoulos (2018) optimal stopping problems in continuous time have been solved by approximating the solutions of the corresponding free boundary PDEs with deep neural networks. In Becker et al. (2019a, 2019b), deep learning has been used to directly learn optimal stopping strategies. The main focus of these papers is to derive optimal stopping rules and accurate price estimates.

The goal of this article is to develop a deep learning method which learns the optimal exercise behavior, prices and hedging strategies from samples of the underlying risk factors. It first learns a candidate optimal stopping strategy by regressing continuation values on multilayer neural networks. Employing the learned stopping strategy on a new set of Monte Carlo samples gives a low-biased estimate of the price. Moreover, the candidate optimal stopping strategy can be used to construct an approximate solution to the dual martingale problem introduced by Rogers (2002) and Haugh and Kogan (2004), yielding a high-biased estimate and confidence intervals for the price. In the last step, our method learns a dynamic hedging strategy in the spirit of Han et al. (2018) and Buehler et al. (2019). However, here, the continuation value approximations learned during the construction of the optimal stopping strategy can be used to break the hedging problem down into a sequence of smaller problems that learn the hedging portfolio only from one possible exercise date to the next. Alternative ways of computing hedging strategies consist in calculating sensitivities of option prices (see, e.g., Bally et al. 2005; Bouchard and Warin 2012; Jain and Oosterlee 2015) or approximating a solution to the dual martingale problem (see, e.g., Rogers 2002, 2010).

Our work is related to the preprints Lapeyre and Lelong (2019) and Chen and Wan (2019). Lapeyre and Lelong (2019) also use neural network regression to estimate continuation values. However, the networks are slightly different. While they work with leaky ReLU activation functions, we use tanh activation. Moreover, Lapeyre and Lelong (2019) study the convergence of the pricing algorithm as the number of simulations and the size of the network go to infinity, whereas we calculate a posteriori guarantees for the prices and use the estimated continuation value functions to implement efficient hedging strategies. Chen and Wan (2019) propose an alternative way of calculating prices and hedging strategies for American-style options by solving BSDEs.

The rest of the paper is organized as follows. In Section 2 we describe our neural network version of the Longstaff–Schwartz algorithm to estimate continuation values and construct a candidate optimal stopping strategy. In Section 3 the latter is used to derive lower and upper bounds as well as confidence intervals for the price. Section 4 discusses two different ways of computing dynamic hedging strategies. In Section 5 the results of the paper are applied to price and hedge a Bermudan call option on the maximum of different underlying assets. Section 6 concludes.

2. Calculating a Candidate Optimal Stopping Strategy

We consider an American-style option that can be exercised at any one of finitely^{2} many times $0 = t_{0} < t_{1} < \dots < t_{N} = T$. If exercised at time $t_{n}$, it yields a discounted payoff given by a square-integrable random variable $G_{n}$ defined on a filtered probability space $(Ω, \mathcal{F}, F = {(F_{n})}_{n = 0}^{N}, P)$. We assume that $F_{n}$ describes the information available at time $t_{n}$ and $G_{n}$ is of the form $g (n, X_{n})$ for a measurable function $g : \{0, 1, \dots, N\} \times R^{d} \to [0, \infty)$ and a d-dimensional $F$-Markov process^{3} ${(X_{n})}_{n = 0}^{N}$. We assume $X_{0}$ to be deterministic and $P$ to be the pricing measure, so that the value of the option at time 0 is given by

$V = \sup_{τ \in \mathcal{T}} E G_{τ},$

where $\mathcal{T}$ is the set of all $F$-stopping times $τ : Ω \to \{0, 1, \dots, N\}$. If the option has not been exercised before time $t_{n}$, its discounted value at that time is

$V_{t_{n}} = \operatorname{ess\,sup}_{τ \in \mathcal{T}_{n}} E [G_{τ} ∣ F_{n}],$

(1)

where $\mathcal{T}_{n}$ is the set of all $F$-stopping times satisfying $n \leq τ \leq N$.

Obviously, $τ_{N} \equiv N$ is optimal for $V_{T} = G_{N}$. From there, one can recursively construct the stopping times

$τ_{n} := \begin{cases} n & \text{if } G_{n} \geq E [G_{τ_{n + 1}} ∣ X_{n}] \\ τ_{n + 1} & \text{if } G_{n} < E [G_{τ_{n + 1}} ∣ X_{n}]. \end{cases}$

(2)

Clearly, $τ_{n}$ belongs to $\mathcal{T}_{n}$, and it can be checked inductively that

$V_{t_{n}} = E [G_{τ_{n}} ∣ F_{n}] = G_{n} \lor E [V_{t_{n + 1}} ∣ X_{n}] \quad \text{for all } n \leq N - 1.$

In particular, $τ_{n}$ is an optimizer of (1).
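To make the recursion concrete, the following minimal sketch evaluates $V_{t_{n}} = G_{n} \lor E [V_{t_{n + 1}} ∣ X_{n}]$ exactly in a toy model where the conditional expectation is available in closed form. The symmetric random walk, the function name `snell_values` and the payoff used in the usage note are illustrative assumptions and do not appear in the paper.

```python
import numpy as np

def snell_values(payoff, n_steps):
    """Backward induction V_n = max(G_n, E[V_{n+1} | X_n]) on a toy model.

    Hypothetical model: X_0 = 0 and X_{n+1} = X_n +/- 1 with probability 1/2 each.
    Returns the time-0 value and, per date, a boolean array marking the states
    in which immediate exercise is (weakly) preferred, as in recursion (2).
    """
    # States reachable at time n are -n, -n + 2, ..., n.
    states = np.arange(-n_steps, n_steps + 1, 2)
    values = payoff(n_steps, states)                      # V_N = G_N
    exercise = [np.ones_like(values, dtype=bool)]         # tau_N = N
    for n in range(n_steps - 1, -1, -1):
        states = np.arange(-n, n + 1, 2)
        # E[V_{n+1} | X_n = x] averages the values at x - 1 and x + 1.
        continuation = 0.5 * (values[:-1] + values[1:])
        immediate = payoff(n, states)
        exercise.append(immediate >= continuation)
        values = np.maximum(immediate, continuation)
    return values[0], exercise[::-1]
```

For instance, with the discounted payoff `lambda n, x: 0.9**n * np.maximum(1 - x, 0)` and two steps, exercise is preferred at time 1 in the state −1 but not in the state 1, and the time-0 value equals the continuation value 1.1025, which exceeds the immediate payoff 1.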

Recursion (2) is the theoretical basis of the Longstaff and Schwartz (2001) method. Its main computational challenge is the approximation of the conditional expectations $E [G_{τ_{n + 1}} ∣ X_{n}]$. It is well known that $E [G_{τ_{n + 1}} ∣ X_{n}]$ is of the form $c (X_{n})$, where $c : R^{d} \to R$ minimizes the mean squared distance $E [{\{G_{τ_{n + 1}} - c (X_{n})\}}^{2}]$ over all Borel measurable functions from $R^{d}$ to $R$; see, e.g., Bru and Heinich (1985). The Longstaff–Schwartz algorithm approximates $E [G_{τ_{n + 1}} ∣ X_{n}]$ by projecting $G_{τ_{n + 1}}$ on the linear span of finitely many basis functions. However, it is also possible to project on a different subset. If the subset is given by $c^{θ} (X_{n})$ for a function family $c^{θ} : R^{d} \to R$ parametrized by $θ$, one can apply the following variant^{4} of the Longstaff–Schwartz algorithm:

(i): Simulate^{5} paths ${(x_{n}^{k})}_{n = 0}^{N}$, $k = 1, \dots, K$, of the underlying process ${(X_{n})}_{n = 0}^{N}$.
(ii): Set $s_{N}^{k} \equiv N$ for all k.
(iii): Going backward through $n = N - 1, N - 2, \dots, 1$, approximate $E [G_{τ_{n + 1}} ∣ X_{n}]$ with $c^{θ_{n}} (X_{n})$ by minimizing the sum

$\sum_{k = 1}^{K} {(g (s_{n + 1}^{k}, x_{s_{n + 1}^{k}}^{k}) - c^{θ} (x_{n}^{k}))}^{2} \quad \text{over } θ.$

(3)
(iv): Before moving on to $n - 1$, set

$s_{n}^{k} := \begin{cases} n & \text{if } g (n, x_{n}^{k}) \geq c^{θ_{n}} (x_{n}^{k}) \\ s_{n + 1}^{k} & \text{otherwise}. \end{cases}$
(v): Define $θ_{0} := \frac{1}{K} \sum_{k = 1}^{K} g (s_{1}^{k}, x_{s_{1}^{k}}^{k})$, and set $c^{θ_{0}}$ constantly equal to $θ_{0}$.
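Steps (ii) to (v) can be sketched in a few lines. The version below is only an illustration for one-dimensional paths: it replaces the neural network $c^{θ}$ by a polynomial least-squares fit (which is closer to the original Longstaff–Schwartz regression than to the method of this paper), and the names `fit_stopping_rule`, `paths`, `payoff` and `degree` are hypothetical.

```python
import numpy as np

def fit_stopping_rule(paths, payoff, degree=3):
    """Steps (ii)-(v) for simulated paths of shape (K, N + 1) in dimension d = 1.

    Assumption: polynomial least squares stands in for the network c^theta.
    payoff(n, x) must return the discounted payoff g(n, x) for an array x.
    """
    K, n_dates = paths.shape
    N = n_dates - 1
    stop = np.full(K, N)                      # (ii): s_N^k = N
    cash = payoff(N, paths[:, N])             # g(s_{n+1}^k, x^k at time s_{n+1}^k)
    coeffs = {}
    for n in range(N - 1, 0, -1):
        # (iii): minimize the sum (3) over ALL paths, not only in-the-money ones
        coeffs[n] = np.polyfit(paths[:, n], cash, degree)
        continuation = np.polyval(coeffs[n], paths[:, n])
        # (iv): stop at n where the immediate payoff is at least the fitted value
        immediate = payoff(n, paths[:, n])
        exercise = immediate >= continuation
        stop = np.where(exercise, n, stop)
        cash = np.where(exercise, immediate, cash)
    theta_0 = cash.mean()                     # (v): constant continuation value at time 0
    return coeffs, stop, theta_0
```

The array `cash` always holds the realized payoffs $g (s_{n + 1}^{k}, x_{s_{n + 1}^{k}}^{k})$ rather than fitted values, so regression errors at later dates do not feed directly into the regression targets at earlier dates.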

In this paper we specify $c^{θ}$ as a feedforward neural network, which, in general, is of the form

$a_{I}^{θ} \circ φ_{q_{I - 1}} \circ a_{I - 1}^{θ} \circ \dots \circ φ_{q_{1}} \circ a_{1}^{θ},$

(4)

where

$I \geq 1$ denotes the depth and $q_{0}, q_{1}, \dots, q_{I}$ the numbers of nodes in the different layers;
$a_{1}^{θ} : R^{q_{0}} \to R^{q_{1}}, \dots, a_{I}^{θ} : R^{q_{I - 1}} \to R^{q_{I}}$ are affine functions;
For $j \in N$ , $φ_{j} : R^{j} \to R^{j}$ is of the form $φ_{j} (x_{1}, \dots, x_{j}) = (φ (x_{1}), \dots, φ (x_{j}))$ for a given activation function $φ : R \to R$ .

The components of the parameter $θ$ consist of the entries of the matrices $A_{1}, \dots, A_{I}$ and vectors $b_{1}, \dots, b_{I}$ appearing in the representation of the affine functions $a_{i}^{θ} x = A_{i} x + b_{i}$, $i = 1, \dots, I$. So, $θ$ lives in $R^{q}$ for $q = \sum_{i = 1}^{I} q_{i} (q_{i - 1} + 1)$. To minimize (3) we choose a network with $q_{I} = 1$ and employ a stochastic gradient descent method.
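As a quick check of the architecture (4) and of the parameter count $q$, the following sketch builds the affine maps for given layer widths $(q_{0}, \dots, q_{I})$ and evaluates the composition. The helper names and the random initialization are assumptions made for illustration; batch normalization, which is used in Section 5, is not modeled and would add further parameters.

```python
import numpy as np

def init_params(widths, rng):
    """Random (A_i, b_i) for layer widths (q_0, ..., q_I); the scaling is an arbitrary choice."""
    return [(rng.standard_normal((q_out, q_in)) / np.sqrt(q_in), np.zeros(q_out))
            for q_in, q_out in zip(widths[:-1], widths[1:])]

def n_params(widths):
    """q = sum_{i=1}^{I} q_i (q_{i-1} + 1): entries of A_i plus entries of b_i."""
    return sum(q_out * (q_in + 1) for q_in, q_out in zip(widths[:-1], widths[1:]))

def forward(params, x, phi=np.tanh):
    """Evaluate a_I o phi o a_{I-1} o ... o phi o a_1 at each row of x."""
    for A, b in params[:-1]:
        x = phi(x @ A.T + b)                 # affine map followed by the activation
    A, b = params[-1]
    return x @ A.T + b                       # no activation after the last affine map
```

For the widths `(6, 55, 55, 1)`, which correspond to depth $I = 3$ with $d + 50$ hidden nodes and a $(d + 1)$-dimensional input for $d = 5$, the formula gives $55 \cdot 7 + 55 \cdot 56 + 1 \cdot 56 = 3521$ parameters.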

3. Pricing

3.1. Lower Bound

Once $θ_{0}, θ_{1}, \dots, θ_{N - 1}$ have been determined, we set $Θ = (θ_{0}, \dots, θ_{N - 1})$ and define

$τ^{Θ} := \min \{n \in \{0, 1, \dots, N - 1\} : g (n, X_{n}) \geq c^{θ_{n}} (X_{n})\}, \quad \text{where } \min \emptyset \text{ is understood as } N.$

This defines a valid $F$-stopping time. Therefore, $L = E g (τ^{Θ}, X_{τ^{Θ}})$ is a lower bound for the optimal value V. However, typically, it is not possible to calculate the expectation exactly. Therefore, we generate simulations $g^{k}$ of $g (τ^{Θ}, X_{τ^{Θ}})$ based on independent sample paths^{6} ${(x_{n}^{k})}_{n = 0}^{N}$, $k = K + 1, \dots, K + K_{L}$, of ${(X_{n})}_{n = 0}^{N}$ and approximate L with the Monte Carlo average

$\hat{L} = \frac{1}{K_{L}} \sum_{k = K + 1}^{K + K_{L}} g^{k}.$

Denote by $z_{α / 2}$ the $1 - α / 2$ quantile of the standard normal distribution and consider the sample standard deviation

${\hat{σ}}_{L} = \sqrt{\frac{1}{K_{L} - 1} \sum_{k = K + 1}^{K + K_{L}} {(g^{k} - \hat{L})}^{2}}.$

Then one obtains from the central limit theorem that

$[\hat{L} - z_{α / 2} \frac{{\hat{σ}}_{L}}{\sqrt{K_{L}}}, \infty)$

(5)

is an asymptotically valid $1 - α / 2$ confidence interval for L.
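The estimate $\hat{L}$ and the left endpoint of (5) are straightforward to compute from the simulated payoffs $g^{k}$. The following is a minimal sketch; the function name and the use of Python's `statistics.NormalDist` for the normal quantile are choices made here, not taken from the paper.

```python
from statistics import NormalDist

import numpy as np

def lower_confidence_bound(g_samples, alpha=0.05):
    """Return (L_hat, left endpoint of the one-sided interval (5))."""
    g = np.asarray(g_samples, dtype=float)
    L_hat = g.mean()
    sigma_L = g.std(ddof=1)                       # sample standard deviation (divisor K_L - 1)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # the (1 - alpha/2) quantile z_{alpha/2}
    return L_hat, L_hat - z * sigma_L / np.sqrt(g.size)
```

The samples passed in must come from paths that were not used to train $Θ$; otherwise $\hat{L}$ is no longer guaranteed to be low-biased.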

3.2. Upper Bound, Point Estimate and Confidence Intervals

Our derivation of an upper bound is based on the duality results of Rogers (2002), Haugh and Kogan (2004) and Becker et al. (2019a). By Rogers (2002) and Haugh and Kogan (2004), the optimal value V can be written as

$V = E [\max_{0 \leq n \leq N} (G_{n} - M_{n})],$

where ${(M_{n})}_{n = 0}^{N}$ is the martingale part of the smallest $F$-supermartingale dominating the payoff process ${(G_{n})}_{n = 0}^{N}$. We approximate ${(M_{n})}_{n = 0}^{N}$ with the $F$-martingale ${(M_{n}^{Θ})}_{n = 0}^{N}$ obtained from the stopping decisions implied by the trained continuation value functions $c^{θ_{n}}$, $n = 0, \dots, N - 1$, as in Section 3.2 of Becker et al. (2019a). We know from Proposition 7 of Becker et al. (2019a) that if ${(ε_{n})}_{n = 0}^{N}$ is a sequence of integrable random variables satisfying $E [ε_{n} ∣ F_{n}] = 0$ for all $n = 0, 1, \dots, N$, then

$U = E [\max_{0 \leq n \leq N} (G_{n} - M_{n}^{Θ} - ε_{n})]$

is an upper bound for V. As in Becker et al. (2019a), we use nested simulation^{7} to generate realizations $m_{n}^{k}$ of $M_{n}^{Θ} + ε_{n}$ along independent realizations ${(x_{n}^{k})}_{n = 0}^{N}$, $k = K + K_{L} + 1, \dots, K + K_{L} + K_{U}$, of ${(X_{n})}_{n = 0}^{N}$ sampled independently of ${(x_{n}^{k})}_{n = 0}^{N}$, $k = 1, \dots, K$, and estimate U as

$\hat{U} = \frac{1}{K_{U}} \sum_{k = K + K_{L} + 1}^{K + K_{L} + K_{U}} \max_{0 \leq n \leq N} (g (n, x_{n}^{k}) - m_{n}^{k}).$

Our point estimate of V is

$\hat{V} = \frac{\hat{L} + \hat{U}}{2}.$

The sample standard deviation of the estimator $\hat{U}$, given by

${\hat{σ}}_{U} = \sqrt{\frac{1}{K_{U} - 1} \sum_{k = K + K_{L} + 1}^{K + K_{L} + K_{U}} {(\max_{0 \leq n \leq N} (g (n, x_{n}^{k}) - m_{n}^{k}) - \hat{U})}^{2}},$

can be used together with the one-sided confidence interval (5) to construct the asymptotically valid two-sided $1 - α$ confidence interval

$[\hat{L} - z_{α / 2} \frac{{\hat{σ}}_{L}}{\sqrt{K_{L}}}, \hat{U} + z_{α / 2} \frac{{\hat{σ}}_{U}}{\sqrt{K_{U}}}]$

(6)

for the true value V; see Section 3.3 of Becker et al. (2019a).
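Given the simulated payoffs for the lower bound and, for the outer paths of the upper bound, the arrays of payoffs $g (n, x_{n}^{k})$ and martingale realizations $m_{n}^{k}$, the estimates $\hat{U}$, $\hat{V}$ and the interval (6) reduce to a few array operations. The sketch below takes the $m_{n}^{k}$ as given, since their construction by nested simulation follows Becker et al. (2019a) and is not reproduced here; all names are illustrative.

```python
from statistics import NormalDist

import numpy as np

def price_interval(g_lower, payoffs, martingale, alpha=0.05):
    """Point estimate V_hat and the two-sided interval (6).

    g_lower[k]       : simulated payoffs g^k under the learned stopping rule
    payoffs[k, n]    : g(n, x_n^k) along the K_U outer paths
    martingale[k, n] : realizations m_n^k of M_n^Theta + eps_n (assumed given)
    """
    g_lower = np.asarray(g_lower, dtype=float)
    diff = np.asarray(payoffs, dtype=float) - np.asarray(martingale, dtype=float)
    pathwise_max = diff.max(axis=1)               # max over n of g(n, x_n^k) - m_n^k
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    L_hat, U_hat = g_lower.mean(), pathwise_max.mean()
    lo = L_hat - z * g_lower.std(ddof=1) / np.sqrt(g_lower.size)
    hi = U_hat + z * pathwise_max.std(ddof=1) / np.sqrt(pathwise_max.size)
    return 0.5 * (L_hat + U_hat), (lo, hi)
```

The level $1 - α$ of (6) comes from combining two one-sided intervals: asymptotically, each endpoint fails with probability at most $α / 2$, and $L \leq V \leq U$ holds by construction.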

4. Hedging

We now consider a savings account together with $e \in \mathbb{N}$ financial securities as hedging instruments. We fix a positive integer M and introduce a time grid $0 = u_{0} < u_{1} < \dots < u_{N M}$ such that $u_{n M} = t_{n}$ for all $n = 0, 1, \dots, N$. We suppose that the information available at time $u_{m}$ is described by $H_{m}$, where $H = {(H_{m})}_{m = 0}^{N M}$ is a filtration satisfying $H_{n M} = F_{n}$ for all n. If any of the financial securities pay dividends, they are immediately reinvested. We assume that the resulting discounted^{8} value processes are of the form $P_{u_{m}} = p_{m} (Y_{m})$ for measurable functions $p_{m} : R^{d} \to R^{e}$ and an $H$-Markov process^{9} ${(Y_{m})}_{m = 0}^{N M}$ such that $Y_{n M} = X_{n}$ for all $n = 0, \dots, N$. A hedging strategy consists of a sequence $h = {(h_{m})}_{m = 0}^{N M - 1}$ of functions $h_{m} : R^{d} \to R^{e}$ specifying the time-$u_{m}$ holdings in $P_{u_{m}}^{1}, \dots, P_{u_{m}}^{e}$. As usual, money is dynamically deposited in or borrowed from the savings account to make the strategy self-financing. The resulting discounted gains at time $u_{m}$ are given by

${(h \cdot P)}_{u_{m}} := \sum_{j = 0}^{m - 1} h_{j} (Y_{j}) \cdot (p_{j + 1} (Y_{j + 1}) - p_{j} (Y_{j})) = \sum_{j = 0}^{m - 1} \sum_{i = 1}^{e} h_{j}^{i} (Y_{j}) (p_{j + 1}^{i} (Y_{j + 1}) - p_{j}^{i} (Y_{j})).$
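Along a single simulated path, the discounted gains process is a cumulative sum of holdings times price increments. A small sketch, with hypothetical array names, assuming the holdings $h_{j} (Y_{j})$ and discounted prices $p_{j} (Y_{j})$ have already been evaluated:

```python
import numpy as np

def discounted_gains(holdings, prices):
    """Gains (h . P)_{u_m} for m = 0, ..., len(holdings) along one path.

    holdings[j, i] = h_j^i(Y_j)  with shape (m_max, e)
    prices[j, i]   = p_j^i(Y_j)  with shape (m_max + 1, e)
    """
    holdings = np.asarray(holdings, dtype=float)
    prices = np.asarray(prices, dtype=float)
    # Positions chosen at u_j earn the price change from u_j to u_{j+1}.
    increments = np.sum(holdings * np.diff(prices, axis=0), axis=1)
    return np.concatenate([[0.0], np.cumsum(increments)])
```

For example, holdings `[[1, 0], [2, 1]]` and prices `[[100, 50], [101, 49], [103, 52]]` give the gains `[0, 1, 8]`: the first period earns $1 \cdot 1 + 0 \cdot (-1) = 1$ and the second $2 \cdot 2 + 1 \cdot 3 = 7$.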

4.1. Hedging Until the First Possible Exercise Date

For a typical Bermudan option, the time between two possible exercise dates $t_{n} - t_{n - 1}$ might range between a week and several months. In case of an American option, we choose $t_{n} = n Δ$ for a small amount of time $Δ$ such as a day. We assume $τ^{Θ}$ does not stop at time 0. Otherwise, there is nothing to hedge. In a first step, we only compute the hedge until time $t_{1}$. If the option is still alive at time $t_{1}$, the hedge can then be computed until time $t_{2}$ and so on. To construct a hedge from time 0 to $t_{1}$, we approximate the time-$t_{1}$ value of the option with $V_{t_{1}}^{θ_{1}} = v^{θ_{1}} (X_{1})$ for the function $v^{θ_{1}} (x) = g (1, x) \lor c^{θ_{1}} (x)$, where $c^{θ_{1}} : R^{d} \to R$ is the time-$t_{1}$ continuation value function estimated in Section 2. Then we search for hedging positions $h_{m}$, $m = 0, 1, \dots, M - 1$, that minimize the mean squared error

$E [{(\hat{V} + {(h \cdot P)}_{t_{1}} - V_{t_{1}}^{θ_{1}})}^{2}].$

To do that we approximate the functions $h_{m}$ with neural networks $h^{λ} : R^{d} \to R^{e}$ of the form (4) and try to find parameters $λ_{0}, \dots, λ_{M - 1}$ that minimize

$\sum_{k = 1}^{K_{H}} {(\hat{V} + \sum_{m = 0}^{M - 1} h^{λ_{m}} (y_{m}^{k}) \cdot (p_{m + 1} (y_{m + 1}^{k}) - p_{m} (y_{m}^{k})) - v^{θ_{1}} (y_{M}^{k}))}^{2}$

(7)

for independent realizations ${(y_{m}^{k})}_{m = 0}^{M}$, $k = 1, \dots, K_{H}$, of ${(Y_{m})}_{m = 0}^{M}$. We train the networks $h^{λ_{0}}, \dots, h^{λ_{M - 1}}$ together, again using a stochastic gradient descent method. Instead of (7), one could also minimize a different deviation measure. However, (7) has the advantage that it yields hedging strategies with an average hedging error close to zero^{10}.
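A useful sanity check for (7) is the special case $M = 1$. Since $Y_{0} = X_{0}$ is deterministic, the time-0 holdings $h^{λ_{0}} (y_{0})$ form a single vector in $R^{e}$, and (7) becomes an ordinary least-squares problem in that vector. The sketch below solves this special case directly; it is not the training procedure used in the paper, and the function and argument names are hypothetical.

```python
import numpy as np

def one_step_hedge(v_hat, price_increments, target_values):
    """Minimize sum_k (v_hat + h . dP^k - v^k)^2 over a constant vector h.

    price_increments[k, i] = p_1^i(y_1^k) - p_0^i(y_0^k), shape (K_H, e)
    target_values[k]       = v^{theta_1}(y_1^k),          shape (K_H,)
    """
    dP = np.asarray(price_increments, dtype=float)
    y = np.asarray(target_values, dtype=float) - v_hat
    # Least squares without intercept: the initial capital v_hat is held fixed.
    h, *_ = np.linalg.lstsq(dP, y, rcond=None)
    return h
```

For $M > 1$ the holdings at later times depend on $y_{m}^{k}$, so the problem is no longer linear in the parameters and the networks are trained jointly as described above.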

Once $λ_{0}, \dots, λ_{M - 1}$ have been determined, we assess the quality of the hedge by simulating new^{11} independent realizations ${(y_{m}^{k})}_{m = 0}^{M}$, $k = K_{H} + 1, \dots, K_{H} + K_{E}$, of ${(Y_{m})}_{m = 0}^{M}$ and calculating the average hedging error

$\frac{1}{K_{E}} \sum_{k = K_{H} + 1}^{K_{H} + K_{E}} (\hat{V} + \sum_{m = 0}^{M - 1} h^{λ_{m}} (y_{m}^{k}) \cdot (p_{m + 1} (y_{m + 1}^{k}) - p_{m} (y_{m}^{k})) - v^{θ_{1}} (y_{M}^{k}))$

(8)

and the empirical hedging shortfall

$\frac{1}{K_{E}} \sum_{k = K_{H} + 1}^{K_{H} + K_{E}} {(\hat{V} + \sum_{m = 0}^{M - 1} h^{λ_{m}} (y_{m}^{k}) \cdot (p_{m + 1} (y_{m + 1}^{k}) - p_{m} (y_{m}^{k})) - v^{θ_{1}} (y_{M}^{k}))}^{-}$

(9)

over the time interval $[0, t_{1}]$.
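Both statistics are averages of the same pathwise residual, the hedging portfolio value minus the approximate option value. A short sketch, reading the superscript minus in (9) as the negative part $x^{-} = \max \{- x, 0\}$ and using illustrative names:

```python
import numpy as np

def hedging_error_and_shortfall(v_hat, gains, targets):
    """Average hedging error (8) and empirical hedging shortfall (9).

    gains[k]   : discounted gains (h . P)_{t_1} along test path k
    targets[k] : approximate option value v^{theta_1}(y_M^k) along test path k
    """
    residuals = v_hat + np.asarray(gains, dtype=float) - np.asarray(targets, dtype=float)
    average_error = residuals.mean()
    shortfall = np.maximum(-residuals, 0.0).mean()   # only under-hedged paths contribute
    return average_error, shortfall
```

An average error near zero with a sizeable shortfall indicates a hedge that is right on average but misses in both directions on individual paths.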

4.2. Hedging Until the Exercise Time

Alternatively, one can precompute the whole hedging strategy from time 0 to T and then use it until the option is exercised. In order to do that we introduce the functions

$v^{θ_{n}} (x) := g (n, x) \lor c^{θ_{n}} (x), \quad C^{θ_{n}} (x) := 0 \lor c^{θ_{n}} (x), \quad x \in R^{d},$

and hedge the difference $v^{θ_{n}} (Y_{n M}) - C^{θ_{n - 1}} (Y_{(n - 1) M})$ on each of the time intervals $[t_{n - 1}, t_{n}]$, $n = 1, \dots, N$, separately. $v^{θ_{n}}$ describes the approximate value of the option at time $t_{n}$ if it has not been exercised before, and the definition of $C^{θ_{n}}$ takes into account that the true continuation values are non-negative due to the non-negativity of the payoff function g. The hedging strategy can be computed as in Section 4.1, except that we now have to simulate complete paths ${(y_{m}^{k})}_{m = 0}^{N M}$ of ${(Y_{m})}_{m = 0}^{N M}$, $k = 1, \dots, K_{H}$, and then for all $n = 1, \dots, N$, find parameters $λ_{(n - 1) M}, \dots, λ_{n M - 1}$ which minimize

$\sum_{k = 1}^{K_{H}} {(C^{θ_{n - 1}} (y_{(n - 1) M}^{k}) + \sum_{m = (n - 1) M}^{n M - 1} h^{λ_{m}} (y_{m}^{k}) \cdot (p_{m + 1} (y_{m + 1}^{k}) - p_{m} (y_{m}^{k})) - v^{θ_{n}} (y_{n M}^{k}))}^{2}.$

Once the hedging strategy has been trained, we simulate independent samples ${(y_{m}^{k})}_{m = 0}^{N M}$, $k = K_{H} + 1, \dots, K_{H} + K_{E}$, of ${(Y_{m})}_{m = 0}^{N M}$ and denote the realization of $τ^{Θ}$ along each sample path ${(y_{m}^{k})}_{m = 0}^{N M}$ by $τ^{k}$. The corresponding average hedging error is given by

$\frac{1}{K_{E}} \sum_{k = K_{H} + 1}^{K_{H} + K_{E}} (\hat{V} + \sum_{m = 0}^{τ^{k} M - 1} h^{λ_{m}} (y_{m}^{k}) \cdot (p_{m + 1} (y_{m + 1}^{k}) - p_{m} (y_{m}^{k})) - g (τ^{k}, X_{τ^{k}}))$

(10)

and the empirical hedging shortfall by

$\frac{1}{K_{E}} \sum_{k = K_{H} + 1}^{K_{H} + K_{E}} {(\hat{V} + \sum_{m = 0}^{τ^{k} M - 1} h^{λ_{m}} (y_{m}^{k}) \cdot (p_{m + 1} (y_{m + 1}^{k}) - p_{m} (y_{m}^{k})) - g (τ^{k}, X_{τ^{k}}))}^{-}.$

(11)
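The only new ingredient in (10) and (11) compared with (8) and (9) is that the gains are accumulated up to the path-dependent index $τ^{k} M$ instead of a fixed one. The following sketch shows one way to do this with array indexing; the array layout and all names are assumptions made for illustration.

```python
import numpy as np

def stopped_hedging_residuals(v_hat, holdings, prices, stop_index, payoff_at_stop, M):
    """Pathwise residuals inside (10) and (11).

    holdings[k, m, i]  : h^{lambda_m}(y_m^k), shape (K_E, N * M, e)
    prices[k, m, i]    : p_m(y_m^k),          shape (K_E, N * M + 1, e)
    stop_index[k]      : tau^k in {1, ..., N}
    payoff_at_stop[k]  : discounted payoff received at tau^k
    """
    holdings = np.asarray(holdings, dtype=float)
    prices = np.asarray(prices, dtype=float)
    increments = np.sum(holdings * np.diff(prices, axis=1), axis=2)   # (K_E, N * M)
    n_paths = increments.shape[0]
    cumulative = np.concatenate(
        [np.zeros((n_paths, 1)), np.cumsum(increments, axis=1)], axis=1
    )
    # Gains up to rebalancing index tau^k * M, selected separately for each path.
    gains = cumulative[np.arange(n_paths), np.asarray(stop_index) * M]
    return v_hat + gains - np.asarray(payoff_at_stop, dtype=float)
```

The mean of the returned residuals corresponds to (10), and the mean of their negative parts to (11).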

5. Example

In this section we study^{12} a Bermudan max-call option^{13} on d financial securities with risk-neutral price dynamics

$S_{t}^{i} = s_{0}^{i} \exp ([r - δ_{i} - σ_{i}^{2} / 2] t + σ_{i} W_{t}^{i}), \quad i = 1, 2, \dots, d,$

for a risk-free interest rate $r \in R$, initial values $s_{0}^{i} \in (0, \infty)$, dividend yields $δ_{i} \in [0, \infty)$, volatilities $σ_{i} \in (0, \infty)$ and a d-dimensional Brownian motion W with constant instantaneous correlations^{14} $ρ_{i j} \in R$ between different components $W^{i}$ and $W^{j}$. The option has time-t payoff ${(\max_{1 \leq i \leq d} S_{t}^{i} - K)}^{+}$ for a strike price $K \in [0, \infty)$ and can be exercised at one of finitely many times $0 = t_{0} < t_{1} < \dots < t_{N} = T$. In addition, we suppose there is a savings account where money can be deposited and borrowed at rate r.

For notational simplicity, we assume in the following that $t_{n} = n T / N$ for $n = 0, 1, \dots, N$, and all assets have the same^{15} characteristics; that is, $s_{0}^{i} = s_{0}$, $δ_{i} = δ$ and $σ_{i} = σ$ for all $i = 1, \dots, d$.
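Because the log-prices have independent Gaussian increments, the model can be sampled exactly on the exercise grid. The sketch below does this under the symmetric specification above and under the additional assumption of one common correlation $ρ$ for all pairs $i \neq j$ (the paper only requires constant $ρ_{i j}$); the function name and signature are hypothetical.

```python
import numpy as np

def simulate_assets(d, N, T, r, delta, sigma, rho, s0, n_paths, rng):
    """Exact samples of S_{t_n}, n = 0, ..., N, on the grid t_n = n T / N.

    Returns an array of shape (n_paths, N + 1, d).
    Assumption: a single correlation rho between every pair of Brownian components.
    """
    dt = T / N
    corr = np.full((d, d), rho) + (1.0 - rho) * np.eye(d)   # ones on the diagonal
    chol = np.linalg.cholesky(corr)                          # requires corr positive definite
    z = rng.standard_normal((n_paths, N, d)) @ chol.T        # correlated standard normals
    log_incr = (r - delta - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    log_paths = np.concatenate(
        [np.zeros((n_paths, 1, d)), np.cumsum(log_incr, axis=1)], axis=1
    )
    return s0 * np.exp(log_paths)
```

With the parameters of Table 1 ($r = 5\%$, $δ = 10\%$, $σ = 20\%$, $ρ = 0$, $T = 3$, $N = 9$), the terminal prices have mean $s_{0} e^{(r - δ) T} \approx 0.861 \, s_{0}$, which gives a simple check on the simulated paths.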

5.1. Pricing Results

Let us denote $X_{n} = S_{t_{n}}$, $n = 0, 1, \dots, N$. Then the price of the option is given by

$\sup_{τ} E [e^{- r \frac{τ T}{N}} {(\max_{1 \leq i \leq d} X_{τ}^{i} - K)}^{+}],$

where the supremum is over all stopping times $τ : Ω \to \{0, 1, \dots, N\}$ with respect to the filtration generated by ${(X_{n})}_{n = 0}^{N}$. The option payoff does not carry any information not already contained in $X_{n}$. However, the training of the continuation values worked more efficiently when we used it as an additional feature. So instead of $X_{n}$ we simulated the extended state process ${\hat{X}}_{n} = (X_{n}^{1}, \dots, X_{n}^{d}, X_{n}^{d + 1})$ for $X_{n}^{d + 1} = e^{- r \frac{n T}{N}} {(\max_{1 \leq i \leq d} X_{n}^{i} - K)}^{+}$ to train the continuation value functions $c^{θ_{n}}$, $n = 1, \dots, N - 1$. The network $c^{θ} : R^{d + 1} \to R$ was chosen of the form (4) with depth $I = 3$ (two hidden layers), $d + 50$ nodes in each hidden layer and activation function $φ = \tanh$. For training we used stochastic gradient descent with mini-batches of size 8192 and batch normalization (Ioffe and Szegedy 2015). At time $N - 1$ we used Xavier (Glorot and Bengio 2010) initialization and performed 6000 Adam (Kingma and Ba 2015) updating steps^{16}. For $n \leq N - 2$, we started the gradient descent from the trained network parameters $θ_{n + 1}$ and made 3500 Adam updating steps^{16}. To calculate $\hat{L}$ we simulated $K_{L} = 4{,}096{,}000$ paths of ${(X_{n})}_{n = 0}^{N}$. For $\hat{U}$ we generated $K_{U} = 2048$ outer and 2048 × 2048 inner simulations.
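The discounted payoff and the extended state ${\hat{X}}_{n}$ can be computed directly from the simulated asset prices. A minimal sketch with hypothetical function names, where `x` holds the $d$ asset prices in its last axis:

```python
import numpy as np

def discounted_payoff(n, x, r, T, N, strike):
    """g(n, x) = exp(-r n T / N) * (max_i x^i - K)^+ for arrays x of shape (..., d)."""
    return np.exp(-r * n * T / N) * np.maximum(np.max(x, axis=-1) - strike, 0.0)

def extended_state(n, x, r, T, N, strike):
    """Append the discounted payoff as feature d + 1, giving shape (..., d + 1)."""
    payoff = discounted_payoff(n, x, r, T, N, strike)
    return np.concatenate([x, payoff[..., None]], axis=-1)
```

The extra coordinate is a deterministic function of the other $d$ coordinates, so it adds no information; it only hands the network a feature that it would otherwise have to learn.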

Our results for $\hat{L}$, $\hat{U}$, $\hat{V}$ and 95% confidence intervals for different specifications of the model parameters are reported in Table 1. To achieve a pricing accuracy comparable to the more direct methods of Becker et al. (2019a, 2019b), the networks used in the construction of the candidate optimal stopping strategy had to be trained for a longer time. But in exchange, the approach yields approximate continuation values that can be used to break down the hedging problem into a series of smaller problems.

Table 1. Price estimates for max-call options on 5 and 10 symmetric assets for parameter values of $r = 5 \%$, $δ = 10 \%$, $σ = 20 \%$, $ρ = 0$, $K = 100$, $T = 3$, $N = 9$. $t_{L}$ is the number of seconds it took to train $τ^{Θ}$ and compute $\hat{L}$. $t_{U}$ is the computation time for $\hat{U}$ in seconds. 95% CI is the 95% confidence interval (6). The last column lists the 95% confidence intervals computed in Becker et al. (2019a).

5.2. Hedging Results

Suppose the hedging portfolio can be rebalanced at the times $u_{m} = m T / (N M)$, $m = 0, 1, \dots, N M$, for a positive integer M. We assume dividends paid by shares of $S^{i}$ held in the hedging portfolio are continuously reinvested in $S^{i}$. This results in the adjusted discounted security prices

$P_{u_{m}}^{i} = s_{0} \exp (σ W_{u_{m}}^{i} - σ^{2} u_{m} / 2), \quad m = 0, 1, \dots, N M.$

We set $Y_{m}^{i} = P_{u_{m}}^{i}$. To learn the hedging strategy, we trained neural networks $h^{λ_{m}} : R^{d} \to R^{d}$, $m = 0, \dots, N M - 1$, of the form (4) with depth $I = 3$ (two hidden layers), $d + 50$ nodes in each hidden layer and activation function $φ = \tanh$. As in Section 5.1, we used stochastic gradient descent with mini-batches of size 8192 and batch normalization (Ioffe and Szegedy 2015). For $m = 0, \dots, M - 1$, we initialized the networks according to Xavier (Glorot and Bengio 2010) and performed 10,000 Adam (Kingma and Ba 2015) updating steps^{16}, whereas for $m \geq M$, we started the gradient trajectories from the trained network parameters $λ_{m - M}$ and made 3000 Adam updating steps^{16}.
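The formula for the adjusted discounted prices can be checked in two lines. The sketch below assumes that one unit of $S^{i}$ held at time 0 with continuously reinvested dividends grows to $e^{δ u}$ units by time $u$, and that discounting with respect to the savings account means multiplying by $e^{- r u}$:

```latex
\begin{align*}
P_{u}^{i} &= e^{- r u} \, e^{δ u} \, S_{u}^{i}
           = e^{- (r - δ) u} \, s_{0} \exp \big( [r - δ - σ^{2} / 2] \, u + σ W_{u}^{i} \big) \\
          &= s_{0} \exp \big( σ W_{u}^{i} - σ^{2} u / 2 \big).
\end{align*}
```

In particular, each $P^{i}$ is an exponential martingale under the pricing measure, so the discounted gains ${(h \cdot P)}$ of a strategy with bounded holdings have mean zero.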

Table 2 reports the average hedging errors (8) and (10) together with the empirical hedging shortfalls (9) and (11) for different numbers M of rebalancing times between two consecutive exercise dates $t_{n - 1}$ and $t_{n}$. They were computed using $K_{E} = 4{,}096{,}000$ simulations of ${(Y_{m})}_{m = 0}^{N M}$.

Table 2. Average hedging errors and empirical hedging shortfalls for 5 and 10 underlying assets and different numbers M of rehedging times between consecutive exercise times $t_{n - 1}$ and $t_{n}$. The values of the parameters r, $δ$, $σ$, $ρ$, K, T and N were chosen as in Table 1. IHE is the intermediate average hedging error (8), IHS the intermediate hedging shortfall (9), HE the total average hedging error (10) and HS the total hedging shortfall (11). $\hat{V}$ is our price estimate from Table 1. T1 is the computation time in seconds for training the hedging strategy from time 0 to $t_{1} = T / N$. T2 is the number of seconds it took to train the complete hedging strategy from time 0 to T.

Figure 1 shows histograms of the total hedging errors

$\hat{V} + \sum_{m = 0}^{τ^{k} M - 1} h^{λ_{m}} (y_{m}^{k}) \cdot (p_{m + 1} (y_{m + 1}^{k}) - p_{m} (y_{m}^{k})) - g (τ^{k}, X_{τ^{k}}), \quad k = K_{H} + 1, \dots, K_{H} + K_{E},$

for $d \in \{5, 10\}$ and $M \in \{12, 96\}$.

Figure 1. Total hedging errors for $s_{0} = 100$, $M \in \{12, 96\}$, $d = 5$ (top) and $d = 10$ (bottom) along 4,096,000 sample paths of ${(Y_{m})}_{m = 0}^{N M}$. The values of the parameters r, $δ$, $σ$, $ρ$, K, T and N were as in Table 1 and Table 2.

6. Conclusions

In this article, we used deep learning to price and hedge American-style options. In a first step our method employs a neural network version of the Longstaff–Schwartz algorithm to estimate continuation values and derive a candidate optimal stopping rule. The learned stopping rule immediately yields a low-biased estimate of the price. In addition, it can be used to construct an approximate solution of the dual martingale problem of Rogers (2002) and Haugh and Kogan (2004). This gives a high-biased estimate and confidence intervals for the price. To achieve the same pricing accuracy as the more direct approaches of Becker et al. (2019a, 2019b), we had to train the neural network approximations of the continuation values for a longer time. However, computing approximate continuation values has the advantage that they can be used to break the hedging problem into a sequence of subproblems that compute the hedge only from one possible exercise date to the next.

Author Contributions

S.B., P.C. and A.J. have contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

A.J. acknowledges support from the DFG through Germany’s Excellence Strategy EXC 2044-390685587, Mathematics Münster: Dynamics - Geometry - Structure.

Conflicts of Interest

The authors declare no conflict of interest.

References

Andersen, Leif. 2000. A simple approach to the pricing of Bermudan swaptions in the multifactor LIBOR market model. The Journal of Computational Finance 3: 5–32.
Bally, Vlad, Gilles Pagès, and Jacques Printems. 2005. A quantization tree method for pricing and hedging multidimensional American options. Mathematical Finance 15: 119–68.
Barraquand, Jérôme, and Didier Martineau. 1995. Numerical valuation of high dimensional multivariate American securities. The Journal of Financial and Quantitative Analysis 30: 383–405.
Becker, Sebastian, Patrick Cheridito, and Arnulf Jentzen. 2019a. Deep optimal stopping. Journal of Machine Learning Research 20: 1–25.
Becker, Sebastian, Patrick Cheridito, Arnulf Jentzen, and Timo Welti. 2019b. Solving high-dimensional optimal stopping problems using deep learning. arXiv:1908.01602.
Berridge, Steffan J., and Johannes M. Schumacher. 2008. An irregular grid approach for pricing high-dimensional American options. Journal of Computational and Applied Mathematics 222: 94–111.
Bouchard, Bruno, and Xavier Warin. 2012. Monte-Carlo valuation of American options: Facts and new algorithms to improve existing methods. In Numerical Methods in Finance. Berlin and Heidelberg: Springer, pp. 215–255.
Broadie, Mark, and Menghui Cao. 2008. Improved lower and upper bound algorithms for pricing American options by simulation. Quantitative Finance 8: 845–61.
Broadie, Mark, and Paul Glasserman. 2004. A stochastic mesh method for pricing high-dimensional American options. Journal of Computational Finance 7: 35–72.
Bru, Bernard, and Henri Heinich. 1985. Meilleures approximations et médianes conditionnelles. Annales de l’I.H.P. Probabilités et Statistiques 21: 197–224.
Buehler, Hans, Lukas Gonon, Josef Teichmann, and Ben Wood. 2019. Deep hedging. Quantitative Finance 19: 1271–91.
Carriere, Jacques F. 1996. Valuation of the early-exercise price for options using simulations and nonparametric regression. Insurance: Mathematics and Economics 19: 19–30.
Chen, Yangang, and Justin W.L. Wan. 2019. Deep neural network framework based on backward stochastic differential equations for pricing and hedging American options in high dimensions. arXiv:1909.11532.
Egloff, Daniel, Michael Kohler, and Nebojsa Todorovic. 2007. A dynamic look-ahead Monte Carlo algorithm for pricing Bermudan options. Annals of Applied Probability 17: 1138–71.
Forsyth, Peter A., and Ken R. Vetzal. 2002. Quadratic convergence for valuing American options using a penalty method. SIAM Journal on Scientific Computing 23: 2095–122.
García, Diego. 2003. Convergence and biases of Monte Carlo estimates of American option prices using a parametric exercise rule. Journal of Economic Dynamics and Control 27: 1855–79.
Glorot, Xavier, and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. Paper Presented at Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR, Sardinia, Italy, May 13–15, vol. 9, pp. 249–256.
Han, Jiequn, Arnulf Jentzen, and Weinan E. 2018. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences of the United States of America 115: 8505–10.
Haugh, Martin B., and Leonid Kogan. 2004. Pricing American options: a duality approach. Operations Research 52: 258–70.
Hull, John C. 2003. Options, Futures and Other Derivatives. London: Pearson. Upper Saddle River: Prentice Hall.
Ioffe, Sergey, and Christian Szegedy. 2015. Batch normalization: accelerating deep network training by reducing internal covariate shift. Paper presented at 32nd International Conference on Machine Learning, ICML 2015, Lille, France, July 6–11, vol. 37, pp. 448–456.
Jain, Shashi, and Cornelis W. Oosterlee. 2015. The stochastic grid bundling method: efficient pricing of Bermudan options and their Greeks. Applied Mathematics and Computation 269: 412–31.
Kingma, Diederik P., and Jimmy Ba. 2015. Adam: A method for stochastic optimization. Paper Presented at International Conference on Learning Representations, San Diego, CA, USA, May 7–9.
Kohler, Michael, Adam Krzyżak, and Nebojsa Todorovic. 2010. Pricing of high-dimensional American options by neural networks. Mathematical Finance 20: 383–410.
Kolodko, Anastasia, and John Schoenmakers. 2006. Iterative construction of the optimal Bermudan stopping time. Finance and Stochastics 10: 27–49.
Lapeyre, Bernard, and Jérôme Lelong. 2019. Neural network regression for Bermudan option pricing. arXiv:1907.06474.
Longstaff, Francis A., and Eduardo S. Schwartz. 2001. Valuing American options by simulation: a simple least-squares approach. The Review of Financial Studies 14: 113–47.
Reisinger, Christoph, and Jan H. Witte. 2012. On the use of policy iteration as an easy way of pricing American options. SIAM Journal on Financial Mathematics 3: 459–78.
Rogers, Chris. 2002. Monte Carlo valuation of American options. Mathematical Finance 12: 271–86.
Rogers, Chris. 2010. Dual valuation and hedging of Bermudan options. SIAM Journal on Financial Mathematics 1: 604–8.
Sirignano, Justin, and Konstantinos Spiliopoulos. 2018. DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics 375: 1339–64.
Tilley, James A. 1993. Valuing American options in a path simulation model. Transactions of the Society of Actuaries 45: 83–104.
Tsitsiklis, John N., and Benjamin Van Roy. 2001. Regression methods for pricing complex American-style options. IEEE Transactions on Neural Networks 12: 694–703.

1.	Meaning feedforward networks with a single hidden layer.
2.	This covers Bermudan options as well as American options that can only be exercised at a given time each day. Continuously exercisable options must be approximated by discretizing time.
3.	That is, $X_{n}$ is $F_{n}$-measurable, and $E [f (X_{n + 1}) ∣ F_{n}] = E [f (X_{n + 1}) ∣ X_{n}]$ for all $n \leq N - 1$ and every measurable function $f : R^{d} \to R$ such that $f (X_{n + 1})$ is integrable.
4.	The main difference between this algorithm and the one of Longstaff and Schwartz (2001) is the use of neural networks instead of linear combinations of basis functions. In addition, the sum in (3) is over all simulated paths, whereas in Longstaff and Schwartz (2001), only in-the-money paths are considered to save computational effort. While it is enough to use in-the-money paths to determine a candidate optimal stopping rule, we need accurate approximate continuation values for all $x \in R^{d}$ to construct good hedging strategies in Section 4.
5.	As usual, we simulate the paths $(x_{n}^{k})$, $k = 1, \dots, K$, independently of each other.
6.	Generated independently of ${(x_{n}^{k})}_{n = 0}^{N}$, $k = 1, \dots, K$.
7.	The use of nested simulation ensures that $m_{n}^{k}$ are unbiased estimates of $M_{n}^{Θ}$, which is crucial for the validity of the upper bound. In particular, we do not directly approximate $M_{n}^{Θ}$ with the estimated continuation value functions $c^{θ_{n}}$.
8.	Discounting is done with respect to the savings account. Then, the discounted value of money invested in the savings account stays constant.
9.	That is, $Y_{m}$ is $H_{m}$-measurable and $E [f (Y_{m + 1}) ∣ H_{m}] = E [f (Y_{m + 1}) ∣ Y_{m}]$ for all $m \leq N M - 1$ and every measurable function $f : R^{d} \to R$ such that $f (Y_{m + 1})$ is integrable.
10.	See Table 2 and Figure 1 below.
11.	Independent of ${(y_{m}^{k})}_{m = 0}^{M}$, $k = 1, \dots, K_{H}$.
12.	The computations were performed on an NVIDIA GeForce RTX 2080 Ti GPU. The underlying system was an AMD Ryzen 9 3950X CPU with 64 GB DDR4 memory running TensorFlow 2.1 on Ubuntu 19.10.
13.	Bermudan max-call options are a benchmark example in the literature on numerical methods for high-dimensional American-style options; see, e.g., Longstaff and Schwartz (2001); Rogers (2002); García (2003); Broadie and Glasserman (2004); Haugh and Kogan (2004); Broadie and Cao (2008); Berridge and Schumacher (2008); Jain and Oosterlee (2015); Becker et al. (2019a, 2019b).
14.	That is, $E [(W_{t}^{i} - W_{s}^{i}) (W_{t}^{j} - W_{s}^{j})] = ρ_{i j} (t - s)$ for all $i \neq j$ and $s < t$.
15.	Simulation based methods work for any price dynamics that can efficiently be simulated. Prices of max-call options on underlying assets with different price dynamics were calculated in Broadie and Cao (2008) and Becker et al. (2019a).
16.	The hyperparameters $β_{1}, β_{2}, ε$ were chosen as in Kingma and Ba (2015). The stepsize $α$ was specified as $10^{-1}$, $10^{-2}$, $10^{-3}$ and $10^{-4}$ according to a deterministic schedule.
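
The correlation condition in footnote 14 can be illustrated with a short simulation. The following is only a sketch under simplifying assumptions that are not taken from the paper: all pairwise correlations are equal to a single value `rho` between 0 and 1, which permits a one-factor construction instead of a general matrix factorization. The function names `correlated_increments` and `sample_covariance` are hypothetical.

```python
import math
import random


def correlated_increments(d, rho, dt, rng):
    """Draw one vector of d Brownian increments over a time step dt.

    Assumption (not from the paper): every pair of components has the same
    correlation rho in [0, 1], so a one-factor construction suffices.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("one-factor construction requires 0 <= rho <= 1")
    common = rng.gauss(0.0, 1.0)  # shock shared by all components
    scale = math.sqrt(dt)
    return [
        scale * (math.sqrt(rho) * common
                 + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0))
        for _ in range(d)
    ]


def sample_covariance(d, rho, dt, n_samples, seed=0):
    """Monte Carlo estimate of E[dW^1 dW^2], which should be close to rho * dt."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_samples):
        dw = correlated_increments(d, rho, dt, rng)
        acc += dw[0] * dw[1]
    return acc / n_samples
```

Each component has variance `dt`, and the product of two distinct components has expectation `rho * dt`, which matches the condition in footnote 14 with $t - s$ equal to `dt`. For a general correlation matrix one would typically factorize the matrix instead (for example by a Cholesky decomposition).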

Figure 1. Total hedging errors for $s_{0} = 100$, $M \in \{12, 96\}$, $d = 5$ (top) and $d = 10$ (bottom) along 4,096,000 sample paths of ${(Y_{m})}_{m = 0}^{N M}$. The values of the parameters $r$, $δ$, $σ$, $ρ$, $K$, $T$ and $N$ were as in Table 1 and Table 2.

Table 1. Price estimates for max-call options on 5 and 10 symmetric assets for parameter values of $r = 5\%$, $δ = 10\%$, $σ = 20\%$, $ρ = 0$, $K = 100$, $T = 3$, $N = 9$. $t_{L}$ is the number of seconds it took to train $τ^{Θ}$ and compute $\hat{L}$. $t_{U}$ is the computation time for $\hat{U}$ in seconds. 95% CI is the 95% confidence interval (6). The last column lists the 95% confidence intervals computed in Becker et al. (2019a).

$d$	$s_{0}$	$\hat{L}$	$t_{L}$	$\hat{U}$	$t_{U}$	Point Est.	95% CI	DOS 95% CI
5	90	$16.644$	132	$16.648$	8	$16.646$	$[16.628, 16.664]$	$[16.633, 16.648]$
5	100	$26.156$	134	$26.152$	8	$26.154$	$[26.138, 26.171]$	$[26.138, 26.174]$
5	110	$36.780$	133	$36.796$	8	$36.788$	$[36.758, 36.818]$	$[36.745, 36.789]$
10	90	$26.277$	136	$26.283$	8	$26.280$	$[26.259, 26.302]$	$[26.189, 26.289]$
10	100	$38.355$	136	$38.378$	7	$38.367$	$[38.335, 38.399]$	$[38.300, 38.367]$
10	110	$50.869$	135	$50.932$	8	$50.900$	$[50.846, 50.957]$	$[50.834, 50.937]$
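
The point estimates in Table 1 agree, up to rounding in the third decimal place, with the midpoint of the lower estimate $\hat{L}$ and the upper estimate $\hat{U}$. The sketch below checks this on the tabulated values. It assumes that the point estimate is this midpoint, and the names `TABLE_1`, `midpoint` and `max_midpoint_gap` are hypothetical helpers introduced only for illustration.

```python
# Rows copied from Table 1: (d, s0, L_hat, U_hat, point_estimate)
TABLE_1 = [
    (5, 90, 16.644, 16.648, 16.646),
    (5, 100, 26.156, 26.152, 26.154),
    (5, 110, 36.780, 36.796, 36.788),
    (10, 90, 26.277, 26.283, 26.280),
    (10, 100, 38.355, 38.378, 38.367),
    (10, 110, 50.869, 50.932, 50.900),
]


def midpoint(lower, upper):
    """Average of the lower and upper price estimates."""
    return 0.5 * (lower + upper)


def max_midpoint_gap(rows):
    """Largest absolute difference between the midpoint and the tabulated point estimate."""
    return max(abs(midpoint(low, up) - point) for _, _, low, up, point in rows)
```

Because the table reports three decimals, a gap of at most about 0.0005 is consistent with rounding alone. Note that for $d = 5$ and $s_{0} = 100$ the estimate $\hat{L}$ slightly exceeds $\hat{U}$; both are Monte Carlo estimates and therefore subject to sampling error.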

Table 2. Average hedging errors and empirical hedging shortfalls for 5 and 10 underlying assets and different numbers $M$ of rehedging times between consecutive exercise times $t_{n - 1}$ and $t_{n}$. The values of the parameters $r$, $δ$, $σ$, $ρ$, $K$, $T$ and $N$ were chosen as in Table 1. IHE is the intermediate average hedging error (8), IHS the intermediate hedging shortfall (9), HE the total average hedging error (10) and HS the total hedging shortfall (11). $\hat{V}$ is our price estimate from Table 1. T1 is the computation time in seconds for training the hedging strategy from time 0 to $t_{1} = T / N$. T2 is the number of seconds it took to train the complete hedging strategy from time 0 to $T$.

$d$	$s_{0}$	$M$	IHE	IHS	IHS/ $\hat{V}$	T1	HE	HS	HS/ $\hat{V}$	T2
5	90	12	$0.007$	$0.190$	$1.1 %$	102	$- 0.001$	$0.676$	$4.1 %$	379
5	90	24	$0.007$	$0.139$	$0.8 %$	129	$- 0.002$	$0.492$	$3.0 %$	473
5	90	48	$0.007$	$0.104$	$0.6 %$	234	$- 0.001$	$0.367$	$2.2 %$	839
5	90	96	$0.007$	$0.081$	$0.5 %$	436	$- 0.001$	$0.294$	$1.8 %$	1546
5	100	12	$0.013$	$0.228$	$1.4 %$	102	$0.006$	$0.785$	$4.7 %$	407
5	100	24	$0.013$	$0.163$	$1.0 %$	131	$0.006$	$0.569$	$3.4 %$	512
5	100	48	$0.013$	$0.118$	$0.7 %$	252	$0.007$	$0.423$	$2.5 %$	931
5	100	96	$0.013$	$0.089$	$0.5 %$	470	$0.006$	$0.335$	$2.0 %$	1668
5	110	12	$0.002$	$0.268$	$1.6 %$	102	$- 0.012$	$0.881$	$5.3 %$	380
5	110	24	$0.002$	$0.192$	$1.2 %$	130	$- 0.012$	$0.638$	$3.8 %$	511
5	110	48	$0.002$	$0.139$	$0.8 %$	262	$- 0.013$	$0.474$	$2.9 %$	950
5	110	96	$0.002$	$0.105$	$0.6 %$	471	$- 0.010$	$0.374$	$2.3 %$	1673
10	90	12	$- 0.015$	$0.192$	$0.7 %$	111	$- 0.010$	$0.902$	$3.4 %$	414
10	90	24	$- 0.014$	$0.147$	$0.6 %$	145	$- 0.011$	$0.704$	$2.7 %$	534
10	90	48	$- 0.015$	$0.136$	$0.5 %$	269	$- 0.011$	$0.611$	$2.3 %$	958
10	90	96	$- 0.015$	$0.121$	$0.5 %$	506	$- 0.012$	$0.551$	$2.1 %$	1792
10	100	12	$0.008$	$0.230$	$0.9 %$	111	$0.015$	$1.025$	$3.9 %$	414
10	100	24	$0.008$	$0.176$	$0.7 %$	152	$0.014$	$0.797$	$3.0 %$	531
10	100	48	$0.008$	$0.150$	$0.6 %$	271	$0.016$	$0.682$	$2.6 %$	978
10	100	96	$0.008$	$0.132$	$0.5 %$	512	$0.014$	$0.672$	$2.6 %$	1803
10	110	12	$- 0.029$	$0.249$	$1.0 %$	112	$- 0.026$	$1.146$	$4.4 %$	410
10	110	24	$- 0.029$	$0.189$	$0.7 %$	146	$- 0.027$	$0.908$	$3.5 %$	530
10	110	48	$- 0.029$	$0.160$	$0.6 %$	269	$- 0.026$	$0.782$	$3.0 %$	965
10	110	96	$- 0.029$	$0.151$	$0.6 %$	507	$- 0.024$	$0.666$	$2.5 %$	1777
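
In Table 2 the total hedging shortfall HS shrinks as the number $M$ of rehedging times grows. One way to summarize how fast is a least-squares slope in log-log coordinates. The sketch below does this for the rows with $d = 5$ and $s_{0} = 90$. It is a purely descriptive summary of the tabulated numbers, not a convergence statement made in the paper, and the names `M_VALUES`, `HS_VALUES` and `loglog_slope` are hypothetical.

```python
import math

# Total hedging shortfall HS for d = 5, s0 = 90, copied from Table 2
M_VALUES = [12, 24, 48, 96]
HS_VALUES = [0.676, 0.492, 0.367, 0.294]


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) regressed on log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    return sxy / sxx
```

For these four rows the slope is negative and lies between -0.5 and -0.3, so doubling $M$ reduces HS by a factor somewhat closer to one than $1 / \sqrt{2}$. The other parameter combinations in the table can be examined in the same way by replacing `HS_VALUES`.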

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
