Financial Option Valuation by Unsupervised Learning with Artificial Neural Networks

Artificial neural networks (ANNs) have recently also been applied to solve partial differential equations (PDEs). The classical problem of pricing European and American financial options, based on the corresponding PDE formulations, is studied here. Instead of using numerical techniques based on finite element or difference methods, we address the problem using ANNs in the context of unsupervised learning. As a result, the ANN learns the option values for all possible underlying stock values at future time points, based on the minimization of a suitable loss function. For the European option, we solve the linear Black–Scholes equation, whereas for the American option we solve the linear complementarity problem formulation. Two-asset exotic option values are also computed, since ANNs enable the accurate valuation of high-dimensional options. The resulting errors of the ANN approach are assessed by comparing to the analytic option values or to numerical reference solutions (for American options, computed by finite elements). In the short note, previously published, a brief introduction to this work was given, where some ideas to price vanilla options by ANNs were presented, and only European options were addressed. In the current work, the methodology is introduced in much more detail.


Introduction
The interest in machine learning techniques, due to the remarkable successes in different application areas, is growing exponentially. Impressive results have been achieved in image recognition or natural language processing problems, among others. The availability of large data sets and powerful compute units has brought the broad field of data science to a next level. ANNs are learning systems based on a collection of artificial neurons that constitute a connected network [26]. Such systems "learn" to perform tasks, generally without being programmed with task-specific rules. The neurons are organized in multiple layers; The input layer receives external data, the output layer produces the final result. The layers in between input and output are the so-called hidden these unsupervised learning methods yield accurate results. The authors in [16] extended the class of PDE solutions that may be approximated by these unsupervised learning methods, by translating the PDEs to a suitably weighted minimization problem for the ANNs to solve. Moreover, in [4,5] American options were formulated as optimal stopping problems, where optimal stopping decisions were learned and so-called ANN regression was used to estimate the continuation values. This is an example of the unsupervised learning approach to solve a specific formulation of options with early-exercise features.
We will price European and American options modeled by the Black-Scholes PDE and look for solutions for all future time points and stock values. So, linear and nonlinear partial differential equations need to be solved. We will solve European and American option problems based on one and two underlying assets, as the methodology is easily extended to solving multi-asset options. For the European problems, the accuracy of the network can be measured as we have the analytic Black-Scholes solution as a reference. American options will be formulated as linear complementarity problems. Since an analytic solution is not known in this case, the reference solutions are obtained by finite element computations on fine meshes. This paper is organized as follows. In Section 2, the methodology to train the neural network is introduced. In Section 3, the financial PDE problems are formulated, for the linear and the nonlinear case. Numerical results, ANN convergence and solution accuracy, are presented in Section 4. Finally, Section 5 concludes.

Artificial Neural Networks Solving PDEs
In this section, we introduce the methodology following [16] to solve linear and nonlinear time-dependent PDEs by ANNs. With this aim, we write a general PDE problem as follows: N 0 (v(t * , x)) = 0 x ∈ Ω and t * = 0 or t * = T, where v(t, x) denotes the solution of the PDE, N I (·) is a linear or nonlinear time-dependent differential operator, N B (·) is a boundary operator, N 0 (·) is an initial or final time operator, Ω is a subset of R D and ∂ Ω denotes the boundary on the domain Ω.
As mentioned in the introduction, we will compute European and American option values for one and two underlying assets by unsupervised learning. The goal is to obtainv(t, x) by minimizing a suitable loss function L(v) over the space of k-times differentiable functions, where k depends on the order of the derivatives in the PDE, i.e where we denote byv(t, x) the true solution of the PDE.
Results are available that establish a relation between the value of the loss function and the accuracy of the approximated solution. A general expression for the loss function, defined in terms of the L p norm, including a weighting, is defined as follows [19,16]: where Ω = Ω × [0, T ], ∂Ω the boundary of Ω and The integrals of the loss function are labeled as: which are denoted as the interior and the boundary loss functions, respectively.
Financial options with early-exercise features give rise to free boundary PDE problems. Free boundary problems are well-known and often appearing in a variety of engineering problems. We recall some classical formulations of the free boundary problems that we encounter here: • An optimal stopping time problem, • A linear complementarity problem (LCP), • A parabolic variational inequality, • A penalty problem.
We will focus on the reformulation of the free boundary problem as an LCP, and aim to solve this formulation by ANNs and unsupervised learning. The generic LCP formulation reads, or, equivalently, x ∈ Ω and t * = 0 or t * = T.
Our expression for the loss function, to solve the linear complementarity problem, is as follows: As an alternative loss function for the LCP, a variance normalization loss function has also been considered [16], which is defined as: whereN I is defined as N I but considering each term in absolute value andv is the mean of v over the corresponding domain.
The parameter λ ∈ (0, 1) in the loss functions represents the relative importance of the interior and boundary functions in the minimization process. The choice of such value can be addressed in different ways, see [19,16]. In this work, the loss weight is, in most of the tests, set equal to λ = 0.5. It was found in [16] that this choice works very well for PDE problems with smooth, non-oscillatory solutions (as we also encounter them in the option valuation problems under consideration). For some linear complementarity problems, we will compare the basic choice with the variance normalization loss function. In addition, for some other cases, we will compute the loss function considering a so-called optimal loss weight (as in [16]).
Based on the loss function, the ANN has been trained with the Broyden-Fletcher-Goldfarb-Shanno optimization (BFGS). This is a quasi-Newton method which employs an approximate Hessian matrix. Particularly, we use the L-BFGS algorithm to optimize the vector θ, which contains all parameters defining the neural network. The activation function used in the ANN is the hyperbolic tangent function tanh(x), however, other choices of the activation function can also be used, like the sigmoid function (resulting in very similar results in this work). We will work with relatively small neural networks formed by four hidden layers with 20 neurons each for the European and American options. Increasing the number of layers did not improve the accuracy of the solution significantly for these particular problems. Finally, the integral terms in the loss function are approximated by Monte Carlo techniques.

Financial derivative pricing partial differential equations
In this section the option pricing partial differential equation problems are presented. We briefly introduce the models.

European options, one underlying asset
The reference option pricing PDE for the valuation of a plain vanilla European, put or call, option is the Black-Scholes equation. The underlying asset S t is assumed to pay a constant dividend yield δ, and follows the geometric Brownian motion: where W P t is a Brownian motion. The drift term µ, the risk-free interest rate, r, and the asset volatility, σ, are known functions. Assuming there are no arbitrage opportunities, the European option value follows from the Black-Scholes equation, where operator A is defined as, and function H denotes the option's payoff, which is given by: with K the strike price in the option contract.
In order to apply numerical methods to solve the PDE, a bounded domain should be considered and a proper set of boundary conditions should be imposed. We assume a domain large enough being [0, S ∞ ], with S ∞ four times the strike K.
Depending on the kind of option, call v c or put v p , the problem (8) is subject to the conditions: The analytic solution for (8) is known: with, Regarding the numerical solution with ANNs, we will use the methodology introduced in the previous section. In particular, the loss function is defined as: where functions G and H denote the values of the spatial boundary conditions and final condition, respectively. The integral terms in the loss function are approximated by Monte Carlo techniques, as a result, we obtain the following interior and boundary loss function for the parameter vector θ: The collocation points are uniformly distributed over the domain Ω and the boundary ∂ Ω and {y 0 i } n0 i=1 are uniformly distributed over the domain T × Ω , respectively and y = (t, x).

Two underlying assets
We extend the model for one underlying asset to valuing basket options with two underlying assets. The two-asset prices follow the following dynamics, where µ 1 , µ 2 are drift terms, δ 1 , δ 2 dividend yields, the Brownian increments, dW i for i = 1, 2, satisfy E(dW i ) = 0, and the underlying assets are correlated: In the Black-Scholes framework, the two-asset European option price, v(t, S 1 , S 2 ), satisfies the following PDE: where the operator B is defined as follows: and function H 2 (S 1 , S 2 ) denotes the payoff function. By prescribing different payoff functions, different options can be defined, like an exchange option, rainbow option or an average put option. We will deal with the exchange option, for which an analytic solution is given by the Margrabe's formula [15] and the max-on-call rainbow option, for which a closed-form expression was introduced in [10] and [23]. These particular options are defined by their payoff functions: According to the Margrabe's formula, the fair value of a European exchange option at time t is given by: where N 0,1 again denotes the cumulative distribution function for the standard normal, σ = σ 2 1 + σ 2 2 − 2σ 1 σ 2 ρ and With the following parameters: the closed-form formula for a call on the maximum is given by: where M is the cumulative bivariate normal distribution dxdy .
To obtain a numerical solution of the PDE (14), we bound the domain and impose appropriate boundary conditions. The computational domain should be In the particular case of the exchange and rainbow max-on-call options, where the analytic solutions are known, we impose as boundary conditions the analytic option value on each boundary.
Similar to the one-dimensional problem, we address the European exchange option problem building the loss function as a sum of the interior and boundary loss functions, using λ = 0.5.

American options, one underlying asset
As we have introduced in Section 2, we also address the problem for an American option depending on one underlying asset price. With this aim, we focus on the linear complementarity formulation.

Linear complementarity formulation
We will here consider the linear complementarity problem (LCP) American option valuation formulation, see, for example, [24,9], as follows, This LCP can be rewritten as a nonlinear PDE as follows Essentially, using the same methodology for solving the European option PDEs, we address the linear complementarity formulation and its equivalent formulation as a nonlinear PDE given by (19).
As we introduced in Section 2, the loss function can be formulated using variance normalization. Moreover, in case of the American option we will also compute λ as the optimal loss weight.
The loss function based on variance normalization depends on the variance of the network output. For the Black-Scholes American option problem, the loss function following variance normalization is given by whereL(v(t, x))) is defined as follows function G refers to the boundary conditions imposed in a bounded domain which are defined as in (11) and function H denotes the final condition. Moreover,v is the mean of v over the corresponding domain, which is given as Then, approximating each integral term by Monte Carlo techniques the resulting function is defined as follows with θ containing all parameters of the neural network, vector y = (t, x), and the collocation points {y * i } n * i=1 are uniformly distributed over the boundary ∂Ω. An alternative is to build the loss function based on an optimal loss weight. However, optimizing λ can be nontrivial.
In order to find the optimal loss weight, we may look for a so-called -close solution to the true solutionv, see [16], for all n ≥ 0 and i ∈ 1, . . . , d, where d is the dimension of the problem. Satisfying such condition, the value of the optimal loss weight λ * should be: where functionN I (x,v) is defined as the function N I (x,v) with each term in absolute value. This expression of λ * is constant when the analytical solution is known. However, for the American options where the analytical solution is not known, the optimal loss weight can be computed by approximating the value ofv in (24) by the trained solution. Note that in this case, the loss weight is a function instead of a constant value and is optimized by the neural network.
As a result, the loss function is built in the following way: and the optimal loss weight is given in terms of the trained solution v, as follows: (21).

Two-Asset American option
The one underlying asset American option pricing problem is extended to also price multi-asset American options. We focus on two underlying assets and formulate the problem as a linear complementarity problem. Based on two asset prices following correlated geometric Brownian motion, the American option value can be modeled by the following linear complementarity problem: Operator B is defined as in (15) and function H 2 (S 1 , S 2 ) denotes the payoff function. In order to compare with the European option problem, an American call on the maximum is also priced, moreover, we address a two-asset spread option and a put arithmetic average option. Then, the payoff functions are defined as: H 2 (S 1 , S 2 ) = (max(S 1 , S 2 ) − K) + , max-on-call rainbow option, H 2 (S 1 , S 2 ) = (S 1 − S 2 − K) + , asset spread option, H 2 (S 1 , S 2 ) = (K − (S 1 + S 2 )/2) + , arithmetic average put.
In order to solve the linear complementarity formulation using numerical methods, a bounded domain should be considered and appropriate boundary conditions should be imposed. In particular, we consider a domain large enough to avoid that the solution is affected by the conditions, in the interested regions of the asset prices. Whereas for the European option problem the analytical solution is known and imposed as a boundary condition, for the American options problem, where the analytical solution is not known, we should define the appropriate boundary conditions. Then, we start studying at which boundaries a condition should be imposed. Following [18], that includes the theory of Fichera [7], we introduce the notation x 0 = τ , x 1 = S 1 , x 2 = S 2 , and the domain where we use the notation: Then, the PDE in (26) can be written in the form: where the involved data are defined as follows: Next, we introduce the following subset of Γ * in terms of the normal vector to the boundary pointing inwards Ω * , − → m = (m 0 , m 1 , m 2 ) In this particular problem, we have: and Σ 2 = Γ * ,+ 0 . Thus, following [18], the boundary conditions must be imposed over the subset Σ 1 ∪Σ 2 which matches with the set Γ * ,− 0 ∪Γ * ,+ 1 ∪Γ * ,+ 2 . Then, is not necessary to impose boundary conditions above the boundary where the asset prices S 1 and S 2 are equal zero. Moreover, for simplicity, we assume that the option value is equal to the payoff when the asset prices S 1 and S 2 take the maximum values.
Next, taking into account the methodologies proposed to solve the one-dimensional American problem and the two-dimensional European problem, we propose the loss function to solve the multi-asset American option by artificial neural network. First of all, we rewrite the linear complementarity problem (26) as the equivalent nonlinear PDE: Similar to the previous problems, we build the loss function as the sum of the interior and boundary loss functions as follows: where function G refers to the boundary conditions and function H denotes the final condition imposed for the problems. Note that the loss function is a generalization of the loss function introduced for the one asset problem and the integral terms are also approximated by Monte Carlo techniques.

ANN Option Pricing Results
In this section the European and American options values are computed with the ANNs based on the loss functions introduced. We apply the unsupervised learning methodology from the previous section to compute the solutions and show some results. For the following tests, we have considered the parameter p = 2 in the loss functions.

European options
First of all, we discuss the European option single asset results obtained solving the PDE problem (8) by ANNs.
The results are presented with the loss function introduced in (12) and here we use the basic choice λ = 0.5. Recall that optimal loss weight-based loss function or the variance normalization technique are especially useful in the case of nontrivial solutions.
We start with a European put option, with the following parameters values: σ = 0.25, r = 0.04, T = 1, K = 15, S ∞ = 4K, δ = 0.0. In Figure 1, the ANNbased, trained and the analytical solution are plotted for two time instances. We measure the accuracy of the solution generated by the ANN by comparing the relative error of the trained solution v AN N with the analytic solution v BS , as follows: In Figure 2 the error throughout the domain is plotted. Clearly, the biggest error in the ANN solution is found close to the strike price at maturity time t = T , where the payoff is non-smooth. The relative error according to (29) with λ = 0.5 is equal to 2.23 × 10 −4 .
Next, we show some results for a European option depending on two underlying assets.
The corresponding loss function has been optimized by means of the L-BFGS algorithm and choosing the tanh as the activation function. In the last layer a linear activation function is considered. We have plotted in Figure 3 the ANN solution for the European exchange option. The error, comparing the approximated ANN solution with the analytical solution given by (16) is also plotted. Note that the maximum error is obtained for the minimum value of both asset prices, which is related to S 1 /S 2 in the expression of d 1 in (16).   for several values of K, modifying the original domain, we found that scaling the input parameters is not sufficient to obtain highly accurate results for large domain sizes. In Table 1, the error for a European max-call option is presented, based on different unscaled domain sizes. It can be observed that as the domain increases the accuracy of the neural network solution decreases.
In order to understand, the reasons for the degraded accuracy with an increasing domain size, we have computed the gradients of the interior and boundary loss functions. In Table 2   domain size, and therefore the ANN needs to be modified. The initialization of the weights is adapted by using a variation of the Xavier initialization. In particular, the initial values of the weight values in the last layer of the ANN will be scaled, by multiplying them by the maximum option value. As a result, we obtain a solution which is accurate independent of the size of the domain, see Table 3. This adaptation, i.e. the weights having similar magnitude as the expected largest option value in the output, forms a robust weight initialization. Moreover, such initialization helps for the interior and boundary loss functions to have similar sensitivity to the domain size. In Table 4, we can observe such behaviour, where the rate between both gradients remains close to 1/3 when the size of the domain increases. Our results show that the BFGS optimization doesn't seem to pick up the gradient if the initial weights are not sufficiently large. Moreover, similar results can be observed when the inputs are not scaled.
Based on the adapted weights initialization, in Table 5, the results for the European max-call option are presented, and we compare the solution computed by the ANN with the analytical solution given by (17) for some specific asset prices and different strike values, based on the corresponding loss function and λ = 0.5. The parameters considered are, σ 1 = σ 2 = 0.2, ρ = 0.1, δ i = 0.1, r = 0.05 and T = 1. Moreover, the maximum value of the asset prices is S 1∞ = S 2∞ = 4K. Note that the accuracy of the trained solution is not affected by the size of the domain, in addition, similar to the one-dimensional case, the maximum error is (S 1,∞ , S 2,∞ )   obtained when the underlying value is close to the strike price.
In Table 6, we present the error for the two-asset European options. The values are computed based on the expression in (29). We observe very fine accuracy for the problems.

American options
The goal of this section is to address the American option problem by using unsupervised learning with the ANN. As for the European option, we compute the value for one and two underlying assets. However, whereas for the European option, an analytical solution is known, for the American case, we will use the  option values computed by finite elements (FEM) using the numerical methods in [2] and [3] to solve the linear complementarity problem for the American options as the reference.
Similar to the European options problem, the loss function has been optimized using the L-BFGS algorithm, moreover, the ANN is based on the activation function tanh(x). With the aim of comparing both methodologies, we price an American option with the same parameter data that in the previous example, considering now, the optimal loss weight, which equals λ ≈ 0.90. We determine first American options with the following parameter data: σ = 0.25, δ = 0.26, r = 0.3, T = 1, K = 15, S ∞ = 4K. Figure 4 shows the trained solution and the error related to a reference FEM solution, for all time points. As for the European options, the maximum error is reached when the asset price is equal to the strike price, close to the maturity time, where the payoff function is not smooth. In Figure 5, a comparison of the American option value computed by ANNs or FEM is presented for different time points. Moreover, the payoff is plotted to demonstrate that the obstacle condition is satisfied. The accuracy of two loss functions for the LCP, one based on optimal loss weight and another based on variance normalization, is compared by means of the relative error of the solution, computed in terms of the L 2 -norm, similar to (29), i.e.
Very similar accuracy is obtained with both loss functions, 5.38 × 10 −4 (for optimal loss weight) versus 5.82 × 10 −4 (variance normalization). However, comparing the convergence of both methodologies, which is presented in Figure  6, we clearly observe that defining the loss function with a variance normalization (left) the neural network converges faster than using the optimal loss weight (right). In this figure, the relative error is plotted for different numbers of iterations of the L-BFGS algorithm. Next, we value the American options depending on two underlying assets. Optimizing the loss function with the L-BFGS algorithm, with the tanh(x) as the activation function, and equal weighting of boundary and interior losses, λ = 0.5, the following results have been obtained for the three types of options.
In Table 7, a comparison between the ANN and FEM solutions is shown. We focus on an American max-call option with several strike values and the following parameter data, ρ = 0.1, σ 1 = σ 2 = 0.25, r = 0.04, δ = 0.01 and T = 0.5. Moreover, for the FEM, 75 time steps have been considered for the time discretization and the spatial discretization is based on a 101 × 101 mesh. Figure 7 shows the trained solution for the spread American option with the following parameter values, K = 15, S 1∞ = S 2∞ = 4K, σ 1 = σ 2 = 0.25, ρ = 0.0, r R = 0.04, r = 0.3 and T = 1, and the error surface using the FEM solution following [3] as a reference. In Figure 8, the option value and the difference with the payoff function are shown.
Finally, in order to show the accuracy of the method applied to train the ANN to price American options depending on two asset prices, the relative error is presented in Table 8.    Note that the accuracy of the neural network for the American options depending on two stochastic factors is lower than for the European options. However,

Error
Max-call 1.73 × 10 −3 Spread 2.45 × 10 −3 Arithmetic average put 6.42 × 10 −3 Table 8: Error for different multi-asset American options this may be because here a numerical solution is our reference and not a closedform expression.

Conclusions
In this work, classical problems in financial option pricing have been addressed with artificial neural networks. In particular, following the classical Black-Scholes model, European and American options depending on one and two underlying assets have been valued. A new unsupervised learning methodology is introduced to solve the option value problems based on the PDE formulation. With this aim, we proposed appropriate loss functions. The classical Black-Scholes American option pricing problem has been formulated as a linear complementarity problem.
For the European option problem, the accuracy of the methods was compared to the analytical solution, whereas, for American options, solutions computed by the finite element method were used as reference values. For all problems considered, the final error in the ANN solution was highly satisfactory. Needless to mention that ANNs can be easily extended to solving higher-dimensional problems, as they are not drastically affected by the curse of dimensionality. Finally, the PDE problem formulation can be easily generalized by introducing counterparty risk which gives rise to nonlinear option valuation PDEs.