Computing Black Scholes with Uncertain Volatility—A Machine Learning Approach

: In ﬁnancial mathematics, it is a typical approach to approximate ﬁnancial markets operating in discrete time by continuous-time models such as the Black Scholes model. Fitting this model gives rise to difﬁculties due to the discrete nature of market data. We thus model the pricing process of ﬁnancial derivatives by the Black Scholes equation, where the volatility is a function of a ﬁnite number of random variables. This reﬂects an inﬂuence of uncertain factors when determining volatility. The aim is to quantify the effect of this uncertainty when computing the price of derivatives. Our underlying method is the generalized Polynomial Chaos (gPC) method in order to numerically compute the uncertainty of the solution by the stochastic Galerkin approach and a ﬁnite difference method. We present an efﬁcient numerical variation of this method, which is based on a machine learning technique, the so-called Bi-Fidelity approach. This is illustrated with numerical examples.


Introduction
In modern financial markets, traders can choose from a large variety of financial derivatives.This term denotes financial instruments that have a value determined by so-called underlying variables or assets such as stocks, the oil price or the weather.Originally, derivatives were invented to reduce the risk of uncertain prices, especially in agricultural markets where one could have long periods between sowing and harvest, see Chapter 1 and Chapter I in Whaley [2007], Crawford and Sen [1996], respectively.
As the derivative market was growing, the need for a pricing formula for derivatives also increased in the 20th century.A breakthrough was made by Black and Scholes [1973] and Merton [1973] when they contemporaneously formulated a model allowing the evaluation of derivatives.They were later awarded the Nobel prize in economics for their work.Derived from this model, the Black Scholes equation explains the behaviour of the price V of the derivative by means of a partial differential equation (PDE).This derivative is allowed to depend on the time t up to maturity T and only one underlying stochastic asset (e.g., a stock, an index or some commodity price).The price of the asset is denoted by S and is assumed to follow a geometric Brownian motion.The constant r denotes the risk free rate of interest in the market and σ ∈ R is the so-called volatility of the stochastic asset.Later, this model was extended to multiple underlying assets and adjusted for certain kinds of underlying variables, for example, interest rates; see Cox et al. [1985].
Comparison to real data soon showed that the volatility σ of one and the same stochastic asset can take values that differ more than explainable by rounding errors, etc.; see Rubinstein [1985], Scott [1987], Günther and Jüngel [2010].The most popular approaches to deal with this are to model the volatility either as local volatility, i.e., a function σ(S, t) as in Dupire [1994], Coleman et al. [1999], Crepey [2002], Hanke and Rösler [2005] or as a stochastic process; compare, e.g., the famous Heston model, Heston [1993], or Rubinstein [1985], Scott [1987], Hull and White [1987].
These models as well as others are formulated in continuous-time.Because prices in financial markets are only visible in trades, the markets always operate in discrete time; compare, e.g., Mishura and Ralchenko [2021].The models above, therefore, represent approximations of reality.In order to use them, they have to be fitted to the market using discrete data.This fitting procedure, however, introduces uncertainty in the recovered values, e.g., by rounding errors and by the interplay of the random nature of stochastic assets and their discrete observations.Because the interest rate r is easy to determine even from discrete data, we focus on the volatility σ for considerations of uncertainty.
Also in Namihira and Kopriva [2012], Pulch and van Emmerich [2009], Drakos [2016], the authors investigated uncertainty in the volatility: They modelled it as a one-dimensional random variable Σ(ω) = Θ(ω) or a function of a one-dimensional random variable Σ(Θ(ω)) for ω in the underlying probability space.Then the price process V (S, t, Θ) also depends on Θ and follows the stochastic version of the Black Scholes equation This equation can be derived by inserting the stochastic volatility into the Black Scholes model, where the Brownian motion is independent of Θ.It can be studied by means of uncertainty quantification: The solution V is developed in a generalized Polynomial Chaos (gPC) expansion for orthonormal polynomials p n w.r.t. the distribution of Θ and coefficients given by the expected value v n (S, t) = E(V (S, t, Θ)p n (Θ)).If Θ has a density µ : D → R, one can alternatively calculate the coefficients by In Namihira and Kopriva [2012], these integrals are directly computed by a quadrature rule.The required solutions V (•, •, x j ) in the quadrature points x j are calculated as the solutions of the deterministic Black Scholes equation ( 1) with σ = x j .This classifies the method as a Stochastic Collocation method.
In the articles Pulch and van Emmerich [2009], Drakos [2016], however, the stochastic Galerkin method is applied for computing the coefficients v n (S, t).By inserting the gPC expansion (3) into the stochastic Black Scholes equation (2), multiplying the equation by an orthogonal polynomial p k (Θ) and applying the expected value on both sides, deterministic PDEs for the coefficients v n (S, t) are derived After truncating of the system and the coupling term to a finite number of indices, the system is solved numerically by the method of lines in Pulch and van Emmerich [2009] and the finite element method in Drakos [2016].
In our work, we extend the model used above to the volatility Σ(Θ 1 , . . ., Θ L ) depending on many finitely random variables Θ 1 , . . ., Θ L .This leads to the stochastic Black Scholes equation A model like this might, for instance, occur when the volatility is modelled as a random variable that also depends on certain stochastic factors as in Zhang et al. [2018], Bazzana and Collini [2020], Xie et al. [2021], Milos , [2021].
We propose a momentum constrained maximum likelihood technique to fit the volatility distribution to real data and compare our results to market data.This enables the comparison of our model to real data, which was not considered in literature yet, but is an important tool to measure the model fit.
The solution is then derived in the same way as in (4) and calculated numerically by a finite difference method.The increased computational cost for multiple similar calculations is reduced by a Bi-Fidelity technique, which can be considered as a machine learning approach, whose effectiveness is illustrated in numerical examples.
The outline of the article is as follows: After introducing gPC to finitely many random variables, a method of fitting the stochastic volatility to real data is described in Section 2. The stochastic Galerkin method is used to solve Equation (5) in section 3.However, computational costs can be high.Thus, we introduce a Bi-Fidelity numerical technique to compute this more efficiently in Section 4. The paper is rounded out with numerical results illustrating the effectiveness of this technique and the fit to real market data in Section 5.

Fitting the Random Volatility to Real Market Data
For the convenience of the reader and in order to introduce notation, we briefly recall the fundamentals of generalized polynomial chaos (gPC).We then propose a method to fit the volatility distribution to real data.
Denote by Θ 1 , . . ., Θ L random variables with joint distribution function F : D → R for a multivariate domain D ⊂ R L .For a function f : D → R, the following notation is used for integration with respect to (w.r.t.) F :

Consider a system of polynomials {p
, where the polynomial pα (x 1 , . . ., x L ) has degree in the i-th variable deg xi (p α ) = α i .In adaption to Definition 8.24 in Sullivan [2015], we call this an infinite system of orthogonal polynomials w.r.t.F , if for all multi indices α, β ∈ N L 0 one has pα pβ = 0 for α = β, The existence of orthogonal polynomial systems follows from the Gram Schmidt algorithm, if for all α = (α 1 , . . ., α L ) ∈ N L 0 the moments x α1 1 • . . .• x α L L are finite.Uniqueness of the orthogonal polynomials is then given up to multiplication by constants.If the Θ i are independent, they are in particular given by the product of the orthogonal polynomials w.r.t. the distribution of each Θ i .
In the following, L p dF (D, H) denotes the space of all functions D → H that are p-times integrable w.r.t. the measure dF for some D ⊂ R n and codomain H.If dF is not explicitly defined, the Lebesgue measure is chosen.If D and H are not defined, then D equals the domain of F and H equals R.
It is well known that under certain circumstances, orthogonal polynomials span the space L 2 d F .They are thus called a complete orthogonal basis of L 2 d F .This is, for example, the case, if F is absolutely continuous, has finite moments and either (Θ 1 , . . ., Θ L ) realizes in a compact domain almost surely or the density of F is exponentially integrable.For details, see Rahman [2018].If the Θ i are independent, the orthogonal polynomials w.r.t.F span L 2 d F , if all orthogonal polynomial systems w.r.t. the density of Θ i span the corresponding L 2 spaces.This is due to the tensor product representation of L 2 d F in case of independency of the Θ i , see Example E.10 in Janson [1997].
Assuming such circumstances to be given, the gPC expansion can be defined.Theorem 1 (adaption of section 11.3 in Sullivan [2015]).Let Θ 1 , . . ., Θ L : Ω → R be random variables with joint distribution F such that the orthogonal polynomials (p α ) α∈N L 0 w.r.t.F form a complete basis of L 2 d F .Denote by H an arbitrary Hilbert space, e.g., the real numbers R or a space of the form L p (D, R), p = 0, 1, 2, for some domain D ⊂ R n .Then every random variable X : Ω → H with in distribution for a function X ∈ L 2 d F ( D, H) can be represented in the generalized Polynomial Chaos (gPC) form The proof follows in analogy to the proof for independent continuous random variables in Section 11.3 in Sullivan [2015] from the tensor product decomposition To fit the model to the data, we truncate the series by bounding the maximum polynomial degree |α| We propose a momentum constrained maximum likelihood approach to fit the gPC coefficients σ α to discrete real-world data.This facilitates the computation.
The values of the volatility of an asset can be obtained, e.g., by calculating the implied volatilities from corresponding European options.This generates a dataset of implied volatilities representing observations of the random variable Σ K (Θ 1 , . . ., Θ L ).When fitting the volatility to the data, we constrain ourselves to those tuples of coefficients corresponding to a volatility Σ K (Θ 1 , . . ., Θ L ) whose first moments coincide with the empirical moments of the dataset.The choice of these constraints reduces the dimension of the parameter space for the likelihood function while important characteristics of the distribution-the statistical moments-are maintained.
We illustrate the technique on the simple case of two independent random variables and truncation at K = 1: Example 1.Let Θ 1 , Θ 2 be two independent random variables of known distribution and w.l.o.g.expected value 0 and variance 1. Assume the densities of their distributions exist and denote them by f 1 , f 2 , respectively.Now consider the random volatility truncated to maximum polynomial degree 1 when choosing orthonormal polynomials.Assume that values of the volatility y 1 , . . ., y M are given.The plain maximum likelihood method would maximize the joint density h of the realizations y 1 , . . ., y M of Σ 1 (Θ 1 , Θ 2 ) w.r.t. the three coefficients (σ 00 , σ 01 , σ 10 ) ∈ U ⊂ R 3 \ {0}: Constraining the maximum likelihood estimator to be exact in the first moment, i.e., the expected value, gives This reduces the optimization to two variables Further constraints can be used to reduce the complexity even more: Claiming, e.g., the variance to coincide with the empirical variance Var(Σ M gives the additional relation reducing the optimization to one variable and the sign of σ 10 .This ends our example.

Deriving the System of PDEs for the gPC Coefficients
The stochastic Galerkin method is applied to the Black Scholes equation ( 5) with uncertain volatility in order to transform the stochastic PDE into a system of deterministic PDEs for the gPC coefficients of the solution V (S, t, Θ 1 , . . ., Θ L ).
To do so, one has to assume , such that Theorem 1 can be applied.In analogy to the one-dimensional case in Drakos [2016], Pulch and van Emmerich [2009], the thus derived gPC expansions are inserted in the Black Scholes equation ( 5).Multiplication of the equation by pδ (Θ 1 , . . ., Θ L ) and application of the expected value, for each δ ∈ N L 0 at a time, yields the equations due to orthogonality of the p α .Note that the Galerkin multiplication tensor M αβγδ := pα pβ pγ pδ p2 δ exists since the integrated functions are all polynomials in L variables.
In order to solve the system, the boundary conditions and the final condition corresponding to the considered financial derivative are transformed to conditions on the gPC coefficients v i .Usually, they are deterministic and thus appear in the coefficient v (0,...,0) , whereas all other coefficients vanish.
We truncated the gPC expansions up to maximum polynomial degrees K, N ∈ N 0 and obtained representation (9) for Σ K and which can be evaluated numerically.For demonstrative purposes, we use a finite difference scheme, see Appendix 7.
Note, however, that convergence of the truncated stochastic Galerkin solution V N in (10) to the true solution V as N → ∞ is not obvious and could not be proven to date.It is a topic open to further research.From now on, we assume convergence.

A Bi-Fidelity Approach for Calculating the Stochastic Galerkin Solution to the Black Scholes Equation with Random Volatility
If the volatility depends on just L = 2 random variables, the stochastic Galerkin (SG) solution truncated at maximum degree N already has (N + 1)(N + 2)/2 gPC coefficients.Thus, (N + 1)(N + 2)/2 coupled equations have to be Computing Black Scholes with Uncertain Volatility-A Machine Learning Approach PREPRINT solved to obtain the approximate SG solution.The number of equations and with it the computational cost rapidly increase as N or L increases.
This becomes a problem especially if the SG solutions for many options shall be computed at a time, e.g., for risk management evaluations of derivatives with different underlying assets.The Bi-Fidelity approach provides a solution to this problem, if the same type of option (e.g., European Call options) with the same maturity T and interest rate r, but different distributions of the volatility model Σ(Θ 1 , . . ., Θ L ) are considered.A situation like this arises, for instance, when comparing financial derivatives of the same type and maturity date but with different underlying stochastic assets.
In general, given a PDE depending on a random variable Ξ, the Bi-Fidelity method aims to approximate the desired high fidelity solution at a certain realization z of Ξ by pre-stored high and low fidelity solutions in some other realizations of Ξ and the computationally cheaper low fidelity solution in z.This can be considered a machine learning approach because the properties of the equation are learned offline by picking suitable realizations for the stored approximation data.
The application of Bi-Fidelity techniques to various problems is an active area of research, see e.g.At first, the random variable Ξ has to be assigned.In our case, it is not given by (Θ 1 , . . ., Θ L ), since we still want our solution to be a random variable depending on the Θ i in order to explore its stochastic behaviour.Instead, we suppose the distribution of Σ(Θ 1 , . . ., Θ L ) to change from calculation to calculation, as it would be the case for different underlying assets, without changes in the distributions of the Θ i .This could reflect different sensitivities of different stochastic assets to the factors Θ i .By representation (9) of the truncated gPC expansion of Σ, a variation in the distribution of the volatility means a variation in at least one of the gPC coefficients σ α , |α| ≤ K. Hence, the random variable Ξ describes volatility models of the form (9) by their gPC coefficients σ α , |α| ≤ K.
Then, high and low fidelity models have to be defined.The high fidelity model is the one we are actually interested in.
We choose a high-resolution numerical solution to (11) derived by the explicit finite difference scheme (18) with a large number of grid points in the S-space M H ζ and in time N H τ .The low fidelity model, i.e., the cheaper model that is less trusted but used for the approximation rule, is chosen to be the same numerical solution on a coarse grid with small M L ζ and N L τ .Note, however, that N L τ must not be chosen too small; otherwise, the stability of the scheme could be violated for a large number of volatility models.The reason for this requirement will become clear at step 1 of the offline data generating steps.
It is also possible to consider different numerical schemes or even different but similar underlying equations for the high and low fidelity models.However, important characteristics of the solution must be shared between the models.Now one can proceed with the typical Bi-Fidelity algorithm as described in Narayan et al. [2014], Zhu et al. [2014], Liu and Zhu [2020].Below, the application of this algorithm is explained, where the volatility is assumed to depend on L = 2 random variables Θ 1 , Θ 2 for a better readability.An extension to more random variables can easily be achieved.The truncation number K = 1 is chosen such that the random variable Ξ represents the gPC coefficients σ 00 , σ 10 and σ 01 of the volatility as in Example 1.
Since the actual computational effort lies in the calculation of the transformed system of Equation ( 17), the Bi-Fidelity approach is applied directly on the transformed v. Thus, a transformation back to the original variables v N , S and t is performed only once for the Bi-Fidelity solution, reducing the computational cost.For the calculation of the scheme, initial conditions and the Galerkin multiplication tensors are pre-stored and reused.
The following three steps describe the offline learning phase of the algorithm in which the stored approximation data are generated, compare e.g.Narayan et al. [2014], Zhu et al. [2014], Liu and Zhu [2020].These steps have to be executed only once.
Step 1: At first, the codomain of Ξ is described by finite intervals σ 00 ∈ [a 00 , b 00 ], The intervals can, for instance, be constructed by calculation of σ 00 , σ 10 , σ 01 for some of the stochastic assets of interest.Alternatively, one can think of possible values of σ 00 inspired by similar experiments and choose bounds of σ 10 and σ 01 such that the variance of Σ(Θ 1 , Θ 2 ) is bounded by some predefined value.We used this approach in the calculations of Section 5.

PREPRINT
After that, a large set Y of possible realizations of Ξ has to be chosen such that it is a good 'cover' of the possible values of Ξ.One can use Monte Carlo sampling or a structured grid on the codomain of Ξ.For every volatility model described by a y ∈ Y , the low fidelity solution vL (y) is computed, if the corresponding system of equations is parabolic and the low fidelity scheme is stable.
Step 2: Since one can usually not afford to calculate the high fidelity solution in every y ∈ Y , one has to determine the most important points.Let A ∈ N denote the number of high fidelity computations one can afford, then this can be achieved by choosing z 0 := argmax y∈Y d L (v L (y), 0)) and The notation VL ( Ŷ ) := span(v L (ŷ) | ŷ ∈ Ŷ ) for Ŷ ⊂ Y is used for the span of the solutions to previously picked We used a greedy procedure for the point selection; for further details on the computation, compare Algorithm 1 in Narayan et al. [2014].This step selects the points z 1 , . . ., z A whose solutions span the 'largest' subspace VL (z 1 , . . ., z A ) of VL (Y ).
Step 3: The high fidelity solution is calculated in the thus derived points z 1 , . . ., z A .Note that N H τ has to be chosen large enough such that the numerical scheme is stable for all volatility models z i .The parabolicity of the system of PDEs does not need to be checked again, as it has been checked in step 1 already.The pairs of high and low fidelity solutions vH (z i ), vL (z i ) are stored.
Assume now that a certain volatility model z is given and the corresponding Bi-Fidelity solution of the Black Scholes equation with uncertain volatility shall be computed.This is performed in the online phase as follows, see Narayan et al. [2014], Zhu et al. [2014], Liu and Zhu [2020]: Step 1: The low fidelity solution vL (z) is calculated by scheme (18).Note that the system of equations needs to be parabolic and the scheme has to be stable for a reasonable calculation.
Step 2: The low fidelity solution vL (z) is projected onto VL (z 1 , . . ., z A ) leading to the projection formula with projection coefficients c k ∈ R.Here P V y denotes the orthogonal projection of y onto V. Details of the computation of the c k can be found in Narayan et al. [2014], for instance.
Step 3: Finally, the Bi-Fidelity solution is constructed by applying the same projection law to the stored high fidelity solutions After deriving vBF , it has to be transformed back to the original variables v, S and t.

Numerical Results
This section presents numerical solutions to the Black Scholes equation with uncertain volatility.For the sake of simplicity, the volatility is assumed to depend on the two independent random variables Θ with standard normal distribution and ∆ uniformly distributed on [−0.5, 0.5].The properties of such models are investigated and the model is tested on real data.Furthermore, the error of the Bi-Fidelity approximation is investigated and its computation time is compared to the high fidelity model.
For more convenient reading, times t and the maturity T are given in days, whereas for the computations, these values were multiplied by 1/251 to go over to years.

Results for the Extended Model
The numerical solution to the truncated system of Equation ( 11) for a European Call option with a strike price strike = 100 and maturity T = 20 in a market with risk-free rate of interest r = 0 is visualized in Figure 1a,b by plotting its mean and variance.
Computing Black Scholes with Uncertain Volatility-A Machine Learning Approach

PREPRINT
The volatility of the underlying stochastic asset is modelled by for Θ standard normally distributed and ∆ uniformly distributed on [0.5, 0.5].The random variables are modelled to be independent.For the gPC expansion of the solution, the truncation number N = 5 was chosen, for which system (11) is parabolic.The numbers of grid points M ζ = 200 in ζ and N τ = 319 in τ were chosen such that the applied explicit finite difference scheme ( 18) is stable.
Contour lines were drawn at a height of quarters of the maximum absolute value and the borders of the smoothing area, i.e., the area where the solution differs from its final condition V (S, T ) = (S − strike) + , were drawn in red.These lines will be present in each of the following surface plots.Note that the expected value surface resembles the solution of the deterministic Black Scholes equation for σ = 0.5 in Figure 1c, but the smoothing area is larger.Experiments showed that the qualitative shape of the expected value and variance is characteristic for solutions to the Black Scholes equation with random volatility (5) of the form Σ(Θ, ∆) = σ 00 + σ 10 Θ + σ 01 ∆.These models lead to solutions that 'lie between' the solutions for volatility that depends on Θ or ∆ only and has the same mean and variance.
The higher σ 10 is in comparison to σ 01 , the closer the solution is to the solution for volatility depending on Θ only and the further away it is from the solution for the model depending on ∆ only, and vice versa.An increase in the mean σ 00 of the volatility while keeping its variance constant was observed to enlarge the smoothing area and thus the spread of the variance, which in turn flattens it.
A rise in the variance σ 2 10 + σ 2 01 /12 of the volatility with constant mean σ 00 , however, seemed to scale up the variance of the SG solution by the same factor.Meanwhile, the expected value of the SG solution was marginally increased within the smoothing area.

Comparison to Real Market Data
The model is compared to market prices of a European Call option, whose end of day values are considered from 7 January 2019 to 20 September 2019.1 Its underlying asset is the DAX index, and the strike price and maturity are given by strike = 10,275 and T = 180 days, respectively.
A volatility model of the form Σ(Θ, ∆) = σ 00 + σ 10 Θ + σ 01 ∆ was fitted to the daily implied volatilities by using the moment constrained maximum likelihood approach from Example 1 with the two moments mean and variance.This lead to the volatility model Σ(Θ, ∆) = 0.2292 + 0.1126Θ + 0.0115∆, ( whose fitted density is shown in Figure 2a together with a histogram density estimator.The SG solution was computed using the truncation number N = 5 and the numbers of grid points M ζ = 200 and N τ = 678.With these values, the numerical scheme is stable and the system of Equation ( 11) is parabolic.Figure 2b shows the market prices and the expected value of the SG solution as well as the range expected value plus/minus standard deviation and the solution to the deterministic Black Scholes equation with volatility σ = E(Σ(Θ, ∆)).A more detailed plot of those graphs for the last 55 days of the option is given in Figure 2c.One observes that the expected value of the SG solution is very close to the data on these days but slightly above the data at earlier times.However, the data are always in the range expected value plus/minus standard deviation, as one would expect from stochastic theory.
A comparison to the deterministic solution shows that it also lies above the market data for early times.Recall that unlike the deterministic solution, the SG solution allows realizations to differ from the expected value within a certain range.

Comparing Bi-Fidelity Solution and High Fidelity Solution
The Bi-Fidelity solution of the Black Scholes equation with uncertain volatility (5) following volatility model ( 13) for a European Call option is compared to its high fidelity solution.After that, a simulation is performed to find the error in expected value and in variance between the Bi-Fidelity solution and the high fidelity solution.The error is characterized in size and shape by a Monte Carlo simulation for different volatility models.Finally, the computation times for high fidelity and Bi-Fidelity model are compared.
We go back to the toy model of a market with interest rate r = 0 and maturity T = 23 of the option.The strike price was set to strike = 100 and the gPC expansion of the solution was truncated after a total polynomial degree of N = 5 as before.
A rather coarse grid with M L ζ = 50 and N L τ = 150 was chosen for the low fidelity model.This N L τ is high enough such that the vast majority of all low fidelity computations performed in the examples explained below was stable.In the case of instability, the corresponding sample point was removed from the set of low fidelity sample points.The high fidelity solution was computed on a fine grid with M H ζ + 1 = 350 + 1 grid points in S direction.The number of grid points N H τ + 1 = 5853 + 1 in the time direction was chosen such that all high-resolution computations for important volatility models were stable.

The low fidelity sample points represented volatility models
The coefficients were chosen such that V ar(Σ(Θ, ∆)) ≤ σ  To study the deviations, the absolute difference in expected values is displayed in Figure 4a close to the strike price and Figure 4b for a wider range of S values.One observes a difference of size 10 −3 within the smoothing area that seems to increase in absolute value as S → ∞. Figure 4c shows the difference for all values of S and t.The maximum absolute value of the absolute difference is less than 0.3 and occurs close to S = ∞, where the option values tend to infinity.Therefore, a difference of 0.3 in these regions means a small deviation.The difference in the smoothing area of size 3 • 10 −3 is larger compared to the values attained in this region that are close to zero.Recall, however, that the solution is multiplied by strike when transforming back the variables.Hence, an error of size 10 −3 at strike 100 means an error of size 10 −5 • strike.We examine the absolute difference in variance as represented in Figure 6a to lie in the smoothing area.Figure 6b, showing the difference for all S and t values, supports this conclusion.The error is again of size 10 −3 = 10 −7 • strike 2 .Finally, a simulation was performed to obtain the characteristic size and shape of the Bi-Fidelity error.For this purpose, 300 volatility models of the form Σ(Θ, ∆) = σ 00 + σ 10 Θ + σ 01 ∆ were generated randomly from uniformly distributed random variables σ 00 ∈ [0, 0.8], σ 10 ∈ [0, σ 00 /2], σ 01 ∈ [0, 12(σ 00 /2 − σ 2 10 ] 'covered' by the grid in (15).Both high and Bi-Fidelity solutions were calculated for each of these volatility models.
The mean absolute difference of the expected values is represented in Figure 7a close to the strike price and Figure 7b for a larger range of S values.Figure 7c is a plot of the error for all S and t values.The smoothing area is not plotted since it differs for every volatility model.The shape of the error is characterized by an oscillation of size 10 −3 = 10 −5 • strike close to the strike price and a steady increase in absolute value for S → ∞.The maximum absolute difference lies close to S = ∞ and has a size of 10 −2 = 10 −4 • strike, which is small in relative terms.This coincides with the error shape in Figure 4a-c and thus seems to be characteristic for the considered Bi-Fidelity model.The characteristic error in the variances derived by the same 300 volatility models is displayed in Figure 8a.
It shows some oscillation close to the strike price of size 10 −2 = 10 −6 • strike 2 but vanishes elsewhere, as one can observe in Figure 8b.This error can possibly be reduced by adding more approximation pairs for the Bi-Fidelity computation or by choosing a low fidelity model closer to the high fidelity model.Remark 1.This paper focusses on the illustration of the general technique.Hence, the above error analysis is only done for a European Call option as an example.The choice of hyper parameters like the mesh size of the high and low fidelity models depend a lot on the available computational resources and the required accuracy.Therefore, a general recommendation on the choice of hyper-parameters cannot be given.A general rule, however, predicts that the Bi-Fidelity error should decreases when the difference in the low and high fidelity model diminishes.However, choosing the low fidelity model very close to the high fidelity model cancels the computational advantage of the technique.A trade-off has to be made considering the specific situation.Also an increasing the number A of approximation data pairs should increase the accuracy up to some point.If A is not prescribed by computational resources, one could, e.g., add new approximation points in (12) until the considered distance is below some threshold for all remaining realizations.

Comparison of Computation Times
For demonstration, the above Bi-Fidelity model and the high fidelity model with the same number of grid points M H ζ = 350 and N H τ = 5853 were calculated in the same 300 randomly generated volatility models.Every model 01 ∆ belonging to iteration i ∈ {1, . . ., 300} was generated such that it satisfies the same bounds on the coefficients σ (i) 00 ∈ (0, 0.8], σ (i) 10 ∈ [0, σ 00 /2] and σ (i) 01 ∈ [0, 12(σ 00 /2 − σ 2 10 )] as for the low fidelity sample points in (15).The Σ (i) should thus be 'covered' by the low fidelity sample points, which enables a Bi-Fidelity computation.In every calculation, the stability of the scheme w.r.t. the chosen time step is checked.The computation times for both models are plotted in Figure 9.The mean computation time for the high fidelity model is 173.99 s, whereas the Bi-Fidelity model achieved a mean computation time of 10.68 s per volatility model.Hence, the application of the Bi-Fidelity method accelerated our computations by a factor of 16.3 in mean.For finer high fidelity grids, this difference should further increase.However, choosing a finer grid means introducing a larger difference in the high and low fidelity model, which could lead to larger errors.

Conclusions
When the volatility in the Black Scholes equation is determined by discrete market data, uncertainty is introduced due to the estimation procedure.We modelled this uncertainty by a dependence on a finite number of random variables representing random factors of influence.A possibility to fit this uncertain volatility to market data was demonstrated.Afterward, the Black Scholes equation with uncertain volatility was used to model the price process of a derivative.Under certain assumptions, the random volatility and the stochastic solution can be represented by their generalized Polynomial Chaos (gPC) expansions allowing the application of the stochastic Galerkin method.

PREPRINT
The resulting deterministic system of PDEs for the gPC coefficients was truncated and solved numerically by a finite difference scheme.
Numerical examples showed that the expected value of this stochastic model fitted real market data in a similar way as a deterministic model.However, the stochastic solution allows deviations from its expected value within a certain range and it can be used for calculations of further stochastic quantities such as the variance of the solution or in risk management applications.
However, computation can become costly for a large number of random variables or a late truncation.This is due to the fast increase in the number of gPC coefficients.Therefore, a machine learning technique was presented to reduce the computation cost for computing the solutions for different volatility models within the same setting (option type, maturity, interest rate, maximum polynomial degree).The so-called Bi-Fidelity approach approximates a costly solution on the basis of a computationally cheaper solution and some pre-stored costly solutions for wisely selected volatility models.
For a European Call option, the maximum absolute difference in the expected value of the Bi-Fidelity solution to the desired solution was experimentally observed to be of size 10 −5 • strike in mean close to the strike price and increase to size 10 −4 • strike in mean for S → ∞, where the expected value also tends to ∞.The maximum difference in variance attained a value of size 10 −6 • strike 2 in mean.Meanwhile, the mean computation time was decreased by a factor of 16.3.
A topic that is still open to further research is the convergence of the truncated gPC expansion of the stochastic solution to the true solution as the truncation number goes to infinity.If convergence is assumed to hold, it is also possible to solve the deterministic system of PDEs for the gPC coefficients with a different numerical technique and apply the Bi-Fidelity approach to this solution.Furthermore, one could think of applying the technique used in this paper to the Black Scholes equation with uncertain volatility and interest rates, when there are doubts concerning its true value, or to familiar equations such as the Black Scholes equation for multiple assets or the bond equation.

Finite Difference Scheme
For demonstrative purposes, European Call options with strike price strike and maturity T will be considered to present the finite difference scheme used for solving the system of Equation (11).
For an easier implementation, system (11) is rewritten in vector form.This is performed via a bijection φ from the set {0, .which can be found, e.g., in Chapter 2.2.5 in Zhu et al. [2013] for the deterministic Black Scholes equation.This leads to a PDE for v given by: ∂v(ζ, τ In order to solve the system, we choose a finite difference scheme because it is easy to implement for practitioners.An and for for m = 1, . . ., M ζ − 1, n = 0, . . ., N τ − 1.This yields the explicit finite difference scheme (a) Expected value surface for the stochastic solution.(b) Variance surface for the stochastic solution.(c)Deterministic solution.

Figure 2 :
Figure 2: Comparing the stochastic model to real market data.(a) Histogram density estimator and density of Σ(Θ, ∆) fitted to the implied volatilities by constrained maximum likelihood.(b) Market values of the option together with the expected value of the SG solution and the range expected value plus minus standard deviation.(c) Detailed look on the last 55 days.
Figure 3a,b shows the expected value surfaces of the high fidelity and the Bi-Fidelity solution for the volatility model Σ(Θ, ∆) = 0.5 + 0.2Θ + 0.1 √ 12∆.At first glance they seem to approximately coincide.(a) High fidelity solution.(b) Bi-Fidelity solution.

Figure 3 :
Figure 3: Expected value surfaces for high fidelity and Bi-Fidelity solution.

Figure 4 :
Figure 4: Absolute difference in expected value of high fidelity and Bi-Fidelity solution.

Figure 5 :
Figure 5: Variance surface for high fidelity and Bi-Fidelity solution.

Figure 6 :
Figure 6: Absolute difference in variance of high fidelity and Bi-Fidelity solution.

Figure 7 :
Figure 7: Mean absolute difference in expected value of high fidelity and Bi-Fidelity solution.
for all S values

Figure 8 :
Figure 8: Mean absolute difference in variance of high fidelity and Bi-Fidelity solution.

Figure 9 :
Figure 9: Computation times for the high fidelity model and the Bi-Fidelity model evaluated in the same volatility model.
Liu et al. [2021]1]4]banks et al. [2020].In the setting of uncertainty quantification for PDE models, it is frequently described in the context of uncertainty quantification via Stochastic Collocation methods, compareZhu et al. [2014],Narayan et al. [2014]for the general procedure orLiu and Zhu [2020],Gamba et al. [2021],Gao et al. [2020],Liu et al. [2021]for applications.The combination with the stochastic Galerkin method works similarly; however, it is not very common in literature.