On Multilevel and Control Variate Monte Carlo Methods for Option Pricing under the Rough Heston Model

: The rough Heston model is a form of a stochastic Volterra equation, which was proposed to model stock price volatility. It captures some important qualities that can be observed in the ﬁnancial market—highly endogenous, statistical arbitrages prevention, liquidity asymmetry, and metaorders. Unlike stochastic differential equation, the stochastic Volterra equation is extremely computationally expensive to simulate. In other words, it is difﬁcult to compute option prices under the rough Heston model by conventional Monte Carlo simulation. In this paper, we prove that Euler’s discretization method for the stochastic Volterra equation with non-Lipschitz diffusion coefﬁcient E [ | V t − V nt | p ] is ﬁnitely bounded by an exponential function of t . Furthermore, the weak error | E [ V t − V nt ] | and convergence for the stochastic Volterra equation are proven at the rate of O ( n − H ) . In addition, we propose a mixed Monte Carlo method, using the control variate and multilevel methods. The numerical experiments indicate that the proposed method is capable of achieving a substantial cost-adjusted variance reduction up to 17 times, and it is better than its predecessor individual methods in terms of cost-adjusted performance. Due to the cost-adjusted basis for our numerical experiment, the result also indicates a high possibility of potential use in practice.


Introduction
Rough volatility has recently emerged as an impressive tool to model an asset's volatility in the financial market. The empirical study [1] by Gatheral and his co-authors have made all this possible. In short, rough volatility is a type of fractional Brownian motion (fBm) with a Hurst parameter of 0 < H < 0.5, or in terms of alpha notation, 0.5 < α = H + 0.5 < 1. Specifically, fBm with 0 < H < 0.5 has rougher fluctuation than the ordinary Brownian motion, whereas fBm with 0.5 < H < 1 has a smaller/lesser fluctuation than the ordinary Brownian motion. Before continuing the discussion on rough volatility, we would like to mention some of the work that made this discovery possible. First, before the introduction of rough volatility, the fractional Brownian motion with 0.5 < H < 1 was used to model the long-term memory effect in the asset's volatility by Comte and Renault [2]. Subsequently, mixed fractional Brownian motion, composing the sum of fractional Brownian motion and ordinary Brownian motion, was proposed to also model the volatility of the asset return in the work by Cheridito et al. [3].
One of the earliest work that uses fractional Brownian motion with 0 < H < 0.5 as a volatility component in option pricing is by Alos et al. [4]. In particular, the authors managed to generate a short-time-to-maturity at-the-money-term-structure skew of the order O(T c ) for c > −0.5. This is a great accomplishment, as the individual local and stochastic volatility models are unable to replicate the mentioned behavior previously.
Fukasawa [5] also demonstrated that option prices with volatility driven by the stochastic volatility model with the fractional Brownian motion of the Hurst parameter 0 < H < 0.5 can generate an at-the-money volatility skew of the order O(T H−0.5 ).
Due to recent advancements in rough volatility, there are many studies revolving around the development of rough volatility models. In particular, the authors of [6] proposed the rough Bergomi (rBergomi) model, which is a rough extension of the Bergomi model [7]. The result of the study found that the rBergomi model with fewer parameters fits the SPX volatility relatively better than Markovian stochastic volatility models; this is consistent with the SPX variance swap during the year 2008 economic crisis and the flash crash event. The rough Heston model is the main discussion of this paper and it was proposed by El Euch et al. in the work of [8,9]. What is so special about the rough Heston model is that it can fit the implied volatility of the options very well, and it can replicate the explosive behavior of the term-structure at-the-money skew observed in the financial market. Several studies [10][11][12][13][14][15] mainly focus on computational reduction of the rough Heston model. One of the greatest barriers to compute option prices under the rough Heston model is due to the lack of an explicit solution for the fractional Riccati equation, which is a part of the characteristic function of the option prices under the rough Heston model. This paper contributes in three ways: (1) presents and proves a time bound on Euler's discretization method for the simulation of the stochastic Volterra equation E[|V t − V n t | p ] with a non-Lipschitz diffusion coefficient; (2) proves the weak error rate | E[V t − V n t ]| and weak convergence of Euler's discretization method on the same stochastic Volterra equation; and (3) proposes a control variate estimator and a mixed Monte Carlo method to enhance the computational efficiency in computing the option price under the rough Heston model. Our work is possible mainly due to the work of [16][17][18][19][20][21]. The following Section 1.1 provides a simple and short introduction to our work. Section 2 mainly talks about discretization methods-the Euler-Maruyama (first-order Euler) method for the stochastic differential equation and stochastic Volterra equation, as well as the second-order Euler method for the stochastic differential equation. The bounded and weak errors of the Euler discretization method are also presented; the proofs are located in Appendices A-C. Furthermore, we discuss some existing Monte Carlo methods (base, control variate, and multilevel) and propose a mixed Monte Carlo (multilevel control variate) method in Section 3. Section 4 is dedicated to the numerical experiment of some existing methods and the proposed method. Specifically, we consider a few of the tests to benchmark our methods: (1) the capability of implied volatility to change on parameters ρ, α, and T; (2) cost-adjusted variation comparisons for OTM, ATM, and ITM options as well as the comparisons for different estimators; the (3) discretization bias for different estimators on a different number of time steps n and level L.

Prerequisites
The rough Heston model can be formulated as follows: and Proposition 1. Assume the rough Heston model in Equation (2), and then the conditional expectation of the variance is as follows: such that E α,β (x) = ∑ ∞ n=0 x n Γ(αn+β) denotes the generalized Mittag-Leffler function.

Discretization Methods
This section mainly discusses the first-order Euler discretization method for the stochastic differential equation and stochastic Volterra equation, and together with the second-order Euler discretization methods for the stochastic differential equation.

Euler-Maruyama Discretization Method
We first consider the stock process S satisfying the following stochastic differential equation (SDE): where a, b : R → R are coefficient functions, and W is an independent standard Brownian motion. Specifically, the coefficient functions a and b are assumed to satisfy the usual conditions (see Appendix B.2 in [18]) for the existence and uniqueness of a strong solution to the SDE (5). The initial condition of SDE (5) is set as S(0). First off, we denote the time-discretized approximation of the process S as S. Under the Euler-Maruyama method [25], the initial condition remains the same as S 0 = S 0 , and the SDE (5) is discretized on a time grid 0 = t 0 < t 1 < ... < t m as follows: where i = 0, ..., m − 1, and Z is an independent standard Brownian motion Z ∼ N(0, 1). In this work, we limit the method's time grid to linear fixed spacing such that t i = iδ, where δ is the fixed space. It is widely known that the Euler-Maruyama method for SDE has a strong error of O( √ δ). Furthermore, it can be easily further refined to O(δ), using Itô's formula to expand b(S(t)) [26] instead, but we will avoid further discussing this method, as we will introduce the second-order Euler method or Milstein scheme later. See Chapter 6 in [18] for more details.
Suppose now we consider the stochastic Volterra equation (SVE) of the following form: where m, g : [0, T] × R → R and K 1 , K 2 are the (possibly singular) kernels. We refer to the work of [21] on the discussion of discrete-time simulation of SVE for the Lipschitz condition. Notably, we can outline the following modified assumptions (one-dimensional space) stated in page 3 of [21] in order to provide theorems on the error of Euler's discretization method for SVE related to the rough Heston model (non-Lipschitz diffusion coefficient).

Proof. See Appendix A.
The SVE in the first-order Euler discretized form is as follows: where ∆t i+1 = t i+1 − t i = δ and ∆W i+1 = W t i+1 − W t i for i = 0, 1, ..., n − 1. Note that we use the following solution V n t of the Euler scheme (integral form) in the bounded proof of the Euler discretization method.
Theorem 1. Suppose that Assumption 1, conditions (AA1)-(AA7) and condition (BB1) hold. Let for β ∈ (1, 1/(1 − 2H)) and κ ∈ (0, 1); then, there exists a finite constant C > 0 that depends on T, p, V 0 , and β such that for t ∈ [0, T] and n → ∞, the following is true: Proof. Take the limit n → ∞ in Theorem A1, and note that > 0; then, the result follows directly. See Appendix B Remark 1. We note that the Assumptions 1, conditions (AA1)-(AA7) and condition (BB1) satisfy the case of the rough Heston model. In addition, in the case of the rough Heston model, we have the non-Lipschitz diffusion coefficient factor of κ = 1/2. Finally, it can be seen that for small time t, the bounded errors are small too. On the other note, we do have to mention that in the case of the Lipschitz condition for the diffusion coefficient in the stochastic Volterra equation, i.e., |g(t, x) − g(t, y)| ≤ C|x − y|, lim n→∞ E[|V t − V n t | p ] = 0 as proven in [21].

Theorem 2.
Suppose that Assumption 1, conditions (AA1)-(AA7) and condition (BB1) hold. Then, the following is true: Proof. Take the limit n → ∞ in Theorem A2, then the result follows immediately. See Appendix C.
Remark 2. Theorem 2 shows the weak convergence of the variance process in the rough Heston model. This was actually proven in [27], but here, we provide the weak error order for the variance process of the rough Heston model. From Theorem 2, we know that the variance process converges at the rate of δ (1/2β)[(2H−1)β+1] , where δ = T/n and β ∈ (1, 1/(1 − 2H)). Since β can be specifically picked, our goal is to maximize the convergence order (1/2β)[(2H − 1)β + 1] by changing β. We differentiate the order with respect to β as follows: From Equation (23), we know that (1/2β)[(2H − 1)β + 1] is strictly decreasing on the range of β ∈ (1, 1/(1 − 2H)). Then, the weak error order can be obtained as follows: This indicates that the weak error of the rough Heston model is O(n −H ).

Second-Order Euler Discretization Method
In this subsection, we discuss the second-order Euler discretization method that was introduced by Milstein [28] (we refer to the text [18]). Let us consider the stock process (5) in another manner, i.e., the following: We denote the following operators: and Then, the terms t+δ t a(S(u))du and t+δ t b(S(u))dW u can be approximated as follows: t+δ t a(u, S(u))du ≈ a(t, S t )δ + L 0 a(t, S t )I 1 + L 1 a(t, S t )I 2 (28) and where I i for i = 1, ..., 4 denotes the double integral, which can be computed as follows: where ∆W = W t+δ − W t . The computation of I 2 requires some derivation. Based on Chapter 6.2 in [18], I 2 can be simulated as follows: The correlation between ∆W and I 2 can be computed as follows: Then, we let B denote an independent Brownian motion and ∆B ∼ N(0, 1). Finally, we can obtain the following: See [18] for specific details of the derivation of Equations (30)- (33). It was shown by [29] that the second-order Euler discretization method has a weak order of error of O(δ 2 ); it has a better convergence order than the first-order Euler discretization method, which possesses a weak error of O(δ). However, the requirement that comes with the weak order of convergence on the second-order Euler discretization method requires the evaluated functions a and b to have uniformly bounded derivatives, and the functions need to be six times continuously differentiable. The purpose of giving a short review on this method is because while we will not be using this method for the discretization of the stochastic Volterra equation, we can actually use this on the discretization of the stock process in the control variate method and multilevel control variate method (control variate part) instead of the first-order Euler discretization method. We can reduce the overall error of the simulation in numerical experiment as per the weak error that the second-order Euler discretization method has suggested.

Monte Carlo Methods
This section outlines the conventional Monte Carlo method, control variate method, multilevel Monte Carlo methiod, and the novel multilevel control variate Monte Carlo method.
Suppose that the interest of our study is to estimate the E[P], the conventional Monte Carlo method estimates E[P], using N independent samples of ω with a given probability space (Ω, F , P), that is, the following: The variance of this estimator is 1 N Var[P], and the root mean square error (RMS) of the estimator has an order of O(N −1/2 ). Particularly, suppose that we require an accuracy of for the RMS for the estimator; then, a total of N = O( −2 ) samples are needed for the conventional Monte Carlo.

Control Variate Method
The control variate method is one of the variance reduction methods along with antithetic variates, quasi Monte Carlo, and importance sampling methods. The idea of the control variate method is somewhat simple. Suppose we consider another random variable X that is paired with P such that the pairs (X n , P n ) for n = 1, ..., N are simulated with the sample path ω. Furthermore, it is required that E[X] is known, or it can be computed relatively cheaply and accurately. Then, we denote a fixed constant ς such that we can compute the following: Equation (40) is a control variate estimator, and it is an unbiased estimator to P as follows: To optimally reduce the variance or mean square error of the estimator (40), we can pick the constant ς to be the following: The ratio of the control variate estimator to the base estimator is as follows: Equation (42) tells us that as long as the correlation between the random variable X and P is negatively correlated or positively correlated, variance reduction is guaranteed.
Since the quantity of interest P is usually lacking E[P] in practice, we ought to use the optimal sample estimate of ς from Equation (41) as follows: where X and P are the sample mean of X n and P n , respectively. Accordingly, the expectation E[P] can be estimated using the sample mean estimator P( ς * ) such that the optimally variance reduced control variate estimator is as follows: In practice, ς * and X are most likely unknown variables, so it should be estimated beforehand. Pre-running a small number of samples is usually the only way to estimate ς * and X since this study focuses on call option pricing under the rough Heston model. Obviously, P = (S T − K) + is the payoff of the call option prices where S T is the stock price at maturity time and K is the strike price. Now comes the question of what control variate we should employ in order to get ρ X,P to be closest to 1 or −1. We propose a control variate based on the idea of [19].

Proposition 2. Let S CV
0 be the initial stock price and W be an independent Brownian motion. Consider the stock process as follows: where σ(t) is a nonrandom function of t. Then, there is an analytic solution for the European call option as follows: where N(·) denotes the cumulative distribution function of the standard normal distribution and Proof. See page 253 of [30].
We propose a control variate X by using Proposition 1. We start by denoting where S CV t is the stock process governed by the following: Specifically, σ(t) is a nonrandom function and the squared-function is denoted as follows: where E 0 (σ 2 t ) is the expectation from Proposition 1, and ε > 0 is a small positive constant. The use of t + ε in Equation (49) is to ensure that the conditions and requirements of the second-order Euler discretization method are being met. We set ε = δ/10 for our study to avoid the singularity at E 0 (σ 2 t )| t=0 , where δ is the step-size of the discretization method. Hence, the expectation of the call option price can be determined using Proposition 2. In a numerical experiment, the forward variance of the rough Heston model (Equation (3)) in Proposition 1 is computed through the truncated Mittag-Leffler function and numerical integration.

Multilevel Monte Carlo
Multilevel Monte Carlo is a form of the control variate method which uses multiple controls. Specifically, it uses itself with different level of coarseness as control variates. Instead of simulating the process at the finest level, the method approximates P L , using the sequence of P 0 , ..., P L−1 . In particular, the sequence P 0 , ..., P L−1 , P L is a sequence with increasing cost to simulate. The expectation of E[P L ] can be decomposed as follows: Consequently, we can rely on the following unbiased estimator of E[P L ] in practice: For our study, we let M ≥ 2 and D > 1 such that the total discretization steps for P .., L, then the step size is defined as δ i = T/n i . One thing to take note of is that the difference P (51) actually relies on the same sample path ω but at a different coarseness level. For example, if k is the step number of the simulation (same as in Equation (51)) and we have the following parameters M = 256, D = 2, L = 4, and i = 4, we would simulate one set of sample stochastic path ω (n 4 ) with n 4 = 256 discretization steps (in other words, the simulated Brownian motions are W a , a = 0, 1, ..., 256) to compute P . To achieve minimization of the variance of the multilevel estimator using optimal N i with a fixed total cost, we need to solve an optimization problem. We first define V 0 and C 0 to be the variance and cost of one sample P 0 respectively, whereas we can define V i and C i to be the variance and cost of one sample P i − P i−1 respectively. The total variance of the multilevel estimator is as follows: and total cost of the multilevel estimator is as follows: Note that each level of correction in Equation (51) uses independent samples as indicated from the presence of the level i in superscript (i, k) of Equation (51), which is why Equation (52) is of this form (absence of correlation on other random variables). Our optimization problem is to do the following: where C > 0 is a fixed total cost. The Lagrangian function can be constructed as follows: where λ L is the Langrange multiplier. Solving it, we can obtain the optimal N i as follows: Using Equation (57), we are able to achieve the optimal least total variance with a fixed cost by distributing N * i to a different coarse level of P . In practice, we will never know the exact V i and C i ; therefore, we will need to replace them with the estimates V i and C i . What we can do in practice is that we can start by pre-running a small number of the simulation to compute for V i and C i ; then, we can use V i and C i to obtain an estimate of N * i in Equation (57), i.e.,N * i .

Multilevel Control Variate Method
We propose a mixed estimator that uses the idea of the control variate method and the multilevel Monte Carlo method. Readers may also see similar work in [31,32]. We refer to this method as the multilevel control variate method from now onward. From Equation (50), we estimate E[P 0 ] and E[P i − P i−1 ] using the control variate method. Let X 0 , X 1 , ..., X L be the control variate estimator proposed in Equation (47) such that E[X i ] for i = 0, ..., L. Then, the estimator for the decomposition E[P L ] is the following: It can be easily shown that Equation (58) is an unbiased estimator for E[P L ]: To obtain the ratio of variance of the proposed estimator to the Multilevel estimator, we first define the total variance of the multilevel estimator as follows: where V Similarly, we can define the total variance for the multilevel control variate estimator as follows: where V We can reduce the total variance for the multilevel control variate estimator by minimizing V (2) i individually with optimal ς i for i = 0, 1, ..., L, that is, the following: where ς * 0 is the optimal value that minimizes V .., L is as follows: Equations (62) and (63) thus imply the least individual variance as follows: or in another form Using Equations (60), (61) and (65), we can compute the ratio of the total variance of the optimally controlled multilevel control variate estimator to the total variance of the multilevel estimator. Suppose we let i . To simplify the calculation, we can obtain the following ratio: Equation (66) indicates that the variance reduction factor is weighted by ρ 2 P 0 X 0 , ρ 2 Y 1 R 1 , ..., ρ 2 Y L R L . Since −1 ≤ ρ ≤ 1, the variance reduction on the multilevel estimator is basically guaranteed, unless then the variance is minimized to zero. Unfortunately, we are faced with the same issue as any Monte Carlo estimation methods-we are unable simulate X 0 , X 1 , ..., X L directly and exactly. Therefore, we do have to rely on discretization methods and therefore, discretization bias is unavoidable. The estimator that we will be using to conduct numerical experiment is as follows: where the optimal ς 0 and ς i from Equations (62) and (63) are estimated as follows: and i−1 ). In practice, we will not have the optimal ς * i for i = 0, ..., L immediately; pre-running the simulation with a small sample is needed. We need to clarify that X ). Furthermore, to avoid large discretization error on the simulation of X i for i = 0, 1, ..., L, we will simulate X i for i = 0, 1, ...L, using the second-order Euler method (introduced in Section 2.2), which has a weak order of convergence O(δ 2 ). We will see that in the numerical experiment Section 4, the discretization biases are ignorable once a certain level of the multilevel control variate method is employed. Remark 3. The total discretization steps n i and step size δ i for the multilevel control variate method are defined the same as the multilevel method. In addition, the optimal N (2) i for Equation (61) is computed similarly as in Section 3.2. Note that a fair comparison between the multilevel and multilevel control variate should be done using cost-adjusted performance. In other words, the benefit of using the extra control variates to reduce variance must outweigh the cost of generating the extra paths for the control variate.

Numerical Experiments
This section discusses the numerical experiments for our evaluated Monte Carlo methods in four different ways. First, we demonstrate that by using the multilevel control variate Monte Carlo, the implied volatility under the rough Heston model has the ability to change according to different ρ and α with different maturity time T. Then, we will evaluate the cost-adjusted distribution for the OTM, ATM, and ITM volatility under the rough Heston model on the following estimators: (1) base estimator (BE), (2) control variate estimator (CV), (3) multilevel estimator (ML), and (4) multilevel control variate estimator (MLCV). Subsequently, we will compare the bias of the four above-mentioned estimators. Finally, we compare our methods to the implied volatility under the rough Heston model computed, using the Fourier inversion method.
Accordingly, we define the following parameters for four estimators that will be used in the following numerical experiment unless stated otherwise.
The parameter n is used by BE and CV, whereas the parameters M, D and L are used by ML and MLCV. Note that we did not specify the number of simulations N for all the estimators, and instead, we will be using the same run-time or cost for all the methods, as this will keep the benchmark of different methods fair. The general steps of our Monte Carlo simulation are as follows: 1.
Pre-run the simulation of the methods with relatively small N 1 and store the intended result.

2.
Compute time-taken to run a single simulation.

3.
Compute N 2 as the number of simulations needed to achieve the specific run-time or cost C.

4.
With the previous N 1 simulations stored, conduct the rest of N 2 − N 1 simulations.
The numerical experiments are coded in MATLAB R2019b software with i7-8750H CPU @ 2.20 GHz (Max Turbo Frequency: 4.10 GHz) and 16 GB memory.

The ρ and α Change
Here, we show that the Monte Carlo method on implied volatility under the rough Heston model is capable of demonstrating various smile shapes, according to α and ρ with different maturity time T. Figure 1 is produced using MLCV with the parameters stated in Equation (70), except for the α and ρ. In addition, in order to make the graph looks smoother, we used 250 simulations of cost C = 1 and then averaged the computed implied volatility values.
We can see that the behaviors from Figure 1 are quite reasonable for short-and longterm implied volatility. The plots somewhat fit the empirical observation from the work of [6], i.e. their Figures 4 and 8 indicate that as the maturity T of the implied volatility plots gets smaller, the center of the smile approaches to log(K/S) = 0 becomes much more obvious. In addition, we can observe the following effects from the implied volatility plots Figure 1: (1) larger α causes the implied volatilities for different log-strike to have larger discrepancies at in-the-money options, whereas out-of-the-money options are priced below the Black-Scholes option prices on different maturity lengths T; (2) higher correlation ρ flattens the implied volatility such that it behaves more like a smile shape.

Remark 4.
Note that during the computational work in the numerical experiment, the implied volatility with a higher correlation between the stock return and volatility movement ρ requires more computational effort to compute to an accurate number as compared to the lower ρ. Furthermore, while we have only used MLCV to estimate the implied volatility plots on Figure 1, similar plots can be constructed using the BE, CV, and ML methods, but it would take a larger number of simulations to create a smooth line across different log strikes in the figures. The reason that it requires a larger number of simulations is discussed in the subsequent subsection.

Cost-Adjusted Variation and Bias
This section focuses on the following four matters: (1) evaluate the distribution and standard deviation of the out-of-the-money (OTM), at-the-money (ATM), and in-the-money (ITM) implied volatilities under the rough Heston model using MLCV; (2) compare the standard deviation of the MLCV on the changes of the parameters T, ρ, and α; (3) evaluate the distribution and standard deviations between the BE, CV, ML, and MLCV; and (4) compute discretization biases for all the previously mentioned methods on a different number of time steps n and level L. We set the cost of numerical experiments as C = 1 (second) to avoid improper comparison between estimators. Figure 2 shows the histogram plots of 1000(σ i imp − σ imp ) for i = 1, 2, ..., 3000 simulations (3000 runs of 1 second performance) of MLCV with a normal distribution fit; σ imp is the sample mean of the computed implied volatility σ imp . Furthermore, we have fitted normal distribution plot N(0, σ norm ), where σ norm is the standard deviation of 1000(σ i imp − σ imp ). Based on Figure 2, we can observe that ITM options have the largest standard deviation, and therefore, it is harder to pin down an exact option price as compared to the ATM and OTM option. On a bright note, the concept of normal distribution ensures that there is a 99.7% (6 standard deviations gap) chance that the ITM implied volatility would have an absolute error of at most 0.459% (implied volatility error) or roughly 1% percentage error for the implied volatility.
Similarly, we can compute the standard deviations for ATM implied volatility on different values of ρ, α, and T. Table 1 displays the standard deviation of the implied volatility estimator computed by using the MLCV. From Table 1, we can infer that by using the MLCV, option prices under the rough Heston model that have longer maturity time T incur more error than the shorter maturity time T. In addition, based on the observation of the changes in ρ and α, it is evident that a higher α value will lead to a smaller error for the implied volatility, whereas a higher ρ will do the opposite. It is important to realize that among the evaluated ATM implied volatility standard deviations, even the worst case scenario (ρ = −0.2, α = 0.6, T = 1) would have 99.7% chance of having a percentage error of roughly 1.2% only.
In addition, we have computed and plotted the histogram shown in Figure 3 for the ATM implied volatility of the four estimators. Unsurprisingly, the MLCV performed the best, with the smallest standard deviation among all the estimators. To put it in a perspective, by using the multilevel control variate estimator, we managed to achieve a roughly 17 times variance reduction as compared to BE. CV and ML are also able to demonstrate a substantial variance reduction of roughly 11 times and 12 times. The x-axis is 1000(σ k imp − σ imp ), whereas the y-axis is simply the probability that the implied volatility falls into the interval of the histogram. The σ norm is the standard deviation of the normal distribution fit.   Figure 4 shows the number of time steps n or M against the ATM implied volatility. Evidently, the observed implied volatilities of the estimators do differ by a slight bit. The most notable difference in implied volatility is from ML. The result of ML seems to have slightly higher implied volatility, but ultimately, when n becomes larger or level L becomes larger, Figure 4 shows that the estimators do appear to converge and have little variance between one another. A 2 level MLCV with M = 256 has 0.15% percentage error as compared to BE with number of time steps n = 1024, which is acceptable in the financial industry.

Calibration to SPX Options
In this section, we will calibrate the implied volatility under the rough Heston model computed through MLCV to the options of an index consisting of the top 500 companies in the U.S. market, which is the S&P 500 (SPX) option. The SPX options prices and stock are taken from the ending on 22 October 2021 with maturity dates ranging from T = 0.0992 up to T = 0.4812. The interest rate is assumed to be zero in the inversion of option price to implied volatility and computation of implied volatility through the rough Heston model. The main criterion of optimizing the parameters is the minimization of the root mean squared error between the bid-ask implied volatility. The calibrated parameters are as follows: Figure 5 shows the calibration of implied volatility under the rough Heston model to the SPX options against log-moneyness log(K/S) (on the x-axis), using the MLCV Monte Carlo method. It shows that the calibrated implied volatility performs reasonably well for the different option maturities considered. It can be seen from Figure 5 that the implied volatility on T = 0.2579 and T = 0.4812 have some deviations at its end (in-the-money and out-of-the-money). One thing to take note of is that we have actually attempted to calibrate the rough Heston's implied volatility, using MLCV up to maturity of T = 1.66, but were unable to do so because of two main reasons: (1) with cost C = 1 s, the implied volatility at larger maturity actually has large deviation, so the optimization algorithm simply would not work well with such a large deviation at each iteration; (2) there are six parameters to consider, and therefore, the optimization cost is extremely high. We are actually unsure whether the parameters in Equations (71) and (72) are even the global minimum of the optimization problem or not. We have tried three different optimization algorithms, including the gradient-based method (interior-point method), evolutionary algorithm (genetic algorithm), and pattern search algorithm. The gradient-based method obviously does not work well with stochastic problems, whereas the evolutionary algorithm simply requires extremely large computational cost to give a moderate optimal parameter. The parameters in Equations (71) and (72) are obtained through a pattern search algorithm with initial parameters given by the genetic algorithm.

Conclusions and Future Research
This paper mainly contributes to the Monte Carlo computation for option prices under the rough Heston model. In particular, we have proved finite bounds on t for the stochastic Volterra equation with a non-Lipschitz diffusion coefficient as well as the weak error and weak convergence of the discretization method. The proofs are located in the Appendices A and B. In addition, we have proposed a Control Variate estimator for variance reduction purpose and a mixed Monte Carlo method, which consists of the control variate method and multilevel method.
In Section 4, we conducted extensive numerical experiments that consider many scenarios, which we believe many readers or practitioners might find interesting. The most notable part is in Section 4.2, where all the computations and simulations are performed, using the cost-adjusted method rather than the number of simulations N. Time to time comparisons make it easier to identify which estimator is performing the best. In this case, the most favorable estimator goes to the proposed MLCV method. A substantial 17 times variance reduction can be achieved by using MLCV as compared to the BE. The numerical experiment for the discretization bias of the estimators was also conducted. The results show that there is little difference between n = 256 and n = 1024 in terms of discretization bias. A calibration of the SPX option was also performed on the MLCV Monte Carlo method at the end of Section 4. The overall performance of the calibration is overall satisfied, but there were some struggles in calibrating the optimal parameters.
We are aware that there was a recent study on an efficient simulation method of the stochastic Volterra equation [33] that used a hybrid exponential scheme. In particular, the author proved the convergence of the method with Lipschitz coefficient functions to the stochastic Volterra equation. The method can be potentially used to replace the discrete simulation of the stochastic Volterra equation for rough Heston volatility in practice once the convergence of the non-Lipschitz coefficient functions is proven. For future research, it would be interesting to see the hybrid exponential scheme [20,34] being used on a rough Heston model simulation and then consider the cost-adjusted performance of different estimators, including the proposed multilevel control variate estimator. In general, we also do hope to see improvements on the finite bound of E[|V t − V n t | p ] or perhaps even strong convergence of the stochastic Volterra equation with a non-Lipschitz diffusion coefficient, i.e., lim n→∞ E[|V t − V n t | p ] = 0. The main hurdle of proving the strong convergence in Lemma A1 is because we cannot use the Grönwall inequality to Equation (A37) the same way as we did to Equation (A49). Instead, we have to use Perov's inequality to prove the convergence that is bounded on time t. In addition, Remark 2 indicates that the weak convergence rate of stochastic Volterra equation t ]| is of the order O(n −H ), but in the study [35], the author proved that the discretization scheme of the Heston model (with H = 1/2 in the stochastic Volterra process) under a certain hypothesis has a weak convergence rate of O(n −1 ). This indicates that the weak convergence rate of stochastic

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: OTM We prove that the kernel K(t, u) := K i (t, u) = (t − u) H−1/2 for i = 1, 2 satisfies the conditions (AA1)-(AA7). Let C > 0 be a fixed constant. We omit the proofs for conditions (AA1)-(AA3), as they are easy to prove. As for condition (AA4), the proof is the same as that for condition (AA5) when β = 1. Hence, we prove that it satisfies condition (AA5) instead.
Furthermore, let w = v δ . Then we have the following: Finally, we let w = −a and the fact that for x, y ∈ R + , x > y, and q ∈ (0, 1). Then, we notice that (x − y) q ≥ x q − y q to obtain the following: The last inequality is because the integral is finite only if (1 − 2H)B < 1, which is fulfilled for the conditions B ∈ (1, 1/(1 − 2H)) and H ∈ (0, 1/2). The second sum of condition (AA4) can be easily shown by taking note of the following: Similarly, we skip condition (AA6), as it can be shown easily when B = 1 in condition (AA7). For K(t, u) = (t − u) H−1/2 , condition (AA7) can be shown as follows: (A6) Then, it follows the same from the proof of condition 1.

Appendix B. Proof of Theorem 1
Before proving Theorem 1, we do need to prove two lemmas first. The following proofs are somewhat similar to the work of [21] in proving the convergence of the stochastic Volterra equation on the Lipschitz conditions, but with different rate of convergence in time. Note that Fubini's theorem is repeatedly used in the the following proofs to interchange the sign of expectation and integral without being mentioned. We assume that Assumption 1 and conditions (AA1)-(AA7) hold in all the proofs.
Proof. We start from the stochastic Volterra Equation (7). Then, by the Cauchy-Schwarz inequality and the Buckholder-Davis-Gundy (BDG) inequality, we have the following: By using Condition (AA1), Assumption (BB1), Jensen's inequality and Hölder's inequality, we can obtain the following: Notice that from the Hölder inequality, β > 1, 0 < κ < 1, and hence p > 2. Then it follows from condition (BB1) and Jensen's inequality that the following holds: Lastly, we use Grönwall's inequality to obtain the following: For the proof of E[|V n t | p ], the exact same can be done, except for the last step, where we obtain the following: We then define the following function as follows: We continue with The last inequality is due to Grönwall's inequality.
Proof. We first rearrange V t − V s as follows: We start with E[|R 1 | p ] where we use the Cauchy-Schwarz inequality, condition (AA2) and Jensen's inequality.
Lastly, we prove the bounded E[|R 4 | p ] by using the BDG inequality, the Hölder inequality and condition (AA5).
where the last inequality is due to the finite form of Jensen's inequality. We start with the bounding of |G 1 | p : where we used the Cauchy inequality, condition (AA6), condition (BB1), Jensen's inequality, and Lemma A1. We move on to the bound of |G 2 | p , i.e., we used the Cauchy inequality, condition (BB1), Lemma A1, Jensen's inequality, and Lemma A2 to obtain the following: