A Multilevel Monte Carlo Approach for a Stochastic Optimal Control Problem Based on the Gradient Projection Method

Abstract: A multilevel Monte Carlo (MLMC) method is applied to simulate a stochastic optimal control problem based on the gradient projection method. The numerical simulation of the stochastic optimal control problem involves the approximation of an expected value, and the MLMC method is used to address it. The computational cost of the MLMC method and the convergence analysis of the MLMC gradient projection algorithm are presented. Two numerical examples are carried out to verify the effectiveness of our method.


Introduction
The stochastic optimal control problem has been widely used in engineering, finance and economics. Many scholars have studied stochastic optimal control problems for different controlled systems. Because it is difficult to find analytical solutions for these problems, numerical approximation is a good choice. As with numerical methods for deterministic optimal control problems (see, e.g., [1][2][3][4][5][6]), numerical methods for stochastic optimal control problems have also been extensively studied. For optimal control problems governed by PDEs with random coefficients, the authors of [7][8][9] studied numerical approximations using different methods for different problems. For SDE optimal control problems, the stochastic maximum principle [10,11], the Bellman dynamic programming principle [12] and the martingale method [13] have been used to study numerical approximation in recent years.
The gradient projection method is a common numerical optimization method and is used in [14][15][16][17]. In the numerical simulation of stochastic optimal control problems, whether gradient projection or another optimization method is used, the approximation of an expected value is always involved. For a class of stochastic optimal control problems, the authors of [16] combined the gradient projection method with conditional expectation (used to solve the forward and backward stochastic differential equations). This method is difficult because it involves conditional expectations and numerical methods for solving forward and backward stochastic differential equations. In [15], the expectation was calculated using the Monte Carlo (MC) method, which is easy to understand and implement. However, the convergence of the MC method is slow: to achieve a relatively small allowable error, a very large sample is needed. The MLMC method is a commonly used way to improve this slow convergence rate. For the MLMC method and its applications, we refer to [7,[18][19][20][21][22][23][24]. It is worth mentioning that in [18], an MLMC method is proposed for the robust optimization of PDEs with random coefficients. The MLMC method can effectively overcome the adverse effects that arise when the iterative solution is near the exact solution. However, the proof of convergence for the gradient algorithm is not given there.
In this work, we apply the gradient projection method combined with the MLMC method to solve a stochastic optimal control problem. An expected value needs to be computed in the simulation of the stochastic optimal control problem. To reduce the influence of the statistical and discretization errors introduced by the calculation of the gradient, we use the MLMC method to estimate the gradient in each iteration. In the process of iteration, the mean square error (MSE) is dynamically updated. We prove the convergence of the gradient projection method combined with MLMC, and we also extend the MLMC theory so that it is suitable for our stochastic optimal control problem.
The rest of this paper is organized as follows. We describe the stochastic optimal control problem in Section 2. In Section 3, we review the gradient projection method and MLMC theory. In Section 4, we extend the existing MLMC theory and apply it to the gradient projection method for the stochastic optimal control problem. The convergence analysis is also presented in that section. Some numerical experiments are carried out in Section 5 to verify the validity of our method. Main contributions and future work are presented in Section 6.

Stochastic Optimal Control Problem
Let {F_t}_{t≥0} be the natural filtration generated by a one-dimensional standard Brownian motion {W_t}_{t≥0}, as in (1). The objective function of the optimal control problem that we consider is where h(·), j(·) are functions with continuous first-order derivatives. Here u ∈ U_ad is a deterministic control, and U_ad is a closed convex set in the control space L^2(0, T). E(·) stands for expectation, defined by E(h) = ∫_Ω h(ω) dP(ω). The stochastic process y(u) ∈ L^2_F([0, T]; R) is generated by the following stochastic differential equation: Assume that f is continuously differentiable with respect to t, y, u, and that g is continuously differentiable with respect to t, y. Under these assumptions, problem (3) has a unique solution y(·) ∈ L^2_F([0, T]; R) for (y_0, u(·)) ∈ R × U_ad (see [25]). Here, y(·) is a function of u(·). The optimal control problem (2)-(3) also has a unique solution; let u* denote this optimal solution.
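As a concrete illustration, the state equation (3) can be sampled with the Euler-Maruyama scheme on a uniform grid. The coefficients f and g below are hypothetical stand-ins chosen only for the sketch; they are not the paper's examples.

```python
import numpy as np

def euler_maruyama(f, g, y0, u, T, N, rng):
    """Simulate dy = f(t, y, u) dt + g(t, y) dW_t on [0, T] with N uniform
    Euler-Maruyama steps; u is the control on the same grid (piecewise constant)."""
    h = T / N
    t = np.linspace(0.0, T, N + 1)
    y = np.empty(N + 1)
    y[0] = y0
    dW = rng.normal(0.0, np.sqrt(h), N)  # Brownian increments on each subinterval
    for n in range(N):
        y[n + 1] = y[n] + f(t[n], y[n], u[n]) * h + g(t[n], y[n]) * dW[n]
    return t, y, dW

# Hypothetical coefficients for illustration: linear drift, multiplicative noise.
rng = np.random.default_rng(0)
t, y, dW = euler_maruyama(f=lambda t, y, u: -y + u,
                          g=lambda t, y: 0.1 * y,
                          y0=1.0, u=np.zeros(64), T=1.0, N=64, rng=rng)
```

Returning the Brownian increments `dW` alongside the path is deliberate: the same increments are reused later, both for the adjoint sweep and for coupling fine and coarse MLMC levels.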
In the following, C denotes a generic positive constant that may differ at each occurrence and is independent of the discretization parameters, sample parameters and iteration counts.

Review of Gradient Projection Method and MLMC Method
The gradient projection method is a common numerical method for optimization problems. When applied to stochastic optimal control problems it usually involves an expectation, and MLMC is an important method for calculating expectations. This section introduces the gradient projection and MLMC methods.

Gradient Projection Method
Let J(u) = J(y(u), u), where y(u) is the solution of (3). We assume that J(u) is a convex function, U = L^2([0, T]; R) is a Hilbert space and K is a closed convex subset of U. Let b(·, ·) be a symmetric and positive definite bilinear form on U. Combining the first-order optimality condition with the projection operator P_K : U → K, we can obtain (see [15,16]) where J'(u) is the Gâteaux derivative of J(u) and ρ is a positive constant. We introduce a uniform partition of the time interval: where χ_{I_n^N} is the characteristic function of I_n^N. Then u* can be approximated by: where K_N = K ∩ U_N. Based on the analysis of (4)-(6), we obtain an iterative scheme for the numerical approximation of (2)-(3) as follows: where ρ_i is the iterative step size and J'_N is the numerical approximation of J'(·). The error between J'(·) and J'_N(·) is represented by the following formula: For the iterative scheme (7), the following convergence result holds (see [16]).

Lemma 1.
Assume that J'(·) is Lipschitz continuous and uniformly monotone in neighborhoods of u* and u*,N, i.e., there exist positive constants c and C such that Suppose that ρ_i can be chosen such that 0 < 1 − 2cρ_i + (1 + 2C)ρ_i^2 ≤ δ^2 for some constant 0 < δ < 1. Then the iteration scheme (7) is convergent, i.e., In the iterative scheme (7), the gradient of the objective function (2) needs to be calculated. Using the stochastic maximum principle, the gradient is computed conveniently by introducing the adjoint equation: −dp = [h'(y) + p f_y(t, y, u) − p g_y(t, y)^2] dt + p g_y(t, y) dW_t, p(T) = 0.
The detailed derivation can be found in [26,27]. Thus the derivative of J(u) can be written as According to the Riesz representation theorem, we obtain
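A minimal pathwise sketch of the adjoint sweep is given below, using the signs and symbols of the adjoint equation as stated above. This is a simple explicit-in-p backward discretization for illustration only, not the conditional-expectation BSDE scheme of [16]; the reuse of the forward path and its increments is the key point.

```python
import numpy as np

def adjoint_backward(hy, fy, gy, t, y, u, dW):
    """One pathwise backward sweep for the adjoint equation
    -dp = [h'(y) + p f_y(t, y, u) - p g_y(t, y)^2] dt + p g_y(t, y) dW_t,
    p(T) = 0, stepping from t_N back to t_0 along a stored forward path."""
    N = len(t) - 1
    h = t[1] - t[0]
    p = np.zeros(N + 1)  # terminal condition p(T) = 0
    for n in range(N - 1, -1, -1):
        drift = (hy(y[n + 1])
                 + p[n + 1] * fy(t[n], y[n + 1], u[n])
                 - p[n + 1] * gy(t[n], y[n + 1]) ** 2)
        p[n] = p[n + 1] + drift * h - p[n + 1] * gy(t[n], y[n + 1]) * dW[n]
    return p

# Sanity check with f_y = g_y = 0 and h'(y) = 1, so that p(t) = T - t on the grid.
t = np.linspace(0.0, 1.0, 5)
p = adjoint_backward(lambda y_: 1.0,
                     lambda t_, y_, u_: 0.0,
                     lambda t_, y_: 0.0,
                     t, np.zeros(5), np.zeros(4), np.zeros(4))
```

One sample of the gradient integrand p f_u(t, y, u) is then obtained from p and the forward path, and the expectation E[p f_u] is approximated by averaging such samples over paths.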

MLMC Method
In [19][20][21], the quantities of interest in the MLMC method are scalar-valued. The authors of [18,21] extend the MLMC theory to function-valued quantities. In the gradient projection algorithm, the quantities of interest are unknown functions. In this section, we first briefly review the theory of MLMC. Then, we discuss in detail how to estimate the expectation needed by the gradient projection method.

Scalar-Valued Quantities of Output
The quantity of interest is A : Ω → R. Generally, an exact sample of A is not available; we can only obtain an approximate sample A_h(ω). It is assumed that A_h converges weakly to A with order α, i.e., and that the computational cost satisfies where γ is a positive constant. Usually, α and γ depend on the algorithm itself. For the MLMC method (see, e.g., [19,21]), we consider multiple approximations A_{h_0}, A_{h_1}, ..., A_{h_L} of A, where h_l = M^{−l} T (l = 0, 1, ..., L) is the time step size at the l-th level. Here, M is a positive integer (in the later numerical experiments, M = 2). The expectation E[A_{h_L}] can be written as the telescoping sum where each term is estimated by the standard MC method. If the sample number at level l is M_l, we obtain In order to maintain a high correlation between samples on fine and coarse meshes, we must generate the samples A_{h_l}(ω_i) and A_{h_{l−1}}(ω_i) from the same Brownian path. Combining (19) and (20), we get the multilevel estimator Y = Σ_{l=0}^{L} Ŷ_l. Because the expectation operator is linear and each level is estimated independently, we have Moreover, Y is an approximation of E[A], and its mean square error (MSE) can be decomposed as where the first term is the statistical error and the second term is the discretization (bias) error. To make the MSE less than ε^2, we may require each term to be less than ε^2/2. Denote the cost of taking one sample of Ŷ_l by C_l and the sample size by M_l. The total cost can be represented as How should we choose M_l such that the variance of the multilevel estimator is less than ε^2/2?
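Before turning to the choice of M_l, the telescoping estimator itself can be sketched as follows. The `sampler` interface is an assumption for the sketch: `sampler(l, seed)` is understood to return A_{h_l}(ω) computed from the Brownian path generated by `seed`, so that the fine and coarse samples at each level share the same path.

```python
import numpy as np

def mlmc_estimate(sampler, L, M, rng):
    """Telescoping MLMC estimator Y = sum_{l=0}^{L} Yhat_l, where Yhat_0
    averages A_{h_0} and Yhat_l (l >= 1) averages A_{h_l} - A_{h_{l-1}},
    with both level-l samples driven by the same Brownian path (same seed)."""
    Y = 0.0
    for l in range(L + 1):
        diffs = np.empty(M[l])
        for i in range(M[l]):
            seed = int(rng.integers(2**31))  # one path per sample
            fine = sampler(l, seed)
            coarse = sampler(l - 1, seed) if l > 0 else 0.0
            diffs[i] = fine - coarse
        Y += diffs.mean()
    return Y

# Stand-in sampler (seed unused): A_{h_l} = 2 - 2^{-(l+1)}, so the sum
# telescopes to the finest-level value 2 - 2^{-(L+1)}.
rng = np.random.default_rng(0)
Y = mlmc_estimate(lambda l, seed: 2.0 - 2.0 ** (-(l + 1)), L=2, M=[4, 2, 1], rng=rng)
```

In a real SDE setting the coarse path increments would be obtained by summing groups of M fine increments from the same seed, which is what keeps the level variances V_l small.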
In [21], we find that the optimal sample number can be selected as Here the symbol ⌈·⌉ indicates rounding up. The complexity theorem for the MLMC method with scalar-valued quantities of interest is given in [19][20][21]. Usually, l cannot start from 0, because when the grid is too coarse, the level correlation of the equation (SDE, SPDE) is lost. This is especially true when the quantity of interest is a function and an interpolation operator is needed: if the grid is too coarse, interpolation causes large errors.
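The standard allocation from [21] can be coded directly from estimated level variances V_l and per-sample costs C_l; the particular constant 2/ε^2 below matches a variance target of ε^2/2, as assumed above.

```python
import math

def optimal_samples(V, C, eps):
    """Sample allocation M_l = ceil( 2 eps^-2 sqrt(V_l / C_l) sum_k sqrt(V_k C_k) ),
    which bounds the estimator variance sum_l V_l / M_l by eps^2 / 2 at
    (near-)minimal total cost sum_l M_l C_l."""
    S = sum(math.sqrt(v * c) for v, c in zip(V, C))
    return [math.ceil(2.0 / eps**2 * math.sqrt(v / c) * S) for v, c in zip(V, C)]

# Illustrative levels: V = [4, 1], C = [1, 4], eps = 1 gives M = [16, 4],
# and the resulting variance 4/16 + 1/4 = 1/2 = eps^2 / 2.
M = optimal_samples([4.0, 1.0], [1.0, 4.0], eps=1.0)
```

Note how the rule puts many cheap samples on the coarse, high-variance level and few expensive samples on the fine level.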

Function Valued Quantities of Output
When the quantity of interest in Equation (16) is a function, a natural question is how to apply the classical MLMC theory.
When the samples are vectors or matrices, the discrete time step size of each level is different, so A_{h_l}(ω) and A_{h_{l−1}}(ω) are not compatible and cannot be subtracted directly. The most natural remedy is interpolation and compression. Reference [28] introduced an abstract operator I_{l_1}^{l_2} : R^{M_{l_1}+1} → R^{M_{l_2}+1} for the one-dimensional case. If l_1 < l_2, the operator is a bounded linear prolongation operator; if l_1 > l_2, it is a bounded linear compression operator; if l_1 = l_2, it is the identity operator. We also require a compatibility condition on I_{l_1}^{l_2}. In real applications, I_{l_1}^{l_2} is often a linear interpolation operator. However, in order to be consistent with the control space (5), we define I_l : and where The extension of MLMC theory to vectors or functions can be found in [18,21]. The MSE for A is required to be less than ε^2, i.e., The optimal sample number is determined by the maximum variance. Thus, the optimal sample number can be revised as follows: Next we consider the termination condition of the MLMC method, i.e., when the bias term is less than ε^2/2 (see, e.g., [20]). We write a ≂ b if and only if a ≤ Cb and b ≤ Ca. Choosing M = 4, we may assume that where I_l is a bounded linear prolongation operator, such as linear interpolation. Using the reverse triangle inequality, we can obtain From (30)-(31), we can derive Furthermore, we get Combining Equation (32) with (33), we can derive the following error estimate, To ensure that the bias term is less than ε^2/2, we use the following formula: Based on the above analysis, the complexity theorem of MLMC is given below.
Theorem 1. Then, there exists a positive integer L and a sequence {M_l}_{l=0}^{L} such that, for any tolerance ε > 0, and the cost The proof of Theorem 1 is similar to that of Theorem 3.1 in [20]. The norm can be replaced by ||·||_{L^p(D)}, 1 ≤ p ≤ ∞.
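For the piecewise-constant control space U_N used here, the prolongation and compression operators admit a particularly simple sketch: repeat values to refine, average values to coarsen. Both functions are illustrative assumptions consistent with the operator properties described above, using refinement factor M = 2.

```python
import numpy as np

def prolong(a, M=2):
    """Piecewise-constant prolongation: each coarse-interval value is repeated
    M times on the refined grid, consistent with the piecewise-constant
    control space U_N."""
    return np.repeat(np.asarray(a, dtype=float), M)

def restrict(a, M=2):
    """Compression (L^2-orthogonal for piecewise constants): average every M
    fine values per coarse interval; restrict(prolong(a)) recovers a."""
    a = np.asarray(a, dtype=float)
    return a.reshape(-1, M).mean(axis=1)
```

Both maps are bounded and linear, and the pair satisfies the identity-on-coarse-functions property that makes the level differences A_{h_l} − I_{l−1} A_{h_{l−1}} well defined.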
Next, we introduce a combination of the MLMC method with the gradient projection method.

MLMC Method Based on Gradient Projection
For the expectation E[p f_u(t, y, u)] in formula (16), a more general form of expectation may occur in the numerical approximation of stochastic optimal control problems. We consider the expectation where y and p are the solutions of the state equation and the adjoint equation, respectively.
We assume that f and g have continuous derivatives. The numerical approximation is defined analogously. Before the theoretical analysis, we make the following assumptions: Hypothesis 1 (H1). Assume that the error estimate of the state y satisfies Hypothesis 2 (H2). Assume that the error estimate of the adjoint state p satisfies Hypothesis 3 (H3). Assume that the cost of computing an approximate sample of A satisfies where the first and second terms are the costs of sampling y and p, respectively.

Classic Monte Carlo Method
The expectation E[A] is estimated by the sample average, i.e., where Because exact samples are used here, there is only statistical error (see, e.g., [7,22]). The statistical error is given by the following lemma.
The approximation of E[A] can be defined by where A_{h_L}(ω_i), i = 1, ..., M, are independent and identically distributed. From formula (45), it is clear that there are two sources of error: one is the statistical error, and the other is the discretization error. The detailed error estimate is described as follows.
where I_l A_{h_l} = f(I_l y_{h_l}) g(I_l p_{h_l}).
Proof. First, applying Lemma 2 and the triangle inequality, we obtain Second, using the Cauchy-Schwarz inequality and Lipschitz continuity, we have Combining (47) with (48), we derive the desired result.
From Theorem 2, we find that the required number of statistical samples is affected by the step size. Based on this, the complexity theorem of the MC method is given below: Theorem 3. Let assumptions H1-H3 hold, A ∈ L^2(Ω, L^2[0, T]) and f, g be Lipschitz continuous. Then, choosing the MC sample number M as the error bound satisfies and the total cost satisfies Proof. With this choice of M, from Theorem 2 we have inequality (50). Formula (51) then follows from assumption H3.
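The ε^{-2} factor in the MC cost bound comes directly from the sqrt(Var/M) decay of the statistical error, which a plain MC estimator makes explicit. The sketch below is generic; the constant `sampler` used for the check is a degenerate assumption, not one of the paper's problems.

```python
import numpy as np

def mc_estimate(sampler, M, rng):
    """Plain MC: average M i.i.d. samples. The statistical (RMS) error decays
    like sqrt(Var / M), so halving the error quadruples the sample size,
    which is the source of the eps^{-2} factor in the MC cost bound."""
    samples = np.array([sampler(rng) for _ in range(M)])
    return samples.mean(), samples.std(ddof=1) / np.sqrt(M)

# Degenerate check: a constant integrand has zero statistical error.
rng = np.random.default_rng(0)
mean, se = mc_estimate(lambda r: 3.0, M=16, rng=rng)
```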

Multilevel Monte Carlo Method
According to Equations (26) and (27), a multilevel estimator for A is established in the following theorem.
Theorem 4. Let assumptions H1-H2 hold, A ∈ L^2(Ω, L^2[0, T]) and f, g be Lipschitz continuous. Then, using (26) and (27), the error of the MLMC estimator for A satisfies: Proof. Using the triangle inequality, we get where Now, the aim is to estimate the terms I_1 and I_2. For I_1, employing Theorem 2, we obtain For the term I_2, using the triangle inequality, Lemma 2, the Cauchy-Schwarz inequality and Lipschitz continuity, we get where h_{l−1} = M h_l is used in the above derivation. Hence, substituting the estimates of I_1 and I_2 into equation (53), we derive which is the desired result.
Formula (52) shows that {M_l}_{l=0}^{L} should be selected by balancing the discretization and statistical errors. We choose sample numbers {M_l}_{l=0}^{L} such that and the total cost C_mlmc = Σ_{l=0}^{L} C_l M_l is as small as possible. According to [7], this is a convex minimization problem, so there exists an optimal sample number at each level. We therefore introduce the Lagrange function Setting the derivative of L(µ, N) with respect to N_l to zero, we derive Based on the above analysis, the complexity theorem of the MLMC method based on gradient projection is given as follows.
Theorem 5. Let assumptions H1-H3 hold, A ∈ L^2(Ω, L^2[0, T]) and f, g be Lipschitz continuous. Then, the MLMC estimator (27) can be obtained with the following choice of {M_l}_{l=0}^{L}, where The error bound then satisfies and the total computational cost C_mlmc is asymptotically bounded, as L → ∞, by Proof. First, we prove (63). Using Theorem 4 and h_{l−1} = M h_l, and choosing we obtain so (63) is proved. For formula (64), we give the proof for the first case, τ > 0; the latter two cases are similar.
According to hypothesis H3, we get Let τ > 0 and L → ∞. Then Thus we have This completes the proof.

Gradient Projection Based on Optimization
Following the analysis of the previous sections, the numerical iterative algorithm for the optimal control problem (2)-(3) is given in [15].
From Lemma 1 and Corollary 3.2 of [16], we know that their algorithm is convergent. The error of the scheme in [15] has two main sources: one is the error caused by estimating the expectation E[p f_u(t, y, u)]; the other is caused by the Euler scheme, which cannot be ignored. The estimation error of the expectation has little effect at the initial iteration steps. When ρ_i is small, however, the estimation error may completely distort the gradient direction. This causes the algorithm to fail to converge, i.e., the iterative error stops decreasing.
To make the gradient valid, it is necessary that the MSE satisfies ε^(i) < η||u^(i) − u^(i−1)||_{L^2(0,T)}, where η ∈ (0, 1) (η is determined by the optimal control problem). To reduce the influence of the statistical error, the discretization error and unnecessary computational cost, we use MLMC to estimate the expectation.

MLMC Gradient Projection Algorithm
For the algorithm presented in [15], the MSE cannot decrease below a threshold for a fixed time step size, no matter how much the number of samples is increased, because the estimator is biased. Therefore, in an efficient algorithm the time step size must not be fixed. Here we apply the MLMC method to estimate the gradient, which ensures that the MSE of each iteration stays within an allowable range. The detailed steps are presented in Algorithm 1. The norm ||·||_{L^2(Ω,L^2[0,T])} is induced by the inner product (u, v)_{L^2(Ω,L^2[0,T])} = ∫_0^T ∫_Ω uv dP(ω) dt. J'_{(L_i,M_i)}(u^(i−1)) = Y + j'(u^(i−1)) is the MLMC approximation of (16). The definition of Y can be seen in (27). Here, A = p f_u(t, y, u), as in (28) and Theorems 4 and 5.
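Stripped of the MLMC machinery, one pass of the projected-gradient update (7) used inside Algorithm 1 can be sketched as follows. The box constraint and the toy quadratic objective are assumptions chosen purely for illustration.

```python
import numpy as np

def project_box(u, lo, hi):
    """Pointwise projection P_K onto the box constraints K = {lo <= u <= hi}."""
    return np.clip(u, lo, hi)

def gradient_projection(grad, u0, rho, lo, hi, max_iter=200, tol=1e-8):
    """Iterate u^{(i)} = P_K( u^{(i-1)} - rho * grad(u^{(i-1)}) ), where grad
    is some approximation of J'(u) on the time grid."""
    u = u0.copy()
    for i in range(max_iter):
        u_new = project_box(u - rho * grad(u), lo, hi)
        if np.linalg.norm(u_new - u) < tol:
            return u_new, i + 1
        u = u_new
    return u, max_iter

# Toy objective J(u) = 0.5 ||u - z||^2, J'(u) = u - z; the constrained
# minimizer is the projection of z onto the box.
z = np.linspace(-2.0, 2.0, 32)
u_star, iters = gradient_projection(lambda u: u - z, np.zeros(32), 0.5, -1.0, 1.0)
```

In Algorithm 1 the exact gradient `grad` is replaced by the MLMC estimate J'_{(L_i,M_i)}, which is what makes controlling its MSE essential.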
To determine the new optimal iterative step size ρ_i, we would need to evaluate the objective function. After a small number of iterations, however, the change in the objective function value is small, which makes the MSE very small and the computational cost of MLMC very large, especially with the Armijo search method. So here we simply take a fixed form of the iterative step size, where d is a positive constant. Lines 8-12 determine the MSE of the next iteration; they ensure that the MSE of the MLMC gradient estimate does not affect the iterative error of each iteration, and also that the MSE is not much smaller than the iterative error. This avoids wasting samples; in particular, when the iteration is close to termination, a very small change in the error would otherwise lead to a large difference in the sample number.
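The MSE update rule described above can be sketched as a driver loop. The interface `grad_mlmc(u, eps)`, the constants and the exact-gradient stand-in used in the check are all illustrative assumptions, not the paper's Algorithm 1 verbatim.

```python
import numpy as np

def adaptive_gradient_projection(grad_mlmc, project, u0, rho, eta=0.5,
                                 eps0=1e-1, tol=1e-6, max_iter=200):
    """Gradient projection where the MLMC tolerance for the next gradient is
    tied to the last iterate change, eps^{(i+1)} = eta * ||u^{(i)} - u^{(i-1)}||,
    so the gradient's MSE tracks (and never dominates) the iterative error.
    grad_mlmc(u, eps) is assumed to return a gradient with MSE at most eps^2."""
    u, eps = u0.copy(), eps0
    for i in range(max_iter):
        u_new = project(u - rho * grad_mlmc(u, eps))
        delta = np.linalg.norm(u_new - u)
        if delta < tol:
            return u_new, i + 1
        eps = max(eta * delta, tol)  # tighten the MSE target as iterates settle
        u = u_new
    return u, max_iter

# Exact-gradient stand-in (eps ignored) on J(u) = 0.5 ||u - z||^2.
z = np.ones(8)
u_opt, it = adaptive_gradient_projection(lambda u, eps: u - z, lambda u: u,
                                         np.zeros(8), rho=0.5)
```

Flooring `eps` at `tol` reflects the point made above: once the iterate changes are tiny, demanding an even smaller MSE would inflate the MLMC sample sizes without improving the final error.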
Noting that the sample number M_i and the maximum level L_i are related to ε^(i), from Algorithm 1 we can see that the following hold.

Convergence Analysis of the Algorithm
First of all, we consider the accuracy of applying MLMC to estimate the gradient. The accuracy of the gradient estimate is discussed in detail in Theorem 6.1 of [18]. The gradient estimated by MLMC is the exact gradient of a discrete objective function. When the objective function is convex, the MLMC estimate remains convex.
Algorithm 1 eliminates, as far as possible, the impact of the discretization and statistical errors on ||u^(i) − u^(i−1)||_{L^2[0,T]}. According to formula (16) and lines 3 and 9 of Algorithm 1, we have where J'_{(L_i,M_i)}(u^(i−1)) is the MLMC estimate of J'(u^(i−1)). From Lemma 1, Corollary 3.2 of [16] and Theorem 1 of [15], we know that Algorithm 1 is convergent and that the final iterative error is not affected by the discretization or statistical errors. We have the following convergence theorem as τ → 0: Theorem 6. Suppose all the conditions of Lemma 1 hold and ρ_i satisfies 0 < 1 − 2cρ_i + Cρ_i^2 ≤ δ^2/4, where δ ∈ (0, 1). Then, Algorithm 1 based on multilevel estimation is convergent, i.e., Proof. The triangle inequality implies where the definition of u^{*,N(i−1)} can be seen in (6).
Then, for the second term on the right-hand side (RHS) of (75), we have For the first term on the RHS of (76), we get The second term on the RHS of (77) can be written as Using (11), (72) and the Cauchy-Schwarz inequality, we can bound the second term on the RHS of (78) by Substituting (77)-(79) into (76), we have The triangle inequality and (80) imply Choosing appropriate ρ_i, we get Combining (75) with (76) and (82), we have It is known from Theorem 3.1 of [16] that When i → ∞, according to the assumption and (71), the desired convergence follows. This completes the proof.

Numerical Experiments
In this section, two numerical examples are presented. The effectiveness of Algorithm 1 is analyzed numerically.
From Figure 3, we can see that as τ gradually decreases, the error decreases slowly (σ = 0.1, σ = 0.2). This is mainly because the search direction is the negative gradient direction, which is a drawback of the gradient descent method itself.

Example 2
This example comes from [15] and is as follows: This optimal control problem is equivalent to: where The parameters are selected accordingly. When σ = 5, partial information about the computation process is listed in Table 2. In this table, M_k has the same meaning as in Table 1. The total iteration time is 1.841 × 10^4 s when σ = 5.
From Figures 4-6 and Table 2, it can also be seen that ε^(i), ||u^(i) − u^(i−1)||_{L^2[0,T]} → 0, Algorithm 1 is convergent, and the MLMC method is efficient.

In the numerical approximation of the stochastic optimal control problem, the statistical error (or MSE) and the discretization error have a great influence on the convergence of the gradient projection method. If we use a Monte Carlo method as in [15], the optimization iteration (gradient projection method) may not converge for a fixed large time step. To ensure convergence of the iteration, it is necessary to select a small time step size h and a large sample size M, which means that each optimization iteration takes a long time.
Our Algorithm 1 is in fact an MLMC method with variable step size: the number of samples differs between iteration steps, and the sample size also varies with the time step size. It is optimal in a sense, just as MLMC is superior to the MC method. Compared with the method in [16], our method is much simpler.

Conclusions
In this paper, an MLMC method based on gradient projection is used to approximate a stochastic optimal control problem. One contribution is that the MLMC method is used to compute the expectation, reducing the influence of the statistical and discretization errors on the convergence of the gradient projection algorithm. The other contribution is the convergence proof of Algorithm 1, which, to the best of our knowledge, is not found elsewhere. Two numerical examples are carried out to verify the effectiveness of the proposed algorithm.
Our method can be applied to simulate other stochastic optimal control problems with expected value in the optimality conditions (optimal control of SDE or SPDE).
MLMC is used here to reduce the MSE. Other approaches include the importance sampling method (see, e.g., [31,32]), the quasi-Monte Carlo method and the multilevel quasi-Monte Carlo method (see, e.g., [33,34]). In future work, we can use the importance-sampling MLMC method and the multilevel quasi-Monte Carlo method to approximate stochastic optimal control problems with gradient projection or other optimization methods.