Abstract
In this work we design a novel and efficient quasi-regression Monte Carlo algorithm to approximate the solution of discrete-time backward stochastic differential equations (BSDEs), and we analyze the convergence of the proposed method. To tackle problems in high dimension, we propose suitable projections of the solution and efficient parallelizations of the algorithm, taking advantage of powerful many-core processors such as graphics processing units (GPUs).
1. Introduction
In this work we are interested in numerically approximating the solution of a decoupled forward-backward stochastic differential equation (FBSDE).
The terminal time is fixed. These equations are considered on a filtered probability space supporting a q-dimensional Brownian motion W. In this representation, X is a d-dimensional adapted continuous process (called the forward component), Y is a scalar adapted continuous process and Z is a q-dimensional progressively measurable process. Regarding terminology, g is called the terminal condition and f the driver.
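For reference, a decoupled FBSDE of the type just described takes the following standard form, written as a sketch with assumed notation (drift b, diffusion σ, terminal function g, horizon T):

\[
\begin{aligned}
X_t &= X_0 + \int_0^t b(s, X_s)\,\mathrm{d}s + \int_0^t \sigma(s, X_s)\,\mathrm{d}W_s, \\
Y_t &= g(X_T) + \int_t^T f(s, X_s, Y_s, Z_s)\,\mathrm{d}s - \int_t^T Z_s\,\mathrm{d}W_s, \qquad 0 \le t \le T.
\end{aligned}
\]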
2. Results
Our aim is to solve the discrete-time dynamic programming equation associated with the Euler discretization of this system, f being the driver in (1). In other words, our subsequent scheme will approximate the solutions for both the Y and the Z components; a representation of the type we have in mind is sketched below.
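As a sketch, one multistep-forward representation of this type reads as follows, in assumed notation: t_0 < … < t_N is the time grid with steps Δ_j = t_{j+1} − t_j, X the Euler scheme of the forward component, g the terminal condition, and y_j, z_j the value functions of the two backward components (this is a generic sketch rather than the paper's exact equations):

\[
y_i(X_i) = \mathbb{E}\Big[\, g(X_N) + \sum_{j=i}^{N-1} f\big(t_j, X_j, y_{j+1}(X_{j+1}), z_j(X_j)\big)\, \Delta_j \;\Big|\; X_i \Big],
\]

with z_i obtained from a similar conditional expectation in which the response is weighted by the rescaled Brownian increment over [t_i, t_{i+1}].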
One important observation is that, due to the Markov property of the Euler scheme, for every i there exist measurable deterministic value functions such that, almost surely, the backward components at time i are obtained by evaluating these functions at the forward component at time i. A second crucial observation is that the value functions are independent of how we initialize the forward component. Our subsequent algorithm takes advantage of this observation. For instance, let the forward component be initialized with a random variable drawn from some chosen distribution, and consider the Euler-scheme evolution started from this random initial value; it writes
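As a sketch, write ξ for the initial draw, b and σ for the drift and diffusion coefficients of the forward equation, and 0 = t_0 < t_1 < … < t_N = T for the time grid (notation assumed here rather than taken from the original); one Euler step then reads

\[
X^{(\xi)}_0 = \xi, \qquad
X^{(\xi)}_{i+1} = X^{(\xi)}_{i} + b\big(t_i, X^{(\xi)}_{i}\big)\,(t_{i+1}-t_i) + \sigma\big(t_i, X^{(\xi)}_{i}\big)\,\big(W_{t_{i+1}} - W_{t_i}\big), \qquad i = 0, \dots, N-1.
\]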
This flexibility property w.r.t. the initialization then writes
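In assumed notation, with y_i and z_i the value functions of the Y and Z components and X^{(ξ)} the Euler scheme started from ξ, a sketch of this identity is

\[
Y^{(\xi)}_i = y_i\big(X^{(\xi)}_i\big), \qquad Z^{(\xi)}_i = z_i\big(X^{(\xi)}_i\big) \quad \text{almost surely, for every } i,
\]

with the same functions y_i and z_i whatever the distribution of the initial draw ξ.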
Approximating the solution to (3) is actually achieved by approximating these value functions. In this way, we are directly approximating the solution to the semi-linear PDE (5). In order to better control the truncation error, we define a weighted modification of the value functions, governed by a damping exponent; when the exponent vanishes, the weighted and unweighted functions coincide. The previous DPE (7) is then rewritten in terms of these weighted functions.
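The exact weight used by the authors is not asserted here; one polynomial damping of this general type, written with the assumed notation y_i for the value function and η for the damping exponent, is

\[
\tilde y_i(x) := \frac{y_i(x)}{\big(1 + |x|^2\big)^{\eta/2}},
\]

so that the weighted and unweighted functions coincide for η = 0, while for η > 0 the weighted function gains extra decay at infinity.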
The introduction of the polynomial factor gives extra flexibility in the error analysis: it ensures that the weighted value function decreases faster at infinity, which provides nicer estimates on the approximation error when dealing with a Fourier-type basis.
Then we define proper basis functions which satisfy orthogonality properties with respect to a suitable measure and which span the corresponding L² space. It turns out that this measure has to coincide with the sampling measure of the forward component. Our strategy for defining such basis functions is to start from a trigonometric basis on a reference domain and then to apply appropriate changes of variable: later, this transform will allow us to easily quantify the approximation error when truncating the basis.
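As an illustration of this construction in one dimension (the exact transform used by the authors is not asserted here): if F denotes the cumulative distribution function of the sampling measure μ, then the functions

\[
\phi_0 \equiv 1, \qquad \phi_k(x) := \sqrt{2}\,\cos\big(k\pi F(x)\big), \quad k \ge 1,
\]

are orthonormal in L²(μ), since the change of variable u = F(x) maps them to the cosine basis on [0, 1]; multi-dimensional bases are then obtained by tensor products.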
With suitable notation, the exact solution can then be rewritten as a series expansion in this basis. Under mild conditions on f, g and the forward dynamics, the value function is square-integrable, hence it belongs to the L² space spanned by the basis; using the orthonormality property of the basis functions, its Fourier coefficients can be expressed as expectations, which allows us to use Monte Carlo simulation in order to compute them. The resulting Algorithm 1 is shown below.
Algorithm 1. Global Quasi-Regression Multistep-forward Dynamical Programming (GQRMDP) algorithm.
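As a sketch of the quantity estimated at each time step by such a quasi-regression scheme (with assumed notation: φ_k the basis functions, S_i a response whose conditional expectation given X_i is the value function y_i, and M the number of simulations), the Fourier coefficients satisfy, and are estimated by,

\[
\alpha_{i,k} = \mathbb{E}\big[\phi_k(X_i)\, y_i(X_i)\big] = \mathbb{E}\big[\phi_k(X_i)\, S_i\big]
\;\approx\; \hat\alpha_{i,k} := \frac{1}{M}\sum_{m=1}^{M} \phi_k\big(X_i^{(m)}\big)\, S_i^{(m)},
\qquad \hat y_i := \sum_{k} \hat\alpha_{i,k}\, \phi_k,
\]

where the first equality uses the orthonormality of the φ_k, the second the tower property of conditional expectation, and the sum over k is truncated to a finite set of multi-indices. No linear system has to be solved, in contrast with least-squares regression schemes; this is the quasi-regression feature exploited by the algorithm.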
3. Discussion
An implementation of the GQRMDP algorithm on GPUs is proposed. It includes two kernels: one simulates the paths of the forward process and computes the associated responses, and the other one computes the regression coefficients. In the first kernel, the initial value of each simulated path of the forward process is stored in a device vector in global memory; it will be read later in the second kernel. In order to minimize the number of memory transactions and therefore maximize performance, all accesses to global memory have been implemented in a coalesced way. The random numbers needed for the path generation of the forward process were generated on the fly (inline generation), taking advantage of the NVIDIA cuRAND library [1] and the MRG32k3a generator proposed by L’Ecuyer in [2]. Therefore, inside this kernel the random number generator is called as needed. Another approach would be to pre-generate the random numbers in a separate previous kernel, store them in GPU global memory and read them back from this device memory in the next kernel. Both alternatives have advantages and drawbacks. In this work we have chosen inline generation, having in mind that this option is faster and saves global memory. Moreover, register spilling was not observed in the implementation, and the quality of the obtained solutions is similar to the accuracy of traditional sequential CPU solutions obtained with more complex random number generators.

In the second kernel, in order to compute the regression coefficients, a parallelization not only over the multi-indices of the basis functions but also over the simulations was proposed. Thus, blocks of threads parallelize the outer loop over the multi-indices, whilst the threads inside each block carry out in parallel the inner loop traversing the vectors of the responses and the simulations.
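The following CUDA sketch illustrates this two-kernel layout; it is not the authors' implementation. The one-dimensional forward model with constant coefficients, the sampling distribution of the initial value, the placeholder basis function phi(), the placeholder response and all numerical parameters are assumptions made to keep the example self-contained; only the overall structure (inline MRG32k3a generation via cuRAND, coalesced global-memory accesses, one thread block per multi-index with an in-block reduction over the simulations) mirrors the description above.

```cuda
// Minimal two-kernel sketch (not the authors' code): path simulation with inline
// MRG32k3a random numbers, then quasi-regression coefficients by block reduction.
#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

#define N_STEPS 20   // Euler time steps (assumed)
#define K_BASIS 8    // number of retained basis functions (assumed truncation)

__device__ float phi(int k, float x) {
    // Placeholder basis function; in practice the transformed trigonometric basis.
    return cosf(k * x);
}

// Kernel 1: draw the initial value, simulate the forward path with inline
// cuRAND MRG32k3a numbers, and store the initial value and the response.
__global__ void simulate_paths(unsigned long long seed, int n_paths, float dt,
                               float* x_init, float* response) {
    int m = blockIdx.x * blockDim.x + threadIdx.x;
    if (m >= n_paths) return;
    curandStateMRG32k3a state;
    curand_init(seed, m, 0, &state);             // one subsequence per path
    float x0 = 0.5f + curand_uniform(&state);    // placeholder initial distribution
    float x = x0;
    const float MU = 0.05f, SIGMA = 0.2f;        // placeholder drift and volatility
    for (int i = 0; i < N_STEPS; ++i) {
        float dw = sqrtf(dt) * curand_normal(&state);
        x += MU * x * dt + SIGMA * x * dw;       // Euler step
    }
    x_init[m]   = x0;                            // coalesced write, reused by kernel 2
    response[m] = fmaxf(x - 1.0f, 0.0f);         // placeholder response from the path
}

// Kernel 2: one block per basis index k; the threads of the block reduce over the
// simulations to estimate alpha_k = (1/M) * sum_m phi_k(X^(m)) * S^(m).
__global__ void regression_coeffs(const float* x_init, const float* response,
                                  int n_paths, float* alpha) {
    extern __shared__ float partial[];
    int k = blockIdx.x;                          // multi-index handled by this block
    float acc = 0.0f;
    for (int m = threadIdx.x; m < n_paths; m += blockDim.x)
        acc += phi(k, x_init[m]) * response[m];  // coalesced reads over simulations
    partial[threadIdx.x] = acc;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {   // shared-memory tree reduction
        if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) alpha[k] = partial[0] / n_paths;
}

int main() {
    const int n_paths = 1 << 20;
    float *x_init, *response, *alpha;
    cudaMalloc(&x_init, n_paths * sizeof(float));
    cudaMalloc(&response, n_paths * sizeof(float));
    cudaMalloc(&alpha, K_BASIS * sizeof(float));

    simulate_paths<<<(n_paths + 255) / 256, 256>>>(1234ULL, n_paths,
                                                   1.0f / N_STEPS, x_init, response);
    regression_coeffs<<<K_BASIS, 256, 256 * sizeof(float)>>>(x_init, response,
                                                             n_paths, alpha);

    float h_alpha[K_BASIS];
    cudaMemcpy(h_alpha, alpha, K_BASIS * sizeof(float), cudaMemcpyDeviceToHost);
    for (int k = 0; k < K_BASIS; ++k) printf("alpha[%d] = %f\n", k, h_alpha[k]);
    cudaFree(x_init); cudaFree(response); cudaFree(alpha);
    return 0;
}
```

In this sketch, assigning one block per coefficient keeps the reads of the initial-value and response vectors coalesced across the threads of the block, and the shared-memory tree reduction avoids global atomics; both choices follow the access pattern emphasized above.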
Conflicts of Interest
The authors declare no conflict of interest.
References
1. NVIDIA cuRAND Web Page. Available online: https://developer.nvidia.com/curand (accessed on 5 October 2018).
2. L’Ecuyer, P. Good parameters and implementations for combined multiple recursive random number generators. Oper. Res. 1999, 47, 159–164.
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).