Asynchronous Iterations of Parareal Algorithm for Option Pricing Models

Spatial domain decomposition method has been largely investigated in the last several decades, while time decomposition seems against intuition, which is not as popular as the former. However, there are still many attractive methods being proposed, especially the parareal algorithm, which shows both theoretical and experimental eﬃciency in the context of parallel computing. In this paper, we present an original model of asynchronous variant based on parareal scheme, applied to the European option pricing problem. Some numerical experiments are given to illustrate the convergence performance and computational eﬃciency of such method.


Introduction
Today's dominating high-performance computer architecture is parallel.Computer Assisted Engineering (CAE) generally lead to problems that are not naturally parallelizable.Strong effort was done during the last years to propose high-performance decomposition domain methods (DDM) that support scalability and reach high rates of speedup and efficiency.
Since about two decades, people have also tried to propose parallel-in-time algorithms.Although it can be appear as unnatural because of time-line "orientation", some attempts did succeed for particular problems or equations, inviting people to look forward and continue research in this direction.Multiple shooting methods [12,17] for example were proposed to allow for parallel computations of initial-value problems of differential equations.Another approach is the time decomposition method originally introduced by researchers from the multi-grid field [29,32] and applied to solve PDEs.Finally, people also tried to apply spatial domain decomposition methods for time dependent problems.The so-called Waveform relaxation methods (see, e.g., [27]) distribute the computation on parallel computers by partitioning the system into subsystems and then use a Picard iteration to compute the global solution.The major flaw of these methods is their low convergence rate.To make them efficient, one needs to use time-dependent transmission conditions adapted to the underlying problem [26].
The recent parareal scheme (resp.PITA algorithm) proposed in [33] (resp.[22]) follows both multiple shooting and multi-grid approaches.Two levels of grid in time are considered in order to split the time domain into subdomains.A prediction of the solution is parallel computed on the fine grid in time.Then at each end-boundary of time subdomains, the solution makes a jump with the previous initial boundary value (IBV) to the next time subdomain.A correction of the IBV for the next fine grid iterate is then computed on the coarse grid in time.Ref. [22] shows that the method converges at least in a finite number of iterations, due to the propagation of the fine time grid solution as at each iteration k, the IBV of the kth time subdomain is exact.Generalizations of the parareal scheme then were proposed by introducing a wider class of coarse solvers that are not specifically defined on coarser sub-grids in time.Coarse solvers can be simpler physical models where fine physics is approximated or simply skipped.
On the other hand, an efficient computational scheme has been proposed to confront the drawbacks of the classical parallel methods, which is called asynchronous iterative scheme.The primordial idea was put forward by Chazan and Miranker [13] for the solving of linear systems, where a necessary and sufficient convergence condition was derived.Several extensions were then developed based on the operator theory applied to the nonlinear problems [44,7,21,46].Furthermore, we cite also the work of Bertsekas and Tsitsiklis [8,9] for the general theories of asynchronous iterations, which leads to a great deal of striking applications (see, e.g., [41,40]).In [9], the asynchronous scheme is designed to be divided into two types, which are called totally asynchronous iterations and partially asynchronous iterations.The major difference consists in whether the bounded assumption being used for the communication time, where the negative answer was first formalized in [7].Recently, a brand-new scheme called asynchronous iterations with flexible communication was proposed in [20,45,24], which was continually under investigation in the following periods [25,19].
The key idea behind this scheme is that items are sent to the other processors as soon as it is obtained, regardless of the local completion.Accordingly, such scheme is built upon the two-stage model, which was introduced in [23], or the partial update situation, see [19] for further information.An attempt to present all the theoretical and practical efforts is beyond the scope of our paper, thus we refer the reader to [25,3] for a broader discussion.
In this paper, we concentrate on a modified parareal algorithm which will be enhanced by the asynchronous iterative scheme with flexible communication.In Section 2, we formalize an option pricing model which will be considered to the end.Section 3 gives the details of asynchronous parareal algorithm.Then, we implement the asynchronous solver in Section 4, where we present the programming trick using an advanced asynchronous communication library.Finally, Section 5 is devoted to the numerical experiments, and concluding remarks are presented in Section 6.

Overview
An option is a contract that gives its owner the right to trade in a fixed number of shares of the underlying asset at a fixed price at any time on or before a given date, which is a prominent form of financial derivatives that have been derived from other financial instruments, mainly used for hedging and arbitrage.The right to buy a security is called a call option, whereas the right to sell is called a put option.
Option pricing remained a frustrating problem that was obscure to solve, until the revolutionary advent of Black-Scholes model.The pioneering work was done by Black, Merton, and Scholes [10,42,43] in the early 1970s.In quick succession, Cox and Ross [15] proposed the risk neutral pricing theory, which leads to the famous martingale pricing theory [28] by Harrison and Kreps.Meanwhile, Cox, Ross, and Rubinstein simplified the Black-Scholes model, giving birth to the binomial options pricing model [16].More recently, some stochastic volatility option models have been proposed to better simulate the real volatility in the financial market (see, e.g., [30,18]).
To the end, we will investigate the original Black-Scholes equation, to solve the European call option pricing problem.Consider the following equation where V is the price of the option as a function of underlying asset price S and time t and the parameters are volatility σ and risk-free interest rate r, with the final and boundary conditions given by where T is the time to maturity, and E is the exercise price.There are many assumptions to be held: (i) the underlying asset follows geometric Brownian motion with drift rate µ and volatility σ constant, (ii) the underlying asset does not pay dividends, (iii) the riskfree interest rate r is constant, (iv) there are no transaction costs or taxes, (v) trading in assets is a continuous process, (vi) the short selling is permitted, (vii) there are no arbitrage opportunities.

Derivation of the Black-Scholes Equation
To derive this outstanding equation, it is necessary to establish a model for the underlying asset price.Economic theory suggest that a stock price follows a generalized Wiener process The right hand side is composed of two parts.The first contains a drift rate ν, which depicts the trend of stock price, whereas as the second item is a standard Wiener process, also known as Brownian motion, with a volatility parameter σ, describing the standard deviation of the stock's returns.The stochastic term Z takes on the expression where is a standard normal distribution variable, while stock price follows the lognormal distribution.In the limit, as ∆t → 0, this becomes Therefore, the option price is a function of stock price and time.From Itô's lemma [31], it follows the process Since a square-root term √ ∆t exists in the stochastic process, the second-order term is kept during the Taylor series expansions.It is seen that solving the equation seems obscure in view of the stochastic term dZ.The main conceptual idea proposed by Black and Scholes lies in the construction of a portfolio consisting of underlying stock and option that is instantaneously risk-less.Let Π be the value of portfolio consisting of one short position derivative and ∂C ∂S units of the underlying asset.Then, the value of the portfolio is given by Hence, the instantaneous change in the portfolio becomes Substituting (2) and (3) yields The change of the portfolio value is not dependent on dZ, which means that the portfolio is risk-less during the time interval dt.As mentioned before, there are no arbitrage opportunities in the financial market, so that the return of this portfolio must be equal to other risk-free securities.Thus dΠ = rΠdt.
Substituting ( 4) and ( 5) into (6) yields where we obtain the original Black-Scholes equation ( 1), which involves adequate conditions to reach the final solution.

Transformation into the Heat Equation
It is seen that the Black-Scholes partial differential equation ( 1) is a parabolic linear equation, while the heat equation leads to a simpler one.Accordingly, the transformation from the original to the latter can simplify the solving process.A change of variables is given as Substituting into the Black-Scholes equation (1) gives then gives the heat equation in an infinite interval We notice that the heat equation is forward parabolic, whereas the Black-Scholes equation is backward.In particular, the initial and boundary conditions become Unfortunately, we address the issue that such infinity conditions can not be applied directly to the discrete applications.A suitably large boundary instead of the original boundary is supposed to be considered.Accordingly, we choose the precise boundaries as following We then notice that the equation ( 7) can be solved employing appropriate time and space discretization.

Black-Scholes Pricing Formula
Without numerical tools, we can also deduce the solution of ( 7), called the Black-Scholes formula, by some analytic procedures.In the case of heat equation, the fundamental solution is In particular, the general solution of the heat equation can be presented by the convolution integral where u(x, 0) = u 0 (x) with x ∈ R and τ ≥ 0. Substituting (8) yields )u 0 (y)dy.
We note that applying the initial condition into such eqaution still involves some brute force.
To summarize, the final closed-form solution is where N (x) is the cumulative distribution function of the standard normal distribution with the parameters d 1 and d 2 shown as below We notice that 3 Asynchronous Parareal Algorithm

Asynchronous Iteration
Let E be a product space with For i = 1, . . ., p and j = 1, . . ., p, let µ i j (k) ∈ N such that and satisfying ∀i, j ∈ {1, . . ., p}, lim An asynchronous iteration corresponding to f and starting with a given vector x 0 is defined recursively by  Note that the assumption (9) illustrates that each processor will proceed the computation now and again, while (11) indicates that each component used by each processor will be updated eventually.
On the other hand, Algorithm 1 gives us a computational model of the basic asynchronous iteration.
Algorithm 1 Asynchronous iteration with asynchronous communication.Receive x from other processors 3: Send x i to other processors 5: end while It is important to notice that the asynchronous iteration functions well without the synchronization points, while the latter may become a decisive bottleneck.In other words, we proceed the next iteration using the latest local data rather than waiting for the newest information from other processors.Finally, we are interested in the operational process of such iteration.An typical example is showed in Figure 1.To make clear, we illustrate also the synchronous counterpart in Figure 2 whereby we can gain a better insight of the asynchronous iterative mechanism.
We now turn to the study of two-stage methods that can be applied to the parareal algorithm.The original asynchronous two-stage methods were introduced in [23], which are called outer asynchronous and totally asynchronous respectively.A well-known improvement consists of allowing the intermediate results from the inner level to be used by the other processors.As a consequence, this method may be performant in some cases taking advantage of a better approximation to the solution.
It would obviously be possible to extent the model ( 12) to the two-stage situation.Let f : E × E → E be the function defined by f i : E × E → E i .For i = 1, . . ., p and j = 1, . . ., p, and satisfying ∀i, j ∈ {1, . . ., p}, lim We assume that the conditions ( 9), ( 10) and ( 11) are still vouched in such context.Then, an asynchronous two-stage iteration with flexible communication corresponding to f and starting with a given vector x 0 is defined recursively by Furthermore, we can establish the computational scheme from the aforementioned mathematical model, which is presented in Algorithm 2.
Algorithm 2 Asynchronous two-stage iteration with flexible communication.Receive x from other processors 3: x ← x

4:
Send x i to other processors 5: while not precise enough do 6: Send x i to other processors 8: end while 9: end while Clearly, the crucial property lies in the sending instruction of inner iteration whereby one can make a transmission without waiting for the local completion in such scope.In this context, we may fully exploit the latest local approximations to accelerate the global convergence.In the same manner, Figure 3 provides an example of the asynchronous two-stage iteration with flexible communication, which illustrates the context of inner/outer iteration (see, e.g., [19]).
Figure 4: Parareal iterative scheme.F propagates step-by-step, whereas G performs only one step in each subdomain.We compute λ n+1 by combining the predictor and the corrector.Then, continue the next iteration.

Classical Parareal Algorithm
Given a second-order linear elliptic operator L, consider the following time-dependent problem where the boundary ∂Ω is Lipschitz continuous.The above mathematical description can be decomposed into N sequential problems By importing a function λ n , we now reconstruct our problem as where n = 0, . . ., N − 1, together with the condition Now we can solve the subproblems ( 16) independently with some essential message passing of the initial boundary values.The parareal iterative scheme is driven by two operators, G and F , which are called coarse propagator and fine propagator respectively.On the other hand, we assume that the time-dependent problem is approximated by some appropriate classical discretization schemes.Let δt be a fine time step.Each subproblem will be solved at a time by G with respect to ∆T , and be further solved by F concerning δt.Note that the discretization schemes exploited for such two operators might be different.Finally, in order to approximate the solution of ( 16) in parallel, we arrive at an iterative method which is the parareal iterative scheme, where G(λ k+1 n ) is called predictor and the remain is called corrector, with n = 0, . . ., N − 1 and λ 0 0 = u 0 .This is therefore a predictor-corrector scheme.Furthermore, we are obliged to respect two more conditions λ 0 n+1 = G(λ 0 n ) and λ k+1 0 = λ k 0 within the iterations.To make clear, Figure 4 illustrates the aforementioned context and Algorithm 3 gives us the implementation of such method.λ 0 = G(λ 0 ) 5: end for 6: w = G(λ 0 ) 7: while not convergence do wait for the update of λ 0 from processor n − 1 10: send λ to processor n + 1 as λ 0 13: w = w 14: end while

Asynchronous Parareal Algorithm
Now we turn to the investigation of a modified parareal method based on asynchronous scheme.We choose equation (15) to establish the new model.Given P k ⊆ {0, . . ., N − 1} and P k = ∅, consider the following iteration where λ k+1 0 = λ k 0 , and satisfying and under the similar assumptions We notice that the predictor and the corrector come from different iterations, so that we can treat parareal scheme as a special case of two-stage methods.It is seen that the classical parareal scheme will be obtained when setting P k = {0, . . ., N − 1}, µ n (k) = k + 1 and ρ n (k) = k.As a consequence, the asynchronous parareal scheme is illustrated in Algorithm 4 whereby one can exploit for the implementation.

Asynchronous Communication Library
The development of asynchronous communication library is interesting because they provide the reliable computational environments.However, the issues of termination and communication management are important, for which many theoretical investigations have been done, but few implementations are proposed.Several libraries have been proposed to deal with the problems (see, e.g., [6,5,4,11,14]), implemented upon Java or C++, but no one manages to build upon the MPI library, which is indeed widely used in the scientific domain.
Recently, JACK [39], an asynchronous communication kernel library, was proposed as the first MPI-based C++ library for parallel implementation of both classical and asynchronous iterations, which has been upgraded to the new version called JACK2.It offers a new highlevel API, a simplified management of resources and several global convergence detectors.In the sequel, we will specify the implementation of the asynchronous parareal iterative scheme.

Preprocessing
We follow the primitive ideas of JACK2 that each element should be configured before processing including communication graph, communication buffers, computation residual and solution vectors.Hence, there are many but unambiguous works to be carried out in this cycle.
The parareal algorithm is a special case of domain decomposition methods, since each time frame only depends upon its predecessor, and essentially needed by its successor.We separate the neighbors into the outgoing links and the incoming links, and thus write the code as Listing 1.Notice that the first processor has no predecessor, whereas the last one has no successor.The second component is the communication buffers, which is managed completely by the library, illustrated in Listing 2. Clearly, each processor needs to handle the information from one incoming neighbor and one outgoing neighbor, while each neighbor has an array of data.Therefore, buffers are constructed with the two-dimensional arrays.The computation residual involves an indication of norm, whereby we define the length of a vector, and thus measure the convergence results compared with a threshold.In Listing 3, as an example, we choose the L2-norm and present the configuration.
Finally, for the asynchronous parareal scheme, the solution vectors are the parameters to initialize the configuration of asynchronous iterations.Listing 4 illustrates the variables in demand.In JACK2, we can see that JACKComm is a front-end interface to perform both blocking and nonblocking tasks.A simple way to initialize the communicator is shown in Listing 5.For the asynchronous mode, the library employs a member function in initializing the solution vectors, which can be easily overloaded for some advanced abilities.We mention here that there are many classes behind the common front-end interface to handle the diverse problems, such as stopping criterion, norm computation and spanning tree construction.If necessary, moreover, one can switch to an appropriate convergence detection method within several choices equipped with some advanced configurations, which are still easy to manipulate.We do not pursue these features further and turn to the implementation of processing step.

Implementation of Parareal Algorithms
We implement the algorithms in two levels, which solve the sequential time-dependent problem and parallel-in-time problem respectively.It is seen that in parallel solver we solve (*res_vec_buf) = 0.0; while (res_vec_norm >= res_thresh) { comm.UpdateResidual(); numb_iter++; } } Finally, the asynchronous mode is more interesting for us, while the implementation is compact enough.Listing 11 illustrate a somewhat similar code as before.
Listing 11: Asynchronous parareal iterative process.Notice that function Integrate gives the same effect as G(λ n ) or F (λ n ).In the code above, we use vec_U0 in each processor as incoming information and employ the two propagator to obtain the intermediate data, whereas vec_U, the solution vector, is computed eventually by the predictor-corrector scheme.lconv_flag is a convergence indicator equipped by the library, whereby function UpdateResidual can invoke other objects to communicate the residual information.It is seen that the programmer should guarantee the sending and receiving buffers corresponding to the vec_U and vec_U0.In practice, such behavior depends on the mathematical library used in the project.

Numerical Experiments
In this section, we are interested in the numerical performance of asynchronous parareal method.We utilize Alinea [34] to carry out the mathematical operations, which is implemented in C++ for both central processing unit and graphic processing unit devices.Note that it has basic linear algebra operations [1] and plenty of linear system solvers [36,2,35], together with some energy consumption optimization [38] and spatial domain decomposition methods [37].Besides, the experiments are executed on the SGI ICE X clusters connected with InfiniBand.Each node includes two Intel Xeon E5-2670 v3 2.30 GHz CPUs.Finally, SGI Message Passing Toolkit 2.14 provides the MPI environment.
As we mentioned above, the application is European option pricing problem, and we employ the Black-Scholes model to predict the target price.The basic idea is, obviously, that we can use heat equation to simplify the computation.After deciding the appropriate discretization schemes, we need to solve the remain equation in the time domain.In the sequel, we assume that both coarse problem and fine problem are approximated by the implicit Euler method, while testing different high-order schemes are beyond the purpose of this paper.Moreover, let us fix volatility and risk-free interest rate as σ = 0.2 and r = 0.05, which has few influence to the parareal algorithms.Finally, we choose finite difference method to discretize the spatial domain, thereby a variable m representing the number of sub-intervals has to be determined.We first illustrate some results which means to prove the precision of asynchronous parareal scheme.In Table 1, we vary ∆T to simulate different time to maturity, and further obtain different option prices, where we can see that the method is precise enough.In the case of convergence properties, we illustrate an example In Figure 5, over which we can see, tidily enough, that each processor converges fast.For the synchronous counterpart, as mentioned before, the processor n will stop updating after (n + 1)th iteration.We notice that the asynchronous parareal are similar to that case, but bring out different number of iterations when keeping a better view.
We now consider the running time.The first case leads to a fixed time to maturity, while changing the number of processors dramatically, shown in Figure 6.We vary ∆T with respect to N to keep the problem immutable with some given parameters.Finally, we compare the performance of asynchronous parareal with its synchronous counterpart, which leads to Table 2.The results illustrate that the asynchronous scheme requires more iterations with the increase of processors, whereas the number of iterations of the synchronous version remains the same throughout the test.We notice that the asynchronous version is faster.Let us mention here that the environment is not distributed, thereby the scene of large communication delays can hardly appear.

Conclusions
In this paper we investigated a modified parareal method with respect to the asynchronous iterative scheme.We first proposed a brand-new scheme from the traditional parallel theories, upon which an attractive model was established.Note that we applied asynchronous iterations to the predictor-corrector scheme, instead of the classical inner-outer scheme, thereby we can make use of the two-stage model to derive our target equations.To overcome the difficulty of implementation, an asynchronous communication library has been adopted to facilitate our programming.We illustrated the design philosophy in detail and gave several numerical results, from which, as expected, we illustrated the excellent precision and the conditional efficiency for the target approach.Notice that the Black-Scholes equation is too simple to accurately predict the real financial market, thereby we could proceed our research with a somewhat challenging situation, such as a mutable volatility or other types of option pricing problem.More important, there are still many theoretical research could be done, upon which a comprehensive analysis is under investigation by the authors.

Figure 1 :
Figure 1: Example of the asynchronous iteration with asynchronous communication.

Figure 2 :
Figure 2: Example of the synchronous iteration with asynchronous communication.

Figure 3 :
Figure 3: Example of the asynchronous two-stage iteration with flexible communication.

Figure 6 :
Figure 6: Asynchronous parareal iterations for fixed time to maturity.

Table 1 :
Asynchronous parareal results with approximate option prices V a , exact option prices V e , absolute error a , relative error r and number of iterations I, given N = 20, m = 250, δt = 0.001, S = 50, E = 60.