Abstract
In this paper we study deep neural network algorithms for solving linear and semilinear parabolic partial integro-differential equations with boundary conditions in high dimension. Our method can be considered as an extension of the deep splitting method for PDEs to equations with non-local terms. To show the viability of our approach, we discuss several case studies from insurance and finance.
1. Introduction
Many problems in insurance and finance lead to terminal or boundary value problems involving parabolic partial integro-differential equations (PIDEs). Examples include option pricing in models with jumps, the valuation of insurance contracts, optimal investment problems and many applications in credit risk. These PIDEs can be linear (such as PIDEs arising in risk-neutral pricing) or semilinear (such as the dynamic programming equation in many stochastic control problems). Practical applications often involve several underlying assets or economic factors, so one has to deal with PIDEs in a high-dimensional space. These PIDEs typically do not admit an analytic solution, making the design of suitable numerical methods an ongoing challenge. Existing numerical methods include deterministic schemes, in particular finite difference and finite element methods, see e.g., [1,2,3,4,5], and random schemes based on Monte Carlo methods, see e.g., [6,7,8]. However, finite difference and finite element methods cannot be used in the case of high-dimensional PIDEs, as they suffer from the curse of dimensionality. Monte Carlo methods, on the other hand, are suitable for problems in higher dimensions. However, these methods only provide the solution at a single fixed time-space point. This is problematic in risk management applications, where one needs to find the solution of a pricing problem for a large set D of future scenarios. The naive solution via nested Monte Carlo is in most cases computationally infeasible. Regression-based Monte Carlo methods (see [7]) sometimes help, but the choice of proper basis functions remains a delicate issue. Moreover, it is not straightforward to apply Monte Carlo techniques to semilinear parabolic equations.
For these reasons, many recent contributions study machine learning techniques for the numerical solution of PDEs. A large strand of this literature is based on the representation of semilinear parabolic PDEs via backward stochastic differential equations (BSDEs). In the seminal papers [9,10], a BSDE is discretized on a time grid, and the solution at the initial date and its gradient at every time step are approximated by a combined deep neural network. The parameters are trained by minimizing the difference between the network approximation and the known terminal value of the solution. An error estimation of this method is given in [11]; [12] considers an extension to elliptic semilinear PDEs and applications to insurance mathematics. Other contributions use a backward induction over the time steps. To begin with, [13] estimates the solution and its gradient simultaneously by backward induction via sequential minimization of suitable loss functions; moreover, the paper contains convergence results. The paper of [14] uses a different discretization method, called deep splitting. In this method, one computes the unknown gradient of the solution by automatic differentiation, which reduces the size of the networks. For linear PDEs, the method boils down to the simpler global regression approach of [15], where the Feynman-Kac representation and the $L^2$-minimality of conditional expectations are used to characterize the solution through an infinite-dimensional stochastic minimization problem, which is then solved with machine learning. Ref. [16] combines the ideas of [13,14] to introduce a neural network scheme for fully nonlinear PDEs. Finally, [17] extends the method in [13] and provides a convergence analysis that can be adapted to show convergence of the deep splitting method of [14].
Applications of deep learning methods to partial integro-differential equations with non-local terms, on the other hand, are scarce. The paper [18] presents an extension of the method of [13] to PIDEs and generalizes the convergence results. Numerical case studies are, however, not provided. In fact, from a numerical viewpoint, the method of [18] is quite involved, since one needs to approximate the solution, the gradient and the non-local term in the PIDE via three separate networks. The work of [19] is based on the deep Galerkin method of [20]. This is an alternative machine learning approach for PDEs, where the network is trained by directly minimizing the deviations of the fitted functions from the desired differential operator and boundary conditions. The very recent contribution [21] is discussed in Section 5.
In this paper, we consider deep neural network (DNN) algorithms for linear and semilinear parabolic PIDEs that generalize the regression approach of [15] and the deep splitting method of [14], respectively. In the semilinear case, we linearize the equation locally in time using a time grid, and we perform a backward induction over the grid points using the DNN algorithm for the linear case in each step. In particular, the gradient and the non-local term entering the equation at a given grid point are computed using the approximate solution from the subsequent grid point. The advantage of this approach, as opposed to the method of [18], is that we approximate only the solution by a DNN, so that it suffices to train a single network per time step. In the semilinear case, we propose an alternative linearization procedure to [14], and we show in numerical experiments that this procedure performs substantially better than the original deep splitting method of [14]. Moreover, we apply our DNN algorithms also to boundary value problems, while [14,15] consider only pure Cauchy problems.
The focus of this paper is on the application of our methodology to problems in insurance and finance; a formal convergence analysis is given in the companion paper [22]. To assess the performance of our methodology, we carry out two practically relevant case studies involving multi-dimensional PIDEs arising in actuarial and financial mathematics. In the first case study, we study so-called reinsurance counterparty credit risk (RCCR), which is the risk of a loss that occurs if a reinsurance company fails to meet her contractual obligations towards the ceding insurer. We consider a reinsurance contract of stop-loss type in a framework where claims arrive with stochastic intensity. The associated RCCR can be quantified by the credit value adjustment (CVA), which is defined as the expected discounted value of the replacement cost for the contract. The computation of the CVA by Monte Carlo leads to a nested Monte Carlo problem. We compute the value of the reinsurance contract and the CVA with our deep learning method and assess the performance by comparing the solution to the results of an extensive Monte Carlo simulation. The second case study is an optimization problem in finance. We consider a bank who dynamically optimizes her credit portfolio in the presence of transaction costs and risk capital constraints. In the absence of capital constraints, the problem admits an analytic solution and is, therefore, a useful test case. With capital constraints, the model leads to a semilinear boundary value problem with no explicit solution, so we study this case numerically. Our numerical experiments show that in all test cases the performance of the proposed approximation algorithms is quite satisfactory in terms of accuracy and speed.
The paper is organized as follows. Section 2 introduces the general setting and characterizes the types of linear and semilinear PIDEs considered. In Section 3 the method for linear PIDEs is described, and Section 4 presents the related case study on reinsurance counterparty credit risk. Section 5 is devoted to the theory of the deep splitting method for semilinear PIDEs, and Section 6 continues with the related case study on optimal credit portfolios with and without capital restrictions. Section 7 concludes.
2. Modelling Framework
We fix a probability space , a time horizon T and a right-continuous filtration . Consider measurable functions , and , where is a separable measurable space. We assume that supports a d-dimensional Brownian motion W and a Poisson random measure J on . The compensator of J is given by for a sigma-finite measure on E. We consider a d-dimensional process X that is the unique strong solution of the SDE
Define the set and note that X jumps at t whenever . We assume that
This condition ensures that X has a.s. only finitely many jumps on , so that every integral with respect to J is well-defined. The restriction to finite-activity jump processes simplifies the exposition and is sufficient for most applications in insurance. Conditions on the coefficients ensuring that the SDE (1) has a unique strong solution are given, for instance, in [23] or [24].
By , we denote the functions that are k-times continuously differentiable on the set , and by , we denote functions that are once continuously differentiable in t and k-times continuously differentiable in x on the set . For every function , we write for the first derivatives of h with respect to the space variables, for the second derivatives and for the first derivative with respect to time.
Define the matrix , by
and consider for the integro-differential operator given by
The operator is the generator of X, and X is a solution of the martingale problem for , see e.g., [25] for details.
Consider functions , and , and let D be an open subset of . In Section 3 we are interested in the following boundary value problem
The special case corresponds to a pure Cauchy problem without boundary conditions; in that case we use the simpler notation to denote the terminal condition.
It follows from the Feynman-Kac formula that under some integrability conditions a classical solution u of (4) has the probabilistic representation
where . Conversely, [26,27] provide conditions ensuring that the function u defined in (6) is a classical solution of the problem (4) (for the case of a pure Cauchy problem).
In Section 3 we propose a deep neural network (DNN) algorithm to approximate the function u defined in (6). In Section 5 we are interested in semilinear problems of the form
where is a nonlinear function such as the Hamiltonian in a typical Hamilton-Jacobi-Bellman equation. To handle this case, we partition the interval into time points, consider a linearized version of (7) on each subinterval, and apply the DNN algorithm for the linear case recursively.
3. Deep Neural Network Approximation for Linear PIDEs
In this section we consider linear PIDEs. We extend the regression-based algorithm of [15] to PIDEs, and we include boundary conditions in the analysis.
3.1. Representation as Solution of a Minimization Problem
Fix some time point and a closed and bounded set . Define the function by the Feynman-Kac representation (6). We want to compute an approximation to the function on the set A. The key idea is to write this function as the solution of a minimization problem in an infinite-dimensional space. This representation is used to construct a loss function for our deep neural network method.
Consider some random variable whose distribution is absolutely continuous with respect to the Lebesgue measure such that the corresponding density has support A (in applications the distribution of is often the uniform distribution on A). Denote by the solution of the SDE (1) with initial value . Define the random variable
Assume that and that the function belongs to . Since is a Markov process, it holds that , where is the sigma-field generated by . As is square integrable, we get from the $L^2$-minimality of conditional expectations that
Since and since the density of is strictly positive on A we conclude that is the unique solution of the minimization problem
The problem (11) can be solved with deep learning methods, as we explain next.
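To spell out the $L^2$-argument, write $\xi$ for the randomized initial value and $Y$ for the Feynman-Kac functional defined above (notation introduced here for illustration). For every measurable function $v$ with $\mathbb{E}[v(\xi)^2]<\infty$, the tower property gives the orthogonal decomposition

$$\mathbb{E}\big[(Y - v(\xi))^2\big] \;=\; \mathbb{E}\big[(Y - \mathbb{E}[Y\mid\xi])^2\big] \;+\; \mathbb{E}\big[(\mathbb{E}[Y\mid\xi] - v(\xi))^2\big].$$

The first term does not depend on $v$, so every minimizer coincides with $\mathbb{E}[Y\mid\xi] = u(t_0,\xi)$ almost surely; since the density of $\xi$ is strictly positive on A, the minimizer is unique on A.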
3.2. The Algorithm
The first step in solving (11) with machine learning techniques is to simulate trajectories of up to the stopping time . The simplest method is the Euler-Maruyama scheme. Here, we choose a time discretization, generate K simulations of the random variable and simulate K paths of up to the stopping time by the following recursive algorithm. We let , and for ,
Note that the integrand in the integral with respect to is evaluated at , so this integral corresponds to the increment of a standard compound Poisson process. Using these simulations we compute for each path
where the integrals on the right can be approximated by Riemann sums.
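The following Python sketch illustrates the simulation step (Python is the language of our experiments). All coefficient functions and the domain test are placeholders (`mu`, `sigma`, `lam`, `sample_jump`, `in_domain`); the jump part is a compound Poisson increment with the integrand frozen at the left endpoint, as described above, and paths are frozen once they leave D, which approximates stopping at the exit time on the grid.

```python
import numpy as np

def simulate_paths(x0, t_grid, mu, sigma, lam, sample_jump, in_domain, rng):
    """Euler-Maruyama paths of the jump diffusion, frozen after exiting D."""
    K, d = x0.shape                   # x0: one draw of the initial value per path
    N = len(t_grid) - 1
    X = np.empty((K, N + 1, d))
    X[:, 0] = x0
    alive = np.ones(K, dtype=bool)    # paths that have not left D yet
    for n in range(N):
        dt = t_grid[n + 1] - t_grid[n]
        dW = rng.normal(0.0, np.sqrt(dt), size=(K, d))
        # number of jumps in (t_n, t_{n+1}] is Poisson(lam * dt); the integrand
        # is evaluated at t_n, so this is a compound Poisson increment
        n_jumps = rng.poisson(lam * dt, size=K)
        jump = np.stack([sample_jump(rng, m) for m in n_jumps])   # (K, d)
        drift = np.array([mu(t_grid[n], x) for x in X[:, n]])
        diff = np.einsum('kij,kj->ki',
                         np.array([sigma(t_grid[n], x) for x in X[:, n]]), dW)
        X[:, n + 1] = np.where(alive[:, None],
                               X[:, n] + drift * dt + diff + jump,
                               X[:, n])                           # freeze stopped paths
        alive &= np.array([in_domain(x) for x in X[:, n + 1]])
    return X
```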
In the next step, we approximate from (6) by a deep neural network . We determine the network parameters (training of the network) by minimizing the loss function
where denotes the set of admissible network parameters. For this we rely on stochastic gradient-descent methods; algorithmic details are given in the next section. This approach can be considered as a regression-based scheme since one attempts to minimize the squared error between the DNN approximation and the given terminal and boundary values of the PIDE. Note that derivatives of u do not enter the algorithm, so that the method applies even in cases where u is only a continuous viscosity solution of the PIDE (4).
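As an illustration, a single stochastic-gradient step on the empirical counterpart of (13) might look as follows in TensorFlow. This is a sketch under the assumption that `model` is a Keras network and that `xi_batch`, `y_batch` hold a mini-batch of the simulated pairs from the previous step; the learning rate is a placeholder.

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-1)

@tf.function
def train_step(model, xi_batch, y_batch):
    with tf.GradientTape() as tape:
        # empirical version of E[(Y - u_theta(xi))^2]
        loss = tf.reduce_mean(tf.square(y_batch - model(xi_batch)))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```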
Remark 1
(Convergence result for the linear case). In the companion paper [22] we provide a convergence result for the case of pure Cauchy problems. Under Lipschitz assumptions on the coefficients and on the functions c and g we show that
where measures the approximation quality of the set of network functions (loosely speaking, describes how well the true solution can be approximated by functions from ). Relation (14) shows that the approximation error of the method can be made small by considering a fine time grid (small ) and a class of network functions with good approximation properties (small ).
4. Case Study for the Linear Case: Reinsurance Counterparty Credit Risk
In this section we test the performance of the proposed DNN algorithm for linear PIDEs. Our case study is concerned with reinsurance counterparty credit risk (RCCR), which is the risk that a reinsurance company R might default and thus be unable to fulfill her contractual obligations towards the ceding insurer I. Using market-consistent valuation, this risk is described in terms of the credit value adjustment (CVA), which gives the expected discounted value of the replacement cost for the contract incurred by I at the default time of R. We work in the model of [28], where RCCR is studied using Monte Carlo methods. The framework of [28] includes several sources of dependence between the market value of the reinsurance contract and the default event of the reinsurance company. In this way it is possible to account for practically relevant contagion effects that are often subsumed under the label “wrong way risk”.
Our case study consists of two parts: first, we compute the value of a reinsurance contract with maturity T, where the underlying claims process L follows a doubly stochastic marked point process. Second, we compute the CVA to measure the RCCR arising from the reinsurance contract. This leads to a nested Monte Carlo problem, as one needs to compute the value of the reinsurance contract for all possible values of for every . The Monte Carlo approach used in [28] therefore has a high computational cost, whereas our DNN algorithm is very well suited for this problem. Details on the numerical implementation and the computational equipment are given in Remark 2.
4.1. Valuation of an Insurance Contract with Doubly Stochastic Poisson Arrivals
Consider a reinsurance contract with maturity on an insurance portfolio with cumulative loss process . To describe the dynamics of L, we introduce a sequence of claim arrival times with nonnegative intensity process and a sequence of claim sizes that are iid strictly positive random variables independent of the counting process with . The cumulative loss process is given by , . We assume that the are Gamma(,) distributed with density . This is a common choice in insurance. The claim-arrival intensity process satisfies the SDE
where is a standard Brownian motion. Define the process X by , . We assume that the reinsurance contract is a stop-loss contract, i.e., the indemnity payment is of the form with
We use a market consistent valuation approach and define the market value u at time of the reinsurance contract by
Here, expectations are taken under a pricing measure and the risk premium charged by the reinsurer is accounted for by a proper choice of that measure, see [28] for details. In [28] it is shown that u is the unique solution of the PIDE with terminal condition and generator
for , . There is no explicit solution for this PIDE, and we approximate on the set with a deep neural network . Table 1 contains the parameters we use. Paths of the processes L and are simulated with the Euler-Maruyama scheme on an equidistant time grid with and .
Table 1.
Parameters used for the valuation of the stop-loss contract. The claim sizes are Gamma distributed with density function .
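For illustration, the path simulation for the pair (cumulative loss, claim-arrival intensity) can be sketched as follows. The Euler step for the intensity dynamics (15) is abstracted into a user-supplied function `step_intensity`, and all names are placeholders; in particular, `alpha` and `beta` denote the shape and rate of the Gamma claim-size distribution from Table 1.

```python
import numpy as np

def simulate_loss_paths(lam0, alpha, beta, T, N, K, step_intensity, rng):
    """Simulate K paths of (L, lambda) on an equidistant grid with N steps."""
    dt = T / N
    lam = np.full(K, lam0)
    L = np.zeros(K)
    paths = np.empty((K, N + 1, 2))
    paths[:, 0] = np.column_stack([L, lam])
    for n in range(N):
        dW = rng.normal(0.0, np.sqrt(dt), size=K)
        # number of claims in the subinterval, conditional on the intensity
        n_claims = rng.poisson(lam * dt)
        # a sum of m iid Gamma(alpha, beta) claims is Gamma(m * alpha, beta)
        L += rng.gamma(shape=alpha * n_claims, scale=1.0 / beta)
        lam = step_intensity(lam, dt, dW)   # Euler step for the intensity SDE (15)
        paths[:, n + 1] = np.column_stack([L, lam])
    return paths
```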
Figure 1 shows the approximate solution obtained by the DNN algorithm; details on the training procedure are given in Remark 2.
Figure 1.
Solution for , (left) and , (right) computed with the DNN-algorithm (green line) and reference points computed with Monte Carlo (grey dots).
As a reference for the whole set A, we compute for fixed approximate values with Monte Carlo using simulated paths for each point (paths are simulated with the Euler-Maruyama scheme). The relative $L^2$-error between the DNN approximation and the MC solution is defined as follows.
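A standard form of this error, consistent with the evaluation procedure below (the precise normalization is our assumption), is

$$\mathrm{err}_{L^2}^{\mathrm{rel}} \;=\; \left(\frac{\sum_{j=1}^{J}\big(u_\theta(t,x_j)-u_{\mathrm{MC}}(t,x_j)\big)^2}{\sum_{j=1}^{J} u_{\mathrm{MC}}(t,x_j)^2}\right)^{1/2},$$

where $x_1,\dots,x_J$ are the simulated draws of the evaluation point and $u_{\mathrm{MC}}$ denotes the Monte Carlo reference solution.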
Using 1000 simulations of , we obtained a relative error of
The computation of via the training of a DNN took around 90 s, whereas the computation of with Monte Carlo for a fixed point took around 4.5 s. This shows that the DNN approach is faster than the MC approach as soon as one wants to compute the solution for a grid with more than 20 grid points. Details on the numerical implementation and computational equipment are given in Remark 2.
We continue with a more detailed performance analysis for the fixed space point . As a reference we approximate with Monte Carlo using simulations. The computation of a single point takes around 2 min; upper and lower bounds for the reference value are (), where is the standard deviation of the Monte Carlo simulation. In Table 2 we report the mean, the standard deviation, the average relative approximation error (with the Monte Carlo approximation as reference) and the average runtime of for different numbers of epochs, based on 10 independent realizations (10 independent trainings of ).
Table 2.
Average value, standard deviation, average relative error of and average run time for different numbers of epochs.
Figure 2 shows the log-average relative error of relative to the reference value 6.2825, based on 10 independent realizations. We observe that the relative $L^2$-error decreases as the number of epochs increases.
Figure 2.
Log-average relative error of relative to the reference value 6.2825, based on 10 independent realizations.
4.2. Computing the CVA for the Reinsurance Contract
The reinsurance company R may default, and we denote by the stopping time representing her default time. If , R will not be able to fulfill her obligations, which creates reinsurance counterparty credit risk. The default indicator process associated with is given by
We assume that admits a stochastic intensity (in the sequel called the default intensity of R), with the property that is a martingale. We assume that follows the SDE
where is a standard Brownian motion independent of .
We define the credit value adjustment (CVA) for the reinsurance contract as the market consistent value of the future credit loss, that is,
for and u the value of the insurance contract given in (17). The amount can be viewed as a risk reserve that the insurance company has to set aside at time t to cover losses due to reinsurance counterparty risk.
Our framework accounts for two forms of dependence between the default of R and the market value of the insurance contract. First, there is correlation between the default intensity and the claim-arrival intensity, modelled by the parameter , which means that many losses (a high claim arrival intensity) lead to a less favourable economic outlook for the reinsurance company. Second, there is pricing contagion: at the default time the risk-neutral claim arrival intensity jumps upward in (24), which corresponds to an increase of the market value of the reinsurance contract as the claim arrival intensity becomes larger. This models the phenomenon that the default of R will reduce the supply of reinsurance, so that the market price of reinsurance goes up. (This is a pure pricing phenomenon; in particular, we do not assume that the claim arrival intensity under the real-world measure jumps upward at .)
In [28] it is shown that the value of the CVA is equal to
Moreover, the following PIDE characterization of the CVA is given: It holds that
where is the unique classical solution of the PIDE
for all with terminal condition . The operator is given by
where f is always evaluated at .
We approximate on the set with a deep neural network . We use and ; the other parameters are as in Section 4.1. Paths of the processes L, , are simulated with the Euler-Maruyama scheme on the time grid with and . For each path, we approximate the integral term in (24) with the trapezoidal rule using the endpoints of the time grid . The contract value is replaced by its neural network approximation on the set , which we compute as described in Section 4.1 for all .
In this case study, we are particularly interested in the impact of the correlation parameter and of the pricing contagion parameter on the CVA. We therefore construct a neural network with additional input variables and . During the training procedure we use different values of to simulate the paths of and different values of to compute the integral term in (24). In the sequel we write to stress the dependence of f on and . Figure 3 shows for varying default-intensity ℓ and Figure 4 for varying in the left panel and varying in the right panel. Moreover, all figures compare the solution for different values of and . We see that increases in both and , but the effect of price contagion (i.e., variation in ) is quite pronounced and dominates the effect of dependence between intensities (i.e., variation in ).
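The construction of a network with additional parameter inputs can be sketched as follows. The names `rho` and `zeta` are generic placeholders for the correlation and contagion parameters, and the state dimension is illustrative; during training, each simulated path uses its own draw of the parameter pair, matching the procedure described above.

```python
import tensorflow as tf

# Parametric CVA network: the model parameters are appended to the state
# as extra inputs, so a single network covers a range of parameter values.
state = tf.keras.Input(shape=(3,))        # placeholder state dimension
params = tf.keras.Input(shape=(2,))       # (rho, zeta), placeholder names
h = tf.keras.layers.Concatenate()([state, params])
h = tf.keras.layers.Dense(20, activation='softplus')(h)
h = tf.keras.layers.Dense(20, activation='softplus')(h)
cva = tf.keras.layers.Dense(1)(h)         # identity output layer
model = tf.keras.Model(inputs=[state, params], outputs=cva)
```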
Figure 3.
Solution for , , varying and different levels of and with computed with the DNN-algorithm. We have (left) resp. (right).
Figure 4.
Solution for , , varying and different levels of (left) and varying for different levels of (right) computed with the DNN-algorithm.
This part of the case study highlights the computational advantages of the DNN approach compared with nested Monte Carlo methods. Figure 5 shows for varying claim-arrival intensity together with reference solution points computed with classical Monte Carlo simulations. In this nested Monte Carlo problem, we use 3000 simulations to approximate the inner function u and 2000 path simulations for the outer expectation. The integral term is approximated with the trapezoidal rule on the time grid . Computation of a single point takes more than 5 min (despite using only 3000 resp. 2000 simulations). On the other hand, training the network approximations of for all takes around 20 min, and training the network approximation of takes less than 4 min; that is, it takes only around 24 min to train the network for the whole space domain B and for all values of . This should be contrasted with a computation time of more than 60 min to compute the CVA for only 12 space points (numerical details on the training procedure are given in Remark 2). Due to the computational infeasibility of nested Monte Carlo solutions, we omit an error analysis analogous to (19).
Figure 5.
Solution for , varying , different levels of resp. and resp. (left resp. right) computed with the DNN-algorithm. The reference points are computed with Monte Carlo (left, grey dots).
In [28] we moreover addressed the hedging of RCCR by dynamic trading in a credit default swap (CDS) on the reinsurance company, using a quadratic hedging approach. The computation of the mean-variance minimizing hedging strategies (Theorem 4.3 of [28]) involves computing various derivatives of the functions f and g, where g is the present value of the future payments of the CDS. The DNN algorithm is particularly useful for this: we can approximate f and g for with neural networks on the whole space domain and compute their derivatives via automatic differentiation. In [28] the derivatives of f and g were computed via classical Monte Carlo, which is fairly unstable and time-consuming.
Remark 2
(Details regarding the numerical implementation). Our choice of the network architecture largely follows [15]. Throughout, we use DNNs with 2 hidden layers and 20 nodes per hidden layer. For a payoff of the form (16), it is advantageous to use softplus as activation function for the hidden layers and the identity for the output layer. The networks are initialized with random numbers using the Xavier initialization. We use mini-batch optimization with mini-batch size , batch normalization and 5000 epochs for training. We minimize the loss function with the Adam optimizer and decrease the learning rate from 0.1 to 0.01 and 0.001 after 1000 and 3000 epochs. The neural networks used are standard; hence, training induces a cost that is linear in M and e, where M is the batch size and e the number of epochs, see also the runtime column of Table 2. All numerical experiments presented in this paper are performed in Python using Tensorflow on a Lenovo Thinkpad notebook with an Intel Core i5 processor (1.7 GHz) and 16 GB memory. Note that one could gain a performance boost by using GPU computing, e.g., with Cuda from Nvidia, but to emphasize the practicality of the algorithm we run our computations on a standard CPU.
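A sketch of this configuration in TensorFlow/Keras; the epoch-to-step conversion via `steps_per_epoch` and the input dimension `d_in` are placeholders.

```python
import tensorflow as tf

steps_per_epoch = 10   # placeholder: number of mini-batches per epoch

def build_network(d_in):
    """Two hidden layers of 20 softplus nodes, identity output, Xavier init."""
    init = tf.keras.initializers.GlorotUniform()   # Xavier initialization
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(d_in,)),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(20, activation='softplus', kernel_initializer=init),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(20, activation='softplus', kernel_initializer=init),
        tf.keras.layers.Dense(1, kernel_initializer=init),  # identity output layer
    ])

# learning rate 0.1 -> 0.01 -> 0.001, dropping after 1000 and 3000 epochs
lr = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[1000 * steps_per_epoch, 3000 * steps_per_epoch],
    values=[0.1, 0.01, 0.001])
optimizer = tf.keras.optimizers.Adam(learning_rate=lr)
```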
Remark 3
(Further examples for the linear case). Further examples for the linear case can be found in our companion papers [22,29]. In [22] we consider an example from [30], who study the pricing of basket options in jump diffusion models. In [29] we compute the ruin probability for an insurance company with three different business lines, which leads to a boundary value problem. In both cases the performance of the deep learning method is assessed by comparing the solution to the results of an extensive Monte Carlo simulation. The DNN algorithm performed well in all cases considered.
5. Deep Learning Approximation for Semilinear PIDEs
Next we consider semilinear PIDEs of the form
Here, is a nonlinear function and the integro-differential operator is given in (3). Our goal is to extend the deep splitting method developed by [14] for semilinear PDEs without boundary conditions to the more general PIDE (28). Moreover, we propose an approach for improving the performance of the method. In order to derive the method with a minimal amount of technicalities, we assume that a classical solution of this PIDE exists. Note, however, that the deep splitting approximation applies also in cases where the PIDE (28) admits only a continuous viscosity solution. In that case one has to consider the forward backward stochastic differential equation associated with (28), see [22] or [17]. The very recent contribution [21] considers an extension of the deep splitting method to PIDEs of the form (28) with Neumann boundary conditions, but assumes that f is independent of the gradient of u.
5.1. Basic Method
We divide the interval into subintervals using a time grid and we let . Suppose that is the unique solution of the PIDE (28). Define for a fixed the function
Then also solves the linear PIDE
Applying the Feynman-Kac formula to the linear PIDE (29), we get the following representation for
Here, solves the SDE (1) with initial condition , that is,
and the stopping time is given by . In the deep splitting method proposed in [14] the integral term in (30) is approximated by the endpoint rule
An approximate solution is then found by backward induction over the grid points. The solution at maturity is defined using the terminal condition. Given the approximate solution at a grid point, we then compute the solution at the preceding grid point as follows. First, we put
Then, we compute by applying the regression-based DNN algorithm from Section 3. In this step, it is important to work with DNNs with a smooth activation function so that the gradient is well-defined (if the terminal condition is not smooth, one has to define the approximation at maturity as a DNN approximation of the terminal condition; see for instance [17]).
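Schematically, the backward induction with the endpoint rule (32) can be written as follows. This is a sketch: `simulate_step`, `fit_regression`, the time grid `t_grid` and the linearized driver `f` are placeholders, `terminal_network` is a smooth network fit of the terminal condition, and the non-local term entering `f` would likewise be estimated from the network at the later grid point.

```python
import tensorflow as tf

networks = [None] * (N + 1)
networks[N] = terminal_network            # smooth DNN fit of the terminal condition

for n in reversed(range(N)):
    u_next = networks[n + 1]              # trained network at t_{n+1}
    dt = t_grid[n + 1] - t_grid[n]
    xi, x_next = simulate_step(n)         # draws of start points and of X_{t_{n+1}}
    with tf.GradientTape() as tape:
        tape.watch(x_next)
        u_val = u_next(x_next)
    grad_u = tape.gradient(u_val, x_next)  # gradient via automatic differentiation
    # endpoint rule (32): freeze the nonlinearity at the right endpoint t_{n+1}
    y = u_val + dt * f(t_grid[n + 1], x_next, u_val, grad_u)
    networks[n] = fit_regression(xi, y)    # regression-based DNN step of Section 3
```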
Remark 4.
An additional complication may arise if the domain D is unbounded. In that case, we compute the approximate solution on some bounded set . We have to make sure that the set is big enough so that the probability of generating paths with but is sufficiently small. Then, these paths can be ignored without affecting the training procedure. There are various ways to achieve this, see [14] for details.
5.2. Alternative Method
Next, we discuss an alternative linearization procedure based on the midpoint rule
Here, is the midpoint of the interval . It is well known that (34) usually provides a better approximation to an integral than the endpoint rule (32). We therefore propose the following approximation scheme based on the midpoint rule. In a first step, we apply the approximation (33) over the smaller interval ; this yields an approximate solution at . Next, we define
As before, a numerical approximation to is computed via the deep learning algorithm from Section 3. Our numerical experiments in Section 6 show that the approximation based on the midpoint rule performs significantly better than the original deep splitting method from [14].
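In code, the midpoint variant adds one auxiliary regression per backward step. Again a sketch with placeholder helpers, extending the endpoint-rule sketch of Section 5.1; `endpoint_target` forms the targets (33) on the half-interval.

```python
for n in reversed(range(N)):
    u_next = networks[n + 1]
    dt = t_grid[n + 1] - t_grid[n]
    m_n = t_grid[n] + 0.5 * dt                        # midpoint of [t_n, t_{n+1}]
    # step 1: endpoint rule on [m_n, t_{n+1}] yields an auxiliary network at m_n
    xi_mid, x_next = simulate_step_from(m_n, t_grid[n + 1])
    u_mid = fit_regression(xi_mid, endpoint_target(0.5 * dt, x_next, u_next))
    # step 2: midpoint rule (34) over the full interval, nonlinearity evaluated at m_n
    xi, x_mid, x_end = simulate_step_full(t_grid[n], m_n, t_grid[n + 1])
    y = u_next(x_end) + dt * f_hat(m_n, x_mid, u_mid)
    networks[n] = fit_regression(xi, y)
```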
Remark 5
(Convergence results for the semilinear case). The companion paper [22] provides a convergence result for semilinear Cauchy problems of the form (28) as well. The approximation error is divided into three components: the Euler approximation error from the time discretization (which is of order ), an error due to the linearization of the equation and, finally, an approximation error within the class of neural networks considered. A precise formulation is more involved than in the linear case, and we refer to [22] for details.
6. Case Study for the Semilinear Case: Optimal Credit Portfolios with Risk Capital Constraints
As a case study for the semilinear case, we consider the problem of optimizing the size of the credit portfolio of a financial institution under transaction costs and risk capital constraints. Without capital constraints, the problem admits an analytic solution and is therefore a useful test case. With capital constraints, the model leads to a semilinear boundary value problem for which there is no explicit solution, and we study this case numerically using the deep splitting method. In fact, the analysis of the impact of risk capital constraints on optimal credit portfolios is also interesting in its own right.
We consider a stylized model of a financial institution who invests in a credit portfolio with market value at time t and in cash. We assume that the process S has dynamics
Here, the Brownian motion W models the compensated small credit losses, the compound Poisson process R with intensity and jump size distribution models the large credit losses, and is the expected return earned by the portfolio. We assume that the credit portfolio can be adjusted only gradually over time. In mathematical terms, this means that the financial institution uses an absolutely continuous trading strategy with trading rate , so that for a given strategy , the size of the portfolio satisfies . Moreover, the financial institution incurs a transaction cost that is proportional to the trading rate . This models the fact that a rapid adjustment of the portfolio is expensive, as the institution needs to carry out costly marketing activities or enter into expensive securitization activities. Given a parameter modelling the size of the transaction cost, the cash position has dynamics
Here, the term gives the revenue from trading the credit portfolio, and the term models the reduction of the cash position due to transaction costs. From a mathematical viewpoint, this model is fairly similar to models used in the literature on optimal portfolio execution, such as [31] or [32].
Denote by the institution’s equity. We assume that the institution wants to maximize the expected value of her equity position at some horizon date time T, and that she incurs a liquidation cost of size for some parameter . Note that for a given strategy the process has dynamics
Hence, the pair is Markov and we may use this process as state process. We define the value function of the optimization problem for as
where denotes the set of all adapted trading rates such that .
To prevent early ruin, financial regulators usually impose risk capital constraints. In particular, they may liquidate a financial institution if the equity is too low compared to the size or riskiness of the institution’s credit portfolio. Below, we use the deep splitting method to study how capital constraints affect the value function and the optimal strategy of the financial institution. First, we consider the unconstrained problem, for which there is an explicit solution of the HJB equation (the PIDE (40) below) that can be used to test the performance of the deep splitting method. Details on the numerical implementation and the computational equipment are given in Remark 6.
6.1. The Case without Constraints
By standard arguments, the HJB equation associated with the problem (37) is
with terminal condition . Maximization gives the candidate optimal trading rate
substitution into (38) yields the semilinear PIDE
To solve the case without constraints, we make the Ansatz , that is, we assume that u is linear in the equity value e. This implies
If we define , (40) reduces to the following first order PDE for v
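For illustration, one form of the Ansatz that is consistent with linearity in e (an assumption on our part; the precise expression follows [31]) is

$$u(t,s,e) \;=\; e + v(t,s), \qquad \text{so that} \quad \partial_e u \equiv 1, \quad \partial_{ee} u \equiv 0,$$

under which the e-dependence drops out of (40) and the stated first-order equation for v remains.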
with terminal condition . To find an explicit solution for v, we follow [31] and make the Ansatz . Substitution into the HJB Equation (41) gives the following ODE system for
with terminal condition and . The ODEs can be solved explicitly: one has
For comparison purposes we computed the solution of (40) with the deep splitting algorithm on the set A:=. For this we partition the time horizon into intervals and simulate the auxiliary process with and
where is uniformly distributed on , W is a Brownian motion and J is a jump measure with compensating measure for a Gamma distribution with parameters and . Moreover, we identify the nonlinear term in the PIDE as
We used the midpoint rule (34) to linearize the PIDE. The network architecture was as in Remark 6. The training of the whole network consisting of 20 subnetworks (a DNN for every grid point and a DNN for every midpoint ) took around 4400 s. Table 3 contains the parameters used in the experiment. Figure 6 shows the approximate solution obtained by the deep splitting algorithm (solid line) and the analytic reference solution (dashed line) for varying in the left panel and varying in the right panel.
Table 3.
Parameters for the credit optimization problem.
Figure 6.
Solution of the unconstrained credit optimization problem for , (left) and , (right) with the DNN-algorithm (green) and reference solution (orange).
As the solution u can be zero, we report the absolute $L^2$-error between the DNN approximation and the analytic solution. To compute the absolute error, we used 10,000 simulations of where and . Figure 7 illustrates the mean of 10 different DNN approximations for each time point , using once the midpoint rule (solid line) and once the original method from [14] (dashed line). We see that the algorithm based on the midpoint rule performs substantially better than the original deep splitting method.
Figure 7.
Comparison of the two training methods, illustrating the averaged absolute $L^2$-error using the endpoint (grey) and midpoint (black) approximation in 10 training procedures.
6.2. The Optimization Problem with Constraints
Next, we discuss the optimization problem under risk capital constraints. We expect that risk capital constraints will alter the investment decisions of the financial institution, since she wants to avoid liquidation of the company. To set up the corresponding optimization problem, we need to specify the set D of acceptable positions of the financial institution and the liquidation value given that her position exits from D. For numerical reasons, it is convenient to model D as a bounded set. Given two large constants and a parameter , we define the set D of acceptable positions as
The constraint implies that before liquidation the solvency ratio of the institution is bounded by , so that can be viewed as a measure of the regulator’s risk tolerance. We assume that the liquidation value of the portfolio on the boundary of D is given by the value of the unconstrained problem, reduced by a penalty , and we set
Here, is a smooth function that penalizes exit from the set of acceptable positions. We assume that (no penalization for high equity values) and that (no penalization at maturity). In our numerical experiments, we take , , and ; the other parameters are identical to the unconstrained case, see Table 3.
The value function for the problem with risk capital constraints is
where . The PIDE associated with this control problem and the form of the optimal strategy are the same as in the unregulated case (see (38) and (39)), but now we have in addition the boundary condition
Due to the boundary condition, the Ansatz does not hold, and we have to compute via the deep splitting algorithm. In view of its superior performance in the unconstrained case, we worked with the midpoint procedure, and we used the same network architecture as in the unconstrained case. We worked on the set . Figure 8 illustrates the DNN approximation for (solid line) and (dashed line) on the sections (left panel) and (right panel). The right plot shows that with risk capital constraints the value function is concave in e and significantly lower than its unconstrained counterpart (for the point belongs to the lower boundary of D).
Figure 8.
DNN-approximations of (green) and (orange) on the section (left) and on (right).
The optimal trading rate is plotted in Figure 9. The plots show that the optimal strategy for the regulated case (solid line) differs from the optimal strategy in the unregulated case (dashed line): the financial institution wants to reduce the size of her risky position in order to avoid costly liquidation, in particular when her position is close to the liquidation boundary. For instance, in the right plot the optimal trading rate for is negative for e close to 2.
Figure 9.
DNN-approximations of the optimal strategy for the regulated case (green) and the unregulated case (orange) on (left) and on (right).
Remark 6
(Details on the numerical implementation). We used a similar network architecture as in [14]. We worked with deep neural networks with 2 hidden layers consisting of 100 nodes each. All neural networks are initialized with random numbers using the Xavier initialization. We use mini-batch optimization with mini-batch size , batch normalization and 12,000 training epochs. The loss function is minimized with the Adam optimizer, with the learning rate decreasing from 0.1 to 0.01, 0.001, 0.0001 and 0.00001 after 2000, 4000, 6000 and 8000 epochs. We use sigmoid as activation function for the hidden layers and the identity for the output layer. The neural networks used are standard, and, hence, training induces a cost that is linear in M, e and N, where M is the batch size, e the number of epochs and N the size of the time discretization. All computations are performed in Python using Tensorflow on a Lenovo Thinkpad notebook with an Intel Core i5 processor (1.7 GHz) and 16 GB memory.
Remark 7
(Further examples for the semilinear case). For an additional application of our DNN algorithm to a semilinear PIDE, we refer to the companion paper [22], where the semilinear stochastic linear regulator problem is studied. It is well known that this problem admits an analytical solution (see, e.g., [33]), which can be used to verify the accuracy of the numerical method. However, there are not many (non-trivial) semilinear problems of type (7) with an explicit solution, and the majority of standard numerical methods are quite involved, so that it is not straightforward to come up with further test cases.
7. Conclusions
In this paper we studied an extension of the regression-based deep neural network algorithms of [14,15,17] to PIDE problems from insurance and finance. We illustrated the performance of our methodology by carrying out extensive tests for several multi-dimensional PIDEs arising in typical financial or actuarial pricing and control problems. In all test cases, the performance of the proposed approximation algorithms was satisfactory in terms of accuracy and speed. Our results suggest that deep neural network algorithms might become a useful enhancement of the toolbox for solving many valuation and control problems in insurance and finance.
We conclude with some ideas for further research. A possible alternative to the methodology proposed in this paper is to extend some of the deep backward dynamic programming schemes discussed in [17] to the PIDE case, similarly as in [18]. However, instead of approximating the increment by a third (large) network, one might use the solution from the previous backward step, as this is less involved numerically. Moreover, it might be interesting to extend the method to the more general class of fully nonlinear PIDEs.
Author Contributions
Conceptualization and Writing, R.F. and V.K.; Numerical analysis and Coding, V.K. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Acknowledgments
We are grateful to Michaela Szoelgyenyi and Christa Cuchiero for useful remarks and suggestions.
Conflicts of Interest
The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| PIDE | Partial integro differential equation |
| PDE | Partial differential equation |
| BSDE | Backward stochastic differential equation |
| FBSDEJ | Forward backward stochastic differential equation with jumps |
| DNN | Deep neural network |
| MC | Monte Carlo |
| RCCR | Reinsurance counterparty credit risk |
| CVA | Credit value adjustment |
References
- Cont, R.; Voltchkova, E. A finite difference scheme for option pricing in jump diffusion and exponential Lévy models. SIAM J. Numer. Anal. 2005, 43, 1596–1626.
- Andersen, L.; Andreasen, J. Jump-diffusion processes: Volatility smile fitting and numerical methods for option pricing. Rev. Deriv. Res. 2000, 4, 231–262.
- Matache, A.M.; Von Petersdorff, T.; Schwab, C. Fast deterministic pricing of options on Lévy driven assets. ESAIM Math. Model. Numer. Anal. 2004, 38, 37–71.
- Kwon, Y.; Lee, Y. A second-order finite difference method for option pricing under jump-diffusion models. SIAM J. Numer. Anal. 2011, 49, 2598–2617.
- Briani, M.; Natalini, R.; Russo, G. Implicit–explicit numerical schemes for jump–diffusion processes. Calcolo 2007, 44, 33–57.
- Giles, M.B. Multilevel Monte Carlo path simulation. Oper. Res. 2008, 56, 607–617.
- Glasserman, P. Monte Carlo Methods in Financial Engineering; Springer: New York, NY, USA, 2003.
- Metwally, S.A.; Atiya, A.F. Using Brownian bridge for fast simulation of jump-diffusion processes and barrier options. J. Deriv. 2002, 10, 43–54.
- Han, J.; Jentzen, A.; Weinan, E. Solving high-dimensional partial differential equations using deep learning. Proc. Natl. Acad. Sci. USA 2018, 115, 8505–8510.
- Han, J.; Jentzen, A. Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Commun. Math. Stat. 2017, 5, 349–380.
- Han, J.; Long, J. Convergence of the deep BSDE method for coupled FBSDEs. Probab. Uncertain. Quant. Risk 2020, 5, 5.
- Kremsner, S.; Steinicke, A.; Szölgyenyi, M. A deep neural network algorithm for semilinear elliptic PDEs with applications in insurance mathematics. Risks 2020, 8, 136.
- Huré, C.; Pham, H.; Warin, X. Deep backward schemes for high-dimensional nonlinear PDEs. Math. Comput. 2020, 89, 1547–1579.
- Beck, C.; Becker, S.; Cheridito, P.; Jentzen, A.; Neufeld, A. Deep splitting method for parabolic PDEs. SIAM J. Sci. Comput. 2021, 43, A3135–A3154.
- Beck, C.; Becker, S.; Grohs, P.; Jaafari, N.; Jentzen, A. Solving the Kolmogorov PDE by means of deep learning. J. Sci. Comput. 2021, 88, 73.
- Pham, H.; Warin, X.; Germain, M. Neural networks-based backward scheme for fully nonlinear PDEs. SN Partial Differ. Equ. Appl. 2021, 2, 16.
- Germain, M.; Pham, H.; Warin, X. Approximation error analysis of some deep backward schemes for nonlinear PDEs. SIAM J. Sci. Comput. 2022, 44, A28–A56.
- Castro, J. Deep learning schemes for parabolic nonlocal integro-differential equations. arXiv 2021, arXiv:2103.15008.
- Al-Aradi, A.; Correia, A.; Naiff, D.d.F.; Jardim, G.; Saporito, Y. Applications of the deep Galerkin method to solving partial integro-differential and Hamilton-Jacobi-Bellman equations. arXiv 2019, arXiv:1912.01455.
- Sirignano, J.; Spiliopoulos, K. DGM: A deep learning algorithm for solving partial differential equations. J. Comput. Phys. 2018, 375, 1339–1364.
- Boussange, V.; Becker, S.; Jentzen, A.; Kuckuck, B.; Pellissier, L. Deep learning approximations for non-local nonlinear PDEs with Neumann boundary conditions. arXiv 2022, arXiv:2205.03672.
- Frey, R.; Köck, V. Convergence analysis of the deep splitting scheme: The case of partial integro-differential equations and the associated FBSDEs with jumps. arXiv 2022, arXiv:2206.01597.
- Gihman, I.; Skohorod, A. The Theory of Stochastic Processes; Springer: New York, NY, USA, 1980; Volume III.
- Kliemann, W.; Koch, G.; Marchetti, F. On the unnormalized solution of the filtering problem with counting process observations. IEEE Trans. Inf. Theory 1990, 36, 1415–1425.
- Ethier, S.; Kurtz, T.G. Markov Processes: Characterization and Convergence; Wiley: New York, NY, USA, 1986.
- Pham, H. Optimal stopping of controlled jump diffusion processes: A viscosity solution approach. J. Math. Syst. Estim. Control 1998, 8, 1–27.
- Colaneri, K.; Frey, R. Classical solutions of the backward PIDE for Markov modulated marked point processes and applications to CAT bonds. Insur. Math. Econ. 2021, 101 Pt B, 498–507.
- Ceci, C.; Colaneri, K.; Frey, R.; Köck, V. Value adjustments and dynamic hedging of reinsurance counterparty risk. SIAM J. Financ. Math. 2020, 11, 788–814.
- Frey, R.; Köck, V. Deep neural network algorithms for parabolic PIDEs and applications in insurance mathematics. arXiv 2021, arXiv:2109.11403.
- Xu, G.; Zheng, H. Approximate basket options valuation for a jump-diffusion model. Insur. Math. Econ. 2009, 45, 188–194.
- Cardaliaguet, P.; LeHalle, C.A. Mean field game of controls and an application to trade crowding. Math. Financ. Econ. 2018, 12, 335–363.
- Cartea, Á.; Jaimungal, S.; Penalva, J. Algorithmic and High-Frequency Trading; Cambridge University Press: Cambridge, UK, 2015.
- Øksendal, B.K.; Sulem, A. Applied Stochastic Control of Jump Diffusions; Springer: Berlin/Heidelberg, Germany, 2007; Volume 498.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).