Phase-Type Models in Life Insurance : Fitting and Valuation of Equity-Linked Benefits

Phase-type (PH) distributions are defined as distributions of lifetimes of finite continuous-time Markov processes. Their traditional applications are in queueing, insurance risk, and reliability, but more recently, also in finance and, though to a lesser extent, to life and health insurance. The advantage is that PH distributions form a dense class and that problems having explicit solutions for exponential distributions typically become computationally tractable under PH assumptions. In the first part of this paper, fitting of PH distributions to human lifetimes is considered. The class of generalized Coxian distributions is given special attention. In part, some new software is developed. In the second part, pricing of life insurance products such as guaranteed minimum death benefit and high-water benefit is treated for the case where the lifetime distribution is approximated by a PH distribution and the underlying asset price process is described by a jump diffusion with PH jumps. The expressions are typically explicit in terms of matrix-exponentials involving two matrices closely related to the Wiener-Hopf factorization, for which recently, a Lévy process version has been developed for a PH horizon. The computational power of the method of the approach is illustrated via a number of numerical examples.


Introduction
A random variable (r.v.) τ is phase-type (PH) with representation (α, T) if it is distributed as the lifetime inf{t ≥ 0 : J t = †} of a killed (terminating) time-homogenous Markov process J = {J t } t≥0 with p < ∞ states, initial distribution α (a row vector), and generator T, where † is an additional absorbing state.PH distributions originate from queueing theory and the work of A.K. Erlang, A. Jensen and M.F.Neuts.However, in recent decades, they have found numerous applications to different areas.The main reason for this is that many calculations that are explicit for exponential distributions are often computationally tractable with PH assumptions and that, in addition, PH distributions are dense so that a given distribution F on [0, ∞) can be approximated arbitrarily well by a PH distribution provided one takes p large enough.For surveys of the theory and applications, see (Asmussen 2003;Asmussen and Albrecher 2010;Bladt and Nielsen 2017).The present paper is concerned with some applications to life insurance, an area where PH distributions have been far less employed than in the related areas of non-life insurance.There are many papers in ruin theory that assume that claim sizes are PH distributed; see (Cai and Li 2005b) and the review paper (Bladt 2005).Some papers assume that the claim time is PH distributed; see (Stanford et al. 2000).(Zadeh et al. 2014) considered disability insurance using a PH model.(Zadeh and Stanford 2016) studied credibility theory using PH distributions.In finance, PH distributions have also been used extensively.For example, (Asmussen et al. 2004) considered option pricing under an exponential phase-type Lévy model; (Cai and Li 2005a) dealt with applications of PH distributions to risk measures; and (Egami and Yamazaki 2014) considered PH approximations of Lévy models.On the contrary, the only references of applications of PH distributions to life insurance we know of are (Lin and Liu 2007) and (Zadeh et al. 2014).
The motivation for the present study is some valuation problems of equity-linked products in life insurance considered in Gerber et al. (2012Gerber et al. ( , 2013Gerber et al. ( , 2015) ) (GSY).Let τ = τ x be the remaining lifetime of an insured of age x when signing the contract, then the payment of this class of benefits has the form Ψ(S τ , S τ ) where S t is the price at time t of some stock or stock index and S t = max v≤t S t the running maximum.An example of such a benefit is the guaranteed minimum death benefit (GMDB) max(S τ , K), but there are many others; see the list in Section 4. Defining X t as X t = log(S t /S 0 ), M t = X t as the running maximum of X, and D t = M t − X t as the displacement from the maximum, we then have: where Φ(m, d) = Ψ(S 0 e m , S 0 e m−d) ) (1) (note that X t = M t − D t ), and the fair value of the benefit is the expected discounted payment.
A widely-used model (Cont and Tankov 2004) assumes that X = {X t } t≥0 is a Lévy process, and the key vehicle in GSY for the computations for such expectations is the classical Wiener-Hopf factorization: Lemma 1.For a Lévy process X t and an independent exponential time τ, the r.v.'s M τ = max 0≤s≤τ X s and D τ = M τ − X τ are independent.Further, For the geometric Brownian motion model where X is a Brownian motion (BM) with drift, M τ and D τ both have an exponential distribution, and for more general jump diffusions, a number of computational procedures have been developed (see Section 5 for references).In GSY, they approximate the density/tail of τ x by a linear combination of exponential terms, which allows them to reduce the computations to simple integrations using Lemma 1.
The approximation of the distribution of τ x by a linear combination of exponential terms in GSY uses an ad hoc optimization procedure.The main problem there is that it is very difficult to make the linear combination of exponential densities to be a density function; in particular, very often, in the tail part, the linear combination of exponential densities can be negative.Since, in the application to valuing equity-linked insurance products, the distribution at the tail part plays an important role, this motivated us to consider using the PH distribution to approximate the future lifetime distribution.For PH distributions, more classical statistical methods using maximum likelihood have, however, been developed (Asmussen et al. 1996), and in view of the denseness of PH distributions and the fact that they have proven a computationally-tractable generalization of the exponential distribution in other settings, a natural idea is therefore to use a PH approximation τ to τ x .The first step in implementing this for pricing equity-linked benefits is to develop a version of the Wiener-Hopf factorization for the case where τ is PH and independent of X.This has been done in a companion paper (Asmussen and Ivanovs 2019), and the contribution of the present paper is to provide the further steps such as PH fitting of human mortality and computations of prices of equity-linked benefits for BM and jump diffusions.

Preliminaries
The vector of killing rates of a PH(α, T) r.v.τ is given by t = −T1, where 1 is a column vector with all entries equal to one.The density is αe Tz t, and the tail P(τ > z) is αe Tz 1.Here, the exponential e R of a matrix is defined as ∑ ∞ 0 R n /n!.We shall often use formulas like: that are immediate analogues of standard formulas for univariate exponentials.For example, taking R = −sI + T gives the Laplace transform E[e −sτ ] as α(sI − T) −1 t.Of course, the existence of R −1 has to be verified, which for a PH generator R = T is a standard fact.
The subclass of generalized Coxian (GC) distributions will play a particular role.Here, the states are run through in lexicographical order, with exponential (λ i ) holding time in state i, and it is possible to enter or exit from any state (see Figure 1), where q i is the probability of exit after a visit to i and r i = 1 − q i the probability of going to the next state.The structure of Coxian distributions is similar, except that only State 1 can be entered so that α 1 = 1.Coxian distributions are in particular used in (Lin and Liu 2007) and (Zadeh et al. 2014).
Remark 1.Though computational tractability and denseness are the main reasons for the popularity of PH distributions, other advantages have also sometimes been advocated.For example, whereas in the majority of examples, the PH modeling is purely descriptive, there are others where attempts have been made to give the Markov states a physical meaning like in compartment models and, in the life insurance context, in (Lin and Liu 2007) and (Zadeh et al. 2014).This makes the number p of phases quite large there, of order 200, whereas we have taken the descriptive point of view.The fit obviously becomes better when increasing p, but how good it needs to be depends very much on what it will be used for.In this paper, we consider the valuation problem, that is to calculate the net premium (expected discounted payoff) for equity-linked insurance products, and our numerical examples will show that taking p = 50 will be sufficient for the precision one needs in this context.

PH Fits of Human Mortality Data
The distribution of human lifetimes of course varies somewhat according to the period where the data were collected, geographical and social conditions, etc., but the basic shapes are much alike.We have therefore concentrated on a single example, the illustrative life table in Appendix 2A of (Bowers et al. 1997).The form of the distribution, as illustrated via the number of deaths in consecutive years, is given in Figure 2. We use an EM algorithm to fit various types of phase-type distributions to this life table.This idea was originally developed in (Asmussen et al. 1996), with the case of censored data and other implementation details given in (Olsson 1996).Olsson (Olsson 1998) has written a C implementation of this algorithm called EMpht, which is available online.The overwhelming computational burden is in the E step, where many complicated integrals involving matrix exponentials must be evaluated.This is done approximately by converting the problem into a system of ODEs, which are approximately solved via a Runge-Kutta method.While the original C implementation is still very adequate for creating fits with a small number of phases p, it can be improved upon for larger p.
We wrote a new implementation using the Julia language.It allows for the E step to be evaluated in three different ways: (i) using an ODE solver (as in (Asmussen et al. 1996;Olsson 1996)), (ii) using quadrature methods, and (iii) using the uniformization method of Okamura et al.Okamura et al. (2011aOkamura et al. ( , 2011b)).We used this last method to generate the fits found later in this paper; the code we used to create the figures below will be made available online (Asmussen et al. 2019).Each fit was given one hour execution time restricted to one thread on the processor Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20 GHz.The EM algorithm could have been terminated much earlier without a major difference to the quality of the fits, but as we were comparing the various values of p, we wanted to ensure no individual fit was judged poorly for simply being stuck in a local maxima.
The PH representation in terms of α, T is heavily overparameterized.There is an abundance of sets α, T that leads to the same distribution.Consider, e.g., the PH with α = 1 p 1 (where stands for transposition) and where T is a diagonal matrix with (−1, −2, . . ., −p) on the diagonal (i.e., a hyperexponential distribution).If the diagonal elements in T are permuted in any of the p! possibilities, then the distribution is unchanged.This example indicates that the maximum * of the likelihood function is attained at several parameter values so that the choice of the initial parameter set will determine which one is found by the EM algorithm.The presence of local maxima with values < * cannot be excluded, but we are not aware of serious cases where the EM algorithm has been stuck in one of these.
In life insurance calculations, the relevant lifetime is not the overall one τ 0 , but rather the remaining lifetime τ = τ x of the insured at the time when he/she signs the contract.That is, if the age of the insured at that time is x, the relevant distribution is that of τ 0 − x given τ 0 ≥ x.As a typical example, we took x = 35 and fitted PH distributions with various structure and various numbers p of phases.Note that while we fit the distribution of τ 0 − x, which starts at zero, in the following, we plot τ 0 given τ 0 ≥ x (i.e., the x axes begin at x = 35 and not zero).
As a first illustration, we fit both a general structure PH and a generalized Coxian (GC) one with p = 20 phases.The results are in Figure 3, with the densities of the fitted PH distributions in the left panel together with the data and the corresponding hazard rates in the right panel.The fits of the two structures are almost identical (but the shape of the density is somewhat off the data, which indicates that a larger valuer of p could be appropriate).The general structure leads to a system of ODEs of dimension q = p 2 + p, whereas the corresponding number for GC is only q = 2p − 1. Due to this curse of dimensionality and the marginal differences of the fits, we have only considered GC structures for higher values of p.
Figure 4 shows GC fits for p = 35 and p = 40.As expected, p = 40 improves upon p = 35.In Figure 5, we complement Figure 4 by plots of the tails.Such plots are frequently used to illustrate the quality of fits, but in our minds, a good agreement is not an argument in itself: two tails may come out quite similar in a comparison even if the shapes of the densities are rather different.Finally, Figure 6 gives GC fits with p = 50, 75, 100.Which value of p should one then work with?There exists hardly a single criterion for answering such questions, i.e., assessing the quality of the fits to the data.Some concerns may be aesthetic, such as avoiding negative values of the density (cf.Appendix A below), support outside the obvious range of the data (cf.kernel estimates), or an excessive number of parameters (as is the case for most PH fits as compared to, e.g., the classical Gompertz-Makeham fit of life table data).Other concerns may have to do with the features of the distribution which are considered especially important, be it the overall shape of the density (caught well by maximum likelihood), the cumulative distribution function (c.d.f.), although c.d.f.'s may look close even for rather different distributions, the tail (where maximum likelihood fails and other tools such as the Hill estimator may be more appropriate), the hazard rate (as showing up in numerous life insurance contexts), or the behavior in a more narrow range (cf. a life insurance contract signed at the start of employment and running up to the time of retirement).For the present purposes and related ones like applications in queueing and insurance risk, the main point is, however, that PH fitting is not done for its own sake, but for the purpose of facilitating some calculation, here of the prices of equity-linked products, in queueing, say of waiting time characteristics, etc.In the rest of the paper, we therefore study the performance of the PH approach and the various fits and return to a discussion of the issue in the concluding Section 8.

Valuation of Benefits
The main application of the results in this paper is valuing equity-linked insurance products, in particular GMDBs in various deferred annuities.The time of payment of this kind of product is at the time τ of death of the policyholder, often denoted τ x , where x is his/her age at t = 0 when he/she signs the contract.The amount of the payment depends on the price S τ of a stock (or stock index) at that time and possibly also on the history of the stock price as, e.g., in (1).There are many different products (benefit functions) in the market.One example is the GMDB with benefit function being: where K is the guaranteed amount, e.g., K = 0.9S 0 .The second term on the right-hand side (r.h.s.) of ( 2) is the payoff of a life-contingent put option.Another popular product is the so-called high-water benefit (HWB) with the payoff function being: Another form of the HWB function is: the second term on the r.h.s. of ( 3) is the payoff of a life-contingent fractional lookback option.
In practice, the policyholders can withdraw.Note that the payment of the GMDB can be written as the payoff of a put option (plus a share of underlying stock).When the stock price rises, put options become less valuable; hence, policies may lapse.This motivates considering the benefit function: where I {.} is the indicator function and H is a "barrier".This is a barrier option payoff form.
Since we can approximate the distribution of τ x by a PH distribution, the valuation problem becomes calculating: where δ denotes a discounting factor and τ is PH r.v.One example is to take δ as the force of interest, but the company may want to use a lower δ in its technical reserve to ensure a conservative pricing.

The Factorization: Implementation for Jump Diffusions
Our basic assumption is that X is a jump diffusion with Brownian parameters µ, σ 2 , upward jumps at rate λ 1 and with a PH(p 1 , α 1 , T 1 ) distribution of the sizes, and similar parameters λ 2 , p 2 , α 2 , T 2 for the downward jumps.
We will first need to introduce the equivalent time-reversed PH representation (α * , T * ) of τ.It is obtained by considering: It turns out (Asmussen 2003, III.5) that J * is a terminating time-homogeneous Markov process with initial distribution α * and generator T * given by: where ∆ ν denotes the diagonal matrix with the positive vector ν = −αT −1 on the diagonal.To see this, consider a doubly-infinite stationary Markovian arrival process (Asmussen 2003, XI.1) with generator T + D, where D = tα gives the rates of phase changes with arrivals, and note that the corresponding stationary distribution of the phase process is proportional to ν.Now, just use the standard time reversion for non-killed Markov processes.Considering, for the ease of exposition, the case δ = 0 of no discounting, the form of the factorization identity of (Asmussen and Ivanovs 2019) that is most convenient for our purposes is then: where P * is the probability measure where τ is generated by the time-reversed representation and X evolves as −X did under P, σ τ is the time at which the maximum is attained, and The distributions of M τ , D τ are particularly simple if X is BM(µ, σ 2 ) and τ exponential (λ).In fact, they are then exponential with rates ρ + , respectively ρ − , where: If more generally, X(t) is a jump diffusion with both up-and down-ward PH jumps, there is an abundance of results and algorithms giving the distributions in various representations or their Laplace transforms.See, among others, (Bean et al. 2005;Breuer 2008;Dieker and Mandjes 2011;Horváth and Telek 2017;Jacobsen 2005;Jiang and Pistorius 2008;Lewis and Mordecki 2008;Mordecki 2002;Pistorius 2006).The most appropriate form for our purposes is that for (computable) matrices U, U * of dimension p(1 + p 1 ) × p(1 + p 1 ), respectively p(1 + p 2 ) × p(1 + p 2 ), one has: where u k , u * k are the k th element of the vectors −Ue, respectively −U * e; this occurs already in (Asmussen 1995), as well as several later sources, but we have in fact used a new and (we feel) somewhat more straightforward approach; see Appendix B.
Multiplying ( 7), (8) by α i , respectively α * j and summing over i, j, the factorization (6) then takes the form: The modification with discounting is explained in Section 4 of (Asmussen and Ivanovs 2019) and becomes: where U δ , U * δ are calculated as U, U * with T replaced by T − δI (here, the r k remain unchanged, i.e., u k , u * k and c k are computed for the case δ = 0).This can be rewritten in matrix form as follows: (ii) For functions f and g, Proof.Since α * e −U * δ y e k ∈ R 1 , it may be replaced by its transpose e k e −U * δ y α * in (9), and this gives the first identity in (i); the second follows by transposition.Part (ii) then follows by integration.
For computations, the matrix form rather than the sum form is convenient for keeping the code short and transparent, and in many software packages like MATLAB, it also is faster.In a number of our examples, the matrices F U , G U * come out in a simple form, avoiding integration.Here is a first example: Example 1.For the HWB, we need to compute: Thus (assuming S 0 = 1 w.l.o.g.), we have f (x) = e x , g(y) = max(a, e −y ) in Part (ii) of Corollary 1, and we get: Proof.For x > 0, we have: and so, the result follows immediately by Part (ii) of Corollary 1.The proof for x < 0 is similar; one only has to note that if S * δ is defined as S δ with U δ and U * δ interchanged, then S * δ = S δ .
Remark 2. The distribution of X τ restricted to (0, ∞) is in fact (defective) PH, but in general, with a different generator U than U = U 0 (for this one would have needed that S δ α * be proportional to −Ue).This follows from the general fact that if α, s ≥ 0 and T is a phase generator, but possibly s = −Te, then αe T x s = αe T x t where: and here, T is a phase generator; all needed positivity properties follow from S δ and −T −1 being non-negative.Similar remarks apply to the negative part of X τ .These observations may be seen as an extension to jump diffusions at PH times of the standard fact that Brownian motion at an independent exponential time has a two-sided exponential distribution (often called the Laplace distribution).A similar result for matrix-exponential distributions is on p. 478 of (Bladt and Nielsen 2017), but is in terms of Laplace transforms rather than densities.
Example 2. For the GMDB, the price is (taking S 0 = 1): In the most common case K < S 0 , the first term equals: when S 0 = 1, whereas the second is:

♦
Example 3. The calculation of reserves: is closely related to price calculations.However, the details may be somewhat more cumbersome since for a fixed t, the expressions for M τ , D τ have two components, the quantities m 0 = max s≤t X s and d 0 = m 0 − X t , which are known at time t, and the unknown random ones: In terms of these and the relations: , where: Since the conditional distribution of τ − t given τ > t is PH(α t , T) where α t = αe Tt /αe Tt e, this gives: (note that for the reversal, we need not calculate α * by replacing α by α t in ( 5), but can use the same α * for all t, cf.Section 2.2 of (Asmussen and Ivanovs 2019)).
Example 4. For the GMDB, we have: where K = Ke −x 0 .Combining with Remark 2, we therefore arrive at the same expression as in Example 2 when Ke −x 0 ≤ 1, only with K replaced by K and α by α t .The case Ke −x 0 > 1 may arise by randomness, but is just a minor modification.
Example 5.The details for the HWB reserves are somewhat more complicated than for the GMDB.
Here, one needs to consider e M τ max(a, e −D τ ), corresponding to: One approach (potentially the easier one) may be simply to insert in (11) and do the two-dimensional numerical integration.However, in fact, one can reduce to a one-dimensional numerical integration similar to Corollary 2. To this end, note that e M τ max(a, e −D τ ) after simple derivations comes out as: of these four possibilities, (12d) (and to a slightly lesser extent, (12c)) is the most complicated one with which to deal.The contribution to the reserve from ( 12c) is zero if m 0 < d 0 + log a, since then, the defining region is empty.Otherwise, it becomes: which after two integrations reduces to an expression, which, though complicated, only involves known matrices and their exponentials, except that: is left to be determined by numerical integration.♦

Numerical Examples
For the underlying stock price process S t = S 0 e X t , the most popular model is the geometric Brownian motion model.Since this model cannot capture the asymmetric leptokurtic features and the volatility smile, many modified models have been proposed in the literature; one of these is the jump diffusion model, cf.(Kou 2002).Empirical studies show that the daily return distribution tends to have larger kurtosis than the distribution of monthly returns, and the jump diffusion model can capture this feature; see (Das and Foresi 1996).As an illustrative example, we suppose that the yearly interest rate is r = 3% and the yearly volatility σ = 0.25.The jumps follow a compound Poisson process with upward jump rate λ 1 = 3 with the jump sizes being exponential with rate ν 1 = 50 (mean 1/50) and downward jump rate λ 2 = 2 with jump size being exponential with rate ν 2 = 30.When considering the valuation of an insurance product alone, it is not necessary to use a risk-neutral valuation approach.However, since in our setting, the underlying involves an equity, it is natural to use an equivalent martingale measure Q, defined by the requirement log E Q e X(1) = r.This does not define Q uniquely since the market is incomplete due to the jumps and the mortality risk.Our choice was to let the parameters σ 2 , λ 1 , λ 2 , ν 1 , ν 2 remain unchanged, which gives the risk-neutral Brownian drift as: For the illustrative example, we have: It is left to compute the U δ , U * δ .Writing: we obtain: and thus, it suffices to consider U. Essentially, two types of algorithms are available.One set is based on finding roots in the complex plane of equations of the form κ(s) = a where κ(s) = log Ee sX( 1) is the Lévy exponent of X.The other uses fixed-point equations of the form U = ψ(U) and iteration, U = lim n→∞ U n , with U n = ψ(U n−1 ) and U 0 suitably chosen.We give a few examples of previous algorithms of both types in Appendix B. When implementing these with τ having one of the GC representations fitted in Section 3, our experience was that the root-finding method was only feasible for a rather small number p of phases.More precisely, for p = 50 or p = 75, meaningless results were produced, with warnings of the matrix H being close to singular.This is understandable since many eigenvalues of the relevant matrices are almost identical.Since the iterative algorithms ran without problems also for large values of p, we have therefore concentrated on these.In fact, we have developed a new iterative algorithm, which may be simpler than previous ones.It is presented in Appendix B, but for now, we concentrate on some numerical examples.Example 6.In Figure 7, we have plotted the marginal densities of M τ , D τ , and X τ .Example 8. To consider the sensitivity of the algorithms to the value p of chosen phases for τ, we took δ = r = 0.03 or 0.00 and computed the price ( 10) for p = 20, 35, 40, 50, 75, 100 and the GC fits presented in Section 3 for two products, the HWB with a = 0.85 and the GMDB with K = 0.85; the relevant formulas are available from Examples 1 and 2. The results are in Table 1.We also complemented with a Monte Carlo simulation estimate with an associated 95% confidence interval based on R = 10 6 replications, where τ was generated from the discrete life table distribution with an additional randomization within the year.A related question is how varying different fits can be for a fixed p due to issues such as non-uniqueness of PH representations and possibly local maxima of the likelihood.We illustrate this point in Table 2, which is a parallel of Table 1.We fix p at 50, but use five different initial values (seeds) for the EM algorithm, all allowed the same running time.The fitted densities are in Figure 9.Our conclusion is that the differences are too minor to be of any concern, and certainly, using a single fit with p ≥ 50 will produce acceptable results.Insisting on more than 2-3-digit precision in price estimates does not make sense due to issues such as model uncertainty and that payments will be adjusted later via bonus and dividend arrangements.The programming was done in MATLAB on a 2014 MacBook Air with a 1.4 GHz Intel Core i5 processor.As an example of execution times, MATLAB's tic-toc command reported approximately two seconds for each HWB or GMDB value using the PH methodology.The figures for the Monte Carlo values were much higher, approximately 30,000 replications per minute (recall that 10 6 replications were used).The programming is very simple, with the main effort lying in writing a routine for computing U for a given set of input parameters.This amounted to 50 lines of code in our implementation.Given this ease, we did some further price computations, even if our aim here is not to illustrate pricing aspects, but rather the computational ones.
Example 9. We first consider varying the discount rate δ, but not the force of interest r, cf. the remarks following (4) (that in particular motivate taking δ ≤ r).The results for the HWB are in Table 3 and show the expected picture, that the price is decreasing in the discount rate.Example 10.The next example is discounting with the force of interest, i.e., taking r = δ and varying r.
The numerical values are in Table 4, and the picture is again that the price is decreasing in r = δ.The intuition is less clear-cut here, since the Brownian drift µ is increasing and thus pulls in the opposite direction.Example 11.Next, we give an illustration of a feature of the factorization: if τ is exponential, then M τ and D τ are independent.cf.Lemma 1.In the PH case, there is dependence caused by the common value of τ = k.How does this influence the price?A numerical illustration is in Table 5, where the entries are the ratios between the correct price with dependence and the incorrect one computed assuming independence, i.e., as: The other settings are as in Table 3. Table 5 shows that in some cases, dependence indeed influences the price quite a bit.Example 12.As a follow-up of Example 3, we finally consider the evaluation of reserves.Two paths of the jump diffusion were simulated, and the reserve was calculated for each value of t using the formulas of Examples 3, 4 and 5.The results are given in the graphs of Figure 10.The upper panel gives the stock price S a−35 = e X a−35 (assuming S 35 = 1) and the lower one the ratio V a−35 /e X a−35 as function of the age a of the insured.When δ = r, because of the martingale property.Hence, the ratio in the lower panel should always be at least one.This is also confirmed from the figure, as well as the expectation that the ratio should be close to one when X t is large.
In the calculations, we used the fact that α * and U * need not be recalculated for each t, cf. the remarks following (11).This gave a substantial speed-up.

Erlangization and Extrapolation
A multitude of papers in finance, insurance, and queuing theory explore the idea of Erlangization or Canadization, that is approximation of a deterministic time T by an Erlang(q) time E = E q,q/T with the same mean.Here, q is the number of stages and q/T is the rate, so that E q,q/T is the sum of q mean T/q exponential r.v.'s.This idea has various applications: explicit identities (see, e.g., (Asmussen et al. 2002;Carr 1998;Stanford et al. 2005)); this is used in (Kuznetsov et al. 2011) together with the traditional Wiener-Hopf factorization to provide an algorithm simulating (M E , X E ) to approximate (M T , X T ) and as a numerical tool for calculating f (T) as the limit of E f (E q,q/T ) (or just f (E q,q/T )) under suitable continuity conditions on f ; see, e.g., (Asmussen et al. 2002;Carr 1998;Leung et al. 2015).This last procedure uses the fact that E q,q/T converges in probability to T as q → ∞.In (Asmussen et al. 2002), it is combined with extrapolation, which is based on the asymptotics: valid for f smooth and some C = C( f ).This C is typically not available, but (13) suggests eliminating C by estimating f (T) using: which converges to f (T) at the improved rate O(q −2 ).The numerical examples of (Asmussen et al. 2002) indicate that often, calculations with q set as small as 2-4 will give very good precision.
The relevance for the present purposes is that the time horizon of many life insurance contracts is not the remaining lifetime τ of the insured, but rather T ∧ τ, where T is a fixed deterministic number.For example, when the age of the insured is 35 at the time of underwriting, T = 45 corresponds to contract expiry at age 80 if the insured is still alive.Some calculations can be found in Section 10 of (Gerber et al. 2012) for τ exponential and a call or put option, but rely heavily on this particular structure.For our life insurance applications, it is crucial that E q,q/T ∧ τ is again PH when E q,q/T and τ are independent.The details involve the Kronecker product ⊗ and sum ⊕ so that: For two Markov process generators, the Kronecker sum is the generator of independent versions, and so, E q,q/T ∧ τ is PH with generator E ⊕ T of dimension nq × nq where E = E q,q/T is the PH generator of E q,q/T .Written out with lexicographical order of the states: Example 13.We consider a HWB or GMDB contract signed at age 35 of the insured and expiring at the time τ of death or at age 70, whatever comes first.We took r = δ = 3% and used the GC fit with p = 50, leading to the numerical values given in Table 6; in comparison, Monte Carlo simulation with R = 50,000 replications gave the 95% confidence interval 1.636 ± 0.034 for the HWB and 1.097 ± 0.029 for the GMDB.The extrapolation is done via ( 14), with the required smoothness being obvious for the GMDB.For the HWB, d → max(a, e −d ) is not smooth at d = − log a, but some rough heuristics (that we omit) suggest that (13) remains in force for this special f .Certainly, extrapolation appears to produce sensible results also for the GMDB.In both cases, it seems that q = 4-6 is enough to obtain sufficiently precise estimates, whereas without extrapolation, one would need at least q = 10.This is a considerable improvement upon simple Erlangization, since the dimension of the matrices in the computations (where inversion is the heavy part) is 2pq.This means that for p = 50, one would need to invert matrices of dimension 400 when q = 4, but 1000 when q = 10, and since the inversion of an n × n matrix has complexity n 3 (or slightly less with sophisticated methods), this corresponds to reducing computation time by a factor of about 2.5 3 = 15.6.

Conclusions
In this paper, we have considered pricing calculations for life insurance products like the guaranteed minimum death benefit and the high-water benefit.The payout occurs at the death time τ of the insured, and its size is linked to an asset price S either via its current value S τ or to features of its past evolution during the time of the contract such as the maximum S τ .
The key observation is that for the most popular model of S, an exponential jump diffusion, the distributions of S τ and/or S τ are available when the jumps and τ are phase-type.This has long been known in the geometric Brownian motion setting without jumps when τ is exponential and is essentially just the Wiener-Hopf factorization of Brownian motion.There is also an extensive literature for PH jumps, and for τ itself being PH, the relevant Wiener-Hopf factorization has recently been developed in (Asmussen and Ivanovs 2019).
The program thus involves two steps, first fitting a PH distribution of τ to lifetime data and next implementing the pricing calculations.
The shape of lifetime data with a steep decline after the mode makes the fitting of PH distributions non-trivial, necessitating the number p of phases to be relatively high.However, we already made the points that a good fit is one that produces reliable estimates of the prices of the life insurance products, and that giving more than 2-3-digit precision in price estimates does not make sense due to issues such as uncertainty about the model and its parameters (think, e.g., of longevity risk) and bonus/dividend arrangements.We found p = 50 to be more than sufficient in the sense of providing appropriate accuracy of the calculated price of life insurance products and could well have worked with smaller values.An alternative is developed in GSY, to use matrix-exponentials fitted at a selected number of points, but we show in Appendix A that this approach has its pitfalls.
Fitting of so high-dimensional PH distributions is slow, though we developed in part some new software giving substantial speed-up.However, given this step has been passed, pricing calculations are both very easy to program and run very fast, also compared to Monte Carlo simulation, and this may be the main advantage of our approach.A limitation is of course that we need the pay-out to be of the form (1), and there are many products for which this is not the case.Alternatives such as Monte Carlo simulation and traditional methods based on Thiele-type differential equations may then be more suitable.Therefore, we do not claim the superiority of our own approach for all purposes, but have rather tried to clarify its advantages and disadvantages.In Figures A1 and A2a, the fits are fine in the selected age range (x, x + n), but fail miserably beyond.This is not necessarily a drawback if the fit has been made for the pricing of a product where x + n is the upper limit of the relevant age range.The picture in Figure A2b is more rough, but presumably repairable by increasing beyond = 10.Another phenomenon that is observed is the possibility of densities that are negative and exhibit weird oscillations.This is seen most markedly in Figure A2b.
This behavior is a result of the fitting criterion (A1), which only pays attention to the selected grid x, x + 1, . . ., x + n, and not to what goes on in between the grid points.

Appendix B. Algorithms for Computing U
As an example of the root-finding algorithms, we state the following particularly simple example, which is a special case of Corollary 5.2 of (Asmussen 1995): Proposition A1.Assume X is BM(µ, σ 2 ) and that τ is GC, as in Figure 1.Then, P i (M τ ∈ dy; J σ = k) = e i e Uy e k dy where: U = H∆ −s H −1 ; s is the vector with elements s j = −µ/σ 2 + µ 2 /σ 4 + 2λ j /σ 2 ; H is the matrix with columns h 1 , . . ., h n ; h j is the eigenvector of T + λ j I corresponding to eigenvalue zero, with elements h ij given by h ij = 0 for i > j, h 1j = 1 and: In particular, P(J σ = k) = −αU −1 e k • u k where u k is the k th element of −Ue.
Remark A1.For a GC distribution, the characteristic polynomial, its roots, and the eigenvalues are explicit because of the upper triangular form of the generator matrix.This may of course be a curiosity in the computer age, and it is probably faster to code the program of Proposition A1 via matrix software than to use the explicit formulas.However, continuing in the same vein, note that not only is the determinant of an upper triangular matrix explicit, but also the inverse.Indeed, let k be the vector of diagonal elements of H and write H = ∆ k (I + N), i.e., N = ∆ 1/k (H − I).Then, N is nilpotent and: The iterative algorithms are hardly simpler for the GC case than for a general PH structure.An example (Theorem 4.1 of (Asmussen 1995)) follows.
Then, U is the solution of the fixed-point problem U = ψ(U) where ψ(V ) is the matrix whose i th row is the i th row of the matrix: 2 σ 2 (∆ λ + T)(η i I − V ) −1 − ∆ ω .Further, U = lim n→∞ U n with U n = ψ(U n−1 ) and U 0 = −∆ ω .
In our numerical examples, we have used an algorithm that is somewhat related, but also contains some genuinely new ideas.It is applicable to more general models than jump diffusions and thereby also in other application areas than life insurance.For this reason, the details will be provided elsewhere, and we give here only a very brief outline.The basic idea is illustrated in Figure A3, which contains a realization of the jump diffusion X up to the PH time τ.The underlying Markov process has two states, blue B and red R for τ, two cyan C and magenta M for the upward jumps, and one green G for the downward ones.The phases when X is at a maximum are shown in the left-hand vertical bar and have four possible states B, R, CB, MB, CR, MR with legends given in the upper right corner.The basic observation for the algorithm is that this process ξ(x) of phases at maxima x (and thereby, the overall maximum M τ ) is the same as for the embedded discrete time killed Markov random walk marked by black dots.Its values are the values in the graph of X either at the start of a Brownian segment, just before the end, or at the maximum, or at phase changes in jumps.Here, the graph of X is understood to comprise the vertical parts from jumps, and Brownian segments are separated either by state changes in the Markov process underlying the phase horizon τ, or by jumps.The generator U of ξ is then shown to satisfy a certain fixed-point equation that is solved iteratively.
Computationally, the main effort in each step is to invert a matrix U n whose dimension is the number p(1 + p + ) of states of ξ.

Figure 3 .
Figure3.Probability density functions and hazard rate functions for the life table and phase-type (PH) fits using the general and the generalized Coxian (GC) structures with p = 20 phases.

Figure 4 .
Figure 4. Probability density functions and hazard rate functions for the life table and phase-type fits using the generalized Coxian structures with p = 35, 40 phases.

Figure 5 .Figure 6 .
Figure 5. Tails of the life table and GC fits with p = 35, 40 phases.

Figure 7 .
Figure 7. Densities of (a) M τ , (b) D τ , and (c) X τ .Example 7. Figure8gives the joint distribution of M τ and D τ (computed by Corollary 1) for our jump diffusion with the parameters given above.

Figure 8 .
Joint distribution of M τ and D τ .The joint density is in the left panel and a contour plot in the right.

Figure 9 .
Figure9.Fitted densities for five different seeds (against the life table data).

Figure 10 .
Figure 10.Asset price paths and excess ratio of associated GMDB reserves.

Figure A3 .
Figure A3.Jump diffusion and embedded Markov random walk.

Table 1 .
Price estimates for δ = r and varying p. HWB, high-water benefit; GMDB, guaranteed minimum death benefit.

Table 2 .
Price estimates for δ = r and varying seeds.

Table 3 .
HWB price estimates for r = 0.03 and varying δ.

Table 5 .
Ratios between prices with and without dependence.