1. Introduction
A random variable (r.v.) 
 is phase-type (PH) with representation 
 if it is distributed as the lifetime 
 of a killed (terminating) time-homogenous Markov process 
 with 
 states, initial distribution 
 (a row vector), and generator 
, where † is an additional absorbing state. PH distributions originate from queueing theory and the work of A.K. Erlang, A. Jensen and M.F. Neuts. However, in recent decades, they have found numerous applications to different areas. The main reason for this is that many calculations that are explicit for exponential distributions are often computationally tractable with PH assumptions and that, in addition, PH distributions are dense so that a given distribution 
F on 
 can be approximated arbitrarily well by a PH distribution provided one takes 
p large enough. For surveys of the theory and applications, see (
Asmussen 2003; 
Asmussen and Albrecher 2010; 
Bladt and Nielsen 2017). The present paper is concerned with some applications to life insurance, an area where PH distributions have been far less employed than in the related areas of non-life insurance. There are many papers in ruin theory that assume that claim sizes are PH distributed; see (
Cai and Li 2005b) and the review paper (
Bladt 2005). Some papers assume that the claim time is PH distributed; see (
Stanford et al. 2000). (
Zadeh et al. 2014) considered disability insurance using a PH model. (
Zadeh and Stanford 2016) studied credibility theory using PH distributions. In finance, PH distributions have also been used extensively. For example, (
Asmussen et al. 2004) considered option pricing under an exponential phase-type Lévy model; (
Cai and Li 2005a) dealt with applications of PH distributions to risk measures; and (
Egami and Yamazaki 2014) considered PH approximations of Lévy models. On the contrary, the only references of applications of PH distributions to life insurance we know of are (
Lin and Liu 2007) and (
Zadeh et al. 2014).
The motivation for the present study is some valuation problems of equity-linked products in life insurance considered in 
Gerber et al. (
2012, 
2013, 
2015) (GSY). Let 
 be the remaining lifetime of an insured of age 
x when signing the contract, then the payment of this class of benefits has the form 
 where 
 is the price at time 
t of some stock or stock index and 
 the running maximum. An example of such a benefit is the guaranteed minimum death benefit (GMDB) 
, but there are many others; see the list in 
Section 4. Defining 
 as 
, 
 as the running maximum of 
X, and 
 as the displacement from the maximum, we then have:
	  (note that 
), and the fair value of the benefit is the expected discounted payment.
A widely-used model (
Cont and Tankov 2004) assumes that 
 is a Lévy process, and the key vehicle in GSY for the computations for such expectations is the classical Wiener-Hopf factorization:
Lemma 1. For a Lévy process  and an independent exponential time τ, the r.v.’s  and  are independent. Further,  For the geometric Brownian motion model where 
X is a Brownian motion (BM) with drift, 
 and 
 both have an exponential distribution, and for more general jump diffusions, a number of computational procedures have been developed (see 
Section 5 for references). In GSY, they approximate the density/tail of 
 by a linear combination of exponential terms, which allows them to reduce the computations to simple integrations using Lemma 1.
The approximation of the distribution of 
 by a linear combination of exponential terms in GSY uses an ad hoc optimization procedure. The main problem there is that it is very difficult to make the linear combination of exponential densities to be a density function; in particular, very often, in the tail part, the linear combination of exponential densities can be negative. Since, in the application to valuing equity-linked insurance products, the distribution at the tail part plays an important role, this motivated us to consider using the PH distribution to approximate the future lifetime distribution. For PH distributions, more classical statistical methods using maximum likelihood have, however, been developed (
Asmussen et al. 1996), and in view of the denseness of PH distributions and the fact that they have proven a computationally-tractable generalization of the exponential distribution in other settings, a natural idea is therefore to use a PH approximation 
 to 
. The first step in implementing this for pricing equity-linked benefits is to develop a version of the Wiener-Hopf factorization for the case where 
 is PH and independent of 
X. This has been done in a companion paper (
Asmussen and Ivanovs 2019), and the contribution of the present paper is to provide the further steps such as PH fitting of human mortality and computations of prices of equity-linked benefits for BM and jump diffusions.
  2. Preliminaries
The vector of killing rates of a PH
 r.v. 
 is given by 
, where 
 is a column vector with all entries equal to one. The density is 
, and the tail 
 is 
. Here, the exponential 
 of a matrix is defined as 
. We shall often use formulas like:
	  that are immediate analogues of standard formulas for univariate exponentials. For example, taking 
 gives the Laplace transform 
 as 
. Of course, the existence of 
 has to be verified, which for a PH generator 
 is a standard fact.
The subclass of generalized Coxian (GC) distributions will play a particular role. Here, the states are run through in lexicographical order, with exponential 
 holding time in state 
i, and it is possible to enter or exit from any state (see 
Figure 1), where 
 is the probability of exit after a visit to 
i and 
 the probability of going to the next state. The structure of Coxian distributions is similar, except that only State 1 can be entered so that 
. Coxian distributions are in particular used in (
Lin and Liu 2007) and (
Zadeh et al. 2014).
Remark 1. Though computational tractability and denseness are the main reasons for the popularity of PH distributions, other advantages have also sometimes been advocated. For example, whereas in the majority of examples, the PH modeling is purely descriptive, there are others where attempts have been made to give the Markov states a physical meaning like in compartment models and, in the life insurance context, in (
Lin and Liu 2007) and (
Zadeh et al. 2014). This makes the number 
p of phases quite large there, of order 200, whereas we have taken the descriptive point of view. The fit obviously becomes better when increasing 
p, but how good it needs to be depends very much on what it will be used for. In this paper, we consider the valuation problem, that is to calculate the net premium (expected discounted payoff) for equity-linked insurance products, and our numerical examples will show that taking 
 will be sufficient for the precision one needs in this context.
   3. PH Fits of Human Mortality Data
The distribution of human lifetimes of course varies somewhat according to the period where the data were collected, geographical and social conditions, etc., but the basic shapes are much alike. We have therefore concentrated on a single example, the illustrative life table in Appendix 2A of (
Bowers et al. 1997). The form of the distribution, as illustrated via the number of deaths in consecutive years, is given in 
Figure 2.
We use an EM algorithm to fit various types of phase-type distributions to this life table. This idea was originally developed in (
Asmussen et al. 1996), with the case of censored data and other implementation details given in (
Olsson 1996). Olsson (
Olsson 1998) has written a C implementation of this algorithm called EMpht, which is available online. The overwhelming computational burden is in the E step, where many complicated integrals involving matrix exponentials must be evaluated. This is done approximately by converting the problem into a system of ODEs, which are approximately solved via a Runge-Kutta method. While the original C implementation is still very adequate for creating fits with a small number of phases 
p, it can be improved upon for larger 
p.
We wrote a new implementation using the Julia language. It allows for the E step to be evaluated in three different ways: (i) using an ODE solver (as in (
Asmussen et al. 1996; 
Olsson 1996)), (ii) using quadrature methods, and (iii) using the uniformization method of Okamura et al. 
Okamura et al. (
2011a, 
2011b). We used this last method to generate the fits found later in this paper; the code we used to create the figures below will be made available online (
Asmussen et al. 2019). Each fit was given one hour execution time restricted to one thread on the processor Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20 GHz. The EM algorithm could have been terminated much earlier without a major difference to the quality of the fits, but as we were comparing the various values of 
p, we wanted to ensure no individual fit was judged poorly for simply being stuck in a local maxima.
The PH representation in terms of  is heavily overparameterized. There is an abundance of sets  that leads to the same distribution. Consider, e.g., the PH with  (where  stands for transposition) and where  is a diagonal matrix with  on the diagonal (i.e., a hyperexponential distribution). If the diagonal elements in  are permuted in any of the  possibilities, then the distribution is unchanged. This example indicates that the maximum  of the likelihood function ℓ is attained at several parameter values so that the choice of the initial parameter set will determine which one is found by the EM algorithm. The presence of local maxima with values  cannot be excluded, but we are not aware of serious cases where the EM algorithm has been stuck in one of these.
In life insurance calculations, the relevant lifetime is not the overall one , but rather the remaining lifetime  of the insured at the time when he/she signs the contract. That is, if the age of the insured at that time is x, the relevant distribution is that of  given . As a typical example, we took  and fitted PH distributions with various structure and various numbers p of phases. Note that while we fit the distribution of , which starts at zero, in the following, we plot  given  (i.e., the x axes begin at  and not zero).
As a first illustration, we fit both a general structure PH and a generalized Coxian (GC) one with 
 phases. The results are in 
Figure 3, with the densities of the fitted PH distributions in the left panel together with the data and the corresponding hazard rates in the right panel.
The fits of the two structures are almost identical (but the shape of the density is somewhat off the data, which indicates that a larger valuer of p could be appropriate). The general structure leads to a system of ODEs of dimension , whereas the corresponding number for GC is only . Due to this curse of dimensionality and the marginal differences of the fits, we have only considered GC structures for higher values of p.
Figure 4 shows GC fits for 
 and 
. As expected, 
 improves upon 
.
 In 
Figure 5, we complement 
Figure 4 by plots of the tails. Such plots are frequently used to illustrate the quality of fits, but in our minds, a good agreement is not an argument in itself: two tails may come out quite similar in a comparison even if the shapes of the densities are rather different. Finally, 
Figure 6 gives GC fits with 
.
Which value of 
p should one then work with? There exists hardly a single criterion for answering such questions, i.e., assessing the quality of the fits to the data. Some concerns may be aesthetic, such as avoiding negative values of the density (cf. 
Appendix A below), support outside the obvious range of the data (cf. kernel estimates), or an excessive number of parameters (as is the case for most PH fits as compared to, e.g., the classical Gompertz-Makeham fit of life table data). Other concerns may have to do with the features of the distribution which are considered especially important, be it the overall shape of the density (caught well by maximum likelihood), the cumulative distribution function (c.d.f.), although c.d.f.’s may look close even for rather different distributions, the tail (where maximum likelihood fails and other tools such as the Hill estimator may be more appropriate), the hazard rate (as showing up in numerous life insurance contexts), or the behavior in a more narrow range (cf. a life insurance contract signed at the start of employment and running up to the time of retirement). For the present purposes and related ones like applications in queueing and insurance risk, the main point is, however, that PH fitting is not done for its own sake, but for the purpose of facilitating some calculation, here of the prices of equity-linked products, in queueing, say of waiting time characteristics, etc. In the rest of the paper, we therefore study the performance of the PH approach and the various fits and return to a discussion of the issue in the concluding 
Section 8.
  4. Valuation of Benefits
The main application of the results in this paper is valuing equity-linked insurance products, in particular GMDBs in various deferred annuities. The time of payment of this kind of product is at the time 
 of death of the policyholder, often denoted 
, where 
x is his/her age at 
 when he/she signs the contract. The amount of the payment depends on the price 
 of a stock (or stock index) at that time and possibly also on the history of the stock price as, e.g., in (
1). There are many different products (benefit functions) in the market. One example is the GMDB with benefit function being:
	  where 
K is the guaranteed amount, e.g., 
. The second term on the right-hand side (r.h.s.) of (
2) is the payoff of a life-contingent put option. Another popular product is the so-called high-water benefit (HWB) with the payoff function being: 
Another form of the HWB function is: 
	  where 
. Since:
	  the second term on the r.h.s. of (
3) is the payoff of a life-contingent fractional lookback option.
In practice, the policyholders can withdraw. Note that the payment of the GMDB can be written as the payoff of a put option (plus a share of underlying stock). When the stock price rises, put options become less valuable; hence, policies may lapse. This motivates considering the benefit function: 
	  where 
 is the indicator function and 
H is a “barrier”. This is a barrier option payoff form.
Since we can approximate the distribution of 
 by a PH distribution, the valuation problem becomes calculating:
	  where 
 denotes a discounting factor and 
 is PH r.v. One example is to take 
 as the force of interest, but the company may want to use a lower 
 in its technical reserve to ensure a conservative pricing.
  5. The Factorization: Implementation for Jump Diffusions
Our basic assumption is that X is a jump diffusion with Brownian parameters , upward jumps at rate  and with a PH distribution of the sizes, and similar parameters  for the downward jumps.
We will first need to introduce the equivalent time-reversed PH representation 
 of 
. It is obtained by considering:
It turns out (
Asmussen 2003, III.5) that 
 is a terminating time-homogeneous Markov process with initial distribution 
 and generator 
 given by: 
	  where 
 denotes the diagonal matrix with the positive vector 
 on the diagonal. To see this, consider a doubly-infinite stationary Markovian arrival process (
Asmussen 2003, XI.1) with generator 
, where 
 gives the rates of phase changes with arrivals, and note that the corresponding stationary distribution of the phase process is proportional to 
. Now, just use the standard time reversion for non-killed Markov processes. Considering, for the ease of exposition, the case 
 of no discounting, the form of the factorization identity of (
Asmussen and Ivanovs 2019) that is most convenient for our purposes is then:
	  where 
 is the probability measure where 
 is generated by the time-reversed representation and 
X evolves as 
 did under 
, 
 is the time at which the maximum is attained, and 
.
The distributions of 
 are particularly simple if 
X is BM
 and 
 exponential 
. In fact, they are then exponential with rates 
, respectively 
, where: 
If more generally, 
 is a jump diffusion with both up- and down-ward PH jumps, there is an abundance of results and algorithms giving the distributions in various representations or their Laplace transforms. See, among others, (
Bean et al. 2005; 
Breuer 2008; 
Dieker and Mandjes 2011; 
Horváth and Telek 2017; 
Jacobsen 2005; 
Jiang and Pistorius 2008; 
Lewis and Mordecki 2008; 
Mordecki 2002; 
Pistorius 2006). The most appropriate form for our purposes is that for (computable) matrices 
 of dimension 
, respectively 
, one has:
      where 
 are the 
 element of the vectors 
, respectively 
; this occurs already in (
Asmussen 1995), as well as several later sources, but we have in fact used a new and (we feel) somewhat more straightforward approach; see 
Appendix B.
Multiplying (7), (
8) by 
, respectively 
 and summing over 
, the factorization (
6) then takes the form: 
	  where 
. The modification with discounting is explained in Section 4 of (
Asmussen and Ivanovs 2019) and becomes: 
	  where 
, 
 are calculated as 
 with 
 replaced by 
 (here, the 
 remain unchanged, i.e., 
, 
 and 
 are computed for the case 
). This can be rewritten in matrix form as follows:
Corollary 1. (ii) 
For functions f and g,where ,  are the matrices , respectively . Proof.  Since , it may be replaced by its transpose  in (9), and this gives the first identity in (i); the second follows by transposition. Part (ii) then follows by integration. ▯
 For computations, the matrix form rather than the sum form is convenient for keeping the code short and transparent, and in many software packages like MATLAB, it also is faster. In a number of our examples, the matrices ,  come out in a simple form, avoiding integration. Here is a first example:
Example 1. For the HWB, we need to compute:
        
Thus (assuming 
 w.l.o.g.), we have 
, 
 in Part (ii) of Corollary 1, and we get:
        
 ⋄
 Corollary 2. Define . Then:  Proof.  For 
, we have:
        
        and so, the result follows immediately by Part (ii) of Corollary 1. The proof for 
 is similar; one only has to note that if 
 is defined as 
 with 
 and 
 interchanged, then 
. ▯
 Remark 2. The distribution of 
 restricted to 
 is in fact (defective) PH, but in general, with a different generator 
 than 
 (for this one would have needed that 
 be proportional to 
). This follows from the general fact that if 
 and 
 is a phase generator, but possibly 
, then 
 where:
        
        and here, 
 is a phase generator; all needed positivity properties follow from 
 and 
 being non-negative.
Similar remarks apply to the negative part of 
. These observations may be seen as an extension to jump diffusions at PH times of the standard fact that Brownian motion at an independent exponential time has a two-sided exponential distribution (often called the Laplace distribution). A similar result for matrix-exponential distributions is on p. 478 of (
Bladt and Nielsen 2017), but is in terms of Laplace transforms rather than densities.
 Example 2. For the GMDB, the price is (taking 
):
        
In the most common case 
, the first term equals:
        
        when 
, whereas the second is:
        
 ⋄
 Example 3. The calculation of reserves:
        
        is closely related to price calculations. However, the details may be somewhat more cumbersome since for a fixed 
t, the expressions for 
 have two components, the quantities 
 and 
, which are known at time 
t, and the unknown random ones:
        
In terms of these and the relations:
        
        we can write 
, where:
        
Since the conditional distribution of 
 given 
 is PH
 where 
, this gives:
        
		(note that for the reversal, we need not calculate 
 by replacing 
 by 
 in (
5), but can use the same 
 for all 
t, cf. Section 2.2 of (
Asmussen and Ivanovs 2019)).
 Example 4. For the GMDB, we have:
        
       where 
. Combining with Remark 2, we therefore arrive at the same expression as in Example 2 when 
, only with 
K replaced by 
 and 
 by 
. The case 
 may arise by randomness, but is just a minor modification.
 Example 5. The details for the HWB reserves are somewhat more complicated than for the GMDB. Here, one needs to consider 
, corresponding to:
        
One approach (potentially the easier one) may be simply to insert in (11) and do the two-dimensional numerical integration. However, in fact, one can reduce to a one-dimensional numerical integration similar to Corollary 2. To this end, note that 
 after simple derivations comes out as:
		
	  of these four possibilities, (12d) (and to a slightly lesser extent, (12c)) is the most complicated one with which to deal. The contribution to the reserve from (12c) is zero if 
, since then, the defining region is empty. Otherwise, it becomes:
        
        which after two integrations reduces to an expression, which, though complicated, only involves known matrices and their exponentials, except that:
        
        is left to be determined by numerical integration. ⋄
   6. Numerical Examples
For the underlying stock price process 
, the most popular model is the geometric Brownian motion model. Since this model cannot capture the asymmetric leptokurtic features and the volatility smile, many modified models have been proposed in the literature; one of these is the jump diffusion model, cf. (
Kou 2002). Empirical studies show that the daily return distribution tends to have larger kurtosis than the distribution of monthly returns, and the jump diffusion model can capture this feature; see (
Das and Foresi 1996). As an illustrative example, we suppose that the yearly interest rate is 
 and the yearly volatility 
. The jumps follow a compound Poisson process with upward jump rate 
 with the jump sizes being exponential with rate 
 (mean 1/50) and downward jump rate 
 with jump size being exponential with rate 
. When considering the valuation of an insurance product alone, it is not necessary to use a risk-neutral valuation approach. However, since in our setting, the underlying involves an equity, it is natural to use an equivalent martingale measure 
, defined by the requirement 
. This does not define 
 uniquely since the market is incomplete due to the jumps and the mortality risk. Our choice was to let the parameters 
 remain unchanged, which gives the risk-neutral Brownian drift as:
For the illustrative example, we have:
It is left to compute the 
. Writing:
	  we obtain:
	  and thus, it suffices to consider 
. Essentially, two types of algorithms are available. One set is based on finding roots in the complex plane of equations of the form 
 where 
 is the Lévy exponent of 
X. The other uses fixed-point equations of the form 
 and iteration, 
, with 
 and 
 suitably chosen.
We give a few examples of previous algorithms of both types in 
Appendix B. When implementing these with 
 having one of the GC representations fitted in 
Section 3, our experience was that the root-finding method was only feasible for a rather small number 
p of phases. More precisely, for 
 or 
, meaningless results were produced, with warnings of the matrix 
 being close to singular. This is understandable since many eigenvalues of the relevant matrices are almost identical. Since the iterative algorithms ran without problems also for large values of 
p, we have therefore concentrated on these. In fact, we have developed a new iterative algorithm, which may be simpler than previous ones. It is presented in 
Appendix B, but for now, we concentrate on some numerical examples.
Example 6. In 
Figure 7, we have plotted the marginal densities of 
, 
, and 
.
 Example 7. Figure 8 gives the joint distribution of 
 and 
 (computed by Corollary 1) for our jump diffusion with the parameters given above.
  Example 8. To consider the sensitivity of the algorithms to the value 
p of chosen phases for 
, we took 
 or 0.00 and computed the price (10) for 
 20, 35, 40, 50, 75, 100 and the GC fits presented in 
Section 3 for two products, the HWB with 
 and the GMDB with 
; the relevant formulas are available from Examples 1 and 2. The results are in 
Table 1. We also complemented with a Monte Carlo simulation estimate with an associated 95% confidence interval based on 
 replications, where 
 was generated from the discrete life table distribution with an additional randomization within the year.
A related question is how varying different fits can be for a fixed 
p due to issues such as non-uniqueness of PH representations and possibly local maxima of the likelihood. We illustrate this point in 
Table 2, which is a parallel of 
Table 1. We fix 
p at 50, but use five different initial values (seeds) for the EM algorithm, all allowed the same running time. The fitted densities are in 
Figure 9.
Our conclusion is that the differences are too minor to be of any concern, and certainly, using a single fit with  will produce acceptable results. Insisting on more than 2–3-digit precision in price estimates does not make sense due to issues such as model uncertainty and that payments will be adjusted later via bonus and dividend arrangements.
 The programming was done in MATLAB on a 2014 MacBook Air with a 1.4 GHz Intel Core i5 processor. As an example of execution times, MATLAB’s tic-toc command reported approximately two seconds for each HWB or GMDB value using the PH methodology. The figures for the Monte Carlo values were much higher, approximately 30,000 replications per minute (recall that  replications were used). The programming is very simple, with the main effort lying in writing a routine for computing  for a given set of input parameters. This amounted to 50 lines of code in our implementation. Given this ease, we did some further price computations, even if our aim here is not to illustrate pricing aspects, but rather the computational ones.
Example 9. We first consider varying the discount rate 
, but not the force of interest 
r, cf. the remarks following (
4) (that in particular motivate taking 
). The results for the HWB are in 
Table 3 and show the expected picture, that the price is decreasing in the discount rate.
 Example 10. The next example is discounting with the force of interest, i.e., taking 
 and varying 
r. The numerical values are in 
Table 4, and the picture is again that the price is decreasing in 
. The intuition is less clear-cut here, since the Brownian drift 
 is increasing and thus pulls in the opposite direction.
 Example 11. Next, we give an illustration of a feature of the factorization: if 
 is exponential, then 
 and 
 are independent. cf. Lemma 1. In the PH case, there is dependence caused by the common value of 
. How does this influence the price? A numerical illustration is in 
Table 5, where the entries are the ratios between the correct price with dependence and the incorrect one computed assuming independence, i.e., as:
        
The other settings are as in 
Table 3. 
Table 5 shows that in some cases, dependence indeed influences the price quite a bit.
 Example 12. As a follow-up of Example 3, we finally consider the evaluation of reserves. Two paths of the jump diffusion were simulated, and the reserve was calculated for each value of 
t using the formulas of Examples 3, 4 and 5. The results are given in the graphs of 
Figure 10. The upper panel gives the stock price 
 (assuming 
) and the lower one the ratio 
 as function of the age 
a of the insured.
When 
,
        
        because of the martingale property. Hence, the ratio in the lower panel should always be at least one. This is also confirmed from the figure, as well as the expectation that the ratio should be close to one when 
 is large.
In the calculations, we used the fact that  and  need not be recalculated for each t, cf. the remarks following (11). This gave a substantial speed-up.
   7. Erlangization and Extrapolation
A multitude of papers in finance, insurance, and queuing theory explore the idea of Erlangization or Canadization, that is approximation of a deterministic time 
T by an Erlang
 time 
 with the same mean. Here, 
q is the number of stages and 
 is the rate, so that 
 is the sum of 
q mean 
 exponential r.v.’s. This idea has various applications: explicit identities (see, e.g., (
Asmussen et al. 2002; 
Carr 1998; 
Stanford et al. 2005)); this is used in (
Kuznetsov et al. 2011) together with the traditional Wiener-Hopf factorization to provide an algorithm simulating 
 to approximate 
 and as a numerical tool for calculating 
 as the limit of 
 (or just 
) under suitable continuity conditions on 
f; see, e.g., (
Asmussen et al. 2002; 
Carr 1998; 
Leung et al. 2015). This last procedure uses the fact that 
 converges in probability to 
T as 
. In (
Asmussen et al. 2002), it is combined with extrapolation, which is based on the asymptotics:
	  valid for 
f smooth and some 
. This 
C is typically not available, but (13) suggests eliminating 
C by estimating 
 using:
	  which converges to 
 at the improved rate 
. The numerical examples of (
Asmussen et al. 2002) indicate that often, calculations with 
q set as small as 2–4 will give very good precision.
The relevance for the present purposes is that the time horizon of many life insurance contracts is not the remaining lifetime 
 of the insured, but rather 
, where 
T is a fixed deterministic number. For example, when the age of the insured is 35 at the time of underwriting, 
 corresponds to contract expiry at age 80 if the insured is still alive. Some calculations can be found in Section 10 of (
Gerber et al. 2012) for 
 exponential and a call or put option, but rely heavily on this particular structure. For our life insurance applications, it is crucial that 
 is again PH when 
 and 
 are independent. The details involve the Kronecker product ⊗ and sum ⊕ so that:
For two Markov process generators, the Kronecker sum is the generator of independent versions, and so, 
 is PH with generator 
 of dimension 
 where 
 is the PH generator of 
. Written out with lexicographical order of the states:
Example 13. We consider a HWB or GMDB contract signed at age 35 of the insured and expiring at the time 
 of death or at age 70, whatever comes first. We took 
 and used the GC fit with 
, leading to the numerical values given in 
Table 6; in comparison, Monte Carlo simulation with 
 50,000 replications gave the 95% confidence interval 1.636 ± 0.034 for the HWB and 1.097 ± 0.029 for the GMDB.
The extrapolation is done via (14), with the required smoothness being obvious for the GMDB. For the HWB,  is not smooth at , but some rough heuristics (that we omit) suggest that (13) remains in force for this special f. Certainly, extrapolation appears to produce sensible results also for the GMDB. In both cases, it seems that  4–6 is enough to obtain sufficiently precise estimates, whereas without extrapolation, one would need at least . This is a considerable improvement upon simple Erlangization, since the dimension of the matrices in the computations (where inversion is the heavy part) is . This means that for , one would need to invert matrices of dimension 400 when , but 1000 when , and since the inversion of an  matrix has complexity  (or slightly less with sophisticated methods), this corresponds to reducing computation time by a factor of about .
   8. Conclusions
In this paper, we have considered pricing calculations for life insurance products like the guaranteed minimum death benefit and the high-water benefit. The payout occurs at the death time  of the insured, and its size is linked to an asset price S either via its current value  or to features of its past evolution during the time of the contract such as the maximum .
The key observation is that for the most popular model of 
S, an exponential jump diffusion, the distributions of 
 and/or 
 are available when the jumps and 
 are phase-type. This has long been known in the geometric Brownian motion setting without jumps when 
 is exponential and is essentially just the Wiener-Hopf factorization of Brownian motion. There is also an extensive literature for PH jumps, and for 
 itself being PH, the relevant Wiener-Hopf factorization has recently been developed in (
Asmussen and Ivanovs 2019).
The program thus involves two steps, first fitting a PH distribution of  to lifetime data and next implementing the pricing calculations.
The shape of lifetime data with a steep decline after the mode makes the fitting of PH distributions non-trivial, necessitating the number 
p of phases to be relatively high. However, we already made the points that a good fit is one that produces reliable estimates of the prices of the life insurance products, and that giving more than 2–3-digit precision in price estimates does not make sense due to issues such as uncertainty about the model and its parameters (think, e.g., of longevity risk) and bonus/dividend arrangements. We found 
 to be more than sufficient in the sense of providing appropriate accuracy of the calculated price of life insurance products and could well have worked with smaller values. An alternative is developed in GSY, to use matrix-exponentials fitted at a selected number of points, but we show in 
Appendix A that this approach has its pitfalls.
Fitting of so high-dimensional PH distributions is slow, though we developed in part some new software giving substantial speed-up. However, given this step has been passed, pricing calculations are both very easy to program and run very fast, also compared to Monte Carlo simulation, and this may be the main advantage of our approach. A limitation is of course that we need the pay-out to be of the form (
1), and there are many products for which this is not the case. Alternatives such as Monte Carlo simulation and traditional methods based on Thiele-type differential equations may then be more suitable. Therefore, we do not claim the superiority of our own approach for all purposes, but have rather tried to clarify its advantages and disadvantages.