Gaussian and Affine Approximation of Stochastic Diffusion Models for Interest and Mortality Rates

In the actuarial literature, it has become common practice to model future capital returns and mortality rates stochastically in order to capture market risk and forecasting risk. Although interest rates often should and mortality rates always have to be non-negative, many authors use stochastic diffusion models with an affine drift term and additive noise. As a result, the diffusion process is Gaussian and, thus, analytically tractable, but negative values occur with positive probability. The argument is that the class of Gaussian diffusions would be a good approximation of the real future development. We challenge that reasoning and study the asymptotics of diffusion processes with affine drift and a general noise term relative to corresponding diffusion processes with an affine drift term and either an affine noise term or additive noise. Our study helps to quantify the error that is made by approximating diffusive interest and mortality rate models with Gaussian diffusions and affine diffusions. In particular, we discuss forward interest and forward mortality rates and the error that approximations cause in the valuation of life insurance claims.


Introduction
For most of the 20th century, actuaries modeled the lifespan of an insured stochastically, but relied on a merely deterministic prognosis of capital returns and mortality rates. The past has shown that these quantities can vary significantly within one contract period. Especially in recent years, financial markets have experienced increased volatility, and life expectancies have risen in many developed countries at an unforeseen rate. For two decades, actuaries have been increasingly using stochastic interest rate models, and since the last decade, stochastic mortality rate models have become more and more popular. On the one hand, the stochastic approach helps to better capture systematic risk by offering the possibility to calculate confidence intervals. On the other hand, investors and insurance regulators nowadays call for a market-consistent valuation of the insurer's assets and liabilities, so that concepts from financial mathematics have entered actuarial science, where the market value of a claim is calculated as a risk-neutral expectation in a specific stochastic model.
In financial mathematics, stochastic modeling of interest rates is a common approach, and a great variety of different interest rate models can be found in the literature. The majority of authors model the interest rate with the help of diffusion processes. Typical short rate models are, for example, the Vasicek model and the Cox-Ingersoll-Ross model. The evolution of forward rates is often modeled within the Heath-Jarrow-Morton framework. Readers may refer to [1] for a comprehensive overview or to the original works, such as [2][3][4]. In the 1990s, these models also found their way into the actuarial literature; see, e.g., [5][6][7][8][9][10][11].
The uncertainty about future mortality probabilities has always drawn the attention of life insurance actuaries. However, only recently have actuaries increasingly been using stochastic models for the mortality rate. A very popular approach is to adopt the well-known models from interest rate theory, such as the Vasicek or the Cox-Ingersoll-Ross model. A more recent development is the concept of forward mortality rates, which are frequently modeled within a Heath-Jarrow-Morton framework, similarly to forward interest rates. References that use diffusion processes are, e.g., [12][13][14][15][16][17][18][19][20][21][22][23][24], to mention just a few. Although negative mortality rates do not have a meaningful interpretation, many of those references use diffusion models where negative values occur with positive probability. In particular, Gaussian diffusion processes are frequently used. This is justified with the argument that Gaussian models would offer a good approximation of more appropriate models.
In the present paper, we study the asymptotics of diffusions with affine drift and a general noise term relative to corresponding diffusions with an affine drift term, but an affine noise term or additive noise. In the additive case, we end up with Gaussian processes, that is, the finite-dimensional projections are multivariate normally distributed. The affine case is a generalization of the additive case and allows for better approximations, but the corresponding distributions are not always analytically tractable. We introduce a reasonable approximation concept and calculate bounds for the absolute moments of the approximation errors. Furthermore, we calculate the error that the approximating processes imply when valuating insurance claims. Our basic notion is to study asymptotics relative to the size of the noise term; the smaller the noise term, the better the approximation. Our results help to better assess the quality of Gaussian and affine approximations for interest and mortality rate models.
After introducing a basic life insurance model in Section 2, we study Gaussian approximations in Section 3 and affine approximations in Section 4 and present theoretical results for the asymptotic behavior. The proofs are put into their own section at the end; see Section 7. In Section 5, we discuss the effect of the Gaussian and affine approximations on the valuation of insurance claims. A numerical illustration is given in Section 6. In Section 8, we summarize our findings.

Basic Life Insurance Framework
Assume that the lifetime of a policyholder is described by a nonnegative random variable T. Time t measures the seniority of the policy (i.e., the time elapsed since policy issue). The policyholder's age at policy issue is denoted as x, so that the age at time t is x + t. Suppose that the distribution function P[T ≤ x + t] of the lifetime T has a representation of the form: where m is the so-called mortality rate (or force of mortality). We consider a general life insurance contract with both survival and death benefits. We write B(t) for the aggregated survival benefits minus premiums on [0, t] and c(t) for the benefit payments that are due in the case of a death at time t. The contract terminates at time ω_x = ω − x < ∞, at the latest, where ω is the maximal value that T can take, possibly infinite.
Let r(t) be the short rate. Then, the discounted sum of future benefits minus premiums for a policyholder alive at time t is given by: In classical life insurance modeling, r and m are integrable functions, B is a right-continuous function with finite variation and c is a bounded function. Therefore, in the classical perspective, r, m, B and c are deterministic, and the prospective reserve at time t in the active state is defined as: and has the representation: In insurance practice, the functions r and m are not perfectly known for the future, but have to be forecasted. In order to capture the corresponding uncertainty, we construct a doubly-stochastic model by assuming that:
• (r(t))_{t≥0} is a stochastic process with continuous paths,
• (m(t))_{t≥0} is a stochastic process with continuous paths,
• (F_t)_{t≥0} is the natural filtration generated by the joint process (r, m).
We can embed the classical model, where r and m are deterministic, into the doubly-stochastic model by interpreting Formula (1) as the conditional distribution function P[T ≤ x + t | (r(t), m(t))_{t≥0}] := P[T ≤ x + t | F_∞]. Doing that for all x + t, Formula (1) uniquely defines a stochastic kernel (A, (r, m)) → P[T ∈ A | (r(t), m(t))_{t≥0} = (r, m)]. The prospective reserve according to Equation (4) is now interpreted as the conditional expectation: (5) Moreover, B and c may here be (F_t)_{t≥0}-adapted stochastic processes as well. For a rigorous definition of the integrals in Equation (4), we assume that B is a semimartingale and c a bounded process. In the following, we work throughout in the doubly-stochastic framework.
Suppose that we are at time s, where the information F_s is available. The forward interest rate, (ρ_s(t))_{t≥s}, is defined via a zero-coupon bond, i.e., as the F_s-measurable solution of: where the underlying probability measure here represents some market measure. In the present paper, we suppose throughout that the underlying probability measure is given by the context of the application and that, in the following, all expectations are calculated on the basis of that probability measure. For our asymptotic approximation results, the origin of the underlying probability measure will not be relevant.
In [12,13,16,19], a forward mortality rate (µ_s(t))_{t≥s} is defined as the F_s-measurable solution of: In [25], we find a spread rate concept where the forward mortality rate is either defined as the solution of: or as the solution of:

Gaussian Diffusion Approximation
In this section, we discuss in some generality the approximation of diffusions with affine drift and a general noise term by corresponding Gaussian diffusions with an affine drift term and additive noise.
Let X be a diffusion process that satisfies the stochastic differential equation: where ς ∈ [0, 1] is a scaling parameter and: (a) the column vector W(t) is a d-dimensional standard Wiener process, d ∈ N, (b) the mapping (x, t) → α(x, t) x + β(t), which is composed of the bounded and measurable mappings α where ∥·∥ is the Euclidean norm.
These assumptions are sufficient to ensure that the stochastic differential equation (10) has a unique solution with finite moment functions of any order; see Section 4.5 in [26]. As the dt term in Equation (10) can be seen as an affine mapping of X(t), we say that X has an affine drift term. The scaling parameter ς will be the reference parameter for studying asymptotic properties of X. For ς = 0, we obtain the differential equation of the expectation function, E[X(t)], of X(t): To see that this is indeed the differential equation of the expectation function, integrate Equation (10) from zero to t, take expectations (the stochastic integral has zero mean), and then differentiate again. Apart from a few specific choices of σ, the distribution of X is not analytically tractable. Calculations are much easier if σ(X(t), t) is deterministic, since then X satisfies a stochastic differential equation with so-called additive noise, and X is a Gaussian process, i.e., its finite-dimensional projections are multivariate normally distributed. With respect to the mean squared error, the best deterministic substitute for σ(X(t), t) is E[σ(X(t), t)], as the following proposition shows.
Proposition 3.1. Let the process X* be defined by: and for a bounded and measurable function S : [0, ω] → R^{1×d}, let Y be a process defined by: Then, for all choices of y_0 and S, we have:
Proof. By applying Itô's formula: and taking expectations on both sides, we obtain: The latter integral equation has the unique solution: The right-hand side is minimal if x_0 equals y_0 and S(s) equals E[σ(X(s), s)], since:
In the insurance applications that we have in mind, the distribution of X is difficult to obtain, and so, we also have difficulties in calculating E[σ(X(t), t)]. Therefore, we suggest approximating X by substituting σ(X(t), t) with σ(E[X(t)], t) instead of E[σ(X(t), t)], which leads to a Gaussian diffusion X̄, defined by: Note that the expectation functions of X and X̄ are equal and that they can be easily obtained from Equation (11). If the difference between X(t) and E[X(t)] is small, then X̄ should be a good approximation for X. The fluctuation of X(t) − E[X(t)] is small whenever ς is small, because a small ς means that we have only little noise. The following theorem gives an asymptotic result on the absolute moments of the approximation error X(t) − X̄(t).
Theorem 3.2. Assume that the properties (a) to (d) hold. Then, for each (k, l) ∈ N_0^2, there exists a constant C_{kl} < ∞ in such a way that: For the proof, see Section 7.
From this theorem, we learn that approximating X(t) by its expectation function, E[X(t)], leads to a mean absolute error of order ς^1, and approximating X(t) by the corresponding Gaussian process, X̄(t), leads to a mean absolute error of order ς^2. The same is true for the n-th roots of the n-th order moments for n ∈ N. The following example shows that the order of convergence cannot be generally improved.
Example 3.3. Let α(t) = 1/2 and σ(t, x) = x. Following the ideas of the proof of Theorem 3.2, we can show that: This integral equation has the unique solution: The integral on the right-hand side is positive for each t, so the left-hand side has exactly the order ς^2.
A natural question is whether approximating X by X* instead of X̄ leads to a faster convergence than ς^2. In the previous example, X* and X̄ are equivalent, so order ς^2 also applies for X*. Hence, in general, the optimal approximation X* does not converge faster than the approximation X̄.
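The orders in Example 3.3 can be checked numerically. The following Python sketch assumes, beyond what the example states, that β = 0, d = 1, x_0 = 1 and t = 1. Under these assumptions, X is a geometric Brownian motion and the Gaussian approximation has the closed form E[X(t)] (1 + ς W(t)), so no time discretization is needed. The function name mean_abs_errors and all parameter values are illustrative choices, not taken from the paper.

```python
import numpy as np


def mean_abs_errors(scale, t=1.0, x0=1.0, n_samples=200_000, seed=0):
    """Monte Carlo estimates of E|X - E[X]| and E|X - X_gauss| at time t.

    Assumed setting (Example 3.3 with beta = 0, d = 1):
        dX       = 0.5 * X dt       + scale * X dW        (reference process)
        dX_gauss = 0.5 * X_gauss dt + scale * E[X(t)] dW  (Gaussian approx.)
    """
    rng = np.random.default_rng(seed)
    w = np.sqrt(t) * rng.standard_normal(n_samples)  # W(t) ~ N(0, t)
    mean = x0 * np.exp(0.5 * t)                       # E[X(t)]
    # Exact solution of the geometric Brownian motion.
    x = x0 * np.exp((0.5 - 0.5 * scale**2) * t + scale * w)
    # Exact solution of the linear SDE with additive noise.
    x_gauss = mean * (1.0 + scale * w)
    return np.mean(np.abs(x - mean)), np.mean(np.abs(x - x_gauss))


if __name__ == "__main__":
    for scale in (0.4, 0.2, 0.1):
        err_mean, err_gauss = mean_abs_errors(scale)
        print(f"scale={scale}: E|X-E[X]|={err_mean:.5f}, "
              f"E|X-X_gauss|={err_gauss:.5f}")
```

Halving ς should roughly halve the first error and quarter the second, in line with the orders ς^1 and ς^2 stated after Theorem 3.2. The same seed is reused for every ς so that the comparison is not blurred by sampling noise.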
All in all, we can conclude that in the case that interest and mortality rates are diffusion processes of the form of Equation (10) and the noise term is 'small', the corresponding Gaussian diffusion, Equation (19), approximates the moment functions quite well. In the proof of Theorem 3.2, the estimates for the constants C_{k,l} increase with an increasing time interval [0, ω], function α, Lipschitz constant K and order k of the absolute moments. The following corollary states some bounds explicitly.
Corollary 3.4. Under the assumptions of Theorem 3.2, for all t ∈ [0, ω] and ς ∈ [0, 1], we have: For the proof, see the details of the proof of Theorem 3.2.

Affine Diffusion Approximation
In the previous section, we simplified the diffusion process X by making the factor σ(X(t), t) in the noise term of X deterministic. In this section, we study more general approximations by substituting σ(X(t), t) with the first-order Taylor approximation: Given that the partial derivative ∂_x σ exists, we end up with the diffusion process: As σ_lin is an affine mapping in X̃(t), we say that X̃ has an affine noise term, and we call X̃ an affine diffusion approximation of X (recall that the drift term of X̃ is also affine). Note that the expectation functions of X and X̃ are equal and that they can be easily obtained from Equation (11). The following theorem shows that X̃ approximates X faster than X̄ approximates X.
Theorem 4.1. Under the assumptions (a) to (d) in Section 3 and given that ∂_{xx} σ(x, t) exists and is bounded by K, for each (k, l) ∈ N_0^2, there exists a constant, C_{kl} < ∞, in such a way that: For the proof, see Section 7.
According to the previous section, approximating X(t) by the Gaussian process X̄(t) leads to a mean absolute error of order ς^2. Theorem 4.1 shows that approximating X(t) by X̃(t) leads to a mean absolute error of order ς^3. The same is true for the n-th roots of the n-th order moments for n ∈ N.
Corollary 4.2. Under the assumptions of Theorem 4.1, for all t ∈ [0, ω] and ς ∈ [0, 1], we have:
While X̄ has a Gaussian distribution whose mean and variance can be easily calculated, the distribution of X̃ must often be calculated numerically. In applications, the user has to decide between a high speed of convergence and analytical tractability.
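To make this trade-off concrete, the sketch below simulates X, its Gaussian approximation and its affine approximation with an Euler-Maruyama scheme driven by the same Brownian increments. The coefficients α = −0.5, β = 1, σ(x, t) = sqrt(1 + x^2), the starting value and all numerical settings are hypothetical choices made for illustration (σ is chosen to be Lipschitz with a bounded second derivative); they are not the model of the paper's numerical section.

```python
import numpy as np

# Hypothetical coefficients of the affine drift alpha * x + beta.
ALPHA, BETA, X0 = -0.5, 1.0, 1.0


def sigma(x):
    """Hypothetical noise coefficient sigma(x, t) = sqrt(1 + x^2)."""
    return np.sqrt(1.0 + x**2)


def dsigma(x):
    """Partial derivative of sigma with respect to x."""
    return x / np.sqrt(1.0 + x**2)


def simulate_errors(scale, horizon=5.0, n_steps=500, n_paths=20_000, seed=0):
    """Return mean absolute errors at time `horizon` of three approximations
    of X: the expectation function, the Gaussian and the affine diffusion."""
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    x = np.full(n_paths, X0)   # reference diffusion X
    x_gauss = x.copy()         # additive noise sigma(E[X(t)])
    x_aff = x.copy()           # affine noise sigma_lin(x_aff)
    mean = X0                  # E[X(t)], solves m' = alpha * m + beta
    for _ in range(n_steps):
        dw = np.sqrt(dt) * rng.standard_normal(n_paths)
        s0, s1 = sigma(mean), dsigma(mean)
        sigma_lin = s0 + s1 * (x_aff - mean)  # first-order Taylor expansion
        x_next = x + (ALPHA * x + BETA) * dt + scale * sigma(x) * dw
        g_next = x_gauss + (ALPHA * x_gauss + BETA) * dt + scale * s0 * dw
        a_next = x_aff + (ALPHA * x_aff + BETA) * dt + scale * sigma_lin * dw
        mean = mean + (ALPHA * mean + BETA) * dt
        x, x_gauss, x_aff = x_next, g_next, a_next
    return (np.mean(np.abs(x - mean)),
            np.mean(np.abs(x - x_gauss)),
            np.mean(np.abs(x - x_aff)))


if __name__ == "__main__":
    for scale in (0.4, 0.2, 0.1):
        print(scale, simulate_errors(scale))
```

For small ς, the three errors should be ordered as the theory suggests: largest for the expectation function, smaller for the Gaussian approximation and smallest for the affine approximation. Only the Gaussian process has a distribution that is available in closed form; the affine process is simulated here in the same way as X itself.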

Approximation Error When Valuating Insurance Claims
Suppose that the spot rates r and m and the forward rates ρ_s and µ_s are diffusion processes of the form of Equation (10) and that we approximate them by Gaussian or affine diffusions. Then, the approximation error for the corresponding moment functions can be described with the help of the results in the two previous sections. However, in insurance practice, it is more important to know the approximation error that we obtain in the Formulas (4) to (9). All of those valuation formulas are based on: Therefore, we now study the mean approximation error in these two terms if the Gaussian and the affine diffusion approximation are used.
Theorem 5.1. Assume that X satisfies (10) with properties (a) to (d) and that X(t) is nonnegative for all t ≥ 0. Then, there exists a constant, C, such that: and: is finite, then we also have: For the proof, see Section 7.
From this theorem, we learn that the Gaussian and the affine diffusion approximation lead to valuation errors of order ς^2 and ς^3 when valuating typical insurance claims. Approximating X by its expectation function is also an option, but the speed of convergence is just ς^1.
Remark 5.2. In the present paper, we only study two-state life insurance with the states 'active' and 'dead'. However, our results can be easily generalized to multi-state life insurance whenever the corresponding transition probabilities have explicit representations with building blocks of the form of (31). This is typically the case for Markovian state processes with a hierarchical transition matrix, i.e., there are no recurrent states.
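The effect on a discount-type factor exp{−∫_0^T X(u) du}, the kind of term that the proof of Theorem 5.1 treats via a Taylor expansion of x → exp{−x}, can be illustrated with the setting of Example 3.3. The Python sketch below again assumes β = 0, d = 1 and x_0 = 1, so that X is a positive geometric Brownian motion with closed-form paths; the horizon T = 1, the trapezoidal quadrature and the function name discount_errors are illustrative choices and not part of the paper.

```python
import numpy as np


def discount_errors(scale, horizon=1.0, n_steps=100, n_paths=10_000,
                    x0=1.0, seed=0):
    """Mean absolute errors of exp(-int_0^T X du) when X is replaced by
    (i) its expectation function and (ii) its Gaussian approximation."""
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    t = np.linspace(0.0, horizon, n_steps + 1)
    # Brownian paths on the grid, starting at W(0) = 0.
    dw = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
    w = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dw, axis=1)],
                       axis=1)
    mean = x0 * np.exp(0.5 * t)                               # E[X(t)]
    x = x0 * np.exp((0.5 - 0.5 * scale**2) * t + scale * w)   # exact X
    x_gauss = mean * (1.0 + scale * w)                        # Gaussian approx.

    def discount(path):
        # Trapezoidal rule for the time integral along the last axis.
        integral = dt * (path.sum(axis=-1)
                         - 0.5 * (path[..., 0] + path[..., -1]))
        return np.exp(-integral)

    d, d_gauss, d_mean = discount(x), discount(x_gauss), discount(mean)
    return np.mean(np.abs(d - d_mean)), np.mean(np.abs(d - d_gauss))


if __name__ == "__main__":
    for scale in (0.4, 0.2, 0.1):
        print(scale, discount_errors(scale))
```

When ς is halved, the error of the expectation approximation should shrink by a factor of about two and the error of the Gaussian approximation by a factor of about four, which matches the orders ς^1 and ς^2 discussed after Theorem 5.1. Note that the Gaussian paths can become negative with positive probability, whereas the reference process stays positive.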

Numeric Illustration
We now give a numeric example that illustrates the asymptotics of the Gaussian and the affine diffusion approximation. Let X be a diffusion process that satisfies the stochastic differential equation: Since X starts at x_0 = 1 and has a positive drift whenever X(t) ≤ 4 and lim_{x↓0} σ(t, x) = 0, the process X is always nonnegative. The corresponding expectation function equals:
Figure 1. The mappings x → σ(t, x) in green, x → σ(t, E[X(t)]) in red and x → σ_lin(t, x) in blue for t = 0 (left) and t = 10 (right).
Figure 1 shows the mappings x → σ(t, x) (green curve), x → σ(t, E[X(t)]) (red curve) and x → σ_lin(t, x) (blue curve), for t = 0 and t = 10. We see that σ(t, E[X(t)]) and σ_lin(t, x) approximate σ(t, x) quite well only for x close to E[X(t)]. Figure 2 shows a simulated path of X (green curve), the Gaussian approximation X̄ (red curve) and the affine diffusion approximation X̃ (blue curve). For all three processes, we used the same random numbers; that is, all paths correspond to the same path of the Brownian motion, W. The smaller ς, the closer the processes are to the expectation function, t → E[X(t)]. Figure 3 shows the relative differences to the reference process X. Recall that the mean absolute difference between X(t) and E[X(t)] is of order ς^1, that the mean absolute difference between X(t) and X̄(t) is of order ς^2 and that the mean absolute difference between X(t) and X̃(t) is of order ς^3. Indeed, E[X(t)] has the greatest approximation error here. For ς = 1 and ς = 0.75, the errors of the Gaussian and the affine diffusion approximations have about the same magnitude, but for ς = 0.25, we can see a clear advantage for the affine approximation, because of its faster speed of convergence.
Table 1 shows the corresponding value of: and the corresponding values of the expectation approximation, Gaussian approximation and affine approximation: The factor 0.01 was added, because interest and mortality rates typically have magnitudes that are in the percent range. The expectation approximation clearly produces the greatest error for all sizes of ς. Interestingly, already for ς = 0.75, the error of the Gaussian and the affine approximation is quite small. As predicted by the theorems in the previous sections, the smaller ς, the more the faster convergence speed of X̃ prevails and the greater the advantage of the affine approximation.
Table 2 shows Monte Carlo simulations for the expectations of (40) and (41) and for the mean absolute deviation of (41) from (40). We used 10,000 simulations. The affine approximation has the smallest mean absolute deviations, followed by the Gaussian approximation. This conforms with our theoretical results from the previous section. Already for ς = 0.75, the Gaussian and the affine approximation for the expectation of (40) work quite well. Surprisingly, for ς = 0.5 and ς = 0.25, the Gaussian approximation seems to be closer to the true value than the affine approximation, but that may well be a random effect from the Monte Carlo simulation.

Proofs
Proof of Theorem 3.2. The twice continuously differentiable function: is for any ε > 0 a majorant of x → |x| on R. Thus, property (20) holds if for an ε > 0, there exists a constant, C_{k,l,ε} < ∞, with: In contrast to x → |x|, the function g_ε is differentiable at zero, which allows us to apply Itô's formula. We prove the theorem by induction. For (k, l) = (0, 0), we can choose C_{0,0} = 1. The order of the induction steps is subtle here. First, the induction arguments are presented; then, we explain the order of the induction steps. In order to avoid case distinctions, we define <z> := max{z, 0} and E^{k,l}_t := 0 for all (k, l) ∈ Z^2 \ N_0^2. Applying Itô's formula, we get: Taking expectations on both sides makes the last integral zero, since its integrand is square integrable (to see that, apply Theorem 4.5.4 in [26] and Hölder's inequality). Since: the expectation of the left-hand side has for k ≠ 1 and l ≠ 1 an upper bound of: Hence, since we assumed that ς ≤ 1, we obtain: If inequality (20) holds for {(k − 2, l + 2), (k − 1, l + 1), (k − 1, l), (k, l − 2)}, then we can show that: For ε → 0 and with Gronwall's inequality (cf. Lemma 4.5.1 in [26]), we get: for all (k, l) ∈ N_0^2 with k ≠ 1 and l ≠ 1. Let now k = 1 and/or l = 1. Then, analogously to the above, but additionally using that 0 ≤ g''_ε(x) ≤ 3/(2ε) for x ∈ R, we can show that: if inequality (20) holds for {(k − 2, l + 2), (k − 1, l + 1), (k − 1, l), (k, l − 2)}, where δ_{ij} is defined as one if i equals j and zero otherwise. Setting ε = ς^{2k+l} and applying Gronwall's inequality, we can conclude that: whenever k = 1 and/or l = 1. By using Inequalities (49) and (51), we do the induction steps along the lines: {(0, 0), (0, 1), (0, 2), ...}, {(1, 0), (1, 1), (1, 2), ...}, {(2, 0), (2, 1), (2, 2), ...} and so forth, which ensures that being at (k, l), we already have been at:
Proof of Theorem 4.1. The proof is completely analogous to the proof of Theorem 3.2, apart from the fact that the terms: have to be substituted by the terms: ∥σ(X(t), t) − σ_lin(X̃(t), t)∥^2, (σ(X(t), t) − σ_lin(X̃(t), t)) σ_lin(X̃(t), t)^T, ∥σ_lin(X̃(t), t)∥^2 (53) and that the upper estimates for ∥σ(X(s), s) − σ(X(s), s)∥ and ∥σ(X(t), t)∥ have to be replaced by the upper estimates: for ξ(t) between X(t) and X(t), and: since |∂_x σ(x, t)| ≤ K, because of the Lipschitz continuity of x → σ(x, t). Furthermore, we have to set ε = ς^{3k+l} instead of ε = ς^{2k+l}. The order of the induction steps stays the same.
Proof of Theorem 5.1. Taylor's theorem applied to the function x → exp{−x} yields: for k ∈ N and a proper ξ between zero and h. Setting k = 0, for ξ between zero and −∫_s^t (X(u) − E[X(u)]) du. Since X is nonnegative, the random variable ξ is bounded by ∫_s^t E[X(u)] du. Applying Theorem 3.2, we obtain (32). Applying inequality (32) and using the fact that the continuous function t → E[X(t)] is bounded on [0, ω], inequality (34) is equivalent to having an upper bound of order ς^1 for: Since X is nonnegative, we can bound the previous term by: Thus, we showed inequality (34).
Applying Taylor's formula with k = 1, we get: Analogously, we can show that: for ξ not greater than the absolute value of ∫_s^t (X̄(u) − E[X(u)]) du. Taking the difference of the two equations, we obtain: According to Theorem 3.2, the first addend on the right-hand side is of order ς^2. Since exp{ξ} has a deterministic upper bound and: the second addend on the right-hand side of inequality (63) is also of order ς^2. Using Hölder's inequality, the third addend on the right-hand side of inequality (63) has an upper bound of: because of Theorem 4.1. Note that E[exp{2ξ}] has a finite upper bound, since the absolute value of ξ is bounded by the absolute value of ∫_s^t (X̄(u) − E[X(u)]) du, where the latter term has a normal distribution, as the process X̄ satisfies a stochastic differential equation with additive noise. Hence, inequality (33) holds.
Applying inequality (33) yields that inequality (35) is equivalent to having an upper bound of order ς^2 for: With Taylor's approximation for k = 0, the previous term has an upper bound of: Using the fact that exp{ξ} has a deterministic upper bound and by applying Hölder's inequality, the first addend has an upper bound of: and the second addend has an upper bound of: Thus, we showed inequality (35).
Now, we look at inequality (36), which we prove analogously to inequality (33). Applying Taylor's formula for k = 2, we obtain: According to Theorem 3.2, the first addend on the right-hand side is of order ς^3. With the help of the binomial theorem and Hölder's inequality, the second addend has an upper bound of: The first factor is of order ς^3, while the second factor has a finite upper bound. Similarly to the estimates for the second and third addend on the right-hand side of inequality (63), we can show that the third and fourth addend on the right-hand side of inequality (73) have upper bounds of order ς^3. Thus, we showed inequality (36).
Applying inequality (36) yields that inequality (37) is equivalent to having an upper bound of order ς^3 for: With Taylor's approximation for k = 1, the previous term has an upper bound of: The first addend is of order ς^3. The second addend has an upper bound of: which has an order of ς^4. To see that, apply Hölder's inequality, Theorem 3.2 and Theorem 4.1. The fourth and fifth addend of the term above have an order of ς^3, which can be verified similarly to the estimates for the second and third addend on the right-hand side of inequality (63).

Conclusions
Because of their nice analytical features, stochastic models for interest and mortality rates in the actuarial literature often have the form of Gaussian diffusions with an affine drift term and additive noise. The Gaussian diffusion framework is justified with the argument that it would reasonably approximate the true development of interest and mortality rates. We studied the approximation errors that arise if appropriate models for interest and mortality rates have the form of diffusions with affine drift, but a general noise term. We calculated theoretical bounds for absolute moments and valuation formulas, showing the speed of convergence. A numerical study illustrates the approximation error, confirming the theoretical results. Our results indicate that approximation errors are reasonable if the noise term is not too large. In particular, that might be the case when modeling the mortality intensity, since demographic changes of mortality typically happen slowly, but steadily.
We generalized the Gaussian diffusion approximation with additive noise by introducing an affine diffusion approximation with an affine noise term. We found that with an affine noise term instead of additive noise, the speed of convergence can be improved from ς^2 to ς^3. However, we lose the nice analytical features of the Gaussian approach.
All in all, we identified a large class of diffusion processes that can be well approximated by Gaussian diffusions with additive noise or by affine diffusions with affine noise terms. As a consequence, we can say that modeling interest and mortality rates by Gaussian diffusions is well justified if there is evidence that the true developments of future interest and mortality rates have the form of diffusion processes with an affine drift term and an arbitrary, but small, noise term.
In the present paper, we measured approximation quality by point-wise moment errors and by mean absolute errors for some specific insurance claims. Future research should also consider other approximation measures.

Table 1. One-path simulation for (40) and its corresponding expectation approximation, Gaussian approximation and affine approximation.

Table 2. Monte Carlo simulation for the expectations of (40) and (41) and for the mean absolute deviation of (41) from (40).