THE STABILITY OF THE AGGREGATE LOSS DISTRIBUTION

Abstract: In this article we introduce the stability analysis of a compound sum: it consists in computing the standardized variation of the survival function of the sum resulting from an infinitesimal perturbation of the common distribution of the summands. Stability analysis is complementary to the classical sensitivity analysis, which consists in computing the derivative of an important indicator of the model, with respect to a model parameter. We obtain a computational formula for this stability from the saddlepoint approximation. We apply the formula to the compound Poisson insurer loss with gamma individual claim amounts and to the compound geometric loss with Weibull individual claim amounts.


Introduction
This article presents a computational formula for the stability of the survival function (s.f.) of the compound sum of independent and identically distributed (i.i.d.) random variables that are independent of their summation index. The compound sum typically represents the insurer total claim amount during a fixed period (e.g. a year): the i.i.d. random variables are the individual claim amounts and the number of claims within the period is a counting random variable or a counting stochastic process, if we let the period length vary. We define the stability of a sum as the standardized variation of the s.f. of the sum resulting from an infinitesimal perturbation at some point x ∈ R of the distribution of the summands.
More precisely, let $\Delta_x$ denote the Dirac distribution function (d.f.) over R with mass one at x (thus jumping from level 0 to level 1 at point x). If F denotes the d.f. of the summands, then

$$F_{x\varepsilon} = (1 - \varepsilon) F + \varepsilon \Delta_x \qquad (1)$$

is the ε-perturbation of F at x, for any choice of ε ∈ [0, 1]. The derivative of the s.f. of the sum under $F_{x\varepsilon}$ with respect to (w.r.t.) ε, evaluated at ε = 0, is the s.f. stability (s.f.s.) at the perturbation point x.
This concept differs from the one of sensitivity of queueing theory or risk theory, which is defined as the derivative of the s.f. of the sum w.r.t. a parameter of F; cf. e.g. Asmussen and Albrecher (2010), Section IV.9. From an abstract point of view, a parametric model spans only a low-dimensional or narrow subset of the space of probability distributions. Such a narrow subset is indeed beneficial to statistical data reduction, but often does not contain all realistic perturbations of the assumed model.
In this sense, the sensitivity is a limited indicator of the model stability. Allowing for perturbations in all possible directions provides a more complete or realistic analysis of the model stability. In this sense, our concept of stability is preferable. This concept is in fact an important idea of robust statistics, see e.g. Hampel et al. (1986). Mathematically, the quantity of interest of a stochastic model is regarded as a functional and a functional derivative is computed. This approach is used for example in renewal theory by Grübel (1989), where the renewal function is a functional of the lifetime distribution, or by Politis (2006), for the probability of ruin of the risk process.
Practically, for a given actuarial aggregate loss model in the form of a compound sum, if a stability of low magnitude results from the perturbation of a new large individual claim amount (viz. a large value of x), then the loss model is reliable under perturbations through extremely large claims. In the context of uncertainty (where for example catastrophic events are not incorporated in the model), this notion of stability appears practically relevant. The s.f.s. informs the risk manager about the variation of the upper tail probability of the aggregate loss when an uncertain large claim amount is considered.
Still from the practical point of view, the sensitivity as described above has the alternative role of identifying important model parameters: the most significant ones have large sensitivity values. But this interpretation holds only when the model is really the correct one (which is often not simple to establish). Of course, both sensitivity and stability analyses can be carried out simultaneously.

Field and Ronchetti (1985) considered this type of stability for the sample mean and called it "tail area influence function". Their applications concerned statistical testing. They computed the tail area influence function with the saddlepoint approximation of Daniels (1954). This article generalizes this approximation to the stability of the compound sum and suggests using this concept in risk management. The new formula is easy and fast to compute. A numerical illustration for the total claim amount with gamma individual claim amounts and Poisson number of claims is provided.
Most methods for computing sensitivities rely on Monte Carlo simulation; see e.g. Asmussen and Rubinstein (1999) and Asmussen and Glynn (2007), Section VII. One exception is Gatto and Peeters (2015), who propose evaluating the sensitivity of the s.f. of the random sum w.r.t. the parameter of the summation index distribution (which is either Poisson or geometric) with the saddlepoint approximation. Gatto and Peeters (2015) show numerically that the sensitivities obtained by the saddlepoint approximation and by simulation with importance sampling are very close, even though importance sampling is computationally intensive. The high accuracy of the saddlepoint approximation is well illustrated in the literature of statistics and applied probability; refer e.g. to Jensen (1995) or to Gatto and Mosimann (2012) in the context of risk theory.
The remainder of this article is organized as follows. Section 2 provides the approximations to the s.f.s. based on the saddlepoint approximation. Section 2.1 considers the deterministic sum and Section 2.2 the compound sum, viz. the insurer aggregate claim amount. Section 3 provides numerical illustrations. Section 3.1 considers the aggregate claim amount with Poisson distributed number of claims and gamma distributed individual claim amounts. In Section 3.2, the number of claims follows the geometric distribution and the individual claim amounts follow the Weibull distribution. Some related long derivatives are provided in the Appendix.

Saddlepoint approximation to the stability
This section has two parts: in Section 2.1 an approximation to the s.f.s. of the deterministic sum is derived. It corresponds to the formula of Field and Ronchetti (1985), although the derivation is different. Section 2.2 generalizes the formula to the compound sum, which is an essential quantity of risk theory.

The sum
Let $X_1, \ldots, X_n$ be independent random variables with d.f. $F$, moment generating function (m.g.f.) $M$ and cumulant generating function (c.g.f.) $K = \log M$. Define the sample mean by $\bar X_n = \sum_{i=1}^{n} X_i / n$.
The Legendre-Fenchel transform (or convex conjugate or large deviations index) of the c.g.f. $K$ at point $t \in R$ is given by

$$K^*(t) = \sup_{v \in \mathrm{dom}\, K} \{ v t - K(v) \}, \qquad (2)$$

where dom $\varphi = \{x \in R \mid |\varphi(x)| < \infty\}$ is the domain of definition of $\varphi : R \to R$. The transform $K^*$ is clearly nonnegative. One can show that it is convex and that it attains its minimum at $\mu = E[X_1]$, when the expectation exists. Assume that the supremum in (2) is attained at $v_t \in$ int dom $K$. This condition is satisfied without restrictions on $t$ when $F$ is light-tailed, in the sense of having exponentially decaying tails. Under this assumption, $v_t$ solves w.r.t. $v$ the equation

$$K'(v) = t, \qquad (3)$$

and convexity tells that it is the unique solution. It is called the saddlepoint at $t$ and $K^*(t) = v_t t - K(v_t)$. Denote by $\bar H_n(t) = P[\bar X_n \ge t]$ the s.f. of the sample mean $\bar X_n = \sum_{j=1}^{n} X_j / n$. For $t > \mu$, the large deviations approximation is

$$\bar H_n(t) = \exp\{ -n K^*(t) [1 + o(1)] \}, \quad \text{as } n \to \infty. \qquad (4)$$

Although (4) is a large deviations approximation, the asymptotics is in the logarithmic scale of the probability.
This article is based on the saddlepoint approximation of Lugannani and Rice (1980), because it is known to provide a very accurate approximation to the s.f. $\bar H_n(t)$. It has bounded relative error on the probability scale, instead of the logarithmic scale. From now on, we assume that $F$ is absolutely continuous. Under this additional assumption, Lugannani and Rice's approximation to $\bar H_n(t)$ at $t \ne \mu$ is given by

$$\bar G_n(t) = 1 - \Phi(n^{1/2} r) + \phi(n^{1/2} r)\, n^{-1/2} \left( \frac{1}{s} - \frac{1}{r} \right), \qquad (5)$$

with

$$s = v_t \{ K''(v_t) \}^{1/2} \quad \text{and} \quad r = \mathrm{sgn}(v_t) \{ 2 [ v_t t - K(v_t) ] \}^{1/2}, \qquad (6)$$

where $\phi$ and $\Phi$ are the standard normal density and d.f., respectively. The relative error of approximation (5) is $O(n^{-1})$, as $n \to \infty$. For comparison, (4) re-expressed in terms of the new variable $r$ leads to the quite dissimilar approximation to $\bar H_n(t)$ given by $\sqrt{2\pi}\, \phi(n^{1/2} r)$.
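As a hedged illustration of the Lugannani and Rice formula, the following minimal sketch treats the sample mean of n unit-rate exponential variables, a case chosen here only because the saddlepoint is explicit (K(v) = −log(1 − v), so K'(v) = t gives v_t = 1 − 1/t) and the exact s.f. is available through the Erlang formula. The function names are hypothetical and the example is not taken from the article.

```python
import math

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_sf(z):
    # 1 - Phi(z), computed with the complementary error function
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def lugannani_rice_exp_mean(t, n):
    """Saddlepoint approximation to P[mean of n Exp(1) >= t], for t > 0, t != 1."""
    v = 1.0 - 1.0 / t                    # saddlepoint: solves K'(v) = 1/(1 - v) = t
    k = -math.log(1.0 - v)               # K(v)
    k2 = 1.0 / (1.0 - v) ** 2            # K''(v)
    r = math.copysign(math.sqrt(2.0 * (v * t - k)), v)
    s = v * math.sqrt(k2)
    z = math.sqrt(n) * r
    return std_normal_sf(z) + std_normal_pdf(z) * (1.0 / s - 1.0 / r) / math.sqrt(n)

def exact_exp_mean_sf(t, n):
    """Exact P[Gamma(n, 1) >= n t] by the Erlang (Poisson sum) formula."""
    x = n * t
    return math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(n))

if __name__ == "__main__":
    for t in (0.5, 2.0, 3.0):
        print(t, lugannani_rice_exp_mean(t, 5), exact_exp_mean_sf(t, 5))
```

Comparing the two columns printed for a small n such as 5 gives a feeling for the relative accuracy on the probability scale; the point t = 1 (the mean) is excluded because r and s both vanish there.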
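The s.f.s. of $\bar X_n$ at tail value $t$ and perturbation point $x \in R$ is given by the Gâteaux differential

$$\tau_n(t; x, F) = \frac{\partial}{\partial \varepsilon} P_{x\varepsilon}[\bar X_n \ge t] \Big|_{\varepsilon = 0}, \qquad (7)$$

where $P_{x\varepsilon}$ is the probability measure obtained by the replacement of the summand d.f. $F$ by its ε-perturbation $F_{x\varepsilon}$ given in (1). The following result gives an approximation to the s.f.s. obtained from (5).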
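Theorem 1. Under the previous assumptions, the s.f.s. of $\bar X_n$ given in (7), at $t \ne \mu$ and at perturbation point $x \in R$, can be approximated by (8), where $s$ and $r$ are given by (6), $v_t$ is given by (3) and $\dot r_x$ is given by (9). The remainder term in (8) is given by (10).

Proof. The perturbed m.g.f. is

$$M_{x\varepsilon}(v) = (1 - \varepsilon) M(v) + \varepsilon e^{vx} \qquad (11)$$

(because for any Borel function $g : R \to R$, $\int_R g(y)\, d\Delta_x(y) = g(x)$). The perturbed saddlepoint $v_{tx\varepsilon}$ at point $t \in R$ is defined as the solution w.r.t. $v$ of $K_{x\varepsilon}'(v) = t$, where $K_{x\varepsilon} = \log M_{x\varepsilon}$. Note that small perturbations do not affect the sign of $v_t$ when tail probabilities are considered.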
Then, from the multivariate chain rule, Hence we obtain From the multivariate chain rule we obtain, see (A3) of the Appendix. These last two results yield (10).
The leading term of the approximation to the s.f.s. (8) is equal to the formula (3.1) in Field and Ronchetti (1985), which is however not derived from the saddlepoint approximation (5) but from the Laplace approximation to the integral of the saddlepoint approximation to the density of Daniels (1954). In order to check this equality, the correspondences between the two notations can be useful. Thus Theorem 1 provides an alternative derivation of the s.f.s. of Field and Ronchetti (1985) as well as the exact form of the error term. However, numerical studies suggest that it is preferable to use the first order term alone.
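Regarding the sum, let $S_n = \sum_{j=1}^{n} X_j$; then $P[S_n \ge t] = \bar H_n(t/n)$ is its s.f., $\bar G_n(t/n)$ is its saddlepoint approximation, $\tau_n(t/n; x, F)$ is its s.f.s. and the saddlepoint approximation to the latter is given by Theorem 1 with $t$ replaced by $t/n$.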

The compound sum
Let the random variables $X_1, X_2, \ldots$ fulfill the assumptions given in Section 2.1 and let $F$ denote their common d.f. Let $N$ be an independent random variable taking values in $\{0, 1, \ldots\}$ with probability function $p_n = P[N = n]$, for $n = 0, 1, \ldots$. Consider the compound sum

$$Z = \sum_{j=0}^{N} X_j, \qquad (13)$$

where $X_0 = 0$ by definition. Define the indicator $I\{A\}$ as the function equal to 1, if the statement $A$ is true, or equal to 0, if $A$ is false. The s.f. of $Z$ at $t \in R$ can be written as

$$\bar H_Z(t) = P[Z \ge t] = p_0\, I\{t \le 0\} + \sum_{n=1}^{\infty} p_n P\Big[ \sum_{j=1}^{n} X_j \ge t \Big], \qquad (14)$$

which is generally not a computational formula. This section provides the saddlepoint approximation to (14) and then the associated approximation to the s.f.s.
In (14) we see that the distribution of $Z$ is a linear combination of a distribution with mass one at zero and an absolutely continuous distribution. The mass at zero must be eliminated in order to apply the saddlepoint approximation. Denote by $M_N$ and $K_N$ the m.g.f. and the c.g.f. of $N$ and by $K$ the c.g.f. of $X_1$. Then the m.g.f. of $Z$ is $M_Z = M_N \circ K$ and its c.g.f. is $K_Z = K_N \circ K$. Let $Z_0$ be a random variable with the conditional distribution of $Z$ given $N > 0$. Then $\bar H_{Z_0}(t) = P[Z \ge t \mid N > 0]$ and $K_{Z_0}(v) = \log E[e^{vZ} \mid N > 0]$ are the s.f. and the c.g.f. of $Z_0$, respectively. The Legendre-Fenchel transform of the c.g.f. $K_{Z_0}$ at $t \in R$ is given by

$$K_{Z_0}^*(t) = \sup_{v \in \mathrm{dom}\, K_{Z_0}} \{ v t - K_{Z_0}(v) \}. \qquad (15)$$

We assume that the supremum in (15) is attained at $v_t \in$ int dom $K_{Z_0}$. Under this assumption, $v_t$ solves w.r.t. $v$ the equation

$$K_{Z_0}'(v) = t. \qquad (16)$$

The solution $v_t$ is unique and called saddlepoint at $t$. We obtain the saddlepoint approximation to $\bar H_{Z_0}(t)$ at $t \ne E[Z_0]$, denoted $\bar G_{Z_0}(t)$, by (5) with $n = 1$ and with

$$s = v_t \{ K_{Z_0}''(v_t) \}^{1/2} \quad \text{and} \quad r = \mathrm{sgn}(v_t) \{ 2 [ v_t t - K_{Z_0}(v_t) ] \}^{1/2}. \qquad (17)$$

Equation (16) can be re-expressed in terms of $M_N$ and $K$, which gives (18). More explicit expressions of $s$ and $r$ than those in (17) are obtained with (19), see (A6) in the Appendix, and with (20). It follows from

$$\bar H_Z(t) = \bar H_{Z_0}(t) (1 - p_0), \quad \forall t > 0, \qquad (21)$$

that the saddlepoint approximation to $\bar H_Z(t)$ is given by

$$\bar G_Z(t) = (1 - p_0)\, \bar G_{Z_0}(t). \qquad (22)$$

The s.f.s. of $Z$ is the Gâteaux differential

$$\tau_Z(t; x, F) = \frac{\partial}{\partial \varepsilon} P[Z \ge t;\, F_{x\varepsilon}] \Big|_{\varepsilon = 0}, \qquad (23)$$

where $F_{x\varepsilon}$ is given by (1), $\forall x \in R$ and $\varepsilon \in [0, 1]$. The following result gives an approximation to the s.f.s. $\tau_Z(t; x, F)$ obtained from the first order approximation of the s.f.s. of the mean given in Theorem 1.
Theorem 2. Under the previous assumptions, the s.f.s. given in (23), for the s.f. of $Z$ at $t \ne E[N]E[X_1]/(1 - p_0)$ and at perturbation $x \in R$, can be approximated by

$$(1 - p_0)\, \tau_{Z_0}(t; x, F), \qquad (24)$$

where $\tau_{Z_0}(t; x, F) = -\phi(r)\, r \dot r_x / s$, $s$ and $r$ are given by (17), (19) and (20), $v_t$ is given in (18), and $\dot r_x$ is given by (25) and (26).

Proof. This proof is similar to the one of Theorem 1 and so only the main arguments are given. Let $x \in R$ and $\varepsilon \in [0, 1]$. Let us define the perturbed m.g.f. $M_{x\varepsilon}$ as in (11), $K_{x\varepsilon} = \log M_{x\varepsilon}$ and, for $v \in R$, the corresponding perturbed c.g.f. $K_{Z_0 x\varepsilon}(v)$ of $Z_0$. By following the reasoning that led to (12) in the proof of Theorem 1, we obtain the perturbed saddlepoint at point $t \in R$ in terms of a derivative given by (A5) in the Appendix. With (18) it simplifies to an expression involving two derivatives of $K_{x\varepsilon}(v)$, which are respectively given in (A1) and (A2) of the Appendix.
By denoting $r = r(F)$, we find for The multivariate chain rule yields where $\dot K_{Z_0 x} = \partial/\partial\varepsilon\, K_{Z_0 x\varepsilon}(v) |_{\varepsilon = 0}$ is given in (A4) in the Appendix. By joining these last two expressions one obtains (25) in the theorem. Then (26) is obtained from (A4) and (A1) in the Appendix and from (18).
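The s.f.s. of $Z_0$ is given by the Gâteaux differential of $\bar H_{Z_0}(t)$, defined as in (23). Thus it follows from (21) that the s.f.s. of $Z$ equals $(1 - p_0)$ times the s.f.s. of $Z_0$, because $p_0$ does not depend on $F$. This justifies (24) in the theorem.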
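Remark 1. Another approximation to the s.f.s. of $Z$ can be obtained by generalizing the remainder term given in Theorem 1. This yields the approximation at $t \ne E[Z_0]$ given by $-\phi(r) \cdots$

Assume that the individual claim amounts or losses $X_1, X_2, \ldots$ are gamma distributed, with density $f(y) = \beta^{\alpha} / \Gamma(\alpha)\, y^{\alpha - 1} e^{-\beta y}$, $\forall y > 0$, for some parameters $\alpha, \beta > 0$. Let $v < \beta$. The c.g.f. of $X_1$ and its derivatives are given by

$$K(v) = \alpha \log \frac{\beta}{\beta - v}, \quad K'(v) = \frac{\alpha}{\beta - v}, \quad K''(v) = \frac{\alpha}{(\beta - v)^2}.$$

With the number of claims $N$ Poisson distributed with parameter $\lambda > 0$, so that $p_0 = e^{-\lambda}$, the m.g.f. of the aggregate loss $Z = \sum_{j=0}^{N} X_j$ is given by

$$M_Z(v) = \exp\{ \lambda [ M(v) - 1 ] \}, \quad \text{where } M(v) = \Big( \frac{\beta}{\beta - v} \Big)^{\alpha},$$

and so the c.g.f. of $Z_0$, viz. $Z$ given $N > 0$, is given by

$$K_{Z_0}(v) = \log \frac{\exp\{ \lambda M(v) \} - 1}{e^{\lambda} - 1}.$$

With these formulae we can obtain the values of $s$, $r$ and $\dot r_x$ required in Theorem 2. So we can compute the s.f.s. $\tau(t; x, F)$ given in (24).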
For the numerical illustration, we fix λ = 2, α = 2 and β = 3. The results are shown in Figure 1. The dashed curve shows the saddlepoint approximation $\bar G_Z(t)$ to the s.f., see (22), for all relevant values of t. The four solid curves of Figure 1 show the approximation to the stability $\tau_Z(t; x, F)$, for the perturbation points x = 1, 2, 5, 10 and for relevant values of t. The highest curves correspond to the largest values of x, as expected: a large perturbation point x yields a large increase of the upper tail probability, and hence a large value of the stability, whereas a vanishing perturbation point x yields either a small increase or a decrease of the upper tail probability, and hence a small value of the stability. The numerical computation of these curves is very fast, so the proposed approximation to the s.f.s. inherits the well-known computational efficiency of the saddlepoint approximation. A purely numerical technique such as Monte Carlo simulation would be computationally intensive and thus slower. For a practical illustration, consider the following values from the setting of Figure 1: $\bar G_Z(14.75) = 0.0099 \approx 1\%$ and $\tau_Z(14.75; 10, F) = 0.7639$. If the insurer believes that additional claim amounts of x = 10 with small frequency ε = 1‰ have to be considered, then the tail probability of the non-perturbed model would rise by about 8% in relative terms, because 0.0099 + 0.001 · 0.7636 = 0.0107.
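The computation described above can be sketched as follows for the Poisson-gamma model. This is a minimal sketch under stated assumptions, not the author's implementation: the function name is hypothetical, the saddlepoint is found by plain bisection, the c.g.f. of Z_0 is taken as log[(exp{λM(v)} − 1)/(e^λ − 1)] with M the gamma m.g.f., and the product r·ṙ_x in Theorem 2 is replaced by minus the ε-derivative of the perturbed c.g.f. at the saddlepoint, λ(e^{vx} − M(v))/(1 − e^{−λM(v)}), which is an identity derived here from differentiating r²/2 = v_t t − K_{Z_0}(v_t) and not quoted from the article. The rate β = 1 used in the usage lines is likewise an assumption of this sketch, chosen so that t = 14.75 lies in the upper tail.

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_sf(z):
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def poisson_gamma_sp(t, x, lam, alpha, beta):
    """Return (G_Z(t), tau_Z(t; x)) for t > 0 and t != E[Z_0] (hypothetical helper)."""
    def parts(v):
        m = (beta / (beta - v)) ** alpha                   # M(v), gamma m.g.f.
        m1 = alpha * m / (beta - v)                        # M'(v)
        m2 = alpha * (alpha + 1.0) * m / (beta - v) ** 2   # M''(v)
        w = -math.expm1(-lam * m)                          # 1 - exp(-lam * M(v))
        return m, m1, m2, w

    def k1(v):                                             # K'_{Z_0}(v), increasing in v
        _, m1, _, w = parts(v)
        return lam * m1 / w

    # Bracket the root of K'_{Z_0}(v) = t on (-inf, beta), then bisect.
    lo, hi = beta - 1.0, beta - 1e-6
    while k1(lo) > t:
        lo = beta - 2.0 * (beta - lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if k1(mid) < t:
            lo = mid
        else:
            hi = mid
    v = 0.5 * (lo + hi)

    m, m1, m2, w = parts(v)
    a = lam * m
    kz0 = a + math.log(w) - math.log(math.expm1(lam))      # K_{Z_0}(v)
    k2 = lam * m2 / w - (lam * m1) ** 2 * math.exp(-a) / w ** 2   # K''_{Z_0}(v)
    r = math.copysign(math.sqrt(2.0 * (v * t - kz0)), v)
    s = v * math.sqrt(k2)
    p0 = math.exp(-lam)
    g_z = (1.0 - p0) * (norm_sf(r) + phi(r) * (1.0 / s - 1.0 / r))
    # d/d(eps) of the perturbed c.g.f. of Z_0 at eps = 0; equals -r * r_dot_x here.
    kdot = lam * (math.exp(v * x) - m) / w
    tau_z = (1.0 - p0) * phi(r) * kdot / s
    return g_z, tau_z

if __name__ == "__main__":
    # Illustrative parameter values (beta = 1 is an assumption of this sketch).
    for x in (1.0, 2.0, 5.0, 10.0):
        print(x, poisson_gamma_sp(14.75, x, lam=2.0, alpha=2.0, beta=1.0))
```

Each evaluation costs one scalar root search and a few elementary function calls, which is consistent with the remark on computational speed. For small x the returned stability can be negative, in line with the observation that a vanishing perturbation point may decrease the upper tail probability.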

Geometric-Weibull total claim amount
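The suggested approximation is tested with a different aggregate loss model. Assume that the total number of claims $N$ follows the geometric distribution with parameter $\rho \in (0, 1)$, precisely $p_n = P[N = n] = \rho (1 - \rho)^n$, for $n = 0, 1, \ldots$. The m.g.f. of $N$ and its derivatives at $v < -\log(1 - \rho)$ are given by

$$M_N(v) = \frac{\rho}{1 - (1 - \rho) e^v}, \quad M_N'(v) = \frac{\rho (1 - \rho) e^v}{\{1 - (1 - \rho) e^v\}^2}, \quad M_N''(v) = \frac{\rho (1 - \rho) e^v \{1 + (1 - \rho) e^v\}}{\{1 - (1 - \rho) e^v\}^3}.$$

Assume the individual losses $X_1, X_2, \ldots$ follow the Weibull distribution with density $f(y) = \alpha y^{\alpha - 1} \exp\{-y^{\alpha}\}$, $\forall y > 0$, for some $\alpha > 0$. We can easily compute its moments $\mu_k = E[X_1^k] = \Gamma(1 + k/\alpha)$, for $k = 0, 1, \ldots$. The m.g.f. $M(v)$ exists for all $v$ over a neighborhood of zero iff $\alpha \ge 1$. Thus, the Weibull distribution is light-tailed in this sense iff $\alpha \ge 1$. Therefore, for $\alpha \ge 1$, the power series representation $M(v) = \sum_{k=0}^{\infty} \mu_k v^k / k!$ holds for any $v$ within a neighborhood of zero. Moreover, for $v$ in this neighborhood,

$$M^{(j)}(v) = \sum_{k=0}^{\infty} \mu_{k+j} \frac{v^k}{k!}, \quad j = 0, 1, \ldots,$$

with $M^{(0)} = M$. With this, the m.g.f. of the aggregate loss $Z = \sum_{j=0}^{N} X_j$ is given by

$$M_Z(v) = \frac{\rho}{1 - (1 - \rho) M(v)}$$

and the c.g.f. of $Z_0$ can be written as

$$K_{Z_0}(v) = \log \rho + \log M(v) - \log\{ 1 - (1 - \rho) M(v) \}.$$

These formulae allow us to compute $s$, $r$ and $\dot r_x$ of Theorem 2 and thus we can compute the s.f.s. $\tau(t; x, F)$ given in (24).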
The dashed curve indicates the saddlepoint approximation $\bar G_Z(t)$ to the s.f., cf. (22), for all relevant values of t. The four solid curves of Figure 2 show the approximation to the s.f.s. $\tau_Z(t; x, F)$, for the perturbation points x = 1/2, 3/2, 3, 7 and for relevant values of t. The highest curves correspond to the largest values of x. The numerical evaluation of the above series representations of the m.g.f. and c.g.f. does not pose any particular problem: numerical convergence is obtained after only a few summands.
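The series evaluation mentioned above can be illustrated by a minimal sketch, assuming α ≥ 1 and the moments μ_k = Γ(1 + k/α) of the Weibull density stated in this section. The function names and the truncation length are hypothetical choices, and the expression used for the c.g.f. of Z_0, log ρ + log M(v) − log{1 − (1 − ρ)M(v)}, is derived here from M_{Z_0} = (M_Z − p_0)/(1 − p_0) with p_0 = ρ.

```python
import math

def weibull_mgf(v, alpha, terms=100):
    """Truncated series sum_k Gamma(1 + k/alpha) v^k / k! (assumes alpha >= 1)."""
    if v == 0.0:
        return 1.0
    total = 0.0
    log_abs_v = math.log(abs(v))
    for k in range(terms):
        # log of mu_k |v|^k / k!, built with lgamma to avoid overflow
        log_term = math.lgamma(1.0 + k / alpha) - math.lgamma(k + 1.0) + k * log_abs_v
        sign = -1.0 if (v < 0.0 and k % 2 == 1) else 1.0
        total += sign * math.exp(log_term)
    return total

def geometric_weibull_kz0(v, rho, alpha):
    """C.g.f. of Z_0 for geometric N and Weibull claims (derived form, see lead-in)."""
    m = weibull_mgf(v, alpha)
    q = (1.0 - rho) * m
    if q >= 1.0:
        raise ValueError("v is outside the domain of the c.g.f. of Z_0")
    return math.log(rho) + math.log(m) - math.log(1.0 - q)

if __name__ == "__main__":
    # Partial sums stabilize after a few terms for moderate v when alpha = 2.
    for n_terms in (5, 10, 20, 40):
        print(n_terms, weibull_mgf(1.0, 2.0, terms=n_terms))
```

For α = 2 the series can be checked against the closed form 1 + (√π/2) v e^{v²/4} (1 + erf(v/2)), obtained by completing the square in the integral defining the m.g.f.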
We note that the numerical results are similar in nature to the ones of the Poisson-gamma aggregate loss of Section 3.1. Also, as with the Poisson-gamma model, the approximate s.f.s. can be computed very fast. Thus it can be conveniently applied to practical problems and it provides an additional indicator of reliability of the model under uncertainty.

Appendix
This appendix provides various elementary but long derivatives appearing in the previous developments.
Appendix A.1 Derivatives of the cumulant generating function of the perturbed summand

Recall that $M$ and $K$ denote the m.g.f. and the c.g.f. of $X_1$. This section gives some derivatives of $K(v)$ under the ε-perturbation, viz. of $K_{x\varepsilon}(v) = \log((1 - \varepsilon) M(v) + \varepsilon e^{vx})$, w.r.t. $v$ and $\varepsilon$. The results are the following:
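As a hedged numerical check of one such derivative, the sketch below differentiates the perturbed c.g.f. K_xε(v) = log((1 − ε)M(v) + εe^{vx}) w.r.t. ε at ε = 0, which by elementary calculus gives e^{vx}/M(v) − 1, and compares it with a finite difference. The function names are hypothetical, the gamma m.g.f. with α = 2 and β = 3 is used only as an illustrative choice of M, and no claim is made about which appendix equation number this derivative corresponds to.

```python
import math

def perturbed_cgf(v, eps, x, mgf):
    """K_{x eps}(v) = log((1 - eps) M(v) + eps exp(v x))."""
    return math.log((1.0 - eps) * mgf(v) + eps * math.exp(v * x))

def cgf_eps_derivative(v, x, mgf):
    """Closed form of d/d(eps) K_{x eps}(v) at eps = 0: exp(v x) / M(v) - 1."""
    return math.exp(v * x) / mgf(v) - 1.0

def gamma_mgf(v, alpha=2.0, beta=3.0):
    # Gamma m.g.f. for v < beta; parameter values are illustrative assumptions.
    return (beta / (beta - v)) ** alpha

if __name__ == "__main__":
    v, x, h = 1.0, 2.0, 1e-6
    finite_diff = (perturbed_cgf(v, h, x, gamma_mgf) - perturbed_cgf(v, 0.0, x, gamma_mgf)) / h
    print(finite_diff, cgf_eps_derivative(v, x, gamma_mgf))
```

The derivative is positive when e^{vx} exceeds M(v) and negative otherwise, which is the mechanism behind the sign of the stability for large and small perturbation points x.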
The approximate s.f.s. (8) is obtained by differentiating w.r.t. ε the Lugannani and Rice saddlepoint approximation (5) at $F_{x\varepsilon}$ and by evaluating it at ε