Article

Estimating Tail Probabilities of Random Sums of Phase-Type Scale Mixture Random Variables †

School of Mathematics and Physics, The University of Queensland, Brisbane, QLD 4072, Australia
* Author to whom correspondence should be addressed.
This paper is an extended version of our paper published in WSC2016 and MODSIM2021.
Algorithms 2022, 15(10), 350; https://doi.org/10.3390/a15100350
Submission received: 5 August 2022 / Revised: 23 September 2022 / Accepted: 23 September 2022 / Published: 27 September 2022
(This article belongs to the Special Issue Algorithms in Monte Carlo Methods)

Abstract

We consider the problem of estimating tail probabilities of random sums of scale mixture of phase-type distributions—a class of distributions corresponding to random variables which can be represented as a product of a non-negative but otherwise arbitrary random variable with a phase-type random variable. Our motivation arises from applications in risk and queueing problems, for estimating ruin probabilities and waiting time distributions, respectively. Mixtures of distributions are flexible models and can be exploited in modelling non-life insurance loss amounts. Classical rare-event simulation algorithms cannot be implemented in this setting because these methods typically rely on the availability of the cumulative distribution function or the moment generating function, which are difficult to compute, or not even available, for the class of scale mixture of phase-type distributions. The contributions of this paper are as follows: we address these issues by proposing alternative simulation methods for estimating tail probabilities of random sums of scale mixture of phase-type distributions which combine importance sampling and conditional Monte Carlo methods; we show the efficiency of the proposed estimators for a wide class of scaling distributions; and we validate the empirical performance of the suggested methods via numerical experimentation.

1. Introduction

Tail probabilities of random sums have attracted the interest of researchers for decades, as they are important quantities in many fields; for instance:
  • In insurance risk theory, these quantities correspond to the ruin probabilities associated with certain initial capital, and the random sum is a geometric sum of ladder heights (integrated tails of the claim sizes) in a claim surplus process; see, for example, [1].
  • In queueing theory, the equilibrium waiting time in a stable single-server Markovian queue can be represented as a random sum with a geometrically distributed number of terms, and these quantities correspond to the probabilities that an arriving customer must wait longer than a certain time; see, for example, [2].
Estimation from sample data and density approximation with phase-type distributions was considered in [3], and parameter estimation for the class of discrete scaled phase-type distributions was subsequently treated in [4]. Maximum likelihood estimation via an expectation-maximisation (EM) algorithm was exploited in both papers.
Estimating tail probabilities of random sums has been extensively investigated for both light- and heavy-tailed summands. Classical methods, including large deviations, saddlepoint approximations, and exponential change of measure, are most commonly used. However, these methods require the existence of the moment generating function (MGF) of the summand, so they are limited to the light-tailed case. In the heavy-tailed setting, subexponential theory provides asymptotic approximations, but these offer poor accuracy for moderately large values ([5]). From a rare-event simulation perspective, a few efficient estimators have been proposed in the literature ([6,7,8]). However, simulation methods for the heavy-tailed case often require the cumulative distribution function (CDF) or probability density function (PDF) of the summand, so they are frequently not easily implementable.
In this paper, we consider the problem of efficiently estimating the quantity
$$\ell(u) = P(S_N > u), \qquad (1)$$
where $S_N = Z_1 + \cdots + Z_N$, for large u. Specifically, N is a discrete light-tailed random variable supported on the positive integers, and $\{Z_i\}_{i \in \mathbb{N}}$ is independent of N and forms a sequence of independent, non-negative, and identically distributed random variables having the stochastic representation $Z_i = W_i X_i$, where the random variables $W_i$ and $X_i$ are non-negative and independent of each other.
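For orientation, a minimal crude Monte Carlo sketch of estimating (1) is given below (in Python; the samplers for N, W, and X are placeholders, and the parameter values are hypothetical). It also illustrates why rare-event techniques are needed: for large u, almost every replicate returns 0.

```python
import numpy as np

rng = np.random.default_rng(0)

def crude_mc(u, sample_n, sample_w, sample_x, reps=10**5):
    """Crude Monte Carlo estimate of P(S_N > u) with S_N = sum_i W_i * X_i."""
    hits = 0
    for _ in range(reps):
        n = sample_n(rng)                                   # number of summands
        s = np.sum(sample_w(rng, n) * sample_x(rng, n))     # S_N for this replicate
        hits += (s > u)
    return hits / reps

# Hypothetical example: geometric N, exponential W, Erlang(2, 3) X.
est = crude_mc(
    u=50.0,
    sample_n=lambda g: g.geometric(0.2),
    sample_w=lambda g, n: g.exponential(1.0, size=n),
    sample_x=lambda g, n: g.gamma(shape=2, scale=1 / 3, size=n),
)
print(est)
```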
In our setting, the sequence { Z i } i N belongs to the class of W-mixture of phase- type distributions (PH distributions), which can be represented as a product W i X i , where each W i is a non-negative but otherwise arbitrary random variable and X i is a PH-distributed random variable. The concept of W-mixture of PH distributions can be found in [9]. Bladt et al. [10] proposed a subclass of W-mixture of PH distributions to approximate any heavy-tailed distribution. Such a class is very attractive in stochastic modelling because it inherits many important properties of PH distributions—including being dense in the class of non-negative distributions and closed under finite convolutions—while it also circumvents the problem that individual PH distributions are light-tailed, implying that the tail behaviour of a heavy-tailed distribution cannot be captured correctly by PH distributions alone. Reference [11] showed that a distribution in the W-mixture of PH distributions class is heavy-tailed if and only if the random variable W i has unbounded support.
The CDF and PDF of a distribution in the W-mixture of PH distributions class are both available in closed form but these are given in terms of infinite dimensional matrices. The problem of approximating tail probabilities of scale mixture of PH-distributed random variables is not easily tractable from a computational perspective. Bladt et al. [10] addressed this issue and proposed a methodology which can be easily adapted to compute geometric random sums of scale mixture of PH distributions. Their approach is based on an infinite series representation of the tail probability of the geometric sum which can be computed to any desired precision at the cost of increased computational effort. In this paper, we explore an obvious alternative approach: rare-event estimation.
More precisely, we propose and analyse simulation methodologies to approximate the tail probabilities of a random sum of scale mixture of PH-distributed random variables. We remark that since the CDF and PDF of a distribution in the W-mixture of PH distributions class are effectively not available, most algorithms for the heavy-tailed setting discussed above cannot be implemented. Our approach is to use conditioning arguments and adapt the Asmussen–Kroese estimator proposed in [7]. The Asmussen–Kroese approach is usually not directly implementable in our setting because the CDF of the product W i X i is typically not available. We address this issue by conditioning on either the PH random variable X i or the scaling random variable W i: we either simulate the PH random variable and use the CDF of W i, or simulate the scaling random variable and use the CDF of X i. Moreover, we explore the use of importance sampling (IS) on the last term of the summands.
The key contributions of this paper are as follows:
  • Developing a number of rare event simulation methodologies;
  • Proving the efficiency of proposed estimators under certain conditions;
  • Exploring the proposed estimators through various numerical experiments.
The remainder of the paper is organised as follows. In Section 2, we provide background knowledge on PH distributions and scale mixture of PH distributions. In Section 3, we introduce the proposed algorithms. In Section 4, we prove the efficiency of the estimators proposed. In Section 5, we present the empirical results for several examples. Section 6 provides some concluding remarks and an outlook to future work.

2. Preliminaries

In this section, we provide a general overview of PH distributions and scale mixture of PH distributions.

2.1. Phase-Type Distributions and Properties

PH distributions have been used in stochastic modelling since being introduced in [12]. Apart from being mathematically tractable, PH distributions have the additional appealing feature of being dense in the class of non-negative distributions. That is, for any distribution on the positive real axis, there exists a sequence of PH distributions which converges weakly to the target distribution (see [2] for details). In other words, PH distributions may arbitrarily closely approximate any distribution with support on $[0, \infty)$.
In order to define a PH distribution, we first consider a continuous-time Markov chain (CTMC) $\{Y(t), t \ge 0\}$ on the finite state space $E = \{1, 2, \dots, p\} \cup \{\Delta\}$, where states $1, 2, \dots, p$ are transient and state $\Delta$ is absorbing. Further, let the process have an initial probability of starting in any of the p transient phases given by the $1 \times p$ probability vector $\alpha$, with $\alpha_i \ge 0$ and $\sum_{i=1}^{p} \alpha_i \le 1$. Hence, the process $\{Y(t) : t \ge 0\}$ has an intensity matrix (or infinitesimal generator) Q of the form:
$$Q = \begin{pmatrix} T & \boldsymbol{t} \\ \boldsymbol{0} & 0 \end{pmatrix},$$
where T is a p × p sub-intensity matrix of transition rates between the transient states, t is a p × 1 vector of transition rates to the absorbing state, and 0 is a 1 × p zero row vector.
The (continuous) PH distribution is the distribution of the time until absorption of $\{Y(t) : t \ge 0\}$. The 2-tuple $(\alpha, T)$ completely specifies the PH distribution, and is called a PH-representation. The CDF is given by $F(y) = 1 - \alpha \exp(Ty)\,\mathbf{1}$, $y \ge 0$, where $\mathbf{1}$ is a column vector of ones.
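As a small illustration of this formula, the following Python sketch evaluates $F(y) = 1 - \alpha \exp(Ty)\,\mathbf{1}$ numerically for a given representation $(\alpha, T)$; the two-phase representation used in the example is hypothetical.

```python
import numpy as np
from scipy.linalg import expm

def ph_cdf(y, alpha, T):
    """CDF of a phase-type distribution with representation (alpha, T):
    F(y) = 1 - alpha * expm(T * y) * 1."""
    return 1.0 - alpha @ expm(T * y) @ np.ones(T.shape[0])

# Hypothetical two-phase representation (a generalised Erlang with rates 2 and 3).
alpha = np.array([1.0, 0.0])
T = np.array([[-2.0, 2.0],
              [0.0, -3.0]])
print(ph_cdf(1.0, alpha, T))
```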
Besides being dense in the non-negative distributions, the class of continuous PH distributions forms the smallest family of distributions on R + which contains the point mass at zero and all exponential distributions, and is closed under finite mixtures, convolutions, and infinite mixtures (among other interesting properties) (see [12]).

2.2. Scale Mixture of Phase-Type Distributions

A random variable Z of the form $Z := W \cdot X$ is said to follow a scale mixture of PH distributions if $X \sim F$, where F is a PH distribution, and $W \sim H$, where H is an arbitrary non-negative distribution. We call W the scaling random variable and H the scaling distribution. It follows that the CDF of Z can be written as the Mellin–Stieltjes convolution of the two non-negative distributions F and H:
$$B(z) = \int_0^\infty F(z/w)\,\mathrm{d}H(w) = \int_0^\infty H(z/x)\,\mathrm{d}F(x), \qquad z \ge 0.$$
The integral expression above is available in closed form in very few isolated cases. Thus, we should rely on numerical integration or simulation methods for its computation.
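For instance, when the scaling distribution H admits a density, the Mellin–Stieltjes convolution can be approximated by one-dimensional quadrature, as in the following sketch (the exponential phase-type CDF and exponential scaling density used here are only illustrative):

```python
import numpy as np
from scipy.integrate import quad

def mixture_cdf(z, F, h_pdf):
    """B(z) = integral_0^inf F(z/w) h(w) dw, computed by adaptive quadrature."""
    integrand = lambda w: F(z / w) * h_pdf(w)
    value, _ = quad(integrand, 0.0, np.inf, limit=200)
    return value

# Example: exponential(3) phase-type component mixed over an exponential(1) scaling.
F = lambda y: 1.0 - np.exp(-3.0 * y)
h_pdf = lambda w: np.exp(-w)
print(mixture_cdf(2.0, F, h_pdf))
```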
The key motivation for considering the class of scale mixture of PH distributions over the class of PH distributions is that the latter class forms a subclass of light-tailed distributions, while a distribution in the former class with a scaling random variable having unbounded support is heavy-tailed ([11]). Hence, the class of scale mixture of PH distributions turns out to be an appealing tractable class for approximating heavy-tailed distributions. Recall that a non-negative random variable X has a heavy-tailed distribution if and only if $E[e^{\theta X}] = \infty$ for all $\theta > 0$, equivalently, if $\limsup_{x \to \infty} P(X > x)\, e^{\theta x} = \infty$ for all $\theta > 0$. Otherwise, we say X is light-tailed.

3. Simulation Methods

In this section, we introduce our rare-event simulation estimators for $\ell(u)$. The key approach is to combine an Asmussen–Kroese-type algorithm with conditional Monte Carlo, and further with importance sampling when necessary. The method is based on the tower property of expectation, $E[\hat\ell(u)] = E\big[E[\hat\ell(u) \mid T]\big]$, where we condition on some statistic T; in practical terms, one should be able to simulate T and compute $E[\hat\ell(u) \mid T]$.

3.1. Asmussen–Kroese-Type Algorithm

The Asmussen–Kroese estimator [7] is efficient for estimating tail probabilities of light-tailed random sums of heavy-tailed summands [13].
The key idea is based on the law of total probability and the fact that $P(S_n > u, \max\{Z_i, i = 1, \dots, n\} = Z_i)$ is the same for $i = 1, \dots, n$ (as $\{Z_i, i = 1, \dots, n\}$ are i.i.d. random variables). We then have the following identity:
$$P(S_N > u \mid N = n) = n\, P\big(S_n > u,\ \max\{Z_i, i = 1, \dots, n\} = Z_n\big) = n\, E\Big[\bar B\big(Z_{(n-1)} \vee (u - S_{n-1})\big)\Big],$$
where $\bar B(\cdot)$ is the complementary CDF of $Z_i$, $Z_{(n-1)} = \max\{Z_i, i = 1, \dots, n-1\}$, $S_{n-1} = \sum_{i=1}^{n-1} Z_i$, and $x \vee y = \max(x, y)$.
In our setting, the summands Z i = W i X i are heavy-tailed whenever W i has unbounded support [11] and so it is natural to consider this estimator here. Unfortunately, the Asmussen–Kroese approach is usually not directly implementable in our setting because the CDF of Z i is typically unavailable.
Instead, we consider a simple modification, by conditioning on a single scaling random variable $W_N$ or on the PH random variable $X_N$. We further consider applying a change of measure to this single random variable. Conditioning on N, the Asmussen–Kroese estimator (without applying the control variate method for N) for $\ell(u)$ takes the form
$$\hat\ell_{\mathrm{AK}}(u) = N\, \bar B\big(Z_{(N-1)} \vee (u - S_{N-1})\big). \qquad (2)$$
In our context, when conditioning on the W N associated with Z N , we arrive at
$$\hat\ell_{\mathrm{ConAK1}}(u) = N\, \bar F\!\left(\frac{Z_{(N-1)} \vee (u - S_{N-1})}{W_N}\right), \qquad (3)$$
where F ¯ ( · ) is the complementary CDF of a PH random variable.
When conditioning on the $X_N$ associated with $Z_N$, we arrive at
$$\hat\ell_{\mathrm{ConAK2}}(u) = N\, \bar H\!\left(\frac{Z_{(N-1)} \vee (u - S_{N-1})}{X_N}\right), \qquad (4)$$
where H ¯ ( · ) is the complementary CDF of a scaling random variable.
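To make the two conditional estimators concrete, a single replicate of (3) or (4) can be written as in the following sketch (without importance sampling or control variates); the complementary CDFs and the sampled arrays are supplied by the user.

```python
import numpy as np

def conak_replicate(u, n, W, X, Fbar, Hbar, condition_on="W"):
    """One replicate of the conditional Asmussen-Kroese estimators (3) and (4).

    n        : realised number of summands
    W, X     : arrays of length n with the scaling and phase-type components
    Fbar     : complementary CDF of the phase-type component
    Hbar     : complementary CDF of the scaling component
    """
    Z = W * X
    s_rest = Z[:-1].sum()                        # S_{N-1}
    m_rest = Z[:-1].max() if n > 1 else 0.0      # Z_{(N-1)}: max of the first N-1 terms
    threshold = max(m_rest, u - s_rest)          # Z_{(N-1)} v (u - S_{N-1})
    if condition_on == "W":                      # estimator (3): condition on W_N
        return n * Fbar(threshold / W[-1])
    return n * Hbar(threshold / X[-1])           # estimator (4): condition on X_N
```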
We then consider improving the efficiency of our algorithm by implementing IS over the distribution H or F, ensuring that the random sum after conditioning is equal in expectation to u. Inspired by popular methodologies drawn from light-tailed and heavy-tailed problems, we suggest two alternative approaches.

3.1.1. Exponential Twisting

The exponential twisting method is asymptotically optimal for tail probabilities of random sums with light-tailed summands ([14]).
Definition 1.
The estimator $\hat\ell(u)$ generated using the probability measure P is said to be an asymptotically optimal estimator of $\ell(u) = E_P[\hat\ell(u)]$ if and only if
$$\lim_{u \to \infty} \frac{\log E_P\big[\hat\ell(u)^2\big]}{2 \log E_P\big[\hat\ell(u)\big]} = 1.$$
The method is specified as follows:
Define an exponential family of PDFs $\{f_\theta, \theta \in \Theta\}$ based on the original PDF f of any random variable X via
$$f_\theta(x) = \frac{e^{\theta x}}{M_X(\theta)}\, f(x) = e^{\theta x - \ln M_X(\theta)}\, f(x),$$
where $M_X(\theta) = \int e^{\theta x} f(x)\,\mathrm{d}x$ is the MGF of X.
The likelihood ratio of a single element associated with this change of measure (i.e., using measure $f_\theta(x)\,\mathrm{d}x$ instead of $f(x)\,\mathrm{d}x$) is
$$\frac{f(X)}{f_\theta(X)} = e^{-\theta X + \zeta(\theta)},$$
where $\zeta(\theta) = \ln M_X(\theta)$ is the cumulant function of X. Then, the twisted mean is $\mu_\theta = E_\theta[X] = \zeta'(\theta)$ ([15]).
The selection of a proper twisting parameter $\theta$ is a key aspect in the implementation of this method. For dealing with the random sum (1), we select the twisting parameter $\theta$ such that the changed mean of the random sum is equal to the threshold, that is, $E_\theta[S_N \mid N = n] = u$. Note that $E_\theta\big[\sum_{i=1}^{N} Z_i \mid N = n\big] = n\, E[W]\, E_\theta[X]$.
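In practice, the twisting parameter can be obtained from $\zeta'(\theta) = $ target with a one-dimensional root finder, as in the sketch below; the Erlang cumulant used to check the numerical solution is a standard closed form, while the search interval and parameter values are assumptions made for illustration.

```python
from scipy.optimize import brentq

def twist_parameter(zeta_prime, target, theta_hi):
    """Solve zeta'(theta) = target for the exponential-twisting parameter theta.

    zeta_prime : derivative of the cumulant function of the twisted variable
    target     : desired twisted mean, e.g. u / (n * E[W]) when twisting X
    theta_hi   : upper end of the search interval (just below the MGF's abscissa)
    """
    return brentq(lambda t: zeta_prime(t) - target, a=-1e6, b=theta_hi)

# Erlang(k, beta): zeta(theta) = k*log(beta/(beta-theta)), so zeta'(theta) = k/(beta-theta).
k, beta = 2, 3.0
theta = twist_parameter(lambda t: k / (beta - t), target=10.0, theta_hi=beta - 1e-9)
print(theta, beta - k / 10.0)   # numerical root agrees with the closed form beta - k/target
```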
For the case of conditioning on light-tailed scaling random variables, we combine the estimator (3) with the above importance sampling method, and arrive at the following estimator for a single replicate:
$$\hat\ell_{\mathrm{ConAK1+IS}}(u) = N\, \bar F\!\left(\frac{Z_{(N-1)} \vee (u - S_{N-1})}{W_N}\right) \times \frac{h(W_N)}{h_\theta(W_N)}, \qquad (5)$$
where W N is a generic copy of the scaling random variable, and h is the PDF of W N .
We further combine the estimators (3) and (5) with a control variate for N to improve the efficiency, which is based on the subexponential asymptotics $\lim_{u \to \infty} \frac{P(S_N > u)}{E[N]\, \bar B(u)} = 1$:
$$\hat\ell_{\mathrm{ConAK1+CV}}(u) = N\, \bar F\!\left(\frac{Z_{(N-1)} \vee (u - S_{N-1})}{W_N}\right) - \big(N - E[N]\big)\, \bar F\!\left(\frac{u}{W_N}\right), \qquad (6)$$
$$\hat\ell_{\mathrm{ConAK1+IS+CV}}(u) = N\, \bar F\!\left(\frac{Z_{(N-1)} \vee (u - S_{N-1})}{W_N}\right) \times \frac{h(W_N)}{h_\theta(W_N)} - \big(N - E[N]\big)\, \bar F\!\left(\frac{u}{W_N}\right) \times \frac{h(W_N)}{h_\theta(W_N)}. \qquad (7)$$
For the case of conditioning on the PH random variables, we combine the estimator (4) with the above importance sampling method for both light- and heavy-tailed scaling distributions, and arrive at the following estimator for a single replicate:
$$\hat\ell_{\mathrm{ConAK2+IS}}(u) = N\, \bar H\!\left(\frac{Z_{(N-1)} \vee (u - S_{N-1})}{X_N}\right) \times \frac{f(X_N)}{f_\theta(X_N)}, \qquad (8)$$
where X N is a generic copy of the PH random variable, and f is the PDF of X N .
We further combine the estimators (4) and (8) with a control variate for N to improve the efficiency, which is based on the same subexponential asymptotics as in (6):
$$\hat\ell_{\mathrm{ConAK2+CV}}(u) = N\, \bar H\!\left(\frac{Z_{(N-1)} \vee (u - S_{N-1})}{X_N}\right) - \big(N - E[N]\big)\, \bar H\!\left(\frac{u}{X_N}\right), \qquad (9)$$
$$\hat\ell_{\mathrm{ConAK2+IS+CV}}(u) = N\, \bar H\!\left(\frac{Z_{(N-1)} \vee (u - S_{N-1})}{X_N}\right) \times \frac{f(X_N)}{f_\theta(X_N)} - \big(N - E[N]\big)\, \bar H\!\left(\frac{u}{X_N}\right) \times \frac{f(X_N)}{f_\theta(X_N)}. \qquad (10)$$

3.1.2. Hazard Rate Twisting

In this section, we consider the case in which the scaling distributions are heavy-tailed. Writing the PDF of the scaling random variable W as h, its hazard rate is $\lambda(w) = h(w)/\bar H(w)$. Let $\Lambda(w) = \int_0^w \lambda(y)\,\mathrm{d}y = -\ln \bar H(w)$ denote the hazard function. Reference [8] introduced hazard rate twisting and the hazard rate twisted density with parameter $\theta$, $0 \le \theta < 1$:
$$h_\theta(w) = \frac{h(w)\, e^{\theta \Lambda(w)}}{\check M(\theta)},$$
where $\check M(\theta) \equiv \int_0^\infty h(w)\, e^{\theta \Lambda(w)}\,\mathrm{d}w$ is the normalisation constant. The resulting twisted PDF is
$$h_\theta(w) = \lambda(w)(1 - \theta)\, e^{-\int_0^w (1-\theta)\lambda(y)\,\mathrm{d}y}, \qquad w \ge 0,$$
where the twisting parameter is chosen as $\theta = 1 - n/\Lambda(u)$.
The following equation will later be useful to compute the twisted tail probability:
$$\bar H_\theta(w) = \int_w^\infty \lambda(t)(1-\theta)\, e^{-\int_0^t (1-\theta)\lambda(y)\,\mathrm{d}y}\,\mathrm{d}t = \int_w^\infty (1-\theta)\,\frac{h(t)}{\bar H(t)}\, e^{(1-\theta)\ln \bar H(t)}\,\mathrm{d}t = \bar H(w)^{1-\theta}. \qquad (11)$$
Thus, conditioning on $N = n$, and taking into account $E[X]$, the likelihood ratio of a single element associated with this change of measure is
$$\frac{h(W)}{h_\theta(W)} = \frac{1}{1 - \theta(n)}\, e^{-\theta(n)\Lambda(W)},$$
where $\theta(n) = 1 - n/\Lambda(u/E[X])$.
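Since (11) gives $\bar H_\theta(w) = \bar H(w)^{1-\theta}$, a draw from the hazard-rate twisted scaling distribution can be obtained by inverse transform whenever the quantile of $\bar H$ is available. A minimal sketch follows, with a log-normal scaling used purely as an illustration.

```python
import numpy as np
from scipy.stats import norm

def sample_hazard_twisted(rng, theta, Hbar_inv, size=1):
    """Inverse-transform draw from the hazard-rate twisted distribution:
    by (11), Hbar_theta(w) = Hbar(w)^(1-theta), so W = Hbar_inv(U^(1/(1-theta)))."""
    u = rng.uniform(size=size)
    return Hbar_inv(u ** (1.0 / (1.0 - theta)))

# Example: log-normal(mu, sigma^2) scaling, whose survival function is
# Hbar(w) = 1 - Phi((ln w - mu)/sigma), inverted via the normal quantile.
rng = np.random.default_rng(6)
mu, sigma = 2.0, 1.5
Hbar_inv = lambda q: np.exp(mu + sigma * norm.isf(q))
print(sample_hazard_twisted(rng, theta=0.9, Hbar_inv=Hbar_inv, size=3))
```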
Combining the estimator (3) with the above importance sampling method for heavy-tailed scaling distributions, we then arrive at the following estimator for a single replicate:
$$\hat\ell_{\mathrm{ConAK1+IS}}(u) = N\, \bar F\!\left(\frac{Z_{(N-1)} \vee (u - S_{N-1})}{W_N}\right) \times \frac{h(W_N)}{h_\theta(W_N)}. \qquad (12)$$
Since N is random, we combine the estimator above with a control variate, and this yields the estimator:
$$\hat\ell_{\mathrm{ConAK1+IS+CV}}(u) = N\, \bar F\!\left(\frac{Z_{(N-1)} \vee (u - S_{N-1})}{W_N}\right) \times \frac{h(W_N)}{h_\theta(W_N)} - \big(N - E[N]\big)\, \bar F\!\left(\frac{u}{W_N}\right) \times \frac{h(W_N)}{h_\theta(W_N)}. \qquad (13)$$
Note that Equations (12) and (13) have the same form as Equations (6) and (7), respectively. When we combine with the importance sampling method in this section, we exploit the hazard rate twisting method instead of the exponential twisting method.
The conditional (either on $W_N$ or on $X_N$) Asmussen–Kroese approaches with importance sampling and a control variate are summarised in Algorithms 1 and 2, each of which generates a single replicate.
Note that we replace u by $Z_{(N-1)} \vee (u - S_{N-1})$ when determining $\theta$, since we perform the change of measure only on $Z_N$.
Algorithm 1 Conditional (on $W_N$) Asmussen–Kroese with importance sampling and control variate algorithm
1.
Generate N.
2.
Generate $Z_1, \dots, Z_{N-1}$ as $Z_i = W_i X_i$ with $W_1, \dots, W_{N-1} \overset{\mathrm{i.i.d.}}{\sim} H$ and $X_1, \dots, X_{N-1} \overset{\mathrm{i.i.d.}}{\sim} F$.
3.
Compute $S_{N-1} = \sum_{i=1}^{N-1} Z_i$, find $Z_{(N-1)} \vee (u - S_{N-1})$, and determine $\theta$ from $\zeta'(\theta) = \big(Z_{(N-1)} \vee (u - S_{N-1})\big)/E[X_N]$ (light-tailed $W_N$) or $\theta = 1 - 1/\Lambda\big((Z_{(N-1)} \vee (u - S_{N-1}))/E[X_N]\big)$ (heavy-tailed $W_N$).
4.
Generate $W_N \sim H_\theta$ and compute the likelihood ratio $L := e^{\zeta(\theta) - \theta W_N}$ (light-tailed $W_N$) or $L := (1-\theta)^{-1} e^{-\theta \Lambda(W_N)}$ (heavy-tailed $W_N$).
5.
Return $\hat\ell_{\mathrm{ConAK1+IS+CV}}(u) = N\, \bar F\big((Z_{(N-1)} \vee (u - S_{N-1}))/W_N\big) \times L - \big(N - E[N]\big)\, \bar F(u/W_N) \times L$.
Algorithm 2 Conditional (on $X_N$) Asmussen–Kroese with importance sampling and control variate algorithm
1.
Generate N.
2.
Generate $Z_1, \dots, Z_{N-1}$ as $Z_i = W_i X_i$ with $W_1, \dots, W_{N-1} \overset{\mathrm{i.i.d.}}{\sim} H$ and $X_1, \dots, X_{N-1} \overset{\mathrm{i.i.d.}}{\sim} F$.
3.
Compute $S_{N-1} = \sum_{i=1}^{N-1} Z_i$ and find $Z_{(N-1)} \vee (u - S_{N-1})$.
4.
Generate $X_N \sim F_\theta$, with parameter $\theta$ solving $\zeta'(\theta) = \big(Z_{(N-1)} \vee (u - S_{N-1})\big)/E[W_N]$.
5.
Return $\hat\ell_{\mathrm{ConAK2+IS+CV}}(u) = N\, \bar H\big((Z_{(N-1)} \vee (u - S_{N-1}))/X_N\big) \times \frac{f(X_N)}{f_\theta(X_N)} - \big(N - E[N]\big)\, \bar H(u/X_N) \times \frac{f(X_N)}{f_\theta(X_N)}$.
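For concreteness, a minimal single-replicate sketch of Algorithm 2 is given below, specialised to Erlang$(k, \beta)$ phase-type components, a Weibull scaling, and a geometric N (the parameter values mirror the Weibull-scaling example of Section 5), using an exponential importance sampling PDF for $X_N$; this is an illustrative sketch rather than the implementation used for the reported experiments.

```python
import numpy as np
from math import gamma as gamma_fn, factorial

rng = np.random.default_rng(1)

# Illustrative parameters, mirroring the Weibull-scaling example of Section 5.
k, beta = 2, 3.0                     # X ~ Erlang(k, beta)
a, b = 2.0, 0.35                     # W ~ Weibull(shape b, scale a)
p = 0.2                              # N ~ Geometric(p), E[N] = 1/p
EW = a * gamma_fn(1.0 + 1.0 / b)     # E[W]

def Hbar(w):                         # complementary CDF of the Weibull scaling
    return np.exp(-(w / a) ** b)

def replicate_algorithm2(u):
    """One replicate of Algorithm 2 (ConAK2+IS+CV) with an exponential IS PDF for X_N."""
    n = rng.geometric(p)
    W = a * rng.weibull(b, size=n - 1)                      # W_1, ..., W_{N-1}
    X = rng.gamma(shape=k, scale=1.0 / beta, size=n - 1)    # X_1, ..., X_{N-1}
    Z = W * X
    thr = max(Z.max() if n > 1 else 0.0, u - Z.sum())       # Z_{(N-1)} v (u - S_{N-1})
    beta_theta = EW / thr                                   # twisted rate: E[W] E_theta[X_N] = thr
    X_n = rng.exponential(1.0 / beta_theta)
    lr = (beta ** k / beta_theta) * X_n ** (k - 1) * np.exp(-(beta - beta_theta) * X_n) / factorial(k - 1)
    return n * Hbar(thr / X_n) * lr - (n - 1.0 / p) * Hbar(u / X_n) * lr

reps = np.array([replicate_algorithm2(u=500.0) for _ in range(10**4)])
print(reps.mean(), reps.std(ddof=1) / np.sqrt(len(reps)))
```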

4. Efficiency of the Modified Asmussen–Kroese Estimator

In this section, we investigate the asymptotic relative error of the estimators $\hat\ell(u)$. Since $\ell(u) = E[\hat\ell(u)]$, the estimator $\hat\ell(u)$ is consistent, and we have the following second-order efficiency measures (see [15]),
1.
The estimator is said to be logarithmically efficient if the following condition on the logarithmic rates holds:
$$\liminf_{u \to \infty} \frac{\log\big(\mathrm{Var}(\hat\ell(u))\big)}{2 \log\big(\ell(u)\big)} \ge 1.$$
2.
The estimator is said to have bounded relative error if
$$\limsup_{u \to \infty} \frac{\mathrm{Var}(\hat\ell(u))}{\ell(u)^2} \le K < \infty,$$
where K is a constant that does not depend on u.
3.
The estimator is said to have (asymptotically) vanishing relative error if
$$\limsup_{u \to \infty} \frac{\mathrm{Var}(\hat\ell(u))}{\ell(u)^2} = 0.$$
We focus on subexponential distributions and related distributions. The definitions of these relevant distributions are listed as follows.
Definition 2.
A distribution function F with support on $(0, \infty)$ is subexponential if, for all $n \ge 2$, it satisfies
$$\lim_{x \to \infty} \frac{\overline{F^{*n}}(x)}{\bar F(x)} = n,$$
where $\overline{F^{*n}}(x)$ denotes the tail of the n-fold convolution $F^{*n}$ of F. Denote the class of subexponential distribution functions by $\mathcal{S}$.
Definition 3.
A distribution function F with support on $(0, \infty)$ is dominatedly varying if, for all $y \in (0, 1)$, it satisfies
$$\limsup_{x \to \infty} \frac{\bar F(yx)}{\bar F(x)} < \infty.$$
Denote the class of dominatedly varying distribution functions by D .
Definition 4.
A distribution function F with support on $(0, \infty)$ is long-tailed if, for all $y \ge 0$, it satisfies
$$\lim_{x \to \infty} \frac{\bar F(x - y)}{\bar F(x)} = 1.$$
Denote the class of long-tailed distribution functions by L .
We also exploit the definition of a Weibullian tail from [16].
Definition 5
([16]). A distribution F on $[0, \infty)$ is said to have a Weibullian tail if
$$\bar F(z) \sim C z^{\gamma} \exp\big[-\beta z^{\alpha}\big], \qquad \text{where } \alpha, \beta, C > 0,\ \gamma \in (-\infty, \infty).$$
We write $Z \in \mathcal{W}(\alpha, \beta, \gamma, C)$ if Z satisfies Definition 5.
The following are two examples of widely used distributions that have Weibullian tails.
Example 1.
The complementary CDF of a Weibull distribution with shape parameter b and scale parameter a is
$$\bar H(w) = e^{-(w/a)^b}, \qquad w \ge 0,$$
denoted as Weibull(b, a), and $W \in \mathcal{W}(b, a^{-b}, 0, 1)$.
Example 2.
The asymptotic form of the tail of a general PH distribution is
$$\bar F(x) \sim C x^{k} e^{-\eta x},$$
where $C, \eta > 0$ and $k \in \{0, 1, 2, \dots\}$ (cf. [1]), so that we can write $X \in \mathcal{W}(1, \eta, k, C)$.
Lemma 1 in [16] provides a way to calculate the Weibullian-tail parameters of a product of two independent random variables having Weibullian tails; a small computational sketch is given after the lemma.
Lemma 1
([16]). Let $W \in \mathcal{W}(\alpha_1, \beta_1, \gamma_1, C_1)$ and $X \in \mathcal{W}(\alpha_2, \beta_2, \gamma_2, C_2)$ be independent non-negative random variables. Then, $W \cdot X \in \mathcal{W}(\alpha, \beta, \gamma, C)$ with
  • $\alpha = \dfrac{\alpha_1 \alpha_2}{\alpha_1 + \alpha_2}$,
  • $\beta = \beta_1^{\alpha_2/(\alpha_1+\alpha_2)}\, \beta_2^{\alpha_1/(\alpha_1+\alpha_2)} \left[ \left(\dfrac{\alpha_1}{\alpha_2}\right)^{\alpha_2/(\alpha_1+\alpha_2)} + \left(\dfrac{\alpha_2}{\alpha_1}\right)^{\alpha_1/(\alpha_1+\alpha_2)} \right]$,
  • $\gamma = \dfrac{\alpha_1 \alpha_2 + 2\alpha_1\gamma_2 + 2\alpha_2\gamma_1}{2(\alpha_1 + \alpha_2)}$,
  • $C = \sqrt{2\pi}\, C_1 C_2\, \sqrt{\dfrac{1}{\alpha_1+\alpha_2}}\, (\alpha_1\beta_1)^{(\alpha_2 - 2\gamma_1 + 2\gamma_2)/(2(\alpha_1+\alpha_2))}\, (\alpha_2\beta_2)^{(\alpha_1 - 2\gamma_2 + 2\gamma_1)/(2(\alpha_1+\alpha_2))}$.
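For illustration, the parameters in Lemma 1 are straightforward to evaluate numerically; the sketch below codes the four expressions and checks them on the product of two unit exponential random variables, each in $\mathcal{W}(1, 1, 0, 1)$, whose product tail behaves like $\sqrt{\pi}\, x^{1/4} e^{-2\sqrt{x}}$.

```python
import math

def weibullian_product(a1, b1, g1, C1, a2, b2, g2, C2):
    """Parameters (alpha, beta, gamma, C) of the Weibullian tail of W*X (Lemma 1),
    where W ~ W(a1, b1, g1, C1) and X ~ W(a2, b2, g2, C2)."""
    s = a1 + a2
    alpha = a1 * a2 / s
    beta = (b1 ** (a2 / s)) * (b2 ** (a1 / s)) * (
        (a1 / a2) ** (a2 / s) + (a2 / a1) ** (a1 / s))
    gamma = (a1 * a2 + 2 * a1 * g2 + 2 * a2 * g1) / (2 * s)
    C = (math.sqrt(2 * math.pi) * C1 * C2 / math.sqrt(s)
         * (a1 * b1) ** ((a2 - 2 * g1 + 2 * g2) / (2 * s))
         * (a2 * b2) ** ((a1 - 2 * g2 + 2 * g1) / (2 * s)))
    return alpha, beta, gamma, C

# Two unit exponentials: each tail is e^{-x}, i.e. W(1, 1, 0, 1).
print(weibullian_product(1, 1, 0, 1, 1, 1, 0, 1))   # expect (0.5, 2.0, 0.25, sqrt(pi))
```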
Suppose that we have the following conditions for the random variables involved.
Condition 1.
Assume that N is an integer-valued random variable with $E[N^2] < \infty$, and that $Z = (Z_1, Z_2, Z_3, \dots)$ is a sequence of i.i.d. random variables with continuous distribution function $B \in \mathcal{S}$, independent of N.
Condition 2.
There exists a function $a(u)$ that satisfies one of the following:
(1) 
$\lim_{u \to \infty} \hat\ell(u)/a(u) = N$ almost surely;
(2) 
$\lim_{u \to \infty} \hat\ell(u)/a(u) = E[N]$.
Condition 3.
There exists a random variable U with $E[U^2] < \infty$, such that for all $u > 0$,
$$U \ge \frac{\hat\ell(u)}{a(u)}$$
almost surely.
Lemma 2.
If Condition 2 (1) and Condition 3 are fulfilled, we obtain
$$\lim_{u \to \infty} \frac{\mathrm{Var}[\hat\ell(u)]}{E[\hat\ell(u)]^2} = \frac{\mathrm{Var}[N]}{E[N]^2}. \qquad (14)$$
If Condition 2 (2) and Condition 3 are fulfilled, we obtain
$$\lim_{u \to \infty} \frac{\mathrm{Var}[\hat\ell(u)]}{E[\hat\ell(u)]^2} = 0. \qquad (15)$$
Proof. 
$$\lim_{u \to \infty} \frac{\mathrm{Var}[\hat\ell(u)]}{E[\hat\ell(u)]^2} = \lim_{u \to \infty} \frac{E\big[\big(\hat\ell(u)/a(u)\big)^2\big] - E\big[\hat\ell(u)/a(u)\big]^2}{E\big[\hat\ell(u)/a(u)\big]^2}. \qquad (16)$$
If Condition 3 is fulfilled, by dominated convergence, we then interchange limit and expectation in (16) to obtain the result. □
Note that if Condition 1 is fulfilled, then (14) is bounded.
The conditions and the lemma above are mirrored from [13] and are listed here for the reader's convenience.
Proposition 1.
The estimators $\hat\ell_{\mathrm{ConAK1}}(u)$ and $\hat\ell_{\mathrm{ConAK2}}(u)$
  • have bounded relative error when N is light-tailed and $W \in \mathcal{D} \cap \mathcal{L}$;
  • are logarithmically efficient when N is deterministic and W has a Weibullian tail with shape parameter $\alpha \in (0, \log(3/2)/\log(4/3))$.
Proof. 
Based on the law of total variance, $\mathrm{Var}(Y) = E[\mathrm{Var}(Y \mid X)] + \mathrm{Var}(E[Y \mid X])$, we can conclude that a conditional Monte Carlo estimator has variance no larger than that of the estimator Y. Thus, we obtain $\mathrm{Var}\big(\hat\ell_{\mathrm{ConAK1}}(u)\big) \le \mathrm{Var}\big(\hat\ell_{\mathrm{AK}}(u)\big)$ and $\mathrm{Var}\big(\hat\ell_{\mathrm{ConAK2}}(u)\big) \le \mathrm{Var}\big(\hat\ell_{\mathrm{AK}}(u)\big)$ for the conditional Asmussen–Kroese estimators $\hat\ell_{\mathrm{ConAK1}}(u)$ and $\hat\ell_{\mathrm{ConAK2}}(u)$.
The Asmussen–Kroese estimator (2) was proven to be logarithmically efficient when Z has a Weibullian tail in [7], and to have bounded relative error in [13] for the following three cases:
  • N is light-tailed, and $Z \in \mathcal{D} \cap \mathcal{L}$.
  • N is light-tailed, and Z is log-normal$(\mu, \sigma^2)$-distributed.
  • N is bounded, and Z is Weibull-distributed with shape parameter in $(0, \log(3/2)/\log(3))$.
Here, we extend the result to scaling random variables in similar cases for the estimators $\hat\ell_{\mathrm{ConAK1}}(u)$ and $\hat\ell_{\mathrm{ConAK2}}(u)$, as follows:
  • $W \in \mathcal{D} \cap \mathcal{L}$
    As a consequence of Corollary 3.3 in [17], if F is a non-degenerate phase-type distribution and the scaling distribution $H \in \mathcal{D} \cap \mathcal{L}$, then the phase-type scale mixture distribution $B \in \mathcal{D} \cap \mathcal{L}$. Therefore, the estimators $\hat\ell_{\mathrm{ConAK1}}(u)$ and $\hat\ell_{\mathrm{ConAK2}}(u)$ attain the same efficiency as the Asmussen–Kroese estimator, as they have no larger variance.
  • $W \in \mathcal{W}(\alpha, \beta, \gamma, C)$
    By Lemma 1, a phase-type distribution has a Weibullian tail with shape parameter 1, and if the scaling random variable $W \in \mathcal{W}(\alpha, \beta, \gamma, C)$, then $Z \in \mathcal{W}(\alpha_Z, \beta_Z, \gamma_Z, C_Z)$. Furthermore, the shape parameter of Z is $\alpha_Z = \alpha/(\alpha + 1) \in (0, 1)$; thus, Z is heavy-tailed.
    In [7], the authors restricted attention to Weibull-type distributions with shape parameter in $(0, \log(3/2)/\log(2))$ and showed that the Asmussen–Kroese estimator is logarithmically efficient when N is deterministic. By Lemma 1, if the shape parameter of the scaling random variable satisfies $\alpha \in (0, \log(3/2)/\log(4/3))$, then the shape parameter of the phase-type scale mixture distribution satisfies $\alpha_Z \in (0, \log(3/2)/\log(2))$. Therefore, the estimators $\hat\ell_{\mathrm{ConAK1}}(u)$ and $\hat\ell_{\mathrm{ConAK2}}(u)$ attain the same efficiency as the Asmussen–Kroese estimator if the scaling random variable $W \in \mathcal{W}(\alpha, \beta, \gamma, C)$ with $\alpha \in (0, \log(3/2)/\log(4/3))$. □
Further combining with a control variate for the number of summands N will additionally improve the efficiency of the estimators ^ ConAK 1 ( u ) and ^ ConAK 2 ( u ) .

5. Numerical Experiments

In this section, we provide a variety of numerical experiments to illustrate the proposed simulation methods. In all illustrative cases, we consider each scaling random variable W to have support on all of $[0, \infty)$ and each X to be Erlang-distributed with shape parameter k and rate parameter $\beta$. Erlang distributions are selected as an example PH distribution because they play an important role in the class of PH distributions ([18]). The class of generalised Erlang distributions (a series of k exponentials, each with its own rate $\beta_i$, $i \in \{1, 2, \dots, k\}$) is dense in the set of all probability distributions on the non-negative half-line.
We remind the reader that in all cases, the resulting product Z = W · X is heavy-tailed since the scaling random variable has unbounded support.
We explore the estimators ^ ConAK 2 + CV which are the conditional Asmussen–Kroese estimator with control variates for the number of summands, and the estimators ^ ConAK 1 + IS + CV and ^ ConAK 2 + IS + CV , which are the conditional Asmussen–Kroese estimator with importance sampling on the last term of the summands and control variate for the number of summands.
The figures we include in this paper show the estimates of $\ell(u)$ for u in the range $[0, 10^4)$, empirical logarithmic rates obtained by estimating $\log\big(\mathrm{Var}(\hat\ell(u))\big)/\big(2\log(\ell(u))\big)$, and empirical relative errors obtained by estimating $\mathrm{Var}(\hat\ell(u))/\ell(u)^2$.
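Concretely, given i.i.d. replicates of an estimator at a fixed threshold u, the plotted quantities can be computed as in the following sketch (the replicates used in the usage example are placeholders):

```python
import numpy as np

def empirical_diagnostics(replicates):
    """Point estimate, empirical logarithmic rate log(Var)/(2 log(ell)),
    and empirical relative error Var/ell^2, from i.i.d. replicates at a fixed u."""
    ell_hat = np.mean(replicates)
    var_hat = np.var(replicates, ddof=1)
    log_rate = np.log(var_hat) / (2.0 * np.log(ell_hat))
    rel_err = var_hat / ell_hat ** 2
    return ell_hat, log_rate, rel_err

# Hypothetical usage with stand-in replicates of some estimator at a given u:
rng = np.random.default_rng(3)
reps = rng.exponential(1e-6, size=10**5)
print(empirical_diagnostics(reps))
```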
In later examples, we restrict attention to the estimators combined with control variates for the number of summands, because combining with control variates slightly improves the efficiency of the proposed estimators. We first provide a preliminary comparison of the estimators $\hat\ell_{\mathrm{ConAK1}}$, $\hat\ell_{\mathrm{ConAK1+CV}}$, $\hat\ell_{\mathrm{ConAK1+IS}}$, and $\hat\ell_{\mathrm{ConAK1+IS+CV}}$ for an exponential scaling random variable, and of the estimators $\hat\ell_{\mathrm{ConAK2}}$, $\hat\ell_{\mathrm{ConAK2+CV}}$, $\hat\ell_{\mathrm{ConAK2+IS}}$, and $\hat\ell_{\mathrm{ConAK2+IS+CV}}$ for a Weibull scaling random variable, aiming to rule out estimators that do not perform well in later examples. The importance sampling densities for these preliminary examples are discussed in Examples 3 and 4, and the results are summarised in Figure 1 and Figure 2, respectively. In Figure 1, the estimators $\hat\ell_{\mathrm{ConAK1}}$ and $\hat\ell_{\mathrm{ConAK1+CV}}$ perform poorly due to numerical instability: once u exceeds around 2600, the computation breaks down. Thus, we do not include the estimators $\hat\ell_{\mathrm{ConAK1}}$ and $\hat\ell_{\mathrm{ConAK1+CV}}$ in later examples. In Figure 2, all estimators appear to provide sharp estimates. We also observe that the empirical logarithmic rate tends to 1 as u becomes large, which suggests that the estimators are logarithmically efficient. Nevertheless, the empirical relative errors for the estimators without importance sampling are increasing and less stable when u becomes large. Since a slight improvement is observed when combining with control variates in both Figure 1 and Figure 2, we include only the estimators combined with control variates in what follows.
Example 3
(Light-tailed Scaling: Exponential). Let $W_i \overset{\mathrm{i.i.d.}}{\sim} \mathrm{Exp}(\mu)$ be the scaling random variables, with PDF
$$h(w_i) = \mu e^{-\mu w_i}, \qquad w_i \ge 0,$$
and consider $X_i \overset{\mathrm{i.i.d.}}{\sim} \mathrm{Erlang}(k, \beta)$, with PDF
$$f(x_i) = \frac{\beta^{k} x_i^{k-1} e^{-\beta x_i}}{(k-1)!}, \qquad x_i \ge 0,$$
where i = 1 , , N .
The Erlang distribution is the same as a gamma distribution with the shape parameter being an integer.
(1) 
When conditioning on the scaling random variable, the resulting exponentially-twisted PDF is
$$h_\theta(w) = \mu_\theta e^{-\mu_\theta w}, \qquad w \ge 0,$$
where $\mu_\theta < \mu$. Hence, the likelihood ratio is given by
$$\frac{h(W_N)}{h_\theta(W_N)} = \frac{\mu}{\mu_\theta}\, e^{-(\mu - \mu_\theta) W_N}.$$
The choice of $\mu_\theta$, conditioning on N, is determined by solving $\zeta'(\theta(N)) = \big(Z_{(N-1)} \vee (u - S_{N-1})\big)/E[X_N]$ using $\zeta(\theta(N)) = \ln\frac{\mu}{\mu - \theta(N)}$, resulting in $\theta(N) = \mu - E[X_N]/\big(Z_{(N-1)} \vee (u - S_{N-1})\big)$ and $\mu_\theta = E[X_N]/\big(Z_{(N-1)} \vee (u - S_{N-1})\big)$. The likelihood ratio for (7), conditioning on N, is given by
$$\frac{h(W_N)}{h_\theta(W_N)} = \frac{\big(Z_{(N-1)} \vee (u - S_{N-1})\big)\, \beta\, \mu}{k}\, \exp\!\left[-\left(\mu - \frac{k}{\beta\, \big(Z_{(N-1)} \vee (u - S_{N-1})\big)}\right) W_N\right].$$
(2) 
When conditioning on the PH random variable, the resulting PDF of the Erlang random variable is E r l a n g ( k , β θ ) . Then, the likelihood ratio is given by
$$\frac{f(X_N)}{f_\theta(X_N)} = \left(\frac{\beta}{\beta_\theta}\right)^{k} e^{-(\beta - \beta_\theta) X_N}.$$
We solve $\zeta'(\theta(N)) = \big(Z_{(N-1)} \vee (u - S_{N-1})\big)/E[W_N]$ using $\zeta(\theta(N)) = k \ln\frac{\beta}{\beta - \theta(N)}$, conditioning on N, resulting in $\theta(N) = \beta - k E[W_N]/\big(Z_{(N-1)} \vee (u - S_{N-1})\big)$ and $\beta_\theta = k E[W_N]/\big(Z_{(N-1)} \vee (u - S_{N-1})\big)$. The likelihood ratio for (10), conditioning on N, is given by
$$\frac{f(X_N)}{f_\theta(X_N)} = \left(\frac{\big(Z_{(N-1)} \vee (u - S_{N-1})\big)\, \beta\, \mu}{k}\right)^{k} \exp\!\left[-\left(\beta - \frac{k}{\mu\, \big(Z_{(N-1)} \vee (u - S_{N-1})\big)}\right) X_N\right].$$
An alternative option is to perform importance sampling using the exponential PDF. Then, the likelihood ratio is given by
$$\frac{f(X_N)}{f_\theta(X_N)} = \frac{\beta^{k}}{\beta_\theta}\, \frac{X_N^{\,k-1}\, e^{-(\beta - \beta_\theta) X_N}}{(k-1)!}.$$
We solve $\zeta'(\theta(N)) = \big(Z_{(N-1)} \vee (u - S_{N-1})\big)/E[W_N]$ using $\zeta(\theta(N)) = \ln\frac{\beta}{\beta - \theta(N)}$, conditioning on N, resulting in $\theta(N) = \beta - E[W_N]/\big(Z_{(N-1)} \vee (u - S_{N-1})\big)$ and $\beta_\theta = E[W_N]/\big(Z_{(N-1)} \vee (u - S_{N-1})\big)$. The likelihood ratio for (10), conditioning on N, is given by
$$\frac{f(X_N)}{f_\theta(X_N)} = \frac{\big(Z_{(N-1)} \vee (u - S_{N-1})\big)\, \beta^{k}\, \mu\, X_N^{\,k-1}}{(k-1)!}\, \exp\!\left[-\left(\beta - \frac{1}{\mu\, \big(Z_{(N-1)} \vee (u - S_{N-1})\big)}\right) X_N\right].$$
We take a sample size of $10^5$, $W_i \overset{\mathrm{i.i.d.}}{\sim} \mathrm{Exp}(\mu)$, $X_i \overset{\mathrm{i.i.d.}}{\sim} \mathrm{Erlang}(k, \beta)$, where $i = 1, \dots, N$, and $N \sim \mathrm{Geo}(p)$, with $\mu = 1$, $k = 1$, $\beta = 3$, and $p = 0.2$. The results are shown in Figure 3. Proposition 1 does not provide a theoretical efficiency guarantee for this example, as the number of summands is random, although the scaling random variable has a Weibullian tail. From the numerical results, we observe that the estimator $\hat\ell_{\mathrm{ConAK2+CV}}$ does not provide reliable estimates, whereas the estimators $\hat\ell_{\mathrm{ConAK1+IS+CV}}$ and $\hat\ell_{\mathrm{ConAK2+IS+CV}}$ perform better.
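Putting the pieces of this example together, a single replicate of $\hat\ell_{\mathrm{ConAK1+IS+CV}}$ (the light-tailed branch of Algorithm 1 with the parameter values above) might be implemented as in the sketch below; this is an illustrative sketch rather than the code used to produce Figure 3.

```python
import numpy as np
from scipy.stats import gamma as gamma_dist

rng = np.random.default_rng(4)
mu, k, beta, p = 1.0, 1, 3.0, 0.2           # parameters used in this example
EX = k / beta                                # E[X] for Erlang(k, beta)

def Fbar(x):                                 # complementary CDF of Erlang(k, beta)
    return gamma_dist.sf(x, a=k, scale=1.0 / beta)

def replicate_conak1_is_cv(u):
    """One replicate of ConAK1+IS+CV for exponential scaling (light-tailed branch)."""
    n = rng.geometric(p)
    W = rng.exponential(1.0 / mu, size=n - 1)
    X = rng.gamma(shape=k, scale=1.0 / beta, size=n - 1)
    Z = W * X
    thr = max(Z.max() if n > 1 else 0.0, u - Z.sum())    # Z_{(N-1)} v (u - S_{N-1})
    mu_theta = EX / thr                                  # exponentially twisted rate for W_N
    W_n = rng.exponential(1.0 / mu_theta)
    lr = (mu / mu_theta) * np.exp(-(mu - mu_theta) * W_n)
    return n * Fbar(thr / W_n) * lr - (n - 1.0 / p) * Fbar(u / W_n) * lr

reps = np.array([replicate_conak1_is_cv(u=1000.0) for _ in range(10**4)])
print(reps.mean(), reps.std(ddof=1) / np.sqrt(len(reps)))
```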
Example 4
(Heavy-tailed Scaling: Weibull). Let the scaling random variable W i be Weibull-distributed, with PDF
$$h(w_i) = \frac{b}{a}\left(\frac{w_i}{a}\right)^{b-1} e^{-(w_i/a)^b}, \qquad w_i \ge 0,$$
and consider $X_i \overset{\mathrm{i.i.d.}}{\sim} \mathrm{Erlang}(k, \beta)$, where $i = 1, \dots, N$.
(1) 
When conditioning on the scaling random variable, the resulting hazard-rate twisted PDF of the scaling random variable is
$$h_\theta(w) = \lambda(w)(1 - \theta)\, e^{-\int_0^w (1-\theta)\lambda(y)\,\mathrm{d}y}, \qquad w \ge 0.$$
Thus, conditioning on N, and taking into account $E[X_N]$, the likelihood ratio of a single element associated with this change of measure is
$$\frac{h(W_N)}{h_\theta(W_N)} = \frac{1}{1 - \theta(N)}\, e^{-\theta(N)\Lambda(W_N)}.$$
The likelihood ratio for (7), conditioning on N, is given by
$$\frac{h(W_N)}{h_\theta(W_N)} = -\ln \bar H\!\left(\frac{Z_{(N-1)} \vee (u - S_{N-1})}{E[X_N]}\right) \big(\bar H(W_N)\big)^{1 + 1/\ln \bar H\left((Z_{(N-1)} \vee (u - S_{N-1}))/E[X_N]\right)},$$
where $\theta(N) = 1 - 1/\Lambda\big((Z_{(N-1)} \vee (u - S_{N-1}))/E[X_N]\big)$. We note that from (11), we have $\bar H_\theta(W_N) = \bar H(W_N)^{1-\theta}$, and so it is straightforward to generate the twisted scaling random variables by using the inverse transform method (a small sketch is given at the end of this example).
(2) 
When conditioning on the PH random variable, the resulting PDF of the Erlang random variable is E r l a n g ( k , β θ ) ; then, the likelihood ratio is given by
$$\frac{f(X_N)}{f_\theta(X_N)} = \left(\frac{\beta}{\beta_\theta}\right)^{k} e^{-(\beta - \beta_\theta) X_N}.$$
We solve $\zeta'(\theta(N)) = \big(Z_{(N-1)} \vee (u - S_{N-1})\big)/E[W_N]$ using $\zeta(\theta(N)) = k \ln\frac{\beta}{\beta - \theta(N)}$, conditioning on N, resulting in $\theta(N) = \beta - k E[W_N]/\big(Z_{(N-1)} \vee (u - S_{N-1})\big)$ and $\beta_\theta = k E[W_N]/\big(Z_{(N-1)} \vee (u - S_{N-1})\big)$. The likelihood ratio for (10), conditioning on N, is given by
$$\frac{f(X_N)}{f_\theta(X_N)} = \left(\frac{\big(Z_{(N-1)} \vee (u - S_{N-1})\big)\, \beta}{k\, a\, \Gamma(1 + 1/b)}\right)^{k} \exp\!\left[-\left(\beta - \frac{k\, a\, \Gamma(1 + 1/b)}{Z_{(N-1)} \vee (u - S_{N-1})}\right) X_N\right].$$
An alternative option is to perform importance sampling using the exponential PDF. Then, the likelihood ratio is given by
$$\frac{f(X_N)}{f_\theta(X_N)} = \frac{\beta^{k}}{\beta_\theta}\, \frac{X_N^{\,k-1}\, e^{-(\beta - \beta_\theta) X_N}}{(k-1)!}.$$
We solve $\zeta'(\theta(N)) = \big(Z_{(N-1)} \vee (u - S_{N-1})\big)/E[W_N]$ using $\zeta(\theta(N)) = \ln\frac{\beta}{\beta - \theta(N)}$, conditioning on N, resulting in $\theta(N) = \beta - E[W_N]/\big(Z_{(N-1)} \vee (u - S_{N-1})\big)$ and $\beta_\theta = E[W_N]/\big(Z_{(N-1)} \vee (u - S_{N-1})\big)$. The likelihood ratio for (10), conditioning on N, is given by
$$\frac{f(X_N)}{f_\theta(X_N)} = \frac{\big(Z_{(N-1)} \vee (u - S_{N-1})\big)\, \beta^{k}\, X_N^{\,k-1}}{a\, \Gamma(1 + 1/b)\, (k-1)!}\, \exp\!\left[-\left(\beta - \frac{a\, \Gamma(1 + 1/b)}{Z_{(N-1)} \vee (u - S_{N-1})}\right) X_N\right].$$
We take a sample size of $10^5$, $W_i \overset{\mathrm{i.i.d.}}{\sim} \mathrm{Weibull}(b, a)$, $X_i \overset{\mathrm{i.i.d.}}{\sim} \mathrm{Erlang}(k, \beta)$, where $i = 1, \dots, N$, and $N \sim \mathrm{Geo}(p)$, with scale parameter $a = 2$, shape parameter $b = 0.35$, $k = 2$, $\beta = 3$, and $p = 0.2$. The results are shown in Figure 4. Proposition 1 does not provide a theoretical efficiency guarantee for this example, as the number of summands is random, although the scaling random variable has a Weibullian tail. From the numerical results, we observe that the estimators $\hat\ell_{\mathrm{ConAK1+IS+CV}}$ and $\hat\ell_{\mathrm{ConAK2+IS+CV}}$ (importance sampling density using an exponential PDF) perform well. However, the estimators $\hat\ell_{\mathrm{ConAK2+CV}}$ and $\hat\ell_{\mathrm{ConAK2+IS+CV}}$ (importance sampling density using an Erlang PDF) appear to fluctuate more when u becomes larger, and they have increasing empirical relative errors.
Next, we change the random variable N from geometrically distributed to deterministic, and keep the rest of the parameters the same as above. The results are shown in Figure 5. Proposition 1 suggests that the estimator $\hat\ell_{\mathrm{ConAK2+CV}}$ is logarithmically efficient, as this example satisfies the second case of Proposition 1. From the numerical results, we observe that the estimators $\hat\ell_{\mathrm{ConAK1+IS+CV}}$ and $\hat\ell_{\mathrm{ConAK2+IS+CV}}$ (importance sampling density using an exponential PDF) perform well. However, the estimators $\hat\ell_{\mathrm{ConAK2+CV}}$ and $\hat\ell_{\mathrm{ConAK2+IS+CV}}$ (importance sampling density using an Erlang PDF) appear to fluctuate more when u becomes larger, and they have increasing empirical relative errors.
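For the Weibull scaling of this example, the inverse transform noted in item (1) has a closed form: since $\bar H(w) = e^{-(w/a)^b}$ and $\bar H_\theta(w) = \bar H(w)^{1-\theta}$, setting $\bar H_\theta(W) = U$ gives $W = a\big(-\ln U/(1-\theta)\big)^{1/b}$. A short sketch with this example's parameter values:

```python
import numpy as np

rng = np.random.default_rng(5)
a, b = 2.0, 0.35                       # Weibull scale and shape from this example

def twisted_weibull(theta, size=1):
    """Draw from the hazard-rate twisted Weibull(b, a):
    Hbar_theta(w) = exp(-(1 - theta) * (w / a)^b), inverted in closed form."""
    u = rng.uniform(size=size)
    return a * (-np.log(u) / (1.0 - theta)) ** (1.0 / b)

print(twisted_weibull(theta=0.9, size=3))
```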
Example 5
(Heavy-tailed Scaling: Pareto). Let the scaling random variable W i be Pareto-distributed, with PDF
$$h(w_i) = \frac{\delta}{w_i^{\delta + 1}}, \qquad w_i \ge 1,$$
and consider $X_i \overset{\mathrm{i.i.d.}}{\sim} \mathrm{Erlang}(k, \beta)$, where $i = 1, \dots, N$. Note that Pareto distributions belong to $\mathcal{D} \cap \mathcal{L}$; see [5].
(1) 
When conditioning on the scaling random variable, the resulting hazard-rate twisted PDF of the scaling random variable is
$$h_\theta(w) = \delta(1 - \theta)\, w^{-(1 + \delta(1-\theta))}, \qquad w \ge 1.$$
Hence, the likelihood ratio is given by
$$\frac{h(W_N)}{h_\theta(W_N)} = \frac{1}{1 - \theta}\, W_N^{-\theta\delta}.$$
Thus, conditioning on N, we set $\theta(N) = 1 - 1/\Lambda\big((Z_{(N-1)} \vee (u - S_{N-1}))/E[X_N]\big)$. Since $\Lambda(W_N) = \delta \ln(W_N)$ in this case, the likelihood ratio of a single element for (7), conditioning on N, is
$$\frac{h(W_N)}{h_\theta(W_N)} = \delta \ln\!\left(\frac{\big(Z_{(N-1)} \vee (u - S_{N-1})\big)\,\beta}{k}\right) W_N^{-\left(\delta - 1/\ln\left(\left(Z_{(N-1)} \vee (u - S_{N-1})\right)\beta/k\right)\right)}.$$
(2) 
When conditioning on the PH random variable, the resulting PDF of the Erlang random variable is E r l a n g ( k , β θ ) ; then, the likelihood ratio is given by
$$\frac{f(X_N)}{f_\theta(X_N)} = \left(\frac{\beta}{\beta_\theta}\right)^{k} e^{-(\beta - \beta_\theta) X_N}.$$
We solve $\zeta'(\theta(N)) = \big(Z_{(N-1)} \vee (u - S_{N-1})\big)/E[W_N]$ using $\zeta(\theta(N)) = k \ln\frac{\beta}{\beta - \theta(N)}$, conditioning on N, resulting in $\theta(N) = \beta - k E[W_N]/\big(Z_{(N-1)} \vee (u - S_{N-1})\big)$ and $\beta_\theta = k E[W_N]/\big(Z_{(N-1)} \vee (u - S_{N-1})\big)$. The likelihood ratio for (10), conditioning on N, is given by
$$\frac{f(X_N)}{f_\theta(X_N)} = \left(\frac{\big(Z_{(N-1)} \vee (u - S_{N-1})\big)\, \beta\, (\delta - 1)}{k\, \delta}\right)^{k} \exp\!\left[-\left(\beta - \frac{k\, \delta}{\big(Z_{(N-1)} \vee (u - S_{N-1})\big)(\delta - 1)}\right) X_N\right].$$
An alternative option is to perform importance sampling using the exponential PDF. Then, the likelihood ratio is given by
$$\frac{f(X_N)}{f_\theta(X_N)} = \frac{\beta^{k}}{\beta_\theta}\, \frac{X_N^{\,k-1}\, e^{-(\beta - \beta_\theta) X_N}}{(k-1)!}.$$
We solve $\zeta'(\theta(N)) = \big(Z_{(N-1)} \vee (u - S_{N-1})\big)/E[W_N]$ using $\zeta(\theta(N)) = \ln\frac{\beta}{\beta - \theta(N)}$, conditioning on N, resulting in $\theta(N) = \beta - E[W_N]/\big(Z_{(N-1)} \vee (u - S_{N-1})\big)$ and $\beta_\theta = E[W_N]/\big(Z_{(N-1)} \vee (u - S_{N-1})\big)$. The likelihood ratio for (10), conditioning on N, is given by
$$\frac{f(X_N)}{f_\theta(X_N)} = \frac{\big(Z_{(N-1)} \vee (u - S_{N-1})\big)\, \beta^{k}\, (\delta - 1)\, X_N^{\,k-1}}{\delta\, (k-1)!}\, \exp\!\left[-\left(\beta - \frac{\delta}{\big(Z_{(N-1)} \vee (u - S_{N-1})\big)(\delta - 1)}\right) X_N\right].$$
We take a sample size of $10^5$, $W_i \overset{\mathrm{i.i.d.}}{\sim} \mathrm{Pareto}(\delta)$, $X_i \overset{\mathrm{i.i.d.}}{\sim} \mathrm{Erlang}(k, \beta)$, where $i = 1, \dots, N$, and $N \sim \mathrm{Geo}(p)$, with $\delta = 8$, $k = 2$, $\beta = 3$, and $p = 0.2$. The results are shown in Figure 6. Proposition 1 suggests that the estimator $\hat\ell_{\mathrm{ConAK2+CV}}$ will have bounded relative error, since the Pareto distribution belongs to $\mathcal{D} \cap \mathcal{L}$. From the numerical results, we observe that the estimator $\hat\ell_{\mathrm{ConAK1+IS+CV}}$ performs well. The estimator $\hat\ell_{\mathrm{ConAK2+CV}}$ has a few larger empirical relative error estimates as u is varied. The estimators $\hat\ell_{\mathrm{ConAK2+IS+CV}}$ (importance sampling density using an Erlang PDF and using an exponential PDF) appear to fluctuate more when u becomes larger, and they have increasing empirical relative errors.
Example 6
(Heavy-tailed Scaling: Log-normal). Consider now $X_i \overset{\mathrm{i.i.d.}}{\sim} \mathrm{Erlang}(k, \beta)$, and the scaling random variable $W_i \overset{\mathrm{i.i.d.}}{\sim} \mathrm{LogN}(\mu, \sigma^2)$ with PDF
$$h(w_i) = \frac{1}{w_i \sigma \sqrt{2\pi}}\, e^{-\frac{(\ln w_i - \mu)^2}{2\sigma^2}}, \qquad w_i > 0,$$
with corresponding CDF
$$H(w_i) = \Phi\!\left(\frac{\ln w_i - \mu}{\sigma}\right), \qquad w_i > 0,$$
where $i = 1, \dots, N$, and $\Phi(\cdot)$ is the CDF of a standard normal random variate.
(1) 
When conditioning on the scaling random variable, the resulting hazard-rate twisted PDF of the scaling random variable is
$$h_\theta(w) = \lambda(w)(1 - \theta)\, e^{-\int_0^w (1-\theta)\lambda(y)\,\mathrm{d}y}, \qquad w \ge 0.$$
The likelihood ratio is given by
$$\frac{h(W_N)}{h_\theta(W_N)} = \frac{1}{1 - \theta(N)}\, \bar H(W_N)^{\theta(N)},$$
where $\theta(N) = 1 - 1/\Lambda\big((Z_{(N-1)} \vee (u - S_{N-1}))/E[X_N]\big)$. The likelihood ratio for (7), conditioning on N, is given by
$$\frac{h(W_N)}{h_\theta(W_N)} = -\ln \bar H\!\left(\frac{Z_{(N-1)} \vee (u - S_{N-1})}{E[X_N]}\right) \big(\bar H(W_N)\big)^{1 + 1/\ln \bar H\left((Z_{(N-1)} \vee (u - S_{N-1}))/E[X_N]\right)},$$
where $\theta(N) = 1 - 1/\Lambda\big((Z_{(N-1)} \vee (u - S_{N-1}))/E[X_N]\big)$.
We note that from (11), we have $\bar H_\theta(W_N) = \bar H(W_N)^{1-\theta}$, and so it is straightforward to generate the twisted scaling random variables by using the inverse transform method.
(2) 
When conditioning on the PH random variable, the resulting PDF of the Erlang random variable is E r l a n g ( k , β θ ) ; then, the likelihood ratio is given by
$$\frac{f(X_N)}{f_\theta(X_N)} = \left(\frac{\beta}{\beta_\theta}\right)^{k} e^{-(\beta - \beta_\theta) X_N}.$$
We solve $\zeta'(\theta(N)) = \big(Z_{(N-1)} \vee (u - S_{N-1})\big)/E[W_N]$ using $\zeta(\theta(N)) = k \ln\frac{\beta}{\beta - \theta(N)}$, conditioning on N, resulting in $\theta(N) = \beta - k E[W_N]/\big(Z_{(N-1)} \vee (u - S_{N-1})\big)$ and $\beta_\theta = k E[W_N]/\big(Z_{(N-1)} \vee (u - S_{N-1})\big)$. The likelihood ratio for (10), conditioning on N, is given by
$$\frac{f(X_N)}{f_\theta(X_N)} = \left(\frac{\big(Z_{(N-1)} \vee (u - S_{N-1})\big)\, \beta}{k\, E[W_N]}\right)^{k} \exp\!\left[-\left(\beta - \frac{k\, E[W_N]}{Z_{(N-1)} \vee (u - S_{N-1})}\right) X_N\right].$$
An alternative option is to perform importance sampling using the exponential PDF. Then, the likelihood ratio is given by
$$\frac{f(X_N)}{f_\theta(X_N)} = \frac{\beta^{k}}{\beta_\theta}\, \frac{X_N^{\,k-1}\, e^{-(\beta - \beta_\theta) X_N}}{(k-1)!}.$$
We solve $\zeta'(\theta(N)) = \big(Z_{(N-1)} \vee (u - S_{N-1})\big)/E[W_N]$ using $\zeta(\theta(N)) = \ln\frac{\beta}{\beta - \theta(N)}$, conditioning on N, resulting in $\theta(N) = \beta - E[W_N]/\big(Z_{(N-1)} \vee (u - S_{N-1})\big)$ and $\beta_\theta = E[W_N]/\big(Z_{(N-1)} \vee (u - S_{N-1})\big)$. The likelihood ratio for (10), conditioning on N, is given by
$$\frac{f(X_N)}{f_\theta(X_N)} = \frac{\big(Z_{(N-1)} \vee (u - S_{N-1})\big)\, \beta^{k}\, X_N^{\,k-1}}{E[W_N]\, (k-1)!}\, \exp\!\left[-\left(\beta - \frac{E[W_N]}{Z_{(N-1)} \vee (u - S_{N-1})}\right) X_N\right].$$
We take a sample size of $10^5$, $W_i \overset{\mathrm{i.i.d.}}{\sim} \mathrm{LogN}(\mu, \sigma^2)$, $X_i \overset{\mathrm{i.i.d.}}{\sim} \mathrm{Erlang}(k, \beta)$, where $i = 1, \dots, N$, and $N \sim \mathrm{Geo}(p)$, with $\mu = 2$, $\sigma = 1.5$, $k = 2$, $\beta = 3$, and $p = 0.2$. The results are shown in Figure 7. Proposition 1 does not provide a theoretical efficiency guarantee for this example, as the log-normal distribution is neither in $\mathcal{D} \cap \mathcal{L}$ nor has a Weibullian tail. From the numerical results, we observe that the estimators $\hat\ell_{\mathrm{ConAK1+IS+CV}}$, $\hat\ell_{\mathrm{ConAK2+CV}}$, and $\hat\ell_{\mathrm{ConAK2+IS+CV}}$ (importance sampling density using an exponential PDF) perform well. $\hat\ell_{\mathrm{ConAK2+IS+CV}}$ (importance sampling density using an exponential PDF) turns out to have slightly lower empirical logarithmic rates and higher empirical relative errors than the estimators $\hat\ell_{\mathrm{ConAK1+IS+CV}}$ and $\hat\ell_{\mathrm{ConAK2+CV}}$. The estimator $\hat\ell_{\mathrm{ConAK2+IS+CV}}$ (importance sampling density using an Erlang PDF) appears to fluctuate more when u becomes larger, and it has increasing empirical relative errors.

6. Discussion and Outlook

In this paper, we proposed straightforward simulation methods for estimating tail probabilities of random sums of scale mixture of PH-distributed summands. We combined Asmussen–Kroese estimation with the conditional Monte Carlo method and further exploited importance sampling applied to the last term of the summands. Since the method can perform poorly when the number of summands is large, we implemented an additional control variate for N to improve the accuracy of this estimator.
When combining Asmussen–Kroese estimation with the conditional Monte Carlo method, we can either condition on the scaling random variable or PH random variable. The estimators are denoted as ^ ConAK 1 and ^ ConAK 2 , respectively. In the given examples, we compared the estimators ^ ConAK 1 + IS + CV , ^ ConAK 2 + CV , ^ ConAK 2 + IS + CV (importance sampling using an exponential PDF), and ^ ConAK 2 + IS + CV (importance sampling using an Erlang PDF). We took the distribution of N to be geometrically distributed in most cases, with one extra example for N as deterministic.
We observed that in Example 3, where the scaling random variable was exponentially distributed, the estimators $\hat\ell_{\mathrm{ConAK1+IS+CV}}$ and $\hat\ell_{\mathrm{ConAK2+IS+CV}}$, with importance sampling density being either exponential or Erlang, gave sharp estimates and empirical logarithmic rates which tended to 1 as u became large, and also had lower empirical relative errors than the estimator $\hat\ell_{\mathrm{ConAK2+CV}}$. The estimator $\hat\ell_{\mathrm{ConAK2+CV}}$ performed poorly due to numerical limitations, as we observed that when u exceeded around 1500, the empirical relative errors were 0 in Figure 3c. The poor performance of the estimator $\hat\ell_{\mathrm{ConAK2+CV}}$ is not surprising, as Proposition 1 does not provide theoretical efficiency guarantees when the scaling random variable has a Weibullian tail and the number of summands is random.
In Example 4, where the scaling random variable was Weibull-distributed, we took the number of summands N to be geometrically distributed or deterministic. The results are summarised in Figure 4 and Figure 5. We observed that all the estimators performed similarly in both cases. The estimators $\hat\ell_{\mathrm{ConAK1+IS+CV}}$, $\hat\ell_{\mathrm{ConAK2+CV}}$, and $\hat\ell_{\mathrm{ConAK2+IS+CV}}$ (importance sampling using an exponential PDF) gave sharp estimates and empirical logarithmic rates which tended to 1 as u became large, and also had lower empirical relative errors than the estimator $\hat\ell_{\mathrm{ConAK2+IS+CV}}$ with importance sampling density being Erlang. Despite positive indications from the empirical logarithmic efficiency estimates, the empirical relative errors for the estimator $\hat\ell_{\mathrm{ConAK2+CV}}$ had an increasing trend and were fluctuating. Among the other two better-performing estimators, the estimator $\hat\ell_{\mathrm{ConAK1+IS+CV}}$ had the lowest empirical relative errors, and empirical logarithmic rates which tended to 1 faster.
In Example 5, where the scaling random variable was Pareto-distributed, the estimators ^ ConAK 1 + IS + CV , ^ ConAK 2 + CV , and ^ ConAK 2 + IS + CV (importance sampling using an exponential PDF) gave sharp estimates and empirical logarithmic rates which tended to 1 as u became large, and also had lower empirical relative errors than estimator ^ ConAK 2 + IS + CV with importance sampling density being Erlang. Despite positive indications from the empirical logarithmic efficiency estimates, the empirical relative errors for the estimator ^ ConAK 2 + IS + CV (importance sampling using an exponential PDF) had an increasing trend and were fluctuating. Among the other two better-performing estimators, the estimator ^ ConAK 1 + IS + CV had the lowest empirical relative errors, and empirical logarithmic rates which tended to 1 faster.
In Example 6, where the scaling random variable was log-normal-distributed, we observed similar trends as in Example 5. In this example, we observed that the estimators ^ ConAK 1 + IS + CV , ^ ConAK 2 + CV , and ^ ConAK 2 + IS + CV (importance sampling using an exponential PDF) gave sharp estimates. Empirical logarithmic rates of all the estimators appeared to tend to 1 as u became large. The estimators ^ ConAK 1 + IS + CV , ^ ConAK 2 + CV , and ^ ConAK 2 + IS + CV (importance sampling using an exponential PDF) had lower empirical relative errors than estimator ^ ConAK 2 + IS + CV with importance sampling density being Erlang. The estimators ^ ConAK 1 + IS + CV and ^ ConAK 2 + CV performed similarly, and showed better performance with empirical relative errors closer to 0, and higher empirical logarithmic rates in this example.
In all the above examples, the estimator ^ ConAK 2 + IS + CV (importance sampling using an exponential PDF) appeared to perform better than the estimator ^ ConAK 2 + IS + CV (importance sampling using an Erlang PDF), with sharp estimates, empirical logarithmic rates closer to 1, and lower empirical relative errors.
Finally, the examples given suggest that the estimator $\hat\ell_{\mathrm{ConAK2+CV}}$ performs well when the scaling random variable has a heavier tail. The efficiency study suggests that $\hat\ell_{\mathrm{ConAK1}}$ and $\hat\ell_{\mathrm{ConAK2}}$ will attain the same efficiency as the Asmussen–Kroese estimator for the classes of distributions in Proposition 1. However, when conditioning on the scaling random variable, we do not always obtain reliable estimates, due to numerical limitations. In this case, the numerical experiments suggest that further exploiting importance sampling on the scaling random variable of the last summand provides more reliable estimates. The log-normal case is not covered by Proposition 1, so we cannot apply it to conclude theoretical efficiency when the scaling random variable is log-normal-distributed. However, the numerical experiment suggests that the estimator $\hat\ell_{\mathrm{ConAK1+IS+CV}}$ performs the best.
In conclusion, the numerical studies suggest that, in general, the estimator $\hat\ell_{\mathrm{ConAK1+IS+CV}}$ provides the most reliable estimates when estimating tail probabilities of random sums of phase-type scale mixture random variables. When the scaling random variables have a much heavier tail (e.g., the log-normal case), the estimator $\hat\ell_{\mathrm{ConAK2+CV}}$ performs better than in other cases. In all cases, the estimator $\hat\ell_{\mathrm{ConAK2+IS+CV}}$ with an exponential importance sampling PDF appears to provide more reliable and stable estimates than with an Erlang PDF, with empirical logarithmic rates which tend to 1 as u becomes large, and relatively low empirical relative errors.
At present, the importance sampling densities and related parameters are chosen heuristically to ensure that the probabilities of interest are no longer rare under these changes of measure. One key question which we did not address in this work is how to choose these densities and parameters in a principled and ideally asymptotically optimal way. We studied the efficiency of the conditional Asmussen–Kroese estimators; proving the theoretical efficiency properties of the proposed estimators $\hat\ell_{\mathrm{ConAK1+IS+CV}}$ and $\hat\ell_{\mathrm{ConAK2+IS+CV}}$ is a subject for future research. Furthermore, in the present work, we assumed for simplicity that all of the random variables in the sum are independent, and the PH random variables were chosen to be Erlang-distributed. An interesting avenue for future work is to develop effective simulation methods for this problem when there is structured dependence between the random variables. The PH assumption can also be relaxed, as long as the product of the two random variables belongs to $\mathcal{D} \cap \mathcal{L}$, is Weibull-distributed with an appropriate shape parameter, or is log-normal-distributed.

Author Contributions

Conceptualisation, H.Y. and T.T.; methodology, H.Y. and T.T.; software, H.Y.; validation, H.Y. and T.T.; formal analysis, H.Y. and T.T.; investigation, H.Y.; resources, H.Y.; writing—original draft preparation, H.Y.; writing—review and editing, H.Y. and T.T.; visualisation, H.Y. and T.T.; supervision, T.T.; project administration, H.Y. and T.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All code for the numerical experiments found in this paper is available at https://github.com/HY9412/Estimating-Tail-Probabilities-of-Random-Sums-of-Phase-type-Scale-Mixture-Random-Variables (accessed on 5 August 2022).

Acknowledgments

The authors acknowledge the valuable input of Leonardo Rojas-Nandayapa on an earlier version of this manuscript. The authors thank the reviewers for their comments which improved the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Asmussen, S.; Albrecher, H. Ruin Probabilities, 2nd ed.; World Scientific Publishing: Singapore, 2010. [Google Scholar]
  2. Asmussen, S. Applied Probabilities and Queues, 2nd ed.; Springer: New York, NY, USA, 2003. [Google Scholar]
  3. Asmussen, S.; Nerman, O.; Olsson, M. Fitting phase-type distributions via the EM algorithm. Scand. J. Stat. 1996, 23, 419–441. [Google Scholar]
  4. Bladt, M.; Rojas-Nandayapa, L. Fitting phase-type scale mixtures to heavy-tailed data and distributions. Extremes 2018, 21, 285–313. [Google Scholar] [CrossRef]
  5. Foss, S.; Korshunov, D.; Zachary, S. An Introduction to Heavy-Tailed and Subexponential Distributions, 2nd ed.; Springer: New York, NY, USA, 2013. [Google Scholar]
  6. Asmussen, S.; Binswanger, K.; Højgaard, B. Rare Events Simulation for Heavy-tailed Distributions. Bernoulli 2000, 6, 303–322. [Google Scholar] [CrossRef]
  7. Asmussen, S.; Kroese, D. Improved algorithms for rare event simulation with heavy tails. Adv. Appl. Probab. 2006, 38, 545–558. [Google Scholar] [CrossRef]
  8. Juneja, S.; Shahabuddin, P. Simulating Heavy Tailed Processes Using Delayed Hazard Rate Twisting. ACM Trans. Model. Comput. Simul. (TOMACS) 2002, 12, 94–118. [Google Scholar] [CrossRef]
  9. Keilson, J.; Steutel, F. Mixtures of distributions, moment inequalities and measures of exponentiality and normality. Ann. Probab. 1974, 2, 112–130. [Google Scholar] [CrossRef]
  10. Bladt, M.; Nielsen, B.F.; Samorodnitsky, G. Calculation of Ruin Probabilities for a Dense Class of Heavy Tailed Distributions. Scand. Actuar. J. 2015, 2015, 573–591. [Google Scholar] [CrossRef]
  11. Rojas-Nandayapa, L.; Xie, W. Asymptotic tail behaviour of phase-type scale mixture distributions. Ann. Actuar. Sci. 2018, 12, 412–432. [Google Scholar] [CrossRef]
  12. Neuts, M. Probability Distributions of Phase Type; Department of Mathematics, University of Louvain: Ottignies-Louvain-la-Neuve, Belgium, 1975; pp. 173–206. [Google Scholar]
  13. Hartinger, J.; Kortschak, D. On the Efficiency of the Asmussen–Kroese-estimator and its Application to Stop-loss Transforms. Blätter der DGVFM 2009, 30, 363–377. [Google Scholar] [CrossRef]
  14. Siegmund, D. Importance Sampling in the Monte Carlo Study of Sequential Tests. Ann. Stat. 1976, 4, 673–684. [Google Scholar] [CrossRef]
  15. Kroese, D.; Taimre, T.; Botev, Z. Handbook of Monte Carlo Methods, 1st ed.; John Wiley and Sons: Hoboken, NJ, USA, 2011. [Google Scholar]
  16. Arendarczyk, M.; Dȩbicki, K. Asymptotics of supremum distribution of a Gaussian process over a Weibullian time. Bernoulli 2011, 17, 194–210. [Google Scholar] [CrossRef]
  17. Su, C.; Chen, Y. On the behavior of the product of independent random variables. Sci. China 2006, A49, 342–359. [Google Scholar] [CrossRef]
  18. O’Cinneide, C.A. Phase-type Distributions and Majorization. Ann. Appl. Probab. 1991, 1, 219–227. [Google Scholar] [CrossRef]
Figure 1. The random variable N is geometrically distributed with success probability p = 0.2 , W i i . i . d Exp ( 1 ) , X i i . i . d Erlang ( 1 , 3 ) , where i = 1 , , N . Green circles: ConAK1. Red stars: ConAK1+CV. Black dots: ConAK1+IS (conditioning and IS on exponential). Cyan diamonds: ConAK1+IS+CV (conditioning and IS on exponential). (a) Estimates of ( u ) as a function of u on a logarithmic scale. (b) Empirical logarithmic rates as a function of u. (c) Empirical relative errors as a function of u.
Figure 2. The random variable $N$ is geometrically distributed with success probability $p = 0.2$, $W_i \overset{\text{i.i.d.}}{\sim} \text{Weibull}(0.35, 2)$, and $X_i \overset{\text{i.i.d.}}{\sim} \text{Erlang}(2, 3)$, where $i = 1, \ldots, N$. Green circles: ConAK2. Red stars: ConAK2+CV. Black dots: ConAK2+IS (conditioning and IS on Erlang using an exponential PDF). Cyan diamonds: ConAK2+IS+CV (conditioning and IS on Erlang using an exponential PDF). (a) Estimates of the tail probability as a function of $u$ on a logarithmic scale. (b) Empirical logarithmic rates as a function of $u$. (c) Empirical relative errors as a function of $u$.
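To illustrate the kind of conditioning underlying the ConAK estimators, the sketch below applies the classical Asmussen–Kroese identity to the random sum and integrates the last summand out analytically through the Erlang (gamma) survival function, conditionally on its scaling factor. It does not reproduce the paper's exact ConAK1/ConAK2 constructions, nor their importance sampling or control variates; it is a schematic of the conditioning idea only. The default scaling distribution is Exp(1); for the setting of Figure 2 one might pass, e.g., `w_sampler=lambda size: 2.0 * rng.weibull(0.35, size=size)`, which assumes Weibull(0.35, 2) means shape 0.35 and scale 2 — an assumed parameterization.

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(1)

def ak_conditional_tail(u, n_rep=100_000, p=0.2, erlang_shape=2, erlang_rate=3.0,
                        w_sampler=None):
    """Asmussen-Kroese-type conditional estimator of P(S_N > u) for
    S_N = sum_{i=1}^N W_i * X_i, with N ~ Geometric(p) on {1, 2, ...} (assumed).

    Given N = n, W_1..W_n, and X_1..X_{n-1}, the last product W_n * X_n is
    integrated out analytically via the Erlang (gamma) survival function:
        Z = n * P(X_n > max(M_{n-1}, u - S_{n-1}) / W_n | everything else),
    where M_{n-1} and S_{n-1} are the maximum and the sum of the first n-1 products.
    """
    if w_sampler is None:
        w_sampler = lambda size: rng.exponential(1.0, size=size)  # default scaling: Exp(1)
    z = np.empty(n_rep)
    for r in range(n_rep):
        n = rng.geometric(p)
        w = w_sampler(size=n)                                         # scaling factors W_1..W_n
        x = rng.gamma(erlang_shape, 1.0 / erlang_rate, size=n - 1)    # only X_1..X_{n-1} are simulated
        y = w[:-1] * x                                                # first n-1 summands
        threshold = max(np.max(y, initial=0.0), u - y.sum())          # max(M_{n-1}, u - S_{n-1})
        z[r] = n * gamma.sf(threshold / w[-1], a=erlang_shape, scale=1.0 / erlang_rate)
    return z.mean(), z.std(ddof=1) / np.sqrt(n_rep)

est, se = ak_conditional_tail(u=20.0)
print(f"estimate = {est:.3e}, standard error = {se:.3e}")
```

Unbiasedness of this sketch rests on two standard facts: the Asmussen–Kroese identity $P(S_n > u) = n\, \mathbb{E}\!\left[\bar{F}_Y\!\left(\max(M_{n-1}, u - S_{n-1})\right)\right]$ for continuous i.i.d. summands $Y_i = W_i X_i$, and $\mathbb{E}\!\left[\bar{F}_X(t/W)\right] = \bar{F}_Y(t)$ for $W$ independent of $X$.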
Figure 3. The random variable $N$ is geometrically distributed with success probability $p = 0.2$, $W_i \overset{\text{i.i.d.}}{\sim} \text{Exp}(1)$, and $X_i \overset{\text{i.i.d.}}{\sim} \text{Erlang}(1, 3)$, where $i = 1, \ldots, N$. Black squares: ConAK1+IS+CV (conditioning and IS on exponential). Red stars: ConAK2+CV. Blue dots: ConAK2+IS+CV (conditioning and IS on Erlang using an exponential PDF). Cyan diamonds: ConAK2+IS+CV (conditioning and IS on Erlang using an Erlang PDF). (a) Estimates of the tail probability as a function of $u$ on a logarithmic scale. (b) Empirical logarithmic rates as a function of $u$. (c) Empirical relative errors as a function of $u$.
Figure 4. The random variable $N$ is geometrically distributed with success probability $p = 0.2$, $W_i \overset{\text{i.i.d.}}{\sim} \text{Weibull}(0.35, 2)$, and $X_i \overset{\text{i.i.d.}}{\sim} \text{Erlang}(2, 3)$, where $i = 1, \ldots, N$. Black squares: ConAK1+IS+CV (conditioning and IS on Weibull). Red stars: ConAK2+CV (conditioning on Erlang). Blue dots: ConAK2+IS+CV (conditioning and IS on Erlang using an exponential PDF). Cyan diamonds: ConAK2+IS+CV (conditioning and IS on Erlang using an Erlang PDF). (a) Estimates of the tail probability as a function of $u$ on a logarithmic scale. (b) Empirical logarithmic rates as a function of $u$. (c) Empirical relative errors as a function of $u$.
Figure 5. The number of summands is fixed at $N = 30$, with $W_i \overset{\text{i.i.d.}}{\sim} \text{Weibull}(0.35, 2)$ and $X_i \overset{\text{i.i.d.}}{\sim} \text{Erlang}(2, 3)$, where $i = 1, \ldots, N$. Black squares: ConAK1+IS+CV (conditioning and IS on Weibull). Red stars: ConAK2+CV (conditioning on Erlang). Blue dots: ConAK2+IS+CV (conditioning and IS on Erlang using an exponential PDF). Cyan diamonds: ConAK2+IS+CV (conditioning and IS on Erlang using an Erlang PDF). (a) Estimates of the tail probability as a function of $u$ on a logarithmic scale. (b) Empirical logarithmic rates as a function of $u$. (c) Empirical relative errors as a function of $u$.
Figure 6. The random variable $N$ is geometrically distributed with success probability $p = 0.2$, $W_i \overset{\text{i.i.d.}}{\sim} \text{Pareto}(8)$, and $X_i \overset{\text{i.i.d.}}{\sim} \text{Erlang}(2, 3)$, where $i = 1, \ldots, N$. Black squares: ConAK1+IS+CV (conditioning and IS on Pareto). Red stars: ConAK2+CV (conditioning on Erlang). Blue dots: ConAK2+IS+CV (conditioning and IS on Erlang using an exponential PDF). Cyan diamonds: ConAK2+IS+CV (conditioning and IS on Erlang using an Erlang PDF). (a) Estimates of the tail probability as a function of $u$ on a logarithmic scale. (b) Empirical logarithmic rates as a function of $u$. (c) Empirical relative errors as a function of $u$.
Figure 7. The random variable $N$ is geometrically distributed with success probability $p = 0.2$, $W_i \overset{\text{i.i.d.}}{\sim} \text{LogN}(2, 2.25)$, and $X_i \overset{\text{i.i.d.}}{\sim} \text{Erlang}(2, 3)$, where $i = 1, \ldots, N$. Black squares: ConAK1+IS+CV (conditioning and IS on log-normal). Red stars: ConAK2+CV (conditioning on Erlang). Blue dots: ConAK2+IS+CV (conditioning and IS on Erlang using an exponential PDF). Cyan diamonds: ConAK2+IS+CV (conditioning and IS on Erlang using an Erlang PDF). (a) Estimates of the tail probability as a function of $u$ on a logarithmic scale. (b) Empirical logarithmic rates as a function of $u$. (c) Empirical relative errors as a function of $u$.