Lambert W Random Variables and Their Applications in Loss Modelling

Abstract: Several distributions and families of distributions have been proposed to model skewed data, e.g., the skew-normal and related distributions. Lambert W random variables offer an alternative approach in which, instead of constructing a new distribution, a certain transformation is proposed. Such an approach allows the construction of a Lambert W skewed version from any distribution. Here, we choose the Lambert W normal distribution as a natural starting point and include the Lambert W exponential distribution due to the simplicity and shape of the exponential distribution, which, after skewing, may produce a reasonably heavy tail for loss models. In the theoretical part, we focus on the mathematical properties of the obtained distributions, including the range of skewness. In the practical part, the suitability of the corresponding Lambert W transformed distributions is evaluated on real insurance data. Finally, the results are compared with those obtained using common loss distributions.


Introduction
Loss modelling is an essential part of actuarial and financial mathematics. Several distributional models have been applied over the years, and the increasing volumes of data and computational power have motivated the use of even more complex distributions to fit the data.
In the actuarial and financial fields, the data are usually skewed. Several classical distributions can be used to fit skewed data (see, e.g., [1,2]). A generic approach for skewing symmetric distributions was introduced in Azzalini [3], where the shape of the normal distribution is deformed by a certain skewness parameter. Similarly, other asymmetric distributions (e.g., skew t-distribution) have been developed [4]. Unified overviews of skewed distributions are provided in [5,6], while a review of different applications of skew-elliptical distributions in actuarial and financial mathematics is provided in [7].
In [8], another method of generating skewness was introduced through the Lambert W function that, when applied to symmetric distributions, can produce skewness and a heavy tail. In addition, Lambert W random variables can be seen as a generalization, as the input distribution can be arbitrary and not necessarily symmetric. When using the Lambert W function, instead of using the parametric manipulation of the original symmetric density function to introduce skewness, the random variable itself is transformed.
Another Lambert W transformation related to random variables was studied in [9], namely, a class of log-Lambert W random variables with applications to likelihood-based inference of normal random variables.
A different approach using the Lambert W function was introduced in [10,11], where the transformation is applied to the cumulative distribution function of the continuous positive valued random variable.
The Lambert W function has proven useful in mathematics, physics, chemistry, biology, engineering, risk theory, and other fields, though it has been less widely used in statistical modelling. Nonetheless, there are a number of noteworthy examples. In [12], the Lambert W approach was applied to normalize a vector regardless of its actual distribution. The use of the Lambert W distribution in matrix factorization with an implementation in probabilistic programming was presented in [13]. The Lambert W function has been used to derive the exact distribution of the likelihood ratio test statistic and to solve related problems in [14][15][16].
The approach of modelling the skewed random variables and symmetrizing the data using the Lambert W function as a variable transformation was used in [8,12,17,18]. We use [8] as the basis of our construction in this paper.
The rest of this paper is organized as follows. In Section 2, we provide a short overview of the Lambert W function. In Section 3, general definitions and the expressions of the cumulative distribution functions and probability density functions of the Lambert W random variables are introduced, followed by more detailed results concerning the Lambert W normal and exponential distributions. In Section 4, we describe the results of fitting the Lambert W normal and exponential distributions to two insurance-related datasets, then compare the fit with several typical insurance models. Proofs of several properties, technical details of estimation, and additional figures showing the fitted distributions are presented in Appendices A-C.

The Lambert W Function and Its Properties
In the following, we define the Lambert W function and provide a brief overview of its properties; refer to [19][20][21] for more details on the topic.
The Lambert W function is a set of inverse functions for the following function:
$$f(x') = x' e^{x'}.$$
Substituting $x = x' e^{x'}$ leads to the definition of the Lambert W function.
Definition 1. The Lambert W function W(x) is defined by the following equality:
$$W(x)\, e^{W(x)} = x. \tag{1}$$
Note that, in general, the function W(x) can be defined for real or complex arguments, and that Equation (1) has infinitely many solutions, most of which are complex. Following the notation of [21], we denote the different branches of the function by $W_k(x)$, where the branch index k ∈ {0, ±1, ±2, . . .} and x ∈ C. For real x, all branches other than $W_0(x)$ and $W_{-1}(x)$ are complex. For $x \in (-\infty, -\frac{1}{e})$, the equation has only complex solutions. We denote the branch corresponding to W(x) ≥ −1 by $W_0(x)$, which we call the principal branch, and the branch corresponding to W(x) ≤ −1 by $W_{-1}(x)$, which we call the nonprincipal branch.
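To make Definition 1 concrete, the following minimal sketch evaluates both real branches with a plain Newton iteration and checks the defining equality (1). The function name `lambert_w`, the starting values, and the tolerances are illustrative assumptions rather than part of the paper; in practice a library routine (for example `scipy.special.lambertw`) would normally be preferred.

```python
import math

def lambert_w(x, branch=0):
    """Real Lambert W: the w with w * exp(w) = x.

    branch=0 gives the principal branch (w >= -1), branch=-1 the
    non-principal branch (w <= -1, defined only for -1/e <= x < 0).
    Hypothetical helper written for illustration only.
    """
    if x < -1.0 / math.e or (branch == -1 and x >= 0.0):
        raise ValueError("no real value on this branch")
    p = math.sqrt(2.0 * max(0.0, math.e * x + 1.0))  # small near the branch point x = -1/e
    if p < 1e-7:                                     # numerically at the branch point
        return -1.0 + p if branch == 0 else -1.0 - p
    if branch == 0:
        # Newton iteration on w * exp(w) - x, started to the right of -1.
        w = math.log1p(x) if x >= 0.0 else -1.0 + p - p * p / 3.0
        for _ in range(100):
            step = (w - x * math.exp(-w)) / (1.0 + w)
            w -= step
            if abs(step) <= 1e-15 * (1.0 + abs(w)):
                break
        return w
    # Branch -1: Newton iteration on w + ln(-w) - ln(-x), which is increasing for w < -1.
    log_neg_x = math.log(-x)
    w = -1.0 - p - p * p / 3.0 if x < -0.25 else log_neg_x - math.log(-log_neg_x)
    for _ in range(100):
        step = (w + math.log(-w) - log_neg_x) / (1.0 + 1.0 / w)
        w -= step
        if abs(step) <= 1e-15 * (1.0 + abs(w)):
            break
    return w

# For -1/e < x < 0 both real branches solve Equation (1).
for k in (0, -1):
    w = lambert_w(-0.1, k)
    print(f"branch {k:2d}: W = {w:.12f},  W * exp(W) = {w * math.exp(w):.12f}")
```

For x ≥ 0 only the principal branch is real, which is why the sketch rejects `branch=-1` in that case.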
Among the characteristic properties of the function (see Figure 1 as well) is its asymptotic behaviour. Based on its construction as an inverse of a certain exponential function, the asymptotes of W are similar to those of the natural logarithm. More precisely, the limits can be found as follows:
$$\lim_{x \to \infty} \frac{W_0(x)}{\ln x} = 1, \qquad \lim_{x \to 0^-} \frac{W_{-1}(x)}{\ln(-x)} = 1.$$
At the same time, the absolute difference between the Lambert W function and the natural logarithm, $|W_0(x) - \ln x|$, goes to infinity as x → ∞ [20].
Figure 1. The Lambert W function (horizontal axis: x; vertical axis: W(x)).

Definitions
Next, we present the definitions of different types of Lambert W random variables based on Goerg [8]. We provide the formulae of the cumulative distribution function (cdf) and probability density function (pdf) for scale and location-scale random variables.
Definition 2. Let U be a continuous random variable with cdf $F_U(u)$. Then,
$$Y = U e^{\gamma U}, \quad \gamma \in \mathbb{R}, \tag{2}$$
is a noncentral and nonscaled Lambert W × $F_U$ random variable with skewness parameter γ.
The skewness parameter γ can take any value on the real line; however, as the exponential function is always positive, the transformation (2) preserves the sign. In particular, if γ = 0, then Y = U. The effect of the transformation on the shape of the distribution depends on the original variable U. If U has both positive and negative values, then positive γ folds back the tail with negative values at the point $U = -\frac{1}{\gamma}$, relocating part of the negative U values, while on the positive side the values move further away, making the right tail heavier. Negative γ acts the other way around. Note that for a skewed U, the Lambert W transform can produce a more symmetric random variable.
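The folding effect described above can be seen directly by evaluating the transformation u ↦ u·exp(γu) on a few points. The sketch below is only an illustration; the function name `lambert_transform` and the chosen value γ = 0.5 are assumptions made for the example.

```python
import math

def lambert_transform(u, gamma):
    """Noncentral, nonscaled Lambert W transformation: y = u * exp(gamma * u)."""
    return u * math.exp(gamma * u)

gamma = 0.5                                            # illustrative skewness parameter
turning_point = -1.0 / gamma                           # u at which the negative tail folds back
lower_bound = lambert_transform(turning_point, gamma)  # equals -1 / (gamma * e)

# Values of u below the turning point are mapped back above the lower bound,
# while positive values of u are pushed further to the right.
for u in (-4.0, -2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"u = {u:5.1f}  ->  y = {lambert_transform(u, gamma):8.4f}")
print("lower bound of y:", lower_bound)
```

With γ = 0 the function returns u unchanged, and for any γ the sign of the output matches the sign of the input.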
The transformation in (2) is not scale- or location-invariant. In order to keep these properties, which are needed, for example, to construct the Lambert W normal random variables, it is necessary to include the transformed variable's location and scale parameters in the definition. For more details about the location-scale family of distributions, refer to [22] (pp. 116-121).
Definition 3. Let X be a continuous random variable from a location-scale family with cdf $F_X(x|\beta)$, where β is the corresponding parameter vector. Let $U = \frac{X - \mu}{\sigma}$ be the zero-mean unit-variance version of X. Then,
$$Y = U e^{\gamma U} \sigma + \mu \tag{3}$$
is a location-scale Lambert W × $F_X$ random variable with parameter vector (β, γ).
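If γ > 0, the location-scale Lambert W × $F_X$ random variable takes values in the interval $(\mu - \frac{\sigma}{\gamma e}, \infty)$. For a negative γ, on the contrary, Y has an upper bound, and the values are in the interval $(-\infty, \mu - \frac{\sigma}{\gamma e})$. For γ > 0, the cdf and pdf of a location-scale Lambert W × $F_X$ random variable are respectively
$$F_Y(y) = \begin{cases} 0, & y \le \mu - \frac{\sigma}{\gamma e}, \\ F_X\!\left(\frac{W_0(\gamma z)}{\gamma}\sigma + \mu\right) - F_X\!\left(\frac{W_{-1}(\gamma z)}{\gamma}\sigma + \mu\right), & \mu - \frac{\sigma}{\gamma e} < y < \mu, \\ F_X\!\left(\frac{W_0(\gamma z)}{\gamma}\sigma + \mu\right), & y \ge \mu, \end{cases} \tag{4}$$
and
$$f_Y(y) = \begin{cases} 0, & y \le \mu - \frac{\sigma}{\gamma e}, \\ f_X\!\left(\frac{W_0(\gamma z)}{\gamma}\sigma + \mu\right)\frac{W_0'(\gamma z)}{\gamma} - f_X\!\left(\frac{W_{-1}(\gamma z)}{\gamma}\sigma + \mu\right)\frac{W_{-1}'(\gamma z)}{\gamma}, & \mu - \frac{\sigma}{\gamma e} < y < \mu, \\ f_X\!\left(\frac{W_0(\gamma z)}{\gamma}\sigma + \mu\right)\frac{W_0'(\gamma z)}{\gamma}, & y \ge \mu, \end{cases} \tag{5}$$
where $z = \frac{y - \mu}{\sigma}$ and we denote the derivative of W(γz) by z as
$$W'(\gamma z) = \frac{dW(\gamma z)}{dz} = \frac{\gamma e^{-W(\gamma z)}}{1 + W(\gamma z)}. \tag{6}$$
In (6), the principal and non-principal branches are not distinguished, as the same holds for both. The derivation of these expressions can be found in Goerg [8]. The derivation and resulting expressions for γ < 0 are similar, except that the three regions considered are pivoted: the first region is y ≤ µ, where only the principal branch is used; the second region is $\mu < y < \mu - \frac{\sigma}{\gamma e}$, where both branches are used; and the last region is $y \ge \mu - \frac{\sigma}{\gamma e}$, where the cdf reaches 1 and the pdf is equal to 0.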
For a non-negative X from the scale family, for example, an exponentially-distributed X, we can define the corresponding scale-family Lambert random variable as follows.
Definition 4. Let X be a non-negative continuous random variable from a scale family with cdf F X (x|β), where β is the parameter vector.Let U = X σ be the unit-variance version of X.Then, is a scale Lambert W × F X random variable with parameter vector (β, γ).
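If γ > 0, then the cdf and pdf for a scale Lambert random variable can be found easily, as the transformation (7) takes values only on the positive side of the real line; as we apply the transformation W on positive arguments as well, only the principal branch plays a role. Hence, with $z = \frac{y}{\sigma}$, the cdf has the following form:
$$F_Y(y) = F_X\!\left(\frac{W_0(\gamma z)}{\gamma}\sigma\right), \quad y > 0. \tag{8}$$
Taking the derivative of (8), we obtain the following form for the pdf:
$$f_Y(y) = f_X\!\left(\frac{W_0(\gamma z)}{\gamma}\sigma\right)\frac{e^{-W_0(\gamma z)}}{1 + W_0(\gamma z)}, \quad y > 0. \tag{9}$$
Our primary focus is on positive γ, which produces a heavier right tail for a right-skewed distribution, possibly making the distribution more suitable for describing insurance losses. Yet, the results for γ < 0 are not as straightforward as for the location-scale family case. Thus, to complete the theory, we analyze this situation as well and derive the cdf and pdf. First, the cdf:
$$F_Y(y) = P(Y \le y) = P\!\left(U e^{\gamma U} \le \frac{y}{\sigma}\right).$$
Now, as the argument γy/σ is negative for y > 0, both branches are needed when we apply the Lambert function. Hence,
$$F_Y(y) = P\!\left(U \le \frac{W_0(\gamma z)}{\gamma}\right) + P\!\left(U \ge \frac{W_{-1}(\gamma z)}{\gamma}\right) = F_X\!\left(\frac{W_0(\gamma z)}{\gamma}\sigma\right) + 1 - F_X\!\left(\frac{W_{-1}(\gamma z)}{\gamma}\sigma\right).$$
The principal and non-principal branches are equal at the point $y = -\frac{\sigma}{\gamma e}$; thus, this is the point where the cdf $F_Y$ reaches 1. In summary, if γ < 0, then
$$F_Y(y) = \begin{cases} 0, & y \le 0, \\ F_X\!\left(\frac{W_0(\gamma z)}{\gamma}\sigma\right) + 1 - F_X\!\left(\frac{W_{-1}(\gamma z)}{\gamma}\sigma\right), & 0 < y < -\frac{\sigma}{\gamma e}, \\ 1, & y \ge -\frac{\sigma}{\gamma e}, \end{cases} \tag{10}$$
and the corresponding pdf is
$$f_Y(y) = \begin{cases} f_X\!\left(\frac{W_0(\gamma z)}{\gamma}\sigma\right)\frac{e^{-W_0(\gamma z)}}{1 + W_0(\gamma z)} - f_X\!\left(\frac{W_{-1}(\gamma z)}{\gamma}\sigma\right)\frac{e^{-W_{-1}(\gamma z)}}{1 + W_{-1}(\gamma z)}, & 0 < y < -\frac{\sigma}{\gamma e}, \\ 0, & \text{otherwise}. \end{cases} \tag{11}$$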

Lambert W Normal Distribution
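In this section, we apply the Lambert location-scale transformation (3) on a normal random variable X ∼ N(µ, σ). The resulting random variable Y is a Lambert W × N(µ, σ) random variable with parameter vector (µ, σ, γ). Without loss of generality, we assume that the skewness parameter γ is positive; the situation is mirrored for negative values of γ (i.e., left skew instead of right skew). Using (4), the cdf for a positive skewness parameter γ can be written as
$$F_Y(y) = \begin{cases} 0, & y \le \mu - \frac{\sigma}{\gamma e}, \\ \Phi\!\left(\frac{W_0(\gamma z)}{\gamma}\right) - \Phi\!\left(\frac{W_{-1}(\gamma z)}{\gamma}\right), & \mu - \frac{\sigma}{\gamma e} < y < \mu, \\ \Phi\!\left(\frac{W_0(\gamma z)}{\gamma}\right), & y \ge \mu, \end{cases}$$
where $z = \frac{y - \mu}{\sigma}$ and Φ is the standard normal cdf. Likewise, using (5), we obtain the pdf for γ > 0 as
$$f_Y(y) = \begin{cases} 0, & y \le \mu - \frac{\sigma}{\gamma e}, \\ \frac{1}{\sigma}\left(f_0(z) - f_{-1}(z)\right), & \mu - \frac{\sigma}{\gamma e} < y < \mu, \\ \frac{1}{\sigma} f_0(z), & y \ge \mu, \end{cases}$$
where $f_0(z)$ and $f_{-1}(z)$ are the components of the pdf corresponding to the principal and non-principal branch, respectively:
$$f_0(z) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{(W_0(\gamma z))^2}{2\gamma^2}\right) \frac{e^{-W_0(\gamma z)}}{1 + W_0(\gamma z)}, \tag{12}$$
$$f_{-1}(z) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{(W_{-1}(\gamma z))^2}{2\gamma^2}\right) \frac{e^{-W_{-1}(\gamma z)}}{1 + W_{-1}(\gamma z)}. \tag{13}$$
Examples of the cdf and pdf for the Lambert W × N(0, 1) distribution with γ > 0 are shown in Figure 2.
In the following, we provide several results that describe the behaviour of the pdf of a Lambert W normal random variable. To keep our proofs technically cleaner, the analysis is applied to Lambert W × N(0, 1) random variables, as generalization to Lambert W × N(µ, σ) is straightforward. Proofs of these lemmas are presented in Appendix A.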
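As a numerical illustration of the Lambert W × N(0, 1) case with γ > 0, the sketch below evaluates the cdf and pdf by inverting z = u·exp(γu) on both real branches. It assumes that each branch contributes φ(W_k(γz)/γ)·exp(−W_k(γz))/(1 + W_k(γz)) to the density, which follows from differentiating Φ(W_k(γz)/γ) with respect to z; the helper names `lambert_w`, `lw_normal_cdf`, and `lw_normal_pdf` are hypothetical and are not taken from the paper or from any package.

```python
import math

def lambert_w(x, branch=0):
    """Real Lambert W (w * exp(w) = x) on branch 0 or -1; illustrative Newton solver."""
    if x < -1.0 / math.e or (branch == -1 and x >= 0.0):
        raise ValueError("no real value on this branch")
    p = math.sqrt(2.0 * max(0.0, math.e * x + 1.0))
    if p < 1e-7:
        return -1.0 + p if branch == 0 else -1.0 - p
    if branch == 0:
        w = math.log1p(x) if x >= 0.0 else -1.0 + p - p * p / 3.0
        for _ in range(100):
            step = (w - x * math.exp(-w)) / (1.0 + w)
            w -= step
            if abs(step) <= 1e-15 * (1.0 + abs(w)):
                break
        return w
    log_neg_x = math.log(-x)
    w = -1.0 - p - p * p / 3.0 if x < -0.25 else log_neg_x - math.log(-log_neg_x)
    for _ in range(100):
        step = (w + math.log(-w) - log_neg_x) / (1.0 + 1.0 / w)
        w -= step
        if abs(step) <= 1e-15 * (1.0 + abs(w)):
            break
    return w

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def branch_component(z, gamma, branch):
    """Density contribution of one branch: phi(W/gamma) * exp(-W) / (1 + W), W = W_k(gamma z)."""
    w = lambert_w(gamma * z, branch)
    return norm_pdf(w / gamma) * math.exp(-w) / (1.0 + w)

def lw_normal_cdf(z, gamma):
    """cdf of Z = U * exp(gamma * U), U ~ N(0, 1), for gamma > 0."""
    if z <= -1.0 / (gamma * math.e):
        return 0.0
    upper = norm_cdf(lambert_w(gamma * z, 0) / gamma)
    if z >= 0.0:
        return upper                      # only the principal branch is needed
    return upper - norm_cdf(lambert_w(gamma * z, -1) / gamma)

def lw_normal_pdf(z, gamma):
    """pdf of Z = U * exp(gamma * U), U ~ N(0, 1), for gamma > 0."""
    if z <= -1.0 / (gamma * math.e):
        return 0.0
    f0 = branch_component(z, gamma, 0)
    if z >= 0.0:
        return f0
    return f0 - branch_component(z, gamma, -1)   # the branch -1 term is negative

for z in (-0.7, -0.3, 0.0, 1.0, 3.0):
    print(f"z = {z:5.2f}  cdf = {lw_normal_cdf(z, 0.5):.6f}  pdf = {lw_normal_pdf(z, 0.5):.6f}")
```

The sketch covers the standard case only; under these assumptions a Lambert W × N(µ, σ) density would be obtained as `lw_normal_pdf((y - mu) / sigma, gamma) / sigma`.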
Lemma 1. The pdf $f_Z$ of a Lambert W × N(0, 1) random variable Z has an asymptote at $-\frac{1}{\gamma e}$:
$$\lim_{z \to -\frac{1}{\gamma e}+} f_Z(z) = \infty.$$
The point $-\frac{1}{\gamma e}$ where $f_Z$ has an asymptote can be thought of as a point where the transformation folds the left tail of N(0, 1) and fits it into the interval $(-\frac{1}{\gamma e}, 0)$. At this turning point, the density accumulates; see Figures 2-5 for examples. Although the transformation squeezes the negative values of N(0, 1) into a fixed interval and makes the right tail heavier, it continues to have zero as a point where the probability mass is divided into equal halves. Furthermore, at point z = 0, the pdf $f_Z$ is equal to the pdf of N(0, 1), i.e., $f_Z(0) = \frac{1}{\sqrt{2\pi}}$. This property is pointed out in the right-hand panels of Figures 5 and 6.
Lemma 2. The principal branch component $f_0$ of the pdf of a Lambert W × N(0, 1) random variable has the following properties. The function $f_0(z)$: (a) has two local extrema (maximum and minimum) if $\gamma \in (0, \sqrt{2} - 1)$; (b) has no local extrema if $\gamma \ge \sqrt{2} - 1$.
Lemma 3. The non-principal branch component $f_{-1}$ of the pdf of a Lambert W × N(0, 1) random variable has the following properties. The function $f_{-1}(z)$: (a) has no local extrema if $\gamma \in (0, \sqrt{2} + 1]$; (b) has two local extrema if $\gamma > \sqrt{2} + 1$.
Consequently, depending on the value of the skewness parameter γ, it is possible to distinguish three main shapes of the pdf of a Lambert W normal random variable. First, if $\gamma \in (0, \sqrt{2} - 1)$, then the pdf has two local extrema due to the principal branch component $f_0$; see Figure 3 or the left panel of Figure 4. Second, if $\gamma \in [\sqrt{2} - 1, \sqrt{2} + 1]$, then the pdf is a strictly decreasing function of z, as in the right panel of Figure 4. Third, if $\gamma > \sqrt{2} + 1$, then the pdf again has two local extrema, now due to the non-principal branch component $f_{-1}$, and compared to the first case, the overall shape of the pdf is different, as seen in Figures 5 and 6. In these two figures, the right panel provides a more detailed view of the interval where the maximum is placed. As notably seen in Figure 6, the apparently sharp peak turns out to be quite smooth if examined more closely.
Lastly, we provide the expressions of the moments and skewness coefficient of a Lambert W × N(µ, σ) random variable. The moments of a Lambert W × N(0, 1) random variable can be found using the moment generating function (mgf) of the underlying standard normal distribution. Let Z be a Lambert W × N(0, 1) random variable. The moments for Z are then as follows [8]:
$$E(Z^k) = \frac{d^k}{dt^k} M_{N(0,1)}(t) \Big|_{t = k\gamma},$$
where $M_{N(0,1)}$ denotes the mgf of N(0, 1). For the general case, i.e., for a Lambert W × N(µ, σ) random variable Y, we can use the properties of the location-scale family, meaning that we have
$$E(Y^k) = E\!\left((\mu + \sigma Z)^k\right) = \sum_{j=0}^{k} \binom{k}{j} \mu^{k-j} \sigma^{j} E(Z^j).$$
As the moments are found using the derivatives of an exponential function, the moments of any order k exist and are finite. Using the above expressions, we can derive the formulae for the mean of Y and the skewness coefficient $\gamma_1(Y)$:
$$E(Y) = \mu + \sigma \gamma e^{\gamma^2/2}, \qquad \gamma_1(Y) = \gamma\, \frac{e^{3\gamma^2}(9 + 27\gamma^2) - e^{\gamma^2}(3 + 12\gamma^2) + 2\gamma^2}{\left(e^{\gamma^2}(1 + 4\gamma^2) - \gamma^2\right)^{3/2}}.$$
The skewness coefficient is a monotone function of γ, and has the same sign. As γ → ±∞, we have $\gamma_1(Y) \to \pm\infty$, and the speed of growth is exponential. For example, if we look at the range of values $\gamma \in (\sqrt{2} - 1, \sqrt{2} + 1)$, where the pdf is monotone decreasing, the skewness coefficient grows from around 3 to 20,000 (see Figure 7).
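The range of skewness quoted above can be checked numerically. The following sketch assumes the standard normal mgf M(t) = exp(t²/2) and evaluates its first three derivatives at t = kγ to obtain the raw moments of Z; the function names are hypothetical, and the code is an illustration rather than the authors' implementation.

```python
import math

def lw_normal_raw_moments(gamma):
    """First three raw moments of Z = U * exp(gamma * U), U ~ N(0, 1).

    E[Z^k] is taken as the k-th derivative of M(t) = exp(t^2 / 2) at t = k * gamma.
    """
    m1 = gamma * math.exp(gamma ** 2 / 2.0)                      # M'(t) = t M(t), t = gamma
    m2 = (1.0 + 4.0 * gamma ** 2) * math.exp(2.0 * gamma ** 2)   # M''(t) = (1 + t^2) M(t), t = 2 gamma
    t = 3.0 * gamma
    m3 = (t ** 3 + 3.0 * t) * math.exp(t ** 2 / 2.0)             # M'''(t) = (t^3 + 3 t) M(t), t = 3 gamma
    return m1, m2, m3

def lw_normal_mean(mu, sigma, gamma):
    """Mean of Y = mu + sigma * Z."""
    return mu + sigma * lw_normal_raw_moments(gamma)[0]

def lw_normal_skewness(gamma):
    """Skewness coefficient; it does not depend on mu or sigma > 0."""
    m1, m2, m3 = lw_normal_raw_moments(gamma)
    variance = m2 - m1 ** 2
    return (m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3) / variance ** 1.5

for g in (math.sqrt(2.0) - 1.0, 1.0, math.sqrt(2.0) + 1.0):
    print(f"gamma = {g:.4f}  skewness = {lw_normal_skewness(g):.2f}")
```

Because skewness is invariant under location and positive scale changes, the same value applies to any Lambert W × N(µ, σ) variable with the given γ.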

Lambert W Exponential Distribution
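Let X be an exponentially distributed random variable with parameter λ > 0 (λ as rate). Then, the transformed random variable $Y = X e^{\gamma\lambda X}$ has a Lambert W × Exp(λ) distribution with parameter vector (λ, γ). According to (7), for positive γ, the cdf of Y is
$$F_Y(y) = 1 - \exp\!\left(-\frac{W_0(\gamma\lambda y)}{\gamma}\right), \quad y > 0,$$
and, using (9), the pdf of Y is
$$f_Y(y) = \lambda \exp\!\left(-\frac{W_0(\gamma\lambda y)}{\gamma}\right) \frac{e^{-W_0(\gamma\lambda y)}}{1 + W_0(\gamma\lambda y)}, \quad y > 0.$$
For γ < 0, the expressions for cdf and pdf additionally involve the non-principal branch of the Lambert W function, as seen in (10) and (11):
$$F_Y(y) = \begin{cases} 1 - \exp\!\left(-\frac{W_0(\gamma\lambda y)}{\gamma}\right) + \exp\!\left(-\frac{W_{-1}(\gamma\lambda y)}{\gamma}\right), & 0 < y < -\frac{1}{e\gamma\lambda}, \\ 1, & y \ge -\frac{1}{e\gamma\lambda}, \end{cases}$$
and
$$f_Y(y) = \lambda \exp\!\left(-\frac{W_0(\gamma\lambda y)}{\gamma}\right) \frac{e^{-W_0(\gamma\lambda y)}}{1 + W_0(\gamma\lambda y)} - \lambda \exp\!\left(-\frac{W_{-1}(\gamma\lambda y)}{\gamma}\right) \frac{e^{-W_{-1}(\gamma\lambda y)}}{1 + W_{-1}(\gamma\lambda y)}, \quad 0 < y < -\frac{1}{e\gamma\lambda}.$$
For examples of the pdf and cdf for the Lambert W × Exp(1) distribution, see Figures 8 and 9. As apparent from Figure 8, the Lambert random variables have a heavier tail in the case of positive γ as compared to the exponential distribution.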
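The following sketch shows one way to evaluate the cdf and pdf of a Lambert W × Exp(λ) variable for either sign of γ. It assumes that the event Y ≤ y is obtained by solving y = x·exp(γλx) for x on the relevant real branch or branches; all function names are hypothetical, and the Newton-based `lambert_w` helper is an illustrative stand-in for a library routine.

```python
import math

def lambert_w(x, branch=0):
    """Real Lambert W (w * exp(w) = x) on branch 0 or -1; illustrative Newton solver."""
    if x < -1.0 / math.e or (branch == -1 and x >= 0.0):
        raise ValueError("no real value on this branch")
    p = math.sqrt(2.0 * max(0.0, math.e * x + 1.0))
    if p < 1e-7:
        return -1.0 + p if branch == 0 else -1.0 - p
    if branch == 0:
        w = math.log1p(x) if x >= 0.0 else -1.0 + p - p * p / 3.0
        for _ in range(100):
            step = (w - x * math.exp(-w)) / (1.0 + w)
            w -= step
            if abs(step) <= 1e-15 * (1.0 + abs(w)):
                break
        return w
    log_neg_x = math.log(-x)
    w = -1.0 - p - p * p / 3.0 if x < -0.25 else log_neg_x - math.log(-log_neg_x)
    for _ in range(100):
        step = (w + math.log(-w) - log_neg_x) / (1.0 + 1.0 / w)
        w -= step
        if abs(step) <= 1e-15 * (1.0 + abs(w)):
            break
    return w

def lw_exp_cdf(y, lam, gamma):
    """cdf of Y = X * exp(gamma * lam * X), X ~ Exp(rate=lam)."""
    if y <= 0.0:
        return 0.0
    if gamma == 0.0:
        return 1.0 - math.exp(-lam * y)
    if gamma < 0.0 and y >= -1.0 / (math.e * gamma * lam):
        return 1.0                                    # at or above the upper bound
    x = gamma * lam * y
    cdf = 1.0 - math.exp(-lambert_w(x, 0) / gamma)    # P(X <= smaller root)
    if gamma < 0.0:
        cdf += math.exp(-lambert_w(x, -1) / gamma)    # P(X >= larger root)
    return cdf

def lw_exp_pdf(y, lam, gamma):
    """pdf of Y = X * exp(gamma * lam * X), X ~ Exp(rate=lam)."""
    if y <= 0.0:
        return 0.0
    if gamma == 0.0:
        return lam * math.exp(-lam * y)
    if gamma < 0.0 and y >= -1.0 / (math.e * gamma * lam):
        return 0.0
    x = gamma * lam * y
    w0 = lambert_w(x, 0)
    pdf = lam * math.exp(-w0 / gamma) * math.exp(-w0) / (1.0 + w0)
    if gamma < 0.0:
        w1 = lambert_w(x, -1)
        pdf -= lam * math.exp(-w1 / gamma) * math.exp(-w1) / (1.0 + w1)
    return pdf

for g in (0.3, -0.2):
    print(g, [round(lw_exp_cdf(y, 1.0, g), 4) for y in (0.5, 1.0, 1.5, 2.0)])
```

For γ > 0 the result can be checked against the exponential cdf directly, because Y ≤ x·exp(γλx) holds exactly when X ≤ x.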
For negative γ values (see Figure 9), the random variable Y takes values in the fixed interval $(0, -\frac{1}{e\gamma\lambda})$, as the transformation relocates the larger values of the underlying exponential random variable X. While it can be argued that this kind of transformation is not relevant for typically heavy-tailed insurance data, our example (see Section 4) shows an adequate fit when using the Lambert W exponential random variables with γ < 0 for log claims of Danish fire loss data. In the case of γ < 0, if the absolute value of γ is small, this produces a distribution with a suitably large cut-off point to fit data with moderate tails, as is the case for the Danish log claims data. Similarly, only small values of γ are of practical use for positive γ, as the tail quickly becomes very heavy. For example, if γ ≥ 1, then Lambert W × Exp(λ) random variables do not have a finite first moment. For γ < 1, the first moment is $\frac{1}{\lambda(1-\gamma)^2}$. In general, the following expression holds:
$$E(Y^k) = \frac{k!}{\lambda^k (1 - k\gamma)^{k+1}}, \quad \gamma < \frac{1}{k}.$$
The skewness coefficient for a Lambert W × Exp(λ) random variable with $\gamma < \frac{1}{3}$ can be calculated as follows:
$$\gamma_1(Y) = \frac{\frac{6}{(1-3\gamma)^4} - \frac{6}{(1-\gamma)^2 (1-2\gamma)^3} + \frac{2}{(1-\gamma)^6}}{\left(\frac{2}{(1-2\gamma)^3} - \frac{1}{(1-\gamma)^4}\right)^{3/2}}.$$
If $\gamma \ge \frac{1}{3}$, then the third moment of Y is infinite, and the coefficient $\gamma_1(Y)$ cannot be found. The skewness coefficient is a non-monotonic function of γ (see Figure 10). If γ = 0, the distribution simplifies to exponential, and the skewness coefficient is $\gamma_1 = 2$. For a Lambert W × Exp(λ) distribution, the skewness coefficient $\gamma_1$ can exceed a value of 2, and approaches infinity as $\gamma \to \frac{1}{3}$. For values −∞ < γ < −1, the skewness coefficient is a decreasing function of γ with a minimum value of $-\frac{9\sqrt{15}}{50}$ (attained at γ = −1), while for $\gamma \in (-1, \frac{1}{3})$ it is increasing (see Figure 10).
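The moment conditions above are easy to explore numerically. The sketch below assumes the raw moments E(Y^k) = k!/(λ^k (1 − kγ)^(k+1)) for kγ < 1, which follow from integrating x^k·exp(kγλx) against the exponential density; the function names are hypothetical.

```python
import math

def lw_exp_raw_moment(k, gamma, lam=1.0):
    """k-th raw moment of Y = X * exp(gamma * lam * X), X ~ Exp(rate=lam).

    Returns infinity when k * gamma >= 1, where the moment does not exist.
    """
    if k * gamma >= 1.0:
        return math.inf
    return math.factorial(k) / (lam ** k * (1.0 - k * gamma) ** (k + 1))

def lw_exp_skewness(gamma):
    """Skewness coefficient; it is free of lam and requires gamma < 1/3."""
    if gamma >= 1.0 / 3.0:
        raise ValueError("third moment is infinite for gamma >= 1/3")
    m1, m2, m3 = (lw_exp_raw_moment(k, gamma) for k in (1, 2, 3))
    variance = m2 - m1 ** 2
    return (m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3) / variance ** 1.5

for g in (-2.0, -1.0, -0.321, 0.0, 0.2, 0.3):
    print(f"gamma = {g:6.3f}  skewness = {lw_exp_skewness(g):8.4f}")
```

At γ = 0 the function returns the exponential skewness of 2, and at γ = −1 it returns the minimum value −9√15/50 mentioned above.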

Fitting Lambert W Random Variables to Insurance Data
In this section, we fit the Lambert W normal and exponential random variables on two well-known datasets, the US indemnity data introduced in [23] and the Danish fire data introduced in [24], then compare the fit with previous results.
These datasets have been widely used in field-specific literature before; see, e.g., [25,26] for the US indemnity data and [27][28][29] for the Danish fire data, among others.A consolidated overview of previous results is provided in [30].
To recall the distributions of these example datasets, see Figure 11 for the US indemnity data and Figure 12 for the Danish fire loss data. In both figures, the left panel presents the data on the original scale (thousands of USD for US indemnity and millions of DKK for Danish fire data), and the right panel presents the same data after log transformation. In the case of the log-transformed data, we use a similar shift to the one in [30] in order to keep the results comparable. More precisely, the transformation $\ln(y) - \min(\ln(y)) + 10^{-10}$ is applied on the original variable y.
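The shifted log transformation described above can be sketched in a few lines. The claim amounts in the example are made-up illustrative values, not observations from either dataset, and the function name `shifted_log` is hypothetical.

```python
import math

def shifted_log(values, eps=1e-10):
    """Apply ln(y) - min(ln(y)) + eps, so the smallest observation maps to eps > 0."""
    logs = [math.log(v) for v in values]
    smallest = min(logs)
    return [x - smallest + eps for x in logs]

claims = [1.0, 2.5, 10.0, 263.25]   # hypothetical claim sizes
print(shifted_log(claims))
```

The small positive shift keeps every transformed value strictly positive, which matters for distributions supported on the positive half-line.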
It is evident that both datasets exhibit significant skewness when observed on the original scale. The skewness is more extreme for the Danish fire data, with a skewness coefficient of $\gamma_1 = 18.74$, as compared to $\gamma_1 = 9.15$ for the US indemnity data. In the case of the US indemnity data, the log-transformed data produces an almost symmetric histogram that is very similar to a normal distribution. The log-transform reduces the skewness for Danish fire data as well, although the result remains skewed, with $\gamma_1 = 1.76$. In [30], nineteen distributions were fitted to the two aforementioned datasets, with the result that the skew-normal and skew t distributions are reasonably competitive compared to other models commonly used for insurance data.
In our research, we follow this construction and include all fitted continuous distributions while adding three more distributions to the list: the Lambert W normal and exponential distributions as our main contribution, and the Pareto distribution, which was previously missing due to technical problems. We use the maximum likelihood method for parameter estimation, as in [30]. For more details of the estimation process, see Appendix B.
To compare these models with competitors, we measure the goodness of fit using the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). The BIC is included because the number of parameters of the distributions ranges from 1 to 5, making the penalty of the AIC quite small compared to the flexibility that additional parameters can provide.
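The difference between the two penalties can be illustrated with the standard definitions AIC = 2k − 2·ln L and BIC = k·ln(n) − 2·ln L, where k is the number of parameters and n the sample size. The log-likelihood values and the sample size below are hypothetical numbers chosen only to show how the two criteria can disagree; they are not results from the paper.

```python
import math

def aic(loglik, k):
    """Akaike Information Criterion: 2k - 2 ln L (smaller is better)."""
    return 2.0 * k - 2.0 * loglik

def bic(loglik, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln L (smaller is better)."""
    return k * math.log(n) - 2.0 * loglik

n = 1000                                    # hypothetical sample size
fits = {                                    # hypothetical (log-likelihood, number of parameters)
    "one-parameter model": (-1505.0, 1),
    "five-parameter model": (-1499.0, 5),
}
for name, (loglik, k) in fits.items():
    print(f"{name}: AIC = {aic(loglik, k):.1f}, BIC = {bic(loglik, k, n):.1f}")
```

In this made-up example the AIC favours the five-parameter model while the BIC favours the one-parameter model, because the BIC penalty per parameter is ln(n) instead of 2.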
Before the comparison, we first examine the parameter estimates of the Lambert W distributions in Table 1. In the case of the Lambert W exponential model for the US indemnity data, the γ estimate 0.496 implies an infinite skewness coefficient, as it exceeds $\frac{1}{3}$. At the same time, the fit according to the BIC is good relative to other models; see Table 2 and later discussion. As the US indemnity data are close to normal on the log scale, the Lambert W exponential is not really a suitable model here. However, the estimate γ = −0.321 corresponds to a skewness coefficient value of 0.09, i.e., this model is able to pick up the symmetry of the data.
What is interesting in the case of the Danish log data is the negative γ estimate, as it produces a distribution with an upper bound of $-\frac{1}{\gamma e \lambda}$, here resulting in 7.82. As the maximum value in the data is around 5.57, this model allows even higher claim values than in the data. Furthermore, this model suits the data well, as it ranks high according to the BIC value (see Table 3, discussed in detail later on). The fit is not good for the same data on the original scale, which are highly skewed, and the γ estimate 0.096 can be considered unexpectedly low.
From the estimates of the Lambert W normal parameters, we can point out that the γ estimates are in the interval that produces a monotone decreasing pdf for both datasets on the original scale. For US indemnity data on the log scale, the γ estimate −0.021 produces a distribution very similar to normal, which is in agreement with the histogram. For the Danish log data, the estimate for γ is 0.373, which is in the interval $(0, \sqrt{2} - 1)$ and thus corresponds to the pdf shape with some downward bend between the asymptote and maximum, as in the left panel of Figure 4. As shown in the following analysis, the fit provided by the Lambert W transformed random variables is promising.
The results of model fitting are presented in Tables 2 and 3. The distributions are sorted in ascending order by the number of parameters, with the two newly added Lambert W distributions always shown at the top of the table. In every column, the first three results are marked: the best result is in bold, the second-best is underlined, and the third-best is underlined and in italics. In the case of the US indemnity data (see Table 2), we have seen earlier that the log-transformed data closely resemble the normal distribution. Therefore, the log-normal distribution can be expected to provide the best fit for the data in the original scale. However, the Lambert W exponential model provides a good fit as well, with the second-best AIC and BIC values. For the log-transformed data, the two smallest AIC values are almost equal, with the following block having very close values. Thus, the skew-normal and Lambert W normal distributions share first place, and skew t follows at the top of the next block. Based on the BIC, the normal distribution provides the best fit, having fewer parameters than the skew-normal or Lambert W normal. The skew-normal and Lambert W normal distributions fall to second and third place, respectively. The pdfs for the best three models with data histograms are plotted in Figure A1 in Appendix C.
It is apparent from the latter graph that the top three models exhibit a high degree of similarity, with the primary distinction residing in the region of small claims when viewed on the original scale. For the log-transformed data, the three curves practically coincide. From Table 3, it can be seen that for the Danish fire data on the original scale, the two best-fitting models are the skew t and Lambert W normal distributions. For the Danish log data, the Lambert W normal distribution again has the best fit based on the AIC, followed by the skew t. Based on the BIC, the best model is the Lambert W normal distribution, while the Lambert W exponential has the second-best result; for further illustration, see Figure A2 in Appendix C. The three best pdfs for the original data are very similar. On the log-transformed data, the discrepancies are not large either, though they are more clearly visible. In conclusion, the Lambert W models provide a good fit to both the original and log-transformed data.

Summary
In this paper, we have addressed the Lambert W transform-based approach and the properties of the resulting distributions, thoroughly investigating the Lambert W normal and Lambert W exponential distributions. We introduce the skewness via the Lambert W transform and the skewness parameter γ. Without loss of generality, we focus on positive values of γ, as these are more of interest in loss modelling applications. For the Lambert W standard normal distribution with a positive skewness parameter γ, the pdf f(y) has an asymptote at $y = -\frac{1}{\gamma e}$. We establish three regions based on the shape of the pdf. In the first range, where $\gamma \in (0, \sqrt{2} - 1)$, the shape of the distribution is at first glance not the most suitable for loss modelling, and needs additional explanations. Nevertheless, it can be argued that the asymptote effect is reasonably small, meaning that the distribution can provide a good fit, as in the Danish fire log data. Such a shape might be suitable in zero-altered models as well, where zero claims are included. The second and most appealing range, where the pdf is monotone decreasing, covers a wide range of the skewness coefficient values; see Figure 7. If $\gamma = \sqrt{2} + 1$, then the skewness coefficient is about 20,000; thus, the not-very-suitable shape in the third range is not a problem for most practical applications.
For the Lambert W exponential distribution, we establish that it allows a wider choice of the skewness coefficient than the exponential distribution. Moreover, one additional parameter relaxes the rigid relationship between the mean and variance of the exponential distribution. These properties make the Lambert W exponential distribution a promising model for insurance loss data.
Our results in the practical part show that the Lambert W transformed distributions operating in a wide range of skewness represent a viable choice for insurance loss modelling. Both the normal and exponential distribution-based transforms show a reasonably good fit. An especially illustrative proof of this flexibility is visible in the Danish fire data, where the results of the Lambert W normal model are well at the top for both the original and log-transformed datasets.
Clearly, the choices available for the Lambert W approach are not limited to normal and exponential random variables. While the normal and exponential distributions seem to be a natural starting point for loss modelling, other distributions can offer valuable contributions as well.
In conclusion, because $f_Z(z) = f_0(z) - f_{-1}(z)$ for $z \in (-\frac{1}{\gamma e}, 0]$, we have $\lim_{z \to -\frac{1}{\gamma e}+} f_Z(z) = \infty$. The lemma is proved.
Proof of Lemma 2. We first note that Formulas (12) and (13) differ only in the specification of the branch ($W_0$ or $W_{-1}$). Because most of the following argumentation holds for both branches, we do not specify the branch unless explicitly needed. In other words, we start by searching for the extrema of the function
The equality (A3) holds if the numerator is zero and the denominator is not. As we assume that γ > 0, the denominator provides a restriction $z \ne -\frac{1}{\gamma e}$, which is already accounted for. In the numerator, because the value of the exponential function is positive for any fixed

Figure 2. Plots of the pdf (left panel) and cdf (right panel) of W × N(0, 1) distributions with different γ values.

Figure 5. Example of Lambert W × N(0, 1) pdf when γ = 2.5. The right panel shows a closer view of the interval marked with grey in the left panel.

Figure 8. Plots of the pdf (left panel) and cdf (right panel) of W × Exp(1) distributions with different positive γ values.
existence of extrema for different values of γ > 0, we first have to take the derivative from the expression (A1) by z, ignoring the constant in front:  exp − (W(γz))

Figure A2. Left panel: Danish fire data claims (in millions of DKK). For a better overview, values above 20 are not shown on the histogram. Right panel: the same data after log-transformation. The added lines represent the best three estimates based on BIC.

Table 1. Parameter estimates for Lambert distributions.

Table 2. US indemnity data: AIC and BIC values for fitted distributions.

Table 3. Danish fire data: AIC and BIC values for fitted distributions.