A Bimodal Extension of the Exponential Distribution with Applications in Risk Theory

Abstract: There are some generalizations of the classical exponential distribution in the statistical literature that have proven to be helpful in numerous scenarios. Some of these are the families of distributions that were proposed by Marshall and Olkin and Gupta. The disadvantage of these models is the impossibility of fitting data of a bimodal nature or of incorporating covariates in the model in a simple way. Some empirical datasets with positive support, such as losses in insurance portfolios, show an excess of zero values and bimodality. For these cases, classical distributions, such as the exponential, gamma, Weibull, or inverse Gaussian, to name a few, are unable to explain data of this nature. This paper attempts to fill this gap in the literature by introducing a family of distributions that can be unimodal or bimodal and nests the exponential distribution. Some of its more relevant properties, including moments, kurtosis, Fisher's asymmetry coefficient, and several estimation methods, are illustrated. Different results related to finance and insurance, such as the hazard rate function, the limited expected value, and the integrated tail distribution, among other measures, are derived. Because of the simplicity of the mean of this distribution, a regression model is also derived. Finally, examples based on actuarial data are used to compare this new family with the exponential distribution.


Introduction
It is well-known that many empirical datasets traditionally used in different scenarios, such as financial econometrics, actuarial science, income modelling, and industrial engineering, have positive support and bimodality. For example, in the stochastic frontier model, it can be assumed that some firms are fully efficient while others are inefficient, giving rise to an error term that can be bimodal (see, for example, the recent work of [1]). In this case, the problem is to assess which regime (efficient or inefficient) each firm belongs to, and the resulting distribution of the disturbance term can be bimodal. Furthermore, in some cases, the classical continuous distributions can neither account for zero values in their support nor reach two maxima. In this regard, many empirical continuous datasets with positive support begin at zero, with a high frequency of this initial value. This portion of observations is not negligible, nor should it be ignored. Because most of the aforementioned classical distributions cannot accommodate the zeros, most researchers truncate the data by omitting those values, or model them with mixed random variables that combine a classical distribution and a point mass at 0 (see [2]). Although numerous models have been developed in discrete scenarios, the continuous case has received less attention. Here, a weight function κ(x; α, q) is considered, where −∞ < q < ∞ is a new parameter and where it is assumed that E_{f_X(x;α)}[κ(X; α, q)] = ∫_X κ(x; α, q) f_X(x; α) dx < ∞. Now, by combining this methodology with the idea given in [3], a result that provides a generalization of a classical continuous distribution is proposed. The resulting model can be either unimodal or bimodal, as shown in the following Proposition.
Proof. The result is obtained by considering that f_Y(y) ≥ 0 and integrating over the support of the random variable Y to obtain ∫_Y f_Y(y) dy = 1.
The parameter θ controls the unimodality or bimodality of the distribution. Additionally, by taking θ = 0, the parent pdf g_Y(y; µ, σ) is obtained as a special case. Thus, the methodology proposed in Proposition 1 provides a general method for extending a parent pdf.
The probability density function given in (2) can be viewed as a weighted distribution. There exists a vast literature dealing with the construction of such distributions in the discrete case since the pioneering work of [10]; however, the literature regarding the continuous scenario is scarce. The idea behind this construction is simple, and it aims to obtain more flexible distributions that adapt to empirical data. If the weight is the mean of the initial distribution, then the weighted function can be interpreted in terms of length-biased (size-biased) sampling. However, much more effort is required to obtain an interpretation of the function ω(·) beyond that of incorporating a parameter that controls the unimodality or bimodality of the resulting distribution.
The cumulative distribution function (cdf) of this family is obtained by integrating (2) by parts. It is noted that the integral in the second term on the right-hand side of (3) is available in closed form for classical distributions such as the exponential, gamma, and Weibull. In this paper, we discuss the particular case in which the parent distribution is the exponential distribution. A comprehensive examination of its mathematical properties is carried out, with particular emphasis on results related to insurance. Additionally, parameter estimation is performed by the methods of moments and maximum likelihood, and we analyze the efficiency of the estimates via a simulation study. Finally, the model's practical performance is examined by using two real claim size datasets. The distribution proposed in this paper can be used as a basis for excess-of-loss quotations. Furthermore, it provides a good description of the random behaviour of significant losses, similarly to the Pareto distribution. Unlike other generalizations of the exponential distribution, the one introduced in this work allows us to derive a regression model, due to the simple expression of its mean.
The rest of the paper is organised as follows. Section 2 provides some statistical properties of this model. Section 3 shows a catalogue of actuarial results. Next, Section 4 describes parameter estimation and a simulation study. The regression model is derived in Section 5. Numerical applications are given in Section 6, and the last section concludes the paper.

Bimodal Extension of the Exponential Distribution
Let us first consider the classical exponential distribution with pdf given by
f_X(x) = α exp(−αx), x ≥ 0,  (4)
with rate parameter α > 0 and a unique modal value located at x₀ = 0. The survival function is F̄_X(x) = Pr(X > x) = exp(−αx). Henceforward, a random variable X that follows (4) will be denoted by X ∼ Exp(α). Now, by using (2) and taking into account that µ = 1/α and σ = 1/α, we have that
f_Y(y) = ζ(θ) α [1 + (1 + θ − αθy)²] exp(−αy),  (5)
where ζ(θ) = (2 + θ²)⁻¹, is a genuine pdf for y ≥ 0, α > 0 and −∞ < θ < ∞. The survival function is obtained from (3) and it is given by
F̄_Y(y) = ζ(θ) ω₁(y; α, θ) exp(−αy),  (6)
with ω₁(y; α, θ) = 2 + θ² − 2αθy + α²θ²y². The exponential distribution is obtained when θ = 0. From now on, a random variable Y that follows the pdf (5) will be denoted as Y ∼ BE(α, θ). Figure 1, below, shows the graphs of the pdf of this distribution for several values of the parameters α and θ. It can easily be verified that, if |θ| < 1, then the shape of the distribution resembles the exponential one, with a mode at y = 0. On the other hand, for |θ| > 1, the pdf has two interior critical points,
ỹ = (1 + 2θ ± √(θ² − 1)) / (αθ),
where the larger of the two values is a local maximum (local mode) and the smaller is a minimum (antimode). Furthermore, since f_Y(0) = αζ(θ)[1 + (1 + θ)²], the value of the pdf at zero is larger than that of the classical exponential distribution for θ > 0 and lower in the rest of the domain of θ.
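As a minimal numerical sketch, the density in (5) can be coded directly and checked to integrate to one; the function name `be_pdf` and the parameter values below are illustrative choices, not part of the original text.

```python
import numpy as np
from scipy.integrate import quad

def be_pdf(y, alpha, theta):
    """Density of the bimodal exponential BE(alpha, theta) law, eq. (5)."""
    zeta = 1.0 / (2.0 + theta**2)
    weight = 1.0 + (1.0 + theta - alpha * theta * y) ** 2
    return zeta * alpha * weight * np.exp(-alpha * y)

# The density integrates to one for any admissible parameter pair,
# including theta = 0 (the exponential special case).
for alpha, theta in [(1.0, 0.0), (0.5, 2.0), (2.0, -3.0)]:
    total, _ = quad(be_pdf, 0, np.inf, args=(alpha, theta))
    print(f"alpha={alpha}, theta={theta}: integral = {total:.6f}")

# For theta > 1 the two interior critical points are
# (1 + 2*theta -/+ sqrt(theta**2 - 1)) / (alpha * theta),
# with the larger one being the local mode.
alpha, theta = 1.0, 2.0
y_anti = (1 + 2 * theta - np.sqrt(theta**2 - 1)) / (alpha * theta)
y_mode = (1 + 2 * theta + np.sqrt(theta**2 - 1)) / (alpha * theta)
print(be_pdf(y_anti, alpha, theta) < be_pdf(y_mode, alpha, theta))
```

A plot of `be_pdf` over a grid of y values reproduces the unimodal shape for |θ| < 1 and the two-peak shape otherwise.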

Reliability, Hazard Rate Function and Moments
The reliability function and the hazard (failure) rate function are two important reliability measures. The reliability function of a random variable Y is defined as R_Y(t) = 1 − F_Y(t) and is given by (6), while the hazard rate function, defined as h_Y(t) = f_Y(t)/(1 − F_Y(t)), is provided by
h(y) = α ω(y; α, θ) / ω₁(y; α, θ),
where ω(y; α, θ) = 1 + (1 + θ − αθy)². Figure 2 displays the hazard rate function of the BE law for different values of the parameters. Compared to the exponential distribution, which has a constant hazard rate, the hazard function of the distribution proposed here exhibits a wide variety of shapes. Therefore, the new family of distributions is flexible enough to describe a diversity of real datasets. It is simple to see that the hazard rate function reaches a minimum at (1/(αθ))(1 + √(1 + θ²)) for θ > 0. Obviously, for θ = 0 the hazard rate function is constant, corresponding to the exponential case. The moment generating function is given by
M_Y(t) = αζ(θ) [ (1 + (1 + θ)²)/(α − t) − 2αθ(1 + θ)/(α − t)² + 2α²θ²/(α − t)³ ], t < α,  (7)
from which we can derive the moments of the distribution; these are given by
E(Y^k) = ζ(θ) µ_k [1 + (1 + θ)² − 2θ(1 + θ)(k + 1) + θ²(k + 1)(k + 2)], k = 1, 2, . . . ,  (8)
where µ_k = k!/α^k is the k-th moment of the exponential distribution, and µ = 1/α and σ = 1/α are the mean and standard deviation of the exponential distribution, respectively. In particular, the mean, second order moment, and variance are given by
E(Y) = ζ(θ)(3θ² − 2θ + 2)/α,  (9)
E(Y²) = 2ζ(θ)(7θ² − 4θ + 2)/α²,  (10)
Var(Y) = E(Y²) − [E(Y)]².  (11)
It is straightforward to see that the mean decreases with θ for −2 − √6 < θ < −2 + √6 and increases in the rest of the support of this parameter.
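The closed-form moments (9) and (10) can be cross-checked against numerical integration of the density; the helper names below are illustrative, and the parameter pair is an arbitrary test case.

```python
import numpy as np
from scipy.integrate import quad

def be_pdf(y, alpha, theta):
    """BE(alpha, theta) density, eq. (5)."""
    zeta = 1.0 / (2.0 + theta**2)
    return zeta * alpha * (1 + (1 + theta - alpha * theta * y) ** 2) * np.exp(-alpha * y)

def be_mean(alpha, theta):
    # E[Y] = (3*theta^2 - 2*theta + 2) / (alpha * (2 + theta^2)), eq. (9)
    return (3 * theta**2 - 2 * theta + 2) / (alpha * (2 + theta**2))

def be_m2(alpha, theta):
    # E[Y^2] = 2*(7*theta^2 - 4*theta + 2) / (alpha^2 * (2 + theta^2)), eq. (10)
    return 2 * (7 * theta**2 - 4 * theta + 2) / (alpha**2 * (2 + theta**2))

alpha, theta = 1.5, -2.0
num_mean = quad(lambda y: y * be_pdf(y, alpha, theta), 0, np.inf)[0]
num_m2 = quad(lambda y: y**2 * be_pdf(y, alpha, theta), 0, np.inf)[0]
print(num_mean, be_mean(alpha, theta))   # the two values agree
print(num_m2, be_m2(alpha, theta))       # the two values agree
```

At θ = 0 both formulas collapse to the exponential moments 1/α and 2/α², as expected from the nesting property.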

Results in Risk Theory
In this section, some interesting actuarial results for this family of distributions are provided. Let the random variable Y represent either a policy limit or a reinsurance deductible (from an insurer's perspective); then the limited expected value function L of Y, with cdf F(y), is defined by
L(y) = E[min(Y, y)] = ∫₀^y (1 − F(t)) dt,
which is the expectation of Y censored at y. In other words, it represents the expected amount per claim retained by the insured on a policy with a fixed amount deductible of y. This is an appropriate tool for analyzing an excess-of-loss reinsurance ([13], Chapter 2, p. 59 and [14], Chapter 3, p. 113), among others. For the BE distribution, this amount is given by
L(y) = (ζ(θ)/α)[(3θ² − 2θ + 2) − (3θ² − 2θ + 2 + 2αθ(θ − 1)y + α²θ²y²) exp(−αy)].  (12)
The pdf (5) can also be applied to rating excess-of-loss reinsurance, as can be seen in the following result.

Proposition 2.
Let Y be a random variable denoting the individual claim size, taking values only for individual claims greater than d. Let us also assume that Y follows the pdf (5); then the expected cost per claim to the reinsurance layer, when the loss is in excess of m subject to a maximum of l, is given by L(m + l) − L(m), where L(·) is the limited expected value function. Proof. The result follows by taking into account that E[min(l, max(Y − m, 0))] = ∫_m^{m+l} F̄_Y(t) dt, from which we obtain the result after some algebra.
The failure rate of the integrated tail distribution, defined as F_I(y) = (1/E(Y)) ∫₀^y F̄(t) dt, is given by
γ_I(y) = F̄(y)/(E(Y) − L(y)).
Additionally, the reciprocal of γ_I(y) is the mean residual life, which can be easily derived. In the insurance context (see, for example, [13,15]), for a claim amount random variable Y, the mean excess function or mean residual life function, e(y), plays an essential role in the reinsurance framework. It is interpreted as the expected payment per claim on a policy with a fixed deductible of y, where claims with an amount less than or equal to y are completely ignored. Because the mean residual life is related to the limited expected value function through the expression (see [13], p. 59)
e(y) = (E(Y) − L(y))/F̄(y),
a closed-form expression can be obtained for the mean residual life. Finally, the TVaR function is also provided in closed form, since for a continuous random variable TVaR_u(Y) = VaR_u(Y) + e(VaR_u(Y)).
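The actuarial quantities above can be sketched numerically. The block below codes the survival function (6), the limited expected value (12), and the mean excess function e(y) = (E(Y) − L(y))/F̄(y), and verifies L against direct integration of the survival function; all helper names and parameter values are illustrative.

```python
import numpy as np
from scipy.integrate import quad

def be_sf(y, alpha, theta):
    """Survival function, eq. (6): zeta*(2 + th^2 - 2*a*th*y + a^2*th^2*y^2)*exp(-a*y)."""
    zeta = 1.0 / (2.0 + theta**2)
    return zeta * (2 + theta**2 - 2 * alpha * theta * y + (alpha * theta * y) ** 2) * np.exp(-alpha * y)

def be_lev(y, alpha, theta):
    """Limited expected value L(y) = E[min(Y, y)], eq. (12)."""
    zeta = 1.0 / (2.0 + theta**2)
    c = 3 * theta**2 - 2 * theta + 2
    poly = c + 2 * alpha * theta * (theta - 1) * y + (alpha * theta * y) ** 2
    return zeta / alpha * (c - poly * np.exp(-alpha * y))

def be_mean_excess(y, alpha, theta):
    """Mean residual life e(y) = (E[Y] - L(y)) / sf(y): expected payment above deductible y."""
    mean = (3 * theta**2 - 2 * theta + 2) / (alpha * (2 + theta**2))
    return (mean - be_lev(y, alpha, theta)) / be_sf(y, alpha, theta)

alpha, theta, d = 1.0, 2.0, 1.5
num = quad(lambda t: be_sf(t, alpha, theta), 0, d)[0]  # L(d) by numerical integration
print(num, be_lev(d, alpha, theta))                    # should agree
print(be_mean_excess(d, alpha, theta))                 # positive mean excess at deductible d
```

The layer cost of Proposition 2 then follows as `be_lev(m + l, alpha, theta) - be_lev(m, alpha, theta)`.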

Methods of Estimation and Simulation
Given a random sample ỹ = (y₁, . . . , yₙ) taken from the BE(α, θ) distribution, simple moment estimates can be calculated by equating the sample and theoretical moments. Because there are two parameters, we need, for example, the sample mean, m̄, and the sample second order moment around the origin, s̄₂. Now, by setting (9) and (10) equal to their sample counterparts, we get
α̂ = ζ(θ̂)(3θ̂² − 2θ̂ + 2)/m̄.  (13)
By plugging (13) into (10), we obtain an equation which depends solely on θ̂ and can be solved numerically. One of the most substantial limitations of the proposed probabilistic model is the impossibility of proving that both the moment and maximum likelihood estimators exist and are unique. However, in practice, the model's estimates, as shown in the simulation analysis and numerical applications, are easily obtained by numerical methods without difficulty. This leads us to think that they correspond to global maxima, although this cannot be guaranteed. We now proceed with the maximum likelihood method of estimation. The log-likelihood function can be written as
ℓ(α, θ) = n log α + n log ζ(θ) − α Σᵢ yᵢ + Σᵢ log[1 + (1 + θ − αθyᵢ)²].
Then, the normal equations are given by
∂ℓ/∂α = n/α − Σᵢ yᵢ − 2θ Σᵢ yᵢ(1 + θ − αθyᵢ)/[1 + (1 + θ − αθyᵢ)²] = 0,  (14)
∂ℓ/∂θ = −2nθ/(2 + θ²) + 2 Σᵢ (1 − αyᵢ)(1 + θ − αθyᵢ)/[1 + (1 + θ − αθyᵢ)²] = 0.  (15)
Numerical procedures, such as the Newton-Raphson algorithm, can be used to solve the system of equations given by (14) and (15). Unlike the exponential distribution, the maximum likelihood estimates cannot be expressed in closed form. In practice, since we are unable to prove that the log-likelihood function is concave, the global maximum is not guaranteed, so the likelihood function is maximized directly starting from several different seed points. We have used different maximum search methods available in the FindMaximum built-in function of the Mathematica software package, including the Newton-Raphson and Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithms. The same results were achieved under both optimization methods.
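The direct maximization of the likelihood can be sketched with a general-purpose optimizer; the block below minimizes the negative log-likelihood of the BE model with `scipy.optimize.minimize`. The data here are illustrative exponential draws (for which θ should be estimated near 0), not the paper's datasets.

```python
import numpy as np
from scipy.optimize import minimize

def be_nll(params, y):
    """Negative log-likelihood of the BE(alpha, theta) model."""
    alpha, theta = params
    if alpha <= 0:
        return np.inf  # keep the rate parameter in its admissible region
    w = 1.0 + (1.0 + theta - alpha * theta * y) ** 2
    ll = (len(y) * np.log(alpha) - len(y) * np.log(2.0 + theta**2)
          - alpha * y.sum() + np.log(w).sum())
    return -ll

rng = np.random.default_rng(42)
y = rng.exponential(scale=2.0, size=500)  # illustrative data only (rate 0.5)

start = np.array([1.0, 0.5])              # arbitrary seed point
res = minimize(be_nll, start, args=(y,), method="Nelder-Mead")
alpha_hat, theta_hat = res.x
print(alpha_hat, theta_hat, res.fun)
```

Rerunning from several seed points, as the text suggests, guards against convergence to a local optimum.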
Although a more general structure, such as kernel regression or a neural network, could provide accurate estimates, the approach used in this paper does not require training data. It can also work well even if the fit to the data is not perfect. Additionally, this method makes the results easier to understand and interpret; for instance, a parametric test for the significance of the parameter estimates can lead to a rejection of the null hypothesis where its non-parametric counterpart cannot. Finally, from the actuarial perspective, the practitioner may be interested in the parametric approach since it provides appealing closed-form expressions, as is the case for this BE representation.

Simulation Experiment
Here, an acceptance-rejection algorithm to generate random variates from the BE(α, θ) distribution (see [16]) is used. The simulation analysis results are illustrated in Table 1, which shows the behaviour of the maximum likelihood estimates for 1000 simulated samples of sizes 50, 100, 150, and 200 from the BE(α, θ) distribution. For each simulated sample, the estimates were computed numerically via a Newton-Raphson algorithm. In this Table, the means, standard deviations (SD), and percentages of coverage probability (C) are reported for different values of the parameters α and θ. As expected, it is observable that the bias becomes smaller as the sample size n increases.
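One possible acceptance-rejection sampler, sketched under the assumption of an Exp(α/2) proposal whose envelope constant is found numerically, is as follows; the function name and tuning choices are illustrative, not the scheme of [16].

```python
import numpy as np

def be_rvs(n, alpha, theta, rng):
    """Acceptance-rejection sampling from BE(alpha, theta) with an
    Exp(alpha/2) proposal; the envelope constant M is found on a grid."""
    lam = alpha / 2.0
    zeta = 1.0 / (2.0 + theta**2)

    def pdf(y):
        return zeta * alpha * (1 + (1 + theta - alpha * theta * y) ** 2) * np.exp(-alpha * y)

    def prop(y):
        return lam * np.exp(-lam * y)

    # Numerical envelope: the ratio pdf/prop decays like y^2 * exp(-alpha*y/2),
    # so its supremum is captured on a finite grid; 1.05 adds a safety margin.
    grid = np.linspace(0.0, 50.0 / alpha, 4000)
    M = 1.05 * np.max(pdf(grid) / prop(grid))

    out = []
    while len(out) < n:
        y = rng.exponential(1.0 / lam, size=n)
        u = rng.uniform(size=n)
        out.extend(y[u * M * prop(y) <= pdf(y)])  # accept where u*M*g(y) <= f(y)
    return np.array(out[:n])

rng = np.random.default_rng(7)
alpha, theta = 1.0, 2.0
sample = be_rvs(20000, alpha, theta, rng)
mean_th = (3 * theta**2 - 2 * theta + 2) / (alpha * (2 + theta**2))
print(sample.mean(), mean_th)  # empirical mean close to the theoretical one
```

Samples produced this way can feed the maximum likelihood machinery of the previous section to reproduce a Table 1-style experiment.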

A Suitable Regression Model
In practice, to better explain the response variable, it is important that the statistical model is able to incorporate covariates. By rewriting (9) as
α = ζ(θ)(3θ² − 2θ + 2)/µ,
a reparameterization of the BE distribution in (5) is obtained, where E(Y) = µ. The variance of the reparameterized distribution is given by
Var(Y) = µ² m(θ), with m(θ) = 2(2 + θ²)(7θ² − 4θ + 2)/(3θ² − 2θ + 2)² − 1.
In this case, the parameter θ can be interpreted as a precision parameter, since the function m(θ) increases for −3 < θ < 1 (see Figure 3) and decreases in the rest of the domain of θ. Thus, µ is the mean of the response variable, and θ can be regarded as a precision parameter in the sense that, for a fixed value of µ, the variance of Y varies according to the values of m(θ), i.e., the values of the parameter θ. Because the mean of the response is non-negative, the most common function relating the mean and the linear predictor is the log link, log µᵢ = xᵢ⊤β, i = 1, . . . , n, where xᵢ = (x_{i1}, . . . , x_{iq}) is a q-vector of explanatory variables and β = (β₁, β₂, . . . , β_q) ∈ ℝ^q is a q-vector of unknown regression coefficients that may include an intercept. Subsequently, we have the conventional log-linear model, such that µᵢ = exp(xᵢ⊤β). The maximum likelihood estimates of the regressors β_s, s = 1, . . . , q, can be computed via the Newton-Raphson algorithm. In our applications, parameters will be estimated by the maximum likelihood method by using this algorithm, available in the software packages Mathematica [17] and RATS [18]. The code for the latter package is available upon request.
As is well-known, the marginal effect reflects the variation of the conditional mean of Y due to a one-unit change in the sth covariate, and it is calculated as ∂µᵢ/∂x_{is} = β_s µᵢ for i = 1, . . . , n and s = 1, . . . , q. Thus, the marginal effect indicates that a one-unit change in the sth regressor increases or decreases the expectation of the dependent variable, depending on the sign, positive or negative, of the regressor for each mean. For indicator or dummy variables that take only the value 0 or 1, the marginal effect in terms of the odds-ratio is approximately exp{β_j}. Therefore, the conditional mean is exp{β_j} times larger if the indicator variable is one rather than zero.
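The log-link mean and the marginal effects above can be illustrated with a tiny worked example; the design matrix and coefficient values below are hypothetical, chosen only to show the arithmetic.

```python
import numpy as np

# Hypothetical design: intercept, one continuous covariate, one 0/1 dummy.
beta = np.array([0.3, -0.2, 0.5])      # illustrative coefficients
X = np.array([[1.0, 1.2, 0.0],
              [1.0, 0.4, 1.0]])
mu = np.exp(X @ beta)                  # log link: mu_i = exp(x_i' beta)

# Marginal effect of the continuous covariate (column s = 1):
# d mu_i / d x_is = beta_s * mu_i, one value per observation.
s = 1
print(beta[s] * mu)

# For the dummy (column 2), exp(beta_2) is the approximate multiplicative
# change in the conditional mean when the indicator switches from 0 to 1.
print(np.exp(beta[2]))
```

With β₂ = 0.5, the conditional mean is roughly e^0.5 ≈ 1.65 times larger when the indicator equals one, matching the odds-ratio reading in the text.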

Empirical Results
Two datasets will be used to illustrate the applicability of the distribution studied in this paper. The first one is related to life insurance and the other one to non-life insurance. For these two examples, the exponential distribution and the zero-adjusted gamma model provided in [2] will be used as benchmarks. The pdf of this last distribution, a mixture of a point mass at zero and a gamma density, is given by
f(y) = p, y = 0; f(y) = (1 − p)(α^θ/Γ(θ)) y^{θ−1} exp(−αy), y > 0,
where 0 < p < 1, α > 0, θ > 0. This mixture distribution will be denoted hereafter as GM.
It is worth pointing out that other versions of the generalized exponential distribution, such as the one proposed by [4] and the one suggested by [5], do not yield significant improvement over the model introduced in this work.

Dataset 1
Life insurers offer new products to increase their market share, as is the case for most financial companies operating in the markets. Thus, insurance companies need to know specific characteristics of their potential customers, which makes it crucial to update their databases accordingly to include the appropriate information. In this first example, we use data from the Survey of Consumer Finances (SCF), a nationally representative sample (in the U.S.) of 500 households with positive income who were interviewed in the survey conducted in 2004. For term life insurance, the amount of insurance is measured by the policy face, the amount that the company will pay in the event of the death of the named insured. Important characteristics that may impact this quantity (covariates) are shown in some detail in Table 2. This dataset can be found on the personal website of Professor E. Frees (Wisconsin School of Business Research), https://instruction.bus.wisc.edu/jfrees/jfreesbooks/Regression%20Modeling/BookWebDec2010/data.html. Parameter estimates obtained by the method of maximum likelihood, together with standard errors (in brackets), are illustrated in Tables 3 and 4 for the model without and with covariates, respectively. As can be observed, the model proposed in this paper provides a better fit to this dataset than the exponential distribution in both cases in terms of three measures of model selection: the maximum of the log-likelihood function, AIC, and CAIC. The expressions of the latter measures are AIC = −2ℓ̂ + 2k and CAIC = −2ℓ̂ + k(1 + log n), where ℓ̂ is the maximum of the log-likelihood function, k is the number of parameters, and n is the sample size. A model with a lower value of these measures is preferable. It is also discernible in Figure 4 that the BE distribution is able to reproduce the bimodality of the empirical data. Table 4 also displays the estimates, standard errors (S.E.), t-Wald statistics, and p-values for the BE and exponential (in brackets) distributions.
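The model-selection arithmetic can be sketched as follows; the fitted log-likelihood values below are hypothetical placeholders, not the values reported in Tables 3 and 4, and the CAIC form used is the standard consistent-AIC penalty.

```python
import numpy as np

def aic(loglik, k):
    """Akaike information criterion: AIC = -2*loglik + 2*k."""
    return -2.0 * loglik + 2.0 * k

def caic(loglik, k, n):
    """Consistent AIC: CAIC = -2*loglik + k*(log(n) + 1)."""
    return -2.0 * loglik + k * (np.log(n) + 1.0)

# Hypothetical fits: exponential (k = 1) vs BE (k = 2), sample size n = 500.
ll_exp, ll_be, n = -1625.4, -1598.7, 500
print(aic(ll_exp, 1), aic(ll_be, 2))
print(caic(ll_exp, 1, n), caic(ll_be, 2, n))
```

Lower values indicate the preferred model; with these placeholder likelihoods the extra BE parameter is worth its penalty under both criteria.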
It can be noticed that the sign of some of the estimated regressors differs between the two models. Moreover, most of the explanatory variables (except for ethnicity and smarstat) are statistically significant at the 5% significance level. Similarly to the previous case, according to the same model validation measures, the BE provides a better fit to the data.

Dataset 2
This second example discusses a cross-sectional dataset collected in 1977 related to third party car insurance claims. It can also be found on the website of Professor E. Frees. See also [19] and the references therein. The explanatory variables of interest are described below in Table 5. Using these explanatory variables, we explain the response variable, the payments in Swedish krona (variable payment divided by 10³). Only 136 observations out of the total 2182 correspond to the case where the insured is classified in bonus scale 1 (the insured starts in class 1 and is moved up one class, to a maximum of 7, after each claim-free year). Therefore, in this application, only those insureds with less driving experience have been considered, since the empirical distribution shows bimodality. The estimates of the basic model (without covariates) and standard errors are exhibited in Table 3. Once again, the BE distribution provides a better fit to the data than the exponential model in terms of the three measures of model selection.
Moreover, Table 6 shows the estimates, standard errors (S.E.), t-Wald statistics, and p-values for the BE and exponential (in brackets) distributions. It is noted that there are no significant differences with respect to the exponential regression model, except for the variable make, which is not statistically significant at the usual significance levels under the new regression model. Similarly to the previous example (see Figure 5), the new model can also reproduce the second mode of the empirical data. Finally, we have computed the limited expected values that are given in (12) for the basic model, without including covariates, and compared these numerical figures with the empirical values and those obtained for the exponential distribution. It is apparent that, for the BE distribution, the computed values adhere more closely to the empirical ones than those derived from the exponential model (see Figure 6). We are now interested in analyzing the out-of-sample performance of the BE distribution. For that reason, we have randomly partitioned the second dataset into two disjoint subsets of the same size. The first subset, of size 68, is used for fitting the models, whereas the second subset tests the models' out-of-sample prediction accuracy. We have fitted the exponential and BE distributions to these datasets, and then compared the out-of-sample performance via the likelihood ratio test proposed by [20] for non-nested models. The test statistic of Vuong's closeness test is
T = (1/(√n ω̂)) [ Σᵢ₌₁ⁿ log( f(yᵢ; θ̂₁)/g(yᵢ; θ̂₂) ) − ((n_f − n_g)/2) log n ],
where ω̂² is the sample variance of the pointwise log-likelihood ratios, f(·) and g(·) represent the pdfs of the two different non-nested models, θ̂₁ and θ̂₂ are the maximum likelihood estimates of θ₁ and θ₂, and n_f and n_g are the numbers of estimated parameters in the models with pdfs f(·) and g(·), respectively. Note that Vuong's statistic is sensitive to the number of estimated parameters in each model and, therefore, the test must be corrected for dimensionality.
We test H₀: the two models are equally close to the true data-generating distribution, against the alternative that one of them is closer. Under the null hypothesis, T is asymptotically normally distributed. At the 5% significance level, the rejection region for this test in favour of the alternative hypothesis occurs when |T| > 1.96. Following this approach, we have calculated the Vuong test statistic 1000 times. The resulting average value of the test statistic was T = 6.985; therefore, the BE regression model is preferable to the exponential regression model at the 5% significance level, in-sample and out-of-sample, for this dataset.
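A dimensionality-corrected Vuong statistic can be sketched as below; the pointwise log-densities are synthetic placeholders standing in for the fitted BE and exponential models, and the Schwarz-type correction is one common choice for penalizing parameter counts.

```python
import numpy as np

def vuong_statistic(logf, logg, kf, kg):
    """Vuong closeness test for non-nested models, with a Schwarz-type
    correction for the numbers of estimated parameters kf and kg."""
    n = len(logf)
    lr = logf - logg                          # pointwise log-likelihood ratios
    correction = 0.5 * (kf - kg) * np.log(n)  # dimensionality penalty
    omega = lr.std(ddof=1)                    # sample SD of the ratios
    return (lr.sum() - correction) / (np.sqrt(n) * omega)

# Hypothetical pointwise log-densities under two fitted models (n = 136,
# matching the bonus-scale-1 subsample size used in the text).
rng = np.random.default_rng(0)
logf = rng.normal(-1.0, 0.3, size=136)            # candidate model (2 params)
logg = logf - rng.normal(0.05, 0.2, size=136)     # benchmark model (1 param)
T = vuong_statistic(logf, logg, kf=2, kg=1)
print(T)  # |T| > 1.96 would favour one model at the 5% level
```

Repeating the computation over many random train/test splits and averaging T mirrors the resampling exercise described above.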
Apart from improving the maximum value of the likelihood function, it is interesting to note that the estimation of the parameters is computationally fast and unproblematic. Bear in mind that, to obtain bimodal modelling, a finite mixture of two distributions, such as the gamma or the exponential, may be possible. Still, in these cases, the estimation of the parameter that weighs both distributions can give rise to identifiability problems.

Conclusions and Extensions
In this paper, a methodology that allows us to generalize an initial probability distribution by adding a parameter that controls the unimodality or bimodality of the new family was introduced. Special attention was paid to the case where the parent model is the exponential distribution, thus obtaining a new generalization of the exponential distribution that can be considered to be an alternative model to other extensions of this distribution. This new model's analytical expression is simple, and many interesting statistical and actuarial quantities are obtained in closed-form. The derivation of composite and folded models that are based on this bimodal family might be a line of further research following the recent works of [21,22], among others.
Finally, the distribution of the disturbance term in stochastic frontier analysis, assuming the bimodal exponential distribution for the inefficiency term and a normal distribution for the noise, can be obtained in closed form. This distribution can be bimodal, making it a suitable model for explaining different sources of inefficiency. We believe that this is a promising line of research to be addressed in the future.