Model Uncertainty and Selection of Risk Models for Left-Truncated and Right-Censored Loss Data

Insurance loss data usually come in left-truncated and right-censored form due to deductibles and policy limits, respectively. This paper investigates model uncertainty and the selection procedure when various parametric models are constructed to accommodate such left-truncated and right-censored data. The joint asymptotic properties of the estimators are established using the Delta method along with Maximum Likelihood Estimation when the model is specified. We conduct simulation studies using Fisk, Lognormal, Lomax, Paralogistic, and Weibull distributions with various proportions of loss data below deductibles and above policy limits. A variety of graphical tools, hypothesis tests, and penalized likelihood criteria are employed to validate the models, and their performance on model selection is evaluated through the probability of each parent distribution being correctly selected. The effectiveness of each tool on model selection is also illustrated using well-studied data that represent Wisconsin property losses in the United States from 2007 to 2010.


Introduction
In actuarial science, calculating financial and insurance risk measures plays an important role in the modeling of financial and insurance loss data. In many cases, loss data exhibit complex features such as left-truncation and right-censoring. These features pose challenges for model selection and the analysis of model uncertainty.
The selection of appropriate risk models for left-truncated and right-censored (LTRC) loss data is an essential task in actuarial practice. Selecting an inappropriate model can lead to incorrect estimation of financial risk, resulting in sub-optimal decision-making. There is a need for a thorough analysis of model uncertainty and model selection for LTRC loss data. This article aims to address model uncertainty and its corresponding effect on model selection.
In the actuarial literature, (5) studied various aspects of LTRC insurance loss data such as the probability density function (PDF), cumulative distribution function (CDF), and quantile function (QF), building up a framework to derive asymptotic distributions of parametric and empirical estimators. In addition, (15) designed robust parametric estimation procedures for insurance payment data affected by deductibles, policy limits, and coinsurance factors. These authors clearly presented examples and showed how model uncertainty arises in the model-fitting process for actuarial loss data and how to deal with model mis-specification.
Left truncation and right censoring are both types of data incompleteness commonly encountered in survival analysis. (13) studied left truncation and right censoring in lifetime data. Left truncation is the situation where the individuals in the study are only observed from a certain point in time forward, and those who have experienced the event of interest prior to this time point are excluded from the analysis. Right censoring refers to the situation where the event of interest has not occurred for some individuals by the end of the study period. In our study, left truncation means that losses below a certain deductible value are not observed, and right censoring means that only frequencies are observed for losses above a certain policy limit value, without further information about severity. This context of insurance loss data differs from the context of lifetime data in survival analysis.
In the model selection procedure, the selection criteria play a critical role in determining the best candidate model. The performance of typical penalized likelihood measures, the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), has been reviewed (see (7), for example) and discussed using various incomplete data sets with data truncation and censoring applied according to their industrial applications. (2) demonstrated that AIC is superior to BIC when the true model comes from the Gamma, Lognormal, and Weibull distributions. (14) suggested that likelihood-based selection criteria are preferred to distance-based selection criteria (such as the Kolmogorov-Smirnov distance) within the Lehmann family of distributions. We will also employ the Information Complexity (ICOMP) criterion, a measure that penalizes the interdependencies among the Maximum Likelihood parameter estimators, as a supplemental tool in addition to AIC and BIC to identify the structural differences among widely used medium- and heavy-tailed distributions. In this research, we compare all the above-mentioned selection indicators and consider the Anderson-Darling test, another goodness-of-fit test that puts more weight on the tails, to evaluate model performance over a wider range of loss models.
The remainder of the paper is organized as follows. In Section 2, we present key probabilistic and log-likelihood functions of the Fisk, Frechet, Lognormal, Lomax, Paralogistic, and Weibull distributions. In Section 3, we discuss the Quantile-Quantile plot, the Kolmogorov-Smirnov test, and the Anderson-Darling test for model validation. Further, the AIC, BIC, and ICOMP criteria are discussed for model selection in Section 4. The simulation study is conducted in Section 5, and a real data analysis in Section 6. Finally, the main results of the paper are summarized in Section 7, which concludes the paper.

Theoretical Derivations for Candidate Distributions
In this section, we study six candidate loss distributions for medium- and heavy-tailed risks: the Fisk, Frechet, Lognormal, Lomax, Paralogistic, and Weibull distributions. Let X* represent the conditional random variable X | d ≤ X ≤ u.

Model Distributions with Left Truncation and Right Censoring
Candidate 1: Suppose the random variable X follows a Fisk distribution (see (9), for example) with scale parameter θ and shape parameter α. The pdf and cdf for a ground-up Fisk loss distribution are
$$f(x) = \frac{\alpha (x/\theta)^{\alpha}}{x\left[1+(x/\theta)^{\alpha}\right]^{2}}, \qquad F(x) = \frac{(x/\theta)^{\alpha}}{1+(x/\theta)^{\alpha}}, \qquad x>0.$$
This distribution is also called the Loglogistic distribution. It resembles the Lognormal near zero when the shape parameter is greater than one and behaves more like the Lomax in the tail. The cdf of X* representing the loss data with deductible (d) and policy limit (u) is
$$F_{*}(x) = \begin{cases} 0, & x < d, \\ \dfrac{F(x)-F(d)}{1-F(d)}, & d \le x < u, \\ 1, & x \ge u, \end{cases}$$
and the qf of X* is
$$F_{*}^{-1}(p) = \begin{cases} F^{-1}\!\left(F(d)+p\,[1-F(d)]\right), & 0 < p < p_{u}, \\ u, & p_{u} \le p < 1, \end{cases}$$
where $p_{u} = \frac{F(u)-F(d)}{1-F(d)}$ and, for the Fisk distribution, $F^{-1}(q) = \theta\left(\frac{q}{1-q}\right)^{1/\alpha}$.
Candidate 2: The Frechet distribution, also known as the inverse Weibull distribution, is a special case of the generalized extreme value distribution with a much fatter right tail. Its ground-up pdf and cdf are
$$f(x) = \frac{\alpha (\theta/x)^{\alpha}\, e^{-(\theta/x)^{\alpha}}}{x}, \qquad F(x) = e^{-(\theta/x)^{\alpha}}, \qquad x>0.$$
The cdf and qf of X* representing the loss data with deductible (d) and policy limit (u) take the same truncated-censored form as in Candidate 1, with ground-up qf $F^{-1}(q) = \theta\,(-\log q)^{-1/\alpha}$.
Candidate 3: Suppose the random variable X is distributed according to a Lognormal distribution with log-location parameter −∞ < µ < ∞ and log-scale parameter σ > 0. We denote this fact as X ∼ LN(µ, σ). Its pdf and cdf are
$$f(x) = \frac{1}{\sigma x}\,\phi\!\left(\frac{\log x - \mu}{\sigma}\right), \qquad F(x) = \Phi\!\left(\frac{\log x - \mu}{\sigma}\right), \qquad x>0.$$
Here Φ, φ, and Φ^{-1} denote the cdf, pdf, and qf of the standard normal distribution, respectively. The cdf and qf of X* representing the loss data with deductible and policy limit take the same truncated-censored form as in Candidate 1, with ground-up qf $F^{-1}(q) = e^{\mu + \sigma \Phi^{-1}(q)}$.
Candidate 4: Suppose the random variable X is distributed according to the Lomax distribution (see (12), for example), also known as Pareto Type II (see (1), for example). The pdf and cdf for a ground-up Lomax loss distribution are
$$f(x) = \frac{\alpha\,\theta^{\alpha}}{(x+\theta)^{\alpha+1}}, \qquad F(x) = 1-\left(\frac{\theta}{x+\theta}\right)^{\alpha}, \qquad x>0.$$
The cdf and qf of X* representing the loss data with deductible (d) and policy limit (u) take the same truncated-censored form as in Candidate 1, with ground-up qf $F^{-1}(q) = \theta\left[(1-q)^{-1/\alpha}-1\right]$. Note that $p_{u} = 1-\left(\frac{d+\theta}{u+\theta}\right)^{\alpha}$. The log-likelihood function follows the general form given in the MLE Approach subsection below.
Candidate 5: Suppose the random variable X is distributed according to the Paralogistic distribution. The pdf and cdf for a ground-up Paralogistic loss distribution are
$$f(x) = \frac{\alpha^{2}(x/\theta)^{\alpha}}{x\left[1+(x/\theta)^{\alpha}\right]^{\alpha+1}}, \qquad F(x) = 1-\left[\frac{1}{1+(x/\theta)^{\alpha}}\right]^{\alpha}, \qquad x>0.$$
The cdf and qf of X* representing the loss data with deductible (d) and policy limit (u) take the same truncated-censored form as in Candidate 1, with ground-up qf $F^{-1}(q) = \theta\left[(1-q)^{-1/\alpha}-1\right]^{1/\alpha}$.
Candidate 6: Suppose the random variable X follows a Weibull distribution. The pdf and cdf for a ground-up Weibull loss distribution are
$$f(x) = \frac{\alpha}{\theta}\left(\frac{x}{\theta}\right)^{\alpha-1} e^{-(x/\theta)^{\alpha}}, \qquad F(x) = 1-e^{-(x/\theta)^{\alpha}}, \qquad x>0.$$
The cdf and qf of X* representing the loss data with deductible (d) and policy limit (u) take the same truncated-censored form as in Candidate 1, with ground-up qf $F^{-1}(q) = \theta\left[-\log(1-q)\right]^{1/\alpha}$.
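The truncated-censored cdf and qf of X* have the same structure for every candidate, driven only by the ground-up F and F^{-1}. Below is a minimal Python sketch of that structure (the paper's computations use Matlab; Python here is purely illustrative), with the Weibull as the ground-up example:

```python
import math

def ltrc_cdf(F, x, d, u):
    """cdf of X*: left-truncated at the deductible d, with the censored
    mass at the policy limit u collapsed into a jump at u."""
    if x < d:
        return 0.0
    if x >= u:
        return 1.0
    return (F(x) - F(d)) / (1.0 - F(d))

def ltrc_qf(Finv, F, p, d, u):
    """qf of X*: below p_u, invert through the ground-up qf; at or above
    p_u, return the policy limit u."""
    p_u = (F(u) - F(d)) / (1.0 - F(d))
    if p >= p_u:
        return u
    return Finv(F(d) + p * (1.0 - F(d)))

# Ground-up Weibull example: F(x) = 1 - exp(-(x/theta)^alpha)
theta, alpha = 5000.0, 1.2
F = lambda x: 1.0 - math.exp(-((x / theta) ** alpha))
Finv = lambda q: theta * (-math.log(1.0 - q)) ** (1.0 / alpha)
```

Swapping in any of the other five ground-up pairs (F, F^{-1}) reproduces the corresponding candidate's truncated-censored cdf and qf.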

MLE Approach
Parametric methods use the observed data x*_1, . . ., x*_n and fully recognize their distributional properties. The Maximum Likelihood Estimation (MLE) approach is one of the most common estimation techniques. Parameter estimates are found by maximizing the following log-likelihood function:
$$\mathcal{L}(\theta) = \sum_{i=1}^{n}\left[\mathbf{1}\{d \le x_{i}^{*} < u\}\,\log f_{*}(x_{i}^{*}) + \mathbf{1}\{x_{i}^{*}=u\}\,\log\left(1-F_{*}(u^{-})\right)\right],$$
where 1{•} denotes the indicator function and u^{-} represents a point approaching u from the left.
The observed Fisher information represents the amount of information that the observed data provide about the parameter θ. Intuitively, high Fisher information indicates that the data contain valuable information about the parameter, implying that the estimate of the parameter based on the data is likely to be more precise. Conversely, low Fisher information suggests that the data provide little information about the parameter, leading to higher uncertainty in the estimate. The variance-covariance matrix is obtained from the inverse of the Fisher information matrix.
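The two steps above, maximizing the LTRC log-likelihood and inverting the observed information, can be sketched in Python. This is an illustrative one-parameter version (Weibull scale θ with the shape α held fixed; the ternary search and the finite-difference second derivative are simplifications for exposition, not the paper's implementation):

```python
import math
import random

def weibull_pdf(x, theta, alpha):
    return (alpha / theta) * (x / theta) ** (alpha - 1) * math.exp(-((x / theta) ** alpha))

def weibull_cdf(x, theta, alpha):
    return 1.0 - math.exp(-((x / theta) ** alpha))

def ltrc_loglik(theta, alpha, data, d, u):
    """LTRC log-likelihood: uncensored points contribute the conditional
    density f(x)/(1 - F(d)); points recorded at the policy limit u
    contribute the conditional survival P(X >= u | X > d)."""
    Sd = 1.0 - weibull_cdf(d, theta, alpha)
    ll = 0.0
    for x in data:
        if x >= u:
            ll += math.log((1.0 - weibull_cdf(u, theta, alpha)) / Sd)
        else:
            ll += math.log(weibull_pdf(x, theta, alpha) / Sd)
    return ll

def mle_theta(data, alpha, d, u, lo=1000.0, hi=20000.0):
    """Maximize the log-likelihood over theta by ternary search
    (the profile log-likelihood is unimodal in theta here)."""
    for _ in range(100):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if ltrc_loglik(m1, alpha, data, d, u) < ltrc_loglik(m2, alpha, data, d, u):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

def observed_info(theta, alpha, data, d, u, h=1.0):
    """Observed Fisher information: minus the second derivative of the
    log-likelihood at the MLE, by central finite differences."""
    f = lambda t: ltrc_loglik(t, alpha, data, d, u)
    return -(f(theta + h) - 2.0 * f(theta) + f(theta - h)) / (h * h)

# Simulate LTRC Weibull data: condition on X > d, then censor at u.
random.seed(1)
theta0, alpha0, d, u = 5000.0, 1.2, 500.0, 10000.0
Fd = weibull_cdf(d, theta0, alpha0)
data = [min(theta0 * (-math.log(1.0 - (Fd + random.random() * (1.0 - Fd)))) ** (1.0 / alpha0), u)
        for _ in range(2000)]
theta_hat = mle_theta(data, alpha0, d, u)
se = observed_info(theta_hat, alpha0, data, d, u) ** -0.5
```

The standard error `se` illustrates the inverse relationship between observed information and estimation uncertainty described above.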

Model Validation
A variety of graphical tools, hypothesis tests, and penalized likelihood criteria are employed to validate the candidate models and assess their performance.

Quantile-Quantile Plot
A Quantile-Quantile (QQ) plot is a graphical representation used to compare two probability distributions. To construct a QQ plot, we plot the sample distribution on the Y-axis and the theoretical distribution on the X-axis. When the points in the QQ plot align closely with a 45-degree diagonal line, it indicates a high level of agreement between the sample distribution and the theoretical distribution.
In order to avoid visual distortions due to large spacings between the most extreme observations, both axes can be measured on the logarithmic scale. The QQ plot is generated by plotting the points
$$\left(\log \widehat{F}_{*}^{-1}(u_{i}),\; \log x_{(i)}\right), \qquad i = 1, \dots, n,$$
where x_{(1)} ≤ · · · ≤ x_{(n)} denote the ordered claim severities and u_i = (i − 0.5)/n is the quantile level.
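The QQ-plot coordinates can be computed as follows. This is an illustrative Python sketch, where `qf` stands for the fitted quantile function (the original computations were done in Matlab):

```python
import math

def qq_points(sample, qf):
    """Log-log QQ coordinates: (log fitted quantile, log ordered claim)
    at quantile levels u_i = (i - 0.5)/n."""
    xs = sorted(sample)
    n = len(xs)
    return [(math.log(qf((i - 0.5) / n)), math.log(x))
            for i, x in enumerate(xs, 1)]
```

As a sanity check, a sample generated exactly at the fitted quantile levels lands on the 45-degree line.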

Kolmogorov-Smirnov Test and Anderson-Darling Test
Besides the graphical visual check, multiple quantitative hypothesis tests can help compare model performance and specify an appropriate fitted parametric distribution. Here, the Kolmogorov-Smirnov (KS) test and the Anderson-Darling (AD) test are considered in the following model validation procedure.
For the KS test, the test statistic is
$$D = \sup_{d \le x < u}\left|\widehat{F}_{n}(x) - \widehat{F}_{*}(x)\right|,$$
where $\widehat{F}_{n}$ is the empirical cdf, $\widehat{F}_{*}$ is the fitted parametric cdf, d is the left truncation point (d = 0 if there is no truncation), and u is the right censoring point (u = ∞ if there is no censoring).
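A sketch of the KS statistic for LTRC data in Python (illustrative; censored observations, recorded at u, enter the empirical cdf through the sample size but are excluded from the sup over [d, u)):

```python
def ks_statistic(sample, Fstar, u):
    """KS distance sup_{d <= x < u} |F_n(x) - F*(x)|, evaluated on both
    sides of each jump of the empirical cdf F_n at noncensored points."""
    xs = sorted(sample)
    n = len(xs)
    D = 0.0
    for i, x in enumerate(xs, 1):
        if x >= u:          # censored: outside the sup range [d, u)
            continue
        Fx = Fstar(x)
        D = max(D, abs(i / n - Fx), abs((i - 1) / n - Fx))
    return D
```

With `Fstar` set to the fitted LTRC cdf of X*, this reproduces the sup-distance restricted to the observable range.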
The AD test is similar to the KS test but uses a different measure of the difference between the two distribution functions. The AD goodness-of-fit statistic measures the cumulative weighted quadratic distance (with more weight on the tails) between the empirical cdf and the parametrically estimated cdf:
$$AD^{2} = n\int_{d}^{u} \frac{\left[\widehat{F}_{n}(x)-\widehat{F}_{*}(x)\right]^{2}}{\widehat{F}_{*}(x)\left[1-\widehat{F}_{*}(x)\right]}\, d\widehat{F}_{*}(x).$$
The computational formula for LTRC data evaluates this integral at the unique noncensored data points.

Validation Methods
As we know, the smaller the value of a likelihood-based measure, the better the performance of a parametric loss model. However, how much difference in a likelihood-based measure really matters and separates the models? Suppose we set up a null hypothesis H_0: Model i is the parent candidate that fits a data set, and use BIC to select the candidate. (8), p. 403, defined ∆_i = BIC_i − BIC_min, the difference in BIC between model i and the model with the smallest BIC, and then used the range of ∆_i (see Table 4.1) to indicate the strength of evidence against model i as the parent distribution. From the definition, the value of BIC highly depends on the sample size n: the larger the sample size, the larger the scale of a BIC can be. Therefore, for the same difference ∆_i = 10, it is relatively easier to separate fitted models for a data set with sample size n = 1,000 than for a data set with n = 10,000. To avoid such an impact of sample size on model selection and validation, we report two types of probabilities instead of the pure BIC values (or their differences) in the following simulation study.
• Type 1 - the probability of each parent distribution being correctly selected. Inspired by (14), we fit all the candidate models, including the parent one, to an LTRC data set, and then choose the model with the lowest BIC value as the best fit. We repeat this process N times and calculate the proportion of times each candidate is chosen as the best, resulting in the probability of each candidate model being selected. The same process is followed for the other likelihood-based criteria, AIC and ICOMP, and the distance-based measures, the KS and AD statistics.
• Type 2 - the posterior model probabilities. Suppose we only have k models at hand and assume that, at the beginning, each model has the same chance (1/k) of being the underlying distribution.
Then the difference in likelihood can update the probability of a model being selected and provide insight about the best candidate model. As defined in (16) and (6), the posterior probability of model i is
$$P(\text{Model } i \mid \text{data}) = \frac{\exp(-\Delta_{i}/2)}{\sum_{j=1}^{k}\exp(-\Delta_{j}/2)}, \qquad \Delta_{i} = BIC_{i} - BIC_{\min}.$$
Here, the constant 1/2 in the exponent balances the multiplier 2 in the BIC definition, and the ratio structure reduces the impact of sample size on BIC comparison. The posterior probabilities for AIC and ICOMP can be calculated using the same formula; we just need to replace ∆ with the corresponding difference for those two criteria.
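The posterior model probabilities can be computed directly from the criterion values (an illustrative Python sketch; the same function applies unchanged to AIC or ICOMP differences):

```python
import math

def posterior_probs(criterion_values):
    """Posterior model probabilities from criterion differences:
    Delta_i = C_i - min_j C_j, w_i = exp(-Delta_i/2) / sum_j exp(-Delta_j/2)."""
    cmin = min(criterion_values)
    weights = [math.exp(-0.5 * (c - cmin)) for c in criterion_values]
    total = sum(weights)
    return [w / total for w in weights]
```

With k equal criterion values the uniform prior 1/k is recovered; larger BIC gaps push the posterior mass toward the best-fitting model.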

Simulation Study
In this section, we conduct an analysis of model uncertainty due to both the left truncation and right censoring structure and investigate the effects of various combinations of proportions on model selection for the above-mentioned loss models. We fix the deductible d and the policy limit u for all models and tune the parameter values of each model so that F(d) and F(u) are both pre-specified constants. The parameter values can be determined using percentile matching at the two percentile levels F(d) and F(u). The data are then generated according to those parameter values of an underlying distribution, and MLE is conducted to fit the data to each of the candidate models. Multiple model selection rules are implemented to specify the parent distribution of the data, and their performances are ultimately compared in terms of selection accuracy and likelihood probabilities.
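For the Weibull candidate, the percentile matching step has a closed form; below is an illustrative Python sketch (the other candidates can be handled analogously, numerically where no closed form exists):

```python
import math

def weibull_percentile_match(d, u, p_d, p_u):
    """Solve F(d) = p_d and F(u) = p_u for Weibull(theta, alpha),
    where F(x) = 1 - exp(-(x/theta)^alpha). The ratio of the two
    equations yields alpha; either equation then yields theta."""
    a = -math.log(1.0 - p_d)          # equals (d/theta)^alpha
    b = -math.log(1.0 - p_u)          # equals (u/theta)^alpha
    alpha = math.log(b / a) / math.log(u / d)
    theta = d / a ** (1.0 / alpha)
    return theta, alpha

# One of the paper's scenarios: d = 500, u = 10000, F(d) = 0.1, F(u) = 0.8
theta, alpha = weibull_percentile_match(500.0, 10000.0, 0.1, 0.8)
```

The returned pair reproduces the two prescribed percentile levels exactly, which is what makes F(d) and F(u) controllable design constants in the simulation.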
The simulation study is designed with the following setups. The choice of sample size, truncation and censoring thresholds, and truncation and censoring probabilities depends on practical considerations and research objectives. A sample size of 1,000 can be considered moderate for many statistical studies; it is often chosen due to constraints on resources and can provide a reasonable level of statistical power to detect meaningful effects. A sample size of 100,000 is quite large and may be chosen for studies where researchers need to achieve a very high level of statistical power, detect small effect sizes, and adequately capture rare events. A deductible of 500 and a policy limit of 10,000 are very common in the insurance industry, which provides a practical foundation for our choice of a truncation threshold of 500 and a censoring threshold of 10,000. A truncation probability of 10% is relatively low in insurance, meaning not much information loss due to truncation, but it is useful for comparison with another scenario of more information loss under a higher truncation probability. A truncation probability of 50% is high in theory but quite ordinary in insurance practice, owing to the many loss events with magnitudes below deductibles. Censoring probabilities of 15% and 20% both imply that a substantial portion of the subjects experience censoring during the study; these could be chosen based on prior knowledge or to simulate realistic scenarios. The choice of these parameters reflects a balance between theoretical rigor, practical considerations, and the objectives of the study. Tables 5.1-5.4 demonstrate how often the underlying distribution of LTRC data can be correctly identified under various combinations of left-truncation and right-censoring proportions, and how the sample size affects correct identification. The selection probability of a model under each indicator (KS, AD, AIC, BIC, and ICOMP) is recorded as the first value in each cell, and the likelihood-based approaches also report posterior probabilities in an additional parenthesis, where the former number is the mean of 100 simulations and the latter is the median. The computation is run through Matlab.
In the worst-case scenario of F(d) = 0.5 and F(u) = 0.8 in Table 5.3, we only use 30% of the middle values to estimate the parameters and identify the parent distributions. As a result, the true underlying model is generally not distinguishable from the other candidates when n = 1,000. It is clear that AIC and BIC always tend to pick the Fisk model as the best fit no matter what the parent model is, except when Weibull is the parent distribution. In particular, if the data are generated from the Paralogistic distribution, there is almost no chance for AIC and BIC to identify it, since the probabilities that the Paralogistic model is chosen as the best candidate are 0% and 2%, respectively. Meanwhile, KS and AD prefer to select Weibull, but this tendency is overall not as strong as that of the likelihood-based measures AIC and BIC to select Fisk. These outcomes give rise to a scenario of model uncertainty, wherein posterior probabilities appear to fluctuate equally around 20%, providing no signal for model preference. As the sample size is increased to 100,000 in Table 5.4, all measures except ICOMP work reasonably well for separating models. The correct selection rate, especially for Paralogistic data, has moved up from 0% to 66% for the likelihood-based measures, although there is still an 8% gap to the distance-based indicators.
With a 500 deductible (d = 500), 1890 LGPIF (Local Government Property Insurance Fund) loss payments were recorded in the portfolio, and the histogram of losses is displayed in Figure 6.1. It is clear that the loss values resemble a heavy-tailed distribution even after the logarithmic transformation. The above-mentioned Fisk, Frechet, Lognormal, Lomax, Paralogistic, and Weibull distributions could be good candidates to investigate the uncertainty of claim payments for this insurance fund. The fund covers seven different types of property. Due to the various ranges of loss magnitude, we set a flexible policy limit for each individual type, stopping at the 80th percentile of the loss (that is, F(u) = 0.8). The estimated parameters and the model selection indicators are recorded in Table 6.1. All values of AD, AIC, and BIC are the lowest for the Frechet model. We also present plots of the fitted-versus-observed quantiles for the Fisk, Frechet, Lognormal, Lomax, Paralogistic, and Weibull distributions in Figure 6.2. Clearly, the Frechet-estimated quantiles fall almost perfectly on the 45° line against the empirical quantiles. On the other hand, the Fisk, Lognormal, Lomax, Paralogistic, and Weibull QQ plots do not look as good, and having bottom observations above the 45° line indicates that these models underestimate the right tail of the data.

Conclusion and Discussion
We have presented our research on model uncertainty and selection in the modeling of insurance loss data using left-truncated and right-censored (LTRC) distributions. We have considered six candidate distributions. A variety of model validation methods, including Quantile-Quantile (QQ) plots and the Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) statistics, are employed. These tools are critical for assessing how well each distribution fits the data. Model selection adopts information criteria such as the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), and Information Complexity (ICOMP). These criteria help us choose the most suitable model while penalizing model complexity.
A simulation study has been conducted to evaluate the performance of each model validation method and model selection criterion under different scenarios of left truncation and right censoring. This is achieved by calculating both the selection probability and the posterior probability for each candidate model. Uncertainty has been observed in the patterns regarding which model validation method or selection criterion outperforms the others, depending on many factors such as sample size, the underlying true distribution, the left truncation proportion, and the right censoring proportion.
In the simulation study, we have noticed that multiple models may be difficult to distinguish, and various tests and/or criteria may lead to the selection of different models as the best. It seems that distance-based criteria tend to select the Weibull model and likelihood-based criteria prefer the Fisk model among the considered candidates. In such cases, model averaging can be considered.
Model averaging can provide more robust estimates, especially when there is uncertainty in identifying the true underlying distribution.
Analyzing actual local government property loss data in Wisconsin serves as a practical application of our modeling methodology. Based on the outputs of the model selection indicators and the QQ plots, we identify the Frechet distribution as the best fit to the LGPIF data, which is a significant finding, especially given its relevance for modeling extreme events.
In summary, our study combines theoretical and empirical approaches to analyze LTRC insurance loss data. The use of various statistical tools and criteria helps ensure the reliability of our modeling results, which exhibit a certain degree of uncertainty in the model selection procedure, illustrated through both simulation and case studies. Through our research in this paper, we would like to give the following recommendations regarding uncertainty in the model selection procedure, using our selection probability framework. If KS and AD provide strong evidence for selecting a best candidate model, then it is very likely that AIC and BIC will come up with the same decision. When KS and AD can only provide fairly weak evidence for the selection of a best model, we may consider further use of AIC and BIC to reinforce the decision, provided AIC/BIC select the same best model as KS/AD. Otherwise, when the use of AIC and BIC leads to the selection of a different best model from the use of KS and AD, we may consider model averaging of two or more competing best models.
Our comprehensive study of modeling LTRC insurance loss data addresses various aspects of model uncertainty and selection, but there are also areas where some limitations and potential avenues for future research can be discussed.
There are some limitations to the assumptions about candidate distributions. Our study considers six candidate distributions for modeling insurance loss data; however, these may not capture all possible underlying distributions. There are also limitations to the sensitivity analysis. We have already mentioned that the performance of model validation methods and selection criteria varies with factors such as sample size and the left truncation and right censoring ratios. It would be useful to further quantify specific sensitivity levels. This information can help practitioners better understand the applicability of our methodology in various real-world scenarios.
In addition to exploring other distributions and conducting more quantitative sensitivity analysis, future research avenues could include, but are not limited to, robust model averaging, out-of-sample testing, incorporation of expert opinions, quantification of model uncertainty, and applications in other areas. First, we mention that model averaging can provide more robust estimates when there is uncertainty in identifying the true underlying distribution. Future research could delve deeper into methods and techniques for robust model averaging, exploring different weighting schemes or Bayesian model averaging. Second, while we have conducted simulation studies and applied our methods to real data, future research could focus on out-of-sample testing to evaluate how well the selected models generalize to new, unseen data. Third, incorporating expert opinions or domain knowledge into the model selection process can improve the relevance and accuracy of the selected distributions in practical applications. Fourth, it seems interesting to develop techniques that more explicitly quantify and visualize model uncertainty, which can help decision-makers understand the range of potential outcomes based on different candidate distributions. Finally, we can explore how our modeling approach can be applied to areas beyond insurance loss data, such as finance, healthcare, or environmental modeling.
Incorporating these limitations and future research considerations will help enrich our work and provide a more complete understanding of the challenges and opportunities of modeling insurance loss data using LTRC distributions.It will also enhance the practical applicability of our findings.

Figure 6.1: Loss data of LGPIF when the deductible is 500

Figure 6.2: Q-Q plot of the fitted models to the real data example.
4 Model Selection Criteria and Validation Methods

4.1 Selection Criteria: AIC, BIC, and ICOMP
The Akaike Information Criterion (AIC) is a statistical measure that can be used to assess and compare the goodness-of-fit and complexity of different statistical models. AIC is a useful tool for comparing different models in the context of model selection and is relevant when dealing with multiple candidate models. Akaike (1973) provided the formula
$$AIC = -2\log f\left(\mathbf{x}_{n} \mid \hat{\theta}_{p}\right) + 2p,$$
where p is the number of parameters. The Bayesian Information Criterion (BIC) is another statistical measure used to assess the quality of the goodness-of-fit and to select models. Schwarz (1978) provided the formula
$$BIC = -2\log f\left(\mathbf{x}_{n} \mid \hat{\theta}_{p}\right) + p\log(n).$$
Similar to AIC, the model with the lowest BIC value is considered the most appropriate choice. It balances the trade-off between accurately capturing the data and employing a concise number of model parameters. Moreover, similar to AIC and BIC, (3, 4) introduced another metric, Information Complexity (ICOMP), whose value is given by
$$ICOMP = -2\log f\left(\mathbf{x}_{n} \mid \hat{\theta}_{p}\right) + s\log\!\left(\frac{tr(\Sigma)}{s}\right) - \log \det(\Sigma),$$
with s, tr, and det denoting the rank, trace, and determinant of Σ (the variance-covariance matrix), respectively.
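The three criteria can be computed side by side. An illustrative Python sketch follows (the ICOMP form shown is Bozdogan's C1-complexity version written out for a two-parameter model, matching the two-parameter candidates considered here):

```python
import math

def aic(loglik, p):
    """AIC = -2 log L + 2p."""
    return -2.0 * loglik + 2.0 * p

def bic(loglik, p, n):
    """BIC = -2 log L + p log(n)."""
    return -2.0 * loglik + p * math.log(n)

def icomp(loglik, cov):
    """ICOMP = -2 log L + s*log(tr(Sigma)/s) - log det(Sigma),
    for a full-rank 2x2 variance-covariance matrix Sigma (s = 2)."""
    s = 2.0
    tr = cov[0][0] + cov[1][1]
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    return -2.0 * loglik + s * math.log(tr / s) - math.log(det)
```

Note that ICOMP grows as the parameter estimators become more correlated, e.g. `icomp(-100.0, [[1, 0.9], [0.9, 1]])` exceeds `icomp(-100.0, [[1, 0], [0, 1]])`, which is exactly the interdependency penalty described above.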


Table 4.1: Strength of evidence to take against model i.

Probability of selecting a model under various parent data distributions. ** The likelihood-based approaches have additional posterior probabilities in parentheses (a, b), where a represents the mean and b the median.

AIC and BIC outperform KS and AD in identifying the correct distribution for Fisk and Lomax data, whereas the opposite performance is observed for Weibull data. In particular, when all five candidate distributions are used to fit Weibull data, KS and AD have 59% and 50% chances, respectively, of correctly selecting the underlying Weibull distribution, while AIC and BIC only have around a 20% to 23% correct selection rate. Meanwhile, ICOMP always specifies the Lognormal model no matter what the underlying distribution is; thus, it is not a reliable selection criterion in this situation. Besides, selection probabilities and posterior probabilities are generally consistent in indicating the best fit; that is, when a selection probability is large enough to specify a candidate model, the corresponding posterior probability moves to a significantly higher value than the benchmark of 20%.
Note: θ and α are estimates based on the formulas in Section 2. For Lognormal distribution, θ represents μ and α represents σ.