Article

Model Misspecification and Data-Driven Model Ranking Approach for Insurance Loss and Claims Data

1 Department of Statistics, M.M.V, Banaras Hindu University, Varanasi 221005, India
2 Department of Mathematical Sciences, Bentley University, Waltham, MA 02452, USA
* Authors to whom correspondence should be addressed.
Risks 2025, 13(12), 231; https://doi.org/10.3390/risks13120231 (registering DOI)
Submission received: 30 August 2025 / Revised: 4 November 2025 / Accepted: 18 November 2025 / Published: 28 November 2025
(This article belongs to the Special Issue Financial Risk, Actuarial Science, and Applications of AI Techniques)

Abstract

Statistical models are crucial in analyzing insurance loss and claims data, offering insights into various risk elements. The prevailing statistical notion that “all models are wrong, but some are useful” can wield significant influence, particularly when multiple competing statistical models are considered. This becomes particularly pertinent when all models portray similar characteristics within specific subsets of the support of the random variable under scrutiny. Since the actual model is unknown in practical scenarios, the challenge of model selection becomes daunting, complicating the study of associated characteristics of the actual data generation process. To address these challenges, the concept of model averaging is embraced. Often, averaging over multiple models helps alleviate the risk of model misspecification, as different models may capture distinct aspects of the data or modeling assumptions. This enhances the robustness of the estimation process, yielding a more accurate and reasonable estimate compared to relying solely on a single model. This paper introduces two novel data-based model selection methods—one using the likelihood function and the other using the density power divergence measure. The study focuses on estimating the Value-at-Risk (VaR) for non-life insurance claim size data, providing comprehensive insights into potential losses for insurers. The performance of the proposed procedures is evaluated through Monte Carlo simulations under both uncontaminated conditions and in the presence of data contamination. Additionally, the applicability of the methods is illustrated using two real non-life insurance datasets, with the VaR values estimated at different confidence levels.

1. Introduction

In the field of risk and insurance, probability theory and statistical models play important roles in various aspects, such as determining insurance pricing, assessing financial risks, and understanding claim size, claim count, and aggregate-loss distributions (Bahnemann 1996). Due to the stochastic nature of insurance data, the underlying data generation mechanism is always unknown, and choosing a suitable probability model to fit insurance data is a challenging problem. The statistical literature has considered different approaches to handling model uncertainty in such cases. For instance, one may use a statistical model with a larger number of parameters to increase flexibility and then test the significance of the added parameters. Another approach is to impose prior probability distributions on the model parameters in a Bayesian paradigm. While these methods offer benefits and advantages, they also come with their own set of drawbacks. For instance, increasing the number of model parameters amplifies model complexity and presents challenges in parameter estimation. Additionally, the roles of these added parameters in determining the distribution’s location, scale, and shape may become ambiguous. Furthermore, misspecification of the underlying probability model in fitting insurance data may lead to a substantial loss of efficiency in statistical analysis, inaccurate conclusions, and improper decisions. Therefore, it is crucial to identify the best-fit model from a pool of candidate models and understand the consequences of model misspecification in analyzing insurance risk data. This understanding significantly overcomes the negative impact of model misspecification and leads to feasible estimates of the characteristics under study. An interesting insight into the disparity arising from incorrect model identification followed by misspecified inferences can be found in Gustafson (2001).
The consequences of model misspecification have been explored across different domains in the statistical literature. Dumonceaux and Antle (1973) were the first to discuss the effects of model misspecification, in the form of a statistical hypothesis test problem, for analyzing the lifetimes of ball bearings using two competing probability models. Several studies have been dedicated to model discrimination and misspecification themes between two competing models. An interesting application analyzing electromigration failure in the interconnects of integrated circuits was presented by Basavalingappa et al. (2017). Besides such experiments being conducted in accelerated settings, followed by transforming the observed failures to represent lifetimes in regular conditions, the occurrence of failures in ICs is also sporadic. Another study in the context of a rare event, i.e., modeling high magnitude earthquakes in the Himalayan regions and the subsequent effects of model misspecification, can be found in Pasari (2018). The problem of model misspecification has also been explored for censored-sample cases (Block and Leemis 2008; Dey and Kundu 2012). Some other intriguing directions of model misspecification include the analysis of strength distribution for brittle materials by Basu et al. (2009) and one-shot device test data by Ling and Balakrishnan (2017). A comprehensive investigation into the impact of misspecification of statistical models in insurance and risk management, particularly concerning censored, truncated, and outlier-laden insurance loss and claims data, is of practical interest. Following the insights of scholars such as White (1982), Chow (1984), and Fomby and Hill (2003), we examine the misspecification of statistical models and investigate the behavior of point estimators for key risk measures.
Before conducting statistical inference (such as estimation, prediction, hypothesis testing, etc.) on insurance data, it is often assumed that the data originate from a specific parametric probability model on which the analysis is based. A vast array of probability distributions is available in the literature for fitting insurance data. For instance, distributions such as gamma, log-gamma, lognormal, and Weibull are commonly employed to model insurance claim size, while Poisson, negative binomial, and Delaporte distributions are often used to model insurance claim counts (Brazauskas and Kleefeld 2011; Hewitt and Lefkowitz 1979; Rolski et al. 1999; Zelinková 2015). However, in practical applications, when employing a statistical distribution for data analysis, the underlying probability distributions of various risk elements and the stochastic mechanisms that generate loss and claims data are often unknown. Additionally, the probability model may be misspecified, and the insurance datasets may be contaminated or contain outliers.
Broadly, an insurance company encounters two types of fund claim variables. One is the extent of the payout, in monetary terms, commonly referred to as the claim severity (size) variable, and the other is the variable representing the number of claims (also known as claim frequency) arising in a given period (Loss Data Analytics Core Team 2020). In this study, our interest lies in exploring the risks associated with the amount paid for the claim, commonly referred to as the claim size. An important characteristic of a claim severity model is its ability to represent the worst potential loss due to adverse events such as natural disasters, accidents, or other unforeseen circumstances that an insurer might experience over a specified duration at a certain confidence level $\delta \in (0, 1)$, commonly referred to as Value-at-Risk (VaR) (Jorion 2006, chap. 5; Christoffersen 2012). Thus, VaR helps assess the potential financial impact of extreme but plausible events over a specific time horizon. By quantifying the maximum expected loss within a certain confidence level (e.g., 95% or 99%), insurers can better understand their exposure to risk and make informed decisions regarding capital reserves, reinsurance arrangements, and pricing strategies. Let $X$ be the random variable representing the loss size. For a pre-specified value $\delta \in (0, 1)$ (e.g., $\delta = 0.95$), a general definition of VaR with confidence level $100\delta\%$ is $\Pr(X > \mathrm{VaR}) \le 1 - \delta$. Thus, for a confidence level of $100\delta\%$, the VaR can be interpreted as the threshold loss such that the probability of incurring a loss greater than this amount is at most $1 - \delta$. If the loss variable $X$ has CDF $F_{\theta}(\cdot)$ with parameter vector $\theta$, then its quantile function is denoted by $Q_{\theta}(\delta) = F_{\theta}^{-1}(\delta)$ (Tse 2023). Therefore, the VaR with confidence level $100\delta\%$ corresponds to the $100\delta\%$ quantile of the loss distribution, i.e., $\mathrm{VaR} = Q_{\theta}(\delta)$, for a given time horizon.
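As a minimal sketch of this definition, the following Python snippet recovers the VaR numerically as the smallest loss whose CDF value reaches $\delta$, and then checks that the exceedance probability equals $1 - \delta$. The Weibull CDF (of the form given in Section 2.6), the parameter values, and the function names are illustrative assumptions, not part of the original study.

```python
import math

def weibull_cdf(x, shape, scale):
    """Weibull CDF: F(x) = 1 - exp(-(x / scale)^shape) for x > 0, else 0."""
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0

def var_by_bisection(cdf, delta, lo=0.0, hi=1.0, tol=1e-10):
    """Approximate the delta-quantile of a continuous CDF by bisection."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    while cdf(hi) < delta:          # widen the bracket until it contains the quantile
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < delta:
            lo = mid
        else:
            hi = mid
    return hi

# Hypothetical parameter values, chosen only for illustration.
shape, scale, delta = 1.5, 1000.0, 0.95
var_95 = var_by_bisection(lambda x: weibull_cdf(x, shape, scale), delta)
exceedance = 1.0 - weibull_cdf(var_95, shape, scale)        # should be close to 1 - delta
closed_form = scale * (-math.log(1.0 - delta)) ** (1.0 / shape)
print(f"numerical VaR = {var_95:.4f}, closed form = {closed_form:.4f}")
```

When a closed-form quantile function is available, as it is for all six models considered in this paper, the bisection step is unnecessary; it is shown only to make the link between the CDF and the VaR explicit.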
In the context of the importance of selecting a suitable distribution for analyzing insurance data, Yu (2020) considered six parametric distributions for modeling and analyzing operational loss data related to external fraud types in retail banking branches of major commercial banks in China from 2009 to 2015. They observed that different distributional assumptions yielded significantly different VaR estimates, leading to inconsistent conclusions. As emphasized by Wang et al. (2020), selecting an appropriate claim severity distribution is crucial, especially in the presence of extreme values. Hogg and Klugman (1984) investigated claim severities in non-life insurance, such as those resulting from hurricane damage or automobile accidents, and noted their tendency to exhibit a heavy-tailed skewed nature. Various heavy-tailed as well as regular models, including gamma, lognormal, log-gamma, Weibull, Pareto, and log-logistic, have been employed in modeling non-life insurance data (see, for example, Kleiber and Kotz 2003; Klugman et al. 2012, and the references cited therein). Additionally, when claim sizes are large, settlements are often delayed due to case-by-case handling, necessitating consideration of the role of inflation, which can contaminate claim sizes (Hogg and Klugman 1984; Kaas et al. 2008). Hence, a robust estimation procedure is required to analyze claim severities.
In this paper, we aim to examine the effects of model misspecification on modeling insurance data. Since the underlying data-generating mechanism of insurance data is often unknown, and such data frequently contain influential observations and/or outliers, we propose two robust weighted estimation criteria based on likelihood and divergence measures. We then compare the performance of the proposed methods through Monte Carlo simulation. Our goal is to advance the understanding of model fitting and analysis for insurance and risk data, as well as the related inferential methods, and help practicing actuaries and academic actuaries better understand the effects and consequences of model misspecification when model uncertainty is present. With an emphasis on practical implementation and pedagogical clarity, this paper provides accessible and robust tools for practitioners and educators in actuarial science and risk analysis.
The rest of this paper is organized as follows. Section 2 provides an overview of the probability models used in this study, along with a brief review of their applications for non-life insurance claim size data. In Section 3, we discuss the estimation of model parameters and VaR using the likelihood method and density power divergence (DPD) method. Section 4 introduces the rationale of the proposed data-driven model ranking and selection approach based on likelihood and DPD measures, followed by a Monte Carlo simulation study in Section 5 that details the performance of the proposed procedures, considering both the presence and absence of data contamination. Section 6 presents two real data analyses to illustrate the proposed model ranking and selection method developed in this work. Finally, in Section 8, we provide concluding remarks and directions for future work.

2. Probability Models for Claim Severity

In this section, we consider six commonly used probability models in insurance and finance, namely the Fisk, Fréchet, Lomax, lognormal, paralogistic, and Weibull distributions, to model the severity of claims for non-life insurance data. For notational convenience, we denote these six models as $M_j$, $j = 1, \ldots, 6$, in the remainder of the article, where $M_1$, $M_2$, $M_3$, $M_4$, $M_5$, and $M_6$ represent the Fisk, Fréchet, Lomax, lognormal, paralogistic, and Weibull distributions, respectively. The corresponding probability density functions (PDFs), cumulative distribution functions (CDFs), and VaR values of model $M_j$ with parameter vectors $\theta_j$ are denoted as $f_{\theta_j}(\cdot)$, $F_{\theta_j}(\cdot)$, and $Q_{\theta_j}(\delta)$, $\delta \in (0, 1)$ ($j = 1, \ldots, 6$), respectively. Although we primarily considered six probability distributions in this paper, the methodologies developed can be applied to any candidate models with different probability distributions. Moreover, the methodologies can be extended to candidate statistical models with different numbers of parameters by suitably penalizing models with more parameters based on the practitioner’s preference and the parsimony principle. Additionally, the methodologies discussed here can be extended directly to truncated probability distributions, which are often more appropriate for claims subject to deductibles or policy limits.

2.1. Fisk Distribution

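The Fisk distribution was originally proposed by Fisk (1961) as a limiting form of the Champernowne distribution (Champernowne 1952) in connection with modeling income. A random variable $X$ following a Fisk distribution (denoted as $\mathrm{FS}(\beta_1, \lambda_1)$) has PDF, CDF, and VaR $Q_{\theta_1}(\delta)$
$$f_{\theta_1}(x) = \begin{cases} \dfrac{\beta_1 \lambda_1^{\beta_1} x^{\beta_1 - 1}}{\left(x^{\beta_1} + \lambda_1^{\beta_1}\right)^2}, & \text{for } x > 0; \\ 0, & \text{otherwise}, \end{cases} \tag{1}$$
$$F_{\theta_1}(x) = \begin{cases} \dfrac{x^{\beta_1}}{x^{\beta_1} + \lambda_1^{\beta_1}}, & \text{for } x > 0; \\ 0, & \text{otherwise}, \end{cases} \tag{2}$$
$$Q_{\theta_1}(\delta) = \begin{cases} \lambda_1 \left(\dfrac{\delta}{1 - \delta}\right)^{1/\beta_1}, & \text{for } 0 < \delta < 1; \\ 0, & \text{otherwise}, \end{cases} \tag{3}$$
respectively, where $\theta_1 = (\beta_1, \lambda_1)$ is the parameter vector with $\beta_1 > 0$ as the shape parameter and $\lambda_1 > 0$ representing the median of the distribution.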
The Fisk distribution is also known as the log-logistic distribution because it is the distribution of a random variable whose logarithm follows a logistic distribution. The Fisk distribution offers insight into the distribution of income sizes, derived from the reasoning of income generation processes (Singh and Maddala 1976). In the context of income inequality, the Fisk distribution provides a mathematical representation of the unequal distribution of income. For more details on income distribution and income inequality, the readers may refer to Kakwani (1980) and Kakwani and Son (2022). Buch-Larsen et al. (2005) discussed the suitability of the Fisk distribution to model non-life insurance data since it is capable of accommodating small as well as large claim sizes, owing to its heavy-tailed nature, which is otherwise a challenge for regular models.
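The two properties mentioned above (the scale parameter is the median, and the logarithm of a Fisk variable is logistic) can be checked numerically. The sketch below is illustrative only: the parameter values and function names are assumptions, and the logistic distribution is parameterized by location $\ln \lambda_1$ and scale $1/\beta_1$.

```python
import math

def fisk_cdf(x, beta, lam):
    """Fisk (log-logistic) CDF with shape beta and scale (median) lam."""
    if x <= 0:
        return 0.0
    return x**beta / (x**beta + lam**beta)

def fisk_var(delta, beta, lam):
    """Fisk quantile function, i.e., the VaR at confidence level delta."""
    return lam * (delta / (1.0 - delta)) ** (1.0 / beta)

def logistic_cdf(y, mu, s):
    """CDF of a logistic distribution with location mu and scale s."""
    return 1.0 / (1.0 + math.exp(-(y - mu) / s))

beta, lam = 2.5, 1200.0            # hypothetical parameter values
median = fisk_var(0.5, beta, lam)  # equals lam by construction

# If ln X is logistic(ln lam, 1 / beta), then X is Fisk(beta, lam).
x = 3000.0
direct = fisk_cdf(x, beta, lam)
via_logistic = logistic_cdf(math.log(x), math.log(lam), 1.0 / beta)
print(median, direct, via_logistic)
```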

2.2. Fréchet Distribution

The Fréchet distribution, named after mathematician Maurice René Fréchet, is a probability distribution derived as the limiting distribution of extreme values. It is also known as the inverse Weibull distribution, which has various applications in economics, hydrology, and meteorology to model extreme events or phenomena. It is often employed to describe the distribution of extreme values in a dataset, such as maximum wind speeds, flood levels, or income levels (Kotz and Nadarajah 2000).
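A random variable $X$ follows a Fréchet distribution (denoted as $\mathrm{FR}(\beta_2, \lambda_2)$) with PDF, CDF, and VaR $Q_{\theta_2}(\delta)$:
$$f_{\theta_2}(x) = \begin{cases} \beta_2 \lambda_2^{\beta_2} x^{-(\beta_2 + 1)} e^{-(x/\lambda_2)^{-\beta_2}}, & \text{for } x > 0; \\ 0, & \text{otherwise}, \end{cases} \tag{4}$$
$$F_{\theta_2}(x) = \begin{cases} e^{-(x/\lambda_2)^{-\beta_2}}, & \text{for } x > 0; \\ 0, & \text{otherwise}, \end{cases} \tag{5}$$
$$Q_{\theta_2}(\delta) = \begin{cases} \lambda_2 (-\ln \delta)^{-1/\beta_2}, & \text{for } 0 < \delta < 1; \\ 0, & \text{otherwise}, \end{cases} \tag{6}$$
respectively, where $\theta_2 = (\beta_2, \lambda_2)$ is a parameter vector, $\lambda_2 > 0$ is the scale parameter, and $\beta_2 > 0$ is the shape parameter.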
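The "inverse Weibull" name can be made concrete: if $Y$ is Weibull with shape $\beta_2$ and scale $1/\lambda_2$, then $1/Y$ has the Fréchet CDF above, so an upper Fréchet quantile is the reciprocal of a lower Weibull quantile. The following sketch verifies this identity; the parameter values and function names are illustrative assumptions.

```python
import math

def frechet_cdf(x, beta, lam):
    """Frechet CDF: exp(-(x / lam)^(-beta)) for x > 0, else 0."""
    return math.exp(-((x / lam) ** (-beta))) if x > 0 else 0.0

def frechet_var(delta, beta, lam):
    """Frechet quantile function (VaR at confidence level delta)."""
    return lam * (-math.log(delta)) ** (-1.0 / beta)

def weibull_var(delta, beta, lam):
    """Weibull quantile function with shape beta and scale lam."""
    return lam * (-math.log(1.0 - delta)) ** (1.0 / beta)

beta, lam, delta = 2.0, 500.0, 0.99    # hypothetical values

var_frechet = frechet_var(delta, beta, lam)
# Reciprocal of the (1 - delta)-quantile of a Weibull with scale 1 / lam.
var_from_weibull = 1.0 / weibull_var(1.0 - delta, beta, 1.0 / lam)
print(var_frechet, var_from_weibull)
```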

2.3. Lomax Distribution

The Lomax distribution, introduced by K. S. Lomax (1954), is commonly employed in modeling income distributions, insurance claim sizes, and other phenomena where observations are bounded from below and exhibit heavy-tailed behavior. This distribution offers valuable insights into extreme events and tail behavior, making it a useful tool in various analytical and modeling contexts. A random variable $X$ has a Lomax distribution (denoted as $\mathrm{LM}(\beta_3, \lambda_3)$) with PDF, CDF, and VaR $Q_{\theta_3}(\delta)$:
$$f_{\theta_3}(x) = \begin{cases} \beta_3 \lambda_3^{\beta_3} (x + \lambda_3)^{-(\beta_3 + 1)}, & \text{for } x > 0; \\ 0, & \text{otherwise}, \end{cases} \tag{7}$$
$$F_{\theta_3}(x) = \begin{cases} 1 - \left(\dfrac{\lambda_3}{x + \lambda_3}\right)^{\beta_3}, & \text{for } x > 0; \\ 0, & \text{otherwise}, \end{cases} \tag{8}$$
$$Q_{\theta_3}(\delta) = \begin{cases} \lambda_3 \left[(1 - \delta)^{-1/\beta_3} - 1\right], & \text{for } 0 < \delta < 1; \\ 0, & \text{otherwise}, \end{cases} \tag{9}$$
respectively, where $\theta_3 = (\beta_3, \lambda_3)$ is a parameter vector, $\lambda_3 > 0$ is the scale parameter, and $\beta_3 > 0$ is the shape parameter.

2.4. Lognormal Distribution

The lognormal distribution is often used to model phenomena where the logarithm of the variable of interest is normally distributed, such as the distribution of stock prices, income levels, or the size of biological populations. Mathematically, if a random variable Z = ln X is normally distributed, then the distribution of X is said to be lognormal. It is particularly useful when dealing with variables that are inherently positive and skewed, as the lognormal distribution naturally accommodates these characteristics. The lognormal distribution is also known as the Galton distribution. For more details on lognormal distribution, one may refer to Chapter 14 of Johnson et al. (1994), the book by Crow and Shimizu (1988), and the references therein.
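The PDF, CDF, and VaR $Q_{\theta_4}(\delta)$ of a lognormal variate $X$ (denoted as $\mathrm{LN}(\beta_4, \lambda_4)$) are
$$f_{\theta_4}(x) = \begin{cases} \dfrac{1}{\sqrt{2\pi}\, \beta_4 x} \exp\left\{-\dfrac{(\ln x - \lambda_4)^2}{2 \beta_4^2}\right\}, & \text{for } x > 0; \\ 0, & \text{otherwise}, \end{cases} \tag{10}$$
$$F_{\theta_4}(x) = \begin{cases} \Phi\left(\dfrac{\ln x - \lambda_4}{\beta_4}\right), & \text{for } x > 0; \\ 0, & \text{otherwise}, \end{cases} \tag{11}$$
$$Q_{\theta_4}(\delta) = \begin{cases} \exp\left\{\lambda_4 + \beta_4 \Phi^{-1}(\delta)\right\}, & \text{for } 0 < \delta < 1; \\ 0, & \text{otherwise}, \end{cases} \tag{12}$$
respectively, where $\theta_4 = (\beta_4, \lambda_4)$ is a parameter vector with $\exp(\lambda_4)$ as the scale parameter and $\beta_4 > 0$ as the shape parameter, and $\Phi(\cdot)$ and $\Phi^{-1}(\cdot)$ denote the CDF and inverse CDF (quantile function) of the standard normal distribution.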

2.5. Paralogistic Distribution

The paralogistic distribution is a statistical distribution used primarily in econometrics, finance, and risk management, which is characterized by its flexibility in modeling skewed or heavy-tailed data. As demonstrated by Kleiber and Kotz (2003), the paralogistic distribution can be applied to economic modeling of wealth and income. The paralogistic distribution allows for asymmetry and heavier tails, which provides flexibility in capturing the tail behavior of the data.
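A random variable $X$ has a paralogistic distribution (denoted as $\mathrm{PL}(\beta_5, \lambda_5)$) with PDF, CDF, and VaR $Q_{\theta_5}(\delta)$:
$$f_{\theta_5}(x) = \begin{cases} \dfrac{\beta_5^2 (x/\lambda_5)^{\beta_5}}{x \left[1 + (x/\lambda_5)^{\beta_5}\right]^{\beta_5 + 1}}, & \text{for } x > 0; \\ 0, & \text{otherwise}, \end{cases} \tag{13}$$
$$F_{\theta_5}(x) = \begin{cases} 1 - \left[1 + (x/\lambda_5)^{\beta_5}\right]^{-\beta_5}, & \text{for } x > 0; \\ 0, & \text{otherwise}, \end{cases} \tag{14}$$
$$Q_{\theta_5}(\delta) = \begin{cases} \lambda_5 \left[(1 - \delta)^{-1/\beta_5} - 1\right]^{1/\beta_5}, & \text{for } 0 < \delta < 1; \\ 0, & \text{otherwise}, \end{cases} \tag{15}$$
respectively, where $\theta_5 = (\beta_5, \lambda_5)$ is a parameter vector, $\lambda_5 > 0$ is the scale parameter, and $\beta_5 > 0$ is the shape parameter.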

2.6. Weibull Distribution

The Weibull distribution has been used for analyzing data from different domains, such as life-testing experiments, inter-arrival times of earthquakes, wind speed data across different seasons and years, lead time demand in inventory control, water pollution levels, and the size of non-life insurance claims (Abubakar and Sabri 2023; Bowden et al. 1983; Das and Nath 2022; Adeleke and Ibiwoye 2011; Kreer et al. 2015; Luko 1999; Smith 2003; Tadikamalla 1978; Wais 2017). The theory of extreme values shows that the Weibull distribution can be used to model the minimum of many independent positive random variables from a specific class of statistical distributions. For detailed references on the theory and applications of the Weibull distribution, one may refer to Chapter 21 of Johnson et al. (1994) and the book by Rinne (2008).
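A random variable $X$ is said to follow a Weibull distribution (denoted as $\mathrm{WE}(\beta_6, \lambda_6)$) if its PDF, CDF, and VaR $Q_{\theta_6}(\delta)$ are
$$f_{\theta_6}(x) = \begin{cases} \dfrac{\beta_6}{\lambda_6} \left(\dfrac{x}{\lambda_6}\right)^{\beta_6 - 1} \exp\left\{-(x/\lambda_6)^{\beta_6}\right\}, & \text{for } x > 0; \\ 0, & \text{otherwise}, \end{cases} \tag{16}$$
$$F_{\theta_6}(x) = \begin{cases} 1 - \exp\left\{-(x/\lambda_6)^{\beta_6}\right\}, & \text{for } x > 0; \\ 0, & \text{otherwise}, \end{cases} \tag{17}$$
$$Q_{\theta_6}(\delta) = \begin{cases} \lambda_6 \left(-\ln(1 - \delta)\right)^{1/\beta_6}, & \text{for } 0 < \delta < 1; \\ 0, & \text{otherwise}, \end{cases} \tag{18}$$
respectively, where $\theta_6 = (\beta_6, \lambda_6)$ is a parameter vector, $\lambda_6 > 0$ is the scale parameter, and $\beta_6 > 0$ is the shape parameter.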
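For reference, the six CDF and quantile pairs of this section can be collected in one place and sanity-checked by confirming that $F_{\theta_j}(Q_{\theta_j}(\delta)) = \delta$. The sketch below uses only the Python standard library; the dictionary layout, the function signatures, and the single shared pair of parameter values are illustrative assumptions (in practice each model would carry its own fitted parameters).

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

# Model label -> (cdf(x, b, s), quantile(d, b, s)); b is the shape parameter and
# s the scale parameter (for the lognormal, s is the log-scale location lambda_4).
MODELS = {
    "Fisk": (
        lambda x, b, s: x**b / (x**b + s**b),
        lambda d, b, s: s * (d / (1 - d)) ** (1 / b),
    ),
    "Frechet": (
        lambda x, b, s: math.exp(-((x / s) ** (-b))),
        lambda d, b, s: s * (-math.log(d)) ** (-1 / b),
    ),
    "Lomax": (
        lambda x, b, s: 1 - (s / (x + s)) ** b,
        lambda d, b, s: s * ((1 - d) ** (-1 / b) - 1),
    ),
    "Lognormal": (
        lambda x, b, s: _STD_NORMAL.cdf((math.log(x) - s) / b),
        lambda d, b, s: math.exp(s + b * _STD_NORMAL.inv_cdf(d)),
    ),
    "Paralogistic": (
        lambda x, b, s: 1 - (1 + (x / s) ** b) ** (-b),
        lambda d, b, s: s * ((1 - d) ** (-1 / b) - 1) ** (1 / b),
    ),
    "Weibull": (
        lambda x, b, s: 1 - math.exp(-((x / s) ** b)),
        lambda d, b, s: s * (-math.log(1 - d)) ** (1 / b),
    ),
}

beta, lam = 1.8, 2.0   # hypothetical values, reused for every model for brevity
round_trip_error = {}
for name, (cdf, quantile) in MODELS.items():
    for delta in (0.95, 0.99):
        var = quantile(delta, beta, lam)
        round_trip_error[(name, delta)] = abs(cdf(var, beta, lam) - delta)
        print(f"{name:12s} delta={delta:.2f} VaR={var:12.4f}")
```

Even with identical parameter values, the printed VaR figures differ markedly across models, which gives a first impression of how strongly upper quantiles depend on the assumed family.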

3. Motivation and Estimation of Model Parameters

Given the pivotal role of statistical distributions in modeling insurance and claims data, this study aims to investigate the implications and consequences of model misspecification and offer practical and transparent solutions to mitigate its adverse effects. In this section, we use the real dataset dataCar from the insuranceData package (Wolny-Dominiak and Trzesiok 2014) in R (R Core Team 2025) to motivate the study by examining the behavior of the estimated PDFs and CDFs for all candidate models described in Section 2, as illustrated in Figure 1. The dataset contains information on vehicle insurance policies issued between 2004 and 2005, with variables such as vehicle value (in $10,000), the number of claims filed, and the amount of claims. The variable of interest is the claim amount. Specifically, we use the numeric variable clmcst0 for cases in which a claim was filed (based on the clm variable, where 0 indicates no claim, and 1 indicates at least one claim). From Figure 1, the fitted PDFs and CDFs appear broadly similar. However, the maximum likelihood estimates (MLEs) of the parameters, presented in Table 1, differ substantially among different models. Because tail behavior affects the value of VaR with confidence level 100δ%, especially when δ is close to 1, and is sensitive to model choice, we also report parameter estimates from the minimum density power divergence (MDPD) estimator, a more robust alternative, which will be introduced in subsequent sections and used later in our selection and ranking procedures (Table 2).
In such cases, fitting candidate models to a real dataset and comparing the fits with the empirical density or distribution may suggest that all the models are suitable for explaining the random phenomenon of interest. However, these models can differ substantially, especially in their tail behavior. Given the heavy-tailed nature of non-life insurance claims, parameter estimation or model discrimination based solely on likelihood-based criteria can be restrictive due to their lack of robustness. Tail-related statistics, such as VaR, are particularly sensitive to the choice of the underlying model, which may lead to misleading conclusions and decisions. To illustrate this point, we report the true VaR values at the 95% and 99% levels for these candidate distributions in Table 1, where it is evident that the VaR estimates differ considerably across models. In such situations, it is intuitively appealing to estimate model parameters using robust procedures, such as the minimum DPD estimation method.
Furthermore, the VaR values of all the considered models differ significantly (see Table 1). In such a situation, estimating the VaR from the given data using a single model will likely yield an estimator with a much higher risk of incorrect conclusions. Intuitively, it may be argued that, for such cases, adopting a model averaging method for estimating the parameters of interest would substantially reduce the risks associated with model uncertainty and misspecification (Steel 2020).
In the following, we first discuss the estimation procedures used to estimate the parameters of the considered models. Subsequently, we provide a mathematical framework for the proposed model averaging techniques used to obtain VaR estimates. For the estimation of the unknown parameter vector of a considered model, under the assumption that the data originated from $f_{\theta_j}(\cdot)$, $j = 1, \ldots, 6$, we adopt the method of maximum likelihood and the method of minimum density power divergence, respectively. Since the true population is unknown, assuming any of the above six models to be true is reasonable for the type of random variable being studied here, irrespective of the true underlying model.

3.1. Likelihood-Based Estimation

Let $\boldsymbol{x} = (x_1, x_2, \ldots, x_n)$ be an independent and identically distributed (i.i.d.) sample of size $n$ from $f_{\theta_j}(\cdot)$, $j = 1, \ldots, 6$. Then, the likelihood function (LF) for the parameter vector $\theta_j$ based on the observed sample $\boldsymbol{x}$ is
$$L(\theta_j \mid \boldsymbol{x}) = \prod_{i=1}^{n} f_{\theta_j}(x_i); \quad j = 1, 2, \ldots, 6. \tag{19}$$
The maximum likelihood estimate (MLE) of the parameter vector $\theta_j$, denoted as $\hat{\theta}_j$, is obtained by maximizing $L(\theta_j \mid \boldsymbol{x})$ or, equivalently, the log-likelihood $\ln L(\theta_j \mid \boldsymbol{x})$ with respect to $\theta_j$, i.e.,
$$\hat{\theta}_j = \arg\max_{\theta_j} L(\theta_j \mid \boldsymbol{x}) \equiv \arg\max_{\theta_j} \ln L(\theta_j \mid \boldsymbol{x}). \tag{20}$$
Owing to the invariance property of MLEs, we can obtain the MLE of a function $g(\theta_j)$ by substituting $\hat{\theta}_j$ into $g(\theta_j)$, i.e., $g(\hat{\theta}_j)$ is the MLE of $g(\theta_j)$. Hence, the MLE of VaR for a specific confidence level $\delta$ of the $j$-th model can be obtained as $\hat{Q}_{\hat{\theta}_j}(\delta)$. For notational convenience, we denote the MLE of VaR with confidence level $\delta$ under distribution $f_{\theta_j}(\cdot)$ as $\hat{Q}_j(\delta) = \hat{Q}_{\hat{\theta}_j}(\delta)$.
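The plug-in step can be illustrated with the lognormal model, the one candidate whose MLE has a closed form (the other five generally require numerical maximization of the log-likelihood). The sketch below simulates hypothetical claim sizes, computes the MLE, and substitutes it into the lognormal quantile function; the sample size, seed, true parameter values, and function names are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist

def lognormal_mle(sample):
    """Closed-form MLE (beta_hat, lam_hat) for LN(beta, lam): lam_hat is the mean
    and beta_hat the divide-by-n standard deviation of the log-losses."""
    logs = [math.log(x) for x in sample]
    n = len(logs)
    lam_hat = sum(logs) / n
    beta_hat = math.sqrt(sum((y - lam_hat) ** 2 for y in logs) / n)
    return beta_hat, lam_hat

def lognormal_loglik(sample, beta, lam):
    """Log-likelihood ln L(theta | x) of the lognormal model."""
    return sum(
        -math.log(x * beta * math.sqrt(2.0 * math.pi))
        - (math.log(x) - lam) ** 2 / (2.0 * beta**2)
        for x in sample
    )

def lognormal_var(delta, beta, lam):
    """Lognormal quantile function (VaR at confidence level delta)."""
    return math.exp(lam + beta * NormalDist().inv_cdf(delta))

rng = random.Random(2025)
true_beta, true_lam = 1.0, 7.0     # hypothetical "true" parameters
sample = [rng.lognormvariate(true_lam, true_beta) for _ in range(2000)]

beta_hat, lam_hat = lognormal_mle(sample)
var_hat_95 = lognormal_var(0.95, beta_hat, lam_hat)   # MLE of VaR by invariance
print(f"beta_hat={beta_hat:.3f}, lam_hat={lam_hat:.3f}, VaR_95={var_hat_95:.1f}")
```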

3.2. Divergence-Based Estimation

The minimum density power divergence estimation method was proposed by Basu et al. (1998) as an alternative to the existing estimation methods based on density-based divergence measures. The density power divergence (DPD) measures the difference between a data-driven PDF $g(\cdot) \in \mathcal{G}$ and an assumed parametric model $f_{\theta}(\cdot) \in \mathcal{G}$ with parameter vector $\theta$, where $\mathcal{G}$ is the set of distributions having densities with respect to a dominating measure. Mathematically, it is defined as
$$d_{\alpha}(g, f_{\theta}) = \int \left\{ f_{\theta}^{1+\alpha}(z) - \left(1 + \frac{1}{\alpha}\right) g(z) f_{\theta}^{\alpha}(z) + \frac{1}{\alpha} g^{1+\alpha}(z) \right\} dz, \quad \alpha > 0, \tag{21}$$
where $d_{\alpha}(g, f_{\theta}) \ge 0$ for all PDFs with respect to the Lebesgue measure, with equality if and only if $f_{\theta} \equiv g$ almost everywhere. The MDPD estimator of the parameter vector $\theta$, denoted as $\tilde{\theta}$, is obtained by minimizing the DPD measure in Equation (21) with respect to $\theta$, i.e., $\tilde{\theta} = \arg\min_{\theta} d_{\alpha}(g, f_{\theta})$.
For general families of distributions, the MDPD estimating equation has the form
$$U_n(t) = \int u_t(z) f_t^{1+\alpha}(z)\, dz - n^{-1} \sum_{i=1}^{n} u_t(X_i) f_t^{\alpha}(X_i) = 0, \tag{22}$$
where $u_t(z) = \partial \ln f_t(z) / \partial t$ is the score function. For $\alpha > 0$, $U_n(t)$ induces a down-weighting mechanism, in which observations that are outliers under the assumed model receive smaller weights. As $\alpha \to 0$, minimizing Equation (21) reduces to the MLE, yielding fully efficient estimators in which all observations (outliers included) are effectively weighted equally. At $\alpha = 1$, minimizing Equation (21) corresponds to the minimum $L_2$ distance criterion. Thus, $\alpha$ acts as a tuning parameter that controls the trade-off between robustness and efficiency of the resulting estimators.
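The down-weighting mechanism can be seen directly from the factor $f_t^{\alpha}(X_i)$ that multiplies each score contribution. The following sketch compares the weight given to a hypothetical extreme claim with the weight given to a typical claim under a Weibull model, for several values of $\alpha$; the parameter values and the two claim sizes are illustrative assumptions.

```python
import math

def weibull_pdf(x, beta, lam):
    """Weibull density with shape beta and scale lam, for x > 0."""
    return (beta / lam) * (x / lam) ** (beta - 1.0) * math.exp(-((x / lam) ** beta))

beta, lam = 1.5, 1000.0              # hypothetical model parameters
typical, outlier = 800.0, 8000.0     # hypothetical claim sizes

# Weight of the outlier relative to the typical claim: (f(outlier) / f(typical))^alpha.
density_ratio = weibull_pdf(outlier, beta, lam) / weibull_pdf(typical, beta, lam)
relative_weight = {alpha: density_ratio**alpha for alpha in (0.0, 0.25, 0.5, 1.0)}

for alpha, weight in relative_weight.items():
    print(f"alpha={alpha:.2f}: relative weight of outlier = {weight:.3e}")
```

At $\alpha = 0$ both observations carry the same weight, as in maximum likelihood, while larger values of $\alpha$ shrink the influence of the extreme claim toward zero.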
Thus, based on a random sample $\boldsymbol{x}$ of size $n$ from $f_{\theta_j}$, the MDPD estimate $\tilde{\theta}_j$ can be obtained by minimizing the empirical form of Equation (21) given below:
$$H_{\alpha}(\theta_j \mid \boldsymbol{x}) = \int f_{\theta_j}^{1+\alpha}(x)\, dx - \left(1 + \frac{1}{\alpha}\right) n^{-1} \sum_{i=1}^{n} f_{\theta_j}^{\alpha}(x_i); \quad j = 1, 2, \ldots, 6, \tag{23}$$
or by solving
$$\int u_{\theta_j}(x) f_{\theta_j}^{1+\alpha}(x)\, dx - n^{-1} \sum_{i=1}^{n} u_{\theta_j}(x_i) f_{\theta_j}^{\alpha}(x_i) = 0; \quad j = 1, 2, \ldots, 6, \tag{24}$$
where $u_{\theta_j}(x) = \partial \ln f_{\theta_j}(x) / \partial \theta_j$ is the score function corresponding to the density $f_{\theta_j}$. The empirical form can be used in place of Equation (21) because the term $\int g^{1+\alpha}(z)\, dz$ in Equation (21) is independent of $\theta$, and the term $\int g(z) f_{\theta}^{\alpha}(z)\, dz$, in which $g$ enters linearly, can be written as $\int f_{\theta}^{\alpha}(z)\, dG(z)$ and estimated by replacing $G$ with the empirical distribution function. Note that, as a limiting case, when $\alpha \to 0$, the DPD reduces to the Kullback-Leibler divergence and yields an estimator equivalent to the MLE. On the other hand, the MDPD estimator is equivalent to the minimum mean square error estimator for $\alpha = 1$. Regarding the behavior of the DPD measure for $\alpha$ in $(0, 1)$, a reasonable choice of $\alpha$ lies between 0 and 1 since the efficiency of the estimators gradually decreases as $\alpha$ increases (Ghosh et al. 2017; Jones et al. 2001).
One of the interesting properties of the MDPD estimation method is that the MDPD estimators also adhere to the invariance principle, akin to the MLEs. This property allows us to study the behavior of VaR or any other characteristic of a probability model based on MDPD estimates without delving into any complicated mathematical framework. Thus, the MDPD estimator of VaR for a given model $f_{\theta_j}$ can be obtained as $\tilde{Q}_{\tilde{\theta}_j}(\delta)$, corresponding to the chosen confidence level $\delta$. Once again, for notational convenience, we denote the MDPD estimate of VaR with confidence level $\delta$ as $\tilde{Q}_j(\delta) = \tilde{Q}_{\tilde{\theta}_j}(\delta)$.
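To show the mechanics of minimizing the empirical DPD objective, the sketch below treats a deliberately simplified case: a Weibull model with the shape fixed at 1 (an exponential density with scale $\lambda$), for which $\int f^{1+\alpha}(x)\,dx = \lambda^{-\alpha}/(1+\alpha)$ is available in closed form. This one-parameter simplification, the contamination scheme, the choice $\alpha = 0.5$, the golden-section search, and all names are assumptions made for illustration; the two-parameter models in this paper would need a multivariate optimizer and, in general, numerical integration.

```python
import math
import random

def dpd_objective(lam, sample, alpha):
    """Empirical DPD objective H_alpha for f(x) = exp(-x / lam) / lam.

    The integral of f^(1 + alpha) has the closed form lam^(-alpha) / (1 + alpha).
    """
    integral = lam ** (-alpha) / (1.0 + alpha)
    mean_f_alpha = sum(math.exp(-alpha * x / lam) for x in sample) / len(sample)
    mean_f_alpha *= lam ** (-alpha)
    return integral - (1.0 + 1.0 / alpha) * mean_f_alpha

def golden_section_min(func, lo, hi, tol=1e-8):
    """Minimize a function assumed to be unimodal on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if func(c) < func(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

def mdpd_scale(sample, alpha):
    """MDPD estimate of lam, searching over log(lam) around the sample median."""
    ordered = sorted(sample)
    median = ordered[len(ordered) // 2]
    log_lam = golden_section_min(
        lambda t: dpd_objective(math.exp(t), sample, alpha),
        math.log(median / 100.0),
        math.log(median * 100.0),
    )
    return math.exp(log_lam)

def exponential_var(delta, lam):
    """Weibull quantile with shape 1: Q(delta) = -lam * ln(1 - delta)."""
    return -lam * math.log(1.0 - delta)

rng = random.Random(7)
true_lam = 1000.0                                   # hypothetical true scale
clean = [rng.expovariate(1.0 / true_lam) for _ in range(500)]
sample = clean + [50000.0] * 25                     # about 5% gross outliers

lam_mle = sum(sample) / len(sample)                 # MLE of the exponential scale
lam_mdpd = mdpd_scale(sample, alpha=0.5)
var99_mle = exponential_var(0.99, lam_mle)          # plug-in VaR by invariance
var99_mdpd = exponential_var(0.99, lam_mdpd)
print(f"MLE scale = {lam_mle:.1f}, MDPD scale = {lam_mdpd:.1f}")
```

In this contaminated sample the MLE of the scale is pulled far above the value used to generate the clean observations, whereas the MDPD estimate stays close to it, and the plug-in VaR estimates inherit the same behavior.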

4. Estimating Value-at-Risk via Model Averaging

Based on earlier discussions, it is essential to identify the best-fitting model to minimize the degree of uncertainty and inconclusiveness in estimating the quantity of interest. However, owing to the closeness of these models in highly dense regions and highly dissimilar tail behaviors, the effect of misspecification on the VaR in the case of claim size data requires special attention. Two significant kinds of model misspecifications can be considered: (i) a wrongly specified model (e.g., the underlying model is lognormal, but a Weibull model is assumed) and (ii) a contaminated-observation model (e.g., the underlying model is Weibull, but some contaminated observations/outliers follow a different distribution or the same class of distributions with different parameters). For empirical investigation, following White (1982) and Chow (1984), we examine the consequences of model misspecification in terms of the relative bias and relative variability when the MLE and MDPD are used to estimate the VaR through a Monte Carlo simulation study.
This section describes the proposed model selection and ranking approaches based on the maximum likelihood method and the minimum density power divergence method. We consider the practical situation that a specific model $M_j$ with probability density $f_{\theta_j}$ ($j = 1, 2, \ldots, 6$) is considered as the assumed model for modeling a given dataset. Based on this assumption, the MLEs and MDPD estimates of the model parameters, as well as the VaR, can be obtained, along with the associated maximized value of the LF and minimized value of the DPD measure, for the given dataset.

4.1. Model Selection Approach

We consider a model selection approach to select the best-fitting model for a particular dataset. Here, we consider two different approaches for model selection based on the maximized value of the LF and the minimized value of the DPD function for a given dataset $\boldsymbol{x}$.
For the approach based on the maximized LF, the probability distribution with the largest maximized value of the likelihood has an intuitive appeal and can be reasonably claimed to be the best among all the considered models. Thus, for a given dataset, we can order all the $k$ candidate models based on the magnitude of the maximized value of the LF if these models have the same number of parameters. Specifically, we denote $M_{[1]} > M_{[2]} > \cdots > M_{[k]}$, where the subscript $[\cdot]$ indicates the ordered position of a model, and the notation “>” means “is preferable to”. The same idea can be implemented using the DPD measure. For a given dataset $\boldsymbol{x}$, the model yielding the minimum value of the DPD statistic $H_{\alpha}(\tilde{\theta}_j \mid \boldsymbol{x})$ in Equation (21) can be considered as the closest one to the true model $g(\cdot)$ among all the candidate models. Hence, based on the minimized value of the DPD statistic, one can order the models such that $M_{[1]}^{*} > M_{[2]}^{*} > \cdots > M_{[k]}^{*}$, where $M_{[1]}^{*}$ is the model with the smallest value of the DPD statistic, $M_{[2]}^{*}$ is the model with the second smallest value of the DPD statistic, and so on. In other words, $M_{[1]} = M_j$ means model $M_j$ is the most preferable model based on the likelihood function approach, and $M_{[1]}^{*} = M_j$ means model $M_j$ is the most preferable model based on the DPD measure approach. For the model selection approach, after selecting the most probable model, we can estimate the VaR with the selected models based on the likelihood function and the DPD measure, i.e., models $M_{[1]}$ and $M_{[1]}^{*}$, respectively.
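The two orderings amount to sorting the candidate models by a fitted criterion, in descending order for the maximized log-likelihood and in ascending order for the minimized DPD statistic. A minimal sketch follows; the numerical values are hypothetical placeholders rather than results from this paper, and the function name is an assumption.

```python
def rank_models(scores, larger_is_better):
    """Return model labels ordered from most to least preferable."""
    return sorted(scores, key=scores.get, reverse=larger_is_better)

# Hypothetical fitted criteria for a single dataset (not taken from the paper).
max_loglik = {
    "Fisk": -4012.3, "Frechet": -4050.8, "Lomax": -4031.6,
    "Lognormal": -4009.7, "Paralogistic": -4020.1, "Weibull": -4075.4,
}
min_dpd = {
    "Fisk": -0.0212, "Frechet": -0.0195, "Lomax": -0.0203,
    "Lognormal": -0.0209, "Paralogistic": -0.0207, "Weibull": -0.0188,
}

likelihood_order = rank_models(max_loglik, larger_is_better=True)   # M_[1], ..., M_[k]
dpd_order = rank_models(min_dpd, larger_is_better=False)            # M*_[1], ..., M*_[k]
print("likelihood ranking:", " > ".join(likelihood_order))
print("DPD ranking:       ", " > ".join(dpd_order))
```

As the placeholder values illustrate, the two criteria need not agree on the top-ranked model, which is one reason to consider both.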

4.2. Model Ranking Approach

To reduce the risk of misspecification and improve the accuracy of estimating the quantity of interest, model averaging methods have been widely adopted and found useful across various domains (see, for example, Chatfield 1995; Dormann et al. 2018; Okoli et al. 2018; Wintle et al. 2003). One argument for using the VaR estimates from multiple models for a given dataset is to mitigate the potential risks associated with relying on a single selected model, especially since the true model is unknown in practice. Model averaging essentially protects against a potentially suboptimal or misspecified model. Moreover, estimates from an individual model may exhibit high variance, whereas combining estimates from multiple models can effectively reduce this variance, leading to a more stable and accurate overall estimate. An ensemble estimate is typically less susceptible to the limitations of any single model and leverages the collective strengths of multiple models. Kass and Raftery (1995) and Raftery (1995) discussed the implementation of model averaging in the Bayesian paradigm through the use of posterior odds and/or the Bayes factor. However, the Bayesian model averaging methods can be sensitive to the choice of priors. An interesting yet simple method for model averaging in the frequentist framework was explored by Buckland et al. (1997), in which scaled weights $w_j$ are assigned to each of the candidate models $M_j$ ($j = 1, \ldots, k$) and the weighted average is the final estimate of the quantity of interest.
Following Buckland et al. (1997), we compute the weighted estimate of the quantity of interest. For instance, if the quantity of interest is VaR, instead of estimating the VaR based on the model with the maximized value of the LF or the model with the minimum value of the DPD statistic (i.e., the model selection approach in Section 4.1), we consider the best $k^*$ ($< k$) models to obtain an estimate of VaR. Suppose the MLEs and MDPD estimates of VaR at level $\delta$ based on models $M_{[\ell]}$ and $M^*_{[\ell]}$ are $\hat{Q}_{[\ell]}(\delta)$ and $\tilde{Q}_{[\ell]}(\delta)$, respectively. Then, using the model ranking approach, the weighted VaR estimates based on maximum likelihood and DPD, denoted as $\hat{Q}^*(\delta)$ and $\tilde{Q}^*(\delta)$, can be obtained as follows:
$$\hat{Q}^*(\delta) = \sum_{\ell=1}^{k^*} w_\ell \cdot \hat{Q}_{[\ell]}(\delta)$$
and
$$\tilde{Q}^*(\delta) = \sum_{\ell=1}^{k^*} w^*_\ell \cdot \tilde{Q}_{[\ell]}(\delta),$$
respectively, with weights $w_\ell$ and $w^*_\ell$, $\ell = 1, 2, \ldots, k^*$, such that $\sum_{\ell=1}^{k^*} w_\ell = \sum_{\ell=1}^{k^*} w^*_\ell = 1$. The associated weights $w_\ell$ ($w^*_\ell$) are considered proportional to the maximized (minimized) value of LF (DPD), i.e., $w_1 > \cdots > w_{k^*}$ ($w^*_1 > \cdots > w^*_{k^*}$). Specifically, for the likelihood approach, suppose $LF_{[\ell]}$ is the value of the maximum likelihood based on model $M_{[\ell]}$ and hence $LF_{[1]} > LF_{[2]} > LF_{[3]} > \cdots > LF_{[k]}$. Then, for the top $k^*$ models, the weights in terms of the maximized value of the LF can be expressed as
$$w_\ell = \frac{LF_{[\ell]}}{\sum_{m=1}^{k^*} LF_{[m]}},$$
for $\ell = 1, 2, \ldots, k^*$, such that $w_1 > w_2 > \cdots > w_{k^*}$ and $\sum_{\ell=1}^{k^*} w_\ell = 1$. Similarly, for the divergence approach, suppose $DPD_{[\ell]}$ is the value of the density power divergence based on model $M^*_{[\ell]}$, hence $DPD_{[1]} < DPD_{[2]} < \cdots < DPD_{[k]}$. Since the values of $H_\alpha(\tilde{\theta} \mid x)$ for all considered models are less than 0, for the top $k^*$ models, the weights in terms of the minimized values of the DPD can be expressed as
$$w^*_\ell = \frac{|DPD_{[\ell]}|}{\sum_{m=1}^{k^*} |DPD_{[m]}|},$$
for $\ell = 1, 2, \ldots, k^*$, such that $w^*_1 > w^*_2 > \cdots > w^*_{k^*}$ and $\sum_{\ell=1}^{k^*} w^*_\ell = 1$.
Note that the model selection approach discussed in Section 4.1 can be formulated as a special case of the model ranking approach by setting $w_1 = 1$ and $w_2 = \cdots = w_{k^*} = 0$ (or $w^*_1 = 1$ and $w^*_2 = \cdots = w^*_{k^*} = 0$). Furthermore, employing a specific model $M_j$ without model selection can be viewed as a special case within the framework of the proposed model ranking approach, achieved by assigning $w_j = 1$ and $w_\ell = 0$ for $\ell \neq j$. Consequently, the efficacy of the proposed model ranking approach aligns with that of the non-model selection approach when suitable values for $k^*$ and the weights are chosen. See Li et al. (2020); Mehrabani and Ullah (2020); Miljkovic (2025); Miljkovic and Grün (2021) and related work for further discussion of weighted estimates from multiple models.
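As a minimal sketch of the weighting scheme described in this subsection, the following Python snippet ranks candidate models by a fit criterion and forms the weighted VaR estimate from the top $k^*$ models. The function names (`ranked_weights`, `weighted_var`) and all numerical inputs are hypothetical and chosen for illustration only; the criterion values are assumed to be positive maximized LF values or negative minimized DPD values, as stated above.

```python
def ranked_weights(criterion, k_star, smaller_is_better=False):
    """Return (indices of the top k_star models, their normalized weights).

    criterion: maximized LF values (larger is better) or minimized DPD
    values (smaller is better; assumed negative, as stated in the text).
    """
    order = sorted(range(len(criterion)), key=lambda j: criterion[j],
                   reverse=not smaller_is_better)
    top = order[:k_star]
    magnitudes = [abs(criterion[j]) for j in top]
    total = sum(magnitudes)
    return top, [m / total for m in magnitudes]


def weighted_var(var_estimates, criterion, k_star, smaller_is_better=False):
    """Weighted average of the VaR estimates of the top k_star models."""
    top, weights = ranked_weights(criterion, k_star, smaller_is_better)
    return sum(w * var_estimates[j] for w, j in zip(weights, top))


# Hypothetical inputs for three candidate models (illustration only).
var_hat = [100.0, 200.0, 300.0]   # VaR estimates under each model
lf = [0.2, 0.5, 0.3]              # maximized LF values
dpd = [-0.1, -0.4, -0.3]          # minimized DPD values (all negative)

q_lf = weighted_var(var_hat, lf, k_star=2)
q_dpd = weighted_var(var_hat, dpd, k_star=2, smaller_is_better=True)
q_sel = weighted_var(var_hat, lf, k_star=1)  # reduces to model selection
```

Setting `k_star=1` reproduces the model selection approach of Section 4.1. Because the weights depend only on ratios of the criterion values, a practical implementation could rescale the likelihoods by a common factor to avoid numerical underflow without changing the weights.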

5. Monte Carlo Simulation Study

In this section, a Monte Carlo simulation study is used to evaluate the performance of the proposed methodologies in the presence and absence of contamination. To compare the performance of different approaches and estimators, we consider the simulated biases and root mean square errors (RMSEs) of the MLE and MDPD estimates of VaR with confidence level $\delta$ under statistical distribution $f_{\theta_j}$ based on $N = 1000$ simulations, which can be computed as
$$\mathrm{Bias}(\hat{Q}_j(\delta)) = \frac{1}{N} \sum_{s=1}^{N} \left[ \hat{Q}_j^{(s)}(\delta) - Q_j(\delta) \right],$$
$$\mathrm{RMSE}(\hat{Q}_j(\delta)) = \left\{ \frac{1}{N} \sum_{s=1}^{N} \left[ \hat{Q}_j^{(s)}(\delta) - Q_j(\delta) \right]^2 \right\}^{1/2},$$
$$\mathrm{Bias}(\tilde{Q}_j(\delta)) = \frac{1}{N} \sum_{s=1}^{N} \left[ \tilde{Q}_j^{(s)}(\delta) - Q_j(\delta) \right],$$
$$\mathrm{RMSE}(\tilde{Q}_j(\delta)) = \left\{ \frac{1}{N} \sum_{s=1}^{N} \left[ \tilde{Q}_j^{(s)}(\delta) - Q_j(\delta) \right]^2 \right\}^{1/2},$$
where $\hat{Q}_j^{(s)}(\delta)$ and $\tilde{Q}_j^{(s)}(\delta)$ are the MLE and MDPD estimate of VaR of $f_{\theta_j}$ with confidence level $\delta$ in the $s$-th simulation, and $Q_j(\delta)$ is the true value of VaR of $f_{\theta_j}$ with confidence level $\delta$.
To facilitate the comparison of the various estimation methods and the proposed model selection/ranking strategies for addressing model uncertainty, we compute the relative bias and relative RMSE of two estimators, say $\hat{Q}^{(1)}$ and $\hat{Q}^{(2)}$, defined as
$$\mathrm{RBias}(\hat{Q}^{(1)}, \hat{Q}^{(2)}) = \frac{\mathrm{Bias}(\hat{Q}^{(1)})}{\mathrm{Bias}(\hat{Q}^{(2)})} \tag{30}$$
and
$$\mathrm{RRMSE}(\hat{Q}^{(1)}, \hat{Q}^{(2)}) = \frac{\mathrm{RMSE}(\hat{Q}^{(1)})}{\mathrm{RMSE}(\hat{Q}^{(2)})}, \tag{31}$$
respectively. Because the quantities for $\hat{Q}^{(1)}$ appear in the numerators, an absolute value of the relative bias in Equation (30) and a value of the relative RMSE in Equation (31) less than 1 indicate that estimator $\hat{Q}^{(1)}$ is a better estimator compared to the reference estimator $\hat{Q}^{(2)}$, while values greater than 1 favor $\hat{Q}^{(2)}$. The sign of the relative bias in Equation (30) indicates whether the biases of the two estimators $\hat{Q}^{(1)}$ and $\hat{Q}^{(2)}$ have the same direction (positive RBias) or opposite directions (negative RBias). The simulation results are obtained based on a simulation size $N = 1000$ and reported in the Appendices.
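The performance measures defined above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the toy estimates are hypothetical and are not taken from the simulation study.

```python
import math


def bias(estimates, true_value):
    """Simulated bias: average of (estimate - true value) over replications."""
    return sum(q - true_value for q in estimates) / len(estimates)


def rmse(estimates, true_value):
    """Simulated root mean square error over replications."""
    mse = sum((q - true_value) ** 2 for q in estimates) / len(estimates)
    return math.sqrt(mse)


def relative_bias(est_1, est_2, true_value):
    """Bias of estimator 1 divided by bias of estimator 2."""
    return bias(est_1, true_value) / bias(est_2, true_value)


def relative_rmse(est_1, est_2, true_value):
    """RMSE of estimator 1 divided by RMSE of estimator 2."""
    return rmse(est_1, true_value) / rmse(est_2, true_value)


# Hypothetical VaR estimates from N = 2 replications (illustration only).
true_var = 10.0
q1 = [11.0, 13.0]   # estimator 1: bias 2, RMSE sqrt(5)
q2 = [9.0, 9.0]     # estimator 2: bias -1, RMSE 1

rb = relative_bias(q1, q2, true_var)   # -2.0: the biases have opposite signs
rr = relative_rmse(q1, q2, true_var)   # ratio of the two RMSEs, sqrt(5) / 1
```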
First, assuming that the assumed model and the true model agree (i.e., without model misspecification), we compare the performance of the likelihood-based and divergence-based parameter estimation procedures presented in Section 3. The average MLEs and MDPD estimates of $Q_j(0.99)$ for the six probability models in Section 2 ($j = 1, 2, \ldots, 6$), the biases and RMSEs of the MLEs, and the relative biases and relative RMSEs of the MDPD estimates relative to the MLEs (i.e., $\mathrm{RBias}(\tilde{Q}_j(\delta), \hat{Q}_j(\delta))$ and $\mathrm{RRMSE}(\tilde{Q}_j(\delta), \hat{Q}_j(\delta))$) are presented in Table A1. From Table A1, it is evident that when the assumed model is indeed the true model and there are no contaminated observations, the MLE performs better than the MDPD estimator in terms of RMSE, while the MDPD estimator provides smaller biases than the MLE in some situations.
In the following subsections, we present the results of the Monte Carlo simulation study to investigate the performance of the estimation procedures and the proposed model selection and ranking approaches under model misspecification with and without contamination. In Section 5.1, we compare the performance of the proposed model selection and model ranking procedures based on likelihood and divergence methods where contamination is not present. Additionally, since estimators based on DPD measures are known to be robust, we consider a contamination model in Section 5.2 and compare the performances of the proposed procedures to determine whether methods based on DPD are relatively better than those based on LF.

5.1. Performance of the Proposed Methods in the Absence of Contamination

To assess the performance of the proposed methods in the absence of contamination, the data is assumed to be governed by one of the six candidate distributions discussed in Section 2. Specifically, we generate random samples of size $n$ from $FS(1.43, 834.95)$, $FR(1.05, 518.75)$, $LM(2.04, 2202.85)$, $LN(1.19, 6.81)$, $PL(1.27, 1117.04)$, and $WE(0.79, 1690.57)$ for $N = 1000$ simulations.
Before evaluating the performance of the proposed model selection and model ranking approaches, we study the performance of the MLE and MDPD estimation methods for VaR under model misspecification with the non-model-selection approach (i.e., estimating the VaR under the assumed model $M_j$). For each dataset simulated from a true model (say, model $M_h$), the parameter vector and the corresponding VaR values are estimated under the assumed model using the MLE and MDPD estimation methods. Table A2, Table A3 and Table A4 present, respectively, the average VaR estimates at confidence level $0.99$, the relative biases with respect to the bias of the MLE when the assumed model is the true model (i.e., $\mathrm{RBias}(\hat{Q}_j(\delta), \hat{Q}_h(\delta))$ and $\mathrm{RBias}(\tilde{Q}_j(\delta), \hat{Q}_h(\delta))$), and the relative RMSEs with respect to the RMSE of the MLE when the assumed model is the true model (i.e., $\mathrm{RRMSE}(\hat{Q}_j(\delta), \hat{Q}_h(\delta))$ and $\mathrm{RRMSE}(\tilde{Q}_j(\delta), \hat{Q}_h(\delta))$). Therefore, when the assumed model is the true model (values highlighted in boldface), the values of $\mathrm{RBias}(\hat{Q}_j(\delta), \hat{Q}_h(\delta))$ and $\mathrm{RRMSE}(\hat{Q}_j(\delta), \hat{Q}_h(\delta))$ are 1 under the likelihood method. Table A2, Table A3 and Table A4 illustrate that when the model is misspecified, biases in estimation arise, regardless of the method used for estimating the VaR. Furthermore, although some RMSEs may be smaller in certain instances than those observed under correct model specification, overall, RMSEs tend to be higher when the model is misspecified. Particularly noteworthy is the substantial increase in biases and RMSEs when the model is misspecified as a Fréchet distribution. These findings underscore the need for effective strategies to mitigate the impact of model misspecification.
Here, we also evaluate the performance of VaR estimates based on the proposed model selection and model ranking approaches presented in Section 4. For each simulated dataset, we perform model selection using the likelihood-based and the divergence-based approaches; then, based on the selected model (i.e., the most probable model), we compute the estimate of VaR using the MLE or MDPD estimation methods, denoted as $\hat{Q}(\delta)$ and $\tilde{Q}(\delta)$, respectively. To evaluate the performance of the estimates $\hat{Q}(\delta)$ and $\tilde{Q}(\delta)$ using the model selection approach, we compute the relative biases and relative RMSEs relative to the biases and RMSEs of the MLEs under the correctly specified model $M_h$, i.e., $\mathrm{RBias}(\hat{Q}(\delta), \hat{Q}_h(\delta))$ and $\mathrm{RRMSE}(\hat{Q}(\delta), \hat{Q}_h(\delta))$ for the model selection approach using MLE, and $\mathrm{RBias}(\tilde{Q}(\delta), \hat{Q}_h(\delta))$ and $\mathrm{RRMSE}(\tilde{Q}(\delta), \hat{Q}_h(\delta))$ for the DPD measure approach.
Table A5 reports the average values of the estimates of VaR based on the model selection approach, along with the associated relative biases and relative RMSEs of all the proposed estimators, relative to the corresponding biases and RMSEs of the MLEs obtained under the correctly specified model. Moreover, we present the simulation proportion of the best-fitting model in Table A6 (in terms of maximized likelihood or minimized divergence) among all competing models. The boldfaced proportions in Table A6 represent the proportions of the true model from which the data is generated being selected as the best model in terms of maximized likelihood or minimized DPD in $N = 1000$ simulations. The other proportions for different assumed models against a given true model indicate the proportions of the assumed model being chosen as the best model in the $N = 1000$ simulations, which is the model then used to compute the final VaR estimates with Equation (24).
For the model ranking approach, we consider $k^* = 2$ and 3, i.e., we use the top two and top three most probable models to compute the weighted average estimate of VaR. In Table A7 and Table A8, we present the average estimates, the relative biases, and relative RMSEs relative to the MLEs under the true model for $\hat{Q}^*(\delta)$ and $\tilde{Q}^*(\delta)$ with $k^* = 2$ and 3, respectively. To evaluate the performance of the proposed model ranking approach in identifying the true model, in Table A9 and Table A10, we present the proportion of times the true model is ranked within the top $k^*$ (denoted by order of preference $= 1, 2$ for $k^* = 2$; and order of preference $= 1, 2, 3$ for $k^* = 3$) and the proportion of times the true model is not ranked within the top $k^*$ (denoted by order of preference $= 0$). In other words, the order of preference 0 indicates the case when the true model is not used for computing the weighted average VaR estimate.
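The "order of preference" tabulation described above can be illustrated with the following Python sketch. The helper names and the four toy rankings are hypothetical; in the simulation study, each ranking would come from ordering the candidate models by maximized LF or minimized DPD for one simulated dataset.

```python
from collections import Counter


def order_of_preference(ranked_models, true_model, k_star):
    """1-based position of the true model among the top k_star, else 0."""
    top = ranked_models[:k_star]
    return top.index(true_model) + 1 if true_model in top else 0


def preference_proportions(rankings, true_model, k_star):
    """Proportion of simulations for each order of preference 0..k_star."""
    counts = Counter(order_of_preference(r, true_model, k_star)
                     for r in rankings)
    n = len(rankings)
    return {pos: counts.get(pos, 0) / n for pos in range(k_star + 1)}


# Hypothetical rankings from four simulated datasets (best model first).
rankings = [
    ["LN", "WE", "FS"],
    ["WE", "LN", "PL"],
    ["FS", "PL", "LM"],
    ["LN", "FS", "WE"],
]
props = preference_proportions(rankings, true_model="LN", k_star=2)
```

Here `props[0]` is the proportion of simulated datasets for which the true model does not contribute to the weighted VaR estimate.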
The results in Table A7, Table A8, Table A9 and Table A10 indicate that, as expected, estimation procedures without knowledge of the true model generally perform worse than MLEs under the true model. Among the model selection ($k^* = 1$) and model ranking approaches ($k^* = 2$ and $k^* = 3$), the relative RMSEs are smaller for $k^* = 2$ than for $k^* = 1$ or $k^* = 3$ for most models considered, the Weibull distribution being the exception. The value of $k^*$ is subjective and is determined by the degree of confidence in the performance of the estimator under study. Specifically, when the level of uncertainty in the quantity to be estimated is high, larger values of $k^*$ are more appropriate, and vice versa.
Although these estimators do not perform as well as those using the true model, model averaging offers better credibility, since relying on a single incorrect model can produce highly inconsistent estimates. Additionally, Table A9 and Table A10 show that the proposed model selection and ranking techniques may not have high discriminating power for all models. For example, for models such as Fisk, Lomax, and paralogistic, the proportion of simulations in which the true model is used for VaR calculation via the model ranking method is less than 0.5, likely due to the similarity of the candidate distributions, as discussed in Section 3.

5.2. Performance of the Proposed Methods in the Presence of Contamination

In non-life insurance claims, it is common to encounter some exceptionally high-magnitude observations. Although these occurrences are infrequent, they can significantly distort model fitting and model selection if the criteria used are not robust. The literature suggests that the density power divergence method is a robust estimation criterion (see, for example, Basu et al. 1998, 2013; Jones et al. 2001, and the articles cited therein). When analyzing and fitting probability models to contaminated datasets, two key questions often arise. The first is how to identify and estimate the effect of contamination or outliers, especially if they are of significant importance. The second is how to design an estimator that is minimally affected by outliers, allowing the true characteristics of the data, free from contamination, to be accurately studied. In practical applications, contamination models can be treated as a convex mixture of two distributions (Brownie et al. 1983; Punzo and Tortora 2021).
In this subsection, we evaluate the performance of the model selection and model ranking approaches under contamination models. We generate contaminated data under a fixed contamination model, where a fixed proportion $\epsilon$ of the sample is obtained from the contamination model, and the remaining observations in the sample come from the original model of interest. Here, the observed random sample from the original model $f_\theta$ is denoted as $x^{(o)}$, and the observed random sample from the contamination model $f_{\theta^{(c)}}$ is denoted as $x^{(c)}$. The following algorithm is used to generate a contaminated sample of size $n$ for a given $\epsilon$:
  • Generate $x^{(c)} = (x_1, x_2, \ldots, x_{[n \cdot \epsilon]})$ from $f_{\theta^{(c)}}$, where $[n \cdot \epsilon]$ is the integer value of $n \cdot \epsilon$;
  • Generate $x^{(o)} = (x_{[n \cdot \epsilon]+1}, x_{[n \cdot \epsilon]+2}, \ldots, x_n)$, i.e., the remaining $n - [n \cdot \epsilon]$ observations, from $f_\theta$;
  • Return the sample $x = (x^{(o)}, x^{(c)})$.
For the probability distributions $f_\theta$ and $f_{\theta^{(c)}}$, we consider the six probability distributions described in Section 2, where $f_{\theta^{(c)}}$ has the same probability law as $f_\theta$, but with different values of the scale parameters. The values of the parameters of the contamination models were chosen to yield significantly high values of claim sizes, which would mostly appear as outliers for the original model (Wellmann and Gather 1999).
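The sampling algorithm above can be sketched in Python as follows. Several details are assumptions made only for illustration: the "integer value" of $n \cdot \epsilon$ is taken to be the floor, $WE(0.79, 1690.57)$ is read as a Weibull distribution with shape 0.79 and scale 1690.57, and the contamination scale (ten times the original scale) is a hypothetical choice, not the value used in this study.

```python
import math
import random


def contaminated_sample(n, eps, draw_original, draw_contaminated):
    """Return n observations, floor(n * eps) of them from the contamination model."""
    n_c = math.floor(n * eps)                         # assumed: [n * eps] = floor
    x_c = [draw_contaminated() for _ in range(n_c)]   # contaminated part
    x_o = [draw_original() for _ in range(n - n_c)]   # original-model part
    return x_o + x_c


rng = random.Random(2025)          # fixed seed for reproducibility
shape, scale = 0.79, 1690.57       # assumed (shape, scale) reading of WE(., .)
scale_c = 10 * scale               # hypothetical contamination scale

# random.weibullvariate(alpha, beta) takes the scale first, then the shape.
x = contaminated_sample(
    n=200,
    eps=0.1,
    draw_original=lambda: rng.weibullvariate(scale, shape),
    draw_contaminated=lambda: rng.weibullvariate(scale_c, shape),
)
```

Passing the two samplers as callables keeps the sketch independent of any particular candidate distribution, so the same function could be reused for the other five models.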
The performances of the proposed methods are evaluated in terms of their relative biases and relative RMSEs in connection with the biases and RMSEs of the MLEs of VaR values obtained based on the correctly specified model in the presence of contamination. An intuitive justification for comparing the proposed methods to the MLEs in the presence of contamination is that divergence-based measures are known to be robust. Therefore, it is reasonable to believe that one or more deviant observations should not significantly affect the divergence-based estimator’s general behavior for the characteristic of interest, which is the estimate of VaR of the original model.
As discussed in the contamination-free case in Section 5.1, before evaluating the performance of the proposed model selection and model ranking approaches, we examine the performance of the MLE and MDPD estimation methods for VaR under model misspecification and contamination without employing model selection. Table A18 presents the average VaR estimates along with their associated biases and RMSEs for the MLEs, and the relative biases and relative RMSEs for the MDPD estimates for some selected models. The table highlights the significant deviations of the VaR estimates from those obtained from the correctly specified model. Additionally, the biases and RMSEs of the MLEs for VaR are notably higher in the presence of contamination compared to the scenario without contamination.
We then evaluate the performance of the MLE and MDPD estimation methods for VaR under model misspecification with data contamination. Specifically, for the case with 10% contamination (i.e., $\epsilon = 0.1$), Table A19, Table A20 and Table A21 present the average VaR estimates along with their associated relative biases and relative RMSEs. These metrics are computed relative to the bias and RMSE of the MLEs under the correctly specified model. The results in these tables indicate that the MDPD estimator for VaR outperforms the MLE in terms of reduced relative risks, particularly for values of the tuning parameter that are away from 0. Additionally, the findings highlight that using a non-model selection approach can lead to misleading VaR estimates when the model is misspecified.
Table A22, Table A23 and Table A24 present the average VaR estimates using the model selection approach (i.e., the model ranking approach with $k^* = 1$) and the model ranking approach with $k^* = 2$ and $k^* = 3$, along with their corresponding relative biases and relative RMSEs, compared to the MLE under the correctly specified model with 10% data contamination. Additionally, the simulation proportions of the true model being included in the final VaR estimate calculation based on the model selection approach and the model ranking approach with $k^* = 2$ and $k^* = 3$ are shown in Table A19, Table A20 and Table A21. These simulation results demonstrate that the proposed model selection and ranking approaches, particularly those based on the minimized DPD, provide more accurate VaR estimates compared to using a prespecified model when contamination is present in the dataset. Interestingly, there are cases (e.g., when the true model is lognormal or Weibull) where the true model is rarely used in the final VaR estimates. Overall, the model averaging technique based on minimized DPD offers a more accurate procedure for estimating VaR, with significantly better reliability in the presence of contamination, even though the true model may seldom be correctly identified.
To keep this paper concise, we present only some representative simulation results here; additional simulation results are provided in Appendix B and Appendix D.

6. Practical Data Analyses

In this section, we demonstrate the application of the proposed model selection and ranking approaches for estimating VaR using real claim size data from non-life insurance. Section 6.1 applies the methodology to the dataset introduced earlier in Section 3, which was used to motivate the study and illustrate the behavior of the candidate models. Section 6.2 extends the analysis to a different dataset from the insuranceData package in R (R Core Team 2025), thereby providing an additional example to assess the robustness and practical applicability of the proposed procedures across distinct non-life insurance contexts.

6.1. Vehicle Insurance Data

For the dataCar dataset discussed in Section 3, we estimate the VaR under the six candidate models and apply the model selection and model averaging approaches presented in the previous sections. The VaR estimates based on the model selection (for this dataset, the Fréchet distribution is the selected model) and based on the model ranking approach (with $k^* = 2$ and 3) for both maximum likelihood and minimum density power divergence methods are reported in Table 2.
From the results in Table 2, it is both interesting and expected to observe significant variations in the VaR estimates, particularly at the 99% level. Given the unknown ground truth of the underlying probability model for vehicle insurance claim amounts, these results suggest that selecting different models can lead to substantially different conclusions. Conversely, the model ranking and averaging approaches moderate these differences, providing a more balanced estimate of VaR. We will further illustrate this point in the following numerical example.

6.2. Partial Casco Motorcycles Insurance Data

The dataOhlsson dataset comprises partial casco insurance data for motorcycles, covering policies and claims at a Swedish insurance company from 1994 to 1998. We are interested in the non-zero claim amounts, in Swedish Krona (SEK), of those partial casco motorcycle insurance claims, recorded in the variable skadkost. The fitted CDFs of the six candidate models for this dataset are plotted in Figure 2, and the corresponding VaR estimates based on the six candidate models are presented in Table 3. In Table 3, we highlight the VaR estimates based on the model selection (here, the lognormal distribution is the selected model) and present the VaR estimates using the model ranking approach (with $k^* = 2$ and 3) for both maximum likelihood and minimum density power divergence methods.
Once again, we observe significant variations in the VaR estimates, particularly at the 99% level. These variations highlight the risks associated with relying on a single assumed model for VaR estimation, especially when the model is misspecified. For instance, if one assumes that the non-zero claim amounts for partial casco motorcycle insurance claims follow a Fréchet distribution, the maximum likelihood method indicates a 99% probability that future claim amounts will not exceed 1.19 billion. In stark contrast, if a Weibull distribution is assumed, the same method suggests a 99% probability that future claim amounts will not exceed 17.52 million, which is approximately 68 times smaller than the Fréchet estimate. This discrepancy highlights the importance of model selection in estimating VaR.
Although these numerical examples cannot prove the superiority of the proposed model selection and ranking approaches due to the unknown ground truth, they demonstrate that these approaches help mitigate the risk of drawing highly inaccurate conclusions resulting from model misspecification.

7. Extension to Candidate Models with Unequal Number of Parameters

In the preceding sections, the candidate probability models (see Section 2) are all two-parameter models, implying equal model complexity. This choice is common in comparative studies, as models with the same number of parameters allow for a fair comparison.
However, in some situations, researchers may consider probability models with different numbers of parameters. In such cases, model complexity must be explicitly accounted for when comparing model fits. In this section, we address the question of whether the model selection and ranking procedure based on either the LF or the DPD measure remains valid and practical when the candidate set contains models with unequal numbers of parameters. Although our earlier development assumed that all candidate models have the same number of unknown parameters, this section demonstrates how the proposed method can be extended to accommodate models with differing parameter counts.
When the likelihood-based estimation method described in Section 3.1 is used for parameter estimation, information-based criteria—such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the deviance information criterion (DIC)—are widely adopted for model selection (see, for example, Burnham and Anderson 2002; Castilla et al. 2020; Karagrigoriou and Mattheou 2009). These criteria incorporate a penalty for the number of model parameters, thereby addressing differences in model complexity. For example, the AIC is given by
$$\mathrm{AIC} = 2\kappa - 2 \ln\left[ L(\hat{\theta} \mid x) \right],$$
where $\kappa$ is the number of parameters in the model and $L(\hat{\theta} \mid x)$ is the value of the maximized likelihood function defined in Equation (19), evaluated at the MLE $\hat{\theta}$ defined in Equation (20). The AIC balances goodness-of-fit with parsimony by penalizing excessive parameters, favoring models that remain simple yet adequately explain the data. A smaller AIC value reflects a more desirable trade-off between fit and complexity.
When the divergence-based estimation method described in Section 3.2 is employed, the robust divergence-based Bayesian criterion (DBBC) proposed by Kurata and Hamada (2020), denoted $RCC_\alpha$, can serve as a model selection and ranking tool. It is defined as
$$RCC_\alpha = 2 n H_\alpha(x; \tilde{\theta}) + \kappa \ln(n), \quad \alpha > 0,$$
where $\kappa$ denotes the number of parameters in the model and $H_\alpha(x; \tilde{\theta})$ is the minimized value of the DPD measure given in Equation (22), evaluated at $\theta = \tilde{\theta}$. Like the AIC, $RCC_\alpha$ penalizes models with more parameters, and a lower value indicates a better compromise between model fit and complexity.
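The two penalized criteria above are simple to evaluate once the maximized log-likelihood and the minimized DPD value are available. The following Python sketch shows the arithmetic and a ranking by smallest criterion value; the function names and the fitted values are hypothetical and serve only to illustrate how the penalty terms enter.

```python
import math


def aic(log_lik, kappa):
    """AIC = 2 * kappa - 2 * ln L, with log_lik the maximized log-likelihood."""
    return 2 * kappa - 2 * log_lik


def rcc(h_alpha, kappa, n):
    """RCC_alpha = 2 * n * H_alpha + kappa * ln(n), with H_alpha minimized."""
    return 2 * n * h_alpha + kappa * math.log(n)


def rank_models(scores):
    """Model names ordered from smallest (best) to largest criterion value."""
    return sorted(scores, key=scores.get)


# Hypothetical fitted values for a sample of size n = 100 (illustration only).
n = 100
fits = {
    "two-parameter model": {"log_lik": -520.0, "h_alpha": -0.0100, "kappa": 2},
    "three-parameter model": {"log_lik": -515.0, "h_alpha": -0.0102, "kappa": 3},
}
aic_scores = {m: aic(f["log_lik"], f["kappa"]) for m, f in fits.items()}
rcc_scores = {m: rcc(f["h_alpha"], f["kappa"], n) for m, f in fits.items()}

best_by_aic = rank_models(aic_scores)[0]
best_by_rcc = rank_models(rcc_scores)[0]
```

With these made-up inputs the two criteria disagree: the gain in log-likelihood outweighs the AIC penalty, whereas the small gain in the DPD value does not offset the $\kappa \ln(n)$ penalty. This only illustrates the mechanics and says nothing about the results in this paper.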
For illustration, consider a set of four candidate models, where two are generalized versions of the other two and naturally include them as special cases. Specifically, we use the exponentiated Weibull distribution, denoted as $EW(\beta_7, \nu, \lambda_7)$ (Mudholkar and Srivastava 1993), and the exponentiated Fréchet distribution, denoted as $EF(\beta_8, \eta, \lambda_8)$ (Nadarajah and Kotz 2003), to analyze the vehicle insurance dataset discussed in Section 3 and Section 6.1. The PDFs of $EW(\beta_7, \nu, \lambda_7)$ and $EF(\beta_8, \eta, \lambda_8)$ are given by
$$f_{\theta_7}(x) = \begin{cases} \nu \beta_7 \lambda_7 x^{\beta_7 - 1} \left[ 1 - e^{-\lambda_7 x^{\beta_7}} \right]^{\nu - 1} e^{-\lambda_7 x^{\beta_7}}, & \text{for } x > 0; \\ 0, & \text{otherwise}, \end{cases}$$
$$f_{\theta_8}(x) = \begin{cases} \eta \beta_8 \lambda_8^{\beta_8} \left[ 1 - e^{-(\lambda_8 / x)^{\beta_8}} \right]^{\eta - 1} x^{-(1 + \beta_8)} e^{-(\lambda_8 / x)^{\beta_8}}, & \text{for } x > 0; \\ 0, & \text{otherwise}, \end{cases}$$
where $\theta_7 = (\beta_7, \nu, \lambda_7)$ and $\theta_8 = (\beta_8, \eta, \lambda_8)$ are the respective parameter vectors; $\lambda_7 > 0$ and $\lambda_8 > 0$ are the scale parameters, while $\beta_7, \nu > 0$ and $\beta_8, \eta > 0$ are the shape parameters.
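As an illustrative sketch, the two densities above can be coded directly, together with the closed-form CDFs and quantile (VaR) functions obtained by integrating and inverting them. The CDF and quantile expressions are derived here for illustration rather than quoted from the paper, and the parameter values in the example are hypothetical.

```python
import math


def ew_pdf(x, beta, nu, lam):
    """Exponentiated Weibull density."""
    if x <= 0:
        return 0.0
    t = math.exp(-lam * x ** beta)
    return nu * beta * lam * x ** (beta - 1) * (1 - t) ** (nu - 1) * t


def ew_cdf(x, beta, nu, lam):
    """CDF implied by ew_pdf: [1 - exp(-lam * x^beta)]^nu."""
    return (1 - math.exp(-lam * x ** beta)) ** nu if x > 0 else 0.0


def ew_var(delta, beta, nu, lam):
    """VaR at level delta: the delta-quantile of the EW distribution."""
    return (-math.log(1 - delta ** (1 / nu)) / lam) ** (1 / beta)


def ef_pdf(x, beta, eta, lam):
    """Exponentiated Frechet density."""
    if x <= 0:
        return 0.0
    t = math.exp(-((lam / x) ** beta))
    return (eta * beta * lam ** beta * (1 - t) ** (eta - 1)
            * x ** (-(1 + beta)) * t)


def ef_cdf(x, beta, eta, lam):
    """CDF implied by ef_pdf: 1 - [1 - exp(-(lam / x)^beta)]^eta."""
    if x <= 0:
        return 0.0
    return 1 - (1 - math.exp(-((lam / x) ** beta))) ** eta


def ef_var(delta, beta, eta, lam):
    """VaR at level delta: the delta-quantile of the EF distribution."""
    u = -math.log(1 - (1 - delta) ** (1 / eta))
    return lam / u ** (1 / beta)


# Hypothetical parameter values (illustration only, not fitted estimates).
q_ew = ew_var(0.95, beta=0.8, nu=1.5, lam=0.01)
q_ef = ef_var(0.95, beta=1.1, eta=1.5, lam=500.0)
```

Setting `nu=1` in `ew_pdf` or `eta=1` in `ef_pdf` removes the extra shape parameter, which is the sense in which the three-parameter models contain simpler two-parameter special cases.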
Table 4 presents the model ranking results based on the AIC and the $RCC_\alpha$ criteria for $\alpha = 0.05$, $0.1$ and $0.2$. The four candidate models include two basic two-parameter models (the Fréchet (FR) and Weibull (WE) distributions) and their corresponding three-parameter generalizations, the exponentiated Fréchet (EF) and exponentiated Weibull (EW) distributions.
Across both AIC and $RCC_\alpha$ measures, the EF and EW models consistently outperform their simpler counterparts (FR and WE, respectively) for all $\alpha$ values considered here. This improvement can be attributed to the inclusion of an additional shape parameter in the EF and EW distributions, which increases model flexibility and allows a better adaptation to the tail behavior and overall distributional shape of the data.
Importantly, both AIC and $RCC_\alpha$ incorporate a penalty term for model complexity, ensuring that a model is not favored solely because it has more parameters. The fact that EF and EW achieve lower (better) values for these penalized criteria indicates that the improvement in goodness of fit more than compensates for the increase in parameter count. In other words, the additional shape parameter provides meaningful explanatory power rather than overfitting noise in the data.
These results suggest that, in this application, allowing greater distributional flexibility via an additional shape parameter can lead to a statistically justifiable improvement in model performance, even when penalizing for increased complexity. For explanatory purposes, the VaR estimates at $\delta = 0.95$ for the four candidate models are reported in Table 5.
It is worth noting that the methodology proposed here can be applied with model-selection criteria beyond AIC and $RCC_\alpha$. For example, goodness-of-fit measures, such as the Kolmogorov-Smirnov distance and the Anderson-Darling statistic, can be considered.

8. Concluding Remarks

The findings presented in this article provide significant insights into the implications and mitigation strategies for model misspecification in insurance claims data analysis. By exploring various probability models and adopting both likelihood-based and divergence-based estimation methods, this study highlights the importance of accurate model selection and ranking in minimizing the adverse effects of model uncertainty. The numerical examples and simulation studies demonstrate that the proposed model selection and ranking approaches, particularly those based on the divergence measure, offer robust solutions for estimating risk measures such as VaR.
The practical implications of this research are profound, as these methods effectively address inaccuracies arising from model misspecification. These methods provide practicing and academic actuaries with advanced tools to mitigate risks and achieve more reliable outcomes in their risk assessments, even in the presence of model misspecification. Practically, relying on a single, potentially misspecified model can yield large errors and misleading conclusions for VaR. By aggregating across competing models and using robust estimators when appropriate, our approach yields more stable VaR estimates and decisions that are less sensitive to outliers, which are common in insurance claims data.
This, in turn, enhances the reliability of insurance data analysis, ultimately leading to better decision-making in the insurance industry. Future research could expand on these methods to refine the accuracy and applicability of model selection and ranking approaches in various actuarial contexts.
A few limitations of the proposed methodologies should also be acknowledged. Under model uncertainty, the true data-generating process is unknown, and all candidate parametric models are approximations; likelihood-based and divergence-based estimators target pseudo-true parameters, and the resulting risk measures may still be biased if the candidate models are poorly specified. Although aggregating across multiple probability distributions and employing robust estimators such as the MDPD can substantially reduce sensitivity to misspecification and outliers, the performance of the approach ultimately depends on the researcher’s ability to select a reasonably rich and flexible set of candidate models. If these candidates are uniformly inadequate, the resulting estimates may remain unreliable.

Author Contributions

Investigation, Methodology, and Original Draft Preparation: S.B.; Conceptualization, Methodology, Writing, Review, and Editing: H.K.T.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by The Society of Actuaries’ Committee on Knowledge Extension Research and the Casualty Actuarial Society Individual Grant. The initial work on this article commenced while the first author was visiting the Department of Mathematical Sciences, Bentley University (Waltham, MA), under the U.S. Department of State’s (Fulbright Scholars) Exchange Visitor program (Program No. G-1-00005).

Data Availability Statement

The original data presented in the study are openly available in the insuranceData R package (Wolny-Dominiak and Trzesiok 2014) at https://doi.org/10.32614/CRAN.package.insuranceData, accessed on 20 October 2025.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

List of notations used in the tables in Appendix A, Appendix B, Appendix C and Appendix D.
$AV_L$: Average $\hat{Q}_j(\delta)$ for Likelihood
$AV_D$: Average $\tilde{Q}_j(\delta)$ for Divergence
$BS_L^T$: $Bias(\hat{Q}_j(\delta))$
$MS_L^T$: $RMSE(\hat{Q}_j(\delta))$
$BS_L$: $RBias(\hat{Q}_j(\delta), \hat{Q}_h(\delta))$
$BS_D$: $RBias(\tilde{Q}_j(\delta), \hat{Q}_h(\delta))$
$MS_L$: $RRMSE(\hat{Q}_j(\delta), \hat{Q}_h(\delta))$
$MS_D$: $RRMSE(\tilde{Q}_j(\delta), \hat{Q}_h(\delta))$
$AV_s$: Average of $\hat{Q}(\delta)$ or $\tilde{Q}(\delta)$
$AV_{k^*}$: Average of $\hat{Q}^*(\delta)$ or $\tilde{Q}^*(\delta)$
$BS_s$: $RBias(\hat{Q}(\delta)$ or $\tilde{Q}(\delta), \hat{Q}_h(\delta))$
$BS_{k^*}$: $RBias(\hat{Q}^*(\delta)$ or $\tilde{Q}^*(\delta), \hat{Q}_h(\delta))$
$MS_s$: $RRMSE(\hat{Q}(\delta)$ or $\tilde{Q}(\delta), \hat{Q}_h(\delta))$
$MS_{k^*}$: $RRMSE(\hat{Q}^*(\delta)$ or $\tilde{Q}^*(\delta), \hat{Q}_h(\delta))$
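The summary measures listed above can be computed from Monte Carlo replicates as in the following minimal sketch. It assumes that the relative measures are ratios against the likelihood-based estimator under the true model $h$; this reading is consistent with the tabulated values (for example, for FS with $n = 50$ and $\alpha = 0.05$ in Table A1, $(2.238 - 2.076)/(2.228 - 2.076) \approx 1.066$, close to the reported 1.067 once rounding of the averages is allowed for). The function names and the replicate values are hypothetical.

```python
import math


def bias(estimates, true_value):
    """Average estimate minus the true quantile."""
    return sum(estimates) / len(estimates) - true_value


def rmse(estimates, true_value):
    """Root mean squared error around the true quantile."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))


def relative_bias(estimates, reference_estimates, true_value):
    """Bias of an estimator divided by the bias of the reference estimator."""
    return bias(estimates, true_value) / bias(reference_estimates, true_value)


def relative_rmse(estimates, reference_estimates, true_value):
    """RMSE of an estimator divided by the RMSE of the reference estimator."""
    return rmse(estimates, true_value) / rmse(reference_estimates, true_value)


# Hypothetical replicates, for illustration only
true_q = 1.0
reference = [1.0, 1.2]   # e.g., likelihood-based estimates under the true model
candidate = [1.1, 1.3]   # e.g., divergence-based estimates

rb = relative_bias(candidate, reference, true_q)
rr = relative_rmse(candidate, reference, true_q)
print(round(rb, 3), round(rr, 3))
```

Under this convention a value of 1.000 is obtained whenever an estimator is compared with itself, which matches the entries reported for the likelihood-based estimator under the true model.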

Appendix A. VaR Estimation for δ = 0.99 Without Contamination

Table A1. Simulated average estimate of VaR with associated bias and RMSEs (relative bias and relative RMSEs of divergence estimators) for $\delta = 0.99$, when the true models are $FS(1.43, 834.95)$, $FR(1.05, 518.75)$, $LM(2.04, 2202.85)$, $LN(1.19, 6.81)$, $PL(1.27, 1117.04)$ and $WE(0.79, 1690.57)$, denoted as FS, FR, LM, LN, PL and WE, respectively.
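| n | Estimator | α | Statistic (a) | FS ($j=1$) | FR ($j=2$) | LM ($j=3$) | LN ($j=4$) | PL ($j=5$) | WE ($j=6$) |
|---|---|---|---|---|---|---|---|---|---|
| | | | $Q_j(\delta)$ | 2.076 | 4.146 | 1.885 | 0.770 | 1.900 | 1.168 |
| 50 | Likelihood | | $AV_L$ | 2.228 | 4.596 | 2.014 | 0.779 | 2.077 | 1.167 |
| | | | $BS_L^T$ | 0.152 | 0.450 | 0.128 | 0.009 | 0.177 | −0.002 |
| | | | $MS_L^T$ | 1.044 | 3.286 | 1.382 | 0.268 | 1.209 | 0.275 |
| | Divergence | 0.05 | $AV_D$ | 2.238 | 4.613 | 2.080 | 0.784 | 2.099 | 1.175 |
| | | | $BS_D$ | 1.067 | 1.037 | 1.518 | 1.617 | 1.124 | −3.974 |
| | | | $MS_D$ | 1.015 | 1.002 | 1.077 | 1.029 | 1.024 | 1.028 |
| | | 0.10 | $AV_D$ | 2.250 | 4.653 | 2.165 | 0.790 | 2.124 | 1.181 |
| | | | $BS_D$ | 1.146 | 1.126 | 2.177 | 2.209 | 1.264 | −8.247 |
| | | | $MS_D$ | 1.053 | 1.030 | 1.271 | 1.075 | 1.076 | 1.076 |
| | | 0.20 | $AV_D$ | 2.276 | 4.755 | 2.493 | 0.800 | 2.181 | 1.194 |
| | | | $BS_D$ | 1.318 | 1.352 | 4.733 | 3.324 | 1.588 | −16.199 |
| | | | $MS_D$ | 1.165 | 1.126 | 2.324 | 1.199 | 1.259 | 1.268 |
| 100 | Likelihood | | $AV_L$ | 2.136 | 4.325 | 1.904 | 0.787 | 2.002 | 1.165 |
| | | | $BS_L^T$ | 0.060 | 0.179 | 0.019 | 0.018 | 0.102 | −0.003 |
| | | | $MS_L^T$ | 0.656 | 1.816 | 0.841 | 0.177 | 0.715 | 0.199 |
| | Divergence | 0.05 | $AV_D$ | 2.145 | 4.331 | 1.944 | 0.789 | 2.014 | 1.168 |
| | | | $BS_D$ | 1.145 | 1.034 | 3.150 | 1.083 | 1.118 | 0.135 |
| | | | $MS_D$ | 1.030 | 1.012 | 1.057 | 1.010 | 1.028 | 1.009 |
| | | 0.10 | $AV_D$ | 2.154 | 4.344 | 1.989 | 0.790 | 2.027 | 1.170 |
| | | | $BS_D$ | 1.294 | 1.102 | 5.549 | 1.154 | 1.243 | −0.601 |
| | | | $MS_D$ | 1.073 | 1.041 | 1.159 | 1.037 | 1.079 | 1.032 |
| | | 0.20 | $AV_D$ | 2.171 | 4.372 | 2.121 | 0.791 | 2.055 | 1.174 |
| | | | $BS_D$ | 1.573 | 1.259 | 12.578 | 1.241 | 1.513 | −1.889 |
| | | | $MS_D$ | 1.171 | 1.122 | 1.572 | 1.124 | 1.217 | 1.121 |

(a) Expressed as a multiplier of $10^4$.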
Table A2. Simulated average estimate of VaR (expressed as a multiplier of $10^4$), relative bias and relative RMSE with $\delta = 0.99$ when the non-model selection approach is adopted for true models $FS(1.43, 834.95)$ and $FR(1.05, 518.75)$. Values in bold indicate the results obtained under the true model.
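| True model ($h$) | n | Method | α | Statistic | FS ($j=1$) | FR ($j=2$) | LM ($j=3$) | LN ($j=4$) | PL ($j=5$) | WE ($j=6$) |
|---|---|---|---|---|---|---|---|---|---|---|
| $FS(1.43, 834.95)$ | 50 | Likelihood based | | $AV_L$ | 2.228 | 36.727 | 1.717 | 1.667 | 1.830 | 1.240 |
| | | | | $BS_L$ | 1.000 | 228.288 | −2.365 | −2.691 | −1.622 | −5.509 |
| | | | | $MS_L$ | 1.000 | 223.204 | 1.259 | 0.833 | 0.972 | 1.163 |
| | | Divergence based | 0.05 | $AV_D$ | 2.238 | 41.121 | 1.621 | 1.620 | 1.784 | 1.027 |
| | | | | $BS_D$ | 1.067 | 257.236 | −2.999 | −3.007 | −1.921 | −6.908 |
| | | | | $MS_D$ | 1.015 | 149.300 | 1.149 | 0.807 | 0.943 | 1.105 |
| | | | 0.10 | $AV_D$ | 2.250 | 46.241 | 1.520 | 1.589 | 1.747 | 0.857 |
| | | | | $BS_D$ | 1.146 | 290.970 | −3.662 | −3.209 | −2.169 | −8.030 |
| | | | | $MS_D$ | 1.053 | 158.270 | 1.088 | 0.819 | 0.950 | 1.208 |
| | | | 0.20 | $AV_D$ | 2.276 | 71.710 | 1.328 | 1.565 | 1.695 | 0.676 |
| | | | | $BS_D$ | 1.318 | 458.761 | −4.926 | −3.368 | −2.513 | −9.222 |
| | | | | $MS_D$ | 1.165 | 418.429 | 1.069 | 0.884 | 1.027 | 1.357 |
| | 100 | Likelihood based | | $AV_L$ | 2.136 | 31.208 | 1.609 | 1.620 | 1.728 | 1.242 |
| | | | | $BS_L$ | 1.000 | 482.290 | −7.734 | −7.548 | −5.767 | −13.814 |
| | | | | $MS_L$ | 1.000 | 115.236 | 1.276 | 1.011 | 1.049 | 1.544 |
| | | Divergence based | 0.05 | $AV_D$ | 2.145 | 30.075 | 1.506 | 1.569 | 1.682 | 0.991 |
| | | | | $BS_D$ | 1.145 | 463.527 | −9.440 | −8.399 | −6.524 | −17.961 |
| | | | | $MS_D$ | 1.030 | 97.245 | 1.277 | 1.040 | 1.071 | 1.712 |
| | | | 0.10 | $AV_D$ | 2.154 | 30.388 | 1.403 | 1.534 | 1.643 | 0.832 |
| | | | | $BS_D$ | 1.294 | 468.718 | −11.146 | −8.971 | −7.172 | −20.586 |
| | | | | $MS_D$ | 1.073 | 74.042 | 1.332 | 1.080 | 1.111 | 1.921 |
| | | | 0.20 | $AV_D$ | 2.171 | 35.501 | 1.212 | 1.498 | 1.582 | 0.662 |
| | | | | $BS_D$ | 1.573 | 553.361 | −14.309 | −9.560 | −8.172 | −23.407 |
| | | | | $MS_D$ | 1.171 | 94.815 | 1.518 | 1.151 | 1.205 | 2.166 |
| $FR(1.05, 518.75)$ | 50 | Likelihood based | | $AV_L$ | 1.852 | 4.596 | 2.866 | 1.695 | 1.883 | 2.248 |
| | | | | $BS_L$ | −5.095 | 1.000 | −2.843 | −5.443 | −5.025 | −4.216 |
| | | | | $MS_L$ | 0.771 | 1.000 | 1.063 | 0.828 | 0.804 | 1.380 |
| | | Divergence based | 0.05 | $AV_D$ | 1.620 | 4.613 | 2.439 | 1.362 | 1.572 | 1.335 |
| | | | | $BS_D$ | −5.609 | 1.037 | −3.790 | −6.184 | −5.717 | −6.242 |
| | | | | $MS_D$ | 0.810 | 1.002 | 0.909 | 0.875 | 0.838 | 0.897 |
| | | | 0.10 | $AV_D$ | 1.447 | 4.653 | 2.077 | 1.151 | 1.338 | 0.924 |
| | | | | $BS_D$ | −5.994 | 1.126 | −4.596 | −6.652 | −6.235 | −7.155 |
| | | | | $MS_D$ | 0.849 | 1.030 | 0.844 | 0.926 | 0.884 | 0.992 |
| | | | 0.20 | $AV_D$ | 1.213 | 4.755 | 1.518 | 0.903 | 1.026 | 0.578 |
| | | | | $BS_D$ | −6.514 | 1.352 | −5.836 | −7.202 | −6.927 | −7.924 |
| | | | | $MS_D$ | 0.909 | 1.126 | 0.867 | 0.994 | 0.962 | 1.088 |
| | 100 | Likelihood based | | $AV_L$ | 1.772 | 4.325 | 2.589 | 1.604 | 1.773 | 2.082 |
| | | | | $BS_L$ | −13.255 | 1.000 | −8.695 | −14.190 | −13.247 | −11.523 |
| | | | | $MS_L$ | 1.357 | 1.000 | 1.183 | 1.447 | 1.374 | 1.741 |
| | | Divergence based | 0.05 | $AV_D$ | 1.562 | 4.331 | 2.238 | 1.313 | 1.500 | 1.263 |
| | | | | $BS_D$ | −14.427 | 1.034 | −10.651 | −15.817 | −14.774 | −16.098 |
| | | | | $MS_D$ | 1.454 | 1.012 | 1.237 | 1.579 | 1.494 | 1.614 |
| | | | 0.10 | $AV_D$ | 1.399 | 4.344 | 1.928 | 1.117 | 1.286 | 0.890 |
| | | | | $BS_D$ | −15.340 | 1.102 | −12.386 | −16.909 | −15.967 | −18.180 |
| | | | | $MS_D$ | 1.535 | 1.041 | 1.335 | 1.679 | 1.597 | 1.801 |
| | | | 0.20 | $AV_D$ | 1.171 | 4.372 | 1.430 | 0.880 | 0.991 | 0.568 |
| | | | | $BS_D$ | −16.608 | 1.259 | −15.165 | −18.234 | −17.617 | −19.975 |
| | | | | $MS_D$ | 1.651 | 1.122 | 1.541 | 1.803 | 1.748 | 1.972 |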
Table A3. Simulated average VaR estimates (expressed as a multiplier of $10^4$), relative bias and relative RMSE with $\delta = 0.99$ when the non-model selection approach is adopted for true models $LM(2.04, 2202.85)$ and $LN(1.19, 6.81)$. Values in bold indicate the results obtained under the true model.
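| True model ($h$) | n | Method | α | Statistic | FS ($j=1$) | FR ($j=2$) | LM ($j=3$) | LN ($j=4$) | PL ($j=5$) | WE ($j=6$) |
|---|---|---|---|---|---|---|---|---|---|---|
| $LM(2.04, 2202.85)$ | 50 | Likelihood based | | $AV_L$ | 4.016 | 687.827 | 2.014 | 2.681 | 3.158 | 1.327 |
| | | | | $BS_L$ | 16.594 | $5.34 \times 10^{3}$ | 1.000 | 6.201 | 9.913 | −4.349 |
| | | | | $MS_L$ | 2.087 | $5.20 \times 10^{3}$ | 1.000 | 1.084 | 1.587 | 0.694 |
| | | Divergence based | 0.05 | $AV_D$ | 4.393 | $2.36 \times 10^{3}$ | 2.080 | 2.858 | 3.477 | 1.206 |
| | | | | $BS_D$ | 19.534 | $1.84 \times 10^{4}$ | 1.518 | 7.575 | 12.396 | −5.291 |
| | | | | $MS_D$ | 2.486 | $3.03 \times 10^{4}$ | 1.077 | 1.285 | 1.992 | 0.585 |
| | | | 0.10 | $AV_D$ | 4.781 | 799.436 | 2.165 | 3.063 | 3.845 | 1.101 |
| | | | | $BS_D$ | 22.553 | $6.21 \times 10^{3}$ | 2.177 | 9.176 | 15.268 | −6.109 |
| | | | | $MS_D$ | 2.934 | $3.20 \times 10^{3}$ | 1.271 | 1.567 | 2.543 | 0.623 |
| | | | 0.20 | $AV_D$ | 5.573 | $4.34 \times 10^{3}$ | 2.493 | 3.607 | 4.769 | 0.962 |
| | | | | $BS_D$ | 28.725 | $3.38 \times 10^{4}$ | 4.733 | 13.411 | 22.462 | −7.192 |
| | | | | $MS_D$ | 4.009 | $3.27 \times 10^{4}$ | 2.324 | 2.621 | 4.336 | 0.701 |
| | 100 | Likelihood based | | $AV_L$ | 3.963 | 227.069 | 1.904 | 2.629 | 3.066 | 1.306 |
| | | | | $BS_L$ | 110.731 | $1.20 \times 10^{4}$ | 1.000 | 39.665 | 62.930 | −30.905 |
| | | | | $MS_L$ | 2.948 | $1.32 \times 10^{3}$ | 1.000 | 1.343 | 1.979 | 0.852 |
| | | Divergence based | 0.05 | $AV_D$ | 4.323 | 269.232 | 1.944 | 2.787 | 3.349 | 1.189 |
| | | | | $BS_D$ | 129.950 | $1.43 \times 10^{4}$ | 3.150 | 48.080 | 78.030 | −37.124 |
| | | | | $MS_D$ | 3.467 | $1.08 \times 10^{3}$ | 1.057 | 1.573 | 2.417 | 0.905 |
| | | | 0.10 | $AV_D$ | 4.680 | 341.819 | 1.989 | 2.964 | 3.655 | 1.091 |
| | | | | $BS_D$ | 148.990 | $1.81 \times 10^{4}$ | 5.549 | 57.516 | 94.355 | −42.327 |
| | | | | $MS_D$ | 4.009 | $1.53 \times 10^{3}$ | 1.159 | 1.857 | 2.937 | 0.991 |
| | | | 0.20 | $AV_D$ | 5.359 | 755.957 | 2.121 | 3.369 | 4.321 | 0.956 |
| | | | | $BS_D$ | 185.177 | $4.02 \times 10^{4}$ | 12.578 | 79.093 | 129.827 | −49.531 |
| | | | | $MS_D$ | 5.120 | $6.52 \times 10^{3}$ | 1.572 | 2.598 | 4.239 | 1.133 |
| $LN(1.19, 6.81)$ | 50 | Likelihood based | | $AV_L$ | 1.138 | 6.933 | 0.705 | 0.779 | 0.889 | 0.519 |
| | | | | $BS_L$ | 40.417 | 676.477 | −7.122 | 1.000 | 13.163 | −27.521 |
| | | | | $MS_L$ | 2.166 | 34.744 | 1.315 | 1.000 | 1.472 | 1.154 |
| | | Divergence based | 0.05 | $AV_D$ | 1.170 | 8.847 | 0.703 | 0.784 | 0.901 | 0.485 |
| | | | | $BS_D$ | 43.985 | 886.568 | −7.346 | 1.617 | 14.414 | −31.273 |
| | | | | $MS_D$ | 2.331 | 49.121 | 1.329 | 1.029 | 1.544 | 1.208 |
| | | | 0.10 | $AV_D$ | 1.196 | 10.905 | 0.695 | 0.790 | 0.907 | 0.448 |
| | | | | $BS_D$ | 46.764 | $1.11 \times 10^{3}$ | −8.226 | 2.209 | 15.133 | −35.335 |
| | | | | $MS_D$ | 2.480 | 67.405 | 1.346 | 1.075 | 1.616 | 1.302 |
| | | | 0.20 | $AV_D$ | 1.228 | 15.344 | 0.670 | 0.800 | 0.909 | 0.386 |
| | | | | $BS_D$ | 50.304 | $1.60 \times 10^{3}$ | −10.913 | 3.324 | 15.278 | −42.048 |
| | | | | $MS_D$ | 2.726 | 121.488 | 1.383 | 1.199 | 1.755 | 1.491 |
| | 100 | Likelihood based | | $AV_L$ | 1.137 | 6.732 | 0.720 | 0.787 | 0.891 | 0.538 |
| | | | | $BS_L$ | 20.945 | 340.029 | −2.821 | 1.000 | 6.946 | −13.194 |
| | | | | $MS_L$ | 2.631 | 40.628 | 1.352 | 1.000 | 1.529 | 1.502 |
| | | Divergence based | 0.05 | $AV_D$ | 1.164 | 8.029 | 0.711 | 0.789 | 0.897 | 0.497 |
| | | | | $BS_D$ | 22.471 | 414.027 | −3.356 | 1.083 | 7.283 | −15.548 |
| | | | | $MS_D$ | 2.808 | 49.078 | 1.332 | 1.010 | 1.577 | 1.650 |
| | | | 0.10 | $AV_D$ | 1.183 | 9.265 | 0.697 | 0.790 | 0.898 | 0.456 |
| | | | | $BS_D$ | 23.592 | 484.521 | −4.158 | 1.154 | 7.337 | −17.875 |
| | | | | $MS_D$ | 2.955 | 57.909 | 1.325 | 1.037 | 1.618 | 1.838 |
| | | | 0.20 | $AV_D$ | 1.204 | 11.749 | 0.659 | 0.791 | 0.889 | 0.390 |
| | | | | $BS_D$ | 24.805 | 626.190 | −6.330 | 1.241 | 6.800 | −21.625 |
| | | | | $MS_D$ | 3.163 | 77.375 | 1.351 | 1.124 | 1.683 | 2.175 |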
Table A4. Simulated average VaR estimates (expressed as a multiplier of $10^4$), relative bias and relative RMSE with $\delta = 0.99$ when the non-model selection approach is adopted for true models $PL(1.27, 1117.04)$ and $WE(0.79, 1690.57)$. Values in bold indicate the results obtained under the true model.
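| True model ($h$) | n | Method | α | Statistic | FS ($j=1$) | FR ($j=2$) | LM ($j=3$) | LN ($j=4$) | PL ($j=5$) | WE ($j=6$) |
|---|---|---|---|---|---|---|---|---|---|---|
| $PL(1.27, 1117.04)$ | 50 | Likelihood based | | $AV_L$ | 2.603 | 84.381 | 1.721 | 1.875 | 2.077 | 1.219 |
| | | | | $BS_L$ | 3.977 | 466.850 | −1.016 | −0.143 | 1.000 | −3.857 |
| | | | | $MS_L$ | 1.201 | 470.828 | 1.110 | 0.727 | 1.000 | 0.910 |
| | | Divergence based | 0.05 | $AV_D$ | 2.685 | 87.852 | 1.674 | 1.880 | 2.099 | 1.047 |
| | | | | $BS_D$ | 4.443 | 486.497 | −1.280 | −0.114 | 1.124 | −4.829 |
| | | | | $MS_D$ | 1.285 | 295.427 | 1.088 | 0.721 | 1.024 | 0.798 |
| | | | 0.10 | $AV_D$ | 2.764 | 111.779 | 1.623 | 1.898 | 2.124 | 0.914 |
| | | | | $BS_D$ | 4.885 | 621.925 | −1.568 | −0.016 | 1.264 | −5.586 |
| | | | | $MS_D$ | 1.385 | 543.075 | 1.102 | 0.752 | 1.076 | 0.860 |
| | | | 0.20 | $AV_D$ | 2.905 | 146.298 | 1.521 | 1.957 | 2.181 | 0.749 |
| | | | | $BS_D$ | 5.685 | 817.306 | −2.147 | 0.319 | 1.588 | −6.516 |
| | | | | $MS_D$ | 1.611 | 572.347 | 1.202 | 0.891 | 1.259 | 0.972 |
| | 100 | Likelihood based | | $AV_L$ | 2.547 | 57.314 | 1.645 | 1.854 | 2.002 | 1.219 |
| | | | | $BS_L$ | 6.348 | 543.883 | −2.504 | −0.455 | 1.000 | −6.687 |
| | | | | $MS_L$ | 1.444 | 322.779 | 1.065 | 0.793 | 1.000 | 1.158 |
| | | Divergence based | 0.05 | $AV_D$ | 2.623 | 54.594 | 1.579 | 1.853 | 2.014 | 1.044 |
| | | | | $BS_D$ | 7.089 | 517.186 | −3.153 | −0.470 | 1.118 | −8.407 |
| | | | | $MS_D$ | 1.567 | 131.347 | 1.029 | 0.791 | 1.028 | 1.269 |
| | | | 0.10 | $AV_D$ | 2.692 | 57.748 | 1.507 | 1.860 | 2.027 | 0.908 |
| | | | | $BS_D$ | 7.774 | 548.144 | −3.861 | −0.399 | 1.243 | −9.741 |
| | | | | $MS_D$ | 1.696 | 131.301 | 1.027 | 0.817 | 1.079 | 1.422 |
| | | | 0.20 | $AV_D$ | 2.813 | 69.931 | 1.361 | 1.894 | 2.055 | 0.743 |
| | | | | $BS_D$ | 8.955 | 667.718 | −5.290 | −0.066 | 1.513 | −11.362 |
| | | | | $MS_D$ | 1.944 | 172.143 | 1.100 | 0.918 | 1.217 | 1.634 |
| $WE(0.79, 1690.57)$ | 50 | Likelihood based | | $AV_L$ | 5.835 | $1.08 \times 10^{4}$ | 1.488 | 3.732 | 4.269 | 1.167 |
| | | | | $BS_L$ | $-2.95 \times 10^{3}$ | $-6.81 \times 10^{6}$ | −202.110 | $-1.62 \times 10^{3}$ | $-1.96 \times 10^{3}$ | 1.000 |
| | | | | $MS_L$ | 20.235 | $7.02 \times 10^{5}$ | 2.721 | 12.039 | 14.748 | 1.000 |
| | | Divergence based | 0.05 | $AV_D$ | 7.056 | $2.24 \times 10^{5}$ | 1.697 | 4.427 | 5.473 | 1.175 |
| | | | | $BS_D$ | $-3.72 \times 10^{3}$ | $-1.41 \times 10^{8}$ | −334.238 | $-2.06 \times 10^{3}$ | $-2.72 \times 10^{3}$ | −3.974 |
| | | | | $MS_D$ | 26.742 | $2.44 \times 10^{7}$ | 4.323 | 16.692 | 22.635 | 1.028 |
| | | | 0.10 | $AV_D$ | 8.534 | $1.47 \times 10^{7}$ | 2.070 | 5.408 | 7.306 | 1.181 |
| | | | | $BS_D$ | $-4.66 \times 10^{3}$ | $-9.33 \times 10^{9}$ | −570.471 | $-2.68 \times 10^{3}$ | $-3.88 \times 10^{3}$ | −8.247 |
| | | | | $MS_D$ | 36.210 | $1.70 \times 10^{9}$ | 9.252 | 28.572 | 40.200 | 1.076 |
| | | | 0.20 | $AV_D$ | 13.312 | $2.51 \times 10^{11}$ | 7.310 | 18.255 | 50.427 | 1.194 |
| | | | | $BS_D$ | $-7.68 \times 10^{3}$ | $-1.59 \times 10^{14}$ | $-3.88 \times 10^{3}$ | $-1.08 \times 10^{4}$ | $-3.12 \times 10^{4}$ | −16.199 |
| | | | | $MS_D$ | 143.473 | $2.89 \times 10^{13}$ | 182.046 | 960.227 | $4.16 \times 10^{3}$ | 1.268 |
| | 100 | Likelihood based | | $AV_L$ | 5.561 | $1.97 \times 10^{3}$ | 1.414 | 3.613 | 3.942 | 1.165 |
| | | | | $BS_L$ | $-1.41 \times 10^{3}$ | $-6.31 \times 10^{5}$ | −78.692 | −784.091 | −889.330 | 1.000 |
| | | | | $MS_L$ | 23.902 | $4.97 \times 10^{4}$ | 2.476 | 13.696 | 15.796 | 1.000 |
| | | Divergence based | 0.05 | $AV_D$ | 6.586 | $2.54 \times 10^{3}$ | 1.540 | 4.129 | 4.826 | 1.168 |
| | | | | $BS_D$ | $-1.74 \times 10^{3}$ | $-8.15 \times 10^{5}$ | −119.045 | −949.515 | $-1.17 \times 10^{3}$ | 0.135 |
| | | | | $MS_D$ | 29.921 | $1.12 \times 10^{5}$ | 3.390 | 16.782 | 21.422 | 1.009 |
| | | | 0.10 | $AV_D$ | 7.740 | $3.55 \times 10^{3}$ | 1.715 | 4.745 | 5.958 | 1.170 |
| | | | | $BS_D$ | $-2.11 \times 10^{3}$ | $-1.14 \times 10^{6}$ | −175.210 | $-1.15 \times 10^{3}$ | $-1.54 \times 10^{3}$ | −0.601 |
| | | | | $MS_D$ | 36.956 | $1.57 \times 10^{5}$ | 4.954 | 20.747 | 29.188 | 1.032 |
| | | | 0.20 | $AV_D$ | 10.388 | $9.62 \times 10^{3}$ | 2.402 | 6.405 | 9.232 | 1.174 |
| | | | | $BS_D$ | $-2.96 \times 10^{3}$ | $-3.08 \times 10^{6}$ | −395.750 | $-1.68 \times 10^{3}$ | $-2.59 \times 10^{3}$ | −1.889 |
| | | | | $MS_D$ | 54.636 | $3.82 \times 10^{5}$ | 15.422 | 33.644 | 56.212 | 1.121 |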
Table A5. VaR estimates (expressed as a multiplier of $10^4$), relative biases and relative RMSEs for $k^* = 1$, $\delta = 0.99$, when true models are $FS(1.43, 834.95)$, $FR(1.05, 518.75)$, $LM(2.04, 2202.85)$, $LN(1.19, 6.81)$, $PL(1.27, 1117.04)$ and $WE(0.79, 1690.57)$. Values in bold indicate the smallest $MS_s$ among all the estimators.
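| True model | True $Q_j(0.99)$ | n | Statistic | Likelihood based | Divergence based, $\alpha = 0.05$ | Divergence based, $\alpha = 0.10$ | Divergence based, $\alpha = 0.20$ |
|---|---|---|---|---|---|---|---|
| $FS(1.43, 834.95)$ | $Q_1(0.99) = 2.076$ | 50 | $AV_s$ | 1.130 | 1.146 | 1.187 | 1.255 |
| | | | $BS_s$ | −6.235 | −6.127 | −5.854 | −5.409 |
| | | | $MS_s$ | 2.324 | 2.477 | 2.734 | 3.331 |
| | | 100 | $AV_s$ | 0.949 | 0.962 | 0.945 | 0.929 |
| | | | $BS_s$ | −18.654 | −18.437 | −18.725 | −18.995 |
| | | | $MS_s$ | 2.587 | 2.678 | 2.701 | 2.794 |
| $FR(1.05, 518.75)$ | $Q_2(0.99) = 4.146$ | 50 | $AV_s$ | 2.097 | 2.085 | 2.091 | 2.056 |
| | | | $BS_s$ | −4.550 | −4.577 | −4.564 | −4.641 |
| | | | $MS_s$ | 1.143 | 1.137 | 1.147 | 1.165 |
| | | 100 | $AV_s$ | 2.101 | 2.103 | 2.107 | 2.106 |
| | | | $BS_s$ | −11.420 | −11.405 | −11.387 | −11.388 |
| | | | $MS_s$ | 1.774 | 1.775 | 1.781 | 1.806 |
| $LM(2.04, 2202.85)$ | $Q_3(0.99) = 1.885$ | 50 | $AV_s$ | 0.985 | 1.067 | 1.137 | 1.381 |
| | | | $BS_s$ | −7.015 | −6.372 | −5.826 | −3.930 |
| | | | $MS_s$ | 1.199 | 1.404 | 1.684 | 2.498 |
| | | 100 | $AV_s$ | 0.972 | 1.007 | 1.043 | 1.148 |
| | | | $BS_s$ | −48.666 | −46.801 | −44.888 | −39.318 |
| | | | $MS_s$ | 1.749 | 1.810 | 1.865 | 2.223 |
| $LN(1.19, 6.81)$ | $Q_4(0.99) = 0.770$ | 50 | $AV_s$ | 0.481 | 0.542 | 0.597 | 0.722 |
| | | | $BS_s$ | −31.646 | −24.982 | −18.901 | −5.191 |
| | | | $MS_s$ | 3.043 | 3.755 | 4.476 | 5.846 |
| | | 100 | $AV_s$ | 0.437 | 0.469 | 0.506 | 0.632 |
| | | | $BS_s$ | −18.949 | −17.164 | −15.006 | −7.862 |
| | | | $MS_s$ | 3.675 | 4.113 | 4.790 | 7.318 |
| $PL(1.27, 1117.04)$ | $Q_5(0.99) = 1.900$ | 50 | $AV_s$ | 1.011 | 1.096 | 1.118 | 1.264 |
| | | | $BS_s$ | −5.033 | −4.553 | −4.428 | −3.605 |
| | | | $MS_s$ | 1.509 | 2.095 | 2.309 | 3.120 |
| | | 100 | $AV_s$ | 0.963 | 0.968 | 0.964 | 0.969 |
| | | | $BS_s$ | −9.202 | −9.155 | −9.189 | −9.147 |
| | | | $MS_s$ | 2.134 | 2.153 | 2.155 | 2.212 |
| $WE(0.79, 1690.57)$ | $Q_6(0.99) = 1.168$ | 50 | $AV_s$ | 0.646 | 0.674 | 0.715 | 0.852 |
| | | | $BS_s$ | 330.374 | 312.732 | 286.627 | 200.286 |
| | | | $MS_s$ | 3.238 | 3.321 | 3.467 | 4.616 |
| | | 100 | $AV_s$ | 0.606 | 0.619 | 0.637 | 0.690 |
| | | | $BS_s$ | 180.380 | 176.138 | 170.449 | 153.560 |
| | | | $MS_s$ | 4.238 | 4.278 | 4.352 | 4.729 |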
Table A6. Proportion of models included for the VaR estimates reported in Table A5 for sample sizes 50 and 100 when the true models are $FS(1.43, 834.95)$, $FR(1.05, 518.75)$, $LM(2.04, 2202.85)$, $LN(1.19, 6.81)$, $PL(1.27, 1117.04)$ and $WE(0.79, 1690.57)$. Values in bold indicate the results obtained under the true model.
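| True model | n | Method | α | FS | FR | LM | LN | PL | WE |
|---|---|---|---|---|---|---|---|---|---|
| $FS(1.43, 834.95)$ | 50 | Likelihood | | 0.281 | 0.102 | 0.073 | 0.320 | 0.180 | 0.044 |
| | | Divergence | 0.05 | 0.284 | 0.100 | 0.072 | 0.308 | 0.199 | 0.037 |
| | | | 0.10 | 0.287 | 0.103 | 0.072 | 0.294 | 0.210 | 0.034 |
| | | | 0.20 | 0.273 | 0.103 | 0.064 | 0.304 | 0.222 | 0.034 |
| | 100 | Likelihood | | 0.408 | 0.028 | 0.054 | 0.271 | 0.234 | 0.005 |
| | | Divergence | 0.05 | 0.405 | 0.032 | 0.043 | 0.264 | 0.252 | 0.004 |
| | | | 0.10 | 0.394 | 0.029 | 0.043 | 0.262 | 0.268 | 0.004 |
| | | | 0.20 | 0.358 | 0.029 | 0.040 | 0.295 | 0.272 | 0.006 |
| $FR(1.05, 518.75)$ | 50 | Likelihood | | 0.031 | 0.887 | 0.000 | 0.082 | 0.000 | 0.000 |
| | | Divergence | 0.05 | 0.035 | 0.892 | 0.000 | 0.073 | 0.000 | 0.000 |
| | | | 0.10 | 0.040 | 0.895 | 0.000 | 0.065 | 0.000 | 0.000 |
| | | | 0.20 | 0.050 | 0.882 | 0.000 | 0.067 | 0.001 | 0.000 |
| | 100 | Likelihood | | 0.012 | 0.961 | 0.000 | 0.027 | 0.000 | 0.000 |
| | | Divergence | 0.05 | 0.015 | 0.964 | 0.000 | 0.021 | 0.000 | 0.000 |
| | | | 0.10 | 0.017 | 0.966 | 0.000 | 0.017 | 0.000 | 0.000 |
| | | | 0.20 | 0.017 | 0.961 | 0.000 | 0.022 | 0.000 | 0.000 |
| $LM(2.04, 2202.85)$ | 50 | Likelihood | | 0.063 | 0.007 | 0.341 | 0.257 | 0.091 | 0.241 |
| | | Divergence | 0.05 | 0.070 | 0.014 | 0.337 | 0.260 | 0.099 | 0.220 |
| | | | 0.10 | 0.076 | 0.016 | 0.307 | 0.264 | 0.115 | 0.222 |
| | | | 0.20 | 0.088 | 0.031 | 0.270 | 0.264 | 0.124 | 0.223 |
| | 100 | Likelihood | | 0.042 | 0.000 | 0.514 | 0.145 | 0.139 | 0.160 |
| | | Divergence | 0.05 | 0.048 | 0.001 | 0.511 | 0.144 | 0.146 | 0.150 |
| | | | 0.10 | 0.056 | 0.002 | 0.497 | 0.155 | 0.146 | 0.144 |
| | | | 0.20 | 0.074 | 0.004 | 0.436 | 0.177 | 0.152 | 0.157 |
| $LN(1.19, 6.81)$ | 50 | Likelihood | | 0.089 | 0.096 | 0.062 | 0.601 | 0.094 | 0.058 |
| | | Divergence | 0.05 | 0.097 | 0.115 | 0.053 | 0.586 | 0.104 | 0.045 |
| | | | 0.10 | 0.101 | 0.126 | 0.046 | 0.577 | 0.115 | 0.035 |
| | | | 0.20 | 0.112 | 0.156 | 0.031 | 0.547 | 0.125 | 0.029 |
| | 100 | Likelihood | | 0.106 | 0.031 | 0.026 | 0.730 | 0.094 | 0.013 |
| | | Divergence | 0.05 | 0.119 | 0.044 | 0.017 | 0.710 | 0.100 | 0.010 |
| | | | 0.10 | 0.131 | 0.056 | 0.016 | 0.693 | 0.097 | 0.007 |
| | | | 0.20 | 0.145 | 0.088 | 0.009 | 0.655 | 0.099 | 0.004 |
| $PL(1.27, 1117.04)$ | 50 | Likelihood | | 0.184 | 0.051 | 0.158 | 0.324 | 0.210 | 0.073 |
| | | Divergence | 0.05 | 0.194 | 0.058 | 0.154 | 0.310 | 0.220 | 0.064 |
| | | | 0.10 | 0.195 | 0.060 | 0.151 | 0.310 | 0.225 | 0.059 |
| | | | 0.20 | 0.197 | 0.070 | 0.144 | 0.303 | 0.221 | 0.065 |
| | 100 | Likelihood | | 0.238 | 0.011 | 0.177 | 0.243 | 0.320 | 0.011 |
| | | Divergence | 0.05 | 0.231 | 0.011 | 0.166 | 0.244 | 0.340 | 0.008 |
| | | | 0.10 | 0.218 | 0.010 | 0.155 | 0.253 | 0.355 | 0.009 |
| | | | 0.20 | 0.216 | 0.011 | 0.154 | 0.256 | 0.350 | 0.013 |
| $WE(0.79, 1690.57)$ | 50 | Likelihood | | 0.000 | 0.000 | 0.123 | 0.063 | 0.018 | 0.796 |
| | | Divergence | 0.05 | 0.003 | 0.000 | 0.132 | 0.073 | 0.019 | 0.773 |
| | | | 0.10 | 0.006 | 0.000 | 0.137 | 0.093 | 0.022 | 0.742 |
| | | | 0.20 | 0.014 | 0.002 | 0.149 | 0.132 | 0.024 | 0.679 |
| | 100 | Likelihood | | 0.001 | 0.000 | 0.140 | 0.005 | 0.000 | 0.854 |
| | | Divergence | 0.05 | 0.001 | 0.000 | 0.148 | 0.011 | 0.001 | 0.839 |
| | | | 0.10 | 0.001 | 0.000 | 0.153 | 0.015 | 0.004 | 0.827 |
| | | | 0.20 | 0.002 | 0.000 | 0.156 | 0.032 | 0.005 | 0.805 |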
Table A7. VaR estimates (expressed as a multiplier of $10^4$), relative biases and relative RMSEs for $k^* = 2$, $\delta = 0.99$, when true models are $FS(1.43, 834.95)$, $FR(1.05, 518.75)$, $LM(2.04, 2202.85)$, $LN(1.19, 6.81)$, $PL(1.27, 1117.04)$ and $WE(0.79, 1690.57)$. Values in bold indicate the smallest $MS_{k^*}$ among all the estimators.
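| True model | True $Q_j(0.99)$ | n | Statistic | Likelihood based | Divergence based, $\alpha = 0.05$ | Divergence based, $\alpha = 0.10$ | Divergence based, $\alpha = 0.20$ |
|---|---|---|---|---|---|---|---|
| $FS(1.43, 834.95)$ | $Q_1(0.99) = 2.076$ | 50 | $AV_{k^*}$ | 1.077 | 1.124 | 1.170 | 1.292 |
| | | | $BS_{k^*}$ | −6.584 | −6.269 | −5.970 | −5.162 |
| | | | $MS_{k^*}$ | 1.306 | 1.355 | 1.429 | 1.676 |
| | | 100 | $AV_{k^*}$ | 0.956 | 0.948 | 0.959 | 0.968 |
| | | | $BS_{k^*}$ | −18.547 | −18.680 | −18.491 | −18.349 |
| | | | $MS_{k^*}$ | 1.843 | 1.853 | 1.857 | 1.893 |
| $FR(1.05, 518.75)$ | $Q_2(0.99) = 4.146$ | 50 | $AV_{k^*}$ | 1.541 | 1.474 | 1.420 | 1.358 |
| | | | $BS_{k^*}$ | −5.785 | −5.935 | −6.054 | −6.192 |
| | | | $MS_{k^*}$ | 0.892 | 0.888 | 0.881 | 0.901 |
| | | 100 | $AV_{k^*}$ | 1.485 | 1.436 | 1.390 | 1.311 |
| | | | $BS_{k^*}$ | −14.859 | −15.131 | −15.389 | −15.831 |
| | | | $MS_{k^*}$ | 1.503 | 1.526 | 1.549 | 1.592 |
| $LM(2.04, 2202.85)$ | $Q_3(0.99) = 1.885$ | 50 | $AV_{k^*}$ | 1.120 | 1.225 | 1.396 | 1.876 |
| | | | $BS_{k^*}$ | −5.958 | −5.141 | −3.812 | −0.074 |
| | | | $MS_{k^*}$ | 0.812 | 0.865 | 1.141 | 2.350 |
| | | 100 | $AV_{k^*}$ | 1.081 | 1.151 | 1.226 | 1.406 |
| | | | $BS_{k^*}$ | −42.852 | −39.123 | −35.150 | −25.559 |
| | | | $MS_{k^*}$ | 1.132 | 1.117 | 1.118 | 1.590 |
| $LN(1.19, 6.81)$ | $Q_4(0.99) = 0.770$ | 50 | $AV_{k^*}$ | 0.499 | 0.563 | 0.642 | 0.816 |
| | | | $BS_{k^*}$ | −29.730 | −22.714 | −13.959 | 5.104 |
| | | | $MS_{k^*}$ | 1.640 | 1.804 | 2.189 | 3.209 |
| | | 100 | $AV_{k^*}$ | 0.483 | 0.537 | 0.586 | 0.719 |
| | | | $BS_{k^*}$ | −16.350 | −13.259 | −10.452 | −2.882 |
| | | | $MS_{k^*}$ | 2.044 | 2.253 | 2.495 | 3.568 |
| $PL(1.27, 1117.04)$ | $Q_5(0.99) = 1.900$ | 50 | $AV_{k^*}$ | 1.054 | 1.117 | 1.176 | 1.409 |
| | | | $BS_{k^*}$ | −4.790 | −4.436 | −4.099 | −2.783 |
| | | | $MS_{k^*}$ | 1.024 | 1.079 | 1.190 | 2.002 |
| | | 100 | $AV_{k^*}$ | 0.992 | 1.021 | 1.055 | 1.133 |
| | | | $BS_{k^*}$ | −8.916 | −8.636 | −8.297 | −7.532 |
| | | | $MS_{k^*}$ | 1.395 | 1.396 | 1.399 | 1.576 |
| $WE(0.79, 1690.57)$ | $Q_6(0.99) = 1.168$ | 50 | $AV_{k^*}$ | 0.761 | 0.908 | 1.196 | 2.774 |
| | | | $BS_{k^*}$ | 257.901 | 164.620 | −17.435 | −1015.696 |
| | | | $MS_{k^*}$ | 2.510 | 3.508 | 6.069 | 38.488 |
| | | 100 | $AV_{k^*}$ | 0.659 | 0.725 | 0.837 | 1.393 |
| | | | $BS_{k^*}$ | 163.240 | 142.046 | 106.259 | −71.952 |
| | | | $MS_{k^*}$ | 2.705 | 2.945 | 3.720 | 9.341 |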
Table A8. VaR estimates (expressed as a multiplier of $10^4$), relative biases and relative RMSEs for $k^* = 3$, $\delta = 0.99$, when true models are $FS(1.43, 834.95)$, $FR(1.05, 518.75)$, $LM(2.04, 2202.85)$, $LN(1.19, 6.81)$, $PL(1.27, 1117.04)$ and $WE(0.79, 1690.57)$. Values in bold indicate the smallest $MS_{k^*}$ among all the estimators.
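| True model | True $Q_j(0.99)$ | n | Statistic | Likelihood based | Divergence based, $\alpha = 0.05$ | Divergence based, $\alpha = 0.10$ | Divergence based, $\alpha = 0.20$ |
|---|---|---|---|---|---|---|---|
| $FS(1.43, 834.95)$ | $Q_1(0.99) = 2.076$ | 50 | $AV_{k^*}$ | 0.713 | 0.730 | 0.753 | 0.801 |
| | | | $BS_{k^*}$ | −8.976 | −8.868 | −8.717 | −8.399 |
| | | | $MS_{k^*}$ | 1.402 | 1.398 | 1.401 | 1.439 |
| | | 100 | $AV_{k^*}$ | 0.633 | 0.626 | 0.624 | 0.643 |
| | | | $BS_{k^*}$ | −23.890 | −23.996 | −24.034 | −23.722 |
| | | | $MS_{k^*}$ | 2.240 | 2.248 | 2.253 | 2.280 |
| $FR(1.05, 518.75)$ | $Q_2(0.99) = 4.146$ | 50 | $AV_{k^*}$ | 0.906 | 0.838 | 0.797 | 0.747 |
| | | | $BS_{k^*}$ | −7.195 | −7.347 | −7.438 | −7.548 |
| | | | $MS_{k^*}$ | 1.012 | 1.023 | 1.033 | 1.046 |
| | | 100 | $AV_{k^*}$ | 0.853 | 0.802 | 0.761 | 0.712 |
| | | | $BS_{k^*}$ | −18.388 | −18.671 | −18.898 | −19.172 |
| | | | $MS_{k^*}$ | 1.823 | 1.849 | 1.870 | 1.897 |
| $LM(2.04, 2202.85)$ | $Q_3(0.99) = 1.885$ | 50 | $AV_{k^*}$ | 0.849 | 0.927 | 1.045 | 1.367 |
| | | | $BS_{k^*}$ | −8.074 | −7.466 | −6.544 | −4.039 |
| | | | $MS_{k^*}$ | 0.844 | 0.829 | 0.871 | 1.297 |
| | | 100 | $AV_{k^*}$ | 0.824 | 0.891 | 0.956 | 1.111 |
| | | | $BS_{k^*}$ | −56.578 | −52.988 | −49.546 | −41.279 |
| | | | $MS_{k^*}$ | 1.325 | 1.264 | 1.219 | 1.255 |
| $LN(1.19, 6.81)$ | $Q_4(0.99) = 0.770$ | 50 | $AV_{k^*}$ | 0.332 | 0.367 | 0.410 | 0.498 |
| | | | $BS_{k^*}$ | −48.002 | −44.178 | −39.463 | −29.769 |
| | | | $MS_{k^*}$ | 1.789 | 1.742 | 1.748 | 1.893 |
| | | 100 | $AV_{k^*}$ | 0.330 | 0.357 | 0.393 | 0.461 |
| | | | $BS_{k^*}$ | −25.090 | −23.505 | −21.499 | −17.606 |
| | | | $MS_{k^*}$ | 2.601 | 2.526 | 2.462 | 2.548 |
| $PL(1.27, 1117.04)$ | $Q_5(0.99) = 1.900$ | 50 | $AV_{k^*}$ | 0.718 | 0.752 | 0.788 | 0.901 |
| | | | $BS_{k^*}$ | −6.693 | −6.498 | −6.297 | −5.659 |
| | | | $MS_{k^*}$ | 1.074 | 1.075 | 1.077 | 1.286 |
| | | 100 | $AV_{k^*}$ | 0.677 | 0.688 | 0.702 | 0.743 |
| | | | $BS_{k^*}$ | −12.012 | −11.896 | −11.765 | −11.364 |
| | | | $MS_{k^*}$ | 1.749 | 1.740 | 1.728 | 1.730 |
| $WE(0.79, 1690.57)$ | $Q_6(0.99) = 1.168$ | 50 | $AV_{k^*}$ | 0.777 | 0.975 | 1.283 | 6.694 |
| | | | $BS_{k^*}$ | 247.431 | 121.999 | −72.393 | −3494.306 |
| | | | $MS_{k^*}$ | 2.163 | 3.124 | 5.933 | 468.866 |
| | | 100 | $AV_{k^*}$ | 0.718 | 0.842 | 1.017 | 1.587 |
| | | | $BS_{k^*}$ | 144.572 | 104.563 | 48.518 | −134.183 |
| | | | $MS_{k^*}$ | 2.545 | 2.450 | 2.947 | 7.726 |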
Table A9. Proportion of inclusion of true model as per order of preference for the VaR estimates reported in Table A7 for sample sizes 50 and 100 when the true models are $FS(1.43, 834.95)$, $FR(1.05, 518.75)$, $LM(2.04, 2202.85)$, $LN(1.19, 6.81)$, $PL(1.27, 1117.04)$ and $WE(0.79, 1690.57)$.
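| True model | n | Estimator | α | τ | Order 1 | Order 2 |
|---|---|---|---|---|---|---|
| $FS(1.43, 834.95)$ | 50 | Likelihood | | 0.391 | 0.281 | 0.328 |
| | | Divergence | 0.05 | 0.362 | 0.284 | 0.354 |
| | | | 0.10 | 0.357 | 0.287 | 0.356 |
| | | | 0.20 | 0.360 | 0.273 | 0.367 |
| | 100 | Likelihood | | 0.231 | 0.408 | 0.361 |
| | | Divergence | 0.05 | 0.208 | 0.405 | 0.387 |
| | | | 0.10 | 0.185 | 0.394 | 0.421 |
| | | | 0.20 | 0.193 | 0.358 | 0.449 |
| $FR(1.05, 518.75)$ | 50 | Likelihood | | 0.065 | 0.887 | 0.048 |
| | | Divergence | 0.05 | 0.062 | 0.892 | 0.046 |
| | | | 0.10 | 0.058 | 0.895 | 0.047 |
| | | | 0.20 | 0.063 | 0.882 | 0.055 |
| | 100 | Likelihood | | 0.013 | 0.961 | 0.026 |
| | | Divergence | 0.05 | 0.013 | 0.964 | 0.023 |
| | | | 0.10 | 0.013 | 0.966 | 0.021 |
| | | | 0.20 | 0.021 | 0.961 | 0.018 |
| $LM(2.04, 2202.85)$ | 50 | Likelihood | | 0.322 | 0.341 | 0.337 |
| | | Divergence | 0.05 | 0.351 | 0.337 | 0.312 |
| | | | 0.10 | 0.388 | 0.307 | 0.305 |
| | | | 0.20 | 0.451 | 0.270 | 0.279 |
| | 100 | Likelihood | | 0.210 | 0.514 | 0.276 |
| | | Divergence | 0.05 | 0.237 | 0.511 | 0.252 |
| | | | 0.10 | 0.270 | 0.497 | 0.233 |
| | | | 0.20 | 0.339 | 0.436 | 0.225 |
| $LN(1.19, 6.81)$ | 50 | Likelihood | | 0.240 | 0.601 | 0.159 |
| | | Divergence | 0.05 | 0.236 | 0.586 | 0.178 |
| | | | 0.10 | 0.236 | 0.577 | 0.187 |
| | | | 0.20 | 0.243 | 0.547 | 0.210 |
| | 100 | Likelihood | | 0.152 | 0.730 | 0.118 |
| | | Divergence | 0.05 | 0.161 | 0.710 | 0.129 |
| | | | 0.10 | 0.168 | 0.693 | 0.139 |
| | | | 0.20 | 0.175 | 0.655 | 0.170 |
| $PL(1.27, 1117.04)$ | 50 | Likelihood | | 0.446 | 0.210 | 0.344 |
| | | Divergence | 0.05 | 0.434 | 0.220 | 0.346 |
| | | | 0.10 | 0.437 | 0.225 | 0.338 |
| | | | 0.20 | 0.470 | 0.221 | 0.309 |
| | 100 | Likelihood | | 0.236 | 0.320 | 0.444 |
| | | Divergence | 0.05 | 0.239 | 0.340 | 0.421 |
| | | | 0.10 | 0.251 | 0.355 | 0.394 |
| | | | 0.20 | 0.289 | 0.350 | 0.361 |
| $WE(0.79, 1690.57)$ | 50 | Likelihood | | 0.101 | 0.796 | 0.103 |
| | | Divergence | 0.05 | 0.122 | 0.773 | 0.105 |
| | | | 0.10 | 0.147 | 0.742 | 0.111 |
| | | | 0.20 | 0.206 | 0.679 | 0.115 |
| | 100 | Likelihood | | 0.034 | 0.854 | 0.112 |
| | | Divergence | 0.05 | 0.043 | 0.839 | 0.118 |
| | | | 0.10 | 0.060 | 0.827 | 0.113 |
| | | | 0.20 | 0.080 | 0.805 | 0.115 |

τ: non-inclusion of the VaR estimate of the true model.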
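The proportions in Table A9 can be read as the share of simulation runs in which the true model was ranked first, ranked second, or left out of the top $k^*$ models (τ). The sketch below shows one way to tabulate such proportions from per-run ranks. It is an illustration only: the function name `inclusion_proportions` and the rank values are hypothetical, and the interpretation is inferred from the fact that each τ/1/2 triple in the table sums to one and that the first-order column matches the true-model proportions in Table A6.

```python
from collections import Counter


def inclusion_proportions(true_model_ranks, k):
    """Share of runs with the true model at each rank 1..k, or excluded ('tau').

    true_model_ranks: rank of the true model in each simulation run (1 = best)
    k: number of top-ranked models retained
    """
    n_runs = len(true_model_ranks)
    # Any rank beyond k counts as non-inclusion of the true model
    counts = Counter(r if r <= k else "tau" for r in true_model_ranks)
    keys = ["tau", *range(1, k + 1)]
    return {key: counts.get(key, 0) / n_runs for key in keys}


# Hypothetical ranks of the true model over eight runs, for illustration only
ranks = [1, 2, 4, 1, 3, 2, 1, 6]
props = inclusion_proportions(ranks, k=2)
print(props)
```

Raising `k` moves mass from the τ entry into the additional rank columns while leaving the earlier rank columns unchanged.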
Table A10. Proportion of inclusion of true model as per order of preference for the VaR estimates reported in Table A8 for sample sizes 50 and 100 when the true models are $FS(1.43, 834.95)$, $FR(1.05, 518.75)$, $LM(2.04, 2202.85)$, $LN(1.19, 6.81)$, $PL(1.27, 1117.04)$ and $WE(0.79, 1690.57)$.
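| True model | n | Estimator | α | τ | Order 1 | Order 2 | Order 3 |
|---|---|---|---|---|---|---|---|
| $FS(1.43, 834.95)$ | 50 | Likelihood | | 0.172 | 0.281 | 0.328 | 0.219 |
| | | Divergence | 0.05 | 0.131 | 0.284 | 0.354 | 0.231 |
| | | | 0.10 | 0.108 | 0.287 | 0.356 | 0.249 |
| | | | 0.20 | 0.094 | 0.273 | 0.367 | 0.266 |
| | 100 | Likelihood | | 0.064 | 0.408 | 0.361 | 0.167 |
| | | Divergence | 0.05 | 0.046 | 0.405 | 0.387 | 0.162 |
| | | | 0.10 | 0.040 | 0.394 | 0.421 | 0.145 |
| | | | 0.20 | 0.036 | 0.358 | 0.449 | 0.157 |
| $FR(1.05, 518.75)$ | 50 | Likelihood | | 0.042 | 0.887 | 0.048 | 0.023 |
| | | Divergence | 0.05 | 0.038 | 0.892 | 0.046 | 0.024 |
| | | | 0.10 | 0.033 | 0.895 | 0.047 | 0.025 |
| | | | 0.20 | 0.033 | 0.882 | 0.055 | 0.030 |
| | 100 | Likelihood | | 0.008 | 0.961 | 0.026 | 0.005 |
| | | Divergence | 0.05 | 0.006 | 0.964 | 0.023 | 0.007 |
| | | | 0.10 | 0.005 | 0.966 | 0.021 | 0.008 |
| | | | 0.20 | 0.007 | 0.961 | 0.018 | 0.014 |
| $LM(2.04, 2202.85)$ | 50 | Likelihood | | 0.204 | 0.341 | 0.337 | 0.118 |
| | | Divergence | 0.05 | 0.234 | 0.337 | 0.312 | 0.117 |
| | | | 0.10 | 0.268 | 0.307 | 0.305 | 0.120 |
| | | | 0.20 | 0.329 | 0.270 | 0.279 | 0.122 |
| | 100 | Likelihood | | 0.102 | 0.514 | 0.276 | 0.108 |
| | | Divergence | 0.05 | 0.126 | 0.511 | 0.252 | 0.111 |
| | | | 0.10 | 0.152 | 0.497 | 0.233 | 0.118 |
| | | | 0.20 | 0.210 | 0.436 | 0.225 | 0.129 |
| $LN(1.19, 6.81)$ | 50 | Likelihood | | 0.096 | 0.601 | 0.159 | 0.144 |
| | | Divergence | 0.05 | 0.088 | 0.586 | 0.178 | 0.148 |
| | | | 0.10 | 0.089 | 0.577 | 0.187 | 0.147 |
| | | | 0.20 | 0.090 | 0.547 | 0.210 | 0.153 |
| | 100 | Likelihood | | 0.035 | 0.730 | 0.118 | 0.117 |
| | | Divergence | 0.05 | 0.036 | 0.710 | 0.129 | 0.125 |
| | | | 0.10 | 0.030 | 0.693 | 0.139 | 0.138 |
| | | | 0.20 | 0.028 | 0.655 | 0.170 | 0.147 |
| $PL(1.27, 1117.04)$ | 50 | Likelihood | | 0.125 | 0.210 | 0.344 | 0.321 |
| | | Divergence | 0.05 | 0.122 | 0.220 | 0.346 | 0.312 |
| | | | 0.10 | 0.129 | 0.225 | 0.338 | 0.308 |
| | | | 0.20 | 0.148 | 0.221 | 0.309 | 0.322 |
| | 100 | Likelihood | | 0.034 | 0.320 | 0.444 | 0.202 |
| | | Divergence | 0.05 | 0.034 | 0.340 | 0.421 | 0.205 |
| | | | 0.10 | 0.035 | 0.355 | 0.394 | 0.216 |
| | | | 0.20 | 0.046 | 0.350 | 0.361 | 0.243 |
| $WE(0.79, 1690.57)$ | 50 | Likelihood | | 0.070 | 0.796 | 0.103 | 0.031 |
| | | Divergence | 0.05 | 0.090 | 0.773 | 0.105 | 0.032 |
| | | | 0.10 | 0.119 | 0.742 | 0.111 | 0.028 |
| | | | 0.20 | 0.174 | 0.679 | 0.115 | 0.032 |
| | 100 | Likelihood | | 0.022 | 0.854 | 0.112 | 0.012 |
| | | Divergence | 0.05 | 0.030 | 0.839 | 0.118 | 0.013 |
| | | | 0.10 | 0.039 | 0.827 | 0.113 | 0.021 |
| | | | 0.20 | 0.067 | 0.805 | 0.115 | 0.013 |

τ: non-inclusion of the VaR estimate of the true model.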

Appendix B. VaR Estimation for δ = 0.95 Without Contamination

Table A11. Simulated average estimate of VaR with associated bias and RMSEs (relative bias and relative RMSEs of divergence estimators) for $\delta = 0.95$, when the true models are $FS(1.43, 834.95)$, $FR(1.05, 518.75)$, $LM(2.04, 2202.85)$, $LN(1.19, 6.81)$, $PL(1.27, 1117.04)$ and $WE(0.79, 1690.57)$, denoted as FS, FR, LM, LN, PL and WE, respectively. Values in bold indicate the results obtained under the true model.
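| n | Estimator | α | Statistic (a) | FS ($j=1$) | FR ($j=2$) | LM ($j=3$) | LN ($j=4$) | PL ($j=5$) | WE ($j=6$) |
|---|---|---|---|---|---|---|---|---|---|
| | | | $Q_j(\delta)$ | 0.654 | 0.857 | 0.736 | 0.342 | 0.662 | 0.678 |
| 50 | Likelihood | | $AV_L$ | 0.676 | 0.900 | 0.738 | 0.343 | 0.682 | 0.675 |
| | | | $BS_L^T$ | 0.022 | 0.060 | 0.001 | 0.001 | 0.021 | −0.003 |
| | | | $MS_L^T$ | 0.217 | 0.405 | 0.251 | 0.092 | 0.236 | 0.136 |
| | Divergence | 0.05 | $AV_D$ | 0.678 | 0.900 | 0.747 | 0.344 | 0.686 | 0.679 |
| | | | $BS_D$ | 1.075 | 1.004 | 8.170 | 3.643 | 1.171 | −0.221 |
| | | | $MS_D$ | 1.010 | 1.013 | 1.029 | 1.024 | 1.018 | 1.019 |
| | | 0.10 | $AV_D$ | 0.680 | 0.901 | 0.758 | 0.346 | 0.689 | 0.681 |
| | | | $BS_D$ | 1.149 | 1.026 | 15.599 | 6.036 | 1.336 | −1.230 |
| | | | $MS_D$ | 1.037 | 1.046 | 1.104 | 1.061 | 1.056 | 1.051 |
| | | 0.20 | $AV_D$ | 0.682 | 0.904 | 0.790 | 0.349 | 0.696 | 0.686 |
| | | | $BS_D$ | 1.282 | 1.096 | 39.296 | 10.224 | 1.664 | −2.911 |
| | | | $MS_D$ | 1.116 | 1.134 | 1.463 | 1.165 | 1.176 | 1.184 |
| 100 | Likelihood | | $AV_L$ | 0.664 | 0.893 | 0.727 | 0.348 | 0.678 | 0.676 |
| | | | $BS_L^T$ | 0.009 | 0.036 | −0.009 | 0.006 | 0.016 | −0.002 |
| | | | $MS_L^T$ | 0.141 | 0.256 | 0.172 | 0.062 | 0.157 | 0.099 |
| | Divergence | 0.05 | $AV_D$ | 0.665 | 0.894 | 0.734 | 0.348 | 0.680 | 0.677 |
| | | | $BS_D$ | 1.158 | 1.014 | 0.298 | 1.075 | 1.122 | 0.351 |
| | | | $MS_D$ | 1.021 | 1.012 | 1.024 | 1.008 | 1.019 | 1.006 |
| | | 0.10 | $AV_D$ | 0.667 | 0.895 | 0.740 | 0.348 | 0.682 | 0.678 |
| | | | $BS_D$ | 1.313 | 1.041 | −0.421 | 1.130 | 1.236 | −0.189 |
| | | | $MS_D$ | 1.056 | 1.040 | 1.073 | 1.032 | 1.057 | 1.023 |
| | | 0.20 | $AV_D$ | 0.669 | 0.896 | 0.758 | 0.349 | 0.686 | 0.680 |
| | | | $BS_D$ | 1.572 | 1.092 | −2.345 | 1.170 | 1.456 | −1.036 |
| | | | $MS_D$ | 1.140 | 1.112 | 1.270 | 1.112 | 1.162 | 1.095 |

(a) Expressed as a multiplier of $10^4$.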
Table A12. Simulated average estimate of VaR (expressed as a multiplier of $10^4$), relative bias and relative RMSE with $\delta = 0.95$ when the non-model selection approach is adopted for true models $FS(1.43, 834.95)$ and $FR(1.05, 518.75)$. Values in bold indicate the results obtained under the true model.
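| True model ($h$) | n | Method | α | Statistic | FS ($j=1$) | FR ($j=2$) | LM ($j=3$) | LN ($j=4$) | PL ($j=5$) | WE ($j=6$) |
|---|---|---|---|---|---|---|---|---|---|---|
| $FS(1.43, 834.95)$ | 50 | Likelihood based | | $AV_L$ | 0.676 | 2.498 | 0.663 | 0.686 | 0.629 | 0.683 |
| | | | | $BS_L$ | 1.000 | 84.546 | 0.371 | 1.460 | −1.156 | 1.307 |
| | | | | $MS_L$ | 1.000 | 17.973 | 1.164 | 1.094 | 0.997 | 1.610 |
| | | Divergence based | 0.05 | $AV_D$ | 0.678 | 2.941 | 0.647 | 0.673 | 0.619 | 0.598 |
| | | | | $BS_D$ | 1.075 | 104.831 | −0.358 | 0.841 | −1.609 | −2.580 |
| | | | | $MS_D$ | 1.010 | 19.103 | 1.086 | 1.017 | 0.976 | 1.070 |
| | | | 0.10 | $AV_D$ | 0.680 | 3.265 | 0.628 | 0.664 | 0.610 | 0.521 |
| | | | | $BS_D$ | 1.149 | 119.711 | −1.231 | 0.414 | −2.031 | −6.138 |
| | | | | $MS_D$ | 1.037 | 21.393 | 1.029 | 1.003 | 0.980 | 0.983 |
| | | | 0.20 | $AV_D$ | 0.682 | 3.926 | 0.586 | 0.655 | 0.595 | 0.430 |
| | | | | $BS_D$ | 1.282 | 150.006 | −3.156 | 0.029 | −2.722 | −10.281 |
| | | | | $MS_D$ | 1.116 | 36.205 | 0.981 | 1.071 | 1.035 | 1.174 |
| | 100 | Likelihood based | | $AV_L$ | 0.664 | 2.584 | 0.654 | 0.677 | 0.615 | 0.688 |
| | | | | $BS_L$ | 1.000 | 206.365 | −0.059 | 2.404 | −4.197 | 3.639 |
| | | | | $MS_L$ | 1.000 | 20.936 | 1.133 | 1.088 | 1.014 | 1.712 |
| | | Divergence based | 0.05 | $AV_D$ | 0.665 | 2.727 | 0.634 | 0.662 | 0.605 | 0.584 |
| | | | | $BS_D$ | 1.158 | 221.649 | −2.182 | 0.788 | −5.338 | −7.558 |
| | | | | $MS_D$ | 1.021 | 20.275 | 1.061 | 1.022 | 1.021 | 1.118 |
| | | | 0.10 | $AV_D$ | 0.667 | 2.858 | 0.612 | 0.651 | 0.595 | 0.510 |
| | | | | $BS_D$ | 1.313 | 235.765 | −4.532 | −0.328 | −6.374 | −15.491 |
| | | | | $MS_D$ | 1.056 | 20.429 | 1.033 | 1.020 | 1.050 | 1.283 |
| | | | 0.20 | $AV_D$ | 0.669 | 3.199 | 0.566 | 0.641 | 0.579 | 0.424 |
| | | | | $BS_D$ | 1.572 | 272.153 | −9.418 | −1.489 | −8.083 | −24.699 |
| | | | | $MS_D$ | 1.140 | 24.517 | 1.105 | 1.090 | 1.136 | 1.736 |
| $FR(1.05, 518.75)$ | 50 | Likelihood based | | $AV_L$ | 0.594 | 0.916 | 0.844 | 0.705 | 0.631 | 1.032 |
| | | | | $BS_L$ | −4.410 | 1.000 | −0.217 | −2.552 | −3.787 | 2.937 |
| | | | | $MS_L$ | 0.865 | 1.000 | 1.053 | 0.934 | 0.870 | 2.728 |
| | | Divergence based | 0.05 | $AV_D$ | 0.543 | 0.918 | 0.786 | 0.604 | 0.566 | 0.733 |
| | | | | $BS_D$ | −5.265 | 1.033 | −1.183 | −4.249 | −4.883 | −2.075 |
| | | | | $MS_D$ | 0.915 | 1.009 | 0.914 | 0.858 | 0.902 | 0.980 |
| | | | 0.10 | $AV_D$ | 0.502 | 0.922 | 0.728 | 0.532 | 0.511 | 0.551 |
| | | | | $BS_D$ | −5.954 | 1.098 | −2.154 | −5.451 | −5.805 | −5.126 |
| | | | | $MS_D$ | 0.978 | 1.040 | 0.836 | 0.929 | 0.973 | 0.956 |
| | | | 0.20 | $AV_D$ | 0.442 | 0.931 | 0.620 | 0.441 | 0.429 | 0.377 |
| | | | | $BS_D$ | −6.955 | 1.250 | −3.967 | −6.979 | −7.188 | −8.053 |
| | | | | $MS_D$ | 1.091 | 1.128 | 0.838 | 1.089 | 1.122 | 1.228 |
| | 100 | Likelihood based | | $AV_L$ | 0.581 | 0.893 | 0.824 | 0.685 | 0.616 | 1.005 |
| | | | | $BS_L$ | −7.591 | 1.000 | −0.894 | −4.731 | −6.622 | 4.073 |
| | | | | $MS_L$ | 1.236 | 1.000 | 1.041 | 1.073 | 1.167 | 2.672 |
| | | Divergence based | 0.05 | $AV_D$ | 0.532 | 0.894 | 0.768 | 0.590 | 0.554 | 0.706 |
| | | | | $BS_D$ | −8.925 | 1.014 | −2.431 | −7.323 | −8.325 | −4.150 |
| | | | | $MS_D$ | 1.373 | 1.012 | 0.972 | 1.210 | 1.320 | 1.109 |
| | | | 0.10 | $AV_D$ | 0.492 | 0.895 | 0.712 | 0.522 | 0.501 | 0.536 |
| | | | | $BS_D$ | −10.024 | 1.041 | −3.977 | −9.203 | −9.779 | −8.813 |
| | | | | $MS_D$ | 1.501 | 1.040 | 0.988 | 1.403 | 1.480 | 1.393 |
| | | | 0.20 | $AV_D$ | 0.433 | 0.896 | 0.608 | 0.433 | 0.421 | 0.372 |
| | | | | $BS_D$ | −11.648 | 1.092 | −6.852 | −11.641 | −11.995 | −13.328 |
| | | | | $MS_D$ | 1.705 | 1.112 | 1.176 | 1.701 | 1.753 | 1.927 |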
Table A13. Simulated average estimate of VaR (expressed as a multiplier of $10^4$), relative bias and relative RMSE with $\delta = 0.95$ when the non-model selection approach is adopted for true models $LM(2.04, 2202.85)$ and $LN(1.19, 6.81)$. Values in bold indicate the results obtained under the true model.
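| True model ($h$) | n | Method | α | Statistic | FS ($j=1$) | FR ($j=2$) | LM ($j=3$) | LN ($j=4$) | PL ($j=5$) | WE ($j=6$) |
|---|---|---|---|---|---|---|---|---|---|---|
| $LM(2.04, 2202.85)$ | 50 | Likelihood based | | $AV_L$ | 0.988 | 9.122 | 0.738 | 0.942 | 0.876 | 0.728 |
| | | | | $BS_L$ | 184.680 | 6149.505 | 1.000 | 151.066 | 102.676 | −6.317 |
| | | | | $MS_L$ | 1.591 | 126.745 | 1.000 | 1.480 | 1.282 | 1.193 |
| | | Divergence based | 0.05 | $AV_D$ | 1.047 | 15.587 | 0.747 | 0.988 | 0.924 | 0.681 |
| | | | | $BS_D$ | 227.613 | 10891.326 | 8.170 | 184.840 | 137.862 | −40.603 |
| | | | | $MS_D$ | 1.877 | 384.122 | 1.029 | 1.711 | 1.526 | 0.839 |
| | | | 0.10 | $AV_D$ | 1.104 | 14.643 | 0.758 | 1.040 | 0.976 | 0.635 |
| | | | | $BS_D$ | 269.891 | 10198.940 | 15.599 | 222.475 | 175.556 | −74.194 |
| | | | | $MS_D$ | 2.187 | 139.374 | 1.104 | 2.022 | 1.839 | 0.795 |
| | | | 0.20 | $AV_D$ | 1.214 | 30.757 | 0.790 | 1.163 | 1.087 | 0.570 |
| | | | | $BS_D$ | 350.259 | 22016.192 | 39.296 | 313.028 | 257.452 | −122.024 |
| | | | | $MS_D$ | 2.861 | 583.449 | 1.463 | 3.011 | 2.678 | 0.887 |
| | 100 | Likelihood based | | $AV_L$ | 0.984 | 7.419 | 0.727 | 0.933 | 0.869 | 0.721 |
| | | | | $BS_L$ | −26.911 | −725.582 | 1.000 | −21.380 | −14.366 | 1.664 |
| | | | | $MS_L$ | 1.972 | 71.493 | 1.000 | 1.722 | 1.439 | 1.077 |
| | | Divergence based | 0.05 | $AV_D$ | 1.043 | 9.264 | 0.734 | 0.977 | 0.915 | 0.673 |
| | | | | $BS_D$ | −33.286 | −925.900 | 0.298 | −26.136 | −19.439 | 6.863 |
| | | | | $MS_D$ | 2.348 | 81.529 | 1.024 | 2.003 | 1.732 | 0.923 |
| | | | 0.10 | $AV_D$ | 1.100 | 11.071 | 0.740 | 1.025 | 0.964 | 0.630 |
| | | | | $BS_D$ | −39.433 | −1122.061 | −0.421 | −31.340 | −24.697 | 11.544 |
| | | | | $MS_D$ | 2.734 | 101.687 | 1.073 | 2.343 | 2.074 | 0.954 |
| | | | 0.20 | $AV_D$ | 1.203 | 16.500 | 0.758 | 1.130 | 1.062 | 0.567 |
| | | | | $BS_D$ | −50.633 | −1711.510 | −2.345 | −42.771 | −35.315 | 18.440 |
| | | | | $MS_D$ | 3.490 | 223.177 | 1.270 | 3.174 | 2.860 | 1.173 |
| $LN(1.19, 6.81)$ | 50 | Likelihood based | | $AV_L$ | 0.363 | 0.903 | 0.327 | 0.343 | 0.330 | 0.313 |
| | | | | $BS_L$ | 31.249 | 853.066 | −23.533 | 1.000 | −17.735 | −44.701 |
| | | | | $MS_L$ | 1.139 | 7.982 | 1.005 | 1.000 | 1.033 | 1.020 |
| | | Divergence based | 0.05 | $AV_D$ | 0.369 | 1.061 | 0.326 | 0.344 | 0.332 | 0.297 |
| | | | | $BS_D$ | 40.970 | 1092.348 | −24.127 | 3.643 | −14.747 | −68.395 |
| | | | | $MS_D$ | 1.213 | 10.421 | 1.013 | 1.024 | 1.067 | 0.986 |
| | | | 0.10 | $AV_D$ | 0.374 | 1.216 | 0.324 | 0.346 | 0.333 | 0.279 |
| | | | | $BS_D$ | 48.263 | 1327.837 | −26.982 | 6.036 | −13.575 | −95.557 |
| | | | | $MS_D$ | 1.285 | 12.990 | 1.026 | 1.061 | 1.106 | 1.033 |
| | | | 0.20 | $AV_D$ | 0.379 | 1.501 | 0.318 | 0.349 | 0.332 | 0.248 |
| | | | | $BS_D$ | 56.620 | 1761.416 | −36.366 | 10.224 | −15.694 | −143.206 |
| | | | | $MS_D$ | 1.414 | 18.620 | 1.059 | 1.165 | 1.191 | 1.228 |
| | 100 | Likelihood based | | $AV_L$ | 0.365 | 0.921 | 0.335 | 0.348 | 0.334 | 0.323 |
| | | | | $BS_L$ | 4.093 | 101.479 | −1.191 | 1.000 | −1.351 | −3.316 |
| | | | | $MS_L$ | 1.151 | 10.689 | 1.038 | 1.000 | 1.012 | 1.096 |
| | | Divergence based | 0.05 | $AV_D$ | 0.371 | 1.049 | 0.333 | 0.348 | 0.335 | 0.304 |
| | | | | $BS_D$ | 5.025 | 123.842 | −1.542 | 1.075 | −1.211 | −6.653 |
| | | | | $MS_D$ | 1.230 | 12.933 | 1.028 | 1.008 | 1.034 | 1.082 |
| | | | 0.10 | $AV_D$ | 0.374 | 1.164 | 0.330 | 0.348 | 0.335 | 0.284 |
| | | | | $BS_D$ | 5.686 | 144.026 | −2.114 | 1.130 | −1.280 | −10.129 |
| | | | | $MS_D$ | 1.305 | 15.041 | 1.024 | 1.032 | 1.064 | 1.215 |
| | | | 0.20 | $AV_D$ | 0.378 | 1.375 | 0.320 | 0.349 | 0.331 | 0.250 |
| | | | | $BS_D$ | 6.312 | 181.018 | −3.791 | 1.170 | −1.904 | −16.054 |
| | | | | $MS_D$ | 1.427 | 19.165 | 1.042 | 1.112 | 1.140 | 1.614 |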
Table A14. Simulated average estimate of VaR (expressed as a multiplier of 10^4), relative bias and relative RMSE with δ = 0.95 when the non-model selection approach is adopted for true models PL(1.27, 1117.04) and WE(0.79, 1690.57). Values in bold indicate the results obtained under the true model.
True Model (h) n Method α Assumed Model
FS (j = 1) FR (j = 2) LM (j = 3) LN (j = 4) PL (j = 5) WE (j = 6)
PL(1.27, 1117.04) 50 Likelihood based AV_L 0.750 3.644 0.671 0.743 0.682 0.678
BS_L 4.306 145.089 0.443 3.971 1.000 0.780
MS_L 1.096 30.679 1.061 1.125 1.000 1.417
Divergence based 0.05 AV_D 0.766 4.407 0.663 0.746 0.686 0.610
BS_D 5.046 182.224 0.040 4.105 1.171 −2.538
MS_D 1.162 30.567 1.030 1.119 1.018 0.922
0.10 AV_D 0.780 5.046 0.652 0.752 0.689 0.549
BS_D 5.726 213.298 −0.480 4.379 1.336 −5.484
MS_D 1.239 40.763 1.016 1.163 1.056 0.843
0.20 AV_D 0.804 6.085 0.626 0.769 0.696 0.468
BS_D 6.895 263.827 −1.728 5.200 1.664 −9.418
MS_D 1.412 51.322 1.035 1.341 1.176 0.978
100 Likelihood based AV_L 0.748 3.607 0.672 0.744 0.678 0.685
BS_L 5.253 178.905 0.588 5.003 1.000 1.407
MS_L 1.187 30.734 1.050 1.210 1.000 1.322
Divergence based 0.05 AV_D 0.763 3.946 0.659 0.745 0.680 0.611
BS_D 6.142 199.446 −0.180 5.066 1.122 −3.105
MS_D 1.278 27.772 1.000 1.210 1.019 0.984
0.10 AV_D 0.776 4.230 0.644 0.749 0.682 0.548
BS_D 6.953 216.689 −1.098 5.267 1.236 −6.920
MS_D 1.375 29.582 0.968 1.249 1.057 1.027
0.20 AV_D 0.799 4.877 0.610 0.760 0.686 0.466
BS_D 8.320 256.007 −3.165 5.984 1.456 −11.895
MS_D 1.569 36.153 0.977 1.397 1.162 1.373
WE(0.79, 1690.57) 50 Likelihood based AV_L 1.287 28.842 0.681 1.183 1.078 0.675
BS_L −230.799 −1.067×10^4 −1.040 −191.338 −151.677 1.000
MS_L 5.408 1.350×10^3 1.110 4.793 3.929 1.000
Divergence based 0.05 AV_D 1.452 1.143×10^2 0.714 1.338 1.234 0.679
BS_D −293.208 −4.303×10^4 −13.784 −250.069 −210.582 −0.221
MS_D 6.939 1.476×10^4 1.373 6.422 5.509 1.019
0.10 AV_D 1.634 1.338×10^3 0.763 1.532 1.430 0.681
BS_D −362.149 −5.066×10^5 −32.376 −323.500 −284.783 −1.230
MS_D 8.792 2.928×10^5 2.019 9.156 7.869 1.051
0.20 AV_D 2.072 1.115×10^6 1.010 2.520 2.243 0.686
BS_D −527.976 −4.226×10^8 −125.888 −697.980 −592.831 −2.911
MS_D 16.435 2.594×10^8 9.900 91.794 66.216 1.184
100 Likelihood based AV_L 1.271 23.481 0.674 1.177 1.054 0.676
BS_L −349.095 −1.34×10^4 2.154 −293.847 −221.680 1.000
MS_L 6.598 528.717 1.100 5.741 4.464 1.000
Divergence based 0.05 AV_D 1.422 29.491 0.698 1.307 1.189 0.677
BS_D −438.486 −1.70×10^4 −11.884 −370.264 −300.824 0.351
MS_D 8.293 791.471 1.270 7.212 6.017 1.006
0.10 AV_D 1.584 38.143 0.730 1.455 1.346 0.678
BS_D −533.725 −2.21×10^4 −30.474 −457.465 −393.307 −0.189
MS_D 10.146 1.08×10^3 1.616 8.985 7.918 1.023
0.20 AV_D 1.925 74.205 0.830 1.821 1.730 0.680
BS_D −734.430 −4.33×10^4 −89.715 −673.064 −619.862 −1.036
MS_D 14.269 2.24×10^3 3.328 13.921 13.036 1.095
Table A15. VaR estimates (expressed as a multiplier of 10^4), relative biases and relative RMSEs for k* = 1, δ = 0.95, when true models are FS(1.43, 834.95), FR(1.05, 518.75), LM(2.04, 2202.85), LN(1.19, 6.81), PL(1.27, 1117.04) and WE(0.79, 1690.57). Values in bold indicate the smallest MS_s among all the estimators.
n Likelihood Based, Divergence Based (α = 0.05), (α = 0.10), (α = 0.20) | Likelihood Based, Divergence Based (α = 0.05), (α = 0.10), (α = 0.20)
True model: FS(1.43, 834.95) | True model: LN(1.19, 6.81)
True Q_1(0.95) = 0.654 | True Q_4(0.95) = 0.342
50 AV_s 0.340 0.340 0.342 0.345 | 0.174 0.181 0.187 0.201
BS_s −14.433 −14.430 −14.341 −14.198 | −254.946 −244.556 −235.040 −213.511
MS_s 2.373 2.397 2.441 2.554 | 2.770 2.845 2.936 3.140
100 AV_s 0.322 0.322 0.319 0.315 | 0.174 0.178 0.182 0.196
BS_s −35.577 −35.582 −35.937 −36.309 | −29.365 −28.728 −27.985 −25.662
MS_s 3.403 3.419 3.424 3.446 | 3.986 4.025 4.094 4.391
True model: FR(1.05, 518.75) | True model: PL(1.27, 1117.04)
True Q_2(0.95) = 0.857 | True Q_5(0.95) = 0.662
50 AV_s 0.434 0.431 0.430 0.422 | 0.343 0.349 0.351 0.362
BS_s −7.100 −7.145 −7.161 −7.290 | −15.527 −15.226 −15.127 −14.594
MS_s 1.656 1.655 1.664 1.678 | 2.135 2.204 2.230 2.384
100 AV_s 0.439 0.439 0.438 0.437 | 0.340 0.340 0.340 0.340
BS_s −11.492 −11.500 −11.505 −11.549 | −19.575 −19.534 −19.545 −19.556
MS_s 2.476 2.478 2.483 2.501 | 3.084 3.087 3.088 3.100
True model: LM(2.04, 2202.85) | True model: WE(0.79, 1690.57)
True Q_3(0.95) = 0.736 | True Q_6(0.95) = 0.678
50 AV_s 0.374 0.383 0.391 0.416 | 0.345 0.352 0.362 0.391
BS_s −265.734 −258.878 −252.968 −234.818 | 126.059 123.430 119.696 108.616
MS_s 2.190 2.219 2.266 2.423 | 3.607 3.620 3.642 3.793
100 AV_s 0.370 0.375 0.381 0.397 | 0.340 0.343 0.347 0.360
BS_s 39.792 39.264 38.602 36.897 | 199.204 197.236 194.804 187.452
MS_s 3.115 3.127 3.143 3.221 | 4.906 4.911 4.920 4.982
Table A16. VaR estimates (expressed as a multiplier of 10^4), relative biases and relative RMSEs for k* = 2, δ = 0.95, when true models are FS(1.43, 834.95), FR(1.05, 518.75), LM(2.04, 2202.85), LN(1.19, 6.81), PL(1.27, 1117.04) and WE(0.79, 1690.57). Values in bold indicate the smallest MS_k* among all the estimators.
n Likelihood Based, Divergence Based (α = 0.05), (α = 0.10), (α = 0.20) | Likelihood Based, Divergence Based (α = 0.05), (α = 0.10), (α = 0.20)
True model: FS(1.43, 834.95) | True model: LN(1.19, 6.81)
True Q_1(0.95) = 0.654 | True Q_4(0.95) = 0.342
50 AV_k* 0.333 0.337 0.340 0.348 | 0.174 0.181 0.190 0.208
BS_k* −14.723 −14.571 −14.438 −14.072 | −255.662 −244.217 −230.788 −203.567
MS_k* 1.602 1.600 1.604 1.616 | 1.922 1.873 1.833 1.784
100 AV_k* 0.322 0.320 0.320 0.318 | 0.176 0.182 0.188 0.202
BS_k* −35.578 −35.809 −35.786 −35.945 | −29.102 −28.004 −26.994 −24.523
MS_k* 2.428 2.441 2.444 2.464 | 2.738 2.672 2.615 2.529
True model: FR(1.05, 518.75) | True model: PL(1.27, 1117.04)
True Q_2(0.95) = 0.857 | True Q_5(0.95) = 0.662
50 AV_k* 0.373 0.359 0.348 0.331 | 0.344 0.350 0.355 0.372
BS_k* −8.118 −8.352 −8.544 −8.828 | −15.450 −15.162 −14.915 −14.111
MS_k* 1.269 1.291 1.308 1.348 | 1.458 1.445 1.441 1.492
100 AV_k* 0.368 0.356 0.345 0.328 | 0.342 0.345 0.349 0.356
BS_k* −13.451 −13.779 −14.061 −14.529 | −19.404 −19.223 −18.983 −18.546
MS_k* 1.953 1.995 2.033 2.098 | 2.102 2.087 2.069 2.060
True model: LM(2.04, 2202.85) | True model: WE(0.79, 1690.57)
True Q_3(0.95) = 0.736 | True Q_6(0.95) = 0.678
50 AV_k* 0.384 0.396 0.413 0.454 | 0.352 0.374 0.411 0.545
BS_k* −258.561 −249.563 −237.003 −206.978 | 123.459 115.159 101.175 50.371
MS_k* 1.500 1.473 1.459 1.513 | 2.497 2.431 2.433 3.894
100 AV_k* 0.382 0.391 0.402 0.424 | 0.340 0.351 0.368 0.441
BS_k* 38.522 37.485 36.308 33.889 | 199.122 192.602 182.416 139.645
MS_k* 2.136 2.093 2.048 2.003 | 3.469 3.391 3.298 3.236
Table A17. VaR estimates (expressed as a multiplier of 10^4), relative biases and relative RMSEs for k* = 3, δ = 0.95, when true models are FS(1.43, 834.95), FR(1.05, 518.75), LM(2.04, 2202.85), LN(1.19, 6.81), PL(1.27, 1117.04) and WE(0.79, 1690.57). Values in bold indicate the smallest MS_k* among all the estimators.
n Likelihood Based, Divergence Based (α = 0.05), (α = 0.10), (α = 0.20) | Likelihood Based, Divergence Based (α = 0.05), (α = 0.10), (α = 0.20)
True model: FS(1.43, 834.95) | True model: LN(1.19, 6.81)
True Q_1(0.95) = 0.654 | True Q_4(0.95) = 0.342
50 AV_k* 0.224 0.225 0.226 0.227 | 0.116 0.120 0.125 0.134
BS_k* −19.726 −19.697 −19.652 −19.583 | −343.706 −337.259 −329.954 −315.924
MS_k* 2.021 2.016 2.017 2.018 | 2.477 2.437 2.396 2.322
100 AV_k* 0.217 0.215 0.214 0.213 | 0.118 0.121 0.125 0.132
BS_k* −46.755 −46.980 −47.154 −47.253 | −39.309 −38.725 −38.005 −36.756
MS_k* 3.127 3.141 3.152 3.166 | 3.628 3.579 3.521 3.431
True model: FR(1.05, 518.75) | True model: PL(1.27, 1117.04)
True Q_2(0.95) = 0.857 | True Q_5(0.95) = 0.662
50 AV_k* 0.239 0.225 0.215 0.199 | 0.233 0.236 0.240 0.248
BS_k* −10.369 −10.600 −10.772 −11.039 | −20.857 −20.694 −20.526 −20.149
MS_k* 1.548 1.576 1.599 1.636 | 1.850 1.838 1.827 1.818
100 AV_k* 0.234 0.222 0.212 0.196 | 0.233 0.234 0.235 0.238
BS_k* −17.131 −17.456 −17.741 −18.182 | −26.072 −26.001 −25.918 −25.713
MS_k* 2.447 2.491 2.530 2.592 | 2.759 2.752 2.744 2.731
True model: LM(2.04, 2202.85) | True model: WE(0.79, 1690.57)
True Q_3(0.95) = 0.736 | True Q_6(0.95) = 0.678
50 AV_k* 0.270 0.280 0.293 0.324 | 0.274 0.301 0.337 0.486
BS_k* −342.361 −334.847 −325.115 −302.308 | 152.993 142.831 129.183 72.596
MS_k* 1.890 1.856 1.817 1.753 | 3.018 2.879 2.758 8.552
100 AV_k* 0.269 0.278 0.288 0.308 | 0.268 0.288 0.313 0.384
BS_k* 50.784 49.721 48.694 46.459 | 241.322 229.906 215.181 172.926
MS_k* 2.748 2.695 2.645 2.548 | 4.179 4.002 3.789 3.380

Appendix C. VaR Estimation for δ = 0.99 in the Presence of Contamination

Table A18. True VaR and the simulated average VaR estimates with associated biases and RMSEs (relative biases and relative RMSEs of divergence estimators) for δ = 0.99, in the presence of ϵ = 0.1 contamination, for true models LN(1.19, 6.81), PL(1.27, 1117.04) and WE(0.79, 1690.57), contaminated by LN(1.19, 18.54), PL(1.27, 89363.2) and WE(0.79, 84528.5), respectively.
n Estimator α Statistics (a) True Model
f_θ: LN(1.19, 6.81) PL(1.27, 1117.04) WE(0.79, 1690.57)
f_θ^(c): LN(1.19, 18.54) PL(1.27, 89363.2) WE(0.79, 84528.5)
Q_θj(δ) 0.770 1.900 1.168
50 Likelihood AV_L 1.53×10^3 481.674 8.504
BS_L^T 1526.004 481.659 7.335
MS_L^T 1673.085 531.072 8.340
Divergence 0.05 AV_D 45.113 15.815 6.378
BS_D 0.029 0.033 0.710
MS_D 0.028 0.034 0.702
0.10 AV_D 1.027 1.968 4.317
BS_D 1.69×10^−4 0.004 0.429
MS_D 3.18×10^−4 0.005 0.430
0.20 AV_D 0.976 1.063 1.907
BS_D 1.35×10^−4 0.002 0.101
MS_D 2.50×10^−4 0.002 0.147
100 Likelihood AV_L 8.345 13.409 8.345
BS_L^T 1490.96 11.508 7.177
MS_L^T 1560.373 12.553 7.682
Divergence 0.05 AV_D 6.204 9.732 6.204
BS_D 0.702 0.681 0.702
MS_D 0.697 0.692 0.697
0.10 AV_D 4.167 7.189 4.167
BS_D 0.418 0.460 0.418
MS_D 0.418 0.487 0.418
0.20 AV_D 1.843 4.555 1.843
BS_D 0.094 0.231 0.094
MS_D 0.115 0.284 0.115
(a) Expressed as a multiplier of 10^4.
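As a reading aid for Table A18, the relative biases of the divergence-based estimators appear to be the divergence-based bias divided by the likelihood-based bias BS_L^T. This reading is an inference from the tabulated numbers, not a definition quoted from the text, and the function name `relative_bias` is hypothetical. A minimal Python sketch that recomputes two entries under that assumption:

```python
def relative_bias(avg_estimate, true_var, likelihood_bias):
    """Bias of an estimator divided by the likelihood-based bias.

    Assumption: this ratio is how the BS_D entries were formed; it is
    inferred from the tabulated values, not taken from the paper's text.
    All inputs are in units of 10^4, as in Table A18.
    """
    return (avg_estimate - true_var) / likelihood_bias


# n = 50, alpha = 0.05, entries copied from Table A18.
bs_d_ln = relative_bias(45.113, 0.770, 1526.004)  # LN column
bs_d_we = relative_bias(6.378, 1.168, 7.335)      # WE column

print(round(bs_d_ln, 3), round(bs_d_we, 3))
```

Under this assumption the two ratios agree, to three decimals, with the tabulated BS_D entries 0.029 (LN) and 0.710 (WE).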
Table A19. Simulated average VaR estimates (expressed as a multiplier of 10^4) for δ = 0.99, relative bias and relative RMSE when a non-model selection approach is adopted for the true model LN(1.19, 6.81) with ϵ = 0.1 contamination from LN(1.19, 18.54). Values in bold indicate the results obtained under the true model.
True Model n Method α Assumed Model
FS FR LM LN PL WE
X^(o) ~ LN(1.19, 6.81); X^(c) ~ LN(1.19, 18.54) 50 Likelihood based AV_L 109.971 363.687 4.25×10^3 1.53×10^3 481.674 7.29×10^3
BS_L 0.072 0.238 2.787 1.000 0.315 4.779
MS_L 0.071 0.263 2.875 1.000 0.317 4.958
Divergence based 0.05 AV_D 5.804 52.335 188.278 45.113 15.815 1.09×10^3
BS_D 0.003 0.034 0.123 0.029 0.010 0.712
MS_D 0.004 0.045 0.130 0.028 0.011 0.692
0.10 AV_D 1.882 24.361 10.285 1.027 1.968 0.460
BS_D 0.001 0.015 0.006 1.69×10^−4 0.001 −2.03×10^−4
MS_D 0.001 0.024 0.008 3.18×10^−4 0.001 2.06×10^−4
0.20 AV_D 1.399 21.003 0.886 0.976 1.063 0.401
BS_D 4.12×10^−4 0.013 7.63×10^−5 1.35×10^−4 1.93×10^−4 −2.41×10^−4
MS_D 5.71×10^−4 0.023 4.99×10^−4 2.50×10^−4 4.16×10^−4 2.33×10^−4
100 Likelihood based AV_L 108.003 344.371 4.08×10^3 1.49×10^3 467.613 7.08×10^3
BS_L 0.072 0.230 2.735 1.000 0.313 4.748
MS_L 0.071 0.242 2.764 1.000 0.313 4.842
Divergence based 0.05 AV_D 5.594 46.778 178.362 44.869 15.068 1.07×10^3
BS_D 0.003 0.031 0.119 0.030 0.010 0.719
MS_D 0.003 0.037 0.122 0.029 0.010 0.710
0.10 AV_D 1.803 20.469 9.157 0.942 1.816 0.460
BS_D 0.001 0.013 0.006 1.15×10^−4 0.001 −2.08×10^−4
MS_D 0.001 0.017 0.006 2.07×10^−4 0.001 2.09×10^−4
0.20 AV_D 1.346 16.691 0.772 0.913 1.003 0.400
BS_D 3.87×10^−4 0.011 1.34×10^−6 9.61×10^−5 1.57×10^−4 −2.48×10^−4
MS_D 4.80×10^−4 0.016 2.50×10^−4 1.77×10^−4 2.92×10^−4 2.43×10^−4
Table A20. Simulated average VaR estimates for δ = 0.99 (expressed as a multiplier of 10^4), relative bias and relative RMSE when the non-model selection approach is adopted for the true model PL(1.27, 1117.04) with ϵ = 0.1 contamination from PL(1.27, 89363.2). Values in bold indicate the results obtained under the true model.
True Model n Method α Assumed Model
FS FR LM LN PL WE
X^(o) ~ PL(1.27, 1117.04); X^(c) ~ PL(1.27, 89363.2) 50 Likelihood based AV_L 109.971 363.687 4.25×10^3 1.53×10^3 481.674 7.29×10^3
BS_L 0.228 0.755 8.831 3.170 1.000 15.141
MS_L 0.223 0.830 9.058 3.152 1.000 15.620
Divergence based 0.05 AV_D 5.804 52.335 188.278 45.113 15.815 1.09×10^3
BS_D 0.012 0.109 0.391 0.094 0.033 2.256
MS_D 0.012 0.144 0.410 0.089 0.034 2.181
0.10 AV_D 1.882 24.361 10.285 1.027 1.968 0.460
BS_D 0.004 0.051 0.021 0.002 0.004 0.001
MS_D 0.004 0.075 0.026 0.002 0.005 0.001
0.20 AV_D 1.399 21.003 0.886 0.976 1.063 0.401
BS_D 0.003 0.044 0.002 0.002 0.002 0.001
MS_D 0.003 0.074 0.002 0.002 0.002 0.001
100 Likelihood based AV_L 10.483 1013.921 26.310 10.283 13.409 11.893
BS_L 0.746 87.937 2.121 0.728 1.000 0.868
MS_L 0.736 2116.957 2.170 0.721 1.000 1.381
Divergence based 0.05 AV_D 8.003 168.714 20.783 7.536 9.732 7.351
BS_D 0.530 14.495 1.641 0.490 0.681 0.474
MS_D 0.534 26.828 1.681 0.483 0.692 0.472
0.10 AV_D 6.406 173.302 15.590 5.641 7.189 4.092
BS_D 0.391 14.894 1.190 0.325 0.460 0.190
MS_D 0.408 28.656 1.243 0.331 0.487 0.198
0.20 AV_D 4.829 188.798 8.307 3.700 4.555 1.070
BS_D 0.254 16.240 0.557 0.156 0.231 −0.072
MS_D 0.288 41.126 0.649 0.190 0.284 0.074
Table A21. Simulated average estimate of VaR for δ = 0.99 (expressed as a multiplier of 10^4), relative bias and relative RMSE when the non-model selection approach is adopted for the true model WE(0.79, 1690.57) with ϵ = 0.1 contamination from WE(0.79, 84528.5). Values in bold indicate the results obtained under the true model.
True Model n Method α Assumed Model
FS FR LM LN PL WE
X^(o) ~ WE(0.79, 1690.57); X^(c) ~ WE(0.79, 84528.5) 50 Likelihood based AV_L 17.677 1.25×10^5 23.804 13.476 20.715 8.504
BS_L 2.251 1.70×10^4 3.086 1.678 2.665 1.000
MS_L 2.311 4.45×10^5 3.599 1.789 2.939 1.000
Divergence based 0.05 AV_D 17.644 2.29×10^4 23.628 13.174 21.365 6.378
BS_D 2.246 3.13×10^3 3.062 1.637 2.753 0.710
MS_D 2.429 2.42×10^4 4.025 2.080 3.400 0.702
0.10 AV_D 18.283 1.46×10^5 24.429 14.238 23.759 4.317
BS_D 2.333 1.98×10^4 3.171 1.782 3.080 0.429
MS_D 2.837 2.64×10^5 5.279 4.899 5.161 0.430
0.20 AV_D 30.019 5.19×10^15 56.731 407.538 2.43×10^3 1.907
BS_D 3.933 7.07×10^14 7.575 55.398 330.565 0.101
MS_D 35.030 1.97×10^16 48.841 1.48×10^3 9.06×10^3 0.147
100 Likelihood based AV_L 16.531 4.81×10^4 21.058 12.792 18.592 8.345
BS_L 2.141 6.70×10^3 2.771 1.620 2.428 1.000
MS_L 2.144 1.57×10^5 2.971 1.626 2.504 1.000
Divergence based 0.05 AV_D 16.194 4.86×10^3 19.856 12.083 18.256 6.204
BS_D 2.094 677.407 2.604 1.521 2.381 0.702
MS_D 2.136 2.93×10^3 2.870 1.555 2.533 0.697
0.10 AV_D 16.267 8.81×10^3 18.594 11.776 18.501 4.167
BS_D 2.104 1.23×10^3 2.428 1.478 2.415 0.418
MS_D 2.207 5.44×10^3 2.835 1.567 2.720 0.418
0.20 AV_D 17.708 6.54×10^4 18.254 12.589 21.715 1.843
BS_D 2.305 9.11×10^3 2.381 1.591 2.863 0.094
MS_D 2.631 7.40×10^4 3.738 1.938 4.027 0.115
Table A22. VaR estimates (expressed as a multiplier of 10^4), relative biases and relative RMSEs for k* = 1, δ = 0.99, when true models are LN(1.19, 6.81), PL(1.27, 1117.04) and WE(0.79, 1690.57), contaminated with ϵ = 0.1 from LN(1.19, 18.54), PL(1.27, 89363.2) and WE(0.79, 84528.5), respectively. Values in bold indicate the smallest MS_s among all the estimators.
True Model n Likelihood Based, Divergence Based (α = 0.05), (α = 0.10), (α = 0.20)
LN 50 AV_s 1.91×10^3 26.404 7.359 1.775
BS_s 1.249 0.017 0.004 0.001
MS_s 1.977 0.032 0.009 0.003
100 AV_s 1.92×10^3 23.821 7.191 1.415
BS_s 1.289 0.015 0.004 4.33×10^−4
MS_s 1.919 0.027 0.008 0.002
PL 50 AV_s 1.91×10^3 26.404 7.359 1.775
BS_s 3.96 0.055 0.015 0.004
MS_s 6.229 0.103 0.030 0.009
100 AV_s 18.307 15.576 12.506 8.123
BS_s 1.426 1.188 0.922 0.541
MS_s 2.357 2.228 2.079 1.745
WE 50 AV_s 11.977 11.996 12.041 11.941
BS_s 1.473 1.476 1.482 1.469
MS_s 2.570 2.806 3.114 5.429
100 AV_s 10.185 9.525 8.938 7.737
BS_s 1.256 1.164 1.083 0.915
MS_s 1.999 1.888 1.834 1.703
Table A23. VaR estimates (expressed as a multiplier of 10^4), relative biases and relative RMSEs for k* = 2, δ = 0.99, when true models are LN(1.19, 6.81), PL(1.27, 1117.04) and WE(0.79, 1690.57), contaminated with ϵ = 0.1 from LN(1.19, 18.54), PL(1.27, 89363.2) and WE(0.79, 84528.5), respectively. Values in bold indicate the smallest MS_k* among all the estimators.
True Model n Likelihood Based, Divergence Based (α = 0.05), (α = 0.10), (α = 0.20)
LN 50 AV_k* 1.15×10^3 59.950 5.073 1.543
BS_k* 0.755 0.039 0.003 0.001
MS_k* 0.778 0.042 0.004 0.002
100 AV_k* 1.10×10^3 56.131 4.692 1.287
BS_k* 0.740 0.037 0.003 0.000
MS_k* 0.747 0.038 0.003 0.001
PL 50 AV_k* 1.15×10^3 59.950 5.073 1.543
BS_k* 2.392 0.124 0.011 0.003
MS_k* 2.451 0.132 0.015 0.006
100 AV_k* 15.089 12.249 9.136 5.813
BS_k* 1.146 0.899 0.629 0.340
MS_k* 1.312 1.150 0.929 0.686
WE 50 AV_k* 11.439 11.600 12.220 18.288
BS_k* 1.400 1.422 1.507 2.334
MS_k* 1.621 1.780 2.209 11.184
100 AV_k* 9.718 9.297 9.037 9.442
BS_k* 1.191 1.133 1.097 1.153
MS_k* 1.250 1.203 1.230 1.852
Table A24. VaR estimates (expressed as a multiplier of 10^4), relative biases and relative RMSEs for k* = 3, δ = 0.99, when true models are LN(1.19, 6.81), PL(1.27, 1117.04) and WE(0.79, 1690.57), contaminated with ϵ = 0.1 from LN(1.19, 18.54), PL(1.27, 89363.2) and WE(0.79, 84528.5), respectively. Values in bold indicate the smallest MS_k* among all the estimators.
True Model n Likelihood Based, Divergence Based (α = 0.05), (α = 0.10), (α = 0.20)
LN 50 AV_k* 561.130 28.389 2.830 0.879
BS_k* 0.367 0.018 0.001 7.18×10^−5
MS_k* 0.378 0.020 0.002 7.18×10^−4
100 AV_k* 538.197 26.619 2.629 0.758
BS_k* 0.360 0.017 0.001 −7.91×10^−6
MS_k* 0.364 0.018 0.002 4.74×10^−4
PL 50 AV_k* 561.130 28.389 2.830 0.879
BS_k* 1.165 0.059 0.006 0.002
MS_k* 1.191 0.063 0.008 0.003
100 AV_k* 8.372 6.881 5.517 3.653
BS_k* 0.562 0.433 0.314 0.152
MS_k* 0.649 0.545 0.445 0.336
WE 50 AV_k* 7.262 7.370 7.567 272.877
BS_k* 0.831 0.845 0.872 37.041
MS_k* 0.991 1.098 1.267 994.725
100 AV_k* 6.234 6.025 5.885 6.195
BS_k* 0.706 0.677 0.657 0.700
MS_k* 0.756 0.746 0.765 1.042
Table A25. Proportion of models included for the VaR estimates reported in Table A22 for sample sizes 50 and 100 when the true models are LN(1.19, 6.81), PL(1.27, 1117.04) and WE(0.79, 1690.57), contaminated by LN(1.19, 18.54), PL(1.27, 89363.2) and WE(0.79, 84528.5), respectively, with ϵ = 0.1. Values in bold indicate the results obtained under the true model.
True Model n Method α Assumed Model
FS FR LM LN PL WE
LN 50 Likelihood 0.000 0.193 0.807 0.000 0.000 0.000
Divergence 0.05 0.000 0.953 0.047 0.000 0.000 0.000
0.10 0.220 0.769 0.010 0.000 0.000 0.001
0.20 0.231 0.324 0.012 0.289 0.098 0.046
100 Likelihood 0.000 0.099 0.901 0.000 0.000 0.000
Divergence 0.05 0.000 0.977 0.023 0.000 0.000 0.000
0.10 0.192 0.807 0.001 0.000 0.000 0.000
0.20 0.258 0.233 0.009 0.422 0.073 0.005
PL 50 Likelihood 0.000 0.193 0.807 0.000 0.000 0.000
Divergence 0.05 0.000 0.953 0.047 0.000 0.000 0.000
0.10 0.220 0.769 0.010 0.000 0.000 0.001
0.20 0.231 0.324 0.012 0.289 0.098 0.046
100 Likelihood 0.133 0.430 0.434 0.001 0.002 0.000
Divergence 0.05 0.301 0.386 0.311 0.000 0.002 0.000
0.10 0.511 0.335 0.152 0.000 0.002 0.000
0.20 0.718 0.223 0.044 0.009 0.006 0.000
WE 50 Likelihood 0.423 0.059 0.338 0.074 0.091 0.015
Divergence 0.05 0.477 0.061 0.288 0.075 0.082 0.017
0.10 0.507 0.061 0.262 0.088 0.063 0.019
0.20 0.433 0.057 0.203 0.160 0.070 0.077
100 Likelihood 0.434 0.005 0.425 0.021 0.115 0.000
Divergence 0.05 0.524 0.005 0.333 0.022 0.116 0.000
0.10 0.588 0.006 0.262 0.029 0.111 0.004
0.20 0.579 0.003 0.218 0.075 0.091 0.034
Table A26. Proportion of inclusion of true model as per order of preference for the VaR estimates reported in Table A23 for sample sizes 50 and 100 when the true models are LN(1.19, 6.81), PL(1.27, 1117.04) and WE(0.79, 1690.57) with contamination from LN(1.19, 18.54), PL(1.27, 89363.2) and WE(0.79, 84528.5), respectively, for ϵ = 0.1.
True Model n Method α Order of Preference
0^τ 1 2
LN 50 Likelihood 1.000 0.000 0.000
Divergence 0.05 1.000 0.000 0.000
0.10 1.000 0.000 0.000
0.20 0.397 0.289 0.314
100 Likelihood 1.000 0.000 0.000
Divergence 0.05 1.000 0.000 0.000
0.10 1.000 0.000 0.000
0.20 0.268 0.422 0.310
PL 50 Likelihood 1.000 0.000 0.000
Divergence 0.05 0.998 0.000 0.002
0.10 0.847 0.000 0.153
0.20 0.672 0.098 0.230
100 Likelihood 0.640 0.002 0.358
Divergence 0.05 0.664 0.002 0.334
0.10 0.525 0.002 0.473
0.20 0.317 0.006 0.677
WE 50 Likelihood 0.980 0.015 0.005
Divergence 0.05 0.981 0.017 0.002
0.10 0.975 0.019 0.006
0.20 0.886 0.077 0.037
100 Likelihood 1.000 0.000 0.000
Divergence 0.05 0.998 0.000 0.002
0.10 0.993 0.004 0.003
0.20 0.945 0.034 0.021
τ: non-inclusion of the VaR estimate of the true model.
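To illustrate how an order-of-preference table such as Table A26 can be tallied, the sketch below takes one ranked model list per simulation replication and reports the share of replications in which the true model is absent from the top k* (order 0) or appears at rank 1, ..., k*. The function `inclusion_proportions` and the toy rankings are hypothetical; they are not taken from the authors' code or data.

```python
from collections import Counter


def inclusion_proportions(rankings, true_model, k_star):
    """Share of replications by rank of the true model among the top k_star.

    rankings: list of model-name lists, best-ranked model first
              (one list per simulation replication; hypothetical input).
    Returns a list p where p[0] is the share of replications in which the
    true model is not among the top k_star, and p[r] (r >= 1) is the share
    in which it is ranked r-th.
    """
    counts = Counter()
    for order in rankings:
        top = list(order[:k_star])
        rank = top.index(true_model) + 1 if true_model in top else 0
        counts[rank] += 1
    n = len(rankings)
    return [counts[r] / n for r in range(k_star + 1)]


# Toy example with four replications and k* = 2 (illustrative only).
toy_rankings = [
    ["LN", "PL", "WE"],
    ["PL", "LN", "WE"],
    ["FS", "FR", "LN"],
    ["LN", "FS", "FR"],
]
props = inclusion_proportions(toy_rankings, true_model="LN", k_star=2)
print(props)
```

By construction the returned shares sum to one, which matches the row sums in Table A26.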
Table A27. Proportion of inclusion of true model as per order of preference for the VaR estimates reported in Table A24 for sample sizes 50 and 100 when the true models are LN(1.19, 6.81), PL(1.27, 1117.04) and WE(0.79, 1690.57) with contamination from LN(1.19, 18.54), PL(1.27, 89363.2) and WE(0.79, 84528.5), respectively, for ϵ = 0.1.
True Model n Method α Order of Preference
0^τ 1 2 3
LN 50 Likelihood 1.000 0.000 0.000 0.000
Divergence 0.05 1.000 0.000 0.000 0.000
0.10 0.992 0.000 0.000 0.008
0.20 0.171 0.289 0.314 0.226
100 Likelihood 1.000 0.000 0.000 0.000
Divergence 0.05 1.000 0.000 0.000 0.000
0.10 0.999 0.000 0.000 0.001
0.20 0.054 0.422 0.310 0.214
PL 50 Likelihood 0.000 0.000 0.000 1.000
Divergence 0.05 0.230 0.000 0.002 0.768
0.10 0.146 0.000 0.153 0.701
0.20 0.409 0.098 0.230 0.263
100 Likelihood 0.227 0.002 0.358 0.413
Divergence 0.05 0.215 0.002 0.334 0.449
0.10 0.105 0.002 0.473 0.420
0.20 0.133 0.006 0.677 0.184
WE 50 Likelihood 0.974 0.015 0.005 0.006
Divergence 0.05 0.968 0.017 0.002 0.013
0.10 0.950 0.019 0.006 0.025
0.20 0.839 0.077 0.037 0.047
100 Likelihood 0.997 0.000 0.000 0.003
Divergence 0.05 0.994 0.000 0.002 0.004
0.10 0.987 0.004 0.003 0.006
0.20 0.912 0.034 0.021 0.033
τ: non-inclusion of the VaR estimate of the true model.

Appendix D. VaR Estimation for δ = 0.95 in the Presence of Contamination

Table A28. True VaR and the simulated average VaR estimates with associated biases and RMSEs (relative biases and relative RMSEs of divergence estimators) for δ = 0.95, in the presence of ϵ = 0.1 contamination, for true models LN(1.19, 6.81), PL(1.27, 1117.04) and WE(0.79, 1690.57), contaminated by LN(1.19, 18.54), PL(1.27, 89363.2) and WE(0.79, 84528.5), respectively.
n Estimator α Statistics (a) True Model
f_θ: LN(1.19, 6.81) PL(1.27, 1117.04) WE(0.79, 1690.57)
f_θ^(c): LN(1.19, 18.54) PL(1.27, 89363.2) WE(0.79, 84528.5)
Q_θj(δ) 0.342 0.005 0.678
50 Likelihood AV_L 103.940 16.769 8.504
BS_L^T 103.598 16.764 2.689
MS_L^T 109.252 17.479 2.969
Divergence 0.05 AV_D 7.454 1.997 2.723
BS_D 0.069 0.119 0.761
MS_D 0.068 0.122 0.757
0.10 AV_D 0.426 0.532 1.998
BS_D 0.001 0.031 0.491
MS_D 0.001 0.033 0.497
0.20 AV_D 0.413 0.367 1.019
BS_D 0.001 0.022 0.127
MS_D 0.001 0.022 0.176
100 Likelihood AV_L 103.249 2.353 3.334
BS_L^T 102.907 1.691 2.656
MS_L^T 105.697 1.778 2.799
Divergence 0.05 AV_D 7.463 1.924 2.673
BS_D 0.069 0.747 0.751
MS_D 0.069 0.757 0.749
0.10 AV_D 0.402 1.583 1.949
BS_D 0.001 0.545 0.479
MS_D 0.001 0.570 0.481
0.20 AV_D 0.394 1.171 0.997
BS_D 0.001 0.301 0.120
MS_D 0.001 0.357 0.145
(a) Expressed as a multiplier of 10^4.
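The true VaR reported for the Weibull model can be reproduced from the closed-form quantile Q(δ) = b(−ln(1 − δ))^(1/a), assuming that WE(a, b) denotes shape a and scale b. This parametrization is an assumption; it is adopted here because it matches the tabulated values 0.678 × 10^4 at δ = 0.95 and 1.168 × 10^4 at δ = 0.99. A minimal Python sketch, with the hypothetical helper `weibull_var`:

```python
import math


def weibull_var(delta, shape, scale):
    """delta-level quantile (VaR) of a Weibull(shape, scale) loss.

    Assumption: WE(a, b) in the tables means shape a and scale b.
    """
    return scale * (-math.log(1.0 - delta)) ** (1.0 / shape)


# True model WE(0.79, 1690.57); results are rescaled to units of 10^4.
var_95 = weibull_var(0.95, shape=0.79, scale=1690.57) / 1e4
var_99 = weibull_var(0.99, shape=0.79, scale=1690.57) / 1e4
print(round(var_95, 3), round(var_99, 3))
```

The same check is not attempted for the LN and PL models, because their parametrizations are not stated in this appendix.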
Table A29. VaR estimates (expressed as a multiplier of 10^4), relative biases and relative RMSEs for k* = 1, δ = 0.95, when true models are LN(1.19, 6.81), PL(1.27, 1117.04) and WE(0.79, 1690.57), contaminated with ϵ = 0.1 from LN(1.19, 18.54), PL(1.27, 89363.2) and WE(0.79, 84528.5), respectively. Values in bold indicate the smallest MS_s among all the estimators.
n Likelihood Based, Divergence Based (α = 0.05), (α = 0.10), (α = 0.20)
True model: LN(1.19, 6.81); Q(0.95) = 0.342
50 AV_s 23.526 1.803 0.753 0.308
BS_s 0.224 0.014 0.004 0.000
MS_s 0.339 0.025 0.010 0.004
100 AV_s 24.759 1.762 0.774 0.282
BS_s 0.237 0.014 0.004 −0.001
MS_s 0.348 0.024 0.010 0.004
True model: PL(1.27, 1117.04); Q(0.95) = 0.005
50 AV_s 23.526 1.803 0.753 0.308
BS_s 1.403 0.107 0.045 0.018
MS_s 2.131 0.168 0.075 0.032
100 AV_s 1.887 1.661 1.386 1.008
BS_s 0.725 0.591 0.428 0.204
MS_s 1.381 1.274 1.139 0.918
True model: WE(0.79, 1690.57); Q(0.95) = 0.678
50 AV_s 1.607 1.590 1.576 1.478
BS_s 0.346 0.339 0.334 0.298
MS_s 0.699 0.713 0.735 0.809
100 AV_s 1.513 1.463 1.416 1.314
BS_s 0.314 0.295 0.278 0.240
MS_s 0.649 0.629 0.616 0.594
Table A30. VaR estimates (expressed as a multiplier of 10^4), relative biases and relative RMSEs for k* = 2, δ = 0.95, when true models are LN(1.19, 6.81), PL(1.27, 1117.04) and WE(0.79, 1690.57), contaminated with ϵ = 0.1 from LN(1.19, 18.54), PL(1.27, 89363.2) and WE(0.79, 84528.5), respectively. Values in bold indicate the smallest MS_k* among all the estimators.
n Likelihood
Based
Divergence Based
α = 0.05 α = 0.10 α = 0.20
True model: L N ( 1.19 , 6.81 ) ; Q ( 0.95 ) = 0.342
50 AV k * 16.7272.9220.5710.280
BS k * 0.1580.0250.002−0.001
MS k * 0.1580.0260.0040.002
100 AV k * 16.5952.8720.5640.265
BS k * 0.1580.0250.002−0.001
MS k * 0.1570.0250.0030.001
True model: P L ( 1.27 , 1117.04 ) ; Q ( 0.95 ) = 0.005
50 AV k * 16.7272.9220.5710.280
BS k * 0.9970.1740.0340.016
MS k * 1.0030.1790.0400.019
100 AV k * 1.6871.4331.1560.843
BS k * 0.6060.4560.2920.107
MS k * 0.6720.5620.4300.303
True model: L N ( 1.19 , 6.81 ) ; Q ( 0.95 ) = 0.342
True model: W E ( 0.79 , 1690.57 ) ; Q ( 0.95 ) = 0.678
50 AV k * 1.5701.5611.5701.654
BS k * 0.3320.3280.3320.363
MS k * 0.3650.3780.4120.782
100 AV k * 1.4801.4441.4151.405
BS k * 0.3020.2880.2780.274
MS k * 0.3160.3080.3090.343
Table A31. VaR estimates (expressed as a multiplier of $10^{4}$), relative biases and relative RMSEs for $k^{*} = 3$, $\delta = 0.95$, when the true models are $LN(1.19, 6.81)$, $PL(1.27, 1117.04)$ and $WE(0.79, 1690.57)$, contaminated with $\epsilon = 0.1$ from $LN(1.19, 18.54)$, $PL(1.27, 89363.2)$ and $WE(0.79, 84528.5)$, respectively. Values in bold indicate the smallest $MS_{k^{*}}$ among all the estimators.

| True model | $Q(0.95)$ | $n$ | Measure | Likelihood based | Divergence based, $\alpha = 0.05$ | Divergence based, $\alpha = 0.10$ | Divergence based, $\alpha = 0.20$ |
|---|---|---|---|---|---|---|---|
| $LN(1.19, 6.81)$ | 0.342 | 50 | $AV_{k^{*}}$ | 9.256 | 1.510 | 0.343 | 0.174 |
| | | | $BS_{k^{*}}$ | 0.086 | 0.011 | 0.000 | −0.002 |
| | | | $MS_{k^{*}}$ | 0.086 | 0.012 | 0.002 | 0.002 |
| | | 100 | $AV_{k^{*}}$ | 9.189 | 1.488 | 0.336 | 0.167 |
| | | | $BS_{k^{*}}$ | 0.086 | 0.011 | 0.000 | −0.002 |
| | | | $MS_{k^{*}}$ | 0.086 | 0.011 | 0.001 | 0.002 |
| $PL(1.27, 1117.04)$ | 0.005 | 50 | $AV_{k^{*}}$ | 9.256 | 1.510 | 0.343 | 0.174 |
| | | | $BS_{k^{*}}$ | 0.552 | 0.090 | 0.020 | 0.010 |
| | | | $MS_{k^{*}}$ | 0.554 | 0.092 | 0.023 | 0.011 |
| | | 100 | $AV_{k^{*}}$ | 1.018 | 0.880 | 0.750 | 0.560 |
| | | | $BS_{k^{*}}$ | 0.210 | 0.129 | 0.052 | −0.060 |
| | | | $MS_{k^{*}}$ | 0.268 | 0.210 | 0.163 | 0.153 |
| $WE(0.79, 1690.57)$ | 0.678 | 50 | $AV_{k^{*}}$ | 1.023 | 1.016 | 1.015 | 1.609 |
| | | | $BS_{k^{*}}$ | 0.128 | 0.126 | 0.125 | 0.346 |
| | | | $MS_{k^{*}}$ | 0.174 | 0.184 | 0.202 | 5.728 |
| | | 100 | $AV_{k^{*}}$ | 1.449 | 1.416 | 1.391 | 1.391 |
| | | | $BS_{k^{*}}$ | 0.290 | 0.278 | 0.268 | 0.268 |
| | | | $MS_{k^{*}}$ | 0.303 | 0.295 | 0.295 | 0.326 |

Figure 1. Fitted (a) CDFs and (b) PDFs of the six candidate models based on MLEs of parameters of the dataCar data.
Figure 2. Fitted CDFs of the six candidate models based on MLEs of parameters of the dataOhlsson data.
Table 1. Point estimates and associated MLEs of VaR values (expressed in multiples of $10^{3}$), denoted as $\hat{Q}_{\theta_j}(\cdot)$, of the considered models for $\delta = 0.95$ and $0.99$ for dataCar #.

| Distribution | Shape | Scale | $\hat{Q}_{\theta_j}(0.95)$ | $\hat{Q}_{\theta_j}(0.99)$ |
|---|---|---|---|---|
| FS | 1.43 | 834.95 | 6.51 | 20.60 |
| FR | 1.05 | 518.75 | 8.71 | 40.97 |
| LN | 1.19 | 6.81 | 7.33 | 18.74 |
| LM | 2.04 | 2202.85 | 6.42 | 14.43 |
| PL | 1.27 | 1117.04 | 6.54 | 18.66 |
| WE | 0.79 | 1690.57 | 6.83 | 11.81 |

# Available in the insuranceData package in R.
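As a rough check on how a VaR entry in Table 1 follows from the fitted parameters, the sketch below evaluates the Weibull quantile function at the tabulated WE estimates. It assumes the two-parameter Weibull with CDF $F(x) = 1 - \exp\{-(x/\text{scale})^{\text{shape}}\}$; this parametrization and the helper name `weibull_var` are assumptions made for illustration and are not taken from the article.

```python
import math


def weibull_var(delta, shape, scale):
    """VaR (quantile) at level delta for a Weibull distribution.

    Assumes the CDF F(x) = 1 - exp(-(x / scale) ** shape); this
    parametrization is an assumption, not confirmed by the article.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return scale * (-math.log(1.0 - delta)) ** (1.0 / shape)


# Rounded MLEs for the WE model as printed in Table 1.
shape_hat, scale_hat = 0.79, 1690.57

# Express the quantiles in multiples of 10^3, as in the table.
var_95 = weibull_var(0.95, shape_hat, scale_hat) / 1e3
var_99 = weibull_var(0.99, shape_hat, scale_hat) / 1e3

print(f"VaR(0.95) = {var_95:.2f} x 10^3")
print(f"VaR(0.99) = {var_99:.2f} x 10^3")
```

With the rounded parameters, both values fall within about 2% of the tabulated 6.83 and 11.81. The remaining gap is plausibly due to rounding of the printed shape and scale, although that explanation is itself an assumption.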
Table 2. VaR estimates (in $10^{3}$) of all the models for $\delta = 0.95$ and $0.99$, along with weighted VaR estimates, in addition to the associated model ranks based on likelihood and divergence method when $k^{*} = 2, 3$, respectively, for the dataCar dataset. Columns FS to WE give $\hat{Q}_j(\delta)$ or $\tilde{Q}_j(\delta)$; the last two columns give $\hat{Q}^{*}(\delta)$ or $\tilde{Q}^{*}(\delta)$.

| $\delta$ | Method | $\alpha$ | FS | FR | LM | LN | PL | WE | $k^{*} = 2$ | $k^{*} = 3$ |
|---|---|---|---|---|---|---|---|---|---|---|
| | Ranks | | 3 | 1 | 5 | 2 | 4 | 6 | | |
| 0.95 | Likelihood | | 6.51 | 8.71 | 7.33 | 6.42 | 6.54 | 6.83 | 7.56 | 7.21 |
| | Divergence | 0.05 | 6.31 | 8.89 | 7.31 | 6.14 | 6.33 | 6.36 | 7.52 | 7.12 |
| | | 0.10 | 6.04 | 8.94 | 7.29 | 5.86 | 6.07 | 5.87 | 7.40 | 6.95 |
| | | 0.20 | 5.44 | 8.72 | 6.99 | 5.24 | 5.46 | 4.88 | 6.99 | 6.48 |
| 0.99 | Likelihood | | 20.60 | 40.97 | 18.74 | 14.43 | 18.66 | 11.81 | 27.66 | 25.29 |
| | Divergence | 0.05 | 19.78 | 42.41 | 18.63 | 13.68 | 17.84 | 10.77 | 28.07 | 25.31 |
| | | 0.10 | 18.71 | 42.93 | 18.54 | 12.89 | 16.84 | 9.73 | 27.96 | 24.89 |
| | | 0.20 | 16.32 | 41.74 | 17.11 | 11.24 | 14.51 | 7.79 | 26.61 | 23.21 |

The model ranks are the same for the likelihood and all the values of $\alpha$ used for the divergence-based method for both the values of $\delta$ considered.
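To illustrate how a combined VaR can be formed from the $k^{*}$ best-ranked models, the sketch below takes the likelihood-based estimates and ranks from Table 2 at $\delta = 0.95$ and averages the top $k^{*}$ entries. Equal weights, the convention that rank 1 is the best model, and the helper name `top_k_average` are assumptions made here for illustration; this section does not state the article's data-driven weights, so this is not the authors' estimator.

```python
# Likelihood-based VaR estimates (in 10^3) at delta = 0.95, from Table 2.
var_95 = {"FS": 6.51, "FR": 8.71, "LM": 7.33, "LN": 6.42, "PL": 6.54, "WE": 6.83}

# Model ranks from Table 2 (rank 1 is assumed to be the best model).
ranks = {"FS": 3, "FR": 1, "LM": 5, "LN": 2, "PL": 4, "WE": 6}


def top_k_average(estimates, model_ranks, k_star):
    """Unweighted mean of the estimates from the k_star best-ranked models.

    Equal weighting is an illustrative assumption, not the article's scheme.
    """
    if not 1 <= k_star <= len(model_ranks):
        raise ValueError("k_star must be between 1 and the number of models")
    best = sorted(model_ranks, key=model_ranks.get)[:k_star]
    return sum(estimates[model] for model in best) / k_star


avg_k2 = top_k_average(var_95, ranks, 2)  # averages FR and LN
avg_k3 = top_k_average(var_95, ranks, 3)  # averages FR, LN and FS

print(f"k* = 2: {avg_k2:.3f}, k* = 3: {avg_k3:.3f}")
```

The equal-weight averages are close to the reported 7.56 and 7.21. At $\delta = 0.99$, however, the plain average of FR and LN is 27.70 against the reported 27.66, which suggests the actual weights are near but not exactly equal.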
Table 3. VaR estimates (in $10^{4}$) of all the models for $\delta = 0.95, 0.99$ along with weighted VaR estimates based on likelihood and divergence method when $k^{*} = 2$ and $3$, respectively, for the dataOhlsson dataset. Columns FS to WE give $\hat{Q}_j(\delta)$ or $\tilde{Q}_j(\delta)$; the last two columns give $\hat{Q}^{*}(\delta)$ or $\tilde{Q}^{*}(\delta)$.

| Method | $\delta$ | $\alpha$ | FS | FR | LM | LN | PL | WE | $k^{*} = 2$ | $k^{*} = 3$ |
|---|---|---|---|---|---|---|---|---|---|---|
| Likelihood | 0.95 | | 14.81 | 69.57 | 10.99 | 12.83 | 13.66 | 9.46 | 11.15 | 11.10 |
| | 0.99 | | 69.33 | 1190.63 | 35.77 | 38.59 | 59.02 | 17.52 | 28.05 | 30.62 |
| | Ranks | | 5 | 6 | 3 | 1 | 4 | 2 | | |
| Divergence | 0.95 | 0.05 | 15.76 | 82.58 | 12.11 | 13.53 | 14.72 | 9.30 | 12.82 | 13.45 |
| | | 0.10 | 16.49 | 95.45 | 12.59 | 14.19 | 15.68 | 9.12 | 13.39 | 14.15 |
| | | 0.20 | 17.19 | 121.24 | 14.49 | 15.30 | 17.16 | 8.66 | 14.90 | 15.65 |
| | 0.99 | 0.05 | 76.09 | 1469.99 | 43.81 | 41.36 | 66.67 | 17.09 | 42.59 | 50.61 |
| | | 0.10 | 81.45 | 1774.31 | 46.87 | 43.96 | 73.85 | 16.65 | 45.41 | 54.89 |
| | | 0.20 | 86.98 | 2452.49 | 61.32 | 48.42 | 84.97 | 15.55 | 54.86 | 64.89 |
| | Ranks | | 4 | 6 | 2 | 1 | 3 | 5 | | |

The model ranks are the same for all the values of $\alpha$ used for the divergence-based method.
Table 4. Model ranking for four candidate models with two or three parameters based on AIC and $RCC_{\alpha}$ for the vehicle insurance dataset.

| Models | AIC | $RCC_{\alpha}$, $\alpha = 0.05$ | $RCC_{\alpha}$, $\alpha = 0.1$ | $RCC_{\alpha}$, $\alpha = 0.2$ |
|---|---|---|---|---|
| FR | 77,195.22 | −122,251 | −40,651.3 | −9130.11 |
| EF | 76,956.88 | −122,350 | −40,750.3 | −9208.57 |
| WE | 78,987.19 | −120,976 | −39,756.6 | −8699.82 |
| EW | 77,304.19 | −122,147 | −40,571.3 | 25.31705 |
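The ranking implied by Table 4 can be sketched by sorting the models on each criterion. A smaller AIC indicates a preferred model; treating a smaller $RCC_{\alpha}$ as preferred as well is an assumption made here, and the helper name `rank_models` is hypothetical.

```python
def rank_models(criterion):
    """Rank models by ascending criterion value (rank 1 = smallest value)."""
    ordered = sorted(criterion, key=criterion.get)
    return {model: position for position, model in enumerate(ordered, start=1)}


# Criterion values copied from Table 4.
aic = {"FR": 77195.22, "EF": 76956.88, "WE": 78987.19, "EW": 77304.19}
rcc_020 = {"FR": -9130.11, "EF": -9208.57, "WE": -8699.82, "EW": 25.31705}

aic_ranks = rank_models(aic)
# Assumption: smaller RCC is better, mirroring the AIC convention.
rcc_020_ranks = rank_models(rcc_020)

for model in ("FR", "EF", "WE", "EW"):
    print(model, aic_ranks[model], rcc_020_ranks[model])
```

Under these conventions the AIC ordering for (FR, EF, WE, EW) is 2, 1, 4, 3 and the $RCC_{0.2}$ ordering is 2, 1, 3, 4, which coincides with the two rank rows reported in Table 5.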
Table 5. VaR estimates (in 1000) based on likelihood and divergence method for $\delta = 0.95$ for the four considered models. Columns FR to EW give $\hat{Q}_j(0.95)$ or $\tilde{Q}_j(0.95)$; the last two columns give $\hat{Q}^{*}(0.95)$ or $\tilde{Q}^{*}(0.95)$.

| Method | $\alpha$ | FR | EF | WE | EW | $k^{*} = 2$ | $k^{*} = 3$ |
|---|---|---|---|---|---|---|---|
| Likelihood | | 8.712 | 19.445 | 6.830 | 7.118 | 14.065 | 11.745 |
| Divergence | 0.05 | 8.895 | 15.142 | 6.364 | 6.959 | 12.020 | 10.334 |
| | 0.1 | 8.943 | 18.619 | 5.866 | 6.944 | 13.787 | 11.511 |
| Ranks | | 2 | 1 | 4 | 3 | | |
| Divergence | 0.2 | 8.729 | 21.478 | 4.884 | 0.001 | 15.134 | 11.837 |
| Ranks | | 2 | 1 | 3 | 4 | | |

The model ranks are the same for the likelihood method as well as the divergence-based method for $\alpha = 0.05$ and $0.10$.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Basu, S.; Ng, H.K.T. Model Misspecification and Data-Driven Model Ranking Approach for Insurance Loss and Claims Data. Risks 2025, 13, 231. https://doi.org/10.3390/risks13120231
