Model Misspecification and Data-Driven Model Ranking Approach for Insurance Loss and Claims Data

Basu, Suparna; Ng, Hon Keung Tony

doi:10.3390/risks13120231

Open AccessArticle

Model Misspecification and Data-Driven Model Ranking Approach for Insurance Loss and Claims Data

by

Suparna Basu

^1,*

and

Hon Keung Tony Ng

^2,*

¹

Department of Statistics, M.M.V, Banaras Hindu University, Varanasi 221005, India

²

Department of Mathematical Sciences, Bentley University, Waltham, MA 02452, USA

^*

Authors to whom correspondence should be addressed.

Risks 2025, 13(12), 231; https://doi.org/10.3390/risks13120231 (registering DOI)

Submission received: 30 August 2025 / Revised: 4 November 2025 / Accepted: 18 November 2025 / Published: 28 November 2025

(This article belongs to the Special Issue Financial Risk, Actuarial Science, and Applications of AI Techniques)

Download

Browse Figures

Versions Notes

Abstract

Statistical models are crucial in analyzing insurance loss and claims data, offering insights into various risk elements. The prevailing statistical notion that “all models are wrong, but some are useful” can wield significant influence, particularly when multiple competing statistical models are considered. This becomes particularly pertinent when all models portray similar characteristics within specific subsets of the support of the random variable under scrutiny. Since the actual model is unknown in practical scenarios, the challenge of model selection becomes daunting, complicating the study of associated characteristics of the actual data generation process. To address these challenges, the concept of model averaging is embraced. Often, averaging over multiple models helps alleviate the risk of model misspecification, as different models may capture distinct aspects of the data or modeling assumptions. This enhances the robustness of the estimation process, yielding a more accurate and reasonable estimate compared to relying solely on a single model. This paper introduces two novel data-based model selection methods—one using the likelihood function and the other using the density power divergence measure. The study focuses on estimating the Value-at-Risk (VaR) for non-life insurance claim size data, providing comprehensive insights into potential losses for insurers. The performance of the proposed procedures is evaluated through Monte Carlo simulations under both uncontaminated conditions and in the presence of data contamination. Additionally, the applicability of the methods is illustrated using two real non-life insurance datasets, with the VaR values estimated at different confidence levels.

Keywords:

contamination models; maximum likelihood estimation; minimum density power divergence estimation; model selection; statistical distributions; Value-at-Risk

1. Introduction

In the field of risk and insurance, probability theory and statistical models play important roles in various aspects, such as determining insurance pricing, assessing financial risks, and understanding claim size, claim count, and aggregate-loss distributions (Bahnemann 1996). Due to the stochastic nature of insurance data, the underlying data generation mechanism is always unknown, and choosing a suitable probability model to fit insurance data is a challenging problem. The statistical literature has considered different approaches to handling model uncertainty in such cases. For instance, one may use a statistical model with a larger number of parameters to increase flexibility and then test the significance of the added parameters. Another approach is to impose prior probability distributions on the model parameters in a Bayesian paradigm. While these methods offer benefits and advantages, they also come with their own set of drawbacks. For instance, increasing the number of model parameters amplifies model complexity and presents challenges in parameter estimation. Additionally, the roles of these added parameters in determining the distribution’s location, scale, and shape may become ambiguous. Furthermore, misspecification of the underlying probability model in fitting insurance data may lead to a substantial loss of efficiency in statistical analysis, inaccurate conclusions, and improper decisions. Therefore, it is crucial to identify the best-fit model from a pool of candidate models and understand the consequences of model misspecification in analyzing insurance risk data. This understanding significantly overcomes the negative impact of model misspecification and leads to feasible estimates of the characteristics under study. An interesting insight into the disparity arising from incorrect model identification followed by misspecified inferences can be found in Gustafson (2001).

The consequences of model misspecification have been explored across different domains in the statistical literature. Dumonceaux and Antle (1973) was the first to discuss the effects of model misspecification, in the form of a statistical hypothesis test problem, for analyzing the lifetimes of ball bearings using two competing probability models. Several studies have been dedicated to model discrimination and misspecification themes between two competing models. An interesting application analyzing electromigration failure in the interconnects of integrated circuits was presented by Basavalingappa et al. (2017). Besides such experiments being conducted in accelerated settings, followed by transforming the observed failures to represent lifetimes in regular conditions, the occurrence of failures in ICs is also sporadic. Another study in the context of a rare event, i.e., modeling high magnitude earthquakes in the Himalayan regions and the subsequent effects of model misspecification, can be found in Pasari (2018). The problem of model misspecification has also been explored for censored-sample cases (Block and Leemis 2008; Dey and Kundu 2012). Some other intriguing directions of model misspecification include the analysis of strength distribution for brittle materials by Basu et al. (2009) and one-shot device test data by Ling and Balakrishnan (2017). A comprehensive investigation into the impact of misspecification of statistical models in insurance and risk management, particularly concerning censored, truncated, and outlier-laden insurance loss and claims data, is of practical interest. Following the insights of scholars such as White (1982), Chow (1984), and Fomby and Hill (2003), we examine the misspecification of statistical models and investigate the behavior of point estimators for key risk measures.

Before conducting statistical inference (such as estimation, prediction, hypothesis testing, etc.) on insurance data, it is often assumed that the data originate from a specific parametric probability model on which the analysis is based. A vast array of probability distributions is available in the literature for fitting insurance data. For instance, distributions such as gamma, log-gamma, lognormal, and Weibull are commonly employed to model insurance claim size, while Poisson, negative binomial, and Delaporte distributions are often used to model insurance claim counts (Brazauskas and Kleefeld 2011; Hewitt and Lefkowitz 1979; Rolski et al. 1999; Zelinková 2015). However, in practical applications, when employing a statistical distribution for data analysis, the underlying probability distributions of various risk elements and the stochastic mechanisms that generate loss and claims data are often unknown. Additionally, the probability model may be misspecified, and the insurance datasets may be contaminated or contain outliers.

Broadly, an insurance company encounters two types of fund claim variables. One is the extent of the payout, in monetary terms, commonly referred to as the claim severity (size) variable, and the other is the variable representing the number of claims (also known as claim frequency) arising in a given period (Loss Data Analytics Core Team 2020). In this study, our interest lies in exploring the risks associated with the amount paid for the claim, commonly referred to as the claim size. An important characteristic of a claim severity model is its ability to represent the worst potential loss due to adverse events such as natural disasters, accidents, or other unforeseen circumstances that an insurer might experience over a specified duration at a certain confidence level

δ \in (0, 1)

, commonly referred to as Value-at-Risk (VaR) (Jorion 2006, chap. 5; Christoffersen 2012). Thus, VaR helps assess the potential financial impact of extreme but plausible events over a specific time horizon. By quantifying the maximum expected loss within a certain confidence level (e.g.,

95 %

or

99 %

), insurers can better understand their exposure to risk and make informed decisions regarding capital reserves, reinsurance arrangements, and pricing strategies. Let X be the random variable representing the loss size. For a pre-specified value

δ \in (0, 1)

(e.g.,

δ = 0.95

), a general definition of VaR with confidence level

100 δ %

is

Pr (X > V a R) \leq 1 - δ

. Thus, for a confidence level of

100 δ %

, the VaR can be interpreted as the threshold loss such that the probability of incurring a loss greater than this amount is at most

1 - δ

. If the loss variable X has CDF

F_{θ} (\cdot)

with parameter vector

θ

, then its quantile function is denoted by

Q_{θ} (δ) = F_{θ}^{- 1} (δ)

(Tse 2023). Therefore, the VaR with confidence level

100 δ %

corresponds to the

100 δ %

quantile of the loss distribution, i.e.,

V a R = Q_{θ} (δ)

, for a given time horizon.

In the context of the importance of selecting a suitable distribution for analyzing insurance data, Yu (2020) considered six parametric distributions for modeling and analyzing operational loss data related to external fraud types in retail banking branches of major commercial banks in China from 2009 to 2015. They observed that different distributional assumptions yielded significantly different VaR estimates, leading to inconsistent conclusions. As emphasized by Wang et al. (2020), selecting an appropriate claim severity distribution is crucial, especially in the presence of extreme values. Hogg and Klugman (1984) investigated claim severities in non-life insurance, such as those resulting from hurricane damage or automobile accidents, and noted their tendency to exhibit a heavy-tailed skewed nature. Various heavy-tailed as well as regular models, including gamma, lognormal, log-gamma, Weibull, Pareto, and log-logistic, have been employed in modeling non-life insurance data (see, for example, Kleiber and Kotz 2003; Klugman et al. 2012, and the references cited therein). Additionally, when claim sizes are large, settlements are often delayed due to case-by-case handling, necessitating consideration of the role of inflation, which can contaminate claim sizes (Hogg and Klugman 1984; Kaas et al. 2008). Hence, a robust estimation procedure is required to analyze claim severities.

In this paper, we aim to examine the effects of model misspecification on modeling insurance data. Since the underlying data-generating mechanism of insurance data is often unknown, and such data frequently contain influential observations and/or outliers, we propose two robust weighted estimation criteria based on likelihood and divergence measures. We then compare the performance of the proposed methods through Monte Carlo simulation. Our goal is to advance the understanding of model fitting and analysis for insurance and risk data, as well as the related inferential methods, and help practicing actuaries and academic actuaries better understand the effects and consequences of model misspecification when model uncertainty is present. With an emphasis on practical implementation and pedagogical clarity, this paper provides accessible and robust tools for practitioners and educators in actuarial science and risk analysis.

The rest of this paper is organized as follows. Section 2 provides an overview of the probability models used in this study, along with a brief review of their applications for non-life insurance claim size data. In Section 3, we discuss the estimation of model parameters and VaR using the likelihood method and density power divergence (DPD) method. Section 4 introduces the rationale of the proposed data-driven model ranking and selection approach based on likelihood and DPD measures, followed by a Monte Carlo simulation study in Section 5 that details the performance of the proposed procedures, considering both the presence and absence of data contamination. Section 6 presents two real data analyses to illustrate the proposed model ranking and selection method developed in this work. Finally, in Section 8, we provide concluding remarks and directions for future work.

2. Probability Models for Claim Severity

In this section, we consider six commonly used probability models in insurance and finance, namely the Fisk, Fréchet, Lomax, lognormal, paralogistic, and Weibull distributions, to model the severity of claims for non-life insurance data. For notation convenience, we denote these six models as

M_{j}; j = 1, \dots, 6

in the remaining length of the article, where

M_{1}, M_{2}, M_{3}, M_{4}, M_{5}

and

M_{6}

represents the Fisk, Fréchet, Lomax, lognormal, paralogistic, and Weibull distributions, respectively. The corresponding probability density functions (PDFs), cumulative distribution functions (CDFs), and VaR values of model

M_{j}

with parameter vectors

θ_{j}

are denoted as

f_{θ_{j}} (\cdot)

,

F_{θ_{j}} (\cdot)

and

Q_{θ_{j}} (δ)

,

δ \in (0, 1)

(

j = 1, \dots, 6

), respectively. Although we primarily considered six probability distributions in this paper, the methodologies developed can be applied to any candidate models with different probability distributions. Moreover, the methodologies can be extended to candidate statistical models with different numbers of parameters by suitably penalizing models with more parameters based on the practitioner’s preference and the parsimony principle. Additionally, the methodologies discussed here can be extended directly to truncated probability distributions, which are often more appropriate for claims subject to deductibles or policy limits.

2.1. Fisk Distribution

The Fisk distribution was originally proposed by Fisk (1961) as a limiting form of the Champernowne distribution Champernowne (1952) in connection with modeling income. A random variable X following a Fisk distribution (denoted as

F S (β_{1}, λ_{1})

) has PDF, CDF, and VaR

Q_{θ_{1}} (δ)

\begin{matrix} f_{θ_{1}} (x) & = & \{\begin{matrix} \frac{β_{1} λ_{1}^{β_{1}} x^{β_{1} - 1}}{{(x^{β_{1}} + λ_{1}^{β_{1}})}^{2}}, & for x > 0; \\ 0, & otherwise, \end{matrix} \end{matrix}

(1)

\begin{matrix} F_{θ_{1}} (x) & = & \{\begin{matrix} \frac{x^{β_{1}}}{x^{β_{1}} + λ_{1}^{β_{1}}}, & for x > 0; \\ 0, & otherwise, \end{matrix} \end{matrix}

(2)

\begin{matrix} Q_{θ_{1}} (δ) & = & \{\begin{matrix} λ_{1} {(\frac{δ}{1 - δ})}^{1 / β_{1}}, & for 0 < δ < 1; \\ 0, & otherwise, \end{matrix} \end{matrix}

(3)

respectively, where

θ_{1} = (β_{1}, λ_{1})

is the parameter vector with

β_{1} > 0

as the shape parameter and

λ_{1} > 0

representing the median of the distribution.

The Fisk distribution is also known as the log-logistic distribution because it can be obtained by using the logarithmic transformation of the logistic distribution. The Fisk distribution offers insight into the distribution of income sizes, derived from the reasoning of income generation processes Singh and Maddala (1976). In the context of income inequality, the Fisk distribution provides a mathematical representation of the unequal distribution of income. For more details on income distribution and income inequality, the readers may refer to Kakwani (1980) and Kakwani and Son (2022). Buch-larsen et al. (2005) discussed the suitability of the Fisk distribution to model non-life insurance data since it is capable of accommodating small as well as large claim sizes, owing to its heavy-tailed nature, which is otherwise a challenge for regular models.

2.2. Fréchet Distribution

The Fréchet distribution, named after mathematician Maurice René Fréchet, is a probability distribution derived as the limiting distribution of extreme values. It is also known as the inverse Weibull distribution, which has various applications in economics, hydrology, and meteorology to model extreme events or phenomena. It is often employed to describe the distribution of extreme values in a dataset, such as maximum wind speeds, flood levels, or income levels (Kotz and Nadarajah 2000).

A random variable X follows a Fréchet distribution (denoted as

F R (β_{2}, λ_{2})

) with PDF, CDF, and VaR

Q_{θ_{2}} (δ)

:

\begin{matrix} f_{θ_{2}} (x) & = & \{\begin{matrix} β_{2} λ_{2}^{β_{2}} x^{- (β_{2} + 1)} e^{- {(x / λ_{2})}^{- β_{2}}}, & for x > 0; \\ 0, & otherwise, \end{matrix} \end{matrix}

(4)

\begin{matrix} F_{θ_{2}} (x) & = & \{\begin{matrix} e^{- {(x / λ_{2})}^{- β_{2}}}, & for x > 0; \\ 0, & otherwise, \end{matrix} \end{matrix}

(5)

\begin{matrix} Q_{θ_{2}} (δ) & = & \{\begin{matrix} λ_{2} {(- ln δ)}^{- 1 / β_{2}}, & for 0 < δ < 1; \\ 0, & otherwise, \end{matrix} \end{matrix}

(6)

where

θ_{2} = (β_{2}, λ_{2})

is a parameter vector,

λ_{2} > 0

is the scale parameter and

β_{2} > 0

is the shape parameter.

2.3. Lomax Distribution

The Lomax distribution was introduced by K. S. Lomax (1954), which is commonly employed in modeling income distributions, insurance claim sizes, and other phenomena where observations are bounded from below and exhibit heavy-tailed behavior. This distribution offers valuable insights into extreme events and tail behavior, making it a valuable tool in various analytical and modeling contexts. A random variable X has a Lomax distribution (denoted as

L M (β_{3}, λ_{3})

) with PDF, CDF, and VaR

Q_{θ_{3}} (δ)

:

\begin{matrix} f_{θ_{3}} (x) & = & \{\begin{matrix} β_{3} λ_{3}^{β_{3}} {(x + λ_{3})}^{- (β_{3} + 1)}, & for x > 0; \\ 0, & otherwise, \end{matrix} \end{matrix}

(7)

\begin{matrix} F_{θ_{3}} (x) & = & \{\begin{matrix} 1 - {(\frac{λ_{3}}{x + λ_{3}})}^{β_{3}}, & for x > 0; \\ 0, & otherwise, \end{matrix} \end{matrix}

(8)

\begin{matrix} Q_{θ_{3}} (δ) & = & \{\begin{matrix} λ_{3} ({(1 - δ)}^{- 1 / β_{3}} - 1), & for 0 < δ < 1; \\ 0, & otherwise, \end{matrix} \end{matrix}

(9)

respectively, where

θ_{3} = (β_{3}, λ_{3})

is a parameter vector,

λ_{3} > 0

is the scale parameter and

β_{3} > 0

is the shape parameter.

2.4. Lognormal Distribution

The lognormal distribution is often used to model phenomena where the logarithm of the variable of interest is normally distributed, such as the distribution of stock prices, income levels, or the size of biological populations. Mathematically, if a random variable

Z = ln X

is normally distributed, then the distribution of X is said to be lognormal. It is particularly useful when dealing with variables that are inherently positive and skewed, as the lognormal distribution naturally accommodates these characteristics. The lognormal distribution is also known as the Galton distribution. For more details on lognormal distribution, one may refer to Chapter 14 of Johnson et al. (1994), the book by Crow and Shimizu (1988), and the references therein.

The PDF, CDF, and VaR

Q_{θ_{4}} (δ)

of a lognormal variate X (denoted as

L N (β_{4}, λ_{4})

) are

\begin{matrix} f_{θ_{4}} (x) & = & \{\begin{matrix} \frac{1}{\sqrt{2 π} β_{4} x} exp [- \frac{{(ln x - λ_{4})}^{2}}{2 β_{4}^{2}}], & for x > 0; \\ 0, & otherwise, \end{matrix} \end{matrix}

(10)

\begin{matrix} F_{θ_{4}} (x) & = & \{\begin{matrix} Φ (\frac{ln x - λ_{4}}{β_{4}}), & for x > 0; \\ 0, & otherwise, \end{matrix} \end{matrix}

(11)

\begin{matrix} Q_{θ_{4}} (δ) & = & \{\begin{matrix} exp (λ_{4} + β_{4} Φ^{- 1} (δ)), & for 0 < δ < 1; \\ 0, & otherwise, \end{matrix} \end{matrix}

(12)

respectively, where

θ_{4} = (β_{4}, λ_{4})

is a parameter vector with

exp (λ_{4})

as the scale parameter,

β_{4} > 0

as the shape parameter,

Φ (\cdot)

and

Φ^{- 1} (\cdot)

representing the CDF and inverse CDF (quantile function) of the standard normal distribution.

2.5. Paralogistic Distribution

The paralogistic distribution is a statistical distribution used primarily in econometrics, finance, and risk management, which is characterized by its flexibility in modeling skewed or heavy-tailed data. As demonstrated by Kleiber and Kotz (2003), the paralogistic distribution can be applied to economic modeling of wealth and income. The paralogistic distribution allows for asymmetry and heavier tails, which provides flexibility in capturing the tail behavior of the data.

A random variable X has a paralogistic distribution (denoted as

P L (β_{5}, λ_{5})

) with PDF, CDF, and VaR

Q_{θ_{5}} (δ)

:

\begin{matrix} f_{θ_{5}} (x) & = & \{\begin{matrix} \frac{β_{5}^{2} {(x / λ_{5})}^{β_{5}}}{x {(1 + {(x / λ_{5})}^{β_{5}})}^{β_{5} + 1}}, & for x > 0; \\ 0, & otherwise, \end{matrix} \end{matrix}

(13)

\begin{matrix} F_{θ_{5}} (x) & = & \{\begin{matrix} 1 - {(1 + {(x / λ_{5})}^{β_{5}})}^{- β_{5}}, & for x > 0; \\ 0, & otherwise, \end{matrix} \end{matrix}

(14)

\begin{matrix} Q_{θ_{5}} (δ) & = & \{\begin{matrix} λ_{5} {({(1 - δ)}^{- 1 / β_{5}} - 1)}^{1 / β_{5}}, & for 0 < δ < 1; \\ 0, & otherwise, \end{matrix} \end{matrix}

(15)

respectively, where

θ_{5} = (β_{5}, λ_{5})

is a parameter vector,

λ_{5} > 0

is the scale parameter and

β_{5} > 0

is the shape parameter.

2.6. Weibull Distribution

Weibull distribution has been used for analyzing data from different domains like life-testing experiments, inter-arrival times of earthquakes, wind speed data across different seasons and years, lead time demand in inventory control, water pollution levels, size of non-life insurance claims (Abubakar and Sabri 2023; Bowden et al. 1983; Das and Nath 2022; Adeleke and Ibiwoye 2011; Kreer et al. 2015; Luko 1999; Smith 2003; Tadikamalla 1978; Wais 2017). The theory of extreme values shows that the Weibull distribution can be used to model the minimum of many independent positive random variables from a specific class of statistical distributions. For well-documented, detailed references on the theory and applications of the Weibull distribution, one may refer to Chapter 21 of Johnson et al. (1994) and the book by Rinne (2008).

A random variable X is said to follow a Weibull distribution (denoted as

W E (β_{6}, λ_{6})

) with PDF, CDF, and VaR

Q_{θ_{6}} (δ)

:

\begin{matrix} f_{θ_{6}} (x) & = & \{\begin{matrix} \frac{β_{6}}{λ_{6}} {(\frac{x}{λ_{6}})}^{β_{6} - 1} exp [- {(x / λ_{6})}^{β_{6}}], & for x > 0; \\ 0, & otherwise, \end{matrix} \end{matrix}

(16)

\begin{matrix} F_{θ_{6}} (x) & = & \{\begin{matrix} 1 - exp [- {(x / λ_{6})}^{β_{6}}], & for x > 0; \\ 0, & otherwise, \end{matrix} \end{matrix}

(17)

\begin{matrix} Q_{θ_{6}} (δ) & = & \{\begin{matrix} λ_{6} {(- ln (1 - δ))}^{1 / β_{6}}, & for 0 < δ < 1; \\ 0, & otherwise, \end{matrix} \end{matrix}

(18)

respectively, where

θ_{6} = (β_{6}, λ_{6})

is a parameter vector,

λ_{6} > 0

is the scale parameter and

β_{6} > 0

is the shape parameter.

3. Motivation and Estimation of Model Parameters

Given the pivotal role of statistical distributions in modeling insurance and claims data, this study aims to investigate the implications and consequences of model misspecification and offer practical and transparent solutions to mitigate its adverse effects. In this section, we use the real dataset dataCar from the insuranceData package (Wolny-Dominiak and Trzesiok 2014) in R (R Core Team 2025) to motivate the study by examining the behavior of the estimated PDFs and CDFs for all candidate models described in Section 2, as illustrated in Figure 1. The dataset contains information on vehicle insurance policies issued between 2004 and 2005, with variables such as vehicle value (in $10,000), the number of claims filed, and the amount of claims. The variable of interest is the claim amount. Specifically, we use the numeric variable clmcst0 for cases in which a claim was filed (based on the clm variable, where 0 indicates no claim, and 1 indicates at least one claim). From Figure 1, the fitted PDFs and CDFs appear broadly similar. However, the maximum likelihood estimates (MLEs) of the parameters, presented in Table 1, differ substantially among different models. Because tail behavior affects the value of VaR with confidence level

100 δ %

, especially when

δ

is close to 1, and is sensitive to model choice, we also report parameter estimates from the minimum density power divergence (MDPD) estimator, a more robust alternative, which will be introduced in subsequent sections and used later in our selection and ranking procedures (Table 2).

In such cases, fitting candidate models to a real dataset and comparing the fits with the empirical density or distribution may suggest that all the models are suitable for explaining the random phenomenon of interest. However, these models can differ substantially, especially in their tail behavior. Given the heavy-tailed nature of non-life insurance claims, parameter estimation or model discrimination based solely on likelihood-based criteria can be restrictive due to their lack of robustness. Tail-related statistics, such as VaR, are particularly sensitive to the choice of the underlying model, which may lead to misleading conclusions and decisions. To illustrate this point, we report the true VaR values at the 95% and 99% levels for these candidate distributions in Table 1, where it is evident that the VaR estimates differ considerably across models. In such situations, it is intuitively appealing to estimate model parameters using robust procedures, such as the minimum DPD estimation method.

Furthermore, since the VaR values of all the considered models differ significantly (see Table 1), in such a situation, estimating the VaR based on the given data using a single model will likely yield an estimator with a much higher risk of incorrect conclusions, intuitively, it may be argued that for such cases, adopting a model averaging method for estimating the parameters of interest would substantially reduce the risks associated with model uncertainty and misspecification (Steel 2020).

In the following, we first discuss the estimation procedures used to estimate the parameters of the considered models. Subsequently, we provide a mathematical framework for the proposed model averaging techniques used to obtain VaR estimates. For the estimation of the unknown parameter vector of a considered model, under the assumption that the data has originated from

f_{θ_{j}} (\cdot), j = 1, \dots, 6

, we adopt the method of maximum likelihood and the method of minimized density power divergence, respectively. Since the true population is unknown, assuming any of the above six models to be true is considerable for the type of random variable being studied here, irrespective of the true underlying model.

3.1. Likelihood-Based Estimation

Let

x = (x_{1}, x_{2}, \dots, x_{n})

be an independent and identically distributed (i.i.d.) sample of size n from

f_{θ_{j}} (\cdot)

,

j = 1, \dots, 6

. Then, the likelihood function (LF) for the parameter vector

θ_{j}

based on the observed sample x is

\begin{matrix} L (θ_{j} | x) = \prod_{i = 1}^{n} f_{θ_{j}} (x_{i}); j = 1, 2, \dots, 6 . \end{matrix}

(19)

The maximum likelihood estimate (MLE) of the parameter vector

θ_{j}

, denoted as

{\hat{θ}}_{j}

, is obtained by maximizing

L (θ_{j} | x)

or, equivalently, the log-likelihood

ln L (θ_{j} | x)

with respect to

θ_{j}

, i.e.,

\begin{matrix} {\hat{θ}}_{j} = arg max_{θ_{j}} L (θ_{j} | x) \equiv arg max_{θ_{j}} ln L (θ_{j} | x) . \end{matrix}

(20)

Owing to the invariance property of MLEs, we can obtain the MLE of a function

g (θ_{j})

by substituting

{\hat{θ}}_{j}

into

g (θ_{j})

, i.e.,

g ({\hat{θ}}_{j})

is the MLE of

g (θ_{j})

. Hence, the MLE of VaR for a specific confidence level

δ

of the j-th model can be obtained as

{\hat{Q}}_{{\hat{θ}}_{j}} (δ)

. For notational convenience, we denote the MLE of VaR with confidence level

δ

under distribution

f_{θ_{j}} (\cdot)

as

{\hat{Q}}_{j} (δ) = {\hat{Q}}_{{\hat{θ}}_{j}} (δ)

.

3.2. Divergence-Based Estimation

The minimum density power divergence estimation method was proposed by Basu et al. (1998) as a contemporary to the existing estimation methods based on density-based divergence measures. The density power divergence (DPD) measures the difference between a data-driven PDF

g (\cdot) \in G

and an assumed parametric model

f_{θ} (\cdot) \in G

with parameter vector

θ

,

G

being the set of distributions having densities conforming to a dominating measure. Mathematically, it is defined as

\begin{matrix} d_{α} (g, f_{θ}) & = & \int \{f_{θ}^{(1 + α)} (z) - (1 + \frac{1}{α}) g (z) f_{θ}^{α} (z) + \frac{1}{α} g^{(1 + α)} (z)\} d z, α > 0, \end{matrix}

(21)

where

d_{α} (g, f_{θ}) > 0

for all PDFs on the Lebesgue measure and is 0 almost everywhere if and only if

f_{θ} \equiv g

. The MDPD estimator of the parameter vector

θ

, denoted as

\tilde{θ}

, is obtained by minimizing the DPD measure in Equation (21) with respect to

θ

, i.e.,

\tilde{θ} = arg min_{θ} d_{α} (g, f_{θ})

.

For general families of distribution, the MDPD estimating equation has the form

\begin{matrix} U_{n} (t) = \int u_{t} (z) f_{t}^{1 + α} (z) d z - n^{- 1} \sum_{i = 1}^{n} u_{t} (X_{i}) f_{t}^{α} (X_{i}) = 0, \end{matrix}

where

u_{t} (z) = \partial ln f_{t} (z) / \partial t

is the score function. For

α > 0

,

U_{n} (t)

induces a down-weighting mechanism, in which observations that are outliers under the assumed model receive smaller weights. As

α \to 0

, minimizing Equation (21) reduces to the MLE, yielding fully efficient estimators in which all observations (outliers included) are effectively weighted equally. At

α = 1

, minimizing Equation (21) corresponds to the minimum

L_{2}

distance criterion. Thus,

α

acts as a tuning parameter that controls the trade-off between robustness and efficiency of the resulting estimators.

Thus, based on a random sample x of size n from

f_{θ_{j}}

, the MDPD estimate

{\tilde{θ}}_{j}

can be obtained by minimizing the empirical form of Equation (21) given below:

\begin{matrix} H_{α} (θ_{j} | x) & = \int f_{θ_{j}}^{1 + α} (x) d x - (1 + \frac{1}{α}) n^{- 1} \sum_{i = 1}^{n} f_{θ_{j}}^{α} (x_{i}); j = 1, 2, \dots, 6 \end{matrix}

(22)

or by solving

\begin{matrix} \int u_{θ_{j}} (x) f_{θ_{j}}^{1 + α} (x) d x - n^{- 1} \sum_{i = 1}^{n} u_{θ_{j}} (x_{i}) f_{θ_{j}}^{α} (x_{i}) = 0; j = 1, 2, \dots, 6, \end{matrix}

(23)

where

u_{θ_{j}} (x) = \partial ln f_{θ_{j}} (x) / \partial θ_{j}

is the score function corresponding to the density

f_{θ_{j}}

, instead of minimizing Equation (21), since the term

\int g^{1 + α} (z) d z

in Equation (21) is independent of

θ

and the term

\int g (z) f_{θ}^{α} (z) d z

can be replaced by

\int f_{θ}^{α} (z) d G (z)

, where

d G (z)

serves an empirical estimate of

g (z)

, owing to its linear involvement. Note that, as a limiting case, when

α \to 0

, the DPD reduces to the Kullback-Leibler divergence and yields an estimator equivalent to the MLE. On the other hand, the MDPD estimator is equivalent to the minimum mean square error estimator for

α = 1

. Pertaining to the behavior of the DPD measure for

α

in

(0, 1)

, a reasonable choice of

α

lies between 0 and 1 since the efficiency of the estimators gradually decreases with the increase in

α

(Ghosh et al. 2017; Jones et al. 2001).

One of the interesting properties of the MDPD estimation method is that the MDPD estimators also adhere to the invariance principle, akin to the MLEs. This property allows us to study the behavior of VaR or any other characteristic of a probability model based on MDPD estimates without delving into any complicated mathematical framework. Thus, the MDPD estimator of VaR for a given model

f_{θ_{j}}

can be obtained as

{\tilde{Q}}_{{\tilde{θ}}_{j}} (δ)

, corresponding to the chosen confidence level

δ

. Once again, for notational convenience, we denote the MDPD estimate of VaR with confidence level

δ

as

{\tilde{Q}}_{j} (δ) = {\tilde{Q}}_{{\tilde{θ}}_{j}} (δ)

.

4. Estimating Value-at-Risk via Model Averaging

Based on earlier discussions, it is essential to identify the best-fitting model to minimize the degree of uncertainty and inconclusiveness in estimating the quantity of interest. However, owing to the closeness of these models in highly dense regions and highly dissimilar tail behaviors, the effect of misspecification on the VaR in the case of claim size data requires special attention. Two significant kinds of model misspecifications can be considered: (i) a wrongly specified model (e.g., the underlying model is lognormal, but a Weibull model is assumed) and (ii) a contaminated-observation model (e.g., the underlying model is Weibull, but some contaminated observations/outliers follow a different distribution or the same class of distributions with different parameters). For empirical investigation, following White (1982) and Chow (1984), we examine the consequences of model misspecification in terms of the relative bias and relative variability when the MLE and MDPD are used to estimate the VaR through a Monte Carlo simulation study.

This section describes the proposed model selection and ranking approaches based on the maximum likelihood method and the minimum density power divergence method. We consider the practical situation that a specific model

M_{j}

with probability density

f_{θ_{j}}

(

j = 1, 2, \dots, 6

) is considered as the assumed model for modeling a given dataset. Based on this assumption, the MLEs and MDPD estimates of the model parameters, as well as the VaR, can be obtained, along with the associated maximized value of the LF and minimized value of the DPD measure, for the given dataset.

4.1. Model Selection Approach

We consider a model selection approach to select the best-fitting model for a particular dataset. Here, we consider two different approaches for model selection based on the maximized value of the LF and the minimized value of the DPD function for a given dataset x.

For the approach based on the maximized LF, the probability distribution with the maximized value of the likelihood has an intuitive appeal and can be reasonably claimed to be the best among all the considered models. Thus, for a given dataset, we can order all the k candidate models based on the magnitude of the maximized value of the LF if these models have the same number of parameters. Specifically, we denote

M_{[1]} > M_{[2]} > \dots > M_{[k]}

, where the subscript

[\cdot]

indicates the ordered position of a model, and the notation “>” means “is preferable to”. The same idea can be implemented using the DPD measure. For a given dataset x, the model yielding the minimum value of the DPD statistic

H_{α} ({\tilde{θ}}_{j} | x)

in Equation (21) can be considered as the closest one to the true model

g (\cdot)

among all the candidate models. Hence, based on the minimized value of the DPD statistic, one can order the models such that

M_{[1]}^{*} > M_{[2]}^{*} > \dots > M_{[k]}^{*}

, where

M_{[1]}^{*}

is the model with the smallest value of the DPD statistic,

M_{[2]}^{*}

is the model with the second smallest value of the DPD statistic, and so on. In other words,

M_{[1]} = M_{j}

means model

M_{j}

is the most preferable model based on the likelihood function approach, and

M_{[1]}^{*} = M_{j}

means model

M_{j}

is the most preferable model based on the DPD measure approach. For the model selection approach, after selecting the most probable model, we can estimate the VaR with the selected models based on likelihood function and DPD measure, i.e., models

M_{[1]}

and

M_{[1]}^{*}

, respectively.

4.2. Model Ranking Approach

To reduce the risk of misspecification and improve the accuracy of estimating the quantity of interest, model averaging methods have been widely adopted and found useful across various domains (see, for example, Chatfield 1995; Dormann et al. 2018; Okoli et al. 2018; Wintle et al. 2003). One argument for using the VaR estimates from multiple models for a given dataset is to mitigate the potential risks associated with relying on a single selected model, especially since the true model is unknown in practice. Model averaging essentially protects against a potentially suboptimal or misspecified model. Moreover, estimates from an individual model may exhibit high variance, whereas combining estimates from multiple models can effectively reduce this variance, leading to a more stable and accurate overall estimate. An ensemble estimate is typically less susceptible to the limitations of any single model and leverages the collective strengths of multiple models. Kass and Raftery (1995) and Raftery (1995) discussed the implementation of model averaging in the Bayesian paradigm through the use of posterior odds and/or the Bayes factor. However, the Bayesian model averaging methods can be sensitive to the choice of priors. An interesting yet simple method for model averaging in the frequentist framework was explored by Buckland et al. (1997) in which scaled weights

w_{j}

are assigned to each of the candidate models

M_{j}

(

j = 1, \dots, k

) and the weighted average is the final estimate of the quantity of interest.

Following Buckland et al. (1997), we compute the weighted estimate of the quantity of interest. For instance, if the quantity of interest is VaR, instead of estimating the VaR based on the model with the maximized value of the LF or the model with the minimum value of the DPD statistic (i.e., the model selection approach in Section 4.1), we consider the best

k^{*} (< k)

models to obtain an estimate of VaR. Suppose the MLEs and MDPD estimates of VaR at

δ

level based on models

M_{[ℓ]}

and

M_{[ℓ]}^{*}

are

{\hat{Q}}_{[ℓ]} (δ)

and

{\tilde{Q}}_{[ℓ]} (δ)

, respectively, then using the model ranking approach, the weighted VaR estimates based on maximum likelihood and DPD, denoted as

{\hat{Q}}^{*} (δ)

and

{\tilde{Q}}^{*} (δ)

, can be obtained as follows:

\begin{matrix} {\hat{Q}}^{*} (δ) & = & \sum_{ℓ = 1}^{k^{*}} w_{ℓ} \cdot {\hat{Q}}_{[ℓ]} (δ) \end{matrix}

(24)

\begin{matrix} and {\tilde{Q}}^{*} (δ) & = & \sum_{ℓ = 1}^{k^{*}} w_{ℓ}^{*} \cdot {\tilde{Q}}_{[ℓ]} (δ) \end{matrix}

(25)

respectively, with weights

w_{ℓ}

and

w_{ℓ}^{*}

,

ℓ = 1, 2, \dots k^{*}

, such that

\sum_{l = 1}^{k^{*}} w_{ℓ} = \sum_{l = 1}^{k^{*}} w_{ℓ}^{*} = 1 .

The associated weights

w_{ℓ}

(

w_{ℓ}^{*}

) are considered proportional to the maximized (minimized) value of LF (DPD), i.e.,

w_{1} > \dots > w_{k^{*}}

(

w_{1}^{*} > \dots > w_{k^{*}}^{*}

). Specifically, for the likelihood approach, suppose

L F_{[ℓ]}

is the value of maximum likelihood based on model

M_{[ℓ]}

and hence,

L F_{[1]} > L F_{[2]} > L F_{[3]} > \dots > L F_{[k]}

, then, for the top

k^{*}

models, the weights in terms of the maximized value of the LF can be expressed as

\begin{matrix} w_{ℓ} = \frac{L F_{[ℓ]}}{\sum_{ℓ^{'} = 1}^{k^{*}} L F_{[ℓ^{'}]}}, \end{matrix}

for

ℓ = 1, 2, \dots, k^{*}

, such that

w_{1} > w_{2} > \dots > w_{k *}

, and

\sum_{ℓ^{'} = 1}^{k^{*}} w_{ℓ^{'}} = 1

. Similarly, for the divergence approach, suppose

D P D_{[ℓ]}

is the value of density power divergence based on model

M_{[ℓ]}^{*}

, hence,

D P D_{[1]} < D P D_{[2]} < \dots < D P D_{[k]}

, and since the values of

H_{α} (\tilde{θ} | x)

for all considered models are less than 0, for the top

k^{*}

models, the weights in terms of the minimized values of the DPD can be expressed as

\begin{matrix} w_{ℓ}^{*} = \frac{| D P D_{[ℓ]} |}{\sum_{ℓ^{'} = 1}^{k^{*}} |D P D_{[ℓ^{'}]}|}, \end{matrix}

for

ℓ = 1, 2, \dots, k^{*}

, such that

w_{1}^{*} > w_{2}^{*} > \dots > w_{k *}^{*}

, and

\sum_{ℓ^{'} = 1}^{k *} w_{ℓ^{'}}^{*} = 1

.

Note that the model selection approach discussed in Section 4.1 can be formulated as a special case of the model ranking approach by setting

w_{1} = 1

and

w_{2} = \dots = w_{k *} = 0

(or

w_{1}^{*} = 1

and

w_{2}^{*} = \dots = w_{k *}^{*} = 0

). Furthermore, employing a specific model

M_{j}

without model selection can be viewed as a special case within the framework of the proposed model ranking approach, achieved by assigning

w_{j} = 1

and

w_{ℓ} = 0

for

ℓ \neq j

. Consequently, the efficacy of the proposed model ranking approach aligns with that of the non-model selection approach when suitable values for

k^{*}

and the weights are chosen. See Li et al. (2020); Mehrabani and Ullah (2020); Miljkovic (2025); Miljkovic and Grün (2021) and related work for further discussion of weighted estimates from multiple models.

5. Monte Carlo Simulation Study

In this section, a Monte Carlo simulation study is used to evaluate the performance of the proposed methodologies in the presence and absence of contamination. To compare the performance of different approaches and estimators, we consider the simulated biases and root mean square errors (RMSEs) of the MLE and MDPD estimates of VaR with confidence level

δ

under statistical distribution

f_{θ_{j}}

based on

N = 1000

simulations, which can be computed as

\begin{matrix} B i a s ({\hat{Q}}_{j} (δ)) & = & \frac{1}{N} \sum_{s = 1}^{N} {\hat{Q}}_{j}^{(s)} (δ) - Q_{j} (δ), \end{matrix}

(26)

\begin{matrix} R M S E ({\hat{Q}}_{j} (δ)) & = & {\{\frac{1}{N} \sum_{s = 1}^{N} {[{\hat{Q}}_{j}^{(s)} (δ) - Q_{j} (δ)]}^{2}\}}^{1 / 2}, \end{matrix}

(27)

\begin{matrix} B i a s ({\tilde{Q}}_{j} (δ)) & = & \frac{1}{N} \sum_{s = 1}^{N} {\tilde{Q}}_{j}^{(s)} (δ) - Q_{j} (δ), \end{matrix}

(28)

\begin{matrix} R M S E ({\tilde{Q}}_{j} (δ)) & = & {\{\frac{1}{N} \sum_{s = 1}^{N} {[{\tilde{Q}}_{j}^{(s)} (δ) - Q_{j} (δ)]}^{2}\}}^{1 / 2}, \end{matrix}

(29)

where

{\hat{Q}}_{j}^{(s)} (δ)

and

{\tilde{Q}}_{j}^{(s)} (δ)

are the MLE and MDPD estimate of VaR of

f_{θ_{j}}

with confidence level

δ

in the s-th simulation, and

Q_{j} (δ)

is the true value of VaR of

f_{θ_{j}}

with confidence level

δ

.

To facilitate the comparison of the various estimation methods and the proposed model selection/ranking strategies for addressing model uncertainty, we compute the relative bias and relative RMSE of two estimators, say

{\hat{Q}}_{(1)}

and

{\hat{Q}}_{(2)}

, defined as

\begin{matrix} R B i a s ({\hat{Q}}_{(1)}, {\hat{Q}}_{(2)}) & = & \frac{B i a s ({\hat{Q}}_{(1)})}{B i a s ({\hat{Q}}_{(2)})} \end{matrix}

(30)

\begin{matrix} and R R M S E ({\hat{Q}}_{(1)}, {\hat{Q}}_{(2)}) & = & \frac{R M S E ({\hat{Q}}_{(1)})}{R M S E ({\hat{Q}}_{(2)})}, \end{matrix}

(31)

respectively. The absolute value of the relative bias in Equation (30) and the value of relative RMSE in Equation (31) less than 1 indicate that estimator

{\hat{Q}}_{(2)}

is a better estimator compared to

{\hat{Q}}_{(1)}

. The sign of the relative bias in Equation (30) indicates if the directions of the biases of the two estimators

{\hat{Q}}_{(1)}

and

{\hat{Q}}_{(2)}

are the same (positive RBias) or they are opposite to each other. The simulation results are obtained based on a simulation size

N = 1000

and reported in the Appendices.

First, considering the assumed model and the true model agree (i.e., without model misspecification), we compare the performance of the likelihood-based and divergence-based parameter estimation procedures presented in Section 3. The average MLEs and MDPD estimates of

Q_{j} (0.99)

for the six probability models in Section 2 (

j = 1, 2, \dots, 6)

, the biases and RMSEs of the MLEs, and the relative biases and relative RMSEs of the MDPD estimates relative to the MLEs (i.e.,

R B i a s ({\tilde{Q}}_{j} (δ), {\hat{Q}}_{j} (δ))

and

R R M S E ({\tilde{Q}}_{j} (δ), {\hat{Q}}_{j} (δ))

) are presented in Table A1. From Table A1, it is evident that when the assumed model is indeed the true model and without any contaminated observations, the performance of the MLE is better than that of the MDPD estimator in terms of MSE, while the MDPD estimator provides smaller biases compared to the MLE in some situations.

In the following subsection, we present the results of the Monte Carlo simulation study to investigate the performance of the estimation procedures and the proposed model selection and ranking approaches under model misspecification with and without contamination. In Section 5.1, we compare the performance of the proposed model selection and model ranking procedures based on likelihood and divergence methods where contamination is not present. Additionally, since estimators based on DPD measures are known to be robust, we consider a contamination model in Section 5.2 and compare the performances of the proposed procedures to determine whether methods based on DPD are relatively better than those based on LF.

5.1. Performance of the Proposed Methods in the Absence of Contamination

To assess the performance of the proposed methods in the absence of contamination, the data is assumed to be governed by one of the six candidate distributions discussed in Section 2. Specifically, we generate random samples of size n from

F S (1.43, 834.95)

,

F R (1.05, 518.75)

,

L M (2.04, 2202.85)

,

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

, and

W E (0.79, 1690.57)

for

N = 1000

simulations.

Before evaluating the performance of the proposed model selection and model ranking approaches, we study the performance of the MLE and MDPD estimation methods for VaR under model misspecification with the non-model-selection approach (i.e., estimate the VaR under the assumed model

M_{j}

). For each dataset simulated from a true model (say, model

M_{h}

), the parameter vector and the corresponding VaR values are estimated under the assumed model using MLE and MDPD estimation methods. In Table A2, Table A3 and Table A4, we present the average VaR estimates with confidence level

0.99

, the relative biases relative to the bias of the MLEs, when the assumed model is the true model (i.e.,

R B i a s ({\hat{Q}}_{j} (δ), {\hat{Q}}_{h} (δ))

and

R B i a s ({\tilde{Q}}_{j} (δ), {\hat{Q}}_{h} (δ))

) and the relative RMSEs relative to the RMSE of the MLEs when the assumed model is the true model (i.e.,

R R M S E ({\hat{Q}}_{j} (δ), {\hat{Q}}_{h} (δ))

and

R R M S E ({\tilde{Q}}_{j} (δ), {\hat{Q}}_{h} (δ))

) respectively. Therefore, when the assumed model is the true model (values highlighted in boldface), the value of

R B i a s ({\hat{Q}}_{j} (δ), {\hat{Q}}_{h} (δ))

and

R R M S E ({\hat{Q}}_{j} (δ), {\hat{Q}}_{h} (δ))

is 1 under the likelihood method. Table A2, Table A3 and Table A4 illustrate that when the model is misspecified, biases in estimation arise, regardless of the method used for estimating the VaR. Furthermore, although some RMSEs may be smaller in certain instances compared to those observed under correct model specifications, overall, RMSEs tend to be higher when the model is misspecified. Particularly noteworthy is the substantial increase in biases and RMSEs when the model is misspecified as a Fréchet distribution. These findings underscore the imperative of developing effective strategies to mitigate the impact of model misspecification.

Here, we also evaluate the performance of VaR estimates based on the proposed model selection and model ranking approach presented in Section 4. For each simulated dataset, we perform the model selection approach by using the likelihood-based and the divergence-based approaches; then, based on the selected model (i.e., the most probable model), we compute the estimate of VaR by using the MLE or MDPD estimation methods, as

{\hat{Q}}^{†} (δ)

and

{\tilde{Q}}^{†} (δ)

, respectively. To evaluate the performance of the estimates

{\hat{Q}}^{†} (δ)

and

{\tilde{Q}}^{†} (δ)

using the model selection approach, we compute the relative biases and relative RMSEs relative to the biases and RMSEs of the MLEs under the correctly specified model

M_{h}

, i.e.,

R B i a s ({\hat{Q}}^{†} (δ), {\hat{Q}}_{h} (δ))

and

R R M S E ({\hat{Q}}^{†} (δ), {\hat{Q}}_{h} (δ))

for the model selection approach using MLE, and

R B i a s ({\tilde{Q}}^{†} (δ), {\hat{Q}}_{h} (δ))

and

R R M S E ({\tilde{Q}}^{†} (δ), {\hat{Q}}_{h} (δ))

for the DPD measure approach).

In Table A5, the average values of the estimates of VaR, based on the model selection approach, along with the associated relative biases and the relative RMSEs of all the proposed estimators, relative to the corresponding bias and RMSEs of the MLEs obtained under the assumption of the correctly specified model. Moreover, we present the simulation proportion of the best-fitting model in Table A6 (in terms of maximized likelihood or minimized divergence) among all competing models. The boldfaced proportions in Table A6 represent the proportions of the true model from which the data is generated, being selected as the best model in terms of maximized likelihood or minimized DPD in

N = 1000

simulations. The other proportions for different assumed models against a given true model indicate the proportions of the assumed model being chosen as the best model in the

N = 1000

simulations, which is the model then used to compute the final VaR estimates with Equation (24).

For the model ranking approach, we consider

k^{*} = 2

and 3, i.e., use the top two and top three most probable models to compute the weighted average estimate of VaR. In Table A7 and Table A8, we present the average estimates, the relative biases, and relative RMSEs relative to the MLEs under the true model for

{\hat{Q}}^{*} (δ)

and

{\tilde{Q}}^{*} (δ)

with

k^{*} = 2

and 3, respectively. To evaluate the performance of the proposed model ranking approach in identifying the true model, in Table A9 and Table A10, we present the proportion of times the true model is ranked within the top

k^{*}

(denoted by order of preference

= 1, 2

for

k^{*} = 2

; and order of preference

= 1, 2, 3

for

k^{*} = 3

) and the proportion of times the true model is not ranked as top

k^{*}

(denoted by order of preference = 0). In other words, the order of preference 0 indicates the case when the true model is not used for computing the weighted average VaR estimate.

The results in Table A7, Table A8, Table A9 and Table A10 indicate that, as expected, estimation procedures without knowledge of the true model generally perform worse than MLEs under the true model. Among the model selection (

k^{*} = 1

) and model ranking approaches (

k^{*} = 2

and

k^{*} = 3

), the relative MSEs are smaller for

k^{*} = 2

compared to

k^{*} = 1

or

k^{*} = 3

for most models considered, except that of the Weibull distribution. The value of

k^{*}

is subjective and is determined by the degree of confidence in the performance of the estimator under study. Specifically, when the level of uncertainty in the quantity to be estimated is high, larger values of

k^{*}

are more appropriate, and vice versa.

Although these estimators do not perform as well as those using the true model, model averaging offers better credibility since an incorrect model would be highly inconsistent. Additionally, Table A9 and Table A10 show that the proposed model selection and ranking techniques may not have high discriminating power for all models. For example, models like Fisk, Lomax, and paralogistic have proportions of the true model used for VaR calculation via the model ranking method of less than 0.5, likely due to the similarity of the candidate distributions, as discussed in Section 3.

5.2. Performance of the Proposed Methods in the Presence of Contamination

In non-life insurance claims, it is common to encounter some exceptionally high-magnitude observations. Although these occurrences are infrequent, yet, they can significantly distort model fitting and model selection if the criteria used are not robust. Literature suggests that the density power divergence method is a robust estimation criterion (see, for example, Basu et al. 1998, 2013; Jones et al. 2001, and the articles cited therein). When analyzing and fitting probability models to contaminated datasets, two key questions often arise. The first is how to identify and estimate the effect of contamination or outliers, especially if they are of significant importance. The second is how to design an estimator that is minimally affected by outliers, allowing the true characteristics of the data, free from contamination, to be accurately studied. In practical applications, contamination models can be treated as a convex mixture of two distributions (Brownie et al. 1983; Punzo and Tortora 2021).

In this subsection, we evaluate the performance of the model selection and model ranking approaches under contamination models. We generate contaminated data under a fixed contamination model, where a fixed proportion

ϵ

of the sample is obtained from the contamination model, and the remaining observations in the sample come from the original model of interest. Here, the observed random sample from the original model

f_{θ}

is denoted as

{\underset{\sim}{x}}^{(o)}

, and the observed random sample from the contamination model

f_{θ^{(c)}}

is respectively denoted as

{\underset{\sim}{x}}^{(c)}

. The following algorithm is used to generate a contaminated sample of size n for a given

ϵ

:

Generate ${\underset{\sim}{x}}^{(c)}$ $= (x_{1}, x_{2}, \dots, x_{[n \cdot ϵ]}$ ) from $f_{θ^{(c)}}$ , where $[n \cdot ϵ]$ is integer value of $(n \cdot ϵ)$ ;
Generate ${\underset{\sim}{x}}^{(o)}$ $= (x_{[n \cdot ϵ] + 1}, x_{[n \cdot ϵ] + 2}, \dots, x_{n - [n \cdot ϵ]}$ ) from $f_{θ}$ ;
Return the sample $\underset{\sim}{x} = ({\underset{\sim}{x}}^{(o)}, {\underset{\sim}{x}}^{(c)})$ .

For the probability distributions

f_{θ}

and

f_{θ^{(c)}}

, we consider the six probability distributions described in Section 2, where

f_{θ^{(c)}}

has the same probability law as

f_{θ}

, but with different values of scale parameters. The values of parameters of the contamination models were chosen to yield significantly high values of claim sizes, which would mostly appear as outliers for the original model (Wellmann and Gather 1999).

The performances of the proposed methods are evaluated in terms of their relative biases and relative RMSEs in connection with the biases and RMSEs of the MLEs of VaR values obtained based on the correctly specified model in the presence of contamination. An intuitive justification for comparing the proposed methods to the MLEs in the presence of contamination is that divergence-based measures are known to be robust. Therefore, it is reasonable to believe that one or more deviant observations should not significantly affect the divergence-based estimator’s general behavior for the characteristic of interest, which is the estimate of VaR of the original model.

As discussed in the contamination-free case in Section 5.1, before evaluating the performance of the proposed model selection and model ranking approaches, we examine the performance of the MLE and MDPD estimation methods for VaR under model misspecification and contamination without employing model selection. Table A18 presents the average VaR estimates along with their associated biases and RMSEs for the MLEs, and the relative biases and relative RMSEs for the MDPD estimates for some selected models. The table highlights the significant deviations of the VaR estimates from those obtained from the correctly specified model. Additionally, the biases and RMSEs of the MLEs for VaR are notably higher in the presence of contamination compared to the scenario without contamination.

We then evaluate the performance of the MLE and MDPD estimation methods for VaR under model misspecification with data contamination. Specifically, for the case with 10% contamination (i.e.,

ϵ = 0.1

), Table A19, Table A20 and Table A21 present the average VaR estimates along with their associated relative biases and relative RMSEs. These metrics are compared to the bias and MSE of the MLEs under the correctly specified model. The results in these tables indicate that the MDPD estimator for VaR outperforms the MLE in terms of reduced relative risks, particularly for values of the tuning parameter that are away from 0. Additionally, the findings highlight that using a non-model selection approach can lead to misleading VaR estimates when the model is misspecified.

Table A22, Table A23 and Table A24 present the average VaR estimates using the model selection approach (i.e., model ranking approach with (

k^{*} = 1

) and the model ranking approach with (

k^{*} = 2

) and (

k^{*} = 3

), along with their corresponding relative biases and relative RMSEs, compared to the MLE under the correctly specified model with 10% data contamination. Additionally, the simulation proportions of the true model being included in the final VaR estimate calculation based on the model selection approach and the model ranking approach with (

k^{*} = 2

) and (

k^{*} = 3

) are shown in Table A19, Table A20 and Table A21. These simulation results demonstrate that the proposed model selection and ranking approaches, particularly those based on the minimized DPD, provide more accurate VaR estimates compared to using a prespecified model when contamination is present in the dataset. Interestingly, there are cases (e.g., when the true model is lognormal or Weibull) where the true model is rarely used in the final VaR estimates. Overall, the model averaging technique based on minimized DPD offers a more accurate procedure for estimating VaR, with significantly better reliability in the presence of contamination, even though the true model may seldom be correctly identified.

To keep this paper concise, we present only some representative simulation results here; additional simulation results are provided in Appendices Appendix B and Appendix D.

6. Practical Data Analyses

In this section, we demonstrate the application of the proposed model selection and ranking approaches for estimating VaR using real claim size data from non-life insurance. Section 6.1 applies the methodology to the dataset introduced earlier in Section 3, which was used to motivate the study and illustrate the behavior of the candidate models. Section 6.2 extends the analysis to a different dataset from the insuranceData package in R (R Core Team 2025), thereby providing an additional example to assess the robustness and practical applicability of the proposed procedures across distinct non-life insurance contexts.

6.1. Vehicle Insurance Data

For the dataCar dataset discussed in Section 3, we estimate the VaR under the six candidate models and apply the model selection and model averaging approaches presented in the previous sections. The VaR estimates based on the model selection (for this dataset, the Fréchet distribution is the selected model) and based on the model ranking approach (with

k^{*} = 2

and 3) for both maximum likelihood and minimum density power divergence methods are reported in Table 2.

From the results in Table 2, it is both interesting and expected to observe significant variations in the VaR estimates, particularly at the 99% level. Given the unknown ground truth of the underlying probability model for vehicle insurance claim amounts, these results suggest that selecting different models can lead to substantially different conclusions. Conversely, the model ranking and averaging approaches moderate these differences, providing a more balanced estimate of VaR. We will further illustrate this point in the following numerical example.

6.2. Partial Casco Motorcycles Insurance Data

The dataset called dataOhlsson comprises partial casco insurance for motorcycles insured and claimed from the Swedish insurance company from 1994 to 1998. We are interested in the non-zero claim amounts in Swedish Krona (SEK) of those partial casco motorcycle insurance claims described in the variable skadkost. The fitted CDFs of the six candidate models for this dataset are plotted in Figure 2, and the corresponding VaR estimates based on the six candidate models are presented in Table 3. In Table 3, we highlight the VaR estimates based on the model selection (here, the lognormal distribution is the selected model) and present the VaR estimates using the model ranking approach (with

k^{*} = 2

and 3) for both maximum likelihood and minimum density power divergence methods.

Once again, we observe significant variations in the VaR estimates, particularly at the 99% level. These variations highlight the risks associated with relying on a single assumed model for VaR estimation, especially when the model is misspecified. For instance, if one considers the distribution of non-zero claim amounts for partial casco motorcycle insurance claims follows a Fréchet distribution, the maximum likelihood method indicates a 99% probability that future claim amounts will not exceed 1.19 billion. In stark contrast, if a Weibull distribution is assumed, the same method suggests a 99% probability that future claim amounts will not exceed 17.52 million, which is approximately 68 times smaller than the Fréchet estimate. This discrepancy highlights the importance of model selection in estimating VaR.

Although these numerical examples cannot prove the superiority of the proposed model selection and ranking approaches due to the unknown ground truth, they demonstrate that these approaches help mitigate the risk of drawing highly inaccurate conclusions resulting from model misspecification.

7. Extension to Candidate Models with Unequal Number of Parameters

In the preceding sections, the candidate probability models (see Section 2) are all two-parameter models, implying equal model complexity. This choice is common in comparative studies, as models with the same number of parameters allow for a fair comparison.

However, in some situations, researchers may consider probability models with different numbers of parameters. In such cases, model complexity must be explicitly accounted for when comparing model fits. In this section, we address the question of whether the model selection and ranking procedure based on either the LF or the DPD measure remains valid and practical when the candidate set contains models with unequal numbers of parameters. Although our earlier development assumed that all candidate models have the same number of unknown parameters, this section demonstrates how the proposed method can be extended to accommodate models with differing parameter counts.

When the likelihood-based estimation method described in Section 3.1 is used for parameter estimation, information-based criteria—such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the deviance information criterion (DIC)—are widely adopted for model selection (see, for example, Burnham and Anderson 2002; Castilla et al. 2020; Karagrigoriou and Mattheou 2009). These criteria incorporate a penalty for the number of model parameters, thereby addressing differences in model complexity. For example, the AIC is given by

\begin{matrix} A I C = 2 κ - 2 ln (L (\hat{θ} | x)], \end{matrix}

where

κ

is the number of parameters in the model,

L (\hat{θ} | x)

is the value of maximized likelihood function defined in Equation (19) evaluated at the MLE

\hat{θ}

defined in Equation (20). The AIC balances goodness-of-fit with parsimony by penalizing excessive parameters, favoring models that remain simple yet adequately explain the data. A smaller AIC value reflects a more desirable trade-off between fit and complexity.

When the divergence-based estimation method described in Section 3.2 is employed, the robust divergence-based Bayesian criterion (DBBC) proposed by Kurata and Hamada (2020), denoted

R C C_{α}

can serve as a model selection and ranking tool. It is defined as

\begin{matrix} R C C_{α} = 2 n H_{α} (x; \tilde{θ}) + κ ln (n), \forall α \geq 0, \end{matrix}

where

κ

denotes the number of parameters in the model and

H_{α} (x; \tilde{θ})

is the minimized value of the DPD measure given in Equation (22), evaluated at

θ = \tilde{θ}

. Like the AIC,

R C C_{α}

penalizes models with more parameters, and a lower value indicates a better compromise between model fit and complexity.

For illustration, consider a set of four candidate models, where two are generalized versions of the other two and naturally include them as special cases. Specifically, we use the exponentiated Weibull distribution, denoted as

E W (β_{7}, ν, λ_{7})

, (Mudholkar and Srivastava 1993), and the exponentiated Fréchet distribution, denoted as

E F (β_{8}, η, λ_{8})

, (Nadarajah and Kotz 2003), analyze the vehicle insurance dataset discussed in Section 3 and Section 6.1. The PDFs of

E W (β_{7}, ν, λ_{7})

and

E F (β_{8}, η, λ_{8})

are given by

\begin{matrix} f_{θ_{7}} (x) & = & \{\begin{matrix} ν β_{7} λ_{7} x^{β_{7} - 1} {[1 - e^{- λ_{7} x^{β_{7}}}]}^{ν - 1} e^{- λ_{7} x^{β_{7}}}, & for x > 0; \\ 0, & otherwise, \end{matrix} \end{matrix}

(32)

\begin{matrix} f_{θ_{8}} (x) & = & \{\begin{matrix} η β_{8} λ_{8}^{β_{8}} {[1 - e^{- {(λ_{8} / x)}^{β_{8}}}]}^{η - 1} x^{- (1 + β_{8})} e^{- {(λ_{8} / x)}^{β_{8}}}, & for x > 0; \\ 0, & otherwise, \end{matrix} \end{matrix}

(33)

where

θ_{7} = (β_{7}, ν, λ_{7})

and

θ_{8} = (β_{8}, η, λ_{8})

are the respective parameter vectors;

λ_{7} > 0

and

λ_{8} > 0

are the scale parameters, while

β_{7}, ν > 0

and

β_{8}, η > 0

are the shape parameters.

Table 4 presents the model ranking results based on the AIC and the

R C C_{α}

criteria for

α = 0.05, 0.1

and

0.2

. The four candidate models include two basic two-parameter models (the Fréchet (FR) and Weibull (WE) distributions) and their corresponding three-parameter generalizations, the exponentiated Fréchet (EF) and exponentiated Weibull (EW) distributions.

Across both AIC and

R C C_{α}

measures, the EF and EW models consistently outperform their simpler counterparts (

F R

and

W E

, respectively) for all

α

values considered here. This improvement can be attributed to the inclusion of an additional shape parameter in the EF and EW distributions, which increases model flexibility and allows a better adaptation to the tail behavior and overall distributional shape of the data.

Importantly, both AIC and

R C C_{α}

incorporate a penalty term for model complexity, ensuring that a model is not favored solely because it has more parameters. The fact that EF and EW achieve lower (better) values for these penalized criteria indicates that the improvement in goodness of fit more than compensates for the increase in parameter count. In other words, the additional shape parameter provides meaningful explanatory power rather than overfitting noise in the data.

These results suggest that, in this application, allowing greater distributional flexibility via an additional shape parameter can lead to a statistically justifiable improvement in model performance, even when penalizing for increased complexity. For explanatory purposes, the VaR estimates at

δ = 0.95

for the four candidate models are reported in Table 5.

It is worth noting that the methodology proposed here can be applied with model-selection criteria beyond AIC and

R C C_{α}

. For example, goodness-of-fit measures, such as the Kolmogorov-Smirnov distance and the Anderson-Darling statistic, can be considered.

8. Concluding Remarks

The findings presented in this article provide significant insights into the implications and mitigation strategies for model misspecification in insurance claims data analysis. By exploring various probability models and adopting both likelihood-based and divergence-based estimation methods, this study highlights the importance of accurate model selection and ranking in minimizing the adverse effects of model uncertainty. The numerical examples and simulation studies demonstrate that the proposed model selection and ranking approaches, particularly those based on the divergence measure, offer robust solutions for estimating risk measures such as VaR.

The practical implications of this research are profound, as these methods effectively address inaccuracies arising from model misspecification. These methods provide practicing and academic actuaries with advanced tools to mitigate risks and achieve more reliable outcomes in their risk assessments, even in the presence of model misspecification. Practically, relying on a single, potentially misspecified model can yield large errors and misleading conclusions for VaR. By aggregating across competing models and using robust estimators when appropriate, our approach yields more stable VaR estimates and decisions that are less sensitive to outliers, which are common in insurance claims data.

This, in turn, enhances the reliability of insurance data analysis, ultimately leading to better decision-making in the insurance industry. Future research could expand on these methods to refine the accuracy and applicability of model selection and ranking approaches in various actuarial contexts.

A few limitations of the proposed methodologies should also be acknowledged. Under model uncertainty, the true data-generating process is unknown, and all candidate parametric models are approximations; likelihood-based and divergence-based estimators target pseudo-true parameters, and the resulting risk measures may still be biased if the candidate models are poorly specified. Although aggregating across multiple probability distributions and employing robust estimators such as the MDPD can substantially reduce sensitivity to misspecification and outliers, the performance of the approach ultimately depends on the researcher’s ability to select a reasonably rich and flexible set of candidate models. If these candidates are uniformly inadequate, the resulting estimates may remain unreliable.

Author Contributions

Investigation, Methodology, and Original Draft Preparation: S.B.; Conceptualization, Methodology, Writing, Review, and Editing: H.K.T.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by The Society of Actuaries’ Committee on Knowledge Extension Research and the Casualty Actuarial Society Individual Grant. The initial work on this article commenced while the first author was visiting the Department of Mathematical Sciences, Bentley University (Waltham, MA), under the U.S. Department of State’s (Fulbright Scholars) Exchange Visitor program (Program No. G-1-00005).

Data Availability Statement

The original data presented in the study are openly available in the insuranceData R package (Wolny-Dominiak and Trzesiok 2014) at https://doi.org/10.32614/CRAN.package.insuranceData, accessed on 20 October 2025.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

List of notations used in the tables in Appenidx A, Appenidx B, Appenidx C and Appenidx D.

${AV}_{L}$	Average ${\hat{Q}}_{j} (δ)$ for Likelihood
${AV}_{D}$	Average ${\tilde{Q}}_{j} (δ)$ for Divergence
${BS}_{L}^{T}$	$B i a s ({\hat{Q}}_{j} (δ))$
${MS}_{L}^{T}$	$R M S E ({\hat{Q}}_{j} (δ))$
${BS}_{L}$	$R B i a s ({\hat{Q}}_{j} (δ), {\hat{Q}}_{h} (δ))$
${BS}_{D}$	$R B i a s ({\tilde{Q}}_{j} (δ), {\hat{Q}}_{h} (δ))$
${MS}_{L}$	$R R M S E ({\hat{Q}}_{j} (δ), {\hat{Q}}_{h} (δ))$
${MS}_{D}$	$R R M S E ({\tilde{Q}}_{j} (δ), {\hat{Q}}_{h} (δ))$
${AV}_{s}$	Average of ${\hat{Q}}^{†} (δ)$ or ${\tilde{Q}}^{†} (δ)$
${AV}_{k^{*}}$	Average of ${\hat{Q}}^{} (δ)$ or ${\tilde{Q}}^{} (δ)$
${BS}_{s}$	$R B i a s ({\hat{Q}}^{†} (δ)$ or ${\tilde{Q}}^{†} (δ), {\hat{Q}}_{h} (δ))$
${BS}_{k^{*}}$	$R B i a s ({\hat{Q}}^{} (δ)$ or ${\tilde{Q}}^{} (δ), {\hat{Q}}_{h} (δ))$
${MS}_{s}$	$R R M S E ({\hat{Q}}^{†} (δ)$ or ${\tilde{Q}}^{†} (δ), {\hat{Q}}_{h} (δ))$
${MS}_{k^{*}}$	$R R M S E ({\hat{Q}}^{} (δ)$ or ${\tilde{Q}}^{} (δ), {\hat{Q}}_{h} (δ))$

Appendix A. VaR Estimation for δ = 0.99 Without Contamination

Table A1. Simulated average estimate of VaR with associated bias and RMSEs (relative bias and relative RMSEs of divergence estimators) for

δ = 0.99

, when the true models are

F S (1.43, 834.95)

,

F R (1.05, 518.75)

,

L M (2.04, 2202.85)

,

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

, denoted as

F S

,

F R

,

L M

,

L N

,

P L

and

W E

respectively.

Table A1. Simulated average estimate of VaR with associated bias and RMSEs (relative bias and relative RMSEs of divergence estimators) for

δ = 0.99

, when the true models are

F S (1.43, 834.95)

,

F R (1.05, 518.75)

,

L M (2.04, 2202.85)

,

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

, denoted as

F S

,

F R

,

L M

,

L N

,

P L

and

W E

respectively.

n	Estimator	$α$		Model
				$j = 1$	$j = 2$	$j = 3$	$j = 4$	$j = 5$	$j = 6$
			Statistics ^(a)	$FS$	$FR$	$LM$	$LN$	$PL$	$WE$
			$Q_{j} (δ)$	2.076	4.146	1.885	0.770	1.900	1.168
50	Likelihood		${AV}_{L}$	2.228	4.596	2.014	0.779	2.077	1.167
			${BS}_{L}^{T}$	0.152	0.450	0.128	0.009	0.177	−0.002
			${MS}_{L}^{T}$	1.044	3.286	1.382	0.268	1.209	0.275
	Divergence	0.05	${AV}_{D}$	2.238	4.613	2.080	0.784	2.099	1.175
			${BS}_{D}$	1.067	1.037	1.518	1.617	1.124	−3.974
			${MS}_{D}$	1.015	1.002	1.077	1.029	1.024	1.028
		0.10	${AV}_{D}$	2.250	4.653	2.165	0.790	2.124	1.181
			${BS}_{D}$	1.146	1.126	2.177	2.209	1.264	−8.247
			${MS}_{D}$	1.053	1.030	1.271	1.075	1.076	1.076
		0.20	${AV}_{D}$	2.276	4.755	2.493	0.800	2.181	1.194
			${BS}_{D}$	1.318	1.352	4.733	3.324	1.588	−16.199
			${MS}_{D}$	1.165	1.126	2.324	1.199	1.259	1.268
100	Likelihood		${AV}_{L}$	2.136	4.325	1.904	0.787	2.002	1.165
			${BS}_{L}^{T}$	0.060	0.179	0.019	0.018	0.102	−0.003
			${MS}_{L}^{T}$	0.656	1.816	0.841	0.177	0.715	0.199
	Divergence	0.05	${AV}_{D}$	2.145	4.331	1.944	0.789	2.014	1.168
			${BS}_{D}$	1.145	1.034	3.150	1.083	1.118	0.135
			${MS}_{D}$	1.030	1.012	1.057	1.010	1.028	1.009
		0.10	${AV}_{D}$	2.154	4.344	1.989	0.790	2.027	1.170
			${BS}_{D}$	1.294	1.102	5.549	1.154	1.243	−0.601
			${MS}_{D}$	1.073	1.041	1.159	1.037	1.079	1.032
		0.20	${AV}_{D}$	2.171	4.372	2.121	0.791	2.055	1.174
			${BS}_{D}$	1.573	1.259	12.578	1.241	1.513	−1.889
			${MS}_{D}$	1.171	1.122	1.572	1.124	1.217	1.121

^(a) expressed as a multiplier of

10^{4}

.

Table A2. Simulated average estimate of VaR (expressed as a multiplier of

10^{4}

), relative bias and relative RMSE with

δ = 0.99

when the non-model selection approach is adopted for true models

F S (1.43, 834.95)

and

F R (1.05, 518.75)

. Values in bold indicate the results obtained under the true model.

Table A2. Simulated average estimate of VaR (expressed as a multiplier of

10^{4}

), relative bias and relative RMSE with

δ = 0.99

when the non-model selection approach is adopted for true models

F S (1.43, 834.95)

and

F R (1.05, 518.75)

. Values in bold indicate the results obtained under the true model.

True Model $(h)$	$n$	Method	$α$		Assumed Model
					$j = 1$	$j = 2$	$j = 3$	$j = 4$	$j = 5$	$j = 6$
					$FS$	$FR$	$LM$	$LN$	$PL$	$WE$
$F S (1.43, 834.95)$	50	Likelihood based		${AV}_{L}$	2.228	36.727	1.717	1.667	1.830	1.240
				${BS}_{L}$	1.000	228.288	−2.365	-2.691	−1.622	−5.509
				${MS}_{L}$	1.000	223.204	1.259	0.833	0.972	1.163
		Divergence based	0.05	${AV}_{D}$	2.238	41.121	1.621	1.620	1.784	1.027
				${BS}_{D}$	1.067	257.236	−2.999	−3.007	−1.921	−6.908
				${MS}_{D}$	1.015	149.300	1.149	0.807	0.943	1.105
			0.10	${AV}_{D}$	2.250	46.241	1.520	1.589	1.747	0.857
				${BS}_{D}$	1.146	290.970	−3.662	−3.209	−2.169	−8.030
				${MS}_{D}$	1.053	158.270	1.088	0.819	0.950	1.208
			0.20	${AV}_{D}$	2.276	71.710	1.328	1.565	1.695	0.676
				${BS}_{D}$	1.318	458.761	−4.926	−3.368	−2.513	−9.222
				${MS}_{D}$	1.165	418.429	1.069	0.884	1.027	1.357
	100	Likelihood based		${AV}_{L}$	2.136	31.208	1.609	1.620	1.728	1.242
				${BS}_{L}$	1.000	482.290	−7.734	−7.548	−5.767	−13.814
				${MS}_{L}$	1.000	115.236	1.276	1.011	1.049	1.544
		Divergence based	0.05	${AV}_{D}$	2.145	30.075	1.506	1.569	1.682	0.991
				${BS}_{D}$	1.145	463.527	−9.440	−8.399	−6.524	−17.961
				${MS}_{D}$	1.030	97.245	1.277	1.040	1.071	1.712
			0.10	${AV}_{D}$	2.154	30.388	1.403	1.534	1.643	0.832
				${BS}_{D}$	1.294	468.718	−11.146	−8.971	−7.172	−20.586
				${MS}_{D}$	1.073	74.042	1.332	1.080	1.111	1.921
			0.20	${AV}_{D}$	2.171	35.501	1.212	1.498	1.582	0.662
				${BS}_{D}$	1.573	553.361	−14.309	−9.560	−8.172	−23.407
				${MS}_{D}$	1.171	94.815	1.518	1.151	1.205	2.166
$F R (1.05, 518.75)$	50	Likelihood based		${AV}_{L}$	1.852	4.596	2.866	1.695	1.883	2.248
				${BS}_{L}$	−5.095	1.000	−2.843	−5.443	−5.025	−4.216
				${MS}_{L}$	0.771	1.000	1.063	0.828	0.804	1.380
		Divergence based	0.05	${AV}_{D}$	1.620	4.613	2.439	1.362	1.572	1.335
				${BS}_{D}$	−5.609	1.037	−3.790	−6.184	−5.717	−6.242
				${MS}_{D}$	0.810	1.002	0.909	0.875	0.838	0.897
			0.10	${AV}_{D}$	1.447	4.653	2.077	1.151	1.338	0.924
				${BS}_{D}$	−5.994	1.126	−4.596	−6.652	−6.235	−7.155
				${MS}_{D}$	0.849	1.030	0.844	0.926	0.884	0.992
			0.20	${AV}_{D}$	1.213	4.755	1.518	0.903	1.026	0.578
				${BS}_{D}$	−6.514	1.352	−5.836	−7.202	−6.927	−7.924
				${MS}_{D}$	0.909	1.126	0.867	0.994	0.962	1.088
	100	Likelihood based		${AV}_{L}$	1.772	4.325	2.589	1.604	1.773	2.082
				${BS}_{L}$	−13.255	1.000	−8.695	−14.190	−13.247	−11.523
				${MS}_{L}$	1.357	1.000	1.183	1.447	1.374	1.741
		Divergence based	0.05	${AV}_{D}$	1.562	4.331	2.238	1.313	1.500	1.263
				${BS}_{D}$	−14.427	1.034	−10.651	−15.817	−14.774	−16.098
				${MS}_{D}$	1.454	1.012	1.237	1.579	1.494	1.614
			0.10	${AV}_{D}$	1.399	4.344	1.928	1.117	1.286	0.890
				${BS}_{D}$	−15.340	1.102	−12.386	−16.909	−15.967	−18.180
				${MS}_{D}$	1.535	1.041	1.335	1.679	1.597	1.801
			0.20	${AV}_{D}$	1.171	4.372	1.430	0.880	0.991	0.568
				${BS}_{D}$	−16.608	1.259	−15.165	−18.234	−17.617	−19.975
				${MS}_{D}$	1.651	1.122	1.541	1.803	1.748	1.972

Table A3. Simulated average VaR estiamtes (expressed as a multiplier of

10^{4}

), relative bias and relative RMSE with

δ = 0.99

when non-model selection approach is adopted for true models

L M (2.04, 2202.85)

and

L N (1.19, 6.81)

. Values in bold indicate the results obtained under the true model.

Table A3. Simulated average VaR estiamtes (expressed as a multiplier of

10^{4}

), relative bias and relative RMSE with

δ = 0.99

when non-model selection approach is adopted for true models

L M (2.04, 2202.85)

and

L N (1.19, 6.81)

. Values in bold indicate the results obtained under the true model.

True Model $(h)$	$n$	Method	$α$		Assumed Model
					$j = 1$	$j = 2$	$j = 3$	$j = 4$	$j = 5$	$j = 6$
					$FS$	$FR$	$LM$	$LN$	$PL$	$WE$
$L M (2.04, 2202.85)$	50	Likelihood based		${AV}_{L}$	4.016	687.827	2.014	2.681	3.158	1.327
				${BS}_{L}$	16.594	5.34 $\times 10^{3}$	1.000	6.201	9.913	–4.349
				${MS}_{L}$	2.087	5.20 $\times 10^{3}$	1.000	1.084	1.587	0.694
		Divergence based	0.05	${AV}_{D}$	4.393	2.36 $\times 10^{3}$	2.080	2.858	3.477	1.206
				${BS}_{D}$	19.534	1.84 $\times 10^{4}$	1.518	7.575	12.396	−5.291
				${MS}_{D}$	2.486	3.03 $\times 10^{4}$	1.077	1.285	1.992	0.585
			0.10	${AV}_{D}$	4.781	799.436	2.165	3.063	3.845	1.101
				${BS}_{D}$	22.553	6.21 $\times 10^{3}$	2.177	9.176	15.268	−6.109
				${MS}_{D}$	2.934	3.20 $\times 10^{3}$	1.271	1.567	2.543	0.623
			0.20	${AV}_{D}$	5.573	4.34 $\times 10^{3}$	2.493	3.607	4.769	0.962
				${BS}_{D}$	28.725	3.38 $\times 10^{4}$	4.733	13.411	22.462	−7.192
				${MS}_{D}$	4.009	3.27 $\times 10^{4}$	2.324	2.621	4.336	0.701
	100	Likelihood based		${AV}_{L}$	3.963	227.069	1.904	2.629	3.066	1.306
				${BS}_{L}$	110.731	1.20 $\times 10^{4}$	1.000	39.665	62.930	−30.905
				${MS}_{L}$	2.948	1.32 $\times 10^{3}$	1.000	1.343	1.979	0.852
		Divergence based	0.05	${AV}_{D}$	4.323	269.232	1.944	2.787	3.349	1.189
				${BS}_{D}$	129.950	1.43 $\times 10^{4}$	3.150	48.080	78.030	-37.124
				${MS}_{D}$	3.467	1.08 $\times 10^{3}$	1.057	1.573	2.417	0.905
			0.10	${AV}_{D}$	4.680	341.819	1.989	2.964	3.655	1.091
				${BS}_{D}$	148.990	1.81 $\times 10^{4}$	5.549	57.516	94.355	−42.327
				${MS}_{D}$	4.009	1.53 $\times 10^{3}$	1.159	1.857	2.937	0.991
			0.20	${AV}_{D}$	5.359	755.957	2.121	3.369	4.321	0.956
				${BS}_{D}$	185.177	4.02 $\times 10^{4}$	12.578	79.093	129.827	−49.531
				${MS}_{D}$	5.120	6.52 $\times 10^{3}$	1.572	2.598	4.239	1.133
$L N (1.19, 6.81)$	50	Likelihood based		${AV}_{L}$	1.138	6.933	0.705	0.779	0.889	0.519
				${BS}_{L}$	40.417	676.477	−7.122	1.000	13.163	−27.521
				${MS}_{L}$	2.166	34.744	1.315	1.000	1.472	1.154
		Divergence gence	0.05	${AV}_{D}$	1.170	8.847	0.703	0.784	0.901	0.485
				${BS}_{D}$	43.985	886.568	−7.346	1.617	14.414	−31.273
				${MS}_{D}$	2.331	49.121	1.329	1.029	1.544	1.208
			0.10	${AV}_{D}$	1.196	10.905	0.695	0.790	0.907	0.448
				${BS}_{D}$	46.764	1.11 $\times 10^{3}$	−8.226	2.209	15.133	−35.335
				${MS}_{D}$	2.480	67.405	1.346	1.075	1.616	1.302
			0.20	${AV}_{D}$	1.228	15.344	0.670	0.800	0.909	0.386
				${BS}_{D}$	50.304	1.60 $\times 10^{3}$	−10.913	3.324	15.278	−42.048
				${MS}_{D}$	2.726	121.488	1.383	1.199	1.755	1.491
	100	Likelihood based		${AV}_{L}$	1.137	6.732	0.720	0.787	0.891	0.538
				${BS}_{L}$	20.945	340.029	−2.821	1.000	6.946	−13.194
				${MS}_{L}$	2.631	40.628	1.352	1.000	1.529	1.502
		Divergence based	0.05	${AV}_{D}$	1.164	8.029	0.711	0.789	0.897	0.497
				${BS}_{D}$	22.471	414.027	−3.356	1.083	7.283	−15.548
				${MS}_{D}$	2.808	49.078	1.332	1.010	1.577	1.650
			0.10	${AV}_{D}$	1.183	9.265	0.697	0.790	0.898	0.456
				${BS}_{D}$	23.592	484.521	−4.158	1.154	7.337	−17.875
				${MS}_{D}$	2.955	57.909	1.325	1.037	1.618	1.838
			0.20	${AV}_{D}$	1.204	11.749	0.659	0.791	0.889	0.390
				${BS}_{D}$	24.805	626.190	−6.330	1.241	6.800	−21.625
				${MS}_{D}$	3.163	77.375	1.351	1.124	1.683	2.175

Table A4. Simulated average VaR estimates (expressed as a multiplier of

10^{4}

), relative bias and relative RMSE with

δ = 0.99

when non-model selection approach is adopted for true models

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

. Values in bold indicate the results obtained under the true model.

Table A4. Simulated average VaR estimates (expressed as a multiplier of

10^{4}

), relative bias and relative RMSE with

δ = 0.99

when non-model selection approach is adopted for true models

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

. Values in bold indicate the results obtained under the true model.

True Model $(h)$	$n$	Method	$α$		Assumed Model
					$j = 1$	$j = 2$	$j = 3$	$j = 4$	$j = 5$	$j = 6$
					$FS$	$FR$	$LM$	$LN$	$PL$	$WE$
$P L (1.27, 1117.04)$	50	Likelihood based		${AV}_{L}$	2.603	84.381	1.721	1.875	2.077	1.219
				${BS}_{L}$	3.977	466.850	−1.016	−0.143	1.000	−3.857
				${MS}_{L}$	1.201	470.828	1.110	0.727	1.000	0.910
		Divergence based	0.05	${AV}_{D}$	2.685	87.852	1.674	1.880	2.099	1.047
				${BS}_{D}$	4.443	486.497	−1.280	−0.114	1.124	−4.829
				${MS}_{D}$	1.285	295.427	1.088	0.721	1.024	0.798
			0.10	${AV}_{D}$	2.764	111.779	1.623	1.898	2.124	0.914
				${BS}_{D}$	4.885	621.925	−1.568	−0.016	1.264	−5.586
				${MS}_{D}$	1.385	543.075	1.102	0.752	1.076	0.860
			0.20	${AV}_{D}$	2.905	146.298	1.521	1.957	2.181	0.749
				${BS}_{D}$	5.685	817.306	−2.147	0.319	1.588	−6.516
				${MS}_{D}$	1.611	572.347	1.202	0.891	1.259	0.972
	100	Likelihood based		${AV}_{L}$	2.547	57.314	1.645	1.854	2.002	1.219
				${BS}_{L}$	6.348	543.883	−2.504	−0.455	1.000	−6.687
				${MS}_{L}$	1.444	322.779	1.065	0.793	1.000	1.158
		Divergence based	0.05	${AV}_{D}$	2.623	54.594	1.579	1.853	2.014	1.044
				${BS}_{D}$	7.089	517.186	−3.153	−0.470	1.118	−8.407
				${MS}_{D}$	1.567	131.347	1.029	0.791	1.028	1.269
			0.10	${AV}_{D}$	2.692	57.748	1.507	1.860	2.027	0.908
				${BS}_{D}$	7.774	548.144	−3.861	−0.399	1.243	−9.741
				${MS}_{D}$	1.696	131.301	1.027	0.817	1.079	1.422
			0.20	${AV}_{D}$	2.813	69.931	1.361	1.894	2.055	0.743
				${BS}_{D}$	8.955	667.718	−5.290	−0.066	1.513	−11.362
				${MS}_{D}$	1.944	172.143	1.100	0.918	1.217	1.634
$W E (0.79, 1690.57)$	50	Likelihood based		${AV}_{L}$	5.835	1.08 $\times 10^{4}$	1.488	3.732	4.269	1.167
				${BS}_{L}$	−2.95 $\times 10^{3}$	−6.81 $\times 10^{6}$	−202.110	−1.62 $\times 10^{3}$	−1.96 $\times 10^{3}$	1.000
				${MS}_{L}$	20.235	7.02 $\times 10^{5}$	2.721	12.039	14.748	1.000
		Divergence gence	0.05	${AV}_{D}$	7.056	2.24 $\times 10^{5}$	1.697	4.427	5.473	1.175
				${BS}_{D}$	−3.72 $\times 10^{3}$	−1.41 $\times 10^{8}$	−334.238	−2.06 $\times 10^{3}$	−2.72 $\times 10^{3}$	−3.974
				${MS}_{D}$	26.742	2.44 $\times 10^{7}$	4.323	16.692	22.635	1.028
			0.10	${AV}_{D}$	8.534	1.47 $\times 10^{7}$	2.070	5.408	7.306	1.181
				${BS}_{D}$	−4.66 $\times 10^{3}$	−9.33 $\times 10^{9}$	−570.471	−2.68 $\times 10^{3}$	−3.88 $\times 10^{3}$	−8.247
				${MS}_{D}$	36.210	1.70 $\times 10^{9}$	9.252	28.572	40.200	1.076
			0.20	${AV}_{D}$	13.312	2.51 $\times 10^{11}$	7.310	18.255	50.427	1.194
				${BS}_{D}$	−7.68 $\times 10^{3}$	−1.59 $\times 10^{14}$	−3.88 $\times 10^{3}$	−1.08 $\times 10^{4}$	−3.12 $\times 10^{4}$	−16.199
				${MS}_{D}$	143.473	2.89 $\times 10^{13}$	182.046	960.227	4.16 $\times 10^{3}$	1.268
	100	Likelihood based		${AV}_{L}$	5.561	1.97 $\times 10^{3}$	1.414	3.613	3.942	1.165
				${BS}_{L}$	−1.41 $\times 10^{3}$	−6.31 $\times 10^{5}$	−78.692	−784.091	−889.330	1.000
				${MS}_{L}$	23.902	4.97 $\times 10^{4}$	2.476	13.696	15.796	1.000
		Divergence based	0.05	${AV}_{D}$	6.586	2.54 $\times 10^{3}$	1.540	4.129	4.826	1.168
				${BS}_{D}$	−1.74 $\times 10^{3}$	−8.15 $\times 10^{5}$	−119.045	−949.515	−1.17 $\times 10^{3}$	0.135
				${MS}_{D}$	29.921	1.12 $\times 10^{5}$	3.390	16.782	21.422	1.009
			0.10	${AV}_{D}$	7.740	3.55 $\times 10^{3}$	1.715	4.745	5.958	1.170
				${BS}_{D}$	−2.11 $\times 10^{3}$	−1.14 $\times 10^{6}$	−175.210	−1.15 $\times 10^{3}$	−1.54 $\times 10^{3}$	−0.601
				${MS}_{D}$	36.956	1.57 $\times 10^{5}$	4.954	20.747	29.188	1.032
			0.20	${AV}_{D}$	10.388	9.62 $\times 10^{3}$	2.402	6.405	9.232	1.174
				${BS}_{D}$	−2.96 $\times 10^{3}$	−3.08 $\times 10^{6}$	−395.750	−1.68 $\times 10^{3}$	−2.59 $\times 10^{3}$	−1.889
				${MS}_{D}$	54.636	3.82 $\times 10^{5}$	15.422	33.644	56.212	1.121

Table A5. VaR estimates (expressed as a multiplier of

10^{4}

), relative biases and relative RMSEs for

k^{*} = 1, δ = 0.99

, when true models are

F S (1.43, 834.95)

,

F R (1.05, 518.75)

,

L M (2.04, 2202.85)

,

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

. Values in bold indicate the smallest

M S_{S}

among all the estimators.

Table A5. VaR estimates (expressed as a multiplier of

10^{4}

), relative biases and relative RMSEs for

k^{*} = 1, δ = 0.99

, when true models are

F S (1.43, 834.95)

,

F R (1.05, 518.75)

,

L M (2.04, 2202.85)

,

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

. Values in bold indicate the smallest

M S_{S}

among all the estimators.

n		Likelihood Based	Divergence Based			Likelihood Based	Divergence Based
n		Likelihood Based	$α = 0.05$	$α = 0.10$	$α = 0.20$	Likelihood Based	$α = 0.05$	$α = 0.10$	$α = 0.20$
		True model: $F S (1.43, 834.95)$				True model: $L N (1.19, 6.81)$
		True $Q_{1} (0.99) = 2.076$				True $Q_{4} (0.99) = 0.770$
50	${AV}_{s}$	1.130	1.146	1.187	1.255	0.481	0.542	0.597	0.722
	${BS}_{s}$	−6.235	−6.127	−5.854	−5.409	−31.646	−24.982	−18.901	−5.191
	${MS}_{s}$	2.324	2.477	2.734	3.331	3.043	3.755	4.476	5.846
100	${AV}_{s}$	0.949	0.962	0.945	0.929	0.437	0.469	0.506	0.632
	${BS}_{s}$	−18.654	−18.437	−18.725	−18.995	−18.949	−17.164	−15.006	−7.862
	${MS}_{s}$	2.587	2.678	2.701	2.794	3.675	4.113	4.790	7.318
		True model: $F R (1.05, 518.75)$				True model: $P L (1.27, 1117.04)$
		True $Q_{2} (0.99) = 4.146$				True $Q_{5} (0.99) = 1.900$
50	${AV}_{s}$	2.097	2.085	2.091	2.056	1.011	1.096	1.118	1.264
	${BS}_{s}$	−4.550	−4.577	−4.564	−4.641	−5.033	−4.553	−4.428	−3.605
	${MS}_{s}$	1.143	1.137	1.147	1.165	1.509	2.095	2.309	3.120
100	${AV}_{s}$	2.101	2.103	2.107	2.106	0.963	0.968	0.964	0.969
	${BS}_{s}$	−11.420	−11.405	−11.387	−11.388	−9.202	−9.155	−9.189	−9.147
	${MS}_{s}$	1.774	1.775	1.781	1.806	2.134	2.153	2.155	2.212
		True model: $L M (2.04, 2202.85)$				True model: $W E (0.79, 1690.57)$
		True $Q_{3} (0.99) = 1.885$				True $Q_{6} (0.99) = 1.168$
50	${AV}_{s}$	0.985	1.067	1.137	1.381	0.646	0.674	0.715	0.852
	${BS}_{s}$	−7.015	−6.372	−5.826	−3.930	330.374	312.732	286.627	200.286
	${MS}_{s}$	1.199	1.404	1.684	2.498	3.238	3.321	3.467	4.616
100	${AV}_{s}$	0.972	1.007	1.043	1.148	0.606	0.619	0.637	0.690
	${BS}_{s}$	−48.666	−46.801	−44.888	−39.318	180.380	176.138	170.449	153.560
	${MS}_{s}$	1.749	1.810	1.865	2.223	4.238	4.278	4.352	4.729

Table A6. Proportion of models included for the VaR estimates reported in Table A5 for sample sizes 50 and 100 when the true models are

F S (1.43, 834.95)

,

F R (1.05, 518.75)

,

L M (2.04, 2202.85)

,

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

. Values in bold indicate the results obtained under the true model.

Table A6. Proportion of models included for the VaR estimates reported in Table A5 for sample sizes 50 and 100 when the true models are

F S (1.43, 834.95)

,

F R (1.05, 518.75)

,

L M (2.04, 2202.85)

,

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

. Values in bold indicate the results obtained under the true model.

True Model	n	Method	$α$	Assumed Model
True Model	n	Method	$α$	$FS$	$FR$	$LM$	$LN$	$PL$	$WE$
$F S (1.43, 834.95)$	50	Likelihood		0.281	0.102	0.073	0.320	0.180	0.044
		Divergence	0.05	0.284	0.100	0.072	0.308	0.199	0.037
			0.10	0.287	0.103	0.072	0.294	0.210	0.034
			0.20	0.273	0.103	0.064	0.304	0.222	0.034
	100	Likelihood		0.408	0.028	0.054	0.271	0.234	0.005
		Divergence	0.05	0.405	0.032	0.043	0.264	0.252	0.004
			0.10	0.394	0.029	0.043	0.262	0.268	0.004
			0.20	0.358	0.029	0.04	0.295	0.272	0.006
$F R (1.05, 518.75)$	50	Likelihood		0.031	0.887	0.000	0.082	0.000	0.000
		Divergence	0.05	0.035	0.892	0.000	0.073	0.000	0.000
			0.10	0.040	0.895	0.000	0.065	0.000	0.000
			0.20	0.050	0.882	0.000	0.067	0.001	0.000
	100	Likelihood		0.012	0.961	0.000	0.027	0.000	0.000
		Divergence	0.05	0.015	0.964	0.000	0.021	0.000	0.000
			0.10	0.017	0.966	0.000	0.017	0.000	0.000
			0.20	0.017	0.961	0.000	0.022	0.000	0.000
$L M (2.04, 2202.85)$	50	Likelihood		0.063	0.007	0.341	0.257	0.091	0.241
		Divergence	0.05	0.070	0.014	0.337	0.260	0.099	0.220
			0.10	0.076	0.016	0.307	0.264	0.115	0.222
			0.20	0.088	0.031	0.270	0.264	0.124	0.223
	100	Likelihood		0.042	0.000	0.514	0.145	0.139	0.160
		Divergence	0.05	0.048	0.001	0.511	0.144	0.146	0.150
			0.10	0.056	0.002	0.497	0.155	0.146	0.144
			0.20	0.074	0.004	0.436	0.177	0.152	0.157
$L N (1.19, 6.81)$	50	Likelihood		0.089	0.096	0.062	0.601	0.094	0.058
		Divergence	0.05	0.097	0.115	0.053	0.586	0.104	0.045
			0.10	0.101	0.126	0.046	0.577	0.115	0.035
			0.20	0.112	0.156	0.031	0.547	0.125	0.029
	100	Likelihood		0.106	0.031	0.026	0.730	0.094	0.013
		Divergence	0.05	0.119	0.044	0.017	0.710	0.100	0.010
			0.10	0.131	0.056	0.016	0.693	0.097	0.007
			0.20	0.145	0.088	0.009	0.655	0.099	0.004
$P L (1.27, 1117.04)$	50	Likelihood		0.184	0.051	0.158	0.324	0.210	0.073
		Divergence	0.05	0.194	0.058	0.154	0.310	0.220	0.064
			0.10	0.195	0.060	0.151	0.310	0.225	0.059
			0.20	0.197	0.070	0.144	0.303	0.221	0.065
	100	Likelihood		0.238	0.011	0.177	0.243	0.320	0.011
		Divergence	0.05	0.231	0.011	0.166	0.244	0.340	0.008
			0.10	0.218	0.010	0.155	0.253	0.355	0.009
			0.20	0.216	0.011	0.154	0.256	0.350	0.013
$W E (0.79, 1690.57)$	50	Likelihood		0.000	0.000	0.123	0.063	0.018	0.796
		Divergence	0.05	0.003	0.000	0.132	0.073	0.019	0.773
			0.10	0.006	0.000	0.137	0.093	0.022	0.742
			0.20	0.014	0.002	0.149	0.132	0.024	0.679
	100	Likelihood		0.001	0.000	0.140	0.005	0.000	0.854
		Divergence	0.05	0.001	0.000	0.148	0.011	0.001	0.839
			0.10	0.001	0.000	0.153	0.015	0.004	0.827
			0.20	0.002	0.000	0.156	0.032	0.005	0.805

Table A7. VaR estimates (expressed as a multiplier of

10^{4}

), relative biases and relative RMSEs for

k^{*} = 2, δ = 0.99

, when true models are

F S (1.43, 834.95)

,

F R (1.05, 518.75)

,

L M (2.04, 2202.85)

,

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

. Values in bold indicate the smallest

M S_{k^{*}}

among all the estimators.

Table A7. VaR estimates (expressed as a multiplier of

10^{4}

), relative biases and relative RMSEs for

k^{*} = 2, δ = 0.99

, when true models are

F S (1.43, 834.95)

,

F R (1.05, 518.75)

,

L M (2.04, 2202.85)

,

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

. Values in bold indicate the smallest

M S_{k^{*}}

among all the estimators.

n		Likelihood Based	Divergence Based			Likelihood Based	Divergence Based
n		Likelihood Based	$α = 0.05$	$α = 0.10$	$α = 0.20$	Likelihood Based	$α = 0.05$	$α = 0.10$	$α = 0.20$
		True model: $F S (1.43, 834.95)$				True model: $L N (1.19, 6.81)$
		True $Q_{1} (0.99) = 2.076$				True $Q_{4} (0.99) = 0.770$
50	${AV}_{k^{*}}$	1.077	1.124	1.170	1.292	0.499	0.563	0.642	0.816
	${BS}_{k^{*}}$	−6.584	−6.269	−5.970	−5.162	−29.730	−22.714	−13.959	5.104
	${MS}_{k^{*}}$	1.306	1.355	1.429	1.676	1.640	1.804	2.189	3.209
100	${AV}_{k^{*}}$	0.956	0.948	0.959	0.968	0.483	0.537	0.586	0.719
	${BS}_{k^{*}}$	−18.547	−18.680	−18.491	−18.349	−16.350	−13.259	−10.452	−2.882
	${MS}_{k^{*}}$	1.843	1.853	1.857	1.893	2.044	2.253	2.495	3.568
		True model: $F R (1.05, 518.75)$				True model: $P L (1.27, 1117.04)$
		True $Q_{2} (0.99) = 4.146$				True $Q_{5} (0.99) = 1.900$
50	${AV}_{k^{*}}$	1.541	1.474	1.420	1.358	1.054	1.117	1.176	1.409
	${BS}_{k^{*}}$	−5.785	−5.935	−6.054	−6.192	−4.790	−4.436	−4.099	−2.783
	${MS}_{k^{*}}$	0.892	0.888	0.881	0.901	1.024	1.079	1.190	2.002
100	${AV}_{k^{*}}$	1.485	1.436	1.390	1.311	0.992	1.021	1.055	1.133
	${BS}_{k^{*}}$	−14.859	−15.131	−15.389	−15.831	−8.916	−8.636	−8.297	−7.532
	${MS}_{k^{*}}$	1.503	1.526	1.549	1.592	1.395	1.396	1.399	1.576
		True model: $L M (2.04, 2202.85)$				True model: $W E (0.79, 1690.57)$
		True $Q_{3} (0.99) = 1.885$				True $Q_{6} (0.99) = 1.168$
50	${AV}_{k^{*}}$	1.120	1.225	1.396	1.876	0.761	0.908	1.196	2.774
	${BS}_{k^{*}}$	−5.958	−5.141	−3.812	−0.074	257.901	164.620	−17.435	−1015.696
	${MS}_{k^{*}}$	0.812	0.865	1.141	2.350	2.510	3.508	6.069	38.488
100	${AV}_{k^{*}}$	1.081	1.151	1.226	1.406	0.659	0.725	0.837	1.393
	${BS}_{k^{*}}$	−42.852	−39.123	−35.150	−25.559	163.240	142.046	106.259	−71.952
	${MS}_{k^{*}}$	1.132	1.117	1.118	1.590	2.705	2.945	3.720	9.341

Table A8. VaR estimates (expressed as a multiplier of

10^{4}

), relative biases and relative RMSEs for

k^{*} = 3, δ = 0.99

, when true models are

F S (1.43, 834.95)

,

F R (1.05, 518.75)

,

L M (2.04, 2202.85)

,

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

. Values in bold indicate the smallest

M S_{k^{*}}

among all the estimators.

Table A8. VaR estimates (expressed as a multiplier of

10^{4}

), relative biases and relative RMSEs for

k^{*} = 3, δ = 0.99

, when true models are

F S (1.43, 834.95)

,

F R (1.05, 518.75)

,

L M (2.04, 2202.85)

,

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

. Values in bold indicate the smallest

M S_{k^{*}}

among all the estimators.

n		Likelihood Based	Divergence Based			Likelihood Based	Divergence Based
n		Likelihood Based	$α = 0.05$	$α = 0.10$	$α = 0.20$	Likelihood Based	$α = 0.05$	$α = 0.10$	$α = 0.20$
		True model: $F S (1.43, 834.95)$				True model: $L N (1.19, 6.81)$
		True $Q_{1} (0.99) = 2.076$				True $Q_{4} (0.99) = 0.770$
50	${AV}_{k^{*}}$	0.713	0.730	0.753	0.801	0.332	0.367	0.410	0.498
	${BS}_{k^{*}}$	−8.976	−8.868	−8.717	−8.399	−48.002	−44.178	−39.463	−29.769
	${MS}_{k^{*}}$	1.402	1.398	1.401	1.439	1.789	1.742	1.748	1.893
100	${AV}_{k^{*}}$	0.633	0.626	0.624	0.643	0.330	0.357	0.393	0.461
	${BS}_{k^{*}}$	−23.890	−23.996	−24.034	−23.722	−25.090	−23.505	−21.499	−17.606
	${MS}_{k^{*}}$	2.240	2.248	2.253	2.280	2.601	2.526	2.462	2.548
		True model: $F R (1.05, 518.75)$				True model: $P L (1.27, 1117.04)$
		True $Q_{2} (0.99) = 4.146$				True $Q_{5} (0.99) = 1.900$
50	${AV}_{k^{*}}$	0.906	0.838	0.797	0.747	0.718	0.752	0.788	0.901
	${BS}_{k^{*}}$	−7.195	−7.347	−7.438	−7.548	−6.693	−6.498	−6.297	−5.659
	${MS}_{k^{*}}$	1.012	1.023	1.033	1.046	1.074	1.075	1.077	1.286
100	${AV}_{k^{*}}$	0.853	0.802	0.761	0.712	0.677	0.688	0.702	0.743
	${BS}_{k^{*}}$	−18.388	−18.671	−18.898	−19.172	−12.012	−11.896	−11.765	−11.364
	${MS}_{k^{*}}$	1.823	1.849	1.870	1.897	1.749	1.740	1.728	1.730
		True model: $L M (2.04, 2202.85)$				True model: $W E (0.79, 1690.57)$
		True $Q_{3} (0.99) = 1.885$				True $Q_{6} (0.99) = 1.168$
50	${AV}_{k^{*}}$	0.849	0.927	1.045	1.367	0.777	0.975	1.283	6.694
	${BS}_{k^{*}}$	−8.074	−7.466	−6.544	−4.039	247.431	121.999	−72.393	−3494.306
	${MS}_{k^{*}}$	0.844	0.829	0.871	1.297	2.163	3.124	5.933	468.866
100	${AV}_{k^{*}}$	0.824	0.891	0.956	1.111	0.718	0.842	1.017	1.587
	${BS}_{k^{*}}$	−56.578	−52.988	−49.546	−41.279	144.572	104.563	48.518	−134.183
	${MS}_{k^{*}}$	1.325	1.264	1.219	1.255	2.545	2.450	2.947	7.726

Table A9. Proportion of inclusion of true model as per order of preference for the VaR estimates reported in Table A7 for sample sizes 50 and 100 when the true models are

F S (1.43, 834.95)

,

F R (1.05, 518.75)

,

L M (2.04, 2202.85)

,

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

.

Table A9. Proportion of inclusion of true model as per order of preference for the VaR estimates reported in Table A7 for sample sizes 50 and 100 when the true models are

F S (1.43, 834.95)

,

F R (1.05, 518.75)

,

L M (2.04, 2202.85)

,

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

.

n	Estimator	$α$	Order of Preference			Order of Preference
n	Estimator	$α$	0 ^$τ$	1	2	0 ^$τ$	1	2
			True model: $F S (1.43, 834.95)$			True model: $L N (1.19, 6.81)$
50	Likelihood		0.391	0.281	0.328	0.240	0.601	0.159
	Divergence	0.05	0.362	0.284	0.354	0.236	0.586	0.178
		0.10	0.357	0.287	0.356	0.236	0.577	0.187
		0.20	0.360	0.273	0.367	0.243	0.547	0.210
100	Likelihood		0.231	0.408	0.361	0.152	0.730	0.118
	Divergence	0.05	0.208	0.405	0.387	0.161	0.710	0.129
		0.10	0.185	0.394	0.421	0.168	0.693	0.139
		0.20	0.193	0.358	0.449	0.175	0.655	0.170
			True model: $F R (1.05, 518.75)$			True model: $P L (1.27, 1117.04)$
50	Likelihood		0.065	0.887	0.048	0.446	0.210	0.344
		0.05	0.062	0.892	0.046	0.434	0.220	0.346
	Divergence	0.10	0.058	0.895	0.047	0.437	0.225	0.338
	Likelihood	0.20	0.063	0.882	0.055	0.470	0.221	0.309
	Likelihood		0.013	0.961	0.026	0.236	0.320	0.444
100	Divergence	0.05	0.013	0.964	0.023	0.239	0.340	0.421
		0.10	0.013	0.966	0.021	0.251	0.355	0.394
		0.20	0.021	0.961	0.018	0.289	0.350	0.361
			True model: $L M (2.04, 2202.85)$			True model: $W E (0.79, 1690.57)$
50	Likelihood		0.322	0.341	0.337	0.101	0.796	0.103
	Divergence	0.05	0.351	0.337	0.312	0.122	0.773	0.105
		0.10	0.388	0.307	0.305	0.147	0.742	0.111
		0.20	0.451	0.270	0.279	0.206	0.679	0.115
100	Likelihood		0.210	0.514	0.276	0.034	0.854	0.112
	Divergence	0.05	0.237	0.511	0.252	0.043	0.839	0.118
		0.10	0.270	0.497	0.233	0.060	0.827	0.113
		0.20	0.339	0.436	0.225	0.080	0.805	0.115

^$τ$ non-inclusion of the VaR estimate of the true model.

Table A10. Proportion of inclusion of true model as per order of preference for the VaR estimates reported in Table A8 for sample sizes 50 and 100 when the true models are

F S (1.43, 834.95)

,

F R (1.05, 518.75)

,

L M (2.04, 2202.85)

,

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

.

Table A10. Proportion of inclusion of true model as per order of preference for the VaR estimates reported in Table A8 for sample sizes 50 and 100 when the true models are

F S (1.43, 834.95)

,

F R (1.05, 518.75)

,

L M (2.04, 2202.85)

,

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

.

n	Estimator	$α$	Order of Preference				Order of Preference
n	Estimator	$α$	0 ^$τ$	1	2	3	0 ^$τ$	1	2	3
			True model: $F S (1.43, 834.95)$				True model: $L N (1.19, 6.81)$
50	Likelihood		0.172	0.281	0.328	0.219	0.096	0.601	0.159	0.144
	Divergence	0.05	0.131	0.284	0.354	0.231	0.088	0.586	0.178	0.148
		0.10	0.108	0.287	0.356	0.249	0.089	0.577	0.187	0.147
		0.20	0.094	0.273	0.367	0.266	0.090	0.547	0.210	0.153
100	Likelihood		0.064	0.408	0.361	0.167	0.035	0.730	0.118	0.117
	Divergence	0.05	0.046	0.405	0.387	0.162	0.036	0.710	0.129	0.125
		0.10	0.040	0.394	0.421	0.145	0.030	0.693	0.139	0.138
		0.20	0.036	0.358	0.449	0.157	0.028	0.655	0.170	0.147
			True model: $F R (1.05, 518.75)$				True model: $P L (1.27, 1117.04)$
50	Likelihood		0.042	0.887	0.048	0.023	0.125	0.210	0.344	0.321
	Divergence	0.05	0.038	0.892	0.046	0.024	0.122	0.220	0.346	0.312
		0.10	0.033	0.895	0.047	0.025	0.129	0.225	0.338	0.308
		0.20	0.033	0.882	0.055	0.030	0.148	0.221	0.309	0.322
	Likelihood		0.008	0.961	0.026	0.005	0.034	0.320	0.444	0.202
100	Divergence	0.05	0.006	0.964	0.023	0.007	0.034	0.340	0.421	0.205
		0.10	0.005	0.966	0.021	0.008	0.035	0.355	0.394	0.216
		0.20	0.007	0.961	0.018	0.014	0.046	0.350	0.361	0.243
			True model: $L M (2.04, 2202.85)$				True model: $W E (0.79, 1690.57)$
50	Likelihood		0.204	0.341	0.337	0.118	0.070	0.796	0.103	0.031
	Divergence	0.05	0.234	0.337	0.312	0.117	0.090	0.773	0.105	0.032
		0.10	0.268	0.307	0.305	0.120	0.119	0.742	0.111	0.028
		0.20	0.329	0.270	0.279	0.122	0.174	0.679	0.115	0.032
100	Likelihood		0.102	0.514	0.276	0.108	0.022	0.854	0.112	0.012
	Divergence	0.05	0.126	0.511	0.252	0.111	0.030	0.839	0.118	0.013
		0.10	0.152	0.497	0.233	0.118	0.039	0.827	0.113	0.021
		0.20	0.210	0.436	0.225	0.129	0.067	0.805	0.115	0.013

^$τ$ non-inclusion of the VaR estimate of the true model.

Appendix B. VaR Estimation for δ = 0.95 Without Contamination

Table A11. Simulated average estimate of VaR with associated bias and RMSEs (relative bias and relative RMSEs of divergence estimators) for

δ = 0.95

, when the true models are

F S (1.43, 834.95)

,

F R (1.05, 518.75)

,

L M (2.04, 2202.85)

,

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

, denoted as

F S

,

F R

,

L M

,

L N

,

P L

and

W E

respectively. Values in bold indicate the results obtained under the true model.

Table A11. Simulated average estimate of VaR with associated bias and RMSEs (relative bias and relative RMSEs of divergence estimators) for

δ = 0.95

, when the true models are

F S (1.43, 834.95)

,

F R (1.05, 518.75)

,

L M (2.04, 2202.85)

,

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

, denoted as

F S

,

F R

,

L M

,

L N

,

P L

and

W E

respectively. Values in bold indicate the results obtained under the true model.

n	Estimator	$α$	Statistics ^(a)	Model
				$j = 1$	$j = 2$	$j = 3$	$j = 4$	$j = 5$	$j = 6$
				$FS$	$FR$	$LM$	$LN$	$PL$	$WE$
			$Q_{j} (δ)$	0.654	0.857	0.736	0.342	0.662	0.678
50			${AV}_{L}$	0.676	0.900	0.738	0.343	0.682	0.675
	Likelihood		${BS}_{L}^{T}$	0.022	0.060	0.001	0.001	0.021	−0.003
			${MS}_{L}^{T}$	0.217	0.405	0.251	0.092	0.236	0.136
	Divergence	0.05	${AV}_{D}$	0.678	0.900	0.747	0.344	0.686	0.679
			${BS}_{D}$	1.075	1.004	8.170	3.643	1.171	−0.221
			${MS}_{D}$	1.010	1.013	1.029	1.024	1.018	1.019
		0.10	${AV}_{D}$	0.680	0.901	0.758	0.346	0.689	0.681
			${BS}_{D}$	1.149	1.026	15.599	6.036	1.336	−1.230
			${MS}_{D}$	1.037	1.046	1.104	1.061	1.056	1.051
		0.20	${AV}_{D}$	0.682	0.904	0.790	0.349	0.696	0.686
			${BS}_{D}$	1.282	1.096	39.296	10.224	1.664	−2.911
			${MS}_{D}$	1.116	1.134	1.463	1.165	1.176	1.184
100			${AV}_{L}$	0.664	0.893	0.727	0.348	0.678	0.676
	Likelihood		${BS}_{L}^{T}$	0.009	0.036	−0.009	0.006	0.016	−0.002
			${MS}_{L}^{T}$	0.141	0.256	0.172	0.062	0.157	0.099
	Divergence	0.05	${AV}_{D}$	0.665	0.894	0.734	0.348	0.680	0.677
			${BS}_{D}$	1.158	1.014	0.298	1.075	1.122	0.351
			${MS}_{D}$	1.021	1.012	1.024	1.008	1.019	1.006
		0.10	${AV}_{D}$	0.667	0.895	0.740	0.348	0.682	0.678
			${BS}_{D}$	1.313	1.041	−0.421	1.130	1.236	−0.189
			${MS}_{D}$	1.056	1.040	1.073	1.032	1.057	1.023
		0.20	${AV}_{D}$	0.669	0.896	0.758	0.349	0.686	0.680
			${BS}_{D}$	1.572	1.092	−2.345	1.170	1.456	−1.036
			${MS}_{D}$	1.140	1.112	1.270	1.112	1.162	1.095

^(a) expressed as a multiplier of

10^{4}

.

Table A12. Simulated average estimate of VaR (expressed as a multiplier of

10^{4}

), relative bias and relative RMSE with

δ = 0.95

when the non-model selection approach is adopted for true models

F S (1.43, 834.95)

and

F R (1.05, 518.75)

. Values in bold indicate the results obtained under the true model.

Table A12. Simulated average estimate of VaR (expressed as a multiplier of

10^{4}

), relative bias and relative RMSE with

δ = 0.95

when the non-model selection approach is adopted for true models

F S (1.43, 834.95)

and

F R (1.05, 518.75)

. Values in bold indicate the results obtained under the true model.

					Assumed Model
True Model $(h)$	$n$	Method	$α$		$j = 1$	$j = 2$	$j = 3$	$j = 4$	$j = 5$	$j = 6$
					$FS$	$FR$	$LM$	$LN$	$PL$	$WE$
$F S (1.43, 834.95)$	50	Likelihood based		${AV}_{L}$	$0.676$	2.498	0.663	0.686	0.629	0.683
				${BS}_{L}$	1.000	84.546	0.371	1.460	−1.156	1.307
				${MS}_{L}$	1.000	17.973	1.164	1.094	0.997	1.610
		Divergence based	0.05	${AV}_{D}$	0.678	2.941	0.647	0.673	0.619	0.598
				${BS}_{D}$	1.075	104.831	−0.358	0.841	−1.609	−2.580
				${MS}_{D}$	1.010	19.103	1.086	1.017	0.976	1.070
			0.10	${AV}_{D}$	0.680	3.265	0.628	0.664	0.610	0.521
				${BS}_{D}$	1.149	119.711	−1.231	0.414	−2.031	−6.138
				${MS}_{D}$	1.037	21.393	1.029	1.003	0.980	0.983
			0.20	${AV}_{D}$	0.682	3.926	0.586	0.655	0.595	0.430
				${BS}_{D}$	1.282	150.006	−3.156	0.029	−2.722	−10.281
				${MS}_{D}$	1.116	36.205	0.981	1.071	1.035	1.174
	100	Likelihood based		${AV}_{L}$	0.664	2.584	0.654	0.677	0.615	0.688
				${BS}_{L}$	1.000	206.365	−0.059	2.404	−4.197	3.639
				${MS}_{L}$	1.000	20.936	1.133	1.088	1.014	1.712
		Divergence based	0.05	${AV}_{D}$	0.665	2.727	0.634	0.662	0.605	0.584
				${BS}_{D}$	1.158	221.649	−2.182	0.788	−5.338	−7.558
				${MS}_{D}$	1.021	20.275	1.061	1.022	1.021	1.118
			0.10	${AV}_{D}$	0.667	2.858	0.612	0.651	0.595	0.510
				${BS}_{D}$	1.313	235.765	−4.532	−0.328	−6.374	−15.491
				${MS}_{D}$	1.056	20.429	1.033	1.020	1.050	1.283
			0.20	${AV}_{D}$	0.669	3.199	0.566	0.641	0.579	0.424
				${BS}_{D}$	1.572	272.153	−9.418	−1.489	−8.083	−24.699
				${MS}_{D}$	1.140	24.517	1.105	1.090	1.136	1.736
$F R (1.43, 834.95)$	50	Likelihood based		${AV}_{L}$	0.594	0.916	0.844	0.705	0.631	1.032
				${BS}_{L}$	−4.410	1.000	−0.217	−2.552	−3.787	2.937
				${MS}_{L}$	0.865	1.000	1.053	0.934	0.870	2.728
		Divergence based	0.05	${AV}_{D}$	0.543	0.918	0.786	0.604	0.566	0.733
				${BS}_{D}$	−5.265	1.033	−1.183	−4.249	−4.883	−2.075
				${MS}_{D}$	0.915	1.009	0.914	0.858	0.902	0.980
			0.10	${AV}_{D}$	0.502	0.922	0.728	0.532	0.511	0.551
				${BS}_{D}$	−5.954	1.098	−2.154	−5.451	−5.805	−5.126
				${MS}_{D}$	0.978	1.040	0.836	0.929	0.973	0.956
			0.20	${AV}_{D}$	0.442	0.931	0.620	0.441	0.429	0.377
				${BS}_{D}$	−6.955	1.250	−3.967	−6.979	−7.188	−8.053
				${MS}_{D}$	1.091	1.128	0.838	1.089	1.122	1.228
	100	Likelihood based		${AV}_{L}$	0.581	0.893	0.824	0.685	0.616	1.005
				${BS}_{L}$	−7.591	1.000	−0.894	−4.731	−6.622	4.073
				${MS}_{L}$	1.236	1.000	1.041	1.073	1.167	2.672
		Divergence based	0.05	${AV}_{D}$	0.532	0.894	0.768	0.590	0.554	0.706
				${BS}_{D}$	−8.925	1.014	−2.431	−7.323	−8.325	−4.150
				${MS}_{D}$	1.373	1.012	0.972	1.210	1.320	1.109
			0.10	${AV}_{D}$	0.492	0.895	0.712	0.522	0.501	0.536
				${BS}_{D}$	−10.024	1.041	−3.977	−9.203	−9.779	−8.813
				${MS}_{D}$	1.501	1.040	0.988	1.403	1.480	1.393
			0.20	${AV}_{D}$	0.433	0.896	0.608	0.433	0.421	0.372
				${BS}_{D}$	−11.648	1.092	−6.852	−11.641	−11.995	−13.328
				${MS}_{D}$	1.705	1.112	1.176	1.701	1.753	1.927

Table A13. Simulated average estimate of VaR (expressed as a multiplier of

10^{4}

), relative bias and relative RMSE with

δ = 0.95

when non-model selection approach is adopted for true models

L M (2.04, 2202.85)

and

L N (1.19, 6.81)

. Values in bold indicate the results obtained under the true model.

Table A13. Simulated average estimate of VaR (expressed as a multiplier of

10^{4}

), relative bias and relative RMSE with

δ = 0.95

when non-model selection approach is adopted for true models

L M (2.04, 2202.85)

and

L N (1.19, 6.81)

. Values in bold indicate the results obtained under the true model.

					Assumed Model
True Model $(h)$	$n$	Method	$α$		$j = 1$	$j = 2$	$j = 3$	$j = 4$	$j = 5$	$j = 6$
					$FS$	$FR$	$LM$	$L N$	$PL$	$WE$
$L M (2.04, 2202.85)$	50	Likelihood based		${AV}_{L}$	0.988	9.122	0.738	0.942	0.876	0.728
				${BS}_{L}$	184.680	6149.505	1.000	151.066	102.676	−6.317
				${MS}_{L}$	1.591	126.745	1.000	1.480	1.282	1.193
		Divergence based	0.05	${AV}_{D}$	1.047	15.587	0.747	0.988	0.924	0.681
				${BS}_{D}$	227.613	10891.326	8.170	184.840	137.862	−40.603
				${MS}_{D}$	1.877	384.122	1.029	1.711	1.526	0.839
			0.10	${AV}_{D}$	1.104	14.643	0.758	1.040	0.976	0.635
				${BS}_{D}$	269.891	10198.940	15.599	222.475	175.556	−74.194
				${MS}_{D}$	2.187	139.374	1.104	2.022	1.839	0.795
			0.20	${AV}_{D}$	1.214	30.757	0.790	1.163	1.087	0.570
				${BS}_{D}$	350.259	22016.192	39.296	313.028	257.452	−122.024
				${MS}_{D}$	2.861	583.449	1.463	3.011	2.678	0.887
	100	Likelihood based		${AV}_{L}$	0.984	7.419	0.727	0.933	0.869	0.721
				${BS}_{L}$	−26.911	−725.582	1.000	−21.380	−14.366	1.664
				${MS}_{L}$	1.972	71.493	1.000	1.722	1.439	1.077
		Divergence based	0.05	${AV}_{D}$	1.043	9.264	0.734	0.977	0.915	0.673
				${BS}_{D}$	−33.286	−925.900	0.298	−26.136	−19.439	6.863
				${MS}_{D}$	2.348	81.529	1.024	2.003	1.732	0.923
			0.10	${AV}_{D}$	1.100	11.071	0.740	1.025	0.964	0.630
				${BS}_{D}$	−39.433	−1122.061	−0.421	−31.340	−24.697	11.544
				${MS}_{D}$	2.734	101.687	1.073	2.343	2.074	0.954
			0.20	${AV}_{D}$	1.203	16.500	0.758	1.130	1.062	0.567
				${BS}_{D}$	−50.633	−1711.510	−2.345	−42.771	−35.315	18.440
				${MS}_{D}$	3.490	223.177	1.270	3.174	2.860	1.173
$L N (1.19, 6.18)$	50	Likelihood based		${AV}_{L}$	0.363	0.903	0.327	0.343	0.330	0.313
				${BS}_{L}$	31.249	853.066	−23.533	1.000	−17.735	−44.701
				${MS}_{L}$	1.139	7.982	1.005	1.000	1.033	1.020
		Divergence based	0.05	${AV}_{D}$	0.369	1.061	0.326	0.344	0.332	0.297
				${BS}_{D}$	40.970	1092.348	−24.127	3.643	−14.747	−68.395
				${MS}_{D}$	1.213	10.421	1.013	1.024	1.067	0.986
			0.10	${AV}_{D}$	0.374	1.216	0.324	0.346	0.333	0.279
				${BS}_{D}$	48.263	1327.837	−26.982	6.036	−13.575	−95.557
				${MS}_{D}$	1.285	12.990	1.026	1.061	1.106	1.033
			0.20	${AV}_{D}$	0.379	1.501	0.318	0.349	0.332	0.248
				${BS}_{D}$	56.620	1761.416	−36.366	10.224	−15.694	−143.206
				${MS}_{D}$	1.414	18.620	1.059	1.165	1.191	1.228
	100	Likelihood based		${AV}_{L}$	0.365	0.921	0.335	0.348	0.334	0.323
				${BS}_{L}$	4.093	101.479	−1.191	1.000	−1.351	−3.316
				${MS}_{L}$	1.151	10.689	1.038	1.000	1.012	1.096
		Divergence based	0.05	${AV}_{D}$	0.371	1.049	0.333	0.348	0.335	0.304
				${BS}_{D}$	5.025	123.842	−1.542	1.075	−1.211	−6.653
				${MS}_{D}$	1.230	12.933	1.028	1.008	1.034	1.082
			0.10	${AV}_{D}$	0.374	1.164	0.330	0.348	0.335	0.284
				${BS}_{D}$	5.686	144.026	−2.114	1.130	−1.280	−10.129
				${MS}_{D}$	1.305	15.041	1.024	1.032	1.064	1.215
			0.20	${AV}_{D}$	0.378	1.375	0.320	0.349	0.331	0.250
				${BS}_{D}$	6.312	181.018	−3.791	1.170	−1.904	−16.054
				${MS}_{D}$	1.427	19.165	1.042	1.112	1.140	1.614

Table A14. Simulated average estimate of VaR (expressed as a multiplier of

10^{4}

), relative bias and relative RMSE with

δ = 0.95

when non-model selection approach is adopted for true models

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

. Values in bold indicate the results obtained under the true model.

Table A14. Simulated average estimate of VaR (expressed as a multiplier of

10^{4}

), relative bias and relative RMSE with

δ = 0.95

when non-model selection approach is adopted for true models

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

. Values in bold indicate the results obtained under the true model.

					Assumed Model
TrueModel $(h)$	$n$	Method	$α$		$j = 1$	$j = 2$	$j = 3$	$j = 4$	$j = 5$	$j = 6$
					$FS$	$FR$	$LM$	$LN$	$PL$	$WE$
$P L (1.27, 1117.04)$	50	Likelihood based		${AV}_{L}$	0.750	3.644	0.671	0.743	0.682	0.678
				${BS}_{L}$	4.306	145.089	0.443	3.971	1.000	0.780
				${MS}_{L}$	1.096	30.679	1.061	1.125	1.000	1.417
		Divergence based	0.05	${AV}_{D}$	0.766	4.407	0.663	0.746	0.686	0.610
				${BS}_{D}$	5.046	182.224	0.040	4.105	1.171	−2.538
				${MS}_{D}$	1.162	30.567	1.030	1.119	1.018	0.922
			0.10	${AV}_{D}$	0.780	5.046	0.652	0.752	0.689	0.549
				${BS}_{D}$	5.726	213.298	−0.480	4.379	1.336	−5.484
				${MS}_{D}$	1.239	40.763	1.016	1.163	1.056	0.843
			0.20	${AV}_{D}$	0.804	6.085	0.626	0.769	0.696	0.468
				${BS}_{D}$	6.895	263.827	−1.728	5.200	1.664	−9.418
				${MS}_{D}$	1.412	51.322	1.035	1.341	1.176	0.978
	100	Likelihood based		${AV}_{L}$	0.748	3.607	0.672	0.744	0.678	0.685
				${BS}_{L}$	5.253	178.905	0.588	5.003	1.000	1.407
				${MS}_{L}$	1.187	30.734	1.050	1.210	1.000	1.322
		Divergence based	0.05	${AV}_{D}$	0.763	3.946	0.659	0.745	0.680	0.611
				${BS}_{D}$	6.142	199.446	−0.180	5.066	1.122	−3.105
				${MS}_{D}$	1.278	27.772	1.000	1.210	1.019	0.984
			0.10	${AV}_{D}$	0.776	4.230	0.644	0.749	0.682	0.548
				${BS}_{D}$	6.953	216.689	−1.098	5.267	1.236	−6.920
				${MS}_{D}$	1.375	29.582	0.968	1.249	1.057	1.027
			0.20	${AV}_{D}$	0.799	4.877	0.610	0.760	0.686	0.466
				${BS}_{D}$	8.320	256.007	−3.165	5.984	1.456	−11.895
				${MS}_{D}$	1.569	36.153	0.977	1.397	1.162	1.373
$W E (0.79, 1690.57)$	50	Likelihood based		${AV}_{L}$	1.287	28.842	0.681	1.183	1.078	0.675
				${BS}_{L}$	−230.799	−1.067 $\times 10^{4}$	−1.040	−191.338	−151.677	1.000
				${MS}_{L}$	5.408	1.350 $\times 10^{3}$	1.110	4.793	3.929	1.000
		Divergence based	0.05	${AV}_{D}$	1.452	1.143 $\times 10^{2}$	0.714	1.338	1.234	0.679
				${BS}_{D}$	−293.208	−4.303 $\times 10^{4}$	−13.784	−250.069	−210.582	−0.221
				${MS}_{D}$	6.939	1.476 $\times 10^{4}$	1.373	6.422	5.509	1.019
			0.10	${AV}_{D}$	1.634	1.338 $\times 10^{3}$	0.763	1.532	1.430	0.681
				${BS}_{D}$	−362.149	−5.066 $\times 10^{5}$	−32.376	−323.500	−284.783	−1.230
				${MS}_{D}$	8.792	2.928 $\times 10^{5}$	2.019	9.156	7.869	1.051
			0.20	${AV}_{D}$	2.072	1.115 $\times 10^{6}$	1.010	2.520	2.243	0.686
				${BS}_{D}$	−527.976	−4.226 $\times 10^{8}$	−125.888	−697.980	−592.831	−2.911
				${MS}_{D}$	16.435	2.594 $\times 10^{8}$	9.900	91.794	66.216	1.184
	100	Likelihood based		${AV}_{L}$	1.271	23.481	0.674	1.177	1.054	0.676
				${BS}_{L}$	−349.095	−1.34 $\times 10^{4}$	2.154	−293.847	−221.680	1.000
				${MS}_{L}$	6.598	528.717	1.100	5.741	4.464	1.000
		Divergence based	0.05	${AV}_{D}$	1.422	29.491	0.698	1.307	1.189	0.677
				${BS}_{D}$	−438.486	−1.70 $\times 10^{4}$	−11.884	−370.264	−300.824	0.351
				${MS}_{D}$	8.293	791.471	1.270	7.212	6.017	1.006
			0.10	${AV}_{D}$	1.584	38.143	0.730	1.455	1.346	0.678
				${BS}_{D}$	−533.725	−2.21 $\times 10^{4}$	−30.474	−457.465	−393.307	−0.189
				${MS}_{D}$	10.146	1.08 $\times 10^{3}$	1.616	8.985	7.918	1.023
			0.20	${AV}_{D}$	1.925	74.205	0.830	1.821	1.730	0.680
				${BS}_{D}$	−734.430	−4.33 $\times 10^{4}$	−89.715	−673.064	−619.862	−1.036
				${MS}_{D}$	14.269	2.24 $\times 10^{3}$	3.328	13.921	13.036	1.095

Table A15. VaR estimates (expressed as a multiplier of

10^{4}

), relative biases and relative RMSEs for

k^{*} = 1, δ = 0.95

, when true models are

F S (1.43, 834.95)

,

F R (1.05, 518.75)

,

L M (2.04, 2202.85)

,

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

. Values in bold indicate the smallest

M S_{S}

among all the estimators.

Table A15. VaR estimates (expressed as a multiplier of

10^{4}

), relative biases and relative RMSEs for

k^{*} = 1, δ = 0.95

, when true models are

F S (1.43, 834.95)

,

F R (1.05, 518.75)

,

L M (2.04, 2202.85)

,

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

. Values in bold indicate the smallest

M S_{S}

among all the estimators.

n		Likelihood Based	Divergence Based			Likelihood Based	Divergence Based
n		Likelihood Based	$α = 0.05$	$α = 0.10$	$α = 0.20$	Likelihood Based	$α = 0.05$	$α = 0.10$	$α = 0.20$
		True model: $F S (1.43, 834.95)$				True model: $L N (1.19, 6.81)$
		True $Q_{1} (0.95) = 0.654$				True $Q_{4} (0.95) = 0.342$
50	${AV}_{s}$	0.340	0.340	0.342	0.345	0.174	0.181	0.187	0.201
	${BS}_{s}$	−14.433	−14.430	−14.341	−14.198	−254.946	−244.556	−235.040	−213.511
	${MS}_{s}$	2.373	2.397	2.441	2.554	2.770	2.845	2.936	3.140
100	${AV}_{s}$	0.322	0.322	0.319	0.315	0.174	0.178	0.182	0.196
	${BS}_{s}$	−35.577	−35.582	−35.937	−36.309	−29.365	−28.728	−27.985	−25.662
	${MS}_{s}$	3.403	3.419	3.424	3.446	3.986	4.025	4.094	4.391
		True model: $F R (1.05, 518.75)$				True model: $P L (1.27, 1117.04)$
		True $Q_{2} (0.95) = 0.857$				True $Q_{5} (0.95) = 0.662$
50	${AV}_{s}$	0.434	0.431	0.430	0.422	0.343	0.349	0.351	0.362
	${BS}_{s}$	−7.100	−7.145	−7.161	−7.290	−15.527	−15.226	−15.127	−14.594
	${MS}_{s}$	1.656	1.655	1.664	1.678	2.135	2.204	2.230	2.384
100	${AV}_{s}$	0.439	0.439	0.438	0.437	0.340	0.340	0.340	0.340
	${BS}_{s}$	−11.492	−11.500	−11.505	−11.549	−19.575	−19.534	−19.545	−19.556
	${MS}_{s}$	2.476	2.478	2.483	2.501	3.084	3.087	3.088	3.100
		True model: $L M (2.04, 2202.85)$				True model: $W E (0.79, 1690.57)$
		True $Q_{3} (0.95) = 0.736$				True $Q_{6} (0.95) = 0.678$
50	${AV}_{s}$	0.374	0.383	0.391	0.416	0.345	0.352	0.362	0.391
	${BS}_{s}$	−265.734	−258.878	−252.968	−234.818	126.059	123.430	119.696	108.616
	${MS}_{s}$	2.190	2.219	2.266	2.423	3.607	3.620	3.642	3.793
100	${AV}_{s}$	0.370	0.375	0.381	0.397	0.340	0.343	0.347	0.360
	${BS}_{s}$	39.792	39.264	38.602	36.897	199.204	197.236	194.804	187.452
	${MS}_{s}$	3.115	3.127	3.143	3.221	4.906	4.911	4.920	4.982

Table A16. VaR estimates (expressed as a multiplier of

10^{4}

), relative biases and relative RMSEs for

k^{*} = 2, δ = 0.95

, when true models are

F S (1.43, 834.95)

,

F R (1.05, 518.75)

,

L M (2.04, 2202.85)

,

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

. Values in bold indicate the smallest

M S_{k^{*}}

among all the estimators.

Table A16. VaR estimates (expressed as a multiplier of

10^{4}

), relative biases and relative RMSEs for

k^{*} = 2, δ = 0.95

, when true models are

F S (1.43, 834.95)

,

F R (1.05, 518.75)

,

L M (2.04, 2202.85)

,

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

. Values in bold indicate the smallest

M S_{k^{*}}

among all the estimators.

n		Likelihood Based	Divergence Based			Likelihood Based	Divergence Based
n		Likelihood Based	$α = 0.05$	$α = 0.10$	$α = 0.20$	Likelihood Based	$α = 0.05$	$α = 0.10$	$α = 0.20$
		True model: $F S (1.43, 834.95)$				True model: $L N (1.19, 6.81)$
		True $Q_{1} (0.95) = 0.654$				True $Q_{4} (0.95) = 0.342$
50	${AV}_{k^{*}}$	0.333	0.337	0.340	0.348	0.174	0.181	0.190	0.208
	${BS}_{k^{*}}$	−14.723	−14.571	−14.438	−14.072	−255.662	−244.217	−230.788	−203.567
	${MS}_{k^{*}}$	1.602	1.600	1.604	1.616	1.922	1.873	1.833	1.784
100	${AV}_{k^{*}}$	0.322	0.320	0.320	0.318	0.176	0.182	0.188	0.202
	${BS}_{k^{*}}$	−35.578	−35.809	−35.786	−35.945	−29.102	−28.004	−26.994	−24.523
	${MS}_{k^{*}}$	2.428	2.441	2.444	2.464	2.738	2.672	2.615	2.529
		True model: $F R (1.05, 518.75)$				True model: $P L (1.27, 1117.04)$
		True $Q_{2} (0.95) = 0.857$				True $Q_{5} (0.95) = 0.662$
50	${AV}_{k^{*}}$	0.373	0.359	0.348	0.331	0.344	0.350	0.355	0.372
	${BS}_{k^{*}}$	−8.118	−8.352	−8.544	−8.828	−15.450	−15.162	−14.915	−14.111
	${MS}_{k^{*}}$	1.269	1.291	1.308	1.348	1.458	1.445	1.441	1.492
100	${AV}_{k^{*}}$	0.368	0.356	0.345	0.328	0.342	0.345	0.349	0.356
	${BS}_{k^{*}}$	−13.451	−13.779	−14.061	−14.529	−19.404	−19.223	−18.983	−18.546
	${MS}_{k^{*}}$	1.953	1.995	2.033	2.098	2.102	2.087	2.069	2.060
		True model: $L M (2.04, 2202.85)$				True model: $W E (0.79, 1690.57)$
		True $Q_{3} (0.95) = 0.736$				True $Q_{6} (0.95) = 0.678$
50	${AV}_{k^{*}}$	0.384	0.396	0.413	0.454	0.352	0.374	0.411	0.545
	${BS}_{k^{*}}$	−258.561	−249.563	−237.003	−206.978	123.459	115.159	101.175	50.371
	${MS}_{k^{*}}$	1.500	1.473	1.459	1.513	2.497	2.431	2.433	3.894
100	${AV}_{k^{*}}$	0.382	0.391	0.402	0.424	0.340	0.351	0.368	0.441
	${BS}_{k^{*}}$	38.522	37.485	36.308	33.889	199.122	192.602	182.416	139.645
	${MS}_{k^{*}}$	2.136	2.093	2.048	2.003	3.469	3.391	3.298	3.236

Table A17. VaR estiamtes (expressed as a multiplier of

10^{4}

), relative biases and relative RMSEs for

k^{*} = 3, δ = 0.95

, when true models are

F S (1.43, 834.95)

,

F R (1.05, 518.75)

,

L M (2.04, 2202.85)

,

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

. Values in bold indicate the smallest

M S_{k^{*}}

among all the estimators.

Table A17. VaR estiamtes (expressed as a multiplier of

10^{4}

), relative biases and relative RMSEs for

k^{*} = 3, δ = 0.95

, when true models are

F S (1.43, 834.95)

,

F R (1.05, 518.75)

,

L M (2.04, 2202.85)

,

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

. Values in bold indicate the smallest

M S_{k^{*}}

among all the estimators.

n		Likelihood Based	Divergence Based			Likelihood Based	Divergence Based
n		Likelihood Based	$α = 0.05$	$α = 0.10$	$α = 0.20$	Likelihood Based	$α = 0.05$	$α = 0.10$	$α = 0.20$
		True model: $F S (1.43, 834.95)$				True model: $L N (1.19, 6.81)$
		True $Q_{1} (0.95) = 0.654$				True $Q_{4} (0.95) = 0.342$
50	${AV}_{k^{*}}$	0.224	0.225	0.226	0.227	0.116	0.120	0.125	0.134
	${BS}_{k^{*}}$	−19.726	−19.697	−19.652	−19.583	−343.706	−337.259	−329.954	−315.924
	${MS}_{k^{*}}$	2.021	2.016	2.017	2.018	2.477	2.437	2.396	2.322
100	${AV}_{k^{*}}$	0.217	0.215	0.214	0.213	0.118	0.121	0.125	0.132
	${BS}_{k^{*}}$	−46.755	−46.980	−47.154	−47.253	−39.309	−38.725	−38.005	−36.756
	${MS}_{k^{*}}$	3.127	3.141	3.152	3.166	3.628	3.579	3.521	3.431
		True model: $F R (1.05, 518.75)$				True model: $P L (1.27, 1117.04)$
		True $Q_{2} (0.95) = 0.857$				True $Q_{5} (0.95) = 0.662$
50	${AV}_{k^{*}}$	0.239	0.225	0.215	0.199	0.233	0.236	0.240	0.248
	${BS}_{k^{*}}$	−10.369	−10.600	−10.772	−11.039	−20.857	−20.694	−20.526	−20.149
	${MS}_{k^{*}}$	1.548	1.576	1.599	1.636	1.850	1.838	1.827	1.818
100	${AV}_{k^{*}}$	0.234	0.222	0.212	0.196	0.233	0.234	0.235	0.238
	${BS}_{k^{*}}$	−17.131	−17.456	−17.741	−18.182	−26.072	−26.001	−25.918	−25.713
	${MS}_{k^{*}}$	2.447	2.491	2.530	2.592	2.759	2.752	2.744	2.731
		True model: $L M (2.04, 2202.85)$				True model: $W E (0.79, 1690.57)$
		True $Q_{3} (0.95) = 0.736$				True $Q_{6} (0.95) = 0.678$
50	${AV}_{k^{*}}$	0.270	0.280	0.293	0.324	0.274	0.301	0.337	0.486
	${BS}_{k^{*}}$	−342.361	−334.847	−325.115	−302.308	152.993	142.831	129.183	72.596
	${MS}_{k^{*}}$	1.890	1.856	1.817	1.753	3.018	2.879	2.758	8.552
100	${AV}_{k^{*}}$	0.269	0.278	0.288	0.308	0.268	0.288	0.313	0.384
	${BS}_{k^{*}}$	50.784	49.721	48.694	46.459	241.322	229.906	215.181	172.926
	${MS}_{k^{*}}$	2.748	2.695	2.645	2.548	4.179	4.002	3.789	3.380

Appendix C. VaR Estimation for δ = 0.99 in the Presence of Contamination

Table A18. True VaR and the simulated average VaR estimates with associated biases and RMSEs (relative biases and relative RMSEs of divergence estimators) for

δ = 0.99

, in the presence of

ϵ = 0.1

contamination, for true models

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

, contaminated by

L N (1.19, 18.54)

,

P L (1.27, 89363.2)

and

W E (0.79, 84528.5)

respectively.

Table A18. True VaR and the simulated average VaR estimates with associated biases and RMSEs (relative biases and relative RMSEs of divergence estimators) for

δ = 0.99

, in the presence of

ϵ = 0.1

contamination, for true models

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

, contaminated by

L N (1.19, 18.54)

,

P L (1.27, 89363.2)

and

W E (0.79, 84528.5)

respectively.

n	Estimator	$α$	Statistics ^(a)	True Model
				$f_{θ}$ : $LN (1.19, 6.81)$	$PL (1.27, 1117.04)$	$WE (0.79, 1690.57)$
				$f_{θ^{(c)}}$ : $LN (1.19, 18.54)$	$PL (1.27, 89363.2)$	$WE (0.79, 84528.5)$
			$Q_{θ_{j}} (δ)$	0.770	1.900	1.168
50			${AV}_{L}$	1.53 $\times 10^{3}$	481.674	8.504
	Likelihood		${BS}_{L}^{T}$	1526.004	481.659	7.335
			${MS}_{L}^{T}$	1673.085	531.072	8.340
	Divergence	0.05	${AV}_{D}$	45.113	15.815	6.378
			${BS}_{D}$	0.029	0.033	0.710
			${MS}_{D}$	0.028	0.034	0.702
		0.10	${AV}_{D}$	1.027	1.968	4.317
			${BS}_{D}$	1.69 $\times 10^{4}$	0.004	0.429
			${MS}_{D}$	3.18 $\times 10^{4}$	0.005	0.430
		0.20	${AV}_{D}$	0.976	1.063	1.907
			${BS}_{D}$	1.35 $\times 10^{4}$	0.002	0.101
			${MS}_{D}$	2.50 $\times 10^{4}$	0.002	0.147
100			${AV}_{L}$	8.345	13.409	8.345
	Likelihood		${BS}_{L}^{T}$	1490.96	11.508	7.177
			${MS}_{L}^{T}$	1560.373	12.553	7.682
	Divergence	0.05	${AV}_{D}$	6.204	9.732	6.204
			${BS}_{D}$	0.702	0.681	0.702
			${MS}_{D}$	0.697	0.692	0.697
		0.10	${AV}_{D}$	4.167	7.189	4.167
			${BS}_{D}$	0.418	0.460	0.418
			${MS}_{D}$	0.418	0.487	0.418
		0.20	${AV}_{D}$	1.843	4.555	1.843
			${BS}_{D}$	0.094	0.231	0.094
			${MS}_{D}$	0.115	0.284	0.115

^(a) expressed as a multiplier of

10^{4}

.

Table A19. Simulated average VaR estimates (expressed as a multiplier of

10^{4}

) for

δ = 0.99

, relative bias and relative RMSE when a non-model selection approach is adopted for the true model

L N (1.19, 6.81)

with

ϵ = 0.1

contamination from

L N (1.19, 18.54)

. Values in bold indicate the results obtained under the true model.

Table A19. Simulated average VaR estimates (expressed as a multiplier of

10^{4}

) for

δ = 0.99

, relative bias and relative RMSE when a non-model selection approach is adopted for the true model

L N (1.19, 6.81)

with

ϵ = 0.1

contamination from

L N (1.19, 18.54)

. Values in bold indicate the results obtained under the true model.

True Model	n	Method	$α$		Assumed Model
True Model	n	Method	$α$		$FS$	$FR$	$LM$	$LN$	$PL$	$WE$
$X^{(c)} \sim L N (1.19, 18.54)$	50	Likelihood based		${AV}_{L}$	109.971	363.687	4.25 $\times 10^{3}$	$1.53 \times 10^{3}$	481.674	7.29 $\times 10^{3}$
				${BS}_{L}$	0.072	0.238	2.787	1.000	0.315	4.779
				${MS}_{L}$	0.071	0.263	2.875	1.000	0.317	4.958
		Divergence based	0.05	${AV}_{D}$	5.804	52.335	188.278	45.113	15.815	1.09 $\times 10^{3}$
				${BS}_{D}$	0.003	0.034	0.123	0.029	0.010	0.712
				${MS}_{D}$	0.004	0.045	0.130	0.028	0.011	0.692
			0.10	${AV}_{D}$	1.882	24.361	10.285	1.027	1.968	0.460
				${BS}_{D}$	0.001	0.015	0.006	$1.69 \times 10^{4}$	0.001	−2.03 $\times 10^{4}$
				${MS}_{D}$	0.001	0.024	0.008	$3.18 \times 10^{4}$	0.001	2.06 $\times 10^{4}$
			0.20	${AV}_{D}$	1.399	21.003	0.886	0.976	1.063	0.401
				${BS}_{D}$	4.12 $\times 10^{4}$	0.013	7.63 $\times 10^{5}$	$1.35 \times 10^{4}$	1.93 $\times 10^{4}$	−2.41 $\times 10^{4}$
				${BS}_{D}$	5.71 $\times 10^{4}$	0.023	4.99 $\times 10^{4}$	$2.50 \times 10^{4}$	4.16 $\times 10^{4}$	2.33 $\times 10^{4}$
$X^{(0)} \sim L N (1.9, 6.81)$	100	Likelihood based		${AV}_{L}$	108.003	344.371	4.08 $\times 10^{3}$	$1.49 \times 10^{3}$	467.613	7.08 $\times 10^{3}$
				${BS}_{L}$	0.072	0.230	2.735	1.000	0.313	4.748
				${MS}_{L}$	0.071	0.242	2.764	1.000	0.313	4.842
		Divergence based	0.05	${AV}_{D}$	5.594	46.778	178.362	44.869	15.068	1.07 $\times 10^{3}$
				${BS}_{D}$	0.003	0.031	0.119	0.030	0.010	0.719
				${MS}_{D}$	0.003	0.037	0.122	0.029	0.010	0.710
			0.10	${AV}_{D}$	1.803	20.469	9.157	0.942	1.816	0.460
				${BS}_{D}$	0.001	0.013	0.006	$1.15 \times 10^{4}$	0.001	−2.08 $\times 10^{4}$
				${MS}_{D}$	0.001	0.017	0.006	$2.07 \times 10^{4}$	0.001	2.09 $\times 10^{4}$
			0.20	${AV}_{D}$	1.346	16.691	0.772	0.913	1.003	0.400
				${BS}_{D}$	3.87 $\times 10^{4}$	0.011	1.34 $\times 10^{6}$	$9.61 \times 10^{5}$	1.57 $\times 10^{4}$	−2.48 $\times 10^{4}$
				${MS}_{D}$	4.80 $\times 10^{4}$	0.016	2.50 $\times 10^{4}$	$1.77 \times 10^{4}$	2.92 $\times 10^{4}$	2.43 $\times 10^{4}$

Table A20. Simulated average VaR estimates for

δ = 0.99

(expressed as a multiplier of

10^{4}

), relative bias and relative RMSE when the non-model selection approach is adopted for the true model

P L (1.27, 1117.04)

with

ϵ = 0.1

contamination from

P L (1.27, 89363.2)

. Values in bold indicate the results obtained under the true model.

Table A20. Simulated average VaR estimates for

δ = 0.99

(expressed as a multiplier of

10^{4}

), relative bias and relative RMSE when the non-model selection approach is adopted for the true model

P L (1.27, 1117.04)

with

ϵ = 0.1

contamination from

P L (1.27, 89363.2)

. Values in bold indicate the results obtained under the true model.

True Model	n	Method	$α$		Assumed Model
True Model	n	Method	$α$		$FS$	$FR$	$LM$	$LN$	$PL$	$WE$
$X^{(o)} \sim P L (1.27, 1117.04);$ $X^{(c)} \sim P L (1.27, 89363.2)$	50	Likelihood based		${AV}_{L}$	109.971	363.687	4.25 $\times 10^{3}$	1.53 $\times 10^{3}$	481.674	7.29 $\times 10^{3}$
				${BS}_{L}$	0.228	0.755	8.831	3.170	1.000	15.141
				${MS}_{L}$	0.223	0.830	9.058	3.152	1.000	15.620
		Divergence based	0.05	${AV}_{D}$	5.804	52.335	188.278	45.113	15.815	1.09 $\times 10^{3}$
				${BS}_{D}$	0.012	0.109	0.391	0.094	0.033	2.256
				${MS}_{D}$	0.012	0.144	0.410	0.089	0.034	2.181
			0.10	${AV}_{D}$	1.882	24.361	10.285	1.027	1.968	0.460
				${BS}_{D}$	0.004	0.051	0.021	0.002	0.004	0.001
				${MS}_{D}$	0.004	0.075	0.026	0.002	0.005	0.001
			0.20	${AV}_{D}$	1.399	21.003	0.886	0.976	1.063	0.401
				${BS}_{D}$	0.003	0.044	0.002	0.002	0.002	0.001
				${MS}_{D}$	0.003	0.074	0.002	0.002	0.002	0.001
	100	Likelihood based		${AV}_{L}$	10.483	1013.921	26.310	10.283	13.409	11.893
				${BS}_{L}$	0.746	87.937	2.121	0.728	1.000	0.868
				${MS}_{L}$	0.736	2116.957	2.170	0.721	1.000	1.381
		Divergence based	0.05	${AV}_{D}$	8.003	168.714	20.783	7.536	9.732	7.351
				${BS}_{D}$	0.530	14.495	1.641	0.490	0.681	0.474
				${MS}_{D}$	0.534	26.828	1.681	0.483	0.692	0.472
			0.10	${AV}_{D}$	6.406	173.302	15.590	5.641	7.189	4.092
				${BS}_{D}$	0.391	14.894	1.190	0.325	0.460	0.190
				${MS}_{D}$	0.408	28.656	1.243	0.331	0.487	0.198
			0.20	${AV}_{D}$	4.829	188.798	8.307	3.700	4.555	1.070
				${BS}_{D}$	0.254	16.240	0.557	0.156	0.231	−0.072
				${MS}_{D}$	0.288	41.126	0.649	0.190	0.284	0.074

Table A21. Simulated average estimate of VaR for

δ = 0.99

(expressed as a multiplier of

10^{4}

), relative bias and relative RMSE when non-model selection approach is adopted for true model

W E (0.79, 1690.57)

with

ϵ = 0.1

contamination from

W E (0.79, 84528.5)

. Values in bold indicate the results obtained under the true model.

Table A21. Simulated average estimate of VaR for

δ = 0.99

(expressed as a multiplier of

10^{4}

), relative bias and relative RMSE when non-model selection approach is adopted for true model

W E (0.79, 1690.57)

with

ϵ = 0.1

contamination from

W E (0.79, 84528.5)

. Values in bold indicate the results obtained under the true model.

True Model	n	Method	$α$		Assumed Model
True Model	n	Method	$α$		$FS$	$FR$	$LM$	$LN$	$PL$	$WE$
$X^{(o)} \sim W E (0.79, 1690.57)$ ; $X^{(c)} \sim W E (0.79, 84528.5)$	50	Likelihood based		${AV}_{L}$	17.677	1.25 $\times 10^{5}$	23.804	13.476	20.715	8.504
				${BS}_{L}$	2.251	1.70 $\times 10^{4}$	3.086	1.678	2.665	1.000
				${MS}_{L}$	2.311	4.45 $\times 10^{5}$	3.599	1.789	2.939	1.000
		Divergence based	0.05	${AV}_{D}$	17.644	2.29 $\times 10^{4}$	23.628	13.174	21.365	6.378
				${BS}_{D}$	2.246	3.13 $\times 10^{3}$	3.062	1.637	2.753	0.710
				${MS}_{D}$	2.429	2.42 $\times 10^{4}$	4.025	2.080	3.400	0.702
			0.10	${AV}_{D}$	18.283	1.46 $\times 10^{5}$	24.429	14.238	23.759	4.317
				${BS}_{D}$	2.333	1.98 $\times 10^{4}$	3.171	1.782	3.080	0.429
				${MS}_{D}$	2.837	2.64 $\times 10^{5}$	5.279	4.899	5.161	0.430
			0.20	${AV}_{D}$	30.019	5.19 $\times 10^{15}$	56.731	407.538	2.43 $\times 10^{3}$	1.907
				${BS}_{D}$	3.933	7.07 $\times 10^{14}$	7.575	55.398	330.565	0.101
				${MS}_{D}$	35.030	1.97 $\times 10^{16}$	48.841	1.48 $\times 10^{3}$	9.06 $\times 10^{3}$	0.147
	100	Likelihood based		${AV}_{L}$	16.531	4.81 $\times 10^{4}$	21.058	12.792	18.592	8.345
				${BS}_{L}$	2.141	6.70 $\times 10^{3}$	2.771	1.620	2.428	1.000
				${MS}_{L}$	2.144	1.57 $\times 10^{5}$	2.971	1.626	2.504	1.000
		Divergence based	0.05	${AV}_{D}$	16.194	4.86 $\times 10^{3}$	19.856	12.083	18.256	6.204
				${BS}_{D}$	2.094	677.407	2.604	1.521	2.381	0.702
				${MS}_{D}$	2.136	2.93 $\times 10^{3}$	2.870	1.555	2.533	0.697
			0.10	${AV}_{D}$	16.267	8.81 $\times 10^{3}$	18.594	11.776	18.501	4.167
				${BS}_{D}$	2.104	1.23 $\times 10^{3}$	2.428	1.478	2.415	0.418
				${MS}_{D}$	2.207	5.44 $\times 10^{3}$	2.835	1.567	2.720	0.418
			0.20	${AV}_{D}$	17.708	6.54 $\times 10^{4}$	18.254	12.589	21.715	1.843
				${BS}_{D}$	2.305	9.11 $\times 10^{3}$	2.381	1.591	2.863	0.094
				${MS}_{D}$	2.631	7.40 $\times 10^{4}$	3.738	1.938	4.027	0.115

Table A22. VaR estimates (expressed as a multiplier of

10^{4}

), relative biases and relative RMSEs for

k^{*} = 1, δ = 0.99

, when true models are

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

; contaminated with

ϵ = 0.1

from

L N (1.19, 18.54)

,

P L (1.27, 89363.2)

and

W E (0.79, 84528.5)

respectively. Values in bold indicate the smallest

M S_{S}

among all the estimators.

Table A22. VaR estimates (expressed as a multiplier of

10^{4}

), relative biases and relative RMSEs for

k^{*} = 1, δ = 0.99

, when true models are

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

; contaminated with

ϵ = 0.1

from

L N (1.19, 18.54)

,

P L (1.27, 89363.2)

and

W E (0.79, 84528.5)

respectively. Values in bold indicate the smallest

M S_{S}

among all the estimators.

True Model	n		Likelihood Based	Divergence Based
True Model	n		Likelihood Based	$α = 0.05$	$α = 0.10$	$α = 0.20$
$L N$	50	${AV}_{s}$	1.91 $\times 10^{3}$	26.404	7.359	1.775
		${BS}_{s}$	1.249	0.017	0.004	0.001
		${MS}_{s}$	1.977	0.032	0.009	0.003
	100	${AV}_{s}$	1.92 $\times 10^{3}$	23.821	7.191	1.415
		${BS}_{s}$	1.289	0.015	0.004	4.33 $\times 10^{4}$
		${MS}_{s}$	1.919	0.027	0.008	0.002
$P L$	50	${AV}_{s}$	1.91 $\times 10^{3}$	26.404	7.359	1.775
		${BS}_{s}$	3.96	0.055	0.015	0.004
		${MS}_{s}$	6.229	0.103	0.030	0.009
	100	${AV}_{s}$	18.307	15.576	12.506	8.123
		${BS}_{s}$	1.426	1.188	0.922	0.541
		${MS}_{s}$	2.357	2.228	2.079	1.745
$W E$	50	${AV}_{s}$	11.977	11.996	12.041	11.941
		${BS}_{s}$	1.473	1.476	1.482	1.469
		${MS}_{s}$	2.570	2.806	3.114	5.429
	100	${AV}_{s}$	10.185	9.525	8.938	7.737
		${BS}_{s}$	1.256	1.164	1.083	0.915
		${MS}_{s}$	1.999	1.888	1.834	1.703

Table A23. VaR estimates (expressed as a multiplier of

10^{4}

), relative biases and relative RMSEs for

k^{*} = 2, δ = 0.99

,

ϵ = 0.1

, when true models are

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

; contaminated with

ϵ = 0.1

from

L N (1.19, 18.54)

,

P L (1.27, 89363.2)

and

W E (0.79, 84528.5)

respectively. Values in bold indicate the smallest

M S_{k^{*}}

among all the estimators.

Table A23. VaR estimates (expressed as a multiplier of

10^{4}

), relative biases and relative RMSEs for

k^{*} = 2, δ = 0.99

,

ϵ = 0.1

, when true models are

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

; contaminated with

ϵ = 0.1

from

L N (1.19, 18.54)

,

P L (1.27, 89363.2)

and

W E (0.79, 84528.5)

respectively. Values in bold indicate the smallest

M S_{k^{*}}

among all the estimators.

True Model	n		Likelihood Based	Divergence Based
True Model	n		Likelihood Based	$α = 0.05$	$α = 0.10$	$α = 0.20$
$L N$	50	${AV}_{k^{*}}$	1.15 $\times 10^{3}$	59.950	5.073	1.543
		${BS}_{k^{*}}$	0.755	0.039	0.003	0.001
		${MS}_{k^{*}}$	0.778	0.042	0.004	0.002
	100	${AV}_{k^{*}}$	1.10 $\times 10^{3}$	56.131	4.692	1.287
		${BS}_{k^{*}}$	0.740	0.037	0.003	0.000
		${MS}_{k^{*}}$	0.747	0.038	0.003	0.001
$P L$	50	${AV}_{k^{*}}$	1.15 $\times 10^{3}$	59.950	5.073	1.543
		${BS}_{k^{*}}$	2.392	0.124	0.011	0.003
		${MS}_{k^{*}}$	2.451	0.132	0.015	0.006
	100	${AV}_{k^{*}}$	15.089	12.249	9.136	5.813
		${BS}_{k^{*}}$	1.146	0.899	0.629	0.340
		${MS}_{k^{*}}$	1.312	1.150	0.929	0.686
$W E$	50	${AV}_{k^{*}}$	11.439	11.600	12.220	18.288
		${BS}_{k^{*}}$	1.400	1.422	1.507	2.334
		${MS}_{k^{*}}$	1.621	1.780	2.209	11.184
	100	${AV}_{k^{*}}$	9.718	9.297	9.037	9.442
		${BS}_{k^{*}}$	1.191	1.133	1.097	1.153
		${MS}_{k^{*}}$	1.250	1.203	1.230	1.852

Table A24. VaR estimates (expressed as a multiplier of

10^{4}

), relative biases and relative RMSEs for

k^{*} = 3, δ = 0.99

, when true models are

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

; contaminated with

ϵ = 0.1

from

L N (1.19, 18.54)

,

P L (1.27, 89363.2)

and

W E (0.79, 84528.5)

respectively. Values in bold indicate the smallest

M S_{k^{*}}

among all the estimators.

Table A24. VaR estimates (expressed as a multiplier of

10^{4}

), relative biases and relative RMSEs for

k^{*} = 3, δ = 0.99

, when true models are

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

; contaminated with

ϵ = 0.1

from

L N (1.19, 18.54)

,

P L (1.27, 89363.2)

and

W E (0.79, 84528.5)

respectively. Values in bold indicate the smallest

M S_{k^{*}}

among all the estimators.

True Model	n		Likelihood Based	Divergence Based
True Model	n		Likelihood Based	$α = 0.05$	$α = 0.10$	$α = 0.20$
$L N$	50	${AV}_{k^{*}}$	561.130	28.389	2.830	0.879
		${BS}_{k^{*}}$	0.367	0.018	0.001	7.18 $\times 10^{5}$
		${MS}_{k^{*}}$	0.378	0.020	0.002	$7.18 \times 10^{4}$
	100	${AV}_{k^{*}}$	538.197	26.619	2.629	0.758
		${BS}_{k^{*}}$	0.360	0.017	0.001	−7.91 $\times 10^{6}$
		${MS}_{k^{*}}$	0.364	0.018	0.002	$4.74 \times 10^{4}$
$P L$	50	${AV}_{k^{*}}$	561.130	28.389	2.830	0.879
		${BS}_{k^{*}}$	1.165	0.059	0.006	0.002
		${MS}_{k^{*}}$	1.191	0.063	0.008	0.003
	100	${AV}_{k^{*}}$	8.372	6.881	5.517	3.653
		${BS}_{k^{*}}$	0.562	0.433	0.314	0.152
		${MS}_{k^{*}}$	0.649	0.545	0.445	0.336
$W E$	50	${AV}_{k^{*}}$	7.262	7.370	7.567	272.877
		${BS}_{s}$	0.831	0.845	0.872	37.041
		${MS}_{k^{*}}$	0.991	1.098	1.267	994.725
	100	${AV}_{k^{*}}$	6.234	6.025	5.885	6.195
		${BS}_{s}$	0.706	0.677	0.657	0.700
		${MS}_{k^{*}}$	0.756	0.746	0.765	1.042

Table A25. Proportion of models included for the VaR estimates reported in Table A22 for sample sizes 50 and 100 when the true models are

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

, contaminated by

L N (1.19, 18.54)

,

P L (1.27, 89363.2)

and

W E (0.79, 84528.5)

respectively with

ϵ = 0.1

. Values in bold indicate the results obtained under the true model.

Table A25. Proportion of models included for the VaR estimates reported in Table A22 for sample sizes 50 and 100 when the true models are

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

, contaminated by

L N (1.19, 18.54)

,

P L (1.27, 89363.2)

and

W E (0.79, 84528.5)

respectively with

ϵ = 0.1

. Values in bold indicate the results obtained under the true model.

True Model	n	Method	$α$	Assumed Model
True Model	n	Method	$α$	$FS$	$FR$	$LM$	$LN$	$PL$	$WE$
$L N$	50	Likelihood		0.000	0.193	0.807	0.000	0.000	0.000
		Divergence	0.05	0.000	0.953	0.047	0.000	0.000	0.000
			0.10	0.220	0.769	0.010	0.000	0.000	0.001
			0.20	0.231	0.324	0.012	0.289	0.098	0.046
	100	Likelihood		0.000	0.099	0.901	0.000	0.000	0.000
		Divergence	0.05	0.000	0.977	0.023	0.000	0.000	0.000
			0.10	0.192	0.807	0.001	0.000	0.000	0.000
			0.20	0.258	0.233	0.009	0.422	0.073	0.005
$P L$	50	Likelihood		0.000	0.193	0.807	0.000	0.000	0.000
		Divergence	0.05	0.000	0.953	0.047	0.000	0.000	0.000
			0.10	0.220	0.769	0.010	0.000	0.000	0.001
			0.20	0.231	0.324	0.012	0.289	0.098	0.046
	100	Likelihood		0.133	0.430	0.434	0.001	0.002	0.000
		Divergence	0.05	0.301	0.386	0.311	0.000	0.002	0.000
			0.10	0.511	0.335	0.152	0.000	0.002	0.000
			0.20	0.718	0.223	0.044	0.009	0.006	0.000
$W E$	50	Likelihood		0.423	0.059	0.338	0.074	0.091	0.015
		Divergence	0.05	0.477	0.061	0.288	0.075	0.082	0.017
			0.10	0.507	0.061	0.262	0.088	0.063	0.019
			0.20	0.433	0.057	0.203	0.160	0.070	0.077
	100	Likelihood		0.434	0.005	0.425	0.021	0.115	0.000
		Divergence	0.05	0.524	0.005	0.333	0.022	0.116	0.000
			0.10	0.588	0.006	0.262	0.029	0.111	0.004
			0.20	0.579	0.003	0.218	0.075	0.091	0.034

Table A26. Proportion of inclusion of true model as per order of preference for the VaR estimates reported in Table A23 for sample sizes 50 and 100 when the true models are

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

with contamination from

L N (1.19, 18.54)

,

P L (1.27, 89363.2)

and

W E (0.79, 84528.5)

respectively for

ϵ = 0.1

.

Table A26. Proportion of inclusion of true model as per order of preference for the VaR estimates reported in Table A23 for sample sizes 50 and 100 when the true models are

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

with contamination from

L N (1.19, 18.54)

,

P L (1.27, 89363.2)

and

W E (0.79, 84528.5)

respectively for

ϵ = 0.1

.

True Model	n	Method	$α$	Order of Preference
True Model	n	Method	$α$	$0^{τ}$	1	2
$L N$	50	Likelihood		1.000	0.000	0.000
		Divergence	0.05	1.000	0.000	0.000
			0.10	1.000	0.000	0.000
			0.20	0.397	0.289	0.314
	100	Likelihood		1.000	0.000	0.000
		Divergence	0.05	1.000	0.000	0.000
			0.10	1.000	0.000	0.000
			0.20	0.268	0.422	0.310
$P L$	50	Likelihood		1.000	0.000	0.000
		Divergence	0.05	0.998	0.000	0.002
			0.10	0.847	0.000	0.153
			0.20	0.672	0.098	0.230
	100	Likelihood		0.640	0.002	0.358
		Divergence	0.05	0.664	0.002	0.334
			0.10	0.525	0.002	0.473
			0.20	0.317	0.006	0.677
$W E$	50	Likelihood		0.980	0.015	0.005
		Divergence	0.05	0.981	0.017	0.002
			0.10	0.975	0.019	0.006
			0.20	0.886	0.077	0.037
	100	Likelihood		1.000	0.000	0.000
		Divergence	0.05	0.998	0.000	0.002
			0.10	0.993	0.004	0.003
			0.20	0.945	0.034	0.021

^τ non-inclusion of the VaR estimate of the true model.

Table A27. Proportion of inclusion of true model as per order of preference for the VaR estimates reported in Table A24 for sample sizes 50 and 100 when the true models are

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

with contamination from

L N (1.19, 18.54)

,

P L (1.27, 89363.2)

and

W E (0.79, 84528.5)

respectively for

ϵ = 0.1

.

Table A27. Proportion of inclusion of true model as per order of preference for the VaR estimates reported in Table A24 for sample sizes 50 and 100 when the true models are

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

with contamination from

L N (1.19, 18.54)

,

P L (1.27, 89363.2)

and

W E (0.79, 84528.5)

respectively for

ϵ = 0.1

.

True Model	n	Method	$α$	Order of Preference
True Model	n	Method	$α$	0^$τ$	1	2	3
$L N$	50	Likelihood		1.000	0.000	0.000	0.000
		Divergence	0.05	1.000	0.000	0.000	0.000
			0.10	0.992	0.000	0.000	0.008
			0.20	0.171	0.289	0.314	0.226
	100	Likelihood		1.000	0.000	0.000	0.000
		Divergence	0.05	1.000	0.000	0.000	0.000
			0.10	0.999	0.000	0.000	0.001
			0.20	0.054	0.422	0.310	0.214
$P L$	50	Likelihood		0.000	0.000	0.000	1.000
		Divergence	0.05	0.230	0.000	0.002	0.768
			0.10	0.146	0.000	0.153	0.701
			0.20	0.409	0.098	0.230	0.263
	100	Likelihood		0.227	0.002	0.358	0.413
		Divergence	0.05	0.215	0.002	0.334	0.449
			0.10	0.105	0.002	0.473	0.420
			0.20	0.133	0.006	0.677	0.184
$W E$	50	Likelihood		0.974	0.015	0.005	0.006
		Divergence	0.05	0.968	0.017	0.002	0.013
			0.10	0.950	0.019	0.006	0.025
			0.20	0.839	0.077	0.037	0.047
	100	Likelihood		0.997	0.000	0.000	0.003
		Divergence	0.05	0.994	0.000	0.002	0.004
			0.10	0.987	0.004	0.003	0.006
			0.20	0.912	0.034	0.021	0.033

^τ non-inclusion of the VaR estimate of the true model.

Appendix D. VaR Estimation for δ = 0.95 in the Presence of Contamination

Table A28. True VaR and the simulated average VaR estimates with associated bias and RMSEs (relative bias and relative RMSEs of divergence estimators) for

δ = 0.95

, in the presence of

ϵ = 0.1

contamination, for true models

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

, contaminated by

L N (1.19, 18.54)

,

P L (1.27, 89363.2)

and

W E (0.79, 84528.5)

respectively.

Table A28. True VaR and the simulated average VaR estimates with associated bias and RMSEs (relative bias and relative RMSEs of divergence estimators) for

δ = 0.95

, in the presence of

ϵ = 0.1

contamination, for true models

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

, contaminated by

L N (1.19, 18.54)

,

P L (1.27, 89363.2)

and

W E (0.79, 84528.5)

respectively.

n	Estimator	$α$	Statistics ^(a)	True Model
				$f_{θ}$ : $LN (1.19, 6.81)$	$PL (1.27, 1117.04)$	$WE (0.79, 1690.57)$
				$f_{θ^{(c)}}$ : $LN (1.19, 18.54)$	$PL (1.27, 89363.2)$	$WE (0.79, 84528.5)$
			$Q_{θ_{j}} (δ)$	0.342	0.005	0.678
50			${AV}_{L}$	103.940	16.769	8.504
	Likelihood		${BS}_{L}^{T}$	103.598	16.764	2.689
			${MS}_{L}^{T}$	109.252	17.479	2.969
	Divergence	0.05	${AV}_{D}$	7.454	1.997	2.723
			${BS}_{D}$	0.069	0.119	0.761
			${MS}_{D}$	0.068	0.122	0.757
		0.10	${AV}_{D}$	0.426	0.532	1.998
			${BS}_{D}$	0.001	0.031	0.491
			${MS}_{D}$	0.001	0.033	0.497
		0.20	${AV}_{D}$	0.413	0.367	1.019
			${BS}_{D}$	0.001	0.022	0.127
			${MS}_{D}$	0.001	0.022	0.176
100			${AV}_{L}$	103.249	2.353	3.334
	Likelihood		${BS}_{L}^{T}$	102.907	1.691	2.656
			${MS}_{L}^{T}$	105.697	1.778	2.799
	Divergence	0.05	${AV}_{D}$	7.463	1.924	2.673
			${BS}_{D}$	0.069	0.747	0.751
			${MS}_{D}$	0.069	0.757	0.749
		0.10	${AV}_{D}$	0.402	1.583	1.949
			${BS}_{D}$	0.001	0.545	0.479
			${MS}_{D}$	0.001	0.570	0.481
		0.20	${AV}_{D}$	0.394	1.171	0.997
			${BS}_{D}$	0.001	0.301	0.120
			${MS}_{D}$	0.001	0.357	0.145

^(a) expressed as a multiplier of

10^{4}

.

Table A29. VaR estimates (expressed as a multiplier of

10^{4}

), relative biases and relative RMSEs for

k^{*} = 1, δ = 0.95

, when true models are

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

, contaminated with

ϵ = 0.1

from

L N (1.19, 18.54)

,

P L (1.27, 89363.2)

and

W E (0.79, 84528.5)

respectively. Values in bold indicate the smallest

M S_{S}

among all the estimators.

Table A29. VaR estimates (expressed as a multiplier of

10^{4}

), relative biases and relative RMSEs for

k^{*} = 1, δ = 0.95

, when true models are

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

, contaminated with

ϵ = 0.1

from

L N (1.19, 18.54)

,

P L (1.27, 89363.2)

and

W E (0.79, 84528.5)

respectively. Values in bold indicate the smallest

M S_{S}

among all the estimators.

n		Likelihood Based	Divergence Based
n		Likelihood Based	$α = 0.05$	$α = 0.10$	$α = 0.20$
	True model: $L N (1.19, 6.81)$ ; $Q (0.95) = 0.342$
50	${AV}_{s}$	23.526	1.803	0.753	0.308
	${BS}_{s}$	0.224	0.014	0.004	0.000
	${MS}_{s}$	0.339	0.025	0.010	0.004
100	${AV}_{s}$	24.759	1.762	0.774	0.282
	${BS}_{s}$	0.237	0.014	0.004	−0.001
	${MS}_{s}$	0.348	0.024	0.010	0.004
	True model: $P L (1.27, 1117.04)$ ; $Q (0.95) = 0.005$
50	${AV}_{s}$	23.526	1.803	0.753	0.308
	${BS}_{s}$	1.403	0.107	0.045	0.018
	${MS}_{s}$	2.131	0.168	0.075	0.032
100	${AV}_{s}$	1.887	1.661	1.386	1.008
	${BS}_{s}$	0.725	0.591	0.428	0.204
	${MS}_{s}$	1.381	1.274	1.139	0.918
	True model: $W E (0.79, 1690.57)$ ; $Q (0.95) = 0.678$
50	${AV}_{s}$	1.607	1.590	1.576	1.478
	${BS}_{s}$	0.346	0.339	0.334	0.298
	${MS}_{s}$	0.699	0.713	0.735	0.809
100	${AV}_{s}$	1.513	1.463	1.416	1.314
	${BS}_{s}$	0.314	0.295	0.278	0.240
	${MS}_{s}$	0.649	0.629	0.616	0.594

Table A30. VaR estimates (expressed as a multiplier of

10^{4}

), relative biases and relative RMSEs for

k^{*} = 2, δ = 0.95

, when true models are

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

, contaminated with

ϵ = 0.1

from

L N (1.19, 18.54)

,

P L (1.27, 89363.2)

and

W E (0.79, 84528.5)

respectively. Values in bold indicate the smallest

M S_{k^{*}}

among all the estimators.

Table A30. VaR estimates (expressed as a multiplier of

10^{4}

), relative biases and relative RMSEs for

k^{*} = 2, δ = 0.95

, when true models are

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

, contaminated with

ϵ = 0.1

from

L N (1.19, 18.54)

,

P L (1.27, 89363.2)

and

W E (0.79, 84528.5)

respectively. Values in bold indicate the smallest

M S_{k^{*}}

among all the estimators.

n		Likelihood Based	Divergence Based
n		Likelihood Based	$α = 0.05$	$α = 0.10$	$α = 0.20$
	True model: $L N (1.19, 6.81)$ ; $Q (0.95) = 0.342$
50	${AV}_{k^{*}}$	16.727	2.922	0.571	0.280
	${BS}_{k^{*}}$	0.158	0.025	0.002	−0.001
	${MS}_{k^{*}}$	0.158	0.026	0.004	0.002
100	${AV}_{k^{*}}$	16.595	2.872	0.564	0.265
	${BS}_{k^{*}}$	0.158	0.025	0.002	−0.001
	${MS}_{k^{*}}$	0.157	0.025	0.003	0.001
	True model: $P L (1.27, 1117.04)$ ; $Q (0.95) = 0.005$
50	${AV}_{k^{*}}$	16.727	2.922	0.571	0.280
	${BS}_{k^{*}}$	0.997	0.174	0.034	0.016
	${MS}_{k^{*}}$	1.003	0.179	0.040	0.019
100	${AV}_{k^{*}}$	1.687	1.433	1.156	0.843
	${BS}_{k^{*}}$	0.606	0.456	0.292	0.107
	${MS}_{k^{*}}$	0.672	0.562	0.430	0.303
	True model: $L N (1.19, 6.81)$ ; $Q (0.95) = 0.342$
	True model: $W E (0.79, 1690.57)$ ; $Q (0.95) = 0.678$
50	${AV}_{k^{*}}$	1.570	1.561	1.570	1.654
	${BS}_{k^{*}}$	0.332	0.328	0.332	0.363
	${MS}_{k^{*}}$	0.365	0.378	0.412	0.782
100	${AV}_{k^{*}}$	1.480	1.444	1.415	1.405
	${BS}_{k^{*}}$	0.302	0.288	0.278	0.274
	${MS}_{k^{*}}$	0.316	0.308	0.309	0.343

Table A31. VaR estimates (expressed as a multiplier of

10^{4}

), relative biases and relative RMSEs for

k^{*} = 3, δ = 0.95

, when true models are

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

, contaminated with

ϵ = 0.1

from

L N (1.19, 18.54)

,

P L (1.27, 89363.2)

and

W E (0.79, 84528.5)

, respectively. Values in bold indicate the smallest

M S_{k^{*}}

among all the estimators.

Table A31. VaR estimates (expressed as a multiplier of

10^{4}

), relative biases and relative RMSEs for

k^{*} = 3, δ = 0.95

, when true models are

L N (1.19, 6.81)

,

P L (1.27, 1117.04)

and

W E (0.79, 1690.57)

, contaminated with

ϵ = 0.1

from

L N (1.19, 18.54)

,

P L (1.27, 89363.2)

and

W E (0.79, 84528.5)

, respectively. Values in bold indicate the smallest

M S_{k^{*}}

among all the estimators.

n		Likelihood Based	Divergence Based
n		Likelihood Based	$α = 0.05$	$α = 0.10$	$α = 0.20$
	True model: $L N (1.19, 6.81)$ ; $Q (0.95) = 0.342$
50	${AV}_{k^{*}}$	9.256	1.510	0.343	0.174
	${BS}_{k^{*}}$	0.086	0.011	0.000	−0.002
	${MS}_{k^{*}}$	0.086	0.012	0.002	0.002
100	${AV}_{k^{*}}$	9.189	1.488	0.336	0.167
	${BS}_{k^{*}}$	0.086	0.011	0.000	−0.002
	${MS}_{k^{*}}$	0.086	0.011	0.001	0.002
	True model: $P L (1.27, 1117.04)$ ; $Q (0.95) = 0.005$
50	${AV}_{k^{*}}$	9.256	1.510	0.343	0.174
	${BS}_{k^{*}}$	0.552	0.090	0.020	0.010
	${MS}_{k^{*}}$	0.554	0.092	0.023	0.011
100	${AV}_{k^{*}}$	1.018	0.880	0.750	0.560
	${BS}_{k^{*}}$	0.210	0.129	0.052	−0.060
	${MS}_{k^{*}}$	0.268	0.210	0.163	0.153
	True model: $W E (0.79, 1690.57)$ ; $Q (0.95) = 0.678$
50	${AV}_{k^{*}}$	1.023	1.016	1.015	1.609
	${BS}_{k^{*}}$	0.128	0.126	0.125	0.346
	${MS}_{k^{*}}$	0.174	0.184	0.202	5.728
100	${AV}_{k^{*}}$	1.449	1.416	1.391	1.391
	${BS}_{k^{*}}$	0.290	0.278	0.268	0.268
	${MS}_{k^{*}}$	0.303	0.295	0.295	0.326

References

Abubakar, Hamza, and Shamsul Rijal Muhammad Sabri. 2023. A Bayesian approach to Weibull distribution with application to insurance claims data. Journal of Reliability and Statistical Studies 16: 1–24. [Google Scholar] [CrossRef]
Adeleke, Isaac A., and Adeyinka Ibiwoye. 2011. Modeling claim sizes in personal line non-life insurance. International Business & Economics Research Journal (IBER) 10: 21–38. [Google Scholar] [CrossRef]
Bahnemann, Dietrich. 1996. Distributions for Actuaries. Arlington: Casualty Actuarial Society. [Google Scholar]
Basavalingappa, Anand, John M. Passage, and John R. Lloyd. 2017. Electromigration: Lognormal versus Weibull distribution. Paper presented at 2017 IEEE International Integrated Reliability Workshop (IIRW), South Lake Tahoe, CA, USA, 8–12 October; pp. 1–4. [Google Scholar] [CrossRef]
Basu, Anirban, Anirban Mandal, and Leandro Pardo. 2013. Testing statistical hypotheses based on the density power divergence. Annals of the Institute of Statistical Mathematics 65: 319–48. [Google Scholar] [CrossRef]
Basu, Anirban, Ian R. Harris, and M. Christopher Jones. 1998. Robust and efficient estimation by minimizing a density power divergence. Biometrika 85: 549–59. [Google Scholar] [CrossRef]
Basu, Bijay, Dinesh Tiwari, and Rajiv Prasad. 2009. Is Weibull distribution the most appropriate statistical strength distribution for brittle materials? Ceramics International 35: 237–46. [Google Scholar] [CrossRef]
Block, A. Daniel, and Lawrence M. Leemis. 2008. Parametric model discrimination for heavily censored survival data. IEEE Transactions on Reliability 57: 248–59. [Google Scholar] [CrossRef]
Bowden, Geoffrey J., Peter R. Barker, and John W. Twidell. 1983. The weibull distribution function and wind power statistics. Wind Engineering 7: 85–98. [Google Scholar]
Brazauskas, Vytautas, and Andreas Kleefeld. 2011. Folded and log-folded-t distributions as models for insurance loss data. Scandinavian Actuarial Journal 2011: 59–74. [Google Scholar] [CrossRef]
Brownie, Cecilia, Jean-Pierre Habicht, and David S. Robson. 1983. An estimation procedure for the contaminated normal distributions arising in clinical chemistry. Journal of the American Statistical Association 78: 228–37. [Google Scholar] [CrossRef]
Buch-larsen, Thomas, Jesper P. Nielsen, and Carles Bolancé. 2005. Kernel density estimation for heavy-tailed distributions using the Champernowne transformation. Statistics 39: 503–16. [Google Scholar] [CrossRef]
Buckland, Stephen T., Kenneth P. Burnham, and Noel H. Augustin. 1997. Model selection: An integral part of inference. Biometrics 53: 603–18. [Google Scholar] [CrossRef]
Burnham, Kenneth P., and David R. Anderson. 2002. Model Selection and Multimodel Inference, 2nd ed. New York: Springer. [Google Scholar] [CrossRef]
Castilla, Elena, Natalia Martín, and Kostas Zografos. 2020. Model selection in a composite likelihood framework based on density power divergence. Entropy 22: 270. [Google Scholar] [CrossRef] [PubMed]
Champernowne, David Gawen. 1952. The graduation of income distributions. Econometrica 20: 591–615. [Google Scholar] [CrossRef]
Chatfield, Christopher. 1995. Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society: Series A (Statistics in Society) 158: 419–44. [Google Scholar] [CrossRef]
Chow, Gregory C. 1984. Maximum-likelihood estimation of misspecified models. Economic Modelling 1: 134–38. [Google Scholar] [CrossRef]
Christoffersen, Peter F. 2012. Elements of Financial Risk Management, 2nd ed. San Diego: Academic Press. [Google Scholar] [CrossRef]
Crow, Edwin L., and Kunio Shimizu. 1988. Lognormal Distributions—Theory and Applications. New York: Marcel Dekker. [Google Scholar]
Das, Jayanta, and Dilip C. Nath. 2022. Weighted quantile regression theory and its application Weibull distribution as an actuarial risk model: Computation of its probability of ultimate ruin and the moments of the time to ruin, deficit at ruin and surplus prior to ruin. Journal of Data Science 17: 161–94. [Google Scholar] [CrossRef]
Dey, Arindam, and Debasis Kundu. 2012. Discriminating between the Weibull and log-normal distributions for Type-II censored data. Statistics 46: 197–14. [Google Scholar] [CrossRef]
Dormann, Carsten F., Justin M. Calabrese, Gurutzeta Guillera-Arroita, Eleni Matechou, Volker Bahn, Kamil Bartoń, Colin M. Beale, Simone Ciuti, Jane Elith, Katharina Gerstner, and et al. 2018. Model averaging in ecology: A review of Bayesian, information-theoretic, and tactical approaches for predictive inference. Ecological Monographs 88: 485–504. [Google Scholar] [CrossRef]
Dumonceaux, Richard, and Charles E. Antle. 1973. Discrimination between the log-normal and the Weibull distributions. Technometrics 15: 923–26. [Google Scholar] [CrossRef]
Fisk, Percy R. 1961. The graduation of income distributions. Econometrica 29: 171–85. [Google Scholar] [CrossRef]
Fomby, Thomas, and Ray C. Hill. 2003. Maximum Likelihood Estimation of Misspecified Models: Twenty Years Later. Bingley: Emerald Group Publishing Limited. [Google Scholar]
Ghosh, Aniruddha, Ian R. Harris, and Leandro Pardo. 2017. A generalized divergence for statistical inference. Bernoulli 23: 2746–83. [Google Scholar] [CrossRef]
Gustafson, Paul. 2001. On measuring sensitivity to parametric model misspecification. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63: 81–94. [Google Scholar] [CrossRef]
Hewitt, Charles C., Jr., and Benjamin Lefkowitz. 1979. Methods for fitting distributions to insurance loss data. Proceedings of the Casualty Actuarial Society 66: 139–60. [Google Scholar]
Hogg, Robert V., and Stuart A. Klugman. 1984. Loss Distributions. Wiley Series in Probability and Statistics. Hoboken: John Wiley & Sons, Ltd. [Google Scholar] [CrossRef]
Johnson, Norman L., Samuel Kotz, and N. Balakrishnan. 1994. Continuous Univariate Distributions, Volume 1, 2nd ed. New York: John Wiley & Sons. [Google Scholar]
Jones, M. Christopher, Nils L. Hjort, and Anirban Basu. 2001. A comparison of related density-based minimum divergence estimators. Biometrika 88: 865–73. [Google Scholar] [CrossRef]
Jorion, Philippe. 2006. Value at Risk: The New Benchmark for Managing Financial Risk, 3rd ed. New York: McGraw Hill LLC. [Google Scholar]
Kaas, Rob, Marc Goovaerts, and Michel Denuit. 2008. Modern Actuarial Risk Theory, 2nd ed. Berlin/Heidelberg: Springer. [Google Scholar] [CrossRef]
Kakwani, Nanak, and Hai H. Son. 2022. Economic Inequality and Poverty—Facts, Methods, and Policies. Oxford: Oxford University Press. [Google Scholar] [CrossRef]
Kakwani, Nanak Chand. 1980. Income Inequality and Poverty. Technical Report. Washington, DC: World Bank Research Publication. [Google Scholar]
Karagrigoriou, Athanasios, and Konstantinos Mattheou. 2009. Measures of divergence in model selection. In Advances in Data Analysis: Theory and Applications to Reliability and Inference, Data Mining, Bioinformatics, Lifetime Data, and Neural Networks. New York: Springer, pp. 51–65. [Google Scholar]
Kass, Robert E., and Adrian E. Raftery. 1995. Bayes factors. Journal of the American Statistical Association 90: 773–95. [Google Scholar] [CrossRef]
Kleiber, Christian, and Samuel Kotz. 2003. Statistical Size Distributions in Economics and Actuarial Sciences. Hoboken: John Wiley & Sons. [Google Scholar]
Klugman, Stuart A., Harry H. Panjer, and Gordon E. Willmot. 2012. Loss Models: From Data to Decisions. Wiley Series in Probability and Statistics. Hoboken: John Wiley & Sons. [Google Scholar]
Kotz, Samuel, and Saralees Nadarajah. 2000. Extreme Value Distributions: Theory and Applications. Singparore: World Scientific. [Google Scholar]
Kreer, Markus, Ayşe Kızılersü, Anthony W. Thomas, and Alfredo D. Egídio dos Reis. 2015. Goodness-of-fit tests and applications for left-truncated Weibull distributions to non-life insurance. European Actuarial Journal 5: 139–63. [Google Scholar] [CrossRef]
Kurata, Shuichi, and Emiko Hamada. 2020. On the consistency and the robustness in model selection criteria. Communications in Statistics - Theory and Methods 49: 5175–95. [Google Scholar] [CrossRef]
Li, Ling, Hon Keung Tony Ng, Ali H. Algarni, Abdullah M. Almarashi, and Zaher A. Abo-Eleneen. 2020. A model-ranking approach for estimation based on accelerated degradation test data. IEEE Transactions on Reliability 69: 484–96. [Google Scholar] [CrossRef]
Ling, Mun Hon, and N. Balakrishnan. 2017. Model mis-specification analyses of Weibull and Gamma models based on one-shot device test data. IEEE Transactions on Reliability 66: 641–50. [Google Scholar] [CrossRef]
Lomax, Kenneth S. 1954. Business failures: Another example of the analysis of failure data. Journal of the American Statistical Association 49: 847–52. [Google Scholar] [CrossRef]
Loss Data Analytics Core Team. 2020. Loss Data Analytics: An open text authored by the Actuarial Community, Version 1.1 ed.
Luko, Stephen N. 1999. A review of the Weibull distribution and selected engineering applications. SAE Transactions 108: 398–412. [Google Scholar]
Mehrabani, Ali, and Aman Ullah. 2020. Improved average estimation in seemingly unrelated regressions. Econometrics 8: 15. [Google Scholar] [CrossRef]
Miljkovic, Tanja. 2025. Premium Estimation Under Model Uncertainty: Model Averaging for Left-Truncated Reinsurance Losses. Variance 18: 1–24. [Google Scholar]
Miljkovic, Tanja, and Bettina Grün. 2021. Using model averaging to determine suitable risk measure estimates. North American Actuarial Journal 25: 562–79. [Google Scholar] [CrossRef]
Mudholkar, Gopal S., and Devendra K. Srivastava. 1993. Exponentiated weibull family for analyzing bathtub failure-rate data. IEEE Transactions on Reliability 42: 299–302. [Google Scholar] [CrossRef]
Nadarajah, Saralees, and Samuel Kotz. 2003. The exponentiated Fréchet distribution. Interstat Electronic Journal 14: 1–7. [Google Scholar]
Okoli, Kenechukwu, Korbinian Breinl, Luigia Brandimarte, Anna Botto, Elena Volpi, and Giuliano Di Baldassarre. 2018. Model averaging versus model selection: Estimating design floods with uncertain river flow data. Hydrological Sciences Journal 63: 1913–26. [Google Scholar] [CrossRef]
Pasari, Sumanta. 2018. Stochastic modelling of earthquake interoccurrence times in northwest himalaya and adjoining regions. Geomatics, Natural Hazards and Risk 9: 568–88. [Google Scholar] [CrossRef]
Punzo, Antonio, and Cristina Tortora. 2021. Multiple scaled contaminated normal distribution and its application in clustering. Statistical Modelling 21: 332–58. [Google Scholar] [CrossRef]
Raftery, A. E. 1995. Bayesian model selection in social research. Sociological Methodology 25: 111–63. [Google Scholar] [CrossRef]
R Core Team. 2025. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing. [Google Scholar]
Rinne, H. 2008. The Weibull Distribution—A Handbook. Boca Raton: CRC Press. [Google Scholar]
Rolski, Tomasz, Hanspeter Schmidli, Volker Schmidt, and Jozef L. Teugels. 1999. Stochastic Processes for Insurance & Finance. Chichester: John Wiley & Sons. [Google Scholar]
Singh, Surendra K., and Gary S. Maddala. 1976. A function for size distribution of incomes. Econometrica 44: 963–70. [Google Scholar] [CrossRef]
Smith, Richard L. 2003. Statistics of extremes, with applications in environment, insurance, and finance. In Extreme Values in Finance, Telecommunications, and the Environment. Edited by B. Finkenstadt and H. Rootzen. Boca Raton: Chapman and Hall/CRC, pp. 20–97. [Google Scholar] [CrossRef]
Steel, Mark F. J. 2020. Model averaging and its use in economics. Journal of Economic Literature 58: 644–719. [Google Scholar] [CrossRef]
Tadikamalla, Pandu R. 1978. Applications of the Weibull distribution in inventory control. Journal of the Operational Research Society 29: 77–83. [Google Scholar] [CrossRef]
Tse, Yiu-Kuen. 2023. Nonlife Actuarial Models: Theory, Methods and Evaluation. Cambridge: Cambridge University Press. [Google Scholar] [CrossRef]
Wais, Piotr. 2017. A review of Weibull functions in wind sector. Renewable and Sustainable Energy Reviews 70: 1099–1107. [Google Scholar] [CrossRef]
Wang, Yinzhi, Ingrid Hobæk Haff, and Arne Huseby. 2020. Modelling extreme claims via composite models and threshold selection methods. Insurance: Mathematics and Economics 91: 257–68. [Google Scholar] [CrossRef]
Wellmann, Järgen, and Ursula Gather. 1999. A note on contamination models and outliers. Communications in Statistics—Theory and Methods 28: 1793–1802. [Google Scholar] [CrossRef]
White, Halbert. 1982. Maximum likelihood estimation of misspecified models. Econometrica 50: 1–25. [Google Scholar] [CrossRef]
Wintle, Brendan A., Michel A. McCarthy, Chris T. Volinsky, and Rodney P. Kavanagh. 2003. The use of Bayesian model averaging to better represent uncertainty in ecological models. Conservation Biology 17: 1579–90. [Google Scholar] [CrossRef]
Wolny-Dominiak, Alicja, and Michal Trzesiok. 2014. insuranceData: A Collection of Insurance Datasets Useful in Risk Classification in Non-life Insurance, R Package Version 1.0.
Yu, Daoping. 2020. Comparing model selection criteria to distinguish truncated operational risk models. Frontiers in Applied Mathematics and Statistics 6: 557971. [Google Scholar] [CrossRef]
Zelinková, Katerina. 2015. Fitting probability distributions to market risk and insurance risk. Central European Review of Economics 18: 168–74. [Google Scholar]

Figure 1. Fitted (a) CDFs and (b) PDFs of the six candidate models based on MLEs of parameters of the dataCar data.

Figure 2. Fitted CDFs of the six candidate models based on MLEs of parameters of the dataOhlsson data.

Table 1. Point estimates and associated MLEs of VaR values (expressed in multiples of

10^{3}

), denoted as

{\hat{Q}}_{θ_{j}} (\cdot)

, of the considered models for

δ = 0.95

and 0.99 for dataCar ^#.

Table 1. Point estimates and associated MLEs of VaR values (expressed in multiples of

10^{3}

), denoted as

{\hat{Q}}_{θ_{j}} (\cdot)

, of the considered models for

δ = 0.95

and 0.99 for dataCar ^#.

Distribution	Shape	Scale	${\hat{Q}}_{θ_{j}} (0.95)$	${\hat{Q}}_{θ_{j}} (0.99)$
$F S$	1.43	834.95	6.51	20.60
$F R$	1.05	518.75	8.71	40.97
$L N$	1.19	6.81	7.33	18.74
$L M$	2.04	2202.85	6.42	14.43
$P L$	1.27	1117.04	6.54	18.66
$W E$	0.79	1690.57	6.83	11.81

^# available in the insuranceData package in R.

Table 2. VaR estimates (in

10^{3}

) of all the models for

δ = 0.95

and 0.99, along with weighted VaR estimates, in addition to the associated model ranks based on likelihood and divergence method when

k^{*} = 2, 3

, respectively, for the dataCar dataset.

Table 2. VaR estimates (in

10^{3}

) of all the models for

δ = 0.95

and 0.99, along with weighted VaR estimates, in addition to the associated model ranks based on likelihood and divergence method when

k^{*} = 2, 3

, respectively, for the dataCar dataset.

Method	$α$	${\hat{Q}}_{j} (δ)$ or ${\tilde{Q}}_{j} (δ)$						${\hat{Q}}^{} (δ)$ or ${\tilde{Q}}^{} (δ)$
Method	$α$	$FS$	$FR$	$LM$	$LN$	$PL$	$WE$	$k^{*} = 2$	$k^{*} = 3$
$δ = 0.95$
Ranks ^‡		3	1	5	2	4	6
Likelihood		6.51	8.71	7.33	6.42	6.54	6.83	7.56	7.21
Divergence	0.05	6.31	8.89	7.31	6.14	6.33	6.36	7.52	7.12
	0.10	6.04	8.94	7.29	5.86	6.07	5.87	7.40	6.95
	0.20	5.44	8.72	6.99	5.24	5.46	4.88	6.99	6.48
$δ = 0.99$
Likelihood		20.60	40.97	18.74	14.43	18.66	11.81	27.66	25.29
Divergence	0.05	19.78	42.41	18.63	13.68	17.84	10.77	28.07	25.31
	0.10	18.71	42.93	18.54	12.89	16.84	9.73	27.96	24.89
	0.20	16.32	41.74	17.11	11.24	14.51	7.79	26.61	23.21

^‡ The model ranks are the same for the likelihood and all the values of

α

used for the divergence-based method for both the values of

δ

considered.

Table 3. VaR estimates (in

10^{4}

) of all the models for

δ = 0.95, 0.99

along with weighted VaR estimates based on likelihood and divergence method when

k^{*} = 2

and 3, respectively, for the dataOhlsson dataset.

Table 3. VaR estimates (in

10^{4}

) of all the models for

δ = 0.95, 0.99

along with weighted VaR estimates based on likelihood and divergence method when

k^{*} = 2

and 3, respectively, for the dataOhlsson dataset.

Method	$δ$	$α$	${\hat{Q}}_{j} (δ)$ or ${\tilde{Q}}_{j} (δ)$						${\hat{Q}}^{} (δ)$ or ${\tilde{Q}}^{} (δ)$
Method	$δ$	$α$	$FS$	$FR$	$LM$	$LN$	$PL$	$WE$	$k^{*} = 2$	$k^{*} = 3$
Likelihood	0.95		14.81	69.57	10.99	12.83	13.66	9.46	11.15	11.10
	0.99		69.33	1190.63	35.77	38.59	59.02	17.52	28.05	30.62
		Ranks →	5	6	3	1	4	2
Divergence	0.95	0.05	15.76	82.58	12.11	13.53	14.72	9.30	12.82	13.45
		0.10	16.49	95.45	12.59	14.19	15.68	9.12	13.39	14.15
		0.20	17.19	121.24	14.49	15.30	17.16	8.66	14.90	15.65
	0.99	0.05	76.09	1469.99	43.81	41.36	66.67	17.09	42.59	50.61
		0.10	81.45	1774.31	46.87	43.96	73.85	16.65	45.41	54.89
		0.20	86.98	2452.49	61.32	48.42	84.97	15.55	54.86	64.89
		Ranks ^† →	4	6	2	1	3	5

^† The model ranks are the same for all the values of

α

used for the divergence-based method.

Table 4. Model ranking for four candidate models with two or three parameters based on AIC and

R C C_{α}

for the vehicle insurance dataset.

Table 4. Model ranking for four candidate models with two or three parameters based on AIC and

R C C_{α}

for the vehicle insurance dataset.

Models	AIC	${RCC}_{α}$
Models	AIC	$α = 0.05$	$α = 0.1$	$α = 0.2$
$F R$	77,195.22	−122,251	−40,651.3	−9130.11
$E F$	76,956.88	−122,350	−40,750.3	−9208.57
$W E$	78,987.19	−120,976	−39,756.6	−8699.82
$E W$	77,304.19	−122,147	−40,571.3	25.31705

Table 5. VaR estimates (in 1000) based on likelihood and divergence method for

δ = 0.95

for the four considered models.

Table 5. VaR estimates (in 1000) based on likelihood and divergence method for

δ = 0.95

for the four considered models.

Method	$α$	${\hat{Q}}_{j} (0.95)$ or ${\tilde{Q}}_{j} (0.95)$				${\hat{Q}}^{} (0.95)$ or ${\tilde{Q}}^{} (0.95)$
Method	$α$	$FR$	$EF$	$WE$	$EW$	$k^{*} = 2$	$k^{*} = 3$
Likelihood		8.712	19.445	6.830	7.118	14.065	11.745
Divergence	0.05	8.895	15.142	6.364	6.959	12.020	10.334
	0.1	8.943	18.619	5.866	6.944	13.787	11.511
	Ranks ^†	2	1	4	3
	0.2	8.729	21.478	4.884	0.001	15.134	11.837
	Ranks	2	1	3	4

^† The model ranks are the same for the likelihood method as well as the divergence-based method for

α = 0.05

and

0.10

.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Basu, S.; Ng, H.K.T. Model Misspecification and Data-Driven Model Ranking Approach for Insurance Loss and Claims Data. Risks 2025, 13, 231. https://doi.org/10.3390/risks13120231

AMA Style

Basu S, Ng HKT. Model Misspecification and Data-Driven Model Ranking Approach for Insurance Loss and Claims Data. Risks. 2025; 13(12):231. https://doi.org/10.3390/risks13120231

Chicago/Turabian Style

Basu, Suparna, and Hon Keung Tony Ng. 2025. "Model Misspecification and Data-Driven Model Ranking Approach for Insurance Loss and Claims Data" Risks 13, no. 12: 231. https://doi.org/10.3390/risks13120231

APA Style

Basu, S., & Ng, H. K. T. (2025). Model Misspecification and Data-Driven Model Ranking Approach for Insurance Loss and Claims Data. Risks, 13(12), 231. https://doi.org/10.3390/risks13120231

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Article metric data becomes available approximately 24 hours after publication online.

Article Menu

Model Misspecification and Data-Driven Model Ranking Approach for Insurance Loss and Claims Data

Abstract

1. Introduction

2. Probability Models for Claim Severity

2.1. Fisk Distribution

2.2. Fréchet Distribution

2.3. Lomax Distribution

2.4. Lognormal Distribution

2.5. Paralogistic Distribution

2.6. Weibull Distribution

3. Motivation and Estimation of Model Parameters

3.1. Likelihood-Based Estimation

3.2. Divergence-Based Estimation

4. Estimating Value-at-Risk via Model Averaging

4.1. Model Selection Approach

4.2. Model Ranking Approach

5. Monte Carlo Simulation Study

5.1. Performance of the Proposed Methods in the Absence of Contamination

5.2. Performance of the Proposed Methods in the Presence of Contamination

6. Practical Data Analyses

6.1. Vehicle Insurance Data

6.2. Partial Casco Motorcycles Insurance Data

7. Extension to Candidate Models with Unequal Number of Parameters

8. Concluding Remarks

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Abbreviations

Appendix A. VaR Estimation for δ = 0.99 Without Contamination

Appendix B. VaR Estimation for δ = 0.95 Without Contamination

Appendix C. VaR Estimation for δ = 0.99 in the Presence of Contamination

Appendix D. VaR Estimation for δ = 0.95 in the Presence of Contamination

References

Share and Cite

Article Metrics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI