1. Introduction and Literature Review
Precise modeling of heavy-tailed claims data is fundamental to the insurance sector, supporting risk evaluation, premium determination, and capital distribution (
Ellili et al. 2023). Capturing risk effectively is crucial in both insurance and finance because it helps financial institutions forecast and mitigate potential losses, safeguard solvency, and maintain capital stability in unstable markets. In finance, accurate risk capture supports informed investment decisions and helps prevent systemic failures; in insurance, it directly influences the setting of premiums that reflect true exposure levels, the allocation of capital reserves to cover unexpected claims, and the overall resilience against catastrophic events (
Chernobai et al. 2007). Without robust risk modeling, institutions face an elevated likelihood of financial distress, as underestimated risks can deplete capital reserves and lead to insolvency during stress periods.
For instance,
Chernobai et al. (
2007) highlight operational risks under frameworks like Basel II, where errors in risk assessment, such as flawed loss modeling or inadequate internal controls, have contributed to major financial collapses. A significant instance, one that triggered global bank failures and the 2008 global financial crisis, was the flawed sale of mortgage-backed securities in the early 2000s, which illustrated how weak operational risk evaluation inflated credit risk profiles (
Chernobai et al. 2007). Collapses of such magnitude draw attention to the chain reaction set off by poor risk management: initial modeling errors propagate through directly and indirectly connected financial systems, resulting in significant economic setbacks and reduced stakeholder confidence. Recent studies have advanced the modeling of heavy-tailed and dependent insurance losses by introducing flexible dependence structures and bimodal distributions tailored to real-life claim datasets, directly enhancing premium estimation, reserve stability, and the conservative quantification of risk measures required under Solvency II (
Yan et al. 2024;
Yousof et al. 2023a,
2023b).
These issues extend to insurance, where incorrectly classifying claim distributions or failing to recognize extreme outcomes in the tails can cause substantial problems. Misclassification might lead to underpricing premiums, creating portfolios that appear profitable yet remain exposed to unanticipated extreme events (
Embrechts et al. 2014). For instance,
Embrechts et al. (
2014) highlight that using a light-tailed distribution, such as the normal, to model heavy-tailed claims can underestimate the Tail Value-at-Risk (TVaR) by a substantial margin (10–20%), leaving inadequate reserves for large claims. Conversely, using an excessively heavy-tailed distribution can cause overfunding, holding capital far beyond what the risk requires and preventing competitive pricing. Both errors underscore the need for precise risk assessment to safeguard policyholders (
Embrechts et al. 2014). Prolonged misclassification deteriorates reputation, drives up reinsurance costs, and increases the probability of business failure under extreme claim volumes.
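The underestimation risk described above can be illustrated numerically. The following Python sketch, using a hypothetical lognormal claims sample (not the Danish dataset), compares the empirical TVaR of a heavy-tailed sample with the TVaR implied by a moment-matched normal fit; for strongly skewed data the gap can be even larger than the 10–20% range cited above.

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_ppf(p, lo=-10.0, hi=10.0):
    # bisection inverse of the standard normal CDF
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def empirical_tvar(sample, p):
    # mean of the losses at or above the empirical p-quantile
    s = sorted(sample)
    k = int(math.ceil(p * len(s)))
    tail = s[k:] or [s[-1]]
    return sum(tail) / len(tail)

def normal_tvar(mu, sigma, p):
    # closed form for the normal distribution: mu + sigma * phi(z_p) / (1 - p)
    z = norm_ppf(p)
    phi = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)
    return mu + sigma * phi / (1.0 - p)

random.seed(42)
# hypothetical heavy-tailed claims: lognormal with mu = 0, sigma = 1.5
claims = [math.exp(random.gauss(0.0, 1.5)) for _ in range(20000)]
mu = sum(claims) / len(claims)
sigma = math.sqrt(sum((x - mu) ** 2 for x in claims) / (len(claims) - 1))

emp = empirical_tvar(claims, 0.99)
gauss = normal_tvar(mu, sigma, 0.99)
print(f"empirical TVaR_0.99 = {emp:.2f}, normal-fit TVaR_0.99 = {gauss:.2f}")
```

The moment-matched normal fit sits far below the empirical tail mean, which is exactly the reserving shortfall the text warns about.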
Traditional modeling, often guided by the Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC), risks overfitting or underfitting, leading to poor generalization of loss distributions (
Buckland et al. 1997). This study first discusses model averaging, to mitigate the model selection uncertainty that stems from over-reliance on a single “best” model, and second grid maps, to balance complexity with fit for heavy-tailed insurance data and enhance forecasting (
Miljkovic and Grün 2021;
Claeskens and Hjort 2008;
Blostein and Miljkovic 2019). Model selection uncertainty, which is the discrepancy between a chosen model and the true data-generating process, poses a significant challenge in insurance modeling (
Taghizadeh-Mehrjardi et al. 2022;
Ye et al. 2010;
Hoeting et al. 1999). Selection of a single “best” model disregards competing models, resulting in inflated prediction confidence and biased risk measures, such as Value-at-Risk (VaR) and Expected Shortfall (ES) or TVaR (
Buckland et al. 1997;
Miljkovic and Grün 2021).
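The evidence weights used throughout this literature (Akaike-style weights) can be computed directly from raw AIC or BIC values via w_i ∝ exp(−Δ_i/2), where Δ_i is each model's distance from the best criterion value. A minimal Python sketch, with hypothetical IC values:

```python
import math

def ic_weights(ic_values):
    """Akaike-style weights from AIC (or BIC) values: w_i ∝ exp(-Δ_i / 2)."""
    best = min(ic_values)
    raw = [math.exp(-(ic - best) / 2.0) for ic in ic_values]
    total = sum(raw)
    return [r / total for r in raw]

# hypothetical AIC values for three candidate models
aic = [7676.0, 7680.0, 7700.0]
weights = ic_weights(aic)
print([round(w, 4) for w in weights])
```

Subtracting the minimum before exponentiating keeps the computation numerically stable for large IC values; the weights sum to one and concentrate quickly on the best-supported models.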
Model averaging counters reliance on a single model by weighting models based on their supporting evidence, using methods like Bayesian model averaging with posterior probabilities or frequentist approaches like Mallows model averaging with criteria like Mallows’ Cp (
Amini and Parmeter 2011;
Hansen 2007;
Wang et al. 2009). For instance,
Ando and Tsay (
2010) showed Bayesian model averaging improves predictive probability, while
Fragoso et al. (
2018) emphasized its systematic weighting approach.
Miljkovic and Grün (
2021) used AIC and BIC weights to create heatmaps, revealing no single model consistently excels for insurance claims (see Figure 2.2 in
Miljkovic and Grün 2021). This reduces overfitting and underfitting, ensuring robust risk estimates (
Raftery et al. 2005;
Volinsky et al. 1997). By thresholding Occam’s window, which is a model selection method used to identify a subset of plausible statistical models from a larger set of candidates in model averaging (
Madigan and Raftery 1994), they prune implausible models and focus scrutiny on viable subsets.
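Occam's window can be sketched as a simple thresholding rule: discard any model whose information criterion exceeds the best model's value by more than some Δ, then renormalize the weights of the survivors. The model names and IC values below are hypothetical placeholders, not fitted results:

```python
import math

def occams_window(models, delta=10.0):
    """Keep models whose IC is within `delta` of the best; renormalize weights.

    `models` is a list of (name, ic) pairs.
    """
    best = min(ic for _, ic in models)
    kept = [(name, ic) for name, ic in models if ic - best <= delta]
    raw = {name: math.exp(-(ic - best) / 2.0) for name, ic in kept}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}

# hypothetical BIC values for three candidates
candidates = [("Burr", 7693.7), ("Lognormal", 7915.0), ("Gamma-Weibull", 7701.2)]
kept = occams_window(candidates)
print(kept)
```

Here the implausible Lognormal (ΔBIC ≈ 221) is pruned, and the remaining weight is split between the two survivors.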
Heavy-tailed distributions are critical for modeling insurance claims, where extreme losses dominate outcomes. Unlike Gaussian models, which underestimate tail probabilities, mixture and composite models blend or splice distributions together (e.g., lognormal for the bulk, Pareto for the tail) to capture diverse behaviors (
Aston 2006;
Marambakuyana and Shongwe 2024).
Grün and Miljkovic (
2019) as well as
Marambakuyana and Shongwe (
2024) evaluated such models on Danish fire and South African taxi claims, selecting the “best” via BIC but overlooking uncertainty, risking skewed assessments. Model averaging’s value extends across fields. In ecology (
Richards et al. 2011), multimodel inference improves predictions in noisy datasets (
Dormann et al. 2018;
Link and Barker 2006). In hydrology,
Diks and Vrugt (
2010) and
Singh et al. (
2010) showed it strengthens prediction. In economics,
Moral-Benito (
2015) and
Steel (
2020) used it for well-grounded forecasting, while
Posada (
2008) applied it to the estimation of phylogenetic trees. Financial and health applications further demonstrate its ability to reduce forecast errors (
Wright 2008;
Volinsky et al. 1997). By pruning implausible models, model averaging focuses on viable subsets, enhancing decision-making (
Madigan and Raftery 1994).
Even though improvements have been made, when modeling heavy-tailed insurance data, it remains problematic and challenging to rely on a single distribution to effectively capture unpredictability and rare extreme events (
Marambakuyana and Shongwe 2024). Single-model frameworks, reliant on AIC, BIC, or Kolmogorov–Smirnov tests, amplify overconfidence, risking faulty reserve accuracy (
Miljkovic and Grün 2021), whereas model averaging offers integrated, uncertainty-sensitive estimates, refining risk metrics such as VaR and TVaR toward conservative yet precise values (
Zhang et al. 2023). This framework prioritizes accuracy and resilience, strengthening insurance risk management (
Embrechts et al. 2014;
Hoeting et al. 1999).
In an effort to achieve stable risk modeling for heavy-tailed insurance data,
Blostein and Miljkovic (
2019) proposed the use of a novel grid mapping approach to integrate model selection criteria (AIC or BIC) with risk measures (VaR and TVaR) across the entire spectrum of considered models. The practical utility of the grid map approach is further illustrated through the analysis of a real dataset of left-truncated insurance losses. Grid maps facilitate comprehension by comparing AIC or BIC against VaR or TVaR at thresholds like 0.95 or 0.99, revealing fit versus tail performance trade-offs (
Blostein and Miljkovic 2019;
Taghizadeh-Mehrjardi et al. 2022).
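A grid map is, at its core, a pairing of each model's information criterion with its risk measures. The shortlist logic behind reading such a map can be sketched as below; the (name, BIC, TVaR) triples and the thresholds are hypothetical and chosen for illustration only, not taken from the paper's tables:

```python
def grid_map_candidates(models, empirical_tvar, ic_delta=10.0, tvar_tol=0.25):
    """Shortlist models for a grid map: close to the best IC and with a TVaR
    within `tvar_tol` (relative) of the empirical TVaR.

    `models` is a list of (name, ic, tvar) tuples.
    """
    best_ic = min(ic for _, ic, _ in models)
    shortlist = []
    for name, ic, tvar in models:
        good_fit = ic - best_ic <= ic_delta
        good_tail = abs(tvar - empirical_tvar) / empirical_tvar <= tvar_tol
        if good_fit and good_tail:
            shortlist.append(name)
    return shortlist

# hypothetical (name, BIC, TVaR_0.99 in M DKK) triples; empirical TVaR = 54.60
models = [("Burr", 7693.7, 130.99),
          ("WIW composite", 7640.0, 62.0),
          ("BB mixture", 7586.9, 210.0)]
# widened IC band to make the fit-versus-tail trade-off visible
print(grid_map_candidates(models, empirical_tvar=54.60, ic_delta=60.0))
```

With these illustrative numbers, only the composite survives both screens: the single Burr fits worse, and the mixture fits best but drifts far from the empirical tail, mirroring the trade-off the grid maps are designed to expose.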
For left-truncated data, common in claims exceeding deductibles, grid maps underscore how mixtures mitigate truncation-induced biases, directing actuaries toward balanced risk assessment models. Grid maps contextualize uncertainty, aiding strategic decisions in insurance by highlighting models that balance data fit and conservatism (
Blostein and Miljkovic 2019). Using a dataset of Secura Re insurance claims with truncation due to deductibles, the grid map was applied by
Blostein and Miljkovic (
2019) to evaluate single distributions as well as two-component and three-component mixture models by separately plotting their AIC and BIC scores against VaR and TVaR estimates. It was observed that the single lognormal model provided the best fit under both the AIC and BIC criteria, with the gamma-Weibull mixture model as the next best option.
It is first worth mentioning that Danish fire claims data has not yet been analyzed using model averaging or grid maps. Thus, this study applies both model averaging and grid mapping to the Danish fire claims (2492 observations, millions of Danish Kroner) dataset, a classic example of heavy-tailed losses; for a more detailed account of other studies based on Danish fire claims data, see
Shongwe and Marambakuyana (
2024). Even though Danish fire claims data has been widely studied, previous research has focused on finding the single “best” distribution plausible for modeling the data. Model selection uncertainty mitigation (model averaging) and the balance between model fit and risk metrics (grid mapping) have not been studied before; this is significantly important as it gives a broader picture of the data studied and nuances that come with data modeling using various distributions. The research objective is to investigate whether model averaging and grid mapping approaches, separately, are able to capture the heavy-tailed nature of Danish fire claims data by using the distributions outlined in
Appendix A: (i) 16 single distributions—
Table A1; (ii) 256 composite distributions—
Table A2; and (iii) 256 mixture distributions—
Table A2. Consequently, the corresponding research questions (RQ) that will be addressed are as follows:
(RQ1) How do the performances of single, composite, and mixture models compare when applied to Danish fire claims data using model averaging and grid mapping techniques?
(RQ2) Do model averaging and grid mapping differ significantly in the models they select?
(RQ3) How do model averaging and grid mapping affect the overall risk assessment of heavy-tailed data?
The remaining sections of this paper are structured as follows.
Section 2 outlines the methodology used in this paper.
Section 3 provides a detailed analysis and the results of Danish fire claims data modeling. Thereafter, a detailed discussion of related publications and the results of this paper are summarized in
Section 4. The conclusions and limitations of this study are provided in
Section 5 and
Section 6, respectively.
Appendix A lists all the distributions used in this paper, and
Appendix B provides some of the theoretical characteristics of the distributions considered. Finally,
Appendix C provides the sensitivity analysis of the Occam’s window and the distribution’s Q-Q plots.
3. Analysis and Results
3.1. Data Description
In our study, we use Danish fire claims data, as shown in
Table 1, consisting of 2492 claims reported in millions of Danish Kroner (DKK). The Danish fire loss claims data show that most fire insurance claims are relatively modest (median of 1.63 million DKK), but a few extremely large claims (up to 263.25 million DKK) dramatically increase the average claim size to 3.06 million DKK. The data are extremely heavy tailed (kurtosis of 549.57), meaning that, while most claims are small, rare catastrophic events cause massive losses, as reflected in the skewness (19.90). This pattern reflects very high variability (coefficient of variation of 2.60), where the coefficient of variation is the ratio of the standard deviation to the mean. Previous research suggests that a coefficient of variation of 0.2 (20%) or less indicates relatively stable data, while
de Campos et al. (
2018) highlight the highly variable nature of Danish fire claims data.
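The summary statistics quoted above follow from elementary moment formulas. A Python sketch on a small, hypothetical right-skewed sample (not the Danish data) shows how a single extreme claim inflates the mean, skewness, kurtosis, and coefficient of variation:

```python
import math

def describe(sample):
    """Mean, median, coefficient of variation, skewness, and (raw) kurtosis."""
    n = len(sample)
    mean = sum(sample) / n
    s = sorted(sample)
    median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    var = sum((x - mean) ** 2 for x in sample) / n
    sd = math.sqrt(var)
    skew = sum((x - mean) ** 3 for x in sample) / (n * sd ** 3)
    kurt = sum((x - mean) ** 4 for x in sample) / (n * sd ** 4)
    return {"mean": mean, "median": median, "cv": sd / mean,
            "skewness": skew, "kurtosis": kurt}

# hypothetical right-skewed sample mimicking claim sizes (one large outlier)
claims = [0.8, 1.1, 1.4, 1.6, 1.7, 2.0, 2.4, 3.1, 5.0, 40.0]
stats = describe(claims)
print({k: round(v, 2) for k, v in stats.items()})
```

As with the Danish data, the mean sits well above the median, the skewness is strongly positive, and the coefficient of variation far exceeds the 0.2 stability benchmark.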
Figure 1a shows an increasing mean excess plot for the Danish fire claims data, with some outliers at the extreme end of the tail, indicating that a single distribution may not easily account for the varying heavy-tailed pattern; this is consistent with the very high variability reported in Table 1. The histogram in
Figure 1b, with an overlay of the normal density (in red), depicts that the heavy-tailed data (very large kurtosis and skewness values) indeed do not follow the normal distribution. The plot of the total within-cluster sum of squares (i.e., the elbow method) against the number of clusters (k) in Figure 1c suggests an elbow at k = 3. This bending point signifies that increasing the number of clusters beyond three yields only minimal improvement. Accordingly, k = 3 is identified as the optimal clustering solution under the elbow method criterion.
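The elbow method computes the total within-cluster sum of squares (WSS) for increasing k and looks for the point where additional clusters stop paying off. A self-contained 1-D sketch using plain Lloyd's algorithm with deterministic quantile initialization; the data are hypothetical, with three visibly separated groups:

```python
def kmeans_1d(data, k, iters=50):
    """Lloyd's algorithm in 1-D with quantile-based (deterministic) init;
    returns the total within-cluster sum of squares for this k."""
    s = sorted(data)
    # initial centers at evenly spaced quantiles of the sorted data
    centers = [s[int((i + 0.5) * len(s) / k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:
            j = min(range(k), key=lambda i: (x - centers[i]) ** 2)
            clusters[j].append(x)
        # keep the old center if a cluster empties out
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sum(min((x - c) ** 2 for c in centers) for x in data)

# hypothetical sample with three separated groups
data = [1.0, 1.2, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 20.0, 19.5, 20.5]
wss_by_k = {k: kmeans_1d(data, k) for k in range(1, 6)}
print({k: round(v, 2) for k, v in wss_by_k.items()})
```

The WSS drops sharply up to k = 3 and then flattens, which is exactly the bending point the elbow criterion looks for.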
3.2. Single Distributions Averaging
AIC weights for Danish fire claims in
Table 2 suggest that Burr is the only plausible model for the data. The BIC weights strongly favor Burr, indicating that it provides a better trade-off between fit and simplicity. Models with near-zero weights (e.g., Inverse Weibull and Lognormal) are statistically very unlikely to be the most appropriate for this dataset. The difference between the AIC and BIC weights highlights the importance of considering model complexity: AIC is more forgiving of additional parameters, while BIC is stricter.
Raftery et al. (
1997) demonstrated that Occam’s window avoids selecting overfit models. Also, a cut-off based on an AIC and BIC difference of 10 (Δ = 10) (see the sensitivity analysis in Appendix C, Figure A1) allows for a focused yet flexible set of plausible models for averaging. A smaller Δ value gives a stricter set of plausible models for averaging. The choice Δ = 10 dictates that Burr is the only plausible model for selection.
The Burr distribution in
Table 3 dominates both the AIC- and BIC-weighted averages, delivering identical point estimates. The VaR and TVaR values are 8–140% higher than the empirical figures, providing a conservative (capital-safe) buffer for the heaviest tails. For Danish fire claims, the Burr model is therefore the single most sound parametric choice for tail-risk quantification and pricing.
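The Burr risk measures reported here follow from the Burr XII quantile function: with F(x) = 1 − (1 + (x/s)^c)^(−k), the VaR at level p has a closed form, and the TVaR can be obtained by numerically integrating the quantile function over the tail. A Python sketch with illustrative parameters (not the fitted Danish values):

```python
import math

def burr_var(p, c, k, s):
    """Burr XII quantile (VaR_p), from F(x) = 1 - (1 + (x/s)^c)^(-k)."""
    return s * ((1.0 - p) ** (-1.0 / k) - 1.0) ** (1.0 / c)

def burr_tvar(p, c, k, s, n=20000):
    """TVaR_p = (1/(1-p)) * integral of VaR_u over u in (p, 1), midpoint rule.

    Requires a finite mean, i.e., c * k > 1.
    """
    h = (1.0 - p) / n
    total = sum(burr_var(p + (i + 0.5) * h, c, k, s) for i in range(n))
    return total * h / (1.0 - p)

# illustrative shape/scale parameters with c * k = 1.8 > 1 (finite mean)
var99 = burr_var(0.99, c=1.5, k=1.2, s=1.0)
tvar99 = burr_tvar(0.99, c=1.5, k=1.2, s=1.0)
print(f"VaR_0.99 = {var99:.2f}, TVaR_0.99 = {tvar99:.2f}")
```

TVaR always sits above VaR at the same level, and for heavy-tailed Burr parameters the gap is large, which is why Burr-based estimates provide the conservative buffer described above.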
Figure 2 shows that Burr outweighs the other 15 models in model averaging, with the highest weights under both the AIC and BIC for the Danish fire claims data, and with all the weight concentrated in one distribution. Burr is the only competing model in the Occam’s window at Δ = 10, outlined in red.
3.3. Composite Distributions Averaging
Inverse Burr-Burr leads with the lowest AIC and the highest AIC weight (0.580) in Table 4, better balancing fit and complexity. “Weibull-Inverse Weibull” has the lowest BIC and is favored for simplicity. Discarded models are highlighted in blue, with only 11 models in the Occam’s window at Δ = 10 when sorted by AIC. Models involving Inverse Weibull distributions dominate, capturing the data relatively better than the other considered models. AIC favors “Inverse Burr-Burr,” while BIC spreads the weights more evenly (maximum 0.163). Sorted by AIC, Inverse Burr-Burr is the most appropriate model, followed by Inverse Burr-Inverse Weibull.
Note that, in
Table 5, the risk measures calculated are the same as those in Table 8 of
Grün and Miljkovic (
2019), as we both used the same Danish fire claim data. Note though, in
Table 5, we implement the model averaging approach, and according to its methodology, only 11 models are applicable in calculating the AIC point estimate. Most composite models in
Table 5, such as Weibull-Inverse Weibull or Paralogistic-Burr, produce VaR estimates close to the empirical values. Model-averaged estimates are relatively close to the empirical VaR but overestimate TVaR0.99, with BIC being less conservative than AIC. From the top 20 composite models for Danish fire claims data (
Grün and Miljkovic 2019), model averaging retains only 11 composite models in the Occam’s window in terms of the AIC, yet assigns weights to all of the top 20 models in terms of the BIC. All weights are therefore incorporated in the model-averaged risk estimates. The model-averaged results mean insurers can expect the worst 1% of claims to average around 75.63 M DKK, which is valuable for premium setting and reserving.
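The model-averaged point estimates in Table 5 are simply weighted combinations of the individual models' risk measures. A minimal sketch with hypothetical weights and TVaR values (not the fitted ones):

```python
def averaged_risk(weights, risk_values):
    """Model-averaged point estimate: sum of w_i * R_i, weights renormalized."""
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, risk_values)) / total

# hypothetical AIC weights and TVaR_0.99 estimates (M DKK) for three models
weights = [0.58, 0.30, 0.12]
tvars = [80.0, 70.0, 68.0]
print(averaged_risk(weights, tvars))
```

The renormalization step matters in practice: after Occam's window pruning, the surviving weights no longer sum to one, so they must be rescaled before averaging.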
Inverse Burr-Burr in
Figure 3 has the highest weight under the AIC only and a very weak weight under the BIC for the top 20 composite models; however, Inverse Burr-Inverse Weibull shows a balance between the AIC and BIC weights, suggesting it best represents the “true” model and the trade-offs between the models.
3.4. Mixture Distributions Averaging
The Burr-Burr mixture model in
Table 6 seems to outperform the other considered models for the Danish fire loss data, with AIC and BIC weights of 0.922 and 0.431, respectively, indicating a relatively better fit–complexity balance. The next competing models, Inverse Weibull-Burr (AIC: 0.040, BIC: 0.343) and Loglogistic-Burr (AIC: 0.015, BIC: 0.127), have much lower weights. Non-Burr mixtures (e.g., Inverse Weibull-Inverse Burr) have negligible weights (<0.001). The Occam’s window at Δ = 10 for the top 20 mixture models for the Danish loss data selects only five plausible models; however, Paralogistic-Burr (highlighted in blue) is implausible in terms of the AIC and yet has weight under the BIC.
In Table 7, model-averaged estimates are close to the empirical VaR0.95 but overestimate VaR0.99 and TVaR0.95 while underestimating TVaR0.99. Generally, model averaging balances individual model variability, providing stable estimates that align well with the empirical VaR but are conservative for TVaR, suggesting caution in predicting extreme losses, which is valuable for risk management in insurance.
Figure 4 also confirms Burr-Burr as the top model, followed by Inverse Weibull-Burr. Among the top 20 mixture models, Burr-based forms dominate both the AIC and BIC weights. Both composite and mixture models strike a balance in the number of models within the Occam’s window.
3.5. Grid Mapping Analysis
For better visualization, model names are abbreviated to prevent unreadable clustered plots.
Table 8 shows the set of single models for Danish fire claims data, which includes Burr (B), Inverse Weibull (Iw), Inverse Burr (IB), Inverse Paralogistic (IPl), Inverse Gamma (IG), Generalized Pareto (GenP), Loglogistic (Ll), Lognormal (Ln), Paralogistic (PLl), Inverse Gaussian (IGn), Inverse Exponential (IE), Inverse Pareto (IP), Pareto (P), Gamma (G), Weibull (W), and Exponential (E).
The composite models considered are Weibull-Inverse Weibull (WIW), Paralogistic-Inverse Weibull (PIW), Inverse Burr-Inverse Weibull (IBIW), Weibull-Inverse Paralogistic (WIPl), Inverse Burr-Inverse Paralogistic (IBIPl), Paralogistic-Inverse Paralogistic (PlIPl), Weibull-Loglogistic (WLl), Inverse Burr-Loglogistic (IBLl), Paralogistic-Loglogistic (PlLl), Loglogistic-Inverse Weibull (LlIW), Weibull-Burr (WB), Paralogistic-Burr (PlB), Inverse Burr-Burr (IBB), Loglogistic-Inverse Paralogistic (LlIPl), Inverse Burr-Inverse Gamma (IBIG), Paralogistic-Inverse Gamma (PlIG), Loglogistic-Loglogistic (LlLl), Weibull-Paralogistic (WPl), Paralogistic-Paralogistic (PlPl), and Inverse Burr-Paralogistic (IBPl).
Among the mixture models, there are Burr-Burr (BB), Inverse Weibull-Burr (IWB), Loglogistic-Burr (LlB), Inverse Paralogistic-Burr (IPlB), Paralogistic-Burr (PlB), Inverse Burr-Burr (IBB), Gamma-Burr (GB), Lognormal-Burr (LnB), Generalized Pareto-Burr (GenPB), Inverse Gaussian-Burr (IGnB), Inverse Gamma-Burr (IGB), Inverse Exponential-Burr (IEB), Exponential-Burr (EB), Inverse Pareto-Burr (IPB), Weibull-Burr (WB), Pareto-Burr (PB), Inverse Weibull-Inverse Burr (IWIB), Inverse Paralogistic-Inverse Weibull (IPlIW), Inverse Weibull-Inverse Gamma (IWIG), and Inverse Burr-Inverse Burr (IBIB). This notation system provides a concise and systematic way to reference the wide variety of models used in fitting Danish fire claims across single, composite, and mixture distributions.
The AIC for single models ranges from 7676 to 10,565, with Burr at the low end (~7676) and the exponential and Pareto at the high end (>10,000) in Figure 5. Composite models cluster at an AIC of 7640–7650 (e.g., Inverse Burr-Inverse Weibull), balancing fit and complexity. The AIC for mixture models ranges from 7586 to 7620, led by Burr-Burr. The BIC for single models spans from 7693 (Burr) to 10,570 (exponential), with Burr (7693.7) being the strongest. Composite models have a low BIC (7640–7650) while keeping tail risk in check. The BIC for mixture models ranges from 7586 to 7620, with Burr-Burr (7586.95) fitting best but being riskier: despite being the leading fit in its model class, it has a higher tail risk than most composite models, increasing capital requirements by predicting extreme losses more aggressively. Composites tend to best balance the information criteria and risk measures, with the lowest AIC or BIC and moderate tail risk; single models vary widely in tail risk, and mixtures tend to have a moderate AIC or BIC and moderate-to-high tail risk.
Models cluster, as shown in Figure 6, at an AIC or BIC of 7640–7950, VaR of 5–9 M DKK, and TVaR of 18–27 M. Composite models (e.g., Inverse Burr-Inverse Weibull) balance a low AIC or BIC (~7640) with moderate VaR and TVaR (~8 M and 22 M). Single models like Burr have a higher AIC or BIC (~7676) and VaR (~9 M), while Inverse Paralogistic has a lower VaR (~5.4 M) but a higher AIC (~7800). Burr mixture models show high TVaR (30–60 M) alongside their AIC or BIC. A low AIC or BIC combined with a moderate VaR or TVaR (close to the empirical values) shows that a model fits the data relatively well and will yield relatively predictable risks. This is critical because selecting the most appropriate model sets triggers and drives capital requirements, which in turn determine the profitability of a company.
Figure 7 shows that models cluster at an AIC or BIC of 7640–7700, VaR of 20–27 M DKK, and TVaR of 60–130 M. Composite models (e.g., Weibull-Inverse Weibull) balance a low AIC or BIC (~7640) with moderate VaR and TVaR (22 M and 62 M). Single Burr models have a higher AIC/BIC (~7690) and VaR (~30 M). Burr mixtures show elevated VaR (up to 60 M) and TVaR (>200 M), with an AIC or BIC of ~7600, trading fit for higher tail risk. Composites again tend to balance the criteria with controlled risk. A clear distinction at the 99% level is that mixture models set capital requirements and reserves too high, eroding the competitive edge by tying up excessive capital against almost unrealistic future extreme claims.
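One way to formalize the fit-versus-tail-risk trade-off that the grid maps display is to extract the Pareto-efficient models on the (IC, distance-to-empirical-TVaR) plane. A Python sketch with hypothetical inputs (the names and numbers are illustrative, not the paper's fitted values):

```python
def pareto_front(models, empirical_tvar):
    """Non-dominated models on (IC, |TVaR - empirical|): a model is dropped if
    another model is at least as good on both axes and strictly better on one.

    `models` is a list of (name, ic, tvar) tuples.
    """
    scored = [(name, ic, abs(tvar - empirical_tvar)) for name, ic, tvar in models]
    front = []
    for name, ic, dist in scored:
        dominated = any(
            (ic2 <= ic and d2 <= dist) and (ic2 < ic or d2 < dist)
            for name2, ic2, d2 in scored if name2 != name
        )
        if not dominated:
            front.append(name)
    return front

# hypothetical (name, BIC, TVaR_0.99 in M DKK); empirical TVaR_0.99 = 54.60
models = [("Burr", 7693.7, 130.99),
          ("WIW composite", 7640.0, 62.0),
          ("BB mixture", 7586.9, 210.0)]
front = pareto_front(models, 54.60)
print(front)
```

With these numbers, the single Burr is dominated by the composite (worse on both axes), while the mixture survives only on raw fit; the composite is the balanced choice, matching the grid-map reading in the text.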
4. Discussion of Research Questions
This study outlines the imperfections in the single “best” model approach to loss distribution modeling, especially in heavy-tailed insurance loss data, specifically when applied to the Danish fire claims dataset. Earlier works, such as
Grün and Miljkovic (
2019) and
Marambakuyana and Shongwe (
2024), fitted models to the same data but relied exclusively on the selection of a single “best” model, without considering model uncertainty or Occam’s window model trimming. The current study addresses these gaps by systematically applying weighted model averaging within Occam’s window (Δ = 10) across all three model classes and introducing grid maps that simultaneously plot the information criteria against VaR or TVaR at the 95% and 99% thresholds.
To address RQ1, model averaging with both AIC and BIC weights confirms the preference for Burr-based models across categories, with composite models balancing interpretability and risk control most effectively within Occam’s window. Composite models align closely with the empirical quantiles in the grid maps and Q-Q plots, with minor deviations reflecting their better balance, while single models vary widely and mixture models are moderate (less efficient than composite models) in tail-risk estimation.
For RQ2, model averaging complemented by grid maps produces statistically and practically significant advances over traditional single-model selection. Single-model reliance on Burr yields an over-conservative TVaR0.99 (130.99 M DKK, 140% above the empirical 54.60 M), implying very high capital requirements to protect against catastrophic claims. In contrast, AIC-weighted composite averaging yields a TVaR0.99 of 75.41 M DKK and BIC-weighted averaging 64.96 M DKK, which are relatively moderate. Under-conservative tail risk predictions, on the other hand, greatly increase vulnerability to financial distress when extreme events occur. Grid maps visually expose models previously hidden within Occam’s window (e.g., Inverse Burr-Inverse Weibull) that balance the fit of most data points against tail risk better than any single “best” model, reducing selection bias and overconfidence in risk metrics.
Finally, for RQ3, to a greater extent than the alternatives considered here, model averaging mitigates the model uncertainty inherent in over-reliance on a single “best” model by weighing competing models and using the derived weights to compute the VaR and TVaR, thus lowering overconfidence and bias. Grid mapping, on the other hand, balances fit against the corresponding risk measures to establish which models attain lower information criterion values without overestimating or underestimating tail risk relative to the empirical risk measures.
5. Conclusions
This data-driven study implements two methods, model averaging and grid maps, to fit the most appropriate distributions to the Danish fire claims data, which are significantly positively skewed and heavy-tailed with extreme right-tail outliers, have a large kurtosis, and possibly contain two or three clusters (i.e., are multimodal). The Burr distribution leads for the Danish fire claims among the single distributions; however, it is over-conservative when compared with the empirical risk values. It is observed that the concentration of weights in one model risks overfitting. In pricing, the TVaR (131 M DKK) indicates greater exposure, meaning higher premiums are needed due to the heavy-tail behavior; it also means much larger reserves are needed to cover future extreme claims and demands significantly higher solvency capital. The Burr-Burr mixture model leads on the Danish fire claims data, with six plausible models and the lowest information criterion values across the model classes. Model averaging, which relies on weighted point estimates, yields risk estimates that tend to be closer to the empirical ones; note, though, that the estimates are accurate for VaR0.95 but overestimate VaR0.99 and TVaR0.95, while TVaR0.99 is underestimated. Even though mixture models have the lowest AIC or BIC across all model classes, model-averaged risk estimates from the composite models are more stable (closer to the empirical values).
Grid maps highlight the performance of mixture and composite models, such as Burr-Burr, which cluster tightly with low AIC or BIC and moderate VaR or TVaR, outperforming single models. Composite models provide the best balance between goodness of fit, parsimony, and tail-risk control: their moderate AIC or BIC combined with moderate VaR or TVaR indicates that they represent the heavy-tailed Danish fire claims while mitigating overconfident or overly risky tail estimates, which is ideal for stable risk quantification that is neither too low nor too high but close to the empirical values. These results guide insurers in setting reserves, with Burr-based models providing conservative tail-risk estimates for the Danish fire claims. Overall, it is observed that mixture and composite models better capture the heavy-tailed nature of the Danish fire claims data, with models like Burr-Burr and some of the composite models showing slight overestimation or underestimation of the extreme tails, and single models, like the exponential, exhibiting a much poorer fit. Model averaging, supported by grid maps, enhances risk quantification but suggests higher reserves for extreme losses due to conservative TVaR estimates.
For a future study of the Danish fire claims data, a train–test split combined with, say, the SMOTE (Synthetic Minority Oversampling Technique) approach is recommended. This would ensure that extreme claims are oversampled during the out-of-sample validation split, helping to build a model with good predictive power. It may also be useful to investigate quantifying deviations from the Q-Q reference line through tail-weighted measures. In this work, the nlm() and nlminb() functions in the R package “stats” were used as optimization routines to find the maximum likelihood parameter estimates, and convergence was reached; that is, numerical instability was not encountered during implementation. Since the mixture estimations in this paper relied on direct MLE rather than explicit EM algorithms, future research may investigate the numerical stability and convergence of the EM algorithms.