1. Introduction
Insurance companies have an inverted production cycle, where they receive the premium (product price) before knowing the cost (claims). As a result, insurers must estimate these costs and set aside sufficient funds to meet their commitments to policyholders and claimants, creating what is known as reserves. Traditional reserving methods often assume independence among portfolio risk components. However, practical experience shows that risks are frequently interconnected, and this interdependence, represented by correlations between various lines of business, plays a crucial role in determining the overall portfolio reserve.
Dependence modeling plays a pivotal role within the insurance industry and the broader field of risk analysis. It is essential to comprehend the relationships among various variables or events. Dependence modeling serves as a valuable tool for quantifying and characterizing these relationships, ultimately enhancing the precision of risk assessments by accounting for dependencies that can either magnify or mitigate risks.
Furthermore, it facilitates superior portfolio diversification by offering insights into asset inter-dependencies, thereby reducing overall portfolio risk. Consequently, investors and financial institutions rely on dependence modeling to evaluate the risks associated with portfolios comprising multiple assets or financial instruments.
In domains such as financial markets, there exists a category of infrequent yet high-impact events known as “tail events”, which significantly influence risk. Dependence modeling is instrumental in identifying these tail dependencies, a critical aspect of managing and mitigating tail risks.
Insurance companies harness dependence modeling to establish premium rates and effectively manage their risk exposure. Through an understanding of the correlations between different events or claims, insurers can accurately price policies and allocate capital to adequately cover potential losses.
Lastly, dependence modeling is of paramount importance in the realm of regulation and stress testing. Stress tests are conducted to assess the performance of a system or portfolio under adverse conditions. Dependence modeling is indispensable for crafting realistic stress scenarios that consider the intricate interplay between various risk factors.
In the context of loss reserving, understanding dependencies aids in predicting the necessary risk capital, which serves as a buffer for property and casualty (P&C) insurers against potential losses stemming from extreme and adverse events.
To calculate loss reserves, we utilize aggregated data, referred to as loss triangles. In these triangles, rows represent accident years, while columns represent development periods. The lower section of the triangle, which we aim to predict, represents future (unpaid) claims.
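As a small illustration of this layout, the sketch below enumerates the observed and unpaid cells of a square triangle. The size of 10 accident years and 10 development periods is an assumption chosen to match the 55 observed and 45 predicted cells used later in the paper; the 1-based indexing is also an assumed convention.

```python
# Hypothetical 10 x 10 run-off triangle:
# rows = accident years, columns = development periods (both 1-based here).
n = 10

# A cell (i, j) is observed when it falls on or above the latest diagonal.
observed = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1) if i + j <= n + 1]

# The remaining cells form the lower part of the triangle (future, unpaid claims).
unpaid = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1) if i + j > n + 1]

print(len(observed), len(unpaid))  # 55 45
```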
There are two main approaches used to capture the dependencies between different loss triangles. The first one focuses on distribution-free multivariate reserving methods. For example, Braun (2004) showed the effectiveness of the multivariate chain-ladder method using simulated data, demonstrating an increased estimation accuracy of the prediction error when accounting for the correlation between loss triangles. Merz and Wüthrich (2008) also studied the prediction error of a modified multivariate chain-ladder model proposed by Schmidt (2006) and incorporated a dependence structure into their model.
The other main approach for modeling dependence between business lines employs parametric methods, based on various distributional families. One commonly used method for parametric loss reserving is the copula model. For example, a Gaussian copula is used by Brehm (2002) to model the joint distribution of unpaid losses. De Jong (2012) used a Gaussian copula correlation matrix to model the dependence between lines of business. Shi et al. (2012) used a multivariate Gaussian copula to capture correlation due to accounting years using loss triangles, while Merz et al. (2013) allowed the correlation matrix to vary over time and produced a more accurate depiction of dependence. Abdallah et al. (2015) used hierarchical Archimedean copulas to model dependence within and between lines of business. More recently, Shi (2017) conducted an analysis of multiple inter-company loss triangles using a Bayesian hierarchical model.
Avanzi et al. (2016) introduced a multivariate Tweedie approach to capture cell-wise dependence in loss reserving, while Araiza Iturria et al. (2021) presented a stochastic model aimed at capturing dependencies between loss triangles. In their work, they opted for a Tweedie-distributed double-generalized linear model to represent the marginal distribution. Lally and Hartman (2018) used hierarchical Bayesian Gaussian process regression to estimate loss reserves across a spectrum of product lines. Additionally, Badounas and Pitselis (2020) explored the use of the quantile regression technique in the context of loss reserves.
Bootstrapping is another popular parametric approach used for loss reserving. It involves resampling historical data to simulate new (synthetic) datasets, also called pseudo-responses. Kirschner et al. (2002) proposed a synchronized bootstrap, which aimed to estimate the prediction error of a multivariate dependence model. Taylor and McGuire (2007) modified their approach to account for the additional complexity introduced by the generalized linear model framework. Shi and Frees (2011) used Frank and Gaussian copulas to model the dependence between lines of business and introduced a parametric bootstrapping method to estimate the prediction error.
The contribution of this work to the actuarial literature in general, and to loss reserving in particular, is twofold. First, this work introduces rank-based methods to the Sarmanov family of distributions. This family is considered a richer and more flexible class of distributions for modeling dependence between risks, thanks to its flexible structure that joins the marginals.
Second, we suggest direct pairwise dependence modeling for both bivariate and trivariate loss reserving analyses, using a rank-based Sarmanov for multivariate distributions applied to more than two lines of business.
Sarmanov’s multivariate distribution, introduced in the seminal work of Sarmanov (1966), has garnered significant attention in the actuarial literature. This distribution is noted for its tractability, its ability to accommodate a wide range of dependence structures, and its capacity to link different marginal distributions. This adaptability has recently led to heightened interest across multiple areas of actuarial research.
In the context of reserving, Sarmanov’s distribution has been effectively utilized, as highlighted by Abdallah et al. (2016b). This application enabled the capture of dependencies between two lines of business through the incorporation of random effects. Sarmanov’s distribution has also emerged as a valuable tool in analyzing ruin theory probabilities, as demonstrated by its application in studies conducted by Yang and Yuen (2016), Guo et al. (2017), and more recently, Chen et al. (2023).
Some of the referenced studies have demonstrated that, in comparison to alternative models such as copulas, the Sarmanov family of distributions offers a superior fit to actual insurance data. For example, Bolancé and Vernic (2019) emphasize some disadvantages of the copula approach (e.g., elliptical copulas) compared with the Sarmanov distributions. Moreover, Bahraoui et al. (2015) showed that the bivariate Sarmanov is more flexible than copulas in modeling dependence. Additionally, the correlation coefficients of Sarmanov’s family of distributions have wider ranges; see Bahraoui et al. (2015) and Lee (1996) for more details. Bolancé et al. (2020) proposed a Sarmanov method with beta marginals and applied it to motor insurance pricing.
The adaptability and extensive utility of Sarmanov’s multivariate distribution have positioned it as a cornerstone in contemporary actuarial research. Its ability to navigate complex dependence structures has fostered a deeper understanding of dependencies and risk assessment in various segments of the insurance landscape.
In this paper, we specifically employ the Sarmanov distribution as we transition from the one-stage inference technique, where we simultaneously estimate both the marginals and the dependence parameters, to a two-stage inference modeling approach. Indeed, under one-stage inference, altering the dependence structure can result in distinct parameter estimates for the marginals, potentially leading to a different total reserve estimate. As a consequence, one-stage inference has the undesired effect of violating the linearity property of the mean. Therefore, we suggest employing a two-stage inference approach, commonly known as the rank-based method, using the Sarmanov family of multivariate distributions. In the first step, we fit generalized linear models (GLMs) to the individual marginals, which fixes the marginal parameters and the reserve estimates. In the second step, we link these GLMs through the rank-based method, employing bivariate and trivariate Sarmanov distributions. It is worth noting that a similar approach has previously been explored using copula models. For example, Genest and Nešlehová (2014) discussed the rank-based methods for copula estimation, while Côté et al. (2016) introduced the rank-based methods for loss reserving, using nested Archimedean copulas, and a copula-based risk aggregation model.
The statistical properties of rank-based methods, including the consistency and asymptotic normality of estimators, were previously established by Genest et al. (1995). They conducted a comprehensive examination of a semi-parametric approach for estimating dependence parameters within a family of multivariate distributions. In this study, we showcase the practical applications of these methods and extend their utility to the multivariate Sarmanov distribution family. In essence, our research demonstrates that the proposed method more effectively captures the dependencies among lines of business (LOBs) and yields lower risk capital estimates compared to traditional one-stage inference models.
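To make the second stage more concrete, the following minimal sketch shows the usual first operation of a rank-based procedure: replacing each sample by its normalized ranks $R_i/(n+1)$, often called pseudo-observations. It assumes that the ranks are taken over residuals from the marginal fits; the function name and the residual values are hypothetical and serve only as an illustration.

```python
def pseudo_observations(values):
    """Map a sample to normalized ranks R_i / (n + 1), which lie strictly in (0, 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda k: values[k])
    ranks = [0] * n
    for rank, k in enumerate(order, start=1):
        ranks[k] = rank  # ties, if any, are broken by original position
    return [r / (n + 1) for r in ranks]

# Hypothetical residuals from two marginal fits (illustrative numbers only).
res_lob1 = [0.3, -1.2, 0.8, 2.1, -0.4]
res_lob2 = [-0.5, 0.9, -0.1, -1.7, 0.6]

u = pseudo_observations(res_lob1)
v = pseudo_observations(res_lob2)
```

Because only the ranks enter the dependence estimation, the marginal parameters obtained in the first stage are left unchanged when the dependence structure is modified.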
Section 2 reviews loss triangle modeling, introduces notation, and presents a concise overview of the Sarmanov distribution. Section 3 introduces the rank-based method to the Sarmanov family of multivariate distributions. For illustration and validation, Section 4 applies the model to seven LOBs from different datasets, sourced from a major US and a large Canadian property–casualty insurer. In Section 5, we analyze the implications for risk capital and demonstrate the advantages of our methods in terms of diversification benefits. Section 6 concludes the paper.
5. Risk Capital Implications
In addition to reserves, companies also need to set aside additional funds as a buffer against potential losses caused by adverse scenarios or extreme events; this buffer is called risk capital. It represents the amount of money that a company can lose without significant harm to its financial situation. In practice, companies calculate their risk capital by summing up the risk capital of each LOB separately. This is called the “Silo” method; it was introduced by Ajne (1994). However, this method implicitly assumes that risks are perfectly correlated, and does not allow any form of diversification.
Therefore, we address this issue by using a dependence model through the Sarmanov family of multivariate distributions, with both the one-stage inference and rank-based methods. This section then examines and compares both approaches and assesses their impacts on the risk capital and diversification benefits.
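In order to calculate the risk capital, risk measures, such as the value-at-risk (VaR) and tail value-at-risk (TVaR), are used. $\mathrm{VaR}_\kappa$ is calculated as the $\kappa$-th percentile of the loss distribution, where $\kappa$ is the risk tolerance. $\mathrm{TVaR}_\kappa$ is the expected loss, given that the loss is greater than the $\mathrm{VaR}_\kappa$ level. Namely, we have
$$\mathrm{VaR}_\kappa(S) = \inf\{s \in \mathbb{R} : F_S(s) \ge \kappa\}, \qquad \mathrm{TVaR}_\kappa(S) = \mathrm{E}\left[S \mid S > \mathrm{VaR}_\kappa(S)\right],$$
where $S$ is the total unpaid loss for the portfolio and $F_S$ is its cumulative distribution function.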
In our case, we use the TVaR, which is a coherent risk measure, unlike the VaR, for which the sub-additive property is, in general, not guaranteed. The capital allocation approach determines the share of the risk capital to be allocated to each LOB. It was first introduced by Tasche (1999) and is summarized by Bargès et al. (2009).
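As an illustration of how these two risk measures can be estimated from simulated losses, the sketch below computes an empirical VaR as an order statistic and an empirical TVaR as the average of the losses above it. The function name `var_tvar` and the input data are hypothetical; other empirical quantile conventions exist and would give slightly different values.

```python
import math

def var_tvar(losses, level):
    """Empirical VaR (level-quantile) and TVaR (mean of the losses above the VaR)."""
    ordered = sorted(losses)
    n = len(ordered)
    # Smallest order statistic whose empirical CDF reaches `level`
    # (the small tolerance guards against floating-point round-off).
    k = max(math.ceil(level * n - 1e-9), 1)
    var = ordered[k - 1]
    tail = [x for x in ordered if x > var]
    tvar = sum(tail) / len(tail) if tail else var
    return var, tvar

losses = list(range(1, 101))   # illustrative losses 1, 2, ..., 100
print(var_tvar(losses, 0.90))  # (90, 95.5)
```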
5.1. Simulation Procedure
To calculate the risk capital, we need a predictive distribution of reserves, which can be obtained by simulation, as these distributions cannot be obtained explicitly.
The simulation algorithm is the same for both one-stage inference and rank-based methods. In fact, to generate realizations from the Sarmanov distribution, we use the inversion method, based on the conditional cumulative distribution function, as described by Pelican and Vernic (2013). In both the bivariate and trivariate cases, the first component is drawn from its marginal distribution, and each subsequent component is obtained by inverting its conditional cumulative distribution function given the components already generated. For the trivariate Sarmanov, the procedure therefore continues with one more step: the third component is generated from its conditional cumulative distribution function given the first two simulated components.
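A minimal sketch of conditional inversion in the bivariate case is given next. It assumes the Farlie–Gumbel–Morgenstern special case of the Sarmanov family (kernels $\phi_i(x) = 1 - 2F_i(x)$), chosen only because its conditional distribution function can be inverted in closed form; the kernels, the exponential marginals, the parameter value $\omega = -0.8$ and all function names are illustrative assumptions, not the specification used in this paper. For other kernels, the last step would typically require a numerical root search.

```python
import math
import random

def simulate_fgm_pair(omega, rng):
    """Draw (u1, u2) on the uniform scale from the FGM case of the Sarmanov family."""
    u1 = rng.random()  # first component from its (uniform) marginal
    p = rng.random()   # uniform level at which the conditional CDF is inverted
    a = omega * (1.0 - 2.0 * u1)
    # Conditional CDF of U2 given U1 = u1 is u2 + a * u2 * (1 - u2).
    # Solving it for u2 at level p gives the root below (stable form, valid for a = 0 too).
    u2 = 2.0 * p / ((1.0 + a) + math.sqrt((1.0 + a) ** 2 - 4.0 * a * p))
    return u1, u2

def exp_quantile(u, rate):
    """Quantile function of an exponential marginal (illustrative choice)."""
    return -math.log(1.0 - u) / rate

rng = random.Random(2023)
pairs = [simulate_fgm_pair(-0.8, rng) for _ in range(20000)]

# Map to the loss scale through the marginal quantile functions.
losses = [(exp_quantile(u1, 1.0), exp_quantile(u2, 0.5)) for u1, u2 in pairs]
```

In this special case, the correlation between the two uniform components equals $\omega/3$, which gives a simple check of the simulated sample.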
Once we estimate the parameters from both the one-stage inference and rank-based methods, as described in Section 2.3 and Section 3, we simulate the 45 observations of the lower part of each triangle using the simulation procedure described above. Then we calculate the reserve and estimate the risk measure from the simulated lower part of the triangle, as follows.
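For each simulation and LOB $\ell$, we compute the total unpaid loss $S_\ell$ as the sum of the simulated cells in the lower part of the corresponding triangle, as well as $S = \sum_{\ell} S_\ell$, the total unpaid loss for the whole portfolio. Here, the TVaR-based capital allocation is used and can be written as
$$\mathrm{TVaR}_\kappa(S_\ell; S) = \frac{1}{N(1-\kappa)} \sum_{n=1}^{N} S_\ell^{(n)} \, \mathbb{1}\left\{S^{(n)} > F_S^{-1}(\kappa)\right\},$$
where $\kappa$ is the risk tolerance, $S_\ell^{(n)}$ and $S^{(n)}$ are the values obtained in the $n$-th simulation, $F_S$ is the empirical cumulative distribution function of $S$ and $N$ is the number of simulations. The total TVaR-based capital allocation can be written as
$$\mathrm{TVaR}_\kappa(S) = \sum_{\ell} \mathrm{TVaR}_\kappa(S_\ell; S).$$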
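The allocation can be sketched numerically as follows. Each LOB receives the average of its own simulated loss over the scenarios in which the portfolio total exceeds its empirical VaR, so the allocations add up to the empirical TVaR of the total. The function name and the toy data are hypothetical, and the sketch divides by the number of tail scenarios, which coincides with $N(1-\kappa)$ whenever exactly that many simulations exceed the VaR.

```python
import math

def tvar_allocation(lob_losses, level):
    """Empirical TVaR-based allocation per LOB from simulated losses.

    lob_losses: one list of simulated total unpaid losses per LOB (equal lengths).
    """
    n = len(lob_losses[0])
    totals = [sum(lob[k] for lob in lob_losses) for k in range(n)]
    ordered = sorted(totals)
    var = ordered[max(math.ceil(level * n - 1e-9), 1) - 1]
    tail = [k for k in range(n) if totals[k] > var]
    if not tail:
        raise ValueError("no simulated portfolio loss exceeds the VaR at this level")
    return [sum(lob[k] for k in tail) / len(tail) for lob in lob_losses]

# Toy example with two LOBs and ten simulations (illustrative numbers only).
lob1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
lob2 = [2 * x for x in lob1]
print(tvar_allocation([lob1, lob2], 0.80))  # [9.5, 19.0]
```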
The risk capital is defined as the difference between the risk measure and the value of liability (see, e.g., Dhaene et al. 2006). To replicate what is usually done in practice, the risk measure is evaluated at a high risk tolerance $\kappa$, say 99%, while the value of liability (reserve) is usually assumed to be equal to the risk measure at a lower risk tolerance, generally between 60% and 80%, according to the risk appetite. Here, we set the risk tolerance at 60% for the reserve in our risk capital analysis. Mathematically, the risk capital associated with a risk $R$, denoted by $\mathrm{RC}(R)$, is then calculated as follows:
$$\mathrm{RC}(R) = \mathrm{TVaR}_{\kappa}(R) - \mathrm{TVaR}_{60\%}(R).$$
We then compute the gain of the dependence model compared to the silo method below:
$$\mathrm{Gain} = \frac{\mathrm{RC}_{\text{Silo}} - \mathrm{RC}_{\text{Model}}}{\mathrm{RC}_{\text{Silo}}}.$$
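For concreteness, a short numerical sketch of these two quantities follows. It assumes that the gain is the relative reduction in risk capital with respect to the silo benchmark, and that the silo risk capital is the sum of the stand-alone risk capitals of the LOBs; the function names and all numbers are hypothetical.

```python
def risk_capital(tvar_high, tvar_reserve):
    """Risk capital: risk measure at the high tolerance minus the reserve-level measure."""
    return tvar_high - tvar_reserve

def gain(rc_silo, rc_model):
    """Relative reduction in risk capital versus the silo benchmark (assumed definition)."""
    return (rc_silo - rc_model) / rc_silo

# Hypothetical stand-alone TVaR values (99% and 60%) for two LOBs.
rc_lob1 = risk_capital(140.0, 100.0)   # 40.0
rc_lob2 = risk_capital(260.0, 200.0)   # 60.0
rc_silo = rc_lob1 + rc_lob2            # silo method: sum of stand-alone risk capitals

# Hypothetical portfolio-level TVaR values under a dependence model.
rc_model = risk_capital(385.0, 300.0)  # 85.0

print(round(gain(rc_silo, rc_model), 4))  # 0.15
```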
First, we apply the aforementioned procedures to the personal and commercial auto LOBs using data from the US Schedule P. We compute the TVaR for various risk thresholds. Subsequently, we determine the risk capital and gains using the rank-based method and compare them against both the silo and one-stage inference methods. The results of these comparisons, based on 50,000 simulations, are presented in Table 30. We present the lowest TVaR, risk capital and highest gain for each risk level in bold.
Unsurprisingly, both one-stage inference and rank-based Sarmanov methods provide lower risk measures and risk capital than the silo method. This confirms and highlights the importance of the diversification benefit when modeling dependence between these two negatively dependent LOBs.
Notably, Table 30 shows that the rank-based method surpasses the one-stage inference method in terms of gain when compared to the silo method. Specifically, we note a reduced risk measure and an increased gain for the Sarmanov rank-based method. This highlights that the diversification benefit achieved through the rank-based method is greater than that attained with the one-stage inference method.
Subsequently, we reevaluate the risk capital implications using two other datasets from the Canadian insurer. The first dataset includes the auto and home LOBs, and we compare the risk capital obtained through the rank-based Sarmanov method against that obtained via the traditional one-stage inference approach.
Table 31 corroborates that the bivariate Sarmanov model, when using the rank-based method, yields lower risk measures and greater risk capital gains in comparison to both the silo and one-stage inference methods. This observation is further substantiated in the subsequent section through the application of bootstrapping. The lowest TVaR, risk capital and highest gain for every risk level are indicated in bold.
For the second dataset from the Canadian insurer, we compare the risk capital for the bivariate case with the following pairs: BI and AB, BI and DI, and AB and DI, as well as for the trivariate case with the triplet BI, AB, and DI. Here, only models with significant dependence shown in Section 4 are illustrated.
Table 32 demonstrates and validates that the bivariate Sarmanov model, employing the rank-based method, yields lower risk capital and higher gains when contrasted with both the silo and one-stage inference methods. The lowest risk capital and highest gain of the total of three LOBs are highlighted in bold.
In the trivariate scenario, we observe that the risk capital allocations are lower than in the bivariate case. Furthermore, the gains are higher, underscoring the additional risk diversification potential enabled by the rank-based trivariate Sarmanov method in the presence of multivariate dependence.
5.2. Bootstrap Procedure
The results from the simulation procedure in Section 5.1 do not incorporate parameter uncertainty, as the model is assumed to be correct. As such, a parametric bootstrap can be used to quantify estimation error and address potential model over-fitting. Therefore, in order to calculate the predictive distribution of reserves and risk capital, we also use the bootstrapping method to generate sample data and estimate the parameters. We use the same bootstrap algorithm as Taylor and McGuire (2007), which is also shown in work by Shi and Frees (2011) and Abdallah et al. (2016b). The following are the steps included in the bootstrapping method for bivariate or multivariate cases after estimating parameters using the methods described in Section 2.3 and Section 3.
(1) Simulate 55 pseudo-responses, one for each observed cell of the triangle, from the Sarmanov model using the estimated parameters.
(2) Estimate the parameters from the new simulated (synthetic) data, based on the different models.
(3) Simulate the lower part (45 observations) of the triangle using the new estimated parameters obtained above.
(4) Calculate the reserve and estimate the risk measures from the simulated lower part of the triangle.
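The structure of these steps can be sketched with a deliberately simplified stand-in model. The code below replaces the Sarmanov model with GLM marginals by a single exponential distribution with a hypothetical fitted rate `rate_hat`, so that re-estimation reduces to a closed-form maximum likelihood estimate; only the loop structure, and not the model, reflects the procedure above.

```python
import random

def bootstrap_reserves(rate_hat, n_obs=55, n_future=45, n_boot=2000, seed=1):
    """Parametric bootstrap of the reserve under a toy exponential model (assumption)."""
    rng = random.Random(seed)
    reserves = []
    for _ in range(n_boot):
        # Simulate pseudo-responses for the observed cells from the fitted model.
        pseudo = [rng.expovariate(rate_hat) for _ in range(n_obs)]
        # Re-estimate the parameter on the synthetic data (exponential MLE).
        rate_star = n_obs / sum(pseudo)
        # Simulate the lower part of the triangle with the re-estimated parameter.
        future = [rng.expovariate(rate_star) for _ in range(n_future)]
        # The reserve is the sum of the simulated future cells.
        reserves.append(sum(future))
    return reserves

reserves = bootstrap_reserves(rate_hat=1.0)
mean_reserve = sum(reserves) / len(reserves)
```

Because the parameter is re-estimated in every replication, the resulting reserve distribution is more dispersed than one simulated with the parameter held fixed, which is how parameter uncertainty enters the risk measures.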
We apply the bootstrap method to the three datasets. We first use the Kolmogorov–Smirnov test to check whether the simulation procedure produces adequate datasets (i.e., loss triangles), as shown in Table 33. The null hypothesis is not rejected for any model, i.e., there is no significant evidence that the simulated data and the original loss data come from different distributions for any LOB.
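As an illustration of this check, the sketch below computes the two-sample Kolmogorov–Smirnov statistic, i.e., the largest gap between two empirical distribution functions. It assumes that a two-sample version of the test is used to compare simulated and original losses; the function name and the toy samples are hypothetical.

```python
def ks_two_sample(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # Advance both empirical CDFs past the current value (handles ties).
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

original = [1, 2, 3, 4]   # illustrative "observed" losses
simulated = [3, 4, 5, 6]  # illustrative "simulated" losses
print(ks_two_sample(original, simulated))  # 0.5
```

For large samples of sizes $n$ and $m$, the null hypothesis of a common distribution is rejected at the 5% level when the statistic exceeds approximately $1.36\sqrt{(n+m)/(nm)}$.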
For the personal and commercial auto lines based on the US Schedule P data, Table 34 displays the risk measures, as well as the corresponding risk capital estimates and gains obtained through 5000 bootstrap simulations. As the bootstrap is more computationally intensive, a reduced number of simulations is used for this section. The lowest TVaR, risk capital and highest gain for each risk level are given in bold.
The findings presented in Table 34 corroborate the results obtained through simulations. Specifically, they demonstrate that, once again, the bivariate Sarmanov model employing the rank-based method yields lower risk measures compared to both the silo and one-stage inference methods. This reaffirms the conclusion that the rank-based method consistently outperforms both alternatives when applied to the personal and commercial auto LOBs from the US Schedule P dataset.
We next implement the bootstrap method on the Canadian Insurer Data 1, with the results presented in Table 35. The lowest TVaR, risk capital and highest gain for each risk level are written in bold. These results reaffirm the conclusions drawn in Section 5.1, specifically that the bivariate Sarmanov distribution using the rank-based method consistently delivers the lowest risk capital allocations and the highest risk capital gains when compared to the one-stage inference model.
Finally, we apply the bootstrap method to the Canadian Insurer Data 2, and the results are shown in Table 36. The lowest risk capital and highest gain for the total of the three LOBs are highlighted in bold. The findings from Section 5.1 are again confirmed, i.e., the trivariate Sarmanov distribution with the rank-based method provides the smallest risk capital allocations and the largest risk capital gain among all models.
It is worth noting that the risk measures obtained through bootstrapping are significantly higher for all models compared to those reported through simulation. This emphasizes the significance of accounting for parameter uncertainty.
6. Summary and Concluding Remarks
In this paper, we introduced rank-based techniques to enhance the modeling of the Sarmanov family of multivariate distributions within the context of loss reserving. Our findings demonstrate that, compared to one-stage inference, these rank-based methods not only capture the inter-dependencies between different LOBs more effectively but also yield superior outcomes in terms of risk capital allocation.
The dependence structure has also been extended to more than two LOBs with the trivariate case, which provides the largest risk capital gains and diversification benefits among all models. We provided comprehensive explanations and descriptions for estimations, reserve calculations, as well as simulation and bootstrap procedures for all the models utilized in this paper.
The methods were calibrated and validated on seven LOBs from real-world data and led to the same conclusion, namely, that the robust rank-based estimation method outperforms the classical one-stage inference approach for both bivariate and trivariate Sarmanov models. Indeed, the rank-based Sarmanov model effectively captures the interdependence among LOBs in cases where the one-stage inference model falls short (see the summary in Table 29). Moreover, as demonstrated in the preceding section, the proposed rank-based Sarmanov model not only yields lower risk measures but also produces a more substantial diversification benefit when compared to the one-stage inference model.
The challenge in aggregate loss reserving lies in dealing with over-parameterization due to the limited dataset available within the loss triangle. Although rank-based methods partially alleviate this problem by fixing the marginal parameters, future research could explore the application of rank-based Sarmanov methods at the micro-level of reserving, where more (detailed) data are accessible.
Furthermore, to enhance the accuracy of residuals, we can also work on improving the fit of the marginal model. In this regard, future investigations may consider utilizing the generalized partial linear model (GPLM), which incorporates both linear and nonlinear components. This approach provides greater flexibility in capturing intricate relationships between the response variable and predictor variables. Such flexibility proves particularly valuable when dealing with non-linear relationships, a common occurrence in real-world datasets (see, for example, He et al. 2005; Yousof and Gad 2015).
The Sarmanov distribution family offers numerous advantages over alternative dependence models, such as copulas. Its flexible structure renders it a promising tool for effectively capturing dependencies among LOBs. This methodology can be readily extended to encompass more than three LOBs, as well as broader risk considerations. Furthermore, its applicability extends beyond LOBs and can be effectively employed in other domains of actuarial science, including the valuation of premiums and the development of pricing strategies.
For industry professionals, this research also carries tangible and pragmatic significance. The rank-based multivariate Sarmanov method offers a more comprehensive understanding of dependence structures and portfolio dynamics. Consequently, it can be a valuable resource for P&C insurance companies, aiding them in meeting the International Financial Reporting Standard (IFRS 17) regulations while enhancing their solvency risk assessment. This, in turn, will result in positive economic and societal impacts by improving the insurance company’s solvency ratio. Furthermore, the proposed model aligns harmoniously with industry best practices, as it encourages actuaries to avoid adjusting the estimated reserve of one LOB based on another. Instead, it places a strong emphasis on integrating the impact of correlated LOBs into risk management and tail dependence evaluations. This approach aims to harness diversification benefits and provide valuable insights to inform strategic decisions.