1. Introduction
Copula models are widely used in insurance and finance for pricing, hedging, and risk management, as well as in health sciences, hydrology, and other applied sciences; see e.g.,
Chen and Guo (
2019);
Czado (
2019);
Joe (
2014);
Kularatne et al. (
2021);
McNeil et al. (
2015). Such wide applicability has triggered important contributions both in probabilistic and statistical aspects of copula models; see
Durante and Sempi (
2015);
Joe (
2014) and references therein. The estimation of copula model parameters from observed data appears, at first, to be a straightforward inference exercise. However, it has, in fact, significant pitfalls. The estimation of copula models without fully understanding the properties of the estimators can have undesirable consequences such as, among others, the overestimation of the dependence in the data; see the discussion in
Fermanian and Scaillet (
2005). One of the difficulties is that the distribution of the univariate margins is, in principle, unknown. Estimation procedures have been proposed to circumvent this problem, but no estimation procedure seems to be clearly the best. In fact,
Kojadinovic and Yan (
2010) show that the performance of commonly used estimation methods depends on the size of the sample and the strength of the dependence in the data. In finance, large samples of data are often available but the same does not happen in other applications where data are, by their nature, limited. This is the case, for instance, if the observations naturally occur at a low frequency or the population of interest is, by itself, small. Here, we propose new semiparametric estimators and show that the level of dependence obtained can differ substantially in a general insurance case where the data are available only quarterly.
Sklar’s representation theorem (
Sklar 1959) characterizes a so-called
copula model for a random vector
with multivariate distribution
H by a copula function,
C, and univariate marginal cumulative distribution functions (cdfs)
for
, as
A copula is then a multivariate distribution with standard uniform univariate margins. If the univariate marginal cdfs are continuous, then the copula is unique. The versatility of copula models is apparent from Sklar’s representation theorem. By combining different distributions for the univariate margins with copula functions, a variety of models can be easily specified. Such flexibility can have a cost when it comes to the task of estimating the copula model from observed data.
Assuming that the univariate margins and copula all belong to absolutely continuous families of distributions, the obvious estimation method is maximum likelihood (ML). By default, the ML estimation of a model’s parameters is performed in one step. But, mainly due to numerical problems, which typically arise during the optimization of a likelihood function with several parameters and possibly multi-dimensional integrals, a two-step maximum likelihood estimation method has been introduced, the so-called
inference functions for margins (IFM) from
Joe and Xu (
1996) and
Joe (
1997). The IFM method consists of estimating first the parameters for each univariate margin distribution independently, and then estimating the dependence parameters from the multivariate log-likelihood where the univariate margins parameter estimates are held fixed. Although the two-step IFM method can suffer from some loss of efficiency in cases of strong dependence, it still enjoys strong asymptotic efficiency as shown by
Joe (
2005). A further advantage of a two-step estimation method is that the estimation of the univariate margins parameters is not affected by a possible misspecification of the multivariate copula model. A fundamental challenge with the ML estimation, either the one or two-step procedure, is to ensure the correct choice of distributions for the univariate margins. This is especially relevant if we are particularly interested in modelling the dependence structure of the random vector. Through a simulation study,
Fermanian and Scaillet (
2005) find that misspecification of the margins may translate into a severe positive bias and high mean square errors in the estimation of the copula parameters leading to an overestimation of the degree of dependence in the data. An extensive simulation study from
Kim et al. (
2007) shows that the one-step ML and the IFM methods are indeed nonrobust against misspecification of the marginal distributions.
Kim et al. (
2007) also shows that, when the margins are unknown, in order to avoid the consequences of misspecification, it is better to use the
maximum pseudo-likelihood (MPL) estimation procedure studied in
Genest et al. (
1995) and
Shih and Louis (
1995). Semiparametric estimation in copula models is indeed used widely even in nonstationary cases, for instance climate data, as is the case in
Nasri et al. (
2019).
For a random sample
from distribution
, the MPL is a semiparametric estimation procedure consisting of selecting the parameter
that maximizes the log pseudo-likelihood function
where
is the probability density function (pdf) of the copula family
, and the univariate marginal distributions
estimator is a rescaled empirical distribution function of the
jth variable. Further asymptotic properties of the MPL estimator have been studied in
Klaassen and Wellner (
1997) and
Genest and Werker (
2002). The finite sample properties of the MPL estimator have been studied in
Kojadinovic and Yan (
2010) in a study where they compare the MPL estimator with the two method-of-moments (MM) estimators based on the inversion of Spearman’s rho and Kendall’s tau coefficients. The MM estimators have been studied by
Genest (
1987);
Genest and Rivest (
1993);
Oakes (
1982).
Kojadinovic and Yan (
2010) found that the MPL estimator performs better than the MM estimators in terms of mean squared error, except for small and weakly dependent vectors. Using the MM procedure as an alternative to MPL for small weakly dependent vectors is not the best solution, as we will demonstrate in this study. Instead, we propose to modify the canonical MPL estimator by using consistent nonparametric estimators of the univariate marginal distributions different from that used since
Genest et al. (
1995).
After deriving theoretically the consistency and asymptotic normality of the proposed MPL estimators, we study their small sample properties via a simulation study. We compare three alternative MPL estimators with the canonical MPL, and with the MM estimators based on the inversion of Kendall’s tau and Spearman’s rho. We find that changing the nonparametric estimator of the univariate margins indeed improves the finite sample performance of the MPL estimator, in terms of bias and mean squared error, while preserving its asymptotic properties. To confirm the large sample performance of the estimators we evaluate their relative efficiency via simulation.
Instead of proposing to use alternative nonparametric estimators of the univariate margins, another possibility would be to obtain a bias correction function for the canonical MPL estimator. Such a bias correction function would depend not only on the copula parameter and sample size, but also, importantly, on the specific copula itself. The approach that we propose to use here has the advantage of not depending on the specific copula. A bias reduction correction can also have the effect of increasing the variance of the estimator and possibly the mean square error, (see e.g.,
Søbye et al. 2021). That does not happen with the estimators we propose here.
Another possible approach is to estimate the multivariate model nonparametrically using empirical copulas, the asymptotic properties of which can be found in
Genest and Segers (
2010) and
Segers et al. (
2017). See also, e.g.,
Yang et al. (
2020) on the nonparametric estimation of copula regression models. Naturally, empirical copula model estimation requires larger samples. Especially in applications where the size of the sample available is limited, there might be enough data to estimate the univariate marginal distribution functions nonparametrically but not enough data to estimate the empirical copula. That is one of the reasons why the semiparametric method from
Genest et al. (
1995) has become commonly used in applications.
Although we chose to compare the MPL estimators proposed here with the MM estimators, as in
Kojadinovic and Yan (
2010), other semiparametric estimators have been introduced in the literature.
Tsukahara (
2005) studied two semiparametric estimation procedures and concluded that these, overall, have a higher mean squared error when compared with the canonical MPL estimator.
Chen et al. (
2006) introduced and studied the properties of an MPL estimator where the unknown marginal density functions are approximated by linear combinations of finite-dimensional known basis functions with increasing complexity called sieves. They find that for weak dependence the sieve method performs comparably to the canonical MPL in finite samples. Given these results, comparing the proposed estimators with the canonical MPL and the MM estimators seems an appropriate choice.
In
Section 2 of this article, we introduce the canonical MPL estimation procedure and its statistical properties. It is our starting point, as we benchmark the MPL estimators that we propose against the canonical MPL estimator.
Section 3 motivates and proposes the new MPL estimators.
Section 4 addresses their asymptotic properties. We show that the finite sample properties of the MPL estimators depend on the copula model in
Section 5.
Section 6 summarizes the MM estimators used in the simulation study. In
Section 7 we report and discuss the results of the simulation study where we compare the small sample performance of the six estimators. We apply our results to a case of general insurance data in
Section 8.
Section 9 concludes the paper. Proofs and tables with simulation results are given in the
Appendix A and
Appendix B.
3. Alternative MPL Estimators
The semiparametric canonical MPL estimation procedure hinges on a nonparametric estimator of each marginal univariate distribution for . As introduced in the previous section, this nonparametric estimator is the rescaled empirical distribution function . Here, we motivate and propose the use of alternative nonparametric estimators for the univariate margins in the MPL estimation procedure.
In the implementation of the canonical MPL method, for each univariate margin
, the pseudo-observations
, defined in (
2), are calculated as
where
is the rank of
among
.
For clarification, MPL estimation presents no scalability problems. The estimators involve ranking the multivariate observations one margin at the time. Hence, it can be easily implemented for high dimensional data sets.
3.1. Pseudo-Observations and Moments of Order Statistics
To motivate the new estimators, first we show the relation between the pseudo-observations and order statistics. Assume that are n-independent and identically distributed (iid) univariate random variables. Arrange these in ascending order of magnitude as , and call the rth order statistic, for .
Proposition 1. Consider a random sample from a univariate distribution with continuous cdf F and the corresponding transformed vector where for . If we define the function for , then each pseudo-observation , for , can be obtained as , where is the rank of among .
The proof of Proposition 1 can be found in the
Appendix A. The conclusion is that the pseudo-observations in (
3), proposed by
Genest et al. (
1995), can be obtained from the expected value of the order statistics defined as a function of the rank of the corresponding sample observations, i.e.,
where
, for
, and
is the rank of
within
.
Note that, as we are assuming that the random variable
X is continuous and its cdf
F is an increasing function, we have that the rank of
among
is the same as the rank of
among
. We remark here that
Clayton and Cuzick (
1985) also used expected order statistics from unit exponential distributions in the estimation of the dependence parameter of a bivariate hazards model.
At this point, it is important to recall that our goal is to improve the performance of the canonical MPL which uses the pseudo-observations computed as in (
4). With this objective in mind, we explore the properties of the pseudo-observations in (
4) inherited from the fact that these are obtained from expected values of order statistics, and how this affects the performance of the canonical MPL estimator.
If the random variable
X has cdf
F, then the distribution of the order statistics
is skewed (except for
if
n is even), especially when
r is closer to 1 or
n. Given that the expected value can be highly influenced by the skewness of the distribution, it is then possible that the properties of the pseudo-observations in (
4) are affected by the skewness of
and consequently also the canonical MPL estimator.
Figure 1 displays the pdf of the order statistics
and
in
, for samples of size
and
respectively. The strong skewness of the pdf implies that the mean is further away from the peak of the distribution than the median and, obviously, the mode. The pseudo-observations calculated using the mean of the order statistics might suffer from the skewness of the pdf. Hence, we propose to use the median or the mode of the order statistics, instead of the mean, to compute the pseudo-observations, and we study their effect on the performance of the new MPL estimators obtained from (
1). From
Figure 1 we can see that the skewness of the order statistics pdf is higher for smaller samples. We will see that, in fact, the smaller the sample, the larger the improvement of the performance of the estimator when using the median or the mode rather than the mean of the order statistics.
3.2. Pseudo-Observations and the Median of Order Statistics
We first propose to use the median of the
rth order statistic as an alternative to using the mean of the order statistic. If the continuous random variable
X has cdf
F then
is drawn from a standard uniform distribution and the median of the order statistic
is
where
is the regularised incomplete beta function. The computations can be made faster using the approximation (see
Hyndman and Fan 1996;
Kerman 2011) given by
Defining the function
, the corresponding pseudo-observations for the estimation of the copula parameter via the pseudo-likelihood method are then
We will refer to the copula parameter estimation procedure consisting of using the pseudo-observations given by (
5) in the log pseudo-likelihood function in (
1) as the median MPL.
3.3. Pseudo-Observations and the Mode of Order Statistics
The second alternative we explore to compute the pseudo-observations is using the mode of the
rth order statistic from a standard uniform distribution, which is given by
In this case, defining the function
, the pseudo-observations are
We will refer to the copula parameter estimation procedure consisting of using the pseudo-observations given by (
6) in the log pseudo-likelihood function as the mode MPL.
For the minimum and the maximum in each margin, i.e., for and (), it is not possible to use the mode of the corresponding order statistic as pseudo-observations in the pseudo log-likelihood function because these would be zero and one, respectively. In these cases, we use instead the mean of the order statistics and as in the canonical MPL because this is our benchmark estimator.
At this point we would like to remark the following. Instead of calculating the pseudo-observations as the mean, median or mode of the order statistics , we could consider using , or . If F is strictly monotonic then , which is one of the proposed estimators above. The pseudo-observations, calculated as or , depend on the distribution F. As we want to assume that F is unknown, we do not consider these alternatives.
3.4. Midpoint and Pseudo-Observations
In the canonical MPL estimator, the motivation to rescale the empirical distribution by multiplying it with
is justified (starting with
Genest et al. 1995) due to the need to keep the pseudo-observations away from the boundary of the interval
. To that end, the adjustment to the empirical distribution function
can rather be carried out using
Here, the additive factor
ensures that the pseudo-observations are strictly in the interval
. This approach, introduced by
Hazen (
1914), is popular with hydrologists and it is also used by
Joe (
2014) in the process of converting sample observations to normal scores. We include it in our study as an alternative to calculating the pseudo-observations, which are then given by
We will refer to the copula parameter estimation procedure consisting of using the pseudo-observations given by (
7) in the log pseudo-likelihood function in (
1) as the midpoint MPL.
4. Large Sample Properties of the MPL Estimators
Before moving on to the small-sample performance simulation study, we consider the consistency and asymptotic normality of the different estimators. As already pointed out by other authors, (e.g.,
Genest et al. 1995;
Kojadinovic and Yan 2010), using
as pseudo-observations in the log pseudo-likelihood function in (
1) corresponds to multiplying
by the empirical distribution of the univariate
jth variable. Each of the estimators of the univariate marginal distribution functions
used above can be written as a function of the empirical distribution estimator
for the corresponding variable. Given that the empirical distribution is a consistent estimator, as an immediate consequence of the strong law of large numbers, the consistency of the univariate cdf estimators used follows.
Genest et al. (
1995) show the consistency and asymptotic normality of the canonical MPL estimator building on the work of
Ruymgaart et al. (
1972). In this section, we generalize their result for the median, mode, and midpoint MPL estimators proposed here. For simplicity of exposition, hereafter we consider bivariate distributions
with copula
, real parameter
, and continuous univariate cdfs
and
, such that
,
. The results obtained can be generalised to the multivariate case.
The regularity conditions for the consistency and asymptotic normality of the MPL estimators are similar to those underlying the maximum likelihood estimator. Given a random sample
from distribution
, the MPL estimate
takes the value that maximizes the log pseudo-likelihood function (
1)
Let
. The semiparametric estimate
solves the equation
with
denoting the partial derivative of
l with respect to
. To derive an expression for the semiparametric MPL estimator
we follow
Genest et al. (
1995) and start by expanding (
8) in a Taylor series. As a result, we obtain
where
and
denotes the second derivative of
l with respect to
. Hence, a standardised version of
is
whose large sample properties relate to those of multivariate rank statistics of the form
under the following assumptions.
Assumption 1. is a continuous function from into such thatexists. Assumption 2. Define the function , on , let p and q be positive numbers satisfying and .
- (i)
with M a positive constant, and ;
- (ii)
with M a positive constant, and , and admits continuous partial derivatives on such that with and .
The limiting behaviour of the MPL estimators can then be summarised as follows.
Proposition 2. Under assumptions 1 and 2, each of the median, mode and midpoint MPL estimators is consistent and is asymptotically normal with variancewhere andwith denoting the indicator of A and . The proof of Proposition 2 and necessary results are obtained in the
Appendix A. The simulation study in
Section 7 illustrates this result. For the cases of multidimensional dependence parameter or in a multivariate context, the previous results on the consistency and asymptotic normality of the modified MPL estimators can be extended as in
Genest et al. (
1995) following similar arguments.
5. Finite Sample Properties of the MPL Estimators
Consider the random sample
of iid pairs from distribution
. Let
be the vector of ranks corresponding to
, and
the vector of ranks corresponding to
. The mean square error of
can be derived (at least approximated) from the moments of
and
and relation (
9). Hence, we are interested in the properties of statistics
and
which are both of the form
.
Let
denote the inverse of
in
, the space of all permutations of
. Define
, where
. We can then write the statistic
in its dual form
If
and
are independent then
has a uniform distribution in
and the derivation of its moments is straightforward (see e.g.,
Hájek 1969). Proposition 3 in the
Appendix A shows how to obtain the moments of
and
given the distribution of
. But if
and
are not independent then the distribution of
, to the best of the author’s knowledge, is unknown. To give an idea of how different the distribution of the
is from a uniform in the non independent case, we run a simulation from a Clayton copula with dependence parameter corresponding to a Kendall’s tau correlation of
(see
Joe 2014). We simulated
50,000 samples, each sample having
pairs of observations from the Clayton copula. The histograms of the simulated observations of
for
are displayed in
Figure 2. The histograms in
Figure 2 show how the distribution of
can be far from a uniform in the case of dependent samples. Given that the finite sample properties of the MPL estimators depend on the copula family via the unknown distribution of
we proceed our investigation of the finite sample properties of the MPL estimators with a simulation study.
6. Method-of-Moments Estimators
In our simulation study, we also compare the performance of the four semiparametric MPL estimators with the method-of-moments (MM) estimators obtained from the relation between the copula parameter and the coefficients Kendall’s tau,
, and Spearman’s rho,
; see
Oakes (
1982),
Genest (
1987),
Genest and Rivest (
1993). Copula parameter estimates obtained from these rank coefficients via the MM can be referred to as inversion-method estimates. The reason to include the two inversion-method estimators is first, because these perform better than the canonical MPL estimator for small weakly dependent samples, and second, to facilitate the comparison of our results with other related studies.
The MM estimation procedure is mostly used in the bivariate one-parameter copula model case, although it may be used in the multivariate and/or multiparameter cases, for instance, by imposing conditions on the dependence structure. In our simulation study we restrict ourselves to the one-parameter bivariate copulas case as explained in
Section 7. Hence, consider the random sample
from an absolutely continuous bivariate copula model
, where
belongs to an open subset of
, and
and
are continuous cdfs. Inversion-method estimators rely on a consistent estimator of a copula moment. A consistent estimator of the copula moment Kendall’s tau is given by
Given the ranks
corresponding to
, where
is the rank of
among
for
, a consistent estimator of the bivariate copula moment Spearman’s rho is
The copula parameter estimate,
, is then obtained by inversion from the relation between
and
or
as
or as
, when the functions
and
are bijections. In those cases where there is no analytic expression for the relation between the copula parameter and
or
then a numerical approximation must be used. The consistency, asymptotic normality, and variance of
and
are well documented in the literature and we refrain from repeating it here, directing the reader to
Kojadinovic and Yan (
2010) and relevant references therein.
7. The Finite Sample Performance of the Estimators
In this section, we compare the performance of the semiparametric pseudo-likelihood estimator when calculating the pseudo-observations as in (
4)–(
7), and the MM Kendall’s tau and Spearman’s rho estimators. Recall that we refer to the MPL estimators for the copula model parameters corresponding to (
4)–(
7) as canonical MPL, median MPL, mode MPL, and midpoint MPL, respectively. To compare the performance of the six estimators, we perform a simulation study. The calculations are performed using R (
R core Team 2020) and the package
copula (
Hofert et al. 2020).
Given their wide applicability to finance and insurance, we consider the copula families Clayton, Gumbel–Hougaard, Plackett, Normal, and Student-t. The Clayton family was first written in the form of a copula by
Kimeldorf and Sampson (
1975). Due to its joint lower tail dependence property, this family as been used to model the association between inter-event times, from epidemiology to insurance. The
Gumbel (
1960) copula can be used to model joint upper tail dependence, for instance, between large losses on financial assets or insurance claims. The bivariate
Plackett (
1965) family is radially symmetric and has been used as an alternative to the bivariate normal copula; see
Nelsen (
2006). The Normal and Student-t copulas are often used in classic finance and insurance multivariate models. Details on each of these copula families can be found, e.g., in
Joe (
2014). Without loss of generality, we consider the case of positive dependence in the simulation study.
We use six different levels of dependence corresponding to Kendall’s tau of 0.1, 0.2, 0.3, 0.4, 0.6, and 0.8, and four sample sizes of 50, 100, 200, and 400. These choices are also informed by the study of
Kojadinovic and Yan (
2010) to make it possible to benchmark some of our results against theirs. For each level of dependence and sample size, we simulate 5000 samples from all the copula families. Each sample is then used to estimate the copula parameter and standard error.
For clarification, we do not study the effect of the univariate marginal distributions because these play no role on the copula MPL estimation procedure. The pseudo-observations used in (
1) to obtain the MPL estimators are adjusted ranks of each marginal observations and do not depend on the particular distribution of each margin. The set of ranks corresponding to an iid random sample
from distribution
F is a permutation
T from the set of all possible permutations of
. If the observations are independent, then the probability of obtaining permutation
T is
, independently of the distribution
F, (see e.g.,
Hájek 1969).
7.1. Results
For each copula and degree of dependence considered, we present in
Table A1,
Table A2,
Table A3 and
Table A4 (in
Appendix B) the results for sample sizes 50, 100, 200, and 400, respectively. In the tables, the different copula models are labelled as: C for the Clayton, G for the Gumbel–Hougaard, P for the Plackett, N for the Normal, and t for Student-t. For the six estimators, we report the percentage relative bias (PRB
), the empirical standard deviation of the estimates (
), the mean of the estimated standard errors (
), and the empirical percentage coverage (PC
) of the approximate 95% confidence interval for the dependence parameter calculated as
. In the tables, we identify the results using a different subscript for each estimator. The notation for the canonical MPL is
, for the median MPL is
, for the mode MPL is
, for the midpoint MPL is
, for the MM Kendall’s tau inversion is
, and for the MM Spearman’s rho inversion is
.
The results for the percentage relative bias can be visualised in
Figure 3, where we plot the PRB for
. As already observed by
Kojadinovic and Yan (
2010), the MM estimators have a smaller relative bias than the canonical MPL for small weakly dependent samples, except for the Student-t, where the Spearman’s inversion method performs quite poorly. However, the relative advantage of the MM estimators over the canonical MPL reduces when the sample size increases (see
Table A2,
Table A3 and
Table A4 in
Appendix B). For dependence levels
the MM estimators can actually have a much larger PRB as it is the case for the Plackett copula. The newly considered median, mode and midpoint MPL estimators have smaller bias than the canonical MPL for weakly dependent samples (
) across all sample sizes. The mode and the midpoint MPL estimators have lower bias than the MM estimators for weakly dependent samples especially for smaller samples. The differences between the estimators in terms of bias reduce as the sample size increases. The median MPL performs remarkably well, in terms of bias, for the Normal and Student-t copulas across all levels of dependence.
The values for the empirical standard deviation of the estimates are very close to the mean of the estimated standard errors. This supports the assumptions underlying the estimator for the asymptotic variance. The empirical percentage coverage (PC) does not seem very different across the six estimators either. We can see that the PC tends to be larger than the 95% level for weaker dependence () and smaller than the 95% level for stronger dependence. From the results for the standard errors and percentage coverage obtained from the simulations, we find no evidence to contradict the asymptotic normality of the estimators. Overall, the results are consistent across the different copula families and sample sizes considered here.
Table A5 contains the estimated root mean square error (RMSE) for the canonical MPL estimator obtained for each sample size, copula, and level of dependence considered. The RMSE increases with the level of dependence, except for the Normal and Student-t, and decreases as the sample becomes larger. Hence, the higher RMSE for the canonical MPL estimator is observed for small strongly dependent samples and the lower RMSE is obtained from weakly dependent large samples. The increase in the RMSE with the level of dependence is supported by the fact that the estimated standard errors also increase with the strength of dependence, as shown in the PRB tables. For the Normal and Student-t copulas the estimated standard errors and RMSE of the canonical MPL decrease with the strength of the dependence and sample size.
In
Table A5, we also report the percentage relative efficiency (PRE) calculated as 100 times the estimated RMSE of the canonical MPL divided by the estimated RMSE of each of the other five estimators. We plot the PRE values in
Figure 4 of the five estimators in relation to the canonical MPL for sample size
. We observe that the MM Kendall’s tau- and Spearman’s rho-based estimators outperform the canonical MPL estimator for small weakly dependent samples but this advantage vanishes when the level of dependence becomes stronger or the sample size increases. These results are perfectly in line with the results from
Kojadinovic and Yan (
2010). The three semiparametric MPL estimators proposed here outperform, in terms of MSE, both MM estimators for all levels of dependence and sample size. Consequently, the three estimators introduced also outperform the canonical MPL for low dependence small samples. For stronger levels of dependence,
, and samples larger than 100 the canonical MPL has the smallest MSE for the sample sizes considered. It is worth noting that, in the simulations, the proposed estimators substantially outperform the canonical MPL for weak dependence while for stronger dependence, the outperformance of the canonical MPL is modest. It is interesting that the MM estimators can have a quite poor performance in terms of MSE for stronger dependence in relation to the MPL estimators. Between the three MPL estimators introduced here, the mode MPL is overall the best for weakly dependent samples. This is particularly clear in
Figure 4.
Finally, we estimate the asymptotic relative efficiency of the median MPL, mode MPL and midpoint MPL in relation to the canonical MPL estimator. The asymptotic percentage relative efficiency for each estimator is calculated as the estimated variance of the canonical MPL estimate, divided by the estimated variance of the MPL estimate given by the method being compared with, multiplied by 100. The estimates are obtained from a pseudo-randomly generated sample of size
100,000. The results, presented in
Table A6, confirm that the three proposed MPL estimators and the canonical MPL estimator are asymptotically equally efficient.
8. Application to General Insurance Loss Ratios
In our application, we show the impact of using different MPL estimators while modelling the dependence between general insurance business classes, which is relevant for pricing, reserving and regulatory capital. We apply our results to loss ratios net of reinsurance from three insurance classes: houseowners/householders, domestic motor vehicles, and commercial motor vehicles. The data have been downloaded from the Australian Prudential Regulation Authority (APRA) (
https://www.apra.gov.au/, accessed on 20 December 2023) general insurance statistics website. The historical loss ratios are available only from September 2010 until March 2023, comprising a sample of
quarterly observations per insurance class.
Common factors underlying the risks covered under these three insurance classes, like weather conditions for instance, suggest the presence of dependence between the loss ratios. The Pearson’s linear correlation between houseowners/householders (house) and domestic motor vehicle (dom-motor) loss ratios is
, between house and commercial motor vehicle (com-motor) is
and between dom-motor and com-motor is
. To select a copula model, we use the goodness-of-fit test from
Genest et al. (
2009) implemented in the R package
copula. Although net of reinsurance, there might still be signs of upper joint tail dependence in the loss ratios. Indeed, a 180° rotated Clayton copula fitted to the loss ratios of house and dom-motor gives a
p-value of
, compared with
from fitting a Gumbel copula,
from a Student-t copula and
from a normal copula. In panel A of
Table 1, we list the estimates obtained from fitting a rotated Clayton model to house and dom-motor using the different MPL and MM estimators. For benchmarking, we also list the Kendall’s tau,
, and upper tail dependence,
, implied by the copula parameter estimates from the different methods. The mode MPL estimation produces the lowest copula parameter estimate and the lowest standard error, while the corresponding canonical MPL estimates are the largest among the MPL estimators. The MM estimators produce the largest copula estimates and standard errors. This agrees with the results we obtained for the finite sample performance of the estimators in
Section 7.1. For a Clayton copula with
, we observed that the mode MPL has the lowest PRB and standard error indicating that the mode MPL should give the least upward biased estimate of dependence. Comparing the results from the different estimators, note that the Kendall’s tau implied by the copula estimates ranges from
to
while the upper tail dependence ranges from
to
. Depending on the volume of earned premiums of these insurance classes on a particular insurance company, such variability will potentially have a significant financial impact on the calculation of reserves and regulatory capital of the firm.
For the case of house and com-motor loss ratios, with an even lower linear correlation of
, the goodness-of-fit test from
Genest et al. (
2009) ranks first the 180° rotated Clayton copula model with a
p-value of
, followed by a Gumbel copula with
, a normal copula with
and a Student-t copula with a
p-value. The results are consistent with the previous observations; see panel B in
Table 1. The mode MPL produces the lowest overall estimate for the copula parameter and standard error, and the canonical MPL gives the highest estimates among the MPL estimators. The MM estimates are the highest across all the estimation methods. The implied Kendall’s tau varies now between
and
, while the upper tail dependence parameter ranges from
to
.
Finally, we consider the pair with the highest linear correlation among the named insurance classes reported in the APRA data: domestic motor vehicle and commercial motor vehicle. These two insurance classes have a sample linear correlation of
between the corresponding loss ratios. In this case, the Gumbel copula model ranks first with a goodness-of-fit test
p-value of
. It is not surprising that a textbook three dimensional model, as a multivariate Gumbel or Clayton copulas for instance, does not have enough flexibility to accommodate real data as it is the case here. For the Gumbel copula model with parameter
, Kendall’s tau is given by
and the upper tail dependence parameter is
; see
Joe (
2014). The results from fitting a Gumbel copula model to com-motor and dom-motor, reported in panel C of
Table 1, are coherent with those obtained for the previous two pairs of loss ratios. The mode MPL parameter estimate and standard error are the lowest across all the estimation methods, the canonical MPL has the largest estimates among the MPL estimators and the MM estimates are the largest overall. Nevertheless, the differences between the estimates are much smaller than in the previous two lower dependence cases, as we can see from the implied Kendall’s tau and upper tail coefficient estimates. The Kendall’s tau ranges between
and
and the upper tail coefficient varies from
and
.
From the three cases considered in this application, we observe that the variation of the estimates from the different MPL estimators increases as the dependence level decreases. The mode MPL has consistently the lowest parameter estimate and standard error. At a lower dependence level, the implied Kendall’s tau obtained by the MM estimators is almost double that obtained from the mode MPL. This study based on empirical data confirms what we would expect to observe according to the results we obtained in
Section 7.1 for the finite sample performance of the estimators.