1. Introduction
Based on the kernel method and transformations, we present a new nonparametric estimator of a multivariate copula that improves on the empirical copula and the most prominent kernel estimators (see reference [1] for a detailed review). We use the new estimator to analyse and test the extreme value dependence between the losses in the Spanish stock market index and different stock market indexes of Europe, the USA and China.
The copula model allows us to represent the dependence structure of a multivariate random vector of continuous variables $(X_1,\dots,X_d)$, which combines with the marginal distributions to give the multivariate distribution. This idea was established in the fundamental theorem proposed by Sklar [2]. This theorem shows that a multivariate cumulative distribution function (cdf) $H$ of the random vector $(X_1,\dots,X_d)$, with marginal distribution functions $F_1,\dots,F_d$, has an associated copula $C$, so that:
$$H(x_1,\dots,x_d) = C\left(F_1(x_1),\dots,F_d(x_d)\right). \quad (1)$$
In practice, the dependence structure and the marginal distributions are unknown and both need to be fitted. We assume that the marginal cdfs can be easily fitted using parametric distributions or nonparametric methods, and we focus on fitting the dependence structure using a copula. It is often difficult to select the appropriate dependence structure, and therefore the right copula model, by visualizing the data. Alternatively, a nonparametric estimate of the copula can be obtained, and its results can be used for estimating joint probabilities or for testing the adequacy of a copula family, for example, the extreme value copula family. In this paper, these two uses of our new nonparametric estimator are analysed through a simulation study.
Because many dependence structures are represented by different copula families, specific tests for choosing the best copula are useful. The approach for developing a test for the adequacy of copulas takes its lead from, for example, the proposal of Genest and Rivest [3] for bivariate Archimedean copulas; the test of Scaillet [4] on inference for the positive quadrant dependence hypothesis; the test for equality between two copulas of [5]; or the test of symmetry for bivariate copulas of Genest et al. [6].
On inference for extreme value copulas, alternative types of tests have been proposed, among which the best known are the test of Genest et al. [7] based on a Cramér-von Mises statistic, the test analysed by Ghorbal et al. [8] based on a U-statistic, and the test of Kojadinovic et al. [9] that uses the max-stable property and is also based on a Cramér-von Mises statistic (see also [10] for complete properties of the test based on the max-stable property).
The test proposed by Kojadinovic et al. [9] is based on the empirical copula, which is equivalent to the multivariate empirical distribution. However, the empirical copula is inefficient for certain shapes of distribution, for example, when the marginal cdfs are associated with extreme value distributions. Alternatively, Omelka et al. [1] analyse how tests for extreme value copulas can be based on different kernel estimators. The main difficulty of a classical kernel estimator is its bias at the boundaries when the function values at these points are positive. Based on this concern, Chen and Huang [11] analyse the kernel copula estimator with local linear boundary correction, which the authors proved reduces bias and variance. Alternatively, Omelka et al. [1] propose a transformed kernel copula estimator based on the inverse of the standard normal distribution function, which is very easy to implement and has the same weak convergence properties as the previous proposal. In this paper, an improved transformed kernel estimator is proposed that has the same weak convergence properties and is useful for inference on extreme value copulas. The theoretical results are shown for the bivariate case, but they can be easily extended to the multivariate case.
In Section 2, we present the background on kernel estimation of copulas, the new estimator and its theoretical asymptotic properties for testing the max-stable property of extreme value copulas. Section 3 presents the simulation results that allow us to analyse finite-sample properties and type I and type II inference errors. As an illustration, in Section 4, a financial risk analysis is carried out in which the extreme value copula family hypothesis is tested between the Spanish stock market index and those of different neighbouring and non-neighbouring countries. Finally, we conclude in Section 5.
2. Kernel Estimation of Copulas
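Let $(X_{i1}, X_{i2})$, $i=1,\dots,n$, be a sample of $n$ independent and identically distributed (i.i.d.) bivariate data. The product kernel estimator of the bivariate cdf can be expressed as:
$$\hat H(x_1,x_2) = \frac{1}{n}\sum_{i=1}^{n} K\left(\frac{x_1 - X_{i1}}{b_1}\right) K\left(\frac{x_2 - X_{i2}}{b_2}\right), \quad (2)$$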
where $K$ is the cdf associated with the kernel function $k$, which is a bounded or asymptotically bounded, symmetric probability density function (pdf) (see [12,13] for a review on kernel estimation of the multivariate distribution function). Examples of such functions are the Epanechnikov and the Gaussian kernels (see [14]). The parameters $b_1$ and $b_2$, known as the bandwidths or smoothing parameters, control the smoothness of the estimation: the larger the values of $b_1$ and $b_2$, the smoother the resulting function. Their values depend on the sample size $n$ (the larger the sample size, the smaller the smoothing parameters), but obtaining optimal values for these smoothing parameters is one of the greatest difficulties posed by kernel estimation.
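As a minimal sketch of the product kernel estimator of a bivariate cdf described above, the following Python code averages products of kernel cdfs over the sample. It assumes a Gaussian kernel, so that K is the standard normal cdf; the function name `kernel_cdf_2d`, the simulated data and the bandwidth value 0.3 are illustrative choices and are not taken from the paper.

```python
import numpy as np
from scipy.stats import norm


def kernel_cdf_2d(x1, x2, sample, b1, b2):
    """Product kernel estimate of a bivariate cdf at the point (x1, x2).

    sample : array of shape (n, 2) with the observations.
    b1, b2 : positive bandwidths, one per margin.
    K is the standard normal cdf (Gaussian kernel), an assumption of this sketch.
    """
    sample = np.asarray(sample, dtype=float)
    k1 = norm.cdf((x1 - sample[:, 0]) / b1)
    k2 = norm.cdf((x2 - sample[:, 1]) / b2)
    return float(np.mean(k1 * k2))


# Illustrative data: two independent standard normal margins.
rng = np.random.default_rng(0)
data = rng.standard_normal((500, 2))

# For independent N(0, 1) margins the true value is H(0, 0) = 0.25.
estimate = kernel_cdf_2d(0.0, 0.0, data, b1=0.3, b2=0.3)
print(estimate)
```

Increasing `b1` and `b2` gives a smoother estimated surface at the cost of more bias, which is the trade-off the bandwidth choice has to balance.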
Based on Sklar's theorem, from (2) we specify the kernel estimator of the copula as:
$$\hat C(u_1,u_2) = \hat H\left(\hat F_1^{-1}(u_1), \hat F_2^{-1}(u_2)\right), \quad (3)$$
where $\hat F_j$, $j=1,2$, are estimators of the marginal cdfs that, in practice, can be obtained from a parametric distribution or with a nonparametric estimator. Given that copulas allow us to separate the dependence structure from the marginal distributions, we focus on estimating the former; so, the aim is to estimate a multivariate cdf with $Uniform(0,1)$ marginal distributions, whose kernel estimator for the bivariate case is expressed as:
$$\hat C(u_1,u_2) = \frac{1}{n}\sum_{i=1}^{n} K\left(\frac{u_1 - U_{i1}}{b}\right) K\left(\frac{u_2 - U_{i2}}{b}\right), \quad (4)$$
where, unlike (2), given that the marginal distributions are $Uniform(0,1)$, we assume $b_1 = b_2 = b$ and $b \to 0$ as $n \to \infty$; taking into account the relationship between $b$ and $n$, hereinafter we denote it as $b_n$. In practice, we need to define observations
, ∀
, the values of the marginal empirical distributions
,
and
, are a natural choice. However, the empirical distribution takes the value 1 at the maximum observed value, and most of the commonly used copulas (Gumbel, Clayton, Gaussian and Student's t) do not have finite derivatives (copula density values) at some corners of the unit square; these empirical distributions are therefore replaced by corrected versions, known as pseudo-data, which can be defined as:
or, as Chen and Huang [
11] suggested,
. So, the kernel estimator of a copula is defined as:
To obtain the estimator defined in (5), we need to select a kernel function $K$, whose choice has little effect on the results, and to calculate the bandwidth, whose value has an important effect on the estimated copula. The bandwidth can be calculated using a cross-validation or plug-in method, or using the rule-of-thumb proposed by Silverman [14] for the kernel estimator of the pdf, adapted to the kernel estimator of the cdf (see [12,15]).
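The following sketch illustrates the kernel copula estimator built on pseudo-data. It assumes rescaled ranks divided by n + 1 as pseudo-data (one common definition, used here as an assumption), a Gaussian kernel and an arbitrary bandwidth of 0.1; the function names are hypothetical. Evaluating the estimate at the corner (1, 1), where any copula equals 1, gives a rough idea of the boundary problem that motivates the corrections discussed next.

```python
import numpy as np
from scipy.stats import norm, rankdata


def pseudo_observations(sample):
    """Rescaled ranks R_ij / (n + 1), which lie strictly inside (0, 1)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.shape[0]
    ranks = np.column_stack(
        [rankdata(sample[:, j]) for j in range(sample.shape[1])]
    )
    return ranks / (n + 1.0)


def kernel_copula(u1, u2, pseudo, b):
    """Kernel copula estimate at (u1, u2) with a common bandwidth b."""
    k1 = norm.cdf((u1 - pseudo[:, 0]) / b)
    k2 = norm.cdf((u2 - pseudo[:, 1]) / b)
    return float(np.mean(k1 * k2))


# Illustrative data with independent margins.
rng = np.random.default_rng(1)
data = rng.standard_normal((500, 2))
pseudo = pseudo_observations(data)

centre = kernel_copula(0.5, 0.5, pseudo, b=0.1)  # true value is 0.25
corner = kernel_copula(1.0, 1.0, pseudo, b=0.1)  # true value is 1
print(centre, corner)
```

In this setup the estimate in the centre of the unit square is close to the true value, while the estimate at the corner falls visibly below 1 because part of the kernel mass lies outside the unit square.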
The properties of a kernel estimator depend on some smoothness characteristics of the cdf; in our context in particular, it is a requirement that the first two derivatives take finite values different from zero. Furthermore, when the distribution has a bounded domain and the density at the boundary takes positive values, as in the case of the bivariate copula with domain $[0,1]^2$, the estimator defined in (5) has boundary bias. This means that the kernel estimator is not consistent at the boundary (see [16] pp. 46–47, for a clear description in the kernel density estimator context). This is problematic since our aim is to test whether our data are generated by an extreme value copula. There are three alternative proposals to achieve consistency at the boundary of a kernel estimator of a copula. Boundary kernel methods are the most common techniques proposed in the context of kernel regression and density estimation (see [17,18]); the main difficulty with this type of kernel is that it does not integrate to one, which, in practice, could be inconvenient. Chen and Huang [11] proposed a kernel estimator of copulas with linear boundary correction; the weakness of their method is that for many common families of copulas (e.g., Clayton, Gumbel, Gaussian and Student's t) the bias at some of the corners of the unit square is only of order $O(b)$, versus the $O(b^2)$ that is reached in the central values of the domain, where $O(\cdot)$ is the asymptotic order operator. Another way to correct boundary bias is the mirror-reflection kernel estimator, proposed by Gijbels and Mielniczuk [19] to estimate the density of the copula. In all cases, the main difficulty of a kernel estimator, with or without boundary bias, is calculating the smoothing parameter, whose value greatly affects the results.
An alternative strategy to avoid boundary bias and to calculate the smoothing parameter easily is to transform the $Uniform(0,1)$ marginal distributions of the copula so that the kernel estimator of the new marginal distributions does not have boundary bias and their shapes allow us to minimise the bias of the kernel estimator. This idea also addresses the problems of the estimator defined in (3) based on the original scale of the data. On the one hand, although the marginal distributions are not uniform, they can have shapes that could also be subject to inconsistency at the boundaries, i.e., the distribution could have a bounded domain on one or both sides with positive density. On the other hand, the problems associated with the kernel estimator defined in (3) are widely known when the distribution to be estimated has one or two long tails (see [20,21,22]).
The transformed estimator of the copula is based on the equality:
i.e., the values of the copula function
C evaluated on original
scale are equal to the values of function
evaluated on transformed scale. So, the transformed kernel estimator (TKE) of a copula is defined as:
where $T$ is a transformation which is equal to the inverse of a given continuous cdf. The estimator defined in (6) has a fundamental advantage over the kernel estimator defined in (5) and its versions that incorporate boundary bias reduction: given that $T$ is the inverse of a given cdf, we know the marginal distributions of the transformed pseudo-data, and the bandwidth can be calculated based on these distributions.
Omelka et al. [1] proposed that $T = \Phi^{-1}$, where $\Phi$ is the cdf of the standard normal distribution. This standard normal transformation is based on the idea that the normal distribution does not have boundary bias problems and can be estimated easily using a classical kernel estimator. This transformed estimator is called the Gaussian transformed kernel estimator and is defined as:
In practice, in this case the value of bandwidth can be calculated using the idea of rule-of-thumb of Silverman [
14] applied to the standard normal marginal cdfs, that is
. In the simulation study presented in
Section 3, we show the difference between the mean integrated squared error of a copula,
, using optimal
and using the proposed rule-of-thumb.
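A minimal sketch of the Gaussian transformed kernel estimator follows: the pseudo-data and the evaluation point are both mapped to the normal scale with the standard normal quantile function, and a classical kernel estimator is applied there. The Gaussian kernel, the pseudo-data definition (ranks divided by n + 1) and the bandwidth of order n^(-1/3) are assumptions made for illustration; in particular, the bandwidth is not the rule-of-thumb value referred to above.

```python
import numpy as np
from scipy.stats import norm, rankdata


def pseudo_observations(sample):
    """Rescaled ranks R_ij / (n + 1), which lie strictly inside (0, 1)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.shape[0]
    ranks = np.column_stack(
        [rankdata(sample[:, j]) for j in range(sample.shape[1])]
    )
    return ranks / (n + 1.0)


def gaussian_tke(u1, u2, pseudo, b):
    """Gaussian transformed kernel estimate of the copula at (u1, u2)."""
    z1 = norm.ppf(pseudo[:, 0])  # pseudo-data on the standard normal scale
    z2 = norm.ppf(pseudo[:, 1])
    k1 = norm.cdf((norm.ppf(u1) - z1) / b)
    k2 = norm.cdf((norm.ppf(u2) - z2) / b)
    return float(np.mean(k1 * k2))


# Illustrative data with independent margins.
rng = np.random.default_rng(2)
data = rng.standard_normal((500, 2))
pseudo = pseudo_observations(data)
b = len(data) ** (-1.0 / 3.0)  # illustrative rate only (assumption)

centre = gaussian_tke(0.5, 0.5, pseudo, b)  # true value is 0.25
corner = gaussian_tke(1.0, 1.0, pseudo, b)  # norm.ppf(1) is +inf, so this is 1
edge = gaussian_tke(0.0, 0.5, pseudo, b)    # norm.ppf(0) is -inf, so this is 0
print(centre, corner, edge)
```

Because the transformed scale is unbounded, the estimate respects the copula boundary conditions at the edges of the unit square by construction.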
We propose an alternative estimator to the one defined in (7) using a transformation $T$ that is better than $\Phi^{-1}$. Our proposal is based on the second-order approximation properties of the univariate kernel estimator of the marginal distributions. When $b \to 0$ as $n \to \infty$, $f$ is a continuous pdf and the first derivative $f'$ exists, the bias and variance of the kernel estimator of the cdf are (see [15,23,24]):
and
By adding the integrated variance and the integrated squared bias, we can approximate the MISE of the kernel estimation of the marginal distributions as:
where the integral limits are given by the domain of the argument variable $Y$. From expression (10) it is easy to deduce that the distribution that minimises the MISE also minimises the functional $\int [f'(y)]^2\,dy$. Terrell [25] found the pdf family that minimises functionals of type $\int [f^{(p)}(y)]^2\,dy$, where $p$ is the order of the derivative. This principle was applied to cdf and quantile kernel estimation by Alemany et al. [21], who showed that the $Beta(3,3)$ distribution, whose pdf and cdf are:
$$m(x) = \frac{15}{16}\left(1-x^2\right)^2, \qquad M(x) = \frac{3}{16}x^5 - \frac{5}{8}x^3 + \frac{15}{16}x + \frac{1}{2}, \qquad -1 \le x \le 1, \quad (11)$$
minimises the functional $\int [f'(y)]^2\,dy$ and therefore minimises the integrated squared bias of the classical kernel estimator of a cdf.
Section 2 includes the theoretical results on testing extreme value copulas, and Theorem 1 shows that the cdf $M$ has the properties that allow us to conclude that the kernel estimator of
does not have boundary bias (see [
1]). So, the Beta transformed kernel estimator of a copula is:
where
can be calculated using rule-of-thumb applied to the
marginal distributions, that is
. In the simulation study shown in
Section 3, the MISE calculated with this bandwidth is compared to the one that minimises MISE.
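The Beta transformed kernel estimator can be sketched in the same way as the Gaussian version, replacing the normal quantile function by the quantile function of the Beta cdf M. The sketch below assumes that M is the symmetric Beta(3,3) distribution rescaled to [-1, 1], built here with `scipy.stats.beta` using `loc=-1` and `scale=2`; the Gaussian kernel, the pseudo-data definition and the bandwidth of order n^(-1/3) are further illustrative assumptions and not the rule-of-thumb value mentioned above.

```python
import numpy as np
from scipy.stats import beta, norm, rankdata

# Symmetric Beta(3, 3) rescaled to [-1, 1]; assumed form of the cdf M.
M = beta(3, 3, loc=-1.0, scale=2.0)


def pseudo_observations(sample):
    """Rescaled ranks R_ij / (n + 1), which lie strictly inside (0, 1)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.shape[0]
    ranks = np.column_stack(
        [rankdata(sample[:, j]) for j in range(sample.shape[1])]
    )
    return ranks / (n + 1.0)


def beta_tke(u1, u2, pseudo, b):
    """Beta transformed kernel estimate of the copula at (u1, u2)."""
    t1 = M.ppf(pseudo[:, 0])  # pseudo-data on the Beta scale, inside (-1, 1)
    t2 = M.ppf(pseudo[:, 1])
    k1 = norm.cdf((M.ppf(u1) - t1) / b)
    k2 = norm.cdf((M.ppf(u2) - t2) / b)
    return float(np.mean(k1 * k2))


# Illustrative data with independent margins.
rng = np.random.default_rng(3)
data = rng.standard_normal((500, 2))
pseudo = pseudo_observations(data)
b = len(data) ** (-1.0 / 3.0)  # illustrative rate only (assumption)

centre = beta_tke(0.5, 0.5, pseudo, b)  # true value is 0.25
corner = beta_tke(1.0, 1.0, pseudo, b)  # true value is 1
print(centre, corner)
```

The density of this Beta distribution and its first derivative vanish at both ends of [-1, 1], so little kernel mass is lost at the boundary and the corner estimate stays close to 1 even with a kernel of unbounded support.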
Next, we present some theoretical results related to the weak convergence of the estimator defined in (12) to a Gaussian process and to the max-stable property for testing extreme value copulas.
Theoretical Results
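We use the result from Fermanian et al. [26] for the weak convergence of the kernel estimator of a copula defined in (5) to a Gaussian process $G_C$ in the space of all bounded real-valued functions on $[0,1]^2$, i.e., $\ell^\infty([0,1]^2)$, which is expressed as follows:
$$\sqrt{n}\left\{\hat C(u_1,u_2) - C(u_1,u_2)\right\} \longmapsto G_C(u_1,u_2) = B_C(u_1,u_2) - \partial_1 C(u_1,u_2)\,B_C(u_1,1) - \partial_2 C(u_1,u_2)\,B_C(1,u_2), \quad (13)$$
where $\partial_j C$, $j=1,2$, are the partial derivatives of the function $C$ with respect to $u_j$, ⟼ indicates weak convergence and $B_C$ is a Brownian bridge on $[0,1]^2$ with covariance function:
$$E\left[B_C(u_1,u_2)\,B_C(u_1',u_2')\right] = C(u_1 \wedge u_1', u_2 \wedge u_2') - C(u_1,u_2)\,C(u_1',u_2'),$$
where ∧ is the minimum.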
The weak convergence defined in (13) requires that the copula has continuous partial derivatives. Furthermore, Omelka et al. [1] proved the weak convergence of the local linear, mirror-reflection and Gaussian transformed kernel estimators of the copula. These authors remark that it is sufficient to assume that the first partial derivatives are continuous on the unit square with the corners removed. This is an important result, given that most of the commonly used copulas (Clayton, Gumbel, Normal and Student's t) do not have finite partial derivatives at the corners.
The weak convergence of our Beta transformed kernel estimator is defined in the following theorem.
Theorem 1. Let us suppose a continuous copula C, with continuous first order partial derivatives and bounded second order partial derivatives on that satisfies the following asymptotic properties: , and . If the Beta transformed kernel estimator meets the weak convergence defined in (13). Proof of Theorem 1. Let
be the inverse transformation function in
, the proof of Theorem 1 comes directly from the results of Theorem 2 in Omelka et al. [
1], who proved that, if the first derivative
and
are bounded, then
converge weakly to the Gaussian process
. For
we have that
,
and
,
. Directly, we know that the pdf
m of the
is bounded. Moreover, if the quotient
is analysed, the maximum is approximately found at
. □
The weak convergence of Theorem 1 allows us to use the Beta transformed kernel estimator for inference on copulas. We focus on an extreme value copula test based on the proposal of Kojadinovic et al. [9], which analyses the max-stable property associated with this family of copulas (see, for example, [27]). A copula is max-stable if
and
in
the null hypothesis
is not rejected from the alternative
. In practice, we test the
-
hypothesis using some values of
(see [
9]),
To test the previous hypotheses we propose estimating = using the Beta transformed kernel estimator of the copula, i.e., .
Proposition 1. If the partial derivatives of the copula are continuous then for any we have: in .
Proof of Proposition 1. The result in Proposition 1 is obtained from:
Using the convergence of Theorem 1:
We now need to prove the weak convergence of
. To this end, we use the result of Kojadinovic et al. [
9], that proved the weak convergence of this difference for empirical copula (see also [
10]). In general, this result can be directly extrapolated to the kernel estimator and, in particular, to the Beta transformed kernel estimator, considering that
. Then, under
,
,
it weakly converges to process (
14). □
For hypothesis testing given a fixed
r, we use a Cramér-von Mises statistic:
and for a range of values
, the following statistic can be considered:
For implementing the test based on
, we use the numerical approximation proposed by Kojadinovic et al. [
9], replacing the empirical copula by a Beta transformed kernel estimator of the copula. The procedure is as follows:
The statistics are approximated using a uniformly spaced grid , , of points on , i.e., .
R independent copies of
,
are generated, such that
where
are independent copies of
. The process of obtaining these independent copies of
is described in
Appendix A.
To calculate the copies of
as
and to obtain the
p-value of the statistics as:
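The grid approximation in the first step of the procedure, and the final p-value step, can be sketched as follows. The sketch uses the standard characterisation of max-stability, C(u1^(1/r), u2^(1/r))^r = C(u1, u2), and approximates the integral of the squared difference by an average over a grid. The midpoint grid, the default grid size, the function names and the form of the Monte Carlo p-value are assumptions; the replicates themselves would come from the procedure described in Appendix A, which is not reproduced here.

```python
import numpy as np


def max_stability_statistic(copula, n, r, grid_size=10):
    """Grid approximation of a Cramer-von Mises type statistic.

    copula    : vectorised callable (u1, u2) -> C(u1, u2), e.g. a fitted estimator.
    n         : sample size used to scale the statistic.
    r         : positive exponent at which max-stability is checked.
    grid_size : number of grid points per axis (hypothetical default).
    """
    g = (np.arange(grid_size) + 0.5) / grid_size  # cell midpoints in (0, 1)
    u1, u2 = np.meshgrid(g, g, indexing="ij")
    diff = copula(u1 ** (1.0 / r), u2 ** (1.0 / r)) ** r - copula(u1, u2)
    return float(n * np.mean(diff ** 2))


def monte_carlo_p_value(observed, replicates):
    """Share of simulated statistics at least as large as the observed one."""
    return float(np.mean(np.asarray(replicates, dtype=float) >= observed))


def independence(u1, u2):
    """Independence copula, which is max-stable."""
    return u1 * u2


def lower_bound(u1, u2):
    """Frechet lower bound copula, which is not max-stable."""
    return np.maximum(u1 + u2 - 1.0, 0.0)


s_indep = max_stability_statistic(independence, n=200, r=2.0)
s_lower = max_stability_statistic(lower_bound, n=200, r=2.0)
print(s_indep, s_lower)
```

With known copulas in place of an estimator, the statistic is numerically zero for the max-stable case and clearly positive otherwise; in an application, `copula` would be the fitted kernel estimator.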
4. Data Analysis
To illustrate the usefulness of our proposed estimator, we analyse the dependence between the Spanish stock market index (IBEX35) and the stock market indexes of some neighbouring European countries, namely Germany (DAX), France (CAC40), Italy (FTSE MIB), Portugal (PSI20) and the United Kingdom (FTSE100), as well as the two principal stock market indexes of the USA (DOWJONES and S&P500) and the Hong Kong stock market index (HANG SENG) (see [28] for an analysis of extreme dependence between markets).
Two types of results are shown:
The fit of nonparametric copulas to estimate the probability that the observed losses of two stock market indexes jointly exceed some percentiles, i.e., we estimate the joint exceedance probability with the analysed kernel estimators.
The test to analyse if the data is generated by an extreme value copula.
To carry out the analysis we use a database of the monthly losses of the stock market indexes from January 2000 to March 2021. These losses are calculated from the quotes of the analysed indexes, which are public and can be downloaded, for example, from Investing.com. Throughout the period analysed, three major events influenced market performance, leading to higher losses than in periods of stability: the Lehman Brothers crisis that began in September 2008, the referendum on Brexit on 23 June 2016, and the ongoing COVID-19 crisis which started in March 2020. The three events are considered systemic risks that affect all markets and, if this effect is simultaneous, the data should be generated by an extreme value copula. In Table 5, the main descriptive statistics of the losses in percentage are shown. Furthermore, normality tests and a positive skewness test are carried out and, in all cases, the normality hypothesis is rejected and skewness greater than zero cannot be rejected, i.e., in absolute value, positive losses are bigger than negative ones.
The losses of the Spanish stock market index are plotted together with the indexes of the countries listed for comparison. In Figure 1, we compare Spain (in blue) with four countries that also currently belong to the European Union (in black) and, in Figure 2, the comparison is made with the other countries (in black).
To obtain the Beta transformed kernel estimator we need the data to be i.i.d.; with this in mind, we analyse whether the monthly losses of the stock market indexes have some kind of time dependence in the mean or in the variance. The simple and partial autocorrelation functions of the series and of the squared series allow us to find the model used to filter the series and obtain i.i.d. data (see, for example, [29]). The filter models used are shown in Table A2 of Appendix A.
In
Table 6, we show the results of
for
estimated with
, i.e., the probability of jointly exceeding a given extreme quantile. The upper tail dependence can be approximated as
. In
Appendix C, we show that the results obtained with the empirical copula and
provide lower values than
. It should be noted that the empirical copula tends to underestimate the probability of the tail when extreme values exist. Furthermore, in the simulation study
improves
for all the compared copulas.
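The joint exceedance probability and the finite-threshold approximation of upper tail dependence can be computed from any bivariate copula through the identity P(U1 > u, U2 > u) = 1 - 2u + C(u, u). The sketch below applies it to a Gumbel copula with parameter 2, chosen only because its exact upper tail dependence, 2 - 2^(1/theta), is known; the function names and the threshold values are illustrative assumptions, and in an application `copula` would be the fitted kernel estimator.

```python
import numpy as np


def joint_exceedance(copula, u):
    """P(U1 > u, U2 > u) = 1 - 2u + C(u, u) for a bivariate copula C."""
    return 1.0 - 2.0 * u + copula(u, u)


def upper_tail_dependence(copula, u):
    """Finite-threshold approximation of P(U1 > u | U2 > u)."""
    return joint_exceedance(copula, u) / (1.0 - u)


def gumbel_copula(u1, u2, theta=2.0):
    """Gumbel copula, used here only as a known test case."""
    s = (-np.log(u1)) ** theta + (-np.log(u2)) ** theta
    return np.exp(-s ** (1.0 / theta))


for u in (0.95, 0.99, 0.999):
    print(u, joint_exceedance(gumbel_copula, u),
          upper_tail_dependence(gumbel_copula, u))

# Exact upper tail dependence of the Gumbel copula with theta = 2.
exact = 2.0 - 2.0 ** 0.5
approx = upper_tail_dependence(gumbel_copula, 0.999)
```

As the threshold approaches 1 the ratio approaches the exact tail dependence coefficient, but with an estimated copula very high thresholds rely on few observations, so moderate quantiles are usually a more stable choice.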
We obtain the results of the extreme value copula test of Kojadinovic et al. [9] based on the empirical copula, and of the same test based on the Beta transformed kernel estimator analysed in this paper, using the asymptotically optimal smoothing parameter and a grid of 4 values around it. As expected, all the results indicate that all the analysed bivariate data have a dependence structure generated by an extreme value copula. This behaviour has been accentuated by the COVID-19 crisis, which has led to greater losses and a systemic risk that persists over time (see [30,31] for a review of the effect of COVID-19 on market returns and volatility). In Figure 3 and Figure 4, pairs of pseudo-data are plotted; in all cases some accumulation of points is detected near the corner $(1,1)$, which is an indicator of extreme value dependence.