Article

Goodness-of-Fit Tests For Elliptical and Independent Copulas through Projection Pursuit

Université Pierre et Marie Curie, Laboratoire de Statistique Théorique et Appliquée, 175 rue du Chevaleret, 75013 Paris, France
Algorithms 2011, 4(2), 87-114; https://doi.org/10.3390/a4020087
Submission received: 15 March 2011 / Revised: 5 April 2011 / Accepted: 8 April 2011 / Published: 26 April 2011

Abstract

Two goodness-of-fit tests for copulas are investigated. The first deals with elliptical copulas and the second with independent copulas. These tests result from the extension of the projection pursuit methodology that we introduce in the present article. This method enables us to determine the axis system on which these copulas lie, as well as the exact value of these copulas in the basis formed by the axes thus determined, irrespective of their value in the canonical basis. Simulations are presented, as well as an application to real datasets.

1. Introduction

The need to describe the dependency between two or more random variables motivated the concept of copulas. We consider a joint cumulative distribution function (cdf) F on ℝd with marginal cdfs F1, F2, …, Fd. A copula C is a function such that F = C(F1, F2, …, Fd). Sklar [1] was the first to lay the foundations of this theory. Several parametric families of copulas have been defined, namely elliptical, Archimedean, periodic copulas, etc.; see Joe [2] and Nelsen [3] as well as Appendix A for an overview of these families. Finding criteria to determine the best copula for a given problem can only be achieved through a goodness-of-fit (GOF) approach. So far, several GOF copula approaches have been proposed in the literature, e.g., Carriere [4], Genest and Rémillard [5], Fermanian [6], Genest, Quessy and Rémillard [7], Michiels and De Schepper [8], Genest, Favre, Béliveau and Jacques [9], Mesfioui, Quessy and Toupin [10], Genest, Rémillard and Beaudoin [11], Berg [12], Bücher and Dette [13], among others. However, the field is still at an embryonic stage, which explains the current shortage of recommendations. For univariate distributions, the GOF assessment can be performed using, for instance, the well-known Kolmogorov test. In the multivariate setting, there are fewer alternatives. A simple way to build GOF approaches for multivariate random variables is to consider multi-dimensional chi-square approaches, as in, for example, Broniatowski [14]. However, these approaches present feasibility issues for high-dimensional problems due to the curse of dimensionality. In order to address this, we recall some facts from the theory of projection pursuit.

The objective of projection pursuit is to generate one or several projections providing as much information as possible about the structure of the dataset, regardless of its size. Once a structure has been isolated, the corresponding data are transformed through a Gaussianization. Through a recursive approach, this process is iterated to find another structure in the remaining data, until no further structure can be evidenced in the data left at the end. Friedman [15] and Huber [16] count among the first authors to introduce this type of approach for evidencing structures. They each describe, with many examples, how to evidence such a structure and consequently how to estimate the density of such data, through two different methodologies. Their work is based on maximizing the Kullback-Leibler divergence. In the present article, we introduce a new projection pursuit methodology based on the minimisation of any ϕ-divergence greater than the L1-distance (ϕ-PP). We will show that this algorithm presents the extra advantage of being robust and fast from a numerical standpoint. Its key rationale lies in the fact that it allows us not only to carry out GOF tests for elliptical and independent copulas but also to determine the axis system upon which these very copulas are based. The exact expression of these copulas in the basis constituted by these axes can therefore be derived.

This paper is organised as follows: Section 2 contains preliminary definitions and properties. In Section 3, we present in detail the ϕ-projection pursuit algorithm. In Section 4, we present our first results. In Section 5, we introduce our tests. In Section 6, we provide three simulations pertaining to the two major situations described herein and study a real case.

2. Basic Theory

2.1. An Introduction to Copulas

In this section, we recall the concept of copula. We will also define the family of elliptical copulas through a brief reminder of elliptical distributions—see Appendix A for an overview of other families.

Sklar's Theorem

First, let us define a copula in ℝd.

Definition 2.1

A d-dimensional copula is a joint cumulative distribution function C defined on [0, 1]d, with uniform margins.

The following theorem explains to what extent a copula describes the dependency between two or more random variables.

Theorem 2.1 (Sklar [1])

Let F be a joint multivariate distribution with margins F1, …, Fd, then, there exists a copula C such that

$F(x_1, \ldots, x_d) = C\big(F_1(x_1), \ldots, F_d(x_d)\big)$

If marginal cumulative distributions are continuous, then the copula is unique. Otherwise, the copula is unique on the range of values of the marginal cumulative distributions.

Remark 2.1

First, for any copula C and any $u_i$ in [0, 1], $1 \le i \le d$, we have

$W(u_1, \ldots, u_d) = \max\Big\{1 - d + \sum_{i=1}^{d} u_i,\ 0\Big\} \le C(u_1, \ldots, u_d) \le \min_{j \in \{1, \ldots, d\}} u_j = M(u_1, \ldots, u_d)$
where W and M are called the Fréchet–Hoeffding copula bounds and are also copulas.

We set the independent copula Π as $\Pi(u_1, \ldots, u_d) = \prod_{i=1}^{d} u_i$, for any $u_i$ in [0, 1], $1 \le i \le d$.

Moreover, we define the density of a copula as the density associated with the cdf C, which we will name as c:

Definition 2.2

Whenever it exists, the density of C is defined by $c(u_1, \ldots, u_d) = \frac{\partial^d}{\partial u_1 \cdots \partial u_d} C(u_1, \ldots, u_d)$, for any $u_i$ in [0, 1], $1 \le i \le d$.

Finally, let us present several examples of copulas (see also Appendix A to find an overview).

Example 2.1

The Gaussian copula Cρ (in ℝ2):

Defining Ψρ as the standard bivariate normal cumulative distribution function with ρ correlation, the Gaussian copula function is

$C_\rho(u, v) = \Psi_\rho\big(\Psi^{-1}(u), \Psi^{-1}(v)\big)$
where u, v ∈ [0, 1] and where Ψ is the standard normal cumulative distribution function.

The Student copula Cρ (in ℝ2):

Defining Tρ,k as the standard bivariate Student cumulative distribution function with correlation coefficient ρ and k degrees of freedom, the Student copula function is

$C_\rho(u, v) = T_{\rho,k}\big(T_k^{-1}(u), T_k^{-1}(v)\big)$
where u, v ∈ [0, 1] and where Tk is the standard Student cumulative distribution function.

The Elliptical copula :

Similarly as above, elliptical copulas are the copulas of elliptical distributions (an overview is provided in Appendix A).
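For concreteness, the Gaussian copula of Example 2.1 can be evaluated numerically. The following minimal Python sketch is not part of the original article; it simply relies on SciPy's normal and bivariate-normal distributions.

```python
from scipy.stats import norm, multivariate_normal

def gaussian_copula_cdf(u, v, rho):
    """Evaluate C_rho(u, v) = Psi_rho(Psi^{-1}(u), Psi^{-1}(v))."""
    x, y = norm.ppf(u), norm.ppf(v)                       # Psi^{-1}(u), Psi^{-1}(v)
    biv = multivariate_normal(mean=[0.0, 0.0],
                              cov=[[1.0, rho], [rho, 1.0]])  # standard bivariate normal
    return biv.cdf([x, y])

# Example: value of the Gaussian copula with rho = 0.5 at (0.3, 0.7)
print(gaussian_copula_cdf(0.3, 0.7, rho=0.5))
```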

2.2. Brief Introduction to the ϕ-Projection Pursuit Methodology (ϕ-PP)

Let us first introduce the concept of ϕ-divergence.

The Concept of ϕ-Divergence

Let φ be a strictly convex function $\varphi : \overline{\mathbb{R}^{+}} \to \overline{\mathbb{R}^{+}}$ such that φ(1) = 0. We define the ϕ-divergence of P from Q, where P and Q are two probability distributions over a space Ω such that Q is absolutely continuous with respect to P, by

$D_\phi(Q, P) = \int \varphi\left(\frac{dQ}{dP}\right) dP$

or $D_\phi(q, p) = \int \varphi\left(\frac{q(x)}{p(x)}\right) p(x)\, dx$, if P and Q admit p and q as respective densities.

Throughout this article, we will also assume that φ(0) < ∞, that φ′ is continuous and that this divergence is greater than the L1 distance—see also Appendix B page 109.
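To fix ideas, here is a minimal Python sketch (ours, not from the article; see Appendix B for the generators) of three classical generators φ together with a Monte Carlo estimate of $D_\phi(Q, P) = E_P\big[\varphi\big(\tfrac{q(X)}{p(X)}\big)\big]$ when P and Q admit densities p and q.

```python
import numpy as np
from scipy.stats import norm

# Generator functions phi of some classical divergences (see Appendix B)
phi_kullback  = lambda x: x * np.log(x) - x + 1          # Kullback-Leibler
phi_hellinger = lambda x: 2.0 * (np.sqrt(x) - 1.0) ** 2  # Hellinger
phi_chi2      = lambda x: 0.5 * (x - 1.0) ** 2           # chi-square

def phi_divergence_mc(phi, p, q, sample_from_p):
    """Monte Carlo estimate of D_phi(Q, P) = E_P[ phi( q(X)/p(X) ) ],
    where p and q are the densities of P and Q and sample_from_p
    is an array of draws from P."""
    ratio = q(sample_from_p) / p(sample_from_p)
    return np.mean(phi(ratio))

# Toy usage: chi-square divergence between two univariate Gaussian densities
xs = norm(0.0, 1.0).rvs(size=5000, random_state=0)
print(phi_divergence_mc(phi_chi2, norm(0.0, 1.0).pdf, norm(0.3, 1.0).pdf, xs))
```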

Functioning of the Algorithm

Let f be a density on ℝd. We consider an instrumental density g with the same mean and variance as f. We start by performing the test Dϕ(g, f) = 0; should this test be accepted, then f = g and the algorithm stops; otherwise, the first step of our algorithm consists in defining a vector a1 and a density g(1) by

$a_1 = \arg\inf_{a \in \mathbb{R}_*^d} D_\phi\Big(g \frac{f_a}{g_a}, f\Big) \quad \text{and} \quad g^{(1)} = g \frac{f_{a_1}}{g_{a_1}}$
where $\mathbb{R}_*^d$ is the set of non-null vectors of ℝd and fa (resp. ga) stands for the density of a′X (resp. a′Y) when f (resp. g) is the density of X (resp. Y).

In our second step, we replace g with g(1) and we repeat the first step, and so on. By iterating this process, we end up obtaining a sequence (a1, a2, …) of vectors in $\mathbb{R}_*^d$ and a sequence of densities g(i).

Remark 2.2

First, to obtain an approximation of f, we stop our algorithm when the divergence equals zero, i.e., we stop when Dϕ(g(j), f) = 0, since it implies g(j) = f with j ≤ d, or when our algorithm reaches the dth iteration, i.e., we approximate f with g(d).

Second, we get Dϕ(g(0), f) ≥ Dϕ(g(1), f) ≥ ⋯ ≥ 0 with g(0) = g.

Finally, the specific form of the relationship (2.2) implies that we deal with M-estimation. We can therefore state that our method is robust—see Sections 6, Yohai [19], Toma [20] as well as Huber [21].

The main steps of the present algorithm have been summarized in Table 1.

At present, let us study the following example:

Example 2.2

Let f be a density defined on ℝ3 by f(x1, x2, x3) = n(x1, x2)h(x3), with n being a bi-dimensional Gaussian density, and h being a non-Gaussian density. Let us also consider g, a Gaussian density with the same mean and variance as f.

Since $g(x_1, x_2 / x_3) = n(x_1, x_2)$, we have $D_\phi\big(g \frac{f_3}{g_3}, f\big) = D_\phi(n \cdot f_3, f) = D_\phi(f, f) = 0$ as $f_3 = h$, i.e., the function $a \mapsto D_\phi\big(g \frac{f_a}{g_a}, f\big)$ reaches zero for $e_3 = (0, 0, 1)'$, where $f_3$ and $g_3$ are the third marginal densities of f and g respectively. We therefore obtain $g(x_1, x_2 / x_3) = f(x_1, x_2 / x_3)$.

To recapitulate our method, if Dϕ(g, f) = 0, we derive f from the relationship f = g; whenever a sequence $(a_i)_{i=1,\ldots,j}$, j < d, of vectors in $\mathbb{R}_*^d$ defining g(j) and such that Dϕ(g(j), f) = 0 exists, then $f(\cdot / a_i'x, 1 \le i \le j) = g(\cdot / a_i'x, 1 \le i \le j)$, i.e., f coincides with g on the complement of the vector subspace generated by the family {ai}i=1,…,j—see also Section 3 for a more detailed explanation.

In the remainder of our study of the algorithm, after having clarified the choice of g, we will consider the statistical solution to the representation problem, assuming that f is unknown and that X1, X2, …, Xm are i.i.d. with density f. We will provide asymptotic results pertaining to the family of optimizing vectors ak,m—which we will define more precisely below—as m goes to infinity. Our results also prove that the empirical representation scheme converges towards the theoretical one.

3. The Algorithm

3.1. The Model

Let f be a density on ℝd. We assume there exist d non-null linearly independent vectors aj, with 1 ≤ j ≤ d, of ℝd, such that

$f(x) = n(a_{j+1}'x, \ldots, a_d'x)\, h(a_1'x, \ldots, a_j'x)$
with j < d, n being an elliptical density on ℝd−j and h being a density on ℝj which does not belong to the same family as n. Let X = (X1, …, Xd) be a vector with f as density.

We define g as an elliptical distribution with the same mean and variance as f.

For simplicity, let us assume that the family $\{a_j\}_{1 \le j \le d}$ is the canonical basis of ℝd:

The very definition of f implies that (Xj+1, …, Xd) is independent from (X1, …, Xj). Hence, the density of (Xj+1, …, Xd) given (X1, …, Xj) is n.

Let us assume that Dϕ(g(j), f) = 0, for some j ≤ d. We then get $\frac{f(x)}{f_{a_1} f_{a_2} \cdots f_{a_j}} = \frac{g(x)}{g_{a_1}^{(1-1)} g_{a_2}^{(2-1)} \cdots g_{a_j}^{(j-1)}}$, since, by induction, we have $g^{(j)}(x) = g(x)\, \frac{f_{a_1}}{g_{a_1}^{(1-1)}} \frac{f_{a_2}}{g_{a_2}^{(2-1)}} \cdots \frac{f_{a_j}}{g_{a_j}^{(j-1)}}$.

Consequently, lemma C.1 and the fact that the conditional densities of elliptical distributions are also elliptical, as well as the above relationship, lead us to infer that $n(a_{j+1}'x, \ldots, a_d'x) = f(\cdot / a_i'x, 1 \le i \le j) = g(\cdot / a_i'x, 1 \le i \le j)$. In other words, f coincides with g on the complement of the vector subspace generated by the family {ai}i=1,…,j.

Now, if the family $\{a_j\}_{1 \le j \le d}$ is no longer the canonical basis of ℝd, it is nevertheless still a basis of ℝd. Hence, lemma C.2 implies that

$f(\cdot / a_1'x, \ldots, a_j'x) = n(a_{j+1}'x, \ldots, a_d'x) = g(\cdot / a_1'x, \ldots, a_j'x)$
which is equivalent to Dϕ(g(j), f) = 0, since by induction $g^{(j)} = g\, \frac{f_{a_1}}{g_{a_1}^{(1-1)}} \frac{f_{a_2}}{g_{a_2}^{(2-1)}} \cdots \frac{f_{a_j}}{g_{a_j}^{(j-1)}}$.

The end of our algorithm implies that f coincides with g on the complement of the vector subspace generated by the family {ai}i=1,…,j. Therefore, the nullity of the ϕ-divergence provides us with information on the density structure.

In summary, the following proposition clarifies our choice of g which depends on the family of distribution one wants to find in f :

Proposition 3.1

With the above notations, Dϕ(g(j), f) = 0 is equivalent to

$g(\cdot / a_1'x, \ldots, a_j'x) = f(\cdot / a_1'x, \ldots, a_j'x)$

More generally, the above proposition defines the co-support of f as the vector space generated by the vectors a1, …, aj.

Definition 3.1

Let f be a density on ℝd. We define the co-vectors of f as the sequence of vectors a1, …, aj which solves the problem Dϕ(g(j), f) = 0 where g is an elliptical distribution with the same mean and variance as f. We define the co-support of f as the vector space generated by the vectors a1, …, aj.

Remark 3.1

Any (ai) family defining f as in (3.1) is an orthogonal basis of ℝd—see lemma C.3

3.2. Stochastic Outline of Our Algorithm

Let X1, X2, …, Xm (resp. Y1, Y2, …, Ym) be a sequence of m independent random vectors with the same density f (resp. g). As customary in nonparametric ϕ-divergence optimizations, all estimates of f and fa, as well as all uses of Monte Carlo methods, are performed using subsamples X1, X2, …, Xn and Y1, Y2, …, Yn—extracted respectively from X1, X2, …, Xm and Y1, Y2, …, Ym—such that the estimates are bounded below by some positive deterministic sequence θm—see Appendix D.

Let ℙn be the empirical measure based on the subsample X1, X2, …, Xn. Let fn (resp. fa,n, for any a in $\mathbb{R}_*^d$) be the kernel estimate of f (resp. fa), which is built from X1, X2, …, Xn (resp. a′X1, a′X2, …, a′Xn).

As defined in Section 2.2, we consider the following sequences (ak)k≥1 and (g(k))k≥1 such that

$a_k$ is a non-null vector of $\mathbb{R}_*^d$ defined by $a_k = \arg\min_{a \in \mathbb{R}_*^d} D_\phi\Big(g^{(k-1)} \frac{f_a}{g_a^{(k-1)}}, f\Big)$; $g^{(k)}$ is the density defined by $g^{(k)} = g^{(k-1)} \frac{f_{a_k}}{g_{a_k}^{(k-1)}}$, with $g^{(0)} = g$.

The stochastic setting up of the algorithm uses fn and $g_n^{(0)} = g$ instead of f and g(0) = g—since g is known. Thus, at the first step, we build the vector ǎ1 which minimizes the ϕ-divergence between fn and $g \frac{f_{a,n}}{g_a}$ and which estimates a1. First, since proposition D.1 and lemma C.4 show how the infimum of the criteria (or index)

$\check D_\phi\Big(g\frac{f_{a,n}}{g_a}, f_n\Big) = \frac{1}{n}\sum_{i=1}^{n} \varphi\left(\frac{g(X_i)\, f_{a,n}(a'X_i)}{f_n(X_i)\, g_a(a'X_i)}\right)$
is reached, we are then able to minimize the ϕ-divergence between $f_n$ and $g\frac{f_{a,n}}{g_a}$. Second, defining $\check a_1$ as the argument of this minimization, proposition 4.3 implies that this vector tends to $a_1$. Finally, we define the density $\check g_n^{(1)}$ as $\check g_n^{(1)} = g\frac{f_{\check a_1,n}}{g_{\check a_1}}$, which estimates $g^{(1)}$ through theorem 4.1.
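As an illustration, the plug-in index above can be computed directly from the two samples. The following Python sketch is ours (the names are not from the article): it evaluates the criterion with Gaussian kernel density estimates and minimizes it over unit directions for d = 2, using the χ2 generator, and it deliberately ignores both the sample truncation of Appendix D and the dual representation used later for the asymptotics.

```python
import numpy as np
from scipy.stats import gaussian_kde, multivariate_normal, norm

phi_chi2 = lambda x: 0.5 * (x - 1.0) ** 2   # chi-square generator

def chi2_index(a, X, g, g_marg):
    """Empirical index (1/n) * sum phi( g(X_i) f_{a,n}(a'X_i) / (f_n(X_i) g_a(a'X_i)) )."""
    proj = X @ a
    f_n = gaussian_kde(X.T)(X.T)        # kernel estimate of f at the sample points
    f_a_n = gaussian_kde(proj)(proj)    # kernel estimate of the projected margin f_a
    ratio = g(X) * f_a_n / (f_n * g_marg(a, proj))
    return np.mean(phi_chi2(ratio))

# Toy usage in dimension 2: Gaussian first coordinate, non-Gaussian second coordinate
rng = np.random.default_rng(0)
X = np.column_stack([rng.standard_normal(200), rng.exponential(1.0, 200)])
mean, cov = X.mean(axis=0), np.cov(X.T)
g = multivariate_normal(mean, cov).pdf                             # instrumental g: same mean/variance
g_marg = lambda a, t: norm(a @ mean, np.sqrt(a @ cov @ a)).pdf(t)  # its (Gaussian) projected margin g_a

angles = np.linspace(0.0, np.pi, 60, endpoint=False)
vals = [chi2_index(np.array([np.cos(t), np.sin(t)]), X, g, g_marg) for t in angles]
best = angles[int(np.argmin(vals))]
print("estimated first co-vector direction:", np.array([np.cos(best), np.sin(best)]))
```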

Now, from the second step onwards and as defined in Section 2.2, the density g(2−1) = g(1) is unknown. Consequently, once again, the samples have to be truncated.

All estimates of f and $f_a$ (resp. $g^{(1)}$ and $g_a^{(1)}$) are performed using a subsample $X_1, X_2, \ldots, X_n$ (resp. $Y_1^{(1)}, Y_2^{(1)}, \ldots, Y_n^{(1)}$) extracted from $X_1, X_2, \ldots, X_m$ (resp. $Y_1^{(1)}, Y_2^{(1)}, \ldots, Y_m^{(1)}$, which is a sequence of m independent random vectors with the same density $g^{(1)}$), such that the estimates are bounded below by some positive deterministic sequence $\theta_m$—see Appendix D.

Let ℙn be the empirical measure of the subsample $X_1, X_2, \ldots, X_n$. Let $f_n$ (resp. $g_n^{(1)}$, $f_{a,n}$, $g_{a,n}^{(1)}$, for any a in $\mathbb{R}_*^d$) be the kernel estimate of f (resp. $g^{(1)}$, $f_a$, $g_a^{(1)}$), built from $X_1, X_2, \ldots, X_n$ (resp. $Y_1^{(1)}, Y_2^{(1)}, \ldots, Y_n^{(1)}$, $a'X_1, a'X_2, \ldots, a'X_n$, $a'Y_1^{(1)}, a'Y_2^{(1)}, \ldots, a'Y_n^{(1)}$).

The stochastic setting up of the algorithm uses fn and $g_n^{(1)}$ instead of f and g(1). Thus, we build the vector ǎ2, which minimizes the ϕ-divergence between fn and $g_n^{(1)} \frac{f_{a,n}}{g_{a,n}^{(1)}}$—since $g^{(1)}$ and $g_a^{(1)}$ are unknown—and which estimates a2. First, since proposition D.1 and lemma C.4 show how the infimum of the criteria (or index)

$\check D_\phi\Big(g_n^{(1)}\frac{f_{a,n}}{g_{a,n}^{(1)}}, f_n\Big) = \frac{1}{n}\sum_{i=1}^{n} \varphi\left(\frac{g_n^{(1)}(X_i)\, f_{a,n}(a'X_i)}{f_n(X_i)\, g_{a,n}^{(1)}(a'X_i)}\right)$
is reached, we are then able to minimize the ϕ-divergence between $f_n$ and $g_n^{(1)}\frac{f_{a,n}}{g_{a,n}^{(1)}}$. Second, defining $\check a_2$ as the argument of this minimization, proposition 4.3 implies that this vector tends to $a_2$. Finally, we define the density $\check g_n^{(2)}$ as $\check g_n^{(2)} = g_n^{(1)}\frac{f_{\check a_2,n}}{g_{\check a_2,n}^{(1)}}$, which estimates $g^{(2)}$ through theorem 4.1.

Iterating this process, we end up obtaining a sequence $(\check a_1, \check a_2, \ldots)$ of vectors in $\mathbb{R}_*^d$ estimating the co-vectors of f, and a sequence of densities $(\check g_n^{(k)})_k$ such that $\check g_n^{(k)}$ estimates $g^{(k)}$ through theorem 4.1.
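For completeness, here is a minimal sketch of the density update $\check g_n^{(k)} = \check g_n^{(k-1)} f_{\check a_k,n}/[\check g^{(k-1)}]_{\check a_k,n}$, assuming—as in the text—that a sample drawn from $g^{(k-1)}$ is available. All function and variable names are ours, and drawing the sample required by the next iteration is not shown.

```python
import numpy as np
from scipy.stats import gaussian_kde

def update_density(g_prev, a_k, X, Y_prev):
    """Return g_new(x) = g_prev(x) * f_{a_k,n}(a_k'x) / [g_prev]_{a_k,n}(a_k'x),
    where f_{a_k,n} is a KDE of the projected X-sample (drawn from f) and
    [g_prev]_{a_k,n} a KDE of the projected Y_prev-sample (drawn from g^(k-1));
    g_prev is a callable density taking an (m, d) array."""
    f_ak = gaussian_kde(X @ a_k)       # estimate of the margin f_{a_k}
    g_ak = gaussian_kde(Y_prev @ a_k)  # estimate of the margin [g^(k-1)]_{a_k}
    def g_new(x):
        x = np.atleast_2d(x)
        t = x @ a_k
        return g_prev(x) * f_ak(t) / g_ak(t)
    return g_new
```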

Let us now summarize the main steps of the stochastic implementation of our algorithm (the dual representation of the estimators will be further detailed in Table 2 below).

4. Results

4.1. Hypotheses on f

In this paragraph, we define the set of hypotheses on f which could possibly be of use in our work. Discussion on several of these hypotheses can be found in Appendix E.

In the remaining of this section, for legibility reasons, we replace g with g(k−1). Let

$\Theta = \mathbb{R}_*^d$, $\quad \Theta_{D_\phi} = \Big\{ b \in \Theta \;\Big|\; \int \varphi^*\Big(\varphi'\Big(\tfrac{g(x)}{f(x)}\tfrac{f_b(b'x)}{g_b(b'x)}\Big)\Big) dP < \infty \Big\}$

$M(b, a, x) = \int \varphi'\Big(\tfrac{g(y)}{f(y)}\tfrac{f_b(b'y)}{g_b(b'y)}\Big)\, \tfrac{g(y)\, f_a(a'y)}{g_a(a'y)}\, dy \;-\; \varphi^*\Big(\varphi'\Big(\tfrac{g(x)}{f(x)}\tfrac{f_b(b'x)}{g_b(b'x)}\Big)\Big)$

$\mathbb{P}_n M(b, a) = \int M(b, a, x)\, d\mathbb{P}_n, \qquad P M(b, a) = \int M(b, a, x)\, dP$
where P is the probability measure presenting f as density.

Similarly as in chapter V of Van der Vaart [22], let us define :

  • (A1): For all ε > 0, there is η > 0 such that for all c ∈ ΘDϕ verifying ‖c − ak‖ ≥ ε, we have PM(c, a) − η > PM(ak, a), with a ∈ Θ.

  • (A2): There exist Z < 0 and n0 > 0 such that (n ≥ n0 ⇒ $\sup_{a\in\Theta}\sup_{c\in\{\Theta_{D_\phi}\}^c} \mathbb{P}_n M(c, a) < Z$).

  • (A3): There exists V, a neighbourhood of ak, and H, a positive function, such that, for all c ∈ V, we have |M(c, ak, x)| ≤ H(x) (P-a.s.) with PH < ∞.

  • (A4): There exists V, a neighbourhood of ak, such that for all ε, there exists an η such that for all c ∈ V and a ∈ Θ verifying ‖a − ak‖ ≥ ε, we have PM(c, ak) < PM(c, a) − η.

Putting $I_{a_k} = \frac{\partial^2}{\partial a^2} D_\phi\Big(g\frac{f_{a_k}}{g_{a_k}}, f\Big)$, let us now consider four new hypotheses:

  • (A5): $P\big\|\frac{\partial}{\partial b} M(a_k, a_k)\big\|^2$ and $P\big\|\frac{\partial}{\partial a} M(a_k, a_k)\big\|^2$ are finite, and the expressions $P\frac{\partial^2}{\partial b_i \partial b_j} M(a_k, a_k)$ and $I_{a_k}$ exist and are invertible.

  • (A6) : There exists k such that PM(ak, ak) = 0.

  • (A7) : (VarP(M(ak, ak)))1/2 exists and is invertible.

  • (A0) : f and g are assumed to be positive and bounded and such that K(g, f) ≥ ∫ |f(x) − g(x)|dx where K is the Kullback-Leibler divergence.

Estimation of the First Co-Vector of f

Let us consider the class of all positive functions r defined on ℝ such that $g(x)\, r(a'x)$ is a density on ℝd for every a belonging to $\mathbb{R}_*^d$. The following proposition shows that there exists a vector a such that $\frac{f_a}{g_a}$ minimizes Dϕ(gr, f) over this class:

Proposition 4.1

There exists a vector a belonging to $\mathbb{R}_*^d$ such that

$\arg\min_{r} D_\phi(g r, f) = \frac{f_a}{g_a} \quad \text{and} \quad r(a'x) = \frac{f_a(a'x)}{g_a(a'x)}$

Following Broniatowski [33], let us introduce the estimate of $D_\phi\big(g\frac{f_{a,n}}{g_a}, f_n\big)$ through $\check D_\phi\big(g\frac{f_{a,n}}{g_a}, f_n\big) = \sup_{b \in \Theta} \int M(b, a, x)\, d\mathbb{P}_n(x)$.

Proposition 4.2

Let ǎ be such that $\check a := \arg\inf_{a \in \mathbb{R}_*^d} \check D_\phi\big(g\frac{f_{a,n}}{g_a}, f_n\big)$.

Then, ǎ is a strongly convergent estimate of a, as defined in proposition 4.1.

Let us also introduce the following sequences $(\check a_k)_{k \ge 1}$ and $(\check g_n^{(k)})_{k \ge 1}$, for any given n—see Section 3.2—such that

$\check a_k$ is an estimate of $a_k$ as defined in proposition 4.2, with $\check g_n^{(k-1)}$ instead of g; $\check g_n^{(k)}$ is defined by $\check g_n^{(0)} = g$ and $\check g_n^{(k)}(x) = \check g_n^{(k-1)}(x)\, \frac{f_{\check a_k,n}(\check a_k'x)}{[\check g^{(k-1)}]_{\check a_k,n}(\check a_k'x)}$, i.e., $\check g_n^{(k)}(x) = g(x) \prod_{j=1}^{k} \frac{f_{\check a_j,n}(\check a_j'x)}{[\check g^{(j-1)}]_{\check a_j,n}(\check a_j'x)}$.

We also note that g ˇ n ( k ) is a density.

Convergence Study at the kth Step of the Algorithm:

In this paragraph, we show that the sequence (ǎk)n converges towards ak and that the sequence ( g ˇ n ( k ) ) n converges towards g(k).

Let $\check c_n(a) = \arg\sup_{c\in\Theta} \mathbb{P}_n M(c, a)$, with a ∈ Θ, and $\check\gamma_n = \arg\inf_{a\in\Theta} \sup_{c\in\Theta} \mathbb{P}_n M(c, a)$. We state

Proposition 4.3

Both supa∈Θčn(a) – ak‖ and γ̌n converge toward ak a.s.

Finally, the following theorem shows that $\check g_n^{(k)}$ converges almost everywhere towards g(k):

Theorem 4.1

It holds that $\check g_n^{(k)} \to g^{(k)}$ a.s., as n → ∞.

Testing of the Criteria

In this paragraph, through a test of our criteria, namely $a \mapsto D_\phi\big(\check g_n^{(k)}\frac{f_{a,n}}{[\check g^{(k)}]_{a,n}}, f_n\big)$, we build a stopping rule for this procedure. First, the next theorem enables us to derive the law of our criteria:

Theorem 4.2

For a fixed k, we have that $\sqrt{n}\,\big(\mathrm{Var}_P\big(M(\check c_n(\check\gamma_n), \check\gamma_n)\big)\big)^{-1/2}\big(\mathbb{P}_n M(\check c_n(\check\gamma_n), \check\gamma_n) - \mathbb{P}_n M(a_k, a_k)\big)$ converges in distribution to N(0, I), where k represents the kth step of our algorithm and where I is the identity matrix in ℝd.

Note that k is fixed in theorem 4.2 since $\check\gamma_n = \arg\inf_{a\in\Theta}\sup_{c\in\Theta} \mathbb{P}_n M(c, a)$, where M is a known function of k—see Section 4.1. Thus, in the case when $D_\phi\big(g^{(k-1)}\frac{f_{a_k}}{g_{a_k}^{(k-1)}}, f\big) = 0$, we obtain

Corollary 4.1

We have that $\sqrt{n}\,\big(\mathrm{Var}_P\big(M(\check c_n(\check\gamma_n), \check\gamma_n)\big)\big)^{-1/2}\,\mathbb{P}_n M(\check c_n(\check\gamma_n), \check\gamma_n)$ converges in distribution to N(0, I).

Hence, we propose the test of the null hypothesis

$(H_0): D_\phi\Big(g^{(k-1)}\frac{f_{a_k}}{g_{a_k}^{(k-1)}}, f\Big) = 0$ versus the alternative $(H_1): D_\phi\Big(g^{(k-1)}\frac{f_{a_k}}{g_{a_k}^{(k-1)}}, f\Big) \ne 0$.

Based on this result, we stop the algorithm; then, defining ak as the last vector generated, we derive from corollary 4.1 an α-level confidence ellipsoid around ak, namely $\mathcal{E}_k = \big\{ b \in \mathbb{R}^d ;\ \sqrt{n}\,\big(\mathrm{Var}_P(M(b, b))\big)^{-1/2}\,\mathbb{P}_n M(b, b) \le q_\alpha^{N(0,1)} \big\}$, where $q_\alpha^{N(0,1)}$ is the α-level quantile of the standard normal distribution and where ℙn is the empirical measure arising from a realization of the sequences (X1, …, Xn) and (Y1, …, Yn).
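In practice, checking membership of this ellipsoid reduces to comparing a normalized empirical mean with a Gaussian quantile. Here is a minimal sketch, assuming the values M(b, b, Xi) for the candidate vector b have already been computed into an array (hypothetical name M_values):

```python
import numpy as np
from scipy.stats import norm

def in_confidence_ellipsoid(M_values, alpha=0.9):
    """Check whether sqrt(n) * Var^(-1/2) * mean(M_values) <= q_alpha,
    i.e., whether the candidate vector lies in the confidence ellipsoid;
    True means we do not reject (H0) and the procedure stops."""
    n = len(M_values)
    stat = np.sqrt(n) * np.mean(M_values) / np.std(M_values, ddof=1)
    return stat <= norm.ppf(alpha)
```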

Consequently, the following corollary provides us with a confidence region for the above test:

Corollary 4.2

$\mathcal{E}_k$ is a confidence region for the test of the null hypothesis (H0) versus (H1).

5. Goodness-of-Fit Tests

5.1. The Basic Idea

Let f be a density defined on ℝ2. Let us also consider g, a known elliptical density with the same mean and variance as f. Let us also assume that the family (ai) is the canonical basis of ℝ2 and that Dϕ(g(2), f) = 0.

Hence, since lemma C.1 page 110 implies that $g_{a_j}^{(j-1)} = g_{a_j}$ if j ≤ d, we then have $g^{(2)}(x) = g(x)\frac{f_1}{g_1}\frac{f_2}{g_2^{(1)}} = g(x)\frac{f_1}{g_1}\frac{f_2}{g_2}$. Moreover, we get $g^{(2)} = f$, as derived from property B.1 page 110.

Consequently, $f = g(x)\frac{f_1}{g_1}\frac{f_2}{g_2}$, i.e., $\frac{f}{f_1 f_2} = \frac{g}{g_1 g_2}$, and then $\frac{\partial^2}{\partial x \partial y} C_f = \frac{\partial^2}{\partial x \partial y} C_g$, where Cf (resp. Cg) is the copula of f (resp. g), since the density of a copula evaluated at the margins is the joint density divided by the product of the marginal densities.

More generally, if f is defined on ℝd, then the family (ai) is once again free (see lemma C.5), i.e., the family (ai) is once again a basis of ℝd. The relationship Dϕ(g(d), f) = 0 therefore implies that g(d) = f, i.e., for any x ∈ ℝd, $f(x) = g^{(d)}(x) = g(x) \prod_{k=1}^{d} \frac{f_{a_k}(a_k'x)}{[g^{(k-1)}]_{a_k}(a_k'x)} = g(x) \prod_{k=1}^{d} \frac{f_{a_k}(a_k'x)}{g_{a_k}(a_k'x)}$, since lemma C.1 page 110 implies that $g_{a_k}^{(k-1)} = g_{a_k}$ if k ≤ d. In other words, for any x ∈ ℝd, it holds

$\frac{g(x)}{\prod_{k=1}^{d} g_{a_k}(a_k'x)} = \frac{f(x)}{\prod_{k=1}^{d} f_{a_k}(a_k'x)}$

Finally, putting A = (a1, …, ad) and defining the vector y (resp. density f̃, copula Cf̃ of f̃, density g̃, copula Cg̃ of g̃) as the expression of the vector x (resp. density f, copula Cf of f, density g, copula Cg of g) in basis A, the following proposition shows that the density associated with the copula of f equals the density associated with the copula of g in basis A:

Proposition 5.1

With the above notations, should there exist a sequence $(a_i)_{i=1,\ldots,d}$ of non-null vectors in $\mathbb{R}_*^d$ defining g(d) and such that Dϕ(g(d), f) = 0, then $\frac{\partial^d}{\partial y_1 \cdots \partial y_d} C_{\tilde f} = \frac{\partial^d}{\partial y_1 \cdots \partial y_d} C_{\tilde g}$.

5.2. With the Elliptical Copula

Let f be an unknown density defined on ℝd. The objective of the present section is to determine whether the copula of f is elliptical. We thus define an instrumental elliptical density g with the same mean and variance as f, and we follow the procedure of Section 3.2. As explained in Section 5.1, we infer from proposition 5.1 that the copula of f equals the copula of g when Dϕ(g(d), f) = 0, i.e., when ad is the last vector generated from the algorithm and when (ai) is the canonical basis of ℝd. Thus, in order to verify this assertion, corollary 4.1 page 96 provides us with a α-level confidence ellipsoid around this vector, namely

$\mathcal{E}_d = \big\{ b \in \mathbb{R}^d ;\ \sqrt{n}\,\big(\mathrm{Var}_P(M(b, b))\big)^{-1/2}\,\mathbb{P}_n M(b, b) \le q_\alpha^{N(0,1)} \big\}$
where $q_\alpha^{N(0,1)}$ is the α-level quantile of the standard normal distribution, where ℙn is the empirical measure arising from a realization of the sequences (X1, …, Xn) and (Y1, …, Yn)—see Appendix D—and where M is a known function of d, fn and $g_n^{(d-1)}$—see Section 4.1.

Consequently, keeping the notations introduced in Section 5.1, we perform a statistical test of the null hypothesis

$(H_0): \frac{\partial^d}{\partial x_1 \cdots \partial x_d} C_f = \frac{\partial^d}{\partial x_1 \cdots \partial x_d} C_g$ versus $(H_1): \frac{\partial^d}{\partial x_1 \cdots \partial x_d} C_f \ne \frac{\partial^d}{\partial x_1 \cdots \partial x_d} C_g$

Since, under (H0), we have Dϕ(g(d), f) = 0, then the following theorem provides us with a confidence region for this test.

Theorem 5.1

The set $\mathcal{E}_d$ is a confidence region for the test of the null hypothesis (H0) versus the alternative (H1).

Remark 5.1

1/If Dϕ(g(k), f) = 0, for k < d, then we reiterate the algorithm until g(d) is created in order to obtain a relationship for the copula of f.

2/If the ai do not constitute the canonical basis, then keeping the notations introduced in Section 5.1, our algorithm meets the test:

$(H_0): \frac{\partial^d}{\partial y_1 \cdots \partial y_d} C_{\tilde f} = \frac{\partial^d}{\partial y_1 \cdots \partial y_d} C_{\tilde g}$ versus $(H_1): \frac{\partial^d}{\partial y_1 \cdots \partial y_d} C_{\tilde f} \ne \frac{\partial^d}{\partial y_1 \cdots \partial y_d} C_{\tilde g}$
Thus, our method enables us to determine whether the copula of f equals the copula of g in the (a1, …, ad) basis.

5.3. With the Independent Copulas

Let f be a density on ℝd and let X be a random vector with f as density. The objective of this section is to determine whether f is the product of its margins, i.e., whether the copula of f is the independent copula. Let g be an instrumental product of univariate Gaussian densities, with diag(Var(X1), …, Var(Xd)) as covariance matrix and with the same mean as f. As explained in Section 5.2, we follow the procedure described in Section 3.2, i.e., proposition 5.1 implies that the copula of f is the independent copula when Dϕ(g(d), f) = 0; we then perform a statistical test of the null hypothesis:

$(H_0): f = \prod_{i=1}^{d} f_i$ versus the alternative $(H_1): f \ne \prod_{i=1}^{d} f_i$

Since, under (H0), we have Dϕ(g(d), f) = 0, the following theorem provides us with a confidence region for our test.

Theorem 5.2

Keeping the notations of Section 5.2, the set $\mathcal{E}_d$ is a confidence region for the test of the null hypothesis (H0) versus the alternative (H1).

Remark 5.2

(1) As explained in Section 5.2, if Dϕ(g(k), f) = 0, for k < d, we reiterate the algorithm until g(d) is created in order to derive a relationship for the copula of f.

(2) If the ai do not constitute the canonical basis, then keeping the notations of Section 5.1, our algorithm meets the test:

$(H_0): f = \prod_{i=1}^{d} f_{a_i}$ versus the alternative $(H_1): f \ne \prod_{i=1}^{d} f_{a_i}$

Thus, our method enables us to determine if the copula of f is the independent copula in the (a1, …, ad) basis.

5.4. Study of the Subsequence (g(k′)) Defined by Dϕ(g(k′), f) = 0 for Any k

Let Q be the set of non-negative integers defined by $Q = \{k_i ;\ k_1 = 1,\ k_q = d,\ k_i < k_{i+1}\}$, where q—such that q ≤ d—is its cardinal. In the present section, our goal is to study the subsequence (g(k′)) of the sequence (g(k))k=1,…,d defined by Dϕ(g(k′), f) = 0 for any k′ belonging to Q.

First, we have:

  • Dϕ(g(d), f) = 0 ⇔ g(d) = f, through property B.1

  • $\frac{g(x)}{\prod_{k=1}^{d} g_{a_k}(a_k'x)} = \frac{f(x)}{\prod_{k=1}^{d} f_{a_k}(a_k'x)}$, as explained in Section 5.2

  • $\frac{\tilde g(y)}{\prod_{k=1}^{d} \tilde g_k(y_k)} = \frac{\tilde f(y)}{\prod_{k=1}^{d} \tilde f_k(y_k)}$, which amounts to the previous relationship written in the A = (a1, …, ad) basis with the notations introduced in Section 5.2.

Moreover, defining $k_i$ as the integer of Q preceding $k_{i+1}$ in the space {1, …, d}, with i > 1, and as explained in Section 3.1, the relationship Dϕ(g(k′), f) = 0 implies that

$\tilde f(y_{k_i}, \ldots, y_{k_{i+1}} / y_1, \ldots, y_{k_i}, y_{k_{i+1}}, \ldots, y_d) = \tilde f_{i,i+1}(y_{k_i}, \ldots, y_{k_{i+1}})$

where $\tilde f_{i,i+1}$ is the density of the vector $(a_{k_i}'X, \ldots, a_{k_{i+1}}'X)$ in the A = (a1, …, ad) basis. Consequently, $\tilde f(y) = \tilde f_{1,2}(y_1, \ldots, y_{k_2})\, \tilde f_{2,3}(y_{k_2}, \ldots, y_{k_3}) \cdots \tilde f_{q-1,d}(y_{k_{q-1}}, \ldots, y_d)$.

Hence, we can infer that

$\frac{\tilde f(y)}{\prod_{k=1}^{d} \tilde f_k(y_k)} = \frac{\tilde f_{1,2}(y_1, \ldots, y_{k_2})}{\prod_{k=1}^{k_2} \tilde f_k(y_k)} \cdot \frac{\tilde f_{2,3}(y_{k_2}, \ldots, y_{k_3})}{\prod_{k=k_2}^{k_3} \tilde f_k(y_k)} \cdots \frac{\tilde f_{q-1,d}(y_{k_{q-1}}, \ldots, y_d)}{\prod_{k=k_{q-1}}^{d} \tilde f_k(y_k)}$

The following theorem explicitly describes the form of the f copula in the A = (a1, …, ad) basis:

Theorem 5.3

Defining $C_{\tilde f_{i,j}}$ as the copula of $\tilde f_{i,j}$ and keeping the notations introduced in Sections 5.1 and 5.4, it holds

$\frac{\partial^d}{\partial y_1 \cdots \partial y_d} C_{\tilde f} = \frac{\partial^{k_2}}{\partial y_1 \cdots \partial y_{k_2}} C_{\tilde f_{1,2}} \cdot \frac{\partial^{k_3 - k_2 + 1}}{\partial y_{k_2} \cdots \partial y_{k_3}} C_{\tilde f_{2,3}} \cdots \frac{\partial^{d - k_{q-1} + 1}}{\partial y_{k_{q-1}} \cdots \partial y_d} C_{\tilde f_{q-1,d}}$

Remark 5.3

If there exists i such that i < d and $k_{i+1} = k_i + 1$, then the notation $\tilde f_{i,i+1}(y_{k_i}, \ldots, y_{k_{i+1}})$ means $\tilde f_{k_i}(y_{k_i})$. Thus, if, for any k, we have Dϕ(g(k), f) = 0, then, for any i < d, we have $k_{i+1} = k_i + 1$, i.e., we have $\tilde f = \prod_{k=1}^{d} \tilde f_k(y_k)$, where $\tilde f_k$ is the kth marginal density of $\tilde f$.

At present, using relationship 5.2 and remark 5.3, the following corollary shows that the density of the copula of f is equal to 1 in the {a1, …, ad} basis when, for any k, Dϕ(g(k), f) = 0:

Corollary 5.1

In the case where, for any k, Dϕ(g(k), f) = 0, it holds:

$\frac{\partial^d}{\partial y_1 \cdots \partial y_d} C_{\tilde f} = 1$

6. Simulations

Let us examine three simulations and an application to real datasets. The first simulation studies the elliptical copula and the second studies the independent copula. In each simulation, our program will aim at creating a sequence of densities (g(j)), j = 1, …, d, such that g(0) = g, $g^{(j)} = g^{(j-1)} f_{a_j}/[g^{(j-1)}]_{a_j}$ and Dϕ(g(d), f) = 0, where Dϕ is a divergence—see Appendix B for its definition—and $a_j = \arg\inf_{b \in \mathbb{R}_*^d} D_\phi\big(g^{(j-1)} f_b/g_b^{(j-1)}, f\big)$, for all j = 1, …, d. We will therefore perform the tests introduced in theorems 5.1 and 5.2. Finally, the third simulation compares the optimisations obtained when we execute the process with, each time, a different ϕ-divergence.

Simulation 6.1

We are in dimension 2(=d), and we use the χ2 divergence to perform our optimisations. Let us consider a sample of 50(=n) values of a random variable X with density f defined by:

$f(x) = c_\rho\big(F_{\text{Gumbel}}(x_1), F_{\text{Exponential}}(x_2)\big) \cdot \text{Gumbel}(x_1) \cdot \text{Exponential}(x_2)$
where $c_\rho$ is the density of the Gaussian copula with correlation coefficient ρ = 0.5, and where the Gumbel distribution parameters are −1 and 1 and the exponential density parameter is 2.
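For reference, such a sample can be drawn by the usual inverse-transform construction for a Gaussian copula. The following Python sketch is ours; it assumes that the Gumbel parameters (−1, 1) are location and scale and that the exponential parameter 2 is a rate.

```python
import numpy as np
from scipy.stats import norm, gumbel_r, expon

def sample_f(n, rho=0.5, seed=0):
    """Draw n points from f(x) = c_rho(F_Gumbel(x1), F_Exp(x2)) * gumbel(x1) * exp(x2)."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    u = norm.cdf(z)                                   # uniforms coupled by the Gaussian copula
    x1 = gumbel_r(loc=-1.0, scale=1.0).ppf(u[:, 0])   # Gumbel margin (assumed location/scale)
    x2 = expon(scale=1.0 / 2.0).ppf(u[:, 1])          # exponential margin (assumed rate 2)
    return np.column_stack([x1, x2])

X = sample_f(50)
```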

Let us generate then a Gaussian random variable Y with a density—that we will name as g—presenting the same mean and variance as f.

We theoretically obtain k = 2 and (a1, a2) = ((1, 0), (0, 1)).

To get this result, we perform the following test:

$(H_0): (a_1, a_2) = ((1, 0), (0, 1))$ versus $(H_1): (a_1, a_2) \ne ((1, 0), (0, 1))$

Then, theorem 5.1 enables us to verify (H0) by the following 0.9(=α) level confidence ellipsoid

$\mathcal{E}_2 = \big\{ b \in \mathbb{R}^2 ;\ \big(\mathrm{Var}_P(M(b, b))\big)^{-1/2}\,\mathbb{P}_n M(b, b) \le q_\alpha^{N(0,1)}/\sqrt{n} = 0.2533/7.0710 = 0.03582 \big\}$

Results of this optimisation can be found in Table 3 and Figure 1.

Therefore, we can conclude that H0 is verified.

Simulation 6.2

We are in dimension 2(=d), and we use the χ2 divergence to perform our optimisations.

Let us consider a sample of 50(=n) values of a random variable X with a density law f defined by

$f(x) = \text{Gumbel}(x_1) \cdot \text{Exponential}(x_2)$
where the Gumbel distribution parameters are −1 and 1 and the exponential density parameter is 2.

Let g be an instrumental product of univariate Gaussian densities with diag(V ar(X1), …, V ar(Xd)) as covariance matrix and with the same mean as f.

We theoretically obtain k = 2 and (a1, a2) = ((1, 0), (0, 1)). To get this result, we perform the following test:

$(H_0): (a_1, a_2) = ((1, 0), (0, 1))$ versus $(H_1): (a_1, a_2) \ne ((1, 0), (0, 1))$

Then, theorem 5.2 enables us to verify (H0) by the following 0.9(=α) level confidence ellipsoid

$\mathcal{E}_2 = \big\{ b \in \mathbb{R}^2 ;\ \big(\mathrm{Var}_P(M(b, b))\big)^{-1/2}\,\mathbb{P}_n M(b, b) \le q_\alpha^{N(0,1)}/\sqrt{n} = 0.03582203 \big\}$

Results of this optimisation can be found in Table 4 and Figure 2.

Therefore, we can conclude that $f = \prod_{i=1}^{d} f_i$.

Simulation 6.3

(On the choice of a ϕ-divergence). In this paragraph, we perform our algorithm several times. We first use several ϕ-divergences (see Appendix B for their definitions and their notations). We then perform a sensitivity analysis by varying the number n of simulated variables. Finally we introduce outliers.

At present, we consider a sample of n values of a random variable X with a density f defined by f(x) = Laplace(x1) · Gumbel(x2),

where the Gumbel distribution parameters are (1, 2) and where the Laplace distribution parameters are 4 and 3. In theory, we get a1 = (0, 1) and a2 = (1, 0). Then, following the procedure of the first simulation, we get

n = 50 | Outliers = 0 | Time | Outliers = 2 | Time
Relative Entropy | (0.10, 0.83) (1.13, 0.11) | 30 mn | (0.1, 0.8) (0.80, 0.024) | 43 mn
χ2-divergence | (0, 0.8) (1.021, 0.09) | 22 mn | (0.12, 0.79) (0.867, −0.104) | 31 mn
Hellinger distance | (0.1, 0.9) (0.91, 0.15) | 35 mn | (0.1, 0.85) (0.81, 0.14) | 46 mn

n = 100 | Outliers = 0 | Time | Outliers = 5 | Time
Relative Entropy | (0.09, 0.89) (1.102, 0.089) | 50 mn | (0.1, 0.88) (1.15, 0.144) | 60 mn
χ2-divergence | (0, 0.9) (0.97, −0.1) | 43 mn | (−0.1, 0.9) (0.87, 0.201) | 52 mn
Hellinger distance | (0.1, 0.91) (0.93, −0.11) | 57 mn | (−0.05, 1.1) (0.79, 0.122) | 62 mn

n = 500 | Outliers = 0 | Time | Outliers = 25 | Time
Relative Entropy | (0, 1.07) (1.1, −0.05) | 107 mn | (0.13, 0.75) (0.79, 0.122) | 121 mn
χ2-divergence | (0, 0.95) (1.12, −0.02) | 91 mn | (0.15, 0.814) (0.922, 0.147) | 103 mn
Hellinger distance | (−0.01, 0.95) (1.01, −0.073) | 100 mn | (−0.17, 1.3) (0.973, 0.206) | 126 mn

Remark 6.1

  • We have worked with a computer presenting the following characteristics:

    - Processor: Mobile AMD 3000+,

    - RAM: 512 DDR,

    - Operating system: Windows XP.

  • Our method, which uses the χ2 as ϕ-divergence, is faster and its performance is as good as, if not better than, any other divergence method.

This results from the fact that the projection index (or criteria) of χ2 is a second degree polynomial. It is consequently easier and faster to assess. Moreover, these simulations illustrate the robustness of our method.

6.1. Application to Real Datasets

Let us, for instance, study the movements of the stock prices of Renault and Peugeot from January 4, 2010 to July 25, 2010. We thus gather 140(=n) observations from these stock prices; see Table 7 and Table 8 below.

Let us also consider X1 (resp. X2) the random variable defining the stock price of Renault (resp. Peugeot). We will assume—as it is commonly done in mathematical finance—that the stock market abides by the classical hypotheses of the Black-Scholes model—see Black and Scholes [34].

Consequently, X1 and X2 each present a log-normal distribution as probability distribution.

Let f be the density of the vector (ln(X1), ln(X2)); let us now apply our algorithm to f with the Kullback-Leibler divergence as ϕ-divergence. Let us then generate a Gaussian random variable Y with a density—which we will call g—having the same mean and variance as f.
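A minimal sketch of this preprocessing step (ours, shown with only the first few prices of Table 7; in practice all 140 observations are used):

```python
import numpy as np

# First closing prices of Table 7 (Renault, Peugeot)
renault = np.array([34.90, 34.26, 33.15, 32.69])
peugeot = np.array([24.20, 24.01, 23.30, 22.78])

# Observations of the vector (ln(X1), ln(X2)), whose density is f
log_prices = np.column_stack([np.log(renault), np.log(peugeot)])

# Instrumental Gaussian density g: same mean and variance as f
mean = log_prices.mean(axis=0)
cov = np.cov(log_prices.T)
Y = np.random.default_rng(0).multivariate_normal(mean, cov, size=len(log_prices))
```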

We first assume that there exists a vector a such that $D_\phi\big(g\frac{f_a}{g_a}, f\big) = 0$.

In order to verify this hypothesis, our reasoning will be the same as in Simulation 6.1. Indeed, we assume that this vector is a co-vector of f. Consequently, corollary 4.2 enables us to estimate a by the following 0.9(=α) level confidence ellipsoid

$\mathcal{E}_1 = \big\{ b \in \mathbb{R}^2 ;\ \big(\mathrm{Var}_P(M(b, b))\big)^{-1/2}\,\mathbb{P}_n M(b, b) \le q_\alpha^{N(0,1)}/\sqrt{n} = 0.2533/\sqrt{140} = 0.02140776 \big\}$.

Numerical results of the first projection are summarized in Table 5.

Therefore, our first hypothesis is confirmed.

However, our goal is to study the copula of (ln(X1), ln(X2)). Then, as explained in Section 5.4, we formulate another hypothesis assuming that there exists a vector a such that $D_\phi\big(g^{(1)}\frac{f_a}{g_a^{(1)}}, f\big) = 0$.

In order to verify this hypothesis, we use the same reasoning as above. Indeed, we assume that this vector is a co-vector of f. Consequently, corollary 4.2 enables us to estimate a by the following 0.9(=α) level confidence ellipsoid $\mathcal{E}_2 = \big\{ b \in \mathbb{R}^2 ;\ \big(\mathrm{Var}_P(M(b, b))\big)^{-1/2}\,\mathbb{P}_n M(b, b) \le q_\alpha^{N(0,1)}/\sqrt{n} = 0.2533/\sqrt{140} = 0.02140776 \big\}$. Numerical results of the second projection are summarized in Table 6.

Therefore, our second hypothesis is confirmed.

In conclusion, as explained in corollary 5.1, the density of the copula of f is equal to 1 in the {a1, a2} basis, i.e., in this basis the copula of f is the independent copula.

This result has been illustrated at Figures 3, 4 and 5.

6.2. Critique of the Simulations

In the case where f is unknown, we can never be sure of having reached the minimum of the ϕ-divergence: the simulated annealing method has been used to solve our optimisation problem, and therefore it is only when the number of random jumps tends in theory towards infinity that the probability of reaching the minimum tends to 1. We also note that no theory on the optimal number of jumps to implement exists, as this number depends on the specificities of each particular problem.

Moreover, we choose $50^{-\frac{4}{4+d}}$ for the AMISE of the two simulations. This choice leads us to simulate 50 random variables—see Scott [23] page 151—none of which have been discarded to obtain the truncated sample.

This has also been the case in our application to real datasets.

Finally, the shape of the copula in the case of real datasets in the {a1, a2} basis is also noteworthy.

Figure 4 shows that the curve reaches a quite wide plateau around 1, whereas Figure 5 shows that this plateau prevails on almost the entire [0, 1]2 set. We can therefore conclude that the theoretical analysis is indeed confirmed by the above simulation.

6.3. Conclusions

Projection pursuit is useful in evidencing characteristic structures as well as one-dimensional projections and their associated distribution in multivariate data. This article clearly demonstrates the efficiency of the φ-projection pursuit methodology for goodness-of-fit tests for copulas. Indeed, the robustness as well as the convergence results that we achieved convincingly fulfilled our expectations regarding the methodology used.

Figure 1. Graph of the estimate of (x1, x2) ↦ cρ(FGumbel(x1), FExponential(x2)).
Figure 2. Graph of the independent copula estimate.
Figure 3. Graph of the copula of (ln(X1), ln(X2)) in the canonical basis.
Figure 4. Graph of the copula of (ln(X1), ln(X2)) in the {a1, a2} basis.
Figure 5. Graph of the copula of (ln(X1), ln(X2)) in the {a1, a2} basis—other view.
Table 1. Proposal.

0. We define g, a density with the same mean and variance as f, and we set g(0) = g.

i − 1. We perform the goodness-of-fit test Dϕ(g(i−1), f) = 0:
• Should this test be passed, we derive f from $f = g \prod_{i=1}^{j} \frac{f_{a_i}}{g_{a_i}^{(i-1)}}$ and the algorithm stops.
• Should this test not be passed, and should we look to approximate f when we reach the dth iteration of the algorithm, we derive f from $f = g \prod_{i=1}^{d} \frac{f_{a_i}}{g_{a_i}^{(i-1)}}$.
Otherwise, let us define a vector ai and a density g(i) by $a_i = \arg\inf_{a \in \mathbb{R}_*^d} D_\phi\big(g^{(i-1)}\frac{f_a}{g_a^{(i-1)}}, f\big)$ and $g^{(i)} = g^{(i-1)}\frac{f_{a_i}}{g_{a_i}^{(i-1)}}$.

i. Then we replace g(i−1) with g(i) and go back to step i − 1.

Table 2. Stochastic outline of the algorithm.

0. We define g, a density with the same mean and variance as f, and we set $\check g_n^{(0)} = g$.

i − 1. Given $\check g_n^{(i-1)}$, find $\check a_i$ such that the index is minimized, where $f_{a,n}$ is a marginal density estimate based on $a'X_1, a'X_2, \ldots, a'X_n$, and where $[\check g^{(i-1)}]_{a,n}$ is a density estimate based on the projection onto a of a Monte Carlo random sample from $\check g_n^{(i-1)}$. We then set $\check g_n^{(i)} = \check g_n^{(i-1)} \frac{f_{\check a_i,n}}{[\check g^{(i-1)}]_{\check a_i,n}}$.

i. Then we replace $\check g_n^{(i-1)}$ with $\check g_n^{(i)}$ and go back to step i − 1, until the criteria reaches the stopping rule of this procedure (see below).
Table 3. Simulation 1: Numerical results of the optimisation.

Our Algorithm
Projection Study 0: minimum: 0.445199, at point: (1.0171, 0.0055), P-Value: 0.94579
Test: H1: a1 ∈ ℰ1: True
Projection Study 1: minimum: 0.009628, at point: (0.0048, 0.9197), P-Value: 0.99801
Test: H0: a2 ∈ ℰ2: True
χ2(Kernel Estimation of g(2), g(2)): 3.57809

Table 4. Simulation 2: Numerical results of the optimisation.

Our Algorithm
Projection Study 0: minimum: 0.057833, at point: (0.9890, 0.1009), P-Value: 0.955651
Test: H1: a1 ∈ ℰ1: True
Projection Study 1: minimum: 0.02611, at point: (−0.1105, 0.9290), P-Value: 0.921101
Test: H0: a2 ∈ ℰ2: True
χ2(Kernel Estimation of g(2), g(2)): 1.25945

Table 5. Numerical results: First projection.

Our Algorithm
Projection Study 0: minimum: 0.02087685, at point: a1 = (19.1, −12.3), P-Value: 0.748765
Test: H0: a1 ∈ ℰ1: True
K(Kernel Estimation of g(1), g(1)): 4.3428735

Table 6. Numerical results: Second projection.

Our Algorithm
Projection Study 1: minimum: 0.0198753, at point: a2 = (8.1, 3.9), P-Value: 0.8743401
Test: H0: a2 ∈ ℰ2: True
K(Kernel Estimation of g(2), g(2)): 4.38475324
Table 7. Stock prices of Renault and Peugeot.
DateRenaultPeugeotDateRenaultPeugeotDateRenaultPeugeot
23/07/1034.924.222/07/1034.2624.0121/07/1033.1523.3
20/07/1032.6922.7819/07/1033.2423.3616/07/1033.9223.77
15/07/1034.4423.7114/07/1035.0824.3613/07/1035.2824.37
12/07/1033.8423.1609/07/1033.4623.1308/07/1033.0822.65
07/07/1032.1522.1906/07/1031.1221.5605/07/1030.0220.81
02/07/1030.1720.8501/07/1029.5620.0530/06/1030.7821.07
29/06/1030.5520.9728/06/1032.3422.325/06/1031.3521.68
24/06/1032.2922.2523/06/1033.5822.4722/06/1033.8422.77
21/06/1034.0623.2518/06/1032.8922.717/06/1032.0822.31
16/06/1031.8721.9215/06/1032.0322.1214/06/1031.4522.2
11/06/1030.6221.4210/06/1030.4220.9309/06/1029.2720.34
08/06/1028.4819.7307/06/1028.9220.1504/06/1029.1920.27
03/06/1030.3520.4602/06/1029.3319.5301/06/1028.8719.45
31/05/1029.3919.5428/05/1029.1619.5527/05/1029.1819.81
26/05/1027.518.525/05/1026.7618.0824/05/1028.7518.81
21/05/1028.7818.8220/05/1028.5318.8419/05/1029.4919.25
18/05/1030.9519.7617/05/1030.9219.3514/05/1031.3519.34
13/05/1033.6520.7612/05/1033.6320.5211/05/1033.3820.34
10/05/1033.2820.307/05/103119.2406/05/1032.420.22
05/05/1032.9520.4504/05/1033.321.0303/05/1035.5822.63
30/04/1035.4122.4529/04/1035.5322.3628/04/1034.7522.33
Table 8. Stock prices of Renault and Peugeot.
DateRenaultPeugeotDateRenaultPeugeotDateRenaultPeugeot
27/04/1036.222.926/04/1037.6523.7323/04/1036.7223.5
22/04/1034.3622.7221/04/1035.0122.8620/04/1035.6222.88
19/04/1034.0821.7716/04/1034.4621.7115/04/1035.1622.22
14/04/1035.122.2213/04/1035.2822.4512/04/1035.1721.85
09/04/1035.7621.908/04/1035.6721.6707/04/1036.521.89
06/04/1036.872201/04/1035.521.9731/03/1034.721.8
30/03/1034.822.2429/03/1035.722.7326/03/1035.5422.58
25/03/1035.5322.7324/03/1033.821.8223/03/1034.121.58
22/03/1033.7321.6419/03/1034.1221.6818/03/1034.4421.75
17/03/1034.6821.9816/03/1034.3321.8815/03/1033.5721.53
12/03/1033.921.8611/03/1033.2721.5810/03/1033.1221.47
09/03/1032.6921.5408/03/1032.9921.6605/03/1032.8921.85
04/03/1031.6421.2603/03/1031.6520.702/03/1031.0520.2
01/03/1030.2619.5426/02/1030.219.3925/02/1029.4218.98
24/02/1030.919.4923/02/1030.5419.7422/02/1031.8920.06
19/02/1032.2920.6718/02/1032.2620.4117/02/1031.6920.31
16/02/1031.0819.815/02/1030.2519.6612/02/1029.5619.57
11/02/103120.410/02/1032.7821.2109/02/1033.3122.31
08/02/1032.6321.9505/02/1032.1522.3304/02/1033.7222.86
03/02/1035.3223.9302/02/1035.2923.801/02/1035.3124.05
29/01/1034.2623.6428/01/1033.9423.3127/01/1033.8523.88
26/01/1034.9724.8625/01/1035.0624.3522/01/1035.724.95
21/01/1036.12520/01/1036.9225.3519/01/1038.425.81
18/01/1039.2825.9515/01/1038.625.714/01/1039.5626.67
13/01/1039.4926.1312/01/1038.3625.9811/01/1039.2126.65
08/01/1039.3826.507/01/1039.6926.706/01/1039.2526.32
05/01/1038.3124.7404/01/1038.224.52

Appendix

All the proofs of this article have been gathered in the Technical Report [24].

A. On the Different Families of Copula

There exist many copula families. Let us present here the most important among them.

A.1. Elliptical Copulas

The Gaussian Copula

The Gaussian copula can be used in several fields. For example, many credit models are built from this copula, which also has the property of making extreme values (minimal or maximal) independent in the limit; see Joe [2] for more details. In ℝ2, for example, it is derived from the bivariate normal distribution and from Sklar's theorem. Defining Ψρ as the standard bivariate normal cumulative distribution function with correlation ρ, the Gaussian copula function is Cρ(u, v) = Ψρ(Ψ−1(u), Ψ−1(v)), where u, v ∈ [0, 1] and where Ψ is the standard normal cumulative distribution function. Then, the copula density function is:

$c_\rho(u, v) = \frac{\psi_{X,Y,\rho}\big(\Psi^{-1}(u), \Psi^{-1}(v)\big)}{\psi\big(\Psi^{-1}(u)\big)\,\psi\big(\Psi^{-1}(v)\big)}$
where $\psi_{X,Y,\rho}(x, y) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\big[x^2 + y^2 - 2\rho x y\big]\right)$ is the density function of the standard bivariate Gaussian with Pearson product-moment correlation coefficient ρ, and where ψ is the standard normal density. This definition can obviously be extended to ℝd.
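For illustration, this density formula translates directly into the following Python sketch (ours, not part of the article):

```python
import numpy as np
from scipy.stats import norm

def gaussian_copula_density(u, v, rho):
    """c_rho(u, v) = psi_{X,Y,rho}(Psi^{-1}(u), Psi^{-1}(v)) / (psi(Psi^{-1}(u)) * psi(Psi^{-1}(v)))."""
    x, y = norm.ppf(u), norm.ppf(v)
    joint = np.exp(-(x**2 + y**2 - 2.0 * rho * x * y) / (2.0 * (1.0 - rho**2)))
    joint /= 2.0 * np.pi * np.sqrt(1.0 - rho**2)       # standard bivariate Gaussian density
    return joint / (norm.pdf(x) * norm.pdf(y))

print(gaussian_copula_density(0.3, 0.7, rho=0.5))
```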

The Elliptical Copula

Let us begin with defining the class of elliptical distributions and its properties—see also Cambanis [17], Landsman [18]:

Definition A.1

X is said to follow a multivariate elliptical distribution, denoted X ∼ Ed(μ, Σ, ξd), if X has the following density, for any x in ℝd:

$f_X(x) = \frac{\alpha_d}{|\Sigma|^{1/2}}\, \xi_d\Big(\frac{1}{2}(x - \mu)' \Sigma^{-1} (x - \mu)\Big)$
where Σ is a d × d positive-definite matrix and where μ is a d-column vector,

where ξd is referred to as the "density generator",

where αd is a normalisation constant, such that $\alpha_d = \frac{\Gamma(d/2)}{(2\pi)^{d/2}} \Big(\int_0^\infty x^{d/2 - 1} \xi_d(x)\, dx\Big)^{-1}$,

with $\int_0^\infty x^{d/2-1}\xi_d(x)\,dx < \infty$.

Property A.1

(1) For any X ∼ Ed(μ, Σ, ξd), for any m × d matrix A with rank m ≤ d, and for any m-dimensional vector b, we have AX + b ∼ Em(Aμ + b, AΣA′, ξm).

Therefore, any marginal density of a multivariate elliptical distribution is elliptical, i.e., $X = (X_1, X_2, \ldots, X_d) \sim E_d(\mu, \Sigma, \xi_d)$ implies $X_i \sim E_1(\mu_i, \sigma_i^2, \xi_1)$, $1 \le i \le d$, with $f_{X_i}(x) = \frac{\alpha_1}{\sigma_i}\, \xi_1\Big(\frac{1}{2}\big(\frac{x - \mu_i}{\sigma_i}\big)^2\Big)$. (2) Corollary 5 of Cambanis [17] states that conditional densities of elliptical distributions are also elliptical. Indeed, if X = (X1, X2)′ ∼ Ed(μ, Σ, ξd), with X1 (resp. X2) of size d1 < d (resp. d2 < d), then X1/(X2 = a) ∼ Ed1(μ′, Σ′, ξd1) with $\mu' = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(a - \mu_2)$ and $\Sigma' = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$, with μ = (μ1, μ2) and Σ = (Σij)1≤i,j≤2.

Remark A.1

Landsman [18] shows that multivariate Gaussian distributions derive from $\xi_d(x) = e^{-x}$ and that if X = (X1, …, Xd) has an elliptical density such that its marginals verify E(Xi) < ∞ and $E(X_i^2) < \infty$ for 1 ≤ i ≤ d, then μ is the mean of X and Σ is a multiple of the covariance matrix of X. Consequently, from now on, we will assume this is indeed the case.

Definition A.2

Let t be an elliptical density on ℝk and let q be an elliptical density on ℝk′. The elliptical densities t and q are said to belong to the same family of elliptical densities if their respective generating densities ξk and ξk′ belong to a common given family of densities.

Example A.1

Consider two Gaussian densities 𝒩(0, 1) and 𝒩((0, 0), Id2). They are said to belong to the same elliptical family as they both present x ↦ e−x as generating density.

Finally, let us introduce the definition of an elliptical copula which generalizes the above overview of the Gaussian copula:

Definition A.3

Elliptical copulas are the copulas of elliptical distributions.

A.2. Archimedean Copulas

These copulas exhibit a simple form as well as properties such as associativity. They also present a variety of dependence structures. They can generally be defined under the following form

$A(u_1, u_2, \ldots, u_n) = \xi^{-1}\Big(\sum_{i=1}^{n} \xi\big(F_i(u_i)\big)\Big)$
where (u1, u2, …, un) ∈ [0, 1]n and where ξ is known as a "generator function". This ξ function must be at least d − 2 times continuously differentiable, must have a decreasing and convex (d − 2)th derivative, and must be such that ξ(1) = 0.

Let us now present several examples:

  • Clayton copula:

    The Clayton copula is an asymmetric Archimedean copula, displaying greater dependency in the negative tail than in the positive tail. Let us define X (resp. Y) as the random variable having F (resp. G) as cumulative distribution function (CDF). Assuming that the vector (X, Y) has a Clayton copula, then this copula is given by:

    $A(x, y) = \big(F(x)^{-\theta} + G(y)^{-\theta} - 1\big)^{-1/\theta}$

    And its generator is:

    $\xi(x) = x^{-\theta} - 1$

    For θ = 0, the random variables are independent.

  • Gumbel copula:

    The Gumbel copula (Gumbel-Hougard copula) is an asymmetric Archimedean copula, presenting greater dependency in the positive tail than in the negative tail. Its generator is:

    $\xi(x) = (-\ln(x))^{\alpha}$

  • Frank copula:

    The Frank copula is a symmetric Archimedean copula, whose generator is:

    $\xi(x) = -\ln\left(\frac{e^{-\alpha x} - 1}{e^{-\alpha} - 1}\right)$
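As an illustration of this family, the following Python sketch (ours) evaluates an Archimedean copula from a generator and its inverse, instantiated with the Clayton generator ξ(x) = x^(−θ) − 1 given above and uniform margins:

```python
import numpy as np

def archimedean_copula(u, xi, xi_inv):
    """C(u_1,...,u_n) = xi^{-1}( sum_i xi(u_i) ) for a generator xi (uniform margins)."""
    u = np.asarray(u, dtype=float)
    return xi_inv(np.sum(xi(u), axis=-1))

# Clayton generator xi(x) = x^(-theta) - 1 and its inverse
theta = 2.0
xi     = lambda x: x ** (-theta) - 1.0
xi_inv = lambda t: (1.0 + t) ** (-1.0 / theta)

print(archimedean_copula([0.4, 0.7], xi, xi_inv))   # Clayton copula value at (0.4, 0.7)
```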

A.3. Periodic Copula

In 2005, Alfonsi and Brigo [25] derived a new way of generating copulas based on periodic functions. Defining h (resp. H) as a 1-periodic non-negative function that integrates to 1 over [0, 1] (resp. as a double primitive of h), then both

$H(u + v) - H(u) - H(v) \quad \text{and} \quad -H(u - v) + H(u) + H(v)$
are copula functions, the second one not necessarily being exchangeable.

B. ϕ-Divergence

Let us call ha the density of a′Z if h is the density of Z. Let φ be a strictly convex function $\varphi : \overline{\mathbb{R}^{+}} \to \overline{\mathbb{R}^{+}}$ such that φ(1) = 0.

Definition B.1

We define a ϕ-divergence of P from Q, where P and Q are two probability distributions over a space Ω such that Q is absolutely continuous with respect to P, by

$D_\phi(Q, P) = \int \varphi\left(\frac{dQ}{dP}\right) dP$
The above expression (B.1) is also valid if P and Q are both dominated by the same probability.

The most used distances (Kullback, Hellinger or χ2) belong to the Cressie-Read family (see Cressie-Read [26], Csiszár I. [27] and the books of Friedrich and Igor [28], Pardo Leandro [29] and Zografos K. [30]). They are defined by a specific φ. Indeed,

- with the Kullback-Leibler divergence, we associate φ(x) = K(x) = x ln(x) − x + 1;

- with the Hellinger distance, we associate $\varphi(x) = H(x) = 2(\sqrt{x} - 1)^2$;

- with the χ2 distance, we associate $\varphi(x) = \chi^2(x) = \frac{1}{2}(x - 1)^2$;

- more generally, with power divergences, we associate $\varphi(x) = \frac{x^\gamma - \gamma x + \gamma - 1}{\gamma(\gamma - 1)}$, where γ ∈ ℝ \ {0, 1};

- and, finally, with the L1 norm, which is also a divergence, we associate φ(x) = |x − 1|.

Let us now expose some well-known properties of divergences.

Property B.1

We have Dϕ(P, Q) = 0 ⇔ P = Q.

Property B.2

The divergence function Q ↦ Dϕ(Q, P) is convex and lower semi-continuous for the topology that makes all the mappings of the form Q ↦ ∫ f dQ continuous (where f is bounded and continuous), and lower semi-continuous for the topology of uniform convergence.

Finally, we will also use the following property derived from the first part of corollary (1.29) page 19 of Friedrich and Igor [28],

Property B.3

If T : (X, A) → (Y, B) is measurable and if Dϕ(P, Q) < ∞, then Dϕ(P, Q) ≥ Dϕ(PT−1, QT−1), with equality being reached when T is sufficient for (P, Q).

C. Miscellaneous

Lemma C.1

For any p ≤ d, we have $g_{a_p}^{(p-1)} = g_{a_p}$.

Lemma C.2

We have $g(\cdot / a_1'x, \ldots, a_j'x) = n(a_{j+1}'x, \ldots, a_d'x) = f(\cdot / a_1'x, \ldots, a_j'x)$.

Lemma C.3

Should there exist a family (ai)i=1,…,d such that $f(x) = n(a_{j+1}'x, \ldots, a_d'x)\, h(a_1'x, \ldots, a_j'x)$, with j < d and with f, n and h being densities, then this family is an orthogonal basis of ℝd.

Lemma C.4

$\inf_{a \in \mathbb{R}_*^d} D_\phi\big(g\frac{f_a}{g_a}, f\big)$ is reached when the ϕ-divergence is greater than the L1 distance as well as the L2 distance.

Lemma C.5

Whenever there exists p, pd, such that Dϕ(g(p), f) = 0, then the family of (ai)i=1,…,p is free and is orthogonal.

Lemma C.6

For any continuous density f, we have $y_m = |f_m(x) - f(x)| = O_P\big(m^{-\frac{2}{4+d}}\big)$.

D. Study of the Sample

Let X1, X2, …, Xm be a sequence of independent random vectors with the same density f. Let Y1, Y2, …, Ym be a sequence of independent random vectors with the same density g. Then, the kernel estimators fm, gm, fa,m and ga,m of f, g, fa and ga, for all a in $\mathbb{R}_*^d$, converge almost surely and uniformly, since we assume that the bandwidth hm of these estimators meets the following conditions (see Bosq [32]):

$(yp):\quad h_m \to 0,\quad m h_m \to \infty,\quad m h_m / L(h_m^{-1}) \to \infty \quad\text{and}\quad L(h_m^{-1}) / LLm \to \infty \quad (m \to \infty),$
with L(u) = ln(u ∨ e).

Let us consider

$B_1(n, a) = \frac{1}{n}\sum_{i=1}^{n} \varphi'\Big\{\frac{f_{a,n}(a'Y_i)}{g_{a,n}(a'Y_i)}\,\frac{g_n(Y_i)}{f_n(Y_i)}\Big\}\frac{f_{a,n}(a'Y_i)}{g_{a,n}(a'Y_i)} \quad\text{and}\quad B_2(n, a) = \frac{1}{n}\sum_{i=1}^{n} \varphi^*\Big\{\varphi'\Big\{\frac{f_{a,n}(a'X_i)}{g_{a,n}(a'X_i)}\,\frac{g_n(X_i)}{f_n(X_i)}\Big\}\Big\}$

Our objective is to estimate the minimum of $D_\phi\big(g\frac{f_a}{g_a}, f\big)$. To achieve this, samples have to be truncated:

Let us now consider a positive sequence $\theta_m$ such that $\theta_m \to 0$ and $y_m/\theta_m^2 \to 0$, where $y_m$ is the almost sure convergence rate of the kernel density estimator—$y_m = O_P(m^{-\frac{2}{4+d}})$, see lemma C.6—$y_m^{(1)}/\theta_m^2 \to 0$, where $y_m^{(1)}$ is defined by

$\left|\varphi\Big(\frac{g_m(x)}{f_m(x)}\frac{f_{b,m}(b'x)}{g_{b,m}(b'x)}\Big) - \varphi\Big(\frac{g(x)}{f(x)}\frac{f_b(b'x)}{g_b(b'x)}\Big)\right| \le y_m^{(1)}$

for all b in $\mathbb{R}_*^d$ and all x in ℝd, and finally $y_m^{(2)}/\theta_m^2 \to 0$, where $y_m^{(2)}$ is defined by

$\left|\varphi'\Big(\frac{g_m(x)}{f_m(x)}\frac{f_{b,m}(b'x)}{g_{b,m}(b'x)}\Big) - \varphi'\Big(\frac{g(x)}{f(x)}\frac{f_b(b'x)}{g_b(b'x)}\Big)\right| \le y_m^{(2)}$

for all b in $\mathbb{R}_*^d$ and all x in ℝd.

We then generate fm, gm and gb,m from the starting sample and we select the Xi and Yi vectors such that fm(Xi) ≥ θm and $g_{b,m}(b'Y_i) \ge \theta_m$, for all i and for all b in $\mathbb{R}_*^d$.

The vectors meeting these conditions will be called X1, X2, …, Xn and Y1, Y2, …, Yn.
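A minimal sketch of this truncation step (ours), where a finite grid of directions stands in for all b in $\mathbb{R}_*^d$ and theta_m is the chosen threshold:

```python
import numpy as np
from scipy.stats import gaussian_kde

def truncate_samples(X, Y, theta_m, directions):
    """Keep the X_i with f_m(X_i) >= theta_m and the Y_i with g_{b,m}(b'Y_i) >= theta_m
    for every direction b in `directions` (a finite grid standing in for all of R^d_*)."""
    f_m = gaussian_kde(X.T)
    keep_X = f_m(X.T) >= theta_m
    keep_Y = np.ones(len(Y), dtype=bool)
    for b in directions:
        g_b_m = gaussian_kde(Y @ b)          # kernel estimate of the projected density g_b
        keep_Y &= g_b_m(Y @ b) >= theta_m
    return X[keep_X], Y[keep_Y]
```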

Consequently, the next proposition provides us with the condition required to derive our estimates:

Proposition D.1

Using the notations introduced in Broniatowski [33] and in Section 4.1, it holds $\lim_{n\to\infty} \sup_{a \in \mathbb{R}_*^d} \big|\big(B_1(n, a) - B_2(n, a)\big) - D_\phi\big(g\frac{f_a}{g_a}, f\big)\big| = 0$.

Remark D.1

With the Kullback-Leibler divergence, we can take for θm the expression $m^{-\nu}$, with $0 < \nu < \frac{1}{4+d}$.

E. Hypotheses' Discussion

Not all hypotheses will be used simultaneously.

Hypotheses (A1) and (A4) lead us to assume that we deal with a saddle point: being used to prove the convergence of čn(a) and γ̌n towards ak, they make it easier to use the dual form of the divergence. Moreover, since our criteria $a \mapsto D_\phi\big(g\frac{f_a}{g_a}, f\big)$ is differentiable on $\mathbb{R}_*^d$ and continuously differentiable on ℝd, these hypotheses can easily be obtained. However, if other discontinuities, for which the criteria cannot be extended by continuity, do exist, then the above hypotheses would be very difficult to verify even in very favorable cases.

As shown by the below subsection for relative entropy, hypothesis (A2) generally holds.

Hypotheses (A5) and (A7) are classical hypotheses from which a limit distribution for the criteria can be derived. Yet these hypotheses are difficult to obtain when the criteria $a \mapsto D_\phi\big(g\frac{f_a}{g_a}, f\big)$ admits discontinuities—close to the co-vectors of f—at which it cannot be continuously differentiable.

Hypothesis (A6) thus enables us to create a stopping rule for the process, since this hypothesis is equivalent to the nullity of the mapping $a \mapsto D_\phi\big(g\frac{f_a}{g_a}, f\big)$ at ak.

Hypothesis (A0) constitutes an alternative to the starting hypothesis according to which the divergence should be greater than the $L^1$ distance. Although weaker, this hypothesis still requires that, for all $i$, we have $K(g^{(i)}, f) \ge \int |f(x) - g^{(i)}(x)|\,dx$ at each iteration of the algorithm.

E.1. Discussion of (A2)

Let us work with the Kullback-Leibler divergence and with g and a1.

For all $b \in \mathbb{R}^d$, we have $\int \varphi^*\Big(\varphi'\Big(\frac{g(x)\,f_b(b^\top x)}{f(x)\,g_b(b^\top x)}\Big)\Big) f(x)\,dx = \int \Big(\frac{g(x)\,f_b(b^\top x)}{f(x)\,g_b(b^\top x)} - 1\Big) f(x)\,dx = 0$, since, for any $b$ in $\mathbb{R}^d$, the function $x \mapsto g(x)\,\frac{f_b(b^\top x)}{g_b(b^\top x)}$ is a density. The complement of $\Theta_{D_\phi}$ in $\mathbb{R}^d$ is therefore empty, and the supremum looked for, taken in $\overline{\mathbb{R}}$, is $-\infty$. We can therefore conclude. It is interesting to note that we obtain the same verification with $f$, $g^{(k-1)}$ and $a_k$.
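
The step from $\varphi^*(\varphi'(\cdot))$ to the factor $(\cdot - 1)$ follows from the usual convention $\varphi(x) = x\ln x - x + 1$ for the Kullback-Leibler divergence: $\varphi'(u) = \ln u$, hence $\varphi^*(\varphi'(u)) = u\,\varphi'(u) - \varphi(u) = u - 1$; the resulting integral then splits as $\int g(x)\,\frac{f_b(b^\top x)}{g_b(b^\top x)}\,dx - \int f(x)\,dx = 1 - 1 = 0$.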

E.2. Discussion of (A3)

This hypothesis consists in the following assumptions:

(0) We work with the Kullback-Leibler divergence,

(1) We have $f(\cdot\,/\,a_1^\top x) = g(\cdot\,/\,a_1^\top x)$, i.e., $K\big(g\,\frac{f_{a_1}}{g_{a_1}}, f\big) = 0$; we could also derive the same proof with $f$, $g^{(k-1)}$ and $a_k$.

Preliminary (A)

We show that $A = \Big\{ (c, x) \in \big(\mathbb{R}^d \setminus \{a_1\}\big) \times \mathbb{R}^d ;\ \frac{f_{a_1}(a_1^\top x)}{g_{a_1}(a_1^\top x)} > \frac{f_c(c^\top x)}{g_c(c^\top x)},\ g(x)\,\frac{f_c(c^\top x)}{g_c(c^\top x)} > f(x) \Big\} = \emptyset$ through a reductio ad absurdum, i.e., by assuming that $A \neq \emptyset$.

Thus, our hypothesis enables us to derive

$f(x) = f(\cdot\,/\,a_1^\top x)\, f_{a_1}(a_1^\top x) = g(\cdot\,/\,a_1^\top x)\, f_{a_1}(a_1^\top x) > g(\cdot\,/\,c^\top x)\, f_c(c^\top x) > f(x)$,
since $\frac{f_{a_1}(a_1^\top x)}{g_{a_1}(a_1^\top x)} > \frac{f_c(c^\top x)}{g_c(c^\top x)}$ implies $g(\cdot\,/\,a_1^\top x)\, f_{a_1}(a_1^\top x) = g(x)\,\frac{f_{a_1}(a_1^\top x)}{g_{a_1}(a_1^\top x)} > g(x)\,\frac{f_c(c^\top x)}{g_c(c^\top x)} = g(\cdot\,/\,c^\top x)\, f_c(c^\top x)$, i.e., $f(x) > f(x)$, which is absurd. We can thus conclude.

Preliminary (B)

We show that $B = \Big\{ (c, x) \in \big(\mathbb{R}^d \setminus \{a_1\}\big) \times \mathbb{R}^d ;\ \frac{f_{a_1}(a_1^\top x)}{g_{a_1}(a_1^\top x)} < \frac{f_c(c^\top x)}{g_c(c^\top x)},\ g(x)\,\frac{f_c(c^\top x)}{g_c(c^\top x)} < f(x) \Big\} = \emptyset$ through a reductio ad absurdum, i.e., by assuming that $B \neq \emptyset$.

Thus, our hypothesis enables us to derive

$f(x) = f(\cdot\,/\,a_1^\top x)\, f_{a_1}(a_1^\top x) = g(\cdot\,/\,a_1^\top x)\, f_{a_1}(a_1^\top x) < g(\cdot\,/\,c^\top x)\, f_c(c^\top x) < f(x)$, which is absurd.

We can consequently conclude as above.

Let us now verify (A3):

We have $P_M(c, a_1) - P_M(c, a) = \int \ln\Big(\frac{g(x)\,f_c(c^\top x)}{g_c(c^\top x)\,f(x)}\Big)\Big\{\frac{f_{a_1}(a_1^\top x)}{g_{a_1}(a_1^\top x)} - \frac{f_c(c^\top x)}{g_c(c^\top x)}\Big\}\, g(x)\,dx$. Moreover, the logarithm is negative on $\{x \in \mathbb{R}^d;\ \frac{g(x)\,f_c(c^\top x)}{g_c(c^\top x)\,f(x)} < 1\}$ and positive on $\{x \in \mathbb{R}^d;\ \frac{g(x)\,f_c(c^\top x)}{g_c(c^\top x)\,f(x)} \ge 1\}$.

Thus, the preliminary studies (A) and (B) show that $\ln\Big(\frac{g(x)\,f_c(c^\top x)}{g_c(c^\top x)\,f(x)}\Big)$ and $\Big\{\frac{f_{a_1}(a_1^\top x)}{g_{a_1}(a_1^\top x)} - \frac{f_c(c^\top x)}{g_c(c^\top x)}\Big\}$ always present a non-positive product. We can therefore conclude, since $(c, a) \mapsto P_M(c, a_1) - P_M(c, a)$ is not null for all $c$ and for all $a$ with $a \neq a_1$.

References

  1. Sklar, M. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Stat. Univ. Paris 1959, 8, 229–231. [Google Scholar]
  2. Joe, H. Multivariate Models and Dependence Concepts. Monographs on Statistics and Applied Probability, 1st ed.; Chapman and Hall/CRC: London, UK, 1997. [Google Scholar]
  3. Nelsen, R.B. An Introduction to Copulas. Springer Series in Statistics, 2nd ed.; Springer: New York, NY, USA, 2006. [Google Scholar]
  4. Carriere, J.F. A large sample test for one-parameter families of Copulas. Comm. Stat. Theor. Meth. 1994, 23, 1311–1317. [Google Scholar]
  5. Genest, C.; Rémillard, B. Tests of independence and randomness based on the empirical Copula process. Test 2004, 13, 335–370. [Google Scholar]
  6. Fermanian, J.D. Goodness of fit tests for copulas. J. Multivariate Anal. 2005, 95, 119–152. [Google Scholar]
  7. Genest, C.; Quessy, J.F.; Rémillard, B. Goodness-of-fit procedures for copula models based on the probability integral transformation. Scand. J. Stat. 2006, 33, 337–366. [Google Scholar]
  8. Michiels, F.; De Schepper, A. A Copula Test Space Model—How to Avoid the Wrong Copula Choice. Kybernetika 2008, 44, 864–878. [Google Scholar]
  9. Genest, C.; Favre, A.-C.; Béliveau, J.; Jacques, C. Metaelliptical copulas and their use in frequency analysis of multivariate hydrological data. Water Resour. Res. 2007, 43, W09401:1–W09401:12. [Google Scholar]
  10. Mesfioui, M.; Quessy, J.F.; Toupin, M.H. On a new goodness-of-fit process for families of copulas. La Revue Canadienne de Statistique 2009, 37, 80–101. [Google Scholar]
  11. Genest, C.; Rémillard, B.; Beaudoin, D. Goodness-of-fit tests for copulas: A review and a power study. Insurance: Math. Econ. 2009, 44, 199–213. [Google Scholar]
  12. Berg, D. Copula goodness-of-fit testing: An overview and power comparison. Eur. J. Finance 2009, 15, 675–701. [Google Scholar]
  13. Bücher, A.; Dette, H. Some comments on goodness-of-fit tests for the parametric form of the copula based on L2-distances. J. Multivar. Anal. 2010, 101, 749–763. [Google Scholar]
  14. Broniatowski, M.; Leorato, S. An estimation method for the Neyman chi-square divergence with application to test of hypotheses. J. Multivar. Anal. 2006, 97, 1409–1436. [Google Scholar]
  15. Friedman, J.H.; Stuetzle, W.; Schroeder, A. Projection pursuit density estimation. J. Am. Statist. Assoc. 1984, 79, 599–608. [Google Scholar]
  16. Huber, P.J. Projection pursuit. Ann. Stat. 1985, 13, 435–525. [Google Scholar]
  17. Cambanis, S.; Huang, S.; Simons, G. On the theory of elliptically contoured distributions. J. Multivar. Anal. 1981, 11, 368–385. [Google Scholar]
  18. Landsman, Z.M.; Valdez, E.A. Tail conditional expectations for elliptical distributions. N. Am. Actuar. J. 2003, 7, 55–71. [Google Scholar]
  19. Yohai, V.J. Optimal robust estimates using the Kullback-Leibler divergence. Stat. Probab. Lett. 2008, 78, 1811–1816. [Google Scholar]
  20. Toma, A. Optimal robust M-estimators using divergences. Stat. Probab. Lett. 2009, 79, 1–5. [Google Scholar]
  21. Huber, P.J. Robust Statistics; Wiley: New York, NY, USA, 1981; (republished in paperback, 2004). [Google Scholar]
  22. van der Vaart, A.W. Asymptotic statistics. Cambridge Series in Statistical and Probabilistic Mathematics; Cambridge University Press: Cambridge, UK, 1998. [Google Scholar]
  23. Scott, D.W. Multivariate Density Estimation. Theory, Practice, and Visualization. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics; A Wiley-Interscience Publication. John Wiley and Sons, Inc.: New York, NY, USA, 1992. [Google Scholar]
  24. Touboul, J. Goodness-of-fit Tests For Elliptical And Independent Copulas Through Projection Pursuit. arXiv. Statistics Theory 2011. arXiv: 1103.0498. [Google Scholar]
  25. Alfonsi, A.; Brigo, D. New families of Copulas based on periodic functions. Commun. Stat. Theor. Meth. 2005, 34, 1437–1447. [Google Scholar]
  26. Cressie, N.; Read, T.R.C. Multinomial goodness-of-fit tests. J. R. Stat. Soc. Series B 1984, 46, 440–464. [Google Scholar]
  27. Csiszár, I. On topology properties of f-divergences. Studia Sci. Math. Hungar. 1967, 2, 329–339. [Google Scholar]
  28. Liese, F.; Vajda, I. Convex Statistical Distances; BSB B. G. Teubner Verlagsgesellschaft: Leipzig, Germany, 1987. [Google Scholar]
  29. Pardo, L. Statistical Inference Based on Divergence Measures. Statistics: Textbooks and Monographs; Chapman & Hall/CRC: Boca Raton, FL, USA, 2006. [Google Scholar]
  30. Zografos, K.; Ferentinos, K.; Papaioannou, T. φ-divergence statistics: sampling properties and multinomial goodness of fit and divergence tests. Commun. Stat. Theor. Meth. 1990, 19, 1785–1802. [Google Scholar]
  31. Azé, D. Eléments d'Analyse Convexe et Variationnelle; Ellipses: Paris, France, 1997. [Google Scholar]
  32. Bosq, D.; Lecoutre, J.P. Théorie de l'Estimation Fonctionnelle; Economica: Paris, France, 1999. [Google Scholar]
  33. Broniatowski, M.; Keziou, A. Parametric estimation and tests through divergences and the duality technique. J. Multivar. Anal. 2009, 100, 16–36. [Google Scholar]
  34. Black, F.; Scholes, M. The pricing of options and corporate liabilities. J. Polit. Econ. 1973, 81, 637–654. [Google Scholar]
Classification: MSC 62H05, 62H15, 62H40, 62G15
