Goodness-of-fit Tests For Elliptical And Independent Copulas Through Projection Pursuit

We investigate two goodness-of-fit tests for copulas: the first deals with elliptical copulas and the second with independent copulas. Both tests derive from an extension of the projection pursuit methodology introduced in the present article. This method enables us to determine the axis system on which these copulas lie, as well as the exact expression of these copulas in the basis formed by the axes thus determined, irrespective of their expression in the canonical basis. Simulations are presented, as well as an application to real datasets.


Outline of the article
The need to describe the dependency between two or more random variables gave rise to the concept of copulas. Consider a joint cumulative distribution function (cdf) F on R^d with cdf margins F_1, F_2, ..., F_d; a copula C is then a function such that F = C(F_1, F_2, ..., F_d). Sklar (1959) was the first to establish the bases of this theory. Several parametric families of copulas have since been defined, namely elliptical, Archimedean, periodic copulas, etc. - see Joe (1997) and Nelsen (2006), as well as appendix A, for an overview of these families. Finding criteria to determine the best copula for a given problem can only be achieved through a goodness-of-fit (GOF) approach. Several GOF copula approaches have so far been proposed in the literature, e.g. Carriere (1994), Genest and Rémillard (2004), Fermanian (2005), Genest, Quessy and Rémillard (2006), Michiels and De Schepper (2008), Genest, Favre, Béliveau and Jacques (2009), Mesfioui, Quessy and Toupin (2009), Genest, Rémillard and Beaudoin (2009-2), Berg (2009), Bücher and Dette (2010), among others. However, the field is still at an embryonic stage, which explains the current shortage of recommendations. For univariate distributions, the GOF assessment can be performed using, for instance, the well-known Kolmogorov test. In the multivariate field, there are fewer alternatives. A simple way to build GOF approaches for multivariate random variables is to consider multi-dimensional chi-square approaches, as for example in Broniatowski (2006). However, these approaches present feasibility issues for high dimensional problems due to the curse of dimensionality. In order to solve this, we will now introduce the theory of projection pursuit.
The objective of projection pursuit is to generate one or several projections providing as much information as possible about the structure of the dataset, regardless of its dimension. Once a structure has been isolated, the corresponding data are transformed through a Gaussianization. Through a recursive approach, this process is iterated to find another structure in the remaining data, until no further structure can be evidenced in the data left at the end. Friedman (1984) and Huber (1985) count among the first authors who introduced this type of approach for evidencing structures. They each describe, with many examples, how to evidence such a structure and consequently how to estimate the density of such data, each through two different methodologies. Their work is based on maximizing the Kullback-Leibler divergence. In the present article, we will introduce a new projection pursuit methodology based on the minimisation of any φ-divergence that dominates the L^1-distance (φ-PP). As we will develop later on, this implementation encompasses all the previous methods. This algorithm also presents the extra advantage of being more robust and numerically faster. Finally, this process allows us not only to carry out GOF tests for elliptical and independent copulas, but also to determine the axis system upon which these very copulas are based; it will also enable us to derive the exact expression of these copulas in the basis constituted by these axes. This paper is organised as follows: section 1 contains preliminary definitions and properties. In section 2, we present the φ-projection pursuit algorithm in detail. In section 3, we present our first results. In section 4, we introduce our tests. In section 5, we provide two simulations pertaining to the two major situations described herein, and we study a real case.

An introduction to copulas
In this section, we introduce the concept of copula and define the family of elliptical copulas through a brief reminder of elliptical distributions - see appendix A for an overview of the other families.
Remark 1.2. Landsman and Valdez (2003) show that multivariate Gaussian distributions belong to the family of elliptical distributions. They also show that if X = (X_1, ..., X_d) has an elliptical density such that its margins verify E(X_i) < ∞ and E(X_i^2) < ∞ for 1 ≤ i ≤ d, then μ is the mean of X and Σ is a multiple of the covariance matrix of X. Consequently, from now on, we will assume this is indeed the case.

Definition 1.4. Let t be an elliptical density on R^k and let q be an elliptical density on R^{k'}. The elliptical densities t and q are said to belong to the same family of elliptical densities if their respective generating densities ξ_k and ξ_{k'} belong to a common given family of densities.
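For reference, a brief reminder of the convention implicitly used here (following Landsman and Valdez (2003)): an elliptical density t on R^k with parameters μ and Σ and generating density ξ_k writes

t(x) = (c_k / √|Σ|) ξ_k( (1/2)(x − μ)^⊤ Σ^{−1} (x − μ) ),

where c_k is a normalizing constant. The Gaussian case corresponds to ξ_k(u) = e^{−u}, consistently with example 1.1 below.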
Example 1.1. Consider two Gaussian densities, N(0, 1) and N((0, 0), Id_2). They are said to belong to the same elliptical family, as they both present x ↦ e^{−x} as generating density.
Finally, let us introduce the definition of an elliptical copula, which generalizes the above overview of the Gaussian copula:

Definition 1.5. Elliptical copulas are the copulas of elliptical distributions.

Brief introduction to the φ-projection pursuit methodology (φ-PP)
Let us first introduce the concept of φ−divergence.

The concept of φ−divergence
Let ϕ be a strictly convex function defined by ϕ : R^+ → R^+ and such that ϕ(1) = 0. We define the φ-divergence of P from Q - where P and Q are two probability distributions over a space Ω such that Q is absolutely continuous with respect to P - by

D_φ(Q, P) = ∫ ϕ( q(x) / p(x) ) p(x) dx,

if P and Q present p and q as densities respectively. Throughout this article, we will also assume that ϕ(0) < ∞, that ϕ' is continuous and that this divergence dominates the L^1 distance - see also Appendix B page 21.
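For concreteness, two classical choices of ϕ (whether the resulting divergence dominates the L^1 distance, as additionally required here, is discussed in Appendix B):

ϕ(x) = x ln x − x + 1, which gives D_φ(Q, P) = ∫ q ln(q/p) dx, the Kullback-Leibler divergence;

ϕ(x) = (x − 1)^2 / 2, which gives D_φ(Q, P) = (1/2) ∫ (q − p)^2 / p dx, half the χ^2 divergence.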

Functioning of the algorithm
Let f be a density on R^d. We define an instrumental density g with the same mean and variance as f. We start by performing the test D_φ(g, f) = 0; should this test turn out positive, then f = g and the algorithm stops. Otherwise, the first step of our algorithm consists in defining a vector a_1 and a density g^{(1)} by

a_1 = arg min_{a ∈ R^d_*} D_φ( g f_a / g_a , f )   and   g^{(1)} = g f_{a_1} / g_{a_1},

where R^d_* is the set of non-null vectors of R^d and f_a (resp. g_a) stands for the density of a^⊤X (resp. a^⊤Y) when f (resp. g) is the density of X (resp. Y). In the second step, we replace g with g^{(1)} and repeat the first step. Iterating this process, we end up obtaining a sequence (a_1, a_2, ...) of vectors in R^d_* and a sequence of densities g^{(i)}. We will prove that the underlying structures of f evidenced through this method are identical to the ones obtained through projection pursuit methodologies based on Kullback-Leibler divergence maximisation, such as Huber's method - see appendix E.3. We will also evidence the above structures, which will enable us to infer more information on f - see the example below.
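To fix ideas, here is a minimal numerical sketch of this first step, with φ taken as the Kullback-Leibler divergence. It is an illustration only: the names (kl_objective, f_hat, ...) are ours, the plug-in Monte Carlo estimator is a crude stand-in for the estimator studied in section 3, and the truncation of the samples by the threshold θ_m (see appendix C) is omitted.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
d, n = 2, 500
# Toy target f: an exponential first margin and a Gaussian second margin.
X = np.column_stack([rng.exponential(1.0, n), rng.normal(0.0, 1.0, n)])
mu, cov = X.mean(axis=0), np.cov(X.T)
g = stats.multivariate_normal(mu, cov)   # instrumental elliptical density
Y = g.rvs(size=n, random_state=np.random.default_rng(1))
f_hat = stats.gaussian_kde(X.T)          # kernel estimate of f

def kl_objective(a):
    """Monte Carlo estimate of D_KL(g f_a / g_a, f) for a direction a."""
    nrm = np.linalg.norm(a)
    if nrm < 1e-3:
        return 1e6                       # exclude the null vector
    a = a / nrm                          # only the direction matters
    fa = stats.gaussian_kde(X @ a)       # kernel estimate of f_a
    ga = stats.gaussian_kde(Y @ a)       # kernel estimate of g_a
    w = fa(Y @ a) / ga(Y @ a)            # density ratio f_a / g_a on Y ~ g
    log_ratio = (np.log(g.pdf(Y)) + np.log(fa(Y @ a))
                 - np.log(ga(Y @ a)) - np.log(f_hat(Y.T)))
    return float(np.mean(w * log_ratio))

# The paper uses simulated annealing for this minimisation;
# scipy's dual_annealing is one readily available variant.
res = optimize.dual_annealing(kl_objective, bounds=[(-1, 1)] * d, seed=2)
a1 = res.x / np.linalg.norm(res.x)
print("estimated first co-vector:", a1)  # expected close to +/-(1, 0),
# the only direction along which this toy f is non-Gaussian
```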
Remark 1.3. To obtain an approximation of f, we stop our algorithm either when the divergence equals zero, i.e. when D_φ(g^{(j)}, f) = 0 for some j ≤ d - since this implies g^{(j)} = f - or when the algorithm reaches the d-th iteration, in which case we approximate f with g^{(d)}.
Finally, the specific form of relationship (1.2) shows that we are dealing with M-estimation. We can therefore state that our method is more robust than projection pursuit methodologies based on Kullback-Leibler divergence maximisation - see Yohai (2008), Toma (2009) as well as Huber (2004).
At present, let us study the following example in R^3:

f(x_1, x_2, x_3) = n(x_1, x_2) h(x_3),

with n being a bi-dimensional Gaussian density and h being a non-Gaussian density. Let us also consider g, a Gaussian density with the same mean and variance as f.
The criterion a ↦ D_φ(g f_a/g_a, f) reaches zero at e_3 = (0, 0, 1)^⊤, giving g^{(1)} = g f_3/g_3 - where f_3 and g_3 are the third marginal densities of f and g respectively. We therefore obtain g^{(1)} = f, i.e. f coincides with g on the complement of the vector subspace generated by the family {a_i}_{i=1,...,j} - here, the span of e_3 - see also section 2 for a more detailed explanation.
In the remainder of the study of the algorithm, after having clarified the choice of g, we will consider the statistical solution to the representation problem, assuming that f is unknown and that X_1, X_2, ..., X_m are i.i.d. with density f. We will provide asymptotic results pertaining to the family of optimizing vectors a_{k,m} - to be defined more precisely below - as m goes to infinity. Our results also prove that the empirical representation scheme converges towards the theoretical one.

The model
Let f be a density on R^d. We assume there exist d non-null linearly independent vectors a_1, ..., a_d and an integer j < d such that

f(x) = n(a_{j+1}^⊤ x, ..., a_d^⊤ x) h(a_1^⊤ x, ..., a_j^⊤ x),   (2.1)

with n being an elliptical density on R^{d−j} and h being a density on R^j which does not belong to the same family as n. Let X = (X_1, ..., X_d) be a vector with f as density.
We define g as an elliptical distribution with the same mean and variance as f. For simplicity, let us assume that the family {a_j}_{1≤j≤d} is the canonical basis of R^d. The very definition of f then implies that (X_{j+1}, ..., X_d) is independent of (X_1, ..., X_j); hence, the density of (X_{j+1}, ..., X_d) given (X_1, ..., X_j) is n. Let us assume that D_φ(g^{(j)}, f) = 0 for some j ≤ d. We then get f = g^{(j)}.
Consequently, through lemma F.9, through the fact that conditional densities of elliptical distributions are also elliptical, and through the above relationship, we can infer that f coincides with g on the complement of the vector subspace generated by the family {a_i}_{i=1,...,j}. Now, if the family {a_j}_{1≤j≤d} is no longer the canonical basis of R^d, this family is still a basis of R^d, and lemma F.2 implies the same conclusion: the end of our algorithm entails that f coincides with g on the complement of the vector subspace generated by the family {a_i}_{i=1,...,j}. Therefore, the nullity of the φ-divergence provides us with information on the structure of the density.
In summary, the preceding discussion clarifies our choice of g, which depends on the family of distributions one wants to find in f, and leads us to the following definition.

Definition 2.1. Let f be a density on R^d. We define the co-vectors of f as the sequence of vectors a_1, ..., a_j which solve the problem D_φ(g^{(j)}, f) = 0, where g is an elliptical distribution with the same mean and variance as f. We define the co-support of f as the vector space generated by the vectors a_1, ..., a_j.
Remark 2.1. Any family (a_i) defining f as in (2.1) is an orthogonal basis of R^d - see lemma F.11.

Stochastic outline of our algorithm
Let X_1, X_2, ..., X_m (resp. Y_1, Y_2, ..., Y_m) be a sequence of m independent random vectors with the same density f (resp. g). As customary in nonparametric φ-divergence optimization, all estimates of f and f_a, as well as all uses of Monte Carlo methods, are performed using subsamples X_1, ..., X_n and Y_1, ..., Y_n - extracted respectively from X_1, ..., X_m and Y_1, ..., Y_m - on which the estimates are bounded below by some positive deterministic sequence θ_m - see Appendix C. Let P_n be the empirical measure based on the subsample X_1, ..., X_n. Let f_n (resp. f_{a,n} for any a in R^d_*) be the kernel estimate of f (resp. f_a) built from X_1, ..., X_n (resp. a^⊤X_1, ..., a^⊤X_n). As defined in section 1.2, we introduce the sequences (a_k)_{k≥1} and (g^{(k)})_{k≥1}:
• a_k is a non-null vector of R^d such that a_k = arg min_{a ∈ R^d_*} D_φ( g^{(k−1)} f_a / g^{(k−1)}_a , f );
• g^{(k)} = g^{(k−1)} f_{a_k} / g^{(k−1)}_{a_k}, with g^{(0)} = g.
The stochastic version of the algorithm uses f_n and g^{(0)}_n = g instead of f and g^{(0)} = g, since g is known. Thus, at the first step, we build the vector ǎ_1 which minimizes the φ-divergence between f_n and g f_{a,n}/g_a and which estimates a_1; proposition C.1 and lemma F.8 enable us to perform this minimization. Defining ǎ_1 as the argument of this minimization, proposition 3.3 shows that this vector tends to a_1. Finally, we define the density ǧ^{(1)}_n as ǧ^{(1)}_n = g f_{ǎ_1,n}/g_{ǎ_1}, which estimates g^{(1)} through theorem 3.1.
Now, from the second step onwards, and as defined in section 1.2, the density g^{(k−1)} is unknown. Consequently, once again, we have to truncate the samples.
All estimates of f and f_a (resp. g^{(1)} and g^{(1)}_a) are performed using a subsample X_1, ..., X_n (resp. Y^{(1)}_1, ..., Y^{(1)}_n, extracted from a sequence of m independent random vectors with density g^{(1)}) such that the estimates are bounded below by some positive deterministic sequence θ_m (see Appendix C). Let P_n be the empirical measure based on the subsample X_1, ..., X_n. Let f_n (resp. g^{(1)}_n, f_{a,n}, g^{(1)}_{a,n} for any a in R^d_*) be the kernel estimate of f (resp. g^{(1)}, f_a, g^{(1)}_a). The stochastic version of the algorithm uses f_n and g^{(1)}_n instead of f and g^{(1)}. Thus, we build the vector ǎ_2 which minimizes the φ-divergence between f_n and g^{(1)}_n f_{a,n}/g^{(1)}_{a,n} - since g^{(1)} and g^{(1)}_a are unknown - and which estimates a_2; proposition C.1 and lemma F.8 enable us to perform this minimization. Defining ǎ_2 as the argument of this minimization, proposition 3.3 shows that this vector tends to a_2 in n. Finally, we define the density ǧ^{(2)}_n as ǧ^{(2)}_n = ǧ^{(1)}_n f_{ǎ_2,n}/ǧ^{(1)}_{ǎ_2,n}, which estimates g^{(2)} through theorem 3.1. And so on, we end up obtaining a sequence (ǎ_1, ǎ_2, ...) of vectors in R^d_* estimating the co-vectors of f, and a sequence of densities (ǧ^{(k)}_n)_k such that ǧ^{(k)}_n estimates g^{(k)} through theorem 3.1.
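The sketch below, continuing the snippet of section 1.2, illustrates the iterated steps numerically. One caveat: the paper simulates fresh vectors from g^{(1)}, whereas here - purely for compactness - we represent ǧ^{(k)}_n through importance weights on the original Y-sample; this substitution, together with the heuristic stopping threshold, is ours.

```python
# Continues the previous sketch (reuses X, Y, g, f_hat, d, n, optimize, stats).
w = np.full(n, 1.0 / n)                  # Y weighted by w represents g^(k)
vectors = []
for k in range(d):
    def objective(a):
        nrm = np.linalg.norm(a)
        if nrm < 1e-3:
            return 1e6
        a = a / nrm
        fa = stats.gaussian_kde(X @ a)
        ga = stats.gaussian_kde(Y @ a, weights=w)   # estimate of g^(k)_a
        ratio = fa(Y @ a) / ga(Y @ a)               # f_a / g^(k)_a on Y
        gk_log = np.log(g.pdf(Y)) + np.log(w * n)   # log g^(k), up to KDE error
        val = np.sum(w * ratio * (gk_log + np.log(fa(Y @ a))
                                  - np.log(ga(Y @ a)) - np.log(f_hat(Y.T))))
        return float(val)                # KL(g^(k) f_a / g^(k)_a, f) estimate
    res = optimize.dual_annealing(objective, bounds=[(-1, 1)] * d, seed=3 + k)
    a_k = res.x / np.linalg.norm(res.x)
    vectors.append(a_k)
    if res.fun < 0.01:                   # heuristic stand-in for the paper's
        break                            # normal-law stopping test (section 3)
    fa = stats.gaussian_kde(X @ a_k)
    ga = stats.gaussian_kde(Y @ a_k, weights=w)
    w = w * fa(Y @ a_k) / ga(Y @ a_k)    # g^(k+1) = g^(k) f_{a_k} / g^(k)_{a_k}
    w = w / w.sum()
print("estimated co-vectors:", vectors)
```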

Hypotheses on f
We keep the notation of section 2.2: X_1, ..., X_m (resp. Y_1, ..., Y_m) is a sequence of m independent random vectors with density f (resp. g); all estimates of f and f_a, as well as all uses of Monte Carlo methods, are performed using subsamples X_1, ..., X_n and Y_1, ..., Y_n on which the estimates are bounded below by some positive deterministic sequence θ_m - see appendix C; P_n is the empirical measure of the subsample X_1, ..., X_n; and f_n (resp. f_{a,n} for any a in R^d_*) is the kernel estimate of f (resp. f_a) built from X_1, ..., X_n (resp. a^⊤X_1, ..., a^⊤X_n). At present, let us define the set of hypotheses on f; a discussion of several of these hypotheses can be found in appendix D. In the remainder of this section, for legibility, we write g instead of g^{(k−1)}. For any measurable function h, let Ph = ∫ h dP, where P is the probability measure presenting f as density.
Similarly as in chapter V of Van der Vaart (1998), let us define the function M associated with the criterion a ↦ D_φ(g f_a/g_a, f), and let us consider the following additional hypotheses:
(H5): P‖(∂/∂b)M(a_k, a_k)‖^2 and P‖(∂/∂a)M(a_k, a_k)‖^2 are finite, and the matrices P(∂^2/∂b_i∂b_j)M(a_k, a_k) and I_{a_k} exist and are invertible.
(H6): There exists k such that PM(a_k, a_k) = 0.
(H7): (Var_P(M(a_k, a_k)))^{1/2} exists and is invertible.
(H0): f and g are assumed to be positive and bounded, and such that the Kullback-Leibler divergence K between them is finite.

Estimation of the first co-vector of f
Let R be the class of all positive functions r defined on R and such that g(x) r(a^⊤x) is a density on R^d for some a in R^d_*. Proposition 3.1 shows that there exists a vector a such that r = f_a/g_a minimizes D_φ(gr, f) over R. Let then ǎ = arg min_a D_φ(g f_{a,n}/g_a, f_n); proposition 3.2 states that ǎ is a strongly convergent estimate of a, as defined in proposition 3.1.
Let us also introduce the following sequences (ǎ_k)_{k≥1} and (ǧ^{(k)}_n)_{k≥1}, for any given n - see section 2.2:
• ǎ_k is an estimate of a_k, as defined in proposition 3.2, with ǧ^{(k−1)}_n instead of g;
• ǧ^{(k)}_n = ǧ^{(k−1)}_n f_{ǎ_k,n} / [ǧ^{(k−1)}_n]_{ǎ_k,n}, with ǧ^{(0)}_n = g.
We also note that ǧ^{(k)}_n is a density.

Convergence study at the k-th step of the algorithm
In this paragraph, we show that the sequence (ǎ_k)_n converges towards a_k and that the sequence (ǧ^{(k)}_n)_n converges towards g^{(k)}. Let č_n(a) = arg sup_{c∈Θ} P_n M(c, a), with a ∈ Θ, and γ̌_n = arg inf_{a∈Θ} sup_{c∈Θ} P_n M(c, a). We state:

Proposition 3.3. Both sup_{a∈Θ} ‖č_n(a) − a_k‖ and ‖γ̌_n − a_k‖ converge to 0 a.s.

Testing the criterion
In this paragraph, through a test of our criterion, namely a ↦ D_φ(ǧ^{(k)}_n f_{a,n}/[ǧ^{(k)}_n]_{a,n}, f_n), we will build a stopping rule for this procedure. First, the next theorem enables us to derive the law of our criterion:

Theorem 3.2. For a fixed k, we have

√n (Var_P(M(č_n(γ̌_n), γ̌_n)))^{−1/2} ( P_n M(č_n(γ̌_n), γ̌_n) − P_n M(a_k, a_k) ) → N(0, I) in law,

where k represents the k-th step of our algorithm and where I is the identity matrix on R^d.
Note that k is fixed in theorem 3.2, since γ̌_n = arg inf_{a∈Θ} sup_{c∈Θ} P_n M(c, a), where M is a known function of k - see section 3.1. We therefore propose the test of the null hypothesis (H0): D_φ(g^{(k)}, f) = 0 versus the alternative (H1): D_φ(g^{(k)}, f) ≠ 0. Based on this result, we stop the algorithm and, defining a_k as the last vector generated, we derive from corollary 3.1 an α-level confidence ellipsoid E_k around a_k, built from the α-level quantile of the standard normal distribution and from the empirical measure P_n arising from a realization of the sequences (X_1, ..., X_n) and (Y_1, ..., Y_n). Consequently, the following corollary provides us with a confidence region for the above test:

Corollary 3.2. E_k is a confidence region for the test of the null hypothesis (H0) versus (H1).
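As an illustration of the resulting stopping rule: since theorem 3.2 gives an asymptotically standard normal vector, the acceptance check amounts to a Wald-type comparison of its squared norm with a χ^2 quantile. The sketch below assumes the centered criterion vector and an estimate of its asymptotic covariance are available from the optimization step; the paper's exact region involves the function M, not reproduced here.

```python
import numpy as np
from scipy import stats

def in_confidence_region(crit_vector, cov_hat, n, level=0.90):
    """Wald-type check sketched from theorem 3.2: accept when
    n * crit' cov^{-1} crit stays below the chi-square quantile.
    'crit_vector' stands for the centered empirical criterion,
    e.g. P_n M(c_n(a), a); both inputs are assumed given."""
    q = n * crit_vector @ np.linalg.solve(cov_hat, crit_vector)
    return bool(q <= stats.chi2.ppf(level, df=len(crit_vector)))
```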

The basic idea
Let f be a density defined on R^2. Let us also consider g, a known elliptical density with the same mean and variance as f. Let us assume that the family (a_i) is the canonical basis of R^2 and that D_φ(g^{(2)}, f) = 0. Then, since lemma F.9 page 27 implies that [g^{(j−1)}]_{a_j} = g_{a_j}, we get g^{(2)} = g (f_1/g_1)(f_2/g_2); moreover, property B.1 page 21 yields g^{(2)} = f. Hence

f(x_1, x_2) / ( f_1(x_1) f_2(x_2) ) = g(x_1, x_2) / ( g_1(x_1) g_2(x_2) ),

i.e. the density of C_f equals the density of C_g, where C_f (resp. C_g) is the copula of f (resp. g).
More generally, if f is defined on R^d, then the family (a_i) is once again free - see lemma F.10 page 27 - i.e. the family (a_i) is once again a basis of R^d. Since lemma F.9 page 27 implies that [g^{(k−1)}]_{a_k} = g_{a_k}, the relationship D_φ(g^{(d)}, f) = 0 therefore yields

f(x) / ∏_{i=1}^{d} f_{a_i}(a_i^⊤x) = g(x) / ∏_{i=1}^{d} g_{a_i}(a_i^⊤x).

Finally, putting A = (a_1, ..., a_d) and defining the vector y (resp. the density f̃, the copula C̃_f of f̃, the density g̃, the copula C̃_g of g̃) as the expression of the vector x (resp. the density f, the copula C_f of f, the density g, the copula C_g of g) in the basis A, proposition 4.1 states that the density associated with the copula of f equals the density associated with the copula of g in the basis A.
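To make the basis change explicit - a restatement, valid since A is invertible (its columns form a basis) - write x = Ay; then

f̃(y) = |det A| f(Ay),   g̃(y) = |det A| g(Ay),

and transporting the displayed relationship to the y-coordinates shows that the copula density of f̃ coincides with that of g̃, which is the content of proposition 4.1.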

With the elliptical copula
Let f be an unknown density defined on R^d. The objective of the present section is to determine whether the copula of f is elliptical. We thus define an instrumental elliptical density g with the same mean and variance as f, and we follow the procedure of section 2.2. As explained in section 4.1, we infer from proposition 4.1 that the copula of f equals the copula of g when D_φ(g^{(d)}, f) = 0, i.e. when a_d is the last vector generated by the algorithm and (a_i) is the canonical basis of R^d. In order to verify this assertion, corollary 3.1 page 9 provides us with an α-level confidence ellipsoid around this vector, built from the α-level quantile of the standard normal distribution, from the empirical measure P_n arising from a realization of the sequences (X_1, ..., X_n) and (Y_1, ..., Y_n) - see appendix C - and from M, a known function of d, f_n and g^{(d−1)}_n - see section 3.1. Consequently, keeping the notations introduced in section 4.1, we can perform a statistical test of the null hypothesis (H0): the copula of f equals the copula of g, versus (H1): it does not. Since, under (H0), we have D_φ(g^{(d)}, f) = 0, the following theorem provides us with a confidence region for this test. Thus, our method enables us to tell whether the copula of f equals the copula of g in the (a_1, ..., a_d) basis.

With the independent copulas
Let f be a density on R^d and let X be a random vector with f as density. The objective of this section is to determine whether f is the product of its margins, i.e. whether the copula of f is the independent copula. Let g then be an instrumental product of univariate Gaussian densities - with diag(Var(X_1), ..., Var(X_d)) as covariance matrix and with the same mean as f. As explained in section 4.2, let us follow the procedure described in section 2.2: proposition 4.1 implies that the copula of f is the independent copula when D_φ(g^{(d)}, f) = 0. Thus, we perform a statistical test of the null hypothesis (H0): the copula of f is the independent copula, versus (H1): it is not. Our method hence enables us to determine whether the copula of f is the independent copula in the (a_1, ..., a_d) basis.
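In code, the only change with respect to the elliptical-copula test is the instrumental density; a sketch, with a placeholder sample standing in for the data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))            # placeholder sample from f
# Instrumental g for the independence test: a product of univariate
# Gaussians, i.e. a Gaussian with diagonal covariance matching f's margins.
g_indep = stats.multivariate_normal(X.mean(axis=0), np.diag(X.var(axis=0)))
# The procedure of section 2.2 then runs unchanged with g_indep in place of g.
```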

Study of the subsequence
Let Q be a subset of {1, ..., d} and let q - such that q ≤ d - be its cardinal. In the present section, our goal is to study the subsequence (g^{(k')})_{k'∈Q} of the sequence (g^{(k)})_{k=1,...,d} defined by D_φ(g^{(k')}, f) = 0 for any k' belonging to Q. The corresponding results again hold in the A = (a_1, ..., a_d) basis.

Simulations
Let us examine two simulations and an application to real datasets. The first simulation studies the elliptical copula and the second the independent copula. In each simulation, our program aims at creating a sequence of densities (g^{(j)}), j = 1, ..., d, such that g^{(0)} = g and g^{(j)} = g^{(j−1)} f_{a_j}/g^{(j−1)}_{a_j}, with a_j = arg min_a D_φ(g^{(j−1)} f_a/g^{(j−1)}_a, f), for all j = 1, ..., d. We then perform the tests introduced in theorems 4.1 and 4.2.

Application to real datasets
Let us for instance study the moves in the stock prices of Renault and Peugeot from January 4, 2010 to July 25, 2010. We thus gather 140 (= n) observations from these stock prices - see the data below. Let us also consider X_1 (resp. X_2), the random variable defining the stock price of Renault (resp. Peugeot). We will assume - as is commonly done in mathematical finance - that the stock market abides by the classical hypotheses of the Black-Scholes model - see Black and Scholes (1973). Consequently, X_1 and X_2 each follow a log-normal distribution. Let f be the density of the vector (ln(X_1), ln(X_2)); let us now apply our algorithm to f with the Kullback-Leibler divergence as φ-divergence. Let us then generate a Gaussian random variable Y with a density - that we will name g - presenting the same mean and variance as f. We first assume that there exists a vector a such that D_φ(g f_a/g_a, f) = 0, i.e. that this vector is a co-vector of f. In order to verify this hypothesis, our reasoning is the same as in simulation 5.1: corollary 3.2 enables us to estimate a through a 0.9 (= α) level confidence ellipsoid E_1, and we obtain

H0: a_1 ∈ E_1: True,    K(kernel estimate of g^{(1)}, g^{(1)}) = 4.3428735.

Therefore, our first hypothesis is confirmed. However, our goal is to study the copula of (ln(X_1), ln(X_2)). Then, as explained in section 4.4, we formulate another hypothesis, assuming that there exists a vector a such that D_φ(g^{(1)} f_a/g^{(1)}_a, f) = 0, i.e. that this vector is also a co-vector of f. In order to verify this hypothesis, we use the same reasoning as above: corollary 3.2 enables us to estimate a through a 0.9 (= α) level confidence ellipsoid E_2, and we obtain

H0: a_2 ∈ E_2: True,    K(kernel estimate of g^{(2)}, g^{(2)}) = 4.38475324.

Therefore, our second hypothesis is confirmed.
In conclusion, as explained in corollary 4.1, the density of the copula of f is identically equal to 1 in the {a_1, a_2} basis, i.e. this copula is the independent copula.

Figure 3: Graph of the copula of (ln(X_1), ln(X_2)) in the canonical basis.
Figure 4: Graph of the copula of (ln(X_1), ln(X_2)) in the {a_1, a_2} basis.
Figure 5: Graph of the copula of (ln(X_1), ln(X_2)) in the {a_1, a_2} basis - other view.
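The figures could be reproduced along the following lines - a sketch in which the price files and the estimated co-vectors a1, a2 are placeholders, and where the kernel estimate is only a rough stand-in for the copula-density estimator used in the paper:

```python
import numpy as np
from scipy import stats

# Hypothetical inputs: the 140 daily closing prices of each stock and the
# two estimated co-vectors a1, a2 returned by the algorithm.
prices_renault = np.loadtxt("renault.csv")            # placeholder file names
prices_peugeot = np.loadtxt("peugeot.csv")
a1, a2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # placeholder co-vectors

logX = np.column_stack([np.log(prices_renault), np.log(prices_peugeot)])
A = np.column_stack([a1, a2])
Z = logX @ np.linalg.inv(A).T                # coordinates in the {a1, a2} basis
U = np.column_stack([stats.rankdata(Z[:, j]) / (len(Z) + 1)
                     for j in range(2)])     # pseudo-observations on [0, 1]^2
cop_density = stats.gaussian_kde(U.T)        # should plateau near 1 (Figure 4)
```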

Critique of the simulations
In the case where f is unknown, we can never be sure to have reached the minimum of the φ-divergence: we have indeed used the simulated annealing method to solve our optimisation problem, and it is therefore only when the number of random jumps tends towards infinity that the probability of getting the minimum tends to 1. We also note that no theory exists on the optimal number of jumps to implement, as this number depends on the specificities of each particular problem. Moreover, we chose 50^{−4/(4+d)} for the AMISE of the two simulations. This choice leads us to simulate 50 random variables - see Scott (1992) page 151 - none of which were discarded to obtain the truncated sample. The same held in our application to real datasets. Finally, the shape of the copula in the case of real datasets in the {a_1, a_2} basis is also noteworthy: Figure 4 shows that the surface reaches a quite wide plateau around 1, whereas Figure 5 shows that this plateau extends over almost the entire [0, 1]^2 square. We can therefore conclude that the theoretical analysis is indeed confirmed by the above application.

Conclusion
Projection pursuit is useful in evidencing characteristic structures as well as one-dimensional projections and their associated distributions in multivariate data. This article clearly evidences the efficiency of the φ-projection pursuit methodology for goodness-of-fit tests for copulas. Indeed, the robustness and convergence results we achieved convincingly fulfilled our expectations regarding the methodology used.

A. On the different families of copulas
There exist many copula families. Let us here present the most important among them.

A.1. Archimedean copulas
These copulas present a simple form, with properties such as associativity, and cover a variety of dependence structures. They can generally be defined under the following form:

C(u_1, u_2, ..., u_n) = Ψ^{−1}( Ψ(u_1) + Ψ(u_2) + ... + Ψ(u_n) ),

where (u_1, u_2, ..., u_n) ∈ [0, 1]^n and where Ψ is known as the generator function. This function Ψ must be at least (d − 2) times continuously differentiable, must have a decreasing and convex (d − 2)-th derivative, and must be such that Ψ(1) = 0.
Let us now present several examples.

1/ Clayton copula: the Clayton copula is an asymmetric Archimedean copula, exhibiting greater dependence in the negative tail than in the positive tail. Let us define X (resp. Y) as the random variable having F (resp. G) as cumulative distribution function (CDF). Assuming that the vector (X, Y) has a Clayton copula, this copula is given by

H(x, y) = ( F(x)^θ + G(y)^θ − 1 )^{1/θ},

and its generator is Ψ(x) = x^θ − 1. In the limit θ → 0, the random variables become statistically independent. The generator function approach can be extended to create multivariate copulas simply by including more additive terms.

2/ Gumbel copula: the Gumbel copula (a.k.a. the Gumbel-Hougaard copula) is an asymmetric Archimedean copula, exhibiting greater dependence in the positive tail than in the negative tail. Its generator is Ψ(x) = (− ln x)^α, which in the bivariate case gives

C(u, v) = exp( −[ (− ln u)^α + (− ln v)^α ]^{1/α} ).
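A small sketch evaluating the two copulas above - using the paper's parameterization for Clayton, which requires θ < 0 to give a valid generator, and the standard bivariate Gumbel form:

```python
import numpy as np

def clayton(u, v, theta):
    """Clayton copula in the parameterization above (valid for theta < 0)."""
    return np.maximum(u**theta + v**theta - 1.0, 1e-300) ** (1.0 / theta)

def gumbel(u, v, alpha):
    """Bivariate Gumbel copula, alpha >= 1."""
    s = (-np.log(u))**alpha + (-np.log(v))**alpha
    return np.exp(-s**(1.0 / alpha))

print(clayton(0.3, 0.7, -2.0), gumbel(0.3, 0.7, 1.5))
```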

A.2. Periodic copula
In 2005, Aurélien Alfonsi and Damiano Brigo (2005) introduced a way of constructing copulas based on periodic functions. Defining h as a 1-periodic non-negative function that integrates to 1 over [0, 1], and H as a double primitive of h, two families of copula functions can be built from them, the second of which is not necessarily exchangeable.

B. φ-Divergence
Let us write h_a for the density of a^⊤Z when h is the density of Z. Let ϕ be a strictly convex function defined by ϕ : R^+ → R^+ and such that ϕ(1) = 0.
Definition B.1. We define the φ-divergence of P from Q, where P and Q are two probability distributions over a space Ω such that Q is absolutely continuous with respect to P, by

D_φ(Q, P) = ∫ ϕ( dQ/dP ) dP.   (B.1)

The above expression (B.1) remains valid if P and Q are both dominated by a common probability measure.
Property B.2. The divergence function Q ↦ D_φ(Q, P) is
• convex,
• lower semi-continuous for the topology that makes all applications of the form Q ↦ ∫ f dQ continuous, where f is bounded and continuous, and
• lower semi-continuous for the topology of uniform convergence.
Finally, we will also use the following property, derived from the first part of corollary 1.29 page 19 of Liese and Vajda (1987):

Property B.3. If T : (X, A) → (Y, B) is measurable and if D_φ(P, Q) < ∞, then D_φ(P, Q) ≥ D_φ(PT^{−1}, QT^{−1}), with equality being reached when T is surjective for (P, Q).