Recovering the Most Entropic Copulas from Preliminary Knowledge of Dependence

Abstract: This paper provides a new approach to recovering relative entropy measures of contemporaneous dependence from limited information by constructing the most entropic copula (MEC) and its canonical form, namely the most entropic canonical copula (MECC). The MECC can effectively be obtained by maximizing Shannon entropy to yield a proper copula such that known dependence structures of data (e.g., measures of association) are matched to their empirical counterparts. In fact, the problem of maximizing the entropy of copulas is dual to the problem of minimizing the Kullback-Leibler cross entropy (KLCE) of joint probability densities when the marginal probability densities are fixed. Our simulation study shows that the proposed MEC estimator can potentially outperform many other copula estimators in finite samples.


Introduction
There has been a substantial literature on the estimation of, and inference on, relative entropy measures of joint dependence as measures of serial correlation. These particular measures of dependence were first proposed by Joe [1] and extended by Granger and Lin [2]. Relative entropy based measures of dependence have received much interest in econometrics because they provide very general concepts for gauging joint dependence, and because they can be used for a set of variables that can be a mixture of continuous, ordinal-categorical, and nominal-categorical variables. Interested readers are referred to [3][4][5] for a concise review of important contributions in this area.
Econometricians have recently become interested in the computation of maximum entropy densities (see, e.g., Golan [6], Usta and Kantar [7], and references therein for the background and discussions regarding maximum entropy (ME) densities.) The ME densities are derived by maximization of an information criterion (the level of uncertainty) subject to mass and mean preserving constraints. The justification for using the ME in this context can be found in [8]. Rockinger and Jondeau [9] apply the ME method to determine the ME return distribution which is then utilized to extend Bollerslev's GARCH into autoregressive conditional skewness and kurtosis. Maasoumi and Racine [10] employ a metric entropy measure of dependence to examine the predictability of asset returns. Hang [11] uses the ME to determine flexible functional forms of regression functions subject to side conditions. Miller and Liu [12] propose a method to recover a joint distribution function by applying the KLCE distance while imposing a required degree of dependence through the joint moments. An example is the normal distribution which is completely characterized by first and second moments. In this case, the minimum KLCE distribution is the multivariate Normal distribution where the dependence is specified through conventional linear correlation.
There has been a great deal of interest in copulas, especially in financial economics, as they have the potential to model and explain asymmetric dependence between random variables separately from their marginal distributions. For example, Patton [13] employs various families of copulas to investigate the inter-relationship between univariate skewnesses, asymmetric dependence between asset returns, and the optimal portfolios of assets. Rodriguez [14] models financial contagion using copulas. Chollete, Heinen, and Valdesogo [15] propose a multivariate regime-switching copula to capture asymmetric dependence and regime-switching in portfolio selection. Ning, Xu, and Wirjanto [16] investigate asymmetric patterns in volatility clustering by employing a semi-parametric copula approach. Detailed discussions of various econometric aspects and applications of copulas in economics and finance can be found, for instance, in the survey papers by Patton [17] and Fan and Patton [18]. A comprehensive treatment of copula theory is presented in the monograph by Nelsen [19].
Given the broad context described above, we propose a theoretical framework to recover relative entropy measures of joint dependence from limited information by constructing a set of the most entropic copulas (MECs), which can essentially be done by maximizing Shannon entropy subject to constraints on the uniform marginal distributions and other constraints on the copula-based measures of dependence (or the distance between the MEC and an arbitrary nested copula). In the class of MECs, there exists a simplified form, namely the most entropic canonical copula (MECC). Moreover, it can be shown that the proposed MEC approach and the KLCE approach in Miller and Liu [12] are dual in the sense that they can recover the same joint distribution. Applications of MECs to economics include Chu [20], Dempster, Medova, and Yang [21], Friedman and Huang [22], Veremyev, Tsyurmasto, Uryasev, and Rockafellar [23], Zhao and Lin [24].
We shall now discuss the contributions of the current paper in relation to [20]. The similarity between the two papers is that rank correlations are employed as prior information about dependence in order to construct the MECC. This paper differs from [20] in several respects. First, in [20], Carleman's condition permits constraints on moments to be employed so as to ensure that the MEC satisfies all the properties of a copula while, in the present paper, constraints are explicitly imposed on marginal copula densities. Therefore the entropy maximization problem defined in [20] is merely a good approximation of the entropy maximization problem in this study. Second, the main problem in [20] is the standard entropy maximization problem, while the main problem in the present paper involves a continuum of constraints on the marginal distributions, which can be written as integrals with varying end-points that need to be smoothed out by using kernels. This kernel-smoother can generate MECs with smooth densities, whilst the discrete approximation technique proposed by [21] can only allow for MECs with discrete densities. The feasibility and benefits of the proposed approach to construct MECs will then be demonstrated through a Monte-Carlo simulation study presented in Section 3.
Although our analysis is restricted to the bivariate case, the multivariate case is a straightforward extension. The remainder of the paper is organized in three sections. In Section 2, we formulate and approximate most entropic copulas (MECs). Next, we discuss the link between the MEC and the minimum KLCE density and the extent to which the MEC is more flexible than the KLCE method. We then compute the MEC and the MECC subject to marginal constraints and other constraints on various copula-based dependence measures such as Spearman's rho and tau. We also outline the large sampling properties of the relevant parameter estimators. We present these results in Theorems 2.1-2.4. A simulation study is presented in Section 3, demonstrating that the MEC fits data well when compared with other competing procedures (e.g., parametric copulas and kernel estimators). Derivation of statistical properties for the proposed copula estimator is rather challenging and will be left for future research. Finally, to facilitate reading of this paper, we collect all materials of technical flavour into the three main appendices at the end of this paper.

Maximum Entropy and Copula
This section provides a brief explanation of entropy and copula. We refer to [25] for a comprehensive review of entropy econometrics and [19] for important results concerning copulas.
Shannon entropy has been used as an information criterion to construct the probability densities for economic or financial variables such as stock returns, income, GDP, etc. (see, inter alia, [26][27][28]). A univariate ME density is generally obtained by maximizing Shannon entropy, −∫ p(x) log p(x) dx, with respect to p(x) under probability and moment constraints. A bivariate ME density that is closest to a given reference density, say the product of two univariate densities, can be obtained by minimizing the KLCE under joint moment constraints (see, e.g., [1] and [12]):

minimize KLCE(f; g1g2) = ∫∫ f(x, y) log[f(x, y)/(g1(x)g2(y))] dy dx subject to ∫∫ h(x, y) f(x, y) dy dx = µ0,

where f is a bivariate density, g1 and g2 are some univariate densities, supp(g1) = {x ∈ R : g1(x) ≠ 0}, supp(g2) = {y ∈ R : g2(y) ≠ 0}, the integrals are taken over supp(g1) × supp(g2), and h is an arbitrary function such that µ0 < ∞. The copula was proposed by Sklar [29] as a method to construct joint distributions with given marginals. The advantage of copulas is that dependence between random variables can be parametrically specified entirely independently of their marginals. A bivariate copula is defined as a function C(·, ·) from [0, 1]² to [0, 1] with the following properties: (1) C(u, 0) = C(0, v) = 0, C(u, 1) = u, and C(1, v) = v for all u, v ∈ [0, 1]; (2) C(u2, v2) − C(u2, v1) − C(u1, v2) + C(u1, v1) ≥ 0 for all u1, u2, v1, v2 ∈ [0, 1] such that u1 ≤ u2 and v1 ≤ v2 (see, e.g., [19], p. 8). Note that Property (2) always holds if C(u, v) has a positive density c(u, v), and Property (1) implies that a copula is a function with Uniform[0,1] marginals. Sklar's theorem links a copula, C(u, v), to a joint distribution, F(X, Y), via F(X, Y) = C(G1(X), G2(Y)), where G1 and G2 are the marginals.
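The KLCE between a bivariate density and the product of two reference marginals can be checked numerically. The sketch below is our own illustration (the function names and the FGM-type example density are not from the paper): it approximates the KLCE on the unit square by a midpoint rule, and confirms that the KLCE is zero at the reference density and positive otherwise.

```python
import math

def klce(f, g, m=300):
    """Midpoint-rule approximation of the KLCE
    integral of f(x, y) * log(f(x, y) / g(x, y)) over the unit square."""
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        x = (i + 0.5) * h
        for j in range(m):
            y = (j + 0.5) * h
            fv = f(x, y)
            if fv > 0.0:  # 0 * log 0 = 0 by convention
                total += fv * math.log(fv / g(x, y)) * h * h
    return total

# reference density: product of two Uniform[0,1] marginals
product = lambda x, y: 1.0
# an FGM-type density with uniform marginals (illustrative, theta = 0.8)
fgm = lambda x, y: 1.0 + 0.8 * (1.0 - 2.0 * x) * (1.0 - 2.0 * y)
```

Since both densities here have uniform marginals, the positive KLCE value for the FGM-type density is entirely attributable to dependence, which is the point of the copula decomposition that follows.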
We shall use measures of association and rank correlations to construct the MEC, which we discuss next. Measures of association are, unlike joint moments, invariant under nonlinear transformations of the underlying random variables, and thus they are natural measures of dependence for non-elliptical random variables (see Appendix A for formal definitions of measures of association). A measure of association is, in general, defined as τ = ∫∫_[0,1]² h(u, v) dC(u, v), where h is a bivariate function such that |τ| < +∞. This measure, based on C, is also referred to as the copula-based measure of dependence. In practice, τ can be estimated by the rank statistic τ̂ = (1/N) Σ_{i=1}^{N} h(R_i/N, S_i/N), where (R_i, S_i) represents the ranks of (X_i, Y_i) in a sample of size N. An advantage of using rank statistics as nonparametric measures of nonlinear dependence is that they are robust, in the sense that they are insensitive to contamination and maintain a high efficiency for heavier tailed elliptical distributions as well as for multivariate normal distributions (see, e.g., [30] for a detailed treatment of rank statistics). Examples of τ include Spearman's rho and Blest's rank correlations (see, e.g., [31]), which are summarized in Table 1.
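The plug-in rank statistic (1/N) Σ h(R_i/N, S_i/N) is straightforward to compute directly. The sketch below is our own minimal implementation (ties broken by order of appearance); with h(u, v) = 12uv − 3 it yields a Spearman-type statistic close to +1 for comonotone data and close to −1 for countermonotone data.

```python
def ranks(values):
    """1-based ranks; ties are broken by order of appearance (illustrative)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def rank_statistic(x, y, h):
    """Plug-in rank statistic: (1/N) * sum of h(R_i/N, S_i/N)."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    return sum(h(rx[i] / n, ry[i] / n) for i in range(n)) / n

# Spearman's rho corresponds to h(u, v) = 12*u*v - 3
spearman_h = lambda u, v: 12.0 * u * v - 3.0
```

Note that the plug-in version uses R_i/N rather than R_i/(N + 1), so in finite samples it can slightly exceed the interval [−1, 1]; rescaling the ranks by N + 1 removes this small-sample bias.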

Table 1. Measures of association and their corresponding rank correlations: Spearman's rho and Blest's measures I, II, and III.

Nonetheless, it is worth mentioning that the definition of τ is somewhat restrictive since it does not include Kendall's tau, for example (we are indebted to a referee for pointing this out). Moreover, not every rank correlation can be formulated in terms of the above general rank statistic τ̂. For instance, the statistic R_g, which was proposed by Gideon and Hollister [32] as a coefficient of rank correlation resistant to outliers even in a small sample, has the form given in [32], where p_s is the value of S_i with the subscript i satisfying R_i = s, and [•] is the greatest integer notation. In addition, R_g estimates a copula-based measure of dependence. In the present paper, we use the bivariate Shannon entropy of a copula, given by

W(c) = −∫∫_[0,1]² c(u, v) log c(u, v) du dv. (2)

By Sklar's theorem, the Shannon entropy of a copula is then equivalent to the KLCE:

∫∫ f(x, y) log[f(x, y)/(g1(x)g2(y))] dy dx = ∫∫_[0,1]² c(u, v) log c(u, v) du dv = −W(c).

Hence, minimization of the KLCE and maximization of the bivariate Shannon entropy are dual problems. Let c(u, v) denote the MEC. Then, in view of [1], the relative entropy measure of dependence (recovered from limited information) is given by −W(c). Generally speaking, a multivariate Shannon entropy can be defined in an obvious way, and this dual relationship still holds. However, as pointed out in Friedman and Huang [22], the problem of maximizing a multivariate Shannon entropy of copulas can suffer from the curse of dimensionality because the number of constraints (on the marginal densities) needed for the MEC to satisfy all the properties of a copula increases as the problem involves more dimensions.
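The bivariate Shannon entropy W(c) of a copula density can be approximated by plain Monte Carlo over the unit square. The sketch below is a hypothetical illustration (function names are ours); it uses the independence copula and a Farlie-Gumbel-Morgenstern (FGM) density as test cases.

```python
import math, random

def copula_entropy_mc(c, n=200_000, seed=0):
    """Monte Carlo estimate of W(c) = -integral of c*log(c) over [0,1]^2,
    using uniform draws on the unit square."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        cv = c(rng.random(), rng.random())
        total += cv * math.log(cv)
    return -total / n

independence = lambda u, v: 1.0
# FGM copula density with theta = 0.5 (an illustrative choice)
fgm = lambda u, v: 1.0 + 0.5 * (1.0 - 2.0 * u) * (1.0 - 2.0 * v)
```

Consistent with the duality above, W(c) ≤ 0 with equality exactly at the independence copula, so −W(c) behaves as a nonnegative measure of dependence.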

The Most Entropic Copula
We assume for the rest of this paper that the MEC is a differentiable function so that its copula density exists. The bivariate MEC (or the MEC) is obtained by maximizing the bivariate Shannon entropy (2) under the following two constraints: (1) the marginals of c(u, v) are Uniform[0,1]; and (2) the measures of association, defined in Section 2.1, are set equal to the corresponding rank correlations. We call this Problem EM.
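The second type of constraint, matching a measure of association such as Spearman's rho, amounts to evaluating a functional of the candidate density, here ∫∫ (12uv − 3) c(u, v) du dv. A minimal numerical sketch, assuming a midpoint rule on a uniform grid (our own illustration, not the paper's algorithm):

```python
def spearman_functional(c, m=400):
    """Midpoint-rule approximation of the integral of (12uv - 3) * c(u, v)
    over [0,1]^2, i.e. the Spearman's rho implied by the density c."""
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        u = (i + 0.5) * h
        for j in range(m):
            v = (j + 0.5) * h
            total += (12.0 * u * v - 3.0) * c(u, v) * h * h
    return total

independence = lambda u, v: 1.0
# FGM density with theta = 0.6, whose Spearman's rho is theta/3 = 0.2
fgm = lambda u, v: 1.0 + 0.6 * (1.0 - 2.0 * u) * (1.0 - 2.0 * v)
```

Setting this functional equal to the rank correlation computed from the data is exactly what the dependence constraint in Problem EM enforces.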

Problem EM:
Maximize

W(c) = −∫∫_[0,1]² c(u, v) log c(u, v) du dv (3)

subject to

∫∫_[0,1]² c(u, v) du dv = 1, (4)

∫_0^u ∫_0^1 c(s, v) dv ds = u for all u ∈ [0, 1], (5)

∫_0^v ∫_0^1 c(u, t) du dt = v for all v ∈ [0, 1], (6)

∫∫_[0,1]² h(u, v; θ_N) c(u, v) du dv = 0, (7)

where (4) implies that c(u, v) is a joint density on the unit square; Equations (5) and (6) imply that the marginals of c(u, v) are Uniform[0,1] distributions; and Equation (7) imposes a constraint on the joint behavior of U and V. To give an example, let h(u, v; θ_N) = 12uv − 3 − ρ_S; then constraint (7) sets Spearman's rho equal to θ_N = ρ_S (note that, in what follows, we sometimes omit 'N' for brevity), the rank correlation associated with Spearman's rho. To give another example, suppose that the true data generating copula, say C_0(u, v), belongs to a family, C_0. Given this prior information, to recover a MECC from the data, one may choose a copula from C_0 whose dependence parameter is calibrated to τ̂, where τ̂ is an estimate of the difference between the probabilities of concordance and discordance (cf. Appendix A). By doing this, it is expected that some features of the family C_0 could be effectively incorporated into the MECC. Other examples of Equation (7) include Blest's coefficients, Gideon and Hollister's [32] coefficient, etc. Also note that we may have more than one constraint like (7). It is to be stressed at this point that some versions of the MEC problem may exhibit boundary solutions due to theoretical restrictions on the measures of dependence employed (e.g., the Hoeffding-Fréchet bounds on correlation statistics). Consequently, the large-sample theory stated in Section 2.3 below only holds for interior solutions to the stated problem (we are indebted to a referee for suggesting this point). For future reference, we shall write c(u, v) = c(u, v, Λ), where Λ is a vector of coefficients, for the MEC that solves Problem EM. The MECs (and accordingly the MECC) can then be approximated by replacing the continua of varying end-points in (5) and (6) by sets of definite integrals. We now present an approximate solution to Problem EM in Theorem 2.1 below. THEOREM 2.1.
The MEC, c(u, v), can be approximated by an approximator, c_{n,N_h}(u, v), of the exponential-family form given in Equations (8) and (9), where Λ_n = {λ_0, . . . , λ_{2^n −1}, γ_0, . . . , γ_{2^n −1}} contains the minimizers of the potential function Q_{n,N_h}(Λ, θ) in Equation (10). Here Φ(x) = (2π)^{−1/2} ∫_{−∞}^{x} exp{−y²/2} dy is the standard normal cdf (arising from smoothing the indicator functions, I(u ∈ [k2^{−n}, (k + 1)2^{−n}]), with the Gaussian kernel), and c̃(u, v) is an arbitrary copula (which may involve a nuisance parameter that needs to be estimated).

Proof:
The proof utilizes the standard method of Variational Calculus for maximization of functions in normed linear spaces (see, e.g., [33], p. 129). See Appendix D.
As we can see, the MEC density nests an arbitrary copula, c̃(u, v) (cf. Equation (9)). Indeed, the MEC depends on both b_0 and c̃(u, v); thus no uniqueness is obtained. However, we can obtain a canonical form, which is called the MECC, by setting b_0 to zero. This idea of a canonical model can be traced back to Jeffreys, who proposed to use the principle of simplicity for deductive inference: that is, for any given set of data, there is usually an infinite number of possible laws that will "explain" the data precisely, and the simplest model should be chosen.
It is also worth noting at this point that, like the empirical copula, the MECC is a valid distribution function; however, it satisfies the Uniform[0,1] marginal constraints only asymptotically. In addition the potential function Q n,N h (Λ, θ) in the above theorem is a multivariate convex function of Λ, which in general has a unique minimum because it is the product of (positive) univariate convex functions.
We can claim that the MECC, c(u, v), is equivalent to a maximum likelihood estimator (MLE). To verify this claim, note that, given a bivariate sample (X_i, Y_i) for i = 1, . . . , N, the average log-likelihood function implied by the exponential-family density in (8) coincides, up to a constant not involving Λ, with the negative of the potential function; the key step is that the ranks place an empirical mass of 2^{−n} in each bin [k2^{−n}, (k + 1)2^{−n}] for every k = 0, . . . , 2^n − 1. Hence, maximizing the average log-likelihood is equivalent to minimizing the potential function, and the claim has been verified.
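The building blocks of the approximate MEC density, Gaussian-smoothed indicator functions entering an exponential-family form, can be sketched as follows. This is a hypothetical parameterization for illustration only: the coefficient values are placeholders, not the minimizers of the potential function.

```python
import math

def norm_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def smoothed_indicator(u, k, n, h):
    """Gaussian-kernel smoothing of the indicator I(u in [k*2^-n, (k+1)*2^-n]);
    as the bandwidth h shrinks, this approaches the sharp indicator."""
    a, b = k * 2.0 ** -n, (k + 1) * 2.0 ** -n
    return norm_cdf((u - a) / h) - norm_cdf((u - b) / h)

def mec_density(u, v, lam, gam, theta, n, h):
    """Unnormalized exponential-family form of an approximate MEC density;
    lam, gam, theta are placeholder coefficients, not fitted values."""
    s = sum(l * smoothed_indicator(u, k, n, h) for k, l in enumerate(lam))
    s += sum(g * smoothed_indicator(v, k, n, h) for k, g in enumerate(gam))
    s += theta * (12.0 * u * v - 3.0)  # a Spearman-type dependence term
    return math.exp(s)
```

The exponential form guarantees positivity everywhere on the unit square; the marginal and dependence constraints are what the fitted Lagrange multipliers would enforce.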

REMARK 2.1.
To compute the MECC, we could use either a Monte-Carlo integration procedure or Gaussian quadratures to approximate the potential function (10) (see Appendix C for further details), and then employ a global optimization technique (for example, the stochastic search algorithm proposed by Csendes [34]) to minimize this function.
In general, we can also approximate c(u, v) by using a collection of equally-spaced partitions of the unit interval [0, 1] together with a high-order kernel smoothing of the indicator function. This is stated in Theorem 2.2. THEOREM 2.2. The MEC, c(u, v), can be approximated by an approximator, c_{L,h}(u, v), built from kernel-smoothed indicator functions, for some kernel function, K(•), in K_r(R), where K_r(R) is the space of symmetric, Lebesgue integrable kernel functions of order r (cf. Definition B.1), and Λ_L = {λ_1, . . . , λ_L, γ_1, . . . , γ_L} contains the minimizers of the corresponding potential function, for a given b_0 and c̃(u, v).

Proof:
The proof is very similar to that of Theorem 2.1, combined with Lemma B.1, so we omit the details here.

Large Sample Properties with Unknown Parameters of Dependence
The approximate MECC densities are members of a statistical exponential family parametrized by the Lagrange multipliers. Since the true parameters of dependence Θ_0 in (7) are unknown, a random sample of size N is used to form their consistent estimates Θ_N. Therefore, the sampling properties of Λ_N may be derived from the associated sampling properties of Θ_N. Let Q_n(Λ, Θ) represent the approximate potential function with the dependence parameters Θ as formulated in Section 2, where Λ_N and Λ_0 denote the minimizers of Q_n(Λ, Θ) for Θ = Θ_N and Θ = Θ_0 respectively. The Hessian matrices of Q_n(Λ, Θ) are H_{1,n}(Λ, Θ) = ∇_{ΛΛ} Q_n(Λ, Θ) and H_{2,n}(Λ, Θ) = ∇_{ΛΘ} Q_n(Λ, Θ). The following assumptions are maintained. AS1. The parameter space M of Θ is a non-empty and compact set, and the space N ⊂ R^{dim(Λ)} of Λ is also a non-empty and compact set, where dim(Λ) is the number of the Lagrange multipliers in Q_n(Λ, Θ); therefore, the number of marginal constraints is dim(Λ) − dim(M). AS2. The map from M to N is a diffeomorphism (i.e., one-to-one, continuous, and onto in both directions). AS3. Q_n(Λ, Θ) is a strictly convex function of Λ for all Θ and uniformly continuous (in probability) in Θ. AS4. The vector of dependence parameter estimates is asymptotically normal, N^{1/2}(Θ_N − Θ_0) →_d N(0, Ψ), where Ψ is the asymptotic variance-covariance matrix of Θ_N.
AS2 states that the relationship between M and N is one-to-one in both directions (i.e., for a given set of dependence parameter estimates Θ_N in M there exists a unique set of Lagrange multipliers Λ_N in N, which contains a unique subset of the Lagrange multipliers determining the dependence constraints). This assumption ensures that the potential function has uniquely minimal values for a given set of parameters; conversely, these minimal values are uniquely determined by a set of parameters. Regarding AS4, Θ_N may be a set of sample moments after N draws from the kernel densities constructed from actual data. If all the moments exist and Carleman's condition holds, then Θ_N are consistent, asymptotically normal estimates of Θ_0 (see, e.g., Härdle, Müller, Sperlich, and Werwatz [35]).

THEOREM 2.3.
In view of AS1-AS4, we obtain N^{1/2}(Λ_N − Λ_0) →_d N(0, H_{1,n}^{−1}(Λ_0, Θ_0) H_{2,n}(Λ_0, Θ_0) Ψ H_{2,n}(Λ_0, Θ_0)′ H_{1,n}^{−1}(Λ_0, Θ_0)). Proof: See Appendix D.
THEOREM 2.4. If H_{2,n}(Λ_0, Θ_0) = I, then N^{1/2}(Λ_N − Λ_0) →_d N(0, H_{1,n}^{−1}(Λ_0, Θ_0) Ψ H_{1,n}^{−1}(Λ_0, Θ_0)). Proof: Noting that H_{2,n}(Λ_0, Θ) = I, the proof follows directly from Theorem 2.3. Theorem 2.4 suggests that, in general, the efficiency of the estimators Λ_N can be improved by using more marginal constraints. However, adding too many marginal constraints can decrease efficiency since this may increase the probability that the covariances of {u, v, h(u, v, Θ_N)} in Q_n(Λ, Θ_N) are negative. In that case, the Hessian matrix H_{1,n}(Λ_0, Θ_0) contains some negative elements, which may cause the asymptotic variance of N^{1/2}(Λ_N − Λ_0) to increase overall. Theorems 2.3 and 2.4 can be used to develop tests of hypotheses about the "distance" between the MECC and another copula of the exponential family.

Simulation
In this section, we perform some simulations to investigate the finite-sample properties of the MECC approximators proposed above. We shall address three main issues in these simulations. First, the MECC can outperform the parametric copulas used in this study (the Gaussian copula, Student's t copula, the Clayton copula, and the Gumbel copula) while its performance remains comparable to other nonparametric estimators (i.e., the "shrinked" local linear (LLS) type kernel copula estimator and the "shrinked" mirror-reflection (MRS) kernel copula estimator proposed by Omelka, Gijbels, and Veraverbeke [36]). Second, an increase in the number of marginal constraints leads to an improvement in the performance of the MECC. Third, the MECC, for the most part, becomes as stable as other parametric copulas as more marginal constraints are utilized.
To accomplish the above objectives, we choose Frank's copula,

C(u, v; θ) = −(1/θ) log[1 + (e^{−θu} − 1)(e^{−θv} − 1)/(e^{−θ} − 1)],

where θ ∈ (−∞, ∞)\{0}, as the true model from which samples are generated. (See [37,38] for the statistical properties of Frank's copula.) This copula is radially symmetric and close to the independence copula as θ approaches the origin, i.e., lim_{θ→0} C(u, v; θ) = uv. Later, we shall use two values, 0.1 and 0.8, for the true parameter θ; these values, roughly speaking, correspond to the close-to-independence case and the weak-dependence case respectively.
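Samples from Frank's copula can be drawn by conditional inversion, i.e., by solving ∂C(u, v; θ)/∂u = w for v given uniform draws u and w. A minimal sketch (function name and seed are our own choices):

```python
import math, random

def frank_sample(theta, size, seed=42):
    """Draw (u, v) pairs from Frank's copula by conditional inversion:
    u, w ~ Uniform[0,1]; v solves dC(u, v; theta)/du = w."""
    rng = random.Random(seed)
    out = []
    for _ in range(size):
        u, w = rng.random(), rng.random()
        # closed-form inverse of the conditional cdf C(v | u)
        num = w * (math.exp(-theta) - 1.0)
        den = math.exp(-theta * u) * (1.0 - w) + w
        v = -math.log(1.0 + num / den) / theta
        out.append((u, v))
    return out
```

For θ > 0 the draws exhibit positive dependence, which can be checked through the sample covariance of the uniform coordinates.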
The simulation procedure is outlined as follows. First, we generate 100 samples of 5000 observations from Frank's copula for each value of θ. With these samples in hand, we estimate the four commonly-used parametric copulas mentioned above by the MLE method. We also estimate 12 MECCs (that is, MECC(L, M) with combinations of L = 4, 16, 64 marginal constraints and M = 1, 2, 3, 4 joint moment constraints) by our proposed method. To gauge the errors of these estimators, we use the integrated mean squared error, IMSE = E ∫∫_[0,1]² [ĉ(u, v) − c(u, v; θ)]² du dv, where c(u, v; θ) is the density of Frank's copula and ĉ(u, v) represents an estimate using one of the above-mentioned parametric copulas or a MECC. Next, for each copula, we use the 100 samples of 5000 observations drawn from Frank's copula to estimate the squared bias and the variance (as functions of u and v), where the empirical mean is calculated using the 100 samples. Both the integrated squared bias (Int. Bias 2) and the integrated variance (Int. Var.) are then obtained by averaging the estimated squared bias and variance over a sample of 10000 points {(u_i, v_i)} drawn from the Uniform[0,1] distribution, at which both c(u, v; θ) and ĉ(u, v) are evaluated. To gauge the errors of the nonparametric copula estimators, we use the expressions for the asymptotic bias and variance given in [36,39]; the optimal bandwidth is obtained by minimizing the integrated asymptotic MSE [39]. We report our simulation results in Table 2.
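The Int. Bias 2 / Int. Var. / IMSE decomposition over random evaluation points can be sketched as follows; the replicated estimators are passed in as callables (a simplified stand-in for the 100 fitted copula densities in the design above):

```python
import random

def error_decomposition(true_c, estimates, n_points=10_000, seed=1):
    """Monte Carlo analogues of Int. Bias 2, Int. Var. and IMSE: average,
    over uniform evaluation points, the squared bias and the variance of
    the replicated density estimates (IMSE = Int. Bias 2 + Int. Var.)."""
    rng = random.Random(seed)
    r = len(estimates)
    int_bias2 = int_var = 0.0
    for _ in range(n_points):
        u, v = rng.random(), rng.random()
        vals = [c_hat(u, v) for c_hat in estimates]
        mean = sum(vals) / r
        int_bias2 += (mean - true_c(u, v)) ** 2
        int_var += sum((x - mean) ** 2 for x in vals) / r
    int_bias2 /= n_points
    int_var /= n_points
    return int_bias2, int_var, int_bias2 + int_var
```

The additive decomposition IMSE = Int. Bias 2 + Int. Var. holds pointwise when the bias is computed against the empirical mean of the replicated estimates, which is the convention used here.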
First, it can be noticed from Table 2 that the MECCs significantly outperform the elliptical copulas (i.e., the Normal copula and Student's t copula) in terms of Int. Bias 2 and IMSE. However, with a small number of marginal constraints the MECCs are mostly less stable than the other parametric copulas; the only way to improve the stability (Int. Var.) of the MECCs is to increase the number of marginal constraints. For the close-to-independence case (θ = 0.1), the asymmetric copulas (i.e., the Clayton copula and the Gumbel copula) outperform the MECCs. The intuition for these asymmetric copulas having small Int. Bias 2 and Int. Var. is that Frank's copula, the Clayton copula, and the Gumbel copula all behave like the independence copula for θ = 0.1. It is also interesting to note that the MECCs often outperform the LLS and MRS estimators in terms of Int. Bias 2, whilst these nonparametric estimators outperform the MECCs in terms of Int. Var. The reason for the existence of non-zero Int. Bias 2 in the LLS and MRS estimators is that the optimal bandwidth (being shrunk close to zero at the corners of the unit square) can keep the bias bounded, but does not completely remove it.
Note: MECC(L, M) denotes the MECC estimated by using L marginal constraints and M moment constraints. All the figures are rounded to four decimal places. LLS is the "shrinked" version of the local linear-type kernel estimator of a copula [36,39]. MRS is the "shrinked" version of the mirror-reflection kernel estimator of a copula [36,40].
Second, when θ = 0.8 the data become more dependent, leading to a significant increase in Int. Bias 2 in the estimation of the Clayton copula and the Gumbel copula using samples drawn from Frank's copula. In this case, MECC(4,1), MECC(16,1), MECC(64,1), MECC(4,2), MECC(64,2), and MECC(64,3) all show significant improvements in Int. Bias 2 over all the other estimators. It is also important to note at this point that, for a fixed number of marginal constraints, Int. Bias 2 and Int. Var. tend to deteriorate as one increases the number of joint moment constraints. To ameliorate this, it suffices to increase the number of marginal constraints as one adds each additional joint moment constraint to the MEC problem. Indeed, as shown in Table 2, with one joint moment constraint, one merely needs four marginal constraints to yield MECC(4,1) with minimum Int. Bias 2 and IMSE; meanwhile, with two joint moment constraints, one needs up to 64 marginal constraints to yield MECC(64,2) with minimum Int. Bias 2, Int. Var., and IMSE. Our final observation is that, for a fixed number of moment constraints, an increase in the number of marginal constraints always leads to a significant reduction in Int. Var.
Finally, to check the general validity of the obtained simulation results, we also replicate the above simulation study using data generated from Clayton copulas. Table 3 shows that the good performance of the MECCs relative to other copula estimators carries over to this case when a sufficient number of marginal constraints is used.
Note: MECC(L, M) denotes the MECC estimated by using L marginal constraints and M moment constraints. All the figures are rounded to four decimal places. LLS is the "shrinked" version of the local linear-type kernel estimator of a copula [36,39]. MRS is the "shrinked" version of the mirror-reflection kernel estimator of a copula [36,40].

Conclusions
We propose to employ the entropy-maximization principle to recover copulas from limited information regarding contemporaneous dependence between random variables. The main results of this article are twofold. First, we provide an entropy approach to recover relative entropy measures of joint dependence that are independent of marginal distributions by constructing most entropic copulas (MECs) and, in particular, their canonical forms, namely most entropic canonical copulas (MECCs). Second, as a consequence of the MEC, we can construct ME joint distributions with a fixed dependence structure given by a MEC. Our method is shown to incorporate the approach of Miller and Liu [12] and can handle both moment-based and copula-based measures of dependence. Simulation results confirm that the accuracy of the approximate MECC can effectively be improved by increasing the number of side constraints.
A. Measures of Association

τ = P[(X_1 − X_2)(Y_1 − Y_2) > 0] − P[(X_1 − X_2)(Y_1 − Y_2) < 0],

where (X_1, Y_1) and (X_2, Y_2) are independent vectors of continuous random variables with joint distributions F_1(X, Y) and F_2(X, Y) respectively, which have common marginals G_1(X) (of X_1, X_2) and G_2(Y) (of Y_1, Y_2). When (X_1, Y_1) and (X_2, Y_2) have the same joint distribution function F(X, Y), τ is Kendall's tau (τ_K). The other measures of dependence, such as Spearman's rho and Gini's gamma, can be defined similarly.
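Kendall's tau, the same-distribution special case of τ above, has a direct sample analogue based on concordant and discordant pairs; a minimal sketch (O(N²) pairwise scan, function name ours):

```python
def kendall_tau(pairs):
    """Sample Kendall's tau: (concordant - discordant) / total pairs;
    ties contribute zero (a simple O(N^2) illustration)."""
    n = len(pairs)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            d = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            s += (d > 0) - (d < 0)  # sign of the concordance indicator
    return 2.0 * s / (n * (n - 1))
```

Being a function of pairwise concordance only, this statistic depends on the data solely through the ranks, consistent with its copula-based definition.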

B. Auxiliary
where ‖x‖ is the Euclidean norm of x. Let f(x) be another function on R^n such that |f(x)| < ∞. Then, at every point, x_0, of continuity of f, Φ_N(y, x) has the following properties, where δ(•) is Dirac's delta function (see [44], p. 30).
We now present a Quasi-Newton algorithm to minimize (C5).
To compute the MECC, we used a stochastic search algorithm to minimize (C5) whilst setting M = N = 30.

Proof of Theorem 2.3:
For all Θ_T ∈ M, Q_n(Λ, Θ_T) has a unique finite supremum for all T, in view of AS1 and AS2. AS2 and Θ_T →_p Θ_0 imply that Prob{Λ_T ∈ ∂N} → 0. Thus, Q_n(Λ, Θ_T) has a unique interior supremum Λ_T for sufficiently large T. Let Λ_0 denote the unique supremum of