Nonparametric Estimation of Conditional Copula using Smoothed Checkerboard Bernstein Sieves

Conditional copulas are useful tools for modeling the dependence between multiple response variables that may vary with a given set of predictor variables. Conditional dependence measures such as conditional Kendall's tau and Spearman's rho, which can be expressed as functionals of the conditional copula, are often used to evaluate the strength of dependence conditional on the covariates. In general, semiparametric estimation methods for conditional copulas rely on an assumed parametric copula family in which the copula parameter is a function of the covariates. This functional relationship can be estimated nonparametrically using different techniques, but an appropriate copula model must still be chosen from various candidate families. In this paper, by employing the empirical checkerboard Bernstein copula (ECBC) estimator, we propose a fully nonparametric approach for estimating conditional copulas that does not require the selection of a parametric copula model. Closed-form estimates of the conditional dependence measures are derived directly from the proposed ECBC-based conditional copula estimator. We establish the large-sample consistency of the proposed estimator as well as of the estimates of the conditional dependence measures. The finite-sample performance of the proposed estimator and a comparison with semiparametric methods are investigated through simulation studies. An application to a real data set is also provided.


Introduction
Copulas have found many applications in finance, insurance, system reliability and other fields, owing to their utility in modeling the dependence among variables (see, e.g., Nelsen (2007), Jaworski et al. (2010) and Joe (2014) for details about copulas and their applications). In some situations, the dependence structure between variables can be influenced by a set of covariates, and it is then of interest to understand how the dependence changes with the values of the covariates. For instance, the life expectancies at birth of males and females in a country are often highly dependent because of shared economic or environmental factors, and the strength of this dependence may itself vary with those factors. When the covariate is binary or discrete-valued with few levels, one can estimate a separate copula for each level of the covariate. In contrast, the influence of a continuous-valued covariate on the dependence structure should be formulated in a functional way, and this is where conditional copulas (Patton (2006a); Patton (2006b)), along with the corresponding conditional versions of dependence measures, come into play.
Suppose we are interested in the dependence among the components of a random vector $Y = (Y_1, Y_2, \ldots, Y_d)$, given covariates $X = (X_1, X_2, \ldots, X_p)$. The conditional joint and marginal distributions of $Y$ given $X = x$ can be denoted as
$$H_x(y_1, \ldots, y_d) = P(Y_1 \le y_1, \ldots, Y_d \le y_d \mid X = x) \tag{1}$$
and
$$F_{jx}(y_j) = P(Y_j \le y_j \mid X = x), \quad j = 1, \ldots, d. \tag{2}$$
If $F_{1x}, F_{2x}, \ldots, F_{dx}$ are continuous, then by an extension of the well-known Sklar's theorem (Sklar (1959)) to conditional distributions (see, e.g., Patton (2006b)), there exists a unique copula $C_x$ such that
$$H_x(y_1, \ldots, y_d) = C_x\big(F_{1x}(y_1), \ldots, F_{dx}(y_d)\big), \tag{3}$$
and the function $C_x$ is called a conditional copula; it captures the conditional dependence structure of $Y$ given $X = x$. The focus of this paper is on continuous-valued responses and covariates. Thus, in what follows, we assume that the conditional marginal CDFs $F_{jx}$, $j = 1, \ldots, d$, and the CDFs of each response and covariate are absolutely continuous.
The literature contains a variety of parametric families for modeling copulas. Commonly used families include Archimedean and elliptical copulas; see, e.g., Žežula (2009) and Joe (2014). Assuming that the conditional copula belongs to a parametric copula family whose parameter is a function of the covariate(s), previous work has addressed the estimation of conditional copulas in a semiparametric setting.
Among frequentist methods based on an assumed parametric class, Acar et al. (2011) propose to estimate the functional relationship between the copula parameter and the covariate nonparametrically using a local likelihood approach. They assume known marginals, and the maximization is conducted for a fixed value $x$ of the covariate: writing $\theta(x) = g^{-1}(\eta(x))$ for a link function $g$ and approximating $\eta(X_i) \approx \beta_0 + \beta_1 (X_i - x) + \cdots + \beta_q (X_i - x)^q$ for $X_i$ near $x$, the local log-likelihood
$$\sum_{i=1}^{n} \ell\big(g^{-1}(\beta_0 + \beta_1 (X_i - x) + \cdots + \beta_q (X_i - x)^q); U_{i1}, U_{i2}\big)\, K_h(X_i - x)$$
is maximized over $(\beta_0, \ldots, \beta_q)$, where $\ell(\theta; u_1, u_2)$ is the copula log-density, $K_h$ is a kernel with bandwidth $h$, and $\hat\theta(x) = g^{-1}(\hat\beta_0)$. To identify the entire function relating the copula parameter to the covariate, this maximization problem must be solved on a sufficiently fine grid of values within the range of the covariate. Abegaz et al. (2012) extend this work to the more general setting of unknown marginals and apply a two-stage technique that has been widely adopted in copula estimation: in the first stage, nonparametric estimates of the conditional marginals are obtained using a kernel-based method; in the second stage, these estimates are plugged in and the functional link is estimated by maximizing the pseudo log-likelihood. As alternative estimation methods for the functional relationship, Vatter and Chavez-Demoulin (2015) develop generalized additive models for conditional dependence structures, and Fermanian and Lopez (2018) introduce so-called single-index copulas.
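To make the local likelihood idea concrete, the following Python sketch fits a local-constant version (polynomial degree 0, identity link) of the kernel-weighted likelihood for the Clayton family at a single covariate value. The Gaussian kernel, the bandwidth, the known uniform marginals and the grid search over θ are our own simplifying assumptions, and the function names are hypothetical; this is not the implementation of Acar et al. (2011).

```python
import numpy as np

def clayton_logpdf(u1, u2, theta):
    """Log-density of the Clayton copula for theta > 0."""
    s = u1 ** (-theta) + u2 ** (-theta) - 1.0
    return (np.log1p(theta) - (1.0 + theta) * (np.log(u1) + np.log(u2))
            - (2.0 + 1.0 / theta) * np.log(s))

def local_constant_theta(x0, X, U1, U2, h, theta_grid):
    """Kernel-weighted (local-constant) likelihood estimate of theta(x0).

    Each observation's log-density is weighted by a Gaussian kernel
    K_h(X_i - x0); the weighted log-likelihood is maximized over theta_grid.
    """
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)
    loglik = [np.sum(w * clayton_logpdf(U1, U2, t)) for t in theta_grid]
    return theta_grid[int(np.argmax(loglik))]

# Example: data with theta(x) = exp(0.8 x - 2) and known uniform margins
rng = np.random.default_rng(1)
n = 2000
X = rng.uniform(0.0, 3.0, n)
theta = np.exp(0.8 * X - 2.0)
U1, W = rng.uniform(size=n), rng.uniform(size=n)
# Conditional inversion: U2 solves dC(U1, U2)/dU1 = W for the Clayton copula
U2 = (U1 ** (-theta) * (W ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
grid = np.linspace(0.05, 3.0, 296)
theta_hat = local_constant_theta(2.0, X, U1, U2, h=0.3, theta_grid=grid)
print(theta_hat, np.exp(0.8 * 2.0 - 2.0))  # local estimate vs. true theta(2)
```

Repeating the fit over a grid of `x0` values traces out the whole estimated function, which is exactly the grid-based cost described above.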
However, misspecification of the copula family can lead to severely biased estimates even when a sophisticated and flexible parametric model is employed (see, e.g., Geerdens et al. (2018)), so an appropriate copula model must be selected from a large number of candidate families. To this end, many copula selection techniques have been proposed in either a frequentist or a Bayesian setting; e.g., Acar et al. (2011) select the copula family based on cross-validated prediction errors, while the deviance information criterion (DIC) is used to choose the copula in Craiu and Sabeti (2012).
Acknowledging the limitations of parametric copula models mentioned above, fully nonparametric approaches have also been proposed for conditional copula estimation. Gijbels et al. (2011) suggest empirical estimators for conditional copulas in which the weights are smoothed over the covariate space through kernel-based methods. They further derive nonparametric estimates of conditional dependence measures, including conditional Kendall's tau and conditional Spearman's rho. Since bandwidth selection is crucial for any smoothing method, they also develop an algorithm for selecting the bandwidths. The asymptotic properties of these estimators, together with those of the conditional dependence measure estimates, are established in Veraverbeke et al. (2011). Gijbels et al. (2012) further consider more complex covariates such as multivariate covariates, and box-type conditioning events are studied in Derumigny and Fermanian (2020). On the other hand, there has been recent work on Bayesian nonparametric estimation of conditional copulas. Leisen et al. (2017) introduce the effect of a covariate into the Bayesian infinite mixture models proposed by Wu et al. (2014). However, the large-sample asymptotic properties of these Bayesian models remain largely unexplored.
In this paper, we focus on the nonparametric estimation of conditional copulas and show that it can be carried out in a relatively simple way by employing the empirical checkerboard Bernstein copula (ECBC) estimator. When the covariates are continuous-valued, the main idea is to first estimate the full copula of the responses together with the covariates and then take partial derivatives with respect to the covariates to obtain the conditional distribution of the responses given the covariates. As a fully nonparametric approach, it does not require selecting a copula family, which is a key step in semiparametric methods for avoiding the adverse consequences of model misspecification. Unlike the kernel-based empirical estimators, it requires no bandwidth selection either, which makes it easy to implement in practice. The proposed ECBC-based conditional copula estimator immediately yields nonparametric estimates of the conditional dependence measures, which can be expressed compactly using matrix operations. The large-sample consistency of the proposed estimator is also established.

Models for Conditional Copula
In the following, we focus for simplicity on the bivariate conditional copula of $(Y_1, Y_2)$ with a single covariate $X$; the extension to more than two responses and to multiple covariates is straightforward. As suggested by Gijbels et al. (2011), it is often favorable to remove the effect of the covariate on the marginal distributions before estimating $C_x$. To do so, we transform the original observations $(Y_{i1}, Y_{i2})$ into the marginally uniformly distributed (unobserved) samples
$$(U_{i1}, U_{i2}) = \big(F_{1X_i}(Y_{i1}), F_{2X_i}(Y_{i2})\big), \quad i = 1, \ldots, n, \tag{4}$$
which can be estimated by the pseudo-observations
$$(\hat U_{i1}, \hat U_{i2}) = \big(\hat F_{1X_i}(Y_{i1}), \hat F_{2X_i}(Y_{i2})\big), \quad i = 1, \ldots, n, \tag{5}$$
where $\hat F_{1x}$ and $\hat F_{2x}$ are the estimated conditional marginal distributions.
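Motivated by Janssen et al. (2016), who apply the empirical Bernstein estimator of a bivariate copula derivative to conditional distribution estimation with a single covariate, we use the multivariate copula estimator ECBC to estimate the conditional marginal distributions of $Y_1$ and $Y_2$ given $X = x$. Specifically, for $j = 1, 2$, consider the pseudo-observations $\big(F_{nY_j}(Y_{ij}), F_{nX}(X_i)\big)$, $i = 1, \ldots, n$, where $F_{nY_j}$ and $F_{nX}$ are the modified empirical estimates of the (unconditional) marginal distributions $F_{Y_j}$ and $F_X$, respectively, e.g., $F_{nX}(x) = \frac{1}{n+1}\sum_{i=1}^{n} I(X_i \le x)$. These pseudo-observations can then be treated as a sample from a 2-dimensional copula $C_j(u, v)$, which is estimated by the ECBC estimator
$$\hat C_j(u, v) = E\Big[\sum_{h=0}^{l}\sum_{k=0}^{m} C^{\#}_{jn}\Big(\frac{h}{l}, \frac{k}{m}\Big) P_{l,h}(u)\, P_{m,k}(v)\Big],$$
where $P_{l,h}(u) = \binom{l}{h} u^{h}(1-u)^{l-h}$, $C^{\#}_{jn}$ is the empirical checkerboard copula, and the expectation is taken over the empirical prior distribution of the Bernstein degrees $(l, m)$. Then the partial derivative $C^{(1)}_j(u \mid v) = \partial C_j(u, v)/\partial v$ can be estimated by
$$\hat C^{(1)}_j(u \mid v) = E\Big[\sum_{h=0}^{l}\sum_{k=0}^{m} C^{\#}_{jn}\Big(\frac{h}{l}, \frac{k}{m}\Big) P_{l,h}(u)\, P'_{m,k}(v)\Big],$$
where $P'_{m,k}(v)$ is the derivative of $P_{m,k}(v)$ with respect to $v$. Notice that the conditional marginal distribution function of $Y_j$ given $X = x$ and the partial derivative $C^{(1)}_j$ are related by
$$F_{jx}(y) = C^{(1)}_j\big(F_{Y_j}(y) \mid F_X(x)\big).$$
Thus, we can estimate the conditional marginal distributions by $\hat F_{jx}(y) = \hat C^{(1)}_j\big(F_{nY_j}(y) \mid F_{nX}(x)\big)$ for $j = 1, 2$, and the corresponding pseudo-observations $(\hat U_{i1}, \hat U_{i2})$, $i = 1, \ldots, n$, of the conditional copula $C_x$, adjusted for the effect of the covariate on the marginal distributions, are then obtained as in (5).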
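The covariate-adjustment step can be sketched in Python for a single pair of Bernstein degrees $(l, m)$: evaluate the empirical checkerboard copula on the $(h/l, k/m)$ grid, differentiate the Bernstein polynomial in $v$, and evaluate the result at $(F_{nY}(y), F_{nX}(x))$. This sketch assumes no ties in the data, does not implement the averaging over the empirical prior of $(l, m)$ that ECBC uses, and its function names are hypothetical.

```python
import numpy as np
from math import comb

def bernstein_basis(deg, t):
    """Vector of P_{deg,h}(t) = C(deg,h) t^h (1-t)^(deg-h), h = 0..deg."""
    h = np.arange(deg + 1)
    coef = np.array([comb(deg, k) for k in range(deg + 1)], dtype=float)
    return coef * t ** h * (1.0 - t) ** (deg - h)

def checkerboard_grid(y, x, l, m):
    """Empirical checkerboard copula of (Y, X) on the grid (h/l, k/m).

    Assumes no ties: observation i spreads mass 1/n uniformly over the cell
    ((R_i - 1)/n, R_i/n] x ((S_i - 1)/n, S_i/n], where R_i, S_i are ranks.
    """
    n = len(y)
    R = np.argsort(np.argsort(y)) + 1
    S = np.argsort(np.argsort(x)) + 1
    A = np.clip(n * np.arange(l + 1)[None, :] / l - R[:, None] + 1, 0.0, 1.0)
    B = np.clip(n * np.arange(m + 1)[None, :] / m - S[:, None] + 1, 0.0, 1.0)
    return A.T @ B / n  # shape (l + 1, m + 1)

def conditional_cdf(y0, x0, y, x, l=10, m=10):
    """Estimate F_{Y|X=x0}(y0) = C^(1)(F_nY(y0) | F_nX(x0)) for fixed (l, m)."""
    n = len(y)
    w = np.sum(y <= y0) / (n + 1)  # rescaled empirical CDF of Y at y0
    v = np.sum(x <= x0) / (n + 1)  # rescaled empirical CDF of X at x0
    G = checkerboard_grid(y, x, l, m)
    # d/dv sum_k G[h,k] P_{m,k}(v) = m * sum_{k<m} (G[h,k+1] - G[h,k]) P_{m-1,k}(v)
    dG = m * (G[:, 1:] - G[:, :-1])
    return float(bernstein_basis(l, w) @ dG @ bernstein_basis(m - 1, v))

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x + 0.5 * rng.normal(size=500)
print(conditional_cdf(0.0, -1.0, y, x), conditional_cdf(0.0, 1.0, y, x))
```

Because the checkerboard copula is Lipschitz, each difference `G[h, k+1] - G[h, k]` lies in `[0, 1/m]`, so the estimate is automatically a value in `[0, 1]` and non-decreasing in `y0`.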
Now we use the covariate-adjusted pseudo-observations $(\hat U_{i1}, \hat U_{i2})$, $i = 1, \ldots, n$, together with the pseudo-observations $\hat V_i = F_{nX}(X_i)$ of the covariate, to estimate the 3-dimensional copula $C(u_1, u_2, v)$, again using ECBC, and denote the estimate by $C^{\#}(u_1, u_2, v)$. As in the bivariate case, the partial derivative of $C^{\#}$ with respect to $v$ is
$$C^{\#(1)}(u_1, u_2 \mid v) = E\Big[\sum_{h_1=0}^{l_1}\sum_{h_2=0}^{l_2}\sum_{k=0}^{m} C^{\#}_{n}\Big(\frac{h_1}{l_1}, \frac{h_2}{l_2}, \frac{k}{m}\Big) P_{l_1,h_1}(u_1)\, P_{l_2,h_2}(u_2)\, P'_{m,k}(v)\Big],$$
where $C^{\#}_n$ is the 3-dimensional empirical checkerboard copula. We could use $C^{\#(1)}(u_1, u_2 \mid F_{nX}(x))$ directly as an estimate of the conditional copula $C_x$; however, $C^{\#(1)}(u_1, u_2 \mid v)$ is a valid bivariate copula for every $v \in [0, 1]$ only asymptotically, because its conditional marginal distributions need not be uniform in finite samples. To obtain a more accurate estimate of the conditional copula in small samples, we consider the conditional marginal distributions of $C^{\#(1)}(u_1, u_2 \mid v)$,
$$F_1(u_1 \mid v) = C^{\#(1)}(u_1, 1 \mid v)$$
and
$$F_2(u_2 \mid v) = C^{\#(1)}(1, u_2 \mid v).$$
By Sklar's theorem, we then obtain a conditional copula estimator that is itself a genuine copula,
$$C^{\#}(u_1, u_2 \mid v) = C^{\#(1)}\big(F_1^{-1}(u_1 \mid v), F_2^{-1}(u_2 \mid v) \mid v\big),$$
where $F_1^{-1}(u_1 \mid v)$ and $F_2^{-1}(u_2 \mid v)$ are the inverse functions of $F_1$ and $F_2$, respectively. Since $C^{\#}(u_1, u_2 \mid v)$ is a valid copula for any value of $v \in [0, 1]$, the conditional copula $C_x$ can be estimated by
$$C^{\#}_x(u_1, u_2) = C^{\#}\big(u_1, u_2 \mid F_{nX}(x)\big).$$
Theorem 1. Assume that the underlying trivariate copula $C(u_1, u_2, v)$ is absolutely continuous and that the conditional copula $C^{(1)}(u_1, u_2 \mid v) = \partial C(u_1, u_2, v)/\partial v$ is Lipschitz continuous on $[0, 1]^3$. Then, for any fixed $0 < v < 1$, $C^{\#}(u_1, u_2 \mid v)$ is consistent for $C^{(1)}(u_1, u_2 \mid v)$ in the conditional supremum norm as $n \to \infty$, where the expectation in the ECBC estimator is taken with respect to the empirical prior distribution of $l_1$, $l_2$, and $m$ as given for ECBC.
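The Sklar step that turns $C^{\#(1)}(\cdot, \cdot \mid v)$ into a genuine copula can be sketched in Python as follows, assuming only that the estimated conditional joint CDF at the fixed covariate value is available as a function `H(u1, u2)` on the unit square. Plain bisection stands in for R's uniroot here, and the function names are hypothetical.

```python
def generalized_inverse(F, u, tol=1e-12, max_iter=200):
    """inf{t in [0, 1]: F(t) >= u} for a non-decreasing F on [0, 1], by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(max_iter):
        if hi - lo < tol:
            break
        mid = 0.5 * (lo + hi)
        if F(mid) >= u:
            hi = mid  # mid is feasible, so the infimum is at most mid
        else:
            lo = mid
    return hi

def copula_from_conditional_cdf(H, u1, u2):
    """Return C(u1, u2) = H(F1^{-1}(u1), F2^{-1}(u2)), where the margins
    F1(t) = H(t, 1) and F2(t) = H(1, t) of H need not be uniform."""
    t1 = generalized_inverse(lambda t: H(t, 1.0), u1)
    t2 = generalized_inverse(lambda t: H(1.0, t), u2)
    return H(t1, t2)

# Toy joint CDF with margins t^2 and t^3 and independent components
H = lambda a, b: a ** 2 * b ** 3
print(copula_from_conditional_cdf(H, 0.3, 0.6))  # approximately 0.18 = 0.3 * 0.6
```

In the toy example, rescaling the non-uniform margins recovers the independence copula, which mirrors how the inversion restores uniform conditional margins for the ECBC-based estimate.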
Remark: Following Lu and Ghosh (2023), the empirical prior distributions of $l_1$, $l_2$, and $m$ are the hierarchical shifted Poisson distributions proposed for ECBC. The choice of these priors is motivated by the asymptotic theory of empirical checkerboard copula methods (Janssen et al. (2014)). Sample-size-dependent or, more generally, data-dependent priors have been used extensively in the literature (see, e.g., Wasserman (2000) and Parrado-Hernández et al. (2012)) and have been shown to yield desirable asymptotic properties of the posterior distributions.
Next, by extending the dependence measures given in Schweizer et al. (1981) to conditional versions, we estimate conditional dependence measures (e.g., conditional Spearman's rho and conditional Kendall's tau) using the estimator $C^{\#(1)}(u_1, u_2 \mid v)$.
For instance, the estimate of conditional Kendall's tau takes the form
$$\hat\tau(v) = 4\int_0^1\!\!\int_0^1 C^{\#(1)}(u_1, u_2 \mid v)\, dC^{\#(1)}(u_1, u_2 \mid v) - 1,$$
and the estimate of conditional Spearman's rho is given by
$$\hat\rho(v) = 12\int_0^1\!\!\int_0^1 C^{\#(1)}(u_1, u_2 \mid v)\, dF_1(u_1 \mid v)\, dF_2(u_2 \mid v) - 3,$$
where $F_1$ and $F_2$ are the conditional marginal distributions of $C^{\#(1)}(u_1, u_2 \mid v)$ defined above. Let us denote Then we can rewrite the estimator $C^{\#(1)}(u_1, u_2 \mid v)$ and its conditional marginal distributions as and respectively. As a result, a closed-form estimate of conditional Kendall's tau takes the form where $B$ is the beta function. Similarly, we obtain a closed-form estimate of conditional Spearman's rho. To compute the estimates of the conditional dependence measures more efficiently, we apply matrix operations to the tensor products in expressions (24) and (25).
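As a numerical cross-check of such formulas, Kendall's tau and Spearman's rho of any smooth copula can be approximated directly from the definitions, using $\tau = 4\int\!\!\int C\,dC - 1 = 1 - 4\int\!\!\int \partial_1 C\,\partial_2 C\,du_1\,du_2$ and $\rho = 12\int\!\!\int C\,du_1\,du_2 - 3$. The Python sketch below uses a midpoint grid and central differences; for a conditional measure one would pass `C = lambda a, b: Chat(a, b, v)` for some fixed `v`, where `Chat` is a hypothetical implementation of the genuine-copula estimator.

```python
import numpy as np

def dependence_measures(C, N=200, eps=1e-5):
    """Kendall's tau and Spearman's rho of a smooth copula C(u1, u2) that
    accepts numpy arrays, via the midpoint rule on an N x N grid."""
    g = (np.arange(N) + 0.5) / N
    U1, U2 = np.meshgrid(g, g, indexing="ij")
    rho = 12.0 * C(U1, U2).mean() - 3.0
    # Central differences for the two partial derivatives
    d1 = (C(U1 + eps, U2) - C(U1 - eps, U2)) / (2 * eps)
    d2 = (C(U1, U2 + eps) - C(U1, U2 - eps)) / (2 * eps)
    tau = 1.0 - 4.0 * (d1 * d2).mean()
    return tau, rho

# Farlie-Gumbel-Morgenstern copula: tau = 2 theta / 9, rho = theta / 3
fgm = lambda a, b: a * b * (1 + 0.9 * (1 - a) * (1 - b))
print(dependence_measures(fgm))  # approximately (0.2, 0.3)
```

The closed-form matrix expressions avoid this grid evaluation entirely, which is where their computational advantage comes from.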
For given (h 1 , h 2 ), h 1 = 1, . . ., l 1 , h 2 = 1, . . ., l 2 , let us denote Then we have a h 1 = (a h 1 ,0 , . . ., a h 1 ,l 1 −1 ) T and b h 2 = (b h 2 ,0 , . . ., b h 2 ,l 2 −1 ) T .We also denote a Thus, the estimate of conditional Kendall's tau given in (29) can be rewritten as Furthermore, we can denote two l 1 ×l 2 matrices, Similarly, we are able to rewrite the estimate of conditional Spearman's rho given in (25).Let us first denote two vectors, If we further denote two By applying the above matrix operations, we are able to obtain very neat expressions of the estimates of conditional dependence measures and the computational efficiency can be improved significantly.
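As a simplified, unconditional illustration of how such matrix expressions arise (our own sketch, not the authors' conditional formulas), consider a bivariate Bernstein copula with fixed degrees $(l_1, l_2)$ and coefficient matrix $G[h_1, h_2] = C(h_1/l_1, h_2/l_2)$. Because $\int_0^1 P_{l,h}(u)\,du = 1/(l+1)$ and $\int_0^1 P_{a,i}(u) P_{b,j}(u)\,du = \binom{a}{i}\binom{b}{j} B(i+j+1, a+b-i-j+1)$, with $B$ the beta function, Spearman's rho becomes a matrix sum and Kendall's tau a product of small matrices. The function names are hypothetical.

```python
import numpy as np
from math import comb

def beta_int(p, q):
    """Beta function B(p, q) for positive integers p, q."""
    return 1.0 / ((p + q - 1) * comb(p + q - 2, p - 1))

def cross_moments(a, b):
    """M[i, j] = int_0^1 P_{a,i}(u) P_{b,j}(u) du."""
    M = np.empty((a + 1, b + 1))
    for i in range(a + 1):
        for j in range(b + 1):
            M[i, j] = comb(a, i) * comb(b, j) * beta_int(i + j + 1, a + b - i - j + 1)
    return M

def bernstein_tau_rho(G):
    """Kendall's tau and Spearman's rho of B(u1, u2) = sum G[h1,h2] P_{l1,h1}(u1) P_{l2,h2}(u2)."""
    l1, l2 = G.shape[0] - 1, G.shape[1] - 1
    rho = 12.0 * G.sum() / ((l1 + 1) * (l2 + 1)) - 3.0
    # dB/du1 in the basis P_{l1-1,i} x P_{l2,j}; dB/du2 in the basis P_{l1,r} x P_{l2-1,s}
    D1 = l1 * (G[1:, :] - G[:-1, :])  # shape (l1, l2 + 1)
    D2 = l2 * (G[:, 1:] - G[:, :-1])  # shape (l1 + 1, l2)
    M1 = cross_moments(l1 - 1, l1)    # shape (l1, l1 + 1)
    M2 = cross_moments(l2, l2 - 1)    # shape (l2 + 1, l2)
    tau = 1.0 - 4.0 * np.sum(D1 * (M1 @ D2 @ M2.T))
    return tau, rho

# Bernstein smoothing of the FGM copula with theta = 0.9 and l1 = l2 = 10
a = np.arange(11) / 10
U, V = np.meshgrid(a, a, indexing="ij")
G = U * V * (1 + 0.9 * (1 - U) * (1 - V))
print(bernstein_tau_rho(G))  # approximately (0.162, 0.243)
```

For this example the Bernstein copula is again FGM with parameter $0.9 \cdot 0.9^2 = 0.729$, so the exact values are $\tau = 2(0.729)/9 = 0.162$ and $\rho = 0.729/3 = 0.243$.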

Numerical Illustrations using Simulated Data
We now examine the finite-sample performance of the conditional copula estimator $C^{\#}_x(u_1, u_2)$. Similar to the simulation setup in Acar et al. (2011), data $(U_{i1}, U_{i2} \mid X_i)$, $i = 1, \ldots, n$, are generated from the Clayton copula using the R package copula under the model $(U_{i1}, U_{i2}) \mid X_i \sim C(u_1, u_2 \mid \theta_i)$, where $\theta_i = \exp(0.8X_i - 2)$ and $X_i \sim \mathrm{Unif}(0, 3)$. The true copula parameter thus varies from 0.14 to 1.49, with Spearman's rho ranging from 0.10 to 0.60. The pseudo-observations of the covariate are defined as $\hat V_i = F_{nX}(X_i)$, where $F_{nX}(x) = \frac{1}{n+1}\sum_{i=1}^{n} I(X_i \le x)$. A total of $N = 100$ replicates of sample size $n = 200$ are drawn from the true model.
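One replicate of this simulation design can be sketched in Python as follows. Instead of the R package copula, the sketch draws from the Clayton copula by conditional inversion (a standard alternative sampler), and the function names are hypothetical.

```python
import numpy as np

def simulate_conditional_clayton(n, rng):
    """Draw (U1, U2, X) with X ~ Unif(0, 3) and (U1, U2) | X ~ Clayton(theta(X)),
    theta(x) = exp(0.8 x - 2), using conditional inversion of the Clayton copula."""
    X = rng.uniform(0.0, 3.0, size=n)
    theta = np.exp(0.8 * X - 2.0)
    U1 = rng.uniform(size=n)
    W = rng.uniform(size=n)
    # U2 solves dC(U1, U2)/dU1 = W, which has a closed form for Clayton
    U2 = (U1 ** (-theta) * (W ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return U1, U2, X

def pseudo_obs(z):
    """Rescaled empirical CDF at the sample points: F_n(Z_i) = rank(Z_i) / (n + 1)."""
    z = np.asarray(z)
    return (np.argsort(np.argsort(z)) + 1) / (len(z) + 1)

rng = np.random.default_rng(2024)
U1, U2, X = simulate_conditional_clayton(200, rng)  # one replicate, n = 200
V_hat = pseudo_obs(X)                                # covariate pseudo-observations
```

The $1/(n+1)$ scaling keeps all pseudo-observations strictly inside $(0, 1)$, which matters for any later step that evaluates copula quantities near the boundary.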
Figure 1 shows contour plots of the Monte Carlo average of the estimated $C^{\#}_x(u_1, u_2)$ for $x = 0.5$, $x = 1$, $x = 1.5$, and $x = 2$, respectively, across 100 Monte Carlo replicates.
The contour plots are drawn on a $15 \times 15$ equally spaced grid of points in the unit square, so for a given $v$ we need to find $15 + 15 = 30$ roots. Since $F_1$ and $F_2$ are both non-decreasing functions, we can compute the inverse functions $F_1^{-1}(u_1 \mid v)$ and $F_2^{-1}(u_2 \mid v)$ by applying the R function uniroot to equations (22) and (23) for a given value of $v$. The true copula parameters are 0.20 (Spearman's rho equal to 0.09), 0.30 (Spearman's rho equal to 0.20), 0.45 (Spearman's rho equal to 0.27), and 0.67 (Spearman's rho equal to 0.37) for $x = 0.5$, $x = 1$, $x = 1.5$, and $x = 2$, respectively. The plots show that all the estimated contour lines coincide with the true lines at the boundaries, which is evidence that the conditional copula estimator $C^{\#}(u_1, u_2 \mid v)$ is a genuine copula with uniform conditional marginal distributions. Moreover, the estimated conditional copula averaged over 100 Monte Carlo samples shows almost no bias relative to the true conditional copula across different values of the covariate, illustrating that the proposed ECBC-based method works well for estimating conditional copulas.
We then plot the conditional Kendall's tau and conditional Spearman's rho, as given in (29) and (31), as functions of the covariate. Since the covariate $x$ ranges from 0 to 3, we compute the dependence measures at seven values (0.05, 0.5, 1, 1.5, 2, 2.5, 2.95).
Figure 2 shows the Monte Carlo average of the dependence measure estimates and the 90% Monte Carlo confidence bands (5th and 95th percentiles of the estimates) across 100 Monte Carlo replicates.
Overall, the estimates averaged over 100 Monte Carlo samples are fairly close to the true conditional dependence measures across different values of the covariate. Near the boundaries of the covariate range, the variance tends to increase and the Monte Carlo average slightly underestimates the true values.
Next, we compare the performance of the proposed nonparametric method with that of the semiparametric method of Acar et al. (2011) through simulation studies. They assume a conditional copula model in which the copula function belongs to a parametric family and the copula parameter is a function of the covariate. Different copula families, e.g., Clayton and Gumbel, were considered, and the functional relationship between the copula parameter and the covariate was estimated using a nonparametric local likelihood approach. The severe consequences of a misspecified copula model were investigated in Acar et al. (2011), who proposed a copula selection method based on cross-validated prediction errors. In contrast, the proposed conditional copula estimator is fully nonparametric, so no choice of copula family is needed.
Simulation setups follow Acar et al. (2011). The data $(U_{i1}, U_{i2} \mid X_i)$, $i = 1, \ldots, n$, are generated from the Clayton copula under the following models: where (i): The comparison can be done numerically by calculating the conditional Kendall's tau and several performance measures, including the integrated squared bias (IBIAS²), integrated variance (IVAR) and integrated mean squared error (IMSE), as given in Acar et al. (2011): where the second equality holds because $\tau_x(X) = \tau(F_X(X)) = \tau(V)$ and $X \sim \mathrm{Unif}(2, 5)$.
We compute Monte Carlo estimates of these performance measures following the approach of Segers et al. (2017) and compare the proposed method (referred to as "ECBC-based") with the local likelihood method of Acar et al. (2011) (referred to as "Local"). The results are shown in Table 1.
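The integrated performance measures can be approximated from Monte Carlo output as in the following Python sketch. It assumes the estimates are stored on an equally spaced grid of covariate values on the $v = F_X(x)$ scale and replaces the integrals over $v$ by grid averages; it does not reproduce the exact definitions in Acar et al. (2011) or the device of Segers et al. (2017), and the function name is hypothetical.

```python
import numpy as np

def integrated_errors(tau_hat, tau_true):
    """Grid approximations of IBIAS^2, IVAR and IMSE.

    tau_hat:  (R, G) array of estimates from R Monte Carlo replicates at G
              equally spaced covariate values on the v = F_X(x) scale.
    tau_true: (G,) array of true conditional Kendall's tau values.
    """
    tau_hat = np.asarray(tau_hat, dtype=float)
    tau_true = np.asarray(tau_true, dtype=float)
    mean_hat = tau_hat.mean(axis=0)
    ibias2 = np.mean((mean_hat - tau_true) ** 2)
    ivar = np.mean(tau_hat.var(axis=0))  # 1/R variance, so IMSE = IBIAS^2 + IVAR
    imse = np.mean(((tau_hat - tau_true) ** 2).mean(axis=0))
    return ibias2, ivar, imse

print(integrated_errors([[0.1, 0.2], [0.3, 0.4]], [0.2, 0.2]))
```

Using the $1/R$ (rather than $1/(R-1)$) variance makes the decomposition IMSE = IBIAS² + IVAR hold exactly at the grid level, which is a convenient sanity check on the table entries.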
The results show that, when data are generated from the Clayton copula (the true copula), our ECBC-based method outperforms the local likelihood method under the misspecified parametric family (Gumbel) in terms of bias and MSE, although it does not perform as well as the local likelihood method under the correctly specified family (Clayton). Nonetheless, the advantage of the proposed nonparametric method is that it avoids the adverse impact of a misspecified copula and provides fairly good estimates of the conditional copula and conditional dependence measures without having to select the 'best' copula model from numerous copula families.
Table 1: Comparison of the proposed method ("ECBC-based") with the local likelihood method ("Local") using Monte Carlo estimates of three performance measures: IBIAS², IVAR and IMSE. Data are generated from the Clayton copula under two different functional relationships between the copula parameter and the covariate.

Real Case Study
We now apply the proposed methodology to a data set of life expectancy at birth of males and females, with GDP per capita (in USD) as a covariate, for 210 countries or regions. The data are available from the CIA World Factbook 2020. Similar data sets were analyzed in Gijbels et al. (2011) and Abegaz et al. (2012). Life expectancy at birth summarizes the average number of years a newborn is expected to live in a country, while GDP per capita is often considered an indicator of a country's standard of living. We are interested in the dependence between the life expectancies at birth of males and females and in whether its strength is influenced by GDP per capita. In other words, we investigate the dependence between the life expectancy at birth of males ($Y_1$) and females ($Y_2$) conditional on the covariate $X = \log_{10}(\text{GDP})$, the base-10 logarithm of GDP per capita.
The pairwise scatterplots of the data in Figure 3(a) show a strong positive correlation between the life expectancies of males (referred to as Male) and females (referred to as Female). Figure 3(a) also shows that life expectancy tends to increase with the base-10 logarithm of GDP per capita (referred to as log10.GDP) for both males and females. Before estimating the conditional copula of $(Y_1, Y_2)$ given $X$, we first remove the effect of the covariate $X$ on the marginal distributions of $Y_1$ and $Y_2$. The resulting covariate-adjusted pseudo-observations of $Y_1$ and $Y_2$ (referred to as Male.pseudo and Female.pseudo, respectively) and the pseudo-observations of $X$ (referred to as log10.GDP.pseudo) are shown in Figure 3(b).
We then estimate the conditional copula and the conditional dependence between the life expectancies at birth of males and females given the covariate $X$. Figure 4 shows the estimated conditional Kendall's tau as a function of $\log_{10}(\text{GDP})$. The estimate of Kendall's tau decreases from around 0.8 to 0.6 as GDP per capita increases from $10^3 = 1000$ to $10^{4.6} \approx 40000$ USD, and it rises slightly once GDP per capita exceeds 40000 USD. Overall, the dependence between the life expectancies at birth of males and females is relatively stronger for countries with lower GDP per capita (less than 10000 USD) and relatively weaker for countries with higher GDP per capita (greater than 10000 USD).

Concluding Remarks
This article provides a nonparametric approach for estimating conditional copulas based on the empirical checkerboard Bernstein copula (ECBC) estimator. Compared with semiparametric methods, the proposed method avoids the risk of copula misspecification because it does not rely on selecting a copula family, and it demonstrates good finite-sample performance. The large-sample consistency of the proposed ECBC-based conditional copula estimator is also established. In addition, we derive closed-form nonparametric estimates of the conditional dependence measures from the proposed estimator.
Due to the complexity in modelling and inference caused by the dependence of conditional copula on the covariates, it is quite common in practice, particularly for vine copulas, to assume that the dependence structure is not influenced by the value of covariates, which is referred to as 'simplifying assumption' (e.g.see Haff et al. (2010), Acar et al. (2012), Stoeber et al. (2013), Nagler and Czado (2016) and Schellhase and Spanhel (2018)).In the literature, there have been some available tests of the simplifying assumption; see Acar et al. (2013), Gijbels et al. (2017), Gijbels et al. (2017), Derumigny and Fermanian (2017) and Kurz and Spanhel (2017), etc.Our proposed ECBC-based conditional copula estimator can be useful for constructing new tests of the simplifying assumption.We have shown the framework of obtaining a general estimate of the conditional copula that is allowed to vary with the value of covariates.It is also straightforward to obtain an estimate satisfying the simplifying assumption based on the covariate-adjusted pseudo-observations again using ECBC estimator.Therefore, it could be possible to build test statistics based on some discrepancy criteria like Kolmogorov-Smirnov type, Anderson-Darling type, etc., where the distributions of such test statistics could be approximated by bootstrap schemes.
Another interesting topic for future work would be extending the estimation framework to high-dimensional conditional copula.We can perhaps first use some dimension reduction methods like principal component analysis (PCA) and then develop copula models based on the lower dimensional principal components of the covariates.

Appendix
Proof of Theorem 1. We denote $P_{l,h}(u) = \binom{l}{h} u^{h}(1-u)^{l-h}$. Then we can rewrite the Bernstein copula as
$$B(C; u_1, u_2, v) = \sum_{h_1=0}^{l_1}\sum_{h_2=0}^{l_2}\sum_{k=0}^{m} C\Big(\frac{h_1}{l_1}, \frac{h_2}{l_2}, \frac{k}{m}\Big) P_{l_1,h_1}(u_1)\, P_{l_2,h_2}(u_2)\, P_{m,k}(v),$$
and the ECBC copula estimator in terms of $B(C^{\#}_n; u_1, u_2, v)$, where $C^{\#}_n$ is the empirical checkerboard copula. Thus the partial derivative of the 3-dimensional ECBC $B(C^{\#}_n; u_1, u_2, v)$ with respect to $v$ takes the form
$$\frac{\partial B(C^{\#}_n; u_1, u_2, v)}{\partial v} = \sum_{h_1=0}^{l_1}\sum_{h_2=0}^{l_2}\sum_{k=0}^{m} C^{\#}_n\Big(\frac{h_1}{l_1}, \frac{h_2}{l_2}, \frac{k}{m}\Big) P_{l_1,h_1}(u_1)\, P_{l_2,h_2}(u_2)\, P'_{m,k}(v),$$
where $P'_{m,k}(v)$ is the derivative of $P_{m,k}(v)$ with respect to $v$. Let us denote the partial derivative of the Bernstein copula $B(C; u_1, u_2, v)$ with respect to $v$ as and the partial derivative of the empirical Bernstein copula $B(C_n; u_1, u_2, v)$ with respect to $v$ as Using the triangle inequality we have First, we can show that In the above, the second inequality follows from the fact that $\binom{l_j}{h_j} u_j^{h_j}(1-u_j)^{l_j-h_j}$, $h_j = 0, 1, \ldots, l_j$, are binomial probabilities, so that $\sum_{h_j=0}^{l_j}\binom{l_j}{h_j} u_j^{h_j}(1-u_j)^{l_j-h_j} = 1$ for any $u_j \in [0, 1]$, $j = 1, 2$. Under the assumption that the marginal CDFs are continuous, it follows from Remark 2 in Genest et al. (2017) that for a d-dimensional copula and from Lemma 1 in Janssen et al. (2014) that for any fixed $0 < v < 1$, Thus, for any fixed $0 < v < 1$ we have $C^{(1)}(u_1, u_2 \mid v)$ which means that Since $\partial C(u_1, u_2, v)/\partial v$ is Lipschitz continuous on $[0, 1]^3$ by assumption, there exists a Lipschitz constant $L$ such that
Let $\|g\|_v = \sup_{(u_1, u_2) \in [0, 1]^2} |g(u_1, u_2 \mid v)|$ denote the conditional supremum norm of a conditional function $g(u_1, u_2 \mid v)$ defined on the unit square $[0, 1]^2$ for a fixed $v$, and denote the common supremum norm by $\|\cdot\|$. Theorem 1 provides the large-sample consistency of the estimator $C^{\#}(u_1, u_2 \mid v)$ for a fixed value $0 < v < 1$ in terms of the conditional supremum norm.

Figure 2: Estimated conditional Kendall's tau and conditional Spearman's rho as functions of the covariate.
We further assume that the conditional distributions of $Y_1$ and $Y_2$ given $X = x$ are absolutely continuous. The goal is to estimate the conditional copula $C_x$ from a random sample of i.i.d. observations $(Y_{i1}, Y_{i2}, X_i)$, $i = 1, \ldots, n$.