1. Introduction
It is well known that for the purpose of modeling dependence in a risk-management setting, the multivariate normal distribution is not flexible enough, and therefore its use can lead to a misleading assessment of risk. Indeed, the multivariate normal has light tails and its copula is tail-independent, so that inference based on this model heavily underestimates joint extreme events. An important class of distributions that generalizes this simple model is that of normal variance mixtures. A random vector $X = (X_1,\dots,X_d)$ follows a normal variance mixture, denoted by $X \sim \mathrm{NVM}_d(\mu, \Sigma, F_W)$, if, in distribution,

      $X = \mu + \sqrt{W}\, A Z$,  (1)

where $\mu \in \mathbb{R}^d$ is the location (vector), $\Sigma = AA^\top$ for $A \in \mathbb{R}^{d \times k}$ denotes the symmetric, positive semidefinite scale (matrix) and $W \sim F_W$ is a non-negative random variable independent of the random vector $Z \sim \mathrm{N}_k(0, I_k)$ (where $I_k$ denotes the identity matrix); see, for example, (McNeil et al. 2015, Section 6.2) or Hintz et al. (2020). Here, the random variable $W$ can be thought of as a shock mixing the normal $Z$, thus allowing $X$ to have different tail behavior and dependence structure than the special case of a multivariate normal.
The multivariate $t$ distribution with $\nu$ degrees of freedom (dof) is also a special case of (1), for $W \sim \mathrm{IG}(\nu/2, \nu/2)$; a random variable (rv) $W$ is said to follow an inverse-gamma distribution with shape $\alpha$ and rate $\beta$, notation $W \sim \mathrm{IG}(\alpha, \beta)$, if $W$ has density $f_W(w) = \frac{\beta^\alpha}{\Gamma(\alpha)} w^{-\alpha-1} e^{-\beta/w}$ for $w > 0$ (here, $\Gamma$ denotes the gamma function). If $W \sim \mathrm{IG}(\nu/2, \nu/2)$, then $X \sim t_d(\mu, \Sigma, \nu)$, so that all margins are univariate $t$ with the same dof $\nu$. The $t$ copula, which is the copula implicitly derived from $X \sim t_d(0, P, \nu)$ for a correlation matrix $P$ via Sklar's theorem, is a widely used copula in risk management; see, e.g., Demarta and McNeil (2005). It allows one to model pairwise dependencies, including tail dependence, flexibly via the correlation matrix $P$. When $X \sim t_d(0, P, \nu)$, however, all $k$-dimensional margins of $X$ are $t$ distributed with the same dof $\nu$. To overcome this limitation, one can allow different margins to have different dof. On a copula level, this leads to the notion of grouped $t$ copulas of Daul et al. (2003) and generalized $t$ copulas of Luo and Shevchenko (2010).
In this paper, we, more generally, define grouped normal variance mixtures via the stochastic representation

      $X = \mu + \mathrm{diag}\big(\sqrt{W_1},\dots,\sqrt{W_d}\big)\, A Z$,  (2)

where $W = (W_1,\dots,W_d)$ is a $d$-dimensional non-negative and comonotone random vector with $W_i \sim F_{W_i}$ that is independent of $Z \sim \mathrm{N}_k(0, I_k)$. Denote by $F_W^{-1}(u) = \inf\{w \in \mathbb{R} : F_W(w) \ge u\}$ the quantile function of a random variable $W \sim F_W$. Comonotonicity of the $W_i$ implies the stochastic representation

      $W = \big(F_{W_1}^{-1}(U),\dots,F_{W_d}^{-1}(U)\big)$, where $U \sim \mathrm{U}(0,1)$.  (3)

If a $d$-dimensional random vector $X$ satisfies (2) with $W$ given as in (3), we use the notation $X \sim \mathrm{gNVM}_d(\mu, \Sigma, F_W)$, where $F_W(w) = \mathbb{P}(W \le w)$ for $w \in \mathbb{R}^d$ denotes the joint distribution function of $W$ and the inequality is understood component-wise.
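To make the stochastic representations (2) and (3) concrete, the following minimal R sketch draws a sample from a grouped normal variance mixture. The inverse-gamma quantile functions (which give a grouped $t$ distribution) and all parameter values are illustrative assumptions, not taken from the paper's examples.

    ## Minimal sketch: sampling X ~ gNVM_d(mu, Sigma, F_W) via (2) and (3).
    ## The inverse-gamma mixing quantile functions (=> grouped t) and all
    ## parameter values below are illustrative assumptions.
    set.seed(42)
    d <- 4; n <- 1000
    groupings <- c(1, 1, 2, 2)      # two groups of size 2
    nu <- c(3, 7)                   # dof per group
    qW <- lapply(nu, function(df) function(u) 1 / qgamma(1 - u, shape = df/2, rate = df/2))
    mu <- rep(0, d)
    P  <- matrix(0.5, d, d); diag(P) <- 1   # scale matrix (here a correlation matrix)
    A  <- t(chol(P))                        # lower triangular, P = A %*% t(A)
    U  <- runif(n)                          # one uniform per sample => comonotone W
    W  <- sapply(groupings, function(k) qW[[k]](U))  # n x d matrix of mixing variables
    Z  <- matrix(rnorm(n * d), ncol = d)    # rows are N(0, I_d)
    X  <- matrix(mu, n, d, byrow = TRUE) + sqrt(W) * (Z %*% t(A))  # rows follow (2)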
As mentioned above, in the case of an (ungrouped) normal variance mixture distribution from (1), the scalar random variable (rv) $W$ can be regarded as a shock affecting all components of $X$. In the more general setting considered in this paper, where $W$ is a vector of comonotone mixing rvs, different, yet perfectly dependent, random variables affect different margins of $X$. By moving from a scalar mixing rv to a comonotone random vector, one obtains non-elliptical distributions well beyond the classical multivariate $t$ case, giving rise to flexible modeling of joint and marginal body and tail behaviors. The price to pay for this generalization is significant computational challenges: not even the density of a grouped $t$ distribution is available in closed form.
At first glance, the definition given in (2) does not indicate any "grouping" yet. However, Equation (3) allows one to group components of the random vector $X$ such that all components within a group have the same mixing distribution. More precisely, let $X$ be split into $S$ sub-vectors, i.e., $X = (X_1,\dots,X_S)$, where $X_k$ has dimension $d_k$ for $k = 1,\dots,S$ and $d_1 + \dots + d_S = d$. Now let each corresponding sub-vector $W_k$ of $W$ have stochastic representation $W_k = \big(F_{W_k}^{-1}(U),\dots,F_{W_k}^{-1}(U)\big)$. Hence, all univariate margins of the subvector $W_k$ are identically distributed. This implies that all margins of the corresponding subvector $X_k$ are of the same type.
An example is the copula derived from $X$ in (2) when $W_k \sim \mathrm{IG}(\nu_k/2, \nu_k/2)$ for $k = 1,\dots,S$; this is the aforementioned grouped $t$ copula. Here, different multivariate margins of the copula follow (potentially) different $t$ copulas with different dof, allowing for more flexibility in modeling pairwise dependencies. A grouped $t$ copula with $S = d$, that is, when each component has its own mixing distribution, was proposed in Venter et al. (2007) (therein called "individuated $t$ copula") and studied in more detail in Luo and Shevchenko (2010) (therein called "$t$ copula with multiple dof"). If $S = 1$, the classical $t$ copula with exactly one dof parameter is recovered.
For notational convenience, derivations in this paper are often done for the case $S = d$, so that the $F_{W_i}$ are all (potentially) different; the case $S < d$, that is, when grouping is present, is merely a special case where some of the $F_{W_i}$ are identical. That being said, we chose to keep the name "grouped" to refer to this class of models so as to reflect the original motivation for this type of model, e.g., as in Daul et al. (2003), where it is used to model the components of a portfolio in which there are subgroups representing different business sectors.
Previous work on grouped $t$ copulas and their corresponding distributions includes some algorithms for the tasks needed to handle these models, but it was mostly focused on demonstrating the superiority of this class of models over special cases such as the multivariate normal or $t$ distribution. More precisely, in Daul et al. (2003), the grouped $t$ copula was introduced and applied to model an internationally diversified credit portfolio of 92 risk factors split into 8 subgroups. It was demonstrated that the grouped $t$ copula is superior to both the Gaussian and the $t$ copula with regard to modeling the tail dependence present in the data. Luo and Shevchenko (2010) also study the grouped $t$ copula and, unlike Daul et al. (2003), allow group sizes of 1 (corresponding to $S = d$ in our definition). They provide calibration methods to fit the copula to data and furthermore study bivariate characteristics of the grouped $t$ copula, including symmetry properties and tail dependence.
However, to the best of our knowledge, there currently does not exist an encompassing body of work providing all algorithms and formulas required to handle these copulas and their corresponding distributions, both in terms of evaluating distributional quantities and in terms of general fitting algorithms. In particular, not even the problem of computing the distribution and density function of a grouped t copula has been addressed. Our paper fills this gap by providing a complete set of algorithms for performing the main computational tasks associated with these distributions and their associated copulas, and does so in as automated a way as possible. This is done not only for grouped t copulas, but (in many cases) for the more general grouped normal variance mixture distributions/copulas, which allow for even further flexibility in modeling the shock variables $W_i$. Furthermore, we assume that the only available information about the distribution of the $W_i$ consists of the marginal quantile functions in the form of a "black box", meaning that we can only evaluate these quantile functions but have no mathematical expression for them (so that neither the density nor the distribution function of $W$ is available in closed form).
Our work includes the following contributions: (i) we develop an algorithm to evaluate the distribution function of a grouped NVM model. Our method only requires the user to provide a function that evaluates the quantile functions of the $W_i$ through a black box. As such, different mixing distributions can be studied by merely providing a quantile function, without having to implement an integration routine for the model at hand; (ii) as mentioned above, the density function of a grouped $t$ distribution does not exist in closed form, nor does it in the more general grouped NVM case. We provide an adaptive algorithm to estimate this density function in a very general setting. The adaptive mechanism we propose ensures that the estimation procedure is precise even for points that are far from the mean; (iii) to estimate Kendall's tau and Spearman's rho for a two-dimensional grouped NVM copula, we provide a representation as an expectation, which in turn leads to an easy-to-approximate two- or three-dimensional integral; (iv) we provide an algorithm to estimate the copula and the copula density associated with the grouped $t$ copula, as well as fitting algorithms to estimate the parameters of a grouped NVM copula based on a dataset. While the problem of parameter estimation was already studied in Daul et al. (2003) and Luo and Shevchenko (2010), the computation of the copula density, which is required for the joint estimation of all dof parameters, has not been investigated in full generality for arbitrary dimensions yet; this is a gap we fill in this paper.
The four items from the list of contributions described in the previous paragraph correspond to Section 3, Section 4, Section 5 and Section 6 of the paper. Section 2 includes a brief presentation of the notation used, basic properties of grouped NVM distributions and a description of randomized quasi-Monte Carlo (RQMC) methods, which are used throughout the paper since most quantities of interest require the approximation of integrals. Section 7 provides a discussion. The proofs are given in Section 8.
All our methods are implemented in the R package nvmix (Hofert et al. 2020) and all numerical results are reproducible with the demo grouped_mixtures.
  3. Distribution Function
Let $a, b \in \mathbb{R}^d$ with $a \le b$ componentwise (entries $\pm\infty$ are to be interpreted as the corresponding limits). Then $\mathbb{P}(a < X \le b)$ is the probability that the random vector $X$ falls in the hyper-rectangle spanned by the lower-left and upper-right endpoints $a$ and $b$, respectively. If $a = (-\infty,\dots,-\infty)$, we recover $\mathbb{P}(X \le b) = F(b)$, which is the (cumulative) distribution function of $X$ evaluated at $b$.
Assume wlog that $\mu = 0$ (otherwise adjust $a$, $b$ accordingly). Conditioning on the uniform $U$ from (3) then yields

      $\mathbb{P}(a < X \le b) = \int_0^1 \Phi_\Sigma\big(a(u), b(u)\big)\, \mathrm{d}u$,  (11)

where $a(u) = \big(a_i / \sqrt{F_{W_i}^{-1}(u)}\big)_{i=1}^d$, $b(u) = \big(b_i / \sqrt{F_{W_i}^{-1}(u)}\big)_{i=1}^d$ and $\Phi_\Sigma(a, b) = \mathbb{P}(a < Y \le b)$ for $Y \sim \mathrm{N}_d(0, \Sigma)$. Note that the function $\Phi_\Sigma$ itself is a $d$-dimensional integral for which no closed formula exists and which is typically approximated via numerical methods; see, e.g., Genz (1992).
Comonotonicity of the $W_i$ allowed us to write $\mathbb{P}(a < X \le b)$ as a $(d+1)$-dimensional integral; had the $W_i$ a different dependence structure, this convenience would be lost and the resulting integral in (11) could be up to $2d$-dimensional (e.g., when all $W_i$ are independent).
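As a minimal illustration of the conditioning identity (11), the following R sketch averages the multivariate normal probability over draws of the single uniform $U$, using mvtnorm::pmvnorm() for $\Phi_\Sigma$ (plain Monte Carlo, not the RQMC scheme developed below); the mixing distributions and parameters are illustrative assumptions.

    ## Sketch of (11): P(a < X <= b) = E[ Phi_Sigma(a(U), b(U)) ], estimated by
    ## plain MC over the single uniform U (assumes mu = 0). Mixing distributions
    ## and all parameter values are illustrative assumptions.
    library(mvtnorm)
    d <- 4; groupings <- c(1, 1, 2, 2)
    qW <- list(function(u) 1 / qgamma(1 - u, shape = 1.5, rate = 1.5),  # IG(3/2, 3/2)
               function(u) 1 / qgamma(1 - u, shape = 3,   rate = 3))    # IG(3, 3)
    P <- matrix(0.5, d, d); diag(P) <- 1
    a <- rep(-1, d); b <- rep(2, d)
    u <- runif(500)
    probs <- vapply(u, function(ui) {
      w <- vapply(groupings, function(k) qW[[k]](ui), numeric(1))  # comonotone W(u)
      pmvnorm(lower = a / sqrt(w), upper = b / sqrt(w), sigma = P)[1]
    }, numeric(1))
    mean(probs)   # estimate of P(a < X <= b)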
  3.1. Estimation
In Hintz et al. (2020), randomized quasi-Monte Carlo (RQMC) methods were derived to approximate the distribution function of a normal variance mixture $X \sim \mathrm{NVM}_d(\mu, \Sigma, F_W)$ from (1). Grouped normal variance mixtures can be dealt with similarly, thanks to the comonotonicity of the mixing random variables in $W$.
In order to apply RQMC to the problem of estimating $\mathbb{P}(a < X \le b)$, we need to transform (11) to an integral over the unit hypercube. To this end, we first address $\Phi_\Sigma(a, b)$. Let $C$ be the Cholesky factor of $\Sigma$ (a lower triangular matrix such that $CC^\top = \Sigma$). We assume that $\Sigma$ has full rank, which implies $C_{ii} > 0$ for $i = 1,\dots,d$. Genz (1992) (see also Genz and Bretz (1999, 2002, 2009)) uses a series of transformations, relying on $C$ being a lower triangular matrix, to write

        $\Phi_\Sigma(a, b) = \int_{(0,1)^{d-1}} (e_1 - d_1) \prod_{i=2}^{d} \big(e_i(u_1,\dots,u_{i-1}) - d_i(u_1,\dots,u_{i-1})\big)\, \mathrm{d}u_1 \cdots \mathrm{d}u_{d-1}$,  (12)

where the $d_i$ and $e_i$ are recursively defined via

        $d_1 = \Phi(a_1 / C_{11})$, $\quad d_i(u_1,\dots,u_{i-1}) = \Phi\Big(\Big(a_i - \sum_{j=1}^{i-1} C_{ij}\, \Phi^{-1}\big(d_j + u_j (e_j - d_j)\big)\Big) \big/ C_{ii}\Big)$, $\quad i = 2,\dots,d$,

and $e_i$ is $d_i$ with $a_i$ replaced by $b_i$ for $i = 1,\dots,d$. Note that the final integral in (12) is $(d-1)$-dimensional.
Combining the representation (12) of $\Phi_\Sigma$ with Equation (11) yields

        $\mathbb{P}(a < X \le b) = \int_{(0,1)^{d}} g(u, u_1,\dots,u_{d-1})\, \mathrm{d}u\, \mathrm{d}u_1 \cdots \mathrm{d}u_{d-1}$,  (14)

where

        $g(u, u_1,\dots,u_{d-1}) = \big(\tilde e_1(u) - \tilde d_1(u)\big) \prod_{i=2}^{d} \big(\tilde e_i(u, u_1,\dots,u_{i-1}) - \tilde d_i(u, u_1,\dots,u_{i-1})\big)$  (15)

for $(u, u_1,\dots,u_{d-1}) \in (0,1)^d$. The $\tilde d_i$ are recursively defined by

        $\tilde d_1(u) = \Phi\Big(\frac{a_1}{\sqrt{F_{W_1}^{-1}(u)}\, C_{11}}\Big)$, $\quad \tilde d_i(u, u_1,\dots,u_{i-1}) = \Phi\Big(\Big(\frac{a_i}{\sqrt{F_{W_i}^{-1}(u)}} - \sum_{j=1}^{i-1} C_{ij}\, \Phi^{-1}\big(\tilde d_j + u_j (\tilde e_j - \tilde d_j)\big)\Big) \big/ C_{ii}\Big)$

for $i = 2,\dots,d$, and the $\tilde e_i$ are $\tilde d_i$ with $a_i$ replaced by $b_i$ for $i = 1,\dots,d$.
Summarizing, we were able to write $\mathbb{P}(a < X \le b)$ as an integral over the $d$-dimensional unit hypercube. Our algorithm to approximate $\mathbb{P}(a < X \le b)$ consists of two steps:
First, a greedy re-ordering algorithm is applied to the inputs $a$, $b$, $\Sigma$. It re-orders the components of $a$ and $b$ as well as the corresponding rows and columns in $\Sigma$ in such a way that the expected ranges $\tilde e_i - \tilde d_i$ in (15) are increasing in the index $i$ for $i = 1,\dots,d$. Observe that the integration variable $u_i$ is present in all remaining $d - 1 - i$ integrals in (14), whose ranges are determined by the ranges of the preceding variables; reordering the variables according to expected ranges therefore (in the vast majority of cases) reduces the overall variability of $g$ (namely, of the factors $\tilde e_i - \tilde d_i$ for $i = 1,\dots,d$). Reordering also makes the first variables "more important" than the last ones, thereby reducing the effective dimension of the integrand. This is particularly beneficial for quasi-Monte Carlo methods, as these methods are known to perform well in high-dimensional problems with low effective dimension; see, e.g., Caflisch et al. (1997), Wang and Sloan (2005). For a detailed description of the method, see (Hintz et al. 2020, Algorithm 3.2) (with $a_i$ replaced by $a_i / \sqrt{F_{W_i}^{-1}(u)}$, and similarly for $b_i$, for $i = 1,\dots,d$, to account for the generalization); similar reordering strategies have been proposed in Gibson et al. (1994) for calculating multivariate normal and in Genz and Bretz (2002) for multivariate $t$ probabilities.
Second, an RQMC algorithm as described in Section 2.2 is applied to approximate the integral in (14) with re-ordered $a$, $b$, $\Sigma$ and $C$. Instead of integrating $g$ from (15) directly, antithetic variates are employed, so that effectively the function $\tilde g(v) = \big(g(v) + g(1 - v)\big)/2$ for $v \in (0,1)^d$ (with $1 - v$ understood componentwise) is integrated.
The algorithm to estimate $\mathbb{P}(a < X \le b)$ just described is implemented in the function pgnvmix() of the R package nvmix.
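A usage sketch is given below; the argument names (qmix, groupings, scale) reflect our reading of the package documentation and should be checked against ?pgnvmix.

    ## Hedged usage sketch for pgnvmix(); argument names are assumptions based
    ## on our reading of the package documentation -- check ?pgnvmix.
    library(nvmix)
    d <- 4
    P <- matrix(0.5, d, d); diag(P) <- 1
    qmix <- list(function(u) 1 / qgamma(1 - u, shape = 1.5, rate = 1.5),
                 function(u) 1 / qgamma(1 - u, shape = 3,   rate = 3))
    pgnvmix(upper = rep(2, d), lower = rep(-1, d), qmix = qmix,
            groupings = c(1, 1, 2, 2), scale = P)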
  3.2. Numerical Results
In order to assess the performance of our algorithm described in Section 3.1, we estimate the error as a function of the number of function evaluations. Three estimators are considered. First, the "Crude MC" estimator is constructed by sampling $X_1,\dots,X_n \sim \mathrm{gNVM}_d(0, \Sigma, F_W)$ and estimating $\mathbb{P}(a < X \le b)$ by the proportion of samples falling in $(a, b]$. The second and third estimators are based on the integrand $g$ from (15), which is integrated once using MC ("g (MC)") and once using a randomized Sobol' sequence ("g (sobol)"). In either case, variable reordering is applied first.
We perform our experiments for an inverse-gamma mixture. As motivated in the introduction, an important special case of (grouped) normal variance mixtures is obtained when the mixing distribution is inverse-gamma. In the ungrouped case when $W \sim \mathrm{IG}(\nu/2, \nu/2)$ with $\nu > 0$, the distribution of $X$ is multivariate $t$ (notation $X \sim t_d(\mu, \Sigma, \nu)$) with density

        $f(x) = \frac{\Gamma\big((\nu + d)/2\big)}{\Gamma(\nu/2)\, (\nu\pi)^{d/2} \sqrt{|\Sigma|}} \Big(1 + \frac{D^2(x)}{\nu}\Big)^{-(\nu+d)/2}$, $\quad x \in \mathbb{R}^d$,  (17)

where $D^2(x) = (x - \mu)^\top \Sigma^{-1} (x - \mu)$. The distribution function of $X$ does not admit a closed form; estimation of the latter was discussed, for instance, in Genz and Bretz (2009), Hintz et al. (2020) and Cao et al. (2020). The same holds for a grouped inverse-gamma mixture model. If $W_i \sim \mathrm{IG}(\nu_i/2, \nu_i/2)$ for $i = 1,\dots,d$, the random vector $X$ follows a grouped $t$ distribution, denoted by $X \sim gt_d(\mu, \Sigma, \nu)$ for $\nu = (\nu_1,\dots,\nu_d)$. If $S < d$, denote by $d_1,\dots,d_S$ the group sizes. In this case, we use the notation $X \sim gt_d(\mu, \Sigma, \nu)$ with $\nu = (\nu_1,\dots,\nu_S)$, where $\nu_k$ is the dof of group $k$ for $k = 1,\dots,S$. If $S = 1$, it follows that $X \sim t_d(\mu, \Sigma, \nu)$.
For our numerical examples to test the performance of our procedure for estimating $\mathbb{P}(a < X \le b)$, assume $X \sim gt_d(0, P, \nu)$ for a correlation matrix $P$. We perform the experiment in two settings (two different dimensions $d$ with corresponding dof vectors $\nu$). The following is repeated 15 times: sample an upper limit $b$ and a correlation matrix $P$ (sampled based on a random Wishart matrix via the function rWishart() in R). Then estimate $\mathbb{P}(X \le b)$ using the three aforementioned methods with various sample sizes, and estimate the error for the MC estimators based on a CLT argument and for the RQMC estimator as described in Section 2.2. Figure 1 reports the average absolute errors for each sample size over the 15 runs.
Convergence speeds, as measured by the regression coefficient $\alpha$ in the model $\log(\hat\varepsilon_n) = \alpha \log(n) + c$, where $\hat\varepsilon_n$ is the estimated error based on $n$ function evaluations, are displayed in the legend. As expected, the MC estimators have an overall convergence speed of $O(n^{-1/2})$; however, the crude estimator has a much larger variance than the MC estimator based on the function $g$. The RQMC estimator ("g (sobol)") not only shows a much faster convergence speed than its MC counterparts, but also a smaller variance.
  4. Density Function
Let us now focus on the density of $X \sim \mathrm{gNVM}_d(\mu, \Sigma, F_W)$, where we assume that $\Sigma$ has full rank in order for the density to exist. As mentioned in the introduction, the density of $X$ is typically not available in closed form, not even in the case of a grouped $t$ distribution. The same conditioning argument used to derive (11) yields that the density of $X$ evaluated at $x \in \mathbb{R}^d$ can be written as

      $f(x) = \int_0^1 \frac{(2\pi)^{-d/2}}{\sqrt{|\Sigma| \prod_{i=1}^d F_{W_i}^{-1}(u)}} \exp\big({-D^2(x; u)/2}\big)\, \mathrm{d}u =: \int_0^1 h(u)\, \mathrm{d}u$,  (18)

where $D^2(x; u) = z(u)^\top \Sigma^{-1} z(u)$ with $z(u)_i = (x_i - \mu_i)/\sqrt{F_{W_i}^{-1}(u)}$ denotes the (squared) Mahalanobis distance of the componentwise scaled $x$ from $\mu$ with respect to $\Sigma$, and the integrand $h = h(u; x)$ is defined in an obvious manner. Except for some special cases (e.g., when all $W_i$ are inverse-gamma with the same parameters), this integral cannot be computed explicitly, so that we rely on numerical approximation thereof.
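As a concrete (and deliberately naive) illustration, the following R sketch implements the integrand $h$ from (18) and feeds it to a generic quadrature routine; all inputs are illustrative assumptions, and Section 4.1 explains why this direct approach deteriorates for $x$ far in the tail.

    ## Naive illustration of (18): implement h(u; x) and integrate over (0,1).
    ## All inputs (P, mixing quantile functions, x) are illustrative; mu = 0.
    h <- function(u, x, P, qW, groupings) {
      d <- length(x); Pinv <- solve(P)
      logdetP <- as.numeric(determinant(P)$modulus)    # log|P|
      vapply(u, function(ui) {
        w <- vapply(groupings, function(k) qW[[k]](ui), numeric(1))
        z <- x / sqrt(w)                               # componentwise scaling
        maha <- drop(t(z) %*% Pinv %*% z)              # D^2(x; u)
        exp(-(d/2) * log(2 * pi) - logdetP / 2 - sum(log(w)) / 2 - maha / 2)
      }, numeric(1))
    }
    ## f(x) via generic quadrature (deteriorates for x far in the tail):
    ## integrate(h, 0, 1, x = x0, P = P, qW = qW, groupings = groupings)$value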
  4.1. Estimation
From (18), we find that computing the density $f$ of $X$ evaluated at $x$ requires the estimation of a univariate integral. As interest often lies in the logarithmic density (or log-density) rather than the density itself (e.g., in likelihood-based methods, where the log-likelihood function of a random sample is optimized over some parameter space), we directly consider the problem of estimating $\log f(x) = \log \int_0^1 h(u)\, \mathrm{d}u$ with $h$ given in (18).

Since $f(x)$ is expressed as an integral over $(0,1)$, the RQMC methods from Section 2.2 can be applied directly to the problem in this form. If the log-density needs to be evaluated at several inputs $x_1,\dots,x_n$, one can use the same point-sets, and therefore the same realizations of the mixing random vector $W$, for all inputs. This avoids costly evaluations of the quantile functions $F_{W_i}^{-1}$.
Estimating $\log f(x)$ via RQMC as just described works well for inputs $x$ of moderate size, but deteriorates if $x$ is far away from the mean. To see this, Figure 2 shows the integrand $h$ for three different inputs $x$ and three different settings of the mixing distributions. If the Mahalanobis distance of $x$ is "large", most of the mass is contained in a small subdomain of $(0,1)$ containing the abscissa of the maximum of $h$. If an integration routine is not able to detect this peak, the density is substantially underestimated. A further complication arises as we are estimating the log-density rather than the density. Unboundedness of the natural logarithm at 0 makes estimation of $\log f(x)$ for small $f(x)$ challenging, both from a theoretical and from a computational point of view due to finite machine precision.
In (Hintz et al. 2020, Section 4), an adaptive RQMC algorithm was proposed to efficiently estimate the log-density of an (ungrouped) normal variance mixture. We generalize this method to the grouped case. The grouped case is more complicated because the distribution is not elliptical; hence, the density does not depend on $x$ only through the Mahalanobis distance $D^2(x)$. Furthermore, the height of the (unique) maximum of $h$ in the ungrouped case can be easily computed without simulation, which helps the adaptive procedure find the relevant region; in the grouped case, the value of the maximum is usually not available. Lastly, $S$ (as opposed to 1) quantile evaluations are needed to obtain one function value $h(u)$; from a run-time perspective, evaluating these quantile functions is the most expensive part.
If the Mahalanobis distance of $x$ is "large", the idea is to apply RQMC only in a relevant region $(u_l, u_r) \subseteq (0,1)$ with $\int_{u_l}^{u_r} h(u)\, \mathrm{d}u \approx \int_0^1 h(u)\, \mathrm{d}u$. More precisely, given a threshold $h_{\mathrm{th}}$ with $h_{\mathrm{th}} < \max_{u \in (0,1)} h(u)$, choose $u_l, u_r$ ($l$ for "left" and $r$ for "right") with $u_l < u_r$ so that $h(u) > h_{\mathrm{th}}$ if and only if $u \in (u_l, u_r)$. For instance, take

        $\log h_{\mathrm{th}} = \log\Big(\max_{u \in (0,1)} h(u)\Big) - 10 \log(10)$,  (19)

so that $h_{\mathrm{th}}$ is 10 orders of magnitude smaller than the maximum of $h$.
One can then apply RQMC (with a proper logarithm) in the region $(u_l, u_r)$ (by replacing every $u$ by $u_l + (u_r - u_l) u$), producing an estimate for $\int_{u_l}^{u_r} h(u)\, \mathrm{d}u$. By construction, the remaining regions do not contribute significantly to the overall integral anyway, so that a rather quick integration routine suffices there. Note that neither $\max_u h(u)$ nor $u_l, u_r$ are known explicitly. However, $\max_u h(u)$ can be estimated from pilot runs, and $u_l, u_r$ can be approximated using bisections.
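The following sketch illustrates the root-finding step for $u_l$ and $u_r$, assuming a pilot estimate $u_{\max}$ of the maximizer of $h$ and a threshold $h_{\mathrm{th}}$ from (19) are already available; it uses uniroot() in place of a hand-rolled bisection.

    ## Sketch: find (u_l, u_r) with h(u) > h_th iff u in (u_l, u_r). Assumes h
    ## is a function of u alone and that pilot estimates u_max (approximate
    ## maximizer of h) and h_th from (19) are available; uses uniroot().
    find_region <- function(h, h_th, u_max, eps = 1e-12) {
      g <- function(u) log(h(u)) - log(h_th)   # positive inside the relevant region
      u_l <- if (g(eps)     > 0) 0 else uniroot(g, c(eps, u_max))$root
      u_r <- if (g(1 - eps) > 0) 1 else uniroot(g, c(u_max, 1 - eps))$root
      c(u_l = u_l, u_r = u_r)
    }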
Summarizing, we propose the following method (Algorithm 1) to estimate $\log f(x_j)$, $j = 1,\dots,n$, for given inputs $x_1,\dots,x_n$ and error tolerance $\varepsilon$.
This algorithm is implemented in the function dgnvmix(, log = TRUE) in the R package nvmix, which by default uses a relative error tolerance.

The advantage of the proposed algorithm is that only little run time is spent on estimating "easy" integrals, thanks to the pilot run in Step 1. With the current defaults in the nvmix package, this step gives 15 360 pairs $(u, h(u))$. These pairs give good starting values for the bisections to find $u_l, u_r$. Note that no additional quantile evaluations are needed to estimate the less important regions $(0, u_l)$ and $(u_r, 1)$.
  4.2. Numerical Results
Luo and Shevchenko (2010) are faced with almost the same integration problem when estimating the density of a bivariate grouped $t$ copula. They use a globally adaptive integration scheme from Piessens et al. (2012) to integrate $h$. While this procedure works well for a range of inputs, it deteriorates for inputs $x$ with large components.
Consider first the ungrouped case $W \sim \mathrm{IG}(\nu/2, \nu/2)$ and recall that the density of $X$ is known and given by (17); this is useful to test our estimation procedure. As such, let $X \sim t_d(0, P, \nu)$ and consider the problem of evaluating the density of $X$ at points $x$ increasingly far from the mean. Some values of the corresponding integrands are shown in Figure 2. In Table 1, true and estimated (log-)density values are reported; once estimated using the R function integrate(), which is based on the QUADPACK package of Piessens et al. (2012), and once using dgnvmix(), which is based on Algorithm 1. Clearly, the integrate() integration routine is not capable of detecting the peak when the input $x$ is large, yielding substantially flawed estimates. The estimates obtained from dgnvmix(), however, are quite close to the true values even far out in the tail.
        
| Algorithm 1: Adaptive RQMC Algorithm to Estimate $\log f(x)$. |
| Given inputs $x_1,\dots,x_n$, the quantile functions $F_{W_k}^{-1}$, $k = 1,\dots,S$, a pilot sample size and an error tolerance $\varepsilon$, estimate $\log f(x_j)$, $j = 1,\dots,n$, via:
1. Compute pilot estimates of $\log f(x_j)$ with a small sample size, using the same random numbers for all inputs $x_j$, $j = 1,\dots,n$. Store all uniforms with their corresponding quantile evaluations $F_{W_k}^{-1}(u)$ in a list $\mathcal{L}$.
2. If all estimates meet the error tolerance $\varepsilon$, go to Step 4. Otherwise, let $x_{j_1},\dots,x_{j_{n'}}$ with $n' \le n$ be the inputs whose error estimates exceed the error tolerance.
3. For each remaining input $x_j$, $j = j_1,\dots,j_{n'}$, do:
   (a) Use all pairs in $\mathcal{L}$ to compute values of $h(\cdot\,; x_j)$ and set $h_{\max}$ to the largest such value. If the largest value of $h$ is obtained for the largest (smallest) $u$ in the list $\mathcal{L}$, set $u_r = 1$ ($u_l = 0$).
   (b) Unless already specified, use bisections to find $u_l$ and $u_r$ such that $u_l$ ($u_r$) is the smallest (largest) $u$ with $h(u) > h_{\mathrm{th}}$, where $h_{\mathrm{th}}$ is computed from (19) with $\max h$ replaced by $h_{\max}$. Starting intervals for the bisections can be found from the values in $\mathcal{L}$.
   (c) If $u_l > 0$, approximate $\int_0^{u_l} h(u)\, \mathrm{d}u$ using a trapezoidal rule with proper logarithm and the knots $u$ in $\mathcal{L}$ satisfying $u \le u_l$.
   (d) If $u_r < 1$, approximate $\int_{u_r}^1 h(u)\, \mathrm{d}u$ using a trapezoidal rule with proper logarithm and the knots $u$ in $\mathcal{L}$ satisfying $u \ge u_r$.
   (e) Estimate $\int_{u_l}^{u_r} h(u)\, \mathrm{d}u$ via RQMC, i.e., compute the RQMC estimator where every $u$ is replaced by $u_l + (u_r - u_l) u$, increasing the sample size until the error tolerance $\varepsilon$ is met.
   (f) Combine the estimates from (c), (d) and (e) (via a logarithmic sum) into an estimate of $\log f(x_j)$.
4. Return $\log \hat f(x_j)$, $j = 1,\dots,n$. |
The preceding discussion focused on the classical multivariate $t$ setting, as the density is known in this case. Next, consider a grouped inverse-gamma mixture model, i.e., let $X \sim gt_d(0, P, \nu)$. The density $f$ of $X$ is not available in closed form, so that here we indeed need to rely on its estimation. The following experiment is performed in dimension $d = 10$ for two specifications of the dof, one of them with two groups of size 5 each. First, a sample of size 2500 from a more heavy-tailed grouped $t$ distribution (i.e., one with smaller dof) is drawn, and then the log-density function of $X$ is evaluated at the sample. The results are shown in Figure 3.
It is clear from the plots that integrate() again gives wrong approximations to $\log f(x)$ for inputs $x$ far out in the tail; for small inputs $x$, the results from integrate() and from dgnvmix() coincide. Furthermore, it can be seen that the density function is not monotone in the Mahalanobis distance (as grouped normal mixtures are no longer elliptical). The plot also includes the log-density functions of an ungrouped $d$-dimensional $t$ distribution with 3 and 6 degrees of freedom, respectively. The log-density function of the grouped mixture is not bounded by either; in fact, the grouped mixture shows heavier tails than both the $t$ distribution with 3 and the one with 6 dof.
  5. Kendall's Tau and Spearman's Rho
Two widely used measures of association are the rank correlation coefficients Spearman's rho $\rho_S$ and Kendall's tau $\tau$. For elliptical models, one can easily compute Kendall's tau as a function of the copula parameter $\rho$, which can be useful for estimating the matrix $P$ non-parametrically. For grouped mixtures, however, this is not easily possible. In this section, integral representations for Spearman's rho and Kendall's tau in the general grouped NVM case are derived.
If $(X, Y) \sim F$ is a random vector with continuous margins $F_X, F_Y$, then $\tau(X, Y) = \mathbb{E}[\mathrm{sign}((X - X')(Y - Y'))]$ and $\rho_S(X, Y) = \rho(F_X(X), F_Y(Y))$, where $(X', Y')$ is an independent copy of $(X, Y)$ and $\rho(\cdot, \cdot)$ denotes the linear correlation. Both $\tau$ and $\rho_S$ depend only on the copula of $F$.
If $X \sim \mathrm{NVM}_2(0, P, F_W)$ is elliptical and $\rho = P_{12}$, then

      $\tau = \frac{2}{\pi} \arcsin(\rho)$;  (20)

see (Lindskog et al. 2003, Theorem 2). This formula holds only approximately for grouped normal variance mixtures. In Daul et al. (2003), an expression was derived for Kendall's tau of bivariate grouped $t$ copulas. Their result is easily extended to the more general grouped normal variance mixture case; see Section 8 for the proof.
Proposition 1. Let $X \sim \mathrm{gNVM}_2(0, P, F_W)$ and $\rho = P_{12}$. Then

      $\tau = \frac{2}{\pi}\, \mathbb{E}\left[\arcsin\left(\rho\, \frac{\sqrt{W_1 W_2} + \sqrt{W_1' W_2'}}{\sqrt{(W_1 + W_1')(W_2 + W_2')}}\right)\right]$,  (21)

where $(W_1', W_2')$ is an independent copy of $(W_1, W_2)$.
Next, we address Spearman's rho $\rho_S$. For computing $\rho_S$, it is useful to study the orthant probability $\mathbb{P}(X_1 > 0, X_2 > 0)$. If $X \sim \mathrm{NVM}_2(0, P, F_W)$, where $P$ is a correlation matrix with $\rho = P_{12}$ and $\mathbb{P}(X = 0) = 0$, then

      $\mathbb{P}(X_1 > 0, X_2 > 0) = \frac{1}{4} + \frac{\arcsin(\rho)}{2\pi}$;

see, e.g., (McNeil et al. 2015, Proposition 7.41). Using the same technique, we can show that this result also holds for grouped normal variance mixtures; see Section 8 for the proof.
Proposition 2. Let $X \sim \mathrm{gNVM}_2(0, P, F_W)$ with $\mathbb{P}(X = 0) = 0$ and $\rho = P_{12}$. Then $\mathbb{P}(X_1 > 0, X_2 > 0) = \frac{1}{4} + \frac{\arcsin(\rho)}{2\pi}$.

Remark 1. If $X$ is a grouped elliptical distribution in the sense of (5), a very similar idea can be used to show that $\mathbb{P}(X_1 > 0, X_2 > 0) = \frac{1}{4} + \frac{\arcsin(\rho)}{2\pi}$ holds in this case as well.

Next, we derive a new expression for Spearman's rho $\rho_S$ for bivariate grouped normal variance mixture distributions; see Section 8 for the proof.
Proposition 3. Let $X \sim \mathrm{gNVM}_2(0, P, F_W)$ with continuous margins and $\rho = P_{12}$. Then

      $\rho_S = \frac{6}{\pi}\, \mathbb{E}\left[\arcsin\left(\rho\, \frac{\sqrt{W_1 W_2}}{\sqrt{(W_1 + W_1')(W_2 + W_2'')}}\right)\right]$,  (22)

where $W_1' \sim F_{W_1}$ and $W_2'' \sim F_{W_2}$ are independent of each other and of $(W_1, W_2)$.

  Numerical Results
Let $U, U', U''$ be independent $\mathrm{U}(0,1)$ random variables and write $W_i(u) = F_{W_i}^{-1}(u)$. It follows from Proposition 1 that

        $\tau = \frac{2}{\pi} \int_0^1 \int_0^1 \arcsin\left(\rho\, \frac{\sqrt{W_1(u) W_2(u)} + \sqrt{W_1(u') W_2(u')}}{\sqrt{(W_1(u) + W_1(u'))(W_2(u) + W_2(u'))}}\right) \mathrm{d}u\, \mathrm{d}u'$.

Similarly, Proposition 3 implies that

        $\rho_S = \frac{6}{\pi} \int_0^1 \int_0^1 \int_0^1 \arcsin\left(\rho\, \frac{\sqrt{W_1(u) W_2(u)}}{\sqrt{(W_1(u) + W_1(u'))(W_2(u) + W_2(u''))}}\right) \mathrm{d}u\, \mathrm{d}u'\, \mathrm{d}u''$.

Hence, both $\tau$ and $\rho_S$ can be expressed as integrals over the $k$-dimensional unit hypercube with $k \in \{2, 3\}$, so that RQMC methods as described in Section 2.2 can be applied directly to the problem in this form to estimate $\tau$ and $\rho_S$, respectively. This is implemented in the function corgnvmix() (with method = "kendall" or method = "spearman") of the R package nvmix.
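A plain Monte Carlo sketch of these two representations (corgnvmix() uses RQMC instead) for a bivariate grouped $t$ example; the dof 3 and 6 and $\rho = 0.5$ are illustrative assumptions.

    ## Plain MC sketch of the integral representations of tau and rho_S above;
    ## grouped t example with illustrative dof 3 and 6.
    qW1 <- function(u) 1 / qgamma(1 - u, shape = 1.5, rate = 1.5)  # IG(3/2, 3/2)
    qW2 <- function(u) 1 / qgamma(1 - u, shape = 3,   rate = 3)    # IG(3, 3)
    rho <- 0.5; m <- 1e5
    u <- runif(m); u1 <- runif(m); u2 <- runif(m)
    W1  <- qW1(u);  W2  <- qW2(u)    # comonotone pair (same uniform)
    W1p <- qW1(u1); W2p <- qW2(u1)   # independent comonotone copy (for tau)
    tau_hat <- (2 / pi) * mean(asin(rho * (sqrt(W1 * W2) + sqrt(W1p * W2p)) /
                                    sqrt((W1 + W1p) * (W2 + W2p))))
    W2pp <- qW2(u2)                  # third, independent uniform (for rho_S)
    rhoS_hat <- (6 / pi) * mean(asin(rho * sqrt(W1 * W2) /
                                     sqrt((W1 + W1p) * (W2 + W2pp))))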
As an example, we consider three different bivariate grouped $t$ distributions with different pairs of dof and plot the estimated $\tau$ as a function of $\rho$ in Figure 4. The elliptical case (corresponding to equal dof) is included for comparison. When the pairwise dof are close and $\rho$ is not too close to 1, the elliptical approximation is quite satisfactory. However, when the dof are further apart, there is a significant difference between the estimated $\tau$ and the elliptical approximation. This is highlighted in the plot on the right-hand side, which displays the relative difference between the two. Intuitively, it makes sense that the approximation deteriorates when the dof are further apart, as the closer the dof, the "closer" the model is to being elliptical.
  6. Copula Setting
So far, the focus of this paper was on grouped normal variance mixture distributions. This section addresses grouped normal variance mixture copulas, i.e., the copulas derived from $X \sim \mathrm{gNVM}_d(\mu, \Sigma, F_W)$ via Sklar's theorem. The first part addresses grouped NVM copulas in full generality and provides formulas for the copula, its density and the tail dependence coefficients. The second part details the important special case of inverse-gamma mixture copulas, that is, copulas derived from a grouped $t$ distribution. The third part discusses estimation of the copula and its density, whereas the fourth part answers the question of how copula parameters can be fitted to a dataset. The last part of this section includes numerical examples.
  6.1. Grouped Normal Variance Mixture Copulas
Copulas provide a flexible tool for modeling dependent risks, as they allow one to model the margins separately from the dependence between the margins. Let $X = (X_1,\dots,X_d)$ be a $d$-dimensional random vector with continuous margins $F_1,\dots,F_d$. Consider the random vector $U = (U_1,\dots,U_d)$ given by $U_j = F_j(X_j)$; note that $U_j \sim \mathrm{U}(0,1)$ for $j = 1,\dots,d$. The copula $C$ of $F$ (or $X$) is the distribution function of the margin-free $U$, i.e.,

        $C(u) = \mathbb{P}(U_1 \le u_1, \dots, U_d \le u_d)$, $\quad u \in (0,1)^d$.

If $F$ is absolutely continuous and the margins $F_j$ are strictly increasing and continuous, the copula density is given by

        $c(u) = \frac{f\big(F_1^{-1}(u_1), \dots, F_d^{-1}(u_d)\big)}{\prod_{j=1}^d f_j\big(F_j^{-1}(u_j)\big)}$,

where $f$ denotes the (joint) density of $F$ and $f_j$ is the marginal density of $F_j$. For more about copulas and their applications to risk management, see, e.g., Embrechts et al. (2001); Nelsen (2007).
Since copulas are invariant with respect to strictly increasing marginal transformations, we may wlog assume that $\mu = 0$ and that $\Sigma = P$ is a correlation matrix, and we may consider $X \sim \mathrm{gNVM}_d(0, P, F_W)$. We find using (11) that the grouped normal variance mixture copula is given by

        $C(u) = \int_0^1 \Phi_P\Big(F_1^{-1}(u_1)\big/\sqrt{F_{W_1}^{-1}(v)}, \dots, F_d^{-1}(u_d)\big/\sqrt{F_{W_d}^{-1}(v)}\Big)\, \mathrm{d}v$,  (24)

where $\Phi_P(b) = \mathbb{P}(Y \le b)$ for $Y \sim \mathrm{N}_d(0, P)$, and its density can be computed using (18) as

        $c(u) = \frac{f\big(F_1^{-1}(u_1), \dots, F_d^{-1}(u_d)\big)}{\prod_{j=1}^d f_j\big(F_j^{-1}(u_j)\big)}$,  (25)

where $F_j$ and $f_j$ denote the distribution function and density function of $X_j$ for $j = 1,\dots,d$; directly considering $\log c(u)$ also makes (25) more robust to compute.
In the remainder of this subsection, some useful properties of gNVM copulas are derived. In particular, we study symmetry properties, rank correlation and tail dependence coefficients.
  6.1.1. Radial Symmetry and Exchangeability
A $d$-dimensional random vector $X$ is radially symmetric about $\mu$ if $X - \mu \overset{\mathrm{d}}{=} \mu - X$. It is evident from (2) that $X \sim \mathrm{gNVM}_d(\mu, \Sigma, F_W)$ is radially symmetric about its location vector $\mu$. In layman's terms, this implies that jointly large values of $X$ are as likely as jointly small values of $X$. Radial symmetry also implies that $\lambda_\ell = \lambda_u$ for the tail dependence coefficients introduced in Section 6.1.2 below.
If $(X_1,\dots,X_d) \overset{\mathrm{d}}{=} (X_{\pi(1)},\dots,X_{\pi(d)})$ for all permutations $\pi$ of $\{1,\dots,d\}$, the random vector $X$ is called exchangeable. The same definition applies to copulas. If $X \sim \mathrm{gNVM}_d(0, P, F_W)$, then $X$ is in general not exchangeable, unless $S = 1$ and $P$ is an equicorrelation matrix, in which case exchangeability holds. The lack of exchangeability implies that, in general, $C(u_1,\dots,u_d) \ne C(u_{\pi(1)},\dots,u_{\pi(d)})$.
  6.1.2. Tail Dependence Coefficients
Consider a bivariate $\mathrm{gNVM}$ copula. Such a copula is radially symmetric; hence, the lower and upper tail dependence coefficients are equal, i.e., $\lambda := \lambda_\ell = \lambda_u$, where

          $\lambda_\ell = \lim_{u \downarrow 0} \frac{C(u, u)}{u}$, $\quad \lambda_u = \lim_{u \uparrow 1} \frac{1 - 2u + C(u, u)}{1 - u}$.

In the case where only the quantile functions $F_{W_i}^{-1}$ are available, no simple expression for $\lambda$ exists. In Luo and Shevchenko (2010), $\lambda$ is derived for grouped $t$ copulas, as will be discussed in Section 6.2. Following the arguments used in their proof, the following proposition provides a new expression for $\lambda$ in the more general normal variance mixture case.
Proposition 4. The tail dependence coefficient $\lambda$ of a bivariate $\mathrm{gNVM}_2(0, P, F_W)$ copula with $\rho = P_{12} > -1$ admits a representation as a one-dimensional integral involving $\Phi$ and the quantile functions $F_{W_1}^{-1}, F_{W_2}^{-1}$; see Section 8 for the proof.

  6.2. Inverse-Gamma Mixtures
If $X \sim t_d(0, P, \nu)$ for a positive definite correlation matrix $P$, the copula of $X$ extracted via Sklar's theorem is the well-known $t$ copula, denoted by $C_{\nu, P}^t$. This copula is given by

        $C_{\nu, P}^t(u) = t_{\nu, P}\big(t_\nu^{-1}(u_1), \dots, t_\nu^{-1}(u_d)\big)$,  (26)

where $t_{\nu, P}$ and $t_\nu^{-1}$ denote the distribution function of $t_d(0, P, \nu)$ and the quantile function of a univariate standard $t$ distribution, respectively. Note that (26) is merely the distribution function of $X$ evaluated at the quantiles $t_\nu^{-1}(u_1), \dots, t_\nu^{-1}(u_d)$. The copula density $c_{\nu, P}^t$ is

        $c_{\nu, P}^t(u) = \frac{f_{\nu, P}\big(t_\nu^{-1}(u_1), \dots, t_\nu^{-1}(u_d)\big)}{\prod_{j=1}^d f_\nu\big(t_\nu^{-1}(u_j)\big)}$,  (27)

where $f_{\nu, P}$ is the joint density from (17) and $f_\nu$ is the univariate standard $t$ density. The (upper and lower) tail dependence coefficient $\lambda$ of the bivariate $C_{\nu, P}^t$ with $\rho = P_{12}$ is well known to be

        $\lambda = 2\, t_{\nu+1}\Big(-\sqrt{\tfrac{(\nu + 1)(1 - \rho)}{1 + \rho}}\Big)$;  (28)

see (Demarta and McNeil 2005, Proposition 1). The multivariate $t$ distribution being elliptical implies the formula $\tau = \frac{2}{\pi} \arcsin(\rho)$ for Kendall's tau.
A closed formula for Spearman's rho is not available, but our Proposition 3 implies a representation of $\rho_S$ as a three-dimensional integral (with inverse-gamma mixing quantile functions), which can be evaluated numerically.
Next, consider a grouped inverse-gamma mixture model. If $X \sim gt_d(0, P, \nu)$ with $\nu = (\nu_1,\dots,\nu_d)$, the copula of $X$ is the grouped $t$ copula, denoted by $C_{\nu, P}^{gt}$. From (24),

        $C_{\nu, P}^{gt}(u) = \int_0^1 \Phi_P\Big(t_{\nu_1}^{-1}(u_1)\big/\sqrt{F_{W_1}^{-1}(v)}, \dots, t_{\nu_d}^{-1}(u_d)\big/\sqrt{F_{W_d}^{-1}(v)}\Big)\, \mathrm{d}v$,

and the copula density follows from (25) as

        $c_{\nu, P}^{gt}(u) = \frac{f\big(t_{\nu_1}^{-1}(u_1), \dots, t_{\nu_d}^{-1}(u_d)\big)}{\prod_{j=1}^d f_{\nu_j}\big(t_{\nu_j}^{-1}(u_j)\big)}$,

where $f$ is the (grouped $t$) joint density of $X$ from (18) and $f_{\nu_j}$ denotes the density of a univariate standard $t$ distribution with $\nu_j$ dof. The (lower and upper) tail dependence coefficient $\lambda$ of $C_{\nu, P}^{gt}$ admits a one-dimensional integral representation in terms of univariate $t$ quantities; see (Luo and Shevchenko 2010, Equation (26)).
Finally, consider rank correlation coefficients for grouped $t$ copulas. No closed formula for either Kendall's tau or Spearman's rho exists in the grouped $t$ case. An exact integral representation of $\tau$ for $C_{\nu, P}^{gt}$ follows from Proposition 1. No substantial simplification of (21) therein can be achieved by considering the special case $W_i \sim \mathrm{IG}(\nu_i/2, \nu_i/2)$. In order to compute $\tau$, one can either numerically integrate (21) (as will be discussed in the next subsection) or use the approximation $\tau \approx \frac{2}{\pi} \arcsin(\rho)$, which was shown to be a "very accurate" approximation in Daul et al. (2003).

For Spearman's rho, no closed formula can be derived either, not even in the ungrouped $t$ copula case, so that the integral in (22) in Proposition 3 needs to be computed numerically, as will be discussed in the next subsection.
The discussion in this section highlights that moving from a scalar mixing rv $W$ (as in the classical $t$ case) to comonotone mixing rvs $W_1,\dots,W_d$ (as in the grouped $t$ case) introduces challenges from a computational point of view. While in the classical $t$ setting the density, Kendall's tau and the tail dependence coefficient are available in closed form, all of these quantities need to be estimated in the more general grouped setting. Efficient estimation of these important quantities is discussed in the next subsection.
  6.3. Estimation of the Copula and Its Density
Consider a $d$-dimensional grouped normal variance mixture copula $C$. From (24), it follows that

        $C(u) = F\big(F_1^{-1}(u_1), \dots, F_d^{-1}(u_d)\big)$,

where $F$ is the distribution function of $X \sim \mathrm{gNVM}_d(0, P, F_W)$ and $F_j$ is the distribution function of $X_j$ for $j = 1,\dots,d$. If the margins are known (as in the case of an inverse-gamma mixture), evaluating the copula is no harder than evaluating the distribution function of $X$, so that the methods described in Section 3.1 can be applied.

When the mixing rvs $W_j$ are only known through their quantile functions in the form of a "black box", one needs to estimate the marginal quantiles $F_j^{-1}(u_j)$ of $F$ first. Note that

        $F_j(x) = \int_0^1 \Phi\Big(x \big/ \sqrt{F_{W_j}^{-1}(u)}\Big)\, \mathrm{d}u$,

which can be estimated using RQMC. The quantile $F_j^{-1}(u_j)$ can then be estimated by numerically solving $\hat F_j(x) = u_j$ for $x$, for instance using a bisection algorithm or Newton's method.
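A minimal sketch of this two-step procedure (midpoint-rule quadrature for $F_j$, then uniroot() for the quantile); the inverse-gamma mixing quantile function is an illustrative assumption.

    ## Sketch: marginal df F_j via the mixture representation above, then the
    ## quantile by root finding; the IG mixing quantile function is illustrative.
    u_nodes <- ((1:1e4) - 0.5) / 1e4          # midpoint rule on (0,1)
    Fj     <- function(x, qWj) mean(pnorm(x / sqrt(qWj(u_nodes))))
    Fj_inv <- function(p, qWj)
      uniroot(function(x) Fj(x, qWj) - p, interval = c(-1e3, 1e3))$root
    qWj <- function(u) 1 / qgamma(1 - u, shape = 2, rate = 2)  # IG(2, 2) => t_4 margin
    Fj_inv(0.99, qWj)  # estimated 99% quantile of the jth margin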
The general form of gNVM copula densities was given in (25). Again, if the margins are known, the only unknown quantity is the joint density $f$, which can be estimated using the adaptive RQMC procedure proposed in Section 4.1. If the margins are not available, $F_j^{-1}(u_j)$ can be estimated as discussed above. The marginal densities $f_j$ can be estimated using an adaptive RQMC algorithm similar to the one developed in Section 4.1; see also (Hintz et al. 2020, Section 4).

Remark 2. Estimating the copula density is the most challenging problem discussed in this paper if we assume that $W$ is only known via its marginal quantile functions. Evaluating the copula density $c(u)$ at one $u \in (0,1)^d$ requires estimation of:
- the marginal quantiles $F_j^{-1}(u_j)$, which involves estimation of $F_j$ and then numerical root finding, for each $j = 1,\dots,d$;
- the marginal densities evaluated at the quantiles, $f_j(F_j^{-1}(u_j))$ for $j = 1,\dots,d$, which involves estimation of the density of a univariate normal variance mixture;
- the joint density evaluated at the quantiles, $f(F_1^{-1}(u_1),\dots,F_d^{-1}(u_d))$, which is another one-dimensional integration problem.

It follows from Remark 2 that, while estimation of $c(u)$ is theoretically possible with the methods proposed in this paper, the problem becomes computationally intractable for large dimensions $d$. If the margins are known, however, our proposed methods are efficient and accurate, as demonstrated in the next subsection, where we focus on the important case of a grouped $t$ model. Our methods to estimate the copula and the density of $C$ are implemented in the functions pgStudentcopula() and dgStudentcopula() in the R package nvmix.
  6.4. Fitting Copula Parameters to a Dataset
In this subsection, we discuss estimation methods for grouped normal variance mixture copulas. Let $U_1,\dots,U_n$ be independent and distributed according to some distribution with a $\mathrm{gNVM}$ copula $C$ as underlying copula, with $S$ groups of sizes $d_1,\dots,d_S$ such that $d_1 + \dots + d_S = d$. Furthermore, let $\theta_k$ be (a vector of) parameters of the $k$th mixing distribution for $k = 1,\dots,S$; for instance, in the grouped $t$ case, $\theta_k = \nu_k$ is the degrees of freedom for group $k$. Finally, denote by $\theta = (\theta_1,\dots,\theta_S)$ the vector consisting of all mixing parameters. Note that we assume that the group structure is given. We are interested in estimating the parameter vector $\theta$ and the matrix $P$ of the underlying copula $C$.
In Daul et al. (2003), this problem was discussed for the grouped $t$ copula, where $W_k \sim \mathrm{IG}(\nu_k/2, \nu_k/2)$ for $k = 1,\dots,S$. In this case, all subgroups are $t$ copulas and Daul et al. (2003) suggest estimating the dof $\nu_k$ separately in each subgroup. Computationally, this is rather simple, as the density of the ungrouped $t$ copula is known analytically. Luo and Shevchenko (2010) consider the grouped $t$ copula with $S = d$, so $W_i \sim \mathrm{IG}(\nu_i/2, \nu_i/2)$ for $i = 1,\dots,d$. Since any univariate margin of a copula is uniformly distributed, separate estimation is not feasible in this case. As such, Luo and Shevchenko (2010) suggest estimating $\nu_1,\dots,\nu_d$ jointly by maximizing the copula likelihood of the grouped mixture. In both references, the matrix $P$ is estimated by estimating pairwise Kendall's taus and using the approximate identity $\rho_{ij} \approx \sin(\pi \tau_{ij} / 2)$ for $i, j = 1,\dots,d$. Although we have shown in Section 5 that in some cases this approximation can be too crude, our assessment is that, in the context of the fitting examples considered in the present section, it is sufficiently accurate. Luo and Shevchenko (2010) also consider joint estimation of $(P, \nu)$ by maximizing the corresponding copula likelihood simultaneously over all parameters. Their numerical results in small dimensions suggest that this does not lead to a significant improvement. In large dimensions $d$, however, the optimization problem becomes intractable, so that the first, non-parametric approach for estimating $P$ is likely to be preferred.
We combine the two estimation methods, applied to the general case of a grouped normal variance mixture, in Algorithm 2.
        
| Algorithm 2: Estimation of the Copula Parameters $\theta$ and $P$ of $C$. |
| Given iid $U_1,\dots,U_n \sim C$ (or data transformable to such), estimate $\theta$ and $P$ of the underlying $\mathrm{gNVM}$ copula as follows:
1. Estimation of P. Estimate Kendall's tau $\hat\tau_{ij}$ for each pair $(i, j)$, $1 \le i < j \le d$. Use the approximate identity $\hat\rho_{ij} = \sin(\pi \hat\tau_{ij} / 2)$ to find the pairwise estimates. Then combine the $\hat\rho_{ij}$ into a correlation matrix $\hat P$, which may have to be modified to ensure positive definiteness.
2. Transformation to pseudo-observations. If necessary, transform the data $X_1,\dots,X_n$ to pseudo-observations $U_1,\dots,U_n$ from the underlying copula, for instance by setting $U_{i,j} = R_{i,j}/(n + 1)$, where $R_{i,j}$ is the rank of $X_{i,j}$ among $X_{1,j},\dots,X_{n,j}$.
3. Initial parameters. Maximize the copula log-likelihood for each subgroup $k$ with $d_k > 1$ over its respective parameters separately. That is, if $U_{i,(k)}$ denotes the sub-vector of $U_i$ belonging to group $k$ and $\hat P_{(k)}$ is defined accordingly, solve the optimization problems

        $\hat\theta_k = \operatorname{argmax}_{\theta_k} \sum_{i=1}^n \log c_{\theta_k, \hat P_{(k)}}\big(U_{i,(k)}\big)$, $\quad k = 1,\dots,S$.  (29)

   For "groups" with $d_k = 1$, choose the initial estimate $\hat\theta_k$ from prior/expert experience or as a hard-coded value.
4. Joint estimation. With the initial estimates $\hat\theta_1,\dots,\hat\theta_S$ at hand, optimize the full copula likelihood to estimate $\theta$; that is, solve

        $\hat\theta = \operatorname{argmax}_{\theta} \sum_{i=1}^n \log c_{\theta, \hat P}\big(U_i\big)$.  (30) |
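A sketch of Step 1 in R, with Matrix::nearPD() as one possible repair of a non-positive-definite estimate (the repair method is our choice, not prescribed by the algorithm):

    ## Sketch of Step 1 of Algorithm 2 (X: n x d data matrix).
    P_from_tau <- function(X) {
      tau <- cor(X, method = "kendall")   # pairwise Kendall's tau
      P <- sin(pi * tau / 2)              # invert tau = (2/pi) arcsin(rho)
      if (min(eigen(P, symmetric = TRUE, only.values = TRUE)$values) <= 0)
        P <- as.matrix(Matrix::nearPD(P, corr = TRUE)$mat)  # repair if indefinite
      P
    }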
The method proposed in Daul et al. (2003) returns the initial estimates obtained in Step 3. A potential drawback of this approach is that it fails to account for the dependence between the groups correctly. Indeed, the dependence between a component in group $k_1$ and a component in group $k_2$ (e.g., measured by Kendall's tau or by the tail-dependence coefficient) is determined by both $\theta_{k_1}$ and $\theta_{k_2}$. As such, these parameters should be estimated jointly.
Note that the copula density is not available in closed form, not even in the grouped $t$ case, so that each call of the likelihood function in (30) requires the approximation of $n$ integrals. This poses numerical challenges, as the estimated likelihood function is typically "bumpy", having many local maxima due to estimation errors.
If $W$ is only known via its marginal quantile functions, as is the general theme of this paper, the optimization problems in (29) and (30) become intractable (unless $d$ and $n$ are small) due to the numerical challenges involved in the estimation of the copula density; see also Remark 2. We leave the problem of fitting grouped normal variance mixture copulas in full generality (where the distribution of the mixing random variables $W_k$ is only specified via marginal quantile functions in the form of a "black box") for future research. Instead, we focus on the important case of a grouped $t$ copula. Here, the quantile functions $F_{W_k}^{-1}$ (of $W_k \sim \mathrm{IG}(\nu_k/2, \nu_k/2)$) and the marginal densities are known for $k = 1,\dots,S$, since the margins are all $t$ distributed. This substantially simplifies the underlying numerical procedure. Our method is implemented in the function fitgStudentcopula() of the R package nvmix. The numerical optimizations in Steps 3 and 4 are passed to the R optimizer optim() and the copula density is estimated as in Section 6.3.
Example 1. Consider a 6-dimensional grouped $t$ copula with three groups of size 2 each and group-specific degrees of freedom (7 for the third group). We perform the following experiment: we sample a correlation matrix $P$ using the R function rWishart(). Then, for each sample size $n$ considered, we repeat sampling $U_1,\dots,U_n$ 15 times and, in each case, estimate the degrees of freedom once using the method in Daul et al. (2003) (i.e., by estimating the dof in each group separately) and once using our method from the previous section. The true matrix $P$ is used in the fitting, so that the focus is really on estimating the dof. The results are displayed in Figure 5. The estimates on the left are obtained for each group separately; on the right, the dof were estimated jointly by maximizing the full copula likelihood (with initial estimates obtained as in the left figure). Clearly, the jointly estimated parameters are much closer to their true values (which are known in this simulation study and indicated by horizontal lines), and it can be confirmed that the variance decreases with increasing sample size $n$.

Example 2. Let us now consider the negative logarithmic returns of the constituents of the Dow Jones 30 index from 1 January 2014 to 31 December 2015 (data obtained from the R package qrmdata of Hofert and Hornik (2016)) and, after deGARCHing, fit a grouped $t$ copula to the standardized residuals. We choose the natural groupings induced by the industry sectors of the 30 constituents and merge groups of size 1, so that 9 groups are left. Figure 6 displays the estimates obtained for various specifications of maxit, the maximum number of iterations for the underlying optimizer (note that the current default of optim() is as low as maxit = 500). The points for maxit = 0 correspond to the initial estimates found from separately fitting $t$ copulas to the groups. The initial estimates differ significantly from the maximum likelihood estimates (MLEs) obtained from the joint estimation of the dof. Note also that the MLEs change with an increasing maxit argument, even though they no longer change drastically once 1500 or more iterations are used. Note that the initial parameters result in a much more heavy-tailed model than the MLEs. Figure 6 also displays the estimated log-likelihood of the parameters found by the fitting procedure. The six lines correspond to the estimated log-likelihood using six different seeds. It can be seen that estimating the dof jointly (as opposed to group-wise) yields a substantially larger log-likelihood, whereas increasing the parameter maxit (beyond a necessary minimum) only gives a minor improvement.

In order to examine the impact of the different estimates on the underlying copula in terms of its tail behavior,
Figure 7 displays the probability $\mathbb{P}(U_1 > u, \dots, U_d > u)$, estimated using methods from Section 6.3, as a function of $u$; in a risk-management context, this is the probability of a jointly large loss, hence of a rare event. A small absolute error tolerance was used to estimate the copula. The figure also includes the corresponding probability for the ungrouped $t$ copula, for which the dof were estimated to be 6.3. Figure 7 indicates that the initial estimates yield the most heavy-tailed model. This seems reasonable, since all initial estimates for the dof range between 0.9 and 5.3 (with average 2.8). The models obtained from the MLEs exhibit the smallest tail probability, indicating that these are the least heavy-tailed models considered here. This is in line with Figure 6, which shows that the dof are substantially larger than the initial estimates. The ungrouped $t$ copula is more heavy-tailed than the fitted grouped one (with MLEs), but less heavy-tailed than the fitted grouped one with initial estimates.
This example demonstrates that it is generally advisable to estimate the dof jointly when grouped modeling is of interest, rather than group-wise as suggested in Daul et al. (2003). Indeed, in this particular example, the initial estimates give a model that substantially overestimates the risk of jointly large losses. As can be seen from Figure 6, optimizing an estimated log-likelihood function is not at all trivial, in particular when many parameters are involved. Indeed, the underlying optimizer never detected convergence, which is why the user needs to carefully assess which specification of maxit to use. We plan to explore more elaborate optimization procedures that perform better in large dimensions for this problem in the future.
Example 3. In this example, we consider the problem of mean-variance (MV) portfolio optimization in the classical Markowitz (1952) setting. Consider $d$ assets, and denote by $\mu_t$ and $\Sigma_t$ the expected return vector on the risky assets in excess of the risk-free rate and the variance-covariance (VCV) matrix of asset returns in the portfolio at time $t$, respectively. We assume that an investor chooses the weights $w$ of the portfolio to maximize the quadratic utility function $U(w) = w^\top \mu_t - \frac{\gamma}{2} w^\top \Sigma_t w$, where in what follows we fix the risk-aversion parameter $\gamma$. When there are no shortselling (or other) constraints, one finds the optimal $w_t^* = \frac{1}{\gamma} \Sigma_t^{-1} \mu_t$. As in Low et al. (2016), we consider relative portfolio weights, which are thus given by

      $w_t = \frac{w_t^*}{|\mathbf{1}^\top w_t^*|} = \frac{\Sigma_t^{-1} \mu_t}{|\mathbf{1}^\top \Sigma_t^{-1} \mu_t|}$.

As such, the investor needs to estimate $\mu_t$ and $\Sigma_t$. If we assume no shortselling, i.e., $w_i \ge 0$ for $i = 1,\dots,d$, the optimization problem can be solved numerically, for instance using the R package quadprog of Turlach et al. (2019); a sketch is given after the following list. Assume we have return data for the $d$ assets stored in vectors $r_t \in \mathbb{R}^d$, $t = 1,\dots,T$, and a sampling window of length $M$. We perform an experiment similar to Low et al. (2016) and compare a historical approach with a model-based approach to estimate $\mu_t$ and $\Sigma_t$. The main steps are as follows:
- 1. In each period $t \in \{M+1,\dots,T\}$, estimate $\mu_t$ and $\Sigma_t$ using the $M$ previous return data $r_{t-M},\dots,r_{t-1}$.
- 2. Compute the optimal portfolio weights $w_t$ and the out-of-sample return $w_t^\top r_t$.
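As mentioned above, the no-shortselling problem is a quadratic program; a sketch using quadprog::solve.QP() follows (the risk-aversion default $\gamma = 1$ is an illustrative assumption).

    ## Sketch of the no-shortselling MV problem: maximize w'mu - (gamma/2) w'Sigma w
    ## s.t. sum(w) = 1, w >= 0. solve.QP() minimizes (1/2) b'Db - d'b subject to
    ## A'b >= b0, the first meq constraints holding with equality.
    library(quadprog)
    mv_weights <- function(mu, Sigma, gamma = 1) {  # gamma = 1 is illustrative
      d <- length(mu)
      solve.QP(Dmat = gamma * Sigma, dvec = mu,
               Amat = cbind(rep(1, d), diag(d)),  # budget constraint, then w_i >= 0
               bvec = c(1, rep(0, d)), meq = 1)$solution
    }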
In the historical approach, $\mu_t$ and $\Sigma_t$ in the first step are merely computed as the sample mean vector and the sample VCV matrix of the past return data. Our model-based approach is a simplification of the approach used in Low et al. (2016). In particular, to estimate $\mu_t$ and $\Sigma_t$ in the first step, the following is done in each time period:
- 1a. Fit marginal ARMA-GARCH models with standardized $t$ innovations to $r_{t-M},\dots,r_{t-1}$.
- 1b. Extract the standardized residuals and fit a grouped $t$ copula to the pseudo-observations thereof.
- 1c. Sample $n$ vectors from the fitted copula, transform the margins by applying the quantile function of the respective standardized $t$ distribution and, based on these $n$ $d$-dimensional residual vectors, sample from the fitted ARMA-GARCH models, giving a total of $n$ simulated return vectors, say $\hat r_t^{(1)},\dots,\hat r_t^{(n)}$.
- 1d. Estimate $\mu_t$ and $\Sigma_t$ from $\hat r_t^{(1)},\dots,\hat r_t^{(n)}$.
The historical and model-based approaches each produce $T - M$ out-of-sample returns, from which we can estimate the certainty-equivalent return (CER) and the Sharpe ratio (SR) as

      $\widehat{\mathrm{CER}} = \hat\mu_{\mathrm{oos}} - \frac{\gamma}{2}\, \hat\sigma_{\mathrm{oos}}^2$, $\quad \widehat{\mathrm{SR}} = \hat\mu_{\mathrm{oos}} \big/ \hat\sigma_{\mathrm{oos}}$,

where $\hat\mu_{\mathrm{oos}}$ and $\hat\sigma_{\mathrm{oos}}$ denote the sample mean and sample standard deviation of the $T - M$ out-of-sample returns; see also Tu and Zhou (2011). Note that larger, positive values of the SR and CER indicate better portfolio performance. We consider logarithmic returns of the constituents of the Dow Jones 30 index from 1 January 2013 to 31 December 2014 (data obtained from the R package qrmdata of Hofert and Hornik (2016)), a sampling window of $M$ days, $n$ samples to estimate $\mu_t$ and $\Sigma_t$ in the model-based approach, a risk-free interest rate of zero and no transaction costs. We report (in percent) the point estimates of the CER and the SR for the historical approach and for the model-based approach based on an ungrouped and a grouped $t$ copula in Table 2, assuming no shortselling. To limit the run time for this illustrative example, the degrees of freedom for the grouped and ungrouped $t$ copula are estimated once and held fixed throughout all time periods $t$. We see that the point estimates for the grouped model exceed the point estimates for the ungrouped model.

  7. Discussion and Conclusions
We introduced the class of grouped normal variance mixtures and provided efficient algorithms to work with this class of distributions: estimating the distribution function and log-density function, estimating the copula and its density, estimating Spearman's rho and Kendall's tau and estimating the parameters of a grouped NVM copula given a dataset. Most algorithms (and functions in the package nvmix) merely require one to provide the quantile function(s) of the mixing distributions. Due to their importance in practice, the algorithms presented in this paper (and their implementation in the R package nvmix) are widely applicable.
We saw that the distribution function (and hence the copula) of grouped NVM distributions can be efficiently estimated, even in high dimensions, using RQMC algorithms. The density function of grouped NVM distributions is in general not available in closed form, not even for the grouped t distribution, so that one has to rely on its estimation. Our proposed adaptive algorithm is capable of estimating the log-density accurately and efficiently, even in high dimensions. Fitting grouped normal variance mixture copulas, such as the grouped t copula, to data is an important yet challenging task due to the lack of a tractable density function. Thanks to our adaptive procedure for estimating the density, the parameters can be estimated jointly in the special case of a grouped t copula. As was demonstrated in the previous section, it is indeed advisable to estimate the dof jointly, as otherwise one might severely over- or underestimate the joint tails.
A computational challenge that we plan to further investigate is the optimization of the estimated log-likelihood function, which is currently slow and lacks a reliable convergence criterion that can be used for automation. Another avenue for future research is to study how one can, for a given multivariate dataset, assign the components to homogeneous groups.