Grouped Normal Variance Mixtures

Grouped normal variance mixtures are a class of multivariate distributions that generalize classical normal variance mixtures such as the multivariate t distribution by allowing different groups to have different (comonotone) mixing distributions. This allows one to better model risk factors where components within a group are of similar type, but where different groups have components of quite different type. This paper provides an encompassing body of algorithms to address the computational challenges of working with this class of distributions. In particular, the distribution function and copula are estimated efficiently using randomized quasi-Monte Carlo (RQMC) algorithms. We propose to estimate the log-density function, which is in general not available in closed form, using an adaptive RQMC scheme. This, in turn, gives rise to a likelihood-based fitting procedure to jointly estimate the parameters of a grouped normal mixture copula. We also provide mathematical expressions and methods to compute Kendall's tau, Spearman's rho and the tail dependence coefficient λ. All algorithms presented are available in the R package nvmix (version ≥ 0.0.5).


Introduction
It is well known that for the purpose of modeling dependence in a risk management setting, the multivariate normal distribution is not flexible enough, and its use can therefore lead to a misleading assessment of risk(s). Indeed, the multivariate normal has light tails and its copula is tail-independent, so that inference based on this model heavily underestimates joint extreme events. An important class of distributions that generalizes this simple model is that of normal variance mixtures. A random vector X = (X_1, . . . , X_d) follows a normal variance mixture, denoted by X ∼ NVM_d(µ, Σ, F_W), if, in distribution,

X = µ + √W A Z, (1)

where µ ∈ R^d is the location (vector), Σ = AA⊤ for A ∈ R^{d×k} denotes the symmetric, positive semidefinite scale (matrix) and W ∼ F_W is a non-negative random variable independent of the random vector Z ∼ N_k(0, I_k) (where I_k ∈ R^{k×k} denotes the identity matrix); see, for example, (McNeil et al. 2015, Section 6.2). Here, the random variable W can be thought of as a shock mixing the normal Z, thus allowing X to have different tail behavior and dependence structure than the special case of a multivariate normal. The multivariate t distribution with ν > 0 degrees of freedom (dof) is a special case of (1), obtained for W ∼ IG(ν/2, ν/2); a random variable (rv) W is said to follow an inverse-gamma distribution with shape α > 0 and rate β > 0, notation W ∼ IG(α, β), if W has density f_W(w) = β^α w^{−α−1} exp(−β/w)/Γ(α) for w > 0 (here, Γ denotes the gamma function). Grouped normal variance mixtures generalize (1) by replacing the scalar mixing rv W with a vector W of comonotone mixing rvs: in distribution,

X = µ + diag(√W_1, . . . , √W_d) A Z, (2)

where

W = (W_1, . . . , W_d) = (F←_{W_1}(U), . . . , F←_{W_d}(U)), U ∼ U(0, 1). (3)
If a d-dimensional random vector X satisfies (2) with W given as in (3), we use the notation X ∼ gNVM_d(µ, Σ, F_W), where F_W(w) = P(W ≤ w) for w ∈ R^d and the inequality is understood componentwise.
As mentioned above, in the case of an (ungrouped) normal variance mixture distribution from (1), the scalar random variable (rv) W can be regarded as a shock affecting all components of X. In the more general setting considered in this paper where W is a vector of comonotone mixing rvs, different, perfectly dependent random variables affect different margins of X. By moving from a scalar mixing rv to a comonotone random vector, one obtains non-elliptical distributions well beyond the classical multivariate t case, giving rise to flexible modeling of joint and marginal body and tail behaviors. The price to pay for this generalization is significant computational challenges: Not even the density of a grouped t distribution is available in closed form.
At first glance, the definition given in (2) does not indicate any "grouping" yet. However, Equation (3) allows one to group the components of the random vector X such that all components within a group have the same mixing distribution. More precisely, let W be split into S sub-vectors, i.e., W = (W_1, . . . , W_S), where W_k has dimension d_k for k = 1, . . . , S and ∑_{k=1}^S d_k = d. Now let each W_k have stochastic representation W_k = (F←_{W_k}(U), . . . , F←_{W_k}(U)). Hence, all univariate margins of the sub-vector W_k are identically distributed. This implies that all margins of the corresponding sub-vector X_k are of the same type.
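To make the comonotone construction concrete, the following Python sketch samples from a grouped NVM distribution: a single uniform U drives all marginal quantile functions, which are treated as black boxes. The function name and the exponential quantile functions are hypothetical choices for illustration only; the paper's actual implementation is the R package nvmix.

```python
import math
import random

# Illustrative "black box" quantile functions for S = 2 groups; exponential
# mixing is used only because its quantile is available in closed form.
q_group1 = lambda u: -math.log1p(-u)        # Exp(1) quantile
q_group2 = lambda u: -2.0 * math.log1p(-u)  # Exp(1/2) quantile

def rgnvm(n, quantiles, A, mu, rng=None):
    """Sample from a grouped NVM via X = mu + diag(sqrt(W)) A Z, where the
    comonotone W_j = q_j(U) all share a single uniform U as in (3)."""
    rng = rng or random.Random(42)
    d = len(mu)
    samples = []
    for _ in range(n):
        u = rng.random()                      # one U for all components
        w = [q(u) for q in quantiles]         # comonotone mixing vector W
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        az = [sum(A[i][k] * z[k] for k in range(d)) for i in range(d)]
        samples.append([mu[i] + math.sqrt(w[i]) * az[i] for i in range(d)])
    return samples

X = rgnvm(1000, [q_group1, q_group2],
          A=[[1.0, 0.0], [0.5, math.sqrt(0.75)]], mu=[0.0, 0.0])
```

Replacing the exponential quantiles by inverse-gamma quantiles (not in the Python standard library) would yield samples from a grouped t distribution.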
An example is the copula derived from X in (2) when F_{W_k} = IG(ν_k/2, ν_k/2) for k = 1, . . . , S; this is the aforementioned grouped t copula. Here, different (lower-dimensional) margins of the copula are (potentially) different t copulas with different dof, allowing for more flexibility in modeling pairwise dependencies. A grouped t copula with S = d, that is, when each component has its own mixing distribution, was proposed in Venter et al. (2007) (therein called "individuated t copula") and studied in more detail in Luo and Shevchenko (2010) (therein called "t copula with multiple dof"). If S = 1, the classical t copula with exactly one dof parameter is recovered.
For notational convenience, derivations in this paper are often done for the case S = d, so that the F W j are all different; the case S < d, that is when grouping is present, is merely a special case where some of the F W j are identical. That being said, we chose to keep the name "grouped" to refer to this class of models so as to reflect the original motivation for this type of model, e.g., as in Daul et al. (2003), where it is used to model the components of a portfolio in which there are subgroups representing different business sectors.
Previous work on grouped t copulas and their corresponding distributions includes some algorithms for the tasks needed to handle these models, but was mostly focused on demonstrating the superiority of this class of models over special cases such as the multivariate normal or t distribution. More precisely, in Daul et al. (2003), the grouped t copula was introduced and applied to model an internationally diversified credit portfolio of 92 risk factors split into 8 subgroups. It was demonstrated that the grouped t copula is superior to both the Gaussian and the t copula with regard to modeling the tail dependence present in the data. Luo and Shevchenko (2010) also study the grouped t copula and, unlike Daul et al. (2003), allow group sizes of 1 (corresponding to S = d in our definition). They provide calibration methods to fit the copula to data and furthermore study bivariate characteristics of the grouped t copula, including symmetry properties and tail dependence.
However, to the best of our knowledge, there currently does not exist an encompassing body of work providing all algorithms and formulas required to handle these copulas and their corresponding distributions, both in terms of evaluating distributional quantities and in terms of general fitting algorithms. In particular, not even the problem of computing the distribution and density function of a grouped t copula has been addressed. Our paper fills this gap by providing a complete set of algorithms for performing the main computational tasks associated with these distributions and their associated copulas, and does so in as automated a way as possible. This is done not only for grouped t copulas, but (in many cases) for the more general grouped normal variance mixture distributions/copulas, which allow for even further flexibility in modeling the shock variables W. Furthermore, we assume that the only available information about the distribution of the W_j is the marginal quantile functions in the form of a "black box", meaning that we can only evaluate these quantile functions but have no mathematical expression for them (so that neither the density nor the distribution function of the W_j is available in closed form).
Our work includes the following contributions: (i) we develop an algorithm to evaluate the distribution function of a grouped NVM model. Our method only requires the user to provide a function that evaluates the quantile function of the W_j through a black box. As such, different mixing distributions can be studied by merely providing a quantile function, without having to implement an integration routine for the model at hand; (ii) as mentioned above, the density function of a grouped t distribution is not available in closed form, nor is it in the more general grouped NVM case. We provide an adaptive algorithm to estimate this density function in a very general setting. The adaptive mechanism we propose ensures that the estimation procedure is precise even for points that are far from the mean; (iii) to estimate Kendall's tau and Spearman's rho for a two-dimensional grouped NVM copula, we provide a representation as an expectation, which in turn leads to an easy-to-approximate two- or three-dimensional integral; (iv) we provide algorithms to estimate a grouped NVM copula and its density, as well as fitting algorithms to estimate the parameters of a grouped NVM copula based on a dataset. While the problem of parameter estimation was already studied in Daul et al. (2003) and Luo and Shevchenko (2010), the computation of the copula density, which is required for the joint estimation of all dof parameters, has not yet been investigated in full generality for arbitrary dimensions; this is a gap we fill in this paper.
The four items from the list of contributions described in the previous paragraph correspond to Sections 3-6 of the paper. Section 2 includes a brief presentation of the notation used, basic properties of grouped NVM distributions and a description of randomized quasi-Monte Carlo methods that are used throughout the paper since most quantities of interest require the approximation of integrals. Section 7 provides a discussion. The proofs are given in Section 8.
All our methods are implemented in the R package nvmix, and all numerical results are reproducible with the demo grouped_mixtures.

Notation, Basic Properties and Tools
This section provides the necessary background, notations and tools needed throughout the remainder of this paper. The first part addresses some properties of grouped normal variance mixtures, such as mean, covariance and the relationship with elliptical distributions. The second part of this section describes randomized quasi-Monte Carlo methods, which is the type of integration method we apply to approximate quantities that are expressed as expectations, such as the distribution function of grouped normal variance mixtures.

Grouped Normal Variance Mixtures
To simplify the notation throughout the remainder of this paper, we use the shorthand D_√w = diag(√w_1, . . . , √w_d) for w = (w_1, . . . , w_d) ∈ [0, ∞)^d and √W = (√W_1, . . . , √W_d). Many properties of grouped normal variance mixtures are derived by conditioning on the d-dimensional random vector W, or equivalently by conditioning on the underlying univariate uniform rv U. Indeed,

X | W = w ∼ N_d(µ, D_√w Σ D_√w), (4)

where N_d(µ, Σ) denotes the d-dimensional multivariate normal distribution with mean vector µ and covariance matrix Σ. One can see that W "mixes" the covariance matrix of a multivariate normal and can be regarded as a shock affecting all components of X.

Mean and Covariance
Provided the relevant moments exist, conditioning as in (4) yields E[X] = µ and cov(X) = E[D_√W Σ D_√W], i.e., cov(X)_{ij} = E[√(W_i W_j)] Σ_{ij} for i, j = 1, . . . , d. Furthermore, corr(X) = P, where P denotes the correlation matrix corresponding to Σ. If A = I_d, the (uncorrelated) components of X are independent if and only if all components of W are constant with probability 1 and thus if X is multivariate normal; see (McNeil et al. 2015, Lemma 6.5).
Assuming k = d, the matrix A is typically the Cholesky factor computed from a given Σ.
Other decompositions of Σ into AA⊤ for some A ∈ R^{d×d} can be obtained from the eigendecomposition or singular-value decomposition.

Relationship with Elliptical Distributions
It is well known that normal variance mixtures, such as the multivariate normal and t distributions, are elliptical. A d-dimensional random vector Y is said to have an elliptical distribution, denoted by Y ∼ ELL_d(µ, Σ, F_R), if, in distribution, Y = µ + R A S, where µ ∈ R^d, AA⊤ = Σ and S ∼ U(S^{d−1}) is uniformly distributed on the unit sphere, independent of a non-negative rv R ∼ F_R; see, e.g., (Devroye 1986, Chapter 5). If X ∼ gNVM_d(µ, Σ, F_W), then X is in general not elliptical, unless F_{W_1} = · · · = F_{W_d}. This can be seen from (4), since the scalar radial rv R cannot be used to model comonotone shocks. Applying the same principle used to define grouped normal variance mixtures in (2), one can define grouped elliptical distributions via the stochastic representation

Y = µ + diag(R_1, . . . , R_d) A S, R_j = F←_{R_j}(U), j = 1, . . . , d, (5)

where U ∼ U(0, 1) and R_j ≥ 0 a.s.
If X ∼ gNVM_d(µ, Σ, F_W), we can set F_{R_j} to be the distribution function of √(W_j) χ_d for j = 1, . . . , d, where χ_d denotes a rv distributed as the square root of a χ²_d rv. This shows that X follows a grouped elliptical distribution in the sense of (5). In this work we focus on the more tractable class of grouped NVM distributions and do not further detail grouped elliptical distributions.

Randomized quasi-Monte Carlo Methods
Many quantities of interest in this paper, such as the distribution function of a gNVM distribution, can, after a suitable transformation, be expressed as

µ = ∫_{(0,1)^d} g(u) du, (6)

where d ∈ N can be large (e.g., µ = F(a, b) for large d; see (15) in the next section) or small (e.g., d = 3 in (22)) and the integral cannot be computed explicitly. As a method that works flexibly for small and large dimensions, one might consider Monte Carlo (MC) estimation, that is, estimate µ by

µ̂^MC_n = (1/n) ∑_{i=1}^n g(U_i), U_1, . . . , U_n ind. ∼ U(0, 1)^d,

whose asymptotic (1 − α)-confidence interval (CI) can be approximated for sufficiently large n by [µ̂^MC_n ± z_{1−α/2} σ̂_n/√n], where σ̂²_n denotes the sample variance of g(U_1), . . . , g(U_n) and z_p the p-quantile of the standard normal distribution. An often superior alternative to MC estimation are quasi-Monte Carlo (QMC) methods. Instead of averaging function values of a random sample U_1, . . . , U_n, a low discrepancy point set P_n = {v_1, . . . , v_n} ⊂ [0, 1)^d is employed, which aims at filling the unit hypercube in a more homogeneous way. To make error estimation easily possible, one can randomize the point set P_n in such a way that the points in the resulting point set, say P̃_n, are uniformly distributed over [0, 1]^d while keeping the low discrepancy of the point set overall, giving rise to randomized QMC (RQMC) methods. In our algorithms, we use a digitally shifted Sobol' sequence as implemented in the function sobol(, randomize = "digital.shift") of the R package qrng; see Hofert and Lemieux (2019). We remark that generating P̃_n is even slightly faster than the Mersenne Twister, which is R's default (pseudo-)random number generator.
Given B independently randomized copies of P_n, say P̃_{n,b} = {u_{1,b}, . . . , u_{n,b}} for b = 1, . . . , B, we construct B independent RQMC estimators of the form

µ̂^RQMC_{b,n} = (1/n) ∑_{i=1}^n g(u_{i,b}), b = 1, . . . , B, (7)

which are combined into the RQMC estimator

µ̂^RQMC_n = (1/B) ∑_{b=1}^B µ̂^RQMC_{b,n} (8)

of µ. An approximate (1 − α)-CI for µ can be estimated as

[µ̂^RQMC_n ± z_{1−α/2} σ̂_B/√B], (9)

where σ̂²_B = (1/(B − 1)) ∑_{b=1}^B (µ̂^RQMC_{b,n} − µ̂^RQMC_n)² is an unbiased estimator of var(µ̂^RQMC_{b,n}). One can compute µ̂^RQMC_n from (8) for some initial sample size n (e.g., n = 2^7) and iteratively increase the sample size of each µ̂^RQMC_{b,n} in (7) until the length of the CI in (9) satisfies a pre-specified error tolerance. In our implementations, we use B = 15, an absolute default error tolerance ε = 0.001 (which can be changed by the user) and z_{1−α/2} = 3.5 (so α ≈ 0.00047). By using µ̂^RQMC_n as an approximation for the true value of µ, one can also consider relative instead of absolute errors.
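The scheme can be illustrated with a short Python sketch. As a simple stand-in for the digitally shifted Sobol' sequence, a randomly shifted rank-1 (Korobov) lattice is used; the generator value and function name are illustrative choices, not part of the paper's implementation.

```python
import math
import random
import statistics

def rqmc_estimate(g, d, n=1024, B=15, z=3.5, rng=None):
    """RQMC estimate of mu = int_{(0,1)^d} g(u) du with B independently
    randomized point sets, together with the CI half-width from (9)."""
    rng = rng or random.Random(1)
    a = 76  # generator of the Korobov lattice (illustrative choice)
    estimates = []
    for _ in range(B):
        shift = [rng.random() for _ in range(d)]        # random shift
        total = 0.0
        for i in range(n):
            # lattice point frac(i * a^j / n) plus the random shift, mod 1
            u = [(i * pow(a, j, n) / n + shift[j]) % 1.0 for j in range(d)]
            total += g(u)
        estimates.append(total / n)                     # one mu_hat_{b,n}, (7)
    mu_hat = statistics.fmean(estimates)                # (8)
    half_width = z * statistics.stdev(estimates) / math.sqrt(B)
    return mu_hat, (mu_hat - half_width, mu_hat + half_width)

# Toy example: g(u) = u_1 * u_2 over (0,1)^2 has true integral 1/4.
est, ci = rqmc_estimate(lambda u: u[0] * u[1], d=2)
```

Because the shift is uniform, each µ̂^RQMC_{b,n} is unbiased, so the CI construction is exactly as in the MC case but with B (rather than n) in the denominator.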
Sometimes it is necessary to estimate log(µ) rather than µ, in particular when µ is small. For instance, if µ = f(x), where f(x) is the density of some random vector evaluated at x ∈ R^d, interest may lie in log(f(x)), as this quantity is needed to compute the log-likelihood of a random sample (which then may be optimized over some parameter space). When µ is small, it is typically not a good idea to use log(µ) ≈ log(µ̂^RQMC_n) directly, but rather to compute a numerically more robust estimator for log(µ), a proper logarithm. Define the function LSE (for Logarithmic Sum of Exponentials) as

LSE(c_1, . . . , c_n) = log(∑_{i=1}^n exp(c_i)) = c_max + log(∑_{i=1}^n exp(c_i − c_max)),

where c_1, . . . , c_n ∈ R and c_max = max{c_1, . . . , c_n}. The sum inside the logarithm on the right-hand side is bounded between 1 and n, so that the right-hand side is numerically more robust than the left-hand side.
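In code, the LSE trick is only a few lines; the sketch below (Python, standard library only, with illustrative function names) also shows the log-mean variant needed when taking a proper logarithm of an average.

```python
import math

def lse(c):
    """LSE(c_1, ..., c_n) = c_max + log(sum_i exp(c_i - c_max)); the shifted
    exponentials lie in (0, 1], so the sum never overflows or rounds to 0."""
    c_max = max(c)
    return c_max + math.log(sum(math.exp(ci - c_max) for ci in c))

def log_mean_exp(c):
    """Proper logarithm of (1/n) sum_i exp(c_i), e.g., of an RQMC average
    when the c_i are log-integrand values."""
    return lse(c) - math.log(len(c))

# Naive evaluation would underflow: exp(-1000) == 0.0 in double precision,
# while the shifted version returns a finite, accurate value.
val = lse([-1000.0, -1001.0, -1002.0])  # approx -999.5924
```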

Distribution Function
Let −∞ ≤ a < b ≤ ∞ componentwise (entries ±∞ to be interpreted as the corresponding limits). Then F(a, b) = P(a < X ≤ b) is the probability that the random vector X falls in the hyper-rectangle spanned by the lower-left and upper-right endpoints a and b, respectively. If a = (−∞, . . . , −∞), we recover F(a, x) = F(x) = P(X 1 ≤ x 1 , . . . , X d ≤ x d ) which is the (cumulative) distribution function of X.
Assume wlog that µ = 0; otherwise adjust a, b accordingly. Then

F(a, b) = ∫_0^1 Φ_Σ(D⁻¹_√w(u) a, D⁻¹_√w(u) b) du, (11)

where w(u) = (F←_{W_1}(u), . . . , F←_{W_d}(u)) and

Φ_Σ(a, b) = P(a < Y ≤ b), Y ∼ N_d(0, Σ),

itself is a d-dimensional integral for which no closed formula exists and which is typically approximated via numerical methods; see, e.g., Genz (1992). Comonotonicity of the W_j allowed us to write F(a, b) as a (d + 1)-dimensional integral; had the W_j a different dependence structure, this convenience would be lost and the resulting integral in (11) could be up to 2d-dimensional (e.g., when all W_j are independent).

Estimation
As demonstrated in Section 2.1, we need to approximate F(a, b) from (11). In Hintz et al. (2020), randomized quasi-Monte Carlo methods were derived to approximate the distribution function of a normal variance mixture X ∼ NVM_d(µ, Σ, F_W) from (1). Grouped normal variance mixtures can be dealt with similarly, thanks to the comonotonicity of the mixing random variables in W.
In order to apply RQMC to the problem of estimating F(a, b), we need to transform F(a, b) into an integral over the unit hypercube. To this end, we first address Φ_Σ. Let C = (C_ij)_{i,j=1}^d be the Cholesky factor of Σ (a lower triangular matrix such that CC⊤ = Σ). We assume that Σ has full rank, which implies C_jj > 0 for j = 1, . . . , d. Genz (1992) (see also Genz and Bretz (1999, 2002, 2009)) uses a series of transformations, relying on C being a lower triangular matrix, to write

Φ_Σ(a, b) = ∫_{(0,1)^{d−1}} ∏_{i=1}^d (e_i − d_i) du,

where e_1 = Φ(b_1/C_11), d_1 = Φ(a_1/C_11) with Φ denoting the standard normal distribution function, the e_i are recursively defined by

e_i = Φ((b_i − ∑_{j=1}^{i−1} C_ij Φ⁻¹(d_j + u_j (e_j − d_j)))/C_ii)

for i = 2, . . . , d, and the d_i are the e_i with b_i replaced by a_i for i = 1, . . . , d. Combining this with (11) yields

F(a, b) = ∫_{(0,1)^d} g(u) du, (14)

with integrand

g(u) = ∏_{i=1}^d (ê_i − d̂_i), (15)

where the d̂_i and ê_i are defined like the d_i and e_i, but with a_i and b_i replaced by a_i/√(F←_{W_i}(u_0)) and b_i/√(F←_{W_i}(u_0)), respectively, for u = (u_0, u_1, . . . , u_{d−1}) ∈ (0, 1)^d.
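To illustrate, the following Python sketch evaluates one value of the integrand via the Genz recursion with mixing-scaled limits (a direct transcription under the stated assumptions; the function name is hypothetical, the quantile functions are passed as black boxes, and Φ, Φ⁻¹ come from the standard library).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # Phi = .cdf, Phi^{-1} = .inv_cdf

def g(u, a, b, C, quantiles):
    """One evaluation of the integrand: u = (u_0, ..., u_{d-1}); u_0 feeds
    the comonotone mixing quantiles, u_1, ..., u_{d-1} drive the Genz
    recursion. C is the lower-triangular Cholesky factor of Sigma."""
    d = len(a)
    w = [q(u[0]) for q in quantiles]                # W_j = q_j(u_0)
    Phi, Phinv = STD_NORMAL.cdf, STD_NORMAL.inv_cdf
    lo = Phi(a[0] / (math.sqrt(w[0]) * C[0][0]))    # d_hat_1
    hi = Phi(b[0] / (math.sqrt(w[0]) * C[0][0]))    # e_hat_1
    prod, y = hi - lo, []
    for i in range(1, d):
        y.append(Phinv(lo + u[i] * (hi - lo)))      # conditional sample
        s = sum(C[i][j] * y[j] for j in range(i))
        lo = Phi((a[i] / math.sqrt(w[i]) - s) / C[i][i])
        hi = Phi((b[i] / math.sqrt(w[i]) - s) / C[i][i])
        prod *= hi - lo
    return prod

# Sanity check: Sigma = I_2 and constant mixing W == 1 reduce the integrand
# to the constant (Phi(0) - Phi(-8))^2, i.e. approximately 1/4.
val = g([0.3, 0.7], a=[-8.0, -8.0], b=[0.0, 0.0],
        C=[[1.0, 0.0], [0.0, 1.0]], quantiles=[lambda u: 1.0] * 2)
```

Averaging g over (R)QMC points in (0,1)^d then approximates F(a, b).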
Summarizing, we were able to write F(a, b) as an integral over the d-dimensional unit hypercube. Our algorithm to approximate F(a, b) consists of two steps. First, a greedy re-ordering algorithm is applied to the inputs a, b, Σ. It re-orders the components 1, . . . , d of a and b, as well as the corresponding rows and columns of Σ, in a way that the expected ranges of the factors in (15) are increasing with the index i for i = 1, . . . , d. Observe that the integration variable u_i is present in all remaining d − i + 1 integrals in (14), whose ranges are determined by the first i factors; reordering the variables according to expected ranges therefore (in the vast majority of cases) reduces the overall variability of g (namely, var(g(U)) for U ∼ U(0, 1)^d). Reordering also makes the first variables "more important" than the last ones, thereby reducing the effective dimension of the integrand. This is particularly beneficial for quasi-Monte Carlo methods, as these methods are known to perform well in high-dimensional problems with low effective dimension; see, e.g., Caflisch et al. (1997), Wang and Sloan (2005). For a detailed description of the method, see (Hintz et al. 2020, Algorithm 3.2) (with a_j/√W replaced by a_j/√W_j, and similarly for b_j, for j = 1, . . . , d, to account for the generalization); similar reordering strategies have been proposed in Gibson et al. (1994) for calculating multivariate normal probabilities and in Genz and Bretz (2002) for multivariate t probabilities.
Second, an RQMC algorithm as described in Section 2.2 is applied to approximate the integral in (14) with re-ordered a, b, Σ and F_W. Instead of integrating g from (15) directly, antithetic variates are employed so that, effectively, the function g̃(u) = (g(u) + g(1 − u))/2 is integrated.
The algorithm to estimate F(a, b) just described is implemented in the function pgnvmix() of the R package nvmix.

Numerical Results
In order to assess the performance of our algorithm described in Section 3.1, we estimate the error as a function of the number of function evaluations. Three estimators are considered. First, the "Crude MC" estimator is constructed by sampling X_1, . . . , X_n independently from the target distribution and estimating F(a, b) by the proportion of samples falling in the hyper-rectangle (a, b].
The second and third estimators are based on the integrand g from (15), which is integrated once using MC ("g (MC)") and once using a randomized Sobol' sequence ("g (sobol)"). In either case, variable reordering is applied first. We perform our experiments for an inverse-gamma mixture. As motivated in the introduction, an important special case of (grouped) normal variance mixtures is obtained when the mixing distribution is inverse-gamma. In the ungrouped case when W ∼ IG(ν/2, ν/2), the random vector X follows a multivariate t distribution with ν dof, denoted by X ∼ t_d(ν, µ, Σ), with density

f(x) = Γ((ν + d)/2) / (Γ(ν/2) (νπ)^{d/2} |Σ|^{1/2}) (1 + D²(x; µ, Σ)/ν)^{−(ν+d)/2}, x ∈ R^d. (17)

The distribution function of X ∼ t_d(ν, µ, Σ) does not admit a closed form; estimation of the latter was discussed, for instance, in Genz and Bretz (2009), Hintz et al. (2020), Cao et al. (2020). The same holds for a grouped inverse-gamma mixture model. If W_j ∼ IG(ν_j/2, ν_j/2) for j = 1, . . . , d, the random vector X follows a grouped t distribution, denoted by X ∼ gt_d(ν_1, . . . , ν_d; µ, Σ) or by X ∼ gt_d(ν, µ, Σ) for ν = (ν_1, . . . , ν_d). If 1 < S < d, denote by d_1, . . . , d_S the group sizes; in this case, we use the notation X ∼ gt_d(ν_1, . . . , ν_S; µ, Σ), with ν_k being the dof parameter shared by all components of group k. For our numerical examples to test the performance of our procedure for estimating F(a, b), assume X ∼ gt_d(ν, 0, P) for a correlation matrix P. We perform the experiment in d = 5 with ν = (1.5, 2.5, . . . , 5.5) and in d = 20 with ν = (1, 1.25, . . . , 5.5, 5.75). The following is repeated 15 times: Sample an upper limit b ∼ U(0, 3√d)^d and a correlation matrix P (sampled based on a random Wishart matrix via the function rWishart() in R). Then estimate P(X ≤ b) with each of the three aforementioned methods for various sample sizes, and estimate the error for the MC estimators based on a CLT argument and for the RQMC estimator as described in Section 2.2. Figure 1 reports the average absolute errors for each sample size over the 15 runs.
The convergence speed, as measured by the regression coefficient α of log(ε̂) = α log(n) + c, where ε̂ is the estimated error, is displayed in the legend. As expected, the MC estimators have an overall convergence speed of 1/√n; however, the crude estimator has a much larger variance than the MC estimator based on the function g. The RQMC estimator ("g (sobol)") not only shows a much faster convergence speed than its MC counterparts, but also a smaller variance.

Density Function
Let us now focus on the density of X ∼ gNVM_d(µ, Σ, F_W), where we assume that Σ has full rank in order for the density to exist. As mentioned in the introduction, the density of X is typically not available in closed form, not even in the case of a grouped t distribution. The same conditioning argument used to derive (11) yields that the density of X at x ∈ R^d is given by

f(x) = ∫_0^1 h(u) du, h(u) = (2π)^{−d/2} |Σ|^{−1/2} (∏_{j=1}^d F←_{W_j}(u))^{−1/2} exp(−D²(D⁻¹_√w(u) (x − µ); 0, Σ)/2), (18)

where

D²(x; µ, Σ) = (x − µ)⊤ Σ⁻¹ (x − µ)

denotes the (squared) Mahalanobis distance of x ∈ R^d from µ with respect to Σ and w(u) = (F←_{W_1}(u), . . . , F←_{W_d}(u)). Except for some special cases (e.g., when all W_j are inverse-gamma with the same parameters), this integral cannot be computed explicitly, so that we rely on numerical approximation.

Estimation
From (18), we find that computing the density f(x) of X ∼ gNVM_d(µ, Σ, F_W) at x ∈ R^d requires the estimation of a univariate integral. As interest often lies in the logarithmic density (or log-density) rather than the density itself (e.g., in likelihood-based methods, where the log-likelihood function of a random sample is optimized over some parameter space), we directly consider the problem of estimating log(µ) for µ = ∫_0^1 h(u) du with h given in (18). Since µ is expressed as an integral over (0, 1), the RQMC methods to estimate log(µ) from Section 2.2 can be applied directly to the problem in this form. If the log-density needs to be evaluated at several x_1, . . . , x_N, one can use the same point sets P̃_{n,b}, and therefore the same realizations of the mixing random vector W, for all inputs. This avoids costly evaluations of the quantile functions F←_{W_j}. Estimating log(f(x)) via RQMC as just described works well for inputs x of moderate size, but deteriorates if x is far away from the mean. To see this, Figure 2 shows the integrand h for three different inputs x and three different settings of F_W. If x is "large", most of the mass is contained in a small subdomain of (0, 1) containing the abscissa of the maximum of h. If an integration routine is not able to detect this peak, the density is substantially underestimated. A further complication arises because we are estimating the log-density rather than the density: the unboundedness of the natural logarithm at 0 makes the estimation of log(µ) for small µ challenging, both from a theoretical and from a computational point of view due to finite machine precision. In (Hintz et al. 2020, Section 4), an adaptive RQMC algorithm was proposed to efficiently estimate the log-density of X ∼ NVM_d(µ, Σ, F_W). We generalize this method to the grouped case. The grouped case is more complicated because the distribution is not elliptical; hence the density does not depend on x only through D²(x; µ, Σ).
Furthermore, the height of the (unique) maximum of h in the ungrouped case can easily be computed without simulation, which helps the adaptive procedure find the relevant region; in the grouped case, the value of the maximum is usually not available. Lastly, S (as opposed to 1) quantile evaluations are needed to obtain one function value h(u); from a run-time perspective, evaluating these quantile functions is the most expensive part. If x is "large", the idea is to apply RQMC only in a relevant region (u_l, u_r) with argmax_u h(u) =: u* ∈ (u_l, u_r). More precisely, given a threshold ε_th with 0 < ε_th < h_max = max_{u∈(0,1)} h(u), choose u_l, u_r (l for "left" and r for "right") with 0 ≤ u_l ≤ u* ≤ u_r ≤ 1 so that h(u) > ε_th if and only if u ∈ (u_l, u_r). For instance, take

ε_th = 10^{log(h_max)/log(10) − k_th} (19)

with k_th = 10, so that ε_th is 10 orders of magnitude smaller than h_max. One can then apply RQMC (with a proper logarithm) in the region (u_l, u_r) (by replacing every u ∈ (0, 1) by u_l + u(u_r − u_l) in the integrand and accounting for the interval length u_r − u_l). By construction, the remaining regions do not contribute significantly to the overall integral, so that a rather quick integration routine suffices there. Note that neither h_max nor u_l, u_r are known explicitly. However, h_max can be estimated from pilot runs, and u_l, u_r can be approximated using bisections. Summarizing, we propose the following method to estimate log(f(x_i)), i = 1, . . . , N, for given inputs x_1, . . . , x_N and error tolerance ε.
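The threshold (19) and the bisections for (u_l, u_r) can be sketched as follows (Python; log_h stands for a black-box evaluation of log(h), and the sharply peaked example function at the end is purely illustrative, mimicking a "large" input x):

```python
import math

def find_region(log_h, u_star, log_h_max, k_th=10, tol=1e-6):
    """Find (u_l, u_r) with h(u) > eps_th iff u in (u_l, u_r), where
    eps_th = 10**(log10(h_max) - k_th) as in (19); log-scale throughout."""
    log_eps = log_h_max - k_th * math.log(10.0)
    # left boundary: h crosses the threshold somewhere in (0, u_star)
    if log_h(tol) > log_eps:
        u_l = 0.0
    else:
        lo, hi = 0.0, u_star
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if log_h(mid) > log_eps:
                hi = mid          # still above threshold: move left
            else:
                lo = mid
        u_l = 0.5 * (lo + hi)
    # right boundary: h crosses the threshold somewhere in (u_star, 1)
    if log_h(1.0 - tol) > log_eps:
        u_r = 1.0
    else:
        lo, hi = u_star, 1.0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if log_h(mid) > log_eps:
                lo = mid          # still above threshold: move right
            else:
                hi = mid
        u_r = 0.5 * (lo + hi)
    return u_l, u_r

# Toy log-integrand with a narrow peak of width 0.001 near u = 0.99.
log_h = lambda u: -((u - 0.99) / 0.001) ** 2
u_l, u_r = find_region(log_h, u_star=0.99, log_h_max=0.0)
```

Working entirely on the log scale avoids underflow of h itself, which is exactly the regime in which the adaptive scheme is needed.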
This algorithm is implemented in the function dgnvmix(, log = TRUE) in the R package nvmix, which by default uses a relative error tolerance.
The advantage of the proposed algorithm is that little run time is spent on estimating "easy" integrals, thanks to the pilot run in Step 1. If n_0 = 2^10 and B = 15 (the current default in the nvmix package), this step gives 15 360 pairs (u, F←_W(u)). These pairs give good starting values for the bisections to find u_l, u_r. Note that no additional quantile evaluations are needed to estimate the less important regions (0, u_l) and (u_r, 1).
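Steps (c) and (d) of the proposed method use a trapezoidal rule evaluated entirely in log-scale (a "proper logarithm" of each trapezoid); a minimal Python sketch with an artificial integrand could look as follows:

```python
import math

def lse(c):
    """Proper logarithm of a sum: log(sum_i exp(c_i)) without underflow."""
    m = max(c)
    return m + math.log(sum(math.exp(ci - m) for ci in c))

def log_trapezoid(knots, log_vals):
    """log of the trapezoidal approximation of int h(u) du over the given
    knots, using only log(h) values."""
    terms = [math.log(0.5) + math.log(knots[i + 1] - knots[i])
             + lse([log_vals[i], log_vals[i + 1]])
             for i in range(len(knots) - 1)]
    return lse(terms)

# Example: h(u) = exp(-u) on a fine grid; int_0^1 h(u) du = 1 - 1/e.
us = [i / 100 for i in range(101)]
approx = log_trapezoid(us, [-u for u in us])
# Shifting log(h) by -2000 (h underflows to 0 in double precision) simply
# shifts the log-integral by -2000; the routine remains accurate.
shifted = log_trapezoid(us, [-u - 2000.0 for u in us])
```

The same LSE trick then combines the three log-integrals over (0, u_l), (u_l, u_r) and (u_r, 1) into the final estimate of log(f(x)).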

Numerical Results
Luo and Shevchenko (2010) are faced with almost the same integration problem when estimating the density of a bivariate grouped t copula. They use a globally adaptive integration scheme from Piessens et al. (2012) to integrate h. While this procedure works well for a range of inputs, it deteriorates for input x with large components.
Consider first X ∼ t d (ν, 0, I d ) and recall that the density of X is known and given by (17); this is useful to test our estimation procedure. As such, let X ∼ t 2 (ν = 6, 0, I 2 ) and consider the problem of evaluating the density of X at x ∈ {(0, 0), (5, 5), (25, 25), (50, 50)}. Some values of the corresponding integrands are shown in Figure 2. In Table 1, true and estimated (log-)density values are reported; once estimated using the R function integrate(), which is based on the QUADPACK package of Piessens et al. (2012) and once using dgnvmix(), which is based on Algorithm 1. Clearly, the integrate() integration routine is not capable of detecting the peak when input x is large, yielding substantially flawed estimates. The estimates obtained from dgnvmix(), however, are quite close to the true values even far out in the tail.

Algorithm 1 (Adaptive estimation of log(f(x_i)), i = 1, . . . , N)

1. Compute µ̂^RQMC_{log f(x_i), n_0} with pilot sample size n_0 using the same random numbers for all inputs x_i, i = 1, . . . , N. Store all uniforms with the corresponding quantile evaluations F←_W in a list L.

2. For all inputs x_i whose estimated error based on the pilot run already meets the tolerance ε, return µ̂^RQMC_{log f(x_i), n_0}.

3. For each remaining input x_s, do:

(a) Use all pairs (u, F←_W(u)) in L to compute values of h(u) and set ĥ_max = max_{u∈L} h(u). If the largest value of h is obtained for the largest (smallest) u in the list L, set u* = 1 (u* = 0).

(b) If u* = 1, set u_r = 1, and if u* = 0, set u_l = 0. Unless already specified, use bisections to find u_l and u_r such that u_l < u* < u_r and u_l (u_r) is the smallest (largest) u such that h(u) > ε_th from (19) with h_max replaced by ĥ_max. Starting intervals for the bisections can be found from the values in L.

(c) If u_l > 0, approximate log(∫_0^{u_l} h(u) du) using a trapezoidal rule with proper logarithm and knots u_1, . . . , u_m, where the u_i are those u's in L satisfying u ≤ u_l. Call the approximation µ̂_{(0,u_l)}(x_s). If u_l = 0, set µ̂_{(0,u_l)}(x_s) = −∞.

(d) If u_r < 1, approximate log(∫_{u_r}^1 h(u) du) using a trapezoidal rule with proper logarithm and knots u_1, . . . , u_p, where the u_i are those u's in L satisfying u ≥ u_r. Call the approximation µ̂_{(u_r,1)}(x_s). If u_r = 1, set µ̂_{(u_r,1)}(x_s) = −∞.

(e) Estimate log(∫_{u_l}^{u_r} h(u) du) via RQMC with a proper logarithm as described in Section 2.2 until the error tolerance is met; call the approximation µ̂_{(u_l,u_r)}(x_s) and set µ̂^RQMC_{log f(x_s)} = LSE(µ̂_{(0,u_l)}(x_s), µ̂_{(u_l,u_r)}(x_s), µ̂_{(u_r,1)}(x_s)).

Return µ̂^RQMC_{log f(x_l)}, l = 1, . . . , N.

The preceding discussion focused on the classical multivariate t setting, as the density is known in this case. Next, consider a grouped inverse-gamma mixture model and let X ∼ gt_d(ν, µ, Σ). The density of X ∼ gt_d(ν, µ, Σ) is not available in closed form, so that here we indeed need to rely on estimation. The following experiment is performed for X ∼ gt_2(ν, 0, I_2) with ν = (3, 6) and for X ∼ gt_10(ν, 0, I_10) with ν = (3, . . . , 3, 6, . . . , 6) (corresponding to two groups of size 5 each). First, a sample of size 2500 is drawn from a more heavy-tailed grouped t distribution (with degrees of freedom ν = (1, 2) and ν = (1, . . . , 1, 2, . . . , 2), respectively), and then the log-density function of X ∼ gt_d(ν, 0, I_d) is evaluated at the sample. The results are shown in Figure 3. It is clear from the plots that integrate() again gives wrong approximations to f(x) for inputs x far out in the tail; for small inputs x, the results from integrate() and from dgnvmix() coincide. Furthermore, it can be seen that the density function is not monotone in the Mahalanobis distance (as grouped normal mixtures are no longer elliptical).
The plot also includes the log-density functions of an ungrouped d-dimensional t distribution with degrees of freedom 3 and 6, respectively. The log-density function of the grouped mixture with ν = (3, 6) is not bounded by either; in fact, the grouped mixture shows heavier tails than both the t distribution with 3 and with 6 dof.

Kendall's tau and Spearman's rho
Two widely used measures of association are the rank correlation coefficients Spearman's rho ρ S and Kendall's tau ρ τ . For elliptical models, one can easily compute Spearman's rho as a function of the copula parameter ρ which can be useful in estimating the matrix P non-parametrically.
For grouped mixtures, however, this is not easily possible. In this section, integral representations for Spearman's rho and Kendall's tau in the general grouped NVM case are derived.
If X ∼ ELL_2(µ, Σ, F_R) is elliptical and ρ = Σ_12/√(Σ_11 Σ_22), then

ρ_τ(X_1, X_2) = (2/π) arcsin(ρ);

see (Lindskog et al. 2003, Theorem 2). This formula holds only approximately for grouped normal variance mixtures. In Daul et al. (2003), an expression was derived for Kendall's tau of bivariate grouped t copulas. Their result is easily extended to the more general grouped normal variance mixture case; see Section 8 for the proof.
Next, we address Spearman's rho ρ_S. For computing ρ_S, it is useful to study P(X_1 > 0, X_2 > 0). If X ∼ ELL_2(0, P, F_R), where P is a correlation matrix with P_12 = ρ and P(X = 0) = 0, then P(X_1 > 0, X_2 > 0) = 1/4 + arcsin(ρ)/(2π); see, e.g., (McNeil et al. 2015, Proposition 7.41). Using the same technique, we can show that this result also holds for grouped normal variance mixtures; see Section 8 for the proof.
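This orthant probability can be checked by simulation. Since √W_j > 0, the signs of X_j = √W_j Z_j agree with those of Z_j, so the orthant probability of a (grouped) normal variance mixture coincides with that of the underlying normal; a minimal Python check:

```python
import numpy as np

rng = np.random.default_rng(7)
rho, n = 0.5, 200_000

# Signs of X_j = sqrt(W_j) * Z_j agree with signs of Z_j, so the orthant
# probability does not depend on the (grouped) mixing distributions.
z1 = rng.standard_normal(n)
z2 = rho * z1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
p_hat = np.mean((z1 > 0) & (z2 > 0))
p_exact = 0.25 + np.arcsin(rho) / (2 * np.pi)  # = 1/3 for rho = 1/2
```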
Next, we derive a new expression for Spearman's rho ρ S for bivariate grouped normal variance mixture distributions; see Section 8 for the proof.

Numerical Results
Let X ∼ gNVM_2(0, P, F_W). It follows from Proposition 1 that ρ_τ(X_1, X_2) admits the integral representation (21); similarly, Proposition 3 implies the representation (22) for ρ_S(X_1, X_2). Hence, both ρ_τ(X_1, X_2) and ρ_S(X_1, X_2) can be expressed as integrals over the d-dimensional unit hypercube with d ∈ {2, 3}, so that the RQMC methods described in Section 2.2 can be applied directly in this form to estimate ρ_τ(X_1, X_2) and ρ_S(X_1, X_2), respectively. This is implemented in the function corgnvmix() (with method = "kendall" or method = "spearman") of the R package nvmix.
As an example, we consider bivariate grouped t distributions with ν ∈ {(1, 2), (4, 8), (1, 5), (4, 20), (1, ∞), (4, ∞)} and plot the estimated ρ_τ as a function of ρ in Figure 4. The elliptical case (corresponding to equal dof) is included for comparison. When the pairwise dof are close and ρ is not too close to 1, the elliptical approximation is quite satisfactory. However, when the dof are further apart, there is a significant difference between the estimated ρ_τ and the elliptical approximation. This is highlighted in the plot on the right-hand side, which displays the relative difference (ρ_τ^ell − ρ_τ)/ρ_τ^ell. Intuitively, it makes sense that the approximation deteriorates when the dof are further apart: the closer the dof, the "closer" the model is to being elliptical.
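A quick Monte Carlo cross-check of such estimates (here in Python; corgnvmix() uses RQMC in R): sample from a bivariate grouped t by driving the comonotone inverse-gamma mixing variables with a single uniform, then compare the sample Kendall's tau with the elliptical formula in the equal-dof case:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def rgroupedt(n, nu, rho):
    # One uniform U drives the comonotone mixing variables
    # W_j = F^{-1}_{IG(nu_j/2, nu_j/2)}(U); then X = sqrt(W) * Z.
    u = rng.random(n)
    w = np.column_stack([stats.invgamma.ppf(u, nuj / 2, scale=nuj / 2)
                         for nuj in nu])
    L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
    z = rng.standard_normal((n, 2)) @ L.T
    return np.sqrt(w) * z

x = rgroupedt(20_000, (4.0, 4.0), 0.5)
tau, _ = stats.kendalltau(x[:, 0], x[:, 1])
# equal dof => elliptical, so tau should be close to 2*arcsin(0.5)/pi = 1/3
```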

Copula Setting
So far, the focus of this paper has been on grouped normal variance mixtures. This section addresses grouped normal variance mixture copulas, i.e., the copulas derived from X ∼ gNVM_d(µ, Σ, F_W) via Sklar's theorem. The first part addresses grouped NVM copulas in full generality and provides formulas for the copula, its density and the tail dependence coefficients. The second part details the important special case of inverse-gamma mixture copulas, that is, copulas derived from a grouped t distribution X ∼ gt_d(ν, µ, Σ). The third part discusses estimation of the copula and its density, whereas the fourth part addresses how copula parameters can be fitted to a dataset. The last part of this section includes numerical examples.

Grouped Normal Variance Mixture Copulas
Copulas provide a flexible tool for modeling dependent risks, as they allow one to model the margins separately from the dependence between the margins. Let X ∼ F be a d-dimensional random vector with continuous margins F_1, . . . , F_d. Consider the random vector U = (U_1, . . . , U_d) = (F_1(X_1), . . . , F_d(X_d)); note that U_j ∼ U(0, 1) for j = 1, . . . , d. The copula C of F (or X) is the distribution function of the margin-free U, i.e., C(u_1, . . . , u_d) = P(U_1 ≤ u_1, . . . , U_d ≤ u_d). If F is absolutely continuous and the margins F_1, . . . , F_d are strictly increasing and continuous, the copula density is given by c(u_1, . . . , u_d) = f(F_1^←(u_1), . . . , F_d^←(u_d)) / ∏_{j=1}^d f_j(F_j^←(u_j)), where f denotes the (joint) density of F and f_j is the marginal density of F_j. For more about copulas and their applications to risk management, see, e.g., Embrechts et al. (2001); Nelsen (2007).
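The copula density recipe can be sketched generically as follows (a Python illustration, instantiated with a bivariate normal, for which the resulting Gaussian copula density is known in closed form):

```python
import numpy as np
from scipy import stats

def gauss_copula_density(u, rho):
    # Generic recipe: c(u) = f(F1^{-1}(u1), F2^{-1}(u2))
    #                        / (f1(F1^{-1}(u1)) * f2(F2^{-1}(u2))),
    # instantiated for a bivariate normal with N(0,1) margins.
    x = stats.norm.ppf(u)
    joint = stats.multivariate_normal(cov=[[1.0, rho], [rho, 1.0]]).pdf(x)
    return joint / np.prod(stats.norm.pdf(x))
```

At u = (1/2, 1/2) this gives 1/√(1 − ρ²) for the Gaussian copula, and the value 1 for ρ = 0 (independence at that point).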
Since copulas are invariant under strictly increasing marginal transformations, we may assume without loss of generality that µ = 0 and that Σ = P is a correlation matrix, i.e., we may consider X ∼ gNVM_d(0, P, F_W). Using (11), the grouped normal variance mixture copula is given by C^gNVM_{P,F_W}(u) = F_X(F_1^←(u_1), . . . , F_d^←(u_d)), and its density can be computed via (18) as c^gNVM_{P,F_W}(u) = f_X(F_1^←(u_1), . . . , F_d^←(u_d)) / ∏_{j=1}^d f_j(F_j^←(u_j)), where F_j and f_j denote the distribution function and density function of X_j ∼ NVM_1(0, 1, F_{W_j}) for j = 1, . . . , d; working directly with log(c^gNVM_{P,F_W}(u)) also makes (25) numerically more robust to compute. In the remainder of this subsection, some useful properties of gNVM copulas are derived. In particular, we study symmetry properties, rank correlation and tail dependence coefficients.

Radial Symmetry and Exchangeability
It is evident from (2) that X ∼ gNVM_d(µ, Σ, F_W) is radially symmetric about its location vector µ. Loosely speaking, this means that jointly large values of X are as likely as jointly small values of X. Radial symmetry also implies that c^gNVM_{P,F_W}(u_1, . . . , u_d) = c^gNVM_{P,F_W}(1 − u_1, . . . , 1 − u_d). If (X_Π(1), . . . , X_Π(d)) equals (X_1, . . . , X_d) in distribution for all permutations Π of {1, . . . , d}, the random vector X is called exchangeable. The same definition applies to copulas. If X ∼ gNVM_d(0, I_d, F_W), then X is in general not exchangeable, unless F_{W_1} = · · · = F_{W_d}, in which case X ∼ NVM_d(0, I_d, F_{W_1}). The lack of exchangeability implies that, in general, c^gNVM_{I_d,F_W}(u_1, . . . , u_d) ≠ c^gNVM_{I_d,F_W}(u_Π(1), . . . , u_Π(d)).

Tail Dependence Coefficients
Consider a bivariate copula C^gNVM_{P,F_W}. Such a copula is radially symmetric, hence the lower and upper tail dependence coefficients coincide, i.e., λ_l = λ_u =: λ ∈ [0, 1], where λ_l = lim_{u↓0} C(u, u)/u and λ_u = lim_{u↑1} (1 − 2u + C(u, u))/(1 − u). When only the quantile functions F^←_{W_j} are given, no simple expression for λ exists. In Luo and Shevchenko (2010), λ is derived for grouped t copulas, as will be discussed in Section 6.2. Following the arguments used in their proof, the following proposition provides a new expression for λ in the more general grouped normal variance mixture case.
Proposition 4. The tail dependence coefficient λ for a bivariate C gNVM P,F W with ρ = P 12 satisfies

Inverse-Gamma Mixtures
If X ∼ t_d(ν, 0, P) for a positive definite correlation matrix P, the copula of X extracted via Sklar's theorem is the well-known t copula, denoted by C^t_{ν,P}. This copula is given by C^t_{ν,P}(u) = t_{ν,P}(t_ν^{-1}(u_1), . . . , t_ν^{-1}(u_d)), where t_ν and t_ν^{-1} denote the distribution function and quantile function of a univariate standard t distribution and t_{ν,P} denotes the distribution function of X. Note that (26) is merely the distribution function of X ∼ t_d(ν, 0, P) evaluated at the quantiles t_ν^{-1}(u_1), . . . , t_ν^{-1}(u_d). The copula density is c^t_{ν,P}(u) = f_{ν,P}(t_ν^{-1}(u_1), . . . , t_ν^{-1}(u_d)) / ∏_{j=1}^d f_ν(t_ν^{-1}(u_j)), where f_{ν,P} and f_ν denote the joint and univariate standard t densities, respectively. The (upper and lower) tail dependence coefficient λ of the bivariate C^t_{ν,P} with ρ = P_12 is well known to be λ = 2 t_{ν+1}(−√((ν + 1)(1 − ρ)/(1 + ρ))); see (Demarta and McNeil 2005, Proposition 1). The multivariate t distribution being elliptical implies the formula ρ_τ = 2 arcsin(ρ)/π for Kendall's tau.
The same formula was given in (McNeil et al. 2015, Proposition 7.44).
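The closed-form λ of the t copula is a one-liner; a small Python sketch with sanity checks (λ = 1 for ρ = 1, and λ decreasing in ν, as heavier tails induce stronger tail dependence):

```python
import numpy as np
from scipy import stats

def lambda_t(nu, rho):
    # lambda = 2 * t_{nu+1}( -sqrt((nu + 1)(1 - rho) / (1 + rho)) )
    return 2 * stats.t.cdf(-np.sqrt((nu + 1) * (1 - rho) / (1 + rho)), nu + 1)
```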
Next, consider a grouped inverse-gamma mixture model. If X ∼ gt_d(ν, 0, P), the copula of X is the grouped t copula, denoted by C^gt_{ν,P}. The copula follows from (24) by evaluating the distribution function of X at the marginal t quantiles, and the copula density follows from (25). The (lower and upper) tail dependence coefficient λ of the bivariate C^gt_{ν_1,ν_2,P} is given in (Luo and Shevchenko 2010, Equation (26)); here, f_{χ²_ν} denotes the density of a χ²_ν distribution. Finally, consider rank correlation coefficients for grouped t copulas. No closed formula for either Kendall's tau or Spearman's rho exists in the grouped t case. An exact integral representation of ρ_τ for C^gt_{ν_1,ν_2,P} follows from Proposition 1; no substantial simplification of (21) therein is obtained in the special case W_j ∼ IG(ν_j/2, ν_j/2). In order to compute ρ_τ, one can either numerically integrate (21) (as will be discussed in the next subsection) or use the approximation ρ_τ ≈ (2/π) arcsin(ρ), which was shown to be "very accurate" in Daul et al. (2003). For Spearman's rho, no closed formula can be derived either, not even in the ungrouped t copula case, so that the integral (22) in Proposition 3 needs to be computed numerically, as will be discussed in the next subsection.
The discussion in this section highlights that moving from a scalar mixing rv W (as in the classical t case) to comonotone mixing rvs W 1 , . . . , W S (as in the grouped t case) introduces challenges from a computational point of view. While in the classical t setting, the density, Kendall's tau and the tail dependence coefficient are available in closed form, all of these quantities need to be estimated in the more general grouped setting. Efficient estimation of these important quantities is discussed in the next subsection.

Estimation of the Copula and Its Density
Consider a d-dimensional grouped normal variance mixture copula C^gNVM_{P,F_W}. From (24), it follows that C^gNVM_{P,F_W}(u) = F_X(F_1^←(u_1), . . . , F_d^←(u_d)), where F_X is the distribution function of X ∼ gNVM_d(0, P, F_W) and F_j is the distribution function of NVM_1(0, 1, F_{W_j}) for j = 1, . . . , d. If the margins are known (as in the case of an inverse-gamma mixture), evaluating the copula is no harder than evaluating the distribution function of X, so the methods described in Section 3.1 can be applied. When the mixing rvs W_j are only known through their quantile functions in the form of a "black box", one first needs to estimate the marginal distribution functions F_j. Note that F_j(x) = E[Φ(x/√W_j)] = ∫_0^1 Φ(x/√(F^←_{W_j}(u))) du, which can be estimated using RQMC. The quantile F_j^←(u_j) can then be estimated by numerically solving F_j(x) = u_j for x, for instance using a bisection algorithm or Newton's method.
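These two steps — RQMC estimation of F_j followed by root finding for F_j^← — can be sketched in Python as follows (an illustration using a scrambled Sobol sequence as one RQMC replicate; with inverse-gamma mixing, F_j is a t distribution function, which serves as a check):

```python
import numpy as np
from scipy import optimize, stats
from scipy.stats import qmc

def nvm1_cdf(x, qW, m=2**13, seed=1):
    # F_j(x) = E[Phi(x / sqrt(W_j))] with W_j = qW(U), U ~ U(0, 1),
    # estimated on a scrambled Sobol point set (one RQMC replicate).
    u = qmc.Sobol(d=1, scramble=True, seed=seed).random(m).ravel()
    return float(np.mean(stats.norm.cdf(x / np.sqrt(qW(u)))))

def nvm1_quantile(p, qW):
    # Invert F_j(x) = p numerically (bracketing root finder).
    return optimize.brentq(lambda x: nvm1_cdf(x, qW) - p, -1e3, 1e3)

nu = 4.0
qW = lambda u: stats.invgamma.ppf(u, nu / 2, scale=nu / 2)  # gives a t_4 margin
```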
The general form of gNVM copula densities was given in (25). Again, if the margins are known, the only unknown quantity is the joint density f_X, which can be estimated using the adaptive RQMC procedure proposed in Section 4.1. If the margins are not available, F_j^← can be estimated as discussed above, and the marginal densities f_j can be estimated using an adaptive RQMC algorithm similar to the one developed in Section 4.1.

Remark 2.
Estimating the copula density is the most challenging problem discussed in this paper if we assume that F_W is only known via its marginal quantile functions. Evaluating the copula density c^gNVM_{P,F_W} at one u ∈ [0, 1]^d requires estimation of:
• the marginal quantiles F_j^←(u_j), which involves estimation of F_j followed by numerical root finding, for each j = 1, . . . , d;
• the marginal densities evaluated at the quantiles, f_j(F_j^←(u_j)) for j = 1, . . . , d, which involves estimating the density of a univariate normal variance mixture;
• the joint density evaluated at the quantiles, f(F_1^←(u_1), . . . , F_d^←(u_d)), which is another one-dimensional integration problem.
It follows from Remark 2 that, while estimation of c^gNVM_{P,F_W} is theoretically possible with the methods proposed in this paper, the problem becomes computationally intractable in large dimensions d. If the margins are known, however, our proposed methods are efficient and accurate, as demonstrated in the next subsection, where we focus on the important case of a grouped t model. Our methods to estimate the copula and the density of C^gt_{ν,P} are implemented in the functions pgStudentcopula() and dgStudentcopula() in the R package nvmix.

Fitting Copula Parameters to a Dataset
In this subsection, we discuss estimation methods for grouped normal variance mixture copulas. Let X_1, . . . , X_n be independent and distributed according to some distribution with C^gNVM_{P,F_W} as underlying copula, with X_i = (X_{i,1}, . . . , X_{i,d}) and group sizes d_1, . . . , d_S satisfying ∑_{k=1}^S d_k = d. Furthermore, let ν^(k) be (a vector of) parameters of the kth mixing distribution for k = 1, . . . , S; for instance, in the grouped t case, ν^(k) = ν_k is the degrees of freedom of group k. Finally, denote by ν = (ν^(1), . . . , ν^(S)) the vector of all mixing parameters. Note that we assume the group structure to be given. We are interested in estimating the parameter vector ν and the matrix P of the underlying copula C^gNVM_{P,F_W}. In Daul et al. (2003), this problem was discussed for the grouped t copula where d_k ≥ 2 for k = 1, . . . , S. In this case, all subgroup copulas are t copulas, and Daul et al. (2003) suggest estimating the dof ν_1, . . . , ν_S separately in each subgroup. Computationally, this is rather simple, as the density of the ungrouped t copula is known analytically. Luo and Shevchenko (2010) consider the grouped t copula with S = d, so d_k = 1 for k = 1, . . . , d. Since any univariate margin of a copula is uniformly distributed, separate estimation is not feasible in this case. As such, Luo and Shevchenko (2010) suggest estimating ν_1, . . . , ν_d jointly by maximizing the copula likelihood of the grouped mixture. In both references, the matrix P is estimated by estimating pairwise Kendall's taus and using the approximate identity ρ_τ(X_i, X_j) ≈ 2 arcsin(ρ_{i,j})/π for i ≠ j. Although we have shown in Section 5 that this approximation can be too crude in some cases, our assessment is that it is sufficiently accurate in the context of the fitting examples considered in the present section.
Luo and Shevchenko (2010) also consider joint estimation of (P, ν) by maximizing the corresponding copula likelihood simultaneously over all d + d(d − 1)/2 parameters. Their numerical results in d = 2 suggest that this does not lead to a significant improvement. In larger dimensions d > 2, however, the optimization problem becomes intractable, so that the non-parametric approach for estimating P described above is likely to be preferred.
We combine the two estimation methods, applied to the general case of a grouped normal variance mixture, in Algorithm 2.
Algorithm 2: Estimation of the Copula Parameters ν and P of C gNVM P,F W . Given iid X 1 , . . . , X n , estimate ν and P of the underlying C gNVM P,F W as follows:

1.
Estimation of P. Estimate Kendall's tau ρ_τ(X_i, X_j) for each pair 1 ≤ i < j ≤ d. Use the approximate identity ρ_τ(X_i, X_j) ≈ 2 arcsin(ρ_{i,j})/π to obtain estimates ρ̂_{i,j}. Then combine the ρ̂_{i,j} into a correlation matrix P̂, which may have to be modified to ensure positive definiteness.

2.
Transformation to pseudo-observations. If necessary, transform the data X 1 , . . . , X n to pseudo-observations U 1 , . . . , U n from the underlying copula, for instance, by setting U i,j = R i,j /(n + 1) where R i,j is the rank of X i,j among X 1,j , . . . , X n,j .

3.
Initial parameters. Maximize the copula log-likelihood of each subgroup k with d_k ≥ 2 over its respective parameters separately. That is, if U_i^(k) denotes the sub-vector of U_i belonging to group k and P̂^(k) is defined accordingly, solve the optimization problems ν̂_0^(k) = argmax_{ν^(k)} ∑_{i=1}^n log c^(k)(U_i^(k); ν^(k), P̂^(k)), k = 1, . . . , S. For "groups" with d_k = 1, choose the initial estimate ν̂_0^(k) from prior/expert experience or as a hard-coded value.

4.
Joint estimation. With the initial estimates ν̂_0^(k), k = 1, . . . , S, at hand, optimize the full copula likelihood to estimate ν; that is, ν̂ = argmax_ν l(ν; U_1, . . . , U_n), where l(ν; U_1, . . . , U_n) = ∑_{i=1}^n log c^gNVM_{P̂,F_W}(U_i; ν).
The method proposed in Daul et al. (2003) returns the initial estimates obtained in Step 3. A potential drawback of this approach is that it fails to account for the dependence between the groups. Indeed, the dependence between a component in group k_1 and a component in group k_2 (e.g., measured by Kendall's tau or by the tail dependence coefficient) is determined by both ν^(k_1) and ν^(k_2). As such, these parameters should be estimated jointly.
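Step 1 of Algorithm 2 can be sketched as follows (a Python illustration; the full procedure is implemented in R in fitgStudentcopula() of nvmix). The eigenvalue-clipping repair used here is one simple way to restore positive definiteness; other repair methods exist:

```python
import numpy as np
from scipy import stats

def estimate_P(X):
    # Pairwise Kendall's tau mapped through rho = sin(pi * tau / 2);
    # if the assembled matrix is not positive definite, clip its
    # eigenvalues and rescale to unit diagonal.
    d = X.shape[1]
    P = np.eye(d)
    for i in range(d):
        for j in range(i + 1, d):
            tau, _ = stats.kendalltau(X[:, i], X[:, j])
            P[i, j] = P[j, i] = np.sin(np.pi * tau / 2)
    lam, V = np.linalg.eigh(P)
    if lam.min() <= 0:
        P = (V * np.maximum(lam, 1e-6)) @ V.T   # eigenvalue clipping
        s = np.sqrt(np.diag(P))
        P = P / np.outer(s, s)                  # restore unit diagonal
    return P
```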
Note that the copula density is not available in closed form, not even in the grouped t case, so that each call of the likelihood function in (30) requires the approximation of n integrals. This poses numerical challenges, as the estimated likelihood function is typically "bumpy", having many local maxima due to estimation errors.
If F_W is only known via its marginal quantile functions, as is the general theme of this paper, the optimization problems in (29) and (30) become intractable (unless d and n are small) due to the numerical challenges involved in estimating the copula density; see also Remark 2. We leave the problem of fitting grouped normal variance mixture copulas in full generality (where the distribution of the mixing random variables W_j is only specified via marginal quantile functions in the form of a "black box") for future research. Instead, we focus on the important case of a grouped t copula. Here, the quantile functions F_j^← (of X_j) and the densities f_j are known for j = 1, . . . , d, since the margins are all t distributed, which substantially simplifies the underlying numerical procedure. Our method is implemented in the function fitgStudentcopula() of the R package nvmix. The numerical optimizations in Steps 3 and 4 are passed to the R optimizer optim(), and the copula density is estimated as in Section 6.3.
Example 1. Consider a 6-dimensional grouped t copula with three groups of size 2 each and degrees of freedom 1, 4 and 7, respectively. We perform the following experiment: We sample a correlation matrix P using the R function rWishart(). Then, for each sample size n ∈ {250, 500, . . . , 1750, 2000}, we repeat sampling X_1, . . . , X_n 15 times and, in each case, estimate the degrees of freedom once using the method in Daul et al. (2003) (i.e., by estimating the dof in each group separately) and once using our joint method described above. The true matrix P is used in the fitting, so that the focus is really on estimating the dof. The results are displayed in Figure 5. The estimates on the left are obtained for each group separately; on the right, the dof were estimated jointly by maximizing the full copula likelihood (with initial estimates obtained as in the left figure).
Clearly, the jointly estimated parameters are much closer to their true values (which are known in this simulation study and indicated by horizontal lines), and it can be confirmed that the variance decreases with increasing sample size n.
Figure 5. Estimated dof for a 6-dimensional grouped t copula with 3 groups of size 2 each.
(left) Estimates are obtained by fitting the t copulas of each group separately; (right) the joint copula-likelihood of the grouped t copula is maximized.
Example 2. Let us now consider the negative logarithmic returns of the constituents of the Dow Jones 30 index from 1 January 2014 to 31 December 2015 (n = 503 data points obtained from the R package qrmdata of Hofert and Hornik (2016)) and, after deGARCHing, fit a grouped t copula to the standardized residuals. We choose the natural groupings induced by the industry sectors of the 30 constituents and merge groups of size 1 so that 9 groups are left. Figure 6 displays the estimates obtained for various specifications of maxit, the maximum number of iterations for the underlying optimizer (note that the current default of optim() is as low as maxit = 500). The points for maxit = 0 correspond to the initial estimates found from separately fitting t copulas to the groups. The initial estimates differ significantly from the maximum likelihood estimates (MLEs) obtained from the joint estimation of the dof. Note also that the MLEs change with increasing maxit argument, even though they do not change drastically anymore if 1500 or more iterations are used. Note that the initial parameters result in a much more heavy tailed model than the MLEs. Figure 6 also displays the estimated log-likelihood of the parameters found by the fitting procedure. The six lines correspond to the estimated log-likelihood using six different seeds. It can be seen that estimating the dof jointly (as opposed to group-wise) yields a substantially larger log-likelihood, whereas increasing the parameter maxit (beyond a necessary minimum) only gives a minor improvement. In order to examine the impact of the different estimates on the underlying copula in terms of its tail behavior, Figure 7 displays the probability C(u, . . . , u) estimated using methods from Section 6.3 as a function of u; in a risk management context, C(u, . . . , u) is the probability of a jointly large loss, hence a rare event. An absolute error tolerance of 10 −7 was used to estimate the copula. 
The figure also includes the corresponding probability for the ungrouped t copula, for which the dof were estimated to be 6.3. Figure 7 indicates that the initial estimates yield the most heavy tailed model. This seems reasonable since all initial estimates for the dof range between 0.9 and 5.3 (with average 2.8). The models obtained from the MLEs exhibit the smallest tail probability, indicating that these are the least heavy tailed models considered here. This is in line with Figure 6, which shows that the dof are substantially larger than the initial estimates. The ungrouped t copula is more heavy tailed than the fitted grouped one (with MLEs) but less heavy tailed than the fitted grouped one with initial estimates. This example demonstrates that it is generally advisable to estimate the dof jointly when grouped modeling is of interest, rather than group-wise as suggested in Daul et al. (2003). Indeed, in this particular example, the initial estimates give a model that substantially overestimates the risk of jointly large losses. As can be seen from Figure 6, optimizing an estimated log-likelihood function is not at all trivial, in particular when many parameters are involved. Indeed, the underlying optimizer never detected convergence, which is why the user needs to carefully assess which specification of maxit to use. We plan on exploring more elaborate optimization procedures which perform better in large dimensions for this problem in the future.
Example 3. In this example, we consider the problem of mean-variance (MV) portfolio optimization in the classical Markowitz (1952) setting. Consider d assets and denote by µ_t and Σ_t the expected excess return vector (over the risk-free rate) of the risky assets and the variance-covariance (VCV) matrix of the asset returns in the portfolio at time t, respectively. We assume that an investor chooses the weights x_t of the portfolio to maximize the quadratic utility function U(x_t) = x_t^⊤ µ_t − (γ/2) x_t^⊤ Σ_t x_t, where in what follows we assume the risk-aversion parameter γ = 1. When there are no shortselling (or other) constraints, the optimal solution is x_t = Σ_t^{-1} µ_t. As in Low et al. (2016), we consider relative portfolio weights, which are thus given by w_t = x_t/(1^⊤ x_t). As such, the investor needs to estimate µ_t and Σ_t. If shortselling is excluded, i.e., x_{t,j} ≥ 0 for j = 1, . . . , d, the optimization problem can be solved numerically, for instance using the R package quadprog of Turlach et al. (2019). Assume we have return data for the d assets stored in vectors y_t, t = 1, . . . , T, and a sampling window 0 < M < T. We perform an experiment similar to Low et al. (2016) and compare a historical approach with a model-based approach to estimating µ_t and Σ_t. The main steps are as follows:
1. In each period t = M + 1, . . . , T, estimate µ_t and Σ_t using the M previous return vectors y_i, i = t − M, . . . , t − 1.
2. Compute the optimal portfolio weights w_t and the out-of-sample return r_t = w_t^⊤ y_t.
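The unconstrained weight computation behind Step 2 is a single linear solve followed by normalization; a minimal Python sketch:

```python
import numpy as np

def mv_weights(mu, Sigma):
    # Unconstrained optimum of U(x) = x'mu - x' Sigma x / 2 (gamma = 1)
    # is x = Sigma^{-1} mu; normalize to relative weights summing to one.
    x = np.linalg.solve(Sigma, np.asarray(mu, dtype=float))
    return x / x.sum()
```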
In the historical approach, µ_t and Σ_t in the first step are simply the sample mean vector and sample VCV matrix of the past return data. Our model-based approach is a simplification of the approach used in Low et al. (2016). In particular, to estimate µ_t and Σ_t in the first step, the following is done in each time period:
1a. Fit marginal ARMA(1, 1)–GARCH(1, 1) models with standardized t innovations to y_i, i = t − M, . . . , t − 1.
1b. Extract the standardized residuals and fit a grouped t copula to the pseudo-observations thereof.
1c. Sample n vectors from the fitted copula, transform the margins by applying the quantile function of the respective standardized t distribution and, based on these n d-dimensional residuals, sample from the fitted ARMA(1, 1)–GARCH(1, 1) models, giving a total of n simulated return vectors, say ỹ_i, i = 1, . . . , n.
1d. Estimate µ_t and Σ_t from ỹ_i, i = 1, . . . , n.
The historical and model-based approaches each produce T − M out-of-sample returns, from which we can estimate the certainty-equivalent return (CER) and the Sharpe ratio (SR) as CER = μ̂_r − σ̂_r²/2 and SR = μ̂_r/σ̂_r, where μ̂_r and σ̂_r denote the sample mean and sample standard deviation of the T − M out-of-sample returns; see also Tu and Zhou (2011). Note that larger, positive values of the SR and CER indicate better portfolio performance. We consider logarithmic returns of the constituents of the Dow Jones 30 index from 1 January 2013 to 31 December 2014 (n = 503 data points obtained from the R package qrmdata of Hofert and Hornik (2016)), a sampling window of M = 250 days, n = 10^4 samples to estimate µ_t and Σ_t in the model-based approach, a risk-free interest rate of zero and no transaction costs. We report (in percent) the point estimates μ̂_r, CER and SR for the historical approach and for the model-based approach based on an ungrouped and a grouped t copula in Table 2, assuming no shortselling.
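The two performance measures are straightforward to compute from the out-of-sample returns; a minimal Python sketch:

```python
import numpy as np

def cer_sr(returns):
    # CER = mu_hat - sigma_hat^2 / 2 and SR = mu_hat / sigma_hat,
    # with the sample mean and sample standard deviation (ddof = 1)
    # of the out-of-sample returns.
    r = np.asarray(returns, dtype=float)
    mu, sd = r.mean(), r.std(ddof=1)
    return mu - sd**2 / 2, mu / sd
```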
To limit the run time of this illustrative example, the degrees of freedom for the grouped and ungrouped t copulas are estimated once and held fixed throughout all time periods t = M + 1, . . . , T. We see that the point estimates for the grouped model exceed those for the ungrouped model. Table 2. Estimated returns, certainty-equivalent returns (CERs) and Sharpe ratios (SRs) (all in percentage points) of a mean-variance (MV) investor under three investment rules assuming no short sales: model-based with grouped and ungrouped t copulas, and historical.

Discussion and Conclusions
We introduced the class of grouped normal variance mixtures and provided efficient algorithms to work with this class of distributions: estimating the distribution function and log-density function, estimating the copula and its density, estimating Spearman's rho and Kendall's tau, and estimating the parameters of a grouped NVM copula from a dataset. Most algorithms (and functions in the package nvmix) merely require the quantile function(s) of the mixing distribution(s) to be provided, which makes the presented algorithms and their implementation in the R package nvmix widely applicable in practice.
We saw that the distribution function (and hence the copula) of grouped NVM distributions can be estimated efficiently even in high dimensions using RQMC algorithms. The density function of grouped NVM distributions is in general not available in closed form, not even for the grouped t distribution, so one has to rely on its estimation. Our proposed adaptive algorithm is capable of estimating the log-density accurately and efficiently, even in high dimensions. Fitting grouped normal variance mixture copulas, such as the grouped t copula, to data is an important yet challenging task due to the lack of a tractable density function. Thanks to our adaptive procedure for estimating the density, the parameters can be estimated jointly in the special case of a grouped t copula. As demonstrated in the previous section, it is indeed advisable to estimate the dof jointly, as otherwise one might severely over- or underestimate the joint tails.
A computational challenge that we plan to further investigate is the optimization of the estimated log-likelihood function, which is currently slow and lacks a reliable convergence criterion that can be used for automation. Another avenue for future research is to study how one can, for a given multivariate dataset, assign the components to homogeneous groups.

Proofs
Proof of Proposition 1. This is an immediate application of a proposition proven in (Daul et al. 2003, p. 6).
Proof of Proposition 3. We follow (McNeil et al. 2015, Proposition 7.44), where a similar result was shown for X ∼ NVM_2(µ, Σ, F_W). Since X has continuous margins, we can write ρ_S(X_1, X_2) = 3 (2 P((X_1 − X̃_1)(X_2 − X̃_2) > 0) − 1), where X̃_1 and X̃_2 are random variables with X̃_1 equal to X_1 in distribution and X̃_2 equal to X_2 in distribution, and where (X_1, X_2), X̃_1 and X̃_2 are all independent; see (McNeil et al. 2015, Proposition 7.34).
Taking the derivative with respect to q gives an expression for ∂/∂q ∫_0^1 Φ_ρ(x_1(u, q), x_2(u, q)) du; the derivative with respect to x_2 can be found analogously by swapping the roles of Z_1 and Z_2. Plugging the derivatives into (32) and taking the limit gives the result in the statement.
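The representation of ρ_S used in the proof of Proposition 3 can be verified numerically; for a bivariate normal, ρ_S = (6/π) arcsin(ρ/2) is known in closed form, which gives the following Python check:

```python
import numpy as np

rng = np.random.default_rng(11)
rho, n = 0.5, 200_000

# (X1, X2) bivariate normal with correlation rho; Xt1, Xt2 are
# independent copies of the margins, all mutually independent.
z = rng.standard_normal((n, 2))
x1 = z[:, 0]
x2 = rho * z[:, 0] + np.sqrt(1 - rho**2) * z[:, 1]
xt1 = rng.standard_normal(n)
xt2 = rng.standard_normal(n)

p = np.mean((x1 - xt1) * (x2 - xt2) > 0)
rho_s = 3 * (2 * p - 1)  # representation used in the proof
# exact value for a bivariate normal: (6 / pi) * arcsin(rho / 2)
```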