Bivariate Copula Trees for Gross Loss Aggregation with Positively Dependent Risks

We propose several numerical algorithms to compute the distribution of gross loss in a positively dependent catastrophe insurance portfolio. Hierarchical risk aggregation is performed using bivariate copula trees. Six common parametric copula families are studied. At every branching node, the distribution of a sum of risks is obtained by discrete copula convolution. This approach is compared to approximation by a weighted average of independent and comonotonic distributions, where the weight measures positive dependence through the variance of the aggregate risk. During gross loss accumulation, the marginals are distorted by application of insurance financial terms, which impacts the value of the mixing weight. To accelerate computations, we capture this effect using the ratio of standard deviations of pre-term and post-term risks, followed by covariance scaling. We test the performance of our algorithms using three examples of complex insurance portfolios subject to hurricane and earthquake catastrophes.


Introduction
Catastrophe models are widely used by insurers to help weather extreme natural catastrophe events and to ascertain that they are able to manage their risk and pay their claims (Grossi 2004; Grossi et al. 2005). The models estimate how frequent and severe future catastrophes are likely to be, where they are likely to occur and the amount of damage they can inflict. Individual instances of peril, e.g., a simulated earthquake with a particular magnitude, are called events and are typically created using Monte Carlo sampling of model parameters (Clark 2015). Large catalogs of simulated events are generated, representing an ensemble of hypothetical catastrophes in the period of, e.g., 10,000 years. For each event in the catalog, the model calculates the peril's intensity (e.g., wind speed for hurricanes, degree of ground shaking for earthquakes or water level for floods) at each location within the affected area. Then, the intensity together with information about the type of construction and occupancy for each insured property is converted into a probability distribution of loss using a so-called vulnerability module. Finally, property losses are summed up into the event's total loss using a loss-aggregation algorithm (Grossi et al. 2005). Insurance contract terms, applicable to the sharing of risk on each property or groups of properties, determine loss perspective on the event total, such as:
• Ground-up loss (total loss without any policy conditions applied)
• Retained loss (loss retained by insured party)
• Gross loss (loss to an insurer after application of policy financial terms)
To investigate the impact of catalog events on a portfolio of properties, the event total losses are further aggregated within each catalog year into an annual loss. Property loss, event total and annual loss are typically modeled as discrete random variables (Cossette et al. 2003) characterized by probability mass functions (pmfs).
Finally, the mixture of annual loss pmfs is transformed into the exceedance probability (EP) curve (Grossi et al. 2005). The EP curve helps insurers determine losses that correspond to percentiles of the annual loss for all simulated catalog years. The basic step in portfolio loss analysis is computation of the event total loss (Cossette et al. 2003; Dhaene et al. 2014; Wang 1998). This procedure can be thought of as estimation of the distribution of the sum S = X 1 + X 2 + . . . + X m of positively dependent, non-negative random variables X 1 , X 2 , . . . , X m when the joint distribution of these variables is unavailable. Many researchers have studied this problem in the context of Fréchet theory and have shown its application to finance (see, e.g., Deelstra et al. 2011; Kaas et al. 2000; Linders and Stassen 2016). The latter contributions offer closed-form expressions for the upper or lower convex bounds of S when the statistics at hand are restricted to, usually parametric, continuous marginal distributions characterizing X 1 , X 2 , . . . , X m and partial information on the joint dependence structure of the random vector (X 1 , X 2 , . . . , X m ). Another line of research addresses the problem of risk aggregation by modeling the joint distribution with copulas. By assuming a parametric form of the marginals and a parametric form of the copula, the probability density function (pdf) of S is derived in closed form (see, e.g., Cossette et al. 2013; Marri and Moutanabbir 2022; Vernic 2016). For high-dimensional portfolios, however, fitting a copula that reproduces the prescribed data statistics proves difficult. Another technical complication is that for some combinations of parametric marginals and parametric copulas, estimation of the pdf of S entails a multivariate convolution integral, which is computationally cumbersome for large portfolios.
An avant-garde workaround referred to as copula-based hierarchical risk aggregation has been proposed recently (Arbenz et al. 2012; Bruneton 2011; Côté and Genest 2015; Derendinger 2015; Joe and Sang 2016) to alleviate the aforementioned bottlenecks. This approach eliminates the need to parameterize one copula for all the risks, as it defines the joint, usually bivariate, dependence between partial sums of risks in the subsequent loss accumulation steps. Such a hierarchical pathway of adding dependent risks is represented as a bivariate aggregation tree with copulas describing summation nodes. The original implementation in Arbenz et al. (2012) uses Monte Carlo sampling. A significantly faster convolution-based implementation was discussed in our previous contribution on ground-up loss estimation (Wójcik et al. 2019). In the present study, we extend this technique in three ways. First, we adapt it to gross loss computation. Second, apart from the Fréchet copula, we also consider five other copula families commonly applied in insurance and mathematical finance (see, e.g., Bouyé et al. 2000; Venter 2002). Lastly, we enhance the methodology for risk summation from the previously implemented mixture of classical convolution and numerical quantile addition to discrete copula convolution.
In what follows, we seek to approximate the pmf p φ(S) of φ(S) = φ(X 1 + X 2 + . . . + X m ), where the insurance policy terms φ represent nonlinear transformations of partial loss aggregates in hierarchical trees. We assume that the random vector (X 1 , X 2 , . . . , X m ) is weakly associated and that the generally nonparametric marginal pmfs of X i and the non-negative covariances Cov[X i , X j ] are known; see Section 2. We analyze gross loss aggregation runs using three insurance portfolios impacted by hurricane and earthquake events. The dependency models cover a broad range of association, shifting focus from correlation between moderately sized losses in the bulk of distributions to correlation between large losses in the upper tails of distributions. We compare the total gross loss pmfs p φ(S) obtained with these models, along with their convex decomposition, in terms of second-order statistics and tail measures of risk. We also provide a detailed discussion of the algorithmic implementation of our methods and compare their execution times. We introduce a fast approximation based on the Fréchet copula and covariance scaling. For the analyzed portfolios, the latter generates a ∼2–300× speedup compared to the other copulas.
The paper is organized as follows. To start, we provide independent and comonotonic bounds for a gross loss sum of weakly associated risks. Then, we outline a copula-based risk accumulation scheme and discuss the basics of bivariate copulas underlying the summation nodes. Next, we cover the computational aspects of two-stage gross loss accumulation. Lastly, we perform gross loss analysis for two hurricane events and one earthquake event in the United States. Different features of the proposed models are discussed by comparing several risk measures and processing times of aggregation runs.

Basic Concepts
We start by introducing some notation. A non-negative random variable X with finite expectation and variance is called a risk. For m ≥ 2, let X = (X 1 , . . . , X m ) be the portfolio of risks (locations). We assume that for any pair of disjoint subsets A 1 , A 2 ⊆ {1, . . . , m}, the components of X are weakly associated, i.e., Cov[φ(X i , i ∈ A 1 ), ψ(X j , j ∈ A 2 )] ≥ 0 (1) for all pairs of nondecreasing functions φ : R k → R, ψ : R m−k → R, 1 ≤ k < m, such that the covariance exists (Christofides and Vaggelatou 2004). We denote by X ⊥ = (X ⊥ 1 , X ⊥ 2 , . . . , X ⊥ m ) and X + = (X + 1 , X + 2 , . . . , X + m ) the independent and comonotone random vectors characterized by the same marginals P X 1 , P X 2 , . . . , P X m as X. By definition, P X ⊥ (x) = ∏ i P X i (x i ) and P X + (x) = min (P X 1 (x 1 ), P X 2 (x 2 ), . . . , P X m (x m )). Since (1) fulfills the properties of a multivariate positive dependence structure (B1-B7 in Colangelo et al. 2005), the following stochastic order holds: P X ⊥ (x) ≤ P X (x) ≤ P X + (x) for all x (2). Now, let g : R m → R be a supermodular function, i.e., a function such that: g(x ∧ y) + g(x ∨ y) ≥ g(x) + g(y) (3), where x ∧ y is the componentwise minimum and x ∨ y the componentwise maximum of x and y. Christofides and Vaggelatou (2004) show that (1) leads to the following precedence: X ⊥ ≤ sm X ≤ sm X + (4), where ≤ sm stands for the supermodular order. Equivalently, E[g(X ⊥ )] ≤ E[g(X)] ≤ E[g(X + )] (5) for all real supermodular functions g, provided that the expectations exist. A few examples of supermodular functions are shown in Figure 1. It follows from Proposition 7 in Cossette et al. (2003) that (4) allows establishment of a convex order relation between the dependent sum S = X 1 + X 2 + . . . + X m and its independent S ⊥ = X ⊥ 1 + X ⊥ 2 + . . . + X ⊥ m and comonotone S + = X + 1 + X + 2 + . . . + X + m counterparts: S ⊥ ≤ cx S ≤ cx S + (6), where ≤ cx stands for the convex order defined as: S 1 ≤ cx S 2 if E[v(S 1 )] ≤ E[v(S 2 )] (7) for all real convex functions v : R → R, provided the expectations exist. Equality of expectations E[S ⊥ ] = E[S] = E[S + ] is guaranteed by the fact that X ⊥ , X and X + are members of the same Fréchet class (see, e.g., Dhaene et al. 2014).
Therefore, (6) yields the following variance order: Var[S ⊥ ] ≤ Var[S] ≤ Var[S + ]. If the joint distribution P X of the weakly associated vector X can be written as the mixture P X ≈ (1 − w)P X ⊥ + wP X + (9), then, as originally suggested by Dhaene et al. (2014) and discussed in Wójcik et al. (2019), the distribution P S of the positively dependent sum S can be approximated as P S ≈ (1 − w)P S ⊥ + wP S + (10), where P S ⊥ and P S + are distributions of the independent and comonotonic sums S ⊥ and S + , respectively, and the weight w = (Var[S] − Var[S ⊥ ])/(Var[S + ] − Var[S ⊥ ]) (11) measures dependence in portfolio X implicitly through the variance of the sum S of individual risks (Dhaene et al. 2014). The approximation (10) with the weight in (11) preserves the second-order moments of the target sum of risks. We emphasize that (10) does not imply a pointwise ordering of the cdfs P S ⊥ (s), P S (s) and P S + (s) for all s. It only implies that P S ⊥ (s) and P S + (s) can cross one or more times, an immediate consequence of the convex order precedence in (6).
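The mixture approximation can be sketched numerically. Below is a minimal Python illustration, where the grids, marginal pmfs and the imposed covariance are invented for the example and are not taken from the paper: the independent sum comes from classical convolution, the comonotonic sum from quantile addition, and the weight from a covariance ratio, which is the bivariate case of (11).

```python
import numpy as np

# Illustrative two-risk setup (assumed values, not from the paper).
grid = np.arange(4.0)                       # common loss support {0, 1, 2, 3}
p_x = np.array([0.50, 0.30, 0.15, 0.05])    # marginal pmf of X
p_y = np.array([0.60, 0.20, 0.15, 0.05])    # marginal pmf of Y
cov_xy = 0.15                               # prescribed Cov[X, Y] >= 0

def mean_var(g, p):
    m = (g * p).sum()
    return m, (g**2 * p).sum() - m**2

mx, vx = mean_var(grid, p_x)
my, vy = mean_var(grid, p_y)

# Independent sum S⊥: classical discrete convolution of the two pmfs.
p_ind = np.convolve(p_x, p_y)
grid_s = np.arange(p_ind.size, dtype=float)

# Comonotonic sum S+: quantile addition, i.e. jump points of min(F_X, F_Y).
Fx, Fy = np.cumsum(p_x), np.cumsum(p_y)
levels = np.union1d(Fx, Fy)                 # all cdf jump levels
ix = np.searchsorted(Fx, levels - 1e-12)    # quantile index of X per level
iy = np.searchsorted(Fy, levels - 1e-12)    # quantile index of Y per level
masses = np.diff(np.concatenate(([0.0], levels)))
p_com = np.zeros_like(p_ind)
for v, m in zip(grid[ix] + grid[iy], masses):
    p_com[int(round(v))] += m               # integer support keeps this exact

# Mixture weight: Cov[X, Y] over the comonotonic covariance.
e_xy_com = (grid[ix] * grid[iy] * masses).sum()
w = cov_xy / (e_xy_com - mx * my)
p_s = (1.0 - w) * p_ind + w * p_com         # approximation of the pmf of S
```

By construction the mixture reproduces the mean and the target variance Var[S] = Var[X] + Var[Y] + 2 Cov[X, Y].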

Financial Terms
Insurance financial terms are functions that modify loss payments. For an individual risk X, they act as a nonlinear transformation Y = φ X (X) with the constraint 0 ≤ φ X (X) ≤ X. If X is discrete with the pmf p X , then the transformation φ X (X) yields: p φ X (X) (y) = ∑ {x: φ X (x)=y} p X (x) (15). Following the mixture representation idea in (10), the gross loss distribution P φ(S) of the modified arbitrary sum φ(S) can be approximated as: P φ(S) ≈ (1 − w φ )P φ(S ⊥ ) + w φ P φ(S + ) (16), where w φ = (Var[φ(S)] − Var[φ(S ⊥ )])/(Var[φ(S + )] − Var[φ(S ⊥ )]) (17) is the gross loss weight and φ(S) = ∑ i φ i (X i ); see Example 4.2 in Dhaene et al. (2014). The most common financial term is the stop-loss function, which represents the loss to an insurer after application of the deductible d on risk X and reads: φ X (x) = max(x − d, 0) (18). Its expectation E[φ X (X)] = π(d) is referred to as the stop-loss transform. The limit function describes the application of the policy limit l : φ X (x) = min(x, l) (19). The limited stop-loss payoff is generated by consecutive application of the deductible and limit: φ X (x) = min(max(x − d, 0), l) (20). Its expectation E[φ X (X)] = π(d) − π(d + l) is the difference in stop-loss transforms. On occasion, (20) is referred to as the excess-of-loss (XOL) layer L X (d, l). An insurance contract may also take on a share r of the layer, in which case φ X (x) = r min(max(x − d, 0), l) (21). When the limited stop-loss function is applied to partition the risk X into n layers (t i−1 ; t i ], i = 1, . . . , n with t 0 = −∞ and t n = +∞, we have the XOL layer decomposition expressed as the comonotonic summation: X = ∑ i L X (t i−1 , t i − t i−1 ) (22). Several practical examples illustrating how the different functions above are used in typical commercial catastrophic insurance portfolios are discussed in Wójcik and Zvezdov (2021).
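As an illustration of how a limited stop-loss term acts on a discrete marginal, the following Python sketch (with an invented grid and pmf) pushes mass onto the transformed support, merging the atoms that collide at 0 and at the limit:

```python
import numpy as np

# Apply φ(x) = min(max(x - d, 0), l) to a discrete pmf.  Grid, pmf, d and l
# are illustrative values, not taken from the paper.
def apply_limited_stop_loss(grid, pmf, d, l):
    phi = np.minimum(np.maximum(grid - d, 0.0), l)   # transformed support
    support = np.unique(phi)                          # colliding atoms merge
    out = np.zeros(support.size)
    for v, p in zip(phi, pmf):
        out[np.searchsorted(support, v)] += p         # accumulate mass
    return support, out

grid = np.arange(6.0)                                 # losses 0..5
pmf = np.array([0.35, 0.25, 0.15, 0.10, 0.10, 0.05])
support, p_phi = apply_limited_stop_loss(grid, pmf, d=1.0, l=3.0)
# The atom at 0 collects P(X <= d); the atom at l collects P(X >= d + l).
```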

Ordering the Gross Loss Sums
We recall that, by construction, a positively dependent sum of ground-up risks satisfies the convex ordering in (6). During gross loss aggregation, this sequencing can be impacted by financial terms. For comparing gross loss sums, it is convenient to use the stop-loss order ≤ sl related to the convex order by: S 1 ≤ cx S 2 if and only if S 1 ≤ sl S 2 and E[S 1 ] = E[S 2 ] (23). There are two fundamental compositions (Theorems D1 and D2, Chapter 6 in Marshall et al. 1979) of supermodular functions that guarantee preservation of the stop-loss order. The first composition passes a supermodular ψ : R m → R as an argument to a convex and non-decreasing φ : R → R, resulting in the supermodular g = φ • ψ; see, e.g., Figure 2B,D. When applied to (6) we have: φ(S ⊥ ) ≤ sl φ(S) ≤ sl φ(S + ) (24). The second supermodular composition g = ψ(φ 1 , . . . , φ m ) is comprised of the supermodular ψ : R m → R and φ i : R → R, i = 1, . . . , m, which are all non-decreasing or all non-increasing; see, e.g., Figure 2C,F and Example 6.3.11 in Denuit et al. (2005). When applied to the individual summands constituting (6), we obtain: φ 1 (X ⊥ 1 ) + . . . + φ m (X ⊥ m ) ≤ sl φ 1 (X 1 ) + . . . + φ m (X m ) ≤ sl φ 1 (X + 1 ) + . . . + φ m (X + m ) (25).

Figure 2. Computing the sum of three risks using the direct aggregation tree (A-C) and the hierarchical aggregation tree with sequential topology (D-G) from the ground-up loss perspective (left column) and the gross loss perspective (middle and right columns). The branching nodes of hierarchical trees (black dots) represent summation of the incoming pairs of individual and/or cumulative risks. For the gross loss perspective, transformation nodes (white dots) represent application of the financial terms φ, ψ to individual and/or cumulative risks.
For the financial terms in Section 2.2, the precedence (25) always holds true. On the other hand, the precedence (24) holds true only for (18). For (19), the sign of inequality changes and loss precedence follows the concave order. For (20) and for the ith layer in (22), the problem is more complicated because φ is neither convex nor concave. To proceed in such a case, we invoke the notion of the truncation transform in Hürlimann (1998), which replaces a random variable X with the truncated variable X(a, b) = min(max(X, a), b) (26). To verify whether the stop-loss order X(a, b) ≤ sl Y(a, b) holds, we use the Karlin-Novikoff-Stoyan-Taylor multiple-crossing conditions for P X and P Y (Karlin and Novikoff 1963; Stoyan and Daley 1983; Taylor 1983) and (Theorems 2.1 and A.1 in Hürlimann 1998) to demand that the truncated means are ordered, µ X(a,b) ≤ µ Y(a,b) (28), and that either the cdf alignment condition (29) holds or b is the crossing point for the marginals such that the sign change of P Y (x) − P X (x) in the neighborhood of b is from "-" to "+" (30). An example explaining the role of requirements (28) and (29) is shown in Figure 3.

Figure 3. The dashed vertical lines represent the truncation transform bounds a = 0.5 and b = 1.75. In (E), the alignment of the corresponding cdfs P S ⊥ and P S + shows that (29) holds. This, together with the necessary condition (28), implies that the truncated cdfs in (F,G) characterize the stop-loss order S ⊥ (a, b) ≤ sl S + (a, b). In (H), the binary decision whether the necessary condition is true or false is plotted as a function of the truncation bounds 0 ≤ a < b ≤ 2. The means µ S ⊥ (a,b) , µ S + (a,b) characterize the independent and comonotonic sums after application of the truncation transform. The red region is where the necessary condition holds. The black dot represents the actual truncation bounds used throughout this example. The impermissible region where b < a is plotted in grey.
Since (20) can be rewritten as the difference of stop-loss functions, φ X (x) = max(x − d, 0) − max(x − (d + l), 0), determining if (6) is preserved after application of a deductible and limit comes down to checking if the stop-loss order of the truncated sums, S ⊥ (d, d + l) ≤ sl S + (d, d + l), is true using (29) or (30). Accordingly, for XOL layers we require L S ⊥ (d, l) ≤ sl L S + (d, l).

Partial Sums and Aggregation Trees
The weighted average in (10), referred to as the mixture method in Wang (1998), corresponds to the direct aggregation model in Wójcik et al. (2019). In situations where the partial sum S I = ∑ i∈I X i is of interest for any subset I ⊆ {1, . . . , m}, loss aggregation can be performed by decomposing the positively dependent sum S using the bivariate hierarchical aggregation model in Wójcik et al. (2019). First, we select two risks X i , X j and define a bivariate dependency model for that pair, e.g., the one in (9). Then, we compute the partial sum X i + X j and treat it as a new, integrated risk. The procedure is reiterated with the remaining m − 1 risks until the single total sum has been computed. The hierarchical models can be visualized as aggregation trees (see examples in Figure 2), which describe the collection of relationships between pairs of risks and provide a bottom-up view of the total risk.
When constructing a bivariate aggregation tree, we first need to identify the tree topology, i.e., the order in which the risks are accumulated. Algorithms for computing the order of risk additions for ground-up loss are discussed in Wójcik et al. (2019). For gross loss, the tree topology is defined by the hierarchy of financial terms. A typical catastrophe insurance portfolio is organized as follows. First, the location terms in (18), (19) or (20) are applied to individual risks. Then, locations aggregate to the sub-limits expressed as (18), (19) or (20) and applied to a geographical subset of the total set of locations, such as, e.g., a ZIP code. Thereafter, partial aggregates enter the XOL layers in (21). Next, the layer losses are accumulated to insurance policies using comonotonic addition in (22). Finally, policy losses are summed up to portfolio-level gross loss. We use the sequential order of aggregation to sum up risks going into the same financial structure, e.g., an XOL layer, and for final accumulation to the event loss total. An example for three risks is shown in Figure 2E. Based on the results in Wójcik et al. (2019), see Table 1 on p. 15, the sequential order is a good compromise between simplicity of implementation and accuracy of reproduction of tail statistics of partial aggregate risks as compared to large-sample Monte Carlo runs.
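A sequential topology can be sketched in a few lines. The toy example below uses illustrative pmfs and reduces the summation nodes to independent convolution for brevity (the paper's nodes use copula convolution); it folds three risks into a running partial sum and applies an assumed cumulative limit at the top node:

```python
import numpy as np

# Three illustrative risks on supports {0, 1, 2} (assumed values).
risks = [np.array([0.70, 0.20, 0.10]),
         np.array([0.60, 0.30, 0.10]),
         np.array([0.80, 0.15, 0.05])]

p_sum = risks[0]
for p in risks[1:]:                  # sequential topology: fold in one risk
    p_sum = np.convolve(p_sum, p)    # summation node (independence assumed)

# Assumed cumulative policy limit at the top node: mass above the limit
# collapses onto the limit atom.
limit = 3
p_gross = np.concatenate((p_sum[:limit], [p_sum[limit:].sum()]))
```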

Copulas at Summation Nodes
The second step in constructing an aggregation tree is to provide a description of the bivariate joint distribution P X,Y for each pair of risks X, Y with the marginals P X , P Y entering a typical summation node in Figure 4A. In this study, we focus on the bivariate copula functions C : [0, 1] 2 → [0, 1], which, by Sklar's theorem (Sklar 1959), provide the following representation of the joint distribution: P X,Y (x, y) = C(P X (x), P Y (y)) (34). It follows from the stochastic order in (2) that for any bivariate copula describing positively dependent risks, uv ≤ C(u, v) ≤ min(u, v) (35), where the mixture C(u, v; w) = (1 − w)uv + w min(u, v) (36) is designated a Fréchet family in Denuit et al. (2002); an example is shown in Figure 5.
Other bivariate copulas used in this paper are listed in Table 1. They include a variety of distributions often used in insurance applications: two elliptical distributions (Gaussian and Student's t), two Archimedean distributions (Gumbel and Joe) and a Morgenstern distribution obtained by adding a perturbation to the independence copula; see, e.g., Bouyé et al. (2000); Joe (1997) for a detailed discussion of probability density shapes and joint tail behavior, and Section 5 for discussion of the impact of these properties on gross loss estimates. We further discretize any continuous copula C in Table 1 using the methodology from Section 7.1 in Geenens (2020). We define the (R × R) discrete copula pmf as: c(i/R, j/R) = C((i + 1)/R, (j + 1)/R) − C(i/R, (j + 1)/R) − C((i + 1)/R, j/R) + C(i/R, j/R), i, j = 0, . . . , R − 1 (37). This discretization guarantees that the marginals of the copula pmf are uniform, i.e., ∑ j c(i/R, j/R) = ∑ i c(i/R, j/R) = 1/R. The distribution p S of the sum S = X + Y is then expressed as the discrete copula convolution: p S (s) = ∑ {(x,y): x+y=s} p X,Y (x, y) (38), where the joint pmf p X,Y is obtained by evaluating the rectangle probabilities of C at the gridpoints of the marginal cdfs P X and P Y . Note that if (34) can be approximated using (9), we obtain the Fréchet decomposition of the copula C: C(u, v) ≈ (1 − w)uv + w min(u, v) (39), where the weight w is the comonotonicity coefficient in De Schepper (2006, 2011) defined as the ratio of the integrals of C(u, v) − uv and min(u, v) − uv over the unit square (40) or, equivalently, by Hoeffding's identity, w = Cov[X, Y]/Cov[X + , Y + ] (41). * Φ −1 is the inverse cdf of a standard normal, and Φ Σ is the standard bivariate normal cdf with the covariance matrix Σ parameterized by θ. ** P −1 ν is the inverse cdf of a standard Student's t with ν ∈ {1, 3, 10, 30} degrees of freedom. In this paper, ν is chosen a priori and not subject to the optimization in Algorithm 1. The parameter θ is bounded by [−1, 1] instead of [0, 1] because negative θ in the copula can still imply positive dependence in the bivariate joint distribution estimated in Algorithm 1.
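To make the discretization and the discrete copula convolution concrete, here is a minimal Python sketch using the Morgenstern (FGM) copula C(u, v; θ) = uv(1 + θ(1 − u)(1 − v)), chosen because its cdf has a simple closed form; the marginals, grids and θ are illustrative. The joint pmf is built from rectangle probabilities of C evaluated at the marginal cdf gridpoints, and its mass is then scattered onto the sum support:

```python
import numpy as np

def fgm(u, v, theta):
    """Morgenstern (FGM) copula cdf."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))

def copula_convolution(grid_x, p_x, grid_y, p_y, theta):
    # cdf gridpoints, with 0 prepended, define the rectangles
    Fx = np.concatenate(([0.0], np.cumsum(p_x)))
    Fy = np.concatenate(([0.0], np.cumsum(p_y)))
    U, V = np.meshgrid(Fx, Fy, indexing="ij")
    C = fgm(U, V, theta)
    # rectangle (inclusion-exclusion) probabilities give the joint pmf
    joint = C[1:, 1:] - C[:-1, 1:] - C[1:, :-1] + C[:-1, :-1]
    # scatter joint mass onto the sum support (integer grids assumed here)
    p_s = np.zeros(int(grid_x[-1] + grid_y[-1]) + 1)
    for i, x in enumerate(grid_x):
        for j, y in enumerate(grid_y):
            p_s[int(x + y)] += joint[i, j]
    return joint, p_s

# Illustrative marginals and dependence parameter.
grid = np.arange(3.0)
p_x = np.array([0.5, 0.3, 0.2])
p_y = np.array([0.6, 0.3, 0.1])
joint, p_s = copula_convolution(grid, p_x, grid, p_y, theta=0.8)
```

By construction the joint pmf reproduces the marginals exactly, for any copula family plugged in place of `fgm`.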
The pmf of the sum X + Y is then given by (10), where its independent component is computed as the discrete convolution p S ⊥ (s) = ∑ x p X (x)p Y (s − x) and its comonotonic component p S + by differencing the cdf of the comonotonic sum, whose quantile function is the sum of the marginal quantile functions, using, e.g., the algorithms in Wójcik et al. (2019). The key to application of (34) as a dependency model for the gross loss summation in Figure 4B is the copula invariance C φ X (X),φ Y (Y) = C X,Y under any non-decreasing transformations φ X , φ Y such as, e.g., those representing the financial terms in Section 2.2. As a consequence, the transformed marginals p φ X , p φ Y can be substituted directly into the copula convolution (38) or, if Fréchet decomposition is used, into (39), with the weight expressed as: w φ = Cov[φ X (X), φ Y (Y)]/Cov[φ X (X + ), φ Y (Y + )] (45). Finally, we remark that any copula family in Table 1 is positively ordered, which implies that C(u, v; θ 1 ) ≤ C(u, v; θ 2 ) for θ 1 ≤ θ 2 and for all u, v ∈ [0, 1]. From this inequality, we can easily see that for the weight in (40) written as a function of θ, we have w(θ 1 ) ≤ w(θ 2 ).

Figure 6. (A) Joe copula from Table 1 with θ = 0.07 discretized on an 11 × 11 grid, (B) Fréchet decomposition of the Joe copula, (C) the bivariate pmf p X,Y obtained by combining the discretized Gamma(5, 1) marginals p X and p Y (black bars) using the discretized Joe copula and (D) Fréchet decomposition of p X,Y .

Covariance Scaling
When the available statistical descriptors are limited to only the marginals p X , p Y and the covariance Cov[X, Y], and computing speed is of particular importance (see, e.g., Wang 1998; Wójcik et al. 2019), it is advantageous to use the Fréchet copula in (36) as opposed to using the Fréchet decomposition in (39) of any other copula in Table 1; see Section 4 for details. For the ground-up summation X + Y in Figure 4A, the weight is computed using (41). For the gross loss summation φ X (X) + φ Y (Y) in Figure 4B, the weight is computed using (45). In this case we first obtain the transformed marginals p φ X , p φ Y using (15). Next, we express the (unknown) covariance Cov[φ X (X), φ Y (Y)] in the numerator of (45) by specializing the condition (1) in Section 2, which guarantees that Cov[φ X (X), φ Y (Y)] ≥ 0 for the non-decreasing financial terms φ X , φ Y . Finally, we use (implicit) affine transformations of the ground-up risks: φ X (X) ≈ (σ φ X (X) /σ X ) X + a X (49) and φ Y (Y) ≈ (σ φ Y (Y) /σ Y ) Y + a Y (50), where σ X , σ Y and σ φ X (X) , σ φ Y (Y) are the ground-up and gross loss standard deviations for individual risks, respectively, to write the gross loss weight as: w φ = γ Cov[X, Y]/Cov[φ X (X + ), φ Y (Y + )] (51), where γ = (σ φ X (X) σ φ Y (Y) )/(σ X σ Y ) (52). Note that such an ansatz does not alter the integration scheme in Figure 4B. It only approximates the gross loss weight computation when the dependence structure is given by (36). The advantage is the substantial gain in computational speed of loss integration as compared to using the copula convolution in (38); for details see Section 4.
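A minimal numerical sketch of the covariance-scaling shortcut follows; all pmfs, the deductible and the ground-up covariance are invented for illustration. The unknown gross covariance is approximated by rescaling the ground-up covariance with the ratios of post-term to pre-term standard deviations:

```python
import numpy as np

def std_of(grid, pmf):
    m = (grid * pmf).sum()
    return np.sqrt((grid**2 * pmf).sum() - m**2)

# Illustrative marginals and ground-up covariance (assumed values).
grid = np.arange(6.0)
p_x = np.array([0.35, 0.25, 0.15, 0.10, 0.10, 0.05])
p_y = np.array([0.40, 0.20, 0.15, 0.15, 0.05, 0.05])
cov_xy = 0.4                                 # ground-up Cov[X, Y]

# Apply an assumed deductible d = 1 to each risk: φ(x) = max(x - 1, 0).
# The pmf vector is unchanged; only the support moves (atoms merge at 0),
# so moments can be computed directly on the mapped support.
phi = np.maximum(grid - 1.0, 0.0)
sd_x, sd_y = std_of(grid, p_x), std_of(grid, p_y)
sd_px, sd_py = std_of(phi, p_x), std_of(phi, p_y)

scale = (sd_px / sd_x) * (sd_py / sd_y)      # covariance scaling factor γ
cov_gross = scale * cov_xy                   # ≈ Cov[φX(X), φY(Y)]
```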

A Comment on Back Allocation
We point out that (49) and (50) are inspired by the procedure of partitioning the gross loss sum φ X (X) + φ Y (Y) back to the individual risk levels X and Y. Defining the offsets in (49) and (50) as: a X = µ φ X (X) − (σ φ X (X) /σ X )µ X (53) and a Y = µ φ Y (Y) − (σ φ Y (Y) /σ Y )µ Y (54), where µ X , µ Y and µ φ X (X) , µ φ Y (Y) are the ground-up and gross loss means for individual risks, respectively, guarantees that the back-allocated risks reproduce the gross loss means (55) and variances (56) of the individual risks. In catastrophe insurance, this procedure is referred to as back allocation; see, e.g., Mitchell-Wallace et al. (2017). By replacing the integration node in Figure 4B with its second-order proxy in Figure 4C, the back-allocated losses are sometimes used to evaluate the impact of financial terms on specific geographical/administrative groups of risks. For example, two groups of risks that belong to different insurance policies and are subject to different policy limits may contain subsets of risks that belong to one county. Therefore, back allocation is needed if county-level gross loss is of interest. We stress that preservation of the second-order moments in (55) and (56) distinguishes this method from the standard pro rata approach (see, e.g., Mitchell-Wallace et al. 2017), which satisfies only (55) by prorating higher-level financial structure mean loss in proportion to the lower-level mean loss(es). In this study, we do not use back allocation per se, so we leave further investigation of the new method for future research.
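The affine back-allocation proxy reduces to two numbers per risk: a slope, the ratio of gross to ground-up standard deviations, and an offset matching the gross mean. A tiny sketch with assumed, purely illustrative moments:

```python
# Assumed ground-up and gross (post-term) moments of one individual risk;
# these values are illustrative, not taken from the paper.
mu_x, sd_x = 10.0, 4.0      # ground-up mean and standard deviation
mu_px, sd_px = 6.0, 2.5     # gross mean and standard deviation

slope = sd_px / sd_x                    # scales the spread to the gross level
offset = mu_px - slope * mu_x           # offsets in (53)-(54) match the mean

def back_allocate(x):
    """Affine proxy: maps a ground-up loss to its back-allocated gross loss."""
    return slope * x + offset
```

By construction the mapped risk reproduces both gross moments: the mean maps to the gross mean and the standard deviation scales to the gross standard deviation.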

Computational Aspects
Our bivariate copula approach requires two separate passes of risk summation over an aggregation tree: a ground-up pass and a gross loss pass. The former is performed to identify the parameters of the copulas at the summation nodes in Figure 4A such that they reproduce the imposed ground-up covariance. The gross loss pass is performed to obtain an estimate of the total insured loss pmf using the summation nodes in Figure 4B, with copulas parameterized during the ground-up pass. The order of summations (tree topology) is fixed for both passes and, as mentioned in Section 3, dictated by the structure of financial terms and the sequential order of aggregation; compare, e.g., the sequential gross loss aggregation tree in Figure 2G, which determines the order of ground-up summation in Figure 2B. In the course of the ground-up pass, at each summation node the marginals P X and P Y , the parametric copula C(u, v; θ) (see Table 1) and the covariance Cov[X, Y] are given. Our objective is to find the parameter θ * for which the joint distribution C(P X (x), P Y (y); θ * ) reproduces the prescribed covariance Cov[X, Y] (57). This corresponds to matching the prescribed Pearson's correlation ρ(X, Y) with its copula-based counterpart ρ̂(X, Y) obtained by integrating C(P X (x), P Y (y); θ * ). It trivially follows that the target variance is reproduced at each node of the ground-up aggregation tree. Algorithm 1 offers a numerical solution to (57). Newton's method (see, e.g., Kelley 2003) is invoked first, and if it fails, the bisection method in Sikorski (1982) is called. Failure occurs when Newton's update, which initially falls outside θ's domain, is capped at the boundary. As a result, the algorithm triggers an infinite loop of bouncing back and forth between the boundary and points within the domain. Both Newton's and bisection optimizers require O(N x N y ) time and space in each iteration. Newton's method takes extra time to evaluate the derivative in Step 7, but it usually converges much faster than the bisection method (Ehiwario and Aghamie 2014).
For the Gaussian copula, the optimization procedure in Xiao and Zhou (2019) is applied to computing ρ̂(X, Y) in Algorithm 1, Step 9. The idea is to approximate a bivariate Gaussian pdf with Hermite polynomials and to subsequently reduce the correlation function of θ to a polynomial series. Acceleration due to this technique is substantial because of Newton's method's suitability for solving polynomials. In our implementation, the Taylor expansion (Equation 39 in Xiao and Zhou 2019) stops when the rightmost term becomes less than 0.00001. For the Student's t copula, θ is bounded by [−1, 1] instead of [0, 1] while being optimized in Algorithm 1. This is because negative θ in the copula can still imply positive dependence in the bivariate joint distribution. If spurious negative correlation between the marginal risks is detected during the gross loss pass, we proceed assuming independence and using discrete convolution. The numeric threshold and the maximum iteration count t max in Algorithm 1 are set to 0.0001 and 1000, respectively. These values provide a reasonable balance between numeric precision and computing speed.
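The parameter search can be sketched with the bisection branch of the procedure only (Newton's method omitted for brevity) and the FGM copula, whose implied correlation is monotone, indeed linear, in θ; the marginals and the target correlation below are illustrative:

```python
import numpy as np

def fgm(u, v, theta):
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))

def implied_corr(grid, p_x, p_y, theta):
    """Pearson correlation implied by the discretized FGM copula."""
    Fx = np.concatenate(([0.0], np.cumsum(p_x)))
    Fy = np.concatenate(([0.0], np.cumsum(p_y)))
    U, V = np.meshgrid(Fx, Fy, indexing="ij")
    C = fgm(U, V, theta)
    joint = C[1:, 1:] - C[:-1, 1:] - C[1:, :-1] + C[:-1, :-1]
    mx, my = (grid * p_x).sum(), (grid * p_y).sum()
    sx = np.sqrt((grid**2 * p_x).sum() - mx**2)
    sy = np.sqrt((grid**2 * p_y).sum() - my**2)
    exy = (np.outer(grid, grid) * joint).sum()
    return (exy - mx * my) / (sx * sy)

def fit_theta(grid, p_x, p_y, rho_target, tol=1e-6, t_max=1000):
    """Bisection on theta; implied correlation increases with theta."""
    lo, hi = 0.0, 1.0
    for _ in range(t_max):
        mid = 0.5 * (lo + hi)
        if implied_corr(grid, p_x, p_y, mid) < rho_target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Illustrative marginals; target correlation within the FGM feasible range.
grid = np.arange(3.0)
p_x = np.array([0.5, 0.3, 0.2])
p_y = np.array([0.6, 0.3, 0.1])
theta = fit_theta(grid, p_x, p_y, rho_target=0.05)
```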
Next, the gross loss pass is performed with Algorithm 2 using the copulas optimized in the ground-up pass. The algorithm implements the summation node in Figure 4B. The dominant computational overhead comes from Step 2, which has both asymptotic space consumption and asymptotic time complexity of O(N x N y ). The latter is due to processing primitive arithmetic such as exponentiation and logarithm, which constitute most copulas listed in Table 1. Speed comparison with other fundamental operations, e.g., addition and multiplication, is shown in Table 2. In Step 7, the copula decomposition in (39) is performed. This variant, however, is slower than the original copula approach. It requires both computing the bivariate joint in (39) and the mixture distribution of the sum in (10) with the weight in (45). Computation of the mixture has asymptotic space complexity of O(N x + N y ) due to auxiliary memory allocations entailed in regridding and estimating the comonotonic sum (Algorithms 3 and 6 in Wójcik et al. 2019) and asymptotic time complexity of O(N x N y ) dominated by the convolution algorithm (Section 2.4.1 in Wójcik et al. 2019). We point out that the covariance scaling approach in Section 3.2 does not require the ground-up pass. The weight in (45) depends only on the scaled covariance and the transformed marginals, both computed adaptively during the gross loss pass.
Algorithm 2 Add two risks in gross loss pass
...
4: if decomposition flag is unset then
...
10: end if
11: return p φ(S) .

All algorithms were coded in C++. The bivariate Gaussian cdf implementation is based on the C code in the library by Donnelly (1973). The bivariate Student's t cdf implementation converts the Fortran code in the R package mvtnorm (see Genz et al. 2021) into C++. Figure 7 shows the time cost of adding two random variables using the copulas in Table 1. The Fréchet copula with covariance scaling facilitates the fastest processing due to its uncomplicated form and adaptive parameter estimation in Algorithm 3. The slightly slower Morgenstern copula convolution takes advantage of the straightforward algebraic expression for the bivariate cdf and the constant ∂C(u, v; θ)/∂θ; see Appendix A.3. Similarly, the Student's t algorithm with ν = 1 (Cauchy) gains efficiency from the uncomplicated derivative in Appendix A.4 and the relatively fast arctan operation; see Table 2. When ν > 1, the bivariate Student's t cdf has a sign function of θ that is not differentiable (Equations (10) and (11) in Dunnett and Sobel 1954). In this case, θ is estimated by the bisection method in Algorithm 1. Because the cdf is evaluated in a recursion dependent on ν, the computing time grows with increasing degrees of freedom. When the Gaussian copula is used, evaluations of the bivariate and inverse univariate Gaussian cdfs are needed. Both entail integrals of computationally expensive exponential functions, which makes risk integration slower than that based on the Cauchy copula. Equations for the Joe and Gumbel copulas and their partial derivatives contain compositions of power functions (see Appendices A.1 and A.2), which is the main reason for their longest processing times.

Algorithm 3 Add two risks with covariance scaling
1: Compute standard deviations σ X and σ Y .
2: Compute gross loss pmfs p φ(X) , p φ(Y) and their standard deviations σ φ X (X) and σ φ Y (Y) .
3: Compute the covariance scaling factor γ in (52).
4: Compute the comonotonic sum pmf p φ(S) + and derive the comonotonic covariance Cov[φ X (X + ), φ Y (Y + )].
5: Compute the gross loss weight w in (51).
6: Make mixture p φ(S) ← (1 − w)p φ(S) ⊥ + wp φ(S) + .
7: return p φ(S) .

Results
To test the proposed algorithms for the copulas in Table 1, we computed gross loss for two major hurricane events and one major earthquake event in the United States. These natural catastrophes inflicted damage on 31,896, 9056 and 1209 locations, respectively, from three undisclosed commercial insurance portfolios. Some characteristics of these portfolios are shown in Table 3. Following the convention used in catastrophe insurance, we express risk as the damage ratio, i.e., loss over the replacement value. For each event, the damage ratio X k |I k based on knowledge of the peril intensity I k at the kth affected location is given by the catastrophe model prediction ζ k |I k and described by a zero-inflated, limited transformed beta distribution (see Klugman et al. 2012; Ospina and Ferrari 2010). We assume that the mean of each distribution is determined by the expectation of the model prediction and that the covariance between risks characterizes model error, so that ζ k |I k = h(I k ) + η k , where the zero-mean noise term η k represents the epistemic uncertainty in the catastrophe model development (see, e.g., Grossi 2004), and h is the damage function, which converts peril intensity to damage ratio. For hurricanes, the intensity is expressed as wind speed and is computed using the U.S. hurricane model in AIR-Worldwide (2015). For earthquakes, the intensity is measured as spectral acceleration of ground motion and is obtained from the U.S. earthquake model in AIR-Worldwide (2017). Loss distributions were discretized on a 64-point grid; some examples are shown in Figure 8. Spatial dependency was captured by the nested block diagonal correlation matrix R corresponding to the covariance matrix Σ (see Figure 2B in Wójcik et al. 2019). This structure arose from imposing exchangeable correlation in spatial grid blocks at two scales using the random effects model in Einarsson et al. (2016). For hurricanes, the widths of the spatial bins were 1 km and 20 km.
The values assigned to off-diagonal entries were 0.07 if two risks were within the same 1 km block, and 0.02 if two risks resided in the same 20 km block but in different 1 km blocks. Correlation was set to zero if two risks belonged to different 20 km blocks. For earthquakes, the widths of the spatial bins were 1 km and 25 km, respectively, with the corresponding correlation coefficients 0.26 and 0.09. Any two risks from different 25 km blocks were considered uncorrelated. The insurance hierarchy which determined the order of gross loss aggregation in each portfolio was: sub-limits, XOL layers and policies; see Section 3 for details and Table 3 for the number of components in each tier of financial terms. A hierarchical aggregation tree with sequential topology was chosen to perform gross loss accumulation; see, e.g., Figure 2. In all runs, at every summation node the support size of the aggregate loss pmf was limited to 256 points, and losses with probabilities ≤10^−10 were set to zero. This "tail truncation" was intended to maintain fine discretization at the bulk of the pmf, particularly during the ground-up pass. To speed up computations based on the Fréchet copula with covariance scaling and those based on the Fréchet decomposition, we used numerical convolution with linear regridding; see Algorithm A1 and Algorithm 3 in Wójcik et al. (2019). We also considered a multivariate extension of the Fréchet copula in (36) with a multivariate extension of the covariance scaling factor in (52). The multivariate Fréchet model uses the direct tree (see Figure 2A-C) to aggregate all risks going into the same financial structure, e.g., a sub-limit, without breaking the summation into smaller bivariate chunks. In other words, each summation node on the aggregation tree is directly followed by a transformation node (financial terms) in Figure 2, except when the risks are summed up to an event total.
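The two-scale exchangeable structure of R can be sketched as follows. The block assignments in the usage example are hypothetical; the default coefficients are the hurricane values quoted above (0.07 within a 1 km block, 0.02 within a 20 km block).

```python
import numpy as np

def nested_block_correlation(fine_id, coarse_id, rho_fine=0.07, rho_coarse=0.02):
    """Nested block-diagonal correlation matrix: rho_fine for pairs sharing a
    fine (1 km) block, rho_coarse for pairs sharing only a coarse (20 km)
    block, and zero otherwise. Assumes each fine block has a unique id and
    lies inside exactly one coarse block."""
    fine_id, coarse_id = np.asarray(fine_id), np.asarray(coarse_id)
    same_fine = np.equal.outer(fine_id, fine_id)
    same_coarse = np.equal.outer(coarse_id, coarse_id)
    R = np.where(same_fine, rho_fine, np.where(same_coarse, rho_coarse, 0.0))
    np.fill_diagonal(R, 1.0)
    return R
```

For four risks with fine blocks (0, 0, 1, 2) and coarse blocks (0, 0, 0, 1), risks 1 and 2 get correlation 0.07, risks 1 and 3 get 0.02, and risk 4 is uncorrelated with the rest.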
Total gross loss pmfs for Portfolios 1, 2 and 3 are presented in Figures 9-11, respectively. Table 4 shows three statistics characterizing these distributions: the mean (µ), the standard deviation (σ) and the expectation of loss conditional on the loss exceeding the value-at-risk at the 1%, 5% and 10% exceedance probability levels (1, 5, 10%-TVaR; see, e.g., Artzner et al. 1999). In Portfolios 1, 2 and 3, over half of the total ground-up loss comes from the top 2.4%, 0.52% and 0.41% of properties, respectively, with the largest replacement values. Policies written on expensive commercial properties typically have a share term on the XOL layer in (21).
This linear term specifies that only a small ratio, e.g., 8%, of the XOL layer is covered. As a result, the loss reduction from ground-up to gross occurs by scaling the support of the loss distribution after application of the XOL layer. Accordingly, for each portfolio in Table 4, all copulas produce gross loss pmfs with similar expected values. On the other hand, the variance of the total loss is impacted by the covariances between pairs of risks; hence, the heterogeneity in σs among different copulas is relatively high. This variability declines, however, from Portfolio 1 to Portfolio 3, as fewer buildings dominate the total replacement value.

Figure 9. Total gross loss pmfs for Portfolio 1. The pmfs are plotted within the same x-axis range for clarity. Blue pmfs are obtained using the aggregation tree with the copulas in Table 1 at summation nodes. Red pmfs are obtained by replacing each copula with its corresponding Fréchet decomposition into comonotonic part and independent part.

Figure 10. Total gross loss pmfs for Portfolio 2. The pmfs are plotted within the same x-axis range for clarity. Blue pmfs are obtained using the aggregation tree with the copulas in Table 1 at summation nodes. Red pmfs are obtained by replacing each copula with its corresponding Fréchet decomposition into comonotonic part and independent part.

Figure 11. Total gross loss pmfs for Portfolio 3. The pmfs are plotted within the same x-axis range for clarity. Blue pmfs are obtained using the aggregation tree with the copulas in Table 1 at summation nodes. Red pmfs are obtained by replacing each copula with its corresponding Fréchet decomposition into comonotonic part and independent part.
The ground-up total damage ratio of Portfolio 1 is only 0.06% (Table 3), and most marginal, location-level loss distributions are strongly right-skewed. When imposing a prescribed ground-up correlation between such long-tailed marginals, the correlation parameter of the Gaussian copula usually takes higher values than the prescribed correlation of the joint distribution. This is because the Gaussian copula (family B1 in Joe 1997) is symmetric and concentrates dependence in the bulk rather than in the right tail of the bivariate distribution. Once the marginals become less skewed after financial terms have been applied, correlations between the gross losses are frequently inflated compared to the corresponding correlations between the ground-up losses. As a result, the Gaussian copula induces the highest variance of the total gross loss. For the Fréchet copula, this effect is attenuated by covariance scaling, which preserves the ground-up correlations. In the case of the Morgenstern copula (family B10 in Joe 1997), the positive correlation range is limited, i.e., for θ ∈ (0; 1], the corresponding Spearman's correlation is ρ_s ∈ (0; 1/3]. Owing to this restriction, the gross loss variance is slightly lower than the variance for the Fréchet copula. In contrast, the Joe and Gumbel copulas (families B5 and B6 in Joe 1997) are both designed to model right tail dependence. They need only a small value of the dependence parameter θ (Table 1) to reproduce the imposed ground-up correlation between marginals with heavy right tails. During the gross loss pass, probability mass of the partial loss aggregates is shifted from the tail to the bulk of their pmfs, and as a result, the correlation between the partial summands becomes weaker. This leads to a low variance of the total gross loss as compared to the other copulas; see Figure 9.
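The restricted range quoted above for the Morgenstern copula can be checked numerically from the identity ρ_s = 12 ∬ C(u, v) du dv − 3 applied to C(u, v) = uv[1 + θ(1 − u)(1 − v)]; a midpoint-quadrature sketch:

```python
import numpy as np

def fgm_spearman(theta, n=1000):
    """Spearman's correlation of the Morgenstern (FGM) copula, evaluated as
    12 * E[C(U, V)] - 3 by midpoint quadrature over the unit square."""
    u = (np.arange(n) + 0.5) / n          # midpoints of n cells per axis
    U, V = np.meshgrid(u, u)
    C = U * V * (1.0 + theta * (1.0 - U) * (1.0 - V))
    return 12.0 * C.mean() - 3.0
```

Analytically ρ_s = θ/3 for the FGM family, so fgm_spearman(1.0) ≈ 1/3, confirming that ρ_s cannot exceed 1/3 for θ ∈ (0; 1].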
In Portfolio 2, the marginals are significantly less skewed than in Portfolio 1. This is manifested by the high value of the total ground-up damage ratio; see Table 3. For the Morgenstern copula, we can therefore often find θ < 1 that imposes the prescribed correlation. Accordingly, the variance of the total gross loss matches that estimated using the Gaussian copula and that estimated using the Fréchet decomposition of the Gaussian copula. We remark that in Figures 9-11 the gross loss pmfs produced by the Fréchet decomposition have σs close to those of their copula counterparts. To make this comparison more precise, for each copula and its Fréchet decomposition, we computed the percentage errors for µ, σ and 1, 5, 10%-TVaR and averaged the results over all copulas and all portfolios. The errors were 0.19%, 2.21%, 5.8%, 1.43% and 1.57%, respectively. This means that the approximation in (39) can be considered sufficiently accurate for the gross loss estimation experiments at hand.
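The statistics compared here (µ, σ and the α%-TVaRs) can be computed directly from a discrete total loss pmf. A minimal sketch follows, using the standard discrete definition TVaR_α = (1/α) ∫_{1−α}^{1} F⁻¹(u) du, which splits the atom straddling the (1 − α) quantile; the grid and probabilities in the usage example are illustrative.

```python
import numpy as np

def pmf_stats(x, p, alphas=(0.01, 0.05, 0.10)):
    """Mean, standard deviation and alpha%-TVaR of a discrete pmf (x, p),
    where TVaR_alpha averages the worst alpha share of the distribution."""
    mu = float(np.sum(x * p))
    sigma = float(np.sqrt(np.sum((x - mu) ** 2 * p)))
    cdf = np.cumsum(p)
    tvars = []
    for a in alphas:
        tail = np.clip(cdf - (1.0 - a), 0.0, None)     # cumulative tail mass
        w = np.diff(np.concatenate(([0.0], tail)))     # tail mass per atom
        tvars.append(float(np.sum(x * w) / a))
    return mu, sigma, tvars
```

For a pmf with mass 0.9 at 0 and 0.1 at 10, the mean is 1, σ is 3, and the 5%-TVaR is 10, since the whole 5% tail sits on the atom at 10.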
Both aggregation based on the Fréchet decomposition and aggregation using the Fréchet copula with covariance scaling are instances of the same mixture method; they differ only slightly in the computation of the gross loss weight (compare (45) with (51)).
That is why the shapes of their total gross loss pmfs are similar. We also observe that the pmf computed with the multivariate Fréchet copula in Figure 9 has a larger grid step on a wider support with vanishing probabilities than the pmf computed with the bivariate Fréchet copula. In this instance, the final pmf's support was determined by comonotonic summation, where tail truncation rarely occurs; see Wójcik et al. (2019). Consequently, the values of TVaR for the multivariate Fréchet model are smaller than those for its bivariate version. Table 4 shows that, in most cases, a smaller σ indicates lower TVaRs at 1%, 5% and 10%. In particular, the Student's t model with a low number of degrees of freedom ν tends to generate much lower TVaRs than the other copulas. Because fewer degrees of freedom shift the dependence to the tail (Patton 2013), the σ of the Student's t pmf is lowest when ν = 1 and gradually approaches that of the Gaussian copula as ν increases. The aggregation trees were implemented using linked lists. This data structure is easy to maintain and straightforward to extend, e.g., to account for different arrangements of financial terms. The associated computational cost is negligible unless the portfolio size is trivial, e.g., ten risks. For the Fréchet family, the covariances to scale (Section 3.2) are first located by a tree search and then added up to derive the mixture weight between partial sums. We point out that for the nested block-diagonal covariance matrices described in our previous study (Wójcik et al. 2019), the sum of covariances can be computed in asymptotically linear time using the second Newton's identity; see, e.g., Mead (1992). During portfolio runs, the computational cost of the search and summation includes the delays in main memory access. These extra costs reduce the relative time differences in Figure 7, where all the data reside in the L1 cache during the benchmark test.
However, the overall speed advantage of the Fréchet family remains substantial, as shown in Table 5. Because loss distributions in Portfolio 2 are less skewed than those in Portfolio 1 and Portfolio 3, the pmfs during loss accumulation have fewer probabilities to truncate and thus maintain larger support sizes. This incurs extra computing time for all the copulas in Table 1 other than the Fréchet family due to the quadratic space complexity of Algorithms 1 and 2. Therefore, the relative time costs for Portfolio 2 are noticeably higher in Table 5.
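The linear-time covariance summation mentioned above follows from Newton's identity e₂ = (p₁² − p₂)/2, which expresses the second elementary symmetric polynomial through the first two power sums. A sketch for one exchangeable block, with illustrative σ values and correlation ρ:

```python
import numpy as np

def pairwise_cov_sum(sigmas, rho):
    """Sum of rho * sigma_i * sigma_j over all pairs i < j of an exchangeable
    block, in O(n) instead of O(n^2), via e2 = (p1**2 - p2) / 2."""
    sigmas = np.asarray(sigmas, dtype=float)
    p1 = sigmas.sum()               # first power sum
    p2 = np.square(sigmas).sum()    # second power sum
    return rho * (p1 * p1 - p2) / 2.0
```

For σ = (1, 2, 3) and ρ = 0.07, the pairwise products sum to 1·2 + 1·3 + 2·3 = 11, so the covariance sum is 0.77, matching the brute-force double loop.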

Conclusions
We have proposed a two-stage gross loss estimation procedure based on the concept of copula-based risk aggregation. Partial sums of weakly associated risks were computed using bivariate aggregation trees equipped with six commonly used copula distributions. The results for three typical insurance portfolios show that the total gross loss characteristics depend not only on the choice of copula, which determines which part of the bivariate distribution carries the strongest dependency, but also on the total ground-up damage ratio, the shape of the marginal loss distributions, and the parameters describing the financial terms at different tiers of the gross loss accumulation. We conclude that the relative time cost of portfolio aggregation is mainly driven by two components: the copula optimization algorithm in the ground-up pass and data management in the gross loss pass. The former is slower for copulas whose partial derivatives have complicated algebraic expressions. Substantial acceleration is achieved by introducing the Fréchet copula with covariance scaling, which replaces optimization with adaptive weight estimation. For the largest portfolio, with the highest variability of total loss characteristics, this approach compares agreeably with the Gaussian and Morgenstern copulas in terms of second-order moments, and with the other copulas in terms of tail risk measures.

Future Research
The impact of the back-allocation procedure in Section 3.3 on gross loss aggregation requires further investigation. Implementation of more flexible copulas at summation nodes, e.g., the Bernstein copula or generalizations of Archimedean copulas, would be compelling, as these have already been applied to the risk aggregation problem (see Marri and Moutanabbir 2022). Finally, empirical validation of the copula models at hand using insurance claims data remains an open research problem. Our model output is the total loss distribution of a portfolio. For validation, we should ideally also have the actual total loss distribution inferred from claims data. However, for a particular catastrophic event, we can only obtain a single realization, the sum of claims, from the claims data. A potential solution would be to create an ensemble of sub-portfolios to estimate the empirical distribution of the sum of claims, and then compare the predicted and empirical distributions using probability scores (Gneiting and Raftery 2007), e.g., those derived from Bregman divergences.
On a finite precision machine, any continuous copula C is in fact discrete on the regular mesh {0, 1/R, …, (R−1)/R, 1} × {0, 1/O, …, (O−1)/O, 1} over the unit square [0; 1]² (Geenens 2020). The values of R and O do not depend on the discretization of P_X and P_Y, i.e., {P_X(x_1), P_X(x_2), …, P_X(x_{N_x})} and {P_Y(y_1), P_Y(y_2), …, P_Y(y_{N_y})}, but are calculated as follows:

R = lcm(1/P_X(x_1), …, 1/P_X(x_{N_x})),  O = lcm(1/P_Y(y_1), …, 1/P_Y(y_{N_y})),

where lcm is the least common multiple function for rational numbers, e.g., lcm(0.3, 0.55) = 100. In practice, computing the full pmf c(u, v) in (37) is intractable because the support size RO is prohibitively large at the high numeric precision of modern computers. Doing so is also unnecessary, because we only need N_x N_y probabilities from the copula pmf, namely {c(P_X(x_i), P_Y(y_j))} for 1 ≤ i ≤ N_x, 1 ≤ j ≤ N_y, to obtain p_{X+Y} using (38).
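A small sketch of the mesh-size computation, under one natural reading of the rational lcm: R is taken as the smallest integer such that every probability P_X(x_i) is an integer multiple of 1/R, i.e., the lcm of the reduced denominators. The paper's normalization convention for the lcm of rationals may differ, so the example probabilities below are our own illustrative assumption.

```python
from fractions import Fraction
from math import lcm

def mesh_resolution(probs, max_den=10**6):
    """Smallest integer R such that p * R is an integer for every probability
    p in probs: the lcm of the denominators of the probabilities written as
    reduced fractions (floats are snapped to nearby rationals first)."""
    dens = [Fraction(p).limit_denominator(max_den).denominator for p in probs]
    return lcm(*dens)
```

For probabilities (0.25, 0.125, 0.625), the reduced denominators are (4, 8, 8), giving R = 8; the copula pmf then only needs to be evaluated at the N_x N_y points (P_X(x_i), P_Y(y_j)) rather than on the full R × O mesh.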