Direct and Hierarchical Models for Aggregating Spatially Dependent Catastrophe Risks

: We present several fast algorithms for computing the distribution of a sum of spatially dependent, discrete random variables to aggregate catastrophe risk. The algorithms are based on direct and hierarchical copula trees. Computing speed comes from the fact that loss aggregation at branching nodes is based on combination of fast approximation to brute-force convolution, arithmetization (regriding) and linear complexity of the method for computing the distribution of comonotonic sum of risks. We discuss the impact of tree topology on the second-order moments and tail statistics of the resulting distribution of the total risk. We test the performance of the presented models by accumulating ground-up loss for 29,000 risks affected by hurricane peril.


Introduction
The main objective of catastrophe (CAT) modeling is to predict the likelihood, severity and socio-economic consequences of catastrophic events such as hurricanes, earthquakes, pandemias and terrorism.Insurance companies use models to prepare for the financial impact from catastrophic events.Such models offer realistic loss estimates for a wide variety of future scenarios of catastrophes.Losses computed by CAT models can either be deterministic pertaining to a specific historical event (e.g., 2011 Tohoku earthquake in Japan or 2012 Hurricane Sandy in the U.S.), or probabilistic, inferred from an ensemble of hypothetical events (Clark 2015).In the probabilistic framework, large catalogs of events are randomly simulated using Monte-Carlo (MC) methods coupled with physical/conceptual models.For example, based on historical hurricane data, distributions of parameters such as frequencies, intensities, paths etc. are estimated and then used to randomly simulate values of these parameters to obtain a footprint of potential hurricane events in the period of, e.g., 10,000 catalog years.These events are not meant to predict hurricanes from year 2020 to year 12,020, but instead, each of the 10,000 years is considered as a feasible realization of the hurricane activity in the year 2020 (Latchman 2010).Insurance companies investigate how catalog events affect a portfolio of properties using financial risk analysis (portfolio rollup).This operation typically consists of first aggregating event losses per property to obtain the event total loss, and then aggregating event totals within each catalog year to obtain the aggregate annual loss.Typically, the event loss per property, the event total, and the annual loss are all characterized by finite discrete probability distributions.Finally, the mixture of annual loss distributions is used to construct the exceedance probability (EP) curve Grossi et al. (2005).The EP curve is essentially the complementary cumulative distribution function which describes the portfolio loss, and is the key for insurers to estimate the probabilities of experiencing various levels of financial impact.
The first step of risk analysis for large insurance portfolio consists of probabilistic loss aggregation over a large number of locations for each event in the stochastic catalog.In the past few decades a number of methods have been proposed to address this difficult technical issue, see Shevchenko (2010); Wang (1998) for detailed overview.These approaches can roughly be categorized into three mainstream groups: (i) parametric, see e.g., Chaubey et al. (1998); Panjer and Lutek (1983); Venter (2001), where distributions characterizing individual losses belong to some parametric family and their convolution is given by an analytic expression or parametric closed form approximation, (ii) numerical, where individual loss distributions are given in discrete, generally non-parametric form and the distribution of the total risk is obtained using a variant of numerical convolution in Evans and Leemis (2004) boosted by Fast Fourier Transform in Robertson (1992) to make computations tractable (iii) MC, where, for a number of realizations, random samples are drawn from individual loss distributions and simply added up to obtain the aggregate loss, see e.g., Arbenz et al. (2012); Côté and Genest (2015); Galsserman (2004).The approach proposed in this paper originates from the category (ii).This is because of high computing speed requirement for ground-up/gross CAT loss analysis and also due to the fact that risks at different locations and their partial aggregates are described by discrete, generally non-parametric distributions (see Section 2.4 for details).We introduce two major enhancements to loss aggregation via numerical convolution.First, the proposed algorithm operates on irregular positive supports with the size of up to 300 points and treats atoms (point probability masses at minimum and maximum loss) separately.Second, positive correlations between pairs of risks are modeled by a mixture of Split-Atom convolution in Wojcik et al. (2016) and comonotonic distribution (Dhaene et al. 2002) of the sum of risks using several risk aggregation schemes based on copula trees in Arbenz et al. (2012); Côté and Genest (2015).High computing speed of our procedure stems from the fact that, by design, we aim at reproducing only the second order moments of the aggregate risk.Numerical experiments presented in this contribution show, however, that also tail measures of risk compare favorably with the estimates obtained from large sample MC runs.
The paper is organized as follows.First, we introduce the framework for aggregation of spatially dependent risks with copula trees and discuss direct and hierarchical models given positive dependence structure.Next, we present a computationally fast way to sum the dependent risks at branching nodes of a copula tree and algorithms for determining the tree topology.Finally, we show an example of ground-up loss estimation for a historical hurricane event in the U.S. Pros and cons of the proposed aggregation models are discussed and compared to the corresponding MC approach.

Problem Statement
When aggregating CAT risks it is essential to account for spatial dependency between these risks relative to CAT model estimate.In general, combining dependent loss variables requires knowledge of their joint (multivariate) probability distribution.However, the available statistics describing the association between these variables are frequently limited to e.g., correlation matrix Wang (1998).To compute the aggregate loss distribution given such incomplete information, the risks are combined within copula trees (Arbenz et al. 2012;Côté and Genest 2015) where dependencies between losses are captured at each step of the aggregation using copula approach (see, e.g., Cherubini et al. 2004;Nelsen 2006).In the current study, we consider two accumulation schemes which assume non-negative correlations between pairs of risks.The question we attempt to answer is: "What is the most computationally efficient copula tree which aggregates spatially dependent risks pairwise and approximates non-parametric distribution of the sum of losses for a particular CAT event in such a way that its second order moments are reproduced?"

Direct Model
Following the idea in Wang (1998) and Dhaene et al. (2014) the positive dependence between risks can be represented using Fréchet copula via weighting independence and comonotonicity: where be independent and comonotonic random vectors with the same marginals P X 1 , P X 2 , . . ., P X M as X.By definition P X ⊥ (x) = ∏ P X i (x i ) and P X + (x) = min (P X 1 (x 1 ), P X 2 (x 2 ), . . ., P X M (x M )), which is equivalent to the statement that a random vector is either independent or comonotonic if and only if it has either product or min copula showed as the first and second summand in (1), respectively.For any positive dependent random vector X (Definition 2 and 3 in Koch and De Schepper 2011), (Collorary 2.3 in Koch and De Schepper 2006) the following bounds hold: Assuming that the distribution of X is induced by (1) and reads: it follows from (Denuit et al. 2001, Theorem 3.1) and (Hürlimann 2001, Remark 2.1) that the dependent sum S = X 1 + X 2 + . . .+ X M is always bounded in convex order by the corresponding independent sum where ≤ cx denotes the convex order.By definition for all real convex functions v, provided the expectations exist.As a consequence S + has heavier tails than S and the following variance order holds: where P S ⊥ and P S + are distributions of independent and comonotonic sums S ⊥ and S + , respectively.Such ansatz, referred to as the the mixture method in Wang (1998), corresponds to the flat aggregation tree in the upper panel of Figure 1.For computational convenience, we elect to approximate the mixing coefficient w as the multivariate dependence measure in Dhaene et al. (2014), so where r is the classical Pearson correlation.Since the denominator of (8) is a normalizing constant which depends only on the shape of the marginals, any general correlation matrix r(X i , X j ) = ρ i,j with positive entries as, e.g., shown in Figure 2B,C can be represented by the exchangeable correlation ρ e in Figure 2A without impacting the value of w.That is, Moreover, ( 8) is equivalent to the comonotonicity coefficient in Koch and De Schepper (2011) if ( 7) holds for all bivariate marginals: We observe that w = r(X i , X j ) if and only if r(X + i , X + j ) = 1 which holds when all univariate marginals differ only in location and/or scale parameters (Dhaene et al. 2002).

Hierarchical Model
If unique description of the joint distribution of individual risks is not crucial and the focus is solely on obtaining an easily interpretable model for the total risk, the individual risks can be aggregated in a hierarchical way.Such process involves specification of partial dependencies between the groups of risks in different aggregation steps (Arbenz et al. 2012).For pairwise accumulation, we first select the two risks X i , X j and construct a copula model for that pair.Then, we replace X i and X j by their sum X i + X j and treat it as a new, combined risk.A simple example is given in the middle panel of Figure 1 depicting the sequential risk aggregation scheme in Côté and Genest (2015).With bivariate (1) inducing the convex sum approximation (7) at the branching nodes we have: For ease of notation, we dropped the arguments of the probability density functions (pdfs) characterizing the partial sums S i = X 1 + . . .+ X i+1 .Observing that for the sequential tree, the partial sums S ⊥ i and S + i are symbolic abbreviations of (X 1 + . . .4), the partial weights w i read: We remark, that if the order in which the aggregation is performed does not trivially follow from the tree structure, any convention can be used to make the numbering of partial sums unique, e.g., see Côté and Genest (2015).However, this has no implication on the fact that, in general, hierarchical trees do not uniquely determine the joint distribution of risk X.To answer the research question posed in Section 2.1, this non-uniqueness is not critical.Conversely, in situations where, e.g., capital allocation is of interest (see Côté and Genest 2015), an extra conditional independence assumption in Arbenz et al. (2012) is needed.For instance, the aggregation scheme in the middle panel of Figure 1 would require:

Implementation of Risk Aggregation at Branching Nodes
A sample reordering method, inspired from Iman and Conover (1982) and assembled into the customized procedure in Arbenz et al. (2012), has recently been used to facilitate summation of risks in both the direct and hierarchical models.Despite the elegant simplicity of this approach, it comes at high computational cost for large samples.To reduce that cost and to orchestrate reproduction of the second order moments of the target sum S, we opt to use the following set of algorithms instead: (i) second order approximation to brute force convolution referred to as the Split-Atom convolution Wojcik et al. (2016), (ii) arithmetization (aka regriding) of loss distributions Vilar (2000), (iii) estimation of the distribution of the comonotonic sum of risks (comonotonization) and (iv) construction of the mixture distribution in (7).These algorithms are described in Sections 2.4.1-2.4.3.An individual risk X is a discrete random variable expressed in terms of the damage ratio defined as loss divided by replacement value.The corresponding pdf p X is represented by zero-and-one inflated mixture: where α and β are atoms at zero and one damage ratio, respectively, and pX is the (discretized) pdf describing the main part of the mixture, see Figure 3 for an example and Section 3 for parameterization used in this study.The atoms are a common feature inferred from analysis of CAT insurance claims data.They also emerge during gross loss portfolio roll-up as a result of applying stop-loss insurance and/or re-insurance terms-deductibles and limits assembled into a variety of tiers/structures.0.0 0.2 0.4 0.6 0.8 1.0 0.00 0.04 0.08 x p X (x) Figure 3.A discrete loss pdf represented as a mixture of two "spikes" (atoms) at minimum and maximum x damage ratio (red) and the main part (blue).Damage ratio is discretized on 64-point grid.

Split-Atom Convolution
The pdf p S ⊥ of a sum of two independent discrete random variables X and Y with pdfs p X and p Y respectively, can be computed as: The classical way of implementing ( 15) for random variables defined on irregular support is brute force (BF) convolution in Evans and Leemis (2004).Let N x and N y be the number of points discretizing p X and p Y support, respectively.The complexity of BF convolution is O N x N y log(N x N y ) because computing all possible products of probabilities and all possible sums of losses is O(N x N y ), and redundancy removal is O(N x N y log(N x N y )), see Evans and Leemis (2004) for details.Such high computational demand makes the BF algorithm impractical for convolving large number of CAT loss distributions.Our solution in Wojcik et al. (2016), referred to as the Split-Atom convolution, reduces the computational cost to O(N x N y ) where N x < N x and/or N y < N y .The idea is to separate two atoms ("spikes") at min/max losses and arithmetize (regrid) the main parts of loss distributions.The regriding is the key to accelerating computations.The Split-Atom approach preserves min/max losses from different loss perspective (e.g., insurer, re-insurer, insured, FAC underwriter etc.) and enhances the accuracy of p S reproduction in situations where substantial probability mass is concentrated in the atoms.This is relevant for ground-up and, even more so, for gross loss estimation when stop-loss insurance is applied.Various ways of splitting the atoms and compacting main parts of loss distributions can be considered depending on computing speed and memory requirements and organization/arrangement of the original and convolution grids.An example is given in Algorithm 1.The original grids are non-uniform: the main parts of p X and p Y are defined on regular grids with spans h x and h y , but the spacing between atoms and the main parts is arbitrary.Convolution grid is designed to have the same irregular spacing for preserving min/max losses.The speedup comes from execution of step 28 in Algorithm 1 using Algorithm 2. Depending on the application, other variants of the Split-Atom convolution are conceivable.For example, the 9-products approach in Algorithm 1 not only reproduces min/max losses but also probabilities describing these losses, or the 4-products approach which only splits the right atom to gain extra computing speed.See Appendix A, Algorithm A1.
Algorithm 1: Split-Atom Convolution: 9-products Input : Two discrete pdfs p X and p Y with supports: X // --"--29: Regrid B (1−9) onto convolution grid s Output : Discrete probability density function p S ⊥ of the independent sum S ⊥ = X + Y with the support s = {s 1 , s 2 , s 2 + h s , s 2 + 2h s , . . ., s 2 + (N s − 3)h s , s N s } and the associated probabilities Algorithm 2: Brute force convolution for supports with the same span Input : Two discrete probability density functions p X and p Y , where the supports of X and Y are defined using the same span h as: x = {x 1 , x 1 + h, . . ., x 1 + (N x − 1)h}, y = {y 1 , y 1 + h, . . ., y 1 + (N y − 1)h} and the associated probabilities as Output : Discrete probability density function p S ⊥ of the independent sum S ⊥ = X + Y with the support s = {s 1 , s 2 , . . ., s N x +N y −1 } and the corresponding probabilities Further acceleration of Algorithm 1 can be achieved using Fast Fourier Transform (FFT) and applying convolution theorem, (Wang 1998, Section 3.3.1)to convolve the main parts of p X and p Y .Therefore, step 28 in Algorithm 1, is replaced with the following product: where P (3) Ω 1 and P Ω 2 represent FFTs of the probabilities p X and p Y .IFFT in ( 16) stands for the Inverse Fast Fourier Transform.Implementation of ( 16) requires the supports x (3) and y (3) have the same span h s , which is guaranteed by steps 16 and 18 in Algorithm 1, and also have the same range containing power-of-2 numbers of points, which can be guaranteed by extending one or both supports and padding zero probabilities.

Regriding
Split-Atom convolution requires that two pdfs being convolved have the main part of their supports discretized using the common span (grid step size) h s , see step 8 in Algorithm 1. Prior to convolution, either one or both of these pdfs may need arithmetization hereafter referred to as the regriding.In general, this operation takes the discrete pdf p X defined on fine scale support with the span h and determines the new arithmetic pdf p X on coarse scale support with the span h > h with the property ∑ x x m p X (x) = ∑ x x m p X (x ) of equating m moments with p X (see, e.g., Gerber 1982;Vilar 2000;Walhin and Paris 1998).
For m = 1, linear regriding or mass dispersal in Gerber (1982) redistributes a probability mass on the original grid to two neighboring points on the new grid such that only E[X] is preserved.This is achieved by solving a 2 × 2 local linear system, see Algorithm 3, Step 6 and animation therein.For m = 2, the standard approach is to apply the method of local moment matching (LMM) which reproduces local second order moments of p X in predefined intervals on the new grid, assuring preservation of both E[X] and Var[X] (Gerber 1982;Vilar 2000;Walhin and Paris 1998).Despite technical simplicity of LMM, there are two caveats: "wiggly" behavior and/or negative probabilities in p X (see, e.g., Table 4 in Walhin and Paris 1998).The first one is due to the fact that LMM performs local moment matching in fixed intervals (x k , A simple improvement is proposed in Appendix A, Algorithm A2.The second caveat is that solving the 3 × 3 local linear system for matching the local moments (Equation ( 16) in Gerber 1982) guarantees a negative in the three dispersed probability masses.The negative mass can only be balanced out by a positive one dispersed from solving the next local linear system if the positive mass is greater than the negative one.Upon completion of the LMM algorithm, negative probabilities could still exist at arbitrary locations on the new grid.One way to handle this issue is to retreat to linear regriding as in Panjer and Lutek (1983).Another way is to use linear goal programming in Vilar (2000), i.e., for every span h determine the arithmetic pdf which conserves the first moment E[X] and has the nearest second moment to the target Var[X].Therefore, the negative mass is mitigated by sacrificing the quality of reproducing Var[X].Here, we propose an alternative two-stage strategy listed in Algorithms 4 and 5 hereafter referred to as the 4-point regriding.

Algorithm 3: Linear regriding
Input : Discrete pdf p X with fine scale support x = {x 1 , x 1 + h, . . ., x 1 + (N − 1)h} and associated probabilities p x j p X (x j ) In Stage I, a probability mass is dispersed to the two neighboring points and the two end points (first and last point on the new grid) such that the mean and the second moment are preserved.The local linear system for matching moments (Algorithm 4, Step 15) guarantees (i) positive masses added to the neighboring points and negative masses added to the end points, (ii) probabilities added to the neighboring points are inversely proportional to distances between these neighboring points and the projecting point.Property (ii) mimics linear regriding and completes a 4 × 4 invertible local linear system along with the other three constraints for matching moments.Negative probabilities resulted from solving the system are typically small in absolute value because the end points are usually far from the projecting point.In practice, Stage I rarely leaves negative probabilities at one or both endpoints of the new grid on completion of the algorithm.
If this is not the case, in Stage II, the negative probability at the first (last) point is dispersed to the second (second to last), the third (third to last), and the last (first) point, respectively, subject to second order moment matching, see Steps 6 and 16 in Algorithm 5.The local linear system in step 7 (or 17) guarantees positive mass dispersed to the third (third to last) point and negative masses to the first (last) point and to the second (second to last) point respectively, see animation in Algorithm 5. Probability at the first (last) point is then set to zero and the second (second to last) point becomes the first (last) point on the new grid, reducing min/max range of p X support.The algorithm alternates between the first and the last points until both hold nonnegative probabilities, or, until no more points are available for balancing out the negative probabilities (see the animation in Example 3, Algorithm 5).The latter indicates 4-point regriding failure and invokes linear regriding.q q q q q q q q Algorithm 4: 4-point regridding, Stage I Input : Discrete pdf p X with fine scale support x = {x 1 , x 1 + h, . . ., x 1 + (N − 1)h} and associated probabilities p x j p X (x j ) 17: end for 18: if min p X (x 1 ), p X (x N ) < 0 then 19: Go to Algorithm 5. 20: end if Example 2. 4-point regriding of fine scale pdf (blue bars) onto new grid (orange circles).The resulting coarse scale pdf is represented as green bars.
Output : Discrete pdf p X (x ).q q q q q q q q Algorithm 5: 4-point regridding, Stage II Input : Discrete pdf p X (x ) from Algorithm 4; min p X (x 1 ), p X (x N ) < 0.
end if x t p X (x t ) Output : Discrete pdf p X (x ).

Comonotonization and Mixture Approximation
The basic construction idea for p + S is listed in Algorithm 6 and illustrated by Example 3. The method proceeds recursively, defining the next element of the new distribution p + S based on the first elements of p X and p Y then modifying p X and p Y , respectively.It requires O(N x + N y ) operations.
q Algorithm 6: Distribution of the comonotonic sum Input : Two discrete probability density functions p X and p Y , with irregular positive supports x = {x 1 , x 2 , . . ., x N x −1 , x N x }; y = {y 1 , y 2 , . . ., y N y −1 , y N y } and associated probabilities  Example 3: To visualize the algorithm on the right, consider two pdfs p X and p Y listed in the left panel of the above cartoon.These pdfs are represented as subsegments of the 0-1 probability line.Quantiles of the same order are added and recursively registered as points on the S + support.At the start of the algorithm, we (i) subtract the smaller of {p X (x 1 ), p Y (y 1 )} from the larger, (ii) assign the smaller to p S + (s + 1 ), (iii) replace the larger with the residual, and (iv) zero the smaller.Next, we move on to the first nonzero probabilities in p X and p Y , and repeat the above procedure for computing p S + (s + 2 ).The algorithm ends once the probabilities in both p X and p Y are exhausted.
Output : Discrete probability density function p S + of the comonotonic sum S + = X + Y satisfying , where u ∼ U(0, 1) and P −1 X , P −1 Y are the inverse cumulative distribution functions of X and Y respectively.The support of p S + is defined as s + = {s + 1 , s + 2 , . . .+ , s + N s −1 , s + N s } and associated probabilities as Once p S ⊥ , p S + and w are known, the mixture distribution p S is composed using (11) .In general, supports of p S ⊥ and p S + cover the same range but are defined on different individual grids, so arithmetizing on common grid is needed.If 4-point regriding is used, the target E[S] is preserved exactly and the target Var[S] is preserved with small error due to occasional retreats to linear regriding.

Order of Convolutions and Tree Topology
Following Wojcik et al. (2016), to further minimize this error, convolutions should be ordered to assure that two distributions undergoing convolution have supports covering approximately same domains (x N x − x 1 ≈ y N y − y 1 ) with the same span (h x ≈ h y ) in Algorithm 1.Since convolution is computationally more expensive than comonotonization, we assume that the order of convolutions governs the order of comonotonizations.Therefore, topology of a particular aggregation tree should be determined by the order of convolutions.For example, ascending order arranges risks at leaf nodes from smallest to largest maxima prior to aggregation, and then accumulates the risks using closest pair strategy depicted in the bottom panel of Figure 1.More sophisticated risk sorting strategy originates from the balanced multi-way number partitioning and is referred to as the balanced largest-first differencing method in Zhang et al. (2011).Here, one seeks to split a collection of numbers (risk maxima) into subsets with (roughly) the same cardinality and subset sum.In general, sorting based convolution orders are useful where no specific way to group risks at leaf nodes of aggregation tree is of importance for total risk analysis.When the goal is to assess the impact of spatial dependencies among elements within a CAT risk portfolio on aggregate loss, geographical grouping of properties (locations) affected by a particular CAT event is crucial (Einarsson et al. 2016).To account for such patterns we propose to use the recursive nearest neighbor order (RNN) of convolutions where depth-first search in Cormen et al. (2009) is performed to define an aggregation tree, see example in Figure 4.In contrast with sorting based strategies, the RNN keeps the original (predefined) ordering of risks at the leaf nodes of the aggregation tree intact.
23 / 2 = 11.5 7 / 2 = 3.5 12 / 2 = 6 It must be mentioned that order of convolutions is implicitly affecting the topology of the direct aggregation model in (7).This is shown in Figure 5 where the convex sum approximation at the root node is composed of aggregates constructed using, e.g., the RNN approach in Figure 4.
. An example of recursive nearest neighbor (RNN) approach for determining topology of direct risk aggregation tree for six risks shown in Figure 4.Note that the order of comonotonic aggregation follows the order of independent aggregation.

Results
As an example of loss analysis, we estimated ground-up loss for a major hurricane event in the US affecting 29,139 locations from portfolio of an undisclosed insurance company.Loss X k at kth location is described by inflated transformed beta pdf, i.e., the main part in ( 14) is parameterized using transformed beta in Venter (2013).We assume that the mean of each pdf is a function of CAT model prediction and that covariance between risks is a function of model error, so where Θ k is the parameter vector, I k stands for hurricane peril intensity end g is the damage function which transforms intensity into damage ratio.The intensity is expressed as wind speed and computed using the U.S. hurricane model in AIR-Worldwide (2015).Loss distributions are discretized on 64-point grid.Spatial dependency is described by Σ with nested block diagonal structure in Figure 2B.This structure is a consequence of using exchangeable correlation in spatial bins at two scales: 1 km and 20 km.The values assigned to off-diagonal elements of Σ are 0.07 if two locations belong to the same 1 km bin and 0.02 if two location belong to the same 20 km bin but different 1 km bins, respectively.Correlation between any two locations in different 20 km bins is set to zero.Details of our estimation methodology are given in Einarsson et al. (2016).Portfolio rollup was performed via direct, hierarchical sequential and hierarchical RNN models using Split-Atom convolution with linear and 4-point regriding.Note that for the hierarchical sequential model, the order of convolutions simply corresponds to the order of locations in insurance portfolio.The maximum number of points discretizing the aggregate distribution grid was set to 256.Additionally, to keep the discretization dense at the bulk of the aggregate distribution, we investigated the effect of right tail truncation at losses with probabilities ≤ 10 −10 .
The runs above were compared with Monte Carlo (MC) simulations.We applied Latin Hypercube Sampling in McKay et al. (1979) with sample reordering in Arbenz et al. (2012) to impose correlations between samples.We generated 30 realizations with 1 MM samples each.Second order statistics and Tail Value at Risk (or p%-TVaR, which measures the expectation of all the losses that are greater than the loss at a specified p% probability, see, e.g., Artzner et al. 1999;Latchman 2010) for these runs are presented in Table 1.For the direct model, linear regriding inflates the variance of the total risk.This behavior is alleviated by tail truncation and/or 4-point regriding coupled with RNN order of convolutions.Variance of the total risk obtained from the sequential model with linear regriding and no truncation is higher than the corresponding variance of the RNN model.This is due to increasing size of the partial aggregate pdf support at upper branching nodes of the sequential tree as compared to that of the individual risks being cointegrated to these nodes.Again, tail truncation and 4-point regriding tackles this effect.For the RNN model, the reproduction of the tail statistics is poor if linear regriding is applied with or without tail truncation.These errors are reduced by 4-point method in Algorithms 4 and 5. Figure 6 shows comparisons between the aggregate distributions for direct and hierarchical models.Models based on linear regriding (the upper row) lead to coarse discretization of the final aggregation grid, with obvious shape mismatch between the sequential model and MC run.This is due to the lack of mechanism for keeping the dense discretization of the bulk of the total loss distribution and, for sequential model, due to increasing differences in max losses between partial aggregates and risks being cointegrated at branching nodes.Such mechanism is included in 4-point regriding (lower row in Figure 6).Interestingly, the grid of the total loss for direct model is still coarse.Here, the support span of p S + is much larger than that of p S ⊥ .Placing combination of both sums on the same grid causes the discretization of the comonotonic sum to dominate.Conversely, the bulk is well resolved by hierarchical models, however, only the RNN model reproduces the shape of the total loss distribution obtained from MC run.Processing times of the investigated models are listed in Table 2. Clearly, the proposed models outperform MC simulations.This is because (i) sampling from skewed loss distributions requires large sample size to resolve higher order moments of the total loss distribution while the implementation of the mixture method presented in the paper operates on 256-point grid and guarantees reproduction of the second order moments only, (ii) floating point addition (MC) and multiplication (the mixture method) have the same performance on (most) modern processors, (iii) large samples cause out-of-cache IO during reshuffling and sorting, i.e., the complexity O(N log N) for fast sort does not apply.
Further, we investigated reproduction of the second order moments, 1,5,10%-TVaR and run times as a function of maximum grid size permitted during aggregation.These results are displayed in Figure 7. Oscillation of error curves (Figure 7A-E) is caused by variability in the total risk support size attributed to the maximum permitted grid size and to the risk aggregation scheme used.As expected, estimates of the mean (µ) and the standard deviation (σ) perfectly reproduce their target values regardless of the support size.For small support sizes, the RNN model approximates TVaR better than sequential model (see Figure 7A-C).The pronounced underestimation of 1%-TVaR in Figure 7C is because 4-point regriding in Algorithm 5 eliminates negative probabilities at the expense of truncating min/max range of convolution and/or comonotonization support.The remedy is to increase the maximum support size.The sequential order has roughly linear growth of grid size because at the lth hierarchy of the sequential tree (see Figure 1), the left child node is always the aggregation of the first l − 1 distributions, which guarantees the lth hierarchy has a grid size (before possible truncation) greater than that of the left child.The RNN order has nonlinear growth of grid size because for the lth aggregation, the two child nodes could be the aggregations any number less than l of distributions, see Figure 4.In turn, grid sizes in the sequential run are larger on average than those in the RNN run.The latter yields higher speed shown in Figure 7F.

Figure 1 .Figure 2 .
Figure 1.Aggregation of five risks using copula trees.Direct model (upper panel), hierarchical model with sequential topology (middle panel) and hierarchical model with closest pair topology (lower panel).The leaf nodes represent the risks whose aggregate we are interested in.The branching nodes of direct tree represent a multivariate copula model for the incoming individual risks while the branching nodes of hierarchical trees represent a bivariate copula model for the incoming pairs of individual and/or cumulative risks.(A) (B) (C) Linear regriding of fine scale pdf (blue bars) onto new grid (orange circles).The resulting coarse scale pdf is represented as green bars.Output : Discrete pdf p X (x )

Figure 4 .
Figure 4.An example of RNN approach for determining topology of hierarchical risk aggregation tree for six risks with zero minima.The maxima and cumulative maxima characterizing losses for the six risks are presented in the upper panel.(A) The algorithm takes the largest cumulative max and halves it to obtain the number c.Then, it binary searches for the number closest to c except for the last element in the sequence.This number (showed in bold) becomes the cumulative maximum of the new subsequence.The search is repeated until the subsequence consists of two (B) The resulting hierarchical aggregation tree.

Figure 6 .
Figure 6.MC (red line) and convolution/comonotoinization based (blue bars) distributions of the total risk for 29,139 locations affected by hurricane peril using different aggregation models with linear regriding (upper row) and 4-point regriding (lower row).No tail truncation was applied.For consistency, the losses are plotted in [0;$100 MM] interval.
Dhaene et al. (2014)the same Fréchet classDhaene et al. (2014).To approximate the distribution P S of the arbitrary sum S we use the weighted average step size of the main part of convolution grid s 8: h s = max(h x , h y , h s )// final step size of the main part of convolution grid s 9: N s =x Nx +y Ny −(x 1 +y 1 ) h s // corresponding number of points 10: // set irregular convolution grid:

Table 1 .
Mean (µ), standard deviation (σ) and tail Value-at-Risk (TVaR % ) at levels 1%, 5% and 10% of the total hurricane risk for 29,139 locations for (A) direct, (B) hierarchical sequential and (C) hierarchical RNN aggregation models compared to the average values from 30 realizations of MC runs.The losses are in [$MM].Numbers in brackets represent percentage errors relative to MC simulations.

Table 2 .
Processing times [s]for different risk aggregation models.MC run is a single realization with 1 MM samples.The mixture method implementation for hierarchical trees was optimized for performance with nested block diagonal correlation structure in Figure2B.Intel i7-4770 CPU @ 3.40 GHz architecture with 16 GB RAM was used.