Macro vs. Micro Methods in Non-Life Claims Reserving (an Econometric Perspective)

Traditionally, actuaries have used run-off triangles to estimate reserves ("macro" models, on aggregated data). But it is also possible to model the payments related to individual claims. Given that those models provide similar point estimates, we investigate the uncertainty of the reserves under "macro" and "micro" models. We study theoretical properties of econometric models (Gaussian, Poisson and quasi-Poisson) on individual data and on clustered data. Finally, applications to claims reserving are considered.


Best estimates and Variability.
In the context of macro-level models, [3] mention that prediction errors can be large, because of the small number of observations used in run-off triangles. Quantifying uncertainty in claim reserving methods is not only important in actuarial practice and to assess accuracy of predictive models, it is also a regulatory issue. [14] and [15] obtained, on real data analysis, lower variance on the total amount of reserves with "micro" models than with "macro" ones. A natural question is about the generality of such result. Should "micro" model generate less variability than standard "macro" ones? That is the question that initiated that paper.
1.3. Agenda. In Section 2, we detail the intuitive results we expect when aggregating data by clusters, moving from micro-level models to macro-level ones. More precisely, we explain why, with a linear model and with a Poisson regression, macro- and micro-level models are equivalent. We also discuss the case of the Poisson regression model with random intercept. In Section 3, we study "micro" and "macro" models in the context of claims reserving, on real data as well as simulated ones.

Clustering in Generalized Linear Mixed Models
In the economic literature, several papers discuss the use of "micro" vs. "macro" data, for instance in the context of unemployment duration in [16] or in the context of inflation in [17]. In [16], it is mentioned that both models are interesting, since "micro" data can be used to capture heterogeneity, while "macro" data can capture cycles and more structural patterns. In [17], it is demonstrated that both heterogeneity and aggregation might explain the persistence of inflation at the macroeconomic level. In order to clarify notation, and to make sure that objects are well defined, we use small letters for sample values, e.g. $y_i$, and capital letters for the underlying random variables, e.g. $Y_i$, in the sense that $y_i$ is a realisation of the random variable $Y_i$. Hence, in the case of the linear model (see Section 2.1), we usually assume that $Y_i \sim \mathcal{N}(x_i^{\mathsf T} b, \sigma^2)$, and then $\widehat b$ is the estimate, in the sense that $\widehat b = (\mathbf{x}\mathbf{x}^{\mathsf T})^{-1}\mathbf{x}\mathbf{y}$, while $\widehat B = (\mathbf{x}\mathbf{x}^{\mathsf T})^{-1}\mathbf{x}\mathbf{Y}$ (here the covariates $\mathbf{x}$ are given, and non-stochastic). Since $\widehat B$ is seen as a random variable, we can write $E[\widehat B] = b$.
With a Poisson regression, $Y_i \sim \mathcal{P}(\lambda_i)$ where $\lambda_i = \exp(x_i^{\mathsf T} b)$; in the quasi-Poisson case, only the first two moments are specified, $E[Y_i] = \lambda_i$ and $\mathrm{Var}[Y_i] = \varphi\,\lambda_i$. For convenience, we will denote $Y_i \sim q\mathcal{P}(\lambda_i)$, with an abuse of notation.
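Quasi-Poisson moments can be illustrated by simulation; the sketch below (Python rather than the R used later in the paper; all numerical values are ours) draws from a gamma-mixed Poisson, one classical way to obtain $E[Y]=\lambda$ and $\mathrm{Var}[Y]=\varphi\lambda$ with $\varphi>1$:

```python
import numpy as np

# Illustrative sketch (values are ours, not from the paper): a gamma-mixed
# Poisson is one classical way to obtain quasi-Poisson-type moments
# E[Y] = lam and Var[Y] = phi * lam with phi > 1.
rng = np.random.default_rng(42)
lam, shape = 10.0, 5.0
theta = rng.gamma(shape, lam / shape, size=200_000)  # E=lam, Var=lam^2/shape
y = rng.poisson(theta)

# Var[Y] = lam + lam^2/shape = lam * (1 + lam/shape), i.e. phi = 3 here
phi_hat = y.var() / y.mean()
print(y.mean(), phi_hat)   # approximately 10 and 3
```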
In this section, we will derive some theoretical results regarding aggregation in econometric models.
2.1. The linear model. Consider the micro-level linear model
$$y_{i,g} = x_g^{\mathsf T} a + e_{i,g}, \qquad i = 1,\ldots,n_g,\ g = 1,\ldots,m, \tag{1}$$
where the errors $e_{i,g}$ are i.i.d. centered Gaussian with variance $\sigma^2$. Stacking (averaging) observations within a cluster yields the following macro-level model,
$$\overline y_g = x_g^{\mathsf T} b + \overline e_g, \qquad g = 1,\ldots,m, \tag{2}$$
with similar assumptions, except for $\mathrm{Var}[\overline e_g] = \sigma^2/n_g$. Those two models are equivalent, in the sense that the following proposition holds.

Proposition 2.1. Model (1) on a micro level and model (2) on a macro level are equivalent, in the sense that (i) $\widehat a_{OLS} = \widehat b_{OLS}$ when weights $n_g$ are used in model (2); and (ii) $\sum_{i,g} \widehat y_{i,g} = \sum_g \widehat y_g$, where $\widehat y_g = n_g\,\overline{\widehat y}_g$.

Proof.
(i) The ordinary least-squares estimator for $a$, from model (1), is defined as
$$\widehat a = \underset{a}{\mathrm{argmin}} \sum_{g=1}^m \sum_{i=1}^{n_g} \left(y_{i,g} - x_g^{\mathsf T} a\right)^2.$$
Now, observe that
$$\sum_{i,g}\left(y_{i,g} - x_g^{\mathsf T}a\right)^2 = \sum_{i,g}\left(y_{i,g} - \overline y_g\right)^2 + \sum_g n_g\left(\overline y_g - x_g^{\mathsf T}a\right)^2 + 2\sum_{i,g}\left(y_{i,g}-\overline y_g\right)\left(\overline y_g - x_g^{\mathsf T}a\right),$$
where the first term is independent of $a$ (and can be removed from the optimization program), and the term with cross-elements sums to 0. Hence,
$$\widehat a = \underset{a}{\mathrm{argmin}} \sum_{g=1}^m n_g\left(\overline y_g - x_g^{\mathsf T}a\right)^2 = \widehat b,$$
where $\widehat b$ is the least-squares estimator of $b$ from model (2), when weights $n_g$ are considered.
(ii) If we consider the sum of predicted values, observe that
$$\sum_{i,g}\widehat y_{i,g} = \sum_g n_g\, x_g^{\mathsf T}\widehat a = \sum_g n_g\, x_g^{\mathsf T}\widehat b = \sum_g \widehat y_g.$$
Hence, the sum of predictions obtained from model (1) is the same as the sum of predictions obtained from model (2), even if partial sums are considered.
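Proposition 2.1 can be checked numerically; below is a minimal Python sketch on made-up data (cluster sizes, covariates and coefficients are arbitrary), where the weighted macro-level fit is obtained by scaling rows with $\sqrt{n_g}$:

```python
import numpy as np

# Numerical check of Proposition 2.1 on made-up data: OLS on the micro data
# equals weighted least squares on the cluster means, with weights n_g.
rng = np.random.default_rng(0)
m, k = 6, 2
n_g = rng.integers(2, 9, size=m)                            # cluster sizes
x = np.column_stack([np.ones(m), rng.normal(size=(m, k))])  # one row per cluster
beta = np.array([1.0, -2.0, 0.5])

X_micro = np.repeat(x, n_g, axis=0)      # every payment shares its cluster's x_g
y_micro = X_micro @ beta + rng.normal(size=n_g.sum())
a_hat = np.linalg.lstsq(X_micro, y_micro, rcond=None)[0]

starts = np.concatenate([[0], np.cumsum(n_g)[:-1]])
y_bar = np.array([y_micro[s:s + n].mean() for s, n in zip(starts, n_g)])
w = np.sqrt(n_g)                          # weighted LS via sqrt(n_g) scaling
b_hat = np.linalg.lstsq(w[:, None] * x, w * y_bar, rcond=None)[0]

print(np.allclose(a_hat, b_hat))   # True
```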
In the proposition above, the equality should be understood as the equality between estimators. Hence we have the following corollary.
Corollary 2.2. Define the $(1\times n_g)$ vectors $\mathbf 1_{n_g} = (1,\ldots,1)$ and $\mathbf 0_{n_g} = (0,\ldots,0)$, and let $\mathbf x$ and $\mathbf W$ denote the design and block-diagonal weight matrices built from them. Model (1) on a micro level and model (2) on a macro level are equivalent, in the sense that the estimators, seen as random variables, satisfy $\widehat A = \widehat B$ and $\mathrm{Var}[\widehat A] = \mathrm{Var}[\widehat B]$.

Proof. Straightforward calculations, using $\mathbf 1_{n_g}\mathbf 1_{n_g}^{\mathsf T} = n_g$, lead to the equality of the estimators; the proof of the equality of variances is similar.

2.2. Poisson and quasi-Poisson regressions. In the Poisson regression model, the variance is equal to the expected value. This may be too restrictive for many actuarial illustrations, which often show more variation than given by expected values. We use the term over-dispersed for a model where the variance exceeds the expected value. A common way to deal with overdispersion is a quasi-likelihood approach (see [19] for further discussion), where a model is characterized by its first two moments. Consider either a Poisson regression model, or a quasi-Poisson one,
$$Y_{i,g} \sim \mathcal P(\lambda_g)\ \text{ or }\ Y_{i,g} \sim q\mathcal P(\lambda_g), \qquad \lambda_g = \exp[x_g^{\mathsf T} a]. \tag{7}$$
In the case of a Poisson regression, $\mathrm{Var}[Y_{i,g}] = \lambda_g$, and in the context of a quasi-Poisson regression, $\mathrm{Var}[Y_{i,g}] = \varphi_{micro}\,\lambda_g$, with $\varphi_{micro} > 0$ for a quasi-Poisson regression ($\varphi_{micro} > 1$ for overdispersion). Here again, stacking observations within a cluster yields the following model (on the sum, and not the average value, to have a valid interpretation with a Poisson distribution),
$$Y_g = \sum_{i=1}^{n_g} Y_{i,g} \sim \mathcal P(\lambda_g^\star)\ \text{ or }\ q\mathcal P(\lambda_g^\star), \qquad \lambda_g^\star = \exp[x_g^{\mathsf T} b + \log(n_g)]. \tag{10}$$
In the context of a Poisson regression, $\mathrm{Var}[Y_g] = \lambda_g^\star$, and in the context of a quasi-Poisson regression, $\mathrm{Var}[Y_g] = \varphi_{macro}\,\lambda_g^\star$, with $\varphi_{macro} > 0$ for a quasi-Poisson regression. Here again, those two models ("micro" and "macro") are equivalent, in the sense that the following proposition holds.

Proposition 2.3. Model (7) on a micro level and model (10) on a macro level are equivalent, in the sense that (i) $\widehat a_{MLE} = \widehat b_{MLE}$ when the offset $\log(n_g)$ is used in model (10); and (ii) $\sum_{i,g}\widehat y_{i,g} = \sum_g \widehat y_g$.

Proof.
(i) The maximum likelihood estimator of $a$ is the solution of
$$\sum_{g=1}^m \sum_{i=1}^{n_g} \left(y_{i,g} - \exp[x_g^{\mathsf T} a]\right) x_g = \mathbf 0.$$
With offsets $\lambda_g^\star = \exp[x_g^{\mathsf T} b + \log(n_g)]$, $g = 1,\ldots,m$, the maximum likelihood estimator of $b$ is the solution (as previously, we can remove $\varphi_{macro}$) of
$$\sum_{g=1}^m \left(y_g - \exp[x_g^{\mathsf T} b + \log(n_g)]\right) x_g = \mathbf 0.$$
Since $\sum_i y_{i,g} - n_g\exp[x_g^{\mathsf T}a] = y_g - \exp[x_g^{\mathsf T}a + \log(n_g)]$, we have $\widehat a = \widehat b$, as (unique) solutions of the same system of equations.
(ii) The sum of predicted values is
$$\sum_{i,g}\widehat y_{i,g} = \sum_g n_g \exp[x_g^{\mathsf T}\widehat a] = \sum_g \exp[x_g^{\mathsf T}\widehat b + \log(n_g)] = \sum_g \widehat y_g.$$
Nevertheless, as we will see later on, the corollary obtained in the context of a Gaussian linear model does not hold in the context of a quasi-Poisson regression. By using a similar argument, the asymptotic distributions of $\widehat A$ and $\widehat B$ coincide when $n$ goes to infinity. In small or moderate-sized samples, it should be noted that $\widehat A$ and $\widehat B$ may be biased for $a$ and $b$, respectively. Generally, this bias is negligible compared with the standard errors (see [20] and [21]).
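The equivalence in Proposition 2.3(i) can also be verified numerically; the following Python sketch fits both Poisson models by Newton-Raphson on synthetic data (all numerical values are made up):

```python
import numpy as np

# Sketch (synthetic data): Poisson ML estimates agree between the micro model
# and the aggregated model with offset log(n_g), as in Proposition 2.3(i).
rng = np.random.default_rng(1)

def poisson_mle(X, y, offset=0.0, iters=50):
    """Newton-Raphson for Poisson regression with an optional offset."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = np.exp(X @ beta + offset)
        grad = X.T @ (y - mu)                 # score equation
        hess = X.T @ (mu[:, None] * X)        # observed information
        beta += np.linalg.solve(hess, grad)
    return beta

m = 8
n_g = rng.integers(1, 6, size=m)
x = np.column_stack([np.ones(m), rng.normal(size=m)])
lam = np.exp(x @ np.array([0.5, 0.3]))
y_micro = rng.poisson(np.repeat(lam, n_g))    # one draw per payment
starts = np.concatenate([[0], np.cumsum(n_g)[:-1]])
y_g = np.add.reduceat(y_micro, starts)        # cluster totals

a_hat = poisson_mle(np.repeat(x, n_g, axis=0), y_micro)
b_hat = poisson_mle(x, y_g, offset=np.log(n_g))   # macro model with offset
print(np.allclose(a_hat, b_hat))   # True
```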
In the quasi-Poisson micro-level model (from model (7)), as discussed above, the estimator of $a$ is the solution of the quasi-score equation, which implies $\widehat a_{QLE} = \widehat a_{MLE}$. The classical Pearson estimator for the dispersion parameter $\varphi_{micro}$ is
$$\widehat\varphi_{micro} = \frac{1}{n-(k+1)}\sum_{g=1}^m\sum_{i=1}^{n_g}\frac{\left(y_{i,g} - \widehat\lambda_g\right)^2}{\widehat\lambda_g},$$
where $n = \sum_g n_g$ is the total number of observations and $k+1$ the number of regression parameters.
Empirical evidence (see [29]) supports the use of the Pearson estimator for estimating $\varphi$, because it is the most robust against the distributional assumption. In a similar way, in the quasi-Poisson macro-level model (from model (10)), the estimator of $b$ is the solution of the quasi-score equation, which implies here also $\widehat b_{QLE} = \widehat b_{MLE}$. The dispersion parameter $\varphi_{macro}$ is estimated by
$$\widehat\varphi_{macro} = \frac{1}{m-(k+1)}\sum_{g=1}^m\frac{\left(y_g - \widehat\lambda_g^\star\right)^2}{\widehat\lambda_g^\star}.$$
Clearly, $\widehat\varphi_{micro} \neq \widehat\varphi_{macro}$ in general, which yields the following results.
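The inequality of the two Pearson estimators can be made concrete by simulation; in the Python sketch below (synthetic data with a shared gamma random effect per cluster, true intensities used in place of fitted values), the macro-level dispersion clearly exceeds the micro-level one:

```python
import numpy as np

# Synthetic-data sketch (not the paper's dataset): with a gamma random effect
# shared by all payments of a cluster, the macro-level Pearson dispersion
# exceeds the micro-level one. True intensities stand in for fitted values.
rng = np.random.default_rng(3)
m, k = 400, 1                                  # clusters, covariates
n_g = rng.integers(3, 8, size=m)               # payments per cluster
lam = np.exp(1.0 + 0.4 * rng.normal(size=m))   # cluster-level intensities
u = rng.gamma(4.0, 1.0 / 4.0, size=m)          # shared effect: mean 1, var 1/4

y_micro = rng.poisson(np.repeat(lam * u, n_g))           # individual payments
starts = np.concatenate([[0], np.cumsum(n_g)[:-1]])
y_g = np.add.reduceat(y_micro, starts)                   # cluster totals

lam_micro = np.repeat(lam, n_g)
phi_micro = ((y_micro - lam_micro) ** 2 / lam_micro).sum() / (n_g.sum() - (k + 1))
phi_macro = ((y_g - n_g * lam) ** 2 / (n_g * lam)).sum() / (m - (k + 1))
print(phi_micro, phi_macro)    # macro dispersion is markedly larger here
```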
Corollary 2.5. Model (7) on a micro level and model (10) on a macro level are not asymptotically equivalent for quasi-Poisson regressions, in the sense that (i) $\mathrm{Var}[\widehat A] \neq \mathrm{Var}[\widehat B]$ when $n$ goes to infinity; and (ii) the variances of the sums of predicted values differ when $n$ goes to infinity. Proof.
(i) The property that the variances are not equal is a direct consequence of classical results from the theory of generalized linear models (see [19]), since the asymptotic covariance matrices of the estimators are proportional to the dispersion parameters. Thus, the covariance matrices of the estimators are asymptotically equal for the Poisson regression model, but differ for the quasi-Poisson model because $\varphi_{micro} \neq \varphi_{macro}$. (ii) Since the MLE and the QLE share the same asymptotic distribution (see [19]), the proof is similar to that of 2.4(ii).

2.3. Poisson regression with random effect.
In the micro-level model described by Equation (7), observations made for the same event (subject) at different periods are supposed to be independent. Within-subject correlation can be included in the model by adding random, or subject-specific, effects in the linear predictor. In the Poisson regression model with random intercept, the between-subject variation is modeled by a random intercept $\gamma$ which represents the combined effects of all omitted covariates. Let $Y_g^{(t)}$ represent the sum of all observations from subject $t$ in cluster $g$, with $Y_g^{(t)}\,|\,\gamma_t \sim \mathcal P(\lambda_g e^{\gamma_t})$ and $(\gamma_1,\ldots,\gamma_T)^{\mathsf T} \sim \mathcal N_T(\mathbf 0, \sigma^2 I)$, where $I$ is the $(T\times T)$ identity matrix and $\mathcal N_T(\mu,\Sigma)$ denotes the $T$-dimensional Gaussian distribution with mean $\mu$ and covariance matrix $\Sigma$. One may be interested in verifying the need for a source of between-subject variation. Statistically, this is equivalent to testing whether the variance of $\gamma$ is zero. In this particular case, the null hypothesis places $\sigma^2$ on the boundary of the model parameter space, which complicates the evaluation of the asymptotic distribution of the classical likelihood ratio test (LRT) statistic. From the very general result of [23], it can be shown (see [25]) that the asymptotic null distribution of the LRT statistic is a 50/50 mixture of $\chi^2_0$ and $\chi^2_1$ as $\sum_g n_g \to \infty$. In this case, obtaining an equivalent macro-level model is of little practical interest, since the construction of the variance-covariance matrix would require knowledge of the individual ("micro") data.
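The boundary-corrected p-value of this LRT is easy to compute; a small Python sketch (the function name is ours) evaluates the 50/50 mixture using the $\chi^2_1$ survival function:

```python
import math

# Sketch (function name is ours): p-value of the LRT for H0: sigma^2 = 0
# under the boundary-corrected 50/50 mixture of chi2_0 and chi2_1.
def boundary_lrt_pvalue(lrt_stat):
    if lrt_stat <= 0.0:
        return 1.0   # the chi2_0 component: point mass at zero
    # chi2_1 survival function, written via the complementary error function
    p_chi2_1 = math.erfc(math.sqrt(lrt_stat / 2.0))
    return 0.5 * p_chi2_1

print(round(boundary_lrt_pvalue(2.71), 3))   # about 0.05
```

With the mixture, the 5% critical value drops from 3.84 (plain $\chi^2_1$) to about 2.71, so ignoring the boundary correction makes the test conservative.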

Clustering and loss reserving models
A loss reserving macro-level model is constructed from data summarized in a table called a run-off triangle. Aggregation is performed by occurrence and development periods (typically years). For occurrence period $i$, $i = 1, 2, \ldots, I$, and for development period $j$, $j = 1, 2, \ldots, I$, let $C_{i,j}$ and $Y_{i,j}$ represent the total cumulative paid amount and the incremental paid amount, respectively, with $Y_{i,j} = C_{i,j} - C_{i,j-1}$, $i = 1, \ldots, I$, $j = 2, \ldots, I$.
In the triangle, columns, rows and diagonals represent development, occurrence and calendar periods, respectively. Each incremental cell $Y_{i,j}$ can be seen as a cluster stacking $n_{i,j}$ amounts paid in the same development period $j$ for the occurrence period $i$. These payments come from $M$ claims; let $Y_{i,j}^{(k)}$ represent the sum of all observations from claim $k$ in the cluster $(i,j)$. It should be noted that not all claims are necessarily represented in each cluster.
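As a minimal illustration of the notation, the incremental triangle can be recovered from the cumulative one as follows (Python sketch with made-up amounts; NaN marks the unobserved lower part):

```python
import numpy as np

# Minimal notation sketch (made-up amounts): incremental payments
# Y_{i,j} = C_{i,j} - C_{i,j-1}; NaN marks the unobserved lower triangle.
C = np.array([[100.0, 150.0, 170.0],
              [110.0, 160.0, np.nan],
              [120.0, np.nan, np.nan]])
Y = C.copy()
Y[:, 1:] = C[:, 1:] - C[:, :-1]   # first column stays: Y_{i,1} = C_{i,1}
print(Y[0])   # [100.  50.  20.]
```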
To calculate a best estimate for the reserve, the lower part of the triangle must be predicted, and the total reserve amount is
$$R = \sum_{(i,j)\in\mathcal K} Y_{i,j},$$
where $\mathcal K$ denotes the set of unobserved cells. To quantify uncertainty in the estimated claims reserve, we consider the mean square error of prediction (MSEP). Let $\widehat R$ be a $\mathcal Y$-measurable estimator of $E[R\,|\,\mathcal Y]$ and a $\mathcal Y$-measurable predictor of $R$, where $\mathcal Y$ represents the set of observed clusters. The MSEP is
$$\mathrm{MSEP} = E\!\left[\left(R - \widehat R\right)^2 \Big|\ \mathcal Y\right].$$
Independence between $R$ and $\mathcal Y$ is assumed, so the equation simplifies as follows,
$$\mathrm{MSEP} = \mathrm{Var}[R] + \left(E[R] - \widehat R\right)^2.$$
3.1. The quasi-Poisson model for reserves.
3.1.1. Construction. From the theory presented in Subsection 2.2, we construct quasi-Poisson macro- and micro-level models for reserves. For both models, the constitutive elements are defined in Table 1 (quasi-Poisson macro- and micro-level models for reserves, $i, j = 1, \ldots, I$; all clusters and all payments are independent). As a direct consequence of Proposition 2.3, the best estimate for the total reserve amount is the same for both models,
$$\widehat R = \sum_{(i,j)\in\mathcal K} \widehat\lambda_{i,j},$$
where $\mathcal K$ represents the set of unobserved clusters. For both models, Proposition 3.1 gives results for the unconditional MSEP.
Proposition 3.1. In the quasi-Poisson macro-level model, the unconditional MSEP is the sum of a process variance term, $\varphi_{macro}\sum_{(i,j)\in\mathcal K}\widehat\lambda_{i,j}$, and an estimation error term computed from the covariance matrix of the estimators, where $\mathbf x$ and $\mathbf W$ are defined by Equation (12). The unconditional MSEP for the quasi-Poisson micro-level model is similar, with $\varphi_{macro}$ replaced by $\varphi_{micro}$.
Proof. The proof for the macro-level model is given in [21]. For the micro-level model, the bias is generally of small order and, by using the approximation $\exp[x] \approx 1 + x$ for $x \approx 0$, we obtain the same expression. By using the fact that $\widehat b = \widehat a$ and the remark at the end of Subsection 2.2, the result follows.
Thus, the difference between the variability in macro- and micro-level models results from the difference between dispersion parameters. Define standardized (Pearson) residuals for both models,
$$r_{i,g} = \frac{y_{i,g} - \widehat y_{i,g}}{\sqrt{\widehat y_{i,g}}} \qquad\text{and}\qquad r_g = \frac{y_g - \widehat y_g}{\sqrt{\widehat y_g}}.$$
Direct calculations lead to
$$\widehat\varphi_{micro} < \widehat\varphi_{macro} \iff \sum_g n_g > \Psi\left(m-(k+1)\right) + k + 1, \qquad \text{where } \Psi = \frac{\sum_{i,g} r_{i,g}^2}{\sum_g r_g^2}.$$
Thus, if the total number of payments $\sum_g n_g$ is greater than the value $\Psi(m - (k+1)) + k + 1$, then the micro-level model (7) will lead to a greater precision for the best estimate of the total reserve amount, and conversely. Adding one or more covariates at the micro level will decrease the numerator of $\Psi$ and will increase the interest of the micro-level model.
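The threshold can be checked directly against the two Pearson estimators; in the Python sketch below, the residual sums, $n$, $m$ and $k$ are all made-up values, and the helper name is ours:

```python
# Sketch of the precision threshold (all numbers are made up; the helper
# name is ours). Psi is the ratio of summed squared Pearson residuals,
# micro over macro; n is the total number of payments, m the number of
# clusters, k the number of covariates.
def micro_more_precise(ss_micro, ss_macro, n, m, k):
    """True iff phi_hat_micro < phi_hat_macro, i.e. n > Psi*(m-(k+1)) + k+1."""
    psi = ss_micro / ss_macro
    threshold = psi * (m - (k + 1)) + k + 1
    phi_micro = ss_micro / (n - (k + 1))
    phi_macro = ss_macro / (m - (k + 1))
    assert (phi_micro < phi_macro) == (n > threshold)  # the two criteria agree
    return n > threshold

print(micro_more_precise(ss_micro=180.0, ss_macro=40.0, n=300, m=28, k=5))  # True
```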

Illustration and Discussion.
To illustrate these results, we consider the incremental run-off triangle from a UK Motor Non-Comprehensive account (published by [26]), presented in Table 2, where each cell $(i,j)$, $i + j \le 7$, is assumed to be a cluster $g$, i.e., the value $Y_g$ is the sum of $n_g$ independent payments.
The final reserve amount obtained from Mack's model ([2]) is 28 655 773$. To create micro-level datasets from the "macro" one, we perform the following procedure: (1) simulate the number of payments for each cluster, assuming $N_g \sim \mathcal P(\theta)$, $g = 1, \ldots, m$; (2) for each cluster, simulate an $(n_g \times 1)$ vector of proportions, assuming $\omega_g = (\omega_1, \ldots, \omega_{n_g})^{\mathsf T} \sim \mathrm{Dirichlet}(\mathbf 1)$, $g = 1, \ldots, m$; (3) for each cluster, allocate the aggregated amount to the $n_g$ individual payments according to these proportions. Figure 1 shows $\sqrt{\mathrm{MSEP}}$ as a function of the expected total number of payments for the portfolio. Above a certain level (close to 3 400 here), the accuracy of the "micro" approach exceeds that of the "macro" one. In order to illustrate the impact of adding a covariate at the micro level, we define a quasi-Poisson micro-level model with a weakly correlated covariate (Model E) and with a strongly correlated covariate (Model F). Following a similar procedure, we obtain the results presented in Table 3 and Figure 2.
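Steps (1)-(3) of this disaggregation procedure can be sketched as follows (Python instead of the R/gtools used in the paper; the cell totals are made up):

```python
import numpy as np

# Sketch of steps (1)-(3) (Python instead of the paper's R; cell totals are
# made up): split each aggregated cell into n_g payments that sum back to it.
rng = np.random.default_rng(2024)
theta = 10                                   # expected number of payments
y_macro = np.array([3511.0, 4001.0, 9245.0])

micro = []
for y_g in y_macro:
    n_g = max(1, rng.poisson(theta))         # step 1: number of payments
    w = rng.dirichlet(np.ones(n_g))          # step 2: Dirichlet(1) proportions
    micro.append(y_g * w)                    # step 3: individual payments

print([float(p.sum()) for p in micro])       # re-aggregating recovers the cells
```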
Table 3 (excerpt). Model E (ρ ≈ 0): 28 657 364, see Figure 2; Model F (ρ ≈ 0.8): 20 514 566, see Figure 2.
As opposed to classical results on hierarchical models, the average of the explanatory variable within a cluster ($(1/n_g)\sum_i x_{i,g}$) has not been added to the macro-level model (Model B), for several reasons. With an explanatory variable weakly correlated with the response variable, the results obtained with Models D and E are very close. As claimed by Proposition 3.1 and Equation (14), an explanatory variable highly correlated with the response variable will decrease the value of $\sqrt{\mathrm{MSEP}}$, and lowers the threshold above which the micro-level model is more accurate than the macro-level one.
The quasi-Poisson macro-level model (Model B) with maximum likelihood estimators leads to the same reserves as the chain-ladder algorithm and Mack's model (see [28]), assuming the exposure of the clusters, for $(i,j) \in \mathcal K$, is one. To obtain similar results with a quasi-Poisson micro-level model (Model D), a similar assumption is necessary: the exposure of each claim within cluster $(i,j)$ is $1/n_{i,j}$. That assumption implies, on a micro level, that the predicted individual payments $\widehat Y_{i,j}^{(k)}$ are proportional to $1/n_{i,j}$. That assumption unfortunately has no foundation.
In the Poisson and quasi-Poisson micro-level models (Models C and D), payments related to the same claim, in two different clusters, are supposed to be uncorrelated. As discussed in the previous section, it is possible to include dependence among payments for a given claim using a Poisson regression with random effects. Simulations and computations were performed in R, using packages ChainLadder and gtools.
3.2. The Mixed Poisson model for reserves.
3.2.1. Construction. From the results obtained in Section 2.3, it is possible to construct a micro-level model for the reserves that includes a random intercept. The latter allows us to model dependence between payments from a given claim. Note that it is hard to find an aggregated model with random effects that could be compared with individual ones. In the context of claims reserves, $Y_g^{(t)}$ represents the sum of payments made for claim $t$ within cluster $g$. The assumptions of that model (called Model G) are
$$Y_g^{(t)}\,|\,\gamma_t \sim \mathcal P\!\left(\lambda_g e^{\gamma_t}\right), \qquad \lambda_g = \exp\!\left[x_g^{\mathsf T} c + \ln(1/n_g)\right], \qquad \gamma_t \sim \mathcal N(0, \sigma^2).$$
Because of those two sources of randomness, two kinds of predictions can be derived: unconditional ones, where
$$E\!\left[Y_g^{(t)}\right] = \lambda_g\, e^{\sigma^2/2},$$
and conditional ones, where the unknown magnitude of claim $t$ is predicted by the so-called best linear estimate $\widehat\gamma_t$ (that minimizes the MSEP, see [21]), so that
$$E\!\left[\widetilde Y_g^{(t)}\right] = \lambda_g\, e^{\widehat\gamma_t}.$$
It is then possible to compute the overall best estimate for the total amount of reserves.
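The gap between unconditional and conditional predictions reduces to the lognormal correction $E[e^{\gamma_t}] = e^{\sigma^2/2}$; below is a Python sketch checking it by simulation (all numerical values, including the predicted effect, are illustrative):

```python
import numpy as np

# Sketch of the two prediction types in Model G (illustrative values only):
# with gamma_t ~ N(0, sigma^2), E[exp(gamma_t)] = exp(sigma^2 / 2), so the
# unconditional prediction carries a lognormal correction factor.
rng = np.random.default_rng(5)
sigma2 = 0.4
lam_g = 2.5                                  # stands in for a fitted intensity

uncond = lam_g * np.exp(sigma2 / 2.0)        # E[Y] = lam_g * e^{sigma^2/2}
gamma = rng.normal(0.0, np.sqrt(sigma2), size=500_000)
mc = lam_g * np.exp(gamma).mean()            # Monte-Carlo counterpart

gamma_hat = 0.3                              # hypothetical predicted effect
cond = lam_g * np.exp(gamma_hat)             # conditional prediction for claim t
print(uncond, mc)                            # the first two are close
```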

3.2.2. Illustration and Discussion. In order to construct a micro-level model from the triangle in Table 2, we follow a procedure similar to the one described in the previous section, with steps (1)-(3) unchanged (not repeated here), followed by: (4) for each accident year, allocate randomly the source (claim) of each payment; (5) fit Model G; and (6) compute the best estimate and the MSEP of the reserve.
For a fixed value of θ, the procedure is repeated 1 000 times. Various values were considered for θ (10, 25, 50, 100 and 250), and the results were similar. In order to avoid heavy tables, only the case θ = 10 is reported here. Simulations and computations were performed with R, relying on package lme4. Figure 3 shows the predictions of the model on observed data, while Figure 4 shows the predictions of the model for non-observed cells. Finally, the results are reported in Table 4. At each step, an LRT is performed (see Section 2.3) and, each time, the variance at the origin was significantly non-null, meaning that the correlation among payments (related to the same claim) is positive. Observe that with the random-effect model, the log-likelihood is approximated using numerical integration, which might bias the computed p-values of the test. Here, the p-values have been confirmed using a bootstrap procedure (using package glmmML).