Misspecification Tests for Log-Normal and Over-Dispersed Poisson Chain-Ladder Models

Despite the widespread use of chain-ladder models, no theory has so far been available to test model specification. The popular over-dispersed Poisson model assumes that the over-dispersion is common across the data. A further assumption is that accident year effects do not vary across development years and vice versa. The log-normal chain-ladder model makes similar assumptions. We show that these assumptions can easily be tested and that similar tests can be used in both models. The tests can be implemented in a spreadsheet. We illustrate the implementation in several empirical applications. While the results for the log-normal model are valid in finite samples, those for the over-dispersed Poisson model are derived for large cell mean asymptotics which hold the number of cells fixed. We show in a simulation study that the finite sample performance is close to the asymptotic performance.


Introduction
"Can we trust chain-ladder models?" is a central question in non-life insurance claim reserving. It hinges on the model assumptions: if these are violated, the answer would be "no". For example, the popular over-dispersed Poisson chain-ladder model assumes a fixed variance-to-mean ratio across the run-off triangle. If this is false, then distribution forecasts are bound to fail. Yet, there is no statistical theory available to test for a violation of this assumption.
We show that testing for a violation of central assumptions is straightforward in two popular chain-ladder models: over-dispersed Poisson and log-normal. While the over-dispersed Poisson model assumes a fixed variance-to-mean ratio, the log-normal model imposes a common variance of the log data. Further, both models assume a chain-ladder structure. That is, accident year effects do not vary by development year and vice versa. We show that these assumptions are not only testable, but testable with standard tools that can easily be implemented in a spreadsheet.
The over-dispersed Poisson model arguably owes its special status to the ubiquitous chain-ladder technique. Kremer (1985) showed that this deterministic technique so commonly used in claim reserving is replicated by maximum likelihood estimation in a Poisson model. However, integer support and the implicit assumption that the variance equals the mean cannot be reconciled with insurance claim data. This explains the need for the over-dispersed Poisson model, which relaxes both of these assumptions. Unlike the Poisson model, the over-dispersed Poisson model is moment-based and does not come equipped with a distributional framework. Despite this shortcoming, distribution forecasts are needed and bootstrapping (England 2002; England and Verrall 1999) is in widespread use. Yet, so far we do not have a statistical theory for the bootstrap in this setting.
Recently, Harnau and Nielsen (2017) proposed a distributional framework that incorporates the moment assumptions of the over-dispersed Poisson model. This framework allows for a compelling asymptotic theory that does not require a large array but rather large cell means. The practical implication is that for a run-off triangle with a large, potentially unknown, number of payments, we can use a fixed sample size Gaussian distribution theory. They derive parameter distributions, tests for model reduction, such as the absence of calendar effects, and closed form distribution forecasts. Their assumptions accommodate, among others, many compound Poisson distributions. In insurance, these have the interpretation that each cell of aggregate incremental claims is the sum of a Poisson number of claims, each with a random individual claim amount. The asymptotic theory then does not assume that we have many such cells, but rather that the mean of the Poisson number of claims is large. We stress that while Harnau and Nielsen (2017) largely use terminology from the age-period-cohort literature, the theory immediately applies to the reserving literature by renaming age, period, and cohort effects to development, calendar, and accident effects.
Modeling aggregate incremental claims as log-normal rather than over-dispersed Poisson is also common. Kremer (1982) introduced a log-normal model with multiplicative mean structure mirroring the over-dispersed Poisson chain-ladder model. While this model does not replicate the classic chain-ladder technique, it is easily estimated by least squares. Recently, Kuang et al. (2015) derived explicit expressions for the estimators in the log-normal model. These have an interpretation as a geometric, rather than the classic arithmetic, chain-ladder. Other contributions for the log-normal model are discussed in the excellent overview of stochastic reserving models by England and Verrall (2002).
We are of course not the first to question the validity of the assumptions in these models. Yet, so far the problem has been dealt with by specifying more flexible models. For example, Hertig (1983) considers a log-normal model that allows the log data variance to vary by development year. The double-chain-ladder model by Martínez Miranda et al. (2012) has, conditional on the incurred counts, an approximate over-dispersed Poisson structure where the over-dispersion varies by accident year. The "distribution-free" model by Mack (1993) has separate variance parameters for each development year. We note that while this model also replicates the classical chain-ladder point forecasts, it differs from the over-dispersed Poisson model and so far lacks a distributional framework that would allow for a rigorous statistical theory. Thus, while it is a popular model, we do not consider it further in this paper.
While using more flexible models seems sensible when assumptions are violated, we should not be too quick to dispose of well-known simple models. Particularly for forecasting, such simpler models may be advantageous. A statistical framework for misspecification testing is thus needed. The tests may corroborate the initial modeling choice of the expert, draw attention to an issue, or confirm the suspicion that the model is not well suited for the task. Whichever scenario the expert encounters, the misspecification tests can help to make an informed choice.
The test statistics we propose in this paper are well known in an analysis of variance (ANOVA) context. There, the researcher is usually presented with several samples and wants to test for treatment effects. The data are often assumed to be independent Gaussian. The first step is to test for common variances across samples. This is done with a Bartlett test based on an easily computed likelihood ratio statistic. Then, given common variances, a standard F-test can be used to test for different means between the samples, indicating a treatment effect.
The difference from the ANOVA application is that we generally have data for only one sample, often a run-off triangle. We thus reverse engineer the ANOVA situation by splitting the data into several artificial sub-samples. This idea has a long history in the econometric literature. For instance, Chow (1960) proposed a test for structural breaks that involved splitting the sample at the known breakpoint. In the (weak) instrumental variable literature, Angrist and Krueger (1995) proposed a split-sample procedure with the objective of breaking the bias of the instrumental variable estimator towards the ordinary least squares estimator. Figure 1 shows examples of how we could split run-off triangles into sub-samples. In Section 2, we give a precise definition of the conditions that both the data set and the artificial sub-samples must meet. We note that while we do not provide guidance on how to choose the sub-sample structure in this paper, the choice does not affect the size of the proposed tests under the null hypothesis.
In a log-normal model, taking logs yields Gaussian data such that we can directly apply the Bartlett and F-test from the ANOVA scenario. While the finite sample distribution of the Bartlett test statistic has no closed form, it does not have nuisance parameters and critical values could easily be simulated. However, Bartlett (1937) suggests a $\chi^2$ approximation to the exact distribution that allows us to sidestep simulations. For a special case with just two sub-samples, we can also apply an F-test for the hypothesis of common variances of the log data; while Bartlett and F-tests are not identical, simulations indicate that they give similar information. Next, we show that an F-test for common mean parameters is not only straightforward but also independent of the Bartlett test. These results are collected in Section 3.
In the over-dispersed Poisson model, the asymptotic framework by Harnau and Nielsen (2017) catapults us into a finite dimensional Gaussian world. Therefore, the results developed for the log-normal model carry over. We can now asymptotically use a Bartlett test as a test for common over-dispersion across sub-samples. Similarly, an F-test for common mean parameters across sub-samples is asymptotically F-distributed and asymptotically independent of the dispersion parameter tests. We stress again that the asymptotic theory does not require a large triangle but rather large means of the cells in the triangle. As for the log-normal model, we could simulate critical values for the Bartlett test; however, a $\chi^2$ approximation can still be justified. We show all this in Section 4.
The Bartlett test is easily implemented and makes an empirical application straightforward. The same is true for an F-test on the means. We illustrate the testing procedure (splitting the data, estimating the sub-models, Bartlett testing for common dispersion parameters, and F-testing for common mean parameters) in Section 5 with several empirical applications.
We clear up remaining questions about the power of the tests and the performance of approximations in a simulation study based on a run-off triangle. First, it would not take much to simulate critical values of the Bartlett test statistic under the null, rather than to use a $\chi^2$ approximation. However, we show in a simulation study that this approximation works so well that simulating critical values seems superfluous. Second, we produce power curves under several alternatives for the test for common variances of the log data in a log-normal model. Third, we find that the asymptotic results for the over-dispersed Poisson model are well approximated in finite samples, at least in our simulations. The simulation study is in Section 6.
Finally, we discuss some open questions for future research, such as how to choose the sub-sample structure and whether one can select between the over-dispersed Poisson and the log-normal model. With this, Section 7 concludes the paper.

Data and Sub-Samples
Our aim is to test model specification by using statistics that are usually employed to test for common parameters across separate samples. However, we are presented with just a single sample, such as a run-off triangle. Thus, we artificially construct separate samples by splitting the data at hand into sub-samples. Many intuitive splits can be accommodated by the theory, for example, all sub-samples in Figure 1. Here, we define precisely the permissible structures for data and sub-samples, illustrated on an example of a run-off triangle.
For the theory in this paper, we assume that the data are a generalized trapezoid as defined by Kuang et al. (2008). This flexible format allows for different numbers of accident and development years, and can accommodate missing past and future calendar years. Run-off triangles are a special case with as many accident as development years and only future calendar years missing. For accident year $i$ and development year $j$, we count calendar years $k$ with an offset so $k = i + j - 1$. Generalized trapezoids are characterized by the index set
$$\mathcal{I} = \{(i, j) : I_l \le i \le I_u,\ J_l \le j \le J_u,\ K_l \le i + j - 1 \le K_u\},$$
where $I_l$ and $I_u$, $J_l$ and $J_u$, and $K_l$ and $K_u$ are the smallest and largest accident, development and calendar year indices available, respectively. We denote the number of cells in $\mathcal{I}$ by $n$. The run-off triangle in Table 1, taken from Taylor and Ashe (1983), is a generalized trapezoid with $I_l = J_l = K_l = 1$, $I_u = J_u = K_u = 10$ and $n = 55$.
We also assume that each sub-sample is a generalized trapezoid. We denote sub-samples by $\mathcal{I}_1, \dots, \mathcal{I}_m$. The sub-samples should be disjoint, so $\mathcal{I}_s \cap \mathcal{I}_t = \emptyset$ for $s \neq t$, and their union should be the original sample, so $\cup_{\ell=1}^{m} \mathcal{I}_\ell = \mathcal{I}$. All sub-samples of the examples in Figure 1 are generalized trapezoids. For instance, the sub-sample $\mathcal{I}_2$ in Figure 1c is specified by its own set of lower and upper bounds on the accident, development and calendar year indices. The purpose of the generalized trapezoid assumption is to ensure parameter identification later on. We note that this assumption is often more restrictive than needed. Examples of arrays that do not fall into the generalized trapezoid category are arrays with missing cells and disconnected arrays such as the combination of sub-samples $\mathcal{I}_1$ and $\mathcal{I}_3$ in Figure 1b. However, for many of these arrays identification may still be given and then the theory developed below will still be valid.
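The index set can be made concrete with a few lines of code. The following is a minimal sketch, not part of the original analysis: the function name `generalized_trapezoid` and the particular split by accident year are hypothetical choices for illustration and need not coincide with any panel of Figure 1.

```python
def generalized_trapezoid(i_lo, i_hi, j_lo, j_hi, k_lo, k_hi):
    """Return the cells (i, j) of a generalized trapezoid.

    Calendar years are counted with the offset k = i + j - 1.
    """
    return {
        (i, j)
        for i in range(i_lo, i_hi + 1)
        for j in range(j_lo, j_hi + 1)
        if k_lo <= i + j - 1 <= k_hi
    }


# Run-off triangle with ten accident and ten development years.
triangle = generalized_trapezoid(1, 10, 1, 10, 1, 10)
n = len(triangle)

# Hypothetical split by accident year; both parts are generalized trapezoids.
sub_1 = generalized_trapezoid(1, 5, 1, 10, 1, 10)
sub_2 = generalized_trapezoid(6, 10, 1, 5, 1, 10)

# The sub-samples must be disjoint and their union must be the full sample.
is_valid_split = not (sub_1 & sub_2) and (sub_1 | sub_2) == triangle
```

For this triangle, `n` reproduces the 55 cells stated above, and `is_valid_split` checks the two requirements placed on the sub-samples.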

Log-Normal Model
Given data and sub-samples, we can specify a log-normal model, define estimators, and provide the theory for specification testing. The idea is to start with a model that allows parameters to vary across sub-samples and then to test for reductions to a model with common parameters. The latter, most restrictive, model is commonly used in claim reserving. If we reject a reduction to this model, it is likely misspecified. Estimation is done by least squares. The first hypothesis is that log data variances are common across sub-samples; we can test this with a Bartlett test. The second hypothesis is for common linear predictors and can be assessed with an independent F-test.

Model and Hypotheses
The unrestricted model allows both log data means and variances to vary across sub-samples. For this model, we assume that the aggregate incremental claims $Y_{ij,\ell}$, for accident year $i$, development year $j$, and sub-sample $\ell$, are independent log-normal with
$$M^{LN}: \quad \log(Y_{ij,\ell}) \sim N(\mu_{ij,\ell}, \sigma^2_\ell), \qquad \mu_{ij,\ell} = \alpha_{i,\ell} + \beta_{j,\ell} + \delta_\ell.$$
While we focus on linear predictors $\mu_{ij,\ell}$ with accident and development year effects, the theory in this paper allows for more general or more restrictive linear predictors. For example, we could incorporate calendar year effects as in Zehnwirth (1994) or Kuang et al. (2011).
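The first hypothesis restricts log data variances to be common across sub-samples $\mathcal{I}_\ell$. The remaining assumptions are maintained; thus, linear predictors are still allowed to vary across sub-samples. We write the hypothesis as
$$H_{\sigma^2}: \quad \sigma^2_\ell = \sigma^2 \quad \text{for all } \ell.$$
The model that arises by imposing this restriction is
$$M^{LN}_{\sigma^2}: \quad \log(Y_{ij,\ell}) \sim N(\mu_{ij,\ell}, \sigma^2).$$
The second hypothesis nests the first but also restricts linear predictors to be common across sub-samples. The hypothesis is
$$H_{\mu,\sigma^2}: \quad \mu_{ij,\ell} = \mu_{ij} \ \text{ and } \ \sigma^2_\ell = \sigma^2 \quad \text{for all } \ell.$$
Under this hypothesis, all parameters are common across sub-samples $\mathcal{I}_\ell$. Thus, we can drop the subscript $\ell$ and write the model under this hypothesis as
$$M^{LN}_{\mu,\sigma^2}: \quad \log(Y_{ij}) \sim N(\alpha_i + \beta_j + \delta, \sigma^2).$$
This is the log-normal geometric chain-ladder model.
We can also think about the hypotheses on the original scale. Mean and variance parameters on the log-scale map into medians and coefficients of variation on the original scale. Taking the model $M^{LN}$,
$$\mathrm{median}(Y_{ij,\ell}) = \exp(\mu_{ij,\ell}) \quad \text{and} \quad \frac{\sqrt{\mathrm{var}(Y_{ij,\ell})}}{E(Y_{ij,\ell})} = \sqrt{\exp(\sigma^2_\ell) - 1}.$$
Thus, the separation between mean and variance on the log-scale translates to separation between median and coefficient of variation on the original scale. Hence, we can alternatively think of $H_{\sigma^2}$ as the hypothesis of common coefficients of variation and of $H_{\mu,\sigma^2}$ as further imposing common median parameters.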

Estimation
We estimate on the log-scale with standard estimators: least squares for log data means and residual sums of squares for log data variances. Since the theory for testing developed below is adapted from a Gaussian framework, estimation on the log-scale is intuitive. Before specifying the estimators, we briefly discuss identification.
The identification problem is that
$$\mu_{ij,\ell} = \alpha_{i,\ell} + \beta_{j,\ell} + \delta_\ell = (\alpha_{i,\ell} + a) + (\beta_{j,\ell} + b) + (\delta_\ell - a - b)$$
for any $a, b$. Thus, the levels of accident and development effects are not identified. However, the linear predictors $\mu_{ij,\ell}$ are identified (Kuang et al. 2008). These are thus invariant to the identification constraints imposed on the individual effects. Therefore, it does not matter whether we impose ad-hoc constraints such as $\alpha_{I_l} = \beta_{J_l} = 0$ or non ad-hoc constraints as suggested by Kuang et al. (2008). We choose to discuss estimation based on the latter, which has the advantage that it allows for straightforward counting of degrees of freedom. By way of example, we apply the identification by Kuang et al. (2008) to a run-off triangle with $\mathcal{I} = \{(i, j) : 1 \le i, j, k \le I\}$. Defining the first difference operator as $\Delta$, so that $\Delta\alpha_i = \alpha_i - \alpha_{i-1}$, the idea is to re-write
$$\mu_{ij} = \mu_{11} + \sum_{t=2}^{i} \Delta\alpha_t + \sum_{s=2}^{j} \Delta\beta_s.$$
Then, $\mu_{ij} = x_{ij}'\xi$ where the design vector is $x_{ij} = (1, 1_{(2 \le i)}, \dots, 1_{(I \le i)}, 1_{(2 \le j)}, \dots, 1_{(I \le j)})'$ and the identified parameter vector is $\xi = (\mu_{11}, \Delta\alpha_2, \dots, \Delta\alpha_I, \Delta\beta_2, \dots, \Delta\beta_I)'$. We denote the number of parameters as $p = \mathrm{length}(\xi)$. The identification method can be extended to generalized trapezoids as well as to linear predictors with calendar year effects.
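The re-parametrization can be checked numerically. The sketch below builds the design vector for a run-off triangle and confirms that the inner product with the identified parameter vector reproduces the linear predictor. The effect values `alpha`, `beta` and `delta` are arbitrary numbers chosen for illustration only, and the helper name `design_vector` is hypothetical.

```python
def design_vector(i, j, size):
    """Design vector (1, 1(2<=i), ..., 1(size<=i), 1(2<=j), ..., 1(size<=j))."""
    row = [1.0]
    row += [1.0 if t <= i else 0.0 for t in range(2, size + 1)]
    row += [1.0 if s <= j else 0.0 for s in range(2, size + 1)]
    return row


size = 10
cells = [(i, j) for i in range(1, size + 1) for j in range(1, size + 1)
         if i + j - 1 <= size]

# Arbitrary illustrative effects; their levels are not identified.
alpha = {i: 0.1 * i * i for i in range(1, size + 1)}
beta = {j: -0.3 * j for j in range(1, size + 1)}
delta = 2.0

# Identified parameters: the level mu_11 and the first differences of the effects.
xi = [alpha[1] + beta[1] + delta]
xi += [alpha[t] - alpha[t - 1] for t in range(2, size + 1)]
xi += [beta[s] - beta[s - 1] for s in range(2, size + 1)]
p = len(xi)

# Largest discrepancy between x_ij' xi and alpha_i + beta_j + delta over the triangle.
max_gap = max(
    abs(sum(x * c for x, c in zip(design_vector(i, j, size), xi))
        - (alpha[i] + beta[j] + delta))
    for i, j in cells
)
```

Adding constants to `alpha` and `beta` while subtracting them from `delta` leaves `xi` unchanged, which is the sense in which the levels are not identified while the linear predictors are.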

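Estimation in Unrestricted Model $M^{LN}$
For the unrestricted model $M^{LN}$, we estimate linear predictors separately on each sub-sample as
$$\hat\mu^{LN}_{ij,\ell} = x_{ij,\ell}'\hat\xi^{LN}_\ell \quad \text{where} \quad \hat\xi^{LN}_\ell = \Big(\sum_{ij \in \mathcal{I}_\ell} x_{ij,\ell}\, x_{ij,\ell}'\Big)^{-1} \sum_{ij \in \mathcal{I}_\ell} x_{ij,\ell} \log(Y_{ij,\ell}).$$
With degrees of freedom $df_\ell = n_\ell - p_\ell$, where $n_\ell$ is the number of cells and $p_\ell$ the number of identified parameters in sub-sample $\ell$, we estimate log data variances by
$$\hat\sigma^{2,LN}_\ell = \frac{RSS_\ell}{df_\ell} \quad \text{where} \quad RSS_\ell = \sum_{ij \in \mathcal{I}_\ell} \{\log(Y_{ij,\ell}) - \hat\mu^{LN}_{ij,\ell}\}^2. \qquad (1)$$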

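Estimation with Common Variances in $M^{LN}_{\sigma^2}$
Imposing the restriction of common log data variances $H_{\sigma^2}$ does not require re-estimation as the estimators from $M^{LN}$ can be re-used. The estimators for the linear predictors $\mu_{ij,\ell}$ are identical to those of $M^{LN}$. The log data variance in $M^{LN}_{\sigma^2}$ is estimated by
$$\bar\sigma^{2,LN} = \frac{RSS_\cdot}{df_\cdot} \qquad (2)$$
where $RSS_\cdot = \sum_{\ell=1}^{m} RSS_\ell$ and $df_\cdot = \sum_{\ell=1}^{m} df_\ell$, with $RSS_\ell$ and $df_\ell$ the residual sum of squares and degrees of freedom of sub-sample $\ell$.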

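Estimation with Common Variances and Linear Predictors in $M^{LN}_{\mu,\sigma^2}$
Under the hypothesis $H_{\mu,\sigma^2}$, which imposes common log data mean and variance parameters, both estimators change. We drop the $\ell$-subscript indicating the sub-sample since estimation is done over the full sample $\mathcal{I}$. With that, we write the estimators for the linear predictors in $M^{LN}_{\mu,\sigma^2}$ as
$$\hat\mu^{LN}_{ij} = x_{ij}'\hat\xi^{LN} \quad \text{where} \quad \hat\xi^{LN} = \Big(\sum_{ij \in \mathcal{I}} x_{ij}\, x_{ij}'\Big)^{-1} \sum_{ij \in \mathcal{I}} x_{ij} \log(Y_{ij}).$$
We estimate the log data variance $\sigma^2$ under this hypothesis by $\tilde\sigma^{2,LN} = RSS/df$, defining $RSS = \sum_{ij \in \mathcal{I}} \{\log(Y_{ij}) - \hat\mu^{LN}_{ij}\}^2$ and $df = n - p$.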

Remarks
Least squares estimation for the identified parameter vector $\xi$ is maximum likelihood estimation in the log-normal model. Kuang et al. (2015) derive a representation of the least squares estimators that is interpretable as a geometric chain-ladder, in contrast to the classic, arithmetic, chain-ladder.
For many regression models, there is little difference between scaling the residual sum of squares by the degrees of freedom or by the number of observations; the former yields an unbiased estimator for $\sigma^2$, the latter the maximum likelihood estimator. However, the scaling does matter here due to the large parameter-to-observation ratio. By way of example, the Taylor and Ashe (1983) data has $n = 55$ observations but only $df = 36$ degrees of freedom, so that the degrees-of-freedom-scaled estimator $RSS/df$ is some 50% larger than the rival estimator $RSS/n$. This is amplified for the sub-samples.
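The arithmetic behind this comparison is short. Assuming the chain-ladder linear predictor with one level, nine accident year differences and nine development year differences for the ten-by-ten triangle, the counts work out as follows:

```latex
\begin{align*}
p  &= 1 + 9 + 9 = 19, \\
df &= n - p = 55 - 19 = 36, \\
\frac{RSS/df}{RSS/n} &= \frac{n}{df} = \frac{55}{36} \approx 1.53 .
\end{align*}
```

The ratio depends only on $n$ and $p$, not on the data, which is why the effect grows for small sub-samples where $p_\ell$ is large relative to $n_\ell$.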

Testing for Common Variances
We show how to test for common log data variances, that is, for $H_{\sigma^2}$ in $M^{LN}$, using a Bartlett test. In a special case with two sub-samples, we can use an F-test instead of a Bartlett test.
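The Bartlett test (Bartlett 1937) was designed to test for common variances across several Gaussian samples. Thus, it is directly applicable to the log sub-samples. We only give a rough overview of the theory; for a more detailed derivation in contemporary terminology see Jørgensen (1993, pp. 94-96). The test rests on the independent $\chi^2$-distribution of the estimators $\hat\sigma^{2,LN}_\ell$ in $M^{LN}$. Rather than deriving a test in the Gaussian model for $\log(Y_{ij,\ell})$, Bartlett (1937) considers a joint $\chi^2$ model for the variance estimators. In this $\chi^2$ model, the log-likelihood ratio statistic for the hypothesis $H_{\sigma^2}$ is
$$LR^{LN} = df_\cdot \log(\bar\sigma^{2,LN}) - \sum_{\ell=1}^{m} df_\ell \log(\hat\sigma^{2,LN}_\ell) \qquad (3)$$
for $\hat\sigma^{2,LN}_\ell$ and $\bar\sigma^{2,LN}$ as defined in (1) and (2). Under $H_{\sigma^2}$,
$$LR^{LN} \sim Ba(df_1, \dots, df_m), \qquad (4)$$
where $Ba(df_1, \dots, df_m)$ denotes the distribution of the statistic when the variance estimators are replaced by independent $\chi^2_{df_\ell}/df_\ell$ variables. This distribution approaches a $\chi^2_{m-1}$ as each $df_\ell$ goes to infinity. However, Bartlett (1937) goes a step further and suggests dividing $LR^{LN}$ by
$$C = 1 + \frac{1}{3(m-1)} \Big(\sum_{\ell=1}^{m} \frac{1}{df_\ell} - \frac{1}{df_\cdot}\Big).$$
Comparing $LR^{LN}/C$ rather than $LR^{LN}$ to a $\chi^2_{m-1}$ substantially improves the quality of the approximation and makes it useful even in rather small samples. That is, under $H_{\sigma^2}$,
$$LR^{LN}/C \approx \chi^2_{m-1}.$$
The Bartlett correction factor $C$ improves the order of magnitude of the error term. This idea has been shown to apply generally to likelihood ratio tests; see, for instance, Lawley (1956) and Barndorff-Nielsen and Cox (1984).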
While using an asymptotic approximation for the Bartlett test is appealing, we could also simulate critical values of the exact distribution. This is feasible because the exact distribution of $LR^{LN}$, $Ba$, is free of nuisance parameters. However, if $Ba/C$ is sufficiently close to $\chi^2_{m-1}$, simulating the critical values may be unnecessary even for rather small degrees of freedom. Looking ahead, we confirm in a simulation study in Section 6.1 that the asymptotic approximation indeed works very well.
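Because the test needs only the sub-sample residual sums of squares and degrees of freedom, it is quick to compute. The following is a minimal sketch under the standard form of the Bartlett statistic and correction factor; the function names and the input numbers are hypothetical, and the closed-form p-value helper covers only the case of two sub-samples, where the reference distribution is a $\chi^2$ with one degree of freedom.

```python
import math


def bartlett_statistic(rss, df):
    """Return (LR, C, LR / C) from sub-sample residual sums of squares and df."""
    m = len(rss)
    df_dot = sum(df)
    s2 = [r / d for r, d in zip(rss, df)]   # sub-sample variance estimators
    s2_bar = sum(rss) / df_dot              # df-weighted pooled estimator
    lr = df_dot * math.log(s2_bar) - sum(d * math.log(s) for d, s in zip(df, s2))
    c = 1.0 + (sum(1.0 / d for d in df) - 1.0 / df_dot) / (3.0 * (m - 1))
    return lr, c, lr / c


def chi2_sf_1df(x):
    """P(chi-square with 1 degree of freedom > x), valid for two sub-samples only."""
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical inputs: two sub-samples with 26 and 6 degrees of freedom.
lr, c, corrected = bartlett_statistic([2.6, 1.2], [26, 6])
p_value = chi2_sf_1df(corrected)

# With identical variance estimators the statistic is zero.
lr_equal, c_equal, _ = bartlett_statistic([2.6, 0.6], [26, 6])
```

For more than two sub-samples, the corrected statistic would be compared to a $\chi^2_{m-1}$ quantile from a table or a statistics library instead.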
As an alternative to the Bartlett test, we can test the equality of dispersion parameters across two sub-samples with an F-test that is not equivalent to a Bartlett test. The F-test follows quickly given independence and distribution of the log data variance estimators $\hat\sigma^{2,LN}_\ell$ in (1). Under $H_{\sigma^2}$,
$$F^{LN}_{\sigma^2} = \frac{\hat\sigma^{2,LN}_2}{\hat\sigma^{2,LN}_1} \sim F(df_2, df_1),$$
so that we can use a (two-sided) F-test to test the hypothesis; see, for example, Snedecor and Cochran (1967, chp. 4.15). We can write $LR^{LN}$ as a function of $F = F^{LN}_{\sigma^2}$:
$$LR(F) = df_\cdot \log\Big(\frac{df_1 + df_2 F}{df_\cdot}\Big) - df_2 \log(F).$$
This mapping is not monotone. Intuitively, the Bartlett test is one-sided compared to a two-sided F-test. Thus, we would expect $LR$ to be large both for small and for large $F^{LN}_{\sigma^2}$; indeed, $LR(F)$ is zero at $F = 1$ and increases as $F$ moves away from one in either direction. We can now find scenarios in which the F-test and the Bartlett test lead to different decisions: for example, with $df_1 = 1$ and $df_2 = 2$, an equal-tailed 5% F-test just about rejects the null for a draw $F^{LN}_{\sigma^2} = 0.025$, while a 5% Bartlett test does not reject with $LR(0.025) = 4.23$ and a (simulated) exact critical value of 4.91. This leaves the question of which test should be used; we investigate this in Section 6.2.
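The numbers in this example can be reproduced without simulation, because the F-distribution with 2 and 1 degrees of freedom has a closed-form distribution function. The sketch below assumes that the F-statistic is the ratio of the second to the first variance estimator, which is the ordering that reproduces the quoted values; all function names are hypothetical.

```python
import math


def lr_from_f(f, df1, df2):
    """Bartlett LR statistic for two sub-samples as a function of F = s2_2 / s2_1."""
    df_dot = df1 + df2
    return df_dot * math.log((df1 + df2 * f) / df_dot) - df2 * math.log(f)


def cdf_f_2_1(x):
    """Closed-form distribution function of F(2, 1)."""
    return 1.0 - (1.0 + 2.0 * x) ** -0.5


def quantile_f_2_1(prob):
    """Inverse of cdf_f_2_1."""
    return ((1.0 - prob) ** -2 - 1.0) / 2.0


def solve_lr(target, lo, hi, df1=1, df2=2):
    """Bisection for LR(F) = target on an interval where LR is monotone."""
    above_at_lo = lr_from_f(lo, df1, df2) > target
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (lr_from_f(mid, df1, df2) > target) == above_at_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bartlett_tail_prob(target):
    """Exact P(LR > target) under the null for df1 = 1 and df2 = 2."""
    lower = solve_lr(target, 1e-12, 1.0)   # LR decreases on (0, 1)
    upper = solve_lr(target, 1.0, 1e12)    # LR increases on (1, infinity)
    return cdf_f_2_1(lower) + (1.0 - cdf_f_2_1(upper))


f_obs = 0.025
lr_obs = lr_from_f(f_obs, 1, 2)                 # approximately 4.23
f_crit_low = quantile_f_2_1(0.025)              # lower equal-tailed 5% critical value
f_test_rejects = f_obs < f_crit_low
bartlett_rejects = bartlett_tail_prob(lr_obs) < 0.05
```

The two-sided F-test puts 2.5% in each tail of the F-distribution, whereas the Bartlett rejection region splits its 5% unevenly between small and large values of $F$; this is what drives the different decisions in the example.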
Usually, a drawback of both the F-test and the Bartlett test is their sensitivity to departures from Gaussianity of the log data $\log(Y_{ij,\ell})$. Box (1953) goes as far as comparing the Bartlett test to a test for Gaussianity and argues in favor of robust tests, prioritizing robustness over other qualities such as power. However, sensitivity to non-Gaussianity is not necessarily undesirable for an application to insurance claim-reserving, since distribution forecasts of the log-normal model would also be invalid if the data are not log-normal. Besides, we find the F-test and the Bartlett test appealing for their simplicity and because they carry over to over-dispersed Poisson models, as we will see later. Thus, we do not consider methods to improve robustness to departures from Gaussianity such as those proposed by Shoemaker (2003) for F-tests.

Testing for Common Linear Predictors
Now that we know how to test for common variances, we turn to testing for common linear predictors. The idea is to test sequentially: first for common variances, then for common linear predictors. We show how to use an F-test for the latter and prove that this test is independent of the Bartlett and F-tests for common variances. Thus, size control is not an issue.
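If we take the model with common variances $M^{LN}_{\sigma^2}$ as given, then testing for $H_{\mu,\sigma^2}$ amounts to testing for common linear predictors. Since standard Gaussian theory applies,
$$F^{LN}_{\mu} = \frac{(RSS - RSS_\cdot)/(df - df_\cdot)}{RSS_\cdot/df_\cdot} \sim F(df - df_\cdot, df_\cdot)$$
under the hypothesis. Here $RSS$ and $df = n - p$ are the residual sum of squares and degrees of freedom of the model with common linear predictors, while $RSS_\cdot$ and $df_\cdot$ are the corresponding sums over the sub-samples. Thus, we can use a (one-sided) F-test to test for a reduction from $M^{LN}_{\sigma^2}$ to $M^{LN}_{\mu,\sigma^2}$. Unlike the dispersion Bartlett and F-tests, this F-test is equivalent to the corresponding exact Gaussian likelihood ratio test. However, a $\chi^2$ approximation to the likelihood ratio test may not work well due to rather few degrees of freedom. Thus, we prefer the F-test since it is easier to implement.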
A sequential test approach for common variance and common linear predictors is sensible. This is because we can show the tests are independent. We formulate the independence result in a theorem; all proofs are in Appendix A.
Theorem 1. In model $M^{LN}_{\mu,\sigma^2}$, the test statistic $F^{LN}_{\mu}$ is independent of $F^{LN}_{\sigma^2}$ and $LR^{LN}$.
In applications, we would first conduct a, say, 5% Bartlett test for $H_{\sigma^2}$. Conditional on non-rejection of the hypothesis, we can conduct an F-test for $H_{\mu,\sigma^2}$ at 5% critical values and be assured that it truly has a 5% size if the hypothesis is correct.
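The whole sequence for the log-normal model (split, fit by least squares on each part, compute the Bartlett statistic, then the F-statistic for common linear predictors) can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the empirical section: the data are synthetic and seeded, the split by accident year is hypothetical, and the fit uses ad-hoc identification (effects of the first accident and development year set to zero), which leaves fitted values and residual sums of squares unchanged.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(cells, log_y):
    """Least squares fit of an accident plus development year predictor; returns (RSS, df)."""
    rows = sorted({i for i, _ in cells})
    cols = sorted({j for _, j in cells})

    def x(i, j):
        return [1.0] + [float(i == r) for r in rows[1:]] + [float(j == c) for c in cols[1:]]

    p = len(rows) + len(cols) - 1
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for i, j in cells:
        v = x(i, j)
        for a in range(p):
            xty[a] += v[a] * log_y[(i, j)]
            for b in range(p):
                xtx[a][b] += v[a] * v[b]
    coef = solve(xtx, xty)
    rss = sum((log_y[(i, j)] - sum(u * c for u, c in zip(x(i, j), coef))) ** 2
              for i, j in cells)
    return rss, len(cells) - p


# Synthetic log data on a ten-by-ten run-off triangle (illustration only).
random.seed(1)
cells = [(i, j) for i in range(1, 11) for j in range(1, 12 - i)]
log_y = {(i, j): 10.0 + 0.1 * i - 0.2 * j + random.gauss(0.0, 0.3) for i, j in cells}

# Hypothetical split into accident years 1 to 5 and 6 to 10.
subs = [[c for c in cells if c[0] <= 5], [c for c in cells if c[0] > 5]]

rss, df = fit(cells, log_y)                       # common linear predictors
sub_fits = [fit(s, log_y) for s in subs]          # unrestricted model
rss_dot = sum(r for r, _ in sub_fits)
df_dot = sum(d for _, d in sub_fits)

# Bartlett statistic for common variances, then F-statistic for common predictors.
lr = df_dot * math.log(rss_dot / df_dot) - sum(d * math.log(r / d) for r, d in sub_fits)
f_mu = ((rss - rss_dot) / (df - df_dot)) / (rss_dot / df_dot)
```

The statistics `lr` (after dividing by the Bartlett correction factor) and `f_mu` would then be compared to $\chi^2_{1}$ and $F(df - df_\cdot, df_\cdot)$ critical values, respectively; these are not computed here since the Python standard library has no F-distribution.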

Over-Dispersed Poisson
The over-dispersed Poisson model is appealing because it naturally links to the classic chain-ladder technique, unlike the log-normal model. Harnau and Nielsen (2017) developed an asymptotic framework in which the over-dispersed Poisson model is asymptotically Gaussian. Using their results, we show that finite sample results from the log-normal model hold asymptotically in the over-dispersed Poisson model. The structure of this section reflects the similarities between the log-normal and over-dispersed Poisson model. After setting up the model, we specify the estimators; these are based on a Poisson quasi-likelihood, thus replicating the chain-ladder. Before we can proceed, the over-dispersed Poisson model needs another ingredient, a sampling scheme for the asymptotic theory that we take from Harnau and Nielsen (2017). Then, we show that we can test for common over-dispersion with a Bartlett test. Finally, we can use an F-test to test for common mean parameters. We prove that this F-test is asymptotically independent of the over-dispersion test.

Model and Hypotheses
We set up a model that allows over-dispersion and mean parameters to vary across sub-samples, and specify hypotheses for common over-dispersion and common mean parameters. This mirrors the process from the log-normal model. The key assumption of the over-dispersed Poisson model involves infinitely divisible distributions: to justify it, we provide an example that is appealing for insurance claim-reserving.
We adopt the assumptions for the over-dispersed Poisson model from Harnau and Nielsen (2017). One assumption is distributional and allows for an asymptotic theory; the other imposes the desired over-dispersed Poisson chain-ladder structure. Specifically, we assume that aggregate incremental claims $Y_{ij,\ell}$ are independent across $(i, j) \in \mathcal{I}_\ell$ and $\ell \in \{1, \dots, m\}$ with non-degenerate infinitely divisible distribution, at least three moments, and non-negative support. The second assumption imposes a log-linear mean and common over-dispersion within the sub-sample:
$$M^{ODP}: \quad E(Y_{ij,\ell}) = \exp(\mu_{ij,\ell}) = \exp(\alpha_{i,\ell} + \beta_{j,\ell} + \delta_\ell), \qquad \frac{\mathrm{var}(Y_{ij,\ell})}{E(Y_{ij,\ell})} = \sigma^2_\ell$$
for all $(i, j) \in \mathcal{I}_\ell$ and $\ell \in \{1, \dots, m\}$.
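The first hypothesis imposes common over-dispersion parameters across sub-samples. It matches the hypothesis from the log-normal model:
$$H_{\sigma^2}: \quad \sigma^2_\ell = \sigma^2 \quad \text{for all } \ell.$$
The remaining assumptions are maintained. We can write the model under this assumption as
$$M^{ODP}_{\sigma^2}: \quad E(Y_{ij,\ell}) = \exp(\mu_{ij,\ell}), \qquad \frac{\mathrm{var}(Y_{ij,\ell})}{E(Y_{ij,\ell})} = \sigma^2.$$
The second hypothesis again nests the first and imposes common linear predictors. The hypothesis is
$$H_{\mu,\sigma^2}: \quad \mu_{ij,\ell} = \mu_{ij} \ \text{ and } \ \sigma^2_\ell = \sigma^2 \quad \text{for all } \ell.$$
Dropping the superfluous subscript, we write the model under this hypothesis as the familiar
$$M^{ODP}_{\mu,\sigma^2}: \quad E(Y_{ij}) = \exp(\alpha_i + \beta_j + \delta), \qquad \frac{\mathrm{var}(Y_{ij})}{E(Y_{ij})} = \sigma^2.$$
The model under this hypothesis in a run-off triangle replicates the chain-ladder. Thus, $M^{ODP}_{\mu,\sigma^2}$ is the model we would ideally like to use.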
We can motivate the assumption of an over-dispersed infinitely divisible distribution for the aggregate incremental claims by a compound Poisson story. We can think of the aggregate incremental claims $Y$ as a random Poisson number of claims $N$, each with an independent random claim amount $X$, so that $Y = \sum_{s=1}^{N} X_s$ is compound Poisson. Compound Poisson distributions are infinitely divisible. The over-dispersion $\sigma^2$ simplifies to $E(X^2)/E(X)$. Thus, it is common across the data set if the same is true for the claim amount distribution. If the claim amount distribution varies across sub-samples, so does the over-dispersion.
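The simplification of the over-dispersion follows from the standard moment formulas for a compound Poisson sum. Writing $\lambda = E(N)$ for the Poisson mean (a symbol introduced here for the derivation only):

```latex
\begin{align*}
E(Y) &= E(N)\,E(X) = \lambda\, E(X), \\
\mathrm{var}(Y) &= E(N)\,\mathrm{var}(X) + \mathrm{var}(N)\,E(X)^2
                 = \lambda\,\{\mathrm{var}(X) + E(X)^2\} = \lambda\, E(X^2), \\
\sigma^2 &= \frac{\mathrm{var}(Y)}{E(Y)} = \frac{E(X^2)}{E(X)} .
\end{align*}
```

The Poisson mean $\lambda$ cancels, so the over-dispersion depends on the claim amount distribution alone.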

Estimation
With the model and hypotheses in place, we move on to estimation. The estimators match those in Harnau and Nielsen (2017). Means are estimated by Poisson quasi-likelihood, over-dispersion parameters by Poisson log-likelihood ratios. By estimating means by Poisson quasi-likelihood, we match the classic arithmetic chain-ladder forecasts in run-off triangles, as Kremer (1985) showed. Just as for the log-normal model, the theory in this section is invariant to the identification scheme since the statistics are functions of the identified linear predictors. We choose the same identification scheme as in the log-normal model, matching the notation.

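Estimation in Unrestricted Model $M^{ODP}$
We estimate linear predictors by Poisson quasi-likelihood as $\hat\mu^{ODP}_{ij,\ell} = x_{ij,\ell}'\hat\xi^{ODP}_\ell$ where
$$\hat\xi^{ODP}_\ell = \arg\max_{\xi} \sum_{ij \in \mathcal{I}_\ell} \{Y_{ij,\ell}\, x_{ij,\ell}'\xi - \exp(x_{ij,\ell}'\xi)\}.$$
The over-dispersion parameter estimators are Poisson quasi log-likelihood ratios; looking ahead, this is justified by their asymptotic $\chi^2$ distribution. Specifically, the estimator for $\sigma^2_\ell$ is the Poisson deviance divided by the degrees of freedom. The deviance is the log-likelihood ratio against a saturated model with as many parameters as observations and perfect fit. Specifically, for deviance
$$D_\ell = 2 \sum_{ij \in \mathcal{I}_\ell} \Big[ Y_{ij,\ell} \log\Big\{\frac{Y_{ij,\ell}}{\exp(\hat\mu^{ODP}_{ij,\ell})}\Big\} - \{Y_{ij,\ell} - \exp(\hat\mu^{ODP}_{ij,\ell})\} \Big],$$
the estimator for $\sigma^2_\ell$ is $\hat\sigma^{2,ODP}_\ell = D_\ell/df_\ell$.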
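Given fitted means, the deviance and the over-dispersion estimator are straightforward to compute. The following is a minimal sketch of the standard Poisson deviance; the function names and the observed and fitted values are hypothetical, and the quasi-likelihood fit that would produce the fitted means is not shown.

```python
import math


def poisson_deviance(y, fitted):
    """Poisson deviance: twice the log-likelihood ratio against the saturated model."""
    total = 0.0
    for obs, mean in zip(y, fitted):
        # y * log(y / mean) is taken as its limit 0 when y == 0.
        log_term = obs * math.log(obs / mean) if obs > 0 else 0.0
        total += log_term - (obs - mean)
    return 2.0 * total


def overdispersion(y, fitted, n_params):
    """Deviance divided by the degrees of freedom n - p."""
    return poisson_deviance(y, fitted) / (len(y) - n_params)


# Hypothetical observations and fitted means for a tiny illustration.
y = [120.0, 80.0, 95.0, 60.0]
fitted = [115.0, 85.0, 100.0, 55.0]
sigma2_hat = overdispersion(y, fitted, n_params=3)
```

A perfect fit gives a deviance of zero, which is the sense in which the saturated model serves as the benchmark.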

Estimation with Common Variances in $M^{ODP}_{\sigma^2}$
In the model with common variances we can, as in the log-normal model, compute estimators from those for the unrestricted model. Estimators for the linear predictors $\mu_{ij,\ell}$ are unchanged. The estimator for the over-dispersion parameter is the degree-of-freedom weighted average
$$\bar\sigma^{2,ODP} = \frac{D_\cdot}{df_\cdot} = \frac{1}{df_\cdot} \sum_{\ell=1}^{m} df_\ell\, \hat\sigma^{2,ODP}_\ell$$
where $D_\cdot = \sum_{\ell=1}^{m} D_\ell$ and, as before, $df_\cdot = \sum_{\ell=1}^{m} df_\ell$.

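Estimation with Common Variances and Linear Predictors in $M^{ODP}_{\mu,\sigma^2}$
In the model with common linear predictors and over-dispersion parameters, we estimate over the full sample. Dropping the subscript, $\hat\mu^{ODP}_{ij} = x_{ij}'\hat\xi^{ODP}$ where
$$\hat\xi^{ODP} = \arg\max_{\xi} \sum_{ij \in \mathcal{I}} \{Y_{ij}\, x_{ij}'\xi - \exp(x_{ij}'\xi)\}.$$
The over-dispersion is estimated by $\tilde\sigma^{2,ODP} = D/df$, where $D$ is the Poisson deviance computed over the full sample and $df = n - p$.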

Sampling Scheme
The asymptotic theory requires a sampling scheme. The challenge is that the number of observations $n$ grows with the number of parameters: new accident or development years would demand their own parameters. Harnau and Nielsen (2017) circumvent this problem. They propose a sampling scheme that requires the means of the cells in the data set $\mathcal{I}$ to grow proportionally. This is reminiscent of multinomial sampling as used, for example, by Martínez Miranda et al. (2015) in a Poisson model. Crucially, the number of observations $n$, and thus the number of parameters, remains fixed. We adopt their sampling scheme and motivate it by a compound Poisson example.
The sampling scheme stipulates that the aggregate mean $E(Y_{\cdot\cdot}) = E(\sum_{ij \in \mathcal{I}} Y_{ij})$ over the array grows in such a way that the skewness $\mathrm{skew}(Y_{ij,\ell})$ vanishes while keeping the frequencies $E(Y_{ij,\ell})/E(Y_{\cdot\cdot})$ fixed. The requirement on the skewness is somewhat unconventional and is motivated by a limit theorem proved by Harnau and Nielsen (2017, Theorem 1).
For intuitive appeal, the skewness in the compound Poisson example from Section 4.1 vanishes as the expected number of claims grows. More precisely, considering once again aggregate incremental claims $Y = \sum_{s=1}^{N} X_s$ with $N$ being the random Poisson number of claims and $X_s$ the random claim amounts, the skewness of $Y$ vanishes if the mean of the number of claims $N$ grows for a fixed claim amount distribution $X_s$.
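The rate at which the skewness vanishes can be illustrated with the cumulants of a compound Poisson sum, whose $k$-th cumulant is the Poisson mean times $E(X^k)$. The sketch below assumes, purely for illustration, exponentially distributed claim amounts with mean one, so that $E(X^2) = 2$ and $E(X^3) = 6$; the function name is hypothetical.

```python
def compound_poisson_skewness(lam, m2, m3):
    """Skewness of a compound Poisson sum with Poisson mean lam,
    where m2 = E(X^2) and m3 = E(X^3) are claim amount moments."""
    return lam * m3 / (lam * m2) ** 1.5


# Assumed claim amounts: exponential with mean 1, so E(X^2) = 2 and E(X^3) = 6.
m2, m3 = 2.0, 6.0
skew_by_mean = {lam: compound_poisson_skewness(lam, m2, m3)
                for lam in (1.0, 100.0, 10000.0)}
```

The skewness shrinks at rate one over the square root of the Poisson mean, while the over-dispersion $E(X^2)/E(X)$ stays fixed because it does not involve the Poisson mean.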

Asymptotic Testing for Common Over-Dispersion
Having set up the model and sampling scheme, we turn to the asymptotic theory. We show that the asymptotic distributions of the Bartlett test and the two-sample F-test for common over-dispersion match the finite sample distributions of the tests for common log data variance in the log-normal model. We can justify a $\chi^2$ approximation to the distribution of the Bartlett test through a sequential asymptotic argument.
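To test for common over-dispersion across sub-samples in the over-dispersed Poisson model, we can proceed just as in the log-normal model. This is because the asymptotic distribution of $\hat\sigma^{2,ODP}_\ell$ matches the exact distribution of $\hat\sigma^{2,LN}_\ell$ in the log-normal model (Harnau and Nielsen 2017, Lemma 1):
$$\frac{\hat\sigma^{2,ODP}_\ell}{\sigma^2_\ell} \xrightarrow{D} \frac{\chi^2_{df_\ell}}{df_\ell}, \quad \text{independently across } \ell.$$
Therefore, to test $H_{\sigma^2}$, we merely replace the estimators from the log-normal model with the over-dispersion estimators and compute
$$LR^{ODP} = df_\cdot \log(\bar\sigma^{2,ODP}) - \sum_{\ell=1}^{m} df_\ell \log(\hat\sigma^{2,ODP}_\ell) \quad \text{and} \quad F^{ODP}_{\sigma^2} = \frac{\hat\sigma^{2,ODP}_2}{\hat\sigma^{2,ODP}_1}.$$
Since the theory for the variance tests in the log-normal model hinged on the distribution of the log data variance estimators alone, we can immediately jump to the main result of the paper.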
Theorem 2. In the over-dispersed Poisson model with common over-dispersion $M^{ODP}_{\sigma^2}$ of Sections 4.1 and 4.3, $LR^{ODP}$ converges in distribution to the Bartlett distribution $Ba(df_1, \dots, df_m)$ from (4). Further, for two sub-samples, the F-statistic $F^{ODP}_{\sigma^2}$ converges in distribution to $F(df_2, df_1)$.
In Section 6.3 below, we show that finite sample approximations to the asymptotic results in Theorem 2 work well. To make the $\chi^2$ approximation for the Bartlett test work, we can use a sequential asymptotic argument. In the log-normal model, the $\chi^2$ approximation followed through large degree of freedom asymptotics. In the over-dispersed Poisson model, we first let the aggregate mean $E(Y_{\cdot\cdot})$ grow such that $LR^{ODP}$ is asymptotically distributed $Ba$. Then, we can increase the sub-sample dimension and thus the degrees of freedom so that $Ba/C$ becomes $\chi^2$. Then, under $H_{\sigma^2}$, we can expect
$$LR^{ODP}/C \approx \chi^2_{m-1}.$$
A simultaneous double asymptotic theory for large $E(Y_{\cdot\cdot})$ and degrees of freedom would have to wrestle with the complication that the number of mean parameters grows with the dimension of the sub-samples. Hence, such a generalization is by no means trivial, and the simulations in Section 6 make it seem unnecessary.

Asymptotic Testing for Common Linear Predictors
We show how to F-test for common mean parameters. We also prove asymptotic independence of this F-test and tests for common over-dispersion.
As in the log-normal model, we can use a sequential testing strategy, first testing for $H_{\sigma^2}$, then for $H_{\mu,\sigma^2}$. Harnau and Nielsen (2017, Theorem 4) showed that under $H_{\mu,\sigma^2}$, and thus in $M^{ODP}_{\mu,\sigma^2}$, an F-statistic $F^{ODP}_\mu$ built from Poisson quasi-deviances has an asymptotic F-distribution, $F^{ODP}_\mu \xrightarrow{D} F_{df_{\mu,\sigma^2} - df_\cdot,\, df_\cdot}$. Thus, we can use a (one-sided) F-test to test for a reduction from $M^{ODP}_{\sigma^2}$ to $M^{ODP}_{\mu,\sigma^2}$. Compared to the test in the log-normal model, we simply replace the residual sums of squares RSS with Poisson quasi-deviances D. The difference is that the F-distribution is now asymptotic and not exact.
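The mechanics of such an F-test can be sketched in a few lines of Python. The sketch assumes the usual nested-model form of the statistic, that is, the drop in (quasi-)deviance per restriction divided by the unrestricted deviance per degree of freedom; this form, the function names and the deviance values are illustrative assumptions rather than quotations from the paper. To stay within the standard library, the p-value is approximated by Monte Carlo draws from the F-distribution; in practice one would call the F-distribution function of a statistics package.

```python
import random


def f_statistic(dev_restricted, df_restricted, dev_unrestricted, df_unrestricted):
    """Nested-model F-statistic from deviances (or residual sums of squares)."""
    numerator = (dev_restricted - dev_unrestricted) / (df_restricted - df_unrestricted)
    denominator = dev_unrestricted / df_unrestricted
    return numerator / denominator


def f_sf_monte_carlo(f_obs, df_num, df_den, draws=100_000, seed=1):
    """Approximate P(F_{df_num, df_den} > f_obs) by simulation.

    A chi2 variable with k degrees of freedom is Gamma(shape=k/2, scale=2).
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(draws):
        num = rng.gammavariate(df_num / 2.0, 2.0) / df_num
        den = rng.gammavariate(df_den / 2.0, 2.0) / df_den
        if num / den > f_obs:
            exceed += 1
    return exceed / draws


# Hypothetical deviances; 36 and 32 degrees of freedom are our own reading of a
# ten-by-ten triangle split into sub-samples with 26 and 6 degrees of freedom.
f_obs = f_statistic(120.0, 36, 100.0, 32)
print(f_obs, f_sf_monte_carlo(f_obs, 36 - 32, 32))
```

Because the test is one-sided, only the upper tail probability is needed.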
To justify a sequential testing approach, it is useful to show that the test is independent of the Bartlett and F-test for common dispersion, just as it was for the log-normal model. Lemma 1 establishes this independence asymptotically. Therefore, under $H_{\mu,\sigma^2}$, the distribution of $F^{ODP}_\mu$ is asymptotically unaffected by conditioning on non-rejection of tests for common over-dispersion. We confirm in simulations below that this result holds approximately in finite samples. Hence, size control is not an issue in sequential testing, just as for the log-normal model.

Empirical Applications
To illustrate implementation of the theory, we take it to the data. A run-off triangle first analyzed by Verrall et al. (2010) is appealing for a log-normal application: Kuang et al. (2015) raised the question of misspecification for this model on this data. As an over-dispersed Poisson example, we chose the data set by Taylor and Ashe (1983) in Table 1, which has become a sort of benchmark data set for this model. Verrall (1991), England and Verrall (1999), and Pinheiro et al. (2003) all use this data, to name but a few. Finally, the data by Barnett and Zehnwirth (2000) seem to require a calendar effect for modeling; we take this opportunity to demonstrate that we can easily test for specification in a model with an extended chain-ladder structure that includes a calendar effect. We use the R (R Core Team 2016) package apc (Nielsen 2015) for the empirical applications and simulations below.

Log-Normal Chain-Ladder
Kuang et al. (2015) employ a log-normal chain-ladder model for data in a run-off triangle first analyzed by Verrall et al. (2010). They remark that the largest residuals congregate within the first five accident years, indicating a potential misspecification. Verrall et al. (2010) used the data to illustrate a model that makes use of the number of reported claims that is also available; we do not make use of this information. The data relate to a portfolio of motor policies from the insurer Royal & Sun Alliance. We show this triangle in Table A1.
We take the remarks about misspecification by Kuang et al. (2015) as an opportunity to apply the specification tests for common log data variance and mean parameters. To do so, we first specify the sub-samples. Then, we set up the unrestricted model and test the hypotheses. Figure 2 summarizes the results.
Figure 2a shows how we split the data $I$, a run-off triangle with ten accident and development years. We split into two sub-samples: $I_1$ contains the first five and $I_2$ the last five accident years. Choosing this specific structure seems intuitive given the remarks by Kuang et al. (2015) about the location of large residuals. Given the sub-samples, we specify the unrestricted independent log-normal model $M^{LN}$.
We first consider the hypothesis $H_{\sigma^2}: \sigma^2_1 = \sigma^2_2$ for a reduction to $M^{LN}_{\sigma^2}$. Figure 2b shows the relevant estimates and test results. Since we have just two sub-samples, we can test the hypothesis either with a Bartlett test or an F-test for common variances. The two tests give a rather similar indication. The Bartlett statistic $B^{LN}$ has a $\chi^2$ p-value of 0.09 and the F-statistic $F^{LN}_{\sigma^2}$ a two-sided F p-value of 0.12. If we take the variance test results as an indication not to reject $H_{\sigma^2}$, we can take $M^{LN}_{\sigma^2}$ as our primary model and test for $H_{\mu,\sigma^2}$. That is, we test for a reduction to $M^{LN}_{\mu,\sigma^2}$. Based on the F-statistic $F^{LN}_\mu$, we cannot reject this hypothesis with a p-value of 0.91. Thus, we do not find compelling evidence against a reduction to $M^{LN}_{\mu,\sigma^2}$.
Alternatively, we could make use of the information that there is not just a discrepancy between the sub-samples when it comes to residuals, but that those in $I_1$ are larger. With this information, we could instead have conducted a one-sided F-test for the one-sided hypothesis $H_{\sigma^2}: \sigma^2_1 > \sigma^2_2$. This test yields a p-value of 0.06, a much closer call. Note that we cannot evaluate one-sided hypotheses with a Bartlett test.

Over-Dispersed Poisson Chain-Ladder
The Taylor and Ashe (1983) data in Table 1 have served many times as an empirical application for over-dispersed Poisson chain-ladder models. Thus, it seems only appropriate to investigate the model specification. We summarize results in Figure 3. Figure 3a shows the chosen sub-sample structure. We split the sample after the fifth accident, development, and calendar year into four sub-samples. Unlike in the case of the Verrall et al. (2010) data above, we do not have information indicating a specific sub-sample structure. While arbitrary, we find the chosen structure appealing because all sub-samples are run-off triangles themselves and of relatively similar size. Further, we hope that splits after each of the three time-scales increase our chances to find breaks. We point out that the specific sub-sample structure has no effect on the size of the tests if the hypothesis is true.
Figure 3b shows estimates and test results. The unrestricted model is the over-dispersed Poisson model $M^{ODP}$ discussed in Section 4.1. Looking at evidence for varying over-dispersion, we test for $H_{\sigma^2}$ with a Bartlett test. While we can see quite a bit of variation in the dispersion estimates, ranging from $\hat\sigma^{2,ODP}_4$ = 17,592 to $\hat\sigma^{2,ODP}_2$ = 168,293, the test does not convincingly reject the hypothesis with a p-value of 0.08. Even though relative deviations from the degree of freedom weighted average $\hat\sigma^{2,ODP}$ = 68,038 are less stark, it seems to us that making a decision by eyeballing alone would be difficult in this case.
If the Bartlett test results convince us that a reduction to $M^{ODP}_{\sigma^2}$ is sensible, we can test for common linear predictors. Given an F-statistic of $F^{ODP}_\mu$ = 0.46, we cannot reject this simplification with a p-value of 0.93.
Overall, the target over-dispersed Poisson model for the Taylor and Ashe (1983) data survives both misspecification tests at a 5% level for this sub-sample structure. Thus, we may be more confident now to model it with an over-dispersed Poisson chain-ladder model.
We could also opt to repeat the test for other sub-sample structures, adjusting the size to take into account that tests for different sub-sample structures on the same data are generally not independent. For example, we can retest for the split into two sub-samples considered above and shown in Figure 1a. For this structure, a Bartlett test statistic of $B^{ODP}$ = 2.89 yields a p-value of 0.09 and an F-test statistic of $F^{ODP}_\mu$ = 0.63 a p-value of 0.64. Further, we can test for a split into three sub-samples after calendar years four and seven, similar to the structure in Figure 1b. For this structure, we get $B^{ODP}$ = 1.27 with a p-value of 0.53 and $F^{ODP}_\mu$ = 1.84 with a p-value of 0.11. Controlling the overall size of the thrice repeated sequential tests with a Bonferroni correction, we would reject if any p-value was below 5%/3 ≈ 0.017. This is not the case, so the model survives this battery of tests as well.
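The Bonferroni decision rule for the repeated tests is simple arithmetic, which the following Python sketch spells out using the p-values reported above for the Taylor and Ashe (1983) data. The function name and the dictionary layout are illustrative choices.

```python
def bonferroni_decision(p_values_by_structure, overall_size=0.05):
    """Reject if any p-value falls below overall_size / number of structures."""
    threshold = overall_size / len(p_values_by_structure)
    flagged = {
        name: [p for p in p_values if p < threshold]
        for name, p_values in p_values_by_structure.items()
    }
    reject = any(flagged.values())
    return threshold, reject, flagged


# (Bartlett p-value, F-test p-value) for each sub-sample structure in the text.
taylor_ashe = {
    "four sub-samples": (0.08, 0.93),
    "two sub-samples": (0.09, 0.64),
    "three sub-samples": (0.53, 0.11),
}
threshold, reject, flagged = bonferroni_decision(taylor_ashe)
print(round(threshold, 4), reject)
```

Dividing the overall size by the number of sub-sample structures, rather than by the number of individual tests, mirrors the 5%/3 threshold used in the text.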

Log-Normal (Extended) Chain-Ladder
As a final empirical application, we look at a run-off triangle first considered by Barnett and Zehnwirth (2000). We show this data in Table A2. These data are known to be modeled best with a predictor with not just accident and development, but also calendar effects. We look at a model with and without calendar effects. Barnett and Zehnwirth (2000) and also Kuang et al. (2011) consider a log-normal model for this data and we follow them in this choice. As before, we split the data, specify the model, and test for the hypotheses. The results are summarized in Figure 4. Figure 4a shows the sub-sample structure we choose. Given the apparent need for calendar effects, we aim to maximize power for varying dispersion parameters along the same time dimension and split the run-off triangle, this time with eleven accident and development years, after periods five and eight into three sub-samples.
The top of Figure 4b shows estimation and test results for a model without calendar effect, the unrestricted log-normal chain-ladder model $M^{LN}$. A Bartlett test for the hypothesis $H_{\sigma^2}$ of common log data variances has a $\chi^2$ p-value of (just under) 0.05. We may consider this as evidence against $H_{\sigma^2}$. For comparison with the model with calendar effect considered next, we still compute an F-test for the hypothesis $H_{\mu,\sigma^2}$. We point out that this test is not strictly a test for common linear predictors if we are not comfortable accepting $M^{LN}_{\sigma^2}$ as a model. The statistic $F^{LN}_\mu$ = 11.20 has a 0.00 p-value so that we reject $H_{\mu,\sigma^2}$. Thus, $M^{LN}_{\mu,\sigma^2}$ is not well specified.
At the bottom of Figure 4b we show results for a model with calendar effects $\gamma_k$ for calendar years $k = i + j - 1$, the extended chain-ladder model $M^{LNe}$. The theory for specification tests is not affected by this change and thus still valid. A Bartlett test for $H_{\sigma^2}$ in this model yields a $\chi^2$ p-value of 0.36, so we may feel comfortable imposing common log data variances and taking $M^{LNe}_{\sigma^2}$ as given. An F-test for common linear predictors leaves us with a p-value of 0.41. Thus, reducing the model to $M^{LNe}_{\mu,\sigma^2}$ seems sensible. Therefore, we cannot reject the specification of the model with calendar effect.
If we directly compare the two models, we can see that the calendar effect has a substantial impact on the specification tests. While the model with calendar effect seems to be well specified, the model without this effect raises red flags for both a test for common variances and common linear predictors. The test for common linear predictors is much more strongly affected by dropping the calendar effect than the Bartlett test. This indicates that the shift in log data variances is smaller than that in linear predictors.
We look at the shift in linear predictor in two ways. First, we can directly test for dropping the calendar effects from the well specified $M^{LNe}_{\mu,\sigma^2}$. A standard F-test for the hypothesis $H_\gamma: \gamma_k = 0\ \forall k$ yields a p-value of 0.00, consistent with the rejection of the model without calendar effects $M^{LN}_{\mu,\sigma^2}$ above. Alternatively, we can test for a reduction from $M^{LNe}_{\sigma^2}$, corresponding to the hypothesis $H_{\gamma_{k,\ell}}: \gamma_{k,\ell} = 0\ \forall k, \ell$. This reduction allows for breaks in linear predictors between sub-samples. Interestingly, an F-test cannot reject $H_{\gamma_{k,\ell}}$ (p-value 0.92). As an intuition, we recall that the chain-ladder predictor without calendar effects can accommodate a constant trend in calendar years, but not deviations from that trend. Thus, allowing for separate sets of linear predictors on the sub-samples implicitly allows for three different calendar trends. While still less flexible than the model with an effect for each calendar year, this seems to be good enough. Note, however, that the Bartlett test flags the reduction from $M^{LN}$ to $M^{LN}_{\sigma^2}$ (but not from $M^{LNe}$ to $M^{LNe}_{\sigma^2}$). Overall, the analysis suggests that, for this sub-sample structure, calendar effects are needed in this data set for two reasons: to capture the structure of the linear predictors themselves and, to a lesser extent, to achieve homogeneous variance across the log data.
We note that for this data, repeating the tests for different sub-sample structures does affect the results. Indeed, considering sub-samples similar to before, the specification of the log-normal extended chain-ladder model is rejected. Specifically, splitting the data into two sub-samples after the fifth accident year, a Bartlett test yields a p-value of 0.017 and an F-test a p-value of 0.004. Considering four sub-samples with splits after the fifth calendar year, the fifth development year and the sixth accident year, the p-value of the Bartlett test is 0.03 and that of the F-test 0.05. Again controlling the size of the repeated tests with a Bonferroni correction, we would reject the null hypothesis if we can find a p-value below about 0.017. This is the case for the F-test and a knife-edge decision for the Bartlett test in the two sub-sample scenario. Thus, for this data we may want to consider a different model or at least be somewhat more skeptical of its results.

Simulations
The developed theory raises several questions that we answer in a simulation study. First, we argued that we can sidestep simulating critical values of the Bartlett distribution $Ba$ and instead approximate these by a Bartlett corrected $\chi^2$ critical value. We show that this works very well. Second, we compute power curves of Bartlett and F-test for common log data variances under several alternatives in a log-normal model to get a better understanding of the tests' behavior. Third, we show that the finite sample distribution of the tests in an over-dispersed Poisson model resembles the asymptotic distribution closely, both under the null and the considered alternatives. Finally, we derived above that F-tests for common linear predictors in the over-dispersed Poisson model are asymptotically independent of tests for common over-dispersion. We confirm that the size of the former test seems unaffected by conditioning on the results of the latter, even in finite samples.

Performance of Bartlett Test $\chi^2$ Approximation
The theory tells us that the distribution of $Ba/C$, which is the exact distribution of the Bartlett statistic $B^{LN}$ in the log-normal model, is close to a $\chi^2$ for large degrees of freedom. We show that the approximation works very well for a range of degrees of freedom.
We draw realizations from the adjusted Bartlett distribution $Ba(df_1, \dots, df_m)/C$ as follows. For $\ell = 1, \dots, m$, we draw independent $\chi^2$ distributed $V_\ell$ with $df_\ell$ degrees of freedom and compute $s_\ell = V_\ell/df_\ell$ and $\bar{s} = \sum_{\ell=1}^m (df_\ell/df_\cdot)\, s_\ell$. Then, $\{df_\cdot \log(\bar{s}) - \sum_{\ell=1}^m df_\ell \log(s_\ell)\}/C$ is $Ba/C$ distributed. Figure 5a shows the upper 10% probability spectrum of a pp-plot for the adjusted Bartlett distribution $Ba(df_1, \dots, df_m)/C$ against a $\chi^2_{m-1}$. We show plots for the tuples (26, 6), (3, 5, 8), (6, 6, 9), and (6, 3, 6, 6) encountered in the empirical applications above. The plots are based on $10^7$ draws for each tuple. The plots seem indistinguishable from the 45-degree line, even though we zoomed in to the upper 10% of the spectrum. Figure 5b is constructed in the same way as Figure 5a, except the degrees of freedom are halved and rounded down. Now, we can see some deviations from the 45-degree line. As expected, we can see convergence to the 45-degree line as the degrees of freedom increase.
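The sampling recipe above translates almost line by line into code. The Python sketch below draws from $Ba/C$ and estimates the tail probability beyond a $\chi^2$ critical value. It assumes the textbook Bartlett correction factor for $C$, uses far fewer draws than the $10^7$ reported in the text, and hard-codes the 5% critical value 3.841 of a $\chi^2_1$; function names are hypothetical.

```python
import math
import random


def draw_adjusted_bartlett(df, rng):
    """One draw from Ba(df_1, ..., df_m) / C following the recipe in the text."""
    m = len(df)
    df_tot = sum(df)
    # chi2 with d degrees of freedom is Gamma(shape=d/2, scale=2); s = V / d.
    s = [rng.gammavariate(d / 2.0, 2.0) / d for d in df]
    s_bar = sum(d * s_l for d, s_l in zip(df, s)) / df_tot
    lr = df_tot * math.log(s_bar) - sum(d * math.log(s_l) for d, s_l in zip(df, s))
    # Textbook Bartlett correction factor (assumed form of C).
    c = 1.0 + (sum(1.0 / d for d in df) - 1.0 / df_tot) / (3.0 * (m - 1))
    return lr / c


def tail_probability(df, critical_value, draws=100_000, seed=1):
    """Monte Carlo estimate of P(Ba/C > critical_value)."""
    rng = random.Random(seed)
    hits = sum(draw_adjusted_bartlett(df, rng) > critical_value for _ in range(draws))
    return hits / draws


# Degrees of freedom (26, 6) from the two sub-sample application;
# 3.841 is the 5% critical value of a chi2 with one degree of freedom.
print(tail_probability((26, 6), 3.841))
```

If the $\chi^2$ approximation is good, the printed frequency should sit close to the nominal 5%.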
In Table 2, we take a closer look at the approximation at the $\alpha$ = 1%, 5%, 10% critical values $c_\alpha$ of a $\chi^2_{m-1}$ specifically. The table shows $P(Ba/C > c_\alpha)$, corresponding to the true size of a Bartlett test in a log-normal model if we use the $\chi^2$ approximation rather than simulated critical values. While we can see some differences for some of the halved degrees of freedom, we would argue that the approximation for the degree of freedom tuples from the empirical applications is so good that using it is reasonable and should not affect the modeling decision.

Rejection Frequencies of Tests for Common Variance in Log-Normal Model
As a supplement to the behavior of the tests for common log data variance under the null hypothesis in the log-normal model given in Section 3.3, we now also take a look at power. We simulate the three sub-sample structures from the empirical applications and consider rejection frequencies of the tests used in the corresponding applications. We find that the Bartlett and F-test for common variance have very similar power, at least in this simulation. Further, we see that the power does not necessarily decrease with the number of sub-samples.
For the sub-sample structures from the empirical applications (see Figure 1), we simulate from the unrestricted log-normal model $M^{LN}$. Thus, we simulate for $m = 2, 3, 4$ sub-samples. Before specifying the parameter values, we point out that the distribution of the tests in this model depends only on the ratios $\sigma^2_s/\sigma^2_t$, the degrees of freedom $df_\ell$, and the number of sub-samples $m$. To see this, we first re-write $LR^{LN}$ in terms of the residual sums of squares $RSS_\ell$. Now, under $M^{LN}$, $RSS_\ell \overset{D}{=} \sigma^2_\ell \chi^2_{df_\ell}$ independently across sub-samples. Thus, the distribution of $LR^{LN}$ is invariant to common changes in levels of $\sigma^2_\ell$ as well as to $\mu_{ij,\ell}$. Therefore, we can normalize the smallest $\sigma^2_\ell$ to unity and set $\mu_{ij,\ell} = 0$ without loss of generality. The distribution of the F-statistic $F^{LN}_{\sigma^2}$ shares these properties.
For each sub-sample scenario, we consider a range of values for the log data variance ratios $\sigma^2_s/\sigma^2_t$. For $m > 2$ sub-samples there is more than one ratio, so we cannot effectively visualize all combinations. We thus consider the following special case. For each sub-sample structure, we compute the spacing of the estimates from the corresponding empirical application. That is, we order the empirical estimates $\hat\sigma^2_{(1)} < \dots < \hat\sigma^2_{(m)}$ and compute the $m$ spacing-coefficients $x_\ell = (\hat\sigma^2_\ell - \hat\sigma^2_{(1)})/(\hat\sigma^2_{(m)} - \hat\sigma^2_{(1)})$. We note that $x_{(1)} = 0$ and $x_{(m)} = 1$. The spacings $(x_1, \dots, x_m)$ in the empirical examples are (1, 0) (Verrall et al. 2010), (0, 0.76, 1) (Barnett and Zehnwirth 2000), and (0.09, 1, 0.58, 0) (Taylor and Ashe 1983). The log data variance for the $\ell$-th subset is then obtained by applying the same spacing to the simulated range, $\sigma^2_\ell = \sigma^2_{(1)} + x_\ell\,(\sigma^2_{(m)} - \sigma^2_{(1)})$.
To trace out power curves, we vary the largest ratio $\sigma^2_{(m)}/\sigma^2_{(1)}$ from one, corresponding to $H_{\sigma^2}: \sigma^2_\ell = \sigma^2$, to twenty in 0.5 increments. As noted above, we can set $\sigma^2_{(1)} = 1$ without loss of generality. For each degree of freedom scenario and for each ratio $\sigma^2_{(m)}/\sigma^2_{(1)}$, we draw $10^6$ sub-samples. For each draw, we compute the test statistics used in the corresponding empirical application, $F^{LN}_{\sigma^2}$ as in (6) and $B^{LN}$ as in (5). We note that for $m = 3$, we compute only the Bartlett test statistic for the model with calendar effect $B^{LNe}$ to make the plot less cluttered. Thus, the degrees of freedom for $\hat\sigma^{2,LN}_\ell$ in the three scenarios are (26, 6), (3, 5, 8) and (6, 3, 6, 6). We use $\chi^2$ critical values for the Bartlett tests.
Figure 6 shows rejection frequencies at 5% critical values. We can see that all tests have the right size under $H_{\sigma^2}$, that is for $\sigma^2_{(m)}/\sigma^2_{(1)} = 1$. The power of the two-sided F-test and the Bartlett test in the two sub-sample scenario is very similar, with a slight advantage for the Bartlett test. Thus, the choice between the two tests may mostly depend on taste. Comparing Bartlett tests across scenarios, we see that the power for $m = 4$ sub-samples is larger than that for $m = 3$ sub-samples. Thus, fewer sub-samples do not necessarily imply higher power. Intuition comes from the degree of freedom weighting. For $m = 3$ sub-samples, if we drop the variance with the smallest degree of freedom, the larger two variances are relatively homogeneous. Meanwhile, for $m = 4$ sub-samples there is still plenty of variation left among the largest three variances. Thus, since the test attributes more weight to the better estimates with higher degrees of freedom, the scenario with $m = 3$ sub-samples is a rather tough case.
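A scaled-down version of this power simulation can be sketched in Python as follows. The sketch covers only the Bartlett test in the two sub-sample scenario with degrees of freedom (26, 6) and spacing (1, 0). It assumes that the variances are set as $\sigma^2_\ell = 1 + x_\ell\,(\text{ratio} - 1)$ with the smallest variance normalized to one, uses the textbook Bartlett correction factor, and draws far fewer samples than the $10^6$ in the text; all names are hypothetical.

```python
import math
import random


def bartlett_statistic(s2, df):
    """Bartlett-corrected statistic LR / C (textbook correction factor assumed)."""
    m, df_tot = len(df), sum(df)
    pooled = sum(d * s for d, s in zip(df, s2)) / df_tot
    lr = df_tot * math.log(pooled) - sum(d * math.log(s) for d, s in zip(df, s2))
    c = 1.0 + (sum(1.0 / d for d in df) - 1.0 / df_tot) / (3.0 * (m - 1))
    return lr / c


def rejection_frequency(df, spacing, ratio, critical_value, draws=20_000, seed=1):
    """Share of simulated samples in which the Bartlett test rejects."""
    rng = random.Random(seed)
    # Assumed spacing rule, with the smallest variance normalized to one.
    sigma2 = [1.0 + x * (ratio - 1.0) for x in spacing]
    rejections = 0
    for _ in range(draws):
        # Variance estimates are distributed sigma2 * chi2_df / df.
        s2 = [v * rng.gammavariate(d / 2.0, 2.0) / d for v, d in zip(sigma2, df)]
        if bartlett_statistic(s2, df) > critical_value:
            rejections += 1
    return rejections / draws


CHI2_1_CRIT_5PCT = 3.841  # 5% critical value of a chi2 with one degree of freedom
for ratio in (1.0, 5.0, 10.0, 20.0):
    freq = rejection_frequency((26, 6), (1.0, 0.0), ratio, CHI2_1_CRIT_5PCT)
    print(ratio, freq)
```

At a ratio of one the frequency estimates the size of the test; for larger ratios it traces out points on the power curve.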
We indicated the $\sigma^2_{(m)}/\sigma^2_{(1)}$ ratios we found in the individual empirical applications by vertical lines. We recall that the spacing of intermediate variances is taken from the empirical applications. Therefore, suppose that the empirical estimates are the truth such that $H_{\sigma^2}$ is violated. Then we can read off the power against this scenario directly from the plot. For example, in the application to the Verrall et al. (2010) data, the F-test would have a power of about 35% while the Bartlett test power would be closer to 40%.

Performance of Over-Dispersed Poisson Model Asymptotics
The theoretical results for the over-dispersed Poisson model are asymptotic, rather than exact as in the log-normal model. We show that an asymptotic approximation works well. Tests for common over-dispersion have the right size under the null. The power under the alternative in finite samples is close to the asymptotic power. Further, F-tests for common linear predictors conditional on non-rejection of over-dispersion tests are very close to F distributed in finite samples.

Rejection Frequencies of Tests for Common Over-Dispersion
We can use the rejection frequencies from the log-normal simulations as a benchmark for those in the over-dispersed Poisson model. To see this, we recall that as the overall mean $E(Y_{\cdot\cdot}) \to \infty$, the over-dispersion estimator $\hat\sigma^{2,ODP}_\ell \xrightarrow{D} \sigma^2_\ell \chi^2_{df_\ell}/df_\ell$ in the over-dispersed Poisson model $M^{ODP}$. This matches the exact distribution of $\hat\sigma^{2,LN}_\ell$ in $M^{LN}$. Thus, asymptotically, the distributions of $LR^{ODP}$ in $M^{ODP}$ and $LR^{LN}$ in $M^{LN}$ are identical for identical ratios $\sigma^2_s/\sigma^2_t$. The same holds for $F^{ODP}_{\sigma^2}$ and $F^{LN}_{\sigma^2}$. We simulate for the same three sub-sample structures as in the log-normal simulations. For the simulation design, we set up an unrestricted model $M^{ODP}$ that satisfies the assumptions in Section 4.1.
For the distribution of the cells we choose compound Poisson-gamma so $Y_{ij,\ell} = \sum_{s=1}^{C_{ij,\ell}} X_{s,\ell}$, where $C_{ij,\ell} \overset{D}{=} \text{Poisson}\{\exp(\mu_{ij,\ell})\}$ is independent of the i.i.d. gamma distributed $X_{s,\ell}$ with scale $\sigma^2_\ell - 1$ and shape $(\sigma^2_\ell - 1)^{-1}$. We note that the parametrization for the linear predictors $\mu_{ij,\ell}$ and the level of the over-dispersion $\sigma^2_\ell$ matters in finite samples. This is in contrast to the log-normal model. The reason is that the finite sample distribution of $\hat\sigma^{2,ODP}_\ell$ in $M^{ODP}$ is generally not $\sigma^2_\ell \chi^2_{df_\ell}/df_\ell$. Thus, for each considered scenario, we set the linear predictors $\mu_{ij,\ell}$ to the estimates $\hat\mu^{ODP}_{ij,\ell}$ from the data in the corresponding empirical application. Similarly, we set the smallest over-dispersion $\sigma^2_{(1)} = \hat\sigma^{2,ODP}$.
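To see why this parametrization delivers a variance to mean ratio of $\sigma^2$, note that the gamma variables have mean one and variance $\sigma^2 - 1$, so a cell has mean $\exp(\mu)$ and variance $\exp(\mu)\{(\sigma^2 - 1) + 1\} = \sigma^2 \exp(\mu)$. The Python sketch below draws such cells and checks both moments empirically. The parameter values are small illustrative choices, not the estimates used in the paper, and the simple Poisson sampler is only suitable for small means; function names are hypothetical.

```python
import math
import random


def draw_poisson(lam, rng):
    """Poisson draw via Knuth's multiplication method (small lam only)."""
    limit = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def draw_cell(mu, sigma2, rng):
    """Compound Poisson-gamma cell with mean exp(mu) and dispersion sigma2 > 1."""
    count = draw_poisson(math.exp(mu), rng)
    if count == 0:
        return 0.0
    shape = 1.0 / (sigma2 - 1.0)
    scale = sigma2 - 1.0
    # The sum of `count` i.i.d. Gamma(shape, scale) draws is Gamma(count * shape, scale).
    return rng.gammavariate(count * shape, scale)


rng = random.Random(1)
mu, sigma2 = 2.0, 3.0  # illustrative values only
cells = [draw_cell(mu, sigma2, rng) for _ in range(50_000)]
mean = sum(cells) / len(cells)
variance = sum((y - mean) ** 2 for y in cells) / (len(cells) - 1)
print(round(mean, 2), round(variance / mean, 2))
```

The sample mean should be close to $\exp(2) \approx 7.39$ and the variance to mean ratio close to three.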
We again vary the ratios $\sigma^2_{(m)}/\sigma^2_{(1)}$ from one to twenty, using the exact same spacing $x_\ell$ from (13) in the log-normal simulations to set the over-dispersion parameters $\sigma^2_\ell$.
Figure 7a shows the rejection frequencies at 5% critical values for the four test statistics from the empirical applications as in the log-normal model but now computed based on $\hat\sigma^{2,ODP}_\ell$.

Independence of Test for Common Linear Predictors
We move on to evaluate the quality of a finite sample approximation to the asymptotic independence in Lemma 1. Specifically, we consider the finite sample distribution of $F^{ODP}_\mu$ as in (11) given that a test for common over-dispersion did not reject. Arguably, this is the most interesting case since it matches the natural order of the two specification tests.
We simulate under the null $H_{\mu,\sigma^2}$, that is for a model with common linear predictors and over-dispersion $M^{ODP}_{\mu,\sigma^2}$. As before, cells $Y_{ij}$ are compound Poisson-gamma. We consider three scenarios, setting the parameters to the estimates for $M^{ODP}_{\mu,\sigma^2}$ in the three empirical examples. We draw $10^6$ triangles per scenario.
For each draw, we compute tests based on the sub-sample structure of the corresponding empirical application. We first conduct a Bartlett test for $H_{\sigma^2}$ at 5% critical values. If we do not reject $H_{\sigma^2}$ based on this test, we keep the triangle; otherwise we throw it out. Since we simulate under the null hypothesis, we thus keep about 95% of the draws. Only for the draws we keep do we compute the F-statistic for common linear predictors $F^{ODP}_\mu$.
Figure 8 shows a pp-plot for $F^{ODP}_\mu$ against $F_{df_{\mu,\sigma^2} - df_\cdot,\, df_\cdot}$ for the triangles that survived Bartlett testing. To be able to tell a difference from the 45-degree line, we limit our attention to the upper 10% of the probability spectrum. This is also the most interesting range for testing. Even in this spectrum, each plot is very close to the 45-degree line. Therefore, under $H_{\mu,\sigma^2}$, we can be reassured that an F-test for common linear predictors has the correct size in finite samples even if we apply it only conditionally on non-rejection of a test for common over-dispersion.

Remark
We note that all simulations are for tests that consider the correct sub-sample structure under the alternative. Of course, this does not seem realistic in applications. However, for tests computed on a given sub-sample structure, it appears we would generally be able to choose a true, different, sub-sample structure against which the tests would at best have limited power. For example, say we compute the tests on the two sub-samples with a split after the fifth accident year in Figure 1a while really there are three sub-samples with an additional split after the fifth development year. Then, we could choose parameterizations for the three true sub-samples to balance out the variation between the two incorrectly chosen sub-samples, thus minimizing power. Therefore, it seems to us that such simulation results would be almost entirely driven by our chosen parametrization and provide little insight beyond that. We believe the real answer to this problem must come from a theory that is agnostic to the sub-sample structure as discussed below. However, we stress again that the size of the tests under the null hypothesis is not affected by the chosen sub-sample structure.

Discussion
Some questions are left open for future research. For example, it is not clear how to best choose the sub-sample structure and the number of sub-samples. Further, the question arises whether we can somehow select between the over-dispersed Poisson and log-normal model. Finally, a misspecification test for independence of the cells would be a useful addition to the modeling toolkit.
So far, we chose the sub-sample structures somewhat arbitrarily, if potentially informed by prior knowledge of the data. While the size of the tests under the null is not affected by the sub-sample structure, the power of the tests under the alternative is affected both by the chosen number of sub-samples and their structure. In applications, the expert may consider choosing a range of sub-sample structures and conducting tests for each, adjusting the size based on the number of tests to account for multiple testing as shown in the empirical applications. For future research, it would be useful to derive a theory that is agnostic to the number of sub-samples and their structure while still directly controlling size. It might be fruitful to look for ideas in time-series econometrics, which has been concerned with tests for parameter breaks for a long time. In this literature, Chow (1960) proposed a test for parameter breaks that required knowledge of the breakpoint. By now, there are several tests available that are agnostic with respect to the number of breaks, related to the number of sub-samples in our problem, and the position of breaks, akin to the sub-sample structure. Examples include Andrews' test (Andrews 1993), generalizations of Chow tests (Nielsen and Whitby 2015), and indicator saturation (Hendry 1999). However, these tests are designed for data with a single time-scale and results are generally based on long time-series. In contrast, we are confronted with data with three interlinked time-scales, and the arrays are often small with a large number of parameters that grows with the array size. Thus, the known results do not carry over and it appears that a new theory is needed.
Since we have seen two models in this paper, log-normal and over-dispersed Poisson, a natural question is when we should choose which model. As we have seen, the log-normal model assumes a fixed standard deviation to mean ratio while the over-dispersed Poisson model considers the variance to mean ratio to be fixed. Making use of recent results for generalized log-normal models by Kuang and Nielsen (2018), a class of models that includes the log-normal but is more general, Harnau (2018) proposes a test to distinguish between (generalized) log-normal and over-dispersed Poisson models based on this discrepancy.
Finally, a misspecification test for the assumption that the cells in the array are independent would be useful. This is an assumption that both the log-normal and the over-dispersed Poisson model impose. In contrast, the "distribution free" model by Mack (1993) relaxes this somewhat, assuming independence only across accident years.

Figure 1. Examples for splits of run-off triangles into two (a), three (b) and four (c) sub-samples. Sub-samples are denoted by $I_\ell$. Accident years $i$ are in the rows, development years $j$ in the columns.
), respectively. Define now the Bartlett distribution $Ba(\cdot)$ such that $LR^{LN} \overset{D}{=} Ba(df_1, \dots, df_m)$ under the hypothesis. Considering $LR^{LN}$ as a function of the estimators so $LR^{LN} = LR(\hat\sigma^{2,LN}_1, \dots, \hat\sigma^{2,LN}_m)$, the Bartlett distribution is characterized by

Lemma 1. In the over-dispersed Poisson model $M^{ODP}_{\mu,\sigma^2}$ of Sections 4.1 and 4

Figure 2. Log-normal chain-ladder model for Verrall et al. (2010) data. Sub-sample structure shown in (a), estimation and test results in (b).

Figure 3. Over-dispersed Poisson chain-ladder model for Taylor and Ashe (1983) data. Sub-sample structure shown in (a), estimation and test results in (b).

Figure 4. Log-normal chain-ladder (LN) and extended chain-ladder (LNe) model for Barnett and Zehnwirth (2000) data. Sub-sample structure shown in (a), estimation and test results in (b).

Figure 5. pp-plots for the adjusted Bartlett distribution $Ba/C$ against $\chi^2$ for varying degrees of freedom. (a) and (b) show results for degrees of freedom corresponding to the empirical applications and half those degrees of freedom, respectively.

Figure 6. Power curves for log-normal dispersion tests based on sub-sample structures from empirical applications. Empirical maximum to minimum ratios indicated by vertical lines. BZ is short for Barnett and Zehnwirth (2000), VNJ for Verrall et al. (2010), and TA for Taylor and Ashe (1983).

Table 1. Insurance run-off triangle taken from Taylor and Ashe (1983) as an example for a generalized trapezoid. Entries are aggregate incremental paid amounts for claims of accident year $i$ and development year $j$. Calendar years $k = i + j - 1$ are on the diagonals increasing from the top left.