5.1. Quality of Saddle Point Approximations
We show that the saddle point approximations are highly accurate when benchmarked against large Monte Carlo simulations.
We consider three parameterizations. First, we let the design X correspond to that of a chain-ladder model for a ten-by-ten run-off triangle and set the frequency matrix to the least squares estimates of the Verrall et al. (2010) data in Table 1. Second, for the same design, we set the frequency matrix to the least squares plug-in estimates based on a popular dataset by Taylor and Ashe (1983); we provide these data in Appendix A in Table A2. Third, we consider a design X for an extended chain-ladder model in an eleven-by-eleven run-off triangle and set the frequency matrix to the least squares plug-in estimates of the Barnett and Zehnwirth (2000) data, also shown in Appendix A in Table A1. We remark that in the computations we drop the corner cells of the triangles, which would be fit perfectly in any case; this avoids numerical issues without affecting the results.
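To fix ideas, the following sketch (in Python, with our own variable names and dummy coding, not taken from the paper) builds a chain-ladder style design matrix for a ten-by-ten run-off triangle with an intercept and accident-year and development-year effects:

```python
import numpy as np

def chain_ladder_design(k=10):
    """Design for a k-by-k run-off triangle: intercept plus accident-year
    and development-year dummies (first level of each factor dropped)."""
    cells = [(i, j) for i in range(k) for j in range(k - i)]   # observed cells
    X = np.zeros((len(cells), 1 + 2 * (k - 1)))
    for r, (i, j) in enumerate(cells):
        X[r, 0] = 1.0                    # intercept
        if i > 0:
            X[r, i] = 1.0                # accident-year effect i
        if j > 0:
            X[r, (k - 1) + j] = 1.0      # development-year effect j
    return np.array(cells), X

cells, X = chain_ladder_design(10)
print(X.shape)   # (55, 19): 55 cells and 19 parameters
# In the simulations, the perfectly fitted corner cells would be dropped.
```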
Given a data generating process, chosen as either generalized log-normal or over-dispersed Poisson, a design matrix X and a frequency matrix, we use a large Monte Carlo simulation as a benchmark for the saddle point approximation. First, we draw a large number of realizations of the corresponding distribution. From the resulting Monte Carlo cdf, we then find the quantiles on a fine grid of probabilities. To compute the saddle point approximation, we use the implementation of the procedure described in Section 4.4 in the package quad_form_ratio. Then, for each Monte Carlo quantile, we compute the difference between the saddle point cdf at that quantile and the corresponding probability. Taking the Monte Carlo cdf as the truth, we refer to this difference as the saddle point approximation error.
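As a minimal sketch of this benchmarking step, the snippet below computes the approximation error on a grid of probabilities; `saddle_point_cdf` is a hypothetical stand-in for the quad_form_ratio implementation (whose exact interface we do not reproduce here), and the grid and toy distribution are illustrative.

```python
import numpy as np
from scipy import stats

def approximation_errors(draws, saddle_point_cdf, grid=None):
    """Difference between an approximate cdf and the Monte Carlo cdf,
    evaluated at the Monte Carlo quantiles of a probability grid."""
    if grid is None:
        grid = np.linspace(0.001, 0.999, 999)
    quantiles = np.quantile(draws, grid)                 # Monte Carlo quantiles
    approx = np.array([saddle_point_cdf(q) for q in quantiles])
    return grid, approx - grid                           # approximation error

# Toy illustration: a scaled chi-square stands in for the limiting distribution,
# and its exact cdf stands in for the saddle point approximation.
rng = np.random.default_rng(0)
draws = rng.chisquare(df=5, size=1_000_000) / 5

def toy_cdf(x):
    return stats.chi2(df=5).cdf(5 * x)

grid, err = approximation_errors(draws, toy_cdf)
print(np.abs(err).max())                                 # pure Monte Carlo noise here
```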
Figure 1a shows the generalized log-normal saddle point approximation error plotted across the Monte Carlo quantiles. One and two (pointwise) Monte Carlo standard errors are shaded in blue and green, respectively. While the approximation errors for one of the parameterizations are generally not significantly different from zero, the same cannot be said for the other two sets of parameters. For the latter, the errors start and end in zero and are negative in between. Despite the statistically significant differences, the approximation is very good, with only a small maximum absolute approximation error. The errors in the tails are much smaller, as we might have expected given the results by Butler and Paolella (2008) discussed in Section 4.4.
Figure 1b shows the corresponding plot of the approximation error for the over-dispersed Poisson limiting distribution, produced in the same way as Figure 1a. The approximation error is positive and generally significantly different from zero across parameterizations. Yet, the largest error remains small, with smaller errors still in the tails.
We would argue that the saddle point approximation errors, while statistically significant, are negligible in applications. That is, using a saddle point approximation rather than a large Monte Carlo simulation is unlikely to affect the practitioner’s modelling decision.
5.2. Finite Sample Approximations under the Null
The asymptotic theory above left us without guidance on how to choose between the test statistics R and the estimators for the nuisance parameter that appears in the limiting distributions. While the choice is asymptotically irrelevant, whether for a large aggregate mean or a small variance, we show that it matters in finite samples and that some combinations perform much better than others when it comes to the approximation under the null hypothesis.
In applications, we approximate the distribution of R by an estimated limiting distribution. That is, taking quantiles of the approximating distribution as critical values, we hope that the rejection frequencies under the null hypothesis are close to the corresponding nominal sizes. To assess whether this is justified, we simulate the approximation quality across 16 asymptotically identical combinations of R-statistics and ratios of quadratic forms. We describe the simulation process in three stages. First, we explain how we set up the data generating processes for the generalized log-normal and the over-dispersed Poisson model. Second, we lay out explicitly the combinations we consider. Third, we explain how we compute the approximation errors. As in Section 5.1, we drop the corner cells of the triangles in the simulations; this aids numerical stability without affecting the results.
For the generalized log-normal model, we simulate independent log-normal variables whose logarithms are normally distributed with means given by the linear predictor and a common variance. We consider three settings for the true parameters, corresponding largely to the estimates from the same three datasets we used in Section 5.1, namely the Verrall et al. (2010), Taylor and Ashe (1983) and Barnett and Zehnwirth (2000) data. Specifically, for s = 1 we set the pairs of predictor and variance parameters to their estimated counterparts, and for larger s we divide the variance by s. Theory tells us that the approximation errors should decrease with the variance, and thus as s increases.
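A minimal sketch of this data generating process, under our reading that the variance is divided by s; the design, coefficients and variance below are illustrative, not the paper's estimates.

```python
import numpy as np

def simulate_log_normal(X, theta, sigma2, s=1, rng=None):
    """One draw of independent log-normal cells: log Y ~ N(X @ theta, sigma2 / s)."""
    rng = np.random.default_rng() if rng is None else rng
    return np.exp(rng.normal(loc=X @ theta, scale=np.sqrt(sigma2 / s)))

# Illustrative example with a small arbitrary design.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(6), np.arange(6.0)])
theta = np.array([2.0, 0.3])
Y = simulate_log_normal(X, theta, sigma2=0.1, s=1, rng=rng)
```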
For the over-dispersed Poisson model, we use a compound Poisson-gamma data generating process, largely following Harnau and Nielsen (2017) and Harnau (2018a). For each cell, we simulate an independent compound sum in which the number of summands is Poisson distributed and the summands are independently gamma distributed with common scale and shape. This satisfies the assumptions for the over-dispersed Poisson model in Section 3.3.1 and Section 3.3.3. For the true parameters, we consider three sets of estimates from the same data as for the log-normal data generating process. We use least squares estimates so that the frequency matrix is identical within parameterization between the two data generating processes.
The over-dispersion estimates are 10,393, 52,862 and 124 across the three parameterizations, and the aggregate means are likewise estimated from the respective datasets.
Again, we consider the same values of s, but this time scaling the aggregate predictor; if this increases, so should the approximation quality. We recall that the aggregate mean and the frequencies pin down the cell means through a one-to-one mapping; multiplying the cell means by s then corresponds to an additive shift of log(s) on the predictor scale.
For a given data generating process, we independently draw B run-off triangles and compute a battery of statistics for each draw. First, we compute the four test statistics defined in (10). Second, we compute estimates of the frequency matrix based on least squares, quasi-likelihood, and feasible weighted least squares with either a least squares or a quasi-likelihood first stage. This leads to four different approximations to the limiting distribution, one per estimator of the frequency matrix; dropping the subscript for the data generating process, we label them by the estimation method.
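To illustrate two of the first-stage estimators, the sketch below contrasts least squares on the log scale with Poisson quasi-likelihood using statsmodels; normalizing fitted means into frequencies is our own shorthand, and the feasible weighted least squares stages are omitted.

```python
import numpy as np
import statsmodels.api as sm

def frequencies_least_squares(y, X):
    """Least squares on log(y); fitted means normalized to frequencies."""
    fit = sm.OLS(np.log(y), X).fit()
    mu = np.exp(X @ fit.params)
    return mu / mu.sum()

def frequencies_quasi_likelihood(y, X):
    """Poisson quasi-likelihood with log link; fitted means normalized to frequencies."""
    fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
    return fit.fittedvalues / fit.fittedvalues.sum()
```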
Given a data generating process and a choice of test statistic and approximation to the limiting distribution, we approximate the null rejection frequencies by Monte Carlo simulation. For each combination, we have B paired realizations; for example, the statistic and the approximating distribution in a given repetition are both based on the same simulated triangle. For each draw, we evaluate the saddle point approximation to the cdf of that draw's approximating distribution at the realized statistic. Neglecting the saddle point approximation error, the proportion of draws for which this cdf value falls below a probability α estimates the probability that the statistic falls below the α quantile of its approximating distribution; this exploits that the statistic lies below the α quantile exactly when its cdf value lies below α. We do this for a fine grid of probabilities α.
To evaluate the performance, we consider three metrics: the area under the curve of absolute errors (roughly the mean absolute error), the maximum absolute error, and the error at (one-sided) critical values. We compute the area under the curve by summing the absolute approximation errors over the grid of probabilities, weighted by the grid spacing; since the grid covers close to the full unit interval, we can roughly interpret this as the mean absolute error. The maximum absolute error is the largest absolute approximation error over the grid. Finally, the error at critical values is the approximation error at the upper-tail critical value for the generalized log-normal and at the lower-tail critical value for the over-dispersed Poisson data generating process.
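A sketch of the error curve and the three metrics, assuming an array `R` of realized statistics and a list `cdfs` of the corresponding per-draw saddle point cdfs (hypothetical names); the probability grid and the 5% level are illustrative.

```python
import numpy as np

def error_curve(R, cdfs, grid=None):
    """Approximation error: empirical P(statistic below the alpha quantile of its
    approximating distribution) minus alpha, over a grid of probabilities."""
    if grid is None:
        grid = np.linspace(0.001, 0.999, 999)
    u = np.array([cdf(r) for r, cdf in zip(R, cdfs)])   # per-draw cdf transforms
    return grid, np.array([(u <= a).mean() - a for a in grid])

def summarize_errors(grid, err, alpha=0.05):
    """Area under the curve of absolute errors (roughly the mean absolute error),
    maximum absolute error, and errors at one-sided critical values."""
    return {
        "auc": np.trapz(np.abs(err), grid),
        "max_abs": np.abs(err).max(),
        "upper_tail": err[np.argmin(np.abs(grid - (1 - alpha)))],
        "lower_tail": err[np.argmin(np.abs(grid - alpha))],
    }
```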
Figure 2 shows bar charts of the area under the curve for all 16 combinations of R-statistic and approximating distribution, stacked across the three parameterizations, for s = 1. The chart is ordered by the sum of errors across parameterizations and data generating processes within each combination, increasing from top to bottom. The maximum absolute error summed over parameterizations is indicated by “+”. Since a bar chart for the maximum absolute errors is qualitatively very similar to the plot for the area under the curve, we do not discuss it separately and instead provide it as Figure A1 in Appendix A.
Looking first at the sum over parameterizations and data generating processes within combinations, we see large differences in approximation quality, both for the area under the curve of absolute errors and for the maximum absolute error. The former varies from about 5 pp (percentage points) for the best combination to close to 30 pp for the worst, the latter from 8 pp to 45 pp. It is notable that the four combinations involving one particular estimator of the frequency matrix congregate at the bottom of the pack. In contrast, the three best-performing combinations share a common ingredient and have a substantial head start on their competition: there is a clear jump in both the area under the curve and the summed maximum absolute errors between third and fourth place.
Considering next the contributions of the individual parameterizations to the area under the curve across data generating processes, the influence is by no means balanced; the average contributions over combinations differ markedly across the three parameterizations. This ordering aligns well with that of the estimated dispersion parameters, which we can loosely interpret as a measure of the expected approximation quality. Still, considering the contributions of the parameterizations within combinations, we see substantial heterogeneity; for example, one parameterization contributes much less to one combination than to another, while the reverse is true for a different parameterization.
Finally, we see substantial variation between the two data generating processes. While the areas under the curve of absolute errors aggregated over parameterizations reach at most about 10 pp for the generalized log-normal, the range for the over-dispersed Poisson extends considerably further. The best performer for the generalized log-normal is, perhaps unsurprisingly, a combination based on least squares estimates. Intuitively, since the data generating process is log-normal, the asymptotic results would be exact for this combination if we plugged the true parameters into the frequency matrices; just shy of that, we plug in the least squares parameter estimates, which are maximum likelihood estimates. It is perhaps more surprising that the same approximation is not generally a good idea for the over-dispersed Poisson data generating process, even if the fact that these combinations take the bottom four slots is largely driven by a single parametrization. Reassuringly, the top three performers across data generating processes also take the top three spots within each data generating process, albeit in a slightly different order.
Figure 3a shows box plots for the size errors at the one-sided critical values, computed over the three parameterizations and two data generating processes within each combination, for s = 1. Positive errors indicate an over-sized and negative errors an under-sized test. In the plots, medians are indicated by blue lines inside the boxes, the boxes show the interquartile range, and the whiskers represent the full range. The ordering is increasing in the sum of the absolute errors at the critical values from top to bottom.
Looking at the medians, we can see that these are close to zero. However, there is substantial variation across combinations in both the interquartile ranges and the full ranges. The best and worst performers from the analysis of the area under the curve and the maximum absolute errors are still found in the top and bottom positions. In particular, the performance of the best combination seems close to perfection, with errors confined to a narrow band around zero.
Figure 3b is constructed in the same way as Figure 3a, but for s = 2, halving the variance for the generalized log-normal and doubling the aggregate predictor for the over-dispersed Poisson data generating process. Theory tells us that the approximation quality should improve, and this is indeed what we see: the medians move closer to zero, and both the largest interquartile range and the largest full range shrink.
Overall, our proposed favourite combination performs very well across the considered parameterizations and data generating processes. This is not to say that we could not marginally increase performance in certain cases, for example by picking the least squares based combination when the true data generating process is log-normal. However, even in this case, in which we get the data generating process exactly right, not much seems to be gained in approximation quality where it matters most, namely in the tails relevant for testing. Thus, it seems reasonable to simply use our favourite regardless of the hypothesized model, at least for size control.
5.3. Power
Having convinced ourselves that we can control size across a number of parameterizations, we show that the tests have good power. First, we consider how the power in finite sample approximations compares to the power in the limiting distributions. Second, we investigate how power changes as the means become more dispersed, based on the impact on the limiting distributions alone, as discussed in Section 4.5.
5.3.1. Finite Sample Approximations under the Alternative
We show that combinations of R-statistics and approximate limiting distributions that do well for size control under the null hypothesis also do well when it comes to power at the critical values. The data generating processes are identical to those in Section 5.2, and so are the three considered parameterizations. To avoid numerical issues, we again drop the perfectly fitted corner cells of the triangles, without affecting the results.
To avoid confusion, we stress that we do not consider the impact of more dispersed means in this section. Thus, if we mention asymptotic results, we refer to a large aggregate mean when the true data generating process is over-dispersed Poisson and to a small variance when it is generalized log-normal, holding the frequency matrix fixed.
For a given parametrization, we first find the asymptotic power. When the generalized log-normal model is the null hypothesis, we find the upper-tail critical value of its limiting distribution, using the true parameter values for the frequency matrix, and then compute the power as the probability that the over-dispersed Poisson limiting distribution exceeds this value. Conversely, when the over-dispersed Poisson model is the null hypothesis, we find the lower-tail critical value of its limiting distribution and compute the power as the probability that the generalized log-normal limiting distribution falls below it. Lacking closed-form solutions, we again use saddle point approximations, iteratively solving the equations for the critical values to a high numerical precision.
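The sketch below mirrors this computation under stated assumptions: `cdf_null` and `cdf_alt` are hypothetical callables for the saddle point cdfs of the null and alternative limiting distributions, the upper-tail case corresponds to the generalized log-normal null, and the 5% level, bracketing interval and tolerance are illustrative.

```python
from scipy.optimize import brentq

def asymptotic_power(cdf_null, cdf_alt, alpha=0.05, upper_tail=True,
                     bracket=(1e-8, 1e6)):
    """Critical value from the null limiting distribution and the resulting
    power under the alternative limiting distribution."""
    target = 1.0 - alpha if upper_tail else alpha
    # The bracket must contain the critical value for the root finder to work.
    crit = brentq(lambda x: cdf_null(x) - target, *bracket, xtol=1e-10)
    power = 1.0 - cdf_alt(crit) if upper_tail else cdf_alt(crit)
    return crit, power
```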
Next, we approximate the finite sample power of the top four combinations for size control from Section 5.2 by the rejection frequencies under the alternative for s = 1. For example, say the generalized log-normal model is the null hypothesis and we want to compute the power for one of these combinations. Then, we first draw B triangles from the over-dispersed Poisson data generating process. For each draw b, we find the upper-tail critical value of that draw's approximating distribution, computed from its saddle point approximation by iterative solution to a fixed numerical precision. Then, we approximate the power as the proportion of draws in which the statistic exceeds its critical value. For the over-dispersed Poisson null hypothesis, we proceed equivalently, using the left tail instead. In this way, we approximate the power for all three parameterizations and all four combinations.
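A sketch of the rejection-frequency calculation, again with hypothetical inputs: `R` holds the statistics computed from draws under the alternative and `cdfs` the matching per-draw approximating cdfs. Instead of solving for each draw's critical value, the sketch uses the equivalent shortcut of comparing the per-draw cdf transform with the nominal level.

```python
import numpy as np

def finite_sample_power(R, cdfs, alpha=0.05, upper_tail=True):
    """Rejection frequency under the alternative: the statistic exceeds the upper
    alpha critical value of its approximating distribution exactly when that
    distribution's cdf at the statistic exceeds 1 - alpha (lower tail analogous)."""
    u = np.array([cdf(r) for r, cdf in zip(R, cdfs)])
    return float((u > 1.0 - alpha).mean() if upper_tail else (u < alpha).mean())
```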
Before we proceed, we point out that we should be cautious in interpreting power without taking into account the size error in finite samples. A test with larger than nominal size would generally have a power advantage purely due to the size error. One way to control for this is to consider size-adjusted power, which levels the playing field by using critical values not at the nominal, but at the true size. In our case, this would correspond to critical values from the true distribution of the test statistic R rather than from the approximated distribution; the choice of approximating distribution would then not play a role any more. To sidestep this issue, we take a different approach and compare how closely the power of the finite sample approximations matches the asymptotic power.
Table 2 shows the asymptotic power and the gap between the power of the finite sample approximations and the asymptotic power.
Looking at the asymptotic power first, we see little variation between data generating processes within parameterizations. The power does differ across parameterizations, and its ordering aligns with that of the standard deviations of the frequencies under these parameterizations.
When considering the finite sample approximations, we see that their power is relatively close to the asymptotic power. For two of the parameterizations, the absolute deviations are small; the discrepancies for the third are somewhat larger. The smallest discrepancy arises for the least squares based combination when the data generating process is generalized log-normal; as before, this is intuitive, since it corresponds to plugging maximum likelihood estimated parameters into the frequency matrices. The largest discrepancy arises under an over-dispersed Poisson data generating process. Mean absolute errors across parameterizations and data generating processes are rather close across the four combinations, with our proposed favourite from above coming in second. We would argue that we can still justify its use regardless of the data generating process.
5.3.2. Increasing Mean Dispersion in Limiting Distributions
We consider the impact of more dispersed means on power based on the test statistics' limiting distributions alone. We show that the power grows quickly as we move from identical means across cells to a scenario in which a single frequency hits zero.
For a given diagonal frequency matrix, we define a linear combination that puts weight t on the given frequencies and weight 1 - t on equal frequencies across all cells. Thus, for t = 1, we recover the original frequency matrix, while for t = 0, all cells have the same frequencies, so all means are identical. In the latter scenario, both limiting distributions collapse to a point mass at n, as discussed in Section 4.4. We consider t ranging from just over zero to just under an upper limit whose significance is that, at this value, the smallest frequency of the linear combination is exactly zero.
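Under our reading that the linear combination puts weight t on the given frequencies and 1 - t on equal frequencies, the following sketch computes the combination and the value of t at which the smallest frequency reaches zero; the frequencies shown are illustrative.

```python
import numpy as np

def frequency_path(pi, t):
    """Weight t on the given frequencies (summing to one), 1 - t on equal frequencies."""
    pi = np.asarray(pi, dtype=float)
    return t * pi + (1.0 - t) / pi.size

def t_smallest_frequency_zero(pi):
    """Value of t at which the smallest entry of the path hits zero
    (assumes the smallest frequency lies below the equal-frequency value 1/n)."""
    pi = np.asarray(pi, dtype=float)
    n = pi.size
    return (1.0 / n) / (1.0 / n - pi.min())

pi = np.array([0.4, 0.3, 0.2, 0.1])          # illustrative frequencies
t_bar = t_smallest_frequency_zero(pi)        # 5/3 here
print(frequency_path(pi, t_bar).min())       # 0 up to rounding
```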
For each t, we approximate one-sided critical values of the two limiting distributions by solving the corresponding quantile equations, an upper-tail quantile for the generalized log-normal and a lower-tail quantile for the over-dispersed Poisson. We iteratively solve these equations up to a fixed numerical precision. Theorem 3 tells us that the critical values should grow for both models, but that one of them converges as t approaches its upper limit while the other goes to infinity.
Then, for given t and critical values, we find the power when the null model is the generalized log-normal, evaluating the over-dispersed Poisson limiting distribution beyond the upper-tail critical value, and when the null model is the over-dispersed Poisson, evaluating the generalized log-normal limiting distribution below the lower-tail critical value. Again, we use saddle point approximations. Based on Theorem 3, we should see the power go to unity as t approaches its upper limit.
We consider the same parameterizations of frequency matrices and design matrices X as above; the upper limit for t implied by the estimated frequencies differs across them. To avoid numerical issues, we again drop the perfectly fitted corner cells from the triangles. In this case, while the power is not affected, the critical values are scaled down by the ratio of the relevant quantity computed over the smaller array without corner cells to that computed over the full triangle. Since this is merely proportional, the results are not affected qualitatively.
Figure 4a shows the power when the generalized log-normal model is the null hypothesis. For all considered parameterizations, this is close to the nominal size for t close to zero, increases monotonically with t, and approaches unity as t approaches its upper limit, as expected. For t = 1, where the linear combination recovers the least squares estimated frequencies from the data, the power matches what we found in Table 2.
Figure 4b shows the difference in power between the two models plotted over t. For the three settings we consider, these curves have a similar shape and start and end at zero. Generally, the power is very comparable, with differences topping out at about 1 pp, again matching our findings from Table 2 for t = 1.
Figure 5a shows the one-sided critical values for both models plotted over t. As expected, these are increasing for all settings. Figure 5b shows the ratio of the two critical values. This starts at unity, initially decreases, then increases and, finally, explodes towards infinity as we approach the upper limit for t.
Taking the plots together, we get the following interpretation. We recall that the two distributions are identical for t = 0. Further, the rejection region for the generalized log-normal null is the upper tail, while the lower tail is relevant for the over-dispersed Poisson model. However, for small t, the mass of both limiting distributions is highly concentrated around n, and the distributions are quite similar. This explains why the power is initially close to the nominal size for either model. Further, due to the concentration, the two critical values are initially close. As t increases, both distributions become more spread out and move up the real line, one faster than the other; this is reflected in the increase in power. Initially, the critical value in the denominator of the ratio increases faster than that in the numerator, so the ratio decreases. Yet, for t large enough, the numerator overtakes the denominator, indicating the point at which the power reaches one minus the nominal size for either model; the power differential is necessarily zero at this point. Finally, one critical value explodes while the other converges as t approaches its upper limit, so the ratio diverges.
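To spell out the crossing-point claim, here is a short derivation in our own notation (F_LN and F_ODP for the two limiting cdfs and alpha for the nominal size; these symbols are ours, not the paper's): when the upper-tail critical value under the generalized log-normal null coincides with the lower-tail critical value under the over-dispersed Poisson null at some point c,

```latex
\[
F_{\mathrm{LN}}(c) = 1-\alpha, \qquad F_{\mathrm{ODP}}(c) = \alpha
\quad\Longrightarrow\quad
\underbrace{1-F_{\mathrm{ODP}}(c)}_{\text{power, GLN null}}
= \underbrace{F_{\mathrm{LN}}(c)}_{\text{power, ODP null}}
= 1-\alpha,
\]
```

so both powers equal one minus the nominal size and the power differential vanishes at the crossing.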