One-Year Change Methodologies for Fixed-Sum Insurance Contracts

We study the dynamics of the one-year change in P&C insurance reserve estimation by analyzing the process that leads to the ultimate risk in the case of "fixed-sum" insurance contracts. The ultimate loss is modelled as a random variable following a binomial distribution. We compute explicitly various quantities of interest, in particular the Solvency Capital Requirement for the one-year change and the Risk Margin, using the characteristics of the underlying model. We then compare them with the same figures calculated with existing risk-estimation methods. In particular, our study shows that standard methods (Merz-Wüthrich) can lead to materially incorrect results if their assumptions are not fulfilled. This is due to the multiplicative error assumption behind the standard methods, whereas our example has an additive error propagation, as often happens in practice.


Introduction
In the new solvency regulations, companies are required to estimate their risk over one year and to compute a Risk Margin (RM) for the rest of the time until the ultimate is reached. Actuaries are not accustomed to doing so: until recently, their task mostly consisted of estimating the ultimate claims and their risk. Only a few actuarial methods are designed to look at the risk over one year. Among the most popular is the approach proposed by Merz and Wüthrich (2008) as an extension of the Chain-Ladder under Mack's assumptions (Mack 1993). They obtain an estimate of the mean square error of the one-year change based on the development of the reserve triangles using the Chain-Ladder method. An alternative way to model the one-year risk is developed by Ferriero: the Capital Over Time (COT) method (Ferriero 2016). The latter assumes a modified jump-diffusion Lévy process to the ultimate and gives a formula, based on this process, for determining the one-year risk as a portion of the ultimate risk. In a previous paper, Dacorogna and Hummel presented a simple model with a binomial distribution to develop the principles of risk pricing in insurance (Dacorogna and Hummel 2008). This has been formalized and extended to study diversification effects in a subsequent work (Busse et al. 2013).
The goal of this paper is to study the time evolution of the risk that leads to the ultimate risk through a simple example that is easy to handle, and thus to calculate exactly the one-year and ultimate risks. This is why we take here the example (as in Busse et al. 2013) of a binomial distribution to model the ultimate loss, and see it as the result of an evolution of different binomials over time. The company is exposed to the risk n times. Different exposures come in different steps, and each exposure has a probability p of producing a claim.

The Probabilistic Model
Let us introduce a process described by random variables following binomial distributions, where a claim occurs with probability p and the risk is faced n times in total. We use the same probabilistic framework as in Busse et al. (2013). The reader can think of n as the total number of policies and of p as the probability of a claim occurring for any policy.
Let X be a Bernoulli random variable (note that all rvs introduced in this paper are defined on the same probability space (Ω, A, P)) representing the loss obtained when throwing an unbiased die, i.e., when obtaining a "6":

X = 1 with probability p = 1/6, and X = 0 with probability 1 − p.
Recall that, for X ∼ B(p), E(X) = p and var(X) = p(1 − p). Let (X_i, i = 1, ..., n) be an n-sample with parent rv X, corresponding to a sequence of n independent exposures (we interpret this as the one-step case). The number of losses after n exposures is modelled by S_n = ∑_{i=1}^n X_i, which follows a binomial distribution B(n, p). Recall that

P(S_n = k) = (n choose k) p^k (1 − p)^{n−k}, k = 0, ..., n,   (1)

so that E(S_n) = n E(X) = np and, by independence, var(S_n) = np(1 − p). Note that defining the loss as a Bernoulli variable clearly specifies each single loss amount as a fixed quantity. It is the accumulation of these losses that makes the final loss amount S_n = ∑_{i=1}^n X_i stochastic. One could define a different probability distribution for the X_i, but the Bernoulli distribution is the most appropriate choice for the type of exposures considered here, i.e., the case in which the policies offer a lump sum as compensation.
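As a quick sanity check of these moments, the one-step model can be simulated directly; the sketch below (parameter values are illustrative, not taken from the paper) compares the empirical mean and variance of S_n with np and np(1 − p):

```python
import random

# One-step model: S_n is the sum of n independent Bernoulli(p) losses,
# so E(S_n) = n*p and var(S_n) = n*p*(1 - p). Parameters are illustrative.
def simulate_S_n(n, p, rng):
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(42)
n, p, trials = 600, 1 / 6, 20_000
samples = [simulate_S_n(n, p, rng) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / (trials - 1)
print(mean, n * p)            # empirical mean vs. np = 100
print(var, n * p * (1 - p))   # empirical variance vs. np(1-p) ≈ 83.3
```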

The Case of a Multi-Step n
We consider now the case where the risk is spread over n steps, for a given n ∈ N \ {0}. At each step i (for i = 1, ..., n), the number of exposures is random and represented by a rv N_i. The N_i's satisfy the condition

(H): ∑_{i=1}^n N_i = n,

which implies that 0 ≤ N_1 ≤ n and 0 ≤ N_i ≤ n − ∑_{j=1}^{i−1} N_j for all 2 ≤ i ≤ n. Condition (H) is part of the assumption that the ultimate loss distribution is fixed and known. This assumption serves our objective, which is to study how the risk can materialize over time given the knowledge of the ultimate loss distribution. Actuarial methods have been developed to estimate the ultimate risk, and it is reasonable to assume that they do a good job at this. Our purpose is thus to find ways to decompose this ultimate risk over time (steps).
Throughout the process, we are exposed to the risk a random number of times at each step, independently and with the same probability p. We keep the same notation as in the one-step case, X being the Bernoulli rv B(p) that represents the loss obtained when exposing the contract to the risk with probability p.
At each intermediate step i, we expose the contract N_i times. For ease of notation, let us define a new rv A_i, the cumulative number of exposures up to step i: A_i := ∑_{j=1}^i N_j, with A_0 := 0. The process produces the losses X_1, ..., X_{N_1} for i = 1 and X_{A_{i−1}+1}, ..., X_{A_{i−1}+N_i} for i = 2, ..., n; in particular, for i = n, after the n steps, the losses are (X_1, ..., X_n). The variable X will then denote the parent rv of the X_i's. Hence, when looking at the numbers of exposures and losses, it means that: (i) at an intermediate step i, the number of exposures is N_i and the number of losses obtained at this step is given by S^{(i)}_{N_i} := ∑_{j=A_{i−1}+1}^{A_i} X_j, equal in distribution to S_{N_i}, setting S_· := ∑_{j=1}^· X_j (the equality in distribution is discussed and proved in Appendix A); conditionally on N_i, this is a binomial rv B(N_i, p). Recall (see e.g., Mikosch 2004) that, by independence of N_i and X, E(S^{(i)}_{N_i}) = p E(N_i) and var(S^{(i)}_{N_i}) = p(1 − p) E(N_i) + p² var(N_i); (ii) up to an intermediate step i, the total number of exposures is A_i and the total number of losses is given by S_{A_i} = ∑_{j=1}^{A_i} X_j, which is, conditionally on (N_j, 1 ≤ j ≤ i), a binomial rv B(A_i, p); (iii) at the end of the multi-step process, under condition (H), the total number of exposures is n and the total number of losses is, as in the one-step case, S_n. This is what is called the ultimate loss, once the process is completed.
Summarizing the notation: S^{(i)}_{N_i} is the rv of the losses at the intermediate step i, S_{A_i} is the rv of the losses up to the intermediate step i, and S_n is the rv of the ultimate loss U(n).
For 1 ≤ i ≤ n, let N_i denote the σ-algebra (as usual, N_0 and F_0 denote the trivial σ-algebra {∅, Ω}) generated by the sequence of rvs (N_j, 1 ≤ j ≤ i), and let F_i denote the σ-algebra generated by the sequence of rvs (N_j, 1 ≤ j ≤ i) together with the corresponding losses (X_j, 1 ≤ j ≤ N_1 + ... + N_i). Note that N_i ⊂ F_i for all 1 ≤ i ≤ n. The process described here would generate a loss triangle following a Bornhuetter-Ferguson (Bornhuetter and Ferguson 1972) type of dynamic, which is essentially linear in the error propagation, as in our example, and would lead this way to the ultimate. This type of dynamic is typical of certain lines of business mentioned in the introduction and is often used by reserving actuaries to estimate the ultimate loss. Such a loss triangle is generated assuming independence across different underwriting years. This assumption, together with the independence between development periods already mentioned in the introduction of Section 2, is a simplification that allows us to calculate explicitly the quantities of interest (below); both are anyway among the assumptions behind the Merz-Wüthrich methodology, the standard actuarial methodology that is the object of our study.

Random Variables of Interest
Now, let us introduce some variables of interest, namely:
• the ultimate loss U(n), which is S_n;
• the expected ultimate loss, given the information up to step i (for 1 ≤ i ≤ n):

U(i) := E(S_n | F_i),   (8)

the X_i's being independent of the N_i's. Note that we can also define U(0) := E(S_n) = np, which corresponds to the expected loss at ultimate. Note that Equation (8) defines a martingale. Note also that U(i) is a real number (although the rvs are integer valued);
• the variation D(i) of the expected ultimate loss between two successive steps, which defines exactly the one-year change when choosing yearly steps:

D(i) := U(i) − U(i − 1), 1 ≤ i ≤ n.   (10)

Here, D(i) is also a real rv. When i = 1, D(1) is closely related to the solvency capital requirement as defined in the Solvency II framework, which reflects the risk of a change in the technical provisions over one year. The difference lies in the fact that D(1) does not take into account the change of the risk margin. This is, however, of minor importance for the SCR estimation because the risk margin change is of a smaller order of magnitude. Indeed, in practice, it is commonly accepted that the risk margin, which represents the risk loading for the market value of the liability, is approximately constant from one year to the next. We note here that the D(i)'s are the innovations of the martingale defined in Equation (8).
For simplicity, all quantities of interest are considered undiscounted in this paper. Discounting would not invalidate our findings; it would only make the calculations more complex.
Note that, as in (ii), we can express the expected ultimate loss using conditional expectations. The goal of this study is to better understand the behavior of D(i). This means looking for the distribution of U(i).
Using (8) and (10) and the properties of the conditional expectation, we can write, after straightforward computations,

U(i) = S_{A_i} + p (n − A_i).   (12)

Applying (12) provides

D(i) = S^{(i)}_{N_i} − p N_i.   (13)

We can see from (13) that D(i) depends on N_i but also on the past information via A_{i−1}. We can then write the probability distribution of D(i) conditional on this information, using the independence between the X_i's and the N_j's, and (3).
Let us now choose a probability distribution for the N_i's. This choice is arbitrary. We could, for instance, use various distributions typical for modelling the frequency or emergence of claims in actuarial modelling, like the Poisson distribution. For simplicity, we first pick a uniform distribution, conditionally on the previous steps: N_1 ∼ Unif{0, 1, ..., n} and, for 2 ≤ i ≤ n − 1,

N_i | (N_1, ..., N_{i−1}) ∼ Unif{0, 1, ..., n − A_{i−1}},

while the last step takes all the remaining exposures. We remind the reader that n = ∑_{i=1}^n N_i. Strictly speaking, we do not even need N_1, ..., N_{i−1} explicitly; only their sum matters, so that N_i | A_{i−1} ∼ Unif{0, 1, ..., n − A_{i−1}}. We can now proceed to compute the expectation of N_i.

Proposition 1. The expectation of N_i as a function of i is equal to:

E(N_i) = n / 2^i, for 1 ≤ i ≤ n − 1.   (16)

Proof of Proposition 1. Using the tower property of the conditional expectation, then the fact that N_i is conditionally uniformly distributed, and finally the linearity of the expectation, we obtain:

E(N_i) = E(E(N_i | A_{i−1})) = E((n − A_{i−1}) / 2) = (n − E(A_{i−1})) / 2.   (17)

Let us prove the proposition by induction. For i = 1, by uniformity of the distribution of the rv N_1, we have E(N_1) = n/2. Suppose (16) is true for any i ≤ j − 1 and let us prove it for i = j. This is straightforward using (17), since then E(A_{j−1}) = ∑_{i=1}^{j−1} n/2^i = n(1 − 1/2^{j−1}), whence E(N_j) = n/2^j.
The last step has actually the same expectation as the one before, E(N_n) = E(N_{n−1}) = n/2^{n−1}, since it is completely determined by the sum of the previous steps: N_n = n − A_{n−1}. We have thus fully characterized the expectation of the number of exposures at each step. (In Appendix B, one can find the distribution of the N_i and also of the D(i).)
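Proposition 1 can be checked numerically; the following sketch (our own illustration, with an arbitrary n) simulates the uniform-exposure process and compares the empirical expectations with n/2^i:

```python
import random

# Multi-step exposure process: N_1 ~ Unif{0,...,n}, then
# N_i ~ Unif{0,...,n - A_{i-1}} for i = 2,...,n-1, and the last step
# takes all remaining exposures (condition (H): the N_i sum to n).
def simulate_exposures(n, rng):
    exposures, remaining = [], n
    for _ in range(n - 1):
        k = rng.randint(0, remaining)   # uniform on {0,...,remaining}
        exposures.append(k)
        remaining -= k
    exposures.append(remaining)         # N_n = n - A_{n-1}
    return exposures

rng = random.Random(0)
n, trials = 16, 50_000
totals = [0.0] * n
for _ in range(trials):
    for i, k in enumerate(simulate_exposures(n, rng)):
        totals[i] += k
means = [t / trials for t in totals]
print(means[0], n / 2)   # Proposition 1: E(N_1) = n/2
print(means[1], n / 4)   # E(N_2) = n/4
```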

Incremental Pattern and Capital
Another quantity of interest is the incremental pattern γ_i of the expected value of the losses. It can be written, using (4) and (9), as:

γ_i = E(S^{(i)}_{N_i}) / E(S_n) = p E(N_i) / (np) = 1/2^i, for 1 ≤ i ≤ n − 1.

For computing the RM, we need to compute the capital requirement (SCR) at each step, using the risk measures VaR_99.5% and TVaR_99%, which we will simply call ρ. We define the capital requirement C_i at step i as

C_i := ρ(D(i)).

Once we have this quantity, we can then write the RM, R_n, as a function of the number of steps n, as:

R_n = η ∑_{i=2}^n C_i,

where η designates the cost of capital.
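Under the uniform choice for N_1, the first-step capital C_1 = ρ(D(1)) can be estimated by Monte Carlo; a minimal sketch (n, p and the trial count are illustrative assumptions, not values from the paper):

```python
import random

# D(1) = S_{N_1} - p*N_1 with N_1 ~ Unif{0,...,n}; we estimate the risk
# measures VaR_99.5% and TVaR_99% from the empirical distribution.
def simulate_D1(n, p, rng):
    n1 = rng.randint(0, n)
    losses = sum(1 for _ in range(n1) if rng.random() < p)
    return losses - p * n1

rng = random.Random(1)
n, p, trials = 200, 0.05, 20_000
sample = sorted(simulate_D1(n, p, rng) for _ in range(trials))
var_995 = sample[int(0.995 * trials)]       # empirical VaR at 99.5%
tail = sample[int(0.99 * trials):]          # worst 1% of outcomes
tvar_99 = sum(tail) / len(tail)             # empirical TVaR at 99%
print(var_995, tvar_99)
```

Either empirical risk measure can serve as C_1; repeating the estimation for the later steps and applying the cost of capital η then gives the RM described above.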

Moments of D(i)
The properties we present here are specific to our choice of process, but they can easily be generalized within the framework of martingale assumptions. Let us compute the first two conditional moments of the vector D = (D(1), ..., D(n)) given F_i.
Proposition 2. For 1 ≤ i < k ≤ n: (i) E(D(k) | F_i) = 0; (ii) var(D(k) | F_i) = p(1 − p)(n − A_i)/2^{k−i} for k ≤ n − 1, the exponent k − i being replaced by n − 1 − i when k = n.

Proof of Proposition 2. (i) Using the definitions (8) and (10) and the tower property (with k > i) produces E(D(k) | F_i) = E(U(k) − U(k − 1) | F_i) = U(i) − U(i) = 0. (ii) The conditional variance of D(k) can be obtained in the following way: take k ∈ {i + 1, ..., n} and recall the general property of the conditional variance, for two rvs X and Y, var(X) = E(var(X|Y)) + var(E(X|Y)). We apply this decomposition to var(D(k) | F_i), using Equation (13) on both terms: since S^{(k)}_{N_k} is, conditionally on N_k, a binomial rv with mean p N_k, the conditional mean of D(k) given N_k is 0 and its conditional variance is p(1 − p) N_k, so that

var(D(k) | F_i) = p(1 − p) E(N_k | F_i).

We now use the following argument: assuming F_i known is equivalent, in terms of rvs, to starting a new process with n − i steps and n − A_i rvs. A generalization of Equations (16) and (18) therefore gives

E(N_k | F_i) = (n − A_i)/2^{k−i}, for i < k ≤ n − 1, and E(N_n | F_i) = (n − A_i)/2^{n−1−i}.

It follows that var(D(k) | F_i) = p(1 − p)(n − A_i)/2^{k−i} for i < k ≤ n − 1, with exponent n − 1 − i for k = n. This finishes the calculation of the conditional variance and thus the characterization of the second moment of D(i).

Corollary 1. The particular case i = 0 (F_0 being the trivial σ-algebra) corresponds to the unconditional moments of D, from which we deduce the following two expressions: (a) the unconditional moments of D are E(D(k)) = 0 and var(D(k)) = p(1 − p) n/2^k for k ≤ n − 1 (exponent n − 1 for k = n); (b) as an immediate consequence, summing the conditional variances over k = i + 1, ..., n, the conditional variance of the ultimate is

var(U(n) | F_i) = p(1 − p)(n − A_i).

(This formula can also be derived directly, arguing that, conditionally on F_i, the ultimate loss is S_{A_i} plus a sum of n − A_i independent Bernoulli(p) rvs.)
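The unconditional variance var(D(i)) = p(1 − p) n/2^i (for i ≤ n − 1) from Corollary 1 lends itself to a direct numerical check; the sketch below (an illustration with arbitrary parameters) compares it with the empirical second moment:

```python
import random

# Simulate the full process and record D(i) = S^{(i)}_{N_i} - p*N_i
# for every step; Corollary 1 gives var(D(i)) = p*(1-p)*n / 2^i
# for i <= n-1 (parameters below are illustrative).
def simulate_D(n, p, rng):
    changes, remaining = [], n
    for step in range(1, n + 1):
        k = remaining if step == n else rng.randint(0, remaining)
        losses = sum(1 for _ in range(k) if rng.random() < p)
        changes.append(losses - p * k)
        remaining -= k
    return changes

rng = random.Random(7)
n, p, trials = 32, 0.25, 30_000
acc = [0.0] * n
for _ in range(trials):
    for i, d in enumerate(simulate_D(n, p, rng)):
        acc[i] += d * d          # E(D(i)) = 0, so E(D(i)^2) = var(D(i))
print(acc[0] / trials, p * (1 - p) * n / 2)   # both close to 3.0
print(acc[1] / trials, p * (1 - p) * n / 4)   # both close to 1.5
```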

Completion Time
Our process is defined on a finite time scale of n steps. However, the process will, most of the time, finish well before the n-th step (the ultimate time). It is therefore of interest to know how fast the process reaches the maximum number of exposures allowed. This is also very important for computational reasons: when simulating the process, one can stop the simulation procedure earlier than the ultimate time, thus sparing a large amount of computation. In order to answer this question, we first consider a type of process that differs slightly from ours in the sense that it has an infinite time range (infinitely many steps), but is still only allowed a finite number of exposures, n. For an infinite process with n rvs, let us denote by T_n the completion time, given by the step at which the last exposure is realized. Before trying to approximate the finite process with the infinite one, we need to show that the infinite process finishes with probability 1.
Proposition 3. The probability of the completion time being infinite is 0: P(T_n = ∞) = 0.

Proof of Proposition 3. For n = 1, we can argue that

P(T_1 = ∞) = P(T_1 = ∞ | N_1 = 0) P(N_1 = 0) = P(T_1 = ∞)/2,

which implies that P(T_1 = ∞) = 0. Note that P(T_1 = ∞ | N_1 = 0) = P(T_1 = ∞) is justified by the fact that, since the process is infinite, assuming N_1 = k is equivalent to starting a new infinite process with n − k rvs. This argument is used here with k = 0 but is stated for general k because we will use it again for other values of k. For n > 1, we can do very similar calculations by induction. Assume that P(T_m = ∞) = 0 for all m < n; then

P(T_n = ∞) = ∑_{k=0}^n P(T_n = ∞ | N_1 = k) P(N_1 = k) = P(T_n = ∞ | N_1 = 0) P(N_1 = 0) = P(T_n = ∞)/(n + 1),

which again proves that P(T_n = ∞) = 0.
Since the process finishes with probability 1 and the probability that an infinite process with n rvs finishes after time n is very small for n large enough, the approximation of the finite process by the infinite one is very reasonable.
We would like to find a formula for E(T_n). By conditioning on the first step, we obtain

E(T_n) = ∑_{k=0}^n E(T_n | N_1 = k) P(N_1 = k).

Again, assuming N_1 = k is equivalent to starting a new process with n − k rvs. Therefore, E(T_n | N_1 = k) = 1 + E(T_{n−k}), where the 1 takes into account the first step. As E(T_n) only depends on n, let us denote it by f(n) := E(T_n). We can then write the following iterative formula:

f(n) = 1 + (1/(n + 1)) ∑_{k=0}^n f(n − k).   (29)

Solving for f(n) and realising that f(0) = 0, we obtain

f(n) = (n + 1)/n + (1/n) ∑_{j=0}^{n−1} f(j),   (30)

and hence, by iterating Equation (30), for n ≥ 1:

f(n) = 1 + ∑_{k=1}^n 1/k.   (31)

We now have an expression that gives the average completion time of an infinite process as 1 plus the truncated harmonic series, which can itself be approximated by

E(T_n) ≈ 1 + ln(n) + γ,   (32)

where γ ≈ 0.5772 is the Euler constant. If n is large enough, we therefore have a simple way to estimate approximately how many steps the process will last on average.
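The recursion and the closed form for E(T_n) can be verified deterministically; this small sketch (our own check) iterates the recursion for f(n) and compares it with 1 plus the harmonic series and with the logarithmic approximation:

```python
import math

# f(n) = E(T_n) from the recursion f(n) = (n+1)/n + (1/n) * sum_{j<n} f(j),
# with f(0) = 0; the closed form is f(n) = 1 + H_n ≈ 1 + ln(n) + gamma.
def expected_completion(n_max):
    f = [0.0]
    running = 0.0                        # running sum f(0) + ... + f(n-1)
    for n in range(1, n_max + 1):
        f.append((n + 1) / n + running / n)
        running += f[-1]
    return f

f = expected_completion(10_000)
harmonic = 1 + sum(1 / k for k in range(1, 10_001))
print(f[10_000], harmonic)                        # agree up to rounding
print(f[10_000], 1 + math.log(10_000) + 0.5772)   # ≈ 10.79
```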

Distribution of the D(i)
In Section 3.2, we studied analytically the first two moments of the D(i). Here, we simulate 200,000 realizations of the process of size n = 15 to estimate the empirical distribution of the D(i). We also compute the normal distribution with the same mean and variance to compare the distributions.
The results are displayed in Figure 1. Let us recall Equation (13): D(i) = S^{(i)}_{N_i} − p N_i. Conditionally on N_i, this random variable has a centered binomial distribution (i.e., a binomial distribution minus its average). The Central Limit Theorem applies to a binomial rv with parameters n, p: the distribution of a sum of n independent Bernoulli rvs with parameter p converges, for n large enough, to a Gaussian distribution N(np, np(1 − p)). Assume that F is a discrete mixture of normal distributions with mean 0, variances σ²_1, ..., σ²_n and weights p_1, ..., p_n. Then, for all x ∈ R,

F(x) = ∑_{i=1}^n p_i Φ(x/σ_i),

where Φ is the distribution function of a standard normal rv. In particular, if, for some indices i and j, 1/σ_i and 1/σ_j are close to each other, then Φ(x/σ_i) and Φ(x/σ_j) are also going to be close to each other. In our case, 1/σ_i = (i p(1 − p))^{−1/2}, and these values are going to be close when i is large enough. If a sufficiently large part of the weight is on large values of i, our mixture of centred binomial distributions is going to be close to a normal distribution. In other words, D(i) is going to have a distribution close to normal only if the probability of having a large N_i is high. For n = 15, as we can see in Figure 1, this is the case. For i = 2, the approximation is still reasonable despite a larger-than-normal mass around 0. For i > 2, the fit is rather good for large and small quantiles but not for the middle part of the distribution. We recall from Equation (16) that the expectation of N_i is divided by 2 at every step, so that small values of N_i, and in particular 0, have a larger probability. It should be noted that, due to the probability mass on small values of N_i, these mixture distributions, even if they are close to a normal distribution, will always have a higher-than-normal probability density around 0. To verify this intuition empirically, we also simulate the process 200,000 times for sizes n = 50 and n = 100. The results confirm our intuition.
Indeed, for n = 50, the approximation is good for i = 1, 2, 3 and starts failing for i = 4, where E(N 4 ) = 3.125. For n = 100, we can of course go one step further, which makes the normal approximation good also for D(4).

Simulation of the Completion Time
We have seen, in Section 3.3, Equations (31) and (32), how to approximate the average time until the end of the process. We test this result empirically by simulating 10,000 times the process of size n for n = 20, 40, ..., 100, 200, 400, ..., 1000, 2000, 4000, ..., 30,000. For each n, we calculate the average time to complete the total number of exposures and the 95% Gaussian confidence interval given by adding or subtracting 1.96 times the empirical standard deviation of the sample. We also calculate the harmonic series and its approximation by a logarithm. The two approximations are almost identical and fit the results very well: out of 25 cases, the 95% confidence interval misses only once, which is what is expected. The approximations should, however, not be expected to be valid for very short processes, because the logarithm approximation is asymptotic and because both the logarithm and harmonic-series approximations rest on the approximation of a finite process by an infinite one, which is inaccurate for very short processes. In order to know how long a process must be for the approximation to be valid, we simulate 10,000 times the process of size n for n = 2, ..., 20. From the results, displayed in Figure 2, we see that, for n = 1, ..., 6, the fit is not good. However, from n ≥ 7 onwards, our approximation is a reasonable and easy way to find the completion time. We shall use this result later when we build loss triangles.
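For completeness, the completion time itself can be simulated; the sketch below (illustrative parameters, our own check) runs the approximating infinite-horizon process, in which each step draws uniformly from the remaining exposures, and compares the average T_n with 1 + H_n:

```python
import random

# Completion time T_n of the infinite-horizon process: draw uniformly
# from the remaining exposures at each step and stop when none remain.
def completion_time(n, rng):
    remaining, t = n, 0
    while remaining > 0:
        t += 1
        remaining -= rng.randint(0, remaining)
    return t

rng = random.Random(3)
n, trials = 1000, 20_000
avg = sum(completion_time(n, rng) for _ in range(trials)) / trials
approx = 1 + sum(1 / k for k in range(1, n + 1))   # 1 + H_n ≈ 8.49
print(avg, approx)
```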

Capital Requirements
In this section, we propose a framework to test the accuracy of some of the methods used to estimate the one-year change. We construct triangles using the model presented in Section 3.4. From these triangles, we estimate the one-year change using the classical Merz-Wüthrich method (Wüthrich and Merz 2008) and the capital-over-time (COT) method currently used at SCOR (Ferriero 2016). We then compare the results obtained analytically and by simulation, based on the specifications of our model, to those estimated with the Merz-Wüthrich and COT methods.

Triangles
Until now, we have only considered the development of one process. Actual liability data are generally available in the form of triangles that represent the losses attributed to insurance contracts for each underwriting year after their current number of years of development. It is therefore natural to compare the results of our model in this framework. We first need to define the notation. Let us consider n representations of the process of size n and denote their ultimate losses by U_1(n), ..., U_n(n). The ultimate loss of the triangle is then U(n) = ∑_{i=1}^n U_i(n). We denote by N_{i,1}, ..., N_{i,n} the numbers of exposures realized at each step of the ith process and by S^{(j)}_{N_{i,j}} the losses due to N_{i,j}. The cumulated losses of row i up to column j are written S^{(j)}_i. We consider the discrete filtration (F_k)_{0≤k≤n} given by the information available at each calendar year; the individual filtrations of each representation of the process are denoted by (F_{i,k})_{0≤k≤n}. We denote the jth one-year change of the ith individual process by D_i(j) and define the one-year changes Δ(i) of the triangle until its completion as the sums, along the corresponding calendar years, of the individual one-year changes of the rows that are still developing; Equation (33) simply writes the global one-year change as the sum of the individual one-year changes. Note that, due to the linearity of expectation, Equation (33) also implies that the Δ(i) have expectation 0 and are uncorrelated. Their variances can be calculated by summing the variances of the individual one-year changes that constitute them.
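To make the triangle construction concrete, here is an illustrative sketch (our own reading of the setup, with arbitrary parameters) that builds a run-off triangle of cumulated loss counts from n independent copies of the n-step process:

```python
import random

# Row i = one representation of the process (an underwriting year);
# column j = development step; entries are cumulated losses S^{(j)}_i.
def simulate_row(n, p, rng):
    cum, losses, remaining = [], 0, n
    for step in range(1, n + 1):
        k = remaining if step == n else rng.randint(0, remaining)
        losses += sum(1 for _ in range(k) if rng.random() < p)
        remaining -= k
        cum.append(losses)
    return cum

def simulate_triangle(n, p, rng):
    rows = [simulate_row(n, p, rng) for _ in range(n)]
    # At the evaluation date, row i (0-based) is observed up to
    # development n - i, giving the usual run-off triangle shape.
    return [row[: n - i] for i, row in enumerate(rows)]

tri = simulate_triangle(8, 0.3, random.Random(5))
for row in tri:
    print(row)
```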
Similarly to the individual process situation, we define the capital requirement for calendar year i as K_i := ρ(Δ(i) | F_{i−1}), for some risk measure ρ. The risk margin associated with these capitals is R = η ∑_{i=2}^n K_i, for the cost of capital η.

Methodology and Results' Comparison
We now describe the methodology used to obtain our risk-measure results and to compare them with the Merz-Wüthrich and COT methods, both briefly explained in Appendix D. For convenience, we use the notation Ñ_{i,j} for the number of exposures remaining to be realized for row i of the triangle after step j. In particular, N_{i,j} ∼ Unif{0, 1, ..., Ñ_{i,j−1}} for j = 1, ..., n − 1, and Ñ_{i,j} is an F_{i,j}-measurable random variable.

First Year Capital Comparison
A first value of interest is the required capital for the first year, K_1. The Merz-Wüthrich method only provides var(Δ(1)|F_0), while the COT method was originally designed for xTVaR (A4). An assumption concerning the link between var and TVaR therefore has to be made. Since Δ(1)|F_0 follows a mixture of binomial distributions with a generally large number of rvs, its distribution can be approximated relatively well by a normal distribution (this approximation may lead to up to a 20% underestimation of the risk, depending on the number of exposures n). Note that the number of rvs of the binomial is not a uniform variable but a sum of uniform variables, which diminishes the probability of very large or very low values, thus making the normal approximation better than for a simple uniform number of rvs. The normal distribution fixes the relation between var and TVaR:

TVaR_κ(X) = µ + σ φ(Φ^{−1}(κ)) / (1 − κ), for X ∼ N(µ, σ²),   (34)

where φ and Φ denote the standard normal density and distribution function. In our case, µ = E(Δ(1)|F_0) = 0 and σ² = var(Δ(1)|F_0), which can be computed from Corollary 1 (Equation (35)). Combining Equations (34) and (35), we obtain an approximate analytical value for K_1. The COT method, as explained in Appendix D, was designed for real insurance data. In particular, the parameter b models the dependence between relative loss increments. In the case of our model, the relative loss increments are uncorrelated, which points to the choice b = 0.5 instead of the value chosen with the mean time to payment. The choice of the coefficient p_b is also arbitrary. Indeed, p_b determines the proportion of the risk that is due to the jump part of the process. For our process, there is no "special" type of behaviour that the model could exhibit and that would increase the risk; therefore, we choose p_b = 0. In general, the type of data is not known; in particular, the dependency between loss increments is not known. Thus, we are also interested in the results given by the COT method applied in the standard way, and we therefore also compute the COT estimator with p_b chosen according to Formula (A5).
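The normal link between standard deviation and TVaR used here can be written down explicitly; the following sketch (the σ value is a made-up placeholder, not a result from the paper) evaluates TVaR_κ = µ + σ φ(Φ^{−1}(κ))/(1 − κ) with the standard library:

```python
from statistics import NormalDist

# TVaR of a normal rv: TVaR_k(X) = mu + sigma * phi(z_k) / (1 - k),
# where z_k is the standard normal k-quantile.
def normal_tvar(mu, sigma, k):
    nd = NormalDist()                  # standard normal
    z = nd.inv_cdf(k)
    return mu + sigma * nd.pdf(z) / (1 - k)

multiplier = normal_tvar(0.0, 1.0, 0.99)
print(multiplier)                      # ≈ 2.665 for kappa = 99%

sigma = 6.5                            # hypothetical sd of Delta(1)|F_0
print(normal_tvar(0.0, sigma, 0.99))   # corresponding K_1 estimate
```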
In our case, b cannot be chosen as in the formula, since the pattern used in the mean-time-to-payment computation is a paid pattern, which we do not have for our model. For the jump case, we choose b = 0.75, as for a long-tail process. Indeed, the (incurred) pattern of our n-step process corresponds generally to the type of pattern found in long-tail (or possibly medium-tail) lines of business. We will refer to the two variations of the method as the "COT method with jump part" for the version with the standard p_b and b = 0.75 and the "COT method without jump part" for the version with b = 0.5 and p_b = 0.
In the case of an n-step Bernoulli model triangle, we notice that the accident-year (incremental) patterns are given by Equation (37). The first factor is simply the total number of exposures remaining to be realized, the first row not being counted because it is finished. Since, at each step but the last, half of the current exposures of the process are expected to be realized, the number of exposures remaining for each unfinished row is expected to be the number of exposures remaining in the original triangle divided by 2 for each past step that is not the last step. Hence, the expected remaining exposures are ∑_{j=i+2}^n Ñ_{j,n+1−j} / 2^i for the rows that are not finished after i steps, plus Ñ_{i+1,n−i} / 2^{i−1} for the row that finishes precisely after i steps. The pattern refers to the results of the binomial random variables and not to the numbers of exposures; however, since the rvs (copies of the random variable X) are independent and have the same expectation, the numerator and denominator are both multiplied by p, leaving the result unchanged.
In particular, with the help of Equation (37) and after a few manipulations, we obtain Equation (38). This result uses the property of martingales that the variance of a sum of martingale increments is equal to the sum of their variances. An analogous property is false for the TVaR; however, for the COT model without jump part, we get an approximation (see Ferriero 2016). Moreover, if we assume that the normal approximation is not an approximation but the exact distribution, it can be shown through straightforward calculations that this expression becomes an exact result for the required capital for the first year (see Ferriero 2016).
The Merz-Wüthrich method is the one posing the most problems. Indeed, the triangles generated with our process are very noisy in the sense that the simulated triangles can quite often contain many zeros. The Mack hypotheses, on which the Merz-Wüthrich method is based, are multiplicative in nature and, therefore, very sensitive to zeros. If there is a zero in the first column of a triangle, Mack's estimation fails to compute the parameter σ_1. There are more robust ways, such as the one developed in Busse et al. (2010), to calculate these estimators. However, all of them (except removing the line) fail if an entire row of the triangle is 0, which happens quite often for small n. If n is large, the problem becomes, as we explain in Section 3.5, that the process terminates on average in time log(n) + 1.57, which means that the largest part of the triangle shows no variation at all and gives σ_i = 0. In order to eliminate all these problems, we simulate our test triangles with n = 100,000 and work on the truncated top side of the triangle of size m = 5 + (log(n) + 1.57) ≈ 5 + E(T_n) = 19, where 5 is a safety margin to ensure that the run-off of the process is finished or at least almost finished.
We set p = 0.1%, which is a more realistic value given the high number of policies, simulate 500 triangles and, for each of them, calculate the first-year capital K_1 using the theoretical value, the COT method with and without jumps and the Merz-Wüthrich method. We display, in Table 1, the mean capital and the standard deviation of the capital around that mean over the 500 triangles. We also calculate the mean absolute deviation (MAD), E|X − x|, and the mean relative absolute deviation (MRAD), E|X/x − 1|, with respect to the theoretical value x, using standard and robust mean estimation. Note that, in our example, the relative risk, i.e., the first-year capital relative to the reserves volume, is about 18% (the reserves are approximately 100 and the first-year capital approximately 18), which is a realistic value. The reserves in our model can easily be computed by the closed formula np(1 − 1/2^m), as proved in Appendix C. As a point of comparison, using the prescription of the Solvency II Standard Formula, we find a stand-alone capital intensity (SCR/Reserves) between 14% and 26% for P&C reserves. Given the type of risks we are considering here, it is logical that the capital intensity should be at the lower end of this range. We also see, as expected, that the average claim is much smaller than the maximum claim (100,000), given that the chance that 100,000 independent policies all claim at the same time with such a low claim probability (p = 0.1%) is practically nil.
The results presented in Table 1 are striking. While the COT method gives answers close to the theoretical value, with a slight preference, as expected, for the COT without jumps, the Merz-Wüthrich method is way off (1356.6% away from the true value), and the true result is not even within one standard deviation. The coefficient of variation σ/µ for this method is more than 59%, while in all the other cases it hovers around 21%. Looking at the triangles and analysing the properties of the methods allows us to understand these results. The true risk depends on the number of rvs remaining to be realized. In most cases, only the last few underwriting years truly matter in this respect, because the others will be almost fully developed. For Merz-Wüthrich, as most of the volatility of the process appears at the first step, the most crucial part of the triangle is its last line, which is the only process representation at this stage of development. If the latter is large, it influences the Merz-Wüthrich capital in the same direction. Merz-Wüthrich interprets a large value as: "Something happened on that accident year; there is going to be more to pay than expected." The logic of our model is different; through the "fixed number of rvs" property, it is: "What has been paid already need not be paid anymore." A large value on the last line of the triangle is therefore likely to indicate that few rvs remain to be realized, which implies a smaller remaining risk. This explains the negative correlation: the same cause has exactly opposite effects on the two results.
How can we explain the difference in magnitude of the estimated capitals? This may be due to the fact that our model is additive while Merz-Wüthrich is multiplicative. If a small number appears in the first column and the situation then re-establishes itself at the second step by realizing a larger number of exposures, we know that this is irrelevant for the future risk. However, Mack, and subsequently Merz-Wüthrich, do not consider the increase but the ratio, and if the first value is small, the ratio may be large. The estimated ratio f̂_j, though, is the mean of the individual ratios weighted with the values of the jth column, i.e., Equation (A2) can be rewritten as

f̂_j = ∑_i S^{(j+1)}_i / ∑_i S^{(j)}_i.

Therefore, cases with large ratios, such as the one described before, will not appear in f̂_1 but in σ̂²_1. The Merz-Wüthrich (Mack) method considers that small and large values are equally likely to be multiplied by a given factor, which is not the case in our model, for which small values are likely to be multiplied by large factors and large values by small factors. This is confirmed by plotting and comparing the distributions of the different capital measurements (Figure 3): while the true capital and the two COT capitals seem to follow a normal distribution, the distribution of the Merz-Wüthrich capital seems to follow rather a log-normal distribution. Another interesting statistic to understand how related these capital measurements are is the correlation between them. Computing the correlation matrix yields the results presented in Table 2: the correlation is almost 100% between the two COT estimates and the true value. Indeed, with or without jumps, the COT method is very close to the theoretical result. This is partially due to the fact that the ultimate distribution is known and that all these methods simply multiply the ultimate risk by a constant. The correlation is not exactly 100% due to the stochasticity induced by the simulations used to calculate the ultimate risk in the COT methods.
The Merz-Wüthrich capital, however, shows a negative correlation. The standard and robust estimators are very different, which suggests the presence of very large values of the Merz-Wüthrich capital and a departure from normality. This is confirmed by Figure 3, where the distribution in the bottom-right plot is very different from a Gaussian. It indicates in particular that the robust estimator is more representative of the data. A correlation of −46% is rather strong. It is not true, though, that a small true capital implies a large Merz-Wüthrich capital, nor the opposite; but there is a real tendency for large values of the true capital to coincide with relatively small values of the Merz-Wüthrich capital.

Table 2. Correlation matrix of the different capital measures. Above, the standard correlation estimate and, below, the robust "MVE" estimate (Rousseeuw and Leroy 1987).

Another important quantity to study is the risk margin defined in Equation (21). We compare here the results obtained with the COT method described in Appendix D to those obtained from theory. Note that we cannot do this for the Merz-Wüthrich method, as it only gives the variation for the first year.

[Table 2 header: Standard Corr, True Value, SCOR No Jumps, SCOR Jumps, Merz-Wüthrich; numerical entries not recovered.]
We want to estimate the capital for each future calendar year. Our methodology is very close to the one for the first-year capital, using the normal approximation and Equations (34) and (35). Assuming F_{i−1} is known, we can generalise Equation (36) to get an expression for the conditional one-year change. We then use the normality assumption, thus obtaining a theoretical form for the tail value at risk, TVaR_κ(∆(i) | F_{i−1}), given the triangle developed up to calendar year i − 1. Our methodology, starting from a triangle of realized rvs, is to complete it R times using the Bernoulli model and to calculate TVaR_κ(∆(i) | F_{i−1}) on each completed triangle according to the formula of Equation (39). By taking the mean over all R triangles, we obtain the required capital for calendar year i; we sum these capitals up and multiply by the cost of capital (chosen here at 6%, as in the Solvency II directive) to obtain the risk margin.
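Under the normality assumption, the tail value at risk entering each yearly capital has a closed form, and the risk margin is the cost-of-capital-weighted sum of the yearly capitals. A minimal sketch of both steps (all parameter values are generic placeholders, not the paper's calibration):

```python
from statistics import NormalDist
import random

def tvar_normal(mu: float, sigma: float, kappa: float = 0.99) -> float:
    """Closed-form TVaR (expected shortfall) of a N(mu, sigma^2) random variable."""
    z = NormalDist().inv_cdf(kappa)
    return mu + sigma * NormalDist().pdf(z) / (1.0 - kappa)

# Monte Carlo cross-check of the closed form (generic parameters).
random.seed(0)
mu, sigma, kappa = 100.0, 15.0, 0.99
sample = sorted(random.gauss(mu, sigma) for _ in range(200_000))
tail = sample[int(kappa * len(sample)):]
mc_tvar = sum(tail) / len(tail)
assert abs(mc_tvar - tvar_normal(mu, sigma, kappa)) < 1.0

# Risk margin: sum the per-year capitals and apply the 6% cost of capital,
# as in the Solvency II directive. The per-year sigmas below are hypothetical.
capitals = [tvar_normal(0.0, s) for s in (12.0, 8.0, 5.0, 2.0)]
risk_margin = 0.06 * sum(capitals)
```

The closed form mu + sigma * pdf(z_kappa) / (1 - kappa) is the standard expected-shortfall formula for a normal variable; it is what the normality assumption buys here.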
In this case, we do not need to simulate truncated large triangles to make our comparison. Indeed, both the COT method and the theoretical simulation method work on small triangles. However, for the results to be comparable and to avoid too frequent "zero risk left" situations, we still use a truncated large triangle as for the first-year capital comparison, i.e., triangles of size 19 with n = 100,000 rvs. As for the first-year capital, we simulate 500 triangles from the process and, for each of them, calculate the capital required at each consecutive year and the risk margin, using R = 10,000 triangle completions for the theoretical simulation method and, for the COT method, b = 0.5 and p_b = 0 (without jump part) and b = 0.75 (long tail) with p_b from Equation (A5) (with "jump part"). In Table 3 and Figure 4, we can observe the results obtained on average and the measures of deviation over the 500 triangles.

Table 3. Statistics for the risk margin on the 500 simulated triangles. The average risk margin, the standard deviation of the risk margin around the average and the mean absolute and relative deviation (MAD/MRAD) from the true value are displayed.

As we just saw, assuming normality, the first-year capital of the COT method without jump part is an exact result. However, for i > 1 (still assuming normality), the method gives

However, by the Schwarz inequality, which also holds for conditional expectations (for any positive integrable random variable Y and any σ-algebra F, E[√Y | F] ≤ (E[Y | F])^{1/2}), the COT method without jumps systematically overestimates the true capital, as we can see in Figure 4. The overestimation is, however, not very large (Table 3), and the method replicates reasonably well the shape of the actual yearly capital. The average relative absolute error of the risk margin is 10.57% (see Table 3). We do not show the results for the capital at each year, but they lead to a similar message, with the error increasing over the years as the capital itself decreases. The same method with jumps is less successful, with 26.52% average relative absolute error. This error is always an underestimation, which also holds for each year. The error on the capital is always bigger with jumps. It is only at the end (calendar year 13 here) that the capital estimation is better with jumps, and these values are almost 0, so they are not very relevant for the risk margin.
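The inequality driving the overestimation can be checked numerically. A quick sketch, with a generic positive random variable standing in for the number of rvs still to be realized:

```python
import math
import random

# Numerical check of E[sqrt(Y)] <= sqrt(E[Y]) for a positive random variable Y
# (here Y plays the role of the remaining number of rvs to be realized).
random.seed(1)
ys = [random.randint(0, 1000) for _ in range(100_000)]
mean_of_sqrt = sum(math.sqrt(y) for y in ys) / len(ys)   # what the true capital tracks
sqrt_of_mean = math.sqrt(sum(ys) / len(ys))              # what COT without jumps tracks
assert mean_of_sqrt <= sqrt_of_mean
```

Since the capital is proportional to the square root of the remaining variance, replacing the expectation of the square root by the square root of the expectation can only push the estimate upward.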
In general, one can see (Tables 1 and 3) that, for our n-steps model, the COT method without jumps is the one that performs best. If we compute the autocorrelation of consecutive loss increments, we obtain a mean correlation of 5%, which is close to independence. The independence situation corresponds to the calibration of the COT method with b = 0.5, thus explaining why the COT method without jumps provides the best results. This raises the question of what value of b would give the risk margin closest to the benchmark. To answer this, we simulate another 100 triangles and calculate each time the risk margin with the benchmark method and with both COT methods, with and without jumps, for all values of the parameter b between 0.3 and 1 in steps of 0.01. For the COT method without jumps, we find that the fitted values of b lie between 0.52 and 0.53, which is very close to the b = 0.5 that we have been using. For the COT method with jumps, the mean best b is also close to 0.5. However, we find some best-b observations below 0.5, which stands for negative dependence between accident years and is not accepted by the COT method. In this case, the best b is much further from the value (0.75) that has been used by SCOR for real data. This is not unexpected, since the method yields rather poor results (Table 3) for our n-steps model. (In Appendix E, we discuss the numerical stability of the above approximations. Furthermore, in Appendix F, we discuss the one-year capital for the first period as a proportion of the sum of all the one-year capitals over all the periods.)
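The grid search over b described above can be sketched as follows; `rm_cot` is a hypothetical stand-in for the full COT risk-margin computation, which is not reproduced here:

```python
# Grid search for the COT dependence parameter b: choose the b in [0.3, 1]
# (step 0.01) whose risk margin is closest to the benchmark value.
def best_b(rm_benchmark, rm_cot, grid=None):
    grid = grid or [round(0.3 + 0.01 * k, 2) for k in range(71)]
    return min(grid, key=lambda b: abs(rm_cot(b) - rm_benchmark))

# Toy check: with a made-up objective minimized at b = 0.52, the search finds it.
assert best_b(0.0, lambda b: (b - 0.52) ** 2) == 0.52
```

In the paper's experiment, `rm_cot` would be the COT risk margin computed on a simulated triangle and `rm_benchmark` the theoretical simulation result for the same triangle.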

Conclusions
In this study, we have decomposed the various steps to reach the ultimate loss through a simple but realistic example, which is used in a variety of lines of business all over the world. The goal is to study the one-year change required by the new risk-based solvency regulations (Solvency II and the Swiss Solvency Test). Our example allows us to compute explicit analytical expressions for the variables of interest and thus to test two methods used by actuaries to derive the one-year change: the Merz-Wüthrich method and the COT method developed at SCOR. We find that the COT method is able to reproduce the model properties quite well while Merz-Wüthrich is not, even though the assumptions behind both methodologies are not satisfied in our example (the COT methodology is therefore more robust). It is thus dangerous to use the Merz-Wüthrich method without making sure that its assumptions are met by the underlying data. Even though this seems obvious at first, the authors feel the need to warn about the risk of applying the Merz-Wüthrich methodology uncritically to any non-life portfolio, ignoring whether the method's assumptions are fulfilled, as this has been the tendency among practitioners, regulators and auditors in recent years.

Acknowledgments:
The authors would like to thank Christoph Hummel for suggesting the binomial process and acknowledge his first contribution with a model of tossing a coin. The authors would also like to thank Arthur Charpentier for providing the R code of the Merz-Wüthrich method and for discussions. We would also like to thank the anonymous referees for their useful remarks.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Proof of an Equality in Distribution
In this appendix, we examine the equality in distribution formulated in Equation (3). The reason for it is that both sides are a sum of N_k Bernoulli random variables with the same parameter, so that, no matter what the distribution of N_k is, both sides have the same distribution. This can be proved rigorously by calculating both characteristic functions. The characteristic function of a random variable Y is defined as φ_Y(t) = E[e^{itY}] and uniquely determines the distribution of the random variable. Conditioning on N_k and using the fact that the X_l are i.i.d., the characteristic function of the sum is
$$\varphi(t) = \sum_j \mathbb{E}\left[e^{itX}\right]^j \, \mathbb{P}(N_k = j) = \sum_j \left(1 - p + p e^{it}\right)^j \, \mathbb{P}(N_k = j).$$
The same calculations and arguments give the same characteristic function for both sides of Equation (3), thus proving the equality in distribution. The calculations can be pushed further. Note that, if k = 1 and N_1 ∼ U({0, ..., n}), the sum above is geometric, and we can write
$$\varphi_{S_{N_1}}(t) = \frac{1}{n+1} \sum_{j=0}^{n} \left(1 - p + p e^{it}\right)^j = \frac{\left(1 - p + p e^{it}\right)^{n+1} - 1}{(n+1)\, p \left(e^{it} - 1\right)}.$$
We therefore have a closed form for the characteristic function of S_{N_1}.
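The geometric-sum closed form can be checked numerically against a direct enumeration of the mixture distribution. A sketch, assuming X_l i.i.d. Bernoulli(p) and N_1 uniform on {0, ..., n} as above (the values of n, p and t are arbitrary):

```python
import cmath
import math

# Closed form: geometric sum of the Bernoulli characteristic function over j = 0..n.
n, p, t = 12, 0.3, 0.7
c = 1 - p + p * cmath.exp(1j * t)                   # char. function of one Bernoulli rv
closed = (c ** (n + 1) - 1) / ((n + 1) * (c - 1))

# Direct computation: S | N_1 = j ~ Binomial(j, p), so enumerate the mixture.
def binom_pmf(j: int, s: int) -> float:
    return math.comb(j, s) * p ** s * (1 - p) ** (j - s)

direct = sum(
    binom_pmf(j, s) * cmath.exp(1j * t * s) / (n + 1)
    for j in range(n + 1)
    for s in range(j + 1)
)
assert abs(closed - direct) < 1e-12
```

Both routes give the same complex number, confirming that the conditional Binomial mixture and the geometric closed form describe the same distribution.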

Appendix B. The Distributions of N i and of D(i)
Proposition A1. The distribution of N_i, for 1 ≤ i ≤ n − 1, is given by (A1), and P[N_n = k] = P[N_{n−1} = k].
Proof. Indeed, the first value of the probability follows directly. We can then write the distribution for n = 2, then for n = 3, and continue iteratively up to n − 1 to obtain (A1). We should note here that, because of Condition (H), for the last distribution we have P[N_n = k] = P[N_{n−1} = k]. Alternatively, recalling that N_n + N_{n−1} = n − A_{n−2}, we come to the same conclusion.
Proposition A2. The distribution of D(i) conditioned on A_{i−1}, 2 ≤ i ≤ n, is, for 0 ≤ a_{i−1} ≤ n and x ∈ R, equal to
Proof. The formula follows directly from (14), using (15) and the fact that S_i has a binomial distribution.

Appendix C. The Reserves in Our Model
For our model with only one row, i.e., one rv process, the reserves at i < n years (steps) take a simple form. In case we have a claims triangle with m rows and columns, where each row is our process with n exposures, the reserves at calendar year i are obtained by summing over the rows. The idea behind the Merz-Wüthrich method is to calculate the Chain-Ladder estimation of the ultimate uncertainty at time 0 and at time 1, after adding the next diagonal. The uncertainty of the difference then gives the one-year uncertainty. We illustrate this in Figure 5 by showing the addition of one diagonal.

Figure 5. The current triangle (left) and the next-year triangle (right). The present is outlined in blue and the one-year-ahead future in red. The Merz-Wüthrich method allows for computing the ultimate on both triangles and calculating the uncertainty of the difference.
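The re-estimation step behind the Merz-Wüthrich idea can be illustrated on a toy cumulative triangle. The numbers below are invented for illustration and are not from the paper:

```python
import numpy as np

# Toy cumulative claims triangle (invented values); np.nan marks the future.
tri = np.array([
    [100.0, 150.0, 165.0],
    [110.0, 160.0, np.nan],
    [120.0, np.nan, np.nan],
])

def cl_ultimates(t: np.ndarray) -> np.ndarray:
    """Chain-Ladder ultimates: factors from column sums over commonly observed rows."""
    ult = t.copy()
    for j in range(t.shape[1] - 1):
        obs = ~np.isnan(t[:, j + 1])                  # rows observed in columns j and j+1
        f = t[obs, j + 1].sum() / t[obs, j].sum()     # development factor j -> j+1
        miss = np.isnan(ult[:, j + 1])
        ult[miss, j + 1] = ult[miss, j] * f           # project the unobserved cells
    return ult[:, -1]

ult_now = cl_ultimates(tri)

# One year later: the next diagonal is observed (again invented values).
tri_next = tri.copy()
tri_next[1, 2] = 178.0
tri_next[2, 1] = 170.0
ult_next = cl_ultimates(tri_next)

# The one-year change whose uncertainty the Merz-Wüthrich method quantifies:
one_year_change = ult_next.sum() - ult_now.sum()
```

Adding the diagonal updates the development factors and hence the ultimate estimate; the Merz-Wüthrich formula gives the mean square error of exactly this change.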

Appendix D. The COT Method
The COT formula used at SCOR (Ferriero 2016) consists of first computing the ultimate risk, in our case the TVaR 99% of a B(Ñ, p) distribution, and then taking a portion of it, to be determined, as the required capital for each year. The idea behind the COT formula is to look at the evolution of the risk over time until the ultimate, and thus obtain the one-year period risks as portions of the ultimate risk. In what follows, we give a brief description of the COT formula; the detailed derivation is presented in Ferriero (2016). Here, Ñ designates the number of exposures that remain to be included in the whole triangle. The one-year capitals are written in terms of the risk measure ρ_κ(X) = xTVaR_κ(X) = TVaR_κ(X) − E(X), with κ = 99%.
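The ultimate risk measure ρ_κ can be estimated by plain Monte Carlo for a Binomial ultimate. A sketch with generic parameters (not the paper's calibration):

```python
import random

# Monte Carlo estimate of rho_kappa(X) = xTVaR_kappa(X) = TVaR_kappa(X) - E(X)
# for X ~ Binomial(N_tilde, p). N_tilde and p are generic placeholders.
random.seed(2)
N_tilde, p, kappa, n_sims = 400, 0.5, 0.99, 10_000

sample = sorted(sum(random.random() < p for _ in range(N_tilde)) for _ in range(n_sims))
tail = sample[int(kappa * len(sample)):]   # worst (1 - kappa) share of outcomes
tvar = sum(tail) / len(tail)
mean = sum(sample) / len(sample)

# The ultimate risk; the COT formula then allocates portions of it per year.
xtvar = tvar - mean
```

The COT pattern then distributes this single ultimate figure across the one-year periods.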
The vector δ = (δ_1, ..., δ_{n−1}) is called the COT pattern and is obtained from the incremental calendar-year pattern γ = (γ_1, ..., γ_{n−1}) through the relation given in Ferriero (2016).
In short, this model is based on the idea that claims develop partially along a "good" continuous part and partially according to a "bad" part characterized by sudden jumps, the bad part being modelled as the total rest of the claims realising at once. The variable p_b is a coefficient in [0, 1] that determines in which proportion the evolution is continuous or discrete, and b ∈ [0.5, 1] models the dependence between different calendar years.
In order for the COT method to be exact, the following assumptions must hold:
1. The evolutions of the claims losses and of the best estimates are stochastic processes as described in Ferriero (2016); roughly speaking, the relative losses evolve from start to end as a Brownian motion, except during a random time interval in which they evolve as a fractional Brownian motion, and consequently the best estimates evolve as the conditional expectation of the ultimate loss plus a sudden reserves jump, which may happen as a result of systematic under-estimation of the losses.
2. The volatility, measured in standard deviations, of the attritional claims losses is small relative to the ultimate loss size.
However, the COT method is robust in the sense that it gives good estimates even when the assumptions are not fulfilled, as we show here with our example.

Appendix E. Numerical Stability
In Section 4.2, we calculate the risk margin by simulating triangle completions according to our n-steps model. It is therefore legitimate to ask whether the R = 10,000 simulations we use are enough to obtain stable results. To investigate this question, we simulate an n-steps model triangle. We then calculate its risk margin 200 times using our simulation method with a grid of values of R. From these, we calculate the mean and the standard deviation of the computed risk margins for each value of R. The distribution of the calculated risk margin being approximately normal due to the central limit theorem, the mean and standard deviation fully characterize the distribution, allowing us in particular to draw confidence intervals. We chose for this test R ∈ {10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10,000, 20,000} and obtained the values presented in Table A1.
The results indicate that the standard deviation of the risk margin calculation is inversely proportional to √R, as one would expect. The mean prediction is almost the same no matter what the number of simulations is, even though the variation of this mean around 3.28 diminishes as R grows. The value R = 10,000 that we have used seems in any case sufficient since, for this value, the 95% confidence interval represents a variation of only ±0.34% around the mean.

Table A1. Test of the number R of random triangle completions for the risk margin calculation. The mean and standard deviation allow for constructing a Gaussian 95% confidence interval by adding (resp. subtracting) 1.96 times the standard deviation to the mean to obtain the upper (resp. lower) bound. The "Variation" column designates the variation around the mean that represents the 95% confidence interval, i.e., 1.96 times the standard deviation.
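The 1/√R behaviour is what one expects for any simulation-based estimator. A sketch of the experiment, with a toy sample-mean estimate standing in for the full risk-margin computation:

```python
import random
import statistics

# Toy version of the stability test: the standard deviation of a simulation-
# based estimate should shrink like 1/sqrt(R) as the number of completions R grows.
random.seed(3)

def estimate(R: int) -> float:
    # Stand-in for one risk-margin calculation based on R triangle completions.
    return statistics.fmean(random.expovariate(1.0) for _ in range(R))

def sd_of_estimate(R: int, repeats: int = 200) -> float:
    # Repeat the calculation, as in the paper, and measure its spread.
    return statistics.stdev(estimate(R) for _ in range(repeats))

sd_small, sd_large = sd_of_estimate(100), sd_of_estimate(10_000)
ratio = sd_small / sd_large   # expected to be near sqrt(10_000 / 100) = 10
```

Multiplying R by 100 divides the spread by about 10, matching the pattern in Table A1.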

[Table A1 columns: R, Mean, Standard Dev., Confidence Interval, Variation; numerical entries not recovered.]

Appendix F. The One-Year Capital for the First Period as a Proportion of the Total

Table A2. Statistics of the proportion of the capital represented by the first year as a function of the number of rvs n. Note that the number of rvs modifies the number of steps I = 5 + log(n) + 1.57.

The results are quite independent of the number of rvs and show no obvious pattern of development. The size of the changes between different values of n is much smaller than the standard deviation, indicating that n has no (or no significant) effect on the ratio of interest. The standard deviation shows no sign of correlation with the number of realized rvs per line, and the maximal observation is slightly more volatile than the minimal. However, both are very stable, giving no indication that n might have any significant effect on the ratio.

Let us give some intuition behind these results. In our model, with the exception of the move from the penultimate column to the ultimate, for which all not-yet-realized exposures are forced to be realized, each move forward in the triangle means, in expectation, halving the number of remaining rvs to be realized. Since we use 2000 triangle completions, we can assume that we are at the expectation. The variance is proportional to the number of rvs remaining. This means that the TVaR, which is proportional to the standard deviation (under the normality assumption), is proportional to the square root of the same number. The crucial number in calculating the capital is therefore the expectation of the square root of the number of rvs, as described in Equations (38) and (40). The expectation of the square root is multiplied every calendar year by a factor that is almost constant, except near the end. This factor depends on the triangle, unlike the square root of the expected variance (which is approximately multiplied by 1/√2 at every step). However, it is generally close to 0.69. Therefore, the ratio of the first-year capital K_1 to the total is approximately
$$\frac{K_1}{\sum_{j=1}^{m-1} K_1 \cdot 0.69^{j-1}} \approx \frac{1}{\sum_{j=1}^{\infty} 0.69^{j-1}} = 1 - 0.69 = 0.31.$$
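The final approximation is a simple geometric-series computation, which can be verified directly:

```python
# Verification of the geometric-series approximation for the first-year share:
# if the yearly capitals decay like 0.69^(j-1), the first year's share of the
# total is 1 / sum_j 0.69^(j-1) = 1 - 0.69 = 0.31.
q = 0.69
share = 1.0 / sum(q ** (j - 1) for j in range(1, 200))  # truncated series
assert abs(share - (1.0 - q)) < 1e-12
```

The truncation at 200 terms is immaterial since the remainder of the series is negligible.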