Understanding Reporting Delay in General Insurance

The aim of this paper is to understand and to model claims arrival and reporting delay in general insurance. We calibrate two real individual claims data sets to the statistical model of Jewell and Norberg. One data set considers property insurance and the other one casualty insurance. For our analysis we slightly relax the model assumptions of Jewell allowing for non-stationarity so that the model is able to cope with trends and with seasonal patterns. The performance of our individual claims data prediction is compared to the prediction based on aggregate data using the Poisson chain-ladder method.


Introduction
The aim of this paper is to understand the reporting delay function of general insurance claims.The reporting delay function is an important building block in claims reserving.We study this reporting delay function in continuous time based on real individual claims data.The data is used to calibrate a non-stationary version of the statistical model considered in Jewell [1,2].This calibration then provides an estimation of the number of incurred but not yet reported (IBNYR) claims.The second building block in claims reserving is the modeling of the cost process of insurance claims.Jewell [1] stated in 1989: "Currently, the development of a good model for cost evolution over continuous time appears to require a long-term research effort, one that we believe will use the basic understanding of the event generation and reporting processes developed here, but will require much additional empirical effort to develop an understanding of cost-generating mechanisms and their evolution over time."Meanwhile, there have been some improvements into this direction, see Bühlmann et al. [3], Arjas [4], Norberg [5,6], Haastrup-Arjas [7], Taylor [8,9], Herbst [10], Larsen [11], Taylor et al. [12], Jessen et al. [13], Rosenlund [14], Pigeon et al. [15], Agbeko et al. [16], Antonio-Plat [17] and Badescu et al. [18,19].But we believe that state-of-the-art modeling is still far from having a good statistical model that can be used in daily industry practice.This may partly be explained by the fact that it is rather difficult to get access to real individual claims data.
In this paper we study individual claims data of two different portfolios: a property insurance portfolio and a casualty insurance portfolio.We choose explicit distributional models for the individual claims arrival process modeling and we calibrate these models dynamically to the data.The calibration is back-tested against the observations and compared to the (Poisson) chain-ladder method (which is applied to aggregated data).The main conclusion is that the chain-ladder method has a good performance as long as the claims process is stationary, but in non-stationary environments our individual claims estimation approach clearly outperforms the chain-ladder method.
In the next two sections we introduce the underlying statistical model and we describe the available claims data.In Section 4 we model seasonality of the claims arrival process.In Section 5 we calibrate the reporting delay function and underpin this by statistical analysis.Finally, in Section 6 we compare the individual claims modeling estimate to the classical chain-ladder method on aggregated data and we back-test the results of these two approaches.The figures and the proofs are deferred to the appendix.

Individual Claims Arrival Modeling
We extend the model considered in Jewell [1,2] to an inhomogeneous marked Poisson point process.We define Λ(t) ≥ 0 to be the instantaneous claims frequency and w(t) ≥ 0 to be the instantaneous exposure at time t ≥ 0. The total exposure W Λ on time interval (0, τ m ] is given by We consider the run-off situation after time τ m meaning that the exposure expires at time τ m , i.e., w(t) = w(t)1 {t≤τ m } for t ≥ 0. This implies that the total exposure is assumed to be finite W Λ < ∞.
Assume that the claims of the total exposure W Λ on (0, τ m ] occur at times T (called accident dates or claims occurrence dates) and the claims counting process (N(t)) t≥0 is given by for t ≥ 0 and the total number of claims is given by Model Assumptions 1.We assume that the claims counting process (N(t)) t≥0 is an inhomogeneous Poisson point process with intensity (w(t)Λ(t)) t≥0 .
The second ingredient that we consider is the reporting date.Assume that a given claim occurs at time T then we denote its reporting date at the insurance company by S ≥ T .For accident date T and corresponding reporting date S of claim we define the reporting delay by This motivates the study of the following inhomogeneous marked Poisson point process.
From Jewell [1,2] and Norberg [5,6] we immediately obtain the following likelihood function for parameters Λ and Θ L N,(T ,S ) =1,...,N (Λ, where f U|t,Θ denotes the density of F U|t,Θ (for parameter Θ).Observe that we use a slight abuse of notation here.In Norberg [5,6] there is an additional factor N! because strictly speaking Model Assumptions 2 consider ordered claims arrivals T ( ) ≤ T ( +1) for all = 1, . . ., N − 1, whereas in Jewell [1] and in Equation (2) claims arrivals are not necessarily ordered.The aim is to calibrate this model to individual claims data, that is, we would like to calibrate claims frequency Λ and reporting delay distribution F U|t,Θ .One difficulty in this calibration lies in the fact that we have missing data, because information about occurred claims with reporting dates S > τ is not available at time τ ≥ τ m .Therefore, we only observe a thinned inhomogeneous Poisson process, we also refer to Norberg [5,6].
We choose τ ≥ τ m .This implies that all claims have occurred at time τ, providing N = N(τ) = N(τ m ).By M = M(τ) ≤ N we denoted the number of claims that are reported at time τ.The intractable likelihood (2) is then converted to, for details we refer to Jewell [1], where we only consider reported claims with S ≤ τ and π Λ,Θ (τ) denotes the probability that an incurred claim is reported by time τ ≥ τ m .This is given by

Description of the Data
For the statistical analysis we consider two different European insurance portfolios: (1) line of business (LoB) Property, and (2) LoB Casualty.For both portfolios data is available from 1/1/2001 until 31/10/2010.In Figures 1-3 we illustrate the data.Generally, LoB Property is colored blue and LoB Casualty is colored green (depending on the context special features may also be highlighted with other colors, this will be described in the corresponding captions).Figure 1 gives daily claims counts on the left-hand side (lhs) and monthly claims counts on the right-hand side (rhs).The following needs to be remarked for Figure 1:

•
The monthly claims counts on the rhs show a clear annual seasonality.

•
The daily claims counts on the lhs show a weekly seasonality with blue/green dots for weekdays, violet dots for Saturdays and orange dots for Sundays, in Table 1 we present the corresponding statistics.

•
In general, these graphs are decreasing because of missing IBNYR claims (late reportings) that affect younger accident years more than older ones.
In Figure 2 (lhs) we give the daily claims reporting and Figure 2 (rhs) plots accident dates T versus reporting delays U = S − T :

•
Daily reporting differs between weekdays (blue/green) and weekends (violet for Saturdays and orange for Sundays).Basically there is no reporting on weekends because claims staff in insurance companies does not work on weekends, however there is a visible change in LoB Property after 2006.

•
There is a change in reporting policy in LoB Property after 2006 (top, lhs), this is visible by the change of reportings on weekends (and will become more apparent in the statistical analysis below).We do not have additional information on this, but it may be caused by web-based reporting and needs special attention in modeling.We call 1/1/2006 "break point" in our analysis because it leads to non-stationarity, this will analyzed in detail below.

•
Figure 2 (rhs) gives the accident dates T versus the reporting delays U = S − T .We observe that the big bulk of the claims has a reporting delay of less than 1 year, and for both LoBs the resulting dots are located densely for U ≤ 1. Bigger reporting delays are more sparse and LoB Casualty has more heavy-tailed reporting delays than LoB Property, the former having several claims with a reporting delay U of more than 3 years.
Finally, Figure 3 gives the box plots on the yearly scale of the logged reporting delays log(U ): • LoB Property has a change in reporting policy that leads to a faster reporting after break point 1/1/2006.

•
The graphs are generally decreasing because IBNYR claims (late reportings) are still missing, this corresponds to the upper-right (white) triangles in Figure 2 (rhs).

Likelihood Function with Seasonality
We discuss the modeling of the claims frequency Λ = (Λ(t)) t≥0 which can be any measurable function having finite integral on interval (0, τ m ].Time t ∈ R is measured in daily units (unless stated otherwise).In order to get an appropriate model we study annual and weekly seasonality that both influence Λ, see Figure 1.For the weekly seasonality we choose a stationary periodic pattern.This is an appropriate choice unless the insurance product or the portfolio changes.For the annual seasonality we split the time interval (0, τ m ] into smaller sub-intervals on which statistical estimation is carried out.These smaller time intervals (τ i−1 , τ i ] are given by finitely many integer-valued endpoints for 1 ≤ i ≤ m.Typically, ∆τ i corresponds to a calendar month: this is naturally given if data becomes available on a monthly time scale.The following has to be considered: (i) the monthly time grid is not equidistant because months differ in the number of days; (ii) months and weekdays do not have the same periodicity; (iii) ∆τ i should be sufficiently large so that reliable estimation can be done, and sufficiently small so that we have homogeneity on these time intervals; and (iv) smoothing between neighboring time intervals can be applied later on.Such a (monthly) seasonal split is reasonable because often insurance claims are influenced by external factors (such as winter and summer) that (may) only affect bounded time intervals for claims occurrence.This approximation can be seen as a reasonable modeling assumption; if more randomness is involved then we should switch to a hidden Markov model, such as the Cox model presented in Badescu et al. [18,19].
On this time grid we then make the following assumptions: for all 1 ≤ i ≤ m and t ∈ (τ i−1 , τ i ] with weekly periodic (piece-wise constant) pattern λ t = λ t = λ t+7 for all t > 0 fulfilling normalization ∑ 7 k=1 λ k = 7 and global parameter Λ i for interval (τ i−1 , τ i ].We remark the following:

•
We have a weekly periodic piece-wise constant pattern that is assumed to be stationary and a (monthly) seasonal parameter Λ i .The total exposure on (τ i−1 , τ i ] is given by λ k in general differs from ∆τ i because different months may have different weekday constellations.

•
In the special case of λ t ≡ 1 we obtain the piece-wise homogeneous case This is a step function for the claims frequency providing total exposure W (i) Λ = w i Λ i ∆τ i .Note that model (7) was studied in Section 4.2 of Antonio-Plat [17].For our real data examples, the influence of the additional weekly periodic parameter (λ k ) k=1,...,7 will be visualized in Figure 4, below.

•
We could also choose a yearly seasonal pattern for (Λ i ) 1≤i≤m if, for instance, ∆τ i correspond to calendar months.This is supported by Figure 1 (rhs) and would reduce the number of parameters.This is particularly important for claims prediction, i.e., for predicting the number of claims of future exposure years.In our analysis we refrain from choosing additional structure for (Λ i ) 1≤i≤m because we will concentrate on inference of past exposures and because the volumes of the two LoBs are sufficiently large to get reliable inference results on a monthly scale.Time grid (4) defines a natural partition on which the inhomogeneous marked Poisson point process decouples into independent components, see Theorem 2 in Norberg [6].The (τ-observable) likelihood on time interval (τ i−1 , τ i ] under the above assumptions is given by where M i = M i (τ) denotes the number of reported claims at time τ ≥ τ m with accident dates T (i) ∈ (τ i−1 , τ i ] and corresponding reporting dates S (i) ≤ τ for = 1, . . ., M i .The probability that an incurred claim with accident date in (τ i−1 , τ i ] is reported at time τ ≥ τ i simplifies to Observe that probability π (i) Θ (τ) only depends on the (weekly-) seasonal pattern (λ t ) t∈(τ i−1 ,τ i ] and on Θ, but not on global parameter Λ i of time interval (τ i−1 , τ i ].
Formula (8) specifies the likelihood on time interval (τ i−1 , τ i ].Due to the independent splitting property of inhomogeneous marked Poisson processes under partitions the total likelihood function at time τ ≥ τ m is given by In the estimation procedure below we assume that the weekly periodic pattern (λ k ) k=1,...,7 is given and parameters (Λ i ) 1≤i≤m and Θ are estimated with maximum likelihood estimation (MLE) from (9), based on the knowledge of (λ k ) k=1,...,7 .In fact, in the applications below we will use a plug-in estimate for (λ k ) k=1,...,7 .We could also consider the full likelihood, including (λ k ) k=1,...,7 , but for computational reasons we refrain from doing so.

Analysis of the MLE System
In this section we derive the maximum likelihood estimate (MLE) of (Λ i ) 1≤i≤m and Θ based on the knowledge of the weekly periodic pattern (λ k ) k=1,...,7 .We calculate the derivatives of the logarithms of ( 9) and ( 8), respectively, and set them equal to zero to find the MLE.This provides the following lemma.Lemma 3.Under Model Assumptions 2 with exposure (5), the MLE (( Λ (τ) i ) 1≤i≤m , Θ (τ) ) of parameter ((Λ i ) 1≤i≤m , Θ) at time τ ≥ τ m for given weekly periodic pattern (λ k ) k=1,...,7 is obtained by the solution of The proof is given in Appendix A. Observe that for known weekly periodic pattern (λ k ) k=1,...,7 the MLE decouples in the sense that the MLE of Θ can be calculated independently of (Λ i ) 1≤i≤m .This substantially helps in the calibration below because it reduces complexity.Secondly, we remark that the function gives a density on (τ i−1 , τ i ] × (0, τ].

Calibration of the Weekly Periodic Pattern
The MLE in Lemma 3 assumes that the weekly periodic pattern (λ k ) k=1,...,7 is given (and known).In this section we compute a plug-in estimate ( λ k ) k=1,...,7 .This has the advantage that the MLE remains tractable.We estimate this weekly periodic pattern (λ k ) k=1,...,7 under one additional assumption which we only use for this purpose (and drop again thereafter): assume τ ≥ τ m is fixed and that there exists m * ∈ {1, . . ., m} such that for all t ≤ τ m * and all Θ.Assumption Equation (11) We may examine the robustness of these estimates by choosing different time horizons τ m * , this is done in Figure 5 (lhs); these graphs also show the confidence bounds we explain how to construct.We have The latter ratio considers the realization of two independent Poisson distributed random variables with means and variances, respectively, For this reason we approximate X i Following Hinkley [20] we can study the asymptotic behavior of the latter using This allows us to derive approximate confidence bounds of the weekly periodic pattern estimates.Choose confidence level α ∈ (1/2, 1), then we get a two-sided confidence bound estimate with for p ∈ (0, 1) Replacing all parameters by their MLEs given by Lemma 4 and w i Λ i = M i (which is the MLE for i ≤ m * under assumption ( 11)) we get an estimate for the confidence bounds (12).These are plotted in Figure 5.
In Figure 5 (rhs) we present the resulting MLEs ( λ k ) k=1,...,7 and the corresponding (estimated) confidence bounds for confidence level α = 90%.We observe narrow confidence bounds and substantial daily differences.In particular, claims frequencies in LoB Casualty are much lower on weekends than on weekdays (this may suggest that we consider commercial casualty insurance business).For LoB Property we observe higher frequencies on Fridays and Saturdays, also this is directly related to the underlying business.Figure 5 (lhs) gives the corresponding time series as a function of τ m * .We observe convergence of the estimates after roughly 3 years of observations.The confidence bounds in all plots are given by ( 12) for confidence level α = 90%.

Calibration of the Reporting Delay Distribution
In this section we study the choice of the reporting delay distribution F U|t,Θ .In an empirical analysis we identify three regimes of reporting delays which we will model separately.In short, (1) small reporting delay layer where we consider a weekday structure, see Figure 6; (2) middle reporting delay layer with the main bulk of reportings, see Figure 14 (top); and (3) large reporting delay layer that should have an appropriate tail for late reportings, see Figure 14 (bottom).We call these the small, middle and large layers, and label them by n = 1, 2, 3.

Decoupling of the Reporting Delay Distribution
As introduced in (4), we choose a monthly time grid 0 = τ 0 < τ 1 < . . .< τ m with τ m being the end point of the last observed calendar month.The monthly time grid is naturally given because available data is provided on that time scale.From a statistical point of view also finer or wider time grids are possible.In LoB Property we have about 400 claims per month which gives a coefficient of variation of 5% (for the Poisson distribution) and in LoB Casualty we have between 80 and 100 claims which gives a coefficient of variation of roughly 10%, see Figure 1 (rhs).If we would have stationarity we could (or even should) take bigger time intervals, but because of the yearly seasonal pattern, monthly time intervals are preferred to capture these seasonal differences.
The end point τ m of the last observed calendar month will also be considered as a variable that evolves when more and more information becomes available.First (rather limited) information is available at 31/1/2001 and latest available information is as of 31/10/2010.Thus, τ m will run from 31/1/2001 to 31/10/2010 and we perform dynamic calibration based on actual information.
We make assumption (5) on that monthly time grid and we remark that the monthly time grid is not equally spaced in number of days.Therefore it is convenient to measure time t in daily units.We then assume that the weekly seasonal pattern (λ k ) k=1,...,7 is estimated (and fixed, see Figure 5) through Lemma 4 (and we drop the upper hat in the notation of λ k ).Note that fixing the weekly periodic pattern reduces the computational complexity in the sequel.
Next we need to choose the reporting delay distribution F U|t,Θ .We choose three layers with where the probability weights p 1) , u (n) ) for n = 1, 2, 3. We make the following assumptions: Assumption 5. We choose time units in days and make the following (additional) assumptions for the density in (13): For t ∈ (τ i−1 , τ i ] and n = 1, 2, 3 we assume p This assumption says that the density of the small layer depends on weekdays t and the remaining terms only depend on the accident month (τ i−1 , τ i ].The cumulative distribution function is under Assumption 5 for t ∈ (τ i−1 , τ i ] given by This split in layers again defines a partition and the likelihood decouples into independent parts; see also Theorem 2 of Norberg [6].The log-likelihood of ( 9) is then at time τ m given by with M i,n = M i,n (τ m ) being the number of reported claims at time τ m with accident dates . The total exposures for this partition are given by The probability that these claims are reported at time τ m is given by Note that n) because in that case all claims have been reported at time τ m with reporting delay less than u (n) .This may substantially simplify the analysis in the lower and middle layers n = 1, 2, and is similar to (11).

Model in the Small Reporting Delay Layer
We start by considering the small reporting delay layer [u (0) , u (1) ).Data shows that reporting delays have a weekly pattern because claims divisions do not (necessarily) work at weekends and a claim occurring, for instance, on a Saturday can only be reported on Monday.This is illustrated in Figure 6.This indicates that we need a (week-) daily modeling approach.In order to not over-parametrize our model we try to keep this (week-) daily modeling layer as small as possible.The canonical choice then is to set u (1) = 7 because after one week all claims have experienced a full weekly cycle and reporting should be on a similar level for all weekdays, this is supported by Figure 6 (middle), though not fully.
We could now try to maximize the log-likelihood (15) by brute force.Observe that this includes a coupling between all layers through exposures W U|t,Θ .This is unpleasant from a computational point of view, and we therefore propose an approximation.
For i < m we have τ m ≥ u (1) + τ i which implies π (i,1) Θ (τ m ) = 1 for all i < m.For i = m we have from (17) and under Assumption 5, assumption λ t = λ t and a change of variable U|t,Θ = p 1 and p U|t,Θ = p 2 (we omit possible time dependence in the notation of Θ n and p n ), then the coupling of the lower layer with the other two layers happens through π (m,1) U|t,Θ = p 1 and p (18) the latest 7 days of observations, i.e., claims with occurrence dates T (m,1) ∈ (τ m − u (1) , τ m ], then the calibration of the lower layer density completely decouples from the other two layers because we have full information (no missing data) for accidents with occurrence dates in (τ 0 , τ m − u (1) ] at time τ m .This is indicated by the dashed red line in Figure 7 (lhs).In most cases this provides a reasonable approximation because the last u (1) = 7 days will not completely change the calibration of the lower reporting delay distribution F (1) U|t,Θ 1 if we have observations over, say, 10 years ≈ 3652 days.For this reason we shorten the last time interval to (τ m−1 , τ m − u (1) ] which provides in view of (15 where M m,1 denotes the number of reported claims at time τ m with accident dates T (m,1) ∈ (τ m−1 , τ m − u (1) ] and reporting delays U (m,1) ∈ [0, u (1) ).The first component Θ 1 of Θ is assumed to fully characterize density f  Next we discuss the explicit choice of f (1) U|t,Θ 1 . In Figure 6 (lhs) we plot the empirical distribution of all claims with accident dates before 01/2006 and a maximal reporting delay of 365 days.Figure 6 only shows the claims with reporting delays U < u (1) = 7, i.e., belonging to the small layer.The lhs shows the individual delay distributions per weekday of occurrence, the middle pictures the same distributions but compressed by the weekends (because there are (almost) no reportings on Saturdays and Sundays) and the rhs shows the compressed graph that is normalized to 1 at time u (1) .The graphs indicate that we should start by modeling weekdays individually.We make the following Ansatz: choose discrete distributions with, for u = 0, . . ., u (1) The second identity in (20) implies that we obtain a weekly periodic reporting delay distribution with 42 parameters Θ 1 = (θ s (u)) 0≤u≤u (1) −1,1≤s≤u (1) −1 , if we assume stationarity (20) on the weekly time grid.We may even ask for more parameters because of potential non-stationarity of the reporting behavior, see Figure 2. We refrain from doing so but we use a rolling window to detect and capture non-stationarity.We should also remark that the 42 parameters raise questions about over-parametrization.We will investigate this question below and we will find that we can reduce the number of parameters in LoB Casualty, in LoB Property we will work with parametrization (20) which will provide rather stable results due to sufficient volume in this LoB.
Optimization problem (19) provides Lagrangian with Lagrange multipliers χ = (χ s ) 1≤s≤u (1) The MLE of ( 19) is then found by setting the derivatives of L τ m (Θ 1 , χ) w.r.t.Θ 1 and χ equal to zero and solving this system of equations.For s = 1, . . ., u (1) and u = 0, . . ., u (1) − 1 we define M s,u = M s,u (τ m ) to be the number of reported claims at time τ m with accident dates in (τ 0 , τ m − u (1) ], reported on weekday s and having reporting delay u (we think of s = 1 being Mondays and s = 7 being Sundays).The MLE of θ s (u) at time τ m is then given by θ The lower reporting delay layer distribution is at time τ m estimated by To capture potential non-stationarity we will choose a fixed window length K and consider this estimate based on observations in (τ (m−K)∨0 , τ m − u (1) ].

Empirics and Fitting the Small Reporting Delay Layer Distributions
We estimate distributions (21) in the lower reporting delay layer [u (0) , u (1) ) for the 2 LoBs.In Tables 2 and 3 we give the observed number of reported claims M s,u (τ m ) for weekdays 1 ≤ s ≤ 7 and reporting delays 0 ≤ u ≤ 6 = u (1) − 1 at time τ m = 31/10/2010 for claims with accident dates before 26/10/2010.We see that there is a weekly pattern with no reportings on weekends in LoB Casualty and fewer reportings on weekends in LoB Property.For the latter we estimate all parameters θ s (u) individually, for the former we may also discuss other approaches, for instance, compressing weekends and shift the weekdays in Table 3 to the left (which is done on the rhs of Table 3).Moreover, the numbers in Table 3 are rather small and of similar size which may also suggest that we should not distinguish different reporting delays.We investigate this more formally below.We start with LoB Property.We show in Figure 8 the resulting estimates (see (20)) with a rolling window of length 2•365 days (solid lines) which is compared to the estimate considering all observations (dotted lines).We clearly see the non-stationarity after the break point at 1/1/2006, and the rolling window seems to capture it rather well.Therefore, we do not use any other measures here, but work with the rolling window of length 2•365 days (solid lines).For LoB Casualty the non-stationarity is less obvious, see Figure 9.In fact, the resulting estimates with a rolling window of length 2•365 days (solid lines) are rather volatile which is a clear sign of over-parametrization.The dotted lines show the estimates based on all available observations, these are much smoother with a slight positive trend for some of the weekdays.At this point we could investigate more thoroughly this non-stationarity, we refrain from doing so because in this case study it may only marginally influence the estimation of the number of IBNYR claims: the potential trend has a very moderate slope which affects less then 10% of the claims in LoB Casualty (small reporting delay layer).For this reason, we simply choose a stationary model and we mainly aim at studying whether we can further reduce the number of parameters in Θ 1 .First, we compress the weekends (in Table 3 we go from the lhs denoted 0, . . ., 6 to the rhs denoted 0 * , . . ., 4 * ).Then we test the null hypothesis whether for all weekdays s = 1, . . ., 7 and compressed reporting delays u * = 1 * , . . ., 4 * we can choose the same (empirical) probability, and for all weekdays s = 1, . . ., 7 we can choose the same probability for reporting delay u * = 0 * , that is, we test the null hypothesis θ 1 (0 * ) = . . .= θ 7 (0 * ) (delay 0 * does not differ between weekdays s = 1, . . ., 7) and θ 1 (1 * ) = . . .= θ 7 (4 * ) (delays 1 * , . . ., 4 * and weekdays s = 1, . . ., 7 do not differ).We perform for every weekday s = 1, . . ., 7 a Pearson's χ 2 -test (for the information at time τ m = 31/10/2010).The corresponding test statistics is   We provide the resulting p-values in Table 4.The resulting p-value for claims with accident dates on Wednesdays s = 3 is 3.2% and for all other weekdays s we obtain p-values bigger than 20%.We consider these p-values to be sufficiently large so that we do not reject the null hypothesis.This leads to a substantial reduction in the number of parameters, and we only choose three different values for Θ 1 = (θ s (u)) 0≤u≤u (1) −1,1≤s≤u (1) at any time point τ m in this reduced case (the third one being 0% for weekends).In Figure 10 we show the results, the solid line gives the estimates under the null hypothesis and the dotted lines the estimates of the model with 42 parameters.For LoB Casualty we choose this reduced model (under the null hypothesis).
Using (18) and the fact that we choose a step function for F (1) U|t,Θ 1 we obtain estimated probability in the small reporting delay layer given by π (m,1) The results are presented in Figure 4 and they are compared to the case where we do not choose a weekly periodic pattern, that is, where we set (λ k ) k=1,...,7 ≡ 1.This latter model is the one used in Section 4.2 of Antonio-Plat [17].We see that the weekly periodic pattern essentially smooths the estimates, in particular, for weekends in LoB Casualty.This confirms the findings of Section 4.3 and, in particular, of Figure 5.For this reason we continue with the model allowing us for the modeling of a weekly periodic pattern (λ k ) k=1,...,7 .We fix the resulting estimates π (m,1) Θ (τm ) 1 (τ m ) and then calibrate the middle and large layers, this is explained next.22), for weekdays s = 1, . . ., 7 (with 4 degrees of freedom).
U|t,Θ per weekday s = t .Dotted lines show calibration based on individual weekdays and reporting delays and lines show calibration under compressed weekends and the null hypothesis that we only need three parameters.

Calibration of Middle and Large Reporting Delay Layers
We come back to log-likelihood (15).We replace in the small reporting delay layer parameter Θ 1 by its estimate Θ (τ m ) 1 derived in the previous subsection.This provides the log-likelihood at time τ m , we only show the terms including the unknown parameters (Λ, Θ −1 ) := (Λ, Θ 2 , Θ 3 , p 1 , p 2 ), where π (i,1) From this we compute the MLE of Λ = (Λ i ) i=1,...,m (see also Lemma 3): If we insert this back into (24) we get (only stating relevant terms for parameter estimation) where we have used Assumption 5 for f with n = 2, 3. Recall that we have normalization U|τ i ,Θ = 1 of the layer probabilities because the reporting delays need to be in one of the three layers.Assuming p (3) U|τ i ,Θ > 0 we can normalize these probabilities by dividing by this third probability and setting q (n) U|τ i ,Θ ≥ 0. From this we see that we can rewrite the last log-likelihood in terms of q = (q (1) To implement MLE of (26) there remains the calculation of π This is what we are going to discuss next.

Choice of Layers and Approximate Log-likelihood
We still need to specify threshold u (2) .The lower limit was chosen to be u (1) = 7 days.For u (2)  we test different reporting delays κ = 3, 6, 9 or 12 months.Note that κ months is not well-defined in terms of number of days.We set u (2) equal to 89, 181, 273 or 365 which is the minimal number of days that κ consecutive calendar months can have.The accident periods (τ i−1 , τ i ] with i ≤ m − κ are then fully observed in the middle layer at time τ m , and we have Thus, we only need to study m − κ + 1 ≤ i ≤ m in more detail for the middle reporting delay layer.As for the large reporting delay layer we see that there are no observations possible at time τ m for accident periods (τ i−1 , τ i ] with m − κ + 2 ≤ i ≤ m and, therefore, The remaining layers and probabilities are more involved, and we consider these next. Large reporting delay layer [u (2) , ∞) with n = 3.For i ≤ m − κ + 1 we have under Assumption 5 To simplify optimization (26) we assume that for reportings in the large layer the weekly periodic pattern (λ k ) k=1,...,7 only has a marginal influence.This is justified by the argument that between claims occurrence and reporting there are at least κ months of delay, and therefore the specific weekday of the accident should only marginally influence the late reporting (as long as a particular claim type cannot only occur on one specific weekday).For i ≤ m − κ this provides the approximation The situation i = m − κ + 1 is more delicate.We have u (2) ∈ {89, 181, 273, 365} which is the minimal number of days that κ consecutive calendar months can have, the maximal number of days being 92, 184, 276 or 366 days, respectively.Therefore, the following integral may also be non-zero on time interval (τ m−κ , (τ m ) ≤ 3/28 < 1/9 for all m.Therefore, this term only marginally influences the results (and it could also be skipped for parameter estimation but we will keep it).
Middle reporting delay layer [u (1) , u (2) ) with n = 2.We still need to treat the cases As we indicate below, this can be implemented and MLE can be performed.In order to speed up the MLE optimization we also approximate π Θ 2 (τ m ).However, this approximation is only used for parameter estimation, for the number of IBNYR claims estimation we will use the exact form Θ 2 (τ m ).For the approximation in MLE we also neglect weekday differences which provides for (u) = 0 for u ≤ u (1) .This provides the approximation because otherwise reporting delays belong to the small layer.
This allows us to approximate the log-likelihood ( 26) by (we also refer to Corollary 7, below) (1) where several of the π's and π's are either 0 or 1 (we give them in detail below).To calculate them we need the following lemma.
Lemma 6. Assume X ∼ F and choose x 1 < x 2 .We have Proof of Lemma 6.The proof follows by applying integration by parts.

3.
Large layer [u (2) , ∞).For m − κ + 2 ≤ i ≤ m we have probability π Finally, we need to choose explicit distribution functions for the reporting delay layers n = 2, 3.This is what we are going to do in the next subsection.

Choice of Explicit Distributions and Layer Probabilities
There remains the modeling of the reporting delay distributions in the middle and large layers as well as the relative layer probabilities q (n) We have considered different models, compared them to each other, checked them for robustness and applied statistical model selection criteria such as Akaike's information criterion.Our favorite model that is at the same time not too difficult and gives appropriate results is the following: (i) for the middle layer n = 2 we choose a stationary truncated log-normal distribution, (ii) for the upper layer n = 3 we choose a stationary shifted log-normal distribution, and (iii) we choose non-stationary relative layer probabilities q (n) τ i .We justify these choices by some statistical analysis below.Lemma 8. Assume X (2) has a truncated log-normal distribution with parameters µ 2 ∈ R and σ 2 > 0 supported in a non-empty interval [ν i−1 , ν i ] ⊂ R + .The density of X (2) is given by The distribution of X (2) is given by The expectation of X (2) on layer (x 1 , 3) has a shifted log-normal distribution with Z being log-normally distributed with parameters µ 3 ∈ R and σ 3 > 0. We have on the layer (x 1 , Proof of Lemma 8.The proof is a straightforward consequence of calculations with log-normal distributions, see also Section 3.2.3 in [21]. With these choices we revisit log-likelihood (27) which is now given by, set q = (q (1) , q (2) , α, γ), If we use the explicit truncated/translated log-normal distributions introduced in Lemma 8 there are 9 parameters (q (1) , q (2) , α, γ, µ 2 , σ 2 , µ 3 , σ 3 , u (2) ) involved in this optimization.Note that we use canonical translation ν = u (2) for the upper layer distribution F (3) U|t,Θ 3 .

Method: Log-normal
Next we analyze AIC and BIC of the optimal layer limit u (2) for the dynamic truncated/truncated log-normal model in LoB Property (for given γ = 1/4) and the static truncated/truncated log-normal model in LoB Casualty.The results are presented in Table 5 (bottom).We observe that we prefer for both LoBs a small threshold of u (2) = 3 months.

Model Calibration Over Time
In the previous subsection we have identified the optimal models as of τ m = 31/10/2010 using AIC and BIC.Observe that this is the optimal model selection with having maximal available data.This selection will be revised in this subsection because we study the models when incoming information increases over τ m ∈ {31/12/2001, . . ., 31/10/2010}.In Figures 12 and 13 we present the MLEs of parameters (µ 2 , σ 2 , µ 3 , σ 3 ) using ( 29) for (lhs) the truncated/truncated log-normal models and (rhs) the truncated/shifted log-normal models.LoB Property considers the dynamic version α > 0 with γ = 1/4 and LoB Casualty considers the static version α = 0. From this analysis we see that the parameters behave much more robust over time in the truncated/shifted log-normal model, see Figures 12 (rhs) and 13 (rhs).For this reason we abandon our previous choice, and we select the truncated/shifted log-normal model for both LoBs.This is in slight contrast to the AIC and BIC analysis in the previous subsection, but the differences between the two models in Table 5 (lhs) are rather small which does not severely contradict the truncated/shifted model selection.In addition, for LoB Property we choose threshold u (2) = 3 months (Figure 12 (top, rhs)) and for LoB Casualty we decide for the bigger threshold u (2) = 6 months (Figure 13 (bottom, rhs)), also because estimation over time is more robust for this latter choice.In Figure 11 we provide the estimated layer probabilities p (n) U|τ m ,Θ , n = 1, 2, 3, in the truncated/shifted log-normal model for LoB Property (with u (2) = 3 months) and for LoB Casualty (with u (2) = 6 months) for τ m ∈ {31/12/2010, . . ., 31/10/2010}.The solid lines give the dynamic versions α > 0 with γ = 1/4 and the dotted lines the static versions α = 0.In LoB Property we observe stationarity of these layer probabilities up to break point τ m 0 = 1/1/2006, and our modeling approach Equation (28) with γ = 1/4 seems to capture the non-stationarity after the break point rather well (note that the thin solid line shows an exact function x 1/4 after the break point by considering the map i → p (n) U|τ i ,Θ at time τ m = 31/10/2010 and the dotted lines show the stationary case α = 0).In this analysis we also see that LoB Casualty may be slightly non-stationary after 1/1/2008, see Figure 11 (rhs).This was not detected previously, but is also supported by the time series of the trend parameter α estimates given in Figure 7 (rhs).However, since the resulting trend parameter α is comparably small we remain with the static version for LoB Casualty.
Our conclusions are as follows.For LoB Property we choose the dynamic (α > 0) truncated/shifted log-normal model with γ = 1/4 and u (2) = 3 months; for LoB Casualty we choose the static (α = 0) truncated/shifted log-normal model with u (2) = 6 months.In Figure 14 we present the resulting calibration as of τ m = 31/10/2010.In the middle layer the fitted distribution looks convincing, see Figure 14 (top).In the upper layer the fitted distribution is more conservative than the observations, see Figure 14 (bottom), this is supported by the fact that the empirical distribution is not sufficiently heavy-tailed because of missing information about late reportings (IBNYR claims in upper left triangles in Figure 2).This missing information has a bigger influence in casualty insurance because reporting delays are more heavy-tailed.Moreover, in both LoBs the shifted modeling approach is more conservative than the truncated one.  (1, u (2) ) for (lhs) LoB Property with u (2) = 3 months and (rhs) LoB Casualty with u (2) = 6 months; (bottom) calibration of shifted and truncated log-normal distributions in the large layer [u (2) , ∞) for (lhs) LoB Property with u (2) = 3 months (static and dynamic versions) and (rhs) LoB Casualty with u (2) = 6 months (only static versions).
The model is now fully specified and we can estimate the number of IBNYR claims (late reportings).Using Equations ( 6), (25) and replacing all parameters Θ by their MLEs Θ (τ m ) at time τ m we receive estimate for the total number of incurred claims in time interval The number of estimated IBNYR claims is given by the difference IBNYR ).For illustration we choose three time points τ m ∈ {31/03/2004, 31/07/2007, 31/10/2010}, and we always use the relevant available information at those time points τ m .The results are presented in Figure 15, the blue/green line gives the number of reported claims M i (τ m ) and the red line the estimated number of incurred claims N (τ m ) i (the spread being the number of estimated IBNYR claims).We note that the spread is bigger for LoB Casualty than LoB Property (which is not surprising in view of our previous analysis).For LoB Property in Figure 15 (top) we also compare the static (gray) to the dynamic (red) estimation.The static version seems inappropriate since it cannot sufficiently capture the non-stationarity and estimation is too conservative after the break point τ m 0 = 1/1/2006.In the final section of this paper, we back-test our calibration and estimation, and compare it to the chain-ladder estimation.

Homogeneous (Poisson) Chain-Ladder Case and Back-Testing
In this section we compare our individual claims modeling calibration to the classical chain-ladder method on aggregate data.The chain-ladder method is probably the most popular method in claims reserving.Interestingly, the cross-classified chain-ladder estimation can be derived under Model Assumptions 2 and additional suitable homogeneity assumptions.

Cross-Classified Chain-Ladder Model
We choose an equidistant grid Denote by M i,j the number of claims with accident dates in (τ i−1 , τ i ] and reporting dates in (τ i+j−1 , τ i+j ], for i ∈ N and j ∈ N 0 .Under Model Assumptions 2 the random variables M i,j are independent and Poisson distributed with exposures We now make the following homogeneity assumptions for t ∈ (τ i−1 , τ i ] and i ∈ N i.e., we may drop time index t in f U|t,Θ and F U|t,Θ , respectively.We define If we now define reporting pattern (γ j ) j≥0 by we see that under (31) the random variables M i,j are independent and Poisson distributed with Moreover, we have normalization ∑ j≥0 γ j = 1.This provides the cross-classified Poisson version of the chain-ladder model.Under the additional assumption that ∑ J j=0 γ j = 1 for a finite J the MLE exactly provides the chain-ladder estimator.This result goes back to Hachemeister-Stanard [22], Kremer [23] and Mack [24], and for more details and the calculation of the chain-ladder estimator M CL(τ m ) i,j of M i,j with i + j > m at time τ m ≥ τ J we refer to Theorem 3.4 in Wüthrich-Merz [25].Using these chain-ladder estimators we get estimate for the estimated number of claims in period (τ i−1 , τ i ] with τ i ≤ τ m .This chain-ladder estimate N CL(τ m ) i is compared to the estimate N (τ m ) i provided in (30).

Back-Testing
In this section we back-test the estimations obtained by calibration ( 21) and (29) and compare it to the homogeneous chain-ladder case (33).We therefore calculate for each exposure period (τ i−1 , τ i ] with τ i ≤ τ m ∈ {31/12/2001, . . ., 31/10/2010} the estimates N at time τ e = 31/10/2010.In particular, we back-test the estimates with time lags j = 0, 1 against the latest available estimation.We therefore define the ratios The first ratio χ i,0 compares the estimation of the number of claims in period (τ i−1 , τ i ] based on the information available at time τ i to the latest available estimation at time τ e = 31/10/2010, and the second variable χ i,1 considers the same ratio but with the estimation based on the information at time τ i+1 (i.e., after one period of development).Moreover, we calculate the relative process uncertainties for j = 0, 1 defined by this is the relative standard deviation of the number of IBNYR claims at time τ i+j for a Poisson random variable, normalized with N (τ e ) i to make it comparable to χ i,j .We present the results in Figure 16: • Figure 16 (top), LoB Property: We see that the chain-ladder estimate N CL(τ i+j ) i , j = 0, 1, clearly over-estimates the number of claims after the break point τ m 0 = 1/1/2006, whereas our non-stationary approach (28) can capture this change rather well and estimations χ i,j are centered around 1. Remarkable is that after 5 years of observations, the volatility of the estimation can almost completely be explained by process uncertainty (we plot confidence bounds of 2ς i,j ), which means that model uncertainty is comparably low.We also see that the faster reporting behavior after break point τ m 0 has substantially decreased the uncertainty and the volatility in the number of IBNYR claims.Before the break point, uncertainty for j = 0 is comparably high, this can partly be explained by the fact that claims history is too short for model calibration.

•
Figure 16 (bottom), LoB Casualty: After roughly 5 years of claims experience the estimation in the individual claims model and the chain-ladder model are very similar.This indicates that in a stationary situation, the performance of the (simpler) chain-ladder model is sufficient.However, this latter statement needs to be revised because non-stationarity can be detected more easily in the individual claims history modeling approach, in particular, the individual claims model reacts more sensitively to structural changes.The confidence bounds have a reasonable size because the back-testing exercise does not violate them too often (i.e., the observations are mostly within the confidence bounds), however, in practice they should be chosen slightly bigger because they do not consider model uncertainty.
We close this section with a brief discussion of two related modeling approaches.Our approach can be viewed as a refined model of the one used in Antonio-Plat [17].The first refinement we use is that we consider the weekly periodic pattern (λ k ) k=1,...,7 which was supported by the statistical analysis given in Figure 5. Neglecting this weekly periodic pattern would lead to less smooth small layer probabilities, in particular, in LoB Casualty, see Figure 4.The second refinement is that we choose a reporting delay distribution that depends on the weekday of the claims occurrence.This is especially important for the small reporting delay layer because weekday configuration essentially influences the short reporting delays.In our analysis, this then leads to a mixture model with three different layers.Antonio-Plat [17] use a mixture of a Weibull distribution and 9 degenerate components which fits their purposes well.Our approach raises the issue of over-parametrization which we analyze graphically, i.e., we observe rather stable parameter estimates after 4 years of observations, see for instance Figures 12 (rhs) and 13 (rhs).
The issue of over-parametrization is of essential relevance if we would like to do prediction for future exposure periods.That is, in our analysis we have mainly concentrated on making statistical inference of the instantaneous claims frequency Λ(t) of past exposures t ≤ τ m at a given time point τ ≥ τ m .Going forward, we may also want to model and predict the claims occurrence and the claims reporting processes, respectively, of future exposures.An over-parametrized model will have a low predictive power because it involves too much model uncertainty, therefore we should choose as few parameters as necessary.Moreover, for predictive modeling it will also be necessary to model stochastically the instantaneous claims frequency process Λ, i.e., our statistical inference method is not sufficient for predicting claims of future exposures, but it will help to calibrate a stochastic model for Λ.A particularly interesting model is the marked Cox model proposed in Badescu et al. [18,19].Similarly to Antonio-Plat [17], Badescu et al. [18,19] consider the piece-wise homogeneous case ( 7), but with (Λ i ) i being a state-dependent process which is driven by a time-homogeneous hidden Markov process (in the sense of a state-space model).We believe that this is a promising modeling approach for claims occurrence and reporting prediction, if merged with our weekday dependent features (both for claims occurrence and claims reporting delays).

Conclusions
We have provided an explicit calibration of a reporting model to individual claims data.For the two LoBs considered it takes about 5 years of observations to have sufficient information to calibrate the model (this is true for the stationary case and needs to be analyzed in more detail in a non-stationary situation).As long as the claims reporting process is stationary the individual claims reporting model and the aggregate chain-ladder model provide very similar estimates for the number of IBNYR claims, but as soon as we have non-stationarity the chain-ladder model fails to provide reliable estimates and one should use individual claims modeling.Moreover, our individual claims modeling approach is able to detect non-stationarity more quickly than the aggregate chain-ladder method.Going forward, there are two different directions that need to be considered.First, for the evaluation of parameter uncertainty one can embed our individual claims reporting model into a Bayesian framework (this is similar to the Cox model proposed in Badescu et al. [18,19]) or one can use bootstrap methods.Secondly, the far more difficult problem is the modeling of the cost evolution and the claims cash process.We believe that this is still an open problem and it is the next building block of individual claims reserving that should be studied on real data examples.
Author Contributions: Both authors have contributed equally to this paper.

Conflicts of Interest:
The author declares no conflict of interest.

Appendix A. Proofs
Proof of Lemma 3. We maximize likelihood function (9).The derivatives of its logarithm provide for i = 1, . . ., m the requirements (denoted by !=) ∂log L (T ,S ) =1,...,M (Λ, Θ) and for the derivative w.r.t. the reporting delay parameter Θ we obtain ∂ log L (T ,S ) =1,...,M (Λ, Θ) The first requirements provide identity (S (i) − T (i) ) − log(π where in the last identity we have added constants in Θ which vanish under the derivative (note that these constants were added to indicate that we obtain the densities (10)).
Proof of Lemma 4. We maximize likelihood function (9) for i ≤ m * on weekly time grid ∆τ i = 7 under side constraint ∑ 7 k=1 λ k = 7 and under assumption (11) which implies π where we have used λ + i = 7 on the weekly time grid ∆τ i = 7.The latter implies Λ i = M i /(7w i ) and plugging this into the former requirement provides This implies that Lagrange multiplier χ is found from normalization ∑ 7 k=1 λ k = 7 which provides the claim.

Figure 1 .
Figure 1.Observed claims counts from 1/1/2001 until 31/10/2010: (top) LoB Property colored blue and (bottom) LoB Casualty colored green.The lhs gives daily claims counts and the rhs monthly claims counts; the red lines are the rolling averages over 30 days on the lhs and over 2 months on the rhs; violet dots show claims occurrence on Saturdays and orange dots claims occurrence on Sundays, the resulting statistics are provided in Table1.

Figure 2 .
Figure 2. Observed and reported claims counts from 1/1/2001 until 31/10/2010: (top) LoB Property colored blue and (bottom) LoB Casualty colored green.The lhs gives daily claims reporting; the red lines are the 30 days rolling averages; violet dots show claims reporting on Saturdays and orange dots claims reporting on Sundays.The rhs plots accident dates T versus reporting delays U = S − T ; the upper-right (white) triangle corresponds to the missing data (IBNYR claims); the blue/green dots illustrate reported and settled claims; the orange dots reported but not settled (RBNS) claims.

Figure 3 .
Figure 3. Box plots of the logged reporting delays log(U ) on the yearly scale (lhs) LoB Property, (rhs) LoB Casualty.

Figure 6 .
Figure 6.(top) LoB Property and (bottom) LoB Casualty: empirical distribution of reporting delays separated by weekdays of claims occurrence of all data with accident date prior to 01/2006 and maximal reporting delay of U ≤ 365 days (lhs) per day, (middle) compressed by weekends, and (rhs) compressed and normalized to 1 after a delay of one week.

( 1 )
U|t,Θ 1 but no other part of the reporting delay distribution.

Figure 8 . 1 )
Figure 8. LoB Property, small reporting delay layer: estimated cumulative distribution F (1) U|t,Θ per weekday s = t .Dotted lines show calibration based on all observations since τ 0 and lines show calibration based on a rolling window of length 2•365 days.
) with M s,u * (τ m ) being the number of reported claims with occurrence day s and compressed reporting delay u * , M s,• (τ m ) = ∑ 4 * u * =0 * M s,u * (τ m ) and θ (τ m ) s (u * ) being the corresponding MLE under the null hypothesis.

Figure 9 . 1 )
Figure 9. LoB Casualty, small reporting delay layer: estimated cumulative distribution F (1) U|t,Θ per weekday s = t .Dotted lines show calibration based on all observations since τ 0 and lines show calibration based on a rolling window of length 2•365 days.

Mon s = 1 Figure 10 .
Figure 10.LoB Casualty, small reporting delay layer: estimated cumulative distribution F

Figure 16 .
Figure 16.Back-test LoB Property (top) and LoB Casualty (bottom): we compare the (non-stationary) estimate χ i,j (blue/green) to the (stationary) chain-ladder estimate χ CL i,j (orange) for (lhs) time lag j = 0 and (rhs) time lag j = 1.The black line shows the process uncertainty confidence bounds of 2 (relative) standard deviations ς i,j .

Table 1 .
Statistics per weekday: average daily claims counts and empirical standard deviation for LoB Property and LoB Casualty.

Table
is an approximation that we only use for choosing (λ k ) k=1,...,7 .It has the advantage that all time points t ≤ τ m * are fully experienced at time τ.Estimation of (λ k ) k=1,...,7 is then only done based on claims with T ≤ τ m * because for these occurrence days there are no missing values.Of course, this neglects the latest information but often (in stationary cases, if τ m * is not too small and if late reportings do not distort the weekly periodic pattern) this estimation is sufficiently robust.The following lemma is proved in Appendix A.

Table 4 .
p-values of the χ 2 -tests under the corresponding null hypotheses for test statistics χ 2 s , see Equation (

Table 7 .
AIC and BIC asTable 5 (top) but with log-normal distributions replaced by gamma distributions.