Stochastic Claims Reserving Methods with State Space Representations: A Review

Often, the claims reserves exceed the available equity of non-life insurance companies, and a change in the claims reserves by a small percentage has a large impact on the annual accounts. Therefore, it is of vital importance for any non-life insurer to handle claims reserving appropriately. Although claims data are time series data, the majority of the proposed (stochastic) claims reserving methods are not based on time series models. Among the time series models, state space models combined with Kalman filter learning algorithms have proven to be very advantageous, as they provide high flexibility in modeling and an accurate detection of the temporal dynamics of a system. Against this backdrop, this paper aims to provide a comprehensive review of stochastic claims reserving methods that have been developed and analyzed in the context of state space representations. For this purpose, relevant articles are collected and categorized, and their contents are explained in detail and subjected to a conceptual comparison.


The Importance of Claims Reserving in Non-Life Insurance
The insurance industry offers a wide range of products that enable policyholders to insure themselves against almost any form of loss. Insurance companies therefore differentiate their products according to various criteria. In this paper, we focus on the problem of claims reserving for a branch of insurance products known as Non-Life Insurance (Continental Europe), General Insurance (United Kingdom) and Property and Casualty Insurance (USA). While this branch encompasses all insurance products that are different from life insurance, life insurance includes only life-related products and disability insurance (see Wüthrich and Merz 2008). There are two reasons for this focus. On the one hand, life and non-life products differ considerably, which is mainly reflected in the contract terms, types of claims and risk drivers. This also explains why different stochastic models and methods are used in the two branches. On the other hand, in many countries (such as Germany or Switzerland), there is a strict legal separation between life and non-life. A non-life insurer is therefore prohibited from offering life products, and vice versa. For this reason, it is not uncommon for insurance groups to establish separate companies in order to sell products from both branches. The following lines of business belong to the non-life insurance branch: motor/car insurance, property insurance, liability insurance, accident insurance, health insurance, marine insurance, and other insurance products such as aviation, credit insurance, epidemic insurance, legal protection, travel insurance, and so on (see Wüthrich and Merz 2008).
The amount of money that a policyholder has to pay to the insurer for insurance coverage is called the premium. By paying a premium, the policyholder under an insurance

State Space Models in the Claims Reserving Literature
The actuarial literature contains various articles in which state space models and the Kalman filter learning algorithms are applied to improve stochastic claims reserving (see Johannssen 2016). As pioneers, De Jong and Zehnwirth (1983) constructed a state space model for the payment stream of incremental payments, took business volume and inflation indices into account, and presented a method to estimate the states underlying the observations of the upper triangle and to predict the outstanding loss liabilities of the lower triangle. Afterwards, Verrall (1989) used the relationship between the two-way ANOVA and the Chain Ladder (CL) method to establish a state space model for the so-called linear CL model. Wright (1990) constructed a model for incremental payments and employed the state space approach to model variations in parameters across different accident years. Verrall (1994) extended the state space model of Verrall (1989) to weaken the homogeneity property of the CL method, which allows for development factors that do not necessarily have to be identical across all accident years. Zehnwirth (1997) considered different recursive representations, including state space models based on the general form introduced by De Jong and Zehnwirth (1983), and discussed calendar year effects in claims development triangles. Ntzoufras and Dellaportas (2002) presented four models for Reported But Not Settled (RBNS) claims, including state space models following Verrall (1989, 1994). Alpuim and Ribeiro (2003) proposed a univariate distribution-free state space model, where incremental payments are modeled as a function of payments of the first development year, i.e., the accident year itself. Taylor et al. (2003) discussed a generalized Kalman filter that accounts for non-linearities in the observation equation.
De Jong (2005) considered the so-called development correlation model, a (state space) model that accounts for correlations between individual development factors in the first two development years. In addition, De Jong (2006) discussed not only the development correlation model but also two further approaches that take correlations related to accident and calendar years into account. Li (2006) compared various claims reserving methods, including the state space model of Verrall (1989). A completely different approach from the previous articles is taken by Atherino et al. (2010), who did not model the Incurred But Not Reported (IBNR) run-off data in chronological form but as a univariate time series with missing observations. Pang and He (2012) combined the approaches of Verrall (1989) and Taylor et al. (2003) and included an additional lag of the state vector in the state equation. Chukhrova and Johannssen (2017) presented a scalar state space model for cumulative payments. Most recently, Costa and Pizzinga (2020) and Hendrych and Cipra (2021) extended the row-wise stacking approach of Atherino et al. (2010) by including tail effects and multivariate considerations that allow for dependency modeling between correlated lines of business, respectively.
Figure 1 shows the history of the considered articles in stochastic claims reserving. All articles are ordered chronologically and classified into five categories according to their similarities in content: "Parametric evolution", "Log-normal model", "Correlation models", "Univariate models", and "Row-wise stacking". These categories are not mutually exclusive; each article is assigned to a category according to the main approach used in it. The first category includes the articles by De Jong and Zehnwirth (1983), Wright (1990), Zehnwirth (1997), Taylor et al. (2003), and Pang and He (2012), as they are based on the assumption of a parametric evolution of the run-off data across the development years. The second category includes the articles by Verrall (1989, 1994), Ntzoufras and Dellaportas (2002), and Li (2006) because of the log-normal model they consider for incremental payments. The third category consists of the articles by De Jong (2005, 2006), who discusses three types of models that incorporate correlations within claims development triangles. The fourth category contains the articles by Alpuim and Ribeiro (2003) and Chukhrova and Johannssen (2017), which present models that avoid complex matrix-based structures. Finally, the fifth category includes the articles by Atherino et al. (2010), Costa and Pizzinga (2020), and Hendrych and Cipra (2021), who propose a row-wise stacking of the claims data and associated state space representations. The solid arrows in Figure 1 represent similarities in content among the modeling approaches of the papers. The dashed arrows, in contrast, indicate that the respective state space models are included in papers in which different stochastic claims reserving methods are compared (see England and Verrall 2002; Verrall 2004). In addition, state space models and the Kalman learning algorithms are discussed in the context of stochastic claims reserving in standard textbooks such as Wüthrich and Merz (2008).

Categorization of Articles and Organization of the Paper
In the following, the articles are presented category by category. Within each of the five categories, the individual articles are presented in chronological order. For the sake of consistency, a unified notation is used throughout the paper. Since this paper is devoted to state space representations, all essential content concerning state space models is presented, whereas less relevant content is omitted or only referenced. In particular, the state space representations given in the articles are developed in full detail, often in considerably more detail than in the original papers.
The paper is organized as follows. In Section 2, articles are discussed that are based on the assumption of a parametric evolution of the claims data across development years (Category 1). Section 3 presents articles in which incremental payments are assumed to be log-normally distributed and are modeled using a log-normal model (Category 2). Section 4 covers articles in which correlation models are considered (Category 3). In Section 5, state space models with a scalar structure are presented (Category 4). Section 6 contains articles in which the row-wise stacking approach is used to reorganize the claims data (Category 5). Subsequently, Section 7 provides a conceptual comparison of the presented approaches and state space representations. In Section 8, concluding remarks are given.
Of the five articles in Category 1, three, namely De Jong and Zehnwirth (1983), Taylor et al. (2003) and Pang and He (2012), are mainly based on the use of state space models and the Kalman filter learning theory and are therefore presented in detail, while the models of the other two articles, Wright (1990) and Zehnwirth (1997), are treated more briefly, as state space models are not the focus of their methodologies.

Claims Reserving, State Space Models and the Kalman Filter
De Jong and Zehnwirth (1983) laid the foundation for the use of state space models and the Kalman filter in stochastic claims reserving with their article "Claims Reserving, State-Space Models and the Kalman Filter". The proposed state space model is constructed for the payment stream of the incremental payments and presumes known, time-varying system matrices.

Modeling the payment stream of incremental payments
The modeling is based on claims development triangles in which incremental payments $X_{i,j}$ are given for accident years $i = 1, \dots, I$ and development years $j = 0, \dots, I-1$. For a fixed calendar year $t = i + j$, the payment stream of incremental payments is modeled with increasing development year $j = 0, \dots, t-1$ and decreasing accident year $i = t, t-1, \dots, 1$ via

$$y_j(t) = X_{t-j,\,j} = m(t-j,\,j) + u_j(t), \qquad j = 0, \dots, t-1, \tag{1}$$

see also Figure 2. Here, the quantity $m(t-j, j) = m(i, j)$ is the expected claim payment to be made in accident year $i$ and development year $j$ of the $t$-th calendar year, and $u_j(t)$ is a noise term with $\mathrm{E}[u_j(t)] = 0$.
De Jong and Zehnwirth (1983) propose an optional modification of (1) that includes additional information such as the volume of business transacted in each accident year and the inflation factor for each calendar year. To this end, let $n(i)$ denote an appropriate index for the volume of business transacted in accident year $i$ and $\lambda(t)$ an appropriate price index for payments in the $t$-th calendar year. Using both these quantities, (1) can be extended to

$$y_j(t) = n(t-j)\,\lambda(t)\,m(t-j,\,j) + u_j(t), \qquad j = 0, \dots, t-1, \tag{2}$$

where $n(t-j)\lambda(t)m(t-j, j)$ is the expected value of the inflation-adjusted and volume-weighted incremental payment in accident year $i$ and development year $j$ of calendar year $t$.

Development of an appropriate state space representation
The modeling of the payment stream via (1) and (2) lends itself to the construction of the observation and state equations of a state space model, respectively. The following discussion is based on (1) but can be applied to (2) with minor modifications. In the first step of modeling the observation equation, (1) is transferred into a vector representation in which $y_t$ is the vector of observations $X_{i,j}$ of the $t$-th calendar year, $f_t$ is the vector of expected claim payments $m(t-j, j)$, and $w_t$ is the vector of noise terms $u_j(t)$ with $j = 0, \dots, t-1$. Thus, the incremental payments made in calendar year $t$ can be specified via

$$y_t = \begin{pmatrix} X_{t,0} \\ X_{t-1,1} \\ \vdots \\ X_{1,t-1} \end{pmatrix} = \begin{pmatrix} m(t,0) \\ m(t-1,1) \\ \vdots \\ m(1,t-1) \end{pmatrix} + \begin{pmatrix} u_0(t) \\ u_1(t) \\ \vdots \\ u_{t-1}(t) \end{pmatrix} \tag{3}$$

or briefly as $y_t = f_t + w_t$. In the second step, the vector $f_t$ is modeled as the product of a system matrix $G_t$ and a state vector $x_t$. For this purpose, De Jong and Zehnwirth (1983) treat $m(i, j)$ for a given accident year $i$ as a function of the development year $j$ and thus construct for each accident year a distributed lag model of the form

$$m(i,j) = \sum_{k=1}^{p} b_k(i)\,\varphi_k(j), \tag{4}$$

where $\varphi_k(j)$ are known functions of $j$ and $b_k(i)$ are unknown parameters depending on the respective accident year $i$. De Jong and Zehnwirth (1983) justify (4) by an overall smooth evolution of $m(i, j)$ that first increases and then decreases in $j$ for a given accident year $i$. A special case of (4) for $p = 1$ is the so-called Hoerl curve

$$m(i,j) = b(i)\,(j+1)\,e^{-j}, \tag{5}$$

which De Jong and Zehnwirth (1983) use in their empirical application example. In addition, (4) can easily be transferred into vector notation by using

$$\varphi(j) = \big(\varphi_1(j), \dots, \varphi_p(j)\big)^\top, \qquad b(i) = \big(b_1(i), \dots, b_p(i)\big)^\top \tag{6}$$

as follows:

$$m(i,j) = \varphi^\top(j)\, b(i). \tag{7}$$

Substituting (7) into (3) then gives

$$y_t = \begin{pmatrix} \varphi^\top(0) & & & \\ & \varphi^\top(1) & & \\ & & \ddots & \\ & & & \varphi^\top(t-1) \end{pmatrix} \begin{pmatrix} b(t) \\ b(t-1) \\ \vdots \\ b(1) \end{pmatrix} + w_t \tag{8}$$

or, in a more compact form,

$$y_t = G_t x_t + w_t, \qquad \mathrm{E}[w_t] = 0, \qquad \mathrm{E}[w_s w_t^\top] = \delta_{s,t} R_t, \tag{9}$$

for all $s, t = 1, \dots, I$, where $\delta_{s,t}$ denotes the Kronecker delta and $R_t = \mathrm{Cov}(w_t)$. Thus, given $\varphi(j)$, $j = 0, \dots, t-1$, the system matrix $G_t$ is a known, time-varying block-diagonal matrix of dimension $t \times pt$, and the state vector $x_t$ contains the unknown parameter vectors $b(i)$ for $i = 1, \dots, t$.
Assuming a Hoerl curve according to (5), the observation Equation (9) of the $t$-th calendar year becomes (due to $p = 1$)

$$\begin{pmatrix} X_{t,0} \\ X_{t-1,1} \\ \vdots \\ X_{1,t-1} \end{pmatrix} = \begin{pmatrix} 1 & & & \\ & 2e^{-1} & & \\ & & \ddots & \\ & & & t\,e^{-(t-1)} \end{pmatrix} \begin{pmatrix} b(t) \\ b(t-1) \\ \vdots \\ b(1) \end{pmatrix} + w_t.$$

Subsequently, De Jong and Zehnwirth (1983) specify an appropriate state equation that connects the state vector $x_t$ of the $t$-th calendar year with the state vector $x_{t-1}$ of the $(t-1)$-th calendar year. The basic idea is again to model a smooth evolution, but in a slightly different form than in (4): the starting point is again the sequence $m(i, j)$, but now the accident years $i$ are varied for a fixed development year $j$, whereas before the development years $j$ were varied for a fixed calendar year $t$ (see Figure 3).
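To make the indexing in (3), (8) and the Hoerl-curve case concrete, the following Python sketch stacks the $t$-th calendar-year diagonal of an upper triangle into $y_t$ and builds the diagonal $G_t$ for $p = 1$ with $\varphi(j) = (j+1)e^{-j}$. Storing the triangle as a dictionary keyed by $(i, j)$ and the function names `phi`, `calendar_vector`, `design_matrix` and `expected_vector` are assumptions made for this illustration only.

```python
import math

def phi(j):
    """Hoerl-curve shape used for p = 1: phi(j) = (j + 1) * exp(-j)."""
    return (j + 1) * math.exp(-j)

def calendar_vector(X, t):
    """y_t = (X[t,0], X[t-1,1], ..., X[1,t-1]) for calendar year t = i + j."""
    return [X[(t - j, j)] for j in range(t)]

def design_matrix(t):
    """G_t = diag(phi(0), ..., phi(t-1)), acting on x_t = (b(t), ..., b(1))."""
    return [[phi(j) if j == k else 0.0 for k in range(t)] for j in range(t)]

def expected_vector(b, t):
    """f_t = G_t x_t with x_t = (b(t), b(t-1), ..., b(1)); b maps i -> b(i)."""
    G = design_matrix(t)
    x = [b[t - k] for k in range(t)]
    return [sum(G[r][c] * x[c] for c in range(t)) for r in range(t)]

# Usage: a noise-free upper triangle with I = 3 and hypothetical b(i) = 100 * i
I = 3
b = {i: 100.0 * i for i in range(1, I + 1)}
X = {(i, j): b[i] * phi(j) for i in range(1, I + 1) for j in range(I) if i + j <= I}
print(calendar_vector(X, 3))   # ordered as X(3,0), X(2,1), X(1,2)
print(expected_vector(b, 3))   # same values in the noise-free case
```

In the noise-free example, $G_t x_t$ reproduces $y_t$, which is a quick check that the ordering $x_t = (b(t), \dots, b(1))$ matches the ordering of $y_t$.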
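For a given development year $j$, De Jong and Zehnwirth (1983) propose modeling $m(i, j)$ via

$$m(i,j) = \mathrm{E}\big[m(i,j) \,\big|\, m(i-1,j), \dots, m(i-q,j)\big] + \eta(i,j) \tag{10}$$

with $q = 1, \dots, i-1$, where $\eta(i, j)$ is a noise term with $\mathrm{E}[\eta(i, j)] = 0$. Thus, in contrast to (4), $m(i, j)$ is not modeled deterministically but as a random variable. Further, they assume that the conditional expected value on the right-hand side of (10) is a polynomial in $i$ of degree $q-1$ that passes through $m(i-1, j), \dots, m(i-q, j)$. This leads to

$$m(i,j) = \sum_{k=1}^{q} a(k)\, m(i-k,j) + \eta(i,j) \tag{11}$$

with known coefficients $a(k) = (-1)^{k+1}\binom{q}{k}$ for $k = 1, \dots, q$. Substituting (7) on both sides of (11) for $j = j_1, j_2, \dots, j_p$ yields

$$\Phi\, b(i) = \sum_{k=1}^{q} a(k)\, \Phi\, b(i-k) + v_i, \tag{12}$$

where the $(p \times p)$-dimensional matrix $\Phi$ and the $p$-dimensional vector $v_i$ are given by

$$\Phi = \begin{pmatrix} \varphi^\top(j_1) \\ \vdots \\ \varphi^\top(j_p) \end{pmatrix}, \qquad v_i = \begin{pmatrix} \eta(i,j_1) \\ \vdots \\ \eta(i,j_p) \end{pmatrix},$$

respectively. If both sides of Equation (12) are multiplied from the left by the inverse $\Phi^{-1} = \Psi$ of the matrix $\Phi$ (the existence of the inverse is ensured, see De Jong and Zehnwirth 1983), one obtains

$$b(i) = \sum_{k=1}^{q} a(k)\, b(i-k) + \Psi v_i. \tag{13}$$

Transferring (13) into matrix notation, we obtain

$$\begin{pmatrix} b(t) \\ b(t-1) \\ b(t-2) \\ \vdots \\ b(1) \end{pmatrix} = \begin{pmatrix} a(1)I & a(2)I & \cdots & a(q)I & O & \cdots & O \\ I & O & \cdots & & & \cdots & O \\ O & I & & & & & \vdots \\ \vdots & & \ddots & & & & \vdots \\ O & \cdots & & & & O & I \end{pmatrix} \begin{pmatrix} b(t-1) \\ b(t-2) \\ \vdots \\ b(1) \end{pmatrix} + \begin{pmatrix} \Psi \\ O \\ \vdots \\ O \end{pmatrix} v_t \tag{14}$$

or, in a more compact form,

$$x_t = F_t x_{t-1} + B_t v_t, \qquad \mathrm{E}[v_t] = 0, \qquad \mathrm{E}[v_s v_t^\top] = \delta_{s,t} Q_t, \tag{15}$$

as well as $\mathrm{E}[v_s w_t^\top] = O$ for all $s, t = 1, \dots, I$. The identity matrices $I$, zero matrices $O$ and scalar matrices $a(k)I$ with $k = 1, \dots, q$ in (14) are each of dimension $p \times p$. Note also that the system matrices $F_t$ and $B_t$ in the state Equation (15) are known.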
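The coefficients $a(k)$ in (11) are fully determined by the polynomial assumption: a polynomial of degree $q-1$ has a vanishing $q$-th finite difference, which gives $a(k) = (-1)^{k+1}\binom{q}{k}$. The short Python sketch below (function names hypothetical) computes these weights and checks them on a quadratic; for $q = 1$ it returns the random-walk weight $a(1) = 1$.

```python
from math import comb

def extrapolation_weights(q):
    """a(1), ..., a(q) with a(k) = (-1)**(k + 1) * C(q, k).

    These weights give the value at i of the unique polynomial of degree q - 1
    through the values at i-1, ..., i-q, because the q-th finite difference of
    such a polynomial vanishes.
    """
    return [(-1) ** (k + 1) * comb(q, k) for k in range(1, q + 1)]

def extrapolate(history, q):
    """history = [m(i-1, j), m(i-2, j), ..., m(i-q, j)] -> conditional mean of m(i, j)."""
    a = extrapolation_weights(q)
    return sum(a_k * m for a_k, m in zip(a, history[:q]))

print(extrapolation_weights(1))    # [1], i.e. the random walk
print(extrapolation_weights(3))    # [3, -3, 1]
# i**2 at i = 4, 3, 2 extrapolated to i = 5 with q = 3:
print(extrapolate([16, 9, 4], 3))  # 25
```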
A special case of the state Equation (15) arises for $p = 1$ (i.e., assuming a Hoerl curve as in (5)) when the parameters $b(i)$ of different accident years $i = 2, \dots, I$ are connected by a random walk

$$b(i) = b(i-1) + v_i, \tag{16}$$

that is, $q = 1$, $a(1) = 1$ and $\Psi = 1$. Since $\Phi = \varphi^\top(j) = \varphi(j) = (j+1)e^{-j}$, the relation $\Psi = e^{j}/(j+1)$ holds. To obtain $\Psi = 1$ and thus a state equation in the form of the random walk (16), De Jong and Zehnwirth (1983) therefore choose, without loss of generality, the fixed development year $j = 0$.
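With respect to (10) and (13), the use of (16) implies

$$m(i,j) = m(i-1,j) + \eta(i,j)$$

for all $j = 0, \dots, I-1$. Accordingly, the system matrix $F_t$ has the value one at positions $(1,1), (2,1), (3,2), \dots, (t, t-1)$ and zeros otherwise, while $B_t$ is the first $t$-dimensional unit vector, i.e., it has the value one in its first position. The state Equation (15) thus simplifies to

$$\begin{pmatrix} b(t) \\ b(t-1) \\ b(t-2) \\ \vdots \\ b(1) \end{pmatrix} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 1 & 0 & \cdots & 0 \\ 0 & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1 \end{pmatrix} \begin{pmatrix} b(t-1) \\ b(t-2) \\ \vdots \\ b(1) \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} v_t.$$

Table 1 gives an overview of the dimensions of vectors and matrices in the state space model of De Jong and Zehnwirth (1983).
Table 1. Dimensions in the state space model of De Jong and Zehnwirth (1983).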

If one intends to model the observation and state equations using (2) instead of (1), only the observation Equation (9) changes, while the state Equation (15) remains unchanged: each row $k = 1, \dots, t$ of the system matrix $G_t$ has to be multiplied by a weighting factor consisting of volume and inflation indices, i.e., by $n(t-k+1)\lambda(t)$.
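The filtering step for the random-walk special case can be sketched in plain Python as follows. The sketch is a minimal illustration under stated assumptions, not the authors' implementation: the prior $b(1) \sim (m_0, P_0)$, the state noise variance `q_var` and an observation noise covariance equal to `r_var` times the identity are hypothetical choices. Each calendar year's observation vector is processed one component at a time, which is equivalent to the joint update when the observation noise covariance is diagonal.

```python
import math

def phi(j):
    """Hoerl-curve shape for p = 1, as in (5): phi(j) = (j + 1) * exp(-j)."""
    return (j + 1) * math.exp(-j)

def kalman_djz(X, I, m0, P0, q_var, r_var):
    """Kalman filter over calendar years t = 1, ..., I for the p = 1 random-walk model.

    State x_t = (b(t), ..., b(1)). Assumptions of this sketch: prior b(1) ~ (m0, P0),
    Var(v_t) = q_var and Cov(w_t) = r_var * identity. Returns x_{I|I} and P_{I|I}.
    """
    x, P = [m0], [[P0]]
    for t in range(1, I + 1):
        if t > 1:
            # Prediction with F_t (ones at (1,1), (2,1), (3,2), ..., (t,t-1)) and B_t = e_1
            F = [[1.0 if (c == 0 and r <= 1) or (r >= 1 and c == r - 1) else 0.0
                  for c in range(t - 1)] for r in range(t)]
            x = [sum(F[r][c] * x[c] for c in range(t - 1)) for r in range(t)]
            FP = [[sum(F[r][k] * P[k][c] for k in range(t - 1)) for c in range(t - 1)]
                  for r in range(t)]
            P = [[sum(FP[r][k] * F[c][k] for k in range(t - 1)) for c in range(t)]
                 for r in range(t)]
            P[0][0] += q_var                      # B_t Var(v_t) B_t^T
        # Update: sequential scalar updates over the t components of y_t
        for j in range(t):
            g = [phi(j) if k == j else 0.0 for k in range(t)]      # j-th row of G_t
            Pg = [sum(P[r][k] * g[k] for k in range(t)) for r in range(t)]
            s = sum(g[k] * Pg[k] for k in range(t)) + r_var        # innovation variance
            innov = X[(t - j, j)] - sum(g[k] * x[k] for k in range(t))
            x = [x[r] + Pg[r] * innov / s for r in range(t)]
            P = [[P[r][c] - Pg[r] * Pg[c] / s for c in range(t)] for r in range(t)]
    return x, P

# Usage: a noise-free triangle with I = 4 and hypothetical b(i) = 100 + 10 i
b_true = {i: 100.0 + 10.0 * i for i in range(1, 5)}
X = {(i, j): b_true[i] * phi(j) for i in range(1, 5) for j in range(4) if i + j <= 4}
x_filt, P_filt = kalman_djz(X, 4, m0=0.0, P0=1e4, q_var=100.0, r_var=1e-4)
print([round(v, 2) for v in x_filt])   # close to (b(4), b(3), b(2), b(1))
```

The matrices are built explicitly to mirror $F_t$ and $G_t$; for larger triangles, a vectorized implementation would be the natural choice.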

Forecasting the outstanding loss liabilities
As the system matrices $G_t$, $F_t$, $B_t$ are assumed to be known for all $t = 1, \dots, I$, the outstanding loss liabilities for individual and aggregated accident years can be predicted in a straightforward way by using $x_{I|I}$ and $P_{I|I} = \mathrm{Cov}(x_I - x_{I|I})$. To this end, all future incremental payments are collected in the vector

$$y_{I+1} = \big(X_{I,1}, X_{I-1,2}, \dots, X_{2,I-1}, X_{I,2}, \dots, X_{3,I-1}, \dots, X_{I,I-1}\big)^\top.$$
All these future observations belong to one of the accident years $i = 2, \dots, I$ and are therefore based on the corresponding states $b(2), \dots, b(I)$. Accordingly, the state vector $x_{I+1}$ corresponds to the vector $x_I$ of the current calendar year $I$, which is why the state Equation (15) is given by $x_{I+1} = x_I$ (i.e., $F_{I+1} = I$, $Q_{I+1} = O$). The system matrix $G_{I+1}$ of the observation equation is obtained on the basis of (1) similarly to (8): it consists mostly of zero vectors, and the entries $\varphi^\top(j)$ with $j = 1, \dots, I-1$ are placed such that they are multiplied by the state $b(i)$ in $x_I$ of the accident year $i = 2, \dots, I$ of the corresponding $X_{i,j}$ in $y_{I+1}$. Since $x_{I+1} = x_I$, the prediction step gives $x_{I+1|I} = x_{I|I}$ and $P_{I+1|I} = P_{I|I}$, so that the future observations can be predicted via $y_{I+1|I} = G_{I+1} x_{I|I}$ (based on (9)). The variance-covariance matrix of the prediction error $y_{I+1} - y_{I+1|I}$ is given by

$$\mathrm{Cov}\big(y_{I+1} - y_{I+1|I}\big) = G_{I+1} P_{I|I} G_{I+1}^\top + \mathrm{Cov}(w_{I+1}).$$

Since $x_{I|I}$, $P_{I|I}$ and $G_{I+1}$ are known at time $t = I$, predicting the outstanding loss liabilities for individual and aggregated accident years is straightforward. For the aggregated accident years, all components of $y_{I+1|I}$ are added to obtain the total loss reserve, while for an individual accident year only the components of $y_{I+1|I}$ related to the respective accident year $i = 2, \dots, I$ are added. These components can be extracted via a diagonal matrix $A$ that has the value one at the respective positions and zeros otherwise. The variance-covariance matrix of the corresponding prediction error $A(y_{I+1} - y_{I+1|I})$ is thus

$$A\big(G_{I+1} P_{I|I} G_{I+1}^\top + \mathrm{Cov}(w_{I+1})\big)A^\top.$$

However, if the modified payment stream according to (2) is used, additional uncertainty is induced via the inflation index $\lambda(t)$ of future calendar years $t > I$, which is unknown at time $t = I$. This is due to the unknown entries $n(i)\lambda(i+j)\varphi^\top(j)$ for $i + j > I$ instead of the known entries $\varphi^\top(j)$ in the system matrix $G_{I+1}$.
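The reserve computation above can be sketched as follows for $p = 1$, taking $x_{I|I} = (b(I), \dots, b(1))$ and $P_{I|I}$ as given (for instance, from a Kalman filter run). The sketch assumes $\mathrm{Cov}(w_{I+1})$ equal to `r_var` times the identity and replaces the explicit matrix $A$ by summing the selected cells: the variance of a summed reserve is then $c^\top P_{I|I} c$ plus `r_var` times the number of cells, with $c = G_{I+1}^\top A\mathbf{1}$. All names and numbers are hypothetical.

```python
import math

def phi(j):
    """Hoerl-curve shape for p = 1: phi(j) = (j + 1) * exp(-j)."""
    return (j + 1) * math.exp(-j)

def forecast_reserves(x, P, r_var, I):
    """Reserve forecasts from x_{I|I} = (b(I), ..., b(1)) and P_{I|I} (p = 1 case).

    Assumes Cov(w_{I+1}) = r_var * identity. Returns the predicted future cells,
    (mean, variance) per accident year i = 2..I, and (mean, variance) in total.
    """
    # Future cells in the order of y_{I+1}: calendar years I+1, ..., 2I-1
    cells = [(t - j, j) for t in range(I + 1, 2 * I) for j in range(t - I, I)]
    pos = {i: I - i for i in range(1, I + 1)}                  # index of b(i) in x
    y_pred = {(i, j): phi(j) * x[pos[i]] for (i, j) in cells}  # G_{I+1} x_{I|I}

    def summarize(selected):
        # c = G_{I+1}^T A 1 collects phi(j) for each state b(i) of the selected cells
        c = [0.0] * I
        for (i, j) in selected:
            c[pos[i]] += phi(j)
        mean = sum(y_pred[cell] for cell in selected)
        var = sum(c[r] * P[r][s] * c[s] for r in range(I) for s in range(I))
        return mean, var + r_var * len(selected)

    by_year = {i: summarize([cell for cell in cells if cell[0] == i]) for i in range(2, I + 1)}
    return y_pred, by_year, summarize(cells)

# Usage with hypothetical filter output for I = 3
x_II = [300.0, 200.0, 100.0]                                   # (b(3), b(2), b(1))
P_II = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
y_pred, by_year, total = forecast_reserves(x_II, P_II, r_var=1.0, I=3)
print(sorted(y_pred))   # [(2, 2), (3, 1), (3, 2)]
print(total)            # (total reserve, prediction-error variance)
```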
Wright (1990) primarily establishes a model for incremental payments that includes a state space approach, where the variation of the parameters is modeled over different accident years. Thus, although the model of Wright (1990) is not mainly based on state space models and the Kalman filter theory, it embeds them in a model framework as one component. In the following, therefore, the model for incremental payments and the state space model are presented (for further details, see Wright 1990).

Construction of the model for claims payments
The modeling is built on development triangles that include incremental payments $X_{i,j}$ in accident years $i = 1, \dots, I$ and development years $j = 0, \dots, I-1$. The proposed model is based on the assumption that each incremental payment $X_{i,j}$ is the sum of $N_{i,j}$ independent and identically distributed (i.i.d.) payments $X^k_{i,j}$, which are stochastically independent of $N_{i,j}$, that is,

$$X_{i,j} = \sum_{k=1}^{N_{i,j}} X^k_{i,j}.$$

Thus, Wright (1990) uses the collective risk model, and $X_{i,j}$ has a compound distribution (see, e.g., Kaas et al. 2009). The lags $j$ of individual payments $X^k_{i,j}$ between the accident year of the claim and the actual payment are modeled as i.i.d. random variables, which is why $p_{i,j}$ with $\sum_{j=0}^{I-1} p_{i,j} = 1$ is defined as the probability that a payment for a claim of accident year $i$ is made in development year $j$. Let the number $N_{i,j}$ of payments for claims of accident year $i$ in development year $j$ be Poisson-distributed with parameter $\varepsilon_i p_{i,j}$, i.e., $N_{i,j} \sim \mathcal{P}(\varepsilon_i p_{i,j})$; then, the incremental payments $X_{i,j}$ follow a compound Poisson distribution. Following the convolution property of the Poisson distribution, the total number of claims payments $N_i = \sum_{j=0}^{I-1} N_{i,j}$ of an accident year $i$ also follows a Poisson distribution with parameter

$$\sum_{j=0}^{I-1} \varepsilon_i\, p_{i,j} = \varepsilon_i,$$

where the $N_{i,j}$ for different $j$ are assumed to be stochastically independent random variables and the parameter $\varepsilon_i$ serves as a measure of the exposure of accident year $i$. For modeling the probabilities $p_{i,j}$, Wright (1990) gives two alternatives, the stochastic CL model and the Hoerl curve model. While the first alternative assumes that the probabilities $p_{i,j}$ are identical across all accident years $i$, the second alternative (preferred by Wright 1990) models them via a Hoerl curve with constants $\kappa_i$, $A_i$ and $B_i$ to be estimated and with $\alpha_j$ and $j$ as functions of the development year $j$.
Using (17), the expected value and variance of $N_{i,j}$ are given by

$$\mathrm{E}[N_{i,j}] = \mathrm{Var}(N_{i,j}) = \varepsilon_i\, p_{i,j}. \tag{18}$$

In addition to the number $N_{i,j}$ of payments, Wright (1990) also models the amounts of the individual payments $X^k_{i,j}$ for claims of accident year $i$ in the $j$-th development year, which, like the $N_{i,j}$, are assumed to be stochastically independent for different $j$. The first two moments of $X^k_{i,j}$ are modeled distribution-free via (19) and (20) with suitable (unknown) constants $K > 0$, $\lambda$, $\rho$ and inflation parameter $\delta_t$. While this modeling of the expected value with different $\lambda$ and $K$ provides a variety of possibilities, the modeling of the variance results from the assumption that the coefficient of variation is time-invariant and equal to $\rho$. The optional term $e^{\delta_t}$ in (19), with $\tau_k$ as the average annual inflation rate between calendar years $k-1$ and $k$, is used to account for inflation; i.e., $e^{\delta_t}$ reflects the inflation factor from the first calendar year to calendar year $t = i + j$. However, Wright (1990) proposes a specification of $\delta_t$ that assumes a constant inflation rate $\tau$. Considering (18)-(20) and using the moments of the compound Poisson distribution, the expected value and variance of the incremental payments $X_{i,j}$ are obtained via

$$\mathrm{E}[X_{i,j}] = \mathrm{E}[N_{i,j}]\,\mathrm{E}[X^k_{i,j}] \tag{21}$$

and

$$\mathrm{Var}(X_{i,j}) = \mathrm{E}[N_{i,j}]\,\mathrm{E}\big[(X^k_{i,j})^2\big], \tag{22}$$

where the $X_{i,j}$ are stochastically independent for different $j$ due to the assumptions regarding $N_{i,j}$ and $X^k_{i,j}$.
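Under these assumptions (Poisson counts, i.i.d. amounts independent of the counts), the compound Poisson moments reduce to $\mathrm{E}[X_{i,j}] = \varepsilon_i p_{i,j}\,\mathrm{E}[X^k_{i,j}]$ and $\mathrm{Var}(X_{i,j}) = \varepsilon_i p_{i,j}(1+\rho^2)\,\mathrm{E}[X^k_{i,j}]^2$, where the second expression uses the constant coefficient of variation $\rho$. The Python sketch below evaluates them. Since the exact mean specification (19) of Wright (1990) is not reproduced here, the mean payment amount is passed in directly; the function name and the example values are hypothetical.

```python
def compound_poisson_moments(eps_i, p_ij, mean_amount, cv):
    """Mean and variance of X_ij = sum_{k=1}^{N_ij} X^k_ij under compound Poisson assumptions.

    N_ij ~ Poisson(eps_i * p_ij); the amounts X^k_ij are i.i.d., independent of N_ij,
    with mean `mean_amount` and constant coefficient of variation `cv` (rho).
    """
    lam = eps_i * p_ij                                    # E[N_ij] = Var(N_ij)
    second_moment = (1.0 + cv ** 2) * mean_amount ** 2    # E[(X^k_ij)^2]
    return lam * mean_amount, lam * second_moment         # compound Poisson moments

print(compound_poisson_moments(eps_i=200.0, p_ij=0.25, mean_amount=1000.0, cv=0.5))
# (50000.0, 62500000.0)
```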
Moreover, Wright (1990) normalizes the incremental payments $X_{i,j}$ with the help of (23), with the exposure defined by (24). By using (17), (21), (23) and (24), the expected value $\mathrm{E}[\tilde X_{i,j}] = \mu_{i,j}$ of the normalized incremental payments $\tilde X_{i,j}$ can be stated as (25). Assuming that $\phi_i$ and $\psi_j$ are known, one obtains a generalized linear model of the form

$$\tilde X_{i,j} = h^{-1}\big(x_j^\top \beta_i\big) + e_{i,j}$$

with the exponential response function $h^{-1}$, the linear predictor $x_j^\top \beta_i$ and a noise term $e_{i,j}$ with $\mathrm{E}[e_{i,j}] = 0$ and $\mathrm{Var}(e_{i,j}) = \mu_{i,j}\phi_i\psi_j$. The parameter estimators $\hat\beta_i$ and their variance-covariance matrices $R_i$ can be determined for all $i$ using the Fisher scoring algorithm, such that $\hat\beta_i \sim \mathcal{N}(\beta_i, R_i)$ holds approximately. However, since $\phi_i$ and $\psi_j$ are usually unknown, Wright (1990) proposes an iterative approach in which parameter initializations provide initial values for $\phi_i$ and $\psi_j$. In this approach, all accident years are processed sequentially, and the results for all accident years are then used to obtain new parameter estimates for the next run.
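The Fisher scoring step can be illustrated with a minimal, dependency-free Python sketch of iteratively reweighted least squares for a log link with variance $\mu_j \phi_i \psi_j$, where the product $\phi_i\psi_j$ is passed as `disp[j]` and assumed known as in the text. The covariates $x_j = (1, \log(j+1), j)$ are a hypothetical Hoerl-type choice and not necessarily the linear predictor used by Wright (1990); the returned matrix is the usual asymptotic covariance $(X^\top W X)^{-1}$, playing the role of $R_i$.

```python
import math

def invert(A):
    """Inverse of a small dense matrix via Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if r == c else 0.0 for c in range(n)]
         for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        d = M[col][col]
        M[col] = [v / d for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def fisher_scoring_log_link(y, Xd, disp, iters=25):
    """Fisher scoring (IRLS) for E[y_j] = exp(x_j' beta) with Var(y_j) = mu_j * disp[j].

    Returns beta_hat and (X' W X)^{-1} at beta_hat, the approximate covariance R.
    """
    n, p = len(y), len(Xd[0])

    def cross_products(w, v):
        # X' W X and X' W v for diagonal weights w
        xtwx = [[sum(w[k] * Xd[k][a] * Xd[k][b] for k in range(n)) for b in range(p)]
                for a in range(p)]
        xtwv = [sum(w[k] * Xd[k][a] * v[k] for k in range(n)) for a in range(p)]
        return xtwx, xtwv

    mu = [v + 0.1 for v in y]                          # usual IRLS starting values
    eta = [math.log(m) for m in mu]
    beta = [0.0] * p
    for _ in range(iters):
        w = [m / d for m, d in zip(mu, disp)]          # (dmu/deta)^2 / Var(y) = mu / disp
        z = [e + (yy - m) / m for e, yy, m in zip(eta, y, mu)]   # working response
        xtwx, xtwz = cross_products(w, z)
        R = invert(xtwx)
        beta = [sum(R[a][b] * xtwz[b] for b in range(p)) for a in range(p)]
        eta = [sum(xa * ba for xa, ba in zip(row, beta)) for row in Xd]
        mu = [math.exp(e) for e in eta]
    xtwx, _ = cross_products([m / d for m, d in zip(mu, disp)], mu)
    return beta, invert(xtwx)

# Noise-free check with hypothetical Hoerl-type covariates x_j = (1, log(j + 1), j)
Xd = [[1.0, math.log(j + 1), float(j)] for j in range(7)]
beta_true = [5.0, 1.5, -0.8]
y = [math.exp(sum(a * b for a, b in zip(row, beta_true))) for row in Xd]
beta_hat, R_hat = fisher_scoring_log_link(y, Xd, disp=[1.0] * 7)
print([round(v, 4) for v in beta_hat])
```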

Modeling the parameter variation via a state space model
To increase the reliability of the estimators $\hat\beta_i$, Wright (1990) models the variation of the parameters $\beta_i$ across accident years $i$ via a recursive relation (27). By defining a suitable state vector $x_i$ and using (26), (27) can be written as (28), with conditions on the noise terms that hold for all $h, i = 1, \dots, I$. Thus, Equation (28) forms the state equation of a state space model. Treating the estimators $\hat\beta_i$ as observations $y_i$, the associated observation equation (29) is obtained for $i = 1, \dots, I$. Accordingly, a complete state space model with $w = 3$ and $v = 4$ is specified via Equations (28) and (29).
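To convey the idea of smoothing the per-accident-year estimates through a state equation, the following deliberately simplified Python sketch filters a single coefficient with a scalar random walk, treating the estimates $\hat\beta_i$ as observations with known variances taken from $R_i$. This is an assumption-laden stand-in, not the multivariate system (28) and (29) of Wright (1990); the random-walk form, `q_var` and the nearly diffuse prior are illustrative choices.

```python
def local_level_filter(beta_hat, obs_var, q_var, m0=0.0, P0=1e6):
    """Scalar Kalman filter: beta_i = beta_{i-1} + v_i, beta_hat_i = beta_i + w_i.

    beta_hat: per-accident-year estimates of one coefficient (e.g. from Fisher scoring);
    obs_var: their variances (diagonal entries of R_i); q_var: Var(v_i), an assumption.
    Returns a list of filtered (mean, variance) pairs.
    """
    m, P = m0, P0
    out = []
    for y, r in zip(beta_hat, obs_var):
        P = P + q_var                  # prediction: the random walk adds q_var
        k = P / (P + r)                # Kalman gain
        m = m + k * (y - m)            # update with the new accident-year estimate
        P = (1.0 - k) * P
        out.append((m, P))
    return out

smoothed = local_level_filter([1.4, 1.7, 1.2, 1.6], [0.04, 0.09, 0.04, 0.25], q_var=0.01)
for mean, var in smoothed:
    print(round(mean, 3), round(var, 4))
```

Imprecise estimates (large variances) are pulled towards the filtered level of the neighboring accident years, which is the intended gain in reliability.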

Kalman Filters with Applications to Loss Reserving
Zehnwirth (1997) states that this article arose from various lecture notes on statistics and actuarial science and should be viewed primarily as an introduction to Kalman filter theory and ordinary least squares (OLS) estimation and their close relationship to Bayes estimation. Accordingly, Zehnwirth (1997) derives Kalman recursions for (multiple) linear regression models and the local level model, shows how sample-based updates of OLS estimators correspond to Bayes updates, and discusses state space models and the general Kalman filter algorithms.
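The link between Kalman recursions and least squares can be seen in a small example: for a regression $y_k = \beta_0 + \beta_1 x_k + e_k$ with a static state (no state noise), the Kalman filter with a nearly diffuse prior reproduces the batch OLS estimate. The Python sketch below (names, prior variance `P0` and data hypothetical) compares both; with a proper prior, the same recursions give the Bayes posterior mean instead.

```python
def recursive_ls(xs, ys, sigma2=1.0, P0=1e6):
    """Kalman recursions for y_k = beta0 + beta1 * x_k + e_k with a static state.

    The prior beta ~ (0, P0 * I) is nearly diffuse for large P0 (an assumption of this sketch).
    """
    beta = [0.0, 0.0]
    P = [[P0, 0.0], [0.0, P0]]
    for x, y in zip(xs, ys):
        h = [1.0, x]                                               # regressor row
        Ph = [P[r][0] * h[0] + P[r][1] * h[1] for r in range(2)]
        s = h[0] * Ph[0] + h[1] * Ph[1] + sigma2                   # innovation variance
        e = y - (h[0] * beta[0] + h[1] * beta[1])                  # one-step prediction error
        beta = [beta[r] + Ph[r] * e / s for r in range(2)]
        P = [[P[r][c] - Ph[r] * Ph[c] / s for c in range(2)] for r in range(2)]
    return beta

def batch_ols(xs, ys):
    """Closed-form OLS intercept and slope."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return [ybar - sxy / sxx * xbar, sxy / sxx]

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.9, 5.2, 7.1, 8.8]
print(recursive_ls(xs, ys), batch_ols(xs, ys))   # agree up to the small prior influence
```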
The focus of the experimental and empirical applications is not primarily on applying the Kalman filter but on investigating the trend properties within claims development triangles. In the experimental application, incremental payments $X_{i,j}$ in accident years $i = 1, \dots, I$ and development years $j = 0, \dots, I-1$ are simulated via a variation of the Hoerl curve (30), in which the factor $e^{\alpha}$ reflects the basic level of incremental payments, while the factor $e^{-0.2j}$ describes their decreasing behavior over the development years. Based on this, calendar year effects (in the form of inflation factors) are illustrated and the problem of overparameterization is addressed, which arises, e.g., when there are too many parameters for the individual accident years and can be remedied by recursively evolving parameters. However, no specific state space representation is developed.

Loss Reserving: Past, Present and Future
Taylor et al. (2003) give a classification scheme for claims reserving methods whose higher-level criteria make a division between static and dynamic methods. In the framework of this taxonomic classification and especially with respect to the dynamic methods, they discuss a generalized Kalman filter, which allows for non-linearities in the observation equation and noise terms following a distribution of the Exponential Dispersion Family (EDF). They present two modeling approaches based on different types of claims data and state space representations constructed specifically for these data.

Accident year-based state space modeling
In the first modeling approach, an accident year-based state space representation is constructed, which uses the Payments Per Claim Incurred (PPCI) of a workers' compensation insurance policy as claims data. The PPCI of an accident year $i = 0, \dots, I$ in development year $j = 0, \dots, I$ are denoted by $Y_{i,j}$ and belong to the $(t = i + j)$-th calendar year with $t = 0, \dots, I$.
The state space model considered by Taylor et al. (2003) is based on a linear state equation of the form

$$x_{i+1} = F_i x_i + v_i, \qquad \mathrm{E}[v_i] = 0, \qquad \mathrm{E}[v_i v_k^\top] = \delta_{i,k} Q_i, \tag{31}$$

with five-dimensional random vectors $x_i$ and $v_i$ for $i, k = 0, \dots, I-1$, while the observation equation is based on a generalized linear model

$$y_i = h^{-1}(G_i x_i) + w_i, \qquad \mathrm{E}[w_i] = 0, \qquad \mathrm{E}[w_i w_k^\top] = \delta_{i,k} R_i, \tag{32}$$

with link function $h$ (i.e., response function $h^{-1}$) and linear predictor $G_i x_i$ for all $i, k = 0, \dots, I$. Moreover, $\mathrm{E}[v_i w_k^\top] = O$ holds for all $i, k = 0, \dots, I$, the initial state $x_0$ is uncorrelated with $v_i$ and $w_i$ for all $i = 0, \dots, I$, and $w_i$ is assumed to be EDF-distributed for all $i = 0, \dots, I$. Thus, any strictly monotonic and differentiable link function $h$ (such as the logarithm) can be used to link the EDF-distributed observations $y_i$ and the systematic component $G_i x_i$. Taylor et al. (2003) refer to the resulting recursive equations as the EDF filter, which includes the Kalman filter as a special case, namely for the identity link function and normally distributed noise terms $w_i$. The observation vector $y_i$ in (32) includes all PPCIs of an accident year $i = 0, \dots, I$ of the upper claims development triangle (see Figure 4).
Figure 4. Accident year-based modeling of the observation vector.
Taylor et al. (2003) propose the logarithm as link function, the noise terms $w_i$ are assumed to be gamma-distributed, and the $(j+1)$-th row of the linear predictor $G_i x_i$ for an accident year $i = 0, \dots, I$ is given by a Hoerl-curve-type expression (33) in the development year $j = 0, \dots, I$. Here, the Kronecker delta $\delta_{j,0}$ is used to model the peak in development year $j = 0$. Combining (32) and (33) yields the observation equation of accident year $i = 0, \dots, I$. On the other hand, Taylor et al. (2003) do not provide any information on the concrete form of the state Equation (31).
Taylor et al. (2003) model the evolution of the PPCI over the development years according to (33) in a similar way to De Jong and Zehnwirth (1983), Wright (1990) and Zehnwirth (1997), who specify the evolution of incremental payments over the development years with the help of a Hoerl curve. Taylor et al. (2003) apply this approach to the PPCI, as their evolution over the development years is similar to that of incremental payments: they reach their peak in development year $j = 0$ and then drop relatively quickly to zero. This evolution of the PPCI is also the justification of Taylor et al. (2003) for the choice of the logarithm as link function and the assumption of a gamma distribution for the measurement noise.
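The EDF filter of Taylor et al. (2003) is not reproduced here. As a hedged illustration of how a non-identity link can enter a Kalman-type update, the Python sketch below performs one extended-Kalman-style step: the log link is linearized at the prior mean, and the gamma measurement noise enters only through its variance $\mu^2/\nu$ with shape $\nu$. The covariate row `g`, the shape and all numbers are hypothetical, and this linearization is a swapped-in technique that need not coincide with the EDF filter recursions.

```python
import math

def ekf_log_link_update(x, P, g, y, shape):
    """One linearized (extended-Kalman-style) update for a scalar y with E[y] = exp(g'x).

    The log link is linearized at the prior mean x; gamma measurement noise enters only
    through its variance mu^2 / shape. Not the EDF filter of Taylor et al. (2003).
    """
    n = len(x)
    mu = math.exp(sum(gk * xk for gk, xk in zip(g, x)))       # prior predicted mean
    H = [mu * gk for gk in g]                                  # d mu / d x at x
    PH = [sum(P[r][c] * H[c] for c in range(n)) for r in range(n)]
    S = sum(H[r] * PH[r] for r in range(n)) + mu ** 2 / shape  # innovation variance
    x_new = [x[r] + PH[r] * (y - mu) / S for r in range(n)]
    P_new = [[P[r][c] - PH[r] * PH[c] / S for c in range(n)] for r in range(n)]
    return x_new, P_new

x_new, P_new = ekf_log_link_update([0.0, 0.0], [[0.5, 0.0], [0.0, 0.5]], [1.0, 0.0],
                                   y=2.0, shape=10.0)
print(x_new, P_new)
```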

Calendar year-based state space modeling
For the second modeling approach, Taylor et al. (2003) use a data set from Taylor (2000) that consists of motor vehicle bodily injury claim closure rates. Here, rather than collecting the observations of each accident year, they stack the observations of each calendar year into observation vectors, because claim closure rates are relatively flat across development years but are subject to calendar year effects. The state space model proposed by Taylor et al. (2003) has a linear state equation and an observation equation in the form of a generalized linear model, but differs from the first approach in the time index (calendar years $t$ instead of accident years $i$) and in the matrix dimensions. They consider the state equation

$$x_{t+1} = F_t x_t + v_t \tag{34}$$

with $(3t+9)$-dimensional random vectors $x_{t+1}$, $v_t$, a $(3t+6)$-dimensional random vector $x_t$ and transition matrix $F_t \in \mathbb{R}^{(3t+9)\times(3t+6)}$ for $t = 0, \dots, I-1$, and the observation equation of the $t$-th calendar year

$$y_t = h^{-1}(G_t x_t) + w_t \tag{35}$$

with $(t+1)$-dimensional random vectors $y_t$, $w_t$ and a $(t+1) \times (3t+6)$-dimensional system matrix $G_t$ for $t = 0, \dots, I$, where the assumptions concerning the noise terms correspond to those of the first approach (transferred to calendar years). Taylor et al. (2003) choose the identity as link function and assume normally distributed measurement noise, so that one obtains an ordinary linear observation equation and the usual linear Kalman filter can be used. This choice is motivated by the sufficiently high number of claim closures in the underlying claims data; the assumption of an approximate normal distribution is justified by the central limit theorem, although a discrete probability distribution such as the binomial distribution would be more appropriate. For the development of the expected claim closure rate $\mathrm{E}[Z_{i,j}]$ of the claims of an accident year $i = 0, \dots, I$ over the development years $j = 0, \dots, I$, Taylor et al. (2003) assume a parametric form (36) with $\gamma_t$ as the effect of the $t$-th calendar year and the Kronecker delta $\delta_{i+j,t}$. The observation vector

$$y_t = \big(Z_{t,0}, Z_{t-1,1}, \dots, Z_{0,t}\big)^\top \tag{37}$$

of the $t$-th calendar year, $t = 0, \dots, I$, contains all $t+1$ claim closure rates $Z_{i,j}$ of the respective calendar year $t = i + j$ (see Figure 5), which is why the $(3t+6)$-dimensional state vector $x_t$ takes the form (38).
Figure 5. Calendar year-based modeling of the observation vector.
While the state vector $x_i$ in the first modeling approach contains only the parameters of the $i$-th accident year, the state vector $x_t$ contains all parameters up to the $t$-th accident year plus the corresponding calendar year effect, because the observations of the $t$-th calendar year pass through all accident years $i = 0, \ldots, t$. The observation Equation (35) is thus given by (37), with $\gamma_t$ according to (38) for all $i, j = 0, \ldots, t$ and three-dimensional zero vectors $0$. The state Equation (34) then has a block structure in which $I$ and $O$ in $F_t$ denote the $3 \times 3$ identity and zero matrices, respectively, and $0$ in $v_t$ denotes three-dimensional zero vectors. The state equation thus involves a dynamic estimation of the parameters $\beta^*_{t+1}$ and $\gamma_{t+1}$. Table 2 gives an overview of the dimensions of the vectors and matrices in the state space models of Taylor et al. (2003).
Table 2. Dimensions in the state space models of Taylor et al. (2003).
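The calendar-year stacking can be illustrated with a few lines of Python. The sketch below extracts the $t+1$ entries of calendar year $t$ from a square triangle. The function name `calendar_year_vector`, the array layout (rows = accident years, columns = development years, 0-based as in Taylor et al. 2003) and the ordering by ascending accident year are our own assumptions, not details taken from the article.

```python
import numpy as np


def calendar_year_vector(triangle: np.ndarray, t: int) -> np.ndarray:
    """Return the t + 1 entries Z[i, t - i], i = 0, ..., t, of calendar year t.

    Assumed layout: rows are accident years i = 0..I, columns are development
    years j = 0..I (0-based), cells below the latest diagonal are NaN.
    Entries are ordered by ascending accident year (an illustrative choice).
    """
    n = triangle.shape[0]
    if not 0 <= t < n:
        raise ValueError("calendar year t must lie in 0, ..., I")
    return np.array([triangle[i, t - i] for i in range(t + 1)])


# Toy triangle with I = 3: each observed cell stores 10 * i + j.
I = 3
Z = np.full((I + 1, I + 1), np.nan)
for i in range(I + 1):
    for j in range(I + 1 - i):
        Z[i, j] = 10 * i + j

y_2 = calendar_year_vector(Z, 2)  # cells (0, 2), (1, 1), (2, 0)
```

Each observation vector grows by one entry per calendar year, which is why the dimensions of $y_t$, $w_t$ and $G_t$ depend on $t$.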

The Application of State Space Model in Outstanding Claims Reserve
Pang and He (2012) largely adopt the second modeling approach of Taylor et al. (2003), but without calendar year effects. They extend the state equation by a further lag of the state vector. Accordingly, they consider the state space model
$$y_t = G_t x_t + w_t, \tag{39}$$
$$x_{t+1} = F_t x_t + H_t x_{t-1} + v_t, \tag{40}$$
with the corresponding noise assumptions for all $s, t = 1, \ldots, I$. Table 3 gives an overview of the dimensions of the vectors and matrices in the state space model of Pang and He (2012).
Table 3. Dimensions in the state space model of Pang and He (2012).

The observation vector $y_t$ contains all observations $X_{i,j}$ of the $t$-th calendar year, i.e., all $X_{i,j}$ with $i + j - 1 = t$. The nature of the claims data is, however, not clear, as the authors refer to them only as "times of compensation". In view of the magnitude of the observations and their modeling, we assume that the claims data are incremental payments. The expected incremental payments of accident year $i = 1, \ldots, I$ are assumed to follow a parametric evolution over the development years $j = 1, \ldots, I$ similar to (33), involving the Kronecker delta $\delta_{j,1}$. Thus, the observation Equation (39) of the $t$-th calendar year ($t = 1, \ldots, I$) takes a form similar to that of the second modeling approach of Taylor et al. (2003), for all $i, j = 1, \ldots, I$. Pang and He (2012) do not give the general representation (40) of the state equation, but only a reduced form consisting of the last four rows of (40), which are the rows of interest. For the $(4 \times 4)$-dimensional parameter matrices in these rows, they assume scalar matrices $F^*_t = \mu_t I$ and $H^*_t = \eta_t I$ for all $t = 1, \ldots, I$, so that the reduced state Equation (42) states that the last four components of $x_{t+1}$ equal $\mu_t$ times the last four components of $x_t$, plus $\eta_t$ times the last four components of $x_{t-1}$, plus the noise vector $v^*_t$. To express the state equation in the form (40) instead, the upper $(4t \times 4t)$-dimensional part of $F_t$ is an identity matrix, while the last four rows of $F_t$ contain the scalar matrix $F^*_t = \mu_t I$ in their last four columns and zeros otherwise. The parameter matrix $H_t$ has zeros in its upper $(4t \times (4t-4))$-dimensional part and in the last four rows except for the last four columns, which form the $(4 \times 4)$-dimensional scalar matrix $H^*_t = \eta_t I$. The noise vector $v_t$ is zero in its first $4t$ rows and equals $v^*_t$ in the remaining rows.
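The additional lag in (40) makes the parameter recursion second order. A standard device, which is not necessarily how Pang and He (2012) proceed, is to stack two consecutive parameter blocks so that an ordinary first-order Kalman filter applies. The Python sketch below shows this for the scalar-matrix case $F^*_t = \mu_t I$, $H^*_t = \eta_t I$; the names `companion_transition`, `b_t` and `T_t` are our own.

```python
import numpy as np


def companion_transition(mu_t: float, eta_t: float, k: int = 4) -> np.ndarray:
    """Transition matrix T_t for the stacked state s_t = (b_t, b_{t-1}).

    It turns the second-order recursion
        b_{t+1} = mu_t * b_t + eta_t * b_{t-1} + v_t   (k-dimensional blocks)
    into the first-order form s_{t+1} = T_t s_t + (v_t, 0).
    """
    eye = np.eye(k)
    return np.block([[mu_t * eye, eta_t * eye],
                     [eye, np.zeros((k, k))]])


rng = np.random.default_rng(0)
b_prev, b_curr = rng.normal(size=4), rng.normal(size=4)
mu_t, eta_t = 0.7, 0.2

# Noise-free step computed directly and via the stacked first-order form.
direct = mu_t * b_curr + eta_t * b_prev
stacked = companion_transition(mu_t, eta_t) @ np.concatenate([b_curr, b_prev])
```

The upper half of the stacked vector reproduces the recursion, while the lower half simply carries $b_t$ forward.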

Log-Normal Models for Incremental Payments (Category 2)
This section presents articles in which incremental payments are assumed to be log-normally distributed and are modeled using a log-normal model: Verrall (1989), Verrall (1994), Ntzoufras and Dellaportas (2002) and Li (2006). The articles of Verrall (1989, 1994) are presented in detail, since they are mainly based on state space models and Kalman filter learning theory, while the models in Ntzoufras and Dellaportas (2002) and Li (2006) are treated more concisely.

A State Space Representation of the Chain Ladder Linear Model
Verrall (1989) discusses various state space representations based on the model of a two-way ANOVA, and thus follows Kremer (1982), who shows a close connection between the CL method and the two-way ANOVA. In addition to a dynamic estimation of the parameters by means of the Kalman filter algorithms, Verrall (1989) also considers static models without and with prior information.

The linear Chain Ladder model
The modeling is based on increments $X_{i,j} > 0$, $i, j = 1, \ldots, I$. The restriction to positive values is necessary because the increments are log-transformed; in practice, the model of Verrall (1989) can therefore be applied to paid data, but not to incurred data, whose increments may be negative. For the increments $X_{i,j}$, the multiplicative model
$$X_{i,j} = u_i\, s_j\, r_{i,j} \tag{43}$$
is assumed, with $u_i$ as parameter of accident year $i$, $s_j$ as parameter of development year $j$ and a noise term $r_{i,j}$ with $E[r_{i,j}] = 1$ for all $i, j = 1, \ldots, I$. Further, the increments are assumed to be log-normally distributed, so the logarithmized increments $Y_{i,j} = \log X_{i,j}$ are normally distributed. Taking logarithms on both sides of (43) leads to the (additive) two-way ANOVA model with normally distributed residuals
$$Y_{i,j} = \mu + \alpha_i + \beta_j + w_{i,j} \tag{44}$$
with population mean $\mu$, row parameter $\alpha_i$, column parameter $\beta_j$ and $w_{i,j} \sim WN(0; \sigma^2)$ for all $i, j = 1, \ldots, I$. Verrall (1989) assumes $\alpha_1 = \beta_1 = 0$, so that $\mu = \log u_1 + \log s_1$, $\alpha_i = \log u_i - \log u_1$ and $\beta_j = \log s_j - \log s_1$ for $i, j = 2, \ldots, I$, and $w_{i,j} = \log r_{i,j}$ for all $i, j = 1, \ldots, I$. Since (44) is a model for logarithmized increments, it is referred to in the actuarial literature as the log-normal model. Verrall (1989), however, calls it the linear CL model because it is very similar to the CL method (in an additive representation). Kremer (1982) shows this similarity by estimating the parameters of (44) via OLS for the two-way ANOVA and then reversing the logarithmic transformation. The resulting predictor (45) for the ultimate claim of accident year $i = 1, \ldots, I$ is similar to the CL predictor except for a different parameterization. However, Verrall (1989) argues that (45) is neither a maximum likelihood estimator nor an unbiased estimator of the expected ultimate claim, so he proposes using Bayes estimators instead.
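Kremer's (1982) observation can be illustrated by fitting (44) by ordinary least squares to the observed log increments. The sketch below assumes a square NumPy array with NaN in the unobserved lower triangle, builds the design matrix for $\mu, \alpha_2, \ldots, \alpha_I, \beta_2, \ldots, \beta_I$ and solves it with `numpy.linalg.lstsq`. It covers only the static OLS fit, not the Bayesian or Kalman filter estimation of Verrall (1989), and the function name `fit_linear_cl` is hypothetical.

```python
import numpy as np


def fit_linear_cl(X: np.ndarray):
    """OLS fit of Y_ij = mu + alpha_i + beta_j + w_ij with alpha_1 = beta_1 = 0.

    X: (I x I) array of positive incremental payments; only the upper
    triangle (i + j <= I + 1 in 1-based indices) is used.
    Returns (mu, alpha, beta) with alpha[0] = beta[0] = 0.
    """
    I = X.shape[0]
    rows, y = [], []
    for i in range(I):              # 0-based accident year (i = 0 is year 1)
        for j in range(I - i):      # observed development years
            r = np.zeros(2 * I - 1)
            r[0] = 1.0              # population mean mu
            if i > 0:
                r[i] = 1.0          # alpha_{i+1} in columns 1 .. I-1
            if j > 0:
                r[I - 1 + j] = 1.0  # beta_{j+1} in columns I .. 2I-2
            rows.append(r)
            y.append(np.log(X[i, j]))
    coef, *_ = np.linalg.lstsq(np.array(rows), np.array(y), rcond=None)
    mu = coef[0]
    alpha = np.concatenate([[0.0], coef[1:I]])
    beta = np.concatenate([[0.0], coef[I:]])
    return mu, alpha, beta


# Noise-free synthetic triangle generated from known parameters (illustrative only).
I = 4
mu_true = 5.0
alpha_true = np.array([0.0, 0.1, 0.3, 0.2])
beta_true = np.array([0.0, -0.5, -1.2, -2.0])
X = np.full((I, I), np.nan)
for i in range(I):
    for j in range(I - i):
        X[i, j] = np.exp(mu_true + alpha_true[i] + beta_true[j])

mu_hat, alpha_hat, beta_hat = fit_linear_cl(X)
```

Without noise the fit recovers the generating parameters; reversing the log transformation, e.g. `np.exp(mu_hat + alpha_hat[i] + beta_hat[j])`, then gives fitted increments in the original scale.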
In addition, Verrall (1989) develops several state space representations of the linear CL model (44), which are the focus of what follows.

Development of an appropriate state space representation
In order to specify a state space representation and to use dynamic estimation methods, the linear CL model has to be written in recursive form. For this purpose, Verrall (1989) collects the observations of calendar year $t = 1, \ldots, I$ in the $t$-dimensional vector $y_t$. Unlike De Jong and Zehnwirth (1983), however, he does not use the observations $X_{i,j}$ themselves but their logarithms $Y_{i,j} = \log X_{i,j}$ (see Figure 6).
Figure 6. Modeling the observation vector in Verrall (1989).
Using a state vector $x_t$ that contains the model parameters $\mu, \alpha_2, \ldots, \alpha_t, \beta_2, \ldots, \beta_t$ up to the $t$-th accident and development year, an appropriate observation equation for the $t$-th calendar year based on (44) consists of the components
$$Y_{i,t+1-i} = \mu + \alpha_i + \beta_{t+1-i} + w_{i,t+1-i}, \qquad i = 1, \ldots, t, \tag{46}$$
with $\alpha_1 = \beta_1 = 0$, or, in more compact form, $y_t = G_t x_t + w_t$ for all $t = 1, \ldots, I$. For the third calendar year, for instance, (46) yields
$$Y_{1,3} = \mu + \beta_3 + w_{1,3}, \qquad Y_{2,2} = \mu + \alpha_2 + \beta_2 + w_{2,2}, \qquad Y_{3,1} = \mu + \alpha_3 + w_{3,1}.$$
For the state equation, Verrall (1989) gives several alternatives, the most general being
$$x_{t+1} = F_t x_t + Q_t u_t + v_t. \tag{48}$$
Here, $w_t$, $v_t$ and $u_t$ are pairwise stochastically independent for all $t = 1, \ldots, I$, and the input vector $u_t$ is independent of the state vector $x_t$. Table 4 gives an overview of the dimensions of the vectors and matrices in the state space model of Verrall (1989).
Table 4. Dimensions in the state space model of Verrall (1989).
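The system matrix $G_t$ of the compact observation equation $y_t = G_t x_t + w_t$ follows mechanically from (46). The Python sketch below builds it for any $t$, assuming the state order $(\mu, \alpha_2, \ldots, \alpha_t, \beta_2, \ldots, \beta_t)$ stated above and, as our own choice, rows ordered by ascending accident year; the function name `verrall_design` is hypothetical.

```python
import numpy as np


def verrall_design(t: int) -> np.ndarray:
    """System matrix G_t (t x (2t - 1)) of the calendar-year observation equation.

    State order (as in the text): mu, alpha_2..alpha_t, beta_2..beta_t.
    Row order (our assumption): cells (i, t + 1 - i), i = 1..t ascending.
    """
    G = np.zeros((t, 2 * t - 1))
    for i in range(1, t + 1):
        j = t + 1 - i
        row = i - 1
        G[row, 0] = 1.0              # mu
        if i >= 2:
            G[row, i - 1] = 1.0      # alpha_i sits in column i - 1
        if j >= 2:
            G[row, t + j - 2] = 1.0  # beta_j sits in column t + j - 2
    return G


G3 = verrall_design(3)  # rows correspond to Y_{1,3}, Y_{2,2}, Y_{3,1}
```

Each row contains a one for $\mu$ and at most one row and one column parameter, mirroring the restrictions $\alpha_1 = \beta_1 = 0$.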

The dynamics of the system depend on the matrices $F_t$ and $Q_t$ and on the distribution of the input vector $u_t$ in the state Equation (48). The simplest case arises when $u_t$ and $v_t$ are zero vectors for all $t = 1, \ldots, I$ and the parameters at time $t+1$ are the same as those at time $t$; (48) then reduces to (49). If, on the other hand, the parameters at times $t+1$ and $t$ are allowed to differ, the variant (50) of the state Equation (48) can be used, in which the parameters already determined remain unchanged and the new parameters enter as stochastic inputs. Both (49) and (50) lead to static parameter estimation; dynamic parameter estimation via the Kalman filter becomes possible once a stochastic noise term $v_t$ is added. For dynamic modeling, Verrall (1989) proposes state equations for two cases: a dynamic estimation of the row parameters only, and a simultaneous dynamic estimation of the row and column parameters. The row parameters can be estimated dynamically via the random walk $\alpha_{t+1} = \alpha_t + v_t$, which leads to the state equation (51). If both the row and the column parameters are to follow random walks (52), the input vector becomes obsolete and the state equation takes the form (53), which can also be written out explicitly for $t = 3$. Dynamic parameter estimation thus lies between the cases of identical and of different parameters: the parameters at $t+1$ are related to those at $t$, but need not coincide with them.

Verrall (1994) adopts the state space model of Verrall (1989) in order to model a run-off evolution within the CL method that is not necessarily homogeneous across accident years. With this approach, he addresses one of the main criticisms of the CL method, namely its homogeneity property.
Since the state space model from Verrall (1989) is a linear CL model according to (44), Verrall (1994) shows how this model can be adjusted when there is a varying development pattern across accident years.

Connection between CL factors and column parameters
A possible way to model a run-off evolution that is not necessarily homogeneous across accident years is to use the individual CL factors $F_{i,j}$ for all $i, j$ instead of the CL development factors $f_j$. Such modeling allows for different development factors in different accident years, but at the cost of overparameterization. It is therefore reasonable to strike a balance between these two extremes, i.e., between CL development factors that are identical across accident years and fully individual CL factors. For this purpose, Verrall (1994) uses the connection (54) between the CL factors and the column parameters $\beta_j$ of the linear CL model (44) (see Verrall 1991), which allows the homogeneity property of the CL method to be relaxed indirectly via modifications of the linear CL model.

Development of an appropriate state space representation
Verrall (1994) modifies the linear CL model of Verrall (1989) such that the column parameters $\beta_j$, $j = 2, \ldots, I$, need not be identical across accident years. He differentiates the parameters $\beta_j$ by accident year $i = 1, \ldots, I$ by extending the notation to $\beta_{i,j}$, where $\beta_{i,j}$ is the column parameter $\beta_j$ in the $i$-th accident year. Verrall (1994) does not give general definitions of the observation and state equations, but such representations can be stated as follows. The observation equation of the $t$-th calendar year consists of the components
$$Y_{i,j} = \mu + \alpha_i + \beta_{i,j} + w_{i,j}, \qquad i + j - 1 = t,$$
with $\alpha_1 = \beta_{i,1} = 0$. For $t = 3$, for example, this yields
$$Y_{1,3} = \mu + \beta_{1,3} + w_{1,3}, \qquad Y_{2,2} = \mu + \alpha_2 + \beta_{2,2} + w_{2,2}, \qquad Y_{3,1} = \mu + \alpha_3 + w_{3,1}.$$
A connection between the parameters of successive accident years can be established by the state Equation (48). In this regard, the row parameters can be estimated dynamically via
$$\alpha_{i+1} = \alpha_i + v_i \tag{55}$$
with $\alpha_1 = 0$ and $E[v_i] = 0$ for all $i = 1, \ldots, I-1$, which avoids overparameterization of the model. The column parameters $\beta_{i,j}$ of a development year $j$ are assumed to be connected across accident years $i$ via the random walk
$$\beta_{i,j} = \beta_{i-1,j} + v_{i,j} \tag{56}$$
with $\beta_{i,1} = 0$, $\beta_{0,j} = 0$ and $E[v_{i,j}] = 0$ for all $i = 1, \ldots, I$ and $j = 2, \ldots, I$. In this manner, the parameters of a specific development year are similar across accident years and may be identical, but need not be. If the noise terms $v_{i,j}$ have zero variance for all $i, j$, one obtains the state Equation (51) of Verrall (1989): the column parameters $\beta_{i,j}$ of development year $j$ are identical across all accident years and coincide with the column parameter $\beta_j$ of the linear CL model (44). The larger the variance of the noise terms $v_{i,j}$, the more the parameters $\beta_{i,j}$ can vary across accident years. Accordingly, the variances of the individual noise terms can be used to reflect indications of changes in the development pattern. Combining (55) and (56) yields the state equation in the form (48), which can be written out explicitly for $t = 3$ in the same way. Finally, once estimates of the column parameters $\beta_{i,j}$ have been obtained by means of the Kalman filter, the individual CL factors $F_{i,j}$ can be determined separately for each accident year via (54) for $j = 2, \ldots, I$.
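The step from column parameters to accident-year-specific development factors can be sketched as follows. Since the exact form of (54) is not restated here, the code uses a simplified relation of our own: the ratio of expected cumulative payments under the multiplicative model (43) with $E[r_{i,j}] = 1$, i.e. $f_j = \sum_{k \le j} e^{\beta_k} / \sum_{k < j} e^{\beta_k}$, evaluated row by row with $\beta_{i,j}$ in place of $\beta_j$. It treats the estimates as known and applies no log-normal bias correction; the function name and the numbers are hypothetical.

```python
import numpy as np


def cl_factors_from_beta(beta_row: np.ndarray) -> np.ndarray:
    """Development factors for j = 2..I implied by beta_1..beta_I of one accident year.

    Simplified relation (assumption): f_j = sum_{k<=j} e^{beta_k} / sum_{k<j} e^{beta_k},
    i.e. the ratio of expected cumulative payments under (43).
    beta_row[0] must be beta_1 = 0.
    """
    s = np.exp(beta_row)   # s_j / s_1
    cum = np.cumsum(s)     # expected cumulative payments up to j, relative to s_1
    return cum[1:] / cum[:-1]


# Hypothetical filtered estimates: rows = accident years, columns = beta_{i,1..3}.
beta_est = np.array([[0.0, -0.70, -1.40],
                     [0.0, -0.60, -1.35]])
F_ind = np.vstack([cl_factors_from_beta(row) for row in beta_est])
```

Because each row has its own $\beta_{i,j}$, the resulting factors differ across accident years, which is exactly the relaxation of the homogeneity property described above.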
In this manner, a run-off evolution that is not necessarily homogeneous across accident years can be modeled within the CL method, while the recursive development of the column parameters avoids overparameterization. Furthermore, dynamic parameter estimation has a considerable advantage over static CL estimation: the observations of more recent accident years receive a higher weight in predicting the outstanding loss liabilities, whereas CL assigns the same weight to all observations.
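The dynamic estimation discussed above relies on the standard Kalman filter recursions. The following generic Python sketch performs one prediction and update step for a linear Gaussian model $x_{t+1} = F_t x_t + v_t$, $y = G x + w$, where $F_t$ may be rectangular so that the state vector can grow with $t$ as in Verrall (1989, 1994). The input term $Q_t u_t$ of (48) is omitted, and the names `kalman_step`, `V` and `W` (noise covariance matrices) are our own; this is a textbook filter step, not code from the reviewed articles.

```python
import numpy as np


def kalman_step(x, P, F, V, G, W, y):
    """One prediction/update step of the linear Kalman filter.

    Model: x_new = F x + v with Cov(v) = V, and y = G x_new + w with Cov(w) = W.
    F may be rectangular, so the state dimension can grow from step to step.
    Returns the filtered mean and covariance of x_new given y.
    """
    # Prediction step
    x_pred = F @ x
    P_pred = F @ P @ F.T + V
    # Update step with the new observation vector y
    S = G @ P_pred @ G.T + W               # innovation covariance
    K = np.linalg.solve(S, G @ P_pred).T   # gain P_pred G^T S^{-1} (S, P_pred symmetric)
    x_new = x_pred + K @ (y - G @ x_pred)
    P_new = (np.eye(x_pred.shape[0]) - K @ G) @ P_pred
    return x_new, P_new


# Scalar random walk alpha_{t+1} = alpha_t + v_t observed with unit noise variance.
x1, P1 = kalman_step(np.array([0.0]), np.array([[1.0]]), np.array([[1.0]]),
                     np.array([[1.0]]), np.array([[1.0]]), np.array([[1.0]]),
                     np.array([2.0]))
```

In the growing-state setting, `F` has more rows than columns and `V` is singular in the rows that merely copy parameters already in the state, which the formulas above handle without modification.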

Bayesian Modelling of Outstanding Liabilities Incorporating Claim Count Uncertainty
Ntzoufras and Dellaportas (2002) consider four models based on claims development triangles of incremental payments and claim counts for RBNS claims. They assume that claims are settled via one-off payments and justify this assumption with their empirical application, which uses run-off data from a large Greek motor insurance company: under Greek legislation, claims must be reported within three working days, and they are usually settled by a single payment. Since the proportion of claims paid in more than one installment is minimal, Ntzoufras and Dellaportas (2002) neglect it.
Two models are based solely on incremental payments, while the other two incorporate both incremental payments and claim counts and thus use Payments Per Claim Finalized (PPCF). Ntzoufras and Dellaportas (2002) adjust the incremental payments $X_{i,j}$ by the inflation index $\nu_{i,j} \geq 1$ of the corresponding calendar year $t = i + j - 1$ and log-transform the inflation-adjusted incremental payments, which are assumed to be log-normally distributed, i.e., $Y_{i,j} = \log(X_{i,j}/\nu_{i,j}) \sim N(\mu_{i,j}; \sigma^2)$. The specification of $\mu_{i,j}$ differs between the four models, but it is generally based on the two-way ANOVA model and thus also on the linear CL model (44) of Verrall (1989, 1994). Within models 3 and 4, Ntzoufras and Dellaportas (2002) consider state space models; however, they only specify the ANOVA model, recursive relationships between the parameters and model extensions, without developing a specific state space representation. The reason is that they do not employ the Kalman filter to fit the model and predict the outstanding loss liabilities, but a Bayesian approach combined with Markov Chain Monte Carlo (MCMC). As the article does not mainly rely on state space models and Kalman filter theory, the models are presented briefly, and details on the Bayesian approach are omitted.
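The inflation adjustment only requires mapping each cell to its calendar year. The Python sketch below computes $Y_{i,j} = \log(X_{i,j}/\nu_{i,j})$ for the observed upper triangle, assuming, for illustration, a square array with NaN in the unobserved cells and a vector of index values ordered by calendar year $t = 1, \ldots, I$; the function name and the toy numbers are hypothetical.

```python
import numpy as np


def deflated_log_increments(X: np.ndarray, nu_by_calendar_year: np.ndarray) -> np.ndarray:
    """Y_ij = log(X_ij / nu_ij), with nu_ij the index of calendar year t = i + j - 1.

    X: (I x I) incremental payments, NaN below the latest diagonal.
    nu_by_calendar_year: index values for t = 1..I (entry 0 belongs to t = 1).
    """
    I = X.shape[0]
    i, j = np.indices((I, I))                          # 0-based indices
    t0 = i + j                                         # 0-based calendar year (t - 1)
    nu = nu_by_calendar_year[np.minimum(t0, I - 1)]    # clip index for future cells
    with np.errstate(invalid="ignore"):
        Y = np.log(X / nu)
    Y[t0 > I - 1] = np.nan                             # future cells stay unobserved
    return Y


X_toy = np.array([[100.0, 110.0],
                  [120.0, np.nan]])
Y_toy = deflated_log_increments(X_toy, np.array([1.0, 1.1]))
```

Both cells of the second calendar year are divided by the same index value, since $\nu_{i,j}$ depends on $i$ and $j$ only through $t = i + j - 1$.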

Log-normal model for incremental payments (Model 1)
The log-normal model for incremental payments, in which the expected value $\mu_{i,j}$ is given by
$$\mu_{i,j} = \mu + \alpha_i + \beta_j \tag{57}$$
for all $i, j = 1, \ldots, I$ with $\alpha_1 = \beta_1 = 0$, has already been considered by various authors. That is, the expected (inflation-adjusted, log-transformed) incremental payments $\mu_{i,j}$ for claims of the $i$-th accident year that are paid with a lag of $j - 1$ years are modeled via a linear predictor. This predictor is the sum of $\mu$ (the expected inflation- and log-adjusted payments of the first accident year in the first development year), $\alpha_i$ (row parameter reflecting expected changes in the $i$-th accident year) and $\beta_j$ (column parameter reflecting expected changes in the $j$-th development year). According to Ntzoufras and Dellaportas (2002), the disadvantage of the ANOVA model is that it uses only one source of information (incremental payments) and omits claim counts. For example, it cannot take into account a strong increase in incremental payments caused by an unexpected increase in claim counts.

Log-normal model for PPCF (Model 2)
The log-normal model for PPCF extends the first model by additionally taking claim counts into account. For this purpose, Ntzoufras and Dellaportas (2002) give a two-stage model. The first stage relates to incremental payments,
$$\mu_{i,j} = \mu + \alpha_i + \beta_j + \log(N_{i,j}), \tag{58}$$
with $\alpha_1 = \beta_1 = 0$ and claim counts $N_{i,j} > 0$ for all $i, j = 1, \ldots, I$. Compared with model 1, the ANOVA model (57) is additively extended by the term $\log(N_{i,j})$, so $\mu$ in (58) can be interpreted as the logarithmized expected PPCF of the first accident year in the first development year, and the parameters $\alpha_i$ and $\beta_j$ as expected deviations from $\mu$ in later accident and development years, respectively. The second stage relates to the claim counts $N_{i,j} \sim P(\lambda_{i,j})$ with $\lambda_{i,j} > 0$ and is given by the log-linear model
$$\log \lambda_{i,j} = \mu^* + \alpha^*_i + \beta^*_j, \qquad i, j = 1, \ldots, I,$$
with hyper-parameters $\mu^*$ and $\alpha^*_i$, and $\beta^*_j = \log(\pi_j/\pi_1)$, where $\alpha^*_1 = \beta^*_1 = 0$, $0 < \pi_j < 1$ is the probability that a claim is settled with a lag of $j - 1$ years, and $T_i$ denotes the total number of claims of accident year $i$. In this model, an increase in incremental payments caused by higher claim counts is taken into account.
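The role of $\beta^*_j = \log(\pi_j/\pi_1)$ can be seen from a small numerical check. Assuming, for illustration only, that the Poisson means factorize as $\lambda_{i,j} = T_i \pi_j$, the choices $\mu^* = \log(T_1 \pi_1)$ and $\alpha^*_i = \log(T_i/T_1)$ give a log-linear decomposition that satisfies $\alpha^*_1 = \beta^*_1 = 0$. This is one parameterization consistent with the constraints above, not necessarily the one used by Ntzoufras and Dellaportas (2002); all numbers are hypothetical.

```python
import numpy as np

# Hypothetical inputs: total claim numbers T_i and settlement-lag probabilities pi_j.
T = np.array([1000.0, 1200.0, 900.0])
pi = np.array([0.6, 0.3, 0.1])

# Assumed factorization of the Poisson means: lambda_ij = T_i * pi_j.
lam = np.outer(T, pi)

# Log-linear decomposition log(lambda_ij) = mu* + alpha*_i + beta*_j.
mu_star = np.log(T[0] * pi[0])
alpha_star = np.log(T / T[0])    # alpha*_1 = 0
beta_star = np.log(pi / pi[0])   # beta*_1 = 0
recon = mu_star + alpha_star[:, None] + beta_star[None, :]
```

Under this assumption, $\beta^*_j$ measures how much less likely settlement with lag $j - 1$ is than settlement in the first development year, on the log scale.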

State space model for incremental payments (Model 3)
The state space model for incremental payments is based on Verrall (1989) and on the extension of the column parameters $\beta_j$ to $\beta_{i,j}$ proposed by Verrall (1994), i.e., $\mu_{i,j} = \mu + \alpha_i + \beta_{i,j}$. Here, the row and column parameters $\alpha_i$ and $\beta_{i,j}$ follow the recursions (59) and (60) across accident years, with noise terms $h_i \sim N(0; \sigma^2_h)$ and $v_i \sim N(0; \sigma^2_v)$, respectively, and $\alpha_1 = \beta_{i,1} = 0$, for all $i, j = 2, \ldots, I$. Thus, for the variance of the log-transformed, inflation-adjusted incremental payments $Y_{i,j}$, $\mathrm{Var}(Y_{i,j}) = \sigma^2$ holds for $i = 1$ or $j = 1$, and $\mathrm{Var}(Y_{i,j}) = \sigma^2 + (i-1)\sigma^2_v + \sigma^2_h$ holds for $i, j = 2, \ldots, I$, since in each accident year after $i = 1$ a weighted sum of the variance terms $\sigma^2_v$ and $\sigma^2_h$ of the recursions (59) and (60) is added to $\sigma^2$. Hence, this model differs from model 1 in two ways: the column parameters $\beta_j$ are extended to $\beta_{i,j}$, and both row and column parameters evolve recursively. The recursions (59) and (60) are decisively affected by the variances $\sigma^2_h$ and $\sigma^2_v$ of their noise terms. If $\sigma^2_h$ is close to zero, all row parameters tend to zero because $\alpha_1 = 0$. If, on the other hand, $\sigma^2_v = 0$, models 1 and 3 coincide (except for the $\alpha$-recursion), because the column parameters are then the same across all accident years, i.e., $\beta_{i,j} = \beta_j$ for all $i$.

State space model for PPCF (Model 4)
The state space model for PPCF extends model 3 by incorporating claim counts. Like model 2, it is a two-stage model, with stage 1 related to incremental payments and stage 2 related to claim counts. The first stage of model 4 is $\mu_{i,j} = \mu + \alpha_i + \beta_{i,j} + \log(N_{i,j})$ for all $i, j = 1, \ldots, I$ with recursions (59) and (60), and the second stage is identical to that of model 2. Hence, just as models 1 and 3, models 2 and 4 differ in the column parameters ($\beta_{i,j}$ instead of $\beta_j$) and in the recursive relationships of the row and column parameters.

Li (2006) compares several methods of stochastic claims reserving, including a state space model, in terms of their forecasts of the outstanding loss liabilities. The state space model considered is based on the common assumptions regarding the noise terms (as, for example, in De Jong and Zehnwirth 1983) and is constructed in analogy to Verrall (1989) via the log-normal model for incremental payments, i.e., the linear CL model (44), with a linear observation equation (61) involving the system matrix $G_t$ and a linear state equation (62) involving the transition matrix $F_t$. The observation vector $y_t$ contains all logarithmized incremental payments $Y_{i,j}$ of the $t$-th calendar year. The measurement noise $w_{i,j}$ that overlays the expected logarithmized incremental payments follows a Gaussian white noise process, $w_{i,j} \sim WN(0; \sigma^2_w)$. The state vector $x_t$ contains $\mu$, the row parameters $\alpha_2, \ldots, \alpha_t$ and the column parameters $\beta_2, \ldots, \beta_I$; thus, unlike in Verrall (1989), column parameters beyond $j = t$ are also included for $t < I$. Table 5 gives an overview of the dimensions of the vectors and matrices in the state space model of Li (2006).
Table 5. Dimensions in the state space model of Li (2006).

In the observation Equation (61) of the $t$-th calendar year, the part of the system matrix $G_t$ to the left of the vertical line is generally of dimensions $t \times (2t - 1)$, while the part to the right consists of $(I - t)$ zero columns for all $t = 1, \ldots, I$. Thus, for $t = I$, $G_t$ includes only the $(I \times (2I - 1))$-dimensional left part and no zero columns. For the state Equation (62), Li (2006) proposes a dynamic estimation of the row parameters according to a random walk, which leads to the transition matrix (63). For $t \geq 3$, the $(t-1)$-th column of $F_t$ thus contains the value one in rows $t - 1$ and $t$ and zeros otherwise. For $t = 2$, however, $F_t$ deviates from (63) in that its second row contains only zeros, because $\alpha_2 = v_2$. The noise term $v_t$ corresponds in each case to the $t$-th component of the vector $v_t$.
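The zero-column structure of Li's system matrix can be made explicit in code. The Python sketch below builds $G_t$ with $t + I - 1$ columns, assuming the state order $(\mu, \alpha_2, \ldots, \alpha_t, \beta_2, \ldots, \beta_I)$ described above and, as our own choice, rows ordered by ascending accident year; the function name `li_design` is hypothetical.

```python
import numpy as np


def li_design(t: int, I: int) -> np.ndarray:
    """System matrix G_t (t x (t + I - 1)) under the state order
    mu, alpha_2..alpha_t, beta_2..beta_I (rows: i = 1..t ascending, assumed).

    The first 2t - 1 columns carry mu, alpha_2..alpha_t, beta_2..beta_t;
    the last I - t columns (beta_{t+1}..beta_I) are zero.
    """
    G = np.zeros((t, t + I - 1))
    for i in range(1, t + 1):
        j = t + 1 - i
        G[i - 1, 0] = 1.0              # mu
        if i >= 2:
            G[i - 1, i - 1] = 1.0      # alpha_i
        if j >= 2:
            G[i - 1, t + j - 2] = 1.0  # beta_j
    return G


G_li = li_design(3, 5)  # 3 x 7 matrix with two trailing zero columns
```

The trailing zero columns keep the not yet observed column parameters in the state, so their prior information is carried forward until the corresponding development years are observed.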

Correlation Models (Category 3)
This section presents two articles, De Jong (2005) and De Jong (2006), which consider correlations across the different dimensions of claims development triangles. As the conference paper by De Jong (2005) can be seen as a preprint of De Jong (2006) with respect to claims reserving, it is presented only briefly, while De Jong (2006), which is substantially based on state space models and Kalman filter learning theory, is presented in detail.

State Space Models in Actuarial Science
De Jong (2005) discusses two applications of state space models in actuarial science, one related to mortality and one related to cumulative payments in run-off triangles. For the latter, he extends the model of Hertig (1985) and proposes the so-called development correlation model. This model is already presented in an earlier working paper by De Jong (2004), which also proposes two further models, the accident correlation model and the calendar correlation model, but without discussing their state space representations. This extension, i.e., embedding the three models into state space representations and fitting them via the Kalman filter, is carried out in De Jong (2006). Thus, with respect to applications of state space models in claims reserving, De Jong (2005) is a variant of De Jong (2006) that deals with only one of the correlation models. For this reason, we refer to the following subsection, in which De Jong (2006) is presented.

Forecasting Runoff Triangles
De Jong (2006) aims to predict the outstanding loss liabilities using three different models that can account for correlations within the claims data. In each case, De Jong (2006) gives state space representations for these models in order to be able to apply the Kalman filter to predict the claims reserves and to quantify their precision. Based on these results, he simulates the complete shape of the liability distribution. In the following, the focus is mainly on the state space representations of the considered models.
The correlation models proposed by De Jong (2006) are generally based on a model of Hertig (1985), which is extended so that correlations between the individual accident, development or calendar years can be incorporated. The models consider the logarithmized individual development factors
$$\delta_{i,j} = \ln\frac{C_{i,j}}{C_{i,j-1}}, \qquad i = 1, \ldots, I, \quad j = 1, \ldots, I-1, \tag{64}$$
with $\delta_{i,0} = \ln(C_{i,0})$. Using (64), the future growth rate $g_i$ of the cumulative payments of accident year $i = 2, \ldots, I$ can be decomposed as
$$g_i = \ln\frac{C_{i,I-1}}{C_{i,I-i}} = \sum_{j=I-i+1}^{I-1} \delta_{i,j}. \tag{65}$$
Considering (65), the outstanding loss liabilities $R_i = C_{i,I-1} - C_{i,I-i}$ of accident year $i = 2, \ldots, I$ are given by
$$R_i = C_{i,I-i}\left(e^{g_i} - 1\right). \tag{66}$$
Aggregating (66) over all accident years yields the total outstanding loss liabilities
$$R = \sum_{i=2}^{I} R_i. \tag{67}$$
Thus, in order to predict the outstanding loss liabilities, it is necessary to estimate the growth rates $g_2, \ldots, g_I$ according to (65), i.e., the future logarithmized individual development factors $\delta_{i,j}$ with $i + j > I$. For this purpose, De Jong (2006) considers three extended variants of the model of Hertig (1985),
$$\delta_{i,j} = \mu_j + h_j\, \varepsilon_{i,j}, \tag{68}$$
with $h_0 = 1$, $E[\varepsilon_{i,j}] = 0$ and $\mathrm{Var}(\varepsilon_{i,j}) = \sigma^2$, a simple model for logarithmized individual development factors in which the $\delta_{i,j}$ are assumed to be uncorrelated for all $i = 1, \ldots, I$, $j = 0, \ldots, I-1$. Here, $E[\delta_{i,j}] = \mu_j$ and $\mathrm{Var}(\delta_{i,j}) = h_j^2 \sigma^2$, i.e., the expected value and variance of the logarithmized individual development factors $\delta_{i,j}$ depend only on the development year $j$.
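Equations (64) to (67) translate directly into array operations. The Python sketch below computes the $\delta_{i,j}$ of a cumulative triangle and, as a deliberately naive stand-in for the Kalman filter predictions of De Jong (2006), replaces the future $\delta_{i,j}$ by column means of the observed ones, i.e. by plug-in estimates of $\mu_j$ in Hertig's model (68). It ignores the log-normal bias correction and parameter uncertainty; the function name and the toy triangle are hypothetical.

```python
import numpy as np


def hertig_plugin_reserves(C: np.ndarray):
    """Plug-in reserves R_i = C_{i,I-i} (exp(g_i) - 1), see (65)-(67).

    C: (I x I) cumulative triangle, rows = accident years 1..I, columns =
    development years 0..I-1, NaN below the latest diagonal.
    Future delta_ij are replaced by column means of the observed delta_ij
    (naive estimates of mu_j; no bias correction, no parameter uncertainty).
    """
    I = C.shape[0]
    delta = np.log(C[:, 1:] / C[:, :-1])   # delta_ij for j = 1..I-1, as in (64)
    mu_hat = np.nanmean(delta, axis=0)     # column-wise estimates of mu_j
    reserves = np.zeros(I)
    for i in range(1, I):                  # 0-based; row 0 is fully developed
        last = I - 1 - i                   # latest observed development year
        g = mu_hat[last:].sum()            # future j = last + 1, ..., I - 1
        reserves[i] = C[i, last] * (np.exp(g) - 1.0)
    return reserves, reserves.sum()


C_toy = np.array([[100.0, 150.0, 165.0],
                  [200.0, 300.0, np.nan],
                  [50.0, np.nan, np.nan]])
R_i, R_total = hertig_plugin_reserves(C_toy)
```

In the toy triangle, the future growth of the second accident year is $\ln 1.1$ and that of the third is $\ln(1.5 \cdot 1.1)$, so (66) gives reserves of 30 and 32.5.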
To incorporate correlations of the logarithmized individual development factors into the model of Hertig (1985), De Jong (2006) presents the development, accident and calendar correlation models, which consider correlations between development years $j$, accident years $i$ and calendar years $t = i + j$, respectively. To obtain state space representations of these models, De Jong (2006) generally suggests the state space model (69) and (70) with $t = 1, \ldots, I$, where the $t$-dimensional observation vector $y_t = (\delta_{1,t-1}, \ldots, \delta_{t,0})^T$ contains the logarithmized individual development factors $\delta_{i,j}$ of the $t$-th calendar year (see Figure 7). Because De Jong (2006) embeds all three models into the same general state space model, the resulting state space representations are considerably more complex than necessary. This contrasts with the compactness of the underlying models, in particular the development correlation model, which consists of only one model equation.

Development correlation model
The development correlation model captures correlations of $\delta_{i,j}$ across development years $j = 0, \ldots, I-1$ within a given accident year $i = 1, \ldots, I$ and is defined by (71), with $E[\varepsilon_{i,j}\varepsilon_{i,j-1}] = 0$ for $i = 1, \ldots, I$ and $j = 1, \ldots, I-1$. Here, the correlation between development years $j$ and $j - 1$ (i.e., between $\delta_{i,j}$ and $\delta_{i,j-1}$) is modeled via $\theta_j$. Based on empirical evidence, De Jong (2006) argues that only the correlation between the first two development years is relevant, so only the correlation between $\delta_{i,0}$ and $\delta_{i,1}$ is considered. The resulting correlation coefficient between $\delta_{i,0}$ and $\delta_{i,1}$ depends solely on $\theta_1$: if $\theta_1 = 0$, then $\delta_{i,0}$ and $\delta_{i,1}$ are uncorrelated, as in the model of Hertig (1985). Furthermore, setting $\theta_j = 0$ in (71) for all $j = 1, \ldots, I-1$ yields the original model of Hertig (1985). The development correlation model (71) can be transferred into a state space representation based on (69) and (70), with an observation equation involving the matrix $B_t$. The matrix $B_t$ consists of the last $t + 1$ rows of the row-permuted identity matrix $I \in \mathbb{R}^{I \times I}$; that is, for $t = I - 1$, $B_t$ corresponds to the row-permuted identity matrix to the left of the vertical line, and it loses one row for each earlier calendar year. For $t = 3$ and $I = 5$, for example, the state space representation of the development correlation model (71) can be written out explicitly.

Accident correlation model
The accident correlation model allows for correlations between accident years and implies that more recent accident years receive a higher weight in the prediction. To this end, the expected value $\mu_j$ in (68) is extended by a row index $i$ to $\mu_{i,j}$, which is assumed to follow a random walk across accident years ($i = 1, \ldots, I$, $j = 0, \ldots, I-1$), see (72). Here, $E[\eta_{i,j}] = 0$, $\mathrm{Var}(\eta_{i,j}) = \sigma^2_\eta$ and $E[\varepsilon_{i,j}\eta_{i,j}] = 0$ hold for all $i, j$. Thus, the expected value $\mu_{i,j}$ of a development year can change slowly across accident years. This change is governed by the parameter $\lambda_j$: the larger $\lambda_j$, the higher the weight of $\mu_{i,j}$ of more recent accident years. If $\lambda_j = 0$ for all $j$, the accident correlation model reduces to the model of Hertig (1985), since the expected value $\mu_{i,j}$ of a development year is then identical across all accident years. The accident correlation model (72) can be transferred into a state space representation based on (69) and (70), with an observation equation in which the matrix $B_t$ consists exclusively of zeros, apart from a one at position $(1, t+1)$. Thus, for $t = I - 1$, it corresponds to the entire $(I \times I)$-dimensional part to the left of the vertical line. For $t = 3$ and $I = 5$, for example, the state space representation of the accident correlation model (72) can be written out explicitly.

Calendar correlation model
The calendar correlation model (73), with $E[\eta_{i+j}] = 0$, $\mathrm{Var}(\eta_{i+j}) = \sigma^2_\eta$ and $E[\varepsilon_{i,j}\eta_{i+j}] = 0$ for all $i = 1, \ldots, I$, $j = 0, \ldots, I-1$, is suited to capture correlations between calendar years $t = i + j$. The calendar year effects $\tau_t$ are modeled as a random walk across calendar years, which is why all logarithmized individual development factors $\delta_{i,j}$ of a given calendar year change equally. The effect of $\tau_t$ on the individual development factors is scaled by $h_j$, i.e., it is modeled proportionally to the standard deviation of $\varepsilon_{i,j}$. Setting $\kappa = 0$, the calendar correlation model (73) reduces to model (68), since the effects $\tau_t$ are then the same for all calendar years $t = 1, \ldots, I$ and the term $h_j \tau_{i+j}$ can be absorbed into $\mu_j$. The calendar correlation model (73) can be transferred into a state space representation based on (69) and (70), with corresponding observation and state equations. The matrix $B_t$ contains the last $t + 1$ rows of the row-permuted identity matrix $I \in \mathbb{R}^{I \times I}$ and a row of zeros as its last row; i.e., for $t = I - 1$ it corresponds to the entire $((I+1) \times I)$-dimensional part to the left of the vertical line, and it loses one row for each earlier calendar year. For $t = 3$ and $I = 5$, for example, the state space representation of the calendar correlation model (73) can be written out explicitly. Table 6 gives an overview of the dimensions of the vectors and matrices in the three state space models of De Jong (2006).

Univariate State Space Models (Category 4)
In this section, we present two articles that propose univariate state space models, among them Alpuim and Ribeiro (2003). Both articles are mainly devoted to state space models and Kalman filter learning algorithms.

A State Space Model for Run-Off Triangles
Alpuim and Ribeiro (2003) present a univariate distribution-free state space model for incremental payments to predict claims reserves and to calculate their precision. They assume that the incremental payments of later development years are not related to the payments of the respective previous development year. Instead, they are related to the payments made in the accident year itself (development year 0). This contrasts with the common CL method, which assumes that cumulative payments in later development years are proportional to the cumulative payments of the previous development year, with the proportionality factor being constant across all accident years under consideration (homogeneity property). Alpuim and Ribeiro (2003), by contrast, allow the proportionality factor linking the incremental payments of later development years to the value of the 0th development year to vary across accident years. Consequently, they do not require the assumption of independent accident years that is commonly made in stochastic claims reserving methods.
The observation equation thus links the incremental payments $X_{i,j}$ of the $i$th accident year ($i = 1, \dots, I$) in the $j$th development year ($j = 1, \dots, J-1$ with $J = I$) via the factor $\beta_{i,j}$ to the payments $X_{i,0}$ made in accident year $i$ (see also Figure 8):

$$X_{i,j} = \beta_{i,j}\, X_{i,0} + w_{i,j}. \tag{74}$$

Here, the incremental payments $X_{i,j}$ act as observations, while the $\beta_{i,j}$ for all $i, j$ correspond to the unknown states. The state equation is an AR(1) model in which $\beta_{i,j}$ depends on $\beta_{i-1,j}$ and fluctuates around the expected value $\mu_j$:

$$\beta_{i,j} - \mu_j = \phi_j\,(\beta_{i-1,j} - \mu_j) + v_{i,j}. \tag{75}$$

The noise terms are assumed to be white noise processes. For the observation noise, $\mathrm{E}[w_{i,j}] = 0$ and $\mathrm{E}[w_{i,j} w_{k,l}] = r_{i,j}$ if $i = k$ and $j = l$ (and 0 otherwise). The state noise $v_{i,j}$ has variances $q_{i,j}$, and $\mathrm{E}[v_{i,j} w_{k,l}] = 0$ holds for all $i, k = 1, \dots, I$ and $j, l = 1, \dots, J-1$. The strictest assumption of the model is that the incremental payments of later development years depend on the payments of the 0th development year, whereas the columns $j = 1, \dots, J-1$ are independent of each other.

Setting the variances $q_{i,j}$ and the coefficients $\phi_j$ equal to zero for all $i, j$, (75) simplifies to $\beta_{i,j} = \mu_j$. In this case, $\beta_{i,j}$ is constant across all accident years and equals the expected value $\mu_j$ of the $j$th development year, and the observation Equation (74) becomes $X_{i,j} = \mu_j X_{i,0} + w_{i,j}$. If, on the other hand, the coefficients $\phi_1, \dots, \phi_{J-1}$ are all set equal to one and $q_{i,j} = 0$ holds for all $i, j$, then the state equation reads $\beta_{i,j} = \beta_{i-1,j}$. The coefficients are then constant over all accident years, and the observation equation becomes $X_{i,j} = \beta_{0,j} X_{i,0} + w_{i,j}$. In both cases, the state equation would be obsolete and the state space model would reduce to a regression model. Thus, the general model (see (74) and (75)) can be seen as a simple regression of each $X_{i,j}$ on $X_{i,0}$, where the time-varying parameters $\beta_{i,j}$ follow an AR(1) process.
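Because $X_{i,0}$ enters (74) as a known, time-varying regressor, each column $j$ can be filtered separately across accident years. The following sketch makes several simplifying assumptions: (74) and (75) in the form stated above, constant variances $r$ and $q$ over $i$, and a prior for $\beta_{0,j}$ with mean $\mu_j$ and a user-chosen variance `p0`. The function name `forecast_column` is hypothetical.

```python
def forecast_column(x0, xj, mu, phi, q, r, p0=1.0):
    """Kalman filter for beta_{i,j} of one development year j (i = 1..I).

    Model (74)-(75) as stated above, with constant variances (assumption):
        X_{i,j} = beta_{i,j} * X_{i,0} + w_{i,j},          Var(w) = r
        beta_{i,j} - mu = phi * (beta_{i-1,j} - mu) + v,   Var(v) = q
    x0: payments X_{i,0}; xj: payments X_{i,j}, None for lower-triangle cells.
    p0 is an assumed prior variance for beta_{0,j} (prior mean mu).
    Returns {i: (forecast of X_{i,j}, forecast variance)} for missing cells.
    """
    b, p = mu, p0
    forecasts = {}
    for i, (base, obs) in enumerate(zip(x0, xj), start=1):
        b = mu + phi * (b - mu)          # AR(1) state prediction
        p = phi * phi * p + q
        s = base * base * p + r          # variance of X_{i,j} given the past
        if obs is None:
            forecasts[i] = (b * base, s)
        else:
            k = p * base / s             # Kalman gain
            b = b + k * (obs - b * base)
            p = p * r / s                # equals (1 - k * base) * p
    return forecasts


# phi = 0 and q = 0: beta is fixed at mu, i.e. the regression X_{i,j} = mu X_{i,0} + w
print(forecast_column([100.0, 200.0, 300.0], [50.0, None, None],
                      mu=0.4, phi=0.0, q=0.0, r=1.0))
```

The two special cases discussed above correspond to `phi=0.0, q=0.0` (coefficient fixed at $\mu_j$) and `phi=1.0, q=0.0` (a constant coefficient learned from the observed accident years).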

State Space Models and the Kalman Filter in Stochastic Claims Reserving: Forecasting, Filtering and Smoothing
Chukhrova and Johannssen (2017) propose a scalar state space model for cumulative payments in order to employ the Kalman filter for calculating the claims reserves and measuring their precision. It is assumed that there are unobservable states $C_{i,j}$ underlying the observed cumulative payments $C^{obs}_{i,j}$ with $i + j \le I$ for $i, j = 0, \dots, I$. In other words, the "real cumulative payments" are modeled as latent variables, and the claims data may contain observation errors. The state space model then allows one to determine the entire unobservable upper and lower run-off triangles, that is, to forecast, filter and smooth all states $C_{i,j}$ with $i, j = 0, \dots, I$ (see Figure 9). The authors consider a linear state space model consisting of the observation equation

$$C^{obs}_{i,j} = g_j\, C_{i,j} + w_{i,j} \tag{76}$$

with $g_j > 0$, $w_{i,j} \sim WN(0, \sigma_w^2)$ and $\sigma_w^2 > 0$ for $i = 0, \dots, I$, $j = 0, \dots, J$, as well as the state equation

$$C_{i,j+1} = f_j\, C_{i,j} + v_{i,j} \tag{77}$$

with $f_j > 0$, $v_{i,j} \sim WN(0, \sigma_v^2)$ and $\sigma_v^2 > 0$ for $i = 0, \dots, I$, $j = 0, \dots, J-1$. The white noise processes $(w_{i,j})_{i=0,\dots,I;\, j=0,\dots,J}$ and $(v_{i,j})_{i=0,\dots,I;\, j=0,\dots,J-1}$ are uncorrelated, i.e., $\mathrm{E}[v_{i,j} w_{k,l}] = 0$ holds for all $i, k = 0, \dots, I$, $j = 0, \dots, J-1$ and $l = 0, \dots, J$. This assumption is justified because there is no reason to expect a systematic relationship between the measurement noise $(w_{i,j})$ and the process noise $(v_{i,j})$.
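For a single accident year, (76) and (77) form a scalar linear Gaussian-type model along the development years. The standard Kalman recursions therefore apply directly: filter while observations are available, and continue with time updates only to obtain the one- and $h$-step forecasts. The sketch below assumes known $f_j$, $g_j$, $\sigma_w^2$ and $\sigma_v^2$ and an assumed prior (`m0`, `p0`) for $C_{i,0}$. It is not the authors' estimation procedure, and the function name is hypothetical.

```python
def filter_and_forecast(obs, f, g, s2w, s2v, m0, p0):
    """Scalar Kalman filter along the development years of one accident year.

    Model (76)-(77):  C_obs_j = g[j] * C_j + w_j,   C_{j+1} = f[j] * C_j + v_j,
    with Var(w) = s2w and Var(v) = s2v. `obs` holds the observed values for
    j = 0..k (upper triangle); g has length J + 1 and f has length J.
    m0, p0 are an assumed prior mean and variance for C_{i,0}.
    Returns filtered (mean, var) for j = 0..k and h-step forecasts for j = k+1..J.
    """
    J = len(g) - 1
    m, p = m0, p0
    filtered, forecasts = [], []
    for j in range(J + 1):
        if j > 0:
            # time update with the state equation (77)
            m, p = f[j - 1] * m, f[j - 1] ** 2 * p + s2v
        if j < len(obs):
            # measurement update with the observation equation (76)
            s = g[j] ** 2 * p + s2w
            k = p * g[j] / s
            m, p = m + k * (obs[j] - g[j] * m), p * s2w / s
            filtered.append((m, p))
        else:
            # no observation: the prediction is an h-step forecast, h = j - k
            forecasts.append((m, p))
    return filtered, forecasts


filt, fc = filter_and_forecast([100.0, 200.0], f=[2.0, 1.5, 1.1], g=[1.0] * 4,
                               s2w=1e-8, s2v=0.0, m0=0.0, p0=1e8)
print(fc)   # forecasts of C_{i,2} and C_{i,3}, close to 300 and 330
```

The forecast variances grow with the horizon whenever $\sigma_v^2 > 0$. This growth is what the MSEP of the lower-triangle predictions builds on.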
The state Equation (77) and the observation Equation (76) can also be written in the form (78) and (79), in which $a_{i,j}$ and $b_{i,j}$ with $i = 0, \dots, I$ and $j = 0, \dots, J$ are appropriate linear functions. As a consequence of the model assumptions, the corresponding covariance relations hold for all $j, k = 0, \dots, J$ and $l = 0, \dots, J-1$ with $j \le k$, $j \le l$. Thus, the initial state $C_{i,0}$ of an accident year $i = 0, \dots, I$ is uncorrelated with $v_{i,j}$ and $w_{i,j}$ for all $j$.
For predicting the future cumulative payments $C_{i,j}$ with $i + j > I$ for $i = 1, \dots, I$, $j = 1, \dots, J$ in the lower triangle, the Kalman learning algorithms for one- and $h$-step predictions ($h \ge 2$) can be used. For the underlying states $C_{i,j}$ of the observations $C^{obs}_{i,j}$ in the upper triangle, the Kalman learning algorithms for filtering (for $i + j = I$) and smoothing (for $i + j < I$) can be applied to identify outliers in the observations, to replace them by filtered or smoothed values, and to quantify outlier effects. Another key application of the filtering and smoothing algorithms is the interpolation of missing values in the upper run-off triangle (e.g., resulting from a merger).
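Smoothing uses all observations of an accident year, not only the past ones. The sketch below adds a standard Rauch-Tung-Striebel backward pass to the scalar filter for (76) and (77). It also reports residuals scaled by $\sigma_w$ as a crude outlier indicator; this scaling is a simplification and not the exact residual variance. The function name, prior (`m0`, `p0`) and the indicator are illustrative assumptions.

```python
def smooth_upper_triangle(obs, f, g, s2w, s2v, m0, p0):
    """Kalman filter plus Rauch-Tung-Striebel smoother for one accident year.

    Same scalar model as (76)-(77); obs holds C_obs_{i,j}, j = 0..n-1.
    Returns smoothed means/variances of C_{i,j} and residuals
    (obs - g * smoothed mean) / sqrt(s2w), a simple outlier indicator.
    """
    n = len(obs)
    mp, pp, mf, pf = [], [], [], []    # predicted and filtered moments
    m, p = m0, p0
    for j in range(n):
        if j > 0:
            m, p = f[j - 1] * m, f[j - 1] ** 2 * p + s2v
        mp.append(m)
        pp.append(p)
        s = g[j] ** 2 * p + s2w
        k = p * g[j] / s
        m, p = m + k * (obs[j] - g[j] * m), p * s2w / s
        mf.append(m)
        pf.append(p)
    ms, ps = mf[:], pf[:]              # last filtered value = last smoothed value
    for j in range(n - 2, -1, -1):
        a = pf[j] * f[j] / pp[j + 1]   # smoother gain
        ms[j] = mf[j] + a * (ms[j + 1] - mp[j + 1])
        ps[j] = pf[j] + a * a * (ps[j + 1] - pp[j + 1])
    resid = [(obs[j] - g[j] * ms[j]) / s2w ** 0.5 for j in range(n)]
    return ms, ps, resid


ms, ps, resid = smooth_upper_triangle([1.0, 2.0, 3.0], f=[1.0, 1.0], g=[1.0] * 3,
                                      s2w=1.0, s2v=0.0, m0=0.0, p0=1e6)
print(ms, resid)
```

With $f_j = g_j = 1$ and $\sigma_v^2 = 0$, the state is constant. All smoothed values therefore collapse to (approximately) the sample mean, which is a convenient sanity check.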

Row-Wise Stacking Approaches (Category 5)
In this section, we discuss articles in which the claims data are stacked row-wise: Atherino et al. (2010), Costa and Pizzinga (2020) and Hendrych and Cipra (2021). These articles are all highlighted in the above listing because the proposed methods are mainly based on state space models and Kalman filter learning algorithms.

A Row-Wise Stacking of the Runoff Triangle: State Space Alternatives for IBNR Reserve Prediction
In contrast to most of the above approaches, Atherino et al. (2010) do not stack the observations of individual accident, development or calendar years into vectors. Instead, they treat the claims data as a univariate time series with various missing observations. This time series is then modeled using a structural model in state space form. For predicting the claims reserves and estimating the corresponding MSEP for individual and aggregated accident years, Atherino et al. (2010) present two approaches, the blocks method and the cumulating method. Although the two approaches differ in some aspects, they yield the same numerical results.

Atherino et al. (2010) consider claims development triangles that include incremental payments $X_{i,j}$ in accident years $i = 1, \dots, J$ and development years $j = 0, \dots, J-1$. They convert the incremental payments into a univariate time series by appending the (partly missing) rows of the later accident years to the row of the first accident year. Thus, the common double index $(i, j)$ is replaced by the single index $t$, which, however, cannot be interpreted chronologically as is usual for time series. The time series $y_t$ constructed in this way, with $t = 1, \dots, J^2$, contains more and more missing observations as $t$ increases. These missing values add up to the outstanding loss liabilities for aggregated accident years:

$$R = \sum_{i=2}^{J} \sum_{j=J-i+1}^{J-1} X_{i,j}.$$

Figure 10 shows the row-wise stacked incremental payments using the notation $y_t$ instead of $X_{i,j}$. The observed values of the time series correspond to the upper triangle and the missing values to the lower triangle.
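The stacking itself is purely mechanical. The sketch below turns a triangle (given as a list of observed rows) into $y_1, \dots, y_{J^2}$, with `None` marking the lower-triangle cells. It also shows the inverse map $t \mapsto (i, j) = (\lfloor (t-1)/J \rfloor + 1,\ (t-1) \bmod J)$. The input format and function names are illustrative assumptions.

```python
def stack_rows(triangle):
    """Row-wise stacking of an incremental run-off triangle.

    triangle[i - 1] holds the observed payments X_{i,0..J-i} of accident
    year i = 1..J. Returns y_1..y_{J^2} as a list; None marks a missing
    (lower-triangle) value.
    """
    J = len(triangle)
    y = []
    for i, row in enumerate(triangle, start=1):
        if len(row) != J - i + 1:
            raise ValueError(f"accident year {i} must have {J - i + 1} values")
        y.extend(row)
        y.extend([None] * (i - 1))      # development years J-i+1..J-1 are unknown
    return y


def cell_of(t, J):
    """Map the 1-based stacking index t back to (i, j)."""
    return (t - 1) // J + 1, (t - 1) % J


y = stack_rows([[10.0, 5.0, 1.0], [12.0, 6.0], [11.0]])
print(y)             # [10.0, 5.0, 1.0, 12.0, 6.0, None, 11.0, None, None]
missing = [cell_of(t, 3) for t, v in enumerate(y, start=1) if v is None]
print(missing)       # [(2, 2), (3, 1), (3, 2)]
```

The missing indices are exactly the cells whose sum forms $R$.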

Development of an appropriate state space representation
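Atherino et al. (2010) model the stacked series $y_t$ by the structural model (80)-(82). In this model, each observation is decomposed into a local level component $\mu_t$, a stochastic periodic component $\gamma_t$, a regression term $h_t^\top u$ and an irregular term $\varepsilon_t$, i.e., $y_t = \mu_t + \gamma_t + h_t^\top u + \varepsilon_t$. The level component captures the mean level of incremental payments, and the periodic component reflects the column effect (i.e., the development pattern). The regression term is incorporated to address intervention effects (related to outliers in the observations).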
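Because a row of the stacked series has length $J$, a periodic component with period $J$ reproduces the development pattern. As an illustration, the sketch below builds the system vector and transition matrix for a local level plus the common dummy-variable seasonal form, with state dimension $J$. The dummy form, the omission of noise and regression terms, and the function names are assumptions. The exact specification in (81)-(82) may differ, e.g., a trigonometric periodic component.

```python
def structural_system(J):
    """System vector g and transition matrix F for level + dummy seasonal.

    State x_t = (mu_t, gamma_t, gamma_{t-1}, ..., gamma_{t-J+2}) has
    dimension J (requires J >= 2). Assumed dynamics (noise omitted here):
        mu_{t+1}    = mu_t                               (local level)
        gamma_{t+1} = -(gamma_t + ... + gamma_{t-J+2})   (effects sum to zero)
    """
    g = [1.0, 1.0] + [0.0] * (J - 2)          # y_t = mu_t + gamma_t + noise
    F = [[0.0] * J for _ in range(J)]
    F[0][0] = 1.0                              # level carried forward
    for k in range(1, J):
        F[1][k] = -1.0                         # new seasonal effect
    for k in range(2, J):
        F[k][k - 1] = 1.0                      # shift the lagged effects
    return g, F


def simulate_signal(x, g, F, steps):
    """Noise-free signal g'x_t, t = 1..steps, starting from state x."""
    out = []
    for _ in range(steps):
        out.append(sum(a * b for a, b in zip(g, x)))
        x = [sum(a * b for a, b in zip(row, x)) for row in F]
    return out


g, F = structural_system(3)
print(simulate_signal([5.0, 1.0, 2.0], g, F, 6))   # [6.0, 2.0, 7.0, 6.0, 2.0, 7.0]
```

Without noise, the signal repeats with period $J$. In other words, every accident year follows the same development pattern around the level, and the noise terms are what let the pattern and the level drift.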
To represent the structural model consisting of Equations (80)-(82) as a state space model, Atherino et al. (2010) consider the general state space model with normality assumptions for $t = 1, \dots, J^2$. The noise terms $w_t$ and $v_t$ are assumed to be Gaussian white noise with covariance matrices $R_t$ and $Q_t$, respectively, and to be uncorrelated with each other for all $s, t = 1, \dots, J^2$. Moreover, the initial state $x_1$ is assumed to be independent of $w_t$ and $v_t$ for all $t$. When the structural model is cast into state space form, the observation equation follows with $\mathbf{y}_t = y_t$, $G_t = g_t^\top$, $H_t = h_t^\top$, $w_t = \varepsilon_t$ and $R_t = \sigma_\varepsilon^2$, and the state equation follows accordingly (see (83) and (84)). Table 7 gives an overview of the dimensions of vectors and matrices in the state space model of Atherino et al. (2010).

Table 7. Dimensions in the state space model of Atherino et al. (2010).

In the following, the cumulating method is presented. It is one of the two approaches proposed by Atherino et al. (2010) to predict the loss reserves and to estimate their MSEP for individual and aggregated accident years.

Cumulating method
The cumulating method adds components to the state vector that accumulate the estimates of the missing observations in the lower triangle. As a result, the claims reserves and their MSEP can be obtained directly from the Kalman filter. In the following, $\mathcal{I}$ denotes the index set of all $t$ for which $y_t$ is observed, and the superscript $(T)$ stands for total, i.e., for aggregated accident years.

If one is interested only in the claims reserves and their MSEP for aggregated accident years, the state vector is extended by the additional component $\delta^{(T)}_t$. This component accumulates the estimates of all missing observations across all accident years, i.e., $\delta^{(T)}_{t+1} = \delta^{(T)}_t + \beta^{(T)}_t x_t$, where $\beta^{(T)}_t = g_t^\top$ if $t \notin \mathcal{I}$ and $\beta^{(T)}_t = 0^\top$ otherwise (see (85)). The resulting state space model contains $\delta^{(T)}_1 = 0$, the $J$-dimensional zero vector $0$ in the transition matrix, the two-dimensional zero vector $0^\top$ and the $J$-dimensional row vector $\beta^{(T)}_t$. Compared to (83) and (84), only these dimensions change, while $g_t^\top$, $x_t$, $x_{t+1}$, $F_t$, $B_t$ and $v_t$ remain unchanged.

If one is also interested in individual accident years, further accumulator components for the accident years $i = 2, \dots, J$ have to be added to the state vector. This leads to the $J$-dimensional accumulator vector $\delta_t$ in (86), which also contains the component $\delta^{(T)}_t$ for aggregated accident years. The modified state space model (see (87) and (88)) then uses $\delta_1 = 0$, the $(J \times J)$-dimensional zero matrix $O$ and identity matrix $I$ in the transition matrix, and the $(J \times 2)$-dimensional zero matrix $O$. It further uses a $(J \times J)$-dimensional matrix whose row for accident year $i = 2, \dots, J$ equals $g_t^\top$ if $t \notin \mathcal{I}$ and $t$ belongs to row $i$, and $0^\top$ otherwise, together with the component $\beta^{(T)}_t$ according to (85). Thus, the vector $\delta_{J^2+1}$ contains the claims reserves for individual and aggregated accident years. It does not, however, include the effects of the regression terms $h_t^\top u$ with $t \notin \mathcal{I}$, which are excluded from the accumulation and therefore have to be added separately.
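The accumulator idea can be seen in a deliberately reduced setting. The sketch below replaces the structural model by a plain local level model (no periodic or regression component) and tracks a single aggregate accumulator $\delta_t$ next to the level. It handles missing values by skipping the measurement update. The prediction error variance of the total adds $r$ for every missing $y_t$, assuming future observation noise is independent of the state estimation error. All names are hypothetical, and this is not the full method of Atherino et al. (2010).

```python
def cumulating_filter(y, q, r, m0=0.0, p0=1e6):
    """Local level model with an accumulator state for missing values.

    y_t = mu_t + eps_t (Var r),  mu_{t+1} = mu_t + xi_t (Var q).
    The augmented state (mu_t, delta_t) uses delta_{t+1} = delta_t + mu_t
    whenever y_t is missing (None) and delta_{t+1} = delta_t otherwise,
    with delta_1 = 0. Returns the reserve estimate E[delta] and its
    prediction error variance, adding r for each missing y_t.
    """
    mu, d = m0, 0.0
    a, b, c = p0, 0.0, 0.0         # Cov[(mu, delta)] = [[a, b], [b, c]]
    n_missing = 0
    for yt in y:
        if yt is not None:
            # measurement update with g = (1, 0)
            s = a + r
            k1, k2 = a / s, b / s
            e = yt - mu
            mu, d = mu + k1 * e, d + k2 * e
            a, b, c = a * r / s, b * r / s, c - k2 * b
            # time update: random walk for mu, delta unchanged
            a = a + q
        else:
            n_missing += 1
            # time update with the accumulating transition [[1, 0], [1, 1]]
            d = d + mu
            a, b, c = a + q, b + a, c + 2 * b + a
    return d, c + n_missing * r


reserve, msep = cumulating_filter([1.0, 2.0, 3.0, None, None], q=0.0, r=1.0)
print(reserve, msep)   # close to 4.0 and 10/3
```

With a constant level estimated from three observations, the accumulator equals twice the estimated level. Its variance is $4 \cdot \tfrac{1}{3}$, and the two future noise terms add $2r$. This mirrors the way the cumulating method reads the reserve and its MSEP off the final state.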

State Space Models for Predicting IBNR Reserve in Row-Wise Ordered Runoff Triangles: Calendar Year IBNR Reserves and Tail Effects
Costa and Pizzinga (2020) extend the row-wise stacking approach of Atherino et al. (2010) and the corresponding state space representation of the structural model by implementing (1) a calendar year IBNR reserve prediction and (2) tail effects for the row-wise ordered triangle. In this way, they intend (1) to improve an insurer's ability to predict short-term IBNR reserves and (2) to make IBNR predictions more conservative and thus better suited to protecting insurance companies from insolvency risk.
For the first extension, Costa and Pizzinga (2020) take the cumulating method of Atherino et al. (2010) and add a further cumulating entry to the state vector, more precisely to the vector (86). The additional entry $\delta^{(C)}_t$ relates to the calendar year IBNR reserve and accumulates all estimates of missing observations associated with a specific calendar year.
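The quantity targeted by $\delta^{(C)}_t$ is the sum of the lower-triangle forecasts along a calendar year diagonal $t = i + j$. The sketch below performs this aggregation after the fact on given point forecasts. Unlike an accumulator inside the state vector, it does not deliver an MSEP. The dictionary input format and the function name are assumptions.

```python
def calendar_year_reserves(forecasts):
    """Aggregate lower-triangle forecasts by calendar year t = i + j.

    forecasts: dict {(i, j): predicted incremental payment} for the missing
    cells (i = 1..J, j = 0..J-1, i + j > J). Returns {t: reserve}, sorted by t;
    the first entry (t = J + 1) is the next calendar year's IBNR reserve.
    """
    reserves = {}
    for (i, j), value in forecasts.items():
        reserves[i + j] = reserves.get(i + j, 0.0) + value
    return dict(sorted(reserves.items()))


# J = 3: missing cells (2, 2), (3, 1), (3, 2)
print(calendar_year_reserves({(2, 2): 1.5, (3, 1): 4.0, (3, 2): 2.0}))
# {4: 5.5, 5: 2.0}
```

The calendar year reserves add up to the aggregate reserve. The first of them is the short-term figure the extension is meant to improve.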
For the second extension, Costa and Pizzinga (2020) consider one-step-ahead column and row tail effects in the claims development triangle. The triangle is thus extended by an additional row for the $(J+1)$-th accident year and an additional column for the $J^*$-th development year. According to Costa and Pizzinga (2020), restricting the tail to such a short period does not entail a substantial loss of generality, since it has been shown empirically that payments in the last columns are expected to be lower than those in the first ones. To incorporate the tail effects into the structural model, Costa and Pizzinga (2020) assume that $y_{J^*}, y_{2J^*}, \dots, y_{J^{*2}}, y_{J^{*2}+J^*}$ have the same periodic behavior (i.e., "seasonality") as the respective previous observation of the time series. Against this backdrop, they modify the system matrices of the state space representation of the cumulating method (see (87) and (88)). For observations not affected by a column tail effect, the modified representation coincides with that of Atherino et al. (2010). For observations with a tail effect, the modifications force the periodic component to be exactly the same as that of the preceding observation.

Applying State Space Models to Stochastic Claims Reserving
Hendrych and Cipra (2021) discuss and compare various common approaches in stochastic claims reserving, such as log-normal models and Hoerl curve approaches, within the framework of state space models. In particular, the authors use the row-wise stacking of claims development data into a time series proposed by Atherino et al. (2010). This allows them to handle common claims reserving methods via unified state space representations and Kalman filter learning algorithms. Since the row-wise stacking approach has practical advantages over other state space formulations, all the different models can be treated within the same framework, and their results can be easily compared.
In the following, we consider the log-normal model for incremental payments according to (44), investigated by Verrall (1989) and other authors (see Section 3). This model is converted into a state space representation following the row-wise stacking approach. In the first step, the $Y_{i,j}$ for all $i, j = 0, \dots, I$ are stacked row-wise (as proposed by Atherino et al. (2010)), and the common time series notation $y_t$ with $t = i \cdot I + j$ is used. In contrast to Verrall (1989), Hendrych and Cipra (2021) use the observations of the first column ($Y_{i,0}$ for all $i$) of each accident year as initial values in the observation equation. This sets the initial level of the recursions more appropriately, which improves the calculations when there are few data and especially when values are missing. On this basis, the row-wise stacked log-normal model for incremental payments and its state space representation are formulated.

In addition, Hendrych and Cipra (2021) consider the multivariate case for all the discussed approaches. This reveals a further benefit of state space models in claims reserving: it becomes possible to incorporate claims activity dynamics and to model dependencies between correlated lines of business. According to the authors, this requires no substantial additional effort from the practitioner, since multivariate models can be implemented in state space form in a straightforward way that is largely analogous to the univariate case.
In the following, the multivariate log-normal model for incremental payments is considered in state space form. In addition to the unknown parameters of the univariate case ($\sigma_w^2$, $\sigma_v^2$), the multivariate setting involves further parameters describing the correlations between the run-off triangles. Considering $N$ run-off triangles, the $Y_{i,j}(h)$ for all $i, j$ and $h = 1, \dots, N$ are each modeled via the row-wise stacked log-normal model for incremental payments. For a suitable state space representation, the corresponding quantities of the $N$ triangles are stacked into vectors. The variance-covariance matrices $R_t = (\sigma_w(m,h))_{m,h=1,\dots,N}$ and $Q_t = (\sigma_v(m,h))_{m,h=1,\dots,N}$ contain the correlation parameters to be estimated. This yields the state space representation of the multivariate log-normal model for incremental payments. Finally, Table 8 gives an overview of the dimensions of vectors and matrices in the above exemplary state space models of Hendrych and Cipra (2021).

Table 8. Dimensions in the state space models of Hendrych and Cipra (2021).


Conceptual Comparison
In this section, a conceptual comparison of the proposed methods is conducted. In particular, we compare the objectives behind the methods, the modeling approaches for claims data, and the state space representations. Further, we give insights from practical applications discussed in the papers.

Objectives and Claims Data
The vast majority of articles (Verrall 1989; Wright 1990; Ntzoufras and Dellaportas 2002; Alpuim and Ribeiro 2003; Li 2006; Atherino et al. 2010; Chukhrova and Johannssen 2017; Costa and Pizzinga 2020; Hendrych and Cipra 2021) aim to forecast the outstanding loss liabilities and to calculate the corresponding prediction error. In addition, some articles pursue other objectives. These include estimating the underlying states of the observations in the upper triangle (De Jong and Zehnwirth 1983; Chukhrova and Johannssen 2017), extending the CL method to development patterns that are not necessarily homogeneous across accident years (Verrall 1994), illustrating calendar year effects (Zehnwirth 1997), and simulating the shape of the liability distribution (De Jong 2006; Hendrych and Cipra 2021).
Often, the type of claims data is directly tied to the objective and is thus an essential component of the modeling. For example, log-normal models for incremental data require strictly positive claims data, which is why they are unsuitable for incremental incurred data, which can be negative. Similarly, modeling via a Hoerl curve requires incremental payments and cannot easily be applied to incremental incurred data. In some articles, such as Ntzoufras and Dellaportas (2002) and Taylor et al. (2003), the claims data even form the foundation of the modeling, i.e., the state space representations are motivated by and constructed specifically for the underlying claims data.

Modeling of Claims Data
The categories "Parametric evolution of claims data" and "Log-normal models for incremental payments" include the most common modeling approaches for claims data.
Within the first category, De Jong and Zehnwirth (1983), Wright (1990) and Zehnwirth (1997) assume that incremental payments increase very quickly in early development years and decrease exponentially over the following development years. For this reason, they model incremental payments via a Hoerl curve (see (5), (17) and (30)). The general exponential-logarithmic Hoerl curve (89) involves the development year parameter $\beta_j$ for all $j = 0, \dots, J$ and $\kappa, \delta \in \mathbb{R}$. An advantage of treating development time $j$ as a continuous covariate is that extrapolation beyond the range of observed development times is possible (see, e.g., Chukhrova and Johannssen 2017). The Hoerl curve is the most popular parametric form for modeling the evolution of incremental payments over development years $j$, since it behaves very similarly to the typical run-off of incremental payments: it rises very quickly to its peak and then tends to zero at an exponential rate. Following the Hoerl curve approach, De Jong and Zehnwirth (1983), Wright (1990) and Zehnwirth (1997) model the expected incremental payments in $(i, j)$ by variations of (89) (see (5), (21) and (30)). For instance, Wright (1990) uses $\mathrm{E}[X_{i,j}] = \varepsilon_i\, p_{i,j}\, e^{(i+j)\tau} K j^{\lambda}$, and Zehnwirth (1997) uses $\mathrm{E}[X_{i,j}] = e^{\alpha - 0.2 j}$. In addition, by implementing state space models, De Jong and Zehnwirth (1983) and Wright (1990) allow the accident year parameters to evolve recursively over the accident years (see (16) and (26)). In other words, they estimate the parameters dynamically, which has the advantage of avoiding overparameterization of the model.
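The rise-then-decay shape and the extrapolation argument can be checked numerically. The sketch uses the common parametrization $\exp(a + b \log j + c\, j)$, which is an assumption rather than the exact form of (89). It requires $j > 0$, so development year 0 would have to be shifted (e.g., using $j + 1$) in practice. Parameter values and function names are illustrative.

```python
import math


def hoerl(j, a, b, c):
    """Hoerl curve exp(a + b*log(j) + c*j) for continuous j > 0.

    A common parametrization, assumed here; with b > 0 and c < 0 it rises
    to a single peak and then decays at the exponential rate c.
    """
    return math.exp(a + b * math.log(j) + c * j)


def hoerl_peak(b, c):
    """Location of the maximum: d/dj (b*log(j) + c*j) = b/j + c = 0."""
    return -b / c


a, b, c = 0.0, 2.0, -1.0
curve = [hoerl(j, a, b, c) for j in range(1, 11)]
print(hoerl_peak(b, c))                       # 2.0
print(curve.index(max(curve)) + 1)            # 2
print(hoerl(12.5, a, b, c))                   # extrapolation beyond j = 10
```

Because $j$ enters as a continuous covariate, the same three parameters produce values for any development time, including times beyond the observed range.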
Since PPCI and claim closure rates evolve over development years in a similar way to incremental payments, Taylor et al. (2003) also use a parametric approach to model this evolution. For this purpose, however, they do not choose a variant of the Hoerl curve, but approaches similar to discounting. In particular, Taylor et al. (2003) specify such parametric forms for the PPCI and the claim closure rates of a given accident year $i = 0, \dots, I$ over the development years $j = 0, \dots, I$ (see (33) and (36)). Pang and He (2012) adopt the linear predictor for the PPCI according to (33) from Taylor et al. (2003) and apply it to incremental payments (see (41)). For the most part, the modeling approaches in these articles do not require any distributional assumptions. The only exceptions are Wright (1990), where the number of payments is assumed to be Poisson-distributed, and Taylor et al. (2003), where the noise terms, and thus the observations, are assumed to be EDF-distributed.
In the second category, "Log-normal models for incremental payments", all models rest on explicit distributional assumptions, since the incremental payments are assumed to be log-normally distributed. The logarithmized incremental payments $Y_{i,j}$ are then specified via the log-normal model for incremental payments (also called the linear CL model, following Verrall 1989). In particular, Verrall (1989) and Li (2006) use the common basic model $Y_{i,j} = \mu + \alpha_i + \beta_j + \varepsilon_{i,j}$ (see (44)). Verrall (1994) and Ntzoufras and Dellaportas (2002), in contrast, suggest a variant that allows the column parameters to vary across accident years, $Y_{i,j} = \mu + \alpha_i + \beta_{i,j} + \varepsilon_{i,j}$, where the column parameters $\beta_{i,j}$ may evolve according to (56). In addition to incremental payments, Ntzoufras and Dellaportas (2002) also incorporate claim counts and therefore consider PPCF as claims data. As in the first category, the authors use state space models to implement recursions for the model parameters, which yields dynamic estimation and avoids overparameterization (see, e.g., (52)).

Beyond these two categories, there are other ways of modeling the claims data. De Jong (2006) (and to some extent also De Jong 2005) presents correlation models that consider correlations between accident, development or calendar years (see (71)-(73)). Alpuim and Ribeiro (2003) and Chukhrova and Johannssen (2017) propose univariate state space models (see (74), (75) as well as (76), (77)). Finally, Atherino et al. (2010), Costa and Pizzinga (2020) and Hendrych and Cipra (2021) discuss row-wise stacking of the claims data into a univariate time series (see, e.g., the structural model (80)-(82)).
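Given estimated parameters of the basic log-normal model, point forecasts of the lower triangle follow from the log-normal mean $\mathrm{E}[X_{i,j}] = \exp(\mu + \alpha_i + \beta_j + \sigma^2/2)$. The sketch below assumes this form of (44) with indices $i, j = 0, \dots, I$, and it ignores parameter estimation error. It is therefore a point forecast only, not an MSEP calculation. The function name and inputs are hypothetical.

```python
import math


def lognormal_reserves(mu, alpha, beta, sigma2):
    """Point forecasts of X_{i,j} = exp(Y_{i,j}) for the lower triangle.

    Assumed form of the basic model: Y_{i,j} = mu + alpha[i] + beta[j] + eps,
    eps ~ N(0, sigma2), with i, j = 0..I and the lower triangle i + j > I.
    The sigma2 / 2 term is the log-normal mean correction; parameter
    estimation error is ignored in this sketch.
    """
    I = len(alpha) - 1
    cells = {(i, j): math.exp(mu + alpha[i] + beta[j] + 0.5 * sigma2)
             for i in range(I + 1) for j in range(I + 1) if i + j > I}
    return cells, sum(cells.values())


cells, total = lognormal_reserves(0.0, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], 0.0)
print(sorted(cells), total)   # [(1, 2), (2, 1), (2, 2)] 3.0
```

Simply exponentiating the fitted log values without the $\sigma^2/2$ term would systematically understate the expected payments.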
In particular, De Jong (2006) extends the model $\delta_{i,j} = \mu_j + h_j \varepsilon_{i,j}$ ($i = 1, \dots, I$, $j = 0, \dots, I-1$) of Hertig (1985) for logarithmized individual development factors (64) by including correlations of $\delta_{i,j}$ across development years, accident years or calendar years (see (71)-(73)). Alpuim and Ribeiro (2003) propose modeling the incremental payments $X_{i,j}$ as a function of the payments $X_{i,0}$ of the respective accident year $i = 1, \dots, I$ by means of $X_{i,j} = \beta_{i,j} X_{i,0} + w_{i,j}$, see (74). Thus, the amount of claims incurred in accident year $i$ and paid $j$ years later is proportional to the claims incurred and paid in accident year $i$ itself. This proportion varies randomly with $i$ and $j$, which is why Alpuim and Ribeiro (2003) consider the AR(1) process (75). With this approach, the common assumption of independent accident years is not required.

Chukhrova and Johannssen (2017) propose modeling the observed cumulative payments $C^{obs}_{i,j}$ as a function of unobservable latent variables $C_{i,j}$, $i, j = 0, \dots, I$. To this end, they assume the relationship $C^{obs}_{i,j} = g_j C_{i,j} + w_{i,j}$ according to (76), where $C_{i,j}$ additionally follows the recursion $C_{i,j+1} = f_j C_{i,j} + v_{i,j}$ (see (77)), implemented via a state space model. The approach by Chukhrova and Johannssen (2017) therefore addresses potential observation errors in the claims data.

Atherino et al. (2010) and Costa and Pizzinga (2020) discuss a structural model for incremental payments with a local level component $\mu_t$, a stochastic periodic component $\gamma_t$ and a regression term $h_t^\top u$. This approach is inspired by the nature of the claims process: the level component is meant to capture the mean value of claims in each accident year, while the periodic component is supposed to capture the development year effect.
The regression term is mainly motivated by the need to model intervention effects caused by outliers. The approach of Atherino et al. (2010), and hence also those of Costa and Pizzinga (2020) and Hendrych and Cipra (2021), thus differs from other proposals in one key respect. It does not model the claims data with the usual double indexing, but treats the claims data as a whole as a univariate time series. This allows the use of tools available for time series and thus considerably expands the modeling spectrum, including diagnostic checking and model selection criteria.

Modeling Approaches of State Space Representations
Most of the state space representations are based on calendar year-based modeling, in which the claims data of the individual calendar years are stacked into separate observation vectors. Similar approaches are accident year-based modeling (see Taylor et al. 2003) and development year-based modeling (see De Jong and Zehnwirth 1983) of the observation vectors. Beyond these most common approaches, there are univariate state space representations and state space models based on the row-wise stacking approach.
The popularity of the approaches aligned with the dimensions of claims development triangles (see Figure 11) can be explained by the fact that they allow effects related to accident, development or calendar years to be modeled. Because calendar years $t = i + j$ are determined by accident years $i = 0, \dots, I$ and development years $j = 0, \dots, J$, only two of the three directions (diagonal, vertical, horizontal) are independent of each other. The vertical direction captures trends across accident years, the horizontal direction captures trends across development years, and the diagonal direction reflects trends across calendar years (see Figure 12, left-hand side). The vertical and horizontal directions are orthogonal to each other, i.e., trends in one direction are not projected onto the other. However, the diagonal direction is not orthogonal to either of the other two directions, i.e., trends in calendar years are projected onto both the horizontal and the vertical direction. Accordingly, a diagonal (calendar year) effect of x% is equivalent to a combined vertical and horizontal effect of x% each (see Figure 12, right-hand side).

Calendar year effects include trends and structural breaks (e.g., due to extraordinary events such as floods, hurricanes or terrorist attacks) and changes in the inflation rate, in individual case reserving, in underwriting policy and in legislation. They also include organizational changes such as the implementation of new claims processing systems, as well as the emergence of new phenomena (see, e.g., Zehnwirth 1997). Hence, an adequate embedding of calendar year effects into claims reserving models is essential, which also explains why these approaches are the most widespread.
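The equivalence between a calendar year effect and combined accident and development year effects can be made explicit on the log scale. The derivation below assumes multiplicative (log-linear) accident year effects $a_i$, development year effects $b_j$ and a constant calendar year trend $\tau$; this notation is hypothetical and not taken from any of the reviewed models.

```latex
\log \mathrm{E}[X_{i,j}]
  = a_i + b_j + \tau\,(i + j)
  = \underbrace{(a_i + \tau i)}_{\text{accident year}}
  + \underbrace{(b_j + \tau j)}_{\text{development year}},
\qquad
e^{\tau (i+j)} = e^{\tau i}\, e^{\tau j}.
```

With $e^{\tau} = 1 + x/100$, a calendar year trend of x% per year cannot be distinguished from an x% accident year trend combined with an x% development year trend. This is why only two of the three directions can be identified from the data.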
Moreover, the calendar year-based approach can be justified as follows (see Chukhrova and Johannssen 2017):
• It corresponds to a natural modeling of the claims data, as annually added observations build up a new diagonal in the run-off triangle.
• As for estimation and prediction, more recent observations should get a higher weight compared to past observations. The recursive and dynamic nature of the Kalman filter learning algorithms complies with this requirement, especially with respect to the calendar year-based approach.
In the following, an exemplary calendar year-based state space representation from the category "Log-normal models for incremental payments" is described. It is based on the linear CL model discussed by Verrall (1989) and can also be found in similar form in Verrall (1994) and Li (2006). It consists of an observation equation for calendar year $t = i + j$, which implies (44) for each $Y_{i,j}$ of calendar year $t$, and a state equation that allows dynamic estimation of the accident and development year parameters via (52).

However, the approaches shown in Figure 11 have the drawback that the dimensions of the vectors and matrices in the corresponding state space representations are time-variant. For the calendar year-based approach, this is because each new calendar year adds a complete diagonal to the run-off triangle, and this diagonal contains one more observation than that of the previous calendar year. Thus, the current calendar year has the most observations, after which the number of future observations in the lower triangle decreases from one calendar year to the next (for claims development triangles). Depending on the modeling (e.g., via a Hoerl curve or the log-normal model), these additional observations induce correspondingly growing state vectors, system matrices, hyper-parameters and noise terms. This can considerably complicate parameter estimation, practical handling and the simultaneous treatment of multiple run-off triangles (see Chukhrova and Johannssen 2021).
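The growing observation dimension can be made concrete by building the design rows for one calendar year. The sketch assumes the basic model $Y_{i,j} = \mu + \alpha_i + \beta_j + \varepsilon_{i,j}$ with the identifiability constraint $\alpha_0 = \beta_0 = 0$ and indices $i, j = 0, \dots, I$. For simplicity, the parameter vector is fixed at its final size, so only the number of rows changes. In dynamic versions, the state itself may also grow as new parameters enter. The function name is hypothetical.

```python
def calendar_design(t, I):
    """Rows of the observation matrix for calendar year t (upper triangle).

    Assumed basic model Y_{i,j} = mu + alpha_i + beta_j + eps with
    alpha_0 = beta_0 = 0 and i, j = 0..I, so the parameter vector is
    (mu, alpha_1..alpha_I, beta_1..beta_I). Calendar year t = i + j
    (t = 0..I) contributes t + 1 observations, one row per cell.
    """
    rows = []
    for i in range(t, -1, -1):           # cells (t, 0), (t-1, 1), ..., (0, t)
        j = t - i
        row = [0] * (1 + 2 * I)
        row[0] = 1                       # overall level mu
        if i > 0:
            row[i] = 1                   # accident year effect alpha_i
        if j > 0:
            row[I + j] = 1               # development year effect beta_j
        rows.append(row)
    return rows


for t in range(3):
    print(t, calendar_design(t, 2))
```

The number of rows is $t + 1$, so every calendar year needs observation matrices and noise covariances of a different size.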
The above drawbacks can be avoided by choosing state space models based on the row-wise stacking approach (Atherino et al. 2010; Costa and Pizzinga 2020; Hendrych and Cipra 2021), which provide a unified framework for handling different models. Further, as demonstrated by Hendrych and Cipra (2021), the row-wise stacking approach allows one to incorporate claims activity dynamics and to model dependencies between correlated lines of business. It should also be noted that the row-wise stacking approach is not calendar year-based. Nevertheless, calendar year effects can be modeled within it by adding a further component to the structural model.
A few articles employ a Bayesian approach for estimation, either instead of or in addition to the Kalman filter (see Verrall 1989; Zehnwirth 1997; Ntzoufras and Dellaportas 2002). This is natural because the two approaches are closely related. As is well known, the Kalman filter rests on two basic ideas: first, using new information to update estimators based on previous observations; second, filtering, i.e., separating signal from noise. Bayes (1763), in turn, was the first to show how new observations can be used to update previous estimators. In the usual Bayesian approach, a posterior density is first obtained from the prior density and the current observation, and this posterior density then serves as the prior density for the next step. This process is repeated sequentially for all subsequent observations (see, e.g., Barker et al. 1995). A particular benefit of Bayesian estimation is that it allows the practitioner or researcher to incorporate prior information from other sources (see, e.g., Verrall 1989). According to Ntzoufras and Dellaportas (2002), the Bayesian approach also increases computational flexibility, since MCMC sampling strategies can be used to generate samples from each posterior distribution of interest.
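For a static normal parameter with normal observations, the sequential Bayesian update and the Kalman measurement update coincide. The following sketch computes the posterior once via precision weighting and once in gain form. With a state transition, the Kalman filter would add a prediction step between updates. The function names are illustrative.

```python
def bayes_update(m, v, y, r):
    """Normal prior N(m, v) and likelihood y | theta ~ N(theta, r):
    posterior via precision weighting."""
    v_post = 1.0 / (1.0 / v + 1.0 / r)
    return v_post * (m / v + y / r), v_post


def kalman_update(m, v, y, r):
    """Same posterior in Kalman-gain form."""
    k = v / (v + r)
    return m + k * (y - m), (1.0 - k) * v


# Sequential use: each posterior becomes the prior for the next observation.
m_b, v_b = 0.0, 10.0
m_k, v_k = 0.0, 10.0
for y in [1.2, 0.8, 1.1]:
    m_b, v_b = bayes_update(m_b, v_b, y, r=0.5)
    m_k, v_k = kalman_update(m_k, v_k, y, r=0.5)
print(m_b, m_k)   # identical up to rounding
```

The gain $k = v/(v + r)$ is simply the relative precision of the new observation. This is the "update with new information" idea shared by both approaches.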
Finally, it is worth mentioning that most of the state space representations considered in the articles of this review are linear state space models, i.e., they consist of a linear observation equation and a linear state equation. This directly implies linear system properties and a limitation to linear processes. An exception is given by Taylor et al. (2003), who consider a non-linear observation equation and EDF-distributed measurement noise, that is, a generalized linear model. This approach permits any strictly monotonic and differentiable link function (e.g., the logarithm). However, linearity is not a fundamental drawback, as a non-linear system can be approximated locally by a linear one by linearizing the system equations around the current state estimate. This directly leads to the extended Kalman filter (see, e.g., Julier and Uhlmann 2004).
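A minimal sketch of one extended Kalman filter recursion, assuming a scalar random-walk state x_t and an observation y_t = exp(x_t) + e_t with additive Gaussian noise, i.e., a log link. This is a simplified illustration of linearization and not the filter of Taylor et al. (2003), which uses EDF-distributed measurement noise; the function name and all numbers are hypothetical.

```python
import math


def ekf_log_link(ys, m0, p0, obs_var, state_var):
    """Scalar extended Kalman filter for y_t = exp(x_t) + e_t, x_{t+1} = x_t + w_t.

    Assumes Gaussian additive noise; returns filtered (mean, variance) pairs.
    """
    m, p = m0, p0
    out = []
    for y in ys:
        mu = math.exp(m)          # predicted observation h(m)
        h = mu                    # derivative of exp at m: the local linearization
        s = h * h * p + obs_var   # innovation variance of the linearized system
        k = p * h / s             # Kalman gain of the linearized system
        m = m + k * (y - mu)
        p = (1.0 - k * h) * p
        out.append((m, p))
        p = p + state_var         # random-walk prediction step
    return out


print(ekf_log_link([1.0, 1.2, 0.9], m0=0.0, p0=1.0, obs_var=1.0, state_var=0.1))
```

Because the linearization is only valid near the current estimate, such a filter can behave poorly when the prior mean is far from the data; derivative-free alternatives such as the unscented filter discussed by Julier and Uhlmann (2004) address this.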

Insights from Practical Applications
In the following, some selected implications of the empirical applications discussed in the above papers are given in chronological order:
• De Jong and Zehnwirth (1983) present a simple illustrative example based on a data set from a UK general insurance company (1970–1974), for which volume and inflation indices are also available. They give estimated states for the observations of the upper triangle and predicted future incremental payments for the lower triangle. De Jong and Zehnwirth (1983) conclude that the results confirm the regular nature of the data and therefore the appropriateness of the "constant" transition model for b(i) according to (16). Further, the projected future incremental payments decline smoothly to zero with increasing delay due to the Hoerl curve approach (5).
• Verrall (1989) performs comprehensive practical applications using the benchmark data set from Taylor and Ashe (1983), which includes data from the motor bodily injury class of business in one Australian state (1972–1981). In particular, he compares static models with recursive Bayesian estimation and dynamic models, in which row and column parameters are estimated dynamically. The results show that the Kalman filter and empirical Bayes methods outperform the OLS (i.e., uninformative prior) approach: the estimates of the row (and column) parameters are smoother and the standard errors are lower. This is because more information is used for parameter estimation.
• Verrall (1994) considers the data set from Taylor and Ashe (1983) for an illustrative example and emphasizes that comprehensive examples covering all possibilities are not feasible. In particular, Verrall (1994) focuses solely on the development parameters and shows that the proposed model allows them to evolve over time.
• The modeling approaches in the work of Ntzoufras and Dellaportas (2002) are motivated by their RBNS data set from a major Greek motor insurance company. The data are characterized by claims that, according to Greek legislation, are reported within three working days and are usually settled by a one-off payment. Comparing the predictive performance of the proposed models, Ntzoufras and Dellaportas (2002) state that models 1 and 2 appear to have better predictive ability than models 3 and 4 for the data set considered.
• For the accident year-based approach, Taylor et al. (2003) discuss a practical application based on a workers' compensation portfolio in which benefits are dominated by payments of weekly compensation. The data show a strong upward movement of the PPCI at the beginning and a steady, slow decrease in later years. Based on this pattern, Taylor et al. (2003) choose the logarithm as link function and a gamma distribution for the measurement noise. For the calendar year-based approach, they use motor vehicle bodily injury data from Taylor (2000). The claim closure rates are relatively flat over the development years, but there are shocks that tend to affect whole calendar years. The filtered results closely follow the general level of the data, that is, there is only minor smoothing of the calendar year effects but considerable smoothing across development years.
• Alpuim and Ribeiro (2003) discuss two application examples based on real data sets: paid claims from the motor branch of a Portuguese insurance company (1984–1996) and the data set from Taylor and Ashe (1983). The authors compare various claims reserving methods and conclude that Hoerl curve approaches lead to the largest MSEP of the claims reserves. Further, they suggest that the log-normal transformation of the data results in larger values of the MSEP and that, therefore, the original observations should be used unless there is strong evidence of log-normally distributed data. For both data sets, however, the state space model proposed by Alpuim and Ribeiro (2003) leads to reserves with the smallest MSEP.
• De Jong (2006) performs a case study for the development correlation model using a data set from the Historical Loss Development Study that includes cumulative payments related to Automatic Facultative General (AFG) liability (1981–1990). In the first step, he applies the model of Hertig (1985) to the AFG data and concludes that it does not represent the data adequately, mainly due to remaining (negative) correlations in the standardized residuals between development years zero and one. For this reason, in the second step De Jong (2006) uses the development correlation model (71), which accounts for the correlation between the first two development years. The residuals then no longer contain any correlations, and the correlation between the first two development years is explained by the development correlation model.
• Atherino et al. (2010) also use the AFG data set and discuss three results of their analysis regarding the row-wise stacking approach in particular. First, it provides computational feasibility and efficiency. Second, it increases the accuracy of the reserve prediction. Third, it is flexible with respect to IBNR modeling possibilities. As a particularly interesting aspect, they highlight that the blocks and cumulating methods yield the same numerical results.
• Chukhrova and Johannssen (2017) compare various claims reserving methods with state space representations (Verrall 1989; Alpuim and Ribeiro 2003; Li 2006; Atherino et al. 2010) with popular methods such as CL, Bornhuetter-Ferguson (BF), and overdispersed Poisson using the data set from Taylor and Ashe (1983). Considering the claims reserves, their MSEP, and the coefficient of variation, no model can be identified that provides the best or the worst results for the given data set.
• Costa and Pizzinga (2020) present a practical example based on the data set from Taylor and Ashe (1983) and compare their extended row-wise stacking approach with a modified CL approach and heteroskedastic regression models. For the given data set, their proposed method outperforms the three competitors with respect to IBNR reserve prediction. In particular, if the competitors were applied, the insurance company might overestimate the claims reserves (thus leading to overpriced insurance contracts). By contrast, employing the original approach by Atherino et al. (2010) would lead to underestimated reserves.
• The most comprehensive empirical comparison of various state space models is conducted by Hendrych and Cipra (2021), who consider five data sets, including the data set from Taylor and Ashe (1983), data from the Belgian insurance industry, and the data set from Alpuim and Ribeiro (2003). They compare their models with those proposed by Alpuim and Ribeiro (2003), Atherino et al. (2010), and Chukhrova and Johannssen (2017) as well as with the CL and BF methods. According to Hendrych and Cipra (2021), their state space models are adequate for routine actuarial situations. Further, they provide information about the distribution of the predicted claims reserves.
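The smooth decay to zero attributed above to the Hoerl curve approach of De Jong and Zehnwirth (1983) can be illustrated numerically. The sketch assumes the common parametrization of the mean incremental payment in development year j as exp(a + b log j + c j) with j = 1, 2, ...; equation (5) of the paper may index or parametrize the curve differently, and the parameter values are hypothetical. With c < 0 the exponential term dominates for large j, so the curve tends to zero, after an initial rise when b > 0 (continuous maximum at j = −b/c).

```python
import math


def hoerl_curve(a: float, b: float, c: float, js) -> list[float]:
    """Mean incremental payment exp(a + b*log(j) + c*j) for development years j >= 1."""
    return [math.exp(a + b * math.log(j) + c * j) for j in js]


# hypothetical parameters: b > 0 gives an initial rise, c < 0 forces decay to zero
a, b, c = 5.0, 1.5, -0.6
js = range(1, 21)
means = hoerl_curve(a, b, c, js)
peak = max(js, key=lambda j: means[j - 1])  # discrete development year with largest mean
print(f"peak at j={peak} (continuous maximiser -b/c = {-b / c:.2f})")
print(f"mean at j=20 relative to peak: {means[-1] / max(means):.2e}")
```

Beyond the peak the curve decreases monotonically, which is the behavior that makes projected lower-triangle payments fade out smoothly rather than abruptly.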
Evidently, the empirical application examples are heterogeneous: they often show only facets of the presented methods, and the results are not consistently compared with other methods. There is no empirical comparison of state space models that includes even approximately all methods introduced to date; the most comprehensive empirical comparisons can be found in the works of Alpuim and Ribeiro (2003), Li (2006), Chukhrova and Johannssen (2017), and Hendrych and Cipra (2021). However, it is also evident that the scope for a larger-scale empirical comparison of all the models presented is narrowly limited. This is due to several factors, such as different objectives, different claims data, or the inclusion of additional information. Since the run-off data are often closely integrated into model building and the objectives of the articles sometimes differ considerably (see Section 7.1), it is not possible to perform an empirical comparison that would do justice to all of the models. Otherwise, models would be applied to claims data and objectives for which they were not constructed. Moreover, some models require the incorporation of further information, such as inflation or volume indices, whose availability cannot generally be assured (and which is not available for the benchmark data set from Taylor and Ashe 1983), but whose omission would counteract the idea behind the model. Likewise, no recommendation can be formulated as to which model is best suited for actuarial practice. The choice of a specific model depends on numerous factors and should mainly rely on verifying the model assumptions against the underlying data.

Conclusions
In this paper, we have provided a comprehensive review of stochastic claims reserving methods with state space representations. We have identified 16 relevant articles in this field and grouped them into five categories according to their key content similarities. Most of the articles fall into the categories "Parametric evolution" (#5) and "Log-normal models" (#4), but there are also articles devoted to "Correlation models" (#2), "Univariate models" (#2), and "Row-wise stacking" (#3). Moreover, models for incremental payments (#12) and the calendar year-based state space modeling approach (#8) are the most prevalent.
Our main intentions were to identify where state space models have been used to improve stochastic claims reserving and to consolidate the topic in order to aid new researchers in this area. In line with these objectives, we have structured and categorized the relevant articles. Ideally, this basis will assist researchers currently focusing on state space models in stochastic claims reserving and lead to fruitful future research in this area.
As promising directions for future research in the field of stochastic claims reserving based on state space models, we mainly suggest micro-level claims reserving and the implementation of non-linear systems (see Chukhrova and Johannssen 2021). Moreover, both within and beyond state space models, we would like to emphasize the use of granular models as well as of machine learning and soft computing techniques in future research projects. Although models based on aggregate data are widely used, especially in actuarial practice, they are often characterized by rather simple model assumptions that are inadequate for the underlying data. Thus, there is a need for more flexible models that are able to deal appropriately with data for which the common model assumptions are violated (see Taylor 2019).
Author Contributions: Conceptualization, N.C. and A.J.; methodology, N.C. and A.J.; formal analysis, N.C. and A.J.; investigation, N.C. and A.J.; writing-original draft preparation, A.J.; writing-review and editing, N.C.; project administration, A.J. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.