Estimation of Uncertainty in Mortality Projections Using State-Space Lee-Carter Model

: The study develops alternatives of the classical Lee-Carter stochastic mortality model in assessment of uncertainty of mortality rates forecasts. We use the Lee-Carter model expressed as linear Gaussian state-space model or state-space model with Markovian regime-switching to derive coherent estimates of parameters and to introduce additional ﬂexibility required to capture change in trend and non-Gaussian volatility of mortality improvements. For model-ﬁtting, we use a Bayesian Gibbs sampler. We illustrate the application of the models by deriving the conﬁdence intervals of mortality projections using Lithuanian and Swedish data. The results show that state-space model with Markovian regime-switching adequately captures the effect of pandemic, which is present in the Swedish data. However, it is less suitable to model less sharp but more prolonged ﬂuctuations of mortality trends in Lithuania.


Introduction
Mortality projections are used by insurance companies, pension providers and public policy makers for forecasting expected payments of benefits contingent on human lives (such as insurance benefits and pensions) as well as their distribution. Specifically, insurance companies use such projections for pricing of products, solvency calculations and risk assessments. With the upcoming new international insurance accounting standard, the allowance for risk will be required in the financial reporting as well.
Lee-Carter stochastic mortality model is one of the most popular models applied in practice for forecasting expected human mortality and its distribution. Lee and Carter [1] models mortality rates according to the following formula: where m x,t is central mortality rate (ratio of the number of deaths to the exposed to risk) for age x in year t, α x , κ t , β x are parameters depending on age x or on year t, and E x,t are independent and identically distributed (i.i.d.) random residuals with zero means. In its original form and according to the later modifications, see Brouhns et al. [2] or Lee and Miller [3] for instance, Lee-Carter model supposes two stage estimation procedure. First, parameters α x , κ t and β x are estimated using the matrix singular value decomposition (SVD) method or by fitting a Poisson bi-linear regression. Secondly, parameter κ t , which might be interpreted as time varying index representing the general trend of human mortality improvements, is modelled separately as time series, most often as a simple random walk with drift (RWD): where µ is the drift (or trend) parameter and ν t are i.i.d. residuals with zero means. Despite its popularity Lee-Carter model has several limitations. The model ensures adequate fit only if the underlying data contains a stable mortality trend. For example, Figure 1 shows the results of fitting classical Lee-Carter model to mortality data which either has a volatile trend or includes an outlier. In the Lithuanian case the standard Lee-Carter model produces projections with unacceptably wide confidence intervals due to volatile historic mortality experience which poorly fit a linear trend assumption. The country experienced a sharp increase in mortality rates after the collapse of the Soviet Union (that was experienced by many ex-USSR countries) which was followed by a gradual decline in mortality rates after the country's independence and the accession to the EU. Consistent Swedish mortality improvement was distorted by the effect of 1918-1920 Spanish flu pandemic. Thus, the estimated variance of the Swedish projections averages the variance observed during the periods of usual mortality development and that of the pandemic period. Figure 1b shows that just one outlier of the year 1918 has a major effect on the estimated confidence intervals of the projected time varying index.
Standard Lee-Carter model has limited possibilities of dealing with such distortions and usually the key practical recipe of achieving adequate fit is to cut the data used for model-fitting to the latest stable mortality period, see Gylys and Šiaulys [4], Lee and Miller [3], Tuljapurkar et al. [5]. In addition, the classical Lee-Carter model projections do not allow for parameter uncertainty. Some authors, e.g., Brouhns et al. [6], Koissi et al. [7], applied bootstrap methods to estimate the effect of uncertainty in estimated parameters. However, the results of such analysis are affected by the general limitations of the Lee-Carter model as described above.
State-space time series models (SSMs) (see Durbin and Koopman [8], Harvey [9], West and Harrison [10]) defined by relations where vector y t is observed, vector x t is unobserved state parameter vector, and p is an arbitrary distribution. SSM offers a natural opportunity to extend the Lee-Carter model both in terms of the alternative coherent fitting procedure and the possibility to flex the model by introducing additional parameters to ensure a better fit. A linear Gaussian state-space model, often called Dynamic Linear Model (DLM) is one the most popular SSM in applications This model can be represented by the following two equations: where ε t ∼ N(0, H t ) and η t ∼ N(0, Q t ) are the Gaussian errors and Z t , H t , U t , Q t are the parameter matrixes. Thus, we can express Lee-Carter model in state-space representation by treating Equation (1) as observation equation and Equation (2) of the time varying index κ t as an the state equation. State-space models can be traced back to highly influential paper by Kalman [11]. Although initially these methods were applied predominantly in engineering and signal processing, over time it became a recognized statistical modelling technique. Examples of application of the SSMs in mortality analysis are presented by Fung et al. [12], Kogure and Kurachi [13] and Pedroza [14] among others.
As an extension, we applied SSM with regime-switching (SSM with switching). Such models are often referred to multi-process models and are DLMs conditionally on a sequence of indicator variables, which represent varying regimes. In the context of the mortality modelling such regimes may be interpreted as switching between stable and volatile socio-economic development conditions which positively or negatively affect the mortality development. Such models were successfully applied in economics for modelling of business cycles, see for instance Hamilton [15], Kaufmann [16] and Kim and Nelson [17] among others.
Thus, the aim of the paper is to explore the possibilities of SSMs to develop an alternative solution of dealing with the two common general problems, as illustrated above, encountered in practice when modelling mortality. We treat the pandemics, wars, other significant one-off fluctuations in mortality rates as a natural phenomenon to be considered in the projections. We also show how to use flexibility of SSMs to deal with one-time change in socio-economic development, which, for example, was experienced by many Central and Eastern European countries. The focus of the analysis is estimation of confidence intervals of the projected mortality rates, in particular at the tails of the distributions, which is important for insurers' solvency and economic capital models, rather than just the central mortality forecast. The performance of the models is assessed by using the mortality data of Lithuania and Sweden, but the issues and possible application is not limited to these countries. We estimate the model parameters using Gibbs sampler, which is the primary Markov chain Monte Carlo (MCMC) method for SSMs with switching. Output from the Gibbs sampler also allows us to take into account parameter uncertainty. For ensuring the comparability of results, we also use the Gibbs sampler for fitting DLMs. For model comparison we use marginal likelihoods estimated with sequential Monte Carlo technique based on auxiliary particle filter.
The study covers ages from 25 to 74 and is primary aimed at the assessment of mortality risk at insurance companies. However, the methods used can be extended to analysis of mortality of older ages and can be used for assessment of longevity risk, which is particular important for pension products providers. For the analysis we use the general population data from Human Mortality Database for ages 25-74 grouped in 5 year age groups for the following periods: Lithuania 1959-2017, Sweden 1900-2017. Human Mortality Database (https://www.mortality.org) is maintained by the University of California, Berkeley (USA), and the Max Planck Institute for Demographic Research (Germany). Data for our research was downloaded in February 2020.
The paper is organized as follows. Section 2 provides an overview of the models used, the Gibbs sampler as well as the methodology used for model comparison. Section 3 provides the detailed description of the algorithms. Section 4 deals with details of the MCMC diagnostics and the model-fitting and comparison results. Section 5 provides overview of the forecasting results. Section 6 concludes the paper with the discussion of the results, implications for the mortality modelling and areas for further research.

State-space Lee-Carter models
We develop two different specifications of state-space Lee-Carter model: based on DLM and SSM with switching. For comparison we use the results derived by the classical Lee-Carter method fitted using SVD, therefore, some details of the classical model are also provided.

DLM
By using the general expression (3) of DLM, we can formulate state-space Lee-Carter model by the following two equations: In the observation Equation (4), symbol y t denotes vector with N coordinates, where N is the total number of age groups. Each coordinate of this vector x ∈ {1, 2, . . . , N} is the specific centered log-mortality rate Parameter β is a column vector of N age specific parameters β x , which determine the impact of the time varying index on log-mortality rates. Matrix H t is a diagonal N × N matrix with all diagonal elements equal to σ 2 H . In the state Equation (5), element κ t is the time varying index, which represents the general trend of changes in mortality rates with time, µ (I) t is the drift of κ t which continues for the whole fitting period. Coordinate µ (I I) t represents the additional drift element in the time period t > T 0 and symbol 1I denote the standard indicator function. Thus, during the periods t > T 0 the model assumes that the total drift of parameter κ t is µ t . As in the classical Lee-Carter model, we assume that both drift parameters stay constant during the fitting period and we apply the following covariance matrix of errors of the state parameter x t : The above model maintains many of the features of the classical Lee-Carter model. The time varying index κ t follows a RWD. Contrary to the classical Lee-Carter model we allow a one-time change in drift at some time moment T 0 but otherwise drifts are not allowed to vary in time. Such structure ensures that the model is sufficiently rigid, which is important for making long-term forecasts. On the other hand, the feature of one-time change in drift allows us to model major changes in environment, which cannot be modelled in the classical model. As in the classical Lee-Carter model the general trend of mortality modelled with parameter κ t is translated to age specific log-mortality rates via the parameters β x , and the random residuals E x,t , are assumed to be i.i.d.
In our model, we stochastically assess the deviations from the centered log-mortality rates, and the Lee-Carter parameters α x , which represent the general shape of changes in log-mortality rates with age, are expected to be fixed. We set . . , N, as in the most of the classical Lee-Carter model's applications, and deduct it from log-mortality rates before starting the fitting process, see Equation (6). Overall, in applications of the Lee-Carter model, parameter α x has proved to be one of the most predictable model parameters.
For example, in the bootstrap analysis performed by Koissi [7], variation in the estimates of parameter α x was predominantly related to very young ages (below 20 years) and old ages above 80 years, and for the remaining ages was very stable. Therefore, we consider that the simplification of fixing α x parameter does not have a significant effect on the results of the analysis. As in the classical Lee-Carter model we apply the following identifiability constraint for parameters β x : The constraint is applied in step (vi) of algorithm A2 presented in Section 3.2, where the estimated parameters β x are re-weighted with the corresponding adjustments made to the mean and scale of the estimates of κ t . Thus, after the re-weighting the distribution of the term (0, 0, β)x t is unchanged.
The key feature of the SSMs is that they enable estimation of the distribution of each of the unobservable parameters x t , by using the observed data y 1 , . . . , y t . The recursive algorithm, called Kalman filter, which is based on the method of orthogonal projections and uses the features of the conditional Normal distribution, updates the estimate of the mean and variance of x t−1 with the new evidence contained in y t in order to derive the mean and variance of x t . For the equation of the basic Kalman filter, see step (i) of algorithm A1 described in Section 3.1. Thus, assuming that the collection of the model parameters ψ = {β, σ 2 Q , σ 2 H , µ}, µ = µ (I) , µ (I I) , is known, we can run the filter for all t ∈ {1, . . . , T} and derive the distribution of {x 1 , . . . , x T }, which shall contain the mean and variance of time varying index κ t for t ∈ {1, . . . , T}. The detailed description of the method used to estimate the parameter collection ψ is presented in Section 2.3.

SSM with Switching
The basic equations of our state-space Lee-Carter model with switching remain (4) and (5).
The key difference from the previous model is that we assume that the error terms η t , t ∈ {1, . . . , T}, follow a mixture of two zero mean Normal distributions with the following covariance matrixes: each of them corresponding to one of the two regimes. We restrict σ and use a binary index s t : low variance regime s t = 0 and high variance regime s t = 1. Therefore, in SSM with switching the errors of the parameter κ t depending on its parametrization may have much heavier tails with respect to tail of standard Normal distribution.
The switching between the two regimes is assumed to follow the Markov process with constant transition probabilities as introduced by Hamilton [15]. We define two probabilities which imply that: Consequently, the parameter collection of the model with switching instead of one variance σ 2 Q and vector of transition probabilities π = π (0) , π (1) .
Conditionally on values of κ t , t ∈ {1, . . . , T}, we can recursively estimate probabilities of s t = 0 and s t = 1, by taking into account the corresponding probabilities at the time moment t − 1, as well as the likelihood of being in a particular state, which is driven by the deviation of the change κ t − κ t−1 from its expected value µ t : the higher the deviation is, the more likely is that the process has moved to high variance regime. For details see step (ii) of algorithm A3 described in Section 3.3.
Conditionally on values of s t , t ∈ {1, . . . , T}, we can apply the basic Kalman filter as described in Section 2.1. The only difference is that the matrix Q t can vary in different update steps of the filter; however, this does not change the basic operation of the algorithm.
The key benefit from using the SSM with switching is that it enables us to segregate the total variance of κ t , which is generally recognized to be the key source of variability of mortality rates, into the variance during the periods of the "normal" socio-economic development and the variance during the periods such as epidemics, wars, changes economic and political systems, etc. As shown in Figure 1, the development of the general mortality trend may be far from constant, therefore, the classical Lee-Carter approach, which averages the variance of κ t over the fitting periods might miss important features of the real life development, especially when the purpose of the analysis is estimation of the confidence intervals of the mortality forecasts rather than just the central forecast.

Parameter Estimation with Gibbs Sampler
There are two main approaches to fitting of SSMs: maximum likelihood (MLE) and Bayesian Markov Chain Monte Carlo (MCMC). For estimation of the parameter collection ψ of DLM Lee-Carter we can easily apply MLE, see e.g., [12]. However, as noted by Frühwirth-Schnatter [18], in case of SSM with switching the marginal likelihood where both latent processes x and s are integrated out, is not available in the closed form. In addition, MLE does not explicitly allow for uncertainty in model parameters, which is an important element in satisfying our aim to estimate realistic confidence intervals of the forecasts. Therefore, for model-fitting we apply MCMC method called Gibbs sampler, which was also used in some earlier applications of the state-space Lee-Carter model without regime-switching by Fung et al. [12], Kogure and Kurachi [13], Pedroza [14]. The Gibbs sampler draws samples of parameter values from sequentially updated conditional distributions. Thus, if we have a parameter vector Λ = (Λ 1 , Λ 2 , . . . , Λ M ), the Gibbs sampler at each iteration j = 1, 2, . . . would sequentially draw As shown by Geman and Geman [19] under certain mild conditions the distribution of draws Λ j converge to the true distribution of Λ if j → ∞, independently of the starting values of Λ 1 = (Λ 1 1 , Λ 1 2 , . . . , Λ 1 M ). The general difficulty in constructing and using the Gibbs sampler is the requirement at each step to have conditional distributions of the sampled parameters. In our case, conditional distributions of time invariant parameters can be derived from Equations (4) and (5); however, the complication arises of implementing a proper conditional sampler of the unobserved state parameter x t and, in the case of model with switching, the collection vector of regimes s. This issue was successfully resolved by Forward Filtering Backwards Sampling (FFBS) algorithm presented in algorithm A1 in Section 3.1, see also Carter and Kohn [20] and Frühwirth-Schnatter [21] for additional information.
We note that the Gibbs sampler is a Bayesian estimation method, thus in addition to the estimated parameters the conditional distributions at each step of the sampling scheme depend on the selected priors. We denote the collection of priors for DLM and SSM with switching The details of descriptions of the parameter estimation algorithms are provided in Sections 3.2 and 3.3.

Forecasting
As noted in the previous section, Gibbs sampler after its convergence produces draws from the posterior joint distribution of parameters. Thus, after discharging certain number of initial draws (called burn or warm-up phase) we can use the draws as an input for the forecasting simulations of the state-space models. For both models DLM and SSM with switching we simulate the projected mortality rates for periods T + 1, . . . , T + K in the following way.

(i)
Depending on the model, DLM or SSM with switching, sample parameter collection ψ or ψ (s) from the matrix of Gibbs sampler draws obtained during the model-fitting phase after the warm-up. In this step, we are sampling the parameter values from their joint posterior distributions.

(ii)
For SSM with switching, draw sample of regime indicators s for the forecasted period using step (ii) of the algorithm A3 (see Section 3.3). As κ t are not available for time moments t = T + 1, . . . , T + K, multiplication by the probability P(x t | s t ) in position (ii.2) of algorithm A3 may be disregarded.

(iii)
For SSM with switching, using sampled parameters µ and σ Q and regime vector s from step (ii) simulate κ t for time moments t = T + 1, . . . , T + K by supposing that 2 depending on the simulated s.
In case of DLM, use σ 2 Q as a fixed variance.

(iv)
Using sampled parameters β and σ 2 H and sampled vector κ from step (iii), as well as allowing that errors are i.i.d., simulate y x,t for time moments t = T + 1, . . . , T + K and age groups x = 1, . . . , N by supposing that y Calculate mortality rates, having in mind that m x,t = exp{y x,t + α x } for all possible x and t.
In case of the classical Lee-Carter model, we do not have draws from the posterior distribution of the parameters. We can partially allow for parameter uncertainty in the classical Lee-Carter model by allowing for uncertainty of the drift, see [3], the distribution of which follows from the RWD assumption of κ t . Except for µ (I) , for the simulation we use fixed parameters, estimated via SVD and time series fitting process. Therefore, the classical Lee-Carter model provides less comprehensive treatment of parameter uncertainty than SSMs. For the classical Lee-Carter model we simulate the projected mortality rates for time moments t = T + 1, . . . , T + K by the following algorithm.

(iv)
Calculate mortality rates for all possible x and t, by formula m x,t = exp{y x,t + α x }.

Comparison of State-Space Models
For the state-space model comparison, we use marginal likelihood. We apply the following formula by Chib [22]: conditional on the estimated parameters, m ψ is log density of prior at ψ and m ψ | Y 1:T is the log density of posterior at ψ. Term m Y 1:T | ψ can be calculated in the closed form for DLM, but for SSM with switching the likelihood function would be a mixture of 2 T Normal distributions, the direct assessment of which is not practical. Therefore, we use the sequential Monte Carlo method based on auxiliary particle filter of Pitt and Shephard [23] which was adapted to SSM with switching by Kaufmann [16]. The details of the algorithm are provided in Section 3.4.
The term m ψ | Y 1:T we estimate similarly as in [22] by using the sequentially running Gibbs sampler for each of the parameters in the collection ψ = {ψ 1 , . . . , ψ L } and estimating their posterior likelihood at the mean points of ψ from MCMC runs by the following algorithm: At the end we calculate the total posterior log-likelihood: . . , ψ L−1 ).

Forward Filtering Backwards Sampling (FFBS) Algorithm A1
(i) Conditionally on the initial distribution x 1 ∼ N(a 1 , P 1 ), which we assume to be known, for t ∈ {1, 2, . . . , T} run the Kalman filter recursion in order to obtain estimated distributions x t|t ∼ N(a t|t , P t|t ) of unobserved state parameter vector given the observation y t , as well as the distribution of the one step ahead predictions x t+1 ∼ N(a t+1 , P t+1 ).Using the general Kalman filter equations, see for instance, Chapter 4 by Durbin and Koopman [8], we derive the following recursion formulas for the state-space Lee-Carter model mentioned in Equations (4) and (5): where t ∈ {1, 2, . . . , T}, F t is estimated covariance matrix of the observation, ν t is an estimated residual, I is an identity matrix, and the matrix K t is called the Kalman gain.
(ii) Sample x T by supposing that x T ∼ N(a T|T , P T|T ).
(iii) Treat one step forward sample as an observation and apply the Kalman filter to it. According to Carter and Kohn [20], for t = T − 1, T − 2, . . . , 1, we should sample x t by supposing that x t ∼ N(a t|t, x t+1 , P t|t, x t+1 ), with a t|t, x t+1 = a t|t + P t|t P −1 t+1 x t+1 − a t+1 , P t|t, x t+1 = P t|t − P t|t P −1 t+1 P t|t .

Gibbs Sampler Algorithm A2 for DLM
(o) Initiate the sampler using an arbitrary starting parameter collection ψ = {β, σ 2 Q , σ 2 H , µ} and collection of priors conditionally on ψ, and store vector κ of values κ t by applying FFBS algorithm A1 with the initial distribution Simulation results are usually insensitive to selection of κ 0 , provided σ 2 0 be sufficiently large. We use σ 2 0 = 10. (ii) Sample µ = (µ (I) , µ (I I) ) conditionally on κ and σ 2 Q , from the bivariate Normal distribution where ∆κ is a vector of changes in κ t , while X is a matrix of dimension T × 2 with the form where number of the first lines T 0 T is the number of periods before change in trend of κ t occurred.
(iii) Sample σ 2 Q , conditionally on κ and µ, from the inverse Gamma distribution where 1I {t>T 0 } is the standard indicator function.
(iv) Sample β, conditionally on κ and σ 2 H , from the Normal distribution. Considering that we assume that the matrix of measurement errors H t is diagonal, we can sample β x for each age group x ∈ {1, 2, . . . , N} separately: where y x is a vector of centered observations for age group x.
(v) Sample σ 2 H , conditionally on κ and β, from the inverse Gamma distribution: (vi) Reweight β and related parameters to ensure that the sum of elements of β is equal to unit. We proceed as follows: (vii) Collect the updated parameters to collection ψ and proceed to step (i).

Gibbs Sampler Algorithm A3 for SSM with Switching
conditionally on ψ (s) and the regime vector s, and store vector κ of values κ t by applying FFBS algorithm A1 with the initial distribution Simulation results are usually insensitive to selection of κ 0 , provided σ 2 0 be sufficiently large. We use σ 2 0 = 10.
(ii) Sample vector s, conditionally on κ, σ Q , µ and π in the following way: (ii.1) Calculate the initial probabilities of the two regimes by assuming that the filter is initialized with the following stationary Markov chain probabilities: (1) .

(ii.5) Collect the sampled regime indicators to the vector s.
We remark here that steps (iii)-(v) of the presented algorithm are constructed following results from [17].
(iii) Sample π = (π (0) , π (1) ) , conditionally on s, from beta distributions Here:X * = X [AA], ∆κ * = ∆κ A, ∆κ and X are defined in algorithm A2, A is 1 × T vector of numbers 1/σ (i) Q with the index i corresponding to regimes in vector s, [AA] is 2 × T matrix consisting of the doubled vector A, and symbol denotes the element-wise matrix multiplication.
, conditionally on s, κ and µ, from the inverse Gamma distribution.
Sample β, conditionally on κ and σ 2 H , from the Normal distribution. Considering that we assume that the matrix of measurement errors H t is diagonal, we sample β x for each age group x ∈ {1, 2, . . . , N} separately: where y x is a vector of centered observations for age group x.
(vii) Sample σ 2 H , conditionally on κ and β, from the inverse Gamma distribution: (viii) Reweight β and related parameters to ensure that the sum of elements of β is equal to unit. We proceed as follows: (ix) Include the updated parameters to the collection ψ (s) and proceed to step (i).

Particle Filter Algorithm A4 for Marginal Likelihood for SSM with Switching
In this subsection, we provide the details of the auxiliary particle filter algorithm used to evaluate the log-likelihood conditional on the model parameters ψ (s) , which in turn is used to calculate the marginal likelihood.
(o) With the parameter collection ψ (s) = ( β, σ Q , σ H , µ, π) initiate the filter by drawing R times with replacement κ 0 from the output of the Gibbs sampler and generating R draws of s 0 from Bernoulli distribution with probability (1) .
Obviously, in this filter we consider only the last state line of equitation (5) as the drifts are fixed.
t using the derived probabilities in step (i.3).

(i.5) Given the sampled element s (r)
t with the basic Kalman filter, calculate the distribution of the prediction given by the observation y t is based on the model errors σ 2 H not depending on y t−1 . (iii) Using the Markov property of the model find the marginal log-likelihood conditioned on ψ (s) :

Fitting Process
To obtain SSM estimates we run algorithms A2 and A3 5000 times for each of five different starting values, and we discharge the first 1000 draws as a warm-up. For the basic FFBS calculations we use the functions from R package dlm, see Petris et al. [24].
After recording the results of the runs we split each chain into two equal parts and perform MCMC diagnostics by assessing convergence and efficiency of MCMC sampling according to Gelman et al. [25].
Convergence is assessed by calculating for each sampled parameter: is an estimate of parameter's Λ marginal posterior variance, conditional on the data Y, W is the average of sample variances of a parameter within each of m half-chains of length n, and B is the sample variance of a parameter between the half-chains. If R is close to 1 than B is small relatively to W, which implies that there are no major consistent deviations due to different starting values or between early/late samples in a chain and supports the assumption of adequate convergence. Sampling efficiency is assessed by calculating effective sample size: n e f f = mn where ρ t are autocorrelations of parameter samples in the chains. To limit the effect of noise of sample correlations we used T to be the first positive integer for which ρ T+1 + ρ T+2 is negative.
In the Lithuanian case we allow for the additional drift element since 1995 when the country finalized the major reforms and started independent European development. We consider the country's independence a major change in socio-economic development, which significantly altered the trend of mortality development. In the Swedish case, no such changes occurred during the fitting period, thus variation in the drift is not modelled. In summary, the fitted drift parameters may be summarized in Table 1. For the DLM and SSM with switching we use the following collections of priors: We set priors with the aim, as suggested by Gelman, et al. [25] (see p. 55), that the information they provide is intentionally weaker than the actual knowledge that is available. For example, for Normally distributed parameters we set priors with sufficiently large variances, not to introduce undue bias towards the selected prior mean.
For the classical Lee-Carter model we use the standard SVD procedure to obtain estimates of vectors β and κ. In the second step, we model κ as RWD and estimate µ (I) , σ 2 Q using the standard time series methods. Finally, we estimate σ 2 H from the model residuals. We do not use the second stage re-estimation of κ which is applied in the original paper by Lee and Carter [1]. We base our forecast on the least squares fit delivered by SVD.

Parameter Estimates
The calculated convergence and sampling efficiency statistics are provided in Table 2. Table 2. Estimates of convergence statistics R and effective sample size statistics n e f f (presented in thousands) for DLM and SSM with switching model parameters, calculated for five MCMC chains of 4000 iterations after warm-up (total of 20 thousand iterations). 3 n/a n/a n/a n/a π (0) n/a n/a 1.0 0.9 n/a n/a 1.0 4.8 π (1) n/a n/a 1.0 0.8 n/a n/a 1.0 8.2

Parameter
Statistics R shows adequate convergence for both models and both countries. Effective sample size indicates high efficiency of Gibbs sampler for most of the parameters, with lower efficiency for SSM with switching σ Q variances and transition probabilities, especially in the Lithuanian case.
The estimates of the parameters are presented in Table 3. For the state-space models means of simulated parameters from recorded MCMC runs are provided. n/a n/a n/a π (0) n/a n/a 0.537 (0.279) n/a n/a 0.975 (0.017) π (1) n/a n/a 0.512 (0.287) n/a n/a 0.580 (0.202) Overall, the estimates of parameters β x almost do not depend on the model used. which indicates that the two regimes are very different in terms of the volatility of parameter κ t , but this is not the case of Lithuania. Parameters π (0) and π (1) are applicable to SSM with switching models only and they indicate that for Sweden regime 0 has high frequency and regime 1 has low frequency.
In the Lithuanian case both regimes have similar frequency. Thus, we can conclude that in the Swedish case SSM with switching has converged to two distinct regimes, while in the Lithuanian case the movements in κ t were not sharp enough to achieve this. For more details see Section 4.3. Majority of the marginal posterior variances are relatively low. As expected, higher variances are recorded for parameters which are estimated from smaller number of input variables, such as the Lithuanian drifts and regime specific parameters in SSM with switching model.
In Figure 2 we provide the comparison of estimates of time varying index κ t obtained from SVD and from SSM with switching MCMC runs. We observe that in both cases the shape of the basic κ t curve is similar, but in the MCMC runs we allow for uncertainty of κ t under various samples of the parameter collection ψ (s) , thus the values of κ t fluctuate.    In the Lithuanian case, the model has managed to label with "1" the key disturbances in mortality, such as a large increase in mortality during the country's transformation to the market economies at the beginning of 1990s, large spike in mortality during the economic crisis of 2008-2009 and the subsequent recovery. However, the average frequency of MCMC runs when high variance regime label is assigned in those periods barely exceeds 0.5. Poor ability of the model to identify high variance scenarios even after the identifying restriction σ (0) 2 σ (1) 2 was imposed, indicates that this condition is not sufficient to ensure the unique labelling of regimes. In the MCMC runs there is a lot of label switching which indicates that the constraint is poor, see [18]. Thus, in the Lithuanian case our specification of SSM with switching does not provide adequate performance.

Regime-Switching in MCMC Runs
In the Swedish case, the model in most cases stay in regime 0. Regime 1 is visited just during the years of Spanish flue pandemic of 1918-1920 and with much lower frequency in the mid-1940s, when improvement in mortality was observed after the WWII. Contrary to Lithuania, the regimes are labelled consistently in the MCMC runs and the label switching is almost not observed. This indicates that the identifying restriction σ (0) 2 σ (1) 2 is suitable for this model and that the estimated variances of the state parameter are acceptable.

Model Comparison Results
We compare the models used by performing the assessment of the marginal log-likelihood m(Y 1:T ) as described in Section 2.5. The results of the calculations are summarized in Table 4.

Lithuania Sweden
Model The results show that SSM with switching has the highest marginal likelihood in the Swedish case. For Lithuania, both DLM and SSM with switching models have similar marginal likelihoods, thus a simpler DLM is preferred.

Results of Forecasting
In this section, we provide details of the mortality forecasts derived using the SVD, DLM and SSM with switching models. We present the results for two different confidence levels: 95% and 99.5%. Level 99.5% correspond to 1 in 200 years loss event and is used in assessment of insurers solvency in the European Solvency II regulatory regime.
By taking exponentials on both sides of Equation (1), we can represent the modelled mortality rates as follows: which shows that the variability in projected mortality rates can be considered to arise multiplicatively from three sources: variation in constant which represents the increase in mortality with age (which we ignore due to its insignificance as a simplification), variation in trend, mainly driven by the parameter κ t , and the remaining random variation. In the remainder of this section, we first discuss the modelled uncertainty of κ t , and in the next subsection, we consider how this uncertainty translates to the variation of the overall mortality rate.

Projections of Parameter κ t
As can be seen from Equation (2), the uncertainty of the parameter κ t is driven by variation in drift parameterμ and the residual error. The results of the simulations of κ t are provided in Figure 5 for confidence level of 95%, and in Figure 6 for confidence level 99.5%.
In the Lithuanian case, due to unsettled historic experience, the confidence intervals are wide for both SVD and state-space models. Allowance for one-time change in trend results in a more reasonable central forecast; however, it adds additional uncertainty of the second drift parameter µ (I I) . Thus although estimated error variance σ 2 Q is lower in SSM case, the overall confidence interval of κ t is even slightly wider than the one derived by using the classical Lee-Carter model.
In the Lithuanian case, the SSM with switching does not converge to two clearly expressed low and high variance regimes. Therefore, the distribution of error terms modelled as a mixture can be well approximated with Normal distribution. For this reason, the simulated confidence intervals for both models are very similar. The shaded region provides 99.5% simulated confidence intervals of the κ t parameter fitted using SVD, DLM and SSM with switching. In the Lithuanian case, DLM and SSM with switching allow for one-time change in trend in 1995, therefore the direction of the drift differs from SVD estimate. The blue line is used to denote the Lithuanian central forecast estimated using SVD and red line is used to denote the SSM central forecast.
In the Swedish case, DLM gives a narrower estimate of confidence interval with respect to SVD generally due to the better fit and the lower error variance. For Sweden SSM with switching converges into two different regimes: low probability and high variance pandemic regime and high probability low variance usual development regime. Thus, the distribution of the error terms when modelled as a mixture, has much heavier tails than when it is approximated with a single Normal distribution. This results in significantly wider confidence intervals for SSM with switching, especially at 99.5% confidence level. We note that the width on the difference in confidence intervals is driven by model differences, not the poor fit. As shown in model comparison Section 4.4, despite having the widest confidence intervals SSM with switching has higher log-likelihood that DLM and is a preferred model for Sweden.

Uncertainty of Mortality Rates
The results of the simulations of m x,t mortality rates are provided in Figure 7 for confidence level of 95%, and Figure 8 for confidence level 99.5%. Considering that in practical applications we usually need a set of mortality rates for each of the future years in the forecasting period, in our illustration we provide the results for 15 year projection, which is in the middle of the total forecasting period. As we can see in Equation (7), the variation in κ t is translated to changes of the mortality rates via the parameter β x . The higher β x for a certain age group is, the more sensitive are the projected mortality rates of that age group to changes in κ t , and vice versa. As shown in Table 3, for Lithuania the highest β x are for mid-ages and therefore in the state-space models mid-agers get the widest confidence intervals. The Lithuanian SVD calculations, due to relatively poor fit, have a large residual variance, which dominates in the classical Lee-Carter model simulations and results in very wide simulated mortality rate confidence intervals. Thus, despite we did not manage to achieve with SSMs a significant improvement in the confidence intervals of the parameter κ t , the width of the simulated confidence intervals of mortality rates are reasonable and comparable to the Swedish result.
In the Swedish case, the results are consistent with the results of simulation of parameter κ t . Due to better fit, at 95% confidence level the confidence intervals of SSMs are narrower than in case of SVD. At 99.5% confidence level, as expected, the heavy tails of SSMs with swithing prevail, resulting in the widest confidence intervals.

Discussion
Overall, the use of state-space models enabled us to achieve better fit and to derive more reasonable confidence intervals of the projected mortality rates. The methods developed in the paper can be used in common mortality modelling situations when there is a major change in mortality development trend or major fluctuations in mortality are observed. In particular, in the Swedish case the pandemic effect was well captured by SSM with switching and both DLM and SSM with switching allowed us to take into account the Lithuanian change in mortality trend. However, SSM with switching model developed in this paper proved to capture more prolonged and less sharp fluctuations in the level of mortality less successfully and further research is needed in this area.
The analysis also showed that in case of mortality projections, if we want to achieve high confidence levels, we need to accept wide confidence intervals. Historically, mortality experience was volatile, and we cannot preclude this volatility in the future. The analysis also showed that in some cases Normal approximation of the distribution of residuals, which have true mixture distribution, can unjustifiably narrow the confidence intervals of the projections, especially for high confidence levels. This could have negative implications for insurers, who use high confidence intervals in their risk and capital models as application of too narrowly distributed projections may lead to understatement of the economic capital.
The study demonstrated the advantages of SSM in the mortality modelling, in particular in ensuring higher flexibility and more coherent estimation of parameters. Overall, SSM may be used to flex other parameters of the Lee-Carter model. For example, it is possible to model the variance of residual of parameter κ t stochastically as in [12], apply regime-switching to parameter β x , or apply state-space Poisson model as in Chapter 9 of [8] or introduce certain volatility of a drift of time varying index. Considering the recent developments in the area of nonlinear SSMs and sequential Monte Carlo methods the possibilities of their application in mortality studies are very high.
In practice, it also possible to use the Bayesian structure of the model. In this research, we used weakly informative priors to "let data to speak for itself". However, in practice more informative priors are often used, to use experience from peers, introduce expected developments stemming from business plans, or simply apply the expert judgement. Thus, with SSM we can introduce not only greater flexibility of the model but also of its parameters.
Author Contributions: R.G. and J.Š. contributed equally to this work, as well as to its preparation. All authors have read and agreed to the published version of the manuscript