Backtesting the Lee-Carter and the Cairns-Blake-Dowd Stochastic Mortality Models on Italian Death Rates

This work proposes a backtesting analysis comparing the Lee-Carter and the Cairns-Blake-Dowd mortality models on Italian data. The mortality data come from the Italian National Statistics Institute (ISTAT) database and span the period 1975-2014, over which we computed back-projections, evaluating the performance of the models in comparison with real data. We propose three different backtest approaches, evaluating the goodness of short-run forecasts versus medium-length ones. We find that neither model was able to capture the shock of mortality improvement observed for the male population over the analysed period. Moreover, the results suggest that CBD forecasts are reliable mainly for ages above 75, and that LC forecasts are generally more accurate for these data.


Introduction
performed a backtesting analysis on seven different stochastic mortality models, with results showing that the models performed adequately on most backtests. The analysis was applied to English and Welsh male mortality data. We decided to perform a backtesting investigation using Italian mortality data. This decision was motivated by the historical mortality trend observed over the past forty years for both the male and female populations: the gap between genders decreased markedly over this horizon, with steep improvements in male mortality. Thus, the first aim of this paper is to scrutinize the forecasts produced by the models for both sexes, which have experienced different mortality evolutions. Moreover, over the last three decades, mortality projections have been widely used by Italian policy-makers in decisions about public pension reforms. The study of mortality risk, intended as the uncertainty in future mortality rates, and of longevity risk, related to the long-term trend in mortality rates (Cairns et al. 2006), has played a central role for both public and private annuity providers. For these reasons, among all the principal stochastic mortality models, we chose to compare the Lee-Carter (LC) and the Cairns-Blake-Dowd (CBD) models. In particular, the Italian National Statistics Institute (ISTAT) adopted the original formulation of the LC model to forecast mortality over the projection horizon 2007-2051 (Istat 2008), since updated to the horizon 2011-2065. The National Association of Insurance Companies (ANIA) uses those projections as the demographic basis for annuity computations (ANIA 2014). Therefore, we chose to compare the original formulation of the LC model with the original CBD model, since they also represent the two most widely used parametric families of mortality models.
On the one hand, the Lee-Carter model sparked a deep methodological revolution in demographic forecasting, particularly of mortality. The mortality model has been used together with a similar fertility model and deterministic migration assumptions to generate stochastic forecasts of the population and its components. These stochastic population forecasts, in turn, have been used as the key component of stochastic projections of the finances of the US Social Security system. The stochastic forecast avoids some of the problems inherent in using the classic scenario method to represent forecast uncertainty (Lee 2000). Alongside its main demographic applications, the LC model opened:
• an important research front on problems related to parameter estimation (Booth et al. 2006), with many applications also in the actuarial and economics literature (Loisel and Serant 2007); and
• the extension of the forecasting analysis to disaggregated projections on demographic subsets that maintain consistency at the aggregate level (Lee and Miller 2001; Li and Lee 2005; Li 2010).
On the other hand, the Cairns-Blake-Dowd model, although more recent than the LC model, has played an important role in forecasting mortality at higher ages (i.e., ages 60 and over). The model has made important contributions for pension funds, life-insurance companies and private annuity providers in general. It is mainly used for pricing longevity bonds, as also suggested by the authors in its first formulation (Cairns et al. 2006).
The second aim of this work is to analyse medium-length forecasts with respect to short-term ones, observing potential differences in the parameter estimates (Mavros et al. 2014) as the starting point of the database changes. Chan et al. (2014) have also studied the new-data-invariant property in relation to the quality of the CBD mortality index. For this purpose, we introduce a new backtesting approach, named the jumping fixed-length horizon, which makes short-run projections of five years, "jumping forward" through the historical database in five-year steps.
The backtesting results do not allow a conclusive evaluation of the models, since we perform the analysis exclusively for ages 57 to 90. This age interval was chosen because, in Italy, the Ragioneria dello Stato computes the so-called transformation coefficients for pension annuities starting from age 57. Moreover, since the CBD model is recommended as a good predictor of mortality at higher ages, this interval allows a more prudent and accurate comparison between the models. Finally, among all of the possible biometric functions, we consider only the death probabilities $q_{x,t}$.
We use the death probabilities $q_{x,t}$ provided by ISTAT for the period 1975-2014. Over this historical horizon, we select the "lookback" and the "lookforward" windows (see note 5) for parameter estimation and forecasting, respectively. The length of the forecast window differs across the three backtesting approaches proposed in this work:
• fixed horizon backtests: lookback and lookforward windows of 20 years;
• jumping fixed-length horizon backtests: lookback window of 20 years and lookforward window of 5 years (short-term projections); and
• rolling fixed-length horizon backtests: lookback window of fixed length (20 years) and a lookforward window contracting from 20 to 2 years of projections.
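As an illustration of how the three designs partition the 1975-2014 data, the following Python sketch enumerates the (lookback, lookforward) year windows. The helper `backtest_windows` and its parameters are hypothetical; the sketch only encodes the window lengths stated above (a 20-year lookback, 5-year jumps, and a lookforward window ending in 2014 that contracts down to 2 years) and does not estimate anything.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # (first year, last year), both inclusive


def backtest_windows(scheme: str, first: int = 1975, last: int = 2014,
                     lookback: int = 20, jump: int = 5,
                     min_forecast: int = 2) -> List[Tuple[Window, Window]]:
    """Hypothetical helper: list (lookback, lookforward) windows for one scheme.

    scheme is "fixed", "jumping" or "rolling".
    """
    windows = []
    if scheme == "fixed":
        # a single 20-year estimation window followed by the remaining years
        end = first + lookback - 1
        windows.append(((first, end), (end + 1, last)))
    elif scheme == "jumping":
        # 20-year estimation window, 5-year forecast, then jump 5 years ahead
        start = first
        while start + lookback - 1 + jump <= last:
            end = start + lookback - 1
            windows.append(((start, end), (end + 1, end + jump)))
            start += jump
    elif scheme == "rolling":
        # 20-year window rolled one year at a time; every forecast ends in `last`
        start = first
        while start + lookback - 1 + min_forecast <= last:
            end = start + lookback - 1
            windows.append(((start, end), (end + 1, last)))
            start += 1
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return windows


for lb, lf in backtest_windows("jumping"):
    print(lb, lf)
```

With the defaults, the jumping scheme yields four groups and the rolling scheme nineteen, ending with the 1993-2012 base projected over 2013-2014.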
The paper is organized as follows. Section 2 briefly presents the models and the adopted terminology, Section 3 describes the historical Italian mortality data, and Section 4 and its subsections explain the methodology and present the backtesting results obtained with the different approaches. Section 5 provides conclusions.

Note 5: For the sake of simplicity, we adopt the same terminology as Dowd et al. (2010a).

The Lee-Carter Model
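We consider the original formulation of Lee and Carter (1992), represented by the following model equation:

$$m_{x,t} = \exp\left(\alpha_x + \beta_x k_t + \varepsilon_{x,t}\right) \qquad (1)$$

where $m_{x,t}$ is the central rate of mortality at age $x$ and time $t$, given by

$$m_{x,t} = \frac{d_{x,t}}{L_{x,t}}$$

with $d_{x,t}$ (see note 6) the number of deaths that occurred between ages $x$ and $x+1$, and $L_{x,t}$ the person-years lived between $x$ and $x+1$, which is simply the average number of individuals alive between $x$ and $x+1$.

For simplicity, the model was implemented through its logarithmic transformation:

$$\ln m_{x,t} = \alpha_x + \beta_x k_t + \varepsilon_{x,t}$$

with the following parameter interpretations:
• $\alpha_x$: the age-specific average level of $\ln m_{x,t}$ over the fitting period;
• $\beta_x$: the sensitivity of $\ln m_{x,t}$ at age $x$ to changes in $k_t$;
• $k_t$: the time index of the general level of mortality; and
• $\varepsilon_{x,t}$: the error term.
Appendix A illustrates the method adopted for the estimation and projection of the parameters.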
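Appendix A is not reproduced here. As a minimal sketch, assuming the classic Lee and Carter (1992) procedure ($\alpha_x$ as the time average of $\ln m_{x,t}$, $\beta_x$ and $k_t$ from the first singular vectors, normalised so that $\sum_x \beta_x = 1$ and $\sum_t k_t = 0$) and a random walk with drift for $k_t$, a central projection of $\ln m_{x,t}$ could be written as follows. The function names are hypothetical, and the paper's own procedure may differ (for example, in any re-estimation of $k_t$).

```python
import numpy as np


def fit_lee_carter(log_m: np.ndarray):
    """Fit ln m[x, t] = alpha_x + beta_x * k_t via the SVD of centred log rates.

    log_m has shape (n_ages, n_years). Returns (alpha, beta, k), normalised so
    that beta sums to 1 and k sums to 0 (the usual LC identification constraints).
    """
    alpha = log_m.mean(axis=1)                  # average log-mortality profile by age
    centred = log_m - alpha[:, None]            # rows now have zero mean over time
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    beta, k = u[:, 0], s[0] * vt[0]             # best rank-1 approximation beta * k
    scale = beta.sum()
    return alpha, beta / scale, k * scale       # impose sum(beta) = 1


def project_k_rwd(k: np.ndarray, horizon: int) -> np.ndarray:
    """Central path of a random walk with drift fitted to k_t."""
    drift = (k[-1] - k[0]) / (len(k) - 1)       # mean of the first differences
    return k[-1] + drift * np.arange(1, horizon + 1)


def forecast_log_m(alpha, beta, k, horizon):
    """Central forecast of ln m[x, t] for the next `horizon` years."""
    k_future = project_k_rwd(k, horizon)
    return alpha[:, None] + beta[:, None] * k_future[None, :]
```

Only the central value is produced; confidence intervals would additionally require simulating (or analytically propagating) the innovations of the random walk.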

The Cairns-Blake-Dowd Model
We consider the original formulation of the model provided by Cairns et al. (2006), with the following model equation:

$$\operatorname{logit} q_{x,t} = \ln\frac{q_{x,t}}{p_{x,t}} = k^{(1)}_t + k^{(2)}_t\,(x - \bar{x}) + \varepsilon_{x,t} \qquad (2)$$

where:
• $k^{(1)}_t$ and $k^{(2)}_t$ are two stochastic processes and represent the two time indexes of the model;
• $q_{x,t}$ and $p_{x,t}$ represent, respectively, the death and the survival probability at time $t$ for an individual aged $x$;
• $\ln(q_{x,t}/p_{x,t}) = \ln \phi_{x,t} = \operatorname{logit} q_{x,t}$ is the logit transformation of $q_{x,t}$, with $\phi_{x,t}$ representing the mortality odds;
• $\bar{x}$ is the mean age of the considered interval of ages; and
• $\varepsilon_{x,t}$ is the error term, capturing the part of the historical evolution that the model does not express. The error terms are i.i.d. and follow the Normal distribution with mean 0 and variance $\sigma^2_\varepsilon$.
The model is fully identified, so it does not require additional constraints.
Moreover, the time index $k^{(1)}_t$ is the intercept of the model. It affects every age in the same way and represents the level of mortality at time $t$. More precisely, if it declines over time, the mortality rate has been decreasing over time at all ages. The time index $k^{(2)}_t$ represents the slope of the model: every age is affected differently by this parameter. For instance, if during the fitting period the mortality improvements have been greater at lower ages than at higher ages, the slope term $k^{(2)}_t$ will be increasing over time. In such a case, the plot of the logit of death probabilities against age becomes steeper as it shifts downwards over time (Pitacco et al. 2009).

Note 6: The variables $d_{x,t}$ and $L_{x,t}$ are the common biometric functions as described in the life tables.
Appendix B illustrates the estimation and projection methods involved.
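Appendix B is not reproduced here. As a minimal sketch, assuming the common approach of fitting Equation (2) by ordinary least squares separately for each calendar year and projecting $(k^{(1)}_t, k^{(2)}_t)$ as a random walk with drift (central path only), the CBD fit could be written as follows. The function names are hypothetical, and the paper's own estimation and projection may differ in detail.

```python
import numpy as np


def fit_cbd(q: np.ndarray, ages: np.ndarray):
    """Year-by-year least-squares fit of logit q[x, t] = k1_t + k2_t * (x - x_bar).

    q has shape (n_ages, n_years); ages has shape (n_ages,).
    """
    x_c = ages - ages.mean()                              # ages centred on x_bar
    design = np.column_stack([np.ones_like(x_c), x_c])
    logit_q = np.log(q / (1.0 - q))                       # ln(q / p) with p = 1 - q
    coef, *_ = np.linalg.lstsq(design, logit_q, rcond=None)  # one regression per year (column)
    return coef[0], coef[1]                               # k1_t, k2_t


def project_cbd(k1, k2, ages, horizon):
    """Central projection of q[x, t], each index following a random walk with drift."""
    h = np.arange(1, horizon + 1)
    k1_f = k1[-1] + np.diff(k1).mean() * h
    k2_f = k2[-1] + np.diff(k2).mean() * h
    x_c = ages - ages.mean()
    logit_q = k1_f[None, :] + k2_f[None, :] * x_c[:, None]
    return 1.0 / (1.0 + np.exp(-logit_q))                 # inverse logit back to q
```

For ages 57 to 90, `ages.mean()` equals 73.5, the value of $\bar{x}$ used later in the paper. Because the model is fully identified, no normalisation step is needed, unlike in the LC fit.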

Case Study: Italian Mortality Data from 1975 to 2014
The application of the presented models requires time series of death probabilities from which to extrapolate mortality forecasts. As already mentioned, we use data provided by ISTAT because these data are commonly used by private insurance companies and public pension providers. The range of ages is $57 \le x \le 90$. We chose the upper limit in view of the graduation method ISTAT uses to close the life table (Istat 2001): the probabilities of dying for ages 95 and over are obtained by extrapolating the graduated $q_{x,t}$ values according to the Thatcher et al. (1998) model (Equation (3), applied for $x \ge 95$). This kind of graduation could affect the backtesting results when comparing realized data with the forecasts obtained by applying the LC (1) and the CBD (2) models, since the two models produce different mortality patterns at old ages. For ages 5 to 94, ISTAT uses a seven-term moving average of crude rates. Moreover, we selected the period from 1975 to 2014 because the successful fight against cardiovascular diseases began in Italy in the mid-seventies. More recently, efforts against tumors, which are still the main cause of death, have been launched. These successes have contributed to an extraordinary acceleration in the growth of life expectancy, especially at higher ages: e.g., from 1975 to 2014, life expectancy at age 60 increased on average by about four hours each day, for both men and women. The phenomenon was particularly marked for men. Previously, life expectancy at birth had registered its first significant increase thanks to the control of infant and child mortality, while during the years under review it has also benefited from the control of adult mortality.
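The seven-term moving average mentioned above can be illustrated with the following Python sketch. It assumes equal weights and a centred window, and simply drops the ages for which a full window is not available; the actual ISTAT weights and the treatment of boundary ages (Istat 2001) are not described here and may differ. The function name is hypothetical.

```python
import numpy as np


def moving_average_7(crude_rates: np.ndarray) -> np.ndarray:
    """Centred, equally weighted seven-term moving average over consecutive ages.

    Assumption: equal weights. The result is 6 elements shorter than the input,
    since each output value is the mean of the crude rates at ages x-3, ..., x+3.
    """
    kernel = np.full(7, 1.0 / 7.0)
    return np.convolve(crude_rates, kernel, mode="valid")


print(moving_average_7(np.arange(10.0)))
```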
Currently, the probability that a young adult reaches old age is very high: for a 30-year-old, the probability of reaching age 60 is almost 94% for males and 96.4% for females. However, reaching the threshold of 90 years remains difficult, especially for men. Table 1 shows how this probability changed starting from age 50. Moreover, it shows that the difference in this probability between genders widens as age increases.
This process is known as the rectangularization and forward shift of the survival curve. It can be measured through the entropy of a life table (Equation (4)), presented by Keyfitz and Caswell (2005) and denoted in this paper by ${}_tH_{K,\xi}$, where $\xi$ is the age from which the survival curve is built and $t$ is the year of the period life table for which the entropy is computed (in our case, $t = 1975, 1976, \ldots, 2014$). Then,

$${}_tH_{K,\xi} = -\frac{\sum_{j=\xi+1}^{w} l_j \ln l_j}{\sum_{j=\xi+1}^{w} l_j} \qquad (4)$$

where $l_j$ is the probability of surviving from age $\xi$ ($\xi = 0, 1, \ldots, w$; $l_\xi = 1$ for all $\xi$) to age $j$ ($j = \xi+1, \xi+2, \ldots, w$), and $w$ denotes the starting point of the final age interval. The entropy index decreases as the survivorship curve $l_j$ moves towards a rectangular form; in this limit case, ${}_tH_{K,\xi} = 0$. Figure 1 shows how the trend of the rectangularization process has changed across ages (i.e., from $\xi = 50$ to $\xi = 65, 75$). For women, this process was already under way before 1975; in particular, from ages 50 and 65, it continued at an essentially linear pace. For men, the rectangularization process began to accelerate smoothly after 1984. The subsequent trend shows a marked reduction in mortality, which narrowed the inequality between the sexes, although it has not disappeared. In Figure 1, ${}_tH_{K,\xi}$ shows that the mortality improvement in the elderly population has proceeded at different rates over time, with a particularly steep decline for both sexes after 1993. The different pace of mortality reduction between the sexes, from adult to old ages, is confirmed by the Kullback and Leibler (1951) divergence:

$$D_{KL}(h \,\|\, g) = \sum_{z} h_z \ln\frac{h_z}{g_z} \qquad (5)$$
where $h_z$ and $g_z$ are the probability distributions of the "time until death" random variable $Z_\xi$ for a person aged $\xi$, for males and females respectively. Equation (5) measures the "difference" between these two probability distributions, with the female distribution $g_z$ taken as the reference model. This choice is motivated not only by the fact that mortality is significantly lower for women than for men, but also by the fact that the continuous decline of female mortality over the reporting period occurred much more regularly (Maccheroni 2014).
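Equations (4) and (5) can be illustrated with the following Python sketch. The function names are hypothetical. For the entropy we take $0 \ln 0 = 0$; for the divergence we assume, for illustration, that $h_z$ and $g_z$ are the curtate distributions of the time until death obtained from the male and female survival curves as $l_z - l_{z+1}$, closing the table with zero survivors beyond the last age.

```python
import numpy as np


def keyfitz_entropy(l) -> float:
    """Discrete life-table entropy H = -sum(l_j ln l_j) / sum(l_j).

    l holds survival probabilities from age xi to ages xi+1, ..., w.
    Terms with l_j = 0 contribute 0 (0 * ln 0 is taken as 0).
    """
    l = np.asarray(l, dtype=float)
    pos = l[l > 0]
    return float(-(pos * np.log(pos)).sum() / l.sum())


def death_distribution(l) -> np.ndarray:
    """Curtate distribution of the time until death from l_z, with l_0 = 1."""
    l = np.asarray(l, dtype=float)
    return -np.diff(np.append(l, 0.0))      # l_z - l_{z+1}, closing the table at zero


def kl_divergence(h, g) -> float:
    """D(h || g) = sum_z h_z ln(h_z / g_z), with g (females) as the reference."""
    h = np.asarray(h, dtype=float)
    g = np.asarray(g, dtype=float)
    mask = h > 0                             # 0 * ln(0 / g) is taken as 0
    if np.any(g[mask] <= 0):
        raise ValueError("g must be positive wherever h is positive")
    return float(np.sum(h[mask] * np.log(h[mask] / g[mask])))
```

A perfectly rectangular curve (all $l_j = 1$) gives an entropy of zero, and identical male and female distributions give a divergence of zero; both indexes grow as the curves move away from these limits.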
Figure 2 shows that the divergence in mortality between sexes has different characteristics depending on the observed age. In particular, until 1981 the divergence gradually increased over the full range of ages. Subsequently, mortality differentials between sexes decreased for $x$ lower than 60, while they progressively increased at higher ages. These diverging trends make the application of the models interesting, especially for the comparison of results. We therefore expect the mortality forecasts to be more accurate for women than for men, because women experienced a reduction in the risk of death with greater regularity than men.

Backtesting Analysis
In this section, we introduce the three different backtesting frameworks, and we present the related forecast results.

• The jumping fixed-length horizon backtests make short-run projections of five years and keep the length of the "lookback" horizon fixed (20 years), but jump five years ahead at a time to cover the "lookforward" interval $1995 \le t \le 2014$. This analysis is divided into four groups of estimations and forecasts, described in Table 2.

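• Finally, the rolling fixed-length horizon backtests keep the length of the "lookback" horizon fixed (i.e., 20 years) and roll it ahead year by year. The projections are made over the remaining horizon, keeping the last year of the projection fixed at $t = 2014$. This analysis is divided into nineteen groups of estimations and forecasts, described in Table 3.

Table 2. Jumping fixed-length horizon backtests data horizon.
Lookback     Lookforward
1975-1994    1995-1999
1980-1999    2000-2004
1985-2004    2005-2009
1990-2009    2010-2014

Table 3. Rolling fixed-length horizon backtests data horizon.
Lookback     Lookforward
1975-1994    1995-2014 (20)
1976-1995    1996-2014 (19)
1977-1996    1997-2014 (18)
1978-1997    1998-2014 (17)
1979-1998    1999-2014 (16)
1980-1999    2000-2014 (15)
1981-2000    2001-2014 (14)
1982-2001    2002-2014 (13)
1983-2002    2003-2014 (12)
1984-2003    2004-2014 (11)
1985-2004    2005-2014 (10)
1986-2005    2006-2014 (9)
1987-2006    2007-2014 (8)
1988-2007    2008-2014 (7)
1989-2008    2009-2014 (6)
1990-2009    2010-2014 (5)
1991-2010    2011-2014 (4)
1992-2011    2012-2014 (3)
1993-2012    2013-2014 (2)
The numbers in parentheses show the length of the "lookforward" horizon. Moreover, they indicate the position of the year 2014 within the related projection interval. This will be particularly useful for the analysis of results presented in Section 4.3.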
Before going in depth into the backtesting analysis, we check the estimation quality of the models over the historical "lookback" interval $1975 \le t \le 1994$. For this purpose, we use the index $\Lambda^2_x$, a form of $R^2$ that particularly suits our case (Draper and Smith 2014), defined as follows:

$$\Lambda^2_x = 1 - \frac{\sum_{t} \left(q_{x,t} - q^{ft}_{x,t}\right)^2}{\sum_{t} \left(q_{x,t} - \bar{q}_x\right)^2}, \qquad \bar{q}_x = \frac{1}{n}\sum_{t} q_{x,t}$$

where $q^{ft}_{x,t}$ is the fitted value of $q_{x,t}$, the sums run over $t = 1975, \ldots, 1994$, and $n$ is the total number of years considered (i.e., $n = 20$). The index gives the proportion of the temporal variance explained by the model for each age $57 \le x \le 90$. Figure 3 shows that both models generally fit the observed data well. In the case of males, the share of "explained variance" is greater than 88% at every age, while for females, in the case of LC, it falls to 85% at $x = 63$. However, this decrease occurs only within a very limited age range, between 61 and 65 years. More specifically, the analysis of the "explained variance" for both models suggests that the irregular path of the curves may be influenced by a cohort effect before age $x = 80$. This effect is observable along the diagonals of the graphs in Figure 4 for individuals aged 57-59 in 1975 and 76-78 in 1994. These are the generations born during the First World War (1915-1918), who, over the course of their lives, have experienced higher mortality at the same ages than the previous and following cohorts (Maccheroni 2016). For ages above $x = 75$, the differences between the models are sharply evident: LC overestimates $q^O_{x,t}$ and CBD underestimates it (Figure 4). The analysis of the projection results presented in the next section shows that this cohort effect affects the forecast quality of the models in two ways.

• Over the projection horizon (1995-2014), both models are slightly affected, for both populations, by the cohort effect of the same cohort, aged 77-79 in 1995, which is no longer observed from 2006. In particular, both models underestimate mortality for these birth cohorts in both sexes, with observed values above the upper limit of the confidence interval at some ages of the cohort. This occurred particularly for males.

• The observed male $q_{x,t}$ for individuals aged 57-59 in 1995 and 76-78 in 2014 often fall below the lower bound of the forecast confidence interval. The models seem to have replicated the cohort effect on a homologous cohort in 1995, but since the evolution of male mortality changed considerably from 1975-1994 to 1995-2014, the two homologous cohorts (i.e., aged 57-59 in 1975 and aged 57-59 in 1995) showed different trends, which led to forecast errors. This does not occur for females, since women experienced a more regular mortality evolution; the homologous cohorts are therefore similar, and the bias is not observable.
For these reasons, the results obtained with the three backtesting approaches need to be evaluated in light of this cohort effect and its impact on the forecasts. In particular, forecasts seem to be affected by the cohort effect as long as the data used to estimate the parameters include the years from 1975 to 1985. After 1985, the cohort effect is small relative to the overall sample, so projections are not greatly affected by it.
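For concreteness, the index $\Lambda^2_x$ used above to assess the fit over 1975-1994 can be computed age by age as in the following Python sketch, assuming the $R^2$-type form given with its definition. The function name is hypothetical, and the inputs are assumed to be arrays of observed and fitted $q_{x,t}$ with ages in rows and years in columns.

```python
import numpy as np


def lambda_squared(observed: np.ndarray, fitted: np.ndarray) -> np.ndarray:
    """Share of the temporal variance of q[x, .] explained by the model, for each age.

    Both arrays have shape (n_ages, n_years); the result has shape (n_ages,).
    """
    residual = ((observed - fitted) ** 2).sum(axis=1)
    total = ((observed - observed.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    return 1.0 - residual / total
```

A perfect fit gives 1 at every age, while a fit that only reproduces the time average of each age gives 0.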

Fixed Horizon Backtest (1995-2014)
The first backtesting analysis considers a forecast horizon that is demographically regarded as medium-term. We compare the most likely projected values $q^P_{x,t}$, i.e., the central values produced by the model, around which we construct the 95% confidence interval, with the observed values $q^O_{x,t}$; comparisons between the central value and the bounds of the confidence interval are shown only for ages 65 and 85. These are the ages that, in the demographic literature, mark entry into the so-called "young-old" and "oldest-old" groups. Due to space limitations, it was not possible to present the comparison at age 75, which separates the "young-old" from the old (Vaupel 2010).
The observed $q^O_{x,t}$ can show strong temporal variability due to the cohort effect and to the so-called "period effect", i.e., the conditions of a given period that affect mortality through a variety of factors. Among these, the best known is the climatic effect, which can, for instance, raise mortality at old ages during a very hot summer (as occurred in Italy in 2003), together with epidemiological effects such as winter influenza in low-mortality countries. The impact of these factors is stronger on the most vulnerable people. For this reason, a rise in mortality due to these factors is generally followed by a decrease, since the survivors have a lower frailty level. Such mortality shocks affect short-term forecasts more than medium-term ones, since the latter are usually better able to capture changes in environmental and socio-economic conditions and in people's lifestyles. From an applied point of view, focused on the insurance and social security sectors, we are interested in the performance of the models in assessing the risk of death at various ages, and we develop our analysis from this perspective. For this purpose, we briefly assess forecast errors using the Root Mean Squared Error (RMSE), defined as follows:

$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \qquad \mathrm{MSE} = \frac{\mathrm{SSE}}{\upsilon} = \frac{1}{\upsilon}\sum_{x}\sum_{t}\left(q^O_{x,t} - q^P_{x,t}\right)^2$$

where the mean squared error (MSE) is equal to the sum of squared errors (SSE) adjusted for the residual degrees of freedom $\upsilon$, and $q^O_{x,t}$ and $q^P_{x,t}$ are, respectively, the observed and the forecast (projected) death probabilities. We use the root of the adjusted SSE to account for the difference in the number of free parameters between the models. Table 4 shows the RMSE for the first backtesting approach and for the second one, presented in the following section. It considers exclusively the central value of the confidence interval, as this is the most relevant for pension policy-makers and annuity providers (Whitehouse 2007).
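The RMSE above can be sketched in Python as follows. We assume, for illustration, that the residual degrees of freedom are $\upsilon = N - p$, with $N$ the number of (age, year) cells compared and $p$ the number of free parameters of the model; how $\upsilon$ is counted is not spelled out here, so `n_params` is a hypothetical input.

```python
import numpy as np


def adjusted_rmse(observed: np.ndarray, projected: np.ndarray, n_params: int) -> float:
    """Root of the sum of squared errors divided by the residual degrees of freedom."""
    errors = np.asarray(observed, dtype=float) - np.asarray(projected, dtype=float)
    dof = errors.size - n_params              # assumed: cells minus free parameters
    if dof <= 0:
        raise ValueError("not enough observations for the given number of parameters")
    return float(np.sqrt((errors ** 2).sum() / dof))
```

For equal errors, a larger number of free parameters yields a larger RMSE, which is the penalty for model complexity that the adjustment is meant to introduce.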
Table 4 shows that the LC model produces a more accurate forecast than the CBD model for females over 1995-2014; for males, it is more difficult to judge the models' performances given the small difference between the RMSE results. These predictions are produced from the extrapolated time indexes $k_t$ (Appendix Equations (A7) and (A9)), but the result is made more flexible by the stochastic component of the models, which allows the forecast confidence interval to be built. One source of error is that the central value of the projection may be shifted with respect to the observed data, even when its trend does not differ from the one recorded over the projection horizon. Figure 5 illustrates this phenomenon. In particular, at age 65, the male forecasts for 1995-2014 lie above the mortality trend observed over the same period, with divergent paths for the LC model. For females aged 65, only the CBD model shows divergences. However, these deviations may instead be very small, as for the LC model for females aged 65, or for both models for both sexes at age 85 (Figure 5). Moreover, the bias due to the continuing fluctuations of the risk of death over time has to be taken into consideration.
The confidence intervals provided by the two models take this stochastic component of mortality into account (Figures 7 and 9), although with different levels of precision (Figure 10).
Figures 6 and 8 show the overall error dynamics through the ratio between the projected $q^P_{x,t}$ and the observed $q^O_{x,t}$. For men, the LC model initially overestimates $q^O_{x,t}$ from ages 57 to about 80, persistently across years. In particular, the overestimation errors become sharply evident as the projection extends towards the last year of the forecast horizon. Figure 6 shows the ratio between projected and observed death probabilities across both ages and years. The LC performance described above is also reported in Figure 7, which compares projections at ages 65 and 85 with the observed data. The overestimation decreases from age 80, where the divergence between $q^O_{x,t}$ and $q^P_{x,t}$ is very close to zero. However, at the highest ages of the interval, LC forecasts systematically underestimate $q^O_{x,t}$.
For women, the divergence between $q^O_{x,t}$ and $q^P_{x,t}$ is much smaller than for men. This is particularly evident in Figure 6, which shows that the forecast initially underestimates the observed data, converges at age 65, and then overestimates over a wide span of ages. The last part of the age range is again characterized by underestimation. However, the overestimation at higher ages is smaller than in the male case. The CBD forecast greatly overestimates the historical evolution of male mortality, particularly for the central and last years of projection. The error is evident over the full range of ages, although it becomes smaller at age 80, after which the forecasts underestimate $q^O_{x,t}$ with increasing magnitude up to the last age and the last projection year (i.e., $x = 90$ and $t = 2014$) (Figure 8).
In the female case, the accuracy of the CBD forecast is worse. Here, we notice a wide and systematic underestimation over approximately the whole first half of the age range, for almost the entire forecast horizon. The forecast error shrinks around age 68; the forecast then overestimates until $x = 85$, after which it underestimates again. However, at $x = 85$ the forecast is relatively accurate, with all values of $q^O_{x,t}$ inside the confidence interval (Figure 9).
In conclusion, both models make similar forecast errors. On the one hand, for males, the error consists of an initial overestimation that smoothly converges to the observed data and then turns into underestimation, although the divergences of the CBD model show smaller variability than those of the LC model. On the other hand, the female case shows an initial underestimation converging to the observed data, followed by overestimation and a final underestimation. In general, the LC model provides a better fit over a wide range of ages, with lower variability in both over- and underestimation. In any case, the choice between the models becomes difficult at particular ages. Figure 10 shows the upper and lower bounds of the confidence intervals for both models. Although the LC bounds are nested within the CBD bounds, with greater differences in the male case, both models' confidence intervals include the observed data, which supports the robustness of the projections.
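A simple way to check the statement that the confidence intervals include the observed data is to compute the share of (age, year) cells in which $q^O_{x,t}$ falls between the lower and upper bounds. The following Python sketch does this; the function name is hypothetical, and the bounds are assumed to be precomputed arrays with the same shape as the observations.

```python
import numpy as np


def interval_coverage(observed, lower, upper) -> float:
    """Fraction of cells where lower <= observed <= upper."""
    observed, lower, upper = (np.asarray(a, dtype=float) for a in (observed, lower, upper))
    inside = (observed >= lower) & (observed <= upper)
    return float(inside.mean())
```

For a well-calibrated 95% interval, the empirical coverage over many cells would be expected to lie near 0.95; values well below that would point to intervals that are too narrow.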

Jumping Fixed-Length Horizon Backtests
The results in Table 4 show that both models capture the trend of female mortality better than that of male mortality. More specifically, the accuracy of the five-year-ahead predictions based on the periods 1975-1994 and 1990-2009 is far higher than for the other two sub-groups of forecasts. On the contrary, neither model shows the pattern of underestimation and overestimation of $q^O_{x,t}$ at various ages that characterized the results of the previous backtest. Only the CBD results show a similar pattern, although in this case the overestimates and underestimates are staggered by age differently from period to period. This point is discussed in detail hereafter.
The analysed models are meant to be assessed on long-term predictions, but in this case it is particularly noticeable how a change in the starting point of the time series leads the models to incorporate differently the changes in mortality that occurred over the past 20 years. This generally happens through the parameter estimates, which are then carried through the extrapolation process associated with each model. However, an estimation procedure cannot guarantee a priori a constant forecast performance, partly because the dynamics of mortality vary with a multiplicity of social factors that affect each person's life, and mathematical models are not always able to capture such factors. As Avraam et al. (2013, p. 1) observe: "We conclude that the deviations from exponential law at young ages can be explained by heterogeneity, namely by the presence of a subpopulation with a high initial mortality rate presumably due to congenital defects, while those for old ages can be viewed as fluctuations and explained by stochastic effects". We now analyse the immediate effects of these estimates, starting with the LC model (1). The parameters $\alpha_x$ and $\beta_x$ are time-independent, age-specific constants, so their estimates depend on the historical period used as the database and do not need to be projected. The index $k_t$ captures the common time-series risk factor in that same period, showing the main mortality trend for all ages at time $t$. Forecasts are produced by extrapolating the time index $k_t$, and the mortality projections at all ages are linked together through the product $\beta_x k_t$ in Equation (1).
In this backtesting framework, shifting the database forward yields estimates of the parameters $\alpha_x$ and $k_t$ that show a continuous decline in mortality. Moreover, for males, the estimates of $\beta_x$ are greater at the beginning of the age range ($57 \le x \le 90$) than at the end. This describes a greater decrease in mortality at those ages than at the others, where $\beta_x$ takes smaller values (Figure 11, male). This pattern is consistent, ex ante, with historical experience. However, the forecast for the period 2000-2004 shows a systematic overestimation of $q^O_{x,t}$ for both men and women up to age $x = 80$ (Figure 12, LC model). In the female case, the estimates of $\beta_x$ are more sensitive to changes in the starting point of the time series, as Figure 11 shows. The female $\beta_x$ pattern improved the accuracy of the forecasts for the periods 1995-1999 and 2010-2014 (Figure 12, LC model). In the case of the CBD model (2), the presence of two time-varying parameters, $k^{(1)}_t$ and $k^{(2)}_t$, should, at least a priori, improve the forecasting performance relative to the LC model. This result is evident for the male forecast in the short run (Table 4). As mentioned, in the CBD model, the k…

[Figure: $q^P_{x,t}/q^O_{x,t}$ ratio, comparison between models. Note: the curves represent the average of the $q^P_{x,t}/q^O_{x,t}$ ratio over the five-year forecast horizon.]
In this case, the five-year forward shifts of the database do not appear to affect the $k^{(1)}_t$ trend, as is also evidenced by the substantial continuity of the overall reduction in mortality. This is not the case for the $k^{(2)}_t$ mortality index, whose path describes the slope of the logit-transformed mortality curve. An increase in $k^{(2)}_t$ entails an increase in the steepness of the logit-transformed mortality curve, which means that mortality at younger ages, i.e., those below the mean age $\bar{x}$ (here $\bar{x} = 73.5$), improves more rapidly than at older ages. This is clear on the right-hand side of Figure 13. For males, we find that $k^{(2)}_t$ increases faster for the periods 1985-2004 and 1990-2009 than for the other two. For this reason, the projected $q^P_{x,t}$ shows stronger improvements in mortality for the periods 2005-2009 and 2010-2014 than for the others, particularly at ages below $x = 69$. More specifically, the results show an underestimation of $q^O_{x,t}$ at ages below $x = 69$ and a mild overestimation at higher ages. Although the growth of $k^{(2)}_t$ between 1980 and 1999 is higher than that of 1975-1994 and the reduction of $k^{(1)}_t$ is greater, we find that $q^P_{x,t}$ sharply overestimates $q^O_{x,t}$ in the period 1995-1999, and particularly in 2000-2004, over the full range of ages. For women, $k^{(2)}_t$ follows a pattern similar to that of men, whereas for the period 1990-2009 the growth rate of $k^{(1)}_t$ is slightly attenuated. In contrast with the male case, $q^P_{x,t}$ systematically and significantly underestimates $q^O_{x,t}$ from age $x = 57$, converging gradually to the observed data as $x$ moves towards $\bar{x}$. Moreover, Figure 14 shows that the underestimations are larger for the projection periods 2005-2009 and 2010-2014. This error pattern does not carry over to ages above $\bar{x}$, where forecasts generally overestimate $q^O_{x,t}$. In particular, for ages above $\bar{x}$, the forecasts for the periods 1995-1999 and 2010-2014 perform better than those for the other two projection windows (Figure 12, CBD model).
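The level/slope reading of the two CBD indices can be checked numerically with the model structure $\operatorname{logit} q_{x,t} = k^{(1)}_t + (x - \bar{x})\,k^{(2)}_t$. The index values below are hypothetical, chosen only to show that a fall in $k^{(1)}_t$ combined with a rise in $k^{(2)}_t$ lowers mortality more at ages below $\bar{x} = 73.5$ than above it; the function names are illustrative, not taken from the paper.

```python
import numpy as np


def cbd_logit_q(ages, x_bar, k1, k2):
    """CBD structure: logit q[x, t] = k1[t] + (x - x_bar) * k2[t]."""
    return k1 + (ages - x_bar) * k2


def inv_logit(z):
    return 1.0 / (1.0 + np.exp(-z))


ages = np.arange(57, 91)          # age range 57 <= x <= 90
x_bar = ages.mean()               # 73.5

# Hypothetical index values for two periods (illustrative only)
k1_a, k2_a = -3.0, 0.100
k1_b, k2_b = -3.1, 0.102          # k1 falls (level shift), k2 rises (steeper slope)

logit_a = cbd_logit_q(ages, x_bar, k1_a, k2_a)
logit_b = cbd_logit_q(ages, x_bar, k1_b, k2_b)
change = logit_b - logit_a        # = (k1_b - k1_a) + (x - x_bar) * (k2_b - k2_a)
relative_q_change = inv_logit(logit_b) / inv_logit(logit_a) - 1
print(relative_q_change[[0, -1]]) # ages 57 and 90
```

The change in the logit is $-0.1 + 0.002\,(x - \bar{x})$: larger in magnitude at age 57 than at age 90, so younger ages improve faster even though both indices move only slightly.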
Hence, we conclude that a good result for the RMSE performance index (Table 4) can hide compensating forecast errors across ages and over time. Figure 14 shows this scenario graphically.

Figure 14. LC and CBD $q^P_{x,t}/q^O_{x,t}$ ratio: comparison between models on the same gender. Note: the curves represent the average of the $q^P_{x,t}/q^O_{x,t}$ ratio over the five-year forecast horizon.
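One mechanism through which a small RMSE can hide age-specific bias is that absolute errors in $q_{x,t}$ are dominated by old ages, where death probabilities are largest. The sketch below assumes that RMSE is computed on absolute differences pooled over ages (the exact definition behind Table 4 is not restated in this section) and uses hypothetical death probabilities. A 20% underestimation at every age below 70 yields a smaller RMSE than a 5% overestimation confined to ages above 85, while the $q^P_{x,t}/q^O_{x,t}$ ratio exposes the first bias immediately.

```python
import numpy as np


def rmse(q_pred, q_obs):
    """Root mean squared error on absolute death probabilities, pooled over ages."""
    return float(np.sqrt(np.mean((q_pred - q_obs) ** 2)))


ages = np.arange(57, 91)
q_obs = 0.005 * np.exp(0.1 * (ages - 57))          # hypothetical observed q_x

# Forecast A: 20% underestimation at every age below 70, exact elsewhere
q_a = np.where(ages < 70, 0.8 * q_obs, q_obs)
# Forecast B: 5% overestimation above age 85 only, exact elsewhere
q_b = np.where(ages > 85, 1.05 * q_obs, q_obs)

rmse_a, rmse_b = rmse(q_a, q_obs), rmse(q_b, q_obs)
ratio_a, ratio_b = q_a / q_obs, q_b / q_obs        # the ratio used in the figures
print(f"RMSE A = {rmse_a:.5f}, RMSE B = {rmse_b:.5f}")
print(f"worst ratio A = {ratio_a.min():.2f}, worst ratio B = {ratio_b.max():.2f}")
```

This is why the ratio plots complement the RMSE summary: the two diagnostics can rank the same forecasts differently.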

Rolling Fixed-Length Horizon Backtests
Finally, the analysis concludes with the study of the convergence of the forecast to the observed $q^O_{x,t}$ in the year 2014$^{13}$. To this end, we build a framework of 19 groups of estimations and projections, rolling a database of fixed length (20 years) forward one year at a time, with starting years from 1975 to 1993$^{14}$. We then compare the 2014 forecast obtained in each group with the realized mortality for that year. The comparison highlights the same critical issues analysed in the previous sections, with particular emphasis on two main aspects.
Firstly, rolling the database forward year by year gives rise to strong fluctuations in forecast performance, as measured by the ratio of $q^P_{x,t}$ to $q^O_{x,t}$. These oscillations (Figures 15 and 16) are evident for both sexes in the results of both the LC and the CBD models. Moreover, the trend is interrupted by a deep break at the 1985-2004 database. In particular, the previous base (1984-2003) produced a strong overestimation of $q^O_{x,t}$, especially at old ages. The 1985-2004 base then reduced the size of this overestimation, while the next one (1986-2005) moved closer to $q^O_{x,t}$. On the one hand, this result may be related to the cohort effect described at the beginning of Section 4, since the cohort effect weighs proportionally more in databases that include years before 1985. On the other hand, it can also be partly explained by the sharp rise in mortality in 2003, especially at old ages, which may have affected the estimated parameters. However, in the male case, both models systematically underestimate $q^O_{x,t}$ below age $x = 73$ and overestimate it above. This result is particularly evident for the 1985-2004 "lookback" horizon and for the following ones. In particular, the CBD model already underestimates when the database refers to the period 1981-2000, and for the period 1985-2004 its divergence becomes greater than that of the LC model (Figures 15 and 16). As shown, the choice of the database plays a crucial role in forecasting mortality.
Figure 15 shows the ratio between the projected and observed death probabilities for the year 2014. Table 3 shows the projections obtained for that year on each pair of "Lookback" and "Lookforward" horizons. In particular, the sub-case index of the graph shows the position of 2014 on the projection horizon (e.g., 20 means that 2014 was the 20th year of the projection, 19 the 19th, and so on). Since the dataset rolls forward over time and the projection horizon shrinks accordingly, we show the position of the year 2014 to account for both the specific sub-case and the related length of the forecast horizon. Figure 16 shows the same for the CBD model, with the order of the sub-cases inverted for males to better show the shape of each curve.

$^{13}$ The choice of the year 2014 was motivated by the regular mortality path observed. Mortality in 2015 is expected to have increased, particularly at old ages (Istat 2016).

$^{14}$ These are the initial years of the 20-year database; e.g., 1975 refers to the estimation period 1975-1994, and so on.
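The rolling fixed-length design can be sketched as follows: 19 windows of 20 years with starting years 1975 to 1993, each fitted and projected to 2014, recording the position of 2014 on the projection horizon (the sub-case index of Figures 15 and 16) together with the projected/observed ratio by age. The fitting step `fit_and_project_lc` is a hypothetical placeholder (a classical SVD Lee-Carter fit with the central random-walk-with-drift path for $k_t$), applied for simplicity to synthetic log central death rates rather than to $q_{x,t}$; it is not the authors' code.

```python
import numpy as np

WINDOW = 20                       # fixed length of each estimation window (years)
TARGET_YEAR = 2014                # year on which all forecasts are compared
START_YEARS = range(1975, 1994)   # 1975, ..., 1993 -> 19 rolling windows


def fit_and_project_lc(log_m, horizon):
    """Hypothetical helper: classical SVD Lee-Carter fit, then the central
    random-walk-with-drift projection of k_t `horizon` years ahead."""
    alpha = log_m.mean(axis=1)
    u, s, vt = np.linalg.svd(log_m - alpha[:, None], full_matrices=False)
    beta, k = u[:, 0], s[0] * vt[0]
    beta, k = beta / beta.sum(), k * beta.sum()   # normalise: sum(beta) == 1
    drift = (k[-1] - k[0]) / (k.size - 1)
    k_future = k[-1] + drift * horizon
    return np.exp(alpha + beta * k_future)        # projected central death rates


# Hypothetical synthetic log death rates, ages 57-90, years 1975-2014
ages = np.arange(57, 91)
years = np.arange(1975, 2015)
rng = np.random.default_rng(1)
log_m_all = ((-9.5 + 0.09 * ages)[:, None]
             + np.outer(np.full(ages.size, 1 / ages.size), -1.2 * (years - 1975))
             + rng.normal(0.0, 0.01, (ages.size, years.size)))
m_obs_2014 = np.exp(log_m_all[:, years == TARGET_YEAR][:, 0])

results = {}
for start in START_YEARS:
    window = (years >= start) & (years < start + WINDOW)
    horizon = TARGET_YEAR - (start + WINDOW - 1)  # position of 2014 on the horizon
    m_pred = fit_and_project_lc(log_m_all[:, window], horizon)
    results[start] = (horizon, m_pred / m_obs_2014)   # projected/observed by age

for start, (horizon, ratio) in results.items():
    print(start, horizon, ratio.min().round(3), ratio.max().round(3))
```

The first window (1975-1994) places 2014 at position 20 and the last (1993-2012) at position 2, matching the sub-case indexing described above.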
Secondly, we detected substantial differences between the performances of the two models for female mortality. Figure 16 shows how the CBD model systematically underestimates actual mortality up to age 75 and then starts converging to $q^O_{x,t}$ beyond that "threshold" age. This result, already evident in the previous analysis, is likely linked to the combined effect, in the CBD model (2), of the mean age $\bar{x}$ of the age range over which the forecast is made (here $\bar{x} = 73.5$) and of the observed female mortality pattern. These results are also confirmed by the analysis of the confidence intervals of the forecast. Figure 17 shows that, for females at age 65 ($t = 2014$), $q^O_{x,t}$ always lies outside the confidence interval, while at age 85 it lies inside, with central values almost converging to the observed data in each sub-case (Figure 18). For the LC model, the initial underestimation of $q^O_{x,t}$ is much less pronounced. Moreover, the "threshold" age at which the forecast switches from underestimating to overestimating $q^O_{x,t}$ increases as the database moves forward (Figure 15).
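A simplified version of the inside/outside interval check can be sketched by projecting the two CBD indices as a bivariate random walk with drift and mapping the resulting normal interval for $k^{(1)}_t + (x - \bar{x})k^{(2)}_t$ through the inverse logit. This sketch ignores parameter uncertainty in the drift and covariance, so it understates interval width; the index series are hypothetical, and this section does not state how the published intervals were computed. The loading $(x - \bar{x})$ also enters the variance, which is one reason interval widths differ between ages 65 and 85.

```python
import numpy as np
from statistics import NormalDist


def inv_logit(z):
    return 1.0 / (1.0 + np.exp(-z))


def cbd_interval(k1, k2, age, x_bar, horizon, level=0.95):
    """Lower bound, central value and upper bound for q at `age`, `horizon` years ahead.

    (k1, k2) are fitted CBD index series projected as a bivariate random walk
    with drift; parameter uncertainty in the drift and covariance is ignored.
    """
    steps = np.diff(np.vstack([k1, k2]), axis=1)    # yearly increments, shape (2, n-1)
    drift = steps.mean(axis=1)
    cov = np.cov(steps)                              # 2x2 innovation covariance
    w = np.array([1.0, age - x_bar])                 # loadings: logit q = w @ (k1, k2)
    centre = w @ (np.array([k1[-1], k2[-1]]) + horizon * drift)
    sd = np.sqrt(horizon * (w @ cov @ w))
    z = NormalDist().inv_cdf(0.5 + level / 2)        # about 1.96 for a 95% interval
    return inv_logit(centre - z * sd), inv_logit(centre), inv_logit(centre + z * sd)


# Hypothetical fitted indices for a 20-year window (illustrative only)
rng = np.random.default_rng(2)
t = np.arange(20)
k1 = -3.0 - 0.02 * t + rng.normal(0.0, 0.01, t.size).cumsum()
k2 = 0.10 + 0.0005 * t + rng.normal(0.0, 0.0005, t.size).cumsum()
x_bar = 73.5                                          # mean of ages 57-90

for age in (65, 85):
    low, mid, high = cbd_interval(k1, k2, age, x_bar, horizon=20)
    print(age, round(low, 4), round(mid, 4), round(high, 4))
```

Comparing an observed 2014 value with `low` and `high` reproduces the inside/outside check at a given age.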

Conclusions
The main aims of this paper are to scrutinize the forecasts for both sexes produced by the original formulations of the models, given the wide use of LC at the national level, and to analyse long-term forecasts with respect to short-term ones, observing qualitative differences in the parameter estimates as the starting point of the database changes.
Regarding the former, we find that neither model was able to capture the shock in terms of improvements in the male mortality trend, with greater biases at ages below $x = 75$, which were the most affected by the improvement. In this sense, CBD forecasts for those ages are more biased than LC projections in terms of overestimation. The limited capacity of the models to predict male mortality is evident in all three backtesting frameworks. Table 4 numerically summarizes the difference in performance between the sexes for the first two backtesting approaches, and the analysis of the forecast for the year 2014 provided by the third approach confirms this result. Moreover, forecasts for women are considerably more accurate than those for men, with small biases in both the short and the medium term. However, in the female case, CBD projections showed particularly deep and systematic underestimation at ages below 75.
From the comparison between the short-term and the medium-term forecasts, we find that changes in the starting point of the database strongly affect the estimation of the LC parameters, particularly $\beta_x$, with observable impacts on the projections. The female forecasts are more influenced by those changes in $\beta_x$. The CBD model satisfies the "new-data-invariant" property for the estimation of the parameter $k^{(1)}_t$, while $k^{(2)}_t$ presents persistent changes for the same year as the dataset slides forward. This aspect is more evident for males than for females. In particular, the weight of the parameter $k^{(2)}_t$, i.e., $(x - \bar{x})$, affects mortality forecasts with opposite signs at the two extremes of the considered age range, and the weight is greater the wider the age range. This structural characteristic of the model, together with $k^{(1)}_t$, results in a systematic underestimation of $q^O_{x,t}$ for ages below $\bar{x}$ that gradually decreases as $x$ moves towards $\bar{x}$. Moreover, mortality forecasts around $\bar{x}$ are almost exclusively explained by $k^{(1)}_t$, since $(x - \bar{x})$ is very close to 0 there. Conversely, as $x$ approaches the upper limit of the age range, the weight $(x - \bar{x})$ changes sign, resulting in an overestimation of $q^O_{x,t}$. For these reasons, the risk involved in applying these models is considerable, because it could affect both mortality risk and longevity risk. Given the variability of both $\beta_x$ (LC) and $k^{(2)}_t$ (CBD), it is difficult to judge a priori which of these two rigidities penalizes the mortality forecast more.
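The sign pattern described above follows directly from the CBD structure: an error $e_1$ in the projected $k^{(1)}_t$ and an error $e_2$ in the projected $k^{(2)}_t$ shift $\operatorname{logit} q_{x,t}$ by $e_1 + (x - \bar{x})\,e_2$. The sketch below uses hypothetical error values, with $k^{(1)}_t$ exact and $k^{(2)}_t$ projected too high. It shows that the $k^{(2)}_t$ contribution vanishes at $\bar{x}$, is negative (underestimation) below it and positive (overestimation) above it, and grows with the width of the age range. The narrower range 65-90 is an illustrative assumption, not one used in the paper.

```python
import numpy as np


def logit_error(ages, x_bar, e1, e2):
    """Error in logit q induced by errors e1 in k1 and e2 in k2 (CBD structure)."""
    return e1 + (ages - x_bar) * e2


wide = np.arange(57, 91)      # range used in the paper, x_bar = 73.5
narrow = np.arange(65, 91)    # hypothetical narrower range, x_bar = 77.5

e1, e2 = 0.0, 0.005           # k1 exact, k2 projected too high (hypothetical)
err_wide = logit_error(wide, wide.mean(), e1, e2)
err_narrow = logit_error(narrow, narrow.mean(), e1, e2)

print(err_wide[[0, -1]])      # youngest and oldest age, wide range
print(err_narrow[[0, -1]])    # youngest and oldest age, narrow range
```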
As far as the CBD model is concerned, we find that its projections are not reliable for describing mortality at ages below $x = 75$. For this reason, LC projections are preferable for describing Italian mortality over this particular range of years and ages. However, CBD forecasts showed a more restrained variability of the forecast error at higher ages than LC. This result, together with the fact that the CBD confidence interval at higher ages is usually wider than the LC one (i.e., LC is nested in CBD), lends the CBD model greater theoretical robustness for ages above $x = 75$.
We would like to make clear that we examined the models in their original form, so we cannot rule out the possibility that some extensions of the models might resolve these issues on Italian data. In particular, we expect that the results of both models may be improved by adopting model extensions that include a cohort component, in order to reduce the bias caused by the cohort effect of those born during the First World War. Moreover, the CBD extension that includes a quadratic age term may solve the weighting issue of the model over the considered age range on these data.
In conclusion, the results seem relevant for private and public Italian annuity providers that use LC forecasts as demographic bases. From this perspective, the choice between the two models may vary according to the purpose for which the model is used (e.g., the age and the sex of the insured). Even though we limited our analysis to the forecast $q_{x,t}$, we can infer that a backtesting analysis of annuity prices, based on the forecasts obtained from the original formulations of the models, would show evidence of a distortion, caused by the forecast error, in the money's worth of an annuity and in reserves.
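To illustrate how an error in projected death probabilities propagates into annuity values, the sketch below computes the expected present value of 1 per year paid in arrears to a life aged 65, $a_{65} = \sum_{t \ge 1} v^t\,{}_tp_{65}$, under a hypothetical "realised" mortality table and under a forecast that underestimates it by 10% at every age. The mortality table, the 2% rate and the truncation at age 110 are all illustrative assumptions; the paper does not perform this calculation.

```python
import numpy as np


def annuity_epv(q, rate):
    """Expected present value of 1 per year paid in arrears while alive.

    q[j] is the one-year death probability at age x + j; payments are
    truncated at the end of the table (a simplifying assumption).
    """
    survival = np.cumprod(1.0 - q)                        # t_p_x, t = 1, ..., len(q)
    discount = (1.0 + rate) ** -np.arange(1, q.size + 1)  # v^t
    return float(np.sum(discount * survival))


ages = np.arange(65, 111)
q_realised = np.minimum(1.0, 0.01 * np.exp(0.1 * (ages - 65)))  # hypothetical
q_forecast = 0.9 * q_realised                                   # forecast 10% too low

epv_realised = annuity_epv(q_realised, rate=0.02)
epv_forecast = annuity_epv(q_forecast, rate=0.02)
print(f"relative difference in EPV: {epv_forecast / epv_realised - 1:.2%}")
```

Underestimating mortality raises the computed EPV, so an annuity priced or reserved on such a forecast is valued above what realised mortality would justify.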

Figure 2. Kullback-Leibler divergence with respect to Z ξ at selected ages.

Figure 5. LC and CBD models: comparisons between observed and forecast mortality trends.

Figure 6. Lee-Carter Fixed Horizon Backtest: $q^P_{x,t}/q^O_{x,t}$ ratio.

Figure 7. LC Fixed Horizon Backtest forecast: comparison between observed death rates and the corresponding 95% confidence interval of the forecast based on the time series 1975-1994.

Figure 9. CBD Fixed Horizon Backtest forecast: comparison between observed death rates and the corresponding 95% confidence interval of the forecast based on the time series 1975-1994.

Figure 10. Fixed Horizon Backtest forecast: comparison between CBD and LC confidence intervals at age 85.
As mentioned, in the CBD model, $k^{(1)}_t$ represents the level of the mortality curve after the logit transformation. A reduction in $k^{(1)}_t$ entails a parallel downward shift of the logit-transformed mortality curve, which represents an overall mortality improvement. This is what occurred in practice, with greater effects for females, amplified by the slight divergence between the sexes in the $k^{(1)}_t$ trends. This is clear on the left-hand side of Figure 13, in which we also checked the new-data-invariant property of the model (Chan et al. 2014).

Figure 12. LC and CBD $q^P_{x,t}/q^O_{x,t}$ ratio.

Figure 14. LC and CBD $q^P_{x,t}/q^O_{x,t}$ ratio.

Figure 15. LC Rolling Fixed-Length Horizon Backtests: $q^P_{x,t}/q^O_{x,t}$.

Figure 16. CBD Rolling Fixed-Length Horizon Backtests: $q^P_{x,t}/q^O_{x,t}$.

Figures 17 and 18 show the convergence of the projections to the observed data for the year 2014 at ages 65 and 85. The x-axis shows the position of the year on the forecast horizon, as before. Figures 19 and 20 show the same for the Lee-Carter model.

Table 1. Proportion of persons aged 30 expected to be alive at selected ages.

Table 4. Root mean squared errors (RMSE) between observed $q^O_{x,t}$ and forecast $q^P_{x,t}$.