A Parametric Factor Model of the Term Structure of Mortality

The prototypical Lee-Carter mortality model is characterized by a single common time factor that loads differently across age groups. In this paper we propose a factor model for the term structure of mortality where multiple factors are designed to influence the age groups differently via parametric loading functions. We identify four different factors: a factor common for all age groups, factors for infant and adult mortality, and a factor for the “accident hump” that primarily affects mortality of relatively young adults and late teenagers. Since the factors are identified via restrictions on the loading functions, the factors are not designed to be orthogonal but can be dependent and can possibly cointegrate when the factors have unit roots. We suggest two estimation procedures similar to the estimation of the dynamic Nelson-Siegel term structure model. First, a two-step nonlinear least squares procedure based on cross-section regressions together with a separate model to estimate the dynamics of the factors. Second, we suggest a fully specified model estimated by maximum likelihood via the Kalman filter recursions after the model is put on state space form. We demonstrate the methodology for US and French mortality data. We find that the model provides a good fit of the relevant factors and in a forecast comparison with a range of benchmark models it is found that, especially for longer horizons, variants of the parametric factor model have excellent forecast performance. JEL Classifications: C1,C22,J10,J11,G22


Introduction
The Lee and Carter (1992) (LC) model has become a benchmark model when estimating and forecasting improvements in age-specific death rates and the calculation of life expectancy.The model is basically a one-factor model that allows for a single common time trend with age-specific loadings.The model has been extended in many different ways.For instance Booth et al. (2002) and Renshaw and Haberman (2003) extend the model with a second common time trend with age-specific loadings.Hyndman and Ullah (2007) developed a functional data approach in which the data are smoothed across age prior to modelling using penalized regression splines and principal component analysis.De Jong and Tickle (2006) use the state space framework to establish smoothness in the LC model using bsplines.
Typically, the estimation of factors is implemented via either singular value decomposition or principal component analysis.For models with multiple factors, these are identified via orthogonalization.Subsequently, the factors are modelled as individual time series models which can be used for forecast projections.
The LC model and its extensions are basically statistical models that summarize the variability of the measured age-specific death rates over time in a parsimonious way.No structure is imposed in the model specification.However, in the demographics literature on mortality laws it is well established that age groups are exposed rather differently to death risk and it seems reasonable to believe that separate time factors may affect different age groups rather than assuming a single common factor as in the basic LC model.
Mortality laws for death rates observed at a given time have been suggested by amongst others Gompertz (1825), Makeham (1860), and Heligman and Pollard (1980); Tabeau et al. (2001) provide a review.These laws refer to separate mortality characteristics for different age groups such as infants, youths, adults, and the elderly.When accounting for the dynamic development of mortality over time; it seems natural to consider a factor model that accounts for these mortality laws.In this paper, we assume the presence of multiple factors and impose structure on the loadings via specific functional forms.The approach is similar to McNown and Rogers (1989).However, their model is both heavily over parametrized in terms of latent time-varying parameters, and it does not fully exploit the information contained in the factor dynamics of the model.
The idea is similar to e.g. the dynamic Nelson-Siegel model for the term structure of interest rates, see Diebold and Li (2006).Diebold and Li suggest a factor model with parametrized factor loadings which identify level, slope, and curvature of the yield curve, associated with the long, short, and medium term yields.In the context of the term structure of mortality, we define loading functions that identify the factors that drive infant, adult, and 'accident hump' (youth) mortality respectively, plus a common factor uniformly affecting all age groups.We will generally refer to this class of factor models as parametric factor models (PFM).It follows from this approach that opposed to traditional factor analysis, the factors to be extracted will not necessarily be independent.In fact, the factors may potentially cointegrate when these are found to have unit roots.
We consider estimation of the model parameters and the factors by cross-section regressions over age groups for each period of time.These estimations are conducted over a grid of tuning parameters that define the shape of the loading functions.Next, a least squares criterion is used to determine the desired tuning parameters and the corresponding factor elements.This approach is similar to the first step of the cross section projection procedure suggested in Diebold and Li (2006).After the factors have been extracted, the second step implies the estimation of a time series model for the factors.This can be done in different ways.For instance univariate as well as multivariate models for the factors can be formulated and with the possibility of stationary as well as non-stationary factors that potentially cointegrate.The final time series specification of the factor dynamics is an empirical question and separate time series models are considered for this purpose.
We also consider a fully parametrized model specification formulated as a state space model.By use of the Kalman filter recursions the model parameters and the factors can be estimated by full maximum likelihood.This approach is similar to that of Diebold et al. (2006) for the term structure of interest rates.
The proposed model for women and men is estimated using French and US data for the sample period 1950-2014.The estimated functional forms appear to be rather similar across countries with the duration of the accident hump being longer for men than for women.The shape of the four factors also generally appear similar across countries but with differences across genders.In terms of model fit compared with the LC model, it appears that our model is doing especially well for explaining the age-specific death rates around the age groups defining the 'accident hump'.
We also evaluate the out of sample performance of the model where the predicted mortality rates are summarized in a loss function defined by the life expectancy.Specifically we apply the model confidence set procedure of Hansen et al. (2011) to evaluate the relative forecast performance on the horizons of 1, 10, and 20 years ahead using a number of benchmark models.We find that particularly for long horizon forecasts our model tends to be in the set of best predicting models.
In Section 2, we briefly describe the LC model and provide a detailed description of the mortality data and set up a number of stylized facts of the mortality curve in Section 3. Section 4 introduces the PFM and its interpretation.Section 5 discusses estimation of the PFM and in Section 6 we present the empirical analysis, including the estimation results, the model fit, and the factor dynamics.Section 7 examines the relative out-of-sample forecast performance.Section 8 concludes.

The Lee-Carter Model
The observed data of the analysis is the age specific death rates m x,t for age groups x = 0, 1, • • • , N at year t = t 0 , ...T and is broadly defined as the number of deaths at age x in year t divided by the Exposure-to-Risk, that is the average population aged x in year t.The data used is obtained from the Human Mortality Database (2016).
The Lee and Carter (1992) (LC) model describes the (log) age-specific death rates by: where α x captures the average death rate for each age x. κ t is a common time varying factor capturing the general trend in death rates over time t.β x is the factor loading capturing the effect of the factor κ t on each age group x and ε x,t is the age and time specific error term.The LC model is basically a one-factor model that allows a common time trend to have age-specific loadings with respect to the development of the age-specific log death rates.Lee and Carter (1992) obtain identification using the normalizations N x=0 β x = 1 and T t=t 0 κ t = 0.The constraints imply that α x measures the age specific time-average of the log death rates, ln m x,t .1 To estimate β x and κ t , the singular value decomposition is applied to the matrix (A) xt = ln m xt − α x for all x, t. Lee and Carter (1992) find that κ t can be modelled as a random walk with drift, although they allow for other specifications as well.The LC model is designed to maximize the in-sample fit by fitting a general factor model structure to the death rates.Note that the LC model does not impose any particular structure on the age-specific graduation of mortality, which essentially is data driven.However, it imposes a rigid structure on the improvements of the age-specific death rates over time by requiring these to be proportional and governed by the single factor κ t .

Stylized Facts of the Mortality Curve
A good mortality model should desirably account for both the age (cross section) dimension of mortality as well as its development over time, i.e. the time dimension.Here, we describe some stylized facts of the (log) death rates to be modelled.For illustration, we use data for France and USA available from the Human Mortality Database (2016).2 The age dimension: To illustrate the age dimension properties, which a good mortality model should be able to capture, we show the log mortality on 10 year intervals from 1950 to 2010 for men and women in Figures 1a to 1d for the US and France.The mortality curve shows a similar shape over the ages but the level of mortality tends to decline over time; the shape is very similar across both genders and countries.The infant mortality is seen to decline rapidly during early childhood.In the late teens the mortality rate experiences a rapid increase often termed the 'accident hump', which appears either as a distinct hump or as a flattening out of the death rates, see Heligman and Pollard (1980).After the accident hump, the mortality rates are gradually increasing with age (log-linearly).Thus, for a model to produce realistic results, these three facts should hold for each year.The three properties could also be interpreted in terms of biological reasonableness as described by Cairns et al. (2006), which rules out patterns that are biologically unreasonable such as a decreasing mortality curve for the older as well as the crossing over of age-specific mortality rates.1950, 1960, 1970, 1980, 1990, 2000, and 2010 for men and women in France and the USA.
The time dimension: When investigating the time dynamics in the development of the log age-specific death rates, the review paper by Wong-Fupuy and Haberman (2004, 56) notes that "There is a broad consensus across the resulting projections: (1) an approximately loglinear relationship between mortality rates and time, (2) decreasing improvements according to age".The first point helps to explain the success of the LC model where the common time-varying factor is found to evolve almost linearly in most applications, see e.g. Lee and Miller (2001) and Callot et al. (2016).The log-linear development of death rates over time is illustrated in Figures 2a to 2d.The second observation of decreasing improvements in mortality with respect to age can be described by the so called compensation effect of mortality, see e.g.(Gavrilov andGavrilova, 1979, 1991). 3In Figures 2a to 2d this effect is seen by a slope of the log mortality-time plot that decreases with age.Several studies find that a unit root is present in the individual age-specific death rates, see for instance Lazar and Denuit (2009).Also, it is common that the time-factor of the LC model is modelled as a random walk with drift.Basically this means that all death rates are governed by the same stochastic time trend component and hence for a system of N + 1 age groups, all death rates cointegrate pairwisely and a total of N cointegrating relations exists among all age-specific death rates.
To examine this feature of the LC model we have conducted cointegration tests of all the pairwise combinations of log mortality across age groups.Note that due to the dimension of the data a full cointegration analysis cannot be conducted for the full data set.Figures 3a and 3d report a heatmap of the P-values from Johansen's trace test, Johansen (1991), of the null hypothesis of zero cointegrating relations against one cointegrating relation for all combinations of the log age-specific death rates.As seen we cannot reject the null of no cointegration for most of the death rate pairs, especially for US data.It is apparent that most of the cointegrating relations found are between the adjacent ages, i.e. along the diagonal line.For both countries we find clear rejection of no cointegration amongst the youngest ages, but not for newborns.For France, rejection of no cointegration among the oldest ages is found to a larger degree.Further, it is found that around the accident hump and for the infants we cannot reject the null for relatively adjacent ages.Thus, overall the figures clearly show that the assumption of the LC model of N cointegrating relations is not consistent with the mortality data.We note that this is also consistent with Lazar and Denuit (2009) Doornik (1998Doornik ( , 1999) ) and shown for significance levels between 0 and 0.10.
Summarizing, we observe seven stylized facts for the term structure of mortality that a good mortality model should be able to reproduce: 1) declining mortality for infants, 2) increasing mortality around the accident hump, 3) log-linearly increasing mortality with age for adults, 4) a log-linear relationship between the death rates and time, 5) the log age-specific death rates are integrated of order one around a linear trend, 6) decreasing improvements in mortality with age, and 7) multiple stochastic trends characterize the development of log mortality over time for the different age groups.

The Parametric Factor Model for the Term Structure of Mortality
The model we propose assumes that mortality is driven by multiple factors and we impose structure on the factor loadings capturing the regularities discussed in the previous section.
The PFM reads as follows: The model has four factors β it , i = 0, 1, 2, 3 with loading functions that are designed to capture distinct age groups.β 0t is a factor that is common to all age groups.The factor β 1t captures child mortality, β 2t , the accident hump, and finally, β 3t is a factor that tends to increase mortality with age.Note that the common factor has the constant loading one for all age groups.The loading for infant mortality declines rapidly with age.The loading for the accident hump is approximately bell-shaped around age k and finally the loading for the adult factor grows almost linearly with age when λ 3 is close to one.The error term ε x,t is assumed to be IID normally distributed as N (0, σ 2 ) for all ages and years. 4The loading functions estimated for France and the US are shown in Figures 4a and 4b; the estimation procedure will be discussed in the next section.Even though it may be claimed that the functional forms of the loading functions are arbitrary, they are designed such that the mortality laws and stylized facts, described in Section 3, are captured through the model specification.
The level and infant terms β 0,t and β 1t e −λ 1 x , respectively, are used in many models using the age-specific graduation of mortality, see e.g.Siler (1979) and Rogers and Little (1994).
The adult factor can be seen as a generalization of the Gompertz model, inspired by the Box and Cox (1964) power formulation.That is, the loading function captures the Gompertz specification if λ 3 = 1.  Figure 4: Plot of the estimated loading functions for the years 1950-2014 for men and women in France and USA.The loading functions correspond to the level, infant, accident hump, and adult age groups, respectively.The loadings are estimated following the two-step procedure described in Section 5.

Estimation Procedure for the Parametric Factor Model
We consider two estimation procedures for estimating the PFM, the two-step procedure of Diebold and Li (2006) and exact maximum likelihood estimation using the Kalman Filter recursions of the model written on state-space form similar to Diebold et al. (2006).Alternatively, one could use maximum likelihood estimation following Brouhns et al. (2002) assuming a Poisson distribution for the death counts.

The Two-Step Estimation Procedure
The two-step procedure considers first to estimate the model parameters and the factors of the model and second, estimating a time series model of the extracted factors with the primary purpose of forecasting.Regarding the first step, McNown and Rogers (1989) propose to estimate the factors by nonlinear least squares for each point in time, hence giving a time series of the factors.This allows not only the factors but also the model (loading) parameters to be time-varying.McNown and Rogers (1992) fix the parameters of the model a priori and estimate the factors in a sequence of cross-section regressions.The latter procedure is also the one adopted by Diebold and Li (2006) when estimating the dynamic Nelson-Siegel model for the term structure of interest rates, where the different loadings refer to the level, slope, and curvature of the yield curve.
We suggest to modify McNown and Rogers (1992) and Diebold and Li (2006) by considering cross-sectional regressions at each time t for a fine grid of the model parameters and select the preferred model by minimizing the conditional sum of squares function.Hence, for a given set of loading parameters the factors can simply be estimated by using ordinary least squares for each year.This can also be implemented by a nonlinear least squares optimization algorithm.Here we use the limited memory BFGS procedure ("L-BFGS-B") developed by Byrd et al. (1995) and implemented via the R package 'Optim' (R Core Team, 2015).This step provides estimates of the four factors of the model.Note that as opposed to traditional factor models, generally the estimated factors will not be orthogonal and in fact are most likely to be dependent.In the second step of the two-step procedure timeseries models are fitted to the factors.These can be univariate time series models such as ARIMA model specifications, possibly with drifts or trends, or the factors can be modelled as stationary or nonstationary VAR models which potentially can allow for cointegration amongst the factors.It is an empirical question to properly select a time series model in the second step.

One-Step Estimation
The parametric factor model in Equation ( 2) can be formulated on state-space form and estimated by maximum likelihood by use of the Kalman Filter, see e.g.Durbin and Koopman (2012).This estimation procedure improves on the two-step estimation procedure by allowing joint estimation of the latent factors and their transition dynamics as well as the unknown parameters λ 1 , λ 2 , λ 3 and c, assuming Gaussian errors.Estimating the system jointly delivers the correct inference compared with two-step approaches, which ignores the uncertainty and estimation errors from the first step.
The measurement equation of the state space model can be written as: where .
As in Section 4, the vector error term ε t is assumed to be normally distributed as N (0, Iσ 2 ), where I is the identity matrix.
The transition equation of the state space model should be formulated to capture the dynamics of the factors.For instance, if we assume that the factors are governed by a VAR(1) process in first differences, the transition equation can be specified as: where v t is multivariate normal distributed as N (0, Σ).
In the case where the factors cointegrate with r cointegrating relations, the transition dynamics can be written as: where v t is multivariate normal distributed as N (0, Σ).Both α and γ are 4 × r matrices.
The second row gives the desired VECM specification for the transition dynamics: Note that the constant c is treated as a state parameter within the Kalman filter.Estimation of the parameters ψ = [λ 1 , λ 2 , λ 3 , k, σ, Σ, Φ (or α, γ), ] is achieved via numerical optimization of the prediction error decomposition of the likelihood function: where v t is the one step (innovation) prediction error of the measurements equation and F t is the innovation covariance matrix of the measurement equation.The numerical optimization is performed via the low-memory BFGS procedure "L-BFGS-B" from Byrd et al. (1995) in the R package Optim (R Core Team, 2015).
6 Empirical Analysis

Estimates Using the Two-Step Procedure
Figures 4a and 4b in Section 4 display the estimated shape of the loading functions for French and US men and women based on the two-step procedure.Table 1 reports the estimated shape parameters and their standard errors.The estimated parameters are similar across countries.However, the loading functions for the adult curvature is more convex for women than for men.Similarly, the shape and location of the accident hump varies across genders with men suffering from the accident hump longer than for women.A number of insights follow from these figures.The factor governing the common mortality level decreases almost linearly and thus capturing a common decline in mortality across all age groups; this applies for both genders and countries.The infant factor for both men and women decline over the period showing that the infants have seen larger improvements in mortality reduction compared to the general level captured by the first factor.Moreover, it can be seen that the decline for the infant factor stagnates around 1995 for all populations considered.Hence, after 1995 the development in mortality for infants has generally followed the common rate.
The accident hump factor shows an increase in size from 1950 to about 1990 followed by stagnation for all but US men.Regarding the development of the adult factor, Figures 5d and 6d exhibit an upward slope over the sample period and hence reducing the mortality improvements for the relevant age group.Thus, slower improvements in mortality with age is captured by the model, in line with the stylized facts previously reported.

Cointegrating Analysis of the Factors
In order to use the estimated model for forecast projections we need to examine the time series features of the estimated factors β it , i = 0, 1, 2, 3.By using a range of unit root tests we find strong empirical support for the presence of unit roots, possibly with a drift, in all of the factors considered across both countries and gender.Given this observation it is not surprising that the age-specific log death rates individually appear to have similar time series characteristics.The one-factor model of Lee and Carter (1992) also typically model the factor as a random walk with drift.From visual inspection of the factors in Figures 5 and 6 it is evident that the various factors tend to co-move across gender and thus the factors are likely to cointegrate.Accounting for cointegration amongst the factors will potentially lead to superior forecasts.
We have conducted cointegration analysis using the Johansen (1988) trace test for different subsets of factors.In Table 2 we report the test results for each country and for each gender using all four factors.In Table 3 we examine tests for each country using all eight factors for both men and women, and finally, Table 4 displays the tests for men and women, respectively, by merging the factors across countries.
The results are rather different for the USA and France as can be seen from Table 2.For both genders the US factors are found not to cointegrate and hence these factors are driven by four separate common stochastic trends.On the other hand, the factors for French men and women appear to cointegrate with two or three cointegrating vectors and thus factors for each gender appear to be driven by a single or possibly two common stochastic trends.This finding is also in line with the heat maps reported in Figure 3 showing that for France the pairwise log mortality rates appear more cointegrated compared to the USA.Note: The Johansen trace test is calculated with a trend restricted to the cointegration space.The number of lags in the VAR is 1 for all cases based on HQ, SIC and Portmanteau tests (Hannan and Quinn, 1979;Schwarz, 1978;Ljung and Box, 1978)."**" and "*" signify significance at the 1% and 5% level, respectively.
In Table 3 the set of variables in the VAR model is expanded to include both men and women for each country.In this case the eight factors for the US data are driven by four common stochastic trends.It is tempting to believe that the factors cointegrate across genders, however, a formal statistical test rejects this hypothesis.For the French data the eight factors have six to seven cointegrating vectors and thus have one or two common stochastic trends.Again, a formal test rejects that the factors cointegrate pairwisely across genders.
Finally, Table 4 shows that when pooling the US and French data for men and women respectively, both the male and female factors are likely to be driven by 6 factors and thus have two cointegrating relations.Hence cross-country similarities exist across countries for both genders but only to a limited extent.
These findings demonstrate that different time series specifications should be considered when modelling the factors with the purpose of forecasting.For the US it seems appropriate to specify a VAR in first differences with a vector of unrestricted constants to capture the drift of the single series.It could also be considered to base predictions on an expanded (cointegrated) VAR model including factors for both genders.For France, a cointegrated VAR with cointegration rank two or three seems appropriate.An expanded cointegrated VAR with eight factors and six to seven cointegrating vectors is also possible.When modelling the factors as univariate time series models a random walk with drift specification is appropriate but since the cross dependence of factors is neglected in this case, it is likely that inferior forecasts will result.Note: The Johansen trace test is calculated with a trend restricted to the cointegration space.The number of lags in the VAR is 1 for all cases based on HQ, SIC and Portmanteau tests (Hannan and Quinn, 1979;Schwarz, 1978;Ljung and Box, 1978)."**" and "*" signify significance at the 1% and 5% level, respectively.Note: The Johansen trace test is calculated with a trend restricted to the cointegration space.The number of lags in the VAR is 1 for all cases based on HQ, SIC and Portmanteau tests (Hannan and Quinn, 1979;Schwarz, 1978;Ljung and Box, 1978)."**" and "*" signify significance at the 1% and 5% level, respectively.

Estimates Using the One-Step Procedure
We now consider the one-step estimation of the model employing maximum likelihood estimation via the Kalman Filter recursions with the model specified on state space form.This method theoretically improves the efficiency as it avoids the issue from the two-step estimator of ignoring the estimation error from the first step in the second step.The estimation for US is based on the assumption of a VAR(1) in first differences for the transition dynamics and for France it is based on the cointegrated VAR model with 2 cointegrating relations.These specifications of the transient dynamics are chosen in accordance with the results reported in Section 6.2.Table 5 reports the estimated shape parameters and their standard errors.It is seen that the loading parameters and the variance are very similar to those obtained from the two-step procedure.Furthermore, the standard errors of the estimated shape parameters are found to be very close to those of the two-step method (sometimes smaller).This indicates that small efficiency gains can be obtained by using the one-step procedure.Figure 7 shows the estimated factors (or states) for the one-step state space estimation procedure for US based on the VAR(1) specification in differences.The factor estimates for the VECM specification for France are shown in Figure 8.
When comparing the estimated factors with those obtained in the first step of the two-step approach, the results appear similar.However, the factors from the one-step estimation show a smoother development because the one-step procedure directly accounts for the transition dynamics in the estimation.

Model Fit
We now compare the PFM with the LC model in terms of in-sample fit.
As the PFM does not include a constant for each age-specific death rate, we are interested in whether the model can capture the mean by relatively few parameters.As seen in Figures 9a to 9d, the model captures the mean well for all populations.Note that by construction α x in the LC model is equal to the mean of the age specific log death rates, which corresponds to the data mean levels in the figures.To further quantify the model fit, we calculate a pseudo R 2 for each age group by running a regression of the age-specific death rates on a constant and the fitted values. 5The pseudo R 2 's shown in Figures 10a to 10d display that both the LC and the PFM fit the observed data well.However, the PFM tends to perform better around the accident hump, where the LC model is found to have poor performance.Next, we investigate how each of the factors contribute to the explanatory power of the model by calculating the partial correlation between the log mortality and a particular factor after adjusting for the influence of the fit obtained from the remaining factors.This adjustment is necessary because the factors are non-orthogonal.The Figures 11a to 11d display the partial correlations in excess of a 65% threshold for all ages to identify where the different factors improve the fit.
It is seen that the infant mortality factor significantly improves the fit for infants as desired.
The level factor substantially improves the performance for most ages, and the accident hump factor primarily affects the mortality in the years around the accident hump.Finally, the adult factor primarily improves the fit for the adult ages as desired, but its partial explanatory power is of a smaller magnitude compared with the other factors, mainly because the adult factor is highly correlated with the factor common to all age groups.

Forecast Evaluation
In this section we investigate the forecast performance of the PFM and compare with relevant benchmark models.For forecast evaluation and comparison we use the Model Confidence Set (MCS) approach developed in Hansen et al. (2011). 6  The MCS procedure is a test for predictive ability across a number of competing models, which sequentially removes the model that performs significantly worse than the remaining models left in the model confidence set.The procedure delineates the set of best performing models at a given confidence level among which we cannot say that any of the other models perform statistically better.
Hence, the MCS does not necessarily pick out a single best model but rather delineates a set of best models as the available information might not be able to discriminate between these models.The MCS procedure returns p-values, pi , for each model i considered, and from this the MCS can be determined.The MCS procedure returns a p-value of 1 to the best performing model.7 To reduce the dimension of the forecast evaluation, we calculate the life expectancy at birth which aggregates the forecasted age-specific death rates into a single measure.The life expectancy is calculated by using the standard assumption of a constant chance of death within each age interval as in Brouhns et al. (2002): where m x,t signifies the age-specific death rates and ē↑ x (t) is the (period) life expectancy.We use the life expectancy at birth, i.e. setting x = 0.
To show the robustness of the proposed model at producing reliable forecasts, we consider data for men and women for the USA and France in the forecast evaluation.The forecasts are constructed by recursively estimating each model from 1950 onwards until year t = 1970, 1971, ... and forecasting 1, 10, and 20 years ahead.This gives 43, 34, and 24 forecasts of the age-specific death rates for each model, respectively.The forecast performance is evaluated using the mean squared error of the life expectancy as the loss function.For implementation we use the block bootstrap with a block length equal to the longest significant lag length from fitting an AR model and a confidence level of 5%, see Hansen et al. (2011) for details.
As benchmark models we use 1) a random walk with drift (RWD) specification for each (log) age specific death rate, 2) the Lee and Carter (1992) model with a single factor, 3) and the functional data approach (FDA) of Hyndman and Ullah (2007).Based on the analysis in Sections 5 and 6 we consider two dynamic specifications of the factor structure, a VECM (with two cointegrating relations) and a VAR(1) in first differences of the factors.For comparisons we use both specifications for each country and gender estimated by the twostep procedure.For the one-step procedure we consider estimation assuming the VECM structure for France and the VAR(1) structure in first differences for the US.Using the twostep procedure we further compare a VAR(1) model in levels and univariate ARIMA models in the dynamic specification.8Based on the finding of unit roots and trending behaviour for each of the factors we decide to use a random walk with drift specification as ARIMA model specification.9For the LC model we use a random walk with drift specification for the single factor κ t .The FDA model of Hyndman and Ullah (2007) can be considered an extension of the LC model by using K factors and smoothing across the death rates. 10The results are reported in Table 6 for France and in Table 7 for the USA.
Summarizing, the PFM class of models appears to perform especially well for longer forecast horizons and in most cases performs better than the LC model.An explanation for this result could be the structural features of the PFM class of models compared to the LC model.For longer horizons the structural restrictions on the loadings account for different factors affecting the separate age groups.The structure implied by the PFM specification ensures a realistic shape of the mortality curve, which cannot be captured by a single factor LC model.Another conclusion is that in situations where competing models are performing well, especially for longer horizons, the different PFM models also perform well.On the other hand in situations where competing models are not performing so well, the PFM models are included in the MCS as seen especially for men.

Concluding Remarks
We have suggested a multi-factor model for the term structure of mortality.The factors are identified after restrictions on the loading functions in such a way that different age groups and their factor dynamics can be addressed separately.So rather than having a single factor governing all age groups as for the LC model, different factors (or trends) play a role in the way that mortality across age groups develop.In particular, we consider separate factors driving infant mortality, the accident hump mortality, mortality for the elderly in addition to a common factor affecting all age groups.We have suggested two estimation methods that are similar to estimation of term structure models considered in other contexts.In an application we apply the methodology to mortality data for the US and France for each gender.The models are shown to provide good fit and for certain age groups provides a much better fit compared to the LC model.In a forecast comparison across a range of competing models the new class of models that we consider in the paper are shown to perform well, especially over longer forecast horizons.

Figure 2 :
Figure 2: The log age-specific death rates for a range of ages for French and American men and women from 1950-2014.
Figure 3: P-values from Johansen's Trace test are shown for all pairwise combinations of the (log) agespecific death rates for French and US men and women over the period 1950-2014.The test is performed with a restricted time trend in the cointegrating relation and 1 lag in the VAR specification.The P-values are obtained via the gamma approximation followingDoornik (1998Doornik ( , 1999) ) and shown for significance levels between 0 and 0.10.

Figure 5 :Figure 6 :
Figure 5: The factors βit, i = 0, 1, 2, 3 are estimated by the two-step procedure for France using data from 1950-2014.The plots are showing the level factor, infant factor, accident hump factor, and adult factor for both genders, respectively.

Figure 9 :
Figure9: The mean of the data and the mean of the parametric factor model estimated using the twostep procedure for both men and women, for France and USA.The estimation period is 1950-2014.

Figure 10 :
Figure10: The pseudo R 2 (within) for the PFM and the LC model for all ages estimated using the twostep procedure.The R 2 is shown for both men and women in France and the USA, respectively.

Figure 11 :
Figure11: The partial R 2 for the infant, level, accident hump, and adult factor estimated using the twostep procedure for the years 1950-2014.The relative improvements from each of the factors are shown in excess of a 65% threshold.This is shown for both genders for France and the US, respectively.

Table 1 :
Estimated loading function parameters and standard errors from the first step in the two-step procedure for French and US men and women.The standard errors are calculated using the inverse Fisher information criterion.

Table 2 :
Test for cointegration rank amongst factors for US and French men and women.

Table 3 :
Test for cointegration rank amongst factors for men and women for USA and France.

Table 4 :
Test for cointegration rank amongst factors for US and French men and women.

Table 5 :
Estimated loading function parameters and standard errors from the one-step procedure for French and US men and women.For US the VAR(1) model in first difference is assumed for the transition dynamics and for France a VECM with 2 cointegrating relations is assumed.The standard errors are calculated using the inverse Fisher information criterion.