Article

Mortality Forecasting with an Age-Coherent Sparse VAR Model

1 Warren Centre for Actuarial Studies and Research, Asper School of Business, University of Manitoba, Winnipeg, MB R3T 5V4, Canada
2 Department of Actuarial Studies and Business Analytics, Macquarie University, Sydney, NSW 2000, Australia
* Author to whom correspondence should be addressed.
Risks 2021, 9(2), 35; https://doi.org/10.3390/risks9020035
Submission received: 28 December 2020 / Revised: 22 January 2021 / Accepted: 31 January 2021 / Published: 5 February 2021
(This article belongs to the Special Issue Mortality Forecasting and Applications)

Abstract

This paper proposes an age-coherent sparse vector autoregression (VAR) mortality model, which combines the appealing features of existing VAR-based mortality models, to forecast future mortality rates. In particular, the proposed model utilizes a data-driven method to determine the autoregressive coefficient matrix, and then employs a rotation algorithm in the projection phase to generate age-coherent mortality forecasts. In the estimation phase, the age-specific mortality improvement rates are fitted to a VAR model with dimension reduction algorithms such as the elastic net. In the projection phase, the projected mortality improvement rates are assumed to follow a short-term fluctuation component and a long-term force of decay, and will eventually converge to an age-invariant mean in expectation. The age-invariance of the long-term mean guarantees age-coherent mortality projections. The proposed model is generalized to a multi-population context in a computationally efficient manner. Using single-age, uni-sex mortality data of the UK and France, we show that the proposed model is able to generate more reasonable long-term projections, as well as more accurate short-term out-of-sample forecasts, than popular existing mortality models under various settings. Therefore, the proposed model is expected to be an appealing alternative to existing mortality models in insurance and demographic analyses.

1. Introduction

The ongoing improvement of human life expectancy around the world has made longevity risk, the risk that people live longer than expected, an increasingly important risk for many demographic, economic, and insurance practices. This has urged academia to better understand the driving factors of mortality improvements and to design effective statistical tools that provide accurate and credible mortality projections.
The past few decades have witnessed rapid developments in mortality forecasting research. Among existing methods, one popular class is the so-called factor-based models. These models use factor representations to summarize the mortality patterns of high-dimensional data sets by one or a few factors. Mortality projections are then obtained by extrapolating the estimated factors. To date, the most widely used factor-based models include the Lee–Carter model (Lee and Carter 1992) and the Cairns–Blake–Dowd (CBD) model (Cairns et al. 2006). Many extensions of these two models have been proposed in the literature (see, among many others, Booth et al. 2006; Li et al. 2015; Li and Li 2017; Li et al. 2019; Renshaw and Haberman 2006). While the Lee–Carter and the CBD models have been demonstrated to be effective in mortality forecasting, they are not able to explicitly model mortality correlations between different populations, as only one population can be fitted at a time. To address the modeling of joint mortality dynamics, Li and Lee (2005) extend the Lee–Carter model to a multi-population context, in which mortality rates of all populations are assumed to follow a common systematic trend. In particular, mortality projections generated by the Li–Lee model are population-coherent, i.e., the projected mortality rates of different populations at the same age will not diverge in the long run. Since the Li–Lee model, coherent multi-population mortality modelling has drawn increasing attention in the literature (see, for example, Dowd et al. 2011; Hyndman et al. 2013).
While the factor-based models are effective in forecasting mortality rates, especially when the cross-section dimension (age) of the underlying data is large, there are still limitations associated with these models. In particular, these models typically have fixed age effects (i.e., loadings to the factors), and are thus likely to generate diverging mortality forecasts at different ages (Li et al. 2013). In other words, while the Li–Lee model is population-coherent, it is not age-coherent. Moreover, as argued by Hunt and Blake (2018), many factor representations suffer from identifiability issues when cohort effects are included. To address this issue, vector-autoregressive (VAR) models have been proposed to study and forecast mortality (Guibert et al. 2019; Li and Lu 2017; Li and Shi 2020). Compared to the factor-based models, VAR models have more flexible parametric structures, and thus could better capture the potentially complicated mortality patterns underlying the data. However, the estimation of VAR mortality models is often challenging because of the high dimension of mortality data sets. In particular, it is often the case that the cross-section dimension is larger than the time dimension, and thus the estimation of unconstrained VAR mortality models is impossible. Consequently, dimension reduction techniques for the coefficient matrix are required. For example, Li and Lu (2017) simplify the coefficient matrix by allowing for only one nonzero period effect and two cohort effects, and the resulting model is referred to as the spatial-temporal (STAR) model. Under this assumption, they derive age-coherent and population-coherent mortality projections, i.e., the projected mortality rates of any two ages in any two populations will not diverge in the long run. The simple STAR model in Li and Lu (2017) is later extended by many studies, including Chang and Shi (2020a), Chang and Shi (2020b), and Shi (2020). On the other hand, Guibert et al. (2019) adopt a pure data-driven approach for mortality modeling using the sparse VAR model with an elastic-net (ENET) penalty, which we refer to as the SVAR model. Compared to the STAR model, the SVAR model is more objective, as more period and cohort effects can be included in the regression, depending on their explanatory power for the mortality rates modelled. However, the SVAR model focuses on the mortality improvement rates, instead of the original mortality rates, and hence the co-integration relations cannot be explicitly addressed. Consequently, mortality projections generated by the SVAR model are neither age-coherent nor population-coherent.
In this paper, we propose a coherent sparse vector-autoregressive (CSVAR) model which combines the appealing features of both the STAR model and the SVAR model. In particular, the proposed CSVAR model utilizes data-driven algorithms to determine the period, age, and cohort effects that should be included in each regression, and meanwhile applies a rotation algorithm to the long-term mean of each age-specific mortality series to ensure age-coherent projections. The model is estimated in two steps. First, age-specific mortality improvement rates are fitted to a VAR model with data-driven dimension reduction techniques, such as LASSO or elastic-net. Then, in the projection phase, the intercepts (expectations) of mortality improvement rates are assumed to follow time-series dynamics and converge to a long-term limit. By letting the long-term limit be age-invariant, the difference between projected mortality rates of any two ages will be bounded in the long term. Hence, projected mortality rates generated by the proposed CSVAR model are age-coherent. The idea of converging the long-term mean of mortality improvement rates has the same spirit as the mortality rotation method discussed in Li et al. (2013) and Li et al. (2018). Furthermore, the proposed CSVAR model can be extended to model the joint mortality dynamics of multiple populations in a computationally efficient manner. In the latter case, the long-term limit of mortality improvement rates is assumed invariant to both age and population, so the projected mortality rates will be both age- and population-coherent.
In the empirical analysis, the proposed CSVAR model is illustrated using the single-age, uni-sex mortality data of the UK and France in both the single-population and multi-population context. We find that, due to the age-coherent feature, the CSVAR model generates smoother long-term projections of age-specific mortality rates. In particular, the projected mortality improvements are more similar across ages: they are more pronounced for the old ages and less substantial for the very young ages compared to those produced by the Lee–Carter model and the SVAR model. As a result, the proposed CSVAR model generates higher point forecasts of life expectancy at birth than the Lee–Carter model and the SVAR model at all forecast horizons. Moreover, in the out-of-sample forecast analysis, the proposed CSVAR model is able to produce more accurate out-of-sample forecasting results than the Lee–Carter model (and the Li–Lee model in the multi-population case) and the SVAR model with different choices of sample size and age groups. Furthermore, the proposed CSVAR model produces projected life expectancy closer to the realized values over the out-of-sample forecast period in both populations than the Lee–Carter model and the STAR model. Therefore, it seems that the proposed CSVAR model is able to generate both reasonable long-term forecasts and accurate short-term out-of-sample forecasts, and could be an appealing alternative to the existing models in life insurance and demographic analysis.
The remainder of the paper is organized as follows. Section 2 reviews the Lee–Carter model and the Li–Lee model. Section 3 reviews the existing VAR-based mortality models, especially the SVAR model, and introduces the age-coherent CSVAR model. Section 4 discusses the empirical analysis. Section 5 concludes the paper.

2. The Factor-Based Model

Suppose we have mortality data of $N$ ages, each age with $T$ years of observations. The Lee–Carter (LC) model (Lee and Carter 1992) summarizes the systematic mortality trends of the $N$ ages by a common factor. Formally, the log central mortality rate at age $x$ in year $t$, $y_{x,t}$, follows the specification given by:
$$y_{x,t} = a_x + b_x k_t + \varepsilon_{x,t}, \tag{1}$$
where $a_x$ is the mortality level, i.e., the average log mortality rate over time at age $x$, $k_t$ is the period effect, i.e., the systematic mortality trend common to all ages, $b_x$ is the age effect at $x$, i.e., the sensitivity of $y_{x,t}$ to $k_t$, and $\varepsilon_{x,t}$ is the normal residual term with mean 0 and variance $\sigma^2_{\varepsilon_x}$. As noted by Lee and Carter (1992), Equation (1) is not identifiable without normalization constraints. For example, one could multiply $b_x$ by a constant $c$ and divide $k_t$ by the same constant, and reach the same fitting results. In Lee and Carter (1992), the following normalization constraints are imposed:
$$\sum_t k_t = 0 \quad \text{and} \quad \sum_x b_x = 1. \tag{2}$$
Given the first constraint, $a_x$ is set to the mean of $y_{x,t}$ over the sample considered. In the original paper, the Lee–Carter model is then estimated by singular value decomposition (SVD) rather than the usual ordinary least squares approach.¹ Given the estimated $a_x$ and $b_x$, $k_t$ is adjusted to match the fitted total number of deaths to the observed value in each year $t$. This adjustment rebalances the contribution of age-specific mortality by assigning greater weights to ages with a larger number of deaths.
While $a_x$ and $b_x$ are assumed to be constant, the period effect is often modelled by a time-series process. In particular, Lee and Carter (1992) assume a random walk with drift specification, which is adopted by many later studies:
$$k_t = k_{t-1} + d + e_t, \tag{3}$$
where the drift term $d$ measures the average annual change in $k_t$, and $e_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma_e^2)$. Based on the time-series specification in Equation (3), future mortality rates can be projected by extrapolating the period effect $k_t$. Specifically, the $h$-step-ahead mean forecasts of the period effect and the log central death rate are given by:
$$\hat{k}_{T+h} = k_T + h d, \qquad \hat{y}_{x,T+h} = a_x + b_x \hat{k}_{T+h}, \tag{4}$$
where $T$ is the last year of the sample.
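As a minimal sketch of the projection step in Equation (4), the snippet below extrapolates a fitted period effect and maps it back to log death rates. The function name, the toy inputs, and the drift estimator (the average first difference of the fitted $k_t$) are illustrative assumptions, not the authors' implementation.

```python
def lee_carter_forecast(a, b, k, h):
    """Mean forecast of log death rates h years after the last fitted year.

    a, b : fitted levels a_x and age effects b_x, one entry per age
    k    : fitted period effects k_1, ..., k_T in time order
    h    : forecast horizon in years
    """
    # Assumed drift estimator: the average first difference of k_t,
    # which telescopes to (k_T - k_1) / (T - 1).
    d = (k[-1] - k[0]) / (len(k) - 1)
    k_hat = k[-1] + h * d  # k_hat_{T+h} = k_T + h * d
    y_hat = [a_x + b_x * k_hat for a_x, b_x in zip(a, b)]
    return k_hat, y_hat


# Hypothetical two-age example; the numbers are not fitted to any data.
k_hat, y_hat = lee_carter_forecast(
    a=[-5.0, -3.0], b=[0.6, 0.4], k=[2.0, 1.0, 0.0, -1.0, -2.0], h=3
)
```

Because $b_x$ is fixed, the gap between two ages' projected log rates changes by $(b_x - b_z) d$ every year, which is the source of the diverging forecasts discussed in the Introduction.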
The Lee–Carter model is a single-population model, i.e., it focuses on the mortality of one population at a time. This constraint is later relaxed by the Li–Lee model (Li and Lee 2005), which incorporates mortality dynamics of multiple populations simultaneously. The joint modeling of mortality dynamics of multiple populations is important not only in demographic analysis (Li et al. 2019), but also for various insurance practices, including the risk management of insurance policies for small portfolios and the pricing of innovative index-based longevity-linked securities and retirement products (Chen et al. 2020; Li 2018; Li et al. 2017; Li and Lu 2018, 2019). In the Li–Lee model, the log central death rates are represented by a common factor and a population-specific factor. Suppose there are $I$ populations in total. The log central death rate for the $i$-th population at age $x$ and year $t$, $y_{x,t,i}$, follows the specification given by:
$$y_{x,t,i} = a_{x,i} + B_x K_t + b_{x,i} k_{t,i} + \varepsilon_{x,t,i}, \quad i = 1, \dots, I, \tag{5}$$
where $a_{x,i}$ is the average mortality level at age $x$ in the $i$th population, $B_x$ and $K_t$ represent the common age effect and period effect, $k_{t,i}$ is the population-specific period effect with respect to the $i$th population, and $b_{x,i}$ is the corresponding population-specific age effect. Finally, $\varepsilon_{x,t,i}$ is the normal population-specific error term with mean 0 and variance $\sigma^2_{\varepsilon_{x,i}}$. Similarly, a set of normalization constraints is imposed to ensure identifiability:
$$\sum_t K_t = 0 \ \text{ and } \ \sum_x B_x = 1, \qquad \sum_t k_{t,i} = 0 \ \text{ and } \ \sum_x b_{x,i} = 1. \tag{6}$$
Similar to the Lee–Carter model, the common period effect $K_t$ can be modelled as a random walk with drift process. On the other hand, the population-specific period effects $k_{t,i}$, $i = 1, \dots, I$, are fitted by stationary autoregressive processes to ensure coherent forecasts in the long term. Specifically, the time-series specifications of the period effects are given by:
$$K_t = K_{t-1} + d + e_t, \qquad k_{t,i} = a_{0,i} + a_{1,i} k_{t-1,i} + e_{t,i}, \tag{7}$$
where $a_{0,i}$ and $|a_{1,i}| < 1$ are the autoregressive parameters and $e_{t,i}$ is the Gaussian error term with mean 0 and variance $\sigma^2_{e_i}$. Stationarity of the $k_{t,i}$ processes guarantees that deviations of the projected mortality of each population from each other will not grow infinitely in the long run. Formally, the projected mortality rates are coherent for different populations in the long run if $\hat{y}_{x,T+h,i} - \hat{y}_{x,T+h,j} = O_p(1)$ for $i \neq j$. We see that this condition is indeed satisfied if all $k_{t,i}$ are stationary, as in this case the long-term mortality trends of all ages and populations are driven by the single common period effect $K_t$, and the $k_{t,i}$ only represent short-term fluctuations around the common mortality trend. Therefore, the Li–Lee model is indeed able to generate coherent mortality projections across populations. However, as noted by recent studies (for example, Li and Lu 2017), the coherent property only holds for projected mortality rates of the same age. Specifically, for an arbitrary different age $z \neq x$, the Li–Lee model will not lead to $\hat{y}_{x,T+h,i} - \hat{y}_{z,T+h,j} = O_p(1)$ for any $i$ and $j$. Therefore, the mortality projections generated by the Li–Lee model are not age-coherent.
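The population-coherence argument can be illustrated with a short sketch: at a fixed age, the common term $B_x K_t$ cancels when two populations are compared, and each stationary AR(1) mean forecast settles at $a_{0,i} / (1 - a_{1,i})$, so the gap stays bounded. All names and parameter values below are hypothetical.

```python
def ar1_mean_forecast(k_last, a0, a1, h):
    """h-step-ahead mean forecast of k_t = a0 + a1 * k_{t-1} + e_t."""
    k_hat = k_last
    for _ in range(h):
        k_hat = a0 + a1 * k_hat
    return k_hat


def li_lee_gap(h, pop_i, pop_j):
    """Gap in projected log rates at the same age between two populations.

    The common term B_x * K_t is identical for both populations and cancels,
    so only the population-specific parts are needed.
    """
    def specific_part(p):
        k_hat = ar1_mean_forecast(p["k_last"], p["a0"], p["a1"], h)
        return p["a_x"] + p["b_x"] * k_hat

    return specific_part(pop_i) - specific_part(pop_j)


# Hypothetical parameters for one age in two populations.
pop_i = {"a_x": -4.0, "b_x": 0.02, "k_last": 3.0, "a0": 0.1, "a1": 0.8}
pop_j = {"a_x": -4.2, "b_x": 0.01, "k_last": -2.0, "a0": 0.0, "a1": 0.5}
gap_short = li_lee_gap(1, pop_i, pop_j)
gap_long = li_lee_gap(200, pop_i, pop_j)
```

With these values the long-horizon gap settles near 0.21 instead of growing with $h$. No such cancellation occurs across ages, because $B_x \neq B_z$ in general.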

3. The Vector-Autoregression-Based Models

As an alternative to factor-based models, VAR-based mortality models have been a rapidly emerging class of mortality models in recent literature. Compared to the factor-based models, the VAR-based models are more flexible and thus able to capture more complex time, age, and cohort mortality dependence (Guibert et al. 2019; Li and Lu 2017). However, the application of VAR-based models in mortality forecasting leads to a new challenge. In particular, unconstrained VAR models typically have a large number of parameters to be estimated, while observations in mortality data are often limited. For example, suppose we have $N$ age groups and only one population is considered. Even in the simplest VAR(1) case, if the first-order lags of all the $N$ ages are included, then the total number of parameters will be $p = N(N+1)$ (including $N$ intercepts), while the total number of observations is $NT$. In mortality analysis, we typically have only a few decades of annual data, while the number of ages could be above 100. Therefore, there are typically more unknown parameters than observations in the standard VAR framework without any parameter constraints.
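To make the counting argument concrete, the following sketch evaluates $p = N(N+1)$ and $NT$ for the sample used later in the paper (ages 0 to 100 and years 1950 to 2016, i.e., $N = 101$ and $T = 67$). The function names are illustrative.

```python
def var1_parameter_count(n_ages):
    """Unconstrained VAR(1): an N x N coefficient matrix plus N intercepts."""
    return n_ages * (n_ages + 1)


def observation_count(n_ages, n_years):
    """Total number of age-year observations, N * T."""
    return n_ages * n_years


n_ages, n_years = 101, 67  # ages 0-100, years 1950-2016
params = var1_parameter_count(n_ages)
obs = observation_count(n_ages, n_years)
print(params, obs)  # 10302 parameters versus 6767 observations
```

The unconstrained model is therefore under-identified for this sample, which motivates the sparse estimators discussed next.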

3.1. The Sparse VAR Model

A recent effort to address the curse of dimensionality in VAR mortality models is made by Guibert et al. (2019), who employ a sparse VAR (SVAR) model on mortality improvements. First, they model the dynamics of the mortality improvement rates, $\Delta y_{x,t} = y_{x,t} - y_{x,t-1}$, rather than the mortality rates themselves. Under the assumption that $(y_{x,t})_t$ is an $I(1)$ process for all ages, the dependent variables $(\Delta y_{x,t})_t$ are $I(0)$ and therefore stationary. As a result, standard VAR models can be used, without the need to consider co-integration relations within mortality data. Second, using elastic-net (ENET) penalized estimation, the SVAR model adopts a pure data-driven method to select the non-zero coefficients in the estimation process. In this way the coefficient matrix will be sparse, i.e., the majority of autoregressive coefficients will be set to zero. Formally, when only the first-order lag is considered, the mortality improvement model is given by:
$$\Delta Y_t = M + B \Delta Y_{t-1} + \varepsilon_t, \tag{8}$$
where $\Delta Y_t = (\Delta y_{1,t}, \Delta y_{2,t}, \dots, \Delta y_{N,t})$ is the $N \times 1$ vector of mortality improvement rates, and $\varepsilon_t$ is assumed to follow a multivariate Gaussian distribution with mean 0. $M$ is an $N \times 1$ vector estimated by the sample averages of $\Delta Y_t$. $B$ is the coefficient matrix, and the sparsity (frequency and location of zeros) of $B$ is determined by the LASSO (L1) penalty during the estimation process without any constraints. More specifically, the objective function to be minimized is given by:
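$$(\hat{M}, \hat{B}) = \arg\min_{M \in \mathbb{R}^{N},\, B \in \mathbb{R}^{N \times N}} \frac{1}{2} \sum_{t=1}^{T} \left\| y_t - M - B y_{t-1} \right\|_2^2 + \lambda \sum_{i,j}^{N} |\beta_{ij}|, \tag{9}$$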
where $\beta_{ij}$ is the element in the $i$th row and $j$th column of $B$, and $\lambda$ is the L1 penalty parameter. A larger value of $\lambda$ shrinks more of the $\beta_{ij}$ to exactly 0. The selection of $\lambda$ is performed via ten-fold cross-validation, and the parameter estimates are then obtained accordingly. Details can be found in Friedman et al. (2010). After estimating Model (8), the forecasting is performed as follows:
$$\Delta \hat{Y}_{T+h} = \hat{M} + \hat{B} \Delta \hat{Y}_{T+h-1}, \tag{10}$$
with $h > 1$. As a result, it holds that $\hat{Y}_{T+h} = Y_T + \sum_{l=1}^{h} \Delta \hat{Y}_{T+l}$.
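The penalized objective in Equation (9) can be evaluated directly for a candidate pair $(M, B)$, which may help in reading the formula. The sketch below only computes the objective value; it does not perform the coordinate-descent estimation of Friedman et al. (2010). The function name and the toy inputs are assumptions, and the sum runs over all consecutive pairs of vectors in the supplied series.

```python
def svar_objective(Y, M, B, lam):
    """0.5 * sum_t ||y_t - M - B y_{t-1}||_2^2 + lam * sum_ij |beta_ij|.

    Y   : list of vectors in time order (each a list of length N)
    M   : intercept vector of length N
    B   : N x N coefficient matrix as a list of rows
    lam : L1 penalty parameter (lambda)
    """
    n = len(M)
    squared_error = 0.0
    for t in range(1, len(Y)):
        for i in range(n):
            fitted = M[i] + sum(B[i][j] * Y[t - 1][j] for j in range(n))
            squared_error += (Y[t][i] - fitted) ** 2
    l1_penalty = sum(abs(beta) for row in B for beta in row)
    return 0.5 * squared_error + lam * l1_penalty


# Hypothetical two-age series with a single non-zero coefficient.
Y = [[1.0, 0.0], [0.5, 1.0], [0.0, 0.5]]
M = [0.0, 0.0]
B = [[0.5, 0.0], [0.0, 0.0]]
value = svar_objective(Y, M, B, lam=0.1)
```

Each additional non-zero $\beta_{ij}$ must reduce the squared-error term by more than its penalty cost $\lambda |\beta_{ij}|$ to be worth keeping, which is how a larger $\lambda$ leads to a sparser $B$.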
In an $I$-population case, we may let $\Delta Y_t$ be an $NI \times 1$ vector given by $(\Delta y_{1,t,1}, \dots, \Delta y_{N,t,I})$, where $\Delta y_{x,t,i}$ is the mortality improvement rate of the $i$-th population at age $x$ and year $t$. Other variables and parameters of Equation (8) can be redefined accordingly in a straightforward manner. In all cases, the sparsity of $B$ depends on the tuning parameter $\lambda$ in the estimation process, which is derived via the usual cross-validation procedure. Given Equations (8)–(10), we see that $\hat{y}_{x,T+h} - \hat{y}_{x+k,T+h} = O_p(1) + O_p(h)(\hat{m}_x - \hat{m}_{x+k})$ for two arbitrary ages $x$ and $x+k$ in the SVAR model, because $\Delta y_{x,t}$ is stationary for any $x$. As a result, without any constraints on $\hat{m}_x$ and $\hat{m}_{x+k}$, the projected mortality rates of any two ages will diverge in the long run, and thus the projection generated by the SVAR model is not age-coherent either.
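A small sketch of the projection recursion in Equation (10) shows the divergence numerically. It assumes the recursion is started from the last observed improvement vector $\Delta Y_T$, and it uses a zero coefficient matrix with two hypothetical intercepts so that the $O_p(h)$ term is isolated; none of the values come from the paper.

```python
def svar_project(y_last, dy_last, M, B, h):
    """Projected log rates at T + h: iterate Equation (10) and accumulate.

    y_last  : last observed log rates Y_T
    dy_last : last observed improvement rates Delta Y_T
    """
    n = len(M)
    y_hat, dy_hat = list(y_last), list(dy_last)
    for _ in range(h):
        dy_hat = [M[i] + sum(B[i][j] * dy_hat[j] for j in range(n))
                  for i in range(n)]
        y_hat = [y + dy for y, dy in zip(y_hat, dy_hat)]
    return y_hat


# Hypothetical two-age system with unequal intercepts and B = 0.
M = [-0.02, -0.01]
B = [[0.0, 0.0], [0.0, 0.0]]
y_10 = svar_project([-5.0, -3.0], [0.0, 0.0], M, B, 10)
y_100 = svar_project([-5.0, -3.0], [0.0, 0.0], M, B, 100)
gap_10 = y_10[0] - y_10[1]
gap_100 = y_100[0] - y_100[1]
```

Here the gap between the two ages widens by $\hat{m}_x - \hat{m}_{x+k} = -0.01$ per year, from $-2.1$ after 10 years to $-3.0$ after 100 years, and it keeps growing with the horizon.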

3.2. The Coherent Sparse VAR Model

In this paper we propose an age-coherent extension of the SVAR model introduced in Section 3.1 by generalizing the intercepts in the projection: $\hat{M}$ in Equation (10) is extended to a series of stationary time-dependent processes. Specifically, the $x$-th element of $\hat{M}$, $\hat{m}_x$, is generalized to a process $(\hat{m}_{x,h})_h$ which varies over the projection horizon and converges in expectation to a constant universal to all ages, and therefore the projected mortality rates of all ages will be non-diverging in the long run. To this end, many stationary processes can be chosen. In this paper, we illustrate the proposed method using the hyperbolic decay process. In time-series analysis, hyperbolic decay is related to the concept of long memory (Feng and Shi 2017; Gao et al. 2020; Ho and Shi 2020), representing a type of decay slower than that of short-memory processes, such as the geometric or exponential decay processes (Hosking 1981). An application of hyperbolic decay in mortality modelling and forecasting can be found in Feng et al. (2020). Formally, under the assumption of hyperbolic decay, the intercept used in the $h$-step-ahead forecast, $\hat{m}_{x,h}$, is given by:
$$\hat{m}_{x,h} = \delta_h(d_x)\,(\hat{m}_x - \hat{m}^*) + \hat{m}^*, \tag{11}$$
where $\hat{m}_x$ is defined in the same way as in Equation (8), $\hat{m}^*$ is the long-term mean of $\hat{m}_x$ across all ages $x$, and the hyperbolic decay weight $\delta_h(d_x)$ is defined as:
$$\delta_h(d_x) = \frac{h - 1 + d_x}{h}\, \delta_{h-1}(d_x) \quad \text{and} \quad \delta_0(d_x) = 1. \tag{12}$$
When the hyperbolic parameter $d_x$ falls between 0 and 1, it holds that $\delta_h(d_x) \to 0$ as $h \to \infty$, and $\hat{m}_{x,h}$ will eventually converge (decay) to $\hat{m}^*$. Furthermore, the speed of decay is slower (resp. faster) for larger (resp. smaller) values of $d_x$. The extension from a constant $\hat{M}$ to the hyperbolic decay processes of intercepts then leads to age-coherent projection of mortality rates. More specifically, for any two ages $x$ and $x+k$, the distance between the projected mortality rates $h$ steps ahead is given by:
$$\hat{y}_{x,T+h} - \hat{y}_{x+k,T+h} = O_p(1) + O_p(h)(\hat{m}_x - \hat{m}_{x+k}) = O_p(1) + O_p(h)(\hat{m}^* - \hat{m}^*) = O_p(1), \tag{13}$$
which will stay bounded in the long run. In this paper, we let $\hat{m}^*$ be the sample mean of all $\hat{m}_x$, and refer to the model with the time-varying intercepts given in Equation (11) as the coherent SVAR (CSVAR) model.
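The decaying intercepts can be sketched in a few lines. The code assumes that the recursion for $\delta_h(d_x)$ takes the standard fractional-differencing form $\delta_h = \frac{h - 1 + d_x}{h} \delta_{h-1}$ with $\delta_0 = 1$, and that $\hat{m}^*$ is the sample mean of the $\hat{m}_x$ as stated above. The function names and numerical inputs are hypothetical.

```python
def hyperbolic_weights(d, horizon):
    """Return [delta_0, ..., delta_H] with delta_0 = 1 and
    delta_h = (h - 1 + d) / h * delta_{h-1} (assumed form of the recursion)."""
    weights = [1.0]
    for h in range(1, horizon + 1):
        weights.append((h - 1 + d) / h * weights[-1])
    return weights


def csvar_intercepts(m_hat, d, horizon):
    """Intercept paths m_hat_{x,h}, h = 1..H, for every age (Equation (11))."""
    m_star = sum(m_hat) / len(m_hat)  # sample mean of all m_hat_x
    paths = []
    for m_x, d_x in zip(m_hat, d):
        w = hyperbolic_weights(d_x, horizon)
        paths.append([w[h] * (m_x - m_star) + m_star
                      for h in range(1, horizon + 1)])
    return paths


# Hypothetical two-age example with d_x = 0.5 for both ages.
weights = hyperbolic_weights(0.5, 2)
paths = csvar_intercepts(m_hat=[-0.03, -0.01], d=[0.5, 0.5], horizon=10000)
```

With $d_x = 0.5$ the first weights are 1, 0.5, and 0.375, and both intercept paths approach the common mean $-0.02$ at a hyperbolic rather than geometric rate, so the two intercepts are still slightly apart after 10,000 steps.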
Despite the aforementioned desirable feature of age-coherent forecasts, the determination of $d_x$ is a non-trivial issue in the CSVAR model. There are two major challenges. First, without parameter restrictions, the introduction of $d_x$ could increase the number of parameters by $N$ ($d_1, \dots, d_N$). To address this issue, an appropriate dimensionality reduction technique is required. In particular, as argued in Li and Lu (2017), mortality changes of neighboring ages are typically rather smooth. Therefore, it is reasonable to assume that $d_x$ is a smooth function of $x$ to cope with the empirical patterns. Furthermore, it is argued in existing studies that mortality decline will be lower at older ages (see, for example, Li et al. 2013). This suggests that appropriate functional forms should be imposed such that $d_x$ is larger for larger $x$, and thus the corresponding $\hat{m}_{x,h}$ will converge to $\hat{m}^*$ more slowly, in line with the slower decay of $\delta_h(d_x)$ for larger $d_x$. Second, as the parameters $d_x$ only play a role in the projection phase, there is no data to identify the optimal parametric structure of $d_x$ (over $x$). In particular, the historical data were already used to estimate $\hat{M}$. Thus, in order to identify the parametric structure of $d_x$, estimation procedures such as cross-validation should be employed.
To deal with the first challenge, in this paper we use the inverse Epanechnikov kernel evaluated at the last observation to characterize the parametric structure of $d_x$. The inverse Epanechnikov kernel is a parsimonious approach to constructing smooth functions. More specifically, let $\tau$ be the scaled index $x/N$ with $x \in (1, \dots, N)$. The (adjusted) inverse Epanechnikov kernel evaluated at $N$ is determined by $1 - K_b(\tau - 1) = 1 - K\left(\frac{\tau - 1}{b}\right)$ with $K(\tau - 1) = 0.75[1 - (\tau - 1)^2]$. The parameter $b$, ranging from 0 to 1, is the bandwidth of the inverse Epanechnikov kernel. For example, for ages 0–100, the inverse Epanechnikov kernels evaluated at age 100 are displayed in Figure 1 for $b$ = 0.1, 0.25, 0.5, 0.75 and 1. We see that the kernel $1 - K_b(\tau - 1)$ is constant for all ages up to a "cut-off" age, and decreases afterwards. Specifically, the cut-off age is determined by the bandwidth parameter $b$. For example, if $b = 0.1$, then the cut-off is the 90th percentile of all ages considered. In other words, the kernel will be constant for the first 90% of ages. On the other hand, when $b = 1$, the cut-off age is the first age in the sample, and thus the kernel will be decreasing over the whole age range. Finally, the kernel converges to 0.25 at the last age, regardless of the value of $b$. Therefore, the inverse Epanechnikov kernel provides a convenient tool to specify the smooth parametric structure of $d_x$.
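The kernel shape described above can be checked with a short sketch. It assumes the usual convention that the Epanechnikov kernel $K(u) = 0.75(1 - u^2)$ is zero outside $|u| \le 1$, which is what produces the flat segment before the cut-off age. The function name and the choice $N = 100$ are illustrative.

```python
def inverse_epanechnikov(x, n_ages, b):
    """Adjusted inverse Epanechnikov kernel 1 - K((tau - 1) / b), tau = x / N.

    K(u) = 0.75 * (1 - u^2) for |u| <= 1 and 0 otherwise (assumed support).
    """
    tau = x / n_ages
    u = (tau - 1.0) / b
    k = 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0
    return 1.0 - k


n_ages = 100
narrow = [inverse_epanechnikov(x, n_ages, 0.1) for x in range(1, n_ages + 1)]
wide = [inverse_epanechnikov(x, n_ages, 1.0) for x in range(1, n_ages + 1)]
```

For $b = 0.1$ the values stay at 1 up to roughly the 90th age index and then fall to 0.25 at the last age, while for $b = 1$ they decline over the whole age range and end at the same value of 0.25.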
Next, we can model $d_x$ as $d_x = d_1^{\,1 - K_b(\tau - 1)}$, so that $d_x$ is positive and bounded by 1, and has a declining speed of convergence over age. All in all, the proposed parametric structure of $d_x$ requires only two additional parameters to be estimated for the CSVAR model, compared to the SVAR model: the hyperbolic parameter of the first age, $d_1$, and the bandwidth of the inverse Epanechnikov kernel, $b$.
After reducing the number of parameters in the hyperbolic decay process, we now discuss the estimation of the two parameters ($d_1$, $b$). Recall that $d_x$ only plays a role in the projection phase, and hence we can treat $d_1$ and $b$ as tuning parameters, and estimate these two parameters using cross-validation (Feng et al. 2020). However, a usual cross-validation technique for time-series data, such as the expanding-window approach explained in Hyndman and Athanasopoulos (2018), does not apply to the proposed CSVAR model. The reason is that the expanding-window approach normally considers a short forecasting step, while the age-coherent feature proposed in the CSVAR model focuses on the long-term forecast. Thus, we employ a hold-out-sample approach to select the tuning parameters. Formally, the following objective function is minimized:
$$\text{RMSFE} = \sqrt{\frac{1}{N (T/5)} \sum_{i=1}^{N} \sum_{h=1}^{T/5} \left( \hat{y}_{i,4T/5+h} - y_{i,4T/5+h} \right)^2}, \tag{14}$$
where RMSFE is the root mean squared forecasting error, and the evaluation period is given by the last fifth ($[4T/5, T]$) of the data in our study.²
In summary, mortality forecasts from the proposed CSVAR model are generated with the following procedure. First, we derive the optimal values of $d_1$ and $b$ via a grid search based on the hold-out-sample cross-validation. At this stage, the coefficients in the SVAR model, $\hat{M}$ and $\hat{B}$, are estimated using the ENET algorithm and the first 80% of the data. Second, with the optimal values of $d_1$ and $b$, the SVAR coefficients $\hat{M}$ and $\hat{B}$ are estimated again using the full sample. The full-sample estimators are used in the projection phase. Finally, after all parameters are estimated, the intercepts in the projection phase, $\hat{m}_{x,h}$, are calculated, and future mortality rates are then projected using Equation (10). Besides the mean forecasts, a simulation strategy may be applied to investigate uncertainties in the projections. For instance, the prediction interval (PI) of the estimated life expectancies can be determined via simulation based on the normal distribution assumption of the residuals $\varepsilon_t$. More specifically, we use method (I) described in Li (2014) to implement the simulation, where the 95% PI is composed of the 2.5th and 97.5th percentiles of the simulated results.
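The tuning step of this procedure can be sketched as a hold-out RMSFE combined with a plain grid search over $(d_1, b)$. The callable `holdout_forecast` is a hypothetical placeholder for "fit the SVAR on the first 80% of the data and project the last fifth with the given tuning parameters"; it is replaced by a toy function below so that the example runs on its own.

```python
import math


def rmsfe(forecasts, actuals):
    """Root mean squared forecasting error over ages (rows) and horizons (columns)."""
    squared = [(f - y) ** 2
               for f_row, y_row in zip(forecasts, actuals)
               for f, y in zip(f_row, y_row)]
    return math.sqrt(sum(squared) / len(squared))


def grid_search(d1_grid, b_grid, holdout_forecast, actuals):
    """Return (best_rmsfe, d1, b) over the grid; ties keep the first candidate."""
    best = None
    for d1 in d1_grid:
        for b in b_grid:
            score = rmsfe(holdout_forecast(d1, b), actuals)
            if best is None or score < best[0]:
                best = (score, d1, b)
    return best


# Toy stand-in for the hold-out projection (purely illustrative).
toy_forecast = lambda d1, b: [[d1 + b]]
best = grid_search([0.25, 0.5], [0.25, 0.5], toy_forecast, actuals=[[0.75]])
error = rmsfe([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 6.0]])
```

After the best pair is selected, the SVAR coefficients would be re-estimated on the full sample before projecting, as described in the second step above.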

The Multi-Population Extension

We now consider a $J$-population case. Let $y_{i,j,t}$ be the mortality rate for age $i$ of population $j$ in year $t$. Further, let $y_{j,t}$ be the $N \times 1$ vector containing the mortality rates of population $j$ in year $t$ for $j = 1, 2, \dots, J$, and $y_t$ be the $JN \times 1$ vector containing the mortality rates of all populations at time $t$. The same procedure of estimating the single-population CSVAR model may be followed with two modifications. First, $\hat{m}_{x,h,j}$ now follows the population-specific hyperbolic decay process:
$$\hat{m}_{x,h,j} = \delta_h(d_{x,j})\,(\hat{m}_{x,j} - \hat{M}^*) + \hat{M}^*, \tag{15}$$
where the parameter $d_{x,j}$ is population-dependent, and $\hat{M}^*$ is the sample mean across all the $J$ populations. Consequently, each population has its own pair of parameters $d_{1,j}$ and $b_j$. Second, in the original SVAR model, the same penalty parameter $\lambda$ is applied in the ENET algorithm and only one model is fitted. In this case the same setting used in the single-population CSVAR case can be applied. In particular, the CSVAR coefficients will be estimated from the following objective function:
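$$(\hat{M}, \hat{B}) = \arg\min_{M \in \mathbb{R}^{JN},\, B \in \mathbb{R}^{JN \times JN}} \frac{1}{2} \sum_{t=1}^{T} \left\| y_t - M - B y_{t-1} \right\|_2^2 + \lambda \sum_{i,j}^{JN} |\beta_{ij}|. \tag{16}$$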
The tuning parameters $\lambda$ and $(d_{1,j}, b_j)$, $j = 1, 2, \dots, J$, are then determined by minimizing the hold-out-sample RMSFE of mortality rates of the $J$ populations:
$$\text{RMSFE} = \sqrt{\frac{1}{J N (T/5)} \sum_{i=1}^{N} \sum_{j=1}^{J} \sum_{h=1}^{T/5} \left( \hat{y}_{i,j,4T/5+h} - y_{i,j,4T/5+h} \right)^2}. \tag{17}$$
However, we may also consider a more flexible framework which allows for population-dependent penalty parameters $\lambda_j$, $j = 1, \dots, J$. More specifically, instead of estimating a universal $\hat{B}$ for all populations using the same penalty parameter $\lambda$, we may fit $J$ matrices $\hat{B}_j$, each with its own penalty parameter $\lambda_j$, for each population separately. In other words, the population-specific regression parameters $(M_j, B_j)$ are obtained by minimizing the objective function:
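$$(\hat{M}_j, \hat{B}_j) = \arg\min_{M_j \in \mathbb{R}^{N},\, B_j \in \mathbb{R}^{N \times JN}} \frac{1}{2} \sum_{t=1}^{T} \left\| y_{j,t} - M_j - B_j y_{t-1} \right\|_2^2 + \lambda_j \sum_{i=1}^{N} \sum_{j=1}^{JN} |\beta_{ij}|. \tag{18}$$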
After all the coefficient matrices are estimated, each triple $(\lambda_j, d_{1,j}, b_j)$ is determined by minimizing the hold-out-sample RMSFE of mortality rates of population $j$ only, i.e.,
$$\text{RMSFE}_j = \sqrt{\frac{1}{N (T/5)} \sum_{i=1}^{N} \sum_{h=1}^{T/5} \left( \hat{y}_{i,j,4T/5+h} - y_{i,j,4T/5+h} \right)^2}. \tag{19}$$
Based on the estimation procedure above, it is rather efficient to generalize the age-coherent mortality projections to a large number of populations, since the parameters in each population-specific system are estimated separately and the number of parameters to be estimated will just increase linearly with the total number of observations in the data.

4. Empirical Analysis

This paper focuses on mortality data from the Human Mortality Database (2020). In particular, the proposed CSVAR model is illustrated using uni-sex, single-age mortality data of the United Kingdom (UK) and France from 1950 to 2016 and ages 0 to 100. The UK and French mortality data are chosen to illustrate the proposed CSVAR model because these two countries are both developed countries with similar socioeconomic conditions. The mortality of such countries is suitable to be modelled simultaneously, as suggested by Li and Lee (2005). Further, the UK and French mortality data have been chosen to illustrate mortality forecasting models in the existing literature, including Chang and Shi (2020a) and Chang and Shi (2020b), among many others. The log central death rates are plotted in Figure 2 across all investigated years. Consistent improvements over time are observed for both countries. It can also be seen that mortality improvements at the oldest ages are relatively more pronounced for the French population when contrasting the data of 1950 and 2016.

4.1. Long-Term Analysis

We first conduct the long-term analysis with the proposed CSVAR model using the entire sample period 1950–2016. The projections generated by the Lee–Carter (LC) and SVAR models are reported for comparison. First, Figure 3 displays the projected log mortality rates in 2100 generated by the three models, contrasted against the observed rates in 2016. While the projected mortality improvements are rather pronounced for the young and middle ages, the forecasts produced by the LC and SVAR models show little improvement at old ages, especially for the UK data. In contrast, owing to its age-coherent property, the CSVAR model produces much smoother mortality improvements across ages. In particular, its projected improvements are more similar across ages: relative to the other models, they are more pronounced at old ages and less substantial at younger ages. The more balanced projected mortality improvements are attributed to the decay in the $\hat{m}_{x,h,j}$s.
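The effect of such a decay can be illustrated with a small sketch. The exact hyperbolic weights $\delta_h(d_x)$ are defined earlier in the paper and are not reproduced here; the weight $(1 + h)^{-d_x}$ used below is only a stand-in with a similar hyperbolic shape, and the intercepts, long-term mean, and decay parameters are made-up numbers. The function name `decayed_intercepts` is likewise hypothetical.

```python
def decayed_intercepts(m_age, m_bar, d_age, horizons):
    """Blend age-specific intercepts toward a common long-term mean.

    m_age    : fitted intercepts, one per age (illustrative values).
    m_bar    : age-invariant long-term mean improvement rate.
    d_age    : positive decay parameters, one per age.
    horizons : number of forecast steps.
    The weight (1 + h) ** (-d) is an assumed stand-in, not the paper's delta_h(d_x).
    """
    paths = []
    for m_x, d_x in zip(m_age, d_age):
        path = []
        for h in range(1, horizons + 1):
            w = (1.0 + h) ** (-d_x)  # hyperbolically decaying weight
            path.append(w * m_x + (1.0 - w) * m_bar)
        paths.append(path)
    return paths

# Three ages with very different fitted improvement rates, projected 84 steps ahead.
paths = decayed_intercepts(
    m_age=[-0.030, -0.010, -0.002],
    m_bar=-0.012,
    d_age=[0.4, 0.6, 0.8],
    horizons=84,
)
spread_first = max(p[0] for p in paths) - min(p[0] for p in paths)
spread_last = max(p[-1] for p in paths) - min(p[-1] for p in paths)
```

In this toy setting the spread of the intercepts across ages shrinks as the horizon grows, and a larger decay parameter pulls an age toward the common mean faster, which is the qualitative behaviour described above.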
Next, we investigate the life expectancy projections, one of the most important applications of mortality forecasting. In this paper, we take the life expectancy at birth ($e_0$) as an example, as it is a comprehensive measure that incorporates mortality information of all ages. As argued in Li et al. (2013), without a proper rotation of $\hat{b}_x$ in the Lee–Carter model, $e_0$ is likely to be underestimated, especially in the long run. Due to the lack of age coherence, this issue may affect both the Lee–Carter and the SVAR model, as mortality improvements of the elderly could be underestimated. Figure 4 displays the mean forecasts and the prediction intervals (PIs) of $e_0$ for both the UK and French populations up to 2100. As for the point estimates, the $\hat{e}_0$s generated by the CSVAR model are uniformly larger than those of the Lee–Carter and the SVAR model. It is also worth noting that the SVAR model produces lower long-run forecasts of $e_0$ than the Lee–Carter model. More specifically, for the UK data, the point forecasts $\hat{e}_0$ produced by the Lee–Carter, SVAR, and CSVAR models grow from 81.3, 81.2, and 81.2 as of 2017 to 90.6, 88.4, and 93.5 as of 2100; for the French data, they grow from 82.6, 82.6, and 82.6 to 93.7, 92.2, and 95.7, respectively. Overall, the higher projected life expectancies generated by the CSVAR model could be attributed to the more pronounced projected mortality improvements at old ages, as shown in Figure 3. Additional conclusions can be drawn from the prediction intervals, which are generated from Gaussian-distributed temporal disturbances with 1000 simulated replicates. Despite the similar widths in early years, the lower bound of the 95% PI of the CSVAR model as of 2100 is higher than the point estimates of the Lee–Carter and SVAR models for the UK data, and higher than that of the SVAR model for the French data, implying a significant difference.
Furthermore, the PIs produced by the SVAR model are wider than those of the Lee–Carter model after around 20–30 years, whereas the CSVAR model leads to the narrowest PIs for both populations. More specifically, for the UK (resp. French) data, the widths of the 95% PIs of the Lee–Carter, SVAR, and CSVAR models as of 2100 are 6.6, 8.0, and 3.9 (resp. 7.5, 10.1, and 5.7) years, respectively.
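For readers who wish to reproduce this kind of calculation, the following sketch converts a vector of projected log central death rates into a period life expectancy at birth. The paper does not state which life-table conventions it uses, so the choices below are assumptions: deaths are taken to occur mid-year on average ($a_x = 0.5$ at every age, including age 0), and the table is closed at the last age with person-years $l_\omega / m_\omega$. The function name `life_expectancy_at_birth` is hypothetical.

```python
import math

def life_expectancy_at_birth(log_m):
    """Period life expectancy at birth from log central death rates.

    log_m : log central death rates for ages 0, 1, ..., omega.
    Assumes a_x = 0.5 at every age and closes the table at the last age
    with L = l / m; both are simplifying assumptions.
    """
    l_x = 1.0  # survivors at exact age x, with radix 1
    e0 = 0.0
    last = len(log_m) - 1
    for x, lm in enumerate(log_m):
        m = math.exp(lm)
        if x == last:
            e0 += l_x / m  # open-ended last age group
            break
        q = m / (1.0 + 0.5 * m)  # death probability implied by the central rate
        d = l_x * q
        e0 += l_x - 0.5 * d  # person-years lived between ages x and x + 1
        l_x -= d
    return e0

# Toy usage: a flat schedule with m_x = 0.02 at ages 0 to 100.
flat = [math.log(0.02)] * 101
e0_flat = life_expectancy_at_birth(flat)
```

With a flat schedule the result equals $1/m$ under these conventions, which gives a simple sanity check; applying the function to each simulated mortality path would yield the replicates from which prediction intervals of $e_0$ can be read off.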

4.2. Out-of-Sample Forecast

Next, we consider the out-of-sample forecasting performance of the proposed CSVAR model. In particular, we compare its forecasting accuracy with those of the Lee–Carter model and the SVAR model. Following Li and Lu (2017) and Feng et al. (2020), the training sample is set to 1950–2000, and the remaining data are used as the test sample. The selected tuning parameters of the SVAR and CSVAR models are reported in Appendix A. For each model, we consider the RMSFEs by age and by forecasting horizon separately, together with an overall measure, as follows:
$$
\mathrm{RMSFE}_x = \sqrt{ \frac{1}{16} \sum_{h=1}^{16} \left( y_{x,T+h} - \hat{y}_{x,T+h} \right)^2 }, \qquad
\mathrm{RMSFE}_h = \sqrt{ \frac{1}{101} \sum_{x=0}^{100} \left( y_{x,T+h} - \hat{y}_{x,T+h} \right)^2 },
$$
$$
\mathrm{RMSFE}_{all,h} = \sqrt{ \frac{1}{101 \times h} \sum_{i=1}^{h} \sum_{x=0}^{100} \left( y_{x,T+i} - \hat{y}_{x,T+i} \right)^2 }.
$$
$\mathrm{RMSFE}_x$ (resp. $\mathrm{RMSFE}_h$) is the RMSFE averaged over all 16 forecasting steps for age group $x$ (resp. all the 101 ages at forecasting horizon $h$), and $\mathrm{RMSFE}_{all,h}$ is the overall measure accounting for all ages and forecasting horizons up to $h$.
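The three measures differ only in the dimension over which squared errors are pooled. A minimal sketch follows, assuming the observed and forecast log rates are stored as nested lists indexed as `[age][step]`; the function names and the layout are illustrative choices rather than anything prescribed by the paper.

```python
import math

def rmsfe_by_age(actual, forecast):
    """RMSFE_x: one value per age, pooled over all forecast steps."""
    return [
        math.sqrt(sum((a - f) ** 2 for a, f in zip(row_a, row_f)) / len(row_a))
        for row_a, row_f in zip(actual, forecast)
    ]

def rmsfe_by_horizon(actual, forecast):
    """RMSFE_h: one value per forecast step, pooled over all ages."""
    n_age, n_step = len(actual), len(actual[0])
    return [
        math.sqrt(
            sum((actual[x][h] - forecast[x][h]) ** 2 for x in range(n_age)) / n_age
        )
        for h in range(n_step)
    ]

def rmsfe_overall(actual, forecast, h):
    """RMSFE_{all,h}: pooled over all ages and the first h forecast steps."""
    n_age = len(actual)
    total = sum(
        (actual[x][i] - forecast[x][i]) ** 2
        for x in range(n_age)
        for i in range(h)
    )
    return math.sqrt(total / (n_age * h))

# Toy usage with 2 ages and 2 forecast steps (made-up numbers).
actual = [[1.0, 2.0], [3.0, 4.0]]
forecast = [[1.0, 1.0], [1.0, 4.0]]
per_age = rmsfe_by_age(actual, forecast)
per_step = rmsfe_by_horizon(actual, forecast)
overall = rmsfe_overall(actual, forecast, h=2)
```

In the setting of this section, `actual` and `forecast` would each have 101 rows (ages 0 to 100) and 16 columns (2001 to 2016).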
Firstly, $\mathrm{RMSFE}_{all,h}$ and descriptive statistics of $\mathrm{RMSFE}_x$ are reported in Table 1, where bold numbers indicate the smallest quantity for each statistic. The mean $\mathrm{RMSFE}_x$ across all age groups of the CSVAR model is around 44% and 45% (resp. 5% and 6%) smaller than that resulting from the Lee–Carter model (resp. SVAR model) for the UK and France, respectively. Moreover, $Q_1$ and $Q_3$ (the first and third quartiles) indicate that the CSVAR model performs reasonably well among the three competing models, and the standard deviation of $\mathrm{RMSFE}_x$ confirms that the $\mathrm{RMSFE}_x$s of the CSVAR model are more narrowly spread than those of the Lee–Carter and the SVAR models. Finally, as indicated by $\mathrm{RMSFE}_{all,16}$, the overall performance of the proposed CSVAR model is better than that of the Lee–Carter and the SVAR model for both populations.
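As a quick check on these percentages, the Table 1 means reproduce them when each gap is expressed relative to the CSVAR value. This reading is inferred from the reported numbers rather than stated explicitly in the text. For the comparison with the Lee–Carter model (UK, then France):

$$
\frac{0.1454 - 0.1006}{0.1006} \approx 0.445, \qquad \frac{0.1661 - 0.1143}{0.1143} \approx 0.453,
$$

and for the comparison with the SVAR model:

$$
\frac{0.1056 - 0.1006}{0.1006} \approx 0.050, \qquad \frac{0.1210 - 0.1143}{0.1143} \approx 0.059.
$$

Measured against the Lee–Carter value instead, the first pair of gaps would be about 31% for both countries ($0.0448 / 0.1454 \approx 0.308$ and $0.0518 / 0.1661 \approx 0.312$).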
Secondly, Figure 5 plots the $\mathrm{RMSFE}_h$ at individual forecasting horizons ranging from 1 year (2001) to 16 years (2016). Distinct differences among the three models can be observed, especially at larger horizons. In general, the VAR models consistently outperform the Lee–Carter model at all horizons for both populations. Furthermore, compared with the SVAR model, the $\mathrm{RMSFE}_h$ of the CSVAR model is smaller in the majority of cases. Finally, as $h$ grows (especially from the 8th step onward), the $\mathrm{RMSFE}_h$ increases more slowly for the CSVAR model than for the Lee–Carter and the SVAR model, suggesting its better performance in the long run.

4.2.1. Robustness Analysis

To evaluate the robustness of the forecasting results, we perform the out-of-sample forecasting analysis under three major variant settings. Firstly, we follow Li et al. (2013) and Boonen and Li (2017) and model the logged death rates of five-year age groups instead of single ages. Secondly, we consider a shorter training sample of 1970–2000. Finally, instead of crude death rates, we consider the smoothed rates using P-splines (Eilers and Marx 1996), as employed in Hyndman and Ullah (2007). We report the resulting $\mathrm{RMSFE}_{all,16}$s in Table 2. It can be seen that the CSVAR model improves the forecasting results of the LC (resp. SVAR) model by at least 30% and 45% (resp. 7% and 8%) for the UK and French data, respectively. Therefore, we conclude that the proposed CSVAR model produces better forecasting performance than the LC and SVAR models under different settings.

4.2.2. The Two-Population Extension

We now investigate the multi-population case by modelling the UK and French data jointly. As discussed at the end of Section 3.2, we consider the situations of a uniform penalty term (CSVAR$_1$) and individual penalty terms (CSVAR$_2$). The selected tuning parameters of the SVAR and the two CSVAR models are reported in Appendix A. The results of $\mathrm{RMSFE}_x$ are summarised in Table 3, where the Li–Lee model is fitted to replace the Lee–Carter model for comparison. The following observations can be made. First, by jointly modelling the UK and French data, the forecasting results of the Li–Lee model are considerably improved over those of the Lee–Carter model displayed in Table 1. This, however, is not the case for SVAR and CSVAR$_1$. Although they perform comparatively better than the Li–Lee model for both populations, the forecasts of the CSVAR$_1$ model and the SVAR model are less accurate than those of their single-population counterparts. For instance, the $\mathrm{RMSFE}_{all,16}$ of the CSVAR$_1$ model increases from 0.1106 and 0.1358 to 0.1159 and 0.1396 for the UK and French data, respectively. One possible reason for the reduced accuracy in the two-population case is that the VAR model may produce less accurate forecasts when more irrelevant information is included (Feng and Shi 2018). Therefore, the application of a uniform penalty term may lead to estimates $\hat{M}$ and $\hat{B}$ that poorly reflect the historical mortality pattern of both populations. In contrast, when different penalty terms are allowed, the forecasting results of CSVAR$_2$ are almost uniformly better than those of the Li–Lee, SVAR, and CSVAR$_1$ models. Furthermore, compared to the single-population forecasts, CSVAR$_2$ leads to a 23% improvement over the single-population CSVAR model for the French data ($\mathrm{RMSFE}_{all,16}$ of 0.1046 vs. 0.1358). As for the UK data, the forecasts of CSVAR$_2$ are almost identical to those of the single-population CSVAR model.
This suggests that the UK mortality improvements may provide additional important information for the projection of the French improvements, but not vice versa.
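The relative changes quoted in this subsection can be verified directly from the reported $\mathrm{RMSFE}_{all,16}$ values, here taking the single-population CSVAR value as the base. For CSVAR$_2$ (France, then UK):

$$
\frac{0.1358 - 0.1046}{0.1358} \approx 0.230, \qquad \frac{0.1110 - 0.1106}{0.1106} \approx 0.004,
$$

that is, a reduction of about 23% for France and a change of well under 1% for the UK. For CSVAR$_1$, the corresponding increases are

$$
\frac{0.1159 - 0.1106}{0.1106} \approx 0.048, \qquad \frac{0.1396 - 0.1358}{0.1358} \approx 0.028
$$

for the UK and France, respectively.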

4.2.3. Forecasting of Life Expectancy of Age 0

We now compare the out-of-sample forecasting accuracy of the CSVAR model, the SVAR model, and the Lee–Carter model in terms of life expectancy projection. Again, the life expectancy at birth is used as an illustration. We fit the three models using data from 1950 to 2000, and use the projected log mortality rates to produce $\hat{e}_0$ over 2001–2016. The resulting mean forecasts and the 95% prediction intervals are then plotted against the true life expectancies in Figure 6. It can be seen that the $\hat{e}_0$s produced by the CSVAR model are uniformly larger than those generated by the Lee–Carter and the SVAR model, especially for the UK data. This is consistent with the age-coherent property of the CSVAR model. More importantly, the $\hat{e}_0$s forecast by the CSVAR model are closest to the true values at all horizons for both populations. As for the PIs, all models manage to cover the range of the true $e_0$. In terms of efficiency, however, the VAR-based models generate narrower PIs than the Lee–Carter model for both populations. In particular, the CSVAR model results in the most efficient interval estimates, with widths for the UK data almost 50% narrower than those of SVAR.
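The way simulated replicates translate into a mean forecast and an equal-tailed prediction interval can be sketched as follows. The data-generating process here is a deliberately simple assumption, a random walk with drift and Gaussian shocks applied directly to the quantity of interest, and it stands in for the fitted VAR dynamics; the function name `simulate_interval`, the parameter values, and the empirical-quantile rule are all illustrative.

```python
import random

def simulate_interval(x0, drift, sigma, horizon, n_rep=1000, level=0.95, seed=0):
    """Mean forecast and equal-tailed prediction interval at one horizon.

    Toy process: x_h = x_0 + sum over steps of (drift + sigma * eps),
    with eps drawn independently from N(0, 1).
    """
    rng = random.Random(seed)
    finals = []
    for _ in range(n_rep):
        x = x0
        for _ in range(horizon):
            x += drift + rng.gauss(0.0, sigma)
        finals.append(x)
    finals.sort()
    k = round((1.0 - level) / 2.0 * n_rep)  # replicates cut from each tail
    lower = finals[k]
    upper = finals[n_rep - 1 - k]
    mean = sum(finals) / n_rep
    return mean, lower, upper

# Toy usage: 16 steps ahead from a starting value of 80 (made-up parameters).
mean_16, lower_16, upper_16 = simulate_interval(80.0, 0.1, 0.2, horizon=16)
width_16 = upper_16 - lower_16
```

A narrower interval that still covers the realized values is what is meant by a more efficient interval estimate in the comparison above.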

5. Conclusions

This paper proposes an age-coherent sparse vector autoregressive model to forecast log mortality rates. In particular, we allow the age-specific mortality improvement rates to converge to a universal long-term mean for all ages. The following key results can be drawn from our study. First, the proposed coherent VAR model generates more accurate out-of-sample forecasts than the Lee–Carter model and the sparse VAR (SVAR) model recently developed by Guibert et al. (2019), as measured by the root-mean-square forecasting errors. This result holds for both the UK and the French uni-sex, single-age mortality data for ages 0 to 100 with a training sample of 1950 to 2000 and a forecasting period of 2001 to 2016, and is robust when the training sample is shortened to 1970 to 2000 or when mortality data for five-year age groups are used instead of single-age mortality rates. Second, the proposed coherent model retains the attractive advantages of the SVAR model, and has a more flexible parametric structure than factor-based models such as the Lee–Carter and the Li–Lee model. In particular, by extending the SVAR model, the proposed model uses a lasso-based algorithm to determine the autoregressive coefficient matrix, so that the matrix is data-driven rather than based on a priori parameter constraints. Consequently, the VAR model is able to capture rather general patterns of mortality developments, such as the impact of a mortality change at a young age on that at a very old age. Third, by allowing the mortality improvement rates to converge to a universal long-term mean, the proposed model can generate coherent long-term mortality projections, i.e., projected mortality rates at different ages will not diverge in the long run, as in the spatial-temporal model by Li and Lu (2017). Moreover, by utilizing a hyperbolic decay structure, we allow the speed of convergence to vary with age.
Finally, since the number of parameters to be estimated increases only moderately with the addition of new populations, the proposed model can be applied to model the joint mortality improvements of multiple populations in an efficient manner. In the multi-population case, projected mortality rates are coherent in both the age and the population dimension. A two-population illustration is made using the UK and French data.
In this paper, we focus on the age-coherent extension of the SVAR model developed by Guibert et al. (2019). However, the proposed age-coherent extension is rather general and applicable to more VAR specifications, such as VAR models with a moving average structure or stochastic volatility. In the future, it would be interesting to explore the age-coherent mortality projections based on more general VAR specifications.

Author Contributions

Methodology, Y.S. and H.L.; formal analysis, Y.S.; writing—original draft preparation, Y.S. and H.L.; writing—review and editing, Y.S. and H.L.; visualization, Y.S. and H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC), [RGPIN-2020-05387] and [DGECR-2020-00347].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors are grateful to the University of Manitoba and Macquarie University for their support. We thank the three anonymous referees for their valuable comments on the earlier version of this paper. The usual disclaimer applies.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations and variables are used in this manuscript:
LC: Lee–Carter model
VAR: Vector Autoregression model
CBD: Cairns–Blake–Dowd model
LL: Li–Lee model
STAR: Spatial-temporal Autoregression model
SVAR: Sparse VAR model
CSVAR: Coherent Sparse VAR model
LASSO: Least Absolute Shrinkage and Selection Operator
ENET: Elastic-net
RMSFE: Root of Mean Squared Forecasting Error

Variables of single population models:
$y_{x,t}$: Log central mortality rate at age $x$ in year $t$
$a_x$: The average mortality level at each age $x$
$k_t$: The mortality index at time $t$
$b_x$: The age-specific sensitivity of $y_{x,t}$ to changes in $k_t$
$\varepsilon_{x,t}$: The normal error term
$\Delta Y_t$: The vector of differenced log central mortality rate
$M$: Intercept vector of the VAR-type models
$B$: Coefficients of $\Delta Y_{t-1}$ in the VAR-type models
$\hat{m}_{x,h}$: Forecast intercept term in the CSVAR model for age $x$ at step $h$
$\delta_h(d_x)$: The hyperbolic parameter associated with $\hat{m}_{x,h}$
$\lambda$: The ENET penalty

Additional variables of joint population models:
$B_x$: Age effect of the common factor
$K_t$: Period effect of the common factor

Appendix A. Additional Tables

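Table A1. Tuning parameters.

| Single models | SVAR: $\lambda$ | CSVAR: $\lambda$ | CSVAR: $d_1$ | CSVAR: $b$ |
|---|---|---|---|---|
| UK | −11.66 | −9.26 | 0.2974 | 0.2184 |
| FR | −11.30 | −6.89 | 0.7426 | 0.0621 |

| Joint models | SVAR: $\lambda$ | CSVAR$_1$: $\lambda$ | CSVAR$_1$: $d_{1,j}$ | CSVAR$_1$: $b_j$ | CSVAR$_2$: $\lambda_j$ | CSVAR$_2$: $d_{1,j}$ | CSVAR$_2$: $b_j$ |
|---|---|---|---|---|---|---|---|
| UK | −11.98 | −9.26 | 0.4458 | 0.1663 | −10.75 | 0.2638 | 0.2599 |
| FR | −11.98 | −9.26 | 0.7426 | 0.4789 | −11.16 | 0.3468 | 0.5311 |

Note: The values of the ENET penalties ($\lambda$ and $\lambda_j$) are reported in logarithms.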

References

  1. Boonen, Tim J., and Hong Li. 2017. Modeling and forecasting mortality with economic growth: A multipopulation approach. Demography 54: 1921–46.
  2. Booth, Heather, Rob J. Hyndman, Leonie Tickle, and Piet De Jong. 2006. Lee–Carter mortality forecasting: A multi-country comparison of variants and extensions. Demographic Research 15: 289–310.
  3. Cairns, A. J. G., David Blake, and Kevin Dowd. 2006. A two-factor model for stochastic mortality with parameter uncertainty: Theory and calibration. Journal of Risk and Insurance 73: 687–718.
  4. Chang, Le, and Yanlin Shi. 2020a. Dynamic modelling and coherent forecasting of mortality rates: A time-varying coefficient spatial-temporal autoregressive approach. Scandinavian Actuarial Journal 9: 843–863.
  5. Chang, Le, and Yanlin Shi. 2020b. Mortality forecasting with a spatially penalized smoothed var model. ASTIN Bulletin: The Journal of the IAA 51: 161–189.
  6. Chen, An, Hong Li, and Mark Schultze. 2020. Tail Index-Linked Annuity: A Longevity Risk Sharing Retirement Plan. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3664433 (accessed on 1 November 2020).
  7. Dowd, Kevin, Andrew J. G. Cairns, David Blake, Guy D. Coughlan, and Marwa Khalaf-Allah. 2011. A gravity model of mortality rates for two related populations. North American Actuarial Journal 15: 334–56.
  8. Eilers, Paul H. C., and Brian D. Marx. 1996. Flexible smoothing with B-splines and penalties. Statistical Science 11: 89–102.
  9. Feng, Lingbing, and Yanlin Shi. 2017. Fractionally integrated garch model with tempered stable distribution: A simulation study. Journal of Applied Statistics 44: 2837–57.
  10. Feng, Lingbing, and Yanlin Shi. 2018. Forecasting mortality rates: Multivariate or univariate models? Journal of Population Research 35: 289–318.
  11. Feng, Lingbing, Yanlin Shi, and Le Chang. 2020. Forecasting mortality with a hyperbolic spatial temporal VAR model. International Journal of Forecasting 37: 255–273.
  12. Friedman, Jerome, Trevor Hastie, and Rob Tibshirani. 2010. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33: 1.
  13. Gao, Guangyuan, Kin-Yip Ho, and Yanlin Shi. 2020. Long memory or regime switching in volatility? evidence from high-frequency returns on the us stock indices. Pacific-Basin Finance Journal 61: 101059.
  14. Guibert, Quentin, Olivier Lopez, and Pierrick Piette. 2019. Forecasting mortality rate improvements with a high-dimensional VAR. Insurance: Mathematics and Economics 88: 255–72.
  15. Ho, Kin-Yip, and Yanlin Shi. 2020. Discussions on the spurious hyperbolic memory in the conditional variance and a new model. Journal of Empirical Finance 55: 83–103.
  16. Hosking, J. R. M. 1981. Fractional differencing. Biometrika 68: 165–76.
  17. Human Mortality Database. 2020. University of California, Berkeley (USA), and Max Planck Institute for Demographic Research (Germany). Available online: https://www.mortality.org/ (accessed on 1 November 2020).
  18. Hunt, Andrew, and David Blake. 2018. Identifiability, cointegration and the gravity model. Insurance: Mathematics and Economics 78: 360–68.
  19. Hyndman, Rob J., and George Athanasopoulos. 2018. Forecasting: Principles and Practice. Melbourne: OTexts.
  20. Hyndman, Rob J., Heather Booth, and Farah Yasmeen. 2013. Coherent mortality forecasting: The product-ratio method with functional time series models. Demography 50: 261–83.
  21. Hyndman, Rob J., and Md Shahid Ullah. 2007. Robust forecasting of mortality and fertility rates: A functional data approach. Computational Statistics & Data Analysis 51: 4942–56.
  22. Lee, Ronald, and Lawrence Carter. 1992. Modeling and Forecasting US Mortality. Journal of the American Statistical Association 87: 659–71.
  23. Li, Hong. 2018. Dynamic hedging of longevity risk: The effect of trading frequency. ASTIN Bulletin: The Journal of the IAA 48: 197–232.
  24. Li, Hong, Anja De Waegenaere, and Bertrand Melenberg. 2015. The choice of sample size for mortality forecasting: A bayesian learning approach. Insurance: Mathematics and Economics 63: 153–68.
  25. Li, Hong, Anja De Waegenaere, and Bertrand Melenberg. 2017. Robust mean–variance hedging of longevity risk. Journal of Risk and Insurance 84: 459–75.
  26. Li, Han, Hong Li, Yang Lu, and Anastasios Panagiotelis. 2019. A forecast reconciliation approach to cause-of-death mortality modeling. Insurance: Mathematics and Economics 86: 122–33.
  27. Li, Hong, and Johnny Siu-Hang Li. 2017. Optimizing the lee-carter approach in the presence of structural changes in time and age patterns of mortality improvements. Demography 54: 1073–95.
  28. Li, Hong, and Yang Lu. 2017. Coherent forecasting of mortality rates: A sparse vector-autoregression approach. ASTIN Bulletin: The Journal of the IAA 47: 563–600.
  29. Li, Hong, and Yang Lu. 2018. A bayesian non-parametric model for small population mortality. Scandinavian Actuarial Journal 2018: 605–28.
  30. Li, Hong, and Yang Lu. 2019. Modeling cause-of-death mortality using hierarchical archimedean copula. Scandinavian Actuarial Journal 2019: 247–72.
  31. Li, Hong, Yang Lu, and Pintao Lyu. 2018. Coherent Mortality Forecasting for Less Developed Countries. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3209392 (accessed on 1 November 2020).
  32. Li, Hong, and Yanlin Shi. 2020. Forecasting mortality with international linkages: A global vector-autoregression approach. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3700586 (accessed on 1 November 2020).
  33. Li, Hong, Ken Seng Tan, Shripad Tuljapurkar, and Wenjun Zhu. 2019. Gompertz law revisited: Forecasting mortality with a multi-factor exponential model. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3495369 (accessed on 1 November 2020).
  34. Li, Jackie. 2014. A quantitative comparison of simulation strategies for mortality projection. Annals of Actuarial Science 8: 281.
  35. Li, Nan, and Ronald Lee. 2005. Coherent mortality forecasts for a group of populations: An extension of the Lee–Carter method. Demography 42: 575–94.
  36. Li, Nan, Ronald Lee, and Patrick Gerland. 2013. Extending the lee-carter method to model the rotation of age patterns of mortality decline for long-term projections. Demography 50: 2037–51.
  37. Renshaw, Arthur E., and Steven Haberman. 2003. Lee–carter mortality forecasting with age-specific enhancement. Insurance: Mathematics and Economics 33: 255–72.
  38. Renshaw, Arthur E., and Steven Haberman. 2006. A cohort-based extension to the Lee–Carter model for mortality reduction factors. Insurance: Mathematics and Economics 38: 556–70.
  39. Shi, Yanlin. 2020. Forecasting mortality rates with the adaptive spatial temporal autoregressive model. Journal of Forecasting.
1. A maximum likelihood method may also be employed to calibrate the parameters (Renshaw and Haberman 2003).
2. Note that the choice of the length of test sample (one fifth) is common among existing studies. Adopting other popular alternatives such as the last third, fourth and tenth sample will lead to robust results.
Figure 1. Demonstrations of the inversed Epanechnikov kernel.
Figure 2. Uni-sex mortality data of the UK and France of age 0–100 and year 1950–2016.
Figure 3. Projected and actual log central death rates: 2016 vs. 2100.
Figure 4. Mean forecast and the 95% prediction intervals vs. actual life expectancy at birth: 2001–2100.
Figure 5. RMSFE over forecasting steps.
Figure 6. Forecast vs. actual life expectancy at age 0: 1991–2016.
Table 1. Summary of RMSFE over age groups. For each column, the smallest number is in bold.

| Model | $\mathrm{RMSFE}_{all,16}$ | Mean | Std. Dev. | $Q_1$ | $Q_3$ |
|---|---|---|---|---|---|
| Panel A: UK | | | | | |
| LC | 0.1623 | 0.1454 | 0.0726 | 0.0912 | 0.1935 |
| SVAR | 0.1209 | 0.1056 | 0.0592 | 0.0536 | 0.1571 |
| CSVAR | 0.1106 | 0.1006 | 0.0463 | 0.0647 | 0.1321 |
| Panel B: France | | | | | |
| LC | 0.2159 | 0.1661 | 0.1387 | 0.0548 | 0.2601 |
| SVAR | 0.1422 | 0.1210 | 0.0750 | 0.0670 | 0.1594 |
| CSVAR | 0.1358 | 0.1143 | 0.0736 | 0.0584 | 0.1530 |

Table 2. Summary of robustness check. For each column, the smallest number is in bold.

| Setting | UK: LC | UK: SVAR | UK: CSVAR | France: LC | France: SVAR | France: CSVAR |
|---|---|---|---|---|---|---|
| Five-year groups | 0.1674 | 0.1229 | 0.1150 | 0.2240 | 0.1429 | 0.1199 |
| 1970–2016 | 0.1507 | 0.1360 | 0.1163 | 0.1916 | 0.1428 | 0.1324 |
| Smoothed rates | 0.1562 | 0.1069 | 0.0987 | 0.2102 | 0.1285 | 0.1140 |

Table 3. Summary of two-population results. For each column, the smallest number is in bold.

| Model | $\mathrm{RMSFE}_{all,16}$ | Mean | Std. Dev. | $Q_1$ | $Q_3$ |
|---|---|---|---|---|---|
| Panel A: UK | | | | | |
| LL | 0.1241 | 0.1053 | 0.0660 | 0.0530 | 0.1435 |
| SVAR | 0.1207 | 0.1052 | 0.0594 | 0.0512 | 0.1569 |
| CSVAR$_1$ | 0.1159 | 0.1019 | 0.0555 | 0.0536 | 0.1354 |
| CSVAR$_2$ | 0.1110 | 0.1012 | 0.0459 | 0.0666 | 0.1348 |
| Panel B: France | | | | | |
| LL | 0.1559 | 0.1303 | 0.0860 | 0.0630 | 0.1770 |
| SVAR | 0.1442 | 0.1220 | 0.0773 | 0.0664 | 0.1600 |
| CSVAR$_1$ | 0.1396 | 0.1178 | 0.0754 | 0.0631 | 0.1469 |
| CSVAR$_2$ | 0.1046 | 0.0906 | 0.0526 | 0.0521 | 0.1172 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Li, H.; Shi, Y. Mortality Forecasting with an Age-Coherent Sparse VAR Model. Risks 2021, 9, 35. https://doi.org/10.3390/risks9020035
