Should Selection of the Optimum Stochastic Mortality Model Be Based on the Original or the Logarithmic Scale of the Mortality Rate?

Miguel Santolino

doi:10.3390/risks11100170

Department of Econometrics, Riskcenter-IREA, University of Barcelona, 08034 Barcelona, Spain

Risks 2023, 11(10), 170; https://doi.org/10.3390/risks11100170

This article belongs to the Special Issue Longevity Risk, Insurance and Pensions

Abstract

Stochastic mortality models seek to forecast future mortality rates; thus, it is apparent that the objective variable should be the mortality rate expressed in the original scale. However, the performance of stochastic mortality models—in terms, that is, of their goodness-of-fit and prediction accuracy—is often based on the logarithmic scale of the mortality rate. In this article, we examine whether the same forecast outcomes are obtained when the performance of mortality models is assessed based on the original and log scales of the mortality rate. We compare four different stochastic mortality models: the original Lee–Carter model, the Lee–Carter model with (log)normal distribution, the Lee–Carter model with Poisson distribution and the median Lee–Carter model. We show that the preferred model will depend on the scale of the objective variable, the selection criteria measure and the range of ages analysed.

Keywords: longevity risk; stochastic mortality models; Lee–Carter; model selection

JEL Classification: C18; G22; D81

1. Introduction

Human longevity has steadily increased over the last 150 years. During the first half of that period, improvements in life expectancy were mainly attributable to the reduction in infant mortality; in the second half, improvements have been mainly driven by a fall in the mortality rates of the elderly (Wilmoth 2000). Increasing human longevity and ageing represent a major challenge with implications at many societal levels, including rising pressure on healthcare and welfare systems and a declining labour force relative to the overall population. In response, actuaries and demographers have paid increasing attention to the modelling and projection of mortality rates.

One of the most influential approaches to the stochastic modelling of future mortality has undoubtedly been the parametric non-linear regression model developed by Lee and Carter (1992). In the Lee–Carter (LC) model, the mortality rate is estimated by means of a non-linear combination of age and period parameters. Many subsequent attempts at developing mortality models have drawn inspiration from the LC model, including, but not limited to, Brouhns et al. (2002), Currie et al. (2004), Renshaw and Haberman (2003, 2006), Cairns et al. (2006) and Plat (2009). Following the introduction of the concept of mortality coherence by Li and Lee (2005) to indicate that the mortality rates of related populations should not diverge infinitely, many articles have extended the LC model to focus, specifically, on such coherence. Mortality coherence of related populations has, thus, been considered in terms of gender (Li 2013; Li et al. 2016, 2021; Pitt et al. 2018; Wong et al. 2020; Yang et al. 2016) and the countries constituting a given region (Biffis et al. 2017; Chen and Millossovich 2018; Diao et al. 2021; Enchev et al. 2017; Lyu et al. 2021; Scognamiglio 2022). See Hunt and Blake (2021c) for a review of mortality models and Blake et al. (2023) for recent developments in mortality modelling.

This burgeoning of models and construction procedures, however, has introduced another element to the study of future mortality, namely that of model selection. Indeed, various frameworks have been proposed to construct and select the most suitable model in the trade-off between complexity and parsimony (Barigou et al. 2021; Hunt and Blake 2014; SriDaran et al. 2022). The construction of the optimal model typically involves selecting a base (reference) model and deciding whether to incorporate additional parameters or functions under certain selection criteria. Alternative stochastic mortality models can be used as reference in the construction of the optimal mortality model. Here, most extensions of the LC model define the reference mortality model assuming a Gaussian error structure of log mortality rates (Chang and Shi 2022; Gao and Shi 2021; Li and Lu 2017; Li and Shi 2021; SriDaran et al. 2022) or a Poisson distribution of deaths (Barigou et al. 2021; Chen and Millossovich 2018; Enchev et al. 2017; Hunt and Blake 2014; Li 2013; Li et al. 2016, 2021; Pitt et al. 2018; Wong et al. 2020; Yang et al. 2016). A less common option for the reference mortality model is to assume a binomial distribution of annual death probabilities (Atance et al. 2020) or gamma distribution for mortality rates (Huang et al. 2022).

One issue that has not received sufficient attention in the research is the impact that the selection criteria might have on the model selection decision. As Atance et al. (2020) stress, there is no single criterion for evaluating the goodness-of-fit and the prediction accuracy of stochastic mortality models. Selection criteria frequently rely on measures based on squared errors (Chang and Shi 2022; Enchev et al. 2017; Gao and Shi 2021; Li and Lu 2017; Li and Shi 2021), absolute errors (Li et al. 2016, 2021), maximum likelihood (Pitt et al. 2018; Yang et al. 2016) or a combination of these measures (Atance et al. 2020; Chen and Millossovich 2018; Huang et al. 2022; Li 2013; Wong et al. 2020). Additionally, even the same selection criteria measures are often defined based on either mortality rate predictions (estimates) (Atance et al. 2020; Chen and Millossovich 2018) or log mortality rate predictions (estimates) (Chang and Shi 2022; Enchev et al. 2017; Gao and Shi 2021; Li and Lu 2017; Li and Shi 2021; Li et al. 2021; Li and Lee 2005; Wong et al. 2020). Elsewhere, others have used a combination of measures based on mortality rates expressed on both original and log scales (Li 2013; Li et al. 2016). Predictive analytics, machine learning and artificial intelligence have become popular in recent years (Chen and Khaliq 2022; Hainaut 2018; Li 2023; Marino et al. 2023; Perla et al. 2021; Richman and Wüthrich 2021; Wang et al. 2021) and scholars are also using different selection criteria measures to compare the mortality models. For example, on the original scale, the mean squared error is used by Richman and Wüthrich (2021), the mean absolute percentage error by Wang et al. (2021) and both measures by Chen and Khaliq (2022), while, on the log scale, the mean squared error is used by Hainaut (2018).

The goal of the present article is to evaluate the implications of choosing selection criteria measures for the reference LC stochastic model based on either mortality rates or log mortality rates. The model selection measures used in this study are based on squared and absolute errors. To undertake this evaluation, we analyse the performance of stochastic reference mortality models, for a set of countries, in terms of their goodness-of-fit and prediction accuracy when the selection measures are based on either original mortality rates or log mortality rates. In doing so, we compare four alternative reference mortality models: namely, the original LC model (LC), the LC model with (log)normal distribution (LN-LC), the LC model with Poisson distribution (P-LC) and the median LC model (M-LC).

Reference stochastic mortality models are rarely compared in the literature. Claims have been made to the effect that the Poisson assumption provides a more rigorous statistical framework for analysing mortality data and that counting random variables is a more natural choice than that of modelling the death rate (Cairns et al. 2009; Li 2013; Wong et al. 2020). However, Gaussian and Poisson LC models have not been compared to date in terms of their goodness-of-fit and prediction accuracy, with the exception of Brouhns et al. (2002), who compared the two models solely in terms of the goodness-of-fit of Belgian mortality rates, concluding that the Poisson LC model performed better for ages above 90. Here, by comparing the use of selection criteria measures based on mortality rates in either the original or log scales, we seek to determine if the preference for the Gaussian or Poisson assumption is conditional on the scale involved. While Santolino (2020) introduced the LC quantile stochastic model to estimate the quantiles of the log mortality rate, here, we focus our attention on the median LC model as a specific version of the LC quantile model that models the median log mortality rate (Santolino 2021). Recall that the mean is the value that minimizes the squared error while the median minimizes the absolute error. Thus, we also seek to determine whether the median LC model is the preferred choice when absolute-error-based selection measures are used in both the log and original scales.

Finally, we also examine whether the selection of the preferred reference mortality model also depends on the interval of ages considered. In the actuarial field, the mortality patterns of greatest interest are often those that manifest at more advanced ages. Most life insurance products are defined so as to provide longevity protection, given that individuals receiving a lifetime income may live longer than accounted for in the valuation of the provision of insurer liabilities (longevity risk) (Hunt and Blake 2021a, 2021b). Annuities are usually deferred to retirement. Pension funds and annuity providers need to effectively manage the longevity risk to which they are exposed for future improvements in mortality at the ages at which periodic payments are made (OECD 2014). In this study, therefore, we analyse the performance of the four reference mortality models under the alternative selection criterion measures at ages both below and above 50 years old.

The main contribution of our study is that the focus is on the selection criteria measure and how it determines the choice of the optimal stochastic mortality model. In previous studies, the selection criteria measures are usually stated a priori and a set of mortality models is evaluated according to them to choose the preferred model. In those studies, the focus is on the design of a new—frequently more complex—mortality model. The aim of those studies is to prove that the new modelling approach outperforms previous mortality developments in terms of goodness-of-fit and/or prediction accuracy. Multiple selection criteria measures can be used to evaluate mortality models. To the best of our knowledge, however, the impact of the choice of the selection criteria measure on the selected mortality model has not been previously discussed in detail in the literature. In our study, four basic stochastic mortality models with equal complexity in their designs are stated and the impact of alternative selection measures on the model choice is discussed.

The rest of this article is structured as follows. In Section 2, we introduce our notation. Our motivation for the study is provided in Section 3. Stochastic parametric mortality models are described in Section 4. We present an application in Section 5. The analysis is illustrated for a population divided into age intervals in Section 6. Finally, a discussion is provided in Section 7.

2. Notation

Let the random variable $D_{x, t}$ denote the number of deaths in a population at age $x$ and calendar year $t$, with $x = 0, \dots, ω$ and $t = 1, \dots, T$. The central rate of mortality $m_{x, t}$ is defined as $m_{x, t} = \frac{D_{x, t}}{E_{x, t}}$, where $E_{x, t}$ is the central exposure to risk at age $x$ in year $t$. The estimated and predicted central rates of mortality are denoted as $\hat{m}_{x, t}$ and $\tilde{m}_{x, t}$, respectively.

Two measures have been preferred in the literature to compare fitted and predicted values: the sum of the squared error and the sum of the absolute percentage error (or their respective mean values). The sum of the squared error in log scale ($SSEL$) is defined as

$SSEL = \sum_{\forall (x, t)} \left( \log(m_{x, t}) - \log(\hat{m}_{x, t}) \right)^{2},$

and the sum of the absolute percentage error in log scale ($SAPEL$) as

$SAPEL = \sum_{\forall (x, t)} \left| \frac{\log(m_{x, t}) - \log(\hat{m}_{x, t})}{\log(m_{x, t})} \right| .$

The out-of-sample versions of these measures can be defined to evaluate prediction accuracy as follows. Let us consider that data until the calendar year $t^{*}$ were used to calibrate mortality models, for $1 < t^{*} < T$. The sum of the squared predicted error in log scale ($SSPEL$) is defined as

$SSPEL_{t^{*}} = \sum_{t = t^{*} + 1}^{T} \sum_{x = 0}^{ω} \left( \log(m_{x, t}) - \log(\tilde{m}_{x, t}) \right)^{2}$

and the sum of the absolute percentage predicted error in log scale ($SAPPEL$) as

$SAPPEL_{t^{*}} = \sum_{t = t^{*} + 1}^{T} \sum_{x = 0}^{ω} \left| \frac{\log(m_{x, t}) - \log(\tilde{m}_{x, t})}{\log(m_{x, t})} \right| .$
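As a minimal sketch of how the log-scale measures above could be computed, the following Python functions take observed and fitted (or predicted) rates as plain lists flattened over all (x, t) cells. The function names and the toy rates are illustrative assumptions and are not taken from the study; rates must be strictly positive and different from 1 for the ratios to be defined.

```python
import math

def ssel(observed, fitted):
    """Sum of squared errors between log observed and log fitted rates."""
    return sum((math.log(m) - math.log(mh)) ** 2
               for m, mh in zip(observed, fitted))

def sapel(observed, fitted):
    """Sum of absolute percentage errors on the log scale."""
    return sum(abs((math.log(m) - math.log(mh)) / math.log(m))
               for m, mh in zip(observed, fitted))

# Toy rates (hypothetical values), flattened over all (x, t) cells.
m_obs = [0.001, 0.010, 0.100]
m_fit = [0.0011, 0.009, 0.105]
print(ssel(m_obs, m_fit), sapel(m_obs, m_fit))
```

Passing predicted rates for the years after the calibration period instead of fitted rates gives the out-of-sample versions of the same measures.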

In line with Li and Lee (2005), who proposed the use of the explanation ratio to compare models, here, we use this ratio to evaluate the prediction accuracy of our models. If we define the historical average rate as $\bar{m}_{x, t^{*}} = \sum_{t = 1}^{t^{*}} m_{x, t} / t^{*}$ for $1 < t^{*} < T$, the explanation ratio in log scale ($RL$) can be defined as

$RL_{t^{*}} = 1 - \frac{SSPEL_{t^{*}}}{\sum_{t = t^{*} + 1}^{T} \sum_{x = 0}^{ω} \left( \log(m_{x, t}) - \log(\bar{m}_{x, t^{*}}) \right)^{2}} .$

Equivalent measures can be derived for mortality rates in the original scale. The sum of the squared error ($SSE$) is defined as

$SSE = \sum_{\forall (x, t)} \left( m_{x, t} - \hat{m}_{x, t} \right)^{2},$

the sum of the absolute percentage error ($SAPE$) as

$SAPE = \sum_{\forall (x, t)} \left| \frac{m_{x, t} - \hat{m}_{x, t}}{m_{x, t}} \right|,$

the sum of the squared predicted error ($SSPE$) as

$SSPE_{t^{*}} = \sum_{t = t^{*} + 1}^{T} \sum_{x = 0}^{ω} \left( m_{x, t} - \tilde{m}_{x, t} \right)^{2}$

and the sum of the absolute percentage predicted error ($SAPPE$) as

$SAPPE_{t^{*}} = \sum_{t = t^{*} + 1}^{T} \sum_{x = 0}^{ω} \left| \frac{m_{x, t} - \tilde{m}_{x, t}}{m_{x, t}} \right| .$

Finally, the explanation ratio $R$ can be defined as

$R_{t^{*}} = 1 - \frac{SSPE_{t^{*}}}{\sum_{t = t^{*} + 1}^{T} \sum_{x = 0}^{ω} \left( m_{x, t} - \bar{m}_{x, t^{*}} \right)^{2}} .$

A review of these and other selection measures used in the literature, including the mean absolute error, is provided by Atance et al. (2020).
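The explanation ratio compares the squared prediction error of a model with that of a benchmark that simply projects the historical average rate at each age. A minimal Python sketch of this definition is given below; the data layout (dictionaries keyed by age) and the toy values are assumptions made for illustration only.

```python
import math

def explanation_ratio(observed, predicted, history, log_scale=False):
    """observed, predicted: dict age -> list of rates for t* + 1, ..., T.
    history: dict age -> list of observed rates for t = 1, ..., t*."""
    f = math.log if log_scale else (lambda v: v)
    sspe = 0.0
    benchmark = 0.0
    for x in observed:
        m_bar = sum(history[x]) / len(history[x])  # historical mean rate
        for m, mt in zip(observed[x], predicted[x]):
            sspe += (f(m) - f(mt)) ** 2
            benchmark += (f(m) - f(m_bar)) ** 2
    return 1.0 - sspe / benchmark

# Toy example with a single age (hypothetical values).
history = {60: [0.012, 0.011, 0.010]}
observed = {60: [0.009, 0.008]}
predicted = {60: [0.0092, 0.0085]}
print(explanation_ratio(observed, predicted, history))
print(explanation_ratio(observed, predicted, history, log_scale=True))
```

A ratio of 1 corresponds to a perfect prediction, and a ratio of 0 to a model that does no better than the historical average.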

3. Motivation

To illustrate differences in the analysis of mortality rates in log scale and in the original scale, we consider the 2020 mortality rate of the Spanish male population for ages between 0 and 100 years. Data were obtained from the Human Mortality Database (HMD 2023). Let us assume that two stochastic mortality models were used to forecast the mortality rates for Spanish males.

Model A predicted a 40% higher mortality rate for each age below 50 and a 5% higher mortality rate for each age above 50.
Model B predicted a 5% higher mortality rate for each age below 50 and a 40% higher mortality rate for each age above 50.

Figure 1 shows the mortality rate predictions made by the two models and the observed Spanish male mortality rate—on the left, as represented in logarithmic scale; on the right, as represented in the original scale.

Figure 1. Illustration of mortality rate predictions made by model A and model B in log and original scales for Spanish male population in 2020.

In log scale (Figure 1, left), both model predictions appear equally acceptable; however, their prediction accuracy differs when mortality rates are analysed in the original scale (Figure 1, right). When squared errors of the log mortality rates are compared, the $SSPEL$ of models A and B takes the same value (5.780); thus, we are indifferent between the two models. However, the prediction accuracy of the two models differs when the sum of the squared error of mortality rates in the original scale is analysed. In this case, model A is preferred: the $SSPE$ of model A is 0.003 vs. 0.220 for model B.

The opposite holds if the sum of the absolute percentage error is analysed. Prediction accuracy now differs in log scale: the $SAPPEL$ of model A is 3.20 vs. 7.51 for model B. Thus, model A would be preferred. However, we are indifferent between the two models when the sum of the absolute percentage error is compared in the original scale: the $SAPPE$ of both models A and B is 22.5.
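The two measures on which models A and B tie do not depend on the observed rates at all, only on the proportional errors. The short Python check below illustrates this; it assumes that 50 ages fall in each group (ages 0 to 49 and 51 to 100), which is the split consistent with the reported values of 5.780 and 22.5. The function names are illustrative.

```python
import math

# Assumption: 50 ages in each group, consistent with the reported values.
n_young, n_old = 50, 50
ratio_a = [1.40] * n_young + [1.05] * n_old  # model A: predicted / observed
ratio_b = [1.05] * n_young + [1.40] * n_old  # model B: predicted / observed

def sspel(ratios):
    # log(m) - log(r * m) = -log(r), so the observed rate cancels out
    return sum(math.log(r) ** 2 for r in ratios)

def sappe(ratios):
    # |m - r * m| / m = |1 - r|, again independent of the observed rate
    return sum(abs(1.0 - r) for r in ratios)

print(round(sspel(ratio_a), 3), round(sspel(ratio_b), 3))  # 5.78 5.78
print(round(sappe(ratio_a), 1), round(sappe(ratio_b), 1))  # 22.5 22.5
```

By contrast, the $SSPE$ weights each proportional error by the squared observed rate, and the $SAPPEL$ divides it by the log observed rate, so both depend on where along the age range the larger errors occur.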

4. Stochastic Mortality Models

4.1. Lee–Carter Stochastic Mortality Model

The original Lee–Carter mortality model, introduced in 1992, can be defined as follows (Lee and Carter 1992):

$\log(m_{x, t}) = a_{x} + b_{x} \cdot k_{t} + ε_{x, t}$ (1)

where $a_{x}$ and $b_{x}$ are the age-specific parameters and $k_{t}$ is the time-varying index, $x = 0, \dots, ω$ and $t = 1, \dots, T$. Finally, the error $ε_{x, t}$ has mean 0 and variance $σ_{ε}^{2}$. Within this framework, infinitely many solutions exist. For any scalars $c$ and $d \neq 0$, the transformation

$\{\tilde{a}_{x}, \tilde{b}_{x}, \tilde{k}_{t}\} = \{a_{x} - c \cdot b_{x}, \frac{b_{x}}{d}, d \cdot (k_{t} + c)\}$

gives unaltered fitted values. To overcome the lack of identifiability, Lee and Carter (1992) proposed two constraints, $\sum_{x} b_{x} = 1$ and $\sum_{t} k_{t} = 0$.
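The lack of identifiability can be verified numerically. The sketch below uses small hypothetical parameter vectors (not estimates from any data set) to check that the transformation leaves the fitted values unchanged, and shows one way of imposing the two constraints afterwards; the helper names are illustrative.

```python
a = [-6.0, -4.0, -2.0]        # hypothetical age effects a_x
b = [0.5, 0.3, 0.2]           # hypothetical age sensitivities b_x
k = [3.0, 1.0, -1.0, -3.0]    # hypothetical period index k_t

def fitted(a, b, k):
    """Matrix of a_x + b_x * k_t, indexed [age][year]."""
    return [[ax + bx * kt for kt in k] for ax, bx in zip(a, b)]

c, d = 2.0, 4.0  # arbitrary scalars, d != 0
a2 = [ax - c * bx for ax, bx in zip(a, b)]
b2 = [bx / d for bx in b]
k2 = [d * (kt + c) for kt in k]

max_gap = max(abs(u - v)
              for row_u, row_v in zip(fitted(a, b, k), fitted(a2, b2, k2))
              for u, v in zip(row_u, row_v))
print(max_gap)  # numerically zero

def normalise(a, b, k):
    """Impose sum(b) = 1 and sum(k) = 0 without changing fitted values."""
    k_mean = sum(k) / len(k)
    b_sum = sum(b)
    a_n = [ax + bx * k_mean for ax, bx in zip(a, b)]
    b_n = [bx / b_sum for bx in b]
    k_n = [b_sum * (kt - k_mean) for kt in k]
    return a_n, b_n, k_n
```

Applying `normalise` to the transformed parameters recovers a parameter set that satisfies both constraints and reproduces the same fitted values.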

The conditional expectation in (1) is equal to

$E[\log(m_{x, t})] = a_{x} + b_{x} \cdot k_{t} .$ (2)

The expectation is the value that minimizes the sum of squared errors. One strategy for estimating the parameters is to minimize the squared residuals; however, this model cannot be directly estimated using ordinary least squares because the right-hand side of Equation (1) is not linear in the parameters. To estimate the coefficients, Lee and Carter (1992) proposed the application of singular value decomposition (SVD), that is, decomposing the matrix of log mortality rates once the average over time of log age-specific rates has been subtracted. This way, a vector of coefficient estimates $\hat{θ}$ is obtained, $\hat{θ} = (\hat{a}_{0}, \dots, \hat{a}_{ω}, \hat{b}_{0}, \dots, \hat{b}_{ω}, \hat{k}_{1}, \dots, \hat{k}_{T})^{⊤}$. Note that, in general, it does not hold that

$E[\log(m_{x, t})] = \log(E[m_{x, t}]) .$

The authors suggested recalibrating $\hat{k}_{t}$ via an iterative process to match the estimated number of deaths with the observed number of deaths in period $t$,

$\sum_{x} d_{x, t} = \sum_{x} E_{x, t} \exp(\hat{a}_{x} + \hat{b}_{x} \cdot \hat{k}_{t}),$

where $d_{x, t}$ is the observed number of deaths in period $t$ and at age $x$. The motivation for this second-stage estimate is to avoid sizeable discrepancies between the numbers of predicted and actual deaths, which are likely to occur because the first step is based on logarithms of death rates (Brouhns et al. 2002; Lee and Carter 1992). In this study, we consider the original LC model (without second-stage recalibration) for the purpose of comparison with the LC model with lognormal error distribution (see Section 4.2).
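The first-stage SVD estimation can be sketched in a few lines of Python with NumPy. This is an illustrative implementation under the constraints stated above, not the code used in the study; the function name and the synthetic data are assumptions, and the second-stage recalibration of the period index is deliberately omitted.

```python
import numpy as np

def fit_lee_carter_svd(log_m):
    """log_m: array of shape (ages, years) holding log central death rates."""
    a = log_m.mean(axis=1)                      # a_x: time average by age
    centred = log_m - a[:, None]
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    b = u[:, 0]                                 # leading left singular vector
    k = s[0] * vt[0, :]                         # leading period component
    scale = b.sum()                             # rescale so that sum(b) = 1
    b, k = b / scale, k * scale
    return a, b, k                              # sum(k) = 0 by construction

# Synthetic rank-one example (hypothetical parameters, no noise).
a_true = np.array([-6.0, -4.0, -2.0])
b_true = np.array([0.5, 0.3, 0.2])
k_true = np.array([3.0, 1.0, -1.0, -3.0])
log_m = a_true[:, None] + np.outer(b_true, k_true)
a_hat, b_hat, k_hat = fit_lee_carter_svd(log_m)
print(a_hat, b_hat, k_hat)
```

Because each row of the centred matrix sums to zero over time, the leading right singular vector also sums to zero, so the constraint on the period index holds without further adjustment. The rescaling step assumes that the entries of the leading left singular vector do not sum to zero.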

4.2. Lee–Carter Model with Known Parametric Distribution

The error term $ε_{x, t}$ in (1) is often assumed to be normally distributed. In that case, it holds that $\log(m_{x, t})$ is normally distributed with $E[\log(m_{x, t})] = a_{x} + b_{x} \cdot k_{t}$ and variance $σ_{ε}^{2}$, i.e., $\log(m_{x, t}) \sim N(E[\log(m_{x, t})], σ_{ε}^{2})$. This is equivalent to assuming that $m_{x, t}$ is lognormally distributed as follows:

$m_{x, t} \sim LN(E[\log(m_{x, t})], σ_{ε}^{2}),$

with $E[m_{x, t}] = \exp(a_{x} + b_{x} \cdot k_{t} + \frac{σ_{ε}^{2}}{2})$ and variance $V[m_{x, t}] = (\exp(σ_{ε}^{2}) - 1) \cdot \exp(2 \cdot (a_{x} + b_{x} \cdot k_{t}) + σ_{ε}^{2})$.
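The gap between the two scales under the lognormal assumption can be written out explicitly. The following lines are a standard property of the lognormal distribution, added here as an illustration rather than as a result of the study:

```latex
\begin{align*}
\log\left(E[m_{x,t}]\right)
  &= a_{x} + b_{x} \cdot k_{t} + \frac{\sigma_{\varepsilon}^{2}}{2} \\
  &= E[\log(m_{x,t})] + \frac{\sigma_{\varepsilon}^{2}}{2},
\qquad
Q_{0.5}(m_{x,t}) = \exp\left(a_{x} + b_{x} \cdot k_{t}\right).
\end{align*}
```

In other words, exponentiating the fitted log rate gives the median of the mortality rate, and the mean is larger by the factor $\exp(σ_{ε}^{2} / 2)$.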

Brouhns et al. (2002) reported that the logarithm of the observed mortality rates was much more variable at older ages and suggested modelling the number of deaths by means of a non-linear Poisson regression model with exposure to risk. The Poisson Lee–Carter model proposed by Brouhns et al. (2002) is represented as a generalized non-linear regression model as follows:

- The random component: $D_{x, t} \sim Poisson(E_{x, t} \cdot E[m_{x, t}])$, where $E[D_{x, t}] = E_{x, t} \cdot E[m_{x, t}]$ or, equivalently, $E[m_{x, t}] = \frac{E[D_{x, t}]}{E_{x, t}}$.
- The systematic component: $η_{x, t} = a_{x} + b_{x} \cdot k_{t}$.
- The link function: $g(\frac{E[D_{x, t}]}{E_{x, t}}) = g(E[m_{x, t}]) = η_{x, t}$.

In the case of the Poisson regression, the canonical link function $g$ is the logarithmic function, so

$\log(E[m_{x, t}]) = a_{x} + b_{x} \cdot k_{t},$ (3)

or, equivalently,

$E[m_{x, t}] = \exp(a_{x} + b_{x} \cdot k_{t}) .$
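To make the Poisson specification concrete, the sketch below evaluates the log-likelihood implied by the three components for given parameter values. It is an illustrative function written for this purpose (its name and data layout are assumptions) and it only evaluates the objective; maximising it requires an iterative scheme that is not shown here.

```python
import math

def poisson_loglik(deaths, exposure, a, b, k):
    """Log-likelihood of D_xt ~ Poisson(E_xt * exp(a_x + b_x * k_t)).
    deaths and exposure are nested lists indexed [age][year]."""
    total = 0.0
    for x in range(len(a)):
        for t in range(len(k)):
            mu = exposure[x][t] * math.exp(a[x] + b[x] * k[t])
            d = deaths[x][t]
            # log of the Poisson probability mass function at d
            total += d * math.log(mu) - mu - math.lgamma(d + 1)
    return total

# One age and one year (hypothetical values): expected deaths equal 2.
ll = poisson_loglik([[2]], [[100.0]], [math.log(0.02)], [1.0], [0.0])
print(ll)
```

Note that the exposure enters as a multiplicative offset, so the model is fitted to death counts rather than to log mortality rates.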

Renshaw and Haberman (2006) showed that maximum-likelihood estimates may be obtained under the Gaussian and Poisson structures using an iterative process. The quasi-Poisson non-linear regression is often used to account for the overdispersion of deaths. The extra dispersion parameter of the quasi-Poisson can be calculated separately from the deviance function (Wong et al. 2020). Currie (2016) showed that many common mortality models can be expressed in the standard terminology of generalized linear or non-linear models. Villegas et al. (2018) defined a unified framework of stochastic mortality models, which they refer to as generalized age–period–cohort (GAPC) stochastic mortality models, taking their inspiration from the definition of age–period–cohort models proposed by Hunt and Blake (2015), with the Poisson Lee–Carter model being identified as a particular case. Villegas et al. (2018) also developed the R package StMoMo to fit GAPC models by maximum likelihood.

Remark 1.

Assuming that $m_{x, t}$ is lognormally distributed with $E[m_{x, t}] = \exp(a_{x} + b_{x} \cdot k_{t} + \frac{σ_{ε}^{2}}{2})$ and variance $V[m_{x, t}] = (\exp(σ_{ε}^{2}) - 1) \cdot \exp(2 \cdot (a_{x} + b_{x} \cdot k_{t}) + σ_{ε}^{2})$ is equivalent to assuming that $D_{x, t}$ is lognormally distributed with $E[D_{x, t}] = \exp(a_{x} + b_{x} \cdot k_{t} + \ln(E_{x, t}) + \frac{σ_{ε}^{2}}{2})$ and variance $V[D_{x, t}] = (\exp(σ_{ε}^{2}) - 1) \cdot \exp(2 \cdot (a_{x} + b_{x} \cdot k_{t} + \ln(E_{x, t})) + σ_{ε}^{2})$. Likelihood selection measures, such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), can be used for the lognormal and Poisson LC models. However, those measures should be used with caution when comparing the Poisson and lognormal deaths framework, since they involve a discrete and a continuous random variable, respectively.

4.3. Median Lee–Carter Model

Let $Y$ be a continuous random variable with finite expectation and cumulative distribution function $F_{Y}$ defined by $F_{Y}(y) = P(Y \leq y)$. The inverse function of $F_{Y}$ is known as the quantile function, $Q$. The quantile of order $α$ is defined as $Q_{α}(Y) = F_{Y}^{-1}(α) = \inf\{y \mid F_{Y}(y) \geq α\}$, where $α \in (0, 1)$. Santolino (2020) introduced the quantile Lee–Carter model as the quantile counterpart of expression (1). Let us consider the following expression for the log mortality rate:

$\log(m_{x, t}) = a_{x}^{α} + b_{x}^{α} \cdot k_{t}^{α} + ε_{x, t}$

where the superscript $α$ indicates the $α$-quantile associated with the parameters. The error $ε_{x, t}$ has an $α$-quantile equal to 0, $Q_{α}(ε_{x, t}) = 0$. Thus, the quantile Lee–Carter model is defined as

$Q_{α}(\log(m_{x, t})) = a_{x}^{α} + b_{x}^{α} \cdot k_{t}^{α} .$ (4)

As in the case of the mean Lee–Carter regression model, to overcome the lack of identifiability, two constraints are established, namely $\sum_{x} b_{x}^{α} = 1$ and $\sum_{t} k_{t}^{α} = 0$. The median case, i.e., $α = 0.5$, was investigated by Santolino (2021). In the same way that the mean is the value that minimizes the sum of squared deviations, the median minimizes the sum of absolute deviations. The parameters of the median LC regression can be estimated using least absolute deviation techniques (Santolino 2020, 2021).

Median regression has many of the appealing properties of the ordinary sample median (Koenker 2005). Least absolute regression estimates are less sensitive to the presence of outliers than ordinary least squares regression estimates are. Santolino (2021) showed that this feature is particularly appealing in the context of mortality rates since outliers are often observed (wars, pandemics, etc.). Another appealing property is that the median regression is stable under monotonic transformations. For any monotone function $g$, it holds that $Q_{0.5}(g(Y | X)) = g(Q_{0.5}(Y | X))$, so $Q_{0.5}(\log(m_{x, t})) = \log(Q_{0.5}(m_{x, t}))$. Therefore, unlike in the original LC framework (Lee and Carter 1992), in the M-LC model a second-stage estimate would not be required. This second property is partially evaluated in this study since the log mortality rate is modelled in the M-LC design and the selection measures are based on the mortality rate in the log and original scales.
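The stability of the median under the log transformation, and the lack of it for the mean, can be seen on a small sample. The rates below are hypothetical values chosen for illustration, with one outlier; an odd sample size is used so that the sample median is a single observation.

```python
import math
import statistics

rates = [0.002, 0.004, 0.005, 0.009, 0.050]  # hypothetical rates, one outlier
logs = [math.log(m) for m in rates]

# Median: taking logs and taking the median commute (odd sample size).
print(statistics.median(logs) == math.log(statistics.median(rates)))  # True

# Mean: exp(mean of logs) is the geometric mean, below the arithmetic mean.
geometric_mean = math.exp(statistics.mean(logs))
arithmetic_mean = statistics.mean(rates)
print(geometric_mean, arithmetic_mean)
```

The gap between the two means is the sample analogue of the discrepancy that motivates the second-stage recalibration in the original LC framework.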

5. Application

To evaluate the models’ goodness-of-fit and prediction accuracy, the mortality rate of the male population in the 0–100 age range was considered for a set of countries. Data were obtained from the Human Mortality Database (HMD 2023). In selecting the interval of years for each country, we chose the most recent period—with a minimum interval length of sixty years and a maximum of one hundred—for which complete mortality rates were available. Calendar years with null mortality rates for ages in the 0–100 range were excluded since log mortality rates cannot deal with zeros. An alternative would have been to consider null mortality rates as missing values and to use statistical techniques to impute values (Scognamiglio 2022). However, we opted to exclude these calendar years to avoid any impact on our results attributable to the application of imputation techniques. Nine countries were compared (in parentheses is the period considered in our analysis): Austria (1922–2020), Belgium (1945–2020), Canada (1921–2020), France (1921–2020), Italy (1957–2019), Japan (1947–2021), Spain (1921–2020), UK (1922–2020) and USA (1933–2020). The rest of the countries for which information was available presented null mortality rates for ages in the 0–100 range in at least one year in the last sixty and, so, were not included in the analysis.

5.1. Goodness-of-Fit

The sums of squared errors for the nine countries when evaluating their respective mortality rates in logarithmic ($SSEL$) and original ($SSE$) scales are shown in Table 1. For each country, the stochastic mortality model with the lowest value of the measure (i.e., the best fit) is highlighted in bold. In the logarithmic scale, the minimum sum of squared errors is observed for the LC model with lognormal distribution (LN-LC), followed by the original LC model. When the sum of squared errors is analysed in the original scale ($SSE$), the best fit is again provided by the LN-LC, but it is now followed by the P-LC and M-LC. The original LC framework would be our least preferred of the four when evaluating the sum of squared errors in the original scale.

Table 1. Model fit statistics. Sum of squared error when mortality rate is evaluated in logarithmic ($SSEL$) and original scales ($SSE$).

The sums of the absolute percentage error for the nine countries when the mortality rate was evaluated in the logarithmic ($SAPEL$) and original ($SAPE$) scales are shown in Table 2. The lowest $SAPEL$ value is provided by the M-LC model for six countries, so we would select this as our reference model when the minimum $SAPEL$ criterion is applied. The other three models perform similarly in terms of $SAPEL$. When the minimum $SAPE$ criterion is applied, the M-LC model would also be selected. In this case, the second best fit is provided by the original LC modelling approach.

Table 2. Model fit statistics. Sum of absolute percentage error when mortality rate is evaluated in logarithmic ($SAPEL$) and original scales ($SAPE$).

5.2. Prediction Accuracy

Stochastic mortality models seek to forecast future mortality; thus, their prediction accuracy is often more important than a particular model’s goodness-of-fit. The four mortality models compared herein have just one time-dependent parameter: that is, the time-varying index $k_{t}$ in the LC, LN-LC and P-LC models and $k_{t}^{0.5}$ in the M-LC model. In other words, the dynamics of the mortality rates are captured by the set of estimated mortality indexes $\hat{k}_{t}$ and $\hat{k}_{t}^{0.5}$, $t = 1, \dots, T$. Time-series techniques are used to project mortality indexes. For comparative purposes, in all cases, estimated mortality indexes are assumed to follow an autoregressive integrated moving average process with drift, ARIMA(1, 1, 0).
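An ARIMA(1, 1, 0) process with drift states that the first differences of the mortality index follow an AR(1) process with a constant. The sketch below fits that relationship by conditional least squares and produces point forecasts; this estimator is a simplifying assumption made for illustration (statistical packages usually rely on maximum likelihood), and the function names are hypothetical.

```python
def fit_arima110_drift(k):
    """Conditional least squares for dk_t = c + phi * dk_{t-1} + e_t."""
    dk = [k[i] - k[i - 1] for i in range(1, len(k))]
    x, y = dk[:-1], dk[1:]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = y_bar - phi * x_bar
    return c, phi

def forecast_arima110_drift(k, c, phi, horizon):
    """Point forecasts of the index for 1, ..., horizon steps ahead."""
    level, last_diff = k[-1], k[-1] - k[-2]
    path = []
    for _ in range(horizon):
        last_diff = c + phi * last_diff   # forecast of the next difference
        level += last_diff                # integrate back to the level
        path.append(level)
    return path
```

Given a list `k_hat` of estimated indexes, `forecast_arima110_drift(k_hat, *fit_arima110_drift(k_hat), horizon=30)` would return a thirty-year projection, which can then be plugged into the fitted age parameters to obtain predicted rates.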

To evaluate the models’ prediction accuracy, the following approach was followed. Model parameters were estimated with mortality data until calendar year 1990 and the model was then projected until either calendar year 2020 or the last year for which mortality data were available. The sum of the squared prediction error and the sum of the absolute percentage prediction error were computed in the logarithmic scale, $SSPEL$ and $SAPPEL$, and in the original scale, $SSPE$ and $SAPPE$. An additional year was then included in the model estimation, so that the parameters were estimated with mortality data until 1991. Mortality projections were made to the last year for which data were available and prediction errors were computed. The process was repeated with an additional year being included each time in the model estimation. In the last step, the parameters were estimated with mortality data up to and including the penultimate calendar year for which mortality data were available and, thus, mortality was projected one year ahead.
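The expanding-window procedure just described can be sketched as a simple loop. In the Python outline below, the model is passed in as a callable, so any of the four mortality models could be plugged in; the naive forecaster used in the demonstration is a hypothetical placeholder that repeats the last observed rates, not one of the models compared in the study, and all names and toy data are assumptions.

```python
def rolling_origin_scores(years, rates, fit_and_forecast, measure, first_origin):
    """years: sorted calendar years; rates: dict year -> list of rates by age.
    fit_and_forecast(train_years, rates, test_years) -> dict year -> rates.
    measure(observed, predicted) -> float, both flattened over (x, t)."""
    scores = {}
    for origin in years:
        if origin < first_origin or origin == years[-1]:
            continue
        train = [y for y in years if y <= origin]
        test = [y for y in years if y > origin]
        predicted = fit_and_forecast(train, rates, test)
        obs_flat = [m for y in test for m in rates[y]]
        pred_flat = [m for y in test for m in predicted[y]]
        scores[origin] = measure(obs_flat, pred_flat)
    return scores

def naive_forecast(train_years, rates, test_years):
    """Placeholder model: carry the last observed rates forward."""
    last = rates[train_years[-1]]
    return {y: list(last) for y in test_years}

def sspe(observed, predicted):
    return sum((m - p) ** 2 for m, p in zip(observed, predicted))

years = list(range(1988, 1994))
rates = {y: [0.010 - 0.001 * (y - 1988), 0.100 - 0.002 * (y - 1988)]
         for y in years}
scores = rolling_origin_scores(years, rates, naive_forecast, sspe,
                               first_origin=1990)
print(sorted(scores))  # [1990, 1991, 1992]
```

Each key of `scores` is a calibration end year, so counting how often a model attains the lowest score across these keys gives the kind of summary reported by country and measure.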

Table 3 displays in percentage terms the number of times that each model performed best in terms of the minimum squared prediction error and the absolute percentage prediction error in log and original scales. When prediction accuracy was evaluated in terms of the lowest squared prediction error in log scale, the LN-LC model performed best (47.23%), followed by the LC model (23.25%) and the M-LC model (22.88%). However, the lowest sum of absolute percentage prediction error in log scale was most frequently obtained by the M-LC model (38.38%), closely followed by the P-LC model (35.79%). When prediction accuracy was evaluated in terms of the lowest sum of the squared prediction error in the original scale, the P-LC model performed best (80.82%). The P-LC model also recorded the best performance in terms of obtaining the lowest sum of the absolute percentage prediction error, but was closely followed by the M-LC model (36.16% and 32.84%, respectively).

Table 3. Comparison of prediction accuracy. Percentage of times the mortality model performed best in terms of the lowest value of the selection measure.

Table 3 displays the average of the number of times that each model performed the best for the horizon period 1991–2020. Appendix A shows the performance of the mortality models for different projection periods to evaluate the impact of the selected forecast horizon on the outcomes (Table A1). The results remained quite stable for the different time horizons. A remarkable pattern is that, according to the absolute error measures ($SAPPEL$ and $SAPPE$), the P-LC model was preferred to the M-LC model for longer time horizons. However, the preference was reversed when shorter-term predictions were made (time horizons of less than or equal to ten years, approximately).

While Table 3 provides information as to just how often a model performed best in terms of the lowest selection measure value, it says nothing about how accurate the prediction was. Table 4 addresses this by showing the average explanation ratio for the four models in the log scale, $\sum_{t^{*} = 1990}^{T - 1} RL_{t^{*}} / (T - 1990)$, and the original scale, $\sum_{t^{*} = 1990}^{T - 1} R_{t^{*}} / (T - 1990)$.

On average, the explanation ratio in log scale for both the LC and LN-LC models was 93.07%, followed by the M-LC (92.37%) and, finally, the P-LC model with the lowest mean explanation ratio (83.88%). In the original scale, the order of the performance of the models is inverted. Here, the best mean explanation ratio is obtained by the P-LC model (89.60%), followed by the M-LC model (84.24%) and, finally, the LN-LC and LC models had the lowest mean explanation ratios (80.87% and 80.26%, respectively).

Table 4. Comparison of prediction accuracy. Average of the explanation ratio in log scale ($RL$) and original scale ($R$).

6. Analysis by Age Interval

Actuarial practitioners are typically interested in mortality patterns in advanced ages given their impact on life insurance pricing and provisions. In addition, heterogeneity in mortality, which is due to observable and unobservable differences among individuals, increases at older ages, producing more variability in the observed deaths of old populations (Pitacco 2019). In this section, we analyse the performance of the four reference mortality models when employing alternative selection criterion measures for young and old populations. The four goodness-of-fit selection measures were estimated by age for each mortality model. The mean and standard deviation values of the error measures by age are displayed in Figure A1 and Figure A2 of Appendix B, respectively. In terms of the estimated mean error, a change in the performance behaviour of models is observed at approximately the age of 50 years for the four selection measures (Figure A1). In terms of the standard deviation of estimated errors, as expected, higher values are observed for old ages (Figure A2). In the case of the P-LC model, a high variability in the $SSEL$ measure is also observed for young ages (Figure A2). Based on these results, the age of 50 years is selected as the breakpoint to split the age range into young and old populations. Mortality models are fitted for all ages, but the model selection measures are computed separately for two age intervals: 0–50 years (young) and 51–100 years (old). Our goal is to analyse whether model selection is dependent on the age interval considered.
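The age split can be illustrated with a short Python sketch in which the model is fitted once for all ages and only the selection measure is accumulated by age interval. This is an illustration under assumptions: `sse_log_by_age_group` is a hypothetical helper, the sum of squared errors in log scale is taken to be the sum of squared differences between observed and fitted log rates, and the rates are synthetic.

```python
import numpy as np

def sse_log_by_age_group(m_obs, m_fit, ages, cutoff=50):
    """Sum of squared log-scale errors for ages <= cutoff and > cutoff.

    m_obs, m_fit: arrays of shape (n_ages, n_years) with observed and
    fitted mortality rates; ages: array of length n_ages.
    """
    ages = np.asarray(ages)
    sq_err = (np.log(m_obs) - np.log(m_fit)) ** 2
    young = ages <= cutoff  # ages 0-50; the rest are 51-100
    return {"young": sq_err[young].sum(), "old": sq_err[~young].sum()}

ages = np.arange(0, 101)
n_years = 3
# Synthetic, exponentially increasing rates (not real data).
m_obs = np.tile((1e-4 * np.exp(0.08 * ages))[:, None], (1, n_years))
m_fit = 1.05 * m_obs  # a uniform 5% overestimate at every age
result = sse_log_by_age_group(m_obs, m_fit, ages)
print(result)
```

Because the relative error is the same at every age in this toy case, the two totals differ only through the number of cells in each interval (51 ages against 50).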

Goodness-of-fit tables for the young and old populations are provided in Appendix B. Based on the lowest sum of the squared error for the young population, the LC model for mortality rates was preferred in the log scale and the P-LC model in the original scale. However, based on the absolute percentage error for the young population, the M-LC and LC models performed best in terms of presenting the lowest $SAPEL$ and $SAPE$. In the case of the old population, the preferred model was the P-LC on the basis of the $SSEL$, the $SAPEL$ and the $SAPE$ goodness-of-fit measures. When considering the $SSE$, the LN-LC model was the preferred model for the old population.

Model prediction accuracy results are shown separately for the young and old populations (Table 5 and Table 6). Table 5 reports in percentage terms the number of times each mortality model performed best in terms of minimum squared prediction error and absolute percentage prediction error in log and original scales, differentiating by age group. For the population under 50, the mortality model providing the best prediction most frequently was the LC model, followed closely by the LN-LC and M-LC models. In contrast, the P-LC model rarely provided the best prediction in the age range 0 to 50, regardless of the prediction measure considered. However, for the population aged 51–100, the P-LC model provided the highest degree of prediction accuracy with largely overlapping values for all prediction measures.

Table 5. Comparison of prediction accuracy for population aged 50 or younger and population aged 51 and over. Percentage of times the mortality model showed the best performance (lowest value of the selection measure).

Table 6. Comparison of prediction accuracy for population aged 50 or younger and population aged 51 and over. Average of the explanation ratio in log scale ($RL$) and original scale ($R$).

Finally, Table 6 shows the average of the explanation ratio in log scale and original scale, differentiating between the young and old populations. The LC model was the model with the highest explanation ratio on average in log scale for the population aged under 50, closely followed by the LN-LC and M-LC models. In contrast, the explanation ratio of the P-LC model is notably lower (81.43%). The distance, however, is shortened when the explanation ratio is analysed in the original scale for this young population. Now, the highest mean explanation ratio is obtained by the LN-LC model (99.39%), closely followed by the LC (99.38%), M-LC (99.32%) and P-LC models (98.76%).

When analysing the prediction accuracy for the population aged 51 and over, the highest explanation ratio on average was obtained by the P-LC model in both the log scale (92.95%) and the original scale (89.49%). The performance of the other three mortality models in terms of the mean explanation ratio was notably worse in both log and original scales. In log scale, the second-best model in terms of the mean explanation ratio was the LN-LC model (86.16%), followed by the LC (86.11%) and M-LC models (83.92%). In the original scale, the second-best model was the M-LC model (80.64%), followed by the LN-LC (80.63%) and LC models (80.02%).

Remark 2.

The four reference stochastic mortality models analysed in our study are single-factor mortality models. Multiple factors may be required to capture the dynamics of mortality rates, particularly at older ages where mortality rates are higher and variability is shown to be higher. In Appendix C we provide an illustration of the performance of two-factor mortality models with lognormal and Poisson error distributions. In the case of the lognormal two-factor mortality model, the expected value of the log mortality rate is expressed as $E[\log(m_{x,t})] = a_{x} + \sum_{i=1}^{2} b_{x,i} \cdot k_{t,i}$. In the case of the Poisson two-factor mortality model, the log of the expected value of the mortality rate is $\log(E[m_{x,t}]) = \log\left(\frac{E[D_{x,t}]}{E_{x,t}}\right) = a_{x} + \sum_{i=1}^{2} b_{x,i} \cdot k_{t,i}$. The percentages of times that the two-factor models performed the best in terms of goodness-of-fit and prediction accuracy are shown in Table A6 and Table A7, respectively. The results are in line with those obtained in the case of single-factor mortality models. In terms of goodness-of-fit, the lognormal two-factor mortality model is preferred. By contrast, the Poisson two-factor model is preferred in terms of prediction accuracy. By age, the two-factor lognormal model has a better fit and better prediction for ages below 50 years, while the two-factor Poisson model has a better fit and better prediction for ages above 50 years.
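The shared linear predictor of the two models in this remark can be evaluated as in the following Python sketch. The function name `two_factor_log_rate`, the parameter values and the exposures are hypothetical and serve only to show how the same quantity $a_{x} + \sum_{i} b_{x,i} \cdot k_{t,i}$ is read differently under each distributional assumption. No estimation is performed here.

```python
import numpy as np

def two_factor_log_rate(a, b, k):
    """Linear predictor a_x + sum_i b_{x,i} * k_{t,i}.

    a: (n_ages,), b: (n_ages, 2), k: (2, n_years).
    Returns an array of shape (n_ages, n_years).
    """
    a, b, k = (np.asarray(v, dtype=float) for v in (a, b, k))
    return a[:, None] + b @ k

# Hypothetical parameter values for two ages and three years.
a = np.array([-7.0, -3.0])
b = np.array([[0.6, 0.1], [0.4, 0.9]])
k = np.array([[1.0, 0.0, -1.0], [0.5, 0.0, -0.5]])
eta = two_factor_log_rate(a, b, k)

# Lognormal variant: eta is read as E[log m_{x,t}].
# Poisson variant: eta is read as log E[m_{x,t}], so expected deaths
# equal the exposure times exp(eta).
exposure = np.full(eta.shape, 1000.0)  # hypothetical exposures E_{x,t}
expected_deaths = exposure * np.exp(eta)
print(eta)
print(expected_deaths)
```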

7. Discussion and Concluding Remarks

7.1. Discussion

Goodness-of-fit and prediction accuracy measures are usually defined in terms of the sum of squared errors and the sum of absolute percentage errors. In the case of mortality modelling, these measures may be defined for mortality rates in log scale or in the original scale. When our primary interest lies in the performance of mortality models for age ranges that present relatively low mortality rates, selection measures need to assess relative rather than absolute variations in estimations/predictions. In this case, the selected measures should be the sum of squared errors in log scale and the sum of absolute percentage errors in the original scale. In contrast, the sum of squared errors in the original scale and the sum of absolute percentage errors in log scale should be selected when the performance of mortality models for age ranges that present relatively high mortality rates is our priority.
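The different age emphasis of the four measures can be checked numerically. The Python sketch below applies the same 10% relative error to a low-rate age and a high-rate age and prints the per-observation term of each measure. The functional forms are assumptions inferred from the names of the measures (squared or absolute percentage errors, on log or original rates, without a factor of 100), and the two rates are invented for illustration.

```python
import numpy as np

def contributions(m_obs, m_hat):
    """Per-observation terms of the four selection measures (assumed forms)."""
    m_obs = np.asarray(m_obs, dtype=float)
    m_hat = np.asarray(m_hat, dtype=float)
    log_obs, log_hat = np.log(m_obs), np.log(m_hat)
    return {
        "SSEL": (log_obs - log_hat) ** 2,
        "SSE": (m_obs - m_hat) ** 2,
        "SAPEL": np.abs((log_obs - log_hat) / log_obs),
        "SAPE": np.abs((m_obs - m_hat) / m_obs),
    }

m_obs = np.array([1e-4, 0.2])  # a low-rate age and a high-rate age
m_hat = 1.10 * m_obs           # same 10% relative error at both ages
terms = contributions(m_obs, m_hat)
for name, values in terms.items():
    print(name, values)
```

Under these assumed forms, the $SSEL$ and $SAPE$ terms are identical at both ages, so they respond to relative deviations only. The $SSE$ and $SAPEL$ terms are larger at the high-rate age: the former because the absolute deviation is larger, the latter because the denominator $|\log m_{x,t}|$ shrinks as the rate approaches one.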

This distinction between selection measures defined on the basis of mortality rates in either log or original scales is relevant because of the marked differences in mortality rates with age. For instance, in 2020, in the case of the Spanish male population, the mortality rate of a 5-year-old boy was approximately 36 times lower than that of a 50-year-old male, 430 times lower than that of a 75-year-old male, and 2429 times lower than that of a 90-year-old male (Figure 1). This means that conclusions may diverge when the analysis is conducted based on selection measures defined with mortality rates on log scale, on the one hand, and with mortality rates on the original scale, on the other.

In terms of goodness-of-fit, we conclude that the best performance is provided by the LC model with lognormal distribution when selection measures are based on squared errors, regardless of the scale of the mortality rates. In logarithmic scale, the performances of the original LC model and the LN-LC model are similar, but the latter is clearly preferred to the original LC when the squared error selection measure is based on mortality rates in the original scale. In fact, both the Poisson LC model and the median LC model are preferred to the original LC model when goodness-of-fit is evaluated based on squared errors in the original scale. The LN-LC model takes into account that the expected mortality rate is higher than the exponential of the mean of the log mortality rate. That is, the Gaussian (lognormal) error distribution for mortality rates in log (original) scale seems adequate when the purpose is to minimize squared errors. Goodness-of-fit measures are often based on the absolute percentage error (Li et al. 2016, 2021) and, here, when the selection criterion is the minimum absolute percentage error, the best performance was obtained by the M-LC model in both log and original scales.

The parameters of the LN-LC and the P-LC models were estimated using maximum likelihood, whereas the parameters of the original LC model were estimated using least square optimization techniques and those of the M-LC model using least absolute optimization techniques. In the case of the original LC model, the conditional mean of the log mortality rate is estimated; in the case of the M-LC model, the conditional median of the log mortality rate is computed. In general, the mean of the log does not match the log of the mean; yet, the median of the log does match the log of the median. The M-LC model performs better than the original LC model in terms of goodness-of-fit when selection measures based on absolute errors are used, but also when the selection measure is the sum of squared errors in the original scale. Thus, least absolute optimization algorithms can be an interesting alternative to least square optimization algorithms to estimate the parameters of the LC model when we are interested in ages with relatively high mortality rates.
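The statement about means and medians can be illustrated with a small simulation. The sketch below draws hypothetical log rates from a normal distribution (the values of `mu` and `sigma` are arbitrary assumptions) and compares the back-transformed mean and median with the mean and median of the rates themselves. For a lognormal variable, the mean is $\exp(\mu + \sigma^{2}/2)$, which exceeds $\exp(\mu)$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = -5.0, 0.5  # arbitrary illustrative values

# An odd sample size makes the sample median a single observation.
log_m = rng.normal(mu, sigma, size=10_001)
m = np.exp(log_m)

exp_mean_log = np.exp(log_m.mean())        # exponential of the mean log rate
mean_rate = m.mean()                       # mean of the rate itself
exp_median_log = np.exp(np.median(log_m))  # exponential of the median log rate
median_rate = np.median(m)                 # median of the rate itself
lognormal_mean = np.exp(mu + sigma**2 / 2)  # theoretical lognormal mean

print(exp_mean_log, mean_rate, lognormal_mean)
print(exp_median_log, median_rate)
```

The exponential of the mean log rate falls below the mean rate, whereas the median is preserved by the exponential transformation because the transformation is monotone.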

Stochastic mortality models serve to predict future mortality, hence the interest in evaluating the prediction accuracy of such models. When the models’ prediction error is considered in the original scale, the most accurate predictions are obtained most often by the P-LC model. The superior performance of the P-LC model is particularly evident when prediction accuracy is evaluated in terms of the squared prediction error. When the prediction error is evaluated in log scale, the best performance is provided by the LN-LC model in terms of the squared prediction error and the M-LC model in terms of the absolute percentage prediction error. Unlike the squared prediction error in log scale, the absolute percentage prediction error in log scale penalizes prediction errors in ages associated with high mortality rates.

Mortality patterns in advanced ages attract particular attention in actuarial research given their relevance for insurance products. When considering an old population (aged 51 and over), the best fit is provided by the P-LC model when the selection criteria are defined in log scale, but also when the criteria are based on the absolute percentage error in the original scale. This means the Poisson LC should be selected if our primary concern is goodness-of-fit for a population at advanced ages. This outcome is in line with Brouhns et al. (2002), who showed that the P-LC model performed better than the original LC model at the most advanced ages (over 90) in the Belgian population in terms of the proportion of the variance accounted for by the model. However, here, unlike in Brouhns et al. (2002), we compare the prediction accuracy for different age intervals. The preference for the Poisson model becomes more explicit when the prediction accuracy is analysed for the old population (aged 51–100). In this case, all the prediction accuracy measures considered in this study show the performance of the P-LC model to be superior to that of the other models.

In short, in terms of goodness-of-fit, mortality models that perform well in log scale also perform adequately in the original scale. In general, the LC model based on the lognormal distribution is the preferred model based on squared errors, while the M-LC model is the preferred model based on absolute errors. These two models are also preferred when prediction accuracy is analysed in log scale. However, the Poisson LC model is unreservedly the one selected when prediction accuracy is analysed in the original scale, the reason being that the P-LC model performs particularly well in terms of both goodness-of-fit and prediction accuracy in the interval of ages marked by high mortality rates (population aged 51 and over). Yet, for the population aged 50 and under, the P-LC model performs worse in terms of both goodness-of-fit and prediction accuracy than the other models. However, even though it is the model with the poorest prediction accuracy, its explanation ratio in the original scale for this age interval is very high (98.76%). The explanation for this lies in the fact that the Poisson model does not perform as well as the other three models at ages associated with infinitesimal mortality rates.

Summarizing, the original LC model or the LC regression model with a heavy right-tailed error distribution, such as the lognormal distribution, was adequate to describe and predict the behaviour of mortality rates in log scale in terms of the minimum squared error ($SSPEL$). However, the parameter estimates of the original LC model (least square techniques) and the lognormal-error-distributed LC model (maximum likelihood) seem more sensitive to observations with low mortality rates than the parameter estimates of the median LC model (least absolute techniques) and the Poisson-error-distributed LC model (maximum likelihood). As a result, the last two modelling approaches showed a better performance in minimizing squared prediction errors in the original scale ($SSPE$) and in terms of the minimum absolute percentage prediction error in the original and log scales ($SAPPE$ and $SAPPEL$). Therefore, LC-based modelling approaches focused on estimating the median or assuming Poisson error distribution seem more adequate when the mortality model is designed to predict mortality rates in the original scale (or in logarithmic scale when the prediction error is measured in terms of absolute percentage deviation). When our main interest lies in predicting mortality rates in log scale and the prediction error is measured in terms of squared deviation, the original LC model or the LC model with lognormal-distributed error should be preferred.

7.2. Conclusions

In this article, we have evaluated the implications of using different selection criteria measures when seeking to choose the most suitable stochastic mortality model. We show that least absolute optimization techniques constitute an interesting alternative to least square algorithms for estimating the parameters of stochastic mortality models when our interest lies in the fitting of mortality rates expressed in the original scale. We also provide solid arguments for selecting the Poisson LC model when the main concern is the prediction accuracy of mortality rates in advanced ages (51 and over). This result has important implications, since while the Poisson assumption has traditionally been considered to provide a rigorous statistical framework, the prediction accuracies of Gaussian and Poisson LC models have rarely been compared.

In general, selection criteria measures based on log scale errors yielded approximately the same modelling preferences in the fitting and forecasting domains. That is, the quadratic and absolute error measures defined on a logarithmic scale showed roughly similar results when used to rank-order explanatory models intended to explain variation in historical data and to rank-order predictive models focused on forecasting error. This was not the case for selection criteria measures based on original scale errors for both squared and absolute deviations. In that case, the selected mortality models with the best fit results according to these measures were not the preferred mortality models in terms of prediction accuracy.

The aim of our study was to examine the implications of selection criteria measures for the choice of the most appropriate stochastic model. For this purpose, four alternative versions of the stochastic LC mortality model with the same design and number of parameters were selected. The rationale for this selection was that the comparison between the mortality models should be 'fair' and no other elements should influence the results except the underlying distributional assumption and the method of parameter estimation. We argue that our comparison of the reference mortality models is useful to researchers. The need for more complex developments in mortality modelling is often justified when researchers compare the performance improvement of their models with respect to a reference mortality model. Our study may be useful in selecting the benchmark stochastic model to include in that comparison. However, we recognize that the number of models compared is limited and that a more comprehensive selection of modelling approaches would be an advantage. In this regard, the comparison of the mortality models with recent machine learning extensions of the LC model is an interesting practical exercise for the future.

To conclude, this study provides an enhanced awareness of the implications of using different selection criteria measures in terms of their impact on the performance of mortality models. Indeed, we show that models that provide a good fit or a good prediction performance in log scale may well be inadequate in the original scale, and vice versa. Some measures are better suited to mortality estimations/predictions at ages with relatively low mortality rates, while others perform better at ages with relatively high mortality rates. The use of one selection measure or another ultimately depends on the preferences of the decision makers, but they must be aware that the mortality model they select might be conditioned on the measure used in conducting the evaluation.

Funding

The Spanish Ministry of Science and Innovation supported this study under the grant PID2019-105986GB-C21.

Data Availability Statement

Data are available in the Human Mortality Database (https://www.mortality.org, accessed on 1 December 2022).

Acknowledgments

The author is very grateful to the editor and reviewers for their comments and suggestions.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A

Table A1. Comparison of prediction accuracy by the length of the time horizon, where $\Delta T$ is the number of projected years. Percentage of times the mortality model performed best in terms of the lowest value of the selection measure.

	$SSPEL$				$SSPE$				$SAPPEL$				$SAPPE$
$\Delta T$ *	LC	LN-LC	PLC	MLC	LC	LN-LC	PLC	MLC	LC	LN-LC	PLC	MLC	LC	LN-LC	PLC	MLC
30	33.3	33.3	22.2	11.1	0	22.2	77.8	0	0	22.2	55.6	22.2	22.2	11.1	55.6	11.1
29	22.2	33.3	11.1	33.3	0	11.1	88.9	0	0	22.2	55.6	22.2	22.2	0	55.6	22.2
28	11.1	44.4	11.1	33.3	0	22.2	77.8	0	0	22.2	55.6	22.2	22.2	0	55.6	22.2
27	11.1	33.3	11.1	44.4	0	22.2	77.8	0	0	22.2	55.6	22.2	22.2	0	66.7	11.1
26	11.1	44.4	11.1	33.3	0	22.2	66.7	11.1	0	22.2	66.7	11.1	22.2	0	66.7	11.1
25	11.1	44.4	11.1	33.3	0	22.2	77.8	0	0	22.2	66.7	11.1	11.1	0	77.8	11.1
24	22.2	44.4	11.1	22.2	0	22.2	66.7	11.1	22.2	0	55.6	22.2	11.1	0	66.7	22.2
23	11.1	44.4	11.1	33.3	0	11.1	66.7	22.2	0	22.2	55.6	22.2	22.2	0	55.6	22.2
22	22.2	55.6	11.1	11.1	0	22.2	66.7	11.1	0	22.2	55.6	22.2	33.3	11.1	33.3	22.2
21	11.1	55.6	11.1	22.2	11.1	0	77.8	11.1	0	22.2	55.6	22.2	33.3	0	44.4	22.2
20	22.2	44.4	11.1	22.2	11.1	0	77.8	11.1	0	11.1	55.6	33.3	22.2	11.1	44.4	22.2
19	22.2	44.4	11.1	22.2	11.1	0	77.8	11.1	0	11.1	55.6	33.3	22.2	0	55.6	22.2
18	22.2	44.4	11.1	22.2	11.1	0	77.8	11.1	0	11.1	55.6	33.3	33.3	0	44.4	22.2
17	22.2	44.4	11.1	22.2	0	0	88.9	11.1	0	11.1	55.6	33.3	33.3	0	44.4	22.2
16	22.2	44.4	11.1	22.2	0	11.1	88.9	0	0	22.2	44.4	33.3	33.3	0	33.3	33.3
15	33.3	33.3	11.1	22.2	0	0	88.9	11.1	11.1	22.2	33.3	33.3	33.3	0	44.4	22.2
14	22.2	66.7	0	11.1	0	11.1	88.9	0	0	22.2	33.3	44.4	22.2	0	33.3	44.4
13	22.2	44.4	11.1	22.2	0	11.1	88.9	0	0	33.3	33.3	33.3	22.2	0	44.4	33.3
12	11.1	66.7	0	11.1	0	11.1	88.9	0	0	11.1	33.3	55.6	22.2	0	44.4	33.3
11	22.2	55.6	0	22.2	0	11.1	88.9	0	0	44.4	22.2	33.3	22.2	0	44.4	33.3
10	11.1	77.8	0	11.1	0	11.1	88.9	0	0	22.2	22.2	55.6	22.2	0	22.2	55.6
9	33.3	44.4	0	22.2	11.1	0	77.8	11.1	22.2	11.1	22.2	44.4	33.3	0	22.2	44.4
8	11.1	77.8	0	11.1	0	0	88.9	11.1	11.1	11.1	11.1	66.7	22.2	0	11.1	66.7
7	44.4	55.6	0	0	0	0	88.9	11.1	22.2	11.1	0	66.7	44.4	0	11.1	44.4
6	44.4	33.3	0	22.2	0	0	88.9	11.1	22.2	11.1	0	66.7	44.4	0	0	55.6
5	22.2	55.6	0	22.2	0	0	88.9	11.1	22.2	11.1	0	66.7	44.4	0	0	55.6
4	33.3	55.6	0	11.1	11.1	0	77.8	11.1	11.1	22.2	0	66.7	55.6	0	0	44.4
3	33.3	33.3	0	33.3	11.1	0	88.9	0	11.1	22.2	0	66.7	44.4	0	0	55.6
2	22.2	44.4	0	33.3	11.1	0	88.9	0	22.2	11.1	11.1	55.6	22.2	11.1	11.1	55.6
1	50.0	25.0	0	25.0	50.0	12.5	37.5	0	37.5	25.0	12.5	25.0	25.0	37.5	0	37.5

* Mortality rates of Italy for the year 2020 were not available. This country was excluded when $\Delta T = 1$.

Appendix B

Figure A1. Average of the estimated goodness-of-fit measure values by age for the LC, LN-LC, PLC and MLC mortality models: (A) $SSEL$; (B) $SSE$; (C) $SAPEL$ and (D) $SAPE$. Note: y-axis in log scale in plots (B,C).

Figure A2. Standard deviation of the estimated goodness-of-fit measure values by age for the LC, LN-LC, PLC and MLC mortality models: (A) $SSEL$; (B) $SSE$; (C) $SAPEL$ and (D) $SAPE$. Note: y-axis in log scale in plots (B,C).

Table A2. Model fit statistics. Sum of squared error when mortality rate was evaluated in logarithmic and original scale for population aged 50 or younger.

	Mortality Rate in Log Scale				Mortality Rate in Original Scale
	( $SSEL$ )				( $SSE$ × 100)
	LC	LN-LC	PLC	MLC	LC	LN-LC	PLC	MLC
AUSTRALIA	158.41	158.47	337.19	188.80	0.14	0.13	0.33	0.21
BELGIUM	164.39	164.44	266.92	176.85	0.40	0.44	0.26	0.65
CANADA	96.45	96.46	299.97	107.60	0.44	0.40	0.23	0.34
FRANCE	166.41	166.55	435.10	206.31	7.85	8.05	4.28	4.78
ITALY	69.67	69.66	142.87	90.19	0.11	0.11	0.09	0.39
JAPAN	53.40	53.42	160.02	71.70	0.45	0.46	0.12	0.37
SPAIN	176.89	176.88	323.89	201.83	7.64	8.06	1.59	6.92
UK	122.09	122.10	895.18	132.45	0.46	0.49	0.32	0.45
US	45.22	45.14	93.44	52.55	0.14	0.14	0.12	0.14

Note: Minimum values in bold.

Table A3. Model fit statistics. Sum of absolute percentage error when mortality rate was evaluated in logarithmic and original scale for population aged 50 or younger.

	Mortality Rate in Log Scale				Mortality Rate in Original Scale
	( $SAPEL$ )				( $SAPE$ )
	LC	LN-LC	PLC	MLC	LC	LN-LC	PLC	MLC
AUSTRALIA	105.87	105.89	143.15	112.04	710.12	722.26	944.90	745.26
BELGIUM	82.72	82.74	101.85	82.48	594.93	605.71	703.29	592.06
CANADA	83.84	83.84	113.66	84.32	536.47	544.17	706.55	542.81
FRANCE	107.51	107.57	177.62	107.10	622.32	635.39	1094.37	601.21
ITALY	50.19	50.20	60.57	52.96	349.61	353.89	424.91	352.65
JAPAN	51.99	52.00	79.43	55.26	343.28	347.60	513.21	344.61
SPAIN	121.34	121.33	155.23	118.12	717.83	732.95	960.75	679.00
UK	93.61	93.61	206.10	90.58	602.39	612.84	1315.15	587.47
US	55.68	55.63	77.69	55.88	340.87	343.06	488.68	334.36

Note: Minimum values in bold.

Table A4. Model fit statistics. Sum of squared error when mortality rate was evaluated in logarithmic and original scale for population aged 51 and over.

	Mortality Rate in Log Scale				Mortality Rate in Original Scale
	( $SSEL$ )				( $SSE$ )
	LC	LN-LC	PLC	MLC	LC	LN-LC	PLC	MLC
AUSTRALIA	123.80	123.04	82.76	120.50	6.46	6.25	6.02	6.27
BELGIUM	70.24	69.37	53.67	71.72	8.52	8.48	8.62	8.56
CANADA	84.93	84.42	59.27	86.28	2.57	2.49	2.53	2.52
FRANCE	72.36	71.74	39.29	62.65	9.99	9.26	9.84	9.52
ITALY	17.25	17.06	10.09	12.44	1.27	1.24	1.34	1.29
JAPAN	42.72	42.42	30.86	42.36	4.29	4.14	4.24	4.49
SPAIN	70.44	67.08	59.08	65.76	5.55	4.46	4.69	4.51
UK	125.82	125.43	41.00	144.06	4.47	4.43	3.94	4.36
US	27.41	27.33	14.93	30.81	0.48	0.46	0.47	0.49

Note: Minimum values in bold.

Table A5. Model fit statistics. Sum of absolute percentage error when mortality rate was evaluated in logarithmic and original scale for population aged 51 and over.

	Mortality Rate in Log Scale				Mortality Rate in Original Scale
	( $SAPEL$ )				( $SAPE$ )
	LC	LN-LC	PLC	MLC	LC	LN-LC	PLC	MLC
AUSTRALIA	330.22	327.48	257.04	293.24	639.26	643.88	464.75	587.85
BELGIUM	347.65	378.53	369.10	361.45	361.35	363.33	291.68	337.48
CANADA	231.87	231.17	192.84	218.62	538.40	538.74	420.72	511.58
FRANCE	586.79	556.83	611.27	607.44	470.09	480.73	306.56	412.75
ITALY	125.26	124.47	114.93	115.64	178.49	180.66	128.35	137.94
JAPAN	210.45	207.38	188.78	185.57	294.65	295.71	222.09	255.53
SPAIN	311.10	283.89	260.87	276.53	457.56	465.94	399.40	427.17
UK	343.64	346.19	238.46	322.75	675.86	684.58	357.78	662.02
US	118.42	117.85	91.54	109.91	282.10	281.60	203.50	265.35

Note: Minimum values in bold.

Appendix C

This appendix compares the performance of a two-factor lognormal mortality model (LN2F) and a two-factor Poisson mortality model (P2F) (see Remark 2). The percentages of times the models obtained the best results in terms of goodness-of-fit and prediction accuracy are shown in Table A6 and Table A7, respectively. The results are shown for all ages (population), individuals younger than 50 years (young population) and individuals older than 50 years (elderly population).

Table A6. Comparison of goodness-of-fit. Percentage of times the mortality model showed the best performance (lowest value of the selection measure).

	$SSEL$		$SSE$		$SAPEL$		$SAPE$
	LN2F	P2F	LN2F	P2F	LN2F	P2F	LN2F	P2F
All ages	100%	0%	89%	11%	62%	38%	100%	0%
Under 50	100%	0%	0%	100%	100%	0%	100%	0%
50 and over	33%	57%	89%	11%	38%	62%	0%	100%

Table A7. Comparison of prediction accuracy. Percentage of times the mortality model showed the best performance (lowest value of the selection measure).

	$SSPEL$		$SSPE$		$SAPPEL$		$SAPPE$
	LN2F	P2F	LN2F	P2F	LN2F	P2F	LN2F	P2F
All ages	47%	53%	37%	63%	40%	60%	46%	54%
Under 50	51%	49%	54%	46%	51%	49%	51%	49%
50 and over	35%	65%	37%	63%	36%	64%	36%	64%

Note

1	In the cases of Italy and Japan, this was 2019 and 2021, respectively.

References

Atance, David, Ana Debón, and Eliseo Navarro. 2020. A comparison of forecasting mortality models using resampling methods. Mathematics 8: 1550. [Google Scholar] [CrossRef]
Barigou, Karim, Stéphane Loisel, and Yahia Salhi. 2021. Parsimonious predictive mortality modeling by regularization and cross-validation with and without covid-type effect. Risks 9: 5. [Google Scholar] [CrossRef]
Biffis, Enrico, Yijia Lin, and Andreas Milidonis. 2017. The cross-section of Asia-Pacific mortality dynamics: Implications for longevity risk sharing. Journal of Risk and Insurance 84: 515–32. [Google Scholar] [CrossRef]
Blake, David, Andrew J. G. Cairns, Malene Kallestrup-Lamb, and Jesper Rangvid. 2023. Longevity risk and capital markets: The 2021–22 update. Journal of Demographic Economics 89: 299–312. [Google Scholar] [CrossRef]
Brouhns, Natacha, Michel Denuit, and Jeroen Vermunt. 2002. A Poisson Log-Bilinear Regression Approach to the Construction of Projected Life Tables. Insurance: Mathematics and Economics 31: 373–93. [Google Scholar] [CrossRef]
Cairns, Andrew J. G., David Blake, and Kevin Dowd. 2006. A Two-Factor Model for Stochastic Mortality with Parameter Uncertainty: Theory and Calibration. Journal of Risk and Insurance 73: 687–718. [Google Scholar] [CrossRef]
Cairns, Andrew J. G., David Blake, Kevin Dowd, Guy D. Coughlan, David Epstein, Alen Ong, and Igor Balevich. 2009. A Quantitative Comparison of Stochastic Mortality Models Using Data From England and Wales and the United States. North American Actuarial Journal 13: 1–35. [Google Scholar] [CrossRef]
Chang, Le, and Yanlin Shi. 2022. Forecasting mortality rates with a coherent ensemble averaging approach. ASTIN Bulletin: The Journal of the IAA 53: 1–27. [Google Scholar] [CrossRef]
Chen, Ree Yongqing, and Pietro Millossovich. 2018. Sex-specific mortality forecasting for UK countries: A coherent approach. European Actuarial Journal 8: 69–95. [Google Scholar] [CrossRef] [PubMed]
Chen, Yuan, and Abdul Q. M. Khaliq. 2022. Comparative study of mortality rate prediction using data-driven recurrent neural networks and the lee-carter model. Big Data and Cognitive Computing 6: 134. [Google Scholar] [CrossRef]
Currie, Iain D. 2016. On fitting generalized linear and non-linear models of mortality. Scandinavian Actuarial Journal 2016: 356–83. [Google Scholar] [CrossRef]
Currie, Iain D., Maria Durban, and Paul H. C. Eilers. 2004. Smoothing and forecasting mortality rates. Statistical Modelling 4: 279–98. [Google Scholar] [CrossRef]
Diao, Liqun, Yechao Meng, and Chengguo Weng. 2021. A dsa algorithm for mortality forecasting. North American Actuarial Journal 25: 438–58. [Google Scholar] [CrossRef]
Enchev, Vasil, Torsten Kleinow, and Andrew J. G. Cairns. 2017. Multi-population mortality models: Fitting, forecasting and comparisons. Scandinavian Actuarial Journal 2017: 319–42. [Google Scholar] [CrossRef]
Gao, Guangyuan, and Yanlin Shi. 2021. Age-coherent extensions of the lee-carter model. Scandinavian Actuarial Journal 2021: 998–1016. [Google Scholar] [CrossRef]
Hainaut, Donatien. 2018. A neural-network analyzer for mortality forecast. ASTIN Bulletin: The Journal of the IAA 48: 481–508. [Google Scholar] [CrossRef]
HMD. 2023. Human Mortality Database. Rostock: Max Planck Institute for Demographic Research. Berkeley: University of California. Available online: www.mortality.org (accessed on 1 December 2022).
Huang, Zhiping, Michael Sherris, Andrés M. Villegas, and Jonathan Ziveyi. 2022. Modelling USA age-cohort mortality: A comparison of multi-factor affine mortality models. Risks 10: 183.
Hunt, Andrew, and David Blake. 2014. General procedure for constructing mortality models. North American Actuarial Journal 18: 116–38.
Hunt, Andrew, and David Blake. 2015. On the Structure and Classification of Mortality Models. Working Paper. London: Pension Institute.
Hunt, Andrew, and David Blake. 2021a. Forward mortality rates in discrete time I: Calibration and securities pricing. North American Actuarial Journal 25: S482–7.
Hunt, Andrew, and David Blake. 2021b. Forward mortality rates in discrete time II: Longevity risk and hedging strategies. North American Actuarial Journal 25: S508–33.
Hunt, Andrew, and David Blake. 2021c. On the structure and classification of mortality models. North American Actuarial Journal 25: S215–34.
Koenker, Roger. 2005. Quantile Regression. Econometric Society Monographs. Cambridge: Cambridge University Press.
Lee, Ronald D., and Lawrence R. Carter. 1992. Modeling and Forecasting U.S. Mortality. Journal of the American Statistical Association 87: 659–71.
Li, Hong, and Yang Lu. 2017. Coherent forecasting of mortality rates: A sparse vector-autoregression approach. ASTIN Bulletin: The Journal of the IAA 47: 563–600.
Li, Hong, and Yanlin Shi. 2021. Mortality forecasting with an age-coherent sparse VAR model. Risks 9: 35.
Li, Jackie. 2013. A Poisson common factor model for projecting mortality and life expectancy jointly for females and males. Population Studies 67: 111–26.
Li, Jackie. 2023. A model stacking approach for forecasting mortality. North American Actuarial Journal 27: 530–45.
Li, Jackie, Leonie Tickle, and Nick Parr. 2016. A multi-population evaluation of the Poisson common factor model for projecting mortality jointly for both sexes. Journal of Population Research 33: 333–60.
Li, Jackie, Maggie Lee, and Simon Guthrie. 2021. A double common factor model for mortality projection using best-performance mortality rates as reference. ASTIN Bulletin: The Journal of the IAA 51: 349–74.
Li, Nan, and Ronald Lee. 2005. Coherent mortality forecasts for a group of populations: An extension of the Lee-Carter method. Demography 42: 575–94.
Lyu, Pintao, Anja De Waegenaere, and Bertrand Melenberg. 2021. A multi-population approach to forecasting all-cause mortality using cause-of-death mortality data. North American Actuarial Journal 25: S421–56.
Marino, Mario, Susanna Levantesi, and Andrea Nigri. 2023. A neural approach to improve the Lee-Carter mortality density forecasts. North American Actuarial Journal 27: 148–65.
OECD. 2014. Mortality Assumptions and Longevity Risk: Implications for Pension Funds and Annuity Providers. Paris: OECD Publishing.
Perla, Francesca, Ronald Richman, Salvatore Scognamiglio, and Mario V. Wüthrich. 2021. Time-series forecasting of mortality rates using deep learning. Scandinavian Actuarial Journal 2021: 572–98.
Pitacco, Ermanno. 2019. Heterogeneity in mortality: A survey with an actuarial focus. European Actuarial Journal 9: 3–30.
Pitt, David, Jackie Li, and Tian Kang Lim. 2018. Smoothing Poisson common factor model for projecting mortality jointly for both sexes. ASTIN Bulletin: The Journal of the IAA 48: 509–41.
Plat, Richard. 2009. On stochastic mortality modeling. Insurance: Mathematics and Economics 45: 393–404.
Renshaw, Arthur, and Steven Haberman. 2003. On the Forecasting of Mortality Reduction Factors. Insurance: Mathematics and Economics 32: 379–401.
Renshaw, Arthur, and Steven Haberman. 2006. A Cohort-Based Extension to the Lee-Carter Model for Mortality Reduction Factors. Insurance: Mathematics and Economics 38: 556–70.
Richman, Ronald, and Mario V. Wüthrich. 2021. A neural network extension of the Lee–Carter model to multiple populations. Annals of Actuarial Science 15: 346–66.
Santolino, Miguel. 2020. The Lee-Carter quantile mortality model. Scandinavian Actuarial Journal 2020: 614–33.
Santolino, Miguel. 2021. Median bilinear models in presence of extreme values. SORT-Statistics and Operations Research Transactions 45: 163–80.
Scognamiglio, Salvatore. 2022. Calibrating the Lee-Carter and the Poisson Lee-Carter models via Neural Networks. ASTIN Bulletin: The Journal of the IAA 52: 519–61.
SriDaran, Dilan, Michael Sherris, Andres M. Villegas, and Jonathan Ziveyi. 2022. A group regularisation approach for constructing generalised age-period-cohort mortality projection models. ASTIN Bulletin: The Journal of the IAA 52: 247–89.
Villegas, Andrés M., Vladimir K. Kaishev, and Pietro Millossovich. 2018. StMoMo: An R Package for Stochastic Mortality Modeling. Journal of Statistical Software 84: 1–38.
Wang, Chou-Wen, Jinggong Zhang, and Wenjun Zhu. 2021. Neighbouring prediction for mortality. ASTIN Bulletin: The Journal of the IAA 51: 689–718.
Wilmoth, John R. 2000. Demography of longevity: Past, present, and future trends. Experimental Gerontology 35: 1111–29.
Wong, Kennet, Jackie Li, and Sixian Tang. 2020. A modified common factor model for modelling mortality jointly for both sexes. Journal of Population Research 37: 181–212.
Yang, Bowen, Jackie Li, and Uditha Balasooriya. 2016. Cohort extensions of the Poisson common factor model for modelling both genders jointly. Scandinavian Actuarial Journal 2016: 93–112.

Figure 1. Illustration of mortality rate predictions made by model A and model B in log and original scales for the Spanish male population in 2020.

Table 1. Model fit statistics. Sum of squared error when mortality rate is evaluated in logarithmic ($SSEL$) and original scales ($SSE$).

	Mortality Rate in Log Scale				Mortality Rate in Original Scale
	( $SSEL$ )				( $SSE$ )
	LC	LN-LC	P-LC	M-LC	LC	LN-LC	P-LC	M-LC
AUSTRALIA	282.22	281.51	419.95	309.30	6.46	6.25	6.02	6.27
BELGIUM	234.63	233.81	320.58	248.57	8.53	8.49	8.63	8.57
CANADA	181.39	180.88	359.24	193.88	2.57	2.50	2.54	2.53
FRANCE	238.77	238.29	474.36	268.95	10.16	9.34	9.88	9.57
ITALY	86.91	86.72	152.97	102.63	1.28	1.24	1.34	1.30
JAPAN	96.11	95.84	190.87	114.07	4.30	4.15	4.24	4.49
SPAIN	247.33	243.97	382.97	267.59	5.63	4.54	4.71	4.58
UK	247.91	247.53	936.17	276.50	4.48	4.43	3.95	4.36
US	72.62	72.47	108.37	83.36	0.49	0.46	0.47	0.49

Note: Minimum values in bold.
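
The two halves of Table 1 can rank the same models differently. The following minimal Python sketch illustrates why, under the assumption (taken from the table caption, not from a formula in this section) that $SSE$ is the sum of squared differences between observed and fitted rates and that $SSEL$ is the same sum computed on log rates. The rates and the two fitted models, `model_x` and `model_y`, are hypothetical values chosen only for illustration; they are not results from the article.

```python
import math


def sse(observed, fitted):
    """Sum of squared errors on the original scale of the mortality rate."""
    return sum((m - f) ** 2 for m, f in zip(observed, fitted))


def ssel(observed, fitted):
    """Sum of squared errors on the logarithmic scale of the mortality rate."""
    return sum((math.log(m) - math.log(f)) ** 2 for m, f in zip(observed, fitted))


# Hypothetical observed rates at one young age and one old age.
observed = [0.001, 0.100]
# Hypothetical model X: 50% too high at the young age, exact at the old age.
model_x = [0.0015, 0.100]
# Hypothetical model Y: exact at the young age, 10% too high at the old age.
model_y = [0.001, 0.110]

for name, fitted in [("X", model_x), ("Y", model_y)]:
    print(f"model {name}: SSE = {sse(observed, fitted):.3e}, "
          f"SSEL = {ssel(observed, fitted):.3e}")
```

In this toy case model X has the lower $SSE$, because its error sits at an age where the rate is tiny, while model Y has the lower $SSEL$, because the log scale penalises relative rather than absolute deviations.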

Table 2. Model fit statistics. Sum of absolute percentage error when mortality rate is evaluated in logarithmic ($SAPEL$) and original scales ($SAPE$).

	Mortality Rate in Log Scale				Mortality Rate in Original Scale
	( $SAPEL$ )				( $SAPE$ )
	LC	LN-LC	P-LC	M-LC	LC	LN-LC	P-LC	M-LC
AUSTRALIA	436.09	433.37	400.20	405.28	1349.38	1366.15	1409.66	1333.12
BELGIUM	430.36	461.27	470.95	443.97	956.28	969.04	994.97	929.54
CANADA	315.71	315.01	306.50	302.93	1074.87	1082.81	1127.27	1054.39
FRANCE	694.30	664.41	788.89	714.54	1096.32	1116.12	1400.93	1013.96
ITALY	175.45	174.67	175.51	168.60	528.11	534.55	553.25	490.59
JAPAN	262.43	259.38	268.22	240.83	637.93	643.31	735.30	600.13
SPAIN	432.43	405.22	416.09	394.65	1175.39	1198.93	1360.14	1106.17
UK	437.25	439.80	445.46	413.34	1278.28	1297.43	1672.93	1249.48
US	174.10	173.48	169.23	165.79	622.97	624.67	692.18	599.71

Note: Minimum values in bold.

Table 3. Comparison of prediction accuracy. Percentage of times the mortality model performed best in terms of the lowest value of the selection measure.

Mortality Rate in Log Scale				Mortality Rate in Original Scale
LC	LN-LC	P-LC	M-LC	LC	LN-LC	P-LC	M-LC
$SSPEL$				$SSPE$
23.25%	47.23%	6.64%	22.88%	4.42%	8.49%	80.82%	6.27%
$SAPPEL$				$SAPPE$
6.64%	19.19%	35.79%	38.38%	28.41%	2.58%	36.16%	32.85%

Note: Maximum values in bold.

Table 4. Comparison of prediction accuracy. Average of the explanation ratio in log scale ($RL$) and original scale ($R$).

Mortality Rate in Log Scale				Mortality Rate in Original Scale
LC	LN-LC	P-LC	M-LC	LC	LN-LC	P-LC	M-LC
Average of $RL_{t^{*}}$				Average of $R_{t^{*}}$
93.07%	93.07%	83.88%	92.37%	80.26%	80.87%	89.60%	84.24%

Note: Maximum values in bold.

Table 5. Comparison of prediction accuracy for population aged 50 or younger and population aged 51 and over. Percentage of times the mortality model showed the best performance (lowest value of the selection measure).

	Mortality Rate in Log Scale				Mortality Rate in Original Scale
	LC	LN-LC	P-LC	M-LC	LC	LN-LC	P-LC	M-LC
Age	$SSPEL$				$SSPE$
50 and under	42.80%	31.73%	1.84%	23.63%	39.48%	34.69%	2.21%	23.62%
51 and over	1.48%	0.74%	95.57%	2.21%	4.42%	8.49%	80.81%	6.28%
	$SAPPEL$				$SAPPE$
50 and under	40.59%	27.31%	2.21%	29.89%	33.57%	33.21%	7.38%	25.84%
51 and over	1.84%	1.84%	91.51%	4.81%	0.74%	1.10%	96.68%	1.48%

Note: Maximum values in bold.

Table 6. Comparison of prediction accuracy for population aged 50 or younger and population aged 51 and over. Average of the explanation ratio in log scale ($RL$) and original scale ($R$).

	Mortality Rate in Log Scale				Mortality Rate in Original Scale
	LC	LN-LC	P-LC	M-LC	LC	LN-LC	P-LC	M-LC
Age	Average of $RL_{t^{*}}$				Average of $R_{t^{*}}$
50 and under	94.71%	94.70%	81.43%	94.38%	99.38%	99.39%	98.76%	99.32%
51 and over	86.11%	86.16%	92.95%	83.92%	80.02%	80.63%	89.49%	80.64%

Note: Maximum values in bold.


© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
