#### 4.1. Prediction of Underlying Death Causes

As an applied example of our proposed stochastic mortality model, and as a basis for some further applications, we take annual death data from Australia for the period 1987 to 2011. We fit our model using the matching of moments approach, as well as the maximum-likelihood approach with Markov chain Monte Carlo (MCMC). Historical Australian population data, categorised by age and gender, are taken from the Australian Bureau of Statistics

2 and data for the number of deaths categorised by death cause and divided into eight age categories, i.e., 50–54 years, 55–59 years, 60–64 years, 65–69 years, 70–74 years, 75–79 years, 80–84 years and 85+ years, denoted by

${a}_{1},\cdots ,{a}_{8}$, respectively, for each gender is taken from the AIHW

3. The provided death data is divided into 19 different death causes—based on the ICD-9 or ICD-10 classification—where we identify the following ten of them with common non-idiosyncratic risk factors:

‘certain infectious and parasitic diseases’, ‘neoplasms’, ‘endocrine, nutritional and metabolic diseases’, ‘mental and behavioural disorders’, ‘diseases of the nervous system’, ‘circulatory diseases’, ‘diseases of the respiratory system’, ‘diseases of the digestive system’, ‘external causes of injury and poisoning’, ‘diseases of the genitourinary system’. We merge the remaining eight death causes to idiosyncratic risk as their individual contributions to overall death counts are small for all categories. Data handling needs some care as there was a change in classification of death data in 1997 as explained at the website of the Australian Bureau of Statistics

4. Australia introduced the tenth revision of the International Classification of Diseases (ICD-10, following ICD-9) in 1997, with a transition period from 1997 to 1998. Within this period, comparability factors are given in

Table 1. Thus, for the period 1987 to 1996, death counts have to be multiplied by corresponding comparability factors.
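As a minimal sketch of this adjustment step, the snippet below scales death counts recorded before 1997 by a cause-specific comparability factor. The factor values and the names `COMPARABILITY` and `adjust_deaths` are hypothetical placeholders for illustration and are not the entries of Table 1.

```python
# Hypothetical ICD-9 -> ICD-10 comparability factors. These values are
# placeholders for illustration only and are NOT taken from Table 1.
COMPARABILITY = {
    "neoplasms": 1.00,
    "circulatory diseases": 1.00,
    "diseases of the respiratory system": 0.91,
}

ICD10_START = 1997  # first year coded under ICD-10


def adjust_deaths(year, cause, deaths, factors=COMPARABILITY):
    """Scale ICD-9 era counts so that they are comparable with ICD-10 counts."""
    if year < ICD10_START:
        # Causes without a listed factor are left unchanged (factor 1.0).
        return deaths * factors.get(cause, 1.0)
    return float(deaths)
```

Counts from 1997 onwards pass through unchanged, so the adjusted series can be used as a single consistent input for estimation.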

To reduce the number of parameters which have to be estimated, cohort effects are not considered, i.e.,

$\gamma =0$, and trend reduction parameters are fixed with values

$\zeta =\varphi =0$ and

$\eta =\psi =\frac{1}{150}$. This corresponds to slow trend reduction over the data and forecasting period (no acceleration) which makes the setting similar to the Lee–Carter model. Moreover, we choose the arbitrary normalisation

${t}_{0}=1987$. Results for a more advanced modelling of trend reduction are shown later in

Section 4.2. Thus, within the maximum-likelihood framework, we end up with 394 parameters, with 362 to be optimised. For matching of moments we follow the approach given in

Section 3.4. Risk factor variances are then estimated via Approximations (

10) and (

11) of the maximum a posteriori approach as they give more reliable results than matching of moments.

Based on 40,000 MCMC steps with burn-in period of 10,000 we are able to derive estimates of all parameters where starting values are taken from matching of moments, as well as (

10) and (

11). Tuning parameters are frequently re-evaluated in the burn-in period. The execution time of our algorithm is roughly seven hours on a standard computer in ‘R’. Running several parallel MCMC chains reduces execution times to several minutes. However, note that a reduction in risk factors (e.g., one or zero risk factors for mortality modelling) makes estimation much quicker.
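The following sketch illustrates the general idea of an MCMC run with a burn-in period during which the proposal scale is re-tuned. It is a generic random-walk Metropolis sampler and not our actual implementation (which is written in 'R'); the function name `rw_metropolis`, the target acceptance rate of 0.3 and the tuning rule are assumptions made for illustration.

```python
import numpy as np


def rw_metropolis(log_post, x0, n_steps=40_000, burn_in=10_000,
                  step=1.0, tune_every=500, seed=0):
    """Random-walk Metropolis; the proposal scale is re-tuned during burn-in only."""
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    lp = log_post(x)
    chain = np.empty((n_steps, x.size))
    accepted = 0
    for i in range(n_steps):
        proposal = x + step * rng.standard_normal(x.size)
        lp_prop = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = proposal, lp_prop
            accepted += 1
        chain[i] = x
        # Assumed tuning rule: steer the acceptance rate towards 0.3.
        if i < burn_in and (i + 1) % tune_every == 0:
            step *= np.exp(accepted / tune_every - 0.3)
            accepted = 0
    return chain[burn_in:]  # discard burn-in samples
```

In practice, `log_post` would be the log-posterior of the model and `x0` the starting values from matching of moments; independent chains with different seeds can be run in parallel.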

As an illustration,

Figure 1 shows MCMC chains of the variance of risk factor for external causes of injury and poisoning

${\sigma}_{9}^{2}$, as well as of the parameter

${\alpha}_{2,f}$ for death probability intercept of females aged 55 to 59 years. We observe in

Figure 1 that stationary distributions of MCMC chains for risk factor variances are typically right skewed. This indicates a risk of underestimating variances due to limited observations of tail events.

Table 2 shows estimates for risk factor standard deviations using matching of moments, Approximation (

11), as well as mean estimates of MCMC with corresponding 5% and 95% quantiles, as well as standard errors. First,

Table 2 illustrates that (

10) and (

11), as well as matching of moments estimates for risk factor standard deviations

$\sigma $ are close to mean MCMC estimates. Risk factor standard deviations are small but tend to be higher for death causes with only a few deaths, as statistical fluctuations in the data are higher than for more frequent death causes. Only the estimates for the risk factor standard deviation of mental and behavioural disorders are notably higher. Standard errors, as defined in (

Shevchenko 2011, section 2.12.2) with block size 50, for corresponding risk factor variances are consistently less than 3%. We can use the approximation given in Equation (

7) to derive risk factor estimates over previous years. For example, we observe increased risk factor realisations of diseases of the respiratory system over the years 2002 to 2004. This is mainly driven by many deaths due to influenza and pneumonia during that period.
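A block-based standard error of this kind can be sketched as follows, assuming that the cited definition corresponds to the usual batch-means estimator: the chain is cut into non-overlapping blocks, and the standard error of the chain mean is estimated from the variability of the block averages. The function name `batch_means_se` is hypothetical.

```python
import numpy as np


def batch_means_se(chain, block_size=50):
    """Standard error of the chain mean from non-overlapping block averages."""
    chain = np.asarray(chain, dtype=float)
    n_blocks = chain.size // block_size
    if n_blocks < 2:
        raise ValueError("need at least two complete blocks")
    # Drop the incomplete tail, then average within each block.
    blocks = chain[: n_blocks * block_size].reshape(n_blocks, block_size)
    block_means = blocks.mean(axis=1)
    return block_means.std(ddof=1) / np.sqrt(n_blocks)
```

Dividing the result by the chain mean gives a relative standard error, which is the form in which a statement such as 'less than 3%' is naturally read.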

Assumption Equation (

4) provides a joint forecast of all death cause intensities, i.e., weights, simultaneously, in contrast to standard procedures where projections are made for each death cause separately. Over the past decades we have observed drastic shifts in crude death rates due to certain death causes. This fact can be illustrated by our model as shown in

Table 3. This table lists weights

${w}_{\mathrm{a},\mathrm{g},k}\left(t\right)$ for all death causes estimated for 2011, as well as forecasted for 2031 using Equation (

4) with MCMC mean estimates for males and females aged 80 to 84 years. Model forecasts suggest that, if these trends in weight changes persist, the future will show a markedly different picture of mortality. First, deaths due to circulatory diseases are expected to decrease whilst neoplasms will become the leading death cause over most age categories. Moreover, deaths due to mental and behavioural disorders are expected to rise considerably for older ages. High uncertainty in forecasted weights is reflected by wide confidence intervals (values in brackets) for the risk factor of mental and behavioural disorders. These confidence intervals are derived from corresponding MCMC chains and, therefore, solely reflect uncertainty associated with parameter estimation. Note that results for estimated trends depend on the length of the data period as short-term trends might not coincide with mid- to long-term trends. Further results can be found in

Shevchenko et al. (2015).

#### 4.2. Forecasting Death Probabilities

Forecasting death probabilities and central death rates within our proposed model is straightforward using Equation (

3). In the special case with just idiosyncratic risk, i.e.,

$K=0$, death indicators can be assumed to be Bernoulli distributed instead of being Poisson distributed in which case we may write the likelihood function in the form

with

$0\le {\widehat{N}}_{a,g,0}\left(t\right)\le {E}_{a,g}\left(t\right)$. Due to possible overfitting, derived estimates may not be sufficiently smooth across age categories

$a\in \{0,\cdots ,A\}$. Therefore, if we switch to a Bayesian setting, we may use regularisation via prior distributions to obtain stabler results. To guarantee smooth results and a sufficient stochastic foundation, we suggest the usage of Gaussian priors with mean zero and a specific correlation structure, i.e.,

$\pi (\alpha ,\beta ,\zeta ,\eta ,\gamma )=\pi \left(\alpha \right)\pi \left(\beta \right)\pi \left(\zeta \right)\pi \left(\eta \right)\pi \left(\gamma \right)$ with

and correspondingly for

$\beta $,

$\zeta $,

$\eta $ and

$\gamma $. Parameters

${c}_{\alpha}$ (correspondingly for

$\beta $,

$\zeta $,

$\eta $ and

$\gamma $) are scaling parameters directly associated with the variance of the Gaussian priors, while the normalisation parameter

${d}_{\alpha}$ guarantees that

$\pi \left(\alpha \right)$ is a proper Gaussian density. Penalty-parameter

${\epsilon}_{\alpha}$ scales the correlation amongst neighbour parameters in the sense that the lower it gets, the higher the correlation. The more we increase

${c}_{\alpha}$, the stronger the influence of, or the belief in, the prior distribution. This particular prior density penalises deviations from zero, which is a mild conceptual shortcoming as this does not accurately reflect our prior beliefs. Setting

${\epsilon}_{\alpha}=0$ gives an improper prior with uniformly distributed (on

$\mathbb{R}$) marginals. We thereby gain the absence of any prior belief about the expectations of parameters but, simultaneously, lose the existence of variance-covariance matrices and asymptotically get perfect positive correlation across parameters of different ages. Still, despite lacking these theoretical properties, better fits to data are obtained by setting

${\epsilon}_{\alpha}=0$. For example, setting

${\epsilon}_{\alpha}={\epsilon}_{\beta}={10}^{-2}$ and

${\epsilon}_{\zeta}={\epsilon}_{\eta}={\epsilon}_{\gamma}={10}^{-4}$ yields a prior correlation structure which decreases with higher age differences and which is always positive as given in subfigure (

**a**) of

Figure 2.

There exist many other reasonable choices for Gaussian prior densities. For example, replacing graduation terms

${({\alpha}_{a,g}-{\alpha}_{a+1,g})}^{2}$ in Equation (

13) by higher order differences of the form

${\left({\sum}_{\nu =0}^{k}{(-1)}^{\nu}\binom{k}{\nu}{\alpha}_{a+\nu ,g}\right)}^{2}$ yields a penalisation for deviations from a straight line with

$k=2$, see subfigure (

**b**) in

Figure 2, or from a parabola with

$k=3$, see subfigure (

**c**) in

Figure 2. The usage of higher order differences for graduation of statistical estimates goes back to the Whittaker–Henderson method. Taking

$k=2,3$ unfortunately yields negative prior correlations amongst certain parameters which is why we do not recommend their use. Of course, there exist many further possible choices for prior distributions. However, in our example, we set

${\epsilon}_{\alpha}={\epsilon}_{\beta}={\epsilon}_{\zeta}={\epsilon}_{\eta}={\epsilon}_{\gamma}=0$ as this yields accurate results whilst still being reasonably smooth.
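The prior correlation structures discussed above can be reproduced numerically with a short sketch. It assumes that, for one gender, the log prior density has the form $-c\left({\sum}_{a}{({\Delta}^{k}{\alpha}_{a})}^{2}+\epsilon {\sum}_{a}{\alpha}_{a}^{2}\right)$ up to a constant, where ${\Delta}^{k}$ denotes the $k$-th order difference across ages; this form is inferred from the description in the text and is an assumption, as are the function name `prior_correlation` and the number of ages used.

```python
import numpy as np


def prior_correlation(n_ages, k=1, eps=1e-2, c=1.0):
    """Correlation matrix implied by the assumed difference-penalty Gaussian prior."""
    # Rows of D are k-th order differences across neighbouring ages.
    D = np.diff(np.eye(n_ages), n=k, axis=0)
    # Precision matrix of the Gaussian with log-density -c * (|D a|^2 + eps * |a|^2).
    precision = 2.0 * c * (D.T @ D + eps * np.eye(n_ages))
    cov = np.linalg.inv(precision)  # requires eps > 0 (proper prior)
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


corr_first = prior_correlation(20, k=1, eps=1e-2)    # first-order differences
corr_second = prior_correlation(20, k=2, eps=1e-4)   # second-order differences
```

Under these assumptions, all entries of `corr_first` are positive and decay with increasing age difference, whereas `corr_second` contains negative entries between distant ages. The scaling parameter `c` cancels in the correlation and only affects the prior variance.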

An optimal choice of regularisation parameters ${c}_{\alpha},{c}_{\beta},{c}_{\zeta},{c}_{\eta}$ and ${c}_{\gamma}$ can be obtained by cross-validation.

Results for Australian data from 1971 to 2013 with

${t}_{0}=2013$ are given in

Figure 3. Using MCMC we derive estimates for logarithmic central death rates

$\log {m}_{a,g}\left(t\right)$ with corresponding forecasts, mortality trends

${\beta}_{a,g}$, as well as trend reduction parameters

${\zeta}_{a,g},{\eta}_{a,g}$ and cohort effects

${\gamma}_{a-t}$. As we do not assume common stochastic risk factors, the MCMC algorithm we use can be implemented very efficiently such that

$40{,}000$ samples from the posterior distribution of all parameters are derived within a minute. We observe negligible parameter uncertainty due to a long period of data. Further, regularisation parameters obtained by cross-validation are given by

${c}_{\alpha}=500$,

${c}_{\beta}={c}_{\eta}=30{,}000\,{c}_{\alpha}$,

${c}_{\zeta}={c}_{\alpha}/20$ and

${c}_{\gamma}=1000{c}_{\alpha}$.

We can draw some immediate conclusions. Firstly, we see an overall improvement in mortality over all ages where the trend is particularly strong for young ages and ages between 60 and 80 whereas the trend vanishes towards the age of 100, maybe implying a natural barrier for life expectancy. Due to sparse data the latter conclusion should be treated with the utmost caution. Furthermore, we see the classical hump of increased mortality driven by accidents around the age of 20 which is more developed for males.

Secondly, estimates for ${\zeta}_{a,g}$ suggest that trend acceleration switched to trend reduction throughout the past 10 to 30 years for males, while for females this transition already took place 45 years ago. However, note that parameter uncertainty (under MCMC) associated with ${\zeta}_{a,g}$ is high, particularly if estimates are not regularised. Estimates for ${\eta}_{a,g}$ show that the speed of trend reduction is much stronger for males than for females. Estimates for ${\gamma}_{a-t}$ show that the cohort effect is particularly strong (in the sense of increased mortality) for the generation born between 1915 and 1930 (probably associated with World War II) and particularly weak for the generation born around 1945. However, considering cohort effects makes estimation and forecasts significantly less stable for the data used, which is why we recommend setting ${\gamma}_{a-t}=0$.

Based on forecasts for death probabilities, expected future life time can be estimated. To be consistent concerning longevity risk, mortality trends have to be included, as a 60-year-old today will probably not have access to medication as good as that available to a 60-year-old in several decades. However, it seems that this is not the standard approach in the literature. Based on the definitions above, expected (curtate) future life time of a person at date

T is given by

${e}_{a,g}\left(T\right)=\mathbb{E}\left[{K}_{a,g}\left(T\right)\right]={\sum}_{k=1}^{\infty}{}_{k}{p}_{a,g}\left(T\right)$, where survival probabilities over

$k\in \mathbb{N}$ years are given by

${}_{k}{p}_{a,g}\left(T\right):={\prod}_{j=0}^{k-1}\left(1-{q}_{a+j,g}(T+j)\right)$ and where

${K}_{a,g}\left(T\right)$ denotes the number of completed future years lived by a person of particular age and gender at time

T. Approximating death probabilities by central death rates, for newborns in Australia we get a life expectancy of roughly 83 years for males and

$89.5$ for females born in 2013, see

Table 4. Thus, comparing these numbers to a press release from October 2014 from the Australian Bureau of Statistics

5 saying that ‘Aussie men now expected to live past 80’ and ‘improvements in expected lifespan for women has since slowed down, increasing by around four years over the period—it’s 84.3 now’, our results show a much higher life expectancy due to the consideration of mortality trends.
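The difference between the two views can be sketched in a few lines, assuming a table `q` of death probabilities indexed by age (rows) and calendar year (columns); the table layout, the function name `curtate_life_expectancy` and the truncation of the infinite sum at the edge of the table are assumptions made for illustration. With `with_trend=True` the survival product follows ${q}_{a+j,g}(T+j)$ as in the formula above, whereas `with_trend=False` freezes mortality at year $T$.

```python
import numpy as np


def curtate_life_expectancy(q, age, year, with_trend=True):
    """Sum of k-year survival probabilities, truncated at the edge of the table."""
    n_ages, n_years = q.shape
    survival, expectancy = 1.0, 0.0
    for j in range(n_ages - age):
        # Follow the cohort along the diagonal, or stay in the base year.
        col = year + j if with_trend else year
        if col >= n_years:
            break
        survival *= 1.0 - q[age + j, col]   # running product over j
        expectancy += survival              # adds the (j + 1)-year survival probability
    return expectancy
```

When forecasted death probabilities decline over calendar time, the value obtained with `with_trend=True` exceeds the one obtained with `with_trend=False`, which mirrors the gap between our results and figures that do not account for mortality trends.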