Projecting Mortality Rates Using a Markov Chain

Spreeuw, Jaap; Owadally, Iqbal; Kashif, Muhammad

doi:10.3390/math10071162

by Jaap Spreeuw ¹, Iqbal Owadally ¹ and Muhammad Kashif ²,*

¹ Bayes Business School, University of London, 106 Bunhill Row, London EC1Y 8TZ, UK

² School of Business and Economics, Universidad de las Americas Puebla, Sta. Catarina Mártir, San Andrés Cholula, Puebla 72810, Mexico

* Author to whom correspondence should be addressed.

Mathematics 2022, 10(7), 1162; https://doi.org/10.3390/math10071162

Submission received: 28 February 2022 / Revised: 27 March 2022 / Accepted: 30 March 2022 / Published: 3 April 2022

(This article belongs to the Special Issue Markov-Chain Modelling and Applications)

Abstract

We present a mortality model where future stochastic changes in population-wide mortality are driven by a finite-state hierarchical Markov chain. A baseline mortality in an initial ‘Alive’ state is calculated as the average logarithm of the observed mortality rates. There are several more ‘Alive’ states and a jump to the next ‘Alive’ state leads to a change (typically, an improvement) in mortality. In order to estimate the model parameters, we minimized a weighted average quadratic distance between the observed mortality rates and expected mortality rates. A two-step estimation procedure was used, and a closed-form solution for the optimal estimates of model parameters was derived in the first step, which means that the model can be parameterized quickly and efficiently. The model was then extended to allow for age effects whereby stochastic mortality improvements also depend on age. Forecasting relies on state space augmentation and an innovations state space time series model. We show that, in terms of forecasting, our model outperforms a naïve model of static mortality within a few years. The Markov approach also permits an exact computation of mortality indices, such as the complete expectation of life and annuity present values, which are key in the life insurance and pension industries.

Keywords:

mortality forecasting; Markov chain; model calibration; life insurance; pensions

MSC:

60J28

1. Introduction

Mathematical modeling of mortality trends is becoming a central concern for researchers and practitioners due to its importance for public health planning, social insurance, private life insurance, and pension systems. Accurate mortality forecasts are critical to allocate resources in a timely manner for forward planning. In this context, prolonged life expectancy, also known as longevity risk, poses challenges for the pricing, advanced funding, and reserving of life insurance and pension schemes, which may require forecasts up to 50 years ahead. Generally, it is difficult to measure and hedge the effects of mortality improvement on retirement planning. Ideally, the difference between the observed mortality and mortality estimates should be negligible. However, over the last several decades, old-age mortality projections have underestimated mortality improvement. The aim of this study is to introduce a new method for population-wide mortality modeling that is driven by a finite-state hierarchical Markov chain.

The literature on stochastic mortality models has been developing rapidly over the past 25 to 30 years. Mortality forecasting approaches can be classified into three main categories: extrapolative, explanatory, and expectation-based. The extrapolative approach applies simple extrapolation to measures, such as life expectancy, on the assumption that observed age patterns and trends exhibit regularity over time. One example of this approach is the seminal Lee–Carter model [1], which is a discrete-time model that is driven by a time series component. The explanatory approach makes use of epidemiological or structural models to forecast mortality by cause of death, where exogenous variables are measurable and known. Expectation-based models set the parameters of mortality by fitting deterministic functions to recent trends or by consultation with demographers and other experts. Such models are only useful for short-term forecasting, as they do not capture the stochastic features of mortality.

Recent reviews of mainstream mortality forecasting models can be found in [2,3,4,5,6,7,8,9] and the references therein. The vast majority of mortality models are extrapolative; such models are easier to apply and more accurate than the other approaches [10]. In the last few decades, the most prevalent mortality forecasting method has been the Lee–Carter model and its variants. The Lee–Carter model decomposes age-specific mortality into an age pattern and an overall time trend over a certain time period. The model extrapolates the overall time trend using past time series to forecast the underlying factors of the force of mortality [1]. The main advantages of the Lee–Carter model are its robustness in the case of a linear past trend and the fact that a simple stochastic model is able to forecast the age pattern of mortality with one time-varying parameter.

More recently, various modifications in estimation methods have been made to the original Lee–Carter model by Lee and Miller [11], Booth et al. [12], Brouhns et al. [13], and Hatzopoulos and Haberman [14]. Other extensions incorporate nonparametric smoothing, Kalman filtering, and multiple principal components: see Hyndman and Ullah [15] and De Jong and Tickle [16]. Booth et al. [17] compare the performance of five different extensions of the Lee–Carter model. The Cairns–Blake–Dowd (CBD) model expresses the logit of the death probabilities as a linear or quadratic function of age to better capture mortality at older ages [2,18,19]. Furthermore, for non-linear trends, a cohort parameter was included to improve mortality prediction by Renshaw and Haberman [20], Plat [21], Cairns et al. [22] and Reither et al. [23]. Other related studies include the application of machine learning in standard stochastic mortality models to identify patterns and calibrate parameters to improve the goodness of fit [24]. Atance et al. [25] compare the Lee–Carter model and its extended two-factor version to predict dynamic life tables and conclude that the Lee–Carter model projects mortality better than other versions.

This paper introduces a new model of stochastic mortality based on a time-homogeneous continuous-time Markov chain of mortality changes. The model allows for age effects whereby mortality improvements differentially impact individuals of different ages. To forecast mortality rates, states are added to the Markov chain and an innovations state space time series model is employed. To the best of our knowledge, such models have not been employed in the mortality forecasting literature. Our model is inspired by the one discussed by Norberg [26], except that, rather than involving specific causes of death that may diminish over time, we look at mortality in aggregate terms. Markov models have also been applied to human mortality by Lin and Liu [27] and Liu and Lin [28], although in a different way from what we propose in this article. Both of these papers employed a finite-state Markov model to capture the human aging process. In Liu and Lin [28], the Markov model is subordinated by a gamma process to allow for stochastic mortality. By contrast, in our paper, the Markov model itself drives the stochastic mortality.

Our model has three major advantages. First, it is flexible: one can create as many states in the Markov model as one sees fit. Even with 200 states, say, the computations are still fast, and transparency is not compromised. Second, the calibration part is easy to implement and can even be performed on a spreadsheet. Finally, once the forecasting part is complete, i.e., projections have been made of the future changes in mortality, it is straightforward to calculate the exact distributions of key quantities, such as (future) expectancies of life and fixed-rate annuity present values. These can be obtained by solving Thiele’s differential equation (Dickson et al. [29] (p. 211)).

The setup of this paper is as follows. Section 2 introduces the model while Section 3 describes the mortality data, from the well-known Human Mortality Database, which is used to calibrate the model. The calibration procedure and its results are described in Section 4. In Section 5, the innovations state space time series model as in Hyndman et al. [30] is introduced and applied to forecasting mortality rates. The forecasting power of the model, as compared with a naïve model of static mortality, is also discussed. Applications in life insurance and pensions are briefly presented in Section 6, and Section 7 concludes. The technical details of the innovations state space model are left to Appendix A.

2. The Model

Markov processes have been applied in life insurance mathematics for more than 50 years. Amsler [31] gave a seminal lecture at the 18th International Congress of Actuaries, while a publication by Hoem [32] introduced the Markov model in the actuarial literature. Of the many publications on Markov processes, the textbooks by Haberman and Pitacco [33] (in particular the applications in disability insurance), Wolthuis [34] and Dickson et al. [29] (ch. 8) are noteworthy.

A basic survival model that is used in elementary life insurance mathematics is the Markov process that involves only two states: an initial ‘Alive’ state and a terminal ‘Dead’ state, denoted by a and d, respectively. The instantaneous transition intensity from ‘Alive’ to ‘Dead’ is then an instantaneous mortality rate that depends on age. For an x-year old individual, the instantaneous mortality rate is denoted by $μ_{x}$. Let $p_{j k} (s, t)$, with $j, k \in \{a, d\}$ and $0 \leq s \leq t$, be the transition probability of a life being in state k at time t given that the life is in state j at time s. Since the ‘Dead’ state is an absorbing state, it follows that

\begin{matrix} p_{d d} (s, t) = 1 - p_{d a} (s, t) = 1, \\ p_{a a} (s, t) = 1 - p_{a d} (s, t) = e^{- \int_{s}^{t} μ_{x + u} d u} . \end{matrix}
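As a minimal numerical sketch of the survival probability above, the integral in the exponent can be approximated with a midpoint rule. The Gompertz form of the hazard and its parameter values are assumptions chosen purely for illustration; they are not taken from the paper.

```python
import math

def gompertz_hazard(age, a=5e-5, b=0.09):
    """Hypothetical hazard mu_x = a * exp(b * x); a and b are illustrative only."""
    return a * math.exp(b * age)

def survival_prob(x, s, t, hazard=gompertz_hazard, steps=1000):
    """p_aa(s, t) = exp(-integral_s^t mu_{x+u} du), using the midpoint rule."""
    width = (t - s) / steps
    integral = sum(hazard(x + s + (i + 0.5) * width) for i in range(steps)) * width
    return math.exp(-integral)

p_aa = survival_prob(x=60, s=0.0, t=10.0)   # stays 'Alive' from age 60 to age 70
p_ad = 1.0 - p_aa                           # moves to 'Dead' within that period
```

Any other age-dependent hazard function can be passed through the `hazard` argument without changing the rest of the sketch.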

Our model extends the basic two-state model above by augmenting the state space to allow for mortality improvements. We consider a continuous-time, time-homogeneous hierarchical Markov process consisting of $N + 2$ states, i.e., $N + 1$ strongly transient ‘Alive’ states and one ‘Dead’ state, the latter denoted by D. This is depicted in Figure 1. Each individual aged x starts in the ‘Alive’ state 0 with an age-dependent instantaneous rate of mortality $μ_{x}^{(0)}$ specified as $μ_{x}^{(0)} = μ_{x}^{s} exp (γ (0))$, where $μ_{x}^{s}$ denotes the standard (or benchmark or baseline) rate of mortality, while $exp (γ (0))$ reflects the relative difference between the initial mortality and the standard mortality. From the ‘Alive’ state $i \in \{0, \dots, N - 1\}$, an individual can only make a transition to either (a) the next ‘Alive’ state $i + 1$ with intensity $λ (i)$, which is independent of time but is allowed to depend on the state of sojourn; or (b) the ‘Dead’ state with rate of mortality $μ_{x}^{(i)}$. Thus, $μ_{x}^{(i)}$ denotes the instantaneous rate of mortality for an x-year old in state i, with $i \in \{0, \dots, N - 1\}$. In the last ‘Alive’ state N, only a transition to the ‘Dead’ state, with mortality rate $μ_{x}^{(N)}$, is possible. This implies that any life can experience, at most, N age-independent changes in mortality.

We first set up a preliminary model where improvements in population-wide mortality proceed as follows. A transition from state i to state $i + 1$ entails a relative improvement in mortality of $100 (1 - exp (γ (i + 1)))$% at all ages, so $μ_{x}^{(i + 1)} = exp (γ (i + 1)) μ_{x}^{(i)}$ for all ages x. Thus, $γ (i + 1)$ is the log-change in instantaneous mortality rate from state i to state $i + 1$. It is also useful to define $Γ (i) = \sum_{j = 0}^{i} γ (j)$ as the cumulative log-change in mortality rate by state i.

Notice that the transition intensities do not depend on time. The transition intensity $λ (i)$ from one ‘Alive’ state i to the next ‘Alive’ state $i + 1$ depends only on state i. The mortality rate $μ_{x}^{(i)}$ depends on the age x of an individual, but not on clock time t for the population as a whole. The Markov chain is therefore a time-homogeneous process capturing population-wide mortality improvements.

For this hierarchical Markov chain, using the same definition as above for the transition probability $p_{j k} (s, t)$, but now with $j, k \in \{0, 1, \dots, N, D\}$, we have the following expressions:

p_{i k} (s, t) = 0, for i \in \{1, \dots, N\} and k < i,

p_{i i} (s, t) = e^{- \int_{s}^{t} (λ (i) + μ_{x + u}^{(i)}) d u}, for i \in \{0, \dots, N - 1\},

p_{N N} (s, t) = e^{- \int_{s}^{t} μ_{x + u}^{(N)} d u} .

In reality, when population-wide mortality improvements occur, they will be of different magnitudes at different ages. Our full model therefore extends the preliminary model by allowing for age effects, which are captured by the factor $b_{x}$ for an individual aged x. In the ‘Alive’ state 0, the mortality rate for an individual aged x is $μ_{x}^{(0)} = μ_{x}^{s} exp (b_{x} γ (0))$. For any x-year old, a transition from state i to state $i + 1$ entails a relative improvement in mortality of $100 (1 - exp (b_{x} γ (i + 1)))$%, so that $μ_{x}^{(i + 1)} = μ_{x}^{(i)} exp (b_{x} γ (i + 1))$.
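The recursion for the state-dependent mortality rates can be sketched as follows. The baseline rates, age factors and log-changes used here are hypothetical values chosen only to show how the age factor scales each improvement; they are not estimates from the paper.

```python
import math

def state_mortality(mu_s, b_x, gamma):
    """Return mu_x^{(i)} for i = 0..N from baseline mu_s, age factor b_x and gamma(0..N)."""
    rates = []
    cumulative = 0.0                      # Gamma(i) = gamma(0) + ... + gamma(i)
    for g in gamma:
        cumulative += g
        rates.append(mu_s * math.exp(b_x * cumulative))
    return rates

gamma = [0.0, -0.02, -0.02, -0.02]        # hypothetical log-changes gamma(0..3)
rates_60 = state_mortality(mu_s=0.008, b_x=1.0, gamma=gamma)
rates_90 = state_mortality(mu_s=0.150, b_x=0.5, gamma=gamma)
```

With these inputs each transition lowers mortality at age 60 by a factor exp(-0.02), but by only exp(-0.01) at age 90, because the assumed age factor there is 0.5.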

3. Mortality Data

We consider a life insurer, at the end of year 2000, which has mortality data for several years until the year 2000. The data pertains to female policyholders of ages varying from 20 to a limiting age, which is assumed to be 105. We choose female mortality data purely for illustrative purposes, and could equally have chosen male mortality data. (Most mortality forecasting studies use either male or female data, unless a gender-comparative analysis is being undertaken, e.g., Chiou and Müller [35] use female data whereas Cairns et al. [2] use male data.) The life insurer is in charge of devising a sound model of stochastic mortality, using the available mortality statistics which we assume are drawn from the Human Mortality Database (2020) [36].

At present, the Human Mortality Database (HMD) [36] contains detailed population and mortality data for 41 countries or areas. This includes input data, such as death counts, census counts, birth counts, and population estimates. Such input data enable the calculation of key quantities, such as exposure to risk, death rates, and life tables of national populations, which can also be found in the HMD [36]. The period of time for which complete data are available varies across countries or areas but usually involves at least a couple of decades. Most countries or areas are highly industrialized and relatively wealthy.

Two further points are noteworthy. First, we disregard ages younger than 20 years old to avoid unnecessary complexity. As pointed out by Jarner and Kryger [37], the pattern of infant and child mortality is different from adult mortality. Furthermore, in most developed countries, current levels of young-age mortality are very low.

Second, we fit the model on 51 years of in-sample annual mortality rates from year 1950 to 2000. We use the mortality rates from 2001 onward as out-of-sample data to assess the forecast error of our model. We also have mortality data at ages 20 to 104 for each year of in-sample data, giving $51 \times 85 = 4335$ data points. This is far greater than the number of parameters to estimate in our model (provided that the number of states in the Markov model is not excessively large), and there is therefore no danger of overfitting the model to the available data.

4. Model Calibration to Mortality Data

4.1. Outline of Calibration Procedure

For a given value of N, we fit our model to the mortality data by minimizing the weighted average quadratic distance (WAQD) between expected log-mortality and observed log-mortality. The WAQD is defined precisely below, for both the preliminary model and the full model with age effects. In the full model, there are $2 N + 86$ parameters: N transition intensities $λ (0), \dots, λ (N - 1)$; $N + 1$ mortality improvement factors $γ (0), \dots, γ (N)$; and 85 age effect factors $b_{20}, \dots, b_{104}$ at ages 20, …, 104, respectively. Ideally, all of these parameter values would be found by simultaneously minimizing the WAQD with respect to (wrt) all the parameters. Unfortunately, this is either not computationally feasible or very time-consuming, unless N is small.

We proceed using a pragmatic approach instead. Consider first the preliminary model where age effects are not included. We fix the transition intensities between the various ‘Alive’ states to be constant across all states, $λ (k) \equiv λ$ for $k = 0, \dots, N - 1$, and we then derive mathematically an expression for the optimal estimates of the mortality improvement factors by minimizing the WAQD wrt $γ (k)$, $k = 0, \dots, N$. We then perform a grid search for the minimum WAQD on a grid of values of $λ$, refining the grid at any local minimum. For reasonably small N, this is feasible because we have an easily and quickly calculated closed-form expression for the optimal $γ (k)$, $k = 0, \dots, N$.

Fixing $λ$ and then minimizing the WAQD wrt $γ (k)$, $k = 0, \dots, N$, might at first sight appear to be an implausible shortcut. It is, in fact, perfectly justifiable. A direct analogy is to Girsanov’s change-of-measure theorem (Itô [38], p. 1535): one can change the probability measure underlying a standard Brownian motion, but then distort its state space by imposing a local drift, which reverses the change in measure. In financial mathematics, this underpins the pricing of securities in complete markets when an artificial risk-neutral probability measure is used along with a risk-free rate to discount the security payoffs in the state space (Shreve [39], p. 216).

Thus, armed with the optimal estimates of $λ$ and $γ (k)$, $k = 0, \dots, N$, in the preliminary model, we turn to estimating all of the parameters, including the age effect factors $b_{x}$, $x = 20, \dots, 104$, in the full model. We will show that we can use the preliminary model parameters as the input or first-stage values to a recursive scheme, which minimizes the WAQD. Repeated substitution leads to numerical convergence to our final estimates of all the parameter values.

In the next sections, we describe in greater detail the procedure that we use for estimation, first for the preliminary model and then for the full model.

4.2. Preliminary Model without Age Effects

Denote the state of the Markov chain at time t by

X_{t}

. Let

p_{0 k} (t)

be the transition probability of a life being in state k in the middle of year t given that the life starts in state 0 at time 0, i.e.,

p_{0 k} (t) = P [X_{t + 1 / 2} = k | X_{0} = 0]

. (The observed instantaneous rate of mortality for a given year estimates the rate of mortality at the midpoint of the year.)

The observed rate of mortality

{\hat{μ}}_{x, t}

at age x in year t is calculated as

{\hat{μ}}_{x, t} = d_{x, t} / E_{x, t}^{c}

. Here,

d_{x, t}

is the number of deaths recorded at age x—last birthday during the calendar year t. Furthermore,

E_{x, t}^{c}

is the exposure-to-risk at age x—last birthday during the year t (that is, the total time lived by people aged x—last birthday in the calendar year t) ([40] pp. 95–96). These data were obtained from the HMD [36]. Similar to Lee and Carter [1], the log of the estimated mortality rate

\bar{μ} (x)

at age x is then obtained as a weighted average of the log of the observed mortality rates across the years of observation:

ln [\bar{μ} (x)] = \frac{\sum_{t = 0}^{2000 - Y} w_{x, t} ln {\hat{μ}}_{x, t}}{\sum_{t = 0}^{2000 - Y} w_{x, t}},

(1)

with Y being the starting year of the period of observation and the year 2000 being the last year. The estimated mortality rate at age x from the mortality data in the HMD [36] is then taken to be the standard or baseline mortality at age x in the Markov model, i.e., in Figure 1,

μ_{x}^{s} = \bar{μ} (x)

. In Equation (1) above,

w_{x, t}

denotes the weight assigned to the exposure in year t at age x. It is common to use

w_{x, t} = {(V a r [{\hat{μ}}_{x, t}])}^{- 1}

, which can be estimated by

w_{x, t} = {(E_{x, t}^{c})}^{2} / d_{x, t}

([41] pp. 321–322).
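Equation (1) can be sketched for a single age as below. The death counts and exposures are hypothetical numbers used only to exercise the formula; the function name and its `exposure_weights` switch are likewise illustrative.

```python
import math

def baseline_log_mortality(deaths, exposures, exposure_weights=True):
    """ln mu_bar(x) per Equation (1) for a single age x.

    deaths[t] and exposures[t] stand for d_{x,t} and E^c_{x,t} in each year t.
    """
    numerator = denominator = 0.0
    for d, e in zip(deaths, exposures):
        weight = e * e / d if exposure_weights else 1.0   # w = (E^c)^2 / d, or w = 1
        numerator += weight * math.log(d / e)             # ln of the observed rate d / E^c
        denominator += weight
    return numerator / denominator

# Hypothetical death counts and exposures at one age over three years.
deaths = [120.0, 110.0, 100.0]
exposures = [10000.0, 10200.0, 10400.0]
ln_mu_bar = baseline_log_mortality(deaths, exposures)
ln_mu_bar_unweighted = baseline_log_mortality(deaths, exposures, exposure_weights=False)
```

Setting `exposure_weights=False` gives the plain average of the log rates, which corresponds to the choice of unit weights.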

The WAQD pertaining to calendar year t is defined as

W_{t} = \sum_{k = 0}^{N} p_{0 k} (t) \sum_{x = 20}^{104} w_{x, t} {(\sum_{j = 0}^{k} γ (j) + ln \bar{μ} (x) - ln {\hat{μ}}_{x, t})}^{2} .

(2)

This is similar to the objective functions used by Lee and Carter [1] and Pitacco et al. [40] (p. 190). The term in parentheses on the r.h.s. of Equation (2) is the difference between the estimated log-mortality rate and the observed log-mortality rate, plus the cumulative log-change in mortality rates by state k, which is reached at time t, at a given age x. Because we may have more observations at certain ages than at other ages (e.g., there are more younger individuals than older individuals in the population), the quadratic deviation is then weighted by exposure weights. An expectation is then computed by summing the preceding quantity over the whole state space, weighted by the probability of reaching state k by time t, given the starting state 0 at time 0. The total WAQD is then found by summing the r.h.s. of Equation (2) over the years from the start of observation in year Y to the end in year 2000, i.e.,

W = \sum_{t = 0}^{2000 - Y} W_{t}

.
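The yearly WAQD of Equation (2) translates almost line by line into code. The sketch below assumes that the state probabilities are supplied as a list; all numerical inputs are hypothetical.

```python
def waqd_year(p0, gamma, ln_mu_bar, ln_mu_hat, weights):
    """W_t of Equation (2) for one calendar year t.

    p0[k]        : p_{0k}(t) for states k = 0..N
    gamma[k]     : log-change gamma(k)
    ln_mu_bar[i] : baseline log-mortality at the i-th age
    ln_mu_hat[i] : observed log-mortality at the i-th age in year t
    weights[i]   : w_{x,t} at the i-th age
    """
    total = 0.0
    cumulative = 0.0                       # Gamma(k) = gamma(0) + ... + gamma(k)
    for prob, g in zip(p0, gamma):
        cumulative += g
        squared = sum(w * (cumulative + bar - hat) ** 2
                      for w, bar, hat in zip(weights, ln_mu_bar, ln_mu_hat))
        total += prob * squared
    return total

# Hypothetical example with two states and two ages.
w_t = waqd_year(p0=[0.5, 0.5], gamma=[0.0, -0.1],
                ln_mu_bar=[-5.0, -3.0], ln_mu_hat=[-5.1, -3.1], weights=[1.0, 1.0])
```

In this example the second state fits the observed rates exactly, so only the first state contributes to the value of `w_t`.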

As explained in Section 4.1, the first step in our calibration procedure is to minimize the WAQD wrt

{γ (k), k \in [0, N]}

. Proposition 1 sets this out below. It is helpful to recall and introduce some notation ahead of the statement and proof of Proposition 1. From Section 2,

Γ (k) = \sum_{j = 0}^{k} γ (j)

, for

k = 0, \dots, N

, is the cumulative log-change in mortality rate by state k of the Markov chain, i.e., the log-change in mortality rate from state 0 to state k. Recall also that

p_{0 k} (t) = P [X_{t} = k | X_{0} = 0]

is the transition probability of being in state k at time t given the starting state 0 at time 0. Let the transition probability vector from state 0 over time t be

p (t) = {(p_{00} (t), p_{01} (t), \dots, p_{0 N} (t))}^{T}

, and let

P (t) = diag (p (t)) \in R^{(N + 1) \times (N + 1)}

, i.e.,

P (t)

is a diagonal matrix whose leading diagonal is made up of elements of

p (t)

.

Proposition 1.

The values of

γ (k)

, which minimize the weighted average quadratic distance (WAQD) between expected and observed mortality, are given by

γ (k) = ı_{k + 1}^{T} A^{- 1} c - ı_{k}^{T} A^{- 1} c

(3)

for

k = 1, \dots, N

, and

γ (0) = ı_{1}^{T} A^{- 1} c

.

In the above,

ı_{k}

is a column vector of zeros except for 1 in row k. Further,

A = \sum_{t = 0}^{2000 - Y} h_{1} (t) P (t)

, with

h_{1} (t) = \sum_{x = 20}^{104} w_{x, t}

. Moreover,

c = \sum_{t = 0}^{2000 - Y} h_{2} (t) p (t)

, with

h_{2} (t) = \sum_{x = 20}^{104} w_{x, t} (ln {\hat{μ}}_{x, t} - ln \bar{μ} (x))

.

Proof of Proposition 1.

The total WAQD is

W = \sum_{t = 0}^{2000 - Y} \sum_{k = 0}^{N} p_{0 k} (t) \sum_{x = 20}^{104} w_{x, t} {[Γ (k) + ln \bar{μ} (x) - ln {\hat{μ}}_{x, t}]}^{2} .

(4)

Expanding the term in square brackets in the equation above, and using

h_{1} (t)

and

h_{2} (t)

as defined in Proposition 1, as well as

h_{3} (t) = \sum_{x = 20}^{104} w_{x, t} {(ln {\hat{μ}}_{x, t} - ln \bar{μ} (x))}^{2}

, we can simplify the WAQD to

W = \sum_{t = 0}^{2000 - Y} \sum_{k = 0}^{N} p_{0 k} (t) [Γ {(k)}^{2} h_{1} (t) - 2 Γ (k) h_{2} (t) + h_{3} (t)] .

(5)

Now,

h_{1} (t)

,

h_{2} (t)

and

h_{3} (t)

can be taken out of the inner summation. Let

Γ = {(Γ (0), Γ (1), \dots, Γ (N))}^{T}

. Using

p (t)

and

P (t)

as defined just before Proposition 1, we obtain

W = \sum_{t = 0}^{2000 - Y} [h_{1} (t) Γ^{T} P Γ - 2 h_{2} (t) p^{T} Γ + h_{3} (t) p^{T} 1],

(6)

where

1 = {(1, \dots, 1)}^{T} \in R^{N + 1}

. Note that we suppress the dependence of

p (t)

and

P (t)

on t in the notation hereinafter, for the sake of clarity.

The elements of the diagonal matrix

P

are positive; hence,

P

is positive definite. (The elements in the leading diagonal of

P

are the non-zero transition probabilities to the various ‘Alive’ states of the Markov chain. The eigenvalues of the diagonal matrix

P

are positive. By a well-known theorem of matrices—see, for example, Theorem 2 of Johnson [42], or Itô [38], p. 996—

P

is therefore positive definite.) Furthermore,

h_{1} (t) = \sum_{x = 20}^{104} w_{x, t} > 0

since

w_{x, t} > 0

. From the quadratic form in Equation (6), we conclude that the existence and uniqueness of a minimum in W wrt

Γ

are guaranteed. Differentiating Equation (6) wrt

Γ

gives

\frac{\partial W}{\partial Γ} = \sum_{t = 0}^{2000 - Y} [2 h_{1} (t) P Γ - 2 h_{2} (t) p] = 2 A Γ - 2 c,

(7)

where

A

and

c

are defined in the proposition. To minimize W, we solve

\partial W / \partial Γ = 0

, where

0 \in R^{N + 1}

is a column vector of zeros, giving

Γ = A^{- 1} c

. Note that

A = \sum_{t = 0}^{2000 - Y} h_{1} (t) P (t)

is invertible since

h_{1} (t) \neq 0

, and

P

is non-singular. (

P

is non-singular since its leading diagonal elements are non-zero, as discussed above. By a well-known theorem of matrices—see for example Perlis [43], p. 72—the determinant of

P

, a diagonal matrix, is the product of these non-zero elements, and is therefore non-zero.)

Finally, since

Γ (k) = \sum_{j = 0}^{k} γ (j)

, it follows that

γ (0) = Γ (0) = ı_{1}^{T} Γ

and

γ (k) = Γ (k) - Γ (k - 1) = (ı_{k + 1}^{T} Γ) - (ı_{k}^{T} Γ)

for

k = 1, \dots, N

. □
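Because A is a diagonal matrix, the solution Γ = A^{-1} c reduces to an elementwise division, Γ(k) = c_k / A_kk. The sketch below illustrates Proposition 1 on hypothetical inputs; the function name and the numbers are assumptions for illustration only.

```python
def optimal_gamma(p, h1, h2):
    """Closed-form minimiser of the WAQD for fixed state probabilities.

    p[t][k] : p_{0k}(t);  h1[t], h2[t] : as defined in Proposition 1.
    A is diagonal, so Gamma = A^{-1} c reduces to Gamma(k) = c_k / A_kk.
    """
    years = range(len(p))
    cumulative = []
    for k in range(len(p[0])):
        a_kk = sum(h1[t] * p[t][k] for t in years)
        c_k = sum(h2[t] * p[t][k] for t in years)
        cumulative.append(c_k / a_kk)
    # gamma(0) = Gamma(0); gamma(k) = Gamma(k) - Gamma(k - 1) for k >= 1
    gamma = [cumulative[0]] + [cumulative[k] - cumulative[k - 1]
                               for k in range(1, len(cumulative))]
    return cumulative, gamma

# Hypothetical inputs: two years and two 'Alive' states.
p = [[0.8, 0.2], [0.4, 0.6]]
h1 = [2.0, 2.0]
h2 = [-0.1, -0.3]
Gamma, gamma = optimal_gamma(p, h1, h2)
```

No matrix inversion routine is needed, which is one reason why this step can be carried out even on a spreadsheet.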

As explained in the outline of our calibration procedure in Section 4.1, we make some simplifying assumptions for the sake of parsimonious modeling and to keep the estimation as straightforward as possible. First, we assume that the exposure weights

w_{x, t}

at age x and time t, as introduced in Equation (1) and used in the WAQD in Equation (2), are set equal to one,

w_{x, t} \equiv 1

, for the sake of simplicity. As argued by (Pitacco et al. [40] p. 190), using weights that are not exogenous, in that they depend on the random number of deaths, is questionable, especially for stochastic mortality models in contrast to the static ‘life tables’ used for insurance pricing purposes. Second, we assume henceforth that the transition intensity

λ > 0

, from one ‘Alive’ state to the next, is constant not just in time but also over the state space. We still require a numerical search procedure for

λ

when minimizing the WAQD, but the overall estimation procedure is simplified. In particular,

p_{0 k} (t)

can be expressed simply as:

p_{0 k} (t) = \frac{1}{k!} {(λ t)}^{k} e^{- λ t}, for k = 0, \dots, N .

(8)

Equation (8) above follows from the fact that, conditional on no death occurring, the transitions out of any ‘Alive’ state

k < N + 1

in the Markov process are restricted to those of a time-homogeneous pure birth process with rate

λ

.
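Equation (8) can be evaluated with a simple recursion that avoids large factorials. This is a sketch only; the function name and the parameter values are illustrative.

```python
import math

def state_probs(lam, t, n_states):
    """p_{0k}(t) = (lam t)^k exp(-lam t) / k! for k = 0..n_states - 1 (Equation (8))."""
    probs = [math.exp(-lam * t)]
    for k in range(1, n_states):
        probs.append(probs[-1] * lam * t / k)   # p_k = p_{k-1} * (lam t) / k
    return probs

probs = state_probs(lam=1.0, t=2.0, n_states=3)   # states 0, 1, 2, i.e. N = 2
```

Note that, with Equation (8) applied as written to k = 0, …, N, the probabilities sum to slightly less than one for finite N, since the Poisson mass beyond state N is not included.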

Since

p_{0 k} (t)

in Equation (8) features a maximum wrt

λ

(at

λ = k / t

provided

k > 0

,

t > 0

) and no minimum, it is worth investigating whether W can indeed be minimized wrt

λ

, i.e., it is worth investigating the existence of an optimal estimate of

λ

using WAQD.

Denoting by

H (k, t)

the expression in the square brackets in Equation (5), we can express the WAQD in a compact fashion:

W = \sum_{t = 0}^{2000 - Y} \sum_{k = 0}^{N} p_{0 k} (t) H (k, t)

. Since

\partial p_{0 k} (t) / \partial λ = {(λ t)}^{k} e^{- λ t} (k - λ t) / (λ k!)

, it follows that

\frac{\partial W}{\partial λ} = \sum_{t = 0}^{2000 - Y} \sum_{k = 0}^{N} \frac{1}{λ k!} {(λ t)}^{k} e^{- λ t} (k - λ t) H (k, t) .

(9)

An analytical expression for

λ

in the solution of

\partial W / \partial λ = 0

is difficult to find, especially for a Markov chain with a large state space (large N), but numerical estimates can easily be computed.

As for the existence of a minimum in W wrt

λ

, we note that

\partial^{2} p_{0 k} (t) / \partial λ^{2} = {(λ t)}^{k} e^{- λ t} [{(k - λ t)}^{2} - k] / (λ^{2} k!)

, so that

\frac{\partial^{2} W}{\partial λ^{2}} = \sum_{t = 0}^{2000 - Y} \sum_{k = 0}^{N} \frac{1}{λ^{2} k!} {(λ t)}^{k} e^{- λ t} [{(k - λ t)}^{2} - k] H (k, t) .

(10)

For

k \geq 0

and

t > 1

, it is easy to see that, disregarding the term in square brackets in Equation (10), the summand inside the double summation in Equation (10) is positive. In particular,

H (k, t) > 0

since it is identical to the innermost summand in Equation (4). Whether W is convex wrt

λ

therefore rests on a weighted sum of terms in

[{(k - λ t)}^{2} - k]

. We cannot formally show the existence of a minimum, but the above analysis serves two purposes. First, it reassures us that the absence of a minimum in

p_{0 k} (t)

does not rule out a minimum in W. Second, it illustrates the difficulty in deriving the optimal parameter estimates analytically, thereby justifying our two-step estimation procedure.
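The two-step procedure for the preliminary model (closed-form Γ for a fixed λ, then a grid search over λ) might be sketched as follows. The data are synthetic and hypothetical: a single age with unit weights whose log-mortality falls by 0.01 per year around its 51-year average. The grid bounds, the step sizes and the choice N = 25 are assumptions made for illustration, not values from the paper.

```python
import math

def state_probs(lam, t, n_states):
    """Equation (8): Poisson probabilities p_{0k}(t), k = 0..n_states - 1."""
    probs = [math.exp(-lam * t)]
    for k in range(1, n_states):
        probs.append(probs[-1] * lam * t / k)
    return probs

def profile_waqd(lam, h1, h2, h3, n_states):
    """Minimum of W over Gamma for fixed lam, using Gamma(k) = c_k / A_kk."""
    years = range(len(h1))
    p = [state_probs(lam, t, n_states) for t in years]
    total, cumulative = 0.0, []
    for k in range(n_states):
        a_kk = sum(h1[t] * p[t][k] for t in years)
        c_k = sum(h2[t] * p[t][k] for t in years)
        g = c_k / a_kk
        cumulative.append(g)
        # Equation (5) evaluated at the optimal Gamma(k)
        total += sum(p[t][k] * (g * g * h1[t] - 2.0 * g * h2[t] + h3[t]) for t in years)
    return total, cumulative

def grid_search(h1, h2, h3, n_states, grid):
    """Return (W, lam) with the smallest profiled WAQD on the grid."""
    return min((profile_waqd(lam, h1, h2, h3, n_states)[0], lam) for lam in grid)

# Synthetic inputs: one age, unit weights, so h1 = 1, h2 = deviation, h3 = deviation^2.
deviation = [-0.01 * (t - 25) for t in range(51)]
h1 = [1.0] * 51
h2 = deviation
h3 = [d * d for d in deviation]

coarse = [0.1 * i for i in range(1, 31)]                   # lam = 0.1, ..., 3.0
w_coarse, lam_coarse = grid_search(h1, h2, h3, 26, coarse)
fine = [lam_coarse + 0.01 * i for i in range(-9, 10)]      # refine around the best point
w_best, lam_best = grid_search(h1, h2, h3, 26, fine)
```

The refinement step only searches near the best coarse grid point, so it may miss other local minima; a practical implementation would probably refine around every local minimum found on the coarse grid.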

4.3. Full Model with Age Effects

For the more comprehensive model allowing for age effects, i.e., with structure

μ_{x}^{(i + 1)} = μ_{x}^{(i)} exp (b_{x} γ (i + 1))

as spelled out in Section 2, the WAQD pertaining to calendar year t is

{\tilde{W}}_{t} = \sum_{k = 0}^{N} p_{0 k} (t) \sum_{x = 20}^{104} w_{x, t} {(b_{x} \sum_{j = 0}^{k} γ (j) + ln \bar{μ} (x) - ln {\hat{μ}}_{x, t})}^{2} .

(11)

As before, the total WAQD from year Y to year 2000 is then

\tilde{W} = \sum_{t = 0}^{2000 - Y} {\tilde{W}}_{t}

.

The WAQD then needs to be optimized with respect to the mortality improvement factors

γ (k)

for

k = 0, \dots, N

, as well as the age effects

b_{x}

for

x = 20, \dots, 104

. As in the proof of Proposition 1, let the cumulative sum of the mortality improvements be

Γ (k) = \sum_{j = 0}^{k} γ (j)

, for

k = 0, \dots, N

, and let

Γ = {(Γ (0), Γ (1), \dots, Γ (N))}^{T}

. Furthermore, define

b = {(b_{20}, b_{21}, \dots, b_{104})}^{T}

. The following system of equations has to be solved:

\begin{matrix} \frac{\partial \tilde{W}}{\partial Γ} = 0 \Leftrightarrow Γ (k) = ı_{k + 1}^{T} {\tilde{A}}^{- 1} \tilde{c}, \end{matrix}

(12a)

\begin{matrix} \frac{\partial \tilde{W}}{\partial b} = 0 \Leftrightarrow b_{x} = ı_{x - 19}^{T} B^{- 1} d . \end{matrix}

(12b)

In Equation (12a),

\tilde{A}

and

\tilde{c}

are, respectively, versions of

A

and

c

(as defined above in Proposition 1), which are modified to allow for the age effects. Specifically,

\tilde{A} = \sum_{t = 0}^{2000 - Y} {\tilde{h}}_{1} (t) P (t)

with

{\tilde{h}}_{1} (t) = \sum_{x = 20}^{104} w_{x, t} b_{x}^{2}

, while

\tilde{c} = \sum_{t = 0}^{2000 - Y} {\tilde{h}}_{2} (t) p (t)

with

{\tilde{h}}_{2} (t) = \sum_{x = 20}^{104} w_{x, t} b_{x} (ln {\hat{μ}}_{x, t} - ln \bar{μ} (x))

. The vector

p (t)

and the matrix

P (t)

are unchanged from Proposition 1.

In Equation (12b),

B = \sum_{t = 0}^{2000 - Y} h_{4} (t) V (t)

. Here,

h_{4} (t) = \sum_{k = 0}^{N} p_{0 k} (t) Γ {(k)}^{2}

. We observe that

h_{4} (t)

is the second moment of

Γ (X_{t})

where

X_{t}

is the random state of the Markov chain at time t, i.e.,

h_{4} (t) = E [Γ {(X_{t})}^{2}]

. Furthermore,

V (t) = diag (w (t))

, i.e.,

V (t)

is a diagonal matrix whose leading diagonal is made up of elements of

w (t) = {(w_{20, t}, w_{21, t}, \dots, w_{104, t})}^{T}

.

Finally, in Equation (12b), we also have

d = \sum_{t = 0}^{2000 - Y} h_{5} (t) z (t)

. Here,

h_{5} (t) = \sum_{k = 0}^{N} p_{0 k} (t) Γ (k)

, and we observe that

h_{5} (t)

is the first moment of

Γ (X_{t})

(compare with

h_{4} (t)

above which is the second moment). Furthermore,

z (t) = {(z_{20, t}, z_{21, t}, \dots, z_{104, t})}^{T}

where

z_{j, t} = w_{j, t} (ln {\hat{μ}}_{j, t} - ln \bar{μ} (j))

for

j \in [20, 104]

.

The system of Equations (12a) and (12b) can be solved numerically by successive substitution, as follows. At the first stage, start with the preliminary model without age effects, i.e., $b_{x} = 1$ at all ages $x \in [20, 104]$. Cumulative changes to mortality $Γ (k)$ in all states $k \in [0, N]$ can then be calculated using Proposition 1. These are then substituted into the right-hand side of Equation (12b), yielding second-stage values for $b_{x}$ at all ages $x \in [20, 104]$. In turn, these are substituted into the right-hand side of Equation (12a), leading to second-stage values for $Γ (k)$ in all states $k \in [0, N]$, and so on, until convergence is reached.
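The successive substitution between Equations (12a) and (12b) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, array layouts, and the fixed iteration count are assumptions, and the matrix $P (t)$ from Proposition 1 is simply taken as an input. The helper `toy_occupancy` is hypothetical and builds Poisson-type occupancy probabilities only so that the sketch can be run on synthetic data.

```python
import numpy as np

def successive_substitution(P, p, w, y, n_iter=50):
    """Alternate between Equations (12a) and (12b).

    P : (T, K, K) array, the matrices P(t) of Proposition 1 (taken as given)
    p : (T, K) array, occupancy probabilities p_{0k}(t)
    w : (T, X) array, weights w_{x,t}
    y : (T, X) array, ln(mu_hat_{x,t}) - ln(mu_bar(x))
    A fixed number of iterations stands in for a convergence test (assumption).
    """
    n_ages = w.shape[1]
    b = np.ones(n_ages)                      # first stage: no age effects
    for _ in range(n_iter):
        # Equation (12a): Gamma = A_tilde^{-1} c_tilde
        h1 = (w * b**2).sum(axis=1)          # h~_1(t)
        h2 = (w * b * y).sum(axis=1)         # h~_2(t)
        A = np.einsum("t,tjk->jk", h1, P)    # sum_t h~_1(t) P(t)
        c = h2 @ p                           # sum_t h~_2(t) p(t)
        Gamma = np.linalg.solve(A, c)
        # Equation (12b): b = B^{-1} d, with B diagonal
        h4 = p @ Gamma**2                    # E[Gamma(X_t)^2]
        h5 = p @ Gamma                       # E[Gamma(X_t)]
        B_diag = h4 @ w                      # diagonal of sum_t h_4(t) V(t)
        d = h5 @ (w * y)                     # sum_t h_5(t) z(t)
        b = d / B_diag
    return Gamma, b

def toy_occupancy(lam, n_states, n_years):
    """Hypothetical occupancy probabilities: Poisson jump counts with
    intensity lam, with the last state absorbing the remaining mass."""
    p = np.zeros((n_years, n_states))
    for t in range(n_years):
        term = np.exp(-lam * t)
        for k in range(n_states - 1):
            p[t, k] = term
            term *= lam * t / (k + 1)
        p[t, -1] = 1.0 - p[t, :-1].sum()
    return p
```

Because $B$ is diagonal, the solve in Equation (12b) reduces to an elementwise division, which is what the last line of the loop does.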

4.4. Results of Calibration

Our calibration procedure generates estimates for all the parameters of our Markov model, but it does not tell us what the size of the Markov chain should be, i.e., the number of states. We suggest that this can be determined by numerical experimentation. For the preliminary model, the calibration has been run with several values of N.

We find that, as N increases, the minimum WAQD decreases. This is as anticipated because the more states there are, the more parameters are involved, and the better the fit. We also find that, as N increases, our (WAQD-minimizing) estimate of $λ$ increases. This is illustrated in Table 1, where the model is calibrated on 51 years of annual mortality data, from 1950 onward, and N is chosen as a multiple of 50 ($N / 50 = 0.5, 1, 2, \dots$). Adding states without changing $λ$ means that the probability of eventually entering the last few “Alive” states will become smaller and eventually negligible. In order to significantly improve the fit, these probabilities need to be sufficiently different from zero, which is achieved by increasing $λ$ so that the process traverses as much of the state space as possible.
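The effect of adding states at a fixed intensity can be illustrated numerically. The sketch below assumes, purely for illustration, that jumps between ‘Alive’ states form a Poisson counting process with constant intensity $λ$ and that deaths are ignored; the function name `prob_reached` is hypothetical.

```python
import math

def prob_reached(state, lam, years):
    """Probability that at least `state` jumps have occurred after `years`,
    assuming Poisson jump counts with constant intensity `lam`
    (an illustrative simplification that ignores deaths)."""
    mean = lam * years
    term = math.exp(-mean)        # P(exactly 0 jumps)
    below = 0.0
    for j in range(state):
        below += term             # accumulate P(exactly j jumps)
        term *= mean / (j + 1)
    return max(0.0, 1.0 - below)

# With 50 years elapsed, compare reaching state 100 under the two
# intensities reported in Table 1 for N = 50 and N = 100.
for lam in (1.42, 2.69):
    print(lam, prob_reached(100, lam, 50))
```

Under these assumptions, keeping the intensity estimated for $N = 50$ leaves state 100 almost unreachable within the calibration window, whereas the larger intensity estimated for $N = 100$ makes it very likely to be reached.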

Finally, we also find that, as N increases, the estimated log-changes to mortality rates, $γ (k)$, $k \in [0, N]$, decrease (results not shown here for economy of space). More states lead to more transitions if the transition intensity $λ$ increases. To compensate for this, the impact of each transition should be smaller.

For the full Markov model with $N = 50$, parameterized from 51 years of mortality data from 1950, we estimate $λ = 1.29$. Table 2 and Table 3 show the optimal parameter values for $Γ (k)$, $k \in [0, N]$, and $b_{x}$, $x \in [20, 104]$, respectively. We observe from Table 2 that $Γ (0) = γ (0)$ is strongly positive, indicating that mortality in the initial state is much higher than the average mortality across the period of investigation. This is to be expected since mortality rates decrease during the period. For the same reason, it is unsurprising that $Γ$ decreases as a function of state, becoming negative from state 33 onward. At that point, mortality is below average and will reduce further. The values in Table 3 follow a less monotone pattern. However, we can observe that $b_{x}$ is relatively high, and therefore mortality improvements are more pronounced, for young ages, up to age 45, say, compared to later ages. Moreover, note that the values for $b_{x}$ are small for $x \geq 90$, so for very high ages, mortality improvements do not have a very significant impact. This suggests the possible existence of an upper limit to lifespan. This is in line with expectation-based methods, where mortality reductions are observed to be greater for younger ages; see, e.g., [44].

5. Forecasting

5.1. Forecasting Procedure

To test the forecasting power of our Markov model, we calibrate it on (in-sample) mortality data from 1950 until 2000, and then forecast mortality from 2001 onward. We can then compare our forecast mortality rates to (out-of-sample) mortality data from 2001.

In order to create the forecasts, the Markov model must be augmented by new ‘Alive’ states. This is akin to forecasting a Markov counting process and adding integer states. Whilst we can use the estimated transition intensity $\hat{λ}$ and the estimated age effect factors ${\hat{b}}_{x}$, $x \in [20, 104]$, from the model calibration stage, the mortality change factors $\{γ (k)\}$ over the augmented states must themselves be forecast.

In order to project the mortality change factors, we employ an innovations state space model (Hyndman et al. [30]). This is a richer class of models than the classical Holt–Winters exponential smoothing model (Hyndman et al. [45]). Trends and seasonal components, which may be of either additive or multiplicative form, are simultaneously estimated (Ord et al. [46]). We use the bias-corrected Akaike information criterion (cAIC) to select the best model in the class of innovations state space models: this turns out to be the so-called “damped trend” model (McKenzie and Gardner [47]; Hyndman et al. [30], p. 48), which is reported to be highly successful in terms of forecast accuracy when applied to different types of data (Makridakis and Hibon [48]; Gardner and McKenzie [49]; Fildes [50]). Parameter estimation is performed via maximum likelihood estimation, and both point forecasts and prediction intervals can be generated. Refer to Appendix A for details.

For the full model with age effects and $N = 50$, a plot of $Γ (k)$, along with forecasts and confidence intervals, is shown in Figure 2.

5.2. Forecast Accuracy

The forecast accuracy of our Markov model may be assessed by comparing the actual mortality rates in the out-of-sample years from 2001 onward to the forecast mortality rates in these out-of-sample years, according to our model. As in the seminal Lee–Carter stochastic mortality model [1], we wish to compare the log of the instantaneous mortality rates.

Let $\tilde{μ} (x, t)$ be a random variable denoting the instantaneous mortality rate at age x in an out-of-sample year t according to our model. Our central forecast of the log-mortality rate is $E [\ln \tilde{μ} (x, t)]$. By summing the squared deviation between our central forecast and the observed log-mortality over all ages, the forecast error in an out-of-sample year t can therefore be measured as:

\sum_{x = 20}^{104} {(E [ln \tilde{μ} (x, t)] - ln {\hat{μ}}_{x, t})}^{2} .

(13)

In the Markov model, from Figure 1, the log-mortality rate at age x when in state k is

ln μ_{x}^{(k)} = ln μ_{x}^{(k - 1)} + b_{x} γ (k) = ln μ_{x}^{s} + \sum_{j = 0}^{k} b_{x} γ (j),

(14)

where the standard or baseline mortality is estimated from the mortality data in the HMD [36], $μ_{x}^{s} = \bar{μ} (x)$, as explained near Equation (1). Thus, the expected log-mortality at age x and time t in the Markov process is the log-mortality at age x when in state k in Equation (14), weighted by the probability that the process is in state k after t years given the starting state 0, summed over all possible values of state k:

E [ln \tilde{μ} (x, t)] = \sum_{k} p_{0 k} (t) (\sum_{j = 0}^{k} b_{x} γ (j) + ln \bar{μ} (x)) .

(15)

Substituting the expected log-mortality in Equation (15) above into Equation (13) gives the forecast error in an out-of-sample year t:

\sum_{x = 20}^{104} {[(b_{x} \sum_{k} p_{0 k} (t) \sum_{j = 0}^{k} γ (j)) + ln \bar{μ} (x) - ln {\hat{μ}}_{x, t}]}^{2},

(16)

noting that $\sum_{k} p_{0 k} (t) = 1$. Note that the error calculated in Equation (16) above for forecasting purposes is subtly different from the WAQD in Equation (11) used for calibration purposes. Note also that the forecast error can be readily and exactly calculated without need for simulations or approximations.
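A small sketch of Equations (15) and (16) is given below, using the identity $Γ (k) = \sum_{j = 0}^{k} γ (j)$ from the caption of Table 2. The function names and array layouts are assumptions made for illustration, and the numbers in the usage lines are made up rather than taken from the paper.

```python
import numpy as np

def expected_log_mortality(b, Gamma, p_t, log_mu_bar):
    """Equation (15): E[ln mu~(x, t)] for every age x.

    b          : (X,) age effect factors b_x
    Gamma      : (K,) cumulative change factors Gamma(k), including any
                 forecast values for augmented states
    p_t        : (K,) occupancy probabilities p_{0k}(t), summing to 1
    log_mu_bar : (X,) baseline log-mortality ln mu_bar(x)
    """
    return b * (p_t @ Gamma) + log_mu_bar

def forecast_error(b, Gamma, p_t, log_mu_bar, log_mu_obs):
    """Equation (16): squared deviations summed over ages."""
    deviation = expected_log_mortality(b, Gamma, p_t, log_mu_bar) - log_mu_obs
    return float(np.sum(deviation**2))

# Illustrative (made-up) inputs with two ages and three states.
b = np.array([1.0, 2.0])
Gamma = np.array([1.0, 0.0, -1.0])
p_t = np.array([0.2, 0.3, 0.5])
log_mu_bar = np.array([-5.0, -3.0])
log_mu_obs = np.array([-5.3, -3.5])
print(forecast_error(b, Gamma, p_t, log_mu_bar, log_mu_obs))
```

No simulation is involved: the expectation over states is a single dot product between the occupancy probabilities and the cumulative change factors.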

Table 4 lists the forecast errors at different out-of-sample years for three models: a naïve model where mortality is static and remains as in year 2000; the Markov model calibrated with $N = 50$ and $Y = 1950$; and the Markov model calibrated with $N = 10$ and $Y = 1990$. Recall from Section 2 that there are $N + 2$ states in total: an initial ‘Alive’ state with preliminary mortality, a terminal ‘Dead’ state, and N ‘Alive’ states with improved mortality. So, the second and third Markov models in Table 4 have 52 and 12 states in total, respectively. Note also that these models are the full Markov models that allow for age effects.

We observe from Table 4 that the forecast errors for all three models generally increase the further out one is in the out-of-sample period, as one might anticipate. Judging by the total forecast errors over all of the out-of-sample years (in the bottom row of Table 4), the Markov models clearly outperform the naïve model. This lends credibility to our Markov modeling approach.

Somewhat surprisingly, the Markov model with fewer states outperforms the other Markov model in Table 4. However, our initial investigations show that it is not clear-cut that fewer states lead to better forecasts. Further research will be required to be more conclusive. The results in Table 4 serve mainly to illustrate that the Markov model is a viable model that can perform well in terms of forecasting mortality.

6. Applications in Life Insurance and Pensions

In the life insurance and pensions industries, models of mortality are of critical importance. Commonly adopted measures of mortality changes include distributions of expectation of life and distributions of present values of annuities at future durations and ages. A key advantage of the Markov approach in this paper is that such indices can be calculated exactly by solving Thiele’s differential equations (Dickson et al. [29], p. 266). These differential equations enable an insurer to calculate the reserves that it needs to hold when it sells a portfolio of life insurance policies. With the Markov chain approach, we can add the different states directly to Thiele’s differential equations and solve the differential equations numerically at multiple durations. This does not require any simulations, unlike other models of stochastic mortality, such as the Lee–Carter model [1].

We give a brief example to illustrate this. If ${\bar{e}}_{x}^{(j)}$ denotes the complete expectation of life of an x-year-old in state j, the appropriate Thiele’s differential equation for $j \in \{0, \dots, N\}$ would be:

\frac{d}{d t} {\bar{e}}_{x + t}^{(j)} = - 1 - \sum_{k = j + 1}^{N + 1} λ ({\bar{e}}_{x + t}^{(k)} - {\bar{e}}_{x + t}^{(j)}) + μ_{x + t}^{(s)} exp [b_{x + t} Γ (j)] {\bar{e}}_{x + t}^{(j)} .

(17)

For $j = N + 1$, this equation reduces to

\frac{d}{d t} {\bar{e}}_{x + t}^{(N + 1)} = - 1 + μ_{x + t}^{(s)} exp [b_{x + t} Γ (N + 1)] {\bar{e}}_{x + t}^{(N + 1)},

(18)

while

\frac{d}{d t} {\bar{e}}_{x + t}^{(N + 2)} = 0 .

(19)

The appropriate boundary conditions are ${\bar{e}}_{ω}^{(j)} = 0$ for $j \in \{0, \dots, N + 2\}$, where $ω$ denotes the limiting age of a life. In addition, $μ^{(s)}$ and $b$ are defined for non-integer ages by applying polynomial interpolation between integer ages.
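Differential equations of the form (17)–(19) can be integrated backward from the boundary condition at the limiting age. The sketch below uses a plain explicit Euler scheme with a generic matrix of jump intensities; the function name `solve_thiele`, the step size, and the constant mortality rates in the usage lines are assumptions for illustration and are not the scheme or the inputs used in the paper.

```python
import numpy as np

def solve_thiele(mu, jump, horizon, step=1e-3):
    """Integrate d/dt e_j = -1 - sum_k q_jk (e_k - e_j) + mu_j(t) e_j
    backward from e_j(horizon) = 0 to t = 0 with explicit Euler steps.

    mu      : callable, mu(t) -> (n,) mortality rate in each state at duration t
    jump    : (n, n) array of jump intensities q_jk between 'Alive' states
    horizon : years remaining until the limiting age
    """
    n_states = jump.shape[0]
    e = np.zeros(n_states)                    # boundary condition at limiting age
    out_rate = jump.sum(axis=1)
    n_steps = int(round(horizon / step))
    for i in range(n_steps):
        t = horizon - i * step
        jump_term = jump @ e - out_rate * e   # sum_k q_jk (e_k - e_j)
        de_dt = -1.0 - jump_term + mu(t) * e
        e = e - step * de_dt                  # step from t back to t - step
    return e

# Illustrative two-state example: state 1 has lower (made-up) mortality,
# and state 0 jumps to state 1 with intensity 1.29.
mu = lambda t: np.array([0.05, 0.02])
jump = np.array([[0.0, 1.29],
                 [0.0, 0.0]])
print(solve_thiele(mu, jump, horizon=30.0, step=1e-2))
```

With no jumps and a constant mortality rate $μ$, the equation has the closed-form solution $(1 - e^{- μ τ}) / μ$ for a remaining horizon $τ$, which gives a convenient check of the numerical scheme.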

For durations 25, 40, and 55 (so calendar years 2000, 2015, and 2030), the cumulative distribution functions (CDFs) of complete expectation of life are displayed for ages 50 and 80 in Figure 3 and Figure 4, respectively. For a 50-year-old, the mean life expectancies are 34.77, 36.73, and 38.50, respectively. For an 80-year-old, they are 8.97, 10.03, and 11.05, respectively.

From Figure 3 and Figure 4, we notice that the CDFs move to the right as duration goes up. This is not surprising when we consider the extrapolative nature of forecasting. Mortality improvements have been observed during the periods of observation, so we would expect mortality improvements to continue in future years. The variability of remaining lifetime is lower for age 80 than for age 50, due to the more limited remaining life span. Figure 3 and Figure 4 capture the variability of future remaining lifetimes and, therefore, the number of years that annuities or pensions will remain payable. They can therefore help pension and annuity providers to determine the amount of capital to hold to cover longevity risk. The CDFs can also help national governments with future general public planning (health care needs, etc.).

7. Conclusions

In this paper, we introduced a Markov chain model for stochastic mortality based on time-homogeneous continuous-time mortality changes, and we demonstrated its advantages in terms of flexibility and ease of calibration. We modeled age-independent changes in mortality by means of transitions across several ‘Alive’ states, along with a terminal ‘Dead’ state. Our preliminary model considers mortality improvements in population-wide mortality, whereas our full model allows for age effects in mortality improvements.

We used female mortality statistics drawn from the Human Mortality Database to calibrate our models using a two-step estimation procedure. In the first step, we obtained a closed-form solution to the minimization of a weighted average quadratic distance (WAQD) with respect to the cumulative log-change in mortality rates, and we then numerically estimated the transition intensities. Our investigation shows that the choice of total number of ‘Alive’ states is critical. On the one hand, the greater the number of states, the better the fit to the data. On the other hand, a model with a higher number of states means that more mortality change factors are to be estimated and may lead to overfitting.

We calibrated the models on in-sample mortality data from 1950 until 2000, and then forecast mortality from 2001 onward. Our forecast can then be compared with the out-of-sample data. We employed an innovations state space model, in particular the damped trend model, to project the mortality change factors. We used these forecasts along with the estimated transition intensity and age effect factors for forecasting. We compared the actual mortality rates to the forecast mortality rates for out-of-sample data to find the forecast error for three models: naïve (with static mortality rates), full Markov model with 50 states and starting year 1950, and full Markov model with 10 states and starting year 1990. The Markov models exhibited a lower forecast error than the static model. As expected, the forecast error increased as we moved further out of sample.

Finally, we presented an application of our Markov approach to life insurance and pensions. Key mortality change indicators, such as the distributions of life expectancy and expected present values of annuities, are easily calculated using Thiele’s differential equations. This should facilitate the estimation and management of longevity risk by life insurers, pension providers, and others.

The main novelty of our model is the application of both Markov chains and innovations state space models to the mortality forecasting problem. Our method has many advantages, including flexibility and ease of parameterization. With regard to flexibility, as many mortality improvement states as required can be added to the model. As demonstrated in this work, the model can be easily calibrated to real mortality data. Life expectancies and reserves required for life insurance and pensions are also easily computed, without recourse to simulations. Therefore, our model can help practitioners forecast mortality and manage longevity risk more easily.

Our work has some limitations that require further investigation and exploration. Our model is fitted by minimizing the WAQD, but other criteria could be used for this purpose. In addition, the transition intensities are assumed to be constant between the various ‘Alive’ states. Furthermore, our estimation method has a limitation due to the two-step method that we utilize, whereby mortality rates are first estimated and then an innovations state space model is used for projection. The model also disregards idiosyncratic shocks to mortality, such as COVID mortality.

For future work, we intend to undertake a more rigorous and systematic investigation into the combination of factors, such as the number of states, period of investigation, and transition intensities in the model, that delivers the best forecast accuracy. This will also enable us to compare the performance of our model with that of a mainstream one, such as the Lee–Carter model. In this paper, only point forecasts were used in judging the forecasting power of our model. Another topic for future research would involve using information on prediction intervals, reflecting the parameter uncertainty of the model.

Author Contributions

Conceptualization, J.S. and I.O.; methodology, J.S. and I.O.; formal analysis, J.S. and I.O.; investigation, J.S. and I.O.; resources, I.O.; data curation, J.S.; software, J.S., I.O. and M.K.; calibration, J.S. and I.O.; writing—original draft preparation, J.S., I.O. and M.K.; writing—review and editing, I.O. and M.K.; visualization, J.S. and I.O.; supervision, J.S. and I.O.; project administration, I.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data on population estimates and mortality rates are available on the Human Mortality Database: https://www.mortality.org/ (accessed on 5 March 2020).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Innovations State Space Model

In order to forecast the coefficients $\{γ (k)\}$, we assume that they are realizations of a stochastic process indexed by the stage $k \in \mathbb{N}$ of mortality improvement. The stochastic process is described by an innovations state space model, which is briefly described here. For details, see Hyndman et al. [30].

The innovations state space model can be written by means of an observation equation

γ (k) = ℓ_{k - 1} + ϕ b_{k - 1} + ε_{k},

(A1)

and two state equations

\begin{matrix} ℓ_{k} & = ℓ_{k - 1} + ϕ b_{k - 1} + α ε_{k}, \end{matrix}

(A2)

\begin{matrix} b_{k} & = ϕ b_{k - 1} + β ε_{k} . \end{matrix}

(A3)

Here, $ℓ_{k}$ denotes the level of the data, superposed on a trend $b_{k}$, along with additive noise $ε_{k}$, which is identically normally distributed with zero mean and variance $σ_{ε}^{2}$. The parameters $α$ and $β$ are smoothing parameters for the level and trend, respectively, whilst $ϕ$ controls the speed at which the trend flattens out.

Three basic specifications exist, depending on the values of the three parameters: $\{α \in (0, 1), β \equiv ϕ \equiv 0\}$ or $\{α, β \in (0, 1), ϕ \equiv 1\}$ or $\{α, β, ϕ \in (0, 1)\}$. This is extended to a total of 10 specifications by allowing the trend and/or error to enter multiplicatively into the observation and state equations: for details, see Hyndman et al. [30]. (Seasonal components can also be incorporated, but visual inspection does not reveal any seasonality, so this is ignored here.)

We omit the last estimated value of $γ (k)$, pertaining to the terminal state of our Markov model. Since this last state is an absorbing state, $γ (N)$ is an outlier as a result of the boundary effects. We then choose the best model by minimizing the bias-corrected Akaike information criterion (cAIC).

Parameter values are found by maximizing likelihood, as described by Hyndman et al. [30]. Initial values of the state variables are chosen according to a heuristic scheme, which is empirically verified by Hyndman et al. [45]. Parameter estimates, along with initialization values and cAIC values, are displayed in Table A1. (If a parameter value appears as 0 or 1, this means that the model specification is such that the parameter is identical to 0 or 1, respectively.)

Table A1. MLE parameter estimates ($\hat{α}$, $\hat{β}$, $\hat{ϕ}$), initialization values ($ℓ_{0}$, $b_{0}$), standard error $σ_{ε}$ of innovations, and bias-corrected Akaike information criterion (cAIC) for the innovations state space model fitted to $Γ (k)$ for the two Markov models described in Section 5.2.

Y	N	$\hat{α}$	$\hat{β}$	$\hat{ϕ}$	$ℓ_{0}$	$b_{0}$	$σ_{ε}$	cAIC
1950	50	0.9983	0.9983	1	39.55	$- 2.3105$	0.2439	61.69
1990	10	0.0001	0.0001	1	4.4896	$- 0.5626$	0.3163	18.08

Point forecasts are readily calculated by substitution and iteration in the observation and state Equations (A1)–(A3), with the error term replaced by its mean of zero. Confidence intervals can also be calculated since closed-form expressions for conditional variances are known for the cAIC-minimizing models that are specified for our data (see Hyndman et al. [30]). A plot of $Γ (k)$, with forecasts and confidence intervals, is shown in Figure 2.
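The iteration of Equations (A1)–(A3) with the error term set to zero can be sketched as below. The function names are hypothetical, the series is treated generically, and the numbers in the usage lines are made up for illustration rather than taken from Table A1.

```python
def damped_trend_filter(series, alpha, beta, phi, level0, trend0):
    """Run the state Equations (A2)-(A3) through an observed series and
    return the final level and trend."""
    level, trend = level0, trend0
    for value in series:
        fitted = level + phi * trend          # one-step prediction from (A1)
        error = value - fitted                # innovation epsilon_k
        level = fitted + alpha * error        # Equation (A2)
        trend = phi * trend + beta * error    # Equation (A3)
    return level, trend

def damped_trend_forecast(level, trend, phi, horizon):
    """Point forecasts: iterate (A1)-(A3) with epsilon_k replaced by zero."""
    forecasts = []
    for _ in range(horizon):
        point = level + phi * trend           # Equation (A1) with zero error
        level = point                         # Equation (A2) with zero error
        trend = phi * trend                   # Equation (A3) with zero error
        forecasts.append(point)
    return forecasts

# Made-up example: the trend contribution halves at each step when phi = 0.5.
print(damped_trend_forecast(level=10.0, trend=-2.0, phi=0.5, horizon=3))
```

With $ϕ = 1$ the forecasts continue along a straight line, while $ϕ < 1$ makes successive increments shrink geometrically, which is the sense in which $ϕ$ controls how quickly the trend flattens out.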

References

1. Lee, R.D.; Carter, L.R. Modeling and forecasting US mortality. J. Am. Stat. Assoc. 1992, 87, 659–671.
2. Cairns, A.J.G.; Blake, D.; Dowd, K.; Coughlan, G.D.; Epstein, D.; Ong, A.; Balevich, I. A quantitative comparison of stochastic mortality models using data from England and Wales and the United States. N. Am. Actuar. J. 2009, 13, 1–35.
3. Cairns, A.J.G.; Blake, D.; Dowd, K.; Coughlan, G.D.; Epstein, D.; Khalaf-Allah, M. Mortality density forecasts: An analysis of six stochastic mortality models. Insur. Math. Econ. 2011, 48, 355–367.
4. Dowd, K.; Cairns, A.J.G.; Blake, D.; Coughlan, G.D.; Epstein, D.; Khalaf-Allah, M. Evaluating the goodness of fit of stochastic mortality models. Insur. Math. Econ. 2010, 47, 255–265.
5. Dowd, K.; Cairns, A.J.G.; Blake, D.; Coughlan, G.D.; Epstein, D.; Khalaf-Allah, M. Backtesting stochastic mortality models: An ex post evaluation of multiperiod-ahead density forecasts. N. Am. Actuar. J. 2010, 14, 281–298.
6. Haberman, S.; Renshaw, A.E. A comparative study of parametric mortality projection models. Insur. Math. Econ. 2011, 48, 35–55.
7. Stoeldraijer, L.; Van Duin, C.; Van Wissen, L.; Janssen, F. Impact of different mortality forecasting methods and explicit assumptions on projected future life expectancy: The case of the Netherlands. Demogr. Res. 2013, 29, 323–354.
8. Guibert, Q.; Lopez, O.; Piette, P. Forecasting mortality rate improvements with a high-dimensional VAR. Insur. Math. Econ. 2019, 88, 255–272.
9. Hunt, A.; Blake, D. On the structure and classification of mortality models. N. Am. Actuar. J. 2021, 25, S215–S234.
10. Booth, H.; Tickle, L. Mortality modelling and forecasting: A review of methods. Ann. Actuar. Sci. 2008, 3, 3–43.
11. Lee, R.; Miller, T. Evaluating the performance of the Lee-Carter method for forecasting mortality. Demography 2001, 38, 537–549.
12. Booth, H.; Maindonald, J.; Smith, L. Applying Lee-Carter under conditions of variable mortality decline. Popul. Stud. 2002, 56, 325–336.
13. Brouhns, N.; Denuit, M.; Vermunt, J.K. A Poisson log-bilinear regression approach to the construction of projected lifetables. Insur. Math. Econ. 2002, 31, 373–393.
14. Hatzopoulos, P.; Haberman, S. A parameterized approach to modeling and forecasting mortality. Insur. Math. Econ. 2009, 44, 103–123.
15. Hyndman, R.J.; Ullah, M.S. Robust forecasting of mortality and fertility rates: A functional data approach. Comput. Stat. Data Anal. 2007, 51, 4942–4956.
16. De Jong, P.; Tickle, L. Extending Lee–Carter mortality forecasting. Math. Popul. Stud. 2006, 13, 1–18.
17. Booth, H.; Hyndman, R.J.; Tickle, L.; De Jong, P. Lee-Carter mortality forecasting: A multi-country comparison of variants and extensions. Demogr. Res. 2006, 15, 289–310.
18. Cairns, A.J.G.; Blake, D.; Dowd, K. A two-factor model for stochastic mortality with parameter uncertainty: Theory and calibration. J. Risk Insur. 2006, 73, 687–718.
19. Li, H.; O’Hare, C. Semi-parametric extensions of the Cairns–Blake–Dowd model: A one-dimensional kernel smoothing approach. Insur. Math. Econ. 2017, 77, 166–176.
20. Renshaw, A.E.; Haberman, S. A cohort-based extension to the Lee–Carter model for mortality reduction factors. Insur. Math. Econ. 2006, 38, 556–570.
21. Plat, R. On stochastic mortality modeling. Insur. Math. Econ. 2009, 45, 393–404.
22. Cairns, A.J.G.; Blake, D.; Dowd, K.; Coughlan, G.D.; Khalaf-Allah, M. Bayesian stochastic mortality modelling for two populations. ASTIN Bull. J. IAA 2011, 41, 29–59.
23. Reither, E.N.; Olshansky, S.J.; Yang, Y. New forecasting methodology indicates more disease and earlier mortality ahead for today’s younger Americans. Health Aff. 2011, 30, 1562–1568.
24. Levantesi, S.; Pizzorusso, V. Application of machine learning to mortality modeling and forecasting. Risks 2019, 7, 26.
25. Atance, D.; Debón, A.; Navarro, E. A comparison of forecasting mortality models using resampling methods. Mathematics 2020, 8, 1550.
26. Norberg, R. Optimal hedging of demographic risk in life insurance. Financ. Stochastics 2013, 17, 197–222.
27. Lin, X.S.; Liu, X. Markov aging process and phase-type law of mortality. N. Am. Actuar. J. 2007, 11, 92–109.
28. Liu, X.; Lin, X.S. A subordinated Markov model for stochastic mortality. Eur. Actuar. J. 2012, 2, 105–127.
29. Dickson, D.C.M.; Hardy, M.R.; Waters, H.R. Actuarial Mathematics for Life Contingent Risks, 2nd ed.; Cambridge University Press: New York, NY, USA, 2013.
30. Hyndman, R.J.; Koehler, A.B.; Ord, J.K.; Snyder, R.D. Forecasting with Exponential Smoothing: The State Space Approach; Springer: Berlin/Heidelberg, Germany, 2008.
31. Amsler, M.H. Les chaines de Markov des assurances vie, invalidité et maladie. In Proceedings of the Transactions of the 18th International Congress of Actuaries, Munich, Germany, 4–11 June 1968; Volume 5, pp. 731–746.
32. Hoem, J.M. Markov chain models in life insurance. Blätter Der DGVFM 1969, 9, 91–107.
33. Haberman, S.; Pitacco, E. Actuarial Models for Disability Insurance; Chapman & Hall: London, UK, 2018.
34. Wolthuis, H. Life Insurance Mathematics (The Markovian Model); IAE, Universiteit van Amsterdam: Amsterdam, The Netherlands, 2003.
35. Chiou, J.M.; Müller, H.G. Modeling hazard rates as functional data for the analysis of cohort lifetables and mortality forecasting. J. Am. Stat. Assoc. 2009, 104, 572–585.
36. Human Mortality Database. 2020. Available online: https://www.mortality.org/ (accessed on 5 March 2020).
37. Jarner, S.F.; Kryger, E.M. Modelling adult mortality in small populations: The SAINT model. ASTIN Bull. 2011, 41, 377–418.
38. Itô, K. Encyclopedic Dictionary of Mathematics, 2nd ed.; MIT Press: Cambridge, MA, USA, 1993.
39. Shreve, S.E. Stochastic Calculus for Finance II: Continuous-Time Models; Springer: New York, NY, USA, 2004; Volume 11.
40. Pitacco, E.; Denuit, M.; Haberman, S.; Olivieri, A. Modelling Longevity Dynamics for Pensions and Annuity Business; Oxford University Press: New York, NY, USA, 2009.
41. Benjamin, B.; Pollard, J.H. The Analysis of Mortality and Other Actuarial Statistics; The Institute of Actuaries: Oxford, UK, 1993; Volume 3.
42. Johnson, C.R. Positive definite matrices. Am. Math. Mon. 1970, 77, 259–264.
43. Perlis, S. Theory of Matrices; Addison-Wesley: Reading, MA, USA, 1952.
44. CMIB. Report no. 17 Continuous Mortality Investigation Bureau; Technical Report; Institute and Faculty of Actuaries: London, UK, 1999.
45. Hyndman, R.J.; Koehler, A.B.; Snyder, R.D.; Grose, S. A state space framework for automatic forecasting using exponential smoothing methods. Int. J. Forecast. 2002, 18, 439–454.
46. Ord, J.K.; Koehler, A.B.; Snyder, R.D. Estimation and prediction for a class of dynamic nonlinear statistical models. J. Am. Stat. Assoc. 1997, 92, 1621–1629.
47. McKenzie, E.; Gardner, E.S. Damped trend exponential smoothing: A modelling viewpoint. Int. J. Forecast. 2010, 26, 661–665.
48. Makridakis, S.; Hibon, M. The M3-Competition: Results, conclusions and implications. Int. J. Forecast. 2000, 16, 451–476.
49. Gardner, E.S.; McKenzie, E. Why the damped trend works. J. Oper. Res. Soc. 2011, 62, 1177–1180.
50. Fildes, R. The evaluation of extrapolative forecasting methods. Int. J. Forecast. 1992, 8, 81–98.

Figure 1. Transition diagram for the Markov chain model of population-wide mortality changes. Instantaneous transition intensities are shown for each allowable transition. There are $N + 1$ “Alive” states and one “Dead” state. Age is denoted by x and, in the preliminary model, there are no age effects $(b_{x} = 1)$.

Figure 2. Plot of cumulative mortality change factor $Γ (k)$ for the full model with age effects and $N = 50$. Calibrated values from in-sample mortality data (starting year $Y = 1950$) are shown up to state 50, and forecasts and confidence intervals are shown for later states.

Figure 3. Cumulative distribution function of complete expectation of life for a 50-year-old at durations 25 (blue), 40 (orange), and 55 (green).

Figure 4. Cumulative distribution function of complete expectation of life for an 80-year-old at durations 25 (blue), 40 (orange), and 55 (green).

Table 1. Optimal parameter values for the transition intensity from one ‘Alive’ state to the next in the preliminary Markov model, for different values of N, i.e., for a different number of states in the Markov chain; 51 years of observations are used from the starting year $Y = 1950$.

N	$N / 50$	$\hat{λ}$
25	$0.5$	$0.77$
50	1	$1.42$
100	2	$2.69$
150	3	$3.91$
200	4	$5.06$
300	6	$7.27$
400	8	$9.45$

Table 2. Optimal parameter values for $Γ (k) = \sum_{i = 0}^{k} γ (i)$ at each state $k \in [0, 50]$ for the full Markov model with $λ = 1.29$, $N = 50$, parameterized from 51 years of mortality data from 1950.

State k	$Γ (k)$	State k	$Γ (k)$	State k	$Γ (k)$
0	$37.21046234$	17	$10.16128483$	34	$- 1.282926153$
1	$34.89110164$	18	$9.43041363$	35	$- 2.123815014$
2	$31.30856332$	19	$8.708274309$	36	$- 2.999173623$
3	$27.90632886$	20	$8.001132141$	37	$- 3.904036891$
4	$25.19136301$	21	$7.314907495$	38	$- 4.831988911$
5	$23.04653425$	22	$6.653009452$	39	$- 5.775537066$
6	$21.28096167$	23	$6.015309668$	40	$- 6.726552135$
7	$19.7463605$	24	$5.398173284$	41	$- 7.676727809$
8	$18.36637132$	25	$4.79519957$	42	$- 8.618012162$
9	$17.11439091$	26	$4.19826336$	43	$- 9.542970976$
10	$15.98204253$	27	$3.598533779$	44	$- 10.44505611$
11	$14.96112827$	28	$2.987293383$	45	$- 11.31876798$
12	$14.0375066$	29	$2.356516698$	46	$- 12.15971639$
13	$13.19117416$	30	$1.699253023$	47	$- 12.96459607$
14	$12.39925669$	31	$1.009887793$	48	$- 13.7311004$
15	$11.64013218$	32	$0.284343793$	49	$- 14.45779864$
16	$10.89723104$	33	$- 0.479749777$	50	$- 19.13930163$
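
Since Table 2 reports the cumulative sums $Γ(k) = \sum_{i = 0}^{k} γ(i)$, the individual terms $γ(k)$ can be recovered by differencing consecutive entries. The following is a minimal sketch of that arithmetic, using only the first five tabulated values; the names `Gamma` and `increments` are illustrative and do not come from the article.

```python
# First five cumulative values Γ(k), k = 0..4, transcribed from Table 2.
Gamma = [37.21046234, 34.89110164, 31.30856332, 27.90632886, 25.19136301]


def increments(cumulative):
    """Recover γ(k) from Γ(k), using γ(0) = Γ(0) and γ(k) = Γ(k) - Γ(k-1)."""
    out = [cumulative[0]]
    for previous, current in zip(cumulative, cumulative[1:]):
        out.append(current - previous)
    return out


gamma = increments(Gamma)
for k, g in enumerate(gamma):
    print(f"gamma({k}) = {g:.8f}")
```

For these entries every increment after $γ(0)$ is negative, which matches the steadily decreasing $Γ(k)$ column in the table.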

Table 3. Optimal parameter values for $b_{x}$, $x \in [20, 104]$ for the full Markov model with $λ = 1.29$, $N = 50$, parameterized from 51 years of mortality data from 1950.

Age x	$b_{x}$	Age x	$b_{x}$	Age x	$b_{x}$	Age x	$b_{x}$
20	$0.014261355$	42	$0.014812453$	64	$0.009601416$	86	$0.011695384$
21	$0.015866235$	43	$0.014119139$	65	$0.009402891$	87	$0.010812765$
22	$0.017292544$	44	$0.013620313$	66	$0.00894529$	88	$0.010131609$
23	$0.018935807$	45	$0.013317686$	67	$0.009874399$	89	$0.00983736$
24	$0.018301737$	46	$0.013626309$	68	$0.010134124$	90	$0.008982384$
25	$0.01955903$	47	$0.013238125$	69	$0.010570269$	91	$0.0078466$
26	$0.019086171$	48	$0.013587528$	70	$0.010295663$	92	$0.007807907$
27	$0.018720848$	49	$0.013508995$	71	$0.009893928$	93	$0.007240778$
28	$0.018955846$	50	$0.012225902$	72	$0.011135998$	94	$0.006614019$
29	$0.019767372$	51	$0.011260071$	73	$0.01159652$	95	$0.00612189$
30	$0.019052062$	52	$0.011779514$	74	$0.012122628$	96	$0.005999284$
31	$0.017292291$	53	$0.011757458$	75	$0.012241192$	97	$0.005087022$
32	$0.018619477$	54	$0.011429729$	76	$0.012617455$	98	$0.005715154$
33	$0.016869444$	55	$0.009619608$	77	$0.012072695$	99	$0.004075368$
34	$0.017544975$	56	$0.010643902$	78	$0.012779524$	100	$0.004026139$
35	$0.016240706$	57	$0.009609826$	79	$0.012945643$	101	$0.004006924$
36	$0.015965233$	58	$0.010172365$	80	$0.012194586$	102	$0.00183354$
37	$0.016163254$	59	$0.009579479$	81	$0.011769632$	103	$0.003193209$
38	$0.016294748$	60	$0.008964135$	82	$0.012136992$	104	$- 0.00459685$
39	$0.015642919$	61	$0.00845602$	83	$0.0122118$
40	$0.015160679$	62	$0.009601562$	84	$0.012377463$
41	$0.014380544$	63	$0.010045288$	85	$0.011700719$

Table 4. Forecast errors at different out-of-sample years and total forecast errors for three models: a naïve model with static mortality, the Markov model with $N = 50$, and the Markov model with $N = 10$. The Markov models are the full models incorporating age effects, and N determines the size of the state space of the model.

Year	Naïve	$N = 50$	$N = 10$
2001	0.348	1.054	0.349
2002	0.341	1.150	0.299
2003	0.511	1.443	0.387
2004	0.743	1.718	0.545
2005	1.147	1.568	0.685
2006	1.530	2.013	0.955
2007	1.870	2.069	1.079
2008	1.761	2.427	0.992
2009	3.008	3.149	1.767
2010	3.641	2.972	1.927
2011	4.207	3.390	2.116
2012	4.985	2.868	2.412
2013	5.140	3.171	2.334
2014	5.421	3.701	2.269
2015	5.110	3.332	2.406
2016	4.642	3.574	2.192
Total	44.405	39.599	22.714
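
The totals in Table 4, and the year from which each Markov model first has a smaller error than the naïve model, can be checked directly from the tabulated values. The sketch below only re-does that arithmetic on the figures transcribed from the table; the variable names and the helper `first_year_better` are illustrative, not part of the article.

```python
# Forecast errors transcribed from Table 4 (out-of-sample years 2001-2016).
years = list(range(2001, 2017))
naive = [0.348, 0.341, 0.511, 0.743, 1.147, 1.530, 1.870, 1.761,
         3.008, 3.641, 4.207, 4.985, 5.140, 5.421, 5.110, 4.642]
markov_50 = [1.054, 1.150, 1.443, 1.718, 1.568, 2.013, 2.069, 2.427,
             3.149, 2.972, 3.390, 2.868, 3.171, 3.701, 3.332, 3.574]
markov_10 = [0.349, 0.299, 0.387, 0.545, 0.685, 0.955, 1.079, 0.992,
             1.767, 1.927, 2.116, 2.412, 2.334, 2.269, 2.406, 2.192]


def first_year_better(model, baseline):
    """Return the first year in which `model` has a smaller error than `baseline`."""
    for year, m, b in zip(years, model, baseline):
        if m < b:
            return year
    return None


# Column totals, rounded to the three decimals used in the table.
totals = {
    "naive": round(sum(naive), 3),
    "N=50": round(sum(markov_50), 3),
    "N=10": round(sum(markov_10), 3),
}
first_50 = first_year_better(markov_50, naive)
first_10 = first_year_better(markov_10, naive)

print(totals)
print("N=50 first beats naive in", first_50)
print("N=10 first beats naive in", first_10)
```

On the transcribed values, the column sums reproduce the "Total" row, the $N = 50$ model first has a smaller error than the naïve model in 2010, and the $N = 10$ model does so in 2002.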

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
