1. Introduction
Since the start of the 21st century, mortality rates have been decreasing steadily due to several factors, such as improved medical innovations, robotic surgery, better healthcare systems, and better diets, among many others; see (Boo and Choi 2020; Chen 2020; Kilic 2020; Pourhomayoun and Shakibi 2020). These improvements have prompted actuaries, demographers, and statisticians to develop novel approaches to mortality modeling and forecasting in pursuit of greater precision. While this benefits the general global population, many governments, life assurance firms, and life pension companies suffer substantial financial losses because they cannot make precise estimates when offering financial services. Correct mortality risk estimation is vital to their financial survival, especially after the hardship of the global Covid-19 pandemic (see Pourhomayoun and Shakibi 2020), which is likely to lead to a massive global economic recession affecting many nations, both first-world and third-world.
Today, the actuarial literature offers many refined techniques that actuaries, statisticians, and demographers use to forecast future mortality and systematic longevity risks, and to estimate the complete life expectations of those who wish to buy the annuities and life assurance products sold in the market. Since (Lee and Carter 1992), many stochastic mortality models have been used to model and forecast systematic mortality risk. However, these models have different strengths and weaknesses depending on data availability and on the number of parameters that must be estimated. The (Cairns et al. 2006) model improves on some weaknesses of the (Lee and Carter 1992) model through its two-parameter structure and its ability to accommodate cohort effects. (Cairns et al. 2011) demonstrated that modeling a systematic mortality risk that is consistent across ages, while preventing the age lines from overlapping during forecasting, yields desirable results as a parsimonious model, leading to high certainty and acceptability of the results.
Several researchers have applied machine learning to this problem. (Hainaut 2018) proposed a neural network capable of predicting and simulating future systematic mortality risk, using a neural analyzer to detect latent time processes while directly predicting mortality; the approach allowed the identification and reproduction of the non-linearity observed in the changes of the logit forces of mortality. In addition, (Deprez et al. 2017) used machine learning techniques to improve the estimation of the logit mortality risk, and this work was extended by (Levantesi and Pizzorusso 2019) to the mortality forecasting framework of the (Lee and Carter 1992) model. Furthermore, a recent paper by (Richman and Wüthrich 2018) proposed a multi-population extension of the (Lee and Carter 1992) model in which the parameters are estimated using artificial neural networks. Many other relevant uses and applications of machine learning in the actuarial field are discussed by (Castellani et al. 2018) and (Gabrielli et al. 2020), especially with regard to the future of systematic mortality risk modeling methodologies.
In this research study, we use a deep learning technique to improve the predictive capability of the (Cairns et al. 2006) model. More specifically, our approach integrates the original (Cairns et al. 2006) formulation with an artificial recurrent neural network with Long Short-Term Memory (LSTM) architecture for forecasting the future evolution of the time index κ_t, thus overcoming the challenges shown by the traditional ARIMA time series process. The choice of the CBD model over other standard mortality models rests on the fact that the CBD model addresses the cohort effect in mortality that besets other mortality models, incorporating the effect of cohorts where other models used for systematic mortality risk do not.
Using LSTM allows more coherent mortality forecasts under the high dynamism of observed mortality, especially when dealing with nonlinear mortality trends. More precisely, the LSTM network is structured to elaborate long data sequences, forming a memory capable of preserving the vital relationships between the available data and every deviation within these sequences. Within the context of traditional time series, the LSTM makes it possible to predict future mortality over time by considering the substantial influence of historical systematic mortality risk trends before adequately reproducing it in the forecasted trend. In addition, the power of the LSTM lies in preserving information over a given period, thereby preventing older signals from slowly disappearing during processing.
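For reference, this gating mechanism can be written out as the standard LSTM cell equations (the textbook formulation, not notation introduced in this paper), where the forget, input, and output gates decide what is retained in the cell state:

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{forget gate: discards stale signals} \\
i_t &= \sigma\!\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{input gate: admits new information} \\
\tilde{c}_t &= \tanh\!\left(W_c\,[h_{t-1}, x_t] + b_c\right) && \text{candidate cell update} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{cell state: long-term memory} \\
o_t &= \sigma\!\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{output gate} \\
h_t &= o_t \odot \tanh(c_t) && \text{hidden state passed to time } t+1
\end{aligned}
```

The additive update of c_t is what allows old information to persist over many time steps instead of fading away, as described above.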
While this research focuses on forecasting systematic mortality risk trends, the parameter estimation methodology remains similar to that of (Lee and Carter 1992). The paper does not introduce a new method for fitting the mortality surface, such as that of (Hainaut 2018), which applies neural networks (a deep learning technique) to fit mortality rates in place of the conventional Singular Value Decomposition (SVD) method. Rather, this study introduces a novel methodological structure based on the LSTM network for modeling future common trends of systematic mortality risk.
4. Mathematical Application and Results
In this section, we introduce the LSTM and RNN architectures within the standard scheme of the CBD model. More distinctly, the study's objective is to exploit the advantages and functionalities of the LSTM architecture to improve the predictive capacity of the CBD model. To this aim, we design several experiments to test the LSTM's skill in forecasting future systematic mortality risk over time, before comparing its performance with the results derived from the ARIMA model.
Thus, the analysis of the study concerns the prediction of the trend of the time index κ_t, taking the ARIMA model as the forecasting benchmark, whereas the other parameters α_x and β_x are determined as per the estimation method of (Cairns et al. 2006).
Distinctly, since the CBD model applies a simple random walk process with drift, it is vital to calibrate the best ARIMA(p,d,q), as illustrated by (Hyndman and Khandakar 2007). This procedure first checks the stationarity of the time series using a suitable unit root test in order to choose the differencing order d. The second stage determines the best auto-regressive and moving average orders, p and q respectively, using information criteria such as AIC or BIC. In most cases, the algorithm is implemented through an automatic ARIMA selection function available in Python forecasting packages; see (Hyndman and Khandakar 2007) and (Bauer et al. 2020).
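A minimal sketch of that selection step, assuming the pmdarima package (whose auto_arima function ports the Hyndman-Khandakar algorithm to Python); the κ_t series here is a synthetic placeholder, not the paper's data:

```python
# Automatic ARIMA(p,d,q) selection for the kappa_t series (illustrative only).
import numpy as np
import pmdarima as pm

rng = np.random.default_rng(0)
kappa = -0.5 * np.arange(60) + rng.normal(0, 1, 60)  # synthetic declining time index

model = pm.auto_arima(
    kappa,
    test="adf",                    # unit root test chooses the differencing order d
    information_criterion="aic",   # p and q selected by AIC
    seasonal=False,
)
print(model.order)                 # selected (p, d, q)
fc, ci = model.predict(n_periods=15, return_conf_int=True, alpha=0.005)  # 99.5% CI
```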
Proposition 2. The performance of ARIMA(p,d,q) is compared with that of the LSTM. The LSTM is a natural competitor to ARIMA(p,d,q) because it can capture long-term sequences or patterns within sequential data. We start by building an LSTM model, which estimates the stated function f linking κ_t to its time lags, as:

κ_t = f(κ_{t−1}, κ_{t−2}, …, κ_{t−n}) + ε_t,

where n is the number of time lags being considered and ε_t is the homoscedastic error (randomness) term.

Proof. The LSTM network, like many other standard machine learning methods, requires the dataset to be divided into training and testing sets. The training set supports the supervised learning, whereas the testing set is used for the validation of the model.
Table 1 shows a supervised learning dataset, which is helpful for prediction. Upon completion of training, the network will have learned the input-output functional relationship, thus predicting future values of κ_t using only the inputs. In practical terms, the input is an N × n matrix of the n time-lagged values of κ_t, and the output is the N × 1 vector of the corresponding current values, where N is the number of samples as in Table 1. □
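As a concrete illustration of this supervised setup, here is a minimal sketch assuming Keras/TensorFlow; the synthetic κ_t series, layer sizes, and training settings are placeholder assumptions, not the paper's calibrated values:

```python
# Build the lagged supervised dataset of Proposition 2 and a one-hidden-layer LSTM.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def make_supervised(series, n_lags):
    """Map kappa_1..kappa_T to rows (kappa_{t-n}, ..., kappa_{t-1}) -> kappa_t."""
    X = np.array([series[t - n_lags:t] for t in range(n_lags, len(series))])
    y = np.array(series[n_lags:])
    return X.reshape(-1, n_lags, 1), y        # shape: (samples, time steps, features)

rng = np.random.default_rng(0)
kappa = -0.5 * np.arange(60) + rng.normal(0, 1, 60)   # synthetic kappa_t series

X, y = make_supervised(kappa, n_lags=1)
split = int(0.8 * len(X))                     # the 80% training / 20% testing rule
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

model = Sequential([LSTM(16, activation="relu", input_shape=(1, 1)), Dense(1)])
model.compile(optimizer="adam", loss="mse")
model.fit(X_train, y_train, epochs=300, verbose=0)
```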
The predicted κ_t values beyond time T are obtained recursively. Generally, the predicted value of κ_t at a generic time t is determined using the values of (κ_{t−n}, …, κ_{t−1}) as input; beyond the first forecasting step, these inputs are themselves predicted rather than observed values. We start by estimating the CBD model parameters α_x, β_x, and κ_t using the SVD method. The extracted time series of κ_t forms the basis of our analysis. The data is then split into a training set and a testing set following the 80% training and 20% testing rule, which determines the last year T of observation. We have carried out the analysis for the U.K. and Kenya, differentiated by gender, with one time lag (n = 1), as reported in Table 2.
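A sketch of this recursive scheme, reusing the hypothetical fitted model, κ_t series, and lag count from the previous sketch:

```python
# Recursive multi-step forecast: each prediction is appended to the input
# window and fed back as input for the next step, exactly as described above.
def forecast_recursive(model, history, n_lags, horizon):
    window = list(history[-n_lags:])           # last observed kappa_t values
    preds = []
    for _ in range(horizon):
        x = np.array(window[-n_lags:]).reshape(1, n_lags, 1)
        preds.append(float(model.predict(x, verbose=0)[0, 0]))
        window.append(preds[-1])               # predicted, not observed, value
    return np.array(preds)

kappa_hat = forecast_recursive(model, kappa[:48], n_lags=1, horizon=12)
```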
When selecting the optimum combination of hyperparameters for the neural network, it is essential to carry out a preliminary fine-tuning round for each country, distinguished by gender (see Table 3). This step yields the combinations used to calibrate the LSTM during the forecasting procedure. From the tuning results, we have discovered that an architecture with one hidden layer performs better than others on our data, with the number of neurons depending on the country. Using the Rectified Linear Unit (ReLU) as the activation function outperformed many other functions when testing across countries. Moreover, there is no clear evidence of the remaining hyperparameters influencing performance.
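A hypothetical version of such a fine-tuning round is sketched below; the grid of neuron counts and activation functions is an assumption, not the search space of Table 3, and it reuses the train/test split from the earlier sketch:

```python
# Grid search over LSTM hyperparameters (illustrative only).
from itertools import product
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

best_combo, best_score = None, float("inf")
for units, act in product([8, 16, 32, 64], ["relu", "tanh"]):
    net = Sequential([LSTM(units, activation=act, input_shape=(1, 1)), Dense(1)])
    net.compile(optimizer="adam", loss="mse")
    net.fit(X_train, y_train, epochs=200, verbose=0)
    score = net.evaluate(X_test, y_test, verbose=0)   # MSE on held-out data
    if score < best_score:
        best_combo, best_score = (units, act), score
print("best hyperparameters:", best_combo)
```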
After the calibration step, the analysis includes numerical and graphical processing and presentation of the goodness of fit. Specifically, the study follows an out-of-sample approach, which corresponds to the testing step in machine learning. The parameter κ_t is estimated using SVD, for males and females respectively.
In Figure 3, the dashed vertical line separates the forecasted period from the one used in training the LSTM network. For the ARIMA models, the confidence interval at the 0.995 level of confidence is also shown. In addition to the graphical check, we compare the LSTM performance against that of the optimal ARIMA on the testing set, measuring the correctness of the forecasts by calculating the following statistical goodness-of-fit measures, the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE):

MAE = (1/m) Σ_{t=1}^{m} |κ_t − κ̂_t|,   RMSE = √( (1/m) Σ_{t=1}^{m} (κ_t − κ̂_t)² ),

where κ̂_t denotes the forecasted value and m is the number of observations in the testing set.
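On the testing set, these two measures reduce to a couple of lines (using the aligned arrays from the sketches above):

```python
# Goodness-of-fit measures on the test window: observed y_test vs forecast kappa_hat.
mae = np.mean(np.abs(y_test - kappa_hat))
rmse = np.sqrt(np.mean((y_test - kappa_hat) ** 2))
print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f}")
```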
Table 4 illustrates the respective performance of ARIMA and LSTM in terms of RMSE and MAE by nation and gender. From the goodness-of-fit measures and the κ_t plots, we can see that the LSTM network offers excellent performance compared to the traditional ARIMA models.
By analyzing the MAE and RMSE error estimates, Kenya shows the best LSTM performance relative to the ARIMA model for both tabulated genders. Moreover, the graphical analysis shows that the LSTM captures the non-linearity of future mortality trends, demonstrating its capability to represent decreasing mortality dynamics better than the ARIMA model.
Analytically, we have noticed a higher ability of the LSTM to capture nonlinear trends without falling into the opposite situation of an excessively oscillating or parabolic trend (the latter observed in traditional ANNs). On the contrary, the analysis shows that the ARIMA method is not effective: the evolution of κ_t under the ARIMA models sometimes falls outside the confidence interval levels, as in the U.K. case for both sexes.
The results obtained highlight ARIMA's inadequacy in detecting the ever-decreasing mortality dynamics over time. Many researchers across the globe still use the ARIMA process when modeling mortality time indexes, because its fixed structure works well as long as the data satisfy the ARIMA assumptions, such as constant variance, which is of vast importance for integrated models; nevertheless, it has many flaws compared to currently existing deep learning techniques. In many cases, life table data exhibit unpredictable volatility changes over long time series, which do not fit the ARIMA assumptions well.
Even though the ANN is an excellent and outstanding learning algorithm for modeling, it offers only point predictions without indicating any form of their variability; indeed, producing prediction confidence intervals remains a substantial challenge within the ANN field. Nevertheless, the LSTM network demonstrates itself to be an excellent candidate for predicting the mortality trend accurately over a long horizon.
Table 4 shows that the LSTM indeed outperforms the traditional ARIMA(p,d,q) model in all stated nations because of its architecture, which permits learning the vital influence of historical mortality data and reproducing it with high accuracy in future years. The LSTM network's capability is seen most easily for the populations of the two nations where the κ_t parameter takes a prominently curvilinear trend.
One remarkable aspect of the LSTM concerns the possibility of achieving optimal predictive performance without resorting to a prior selection of the time steps. For example, we have shown the fitted mortality values (Figure 3) from the logit-mortality rates for Kenyan males. The ARIMA model offers a trivial forecasted trend shape compared to the LSTM: the straight line of future κ_t values produces a rudimentary behavior in the predicted shape of mortality over time. On the contrary, the CBD model integrated with the LSTM shows an insignificant gap between the real and the forecasted mortality rates. From the smoothness of the curve, it is easy to see the capability of the LSTM to produce more accurate forecasts on large datasets than the classical ARIMA(p,d,q) model.
6. Recommendations
Using the CBD model, we have demonstrated that deep learning techniques make the model more accurate in modeling and prediction, thus reducing the challenges associated with errors. Any government or private company that needs to model such behaviour can apply deep learning in place of traditional statistical estimation.
However, professionals in these institutions need training in the use of machine learning, since this will enable them to save huge amounts of money when pricing financial products based on predictions and projections. For instance, the valuation of life assurance products such as assurances and annuities depends on predicted systematic mortality risk levels, meaning that poor estimation of the risk can lead to insurance, pension, and social security firms making substantial financial losses.
If the Kenyan government implements the LSTM method in a policy document, the information should be fed into the input layer before being transferred to the hidden layer. The neural network's interconnections between the two layers initially assign random weights to every input.