Hybrid of the Lee-Carter Model with Maximum Overlap Discrete Wavelet Transform Filters in Forecasting Mortality Rates

: This study implements various, maximum overlap, discrete wavelet transform ﬁlters to model and forecast the time-dependent mortality index of the Lee-Carter model. The choice of appropriate wavelet ﬁlters is essential in effectively capturing the dynamics in a period. This cannot be accomplished by using the ARIMA model alone. In this paper, the ARIMA model is enhanced with the integration of various maximal overlap discrete wavelet transform ﬁlters such as the least asymmetric, best-localized, and Coiﬂet ﬁlters. These models are then applied to the mortality data of Australia, England, France, Japan, and USA. The accuracy of the projecting log of death rates of the MODWT-ARIMA model with the aforementioned wavelet ﬁlters are assessed using mean absolute error, mean absolute percentage error, and mean absolute scaled error. The MODWT-ARIMA (5,1,0) model with the BL14 ﬁlter gives the best ﬁt to the log of death rates data for males, females, and total population, for all ﬁve countries studied. Implementing the MODWT leads towards improvement in the performance of the standard framework of the LC model in forecasting mortality rates.


Introduction
Mortality studies are essential in understanding the demographic structure and indicating the health status of a population. The analysis of mortality and its historical trends enables a country to comprehend its population dynamics and serves as a foundation for formulating economic and social policies [1]. Actuaries used mortality forecasts to project cash flows and evaluate premiums as well as reserves in life insurance and pension plans. The Lee-Carter (LC) model [2] significantly contributed to the development of various extensions. A popular extension of this model is by Hyndman and Ullah [3], which used a functional data paradigm combined with nonparametric smoothing (penalized regression splines). Currie [4] extended the LC model to a generalized linear model (GLM) framework where the LC model and its extensions were fitted following the GLM framework in the Poisson and binomial settings. Neves et al. [5] considered five probability models (Poisson, binomial, negative binomial, Gaussian, and beta) based on the generalized autoregressive score (GAS) model to estimate the LC parameters and forecast mortality rates.
Apart from the extensions discussed, the original LC model is still widely used in mortality forecasting in many countries due to its simplicity and robustness. The LC model consists of three main parameters. The first and second parameters are age-specific parameters representing every age group while the third parameter, k(t) defines the timevarying effect such that the tendency of all age-specific central death rates has the same pattern of stochastic evolvement over time [6]. In the original LC model, the ARIMA model is used to forecast k(t). However, according to Nigri et al. [7], the standard ARIMA approach has limited ability to recognize unknown and unidentified patterns in future mortality trends over time. Hainaut and Denuit [8] reported that wavelets are great tools for investigating mortality trends across time with numerical illustrations. This motivated the integration of wavelets to enhance the performance of the ARIMA model in forecasting mortality rates.
Wavelets are commonly used in time series analysis, especially in signal processing, engineering, economics, and finance (see Percival and Walden [9] for an excellent review on the use of wavelets in time series). Wavelets decompose the original univariate time series into a group of time series (detail and smooth coefficients) that have an explicit hierarchical structure [10]. Wavelet-based methods have significant advantages in terms of denoising and are robust to outliers [11][12][13]. To date, research related to the application of wavelets in mortality is limited and patchily available. Morillas et al. [14] pioneered the use of wavelets and piecewise polynomial harmonic interpolation to develop a two-stage method for grading mortality rates and compared it to kernel grading. The wavelet technique can improve smoothness, fit, and oscillations more effectively than the conventional technique. Hainaut and Denuit [8] demonstrated that only a small number of wavelets are required to reconstruct all the mortality curves in the Belgian population from 1965 to 2015. Wavelet coefficients display clear trends in the Belgian population and are therefore straightforward to forecast.
This article takes cognizance of the advantages of the maximal overlap discrete wavelet transform (MODWT) and the ARIMA model to overcome the limitations of the ARIMA process in forecasting the future evolution of the k(t) parameter. Wavelet transforms (WT) can be classified as continuous wavelet transforms (CWT) and discrete wavelet transforms (DWT). The MODWT is a modified version of the DWT which avoids the subsampling process, leading to a higher level of information in the resulting wavelet and scaling coefficients. The MODWT was chosen in this study because it can retain down-sampled values at each level of the decomposition and is well defined for all sample sizes [15]. So far, no work has been done to study the performance of MODWT wavelet filters such as the least asymmetric (LA8), best-localized (BL14), and Coiflet (C6) wavelet filters, for modelling and improving the prediction accuracy of mortality trends in the LC model. Hence, it is of interest to undertake this task and investigate the performance of the MODWT-based LC model. In this study, five countries with data on the Human Mortality Database (HMD) for years 1950 to 2016, involving ages 0 to 90+ were considered to verify the effectiveness of the MODWT-based LC model. This paper is organized as follows: Section 2 presents the methodologies used in constructing the LC model and the MODWT-based LC model, Section 3 illustrates and compares the performances of the LC model and its wavelet counterparts for five countries (Australia, England, France, Japan, and USA), and Section 4 concludes.

The Lee-Carter (LC) Model
The Lee-Carter [2] model is as follows: where m(x, t) is the central death rate of age x at time t, ω is the beginning of the last age interval. Here, a(x) describes the average shape of the age profile, b(x) describes the pattern of deviations from this age profile when k(t) the mortality index at time t varies. ε(x, t) reflects the age-specific historical influences which are not fully captured by the model which is independent and identically distributed and follows the N 0, σ 2 distribution. The parameters in Equation (1) are estimated using a two-stage method by imposing the following restrictions: to distinguish a unique solution for the system of equations of the model. The singular value decomposition approach was applied to the matrix of centered age profiles, ln(m x,t ) − a(x), which allows a first estimation of parameters b(x) and k(t). The model in (1) is fitted to the crude mortality rates,m(x, t) = D(x,t) E(x,t) where D(x, t) > 0 denotes the number of deaths of age x at time t, and E(x, t) is the matching central exposure of age x at time t. Once b(x) and k(t) are estimated by satisfying (2), a second stage estimate of k(t) is found to ensure that the actual total deaths are identical to the total expected deaths for each t, as a basis for comparing actual and expected deaths. Hence, the parameter estimates satisfy This adjustment gives more weight to high rates, thus roughly counterbalancing the effect of using a log transformation of the mortality rates. To forecast mortality rates, an appropriate time series was fitted to k(t) using future extrapolation values, i.e., k(t + n). Subsequently, the forecasted mortality rate would be For this method, a(x) and b(x) are fixed. The adjustedk(t) is then extrapolated using ARIMA models. Lee and Carter [2] used a random walk with drift model, which can be expressed as where d is known as the drift parameter and measures the average change according to time t in the series, and e(t) is an uncorrelated error.

The MODWT-Based LC Model
Wavelets are based on Fourier transform which show any function as the sum of sine and cosine functions. WT is a function of time t that obeys the basic rule known as the admissibility condition [16]: where ϕ( f ) is the Fourier transform and a function of the frequency f , of ϕ(t). A father wavelet generates the smooth and low-frequency parts of a signal while a mother wavelet generates the detailed and high-frequency components. The following equations represent the father and mother wavelets, respectively, where j = 1, 2, 3, . . . , J in a J-level wavelet decomposition: where J denotes the maximum scale sustainable by the number of data points. The father and mother wavelets satisfy: In any time series data, a function which is an input represented by wavelet transforms can be built as a sequence of projections onto father The analysis of real discretely sampled data requires creating a lattice for making calculations. Mathematically, it is convenient to use a dyadic expansion as shown in Equation (8). The expansion coefficients are given by the projections: The wavelet approximation coefficients to f (t), which leads to k(t) in the wavelet framework of (1) is defined by: where WT is used to calculate the wavelet approximation coefficient in (10), for a discrete signal where S j (t) and D j (t) introduce the smooth and detailed coefficients, respectively. The smooth coefficients emphasize on the most critical features of the data, and the detailed coefficients detect the main features in the data [9,16].
An orthonormal DWT matrix k(t) can be constructed based on any filter satisfying the properties of a wavelet filter in (8), namely summation to zero and orthonormality [9]. A Daubechies' wavelet filter of even width L has a squared gain function: where S( f ) = 4sin 2 (π f ) defines the squared gain function for the difference filter {1, −1} and which establishes the squared function of a low-pass filter. The scaling filter {g l } that agrees to the Daubechies wavelet filter has a squared gain function given by As L increases, some additional criteria to select a unique wavelet filter or a unique scaling filter may be imposed. For the Daubechies filter, let the scaling filter g where {g l } is any other filter with squared gain G (S) (.) and g (ep) 1 denotes the extremal phase (ep) scaling filter.
For a least asymmetric filter, the scaling filter whose transfer function G( f ) = [G (S) ( f )] 1/2e iθ (G) ( f ) , has a phase function θ (G) (.). This is as close as possible to that of a linear phase filter. The benefit of the least asymmetric filters is that the v of v value minimizes G( f ) to match the scaling and wavelet coefficients in such a way that they can be viewed as approximately the output of zero-phase filters [9,16].
The approximate zero phase property is significant as it allows us to link the DWT coefficients meaningfully to different events in the original time series. The least asymmetric wavelet with an excellent general-purpose has a width which is 8 (LA8). The LA8 wavelet filter strikes a balance between providing smooth approximations with few artifacts and minimal edge effects at data boundaries [9,16].
Let k(t) be associated with the actual time t 0 + t∆t. Then, the phase properties of the least asymmetric filters dictate associating the wavelet coefficient W j,t with actual time where and For the scaling coefficient V j,t , a similar expression is obtained by replacing v The Coiflet wavelet filters are alternatives to the Daubechies filters that provide better approximations to zero phase filters compared to their least asymmetric counterparts [17]. However, they have a less desirable filter form and less embedded differentiating operations for a specific filter width. Coiflet wavelets and scaling coefficients with actual times use (19) by substituting v = −2L 3 + 1. Doroslovacki [18] proposed a best-localized squared gain factorization for the Daubechies scaling filter. This filter refines the least asymmetric idea by using a new linear-phase deviation measure that penalizes departures at low frequencies more severely than those at high frequencies [9]. The best-localized (BL) wavelet filter gives improved results for the scaling function by minimizing the time-localization measure [18]. A proper selection of the wavelet-generating-filter transfer function is crucial to make the scaling functions and wavelets in the binary orthonormal Daubechies form more symmetrical. The Coiflet (C6), least asymmetric (LA8), and best-localized (BL14) wavelet filters will be investigated to find the best approximation for k(t).

Results and Discussion
The data for this study is obtained from the Human Mortality Database (HMD). Five developed countries were selected such as Australia, England, Japan, France, and USA by considering sex-specific, as well as total population for analysis. The data for these countries are based on single years of age. The data for older ages (age 90 and above) were grouped to avoid problems associated with erratic rates at these ages. The LC and MODWT-LC models were fitted to log death rates from 1950 to 2016. To fit these models, the data for the five countries were divided into training and test sets. The training set consists of the observed log death rates occurring until 2005, and the test set is chosen from 2006 onwards. Tables 1-3 provide the forecast accuracy based on mean absolute error (MAE), mean absolute percentage error (MAPE), and mean absolute scale error (MASE) for male, female, and total populations for the five countries studied.
The accuracy of two models representing k(t) such as the default ARIMA(0,1,0) and MODWT-ARIMA(5,1,0) by integrating the LA8, BL14, and C6 wavelet filters were evaluated. In selecting the suitable form of ARIMA (p, d, q) for the MODWT-LC model, various values were fitted to p (the number of autoregressive terms), d (the number of non-seasonal differences required for stationarity), and q (the order of moving average). The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test Kwiatkowski et al. [19] was used to test for stationarity and the p-values were significantly lesser than 0.05 for d = 1. This indicates that one non-seasonal difference is required to achieve stationarity for the data of the five countries. Based on the smallest error obtained using MAE, MAPE, and MASE, p = 5, d = 1 and q = 0 form the best combination in improving k(t) to fit the log of death rates for the five countries investigated. The structure of the data also suggested that no moving average term i.e., q is required.  : for Tables 1-3, LA8, BL14, and C6 represent the filters used in the MODWT ARIMA (5,1,0) model representing k(t).
Tables 1-3 show that the MODWT-ARIMA (5,1,0) model gives the smallest MAE, MAPE, and MASE values for the total, male and female populations of the countries studied. It can be seen from Tables 1-3 that the BL14 filter consistently outperforms its counterparts. In the LC model, k(t) plays a significant role in capturing the mortality trend over time. The application of the BL14 MODWT-ARIMA) to model k(t) for the log death rates data decomposes them into various resolution levels that reveal their essential structure and generates detailed coefficients at every level. The MODWT with filters were used to capture the pattern of the k(t) series over time. The MODWT-based decomposition (Alenezy et al. [20] and Cornish et al. [15]) is an effective approach for revealing variations, magnitudes, and phases of the data. In capturing the pattern of the k(t) series, the BL14 filter has markedly better phase properties than the LA8 and C6 filters. The best localized filter penalizes deviations at low frequencies more heavily than those at high frequencies [18].
For the total population of Australia (Table 1) and the male population of France (Table 2), the LA8 filter appeared to have a slight edge when using the MAPE measure. Simulation is not required to prove the effectiveness of the BL14 filter because the proper (Alenezy et al. [20] and Cornish et al. [15]) is an effective approach for revealing variations, magnitudes, and phases of the data. In capturing the pattern of the ( ) series, the BL14 filter has markedly better phase properties than the LA8 and C6 filters. The best localized filter penalizes deviations at low frequencies more heavily than those at high frequencies [18].
For the total population of Australia (Table 1) and the male population of France (Table 2), the LA8 filter appeared to have a slight edge when using the MAPE measure. Simulation is not required to prove the effectiveness of the BL14 filter because the proper choice of ARIMA model equipped with the MOWDT-BL14 filter is versatile in capturing extreme values.

Conclusions
This study considered the hybrid of the MODWT with the Lee-Carter model to improve the forecast accuracy of the time-dependent mortality index, ( ). The MODWT is more advantageous than the DWT for mortality modelling. The MODWT can handle any sample size. In each level, there are detail (wavelet) and smooth (scale) coefficients which

Conclusions
This study considered the hybrid of the MODWT with the Lee-Carter model to improve the forecast accuracy of the time-dependent mortality index, k(t). The MODWT is more advantageous than the DWT for mortality modelling. The MODWT can handle any sample size. In each level, there are detail (wavelet) and smooth (scale) coefficients which are associated with zero phase filters. The MODWT also produces a more asymptotically efficient wavelet variance estimator than the DWT. This clearly shows that the MODWT has an edge over its counterparts. Overall, the results of this study show that the hybrid of the MODWT using BL14 filters with the Lee-Carter [2] model entails significant improvement in forecasting accuracy. Our findings show that the minimization of a time-localization measure offered by BL14 wavelet filters can improve results in forecasting k(t). The MODWT-ARIMA-(5,1,0) with the BL14 filter generally shows excellent proximity to the actual log-mortality rates for the five countries studied. For future studies, the extension of wavelet-based neural network models with the LC model is of interest.