Robust non-parametric mortality and fertility modelling and forecasting: Gaussian process regression approaches

A rapid decline in mortality and fertility has become major issues in many developed countries over the past few decades. A precise model for forecasting demographic movements is important for decision making in social welfare policies and resource budgeting among the government and many industry sectors. This article introduces a novel non-parametric approach using Gaussian process regression with a natural cubic spline mean function and a spectral mixture covariance function for mortality and fertility modelling and forecasting. Unlike most of the existing approaches in demographic modelling literature, which rely on time parameters to decide the movements of the whole mortality or fertility curve shifting from one year to another over time, we consider the mortality and fertility curves from their components of all age-specific mortality and fertility rates and assume each of them following a Gaussian process over time to fit the whole curves in a discrete but intensive style. The proposed Gaussian process regression approach shows significant improvements in terms of preciseness and robustness compared to other mainstream demographic modelling approaches in the short-, mid- and long-term forecasting using the mortality and fertility data of several developed countries in our numerical experiments.


Introduction
There has been an increasing demand for demographic modelling and forecasting over the last few decades, driven by many developed countries are now suffering a rapid decline in mortality and fertility, leading to a significant increase in expenditures on health services for an ageing population and a shortage of future labour. A better understanding of the mortality and fertility patterns and trends is always of importance for all stakeholders in a society as the mortality forecasts, for example, play a vital role for the insurance and pensions industries in pricing their insurance products. The fertility predictions are also of great interest to the government and education sectors in planing children's welfare and educational services.
Unlike the biological and the medical methods, statisticians have developed very different and purely mathematical methods to model the demographic patterns and trends which are well-documented by Preston et al. (2000). The history of demographic modelling with the mathematical approaches can be traced back to some deterministic models proposed in the midnineteenth century, see, for example, Gompertz (1825) and Makeham (1860). The deterministic models are, however, restricted with few fixed factors and have no stochastic process considered owing to the lack of computing capability in that early period. With the advance of technology in computing, stochastic modelling has become a mainstream method of mortality and fertility curve fittings over the last three decades. A significant milestone of the demographic modelling literature is the seminal work done by Lee & Carter (1992), well-known as the Lee-Carter model, which can model and extrapolate a long-term mortality trend as a stochastic time series.
It rapidly gained credit and popularity, given its simplicity and ability to capture most variations in demographic data evolved over time. For instance, Lee (1993) applies it to fertility modelling and forecasting, and there has also been a series of extensions, variants and modifications proposed afterwards, see, for example, Bell (1997), Lee & Miller (2001), Booth et al. (2002), Brouhns et al. (2002), Renshaw & Haberman (2003), Cairns et al. (2006) and Hunt & Blake (2014).
In the meantime, non-parametric or data-driven techniques, which focus on the aspect of letting the data speak for themselves with no need to meet certain assumptions on parametric forms in the model calibration, have also been introduced and developed in demographic modelling.
This kind of techniques can be dated back to the classical graduation technique (Whittaker, 1922), which maintains an agnostic view of historical experience and solely focuses on removing random fluctuations in observed data then directly extrapolates the past trend to the future.
In more recent years, Currie et al. (2004) use P-splines to smooth the historical mortality surface across both age and year dimensions before fitting. Hyndman & Ullah (2007) extend the Lee-Carter model to a functional data framework with a non-parametric smoothing method that allows for smooth functions of age and is more robust for demographic modelling. Some other developments in this area include Delwarde et al. (2007), Debón et al. (2010), Li et al. (2015), Ludkovski et al. (2018), Dokumentov et al. (2018), Wu & Wang (2018) and Alexopoulos et al. (2019).
As an alternative to existing methods as well as having several desirable advantages over others, in this paper we propose a new non-parametric approach using Gaussian process regression with a natural cubic spline mean function and a spectral mixture covariance function for mortality and fertility modelling and forecasting. Unlike most of the existing approaches in demographic modelling, which depend on few parameters to decide the movement of the whole mortality or fertility curve shifting from one year to another, we consider the mortality and fertility curves from their components of all age-specific mortality and fertility rates and assume each of them following a Gaussian process over time to fit the whole curves in a discrete but dense style.
More will be discussed in detail in the paper, and the rest of this paper is organised as follows. We briefly outline the theoretical background of Gaussian process regression in Section 2. In Section 3, we describe the framework of the proposed Gaussian process regression model for mortality and fertility modelling and forecasting. We then illustrate the proposed model with applications to the empirical mortality and fertility data, followed by comparisons to other existing approaches in terms of the systematic differences and forecasting performances using the observed mortality and fertility data of ten developed countries in Section 4. We lastly conclude this paper with discussions and remarks in Section 5.

Theoretical background of Gaussian process regression
Gaussian process regression (also known as Kriging) is a regression method which belongs to a class of Bayesian functional non-parametric approaches to inferencing and modelling of an unknown latent function. This approach can be seen as conditioning of test data by training data in a joint Gaussian distribution from a function space point of view. The theoretical basis for Gaussian process regression (GPR) was initially developed for estimating the most likely distribution of gold based on samples from a few boreholes in South Africa (Krige, 1951), and it was mainly used for spatial analysis and natural resources evaluation in geostatistics in former times, see, for example, Matheron (1963), Journel & Huijbregts (1978) and Cressie (1989).
Over the last two decades, GPR has gained its popularity rapidly as different statistical tools among the data science community, such as, Engel et al. (2005), Krause et al. (2008) andNeal (2012). It has also been treated as a powerful tool for regressions and forecasting problems in the field of economics thanks to its abilities to quantify the uncertainties associated with the historical experience of the observed data and generates the full stochastic trajectories for outof-sample forecasts with prediction intervals under the Gaussian probabilistic framework. It is also capable of scaling to a large dataset with a minimum set of tunable (hyper-) parameters involved as well as capturing non-linear or periodic dynamics with a high degree of analytic tractability that extends the explorabilities and explainabilities of the classical linear regression model in more complicated scenarios. Some economic forecasting applications with GPR can be found, for instance, Alamaniotis et al. (2012) perform a short-term load forecasting using an ensemble of GPR approaches and Wu et al. (2012) predict the tourism demand volume in Hong Kong using GPR. Since GPR is the fundamental technique applied in this article, we will give a brief review of GPR which will be used later. Readers can refer to MacKay (1998) and Williams & Rasmussen (2006) for more complete discussions of GPR.

Gaussian process regression
Let a training dataset with n pairs of observations of univariate covariates and responses be We consider the following non-linear regression model where f (·) is the unknown function that needs to be estimated, and { (t i )} n i=1 are the independent and identically normally distributed error terms with mean 0 and constant variance σ 2 .
Following the Gaussian process paradigm, the unknown function {f (t) : t ∈ T } is assumed to have a Gaussian process prior with a specified mean function and a specified covariance function over the domain T . It gives for all t ∈ T where the mean function µ(t) and the covariance function K(t, t ) are defined as The specified mean function µ(·) and the specified covariance function K(·, ·) in the Gaussian process prior reflect our prominent belief in the unknown function f (·), prior to any information about the observed data. A well-specified prior mean function has a profound impact on the forecast performance since the mean function will dominate the forecast results of the Gaussian process regression model in regions far beyond from the historical data. Meanwhile, the specified covariance function encodes the correlation between any pair of outputs {y(t), y(t )}, which determines the relativeness of one point to another, such as smoothness or periodicity.
A common covariance function example is the squared exponential covariance function which is used to model a smooth function. It has the form where h is the response-scale amplitude determining the variation of function values, l is the characteristic length-scale which gives smooth variations in a covariate-scale and controls how far the observed data can be extrapolated.
Another common example is the periodic covariance function, which is used to model a periodic function. It gives where p determines the distance between repetitions of a function, h and l are the same as in the squared exponential covariance function above.
A wide variety of covariance functions has been proposed, see, for example, Williams & Rasmussen (2006), and this allows us great flexibilities in modelling among many different scenarios.
With the assumption of the Gaussian distribution on a collection of all the discrete observations, y = [y(t 1 ), . . . , y(t n )] T follows an n-variate normal distribution, i.e.
The log-likelihood function of the n-dimensional collection of the discrete observations y for all the (hyper-) parameters in the specified mean function and the specified covariance function (denoted by a generic θ) and the noise parameter σ 2 is where | · | denotes the determinant of a matrix.
Standard gradient-based numerical optimisation techniques, such as Conjugate Gradient method, can be used to maximise the log-likelihood function L(θ, σ 2 ) in Equation (2.2) to obtain the estimates of the model parameters.
For the corresponding value y(t * ) at any measurement time point t * , we can denote the joint distribution of [y(t 1 ), . . . , y(t n ), y(t * )] T as an (n + 1)-variate normal distribution with a mean vector [µ(t 1 ), . . . , µ(t n ), µ(t * )] T and a covariance matrix as The conditional distribution of y(t * ) given y with the estimated (hyper-) parametersθ and noise varianceσ 2 through the optimisation of the log-likelihood function in Equation (2.2), is The Gaussian process regression approach for conditioning of an unknown value by some realisations of a stochastic process in a joint Gaussian distribution can be seen as expanding discrete data from a function space point of view over the same domain. This idea can also be applied to the task of forecasting. With the Gaussian probabilistic framework, it can quantify the uncertainty associated with the historical experience of the observed data and then generate the full stochastic trajectories for out-of-sample forecasts with prediction intervals. We will discuss this in detail with applications to mortality and fertility data in the next section.

Gaussian process regression (GPR) model for mortality and fertility modelling and forecasting
In this section, we introduce the proposed Gaussian process regression (GPR) model for mortality and fertility modelling and forecasting.
Let our discrete observed log mortality (or fertility) rates dataset be {t i , y x (t i )} for i = 1, . . . , n and x = 0, . . . , m, where n is the total number of observed calendar years t, m is the maximum age of interest, and y x (t i ) is the log of mortality (or fertility) rates for a given age x in a calendar year t i . The proposed GPR model for mortality (or fertility) rates modelling and forecasting in a given age x is where f x (t i ) is the underlying function that needs to be estimated, { x (t i )} n i=1 allow the observation errors varying based on the assumption of i.i.d. normally distributed random variables with mean 0 and constant variance σ 2 x for a given age x.
Following the Gaussian process regression paradigm discussed in Section 2, the underlying function {f x (t) : t ∈ T } for mortality (or fertility) modelling and forecasting of each age x is assumed to follow a Gaussian process with a specified prior mean function µ x (t) and a specified (3.1)

Specified Gaussian process prior mean function
Under the Gaussian process regression paradigm, we are allowed to choose a prior mean function that reflects our prominent belief in the unknown function f x (t) for the Gaussian process regression model. Due to incomplete information on the functional form among time-series age-specific mortality and fertility rates over the domain T , and a wide range of patterns in the age-specific mortality and fertility rates exhibiting in our numerical examples in Section 4 (see To explain further how it works mathematically, we consider the natural cubic spline func- can be represented as the truncated power series representation for a cubic spline function. It gives with additional constraints on the boundary conditions when t < ξ 1 and t ≥ ξ K , such that where (·) + denotes the positive part, {α p } 3 p=0 are the coefficients of the cubic polynomial and {β k } K k=0 are the coefficients of the truncated power series for the cubic splines with K interior knots.
By imposing the above boundary constraints in Equation (3.3), the natural cubic spline function can be further derived as a reduced basis when t ≥ ξ K for extrapolation as (3.4) For simplicity of notation, we denote c 0 = (α 0 − K k=1 β k ξ 3 k ) and c 1 = (α 1 + K k=1 3β k ξ 2 k ). Then thus µ x (t) is a linear function with constants c 0 and c 1 when t ≥ ξ K for extrapolation.
Unlike the cubic spline function whose behaviour tends to be erratic near boundaries and the extrapolation can be unrealistic outside the observed time interval [t 1 , t n ], the natural cubic spline function forces its extrapolation outside regions beyond the observed data to be a linear function with zero second derivative that agrees with the spline in its intercept and slope values among its last knot (Friedman et al., 2001). With this distinctive feature, the natural cubic spline function can hence inherit the local information of recent data and gives a reasonable direction for extrapolation. In demographic modelling, it is often the case that more recent experience has greater relevance on future behaviour than those data from the distant past. On account of this, we specify the natural cubic spline function as the mean function for mortality and fertility modelling and forecasting in the proposed GPR model.

Specified Gaussian process prior covariance function
In the proposed GPR model, the specified covariance function is used to discover and capture the similarities and various time-correlated structures among the historical demographic trends for different age groups, then project their patterns into the future.
Given that the choice of covariance functions in Gaussian process regression remains an ongoing research problem, the choice of a covariance function is based on the empirical evidence by simulating the GPR model with different covariance functions in our study. Apart from the squared exponential and the periodic covariance functions mentioned in Section 2, we have also considered and tested four more common covariance functions, including the rational quadratic, Matérn 3/2 and Matérn 5/2 covariance functions documented in the Gaussian process literature (Williams & Rasmussen, 2006), and the spectral mixture covariance function proposed by Wilson & Adams (2013). The spectral mixture covariance function is derived by modelling a spectral density of the Fourier transform of input-correlated structures with a Gaussian mixture, which can automatically discover patterns and extrapolate far beyond the available data; see Wilson & Adams (2013) for detailed discussions.
From our empirical experiments 2 , we have found that the spectral mixture covariance function is indeed relatively more successful in capturing different patterns of age-specific mortality and fertility rates. It also provides more accurate forecast results than the other covariance functions for mortality and fertility predictions. Wu & Wang (2018) also discovered the similar results. The spectral mixture covariance function is thus selected as the primary covariance function in the proposed GPR model for all age-specific mortality and fertility rates as a unified approach.
Considering a Q number of Gaussian mixture components, in which the q-th component has mean λ q and variance ν 2 q , the spectral mixture covariance function is defined as where w q is the weight specifying the contribution of the q-th Gaussian mixture component.

Likelihood function of the proposed GPR model
It yields that y x = [y x (t 1 ), . . . , y x (t n )] T in each given age x follows an n-variate normal distribution, such that The log-likelihood function of the n-dimensional collection of the discrete observations y x in each age x for all the (hyper-) parameters (denoted by a generic θ x ) and the noise variance where | · | is the determinant of a matrix.
Standard gradient-based numerical optimisation techniques can be used to maximise the log-likelihood function L(θ x , σ 2 x ) in Equation (3.6) to obtain the estimates of the model parameters.

Out-of-sample forecasts and prediction intervals of the proposed GPR model
Let t * h be the h-step ahead of the last observed calendar year where t n < t * 1 < . .
variate normal distribution that comes with the mean vector [µ x (t 1 ), . . . , µ x (t n ), µ x (t * 1 ), . . . , µ x (t * h )] T and the covariance matrix is where K * x is the size of n × h covariance matrix, and K * * x is the size of h × h covariance matrix Therefore, the h-step ahead out-of-sample forecast and its variance of the age-specific mortality (or fertility) rates can be found from the conditional distribution of y * given y x with the estimated (hyper-) parametersθ x and noise varianceσ 2 x through the optimisation of the log-likelihood function in Equation (3.6), is then N (ŷ * x , Var(y * x )), and let Var(y * With the normality assumptions on the model error and the known Var(y * x ), a 100(1 − α)% prediction interval for y * x can be calculated as y * quantile of the standard normal distribution.

Applications
In this section, we demonstrate the abilities of the proposed GPR model for modelling and forecasting two different sets of demographic data − mortality data and fertility data. We first apply the proposed model to the male mortality data and the fertility data of Japan for illustration purposes. We then compare and evaluate the quality of the fitted mortality and fertility curves by the proposed method with some other existing approaches using the mortality and fertility data of ten different developed countries.

Male mortality data
The male mortality data of Japan are available from the year 1947 to the year 2016 from the Human Mortality Database (2020). The database consists of the age-specific male mortality rates by a single calendar year of age up to 110 years old. The age-specific male mortality rates are defined as the number of deaths in males during a calendar year, proportional to the male resident population of the same age during the same calendar year. We restrict our experimental mortality data up to age 100 to avoid any potential problem associated with the erratic mortality rates above age 100.
The observed male mortality data are presented in Figure 4.1(a) as separate univariate time series of the log age-specific male mortality rates with 20-year age intervals from age 0 to age 100 from the year 1947 to the year 2016. We can see that there is a general decrease in male mortality rates among all the selected age groups during the examined period. The decline in mortality rates at higher ages seems to change more slowly than those at younger ages. Figure  4.1(b) presents the log male mortality curves, which give us information about the general trends and the variations of the age-specific male mortality rates over the observed period. There is an apparent hump around 18 to 25 years old which usually relates to reckless behaviour in teenage ages and a general drop for all population in mortality rates over time, due primarily to the advances in medical technology.

Mortality modelling and forecasting
In the demonstration of the proposed model, we aim to display 10-years-ahead out-of-sample forecasts of mortality rates. We first split the dataset into a training dataset with the observed The predictive mean and variance for forecasting age-specific mortality rates can be obtained by Equation (3.7) and Equation (3.8) respectively. Figure 4.2 demonstrates the predicted results of the selected age-specific mortality rates by the proposed GPR model. We can see that the proposed model can capture the varying patterns in mortality rates among different age groups properly. To construct a predicted mortality curve for any specified year, we extract the predicted age-specific mortality rates from age 0 to age 100 in that specified year.

Fertility data
We move to the second case study of modelling and forecasting the age-specific fertility rates.
The fertility data of Japan are available from the year 1947 to the year 2016 from the Human Fertility Database (2020). The database consists of the age-specific fertility rates by calendar year from 12 to 55 years old. The age-specific fertility rates are defined as the number of births during a calendar year, based on the age of the mother proportional to the total number of the female resident population. Our study focuses on the age-specific fertility rates from age 15 to age 45, given that this age range is closer to reality (Hyndman & Ullah, 2007), and the problem of missing data is also common when the age-specific fertility data starts from age 46 to age 55.
Excluding this age range not just helps stabilise the fitted fertility curves for illustration in this section, it also gives fairer forecast performance comparisons of the proposed model to other existing models without making any subjective adjustments on the missing data when involving fertility data of several other countries which will be discussed in Section 4.5.
We present the historical fertility data of Japan as separate univariate time series of the log age-specific fertility rates with 5-year age intervals in and a rapid decrease in the birth rate during the 'Japanese economic miracle period' in the early 1980s due to a delay in child-bearing while the economy was establishing rapidly at that time. In more recent years, there was an increasing trend in fertility at higher ages, because the females devoted more time in their educations and careers. In general, females from age 20 to age 40 have relatively higher fertility rates compared to the age groups 15 and 45. The bunch of fertility curves in Figure 4.4(b) display a concave shape. It shows that the fertility rates climb from age 15 and reach their peak at around age 30 then decline. There exist some sparse patterns in the later part of the fertility curves, which reflects that the variations in birth rates above age 35 are obvious across the observed years.

Fertility modelling and forecasting
For the task of fertility forecasting, we again attempt to make 10-years-ahead out-of-sample forecasts of fertility rates for demonstration in this section. We maintain the same settings in the GPR model as for the mortality case, including the split of dataset, the number of interior knots in the mean structure and the Gaussian mixture components in the covariance structure with the same estimation procedures for all (hyper-) parameters involved. Figure 4.5 presents the forecast results of the selected age-specific fertility rates by the GPR model. We can see that the GPR method can catch the varying patterns of fertility reasonably well.

Comparisons and forecast accuracy evaluations with existing models
We now compare and evaluate the forecast performance and accuracy of the proposed GPR model with four other mainstream approaches in the demographic modelling literature. These include the Lee-Carter (LC) model (Lee & Carter, 1992), the Lee-Miller (LM) model (Lee & Miller, 2001), the Booth-Maindonald-Smith (BMS) model (Booth et al., 2002) and the Hyndman-Ullah (HU) model (Hyndman & Ullah, 2007).
The LC model applies principal component analysis to decompose the age-time matrix of the log mortality (or fertility) rates y x,t i into a linear combination of age and time parameters from the first-order principal component, i.e.
where a x is the averaged log mortality (fertility) rates at age x across all calender years, b x reflects the relative change in the log mortality (fertility) rates at age x, k t i measures the general time trend of the log mortality (fertility) rates, and { x,t i } n i=1 are the i.i.d. normally distributed error terms with zero mean and constant variance σ 2 . The LC model relies on the h-step ahead extrapolated time parameterk t i+h by some time-series models, such as ARIMA model, to produce a h-step ahead forecast of the mortality (or fertility) curve across all age groups.
The LM model, the BMS model and the HU model can be thought of as the extensions and variants using the similar framework of the LC model in Equation (4.1). Their ways to make forecasts of the mortality (or fertility) curves also depend on the extrapolation of the time parameters derived from the age-time matrix of the log mortality (or fertility) rates (Booth & Tickle, 2008). More specifically, the LM model is a modification to the LC model where the coefficient series is adjusted so that the fitted life expectancy is equal to the observed life expectancy in each year in an attempt to reduce forecast basis. The BMS model modifies the LC model to adjust the coefficients using the age-at-death distribution and determines the optimal fitting period beforehand to address the non-linearity problem in the time component.
The HU model extends the LC model by adapting the functional data paradigm using nonparametric smoothing to reduce outliers in observed data and more principal components for robust forecasts.
In contrast to the LC method and its numerous variants and extensions which rely on the time parameters to decide the movement of the whole mortality (or fertility) curve across all ages groups shifting from one year to another, our approach provides a novel point of view in demographic modelling and forecasting. The proposed GPR method treats demographic data in each age group as time-series data and assumes each of them following a Gaussian process to achieve the same tasks in a discrete but intensive fashion. The natural cubic spline mean function in the proposed GPR method can automatically discover and extract more recent information from the observed data and then extrapolate its future trend in an appropriate direction.
The spectral mixture covariance function in the GPR model, on the other hand, addresses the problems associated with the non-linearity and periodicity in the demographic data. Figure   where y x,t 2016 is the log fertility rates aged x in the year 2016.

Forecast accuracy evaluations using rolling-window analysis
For evaluating the forecast accuracy, we consider ten developed countries for which mortality and fertility data are also available in the Human Mortality Database (2020)  the parameters should be set in these models. Rolling-window analysis is used for assessing the consistency of the forecasting ability of a model by rolling a fixed size prediction interval (window) throughout the observed period (Zivot & Wang, 2007). We hold the sample data from the initial year up to the year t m , where t m < t n , as holdout samples and produce the forecast for the t m+h year where h is the forecast horizon. The forecasts errors are then determined by comparing the out-of-sample forecast result with the actual data. We increase one rollingwindow (1 year ahead) in year t m+1 to make the same procedure again for the year t m+1+h until the rolling-window analysis covers all available data in year t n . We consider four different forecast horizons (h = 5, 10, 15 and 20) with ten sets of rolling-window to examine the shortterm, the mid-term, and the long-term forecast abilities of the models. In the mortality rollingwindow experiments, the RMSE formula is where c is the selected country, w is the index of the rolling-window sets and x is the age from age 0 to age 100. And the RMSE in the fertility rolling-window experiments is , where x is the age from age 15 to age 45. Table 4.1 presents the average RMSE results of ten sets of rolling-window analysis across ten countries in four different forecast horizons for the mortality experiments. The proposed GPR model performs consistently the most desirable for mortality forecasting. It occupies the major positions of having the least forecast errors among the ten selected countries in the four different prediction horizons. The method is shown to be capable of capturing various mortality curve patterns across different periods and age groups. We can see that the GPR model has significant improvements in the prediction accuracy in comparison with the four other tested models, particularly for the mortality data of Japan. It may mainly be thanks to the intensive treatment of mortality curve fitting by the GPR approach, which enhances the preciseness and robustness, especially for countries with high mortality fluctuations in certain age groups due possibly to natural disasters. It is also worth noting that the forecasting performances of the LM  Table 4.2 shows the average RMSE results of ten sets of rolling-window analysis across the selected ten countries in the four different prediction horizons in the fertility experiments. The proposed GPR model continues to maintain the dominant positions of having the smallest forecast errors in the short term and the mid-short term prediction horizons and is equally well with the HU model in the fifteen-year forecast horizon. As for the long term prediction of the twenty-year forecast horizon, it seems that the HU model is more suitable to capture the fertility patterns with smaller forecasts errors than the GPR model. The HU model also fits the French fertility data better than the other models. It may be because the fertility data contain more outliers or measurement errors than the mortality data, and the HU model includes the smoothing techniques, which can improve the model forecasting performances if the observed fertility data are smoothed in advance.
Following the theoretical framework of Gaussian process regression discussed in Section 2, in this article, we have introduced a new design of the Gaussian process regression model equipped with the natural cubic spline mean function and the spectral mixture covariance function as a new approach for mortality and fertility modelling and forecasting. The use of the natural cubic spline mean function in the proposed GPR model can exploit the local information of the recent data and force its extrapolation beyond the observed data as a linear function to provide smooth and robust forecasts. The spectral mixture covariance function, on the other hand, detects and handles any non-linearity and periodicity of the demographic data unexplained by the fitted mean function.
We have demonstrated the usefulness and flexibility of the proposed GPR model through two empirical data applications: one is to forecast the male mortality data, and the other is to predict the fertility data. Our experiments have proved the forecasting ability of the proposed GPR model. In these two experiments, the accuracies of the predicted mortality and fertility curves are significantly improved by the proposed GPR model in terms of the forecast errors for the ten tested countries in different forecasting horizons compared to the four other mainstream approaches. The prediction performances of these four methods rely barely on the extrapolation of time parameters as they decide how the movement of the whole mortality (or fertility) curve should shift over time across all age groups. The entire fitted curve can go far away from its expected location when the fitted time parameters in these four methods are not well predicted, and this problem can be seen in our demonstration for the predicted Japanese fertility curve in Figure 4.8. The forecast results of the fertility curves by the mainstream approaches, such as the LC model and the BMS model, deviate from the observation noticeably in some parts of the curve. In contrast, the proposed method is more robust to this issue, because it treats the demographic data in each age group as time-series data and assumes each of them following a Gaussian process to achieve the same tasks in a discrete but very intensive fashion. This issue can hence be avoided as any misprediction by the proposed model only affects a single point of the curve in each age group, rather than the shape of the whole curve across all the age groups.
Furthermore, the proposed GPR model is in time series, non-linear and non-parametric structures and is not restricted to mortality and fertility forecasting and modelling only. It can also support a wide range of potential applications in many domains of applied science and engineering, such as signal processing or weather forecasting. Its particular features enable more flexibility than the four existing models considered in our study. Although Wu & Wang (2018) proposed a similar approach using a Gaussian process regression with a weighted linear mean function, the weighted linear mean structure relies on the assumption that more recent data tend to have more weights on the future mortality, which requires on subjective judgements on how the weights are assigned to reflect the impacts of different periods. It can affect the accuracy of forecasts remarkably when different subjective choices of weights are adopted, or the weight parameters are inappropriately specified (Wu & Wang, 2018). In contrast, our method exploits the natural cubic spline function as a non-parametric mean structure without any assumption made on the relationship between the mortality rates and the time, and it lets the natural cubic spline mean function automatically identify the local information from the observed data and then project the future mortality trend in an appropriate direction.
The main limitations of the proposed GPR models also relate to the characteristics of the classes of independent time series and simple non-parametric single population extrapolative methods of which it belongs to. Although it can capture the trends of the historical demographic data robustly, at the same time, it is lack of ability to incorporate and model more other related information. For example, one may expect that the mortality or fertility rates for different ages may be closely correlated, especially among the neighbouring ages. Our GPR model would be more desirable if it could model the dependence on different age groups simultaneously while taking their heterogeneity into account. We, therefore, aim to extend the current GPR model to a spatial or a multi-output GPR model for connecting the relationships between different age groups and time periods altogether. These will be left for our future works to achieve.