Calendar Effect and In-Sample Forecasting Applied to Mesothelioma Mortality Data

In this paper, we apply and further illustrate a recently developed extended continuous chain ladder model to forecast mesothelioma deaths. Making such a forecast has always been a challenge for insurance companies as exposure is difficult or impossible to measure, and the latency of the disease usually lasts several decades. While we compare three approaches to this problem, we show that the extended continuous chain ladder model is a promising benchmark candidate for asbestosis mortality forecasting due to its flexible and simple forecasting strategy. Furthermore, we demonstrate how the model can be used to provide an update for the forecast of the number of deaths due to mesothelioma in Great Britain using in recent Health and Safety Executive (HSE) data.


Motivation
Since the 1960s, mesothelioma or asbestos-related cancer has gained worldwide interest as a result of its increasing incidence, related medico-legal issues and poor prognosis. Mesothelioma is mainly caused by occupational exposure to asbestos fibres in sectors such as mining, road, railway and general construction as well as shipyards, etc., which have a mainly male workforce [1]. A brief, or even indirect, exposure to a small dose of asbestos fibres might be enough to trigger the disease much later in life [2]. For example, the latency period of mesothelioma is between 20 and 70 years with an average of around 40 years. Once the symptoms appear, it is rapidly fatal, with the majority of deaths occurring amongst those over 60 years of age [1]. In Great Britain, mesothelioma mortality has been steadily increasing in recent years, with 2101 deaths recorded in 2016 and a trend for the average age of deaths to slowly increase over time. Notifications of mesothelioma claims have exhibited a stable but increasing trend so far [3].
Asbestos-related claims have a lasting impact on the global insurance industry. The industry is paying, on average, $1.9 billion for mesothelioma claims annually (2013-2017, [4]) under policies that covered, for example, employer or product liability at the time of exposure. This has resulted in multiple insurer insolvencies since the 1940s. Still, today, it is difficult to project the industry's ultimate loss exposure due to advances in treatment, increasing life expectancies, changes in litigation and the number of new claimants emerging. A core uncertainty for the insurance industry is, therefore, whether the amounts of technical reserves set aside to cover future claims are sufficient. The UK insurance market estimates are based on population mesothelioma deaths projected by the Health and Safety Executive (HSE), which is the UK independent regulator with respect to health and safety in the workplace.
With respect to mesothelioma mortality forecasting, there are two key questions for insurers to answer: (i) when the numbers of deaths are going to peak (and establishing the peak value); and (ii) how the deaths will develop after the peak, that is, the shape of the forecasts. As insurers have to set aside reserves for future claims payments, the ultimate claim amounts and shape of how the number of deaths due to mesothelioma will reduce over time are of critical importance. If the shape is incorrectly forecasted, reserves might be overestimated or underestimated.

Literature Review
For decades, mortality forecasting has been an important tool for decision making in many fields such as actuarial science, economics, epidemiology and demography, to name only a few. With such great interest in the topic, numerous mortality forecasting approaches have been developed. The literature on mortality forecasting approaches is large, and we do not aim to provide a full review here (see for example [5] for a recent description). The first step in understanding and forecasting mortality patterns is to construct a model describing observed death counts or mortality rates, across age groups or within cohorts. The Lee-Carter model [6] is the current benchmark in mortality studies used by government agencies and pension funds. The model assumes that the dynamics of the logarithms of the central death rates are driven by an age specific constant plus the speed of change at each age multiplied by an overall time trend of mortality rates. The model has many extensions in the literature, which provide improved estimation procedures [7], a relaxation of assumptions [8] and adjustments to the model [9][10][11], among others.
Standard mortality approaches rest on dose-response analyses where both death counts and exposure are available. However, this is not the case in mesothelioma mortality forecasting, where the number of people who have been exposed to asbestos is unknown. There are two possible approaches to this problem. The first one is to construct a synthetic measure for exposure and to use a dose-response model. The second approach is to model only the observed number of deaths. The first approach has been used in the UK by the Health and Safety Executive (HSE), using a birth-cohort model [12][13][14], which assumes that the risk of mesothelioma depends on age and years of exposure, and that an individual's asbestos exposure is related to the year of exposure. However, a key problem with this approach is that the affected individuals could have been silently exposed to asbestos over prolonged periods of time so that there is no reliable measure for the exposure to asbestos. Therefore, the estimates include a high degree of uncertainty, and regular adjustments are necessary.
A discussion on modelling mortality with synthetic exposure can be found in [15]. Other approaches involving synthetic exposure include [16], who describes an application of the Lee-Cater model to forecast mesothelioma mortality in Argentina, while [17] considered Generalised Interactive Linear Models for Italian data, and [18] used Generalised Additive Models for Brazil. A different and simpler approach to model the observed number of deaths has been proposed by [15,19]. It does not rely on exposure measures and, therefore, avoids the difficulties associated with extrapolating them into the future. The approach is inspired by the so-called "chain ladder" method introduced by [20] in actuarial science. This is a technique used to calculate the liabilities in the form of outstanding claims faced by an insurance company. For an overview of the classical chain ladder method, see [21]. While the method was introduced as a deterministic algorithm, it is has been shown that it consists of an age-period-cohort (APC) model estimated using Poisson regression [22]. APC models have been studied for a long time. Refer to [23] for a general review of age-period-cohort models, and [24] for a recent review and comparison for cancer studies. They are primarily descriptive tools for data in Lexis diagrams when there are non-trivial age, period and cohort effects. A key issue in APC models is the overparameterisation induced by the relation age = period − cohort. The consequence of this relationship is that, without further assumptions, APC models are not uniquely identifiable, i.e., the model has infinitely many fits with infinitely many interpretations [25]. The modelling challenges that come with this problem have been well formulated by [26,27] in discrete time and by [28] in continuous time. When facing overparametrisation, there are two choices, either to work with the over-parametrised model, using for instance an ad hoc identification of the parameters in order to specify them uniquely, or to use a unique and well-defined parametrisation based on a maximal invariant (in generalized linear models, this is the canonical parameter). Following the second approach, [22] proposed second differences in order to uniquely parametrise APC models. While the canonical parametrisation has important theoretical advantages, it might feel less intuitive to many researchers and applied analysts. Therefore, we can find many examples in the literature where non-unique parametrisations are used. Some additional issues arise when looking at models with non-linear parametrisations, such as the Lee-Carter model-see [29,30] for a discussion on the identification problem in this model. Other common approaches in mortality analysis include Bayesian methods or random effects methods (e.g., [31,32]). Unfortunately the identification problem remains in these cases-see [30] for a discussion on additional issues with these methods.
Usually, the objective of a mortality study is to forecast the future mortality, and extrapolation is often used. A major challenge when using extrapolative approaches is to identify the underlying long-term mortality trend that can be extrapolated. Unfortunately, the data might not contain enough information, and a careful analysis of past mortality and its determinants is often required. Within the APC models, the period and cohort parameters are typically treated as independent time series processes, which are used to project the parameters stochastically into the future. Reference [33] combines the canonical parametrisation of [22] with standard methods for the forecasting of non-stationary time series. Reference [34] considers Box-Jenkins procedures to determine the time series processes generating the parameters. Recently, [35,36] defined the term "in-sample forecasting" to mean forecasting a structured function in regions where the function is not observed, but where it is determined by its values in the observed region. This has several advantages over methods based on time series analysis, as [37] discussed. Standard APC models can be understood as discrete density models with a simple multiplicative structure (or log lineal). There are many examples in actuarial science and epidemiology where in-sample forecasting is possible under an age-cohort (AC) model. Reference [38] called this approach the "continuous chain ladder" because of its relationship to the chain ladder method used in non-life insurance. Our paper shows that mesothelioma mortality forecasting is one such example even when a period effect is included in the model.

Aim and Outline
The aim of our paper is to propose, apply and further illustrate in-sample forecasting, by providing a projection of future mesothelioma deaths using simple methodology. We build on the approach of [37] by using the updated dataset from the HSE to conduct a study that forecasts the number of deaths due to mesothelioma. Furthermore, we analyse the differences in forecasts of future expected mesothelioma deaths due in respect of three models: the model applied in this paper, the discrete APC model of [19] and the model using synthetic exposure measures adopted by HSE [39].
Our paper makes two main contributions: First, we illustrate the method in which the approach of [37] addresses the problem of the lack of exposure data by applying the method to a real dataset that would normally pose a challenge to an insurer's claim reserving methodology. Our study expands on the method in terms of usability and applications. While [37] developed an extended continuous chain ladder model which allows for a calendar (period) effect and they illustrated the broad applicability using two empirical examples, we are able to explore further implications of the method when applying it to the challenging problem of mesothelioma mortality forecasting. We show that the approach is very flexible by introducing a smoothing technique for the temporal effects. Second, we provide an up-to-date estimate of the future number of asbestos related deaths using this new methodology, together with the recent data released by the HSE for 1968-2016.
The remainder of the paper is organised as follows: In Section 2 we formulate the approach of [37] in the context of mortality forecasting and show how it can be applied to provide forecasts of the number of deaths due to mesothelioma. We conduct this in three parts: first, a statistical structured density model is motivated and formulated; second, the density components are estimated non-parametrically; and finally, mortality projections are made into the future. Under the formulated model, we describe how the past data can provide the density component estimates as well as a complete forecast of the future; hence, the name "in-sample forecasting" is used. Section 3 presents the mesothelioma mortality forecasting analysis, together with discussion, before the paper finishes with concluding remarks in Section 4. Further details on the theoretical approach are provided in Appendices A and B.

Density Model
Let X denote cohort (birth-year) of an individual and let Y denote age at death for that same individual. Thus, X + Y is the period or calendar year of death. Let us consider X and Y as the two main time effects so we aim to describe the number of deaths as a function of X and Y. According to these variables, past observations {(X i , Y i ); i = 1, . . . , n} are typically supported by a trapezium. Figure 1 shows the special trapezium support of the observations in the dataset analysed later in this paper (see more details in the next section). In this context, the aim is to forecast the number of deaths which will occur in the periods beyond the most recent one where we have observations. The area with observations is a trapezium, and the area we aim to forecast is a triangle. Both areas are presented in Figure 1. To derive forecasts for the targeted triangle, Reference [36] suggested modelling and estimating the density of deaths on the full rectangle shown in Figure 1, which is the two dimensional density f of (X, Y). The problem is that, from the available observations, we can only estimate the density from the data contained within the trapezium. The solution to the problem comes from imposing a suitable multiplicative structure for the density.
A simple structured density model is as follows: where f 1 represents the cohort density, and f 2 denotes the age at death density. With the help of a backfitting algorithm, Reference [36] derived estimated componentsf 1 andf 2 from the data in the trapezium and used them to forecast the density on the target triangle. This can be performed since the data provide information for all cohorts and ages involved in the forecasting region. Model (1) is, however, too simple in many respects. From a mathematical perspective it means that variables X and Y are independent. In the context of this paper, this means that mesothelioma deaths only depend on birth cohort and age of individuals. Reference [36] extended the simple model described above by including a third component f 3 in the multiplicative structure, corresponding to the period or calendar time, that is, the variable X + Y. In this paper, we used such an extended model and described the density f of the observations with the following multiplicative structure: where f 3 represents the calendar or period effect on deaths. This model can be described as being a continuous APC model, while model (1) is a continuous AC model. To simplify the notation in these sections we normalize X i and Y i to take values in the unit interval [0, 1] (see left panel of Figure 2) and assume that the observations ( is the interval of calendar times where we have observations. The forecast region corresponds to the triangle I fc = {(x, y) ∈ [0, 1] 2 : x + y > 1}. The assumed trapezium and the forecast region are shown in the right panel of Figure 2. A key observation at this point is that model (2) is not identified, which means that the functions f 1 , f 2 and f 3 are not uniquely determined. Identification of this model relates to the identification of the commonly used age-period-cohort models (see [22,30]). On a logarithmic scale, the model becomes log f (x, y) = log f 1 (x) + log f 2 (y) + log f 3 (x + y), which can be rewritten as log f (x, y) = log g 1 (x) + log g 2 (y) + log g 3 (x + y), where the following is the case: with three arbitrary real-valued constants a 1 , a 2 and b. This means that the density components (on a logarithmic scale) can only be determined up to two linear trends. To overcome this problem we imposed the following three identification constraints: for an appropriate parameter 0 ≤ κ ≤ 1, which must be determined later on. Notice that determining κ = 1 means that model (2) becomes the simple continuous AC model (1). In this sense, κ represents the distance from a (continuous) APC model to an AC submodel (the distance being bigger for small values of κ).
Although the conditions (3)-(5) might seem very restrictive, they can always be fulfilled for smooth functions. Conditions (3) and (4) ensure that f 1 and f 2 can be interpreted as proper densities, and they can easily be achieved just by rescaling. Thus, the only assumption to be ensured is that f 3 is constant in the near past (5), which is justified from smoothness considerations. Assuming that log f 3 (z) is differentiable at z = 1, it can be approximated by a linear function in that region, i.e., log f 3 (z) = a + bz + χ(z) for z ∈ [1 − κ, 1], with κ being small, where χ(z) is approximately zero, and a and b are constants. Notice that this interval might be very small at worst, but it could also be the entire interval [0, 1]. Now we can move the linear trend a + bz to log( f 1 ) and log( f 2 ), and condition (5) is fulfilled. This is illustrated in Appendix A with a simple example.

Data
We consider data provided by UK Health Service Executive that consist of annual aggregated counts of deaths caused by exposure to asbestos in Great Britain. The original data are given by age levels and calendar year of death between 1968 and 2016 (all given in years). We consider only data corresponding to males with ages between 25 and 89, which gives an array with dimensions 65 (age levels) by 49 (calendar year). The observed total number of deaths (sample size) is n = 49,750. The authors of [19] analysed the same data at the time but only up to calendar year 2013 by using a discrete APC model. They showed that the main variables to consider in relation to death due to mesothelioma are the cohort and the age (see also [15] for similar conclusions from data up to 2007). Figure 3 shows the data according to these effects and visualizes the special trapezium support available for estimation. As discussed in Section 1.2 the only actual data available for forecasting mesothelioma deaths is the observed number of deaths (by age and period of deaths), while the number of people at risk (exposure) is not known. The risk set consists of those who have survived relative to the time of exposure and who have then been exposed. The long latency period of mesothelioma, and its rapid fatal end once discovered, contributes to the problem of finding reliable measures on exposure, as well as data on mortality from competing risks. Many researchers in this area, including the UK Health Service Executive, chose to estimate the exposure and used it in conjunction with the actual data. However constructing these estimates is nontrivial, and they include a high degree of uncertainty; moreover, regular adjustments are necessary when these measures of exposure are extrapolated [39]. Our approach, which follows that of [15], is to avoid having to estimate the exposure. Instead, we only need to model the observed number of deaths. In fact, in our specific case where we are adopting a continuous approach, we modelled the density of deaths. This suffices since our objective is to forecast aggregated mortality, and its simplicity is an advantage in forecasting.

Estimation
The density components f 1 , f 2 and f 3 in model (2) are the building blocks of any information about mortality in I. If the density f (x, y) is known, all sorts of information could be extracted, such as the number of future deaths in I fc . Let us denote this number as D I fc . An estimate can be calculated through the following expression: where τ is the number of all deaths in the full rectangle in Figure 3. Notice that τ satisfies the relation τ I f (x, y)dxdy = D I , where D I is the number of observed deaths in I. This implies that τ can be estimated, in practice, given an estimatef (x, y) of f (x, y) evaluated on I just by computingτ = D I ( If (x, y)dxdy) −1 . Then, the forecast D I fc is obtained byD I fc =τ I fcf fc (x, y)dxdy, given a forecastf fc (x, y) of f (x, y) on I fc . Thus, the problem of forecasting the future number of deaths reduces to a density estimation and forecasting problem.
Next, we describe how to estimate the density components f 1 , f 2 and f 3 from a data sample {(X i , Y i ) : i = 1, . . . , n}. Consider the following notation: Let S = {(x, y) ∈ I : x ≤ In practice, the exact solutions of the above integral equations are unknown because f is unknown. To estimate f 1 , f 2 and f 3 , we formulated empirical integral equations by substituting f in (6)-(8) by a suitable estimator. We considered the local linear density estimator of [40] with a bandwidth vector (b 1 , b 2 ), which is defined in Appendix B. With this choice, the empirical integral equations are defined as follows: under following constraints: (11) are chosen such that the constraints in (12) are satisfied. Here, we have used the notationf w,l for the estimator of f w,l above (l = 1, 2, 3), which has been obtained by replacing f withf .
Reference [37] proved the existence of a unique solution for the above empirical equations (see also [41] for related theoretical tools); however, the solution cannot be explicitly obtained, and a backfitting algorithm is required in practice to derive estimateŝ f 1 ,f 2 andf 3 . The algorithm can be written as follows.
Step 0. Letf 2 be starting values for estimating f 1 and f 2 , which satisfy the first two constraints in (12). Calculate the following: and setf 3 is chosen such that the third constraint in (12) is satisfied.
Step r. Letf  Calculate the following: (y)f Step r in the algorithm is iterated (r = 2, 3, . . .) until convergence. As a convergence criterion, we evaluate the change |f i (x) and stop when it is smaller than a tiny constant c > 0, for all i = 1, 2, 3. In our empirical analyses, we used c = 1e − 7 with a maximum number of iterations of 1000.
Notice that from the above algorithm we obtain estimates for f 1 (x) and f 2 (y) for x, y ∈ [0, 1]; that is, for all observed cohorts and ages. However we only obtained estimates of f 3 (x + y) for x + y in the trapezium I, which is the past period x + y. Thus, to derive the required forecasts, we will need to extrapolate f 3 . Among other aspects, this issue makes the model (2) more challenging than the simpler model (1). A convenient extrapolation of f 3 is described in Section 2.4.

Forecasting
Under model (2), from the density components' estimates that were derived in the previous section, we obtained an estimator of the density f of deaths observed in the past calendar times, i.e., in the trapezium I. However, the aim in mesothelioma mortality forecasting is to derive mortality projections for future calendar years, i.e., those lying in the triangle I f c = {(x, y) ∈ [0, 1] 2 : x + y > 1} shown on the right panel of Figure 2. To satisfy this aim we need to estimate the two-dimensional density f also on I fc . Under model (2), this can be performed from the previous backfitting estimatesf 1 andf 2 , along with a method of extrapolatingf 3 to the future calendar year points (z = x + y ∈ (1, 2]). Recall that the backfitting algorithm only estimates the calendar effect density f 3 (z) up to the present time point (z = 1).
From the identification constraints (3)-(5), we have assumed that the calendar effect f 3 is constant around the present time point z = 1, to be more precise on the interval [1 − κ, 1]. The authors of [37] used this assumption to extrapolate the calendar effect as a constant into the future, that is, by settingf fc 3 (z) =f 3 (1) for z > 1 (recall that z = 1 represents the more recent observed calendar time). Thus, projections of f (x, y) at future time points (x, y) ∈ I fc are given by the following.
By contrast to other forecasting strategies, such as the I(0), I(1) and I(2) forecaster of [33], the above extrapolation strategy is a natural method for describing the future based only on the past data (in-sample) and smoothness considerations. While the former forecasters extrapolate the (logarithmic) calendar effect linearly into the future, estimating the slope of such a line in three different ways, our approach eliminates the calendar effect from the model, normalizing it to have zero slope in the recent past by the imposed identification constraints.
At this point there is only one issue left: how to choose, in practice, the constant κ. This parameter can be interpreted as the length of the recent past which should be used to estimate and forecast the calendar effect. Therefore, it is of interest in practice to illustrate the effect of such a parameter on the forecasts. In our application to mesothelioma mortality, we perform this and conclude with a data-driven choice derived by cross-validation. The cross-validation method estimate κ from the data as follows: pick some small δ > 0 and define the following.
Let D be the set of data points (X i , Y i ) that lie in S < δ , that is, (X i , Y i ) ∈ S < δ . For any κ ∈ (δ, 1], compute the backfitting estimatorsf κ 1 ,f κ 2 ,f κ 3 from the data sample D, as described in Section 2.3, where the set S is replaced by S < δ . Next, we define the following. The estimator of κ is defined as follows: κ = arg min κ∈(δ,1] CV(κ). (13) where the following is the case.
The justification of the above criterion comes from the fact that CV is an estimator of the Mean Integrated Squared Error (MISE) off κ (x, y) in the set S > δ , which ideally one would minimise to choose κ. To observe this, we expand the MISE as follows.
Since the last term is positive and does not dependent on κ, we can minimise the above expression for MISE and ignore the final term. Then, as the second term depends on the unknown f , we can replace it with the simple non-parametric estimator n −1 . This provides the above expression for CV(κ).

Results
Using the methodology described above, we have analysed a dataset consisting of annual aggregated counts of deaths caused by exposure to asbestos in Great Britain for males aged between 25 and 89. From this data, we can update the results from the previous analysis of [19] using the additional data and applying model (2), which allows us to take into account the calendar year effect on the mesothelioma deaths.
Therefore, we assume that the two-dimensional density f (x, y), with x denoting the cohort and y the age, can be written as the product of three functions: the densities corresponding to the cohort and the age effects, f 1 and f 2 , as well as a third function, f 3 , describing the effect of the year (period) of death. In order to estimate the three density components, we have used the backfitting algorithm described in Section 2.3. For the local linear estimatorf of the two-dimensional density f , we have considered bandwidthŝ b 1 = 6,b 2 = 4.2 years. The bandwidthsb 1 andb 2 were obtained as follows: we first computed the common cross-validated bandwidthsb 1 andb 2 for the local linear estimator f and then rescaled them by the factor n −1/5 /n −1/6 . The justification of this rescaling is based on theory. The authors of [37] proved the consistency of the backfitting estimates assuming that the component bandwidths satisfy the condition n 1/5 b j → c j for some constant c j > 0 (j = 1, 2), i.e., they have convergence order of n −1/5 . The cross-validated estimates derived for the two-dimensional density have order n −1/6 (see [40]), so rescaling the cross-validated bandwidths with the above factor provides the theoretical requirements.
The estimated density components are shown in Figure 4. The graphs display the estimates produced by the backfitting algorithm for different values of the parameter κ. Recall that this parameter defines the length of the most recent time interval where the calendar effect function f 3 is constant. It can be also interpreted as the length of the recent past, which is used to estimate the calendar effect. For an easier interpretation here, we provide this parameter in number of years so the value κ = 49 corresponds to a constant calendar effect over the whole period from 1968 to 2016, κ = 10 corresponds to a constant calendar effect in the last 10 years, i.e., from 2007 to 2016, and non-constant from 1968 to 2006; in general, κ = 49 − p corresponds to a constant calendar effect in the last p years, i.e., from 2016 − p + 1 to 2016, and non-constant from 1968 to 2016 − p. This can be seen in the last graph of Figure 4 where the bigger κs correspond to nearly constant estimates of f 3 , while smaller values allow for general shapes. The estimates of f 1 and f 2 also vary slightly with the value of κ. Notice that the shapes of the densities do not allow us to observe the variations in more detail in the graphs.  Applying the forecasting method described in Section 2.4 in conjunction with the density component estimates for different κ's, we have calculated the predicted total number of deaths in future years (that is, in the years following 2016). The results are shown in Figure 5 for κ = 10, . . . , 49, along with the observed number of past deaths. We can observe that κ does not have a big effect on the forecasts, which agrees with the slight variations in the density component estimates for each κ shown in Figure 4.
It is of interest to predict the peak value (highest number of deaths) as well as the year of peaking. From Figure 5, we can observe that the peak had already been reached in 2016, and it is confirmed for all the κ values considered. The peak value was 2101 observed deaths. This agrees with the provisional data for 2017 published by the HSE [39], where the number of deaths in the year 2017 had decreased to 2087.
To derive our final forecasts, we consider the cross-validation method defined above to choose the value of κ, see Equation (13). In order to minimize the function CV, we evaluate it on κ = 10, . . . , 49. The CV function is quite flat, showing slightly smaller values for the bigger κ's. This seems to be in line with the stable behaviour of the density component estimates shown in Figure 4 and suggests choosing the maximum value of κ, which is κ = 49 years. This value corresponds to the case of f 3 being constant for every period, that is, it corresponds to a simple age-cohort model. For this choice we predicted 2063 deaths in the year 2017, which is a bit lower than the available provisional figure of 2087 for 2017 [39]. The number of deaths decreases slowly until 2032 deaths were reached in 2020. Table 1 shows our forecasts until 2022 using the full dataset (third column), as well as using only data up to 2013 (sixth column). The available data are shown for assessing forecasts derived from data up to 2013 (second column).  Table 1. Mesotheolioma mortality forecasts in the UK. Five forecasts of numbers of deaths are compared: "our-201x" corresponds to our proposal using data up to 201x and constant calendar effect; "apc-201x*" to the discrete approach of [19] using data up to 201x and truncating cohorts from 1966 (x = 3, 6); and HSE projections also using provisional data for 2017, "HSE-2017p". Available data are shown for assessing forecasts. For comparison purposes we have included in the table forecasts derived from the discrete approach of [19] (fourth and last columns) and the more recent HSE forecasts [39] (fifth column). The discrete approach of [19] is computed by truncating the data corresponding to the youngest cohorts, that is, from 1966 as those authors suggested. This is necessary since the approach does not perform any smoothing and cannot properly deal with sparsity in the data corresponding to those cohorts. These forecasts have been computed using the apc package [42] and are shown for years up to 2022. Our approach and the discrete approach of [19] provide similar forecasts, but we do not need to perform any arbitrary truncation in contrast to the discrete approach. Moreover, truncating cohorts from 1966 means that mortality is only projected from ages above 50, which might explain the slightly lower forecasts from the discrete model. On the other hand, HSE provides a similar forecast for calendar year 2018 but differs substantially for future periods. Figure 6 shows the differences in the shapes of forecasts being compared. HSE projects a notably faster decline of deaths than the other approaches. For reference, we have also shown in this figure the forecasts published in [19] using (truncated) data up to 2013 (the truncation in this case resulted in projections only for ages above 47 years). Moreover, we have added our forecasts with data up to 2013 (without truncation) and constant calendar effect (corresponding to the value of κ chosen by cross-validation). By restricting the data for estimation up to 2013, we can assess whether our proposal would have been able to predict the peak in 2016. Figure 7 shows the peak forecasts for different values of κ along with the year of peak. Peak values vary with κ between 2032 and 2077 and the year of peaking between 2016 and 2018. The observed peak of 2101 in 2016 is, therefore, hardly predicted using data up to 2013. The same happens with the forecasts derived by [19], which predicted a peak of 2079 deaths in 2017. This seems to be natural since the statistical projections describe the expected future mortality as a smooth curve, while the actual numbers in years close to the peak are expected to fluctuate above and below due to year-on-year random variation [39].

Discussion and Conclusions
Standard methods and common benchmarks in the literature on mortality forecasting rely on dose-response models where both deaths and exposure are observed. Such methods involve the non-trivial and risky exercise of estimating the exposure when it is not known, as happens in the case of asbestos-related mortality. Our paper demonstrates that the methodology of [37] can be another benchmark with important benefits. First, it does not require any modelling, estimating and extrapolating of the exposure when it is unknown. This makes the approach more robust compared with those that are more detailed in this regard, especially when the model for the exposure is mis-specified. Second, it is very intuitive due to its connection with the popular APC models. Third, it takes advantage of the powerful non-parametric structured models, which exhibit excellent theoretical and practical properties, compared to the standard (discrete) APC models. Finally, forecasting is entirely determined by the data, avoiding the need to use time series modelling and other more sophisticated extrapolation techniques. This further contributes to the robustness of the approach in practice.
We applied our method to actual data consisting of the number of deaths due to mesothelioma between 1968 and 2016, which we obtained from the Health and Safety Executive. From this, we have been able to produce an up to date forecast for the number of deaths in the future. We have also compared the results from our model to those from the discrete AC model of [19] and the model used by the Health and Safety Executive. While our forecasts, and those from the discrete age-cohort model, show very similar shapes, the HSE forecasts differ notably, showing a much faster decline in the number of deaths up to calendar year 2030. In addition, we have provided forecasts with truncated data which could be compared to data for the following years. In all cases, the forecasts were reasonable but differed depending on the constant κ, which is the length of the interval in the past for which the period function is set as constant. While choosing κ is challenging, we explained a cross validation approach on how to choose it in practice.
Modelling the number of deaths by our density approach suffices when the objective is to forecast aggregated mortality. However, more detailed information might be required beyond this objective. For instance, if a further study considers the prevention of deaths due to mesothelioma, then it would be necessary to model the length of the latency period and to take into account survival from competing risks.
A limitation of previous approaches compared in this paper ( [19,39]) is that the ageprofile is assumed to be common for all cohorts [12]. The model considered in this paper relaxes this assumption by introducing some dependence between age and cohorts. More general structures could be assumed under a similar density approach, but with the risk of increasing the uncertainty arising out of such more general models. Therefore, further research would be required to find a good compromise between model complexity and uncertainty. Evaluating the uncertainty of the forecasts is another issue which will require further work, e.g., to derive confidence bands for the forecasts.  Acknowledgments: The authors thank Robert J. Brooks for providing the data analysed in the paper. Moreover, discussions with Jens P. Nielsen are gratefully acknowledged. Finally, we thank two anonymous reviewers for their many valuable comments and suggestions which have helped to improve the quality of the article.

Conflicts of Interest:
The authors declare no potential conflicts of interests.

Appendix A
Here, we describe how the identification constraint (5) can be fulfilled in practice by using a simple example. Recall that, under smoothness assumptions, this amounts to removing from f 3 a log-linear trend in the interval [1 − κ, 1] so it becomes (approximately) constant.