1. Introduction
From a socioeconomic perspective, disability insurance and, in particular, long-term care (LTC) insurance have recently become areas of interest in developed countries as a result of the observed demographic trends. The accelerated ageing of the population, caused by the increase in life expectancy and the decline in fertility rates, is expected to intensify in the coming years, as is discussed in
OECD (
2023).
With the increase in the elderly—and consequently dependent—population, several countries, including Germany, the United Kingdom, France, and Switzerland, have developed public long-term care networks and also offer LTC products through private insurance companies; see
Haberman and Pitacco (
2018). Recently,
Lee et al. (
2023) conducted a detailed comparative analysis of LTC systems across 26 OECD countries, focusing on the source of financing (namely tax-based systems, health insurance and LTC insurance) while providing a characterisation of each financing mechanism.
Regardless of the financing mechanism, LTC systems are inherently challenging to implement, as explored in Barreira et al. (2023). LTC insurance products, in particular, are complex to design due to their broad scope of coverage (both in terms of benefits offered and the extended duration of the policies), which often results in high premiums that are not affordable to the general population. Moreover, since this field and its associated products are relatively new, there is a significant lack of data on disability events, and such data are essential for developing accurate models for premium and reserve calculations.
Population ageing is also an observable trend in Portugal, see
INE (
2023). To address this phenomenon, the country has established the National Network of Integrated Continuous Care (RNCCI), a public network that provides healthcare and social support to individuals in situations of dependence. Several studies have been conducted using RNCCI data, mainly focusing on characterising key features of the Portuguese population with dependency needs. Notable works include those by
Lopes et al. (
2020,
2021) and the other references therein, which explore various aspects of the system in place, such as the demographic profile of individuals receiving care, factors influencing the duration of care, and the effectiveness of integrated continuous care systems in addressing dependency challenges. These studies provide valuable insights for improving care strategies and policies targeting the dependent population in Portugal. Portugal has been identified as one of the EU-27 member states with the highest rates of care provided by informal carers and with the highest share of direct out-of-pocket funding for LTC, see
European Commission (
2021).
Although there are no private companies in Portugal providing LTC insurance products, mostly due to the high risk involved and the lack of data, see Esquível et al. (2021), some research work has been conducted in this area. Using 2015 data from the RNCCI database and a clustering-based approach, Oliveira et al. (2017) estimate discrete-time transition probabilities, while Oliveira (2017) develops the simulation and calibration of a discrete-time multi-state model for calculating the costs and premiums of an LTC product in Portugal. Using the same dataset and relaxing the assumption of a discrete model, Esquível et al. (2021, 2024) present a method to estimate and calibrate transition intensities for non-homogeneous continuous-time Markov chains, together with a general multi-state model based on official Portuguese data. In this paper, we aim to develop further stochastic and statistical methods for the formulation of disability and LTC insurance products in Portugal by incorporating the demographic characteristics of the population into the framework.
From an actuarial perspective, the main concern when dealing with disability and LTC insurance (as with other types of insurance) is the modelling of the underlying risks. In this case, the complexity is higher, not only due to the wide range of possible health statuses of an individual but also because of the variety of benefits that this type of insurance can provide. Therefore, robust models and accurate estimates are required for modelling the different events and underlying benefits, as well as for the respective premium and reserve calculations.
Multi-state models are the most common actuarial models used for the study of disability and LTC insurance. A well-established mathematical study of the theory of Markov chains and their relation with multi-state models is presented, for instance, in
Ross (
1996), and
Wolthuis (
2003). The application of multi-state models in disability and LTC insurance has also been thoroughly studied, see
Pitacco (
2014),
Haberman and Pitacco (
2018), and
Dickson et al. (
2019). Regarding the methodology for the formulation of these models, the mentioned studies follow the transition intensity approach (or TI-approach) proposed in
Waters (1984), in which the transition intensities are considered the fundamental quantities (or parameters) of multi-state models. For the estimation and graduation of the transition intensities, methods that derive from the TI-approach have been developed, see Waters (1984), Dickson et al. (2019) and Haberman and Pitacco (2018), including parametric approaches that apply generalised linear models (GLMs) as graduation techniques, as described in Renshaw and Haberman (1995).
Several studies, employing diverse methodologies involving Markov multi-state models, predominantly focus on age and gender as the key risk factors in the analysis of long-term care dynamics.
Cui et al. (2022) estimate the transition probabilities to project LTC costs, explicitly accounting for age and gender differences. Similarly, Pritchard (2013) evaluates the transition intensities and associated costs under LTC insurance, distinguishing between males and females. Fong et al. (2015) develop age- and sex-specific functional status transition rates using GLMs, particularly for older individuals. Baione and Levantesi (2014) address data limitations by estimating transition intensities based on aggregated mortality and morbidity information, stratified by age and gender. Leung (2006) applies graduation by mathematical formulae to obtain graduated transition intensities and then computes LTC premiums, with separate considerations for males and females. Collectively, these studies highlight the significance of age and gender in assessing LTC risks and costs.
The primary goal of the work presented in this paper is to develop a parametric framework for the estimation and graduation of transition intensities within a multi-state model, specifically designed for disability and LTC insurance applications. Recent studies on the analysis of long-term transition dynamics employ methodological approaches that closely resemble ours, similarly using age and gender as baseline risk factors, though differing in the selection of additional covariates.
Czado and Rudolph (2002) employ a semiparametric hazard model that, while centred on diagnoses as key predictors, also includes age, sex and nursing home status as explanatory variables, which aligns with our interest in multidimensional risk factors. Similarly, Fuino and Wagner (2018) model transition probabilities by age and gender but extend the analysis to account for duration in dependency. Kabuche et al. (2024) maintain a demographic focus (age and gender) but introduce additional stratification by functional disability and chronic illness, mirroring our emphasis on layered risk assessment. Guo (2024), Arrighi et al. (2017) and Casasnovas and Nicodemo (2016) further expand the scope by incorporating socioeconomic variables (education, economic status, or residence), reinforcing our methodological commitment to capturing diverse influences on LTC transitions. While these studies share important conceptual similarities with our approach (particularly in incorporating additional risk covariates and addressing data limitations in LTC insurance), our methodology introduces novelty in the risk factors considered and through the application of GLMs. In contrast with the mentioned authors, who predominantly employ semiparametric hazard models, our GLM-based framework takes advantage of the flexibility and simplicity offered by this type of model while maintaining robust modelling capabilities.
In the absence of real portfolio data for the application of the graduation techniques, we resort to data simulation to generate a dataset of individuals with demographic and lifestyle characteristics (such as age, sex, smoking habits, body mass index, exercise habits and region of residence) representative of the Portuguese population.
The ultimate objective of this work is to present functional forms for the transition intensities as a function of age, where the parameters are dependent on the demographic characteristics of the population. In other words, the proposed methodology differentiates individuals based on their risk profiles when estimating their transition intensities. This approach properly adjusts the probabilities used in the calculation of premiums and reserves for disability and long-term care insurance, ensuring that premiums are appropriately aligned with individual risk profiles.
Personalised risk premiums in LTC insurance can benefit insurers by improving risk assessment accuracy, enhancing profitability and promoting market sustainability. By tailoring premiums to individual risk factors, insurers can better align pricing with actual expected costs, reducing adverse selection and financial instability. Advanced modelling techniques, including transition probability analysis and multi-state frameworks, allow for more precise premium differentiation, so that high-risk policyholders pay appropriate premiums while lower-risk individuals are not overcharged. This tailored pricing improves underwriting efficiency, optimises reserve allocations, reduces financial volatility through more precise claim forecasting, and may expand insurability by making coverage more accessible to lower-risk groups, which in turn strengthens competitiveness. It also supports regulatory compliance by ensuring adequate reserves and solvency margins, while enabling dynamic pricing adjustments for long-term portfolio stability and profitability. Overall, personalised LTC premiums strengthen insurers’ financial resilience while incentivising data-driven product innovation.
In this work, we present a model that is both generalizable and applicable in a real-world context.
Section 2 outlines our proposed framework step by step, ensuring straightforward adaptation to other model formulations and/or real-world datasets.
Section 2.1 introduces a multi-state model designed for disability and long-term care insurance applications. This model serves as the foundation for illustrating the proposed approach, which relies on the estimation and graduation of the transition intensities considering multiple risk factors. To address the critical lack of reliable data for disability and LTC insurance in Portugal, we develop in Section 3.1 a synthetic portfolio designed to mirror (a) Portugal’s demographic structure; (b) lifestyle risk factors representative of the population; and (c) policyholder behaviour patterns. This simulated dataset enables Portuguese insurers to (i) test our methodology without proprietary data constraints; (ii) validate models against Portugal-specific risk profiles; and (iii) develop products tailored to the country’s ageing population challenges. The simulation framework, which may be extended to other risk factors, gives insurers a tool to overcome data limitations while maintaining actuarial rigour, which is particularly relevant for this developing insurance market. The proposed methodology directly addresses the need for sophisticated yet practical tools to support the growing disability and LTC insurance sector.
The framework developed in this paper offers a flexible foundation for multi-state modelling across diverse populations and/or insurance portfolios and, while broadly applicable, the results obtained in this paper may be used to propose a risk-adjusted pricing structure tailored to the Portuguese population’s profile.
2. Risk-Adjusted Estimation and Graduation of Transition Intensities
This section presents the proposed methodology for the estimation and risk-adjusted graduation of transition intensities within continuous-time Markov multi-state models in a structured sequence, with each step corresponding to the fully detailed technical development in subsequent subsections.
The framework proposed in this paper can be summarised into three key steps, briefly described as follows:
Multi-state model specification
- Design of a continuous-time Markov model with a finite state space, whose states represent the underlying condition or status of individuals.
- Identification of the possible transitions between states and characterisation of the corresponding transition intensity matrix.
Estimation of transition intensities
- Estimation of piecewise constant transition intensities, stratified by age intervals, following a maximum likelihood approach.
Risk-adjusted graduation of transition intensities
- (a) Data preparation: collection of transition counts between states and state-specific sojourn times, incorporating relevant risk factor data.
- (b) Modelling framework: implementation of a Poisson GLM (log-link function), with risk factors formally included through covariate selection, to obtain smooth, risk-adjusted intensity estimates.
- (c) Functional form specification and parameter estimation: definition of a functional form for the risk-adjusted transition intensities (e.g., Gompertz–Makeham), using non-linear regression models for parameter estimation.
2.1. Multi-State Model Specification
We begin by defining the specific multi-state model employed throughout this paper to illustrate the results, while emphasizing that the proposed methods and findings are generalizable to a broader class of multi-state models.
For a general continuous-time multi-state model with a finite state space $S = \{1, 2, \ldots, n\}$, $n \in \mathbb{N}$, each state represents a distinct condition of an individual at a given time. Instantaneous transitions are possible between selected pairs of states, and the model describes the random transitions of the individual over time as they move between the possible states.
For each $x \geq 0$, the random variable $Y(x)$ takes one of the values $1, 2, \ldots, n$, which represents the state of the individual at a given age $x$. The set $\{Y(x)\}_{x \geq 0}$ is a continuous-time stochastic process and, for states $i, j \in S$ and times $x, t \geq 0$, we define the transition probabilities,
$${}_{t}p_{x}^{ij} = \Pr\left[ Y(x+t) = j \mid Y(x) = i \right],$$
and the transition intensities of the multi-state model,
$$\mu_{x}^{ij} = \lim_{h \to 0^{+}} \frac{{}_{h}p_{x}^{ij}}{h}, \quad i \neq j.$$
Furthermore, assuming that the probability of future transitions depends solely on the current state of the process (i.e., the transition probability is independent of any past values of the process prior to time $x$), we conclude that the process is a continuous-time Markov chain.
Multi-state models can be designed for disability and LTC insurance purposes by considering several states to represent different degrees of disability/lack of autonomy.
Disability insurance (DI) and long-term care insurance (LTCI) both address disability risk but differ fundamentally in scope and purpose: DI provides income replacement when policyholders cannot work due to illness/injury, typically paying monthly benefits tied to pre-disability earnings and requiring occupational incapacity proof; meanwhile, LTCI covers actual care costs (such as nursing homes or home care) triggered by functional impairments in Activities of Daily Living (ADLs) or cognitive decline, reimbursing specific expenses regardless of employment status. DI primarily serves working-age populations with time-limited benefits, whereas LTCI targets older adults or those with chronic conditions, offering lifelong coverage for care needs. Although distinct, both products mitigate financial risks arising from disability as follows: DI preserves income streams, while LTCI protects against catastrophic care expenditures. The methodology proposed in this paper serves both types of insurance, with premiums defined according to the respective coverages and benefits.
To illustrate the framework proposed in this paper, we consider a general four-state continuous-time Markov model, with two disability states (State 1 ≡ Autonomous, State 2 ≡ Mildly Disabled, State 3 ≡ Severely Disabled, and State 4 ≡ Dead), where a direct recovery from a severe disability is not possible, as illustrated in
Figure 1.
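For $x \geq 0$, the corresponding transition intensity matrix of the model is given by
$$M(x) = \begin{pmatrix} -(\mu_{x}^{12} + \mu_{x}^{13} + \mu_{x}^{14}) & \mu_{x}^{12} & \mu_{x}^{13} & \mu_{x}^{14} \\ \mu_{x}^{21} & -(\mu_{x}^{21} + \mu_{x}^{23} + \mu_{x}^{24}) & \mu_{x}^{23} & \mu_{x}^{24} \\ 0 & \mu_{x}^{32} & -(\mu_{x}^{32} + \mu_{x}^{34}) & \mu_{x}^{34} \\ 0 & 0 & 0 & 0 \end{pmatrix} \tag{1}$$
and, for $x, t \geq 0$, the transition probability matrix is given by
$$P(x, x+t) = \begin{pmatrix} {}_{t}p_{x}^{11} & {}_{t}p_{x}^{12} & {}_{t}p_{x}^{13} & {}_{t}p_{x}^{14} \\ {}_{t}p_{x}^{21} & {}_{t}p_{x}^{22} & {}_{t}p_{x}^{23} & {}_{t}p_{x}^{24} \\ {}_{t}p_{x}^{31} & {}_{t}p_{x}^{32} & {}_{t}p_{x}^{33} & {}_{t}p_{x}^{34} \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
with the probabilities ${}_{t}p_{x}^{ij}$ derived from the Kolmogorov forward equations given the transition intensities, see (Haberman and Pitacco 2018, p. 18).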
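As a minimal illustrative sketch (in Python, not the R implementation used for the paper's computations), the code below approximates the one-year transition probability matrix of the four-state model by integrating the Kolmogorov forward equations $dP/dt = P\,M$ with small Euler steps. It assumes intensities that are constant over the interval, and the numerical values in `MU` are hypothetical placeholders rather than estimates from this work.

```python
# Hypothetical constant intensities (per year) for the four-state model:
# 1 = Autonomous, 2 = Mildly Disabled, 3 = Severely Disabled, 4 = Dead.
MU = {
    (1, 2): 0.05, (1, 3): 0.01, (1, 4): 0.02,
    (2, 1): 0.03, (2, 3): 0.08, (2, 4): 0.05,
    (3, 2): 0.01, (3, 4): 0.20,
}


def intensity_matrix(mu, n_states=4):
    """Build the intensity matrix: off-diagonal entries are the intensities,
    diagonal entries make every row sum to zero."""
    q = [[0.0] * n_states for _ in range(n_states)]
    for (i, j), rate in mu.items():
        q[i - 1][j - 1] = rate
    for i in range(n_states):
        q[i][i] = -sum(q[i][j] for j in range(n_states) if j != i)
    return q


def transition_probabilities(q, t, steps=10_000):
    """Approximate P(t) by Euler steps on dP/dt = P Q, starting from P(0) = I."""
    n = len(q)
    h = t / steps
    p = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        p = [[p[i][j] + h * sum(p[i][k] * q[k][j] for k in range(n))
              for j in range(n)] for i in range(n)]
    return p


Q = intensity_matrix(MU)
P1 = transition_probabilities(Q, t=1.0)  # one-year transition probabilities
```

Each row of `P1` sums to one, the Dead state is absorbing, and the entry for Severely Disabled to Autonomous is positive even though the direct intensity is zero, because recovery can occur through the Mildly Disabled state.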
While the model in
Figure 1 considers only two intermediate disability/LTC states, we emphasise that both the methodology and results presented in this paper can be easily extended to more complex multi-state disability/LTC frameworks.
2.2. Estimation of Transition Intensities
In this paper, we adopt the formulation of multi-state models based on the transition intensity approach (TI-approach), in which the transition intensities are treated as the models’ fundamental quantities (or parameters). For the estimation of the transition intensities, we follow a maximum likelihood approach to obtain crude estimates, assuming that the transition intensities are piecewise constant functions of age, i.e., they are constant between consecutive ages. This technique was introduced in
Waters (
1984) and also studied in
Haberman and Pitacco (
2018) and
Dickson et al. (
2019).
In the following, let us consider a population of individuals under observation during a certain period of consecutive years and, for illustration purposes, the disability multi-state model presented in
Figure 1. For each individual, we collect data referring to the transitions between states during the observation period, including the sojourn times in each state, as well as the number of transitions of each type. Hence, each individual aged $x$ represents an independent realisation of the underlying continuous-time Markov process $\{Y(x)\}_{x \geq 0}$.
Without loss of generality, let us select an interval of consecutive ages, say $[x, x+1)$, over which we can reasonably assume that the transition intensities are constant, i.e.,
$$\mu_{x+t}^{ij} = \mu^{ij}, \quad 0 \leq t < 1, \quad i \neq j.$$
Let us also assume that there are $N$ individuals under observation with age within the considered range $[x, x+1)$.
Transition Intensity Estimation Approach
The proposed methodology, based on the model depicted in
Figure 1, but easily adapted for different multi-state models, can be summarised by the following steps:
Individual likelihood function
Determine the likelihood function for each individual under observation as a function of the transition intensities.
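For a single individual, see (Haberman and Pitacco 2018, p. 147), each stay of duration $t$ in a given state $i$, followed by a transition to a different state $j$, contributes to the likelihood of the sample path with a factor of the form
$$\exp\Big( -t \sum_{l \neq i} \mu^{il} \Big)\, \mu^{ij}.$$
Thus, for $k = 1, \ldots, N$, and defining $T_{k}^{i}$ as the total sojourn time of individual $k$ in state $i$ and $N_{k}^{ij}$ as the number of transitions from state $i$ to state $j$ made by individual $k$,
the likelihood function for individual $k$, as a function of the transition intensities $\mu^{ij}$, is given by
$$L_{k} = \prod_{i} \prod_{j \neq i} \exp\big( -\mu^{ij}\, T_{k}^{i} \big) \big( \mu^{ij} \big)^{N_{k}^{ij}}.$$
Population likelihood function
Obtain the likelihood function for the whole population under observation.
Assuming the mutual independence of the individuals’ sample paths, the likelihood function for the observed population is given by
$$L = \prod_{k=1}^{N} L_{k} = \prod_{i} \prod_{j \neq i} \exp\big( -\mu^{ij}\, T^{i} \big) \big( \mu^{ij} \big)^{N^{ij}}, \tag{3}$$
with $T^{i} = \sum_{k=1}^{N} T_{k}^{i}$ and $N^{ij} = \sum_{k=1}^{N} N_{k}^{ij}$, $i \neq j$.
Estimation of transition intensities
Obtain the maximum likelihood estimators for the transition intensities.
Factorizing the likelihood function with respect to each parameter to be estimated, we obtain terms of the form
$$\exp\big( -\mu^{ij}\, T^{i} \big) \big( \mu^{ij} \big)^{N^{ij}},$$
resulting in the log-likelihood function
$$\ell = \sum_{i} \sum_{j \neq i} \big( N^{ij} \log \mu^{ij} - \mu^{ij}\, T^{i} \big).$$
Setting the derivative with respect to each $\mu^{ij}$ equal to zero, the maximum likelihood estimators for the transition intensities, for a given age $x$, are given by
$$\hat{\mu}_{x}^{ij} = \frac{N^{ij}}{T^{i}}, \quad i \neq j.$$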
Following the steps outlined above, for each consecutive age interval within the observation period, crude estimates of the transition intensities for each year of age are obtained, resulting in a piecewise constant function between ages (as initially assumed).
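The estimation steps above reduce to dividing transition counts by total sojourn times. The following Python sketch illustrates this for a single age interval; the record layout and the values in `RECORDS` are hypothetical and serve only to show the computation, not to reproduce the data or the R code used in this work.

```python
from collections import defaultdict

# Hypothetical observation records for one age interval [x, x+1):
# (individual id, state occupied, time spent in that state in years,
#  state entered afterwards, or None if the stay was censored).
RECORDS = [
    (1, 1, 1.00, None),
    (2, 1, 0.40, 2),
    (2, 2, 0.60, None),
    (3, 1, 0.25, 2),
    (3, 2, 0.50, 1),
    (3, 1, 0.25, None),
    (4, 2, 0.30, 4),
    (5, 3, 0.70, 4),
]


def crude_intensities(records):
    """Return {(i, j): N_ij / T_i}, the occurrence-exposure estimates."""
    sojourn = defaultdict(float)  # T_i: total time spent in state i
    counts = defaultdict(int)     # N_ij: number of observed i -> j transitions
    for _, state, duration, next_state in records:
        sojourn[state] += duration
        if next_state is not None:
            counts[(state, next_state)] += 1
    return {ij: n / sojourn[ij[0]] for ij, n in counts.items()}


estimates = crude_intensities(RECORDS)
```

Transitions that were never observed in the interval do not appear in `estimates`, which corresponds to a crude estimate of zero; this is one reason why the raw estimates are subsequently graduated.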
This methodology and assumption may compromise certain desired properties, such as the smoothness of the transition intensity function across ages, see (
Haberman and Pitacco 2018, p. 145). Therefore, graduation of these raw estimates is usually employed to maintain such desired characteristics.
2.3. Risk-Adjusted Graduation of Transition Intensities
In this work, a parametric approach using generalised linear models (GLMs), specifically Poisson regression models, with predictors of the Gompertz–Makeham form is adopted. For completeness, we begin with a brief introduction of the necessary concepts.
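The Gompertz–Makeham mortality law, see Gompertz (1825) and Makeham (1860), is a fundamental model for age-dependent death rates, and it can be expressed as
$$\mu_{x} = A + B\,c^{x},$$
where $A$ is an age-independent component and $B\,c^{x}$, with $B > 0$ and $c > 1$, is a component that increases exponentially with age $x$.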
While the Gompertz–Makeham law is primarily used to model mortality rates, it can also be applied to describe other age-dependent transitions in multi-state models, such as long-term care and disability insurance, see (
Haberman and Pitacco 2018, p. 100).
Moreover, it is among the first expressions employed for the graduation of mortality rate estimates, see
Forfar et al. (
1988). This approach is also applicable to the graduation of other transition intensities in multi-state models and, notably, it can be integrated into graduation processes using GLMs, see
Renshaw (
1991), as demonstrated in the graduation process outlined in this paper.
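To make the functional form concrete, the short Python sketch below evaluates a Gompertz–Makeham intensity at a few ages and shows why its Gompertz part fits naturally into a log-link GLM. The parameter values are hypothetical and chosen only for illustration.

```python
import math


def gompertz_makeham(x, a, b, c):
    """Intensity at age x: constant term a plus exponential term b * c**x."""
    return a + b * c ** x


# Hypothetical parameter values, chosen only for illustration.
A, B, C = 0.0005, 0.00003, 1.10

ages = range(60, 91, 10)
intensities = {x: gompertz_makeham(x, A, B, C) for x in ages}

# With a = 0 the law reduces to Gompertz, whose logarithm is linear in age:
# log(b * c**x) = log(b) + x * log(c), i.e. a linear predictor under a log link.
log_gompertz = {x: math.log(gompertz_makeham(x, 0.0, B, C)) for x in ages}
```

In this sketch the log-intensity increases by the same amount, `10 * log(C)`, over every ten-year age step, which is the linearity that a log-link linear predictor captures.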
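For the risk-adjusted graduation of transition intensities, consider a vector $y = (y_{1}, \ldots, y_{n})$, $n \in \mathbb{N}$, of observations (or response variables), assumed to be the realisation of $Y = (Y_{1}, \ldots, Y_{n})$, a vector of independent random variables. Additionally, let $x_{j} = (x_{1j}, \ldots, x_{nj})$, $j = 1, \ldots, p$, represent the vectors of covariates (or explanatory variables), each containing $n$ known observations, which model the response variables.
In order to estimate the expected value of $Y_{i}$ given the observations $x_{i1}, \ldots, x_{ip}$, i.e., $\mu_{i} = E[Y_{i} \mid x_{i1}, \ldots, x_{ip}]$, GLMs assume that there exists an injective and differentiable (and typically non-linear) transformation $g$, known as the link function, such that
$$g(\mu_{i}) = \eta_{i},$$
where $\eta_{i} = \beta_{0} + \sum_{j=1}^{p} \beta_{j} x_{ij}$ is a linear predictor and $\beta_{j}$, $j = 0, \ldots, p$, are unknown parameters to be estimated by maximum likelihood, based on observations $y$.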
To graduate the raw estimates of the transition intensities obtained from the methodology described in
Section 2.2, we consider a Poisson regression, i.e., a GLM with a Poisson-distributed response variable, with a logarithmic link function.
This parametric graduation method based on GLMs was first proposed in Renshaw (1991) for the graduation of the force of mortality. In Renshaw and Haberman (1995), it was further applied to the graduation of the transition intensities in the standard sickness–death model, noting that it can be generalised to other multi-state models, provided the required data are available. While Renshaw and Haberman (1995) consider as explanatory variables the age at the onset of sickness and, for the transitions from the sick state, the sickness duration, this paper incorporates additional observable risk factors, which is one of the key contributions of this work.
Regarding the modelling of mortality indicators, several methods have been proposed to account for population heterogeneity and the impact of various observable variables on mortality. In Renshaw (1994), a graduation of the force of mortality is performed, incorporating multiple explanatory variables, such as gender, smoking habits, medical status, and age.
In this paper, we generalise the approach presented in Renshaw and Haberman (1995) to the multi-state model introduced in Section 2.1, expanding the explanatory variables to include additional demographic risk factors (such as age, sex, smoking habits, body mass index, exercise habits, and region of residence), in line with the methodology of Renshaw (1994).
The proposed risk-adjusted graduation process consists of the following steps:
Data collection
Collect and process the data regarding the transitions between states.
For the graduation of each transition intensity, organise the data in a grid of units or cells denoted by $u = (u_{1}, \ldots, u_{m})$, where $u_{1}, \ldots, u_{m}$ correspond to the risk factors. The observed data consist of a set of ordered pairs $(n_{u}, e_{u})$, where $n_{u}$ is the number of transitions accruing from the central exposures (or sojourn times) $e_{u}$, for each unit $u$. The covariates included in the definition of units $u$ will be different for each intensity, as well as the definition of $n_{u}$ and $e_{u}$, as follows:
- For $\mu^{12}$, $\mu^{13}$ and $\mu^{14}$, $n_{u}$ represents the observed number of transitions from the Autonomous state to the Mildly Disabled, Severely Disabled, and Dead states, respectively, accruing from corresponding exposures $e_{u}$, which, in this case, is the total sojourn time in the Autonomous state.
- For $\mu^{21}$, $\mu^{23}$ and $\mu^{24}$, $n_{u}$ represents the observed number of transitions from the Mildly Disabled state to the Autonomous, Severely Disabled, and Dead states, respectively, accruing from corresponding exposures $e_{u}$, which, in this case, is the total sojourn time in the Mildly Disabled state.
- For $\mu^{32}$ and $\mu^{34}$, $n_{u}$ represents the observed number of transitions from the Severely Disabled state to the Mildly Disabled and Dead states, respectively, accruing from corresponding exposures $e_{u}$, which, in this case, is the total sojourn time in the Severely Disabled state.
GLM framework
Without loss of generality, let $\mu_{u}$ be one of the transition intensities available in the disability model with two disability states, presented in Section 2.1. We assume that there exists a differentiable and injective (and thus invertible) function $g$ that links the intensities $\mu_{u}$ to a linear predictor
$$\eta_{u} = \sum_{j=1}^{p} \beta_{j}\, h_{j}(u), \tag{5}$$
where functions $h_{j}$ define the known covariate structure of the current model and coefficients $\beta_{j}$ are the unknown regression parameters. In other words,
$$g(\mu_{u}) = \eta_{u}. \tag{6}$$
Parameter estimation
Estimate the parameters using a maximum likelihood approach.
Assuming constant transition intensities between consecutive ages, we showed that the likelihood function for the transition intensities is expressed by Equation (3), which is equivalent to assuming that the number of transitions is Poisson-distributed. Thus, considering the random numbers of transitions $N_{u}$ (with observed values $n_{u}$) as the response variables of the GLM, we assume they are independent and Poisson-distributed, with the expected number of transitions given by the product of the exposures to risk $e_{u}$ and the respective transition intensity $\mu_{u}$.
However, since the same transition between two states may be observed multiple times for each individual (with the exception of transitions to the Dead state), the variance of the observed number of transitions may exceed the mean, indicating the presence of overdispersion.
To account for the possibility of overdispersion, we assume that $N_{u}$ are independent overdispersed Poisson variables, with mean and variance, respectively, given by
$$E[N_{u}] = e_{u}\,\mu_{u}, \qquad \mathrm{Var}[N_{u}] = \phi\, e_{u}\,\mu_{u},$$
where $\phi$ is the scale or dispersion parameter. Furthermore, from (5) and (6), the expected value is expressed as
$$E[N_{u}] = e_{u}\, g^{-1}\Big( \sum_{j=1}^{p} \beta_{j}\, h_{j}(u) \Big), \tag{7}$$
which establishes a relationship between $E[N_{u}]$ and the unknown parameters $\beta_{j}$.
Considering $m_{u} = E[N_{u}]$ and the observed counts $n_{u}$, the log-likelihood function associated with the Poisson-distributed response variables is given by
$$\ell = \sum_{u} \big( -m_{u} + n_{u} \log m_{u} - \log n_{u}! \big), \tag{8}$$
which can be expressed in terms of the $\beta_{j}$ using Equation (7). The estimates $\hat{\beta}_{j}$ are obtained by maximizing the log-likelihood function (8).
Link function
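Choose the function that links the transition intensities to the linear predictor containing the estimated parameters $\hat{\beta}_{j}$. We employ the log-link function, as it serves as the canonical link for Poisson-distributed response variables.
Using the logarithm in Equation (6), and writing $m_{u} = E[N_{u}] = e_{u}\,\mu_{u}$ for the expected number of transitions in unit $u$, we have
$$\log \mu_{u} = \log \frac{m_{u}}{e_{u}} = \sum_{j=1}^{p} \beta_{j}\, h_{j}(u),$$
replacing the linear predictor by its expression in Equation (5). By the properties of the logarithmic function,
$$\log m_{u} = \log e_{u} + \sum_{j=1}^{p} \beta_{j}\, h_{j}(u),$$
and we observe that the terms $\log e_{u}$ act in the linear model as an offset. Therefore, the graduated transition intensities are given by
$$\hat{\mu}_{u} = \exp\Big( \sum_{j=1}^{p} \hat{\beta}_{j}\, h_{j}(u) \Big).$$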
It is important to highlight that graduation methods based on GLMs are particularly valuable, not only due to their broad range of actuarial applications but also for their versatility. These methods can be applied to the transition intensities in multi-state models, extending beyond their traditional use in modelling mortality indicators.
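The graduation step can be sketched in a few lines of Python. The example below is an illustration under simplifying assumptions, not the paper's R implementation: it uses a single covariate (centred age), hypothetical cell data in `CELLS`, and a hand-written Newton-Raphson iteration for a Poisson model with log link and log-exposure offset. In practice a statistical package's GLM routine would normally be used instead.

```python
import math

# Hypothetical cells: (age, exposure in years, observed transitions).
CELLS = [
    (60, 950.0, 9), (65, 900.0, 14), (70, 820.0, 21),
    (75, 700.0, 30), (80, 520.0, 37), (85, 310.0, 36),
]


def fit_poisson_log_offset(cells, iterations=50):
    """Fit log E[n_u] = log e_u + b0 + b1 * (age - 70) by Newton-Raphson."""
    xs = [age - 70.0 for age, _, _ in cells]  # centred age for stability
    es = [e for _, e, _ in cells]
    ns = [n for _, _, n in cells]
    # Start from the overall crude rate and no age effect.
    b0, b1 = math.log(sum(ns) / sum(es)), 0.0
    for _ in range(iterations):
        m = [e * math.exp(b0 + b1 * x) for x, e in zip(xs, es)]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(n - mi for n, mi in zip(ns, m))
        g1 = sum(x * (n - mi) for x, n, mi in zip(xs, ns, m))
        # Fisher information matrix (2 x 2) and its determinant.
        i00 = sum(m)
        i01 = sum(x * mi for x, mi in zip(xs, m))
        i11 = sum(x * x * mi for x, mi in zip(xs, m))
        det = i00 * i11 - i01 * i01
        # Newton step: beta <- beta + I^{-1} * score.
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    return b0, b1


def graduated_intensity(age, b0, b1):
    """Graduated intensity exp(linear predictor); the offset drops out."""
    return math.exp(b0 + b1 * (age - 70.0))


b0_hat, b1_hat = fit_poisson_log_offset(CELLS)

# Pearson estimate of the dispersion parameter phi (values above 1 would
# suggest overdispersion relative to the Poisson assumption).
fitted = [e * graduated_intensity(a, b0_hat, b1_hat) for a, e, _ in CELLS]
phi_hat = sum((n - m) ** 2 / m
              for (_, _, n), m in zip(CELLS, fitted)) / (len(CELLS) - 2)
```

Because the model contains an intercept, the fitted expected counts sum to the observed total at convergence. Additional risk factors would enter as further columns of the linear predictor, at the cost of a larger linear system in each Newton step.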
3. Empirical Approach
In this section, we present a practical application of the previously developed methods for estimating and graduating the transition intensities of the multi-state model illustrated in Figure 1.
Using publicly available data for Portugal—for Health statistics, see
INE (
2020a,
2024), and for Census data, see
INE (
2021a,
2021b,
2021c)—we employ simulation techniques to generate a dataset/portfolio that reflects the characteristics of Portuguese individuals, as well as their trajectories through the states of the adopted multi-state model over a given observation period. By applying the previously introduced GLM framework, we obtain graduated estimates of the transition intensities, which depend not only on the individuals’ age but also on other relevant risk factors.
The simulation procedures, model fitting and additional computations have been implemented using R (version 4.2.3) software.
3.1. Dataset/Portfolio Simulation
In Portugal, there is currently a lack of data related to LTC insurance, as referred to, for instance, in
Esquível et al. (
2021). Therefore, it is challenging to obtain the necessary data to estimate and graduate transition intensities following the TI-approach (namely, number of transitions of each type and the corresponding central exposures per age).
For the purpose of this study, in the absence of a real dataset, we start by simulating a dataset/portfolio reflecting the demographic characteristics of the Portuguese population based on the most recent health statistics, see
INE (
2020a,
2024), and Census data, see
INE (
2021a,
2021b,
2021c). Employing the calibrated transition intensities derived in
Esquível et al. (
2021) for a similar multi-state model and from real data of the Portuguese population, we use the transition intensity matrix of the model in
Figure 1 to simulate trajectories for the individuals within the portfolio over a given observation period.
Current demographic and health-related trends observed in Portugal (many of which are shared with other nations), such as the rapidly ageing population—projected to reach 3 million elderly by 2050, according to
INE (
2020b)—the high prevalence of chronic conditions among older adults, and regional disparities in healthcare access, highlight the need for LTC insurance products. By integrating Portugal-specific characteristics—such as age and sex distribution, smoking and exercise habits, body mass index (BMI), and region of residence—as risk factors into the modelling of transition intensities, this study provides a foundation for developing tailored premium models. These models align with market needs and public health priorities, while addressing Portugal’s critical LTC coverage gap through more accurate, equitable, and sustainable risk-based pricing.
The 2021 Population and Housing Census, conducted by the Portuguese National Institute of Statistics (INE), provides data on the distribution of the resident population in Portugal by age group, sex, and regional population size, see
INE (
2021a,
2021b,
2021c). Additionally, the 2019 National Health Inquiry, also conducted by the INE, provides data on the distribution of Portugal’s resident population by age group and sex for each of the remaining covariates (BMI, smoking habits, and exercise habits), see
INE (
2020a). Note that BMI serves as a high-level indicator of an individual’s health status based on their height and weight.
Based on the mentioned publicly available data for the Portuguese population, see
INE (
2020a,
2021a,
2021b,
2021c,
2024), the risk factors and the corresponding levels considered to model the transition intensities are described in
Table 1. Note that the variable region is categorised in terms of population size (in number of individuals) and the exercise habits are measured by the number of hours spent performing physical activity in a normal week.
A dataset of 100,000 mutually independent individuals was simulated, in which the explanatory variables for each individual are sampled according to the available statistical data for the Portuguese population.
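As a minimal sketch of this sampling step, the snippet below draws one level per risk factor for each individual. The level names and probabilities in `LEVELS` are hypothetical placeholders, not the INE figures, and each factor is sampled independently for simplicity, whereas the published statistics are broken down by age group and sex.

```python
import random

# Hypothetical level probabilities (assumption); the study uses INE statistics.
LEVELS = {
    "sex": (["Female", "Male"], [0.52, 0.48]),
    "smoking": (["Non-smoker", "Smoker"], [0.83, 0.17]),
    "bmi": (["Normal weight", "Overweight", "Obese"], [0.45, 0.38, 0.17]),
}

def sample_portfolio(n, levels=LEVELS, seed=0):
    """Draw n mutually independent individuals, one level per risk factor."""
    rng = random.Random(seed)
    portfolio = []
    for _ in range(n):
        person = {
            name: rng.choices(values, weights=weights, k=1)[0]
            for name, (values, weights) in levels.items()
        }
        portfolio.append(person)
    return portfolio

portfolio = sample_portfolio(100_000)
smoker_share = sum(p["smoking"] == "Smoker" for p in portfolio) / len(portfolio)
print(f"share of smokers in the simulated portfolio: {smoker_share:.3f}")
```

With 100,000 draws, the simulated shares should sit close to the input probabilities, which is a quick sanity check on the sampling.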
The simulation of the individual trajectories through the states of the multi-state model is based on each individual’s characteristics or, in other words, their risk profile. The following elements were simulated for each individual:
- Initial state: the condition or status of the individual at the beginning of the observation period, serving as the starting point for their trajectory.
- Exposure: the length of time during which the individual is observed (or at risk) within the study period.
- Transition intensities: the intensities of the transitions performed by each individual, according to the transition intensity matrix (1) associated with the adopted multi-state model.
For the purpose of the data simulation, we assume the initial state of the trajectories may correspond to any state of the model except the Dead state. Initial states are then sampled for each individual in the population using the information available in the Health Statistics published by INE (2024) for the year 2022, which provide the distribution of the resident population in Portugal by sex and age group, categorised according to limitations in performing daily activities (“Not limited” corresponding to the Autonomous state, “Limited but not severely” to the Mildly Disabled state, and “Severely limited” to the Severely Disabled state). Note that, when dealing with real portfolio data for a disability or LTC insurance product, the Autonomous state would be the initial state for all individuals (which would affect the transition intensity estimates).
Figure 2 illustrates the distribution of the generated population by each risk factor and initial state.
For the simulation of the exposure (or exposure to risk) of each individual, we considered that the observation period is uniformly distributed between 6 and 12 months, and that the exposure is independent of their demographic characteristics.
For the transition intensities included in the transition intensity matrix (
1), following
Esquível et al. (
2021) and
Haberman and Pitacco (
2018), firstly we assume intensities of the Gompertz–Makeham type, i.e., of the form
in which the transition intensities exponentially increase with the age of the individual. This is a reasonable assumption that is commonly used in other countries for disability insurance, see
Haberman and Pitacco (
2018), for instance.
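To illustrate the shape of such an intensity, the sketch below evaluates a Gompertz–Makeham curve at a few ages. It assumes the common three-parameter form alpha + beta * exp(gamma * age); the symbol names and the numerical values are assumptions made for illustration only and are not the calibrated parameters of Table 2.

```python
import math

def gompertz_makeham(age, alpha, beta, gamma):
    """Intensity at a given age: a constant term plus an exponential age term."""
    return alpha + beta * math.exp(gamma * age)

# Hypothetical parameter values (assumption), chosen only to show the curve shape.
ALPHA, BETA, GAMMA = 0.001, 0.00002, 0.09

for age in (50, 65, 80, 95):
    print(age, round(gompertz_makeham(age, ALPHA, BETA, GAMMA), 5))
```

With a positive gamma, the intensity grows exponentially with age, while alpha sets an age-independent floor.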
As a starting point, we consider the calibrated transition intensity parameters obtained in
Esquível et al. (
2021) for a multi-state model with three dependence states, using data from 2015 of the Portuguese RNCCI. Therefore, the initial transition intensities for the individuals of the dataset simulated in this paper are of the form of Equation (
10), with the parameters for each transition presented in
Table 2.
By calibrating the Gompertz–Makeham parameters, we aim to model transition intensities that reflect individual risk profiles.
Table 3 illustrates the most frequent characteristics (i.e., most common risk profile) in the simulated population/portfolio.
For these individuals, transition intensities are defined by the parameters specified in
Table 2. To account for different risk profiles, we modify these parameters using multiplicative adjustment factors. Specifically, for a transition intensity of the form (
10), the adjusted parameters are expressed as
Note that, for individuals whose characteristics match those in
Table 3, all multiplicative adjustment factors are equal to 1. Moreover, the factors are cumulative, i.e., for individuals with multiple characteristics differing from the reference profile in
Table 3, the final adjustment factor equals the product of all applicable individual adjustment factors.
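The cumulative rule can be sketched as follows. The factor values and the choice of reference levels in `FACTORS` are hypothetical and are not those of Table 4. The sketch only computes the combined factor; which Gompertz–Makeham parameters it multiplies is left to the adjusted-parameter expressions above.

```python
from math import prod

# Hypothetical adjustment factors (assumption); reference levels carry a factor of 1.
FACTORS = {
    "sex": {"Female": 1.0, "Male": 1.15},
    "smoking": {"Non-smoker": 1.0, "Smoker": 1.30},
    "bmi": {"Normal weight": 1.0, "Obese": 1.25},
}

def cumulative_factor(profile, factors=FACTORS):
    """Product of the adjustment factors of every characteristic in the profile."""
    return prod(factors[name][level] for name, level in profile.items())

reference = {"sex": "Female", "smoking": "Non-smoker", "bmi": "Normal weight"}
male_smoker = {"sex": "Male", "smoking": "Smoker", "bmi": "Normal weight"}
print(cumulative_factor(reference), cumulative_factor(male_smoker))
```

An individual matching the reference profile keeps the baseline parameters, while every departure from it contributes one extra multiplicative term.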
For the intensities from the Autonomous state to the Mildly Disabled state, the adjustment factors according to each explanatory variable are presented in
Table 4. To identify the optimal risk differentiation, and using published health statistics from
WHO (
2009,
2024), multiple combinations of adjustment factors were tested, and those that best captured meaningful variations in covariate effects, while ensuring model coherence, were selected. The final chosen factors were designed to reflect both clinically plausible risk gradients and actuarially significant distinctions between covariate levels, maintaining an appropriate balance between sensitivity and stability across the pricing framework.
Similar procedures for the remaining transition intensities were followed, considering appropriate adjustments to account for the effects of the selected risk factors, thereby ensuring the risk differentiation is consistently reflected across all transitions.
For simulation purposes, and given the assumption of Gompertz–Makeham transition intensities, we consider a normally distributed error term with 0 mean and standard deviation proportional to the transition intensity, i.e.,
with
.
Finally, we simulated the trajectory of each individual in the population, that is, the sequence of states visited and the corresponding durations spent in each state throughout the observation period. Under the assumption of piecewise constant intensities between consecutive ages, the sojourn times and state transitions can be derived from the transition intensities, as described in Ross (1996, p. 165).
Let us consider an individual, with simulated initial state and exposure, as well as transition intensity matrix (1), with the transition intensities depending on their risk profile. Starting from the initial state, the individual will remain in that state for a certain amount of time before they transition to the next state. The sojourn time in each state i is exponentially distributed, with rate equal to the total intensity of leaving state i, that is, the sum of the transition intensities from state i to every other state j ≠ i.
If the generated sojourn time is greater than the observation period, the trajectory ends with the individual having made no transitions. Otherwise, the next state j (j ≠ i) is randomly chosen, with probability equal to the intensity of the transition from i to j divided by the total intensity of leaving state i. The trajectory ends when the absorbing state (Dead) is reached or when the aggregated sojourn times reach the end of the observation period.
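The procedure just described can be sketched as below. The intensity values in `MU` and the set of allowed transitions are hypothetical and do not reproduce the matrix of Figure 1. As a simplification, intensities are held constant over the whole observation window (at most one year) rather than being updated between consecutive ages.

```python
import random

DEAD = "Dead"

# Hypothetical constant intensities per year (assumption), indexed as MU[i][j].
MU = {
    "Autonomous": {"Mildly Disabled": 0.05, DEAD: 0.01},
    "Mildly Disabled": {"Autonomous": 0.03, "Severely Disabled": 0.08, DEAD: 0.03},
    "Severely Disabled": {"Mildly Disabled": 0.02, DEAD: 0.10},
}

def simulate_trajectory(initial_state, exposure, intensities, rng):
    """Return [(state, time_spent), ...] over an observation window of length exposure."""
    state, elapsed, path = initial_state, 0.0, []
    while state != DEAD:
        out = intensities[state]
        total = sum(out.values())
        # Exponential sojourn time with rate equal to the total exit intensity.
        sojourn = rng.expovariate(total) if total > 0 else float("inf")
        if elapsed + sojourn >= exposure:
            # Observation ends before the next transition: censor the sojourn.
            path.append((state, exposure - elapsed))
            return path
        path.append((state, sojourn))
        elapsed += sojourn
        # Next state j drawn with probability intensity(i -> j) / total.
        targets = list(out)
        state = rng.choices(targets, weights=[out[j] for j in targets], k=1)[0]
    path.append((DEAD, 0.0))  # absorbing state reached
    return path

rng = random.Random(42)
exposure = rng.uniform(0.5, 1.0)  # observation period of 6 to 12 months, in years
print(simulate_trajectory("Autonomous", exposure, MU, rng))
```

Each returned pair records a visited state and the time spent there, which is the information needed later to count transitions and accumulate waiting times.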
Having simulated the trajectories for the 100,000 individuals, with observation periods between 6 months and 12 months, the number of transitions of each type and the times spent in each state observed for each individual were determined, which allows us to proceed with the estimation and graduation of transition intensities according to the risk profile of the individual, following the procedures described in
Section 2.2 and
Section 2.3.
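As an illustration of how the simulated trajectories feed the estimation step, the sketch below aggregates transition counts and waiting times and returns the crude intensity estimates, that is, the number of transitions divided by the total time spent in the origin state. The trajectory format (a list of `(state, time_spent)` pairs) and the toy data are assumptions made for this example.

```python
from collections import defaultdict

def crude_intensities(paths):
    """Crude estimate per transition type: transition count / waiting time in origin."""
    waiting = defaultdict(float)   # total time spent in each state
    counts = defaultdict(int)      # number of observed transitions i -> j
    for path in paths:
        for k, (state, spent) in enumerate(path):
            waiting[state] += spent
            if k + 1 < len(path):
                counts[(state, path[k + 1][0])] += 1
    return {ij: n / waiting[ij[0]] for ij, n in counts.items() if waiting[ij[0]] > 0}

# Toy trajectories (hypothetical), times in years.
paths = [
    [("Autonomous", 0.5), ("Mildly Disabled", 0.25)],
    [("Autonomous", 1.0)],
    [("Autonomous", 0.5), ("Dead", 0.0)],
]
estimates = crude_intensities(paths)
print(estimates)
```

In practice, the same aggregation would be carried out separately for each age and each combination of risk factor levels.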
3.2. Results and Discussion
Following the GLM framework from
Section 2.3 and incorporating the simulated risk factors, risk-adjusted graduated transition intensities, for the disability/LTC multi-state model defined in
Section 2.1, were obtained. The ultimate objective is to derive parametric forms for these transition intensities that capture population risk heterogeneity.
First, we organise the available data into a grid of units u, where the categorical variables represent the risk factors and levels outlined in Table 1, with individuals under the age of 45 grouped into a single age group.
The most frequent levels in the grid of units
u for each risk factor are presented in
Table 5. This set of characteristics constitutes the intercept (base level) of the fitted models.
3.2.1. Transition Intensities from Autonomous to Mildly Disabled
In this section, for illustration purposes, we present in detail the estimation of risk-adjusted transition intensities from the Autonomous state to the Mildly Disabled state.
For each cell u of the grid, the required data consist of the observed number of transitions from the Autonomous state to the Mildly Disabled state and the corresponding exposure (total sojourn time in the Autonomous state).
By assuming that the transition counts in each cell are independent overdispersed Poisson variables, we fit a quasi-Poisson regression to the data, using a log-link function and the logarithm of the exposure as an offset.
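To make the mechanics concrete, the sketch below fits a Poisson regression with log link and exposure offset by Newton–Raphson, for an intercept and a single binary covariate, and then estimates the dispersion parameter as the Pearson statistic divided by the residual degrees of freedom. The data and the one-covariate design are hypothetical, and a statistical package would normally be used instead of this hand-written solver. Under a quasi-Poisson model the coefficient estimates coincide with the Poisson ones, and the standard errors are inflated by the square root of the dispersion.

```python
import math

def fit_poisson_offset(counts, exposures, x, iters=50):
    """Fit E[count] = exposure * exp(b0 + b1 * x) and return (b0, b1, dispersion)."""
    b0, b1 = math.log(sum(counts) / sum(exposures)), 0.0
    for _ in range(iters):
        mu = [e * math.exp(b0 + b1 * xi) for e, xi in zip(exposures, x)]
        # Score vector of the Poisson log-likelihood.
        g0 = sum(y - m for y, m in zip(counts, mu))
        g1 = sum((y - m) * xi for y, m, xi in zip(counts, mu, x))
        # Fisher information (2 x 2), inverted in closed form.
        h00 = sum(mu)
        h01 = sum(m * xi for m, xi in zip(mu, x))
        h11 = sum(m * xi * xi for m, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    mu = [e * math.exp(b0 + b1 * xi) for e, xi in zip(exposures, x)]
    pearson = sum((y - m) ** 2 / m for y, m in zip(counts, mu))
    dispersion = pearson / (len(counts) - 2)  # two fitted coefficients
    return b0, b1, dispersion

# Hypothetical cells: transition counts, exposures in years, smoker indicator.
counts = [30, 50, 45, 75]
exposures = [1000.0, 2000.0, 1000.0, 1500.0]
smoker = [0, 0, 1, 1]
b0, b1, phi = fit_poisson_offset(counts, exposures, smoker)
print(math.exp(b0), math.exp(b0 + b1), phi)
```

For a single binary covariate, the fitted intensities reduce to the pooled crude rates of each group, which gives a direct check of the solver.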
Table 6 presents the final GLM for these transition intensities, which includes the maximum likelihood estimates of the model’s coefficients, along with the standard error and statistical significance. Furthermore, we estimate the dispersion parameter of the quasi-Poisson model and compute the AIC, as well as the residual deviance and the degrees of freedom of the final model. Model adequacy was further assessed through deviance residual analysis, which indicated a satisfactory goodness of fit. It is worth noting that all levels of the explanatory variables are found to be statistically significant.
Figure 3 illustrates the estimated effects of each covariate on the response variable in the final model. Each plot displays the predicted number of transitions by level of the explanatory variable (dots), the fitted value trend across levels (blue lines), and the associated 95% confidence intervals (orange bars).
Consistent with prior expectations, the estimated transition intensity from the Autonomous state (State 1) to the Mildly Disabled state (State 2) increases substantially with age. Furthermore, statistically significant differences are observed between smokers and non-smokers and between males and females, with higher intensities associated with the former in both comparisons. A negative association is identified between physical activity duration and this transition intensity, while individuals with a “normal weight” BMI classification exhibit reduced transition intensities relative to other BMI categories.
3.2.2. Further Transition Intensities
An analogous procedure is applied to the remaining transition intensities, using the corresponding definitions of the response variables and exposures. As for the transition from the Autonomous state to the Mildly Disabled state, each transition intensity is modelled using an overdispersed Poisson regression with a log-link function, incorporating the logarithm of the exposure as an offset term.
The parameter estimates of the fitted GLMs for all transition intensities included in matrix (1) are presented in Table 7.
In terms of risk factors, we observe that the variables age and sex are statistically significant in all models, along with the variables BMI and smoking, except for the transition intensity between the Severely Disabled and Mildly Disabled states. Additionally, the variable region is found to be significant solely for transitions to the Dead state, while the variable exercise does not exhibit significance for transitions from the Mildly Disabled state, except for those directed to the Autonomous state.
Subsequently, we generate predicted values for the response variables based on the fitted GLMs for all 100,000 individuals in the population. From Equation (
9), the predicted number of transitions is divided by the total exposure to risk, resulting in risk-adjusted estimated transition intensities for each individual as predicted by the fitted models.
In conclusion, by leveraging the simulated trajectory data and applying the GLM framework, the proposed approach enables the estimation of transition intensities at the individual level, incorporating demographic characteristics. The predicted values obtained from the GLMs allow for the derivation of a functional representation of transition intensities across different risk profiles within the population.
3.2.3. Function for Transition Intensities Depending on Risk Factors
In this section, we derive functional forms for the transition intensities as functions of the individual’s age, with parameters reflecting the risk profile of a given group. To this end, we consider Gompertz–Makeham-type transition intensities (10), and we fit the corresponding non-linear least squares (NLS) regression model to the predicted values obtained from the GLMs.
We start again by analysing the transition intensity from the Autonomous to the Mildly Disabled state, aiming now to establish its functional form with respect to age for the overall population.
By fitting an NLS regression model to the predicted values of the transition intensities obtained from the GLM, using the functional form of Equation (10), and initialising with the parameter values provided in Table 2, the three Gompertz–Makeham parameters are estimated through the minimisation of the residual sum of squares.
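A minimal sketch of such a fit is given below, assuming the three-parameter form alpha + beta * exp(gamma * age). Instead of an iterative optimiser started from given initial values, it uses a different but equivalent route for illustration: for each candidate gamma on a grid, the optimal alpha and beta follow from ordinary least squares in closed form, and the gamma with the smallest residual sum of squares is retained. The grid, the symbol names and the synthetic data are assumptions.

```python
import math

def fit_gompertz_makeham(ages, mu, gammas):
    """Minimise the residual sum of squares of alpha + beta * exp(gamma * age)."""
    n = len(ages)
    best = None
    for g in gammas:
        z = [math.exp(g * x) for x in ages]
        zbar, mbar = sum(z) / n, sum(mu) / n
        szz = sum((zi - zbar) ** 2 for zi in z)
        szm = sum((zi - zbar) * (mi - mbar) for zi, mi in zip(z, mu))
        beta = szm / szz                 # least squares slope for this gamma
        alpha = mbar - beta * zbar       # least squares intercept for this gamma
        rss = sum((mi - alpha - beta * zi) ** 2 for zi, mi in zip(z, mu))
        if best is None or rss < best[0]:
            best = (rss, alpha, beta, g)
    return best[1:], best[0]

# Synthetic intensities for ages 50 to 95 from hypothetical parameters.
ages = list(range(50, 96))
mu = [0.002 + 1e-5 * math.exp(0.1 * x) for x in ages]
gamma_grid = [0.05 + 0.001 * k for k in range(101)]
(alpha_hat, beta_hat, gamma_hat), rss = fit_gompertz_makeham(ages, mu, gamma_grid)
print(alpha_hat, beta_hat, gamma_hat, rss)
```

The grid search avoids sensitivity to starting values, at the cost of resolving gamma only up to the grid step.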
To improve the regression model’s fit at advanced ages, the NLS fitting procedure was restricted to individuals aged 50 and above. This approach aligns with the methodology of
Esquível et al. (
2021), whose calibrated parameters—illustrated in
Table 2—were derived under similar constraints.
As a result, the following functional form was obtained:
resulting from the model presented in
Table 8, where the last two columns include a
confidence interval for the estimated parameters.
For a general model applicable across all ages and all LTC financing models, different NLS regression models of the Gompertz–Makeham type could be fitted for individuals below and above 50 years of age.
To assess the quality of the simulation results, we compare the estimated parameters with those of the transition intensities originally used to simulate the trajectories in
Section 3.1. As a reminder, the initial transition intensities were generated by adjusting the baseline parameters from
Table 2, using the adjustment factors provided in
Table 4, according to the characteristics of each individual. For the purpose of comparison, we compute the mean estimate of each parameter across the entire population.
For illustrative purposes, Figure 4 presents the transition intensities obtained both through the estimation procedures and from the simulation, aggregated over the entire population/portfolio. We focus specifically on the transitions from the Autonomous state to the Mildly Disabled state and from the Mildly Disabled state to the Severely Disabled state, for which the estimated and simulated intensities are consistent.
In
Figure 5, we compare the transition intensities corresponding to disability deterioration—i.e., transitions from the Autonomous state to the Mildly Disabled state,
, and from the Mildly Disabled state to the Severely Disabled state,
—, as well as the corresponding recovery intensities from these states. It is observed that, for individuals aged over 75, the value of
increases at a higher rate than that of
, which is consistent with the findings reported in
Esquível et al. (
2021).
In particular, the proposed methodology allows us to obtain a functional form for the transition intensities as a function of age, stratified by risk factors and levels.
As an example, we present the functions for the transition intensity
specific to male and female individuals. We begin by stratifying the population by sex and then fitting group-specific NLS regression models to the predicted transition intensity values (obtained from the GLM). These models follow the functional form presented in (
10), with the starting parameters summarised in
Table 2. The resulting functions are as follows:
Figure 6 presents the functions (
11) and (
12), alongside the raw estimates of the transition intensities for each age. As a reminder, the crude estimates were obtained, in this case, by dividing the number of transitions between the Autonomous state and the Mildly Disabled state by the waiting time in the Autonomous state, as outlined in (
4). It is observed that the obtained functions provide a good fit to the raw estimates for each year of age.
By applying the same procedure to each of the significant risk factors included in the final models, transition intensity functions specific to each risk factor have been obtained.
Table 9 presents the parameters resulting from the stratification by each significant risk factor, and the corresponding transition intensity functions are illustrated in
Figure 7.
From
Figure 7, we can observe that the transition intensity
is higher for Males than for Females, particularly at older ages. Regarding the BMI, individuals classified as “Obese” exhibit the highest intensities, followed by “Overweight II”, while those in the “Normal weight” group (which also includes “Underweight” and “Overweight I” individuals) display lower values. As expected, Smoker individuals present higher transition intensities than Non-Smokers, with differences increasing with age. For physical activity, the “Less than 3 h” group (which results from merging the “Less than 2 h” and “2 to less than 3 h” categories) shows higher transition intensities, while the intensity decreases with increasing activity levels, reaching the lowest values for individuals exercising “5 or more hours” per week.
Finally, by applying the same method to groups of individuals sharing a specified set of characteristics across multiple risk factors, it is possible to derive transition intensity functions as a function of age specific to each risk profile.
As an illustration, for transitions from the Autonomous state to the Mildly Disabled state, let us consider the two risk profiles defined in
Table 10 as follows: a “Low risk” profile, corresponding to the combination of levels associated with lower transition intensities, and a “High risk” profile, corresponding to the combination of characteristics leading to higher intensities, obtained from the GLM coefficients in
Table 7.
As before, an NLS regression model was fitted to the predicted transition intensities (obtained from the GLM) for each group, where each group consists of individuals matching the characteristics of the respective risk profile. Thus, the following functions for the transition intensity
corresponding to the “Low risk” and “High risk” profiles were obtained:
Figure 8 illustrates functions (
13) and (
14), as well as the corresponding functions for high- and low-risk profiles for the transition between Mildly Disabled and Severely Disabled. In both cases, transition intensities are consistently higher for individuals with a high-risk profile, as expected.
In summary, by incorporating multiple risk factors and stratifying individuals by risk profile, the proposed methodology yields transition intensity functions that explicitly depend on age x and on individuals’ characteristics. This framework enables the differentiation of transition intensities, and thus of risk-adjusted transition probabilities, supporting the accurate calculation of premiums and reserves for general multi-state models in the context of disability and long-term care insurance. For LTC insurers, accurate risk premiums transform risk management from a reactive cost centre into a strategic driver of profitability and sustainability. By embedding advanced analytics into pricing, insurers can achieve more stable portfolios, improved customer segmentation, and long-term market growth, while fulfilling their core mission of providing sustainable LTC coverage.
4. Conclusions
Long-term care (LTC) insurance has gained interest due to population ageing and the increasing prevalence of dependency among the elderly in developed countries. In Portugal, despite similar socio-economic challenges, private LTC insurance products are absent, mainly because of the complexity of risk modelling and the scarcity of the data needed for accurate pricing and reserving.
In this paper, we propose a practical application of estimation and risk-adjusted graduation methods for disability and LTC multi-state models, supported by a simulated dataset based on Portuguese demographic characteristics. A GLM framework was employed to graduate transition intensities while incorporating individual risk profiles, enhancing the model’s practical applicability for insurance purposes.
We first introduced the disability/LTC model designed for this study, featuring two degrees of disability/lack of autonomy, and the corresponding transition matrix, and we applied the transition intensity approach for parameter estimation. Under the assumption of constant intensities between consecutive ages, we derived the raw maximum likelihood estimates for the transition intensities. To incorporate the impact of risk factors, we implemented a risk-adjusted graduation technique based on GLMs, using overdispersed Poisson models with Gompertz–Makeham-type predictors.
Extending the approach in
Renshaw and Haberman (
1995), in addition to age, our model incorporates individual risk factors such as sex, region, BMI, smoking habits, and exercise habits, to enhance heterogeneity modelling. In the absence of real portfolio data, we simulated a cohort of 100,000 individuals using Portuguese national statistics as the demographic foundation. Transition intensities were derived from the calibrated parameters in
Esquível et al. (
2021), with multiplicative adjustment factors defined to incorporate individual risk characteristics. This approach enabled the simulation of complete trajectories across all individuals of the population.
GLMs were fitted for each transition intensity, adjusting the covariate structures to ensure statistical significance. Variables such as age, sex, BMI, and smoking habits consistently proved significant, while region and exercise showed more limited relevance. GLM predictions enabled stratification by risk profiles, and NLS regressions were applied to derive smooth functions of transition intensities as a function of age.
Finally, by combining multiple risk factors, we derived age-dependent transition functions specific to chosen “Low risk” and “High risk” profiles, demonstrating a clear differentiation in expected transitions based on individual characteristics. Similar results may be obtained for any risk profile within the population/portfolio.
Key findings include the following: (1) higher intensities for males, smokers, obese individuals, and those with lower physical activity; (2) physical activity correlated inversely with transition intensities; (3) region was not a significant factor for most of the transitions; and (4) higher risk profiles consistently exhibited higher transition intensities, particularly at advanced ages.
The methodology proposed in this paper provides a foundation for the development of disability and long-term care insurance products, enabling the construction of robust and risk-sensitive pricing structures tailored to the Portuguese market. While developing this generalisable approach, we specifically addressed Portugal’s challenges in long-term care risk modelling (particularly data scarcity) by grounding our simulations and model results in studies of real-world Portuguese data, see
Esquível et al. (
2021), and official publicly available Portuguese population statistics, see
INE (
2020a,
2021a,
2021b,
2021c,
2024). This dual focus ensures both methodological innovation and contextually relevant outcomes for insurance product development.
4.1. Policy Implications
The accurate estimation of transition intensities is fundamental for ensuring actuarial fairness, financial stability, and regulatory compliance in insurance. For insurers, reliable multi-state models and risk-adjusted transition intensities allow for precise risk differentiation, thus preventing adverse selection, optimising premium pricing, and ensuring solvency under regulatory frameworks like Solvency II. In countries like Portugal, where ageing populations and evolving morbidity patterns—such as the increasing prevalence of chronic diseases—shape longevity, disability, and long-term care risks, robust transition data are critical to design sustainable health-related insurance products.
At the macroeconomic level, these insights aid policymakers in anticipating healthcare demands and addressing inequalities in coverage, for instance, in cases where regional disparities in health service availability are a reality. For private insurers, leveraging real Portuguese data mitigates model risk and fosters innovation (e.g., dynamic pricing for long-term care). Conversely, underestimating transition intensities could compromise portfolios’ risk measures, while overestimation might exclude vulnerable groups from affordable coverage. Thus, investing in granular and population-specific modelling is not just a technical imperative but a societal one, bridging actuarial science and public policy in an era of demographic transition.
4.2. Limitations and Future Work
This research offers valuable insights for insurance regulators and social policymakers, particularly in addressing the growing challenges of disability and long-term care needs in ageing populations like Portugal’s. While the proposed framework demonstrates strong theoretical foundations and practical potential, we acknowledge several limitations that point to important directions for future work.
The model’s adaptability to different institutional contexts, although conceptually robust, requires empirical validation to realise its full potential. A critical next step involves testing the framework with real insurance portfolio data, which would serve two essential purposes: first, to validate the simulated transition patterns and risk factor parameterisation against observed experience; second, to identify potential additional covariates that could enhance the model’s predictive accuracy. Such empirical validation would naturally lead to refinements in both the GLM specifications and transition intensity functions, ensuring better alignment with Portugal’s unique demographic and risk profiles.
Future research should also explore extensions of this work, including possible integrations with public health strategies and emerging care models, which could foster valuable synergies between actuarial science and policy development. From an implementation perspective, careful attention must be given to validating assumptions, testing calibration choices, and developing guidelines for adapting the framework to diverse subpopulations.
These research directions would significantly strengthen the model’s reliability and practical relevance for the Portuguese market, while maintaining the methodological rigour established in the current study. By addressing these challenges, the framework can evolve into a robust tool for insurers and policymakers alike, supporting data-driven decisions in product design, premium setting, and resource allocation for Portugal’s evolving social protection needs.