2.2.2. Quasi-Maximum Likelihood Estimation
To estimate the parameters of the regime-switching models, we employ the quasi-maximum likelihood estimation (QMLE) method, following the approach of James and Webber (2000). QMLE applies the principle of maximum likelihood to a quasi-likelihood (or approximate likelihood) function, rather than to the exact likelihood, thereby providing a tractable approximation when the true likelihood is analytically intractable. This methodology has been used effectively in Zhou and Mamon (2012) to approximate likelihood functions arising from stochastic differential equation models for interest rates. We adopt the same approach for both the one-state and multi-state specifications considered in this paper, owing to their relative simplicity and ease of implementation. For alternative estimation techniques that explicitly incorporate serial correlation, see Hardy (2001).
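Under maximum likelihood estimation, we determine the vector of parameter values of a given probabilistic model that maximises the likelihood of the observed sample. Let $X_1, \ldots, X_n$ be a random sample (independent and identically distributed random variables) from a distribution with probability density function (pdf) $f(x; \theta)$, where $\theta = (\theta_1, \ldots, \theta_D)$ is a vector of parameters and $\theta \in \Theta$, where $\Theta$ is the parameter space. The likelihood function $L(\theta)$ is given by
$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta).$$
The maximum likelihood estimator $\hat{\theta}$ for $\theta$ is as follows:
$$\hat{\theta} = \arg\max_{\theta \in \Theta} L(\theta).$$
Since $\log$ is a monotonically increasing one-to-one function, this is also equivalent to
$$\hat{\theta} = \arg\max_{\theta \in \Theta} \ell(\theta),$$
where $\ell(\theta) = \log L(\theta)$ is the log-likelihood function. After these estimates are obtained, we compute the standard errors of the parameter estimates from the observed Fisher information matrix $\mathcal{I}(\hat{\theta})$, which is a $D \times D$ matrix whose entries are given by
$$\left[\mathcal{I}(\hat{\theta})\right]_{ij} = -\left.\frac{\partial^2 \ell(\theta)}{\partial \theta_i \, \partial \theta_j}\right|_{\theta = \hat{\theta}}.$$
From this, the standard error of the estimate $\hat{\theta}_i$ for the parameter $\theta_i$ is the square root of the $(i, i)$-th entry of the inverse of $\mathcal{I}(\hat{\theta})$, i.e., $\sqrt{\left[\mathcal{I}(\hat{\theta})^{-1}\right]_{ii}}$. In case the resulting $\mathcal{I}(\hat{\theta})$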
is singular, we instead use its pseudoinverse (specifically, the Moore-Penrose Generalized Inverse) which in this case is
.
Li and Yeh (2012) justified the use of this pseudoinverse by proving that it is the Cramér–Rao bound of a singular Fisher information matrix corresponding to the minimum variance amongst all choices of minimum constraint functions.
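As a rough illustration of the standard-error computation described above, the following Python sketch fits a normal sample by maximum likelihood and approximates the observed information matrix by central finite differences. The normal model, the synthetic data, the step size `h`, and all function names are assumptions made purely for illustration; they are not the models or the software used in this paper.

```python
import math
import random

def normal_loglik(theta, data):
    """Log-likelihood of an i.i.d. normal sample; theta = (mean, sd)."""
    m, s = theta
    n = len(data)
    ss = sum((x - m) ** 2 for x in data)
    return -0.5 * n * math.log(2.0 * math.pi) - n * math.log(s) - ss / (2.0 * s * s)

def observed_information(loglik, theta, data, h=1e-3):
    """Negative Hessian of loglik at theta, by central finite differences."""
    d = len(theta)

    def shifted(i, j, di, dj):
        t = list(theta)
        t[i] += di * h
        t[j] += dj * h
        return loglik(t, data)

    info = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            second = (shifted(i, j, 1, 1) - shifted(i, j, 1, -1)
                      - shifted(i, j, -1, 1) + shifted(i, j, -1, -1)) / (4.0 * h * h)
            info[i][j] = -second
    return info

def invert_2x2(m):
    """Inverse of a non-singular 2x2 matrix (illustration only)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

# Synthetic data (hypothetical); the normal MLE is available in closed form.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
n = len(data)
mean_hat = sum(data) / n
sd_hat = math.sqrt(sum((x - mean_hat) ** 2 for x in data) / n)

info = observed_information(normal_loglik, (mean_hat, sd_hat), data)
cov = invert_2x2(info)
std_errors = [math.sqrt(cov[i][i]) for i in range(2)]
print(std_errors)
```

For this toy model the numerical standard errors can be checked against the textbook values, namely the estimated standard deviation divided by the square root of n for the mean, and divided by the square root of 2n for the standard deviation. The singular case discussed above is not handled by this sketch.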
The parameter estimation in this paper is carried out using quasi-maximum likelihood estimation (QMLE), where the likelihood is constructed from an approximate conditional distribution implied by a discretisation of the continuous-time dynamics. In particular, the Euler–Maruyama scheme is used to obtain a Gaussian approximation to the one-step transition of the interest-rate and mortality-rate processes under the assumed models. This approach is widely used in practice when working with discretely observed data and provides a transparent and computationally tractable estimation strategy.
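To make the construction concrete, the following Python sketch sums bivariate Gaussian one-step log-densities implied by an Euler–Maruyama step. It is a minimal sketch under stated assumptions: the drift functions, the parameter names `sigma`, `xi` and `rho`, and the short data series are hypothetical placeholders, and the drifts of Equations (1) and (2) would need to be substituted for the illustrative ones used here.

```python
import math

def bivariate_normal_logpdf(x1, x2, m1, m2, s1, s2, rho):
    """Log-density of a bivariate normal with means m1, m2, standard
    deviations s1, s2 and correlation rho."""
    z1 = (x1 - m1) / s1
    z2 = (x2 - m2) / s2
    one_minus_r2 = 1.0 - rho * rho
    quad = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / one_minus_r2
    return -(math.log(2.0 * math.pi) + math.log(s1) + math.log(s2)
             + 0.5 * math.log(one_minus_r2) + 0.5 * quad)

def euler_quasi_loglik(r, mu, drift_r, drift_mu, sigma, xi, rho, dt=1.0):
    """Quasi-log-likelihood: sum of Gaussian one-step log-densities
    implied by an Euler-Maruyama step of size dt."""
    root_dt = math.sqrt(dt)
    total = 0.0
    for k in range(len(r) - 1):
        m1 = r[k] + drift_r(r[k]) * dt      # conditional mean, interest rate
        m2 = mu[k] + drift_mu(mu[k]) * dt   # conditional mean, mortality rate
        total += bivariate_normal_logpdf(r[k + 1], mu[k + 1], m1, m2,
                                         sigma * root_dt, xi * root_dt, rho)
    return total

# Hypothetical annual observations and illustrative drifts (assumptions).
r_obs = [0.030, 0.032, 0.029, 0.031]
mu_obs = [0.0020, 0.0022, 0.0023, 0.0025]
ll = euler_quasi_loglik(r_obs, mu_obs,
                        drift_r=lambda r: 0.2 * (0.03 - r),
                        drift_mu=lambda m: 0.08 * m,
                        sigma=0.01, xi=0.0005, rho=0.1)
print(ll)
```

In an estimation routine, a function of this kind would be passed to a numerical optimiser with the parameters as free arguments, subject to constraints such as positive volatilities and a correlation strictly between -1 and 1.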
We note, however, that when observations are available only at low frequency, such as annually, discretisation error may be non-negligible, especially for mean-reverting processes. For the one-state case considered here, both the Vašíček short-rate model and the Ornstein–Uhlenbeck-type mortality model admit exact Gaussian transition densities, which could be employed to reduce discretisation bias. Extending the likelihood construction to exact transitions and to higher-frequency datasets where available is, therefore, a natural robustness enhancement.
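The size of this discretisation effect can be gauged by comparing the one-step mean and variance under the Euler–Maruyama approximation with those of the exact Gaussian transition. The sketch below does so for a Vašíček-type process written as dr = a(b - r)dt + σ dW; this parameterisation and the numerical values are assumptions chosen for illustration, not estimates from this paper.

```python
import math

def euler_moments(x, a, b, sigma, dt):
    """Mean and variance of the Euler-Maruyama one-step approximation."""
    return x + a * (b - x) * dt, sigma ** 2 * dt

def exact_moments(x, a, b, sigma, dt):
    """Mean and variance of the exact Gaussian transition (requires a > 0)."""
    decay = math.exp(-a * dt)
    return b + (x - b) * decay, sigma ** 2 * (1.0 - decay ** 2) / (2.0 * a)

# Hypothetical parameter values, for illustration only.
x0, a, b, sigma = 0.02, 0.5, 0.04, 0.01
for dt in (1.0, 1.0 / 12.0):
    print(dt, euler_moments(x0, a, b, sigma, dt), exact_moments(x0, a, b, sigma, dt))
```

With an annual step the two sets of moments differ visibly for a moderately fast mean reversion, whereas with a monthly step they are much closer, which is the sense in which low-frequency data make the Euler approximation less accurate.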
We acknowledge that, because mortality rate data are available only at annual frequency, the number of observations is limited and the statistical robustness of the estimates may not be particularly strong. To provide further insights into robustness, bootstrap methods could be applied, which allow for the assessment of the variability of parameter estimates under repeated resampling. Aside from this, since Equations (3) and (4) have regime-switching volatilities which are latent in nature, further model diagnostics and robustness checks that are commonly employed in discrete-time implementations of latent stochastic volatility models may be considered (e.g., Zeghdoudi et al. (2014)).
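One way such a bootstrap could be organised is sketched below in Python for a single Euler-discretised mean-reverting series, for which the Gaussian quasi-likelihood fit with a unit time step reduces to a first-order autoregression. This is a parametric bootstrap, which is only one of several possible variants; the synthetic series, the parameter values, and the function names are hypothetical and serve only to illustrate the mechanics.

```python
import math
import random

def fit_ar1(x):
    """Least-squares fit of x[k+1] = alpha + beta * x[k] + sd * eps.
    With a unit time step, alpha = a*b, beta = 1 - a and sd = sigma
    for the Euler-discretised process dr = a(b - r)dt + sigma dW."""
    prev, nxt = x[:-1], x[1:]
    n = len(prev)
    mp, mn = sum(prev) / n, sum(nxt) / n
    sxx = sum((p - mp) ** 2 for p in prev)
    sxy = sum((p - mp) * (q - mn) for p, q in zip(prev, nxt))
    beta = sxy / sxx
    alpha = mn - beta * mp
    rss = sum((q - alpha - beta * p) ** 2 for p, q in zip(prev, nxt))
    return alpha, beta, math.sqrt(rss / n)

def simulate_ar1(x0, alpha, beta, sd, n_steps, rng):
    """Simulate n_steps transitions of the fitted autoregression."""
    path = [x0]
    for _ in range(n_steps):
        path.append(alpha + beta * path[-1] + sd * rng.gauss(0.0, 1.0))
    return path

def parametric_bootstrap_se(x, n_boot, rng):
    """Refit the model on n_boot simulated series and return the point
    estimates together with the standard deviations of the refits."""
    alpha, beta, sd = fit_ar1(x)
    draws = [fit_ar1(simulate_ar1(x[0], alpha, beta, sd, len(x) - 1, rng))
             for _ in range(n_boot)]
    ses = []
    for j in range(3):
        col = [d[j] for d in draws]
        m = sum(col) / n_boot
        ses.append(math.sqrt(sum((c - m) ** 2 for c in col) / (n_boot - 1)))
    return (alpha, beta, sd), tuple(ses)

# Synthetic annual series of length 61 (hypothetical values).
rng = random.Random(1)
series = simulate_ar1(0.03, 0.006, 0.8, 0.005, 60, rng)
estimates, std_errors = parametric_bootstrap_se(series, 200, rng)
print(estimates, std_errors)
```

The bootstrap standard deviations can then be compared with the information-matrix standard errors; a large discrepancy would suggest that the asymptotic approximation is unreliable at the available sample size.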
The empirical results reported in this study should be interpreted as an implementation illustration of the proposed modelling framework, with the estimator understood as quasi-likelihood-based rather than exact maximum likelihood.
To derive the quasi-log-likelihood function for the one-state case, we follow James and Webber (2000) and discretise Equations (1) and (2) using the Euler–Maruyama scheme with a time step of one year (reflecting annual data):
where the increments
and
are independent standard normal random variables, that is
for
. It follows that, conditional on
, the vector
has a bivariate normal distribution, namely
This implies that the conditional density of
given
is
Given observations
and
, the quasi-log-likelihood function is
2.2.4. Parameter Estimation
In this paper, quasi-maximum likelihood estimation (QMLE) is implemented numerically using the FindMaximum function in Mathematica, a technical computing software package. We fit the one-state, two-state, and three-state versions of the model to the data. The corresponding optimisation problems and parameter constraints for the three specifications are summarised in Table 1.
Table 2 reports the parameter estimates together with their standard errors (in parentheses). Overall,
Table 2 indicates that the QMLE procedure yields stable and reasonable parameter estimates, as reflected in the magnitude of the associated standard errors. In particular, the volatility parameters
, followed by
, are estimated most precisely across the three models, whereas the correlation parameters
exhibit comparatively larger standard errors and hence are estimated with lower precision.
The QMLE approach adopted in this study is based on an approximate Gaussian likelihood implied by the Euler–Maruyama discretisation. Although QMLE is widely used in practice and performs well under mild regularity conditions, it remains an approximation to the true likelihood and may be sensitive to small-sample effects. In particular, uncertainty may be more pronounced for regime-dependent parameters, such as correlation and transition probabilities, especially when a regime is visited relatively infrequently in the observed sample. For this reason, the reported standard errors should be interpreted as measures of estimation precision within the quasi-likelihood framework.
Alternative estimation procedures could be explored in future work to assess robustness and improve inference under regime-switching dynamics. These include maximum likelihood estimation via expectation–maximisation algorithms, Bayesian methods (e.g., Markov chain Monte Carlo), and filtering-based approaches designed for latent regime processes. Nevertheless, QMLE is adopted here because it offers a transparent and computationally tractable estimation strategy, and it produces stable parameter estimates that support practical implementation of the proposed pricing framework.
Because the mortality process can take negative values with positive probability at any future time, we address this modelling concern by verifying that such probabilities are practically negligible. To this end, we approximate the probability that the mortality rate process takes negative values at future times using a Monte Carlo simulation of 100,000 sample paths. Specifically, Table 3 reports the estimated probabilities together with their standard errors (in parentheses). Here, age 40 corresponds to the age of the U.S. male cohort in 2023, as shown in Figure 2. The chosen values of t therefore correspond to ages 45, 50, …, 100, which constitute the relevant time horizon for the numerical experiments presented in the succeeding subsections. Across all model specifications, the probabilities are extremely small, with the largest probability being below 0.2%. In particular, for the one-state model, no realisation of the mortality process attains a negative value in any of the 100,000 sample paths for any t, resulting in estimated probabilities of 0 throughout the entire time horizon. Further reassurance can be obtained by using the closed-form formula derived by Luciano and Vigna (2005) for the one-state case. In this setting, the formula is given by
and these probabilities are also displayed in
Table 3. Indeed, the exact probabilities are extremely low, ranging from orders of
up to
, which are practically negligible in actuarial contexts.
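The comparison between a simulated and a closed-form probability of negativity can be sketched as follows. The code assumes, purely for illustration, a one-state Ornstein–Uhlenbeck-type intensity of the form dμ = aμ dt + σ dW, whose value at time t is Gaussian; this form, the function names, and the deliberately exaggerated parameter values (chosen so that negative values are not rare) are assumptions and may differ from the specification and estimates in this paper.

```python
import math
import random

def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ou_terminal_moments(mu0, a, sigma, t):
    """Mean and standard deviation of mu_t for d mu = a*mu dt + sigma dW
    (a != 0), started at mu0."""
    mean = mu0 * math.exp(a * t)
    sd = sigma * math.sqrt((math.exp(2.0 * a * t) - 1.0) / (2.0 * a))
    return mean, sd

def prob_negative_closed_form(mu0, a, sigma, t):
    """P(mu_t < 0) from the Gaussian law of mu_t."""
    mean, sd = ou_terminal_moments(mu0, a, sigma, t)
    return normal_cdf(-mean / sd)

def prob_negative_monte_carlo(mu0, a, sigma, t, n_paths, rng):
    """Monte Carlo estimate of P(mu_t < 0) and its standard error,
    sampling mu_t directly from its Gaussian law."""
    mean, sd = ou_terminal_moments(mu0, a, sigma, t)
    hits = sum(1 for _ in range(n_paths)
               if mean + sd * rng.gauss(0.0, 1.0) < 0.0)
    p_hat = hits / n_paths
    return p_hat, math.sqrt(p_hat * (1.0 - p_hat) / n_paths)

# Exaggerated, hypothetical parameters so that the event is not rare.
rng = random.Random(7)
p_exact = prob_negative_closed_form(0.002, 0.08, 0.003, 10.0)
p_hat, p_se = prob_negative_monte_carlo(0.002, 0.08, 0.003, 10.0, 20000, rng)
print(p_exact, p_hat, p_se)
```

When the true probability is very small, as reported for the fitted models, a simulation with 100,000 paths will often record no negative values at all, which is why a closed-form check is informative in the one-state case. Under regime switching the terminal value is no longer Gaussian, so full paths would have to be simulated instead.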
Incorporating regime switching increases the likelihood for the mortality process to attain negative values, reflecting the higher variability induced by transitions between latent states. This effect is modest in the two-state model and more apparent in the three-state model, where the estimated probabilities increase gradually with the time horizon. Nevertheless, even in the three-state case, the probability of observing negative mortality rates remains below 0.2%, indicating that such events are exceedingly rare within the relevant age range.
Although the quantities reported in Table 3 measure the likelihood of negative values at the specified future times, they do not capture when such an event is expected to happen along a sample path of the mortality process. To complement the above analysis, we therefore look at the expected hitting time of the negative region. Conditional on
, we define the first hitting time as
Whilst this first hitting time quantifies the time at which the mortality process first exits the admissible (non-negative) state space of the modelling framework, its expectation may be difficult to estimate reliably in practice due to rare-event behaviour and long simulation horizons. Moreover, from an actuarial perspective, our window of interest is typically restricted to ages of up to 100, corresponding to a maximum horizon of 60 years from age 40. We therefore introduce a version of the hitting time that is right-censored at 60 years, defined as
We then estimate the expectation of this censored hitting time via Monte Carlo simulation using 100,000 sample paths. In this setup, the paths for which the process remains non-negative over the whole 60-year interval are censored at 60. This censored expectation provides a conservative and practical measure for the time scale over which negative values may arise. The resulting estimates are shown in Table 4.
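The censoring step can be illustrated with a short Python sketch. It assumes that the process is monitored on an annual grid and that paths are generated by a user-supplied simulator; the Euler simulator shown, its drift, and its parameter values are hypothetical stand-ins for the fitted models.

```python
import math
import random

def censored_hitting_time(path, horizon):
    """First index k >= 1 with path[k] < 0, or `horizon` if the path
    stays non-negative on the whole grid 1, ..., horizon."""
    for k in range(1, horizon + 1):
        if path[k] < 0.0:
            return float(k)
    return float(horizon)

def estimate_censored_expectation(simulate_path, horizon, n_paths, rng):
    """Monte Carlo mean of the censored hitting time and its standard error."""
    times = [censored_hitting_time(simulate_path(horizon, rng), horizon)
             for _ in range(n_paths)]
    mean = sum(times) / n_paths
    var = sum((t - mean) ** 2 for t in times) / (n_paths - 1)
    return mean, math.sqrt(var / n_paths)

def simulate_euler_path(horizon, rng, mu0=0.002, a=0.08, sigma=0.0002):
    """Annual Euler steps of d mu = a*mu dt + sigma dW (hypothetical values)."""
    path = [mu0]
    for _ in range(horizon):
        path.append(path[-1] + a * path[-1] + sigma * rng.gauss(0.0, 1.0))
    return path

rng = random.Random(42)
mean_time, se_time = estimate_censored_expectation(simulate_euler_path, 60, 1000, rng)
print(mean_time, se_time)
```

When no simulated path becomes negative within the horizon, every censored time equals the horizon, so the estimate equals the censoring threshold and its standard error is zero.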
Across all model specifications and consistent with the results in Table 3, the expected hitting times are extremely close to the censoring threshold, indicating that, on average, negative realisations of the mortality rate process occur, if at all, only at the very end of the considered horizon.
In the one-state model, the expected hitting time equals 60 exactly, with zero standard error, reflecting that no paths hit negative values within the 60-year window. This observation is consistent with the closed-form probability of negativity discussed previously, providing additional reassurance that the one-state model remains well within the admissible non-negative region.
Introducing regime switching slightly decreases the expected hitting time, with the two-state model yielding 59.997 and the three-state model 59.945. Consistent with the previous discussion on negative mortality rates, these small deviations reflect the increased variability induced by state transitions, which occasionally allow paths to reach negative values before the censoring point. Even in the three-state model, however, the expected hitting time remains near the upper bound of the horizon, and the corresponding standard errors are extremely small.
In summary, the results from Table 3 and Table 4 demonstrate that negative mortality values are exceedingly rare within the actuarially relevant age range of 45–100 and confirm that the models produce practically admissible mortality trajectories over the time horizon of interest.
From Table 2, one observes that the quasi-log-likelihood values of the regime-switching models are higher than those of the one-state model, indicating that the former capture the underlying data dynamics more effectively. Taken in isolation, this would suggest selecting the three-state model. However, it is also necessary to assess whether the improvement in fit implied by the quasi-log-likelihood is sufficient to justify the additional model complexity, in order to avoid overfitting.
To address this, we employ two standard model selection criteria that penalise the log-likelihood for extra parameters: the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). These are given by
where n denotes the number of observations and D the number of parameters. Table 5 reports these values for the models considered.
Table 5 indicates that the two-state model attains the highest AIC value, whilst the one-state model has the highest BIC value. Given our datasets and the trade-off between goodness of fit and model complexity, the main choice is therefore between the one-state and two-state frameworks for overall performance in terms of fit, predictive ability, and interpretability. In general, AIC is regarded as more suitable for prediction, whereas BIC is often preferred for explanation or for identifying the true underlying model. The AIC is typically chosen when predictive accuracy is the primary objective and the inclusion of additional parameters is acceptable, whilst the BIC is favoured when the aim is to select the most parsimonious model that still explains the data adequately. In our context, the modelling framework is intended for long-term projection of interest and mortality rates for GAO pricing, which aligns more naturally with the predictive focus of the AIC. Accordingly, the two-state model is adopted for the valuation and sensitivity analyses.
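The trade-off between fit and complexity can be illustrated numerically. The sketch below uses one convention in which the log-likelihood is penalised directly, so that larger values are preferred, in line with the direction of comparison used above; the exact form of the criteria is an assumption here, as are the log-likelihood values and the sample size, which are invented for illustration and are not the values in Table 5. Only the parameter counts (6, 12 and 20) are taken from the text.

```python
import math

def penalised_criteria(loglik, n_params, n_obs):
    """Penalised log-likelihoods (larger is better):
    AIC-type: l - D,  BIC-type: l - 0.5 * D * log(n)."""
    return loglik - n_params, loglik - 0.5 * n_params * math.log(n_obs)

def standard_criteria(loglik, n_params, n_obs):
    """Common alternative scaling (smaller is better):
    AIC = -2l + 2D,  BIC = -2l + D * log(n)."""
    return (-2.0 * loglik + 2.0 * n_params,
            -2.0 * loglik + n_params * math.log(n_obs))

# Hypothetical quasi-log-likelihoods and sample size (NOT from Table 5).
n_obs = 60
logliks = {1: 500.0, 2: 510.0, 3: 515.0}
n_params = {1: 6, 2: 12, 3: 20}

scores = {k: penalised_criteria(logliks[k], n_params[k], n_obs) for k in logliks}
best_aic = max(scores, key=lambda k: scores[k][0])
best_bic = max(scores, key=lambda k: scores[k][1])
print(scores, best_aic, best_bic)
```

The two scalings are equivalent up to a factor of -2, so they always rank models identically; only the direction of "best" changes. With the invented inputs above, the heavier BIC-type penalty favours the smaller model while the AIC-type penalty favours the two-state model, which shows how the two criteria can disagree.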
The parameter estimates for the two-state model in Table 2 indicate that Regime 1 corresponds to a low-volatility and lower interest-rate environment, as implied by the corresponding parameter estimates. This regime is visited most of the time, consistent with the relatively small transition probability of 22.25%, which implies a low propensity to move from Regime 1 into Regime 2.
By contrast, Regime 2 may be characterised as a high-volatility environment, evidenced by the larger values of the volatility parameters, together with a higher long-term mean interest-rate level b. Elevated interest rates often coincide with heightened volatility, reflecting increased uncertainty and a greater likelihood of default when borrowing costs are high. The economy occupies this regime relatively infrequently, as suggested by the transition probability of 63.15%, which indicates a strong tendency to move out of Regime 2.
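The statement that Regime 1 is occupied most of the time can be quantified from the two reported transition probabilities. The sketch below assumes that they are one-step (annual) probabilities of a discrete-time two-state Markov chain, with 22.25% being the probability of moving from Regime 1 to Regime 2 and 63.15% the probability of moving from Regime 2 to Regime 1; the function name is hypothetical.

```python
def two_state_summary(p12, p21):
    """Stationary distribution and mean sojourn times (in steps) of a
    two-state discrete-time Markov chain with switching probabilities
    p12 (state 1 -> 2) and p21 (state 2 -> 1)."""
    total = p12 + p21
    return {
        "stationary": (p21 / total, p12 / total),
        "mean_sojourn": (1.0 / p12, 1.0 / p21),
    }

summary = two_state_summary(0.2225, 0.6315)
print(summary)
```

Under these assumptions the chain spends roughly 74% of the time in Regime 1 and 26% in Regime 2 in the long run, with average visits of about 4.5 years and 1.6 years respectively.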
Finally, we note that the one-state model contains 6 parameters, the two-state model contains 12, and the three-state model contains 20. By induction, a K-state model may be shown to have $K^2 + 3K + 2 = (K + 1)(K + 2)$ parameters, which reproduces the counts 6, 12, and 20 for K = 1, 2, and 3.