1. Introduction
The phase-type aging model (PTAM) belongs to a class of Coxian Markovian models proposed in a previous study [1]. The purpose of the PTAM is to provide a quantitative description of well-known aging characteristics that are part of a genetically determined, progressive, and irreversible process. It provides a means of quantifying the heterogeneity in aging among individuals and of capturing anti-selection effects.
Since the PTAM is nonlinear in its parameters, parameter estimation turns out to be unstable, which gives rise to an estimability issue (see [2]). The estimability issue of the PTAM originates from two complications. First, the structure of the PTAM, which is complicated by its matrix exponential form, makes it impossible to directly analyze the gradient and Hessian matrix of its likelihood function. As a result, any hill-climbing optimization algorithm is at risk of becoming stuck in a local maximum. Second, the structure of the PTAM gives rise to flat profile likelihood functions (see [2]). The estimability issue arising from flat profile likelihood functions is thoroughly discussed in [3]. Frequentists could address flat likelihood functions using regularization, which can be thought of as adding a log prior density (see [4]).
To address both problems, a Bayesian approach is considered. The parameters are treated as random variables, so inference is based on sampling from the posterior distribution rather than on maximizing the likelihood, which removes the risk of being stuck in local maxima. This addresses the first problem. As for the second, the Bayesian approach can improve parameter estimability by making use of sound prior information: because the profile likelihood functions are nearly flat, the posterior distributions depend heavily on the prior distributions.
Moreover, there are convincing reasons for applying the Bayesian approach to the PTAM. In this context, it has previously been applied via the data augmentation Gibbs sampler, which consists of two iterative steps: a data augmentation step and a posterior sampling step (see [5]). The data augmentation step in relation to continuous phase-type distributions was thoroughly studied by the authors of [6], where an EM algorithm was proposed for estimating the parameters of phase-type distributions. Based on the same data augmentation scheme, the authors of [7] considered Dirichlet and Gamma distributions as conjugate prior distributions in the posterior sampling step and developed an MCMC-based Bayesian method for continuous phase-type distributions. Later on, several studies sought to improve the computational efficiency of the data augmentation step. The authors of [8] proposed the Exact Conditional Sampling (ECS) algorithm. Another efficient algorithm, introduced by the authors of [9], involves uniformization and backward likelihood computation. However, these contributions all focus on the data augmentation step. In the context of the PTAM, the posterior sampling step also becomes more involved because of the model's parameter structure: after data augmentation, the posterior distributions are no longer as simple as Dirichlet and Gamma distributions. This situation has not been studied before in the literature. Therefore, the first contribution of this study is to develop an MCMC algorithm for sampling the posterior distribution of the PTAM.
Another area that needs further development is the treatment of left-truncated data in the data augmentation step. Although the authors of [10] developed the EM algorithm for censored data from phase-type distributions, the case of left-truncated data in connection with the MCMC-based Bayesian approach has not previously been studied. The MCMC algorithms proposed in [7,8,9] are applicable only to data that are not left-truncated. However, it is important to develop MCMC-based methods that handle left truncation because it is a common feature of real-life data. In particular, in the context of the PTAM, real-life data are left-truncated because it is unlikely that, in practice, individuals will enter the study at the same physiological age. Accordingly, the second contribution of this study is to develop an MCMC algorithm for estimating the PTAM parameters when data are left-truncated.
The proposed MCMC algorithm has a nested structure comprising two levels. In the outer level, augmented data are generated using the ECS algorithm proposed in [8] combined with the technique developed for left-truncated data. In the inner level, Gibbs sampling is applied to draw samples from the posterior distributions based on a newly developed rejection sampling scheme on a logarithmic scale. Thus, the proposed algorithm can be seen as a methodological extension of the existing data augmentation Gibbs sampler for continuous phase-type distributions. It can also be regarded as a further illustration of the use of MCMC algorithms for sampling from high-dimensional distributions. By applying the proposed algorithm, a Bayesian estimation of the PTAM parameters can be carried out. This will be illustrated with both simulated and actual data.
This paper is organized as follows. Preliminaries on the PTAM are introduced in Section 2. Section 3 presents a literature review on existing MCMC algorithms in connection with continuous phase-type distributions. In Section 4, the proposed MCMC algorithms for Bayesian inference on the PTAM are introduced. A simulation study is provided in Section 5 to validate the proposed approach. Meanwhile, parameter estimability is analyzed by comparing the proposed Bayesian approach with the frequentist one that was employed in [1]. In Section 6, the proposed Bayesian approach is applied to calibrate the PTAM to the Channing House data set, which pertains to the residents of a retirement community. Lastly, some concluding remarks are included in Section 7.
  2. The Phase-Type Aging Model
The phase-type aging model (PTAM) stems from the phase-type mortality model proposed in [11]. The motivation for analyzing the phase-type mortality model is to link its parameters to certain biological and physiological mechanisms of aging, so that the longevity risk facing annuity products can be measured more accurately. Experimental results showed that the phase-type mortality model with a four-state developmental period and a subsequent aging period achieved very satisfactory fitting results with respect to Swedish and USA cohort mortality data (see [11]). Later on, the authors of [12] applied the phase-type mortality model to Australian cohort mortality data.
Furthering the research in [11], the authors of [1] developed a parsimonious yet flexible representation of the PTAM for modeling various aging patterns. The main objective of the PTAM is to describe the human aging process in terms of the evolution of the distribution of physiological ages, utilizing mortality rates as aging-related variables. Therefore, although the PTAM can reproduce mortality patterns, it ought not to be treated as a mortality model. In this context, the PTAM is most applicable at human ages beyond the attainment of adulthood, where, relatively speaking, the aging process is the most significant factor that contributes to the variability in lifetimes (see [1]).
  2.1. Preliminaries
Definition 1. Let  be a continuous time Markov chain (CTMC) defined on a finite state space , where  is the absorbing state and  is the set of transient states. Let  have initial distribution  over the transient states such that , and let the transition intensity matrix be as follows:where  and  is the column vector of ones.  is defined as the time until absorption. Then, T is said to follow a continuous phase-type (CPH) distribution denoted by  of the order m, with  being defined as the exit vector.  Remark 1. Given  of order m,
There is a long history of using phase-type distributions for survival modeling in the category of “absorbing time” distributions (see [6,11,12,13]).
Definition 2. A CPH distribution of order m with representation  is said to be a Coxian distribution of order m if  and the following holds true:where , , and .  The Coxian distribution can often be visualized by a phase diagram such as that displayed in 
Figure 1 (see [1]).
  2.2. The Phase-Type Aging Model
According to the authors of [1], the phase-type aging model (PTAM) belongs to a class of Coxian-type Markovian models, which can provide a quantitative description of the genetically determined, progressive, and irreversible aging process.
Definition 3. The PTAM of order m is a Coxian distribution of order m with transition intensity matrix  and exit rate vector  such that the following applies:where , , and. This is denoted by .  As can be seen from 
Figure 2, the PTAM has a phase diagram that is similar to that of the Coxian distribution shown in Figure 1, the difference being the constant transition rate and the functionally related exit rates defined in (4).
(i) In Figure 2, each state in the Markov process represents a physiological age, a variable that reflects an individual’s health condition or frailty level. As the aging process progresses, the frailty level increases until the last state is reached, at which point the individual’s health has deteriorated to the point of causing death.
(ii) The transition rate is assumed to be constant. The exit rate is the dying rate, or force of dying. With this setup, at a given calendar age, an individual will be assigned to a certain state. This mathematically describes the fact that individuals will have different physiological ages at the same calendar age (see [1]).
(iii) The dying rates assume the structure given in (4), which is somewhat reminiscent of the well-known Box–Cox transformation introduced in [14]. The first and last dying rates are included in the model parameters, whereas the remaining in-between rates are interpolated based on the parameter s, which is a model parameter related to the curvature of the exit rate pattern. Figure 3 presents the effect of s on the pattern of the exit rates. When s = 1, the dying rates have a linear relationship. When s > 1, the rates are concave, and when s < 1, the rates are convex. In particular, when s = 0, the rates behave exponentially. In practice, we believe that it is likely that s < 1 when calibrating to mortality data. That is, the dying rates increase faster than in a linear manner as an individual ages (see [1]). Throughout this study, the dying rates will follow the structure given in (4).
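As an illustration of remark (iii), the following minimal sketch interpolates the in-between dying rates from the first and last ones. It assumes a Box–Cox-type form in which the s-th power of the rate is linear in the state index (and the logarithm of the rate is linear when s = 0). This form is our reading of the description above rather than a transcription of (4), and the function name `exit_rates` and the numerical values are hypothetical.

```python
def exit_rates(h1, hm, s, m):
    """Interpolate m exit rates between h1 and hm with curvature parameter s.

    Assumed form (a Box-Cox-type interpolation, not quoted from the source):
    h_i**s is linear in i for s != 0, and log(h_i) is linear in i for s == 0.
    """
    if m < 2:
        raise ValueError("need at least two states")
    rates = []
    for i in range(1, m + 1):
        w = (i - 1) / (m - 1)  # weight moving from 0 (first state) to 1 (last state)
        if s == 0:
            rates.append(h1 ** (1 - w) * hm ** w)  # geometric interpolation
        else:
            rates.append(((1 - w) * h1 ** s + w * hm ** s) ** (1 / s))
    return rates


# Illustrative values only: five states, rates rising from 0.01 to 0.5.
linear = exit_rates(0.01, 0.5, 1.0, 5)     # s = 1: equally spaced rates
convex = exit_rates(0.01, 0.5, 0.5, 5)     # s < 1: rates below the straight line
concave = exit_rates(0.01, 0.5, 2.0, 5)    # s > 1: rates above the straight line
geometric = exit_rates(0.01, 0.5, 0, 5)    # s = 0: constant ratio between rates
```

Under this assumed form, all four patterns share the same end points and differ only in how quickly the rates rise between them.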
The parameter structure of the PTAM proves to be parsimonious and flexible, which allows us to model the internal aging process explicitly. Further information is available in [1]. Since our study pertains to the PTAM, the processes being considered are homogeneous, use the same intensity for all transitions to the next stage, and have an intensity of moving to the absorbing stage consisting of a linear interpolation between the two end points. More general processes would require further consideration.
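The process just described can be simulated directly, which may help in building intuition for the model. The sketch below is only an illustration under stated assumptions: the chain starts in the first state, moves to the next state at a constant rate `lam`, and exits from state i at rate `rates[i]`. The function name `simulate_lifetime` and the numerical values are hypothetical.

```python
import random


def simulate_lifetime(lam, rates, rng):
    """Simulate one absorption time of a Coxian chain started in the first state.

    lam   : constant rate of moving to the next transient state
    rates : exit (dying) rate of each transient state, in order
    """
    if not rates:
        raise ValueError("need at least one transient state")
    t = 0.0
    last = len(rates) - 1
    for i, h in enumerate(rates):
        total = h if i == last else lam + h  # the last state can only exit
        t += rng.expovariate(total)          # exponential sojourn time in state i
        if i == last or rng.random() < h / total:
            return t                         # absorbed (death) from state i


rng = random.Random(1)
sample = [simulate_lifetime(0.3, [0.01, 0.05, 0.2], rng) for _ in range(20000)]
mean_lifetime = sum(sample) / len(sample)
```

For these illustrative rates, the expected lifetime can be computed by hand as 1/0.31 + (0.3/0.31)(1/0.35 + (0.3/0.35)(1/0.2)), which is approximately 10.14, and the simulated mean should be close to that value.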
  4. MCMC-Based Bayesian Inference for the PTAM
The MCMC algorithm for Bayesian inference on the PTAM introduced in this section constitutes the principal contribution of this study. This contribution involves two aspects. First, the proposed MCMC algorithm can be considered a methodological extension of the existing algorithm in terms of sampling from the posterior distribution. This is because the likelihood function of the PTAM is so involved that simple conjugate prior distributions such as the Dirichlet and Gamma distributions are no longer adequate. Although the authors of [8] consider special parameter structures such as zero-valued and identical parameters, prior conjugacy still holds in those cases as they simply involve deleting and regrouping parameters. However, further extensions are required in the case of the PTAM, since its parameters exhibit more complicated functional relationships as a result of the constraint specified in (4). Second, similarly to the authors of [10], who developed the EM algorithm for censored data from the CPH distribution, we have developed the MCMC-based Bayesian approach for left-truncated data from the PTAM. This development is crucial for the estimation of the PTAM parameters based on real-life data, since it is unlikely that, in practice, each individual will enter the study at the same physiological age. Thus, left-truncated data introduce an additional difficulty into the analysis.
With these contributions, a methodologically extended MCMC algorithm is proposed for sampling from the posterior distribution, so that MCMC-based Bayesian inference on the PTAM can be achieved, particularly for real-life data that are left-truncated.
  4.1. Likelihood Function of the PTAM with Left-Truncated Data
Taking into account left-truncated data, the likelihood function for the PTAM after data augmentation is as follows:
        where 
 is the time at which individual 
i enters the study; 
 is the total number of transitions from state 
i to 
j, which occurred before the entry times; 
 is the total sojourn time in state 
i for the portions of the sample paths after the entry times; 
 is as defined in 
Section 3; and 
 is the total sojourn time in state 
i for the sample paths in 
, as follows:
        where 
 is the sojourn time at state 
j for the 
kth sample path.
The likelihood function (
13) can be seen as a generalized version of the likelihood function (
6) given in [
6]. To verify this, if the data do not involve left truncation, then the 
’s and 
’s will be reduced to zero for all 
i and 
j; 
 will be reduced to the set of indices of all sample paths; and both the 
’s and 
’s will be reduced to 
’s for all 
i. Thus, the likelihood function in (
13) will boil down to (
6) with 
. The details of the derivation of the likelihood function (
13) are presented in 
Appendix B.
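To make the role of the entry times more concrete, the following sketch splits the per-state sojourn times of a single augmented sample path into the portions before and after the entry time. It is a hypothetical helper written for illustration only: the representation of a path as a list of (state, sojourn time) pairs and the name `split_sojourns` are assumptions, and the code does not reproduce the exact statistics entering (13).

```python
def split_sojourns(path, entry_time, n_states):
    """Split the per-state sojourn times of one sample path at the entry time.

    path : list of (state_index, sojourn_time) in visiting order; the last
           pair ends at absorption
    Returns (before, after): time spent in each state before and after entry.
    """
    before = [0.0] * n_states
    after = [0.0] * n_states
    clock = 0.0
    for state, dwell in path:
        start, end = clock, clock + dwell
        pre = max(0.0, min(end, entry_time) - start)  # part of the stay before entry
        before[state] += pre
        after[state] += dwell - pre
        clock = end
    return before, after


# A path visiting states 0, 1, 2 for 2.0, 3.0 and 1.0 time units, entry at 3.5.
before, after = split_sojourns([(0, 2.0), (1, 3.0), (2, 1.0)], 3.5, 3)
```

When the entry time is zero, the `before` totals vanish and the `after` totals equal the full sojourn times, which mirrors the reduction to the case without left truncation described above.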
  4.2. Characteristics of the Posterior Distribution of the PTAM
In the PTAM, the posterior distribution of the model parameters is no longer a product of independent kernels. To verify this, we start by substituting (
4) into the likelihood function (
13), as follows:
        where 
.
Then, the posterior distribution 
 can be written as follows:
        where
        and
        with 
 and 
 denoting the respective prior distributions and likelihood functions, for 
.
Based on (
16), it is straightforward to see that the posterior distribution of the PTAM parameters can be decomposed into two independent posterior distributions—
 and a joint posterior distribution 
, where
Thus, we can evaluate the posterior distribution for 
 separately, using a gamma distribution (
21), which will produce the posterior distribution (
22) of the same class, as follows: 
However, the likelihood function of  does not consist of independent kernels, which prevents one from determining conjugate priors. The prior distributions for  and s, which are then subjectively determined, are taken to be ,  and . We assume, for simplicity, that  and s are independently distributed. Accordingly, their joint prior distribution, , will be the product of , , and .
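The conjugate gamma update mentioned above for the separately handled parameter can be sketched as follows. The sketch assumes that the likelihood kernel of that parameter has the usual form for a transition rate, namely the rate raised to the number of observed transitions multiplied by the exponential of minus the rate times the total exposure. The function names and the numerical values are hypothetical.

```python
import random


def gamma_posterior(shape, rate, n_events, exposure):
    """Conjugate update for a rate parameter.

    Assumes the likelihood kernel rate**n_events * exp(-rate * exposure),
    so a Gamma(shape, rate) prior yields a gamma posterior.
    """
    return shape + n_events, rate + exposure


def draw_rate(shape, rate, rng):
    # random.gammavariate expects a scale parameter, that is, 1 / rate.
    return rng.gammavariate(shape, 1.0 / rate)


# Illustrative numbers: Gamma(2, 4) prior, 30 transitions over 96 time units.
post_shape, post_rate = gamma_posterior(2.0, 4.0, n_events=30, exposure=96.0)
rng = random.Random(0)
draws = [draw_rate(post_shape, post_rate, rng) for _ in range(5000)]
posterior_mean = sum(draws) / len(draws)
```

With these numbers the posterior is Gamma(32, 100), whose mean is 0.32, so the average of the draws should be close to that value.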
  4.3. The Proposed Methodology for Sampling from 
Next, a methodology needs to be developed to address the sampling from the joint posterior distribution 
. The Gibbs algorithm can be utilized again, further taking advantage of the MCMC method. In that case, the proposed algorithm will become a nested MCMC algorithm. The nested Gibbs algorithm samples from the joint posterior distribution, given the augmented data. The algorithm framework is presented in 
Figure 5, for a 
p-dimensional posterior distribution.
In the case of the joint posterior distribution for the PTAM, 
 in 
Figure 5 becomes 
. For example, in order to sample from 
 in the 
th iteration, we need to sample from the corresponding conditional distributions. These are also the transition kernels of the Gibbs algorithm, as follows:
Since general notations are adopted in 
Figure 5, the concept of the nested MCMC algorithm is likely applicable to other models whose posterior distributions are complicated after data augmentation.
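The generic Gibbs structure described above can be sketched in a few lines. The sketch below is not the PTAM sampler itself: it cycles through user-supplied full conditional samplers, and it is demonstrated on a toy bivariate normal target chosen only because its conditionals are known in closed form. The function name `gibbs` and the toy target are assumptions made for illustration.

```python
import random


def gibbs(conditionals, init, n_iter, rng):
    """Run a Gibbs sampler by updating each coordinate from its full conditional.

    conditionals[j](state, rng) must return a draw of coordinate j given the
    current values of all other coordinates in `state`.
    """
    state = list(init)
    chain = []
    for _ in range(n_iter):
        for j, draw in enumerate(conditionals):
            state[j] = draw(state, rng)  # uses the freshest values of the others
        chain.append(tuple(state))
    return chain


# Toy target (illustration only): standard bivariate normal with correlation rho.
rho = 0.6
sd = (1 - rho ** 2) ** 0.5
conds = [
    lambda st, rng: rng.gauss(rho * st[1], sd),  # x | y
    lambda st, rng: rng.gauss(rho * st[0], sd),  # y | x
]
chain = gibbs(conds, init=[0.0, 0.0], n_iter=20000, rng=random.Random(3))
```

In the nested algorithm, each entry of `conditionals` would be replaced by a sampler for one of the conditional posterior distributions given the augmented data.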
Define .
First, we introduce a sampling scheme for 
, which is the conditional distribution of 
. We know the following:
Rejection sampling can then be utilized in conjunction with 
, as described in Algorithm 3.
        
Algorithm 3 The rejection sampling algorithm for .
1: Calculate the maximum value of  on . Denote it by .
2: Draw a pair of samples .  and , where .
3: while  do
4:     repeat Step 2
5: end while
6: .
Secondly, we consider the sampling scheme for 
, which is the marginal distribution of 
. We know the following:
Rejection sampling can be utilized in conjunction with 
, as described in Algorithm 4.
        
Algorithm 4 The rejection sampling algorithm for .
1: Calculate the maximum value of  on . Denote it by . In this case, a is a large enough truncation point.
2: Draw a pair of samples .  and , where .
3: while  do
4:     repeat Step 2
5: end while
6: .
Thirdly, we consider the sampling scheme for 
, which is the marginal distribution of 
s. We know the following:
Rejection sampling can be utilized in conjunction with 
, as described in Algorithm 5.
        
Algorithm 5 The rejection sampling algorithm for .
1: Calculate the maximum value of  on . Denote it by . In this case,  are large enough truncation points.
2: Draw a pair of samples .  and , where .
3: while  do
4:     repeat Step 2
5: end while
6: .
The rejection sampling schemes presented in Algorithms 3–5 also constitute important original contributions. Unlike traditional rejection sampling, where a proposal function is chosen to fully cover the target density, the proposed scheme carries out the comparison between the proposal and the target on a logarithmic scale. This differs from returning a log posterior value to a generic MCMC sampler, as the proposed method applies log-scale bounds within Gibbs updates for individual conditional distributions, some of which have truncated support. We note that the values of the posterior kernels are often too small to be represented numerically when the likelihood functions are evaluated directly. In fact, sampling on a logarithmic scale is analogous to taking the logarithm of likelihood functions in order to find MLEs, since both frequentist and Bayesian methods face the same problem caused by small likelihood function values. However, they deal with this problem differently. In some regular problems, frequentist inference may be performed by maximizing a function (often numerically) and using the curvature at the maximum to quantify the uncertainty in an estimate. Other authors have addressed frequentist inference in a non-regular context (see [21]). This study focuses on Bayesian inference, in which case the output is a (posterior) distribution rather than a point estimate. In this context, random sampling techniques are used instead of optimization techniques, as is the case for the rejection sampling on a logarithmic scale presented in Algorithms 3–5. Technical details regarding rejection sampling on a logarithmic scale are elaborated on in Appendix A.
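The idea of rejection sampling on a logarithmic scale can be sketched as follows. This is a minimal illustration and not a transcription of Algorithms 3–5: it assumes a uniform proposal on a bounded interval, approximates the maximum of the log kernel on a grid, and accepts a candidate when the logarithm of a uniform draw does not exceed the log kernel minus its maximum. The names `grid_log_max`, `log_rejection_sample` and `log_kernel`, as well as the toy target, are hypothetical.

```python
import math
import random


def grid_log_max(log_kernel, lower, upper, grid=2000):
    """Approximate the maximum of log_kernel on [lower, upper] with a grid search.

    A grid is an assumption of this sketch; if it misses the true maximum,
    the acceptance test below becomes slightly too lenient near the mode.
    """
    step = (upper - lower) / grid
    return max(log_kernel(lower + k * step) for k in range(grid + 1))


def log_rejection_sample(log_kernel, lower, upper, log_max, rng):
    """Draw one value from an unnormalised density known only through its log."""
    while True:
        x = rng.uniform(lower, upper)          # uniform proposal on the bounded support
        log_u = math.log(1.0 - rng.random())   # log of a draw from (0, 1]
        if log_u <= log_kernel(x) - log_max:   # the kernel is never exponentiated
            return x


def log_kernel(x):
    # Toy kernel whose value underflows to zero on the natural scale.
    return -5000.0 - 0.5 * ((x - 2.0) / 0.3) ** 2


rng = random.Random(7)
log_max = grid_log_max(log_kernel, 0.0, 5.0)
draws = [log_rejection_sample(log_kernel, 0.0, 5.0, log_max, rng) for _ in range(2000)]
sample_mean = sum(draws) / len(draws)
```

On the natural scale this toy kernel evaluates to zero in floating-point arithmetic, so a standard acceptance ratio could not be formed, whereas the log-scale comparison remains well defined.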
  4.4. The MCMC Algorithm for the PTAM
Combining all these building blocks in the data augmentation step and the posterior sampling step, Algorithm 6 presents the MCMC algorithm for Bayesian inference on the PTAM.
In Step 14, the initial values for the inner Gibbs sampling at each iteration are set to the parameter outputs of the previous iteration. Because these outputs become increasingly representative of the target posterior distribution as the chain converges, they are more reasonable and objective starting points than arbitrarily chosen values.
        
Algorithm 6 The MCMC algorithm for Bayesian inference on the PTAM.
Require: The number of states, m, based on prior knowledge or subjective judgment.
Input:
  1. The data observations , and the entry times  if there are left-truncated data.
  2. The hyper-parameters for prior distributions.
  3. The number of states m.
  4. The number of inner and outer iterations:  and .
  5. The size of the burn-in period and thinning rate, if possible.
Output: The posterior samples for , , s and , each of which has  sample points.
 1: Initialization.
 2: Initialization.
 3: for  do
 4:     Draw a sample path  from , based on Algorithm 1 (or Algorithm 2 for right-censored data).
 5:     Based on , calculate .
 6:     for  do
 7:         Sample  from , based on Algorithm 3.
 8:         Sample  from , based on Algorithm 4.
 9:         Sample  from , based on Algorithm 5.
10:     end for
11:     Sample  from .
12:     .
13:     Reset the inner Gibbs sampling vector to zeros.
14:     .
15: end for
  5. Simulation Study
In this section, the proposed algorithm is implemented via a simulation study. The aim of the study is to demonstrate that the proposed MCMC algorithm is sound and that the parameter estimability of the PTAM can be improved via sound prior information. Consider the following experimental conditions:
(i) The underlying parameters are . They were taken from the simulation study on the Le Bras limiting distribution carried out in [1], except that m is assumed to take a moderate value of ten.
(ii) The sample size is 50.
(iii) There are 4500 iterations of the Gibbs sampler for data augmentation.
(iv) There are 500 iterations of the inner Gibbs sampling for the posterior distribution.
(v) The first 500 iterations are taken as burn-in, based on cumulative standard deviation plots (see [19]).
(vi) A thinning rate of 10 is adopted, based on autocorrelation functions (ACFs) (see [19]).
(vii) The prior distributions are assumed to be sound; the prior means remain close to the true parameter values with low variances. We assume that  because the dying rate pattern forms a fairly convex increasing pattern, as displayed in Figure 3, which is consistent with the biological interpretation of dying rates.
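The effect of conditions (iii), (v) and (vi) on the number of retained draws can be checked with a short sketch. It assumes that the burn-in and the thinning are both applied to the outer chain of 4500 iterations, which is our reading of the conditions; the function name `thin_chain` is hypothetical.

```python
def thin_chain(chain, burn_in, thin):
    """Drop the first `burn_in` draws, then keep every `thin`-th draw."""
    return chain[burn_in::thin]


draws = list(range(4500))  # stand-in for the 4500 outer iterations
kept = thin_chain(draws, burn_in=500, thin=10)
n_kept = len(kept)
```

Under that reading, 400 draws per parameter remain after discarding the burn-in and thinning.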
In 
Table 1, the true parameters are all within the corresponding 95% credible intervals. This indicates that the proposed MCMC algorithm for Bayesian inference is quite satisfactory. It can be seen from 
Figure 6 that the correlations between 
, and 
s are minimal. This indicates that the dependent structure of 
, and 
s in the likelihood function has little effect on the shape of the posterior distributions, so that 
, and 
s are still nearly independent, as was assumed in the prior distributions. This observation suggests that the estimability of 
, and 
s could be poor. In fact, the same conclusion can also be reached by observing the diagonal panels in 
Figure 6, which reveal the shapes of their posterior distributions. In particular, the posterior distribution for 
s closely resembles the prior. This suggests that their distributions are less responsive to data so that the prior effects are, to some degree, still preserved in the behaviour of their posterior distributions. This indicates a weaker inferential power and therefore a poorer estimability. In contrast, 
 is more estimable, as the shape of its posterior distribution differs substantially from that of its prior. Therefore, the role of prior distributions is crucial for dealing with flat likelihood functions. Sound prior information can improve the accuracy of the parameter estimates as the posterior distributions are highly dependent on the priors.
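The coverage check behind Table 1 amounts to computing an interval from the posterior draws and testing whether it contains the true value. A minimal sketch is given below, assuming equal-tailed 95% intervals based on empirical quantiles (the source does not state which type of credible interval is reported); the function names are hypothetical.

```python
import statistics


def credible_interval_95(samples):
    """Equal-tailed 95% credible interval from posterior draws."""
    cuts = statistics.quantiles(samples, n=40)  # cut points every 2.5%
    return cuts[0], cuts[-1]                    # 2.5% and 97.5% quantiles


def covers(interval, value):
    """Check whether the interval contains the given value."""
    lower, upper = interval
    return lower <= value <= upper


# Stand-in draws: the integers 1, ..., 999.
interval = credible_interval_95(list(range(1, 1000)))
```

A wide interval that closely resembles the corresponding prior interval is a sign of poor estimability, in line with the discussion above.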
In Figure 7, the convergence of the proposed MCMC algorithm is assessed by means of trace plots, ACFs, and ergodic mean plots. First, the trace plots demonstrate the stationarity of the MCMC samples through their level-off patterns, though there are occasionally a few spikes for  and s. Such spikes are to be expected, as the shapes of the corresponding posterior densities remain close to their skewed prior densities due to poor estimability. Second, the ACFs for all parameters are within the tolerance range after the second lag. This indicates that the thinning rate effectively reduces the autocorrelation between the MCMC samples. Third, the ergodic means all converge as the number of iterations increases. This suggests that the number of iterations, that is, 4500, is sufficient to believe that the simulated MCMC samples were approximately generated from the stationary distributions, which are the target posterior distributions.
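The two numerical diagnostics mentioned above, ergodic means and autocorrelations, are simple to compute from a chain of draws. The following sketch uses the usual sample definitions; the function names are hypothetical and the short chains are only stand-ins.

```python
def ergodic_means(chain):
    """Running mean of the chain after each iteration."""
    means, total = [], 0.0
    for n, x in enumerate(chain, start=1):
        total += x
        means.append(total / n)
    return means


def autocorrelation(chain, lag):
    """Sample autocorrelation of the chain at a given lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain)
    cov = sum((chain[t] - mean) * (chain[t + lag] - mean) for t in range(n - lag))
    return cov / var


running = ergodic_means([1.0, 2.0, 3.0])
acf_lag1 = autocorrelation([1.0, -1.0, 1.0, -1.0], 1)
```

A flat ergodic mean curve and autocorrelations close to zero after a few lags are the patterns one looks for when judging convergence.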
  Prior Sensitivity Analysis
To further validate the vital role of sound prior information in terms of estimability improvement, we now conduct a prior sensitivity analysis. Two alternative types of priors are tested. The first type is taken to be falsely informative, where the prior means deviate noticeably from the true parameter values with low variances. The second type is taken to be non-informative, where parameters are uniformly distributed. The results are listed in 
Table 2 and illustrated in 
Figure 8 and 
Figure 9.
It can be seen from 
Table 2 and 
Figure 8 that when priors are taken to be falsely informative, the 95% credible intervals for 
, and 
s all failed to cover the true values. This is as expected because their likelihood functions are flat due to poor estimability. Then, the posterior distributions will be highly dependent on the prior distributions. On the other hand, the interval for 
 remains narrow and covers the true value, which indicates a better estimability than that of 
, and 
s.
Next, when the priors are taken to be non-informative, the shape of posterior density will be totally determined by the shape of the likelihood function. It can be seen from 
Table 2 and 
Figure 9 that while covering their MLEs as expected, the 95% credible intervals for 
, and 
s are extremely wide. This further corroborates the flatness of their likelihood functions with the concomitant poor estimability. On the other hand, the interval for 
 still remains narrow while covering its MLE, which indicates a better estimability.
Upon completing this prior sensitivity analysis, all conclusions are consistent with each other throughout this simulation study. The poor estimability of , as well as the improved estimability of , has been demonstrated. The significant prior sensitivity on , and s indicates that suitable prior information indeed plays a significant role in improving their estimability. Therefore, it is crucial to select priors that are as sound as possible when making Bayesian inference. Otherwise, deficient priors might yield unreliable parameter estimates, particularly when their estimability is poor.
  6. Data Analysis
In Section 5, we showed that the proposed MCMC algorithm can improve parameter estimability for the PTAM by making use of sound prior information. In this section, we demonstrate that, in addition to improving estimability via sound prior information, the proposed algorithm can also be used to fit the PTAM to real-life data.
Consider the data collected from Channing House, a retirement community in Palo Alto, California. The data consist of entry ages, ages at death, and ages at study end for 462 people who resided in the facility between January 1964 and July 1975 (see [22]). The Channing House data are chosen because all the residents in the community are subject to approximately the same circumstances, so that, relatively speaking, the aging process is the most significant factor contributing to the variability in their lifetimes, which is the process we intend to model using the PTAM. Moreover, only the female data are used, to preclude the effects of gender differences. Of the 361 females, 129 died while residing in Channing House, whereas the other 232 survived to the end of the study.
In practice, residents join a retirement community at various physiological ages. According to the Channing House data, the youngest entry age is 61. Thus, for modeling purposes, it will be assumed that the aging process starts at calendar age 50 for all residents. Under that setting, residents are expected to have variable physiological ages at the time of entering the study. As well, letting  ought to be more than adequate.
Unlike in Section 5, no underlying true model is known here. The prior distributions are therefore taken to be as follows:
The priors are deliberately chosen in such a way that the model with parameters set to the prior means is far away from the Kaplan–Meier survival function estimates, as plotted in Figure 10. The purpose of proceeding this way is to show more persuasively that the proposed Bayesian approach is valid. In practice, of course, one should select the priors in such a way that the model with parameters set to the prior means is as close to the Kaplan–Meier survival function estimates as possible.
Using the proposed Bayesian approach, the parameter estimation results are displayed in 
Table 3.
In Figure 10, we illustrate the goodness of fit of the PTAM to the Channing House female data by plotting the fitted survival function along with the nonparametric Kaplan–Meier survival function estimates. In addition, for comparison purposes, we also plot the model with parameters set to the prior means, the fitted model using the MLE method, and the fitted model obtained in [1]. It can be observed that the PTAM fits the Channing House female data very well, as the associated fitted survival function stays within the 95% confidence limits of the Kaplan–Meier estimates. The marked difference between the fitted model and the model with parameters set to the prior means, as mentioned earlier, supports the validity of the proposed Bayesian approach. This difference clearly shows that the prior distributions are actually updated to the corresponding posterior distributions for the Channing House female data.
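For readers who wish to reproduce the nonparametric benchmark, the following sketch computes a product-limit (Kaplan–Meier) survival estimate that allows for delayed entry. Whether the estimates plotted in Figure 10 adjust the risk sets for entry ages is not stated in the text, so the delayed-entry adjustment here is an assumption, as are the function name `kaplan_meier` and the toy data. The sketch also assumes that every individual is observed for a positive length of time.

```python
def kaplan_meier(entries, exits, died):
    """Product-limit survival estimate with delayed entry.

    entries : age at which each individual comes under observation
    exits   : age at death or at the end of observation
    died    : True if the exit is a death, False if right-censored
    Returns a list of (age, survival) pairs at each distinct death age.
    """
    death_ages = sorted({x for x, d in zip(exits, died) if d})
    surv, curve = 1.0, []
    for age in death_ages:
        # Individuals are at risk only between their entry and exit ages.
        at_risk = sum(1 for e, x in zip(entries, exits) if e < age <= x)
        deaths = sum(1 for x, d in zip(exits, died) if d and x == age)
        surv *= 1.0 - deaths / at_risk
        curve.append((age, surv))
    return curve


# Toy data: four individuals observed from age 0, one of them censored at age 2.
curve = kaplan_meier([0, 0, 0, 0], [1, 2, 3, 4], [True, False, True, True])
```

A fitted survival function can then be compared with `curve` at the observed death ages.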
Furthermore, the fitted models with 
, whether they are estimated based on the MLEs or the proposed Bayesian method, are in very close agreement with the fitted model in [
1], where 
. In fact, the fitted model with 
 fits the data even better for ages between 91 and 101.