# An Accelerated Failure Time Cure Model with Shifted Gamma Frailty and Its Application to Epidemiological Research

## Abstract

## 1. Introduction

## 2. Regression Models for Survival Time Response

#### 2.1. Literature Review

#### 2.2. Problem Formulation

**Z** be a p-dimensional covariate vector. Based on these, let $\{O,\Delta ,\mathit{Z}\}$ be the observed variables. Let the sample size be n. The observations for the i-th individual follow the probability distribution of ${(O,\Delta ,{\mathit{Z}}^{\top})}^{\top}$ independently of each other, and their realization is denoted by ${({t}_{i},{\delta}_{i},{\mathit{z}}_{i}^{\top})}^{\top}$. The sample is therefore $\{\{{t}_{i},{\delta}_{i},{\mathit{z}}_{i}\},\ i=1,\cdots ,n\}$.
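To make the notation concrete, the observed sample can be stored as one record per individual. The sketch below is illustrative only: the class name `Observation` and the helper `summarize` are hypothetical, and it assumes the conventional coding in which $\delta_i = 1$ marks an observed event and $\delta_i = 0$ a censored observation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Observation:
    """One realization (t_i, delta_i, z_i) of the observed variables (O, Delta, Z)."""
    t: float               # observed time t_i
    delta: int             # assumed coding: 1 = event observed, 0 = censored
    z: Tuple[float, ...]   # p-dimensional covariate vector z_i


def summarize(sample: List[Observation]) -> Dict[str, int]:
    """Count the sample size, the observed events, and the censored observations."""
    n = len(sample)
    events = sum(obs.delta for obs in sample)
    return {"n": n, "events": events, "censored": n - events}


# Hypothetical toy sample with p = 2 covariates.
toy_sample = [
    Observation(t=2.5, delta=1, z=(0.3, 1.0)),
    Observation(t=4.0, delta=0, z=(1.2, 0.0)),
    Observation(t=1.1, delta=1, z=(-0.7, 1.0)),
]
print(summarize(toy_sample))
```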

#### 2.3. Mixture Cure Models

#### 2.4. Accelerated Failure Time Models

Here, let ${S}_{\epsilon}(t\mid \mathit{Z})$ be the conditional survival function where $\epsilon ={e}^{\xi}$; then,

**Z**, and it is equivalent to a linear regression model when there is no censoring. In addition, depending on the distribution assumed for $\xi $ or $\epsilon $, the model can also describe settings in which the proportional hazard assumption does not hold (with the exponential and Weibull distributions, it coincides with the parametric proportional hazard model). As with the proportional hazard model, both semiparametric and parametric methods are available for inference in the AFT model (e.g., [21]). In the present study, we adopt the parametric approach and make statistical inferences assuming a specific distribution for $\epsilon $, such as the Weibull or log-normal distribution.
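As a rough illustration of the parametric approach, the sketch below evaluates the survival function of a log-normal AFT model. It assumes the parametrization $\log T = \mu + {\mathit{\beta}}^{\top}\mathit{Z} + \sigma \xi$ with standard normal $\xi$; the sign convention and the function names are assumptions made for this example and may differ from the formulation used in this paper.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal distribution function, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear_predictor(z, beta) -> float:
    """Inner product beta^T z."""
    return sum(b * zj for b, zj in zip(beta, z))


def lognormal_aft_survival(t, z, beta, mu=0.0, sigma=1.0) -> float:
    """S(t | z) under the assumed model log T = mu + beta^T z + sigma * xi."""
    standardized = (math.log(t) - mu - linear_predictor(z, beta)) / sigma
    return 1.0 - std_normal_cdf(standardized)


# Hypothetical covariates and coefficients, for illustration only.
z_example = (1.0, 0.5)
beta_example = (-0.5, 0.5)
s_direct = lognormal_aft_survival(3.0, z_example, beta_example, mu=1.0, sigma=0.8)

# Acceleration property: covariates only rescale time, S(t | z) = S_0(t * exp(-beta^T z)).
rescaled_t = 3.0 * math.exp(-linear_predictor(z_example, beta_example))
s_baseline = lognormal_aft_survival(rescaled_t, (0.0, 0.0), beta_example, mu=1.0, sigma=0.8)
print(s_direct, s_baseline)
```

The two printed values agree, which is the defining feature of an AFT model: the covariates stretch or compress the time axis of a common baseline survival function.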

#### 2.5. Frailty Models

**Z** depending on the magnitude of Y.

## 3. A Novel Accelerated Failure Time Frailty Mixture Cure Model

#### 3.1. Proposed Model

#### 3.2. Estimation Method and Its Algorithm

- Step 1: Set the initial values ${\mathbf{\beta}}^{\left(1\right)},{\mathbf{\kappa}}^{\left(1\right)},{\theta}^{\left(1\right)},{\mathbf{\gamma}}^{\left(1\right)}$.
- Step 2: Calculate the sample version of Equation (22) for $k=1,2,\dots $. That is,$$\widehat{\overline{E}}\left[{D}_{i}\right]={\delta}_{i}+(1-{\delta}_{i})\frac{{p}^{\left(k\right)}\left({\mathit{z}}_{i}\right){S}_{\mathrm{u},\left(k\right)}({t}_{i}\mid {\mathit{z}}_{i})}{1-{p}^{\left(k\right)}\left({\mathit{z}}_{i}\right)+{p}^{\left(k\right)}\left({\mathit{z}}_{i}\right){S}_{\mathrm{u},\left(k\right)}({t}_{i}\mid {\mathit{z}}_{i})}.$$
- Step 3: Find the updated value for each parameter$$\begin{array}{c}\hfill ({\mathbf{\beta}}^{(k+1)},{\mathbf{\kappa}}^{(k+1)},{\theta}^{(k+1)})=\underset{{\displaystyle \mathbf{\beta},\mathbf{\kappa},\theta}}{\mathrm{argmax}}{\overline{\ell}}_{1}(\mathbf{\beta},\mathbf{\kappa},\theta ),\quad {\mathbf{\gamma}}^{(k+1)}=\underset{\mathbf{\gamma}}{\mathrm{argmax}}{\overline{\ell}}_{2}\left(\mathbf{\gamma}\right).\end{array}$$
- Step 4: If the convergence condition is satisfied, terminate the algorithm and set the estimated values to $({\mathbf{\beta}}^{(k+1)},{\mathbf{\kappa}}^{(k+1)},{\theta}^{(k+1)},{\mathbf{\gamma}}^{(k+1)})$. Otherwise, increase the value of k by 1 and return to Step 2.
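The four steps above follow the usual EM pattern for mixture cure models. The sketch below shows that pattern on a deliberately simplified stand-in, not on the proposed model: it assumes no covariates, no frailty, a constant uncured probability `p`, and an exponential survival function for the uncured group, so that the M-step has a closed form. All function names are hypothetical. The E-step uses the standard posterior weight $\delta_i + (1-\delta_i)\,pS_{\mathrm{u}}(t_i)/\{1-p+pS_{\mathrm{u}}(t_i)\}$.

```python
import math


def e_step(times, deltas, p, rate):
    """Step 2: posterior probability that each individual is uncured."""
    weights = []
    for t, d in zip(times, deltas):
        s_u = math.exp(-rate * t)  # assumed exponential survival of the uncured
        weights.append(d + (1 - d) * p * s_u / (1.0 - p + p * s_u))
    return weights


def m_step(times, deltas, weights):
    """Step 3: closed-form maximizers of the two weighted log-likelihoods."""
    p_new = sum(weights) / len(weights)  # incidence part
    rate_new = sum(deltas) / sum(w * t for w, t in zip(weights, times))  # latency part
    return p_new, rate_new


def fit_em(times, deltas, p=0.5, rate=1.0, tol=1e-8, max_iter=1000):
    """Steps 1 to 4: iterate from the initial values until the updates stabilize."""
    for _ in range(max_iter):
        weights = e_step(times, deltas, p, rate)
        p_new, rate_new = m_step(times, deltas, weights)
        converged = abs(p_new - p) + abs(rate_new - rate) < tol
        p, rate = p_new, rate_new
        if converged:
            break
    return p, rate


# Hypothetical toy data: observed times and event indicators.
toy_times = [0.5, 1.2, 0.3, 5.0, 6.0, 2.0, 7.5, 0.9]
toy_deltas = [1, 1, 1, 0, 0, 1, 0, 1]
p_hat, rate_hat = fit_em(toy_times, toy_deltas)
print(p_hat, rate_hat)
```

In the proposed model, Step 3 has no closed form and would require numerical optimization of $\overline{\ell}_1$ and $\overline{\ell}_2$; only the overall loop structure carries over from this sketch.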

## 4. Numerical Examples

#### 4.1. Simulations

#### 4.1.1. Setting

**Z**. Next, for ${\tilde{\mathit{Z}}}_{i}={(1,{\mathit{Z}}_{i}^{\top})}^{\top}$ and $\mathbf{\gamma}={({\gamma}_{0},{\gamma}_{1},{\gamma}_{2},{\gamma}_{3})}^{\top}$, let

- (i) $\mathbf{\beta}=(-0.5,0.5,-0.8)$, $(\mu ,\sigma ,q)=(3,1,2)$, $\theta =0.5$, $\mathbf{\gamma}=(0.1,0.5,-1,0.6)$
- (ii) $\mathbf{\beta}=(-0.5,0.5,-0.8)$, $(\mu ,\sigma ,q)=(3,1,2)$, $\theta =3$, $\mathbf{\gamma}=(0.1,0.5,-1,0.6)$
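As an illustration of how $\mathbf{\gamma}$ enters the data-generating process, the sketch below computes the uncured probability for a given covariate vector and draws an uncured indicator from it. It assumes a logistic link, $p(\mathit{z}) = 1/\{1+\exp(-{\mathbf{\gamma}}^{\top}\tilde{\mathit{z}})\}$, which is the common choice for mixture cure models but is an assumption here. The covariate values and function names are hypothetical.

```python
import math
import random

# gamma = (gamma_0, gamma_1, gamma_2, gamma_3), shared by settings (i) and (ii).
GAMMA = (0.1, 0.5, -1.0, 0.6)


def uncured_probability(z, gamma=GAMMA) -> float:
    """p(z) under an assumed logistic link, with z_tilde = (1, z^T)^T."""
    z_tilde = (1.0,) + tuple(z)
    eta = sum(g * zj for g, zj in zip(gamma, z_tilde))
    return 1.0 / (1.0 + math.exp(-eta))


def draw_uncured_indicator(z, rng: random.Random, gamma=GAMMA) -> int:
    """Bernoulli draw: 1 = uncured (susceptible), 0 = cured."""
    return 1 if rng.random() < uncured_probability(z, gamma) else 0


rng = random.Random(2024)            # fixed seed so the sketch is reproducible
z_example = (1.0, 1.0, 1.0)          # hypothetical covariate values
p_example = uncured_probability(z_example)
indicators = [draw_uncured_indicator(z_example, rng) for _ in range(1000)]
print(p_example, sum(indicators) / len(indicators))
```

The empirical frequency of uncured individuals should be close to `p_example`; survival times for the uncured individuals would then be generated from the frailty AFT part of the model, which is not reproduced here.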

#### 4.1.2. Results

#### 4.2. Real Data Example

#### 4.2.1. Dataset and Previous Study

- Proportional hazard model;
- AFT model;
- Mixture cure model with the proportional hazard model;
- Mixture cure model with the AFT model;
- Proposed model.

#### 4.2.2. Results

## 5. Discussion

**Figure 2.** Histogram of the estimated values of ${\beta}_{1}$ in setting (ii). The red dotted line represents the true value.

**Figure 3.** Histogram of the estimated values of ${\gamma}_{1}$ in setting (ii). The red dotted line represents the true value.

**Figure 4.** Kaplan–Meier estimator of the survival function for the onset of hypertension for each gender. The black and red lines represent the estimators for males and females, respectively.

**Figure 5.** Probability density function of the estimated shifted gamma distribution. The black and red lines show the results for males and females, respectively.

**Table 1.** Selected related literature and the elements of our proposed method, showing how the present study relates to existing studies. The symbol ✓ means “Considered”. PH and AFT denote the proportional hazard and accelerated failure time models, respectively.

Literature | Cured Patients | Uncured Model | Frailty |
---|---|---|---|
Sy and Taylor [5] | ✓ | PH | – |
Vaupel [6], Aalen [12] | – | PH | gamma |
Pan [13] | – | AFT | gamma, log-normal |
Chen et al. [15] | – | AFT | generalized gamma |
Yu [9], Price and Manatunga [14] | ✓ | PH | gamma |
He [16] | ✓ | AFT | generalized gamma |
Present study | ✓ | AFT | shifted gamma |

Parameter | True Value | Mean (SD), n = 100 | Mean (SD), n = 500 | Mean (SD), n = 1000 |
---|---|---|---|---|
${\beta}_{1}$ | −0.5 | $-0.355$ (1.602) | $-0.507$ (0.948) | $-0.529$ (0.803) |
${\beta}_{2}$ | 0.5 | $0.278$ (3.000) | $0.796$ (1.556) | $0.814$ (1.203) |
${\beta}_{3}$ | −0.8 | $-0.402$ (1.830) | $-0.651$ (0.925) | $-0.656$ (0.737) |
$\mu $ | 3 | $2.091$ (4.390) | $2.889$ (2.074) | $2.844$ (0.985) |
$\sigma $ | 1 | $1.141$ (0.852) | $1.004$ (0.641) | $1.179$ (0.743) |
q | 2 | $1.435$ (3.412) | $2.804$ (1.835) | $2.600$ (2.012) |
$\theta $ | 0.5 | $8.214$ (20.992) | $3.053$ (9.309) | $2.273$ (5.812) |
${\gamma}_{0}$ | 0.1 | $-0.063$ (1.526) | $0.135$ (1.110) | $0.192$ (0.927) |
${\gamma}_{1}$ | 0.5 | $0.540$ (1.216) | $0.576$ (0.853) | $0.553$ (0.818) |
${\gamma}_{2}$ | −1 | $-1.031$ (2.294) | $-0.878$ (1.556) | $-0.865$ (1.308) |
${\gamma}_{3}$ | 0.6 | $1.045$ (1.077) | $0.785$ (0.852) | $0.829$ (0.725) |

Parameter | True Value | Mean (SD), n = 100 | Mean (SD), n = 500 | Mean (SD), n = 1000 |
---|---|---|---|---|
${\beta}_{1}$ | −0.5 | $-0.092$ (1.796) | $-0.316$ (1.072) | $-0.452$ (0.823) |
${\beta}_{2}$ | 0.5 | $0.269$ (3.277) | $0.552$ (1.875) | $0.548$ (1.285) |
${\beta}_{3}$ | −0.8 | $-0.386$ (1.619) | $-0.777$ (1.032) | $-0.698$ (0.904) |
$\mu $ | 3 | $1.378$ (2.993) | $3.253$ (4.296) | $2.840$ (1.163) |
$\sigma $ | 1 | $0.910$ (0.768) | $1.206$ (0.878) | $1.077$ (0.718) |
q | 2 | $7.276$ (49.95) | $2.706$ (2.197) | $2.928$ (2.184) |
$\theta $ | 3 | $23.076$ (72.067) | $8.874$ (21.858) | $8.258$ (17.101) |
${\gamma}_{0}$ | 0.1 | $-0.093$ (1.333) | $0.185$ (1.335) | $0.109$ (0.962) |
${\gamma}_{1}$ | 0.5 | $0.699$ (1.191) | $0.704$ (0.926) | $0.534$ (0.748) |
${\gamma}_{2}$ | −1 | $-1.159$ (2.140) | $-1.133$ (1.587) | $-0.971$ (1.141) |
${\gamma}_{3}$ | 0.6 | $0.933$ (1.269) | $0.608$ (0.771) | $0.692$ (0.844) |

**Table 4.** AIC for the regression model of the onset of hypertension in males. The asterisk (*) indicates that variable selection is performed for the uncured probability $p\left(\mathit{Z}\right)$.

Model | Distribution | Number of Parameters | AIC |
---|---|---|---|
Proportional hazard (PH) | Exponential | 19 | 7120.407 |
Proportional hazard (PH) | Weibull | 20 | 7030.615 |
AFT | Log-normal | 20 | 7002.352 |
AFT | Generalized gamma | 21 | 6999.232 |
Mixture cure + PH | Exponential | 38 | 7103.040 |
Mixture cure + PH | Weibull | 39 | 7047.378 |
Mixture cure + AFT | Log-normal | 39 | 7005.633 |
Mixture cure + AFT | Generalized gamma * | 40 | 7004.674 |
Mixture cure + AFT | Generalized gamma * | 29 | 6992.934 |
Mixture cure + AFT frailty | Generalized gamma, shifted gamma * | 41 | 7003.011 |
Mixture cure + AFT frailty | Generalized gamma, shifted gamma * | 30 | 6987.012 |
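The comparison in Table 4 rests on the usual definition $\mathrm{AIC} = -2\log L + 2k$, where $k$ is the number of parameters. The sketch below, with hypothetical function names, recovers the implied maximized log-likelihood from each reported pair $(k, \mathrm{AIC})$ and picks the model with the smallest AIC. The rows are copied from Table 4; nothing is re-estimated.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: -2 log L + 2 k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def log_likelihood_from_aic(aic_value: float, n_params: int) -> float:
    """Invert the AIC definition to recover the maximized log-likelihood."""
    return n_params - aic_value / 2.0


# (model, distribution, number of parameters, AIC), copied from Table 4 (males).
table4 = [
    ("PH", "Exponential", 19, 7120.407),
    ("PH", "Weibull", 20, 7030.615),
    ("AFT", "Log-normal", 20, 7002.352),
    ("AFT", "Generalized gamma", 21, 6999.232),
    ("Mixture cure + PH", "Exponential", 38, 7103.040),
    ("Mixture cure + PH", "Weibull", 39, 7047.378),
    ("Mixture cure + AFT", "Log-normal", 39, 7005.633),
    ("Mixture cure + AFT", "Generalized gamma", 40, 7004.674),
    ("Mixture cure + AFT", "Generalized gamma", 29, 6992.934),
    ("Mixture cure + AFT frailty", "Generalized gamma, shifted gamma", 41, 7003.011),
    ("Mixture cure + AFT frailty", "Generalized gamma, shifted gamma", 30, 6987.012),
]

best = min(table4, key=lambda row: row[3])  # smallest AIC is preferred
for model, dist, k, value in table4:
    print(f"{model:28s} {dist:34s} k={k:2d}  logL={log_likelihood_from_aic(value, k):.3f}")
print("Smallest AIC:", best)
```

Because the penalty term is $2k$, a model with fewer parameters can attain a smaller AIC even when its log-likelihood is slightly lower, which is why the variable-selected fits are listed separately.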

**Table 5.** AIC for the regression model of the onset of hypertension in females. The asterisk (*) indicates that variable selection is performed for the uncured probability $p\left(\mathit{Z}\right)$.

Model | Distribution | Number of Parameters | AIC |
---|---|---|---|
Proportional hazard (PH) | Exponential | 19 | 11,804.01 |
Proportional hazard (PH) | Weibull | 20 | 11,644.16 |
AFT | Log-normal | 20 | 11,596.45 |
AFT | Generalized gamma | 21 | 11,586.00 |
Mixture cure + PH | Exponential | 38 | 11,798.48 |
Mixture cure + PH | Weibull | 39 | 11,669.49 |
Mixture cure + AFT | Log-normal | 39 | 11,609.63 |
Mixture cure + AFT | Generalized gamma * | 40 | 11,600.23 |
Mixture cure + AFT | Generalized gamma * | 27 | 11,579.07 |
Mixture cure + AFT frailty | Generalized gamma, shifted gamma * | 41 | 11,596.26 |
Mixture cure + AFT frailty | Generalized gamma, shifted gamma * | 26 | 11,575.05 |

**Table 6.** Estimation results for the regression coefficients when variable selection is performed by applying the proposed model in males. CI is the confidence interval. The asterisk (*) and dagger (†) indicate that the p-value is less than $0.10$ and $0.05$, respectively.

Inference of $\mathit{\beta}$ (Regression Coefficients for the Uncured Group):

Covariate | Estimates | 95% CI | p-Value |
---|---|---|---|
age | $-0.0071$ | $(-0.0188,0.0046)$ | $0.232$ |
waist | $-0.0077$ | $(-0.0183,0.0029)$ | $0.155$ |
exe1h_day | $-0.0392$ | $(-0.2006,0.1221)$ | $0.634$ |
exe30_2_week | $0.0470$ | $(-0.0112,0.0517)$ | $0.562$ |
sleep_good | $0.0138$ | $(-0.1732,0.2009)$ | $0.885$ |
walk_speed | $-0.1112$ | $(-0.2617,0.0393)$ | $0.148$ |
eat_speed_n | $0.1386$ | $(-0.1019,0.3791)$ | $0.259$ |
eat_speed_f | $0.0672$ | $(-0.1617,0.2961)$ | $0.565$ |
eat_b_sleep | $0.0025$ | $(-0.1920,0.1970)$ | $0.980$ |
snacking | $-0.3154$ | $(-0.6195,-0.0112)$ | $0.042$ † |
breakfast | $-0.1784$ | $(-0.4329,0.0760)$ | $0.169$ |
weight_move | $-0.0227$ | $(-0.1956,0.1502)$ | $0.797$ |
plus10kg | $0.0162$ | $(-0.1896,0.2219)$ | $0.878$ |
smoking | $-0.0673$ | $(-0.2229,0.0884)$ | $0.397$ |
drink_amount2 | $-0.1636$ | $(-0.4831,0.1558)$ | $0.315$ |
drink_amount3 | $-0.2134$ | $(-0.4313,0.0045)$ | $0.054$ * |
drink_amount4 | $-0.1120$ | $(-0.3288,0.1048)$ | $0.311$ |
drink_amount5 | $-0.1824$ | $(-0.4299,0.0650)$ | $0.148$ |

Inference of $\mathbf{\gamma}$ (Regression Coefficients for the Cured Group):

Covariate | Point Estimates | 95% CI | p-Value |
---|---|---|---|
Intercept | $-4.9869$ | $(-7.7813,-2.1924)$ | $0.0005$ † |
age | $0.0882$ | $(0.0423,0.1340)$ | $0.0002$ † |
eat_speed_f | $0.8427$ | $(-0.0571,1.7424)$ | $0.0664$ * |
eat_speed_n | $0.8394$ | $(-0.2355,1.9144)$ | $0.1259$ |
kanshyoku | $-1.4826$ | $(-2.4026,-0.5626)$ | $0.0016$ † |
plus10kg | $0.9488$ | $(0.0316,1.8666)$ | $0.0426$ † |
drink_amount2 | $-0.9510$ | $(-1.9441,0.0420)$ | $0.0605$ * |
drink_amount4 | $1.1910$ | $(0.0918,2.2901)$ | $0.0337$ † |
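The intervals and p-values in Table 6 can be read together if one assumes they are Wald-type quantities based on a normal approximation, that is, estimate $\pm\, 1.96 \times$ SE with a two-sided p-value $2\{1-\Phi(|z|)\}$. This is an assumption made for illustration, and the function names below are hypothetical. Under it, the standard error can be backed out of a reported interval, as the sketch does for the snacking coefficient of $\mathit{\beta}$.

```python
import math

Z_975 = 1.959964  # 97.5% quantile of the standard normal distribution


def se_from_ci(lower: float, upper: float) -> float:
    """Standard error implied by a symmetric 95% Wald interval."""
    return (upper - lower) / (2.0 * Z_975)


def wald_summary(estimate: float, std_error: float):
    """Return the 95% Wald interval and the two-sided p-value for H0: coefficient = 0."""
    half_width = Z_975 * std_error
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return (estimate - half_width, estimate + half_width), p_value


# Snacking row of Table 6: estimate -0.3154, 95% CI (-0.6195, -0.0112), p = 0.042.
se_snacking = se_from_ci(-0.6195, -0.0112)
ci_snacking, p_snacking = wald_summary(-0.3154, se_snacking)
print(se_snacking, ci_snacking, p_snacking)
```

For this row the recomputed p-value comes out at roughly 0.042, in line with the reported value, which suggests (but does not prove) that Wald-type inference was used.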

**Table 7.** Results of parameter estimation when variable selection is performed by applying the proposed model in males.

Parameter | Distribution | Estimates | Standard error |
---|---|---|---|
μ | Generalized gamma | $8.4404$ | $0.6150$ |
σ | Generalized gamma | $0.8767$ | $0.0866$ |
q | Generalized gamma | $-0.3141$ | $0.2701$ |
θ | Shifted gamma | $13.1350$ | $7.0537$ |

**Table 8.** Estimation results for the regression coefficients when variable selection is performed by applying the proposed model in females. CI is the confidence interval. The asterisk (*) and dagger (†) indicate that the p-value is less than $0.10$ and $0.05$, respectively.

Inference of $\mathit{\beta}$ (Regression Coefficients for the Uncured Group):

Covariate | Estimates | 95% CI | p-Value |
---|---|---|---|
age | $-0.0103$ | $(-0.0222,0.0016)$ | $0.089$ * |
waist | $-0.0034$ | $(-0.0100,0.0033)$ | $0.321$ |
exe1h_day | $-0.0493$ | $(-0.1717,0.0732)$ | $0.430$ |
exe30_2_week | $-0.0230$ | $(-0.1456,0.0995)$ | $0.712$ |
sleep_good | $0.0855$ | $(-0.1126,0.1297)$ | $0.890$ |
walk_speed | $0.0401$ | $(-0.0726,0.1528)$ | $0.486$ |
eat_speed_n | $-0.0651$ | $(-0.2045,0.0742)$ | $0.360$ |
eat_speed_f | $0.0520$ | $(-0.1051,0.2092)$ | $0.516$ |
eat_b_sleep | $-0.0342$ | $(-0.2395,0.1711)$ | $0.744$ |
snacking | $0.0442$ | $(-0.1075,0.1958)$ | $0.568$ |
breakfast | $-0.0781$ | $(-0.3625,0.2063)$ | $0.591$ |
weight_move | $-0.0210$ | $(-0.1589,0.1169)$ | $0.765$ |
plus10kg | $0.0301$ | $(-0.1391,0.1993)$ | $0.727$ |
smoking | $-0.1611$ | $(-0.0638,0.3859)$ | $0.160$ |
drink_amount2 | $-0.0098$ | $(-0.1464,0.1268)$ | $0.888$ |
drink_amount3 | $-0.0500$ | $(-0.2688,0.1688)$ | $0.654$ |
drink_amount4 | $-0.1566$ | $(-0.5165,0.2032)$ | $0.394$ |
drink_amount5 | $-0.2162$ | $(-0.8744,0.4419)$ | $0.520$ |

Inference of $\mathbf{\gamma}$ (Regression Coefficients for the Cured Group):

Covariate | Point Estimates | 95% CI | p-Value |
---|---|---|---|
Intercept | $-6.0049$ | $(-8.3249,-3.6849)$ | <0.001 † |
age | $0.1045$ | $(0.0645,0.1445)$ | <0.001 † |
eat_speed_f | $0.5600$ | $(-0.1536,1.2736)$ | $0.124$ |
plus10kg | $0.6271$ | $(-0.1873,1.4415)$ | $0.131$ |

**Table 9.** Results of parameter estimation when variable selection is performed by applying the proposed model in females.

Parameter | Distribution | Estimates | Standard error |
---|---|---|---|
μ | Generalized gamma | $8.1034$ | $0.4688$ |
σ | Generalized gamma | $0.9653$ | $0.0794$ |
q | Generalized gamma | $-0.8065$ | $0.2772$ |
θ | Shifted gamma | $22.9620$ | $9.9425$ |

