Article

Detecting Clinical Risk Shift Through log–logistic Hazard Change-Point Model

by Shobhana Selvaraj Nadar, Vasudha Upadhyay and Savitri Joshi *

Department of Applied Sciences, Indian Institute of Information Technology Allahabad, Prayagraj 211015, India
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2025, 13(9), 1457; https://doi.org/10.3390/math13091457
Submission received: 27 March 2025 / Revised: 20 April 2025 / Accepted: 23 April 2025 / Published: 29 April 2025
(This article belongs to the Special Issue Advances in Statistical Methods with Applications)

Abstract: The change–point problem is about identifying when a pattern or trend shifts in time–ordered data. In survival analysis, change–point detection focuses on identifying alterations in the distribution of time–to–event data, which may be subject to censoring or truncation. In this paper, we introduce a change–point in the hazard rate of the log–logistic distribution. The log–logistic distribution is a flexible probability distribution used in survival analysis, reliability engineering, and economics. It is particularly useful for modeling time–to–event data exhibiting decreasing hazard rates. We estimate the parameters of the proposed change–point model using profile maximum likelihood estimation. We also carry out a simulation study and Bayesian analysis using the Metropolis–Hastings algorithm to study the properties of the proposed estimators. The proposed log–logistic change–point model is applied to survival data from kidney catheter patients and acute myeloid leukemia (AML) cases. A late change–point with a decreasing scale parameter in the catheter data reflects an abrupt increase in risk due to delayed complications, whereas an early change–point with an increasing scale parameter in AML indicates high early mortality followed by slower hazard progression in survivors. We find that the log–logistic change–point model performs better in comparison to the existing change–point models.

1. Introduction and Background

In survival analysis, the log–logistic distribution serves as an alternative to the more commonly used Weibull and log–normal distributions, offering an advantage when the data display a decreasing hazard rate. The probability density function (PDF) and the hazard function of the log–logistic distribution are, respectively, given by the following:
$$f(x;\alpha) = \frac{\alpha}{(\alpha+x)^{2}}, \qquad x > 0,$$
$$h(x;\alpha) = \frac{1}{\alpha+x}, \qquad x > 0, \tag{1}$$
where $\alpha > 0$.
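As a quick numerical illustration, here is a minimal R sketch (ours, for illustration; the function names are not from the paper) of the density and hazard above. Both curves decrease in x, which is exactly the setting the model targets:

```r
# Density and hazard of the one-parameter log-logistic form above.
f_ll <- function(x, alpha) alpha / (alpha + x)^2  # PDF f(x; alpha)
h_ll <- function(x, alpha) 1 / (alpha + x)        # hazard f/S, since S(x) = alpha/(alpha + x)

x <- seq(0, 10, by = 0.1)
plot(x, h_ll(x, alpha = 2), type = "l", xlab = "x", ylab = "h(x)",
     main = "log-logistic hazard, alpha = 2 (decreasing in x)")
```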
The log–logistic distribution is widely used in survival analysis due to its ability to model decreasing hazard rates, making it suitable where the risk of an event declines over time. Several studies have contributed to the application of the log–logistic model. To mention a few, Gupta et al. (1999) [1] considered the log–logistic model to analyze lung cancer data and evaluated the confidence intervals for both parameters and the critical points. They also computed the mean residual life function and maximum likelihood estimators, which were found to be unique for both parameters. Bennett (1983) [2] considered a log–logistic regression model by equating one parameter of the distribution with a linear function of the covariates, which was a special case of the proportional odds model, and tested it on patients with advanced inoperable lung cancer. He fitted the data using the Generalized Linear Interactive Modeling software and found that the model captured the general shape of the data very well. Muse et al. (2021) [3] comprehensively reviewed the log–logistic distribution and its generalizations, highlighting its mathematical properties, applications, and advancements in statistical modeling.
Apart from analyzing survival data, the log–logistic distribution has gained increasing traction in various fields such as economics, medical sciences, hydrology, and social sciences. Lemonte (2014) [4] incorporated the beta distribution as a weight function, allowing for greater adaptability in fitting real–world data. Ashkar and Mahdi (2006) [5] investigated the use of the generalized moment (GM) method for estimating parameters and quantiles in the two–parameter log–logistic (LL2) distribution. They compared the GM method with other common estimation techniques, including the generalized probability–weighted moments and the maximum likelihood method, and observed that the GM method provided more accurate parameter estimates in the LL2 case when moment orders were appropriately selected. Shoukri, Mian, and Tracy (1988) [6] examined the sampling properties of the estimators for the log–logistic distribution, focusing on its application to Canadian precipitation data. They compared the following two estimation methods: probability–weighted moments and maximum likelihood estimation. Their findings confirmed that the log–logistic distribution provided an excellent fit for various meteorological datasets, making it a valuable tool in hydrology and climatology. De Santana et al. (2012) [7] introduced the Kumaraswamy–log–logistic distribution, a generalization of the log–logistic distribution designed to provide greater flexibility in modeling lifetime and survival data. Ramos et al. (2013) [8] introduced the Zografos–Balakrishnan–log–logistic distribution, an extension of the traditional log–logistic model. This distribution accommodated various hazard function shapes, enhancing flexibility in survival analysis. Alfaer et al. (2021) [9] introduced the extended log–logistic (Ex–LL) distribution, a flexible extension of the traditional log–logistic model designed to better capture actuarial and engineering risk data, particularly those with heavy–tailed distributions. Felipe et al. (2023) [10] proposed a robust estimation approach for the log–logistic distribution using minimum density power divergence estimators (MDPDEs) to address the limitations of traditional methods in the presence of outliers. The study demonstrated, through simulation and real data, that the MDPDEs offered a favorable balance between efficiency and robustness, making them suitable for practical survival analysis. Gaire and Gurung (2024) [11] introduced a new four–parameter log–logistic distribution based on the Rayleigh distribution. They derived statistical properties of the distribution, such as its moments and survival function, and estimated the parameters using two methods. Applying the model to two real–world datasets, they found that it performed better than several competing probability models.

Change–Point Problems in Survival Analysis

The problem of change–points in survival analysis has been a topic of interest for researchers in the past few decades. Matthews and Farewell (1982) [12] were among the first to consider change–point detection in hazard functions. Chang, Chen, and Hsiung (1994) [13] developed estimation methods for change–point hazard rate models under random censorship. Gijbels and Gürler (2003) [14] proposed a model incorporating a hazard jump at a specific time point and estimated the unknown parameters using maximum likelihood estimation, a least squares estimator, and the cumulative hazard–based approach introduced by Chang, Chen, and Hsiung. Goodman, Li, and Tiwari (2011) [15] developed methods to detect multiple change–points in piecewise constant hazard functions. Williams and Kim (2013) [16] developed a Weibull hazard model considering Type–I censoring and staggered entry and studied the applicability of the model through data on chronic granulomatous disease. Joshi et al. (2017) [17] proposed the Lindley hazard change–point model, estimating that the risk of death from a bone marrow transplant in patients with acute lymphoblastic leukemia decreases significantly after a certain time. Palmeros et al. (2018) [18] considered a Weibull hazard regression model in the presence of covariates and censored observations. They estimated the parameters using maximum likelihood estimation and Monte Carlo simulation and applied the model to data on chronic granulomatous disease. Joshi and Rattihalli (2019) [19] proposed a general hazard regression change–point model where the effect of covariates on lifetime begins after the change–point. Joshi and Rattihalli (2020) [20] introduced the Exponential–Lindley hazard change–point model, in which the failure rate is constant (exponential) before the change–point and follows a Lindley failure rate after it. Gierz and Park (2022) [21] proposed a sequential testing approach for detecting multiple change–points in a Weibull accelerated failure time (AFT) model and conducted a simulation study to show that the proposed method detected change–points and estimated the model quite accurately.
Although the log–logistic model is a popular survival model, it has not yet been explored in change–point analysis. Thus, in this paper, we aim to achieve the following:
  • Introduce a novel hazard change–point model, namely the log–logistic hazard change–point model;
  • Estimate the model parameters using the profile maximum likelihood estimation method and the Bayesian method, as well as compare the results for varying sample sizes and censoring levels;
  • Show the suitability of the model to detect a change–point in two real–world datasets.
The article is organized as follows: After an introduction and background in Section 1, the proposed change–point model is introduced in Section 2. In Section 3, parameter estimation using the profile maximum likelihood estimation procedure is described, followed by a simulation study in Section 4. We also evaluate the performance of the estimators through a Bayesian analysis, carried out in Section 5. In Section 6, a real–life data analysis is performed using the proposed change–point model. Lastly, Section 7 discusses the findings and future work.

2. log–logistic Hazard Change–Point Model for Survival Analysis

Let T be a survival time variable with a hazard function given by Equation (1). Suppose that, after a certain point in time, say $\tau$, the scale parameter changes from $\alpha_1$ to $\alpha_2$. The hazard model under this setup is given by the following equation:
$$h(t) = \begin{cases} \dfrac{1}{\alpha_1 + t}, & 0 \le t \le \tau, \\[6pt] \dfrac{1}{\alpha_2 + t}, & t > \tau, \end{cases} \tag{2}$$
where $\alpha_1 > 0$ and $\alpha_2 + \tau > 0$. It can be seen that if $\alpha_1 < (>)\ \alpha_2$, then the hazard function beyond the change–point is at a lower (higher) level than the hazard function before the change–point. Note also that both pieces of the hazard function are decreasing in $t$ (Figure 1).
The PDF and the survival function corresponding to the above model are, respectively, as follows:
$$f(t) = \begin{cases} \dfrac{\alpha_1}{(\alpha_1+t)^2}, & 0 \le t \le \tau, \\[6pt] \dfrac{\alpha_1(\alpha_2+\tau)}{(\alpha_1+\tau)(\alpha_2+t)^2}, & t > \tau, \end{cases}$$
and
$$S(t) = \begin{cases} \dfrac{\alpha_1}{\alpha_1+t}, & 0 \le t \le \tau, \\[6pt] \dfrac{\alpha_1(\alpha_2+\tau)}{(\alpha_1+\tau)(\alpha_2+t)}, & t > \tau. \end{cases}$$
The model thus has three parameters, $\alpha_1$, $\alpha_2$, and $\tau$, to be estimated. In the next section, we discuss the profile maximum likelihood estimation (PMLE) method.
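A minimal R sketch (ours, not the authors' code) encodes the model's three functions directly; a1, a2, and tau stand for $\alpha_1$, $\alpha_2$, and $\tau$, and the density satisfies f(t) = h(t)S(t) on both sides of the change–point:

```r
# Change-point hazard, survival, and density from Equation (2) and the
# expressions above.
h_cp <- function(t, a1, a2, tau) ifelse(t <= tau, 1 / (a1 + t), 1 / (a2 + t))
S_cp <- function(t, a1, a2, tau)
  ifelse(t <= tau,
         a1 / (a1 + t),
         a1 * (a2 + tau) / ((a1 + tau) * (a2 + t)))
f_cp <- function(t, a1, a2, tau) h_cp(t, a1, a2, tau) * S_cp(t, a1, a2, tau)  # f = h * S

curve(h_cp(x, a1 = 1, a2 = 2, tau = 1), from = 0, to = 5, xlab = "t",
      ylab = "h(t)")  # hazard drops to a lower level at tau = 1 since a1 < a2
```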

3. Parameter Estimation Using PMLE

To estimate the parameters, we opt for PMLE; however, one may use any built-in multi-parameter optimization routine in R or another suitable platform.
We shall consider random right censoring; that is, instead of always observing the survival time $T_i$, we observe the pair $(T_i, C_i)$, where $C_i$ is the right-censoring time, together with a censoring indicator $\epsilon_i$, where $\epsilon_i = 1$ for an uncensored observation and $\epsilon_i = 0$ for a censored one. The study variable then becomes $X_i = \min(T_i, C_i)$.
With the assumption of independent censoring, the likelihood function for the pair ( X i , ϵ i ) is expressed as follows:
$$L(\alpha_1,\alpha_2,\tau \mid \underline{x}) \propto \prod_{i=1}^{n} \big(f(x_i)\big)^{\epsilon_i} \prod_{i=1}^{n} \big(S(x_i)\big)^{1-\epsilon_i},$$
where n is the total number of observations in the data. In the presence of a change–point, the likelihood changes to the following:
$$L(\alpha_1,\alpha_2,\tau \mid \underline{x}) \propto \prod_{i=1}^{n} \big(f(x_i)\big)^{I(X_i \le \tau)\,\epsilon_i} \prod_{i=1}^{n} \big(S(x_i)\big)^{I(X_i \le \tau)(1-\epsilon_i)} \prod_{i=1}^{n} \big(f(x_i)\big)^{I(X_i > \tau)\,\epsilon_i} \prod_{i=1}^{n} \big(S(x_i)\big)^{I(X_i > \tau)(1-\epsilon_i)},$$
where $I(\cdot)$ is the indicator function. Substituting the expressions for $f(x)$ and $S(x)$ under the proposed log–logistic model, we obtain the likelihood function as follows:
$$L(\alpha_1,\alpha_2,\tau \mid \underline{x}) \propto \prod_{i=1}^{n} \left[\frac{\alpha_1}{(\alpha_1+x_i)^2}\right]^{I(X_i \le \tau)\,\epsilon_i} \prod_{i=1}^{n} \left[\frac{\alpha_1}{\alpha_1+x_i}\right]^{I(X_i \le \tau)(1-\epsilon_i)} \prod_{i=1}^{n} \left[\frac{\alpha_1(\alpha_2+\tau)}{(\alpha_1+\tau)(\alpha_2+x_i)^2}\right]^{I(X_i > \tau)\,\epsilon_i} \prod_{i=1}^{n} \left[\frac{\alpha_1(\alpha_2+\tau)}{(\alpha_1+\tau)(\alpha_2+x_i)}\right]^{I(X_i > \tau)(1-\epsilon_i)}.$$
The log-likelihood function corresponding to the above expression is as follows:
$$\log L(\alpha_1,\alpha_2,\tau \mid \underline{x}) \propto \sum_{i=1}^{n} \epsilon_i I(X_i \le \tau) \log\frac{\alpha_1}{(\alpha_1+x_i)^2} + \sum_{i=1}^{n} (1-\epsilon_i) I(X_i \le \tau) \log\frac{\alpha_1}{\alpha_1+x_i} + \sum_{i=1}^{n} \epsilon_i I(X_i > \tau) \log\frac{\alpha_1(\alpha_2+\tau)}{(\alpha_1+\tau)(\alpha_2+x_i)^2} + \sum_{i=1}^{n} (1-\epsilon_i) I(X_i > \tau) \log\frac{\alpha_1(\alpha_2+\tau)}{(\alpha_1+\tau)(\alpha_2+x_i)}.$$
Let $n_u$ be the total number of uncensored observations in the data. Let $u(\tau) = \sum_{i=1}^{n} I(X_i \le \tau)\,\epsilon_i$ and $c(\tau) = \sum_{i=1}^{n} I(X_i \le \tau)(1-\epsilon_i)$ be the numbers of uncensored and censored observations up to time $\tau$, respectively. The log-likelihood function can be further simplified as follows:
$$\log L(\alpha_1,\alpha_2,\tau \mid \underline{x}) \propto n \log \alpha_1 - 2\sum_{i=1}^{n} \epsilon_i I(X_i \le \tau) \log(x_i+\alpha_1) - \sum_{i=1}^{n} (1-\epsilon_i) I(X_i \le \tau) \log(x_i+\alpha_1) - \big(n - u(\tau) - c(\tau)\big)\big[\log(\alpha_1+\tau) - \log(\alpha_2+\tau)\big] - 2\sum_{i=1}^{n} \epsilon_i I(X_i > \tau) \log(x_i+\alpha_2) - \sum_{i=1}^{n} (1-\epsilon_i) I(X_i > \tau) \log(x_i+\alpha_2).$$
Let $X_{(i)}$ denote the order statistics of the $X_i$, and let $\epsilon_{(i)}$ be the corresponding censoring indicators. To estimate $\alpha_1$, $\alpha_2$, and $\tau$, we first treat $\tau\,(=\tilde{\tau})$ as known, taking $\tilde{\tau}$ to be the mid-point of each of the $(n-1)$ intervals $[X_{(j)}, X_{(j+1)}]$, $j = 1, 2, \ldots, (n-1)$. The constrained (profile) log-likelihood is then given as follows:
$$\log L(\alpha_1,\alpha_2,\tilde{\tau} \mid \underline{x}) \propto n \log \alpha_1 - 2\sum_{i=1}^{n} \epsilon_{(i)} I(X_{(i)} \le \tilde{\tau}) \log(x_{(i)}+\alpha_1) - \sum_{i=1}^{n} (1-\epsilon_{(i)}) I(X_{(i)} \le \tilde{\tau}) \log(x_{(i)}+\alpha_1) - \big(n - u(\tilde{\tau}) - c(\tilde{\tau})\big)\big[\log(\alpha_1+\tilde{\tau}) - \log(\alpha_2+\tilde{\tau})\big] - 2\sum_{i=1}^{n} \epsilon_{(i)} I(X_{(i)} > \tilde{\tau}) \log(x_{(i)}+\alpha_2) - \sum_{i=1}^{n} (1-\epsilon_{(i)}) I(X_{(i)} > \tilde{\tau}) \log(x_{(i)}+\alpha_2). \tag{3}$$
By applying the principle of maximum likelihood (ML) estimation and taking the partial derivatives of the simplified log-likelihood with respect to α 1 and α 2 , we get the following equations:
$$\frac{\partial \log L(\alpha_1,\alpha_2,\tau \mid \underline{x})}{\partial \alpha_1} = \frac{n}{\alpha_1} - 2\sum_{i=1}^{n} \frac{\epsilon_i I(X_i \le \tau)}{\alpha_1+x_i} - \sum_{i=1}^{n} \frac{(1-\epsilon_i) I(X_i \le \tau)}{\alpha_1+x_i} - \frac{n - u(\tau) - c(\tau)}{\alpha_1+\tau},$$
and
$$\frac{\partial \log L(\alpha_1,\alpha_2,\tau \mid \underline{x})}{\partial \alpha_2} = \frac{n - u(\tau) - c(\tau)}{\alpha_2+\tau} - 2\sum_{i=1}^{n} \frac{\epsilon_i I(X_i > \tau)}{\alpha_2+x_i} - \sum_{i=1}^{n} \frac{(1-\epsilon_i) I(X_i > \tau)}{\alpha_2+x_i}.$$
The above likelihood equations cannot be solved analytically for $\alpha_1$ and $\alpha_2$, so we employ numerical optimization via the nlm() and optimize() functions in R [22]. To estimate $\alpha_1$, we treat $\alpha_2$ as known (a nuisance parameter), initializing it at the value used in the data simulation. Maximizing Equation (3) with respect to $\alpha_1$ using nlm(), with the same starting value of $\alpha_1$ as in the data simulation, gives the PML estimate of $\alpha_1$ for the $j$th interval, denoted $\hat{\alpha}_{1j}$. Replacing $\alpha_1$ with $\hat{\alpha}_{1j}$ in (3) and maximizing over $\alpha_2$ using optimize(), with (0, 3) as the search interval (chosen so that the estimate falls inside it), yields the PML estimate $\hat{\alpha}_{2j}$. Substituting $\hat{\alpha}_{1j}$ and $\hat{\alpha}_{2j}$ into (3) and maximizing over $[X_{(1)}, X_{(n)}]$ gives the estimate $\hat{\tau}_j$. Proceeding in this way, we obtain $\hat{\alpha}_{1j}$, $\hat{\alpha}_{2j}$, and $\hat{\tau}_j$ for all the intervals.
We then compute the log-likelihood $\log L_j(\hat{\alpha}_{1j}, \hat{\alpha}_{2j}, \hat{\tau}_j)$, $j = 1, 2, \ldots, (n-1)$. If the maximum of these values is attained at index $\bar{j}$, the corresponding estimates $\hat{\alpha}_{1\bar{j}}$, $\hat{\alpha}_{2\bar{j}}$, $\hat{\tau}_{\bar{j}}$ are the PML estimates of the parameters. A code sketch of this search is given below.
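The following compact R sketch illustrates the procedure (our simplified illustration with our own function names, not the authors' implementation); x holds the observed times and eps the censoring indicators:

```r
# Profile maximum likelihood search over candidate change-points.
neg_loglik <- function(a1, a2, tau, x, eps) {
  pre <- x <= tau  # observations at or before the candidate change-point
  ll <- sum(eps[pre] * log(a1 / (a1 + x[pre])^2)) +
        sum((1 - eps[pre]) * log(a1 / (a1 + x[pre]))) +
        sum(eps[!pre] * log(a1 * (a2 + tau) / ((a1 + tau) * (a2 + x[!pre])^2))) +
        sum((1 - eps[!pre]) * log(a1 * (a2 + tau) / ((a1 + tau) * (a2 + x[!pre]))))
  -ll  # nlm() and optimize() minimize, so return the negative log-likelihood
}

profile_mle <- function(x, eps, a1_init = 1, a2_init = 1) {
  xs   <- sort(x)
  mids <- (xs[-length(xs)] + xs[-1]) / 2  # mid-points of the (n - 1) intervals
  fits <- t(sapply(mids, function(tau) {
    a1 <- nlm(function(a) neg_loglik(a, a2_init, tau, x, eps), a1_init)$estimate
    a2 <- optimize(function(a) neg_loglik(a1, a, tau, x, eps), c(0, 3))$minimum
    th <- optimize(function(t) neg_loglik(a1, a2, t, x, eps), range(x))$minimum
    c(a1 = a1, a2 = a2, tau = th, loglik = -neg_loglik(a1, a2, th, x, eps))
  }))
  fits[which.max(fits[, "loglik"]), ]  # estimates from the best interval
}
```

A production version would reparametrize (e.g., optimize over $\log \alpha_1$) to keep the scale parameters positive; the sketch assumes the optimizer stays in the positive region.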

4. Simulation Study

In order to simulate data for our analysis, we use the inverse cumulative distribution function (CDF) technique. We draw a random sample of the required size from the standard uniform distribution $U(0,1)$; if the distribution function is denoted by $F(x)$, solving $F(x) = u$ for each generated $u$ yields the required data $x = F^{-1}(u)$ for a fixed value of $(\alpha_1, \alpha_2, \tau)$.
The distribution function for the model proposed in Equation (2) is as follows:
$$F(x) = 1 - S(x) = \begin{cases} \dfrac{x}{\alpha_1+x}, & 0 \le x \le \tau, \\[6pt] \dfrac{(\alpha_1+\tau)x + (\alpha_2-\alpha_1)\tau}{(\alpha_1+\tau)(\alpha_2+x)}, & x > \tau. \end{cases}$$
For the change–point $\tau$, the corresponding uniform number is $u_0 = F(\tau) = \dfrac{\tau}{\alpha_1+\tau}$. The generated sample is thus given by the following:
$$x = F^{-1}(u) = \begin{cases} \dfrac{u\,\alpha_1}{1-u}, & 0 \le u \le u_0, \\[6pt] \dfrac{1}{1-u}\left[\dfrac{\alpha_1(\alpha_2+\tau)}{\alpha_1+\tau} - \alpha_2 + \alpha_2 u\right], & u_0 < u \le 1. \end{cases}$$
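A minimal R implementation of this sampler (with our own function name) could look as follows:

```r
# Inverse-CDF sampler for the log-logistic change-point model.
r_llcp <- function(n, a1, a2, tau) {
  u  <- runif(n)          # standard uniform draws
  u0 <- tau / (a1 + tau)  # u0 = F(tau)
  ifelse(u <= u0,
         u * a1 / (1 - u),                                        # 0 <= u <= u0
         (a1 * (a2 + tau) / (a1 + tau) - a2 + a2 * u) / (1 - u))  # u > u0
}

set.seed(1)
x <- r_llcp(1000, a1 = 2, a2 = 1, tau = 3)  # one of the settings studied below
```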
To study how well the proposed estimators work, we apply PMLE to two sets of parameters: ($\alpha_1$ = 1, $\alpha_2$ = 2, $\tau$ = 1) and ($\alpha_1$ = 2, $\alpha_2$ = 1, $\tau$ = 3). The estimation procedure is repeated 1000 times to ensure the reliability of the results. For each simulation run, we generate random samples from the log–logistic change–point model with the predefined parameters. We consider sample sizes $n$ = 20, 50, 200, 500, and 1000 under four levels of right-censoring: 0%, 10%, 20%, and 50%. To assess the performance of the proposed estimators, we calculate the bias and mean square error (MSE) and present them in Table 1 and Table 2.

Conclusion of the Simulation Study

From Table 1 and Table 2, we observe the following:
  • The estimators for α 1 , α 2 , and τ are consistently close to the true values across all sample sizes and censoring levels, with both bias and MSE decreasing as the sample size increases. This indicates that the estimators are approximately unbiased and consistent, even under high censoring. However, an increase in censoring percentage increases the MSE, because it introduces uncertainty due to missing information in the likelihood.
  • We plot bee-swarm plots to visualize the estimates of all the parameters (Figure A1, Figure A2, Figure A3 and Figure A7, Figure A8, Figure A9); all plots show a similar pattern, indicating that the estimators are asymptotically unbiased and consistent.
  • We also plot the kernel density estimates to assess the normality. From Figure A4, Figure A5, Figure A6 and Figure A10, Figure A11, Figure A12, we see that the estimators are asymptotically normal. Here, in the case of 50% censoring when α 2 = 1 , we find that the kernel density estimator does not converge to normality at sample size 1000; however, the convergence can be achieved with a larger sample size.

5. Bayesian Analysis

We also verify the properties of the estimators obtained in Section 3 through a Bayesian analysis. We apply the Metropolis–Hastings (M-H) algorithm [23], a Markov chain Monte Carlo (MCMC) technique, to estimate $\alpha_1$, $\alpha_2$, and $\tau$. We choose normal proposal distributions for all three parameters, with suitably tuned variances.
Let $\tilde{\alpha}_1$, $\tilde{\alpha}_2$, and $\tilde{\tau}$ denote the proposed values of $\alpha_1$, $\alpha_2$, and $\tau$, respectively; then
$$\tilde{\alpha}_1 \sim N(\alpha_1, 0.5^2), \quad \tilde{\alpha}_2 \sim N(\alpha_2, 0.3^2), \quad \tilde{\tau} \sim N(\tau, 0.7^2).$$
Under a normal prior, the log of the posterior likelihood is given by the following:
$$\log L(\tilde{\alpha}_1,\tilde{\alpha}_2,\tilde{\tau} \mid \alpha_1,\alpha_2,\tau,\underline{x}) \propto n \log \tilde{\alpha}_1 - 2\sum_{i=1}^{n} \epsilon_i I(X_i \le \tilde{\tau}) \log(x_i+\tilde{\alpha}_1) - \sum_{i=1}^{n} (1-\epsilon_i) I(X_i \le \tilde{\tau}) \log(x_i+\tilde{\alpha}_1) - \big(n - u(\tilde{\tau}) - c(\tilde{\tau})\big)\big[\log(\tilde{\alpha}_1+\tilde{\tau}) - \log(\tilde{\alpha}_2+\tilde{\tau})\big] - 2\sum_{i=1}^{n} \epsilon_i I(X_i > \tilde{\tau}) \log(x_i+\tilde{\alpha}_2) - \sum_{i=1}^{n} (1-\epsilon_i) I(X_i > \tilde{\tau}) \log(x_i+\tilde{\alpha}_2) - \log\big(\sqrt{2\pi}\,0.5\big) - \frac{1}{2}\left(\frac{\tilde{\alpha}_1-\alpha_1}{0.5}\right)^2 - \log\big(\sqrt{2\pi}\,0.3\big) - \frac{1}{2}\left(\frac{\tilde{\alpha}_2-\alpha_2}{0.3}\right)^2 - \log\big(\sqrt{2\pi}\,0.7\big) - \frac{1}{2}\left(\frac{\tilde{\tau}-\tau}{0.7}\right)^2.$$
We run the algorithm for 10,000 iterations. At each iteration $i$, a proposed value is generated from $\underline{\theta}^{(i)} \sim N(\underline{\theta}^{(i-1)}, \sigma^2)$, where $\underline{\theta}^{(i-1)} = (\tilde{\alpha}_1^{(i-1)}, \tilde{\alpha}_2^{(i-1)}, \tilde{\tau}^{(i-1)})$ denotes the parameter values at the $(i-1)$th iteration. The M-H algorithm is described as follows:
Algorithm:
  • First, generate a proposal $\theta^{*}$ from the proposal distribution $N(\underline{\theta}^{(i-1)}, \sigma^2)$.
  • Next, compute the acceptance ratio
    $$r_i = \min\left\{1, \frac{L(\theta^{*} \mid \alpha_1,\alpha_2,\tau,\underline{x})}{L(\underline{\theta}^{(i-1)} \mid \alpha_1,\alpha_2,\tau,\underline{x})}\right\}.$$
  • Then, draw a random variable $u_i \sim U[0,1]$.
  • If $u_i \le r_i$, accept the proposal and set $\underline{\theta}^{(i)} = \theta^{*}$; otherwise, reject it and set $\underline{\theta}^{(i)} = \underline{\theta}^{(i-1)}$.
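A minimal R sketch of this random-walk loop follows (ours; log_post is a user-supplied function returning the log posterior above, and the comparison is done on the log scale, which is equivalent to the ratio rule but numerically safer):

```r
# Random-walk Metropolis-Hastings sampler.
mh_sample <- function(log_post, theta0, sds = c(0.5, 0.3, 0.7), n_iter = 10000) {
  chain    <- matrix(NA_real_, nrow = n_iter, ncol = length(theta0))
  theta    <- theta0
  accepted <- 0
  for (i in seq_len(n_iter)) {
    prop  <- rnorm(length(theta), mean = theta, sd = sds)  # proposal draw
    log_r <- log_post(prop) - log_post(theta)              # log acceptance ratio
    if (log(runif(1)) <= log_r) {                          # accept w.p. min(1, r)
      theta    <- prop
      accepted <- accepted + 1
    }
    chain[i, ] <- theta                                    # no burn-in is discarded
  }
  list(chain = chain, acceptance_rate = accepted / n_iter)
}
```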
We consider sample sizes of 20, 50, 200, 500, and 1000 under 0%, 10%, 20%, and 50% censoring for the parameter set $\alpha_1$ = 2, $\alpha_2$ = 1, and $\tau$ = 3. We do not discard any burn-in samples; all 10,000 draws are retained.
We plot trace plots for all parameters (shown in Figure A13, Figure A14, Figure A15) and tabulate the mean, bias, MSE, and the acceptance rate (AR), which is defined as the proportion of accepted samples out of the 10,000 simulated samples. The results are shown in Table 3.
From Table 3 and from the trace plots, we observe the following:
  • With an increase in sample size, there is a reduction in the bias and the MSE for all three parameters.
  • We observe that the AR increases with an increase in sample size. We also see that the AR is lower for the same sample size when we increase the censoring percentage. Hence, we can infer that censoring has a slight effect on the AR of the samples.
  • The trace plots show no trend or drift in any of the three parameters, indicating good mixing of the chain.
Comparing the Bayesian analysis with the PMLE, we find that the bias and MSE are substantially lower for the Bayesian analysis; this could be due to the larger number of iterations in the Bayesian setting, as well as the use of normal proposal distributions with small variances centered at the true parameter values. Overall, the simulation results and the Bayesian analysis indicate that the proposed estimators are asymptotically unbiased, consistent, and asymptotically normal.

6. Data Analysis

In this section, we apply the proposed log–logistic hazard change–point model (LLHM) to two real-life datasets and evaluate its performance by comparing it with the exponential hazard change–point model (EHM), the Lindley hazard change–point model (LHM), and the Weibull hazard change–point model (WHM).
  • Model Evaluation Criteria and Goodness-of-Fit Measures
The AIC and BIC are defined as follows [24]:
AIC = −2 ln L + 2k,  BIC = −2 ln L + k ln n,
where ln L is the maximized log-likelihood of the estimated model, k is the number of estimated parameters, and n is the sample size. Further, to assess the accuracy of the proposed model, the goodness-of-fit metrics, namely the Manhattan distance ($L_1$ norm), the Euclidean distance ($L_2$ norm), and the Kolmogorov–Smirnov (K–S) statistic, are defined as follows:
$$L_1\ \text{norm:}\quad d_1(a,b) = \sum_{j=1}^{n} \left|a_{(j)} - b_{(j)}\right|,$$
$$L_2\ \text{norm:}\quad d_2(a,b) = \sqrt{\sum_{j=1}^{n} \left(a_{(j)} - b_{(j)}\right)^2},$$
$$K\text{–}S\ \text{statistic:}\quad d_3(a,b) = \max_{1 \le k \le n} \left|\sum_{j=1}^{k} a_{(j)} - \sum_{j=1}^{k} b_{(j)}\right|.$$
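Assuming a and b hold the model and empirical CDF values on a common ordered grid, a short R sketch of these three measures (helper names are ours):

```r
# Distance measures between two vectors of CDF values on the same grid.
d1 <- function(a, b) sum(abs(a - b))                  # L1 (Manhattan) norm
d2 <- function(a, b) sqrt(sum((a - b)^2))             # L2 (Euclidean) norm
d3 <- function(a, b) max(abs(cumsum(a) - cumsum(b)))  # K-S statistic as defined above
```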
The Anderson–Darling (A-D) test is employed to determine whether the distribution of the model-generated data is statistically similar to that of the observed data. This test evaluates the null hypothesis H 0 , which states that both samples are drawn from the same underlying distribution.
The A-D statistic is given by the following:
$$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n} (2i-1)\left[\ln F\!\left(X_{(i)}\right) + \ln\!\left(1 - F\!\left(X_{(n+1-i)}\right)\right)\right],$$
where n is the sample size, X ( i ) are the ordered sample values, and F is the cumulative distribution function of the reference distribution (in this case, the empirical distribution of the observed data).
A high p-value (typically p > 0.05 ) indicates that there is no statistically significant difference between the model and observed distributions, supporting the similarity assumption and indicating a good fit of the model to the data.
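In R, one way to carry out such a two-sample comparison is the ad.test function from the kSamples package (an assumption about tooling on our part; the data vectors below are placeholders):

```r
# Two-sample Anderson-Darling comparison, assuming the 'kSamples' package.
library(kSamples)
obs <- rexp(100, rate = 0.010)            # placeholder for observed survival times
sim <- rexp(100, rate = 0.011)            # placeholder for model-generated times
ad.test(obs, sim, method = "asymptotic")  # A-D statistic with asymptotic p-value
```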

6.1. Kidney Catheter Data

The kidney dataset in the survival package comes from a medical study of recurrent kidney infections in patients using catheters. A total of 76 patients were monitored over time to record the duration until a recurrence of infection, of which 18 observations were censored. The dataset includes the time (in days) until the event of interest, such as catheter failure or removal, the censoring status, the treatment type, and other related clinical details. The study was originally described in McGilchrist and Aisbett (1991). The data can be loaded directly in R, as sketched below.
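```r
# Loading the kidney catheter data shipped with the 'survival' package.
library(survival)
x   <- kidney$time    # time in days to infection or censoring
eps <- kidney$status  # event indicator: 1 = infection observed, 0 = censored
table(eps)            # per the text, 18 of the 76 observations are censored
```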
Early catheter failures are frequently associated with acute complications, such as improper placement, immediate post-insertion infection, or suboptimal initial care. However, patients who avoid these early risks tend to reach a period of relative stability, during which the catheter can function effectively for an extended duration [25].
The change–point models are applied to the selected dataset. Model adequacy is assessed using the standard selection criteria, and their performance is evaluated through a range of predefined similarity and goodness-of-fit measures. The results of these evaluations are summarized in Table 4 and Table 5, with graphical representations of the cumulative distribution functions (CDFs) (Figure 2) illustrating how well each model fits the observed data.
From Table 5, it is observed that, among all the competing models, the proposed LLHM yields the smallest values on all distance-based evaluation metrics, indicating a better fit to the data. Likewise, the A-D test statistic is not significant at the 5% level (asymptotic p-value 0.1413), so the model cannot be rejected as a good fit. Based on the analysis in Table 4, a change–point is identified at 201 days, where the scale parameter decreases from 0.0170 to 0.0155; that is, there is an abrupt increase in the hazard rate after around 201 days, beyond which infections occur at a faster rate, signaling catheter removal or failure. This finding also aligns with the reported median duration of catheter dialysis of 190 days [25].

6.2. Acute Myeloid Leukemia

The analysis in this study is based on the LeukSurv dataset, which contains detailed survival data for 1043 patients diagnosed with acute myeloid leukemia (AML). It includes key variables such as time (in days) to event (death), event status (indicating death or censoring), age, sex, and white blood cell count (wbc) at diagnosis. For AML, the hazard rate plays a critical role in understanding disease progression and evaluating treatment outcomes. AML typically exhibits a high early hazard, especially during induction therapy, due to treatment-related toxicity and complications such as infections or bleeding [26].
The change–point models are applied to the selected dataset. Model adequacy is assessed using the standard selection criteria, and their performance is evaluated through a range of predefined similarity and goodness-of-fit measures. The results of these evaluations are summarized in Table 6 and Table 7, with graphical representations of the cumulative distribution functions (CDFs) (Figure 3) illustrating how well each model fits the observed data.
From these results (Table 6 and Table 7), the EHM shows lower AIC and BIC values, which may reflect its simplicity relative to the LLHM. However, the goodness-of-fit measures are much larger for the EHM, indicating a poor fit to the data: although the EHM looks good on AIC and BIC, it does not capture the actual pattern of the data well. Apart from AIC and BIC, the LLHM performs best across the evaluation criteria, with the smallest $L_1$ norm, $L_2$ norm, and K–S statistic, indicating that its CDF curve is closest to the observed data. Additionally, the A-D test yields a p-value that is not significant at the 5% level, further supporting that the model fits the data well.
The WHM, LHM, and EHM estimate delayed change–points ranging from approximately 479 to 535 days. In contrast, the LLHM identifies a change–point at 11 days, indicating a sharp increase in hazard very early in the disease course. From Table 6, this small change–point indicates that the risk of failure (e.g., death or relapse) is initially high during the early weeks after diagnosis. However, the increase in the scale parameter after this early change–point suggests that patients who survive this critical early period tend to have a slower progression of risk over time. The change–point at 11 days is consistent with the fact that the maximum risk of death due to AML occurs within 3–4 weeks after the start of treatment [26].

7. Conclusions and Future Work

In this paper, we introduce a log–logistic hazard model in the presence of a change–point. We derive expressions for the PDF and the survival function. We then estimate the parameters using PMLE and study the resulting estimators using different statistical measures and plots. The proposed change–point model is applied to both the Kidney Catheter and the Acute Myeloid Leukemia datasets and compared with the EHM, LHM, and WHM. In both cases, the LLHM provides the best fit, as indicated by lower values of the distance-based metrics (e.g., the K–S statistic and the $L_1$ and $L_2$ norms) and by the A-D test asymptotic p-value.
Interestingly, the change–point occurred earlier in the AML dataset, and the scale parameter increased after the change–point, suggesting that risk was highest in the early phase, but patients who survived this period tended to have slower hazard progression. In contrast, the Kidney Catheter dataset showed a later change–point with a decrease in the scale parameter, indicating that the patients were initially relatively stable, but hazard increased more sharply beyond a certain time, aligning with the possibility of delayed complications such as infections. These findings demonstrate the flexibility and clinical relevance of the LLHM in capturing disease-specific survival dynamics.
We may further extend the log–logistic change–point model to test for the presence of multiple change-points in a dataset, as well as the presence of covariates, so as to enhance the applicability of this model to diverse fields.

Author Contributions

Study Conceptualization, S.S.N. and S.J.; Methodology, S.S.N. and V.U.; Software, S.S.N. and V.U.; Validation, S.S.N., V.U., and S.J.; Formal Analysis, S.S.N. and V.U.; Data Curation, S.S.N. and V.U.; Writing—Original Draft Preparation, S.S.N. and V.U.; Writing—Review and Editing, S.S.N., V.U., and S.J.; Supervision, S.J.; Funding Acquisition, V.U. and S.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The Kidney Catheter dataset used in this study is publicly available in the survival package in R and can be accessed via the kidney dataset object. The Acute Myeloid Leukemia dataset, referred to as LeukSurv, is available through the SurvSet package. Both datasets are open access and can be obtained directly from the respective R packages (R version 4.4.2).

Acknowledgments

The first author acknowledges the cooperation of the Administration of the Guru Nanak Khalsa College of Arts, Science & Commerce, Mumbai. The second author acknowledges the financial assistance received in the form of a research fellowship from the Indian Institute of Information Technology Allahabad (IIITA) for pursuing this research. The third author acknowledges the SEED grant received from IIITA to carry out this research.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AML: Acute Myeloid Leukemia
PDF: Probability Density Function
GM: Generalized Moments
LL2: Two-Parameter log–logistic
Ex-LL: Extended log–logistic
MDPDEs: Minimum Density Power Divergence Estimators
AFT: Accelerated Failure Time
PMLE: Profile Maximum Likelihood Estimation
ML: Maximum Likelihood
CDF: Cumulative Distribution Function
MSE: Mean Square Error
MEV: Mean Estimated Value
M-H: Metropolis–Hastings
MCMC: Markov Chain Monte Carlo
AR: Acceptance Rate
LLHM: log–logistic Hazard Change-Point Model
EHM: Exponential Hazard Change-Point Model
LHM: Lindley Hazard Change-Point Model
WHM: Weibull Hazard Change-Point Model
AIC: Akaike Information Criterion
BIC: Bayesian Information Criterion
K–S: Kolmogorov–Smirnov
A-D: Anderson–Darling

Appendix A

Figure A1. Bee–swarm plot for the estimates of α₁ = 1 at 0%, 10%, 20%, and 50% censoring for sample sizes 20, 50, 200, 500, and 1000.
Figure A2. Bee–swarm plot for the estimates of α₂ = 2 at 0%, 10%, 20%, and 50% censoring for sample sizes 20, 50, 200, 500, and 1000.
Figure A3. Bee–swarm plot for the estimates of τ = 1 at 0%, 10%, 20%, and 50% censoring for sample sizes 20, 50, 200, 500, and 1000.
Figure A4. Gaussian kernel density estimates of α₁ = 1 at 0%, 10%, 20%, and 50% censoring for sample sizes 20, 50, 200, 500, and 1000.
Figure A5. Gaussian kernel density estimates of α₂ = 2 at 0%, 10%, 20%, and 50% censoring for sample sizes 20, 50, 200, 500, and 1000.
Figure A6. Gaussian kernel density estimates of τ = 1 at 0%, 10%, 20%, and 50% censoring for sample sizes 20, 50, 200, 500, and 1000.
Figure A7. Bee–swarm plot for the estimates of α₁ = 2 at 0%, 10%, 20%, and 50% censoring for sample sizes 20, 50, 200, 500, and 1000.
Figure A8. Bee–swarm plot for the estimates of α₂ = 1 at 0%, 10%, 20%, and 50% censoring for sample sizes 20, 50, 200, 500, and 1000.
Figure A9. Bee–swarm plot for the estimates of τ = 3 at 0%, 10%, 20%, and 50% censoring for sample sizes 20, 50, 200, 500, and 1000.
Figure A10. Gaussian kernel density estimates of α₁ = 2 at 0%, 10%, 20%, and 50% censoring for sample sizes 20, 50, 200, 500, and 1000.
Figure A11. Gaussian kernel density estimates of α₂ = 1 at 0%, 10%, 20%, and 50% censoring for sample sizes 20, 50, 200, 500, and 1000.
Figure A12. Gaussian kernel density estimates of τ = 3 at 0%, 10%, 20%, and 50% censoring for sample sizes 20, 50, 200, 500, and 1000.
Figure A13. Trace plots for the estimates of α₁ = 2 in the presence of 0%, 10%, 20%, and 50% censoring for sample sizes 20, 50, 200, 500, and 1000.
Figure A14. Trace plots for the estimates of α₂ = 1 in the presence of 0%, 10%, 20%, and 50% censoring for sample sizes 20, 50, 200, 500, and 1000.
Figure A15. Trace plots for the estimates of τ = 3 in the presence of 0%, 10%, 20%, and 50% censoring for sample sizes 20, 50, 200, 500, and 1000.

References

  1. Gupta, R.C.; Akman, O.; Lvin, S. A study of log-logistic model in survival analysis. Biom. J. J. Math. Methods Biosci. 1999, 41, 431–443. [Google Scholar] [CrossRef]
  2. Bennett, S. Log–logistic regression models for survival data. J. R. Stat. Soc. Ser. C Appl. Stat. 1983, 32, 165–171. [Google Scholar]
  3. Muse, A.H.; Mwalili, S.; Ngesa, O. On the log–logistic distribution and its generalizations: A survey. Int. J. Stat. Probab. 2021, 10, 93. [Google Scholar] [CrossRef]
  4. Lemonte, A.J. The beta log–logistic distribution. Braz. J. Probab. Stat. 2014, 28, 313–332. [Google Scholar] [CrossRef]
  5. Ashkar, F.; Mahdi, S. Fitting the log–logistic distribution by generalized moments. J. Hydrol. 2006, 328, 694–703. [Google Scholar]
  6. Shoukri, M.M.; Mian, I.U.H.; Tracy, D.S. Sampling properties of estimators of the log-logistic distribution with application to Canadian precipitation data. Can. J. Stat. 1988, 16, 223–236. [Google Scholar] [CrossRef]
  7. De Santana, T.V.F.; Ortega, E.M.; Cordeiro, G.M.; Silva, G.O. The Kumaraswamy-log–logistic distribution. J. Stat. Theory Appl. 2012, 11, 265–291. [Google Scholar]
  8. Ramos, P.L.; Balakrishnan, N.; Zografos, K. The Zografos-Balakrishnan log–logistic distribution: Properties and applications. J. Stat. Theory Appl. 2013, 12, 275–290. [Google Scholar]
  9. Alfaer, N.M.; Gemeay, A.M.; Aljohani, H.M.; Afify, A.Z. The extended log–logistic distribution: Inference and actuarial applications. Mathematics 2021, 9, 1386. [Google Scholar] [CrossRef]
  10. Felipe, A.; Jaenada, M.; Miranda, P.; Pardo, L. Robust parameter estimation of the log–logistic distribution based on density power divergence estimators. arXiv 2023, arXiv:2312.02662. [Google Scholar] [CrossRef]
  11. Gaire, A.K.; Gurung, Y.B. Rayleigh generated log–logistic distribution properties and performance analysis. J. Turk. Stat. Assoc. 2024, 15, 13–28. [Google Scholar]
  12. Matthews, D.E.; Farewell, V.T. On Testing for a Constant Hazard against a Change-Point Alternative. Biometrics 1982, 38, 463–468. [Google Scholar] [CrossRef]
  13. Chang, I.S.; Chen, C.H.; Hsiung, C.A. Estimation in Change-Point Hazard Rate Models with Random Censorship. Lect. Notes-Monogr. Ser. 1994, 23, 78–92. [Google Scholar]
  14. Gijbels, I.; Gürler, Ü. Estimation of a Change-Point in a Hazard Function Based on Censored Data. Lifetime Data Anal. 2003, 9, 395–411. [Google Scholar] [CrossRef] [PubMed]
  15. Goodman, M.S.; Li, Y.; Tiwari, R.C. Detecting Multiple Change-Points in piece-wise Constant Hazard Functions. J. Appl. Stat. 2011, 38, 2523–2532. [Google Scholar] [CrossRef]
  16. Williams, M.R.; Kim, D.Y. A test for an Abrupt Change in Weibull Hazard Functions with Staggered Entry and Type-I Censoring. Commun. Stat. Theory Methods 2013, 42, 1922–1933. [Google Scholar] [CrossRef]
  17. Joshi, S.; Jose, K.K.; Bhati, D. Estimation of a Change-Point in the Hazard Rate of Lindley Model under Right Censoring. Commun. Stat. Simul. Comput. 2017, 46, 3563–3574. [Google Scholar] [CrossRef]
  18. Palmeros, O.; Villaseñor, J.A.; González, E. On computing estimates of a change–point in the Weibull regression hazard model. J. Appl. Stat. 2018, 45, 642–648. [Google Scholar] [CrossRef]
  19. Joshi, S.; Rattihalli, R.N. Estimation of parameters in a general hazard regression change–point model. J. Indian Stat. Assoc. 2019, 57, 19–40. [Google Scholar]
  20. Joshi, S.; Rattihalli, R.N. Estimation of Parameters in the Exponential-Lindley Hazard Change-Point Model. Adv. Intell. Syst. Comput. 2020, 1169, 345–356. [Google Scholar]
  21. Gierz, K.; Park, K. Detection of multiple change points in a Weibull accelerated failure time model using sequential testing. Biom. J. 2022, 64, 617–634. [Google Scholar] [CrossRef] [PubMed]
  22. Matloff, N. The Art of R Programming: A Tour of Statistical Software Design; No Starch Press: San Francisco, CA, USA, 2011. [Google Scholar]
  23. Lynch, S.M. Introduction to Applied Bayesian Statistics and Estimation for Social Scientists; Springer: New York, NY, USA, 2007; Volume 1. [Google Scholar]
  24. Akaike, H. A new look at the statistical model identification. IEEE Trans. Automat. Contr. 1974, 19, 716–723. [Google Scholar] [CrossRef]
  25. Griffith, I.R.; Newsome, B.B.; Leung, G.; Block, G.A.; Herbert, R.J.; Danese, M.D. Impact of Hemodialysis Catheter Dysfunction on Dialysis and Other Medical Services: An Observational Cohort Study. Int. J. Nephrol. 2012, 8, 1179–1187. [Google Scholar] [CrossRef]
  26. Walter, R.B.; Othus, M.; Borthakur, G.; Ravandi, F.; Cortes, J.E.; Pierce, S.A.; Appelbaum, F.R.; Kantarjian, H.A.; Estey, E.H. Prediction of early death after induction therapy for newly diagnosed acute myeloid leukemia with pretreatment risk scores: A novel paradigm for treatment assignment. J. Clin. Oncol. 2011, 29, 4417–4423. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Hazard function of the log–logistic distribution at the change–point.
Figure 2. CDF curves of the kidney catheter data set under different models.
Figure 3. CDF curves of the acute myeloid leukemia data set under different models.
Table 1. Mean estimated value (MEV), bias, and MSE of the proposed model based on 1000 iterations for α₁ = 1, α₂ = 2, τ = 1.

| Censoring | n | α₁ MEV | α₁ Bias | α₁ MSE | α₂ MEV | α₂ Bias | α₂ MSE | τ MEV | τ Bias | τ MSE |
|---|---|---|---|---|---|---|---|---|---|---|
| 0% | 20 | 1.1166 | 0.1166 | 0.3426 | 2.1987 | 0.1987 | 1.2613 | 0.9383 | −0.0616 | 0.0225 |
| | 50 | 1.0076 | 0.0076 | 0.0769 | 2.1770 | 0.1770 | 0.7837 | 0.9565 | −0.0434 | 0.0156 |
| | 200 | 0.9972 | −0.0028 | 0.0179 | 2.1079 | 0.1079 | 0.2890 | 0.9887 | −0.0112 | 0.0105 |
| | 500 | 0.9973 | −0.0027 | 0.0067 | 1.9134 | −0.0865 | 0.1290 | 0.9922 | −0.0077 | 0.0088 |
| | 1000 | 0.9981 | −0.0018 | 0.0031 | 2.0605 | 0.0605 | 0.0684 | 0.9960 | −0.0039 | 0.0062 |
| 10% | 20 | 1.1175 | 0.1175 | 0.4022 | 2.1609 | 0.1609 | 1.2923 | 0.9388 | −0.0611 | 0.0222 |
| | 50 | 1.0312 | 0.0312 | 0.0835 | 2.1525 | 0.1525 | 0.8102 | 0.9640 | −0.0359 | 0.0140 |
| | 200 | 0.9953 | −0.0046 | 0.0179 | 2.1294 | 0.1294 | 0.3024 | 0.9859 | −0.0140 | 0.0113 |
| | 500 | 0.9984 | −0.0015 | 0.0072 | 2.0645 | 0.0645 | 0.1477 | 0.9950 | −0.0050 | 0.0089 |
| | 1000 | 0.9995 | −0.0004 | 0.0034 | 2.0309 | 0.0309 | 0.0686 | 0.9966 | −0.0031 | 0.0063 |
| 20% | 20 | 1.1421 | 0.1421 | 0.4310 | 2.2000 | 0.2000 | 1.3633 | 0.9374 | −0.0625 | 0.0239 |
| | 50 | 1.0286 | 0.0286 | 0.0871 | 2.1937 | 0.1937 | 0.8994 | 0.9611 | −0.0388 | 0.0146 |
| | 200 | 0.9971 | −0.0028 | 0.0180 | 2.1383 | 0.1383 | 0.3371 | 0.9823 | −0.0176 | 0.0113 |
| | 500 | 0.9990 | −0.0009 | 0.0072 | 2.0972 | 0.0972 | 0.1522 | 0.9921 | −0.0078 | 0.0088 |
| | 1000 | 0.9995 | −0.0004 | 0.0031 | 2.0015 | 0.0015 | 0.0843 | 1.0000 | 0.0000 | 0.0068 |
| 50% | 20 | 1.1863 | 0.1863 | 0.5001 | 2.2389 | 0.2389 | 1.8697 | 0.9289 | −0.0710 | 0.0296 |
| | 50 | 1.0413 | 0.0413 | 0.1063 | 2.2263 | 0.2263 | 1.3949 | 0.9712 | −0.0288 | 0.0165 |
| | 200 | 0.9973 | −0.0027 | 0.0193 | 2.1904 | 0.1904 | 0.6604 | 0.9873 | −0.0126 | 0.0126 |
| | 500 | 1.0107 | 0.0107 | 0.0078 | 2.1628 | 0.1628 | 0.3369 | 0.9925 | −0.0074 | 0.0097 |
| | 1000 | 0.9992 | −0.0008 | 0.0038 | 1.9302 | −0.0697 | 0.2124 | 0.9976 | −0.0023 | 0.0072 |
Table 2. Mean estimated value (MEV), bias, and MSE of the proposed model based on 1000 iterations for α₁ = 2, α₂ = 1, τ = 3.

| Censoring | n | α₁ MEV | α₁ Bias | α₁ MSE | α₂ MEV | α₂ Bias | α₂ MSE | τ MEV | τ Bias | τ MSE |
|---|---|---|---|---|---|---|---|---|---|---|
| 0% | 20 | 2.2831 | 0.2831 | 1.0149 | 1.2853 | 0.2853 | 1.9994 | 3.0450 | 0.0450 | 0.0397 |
| | 50 | 2.0877 | 0.0877 | 0.3095 | 1.0913 | 0.0913 | 1.5068 | 3.0317 | 0.0317 | 0.0244 |
| | 200 | 2.0222 | 0.0222 | 0.0666 | 0.9105 | −0.0895 | 0.7233 | 3.0313 | 0.0313 | 0.0156 |
| | 500 | 2.0139 | 0.0139 | 0.0267 | 0.9184 | −0.0816 | 0.2782 | 3.0292 | 0.0292 | 0.0149 |
| | 1000 | 2.0098 | 0.0098 | 0.0129 | 0.9682 | −0.0318 | 0.1272 | 3.0135 | 0.0135 | 0.0120 |
| 10% | 20 | 2.2886 | 0.2886 | 1.1477 | 1.2994 | 0.2994 | 2.0088 | 3.0485 | 0.0485 | 0.0413 |
| | 50 | 2.1016 | 0.1016 | 0.3339 | 0.9163 | −0.0837 | 1.5001 | 3.0311 | 0.0311 | 0.0259 |
| | 200 | 2.0326 | 0.0326 | 0.0730 | 0.9180 | −0.0820 | 0.7256 | 3.0276 | 0.0276 | 0.0161 |
| | 500 | 2.0215 | 0.0215 | 0.0272 | 0.9453 | −0.0547 | 0.3060 | 3.0228 | 0.0228 | 0.0140 |
| | 1000 | 2.0095 | 0.0095 | 0.0114 | 1.0238 | 0.0238 | 0.1377 | 3.0199 | 0.0199 | 0.0124 |
| 20% | 20 | 2.2438 | 0.2438 | 1.2129 | 1.2414 | 0.2414 | 2.0381 | 3.0471 | 0.0471 | 0.0424 |
| | 50 | 2.1164 | 0.1164 | 0.3623 | 0.8776 | −0.1224 | 1.6715 | 3.0316 | 0.0316 | 0.0252 |
| | 200 | 2.0335 | 0.0335 | 0.0706 | 1.1176 | 0.1176 | 0.8812 | 3.0290 | 0.0290 | 0.0157 |
| | 500 | 2.0146 | 0.0146 | 0.0279 | 0.9209 | −0.0791 | 0.3597 | 3.0270 | 0.0270 | 0.0140 |
| | 1000 | 2.0141 | 0.0141 | 0.0135 | 0.9915 | −0.0085 | 0.1742 | 3.0233 | 0.0233 | 0.0138 |
| 50% | 20 | 2.3193 | 0.3193 | 1.9843 | 1.5387 | 0.5387 | 2.4975 | 3.0267 | 0.0267 | 0.0523 |
| | 50 | 2.0939 | 0.0939 | 0.3805 | 1.3016 | 0.3016 | 2.2389 | 3.0220 | 0.0220 | 0.0411 |
| | 200 | 2.0365 | 0.0365 | 0.0790 | 1.1406 | 0.1406 | 1.8872 | 3.0130 | 0.0130 | 0.0167 |
| | 500 | 2.0227 | 0.0227 | 0.0325 | 0.9288 | −0.0712 | 1.4062 | 3.0053 | 0.0053 | 0.0153 |
| | 1000 | 2.0084 | 0.0084 | 0.0149 | 1.0551 | 0.0551 | 0.9191 | 3.0004 | 0.0004 | 0.0150 |
Table 3. Mean, bias, MSE, and AR of the proposed log–logistic hazard change–point model based on 10,000 iterations of the Metropolis–Hastings algorithm for α₁ = 2, α₂ = 1, τ = 3.

| Censoring | n | α₁ Mean | α₁ Bias | α₁ MSE | α₂ Mean | α₂ Bias | α₂ MSE | τ Mean | τ Bias | τ MSE | AR |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0% | 20 | 1.9973 | −0.0027 | 0.0640 | 0.9986 | −0.0014 | 0.0081 | 2.9879 | −0.0121 | 0.2465 | 0.9861 |
| | 50 | 1.9979 | −0.0021 | 0.0637 | 1.0010 | 0.0010 | 0.0081 | 2.9895 | −0.0105 | 0.2444 | 0.9906 |
| | 200 | 2.0017 | 0.0017 | 0.0622 | 1.0008 | 0.0008 | 0.0080 | 2.9929 | −0.0071 | 0.2442 | 0.9948 |
| | 500 | 1.9994 | −0.0006 | 0.0617 | 1.0006 | 0.0006 | 0.0080 | 2.9932 | −0.0068 | 0.2430 | 0.9968 |
| | 1000 | 2.0005 | 0.0005 | 0.0612 | 0.9999 | −0.0001 | 0.0079 | 2.9954 | −0.0046 | 0.2427 | 0.9976 |
| 10% | 20 | 1.9937 | −0.0063 | 0.0643 | 0.9986 | −0.0014 | 0.0082 | 2.9903 | −0.0097 | 0.2426 | 0.9776 |
| | 50 | 1.9969 | −0.0031 | 0.0642 | 0.9989 | −0.0011 | 0.0082 | 2.9909 | −0.0091 | 0.2406 | 0.9827 |
| | 200 | 2.0031 | 0.0031 | 0.0634 | 1.0003 | 0.0003 | 0.0081 | 2.9952 | −0.0048 | 0.2391 | 0.9954 |
| | 500 | 1.9977 | −0.0023 | 0.0627 | 1.0001 | 0.0001 | 0.0080 | 2.9959 | −0.0041 | 0.2388 | 0.9961 |
| | 1000 | 1.9983 | −0.0017 | 0.0625 | 0.9999 | −0.0001 | 0.0080 | 2.9972 | −0.0028 | 0.2370 | 0.9968 |
| 20% | 20 | 1.9964 | −0.0036 | 0.0631 | 0.9992 | −0.0008 | 0.0082 | 2.9913 | −0.0087 | 0.2451 | 0.9823 |
| | 50 | 1.9980 | −0.0020 | 0.0628 | 0.9995 | −0.0005 | 0.0081 | 2.9933 | −0.0067 | 0.2408 | 0.9902 |
| | 200 | 1.998 | −0.002 | 0.0626 | 1.0004 | 0.0004 | 0.008 | 2.9950 | −0.0050 | 0.2404 | 0.9941 |
| | 500 | 1.9979 | −0.0021 | 0.0622 | 1.0001 | 0.0001 | 0.0080 | 3.0038 | 0.0038 | 0.2361 | 0.9956 |
| | 1000 | 2.0001 | 0.0001 | 0.0621 | 1.0000 | 0.0000 | 0.0080 | 3.0017 | 0.0017 | 0.2347 | 0.9965 |
| 50% | 20 | 1.9939 | −0.0061 | 0.0624 | 0.9976 | −0.0024 | 0.0083 | 2.9773 | −0.0227 | 0.2421 | 0.9745 |
| | 50 | 1.9947 | −0.0053 | 0.0630 | 0.9990 | −0.0010 | 0.0082 | 2.9912 | −0.0088 | 0.2413 | 0.983 |
| | 200 | 2.0005 | 0.0005 | 0.0630 | 1.0010 | 0.0010 | 0.0081 | 2.9919 | −0.0081 | 0.2391 | 0.9949 |
| | 500 | 1.9979 | −0.0021 | 0.0629 | 0.9999 | −0.0001 | 0.0079 | 3.0049 | 0.0049 | 0.2386 | 0.9955 |
| | 1000 | 1.998 | −0.002 | 0.0622 | 1.0000 | 0.0000 | 0.0078 | 2.9986 | −0.0014 | 0.2361 | 0.9965 |
Table 4. Parameter estimates of the change–point models when applied to data on recurrence times to infection of kidney patients.

| Parameter | EHM | LHM | WHM | LLHM |
|---|---|---|---|---|
| α₁ | 0.0092 | 0.0239 | 62.2149 | 0.0170 |
| α₂ | 0.0055 | 0.0088 | 143.7017 | 0.0155 |
| τ | 96.0000 | 30.0000 | 107.9997 | 201.0000 |
Table 5. Different measures to assess the goodness of fit of the change–point models for the Kidney Catheter data set.

| Metric | EHM | LHM | WHM | LLHM |
|---|---|---|---|---|
| AIC | 684.5601 | 696.1884 | 681.1359 | 677.3659 |
| BIC | 691.5523 | 703.1806 | 688.1281 | 684.3581 |
| L₁ norm | 7.4771 | 16.4211 | 6.4272 | 5.1790 |
| L₂ norm | 1.0032 | 2.0890 | 0.8541 | 0.6724 |
| K–S statistic | 0.2174 | 0.3620 | 0.1810 | 0.1351 |
| A-D test | 2.605 | 11.07 | 1.891 | 1.658 |
| A-D asymptotic p-value | 0.04279 | 0.0000 | 0.1041 | 0.1413 |
Table 6. Parameter estimates of the change–point models when applied to data on duration of acute myeloid leukemia in patients.

| Parameter | EHM | LHM | WHM | LLHM |
|---|---|---|---|---|
| α₁ | 0.0028 | 0.0066 | 0.0181 | 262.7735 |
| α₂ | 0.0004 | 0.0008 | 0.0085 | 232.6557 |
| τ | 527.0006 | 479.0017 | 534.9997 | 11.0000 |
Table 7. Different measures to assess the goodness of fit of the change–point models for the Acute Myeloid Leukemia data set.

| Metric | EHM | LHM | WHM | LLHM |
|---|---|---|---|---|
| AIC | 10,783.44 | 13,692.67 | 12,224.05 | 12,180.56 |
| BIC | 10,798.29 | 13,707.52 | 12,238.9 | 12,195.41 |
| L₁ norm | 71.4478 | 114.0331 | 22.5485 | 11.7279 |
| L₂ norm | 2.7499 | 4.4937 | 0.9709 | 0.4706 |
| K–S statistic | 0.1531 | 0.2586 | 0.0671 | 0.0397 |
| A-D asymptotic p-value | 0.0000 | 0.0000 | 0.0015 | 0.1225 |
