1. Introduction
The Cox proportional hazards (PH) model [1, 2] is one of the most commonly applied survival analysis models because it links covariates to the survival time through a hazard function. Because the hazard function describes the instantaneous risk of experiencing an event of interest, such as death, disease, or failure, given that the individual has survived to that point, the Cox PH model is widely used in areas such as biology, medicine, and the social sciences.
The hazard function in the Cox PH model with time-independent covariates has the following functional form:
$$\lambda(t \mid Z) = \lambda_0(t) \exp(\beta^\top Z), \tag{1}$$
where $Z$ is a $p \times 1$ vector of covariates, $\beta$ is a $p \times 1$ coefficient vector, and $\lambda_0(t)$ is a baseline hazard function. In (1), proportional hazards and log-linearity of the hazard are two key assumptions of practical interest. The proportional hazards assumption implies that the hazard ratio between any two individuals depends only on their covariate values and the model coefficient $\beta$, and not on time. Taking the logarithm of (1) gives
$$\log \lambda(t \mid Z) = \log \lambda_0(t) + \beta^\top Z,$$
so the log-linearity assumption lets researchers interpret each coefficient as the linear effect of a unit change in the corresponding covariate on the log of the hazard function.
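The proportional hazards property can be checked with a short derivation. The notation below is assumed for illustration ($\lambda_0$ for the baseline hazard, $\beta$ for the coefficient vector, $Z_1$ and $Z_2$ for the covariates of two individuals); it is a sketch of the standard argument, not a display quoted from the article.

```latex
\frac{\lambda(t \mid Z_1)}{\lambda(t \mid Z_2)}
  = \frac{\lambda_0(t)\exp(\beta^\top Z_1)}{\lambda_0(t)\exp(\beta^\top Z_2)}
  = \exp\{\beta^\top (Z_1 - Z_2)\}
```

The baseline hazard cancels, so the ratio is free of $t$. For a single covariate, a one-unit increase multiplies the hazard by $e^{\beta}$ regardless of the starting value, which is exactly what the log-linearity assumption asserts.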
While most of the existing literature focuses on testing the proportional hazards assumption [3, 4], our primary focus is on validating the log-linearity assumption. This assumption is critical, as assessing its validity provides a foundation for exploring alternative monotonic effects when the log-linear effect may not hold. To this end, we propose a goodness-of-fit (GOF) test designed to evaluate the log-linearity assumption in the Cox PH model.
In the literature, martingale-based residual methods have been widely used to investigate and test the GOF for log-linearity. Therneau et al. [5] introduced a graphical approach using smoothed martingale residuals to examine the functional form of covariate effects. Lin et al. [6] proposed an analytical test for log-linearity based on partial cumulative sums of martingale residuals, providing asymptotic properties, such as the limiting Gaussian process of the residuals and the null distribution of the test statistics. However, the existing literature has not explicitly addressed testing log-linearity against monotonic effects.
In this work, we consider a univariate covariate $Z$ and a monotonic function $\psi$ to explore the monotonic effect. Relaxing $\beta Z$ in the Cox PH model to $\psi(Z)$, Chung et al. [7] considered the isotonic PH model with a hazard function
$$\lambda(t \mid Z) = \lambda_0(t) \exp\{\psi(Z)\},$$
which maintains the proportional hazards assumption. Similar to the estimation process in the traditional Cox PH model, Chung et al. [7] developed nonparametric partial-likelihood-based estimation for $\psi$. Therefore, a ratio of partial likelihoods from the Cox and isotonic PH models is a natural test statistic for checking log-linearity. Inspired by Xu et al. [8], we propose a bootstrapping method conditional on covariates and censoring times to determine the critical values.
The remaining sections are organized as follows. Section 2 reviews estimation in the Cox PH model and the isotonic PH model for a univariate time-independent covariate. In Section 3, we propose the GOF test for the Cox PH model under monotonicity constraints. We provide a numerical study in Section 4 to evaluate the performance of the GOF test. Lastly, in Section 5, we apply the proposed GOF test to two data examples available in the R package survival: a lung cancer study conducted by the North Central Cancer Treatment Group [9] and a breast cancer study conducted by the German Breast Cancer Study Group. All R code is available on GitHub at https://github.com/cftang9/LLGOF_UniCoxPH (accessed on 30 June 2025).
2. Partial Likelihood Estimators
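We assume that the survival time $T_i$ follows a continuous distribution associated with a covariate $Z_i$ for the $i$th subject, for $i = 1, \ldots, n$. Let $C_i$ denote the censoring time. We observe $X_i = \min(T_i, C_i)$ together with the censoring indicator $\delta_i = I(T_i \le C_i)$, where $I$ is an indicator function. We assume that $T_i$ and $C_i$ are independent and that $C_i$ follows a distribution $G$. Thus, the observed data consist of triplets $(X_i, \delta_i, Z_i)$, $i = 1, \ldots, n$.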
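A univariate Cox PH model relates the hazard of $T_i$ and the covariate $Z_i$ as follows:
$$\lambda(t \mid Z_i) = \lambda_0(t) \exp(\beta Z_i), \tag{2}$$
where $\beta$ is a scalar regression coefficient and $\lambda_0(t)$ is a baseline hazard function that does not depend on the covariate. The parameter $\beta$ is estimated by maximizing the partial likelihood $L(\beta)$, given by
$$L(\beta) = \prod_{i=1}^{n} \prod_{t \ge 0} \left\{ \frac{\exp(\beta Z_i)}{\sum_{j=1}^{n} Y_j(t) \exp(\beta Z_j)} \right\}^{dN_i(t)}, \tag{3}$$
where $N_i(t) = I(X_i \le t, \delta_i = 1)$ are the counting processes indicating that the event of interest occurs at or before time $t$, and $Y_i(t) = I(X_i \ge t)$ are the at-risk processes indicating that neither the event of interest nor censoring has occurred before time $t$. The log-partial-likelihood, denoted by $\ell(\beta)$, is given by
$$\ell(\beta) = \sum_{i=1}^{n} \int_0^\infty \left[ \beta Z_i - \log\left\{ \sum_{j=1}^{n} Y_j(t) \exp(\beta Z_j) \right\} \right] dN_i(t). \tag{4}$$
The maximum partial likelihood estimator (MPLE) is denoted by $\hat\beta$ and can be obtained numerically by the Newton–Raphson method.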
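The Newton–Raphson step for a scalar coefficient can be sketched as follows. This is a minimal illustration on simulated data, assuming no tied event times; the function names (`cox_loglik_derivs`, `fit_cox_newton`) and the simulation settings are hypothetical and are not taken from the authors' R code.

```python
import numpy as np

def cox_loglik_derivs(beta, time, event, z):
    """Log partial likelihood, score, and observed information for scalar beta.

    Assumes no tied event times.
    """
    loglik, score, info = 0.0, 0.0, 0.0
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]            # risk set at the i-th event time
        w = np.exp(beta * z[at_risk])        # risk weights exp(beta * Z_j)
        s0 = w.sum()
        s1 = (w * z[at_risk]).sum()
        s2 = (w * z[at_risk] ** 2).sum()
        loglik += beta * z[i] - np.log(s0)
        score += z[i] - s1 / s0              # first derivative
        info += s2 / s0 - (s1 / s0) ** 2     # minus second derivative
    return loglik, score, info

def fit_cox_newton(time, event, z, tol=1e-8, max_iter=50):
    """Newton-Raphson iterations started from beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        _, score, info = cox_loglik_derivs(beta, time, event, z)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Simulated data (assumed settings): hazard exp(beta * z), exponential censoring.
rng = np.random.default_rng(0)
n = 400
z = rng.uniform(0.0, 2.0, n)
true_beta = 1.0
t = rng.exponential(1.0 / np.exp(true_beta * z))
c = rng.exponential(2.0, n)
time = np.minimum(t, c)
event = (t <= c).astype(int)
beta_hat = fit_cox_newton(time, event, z)
```

Because the log partial likelihood is concave in the coefficient, the iterations typically converge in a handful of steps, and the score evaluated at `beta_hat` is numerically zero.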
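When a monotonic covariate effect is preferable to the log-linear effect in (2), it is natural to relax $\beta Z_i$ to $\psi(Z_i)$, where $\psi$ is monotonic. Hence, (2) can be relaxed to
$$\lambda(t \mid Z_i) = \lambda_0(t) \exp\{\psi(Z_i)\}, \tag{5}$$
which is also known as the isotonic PH model [7]. Without loss of generality, we assume that $\psi$ is a non-decreasing function. Under the isotonic PH model, the partial likelihood is slightly different. Define $Z_{(i)}$ as the $i$th order statistic of the covariates of the uncensored subjects, $\{Z_j : \delta_j = 1\}$. Given a monotonic function $\psi$, define $\psi_i = \psi(Z_{(i)})$ as the value of $\psi$ at $Z_{(i)}$ for $i = 1, \ldots, m$, where $m = \sum_{j=1}^{n} \delta_j$ is the number of uncensored subjects, so that $\psi_1 \le \cdots \le \psi_m$. We further denote $\boldsymbol{\psi} = (\psi_1, \ldots, \psi_m)^\top$ as the vector of parameters and $I_i = [Z_{(i)}, Z_{(i+1)})$ for $i = 1, \ldots, m - 1$ and $I_m = [Z_{(m)}, \infty)$, which are the intervals formed by the order statistics $Z_{(1)}, \ldots, Z_{(m)}$. Chung et al. [7] proposed the partial likelihood for the isotonic PH model (5) as
$$L(\boldsymbol{\psi}) = \prod_{i=1}^{m} \prod_{t \ge 0} \left\{ \frac{\exp(\psi_i)}{\sum_{k=1}^{m} \bar{Y}_k(t) \exp(\psi_k)} \right\}^{d\bar{N}_i(t)}, \tag{6}$$
where the counting process $\bar{N}_i(t) = \sum_{j=1}^{n} I(Z_j \in I_i) N_j(t)$ and the at-risk process $\bar{Y}_i(t) = \sum_{j=1}^{n} I(Z_j \in I_i) Y_j(t)$, where $N_j(t) = I(X_j \le t, \delta_j = 1)$ and $Y_j(t) = I(X_j \ge t)$. Hence, the log partial likelihood can be defined accordingly as
$$\ell(\boldsymbol{\psi}) = \sum_{i=1}^{m} \int_0^\infty \left[ \psi_i - \log\left\{ \sum_{k=1}^{m} \bar{Y}_k(t) \exp(\psi_k) \right\} \right] d\bar{N}_i(t). \tag{7}$$
Note that $\ell(\boldsymbol{\psi} + c) = \ell(\boldsymbol{\psi})$ for any constant $c$ added to every component. Because of this identifiability issue, it suffices to restrict the monotonic function $\psi$ to satisfy $\psi(K) = 0$ for some $K$. In this work, we take $K$ to be the median of the covariates $Z_1, \ldots, Z_n$, as in Chung et al. [7]. They also developed a computationally efficient pseudo-iterative convex minorant algorithm to obtain the MPLE, denoted by $\hat\psi$.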
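The identifiability issue can be verified numerically: adding a constant to the log-hazard effect leaves the log partial likelihood unchanged, so anchoring the function at a reference point costs nothing. The sketch below uses simulated data and a hypothetical helper `log_partial_likelihood`; the square-root effect and the median anchor are illustrative choices, and the code does not implement the pseudo-iterative convex minorant algorithm.

```python
import numpy as np

def log_partial_likelihood(psi_values, time, event):
    """Log partial likelihood for arbitrary log-hazard effects psi(Z_i).

    Assumes no tied event times.
    """
    total = 0.0
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]
        total += psi_values[i] - np.log(np.exp(psi_values[at_risk]).sum())
    return total

rng = np.random.default_rng(1)
n = 50
z = rng.uniform(0.0, 3.0, n)
time = rng.exponential(1.0, n)
event = rng.integers(0, 2, n)

psi = np.sqrt(z)                  # an arbitrary non-decreasing effect
anchor = np.median(z)             # reference point where the anchored effect is 0

base = log_partial_likelihood(psi, time, event)
shifted = log_partial_likelihood(psi + 5.0, time, event)
anchored = log_partial_likelihood(psi - np.sqrt(anchor), time, event)
```

All three values agree up to floating-point error, because the constant cancels between the numerator and the risk-set sum in every factor.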
3. Goodness-of-Fit Test
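Here, we propose a GOF test for the log-linearity in the Cox PH model (2) under the isotonic PH model (5). The null and alternative hypotheses, $H_0$ and $H_1$, respectively, are defined as follows:
$$H_0: \psi(Z) = \beta Z \ \text{for some } \beta \ge 0 \quad \text{versus} \quad H_1: \psi \ \text{is non-decreasing and not linear in } Z. \tag{8}$$
Given a sample $\{(X_i, \delta_i, Z_i)\}_{i=1}^{n}$, we have the MPLEs $\hat\beta$ and $\hat\psi$, which maximize (4) and (6) under the Cox PH model (2) and the isotonic PH model (5), respectively. Under $H_0$, it is natural to consider a restricted MPLE $\hat\beta^{+} = \max(\hat\beta, 0)$. Plugging the MPLEs into their corresponding partial likelihoods, we propose a partial-likelihood-based test statistic
$$R_n = \log \frac{L(\hat\psi)}{L(\hat\beta^{+})} = \ell(\hat\psi) - \ell(\hat\beta^{+}).$$
Since the linear relationship $\psi(Z) = \beta Z$ with $\beta \ge 0$ under $H_0$ is a special case of a monotone relationship, it is expected that $L(\hat\psi) \ge L(\hat\beta^{+})$ and $R_n \ge 0$. Therefore, large values of $R_n$ suggest rejecting $H_0$. On the other hand, if $\psi$ in (8) is decreasing in $Z$, the restricted MPLE $\hat\beta^{+}$ can be replaced by a restricted MPLE $\hat\beta^{-} = \min(\hat\beta, 0)$, and $\hat\psi$ by the MPLE $\hat\psi^{-}$ under a non-increasing $\psi$. Then, the test statistic is obtained by
$$R_n^{-} = \ell(\hat\psi^{-}) - \ell(\hat\beta^{-}),$$
and the log-linearity is rejected when $\ell(\hat\beta^{-})$ is too small relative to $\ell(\hat\psi^{-})$, that is, when $R_n^{-}$ is large.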
We propose a bootstrap method to determine a critical value for $R_n$ to reject $H_0$. Inspired by Xu et al. [8], we consider a bootstrapping approach conditional on covariates and censoring times under the null hypothesis $H_0$. To generate survival times in the bootstrap samples, we apply the inverse distribution function method. Note that the survival function of the Cox PH model, defined by $S(t \mid Z) = P(T > t \mid Z)$, is
$$S(t \mid Z) = \exp\{-\Lambda_0(t) \exp(\beta Z)\},$$
where $\Lambda_0(t) = \int_0^t \lambda_0(s)\, ds$ is the cumulative baseline hazard function. Therefore, provided that it exists, the inverse of $S(\cdot \mid Z)$ is given by
$$S^{-1}(u \mid Z) = \Lambda_0^{-1}\{-\log(u) \exp(-\beta Z)\}$$
for $u \in (0, 1)$, where $\Lambda_0^{-1}$ is the inverse of $\Lambda_0$.
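The inverse distribution function method can be illustrated when the cumulative baseline hazard is known in closed form. The sketch below assumes a Weibull-type baseline with cumulative hazard $t^{k}$, which is an illustrative choice and not part of the proposed procedure; the names `sample_cox_survival_times` and `weibull_cum_hazard_inverse` are hypothetical.

```python
import numpy as np

def sample_cox_survival_times(z, beta, cum_hazard_inverse, rng):
    """Draw survival times from a Cox PH model by inverse transform sampling.

    Solving S(t | z) = u for t gives t = Lambda0^{-1}(-log(u) * exp(-beta * z)).
    """
    u = rng.uniform(size=len(z))
    return cum_hazard_inverse(-np.log(u) * np.exp(-beta * z))

shape = 2.0  # assumed Weibull shape: Lambda0(t) = t ** shape

def weibull_cum_hazard_inverse(v):
    return v ** (1.0 / shape)

rng = np.random.default_rng(2)
beta = 0.8
z = np.full(200_000, 0.5)
t = sample_cox_survival_times(z, beta, weibull_cum_hazard_inverse, rng)

# Compare the empirical survival probability at t0 with the model value.
t0 = 1.0
empirical = np.mean(t > t0)
theoretical = np.exp(-(t0 ** shape) * np.exp(beta * 0.5))
```

With this many draws, the empirical survival probability at `t0` matches the model value closely, which confirms that the transformation reproduces the target distribution.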
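Since estimators of $\beta$ and $\Lambda_0$ are required when estimating $S^{-1}(\cdot \mid Z)$, we first consider the Breslow-type estimator [10]
$$\hat\Lambda_0(t) = \sum_{i=1}^{n} \int_0^t \frac{dN_i(s)}{\sum_{j=1}^{n} Y_j(s) \exp(\hat\beta^{+} Z_j)}$$
and the corresponding inverse defined by
$$\hat\Lambda_0^{-1}(v) = \inf\{t \ge 0 : \hat\Lambda_0(t) \ge v\}$$
for $v \ge 0$. Note that $\hat\Lambda_0^{-1}$ is a step function, so the bootstrapped survival times generated from it may contain ties, potentially reducing the efficiency and stability of the proposed test. To address this issue, we consider a continuous and piecewise linear version of $\hat\Lambda_0$, denoted by $\tilde\Lambda_0$, which smooths $\hat\Lambda_0$, avoids ties among the bootstrapped survival times, and enhances the performance of the test. Specifically, denote $t_0 = 0$, and let $t_1 < \cdots < t_J$ be the locations of the jump points of $\hat\Lambda_0$, such that $\hat\Lambda_0(t_{k-1}) < \hat\Lambda_0(t_k)$ for $k = 1, \ldots, J$. Then, we define
$$\tilde\Lambda_0(t) = \hat\Lambda_0(t_{k-1}) + \frac{\hat\Lambda_0(t_k) - \hat\Lambda_0(t_{k-1})}{t_k - t_{k-1}} (t - t_{k-1})$$
for $t \in [t_{k-1}, t_k)$, $k = 1, \ldots, J$, and $\tilde\Lambda_0(t) = \hat\Lambda_0(t_J)$ for $t \ge t_J$. The corresponding inverse is defined by
$$\tilde\Lambda_0^{-1}(v) = \inf\{t \ge 0 : \tilde\Lambda_0(t) \ge v\}, \quad \text{with } \inf \emptyset = \infty.$$
Therefore, in the $b$th bootstrapped sample, we generate the survival times
$$T^{*}_{b,i} = \tilde\Lambda_0^{-1}\{-\log(U_{b,i}) \exp(-\hat\beta^{+} Z_i)\} \tag{10}$$
for the $i$th subject, $i = 1, \ldots, n$, where $U_{b,i}$ are random variables independently and identically generated from a univariate uniform distribution with support $(0, 1)$, denoted by $\mathrm{Unif}(0, 1)$.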
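The smoothing step can be sketched with linear interpolation through the jump points of a Breslow-type estimate. The code below is an illustration under stated assumptions: there are no tied event times, the coefficient value is supplied by the user, and values beyond the last jump are mapped to infinity (a convention chosen here, since the source display for the tail is not available). The names `breslow_cumhaz` and `smoothed_inverse` are hypothetical.

```python
import numpy as np

def breslow_cumhaz(time, event, z, beta):
    """Jump locations and Breslow cumulative baseline hazard at those jumps."""
    order = np.argsort(time)
    time, event, z = time[order], event[order], z[order]
    risk = np.exp(beta * z)
    # Reverse cumulative sum: total risk weight of subjects with X_j >= X_i.
    denom = np.cumsum(risk[::-1])[::-1]
    jumps = event / denom                     # zero at censored times
    mask = event == 1
    return time[mask], np.cumsum(jumps)[mask]

def smoothed_inverse(v, knots_t, knots_h):
    """Inverse of the piecewise-linear interpolant through (0, 0) and the jumps."""
    t = np.concatenate(([0.0], knots_t))
    h = np.concatenate(([0.0], knots_h))      # strictly increasing
    out = np.interp(v, h, t)
    return np.where(v > h[-1], np.inf, out)   # assumed tail convention

rng = np.random.default_rng(3)
n, beta = 200, 0.5
z = rng.uniform(0.0, 1.0, n)
t = rng.exponential(1.0 / np.exp(beta * z))
c = rng.exponential(1.5, n)
time, event = np.minimum(t, c), (t <= c).astype(int)

knots_t, knots_h = breslow_cumhaz(time, event, z, beta)
u = rng.uniform(size=n)
t_star = smoothed_inverse(-np.log(u) * np.exp(-beta * z), knots_t, knots_h)
```

Because the interpolant is strictly increasing between jumps, distinct uniform draws give distinct finite survival times, which is the tie-avoidance property motivating the smoothed estimator.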
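Based on the censoring status $\delta_i$, we generate the bootstrapped censoring time for the $i$th subject in the $b$th bootstrapped sample, denoted by $C^{*}_{b,i}$. If the survival time is censored ($\delta_i = 0$), the bootstrapped censoring time is the original censoring time, $C^{*}_{b,i} = X_i$. If the survival time is observed ($\delta_i = 1$), we again apply the inverse distribution method, now using the estimated censoring distribution $G$. We approximate $G$ by the traditional Kaplan–Meier estimator, denoted by $\hat G$, with modified data $\{(X_i, 1 - \delta_i)\}_{i=1}^{n}$. Similar to the bootstrapped survival times, to generate bootstrapped censoring times, we smooth and interpolate $\hat G$ linearly and obtain $\tilde G$. Then, we obtain the corresponding inverse
$$\tilde G^{-1}(u) = \inf\{t \ge 0 : \tilde G(t) \ge u\},$$
where $c_0 = 0$ and $c_1 < \cdots < c_L$ are the locations of the jump points of $\hat G$, such that $\hat G(c_{l-1}) < \hat G(c_l)$ for $l = 1, \ldots, L$. Hence, we obtain
$$C^{*}_{b,i} = \begin{cases} X_i, & \delta_i = 0, \\ \tilde G^{-1}(V_{b,i}), & \delta_i = 1, \end{cases} \tag{11}$$
where $V_{b,i}$ is independently and identically generated from $\mathrm{Unif}(0, 1)$ and is independent of $U_{b,i}$.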
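The censoring step can be sketched in the same way: a Kaplan–Meier estimate of the censoring distribution is obtained by flipping the event indicator, then interpolated linearly and inverted. This is a minimal sketch assuming no tied observation times, unconditional draws from the interpolated estimate for uncensored subjects, and an infinite censoring time when the uniform draw exceeds the largest estimated value; these conventions and the function names are assumptions made for illustration.

```python
import numpy as np

def km_censoring_cdf(time, event):
    """Kaplan-Meier estimate of P(C <= t), treating censoring as the 'event'."""
    order = np.argsort(time)
    time, cens = time[order], 1 - event[order]
    n = len(time)
    at_risk = n - np.arange(n)                 # subjects with X_j >= current time
    surv = np.cumprod(1.0 - cens / at_risk)
    mask = cens == 1
    return time[mask], 1.0 - surv[mask]

def bootstrap_censoring_times(time, event, rng):
    """Keep original times for censored subjects; redraw for uncensored ones."""
    knots_c, knots_g = km_censoring_cdf(time, event)
    c = np.concatenate(([0.0], knots_c))
    g = np.concatenate(([0.0], knots_g))
    v = rng.uniform(size=len(time))
    drawn = np.where(v > g[-1], np.inf, np.interp(v, g, c))  # assumed tail rule
    return np.where(event == 0, time, drawn)

rng = np.random.default_rng(4)
n = 100
t = rng.exponential(1.0, n)
c = rng.exponential(1.0, n)
time, event = np.minimum(t, c), (t <= c).astype(int)

knots_c, knots_g = km_censoring_cdf(time, event)
c_star = bootstrap_censoring_times(time, event, rng)
```

Censored subjects retain their observed censoring times, which is what makes the resampling conditional on the censoring pattern for those subjects.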
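In summary, given data $\{(X_i, \delta_i, Z_i)\}_{i=1}^{n}$, we independently generate $B$ sets of bootstrapped survival and censoring times, $T^{*}_{b,i}$ and $C^{*}_{b,i}$, from (10) and (11), respectively, and obtain $X^{*}_{b,i} = \min(T^{*}_{b,i}, C^{*}_{b,i})$ and $\delta^{*}_{b,i} = I(T^{*}_{b,i} \le C^{*}_{b,i})$ for $i = 1, \ldots, n$ and $b = 1, \ldots, B$. Using each bootstrapped sample, $\{(X^{*}_{b,i}, \delta^{*}_{b,i}, Z_i)\}_{i=1}^{n}$, we calculate the bootstrapped test statistic $R^{*}_{n,b}$. The critical value, denoted by $c_\alpha$, is approximated by the $\alpha$th upper quantile of $\{R^{*}_{n,1}, \ldots, R^{*}_{n,B}\}$. We reject the null hypothesis when $R_n > c_\alpha$.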
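The final decision rule reduces to an upper empirical quantile of the bootstrapped statistics. The sketch below assumes the statistics have already been computed; the function name `bootstrap_decision`, the default level of 0.05, and the toy inputs are hypothetical.

```python
import numpy as np

def bootstrap_decision(observed_stat, bootstrap_stats, alpha=0.05):
    """Return the bootstrap critical value and whether the null is rejected.

    The critical value is the alpha-th upper quantile, i.e. the (1 - alpha)
    empirical quantile, of the bootstrapped test statistics.
    """
    critical_value = np.quantile(bootstrap_stats, 1.0 - alpha)
    return critical_value, bool(observed_stat > critical_value)

# Toy input: 100 bootstrapped statistics taking the values 1, 2, ..., 100.
toy_stats = np.arange(1, 101, dtype=float)
cv, reject_large = bootstrap_decision(97.0, toy_stats)
_, reject_small = bootstrap_decision(90.0, toy_stats)
```

Large observed values lead to rejection, matching the one-sided nature of the test; the number of bootstrap replicates controls how finely the quantile can be resolved.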
5. Illustrations with Real Data
5.1. German Breast Cancer Study
Breast cancer is the most common cancer among women worldwide, and although it predominantly affects women, men can also develop breast cancer. It can profoundly impact a person’s life, but with advances in medical research and treatment, the prognosis for many patients has improved significantly. Early detection, comprehensive treatment plans, and ongoing support can help individuals and their families manage the challenges posed by breast cancer and improve their overall quality of life.
Here, we analyze the gbsg dataset in the R package survival, which comes from a clinical trial conducted by the German Breast Cancer Study Group (GBSG). The dataset contains information on 686 patients with primary node-positive breast cancer who were treated at 17 centers in Germany between 1984 and 1989. In the dataset, 387 of 686 patients were censored, so the censoring rate is approximately 56.4%. In addition to the survival time and the censoring status, the dataset includes several variables, such as age, tumor size, and the number of positive lymph nodes.
Here, we wish to test the log-linearity assumption for an important prognostic factor in breast cancer: the number of positive lymph nodes. As suggested by Fitzgibbons et al. [11], a higher number of positive lymph nodes is associated with an increased risk of breast cancer recurrence and a higher risk of death from the disease. We assume the monotonicity constraint to be satisfied and conduct the GOF test with hypotheses
$$H_0: \psi(Z) = \beta Z \ \text{for some } \beta \ge 0 \quad \text{versus} \quad H_1: \psi \ \text{is non-decreasing and not linear in } Z,$$
where $Z$ is the number of positive lymph nodes. For the estimation of $\psi$, we anchor the function at a reference value $K$ such that $\psi(K) = 0$. With a bootstrap sample size of 500, the test statistic exceeds the bootstrap critical value, so we reject the null hypothesis that the hazard is log-linear in the number of positive lymph nodes.
In Figure 2, we present the shifted log-hazard effects $\hat\beta^{+}(Z - K)$ and $\hat\psi(Z)$, so that both equal 0 at $Z = K$. This normalization mitigates identifiability issues and facilitates clearer graphical comparisons. From Figure 2, $\hat\psi$ from the isotonic PH model reveals a faster increase in hazard for a smaller number of positive nodes and deviates from the log-linear effect $\hat\beta^{+}(Z - K)$ with constant slope $\hat\beta^{+}$. In addition, the shape of the estimate $\hat\psi$ motivates a square-root transformed covariate $\sqrt{Z}$ in the Cox PH model with hazard $\lambda(t \mid Z) = \lambda_0(t) \exp(\gamma \sqrt{Z})$ for some $\gamma \ge 0$. For comparison, we added $\hat\gamma^{+}(\sqrt{Z} - \sqrt{K})$, where $\hat\gamma^{+}$ is the corresponding restricted MPLE of $\gamma$, in Figure 2. Compared with $\hat\beta^{+}(Z - K)$, the square-root transformed effect captures the faster increase in hazard for a smaller number of positive nodes and the slower increase for a larger number of positive nodes. To further evaluate the square-root transformation, we apply the proposed GOF test with the following hypotheses:
$$H_0: \psi(Z) = \gamma \sqrt{Z} \ \text{for some } \gamma \ge 0 \quad \text{versus} \quad H_1: \psi \ \text{is non-decreasing and not of this form}.$$
With the test statistic below the critical value estimated from the bootstrapped samples, the evidence is not strong enough to reject $H_0$, which supports the appropriateness of $\sqrt{Z}$ in the Cox PH model. This finding is consistent with the approach of Royston and Altman [12], who applied a square-root transformation to the number of nodes when fitting the Cox PH model.
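A short calculation shows why a square-root covariate produces a faster increase at small node counts and a slower one at large counts. It assumes the parametrization $\lambda(t \mid z) = \lambda_0(t)\exp(\gamma\sqrt{z})$ with $\gamma > 0$, where the symbol $\gamma$ is introduced here for illustration.

```latex
\log\frac{\lambda(t \mid z+1)}{\lambda(t \mid z)}
  = \gamma\left(\sqrt{z+1} - \sqrt{z}\right)
  = \frac{\gamma}{\sqrt{z+1} + \sqrt{z}}
```

The increment in the log hazard equals $\gamma$ when moving from 0 to 1 positive node but only about $0.16\gamma$ when moving from 9 to 10, whereas under a linear effect the increment is the same constant for every additional node.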
5.2. NCCTG Lung Cancer Data
While progress has been made in recent years, particularly in the areas of early detection, targeted therapies, and immunotherapies, lung cancer remains a challenging problem. Lung cancer holds the top position in cancer-related deaths across the globe, resulting in a higher number of fatalities than the sum of breast, prostate, and colorectal cancer deaths. Compared to early-stage lung cancer, advanced lung cancer, which refers to lung cancer that has progressed to a stage where it has spread beyond the lungs to other parts of the body or has become locally extensive, affecting nearby tissues and structures, typically has a poorer prognosis.
Here, we analyze the cancer dataset in the R package survival, which was collected in a study conducted by the North Central Cancer Treatment Group (NCCTG) on patients with advanced lung cancer. As mentioned by Loprinzi et al. [9], this study aimed to assess whether the prognostic information gathered from a patient-completed questionnaire could offer independent insights beyond those already obtained by the patient’s physician through descriptive data. The Karnofsky Performance Score (KPS), rated by patients, assesses a patient’s ability to perform routine daily tasks and activities. The KPS was developed by Karnofsky [13] as a method for evaluating a patient’s functional status, particularly in assessing the response to chemotherapeutic agents in cancer treatment. The score ranges from 0 to 100, with higher scores indicating better functional ability. We consider the relationship between the hazard of failure and the KPS to be decreasing. Therefore, we can apply the GOF test for the Cox PH model to check whether the log-linearity assumption is satisfied for the KPS. The hypotheses are as follows:
$$H_0: \psi(Z) = \beta Z \ \text{for some } \beta \le 0 \quad \text{versus} \quad H_1: \psi \ \text{is non-increasing and not linear in } Z,$$
where $Z$ represents the KPS rated by the patient. In the dataset, 63 of the 225 patients were censored after removing the records with missing values. Here, we anchor $\psi$ at a reference value $K$ such that $\psi(K) = 0$. With a bootstrap sample size of 500, the test statistic does not exceed the bootstrap critical value, so we fail to reject the null hypothesis that the hazard of death is log-linear in the Karnofsky Performance Score. The shifted log-linear hazard effect $\hat\beta^{-}(Z - K)$ and the log-monotonic hazard effect $\hat\psi(Z)$ are presented in Figure 3. These two log-hazard effects are close overall, also suggesting that the Cox PH model is a reasonable choice for studying the hazard through the patients’ self-rated KPS.
6. Discussion
In this work, we propose a GOF test for evaluating the log-linear effect of a univariate covariate in the traditional Cox PH model, framed within the isotonic PH model. The bootstrapped critical values used in the test demonstrate well-controlled type I error rates and strong power for detecting deviations from log-linearity. In addition, when the proposed GOF test rejects log-linearity, one can use the plot of the estimated $\hat\psi$ to propose a transformation $g(Z)$ and then perform the GOF test for $g(Z)$ to check whether the transformed covariate $g(Z)$ is appropriate or a further monotonic transformation is needed.
As shown in Section 4.1, the proposed test appears to be conservative, indicating potential for improvement. In addition to the robustness assessment in Section 4.2, we provide further numerical results for a higher censoring rate and for discrete censoring distributions in the Supplementary Material. As expected, a higher censoring rate leads to more conservative tests; however, the tests still appear consistent, since the power approaches 1 as the sample size increases. On the other hand, we observe that a uniform discrete censoring distribution that mildly discretizes the continuous uniform censoring distribution has a minor impact on the rejection rates. In addition, exploring the asymptotic properties and theoretical justifications is crucial for improving our understanding of the test statistic’s distribution and refining the choice of critical values. Guided by these theoretical advancements, investigating how the shape of the monotonic function affects power could also yield valuable insights.
Despite these challenges, extending the proposed partial-likelihood-based GOF test to handle multiple covariates or a partial linear PH model with monotonic effects [7] is a natural and promising direction. However, such extensions require careful consideration and warrant further investigation. Finally, resolving the open problem of understanding the distribution of the log partial likelihood in (7), even for univariate covariates, remains a challenging and necessary task.