Abstract
The Cox proportional hazards (PH) model is widely used because it relates covariates to the hazard through a log-linear effect. However, when only a monotonic relationship between the covariates and the hazard can be assumed, more flexible effects become desirable within the Cox PH framework. This work proposes a partial-likelihood-based goodness-of-fit (GOF) test to assess the log-linear effect assumption in a univariate Cox PH model. Rejection of log-linearity suggests the need to incorporate monotonic and non-log-linear covariate effects on the hazard. Our simulation studies show that the proposed GOF test controls type I error rates and exhibits consistency across various scenarios. We illustrate the proposed GOF test with two datasets, breast cancer data and lung cancer data, to assess the presence of log-linear effects in the Cox PH model.
Keywords:
bootstrap; isotonic proportional hazard model; isotonic regression; Kaplan–Meier estimator; likelihood ratio test; partial likelihood
MSC:
92B15
1. Introduction
The Cox proportional hazards (PH) model [1,2] is one of the most commonly applied survival analysis models since it effectively associates the covariates with the survival time through a hazard function. Since the hazard function is also known as the instantaneous risk of experiencing an event of interest, such as death, disease, or failure, given that the individual has survived to that point, the Cox PH model is also widely used in many areas, such as biology, medicine, and the social sciences.
The hazard function in the Cox PH model with time-independent covariates has the following functional form:
$$\lambda(t \mid Z) = \lambda_0(t) \exp(\beta^\top Z), \tag{1}$$
where $Z$ is a vector of covariates, $\beta$ is a coefficient vector, and $\lambda_0(t)$ is a baseline hazard function. In (1), the proportional hazards and the log-linearity of the hazard are two key assumptions of practical interest. The proportional hazards assumption implies that the hazard ratio for different individuals depends only on the values of the covariates and the model coefficient $\beta$. On the other hand, taking the logarithm of (1) gives
$$\log \lambda(t \mid Z) = \log \lambda_0(t) + \beta^\top Z.$$
The log-linearity of the hazard enables researchers to analyze the linear effect of every unit change in the covariate on the log of the hazard function.
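As a brief worked illustration of the log-linear form, consider a single covariate and compare two individuals whose covariate values differ by one unit. The scalar notation $z$ and $\beta$ is introduced here only for illustration.

```latex
\begin{align*}
\log \lambda(t \mid z + 1) - \log \lambda(t \mid z)
  &= \{\log \lambda_0(t) + \beta (z + 1)\} - \{\log \lambda_0(t) + \beta z\} = \beta, \\
\frac{\lambda(t \mid z + 1)}{\lambda(t \mid z)} &= e^{\beta}.
\end{align*}
```

The difference depends neither on $t$ (proportional hazards) nor on $z$ (log-linearity). This constant slope on the log-hazard scale is the property examined in the remainder of the paper.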
While most of the existing literature focuses on testing the proportional hazards assumption [3,4], our primary focus is on validating the log-linearity assumption. This assumption is critical, as assessing its validity provides a foundation for exploring alternative monotonic effects when the log-linear effect may not hold. To this end, we propose a goodness-of-fit (GOF) test designed to evaluate the log-linearity assumption in the Cox PH model.
In the literature, martingale-based residual methods have been widely used to investigate and test the GOF for log-linearity. Therneau et al. [5] introduced a graphical approach using smoothed martingale residuals to examine the functional form of covariate effects. Lin et al. [6] proposed an analytical test for log-linearity based on partial cumulative sums of martingale residuals, providing asymptotic properties, such as the limiting Gaussian process of the residuals and the null distribution of the test statistics. However, the existing literature has not explicitly addressed testing log-linearity against monotonic effects.
In this work, we consider a univariate covariate $Z$ and a monotonic function $\psi$ to explore the monotonic effect. Relaxing $\beta Z$ in the Cox PH model to $\psi(Z)$, Chung et al. [7] considered the isotonic PH model with a hazard function
$$\lambda(t \mid Z) = \lambda_0(t) \exp\{\psi(Z)\},$$
which maintains the proportional hazards assumption. Similar to the estimation process in the traditional Cox PH model, Chung et al. [7] developed nonparametric partial-likelihood-based estimation for $\psi$. Therefore, a ratio of partial likelihoods from the Cox and isotonic PH models is a natural test statistic for checking log-linearity. Inspired by Xu et al. [8], we propose a bootstrapping method conditional on covariates and censoring times to determine the critical values.
The remaining sections are organized as follows. Section 2 reviews estimation in the Cox PH model and the isotonic PH model for a univariate time-independent covariate. In Section 3, we propose the GOF test for the Cox PH model under monotonicity constraints. We provide a numerical study in Section 4 to evaluate the performance of the GOF test. Lastly, in Section 5, we apply the proposed GOF test to two data examples: a breast cancer study conducted by the German Breast Cancer Study Group and a lung cancer study conducted by the North Central Cancer Treatment Group [9], both available in the R package survival. All the R code is available on GitHub at https://github.com/cftang9/LLGOF_UniCoxPH (accessed on 30 June 2025).
2. Partial Likelihood Estimators
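We assume that the survival time $T_i$ follows a continuous distribution associated with a covariate $Z_i$, for $i = 1, \ldots, n$. Let $C_i$ denote the censoring time. We observe $X_i = \min(T_i, C_i)$ together with the censoring indicator $\Delta_i = I(T_i \le C_i)$, where $I$ is an indicator function. We assume that $T_i$ and $C_i$ are independent and that $C_i$ follows a distribution $G$. Thus, the observed data consist of triplets $(X_i, \Delta_i, Z_i)$, $i = 1, \ldots, n$.
A univariate Cox PH model relates the hazard of $T_i$ and the covariate $Z_i$ as follows:
$$\lambda(t \mid Z_i) = \lambda_0(t) \exp(\beta Z_i), \tag{2}$$
where $\beta$ is a scalar coefficient and $\lambda_0(t)$ is a baseline hazard function that does not depend on the covariates. The parameter $\beta$ is estimated by maximizing the partial likelihood $L(\beta)$, given by
$$L(\beta) = \prod_{i=1}^{n} \prod_{t \ge 0} \left\{ \frac{\exp(\beta Z_i)}{\sum_{j=1}^{n} Y_j(t) \exp(\beta Z_j)} \right\}^{dN_i(t)},$$
where $N_i(t) = I(X_i \le t, \Delta_i = 1)$ are the counting processes indicating that the event of interest occurs at or before time $t$, and $Y_i(t) = I(X_i \ge t)$ are the at-risk processes indicating that neither the event of interest nor censoring has occurred before time $t$. The log partial likelihood, denoted by $\ell(\beta)$, is given by
$$\ell(\beta) = \sum_{i=1}^{n} \int_0^\infty \left[ \beta Z_i - \log\left\{ \sum_{j=1}^{n} Y_j(t) \exp(\beta Z_j) \right\} \right] dN_i(t).$$
The maximum partial likelihood estimator (MPLE) is denoted by $\hat{\beta}$ and can be obtained numerically by the Newton–Raphson method.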
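The Newton–Raphson step for the univariate Cox log partial likelihood can be sketched as follows. This is a minimal illustration rather than the authors' R implementation: it assumes no tied observation times, and the function names, the simulated data, and the true coefficient of 1.0 are all hypothetical choices made for the example.

```python
import numpy as np

def cox_partial_terms(beta, x, delta, z):
    """Log partial likelihood, score, and observed information (no tied times assumed)."""
    order = np.argsort(-x)                 # decreasing observed time
    z_s, d_s = z[order], delta[order]
    w = np.exp(beta * z_s)
    # Running sums over decreasing time are the risk-set sums over {j: X_j >= X_i}.
    s0 = np.cumsum(w)
    s1 = np.cumsum(w * z_s)
    s2 = np.cumsum(w * z_s ** 2)
    ev = d_s == 1                          # only uncensored subjects contribute
    loglik = np.sum(beta * z_s[ev] - np.log(s0[ev]))
    score = np.sum(z_s[ev] - s1[ev] / s0[ev])
    info = np.sum(s2[ev] / s0[ev] - (s1[ev] / s0[ev]) ** 2)
    return loglik, score, info

def cox_mple(x, delta, z, tol=1e-10, max_iter=50):
    """Newton-Raphson iterations started at beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        _, score, info = cox_partial_terms(beta, x, delta, z)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical simulated data: unit baseline hazard, true coefficient 1.0.
rng = np.random.default_rng(0)
n = 500
z = rng.uniform(-1.0, 1.0, size=n)
t = -np.log(1.0 - rng.uniform(size=n)) * np.exp(-1.0 * z)
c = rng.uniform(0.0, 4.0, size=n)
x = np.minimum(t, c)
delta = (t <= c).astype(int)

beta_hat = cox_mple(x, delta, z)
_, final_score, final_info = cox_partial_terms(beta_hat, x, delta, z)
```

Because the log partial likelihood is concave in the coefficient, plain Newton–Raphson without step halving is usually adequate in the univariate case; a production implementation would also handle ties.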
When a monotonic covariate effect is preferable to the log-linear effect in (2), it is natural to relax $\beta Z_i$ to $\psi(Z_i)$, where $\psi$ is monotonic. Hence, (2) can be relaxed to
$$\lambda(t \mid Z_i) = \lambda_0(t) \exp\{\psi(Z_i)\}, \tag{5}$$
which is also known as the isotonic PH model [7]. Without loss of generality, we assume that is a non-decreasing function. Under the isotonic PH model, the partial likelihood is slightly different. Define as the ith-order statistic of . Given a monotonic function , define as the value of at for , where is the number of uncensored subjects, so that . We further denote as the vector of parameters and for and , which are the intervals formed by order statistics . Chung et al. [7] proposed the partial likelihood for the isotonic PH model (5) as
where the counting process and the at-risk process , where . Hence, the log partial likelihood can be defined accordingly as
Note that the log partial likelihood is unchanged when any constant $c$ is added to $\psi$. To resolve this identifiability issue, it suffices to restrict the monotonic function to equal zero at a fixed anchor point. In this work, we take the anchor point to be the median of the covariates, as in Chung et al. [7]. They also developed a computationally efficient pseudo-iterative convex minorant algorithm to obtain the MPLE, denoted by $\hat{\psi}$.
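The identifiability issue can be checked numerically. The sketch below evaluates a log partial likelihood for arbitrary per-subject log-hazard effects and confirms that adding a constant leaves it unchanged, so anchoring the effect at the covariate median loses nothing. The square-root effect, the toy data, and all names are hypothetical, and tied times are assumed absent.

```python
import numpy as np

def log_partial_likelihood(psi_vals, x, delta):
    """Log partial likelihood for log-hazard effects psi(Z_i); no tied times assumed."""
    order = np.argsort(-x)                        # decreasing observed time
    p, d = psi_vals[order], delta[order]
    log_risk = np.logaddexp.accumulate(p)         # log of the risk-set sums of exp(psi)
    return np.sum((p - log_risk)[d == 1])

rng = np.random.default_rng(3)
n = 60
z = rng.uniform(0.0, 4.0, size=n)
x = rng.exponential(size=n)                       # toy observed times
delta = rng.integers(0, 2, size=n)                # toy censoring indicators

psi = np.sqrt                                     # hypothetical non-decreasing effect
ll_base = log_partial_likelihood(psi(z), x, delta)
ll_shift = log_partial_likelihood(psi(z) + 3.7, x, delta)

# Anchor the effect so that it is zero at the median covariate value.
anchor = np.median(z)
ll_anchor = log_partial_likelihood(psi(z) - psi(anchor), x, delta)
value_at_anchor = psi(anchor) - psi(anchor)
```

All three log partial likelihood values agree up to floating-point error, which is why only the shape of the effect, and not its level, can be estimated.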
3. Goodness-of-Fit Test
Here, we propose a GOF test for the log-linearity in the Cox PH model (2) under the isotonic PH model (5). The null and alternative hypotheses, and , respectively, are defined as follows:
Given a sample , we have MPLEs and of partial likelihood (4) and (6) under the Cox PH model (2) and isotonic PH model (5), respectively. Under , it is natural to consider a restricted MPLE . Plugging in the MPLEs into their corresponding partial likelihoods, we propose a partial-likelihood-based test statistic
Since the linear relationship with under is a special case of a monotone relationship under , it is expected that and . Therefore, large values of suggest rejecting . On the other hand, if in (8) is decreasing in Z, the restricted MPLE can be replaced by a restricted MPLE and under . Then, the test statistic is obtained by and the log-linearity is rejected when is too small.
We propose a bootstrap method to determine a critical value for rejecting the null hypothesis. Inspired by Xu et al. [8], we consider a bootstrapping approach conditional on covariates and censoring times under the null hypothesis. To generate the survival times in the bootstrap samples, we apply the traditional inverse distribution function method. Note that the survival function of the Cox PH model is $S(t \mid Z) = \exp\{-\Lambda_0(t) \exp(\beta Z)\}$, where $\Lambda_0(t) = \int_0^t \lambda_0(s)\,ds$ is the cumulative baseline hazard function. Therefore, provided that the inverse exists, the inverse of $S(\cdot \mid Z)$ is given by $S^{-1}(u \mid Z) = \Lambda_0^{-1}\{-\log(u) \exp(-\beta Z)\}$ for $u \in (0, 1)$, where $\Lambda_0^{-1}$ is the inverse of $\Lambda_0$.
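A minimal sketch of the inverse distribution function method for the Cox PH model follows. It assumes a known cumulative baseline hazard equal to the identity (a unit constant baseline hazard), which is an assumption made only for this example; the coefficient value and function name are likewise hypothetical.

```python
import numpy as np

def inverse_survival(u, z, beta, cum_hazard_inv):
    """Solve S(t | z) = u for t, where S(t | z) = exp(-Lambda_0(t) * exp(beta * z))."""
    return cum_hazard_inv(-np.log(u) * np.exp(-beta * z))

rng = np.random.default_rng(1)
beta = 0.8                                   # hypothetical coefficient
z = rng.uniform(-1.0, 1.0, size=5)
u = 1.0 - rng.uniform(size=5)                # uniform draws in (0, 1]

# Assumed baseline: Lambda_0(t) = t, so its inverse is the identity.
t = inverse_survival(u, z, beta, cum_hazard_inv=lambda v: v)

# Plugging the generated times back into the survival function recovers u.
s_check = np.exp(-t * np.exp(beta * z))
```

Passing the inverse cumulative hazard as a function keeps the same routine usable when the baseline is replaced by an estimate.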
Since the estimators of and are required when estimating , we first consider the Breslow-type estimator [10]
and the corresponding inverse defined by for . Note that is a step function such that the bootstrapped survival times generated from it may contain ties, potentially reducing the efficiency and stability of the proposed test. To address this issue, we consider a continuous and piecewise linear version of , denoted by , to smooth and avoid ties of the bootstrapped survival times and enhance the performance of the test. Specifically, denote , and are the locations of jump points of such that for . Then, we define for , , and for . The corresponding inverse is defined by
Therefore, in the bth bootstrapped sample, we generate the survival times
for the , where are random variables independently and identically generated from a univariate uniform distribution with support , denoted by .
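The steps above can be sketched as follows: compute a Breslow-type cumulative baseline hazard, join its jump points linearly starting from the origin, and invert the resulting continuous function to draw tie-free bootstrap survival times. This is one possible reading of the construction, not the authors' code. In particular, the linear extrapolation beyond the last jump, the toy data, and the coefficient value are assumptions made here, and tied times are assumed absent.

```python
import numpy as np

def breslow_cumhaz(x, delta, z, beta):
    """Event times and Breslow-type cumulative baseline hazard values at those times."""
    order = np.argsort(x)
    x_s, d_s, z_s = x[order], delta[order], z[order]
    w = np.exp(beta * z_s)
    risk = np.cumsum(w[::-1])[::-1]          # risk-set sums over {j: X_j >= X_i}
    increments = d_s / risk                  # zero increment at censored times
    ev = d_s == 1
    return x_s[ev], np.cumsum(increments)[ev]

def make_smoothed_inverse(times, cumhaz):
    """Inverse of the piecewise-linear interpolant through (0, 0) and the jump points."""
    t_k = np.concatenate(([0.0], times))
    h_k = np.concatenate(([0.0], cumhaz))
    # Assumption: beyond the last jump, continue with the slope of the last segment.
    tail_slope = (t_k[-1] - t_k[-2]) / (h_k[-1] - h_k[-2])

    def inverse(v):
        v = np.asarray(v, dtype=float)
        inside = np.interp(v, h_k, t_k)
        tail = t_k[-1] + tail_slope * (v - h_k[-1])
        return np.where(v > h_k[-1], tail, inside)

    return inverse

# Toy data (hypothetical); beta_hat stands in for the estimate under the null.
x = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
delta = np.array([1, 0, 1, 1, 0])
z = np.array([0.5, -0.2, 0.1, 0.9, -0.7])
beta_hat = 0.3

times, cumhaz = breslow_cumhaz(x, delta, z, beta_hat)
inverse = make_smoothed_inverse(times, cumhaz)

rng = np.random.default_rng(2)
u = 1.0 - rng.uniform(size=z.shape[0])       # uniform draws in (0, 1]
t_star = inverse(-np.log(u) * np.exp(-beta_hat * z))
```

Because the interpolant is strictly increasing, distinct uniform draws map to distinct survival times, which is the motivation given in the text for smoothing the step function.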
Based on the censoring status, we generate the bootstrapped censoring time for the ith subject in the bth bootstrapped sample. If the survival time is censored (censoring indicator equal to 0), the bootstrapped censoring time is set to the original censoring time. If the survival time is observed (censoring indicator equal to 1), we again apply the inverse distribution method, now using the estimated censoring distribution G. We approximate G by the traditional Kaplan–Meier estimator computed from the data with the censoring indicators reversed, so that censoring is treated as the event. As with the bootstrapped survival times, we linearly interpolate this estimator to obtain a continuous version. Then, we obtain the corresponding inverse
where , are the locations of jump points of such that for . Hence, we obtain
where is independently and identically generated from and is independent from .
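A hedged sketch of the censoring-time step is given below: a Kaplan–Meier estimate of the censoring distribution obtained by reversing the indicators, followed by linear interpolation and inversion. The toy data and function names are hypothetical, tied times are assumed absent, and the draw shown is unconditional, which may differ from the exact form of (11).

```python
import numpy as np

def kaplan_meier(x, event):
    """Event times and Kaplan-Meier survival estimates just after each (no ties assumed)."""
    order = np.argsort(x)
    x_s, e_s = x[order], event[order]
    n = x_s.shape[0]
    at_risk = n - np.arange(n)                       # subjects still at risk
    factors = np.where(e_s == 1, 1.0 - 1.0 / at_risk, 1.0)
    surv = np.cumprod(factors)
    keep = e_s == 1
    return x_s[keep], surv[keep]

# Toy data (hypothetical): delta = 1 means the survival time was observed.
x = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
delta = np.array([1, 0, 1, 1, 0])

# Reverse the indicators so that censoring plays the role of the event.
cens_times, cens_surv = kaplan_meier(x, 1 - delta)
g_hat = 1.0 - cens_surv                              # estimated censoring distribution

# Linear interpolation through (0, 0) and the jump points, then inversion.
# If the largest observation is not a censoring, g_hat stops below 1 and
# np.interp clamps to the last jump time.
g_knots = np.concatenate(([0.0], g_hat))
t_knots = np.concatenate(([0.0], cens_times))

def g_inverse(v):
    return np.interp(v, g_knots, t_knots)

rng = np.random.default_rng(4)
c_star = g_inverse(rng.uniform(size=3))              # bootstrapped censoring times
```

For subjects whose survival time was censored, this step is skipped and the original censoring time is reused, as described in the text.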
In summary, given the observed data, we independently generate B bootstrapped survival and censoring times from (10) and (11), respectively, and obtain the bootstrapped observed times and censoring indicators for each subject. Using each bootstrapped sample, we calculate the bootstrapped test statistic. The critical value is approximated by the upper α quantile of the B bootstrapped test statistics. We reject the null hypothesis when the observed test statistic exceeds this critical value.
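The final decision step can be sketched in a few lines. The example assumes that the bootstrapped test statistics have already been computed; the values used here, the level 0.05, and the function name are placeholders chosen for illustration.

```python
import numpy as np

def bootstrap_decision(t_obs, t_boot, alpha):
    """Upper-alpha bootstrap quantile and the resulting rejection decision."""
    critical_value = np.quantile(t_boot, 1.0 - alpha)
    return critical_value, bool(t_obs > critical_value)

# Placeholder bootstrapped statistics 1, 2, ..., 100 and a hypothetical level.
t_boot = np.arange(1.0, 101.0)
crit, reject_large = bootstrap_decision(t_obs=97.0, t_boot=t_boot, alpha=0.05)
_, reject_small = bootstrap_decision(t_obs=90.0, t_boot=t_boot, alpha=0.05)
```

`np.quantile` uses linear interpolation between order statistics by default; other quantile definitions would give slightly different critical values when B is small.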
4. Simulation
4.1. Size and Power Study
We set a significance level of for Section 4 and Section 5. To evaluate the size and power of the proposed GOF test, we consider seven different functions over in the isotonic PH model in (5) with the constant baseline hazard . Three effect functions are linear (, and , satisfying ), and the remaining four correspond to nonlinear increasing functions: , , and , satisfying . The covariates were independently generated from scaled and shifted beta distributions over the support with densities proportional to for , with . We consider three covariate distribution scenarios, denoted by , , and , corresponding to parameter pairs , and , respectively. In scenario , covariate values are concentrated near the center of the support. In contrast, scenario emphasizes values near the boundaries, while scenario results in a uniform distribution of covariates across the support.
For each scenario, 500 Monte Carlo samples were generated to approximate the probabilities of rejecting . We consider to assess the effect of sample size on test performance. Censoring times are drawn from a uniform distribution with such that the censoring rate is about . For each sample, critical values for the proposed GOF test are computed using bootstrap replications conditional on covariates and censoring times. We compare the proposed GOF test with the commonly applied residual-based GOF test proposed by Lin et al. [6], denoted by , which does not rely on the isotonic PH assumption.
Under , according to the results in Table 1, all the type-I error probabilities are below and do not significantly exceed the nominal level , with a margin of error of for 500 Monte Carlo samples at a confidence level of , where is the 99.5th quantile of the standard normal distribution. Regarding the proposed test , the case with generally exhibits slightly larger type I error probabilities compared to and . Conversely, does not exhibit systematic patterns. Furthermore, neither nor demonstrates notable differences across the covariate distributions considered.
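The Monte Carlo margin of error mentioned above can be reproduced with a normal approximation to the binomial rejection count. In the sketch below, the nominal level 0.05 is an assumption made for illustration, and the 99% confidence level is inferred from the 99.5th normal quantile mentioned in the text; only the 500 replications are taken directly from the text.

```python
from math import sqrt
from statistics import NormalDist

def monte_carlo_margin(alpha, n_rep, conf_level):
    """Normal-approximation margin of error for an estimated rejection rate."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)
    return z * sqrt(alpha * (1.0 - alpha) / n_rep)

# Assumed nominal level 0.05; 500 replications; 99% confidence level.
margin = monte_carlo_margin(alpha=0.05, n_rep=500, conf_level=0.99)
upper_bound = 0.05 + margin
```

Under these assumptions, an estimated type I error rate below `upper_bound` is compatible with the nominal level at the stated confidence level.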
Table 1.
Rejection rates for the GOF tests and with covariate distributions , , and .
Under , both and exhibit powers approaching 1 as sample size n increases, suggesting that both tests are capable of detecting deviations from log-linearity in the hazard function with high probability for large sample sizes. Among the considered covariate distributions, generally leads to the highest power, followed by and then , since there is greater curvature near the boundaries, which aligns well with the boundary-focused distribution . Notably, shows greater power for small sample sizes for , , and . When the sample size grows, continues to outperform under when the covariates focus more around the center of the support, where is not as curved around boundaries. Conversely, demonstrates higher power under , while, under , the performances of and are generally comparable, with neither test consistently dominating. Lastly, for , although the power of tends to be lower than , it remains competitive across most of the covariate distributions and sample sizes.
4.2. Evaluation of Robustness Across Baseline Hazards
We evaluate the robustness of the tests and under different baseline hazard functions, using the effect functions and uniform covariate distribution as in Section 4.1. Specifically, we consider baseline hazards generated from Gompertz distributions, denoted by G, with shape parameter and scale parameter b. Two Gompertz models are used: G and G. In contrast to the constant baseline hazard from the exponential distribution with mean 1, both G and G have increasing hazard functions, while G increases faster. The distinct shapes of these hazard functions are illustrated in Figure 1.
Figure 1.
Hazard functions for the exponential distribution with mean 1 and the Gompertz distributions G and G.
Under , Table 2 shows that both and have similar type I error probabilities lower than , maintaining well-controlled type-I error rates, as shown in the size study in Section 4.1. Under , the test demonstrates robustness between different baseline hazard functions, with most power differences between G and G remaining below . In contrast, is more sensitive to the changes in the baseline hazards, with power differences up to and when with and 1000, respectively.
Table 2.
Rejection rates for the GOF tests and under baseline hazards G and G. Power differences of at least 0.05 between the two settings are indicated with an asterisk (*), and differences exceeding 0.12 are further marked with a dagger (†).
5. Illustrations with Real Data
5.1. German Breast Cancer Study
Breast cancer is the most common cancer among women worldwide, and although it predominantly affects women, men can also develop breast cancer. It can profoundly impact a person’s life, but with advances in medical research and treatment, the prognosis for many patients has improved significantly. Early detection, comprehensive treatment plans, and ongoing support can help individuals and their families manage the challenges posed by breast cancer and improve their overall quality of life.
Here, we analyze the gbsg dataset in the R package survival, which is also known as the German Breast Cancer Study Group (GBSG) dataset. It comes from a clinical trial conducted by the German Breast Cancer Study Group. The dataset contains information on 686 patients with primary node-positive breast cancer who were treated at 17 centers in Germany between 1984 and 1989. In the dataset, 387 of 686 patients were censored, so the censoring rate is about 56.4%. In addition to the survival time and the censoring status, the dataset includes several variables, such as age, tumor size, and the number of positive lymph nodes.
Here, we wish to test the log-linearity assumption for an important prognostic factor in breast cancer: the number of positive lymph nodes. As suggested by Fitzgibbons et al. [11], a higher number of positive lymph nodes is associated with an increased risk of breast cancer recurrence and a higher risk of death from the disease. We assume the monotonicity constraints to be satisfied and conduct the GOF test with hypotheses
where Z is the number of positive lymph nodes. For the estimation of , we choose such that . With the bootstrap sample size of 500, we obtain the test statistic and bootstrap critical value . We reject the null hypothesis that the log-linearity of the hazard associated with the number of positive lymph nodes is satisfied.
In Figure 2, we present shifted log-hazard effect and , so at . This normalization mitigates identifiability issues and facilitates clearer graphical comparisons. From Figure 2, from the isotonic PH model reveals a faster increase in hazard for a smaller number of positive nodes and deviates from the log-linear effect with constant slope . In addition, the shape of estimate motivates a square-root transformed covariate in the Cox PH model with hazard for some . For comparisons, we added , where is the corresponding restricted MPLE of , in Figure 2. Compared with , the square-root transformed captures the faster increase in hazard for a smaller number of positive nodes but a slower increase for a larger number of positive nodes. To further evaluate the square-root transformation, we apply the proposed GOF test with the following hypotheses:
Figure 2.
Shifted log-hazard effect estimates for the breast cancer dataset. The plot includes from a Cox PH model, from the isotonic PH model, and from a Cox PH model with the square-root transformed covariate .
With test statistic and an estimated critical value from bootstrapped samples, the evidence is not strong enough to reject the null hypothesis, which supports the appropriateness of the square-root transformed covariate in the Cox PH model. This finding is consistent with the approach of Royston and Altman [12], who applied a square-root transformation to the number of nodes when fitting the Cox PH model.
5.2. NCCTG Lung Cancer Data
While progress has been made in recent years, particularly in the areas of early detection, targeted therapies, and immunotherapies, lung cancer remains a challenging problem. Lung cancer holds the top position in cancer-related deaths across the globe, resulting in a higher number of fatalities than the sum of breast, prostate, and colorectal cancer deaths. Compared to early-stage lung cancer, advanced lung cancer, which refers to lung cancer that has progressed to a stage where it has spread beyond the lungs to other parts of the body or has become locally extensive, affecting nearby tissues and structures, typically has a poorer prognosis.
Here, we analyze the cancer dataset in the R package survival, which is collected from a study conducted by the North Central Cancer Treatment Group (NCCTG) on patients with advanced lung cancer. As mentioned by Loprinzi et al. [9], this study aimed to assess if the prognostic information gathered from a patient-completed questionnaire could offer independent insights beyond those already obtained by the patient’s physician through descriptive data. The Karnofsky Performance Score (KPS), rated by patients, assesses a patient’s ability to perform routine daily tasks and activities. The KPS was developed by Karnofsky [13] as a method for evaluating a patient’s functional status, particularly in assessing the response to chemotherapeutic agents in cancer treatment. The score ranges from 0 to 100, with higher scores indicating better functional ability. We consider the relationship between the hazard of failure and the KPS to be decreasing. Therefore, we can apply the GOF test for the Cox PH model to check if the log-linearity assumption is satisfied for the KPS. The hypotheses are as follows:
where Z represents the KPS rated by the patient. In the dataset, 63 of the 225 patients were censored after removing the records with missing values. Here, we choose such that . With the bootstrap sample size of 500, we obtain the test statistic and bootstrap critical value . We fail to reject the null hypothesis that the log-linearity assumption between the hazard of death and the Karnofsky Performance Score is satisfied. The shifted log-linear hazard effect and log-monotonic hazard effect are presented in Figure 3. These two log-hazard effects are close overall, also suggesting that the Cox PH model is a reasonable choice for studying the hazard through the patients’ self-rated KPS.
Figure 3.
Shifted log-hazard effect estimates for the NCCTG lung cancer data. The plot includes from a Cox PH model and from the isotonic PH model, and from a Cox PH model with the square-root transformed covariate .
6. Discussion
In this work, we propose a GOF test for evaluating the log-linearity effect of a univariate covariate in the traditional Cox PH model framed within the isotonic PH model. The bootstrapped critical values used in the test demonstrate well-controlled type I error rates and strong power for detecting deviations from log-linearity. In addition, when the proposed GOF test rejects the log-linearity, from the estimated plot, one can propose a transformation and perform the GOF test for
to check if the transformed is appropriate or further monotonic transformation is needed.
As shown in Section 4.1, the proposed test appears to be conservative, indicating potential for improvement. In addition to the robustness assessment in Section 4.2, we further provide more numerical results for a higher censoring rate of and discrete censoring distributions in the Supplementary Material. As expected, a higher censoring rate leads to more conservative tests; however, the tests remain valid since the power approaches 1 as the sample size increases. On the other hand, we observe that a uniform discrete censoring distribution that mildly discretizes the continuous uniform censoring distribution has a minor impact on the rejection rates. In addition, exploring the asymptotic properties and theoretical justifications is crucial for improving our understanding of the test statistic’s distribution and refining the choice of critical values. Investigating how the shape of the monotonic function affects power could also yield valuable insights guided by these theoretical advancements.
Despite these challenges, extending the proposed partial-likelihood-based GOF test to handle multiple covariates or a partial linear PH model with monotonic effects [7] is a natural and promising direction. However, such extensions require careful consideration and warrant further investigation. Finally, resolving the open problem of understanding the distribution of the log partial likelihood in (7), even for univariate covariates, remains a challenging and necessary task.
Supplementary Materials
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/math13142264/s1.
Author Contributions
Methodology, H.C. and C.-F.T.; Software, H.C.; Writing—original draft, H.C. and C.-F.T.; Writing—review & editing, C.-F.T.; Visualization, H.C.; Supervision, C.-F.T.; Funding acquisition, C.-F.T. All authors have read and agreed to the published version of the manuscript.
Funding
This work was partially supported by the National Science Foundation [DMS 2311292].
Data Availability Statement
The original contributions presented in this study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| CMF | Cyclophosphamide, methotrexate, and fluorouracil |
| GBSG | German Breast Cancer Study Group |
| GOF | Goodness-of-fit |
| KPS | Karnofsky Performance Score |
| MPLE | Maximum partial likelihood estimator |
| NCCTG | North Central Cancer Treatment Group |
| PH | Proportional hazards |
References
1. Cox, D.R. Regression models and life-tables. J. R. Stat. Soc. Ser. B (Methodol.) 1972, 34, 187–202.
2. Cox, D.R. Partial likelihood. Biometrika 1975, 62, 269–276.
3. Grambsch, P.M.; Therneau, T.M. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 1994, 81, 515–526.
4. Therneau, T.M.; Grambsch, P.M. The Cox Model. In Modeling Survival Data: Extending the Cox Model; Springer: New York, NY, USA, 2000.
5. Therneau, T.M.; Grambsch, P.M.; Fleming, T.R. Martingale-based residuals for survival models. Biometrika 1990, 77, 147–160.
6. Lin, D.Y.; Wei, L.J.; Ying, Z. Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika 1993, 80, 557–572.
7. Chung, Y.; Ivanova, A.; Hudgens, M.G.; Fine, J.P. Partial likelihood estimation of isotonic proportional hazards models. Biometrika 2018, 105, 133–148.
8. Xu, G.; Sen, B.; Ying, Z. Bootstrapping a change-point Cox model for survival data. Electron. J. Stat. 2014, 8, 1345.
9. Loprinzi, C.L.; Laurie, J.A.; Wieand, H.S.; Krook, J.E.; Novotny, P.J.; Kugler, J.W.; Bartel, J.; Law, M.; Bateman, M.; Klatt, N.E. Prospective evaluation of prognostic variables from patient-completed questionnaires. North Central Cancer Treatment Group. J. Clin. Oncol. 1994, 12, 601–607.
10. Breslow, N.E. Contribution to discussion of paper by DR Cox. J. R. Stat. Soc. Ser. B 1972, 34, 216–217.
11. Fitzgibbons, P.L.; Page, D.L.; Weaver, D.; Thor, A.D.; Allred, D.C.; Clark, G.M.; Ruby, S.G.; O’Malley, F.; Simpson, J.F.; Connolly, J.L.; et al. Prognostic factors in breast cancer: College of American Pathologists consensus statement 1999. Arch. Pathol. Lab. Med. 2000, 124, 966–978.
12. Royston, P.; Altman, D.G. External validation of a Cox prognostic model: Principles and methods. BMC Med. Res. Methodol. 2013, 13, 1–15.
13. Karnofsky, D.A. The clinical evaluation of chemotherapeutic agents in cancer. In Evaluation of Chemotherapeutic Agents; Columbia University Press: New York, NY, USA, 1949; pp. 191–205.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).