Article

Proportional Odds Hazard Model for Discrete Time-to-Event Data

by Maria Gabriella Figueiredo Vieira 1, Marcílio Ramos Pereira Cardial 2, Raul Matsushita 1 and Eduardo Yoshio Nakano 1,*

1 Department of Statistics, University of Brasilia, Campus Darcy Ribeiro, Asa Norte, Brasília 70910-900, Brazil
2 Institute of Mathematical and Computer Sciences, University of São Paulo, São Carlos 13566-590, Brazil
* Author to whom correspondence should be addressed.
Axioms 2023, 12(12), 1102; https://doi.org/10.3390/axioms12121102
Submission received: 31 October 2023 / Revised: 28 November 2023 / Accepted: 1 December 2023 / Published: 6 December 2023
(This article belongs to the Special Issue New Trends in Discrete Probability and Statistics)

Abstract

In this article, we develop the proportional odds hazard model for discrete time-to-event data. Inference on the model's parameters is formulated under right censoring, with the discrete Weibull and discrete log-logistic distributions as baseline distributions. Simulation studies were carried out to check the asymptotic properties of the estimators. In addition, procedures for checking the proportional odds assumption are proposed, and the model is illustrated using a dataset on the survival time of patients with low back pain.

1. Introduction

The proportional hazards model [1] is a regression model widely used in survival data analysis, whose main characteristic is that the covariates act multiplicatively on the hazard function. However, this characteristic cannot be guaranteed when survival times are discrete (intrinsically discrete or grouped into intervals), since the hazard function is then a conditional probability bounded between 0 and 1, and a multiplicative covariate effect may push it outside this range. According to [2], statistical methods specially designed for discrete times have many advantages. Indeed, [3] showed through simulation studies and an application to real data that it is inadvisable to use a continuous model to analyze discrete data.
Given the importance of treating discrete time-to-event data correctly, and since the proportional hazards model is not always the most suitable choice for such data, the proportional odds hazard model has been used with some frequency in the literature for this purpose. This model is an alternative proposed by [1] for discrete time-to-event data, in which the covariates have a multiplicative effect on the odds hazard.
A comprehensive study of the model under various link functions is presented in [2], and semiparametric extensions of the model are discussed in [4]. Applications of this model are given in [5,6,7,8].
The popularity of this model is due, in part, to the fact that users need not specify the baseline hazard, which therefore receives less attention in these studies. However, according to [9], the behavior of the hazard function is of potential medical interest because it is directly related to the course of a disease. To estimate this hazard function informatively (i.e., smoothly), a parametric model may be appropriate. In this context, parametric models for discrete response variables, which describe the baseline hazard efficiently, become fundamental. In recent years, many articles have dealt with discrete distributions obtained by discretizing continuous distributions in a survival analysis context; among these are the discrete Weibull (DW) distribution [10,11], the discrete Weibull geometric [12], the exponentiated discrete Weibull (EDW) [13], the discrete Gumbel [14], the discrete Burr [15] and the discrete log-logistic [16] distributions.
This work aims to formulate the proportional odds hazard model with the discrete Weibull and discrete log-logistic distributions as baseline distributions, and to estimate the model's parameters by maximum likelihood for right-censored data. The Weibull distribution was chosen for its popularity in modeling survival data, and the log-logistic distribution because it allows modeling data with non-monotonic hazards. The behavior of the maximum likelihood estimators was assessed through simulation studies. Finally, the proposed methodology is illustrated using a dataset whose response variable is the number of unsuccessful sessions before pain relief or reduction in patients with low back pain [17].

2. Discrete Random Variables for Time-to-Event Data

Let $T$ be a discrete random variable that takes on non-negative integer values ($T = 0, 1, 2, \dots$), whose probability function, survival function and hazard function are defined, respectively, by $p(t) = P(T = t)$, $S(t) = P(T > t)$ and $h(t) = P(T = t \mid T \geq t)$, $t = 0, 1, \dots$. Other relationships can be established from these functions, such as:

$$S(t) = P(T > t) = \sum_{k=\lfloor t \rfloor + 1}^{\infty} P(T = k)\, I\{t = 0, 1, 2, \dots\}, \tag{1}$$

$$h(t) = \frac{p(t)}{S(t) + p(t)}\, I\{t = 0, 1, 2, \dots\}, \tag{2}$$

$$p(t) = \left[1 - S(0)\right]^{I\{t = 0\}} \left[S(t-1) - S(t)\right]^{1 - I\{t = 0\}} I\{t = 0, 1, 2, \dots\} = \left[h(0)\right]^{I\{t = 0\}} \left[h(t)\, S(t-1)\right]^{1 - I\{t = 0\}} I\{t = 0, 1, 2, \dots\} \tag{3}$$

and

$$S(t) = \prod_{k=0}^{\lfloor t \rfloor} \left[1 - h(k)\right] I\{t = 0, 1, 2, \dots\}. \tag{4}$$

In Equations (1) and (4), $\lfloor t \rfloor$ denotes the largest integer less than or equal to $t$. More details on these functions and relationships can be found in [2].
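As a quick numerical illustration of relationships (1) to (4), the following Python sketch computes $S(t)$ and $h(t)$ from a probability function and then recovers $S(t)$ and $p(t)$ from the hazard alone. The four-point probability function is hypothetical, chosen only so the arithmetic is easy to follow, and the helper names are illustrative.

```python
# Sketch: numerically check relationships (1)-(4) between p(t), S(t) and h(t).
# Assumption: the pmf is a finite, hypothetical list p[0..K] summing to 1.


def survival(p):
    """S(t) = P(T > t) = sum of p(k) for k > t, as in Equation (1)."""
    total, cum, out = sum(p), 0.0, []
    for pt in p:
        cum += pt
        out.append(total - cum)
    return out


def hazard(p):
    """h(t) = p(t) / (S(t) + p(t)), as in Equation (2)."""
    return [pt / (st + pt) for pt, st in zip(p, survival(p))]


def survival_from_hazard(h):
    """S(t) = prod_{k=0}^{t} [1 - h(k)], as in Equation (4)."""
    out, prod = [], 1.0
    for hk in h:
        prod *= 1.0 - hk
        out.append(prod)
    return out


def pmf_from_hazard(h):
    """p(0) = h(0) and p(t) = h(t) S(t - 1) for t >= 1, as in Equation (3)."""
    s = survival_from_hazard(h)
    return [h[0]] + [h[t] * s[t - 1] for t in range(1, len(h))]


p = [0.1, 0.3, 0.4, 0.2]  # hypothetical pmf on {0, 1, 2, 3}
h = hazard(p)
print(h)                   # the last hazard equals 1 (no mass beyond t = 3)
print(pmf_from_hazard(h))  # recovers p up to floating-point rounding
```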

2.1. Discrete Weibull Distribution (DW)

The discrete Weibull distribution (DW) was first proposed by [10]. For $T \sim \mathrm{DW}(q, \eta)$, with $0 < q < 1$ and $\eta > 0$, the probability function is given by:

$$p_{dw}(t \mid q, \eta) = \left(q^{t^{\eta}} - q^{(t+1)^{\eta}}\right) I\{t = 0, 1, 2, \dots\}. \tag{5}$$

The survival and hazard functions of the DW, obtained from Equations (1) and (2), are, respectively:

$$S_{dw}(t \mid q, \eta) = q^{(\lfloor t \rfloor + 1)^{\eta}}\, I\{t \geq 0\} \tag{6}$$

and

$$h_{dw}(t \mid q, \eta) = \frac{q^{t^{\eta}} - q^{(t+1)^{\eta}}}{q^{t^{\eta}}}\, I\{t = 0, 1, 2, \dots\}. \tag{7}$$

In Equation (6), $\lfloor t \rfloor$ denotes the largest integer less than or equal to $t$.
According to [11], the shape of the DW hazard function is determined by the shape parameter $\eta$: when $\eta > 1$, the hazard function is strictly increasing; when $\eta < 1$, it is strictly decreasing; and when $\eta = 1$, it is constant, in which case the DW reduces to the geometric distribution, the discrete analog of the exponential distribution [3].
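The DW functions (5) to (7) translate directly into code. The minimal Python sketch below (function names and parameter values are illustrative) evaluates them and prints the first few hazards for $\eta < 1$, $\eta = 1$ and $\eta > 1$, which should display the decreasing, constant and increasing shapes described above.

```python
import math

# Sketch of the DW functions (5)-(7); the q and eta values below are illustrative.


def dw_pmf(t, q, eta):
    """Equation (5): q^(t^eta) - q^((t+1)^eta), t = 0, 1, 2, ..."""
    return q ** (t ** eta) - q ** ((t + 1) ** eta)


def dw_surv(t, q, eta):
    """Equation (6): q^((floor(t)+1)^eta), valid for real t >= 0."""
    return q ** ((math.floor(t) + 1) ** eta)


def dw_hazard(t, q, eta):
    """Equation (7), simplified: 1 - q^((t+1)^eta - t^eta)."""
    return 1.0 - q ** ((t + 1) ** eta - t ** eta)


# The hazard shape is governed by eta: decreasing, constant, increasing.
for eta in (0.5, 1.0, 2.0):
    print(eta, [round(dw_hazard(t, 0.9, eta), 4) for t in range(5)])
```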

2.2. Discrete Log-Logistic Distribution (DLL)

Let $T$ follow the discrete log-logistic distribution (DLL), the discrete analog of the continuous log-logistic distribution, with parameters $\alpha > 0$ and $\eta > 0$, denoted by $T \sim \mathrm{DLL}(\alpha, \eta)$; some important results on this distribution are presented in [16]. Its probability, survival and hazard functions are given by:

$$p_{dll}(t \mid \alpha, \eta) = \left[\frac{1}{1 + (t/\alpha)^{\eta}} - \frac{1}{1 + [(t+1)/\alpha]^{\eta}}\right] I\{t = 0, 1, 2, \dots\}, \tag{8}$$

$$S_{dll}(t \mid \alpha, \eta) = \frac{1}{1 + [(\lfloor t \rfloor + 1)/\alpha]^{\eta}}\, I\{t \geq 0\}, \tag{9}$$

and

$$h_{dll}(t \mid \alpha, \eta) = \left[1 - \frac{1 + (t/\alpha)^{\eta}}{1 + [(t+1)/\alpha]^{\eta}}\right] I\{t = 0, 1, 2, \dots\}. \tag{10}$$

In Equation (9), $\lfloor t \rfloor$ denotes the largest integer less than or equal to $t$. According to [14], the DLL is a particular case of the discrete Burr distribution studied by [15], which is the discrete analog of the continuous Burr distribution.
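A similar Python sketch for the DLL, under the same caveats: it implements (8) to (10) directly and, for the illustrative values $\alpha = 5$ and $\eta = 3$ (not taken from the paper), locates the maximum of the hazard, showing the non-monotonic (increasing, then decreasing) shape that motivated the choice of this distribution.

```python
import math

# Sketch of the DLL functions (8)-(10); alpha = 5 and eta = 3 are illustrative.


def dll_pmf(t, alpha, eta):
    """Equation (8)."""
    return 1.0 / (1.0 + (t / alpha) ** eta) - 1.0 / (1.0 + ((t + 1) / alpha) ** eta)


def dll_surv(t, alpha, eta):
    """Equation (9), valid for real t >= 0 through floor(t)."""
    return 1.0 / (1.0 + ((math.floor(t) + 1) / alpha) ** eta)


def dll_hazard(t, alpha, eta):
    """Equation (10): 1 - [1 + (t/alpha)^eta] / [1 + ((t+1)/alpha)^eta]."""
    return 1.0 - (1.0 + (t / alpha) ** eta) / (1.0 + ((t + 1) / alpha) ** eta)


# The hazard rises to a peak and then falls.
hs = [dll_hazard(t, 5.0, 3.0) for t in range(31)]
peak = max(range(31), key=lambda t: hs[t])
print(peak, round(hs[peak], 4))
```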

3. Materials and Methods

3.1. Proportional Odds Hazard Model for Discrete Time-to-Event

A discrete non-negative random variable $T$, representing the time until the occurrence of the event of interest, follows the proportional odds hazard model (POHM) if [1]:

$$\frac{h(t \mid \mathbf{z})}{1 - h(t \mid \mathbf{z})} = \exp\{\mathbf{z}^{\top}\boldsymbol{\beta}\}\, \frac{h_0(t)}{1 - h_0(t)}, \tag{11}$$

where $h_0(\cdot)$ is the baseline hazard function and $\boldsymbol{\beta} = (\beta_1, \dots, \beta_p)^{\top}$ is the vector of coefficients associated with the vector of covariates $\mathbf{z} = (z_1, \dots, z_p)^{\top}$.
Note that the intercept $\beta_0$ does not appear in the linear predictor because the baseline hazard function $h_0(t)$ absorbs this constant term. This model is a discrete version of the Cox proportional hazards model that accommodates an appreciable number of ties.
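From expression (11), it is possible to establish the hazard function in the presence of covariates:

$$h(t \mid \mathbf{z}) = \frac{\exp\{\mathbf{z}^{\top}\boldsymbol{\beta}\}\, h_0(t)}{1 + (\exp\{\mathbf{z}^{\top}\boldsymbol{\beta}\} - 1)\, h_0(t)}\, I\{t = 0, 1, 2, \dots\}. \tag{12}$$

From (4) and (12), the survival function in the presence of covariates can be written as:

$$S(t \mid \mathbf{z}) = \prod_{k=0}^{\lfloor t \rfloor} \frac{1 - h_0(k)}{1 + (\exp\{\mathbf{z}^{\top}\boldsymbol{\beta}\} - 1)\, h_0(k)}\, I\{t \geq 0\}, \tag{13}$$

where $\lfloor t \rfloor$ denotes the largest integer less than or equal to $t$.
Furthermore, using expressions (3) and (4), the probability function in the presence of covariates is:

$$p(t \mid \mathbf{z}) = \left[\frac{\exp\{\mathbf{z}^{\top}\boldsymbol{\beta}\}\, h_0(0)}{1 + (\exp\{\mathbf{z}^{\top}\boldsymbol{\beta}\} - 1)\, h_0(0)}\right]^{I\{t = 0\}} \times \left[\frac{\exp\{\mathbf{z}^{\top}\boldsymbol{\beta}\}\, h_0(t)}{1 + (\exp\{\mathbf{z}^{\top}\boldsymbol{\beta}\} - 1)\, h_0(t)} \prod_{k=0}^{t-1} \frac{1 - h_0(k)}{1 + (\exp\{\mathbf{z}^{\top}\boldsymbol{\beta}\} - 1)\, h_0(k)}\right]^{1 - I\{t = 0\}} I\{t = 0, 1, 2, \dots\}. \tag{14}$$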
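The following Python sketch evaluates (12) to (14) for any baseline hazard supplied as a function. It uses a DW baseline with hypothetical parameters and checks two properties that follow from the derivation: the odds hazard relative to the baseline odds hazard equals $\exp\{\mathbf{z}^{\top}\boldsymbol{\beta}\}$ at every $t$, and the probabilities (14) sum to one.

```python
import math

# Sketch of Equations (12)-(14) for an arbitrary baseline hazard h0(k).


def pohm_hazard(h0_t, lin_pred):
    """Equation (12): e*h0 / (1 + (e - 1)*h0), with e = exp(z'beta)."""
    e = math.exp(lin_pred)
    return e * h0_t / (1.0 + (e - 1.0) * h0_t)


def pohm_surv(t, h0, lin_pred):
    """Equation (13): product over k = 0..floor(t) of (1 - h0(k)) / (1 + (e - 1) h0(k))."""
    e = math.exp(lin_pred)
    s = 1.0
    for k in range(math.floor(t) + 1):
        s *= (1.0 - h0(k)) / (1.0 + (e - 1.0) * h0(k))
    return s


def pohm_pmf(t, h0, lin_pred):
    """Equation (14): h(0|z) at t = 0, and h(t|z) S(t-1|z) for t >= 1."""
    h_t = pohm_hazard(h0(t), lin_pred)
    return h_t if t == 0 else h_t * pohm_surv(t - 1, h0, lin_pred)


def h0_dw(k, q=0.8, eta=1.3):
    """Illustrative DW baseline hazard (7); q = 0.8 and eta = 1.3 are hypothetical."""
    return 1.0 - q ** ((k + 1) ** eta - k ** eta)


def odds(h):
    return h / (1.0 - h)


lin_pred = 0.7  # hypothetical value of z'beta
# The odds hazard ratio equals exp(z'beta) at every t, as Equation (11) requires.
print([odds(pohm_hazard(h0_dw(t), lin_pred)) / odds(h0_dw(t)) for t in range(4)])
print(sum(pohm_pmf(t, h0_dw, lin_pred) for t in range(100)))  # close to 1
```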
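To estimate the parameters of the proportional odds hazard model, consider an observed random sample $(t_1, t_2, \dots, t_n)$ with respective censoring indicators $(\delta_1, \delta_2, \dots, \delta_n)$, where $\delta_i = 1$ if $t_i$ is a failure time and $\delta_i = 0$ if it is a right-censored time, and let $\mathbf{z}_i = (z_{i1}, z_{i2}, \dots, z_{ip})^{\top}$ be the covariate vector of individual $i$, $i = 1, 2, \dots, n$. The model's likelihood function, where $\boldsymbol{\xi}$ represents the vector of parameters of the baseline distribution, is given by:

$$\begin{aligned} L(\boldsymbol{\xi}, \boldsymbol{\beta}; \mathbf{t}, \boldsymbol{\delta}, \mathbf{z}) \propto \prod_{i=1}^{n} & \left\{\left[\frac{\exp\{\mathbf{z}_i^{\top}\boldsymbol{\beta}\}\, h_0(t_i \mid \boldsymbol{\xi})}{1 + (\exp\{\mathbf{z}_i^{\top}\boldsymbol{\beta}\} - 1)\, h_0(t_i \mid \boldsymbol{\xi})} \prod_{k=0}^{t_i - 1} \frac{1 - h_0(k \mid \boldsymbol{\xi})}{1 + (\exp\{\mathbf{z}_i^{\top}\boldsymbol{\beta}\} - 1)\, h_0(k \mid \boldsymbol{\xi})}\right]^{(1 - I\{t_i = 0\})\,\delta_i} \right. \\ & \times \left[\frac{\exp\{\mathbf{z}_i^{\top}\boldsymbol{\beta}\}\, h_0(0 \mid \boldsymbol{\xi})}{1 + (\exp\{\mathbf{z}_i^{\top}\boldsymbol{\beta}\} - 1)\, h_0(0 \mid \boldsymbol{\xi})}\right]^{I\{t_i = 0\}\,\delta_i} \\ & \left. \times \left[\prod_{k=0}^{t_i} \frac{1 - h_0(k \mid \boldsymbol{\xi})}{1 + (\exp\{\mathbf{z}_i^{\top}\boldsymbol{\beta}\} - 1)\, h_0(k \mid \boldsymbol{\xi})}\right]^{1 - \delta_i} \right\}. \end{aligned} \tag{15}$$

Taking the logarithm of the likelihood function (15), we obtain:

$$\begin{aligned} \ell(\boldsymbol{\xi}, \boldsymbol{\beta}; \mathbf{t}, \boldsymbol{\delta}, \mathbf{z}) = {} & \sum_{i=1}^{n} (1 - I\{t_i = 0\})\,\delta_i \left[\log \frac{\exp\{\mathbf{z}_i^{\top}\boldsymbol{\beta}\}\, h_0(t_i \mid \boldsymbol{\xi})}{1 + (\exp\{\mathbf{z}_i^{\top}\boldsymbol{\beta}\} - 1)\, h_0(t_i \mid \boldsymbol{\xi})} + \sum_{k=0}^{t_i - 1} \log \frac{1 - h_0(k \mid \boldsymbol{\xi})}{1 + (\exp\{\mathbf{z}_i^{\top}\boldsymbol{\beta}\} - 1)\, h_0(k \mid \boldsymbol{\xi})}\right] \\ & + \sum_{i=1}^{n} I\{t_i = 0\}\,\delta_i \log \frac{\exp\{\mathbf{z}_i^{\top}\boldsymbol{\beta}\}\, h_0(0 \mid \boldsymbol{\xi})}{1 + (\exp\{\mathbf{z}_i^{\top}\boldsymbol{\beta}\} - 1)\, h_0(0 \mid \boldsymbol{\xi})} \\ & + \sum_{i=1}^{n} (1 - \delta_i) \sum_{k=0}^{t_i} \log \frac{1 - h_0(k \mid \boldsymbol{\xi})}{1 + (\exp\{\mathbf{z}_i^{\top}\boldsymbol{\beta}\} - 1)\, h_0(k \mid \boldsymbol{\xi})} + c, \end{aligned} \tag{16}$$

where $c$ is a constant that does not depend on $\boldsymbol{\xi}$ and $\boldsymbol{\beta}$.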
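The log-likelihood (16) can be coded subject by subject: a failure at $t_i$ contributes the log hazard at $t_i$ plus the log survival terms for $k < t_i$, and a censored time contributes the log survival terms for $k \le t_i$. The Python sketch below assumes integer times and a baseline hazard passed as a function (names and data are illustrative). As a check, with $\boldsymbol{\beta} = \mathbf{0}$ and a DW baseline it should reproduce the sum of $\log p_{dw}(t_i)$ over failures and $\log S_{dw}(t_i)$ over censored times.

```python
import math

# Sketch of the log-likelihood (16) for right-censored data, up to the constant c.


def log_h(h0_t, e):
    """log h(t|z) = log[e*h0 / (1 + (e - 1)*h0)]."""
    return math.log(e * h0_t / (1.0 + (e - 1.0) * h0_t))


def log_1mh(h0_k, e):
    """log[1 - h(k|z)] = log[(1 - h0) / (1 + (e - 1)*h0)]."""
    return math.log((1.0 - h0_k) / (1.0 + (e - 1.0) * h0_k))


def pohm_loglik(times, deltas, lin_preds, h0):
    """times: integer t_i; deltas: 1 = failure, 0 = right-censored;
    lin_preds: z_i'beta for each subject; h0: baseline hazard k -> h0(k)."""
    ll = 0.0
    for t, d, lp in zip(times, deltas, lin_preds):
        e = math.exp(lp)
        if d == 1:
            # Failure at t: log h(t|z) + sum_{k<t} log[1 - h(k|z)].
            # For t = 0 the sum is empty, which covers the I{t_i = 0} term of (16).
            ll += log_h(h0(t), e) + sum(log_1mh(h0(k), e) for k in range(t))
        else:
            # Right-censored at t: sum_{k<=t} log[1 - h(k|z)] = log S(t|z).
            ll += sum(log_1mh(h0(k), e) for k in range(t + 1))
    return ll


def h0_dw(k, q=0.8, eta=1.3):
    """Illustrative DW baseline hazard (7); parameter values are hypothetical."""
    return 1.0 - q ** ((k + 1) ** eta - k ** eta)


# Hypothetical data: four subjects, one binary covariate, beta = 0.5.
times, deltas, z = [0, 2, 5, 3], [1, 1, 0, 1], [0, 1, 1, 0]
print(pohm_loglik(times, deltas, [0.5 * zi for zi in z], h0_dw))
```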
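The likelihood equation is given by:

$$U(\boldsymbol{\vartheta}) = \frac{\partial \ell(\boldsymbol{\vartheta})}{\partial \boldsymbol{\vartheta}} = \mathbf{0}. \tag{17}$$

Thus, the value $\hat{\boldsymbol{\vartheta}} = (\hat{\boldsymbol{\xi}}, \hat{\boldsymbol{\beta}})$ that satisfies Equation (17) is the maximum likelihood estimator of the POHM parameters $\boldsymbol{\vartheta} = (\boldsymbol{\xi}, \boldsymbol{\beta})$, which, under appropriate regularity conditions, has an asymptotic multivariate normal distribution with mean $\boldsymbol{\vartheta}$ and variance-covariance matrix given by:

$$\Sigma(\hat{\boldsymbol{\vartheta}}) = \left[-\frac{\partial^2 \ell(\boldsymbol{\vartheta})}{\partial \boldsymbol{\vartheta}\, \partial \boldsymbol{\vartheta}^{\top}}\right]^{-1}_{\boldsymbol{\vartheta} = \hat{\boldsymbol{\vartheta}}} = \left[J(\boldsymbol{\vartheta})\right]^{-1}_{\boldsymbol{\vartheta} = \hat{\boldsymbol{\vartheta}}}. \tag{18}$$

The estimate $\hat{\boldsymbol{\vartheta}}$ and the observed information matrix $J(\boldsymbol{\vartheta})$ can be obtained numerically with computational optimization methods, such as Newton-Raphson-type algorithms, which also provide an accurate numerical approximation of this matrix. From these results, it is possible to construct confidence intervals for the parameters and to carry out significance tests on the POHM covariates.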
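To make the numerical step concrete, here is a minimal Python sketch of a damped Newton-Raphson iteration with finite-difference derivatives, followed by the inverse observed information (18). It is not the authors' implementation: it treats only the POHM-G with one binary covariate, reparametrizes $q$ as $a = \log\{(1 - q)/q\}$ (an assumption of this sketch, made so the parameter is unrestricted) and uses a small hypothetical dataset. In this special case the MLE also has a closed form (events divided by person-periods at risk in each group), which makes the numerical result easy to check.

```python
import math

# Sketch: damped Newton-Raphson with finite-difference derivatives for the POHM-G
# with one binary covariate. Reparametrization (an assumption of this sketch):
# a = log((1 - q) / q), so that h(t|z) = 1 / (1 + exp(-(a + beta*z))).


def loglik(theta, times, deltas, z):
    """Subject i contributes delta*log h + (t + 1 - delta)*log(1 - h)."""
    a, beta = theta
    ll = 0.0
    for t, d, zi in zip(times, deltas, z):
        x = a + beta * zi
        ll -= d * math.log1p(math.exp(-x)) + (t + 1 - d) * math.log1p(math.exp(x))
    return ll


def num_grad(f, x, h=1e-5):
    """Central-difference gradient."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g


def num_hess(f, x, h=1e-3):
    """Central-difference Hessian."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def f_shift(si, sj):
                y = list(x)
                y[i] += si * h
                y[j] += sj * h
                return f(y)
            H[i][j] = (f_shift(1, 1) - f_shift(1, -1) - f_shift(-1, 1) + f_shift(-1, -1)) / (4 * h * h)
    return H


def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def newton_raphson(f, x0, tol=1e-8, max_iter=100):
    """Maximize f with steps H^{-1} grad, halved while f decreases."""
    x = list(x0)
    for _ in range(max_iter):
        g, Hinv = num_grad(f, x), inv2(num_hess(f, x))
        step = [Hinv[i][0] * g[0] + Hinv[i][1] * g[1] for i in range(2)]
        t = 1.0
        x_new = [x[i] - t * step[i] for i in range(2)]
        while f(x_new) < f(x) and t > 1e-10:
            t /= 2
            x_new = [x[i] - t * step[i] for i in range(2)]
        if max(abs(x_new[i] - x[i]) for i in range(2)) < tol:
            return x_new
        x = x_new
    return x


# Hypothetical data (not from the paper): times, censoring indicators, binary z.
times = [0, 1, 3, 2, 5, 1, 0, 4, 2, 6, 3, 1]
deltas = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1]
z = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]


def f(th):
    return loglik(th, times, deltas, z)


theta_hat = newton_raphson(f, [0.0, 0.0])
H = num_hess(f, theta_hat)
cov = inv2([[-H[0][0], -H[0][1]], [-H[1][0], -H[1][1]]])  # [J(theta_hat)]^{-1}, Eq. (18)
se = [math.sqrt(cov[0][0]), math.sqrt(cov[1][1])]
print(theta_hat, se)
```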
Taking the baseline hazard function in (11) to be the DW hazard function (7), the DW hazard with $\eta = 1$, or the DLL hazard function (10), we obtain, respectively, the proportional odds hazard models discrete Weibull (POHM-DW), geometric (POHM-G) and discrete log-logistic (POHM-DLL), which are studied in the remainder of this paper.

3.2. Verification of the Proportional Odds Hazard Assumption

The model proposed in (11) assumes that the odds hazards of any two individuals are proportional. Considering a discrete non-negative random variable $T$ and a dichotomous covariate $z$ that takes on the values 0 and 1, the model assumes that:

$$\frac{h(t \mid z = 1)}{1 - h(t \mid z = 1)} = \theta\, \frac{h(t \mid z = 0)}{1 - h(t \mid z = 0)}, \tag{19}$$

where $h(\cdot)$ is the hazard function and $\theta$ is a proportionality constant that does not depend on $t$. Let $g_l(t)$ be the odds hazard function of an individual with covariate $z = l$, $l = 0, 1$, given by:

$$g_l(t) = \frac{h(t \mid z = l)}{1 - h(t \mid z = l)}, \quad l = 0, 1. \tag{20}$$

The function $G_l(\cdot)$ is, in turn, the cumulative odds hazard function, given by:

$$G_l(t) = \sum_{u=0}^{t} g_l(u) = \sum_{u=0}^{t} \frac{h(u \mid z = l)}{1 - h(u \mid z = l)}, \quad l = 0, 1. \tag{21}$$

Note that, under the proportional odds hazard assumption, expressions (19) and (21) imply that:

$$G_1(t) = \theta\, G_0(t). \tag{22}$$

Taking the logarithm of both sides of (22), we get:

$$\log(G_1(t)) = \log(\theta) + \log(G_0(t)). \tag{23}$$

Therefore, the relationship between $\log(G_1(t))$ and $\log(G_0(t))$ is a straight line with slope (angular coefficient) $m_1 = 1$ and intercept (linear coefficient) $m_0 = \log(\theta)$, i.e.,

$$\log(G_1(t)) = m_0 + m_1 \log(G_0(t)). \tag{24}$$

Thus, the proportional odds hazard assumption can be checked graphically: plot the points with coordinates $(\log(G_0(t)), \log(G_1(t)))$ and fit a simple regression line with slope $m_1$ fixed at one. Under the assumption, the points are expected to lie close to this regression line.
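A possible implementation of the graphical check is sketched below in Python. The estimator of the group-specific hazards is not specified in this section; the sketch assumes the usual nonparametric estimate $d_l(t)/n_l(t)$ (events over number at risk at time $t$), accumulates the odds hazards as in (21) and estimates the intercept $m_0$ of (24) by least squares with the slope fixed at one. Times at which a cumulative odds hazard is zero or undefined are dropped, and the data are hypothetical.

```python
import math

# Sketch of the graphical check. Assumption: h(t | z = l) is estimated
# nonparametrically by d_l(t) / n_l(t) (events / number at risk).


def life_table_hazard(times, deltas, max_t):
    """Estimated discrete hazard d(t)/n(t) for t = 0..max_t (nan if no one is at risk)."""
    out = []
    for t in range(max_t + 1):
        n_risk = sum(1 for ti in times if ti >= t)
        d = sum(1 for ti, di in zip(times, deltas) if ti == t and di == 1)
        out.append(d / n_risk if n_risk > 0 else float("nan"))
    return out


def cumulative_odds_hazard(hazards):
    """G(t) = sum_{u<=t} h(u) / (1 - h(u)), Equation (21); nan once h is undefined or 1."""
    out, total = [], 0.0
    for h in hazards:
        total = float("nan") if (math.isnan(h) or h >= 1.0) else total + h / (1.0 - h)
        out.append(total)
    return out


def plot_points(G0, G1):
    """Points (log G0(t), log G1(t)); comparisons with nan are False, so nan is skipped."""
    return [(math.log(a), math.log(b)) for a, b in zip(G0, G1) if a > 0 and b > 0]


def intercept_with_unit_slope(points):
    """Least-squares m0 in y = m0 + 1*x; exp(m0) estimates theta in (22)."""
    return sum(y - x for x, y in points) / len(points)


# Hypothetical two-group data (not from the paper).
G0 = cumulative_odds_hazard(life_table_hazard([0, 1, 1, 2, 3, 4, 5, 6], [1, 1, 0, 1, 1, 0, 1, 1], 6))
G1 = cumulative_odds_hazard(life_table_hazard([0, 0, 1, 1, 2, 2, 3, 4], [1, 1, 1, 0, 1, 1, 1, 1], 6))
pts = plot_points(G0, G1)
print(pts, math.exp(intercept_with_unit_slope(pts)))
```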
Graphical analysis is very informative, but for a complete assessment it is advisable to also have a measure of evidence. Thus, based on expression (24), a hypothesis test can be used to check whether the odds hazards are proportional. If $t_{(j)}$, $j = 1, 2, \dots, J$, is the $j$-th distinct observed time (censored or uncensored), the verification consists of testing whether the slope of the straight line differs from one. The hypotheses of interest are:

$$H_0: m_1 = 1 \quad \text{vs.} \quad H_1: m_1 \neq 1. \tag{25}$$

The test statistic for the hypotheses (25) is given by:

$$M = \frac{(\hat{m}_1 - 1)\sqrt{(J - 2)\sum_{j=1}^{J} (x_j - \bar{x})^2}}{\sqrt{\sum_{j=1}^{J} (y_j - \bar{y})^2}}, \tag{26}$$

where

$$\hat{m}_1 = \frac{J \sum_{j=1}^{J} x_j y_j - \sum_{j=1}^{J} x_j \sum_{j=1}^{J} y_j}{J \sum_{j=1}^{J} x_j^2 - \left(\sum_{j=1}^{J} x_j\right)^2}, \quad \bar{x} = \frac{\sum_{j=1}^{J} x_j}{J}, \quad \bar{y} = \frac{\sum_{j=1}^{J} y_j}{J},$$

with $x_j = \log(G_0(t_{(j)}))$ and $y_j = \log(G_1(t_{(j)}))$. Assuming normality of $\log(G_1(t))$, $M$ follows a Student's t distribution with $J - 2$ degrees of freedom.
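The statistic (26) and the slope estimator $\hat{m}_1$ can be transcribed directly, as in the Python sketch below (names are illustrative). It returns $M$ only; the decision would then compare $|M|$ with the Student's t quantile with $J - 2$ degrees of freedom at the chosen significance level, obtained for example from a statistics library such as `scipy.stats.t.ppf`.

```python
import math

# Transcription of Equation (26) and of the slope estimator m1_hat defined below it.
# x_j = log G0(t_(j)) and y_j = log G1(t_(j)) at the J distinct observed times.


def slope_hat(x, y):
    """m1_hat = (J*Sxy - Sx*Sy) / (J*Sxx - Sx^2), using raw sums."""
    J = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return (J * sxy - sx * sy) / (J * sxx - sx ** 2)


def statistic_m(x, y):
    """M = (m1_hat - 1) * sqrt((J - 2) * sum (x - xbar)^2) / sqrt(sum (y - ybar)^2)."""
    J = len(x)
    xbar, ybar = sum(x) / J, sum(y) / J
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return (slope_hat(x, y) - 1.0) * math.sqrt((J - 2) * sxx) / math.sqrt(syy)


# Hypothetical coordinates: points on a unit-slope line give M = 0.
print(statistic_m([0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0]))
```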
The procedures presented here for checking the proportional odds hazard assumption can easily be extended to categorical covariates with three or more levels by comparing the levels pairwise. For numerical covariates, the same method can be applied after categorizing the covariate.
If proportionality does not hold, the POHM may not perform well. In such cases, other regression models for discrete data can be considered (see, for example, the models defined through Equations (27) and (28)).

4. Simulation Study

This section describes a simulation study to evaluate the behavior of the maximum likelihood estimators of the POHM-DW and POHM-G models. The data were simulated in the R software [18], and the survival times were generated using the inverse transformation method (for details, see [19]).
The survival time samples were simulated with two covariates: a numerical covariate $Z_1$ with a standard normal distribution, and a dichotomous covariate $Z_2$ generated from a Bernoulli distribution with success probability $p = 0.5$. The baseline hazard was taken from a DW distribution or from a geometric distribution (the particular case of the DW with $\eta = 1$); the parameters of the two scenarios are shown in Table 1.
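For a discrete response, the inverse transformation method amounts to returning the smallest $t$ with $1 - S(t \mid \mathbf{z}) \ge U$, where $U$ is uniform on (0, 1). The Python sketch below applies it to the POHM-DW using (13) and draws the covariates as described above; the parameter values, coefficients and seed are hypothetical stand-ins for those in Table 1, and the code is an illustration rather than the R code used in the study.

```python
import math
import random

# Sketch of the inverse transformation method for the POHM-DW.
# Parameter values and the seed are illustrative, not those of Table 1.


def h0_dw(t, q, eta):
    """DW baseline hazard, Equation (7)."""
    return 1.0 - q ** ((t + 1) ** eta - t ** eta)


def sample_pohm_dw(z, beta, q, eta, rng):
    """Return the smallest t with F(t|z) = 1 - S(t|z) >= U, where U ~ Uniform(0, 1)."""
    e = math.exp(sum(zj * bj for zj, bj in zip(z, beta)))
    u = rng.random()
    t, surv = 0, 1.0
    while True:
        h0 = h0_dw(t, q, eta)
        surv *= (1.0 - h0) / (1.0 + (e - 1.0) * h0)  # S(t|z), Equation (13)
        if 1.0 - surv >= u:
            return t
        t += 1


rng = random.Random(2023)
beta = [0.5, -0.5]  # hypothetical coefficients for Z1 and Z2
sample = []
for _ in range(100):
    # Z1 ~ N(0, 1) and Z2 ~ Bernoulli(0.5), as in the simulation design.
    z = [rng.gauss(0.0, 1.0), 1 if rng.random() < 0.5 else 0]
    sample.append((sample_pohm_dw(z, beta, 0.9, 1.2, rng), z))
print(sample[:5])
```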
To assess the behavior of the parameter estimators, histograms of the parameter estimates from 1000 Monte Carlo replications are evaluated for each scenario and for sample sizes n = 30, 50, 100, 250 and 500.
The mean of the parameter estimates, the mean squared error (MSE) and the coverage probability (CP) are shown alongside the histograms. The confidence intervals used to calculate the CP were constructed at a confidence level of 0.95. In addition, because the parameters of the probability distributions ($q$ and $\eta$) are restricted in the parameter space, they were transformed to unrestricted scales before constructing the confidence intervals, as described by [13].
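Since the transformations of [13] are not reproduced here, the following Python sketch assumes the common choices of a logit transformation for $q$ and a log transformation for $\eta$, with delta-method standard errors, and back-transforms the Wald limits so that the intervals stay inside the parameter space. It also shows how the mean, MSE and CP would be summarized over Monte Carlo replications; all names and numbers are illustrative.

```python
import math
from statistics import NormalDist

# Sketch: Wald intervals built on unrestricted scales and back-transformed, plus
# the Monte Carlo summaries. The logit (for q) and log (for eta) transformations
# with the delta method are assumptions of this sketch.

Z = NormalDist().inv_cdf(0.975)  # 0.95 confidence level


def expit(v):
    return 1.0 / (1.0 + math.exp(-v))


def ci_logit(est, se):
    """CI for a parameter in (0, 1): logit(est) +/- Z * se / (est * (1 - est))."""
    centre = math.log(est / (1.0 - est))
    half = Z * se / (est * (1.0 - est))
    return expit(centre - half), expit(centre + half)


def ci_log(est, se):
    """CI for a positive parameter: log(est) +/- Z * se / est, then exponentiated."""
    half = Z * se / est
    return est * math.exp(-half), est * math.exp(half)


def summarize(estimates, intervals, true_value):
    """Mean estimate, mean squared error and coverage probability over replications."""
    n = len(estimates)
    mean = sum(estimates) / n
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    cp = sum(lo <= true_value <= hi for lo, hi in intervals) / n
    return mean, mse, cp


print(ci_logit(0.95, 0.03), ci_log(1.5, 0.2))  # hypothetical estimates and SEs
```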
The results from 1000 Monte Carlo replications for the estimators of $q$, $\eta$, $\beta_1$ and $\beta_2$ are shown in Figure 1, Figure 2, Figure 3 and Figure 4, respectively.
Overall, the mean estimates are approximately equal to the respective true parameter values, regardless of the scenario and sample size. For the estimators of the baseline distribution parameters, the estimates are concentrated close to the true parameter values; as the sample size increases, the MSEs approach zero and the coverage probabilities converge to the nominal confidence level of 0.95.
The estimators of the covariate coefficients, $\beta_1$ (associated with the numerical covariate) and $\beta_2$ (associated with the dichotomous covariate), behave similarly to each other and, like the estimators of the baseline distribution parameters, perform satisfactorily in terms of both the estimates and their distributions.
Regarding the scenarios, the first adopts the discrete Weibull distribution as the baseline distribution and the second adopts the geometric distribution ($\eta = 1$). The estimates and graphs indicate that both baseline distributions are suitable for modeling discrete time-to-event data.
The evaluation so far has been carried out without censoring. Table 2 shows, for the same scenarios and sample sizes, the average of the parameter estimates, the MSE and the CP under censoring percentages of 5, 10 and 30%, again based on 1000 Monte Carlo replications.
In the presence of censoring, the higher the percentage of censoring, the greater the deviations of the estimates from the true parameter values. This behavior is expected, since the higher the percentage of censoring, the more the empirical distribution of the simulated data differs from the theoretical distribution used to generate them. The coverage probabilities reinforce this: as the amount of censoring increases, the CP departs further from the nominal confidence level of 0.95.
Nevertheless, even with this shift from the true parameter value, the distribution of the estimators under censoring is similar to that without censoring (see, for example, Figure 5, which shows the estimator of $\beta_1$ under 30% censoring, the case with the lowest CP values among the estimators).
Therefore, the results and histograms indicate that, regardless of the scenario, censoring percentage or sample size, the empirical distribution of the estimators is close to a normal distribution. This distribution can thus be used for interval estimation of the parameters, and hypothesis tests based on the normal approximation can be used to check the significance of the covariates in applications.

5. Application

Chronic low back pain is a major public health problem, as it can affect the quality of life and daily activities. Low back pain is also responsible for high rates of absenteeism from work.
The dataset used in this study comes from [17]; the time-to-event is the number of unsuccessful sessions before the session in which the low back pain was reduced or relieved. Here, $t = 0$ represents a patient who had pain relief in the very first session.
Observations were considered censored when the patient's follow-up was interrupted for a reason unrelated to the event of interest or after 11 unsuccessful sessions. Table 3 shows, for each number of unsuccessful sessions, the number of patients who experienced a reduction or relief of low back pain and the number and percentage of censored observations.
In addition to the number of unsuccessful sessions, the dataset includes information on six characteristics (covariates) of the 150 patients. The covariates age, body mass index (BMI) and duration of pain were originally quantitative and were categorized: the patients were divided into two age groups, one for individuals aged up to 50 and the other for those aged 50 or over; into two BMI groups, non-obese (BMI less than 30) and obese (BMI greater than or equal to 30); and into two pain duration groups, less than five years and five years or more. This information is summarized in Table 4.
The POHM-G, POHM-DW and POHM-DLL models were then fitted to the data. Initially, multiple models including all covariates were fitted to test the significance of each covariate ($H_0: \beta_1 = 0$ to $H_0: \beta_6 = 0$). The p-values of the multiple models are shown in Table 5.
According to Table 5, the covariates treatment and medicines are significant (at the 5% level) in all three models.
The remaining covariates are not significant, suggesting that they do not influence the relief or reduction of the patients' back pain. The models were therefore refitted with only the significant covariates, and the results of the significance tests are shown in Table 6.
The results in Table 6 show that the covariates treatment and medicines influence the relief or reduction of patients' low back pain. Taking these covariates into account, the proportional odds hazard assumption is then checked using the methods presented in Section 3.2.
The proportional odds hazard assumption is checked between the levels of the covariate treatment, between the levels of the covariate medicines, and between the combined levels of these two covariates, using the plot of $\log(G_0(t))$ versus $\log(G_1(t))$ and the hypothesis test proposed in (25).
Since five tests were carried out, the Bonferroni correction is used to control the probability of incorrectly rejecting a null hypothesis, so the significance level for each test is $0.05/5 = 0.01$. The results are shown in Figure 6, where $z_1$ is the level of the covariate treatment ($z_1 = 0$: placebo; $z_1 = 1$: active) and $z_2$ is the level of the covariate medicines ($z_2 = 0$: yes; $z_2 = 1$: no).
Note that, in principle, eight tests would be needed: the four combined covariate levels compared pairwise (six tests) plus the comparison between the two levels within each covariate (two tests). However, one of the combined levels ($z_1 = 0$; $z_2 = 1$) has few observations (fewer than 10), which is inadequate for constructing the graphs and testing the hypotheses.
The test results in Figure 6 show that the proportional odds hazard assumption was not rejected in 3 of the five comparisons considered in this study (at the 1% significance level). The $\log(G_0(t))$ versus $\log(G_1(t))$ plots corroborate the tests in which the assumption was not rejected, as the points lie close to the fitted regression line.
The fact that most of the pairwise comparisons are consistent with proportional odds hazards suggests that this assumption is reasonable for the data under study, which supports the use of this methodology for this dataset.
Thus, for the POHM-G, POHM-DW and POHM-DLL models with the two significant covariates, the point and interval estimates of the parameters were calculated; they are shown in Table 7.
The estimates in Table 7 allow the odds hazards of the covariate categories to be interpreted. Take the POHM-DW model and the treatment covariate as an example. Holding the covariate medicines fixed, $\exp\{\beta_1\}$ is the ratio between the odds hazard of patients on active treatment ($z_1 = 1$) and that of patients on placebo ($z_1 = 0$), and this ratio is constant over time. Hence, the odds hazard for patients on active treatment is $\exp\{0.7153\} = 2.0448$ times the odds hazard for patients on placebo treatment.
In other words, the odds hazard of patients on active treatment is 104.48% greater than that of patients on placebo treatment ($z_1 = 0$). Similarly, holding treatment fixed, the odds hazard for patients who do not use medication is 112.82% greater than the odds hazard for patients who do use medication.
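The arithmetic behind this interpretation is short; the Python lines below recompute the odds hazard ratio from the POHM-DW treatment estimate quoted above and express it as a relative increase.

```python
import math

beta1_hat = 0.7153  # POHM-DW estimate for treatment, as quoted in the text
ratio = math.exp(beta1_hat)  # odds hazard ratio, active vs placebo
print(f"{ratio:.4f}")               # 2.0448
print(f"{100 * (ratio - 1):.2f}%")  # 104.48%
```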
The same interpretation applies to the other models, with different numerical values; in all of them, the odds hazard remains higher for patients on active treatment and for patients not taking medication.
To assess the fit of the models, the Kaplan-Meier (K-M) survival estimates [20] and the survival curves of the models under study were plotted for each combination of covariate levels, and the set of graphs was used to assess the overall fit (Figure 7).
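For discrete times, the Kaplan-Meier estimate is a running product of $1 - d_u/n_u$ over the distinct observed times, where $d_u$ is the number of events and $n_u$ the number at risk at time $u$. A minimal Python sketch on hypothetical data, whose output could be overlaid on model-based curves such as (13):

```python
# Sketch of the Kaplan-Meier estimator [20] for discrete times:
# S(t) = product over distinct times u <= t of (1 - d_u / n_u).


def kaplan_meier(times, deltas):
    """Return (t, S_hat(t)) at each distinct observed time (censored or not)."""
    out, s = [], 1.0
    for t in sorted(set(times)):
        n_risk = sum(1 for ti in times if ti >= t)
        d = sum(1 for ti, di in zip(times, deltas) if ti == t and di == 1)
        s *= 1.0 - d / n_risk
        out.append((t, s))
    return out


# Hypothetical data: numbers of unsuccessful sessions and event indicators.
print(kaplan_meier([0, 1, 1, 2, 3, 3, 5], [1, 1, 0, 1, 0, 1, 1]))
```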
Figure 7 shows that the models fit the data well, with their survival estimates always close to the empirical estimates; the POHM-DLL and POHM-DW models stand out, as their survival estimates are the closest to the Kaplan-Meier estimates.
For comparison with existing discrete models in the literature, regression models were fitted based on the DW (expressions (5)–(7)) and the geometric distribution (DW with $\eta = 1$), with the covariates linked to the parameter $q$ through a logit link function, i.e.,

$$q = \frac{\exp\{\mathbf{z}^{\top}\boldsymbol{\beta}\}}{1 + \exp\{\mathbf{z}^{\top}\boldsymbol{\beta}\}}. \tag{27}$$

These models will be referred to, respectively, as the discrete Weibull regression model (DWRM) and the geometric regression model (GRM).
In addition, we also consider the DLL (expressions (8)–(10)) with the covariates linked to the parameter $\alpha$ through a logarithmic link function, i.e.,

$$\alpha = \exp\{\mathbf{z}^{\top}\boldsymbol{\beta}\}. \tag{28}$$

This model will be called the discrete log-logistic regression model (DLLRM).
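The key difference from the POHM can be seen numerically. In the Python sketch below (which assumes that the linear predictor includes an intercept, since otherwise $q$ and $\alpha$ would have no baseline level), the covariates enter $q$ or $\alpha$ directly through (27) and (28). For the DWRM with $\eta \neq 1$, the odds hazard ratio between two covariate values, computed from the survival function, changes with $t$, whereas under (11) it is constant. The parameter values are illustrative.

```python
import math

# Sketch of the comparison models: covariates enter q (DWRM/GRM) through the
# logit link (27) or alpha (DLLRM) through the log link (28). Assumption: the
# linear predictor z'beta already includes an intercept term.


def dwrm_surv(t, lin_pred, eta):
    """DW survival (6) with q = exp(lp) / (1 + exp(lp)), Equation (27)."""
    q = 1.0 / (1.0 + math.exp(-lin_pred))
    return q ** ((math.floor(t) + 1) ** eta)


def dllrm_surv(t, lin_pred, eta):
    """DLL survival (9) with alpha = exp(lp), Equation (28)."""
    alpha = math.exp(lin_pred)
    return 1.0 / (1.0 + ((math.floor(t) + 1) / alpha) ** eta)


def hazard_from_surv(surv, t, *args):
    """h(t) = 1 - S(t) / S(t - 1), with S(-1) = 1."""
    prev = 1.0 if t == 0 else surv(t - 1, *args)
    return 1.0 - surv(t, *args) / prev


def odds(h):
    return h / (1.0 - h)


# With eta != 1, the DWRM odds hazard ratio between two covariate values
# generally changes with t, unlike the POHM, where it is constant (Equation (11)).
for t in (0, 5):
    r = odds(hazard_from_surv(dwrm_surv, t, 1.0, 1.5)) / odds(hazard_from_surv(dwrm_surv, t, 2.0, 1.5))
    print(t, round(r, 4))
```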
Figure 8 shows that, for the levels $z_1 = 1$; $z_2 = 0$ and $z_1 = 1$; $z_2 = 1$, these traditional models behaved similarly to the proposed model. However, for the other levels, their estimates are further from the empirical estimates than those of the proposed model, indicating that the proportional odds hazard structure for discrete data provides a better fit to these data than the traditional discrete models.

6. Conclusions

The proportional odds hazard model (POHM) presented in this paper is a regression model for discrete data that has been used in the literature because its coefficients can be interpreted without specifying the baseline hazard. However, in certain studies, correctly describing the baseline hazard is essential.
In this study, the POHM was formulated with the following baseline distributions: the discrete Weibull of [10], the geometric and the discrete log-logistic. The inferential process was developed in a survival analysis context using the maximum likelihood method. The results obtained on simulated data showed evidence of the asymptotic properties of the estimators for two different baseline distributions. Furthermore, the proposed model, with three different baseline distributions (geometric, discrete Weibull and discrete log-logistic), showed a good fit to the real dataset, indicating that the estimation method developed and the use of baseline distributions for discrete random variables make the POHM a viable alternative for modeling discrete survival data with covariates.

Author Contributions

Conceptualization, M.G.F.V., M.R.P.C., R.M. and E.Y.N.; methodology, M.G.F.V., M.R.P.C., R.M. and E.Y.N.; software, M.G.F.V. and M.R.P.C.; validation, M.G.F.V., M.R.P.C., R.M. and E.Y.N.; formal analysis, M.G.F.V., M.R.P.C., R.M. and E.Y.N.; investigation, M.G.F.V., M.R.P.C., R.M. and E.Y.N.; resources, M.G.F.V. and E.Y.N.; data curation, M.G.F.V. and M.R.P.C.; writing—original draft preparation, M.G.F.V., M.R.P.C., R.M. and E.Y.N.; writing—review and editing, M.G.F.V., M.R.P.C., R.M. and E.Y.N.; visualization, M.G.F.V., M.R.P.C., R.M. and E.Y.N.; supervision, E.Y.N.; project administration, E.Y.N.; funding acquisition, E.Y.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brazil (CAPES)—Finance Code 001, Fundação de Apoio à Pesquisa do Distrito Federal (FAPDF)—TOA 443/2022, National Council for Scientific and Technological Development (CNPq), Editais de Auxílio Financeiro DPI/DPG/UnB, DPI/DPG/BCE/UnB and PPGEST/UnB.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CP: Coverage probability
DW: Discrete Weibull distribution
DLL: Discrete log-logistic distribution
DLLRM: Discrete log-logistic regression model
DWRM: Discrete Weibull regression model
GRM: Geometric regression model
MSE: Mean squared error
POHM: Proportional odds hazard model
POHM-G: Proportional odds hazard model geometric
POHM-DW: Proportional odds hazard model discrete Weibull
POHM-DLL: Proportional odds hazard model discrete log-logistic

References

1. Cox, D.R. Regression models and life-tables. J. R. Stat. Soc. Ser. B (Methodol.) 1972, 34, 187–202.
2. Tutz, G.; Schmid, M. Modeling Discrete Time-to-Event Data; Springer: New York, NY, USA, 2016.
3. Nakano, E.Y.; Carrasco, C.G. Uma avaliação do uso de um modelo contínuo na análise de dados discretos de sobrevivência. Trends Comput. Appl. Math. 2006, 7, 91–100.
4. Berger, M.; Schmid, M. Semiparametric regression for discrete time-to-event data. Stat. Model. 2018, 18, 322–345.
5. Vallejos, C.A.; Steel, M.F. Bayesian survival modelling of university outcomes. J. R. Stat. Soc. Ser. A (Stat. Soc.) 2017, 180, 613–631.
6. Heyard, R.; Timsit, J.-F.; Held, L.; COMBACTE-MAGNET consortium. Validation of discrete time-to-event prediction models in the presence of competing risks. Biom. J. 2020, 62, 643–657.
7. Zhou, X.-D.; Wang, Y.-J.; Yue, R.-X. Optimal designs for discrete-time survival models with random effects. Lifetime Data Anal. 2021, 27, 300–332.
8. Groll, A.; Tutz, G. Variable selection in discrete survival models including heterogeneity. Lifetime Data Anal. 2017, 23, 305–338.
9. Royston, P.; Parmar, M.K.B. Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Stat. Med. 2002, 21, 2175–2197.
10. Nakagawa, T.; Osaki, S. The discrete Weibull distribution. IEEE Trans. Reliab. 1975, 24, 300–301.
11. Vila, R.; Nakano, E.Y.; Saulo, H. Theoretical results on the discrete Weibull distribution of Nakagawa and Osaki. Statistics 2019, 53, 339–363.
12. Jayakumar, K.; Babu, M.G. Discrete Weibull geometric distribution and its properties. Commun. Stat. Theory Methods 2018, 47, 1767–1783.
13. Cardial, M.R.P.; Fachini-Gomes, J.B.; Nakano, E.Y. Exponentiated discrete Weibull distribution for censored data. Braz. J. Biom. 2020, 38, 35–56.
14. Chakraborty, S. Generating discrete analogues of continuous probability distributions: A survey of methods and constructions. J. Stat. Distrib. Appl. 2015, 2, 6.
15. Krishna, H.; Pundir, P.S. Discrete Burr and discrete Pareto distributions. Stat. Methodol. 2009, 6, 177–188.
16. Para, B.A.; Jan, T.R. Discrete version of log-logistic distribution and its applications in genetics. Int. J. Mod. Math. Sci. 2016, 14, 407–422.
17. Corrêa, J.B.; Costa, L.O.P.; Oliveira, N.T.B.; Lima, W.P.; Sluka, K.A.; Liebano, R.E. Effects of the carrier frequency of interferential current on pain modulation and central hypersensitivity in people with chronic nonspecific low back pain: A randomized placebo-controlled trial. Eur. J. Pain 2016, 20, 1653–1666.
18. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2019; Available online: https://www.R-project.org/ (accessed on 1 October 2023).
19. Ross, S.M. Simulation; Academic Press: Cambridge, MA, USA, 2022.
20. Kaplan, E.L.; Meier, P. Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 1958, 53, 457–481.
Figure 1. Results from 1000 Monte Carlo replications for the parameter q.
Figure 2. Results from 1000 Monte Carlo replications for the parameter $\eta$.
Figure 3. Results from 1000 Monte Carlo replications for the parameter $\beta_1$.
Figure 4. Results from 1000 Monte Carlo replications for the parameter $\beta_2$.
Figure 5. Results from 1000 Monte Carlo replications for the parameter $\beta_1$ (30% censoring).
Figure 6. Verification of the proportional hazards assumption for the covariates treatment ($z_1$) and medicines ($z_2$). (a) $(z_1 = 0; z_2 = 0)$ vs. $(z_1 = 1; z_2 = 0)$; (b) $(z_1 = 0; z_2 = 0)$ vs. $(z_1 = 1; z_2 = 1)$; (c) $(z_1 = 1; z_2 = 0)$ vs. $(z_1 = 1; z_2 = 1)$; (d) treatment; (e) medicines.
Figure 7. Fitting the models to the low back pain data by level of the covariates treatment and medicines.
Figure 8. Fitting traditional discrete models to the low back pain data by level of the covariates treatment and medicines.
Table 1. Scenarios used in the simulations.

| Scenario | $q$ | $\eta$ | $\beta_1$ | $\beta_2$ | Model |
|---|---|---|---|---|---|
| S1 | 0.9 | 1.50 | 2.0 | 1.0 | POHM-DW |
| S2 | 0.5 | 1.00 | 2.0 | 1.0 | POHM-G |
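Table 1 fixes the parameter values from which the simulated samples are drawn. The sketch below shows how one such sample could be generated. It rests on assumptions that this section does not state: (i) the Nakagawa-Osaki discrete Weibull baseline [10] with $S(t) = P(T \ge t) = q^{t^\eta}$ on $t = 0, 1, 2, \ldots$, so that $h_0(t) = 1 - q^{(t+1)^\eta - t^\eta}$; (ii) a proportional odds form on the discrete hazard, $h(t \mid z)/(1 - h(t \mid z)) = [h_0(t)/(1 - h_0(t))]\exp(\beta^\top z)$; and (iii) hypothetical Bernoulli(0.5) covariates and uniform censoring times, which are illustrative choices and not the article's design. With $\eta = 1$ the baseline reduces to the geometric hazard $1 - q$, which corresponds to the $\eta = 1.00$ entry for S2 (POHM-G).

```python
import math
import random


def dw_baseline_hazard(t: int, q: float, eta: float) -> float:
    """Discrete Weibull hazard h0(t) for t = 0, 1, 2, ...

    Assumes the Nakagawa-Osaki form S(t) = P(T >= t) = q ** (t ** eta), so
    h0(t) = 1 - S(t + 1) / S(t) = 1 - q ** ((t + 1) ** eta - t ** eta).
    With eta = 1 this is the constant geometric hazard 1 - q.
    """
    return 1.0 - q ** ((t + 1) ** eta - t ** eta)


def po_hazard(h0: float, lin_pred: float) -> float:
    """Assumed proportional odds form on the discrete hazard:
    h / (1 - h) = h0 / (1 - h0) * exp(lin_pred)."""
    if h0 >= 1.0:  # guard against division by zero when h0 rounds to 1
        return 1.0
    odds = h0 / (1.0 - h0) * math.exp(lin_pred)
    return odds / (1.0 + odds)


def simulate_event_time(q, eta, beta, z, rng, t_max=100_000):
    """Draw T by running one Bernoulli(h(t | z)) trial per time point."""
    lin_pred = sum(b * x for b, x in zip(beta, z))
    for t in range(t_max):
        if rng.random() < po_hazard(dw_baseline_hazard(t, q, eta), lin_pred):
            return t
    return t_max  # cap; practically unreachable for the Table 1 scenarios


def simulate_dataset(n, q, eta, beta, cens_max, seed=0):
    """Hypothetical design: z1, z2 ~ Bernoulli(0.5) and independent
    censoring C ~ uniform on {0, ..., cens_max}. Returns (time, delta, z)
    with time = min(T, C) and delta = 1 when the event is observed (T <= C)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = (rng.randint(0, 1), rng.randint(0, 1))
        t = simulate_event_time(q, eta, beta, z, rng)
        c = rng.randint(0, cens_max)
        data.append((min(t, c), int(t <= c), z))
    return data


# Scenario S1 of Table 1 (POHM-DW): q = 0.9, eta = 1.5, beta = (2.0, 1.0)
sample = simulate_dataset(n=100, q=0.9, eta=1.5, beta=(2.0, 1.0), cens_max=30)
print(sum(d for _, d, _ in sample) / len(sample))  # observed event fraction
```

Walking forward one Bernoulli trial per time point is a direct use of the definition of the discrete hazard; inverting the survival function would be faster, but the loop keeps the proportional odds step explicit. The censoring bound `cens_max` is a hypothetical knob one would tune to approach the 5%, 10% and 30% censoring levels of Table 2.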
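Table 2. Average estimates, mean squared error (MSE) and coverage probability (CP) of the POHM-DW and POHM-G parameters for each simulation scenario, sample size ($n$) and censoring percentage.

| $n$ | Scen. | Cens. (%) | $q$ Average | $q$ MSE | $q$ CP | $\eta$ Average | $\eta$ MSE | $\eta$ CP | $\beta_1$ Average | $\beta_1$ MSE | $\beta_1$ CP | $\beta_2$ Average | $\beta_2$ MSE | $\beta_2$ CP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30 | S1 | 5 | 0.9097 | 0.0025 | 0.9100 | 1.6238 | 0.0825 | 0.8350 | 2.1556 | 0.2825 | 0.9110 | 1.0513 | 0.4044 | 0.9470 |
| 30 | S2 | 5 | 0.5541 | 0.0187 | 0.9340 | 1.1186 | 0.0622 | 0.8280 | 2.1624 | 0.5490 | 0.8940 | 1.0735 | 0.7050 | 0.9410 |
| 30 | S1 | 10 | 0.9129 | 0.0023 | 0.9150 | 1.6038 | 0.0801 | 0.8500 | 2.0721 | 0.2510 | 0.9230 | 1.0090 | 0.4077 | 0.9390 |
| 30 | S2 | 10 | 0.5889 | 0.0226 | 0.9130 | 1.0836 | 0.0578 | 0.8650 | 1.9374 | 0.4286 | 0.8590 | 0.9500 | 0.6541 | 0.9300 |
| 30 | S1 | 30 | 0.9296 | 0.0026 | 0.8820 | 1.5611 | 0.0970 | 0.9010 | 1.8336 | 0.2723 | 0.8720 | 0.8186 | 0.4750 | 0.9200 |
| 30 | S2 | 30 | 0.6958 | 0.0493 | 0.6830 | 0.9668 | 0.0608 | 0.9360 | 1.4057 | 0.6068 | 0.6110 | 0.5878 | 0.7387 | 0.8940 |
| 50 | S1 | 5 | 0.9075 | 0.0013 | 0.9340 | 1.5563 | 0.0380 | 0.8940 | 2.0476 | 0.1225 | 0.9350 | 1.0420 | 0.2177 | 0.9470 |
| 50 | S2 | 5 | 0.5505 | 0.0114 | 0.9330 | 1.0499 | 0.0249 | 0.9010 | 1.9943 | 0.2079 | 0.9070 | 1.0536 | 0.3455 | 0.9400 |
| 50 | S1 | 10 | 0.9128 | 0.0013 | 0.9320 | 1.5460 | 0.0370 | 0.9060 | 1.9825 | 0.1242 | 0.9280 | 1.0032 | 0.2190 | 0.9450 |
| 50 | S2 | 10 | 0.5876 | 0.0158 | 0.8770 | 1.0258 | 0.0237 | 0.9270 | 1.8185 | 0.2195 | 0.8420 | 0.9682 | 0.3290 | 0.9280 |
| 50 | S1 | 30 | 0.9309 | 0.0018 | 0.8770 | 1.5027 | 0.0435 | 0.9360 | 1.7629 | 0.1697 | 0.8480 | 0.8395 | 0.2562 | 0.9230 |
| 50 | S2 | 30 | 0.7037 | 0.0472 | 0.4060 | 0.9471 | 0.0294 | 0.9590 | 1.3543 | 0.5223 | 0.4350 | 0.6763 | 0.3798 | 0.8980 |
| 100 | S1 | 5 | 0.9069 | 0.0007 | 0.9410 | 1.5288 | 0.0153 | 0.9290 | 2.0139 | 0.0604 | 0.9390 | 1.0045 | 0.0848 | 0.9510 |
| 100 | S2 | 5 | 0.5414 | 0.0060 | 0.9180 | 1.0215 | 0.0102 | 0.9320 | 1.9417 | 0.0913 | 0.9050 | 0.9766 | 0.1119 | 0.9460 |
| 100 | S1 | 10 | 0.9119 | 0.0008 | 0.9210 | 1.5223 | 0.0164 | 0.9280 | 1.9644 | 0.0637 | 0.9270 | 0.9718 | 0.0867 | 0.9410 |
| 100 | S2 | 10 | 0.5763 | 0.0100 | 0.8230 | 1.0069 | 0.0103 | 0.9510 | 1.8175 | 0.1163 | 0.8290 | 0.9234 | 0.1127 | 0.9360 |
| 100 | S1 | 30 | 0.9312 | 0.0014 | 0.7800 | 1.4917 | 0.0220 | 0.9490 | 1.7748 | 0.1127 | 0.7970 | 0.8686 | 0.1066 | 0.9280 |
| 100 | S2 | 30 | 0.6992 | 0.0417 | 0.0138 | 0.9563 | 0.0157 | 0.9450 | 1.4367 | 0.3828 | 0.3600 | 0.7579 | 0.1660 | 0.8810 |
| 250 | S1 | 5 | 0.9060 | 0.0003 | 0.9430 | 1.5036 | 0.0062 | 0.9430 | 1.9614 | 0.0256 | 0.9300 | 0.9845 | 0.0299 | 0.9570 |
| 250 | S2 | 5 | 0.5429 | 0.0036 | 0.8490 | 1.0037 | 0.0044 | 0.9380 | 1.8817 | 0.0478 | 0.8520 | 0.9536 | 0.0493 | 0.9480 |
| 250 | S1 | 10 | 0.9104 | 0.0004 | 0.9070 | 1.4928 | 0.0066 | 0.9390 | 1.9401 | 0.0326 | 0.8790 | 0.9546 | 0.0388 | 0.9340 |
| 250 | S2 | 10 | 0.5778 | 0.0074 | 0.5900 | 0.9489 | 0.0046 | 0.9420 | 1.7501 | 0.0932 | 0.6200 | 0.8940 | 0.0580 | 0.9100 |
| 250 | S1 | 30 | 0.9303 | 0.011 | 0.5400 | 1.4579 | 0.0098 | 0.9420 | 1.7052 | 0.1092 | 0.4910 | 0.8492 | 0.0595 | 0.8710 |
| 250 | S2 | 30 | 0.6927 | 0.0348 | 0.0020 | 0.9248 | 0.0108 | 0.8630 | 1.3568 | 0.4352 | 0.0230 | 0.7080 | 0.1303 | 0.7000 |
| 500 | S1 | 5 | 0.9057 | 0.0002 | 0.9210 | 1.5006 | 0.0033 | 0.9430 | 1.9480 | 0.0148 | 0.9000 | 0.9700 | 0.0224 | 0.9340 |
| 500 | S2 | 5 | 0.5404 | 0.0027 | 0.7380 | 0.9929 | 0.0023 | 0.9460 | 1.8399 | 0.0427 | 0.6870 | 0.9125 | 0.0371 | 0.8980 |
| 500 | S1 | 10 | 0.9102 | 0.0002 | 0.8620 | 1.4864 | 0.0035 | 0.9570 | 1.8845 | 0.0246 | 0.7870 | 0.9405 | 0.0244 | 0.9140 |
| 500 | S2 | 10 | 0.5750 | 0.0660 | 0.3100 | 0.9817 | 0.0023 | 0.9650 | 1.7225 | 0.0873 | 0.3060 | 0.8496 | 0.0486 | 0.8160 |
| 500 | S1 | 30 | 0.9301 | 0.0010 | 0.2810 | 1.4489 | 0.0073 | 0.8830 | 1.6718 | 0.1195 | 0.1380 | 0.8368 | 0.0476 | 0.7660 |
| 500 | S2 | 30 | 0.6902 | 0.0369 | 0.0010 | 0.9120 | 0.0105 | 0.6450 | 1.2752 | 0.5362 | 0.0000 | 0.6307 | 0.1590 | 0.2790 |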
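Table 2 condenses the Monte Carlo replications into three numbers per parameter. The helper below computes those summaries from per-replication estimates and interval limits. It assumes CP is the empirical share of nominal 95% intervals that contain the true value; the interval type is not stated in this section, and the function name `mc_summary` is hypothetical.

```python
from statistics import fmean


def mc_summary(estimates, lowers, uppers, true_value):
    """Summarise Monte Carlo replications of one parameter.

    Returns (average estimate, mean squared error, coverage probability),
    where coverage is the share of intervals [lower, upper] that contain
    true_value.
    """
    average = fmean(estimates)
    mse = fmean((e - true_value) ** 2 for e in estimates)
    coverage = fmean(
        1.0 if lo <= true_value <= up else 0.0 for lo, up in zip(lowers, uppers)
    )
    return average, mse, coverage


# Toy input with three replications (illustrative numbers, not from Table 2)
avg, mse, cp = mc_summary(
    estimates=[1.9, 2.1, 2.3],
    lowers=[1.5, 1.7, 2.05],
    uppers=[2.3, 2.5, 2.55],
    true_value=2.0,
)
print(round(avg, 4), round(mse, 4), round(cp, 4))  # 2.1 0.0367 0.6667
```

Because MSE = variance + bias², a large MSE can come almost entirely from bias. For S2 with 30% censoring and $n = 500$, the average estimate of $\beta_1$ is 1.2752 against the true value 2.0, so the squared bias alone, $(2.0 - 1.2752)^2 \approx 0.525$, accounts for nearly all of the reported MSE of 0.5362. This also explains why CP falls toward zero in that row even as $n$ grows.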
Table 3. Patients with low back pain relief by number of unsuccessful sessions.

| Time-to-event | Patients with relief | Number of censored observations | % censored |
|---|---|---|---|
| 0 | 85 | 0 | 0.00% |
| 1 | 29 | 0 | 0.00% |
| 2 | 7 | 1 | 0.67% |
| 3 | 5 | 1 | 0.67% |
| 4 | 4 | 0 | 0.00% |
| 5 | 6 | 0 | 0.00% |
| 6 | 1 | 0 | 0.00% |
| 7 | 3 | 0 | 0.00% |
| 9 | 1 | 0 | 0.00% |
| 11 | 2 | 5 | 3.33% |
| Total | 143 | 7 | 4.67% |

Source: [17].
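Because Table 3 is already a discrete life table, the Kaplan-Meier estimator [20] can be computed directly from its counts, which gives a nonparametric reference curve against which parametric fits can be compared. The sketch below assumes that patients censored at a given time were still at risk at that time (censoring after events, the usual convention) and returns $S(t) = P(T > t)$; the function name is hypothetical and this is not the article's own code.

```python
def discrete_kaplan_meier(events, censored, n_total):
    """Kaplan-Meier estimate of S(t) = P(T > t) at each observed discrete time.

    events[t]   -> number of events (here: relief) at time t
    censored[t] -> number of observations censored at time t
    Censoring at t is treated as happening after the events at t.
    """
    at_risk = n_total
    surv = 1.0
    curve = {}
    for t in sorted(set(events) | set(censored)):
        d = events.get(t, 0)
        surv *= 1.0 - d / at_risk          # conditional survival past t
        curve[t] = surv
        at_risk -= d + censored.get(t, 0)  # leave the risk set after t
    return curve


# Counts transcribed from Table 3 (150 patients: 143 reliefs, 7 censored)
relief = {0: 85, 1: 29, 2: 7, 3: 5, 4: 4, 5: 6, 6: 1, 7: 3, 9: 1, 11: 2}
cens = {2: 1, 3: 1, 11: 5}
km = discrete_kaplan_meier(relief, cens, n_total=150)
for t, s in km.items():
    print(t, round(s, 4))
```

The first two steps can be checked by hand: $S(0) = 65/150 \approx 0.4333$ and $S(1) = (65/150)(36/65) = 0.24$. Times with no events and no censoring (8 and 10) leave the curve unchanged, so they are simply skipped.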
Table 4. Summary of study covariates by category.

| Covariate | Category | Frequency | % |
|---|---|---|---|
| Treatment | Placebo | 50 | 33.33% |
| | Active | 100 | 66.67% |
| Sex | Female | 115 | 76.67% |
| | Male | 35 | 23.33% |
| Age | Up to 50 years | 64 | 44.00% |
| | 50 years and over | 84 | 56.00% |
| BMI | Up to 30 | 108 | 72.00% |
| | 30 or more | 42 | 28.00% |
| Pain duration | 5 years or more | 93 | 62.00% |
| | Less than 5 years | 57 | 38.00% |
| Medicines | Yes | 115 | 76.67% |
| | No | 35 | 23.33% |
Table 5. Tests of significance of the covariates in the multiple POHM-G, POHM-DW and POHM-DLL models.

| Variable | Parameter | POHM-G p-value | POHM-DW p-value | POHM-DLL p-value |
|---|---|---|---|---|
| Treatment¹ | $\beta_1$ | $5 \times 10^{-5}$ | 0.0035 | 0.0050 |
| Sex² | $\beta_2$ | 0.9124 | 0.9601 | 0.9889 |
| Age³ | $\beta_3$ | 0.2915 | 0.4440 | 0.4272 |
| BMI⁴ | $\beta_4$ | 0.6500 | 0.7280 | 0.7357 |
| Pain duration⁵ | $\beta_5$ | 0.0129 | 0.0879 | 0.1098 |
| Medicines⁶ | $\beta_6$ | 0.0600 | 0.0396 | 0.0397 |

Reference levels of the covariates: ¹ Placebo; ² Female; ³ Up to 50 years; ⁴ Up to 30; ⁵ 5 years or more; ⁶ Yes.
Table 6. Tests of significance for POHM-G, POHM-DW and POHM-DLL including only the significant covariates.

| Variable | Parameter | POHM-G p-value | POHM-DW p-value | POHM-DLL p-value |
|---|---|---|---|---|
| Treatment¹ | $\beta_1$ | $2 \times 10^{-5}$ | 0.0028 | 0.0045 |
| Medicines² | $\beta_2$ | 0.0010 | 0.0172 | 0.0202 |

Reference levels of the covariates: ¹ Placebo; ² Yes.
Table 7. POHM-G, POHM-DW and POHM-DLL joint parameter estimates (significant covariates only).

POHM-G

| Variable | Parameter | Estimate | Standard error | 95% CI | p-value |
|---|---|---|---|---|---|
| - | $q$ | 0.2359 | 0.0325 | [0.1722; 0.2996] | - |
| Treatment¹ | $\beta_1$ | 0.9657 | 0.2277 | [0.5195; 1.4412] | $2 \times 10^{-5}$ |
| Medicines² | $\beta_2$ | 0.9967 | 0.3048 | [0.3993; 1.5942] | 0.0010 |

POHM-DW

| Variable | Parameter | Estimate | Standard error | 95% CI | p-value |
|---|---|---|---|---|---|
| - | $q$ | 0.5827 | 0.0574 | [0.4701; 0.6953] | - |
| - | $\eta$ | 0.5887 | 0.0670 | [0.4573; 0.7201] | - |
| Treatment¹ | $\beta_1$ | 0.7153 | 0.2390 | [0.2468; 1.1838] | 0.0028 |
| Medicines² | $\beta_2$ | 0.7553 | 0.3169 | [0.1342; 1.3764] | 0.0172 |

POHM-DLL

| Variable | Parameter | Estimate | Standard error | 95% CI | p-value |
|---|---|---|---|---|---|
| - | $\alpha$ | 1.4299 | 0.3606 | [0.7231; 2.1367] | - |
| - | $\eta$ | 0.9568 | 0.1253 | [0.7111; 1.2024] | - |
| Treatment¹ | $\beta_1$ | 0.6875 | 0.2419 | [0.2135; 1.1615] | 0.0045 |
| Medicines² | $\beta_2$ | 0.7366 | 0.3171 | [0.1151; 1.3582] | 0.0202 |

Reference levels of the covariates: ¹ Placebo; ² Yes.
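The p-values and 95% intervals in Table 7 appear consistent with Wald-type inference: $z = \hat\beta / \mathrm{SE}$ compared with a standard normal, and $\hat\beta \pm 1.96\,\mathrm{SE}$. This is inferred from the printed numbers, not a statement of the authors' procedure. The sketch below recomputes them from the rounded estimates; `wald_summary` is a hypothetical helper name. Applying the same rule to the POHM-G $\beta_1$ row gives an upper limit of about 1.412 rather than the printed 1.4412.

```python
import math


def wald_summary(estimate, se, z_crit=1.959963984540054):
    """Two-sided Wald z-test of H0: beta = 0 and a 95% Wald interval.

    The p-value uses erfc(|z| / sqrt(2)), which equals 2 * (1 - Phi(|z|)).
    """
    z = estimate / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return p_value, (estimate - z_crit * se, estimate + z_crit * se)


# POHM-DW covariate estimates transcribed from Table 7
for name, est, se in [("beta1 (treatment)", 0.7153, 0.2390),
                      ("beta2 (medicines)", 0.7553, 0.3169)]:
    p, (lo, up) = wald_summary(est, se)
    print(f"{name}: p = {p:.4f}, 95% CI = [{lo:.4f}; {up:.4f}]")
```

Small discrepancies in the fourth decimal are expected, because the inputs are themselves rounded to four decimals.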
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Vieira, M.G.F.; Cardial, M.R.P.; Matsushita, R.; Nakano, E.Y. Proportional Odds Hazard Model for Discrete Time-to-Event Data. Axioms 2023, 12, 1102. https://doi.org/10.3390/axioms12121102

AMA Style

Vieira MGF, Cardial MRP, Matsushita R, Nakano EY. Proportional Odds Hazard Model for Discrete Time-to-Event Data. Axioms. 2023; 12(12):1102. https://doi.org/10.3390/axioms12121102

Chicago/Turabian Style

Vieira, Maria Gabriella Figueiredo, Marcílio Ramos Pereira Cardial, Raul Matsushita, and Eduardo Yoshio Nakano. 2023. "Proportional Odds Hazard Model for Discrete Time-to-Event Data" Axioms 12, no. 12: 1102. https://doi.org/10.3390/axioms12121102

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
