Article

A Bivariate Copula–Driven Multi-State Model for Statistical Analysis in Medical Research

by Hugo Brango 1,*, Roger Tovar-Falón 2 and Guillermo Martínez-Flórez 2
1 Grupo de Investigación Análisis Funcional y Ecuaciones Diferenciales (AFED), Departamento de Matemáticas, Facultad de Educación y Ciencias, Universidad de Sucre, Sincelejo 700001, Colombia
2 Departamento de Matemáticas y Estadística, Universidad de Córdoba, Montería 230002, Colombia
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(19), 3072; https://doi.org/10.3390/math13193072
Submission received: 1 August 2025 / Revised: 13 September 2025 / Accepted: 19 September 2025 / Published: 24 September 2025
(This article belongs to the Special Issue Statistical Modeling and Analysis in Medical Research)

Abstract

We develop and evaluate a copula-based multistate model for illness–death processes with dependent transition times. The framework couples Cox proportional hazards models for the marginal transition intensities with Archimedean copulas to capture dependence, and it is estimated via the Inference Functions for Margins (IFM) approach under right censoring. A Monte Carlo study shows that assuming independence between transitions can severely underestimate joint survival, yielding coverage as low as 40 % under strong dependence, compared with 92 % to 97 % when copulas are used. We apply the method to a large Colombian cohort of COVID-19 patients (2021 to 2022) that includes sociodemographic, clinical, and vaccination data. The Gumbel copula best captures the strong positive dependence between hospitalization and death, producing more accurate joint survival estimates than independence-based models. Model diagnostics, including proportional hazards tests, Kaplan-Meier comparisons, hazard rate functions, and TTT plots, support the adequacy of the Cox margins. We also discuss limitations and avenues for extension, such as parametric or cure-fraction margins, nested or vine copulas, and full-likelihood estimation. Overall, the results underscore the methodological and applied value of integrating copulas into multistate models, offering a robust framework for analyzing dependent event times in epidemiology and biomedicine.

1. Introduction

Survival analysis constitutes a fundamental statistical methodology in medical, epidemiological, and biostatistical research, playing a crucial role in analyzing time-to-event data, such as disease recurrence, clinical progression, hospitalization, and mortality. Its utility in identifying prognostic factors, evaluating therapeutic efficacy, and providing empirical evidence on patient outcomes makes it indispensable for evidence-based clinical decision-making [1,2,3]. Among the various survival analysis techniques, the Cox proportional hazards model has become the predominant method, thanks to its interpretability, flexibility, and capability to accommodate covariate effects without specifying the baseline hazard function [4,5]. However, a significant limitation of the conventional Cox model is its implicit assumption of independence between event times, an assumption that frequently does not hold true in clinical practice [6,7].
In numerous clinical scenarios, such as chronic illnesses, cardiovascular conditions, and cancer, patients often experience sequences of related events that exhibit inherent interdependencies. For example, a hospitalization event can significantly alter the subsequent risk of mortality through factors such as clinical deterioration, treatment complications, or common underlying health determinants [7,8,9]. Neglecting these dependencies in the analysis of multi-state events can substantially bias risk estimates, distort covariate effect interpretations, and lead to erroneous conclusions, ultimately impacting patient management, resource allocation, and prognostic accuracy [9,10].
Multi-state models have emerged as powerful statistical tools to explicitly describe and analyze transitions between distinct health states over time. These models facilitate the investigation of complex event histories, allowing researchers to quantify covariate effects on the timing and sequence of clinical outcomes comprehensively [10,11]. Despite their versatility, conventional multi-state models often continue to assume independence across transitions, a simplification frequently violated due to latent patient heterogeneity or unmeasured common risk factors influencing multiple events [12,13,14].
Recent methodological developments have aimed to address these limitations by introducing sophisticated approaches that explicitly account for dependencies between multistate transitions. Techniques such as landmarking, joint frailty–multistate modeling, hierarchical models integrating longitudinal biomarkers, and hidden semi-Markov frameworks exemplify these advancements [12,14,15,16]. Collectively, these approaches highlight the increasing recognition of the necessity to model dependencies robustly and flexibly within multistate contexts.
An especially promising and increasingly employed approach to capturing complex dependence structures in survival analysis is the use of copula functions. Copulas offer considerable flexibility by modeling dependencies separately from marginal distributions, enabling researchers to accurately describe complex association structures among event times [17,18,19]. Originally developed in fields such as finance and hydrology, copulas have gained substantial attention in biostatistics and medical research, where they have demonstrated notable improvements in modeling correlated event times and enhancing predictive accuracy [20,21,22,23].
Empirical studies in medical research have emphasized the value of copula-based multistate models. For instance, in oncology research, copula-driven approaches have more precisely quantified risks of recurrence and metastasis, outperforming traditional models [22,24]. Similarly, in cardiovascular epidemiology, copula-based multistate models have substantially improved predictions of rehospitalization and mortality after myocardial infarction [25]. These studies underscore the practical importance of incorporating copula models to effectively represent real-world clinical dependencies.
Motivated by these methodological advancements and their clinical implications, this article proposes and evaluates a bivariate copula-based multi-state model for jointly analyzing two clinically significant events, hospitalization and death, using data from a large observational registry of COVID-19 patients in Colombia (2021–2022). Using flexible Archimedean copulas, our model explicitly accounts for residual dependencies between events, improving inferential accuracy, risk stratification, and predictive performance.
The article is structured as follows: Section 2 presents foundational concepts of multistate modeling, including Cox proportional hazards models and copula functions, particularly Archimedean copulas. Section 3 introduces our proposed joint semi-parametric multistate–copula model, detailing its statistical formulation and inference methodologies. Section 4 presents a comprehensive simulation study, discussing scenario configurations, data-generating mechanisms, estimation strategies, and extensive results with an interpretative discussion. Section 5 demonstrates a practical application of the proposed model to COVID-19 patient data, illustrating the clinical and practical utility. Section 6 summarizes the main findings and conclusions of the study, while Section 7 discusses limitations and outlines directions for future research.

2. Basic Concepts

2.1. Multi-State Models

In longitudinal survival studies, individuals may experience several clinical events over time. Multi-state models provide a flexible framework for estimating transition hazards between discrete disease states and for quantifying how covariates affect the timing of those transitions [7,9,26]. Formally, one considers a stochastic process $\{X(t) : t \in \mathcal{T}\}$ taking values in $\{1, 2, \ldots, K\}$, where $K$ denotes the number of states and each $i \to j$ transition is governed by the intensity function
$$\lambda_{ij}(t \mid X) = \lim_{\Delta t \to 0} \frac{\Pr\{X(t + \Delta t) = j \mid X(t) = i, X\}}{\Delta t},$$
where the conditioning argument $X$ (without a time index) denotes the covariate vector.
The three frameworks are summarized in Figure 1. Panel (a) depicts the basic two-state survival model, where individuals transition directly from the initial “alive” state to the absorbing state of death. Panel (b) extends this framework by introducing an intermediate “illness” or “progression” state, which allows a more detailed characterization of clinical trajectories and the risks associated with subsequent mortality. Finally, panel (c) presents the generalized multi-state structure, in which individuals sequentially pass through multiple intermediate states before reaching the absorbing state. This representation provides a more flexible and realistic description of complex clinical scenarios, such as the progression through successive stages of a chronic disease.
It is of interest to determine the rate of onset or progression of the disease and to identify prognostic variables governing the transitions between the different states in a medical context.
The most common approach to these practical scenarios is to decompose the process into multiple survival models and fit, separately, the transition risk from one state to another (often called the transition intensity). When covariates are available to explain these transitions, the standard choice is the Cox proportional hazards model.

2.2. Cox Proportional Hazards Model

To assess the effect of a vector of covariates $X = (x_1, x_2, \ldots, x_p)^{\top}$, we invoke the Cox proportional hazards model [4]. Let $T$ denote the time to the transition of interest and define the hazard function
$$\lambda(t \mid X) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t, X)}{\Delta t}.$$
The Cox model assumes
$$\lambda(t; X) = \lambda_0(t) \exp(\beta^{\top} X),$$
where
  • $\lambda_0(t)$ is an unspecified baseline hazard, common to all subjects when $X = 0$, and
  • $\beta = (\beta_1, \ldots, \beta_p)$ measures log-hazard ratios associated with the covariates.
Under proportionality, the hazard ratio between two individuals with covariate values $X$ and $X^{*}$ is
$$\frac{\lambda(t \mid X)}{\lambda(t \mid X^{*})} = \exp\{\beta^{\top}(X - X^{*})\},$$
constant in $t$.
Cox’s key insight was to construct a partial likelihood that eliminates the unknown $\lambda_0(t)$. Suppose that there are $m$ ordered event times $t_{(1)} < \cdots < t_{(m)}$, and at time $t_{(k)}$, a single subject $j_k$ experiences the event. Let
$$R(t_{(k)}) = \{\ell : T_{\ell} \ge t_{(k)}\}$$
be the risk set just before $t_{(k)}$. The partial likelihood is
$$L_p(\beta) = \prod_{k=1}^{m} \frac{\exp(\beta^{\top} X_{j_k})}{\sum_{\ell \in R(t_{(k)})} \exp(\beta^{\top} X_{\ell})}.$$
Maximizing $\log L_p(\beta)$ yields the estimator $\hat{\beta}$. Standard errors follow from the observed information matrix.
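As a minimal sketch of the partial likelihood described in this subsection, the following pure-Python function evaluates the log partial likelihood for data without tied event times. The function name and argument layout are illustrative assumptions rather than part of the original analysis; in practice a dedicated survival library would be used to maximize it.

```python
import math


def cox_log_partial_likelihood(beta, times, events, covariates):
    """Log partial likelihood of a Cox model, assuming no tied event times.

    beta       : list of p regression coefficients
    times      : observed times (event or censoring), one per subject
    events     : 1 if the event was observed, 0 if right-censored
    covariates : list of length-p covariate lists, one per subject
    """
    # Linear predictor beta^T X for every subject.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    loglik = 0.0
    for k, (t_k, d_k) in enumerate(zip(times, events)):
        if d_k == 0:
            continue  # censored subjects contribute only through risk sets
        # Risk set: subjects still under observation just before t_k.
        risk_sum = sum(math.exp(eta[l]) for l, t_l in enumerate(times) if t_l >= t_k)
        loglik += eta[k] - math.log(risk_sum)
    return loglik


# Two subjects, one binary covariate, both events observed.
example_loglik = cox_log_partial_likelihood([0.7], [1.0, 2.0], [1, 1], [[1.0], [0.0]])
```

With all coefficients set to zero the value depends only on the risk-set sizes, which gives a quick sanity check of the risk-set bookkeeping.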

Baseline Hazard and Survival Function

Once $\hat{\beta}$ is obtained, one recovers an estimate of the cumulative baseline hazard
$$\hat{\Lambda}_0(t) = \sum_{t_{(k)} \le t} \frac{1}{\sum_{\ell \in R(t_{(k)})} \exp(\hat{\beta}^{\top} X_{\ell})},$$
known as the Breslow estimator. The corresponding marginal survival for an individual with covariates $X$ is
$$\hat{S}(t \mid X) = \exp\{-\hat{\Lambda}_0(t) \exp(\hat{\beta}^{\top} X)\}.$$
Fitting the Cox model separately to each transition of a multi-state process yields marginal survival curves, $\hat{S}(t \mid X)$.
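The Breslow estimator and the implied survival curve can be sketched in a few lines. This is an illustration under the assumption of untied event times and already-estimated coefficients; the function names are hypothetical.

```python
import math


def breslow_cumulative_hazard(t, beta, times, events, covariates):
    """Breslow estimate of the cumulative baseline hazard at time t."""
    eta = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    total = 0.0
    for t_k, d_k in zip(times, events):
        if d_k == 1 and t_k <= t:
            # Denominator: sum of exp(beta^T X) over the risk set at t_k.
            denom = sum(math.exp(eta[l]) for l, t_l in enumerate(times) if t_l >= t_k)
            total += 1.0 / denom
    return total


def cox_survival(t, x, beta, times, events, covariates):
    """Survival probability at t for covariates x: exp(-Lambda0(t) * exp(beta^T x))."""
    lam0 = breslow_cumulative_hazard(t, beta, times, events, covariates)
    return math.exp(-lam0 * math.exp(sum(b * xi for b, xi in zip(beta, x))))
```

For example, with three untied events and all coefficients equal to zero, the cumulative hazard at the second event time is 1/3 + 1/2, the Nelson–Aalen value.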
However, in processes with multiple sequential events (e.g., hospitalization followed by death), fitting separate Cox models for each transition can overlook residual dependence between event times. To address this limitation and jointly model survival across stages, we need an extension that explicitly captures the interrelationships among transition times.
A powerful framework for joint survival modeling in this setting is provided by copula models, which allow for flexible specification of the dependence structure among multiple outcomes or event times. Within a multi-state context, individuals move through a series of states, and the occurrence of one event can directly influence the probability of subsequent events.

2.3. Model Diagnostics and Marginal Fits

We assessed the proportional hazards (PHs) assumption using Schoenfeld residuals for each covariate and transition. Plots were inspected for systematic time trends and formal tests were computed (global and covariate-specific).
To show that the semiparametric Cox models reproduce the marginal behavior of the data before introducing dependence, we contrasted, for each transition, (i) the nonparametric Kaplan–Meier (KM) estimator with 95% pointwise CIs and (ii) the population-averaged survival curve implied by the fitted Cox model. The latter was computed as
$$\bar{S}(t) = \frac{1}{n} \sum_{i=1}^{n} S_0(t)^{\exp(\eta_i)},$$
averaging over individuals (and mixing over city strata using sample proportions when applicable), where $S_0(t)$ is the baseline survival, and $\eta_i$ is the fitted linear predictor.
We also examined how risk evolved over time by comparing hazard-rate functions (HRFs) for each transition: (a) a nonparametric kernel estimator (muhaz) restricted to the observed event-time range, and (b) the population-averaged, Cox-induced hazard obtained via numerical differentiation of the Breslow baseline cumulative hazard and, when stratified, averaged across strata with the same weights as above. Hazards were plotted on a log scale for readability. Agreement between the two curves supports the adequacy of the marginal Cox specification.
Finally, Total Time on Test (TTT) plots were produced to summarize the failure-time shape irrespective of model assumptions. Writing $u \in [0, 1]$ for the proportion failed, the TTT curve $\mathrm{TTT}(u)$ was computed from the KM estimator. A $45^{\circ}$ line corresponds to a constant hazard; a concave curve (lying above the diagonal) indicates an increasing hazard, and a convex curve (lying below the diagonal) indicates a decreasing hazard. We annotated $u_{\max}$ (the maximum horizontal deviation), which provides a concise summary of early- vs. late-failure dominance.
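For intuition, the scaled TTT transform can be computed directly for a complete (uncensored) sample. This is a simplified sketch: the analysis described above derives the curve from the KM estimator, whereas the hypothetical `scaled_ttt` helper below assumes no censoring.

```python
def scaled_ttt(sample):
    """Return the points (i/n, TTT_i / TTT_n) of the scaled TTT plot.

    TTT_i = x_(1) + ... + x_(i) + (n - i) * x_(i) is the total time on test
    accumulated up to the i-th ordered failure time.
    """
    xs = sorted(sample)
    n = len(xs)
    total = sum(xs)  # TTT_n, the total observed time
    points = [(0.0, 0.0)]
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, (running + (n - i) * x) / total))
    return points


ttt_points = scaled_ttt([3.0, 1.0, 2.0])
```

Plotting these points against the diagonal of the unit square gives the TTT plot; for the three-point sample above the curve lies above the diagonal.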

2.4. Copula Functions

Copulas provide a general mechanism to build multivariate distributions by “coupling” univariate margins with an explicit dependence structure. They have found widespread application in fields such as finance, hydrology, and biostatistics [17,18]. Formally, a copula is a multivariate distribution function on the unit cube $[0, 1]^d$ with uniform $(0, 1)$ margins.
Sklar’s theorem [27] underpins the entire copula approach. For any $d$-dimensional continuous distribution function, $F$, with univariate margins, $F_1, \ldots, F_d$, there exists a unique copula, $C$, such that
$$F(x_1, \ldots, x_d) = C\bigl(F_1(x_1), \ldots, F_d(x_d)\bigr).$$
Conversely, given any copula, $C$, and univariate margins, $F_i$, the above relation defines a valid joint distribution. In survival analysis, one typically works with survival copulas, linking marginal survival functions, $S_i(t) = 1 - F_i(t)$, into a joint survival,
$$S(t_1, \ldots, t_d) = C\bigl(S_1(t_1), \ldots, S_d(t_d)\bigr).$$

2.4.1. Archimedean Copulas

Archimedean copulas were chosen in this work because they offer a balance between analytical tractability and flexibility. Their one-parameter formulation makes them parsimonious and computationally efficient, which is advantageous for estimation within the IFM framework [17,18]. In addition, Archimedean families such as Clayton and Gumbel can capture clinically relevant tail dependencies: lower-tail for early adverse outcomes (e.g., rapid deterioration) and upper-tail for extreme late events (e.g., prolonged hospitalization followed by death). These features make them particularly suitable for bivariate survival data in medical and epidemiological research [19,21,22]. Other copula families, such as Gaussian, Student-t, or vine copulas, could also be employed to model more complex dependence structures, but they often require higher-dimensional parameterizations and substantially greater computational effort [20,28]. For clarity and parsimony, we focused on Archimedean copulas while acknowledging that extensions to elliptical or vine copulas constitute a relevant direction for future research.

Definition and Generator

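A $d$-dimensional Archimedean copula, $C$, is defined via a continuous, strictly decreasing generator
$$\varphi : [0, 1] \to [0, \infty], \qquad \varphi(1) = 0, \qquad \varphi(0^{+}) = \infty,$$
and its pseudo-inverse $\varphi^{-1}$. The copula is
$$C(u_1, \ldots, u_d) = \varphi^{-1}\bigl(\varphi(u_1) + \cdots + \varphi(u_d)\bigr), \qquad (u_1, \ldots, u_d) \in [0, 1]^d.$$
When the margins are continuous and $\varphi^{-1}$ is $d$ times differentiable, the copula density is given by
$$c(u_1, \ldots, u_d) = (\varphi^{-1})^{(d)}\Bigl(\sum_{i=1}^{d} \varphi(u_i)\Bigr) \prod_{i=1}^{d} \varphi'(u_i).$$
Note that the derivative of the inverse generator can be written explicitly as
$$\frac{d}{dx} \varphi^{-1}(x) = \bigl[\varphi'\bigl(\varphi^{-1}(x)\bigr)\bigr]^{-1},$$
and differentiating once more gives $(\varphi^{-1})''(x) = -\varphi''\bigl(\varphi^{-1}(x)\bigr) / \bigl[\varphi'\bigl(\varphi^{-1}(x)\bigr)\bigr]^{3}$. In the bivariate case $d = 2$, the density therefore reduces to
$$c(u, v) = -\frac{\varphi''\bigl(C(u, v)\bigr)\, \varphi'(u)\, \varphi'(v)}{\bigl[\varphi'\bigl(C(u, v)\bigr)\bigr]^{3}}, \qquad C(u, v) = \varphi^{-1}\bigl(\varphi(u) + \varphi(v)\bigr).$$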
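As a numerical check of the generator-based construction, the sketch below evaluates the bivariate density as the second derivative of the inverse generator at $\varphi(u) + \varphi(v)$, multiplied by $\varphi'(u)\varphi'(v)$, for the Clayton generator $\varphi(t) = (t^{-\theta} - 1)/\theta$, and compares it with a finite-difference mixed partial derivative of the copula itself. The helper names and the step size `h` are illustrative assumptions.

```python
def clayton_cdf(u, v, theta):
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_density_via_generator(u, v, theta):
    """c(u, v) = (phi^{-1})''(phi(u) + phi(v)) * phi'(u) * phi'(v)."""
    def phi(t):
        return (t ** -theta - 1.0) / theta

    def phi_prime(t):
        return -(t ** (-theta - 1.0))

    def inverse_second_derivative(x):
        # phi^{-1}(x) = (1 + theta * x)^(-1/theta), differentiated twice in x.
        return (1.0 + theta) * (1.0 + theta * x) ** (-1.0 / theta - 2.0)

    return inverse_second_derivative(phi(u) + phi(v)) * phi_prime(u) * phi_prime(v)


def density_by_finite_difference(u, v, theta, h=1e-4):
    """Central-difference approximation of the mixed partial of C."""
    return (clayton_cdf(u + h, v + h, theta) - clayton_cdf(u + h, v - h, theta)
            - clayton_cdf(u - h, v + h, theta) + clayton_cdf(u - h, v - h, theta)) / (4.0 * h * h)


generator_value = clayton_density_via_generator(0.3, 0.6, 2.0)
numeric_value = density_by_finite_difference(0.3, 0.6, 2.0)
```

The two values should agree up to the finite-difference error, which supports using the generator route when a closed-form density is not at hand.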

One-Parameter Families

For $d = 2$, three canonical generators and their copulas are as follows:
  • Clayton ($\theta > 0$): The generator and copula are
    $$\varphi(u) = \frac{u^{-\theta} - 1}{\theta}, \qquad C(u, v) = \bigl(u^{-\theta} + v^{-\theta} - 1\bigr)^{-1/\theta}.$$
    This copula exhibits lower-tail dependence $\lambda_L = 2^{-1/\theta}$ and zero upper-tail dependence.
Remark 1.
Some texts omit the division by $\theta$ and define $\varphi(u) = u^{-\theta} - 1$. Both forms are equivalent up to a positive scaling of the generator and lead to the same copula. Here, we adopt the divided-by-$\theta$ convention, which is standard in the copula literature.
  • Gumbel ($\theta \ge 1$): The generator and copula are
    $$\varphi(u) = (-\ln u)^{\theta}, \qquad C(u, v) = \exp\Bigl\{-\bigl[(-\ln u)^{\theta} + (-\ln v)^{\theta}\bigr]^{1/\theta}\Bigr\}.$$
    This copula exhibits upper-tail dependence $\lambda_U = 2 - 2^{1/\theta}$ and zero lower-tail dependence.
  • Frank ($\theta \neq 0$): The generator and copula are
    $$\varphi(u) = -\ln\frac{e^{-\theta u} - 1}{e^{-\theta} - 1}, \qquad C(u, v) = -\frac{1}{\theta} \ln\Bigl[1 + \frac{(e^{-\theta u} - 1)(e^{-\theta v} - 1)}{e^{-\theta} - 1}\Bigr].$$
    This copula exhibits tail symmetry, lacks asymptotic tail dependence, and is flexible enough to capture either negative ($\theta < 0$) or positive ($\theta > 0$) dependence.
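The three families can be written down directly from their closed forms. The following is a minimal sketch with hypothetical function names, assuming arguments in $(0, 1]$ and parameters in the ranges stated for each family.

```python
import math


def clayton_cdf(u, v, theta):
    """Clayton copula, theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def gumbel_cdf(u, v, theta):
    """Gumbel copula, theta >= 1 (theta = 1 gives independence)."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def frank_cdf(u, v, theta):
    """Frank copula, theta != 0; expm1/log1p keep small |theta| stable."""
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta
```

Two quick checks follow from the definition of a copula: each function returns `u` when `v = 1` (uniform margins), and every value lies between `max(u + v - 1, 0)` and `min(u, v)`.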
To investigate the dependence properties of these bivariate copulas, we consider Kendall’s $\tau$, a rank-based measure of concordance. This coefficient ranges from $-1$ (perfect negative dependence) to $1$ (perfect positive dependence), with $0$ indicating independence. Unlike linear correlation, Kendall’s $\tau$ depends only on the underlying copula, making it an appropriate summary of dependence strength.
Let $(X_1, X_2)$ denote a pair of continuous random variables with copula $C(u, v)$. In general, Kendall’s $\tau$ can be expressed as
$$\tau(X_1, X_2) = 4 \int_0^1 \!\! \int_0^1 C(u, v)\, dC(u, v) - 1,$$
where $C(u, v)$ is the copula function. For Archimedean copulas, $\tau$ has a closed-form representation based on the generator function $\varphi(u)$:
$$\tau(\theta) = 1 + 4 \int_0^1 \frac{\varphi(t, \theta)}{\varphi'(t, \theta)}\, dt.$$
Closed-form relations are available for several families, which facilitate interpretation and provide initial parameter estimates:
$$\tau_{\text{Clayton}} = \frac{\theta}{\theta + 2}, \qquad \tau_{\text{Gumbel}} = 1 - \frac{1}{\theta}, \qquad \tau_{\text{Frank}} = 1 - \frac{4}{\theta}\bigl[1 - D_1(\theta)\bigr],$$
where $D_1(\theta) = \frac{1}{\theta} \int_0^{\theta} \frac{t}{e^{t} - 1}\, dt$ denotes the Debye function of order one.
Thus, Kendall’s $\tau$ provides an interpretable measure of the strength and type of dependence, which is particularly useful in our context for summarizing and comparing the dependence captured by different copula families in survival applications.
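The relations between the copula parameter and Kendall’s $\tau$ are easy to evaluate numerically. The sketch below uses the closed forms for Clayton and Gumbel and, for Frank, the standard expression $1 - (4/\theta)[1 - D_1(\theta)]$ with the first-order Debye function approximated by a midpoint rule. The function names and the number of integration steps are illustrative choices.

```python
import math


def tau_clayton(theta):
    return theta / (theta + 2.0)


def tau_gumbel(theta):
    return 1.0 - 1.0 / theta


def debye1(theta, steps=20000):
    """D_1(theta) = (1/theta) * integral_0^theta t / (e^t - 1) dt (midpoint rule)."""
    h = theta / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h  # midpoints never hit the removable singularity at t = 0
        total += t / math.expm1(t)
    return total * h / theta


def tau_frank(theta):
    return 1.0 - 4.0 / theta * (1.0 - debye1(theta))
```

For instance, `tau_clayton(2.0)` and `tau_gumbel(2.0)` both equal 0.5, while the Frank copula needs a parameter of roughly 5.74 to reach the same value, which illustrates why parameters are not comparable across families but $\tau$ is.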

2.4.2. Inference via Inference Functions for Margins (IFM)

The Inference Functions for Margins (IFM) method is a two-step estimation procedure commonly used for copula models. In the first step, the marginal distributions (or marginal survival functions in our context) are estimated using standard techniques (e.g., Cox proportional hazards models). In the second step, the copula parameter is estimated by maximum likelihood, using the pseudo-observations derived from the estimated margins. This approach separates the estimation of marginal and dependence parameters, making it computationally efficient and widely applicable in practice.
Formally, the IFM method [28,29] separates marginal estimation from copula fitting:
  • Marginal step: estimate each univariate distribution, $F_i$, (parametrically or semiparametrically), and compute pseudo-observations
    $$u_{ij} = \hat{F}_i(x_{ij}).$$
  • Copula step: maximize the pseudo-log-likelihood
    $$\ell(\theta) = \sum_{j=1}^{n} \log c\bigl(u_{1j}, \ldots, u_{dj}; \theta\bigr),$$
    to obtain $\hat{\theta}$.
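The copula step can be illustrated with a small, self-contained sketch. It assumes the pseudo-observations are already available (here they are simulated from a Clayton copula by conditional inversion), that there is no censoring, and that the pseudo-log-likelihood is unimodal on the search interval so that a golden-section search applies. All names, the seed, and the search bounds are hypothetical choices.

```python
import math
import random


def clayton_logpdf(u, v, theta):
    s = u ** -theta + v ** -theta - 1.0
    return (math.log1p(theta) - (theta + 1.0) * (math.log(u) + math.log(v))
            - (2.0 + 1.0 / theta) * math.log(s))


def sample_clayton(n, theta, rng):
    """Draw n pairs from a Clayton copula by conditional inversion."""
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1]
        w = 1.0 - rng.random()
        v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


def fit_clayton_theta(pairs, lo=0.01, hi=10.0, tol=1e-6):
    """Maximize the pseudo-log-likelihood over theta by golden-section search."""
    def loglik(theta):
        return sum(clayton_logpdf(u, v, theta) for u, v in pairs)

    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if loglik(c) > loglik(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return 0.5 * (a + b)


pseudo_obs = sample_clayton(1000, 2.0, random.Random(1))
theta_hat = fit_clayton_theta(pseudo_obs)
```

With 1000 simulated pairs the estimate should land close to the generating value of 2. In an applied analysis the pairs would instead come from the fitted marginal models.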
In our framework, the IFM method is applied by first fitting Cox proportional hazards models for each transition intensity to estimate the marginal survival functions and then estimating the copula parameter that links the transition times. This ensures that dependence between transitions is modeled explicitly while retaining the flexibility of semi-parametric marginal estimation.
This two-stage procedure is computationally efficient and leverages existing marginal fits. In practice, the selection of the most appropriate Archimedean copula relies on a combination of statistical and graphical tools. The process usually begins with information criteria, such as AIC or BIC, to identify the generator that provides the best balance between fit and parsimony [18]. This is followed by graphical diagnostics, where contour plots of the empirical and fitted copulas are compared to verify whether both central and tail dependencies are adequately captured [30]. As a final step, formal goodness-of-fit tests such as the Cramér–von Mises and Kolmogorov–Smirnov statistics can be implemented within a parametric bootstrap framework to obtain valid p-values. This approach provides a rigorous assessment of how well the selected copula reproduces the dependence structure in the data [30].

3. Proposed Joint Semi-Parametric Multi-State–Copula Model

Building on the three-state diagram in Figure 2, our goal is to estimate marginal covariate effects on each transition hazard while simultaneously capturing residual dependence between the two observed transitions (hospitalization → death). In what follows, we present a mathematically rigorous development of the model, the full likelihood under arbitrary right-censoring, and the associated inference strategy.

3.1. Hazard-Based Margins

Let $X \in \mathbb{R}^{p}$ denote baseline covariates (e.g., age, biomarkers). We focus on the two successive transitions:
$$1\ (\text{Admission}) \xrightarrow{\;T_{12}\;} 2\ (\text{Complication}) \xrightarrow{\;T_{23}\;} 3\ (\text{Death}),$$
and allow a direct transition
$$1\ (\text{Admission}) \xrightarrow{\;T_{13}\;} 3\ (\text{Death without complication}).$$
For each transition $j \to k$, with $(j, k) \in \{(1, 2), (1, 3), (2, 3)\}$, we posit a Cox proportional hazards model
$$\lambda_{jk}(t \mid X) = \lambda_{0,jk}(t) \exp(\beta_{jk}^{\top} X),$$
where $\lambda_{0,jk}(t)$ is left unspecified, and $\beta_{jk}$ captures the log-hazard ratios for clinical covariates such as disease stage or treatment arm.
In the illness–death framework considered here, patients may follow two possible paths:
  • Diagnosis → Hospitalization → Death;
  • Diagnosis → Death without prior hospitalization.
Accordingly, our model specifies transition intensities for $(1 \to 2)$, $(2 \to 3)$, and $(1 \to 3)$. In the copula-based formulation, individuals who die without being hospitalized contribute information to the $(1 \to 3)$ transition, with the time to hospitalization treated as censored. This ensures that both direct and sequential paths to death are accommodated within the likelihood framework, and the copula links the observed times even when one component is censored.
Maximization of the partial likelihood produces $\hat{\beta}_{jk}$, and the Breslow estimator yields the cumulative baseline hazard
$$\hat{\Lambda}_{0,jk}(t) = \sum_{t_i \le t} \frac{1}{\sum_{\ell \in R_{jk}(t_i)} \exp(\hat{\beta}_{jk}^{\top} X_{\ell})},$$
from which the marginal survival function follows:
$$\hat{S}_{jk}(t \mid X) = \exp\{-\hat{\Lambda}_{0,jk}(t) \exp(\hat{\beta}_{jk}^{\top} X)\}.$$
These margins quantify, for example, how a one-unit increase in a severity score multiplies the instantaneous risk of complication or death, information of direct clinical relevance [5].
The Cox proportional hazards model relies on the proportional hazards (PH) assumption, which states that the hazard ratios associated with covariates remain constant over time. This assumption is standard in multi-state analyses and provides a parsimonious yet flexible way to model marginal transition intensities. If violations of the PH assumption were present, extensions such as stratified Cox models or additive hazards models could be considered [5,10].

3.2. Copula Linkage of Sequential Event Times

To capture residual dependence between the time to hospitalization $T_{12}$ and the subsequent time to death $T_{23}$, we link their marginal survival curves through an Archimedean survival copula $C_{\theta}$ [17,18]. According to Sklar’s theorem, for any $t_1$ and $t_2$,
$$\Pr(T_{12} > t_1, T_{23} > t_2 \mid X) = C_{\theta}\bigl(S_{12}(t_1 \mid X), S_{23}(t_2 \mid X)\bigr).$$
In practice, we replace the true margins, $S_{jk}$, with their Cox–Breslow estimates $\hat{S}_{jk}$, yielding
$$\hat{S}_{\text{joint}}(t_1, t_2 \mid X) = C_{\theta}\bigl(\hat{S}_{12}(t_1 \mid X), \hat{S}_{23}(t_2 \mid X)\bigr).$$
Choosing a one-parameter family (Clayton, Gumbel, or Frank) provides closed-form expressions for $C_{\theta}(u, v)$, its density, $c_{\theta}(u, v)$, and the partial derivative $\partial_1 C_{\theta}(u, v)$ [18].

3.3. Likelihood Under Right-Censoring

Let $C$ denote an independent right-censoring time. For each subject, $i$, we observe
$$Y_{12,i} = \min(T_{12,i}, C_i), \qquad \Delta_{12,i} = \mathbb{1}\{T_{12,i} \le C_i\},$$
and, if $\Delta_{12,i} = 1$, then
$$Y_{23,i} = \min(T_{23,i}, C_i - T_{12,i}), \qquad \Delta_{23,i} = \mathbb{1}\{T_{23,i} \le C_i - T_{12,i}\}.$$
Define the “pseudo-uniforms”:
$$u_{1i} = \hat{S}_{12}(Y_{12,i} \mid X_i), \qquad u_{2i} = \hat{S}_{23}(Y_{23,i} \mid X_i).$$
Let $c_{\theta}(u, v) = \partial^{2} C_{\theta}(u, v) / \partial u\, \partial v$ be the copula density and $\partial_1 C_{\theta}$ the derivative of $C_{\theta}$ with respect to its first argument. Then, the likelihood contribution of subject $i$ is
$$L_i(\theta) = \begin{cases} c_{\theta}(u_{1i}, u_{2i}), & \Delta_{12,i} = 1,\ \Delta_{23,i} = 1, \\ \partial_1 C_{\theta}(u_{1i}, u_{2i}), & \Delta_{12,i} = 1,\ \Delta_{23,i} = 0, \\ C_{\theta}(u_{1i}, u_{2i}), & \Delta_{12,i} = 0\ (\Delta_{23,i} = 0). \end{cases}$$
Thus, the overall copula log-likelihood is
$$\ell_{\text{cop}}(\theta) = \sum_{i=1}^{n} \Bigl[\Delta_{12,i} \Delta_{23,i} \log c_{\theta}(u_{1i}, u_{2i}) + \Delta_{12,i} (1 - \Delta_{23,i}) \log \partial_1 C_{\theta}(u_{1i}, u_{2i}) + (1 - \Delta_{12,i}) \log C_{\theta}(u_{1i}, u_{2i})\Bigr].$$

3.4. Two-Step IFM Estimation and Inference

We adopt the Inference Functions for Margins (IFM) strategy [28,31]:
  • Marginal step. Fit each Cox model for transitions $j \to k$ to obtain $\hat{\beta}_{jk}$ and the estimated survival, $\hat{S}_{jk}(t \mid X)$.
  • Copula step. Compute pseudo-uniforms $(u_{1i}, u_{2i})$, and maximize $\ell_{\text{cop}}(\theta)$ over $\theta$ to get $\hat{\theta}$.
This fully specifies our joint model, as illustrated in Figure 2, and provides interpretable hazard-ratio estimates, together with a parsimonious dependence parameter that reflects unobserved clinical heterogeneity.
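A censoring-aware copula log-likelihood of the kind used in the copula step can be sketched for the Clayton family as follows. The three branches use the density when both transitions are observed, the first partial derivative when only the first is observed, and the copula itself when the first transition is censored. Function names are hypothetical, and the convention of passing `u2 = 1` for subjects censored before the first transition (so that the contribution reduces to `u1`) is an assumption made for this illustration.

```python
import math


def clayton_cdf(u, v, theta):
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_d1(u, v, theta):
    """Partial derivative of the Clayton copula in its first argument."""
    s = u ** -theta + v ** -theta - 1.0
    return u ** (-theta - 1.0) * s ** (-1.0 / theta - 1.0)


def clayton_pdf(u, v, theta):
    s = u ** -theta + v ** -theta - 1.0
    return (1.0 + theta) * (u * v) ** (-theta - 1.0) * s ** (-2.0 - 1.0 / theta)


def censored_copula_loglik(theta, u1, u2, d12, d23):
    """Sum of log-contributions over subjects under right censoring."""
    total = 0.0
    for a, b, e12, e23 in zip(u1, u2, d12, d23):
        if e12 == 1 and e23 == 1:
            total += math.log(clayton_pdf(a, b, theta))   # both times observed
        elif e12 == 1:
            total += math.log(clayton_d1(a, b, theta))    # second time censored
        else:
            total += math.log(clayton_cdf(a, b, theta))   # first time censored
    return total


loglik_value = censored_copula_loglik(
    2.0, [0.5, 0.4, 0.8], [0.6, 0.7, 1.0], [1, 1, 0], [1, 0, 0]
)
```

Maximizing this function over `theta`, for example with a one-dimensional optimizer, would give the dependence estimate for the Clayton case; the Gumbel and Frank versions differ only in the three helper functions.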

4. Simulation Study

To validate our two-step IFM estimator for the joint semi-parametric multi-state–copula model introduced in Section 3, we conducted a Monte Carlo study in which all data were generated under a Clayton copula and then analyzed under four estimation strategies: marginal (independence), correctly specified Clayton, and mis-specified Gumbel and Frank. Two covariates were included to assess inference on their effects.

4.1. Scenario Configuration

We considered a factorial design comprising three factors:
  • Sample size:
    $$n \in \{200, 500, 1000\}.$$
  • Clayton dependence:
    $$\theta \in \{1\ (\tau \approx 0.33),\ 2\ (\tau = 0.50),\ 4\ (\tau \approx 0.67)\},$$
    where $\tau = \theta / (\theta + 2)$ is Kendall’s $\tau$. These values span from moderate to strong lower-tail dependence, reflecting clinical frailty scenarios.
  • Right-censoring rate:
    Uniform censoring $C \sim \text{Uniform}(0, c)$ calibrated to yield approximately 20% and 50% censoring on $T_{12}$.
For each of the 18 scenarios defined by the cross-product of these conditions, we generated 1000 independent replicates. The joint survival probability was evaluated at times corresponding to the quantiles $0.25$, $0.50$, and $0.75$.
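The factorial design can be enumerated programmatically, which also confirms the scenario count. The dictionary keys below are illustrative names.

```python
from itertools import product

sample_sizes = (200, 500, 1000)
clayton_thetas = (1.0, 2.0, 4.0)
censoring_rates = (0.20, 0.50)

# One entry per scenario; Kendall's tau follows from theta / (theta + 2).
scenarios = [
    {"n": n, "theta": theta, "tau": theta / (theta + 2.0), "censoring": cens}
    for n, theta, cens in product(sample_sizes, clayton_thetas, censoring_rates)
]
```

Iterating over `scenarios` and running the replicates for each entry reproduces the structure of the study: 3 sample sizes, 3 dependence levels, and 2 censoring rates.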

4.2. Data-Generating Mechanism

For each individual, we simulated two covariates:
  • $X_1 \sim \text{Uniform}(18, 80)$, representing age in years.
  • $X_2 \sim \text{Bernoulli}(0.5)$, representing a binary treatment indicator (e.g., treatment vs. control).
This simple design with one continuous covariate drawn from a uniform distribution and one binary Bernoulli covariate is common in methodological simulation studies (e.g., [32,33,34,35]). It provides a straightforward yet effective way to evaluate the performance of statistical estimators under controlled conditions. Although real datasets often include a larger number of covariates and more complex distributions, our framework naturally extends to higher-dimensional covariate structures. Exploring such scenarios constitutes a natural avenue for future work.
Given these covariates, we simulated the latent event times, $T_{12}$ (admission to complication) and $T_{23}$ (complication to death), with exponential marginal distributions:
$$U_{jk,i} \sim U(0, 1), \qquad T^{*}_{jk,i} = \frac{-\ln U_{jk,i}}{\lambda^{0}_{jk} \exp(\beta_{jk,1} X_{1,i} + \beta_{jk,2} X_{2,i})},$$
and the baseline hazards and coefficients were fixed as follows:
$$\lambda^{0}_{12} = 0.05,\ \beta_{12} = (0.02, 0.50); \qquad \lambda^{0}_{13} = 0.02,\ \beta_{13} = (0.01, 0.30); \qquad \lambda^{0}_{23} = 0.10,\ \beta_{23} = (0.03, 0.40).$$
The dependence between $T_{12}$ and $T_{23}$ was induced by generating copula-based pairs, $(U_1, U_2) \sim C_{\theta}$, where $C_{\theta}$ denotes the Clayton copula with the specified $\tau$. Right-censoring times, $C$, were drawn from a uniform distribution and applied independently.
The observed event times were constructed as follows: individuals were first observed for the transition from state 1 to 2 (or directly to state 3 if earlier), and then, if a complication occurred, for the transition to death or censoring.
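A simplified version of this data-generating mechanism is sketched below. It covers only the sequential path (transition 1 to 2 followed by 2 to 3) and omits the direct 1 to 3 transition for brevity. The baseline hazards and coefficients are the values listed in the text, while the censoring bound `c_max = 40.0`, the seed, and all function names are hypothetical and not calibrated to the censoring rates of the study.

```python
import math
import random

LAMBDA0 = {"12": 0.05, "23": 0.10}
BETA = {"12": (0.02, 0.50), "23": (0.03, 0.40)}


def simulate_subject(theta, c_max, rng):
    age = rng.uniform(18.0, 80.0)
    trt = 1 if rng.random() < 0.5 else 0
    # Clayton pair (u1, u2) by conditional inversion; both lie in (0, 1].
    u1 = 1.0 - rng.random()
    w = 1.0 - rng.random()
    u2 = (u1 ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    # Exponential margins: T = -ln(U) / rate, so U plays the role of S(T).
    rate12 = LAMBDA0["12"] * math.exp(BETA["12"][0] * age + BETA["12"][1] * trt)
    rate23 = LAMBDA0["23"] * math.exp(BETA["23"][0] * age + BETA["23"][1] * trt)
    t12 = -math.log(u1) / rate12
    t23 = -math.log(u2) / rate23
    cens = rng.uniform(0.0, c_max)
    y12, d12 = min(t12, cens), int(t12 <= cens)
    if d12 == 1:
        y23, d23 = min(t23, cens - t12), int(t23 <= cens - t12)
    else:
        y23, d23 = None, 0  # second transition never observed
    return {"age": age, "trt": trt, "u1": u1, "u2": u2,
            "y12": y12, "d12": d12, "y23": y23, "d23": d23}


def kendall_tau(pairs):
    """Sample Kendall's tau by direct pair counting (O(n^2))."""
    n, score = len(pairs), 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            score += (prod > 0) - (prod < 0)
    return 2.0 * score / (n * (n - 1))


rng = random.Random(2025)
data = [simulate_subject(2.0, 40.0, rng) for _ in range(500)]
tau_hat = kendall_tau([(d["u1"], d["u2"]) for d in data])
```

Because the uniforms act as survival probabilities, the survival copula of the latent times is Clayton, and the sample Kendall’s $\tau$ of the simulated pairs should be near $\theta / (\theta + 2) = 0.5$ for $\theta = 2$.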

4.3. Estimation Procedures

On each simulated dataset, we fitted the following.
  • Marginal Cox (independence): three separate Cox models for transitions $1 \to 2$, $1 \to 3$, and $2 \to 3$, each including $X_1$ and $X_2$. The joint survival was estimated as $\hat{S}_{12}(t) \cdot \hat{S}_{23}(t)$.
  • Clayton copula model: a two-step IFM approach was applied (Section 3.4):
    • Step 1: we fitted the same Cox models to estimate $\hat{S}_{12}(t)$ and $\hat{S}_{23}(t)$.
    • Step 2: pseudo-observations were derived from the marginal estimates, and the copula parameter $\theta$ was estimated via maximum likelihood using the uncensored bivariate observations.
  • Gumbel and Frank copula models: the same two-step IFM procedure was applied, assuming mis-specified dependence structures using Gumbel and Frank copulas, respectively.
The true joint survival function at each time point, $t_p$, was calculated as follows:
$$S(t_p, t_p) = \bigl[S_{12}(t_p)^{-\theta} + S_{23}(t_p)^{-\theta} - 1\bigr]^{-1/\theta},$$
reflecting the generating Clayton copula.
For each method and scenario we computed, over $B = 1000$ Monte Carlo replicates, the following performance measures at time $t$. Let $\hat{S}_b(t)$ denote the estimate from replicate $b$ and $S(t)$ the truth.
  • Bias:
    $$\text{Bias}(t) = \bar{S}(t) - S(t), \qquad \bar{S}(t) = \frac{1}{B} \sum_{b=1}^{B} \hat{S}_b(t).$$
  • Mean squared error (MSE):
    $$\text{MSE}(t) = \frac{1}{B} \sum_{b=1}^{B} \bigl\{\hat{S}_b(t) - S(t)\bigr\}^{2}.$$
  • Coverage: for each replicate, $b$, we formed a 95% confidence interval for $S(t)$ using the replicate-specific standard error $\widehat{\text{SE}}_b(t)$ (model-based within replicate): Greenwood’s formula for KM estimates and a delta-method variance (via the Breslow baseline) for the Cox-based estimator. To respect the $(0, 1)$ range, we built intervals on the complementary log-log scale $g(x) = \log\{-\log(x)\}$ and back-transformed:
    $$\text{CI}_b(t) = g^{-1}\bigl\{g(\hat{S}_b(t)) \pm 1.96\, \widehat{\text{SE}}^{\,g}_b(t)\bigr\},$$
    where $\widehat{\text{SE}}^{\,g}_b(t)$ is the standard error of $g(\hat{S}_b(t))$ (obtained through the delta method from $\widehat{\text{SE}}_b(t)$). The empirical coverage is, then,
    $$\widehat{\text{cov}}(t) = \frac{1}{B} \sum_{b=1}^{B} \mathbb{1}\bigl\{S(t) \in \text{CI}_b(t)\bigr\}.$$
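The three performance measures can be computed from replicate-level estimates and standard errors as sketched below. The delta-method step uses $|g'(x)| = 1 / (x\,|\log x|)$ for $g(x) = \log\{-\log(x)\}$; the function names and the toy inputs are illustrative only.

```python
import math


def cloglog_interval(s_hat, se, z=1.96):
    """Confidence interval for a survival probability on the log(-log) scale."""
    g = math.log(-math.log(s_hat))
    se_g = se / (s_hat * abs(math.log(s_hat)))  # delta method
    # g is decreasing in s, so the upper g-limit maps to the lower S-limit.
    lower = math.exp(-math.exp(g + z * se_g))
    upper = math.exp(-math.exp(g - z * se_g))
    return lower, upper


def performance(estimates, std_errors, truth):
    """Return (bias, MSE, empirical coverage) over Monte Carlo replicates."""
    b = len(estimates)
    bias = sum(estimates) / b - truth
    mse = sum((e - truth) ** 2 for e in estimates) / b
    hits = 0
    for e, se in zip(estimates, std_errors):
        lower, upper = cloglog_interval(e, se)
        hits += int(lower <= truth <= upper)
    return bias, mse, hits / b


# Toy input: two replicates, true survival 0.6.
toy_bias, toy_mse, toy_coverage = performance([0.6, 0.9], [0.05, 0.01], 0.6)
```

In this toy input the first interval contains the truth and the second does not, so the empirical coverage is one half. In the study the same computation is repeated for each method, scenario, and quantile.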

4.4. Results

In this subsection, we present and discuss the findings from our Monte Carlo experiments, which were designed to assess the performance of joint survival quantile estimators under varying dependence levels, sample sizes, and censoring proportions. We focus on three key metrics: average bias, mean squared error (MSE), and the empirical coverage of 95% confidence intervals. To highlight the main patterns, we focus on Figure 3, which displays the empirical coverage, and Figure 4, which reports the mean squared error. Both figures compare, under a representative scenario with moderate dependence, the independence estimator against the copula-based estimators (Clayton, Frank, and Gumbel) for the quantiles $p = 0.25$, $0.50$, and $0.75$.
We then examine how these metrics change as dependence increases ($\tau = 0.33, 0.50, 0.67$), the sample size grows ($n = 200, 500, 1000$), and the censoring level varies (20% and 50%). Full numerical results, including detailed tables of bias, MSE, and coverage for every combination of parameters, are provided in Appendix A (Table A1, Table A2 and Table A3). This structure allows us to concentrate the main discussion on the most salient findings while ensuring the full transparency and reproducibility of the simulation study.
Across all scenarios, the copula-based estimators consistently outperform the product-limit estimator that assumes marginal independence in terms of empirical coverage, mean squared error (MSE), and bias, and their advantage grows as dependence and censoring increase. The key patterns are as follows:
  • Empirical coverage.
    Assuming marginal independence (product-limit estimator) markedly compromises empirical coverage, which falls substantially below the nominal 95% level in all tested scenarios. Even under weak dependence (τ = 0.33), the coverage ranged between 66% and 73%, which is notably poor. Conversely, copula-based estimators (Clayton, Frank, Gumbel) substantially improved coverage, achieving 73–83%, though still below the nominal value. As dependence increased (τ = 0.50 and τ = 0.67), coverage with independence dropped drastically to as low as 40%, whereas copula-based methods, particularly Clayton, achieved near-nominal coverage (92–97%) under strong dependence.
  • Mean squared error (MSE).
    Copula-based estimators consistently yielded lower MSE values than the independence estimator, especially under moderate and strong dependence. For instance, at τ = 0.50 and 20% censoring, the MSE for the quantile p = 0.25 was reduced by approximately 40–50% using copula methods relative to independence. These improvements became more pronounced with increased sample size and reduced censoring. Notably, the Clayton copula consistently provided the lowest MSE values across all tested scenarios.
  • Bias.
    The average bias across all evaluated estimators was consistently negative, indicating a slight but systematic underestimation of the true joint survival quantiles. The absolute bias never exceeded 0.06, and it decreased as the sample size increased and censoring decreased.
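The direction of the independence bias can be reproduced with a small simulation. The sketch below is not the study's simulation code: it assumes exponential margins with unit rates, no censoring, and a gamma-frailty (Marshall-Olkin) sampler for the Clayton copula, and it compares the empirical joint survival with the product of the empirical marginal survivals.

```python
import math
import random


def sample_event_times(theta, rate12, rate23, rng):
    """One pair of Clayton-dependent exponential event times (gamma-frailty construction)."""
    # A shared frailty V ~ Gamma(1/theta, 1) induces the dependence between transitions.
    v = max(rng.gammavariate(1.0 / theta, 1.0), 1e-300)
    pair = []
    for rate in (rate12, rate23):
        e = rng.expovariate(1.0)
        # U = (1 + E/V)^(-1/theta) is Uniform(0, 1). Reading U as a survival
        # probability, the exponential event time is T = -log(U) / rate.
        neg_log_u = math.log1p(e / v) / theta
        pair.append(neg_log_u / rate)
    return pair[0], pair[1]


def joint_survival_comparison(theta, rate12, rate23, t, n, seed=2025):
    """Empirical joint survival, independence product, and analytic Clayton value at t."""
    rng = random.Random(seed)
    both = first = second = 0
    for _ in range(n):
        t12, t23 = sample_event_times(theta, rate12, rate23, rng)
        first += t12 > t
        second += t23 > t
        both += (t12 > t) and (t23 > t)
    empirical = both / n
    independence = (first / n) * (second / n)
    s12, s23 = math.exp(-rate12 * t), math.exp(-rate23 * t)
    analytic = (s12 ** -theta + s23 ** -theta - 1.0) ** (-1.0 / theta)
    return empirical, independence, analytic


theta = 2.0               # Clayton parameter for Kendall's tau = 0.50
t_median = math.log(2.0)  # with unit rates, each margin has S(t) = 0.5 at this time
empirical, independence, analytic = joint_survival_comparison(
    theta, 1.0, 1.0, t_median, n=50_000
)
print(f"empirical={empirical:.3f} independence={independence:.3f} analytic={analytic:.3f}")
```

In this setting the product of the margins sits well below the empirical joint survival, so an interval centred on the product tends to miss the true value, which is consistent with the under-coverage reported for the independence estimator.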

4.5. Discussion

Our findings align with the existing literature on bivariate survival and copula models, which highlights the severe consequences of ignoring dependence structures: biased estimates and incorrect inference [19,36]. The severe under-coverage observed when assuming independence corroborates the assertion in [6] regarding the importance of explicitly modeling dependence to ensure accurate joint survival inferences. Similar bias phenomena under dependent censoring have also been reported in the copula literature [32], where a copula-based approach to survival data with dependent censoring was proposed. Additionally, the superior performance of the Clayton copula aligns with prior studies emphasizing its effectiveness in modeling clinical events exhibiting pronounced positive dependence, such as recurrence times in oncology or paired organ failures [37,38].
Our results also suggest important practical implications. Assuming independence between marginal distributions severely underestimates joint survival probabilities, potentially driving inappropriate clinical decisions and resource misallocation. Incorrectly specifying the copula family also introduces estimation bias, though it is typically less severe. Nonetheless, even when the copula family is misspecified, copula-based estimators consistently outperform the independence assumption, highlighting their robustness and clinical relevance.
Therefore, we strongly recommend employing copula-based inference, particularly the Clayton family, in clinical contexts characterized by considerable positive dependence between event times.

5. Real-Data Application

To illustrate the applicability of the copula-driven multi-state model, we analyzed a large cohort of patients diagnosed with COVID-19 in four Colombian cities between 2021 and 2022. The dataset included sociodemographic, clinical, and vaccination information, as well as records of hospitalization and death. In this section, we first present the baseline characteristics of the study population and the marginal risk estimates obtained using Cox proportional hazards models for each transition within the illness–death framework. We then assess the dependence between hospitalization and death times using several Archimedean copula families, comparing their fit and joint survival estimates. Finally, we discuss the findings in light of the simulation results and the existing literature, emphasizing their clinical and epidemiological implications for the management of COVID-19 in the Colombian context.
The analysis of this multi-city cohort reinforces the necessity of accounting for residual dependence in multi-state survival analyses.
Baseline characteristics (Table 1) revealed a predominantly young population (57.8% aged 18–44) with high vaccination coverage (80.5%). Nevertheless, the hospitalization rate reached 50%, and the overall mortality rate was 2.4%, reflecting the substantial clinical burden of the pandemic even in a vaccinated population.
Cox regression models (Table 2) confirmed the strong protective effect of vaccination across all transitions, with particularly marked reductions in the risk of direct mortality (HR = 0.10) and mortality following hospitalization (HR = 0.04). These findings are consistent with evidence from other large-scale studies highlighting the effectiveness of vaccination in reducing severe outcomes [39,40,41]. Older patients (≥65 years) displayed dramatically higher hazards, up to 47-fold for direct death and 42-fold for death after hospitalization, underscoring their vulnerability. Male sex and comorbidities further amplified risk, in line with international literature on COVID-19 risk factors [42].
For each transition, we assessed the proportional-hazards (PH) assumption using Schoenfeld residual plots with LOESS smoothing and global PH tests (Figure A1, Figure A2 and Figure A3). To check that the marginal Cox models reproduce the observed survival, we contrasted the population-averaged survival implied by the Cox fits with the nonparametric Kaplan–Meier estimator and its 95% Greenwood band (Figure A4). We further compared transition-specific hazard rate functions (HRFs) on the log scale: the Cox hazard $h_{\mathrm{Cox}}(t)$ was obtained by differentiating the Breslow cumulative baseline hazard (and averaging over strata), and it was contrasted with a kernel-based nonparametric estimator (muhaz) (Figure A5). Finally, total time on test (TTT) curves were used to diagnose the qualitative shape of the hazard over follow-up (Figure A6): curves below the 45° line indicate a decreasing hazard, curves above indicate an increasing hazard, and proximity to the diagonal suggests an approximately constant hazard. We report $u_{\max}$, the proportion failed at the point of maximum vertical deviation from the diagonal, as a simple summary of departure from constancy.
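The TTT diagnostic can be illustrated with a short sketch. It assumes complete (uncensored) observations, which is simpler than the censored cohort analysed here, and it uses simulated Weibull samples as stand-ins for decreasing and increasing hazards; the function names are ours.

```python
import random


def scaled_ttt(times):
    """Scaled total-time-on-test transform for complete (uncensored) data.

    Returns points (u_i, phi_i) with u_i = i / n and
    phi_i = [sum_{j <= i} x_(j) + (n - i) * x_(i)] / sum_j x_(j).
    """
    x = sorted(times)
    n = len(x)
    total = sum(x)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, xi in enumerate(x, start=1):
        running += xi
        points.append((i / n, (running + (n - i) * xi) / total))
    return points


def max_deviation(points):
    """Proportion failed (u_max) and signed deviation at the largest gap from the diagonal."""
    u, phi = max(points, key=lambda p: abs(p[1] - p[0]))
    return u, phi - u


rng = random.Random(7)
# random.weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
decreasing_hazard = [rng.weibullvariate(1.0, 0.5) for _ in range(5000)]
increasing_hazard = [rng.weibullvariate(1.0, 2.0) for _ in range(5000)]

u_dec, dev_dec = max_deviation(scaled_ttt(decreasing_hazard))
u_inc, dev_inc = max_deviation(scaled_ttt(increasing_hazard))
print(f"decreasing hazard: u_max={u_dec:.3f}, deviation={dev_dec:.3f}")
print(f"increasing hazard: u_max={u_inc:.3f}, deviation={dev_inc:.3f}")
```

A negative deviation places the curve below the diagonal (decreasing hazard) and a positive deviation places it above (increasing hazard), matching the reading rule stated above.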
For the joint model, candidate copulas were compared using log-likelihood, AIC, BIC, and CAIC, and were subjected to multiplier-bootstrap goodness-of-fit tests based on Kolmogorov–Smirnov (KS), Cramér–von Mises (CvM), and Anderson–Darling (AD) statistics, computed on rank pseudo-observations; margins were estimated semiparametrically (Table 3 and Table 4).
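Two ingredients of this comparison are easy to sketch: rank-based pseudo-observations and the three information criteria. The snippet assumes untied, uncensored observations and uses the usual definitions AIC = −2ℓ + 2k, BIC = −2ℓ + k log n, and CAIC = −2ℓ + k(log n + 1); the log-likelihood value passed at the end is a made-up placeholder, not a result from Table 3.

```python
import math


def pseudo_observations(values):
    """Rank-based pseudo-observations rank / (n + 1), assuming no ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    pseudo = [0.0] * n
    for rank, index in enumerate(order, start=1):
        pseudo[index] = rank / (n + 1)
    return pseudo


def information_criteria(loglik, n_params, n_obs):
    """AIC, BIC, and CAIC for a fitted model; smaller values indicate a better trade-off."""
    deviance = -2.0 * loglik
    return {
        "AIC": deviance + 2.0 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "CAIC": deviance + n_params * (math.log(n_obs) + 1.0),
    }


u = pseudo_observations([3.2, 1.5, 2.7])
# Hypothetical pseudo-log-likelihood for a one-parameter copula fitted to 100 pairs.
criteria = information_criteria(loglik=-100.0, n_params=1, n_obs=100)
print(u, criteria)
```

Since all candidate families here have a single parameter, the three criteria rank them in the same order as the log-likelihood; they differ only in how strongly extra parameters would be penalized.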
Proportional hazards (PHs) diagnostics indicated no substantial deviations from the proportional hazards assumption (global tests: Diagnosis → Hospitalization p = 0.370; Hospitalization → Death p = 0.178; and Diagnosis → Death p = 0.097; Figure A1, Figure A2 and Figure A3). Kaplan–Meier curves and the population-averaged Cox survival were nearly indistinguishable, and the Cox curves lay within the KM 95% band across transitions (Appendix A, Figure A4). Hazard-rate comparisons were consistent (Figure A5): Diagnosis → Hospitalization displayed an early peak, followed by a monotone decline; Hospitalization → Death decreased steadily; Diagnosis → Death remained low with a slight downward trend. TTT plots corroborated these patterns (Figure A6), indicating predominantly decreasing hazards with the largest deviation from constancy for Diagnosis → Hospitalization ($u_{\max}$ = 36.9%) and small deviations for Diagnosis → Death (0.5%) and Hospitalization → Death (2.2%). These checks support the use of Cox PH models for the marginal transition intensities in this application.
The copula selection analysis (Table 3) indicated that the Gumbel copula provided the best overall fit. Selection was based on multiple criteria: log-likelihood, Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), and Consistent AIC (CAIC), with Gumbel achieving the largest log-likelihood and the smallest information criteria. The implied Kendall’s τ was 0.72, indicating strong upper-tail dependence between hospitalization and death. These findings are consistent with prior copula-based survival analyses in medical settings where Gumbel effectively captures upper-tail dependence [24,25].
We complemented model selection with rank-based goodness-of-fit tests for the empirical copula using the multiplier bootstrap (B = 5000). Table 4 reports p-values for the Kolmogorov–Smirnov (KS), Cramér–von Mises (CvM), and Anderson–Darling (AD) statistics. Only the Gumbel copula is not rejected (p = 0.63, 0.60, and 0.65, respectively), whereas Clayton and Frank are rejected (p < 0.01).
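For orientation, the reported τ can be translated into the Gumbel parameter and the corresponding upper-tail dependence coefficient using the standard relations θ = 1/(1 − τ) and λ_U = 2 − 2^{1/θ}. The sketch below only applies these textbook formulas to τ = 0.72; the fitted parameter itself is reported in Table 3, and the function names are illustrative.

```python
def gumbel_theta_from_tau(tau: float) -> float:
    """Gumbel parameter implied by Kendall's tau: theta = 1 / (1 - tau)."""
    if not 0.0 <= tau < 1.0:
        raise ValueError("tau must lie in [0, 1) for the Gumbel family")
    return 1.0 / (1.0 - tau)


def gumbel_upper_tail_dependence(theta: float) -> float:
    """Upper-tail dependence coefficient of the Gumbel copula: 2 - 2^(1/theta)."""
    if theta < 1.0:
        raise ValueError("theta must be at least 1")
    return 2.0 - 2.0 ** (1.0 / theta)


theta_gumbel = gumbel_theta_from_tau(0.72)
lambda_upper = gumbel_upper_tail_dependence(theta_gumbel)
print(f"theta={theta_gumbel:.3f} lambda_U={lambda_upper:.3f}")
```

At θ = 1 the coefficient is zero (independence), and it approaches one as θ grows, so a value near 0.79 indicates that joint extremes are far more frequent than independence would imply.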
Beyond the covariate effects, the diagnostic suite (PH checks, KM–Cox agreement, HRF, and TTT) indicates that our Cox components provide a reliable marginal description of each transition. This justifies using them as the margins in the copula framework.
The Kaplan–Meier survival curves (Figure 5) illustrated distinct survival patterns for each transition, while the joint survival estimates (Figure 6) revealed a critical finding. Relative to the copula-based estimate, the independence curve lies uniformly lower, indicating systematic underestimation of joint survival when dependence is ignored. The shaded region highlights the pointwise gap between the two curves, quantifying how much joint survival would be understated under the independence assumption across follow-up. This underestimation occurs because neglecting dependence fails to capture the compounding risk when hospitalization and death are correlated. As demonstrated in our simulations, the independence assumption yielded empirical coverage rates as low as 40% under strong dependence, compared with 92–97% when copula models were applied. These results align with previous methodological studies that highlight the risks of disregarding dependence in multi-state data [6,19].
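The pointwise gap between the two curves can be mimicked numerically. The sketch below is illustrative only: it applies a Gumbel copula with θ = 1/(1 − 0.72) to two exponential marginal survival functions whose rates are hypothetical, rather than to the fitted Cox margins, and it assumes the copula acts on the marginal survival functions as in the simulation formula.

```python
import math


def gumbel_copula(u, v, theta):
    """Gumbel copula C(u, v) = exp(-[(-ln u)^theta + (-ln v)^theta]^(1/theta)), theta >= 1."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    a = abs(math.log(u)) ** theta + abs(math.log(v)) ** theta
    return math.exp(-(a ** (1.0 / theta)))


def survival_gap(theta, rate_hosp, rate_death, times):
    """Rows (t, copula joint survival, independence product, gap) on a time grid."""
    rows = []
    for t in times:
        s_hosp = math.exp(-rate_hosp * t)    # hypothetical exponential margin
        s_death = math.exp(-rate_death * t)  # hypothetical exponential margin
        joint = gumbel_copula(s_hosp, s_death, theta)
        product = s_hosp * s_death
        rows.append((t, joint, product, joint - product))
    return rows


theta = 1.0 / (1.0 - 0.72)  # Gumbel parameter implied by Kendall's tau = 0.72
rows = survival_gap(theta, rate_hosp=0.05, rate_death=0.01, times=[0, 5, 10, 20, 30])
for t, joint, product, gap in rows:
    print(f"t={t:>2} joint={joint:.4f} independence={product:.4f} gap={gap:.4f}")
```

The gap is zero at the origin and non-negative afterwards, which is the qualitative pattern represented by the shaded region between the two curves.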
From a clinical perspective, this misestimation is particularly concerning. Underestimating joint survival implies that clinicians and policymakers may underestimate the likelihood of patients experiencing the combined burden of hospitalization and death, potentially leading to inadequate risk stratification, delayed interventions, or misallocation of healthcare resources. As emphasized by [17,43], such underestimation can substantially distort clinical decision-making and ultimately compromise patient outcomes. In our cohort, the copula-based estimates, particularly those from the Gumbel copula, provided a more accurate representation of survival by capturing the strong positive dependence between transitions and yielding more reliable evidence to guide clinical and epidemiological planning.
In summary, these findings confirm the simulation results and reinforce a critical message: assuming independence in the presence of dependence systematically underestimates joint survival, which can have serious consequences for patient management and health policy. Copula-based multi-state models offer a robust framework to overcome this limitation and should be considered a methodological standard in contexts where sequential clinical events are strongly correlated.

6. Conclusions

This work introduced a bivariate copula–driven multi-state model that extends the conventional illness–death framework by explicitly modeling the dependence between sequential event times. Methodologically, we formulate a joint semiparametric likelihood using Inference Functions for Margins (IFM): Cox proportional hazards models provide covariate-adjusted marginal transition intensities, while an Archimedean copula encodes the dependence structure. The construction is flexible yet tractable, allowing estimation of both marginal and joint survival functions under right-censoring.
Extensive simulation experiments showed that ignoring dependence, that is, treating the transitions as independent, systematically underestimates joint survival and can lead to severe coverage losses when dependence is moderate to strong. In contrast, copula-based estimators, particularly those from the Gumbel and Clayton families, achieved near-nominal coverage and lower mean squared error, confirming the theoretical robustness of the proposed framework even under partial copula misspecification.
The large Colombian COVID-19 cohort further validated the approach. First, model-fit diagnostics supported the adequacy of the Cox margins: global PH tests showed no material violations; population-averaged Cox survival closely tracked Kaplan–Meier with Greenwood bands; hazard-rate functions (Cox vs. kernel) displayed the expected shapes; and TTT plots indicated predominantly decreasing hazards with the largest deviation from constancy for Diagnosis → Hospitalization. Second, copula selection favored Gumbel by pseudo-log-likelihood and information criteria (AIC/BIC/CAIC), and rank-based bootstrap GOF tests (AD/KS/CvM) failed to reject Gumbel while rejecting Clayton and Frank. Substantively, the estimated upper-tail dependence between hospitalization and death was strong (Kendall’s τ ≈ 0.72), consistent with clinical intuition about severe disease progression.
These results have practical implications. Assuming independence when transitions are correlated can underestimate the joint burden of hospitalization and death, potentially distorting risk stratification, timing of interventions, and resource planning. By combining covariate-adjusted Cox margins with a well-supported copula, our framework yields more reliable joint survival estimates, offering a principled alternative to independence-based approaches in clinical and epidemiological studies.

7. Limitations and Future Work

This study has several limitations that also suggest natural extensions. First, the dependence structure was restricted to one-parameter Archimedean copulas. This choice provides parsimony and interpretability, but it imposes a single, global form of dependence and emphasizes either the lower or the upper tail. Second, the marginal transition intensities were modeled under the proportional-hazards (PHs) assumption via Cox models. Although our Schoenfeld and TTT/HRF diagnostics supported PH for this application, the assumption can be violated in other settings. Third, we estimated the joint model using Inference Functions for Margins (IFM). IFM is consistent and computationally attractive, but it ignores some cross-equation curvature; deriving finite-sample properties and standard-error formulas under censoring remains an open problem. Finally, the population-averaged HRF was obtained by differentiating the Breslow baseline; uncertainty from this transformation was not propagated into the HRF plots.
Several avenues can strengthen and broaden the framework:
  • Margins. Replace/compare Cox with parametric hazards (e.g., alpha-power, piecewise exponential, or Royston–Parmar) and, for death transitions showing plateaus, mixture-cure models; allow time-varying or stratified effects when PH is doubtful.
  • Dependence. Enrich the copula layer via nested Archimedean or vine copulas, or covariate/time-varying copula parameters to capture evolving dependence.
  • Estimation. Move beyond IFM to full-likelihood or Bayesian joint estimation and carry uncertainty into HRF summaries; study finite-sample properties under censoring.
These steps are compatible with the present framework. For this dataset, the most immediately impactful additions would be alpha-power parametric margins and cure-fraction models for the death transitions.

Author Contributions

Conceptualization, H.B., G.M.-F. and R.T.-F.; methodology, H.B.; formal analysis, H.B.; investigation, H.B., G.M.-F. and R.T.-F.; data curation, H.B., G.M.-F. and R.T.-F.; writing—original draft preparation, H.B.; writing—review and editing, H.B.; project administration, G.M.-F. and R.T.-F.; funding acquisition, G.M.-F. and R.T.-F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Vice-rectorate for Research of the Universidad de Córdoba, Colombia, project grant FCB-03-23: “Aplicación de Metodologías Estadísticas a Datos de Vigilancia en Salud Pública en Colombia” (R.T.-F. and G.M.-F.).

Data Availability Statement

The data and codes used in this study are available upon request from the corresponding author. The data are not publicly available due to privacy restrictions.

Acknowledgments

This article was developed as a research product during the probationary period of the first author (H.B.) at the Universidad de Sucre. The authors also acknowledge the institutional support of the Universidad de Sucre and the Universidad de Córdoba, which provided the academic environment and resources that contributed to the completion of this work. The authors are also grateful to the anonymous reviewers for their careful reading and valuable suggestions, which helped improve the quality of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
IFM: Inference Functions for Margins
MSE: Mean Squared Error
AIC: Akaike Information Criterion
BIC: Bayesian Information Criterion
CAIC: Consistent AIC
HRF: Hazard-Rate Functions
TTT: Total Time on Test
PHs: Proportional Hazards

Appendix A. Simulation Results and Marginal Fit Diagnostics

Appendix A.1. Detailed Simulation Tables

The following tables, Table A1, Table A2 and Table A3, provide a complete numerical account of our Monte Carlo study. For each copula family (Clayton, Frank, Gumbel) and the independence benchmark, they list the average bias, mean squared error (MSE), and empirical coverage of 95% confidence intervals for the joint survival quantiles p = 0.25, 0.50, and 0.75 under three dependence levels (τ = 0.33, 0.50, 0.67), across varying sample sizes and censoring proportions. These detailed results support the summary figures in the main text and allow readers to verify the robustness of each estimator under all considered scenarios.
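Table A1. Simulation performance metrics (Bias, MSE, and Coverage in %) for joint survival quantile estimators at p = 0.25, 0.50, and 0.75 under dependence τ = 0.33.
| n | Cens. | Method | Bias ($t_{0.25}$) | MSE ($t_{0.25}$) | Cover ($t_{0.25}$) | Bias ($t_{0.50}$) | MSE ($t_{0.50}$) | Cover ($t_{0.50}$) | Bias ($t_{0.75}$) | MSE ($t_{0.75}$) | Cover ($t_{0.75}$) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | 20% | Indepen. | −0.0315 | 0.0470 | 69.6248 | −0.0301 | 0.0472 | 70.3150 | −0.0296 | 0.0421 | 68.7631 |
| 200 | 20% | Clayton | −0.0123 | 0.0242 | 77.7591 | −0.0099 | 0.0263 | 79.8654 | −0.0090 | 0.0255 | 80.0428 |
| 200 | 20% | Frank | −0.0153 | 0.0299 | 75.0420 | −0.0148 | 0.0297 | 76.8876 | −0.0108 | 0.031 | 73.4736 |
| 200 | 20% | Gumbel | −0.015 | 0.0268 | 73.2808 | −0.0136 | 0.0309 | 72.4536 | −0.0110 | 0.0296 | 71.9241 |
| 200 | 50% | Indepen. | −0.0314 | 0.0402 | 66.3335 | −0.0302 | 0.0412 | 65.2388 | −0.0294 | 0.0423 | 64.6366 |
| 200 | 50% | Clayton | −0.0125 | 0.0253 | 75.8529 | −0.0130 | 0.0254 | 78.0613 | −0.0120 | 0.0257 | 75.8581 |
| 200 | 50% | Frank | −0.0171 | 0.0284 | 69.5521 | −0.0177 | 0.0301 | 70.8179 | −0.0177 | 0.0297 | 71.0670 |
| 200 | 50% | Gumbel | −0.0146 | 0.0300 | 68.6262 | −0.0151 | 0.0295 | 69.6863 | −0.0154 | 0.0331 | 70.0467 |
| 500 | 20% | Indepen. | −0.0285 | 0.0413 | 69.0889 | −0.0298 | 0.0409 | 72.9304 | −0.0293 | 0.0419 | 70.7275 |
| 500 | 20% | Clayton | −0.0094 | 0.0228 | 80.7339 | −0.0114 | 0.0236 | 80.7310 | −0.0100 | 0.0237 | 80.4574 |
| 500 | 20% | Frank | −0.0136 | 0.0269 | 77.6692 | −0.0173 | 0.0271 | 76.2826 | −0.0142 | 0.0268 | 76.4679 |
| 500 | 20% | Gumbel | −0.0119 | 0.0281 | 72.8579 | −0.0154 | 0.03 | 71.8125 | −0.0129 | 0.0302 | 73.9711 |
| 500 | 50% | Indepen. | −0.0301 | 0.0404 | 66.9640 | −0.0299 | 0.0405 | 67.2321 | −0.031 | 0.0401 | 67.3374 |
| 500 | 50% | Clayton | −0.0099 | 0.0229 | 77.1285 | −0.0097 | 0.0237 | 80.3641 | −0.0105 | 0.0232 | 77.6133 |
| 500 | 50% | Frank | −0.0142 | 0.0270 | 74.0208 | −0.0143 | 0.0292 | 73.5759 | −0.0171 | 0.0276 | 72.7427 |
| 500 | 50% | Gumbel | −0.0139 | 0.0283 | 71.8653 | −0.0131 | 0.0301 | 70.8162 | −0.0140 | 0.0284 | 71.3511 |
| 1000 | 20% | Indepen. | −0.0293 | 0.0402 | 71.8346 | −0.0292 | 0.0406 | 72.3745 | −0.0292 | 0.0407 | 73.5831 |
| 1000 | 20% | Clayton | −0.0093 | 0.0210 | 82.7723 | −0.0061 | 0.0208 | 80.6443 | −0.0081 | 0.0203 | 83.1196 |
| 1000 | 20% | Frank | −0.0130 | 0.0241 | 75.7694 | −0.0097 | 0.0240 | 77.8382 | −0.0108 | 0.0242 | 79.1307 |
| 1000 | 20% | Gumbel | −0.0134 | 0.0241 | 76.0206 | −0.0089 | 0.0251 | 75.3124 | −0.0118 | 0.0265 | 76.0794 |
| 1000 | 50% | Indepen. | −0.0301 | 0.0416 | 69.6842 | −0.0314 | 0.0406 | 69.8166 | −0.0281 | 0.0376 | 70.5177 |
| 1000 | 50% | Clayton | −0.0090 | 0.0204 | 80.4389 | −0.0088 | 0.0211 | 79.4383 | −0.0082 | 0.0201 | 81.4859 |
| 1000 | 50% | Frank | −0.0124 | 0.0237 | 76.7811 | −0.0141 | 0.0247 | 75.4459 | −0.0118 | 0.0240 | 75.8386 |
| 1000 | 50% | Gumbel | −0.0106 | 0.0254 | 73.0186 | −0.0122 | 0.0243 | 71.3825 | −0.0113 | 0.0250 | 73.3120 |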
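Table A2. Simulation performance metrics (Bias, MSE, and Coverage in %) for joint survival quantile estimators at p = 0.25, 0.50, and 0.75 under dependence τ = 0.5.
| n | Cens. | Method | Bias ($t_{0.25}$) | MSE ($t_{0.25}$) | Cover ($t_{0.25}$) | Bias ($t_{0.50}$) | MSE ($t_{0.50}$) | Cover ($t_{0.50}$) | Bias ($t_{0.75}$) | MSE ($t_{0.75}$) | Cover ($t_{0.75}$) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | 20% | Indepen. | −0.041 | 0.0531 | 58.543 | −0.0389 | 0.0507 | 59.8173 | −0.0402 | 0.0512 | 59.2404 |
| 200 | 20% | Clayton | −0.0082 | 0.0255 | 84.9564 | −0.0083 | 0.023 | 82.8707 | −0.0078 | 0.0226 | 83.4459 |
| 200 | 20% | Frank | −0.01 | 0.0287 | 76.1301 | −0.0105 | 0.0278 | 75.6787 | −0.011 | 0.0268 | 74.2828 |
| 200 | 20% | Gumbel | −0.0125 | 0.0305 | 72.3124 | −0.0098 | 0.0283 | 74.0408 | −0.0108 | 0.0281 | 72.6693 |
| 200 | 50% | Indepen. | −0.039 | 0.0513 | 58.0232 | −0.0392 | 0.0517 | 56.3771 | −0.0413 | 0.0539 | 54.9623 |
| 200 | 50% | Clayton | −0.008 | 0.024 | 80.2138 | −0.0079 | 0.0241 | 82.1695 | −0.0071 | 0.0246 | 82.4536 |
| 200 | 50% | Frank | −0.0131 | 0.0279 | 73.4811 | −0.0126 | 0.027 | 72.3782 | −0.0109 | 0.0298 | 71.1431 |
| 200 | 50% | Gumbel | −0.0131 | 0.0282 | 70.1113 | −0.0092 | 0.0284 | 70.2578 | −0.0089 | 0.0293 | 70.1408 |
| 500 | 20% | Indepen. | −0.0432 | 0.0488 | 59.7813 | −0.0393 | 0.0481 | 60.9558 | −0.0394 | 0.0506 | 60.0366 |
| 500 | 20% | Clayton | −0.0078 | 0.0217 | 86.4482 | −0.0073 | 0.022 | 85.6575 | −0.0081 | 0.0217 | 85.724 |
| 500 | 20% | Frank | −0.0119 | 0.0256 | 77.4783 | −0.0119 | 0.0288 | 76.5898 | −0.0118 | 0.0243 | 78.4892 |
| 500 | 20% | Gumbel | −0.0106 | 0.0264 | 75.4871 | −0.0081 | 0.0269 | 74.6786 | −0.0105 | 0.026 | 74.4679 |
| 500 | 50% | Indepen. | −0.04 | 0.0495 | 57.3644 | −0.0414 | 0.0507 | 57.0399 | −0.0404 | 0.0497 | 58.3495 |
| 500 | 50% | Clayton | −0.0071 | 0.0213 | 83.24 | −0.0070 | 0.0214 | 82.423 | −0.0082 | 0.0236 | 81.9138 |
| 500 | 50% | Frank | −0.0107 | 0.0297 | 73.6307 | −0.0124 | 0.0255 | 73.8452 | −0.0121 | 0.0274 | 74.9103 |
| 500 | 50% | Gumbel | −0.0109 | 0.0261 | 73.4743 | −0.0109 | 0.0248 | 74.1702 | −0.0109 | 0.0292 | 73.2449 |
| 1000 | 20% | Indepen. | −0.0416 | 0.0436 | 62.851 | −0.0398 | 0.0438 | 63.9631 | −0.0404 | 0.0429 | 60.7854 |
| 1000 | 20% | Clayton | −0.0059 | 0.0185 | 88.4837 | −0.0079 | 0.0180 | 85.6872 | −0.0069 | 0.0180 | 87.0821 |
| 1000 | 20% | Frank | −0.0094 | 0.0221 | 80.2484 | −0.0118 | 0.0237 | 78.9311 | −0.0118 | 0.0241 | 77.9725 |
| 1000 | 20% | Gumbel | −0.0076 | 0.0244 | 77.6773 | −0.0109 | 0.0227 | 79.1138 | −0.0111 | 0.0211 | 79.9683 |
| 1000 | 50% | Indepen. | −0.0393 | 0.0439 | 60.6706 | −0.0406 | 0.0456 | 59.8494 | −0.0411 | 0.044 | 59.1562 |
| 1000 | 50% | Clayton | −0.0064 | 0.0186 | 85.1172 | −0.0056 | 0.0207 | 85.8221 | −0.0065 | 0.0204 | 83.6295 |
| 1000 | 50% | Frank | −0.0097 | 0.0227 | 77.5161 | −0.0094 | 0.0248 | 77.1136 | −0.0101 | 0.0234 | 77.8659 |
| 1000 | 50% | Gumbel | −0.0098 | 0.0253 | 75.4597 | −0.0109 | 0.0245 | 74.7300 | −0.0091 | 0.0241 | 74.7189 |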
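Table A3. Simulation performance metrics (Bias, MSE, and Coverage in %) for joint survival quantile estimators at p = 0.25, 0.50, and 0.75 under dependence τ = 0.67.
| n | Cens. | Method | Bias ($t_{0.25}$) | MSE ($t_{0.25}$) | Cover ($t_{0.25}$) | Bias ($t_{0.50}$) | MSE ($t_{0.50}$) | Cover ($t_{0.50}$) | Bias ($t_{0.75}$) | MSE ($t_{0.75}$) | Cover ($t_{0.75}$) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | 20% | Indepen. | −0.0604 | 0.0626 | 47.3231 | −0.0619 | 0.0612 | 44.7579 | −0.0597 | 0.058 | 43.9796 |
| 200 | 20% | Clayton | −0.0063 | 0.0224 | 94.1316 | −0.0052 | 0.0222 | 95.4655 | −0.0059 | 0.0204 | 93.9587 |
| 200 | 20% | Frank | −0.0117 | 0.0274 | 80.4285 | −0.0092 | 0.0246 | 79.5198 | −0.0094 | 0.0241 | 78.8466 |
| 200 | 20% | Gumbel | −0.0089 | 0.0281 | 78.5178 | −0.008 | 0.0268 | 73.868 | −0.0079 | 0.0274 | 75.8526 |
| 200 | 50% | Indepen. | −0.0593 | 0.0614 | 40.7743 | −0.0611 | 0.0612 | 39.9696 | −0.0604 | 0.0612 | 38.7751 |
| 200 | 50% | Clayton | −0.0055 | 0.024 | 91.4467 | −0.0047 | 0.0216 | 89.7386 | −0.0057 | 0.0216 | 89.8389 |
| 200 | 50% | Frank | −0.0101 | 0.0277 | 76.1474 | −0.0093 | 0.0259 | 76.0295 | −0.0079 | 0.0257 | 78.6842 |
| 200 | 50% | Gumbel | −0.0084 | 0.0298 | 72.7686 | −0.0087 | 0.0274 | 72.1694 | −0.0072 | 0.0279 | 72.9548 |
| 500 | 20% | Indepen. | −0.061 | 0.0548 | 40.8398 | −0.0592 | 0.0569 | 41.5215 | −0.057 | 0.0564 | 42.4796 |
| 500 | 20% | Clayton | −0.0037 | 0.0204 | 96.585 | −0.0055 | 0.0193 | 94.9181 | −0.0034 | 0.02 | 95.6602 |
| 500 | 20% | Frank | −0.0074 | 0.0241 | 81.6106 | −0.0091 | 0.0216 | 82.2602 | −0.0073 | 0.0238 | 83.26 |
| 500 | 20% | Gumbel | −0.0065 | 0.026 | 79.856 | −0.0066 | 0.025 | 78.0756 | −0.0064 | 0.0249 | 79.1335 |
| 500 | 50% | Indepen. | −0.0597 | 0.0566 | 39.9407 | −0.0607 | 0.0572 | 38.713 | −0.0605 | 0.0592 | 38.133 |
| 500 | 50% | Clayton | −0.003 | 0.0218 | 93.2805 | −0.0026 | 0.0219 | 93.9646 | −0.0053 | 0.0202 | 92.655 |
| 500 | 50% | Frank | −0.0074 | 0.0262 | 78.8456 | −0.0072 | 0.0245 | 77.3108 | −0.0105 | 0.0235 | 77.0498 |
| 500 | 50% | Gumbel | −0.0054 | 0.0256 | 73.4647 | −0.0069 | 0.026 | 73.187 | −0.0098 | 0.0235 | 73.3557 |
| 1000 | 20% | Indepen. | −0.0593 | 0.0532 | 42.851 | −0.0604 | 0.0522 | 43.2692 | −0.0613 | 0.0513 | 43.8451 |
| 1000 | 20% | Clayton | −0.0023 | 0.0173 | 96.7774 | −0.0021 | 0.0167 | 94.5497 | −0.0005 | 0.017 | 93.7262 |
| 1000 | 20% | Frank | −0.0062 | 0.0206 | 84.083 | −0.0063 | 0.0209 | 84.0673 | −0.005 | 0.022 | 83.864 |
| 1000 | 20% | Gumbel | −0.0061 | 0.0232 | 78.5108 | −0.0069 | 0.0215 | 80.4637 | −0.004 | 0.0202 | 80.2535 |
| 1000 | 50% | Indepen. | −0.0597 | 0.0512 | 42.379 | −0.0624 | 0.0518 | 40.1266 | −0.0585 | 0.0544 | 40.7119 |
| 1000 | 50% | Clayton | −0.002 | 0.0174 | 92.5743 | −0.0013 | 0.0162 | 91.6146 | −0.0025 | 0.017 | 90.2778 |
| 1000 | 50% | Frank | −0.0051 | 0.0219 | 80.9206 | −0.0054 | 0.0219 | 81.777 | −0.0072 | 0.0212 | 79.9565 |
| 1000 | 50% | Gumbel | −0.0046 | 0.0234 | 77.8571 | −0.0048 | 0.0205 | 77.2225 | −0.0054 | 0.0209 | 76.9871 |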

Appendix A.2. Marginal Fits from the Application

The results presented in this appendix display the proportional-hazards diagnostics for each transition of the illness–death process. Schoenfeld residual plots with LOESS smoothing and global tests are reported, allowing a visual and statistical assessment of whether the proportional hazards assumption holds across covariates. These results confirm the adequacy of the Cox proportional hazards models as marginal specifications within the copula-based framework.
Figure A1. Proportional-hazards diagnostics for the diagnosis–hospitalization transition. Schoenfeld residuals by covariate with LOESS smooth (blue) and 95% pointwise confidence band (gray); the horizontal dashed line marks zero. Panel labels report covariate-specific p-values for proportionality. Global proportional-hazards test: p = 0.370.
Figure A2. Proportional-hazards diagnostics for the hospitalization-death transition. Schoenfeld residuals by covariate with LOESS smooth (blue) and 95% pointwise confidence band (gray); the horizontal dashed line marks zero. Panel labels report covariate-specific p-values for proportionality. Global proportional-hazards test: p = 0.178.
Figure A3. Proportional-hazards diagnostics for the diagnosis–death transition. Schoenfeld residuals by covariate with LOESS smooth (blue) and 95% pointwise confidence band (gray); the horizontal dashed line marks zero. Panel labels report covariate-specific p-values for proportionality. Global proportional-hazards test: p = 0.097.

Appendix A.3. Compatibility of Adjustments to Data in Each Transition

The following results compare semiparametric Cox models against nonparametric estimators to evaluate the consistency of marginal fits. Kaplan–Meier curves with Greenwood confidence intervals are contrasted with population-averaged Cox survival curves, transition-specific hazards are compared against kernel estimators, and TTT plots summarize hazard shapes across follow-up. Collectively, these results demonstrate that the Cox models reproduce the empirical survival patterns, supporting their use as the margins in the joint copula model.
Figure A4. Transition-specific marginal survival: Kaplan–Meier (95% CI) vs. Cox (population-averaged). Marginal survival for each transition. The blue dashed lines show the Kaplan–Meier estimator with a 95% Greenwood confidence band (shaded); the red solid lines show the population-averaged survival from the Cox marginal models.
Figure A5. Estimated hazard rate functions, $h(t)$, on a log scale for the three transitions. The red solid line shows the Cox population-averaged hazard obtained by differentiating the Breslow cumulative baseline hazard and averaging across strata; the blue dashed line shows the nonparametric kernel estimator (implemented via muhaz). Time since origin is on the x-axis.
Figure A6. Total time on test (TTT) plots for each transition. The empirical TTT curve (blue) is compared to the 45° reference line (grey). Values of $u_{\max}$ (proportion failed at the maximum vertical deviation from the diagonal) are indicated within each panel: Diagnosis → Hospitalization $u_{\max}$ = 36.9%, Diagnosis → Death $u_{\max}$ = 0.5%, and Hospitalization → Death $u_{\max}$ = 2.2%. Curves lying below the diagonal indicate a decreasing hazard over time; curves above indicate an increasing hazard; proximity to the diagonal suggests an approximately constant hazard.

References

  1. Klein, J.P.; Moeschberger, M.L. Survival Analysis: Techniques for Censored and Truncated Data; Springer: New York, NY, USA, 2003. [Google Scholar] [CrossRef]
  2. Kalbfleisch, J.D.; Prentice, R.L. The Statistical Analysis of Failure Time Data, 2nd ed.; Wiley: New York, NY, USA, 2002. [Google Scholar] [CrossRef]
  3. Collett, D. Modelling Survival Data in Medical Research, 3rd ed.; Chapman & Hall/CRC: Boca Raton, FL, USA, 2015. [Google Scholar] [CrossRef]
  4. Cox, D.R. Regression models and life-tables. J. R. Stat. Soc. Ser. B 1972, 34, 187–220. [Google Scholar] [CrossRef]
  5. Therneau, T.M.; Grambsch, P.M. Modeling Survival Data: Extending the Cox Model; Springer: New York, NY, USA, 2000. [Google Scholar] [CrossRef]
  6. Hougaard, P. Analysis of Multivariate Survival Data; Springer: New York, NY, USA, 2000. [Google Scholar] [CrossRef]
  7. Meira-Machado, L.; de Uña-Álvarez, J.; Cadarso-Suárez, C.; Andersen, P.K. Multi-state models for the analysis of time-to-event data. Stat. Methods Med. Res. 2009, 18, 195–222. [Google Scholar] [CrossRef]
  8. Beyersmann, J.; Allignol, A.; Schumacher, M. Competing Risks and Multistate Models with R; Springer: New York, NY, USA, 2012. [Google Scholar] [CrossRef]
  9. Putter, H.; Fiocco, M.; Geskus, R.B. Tutorial in biostatistics: Competing risks and multi-state models. Stat. Med. 2007, 26, 2389–2430. [Google Scholar] [CrossRef]
  10. Andersen, P.K.; Keiding, N. Multi-state models for event history analysis. Stat. Methods Med. Res. 2002, 11, 91–115. [Google Scholar] [CrossRef]
  11. Cook, R.J.; Lawless, J. The Statistical Analysis of Recurrent Events; Springer: New York, NY, USA, 2007. [Google Scholar] [CrossRef]
  12. van Houwelingen, H.C.; Putter, H. Dynamic Predicting by Landmarking as an Alternative for Multi-State Modeling: An Application to Acute Lymphoid Leukemia Data. Lifetime Data Anal. 2008, 14, 447–463. [Google Scholar] [CrossRef]
  13. De Uña-Álvarez, J.; Meira-Machado, L. Nonparametric estimation of transition probabilities in the non-Markov illness-death model: A comparative study. Biometrics 2015, 71, 364–375. [Google Scholar] [CrossRef] [PubMed]
  14. Li, Z.; Chinchilli, V.M.; Wang, M. A Bayesian Joint Model of Recurrent Events and a Terminal Event. Biom. J. 2019, 61, 187–202. [Google Scholar] [CrossRef]
  15. Ferrer, L.; Rondeau, V.; Dignam, J.; Pickles, T.; Jacqmin-Gadda, H.; Proust-Lima, C. Joint Modelling of Longitudinal and Multi-State Processes: Application to Clinical Progressions in Prostate Cancer. Stat. Med. 2016, 35, 3933–3948. [Google Scholar] [CrossRef]
  16. Ramezankhani, A.; Blaha, M.J.; Mirbolouk, M.H.; Azizi, F.; Hadaegh, F. Multi-State Analysis of Hypertension and Mortality: Application of Semi-Markov Model in a Longitudinal Cohort Study. BMC Cardiovasc. Disord. 2020, 20, 321. [Google Scholar] [CrossRef] [PubMed]
  17. Nelsen, R.B. An Introduction to Copulas, 2nd ed.; Springer: New York, NY, USA, 2006. [Google Scholar] [CrossRef]
  18. Joe, H. Multivariate Models and Dependence Concepts; Chapman & Hall: London, UK, 1997. [Google Scholar] [CrossRef]
  19. Emura, T.; Matsui, S.; Rondeau, V. Survival Analysis with Correlated Endpoints: Joint Frailty-Copula Models; Springer: Singapore, 2019. [Google Scholar] [CrossRef]
  20. Othus, M.; Li, Y. A Gaussian Copula Model for Multivariate Survival Data. Stat. Biosci. 2010, 2, 154–179. [Google Scholar] [CrossRef] [PubMed]
  21. Gasparini, A.; Humphreys, K. A Natural History and Copula-Based Joint Model for Regional and Distant Breast Cancer Metastasis. Stat. Methods Med. Res. 2022, 31, 2415–24300. [Google Scholar] [CrossRef]
  22. Shewa, F.; Endale, S.; Nugussu, G.; Abdisa, J.; Zerihun, K.; Banbeta, A. Time to Kidneys Failure Modeling in the Patients at Adama Hospital Medical College: Application of Copula Model. J. Res. Health Sci. 2022, 22, e00549. [Google Scholar] [CrossRef]
  23. Cheung, L.C.; Albert, P.S.; Das, S.; Cook, R.J. Multistate Models for the Natural History of Cancer Progression. Br. J. Cancer 2022, 127, 1279–1288. [Google Scholar] [CrossRef] [PubMed]
  24. Shewa Gari, F.; Fenta Biru, T.; Endale Gurmu, S. Application of the Joint Frailty Copula Model for Analyzing Time to Relapse and Time to Death of Women with Cervical Cancer. Int. J. Women’s Health 2023, 15, 1295–13046. [Google Scholar] [CrossRef] [PubMed]
  25. Ieva, F.; Jackson, C.H.; Sharples, L.D. Multi-State Modelling of Repeated Hospitalisation and Death in Patients with Heart Failure: The Use of Large Administrative Databases in Clinical Epidemiologys. Stat. Methods Med. Res. 2017, 26, 1350–1372. [Google Scholar] [CrossRef] [PubMed]
  26. Le-Rademacher, J.G.; Therneau, T.M.; Ou, F.S. The Utility of Multistate Models: A Flexible Framework for Time-to-Event Data. Curr. Epidemiol. Rep. 2022, 9, 183–189. [Google Scholar] [CrossRef]
  27. Sklar, A. Fonctions de répartition à n dimensions et leurs marges. Publ. L’Institut Stat. L’Université Paris 1959, 8, 229–231. Available online: https://hal.science/hal-04094463/document (accessed on 20 July 2025).
  28. Joe, H. Dependence Modeling with Copulas; Chapman and Hall/CRC: Boca Raton, FL, USA, 2014. [Google Scholar] [CrossRef]
  29. Oakes, D. Bivariate survival models induced by frailties. J. Am. Stat. Assoc. 1989, 84, 487–493. [Google Scholar] [CrossRef]
  30. Genest, C.; Rémillard, B.; Beaudoin, D. Goodness-of-fit tests for copulas: A review and a power study. Insur. Math. Econ. 2009, 44, 199–213. [Google Scholar] [CrossRef]
  31. Hafner, C.M.; Reznikova, O. Efficient estimation of a semiparametric dynamic copula model. Comput. Stat. Data Anal. 2010, 54, 2609–2627. [Google Scholar] [CrossRef]
  32. Emura, T.; Chen, Y.H. Gene Selection for Survival Data under Dependent Censoring: A Copula-Based Approach. Stat. Methods Med. Res. 2014, 25, 2840–2857. [Google Scholar] [CrossRef] [PubMed]
  33. Erdmann, A.; Loos, A.; Beyersmann, J. A Connection Between Survival Multistate Models and Causal Inference for External Treatment Interruptions. Stat. Methods Med. Res. 2023, 32, 697–712. [Google Scholar] [CrossRef]
  34. Li, J.; Fine, J. On sample size for sensitivity and specificity in prospective diagnostic accuracy studies. Stat. Med. 2004, 23, 2537–2550. [Google Scholar] [CrossRef]
  35. Zhou, B.; Fine, J.; Laird, G. Goodness-of-Fit Test for Proportional Subdistribution Hazards Model. Stat. Med. 2013, 32, 3804–3811. [Google Scholar] [CrossRef]
  36. Shih, J.H.; Louis, T.A. Inferences on the association parameter in copula models for bivariate survival data. Biometrics 1995, 51, 1384–1399. [Google Scholar] [CrossRef]
  37. Lakhal-Chaieb, L.; Rivest, L.P.; Abdous, B. Estimating survival under dependent truncation. Biometrika 2006, 93, 655–669. [Google Scholar] [CrossRef]
  38. Emura, T.; Chen, Y.H. Analysis of Survival Data with Dependent Censoring; Springer: Berlin/Heidelberg, Germany, 2018. [Google Scholar] [CrossRef]
  39. Arbel, R.; Hammerman, A.; Sergienko, R.; Friger, M.; Peretz, A.; Netzer, D.; Yaron, S. BNT162b2 Vaccine Booster and Mortality Due to Covid-19. New Engl. J. Med. 2021, 385, 2413–2420. [Google Scholar] [CrossRef]
  40. de Gier, B.; van Asten, L.; Boere, T.M.; van Roon, A.; van Roekel, C.; Pijpers, J.; van Werkhoven, C.H.H.; van den Ende, C.; Hahné, S.J.M.; de Melker, H.E.; et al. Effect of COVID-19 vaccination on mortality by COVID-19 and on mortality by other causes, the Netherlands, January 2021-January 2022. Vaccine 2023, 41, 4488–4496. [Google Scholar] [CrossRef]
  41. González Rodríguez, J.L.; Oprescu, A.M.; Muñoz Lezcano, S.; Cordero Ramos, J.; Romero Cabrera, J.L.; Armengol de la Hoz, M.A.; Estella, A. Assessing the Impact of Vaccines on COVID-19 Efficacy in Survival Rates: A Survival Analysis Approach for Clinical Decision Support. Front. Public Health 2024, 12, 1437388. [Google Scholar] [CrossRef] [PubMed]
  42. Peckham, H.; de Gruijter, N.M.; Raine, C.; Radziszewska, A.; Ciurtin, C.; Wedderburn, L.R.; Rosser, E.C.; Webb, K.; Deakin, C.T. Male sex identified by global COVID-19 meta-analysis as a risk factor for death and ITU admission. Nat. Commun. 2020, 11, 6317. [Google Scholar] [CrossRef] [PubMed]
  43. Li, N.; Zhao, M.; Xu, L. Bivariate copula regression models for semi-competing risks. Stat. Methods Med. Res. 2023, 32, 843–859. [Google Scholar] [CrossRef]
Figure 1. Basic structures of survival and multi-state models. (a) Survival model. (b) Disease–death model. (c) Multi-state model with k progressive states.
Figure 2. Proposed illness–death model with covariate-adjusted transition hazards. From the initial state (State 1), subjects may move to hospitalization (State 2) or directly to death (State 3). Subsequent death after hospitalization is also modeled. Each arrow corresponds to a Cox hazard $\lambda_{jk}(t \mid X)$ for transition $j \to k$.
Figure 3. Coverage probabilities (%) of 95% confidence intervals for joint survival quantiles $p = 0.25, 0.50, 0.75$, comparing independence and copula-based estimators (Clayton, Frank, Gumbel) across sample sizes and censoring levels. The independence assumption is represented by dashed lines with square markers, while copula-based estimators are represented by solid lines with circular markers.
Figure 4. Mean squared error (MSE) of joint survival quantile estimators for $p = 0.25, 0.50, 0.75$, comparing independence and copula-based methods (Clayton, Frank, Gumbel) across sample sizes and censoring levels. The independence assumption is represented by dashed lines with square markers, while copula-based estimators are represented by solid lines with circular markers.
Figure 5. Kaplan–Meier curves for the three transitions: Diagnosis → Hospitalization (blue), Diagnosis → Death (red), and Hospitalization → Death (green). The steep early decline for Diagnosis → Hospitalization reflects a higher short-term risk of hospitalization after diagnosis, whereas the other transitions remain comparatively rare over follow-up.
Figure 6. Joint survival probability $S(t, t)$ for progressing from diagnosis through hospitalization to death. The shaded region marks the difference between the copula-based estimate (blue) and independence (black dashed), evidencing systematic underestimation of joint survival when dependence is ignored.
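The gap shown in Figure 6 can be illustrated with a small sketch. It assumes the joint survival is obtained by applying a Gumbel copula to the two marginal survival probabilities, and the marginal values 0.90 and 0.80 are hypothetical numbers chosen for illustration, not estimates from the cohort. Only θ = 3.59 is taken from Table 3.

```python
import math


def gumbel_copula(u, v, theta):
    """Gumbel copula C(u, v); theta >= 1, with theta = 1 giving independence."""
    if not (0.0 < u <= 1.0 and 0.0 < v <= 1.0):
        raise ValueError("u and v must lie in (0, 1]")
    if theta < 1.0:
        raise ValueError("theta must be >= 1 for the Gumbel copula")
    a = (-math.log(u)) ** theta
    b = (-math.log(v)) ** theta
    return math.exp(-((a + b) ** (1.0 / theta)))


# Hypothetical marginal survival probabilities at a common time t.
s1, s2 = 0.90, 0.80

joint_dep = gumbel_copula(s1, s2, theta=3.59)  # dependence via the copula
joint_ind = s1 * s2                            # independence assumption

# Under positive dependence the copula value is at least the product,
# so the independence curve sits below the copula-based curve.
print(joint_dep, joint_ind)
```

Because the Gumbel copula is bounded below by the product of its arguments, the independence-based value can never exceed the copula-based one, which is the direction of the underestimation described in the caption.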
Table 1. Baseline characteristics of the cohort.

| Variable | Total (n, %) | Not Vaccinated (n, %) | Vaccinated (n, %) |
|---|---|---|---|
| Age | | | |
| 18–44 | 1,047,007 (57.8%) | 228,169 (12.6%) | 818,838 (45.2%) |
| 45–64 | 542,187 (29.9%) | 85,112 (4.7%) | 457,075 (25.2%) |
| ≥65 | 221,228 (12.2%) | 39,148 (2.2%) | 182,080 (10.1%) |
| Sex | | | |
| Female | 1,009,942 (55.8%) | 183,849 (10.2%) | 826,093 (45.6%) |
| Male | 800,480 (44.2%) | 168,580 (9.3%) | 631,900 (34.9%) |
| Insurance type | | | |
| Contributory | 1,630,231 (90%) | 294,843 (16.3%) | 1,335,388 (73.8%) |
| Subsidized | 180,191 (10%) | 57,586 (3.2%) | 122,605 (6.8%) |
| Comorbidities | | | |
| No | 1,547,048 (85.5%) | 309,599 (17.1%) | 1,237,449 (68.4%) |
| Yes | 263,374 (14.5%) | 42,830 (2.4%) | 220,544 (12.2%) |
| Vaccinated | | | |
| No | 352,429 (19.5%) | | |
| Yes | 1,457,993 (80.5%) | | |
| Clinical outcomes | | | |
| Hospitalized | 905,790 (50%) | 184,094 (52.2%) | 721,696 (49.5%) |
| Died | 43,263 (2.4%) | 28,931 (8.2%) | 14,332 (1.0%) |
Table 2. Adjusted hazard ratios (HRs) and 95% confidence intervals (CIs) for the three illness–death transitions. Events: 928,273 hospitalizations; 9465 deaths without hospitalization; 20,970 deaths after hospitalization.

| Variable | Diagnosis → Hospitalization | Diagnosis → Death | Hospitalization → Death |
|---|---|---|---|
| Vaccinated | | | |
| No | Ref | Ref | Ref |
| Yes | 0.88 (0.88–0.89) | 0.10 (0.10–0.11) | 0.04 (0.04–0.04) |
| Age | | | |
| 18–44 | Ref | Ref | Ref |
| 45–64 | 1.06 (1.05–1.06) | 7.96 (7.22–8.77) | 9.42 (8.86–10.00) |
| ≥65 | 1.12 (1.11–1.13) | 47.21 (42.98–51.86) | 42.35 (39.88–44.98) |
| Sex | | | |
| Female | Ref | Ref | Ref |
| Male | 1.03 (1.02–1.03) | 1.70 (1.63–1.77) | 1.89 (1.83–1.94) |
| Comorbidity | | | |
| No | Ref | Ref | Ref |
| Yes | 0.97 (0.96–0.98) | 1.77 (1.69–1.85) | 1.78 (1.73–1.84) |
| Insurance type | | | |
| Contributory | Ref | Ref | Ref |
| Subsidized | 1.08 (1.07–1.09) | 1.38 (1.31–1.45) | 1.34 (1.30–1.39) |

Values are HR (95% CI). Ref = reference category. Cox models stratified by city.
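For readers who want to see how an entry of the form "HR (95% CI)" typically arises from a fitted Cox coefficient, here is a minimal sketch. It assumes the usual Wald construction on the log-hazard scale, which the source does not explicitly confirm for Table 2. The coefficient and standard error below are hypothetical values, not estimates from the paper.

```python
import math


def hazard_ratio_ci(beta, se, z=1.96):
    """Hazard ratio and Wald-type 95% CI from a log-hazard coefficient."""
    hr = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return hr, lower, upper


# Hypothetical inputs: a coefficient whose exponential is 0.88
# and an illustrative standard error.
beta = math.log(0.88)
se = 0.003
hr, lower, upper = hazard_ratio_ci(beta, se)
print(f"{hr:.2f} ({lower:.2f}-{upper:.2f})")
```

With more than a million subjects, standard errors are small, which is consistent with the very narrow intervals reported for the Diagnosis → Hospitalization transition.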
Table 3. Copula selection for the joint modeling: maximum-likelihood estimate θ, implied Kendall’s τ, log-likelihood, and information criteria. Lower AIC/BIC/CAIC indicate better fit; best values are shown in bold.

| Copula | θ | τ | LogLik | AIC | BIC | CAIC |
|---|---|---|---|---|---|---|
| Clayton | 2.16 | 0.52 | 578.01 | −1158.02 | −1165.54 | −1164.25 |
| Frank | 12.17 | 0.71 | 2689.83 | −5377.66 | −5371.14 | −5370.14 |
| Gumbel | 3.59 | 0.72 | 2824.09 | **−5646.19** | **−5639.67** | **−5638.67** |
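The "implied Kendall’s τ" column of Table 3 follows from standard relations between θ and τ for each Archimedean family. The sketch below uses the closed forms for Clayton and Gumbel and a simple midpoint-rule approximation of the Debye function for Frank; the function names and the integration scheme are illustrative choices, not the authors' code.

```python
import math


def tau_clayton(theta):
    """Kendall's tau for the Clayton copula: theta / (theta + 2)."""
    return theta / (theta + 2.0)


def tau_gumbel(theta):
    """Kendall's tau for the Gumbel copula: 1 - 1/theta."""
    return 1.0 - 1.0 / theta


def tau_frank(theta, steps=20000):
    """Kendall's tau for the Frank copula: 1 - (4/theta) * (1 - D1(theta)).

    D1 is the first Debye function,
    D1(theta) = (1/theta) * integral_0^theta t / (exp(t) - 1) dt,
    approximated here with a midpoint rule (theta must be nonzero).
    """
    h = theta / steps
    integral = 0.0
    for k in range(steps):
        t = (k + 0.5) * h  # midpoint of the k-th subinterval
        integral += t / math.expm1(t)
    integral *= h
    d1 = integral / theta
    return 1.0 - 4.0 / theta * (1.0 - d1)


# Theta values as reported (rounded) in Table 3.
print(round(tau_clayton(2.16), 2))
print(round(tau_gumbel(3.59), 2))
print(tau_frank(12.17))
```

With the rounded θ values from the table, the Clayton and Gumbel formulas reproduce the tabulated 0.52 and 0.72, and the Frank value lands within 0.01 of the tabulated 0.71; small differences are expected because θ is reported to two decimals only.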
Table 4. Goodness-of-fit tests for candidate copulas (multiplier bootstrap, $B = 5000$). Reported p-values correspond to Anderson–Darling (AD), Kolmogorov–Smirnov (KS), and Cramér–von Mises (CvM) statistics. The null hypothesis is that the copula is correctly specified.

| Copula | AD (p) | KS (p) | CvM (p) |
|---|---|---|---|
| Clayton | <0.01 | <0.01 | <0.01 |
| Frank | <0.01 | <0.01 | <0.01 |
| Gumbel | 0.63 | 0.60 | 0.65 |

Margins were modeled semiparametrically, and pseudo-observations were constructed from ranks.
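As a rough illustration of the rank-based pseudo-observations mentioned in the Table 4 footnote, the sketch below maps a sample to the unit interval using u_i = rank_i / (n + 1). The n + 1 scaling is a common convention and is an assumption here, as are the absence of ties and of censoring; the function name `pseudo_observations` is hypothetical.

```python
def pseudo_observations(x):
    """Rank-based pseudo-observations u_i = rank_i / (n + 1).

    Assumes no ties; ranks run from 1 (smallest) to n (largest), so every
    u_i lies strictly inside (0, 1).
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])  # indices sorted by value
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


# Toy example with hypothetical times for two transitions.
u1 = pseudo_observations([3.0, 1.0, 2.0])
u2 = pseudo_observations([10.0, 4.0, 7.0])
```

Pairs such as `(u1[i], u2[i])` are the inputs on which a copula is fitted and its goodness of fit assessed.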
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Brango, H.; Tovar-Falón, R.; Martínez-Flórez, G. A Bivariate Copula–Driven Multi-State Model for Statistical Analysis in Medical Research. Mathematics 2025, 13, 3072. https://doi.org/10.3390/math13193072


