1. Introduction
Left truncation arises in many clinical studies when the origin of the time scale, such as conception, is not directly observed on the relevant timeline, such as gestational age. Ignoring left truncation can introduce bias because study entry may depend on both outcome and exposure status [1]. This issue has been documented in several areas of epidemiologic research, including the development of AIDS among individuals with HIV infection, the occurrence of spontaneous abortion or birth defects, and cancer survivors enrolled after diagnosis [2]. For instance, the observed inverse association between smoking during pregnancy and preeclampsia, typically a 10–40% lower risk, may reflect left-truncation bias due to higher early pregnancy loss among smokers before preeclampsia diagnosis [3].
In cancer survival analysis, accounting for left truncation is particularly critical when survival time is measured from the initiation of treatment or the date of surgery. Patients who die before these events are inherently excluded from the analytic cohort, which leads to delayed-entry bias and distorts the estimation of treatment effects [4,5,6]. Such bias may cause treatments to appear more effective than they actually are or, conversely, attenuate estimates due to selection effects. Properly modeling left-truncated data ensures that each individual contributes person-time only after becoming at risk, thereby yielding unbiased estimates of survival probabilities and hazard ratios.
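To make the at-risk adjustment concrete, the following minimal sketch (hypothetical data; the helper name n_at_risk is ours) counts subjects at risk at a given time under delayed entry, where each subject contributes only on the interval (entry, time]:

```python
import numpy as np

# Hypothetical example data: 'entry' is the delayed-entry (left-truncation)
# time, 'time' is the event or censoring time, 'event' flags observed events.
entry = np.array([0.0, 2.0, 5.0, 1.0])
time  = np.array([8.0, 6.0, 9.0, 3.0])
event = np.array([1, 0, 1, 1])

def n_at_risk(t, entry, time):
    """Number of subjects at risk at time t under delayed entry:
    each subject contributes only on the interval (entry, time]."""
    return int(np.sum((entry < t) & (time >= t)))

print(n_at_risk(4.0, entry, time))  # 2: entered before t = 4 and still under follow-up
```

Dropping the entry < t condition would count every subject from time zero, inflating the risk-set denominator and underestimating the early hazard; survival libraries such as lifelines expose the same adjustment through delayed-entry arguments to their fitters.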
Survival models are widely used to examine relationships between patient characteristics and treatment outcomes. The Cox proportional hazards (CPH) model, a semi-parametric framework, assumes a log-linear relationship between covariates and the hazard, which may be overly restrictive in complex biomedical settings. Nonlinear approaches, such as neural networks and survival forests, may better capture high-dimensional interactions among variables.
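For reference, the CPH model specifies the hazard for a covariate vector $x$ as

\[ h(t \mid x) = h_0(t)\, \exp(\beta^{\top} x), \]

so the log hazard is linear in the covariates apart from the unspecified baseline $h_0(t)$; it is this log-linear covariate effect, not the baseline, that nonlinear approaches relax.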
This study introduces DeepLTRC, a deep neural network-based survival model designed to handle left-truncated and right-censored data, and evaluates its performance using real-world clinical datasets. Deep learning-based models have achieved substantial success across various fields, including natural language processing, speech recognition, and image analysis [7,8,9], owing to their multilayer nonlinear structure optimized by loss functions with regularization. However, applying deep learning to censored or truncated outcomes remains challenging because conventional loss functions cannot directly account for incomplete follow-up information.
To overcome this limitation, several studies have proposed loss functions assuming a Weibull-distributed failure time [10,11], models such as DeepSurv have extended the proportional hazards framework using partial likelihood-based loss functions [12,13,14,15], and other authors have modeled survival time with discrete distributions [16,17]. Despite these methodological advances, no deep neural network approach has been specifically developed for left-truncated survival data, representing an important unmet need in modern survival analysis.
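To illustrate how left truncation changes such a training objective, the sketch below (our illustrative PyTorch code, not the exact DeepLTRC implementation) writes a negative log partial likelihood in which the risk set at each event time is restricted to subjects already under observation:

```python
import torch

def ltrc_cox_loss(log_risk, entry, time, event):
    """Negative log Cox partial likelihood with delayed-entry risk sets.

    log_risk: (n,) network outputs f(x_i); entry: (n,) left-truncation times;
    time: (n,) event/censoring times; event: (n,) 1 = event observed.
    """
    loss = torch.tensor(0.0)
    for i in torch.nonzero(event).flatten():
        # Risk set at time[i]: subjects with entry < time[i] <= time[j].
        at_risk = (entry < time[i]) & (time >= time[i])
        loss = loss - (log_risk[i] - torch.logsumexp(log_risk[at_risk], dim=0))
    return loss / event.sum()
```

Setting all entry times to zero recovers the standard right-censored partial likelihood used by DeepSurv-style models.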
4. Discussion
This study proposed and evaluated DeepLTRC, a deep neural network that jointly accounts for left truncation and right censoring in survival analysis. Results from simulations and real-world data show that DeepLTRC maintains strong predictive performance across various censoring rates and data structures, extending deep learning’s applicability to settings where conventional methods often fail.
In Simulation Scenario 1, which included only independent noise variables, the deep neural network for right-censored data improved with increasing censoring, achieving an iAUC of 0.850 at 60% censoring, comparable to the CPH model and RSF. When left truncation was introduced, DeepLTRC showed lower performance at 20% censoring (iAUC = 0.649) that improved steadily as censoring increased (iAUC = 0.745 at 60% censoring), approaching the performance of the CPH model (0.779). These results indicate that DeepLTRC effectively learns survival structures even when truncation reduces available information.
In Simulation Scenario 2, which incorporated nonlinear and interaction effects, the network achieved high accuracy for right-censored data (iAUC = 0.928 at 60% censoring), comparable to CPH (0.938) and outperforming RSF (0.866). For left-truncated data, DeepLTRC again improved with higher censoring (iAUC = 0.589 to 0.717), demonstrating its capacity to model complex nonlinear covariate relationships while maintaining consistent survival estimation under truncation.
Application to the BMT dataset further confirmed its practical utility. DeepLTRC achieved an iAUC of 0.575, outperforming RSF (0.504) but remaining below CPH (0.776). Because this dataset involves delayed entry after transplantation, the results underscore DeepLTRC's ability to manage incomplete at-risk periods, which are common in clinical survival studies. Despite the small sample size, its performance was comparable to advanced deep survival models such as DeepSurv [7] and DeepHit [32], which were trained on much larger datasets. The DeepSurv model, for example, used the United Network for Organ Sharing (UNOS) database, which contains data from 60,400 patients who underwent heart transplantation between 1985 and 2015, including 29,436 uncensored (48.7%) and 30,964 censored (51.3%) cases with 50 clinical features. It also used the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset, consisting of 2092 patients (999 uncensored and 1093 censored) with 21 gene expression and clinical variables. The C-index [33] was 0.573 (0.555–0.571) for UNOS and 0.648 (0.636–0.660) for METABRIC, while DeepHit achieved 0.589 and 0.691, respectively. These comparisons demonstrate that DeepLTRC delivers competitive predictive accuracy even with limited data, confirming its robustness and adaptability to left-truncated survival settings.
Overall, DeepLTRC bridges the methodological gap between traditional linear-effect survival models and deep learning models that overlook truncation. By integrating the Breslow estimator in its output layer, it combines the interpretability of semi-parametric models with the flexibility of neural networks. Thus, DeepLTRC advances methodologically transparent and generalizable deep learning approaches for complex clinical survival data, particularly in cancer prognosis and registry-based research.
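As a point of reference for this construction, a Breslow-type estimator of the baseline cumulative hazard with delayed-entry risk sets can be written (in our notation, with $f_\theta(x)$ the network's log-risk output, $L_j$ the entry time, $T_j$ the observed time, and $\delta_i$ the event indicator; the exact DeepLTRC formulation may differ) as

\[ \hat{H}_0(t) = \sum_{i:\, T_i \le t,\; \delta_i = 1} \frac{1}{\sum_{j \in R(T_i)} \exp\{ f_\theta(x_j) \}}, \qquad R(t) = \{\, j : L_j < t \le T_j \,\}. \]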
In this simulation framework, the Weibull data-generating process assumes a log-linear relationship between covariates and the hazard function, which inherently favors the CPH model. As a result, DeepLTRC did not consistently outperform CPH in settings that strictly follow linear hazard structures. However, DeepLTRC's adaptability reveals its potential under nonlinear effects, higher censoring, or complex covariate interactions, demonstrating its capability to model survival patterns beyond the log-linear constraints of the CPH model.
Despite these promising results, several limitations should be noted. First, the current simulation design used a moderate number of covariates; further validation on high-dimensional genomic or imaging data is therefore needed. Second, the model's hyperparameters were tuned manually. Future research should apply automated hyperparameter optimization (AutoHPO) techniques such as population-based training (PBT) [34], Bayesian optimization with Hyperband (BOHB) [35], or optimization frameworks like Optuna [36] to improve efficiency and reproducibility. Third, the current study focused on single-event survival analysis; extending the framework to competing risks or multi-state survival modeling could further increase its clinical utility. Fourth, although this study employed the integrated area under the time-dependent ROC curve (iAUC) as the primary metric for evaluating discriminative performance, the Integrated Brier Score (IBS) offers a comprehensive measure of overall calibration and discrimination over time [37,38]. Future studies should therefore incorporate IBS as a complementary metric, along with the Δ-iAUC (model minus reference) with paired bootstrap confidence intervals and time-dependent Brier and calibration plots, for a more rigorous and comprehensive performance assessment [39].
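For reference, the IBS in its usual inverse-probability-of-censoring-weighted form (our notation; $\hat{S}(t \mid x_i)$ is the predicted survival function and $\hat{G}$ a Kaplan–Meier estimate of the censoring distribution) is

\[ \mathrm{IBS} = \frac{1}{t_{\max}} \int_0^{t_{\max}} \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{\hat{S}(t \mid x_i)^2 \, \mathbb{1}\{T_i \le t,\ \delta_i = 1\}}{\hat{G}(T_i)} + \frac{\bigl(1 - \hat{S}(t \mid x_i)\bigr)^2 \, \mathbb{1}\{T_i > t\}}{\hat{G}(t)} \right] dt. \]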
Relatedly, although the DeepLTRC framework effectively handled delayed entry in the BMT dataset, competing risks such as death before relapse were not explicitly modeled; because event dependencies can alter hazard estimates in real-world clinical settings, this omission may influence the interpretation of relapse-related survival probabilities, reinforcing the need for the competing-risks or multi-state extension noted above. Finally, incorporating explainable artificial intelligence (XAI) methods may help identify significant prognostic variables and enhance interpretability for clinical applications [40].