Article

AI Survival Prediction Modeling: The Importance of Considering Treatments and Changes in Health Status over Time

1 Phalcon, LLC, Manhasset, NY 11030, USA
2 Newark Campus, Rutgers University, Newark, NJ 07102, USA
3 Rutgers New Jersey Medical School, Newark, NJ 07103, USA
4 Rutgers Cancer Institute of New Jersey, Newark, NJ 07103, USA
* Author to whom correspondence should be addressed.
Cancers 2024, 16(20), 3527; https://doi.org/10.3390/cancers16203527
Submission received: 18 September 2024 / Revised: 11 October 2024 / Accepted: 16 October 2024 / Published: 18 October 2024
(This article belongs to the Collection Artificial Intelligence in Oncology)

Simple Summary

Predictions of survival in patients with localized breast cancer base their models on data from the time the patients are diagnosed. These survival curves have an inherent inaccuracy because they do not take into consideration events that occur after initial diagnosis. We used deep learning, a type of artificial intelligence, to model the survival of Medicare patients with stage I–III breast cancer from the SEER-Medicare dataset from 1991 to 2016. In addition to considering patient and cancer variables from the time of diagnosis, we included variables that occurred later, including treatment, adverse events, other medical conditions, and progressive age of the patient. Our predictions improved significantly, with the inaccuracy rate dropping from around 30% to less than 10% when the time-varying data were added to the time-fixed data. We also developed our models to generate individual patient predicted survival based on their unique circumstances made up of the patient, cancer, treatment, and treatment-related adverse events that occurred over time. This approach will be a powerful tool that can advise oncology caregivers and patients on the factors that impact their predicted survival.

Abstract

Background and objectives: Deep learning (DL)-based models for predicting the survival of patients with local stages of breast cancer only use time-fixed covariates, i.e., patient and cancer data at the time of diagnosis. These predictions are inherently error-prone because they do not consider time-varying events that occur after initial diagnosis. Our objective is to improve the predictive modeling of survival of patients with localized breast cancer to consider both time-fixed and time-varying events; thus, we take into account the progression of a patient’s health status over time. Methods: We extended four DL-based predictive survival models (DeepSurv, DeepHit, Nnet-survival, and Cox-Time) that deal with right-censored time-to-event data to consider not only a patient’s time-fixed covariates (patient and cancer data at diagnosis) but also a patient’s time-varying covariates (e.g., treatments, comorbidities, progressive age, frailty index, adverse events from treatment). We utilized, as our study data, the SEER-Medicare linked dataset from 1991 to 2016 to study a population of women diagnosed with stage I–III breast cancer (BC) enrolled in Medicare at 65 years or older as qualified by age. We delineated time-fixed variables recorded at the time of diagnosis, including age, race, marital status, breast cancer stage, tumor grade, laterality, estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) status, and comorbidity index. We analyzed six distinct prognostic categories, cancer stages I–III BC, and each stage’s ER/PR+ or ER/PR− status. At each visit, we delineated the time-varying covariates of administered treatments, induced adverse events, comorbidity index, and age. We predicted the survival of three hypothetical patients to demonstrate the model’s utility.
Main Outcomes and Measures: The primary outcomes of the modeling were the measures of the model’s prediction error, as measured by the concordance index, the most commonly applied evaluation metric in survival analysis, and the integrated Brier score, a metric of the model’s discrimination and calibration. Results: The proposed extended patients’ covariates that include both time-fixed and time-varying covariates significantly improved the deep learning models’ prediction error and the discrimination and calibration of a model’s estimates. The prediction of the four DL models using time-fixed covariates in six different prognostic categories all resulted in approximately a 30% error in all six categories. When applying the proposed extension to include time-varying covariates, the accuracy of all four predictive models improved significantly, with the error decreasing to approximately 10%. The models’ predictive accuracy was independent of the differing published survival predictions from time-fixed covariates in the six prognostic categories. We demonstrate the utility of the model in three hypothetical patients with unique patient, cancer, and treatment variables. The model predicted survival based on the patient’s individual time-fixed and time-varying features, which varied considerably from Social Security age-based, and stage and race-based breast cancer survival predictions. Conclusions: The predictive modeling of the survival of patients with early-stage breast cancer using DL models has a prediction error of around 30% when considering only time-fixed covariates at the time of diagnosis and decreases to values under 10% when time-varying covariates are added as input to the models, regardless of the prognostic category of the patient groups. These models can be used to predict individual patients’ survival probabilities based on their unique repertoire of time-fixed and time-varying features. 
They will provide guidance for patients and their caregivers to assist in decision making.

1. Introduction

Many factors affect long-term survival from localized breast cancer (BC) in women. Covariates that impact predictive survival modeling can be both time-fixed and time-varying, and each has been incorporated in generating the predicted survival of BC patient populations. Time-to-event modeling aims to predict the patients’ survival function. Factors that correlate with survival from localized breast cancer are covariates that characterize the patients and their cancer at the time of diagnosis. These factors, referred to as time-fixed covariates, include the cancer stage (I–III), laterality, tumor size (T), number of lymph nodes (N), estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) status, tumor grade, Ki67 staining, lymphovascular invasion, perineural invasion, androgen receptor (AR) status, molecular signature and recurrence score, stem cell frequency, the presence of circulating tumor cells and bone marrow micrometastases, patient race, location of residence, age, menopausal status, relationship status, health, and comorbidity status [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22].
However, much happens during patients’ lives after the initial diagnosis that affects their survival. These factors are referred to as time-varying covariates, and their impact on survival is less well-characterized, more complex, and often as important as the impact of time-fixed covariates. Time-varying covariates that affect survival include the administration of neoadjuvant therapy, timing and type of initial surgery, the timing of initiation, amount, type, and completion of adjuvant chemotherapy, biotherapy, hormone therapy, radiotherapy, or prolonged hormone or Her2 blockade [14,19,23,24]. Multiple life events that occur at irregularly variable periods have been implicated in decreased survival [25]. They include changes in health status, adverse events from treatment, comorbidities, post adjuvant reconstruction [26] or other non-cancer surgery, angiogenesis and wound healing, hypercoagulable states, persistence of bone marrow disseminated tumor cells [27], aging, estrogen deprivation, infection, inflammation, exposure to toxic chemicals, sedentary lifestyle, weight gain, obesity, smoking, alcohol consumption [28,29,30], anxiety, stress, and depression [25]. These events shorten survival independently or by abrogating the microenvironment’s ability to suppress the awakening of dormant micrometastases [25].
Given the proven impact of both time-fixed and time-varying covariates on predicted survival in population-averaged models, we hypothesized that taking them both into consideration would improve the accuracy of predictive modeling. We focused our study on prediction models that use time-to-event data, including the time a patient is observed for the event of interest, death in our case, and whether they experienced it before the study follow-up ended. Some patients are said to be right-censored, i.e., are lost to the study before experiencing the event and before the study follow-up ends or did not experience the event before the study follow-up ends. Over the past five decades, statistical methods for analyzing time-to-event data, especially right-censored survival data, have been developed. Unlike traditional prediction models, survival prediction models account for censoring; ignoring the censoring leads to biased and inefficient predictions [31,32]. Additional shortcomings of population-based predictions are that they provide highly inadequate predictions of individual patient outcomes due to the virtually unlimited combination of both time-fixed and time-varying features of each patient that impact survival.
Parametric survival models assume a specific distribution, e.g., Weibull, for the survival times. These models, however, lead to biased estimates when the assumed survival time distribution is violated [33]. The semiparametric Cox Proportional Hazards (CPH) regression model [34] is the most common time-to-event analysis approach in the medical literature. Most survival models, including the CPH model, are designed for data with continuous failure time distributions. In real life, patient follow-up visits occur on a given day with irregular gaps between two consecutive visits. Applying standard continuous-time models to discrete-time data without adequate adjustments can lead to biased estimators [35,36]. Further, the CPH model’s proportional hazards assumption, i.e., that the effect of each patient covariate is the same at all values of the follow-up time, is unrealistic for most clinical situations [37].
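For reference, the proportionality property discussed above can be written out in standard notation. The coefficient vector $\beta$ and the two covariate vectors $x_a$ and $x_b$ are notation introduced here for illustration, not symbols taken from the text.

```latex
h(t \mid x) = h_0(t)\, \exp\!\big(\beta^{\top} x\big),
\qquad
\frac{h(t \mid x_a)}{h(t \mid x_b)} = \exp\!\big(\beta^{\top} (x_a - x_b)\big)
```

The baseline hazard $h_0(t)$ cancels in the ratio, so the relative hazard of two patients does not depend on $t$. This is the assumption that the text describes as unrealistic for most clinical situations.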
As a nonlinear extension to CPH, Faraggi–Simon’s network [38] was an early attempt to extend CPH with a neural network. Since the state of development of neural networks was not as advanced as it is today, the results did not show improvement beyond the linear CPH model. Given the modern era of high-performance computing and available datasets of hundreds of thousands of patients and hundreds of millions of patient records, deep learning (DL) models, an artificial intelligence (AI) subfield, are increasingly common approaches to developing survival prediction using time-to-event data. AI-based survival models, specifically DL models, capture the complex nonlinear relationships among the patient’s characteristics, cancer characteristics, treatments, adverse events, comorbidity, etc. These models help achieve precision medicine, providing guidance for treatment that is personally tailored to individual patients with stage I–III BC.

2. Methods

2.1. Deep Learning Predictive Modeling

2.1.1. Discrete Time-to-Event Data

DeepSurv. The first successful attempt to extend the Cox regression model with neural networks (NNs) was proposed in [39], where the patient’s covariates are input to the network, and the single node out of the network uses a linear activation function to estimate the log-risk function in the Cox model. Their results demonstrated that NNs were able to outperform classical Cox models.
DeepHit. Lee et al. [40] were the first to apply NNs to the discrete-time likelihood for right-censored time-to-event data [41]. Their DL-based model, DeepHit, treated survival time as discrete and the time horizon as finite. The model makes no assumptions about the underlying stochastic process by directly learning the joint distribution of survival times and events and allowing for the possibility that the relationship between covariates and risk(s) changes over time. Their results showed that DeepHit outperformed previous models.
Nnet-survival. In [37], a discrete-time survival model, Nnet-survival, was proposed. Given input data of $n$ patients, each with covariate vector $x_i$, we can fit the model by minimizing the loss given by the mean negative log-likelihood. Their result showed “good” discrimination and calibration performance with simulated and real data.
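A minimal sketch of a mean negative log-likelihood for discrete-time survival data may make this loss concrete. It assumes the network outputs one conditional hazard per time interval; the function name `discrete_nll`, the array layout, and the treatment of a censored patient as having survived their last interval are simplifying assumptions made here, not details taken from [37].

```python
import numpy as np

def discrete_nll(hazards, interval_idx, event):
    """Mean negative log-likelihood for discrete-time survival (illustrative).

    hazards      : (n, m) predicted conditional hazard for each of m intervals
    interval_idx : (n,) index of the interval in which follow-up ended
    event        : (n,) 1 if the event occurred in that interval, 0 if censored
    """
    hazards = np.clip(np.asarray(hazards, dtype=float), 1e-7, 1 - 1e-7)
    interval_idx = np.asarray(interval_idx, dtype=int)
    event = np.asarray(event, dtype=float)
    n, m = hazards.shape

    # Intervals strictly before the last one were survived: add log(1 - h).
    survived = np.arange(m)[None, :] < interval_idx[:, None]
    log_survived = (survived * np.log1p(-hazards)).sum(axis=1)

    # Last interval: log(h) if the event occurred there; otherwise (assumption)
    # the censored patient is counted as surviving it, log(1 - h).
    last_h = hazards[np.arange(n), interval_idx]
    last_term = event * np.log(last_h) + (1.0 - event) * np.log1p(-last_h)

    return -(log_survived + last_term).mean()

# Hypothetical example: two patients, three intervals.
example_hazards = np.array([[0.1, 0.2, 0.3],
                            [0.2, 0.3, 0.4]])
example_loss = discrete_nll(example_hazards, [2, 1], [1, 0])
```

In a real model the hazards would come from a sigmoid output layer and the loss would be minimized by gradient descent; the NumPy version only shows the arithmetic.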
Cox-Time. To overcome the proportionality assumption of the Cox model, Kvamme et al. [41] proposed treating time as a regular covariate and making the relative risk function depend on time, resulting in $h(t \mid x) = h_0(t)\, e^{g(t, x)}$. Thus, $g(t, x)$ can model interactions between time and the other covariates. This model, Cox-Time, although no longer a proportional hazards model, is still a relative risk model with the same partial likelihood as the Cox model, with the following loss function: $\mathrm{loss} = \frac{1}{n} \sum_{i : D_i = 1} \log \Big( \sum_{j \in \tilde{R}_i} \exp\big[ g(T_i, x_j) - g(T_i, x_i) \big] \Big)$. The model is trained on continuous-time data but produces discrete-time predictions.
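The Cox-Time loss can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the pycox implementation: it sums over the full risk set (all patients still under observation at the event time) rather than a sampled subset, it takes $n$ to be the number of patients, and `g_linear` is a hypothetical stand-in for the neural network $g(t, x)$.

```python
import numpy as np

def cox_time_loss(g, times, events, X):
    """Cox-Time style loss with a time-dependent relative risk function g(t, x).

    g      : callable taking (t, x) arrays of equal length, returning log-risks
    times  : (n,) observed follow-up times T_i
    events : (n,) indicators D_i (1 = event observed, 0 = right-censored)
    X      : (n, p) covariate matrix
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    X = np.asarray(X, dtype=float)
    n = len(times)  # assumption: n is the number of patients

    total = 0.0
    for i in np.flatnonzero(events == 1):
        # Risk set: everyone whose follow-up lasted at least until T_i (includes i).
        at_risk = np.flatnonzero(times >= times[i])
        t_i = np.full(len(at_risk), times[i])
        # g is evaluated at the event time T_i for every member of the risk set.
        diff = g(t_i, X[at_risk]) - g(times[i:i + 1], X[i:i + 1])
        total += np.log(np.exp(diff).sum())
    return total / n

def g_linear(t, x):
    """Hypothetical g: the effect of the first covariate grows with time."""
    return x[:, 0] * (1.0 + 0.1 * t)

# Hypothetical data: three patients, one covariate.
demo_loss = cox_time_loss(g_linear, [1.0, 2.0, 3.0], [1, 0, 1],
                          np.array([[0.5], [0.1], [0.9]]))
```

Because `g` receives the event time as an input, the ratio of two patients' hazards can change over follow-up, which is what removes the proportionality restriction.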
To summarize, DeepSurv is limited by the proportionality assumption of the CPH model, whereas DeepHit, Nnet-Survival, and Cox-Time are not restricted by the proportionality assumption. DeepSurv is designed for data with continuous failure time distributions; Cox-Time, on the other hand, is trained on continuous-time data but produces discrete-time predictions. Nnet-Survival is a discrete-time model where the baseline hazard rate and the effect of the input data on hazard probability can vary with follow-up time.

2.1.2. Time-Varying Covariates: Proposed Extension

A patient has two sets of covariates: time-fixed covariates (e.g., age at diagnosis) and time-varying covariates (e.g., current age, current comorbidity index, treatments administered at this visit and earlier visits, and adverse events). A patient’s survival status is recorded at each visit while the patient is at risk, i.e., has not yet experienced the event (death in our case) of interest.
The above methods, while dealing with discrete-time data, assume that the covariates $x_i$ of a patient $i$ are time-fixed. A realistic prediction of the patient’s survival needs to consider not only the patient’s covariates at the time of diagnosis but also the administered treatments, induced adverse events, comorbidity index, and the age at each visit.
To achieve this objective, we extended a patient covariate vector, $x_i$, to include not only time-fixed covariates but also covariates that summarize the patient’s history from previous visits. Specifically, for a given treatment, $TR_j$ for $j = 1, 2, \ldots, 46$, $TR_{ij}$ is a tally of the number of times $TR_j$ was administered to patient $i$ from the time of the diagnosis to the time of death/end of the study. Similarly, for a given induced adverse event, $AE_k$ for $k = 1, 2, \ldots, 18$, $AE_{ik}$ is a tally of the number of times patient $i$ experienced $AE_k$ from the time of the diagnosis to the time of death/end of the study. We divide the age of a patient into 6 bins: $bin_1 \le 65$, $65 < bin_2 \le 70$, $70 < bin_3 \le 75$, $75 < bin_4 \le 80$, $80 < bin_5 \le 85$, $bin_6 > 85$. $AGE_{ib}$ is a tally of the number of times the age of patient $i$ falls within $bin_b$ for $b = 1, 2, \ldots, 6$, from the time of the diagnosis to the time of death/end of the study. We handle a patient’s comorbidity index similarly to a patient’s age. We divide the comorbidity index of a patient into 6 bins: $bin_1 \le 2$, $2 < bin_2 \le 4$, $4 < bin_3 \le 6$, $6 < bin_4 \le 8$, $8 < bin_5 \le 10$, $bin_6 > 10$. $COMB_{ib}$ is a tally of the number of times the comorbidity index of patient $i$ falls within comorbidity $bin_b$ for $b = 1, 2, \ldots, 6$, from the time of the diagnosis to the time of death/end of the study.
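The tallies described above can be sketched as follows. The per-visit record layout (a dictionary with `age`, `comorbidity`, `treatments`, and `adverse_events` keys), the zero-based category indices, and the order in which the blocks are concatenated are assumptions made for illustration; the text specifies only the counts (46 treatments, 18 adverse events) and the bin edges.

```python
import numpy as np

AGE_EDGES = [65, 70, 75, 80, 85]    # bins: <=65, (65,70], (70,75], (75,80], (80,85], >85
COMB_EDGES = [2, 4, 6, 8, 10]       # bins: <=2, (2,4], (4,6], (6,8], (8,10], >10

def bin_index(value, edges):
    """Zero-based bin index; right=True makes each upper edge inclusive."""
    return int(np.digitize(value, edges, right=True))

def extend_covariates(fixed, visits, n_treat=46, n_ae=18):
    """Append visit-history tallies to a patient's time-fixed covariates."""
    tr = np.zeros(n_treat)    # TR_ij: times treatment j was administered
    ae = np.zeros(n_ae)       # AE_ik: times adverse event k occurred
    age = np.zeros(6)         # AGE_ib: visits at which age fell in bin b
    comb = np.zeros(6)        # COMB_ib: visits at which comorbidity index fell in bin b
    for v in visits:
        for j in v["treatments"]:
            tr[j] += 1
        for k in v["adverse_events"]:
            ae[k] += 1
        age[bin_index(v["age"], AGE_EDGES)] += 1
        comb[bin_index(v["comorbidity"], COMB_EDGES)] += 1
    return np.concatenate([np.asarray(fixed, dtype=float), tr, ae, age, comb])

# Hypothetical patient with two time-fixed covariates and two visits.
visits = [
    {"age": 65, "comorbidity": 2, "treatments": [0, 3], "adverse_events": []},
    {"age": 66, "comorbidity": 5, "treatments": [3], "adverse_events": [17]},
]
x_ext = extend_covariates([1.0, 0.0], visits)
```

With these settings the extended vector has the time-fixed covariates followed by 46 + 18 + 6 + 6 = 76 history tallies, so any of the four models can consume it without architectural changes.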

2.2. Experiments

We compared the performance of the four predictive models discussed above when using the patients’ time-fixed covariates versus when using our proposed extended patients’ covariate vectors to include time-fixed covariates and covariates that summarize the patient’s history from previous visits.

2.2.1. Study Data: SEER-Medicare Linked Dataset

We utilized the SEER-Medicare linked dataset, which provides information on cancer care and outcomes of Medicare beneficiaries with cancer [42]. Medicare data have a patient Entitlement and Diagnosis Summary File, a person-level file that provides SEER demographic and clinical information for up to 10 primary cancer diagnoses, treatments, and mortality. Medicare files capture the fee-for-service claims from hospitals, outpatient facilities, National Claims History, hospice care, home health agencies, and Part D Prescription Drug Event claims. The CCflag file includes, for every patient, the date the patient was diagnosed with one of twenty-two chronic conditions. We used these data to compute a time-varying patient’s comorbidity index. There were 883,053 BC patients during 1991–2016. We fused the data in the various files based on the Observational Medical Outcomes Partnership (OMOP) common data model. These data within those disparate files are transformed into a common format (data model) and a common representation (terminologies, vocabularies, coding schemes).

2.2.2. Cohort Selection

In our study, we included women diagnosed with stage I–III breast cancer who had no history of any other malignancy except non-melanoma skin and eyelid cancer, as is standard in NCI clinical trials [43]. We included all comorbidities recorded at every visit from the date of diagnosis, averaged over the course of all the treatments assessed for each patient, and all prior treatments, and we delineated age, race, marital status, breast cancer stage, tumor grade, laterality, and ER, PR, and HER2 status. We only included patients whose age at enrollment was 65 or older and who qualified by age, not disability. The enrolled population was, therefore, skewed by age and only represented an elderly subset of the breast cancer patient population. We included patients who were enrolled in both Parts A and B of Medicare with no HMO enrollment from 1 month prior to diagnosis through 20 years following diagnosis, hospice, or death to ensure that subjects were continuously enrolled in the proper parts of Medicare during the study period.
For our analysis, we divided the patient population into groups of patients with ER/PR + and − cancers, which are diseases with different genetic, behavioral, and survival characteristics [9], and patients with stages I, II, and III cancers, which have vastly different prognoses from each other [16]. Treatment affects patients with these different classifications differently according to guidelines developed based on clinical investigations [19,44,45].

2.2.3. Data Cleaning, Standardization, Encoding, and Embedding

We removed duplicate records and included valid data, e.g., ICD and HCPCS codes, dates, etc. We performed data transformation and standardization. We categorized treatments into 46 mechanistic categories and adverse events into 18 categories reported in the BC treatment literature. We applied embeddings, which resulted in the efficient computation and discovery of complex patterns, reduced overfitting, and captured the underlying structure of the data to generalize better to new, unseen data [46].

2.2.4. Performance Metrics

The concordance index (C-index) is the most commonly applied discriminative evaluation metric in survival analysis [41]. The cause-specific time-dependent C-index, which explicitly accounts for censoring, estimates the model’s prediction error [47]. We also measured the model’s performance using the integrated Brier score (IBS) [48,49,50], a metric of both the discrimination and the calibration of a model’s estimates.
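To illustrate what concordance measures, the sketch below computes the simpler Harrell's C-index from a single risk score per patient. This is a swapped-in, simplified variant: the study reports a time-dependent C-index, which additionally compares predictions at each event time. The function name and the toy data are hypothetical, and reading the prediction error as one minus concordance is an assumption, not something the text states.

```python
import numpy as np

def harrell_c_index(times, events, risk):
    """Fraction of comparable patient pairs ranked correctly by the risk score.

    A pair (i, j) is comparable when patient i had an observed event and
    patient j was still under observation after T_i. The pair is concordant
    when the patient with the earlier event has the higher predicted risk.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    risk = np.asarray(risk, dtype=float)

    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(len(times)):
            if times[j] > times[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    return concordant / comparable

# Hypothetical data: higher risk scores belong to shorter survival times.
c_perfect = harrell_c_index([1, 2, 3], [1, 1, 0], [3.0, 2.0, 1.0])
c_reversed = harrell_c_index([1, 2, 3], [1, 1, 0], [1.0, 2.0, 3.0])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a model that orders most comparable pairs correctly has a concordance close to 1.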

2.2.5. Model Hyperparameters

Hyperparameters are neural network parameters that are fixed by design rather than learned during training; they should be optimized. We applied Amazon SageMaker Python SDK 2.232.2 (software development kit), an open-source library, to fine-tune the model by identifying optimal values of the network’s hyperparameters. We used a Bayesian optimization search scheme [51]. Table 1 includes the list of the hyperparameters and their ranges. Figure 1 depicts the correlation between the performance metric, time-dependent concordance, and each of the model’s hyperparameters for the case of stage I and ER+.

2.2.6. Models Validation

We used the pycox package [41]. To validate our implementation, we applied the four models, DeepSurv, DeepHit, Nnet-survival, and Cox-Time, to the real datasets METABRIC [52] and SUPPORT [53]. The experiments were conducted using five-fold cross-validation. As shown in Table 2, our results confirm that “all methods perform quite similarly” [54] and are aligned with published results of each of these models using these datasets [54].

3. Results

The patient characteristics, including the number of entries and patients in the category, mean age, and comorbidity indices at the time of initial diagnosis, are presented in Table 3. The data concur with our prior observations with the SEER-Medicare file that ER− patients represent 17.3% of the patients. In the ER+ category, 59.7% of the patients had stage I cancer, and only 9.6% of patients had stage III cancer, while in the ER− group, only 42.8% of the patients had stage I disease, while 18.4% had stage III disease. These data show later-stage distributions in more aggressive ER− tumors than ER+ tumors [16].
The recurrence rates and survival of patients with ER+/PR+ cancers and ER−/PR− cancers vary with well-described characteristics [9,16]. To assess the degree of sensitivity of the model’s predictive accuracy and the model’s calibration to the ER/PR status of the patient population, we considered the following six scenarios of BC patients: scenario 1: stage I, ER/PR+; scenario 2: stage II, ER/PR+; scenario 3: stage III, ER/PR+; scenario 4: stage I, ER/PR−; scenario 5: stage II, ER/PR−; and scenario 6: stage III, ER/PR−.
Table 4 demonstrates the predictive performance of the four models when considering only the time-fixed covariates versus when considering the time-fixed and the time-varying covariates. In terms of concordance, we observe a significant improvement of the models with the proposed extended patients’ covariates compared with the patients’ time-fixed covariates alone. For example, the DeepSurv model’s prediction error is 4% when using the proposed extended patients’ covariates versus over 32% when using the patients’ time-fixed covariates.
Our results demonstrate that the performance of each of the models considered is relatively insensitive to the patient’s ER/PR status, both when considering only the patient’s covariates at diagnosis, or the patient’s time-varying covariates in addition to the patient’s covariate at diagnosis. The predictive capacity of the models improves significantly regardless of ER/PR status or stage when time-varying covariates are considered together with the time-fixed covariates.
To illustrate the practical application of the prediction models to clinical scenarios, we selected three hypothetical patients with unique individual patient, cancer, treatment, and adverse event features, and progressive age and comorbidity indices, and generated predicted survival curves based on their time-fixed and their time-varying variables (Figure 2). These data demonstrate the potential application of the model to individual patients with their unique characteristics to predict their own individual survival probabilities. We compared each of these hypothetical patients’ median predicted survival probability with the population-averaged predicted survival probability by age from Social Security tables [55], and by stage and race [16]. Our data demonstrate that both sets of population-averaged predicted values are vastly different from the predicted median survival of each patient generated by the models. Our model-predicted survivals are influenced by the standard time-fixed variables of age, race, stage, hormone and HER2 status, and tumor grade, as outlined in countless prior studies referenced above, as well as by the impact of treatments, treatment-associated adverse events, and age and comorbidity progression with treatment, which collectively have positive and negative impacts on the relationship of the model-based survival and population-averaged survival.

Model Interpretability

AI-based model understanding, an active area of research, helps provide insights into the models’ decision-making process. Examples of methods that attempt to break the “black-box” characterization of AI-based models are LIME (Local Interpretable Model-agnostic Explanations) [56], SHAP (SHapley Additive exPlanations) [57], and Captum, which is a state-of-the-art open-source, comprehensive library for deep learning PyTorch model explainability [58]. Limitations of these methods include computational complexity and instability, i.e., different runs may produce different explanations for the same instance.
In addition, interpreting the resulting covariate importance is a challenge, especially, as is the case in our environment, when the input covariates interact in a complex manner, resulting in computed attribution scores that do not capture the nonlinear dependencies between the network inputs and outputs.

4. Discussion

Our findings present a unique and compelling opportunity to improve the prediction performance of the four DL models that handle discrete-time distributions by extending the patients’ covariates vectors to include both time-fixed covariates and covariates that summarize the patient’s history from previous visits. The IBS can be viewed as the mean square error of prediction; lower values of the IBS indicate better predictive performance.
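For readers unfamiliar with the IBS, one common censoring-weighted definition is given here. The text does not state which variant was used, so this form and its symbols are assumptions: $\hat{S}(t \mid x_i)$ is the predicted survival probability, $T_i$ and $D_i$ are the observed time and event indicator, $\hat{G}$ is an estimate of the censoring survival function, and $[t_1, t_2]$ is the evaluation window.

```latex
\mathrm{BS}(t) = \frac{1}{N} \sum_{i=1}^{N} \left[
  \frac{\hat{S}(t \mid x_i)^2 \, \mathbb{1}\{T_i \le t,\; D_i = 1\}}{\hat{G}(T_i)}
  + \frac{\big(1 - \hat{S}(t \mid x_i)\big)^2 \, \mathbb{1}\{T_i > t\}}{\hat{G}(t)}
\right],
\qquad
\mathrm{IBS} = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} \mathrm{BS}(s)\, ds
```

Each term is a squared difference between the predicted survival probability and the observed status at time $t$, which is why the score reads as a mean squared error of prediction.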
Our analyses using time-fixed covariates in six different prognostic categories all demonstrated an error rate ranging from approximately 28% to 37%. When we combined time-varying covariates that included treatments, adverse events, aging, and comorbidities, the accuracy of all four predictive models improved significantly, with error rates decreasing to 0.4–16%. Although published survival predictions derived from time-fixed covariates vary significantly between stage I, II, and III cancers and between ER/PR+ and ER/PR− cancers, all four of the predictive models demonstrated the same narrow range of inaccuracy when trained on time-fixed covariates and all improved relatively equally to highly accurate estimates when we incorporated time-varying covariates into the modeling.
We can hypothesize that these trends result from our proposed extension to combine multiple time-varying events in the modeling. Indeed, survival from localized breast cancer is the result of the cumulative effects of multiple covariates that have a collective impact on time to death. These time-varying events include wide-ranging treatment categories [14,19,23,24], adverse events from treatment [59,60,61], patient health and mental health events [62,63,64], progressive age, and frailty [65]. Studies generally investigate treatments individually or in combination on their impact on recurrence and survival, which are linked nonlinearly based on a number of factors [19,44,66,67]. Most of the time-varying events have an impact individually on recurrence and survival, factors that depend on the initial cancer stage and hormonal status [15,68]. However, the totality of these time-varying events and their nuanced impact on individual scenarios resulted in a global improvement of predictability nearing unity.
There are several potential limitations to this study. Our patient population consisted of Medicare-enrolled patients over 65 who qualified by age and not for other medical conditions. Their age range, therefore, is not representative of the general population, and their overall expected survival may be less than that of patients with stage I–III BC at a younger age. They may also not be necessarily representative of patients with private insurance coverage. Due to the enrollment criteria, patients may have been diagnosed with BC and potentially received treatment before enrollment in Medicare, potentially skewing the results based on Medicare claims. To generalize the applicability of our conclusions of significant improvements in the accuracy of survival predictions by combining time-varying covariates with time-fixed covariates, we will recapitulate these approaches in the Medicaid datasets that are composed of younger patients more representative in age of the general population. The limitations of the older ICD-9 diagnosis codes and the lack of recurrence data will need to be addressed using additional datasets [69]. Nevertheless, the predictive modeling accuracies were highly concurrent among the four models in all disease scenarios, suggesting a similar efficacy when considering these additional confounding variables.
DL-based prediction models exhibit outstanding performance; predictive models for BC recurrence and survival often focus on limited covariates related to tumor, treatment, molecular, and clinical covariates. As part of a follow-up investigation, we plan to conduct an in-depth study that builds on our preliminary experimentation with Captum, where we applied the integrated gradient-based method. We plan to study Captum’s performance when applying DeepLift, FeatureAblation, and ArchDetect methods. Future investigations will focus on molecular characteristics and gene expression characteristics of different cancers to incorporate their impact on predictive probabilities. Future investigations of these models will also be conducted using datasets that record recurrence as well to address additional time-to-event endpoints, including time-to-recurrence and recurrence-free survival.

5. Conclusions

Our data demonstrate that predicting the survival of stage I–III BC patients using only time-fixed variables suffers from a significant error rate of around 30%. However, adding time-varying covariates to the time-fixed covariates in predictive modeling using four DL models significantly decreases the error rate to around 10%, regardless of the prognostic category of the patient groups, which have widely differing predicted survival hazard curves based on time-fixed data. The application of these models to individual patients led to predicted survival probabilities that are vastly different from, and more accurate than, population-averaged data based on time-fixed variables such as race, stage, patient overall health, and activity. This approach will have a significant impact on improving the faithfulness of survival estimates based on the unique variables of individual patients and can be applied as an adjunct tool in the clinical care of stage I–III BC patients.

Author Contributions

Conceptualization, N.A. and R.W.; methodology, N.A. and R.W.; software, N.A. and R.W.; validation, N.A. and R.W.; formal analysis, N.A. and R.W.; investigation, N.A. and R.W.; resources, N.A. and R.W.; data curation, N.A. and R.W.; writing—original draft, N.A. and R.W.; writing—review and editing, N.A. and R.W.; visualization, N.A. and R.W.; project administration, N.A. and R.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by 1. Northeast Big Data Innovation Hub, USA, GG014586-02 (N.A. and R.W.); 2. 2020 Busch Biomedical Grant Program, USA (N.A. and R.W.); 3. Amazon Web Services Health Equity Initiative (“HEI”) Program, USA, CC ADV 00011104 2023 TR. (N.A. and R.W.). This study used the linked SEER-Medicare database. The interpretation and reporting of these data are the sole responsibility of the authors. The authors acknowledge the efforts of the National Cancer Institute; Information Management Services (IMS), Inc.; and the Surveillance, Epidemiology, and End Results (SEER) Program tumor registries in the creation of the SEER-Medicare database and wish to thank them for their advice and review of the datasets designating the different treatment venues. The collection of cancer incidence data from the California Cancer Registry used in this study was supported by the California Department of Public Health pursuant to California Health and Safety Code Section 103885; Centers for Disease Control and Prevention’s (CDC) National Program of Cancer Registries, under cooperative agreement 1NU58DP007156; the National Cancer Institute’s Surveillance, Epidemiology and End Results Program under contract HHSN261201800032I awarded to the University of California, San Francisco, contract HHSN261201800015I awarded to the University of Southern California, and contract HHSN261201800009I awarded to the Public Health Institute. The ideas and opinions expressed herein are those of the author(s) and do not necessarily reflect the opinions of the State of California, Department of Public Health, the National Cancer Institute, and the Centers for Disease Control and Prevention or their Contractors and Subcontractors.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Rutgers Institutional Review Board under Exempt Review study number Pro20140000175.

Informed Consent Statement

Not applicable.

Data Availability Statement

Original data were obtained from SEER-Medicare under a two-tiered review process. SEER-Medicare data are available to investigators upon review.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Braun, S.; Vogl, F.D.; Naume, B.; Janni, W.; Osborne, M.P.; Coombes, R.C.; Schlimok, G.; Diel, I.J.; Gerber, B.; Gebauer, G.; et al. A pooled analysis of bone marrow micrometastasis in breast cancer. N. Engl. J. Med. 2005, 353, 793–802.
  2. Dent, R.; Trudeau, M.; Pritchard, K.I.; Hanna, W.M.; Kahn, H.K.; Sawka, C.A.; Lickley, L.A.; Rawlinson, E.; Sun, P.; Narod, S.A. Triple-negative breast cancer: Clinical features and patterns of recurrence. Clin. Cancer Res. 2007, 13, 4429–4434.
  3. Cheng, L.; Swartz, M.D.; Zhao, H.; Kapadia, A.S.; Lai, D.; Rowan, P.J.; Buchholz, T.A.; Giordano, S.H. Hazard of recurrence among women after primary breast cancer treatment—A 10-year follow-up using data from SEER-Medicare. Cancer Epidemiol. Biomark. Prev. 2012, 21, 800–809.
  4. Castellano, I.; Chiusa, L.; Vandone, A.M.; Beatrice, S.; Goia, M.; Donadio, M.; Arisio, R.; Muscarà, F.; Durando, A.; Viale, G.; et al. A simple and reproducible prognostic index in luminal ER-positive breast cancers. Ann. Oncol. 2013, 24, 2292–2297.
  5. Silber, J.H.; Rosenbaum, P.R.; Clark, A.S.; Giantonio, B.J.; Ross, R.N.; Teng, Y.; Wang, M.; Niknam, B.A.; Ludwig, J.M.; Wang, W.; et al. Characteristics associated with differences in survival among black and white women with breast cancer. JAMA 2013, 310, 389–397.
  6. Vera-Badillo, F.E.; Templeton, A.J.; de Gouveia, P.; Diaz-Padilla, I.; Bedard, P.L.; Al-Mubarak, M.; Amir, E. Androgen receptor expression and outcomes in early breast cancer: A systematic review and meta-analysis. J. Natl. Cancer Inst. 2014, 106, djt319.
  7. Daly, B.; Olopade, O.I. A perfect storm: How tumor biology, genomics, and health care delivery patterns collide to create a racial survival disparity in breast cancer and proposed interventions for change. CA Cancer J. Clin. 2015, 65, 221–238.
  8. Shi, R.; Taylor, H.; McLarty, J.; Liu, L.; Mills, G.; Burton, G. Effects of payer status on breast cancer survival: A retrospective study. BMC Cancer 2015, 15, 211.
  9. Colleoni, M.; Sun, Z.; Price, K.N.; Karlsson, P.; Forbes, J.F.; Thürlimann, B.; Gianni, L.; Castiglione, M.; Gelber, R.D.; Coates, A.S.; et al. Annual Hazard Rates of Recurrence for Breast Cancer During 24 Years of Follow-Up: Results From the International Breast Cancer Study Group Trials I to V. J. Clin. Oncol. 2016, 34, 927–935.
  10. Giuliani, J.; Mercanti, A.; Bonetti, A. Late recurrence (more than 10 years) in early (tumors equal to or smaller than 2 cm) breast cancer patients. Clin. Transl. Oncol. 2016, 18, 859–862.
  11. Miyoshi, Y.; Shien, T.; Ogiya, A.; Ishida, N.; Yamazaki, K.; Horii, R.; Horimoto, Y.; Masuda, N.; Yasojima, H.; Inao, T.; et al. Collaborative Study Group of Scientific Research of the Japanese Breast Cancer Society. Differences in expression of the cancer stem cell marker aldehyde dehydrogenase 1 among estrogen receptor-positive/human epidermal growth factor receptor type 2-negative breast cancer cases with early, late, and no recurrence. Breast Cancer Res. 2016, 18, 73.
  12. Janni, W.J.; Rack, B.; Terstappen, L.W.; Pierga, J.Y.; Taran, F.A.; Fehm, T.; Hall, C.; de Groot, M.R.; Bidard, F.C.; Friedl, T.W.; et al. Pooled analysis of the prognostic relevance of circulating tumor cells in primary breast cancer. Clin. Cancer Res. 2016, 22, 2583–2593.
  13. Geurts, Y.M.; Witteveen, A.; Bretveld, R.; Poortmans, P.M.; Sonke, G.S.; Strobbe, L.J.A.; Siesling, S. Patterns and predictors of first and subsequent recurrence in women with early breast cancer. Breast Cancer Res. Treat. 2017, 165, 709–720.
  14. Pan, H.; Gray, R.; Braybrooke, J.; Davies, C.; Taylor, C.; McGale, P.; Peto, R.; Pritchard, K.I.; Bergh, J.; Dowsett, M.; et al. EBCTCG. 20-Year Risks of Breast-Cancer Recurrence after Stopping Endocrine Therapy at 5 Years. N. Engl. J. Med. 2017, 377, 1836–1846.
  15. Sestak, I.; Zhang, Y.; Schroeder, B.E.; Schnabel, C.A.; Dowsett, M.; Cuzick, J.; Sgroi, D. Cross-Stratification and Differential Risk by Breast Cancer Index and Recurrence Score in Women with Hormone Receptor-Positive Lymph Node-Negative Early-Stage Breast Cancer. Clin. Cancer Res. 2016, 22, 5043–5048.
  16. Wieder, R.; Shafiq, B.; Adam, N. African American race is an independent risk factor in survival form initially diagnosed localized breast cancer. J. Cancer 2016, 7, 1587–1598.
  17. Tjensvoll, K.; Nordgård, O.; Skjæveland, M.; Oltedal, S.; Janssen, E.A.M.; Gilje, B. Detection of disseminated tumor cells in bone marrow predict late recurrences in operable breast cancer patients. BMC Cancer 2019, 19, 1131.
  18. Wieder, R.; Shafiq, B.; Adam, N. Greater Survival Improvement in African American vs. Caucasian Women with Hormone Negative Breast Cancer. J. Cancer 2020, 11, 2808–2820.
  19. Dar, H.; Johansson, A.; Nordenskjöld, A.; Iftimi, A.; Yau, C.; Perez-Tenorio, G.; Benz, C.; Nordenskjöld, B.; Stål, O.; Esserman, L.J.; et al. Assessment of 25-Year Survival of Women With Estrogen Receptor-Positive/ERBB2-Negative Breast Cancer Treated With and Without Tamoxifen Therapy: A Secondary Analysis of Data From the Stockholm Tamoxifen Randomized Clinical Trial. JAMA Netw. Open. 2021, 4, e2114904.
  20. Prakash, O.; Hossain, F.; Danos, D.; Lassak, A.; Scribner, R.; Miele, L. Racial disparities in triple negative breast cancer: A review of the role of biologic and non-biologic factors. Front. Public Health 2020, 8, 576964.
  21. Hoskins, K.F.; Danciu, O.C.; Ko, N.Y.; Calip, G.S. Association of Race/Ethnicity and the 21-Gene Recurrence Score With Breast Cancer-Specific Mortality Among US Women. JAMA Oncol. 2021, 7, 370–378.
  22. Hoskins, K.F.; Calip, G.S.; Huang, H.C.; Ibraheem, A.; Danciu, O.C.; Rauscher, G.H. Association of social determinants and tumor biology with racial disparity in survival from early-stage, hormone-dependent breast cancer. JAMA Oncol. 2023, 9, 536–545.
  23. Gagliato Dde, M.; Gonzalez-Angulo, A.M.; Lei, X.; Theriault, R.L.; Giordano, S.H.; Valero, V.; Hortobagyi, G.N.; Chavez-Macgregor, M. Clinical impact of delaying initiation of adjuvant chemotherapy in patients with breast cancer. J. Clin. Oncol. 2014, 32, 735–744.
  24. Sheppard, V.B.; Oppong, B.A.; Hampton, R.; Snead, F.; Horton, S.; Hirpa, F.; Brathwaite, E.J.; Makambi, K.; Onyewu, S.; Boisvert, M.; et al. Disparities in breast cancer surgery delay: The lingering effect of race. Ann. Surg. Oncol. 2015, 22, 2902–2911.
  25. Wieder, R. Awakening of Dormant Breast Cancer Cells in the Bone Marrow. Cancers 2023, 15, 3021.
  26. Dillekås, H.; Demicheli, R.; Ardoino, I.; Jensen, S.A.H.; Biganzoli, E.; Straume, O. The recurrence pattern following delayed breast reconstruction after mastectomy for breast cancer suggests a systemic effect of surgery on occult dormant micrometastases. Breast Cancer Res. Treat. 2016, 158, 169–178.
  27. Janni, W.; Vogl, F.D.; Wiedswang, G.; Synnestvedt, M.; Fehm, T.; Jückstock, J.; Borgen, E.; Rack, B.; Braun, S.; Sommer, H.; et al. Persistence of disseminated tumor cells in the bone marrow of breast cancer patients predicts increased risk for relapse—A European pooled analysis. Clin. Cancer Res. 2011, 17, 2967–2976.
  28. Kwan, M.L.; Kushi, L.H.; Weltzien, E.; Tam, E.K.; Castillo, A.; Sweeney, C.; Caan, B.J. Alcohol consumption and breast cancer recurrence and survival among women with early-stage breast cancer: The life after cancer epidemiology study. J. Clin. Oncol. 2010, 28, 4410–4416.
  29. Simapivapan, P.; Boltong, A.; Hodge, A. To what extent is alcohol consumption associated with breast cancer recurrence and second primary breast cancer? A systematic review. Cancer Treat. Rev. 2016, 50, 155–167.
  30. Nechuta, S.; Chen, W.Y.; Cai, H.; Poole, E.M.; Kwan, M.L.; Flatt, S.W.; Patterson, R.E.; Pierce, J.P.; Caan, B.J.; Ou Shu, X. A pooled analysis of post-diagnosis lifestyle factors in association with late estrogen-receptor-positive breast cancer prognosis. Int. J. Cancer 2016, 138, 2088–2097.
  31. Suresh, K.; Severn, C.; Ghosh, D. Survival prediction models: An introduction to discrete-time modeling. BMC Med. Res. Methodol. 2022, 22, 207.
  32. Kattan, M.W. Comparison of cox regression with other methods for determining prediction models and nomograms. J. Urol. 2003, 170, 6–10.
  33. Kleinbaum, D.G.; Klein, M. Survival Analysis; Springer: New York, NY, USA, 2010; Volume 3.
  34. Cox, D.R. Regression models and life-tables. J. R. Stat. Soc. Ser. B (Methodol.) 1972, 34, 187–202.
  35. Meir, T.; Gutman, R.; Gorfine, M. PyDTS: A Python Package for Discrete Time Survival Analysis with Competing Risks. arXiv 2022, arXiv:2204.05731.
  36. Meir, T.; Gorfine, M. Discrete-time Competing-Risks Regression with or without Penalization. arXiv 2023, arXiv:2303.01186.
  37. Gensheimer, M.F.; Narasimhan, B. A scalable discrete-time survival model for neural networks. PeerJ 2019, 7, e6257.
  38. Faraggi, D.; Simon, R. A neural network model for survival data. Stat. Med. 1995, 14, 73–82.
  39. Katzman, J.L.; Shaham, U.; Cloninger, A.; Bates, J.; Jiang, T.; Kluger, Y. DeepSurv: Personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med. Res. Methodol. 2018, 18, 24.
  40. Lee, C.; Zame, W.R.; Yoon, J.; van der Schaar, M. DeepHit: A deep learning approach to survival analysis with competing risks. Proc. AAAI Conf. Artif. Intell. 2018, 32, 2314–2321.
  41. Kvamme, H.; Borgan, Ø.; Scheel, I. Time-to-event prediction with neural networks and Cox regression. J. Mach. Learn. Res. 2019, 20, 1–30.
  42. Enewold, L.; Parsons, H.; Zhao, L.; Bott, D.; Rivera, D.R.; Barrett, M.J.; Virnig, B.A.; Warren, J.L. Updated overview of the SEER-Medicare data: Enhanced content and applications. JNCI Monogr. 2020, 2020, 3–13.
  43. Perez, M.; Murphy, C.C.; Pruitt, S.L.; Rashdan, S.; Rahimi, A.; Gerber, D.E. Potential Impact of Revised NCI Eligibility Criteria Guidance: Prior Malignancy Exclusion in Breast Cancer Clinical Trials. J. Natl. Compr. Cancer Netw. 2022, 20, 792–799.
  44. Goldvaser, H.; Ribnikar, D.; Majeed, H.; Ocana, A.; Amir, E. Absolute benefit from adjuvant chemotherapy in contemporary clinical trials: A systemic review and meta-analysis. Cancer Treat. Rev. 2018, 71, 68–75.
  45. Hayes, D.F. Disease related indicators for a proper choice of adjuvant treatments. Breast 2011, 20 (Suppl. S3), S162–S164.
  46. Guo, C.; Berkhahn, F. Entity embeddings of categorical variables. arXiv 2016, arXiv:1604.06737.
  47. Ishwaran, H.; Kogalur, U.B.; Blackstone, E.H.; Lauer, M.S. Random survival forests. Ann. Appl. Stat. 2008, 2, 841–860.
  48. Brier, G.W. Verification of forecasts expressed in terms of probability. Mon. Weather Rev. 1950, 78, 1–3.
  49. Graf, E.; Schmoor, C.; Sauerbrei, W.; Schumacher, M. Assessment and comparison of prognostic classification schemes for survival data. Stat. Med. 1999, 18, 2529–2545.
  50. Gerds, T.A.; Schumacher, M. Consistent estimation of the expected Brier score in general survival models with right-censored event times. Biom. J. 2006, 48, 1029–1040.
  51. Snoek, J.; Larochelle, H.; Adams, R.P. Practical bayesian optimization of machine learning algorithms. Adv. Neural Inf. Process. Syst. 2012, 25, 1–9.
  52. Mucaki, E.J.; Baranova, K.; Pham, H.Q.; Rezaeian, I.; Angelov, D.; Ngom, A.; Rueda, L.; Rogan, P.K. Predicting Outcomes of Hormone and Chemotherapy in the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) Study by Biochemically-inspired Machine Learning. F1000Research 2016, 5, 2124.
  53. Connors, A.F.; Dawson, N.V.; Desbiens, N.A.; Fulkerson, W.J.; Goldman, L.; Knaus, W.A.; Lynn, J.; Oye, R.K.; Bergner, M.; Damiano, A.; et al. A controlled trial to improve care for seriously ill hospitalized patients: The study to understand prognoses and preferences for outcomes and risks of treatments (SUPPORT). JAMA 1995, 274, 1591–1598.
  54. Kvamme, H.; Borgan, Ø. Continuous and discrete-time survival prediction with neural networks. Lifetime Data Anal. 2021, 27, 710–736.
  55. Social Security Period Life Table, 2021, As Used in the 2024 Trustees Report. Available online: https://www.ssa.gov/oact/STATS/table4c6.html (accessed on 8 April 2024).
  56. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the KDD ‘16: The 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016.
  57. Lundberg, S.M.; Lee, S.I. Consistent feature attribution for tree ensembles. arXiv 2017, arXiv:1706.06060.
  58. Miglani, V.; Yang, A.; Markosyan, A.H.; Garcia-Olano, D.; Kokhlikyan, N. Using captum to explain generative language models. arXiv 2023, arXiv:2312.05491.
  59. Russo, A.; Autelitano, M.; Bisanti, L. Re: Frequency and cost of chemotherapy-related serious adverse effects in a population sample of women with breast cancer. J. Natl. Cancer Inst. 2006, 98, 1826–1827.
  60. Nyrop, K.A.; Damone, E.M.; Deal, A.M.; Wheeler, S.B.; Charlot, M.; Reeve, B.B.; Basch, E.; Shachar, S.S.; Carey, L.A.; Reeder-Hayes, K.E.; et al. Patient-reported treatment toxicity and adverse events in Black and White women receiving chemotherapy for early breast cancer. Breast Cancer Res. Treat. 2022, 191, 409–422.
  61. Rosenzweig, M.Q.; Mazanec, S.R. Racial differences in breast cancer therapeutic toxicity: Implications for practice. Cancer Epidemiol. Biomark. Prev. 2023, 32, 157–158.
  62. Barnett, K.; Mercer, S.W.; Norbury, M.; Watt, G.; Wyke, S.; Guthrie, B. Epidemiology of multimorbidity and implications for health care, research, and medical education: A cross-sectional study. Lancet 2012, 380, 37–43.
  63. Freid, V.M.; Bernstein, A.B.; Bush, M.A. Multiple chronic conditions among adults aged 45 and over: Trends over the past 10 years. NCHS Data Brief 2012, 100, 1–8. Available online: http://ovidsp.ovid.com/ovidweb.cgi?T=JS&PAGE=reference&D=med9&NEWS=N&AN=23101759 (accessed on 8 April 2024).
  64. McGrath, J.J.; Al-Hamzawi, A.; Alonso, J.; Altwaijri, Y.; Andrade, L.H.; Bromet, E.J.; Bruffaerts, R.; de Almeida, J.M.C.; Chardoul, S.; Chiu, W.T.; et al. Age of onset and cumulative risk of mental disorders: A cross-national analysis of population surveys from 29 countries. Lancet Psychiatry 2023, 10, 668–681.
  65. Schoenborn, N.L.; Blackford, A.L.; Joshu, C.E.; Boyd, C.M.; Varadhan, R. Life expectancy estimates based on comorbidities and frailty to inform preventive care. J. Am. Geriatr. Soc. 2022, 70, 99–109.
  66. Jatoi, I.; Anderson, W.F.; Jeong, J.H.; Redmond, C.K. Breast cancer adjuvant therapy: Time to consider its time-dependent effects. J. Clin. Oncol. 2011, 29, 2301–2304.
  67. Hudis, C.A.; Dickler, M. Increasing precision in adjuvant therapy for breast cancer. N. Engl. J. Med. 2016, 375, 790–791.
  68. Chan, N.; Toppmeyer, D.L. The Final Verdict: Chemotherapy Benefits Estrogen Receptor-Negative Isolated Local Recurrence. J. Clin. Oncol. 2018, 36, 1058–1059.
  69. El Haji, H.; Souadka, A.; Patel, B.N.; Sbihi, N.; Ramasamy, G.; Patel, B.K.; Ghogho, M.; Banerjee, I. Evolution of Breast Cancer Recurrence Risk Prediction: A Systematic Review of Statistical and Machine Learning-Based Models. JCO Clin. Cancer Inform. 2023, 7, e2300049.
Figure 1. The C-index vs. the model hyperparameters: n-layers, lr, batch size, epochs, dropout, n_nodes, alpha, sigma.
Figure 2. Predicted survival curves for three hypothetical patients. Details of data used to produce these hypothetical predicted survival curves are not permitted by NCI SEER-Medicare because they were derived from individual patient-identifying information.
Table 1. Hyperparameters.

| Hyperparameter | Type | Range |
| --- | --- | --- |
| Batch size | Categorical | [32, 64, 128, 256, 512] |
| Epochs | Categorical | [100, 200, 300, 500] |
| Dropout rate | Categorical | [0.001, 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8] |
| Number of layers | Integer values | [2, 5] |
| Number of nodes | Categorical | [32, 64, 128, 256, 512] |
| Alpha | Categorical | [0.0, 0.001, 0.1, 0.2, 0.5, 0.8, 0.9, 0.99, 1.0] |
| Sigma | Categorical | [0.01, 0.1, 0.25, 0.5, 1.0, 10, 100] |
| Learning rate | Continuous | [0.0001, 0.1] |
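To make the search space in Table 1 concrete, the sketch below draws candidate configurations from it. This is only an illustration under stated assumptions: it uses plain random sampling, which is not necessarily the search strategy used in the study; it treats the number of layers as an inclusive integer range; and it samples the learning rate log-uniformly, a common but assumed choice. The names `SEARCH_SPACE` and `sample_config` are hypothetical.

```python
import math
import random

# Categorical choices copied from Table 1.
SEARCH_SPACE = {
    "batch_size": [32, 64, 128, 256, 512],
    "epochs": [100, 200, 300, 500],
    "dropout": [0.001, 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    "n_nodes": [32, 64, 128, 256, 512],
    "alpha": [0.0, 0.001, 0.1, 0.2, 0.5, 0.8, 0.9, 0.99, 1.0],
    "sigma": [0.01, 0.1, 0.25, 0.5, 1.0, 10, 100],
}
N_LAYERS_RANGE = (2, 5)    # assumed to be an inclusive integer range
LR_RANGE = (0.0001, 0.1)   # continuous range from Table 1


def sample_config(rng):
    """Draw one hyperparameter configuration at random."""
    config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
    config["n_layers"] = rng.randint(*N_LAYERS_RANGE)  # randint includes both ends
    # Assumption: sample the learning rate uniformly on a log10 scale.
    log_lo, log_hi = math.log10(LR_RANGE[0]), math.log10(LR_RANGE[1])
    config["lr"] = 10 ** rng.uniform(log_lo, log_hi)
    return config


rng = random.Random(0)  # fixed seed so the draws are reproducible
configs = [sample_config(rng) for _ in range(5)]
```

Each sampled dictionary could then be passed to a training routine and scored on a validation metric such as the C-index, keeping the best-scoring configuration.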
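Table 2. Validation of applied models. "Literature" denotes values as reported in the literature.

| Model | Time-Dependent Concordance, Support: Literature | Time-Dependent Concordance, Support: Our Results | Time-Dependent Concordance, Metabric: Literature | Time-Dependent Concordance, Metabric: Our Results | Integrated Brier Score, Support: Literature | Integrated Brier Score, Support: Our Results | Integrated Brier Score, Metabric: Literature | Integrated Brier Score, Metabric: Our Results |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cox-Time | 0.630 | 0.647 | 0.664 | 0.683 | 0.212 | 0.182 | 0.173 | 0.150 |
| DeepHit | 0.639 | 0.646 | 0.675 | 0.676 | 0.227 | 0.196 | 0.186 | 0.103 |
| DeepSurv | 0.615 | 0.630 | 0.640 | 0.710 | 0.213 | 0.231 | 0.175 | 0.136 |
| Nnet-Survival (Logistic Hazard) | 0.625 | 0.617 | 0.658 | 0.674 | 0.184 | 0.205 | 0.172 | 0.142 |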
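For readers unfamiliar with concordance, the sketch below computes Harrell's C-index for right-censored data. It is a simplified relative of the time-dependent concordance reported in Table 2, not the exact metric or code used in the study; the function name and the four-patient example are hypothetical. A pair of patients is comparable when the one with the shorter follow-up time had an observed event, and it is concordant when that patient also received the higher predicted risk.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs ordered correctly.

    times       -- follow-up time per patient
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- higher score means higher predicted risk (shorter survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair must be anchored on an observed event
        for j in range(n):
            if times[i] < times[j]:  # patient i failed before patient j's time
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: the third patient is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(times, events, risk_scores)  # 4 of 5 pairs concordant
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives a scale for reading the concordance columns in the tables.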
Table 3. Patient characteristics.

| Stage | ER/PR+: Number of Entries | ER/PR+: Number of Patients | ER/PR+: Age ± SD | ER/PR+: Comorbidity Index ± SD | ER/PR−: Number of Entries | ER/PR−: Number of Patients | ER/PR−: Age ± SD | ER/PR−: Comorbidity Index ± SD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| I | 17,400,569 | 92,467 | 74.7 ± 6.8 | 2.9 ± 2.9 | 2,741,780 | 13,880 | 74.0 ± 6.7 | 2.8 ± 2.9 |
| II | 8,828,801 | 47,469 | 75.7 ± 7.5 | 2.9 ± 3.1 | 2,292,617 | 12,560 | 75.2 ± 7.5 | 2.6 ± 3.0 |
| III | 2,604,115 | 14,825 | 75.8 ± 7.6 | 2.5 ± 3.0 | 979,047 | 5966 | 75.7 ± 7.7 | 2.1 ± 2.9 |
Table 4. Time-dependent concordance and integrated Brier score. All values are mean ± SD. "Time-Fixed" denotes the SM_Time-Fixed Patients’ Covariates models; "Fixed & Varying" denotes the SM_Time-Fixed & Varying Patients’ Covariates models.

| Stage | Model | ER/PR+ Concordance: Time-Fixed | ER/PR+ Concordance: Fixed & Varying | ER/PR+ Integrated Brier Score: Time-Fixed | ER/PR+ Integrated Brier Score: Fixed & Varying | ER/PR− Concordance: Time-Fixed | ER/PR− Concordance: Fixed & Varying | ER/PR− Integrated Brier Score: Time-Fixed | ER/PR− Integrated Brier Score: Fixed & Varying |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| I | Cox-Time | 0.679 ± 0.001 | 0.987 ± 0.001 | 0.112 ± 0.002 | 0.009 ± 0.003 | 0.690 ± 0.005 | 0.987 ± 0.002 | 0.120 ± 0.002 | 0.011 ± 0.001 |
| I | DeepHit | 0.667 ± 0.002 | 0.958 ± 0.001 | 0.110 ± 0.003 | 0.013 ± 0.001 | 0.671 ± 0.003 | 0.960 ± 0.003 | 0.127 ± 0.001 | 0.042 ± 0.003 |
| I | DeepSurv | 0.682 ± 0.001 | 0.969 ± 0.002 | 0.110 ± 0.003 | 0.030 ± 0.009 | 0.670 ± 0.006 | 0.996 ± 0.001 | 0.117 ± 0.001 | 0.018 ± 0.003 |
| I | Nnet-Survival (Logistic Hazard) | 0.668 ± 0.001 | 0.976 ± 0.001 | 0.131 ± 0.003 | 0.037 ± 0.002 | 0.642 ± 0.005 | 0.980 ± 0.001 | 0.110 ± 0.002 | 0.042 ± 0.002 |
| II | Cox-Time | 0.689 ± 0.003 | 0.988 ± 0.001 | 0.106 ± 0.001 | 0.007 ± 0.003 | 0.676 ± 0.006 | 0.978 ± 0.003 | 0.110 ± 0.002 | 0.011 ± 0.001 |
| II | DeepHit | 0.722 ± 0.001 | 0.988 ± 0.001 | 0.122 ± 0.001 | 0.080 ± 0.007 | 0.724 ± 0.003 | 0.842 ± 0.005 | 0.129 ± 0.002 | 0.001 ± 0.003 |
| II | DeepSurv | 0.672 ± 0.003 | 0.965 ± 0.003 | 0.105 ± 0.001 | 0.029 ± 0.001 | 0.663 ± 0.006 | 0.993 ± 0.001 | 0.104 ± 0.001 | 0.026 ± 0.002 |
| II | Nnet-Survival (Logistic Hazard) | 0.661 ± 0.001 | 0.977 ± 0.001 | 0.110 ± 0.001 | 0.038 ± 0.003 | 0.618 ± 0.002 | 0.984 ± 0.001 | 0.118 ± 0.001 | 0.024 ± 0.001 |
| III | Cox-Time | 0.642 ± 0.001 | 0.981 ± 0.002 | 0.091 ± 0.003 | 0.008 ± 0.001 | 0.709 ± 0.005 | 0.968 ± 0.004 | 0.080 ± 0.001 | 0.011 ± 0.001 |
| III | DeepHit | 0.621 ± 0.002 | 0.981 ± 0.001 | 0.094 ± 0.001 | 0.052 ± 0.003 | 0.703 ± 0.004 | 0.973 ± 0.001 | 0.085 ± 0.001 | 0.089 ± 0.007 |
| III | DeepSurv | 0.660 ± 0.004 | 0.984 ± 0.006 | 0.089 ± 0.003 | 0.024 ± 0.002 | 0.666 ± 0.014 | 0.993 ± 0.002 | 0.079 ± 0.002 | 0.024 ± 0.001 |
| III | Nnet-Survival (Logistic Hazard) | 0.627 ± 0.002 | 0.944 ± 0.009 | 0.091 ± 0.003 | 0.005 ± 0.002 | 0.604 ± 0.005 | 0.944 ± 0.003 | 0.087 ± 0.002 | 0.064 ± 0.007 |
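As a reading aid for the integrated Brier score columns, the sketch below computes a simplified version of the metric. It assumes every event time is observed (no censoring), so it omits the inverse-probability-of-censoring weights that a full implementation for censored data would need; it is not the study's evaluation code, and the function names, toy event times, and time grid are hypothetical. At each time point the Brier score is the mean squared difference between the observed survival status and the predicted survival probability, and the integrated score averages this over a time grid with the trapezoidal rule.

```python
def brier_score(event_times, surv_probs, t):
    """Brier score at time t for fully observed (uncensored) event times."""
    n = len(event_times)
    total = 0.0
    for T, s in zip(event_times, surv_probs):
        alive = 1.0 if T > t else 0.0  # observed status at time t
        total += (alive - s) ** 2
    return total / n


def integrated_brier_score(event_times, surv_fn, grid):
    """Trapezoidal average of the Brier score over an increasing time grid.

    surv_fn(i, t) returns the predicted survival probability of patient i at t.
    """
    n = len(event_times)
    scores = [
        brier_score(event_times, [surv_fn(i, t) for i in range(n)], t)
        for t in grid
    ]
    area = sum(
        0.5 * (scores[k] + scores[k + 1]) * (grid[k + 1] - grid[k])
        for k in range(len(grid) - 1)
    )
    return area / (grid[-1] - grid[0])


# Hypothetical toy data (arbitrary time units).
event_times = [2.0, 5.0, 7.0, 9.0]
grid = [1.0, 3.0, 6.0, 8.0]


def coin_flip(i, t):
    return 0.5  # uninformative prediction


def oracle(i, t):
    return 1.0 if event_times[i] > t else 0.0  # perfect prediction


ibs_coin = integrated_brier_score(event_times, coin_flip, grid)
ibs_oracle = integrated_brier_score(event_times, oracle, grid)
```

The uninformative predictor scores 0.25 and the perfect predictor scores 0, so lower values are better and values well below 0.25 indicate informative predictions.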
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Adam, N.; Wieder, R. AI Survival Prediction Modeling: The Importance of Considering Treatments and Changes in Health Status over Time. Cancers 2024, 16, 3527. https://doi.org/10.3390/cancers16203527
