SSL-SurvFormer: A Self-Supervised Learning and Continuously Monotonic Transformer Network for Missing Values in Survival Analysis
Abstract
1. Introduction
2. Related Work
3. Proposed Method
3.1. Problem Setup and Notations
3.2. SurvFormer: Network Architecture
3.2.1. Feature Embedding
3.2.2. Transformer
Algorithm 1: Transformer
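The pseudocode of Algorithm 1 is not reproduced above. As a hedged sketch of the kind of pre-norm Transformer encoder layer described in [139,148,149], applied to a sequence of embedded feature tokens; the widths `d_model`, `n_heads`, `d_ff`, and the GELU feed-forward are illustrative assumptions, not necessarily the configuration used in Algorithm 1:

```python
import torch
import torch.nn as nn

class PreNormEncoderLayer(nn.Module):
    """A generic pre-norm Transformer encoder layer over embedded feature tokens.

    Sketch only: d_model, n_heads, d_ff, and the GELU feed-forward are
    illustrative choices, not the paper's exact Algorithm 1."""
    def __init__(self, d_model: int = 64, n_heads: int = 4, d_ff: int = 128, dropout: float = 0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features, d_model), one token per tabular feature
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out                    # residual around self-attention
        return x + self.ff(self.norm2(x))   # residual around feed-forward
```

Stacking several such layers and pooling the resulting tokens yields the context embedding consumed by the monotone survival head of Section 3.2.3.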
3.2.3. Monotonically Positive
- As $t$ increases, $S(t)$ should decrease; the survival function must be monotonically non-increasing.
- $S(0) = 1$. That is, at the start of the study no one has experienced the event, so the probability of surviving at $t = 0$ is 1.
- $S(\infty) = 0$. If the study were to go to $t = \infty$, then everyone would eventually experience the event, and hence the survival probability must be 0. (A minimal sketch of a monotone construction with these properties follows this list.)
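A minimal sketch of a monotone survival head with the first property, assuming softplus-reparameterized positive weights on the time path (the class names and layer sizes are illustrative; the paper's exact "monotonically positive" construction may differ). Positive weights composed with monotone activations make the predicted CDF non-decreasing in $t$, and the sigmoid bounds it in $(0, 1)$; the endpoint conditions $S(0) = 1$ and $S(\infty) = 0$ are only approximated here and would need an explicit construction to hold exactly:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PositiveLinear(nn.Module):
    """Linear layer whose effective weights are forced positive via softplus,
    so its output is non-decreasing in every input coordinate."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, F.softplus(self.weight), self.bias)

class MonotoneSurvivalHead(nn.Module):
    """Maps (context embedding z, time t) to S(t | z), non-increasing in t.
    Sketch only: just the t-path needs positive weights; widths are illustrative."""
    def __init__(self, d_context: int = 64, d_hidden: int = 32):
        super().__init__()
        self.t_in = PositiveLinear(1, d_hidden)
        self.z_in = nn.Linear(d_context, d_hidden)  # context may enter unconstrained
        self.h1 = PositiveLinear(d_hidden, d_hidden)
        self.out = PositiveLinear(d_hidden, 1)

    def forward(self, z: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # tanh is monotone, so positive-weight composition stays monotone in t
        h = torch.tanh(self.t_in(t) + self.z_in(z))
        h = torch.tanh(self.h1(h))
        cdf = torch.sigmoid(self.out(h))  # F(t|z) in (0,1), non-decreasing in t
        return 1.0 - cdf                  # S(t|z), non-increasing in t
```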
3.2.4. Loss Function
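For a network that outputs $S(t \mid x)$ directly in continuous time, a standard training objective (used, e.g., by monotone-network survival models such as SuMo-Net [13]; whether SSL-SurvFormer's loss matches it exactly is not shown here) is the right-censored negative log-likelihood $-[\delta \log f(t \mid x) + (1 - \delta)\log S(t \mid x)]$, with the density $f = -\partial S / \partial t$ obtained by automatic differentiation. A hedged sketch reusing the monotone head above:

```python
import torch

def right_censored_nll(model, z, t, event, eps: float = 1e-8):
    """Right-censored negative log-likelihood (sketch):
        -[ delta * log f(t|z) + (1 - delta) * log S(t|z) ],
    with f(t|z) = -dS/dt via autograd. `model` is assumed to return S(t|z),
    as in the MonotoneSurvivalHead sketch; `event` has shape (batch, 1)."""
    t = t.clone().requires_grad_(True)
    s = model(z, t)  # S(t|z), shape (batch, 1)
    # per-sample dS/dt (sum trick: s_i depends only on t_i);
    # create_graph=True keeps the derivative differentiable for training
    ds_dt, = torch.autograd.grad(s.sum(), t, create_graph=True)
    f = (-ds_dt).clamp_min(eps)  # density; non-negative since S is non-increasing
    loglik = event * torch.log(f) + (1.0 - event) * torch.log(s.clamp_min(eps))
    return -loglik.mean()
```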
3.3. SSL-SurvFormer: SurvFormer with SSL
3.3.1. Data Augmentation
3.3.2. Pre-Training
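The exact augmentation and pre-training objectives are defined in the full text; the search space in Appendix C lists a corrupt ratio, which points to a random-feature-corruption scheme. A hedged sketch of that general recipe, in the spirit of VIME-style tabular self-supervision (the shuffle-based corruption, the reconstruction/mask-prediction targets, and the 0.3 ratio are illustrative assumptions, not the paper's exact choices):

```python
import torch

def corrupt_features(x: torch.Tensor, corrupt_ratio: float = 0.3):
    """Randomly replace a fraction of entries in each row with values drawn
    from the same column in other rows (empirical marginal); return the
    corrupted batch and the binary corruption mask."""
    mask = (torch.rand_like(x) < corrupt_ratio).float()
    # column-wise replacement values obtained by shuffling the rows
    shuffled = x[torch.randperm(x.size(0))]
    x_corrupt = (1.0 - mask) * x + mask * shuffled
    return x_corrupt, mask

# A pre-training step could then encode x_corrupt and train two small heads:
# one reconstructing x (regression loss) and one predicting `mask` (binary
# cross-entropy), before fine-tuning the encoder on the survival objective.
```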
4. Experiments
4.1. Datasets
4.2. Evaluation Metrics
4.3. Implementation Details
4.4. Performance Comparison
4.5. Results Analysis
4.6. Ablation Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Evaluation Metric Details
Appendix A.1. Quantile Metrics
- 1. $T_i$: the time at which the event actually occurred, or at which instance $i$ was censored.
- 2. $\hat{r}_i(t)$: the risk score of instance $i$ at time $t$.
- 3. $\mathbb{1}[T_i \le t]$: equals 1 if $T_i \le t$, else 0.
- 4. $\delta_i$: equals 1 if the event occurred, else 0. (A minimal concordance computation built from these ingredients follows this list.)
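For concreteness, a minimal $O(n^2)$ sketch of a time-dependent concordance at a fixed horizon $t$, built from the four ingredients above; the paper's estimator (e.g., Antolini's time-dependent C-index [161]) may weight comparable pairs differently:

```python
import numpy as np

def c_index_at_t(time, event, risk_at_t, t):
    """Time-dependent concordance at horizon t (sketch).
    A pair (i, j) is comparable when i has an observed event with
    T_i <= t and T_i < T_j; it is concordant when risk_i(t) > risk_j(t).
    Ties in risk count as 0.5."""
    n = len(time)
    concordant, comparable = 0.0, 0
    for i in range(n):
        if not (event[i] and time[i] <= t):
            continue
        for j in range(n):
            if time[j] > time[i]:
                comparable += 1
                if risk_at_t[i] > risk_at_t[j]:
                    concordant += 1.0
                elif risk_at_t[i] == risk_at_t[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")
```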
Appendix A.2. Overall Metrics
- 1. $T_i$: the time at which the event actually occurred, or at which instance $i$ was censored.
- 2. $\hat{r}_i$: the risk score of instance $i$ at the time $T_i$.
- 3. $\mathbb{1}[T_i \le t]$: equals 1 if $T_i \le t$, else 0 (integrated over $t$ for the overall metrics).
- 4. $\delta_i$: equals 1 if the event occurred, else 0. (A library-based example follows this list.)
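In practice the overall metrics can be computed with existing libraries. Assuming scikit-survival is installed (the toy arrays below are illustrative), Harrell's C-index [159] from these ingredients looks like:

```python
import numpy as np
from sksurv.metrics import concordance_index_censored

# event: True if the event occurred (delta_i), time: observed/censored times (T_i),
# risk: the model's risk score for each instance.
event = np.array([True, False, True, True])
time = np.array([2.0, 5.0, 3.5, 1.0])
risk = np.array([0.9, 0.2, 0.6, 0.95])

# returns (cindex, concordant, discordant, tied_risk, tied_time)
cindex, *_ = concordance_index_censored(event, time, risk)
print(f"Harrell C-index: {cindex:.3f}")
```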
Appendix B. Datasets
Appendix C. Hyperparameter Search Spaces
Hyperparameter | Values |
---|---|
Layers | |
Nodes per layer | |
Dropout | |
Weight decay | |
Learning rate | |
Batch size | |
Loss (DeepHit) | |
Loss (DeepHit) |
Hyperparameter | Values |
---|---|
Layers | |
Layers Positive | |
Nodes per layer | |
Nodes per positive layer | |
Dropout | |
Weight decay | |
Learning rate | |
Batch size |
Hyperparameter | Values |
---|---|
Feature embedding size | |
Attention heads | |
Learning rate | |
Weight decay | |
Batch size |
Hyperparameter | Values |
---|---|
Feature embedding size (D) | |
No. attention heads | |
No. layers Positive | |
No. nodes per positive layer | |
Learning rate | |
Weight decay | |
Batch size | |
Loss | |
Loss |
Hyperparameter | Values |
---|---|
Corrupt ratio | |
Learning rate | |
Weight decay | |
Batch size | |
Loss |
References
- Ziehm, M.; Thornton, J.M. Unlocking the potential of survival data for model organisms through a new database and online analysis platform: SurvCurv. Aging Cell 2013, 12, 910–916. [Google Scholar] [CrossRef] [PubMed]
- Susto, G.A.; Schirru, A.; Pampuri, S.; McLoone, S.; Beghi, A. Machine Learning for Predictive Maintenance: A Multiple Classifier Approach. IEEE Trans. Ind. Inform. 2015, 11, 812–820. [Google Scholar] [CrossRef]
- Laurie, J.A.; Moertel, C.G.; Fleming, T.R.; Wieand, H.S.; Leigh, J.E.; Rubin, J.; McCormack, G.W.; Gerstner, J.B.; Krook, J.E.; Malliard, J. Surgical adjuvant therapy of large-bowel carcinoma: An evaluation of levamisole and the combination of levamisole and fluorouracil. The North Central Cancer Treatment Group and the Mayo Clinic. J. Clin. Oncol. 1989, 7, 1447–1456. [Google Scholar] [CrossRef] [PubMed]
- Dirick, L.; Claeskens, G.; Baesens, B. Time to default in credit scoring using survival analysis: A benchmark study. J. Oper. Res. Soc. 2017, 68, 652–665. [Google Scholar] [CrossRef]
- Van den Poel, D.; Larivière, B. Customer attrition analysis for financial services using proportional hazard models. Eur. J. Oper. Res. 2004, 157, 196–217. [Google Scholar] [CrossRef]
- Kaplan, E.L.; Meier, P. Nonparametric Estimation from Incomplete Observations. J. Am. Stat. Assoc. 1958, 53, 457–481. [Google Scholar] [CrossRef]
- Cox, D.R. Regression Models and Life-Tables. J. R. Stat. Soc. Ser. B (Methodol.) 1972, 34, 187–220. [Google Scholar] [CrossRef]
- Kalbfleisch, J.D.; Prentice, R.L. The Statistical Analysis of Failure Time Data; John Wiley & Sons: Hoboken, NJ, USA, 2011. [Google Scholar]
- Pölsterl, S.; Navab, N.; Katouzian, A. Fast training of support vector machines for survival analysis. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Porto, Portugal, 7–11 September 2015; pp. 243–259. [Google Scholar]
- Van Belle, V.; Pelckmans, K.; Suykens, J.A.; Van Huffel, S. Support vector methods for survival analysis: A comparison between ranking and regression approaches. Artif. Intell. Med. 2011, 53, 107–118. [Google Scholar] [CrossRef]
- Ishwaran, H.; Kogalur, U.B.; Blackstone, E.H.; Lauer, M.S. Random survival forests. Ann. Appl. Stat. 2008, 2, 841–860. [Google Scholar] [CrossRef]
- Wang, X.; Pakbin, A.; Mortazavi, B.; Zhao, H.; Lee, D. BoXHED: Boosted exact Hazard estimator with dynamic covariates. In Proceedings of the ICML, Virtual, 13–18 July 2020; pp. 9973–9982. [Google Scholar]
- Rindt, D.; Hu, R.; Steinsaltz, D.; Sejdinovic, D. Survival regression with proper scoring rules and monotonic neural networks. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Virtual, 28–30 March 2022; pp. 1190–1205. [Google Scholar]
- Lee, C.; Zame, W.; Yoon, J.; Van Der Schaar, M. Deephit: A deep learning approach to survival analysis with competing risks. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar]
- Katzman, J.L.; Shaham, U.; Cloninger, A.; Bates, J.; Jiang, T.; Kluger, Y. DeepSurv: Personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med. Res. Methodol. 2018, 18, 24. [Google Scholar] [CrossRef]
- Wang, Z.; Sun, J. Survtrace: Transformers for survival analysis with competing events. In Proceedings of the ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, Northbrook, IL, USA, 7–10 August 2022; pp. 1–9. [Google Scholar]
- Hu, S.; Fridgeirsson, E.; van Wingen, G.; Welling, M. Transformer-based deep survival analysis. In Proceedings of the Survival Prediction-Algorithms, Challenges and Applications, Palo Alto, CA, USA, 22–24 March 2021; pp. 132–148. [Google Scholar]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
- Truong, T.D.; Duong, C.N.; Pham, H.A.; Raj, B.; Le, N.; Luu, K. The right to talk: An audio-visual transformer approach. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 1105–1114. [Google Scholar]
- Karita, S.; Chen, N.; Hayashi, T.; Hori, T.; Inaguma, H.; Jiang, Z.; Someki, M.; Soplin, N.E.Y.; Yamamoto, R.; Wang, X.; et al. A comparative study on transformer vs rnn in speech applications. In Proceedings of the 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Singapore, 14–18 December 2019. [Google Scholar]
- Dong, L.; Xu, S.; Xu, B. Speech-transformer: A no-recurrence sequence-to-sequence model for speech recognition. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 5884–5888. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021. [Google Scholar]
- Tran, M.; Vo, K.; Yamazaki, K.; Fernandes, A.; Kidd, M.; Le, N. AISFormer: Amodal Instance Segmentation with Transformer. arXiv 2022, arXiv:2210.06323. [Google Scholar]
- Yamazaki, K.; Vo, K.; Truong, Q.S.; Raj, B.; Le, N. VLTinT: Visual-linguistic transformer-in-transformer for coherent video paragraph captioning. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 3081–3090. [Google Scholar]
- Vo, K.; Truong, S.; Yamazaki, K.; Raj, B.; Tran, M.T.; Le, N. Aoe-net: Entities interactions modeling with adaptive attention mechanism for temporal action proposals generation. Int. J. Comput. Vis. 2023, 131, 302–323. [Google Scholar] [CrossRef]
- Huang, X.; Khetan, A.; Cvitkovic, M.; Karnin, Z. TabTransformer: Tabular Data Modeling Using Contextual Embeddings. arXiv 2020, arXiv:2012.06678. [Google Scholar]
- Gorishniy, Y.; Rubachev, I.; Khrulkov, V.; Babenko, A. Revisiting Deep Learning Models for Tabular Data. arXiv 2021, arXiv:2106.11959. [Google Scholar]
- Dispenzieri, A.; Katzmann, J.A.; Kyle, R.A.; Larson, D.R.; Therneau, T.M.; Colby, C.L.; Clark, R.J.; Mead, G.P.; Kumar, S.; Melton, L.J., III; et al. Use of nonclonal serum immunoglobulin free light chains to predict overall survival in the general population. Mayo Clin. Proc. 2012, 87, 517–523. [Google Scholar] [CrossRef]
- Curtis, C.; Shah, S.P.; Chin, S.F.; Turashvili, G.; Rueda, O.M.; Dunning, M.J.; Speed, D.; Lynch, A.G.; Samarajiwa, S.; Yuan, Y.; et al. The genomic and transcriptomic architecture of 2000 breast tumours reveals novel subgroups. Nature 2012, 486, 346–352. [Google Scholar] [CrossRef]
- Knaus, W.A.; Harrell, F.E.; Lynn, J.; Goldman, L.; Phillips, R.S.; Connors, A.F.; Dawson, N.V.; Fulkerson, W.J.; Califf, R.M.; Desbiens, N.; et al. The SUPPORT prognostic model: Objective estimates of survival for seriously ill hospitalized adults. Ann. Intern. Med. 1995, 122, 191–203. [Google Scholar] [CrossRef]
- Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
- Tang, W.; Ma, J.; Mei, Q.; Zhu, J. SODEN: A Scalable Continuous-Time Survival Model through Ordinary Differential Equation Networks. J. Mach. Learn. Res. 2022, 23, 1–29. [Google Scholar]
- Ausset, G.; Ciffreo, T.; Portier, F.; Clémençon, S.; Papin, T. Individual Survival Curves with Conditional Normalizing Flows. In Proceedings of the IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA), Porto, Portugal, 6–9 October 2021; pp. 1–10. [Google Scholar]
- Danks, D.; Yau, C. Derivative-Based Neural Modelling of Cumulative Distribution Functions for Survival Analysis. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Virtual, 28–30 March 2022; pp. 7240–7256. [Google Scholar]
- Dupont, E.; Doucet, A.; Teh, Y.W. Augmented neural odes. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar] [CrossRef]
- Massaroli, S.; Poli, M.; Park, J.; Yamashita, A.; Asama, H. Dissecting neural odes. Adv. Neural Inf. Process. Syst. 2020, 33, 3952–3963. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar] [CrossRef]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part I 16. Springer: Cham, Switzerland, 2020; pp. 213–229. [Google Scholar]
- Liu, S.; Li, F.; Zhang, H.; Yang, X.; Qi, X.; Su, H.; Zhu, J.; Zhang, L. DAB-DETR: Dynamic anchor boxes are better queries for DETR. arXiv 2022, arXiv:2201.12329. [Google Scholar]
- Li, F.; Zhang, H.; Liu, S.; Guo, J.; Ni, L.M.; Zhang, L. Dn-detr: Accelerate detr training by introducing query denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 13619–13627. [Google Scholar]
- Donders, A.R.T.; Van Der Heijden, G.J.; Stijnen, T.; Moons, K.G. A gentle introduction to imputation of missing values. J. Clin. Epidemiol. 2006, 59, 1087–1091. [Google Scholar] [CrossRef]
- Sinharay, S.; Stern, H.S.; Russell, D. The use of multiple imputation for the analysis of missing data. Psychol. Methods 2001, 6, 317. [Google Scholar] [CrossRef]
- Jerez, J.M.; Molina, I.; García-Laencina, P.J.; Alba, E.; Ribelles, N.; Martín, M.; Franco, L. Missing data imputation using statistical and machine learning methods in a real breast cancer problem. Artif. Intell. Med. 2010, 50, 105–115. [Google Scholar] [CrossRef]
- Yoon, J.; Jordon, J.; Schaar, M. Gain: Missing data imputation using generative adversarial nets. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 5689–5698. [Google Scholar]
- McCoy, J.T.; Kroon, S.; Auret, L. Variational autoencoders for missing data imputation with application to a simulated milling circuit. IFAC-PapersOnLine 2018, 51, 141–146. [Google Scholar] [CrossRef]
- Wang, Q.; Li, B.; Xiao, T.; Zhu, J.; Li, C.; Wong, D.F.; Chao, L.S. Learning deep transformer models for machine translation. arXiv 2019, arXiv:1906.01787. [Google Scholar]
- Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer normalization. arXiv 2016, arXiv:1607.06450. [Google Scholar]
- Narang, S.; Chung, H.W.; Tay, Y.; Fedus, L.; Févry, T.; Matena, M.; Malkan, K.; Fiedel, N.; Shazeer, N.; Lan, Z.; et al. Do Transformer Modifications Transfer Across Implementations and Applications? In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Virtual, 7–11 November 2021; pp. 5758–5773. [Google Scholar]
- Shazeer, N. Glu variants improve transformer. arXiv 2020, arXiv:2002.05202. [Google Scholar]
- Chilinski, P.; Silva, R. Neural likelihoods via cumulative distribution functions. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), Virtual, 3–6 August 2020; pp. 420–429. [Google Scholar]
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar] [CrossRef]
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
- Kay, W.; Carreira, J.; Simonyan, K.; Zhang, B.; Hillier, C.; Vijayanarasimhan, S.; Viola, F.; Green, T.; Back, T.; Natsev, P.; et al. The kinetics human action video dataset. arXiv 2017, arXiv:1705.06950. [Google Scholar]
- Zhai, X.; Oliver, A.; Kolesnikov, A.; Beyer, L. S4l: Self-supervised semi-supervised learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1476–1485. [Google Scholar]
- Tran, M.; Ly, L.; Hua, B.S.; Le, N. SS-3DCAPSNET: Self-Supervised 3d Capsule Networks for Medical Segmentation on Less Labeled Data. In Proceedings of the International Symposium on Biomedical Imaging (ISBI 2022), Kolkata, India, 28–31 March 2022; pp. 1–5. [Google Scholar]
- Phan, T.; Le, D.; Brijesh, P.; Adjeroh, D.; Wu, J.; Jensen, M.O.; Le, N. Multimodality Multi-Lead ECG Arrhythmia Classification using Self-Supervised Learning. In Proceedings of the IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI 2022), Ioannina, Greece, 27–30 September 2022; pp. 1–4. [Google Scholar]
- Harrell, F.E., Jr.; Lee, K.L.; Mark, D.B. Multivariable prognostic models: Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 1996, 15, 361–387. [Google Scholar] [CrossRef]
- Graf, E.; Schmoor, C.; Sauerbrei, W.; Schumacher, M. Assessment and comparison of prognostic classification schemes for survival data. Stat. Med. 1999, 18, 2529–2545. [Google Scholar] [CrossRef]
- Antolini, L.; Boracchi, P.; Biganzoli, E. A time-dependent discrimination index for survival data. Stat. Med. 2005, 24, 3927–3944. [Google Scholar] [CrossRef]
- Kvamme, H.; Borgan, Ø.; Scheel, I. Time-to-event prediction with neural networks and Cox regression. J. Mach. Learn. Res. 2019, 20, 1–30. [Google Scholar]
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
- Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101. [Google Scholar]
Method | Time Continuity | Deep Learning | Architecture | Handles Missing Features |
---|---|---|---|---|
CoxPH [7] | ✓ | ✗ | – | ✗ |
DeepSurv [15] | ✓ | ✓ | MLP | ✗ |
DeepHit [14] | ✗ | ✓ | MLP | ✗ |
SuMo-Net [13] | ✓ | ✓ | MLP | ✗ |
SurvTRACE [16] | ✗ | ✓ | Transformer | ✗ |
SSL-SurvFormer | ✓ | ✓ | Transformer | ✓ |
Dataset | No. Features | Max. No. Missing Features | Missing Rate (%)
---|---|---|---
METABRIC | 30 | 12 | 35.6
SUPPORT | 24 | 16 | 96.6
FLCHAIN | 8 | 0 | 0
Dataset | Method | Impute | IC-Index ↑ | INBLL ↓ | IBS ↓
---|---|---|---|---|---
METABRIC | CoxPH | MM | 0.773 (0.010) | 0.441 (0.021) | 0.143 (0.008)
METABRIC | CoxPH | KNN | 0.776 (0.009) | 0.436 (0.028) | 0.140 (0.006)
METABRIC | DeepSurv | MM | 0.778 (0.015) | 0.433 (0.022) | 0.139 (0.013)
METABRIC | DeepSurv | KNN | 0.780 (0.012) | 0.426 (0.025) | 0.135 (0.010)
METABRIC | SuMo-Net | MM | 0.781 (0.012) | 0.425 (0.033) | 0.135 (0.011)
METABRIC | SuMo-Net | KNN | 0.784 (0.011) | 0.418 (0.030) | 0.131 (0.013)
METABRIC | SSL-SurvFormer | – | 0.796 (0.014) | 0.406 (0.021) | 0.129 (0.009)
SUPPORT | CoxPH | MM | 0.658 (0.008) | 0.566 (0.021) | 0.195 (0.003)
SUPPORT | CoxPH | KNN | 0.662 (0.006) | 0.563 (0.022) | 0.191 (0.004)
SUPPORT | DeepSurv | MM | 0.668 (0.009) | 0.563 (0.022) | 0.188 (0.006)
SUPPORT | DeepSurv | KNN | 0.668 (0.010) | 0.559 (0.019) | 0.186 (0.009)
SUPPORT | SuMo-Net | MM | 0.673 (0.010) | 0.545 (0.012) | 0.182 (0.007)
SUPPORT | SuMo-Net | KNN | 0.679 (0.009) | 0.540 (0.018) | 0.180 (0.007)
SUPPORT | SSL-SurvFormer | – | 0.708 (0.007) | 0.528 (0.011) | 0.175 (0.004)
Dataset | Metric | DeepHit (MM) | DeepHit (KNN) | SurvTRACE (MM) | SurvTRACE (KNN) | SSL-SurvFormer
---|---|---|---|---|---|---
METABRIC | CI ↑ | 0.815 (0.028) | 0.815 (0.031) | 0.816 (0.022) | 0.817 (0.028) | 0.826 (0.016)
 | | 0.799 (0.022) | 0.802 (0.029) | 0.806 (0.026) | 0.808 (0.021) | 0.816 (0.020)
 | | 0.784 (0.029) | 0.785 (0.031) | 0.785 (0.024) | 0.788 (0.026) | 0.795 (0.020)
 | BS ↓ | 0.112 (0.007) | 0.110 (0.009) | 0.101 (0.005) | 0.099 (0.005) | 0.095 (0.008)
 | | 0.162 (0.009) | 0.160 (0.012) | 0.153 (0.008) | 0.148 (0.009) | 0.137 (0.013)
 | | 0.174 (0.012) | 0.173 (0.013) | 0.164 (0.011) | 0.159 (0.010) | 0.152 (0.017)
SUPPORT | CI ↑ | 0.769 (0.013) | 0.771 (0.015) | 0.773 (0.020) | 0.773 (0.019) | 0.785 (0.014)
 | | 0.717 (0.013) | 0.719 (0.012) | 0.718 (0.009) | 0.721 (0.011) | 0.738 (0.007)
 | | 0.680 (0.012) | 0.684 (0.015) | 0.682 (0.014) | 0.684 (0.012) | 0.696 (0.009)
 | BS ↓ | 0.143 (0.012) | 0.143 (0.009) | 0.128 (0.006) | 0.124 (0.005) | 0.118 (0.004)
 | | 0.204 (0.010) | 0.201 (0.006) | 0.197 (0.009) | 0.195 (0.007) | 0.179 (0.002)
 | | 0.227 (0.006) | 0.224 (0.005) | 0.214 (0.010) | 0.213 (0.007) | 0.204 (0.006)
Method | IC-Index ↑ | INBLL ↓ | IBS ↓ |
---|---|---|---|
CoxPH | 0.782 (0.010) | 0.356 (0.016) | 0.110 (0.004) |
DeepSurv | 0.788 (0.008) | 0.352 (0.013) | 0.107 (0.003) |
SuMo-Net | 0.788 (0.011) | 0.343 (0.017) | 0.105 (0.006) |
SSL-SurvFormer | 0.795 (0.007) | 0.333 (0.014) | 0.101 (0.005) |
Metric | DeepHit | SurvTRACE | SSL-SurvFormer
---|---|---|---
CI ↑ | 0.798 (0.026) | 0.800 (0.030) | 0.802 (0.026)
 | 0.793 (0.010) | 0.796 (0.009) | 0.800 (0.010)
 | 0.790 (0.008) | 0.791 (0.012) | 0.794 (0.011)
BS ↓ | 0.083 (0.016) | 0.059 (0.005) | 0.059 (0.006)
 | 0.136 (0.014) | 0.100 (0.004) | 0.099 (0.004)
 | 0.210 (0.027) | 0.126 (0.009) | 0.125 (0.008)
Metric | Baseline | | | SSL
---|---|---|---|---
IC-index ↑ | 0.793 (0.006) | 0.793 (0.007) | 0.793 (0.008) | 0.795 (0.007)
IBS ↓ | 0.104 (0.004) | 0.104 (0.004) | 0.102 (0.006) | 0.101 (0.005)
INBLL ↓ | 0.340 (0.012) | 0.336 (0.014) | 0.334 (0.016) | 0.333 (0.014)
Metric | Baseline | | | SSL
---|---|---|---|---
KNN Impute | | | |
IC-index ↑ | 0.688 (0.009) | 0.695 (0.009) | 0.700 (0.005) | 0.708 (0.007)
IBS ↓ | 0.181 (0.006) | 0.178 (0.008) | 0.176 (0.009) | 0.175 (0.004)
INBLL ↓ | 0.537 (0.015) | 0.531 (0.014) | 0.532 (0.010) | 0.528 (0.011)