Survival analysis of the Cancer Genome Atlas (TCGA) dataset is a well-known method for discovering gene expression-based prognostic biomarkers of head and neck squamous cell carcinoma (HNSCC). Because gene expression values are continuous, survival analysis usually relies on a cutoff point to dichotomize patients. Some optimization software exists for cutoff determination, but its predetermined cutoffs are usually set at the median or at quantiles of the gene expression values. Pre-processed datasets also tend to include few clinicopathological features. We applied an in-house workflow for biomarker discovery that comprises data retrieval and pre-processing, feature selection, sliding-window cutoff selection, Kaplan–Meier survival analysis, and Cox proportional hazards modeling. For the TCGA HNSCC cohort, we scanned human protein-coding genes to find the optimal cutoff value for each. After adjustment for confounders, clinical tumor stage and surgical margin involvement were found to be independent risk factors for prognosis. According to the results tables, which report hazard ratios with Bonferroni-adjusted p-values under the optimal cutoff, three biomarker candidates, CAMK2N1, CALML5, and FCGBP, are significantly associated with overall survival. We validated this finding using another independent HNSCC dataset (GSE65858). We therefore suggest that transcriptomic analysis can help with biomarker discovery. The robustness of the biomarkers we identified should nevertheless be confirmed through additional tests with independent datasets.
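To illustrate the idea of sliding-window cutoff selection, the sketch below scans candidate cutoffs for one gene and keeps the one with the smallest two-group log-rank p-value. It is not the in-house workflow itself: the function names (`logrank_p`, `scan_cutoffs`, `bonferroni`), the `min_frac` parameter, and the choice of the log-rank test as the selection criterion are all assumptions made for illustration.

```python
import math

def logrank_p(times, events, groups):
    """Two-group log-rank test; returns the p-value (chi-square, 1 df)."""
    obs_minus_exp, var = 0.0, 0.0
    # Walk over the distinct times at which at least one event occurred.
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)                          # at risk, all
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g)   # at risk, high group
        d = sum(1 for x, e in zip(times, events) if x == t and e)    # events, all
        d1 = sum(1 for x, e, g in zip(times, events, groups)
                 if x == t and e and g)                              # events, high group
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0  # one group is empty, so there is nothing to compare
    chi2 = obs_minus_exp ** 2 / var
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))

def scan_cutoffs(expr, times, events, min_frac=0.25):
    """Try every cutoff that leaves at least min_frac of patients per group.

    Returns (cutoff, p) for the split with the smallest log-rank p-value.
    min_frac is a hypothetical guard against tiny groups.
    """
    n = len(expr)
    order = sorted(range(n), key=lambda i: expr[i])
    lo = max(1, math.ceil(n * min_frac))
    best = None
    for k in range(lo, n - lo + 1):          # k = size of the low-expression group
        a, b = expr[order[k - 1]], expr[order[k]]
        if a == b:
            continue                          # tied values cannot be separated
        cutoff = (a + b) / 2
        groups = [x > cutoff for x in expr]
        p = logrank_p(times, events, groups)
        if best is None or p < best[1]:
            best = (cutoff, p)
    return best

def bonferroni(p, n_tests):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    return min(1.0, p * n_tests)
```

Because the p-value is minimized over many candidate cutoffs, it is optimistically biased, which is one reason a multiple-testing adjustment and validation in an independent dataset matter.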
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.