Recently, large-scale bioinformatics and genomic data have been generated using advanced biotechnology methods, thus increasing the importance of analyzing such data. Numerous data mining methods have been developed to process genomic data in the field of bioinformatics. We extracted significant genes for the prognosis prediction of 1157 patients using gene expression data from patients with kidney cancer. We then proposed an end-to-end, cost-sensitive hybrid deep learning (COST-HDL) approach with a cost-sensitive loss function for classification tasks on imbalanced kidney cancer data. Here, we combined the deep symmetric auto encoder; the decoder is symmetric to the encoder in terms of layer structure, with reconstruction loss for non-linear feature extraction and neural network with balanced classification loss for prognosis prediction to address data imbalance problems. Combined clinical data from patients with kidney cancer and gene data were used to determine the optimal classification model and estimate classification accuracy by sample type, primary diagnosis, tumor stage, and vital status as risk factors representing the state of patients. Experimental results showed that the COST-HDL approach was more efficient with gene expression data for kidney cancer prognosis than other conventional machine learning and data mining techniques. These results could be applied to extract features from gene biomarkers for prognosis prediction of kidney cancer and prevention and early diagnosis.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited