Classification of Kidney Cancer Data Using Cost-Sensitive Hybrid Deep Learning Approach

Abstract: Recently, large-scale bioinformatics and genomic data have been generated using advanced biotechnology methods, increasing the importance of analyzing such data. Numerous data mining methods have been developed to process genomic data in the field of bioinformatics. We extracted significant genes for prognosis prediction from the gene expression data of 1157 patients with kidney cancer. We then proposed an end-to-end, cost-sensitive hybrid deep learning (COST-HDL) approach with a cost-sensitive loss function for classification tasks on imbalanced kidney cancer data. Here, we combined a deep symmetric autoencoder (whose decoder mirrors the encoder in layer structure) with reconstruction loss for non-linear feature extraction, and a neural network with balanced classification loss for prognosis prediction, to address the data imbalance problem. Clinical data from patients with kidney cancer were combined with gene data to determine the optimal classification model and to estimate classification accuracy for sample type, primary diagnosis, tumor stage, and vital status as risk factors representing the state of patients. Experimental results showed that the COST-HDL approach was more efficient with gene expression data for kidney cancer prognosis than conventional machine learning and data mining techniques. These results could be applied to extract features from gene biomarkers for prognosis prediction of kidney cancer and for its prevention and early diagnosis.


Introduction
Using bioinformatics approaches to identify genes that are useful for the diagnosis and prognosis prediction of patients with cancer can foster treatment. The analysis of cancer data is important yet difficult due to the large amounts of gene expression data available. Thus, only significant features that can express the health condition of patients must be extracted. Additionally, the development of efficient classification models based on the extracted genes is helpful for early diagnosis and prognosis prediction of patients with cancer. Cancer is caused by gene modifications, which may enable a cell to proliferate exponentially and then permeate normal surrounding cells before spreading through the body. Using deep learning methods to accurately predict the disease condition of patients by analyzing mutations in the gene sequence alone, studies have identified genes involved in spinal muscular atrophy, hereditary nonpolyposis colon cancer, and autism [1].
In this study, we extracted genes useful for the prognosis prediction of patients with kidney cancer and then predicted prognosis by applying a classification algorithm based on those genes. Kidney cancer is a primary tumor arising in the kidney, among which malignant renal cell carcinoma accounts for over 90% of cases. Because kidney cancer shows no symptoms at the early stages, it is often diagnosed at a progressive stage. According to registered statistics for cancer in Korea, 5043 kidney cancer cases were diagnosed in 2016, ranking 10th among all cancers. In fact, the annual incidence of kidney cancer increased steadily from 1999 to 2019 [2]. Additionally, the symptoms and treatment of kidney cancer decrease the quality of life of the patients by increasing the disease burden and medical costs. Lifestyle factors, such as poor diet, physical inactivity, smoking, and alcohol consumption, are associated with an increased risk of kidney cancer. Genetic and environmental factors also influence these risk factors and related diseases, such as diabetes, hypertension, and obesity [3].
There have been various successful applications of machine learning and data mining techniques in bioinformatics and genomics research [4]. For example, PathAI was implemented for digital pathology after the analysis of image data from patients with breast cancer using artificial intelligence, which decreased the error rate of diagnosing metastasized cancer through deep learning [5]. Additionally, a study [6] at Emory University analyzed the survival rate of patients with brain tumors by combining gene data with pathology image data, achieving very high accuracy of survival rate prediction. It was reported that deep convolutional neural networks achieved higher accuracy than pathologist-based diagnosis in the prediction of survival rate [6]. Another study predicted the degree of risk of approximately 20 cancers by applying machine learning and artificial intelligence to analyze gene-related big data [7]. Over the years, various technologies for data mining have been applied. Specifically, a deep learning method was applied to infer the expression of target genes from the expression of landmark genes [8]; the tested method significantly outperformed other machine learning algorithms. Recent studies were also conducted to develop classification model systems for diagnosing disease and cancer using machine learning [9,10].
Most studies have been conducted to extract features from the genome data of patients with kidney cancer using data mining, statistical methods, and classification algorithms [11][12][13]. Various bioinformatics and genomic data have also been applied in algorithms based on machine learning [14][15][16]. Recently, due to the advantages of deep learning, various deep learning approaches have been applied to cancer research using gene expression data [17][18][19]. Deep learning approaches are useful for constructing predictive models and for feature extraction: they map the lowest input layer to the uppermost output layer without using hand-crafted features or rules, with higher levels representing more abstract entities [20,21]. Using data from The Cancer Genome Atlas (TCGA) [22], we used a deep learning approach in a prior study to extract genes related to cancer by combining RNA sequencing and DNA methylation data. We evaluated breast invasive carcinoma, thyroid carcinoma, and kidney renal papillary cell carcinoma [23].
In this study, we combined gene expression and clinical data from patients with kidney cancer from TCGA and applied our proposed end-to-end deep learning COST-HDL approach. We compared the proposed approach with several traditional data mining and machine learning methods that are not implemented end-to-end; these methods have multiple steps such as feature engineering, over- and under-sampling, and classification. The objectives of this study are to extract deep features from gene biomarkers for precisely predicting prognosis, overcome differences in various types of cancer data, and develop an end-to-end prediction model by comparing and analyzing classification algorithms using the extracted genes. The major contributions of this paper can be summarized as follows: (1) We propose an end-to-end approach without any manual engineering, which predicts kidney cancer prognosis including sample type, primary diagnosis, tumor stage, and vital status. The remainder of the paper is organized as follows: Section 2 introduces the gene expression dataset from patients with kidney cancer and explains the proposed deep learning approach in detail. In Section 3, the experimental results are provided. Finally, Section 4 discusses the experimental analysis and presents our conclusions.

Dataset
TCGA contains a variety of gene information, such as single-nucleotide polymorphism (SNP) and gene expression (mRNA expression) data from large numbers of patients with cancer, stored in a database [22]. We collected TCGA data from 1157 patients with kidney cancer along with clinical information including sample type, primary diagnosis, tumor stage, and vital status. Each clinical variable is used as a class label in a prognosis prediction task. The degree of gene expression was estimated at the RNA level, and the expression data (transcriptome profiling) were merged and digitized after assigning transaction IDs. We used 60,483 gene expression data points from each patient with kidney cancer, with values expressed in the Fragments Per Kilobase per Million mapped (FPKM) measure [24]. The kidney cancer dataset was used to extract the complex structure of gene biomarkers and estimate classification accuracy for sample type, primary diagnosis, tumor stage, and vital status as risk factors representing the state of patients.
The statistics of the dataset are shown in Table 1. In the preprocessing step, we removed all zero-variance gene expression features and other noisy samples. Varying sample and gene expression data sizes were used for the prognoses, and they were split into 80% for training and 20% for testing. The datasets are highly imbalanced, especially the dataset for sample type prognosis, which contains 87.9% primary tumor samples and 12.1% solid tissue normal samples. In the analysis, we applied a cost function to solve this data imbalance problem and compared it with other sampling methods. We also used the deep autoencoder (DAE) model to reduce the high dimensionality of the gene expression data and compared it with other feature-selection and dimension-reduction techniques.

The COST-HDL Approach
In the experiments, the extracted target genes were subjected to classification analysis, and the performance was evaluated. Figure 1 shows the proposed COST-HDL approach, which takes the gene expression data of kidney cancer from the TCGA portal as input and outputs four kinds of prognoses, namely sample type, primary diagnosis, tumor stage, and vital status. It consists of a hybrid of DAE and NN models. For the RNA sequencing data, the number of variables is significantly higher than the number of samples. Therefore, general classification analysis is prohibited by the technical challenges of dealing with more than 60,000 variables: it is challenging to apply data mining and machine learning algorithms to the raw dataset. Therefore, in this study, we used a 5-layer DAE model (the first 2 layers for encoding, the middle layer for gene extraction, and the last 2 layers for decoding) to extract significant genes and deep features from gene biomarkers. The extracted deep features were input to the NN classification method (hidden layer + dropout [25] + Rectified Linear Unit (ReLU) [26] + softmax [27]).
The DAE model employed the mean squared error (MSE) as a reconstruction loss during training, while the NN model used the focal loss [28] as a balanced classification loss. Focal loss is a reshaping of the cross-entropy loss such that it down-weights the loss assigned to well-classified examples. The focal loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training. The proposed COST-HDL approach uses the sum of the reconstruction loss and the balanced classification loss as its cost function. The experimental hardware platform was an Intel Xeon E3 (32 GB memory, GTX 1080 Ti). We used Ubuntu 18.04 as the computational environment, and Python 3.7 was used for data collection and analysis with the Scikit-Learn [29] and PyTorch [30] libraries. The following paragraphs describe the DAE model for extracting deep features from gene biomarkers and the NN model for constructing prognosis prediction models in detail.
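To make the down-weighting behavior concrete, the binary focal loss can be sketched in a few lines of plain Python. This is a minimal illustration of the standard formulation from [28]; the focusing parameter γ = 2 and class weight α = 0.25 are the common defaults from that paper, not values stated in this study.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one binary prediction.

    p     : predicted probability of the positive class
    y     : true label, 1 or 0
    gamma : focusing parameter; gamma=0 recovers weighted cross-entropy
    alpha : class-balance weight for the positive class
    """
    # p_t is the probability the model assigned to the true class
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t)^gamma factor down-weights easy, well-classified examples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A well-classified example (p_t = 0.9) contributes far less loss
# than a hard one (p_t = 0.1):
easy = focal_loss(0.9, 1)
hard = focal_loss(0.1, 1)
```

With γ = 0 and α = 1 the function reduces to the ordinary cross-entropy, which is how the reshaping claim above can be checked.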

Extracting Deep Features from Gene Biomarkers
We utilized the training dataset to extract gene expression features using the DAE non-linear feature transformation method, and we compared it with the Principal Component Analysis (PCA) [31] linear feature transformation and the Least Absolute Shrinkage and Selection Operator (LASSO) [32] feature selection methods. PCA explains correlated multivariate data with a smaller number of linearly uncorrelated variables, which are linear combinations of the original variables. Due to this linearity constraint, we developed a DAE with non-linear activation functions, which reconstructs the data more accurately. In contrast, feature selection methods such as LASSO select the best features or a subset of the original feature set and do not alter the original representation of the data [33]. Thus, they may lose some important information during the selection process when extracting the complex structure of cancer data.
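The two baselines can be sketched with Scikit-Learn as below. The random matrix is only a synthetic stand-in for the real 1157 × 60,483 expression matrix, and the LASSO regularization strength is an illustrative assumption, not the value used in the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
# Synthetic stand-in: 200 samples x 500 genes (real data: 1157 x 60,483)
X = rng.normal(size=(200, 500))
y = rng.integers(0, 2, size=200)

# PCA: linear transformation into 100 linearly uncorrelated components
pca = PCA(n_components=100)
X_pca = pca.fit_transform(X)

# LASSO: selects a subset of the original genes via L1 sparsity;
# unselected genes get exactly-zero coefficients
lasso = Lasso(alpha=0.05).fit(X, y)
selected = np.flatnonzero(lasso.coef_)   # indices of surviving genes
X_lasso = X[:, selected]
```

Note the difference emphasized in the text: the PCA output columns are new derived variables, while the LASSO output columns are untouched original genes.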

We developed the DAE model using PyTorch to extract deep features from gene biomarkers. The architecture of the DAE model consists of encoder and decoder parts. The encoder part comprised one input layer and three fully connected encoding hidden layers with 1000, 500, and 100 nodes, respectively. The last of these hidden layers was chosen as the deep features for extracting the gene biomarkers. The decoder part comprised two fully connected decoding hidden layers with 500 and 1000 nodes, respectively, followed by the output layer (the reconstructed input). The decoding layers transpose the encoding layer weights. The procedure can be formulated as below:

hidden_encode_1 = ReLU(input · W_1 + b_1)
hidden_encode_2 = ReLU(hidden_encode_1 · W_2 + b_2)
hidden_encode_3 = ReLU(hidden_encode_2 · W_3 + b_3)
hidden_decode_1 = ReLU(hidden_encode_3 · W_3^T + b'_1)
hidden_decode_2 = ReLU(hidden_decode_1 · W_2^T + b'_2)
output = Tanh(hidden_decode_2 · W_1^T + b'_3)

where W_1, W_2, and W_3 are the weight matrices between the layers, with sizes N × 1000, 1000 × 500, and 500 × 100, respectively; N is the input dimension (the number of gene expression features); b_1, b_2, b_3 and b'_1, b'_2, b'_3 are the biases for each layer; and ReLU and Tanh are non-linear activation functions. The terms with superscript T denote transposed matrices. The hidden_encode_3 layer provides the activity values of the deep features in this model. The DAE has a loss function to handle the data reconstruction error, which measures the error between the original data and the reconstructed data; it employed the MSE as its loss function.
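Under the layer sizes above, the DAE can be sketched in PyTorch as follows. This is a minimal illustration rather than the authors' released code: the tied (transposed) decoder weights follow the text, while the separate decoder bias parameters and the illustrative input size of 2000 genes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DAE(nn.Module):
    """Deep symmetric autoencoder; decoder reuses transposed encoder weights.
    Layer sizes follow the paper: N -> 1000 -> 500 -> 100 -> 500 -> 1000 -> N."""
    def __init__(self, n_genes):
        super().__init__()
        self.enc1 = nn.Linear(n_genes, 1000)
        self.enc2 = nn.Linear(1000, 500)
        self.enc3 = nn.Linear(500, 100)
        # Decoder biases; decoder weights are tied to the encoder transposes
        self.b_dec1 = nn.Parameter(torch.zeros(500))
        self.b_dec2 = nn.Parameter(torch.zeros(1000))
        self.b_dec3 = nn.Parameter(torch.zeros(n_genes))

    def encode(self, x):
        h = F.relu(self.enc1(x))
        h = F.relu(self.enc2(h))
        return F.relu(self.enc3(h))          # 100-dim deep features

    def forward(self, x):
        z = self.encode(x)
        h = F.relu(F.linear(z, self.enc3.weight.t(), self.b_dec1))
        h = F.relu(F.linear(h, self.enc2.weight.t(), self.b_dec2))
        return torch.tanh(F.linear(h, self.enc1.weight.t(), self.b_dec3))

model = DAE(n_genes=2000)                    # illustrative input size
x = torch.randn(8, 2000)
recon = model(x)
mse = F.mse_loss(recon, x)                   # reconstruction loss
```

The deep features fed to the classifier are the output of `encode`, i.e., the 100-dimensional hidden_encode_3 activations.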

Constructing Prognosis Prediction Models
For the prognosis prediction models, we constructed a feedforward neural network, which contained one input layer, one hidden layer with 100 nodes, and one output layer. The deep features of hidden_encode_3 in the DAE model were used as the input of the NN model. This procedure can be formulated as below:

hidden_layer = ReLU(hidden_encode_3 · W_4 + b_4)
output = softmax(hidden_layer · W_5 + b_5)

where W_4 and W_5 are the weight matrices between the layers, with sizes 100 × 100 and 100 × C, respectively; C is the output size (the number of class types); b_4 and b_5 are the biases for each layer; and ReLU and softmax are non-linear activation functions. The softmax activation function computes the softmax cross-entropy between logits and labels, and because its outputs sum to 1, it yields an efficient probability analysis. A dropout layer was added after the hidden layer, which randomly set 20% of the output of that layer to 0. The NN has a loss function to handle the classification error, which measures the error between the true class and the predicted class and also addresses the class imbalance.
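The classifier head described above can be written compactly in PyTorch. This is a sketch of the described architecture; the number of classes C = 4 is illustrative (it varies per prognosis task).

```python
import torch
import torch.nn as nn

# Classifier head: 100-d deep features -> hidden(100, ReLU) -> dropout(0.2) -> C classes
clf = nn.Sequential(
    nn.Linear(100, 100),
    nn.ReLU(),
    nn.Dropout(p=0.2),    # randomly zeroes 20% of activations during training
    nn.Linear(100, 4),    # C = 4 illustrative class types
)

features = torch.randn(8, 100)         # stand-in for DAE deep features
logits = clf(features)
probs = torch.softmax(logits, dim=1)   # each row sums to 1
```

In practice the softmax is usually folded into the loss (e.g., cross-entropy or focal loss on the logits) rather than applied explicitly at inference time.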
The NN model employed the focal loss as its loss function. The focal loss addresses the class imbalance problem by reshaping the standard cross-entropy loss such that it down-weights the loss assigned to well-classified examples.

Training the Models
The cost function L, the sum of the MSE reconstruction loss and the focal classification loss, was used to measure the difference between the inputs and the outputs. For the optimization, we selected the Adam optimizer [34], whose arguments can be set freely, as the strategy to update the weights and biases so that the minima could be found. After running different trials, the learning rate was finally set to 0.00001, and the batch size and number of epochs were set to 128 and 2000, respectively. The models were trained under the parameters mentioned above. We chose the checkpoint model that showed the lowest error on the training set. The activity values and weight matrices related to the deep features were then read out.
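A condensed sketch of this training setup is shown below, with toy layer sizes standing in for the full DAE + NN and the focal loss written inline on the logits. The focusing parameter γ = 2 is an assumption (the paper does not state it here), and the α class weight is omitted for brevity; the learning rate and batch size match the values above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for the full pipeline (DAE encoder/decoder + NN classifier)
encoder = nn.Linear(50, 10)
decoder = nn.Linear(10, 50)
classifier = nn.Linear(10, 2)
params = [*encoder.parameters(), *decoder.parameters(), *classifier.parameters()]
opt = torch.optim.Adam(params, lr=1e-5)        # learning rate from the paper

x = torch.randn(128, 50)                        # one batch of size 128
y = torch.randint(0, 2, (128,))

for epoch in range(5):                          # the paper trains 2000 epochs
    opt.zero_grad()
    z = torch.relu(encoder(x))
    recon_loss = F.mse_loss(decoder(z), x)      # DAE reconstruction loss
    ce = F.cross_entropy(classifier(z), y, reduction="none")
    p_t = torch.exp(-ce)                        # probability of the true class
    focal_loss = ((1 - p_t) ** 2 * ce).mean()   # focal loss, gamma = 2 assumed
    total = recon_loss + focal_loss             # cost function L = MSE + focal
    total.backward()
    opt.step()
```

Summing the two losses lets the encoder receive gradients from both objectives, which is what makes the approach end-to-end rather than a two-stage extract-then-classify pipeline.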

Visualization of Feature Extraction
The training set was utilized to analyze and extract deep features from gene biomarkers using the DAE model. We compared it with the PCA dimension reduction and LASSO feature selection methods. We extracted 100 features for each classification task for further analysis by the DAE model, as shown in Table 2. For a fair comparison, we also extracted 100 features for each classification task by the PCA method, as shown in Table 3. Different numbers of gene biomarkers were selected by the LASSO method, as shown in Table 4. The testing set was utilized to evaluate the feature extraction from gene biomarkers. We developed the PCA and LASSO methods using Scikit-Learn and developed the DAE model using PyTorch. For the visualization of the deep features extracted by DAE, the features extracted by PCA, and the features selected by LASSO, we used t-Distributed Stochastic Neighbor Embedding (TSNE) [35]. TSNE is a widely used non-linear dimensionality reduction technique for visualizing high-dimensional data on a two- (or three-) dimensional plane.
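As a sketch of this visualization step, the 100-dimensional feature vectors can be projected to 2-D with Scikit-Learn's TSNE. The random matrix below is a stand-in for the real extracted features, and the perplexity value is an illustrative default rather than a parameter reported in the paper.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for 100-dimensional deep features of 150 samples
features = rng.normal(size=(150, 100))

# Project to 2-D for plotting; perplexity must be smaller than n_samples
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
```

Each row of `emb` is one sample's 2-D coordinate, which would then be scatter-plotted and colored by class label to produce figures like Figures 2-5.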
We used the two-dimensional plane for the following visualizations of extracted features. The visualizations of the extracted features from the gene biomarkers for each prognosis, namely sample type, primary diagnosis, tumor stage, and vital status, are shown in Figures 2-5, respectively. It can be seen that the deep features extracted by the DAE model were better separated than the features extracted by the PCA method and the features selected by the LASSO method on both the training and testing sets. Furthermore, the other prognoses were also identified by the DAE method.

Training Process
We trained our COST-HDL approach for 2000 epochs. Each loss (MSE, focal, and total) during training is shown in Figures 6-9 for each prognosis. The MSE loss continuously decreased in all experiments for each prognosis. In the multi-class case, tumor stage prognosis, it decreased more steadily. The focal loss also decreased, but it was more sensitive during training for each prognosis. In the binary class case, sample type prognosis, it was the most sensitive, fluctuating between 0.6 and 1. This was because the model had already reached 100% classification performance.


Evaluation of Prognosis Prediction Models
To evaluate our COST-HDL approach, four indices, namely accuracy, precision, recall, and f1-score, were employed to measure the classification performance; they are defined as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1-score = 2 × Precision × Recall / (Precision + Recall)

Here, TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively. A true positive is an outcome where the model correctly predicts the positive class. Similarly, a true negative is an outcome where the model correctly predicts the negative class. A false positive is an outcome where the model incorrectly predicts the positive class, and a false negative is an outcome where the model incorrectly predicts the negative class. In Table 5, we compare the models with different loss functions (only MSE loss, only focal loss, and total loss). It can be seen that the models with total loss perform better than the single-loss models, and the models with only MSE loss show the worst results. For the prediction of sample type prognosis, our COST-HDL approach with total loss achieved the highest results: 100% accuracy, 100% precision, 100% recall, and 100% f1-score. It improved on the model with only focal loss by 0.43% in accuracy, 0.24% in precision, 2% in recall, and 1.14% in f1-score.
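The four indices follow directly from the confusion-matrix counts; a minimal illustration (the counts below are invented for the example, not taken from the paper's tables):

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# e.g., 90 true positives, 80 true negatives, 10 false positives, 20 false negatives
acc, prec, rec, f1 = classification_metrics(90, 80, 10, 20)
```

For the multi-class prognoses (e.g., tumor stage), these are typically computed per class and then averaged.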
For the prediction of primary diagnosis prognosis, our COST-HDL approach with total loss achieved the highest results: 96.98% accuracy, 97.43% precision, 95.68% recall, and 96.49% f1-score. It improved on the model with only focal loss by 0.43% in accuracy, 0.3% in precision, 0.67% in recall, and 0.52% in f1-score.
For the prediction of tumor stage prognosis, our COST-HDL approach with total loss achieved the highest results: 56.70% accuracy, 49.41% precision, 46.14% recall, and 46.68% f1-score. It improved on the model with only focal loss by 2.24% in accuracy, 4.26% in precision, 1.09% in recall, and 2.92% in f1-score.
For the prediction of vital status prognosis, our COST-HDL approach with total loss achieved the highest results: 76.72% accuracy, 69.78% precision, 68.92% recall, and 69.32% f1-score. It improved on the model with only focal loss by 0.43% in accuracy, 0.78% in precision, 1.87% in recall, and 1.49% in f1-score. We verified that our COST-HDL approach performs better than traditional machine learning classifiers such as K-Nearest Neighbors (KNN) [36], Linear Support Vector Machine (Linear SVM) [37], Kernel Support Vector Machine (Kernel SVM) [38], Random Forest (RF) [39], and Neural Network (NN) [40]. The traditional machine learning classifiers are preceded by feature extraction methods such as PCA dimension reduction and LASSO feature selection. To solve the data imbalance problem, they usually employ sampling methods such as the Synthetic Minority Over-sampling Technique (SMOTE) [41], which is an over-sampling method.
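SMOTE generates synthetic minority samples by interpolating between a minority sample and one of its minority-class neighbors. The sketch below illustrates only the interpolation idea in NumPy; the real SMOTE [41] interpolates toward one of the k nearest neighbors (and is available in the imbalanced-learn library), whereas this naive version picks a random minority partner.

```python
import numpy as np

def naive_smote(X_min, n_new, rng):
    """Generate n_new synthetic samples by interpolating random minority pairs.
    (Real SMOTE interpolates toward one of the k nearest neighbors instead.)"""
    idx_a = rng.integers(0, len(X_min), size=n_new)
    idx_b = rng.integers(0, len(X_min), size=n_new)
    lam = rng.random((n_new, 1))                  # interpolation factors in [0, 1)
    return X_min[idx_a] + lam * (X_min[idx_b] - X_min[idx_a])

rng = np.random.default_rng(0)
X_minority = rng.normal(size=(20, 5))             # 20 minority samples, 5 features
X_synth = naive_smote(X_minority, n_new=60, rng=rng)
X_balanced = np.vstack([X_minority, X_synth])     # oversampled minority class
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the minority class's region of feature space.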
Hence, in this paper, we compared our COST-HDL approach with total loss to the traditional pipeline of methods (feature extraction → sampling → classifier), as shown in Tables 6-9 for each prognosis.
For the sample type prognosis, the RF classifier with LASSO feature selection and SMOTE sampling achieved 100% accuracy, 100% precision, 100% recall, and 100% f1-score. The second-best results were 99.57% accuracy, 98.08% precision, 99.76% recall, and 98.90% f1-score, achieved by the KNN and NN with LASSO feature selection and SMOTE sampling. The worst results were achieved by the Kernel SVM.
For the primary diagnosis prognosis, the second-best results were 95.69% accuracy, 95.37% precision, 94.73% recall, and 95.04% f1-score, achieved by the Linear SVM with LASSO feature selection and SMOTE sampling. The worst results were achieved by the Kernel SVM. For the tumor stage prognosis, the second-best results were 55.36% accuracy, 55.87% precision, 39.11% recall, and 39.07% f1-score, achieved by the RF with LASSO feature selection and without SMOTE sampling. The worst results were achieved by the Linear SVM with PCA and SMOTE sampling.
For the vital status prognosis, the second-best results were 75.00% accuracy, 66.56% precision, 58.79% recall, and 59.33% f1-score, achieved by the RF with LASSO feature selection and without SMOTE sampling. The worst results were achieved by the Linear SVM with PCA and SMOTE sampling.

Discussion and Conclusions
In this study, we showed that the unsupervised non-linear DAE is an effective model for extracting meaningful deep features from the gene expression data of patients with kidney cancer. These features were significantly associated with the kidney cancer prognoses, namely sample type, primary diagnosis, tumor stage, and vital status, representing the state of patients. We also showed that the end-to-end hybrid deep learning architecture is more effective than the traditional machine learning analysis flow: feature extraction, sampling, and classification.
We compared the proposed COST-HDL approach with other traditional approaches, and it achieved better results for all prognoses on gene expression data. The deep features extracted by the DAE model were better separated than the features extracted by the PCA method and the features selected by the LASSO method on both the training and testing sets. Furthermore, another class label was identified by the DAE method. The results obtained can be applied to extract deep features from gene biomarkers for the prognosis prediction of kidney cancer from various causes; hence, they are useful for the prevention and early diagnosis of kidney cancer.
This study can be improved in three ways. The first is to develop the unsupervised deep symmetric autoencoder further, for example by stacking more layers or by using denoising or variational formulations. The second is to modify the loss function so that it jointly handles the imbalance problem, the reconstruction error, and the classification error. The third is to improve the classifier beyond the single neural network, and add more

(2) We propose a non-linear transformation strategy, a deep symmetric autoencoder, to extract deep features from gene biomarkers in kidney cancer by taking advantage of the deep learning structure. (3) We propose a mixed loss function for the proposed deep learning model, considering both the compression of the knowledge representation and the data imbalance problem.

Figure 1 .
Figure 1. Overview of the COST-HDL approach. We used kidney cancer gene expression data from the TCGA portal. The Deep Auto Encoder (DAE) model is used to extract deep features from gene biomarkers as a lower-dimensional vector. The Neural Network (NN) is used to classify sample type, primary diagnosis, tumor stage, and vital status. We summed the reconstruction loss (DAE) and the balanced classification loss (NN) in the cost function.


Figure 2 .
Figure 2. Visualization of extracted features from gene biomarkers for sample type prognosis: (a) train data extracted by PCA, (b) test data extracted by PCA, (c) train data extracted by LASSO, (d) test data extracted by LASSO, (e) train data extracted by DAE, (f) test data extracted by DAE.


Figure 3 .
Figure 3. Visualization of extracted features from gene biomarkers for primary diagnosis prognosis: (a) train data extracted by PCA, (b) test data extracted by PCA, (c) train data extracted by LASSO, (d) test data extracted by LASSO, (e) train data extracted by DAE, (f) test data extracted by DAE.

Figure 4 .
Figure 4. Visualization of extracted features from gene biomarkers for tumor stage prognosis: (a) train data extracted by PCA, (b) test data extracted by PCA, (c) train data extracted by LASSO, (d) test data extracted by LASSO, (e) train data extracted by DAE, (f) test data extracted by DAE.


Figure 5 .
Figure 5. Visualization of extracted features from gene biomarkers for vital status prognosis: (a) train data extracted by PCA, (b) test data extracted by PCA, (c) train data extracted by LASSO, (d) test data extracted by LASSO, (e) train data extracted by DAE, (f) test data extracted by DAE.

Figure 6 .
Figure 6. Training loss for sample type prognosis: (a) MSE loss, (b) focal loss, (c) total loss. The x axis indicates the number of epochs, and the y axis indicates the loss.


Figure 7 .
Figure 7. Training loss for primary diagnosis prognosis: (a) MSE loss, (b) focal loss, (c) total loss.The x axis indicates the number of epochs, and the y axis indicates the loss.


Figure 8 .
Figure 8. Training loss for tumor stage prognosis: (a) MSE loss, (b) focal loss, (c) total loss. The x axis indicates the number of epochs, and the y axis indicates the loss.

Figure 9 .
Figure 9. Training loss for vital status prognosis: (a) MSE loss, (b) focal loss, (c) total loss.The x axis indicates the number of epochs, and the y axis indicates the loss.

Table 1 .
Number of samples of each class type in the dataset.

Table 2 .
The number of deep features extracted from gene biomarkers by the DAE model.

Table 3 .
The number of features extracted from gene biomarkers by the PCA method.

Table 4 .
The number of gene biomarkers selected by the LASSO method.

Table 5 .
Effect of the loss function in the COST-HDL approach. The best results are shown in bold.

Table 6 .
Evaluation of prediction models for sample type. The best results are shown in bold.

Table 7 .
Evaluation of prediction models for primary diagnosis. The best results are shown in bold.

Table 8 .
Evaluation of prediction models for tumor stage. The best results are shown in bold.

Table 9 .
Evaluation of prediction models for vital status. The best results are shown in bold.