# Predicting Deep Learning Based Multi-Omics Parallel Integration Survival Subtypes in Lung Cancer Using Reverse Phase Protein Array Data

## Abstract

## 1. Introduction

## 2. Materials and Methods

#### 2.1. TCGA Dataset

#### 2.2. Autoencoder

…^{−6}. The loss function of the autoencoder was the mean squared error, and the autoencoders were trained for 150 epochs.
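To make the training setup concrete (mean squared error loss, 150 epochs), the following is a minimal sketch of a tiny linear autoencoder in plain Python. It is not the authors' implementation: the single linear hidden layer, the absence of biases, full-batch gradient descent, the learning rate, the random initialization, and the toy data are all assumptions made for illustration. Only the loss function and the epoch count come from the text.

```python
import random


def train_autoencoder(data, n_hidden, epochs=150, lr=0.05, seed=0):
    """Fit a linear autoencoder by full-batch gradient descent on the MSE.

    Assumed for illustration: one linear hidden layer, no biases, fixed
    learning rate. Only the MSE loss and 150 epochs come from the text.
    Returns (encoder, decoder, history), where history[e] is the mean
    squared reconstruction error measured at the start of epoch e.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    enc = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(n_hidden)]
    dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(d)]
    scale = 2.0 / (n * d)  # derivative factor of the mean over samples and features
    history = []
    for _ in range(epochs):
        g_enc = [[0.0] * d for _ in range(n_hidden)]
        g_dec = [[0.0] * n_hidden for _ in range(d)]
        total = 0.0
        for x in data:
            # Forward pass: compress to the hidden code, then reconstruct.
            z = [sum(w * xi for w, xi in zip(row, x)) for row in enc]
            x_hat = [sum(w * zj for w, zj in zip(row, z)) for row in dec]
            err = [xh - xi for xh, xi in zip(x_hat, x)]
            total += sum(e * e for e in err)
            # Backward pass: accumulate MSE gradients for both weight matrices.
            dz = [sum(dec[i][j] * err[i] for i in range(d)) for j in range(n_hidden)]
            for i in range(d):
                for j in range(n_hidden):
                    g_dec[i][j] += scale * err[i] * z[j]
            for j in range(n_hidden):
                for i in range(d):
                    g_enc[j][i] += scale * dz[j] * x[i]
        history.append(total / (n * d))
        # Gradient-descent update of decoder and encoder weights.
        for i in range(d):
            for j in range(n_hidden):
                dec[i][j] -= lr * g_dec[i][j]
        for j in range(n_hidden):
            for i in range(d):
                enc[j][i] -= lr * g_enc[j][i]
    return enc, dec, history


# Hypothetical toy data: 4 features that lie on a 2-D subspace.
_rng = random.Random(1)
toy = []
for _ in range(20):
    a, b = _rng.gauss(0, 1), _rng.gauss(0, 1)
    toy.append([a, b, a + b, a - b])

encoder, decoder, history = train_autoencoder(toy, n_hidden=2)
```

The hidden code `z` plays the role of the compressed features. In practice a deep learning framework with non-linear layers would normally replace these hand-written loops.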

#### 2.3. Feature Selection and k-Means Clustering

#### 2.4. Machine Learning Models that Predict Cluster ID

#### 2.5. Predict Cluster ID Using Compressed Uncommon Data

#### 2.6. Identification of the Proteins Associated with Cluster ID

#### 2.7. Statistical Analysis

## 3. Results

#### 3.1. Unsupervised Approach for Obtaining Clinically Meaningful Subtypes

#### 3.2. Predicting Integration Survival Subtypes Using Compressed Categorical Datasets

#### 3.3. Validation Using Uncommon RPPA Datasets

#### 3.4. Comparison of Integration Survival Subtypes and RPPA Survival Subtypes

#### 3.5. Insight into the Proteins Associated with Integration Survival Subtypes

… $r_{pb}$ = 0.323). However, NKX2-1 mRNA expression levels were essentially uncorrelated with the integration survival subtypes: the tendency for high NKX2-1 mRNA expression to be labelled as integration survival subtype 1, and low expression as integration survival subtype 0, was negligible (Figure 7b, point biserial correlation coefficient: $r_{pb}$ = 0.064). This pattern is concordant with that seen in Figure 6. Meanwhile, there was a statistically significant difference in both NKX2-1 RPPA and mRNA expression between integration survival subtypes 0 and 1 (Welch’s t-test: p < 0.001 [RPPA] and p = 0.025 [mRNA]). Importantly, there was no correlation between NKX2-1 RPPA expression levels and NKX2-1 mRNA expression levels (Supplementary Figure S4, Pearson’s correlation coefficient: r = 0.102). It has recently become evident that mRNA levels are not sufficient to predict protein levels, and our results are consistent with a previous report [45]. Hence, this may explain the difference between the results obtained by other groups and those observed in the present study with respect to NKX2-1 [44].
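The three statistics used in this paragraph can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' analysis code: the function names and the toy values are hypothetical. The p-value step is omitted because it requires the t-distribution CDF. A statistics library such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False`, `scipy.stats.pointbiserialr`) would normally be used instead.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def point_biserial_r(labels, values):
    """Point-biserial correlation between 0/1 labels and continuous values."""
    g0 = [v for lab, v in zip(labels, values) if lab == 0]
    g1 = [v for lab, v in zip(labels, values) if lab == 1]
    n, n0, n1 = len(values), len(g0), len(g1)
    m = mean(values)
    s_n = math.sqrt(sum((v - m) ** 2 for v in values) / n)  # population SD
    return (mean(g1) - mean(g0)) / s_n * math.sqrt(n0 * n1 / n ** 2)


def welch_t(g0, g1):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    m0, m1 = mean(g0), mean(g1)
    v0 = sum((v - m0) ** 2 for v in g0) / (len(g0) - 1)  # sample variances
    v1 = sum((v - m1) ** 2 for v in g1) / (len(g1) - 1)
    se0, se1 = v0 / len(g0), v1 / len(g1)
    t = (m1 - m0) / math.sqrt(se0 + se1)
    df = (se0 + se1) ** 2 / (se0 ** 2 / (len(g0) - 1) + se1 ** 2 / (len(g1) - 1))
    return t, df


# Hypothetical toy values, not data from the study.
subtype = [0, 0, 0, 1, 1, 1]
expression = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0]

r_pb = point_biserial_r(subtype, expression)
t_stat, dof = welch_t(expression[:3], expression[3:])
```

The point-biserial coefficient is numerically identical to Pearson's r computed with the subtype coded as 0/1. A small coefficient can therefore coexist with a significant Welch test when the sample is large, as with the mRNA values reported here.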

## 4. Discussion

## 5. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

1. Bray, F.; Ferlay, J.; Soerjomataram, I.; Siegel, R.L.; Torre, L.A.; Jemal, A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. **2018**, 68, 394–424.
2. Siegel, R.L.; Miller, K.D.; Jemal, A. Cancer statistics, 2020. CA Cancer J. Clin. **2020**, 70, 7–30.
3. Yamaguchi, T.; Nishiura, H. Predicting the Epidemiological Dynamics of Lung Cancer in Japan. J. Clin. Med. **2019**, 8, 326.
4. Inamura, K. Lung Cancer: Understanding Its Molecular Pathology and the 2015 WHO Classification. Front. Oncol. **2017**, 7, 193.
5. Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung adenocarcinoma. Nature **2014**, 511, 543–550.
6. George, J.; Lim, J.S.; Jang, S.J.; Cun, Y.; Ozretic, L.; Kong, G.; Leenders, F.; Lu, X.; Fernandez-Cuesta, L.; Bosco, G.; et al. Comprehensive genomic profiles of small cell lung cancer. Nature **2015**, 524, 47–53.
7. Cancer Genome Atlas Research Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature **2012**, 489, 519–525.
8. Wu, C.; Zhou, F.; Ren, J.; Li, X.; Jiang, Y.; Ma, S. A Selective Review of Multi-Level Omics Data Integration Using Variable Selection. High Throughput. **2019**, 8, 4.
9. Ardila, D.; Kiraly, A.P.; Bharadwaj, S.; Choi, B.; Reicher, J.J.; Peng, L.; Tse, D.; Etemadi, M.; Ye, W.; Corrado, G.; et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat. Med. **2019**, 25, 954–961.
10. Xu, Y.; Hosny, A.; Zeleznik, R.; Parmar, C.; Coroller, T.; Franco, I.; Mak, R.H.; Aerts, H. Deep Learning Predicts Lung Cancer Treatment Response from Serial Medical Imaging. Clin. Cancer Res. **2019**, 25, 3266–3275.
11. Ramazzotti, D.; Lal, A.; Wang, B.; Batzoglou, S.; Sidow, A. Multi-omic tumor data reveal diversity of molecular mechanisms that correlate with survival. Nat. Commun. **2018**, 9, 4453.
12. Chaudhary, K.; Poirion, O.B.; Lu, L.; Garmire, L.X. Deep Learning-Based Multi-Omics Integration Robustly Predicts Survival in Liver Cancer. Clin. Cancer Res. **2018**, 24, 1248–1259.
13. Asada, K.; Kobayashi, K.; Joutard, S.; Tubaki, M.; Takahashi, S.; Takasawa, K.; Komatsu, M.; Kaneko, S.; Sese, J.; Hamamoto, R. Uncovering Prognosis-Related Genes and Pathways by Multi-Omics Analysis in Lung Cancer. Biomolecules **2020**, 10, 524.
14. Zhang, L.; Lv, C.; Jin, Y.; Cheng, G.; Fu, Y.; Yuan, D.; Tao, Y.; Guo, Y.; Ni, X.; Shi, T. Deep Learning-Based Multi-Omics Data Integration Reveals Two Prognostic Subtypes in High-Risk Neuroblastoma. Front. Genet. **2018**, 9, 477.
15. Wei, L.; Jin, Z.; Yang, S.; Xu, Y.; Zhu, Y.; Ji, Y. TCGA-assembler 2: Software pipeline for retrieval and processing of TCGA/CPTAC data. Bioinformatics **2018**, 34, 1615–1617.
16. Yuan, C.; Yang, H. Research on K-Value Selection Method of K-Means Clustering Algorithm. J. Multidiscip. Sci. J. **2019**, 2, 226–253.
17. Calinski, T.; Harabasz, J. A dendrite method for cluster analysis. Commun. Stat. Theory Methods **1974**, 3, 1–27.
18. Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. **1987**, 20, 13.
19. van der Maaten, L.; Hinton, G. Visualizing Data using t-SNE. J. Mach. Learn. Res. **2008**, 9, 7.
20. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. arXiv **2016**, arXiv:1603.02754.
21. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A highly efficient gradient boosting decision tree. In Proceedings of the NIPS’17: 31st International Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017; pp. 3149–3157.
22. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. arXiv **2017**, arXiv:1705.07874.
23. Yang, L.; Lin, M.; Ruan, W.J.; Dong, L.L.; Chen, E.G.; Wu, X.H.; Ying, K.J. Nkx2-1: A novel tumor biomarker of lung cancer. J. Zhejiang Univ. Sci. B **2012**, 13, 855–866.
24. Shi, Y.B.; Li, J.; Lai, X.N.; Jiang, R.; Zhao, R.C.; Xiong, L.X. Multifaceted Roles of Caveolin-1 in Lung Cancer: A New Investigation Focused on Tumor Occurrence, Development and Therapy. Cancers **2020**, 12, 291.
25. Guo, T.; Kong, J.; Liu, Y.; Li, Z.; Xia, J.; Zhang, Y.; Zhao, S.; Li, F.; Li, J.; Gu, C. Transcriptional activation of NANOG by YBX1 promotes lung cancer stem-like properties and metastasis. Biochem. Biophys. Res. Commun. **2017**, 487, 153–159.
26. Wang, J.; Deng, L.; Huang, J.; Cai, R.; Zhu, X.; Liu, F.; Wang, Q.; Zhang, J.; Zheng, Y. High expression of Fibronectin 1 suppresses apoptosis through the NF-kappaB pathway and is associated with migration in nasopharyngeal carcinoma. Am. J. Transl. Res. **2017**, 9, 4502–4511.
27. Kumara, H.; Bellini, G.A.; Caballero, O.L.; Herath, S.A.C.; Su, T.; Ahmed, A.; Njoh, L.; Cekic, V.; Whelan, R.L. P-Cadherin (CDH3) is overexpressed in colorectal tumors and has potential as a serum marker for colorectal cancer monitoring. Oncoscience **2017**, 4, 139–147.
28. Taniuchi, K.; Nakagawa, H.; Hosokawa, M.; Nakamura, T.; Eguchi, H.; Ohigashi, H.; Ishikawa, O.; Katagiri, T.; Nakamura, Y. Overexpressed P-cadherin/CDH3 promotes motility of pancreatic cancer cells by interacting with p120ctn and activating rho-family GTPases. Cancer Res. **2005**, 65, 3092–3099.
29. Gao, W.; Liu, Y.; Qin, R.; Liu, D.; Feng, Q. Silence of fibronectin 1 increases cisplatin sensitivity of non-small cell lung cancer cell line. Biochem. Biophys. Res. Commun. **2016**, 476, 35–41.
30. Vieira, A.F.; Paredes, J. P-cadherin and the journey to cancer metastasis. Mol. Cancer **2015**, 14, 178.
31. Wang, C.L.; Yue, D.S.; Zhang, Z.F.; Zhan, Z.L.; Sun, L.N. Value of thyroid transcription factor-1 in identification of the prognosis of bronchioloalveolar carcinoma. Zhonghua Yi Xue Za Zhi **2007**, 87, 2350–2354.
32. Barletta, J.A.; Perner, S.; Iafrate, A.J.; Yeap, B.Y.; Weir, B.A.; Johnson, L.A.; Johnson, B.E.; Meyerson, M.; Rubin, M.A.; Travis, W.D.; et al. Clinical significance of TTF-1 protein expression and TTF-1 gene amplification in lung adenocarcinoma. J. Cell Mol. Med. **2009**, 13, 1977–1986.
33. Han, X.; Tan, Q.; Yang, S.; Li, J.; Xu, J.; Hao, X.; Hu, X.; Xing, P.; Liu, Y.; Lin, L.; et al. Comprehensive Profiling of Gene Copy Number Alterations Predicts Patient Prognosis in Resected Stages I-III Lung Adenocarcinoma. Front. Oncol. **2019**, 9, 556.
34. Au, N.H.; Cheang, M.; Huntsman, D.G.; Yorida, E.; Coldman, A.; Elliott, W.M.; Bebb, G.; Flint, J.; English, J.; Gilks, C.B.; et al. Evaluation of immunohistochemical markers in non-small cell lung cancer by unsupervised hierarchical clustering analysis: A tissue microarray study of 284 cases and 18 markers. J. Pathol. **2004**, 204, 101–109.
35. Shah, L.; Walter, K.L.; Borczuk, A.C.; Kawut, S.M.; Sonett, J.R.; Gorenstein, L.A.; Ginsburg, M.E.; Steinglass, K.M.; Powell, C.A. Expression of syndecan-1 and expression of epidermal growth factor receptor are associated with survival in patients with nonsmall cell lung carcinoma. Cancer **2004**, 101, 1632–1638.
36. Haque, A.K.; Syed, S.; Lele, S.M.; Freeman, D.H.; Adegboyega, P.A. Immunohistochemical study of thyroid transcription factor-1 and HER2/neu in non-small cell lung cancer: Strong thyroid transcription factor-1 expression predicts better survival. Appl. Immunohistochem. Mol. Morphol. **2002**, 10, 103–109.
37. Pelosi, G.; Fraggetta, F.; Pasini, F.; Maisonneuve, P.; Sonzogni, A.; Iannucci, A.; Terzi, A.; Bresaola, E.; Valduga, F.; Lupo, C.; et al. Immunoreactivity for thyroid transcription factor-1 in stage I non-small cell carcinomas of the lung. Am. J. Surg. Pathol. **2001**, 25, 363–372.
38. Barlesi, F.; Pinot, D.; Legoffic, A.; Doddoli, C.; Chetaille, B.; Torre, J.P.; Astoul, P. Positive thyroid transcription factor 1 staining strongly correlates with survival of patients with adenocarcinoma of the lung. Br. J. Cancer **2005**, 93, 450–452.
39. Puglisi, F.; Barbone, F.; Damante, G.; Bruckbauer, M.; Di Lauro, V.; Beltrami, C.A.; Di Loreto, C. Prognostic value of thyroid transcription factor-1 in primary, resected, non-small cell lung carcinoma. Mod. Pathol. **1999**, 12, 318–324.
40. Stenhouse, G.; Fyfe, N.; King, G.; Chapman, A.; Kerr, K.M. Thyroid transcription factor 1 in pulmonary adenocarcinoma. J. Clin. Pathol. **2004**, 57, 383–387.
41. Berghmans, T.; Paesmans, M.; Mascaux, C.; Martin, B.; Meert, A.P.; Haller, A.; Lafitte, J.J.; Sculier, J.P. Thyroid transcription factor 1—A new prognostic factor in lung cancer: A meta-analysis. Ann. Oncol. **2006**, 17, 1673–1676.
42. Myong, N.H. Thyroid transcription factor-1 (TTF-1) expression in human lung carcinomas: Its prognostic implication and relationship with expressions of p53 and Ki-67 proteins. J. Korean Med. Sci. **2003**, 18, 494–500.
43. Tan, D.; Li, Q.; Deeb, G.; Ramnath, N.; Slocum, H.K.; Brooks, J.; Cheney, R.; Wiseman, S.; Anderson, T.; Loewen, G. Thyroid transcription factor-1 expression prevalence and its clinical implications in non-small cell lung cancer: A high-throughput tissue microarray and immunohistochemistry study. Hum. Pathol. **2003**, 34, 597–604.
44. Yoon, S.O.; Kim, Y.T.; Jung, K.C.; Jeon, Y.K.; Kim, B.H.; Kim, C.W. TTF-1 mRNA-positive circulating tumor cells in the peripheral blood predict poor prognosis in surgically resected non-small cell lung cancer patients. Lung Cancer **2011**, 71, 209–216.
45. Liu, Y.; Beyer, A.; Aebersold, R. On the Dependency of Cellular Protein Levels on mRNA Abundance. Cell **2016**, 165, 535–550.
46. Diao, G.; Vidyashankar, A.N. Assessing genome-wide statistical significance for large p small n problems. Genetics **2013**, 194, 781–783.
47. Hamamoto, R.; Komatsu, M.; Takasawa, K.; Asada, K.; Kaneko, S. Epigenetics Analysis and Integrated Analysis of Multiomics Data, Including Epigenetic Data, Using Artificial Intelligence in the Era of Precision Medicine. Biomolecules **2019**, 10, 62.
48. Chakraborty, S.; Tomsett, R.; Raghavendra, R.; Harborne, D.; Alzantot, M.; Cerutti, F.; Srivastava, M.B.; Preece, A.D.; Julier, S.J.; Rao, R.M.; et al. Interpretability of deep learning models: A survey of results. In 2017 IEEE SmartWorld, Ubiquitous Intelligence Computing, Advanced Trusted Computed, Scalable Computing Communications, Cloud Big Data Computing, Internet of People and Smart City Innovation, Proceedings of the 2017 IEEE SmartWorld, San Francisco, CA, USA, 4–8 August 2017; IEEE: New York, NY, USA, 2017; pp. 1–6.

**Figure 1.** Overall workflow of the study. (**a**) Detection of integration survival subtypes in non-small cell lung cancer (NSCLC) from six categories of multi-omics data in The Cancer Genome Atlas (TCGA), using an autoencoder and an unsupervised learning technique. (**b**) Prediction of integration survival subtypes using data from only one category, and validation of the model using uncommon data.

**Figure 2.** Prediction of the cluster number and k-means clustering. (**a**) Result of the elbow method. The x-axis shows the number of clusters; the y-axis shows the distortion score. (**b**) Results of the Calinski-Harabasz index and Silhouette Coefficient. The x-axis shows the number of clusters; the y-axis shows the Silhouette score or Calinski-Harabasz score. (**c**) Visualization of the k-means clustering by t-SNE. (**d**) Kaplan-Meier survival curves of the integration survival subtypes.
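The distortion score used by the elbow method in panel (a) is the sum of squared distances from each sample to its nearest cluster centroid, computed for a range of cluster numbers. The sketch below shows one way to obtain it with a plain-Python version of Lloyd's k-means. The random initialization, the number of restarts, the function names, and the toy points are assumptions for illustration and are not taken from the study.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_distortion(points, k, n_iter=50, seed=0):
    """Run Lloyd's k-means once and return the distortion score."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def best_distortion(points, k, n_init=10):
    """Lowest distortion over several random restarts (k-means can get stuck)."""
    return min(kmeans_distortion(points, k, seed=s) for s in range(n_init))


# Hypothetical toy data: two well-separated groups of points.
toy_points = [
    (0.0, 0.0), (0.2, 1.1), (1.0, 0.3), (1.3, 1.2),
    (10.0, 10.4), (10.5, 11.0), (11.2, 10.1), (11.0, 11.3),
]
distortions = {k: best_distortion(toy_points, k) for k in range(1, 5)}
```

Plotting `distortions` against `k` gives the elbow curve. The cluster number is read off where the decrease flattens, and indices such as the Calinski-Harabasz score or the Silhouette Coefficient serve as a cross-check.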

**Figure 3.** 3D scatter plots of compressed common ID data belonging to one category. Each axis represents the data values and the color shows the Cluster ID. (**a**) Methylation common data. (**b**) Reverse phase protein array (RPPA) common data. (**c**) Somatic mutation common data. (**d**) miRNA common data. The Cluster IDs are not separated in (**a**,**c**,**d**), whereas in (**b**) they are clearly separated.

**Figure 4.** Kaplan-Meier survival curve of the RPPA uncommon dataset using the integration survival subtypes.

**Figure 5.** Receiver operating characteristic (ROC) analysis for evaluation of the machine learning models that predict the integration survival subtypes using uncompressed RPPA common datasets. ROC curves of XGBoost (**a**) and LightGBM (**b**).

**Figure 6.** SHapley Additive exPlanations (SHAP) summary plot. (**a**) The plot shows the magnitudes of the XGBoost SHAP values across all samples. The color represents the feature values (red represents high and blue represents low). (**b**) The plot shows the sum of the SHAP values of LightGBM.

**Figure 7.** Relationship between Cluster ID and NKX2-1 expression levels. (**a**) Relationship between NKX2-1 RPPA expression levels and integration survival subtypes. The x-axis shows the integration survival subtype and the y-axis shows the NKX2-1 RPPA expression levels, standardized against row (sample ID). (**b**) Relationship between NKX2-1 mRNA expression levels and integration survival subtypes. The x-axis shows the integration survival subtype and the y-axis shows the NKX2-1 mRNA expression levels, standardized against row (sample ID).

**The Number of Samples of Each Data Type**

| Data Name | LUAD | LUSC | Total |
|---|---|---|---|
| Common | 278 | 205 | 483 |
| Clinical_uncommon | 197 | 262 | 459 |
| mRNA_uncommon | 190 | 262 | 452 |
| miRNA_uncommon | 125 | 103 | 228 |
| RPPA_uncommon | 54 | 93 | 147 |
| CNV_uncommon | 190 | 259 | 449 |
| Somatic mutation_uncommon | 193 | 249 | 442 |
| Methylation_uncommon | 135 | 131 | 266 |

**The Number of Features in Each Step**

| Data Type | Before Compression | After Compression by Autoencoder | After Feature Selection by Cox-PH |
|---|---|---|---|
| mRNA | 13,049 | 100 | 12 |
| miRNA | 217 | 100 | 3 |
| RPPA | 150 | 100 | 3 |
| CNV | 14,786 | 100 | 5 |
| Somatic mutation | 18,977 | 100 | 3 |
| Methylation | 19,899 | 100 | 3 |

**Table 3.** Area under curve (AUC) of logistic regression models for predicting the survival subtypes using compressed data.

| Data Type | AUC |
|---|---|
| mRNA | 0.57 ± 0.05 |
| miRNA | 0.61 ± 0.07 |
| RPPA | 0.99 ± 0.00 |
| CNV | 0.43 ± 0.04 |
| Somatic mutation | 0.50 ± 0.07 |
| Methylation | 0.55 ± 0.05 |
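For readers less familiar with the metric in Table 3, the AUC can be read as the probability that a randomly chosen sample of subtype 1 receives a higher predicted score than a randomly chosen sample of subtype 0. A value near 0.5 therefore corresponds to chance-level prediction, and a value near 1 to almost perfect separation. The following plain-Python sketch computes it by direct pairwise comparison. The function name and the toy labels and scores are hypothetical and unrelated to the study data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities, not data from the study.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_score)
```

This pairwise form is quadratic in the number of samples, so it is only suitable for small examples. Library implementations compute the same quantity from sorted ranks.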

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Takahashi, S.; Asada, K.; Takasawa, K.; Shimoyama, R.; Sakai, A.; Bolatkan, A.; Shinkai, N.; Kobayashi, K.; Komatsu, M.; Kaneko, S.; et al. Predicting Deep Learning Based Multi-Omics Parallel Integration Survival Subtypes in Lung Cancer Using Reverse Phase Protein Array Data. *Biomolecules* **2020**, *10*, 1460. https://doi.org/10.3390/biom10101460
